The effects of mutation and recombination rate heterogeneity on the inference of demography and the distribution of fitness effects

Disentangling the effects of demography and selection has remained a focal point of population genetic analysis. Knowledge about mutation and recombination is essential in this endeavour; however, despite clear evidence that both mutation and recombination rates vary across genomes, it is common practice to model both rates as fixed. In this study, we quantify how this unaccounted for rate heterogeneity may impact inference using common approaches for inferring selection (DFE-alpha, Grapes, and polyDFE) and/or demography (fastsimcoal2 and δaδi). We demonstrate that, if not properly modelled, this heterogeneity can increase uncertainty in the estimation of demographic and selective parameters and in some scenarios may result in mis-leading inference. These results highlight the importance of quantifying the fundamental evolutionary parameters of mutation and recombination prior to utilizing population genomic data to quantify the effects of genetic drift (i.e., as modulated by demographic history) and selection; or, at the least, that the effects of uncertainty in these parameters can and should be directly modelled in downstream inference.

S4: Summary statistics for a neutrally evolving population that has undergone population contraction 1Nancestral generations before sampling (whereby Ncurrent = 0.5Nancestral), under fixed and variable recombination and mutation rates.Data point represent the mean value, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S5: Summary statistics for a neutrally evolving population that has undergone population contraction 1Nancestral generations before sampling (whereby Ncurrent = 0.01Nancestral), under fixed and variable recombination and mutation rates.Data point represent the mean value, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S7: Demographic inference results for a single equilibrium population under multiple distributions of fitness effects (DFEs), with exonic regions unmasked.Top row depicts the 6 discrete DFEs that were simulated.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0 with 0 ≤ 2Nancestral s < 1 (i.e., effectively neutral mutations), f1 with 1 ≤ 2Nancestral s < 10 (i.e., weakly deleterious mutations), f2 with 10 ≤ 2Nancestral s < 100 (i.e., moderately deleterious mutations), and f3 with 100 ≤ 2Nancestral s (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.Middle and bottom rows show inference results for fastsimcoal2 and dadI, respectively.All inference values are scaled by the true value.As such, a value of 1 on the Y-axis indicates that the inferred parameter value matches the true parameter value (as shown by the black dashed line).For the equilibrium population model, fastsimcoal2 infers the Ncurrent parameter, which is the population size at time of sampling, whilst dadi infers Nuopt , which is the population size at time of sampling relative to the initial population size.Data points represent the mean inference value across 10 replicates, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the per-site per-generation mutation rate, and r is the per-site per-generation recombination rate.Variable rates were drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S8: Summary statistics for an equilibrium population simulated under 6 different distribution of fitness effects (DFEs), under fixed and variable recombination and mutation rates, with exonic regions masked.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.These summary statistics correspond to demographic inference in Figure 2 in main text.Data point represent the mean value, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S9: Summary statistics for an equilibrium population simulated under 6 different distribution of fitness effects (DFEs), under fixed and variable recombination and mutation rates, with no masking.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.These summary statistics correspond to demographic inference in Supplementary Figure S7 in main text.Data point represent the mean value, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S10 S10: Demographic inference results for a population contraction (such that Ncurrent = 0.5Nancestral) under fixed and variable recombination and mutation rates, across the 6 simulated distributions of fitness effects (DFEs), with exonic regions masked.Population size change occurred 1Ncurrent generations before sampling, where Ncurrent is the population size at time of sampling.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0 with 0 ≤ 2Nancestral s < 1 (i.e., effectively neutral mutations), f1 with 1 ≤ 2Nancestral s < 10 (i.e., weakly deleterious mutations), f2 with 10 ≤ 2Nancestral s < 100 (i.e., moderately deleterious mutations), and f3 with 100 ≤ 2Nancestral s (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.fastsimcoal2 infers three parameters: Ncurrent; Nancestral; and τancestral (the time of population size change in Ncurrent generations).dadI infers two parameters: Nuopt (the population size at time of sampling relative to the initial population size); and τopt (the time of population size change relative to the initial population size).Data points represent the mean inference value across 10 replicates, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the per-site per-generation mutation rate, and r is the per-site per-generation recombination rate.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S11: Demographic inference results for a population contraction (such that Ncurrent = 0.01Nancestral) under fixed and variable recombination and mutation rates, across the 6 simulated distributions of fitness effects (DFEs), with exonic regions masked.Population size change occurred 1Ncurrent generations before sampling, where Ncurrent is the population size at time of sampling.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0 with 0 ≤ 2Nancestral s < 1 (i.e., effectively neutral mutations), f1 with 1 ≤ 2Nancestral s < 10 (i.e., weakly deleterious mutations), f2 with 10 ≤ 2Nancestral s < 100 (i.e., moderately deleterious mutations), and f3 with 100 ≤ 2Nancestral s (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.fastsimcoal2 infers three parameters: Ncurrent; Nancestral; and τancestral (the time of population size change in Ncurrent generations).dadI infers two parameters: Nuopt (the population size at time of sampling relative to the initial population size); and τopt (the time of population size change relative to the initial population size).Data points represent the mean inference value across 10 replicates, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the per-site per-generation mutation rate, and r is the per-site per-generation recombination rate.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S12: Summary statistics for a population expansion (such that Ncurrent = 2Nancestral) simulated under 6 different distribution of fitness effects (DFEs), under fixed and variable recombination and mutation rates, with exonic regions masked.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.These summary statistics correspond to demographic inference in Figure 3 in main text.Data point represent the mean value, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S13: Summary statistics for a population contraction (such that Ncurrent = 0.5Nancestral) simulated under 6 different distribution of fitness effects (DFEs), under fixed and variable recombination and mutation rates, with exonic regions masked.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.Data point represent the mean value, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S14: Summary statistics for a population contraction (such that Ncurrent = 0.01Nancestral) simulated under 6 different distribution of fitness effects (DFEs), under fixed and variable recombination and mutation rates, with exonic regions masked.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.Data point represent the mean value, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S15: Demographic inference results for a population expansion (such that Ncurrent = 2Nancestral) under fixed and variable recombination and mutation rates, across the 6 simulated distributions of fitness effects (DFEs), with exonic regions unmasked.Population size change occurred 1Ncurrent generations before sampling, where Ncurrent is the population size at time of sampling.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0 with 0 ≤ 2Nancestral s < 1 (i.e., effectively neutral mutations), f1 with 1 ≤ 2Nancestral s < 10 (i.e., weakly deleterious mutations), f2 with 10 ≤ 2Nancestral s < 100 (i.e., moderately deleterious mutations), and f3 with 100 ≤ 2Nancestral s (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.Fastsimcoal2 infers three parameters: Ncurrent; Nancestral; and τancestral (the time of population size change in Ncurrent generations).dadI infers two parameters: Nuopt (the population size at time of sampling relative to the initial population size); and τopt (the time of population size change relative to the initial population size).Data points represent the mean inference value across 10 replicates, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the per-site per-generation mutation rate, and r is the per-site per-generation recombination rate.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).S16: Demographic inference results for a population contraction (such that Ncurrent = 0.5Nancestral) under fixed and variable recombination and mutation rates, across the 6 simulated distributions of fitness effects (DFEs), with exonic regions unmasked.Population size change occurred 1Ncurrent generations before sampling, where Ncurrent is the population size at time of sampling.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0 with 0 ≤ 2Nancestral s < 1 (i.e., effectively neutral mutations), f1 with 1 ≤ 2Nancestral s < 10 (i.e., weakly deleterious mutations), f2 with 10 ≤ 2Nancestral s < 100 (i.e., moderately deleterious mutations), and f3 with 100 ≤ 2Nancestral s (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.Fastsimcoal2 infers three parameters: Ncurrent; Nancestral; and τancestral (the time of population size change in Ncurrent generations).dadI infers two parameters: Nuopt (the population size at time of sampling relative to the initial population size); and τopt (the time of population size change relative to the initial population size).Data points represent the mean inference value across 10 replicates, whilst error bars represent the standard deviation.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the per-site per-generation mutation rate, and r is the per-site per-generation recombination rate.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).

S21:
Comparison of true distribution of fitness effects (DFEs) and DFE inference with Grapes when recombination and mutation rates are fixed and/or variable, under our six DFE models for equilibrium population simulations.Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).

S22:
Comparison of true distribution of fitness effects (DFEs) and DFE inference with DFEalpha when recombination and mutation rates are fixed and/or variable, under our six DFE models for population expansion simulations (such that Ncurrent = 2Nancestral).Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).

S24:
Comparison of true distribution of fitness effects (DFEs) and DFE inference with DFEalpha when recombination and mutation rates are fixed and/or variable, under our six DFE models for population contraction simulations (such that Ncurrent = 0.5Nancestral).Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).

S25:
Comparison of true distribution of fitness effects (DFEs) and DFE inference with Grapes when recombination and mutation rates are fixed and/or variable, under our six DFE models for population contraction simulations (such that Ncurrent = 0.5Nancestral).Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).

S26:
Comparison of true distribution of fitness effects (DFEs) and DFE inference with DFEalpha when recombination and mutation rates are fixed and/or variable, under our six DFE models for population contraction simulations (such that Ncurrent = 0.01Nancestral).Exonic mutations were drawn from a DFE comprised of four fixed classes (Johri et al. 2020), whose frequencies were denoted by fi: f0, with 0 ≤ 2Nancestrals < 1 (i.e., effectively neutral mutations), f1, with 1 ≤ 2Nancestrals < 10 (i.e., weakly deleterious mutations), f2, with 10 ≤ 2Nancestrals < 100 (i.e., moderately deleterious mutations), and f3, with 100 ≤ 2Nancestrals < 2Nancestral (i.e., strongly deleterious mutations), where Nancestral was the initial population size and s was the reduction in fitness of the mutant homozygote relative to wild-type.θ = 2Nμ and ρ = 2Nr, where N is the ancestral population size (Nancestral), μ is the mutation rate, and r is the rate of recombination.Variable rates are drawn from a uniform distribution, such that the mean rate across each simulation replicate was equal to the fixed rate, to enable fair comparisons (see the Materials and Methods section for further details).