ABSTRACT
It has previously been shown that, conditional on its fixation, the time to fixation of a semi-dominant deleterious autosomal mutation in a randomly mating population is the same as that of an advantageous mutation. This result implies that deleterious mutations could generate selective sweep-like effects. Although their fixation probabilities greatly differ, the much larger input of deleterious relative to beneficial mutations suggests that this phenomenon could be important. We here examine how the fixation of mildly deleterious mutations affects levels and patterns of polymorphism at linked sites - both in the presence and absence of interference amongst deleterious mutations - and how this class of sites may contribute to divergence between-populations and species. We find that, while deleterious fixations are unlikely to represent a significant proportion of outliers in polymorphism-based genomic scans within populations, minor shifts in the frequencies of deleterious mutations can influence the proportions of private variants and the value of FST after a recent population split. As sites subject to deleterious mutations are necessarily found in functional genomic regions, interpretations in terms of recurrent positive selection may require reconsideration.
INTRODUCTION
Among the most important results in theoretical population genetics, now nearly a century old, are the fixation probabilities of new beneficial and deleterious mutations, which were obtained by Fisher (1922, 1930), Haldane (1927) and Wright (1931), using different approaches. Their results were later generalized by Kimura (1957, 1962, 1964), using the backward diffusion equation. A somewhat lesser-known result concerns the trajectories of these selected mutations. Specifically, Maruyama & Kimura (1974) found that, conditional on fixation, the time that a beneficial autosomal mutation with selection coefficient +s and dominance coefficient h spends in a given interval of allele frequency in a randomly mating population is the same as that for a deleterious mutation with selection coefficient −s and dominance coefficient 1 − h, provided that the conditions for the validity of the diffusion equation approximation hold (i.e., the change in allele frequency of the mutation is small enough to be approximated by a continuous-time diffusion process). Thus, given that the effects of selective sweeps on variability at linked neutral sites are related to their speed of transit through a population (Maynard Smith & Haigh 1974; Stephan 2019), the fixation of a deleterious mutation by genetic drift can generate a similar selective sweep effect to that caused by the fixation of a beneficial mutation, for mutations with the same magnitude of selection coefficient. Moreover, Tajima (1990) demonstrated that, on average, there is a ~42% mean reduction in diversity at a site where a neutral mutation has recently become fixed by genetic drift. While the mean time to fixation for this class of mutation in diploids is well known to be 4Ne generations (Kimura & Ohta 1969), this is associated with a wide variance of approximately (Kimura 1970), such that neutral mutations may also fix relatively rapidly (in less than Ne generations) and generate an appreciable but highly localized sweep effect (see Tables 1 and 2 of Tajima 1990).
Of course, the probabilities of fixation of deleterious and beneficial fixations differ greatly. However, given that the input of deleterious mutations is much higher than the input of beneficial mutations each generation (see reviews by Eyre-Walker & Keightley 2007; Bank et al. 2014a), the potential contribution of such deleterious sweeps to levels and patterns of nucleotide variation, as well as divergence between populations and species, remains an important open question (Charlesworth 2020a). An alternative way of viewing this issue, as discussed by Gillespie (1994) and Charlesworth & Eyre-Walker (2007), is that under a model of constant selection and reversible mutation between two alternative nucleotide variants, statistical equilibrium with respect to the frequencies of sites fixed for the alternatives implies equal rates of beneficial and deleterious substitutions per unit time. It is important to note, however, that only deleterious mutations with selection coefficients on the order of the reciprocal of the population size have significant probabilities of fixation (Fisher 1930; Kimura 1964), implying that the substitutions concerned involve only very weakly selected mutations. The effects on diversity statistics of sweeps of very weakly selected mutations, including those of deleterious mutations, appears to have been investigated previously only by Mafessoni & Lachmann (2015).
A starting point for investigating this problem is the distribution of fitness effects of new mutations (the DFE). There is substantial evidence from both empirical and experimental studies that the DFE of new mutations is bimodal - consisting of a strongly deleterious mode, and a weakly deleterious / neutral mode that may contain a beneficial tail under certain conditions (e.g., Crow 1993; Lynch et al. 1999; Sanjuán 2010; Jacquier et al. 2013; Bank et al. 2014b). While the calculations and simulations presented below represent a general approach to addressing this topic, we have necessarily chosen a specific DFE realization and species for illustration. Specifically, Johri et al. (2020) recently presented an approximate Bayesian (ABC) approach, which represents the first joint estimator of the DFE shape together with population history, and which corrects for the effects of background selection (BGS; Charlesworth et al. 1993). They estimated that a substantial proportion of new mutations in coding regions have mildly deleterious effects on fitness, emphasizing the importance of further understanding the consequences of such mutations in dictating observed polymorphism and divergence. Furthermore, they found it unnecessary to invoke a beneficial mutational class in order to fit the data from the African population of Drosophila melanogaster that they considered.
This analysis provides a basis for exploring the possible implications of sweeps of deleterious mutations, a topic that has been neglected. Here, we re-examine this question, considering both single and recurrent substitution models, which we use to examine the possibility that genomic scans for positive selection may in fact also be identifying deleterious fixations, when based on: a) levels and patterns of variation; b) population differentiation; and c) species-level divergence. Our results suggest that, while this phenomenon is unlikely to be a major factor in polymorphism-based scans within populations, it may be a serious confounder in among-population-based analyses.
METHODS
Analytical calculations
For convenience, we assume a Wright-Fisher population of N randomly mating diploid individuals throughout the analytical section. This assumption of an absence of population structure is common and reasonably well-justified in organisms with low FST like D. melanogaster, although the investigation of these dynamics in the presence of structure would be of great value in the future. Further, the selection coefficient for homozygotes is constant over time and is denoted by s, with sa > 0 and sd > 0 representing selection for and against homozygotes, respectively. In addition, the DFE for semi-dominant deleterious mutations is assumed to be discrete, with four fixed classes of mutations given by 0 <2Nsd ≤ 1, 1 <Nsd ≤ 10, 10 <2Nsd≤ 100 and 100< 2Nsd ≤ 2N, where mutations are assumed to follow a uniform distribution within each class. These assumptions concerning the distribution of sd were made in order to simplify integration over the DFE (Johri et al. 2020; and see Johri et al. 2021).
Probability of fixation
The fixation probability (Pfix) of a new semi-dominant mutation with an initial frequency of 1/(2N) in a Wright-Fisher population of size N was calculated using Equation 10 of Kimura (1962):
Note that this equation assumes demographic equilibrium and independence between the selected sites (violations of this assumption are investigated below), as well as |s| << 1.
Contributions to divergence
Under the DFE model described above, the number of fixations Nfix (i) expected per generation per site for a given DFE class i of deleterious mutations is given by the following expression: where μ is the total mutation rate per site per generation, fi represents the proportion of new mutations belonging to the ith DFE class and sd2 and sd1 represent the upper and lower bounds to the DFE class, respectively. The integral in equation (2) was evaluated analytically by means of an infinite series representation (derivation provided in Appendix 1), which was validated using the “integrate” function in R (R Core Team 2018).
Probability of fixation when correcting for the effects of background selection (BGS)
The fixation probability (PfixB) of a new semi-dominant mutation with the homozygous selection coefficient s was calculated as: where B = Ne/N represents the effective population size in the presence of BGS, calculated from the corresponding ratio of neutral diversity with and without BGS (see, for example, Campos & Charlesworth 2019). The mean probability of fixation over an interval of selection coefficients, assuming that mutations are uniformly distributed in this interval (PfixB(γ1 − γ2)), was calculated as: where γ1 = 2NBs1, γ2 = 2NBs2 and γ2 and γ1 represent the upper and lower bound to the interval, respectively.
Waiting and fixation times
The waiting time (tw) between fixations under a Poisson process was calculated as follows: where L represents the number of functional sites under consideration.
In order to compare the results of simulations to theoretical expectations, the expected time to fixation of a semi-dominant mutation was also calculated by numerically integrating equation (17) of Kimura & Ohta (1969), using Simpson’s rule (Atkinson 1989).
Reduction in diversity due to a single sweep
The expected reduction in pairwise nucleotide site diversity at the end of a sweep (–Δπ), relative to the diversity in the absence of selection, was calculated using equation (14a) of Charlesworth (2020b) for a non-zero rate of recombination (omitting a factor that describes the effect of background selection): where Ts is the mean coalescent time (in units of N) for a pair of alleles sampled at the end of a sweep, Td is the duration of the deterministic phase of the sweep (i.e., excluding the initial and final stochastic phases) in units of N, Tr is the mean time to a recombination event during the sweep, conditioned on the occurrence of a recombination event; Pr is the probability of at least one recombination event transferring a sampled allele onto the wild-type background during the sweep, and Prs is the probability that there is only a single such recombination event. For large values of the ratio of the rate of recombination between the neutral and selected site (r) to the magnitude of the selection coefficient, this expression can become negative, in which case it is reset to zero.
We also used simulations based on equations (27) of Tajima (1990), which provide recursion equations for the expectation of the pairwise diversity at a neutral locus linked to a selected locus, conditional on a given trajectory of allele frequency change at the selected locus, as described by Charlesworth (2020b). These equations require only the validity of the diffusion equation approximation. Binomial sampling of allele frequencies post-selection in each generation was used to generate the trajectories of change at the selected locus, using the standard selection equation for a single locus to calculate the deterministic change in allele frequency each generation. Application of the recursion equations to a trajectory of allele-frequency change simulated in this way gives one realization of Δπ, and the overall expected value of Δπ can be obtained from the mean of the simulated values over a large number of replicates. It was found that 1000 replicates gave very accurate estimates of Δπ, with ratios of the standard errors to the means of < 5% for the parameter sets used here. Due to the symmetry in the sojourn time (conditional on fixation) in a given interval of allele frequency for beneficial and deleterious mutations when semi-dominance is assumed (Maruyama and Kimura 1974), trajectories for weakly deleterious mutations of selection coefficient sd were simulated assuming a beneficial selection coefficient sa. Note that this symmetry does not apply to partially dominant or partially recessive mutations – see Ewens (2004, pp. 170-171) for a full discussion.
Population genetic parameters used for analytical calculations
The parameters for the calculations were chosen to match those estimated from D. melanogaster populations, estimated from exonic sites. Mutations occurred at a rate μ per basepair (bp) per generation, and were assumed to be a mixture of neutral, nearly neutral and weakly deleterious mutations. For this analysis, μ = 3 × 10−9 (Keightley et al. 2009, 2014). The sex-averaged rate of crossing over per bp (rc) was assumed to be equal to 10−8 per generation, the mean value for D. melanogaster autosomes (Fiston-Lavier et al. 2010), and the effective population size was 106 (Arguello et al. 2019; Johri et al. 2020), and 10 generations per year were assumed. Given estimates of neutral divergence between D. melanogaster and D. simulans, this means that t ~21.3 × 106 generations elapsed since their common ancestor (Li et al. 1999; Halligan & Keightley 2006), corresponding to 2.13 million years.
Because non-crossover associated gene conversion is an important source of recombination between closely linked sites in Drosophila, it was assumed to occur uniformly across the genome, independently of local differences in the rate of crossing over, as indicated by the data on D, melanogaster (Comeron et al. 2012; Miller et al. 2016). The sex-averaged rate of initiation per bp of conversion events was rg = 10−8 per bp per generation, and there was an exponential distribution of tract length with a mean of dg = 440 bp (Comeron et al. 2012; Miller et al. 2016). The net rate of recombination between sites separated by z bp, r(z), is the sum of the contributions from crossing over and non-crossover gene conversion, given by the formula of Frisse et al. (2001):
For some of the results presented below, only the net rate of recombination r, or its value scaled by the effective population size, ρ = 2Nr, was used. In the presence of gene conversion with the parameters described above, the corresponding value of z can then be obtained by Newton-Raphson iteration of the equation r(z) − r = 0, assuming that the rate of gene conversion does not vary with the rate of crossing over,
Simulations
Simulating individual fixations in a single population
SLiM 3.3.1 (Haller & Messer 2019) was used to simulate a genomic element of length 10 kb. For the single sweep model, in order to quantify the hitchhiking effect of a single fixation, selection only acted on a single site in the middle of the region, with all other sites evolving neutrally. Simulations were performed for five different values of the scaled selection coefficient: 2Ns = 0, −1, +1, −5, and +5. Population genetic parameters resembling those of D. melanogaster were utilized (as defined above) for illustrative purposes. In order to perform simulations efficiently, population size was scaled down by a factor of 100 while the mutation and recombination rates were correspondingly scaled up by the same factor. Simulations were run for a burn-in period of 105 generations (10Nsim) after which a mutation with a scaled selection coefficient of 2Ns was introduced at the selected site. Simulations in which the introduced mutation reached fixation were retained for analysis. Fifty diploid individuals were sampled at the completion of the simulations to mimic generally-available population-genomic data, and population genetic summary statistics were calculated using Pylibseq (Thornton 2003).
Simulating fixations in a single population in the presence of other deleterious mutations
SLiM 3.1 (Haller & Messer 2019) was used to perform simulations resembling both Drosophila-like and human-like populations in order to assess the effects of Hill-Robertson interference (Hill & Robertson 1966) amongst deleterious mutations (reviewed by Charlesworth 2012). For parameters resembling the D. melanogaster populations, a ~15kb region which was composed of 2 genes, each with 5 exons (of length 300 bp) and 4 introns (of length 100 bp), and with intergenic lengths of 4kb was simulated. Intergenic and intronic regions experienced only effectively neutral mutations, while the exons experienced mutations with the DFE inferred by Johri et al. (2020). The population parameters were chosen to mimic the Zimbabwe population of D. melanogaster, with Ne = 1.95 × 106, mutation rate = 3.0 × 10−9 per site/generation and recombination rate = 1 × 10−8 per site/generation (Arguello et al. 2019).
For parameters resembling human populations, a ~60kb region composed of 2 genes, each with 5 exons (of length 300 bp) and 4 introns (of length 2 kb), and with intergenic lengths of 15kb was simulated. Note that these values are nearer to the median than the mean values of the distribution of lengths of genomic regions in humans, and provide a more conservative assessment of potential interference effects. Intergenic and intronic regions experienced only neutral mutations, while the exons experienced mutations with the DFE inferred by Huber et al. (2017), such that f0 = 0.51, f1 = 0.14, f2 = 0.14 and f3 = 0.21. The population parameters used were Ne = 104, mutation rate = 1.2 × 10−8 per site/generation and recombination rate = 1 × 10−8 per site/generation.
Simulations were conducted with N = 10,000 diploid individuals for both species, and mutation and recombination rates were scaled appropriately (by the factor 195 for D. melanogaster and 1 for humans). In order to estimate the rates of fixation of deleterious mutations and their effects on linked sites, the simulations were run for 100N generations (of which 10N generations represent the burn-in period) and were replicated 100 times for D. melanogaster and 600 times for humans. Our simulations resembling D. melanogaster populations resulted in a mean KA/KS value of 0.08, which is consistent with an observed mean autosomal value of 0.145 for a comparison with a closely related species (Campos et al. 2014), where 50% of substitutions are expected to be due to beneficial mutations.
The total number of observed substitutions (presented in Supp Table 1) were used to obtain the frequencies of fixation and times to fixation of deleterious mutations, and nucleotide site diversities at linked neutral sites immediately post-fixation, in order to quantify the potential effects of interference among selected mutations. The frequency of fixations of weakly deleterious mutations with scaled selection coefficients distributed between γ1 and γ2, Pfixsim(γ1 − γ2),)) where 1 ≤ γ1 < γ2 ≤ 10 (i.e., for a set of mutations that are a subset of the weakly deleterious class of mutations) was calculated as: where 2Nsstart = 1, 2Nsend = 10; and γ1 = 2Ns1, γ2 = 2Ns2 when not correcting for BGS, and γ1 = 2NBs1, γ2 = 2NBs2 when correcting for BGS. Again, B is the ratio of the expected nucleotide diversity at a neutral site in the presence of BGS to its value in the absence of BGS.
Simulating multiple populations and FST-based analyses
Based on a recently inferred demographic history of African and European populations of D. melanogaster using populations sampled from Beijing, the Netherlands, and Zimbabwe (Arguello et al. 2019), an ancestral population of size of 1.95 × 106 was simulated, which split into two populations (6.62 × 104 generations ago) of constant size: 3.91 × 106 and 4.73 × 105, representing African and European populations, respectively. Note that, although Arguello et al. inferred recent growth in both populations, we have assumed constant sizes in order to avoid confounding effects of such growth on the fixation probabilities of mutations, and we also assumed no migration between the two populations. For scenarios following the demographic model presented in Arguello et al. (2019), simulations were also performed for the extreme values of the 95% CIs for the current size of the Zimbabwe population (3.02 × 105, 4.69×106), the current size of the European population (2.03 × 105, 3.89×106), and the time of split (1.17×104, 1.03×105). Ten replicates of each combination of the above-mentioned values (i.e., 8 different combinations) were simulated in order to account for uncertainties in parameter estimates. For the purpose of FST analyses, a functional region of size 10 kb was simulated, in which mutations had selection coefficients given by the DFE inferred in Johri et al. (2020). Specifically, the DFE was given by a discrete distribution of four fixed bins with 24.7% of mutations belonging to the effectively neutral class (f0: 0 ≤ 2Nsd <1), 49.4% to the weakly deleterious class (f1: 1 ≤ 2Nsd< 10), 3.9% to the moderately deleterious class (f2: 10 ≤ 2Nsd < 100), and 21.9% to the strongly deleterious class of mutations (f3: 100 < 2Nsd ≤ 2N).
In this two-population framework, four separate scenarios were tested: 1) the DFE remained scaled to the ancestral population size (i.e., the distribution of selection coefficients remained constant throughout, making selection effectively weaker in the smaller derived (European) population); 2) the DFE was rescaled after the population split with respect to subpopulation-specific sizes (i.e., both populations experienced equal proportions of mutations belonging to each DFE class as defined in terms of 2Nsd), such that selection was equally strong in both populations - this is an arbitrary biological model that is simply chosen for comparison, as one would naturally expect selection to be effectively weaker in the derived population as in scenario 1; 3) in addition to this neutral and deleterious DFE, 1% of all mutations were mildly beneficial with selective effects drawn uniformly from the interval 1 ≤ 2Nsa ≤ 10, where sa is the increase in fitness of the mutant homozygote; and 4) in addition to this neutral and deleterious DFE, 1% of all mutations were strongly beneficial with 2Nsa = 1000. In scenarios 3 and 4, the deleterious and beneficial DFEs were scaled to the respective sizes of each population post-split, so that the proportions of beneficial mutations were the same in the two populations. Note that we refer to selection as being weak when 2Ns is less than 10. In order to assess the role of population bottlenecks in generating neutral outliers, a fifth scenario was simulated in which there was no selection and the demographic history was that inferred by Li & Stephan (2006) - a model that involves a much larger size reduction than inferred in the Arguello et al. (2019) model utilized above. The parameters of both demographic models are provided in Supp Table 2.
Fifty diploid individuals were sampled from both populations in order to calculate FST. All sites that would be considered polymorphic in the metapopulation were used to calculate FST (i.e., sites fixed either in one population or both populations (for different alleles) were also included in FST calculations). FST was calculated in sliding windows across the genomic region for: a) windows containing a constant number of SNPs (10 SNPs) using the package PopGenome (Pfeifer et al. 2014) in R, and b) for windows representing the same total number of bases (500 bp) using Pylibseq 0.2.3 (Thornton 2003). FST was calculated for both cases by the method of Hudson et al. (1992). FST was also calculated individually for different mutation types (i.e., for neutral, weakly deleterious and beneficial mutations, by simply restricting the calculations to segregating sites of the specific mutation type). Although there will be an upper bound to the FST values obtained in this way, which is determined by the frequency of the most frequent allele in the metapopulation (Jakobsson et al. 2013), the detection of outliers should not be affected by this procedure and should mimic the empirical practice. In order to evaluate the performance of haplotype-based population differentiation statistics, φST was calculated following Excoffier et al. (1992), by performing the Analysis of Molecular Variance (AMOVA) approach implemented in the R package Pegas (http://ape-package.ird.fr/pegas.htlm) on SNPs present in 500 bp, 1 kb and 2 kb sliding windows.
Simulating multiple species, and divergence-based analyses
McDonald-Kreitman (MK) tests (McDonald & Kreitman 1991) were performed to investigate the degree to which substitutions of mildly deleterious mutations might affect the inference of divergence due to positive selection. A population resembling a D. melanogaster African population was simulated under demographic equilibrium. Ten independent replicates of a 10 kb protein-coding region were considered such that every third position was neutral (representing synonymous sites) while all other sites represented nonsynonymous positions. Nonsynonymous sites experienced purifying selection given by a DFE that comprised only the non-neutral bins (i.e., f1, f2, and f3, in the same proportions described above), while the neutral sites experienced mutations that belonged to class f0. For comparison, simulations were performed in which 10% or 20% of nonsynonymous mutations were also neutral. In all cases, the simulation was run for 20N generations (where N = 104). The number of segregating nonsynonymous (PN) and synonymous (PS) sites were estimated by sampling 50 diploid individuals from the population from the functional and neutral regions respectively. The number of fixed substitutions occurring at the functional sites (DN) and neutral sites (DS) were calculated post burn-in (i.e., after 10N generations), and then rescaled to the number of generations since the D. melanogaster ancestor.
In order to correct for mildly deleterious mutations segregating in populations, the proportion of adaptive substitutions (α) was also inferred by implementing a variant of the test referred to as the asymptotic MK test. Messer & Petrov (2013) suggested plotting the derived allele frequency (x) of variants against α inferred using the number of segregating sites (P(x)) at that derived allele frequency (i.e., ), and showed that the asymptote of this curve would tend towards the true value of α. The asymptotic MK test was performed using a web-based tool available at: http://benhaller.com/messerlab/asymptoticMK.html (Haller & Messer 2017). For the purpose of the asymptotic MK test, values of PN/PS were binned with a bin size of 0.05, and the curve-fitting (of α(x) with respect to x) was restricted to derived allele frequencies between 0.1-1.0
Data Availability Statement
All scripts used in this study are provided in the Github repository: https://github.com/paruljohri/Deleterious_Sweeps.
RESULTS & DISCUSSION
Theoretical expectations
As has been long appreciated (Fisher 1930; Wright 1931), equation (1) implies that the probability of fixation of a mutation in the strongly deleterious DFE classes is vanishingly small. However, the weakly deleterious class may contribute substantially to divergence (Figure 1a). We derived an analytical approximation for the probability of fixation for mutations with a DFE represented as a combination of four non-overlapping uniform distributions (see Methods and Appendix 1). In a Wright-Fisher population, the ratio of the mean probability of fixation of mutations with fitness effects between 1 < 2Nsd ≤ 5, relative to the probability of fixation of effectively neutral mutations (0 < 2Nsd ≤ 1), is 0.27 (note that the value of N is irrelevant if the diffusion approximation holds). This ratio rapidly declines to 0.01 for mutations with fitness effects 5 < 2Nsd ≤ 10 (Figure 1a).
Taking the recently estimated DFE for the Zambian D. melanogaster population (Johri et al., 2020) as an example, a question of interest concerns the probabilities of fixation of different classes of mutations and their contributions to population- and species-level divergence (Supp Figure 1). In the absence of positive selection, the weakly deleterious class of mutations would be expected to contribute 7.2% of the total divergence in exonic regions, while effectively neutral mutations would be expected to contribute 92.8%. If we were to assume that approximately 50% of all substitutions in Drosophila have been fixed by positive selection (Eyre-Walker and Keightley 2009; Campos et al. 2017), weakly deleterious mutations are still likely to have contributed 3.5% of the total divergence in functional regions and possibly much more in regions experiencing reduced selective constraints.
Hitchhiking effects of deleterious sweeps: levels and patterns of variation
We next considered the fixation times of these mutations contributing to divergence, as well as the expected waiting time between fixations. Using population parameters relevant for D. melanogaster (see Methods), the neutral and deleterious DFE of Johri et al. (2020), and assuming that 60% of the D. melanogaster genome (of size 140 Mb) is functional, equation (5) shows that the genome-wide waiting time between successive weakly deleterious fixations is ~ 83 generations. Hence, the ~2.13 million years estimated for the time since the D. melanogaster and D. simulans split is expected to equate to many such fixations.
As expected, there is no significant difference (p = 0.16; Student’s t-test) in fixation times between the mildly beneficial (3.2N generations; SD: 1.5N) vs mildly deleterious mutations (2.9N generations; SD: 1.1N) with the same fitness effects (2Nsa=2Nsd=5) (Supp Table 3). Importantly however, the variance in fixation times of weakly selected mutations is extremely large, such that the faster tails of the fixation time distributions for both 2Nsd = 5 and 2Ns = 0 occupy ~N generations, which corresponds to a sweep effect of the same size as the mean for 2Nsa = 30 (Figure 1b). In other words, weakly deleterious and neutral fixations can match the sweep effects of comparatively strongly selected beneficial fixations.
We evaluated the impact of these fixations on observed genomic variation in two ways. The first corresponds to a model of a single sweep event, which is relevant to the literature on detecting signatures of individual fixations, as done in genomic scans for positively selected loci (e.g., Harr et al. 2002; Glinka et al. 2003; Haddrill et al. 2005; Nielsen et al. 2005; see also Jensen 2009; Stephan 2019). Such scans operate under the assumption that the selective sweeps in question have achieved fixation immediately prior to sampling, due to the rapid loss of signal as the time since fixation increases (Kim & Stephan 2002; Przeworski 2002). The second corresponds to a recurrent substitution model, which is relevant to the recurrent sweep literature that attempts to quantify the effect of selective sweeps on genome-wide patterns of variability and their relation to rates of recombination (e.g., Wiehe & Stephan 1993; Kim 2006; Andolfatto 2007; Macpherson et al. 2007; Jensen et al. 2008; Campos & Charlesworth 2019; Charlesworth 2020b; see review by Sella et al. 2009).
The reduction of diversity resulting from the fixation of a mildly deleterious or beneficial semi-dominant mutation with a very small selective effect is not expected to be substantially different from that caused by the fixation of a selectively neutral mutation (Tajima 1990). The widely-used approximation of Barton (2000) for the reduction in diversity relative to neutrality caused by the sweep of a semi-dominant mutation, Δπ= (2Nsa)(−4r/s), suggests that, when 2Nsa = 5, diversity will only be reduced by more than 20% in a region for which r/s ≤ 0.2, corresponding to approximately 20 bp for a typical D. melanogaster recombination rate of 3 × 10−8 per bp, including the contribution from gene conversion as given by equation (7), and assuming an effective population size of ≥ 106.
However, this formula assumes that fixations are so fast that swept alleles that have failed to recombine onto a wild-type background experience no coalescent events during the duration of the sweep (Hartfield & Bataillon 2020; Charlesworth 2020b), which is unlikely to be true for weakly selected mutations, such that NIsI is close to 1. We therefore used equation (6) for analytical predictions (Figure 2, Table 1, Supp Table 4), which is based on the results of Charlesworth (2020b). But this equation is also likely to be inaccurate with weak selection, because it assumes that the trajectory of allele frequency change is close to that for the deterministic case, except for the initial and final stochastic phases at the two extremes of allele frequencies. We therefore also used simulations based on equations (27) of Tajima (1990) to predict sweep effects, as described in the Methods.
With the full simulations of a 10 kb region, in which we assumed a population size of 106 and mutation rate of 3 × 10−9 /site/generation, with 2Nsa= 5 the nucleotide diversity 10 bp (~ρ = 0.2) around the selected site (i.e., 5 bp in both directions) was 0.0058 (SE: 0.0012), corresponding to a reduction of 52% below the neutral value. For 2Ns = 10, the 10-bp nucleotide diversity was 0.0083 (SE: 0.0017), corresponding to a reduction of 31%. The observed reduction in both cases almost fully recovers to the expected level under neutrality within 500 bp (ρ ≈ 10; Figure 2). A similar pattern is seen in Table 1 and Supp Table 4, which shows both the analytical predictions and those based on Tajima’s equations, which agree surprisingly well except at the two highest rates of recombination displayed. For comparison, the results of simulating neutral fixations are also shown in Table 1 and Figure 2. The case with Ns = 10 is at the higher limit of what is likely to be produced by sweeps of deleterious mutations, given that the ratio of the fixation probability given by equation (1) to the neutral value of 1/(2N) is then approximately 0.0045, compared to 0.034 with Ns = 5.
Mafessoni & Lachmann (2015) showed that that fixations of weakly selected, highly dominant favorable mutations - or highly recessive deleterious mutations - could reduce diversity at linked sites by a smaller amount than fixations of neutral mutations (with no recombination, these are associated with a diversity reduction of 42% below the mean neutral value), with a maximum effect when 2Ns is approximately 2 (and h = 1 for favorable mutations, and 0 for deleterious ones). We have confirmed this unexpected observation using Tajima algorithm simulations (Supp Table 5), finding that it exists even for 2Ns = 5 (Supp Figure 2; Supp Table 5). However, our use of h = 0.5 and 2Ns ≥ 2.5 means that this phenomenon is absent from the results presented above.
Because gene conversion is an important contributor to recombination between closely linked sites (Miller et al. 2016), the effects of fixations in the presence of gene conversion are restricted to a region that is about one-third of the distance in the absence of gene conversion (Table 1). Thus, a greater than 20% reduction in nucleotide diversity for 2Ns = 5 [2Ns = 10] was observed up to ρ = 0.8 [ρ = 1.6], corresponding to 14 bp [27 bp] with gene conversion, and 40 bp [80 bp] without gene conversion. This quite localized effect is similar for weakly beneficial, weakly deleterious, and neutral mutations.
Given the relatively faster mean speed (3.1N generations) of fixation of weakly selected semi-dominant mutations compared to the neutral expectation (4N generations), they should also result in small distortions of the SFS at closely linked neutral sites. We observed a slight skew towards rare variants (as measured by Tajima’s D, Supp Table 6) restricted to ~50 base pairs from the selected site immediately after fixation (for 2Nsd = 2Nsa = 5). This highly localized distortion of the SFS is probably too weak to play any important role in generating false positives in genomic scans. Indeed, owing to the inherent stochasticity involved in the underlying processes, such scans generally only have power to detect very strongly selected fixations (often requiring values of 2Nsa > 1000 in order to observe appreciable true positive rates: Crisci et al. 2013). Viewed in another way, given that the false-positive rates associated with genomic scans may often be inflated well above true-positive rates owing to the underlying demographic history of the population (e.g., Teshima et al. 2006; Thornton & Jensen 2007; Crisci et al. 2013; Harris et al. 2018), demography is probably a much stronger confounder than deleterious sweeps in polymorphism-based scans.
Finally, using a recurrent fixation model, we have examined the steady-state impact of weakly deleterious sweeps (Figure 3). Here we used a model of a gene with five exons of 100 codons each, with 70% of exonic mutations subject to selection, which were separated from each other by 100 bp introns, as described by Campos & Charlesworth (2019) and Charlesworth (2020b). Five equally large classes of sites subject to deleterious mutations were modeled, with the lowest scaled selection coefficient being 2Nsd = 2.5 and the largest 2Nsd = 10; as described above, a uniform distribution of 2Nsd values was assumed within each class. The theoretical predictions used the method for predicting the effects of recurrent sweeps of Charlesworth (2020b), based on Equation 6 above. For consistency with the simulations of population divergence described below, the population size was assumed to be 1.95 × 106, giving a neutral diversity of 0.0234 with a mutation rate of 3 × 10−9. Substitution rates of deleterious nonsynonymous mutations were calculated from equations (1) and (2). The expected number of nonsynonymous substitutions per gene over 2N generations was 0.506, and the ratio of nonsynonymous to synonymous substitutions was 0.0412, which is somewhat less than half the mean value for comparisons of D. melanogaster and its close relatives (e.g., Campos et al. 2017). The major source of these substitutions is the class with 2Nsd between 2.5 and 4.375, which accounts for 76% of all substitutions, reflecting the fact that mutations in this class have the highest fixation probabilities.
The results for single sweeps shown in Table 1 suggest that these theoretical predictions will tend to overestimate sweep effects, so that the results in Figure 3 must be viewed with caution. In addition, the model ignores the mean neutral diversity (π) during the progress of a sweep, considering only the contribution from intervals between sweeps. This provides accurate predictions of recurrent sweep effects with 2Nsd >> 1, since the sweep duration is then a relatively small fraction of the interval between sweeps, unless sweeps are extremely frequent (Charlesworth 2020b). However, this is not true with very weak selection, and the expectation of π during a sweep should, therefore, be included in a rigorous treatment. For weak sweeps, it is possible that this quantity could be larger than the expected diversity in the absence of selection, π0. This follows from the fact that π is reduced immediately after the fixation of a neutral mutation at a linked site, but such fixations cannot alter the net expected value of π. Since the post-fixation recovery can only bring π back up to π0, the expectation of π at linked sites over the course of the neutral fixation must exceed π0. This arises because, in the absence of recombination, the two alternative haplotypes associated with the two variants in transit to fixation can only coalesce at a time preceding the sweep, and hence have a coalescent time > 2N. By continuity, a similar effect must arise with sufficiently weak selection, although it is not seen when 2Nsd >> 1, when the expected value of π is reduced during a sweep (see Figure 8 of Zeng et al. 2021), so that the theory used here is likely to overestimate sweep effects. Furthermore, with such weak selection it is somewhat artificial to isolate the effects on π of fixations of linked deleterious mutations from the effects of their loss from the population (background selection), given the fact that there will be traffic backwards and forwards between alternative variants at a site that is subject to selection and reverse mutation (Gillespie 1994; Charlesworth & Eyre-Walker 2008).
With the standard D. melanogaster sex-averaged rate of crossing over per bp of 1 × 10−8, the mean reduction in nucleotide diversity at synonymous sites caused by deleterious sweeps was approximately 5% with gene conversion and 8% in its absence (Figure 3). For the lowest rates of crossing over, and with no gene conversion, average reductions can reach ~19%, suggesting that the fixation of mildly deleterious mutations could play a significant role in organisms or genomic regions with highly reduced rates of recombination. In this low recombination environment however, selective interference will become a factor, and should accelerate the rate of deleterious fixations, while also making deleterious variants behave more like neutral mutations (see below). Regardless of these details, however, it seems unlikely that substitutions of deleterious mutations will have more than a minor effect on average diversity in the Drosophila genome overall, particularly compared with the effects of population history, background selection, and sweeps of positively selected mutations. Furthermore, the findings of Mafessoni & Lachmann (2015) suggest that fixations of strongly recessive deleterious mutations will have even smaller effects than those studied here, and can even enhance variability (see Supp. Figure 2, resulting in associative overdominance effects; Charlesworth & Jensen 2021). A detailed study of the possible range of effects of both losses and fixations of weakly selected mutations is planned.
The effects of interference on deleterious sweeps with Drosophila- and human-like parameters
The relatively large input of deleterious mutations could result in interference amongst them, and in turn might affect their probabilities of fixation. In order to evaluate this possibility, genomic regions composed of two adjacent genes with 5 exons and 4 introns each, as well as intergenic regions (see Methods for details) were simulated. Two separate genomic regions were simulated resembling D. melanogaster and humans - both in terms of architecture and underlying parameter values - with exons experiencing purifying selection specified by the DFE estimated by Johri et al. (2020) for Drosophila, and that estimated by Huber et al. (2017) for humans. Because interference is more likely to occur in regions of lower recombination rates, additional simulations were carried out for 0.5× and 0.1× the mean recombination rate. Simulations were performed with N = 104 and all other parameters were scaled accordingly.
The frequencies of fixation of mildly deleterious mutations were found to be higher than expected with no interference, suggesting an appreciable role of interference (Supp Figure 3). However, the effective reduction in Ne due to background selection also affects the scaled selection coefficients (Barton 1995; Charlesworth 2012; Campos & Charlesworth 2019). After correcting for the effects of BGS (in which case, expected probabilities were obtained using equations (3) and (4)), the frequencies of fixations of mildly deleterious mutations in humans were found to be unaffected by interference (Supp Figure 4). While no interference effects were observed for regions corresponding to the mean recombination rate in Drosophila, for regions with 0.1× the mean recombination rate, the frequencies of fixation were higher than that expected in the absence of interference (Figure 4a).
Because of these interference effects in regions of low recombination, we further evaluated the times to fixation and the effects of sweeps on linked neutral sites. In both Drosophila and humans, the fixation times of deleterious mutations were found to be slightly lengthened in the presence of interference (Supp Figures 5 and 6) as compared to expected time to fixation obtained by numerically integrating Equation 17 of Kimura & Ohta (1969). This suggests that the reduction in Ne caused by BGS might result in mutations behaving more neutrally, thus taking longer to fix. We have however not accounted for BGS when obtaining expected times numerically. Although the decrease in nucleotide diversity at linked neutral sites remained qualitatively similar to that observed in the absence of interference for D. melanogaster (Figure 4b-d, Supp Figure 7), the reduction in mean neutral diversity post-fixation was somewhat more modest for very weakly deleterious alleles in the presence of interference. This smaller effect is probably due to BGS rather than interference per se, as suggested by results above. For recombination rates one-half and one-tenth of the mean, nucleotide diversity up to ρ~0.2 was 62% and 60% of the mean intergenic diversity, consistent with a ~40% reduction in variation. For human-like parameters, we similarly observed that the mean nucleotide diversity post-fixation of a weakly deleterious mutation (Figure 4e-g, Supp Figure 7) was very similar to that observed post-fixation of a neutral mutation (Table 1). In sum, for the parameter ranges investigated, we found that interference between weakly deleterious mutations resulted in only very minor deviations from expectations.
The contribution of deleterious mutations to population- and species-level divergence-based scans
Given the potentially substantial contribution of the weakly deleterious class to observed fixations, it is also of interest to consider their impact on divergence-based analyses (e.g., methodology related to dN/dS) and population differentiation-based scans (e.g., methodology related to FST). We examined properties related to inferring the proportion of substitutions fixed by positive selection (α) by performing MK tests in the presence of mildly deleterious mutations. Specifically, we simulated two scenarios: 1) 50% of mutations at nonsynonymous sites were weakly deleterious, and 50% were neutral, and 2) nonsynonymous sites experienced deleterious mutations that followed the DFE inferred by Johri et al. (2020).
Although, as expected, the presence of mildly deleterious mutations substantially increases values of dN/dS (Supp Table 7) relative to stronger purifying selection (i.e., larger proportions of moderately and strongly deleterious mutations), it also leads to strongly negative values of α when performing MK tests. This is due to the fact that mildly deleterious mutations often segregate in the population at low frequency, inflating the total number of segregating nonsynonymous polymorphisms (PN) significantly (Supp Table 7). As such, the presence of mildly deleterious mutations can result in negative values of α in the absence of positive selection. This is consistent with previous studies that have proposed a derived allele frequency cutoff (Fay et al. 2001, 2002; Andolfatto 2005; but see Charlesworth & Eyre-Walker 2008) to correct for segregating mildly deleterious alleles, as well as proposed modification of the traditional MK test (e.g., the asymptotic MK test; Messer & Petrov 2013). Nevertheless, under the asymptotic MK test, α remains underestimated (Supp Table 7) when the proportion of mildly deleterious mutations is sufficiently high.
In order to study inter-population effects, we simulated a model similar to that recently inferred by Arguello et al. (2019), which represents the European and African split of D. melanogaster (see Methods), and overlaid it with the estimated DFE of Johri et al. (2020). Under this model (i.e., in the absence of positive selection), roughly 50% of SNPs identified as FST outliers (defined as representing the upper 1% or 2.5% tails) are mildly deleterious (Figure 5a; Supp Figure 8). Two models of purifying selection are given for comparison (Table 2): 1) a more biologically realistic model in which selection coefficients are scaled to the ancestral population size in defining the DFE classes, such that selection is effectively weaker in the smaller derived population; and 2) an arbitrary model in which the DFE is rescaled such that selective effects are equally strong in the larger ancestral and the smaller derived populations (which, under the chosen demographic model, differ from one another by roughly an order of magnitude). It should be noted that in the model based on Arguello et al., the time post-split between the African and European population is extremely brief, such that there are few or no substitutions post-split (Supp Table 8). Thus, FST values are almost entirely dictated by allele frequency differences with respect to co-segregating mutations (Supp Table 8).
Mean FST values at both neutral and weakly deleterious sites are larger with the rescaled purifying selection model, whereas the unscaled model yields similar values to the purely neutral model (Table 1 and Supp Tables 9, 12). In addition, the frequency of private SNPs at all sites is higher for the rescaled model than the unscaled model, with the neutral model having the lowest value of the three (Supp Table 8). In order to determine whether these weakly deleterious mutations were associated with a decrease in diversity at linked sites, potentially leading to an increase in FST values (Charlesworth 1998; Cruickshank & Hahn 2014), we evaluated the relationship between FST and nucleotide diversity using neutral variants alone (Supp Figure 9; see also the comparison with directly selected sites in Supp Figures 10-11). The lack of a strong negative correlation suggests that sweep-like effects of deleterious mutations on diversity at linked sites are not primarily responsible for the observed effects on FST and the frequency of private alleles. In contrast, there is a negative relationship between nucleotide diversity and FST after the strong population bottleneck represented by the neutral model of Li & Stephan (2006).
The lack of sweep-like effects of deleterious mutations is not surprising in view of the time-scale to fixation required for weakly selected mutations, which is of the order of the coalescent time, 2N generations; the split times for both the African and European populations are both only a small fraction of their respective coalescent times. Thus, neither new neutral nor weakly deleterious mutations are likely to have reached fixation in either population, consistent with the results shown in the last column of Supp Table 8. In addition, there is no time for large changes in π as a result of the altered N values.
Instead, it is more likely that BGS effects on the frequencies of segregating mutations explain these patterns (B = 0.18 and 0.15 in the African and European populations, respectively, obtained from the ratios of neutral π values in the absence of purifying selection to those without, see Supp Table 9). In the absence of rescaling of the DFEs for deleterious mutations (the more biologically plausible case), the enhanced N for the African population means that fewer deleterious mutations behave as effectively neutral, so that there is less effect of drift on their frequencies compared with the ancestral population; the reverse is true for the European population. As far as linked neutral variants are concerned, there is likely to be a greater BGS effect in the African than in the ancestral population, and vice-versa for the European population. FST relative to the purely neutral case is thus subject to two opposing factors, which presumably explains the lack of any strong effect of this model of purifying selection on FST in Table 2 and Supp Table 9. However, the lower effective population size induced by BGS means that rare variants are more likely to be lost after the population split than under neutrality, explaining the increased proportion of private variants compared with neutrality with purifying selection (Supp Table 8). Because of the effects of BGS, the relative Ne values of the African and European populations are less disparate than the relative N values, so that the African/European ratio of the proportions of private SNPs is smaller than under neutrality (Supp Table 8).
The effect of rescaling is to keep the proportion of deleterious mutations that are effectively neutral the same in the two descendant populations, with the absolute strength of selection being higher in the African population, so that overall there is a stronger BGS effect in this population. This results in a higher overall fraction of private SNPs, and a larger enrichment of private SNPs in the African population, compared with the neutral case (Supp Table 8). There is a corresponding increase in mean FST for both neutral and weakly deleterious variants compared with the neutral case (Table 2 and Supp Table 9).
It is also of interest to consider the contribution of deleterious mutations to outliers (i.e., variants that are highly differentiated between the two populations) present in the tails of the distribution in the presence of positive selection. When a class of weakly positively selected sites was added to the DFE (i.e. beneficial mutations have fitness effects 1 < 2Nsa ≤ 10, and comprise 1% of new mutations), these adaptive mutations contributed little to the observed outliers (< 5%) - with mildly deleterious and neutral mutations strongly represented amongst outliers (Figure 5b). Conversely, when positive selection is very strong (i.e., mutations have selective effects with 2Nsa=1000, and comprise 1% of new mutations), the majority (~75%) of outliers are drawn from this beneficial class (Figure 5c). Yet, even in the presence of this exceptionally strong and frequent positive selection, ~10% of outliers remain in the mildly deleterious class. Upon simulating the same scenarios - but drawing from the extreme CIs of the demographic parameters to account for underlying uncertainty in the model of Arguello et al. (2019) - a very similar proportion of mildly deleterious mutations persists in the presence of weak positive selection (Supp Table 10). However, in the case of strong positive selection, the proportion of strongly beneficial outlier mutations ranges from 25% to 90% depending on the underlying population history (Supp Table 10). It is also noteworthy that under this model of strong, recurrent positive selection, little variation within populations is observed, owing to the severity of the selective sweeps (Supp Table 11), and inter-population allele frequencies are only weakly correlated with one another (Supp Table 8). A haplotype-based measure of population differentiation, φST (Excoffier et al. 1992), was also used to identify outliers (i.e., genomic regions present in the tails of the distribution) and was not found to differ substantially (Supp Table 12).
Because any model, including equilibrium neutrality, will have outliers based on empirical p-values, we further quantified the properties of outliers (2.5% and 1%) relative to the genome-wide mean. Under models of purifying selection, the values of FST outliers were ~2.3-4.5 fold larger than mean FST values (Table 2; Supp Table 13), but differed only slightly from the neutral case. Under the strongly bottlenecked neutral model of Li & Stephan (2006), in which the European population experiences a substantial decrease in population size during the bottleneck, the genome-wide mean FST values obtained are much higher, as would be expected, though outlier FST values were smaller in relative magnitude (~1.5-2.7 fold higher than the means). Under models including weak positive selection, only slight increases in genome-wide FST values were observed; whereas the strong positive selection model greatly increased genome-wide values. However, outlier values under both positive selection models were ~2.4-4.6 fold higher than the respective means. Similar results were obtained when using haplotype-based calculations of population differentiation (Supp Table 14). This suggests that recurrent positive selection does not generate substantially larger effect sizes for outlier FST values than neutrality or purifying selection.
CONCLUSIONS
In this paper, we have examined the expected impact of deleterious fixations on polymorphism- and divergence-based scans for selection. Amongst the class of weakly deleterious mutations that have some chance of reaching fixation (1 < 2Nsd ≤ 10), the resulting sweep effects are highly localized, as expected: on the order of a few dozen base pairs for the parameters considered here. This suggests that deleterious sweeps of this kind are unlikely to be detected in genomic scans based on localized deficits of variation or strongly skewed site frequency spectra. Given the theoretically expected symmetry between beneficial and deleterious sweeps (Maruyama & Kimura 1974; Mafessoni & Lachmann 2015; Charlesworth 2020a), this is expected, as common polymorphism-based methods generally have little power unless selection is exceptionally strong (e.g., Kim & Stephan 2002; Jensen et al. 2005; Crisci et al. 2013). However, our results suggest that studies that estimate the frequency and strength of classic selective sweeps using patterns of diversity around substitutions (Hernandez et al. 2011; Sattath et al. 2011; Elyashiv et al. 2016), which assume that reductions are entirely caused by the fixation of positively selected mutations, should take into account the effects of reductions caused by neutral and weakly deleterious substitutions.
Furthermore, for among-population comparisons based on FST, mildly deleterious mutations contribute significantly to observed outliers, even in the presence of positive selection, particularly in the case of a recent population size reduction. This appears to be the true regardless of whether selective effects are equally strong in both populations (Figure 5), or if selection is relaxed in the smaller derived population (Supp Figure 8). These observations further stress the important point that genomic outliers of a given statistical distribution do not necessarily represent positively selected loci. As such, the performance of outlier-based tests must be assessed on a case-by-case basis under an appropriate baseline model incorporating population history as well as direct and linked purifying selection effects, in order to determine the power and false-positive rates associated with the detection of beneficial alleles (Jensen et al. 2019). Moreover, because such deleterious effects are localized to functional sites (i.e., those genomic regions experiencing purifying selection), this may be particularly pernicious in the sense that this class of outlier will not fall in non-functional regions, where they are often attributed to demographic effects. Rather, owing to the common tendency of constructing biological narratives (true or otherwise) around functional outliers (Pavlidis et al. 2012), these results suggest that adaptive story-telling may arise from weakly deleterious outliers.
SUPPLEMENTARY MATERIAL
ACKNOWLEDGMENTS
The authors thank Laurent Excoffier for addressing queries about the AMOVA analyses, as well as Susanne Pfeifer and two anonymous reviewers for providing helpful comments on the manuscript. This work was funded by National Institutes of Health grant R01GM135899 and R35GM139383 to JDJ.
APPENDIX 1 Calculating the probability of fixation with uniformly distributed fitness effects
For a selection coefficient s << 1, the fixation probability of a deleterious mutation when Ne ≠ N is approximated by:
For large Ne, this should be a good approximation, since when 2Nes ≥ 10, the fixation probability is negligible, so that the contribution from the bin with 2Nes ≥ 10 can be ignored. For 2Nes < 10, s is sufficiently small that this equation is an accurate approximation.
The integral of this expression over a given interval of 2Nes values can be found as follows; for convenience, x is substituted for s and a for 2Ne. For a > 0, we need to evaluate the following indefinite integral:
Integration by parts give:
For a > 0, the logarithm can be expanded as a power series in exp(−ax), which can be integrated term by term: where Li is the polylogarithm. The final expression for the integral of equation (1) (provided in the Methods) is thus:
The contribution to the mean fixation probability from the interval si to si+1 is obtained by dividing this expression by (si+1− si).
For s0 = 0, equation (A4) is invalid. However, the integral between 0 and a small positive value of s, sε, can be found as follows. For 2Nes << 1, the initial integrand can be approximated by a−1(1−ax/2), so that equation 2 (see Methods) can be replaced with: and the indefinite integral of the fixation probability becomes:
The contribution to the integral of Pfix between s0 = 0 and s1 from the interval (0, sε) is thus:
The corresponding mean fixation probability over this interval is: This is slightly smaller than the neutral value, 1/(2N), as would be expected when 2Nes << 1.
Thus, for the interval (s0, s1) with s0 = 0, equation (A5b) should be used for the interval (0≤ s ≤ sε), and equation (A4) with integration limits sε and s1 for the remainder of the interval.