New Results

Revisiting the notion of deleterious sweeps

Parul Johri, Brian Charlesworth, Emma K. Howell, Michael Lynch, Jeffrey D. Jensen
doi: https://doi.org/10.1101/2020.11.16.385666
Parul Johri
1School of Life Sciences, Arizona State University, Tempe, AZ 85287, United States
Brian Charlesworth
2Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, EH9 3FL, United Kingdom
Emma K. Howell
1School of Life Sciences, Arizona State University, Tempe, AZ 85287, United States
Michael Lynch
1School of Life Sciences, Arizona State University, Tempe, AZ 85287, United States
3Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ, United States
  • For correspondence: jeffrey.d.jensen@asu.edu
Jeffrey D. Jensen
1School of Life Sciences, Arizona State University, Tempe, AZ 85287, United States
  • For correspondence: jeffrey.d.jensen@asu.edu

ABSTRACT

It has previously been shown that, conditional on its fixation, the time to fixation of a semi-dominant deleterious autosomal mutation in a randomly mating population is the same as that of an advantageous mutation. This result implies that deleterious mutations may generate selective sweep effects. Although their fixation probabilities greatly differ, the much larger input of deleterious relative to beneficial mutations suggests that this phenomenon could be important. We here examine how the fixation of mildly deleterious mutations affects levels and patterns of polymorphism at linked sites, and how this class of sites may contribute to divergence between populations and species. We find that, while deleterious sweeps are unlikely to represent a significant proportion of outliers in polymorphism-based genomic scans within populations, minor shifts in the frequencies of deleterious mutations can influence the proportions of private variants and the value of FST after a recent population split. As sites subject to deleterious mutations are necessarily found in functional genomic regions, interpretations in terms of recurrent positive selection may require reconsideration.

INTRODUCTION

Among the most important results in theoretical population genetics, now nearly a century old, are the fixation probabilities of new beneficial and deleterious mutations, which were obtained by Fisher (1922, 1930), Haldane (1927), and Wright (1931), using different approaches. Their results were later generalized by Kimura (1957, 1962, 1964), using the backward diffusion equation. A somewhat lesser-known result concerns the trajectories of these selected mutations. Specifically, Maruyama & Kimura (1974) found that, conditional on fixation, the time that a beneficial autosomal mutation with selection coefficient +s and dominance coefficient h spends in a given interval of allele frequency in a randomly mating population is the same as that for a deleterious mutation with selection coefficient −s and dominance coefficient 1 − h, provided that the conditions for the validity of the diffusion equation approximation hold. Thus, given that the effects of selective sweeps on variability at linked neutral sites are related to their speed of transit through a population (Maynard Smith & Haigh 1974; Stephan 2019), the fixation of a deleterious mutation by genetic drift can generate a similar selective sweep effect to that caused by the fixation of a beneficial mutation, for mutations with the same magnitude of selection coefficient. Moreover, Tajima (1990) demonstrated that there is a ∼42% mean reduction in diversity at a site where a neutral mutation has recently become fixed by genetic drift. While the mean time to fixation for this class of mutation is well known to be 4Ne generations (Kimura & Ohta 1969), this is associated with a wide variance of approximately 4.64 Ne^2 (Kimura 1970), such that neutral mutations may also fix relatively rapidly (in less than Ne generations) and generate an appreciable but highly localized sweep effect (see Tables 1 and 2 of Tajima 1990).

Table 1:

Reduction in nucleotide diversity at linked neutral sites relative to the neutral value after the fixation of a weakly deleterious or advantageous semi-dominant mutation. The distance between the selected and neutral site is presented in units of the scaled recombination rate (ρ), as well as in the number of bases corresponding to Drosophila-like populations.

Table 2:

Mean genome-wide and outlier FST values (calculated in windows of 500 bp (labeled ‘Constant total sites’), or 10 SNPs (labeled ‘Constant SNPs’)).

Naturally, the probabilities of deleterious and beneficial fixations differ strongly. However, given that the input of deleterious mutations is much higher than the input of beneficial mutations each generation (see reviews by Eyre-Walker & Keightley 2007; Bank et al. 2014a), the potential contribution of such deleterious sweeps to levels and patterns of nucleotide variation, as well as divergence between populations and species, remains an important open question (Charlesworth 2020a). An alternative way of viewing this issue, as discussed by Gillespie (1994) and Charlesworth & Eyre-Walker (2007), is that under a model of constant selection and reversible mutation between two alternative nucleotide variants, statistical equilibrium with respect to the frequencies of sites fixed for the alternatives implies equal rates of beneficial and deleterious substitutions per unit time. It is important to note, however, that only deleterious mutations with selection coefficients on the order of the reciprocal of the population size have significant probabilities of fixation (Fisher 1930; Kimura 1964), implying that the substitutions concerned involve only very weakly selected mutations. The effects on diversity statistics of sweeps of very weakly selected mutations, including those of deleterious mutations, appear to have been investigated previously only by Mafessoni & Lachmann (2015).

A starting point for investigating this problem is the distribution of fitness effects of new mutations (the DFE). There is substantial evidence from both empirical and experimental studies that the DFE of new mutations is bimodal, consisting of a strongly deleterious mode and a weakly deleterious / neutral mode that may contain a beneficial tail under certain conditions (e.g., Crow 1993; Lynch et al. 1999; Sanjuán 2010; Jacquier et al. 2013; Bank et al. 2014b). While the calculations and simulations presented below represent a general approach to addressing this topic, we have necessarily chosen a specific DFE realization and species for illustration. Specifically, Johri et al. (2020) recently presented an approximate Bayesian computation (ABC) approach, which represents the first joint estimator of the DFE shape together with population history that corrects for the effects of background selection (BGS; Charlesworth et al. 1993). They estimated that a substantial proportion of new mutations in coding regions have mildly deleterious effects on fitness, emphasizing the importance of further understanding the consequences of such mutations in dictating observed polymorphism and divergence. Furthermore, they found it unnecessary to invoke a beneficial mutational class in order to fit the data from the African population of Drosophila melanogaster that they considered.

This analysis provides a basis for exploring the possible implications of sweeps of deleterious mutations, a topic that has been rather neglected. Here, we re-examine this question, considering both single and recurrent substitution models, which we use to examine the possibility that genomic scans for positive selection may in fact also be identifying deleterious fixations, when based on: a) levels and patterns of variation; b) population differentiation; and c) species-level divergence. Our results suggest that, while this phenomenon is unlikely to be a major factor in polymorphism-based scans, it may be a serious confounder in among-population-based analyses.

METHODS

Analytical calculations

For convenience, we assume a Wright-Fisher population of N randomly mating individuals throughout the analytical section. The selection coefficient for homozygotes is denoted by s, with s = sa or s = −sd (sa and sd > 0), representing selection for and against homozygotes, respectively. In particular, the distribution of fitness effects (DFE) for semi-dominant deleterious mutations is assumed to be discrete, with four fixed classes of mutations given by 0 < 2Nsd ≤ 1, 1 < 2Nsd ≤ 10, 10 < 2Nsd ≤ 100 and 100 < 2Nsd ≤ 2N, where mutations are assumed to follow a uniform distribution within each class. These assumptions concerning the distribution of sd were made in order to simplify integration over the DFE (Johri et al. 2020).

Probability of fixation

The fixation probability (Pfix) of a new semi-dominant mutation with an initial frequency of 1/2N in a Wright-Fisher population of size N was calculated using Equation 10 of Kimura (1962):

$$P_{fix} = \frac{1 - e^{-s}}{1 - e^{-2Ns}} \qquad (1)$$

Note that this equation assumes demographic equilibrium and independence between the selected sites, as well as |s| ≪ 1.
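As a numerical illustration of the fixation probability, the following minimal sketch evaluates the standard diffusion result for a semi-dominant mutation starting at frequency 1/(2N), written in terms of the homozygous selection coefficient s. The function name `p_fix` and the choice N = 10^6 are illustrative assumptions, not part of the original analysis.

```python
import math

def p_fix(s, N):
    """Fixation probability of a new semi-dominant mutation at initial
    frequency 1/(2N); s is the homozygous selection coefficient
    (negative for a deleterious mutation). Assumes |s| << 1."""
    if s == 0.0:
        return 1.0 / (2 * N)  # neutral limit
    # expm1 preserves precision when |s| is very small
    return math.expm1(-s) / math.expm1(-2.0 * N * s)

N = 10**6  # assumed population size, for illustration only
relative = {}
for two_Ns in (-10, -5, -1, 0, 1, 5, 10):
    s = two_Ns / (2 * N)
    # fixation probability relative to the neutral value 1/(2N)
    relative[two_Ns] = p_fix(s, N) * 2 * N
    print(f"2Ns = {two_Ns:+d}: P_fix / (1/2N) = {relative[two_Ns]:.5f}")
```

For 2Ns = −5 the relative fixation probability is about 0.034. Note that for very large values of 2N|s| with s < 0 the exponential overflows in floating point, so this sketch is only suitable for the weakly selected range considered here.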

Contributions to divergence

Under the DFE model described above, the number of fixations Nfix(i) expected per generation per site for a given DFE class i of deleterious mutations is given by the following expression:

$$N_{fix}(i) = \frac{2N\mu f_i}{s_{d2} - s_{d1}} \int_{s_{d1}}^{s_{d2}} P_{fix}(s_d)\, ds_d \qquad (2)$$

where μ is the total mutation rate per site per generation, fi represents the proportion of new mutations belonging to the ith DFE class, and sd2 and sd1 represent the upper and lower bounds of the DFE class, respectively. The integral in equation (2) was evaluated analytically by means of an infinite series representation (derivation provided in Appendix 1), which was validated using the “integrate” function in R (R Core Team 2018).

Waiting and fixation times

The waiting time (tw) between fixations under a Poisson process was calculated as follows:

$$t_w = \frac{1}{L \, N_{fix}(i)} \qquad (3)$$

where L represents the number of functional sites under consideration.
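The per-site fixation rate and the waiting time can be evaluated numerically as in the following sketch. It assumes that the fixation rate for a class is 2Nμf_i times the mean fixation probability over a uniform distribution of s_d within the class, and that the waiting time is the reciprocal of L times this rate. Simpson's rule stands in for the analytical series of Appendix 1, and all function names are hypothetical. The parameter values (N = 10^6, μ = 3 × 10^−9, f_1 = 0.494, 60% of a 140 Mb genome functional) are those quoted elsewhere in the text.

```python
import math

def p_fix_del(sd, N):
    """Fixation probability of a semi-dominant deleterious mutation
    with homozygous selection coefficient -sd (sd >= 0)."""
    if sd == 0.0:
        return 1.0 / (2 * N)
    return math.expm1(sd) / math.expm1(2.0 * N * sd)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def n_fix(mu, N, f_i, sd1, sd2):
    """Expected fixations per site per generation for one DFE class,
    with sd uniformly distributed on [sd1, sd2]."""
    mean_p = simpson(lambda sd: p_fix_del(sd, N), sd1, sd2) / (sd2 - sd1)
    return 2 * N * mu * f_i * mean_p

N = 10**6          # effective population size
mu = 3e-9          # mutation rate per site per generation
f1 = 0.494         # proportion of mutations with 1 < 2N*sd <= 10
L = 0.6 * 140e6    # assumed number of functional sites

rate = n_fix(mu, N, f1, 1 / (2 * N), 10 / (2 * N))
t_w = 1.0 / (L * rate)
print(f"fixations per site per generation: {rate:.3e}")
print(f"genome-wide waiting time: {t_w:.1f} generations")
```

With these inputs the waiting time comes out at roughly 83 generations for the weakly deleterious class.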

The expected time to fixation of a semi-dominant mutation was calculated by numerically integrating equation (17) of Kimura & Ohta (1969), using Simpson’s rule (Atkinson 1989).

Reduction in diversity due to a single sweep

The expected reduction in pairwise nucleotide site diversity at the end of a sweep (−Δπ), relative to the diversity in the absence of selection, was calculated using equation (14a) of Charlesworth (2020b) for a non-zero rate of recombination (omitting a factor that describes the effect of background selection): [Equation (4)] where Ts is the mean coalescent time (in units of N) for a pair of alleles sampled at the end of a sweep; Td is the duration of the deterministic phase of the sweep (i.e., excluding the initial and final stochastic phases) in units of N; Tr is the mean time to a recombination event during the sweep, conditioned on the occurrence of a recombination event; Pr is the probability of at least one recombination event transferring a sampled allele onto the wild-type background during the sweep; and Prs is the probability that there is only a single such recombination event. For large values of the ratio of the rate of recombination between the neutral and selected site (r) to the magnitude of the selection coefficient, this expression can become negative, in which case it is reset to zero.

We also used simulations based on equations (27) of Tajima (1990), which provide recursion equations for the expectation of the pairwise diversity at a neutral locus linked to a selected locus, conditional on a given trajectory of allele frequency change at the selected locus, as described by Charlesworth (2020b). These equations require only the validity of the diffusion equation approximation. Binomial sampling of allele frequencies post-selection in each generation was used to generate the trajectories of change at the selected locus, using the standard selection equation for a single locus to calculate the deterministic change in allele frequency each generation. Application of the recursion equations to a trajectory of allele-frequency change simulated in this way gives one realization of Δπ; and the overall expected value of Δπ can be obtained from the mean of the simulated values over a large number of replicates. It was found that 1000 replicates gave very accurate estimates of Δπ, with ratios of the standard errors to the means of < 5% for the parameter sets used here.

Population genetic parameters used for analytical calculations

The parameters for the calculations were chosen to match those estimated from exonic sites in D. melanogaster populations. Mutations occurred at a rate μ per basepair, and were assumed to be a mixture of neutral, nearly neutral and weakly deleterious mutations. For this analysis, μ = 3 × 10^−9 per basepair (bp) per generation (Keightley et al. 2009, 2014). The sex-averaged rate of crossing over per bp (rc) was assumed to be equal to 10^−8 per generation (Fiston-Lavier et al. 2010), the effective population size was 10^6 (Arguello et al. 2019; Johri et al. 2020), and 10 generations per year were assumed. Given estimates of neutral divergence between D. melanogaster and D. simulans, this means that t ∼ 21.3 × 10^6 generations have elapsed since their common ancestor (Li et al. 1999; Halligan & Keightley 2006), corresponding to 2.13 million years.

Because non-crossover associated gene conversion is an important source of recombination between closely linked sites in Drosophila, it was assumed to occur uniformly across the genome, independently of local differences in the rate of crossing over, as indicated by the data on D. melanogaster (Comeron et al. 2012; Miller et al. 2016). The sex-averaged rate of initiation of conversion events was rg = 10^−8 per bp per generation, and there was an exponential distribution of tract length with a mean of dg = 440 bp (Comeron et al. 2012; Miller et al. 2016). The net rate of recombination between sites separated by z bp, r(z), is the sum of the contributions from crossing over and non-crossover gene conversion, given by the formula of Frisse et al. (2001):

$$r(z) = r_c z + 2 r_g d_g \left[1 - e^{-z/d_g}\right]$$

For some of the results presented below, only the net rate of recombination r, or its value scaled by the effective population size, ρ = 2Nr, was used. In the presence of gene conversion with the parameters described above, the corresponding value of z can then be obtained by Newton-Raphson iteration of the equation r(z) − r = 0, assuming that the rate of gene conversion does not vary with the rate of crossing over.
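The conversion from a net recombination rate to a physical distance can be sketched as follows. The sketch assumes the Frisse et al. (2001) form r(z) = r_c z + 2 r_g d_g [1 − exp(−z/d_g)] with the parameter values given above. The function names, the starting guess, and the tolerance are illustrative choices rather than details taken from the original analysis.

```python
import math

R_C = 1e-8    # crossing-over rate per bp per generation
R_G = 1e-8    # gene conversion initiation rate per bp per generation
D_G = 440.0   # mean conversion tract length (bp)

def r_of_z(z):
    """Net recombination rate between sites z bp apart."""
    return R_C * z + 2.0 * R_G * D_G * (1.0 - math.exp(-z / D_G))

def dr_dz(z):
    """Derivative of r(z) with respect to z."""
    return R_C + 2.0 * R_G * math.exp(-z / D_G)

def z_for_r(r, tol=1e-10, max_iter=50):
    """Solve r(z) - r = 0 for z by Newton-Raphson iteration."""
    z = r / (R_C + 2.0 * R_G)  # short-distance limit as starting guess
    for _ in range(max_iter):
        step = (r_of_z(z) - r) / dr_dz(z)
        z -= step
        if abs(step) < tol:
            break
    return z

N = 10**6
for rho in (0.8, 1.6):
    r = rho / (2 * N)  # rho = 2Nr
    print(f"rho = {rho}: z = {z_for_r(r):.1f} bp with gene conversion, "
          f"{r / R_C:.0f} bp without")
```

Because r(z) is increasing and concave, iteration from the short-distance starting value approaches the root monotonically from below, so a handful of iterations is sufficient here.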

Simulations

Simulating individual fixations in a single population

SLiM 3.3.1 (Haller & Messer 2019) was used to simulate a genomic element of length 10 kb. For the single sweep model, in order to quantify the hitchhiking effect of a single fixation, selection only acted on a single site in the middle of the region, with all other sites evolving neutrally. Simulations were performed for five different values of the scaled selection coefficient: 2Ns = 0, −1, +1, −5, and +5. Population genetic parameters resembling those of D. melanogaster were utilized (as defined above) for illustrative purposes. In order to perform simulations efficiently, population size was scaled down by a factor of 100 while the mutation and recombination rates were correspondingly scaled up by the same factor. Simulations were run for a burn-in period of 10^5 generations (10Nsim) after which a mutation with a scaled selection coefficient of 2Ns was introduced at the selected site. Simulations in which the introduced mutation reached fixation were retained for analysis. Fifty diploid individuals were sampled at the completion of the simulations, and population genetic summary statistics were calculated using Pylibseq (Thornton 2003).

Simulating multiple populations and FST-based analyses

Based on a recently inferred demographic history of African and European populations of D. melanogaster using populations sampled from Beijing, the Netherlands, and Zimbabwe (Arguello et al. 2019), an ancestral population of size 1.95 × 10^6 was simulated, which split into two populations (6.62 × 10^4 generations ago) of constant size: 3.91 × 10^6 and 4.73 × 10^5, representing African and European populations, respectively. Note that, although Arguello et al. inferred recent growth in both populations, we have assumed constant sizes in order to avoid confounding effects of such growth on the fixation probabilities of mutations, and we also assumed no migration between the two populations. For the purpose of FST analyses, a functional region of size 10 kb was simulated, in which mutations had selection coefficients given by the DFE inferred in Johri et al. (2020). Specifically, the DFE was given by a discrete distribution of four fixed bins with 24.7% of mutations belonging to the effectively neutral class (f0: 0 ≤ 2Nsd < 1), 49.4% to the weakly deleterious class (f1: 1 ≤ 2Nsd < 10), 3.9% to the moderately deleterious class (f2: 10 ≤ 2Nsd < 100), and 21.9% to the strongly deleterious class of mutations (f3: 100 ≤ 2Nsd ≤ 2N).
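The discrete DFE described above can be illustrated by drawing scaled selection coefficients from it. The following sketch assumes a uniform distribution within each class, as stated in the Methods. The names `CLASSES` and `sample_scaled_effect`, the seed, and the value N = 10^4 are hypothetical, and the sketch is not the SLiM configuration used for the simulations.

```python
import random

# (proportion, lower bound, upper bound) in units of 2N*sd;
# None marks the upper bound of the last class, which is 2N (sd = 1)
CLASSES = [
    (0.247, 0.0, 1.0),      # f0: effectively neutral
    (0.494, 1.0, 10.0),     # f1: weakly deleterious
    (0.039, 10.0, 100.0),   # f2: moderately deleterious
    (0.219, 100.0, None),   # f3: strongly deleterious
]

def sample_scaled_effect(N, rng):
    """Draw one value of 2N*sd: pick a class by its proportion,
    then draw uniformly within that class."""
    weights = [c[0] for c in CLASSES]  # sum to 0.999; choices() normalizes
    _, lo, hi = rng.choices(CLASSES, weights=weights, k=1)[0]
    if hi is None:
        hi = 2.0 * N
    return rng.uniform(lo, hi)

N = 10**4                # assumed (scaled) population size
rng = random.Random(1)   # fixed seed for reproducibility
draws = [sample_scaled_effect(N, rng) for _ in range(20000)]
weak = sum(1 for x in draws if 1.0 <= x < 10.0) / len(draws)
print(f"fraction weakly deleterious: {weak:.3f}")
```

Dividing each drawn value by 2N gives the unscaled selection coefficient sd for the population in question, which is how rescaling the DFE to a different population size changes the strength of selection per mutation.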

In this two-population framework, four separate scenarios were tested: 1) the DFE remained scaled to the ancestral population size (making selection effectively weaker in the smaller derived (European) population); 2) the DFE was rescaled after the population split with respect to subpopulation-specific sizes (i.e., both populations experienced equal proportions of mutations belonging to each DFE class as defined in terms of 2Nsd), such that selection was equally strong in both populations - this is an arbitrary biological model that is simply chosen for comparison, as one would naturally expect selection to be effectively weaker in the derived population as in scenario 1; 3) in addition to this neutral and deleterious DFE, 1% of all mutations were mildly beneficial with selective effects drawn uniformly from the interval 1 ≤ 2Nsa ≤ 10, where sa is the increase in fitness of the mutant homozygote; and 4) in addition to this neutral and deleterious DFE, 1% of all mutations were strongly beneficial with 2Nsa = 1000. In scenarios 3 and 4, the deleterious and beneficial DFEs were scaled to the respective sizes of each population post-split, so that the proportions of beneficial mutations were the same in the two populations. In order to assess the role of population bottlenecks in generating neutral outliers, a fifth scenario was simulated in which there was no selection and the demographic history was that inferred by Li & Stephan (2006) - a model that involves a much larger size reduction than inferred in the Arguello et al. (2019) model utilized above. The parameters of both demographic models are provided in Supp Table 1.

Fifty diploid individuals were sampled from both populations in order to calculate FST. All sites that would be considered polymorphic in the metapopulation were used to calculate FST (i.e., sites fixed either in one population or both populations (for different alleles) were also included in FST calculations). FST was calculated in sliding windows across the genomic region for a) windows containing a constant number of SNPs using the package PopGenome (Pfeifer et al. 2014) in R, and b) for windows representing the same total number of bases using Pylibseq 0.2.3 (Thornton 2003). FST was calculated for both cases by the method of Hudson et al. (1992). FST was also calculated individually for different mutation types (i.e., for neutral, weakly deleterious and beneficial mutations, by simply restricting the calculations to segregating sites of the specific mutation type). Although there will be an upper bound to the FST values obtained in this way, which is determined by the frequency of the most frequent allele in the metapopulation (Jakobsson et al. 2013), the detection of outliers should not be affected by this procedure and should mimic the scenario for natural populations of Drosophila.
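The FST estimator of Hudson et al. (1992) can be written as 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean number between populations. The following minimal sketch computes it from two lists of equal-length haplotype strings. The function names and toy haplotypes are illustrative assumptions, and the PopGenome and Pylibseq implementations may differ in details such as windowing and the handling of missing data.

```python
from itertools import combinations, product

def n_diff(a, b):
    """Number of sites at which two haplotypes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def mean_within(haps):
    """Mean pairwise differences among haplotypes of one population."""
    pairs = list(combinations(haps, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)

def mean_between(haps1, haps2):
    """Mean pairwise differences between haplotypes of two populations."""
    pairs = list(product(haps1, haps2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)

def hudson_fst(haps1, haps2):
    """FST = 1 - Hw/Hb, with Hw the average of the two within-population
    means. Returns NaN if there are no between-population differences."""
    h_w = 0.5 * (mean_within(haps1) + mean_within(haps2))
    h_b = mean_between(haps1, haps2)
    return float("nan") if h_b == 0 else 1.0 - h_w / h_b

# Toy window of four sites, two haplotypes per population
pop1 = ["0000", "0001"]
pop2 = ["1110", "1111"]
fst_example = hudson_fst(pop1, pop2)
print(f"FST = {fst_example:.3f}")
```

Sites that are fixed for different alleles in the two populations contribute to Hb but not to Hw, which is why including them raises FST in a window.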

Simulating multiple species, and divergence-based analyses

McDonald-Kreitman (MK) tests (McDonald & Kreitman 1991) were performed to investigate the degree to which substitutions of mildly deleterious mutations might affect the inference of divergence due to positive selection. A population resembling a D. melanogaster African population was simulated under demographic equilibrium. Ten independent replicates of a 10-kb protein-coding region were considered such that every third position was neutral (representing synonymous sites) while all other sites represented nonsynonymous positions. Nonsynonymous sites experienced purifying selection given by a DFE that comprised only the non-neutral bins (i.e., f1, f2, and f3, in the same proportions described above), while the neutral sites experienced mutations that belonged to class f0. For comparison, simulations were performed in which 10% or 20% of nonsynonymous mutations were also neutral. In all cases, the simulation was run for 20N generations (where N = 10^4). The numbers of segregating nonsynonymous (PN) and synonymous (PS) sites were estimated by sampling 50 diploid individuals from the population from the functional and neutral regions, respectively. The numbers of fixed substitutions occurring at the functional sites (DN) and neutral sites (DS) were calculated post burn-in (i.e., after 10N generations), and then rescaled to the number of generations since the D. melanogaster ancestor.

In order to correct for mildly deleterious mutations segregating in populations, the proportion of adaptive substitutions (α) was also inferred by implementing a variant of the test referred to as the asymptotic MK test. Messer & Petrov (2013) suggested plotting α(x), inferred using the numbers of segregating sites P(x) at derived allele frequency x, against x:

$$\alpha(x) = 1 - \frac{D_S}{D_N} \, \frac{P_N(x)}{P_S(x)}$$

and showed that the asymptote of this curve would tend towards the true value of α. The asymptotic MK test was performed using a web-based tool available at: http://benhaller.com/messerlab/asymptoticMK.html (Haller & Messer 2017). For the purpose of the asymptotic MK test, values of PN/PS were binned with a bin size of 0.05, and the curve-fitting (of α(x) with respect to x) was restricted to derived allele frequencies between 0.1 and 1.0.
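The quantity plotted in the asymptotic MK test can be illustrated with a short calculation of α per frequency bin, assuming the usual form α(x) = 1 − (DS/DN)(PN(x)/PS(x)). All counts below are invented for illustration only, and the sketch omits the curve-fitting step performed by the asymptoticMK tool.

```python
def alpha_mk(d_n, d_s, p_n, p_s):
    """Proportion of adaptive substitutions from MK-table counts."""
    return 1.0 - (d_s / d_n) * (p_n / p_s)

# Hypothetical divergence counts
D_N, D_S = 40, 100

# Hypothetical polymorphism counts per derived-allele-frequency bin:
# (bin midpoint x, P_N(x), P_S(x))
bins = [
    (0.125, 60, 100),
    (0.375, 30, 80),
    (0.625, 15, 50),
    (0.875, 8, 40),
]

alpha_by_bin = [(x, alpha_mk(D_N, D_S, pn, ps)) for x, pn, ps in bins]
for x, a in alpha_by_bin:
    print(f"x = {x:.3f}: alpha(x) = {a:.3f}")

# Classical (unbinned) MK estimate from pooled polymorphism counts
alpha_pooled = alpha_mk(D_N, D_S,
                        sum(b[1] for b in bins),
                        sum(b[2] for b in bins))
print(f"pooled alpha = {alpha_pooled:.3f}")
```

In this toy example α(x) increases with x because the invented PN/PS ratio declines at higher frequencies, which is the pattern expected when mildly deleterious nonsynonymous variants segregate mainly at low frequency.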

RESULTS & DISCUSSION

Theoretical expectations

As has been long appreciated (Fisher 1930; Wright 1931), equation (1) implies that the probability of fixation of a mutation in the strongly deleterious DFE classes is vanishingly small. However, the weakly deleterious class may contribute substantially to divergence (Figure 1a). We derived an analytical approximation for the probability of fixation for mutations with a DFE represented as a combination of four non-overlapping uniform distributions (see Methods and Appendix 1). In a Wright-Fisher population, the ratio of the mean probability of fixation of mutations with fitness effects between 1 < 2Nsd ≤ 5, relative to the probability of fixation of effectively neutral mutations (0 < 2Nsd ≤ 1), is 0.27 (note that the value of N is irrelevant if the diffusion approximation holds). This ratio rapidly declines to 0.01 for mutations with fitness effects 5 < 2Nsd ≤ 10 (Figure 1a).
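The class-averaged ratios quoted above can be checked numerically. In the limit of small s, the fixation probability of a semi-dominant deleterious mutation relative to the neutral value 1/(2N) depends only on x = 2Nsd and equals x/(e^x − 1). The sketch below averages this over a uniform distribution of x within each class using Simpson's rule. The function names are hypothetical, and the numerical integration is a stand-in for the analytical series of Appendix 1.

```python
import math

def rel_pfix(x):
    """Fixation probability relative to neutrality for scaled
    deleterious effect x = 2N*sd (small-s limit)."""
    return 1.0 if x == 0.0 else x / math.expm1(x)

def class_mean(lo, hi, n=1000):
    """Mean of rel_pfix over a uniform distribution on [lo, hi],
    by composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = rel_pfix(lo) + rel_pfix(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * rel_pfix(lo + k * h)
    return (total * h / 3.0) / (hi - lo)

neutral = class_mean(0.0, 1.0)          # effectively neutral class
ratio_1_5 = class_mean(1.0, 5.0) / neutral
ratio_5_10 = class_mean(5.0, 10.0) / neutral
print(f"1 < 2Nsd <= 5 : {ratio_1_5:.3f}")
print(f"5 < 2Nsd <= 10: {ratio_5_10:.4f}")
```

The two ratios come out at approximately 0.27 and 0.01, independently of N, as expected when the diffusion approximation holds.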

Figure 1:

(a) Probability of fixation of weakly deleterious mutations relative to that of neutral mutations. (b) The distribution of fixation times (conditional on fixation) of mutations with varying selective effects, obtained from 100 simulated replicates. Fixation times are measured as the time taken for the mutant allele to spread from frequency 1/2N to frequency 1. Black solid circles are the means of the distributions obtained from simulations, and red solid circles are the mean expectations obtained from numerically integrating the expression of Kimura & Ohta (1969). The dominance coefficient is 0.5 for all mutations except in the cases “dom” and “rec” where h = 1 and 0, respectively.

Taking the recently estimated DFE for the Zambian D. melanogaster population (Johri et al. 2020) as an example, a question of interest concerns the probabilities of fixation of different classes of mutations and their contributions to population- and species-level divergence (Supp Figure 1). In the absence of positive selection, the weakly deleterious class of mutations would be expected to contribute 7.2% of the total divergence in exonic regions, while effectively neutral mutations would be expected to contribute 92.8%. If we were to assume that approximately 50% of all substitutions in Drosophila have been caused by positive selection (Eyre-Walker & Keightley 2009; Campos et al. 2017), weakly deleterious mutations are still likely to have contributed 3.5% of the total divergence in functional regions and possibly much more in regions experiencing reduced selective constraints.

Hitchhiking effects of deleterious sweeps: levels and patterns of variation

We next considered the fixation times of these mutations contributing to divergence, as well as the expected waiting time between fixations. Using population parameters relevant for D. melanogaster (see Methods), the neutral and deleterious DFE of Johri et al. (2020), and assuming that 60% of the D. melanogaster genome (of size 140 Mb) is functional, equation (3) shows that the genome-wide waiting time between successive weakly deleterious fixations is ∼ 83 generations. Hence, the ∼2.13 million years estimated for the time since the D. melanogaster and D. simulans split is expected to equate to many such fixations.

As expected, there is no significant difference (p = 0.16; Student’s t-test) in fixation times between the mildly beneficial (3.2N generations; SD: 1.5N) vs mildly deleterious mutations (2.9N generations; SD: 1.1N) with the same fitness effects (2Nsa = 2Nsd = 5) (Supp Table 2). Importantly however, the variance in fixation times of weakly selected mutations is extremely large, such that the faster tails of the fixation time distributions for both 2Nsd = 5 and 2Ns = 0 occupy ∼N generations, which corresponds to a sweep effect of the same size as the mean for 2Nsa = 30 (Figure 1b). In other words, weakly deleterious and neutral fixations can match the sweep effects of comparatively strongly selected beneficial fixations.

We evaluated the impact of these fixations on observed genomic variation in two ways. The first corresponds to a model of a single sweep event, which is relevant to the literature on detecting signatures of individual fixations, as done in genomic scans for positively selected loci (e.g., Harr et al. 2002; Glinka et al. 2003; Haddrill et al. 2005; Nielsen et al. 2005; see also Jensen 2009; Stephan 2019). Such scans operate under the assumption that the selective sweeps in question have achieved fixation immediately prior to sampling, due to the rapid loss of signal as the time since fixation increases (Kim & Stephan 2002; Przeworski 2002). The second corresponds to a recurrent substitution model, which is relevant to the recurrent sweep literature that attempts to quantify the effect of selective sweeps on genome-wide patterns of variability and their relation to rates of recombination (e.g., Wiehe & Stephan 1993; Kim 2006; Andolfatto 2007; Macpherson et al. 2007; Jensen et al. 2008; Campos & Charlesworth 2019; Charlesworth 2020b; see review by Sella et al. 2009).

The reduction of diversity resulting from the fixation of a mildly deleterious or beneficial semi-dominant mutation with a very small selective effect is not expected to be substantially different from that caused by the fixation of a selectively neutral mutation (Tajima 1990). The widely-used approximation of Barton (2000) for the reduction in diversity relative to neutrality caused by the sweep of a semi-dominant mutation, Δπ = (2Nsa)^(−4r/s), suggests that, when 2Nsa = 5, diversity will only be reduced by more than 20% in a region for which r/s ≤ 0.2, corresponding to approximately 20 bp for a typical D. melanogaster recombination rate of 3 × 10^−8 per bp, including the contribution from gene conversion as given by equation (6), and assuming an effective population size of ≥ 10^6.
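The arithmetic behind this estimate can be sketched as follows, assuming that the approximation has the power form (2Ns)^(−4r/s) and that recombination increases linearly with distance at 3 × 10^−8 per bp over these short scales. The function names are hypothetical.

```python
import math

def barton_reduction(two_Ns, r_over_s):
    """Approximate proportional reduction in diversity at a neutral site
    whose recombination rate with the selected site is r = r_over_s * s."""
    return two_Ns ** (-4.0 * r_over_s)

def distance_for_reduction(target, two_Ns, N, r_per_bp):
    """Distance (bp) at which the reduction falls to `target`,
    assuming r increases linearly with distance."""
    s = two_Ns / (2.0 * N)
    r_over_s = math.log(1.0 / target) / (4.0 * math.log(two_Ns))
    return r_over_s * s / r_per_bp

print(f"reduction at r/s = 0.2: {barton_reduction(5, 0.2):.3f}")
z20 = distance_for_reduction(0.2, 5, 10**6, 3e-8)
print(f"distance for a 20% reduction: {z20:.1f} bp")
```

For 2Ns = 5 the reduction reaches 20% at r/s = 0.25, which with s = 2.5 × 10^−6 and r = 3 × 10^−8 per bp corresponds to about 21 bp, in line with the approximate figure of 20 bp given above.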

However, this formula assumes that fixations are so fast that swept alleles that have failed to recombine onto a wild-type background experience no coalescent events during the duration of the sweep (Hartfield & Bataillon 2020; Charlesworth 2020b), which is unlikely to be true for weakly selected mutations. We therefore used equation (4) for analytical predictions (Figure 2, Table 1, Supp Table 3), which is based on the results of Charlesworth (2020b). But this equation is also likely to be inaccurate with weak selection, because it assumes that the trajectory of allele-frequency change is close to that for the deterministic case, except for the initial and final stochastic phases at the two extremes of allele frequencies. We therefore also used simulations based on equations (27) of Tajima (1990) to predict sweep effects, as described in the Methods.

Figure 2:

Recovery of nucleotide diversity per site (π) relative to that expected under neutrality (π0), around a recent fixation (shown at position 0 on the x-axis). The target site has experienced (a) a neutral fixation (2Ns = 0; black lines), (b) a weakly deleterious fixation (2Nsd= 5; red line), and (c) a weakly beneficial fixation (2Nsa = 5; blue line). Solid lines represent mean values of 100 replicates, shaded regions correspond to 1 SE above and 1 SE below the mean. Solid circles show the theoretical predictions using equation (14) of Charlesworth 2020b; and crosses correspond to simulations based on equation (27) of Tajima (1990).

With the full simulations of a 10-kb region, in which we assumed a population size of 10^4 and mutation rate of 3 × 10^−9 /site/generation, with 2Nsa = 5 the nucleotide diversity 10 bp (ρ ≈ 0.2) around the selected site (i.e., 5 bp in both directions) was 0.0058 (SE: 0.0012), corresponding to a reduction of 48% below the neutral value. For 2Ns = 10, the 10-bp nucleotide diversity was 0.0083 (SE: 0.0017), corresponding to a reduction of 69%. The observed reduction in both cases almost fully recovers to the expected level under neutrality within 500 bp (ρ ≈ 10; Figure 2). A similar pattern is seen in Table 1 and Supp Table 3, which show both the analytical predictions and those based on Tajima’s equations, which agree surprisingly well except at the two highest rates of recombination displayed. For comparison, the results of simulating neutral fixations are also shown in Table 1 and Figure 2. The case with 2Ns = 10 is at the higher limit of what is likely to be produced by sweeps of deleterious mutations, given that the ratio of the fixation probability given by equation (1) to the neutral value of 1/(2N) is then approximately 0.00045, compared to 0.034 with 2Ns = 5.

Mafessoni & Lachmann (2015) showed that fixations of weakly selected, highly dominant favorable mutations, or highly recessive deleterious mutations, could reduce diversity at linked sites by a smaller amount than fixations of neutral mutations, with a maximum effect when 2Ns is approximately 2 (and h = 1 for favorable mutations, and 0 for deleterious ones). We have confirmed this unexpected observation using Tajima algorithm simulations (Supp Table 4), finding that it exists even for 2Ns = 5 (Supp Figure 2; Supp Table 4). However, our use of h = 0.5 and 2Ns ≥ 2.5 means that this phenomenon is absent from the results presented here.

Because gene conversion is an important contributor to recombination between closely linked sites (Miller et al. 2016), the effects of fixations in the presence of gene conversion are restricted to a region that is about one-third of the distance in the absence of gene conversion (Table 1). Thus, a greater than 20% reduction in nucleotide diversity for 2Ns = 5 [2Ns = 10] was observed up to ρ = 0.8 [ρ = 1.6], corresponding to 14 bp [27 bp] with gene conversion, and 40 bp [80 bp] without gene conversion. This quite localized effect is similar for weakly beneficial, weakly deleterious, and neutral mutations.

Given the shorter mean time to fixation (3.1N generations) of weakly selected semi-dominant mutations compared to the neutral expectation (4N generations), such fixations should also result in small distortions of the SFS at closely linked neutral sites. We observed a slight skew towards rare variants (as measured by Tajima’s D, Supp Table 5) restricted to ∼50 base pairs from the selected site immediately after fixation (for 2Nsd = 2Nsa = 5). This highly localized distortion of the SFS is probably too weak to play any important role in generating false positives in genomic scans. Indeed, owing to the inherent stochasticity involved in the underlying processes, such scans generally only have power to detect very strongly selected fixations (often requiring values of 2Nsa > 1000 in order to observe appreciable true positive rates: Crisci et al. 2013). Viewed in another way, given that the false-positive rates associated with genomic scans may often be inflated well above true-positive rates owing to the underlying demographic history of the population (e.g., Teshima et al. 2006; Thornton & Jensen 2007; Crisci et al. 2013; Harris et al. 2018), demography is probably a much stronger confounder than deleterious sweeps in polymorphism-based scans.
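For readers unfamiliar with the statistic used above to summarize the skew in the SFS, the following is a minimal Python sketch of Tajima's D for a single window, using the standard formulas of Tajima (1989). The function name and the example counts are hypothetical, and this is not the code used to produce Supp Table 5.

```python
import math


def tajimas_d(n_samples: int, n_segregating: int, pi: float) -> float:
    """Tajima's D for one window.

    n_samples     : number of sampled sequences (n)
    n_segregating : number of segregating sites in the window (S)
    pi            : mean number of pairwise differences in the window
                    (summed over sites, not divided by window length)
    Returns NaN when D is undefined (n < 3 or S == 0).
    """
    n = n_samples
    if n < 3 or n_segregating == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * n_segregating + e2 * n_segregating * (n_segregating - 1)
    # Numerator: difference between pi and Watterson's estimator S / a1
    return (pi - n_segregating / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical window: 10 sequences, 5 segregating sites, all singletons.
    # Each singleton contributes 2/n = 0.2 to pi, so pi = 1.0.
    print(tajimas_d(10, 5, 1.0))  # negative: excess of rare variants
```

A negative value indicates an excess of rare variants relative to the neutral equilibrium expectation, which is the direction of the skew reported here.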

Finally, using a recurrent fixation model, we considered the steady-state impact of weakly deleterious sweeps (Figure 3). Here we used a model of a gene with five exons of 100 codons each, separated from each other by 100-bp introns, with 70% of exonic mutations subject to selection, as described by Campos & Charlesworth (2019) and Charlesworth (2020b). Five equally large classes of sites subject to deleterious mutations were modeled, with the lowest scaled selection coefficient being 2Nsd = 2.5 and the largest 2Nsd = 10; as described above, a uniform distribution of 2Nsd values was assumed within each class. The theoretical predictions used the method for predicting the effects of recurrent sweeps of Charlesworth (2020b), based on equation (4) above. For consistency with the simulations of population divergence described below, the population size was assumed to be 1.95 × 10⁶, giving a neutral diversity of 0.0234 with the assumed mutation rate of 3 × 10⁻⁹. Substitution rates of deleterious nonsynonymous mutations were calculated from equations (1) and (2). The expected number of nonsynonymous substitutions per gene over 2N generations was 0.506, and the ratio of nonsynonymous to synonymous substitutions was 0.0412, which is somewhat less than half the mean value for comparisons of D. melanogaster and its close relatives (e.g., Campos et al. 2017). The major source of these substitutions is the class with 2Nsd between 2.5 and 4.375, which accounts for 76% of all substitutions, reflecting the fact that mutations in this class have the highest fixation probabilities.
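The arithmetic linking the quantities quoted above can be illustrated with a short Python sketch. It assumes that neutral diversity is 4Nμ, that a neutral site accumulates 2Nμ substitutions over 2N generations, and that the selected exonic sites substitute at this neutral rate multiplied by the reported nonsynonymous to synonymous ratio. These are our reading of the text rather than a statement of how the values were actually computed, and the variable names are illustrative.

```python
# Values stated in the text
N = 1.95e6                 # population size
mu = 3e-9                  # mutation rate per site per generation
n_exons = 5
codons_per_exon = 100
selected_fraction = 0.7    # fraction of exonic sites subject to selection
omega = 0.0412             # reported nonsynonymous/synonymous substitution ratio

# Neutral diversity under the infinite-sites expectation, 4 * N * mu
neutral_diversity = 4 * N * mu

# Number of exonic sites subject to selection (3 bp per codon)
selected_sites = n_exons * codons_per_exon * 3 * selected_fraction

# Assumption: neutral substitutions per site over 2N generations = 2 * N * mu,
# scaled by omega for the selected sites.
substitutions_per_gene = 2 * N * mu * selected_sites * omega

if __name__ == "__main__":
    print(f"neutral diversity         = {neutral_diversity:.4f}")
    print(f"selected sites per gene   = {selected_sites:.0f}")
    print(f"substitutions per 2N gens = {substitutions_per_gene:.3f}")
```

Under these assumptions the sketch recovers the stated neutral diversity, and a number of substitutions per gene close to the stated value.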

Figure 3:

Predicted mean reductions of nucleotide diversity at linked neutral sites compared to neutrality (−Δπ), due to the recurrent fixation of weakly deleterious semi-dominant mutations with 2.5 ≤ 2Nsd ≤ 10. Nucleotide diversity at neutral sites was averaged across a gene comprising five 300-bp exons and 100-bp introns, in which all intronic sites and 30% of exonic sites were neutral.

The results for single sweeps shown in Table 1 suggested that the theoretical predictions will tend to overestimate sweep effects, so that the results in Figure 3 must be treated with some caution. With the standard D. melanogaster rate of crossing over per bp of 1 × 10⁻⁸, the mean reduction in nucleotide diversity at synonymous sites caused by deleterious sweeps is approximately 5% with gene conversion and 8% in its absence (Figure 3). For the lowest rates of crossing over and no gene conversion, average reductions can reach ∼19%, suggesting that the fixation of mildly deleterious mutations could play a significant role in organisms or genomic regions with highly reduced rates of recombination. In this low-recombination environment, however, selective interference would also become a factor, which would be expected to accelerate the rate of deleterious fixations while also making deleterious variants behave more like neutral mutations. Regardless of these details, it seems unlikely that substitutions of deleterious mutations will have more than a minor effect on average diversity in the Drosophila genome overall, particularly compared with the effects of population history, background selection, and sweeps of positively selected mutations. Furthermore, the findings of Mafessoni & Lachmann (2015) suggest that strongly recessive deleterious mutations will have even smaller effects than those studied here (see Supp Figure 2).

The contribution of deleterious mutations to population- and species-level divergence-based scans

Given the potentially substantial contribution of the weakly deleterious class to observed fixations, it is also of interest to consider their impact on divergence-based analyses (e.g., methodology related to dN/dS) and population differentiation-based scans (e.g., methodology related to FST). We examined properties related to inferring the proportion of substitutions fixed by positive selection (α) by performing MK tests in the presence of mildly deleterious mutations. Specifically, we simulated two scenarios: 1) 50% of mutations at nonsynonymous sites were weakly deleterious, and 50% were neutral, and 2) nonsynonymous sites experienced deleterious mutations that followed the DFE inferred by Johri et al. (2020).

As expected, the presence of mildly deleterious mutations substantially increases values of dN/dS (Supp Table 6) relative to stronger purifying selection (i.e., larger proportions of moderately and strongly deleterious mutations). However, it also leads to strongly negative values of α when performing MK tests. This is because mildly deleterious mutations often segregate in the population at low frequency, significantly inflating the total number of segregating nonsynonymous polymorphisms (PN) (Supp Table 6). As such, the presence of mildly deleterious mutations can result in negative values of α in the absence of positive selection. This is consistent with previous studies that have proposed a derived allele frequency cutoff (Fay et al. 2001, 2002; Andolfatto 2005; but see Charlesworth & Eyre-Walker 2008) to correct for segregating mildly deleterious alleles, as well as proposed modifications of the traditional MK test (e.g., the asymptotic MK test; Messer & Petrov 2013). Nevertheless, under the asymptotic MK test, α remains underestimated (Supp Table 6) when the proportion of mildly deleterious mutations is sufficiently high.
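The way in which an inflated PN drives α below zero can be seen from the commonly used MK-based estimator, α = 1 − (DS × PN)/(DN × PS). The Python sketch below applies this estimator to hypothetical counts; the function name and the numbers are illustrative only and are not taken from Supp Table 6, and the exact implementation used in this study is not shown here.

```python
def mk_alpha(d_n: int, d_s: int, p_n: int, p_s: int) -> float:
    """Estimate the proportion of substitutions fixed by positive selection.

    d_n, d_s : numbers of fixed nonsynonymous and synonymous differences
    p_n, p_s : numbers of nonsynonymous and synonymous polymorphisms
    Uses alpha = 1 - (d_s * p_n) / (d_n * p_s).
    """
    if d_n == 0 or p_s == 0:
        raise ValueError("alpha is undefined when d_n or p_s is zero")
    return 1.0 - (d_s * p_n) / (d_n * p_s)


if __name__ == "__main__":
    # Hypothetical counts in which P_N/P_S equals D_N/D_S: alpha = 0
    print(mk_alpha(d_n=20, d_s=100, p_n=20, p_s=100))
    # Same divergence, but segregating mildly deleterious variants double P_N
    print(mk_alpha(d_n=20, d_s=100, p_n=40, p_s=100))
```

In the second call, doubling PN while leaving divergence unchanged yields a negative estimate even though no substitution was driven by positive selection.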

In order to study inter-population effects, we simulated a model similar to that recently inferred by Arguello et al. (2019), which represents the European and African split of D. melanogaster (see Methods), and overlaid it with the estimated DFE of Johri et al. (2020). Under this model (i.e., in the absence of positive selection), roughly 50% of SNPs identified as FST outliers (defined as representing the upper 1% or 2.5% tails) are mildly deleterious (Figure 4a; Supp Figure 3). Two models of purifying selection are given for comparison (Table 2): 1) a more biologically realistic model in which selection coefficients are scaled to the ancestral population size in defining the DFE classes, such that selection is effectively weaker in the smaller derived population; and 2) an arbitrary model in which the DFE is rescaled such that selective effects are equally strong in the larger ancestral and the smaller derived populations (which, under the chosen demographic model, differ from one another by roughly an order of magnitude). It should be noted that in the model based on Arguello et al., the time post-split between the African and European population is extremely brief, such that there are few or no substitutions post-split (Supp Table 7). Thus, FST values are almost entirely dictated by allele frequency differences with respect to co-segregating mutations (Supp Table 7).
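To make the notion of an empirical FST outlier concrete, the following Python sketch computes a simple per-SNP FST for two populations from allele frequencies and flags the upper tail of the resulting distribution. The estimator used here, (HT − HS)/HT with equal weighting of the two populations, is an assumption chosen for brevity and is not necessarily the estimator used in this study; the function names and example frequencies are hypothetical.

```python
import math
from typing import List, Sequence


def fst_two_populations(p1: float, p2: float) -> float:
    """Per-SNP F_ST = (H_T - H_S) / H_T for two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    Returns NaN for sites fixed for the same allele in both populations.
    """
    p_bar = 0.5 * (p1 + p2)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return float("nan")
    h_within = 0.5 * (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2))
    return (h_total - h_within) / h_total


def upper_tail_outliers(values: Sequence[float], tail: float = 0.01) -> List[int]:
    """Indices of the values falling in the upper `tail` fraction.

    NaN values are ignored; at least one index is returned if any value is valid.
    """
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        return []
    ranked = sorted(valid, key=lambda i: values[i], reverse=True)
    n_outliers = max(1, int(round(tail * len(valid))))
    return ranked[:n_outliers]


if __name__ == "__main__":
    # Hypothetical allele frequencies (population 1, population 2)
    freqs = [(0.10, 0.12), (0.05, 0.60), (0.50, 0.45), (0.90, 0.20), (0.0, 0.0)]
    fst = [fst_two_populations(p1, p2) for p1, p2 in freqs]
    print(fst)
    print(upper_tail_outliers(fst, tail=0.25))
```

Because outliers are defined by rank, some SNPs will always be flagged, whatever the underlying model.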

Figure 4:

Allele frequencies of SNPs in simulated population 1 (European) vs population 2 (African), using parameters of the Arguello et al. (2019) model, where the selective effects of all mutations were rescaled with respect to their population sizes after the split (i.e., keeping the strength of selection constant in both populations). Genomic elements experienced (a) purifying selection following the DFE inferred by Johri et al. (2020); (b) the same DFE, but with the addition of 1% beneficial mutations with selective effects between 1 < 2Nsa ≤ 10; or (c) the same DFE, but with the addition of 1% beneficial mutations with selective effects of 2Nsa = 1000. Left panel: Allele frequency plots for 10 (out of 100) replicates simulated. Colored open circles represent FST outliers. Green depicts effectively neutral mutations (belonging to class 0), blue depicts beneficial mutations, and warm colors depict deleterious mutations (belonging to classes 1, 2, and 3), with red representing weakly deleterious mutations. Right panel: The distribution of fitness effects of outlier mutations for the corresponding scenarios, showing the mean and standard deviation for all 100 replicates. Sites that were fixed in both populations for the same allele were not included in this analysis.

Mean FST values at both neutral and weakly deleterious sites are larger with the rescaled purifying selection model, whereas the unscaled model yields similar values to the purely neutral model (Table 2 and Supp Tables 8, 10). In addition, the frequency of private SNPs at all sites is higher for the rescaled model than the unscaled model, with the neutral model having the lowest value of the three (Supp Table 7). In order to determine whether these weakly deleterious mutations were associated with a decrease in diversity at linked sites, potentially leading to an increase in FST values (Cruickshank & Hahn 2014), we evaluated the relationship between FST and nucleotide diversity using neutral variants alone (Supp Figure 4; see also the comparison with directly selected sites in Supp Figures 5-6). The lack of a strong negative correlation suggests that sweep-like effects of deleterious mutations on diversity at linked sites are not primarily responsible for the observed effects on FST and the frequency of private alleles. In contrast, there is a negative relationship between nucleotide diversity and FST after the strong population bottleneck represented by the neutral model of Li & Stephan (2006).

The lack of sweep-like effects of deleterious mutations is not surprising in view of the time-scale to fixation required for weakly selected mutations, which is of the order of the coalescent time, 2N generations; the split times for the African and European populations are both only a small fraction of their respective coalescent times. Thus, neither new neutral nor weakly deleterious mutations are likely to have reached fixation in either population, consistent with the results shown in the last column of Supp Table 7. In addition, there is no time for large changes in π as a result of the altered N values.

Instead, it is more likely that BGS effects on the frequencies of segregating mutations explain these patterns (B = 0.18 and 0.15 in the African and European populations, respectively, obtained from the ratios of neutral π values in the presence of purifying selection to those without: Supp Table 9). In the absence of rescaling of the DFEs for deleterious mutations (the more biologically plausible case), the enhanced N for the African population means that fewer deleterious mutations behave as effectively neutral, so that there is less effect of drift on their frequencies compared with the ancestral population; the reverse is true for the European population. As far as linked neutral variants are concerned, there is likely to be a greater BGS effect in the African than in the ancestral population, and vice-versa for the European population. FST relative to the purely neutral case is thus subject to two opposing factors, which presumably explains the lack of any strong effect of this model of purifying selection on FST in Table 2 and Supp Table 8. However, the lower effective population size induced by BGS means that rare variants are more likely to be lost after the population split than under neutrality, explaining the increased proportion of private variants with purifying selection compared with neutrality (Supp Table 7). Because of the effects of BGS, the relative Ne values of the African and European populations are less disparate than the relative N values, so that the African/European ratio of the proportions of private SNPs is smaller than under neutrality (Supp Table 7).

The effect of rescaling is to keep the proportion of deleterious mutations that are effectively neutral the same in the two descendant populations, with the absolute strength of selection being higher in the African population, so that overall there is a stronger BGS effect in this population. This results in a higher overall fraction of private SNPs, and a larger enrichment of private SNPs in the African population, compared with the neutral case (Supp Table 7). There is a corresponding increase in mean FST for both neutral and weakly deleterious variants compared with the neutral case (Table 2 and Supp Table 8).

It is also of interest to consider the contribution of deleterious mutations to outliers in the presence of positive selection. When a class of weakly positively selected sites was added to the DFE (i.e., beneficial mutations have fitness effects 1 < 2Nsa ≤ 10, and comprise 1% of new mutations), these adaptive mutations contributed little to the observed outliers (< 5%), with mildly deleterious and neutral mutations strongly represented amongst outliers (Figure 4b). Conversely, when positive selection is very strong (i.e., mutations have selective effects with 2Nsa = 1000, and comprise 1% of new mutations), the majority (∼75%) of outliers are drawn from this beneficial class (Figure 4c). Yet, even in the presence of this exceptionally strong and frequent positive selection, ∼10% of outliers remain in the mildly deleterious class. It is also noteworthy that under this model of strong, recurrent positive selection, little variation within populations is observed, owing to the severity of the selective sweeps (Supp Table 8), and inter-population allele frequencies are only weakly correlated with one another (Supp Table 7).

Because any model, including equilibrium neutrality, will have outliers based on empirical p-values, we further quantified the properties of outliers (2.5% and 1%) relative to the genome-wide mean. Under models of purifying selection, the values of FST outliers were ∼2.3-4.5 fold larger than mean FST values (Table 2; Supp Table 10), but differed only slightly from the neutral case. Under the strongly bottlenecked neutral model of Li & Stephan (2006), in which the European population experiences a substantial decrease in population size during the bottleneck, the genome-wide mean FST values obtained are much higher, as would be expected, though outlier FST values were smaller in relative magnitude (∼1.5-2.7 fold higher than the means). Under models including weak positive selection, only slight increases in genome-wide FST values were observed; whereas the strong positive selection model greatly increased genome-wide values. However, outlier values under both positive selection models were ∼2.4-4.6 fold higher than the respective means. This suggests that recurrent positive selection does not generate substantially larger effect sizes for outlier FST values than neutrality or purifying selection, although the proportion of outliers is larger with strong positive selection.

CONCLUSIONS

In this paper, we have examined the expected impact of deleterious fixations on polymorphism- and divergence-based scans for selection. Amongst the class of weakly deleterious mutations that have some chance of reaching fixation (1 < 2Nsd ≤ 10), the resulting sweep effects are highly localized: on the order of a few dozen base pairs for the parameters considered here. This suggests that deleterious sweeps of this kind are unlikely to be detected in genomic scans based on localized deficits of variation or strongly skewed site frequency spectra. Given the theoretically expected symmetry between beneficial and deleterious sweeps (Maruyama & Kimura 1974; Mafessoni & Lachmann 2015; Charlesworth 2020a), this is expected, as common polymorphism-based methods generally have little power unless selection is exceptionally strong (e.g., Kim & Stephan 2002; Jensen et al. 2005; Crisci et al. 2013). However, our results suggest that studies which estimate the frequency and strength of classic selective sweeps using patterns of diversity around substitutions (Hernandez et al. 2011; Sattath et al. 2011; Elyashiv et al. 2016), which assume that reductions are entirely caused by the fixation of positively selected mutations, ought to take into account reductions caused by neutral and weakly deleterious substitutions.

Furthermore, for among-population comparisons based on FST, mildly deleterious mutations contribute significantly to observed outliers, even in the presence of positive selection, particularly in the case of a recent population size reduction. This appears to be true regardless of whether selective effects are equally strong in both populations (Figure 4), or if selection is relaxed in the smaller derived population (Supp Figure 3). Because such effects are localized to functional sites (i.e., those genomic regions experiencing purifying selection), this effect may be particularly pernicious in the sense that this class of outlier will not fall in non-functional regions, where they are often attributed to demographic effects. Rather, owing to the common tendency of constructing biological narratives (true or otherwise) around functional outliers (Pavlidis et al. 2012), these results suggest that adaptive story-telling may arise from weakly deleterious outliers. However, we have only considered one specific scenario, and the generality of this conclusion remains to be established.

SUPPLEMENTARY MATERIAL

Supp Table 1:

Parameters of the demographic models of D. melanogaster simulated for the FST analyses.

Supp Table 2:

The means and standard deviations of fixation times (conditional on fixation) of mutations with varying selective effects, obtained from 100 simulated replicates. Fixation times are measured as the time taken for the mutant allele to spread from frequency 1/2N to frequency 1. Unless otherwise indicated, h=0.5.

Supp Table 3:

Reduction in diversity (relative to a value of 1 for neutrality) as (A) obtained by Tajima’s (1990) simulations and (B) as approximated by numerical results using equation (14) in Charlesworth 2020b.

Supp Table 4:

Reduction of diversity (Δπ) at ρ (= 4Nr) distance from the selected site for varying dominance (h) and selection coefficients (2Ns) calculated using the algorithm presented in Tajima (1990). For the purpose of these calculations N was assumed to be 100 and θ = 0.001.

Supp Table 5:

Distortion of the SFS at neutral sites linked to the selected site at a distance measured by ρ, and calculated immediately after the fixation of a weakly selected mutation. 100 replicates were used to obtain these values. Tajima’s D was calculated in non-overlapping sliding windows of 50 base pairs, with the distance from the selected site corresponding to the mid-point of the window.

Supp Table 6:

Number of polymorphic (P) and fixed (D) sites at nonsynonymous (N) and synonymous (S) sites, and the estimated proportion of substitutions fixed by positive selection (α), for an increasing proportion of mildly deleterious mutations at nonsynonymous sites. The ratio of the rate of divergence at nonsynonymous relative to that at synonymous sites is denoted by dN/dS.

Supp Table 7:

Composition of shared and private SNPs between populations (calculated in sliding windows of 500 bp).

Supp Table 8:

Mean FST and nucleotide diversity in the African and European population, for neutral, weakly deleterious, and beneficial mutations separately, calculated for windows of 500 bp.

Supp Table 9:

Average nucleotide diversity (π) per site of African and European populations simulated for FST analyses.

Supp Table 10:

Mean genome-wide and outlier FST values for European and African D. melanogaster populations (expanded from Table 2) - for window sizes of both 10 and 20 SNPs, and the 1% and 2.5% tails of the distribution.

Supp Figure 1:

Shown above is the specific DFE of deleterious mutations assumed in this study (from Johri et al. 2020), color-coded to display the probability of fixation of deleterious mutations relative to the probability of fixation of neutral mutations. Mutations with selective effects shown in blue regions have an appreciable probability of fixation, while those shown in red are highly unlikely to fix (scale is displayed on right).

Supp Figure 2:

Reduction in neutral diversity with distance from the selected site, post-fixation of a mutation, with 2Nsa = 5 and (a) h = 0 and (b) h = 1. Solid lines represent mean values of 100 replicates, shaded regions correspond to 1 SE above and 1 SE below the mean. Crosses correspond to simulations based on equation (27) of Tajima (1990).

Supp Figure 3:

Allele frequencies of SNPs in European (population 1) vs African (population 2) populations - under the parameters inferred by Arguello et al. (2019), and where the selective effects of all mutations are not re-scaled after the split (thus selection is stronger in the African population). Simulated genomic elements experience purifying selection given by the DFE inferred in Johri et al. (2020). Colored open circles represent outliers. Green depicts effectively neutral mutations while warm colors depict deleterious mutations, with red representing weakly deleterious mutations. Left panel: Allele frequency plots are shown for 10 (out of 100) replicates. Right panel: The distribution of fitness effects of outlier mutations displaying the mean and standard deviation for all 100 replicates.

Supp Figure 4:

The relationship between FST and nucleotide diversity calculated for neutral mutations alone, for the African and European populations simulated under six scenarios: (a) neutrality, (b) purifying selection (rescaled DFE), (c) purifying selection (not rescaled DFE), (d) purifying selection and weak positive selection, (e) purifying selection and strong positive selection, and (f) the neutral bottleneck of Li & Stephan 2006. On each, the upper left corner gives mean FST values with the SD in parentheses. The upper right corners give the corresponding mean value of π with the SD in parentheses. The red line represents the best fitting linear regression. Note the difference in scale between scenarios.

Supp Figure 5:

The relationship between FST and nucleotide diversity calculated for weakly deleterious mutations alone, in the African and European populations simulated under four scenarios: (a) purifying selection (rescaled DFE), (b) purifying selection (not rescaled DFE), (c) purifying selection and weak positive selection, and (d) purifying selection and strong positive selection. On each, the upper left corner gives mean FST values with the SD in parentheses. The upper right corner gives the corresponding mean value of π with the SD in parentheses. The red line represents the best fit linear regression. Note the difference in scale between scenarios.

Supp Figure 6:

The relationship between FST and nucleotide diversity calculated for only beneficial mutations in the African and European populations simulated under two scenarios: (a) purifying selection and weak positive selection, and (b) purifying selection and strong positive selection. On each, the upper left corner gives mean FST values with the SD in parentheses. The upper right corner gives the corresponding mean value of π with the SD in parentheses. The red line represents the best fit linear regression. Note the difference in scale between scenarios.

ACKNOWLEDGMENTS

This work was funded by National Institutes of Health grant R01GM135899 to JDJ.

APPENDIX 1

Calculating the probability of fixation with uniformly distributed fitness effects

For a selection coefficient s ≪ 1, the fixation probability of a deleterious mutation when Ne ≠ N is approximated by:

$$P_{fix}(s) \approx \frac{N_e}{N}\,\frac{s}{\exp(2N_e s) - 1} \tag{A1}$$

For large Ne, this should be a good approximation, since when 2Nes ≥ 10, the fixation probability is negligible, so that the contribution from the bin with 2Nes ≥ 10 can be ignored. For 2Nes < 10, s is sufficiently small that this equation is an accurate approximation.

The integral of this expression over a given interval of 2Nes values can be found as follows; for convenience, x is substituted for s and a for 2Ne. For a > 0, we need to evaluate the following indefinite integral:

$$I = \int \frac{x}{\exp(ax) - 1}\,dx \tag{A2}$$

Integration by parts gives:

$$I = \frac{x}{a}\,\ln[1 - \exp(-ax)] - \frac{1}{a}\int \ln[1 - \exp(-ax)]\,dx \tag{A3}$$

For a > 0, the logarithm can be expanded as a power series in exp(−ax), which can be integrated term by term:

$$\int \ln[1 - \exp(-ax)]\,dx = -\sum_{k=1}^{\infty}\frac{1}{k}\int \exp(-kax)\,dx = \frac{1}{a}\sum_{k=1}^{\infty}\frac{\exp(-kax)}{k^{2}}$$

The final expression for the integral of equation (1) is thus:

$$\int P_{fix}\,dx = \frac{N_e}{N}\left\{\frac{x}{a}\,\ln[1 - \exp(-ax)] - \frac{1}{a^{2}}\sum_{k=1}^{\infty}\frac{\exp(-kax)}{k^{2}}\right\} \tag{A4}$$

The contribution to the mean fixation probability from the interval si to si+1 is obtained by evaluating this expression between these limits and dividing by (si+1 − si).

For s0 = 0, equation (A4) is invalid. However, the integral between 0 and a small positive value of s, sε, can be found as follows. For 2Nes ≪ 1, the initial integrand can be approximated by a−1(1−ax/2), so that equation 2 can be replaced with:

$$I \approx \frac{1}{a}\int \left(1 - \frac{ax}{2}\right)dx \tag{A5a}$$

and the indefinite integral of the fixation probability becomes:

$$\int P_{fix}\,dx \approx \frac{N_e}{N}\,\frac{x}{a}\left(1 - \frac{ax}{4}\right) \tag{A5b}$$

The contribution to the integral of Pfix between s0 = 0 and s1 from the interval (0, sε) is thus:

$$\frac{s_{\varepsilon}}{2N}\left(1 - \frac{N_e s_{\varepsilon}}{2}\right)$$

The corresponding mean fixation probability over this interval is:

$$\frac{1}{2N}\left(1 - \frac{N_e s_{\varepsilon}}{2}\right)$$

This is slightly smaller than the neutral value, 1/(2N), as would be expected when 2Nes ≪ 1.

Thus, for the interval (s0, s1) with s0 = 0, equation (A5b) should be used for the interval (0≤ s ≤ sε), and equation (A4) with integration limits sε and s1 for the remainder of the interval.
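The closed-form integral described in this appendix can be checked numerically. The Python sketch below assumes that the integrand has the form x/[exp(ax) − 1], with a = 2Ne and x = s, so that its antiderivative is (x/a) ln[1 − exp(−ax)] − a⁻² Σ exp(−kax)/k², and compares the resulting mean over an interval with a Simpson's rule estimate. The function names, the choice of Ne = N = 1000, and the interval 2.5 ≤ 2Nes ≤ 4.375 are illustrative assumptions rather than the code used in this study.

```python
import math
from typing import Callable


def integrand(x: float, a: float) -> float:
    """x / (exp(a*x) - 1), the s-dependent part of the fixation probability."""
    return x / math.expm1(a * x)


def antiderivative(x: float, a: float, tol: float = 1e-16) -> float:
    """(x/a) ln(1 - e^{-ax}) - (1/a^2) sum_k e^{-kax} / k^2, valid for x > 0."""
    y = math.exp(-a * x)
    series = 0.0
    for k in range(1, 1_000_000):
        term = y**k / k**2
        series += term
        if term < tol:
            break
    return (x / a) * math.log1p(-y) - series / a**2


def simpson(f: Callable[[float], float], lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule with an even number of sub-intervals n."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def mean_relative_pfix(s_lo: float, s_hi: float, n_e: float, n: float) -> float:
    """Mean fixation probability over a uniform interval of s,
    expressed relative to the neutral value 1/(2N)."""
    a = 2.0 * n_e
    integral = antiderivative(s_hi, a) - antiderivative(s_lo, a)
    mean_pfix = (n_e / n) * integral / (s_hi - s_lo)
    return mean_pfix * 2.0 * n


if __name__ == "__main__":
    n_e = n = 1000.0                      # hypothetical, with N_e = N
    a = 2.0 * n_e
    s_lo, s_hi = 2.5 / a, 4.375 / a       # i.e., 2.5 <= 2*N_e*s <= 4.375
    closed = antiderivative(s_hi, a) - antiderivative(s_lo, a)
    numeric = simpson(lambda x: integrand(x, a), s_lo, s_hi)
    print(closed, numeric)
    print(mean_relative_pfix(s_lo, s_hi, n_e, n))
```

The series converges rapidly when ax is not close to zero, which is why a separate small-s approximation is needed for an interval starting at s = 0.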

CITATIONS

  1. ↵
    Andolfatto P., 2005 Adaptive evolution of non-coding DNA in Drosophila. Nature 437: 1149–1152.
    OpenUrlCrossRefPubMedWeb of Science
  2. ↵
    Andolfatto P., 2007 Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Res. 17: 1755–1762.
    OpenUrlAbstract/FREE Full Text
  3. ↵
    Arguello J.R., S. Laurent, and A.G. Clark, 2019 Demographic history of the human commensal Drosophila melanogaster. Genome Biol. Evol. 11: 844–854.
    OpenUrlCrossRef
  4. ↵
    Atkinson, K.E., 1989 Introduction to numerical analysis. John Wiley, New York, NY.
  5. ↵
    Bank, C., M. Foll, A. Ferrer-Admetlla, G. Ewing, and J.D. Jensen, 2014a Thinking too positive? Revisiting current methods in population genetic selection inference. Trends Genet. 30: 540–6.
    OpenUrlCrossRefPubMed
  6. ↵
    Bank C., R.T. Hietpas, A. Wong, D.N. Bolon, and J.D. Jensen, 2014b A Bayesian MCMC approach to assess the complete distribution of fitness effects of new mutations: uncovering the potential for adaptive walks in challenging environments. Genetics 196: 841–852.
    OpenUrlAbstract/FREE Full Text
  7. ↵
    Barton N.H., 2000 Genetic hitchhiking. Phil. Trans. R. Soc. Lond. B 355: 1553–62.
    OpenUrlCrossRefPubMedWeb of Science
  8. ↵
    Campos J.L., L. Zhao, and B. Charlesworth, 2017 Estimating the parameters of background selection and selective sweeps in Drosophila in the presence of gene conversion. Proc. Natl. Acad. Sci. U.S.A. 114: E4762–E4771.
    OpenUrlAbstract/FREE Full Text
  9. ↵
    Campos J.L., and B. Charlesworth, 2019 The effects on neutral variability of recurrent selective sweeps and background selection. Genetics 212: 287–303.
    OpenUrlAbstract/FREE Full Text
  10. ↵
    Charlesworth B., M.T. Morgan, and D. Charlesworth, 1993 The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303.
    OpenUrlAbstract/FREE Full Text
  11. ↵
    Charlesworth B., 2020a How long does it take to fix a favorable mutation, and why should we care? Am. Nat. 195: 753–771.
    OpenUrl
  12. ↵
    Charlesworth B., 2020b How good are predictions of the effects of selective sweeps on levels of neutral diversity? bioRxiv 2020.05.27.119883.
  13. ↵
    Charlesworth J., and A. Eyre-Walker, 2007 The other side of the nearly neutral theory, evidence of slightly advantageous back-mutations. Proc. Natl. Acad. Sci. USA 104: 16992–997.
    OpenUrlAbstract/FREE Full Text
  14. ↵
    Charlesworth J., and A. Eyre-Walker, 2008 The McDonald–Kreitman test and slightly deleterious mutations. Mol. Biol. Evol. 25: 1007–1015.
    OpenUrlCrossRefPubMedWeb of Science
  15. ↵
    Comeron J.M., R. Ratnappan, and S. Bailin, 2012 The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 8:e1002905.
    OpenUrlCrossRefPubMed
  16. ↵
    Crisci J., Y.-P. Poh, S. Mahajan, and J.D. Jensen, 2013 The impact of equilibrium assumptions on tests of selection. Front. Genet. 4: 235.
    OpenUrlCrossRefPubMed
  17. ↵
    Crow J.F., 1993 Mutation, mean fitness, and genetic load. Oxf. Surv. Evol. Biol. 9: 3–42.
    OpenUrl
  18. ↵
    Cruickshank T.E., and M.W. Hahn, 2014 Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol. Ecol. 23: 3133–3157.
    OpenUrlCrossRefPubMedWeb of Science
  19. ↵
    Elyashiv E., S. Sattath, T.T. Hu, A. Strutsovsky, G. McVicker, et al., 2016 A genomic map of the effects of linked selection in Drosophila. PLoS Genet. 12: e1006130.
    OpenUrlCrossRef
  20. ↵
    Eyre-Walker A., and P.D. Keightley, 2007 The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8: 610–8.
    OpenUrlCrossRefPubMedWeb of Science
  21. ↵
    Eyre-Walker A., and P.D. Keightley, 2009 Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Mol. Biol. Evol. 26: 2097–2108.
  22. ↵
    Fay J., G.J. Wyckoff, and C.-I Wu, 2001 Positive and negative selection on the human genome. Genetics 158: 1227–1234.
  23. Fay J., G.J. Wyckoff, and C.-I Wu, 2002 Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415: 1024–1026.
  24. ↵
    Fisher R.A., 1922 On the dominance ratio. Proc. R. Soc. Edinb. 42: 321–341.
  25. ↵
    Fisher R.A., 1930 The genetical theory of natural selection. Clarendon Press, Oxford.
  26. ↵
    Fiston-Lavier A.-S., N.D. Singh, M. Lipatov, and D.A. Petrov, 2010 Drosophila melanogaster recombination rate calculator. Gene 463: 18–20.
  27. ↵
    Frisse L., R.R. Hudson, A. Bartoszewicz, J.D. Wall, J. Donfack, and A. Di Rienzo, 2001 Gene conversion and population histories may explain the contrast between polymorphism and linkage disequilibrium levels. Am. J. Hum. Genet. 69: 831–843.
  28. ↵
    Gillespie J.H., 1994 Substitution processes in molecular evolution. III. Deleterious alleles. Genetics 138: 943–952.
  29. ↵
    Glinka S.L., L. Ometto, S. Mousset, W. Stephan, and D. De Lorenzo, 2003 Demography and natural selection have shaped genetic variation in Drosophila melanogaster: a multi-locus approach. Genetics 165: 1269–1278.
  30. ↵
    Haddrill P.R., K.R. Thornton, B. Charlesworth, and P. Andolfatto, 2005 Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res. 15: 790–799.
  31. ↵
    Haldane J.B.S., 1927 A mathematical theory of natural and artificial selection. Part V. Selection and mutation. Math. Proc. Camb. Philos. Soc. 23: 838–844.
  32. ↵
    Haller B.C., and P.W. Messer, 2017 asymptoticMK: A web-based tool for the asymptotic McDonald–Kreitman test. G3 7: 1569–1575.
  33. ↵
    Haller B.C., and P.W. Messer, 2019 SLiM 3: Forward genetic simulations beyond the Wright-Fisher model. Mol. Biol. Evol. 36: 632–637.
  34. ↵
    Halligan D.L., and P.D. Keightley, 2006 Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16: 875–884.
  35. ↵
    Harr B., M. Kauer, and C. Schlötterer, 2002 Hitchhiking mapping: a population-based fine-mapping strategy for adaptive mutations in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 99: 12949–12954.
  36. ↵
    Harris R.B., A. Sackman, and J.D. Jensen, 2018 On the unfounded enthusiasm for soft selective sweeps II: examining recent evidence from humans, flies, and viruses. PLoS Genet. 14: e1007859.
  37. ↵
    Hartfield M., and T. Bataillon, 2020 Selective sweeps under dominance and inbreeding. G3 10: 1063–1075.
  38. ↵
    Hernandez R.D., J.L. Kelley, E. Elyashiv, S.C. Melton, A. Auton, et al., 2011 Classic selective sweeps were rare in recent human evolution. Science 331: 920–924.
  39. ↵
    Hudson R.R., M. Slatkin, and W.P. Maddison, 1992 Estimating levels of gene flow from DNA sequence data. Genetics 132: 583–589.
  40. ↵
    Jacquier H., A. Birgy, H.L. Nagard, Y. Mechulam, E. Schmitt, et al., 2013 Capturing the mutational landscape of the beta-lactamase TEM-1. Proc. Natl. Acad. Sci. USA 110: 13067–13072.
  41. ↵
    Jakobsson M., M.D. Edge, and N.A. Rosenberg, 2013 The relationship between FST and the frequency of the most frequent allele. Genetics 193: 515–528.
  42. ↵
    Jensen J.D., Y. Kim, V.B. DuMont, C.F. Aquadro, and C.D. Bustamante, 2005 Distinguishing between selective sweeps and demography using DNA polymorphism data. Genetics 170: 1401–1410.
  43. ↵
    Jensen J.D., K.R. Thornton, and P. Andolfatto, 2008 An approximate Bayesian estimator suggests strong recurrent selective sweeps in Drosophila. PLoS Genet. 4: e1000198.
  44. ↵
    Jensen J.D., 2009 On reconciling single and recurrent hitchhiking models. Genome Biol. Evol. 1: 320–324.
  45. ↵
    Johri P., B. Charlesworth, and J.D. Jensen, 2020 Toward an evolutionarily appropriate null model: jointly inferring demography and purifying selection. Genetics 215: 173–192.
  46. ↵
    Keightley P.D., U. Trivedi, M. Thomson, F. Oliver, S. Kumar, et al., 2009 Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res. 19: 1195–1201.
  47. Keightley P.D., R.W. Ness, D.L. Halligan, and P.R. Haddrill, 2014 Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics 196: 313–320.
  48. ↵
    Kim Y., 2006 Allele frequency distribution under recurrent selective sweeps. Genetics 172: 1967–1978.
  49. ↵
    Kim Y., and W. Stephan, 2002 Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777.
  50. ↵
    Kimura M., 1957 Some problems of stochastic processes in genetics. Ann. Math. Stat. 28: 882–901.
  51. ↵
    Kimura M., 1962 On the probability of fixation of mutant genes in a population. Genetics 47: 713–719.
  52. ↵
    Kimura M., 1964 Diffusion models in population genetics. J. Appl. Probab. 1: 177–232.
  53. ↵
    Kimura M., 1970 The length of time required for a selectively neutral mutation to reach fixation through random frequency drift in a finite population. Genet. Res. 15: 131–133.
  54. ↵
    Kimura M., and T. Ohta, 1969 The average number of generations until fixation of a mutant gene in a finite population. Genetics 61: 763–771.
  55. ↵
    Li H., and W. Stephan, 2006 Inferring the demographic history and rate of adaptive substitution in Drosophila. PLoS Genet. 2: e166.
  56. ↵
    Li Y.J., Y. Satta, and N. Takahata, 1999 Paleo-demography of the Drosophila melanogaster subgroup: application of the maximum likelihood method. Genes Genet. Syst. 74: 117–127.
  57. ↵
    Lynch M., D. Blanchard, D. Houle, T. Kibota, S. Schultz, L. Vassilieva, and J. Willis, 1999 Spontaneous deleterious mutations. Evolution 53: 645–663.
  58. ↵
    Macpherson J.M., G. Sella, J.C. Davis, and D.A. Petrov, 2007 Genome-wide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics 177: 2083–2099.
  59. ↵
    Mafessoni F., and M. Lachmann, 2015 Selective strolls: fixation and extinction in diploids are slower for weakly selected mutations than for neutral ones. Genetics 201: 1581–1589.
  60. ↵
    Maruyama T., and M. Kimura, 1974 A note on the speed of gene frequency changes in reverse directions in a finite population. Evolution 28: 161–163.
  61. ↵
    Maynard Smith J., and J. Haigh, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35.
  62. ↵
    McDonald J.H., and M. Kreitman, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652–654.
  63. ↵
    Messer P.W., and D.A. Petrov, 2013 Frequent adaptation and the McDonald–Kreitman test. Proc. Natl. Acad. Sci. USA 110: 8615–8620.
  64. ↵
    Miller D.E., C.B. Smith, N.Y. Kazemi, A.J. Cockrell, A.V. Arvanitakis, J.P. Blumenstiel, S.L. Jaspersen, and R.S. Hawley, 2016 Whole-genome analysis of individual meiotic events in Drosophila melanogaster reveals that noncrossover gene conversions are insensitive to interference and the centromere effect. Genetics 203: 159–171.
  65. ↵
    Nielsen R., S. Williamson, Y. Kim, M.J. Hubisz, A.G. Clark, and C.D. Bustamante, 2005 Genomic scans for selective sweeps using SNP data. Genome Res. 15: 1566–1575.
  66. ↵
    Pavlidis P., J.D. Jensen, W. Stephan, and A. Stamatakis, 2012 A critical assessment of storytelling: Gene Ontology categories and the importance of validating genomic scans. Mol. Biol. Evol. 29: 3237–3248.
  67. ↵
    Pfeifer B., U. Wittelsbürger, S.E. Ramos-Onsins, and M.J. Lercher, 2014 PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31: 1929–1936.
  68. ↵
    Przeworski M., 2002 The signature of positive selection at randomly chosen loci. Genetics 160: 1179–1189.
  69. ↵
    R Core Team, 2018 R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  70. ↵
    Sanjuán R., 2010 Mutational fitness effects in RNA and single-stranded DNA viruses: common patterns revealed by site-directed mutagenesis studies. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365: 1975–1982.
  71. ↵
    Sattath S., E. Elyashiv, O. Kolodny, Y. Rinott, and G. Sella, 2011 Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7: e1001302.
  72. ↵
    Sella G., D.A. Petrov, M. Przeworski, and P. Andolfatto, 2009 Pervasive natural selection in the Drosophila genome. PLoS Genet. 5: e1000495.
  73. ↵
    Stephan W., 2019 Selective sweeps. Genetics 211: 5–13.
  74. ↵
    Tajima F., 1990 Relationship between DNA polymorphism and fixation time. Genetics 125: 447–454.
  75. ↵
    Teshima K.M., G. Coop, and M. Przeworski, 2006 How reliable are empirical genome scans for selective sweeps? Genome Res. 16: 702–712.
  76. ↵
    Thornton K., 2003 Libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19: 2325–2327.
  77. ↵
    Thornton K.R., and J.D. Jensen, 2007 Controlling the false positive rate in multi-locus genome scans for selection. Genetics 175: 737–750.
  78. ↵
    Wiehe T.H., and W. Stephan, 1993 Analysis of a genetic hitchhiking model and its application to DNA polymorphism data from Drosophila melanogaster. Mol. Biol. Evol. 10: 842–854.
  79. ↵
    Wright S., 1931 Evolution in Mendelian populations. Genetics 16: 97–159.
Posted November 17, 2020.
Subject Area

  • Evolutionary Biology