Abstract
Deleterious alleles are more likely to reach high frequency in small populations because of chance fluctuations in allele frequency. This may lead, over time, to reduced average fitness in the population. In that sense, selection is more ‘effective’ in larger populations. Many recent studies have considered whether the different demographic histories across human populations have resulted in differences in the number, distribution, and severity of deleterious variants, leading to an animated debate. This article seeks to clarify some terms of the debate by identifying differences in definitions and assumptions used in these studies and providing an intuitive explanation for the observed similarity in genetic load among populations. The intuition is verified through analytical and numerical calculations. First, even though rare variants contribute to load, they contribute little to load differences across populations. Second, the accumulation of non-recessive load after a bottleneck is slow for the weakly deleterious variants that contribute much of the long-term variation among populations. Whereas a bottleneck increases drift instantly, it affects selection only indirectly, so that fitness differences can keep accumulating long after a bottleneck is over. Third, drift and selection tend to have opposite effects on load differentiation under dominance models. Because of this competition, load differences across populations depend sensitively and intricately on past demographic events and on the distribution of fitness effects. A given bottleneck can lead to increased or decreased load for variants with identical fitness effects, depending on the subsequent population history. Because of this sensitivity, both classical population genetic intuition and detailed simulations are required to understand differences in load across populations.
One of the best-known predictions of population genetics is that smaller populations harbor less diversity at any one time but accumulate a higher number of deleterious variants over time [1]. The reduction in diversity has been observed in populations that have undergone strong population bottlenecks: Diversity is much decreased in populations that left Africa, and further decreases with successive founder events [2, 3, 4, 5]. The effect of demography on the accumulation of deleterious variation has been more elusive in both humans and non-human species. In conservation genetics, where fitness can be measured directly and effective population sizes are small, a modest correlation between population size and fitness was observed [6]. In humans, fitness is usually estimated through bioinformatic prediction [7, 8]. Lohmueller et al. [9] found a higher proportion of predicted deleterious variants among Europeans than among African-Americans, and attributed the finding to a reduced efficacy of selection in Europeans because of the Out-of-Africa (OOA) bottleneck. However, recent simulation studies [10, 11] suggest that there has not been enough time for substantial differences in fitness to accumulate in these populations, at least under an additive model of dominance. Peischl et al. [12], by contrast, have claimed significant differences among populations under range expansion models, and Fu [13] claims a slight excess in the number of deleterious alleles in European-Americans compared to that in African-Americans. These results have sparked a heated debate as to whether the efficacy of selection has indeed been different across human populations [14, 13].
What does it mean for selection to be ‘effective’? Some genetic variants increase the expected number of offspring by carriers. As a result, these variants tend to increase in frequency in the population. This relation between the fitness of a variant and its ultimate fate in the population—that is, natural selection—holds independently of the biology and the history of the population. However, the rate of increase in frequency for favorable alleles depends on mutation, dominance, linkage, and demography, and can vary across populations. Differences in this rate can be thought of as differences in the efficacy of selection.
The purpose of this article is to provide an intuitive but quantitative understanding of the interplay of mutation, selection, and drift in the variation in genetic load across human populations, and more specifically to discuss how the classical prediction about the reduced efficacy of selection in small populations could be verified in real populations.
1 Models
We will assume that alleles have constant fitness coefficients s and that selection is additive over genes (no epistasis). Given alleles a and A, we assume that genotype aa has fitness 1, aA has fitness 1 + 2sh, and AA has fitness 1 + 2s. In a random-mating population, an allele A at frequency x adds an average of 2s(2hx + (1 − 2h)x²) to individual fitness compared to the aa genotype. Fitness and favorable allele frequency are proportional only for genic selection (no dominance; h = 1/2).
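As a quick check of the expression above, the following sketch averages genotype fitness over Hardy-Weinberg genotype frequencies and compares the result with the closed form 2s(2hx + (1 − 2h)x²). The function names are illustrative choices and are not part of the original analysis.

```python
def mean_fitness_gain(x, s, h):
    """Mean fitness relative to genotype aa, averaged over
    Hardy-Weinberg genotype frequencies at derived frequency x."""
    freq_aA = 2.0 * x * (1.0 - x)  # heterozygotes, fitness 1 + 2sh
    freq_AA = x * x                # derived homozygotes, fitness 1 + 2s
    return freq_aA * 2.0 * s * h + freq_AA * 2.0 * s


def closed_form_gain(x, s, h):
    """Closed form quoted in the text: 2s(2hx + (1 - 2h)x^2)."""
    return 2.0 * s * (2.0 * h * x + (1.0 - 2.0 * h) * x * x)
```

For h = 1/2 the quadratic term drops out and the gain reduces to 2sx, which is the proportionality between fitness and allele frequency mentioned for genic selection.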
I consider natural selection to be intense if the average frequency of favorable alleles increases by a large amount. I will say that natural selection is reliable if the frequency of favorable alleles increases with high probability, and that it is thorough if a large proportion of deleterious mutations are eradicated from the population. In addition, I will say that fitness increases if the mean fitness in the population increases. Since the genetic load is the difference between the mean fitness and optimal fitness, a fitness increase corresponds to a reduction in the genetic load. Intensity, reliability, thoroughness, and increase in fitness are all measures of the dynamics of selected alleles that can be compared across populations, and that have been used to define the efficacy of selection [11, 9, 15].
If we wait for the frequency of alleles to reach 0 or 1, all four definitions are in agreement: In that case, the average frequency of favorable alleles equals the proportion of favorable alleles with frequency 1 and also equals the proportion of favorable alleles whose frequency has increased. However, these concepts are not equivalent if we consider evolution over a shorter time span. In some cases, differences across populations may have little to do with the action of selection. This short-term disagreement among definitions that are equivalent in the long term has led to much confusion (see [14] and references therein).
Figure 1 illustrates the difference between intensity, reliability, and thoroughness for populations with different sizes. During Wright-Fisher reproduction, an allele with parental frequency x, fitness coefficient s, and dominance coefficient h will be drawn with probability x′ ≃ x + 2sx(1 − x)(h + x(1 − 2h)) in the descending population. Figure 1 shows the resulting binomial distributions in offspring allele frequency for x = 0.5, s = −0.3, h = 0.5, and 2N = 100, 500, and ∞. The average frequency x′ is independent of N, hence the intensity of selection is the same in all populations. By contrast, the variance in allele frequency is much larger in the smaller population, and so selection is much less reliable. Since very little time has elapsed, a vanishingly small proportion of deleterious alleles have been eliminated. The thoroughness of selection over that period is close to 0. Finally, the sign and magnitude of fitness changes depend on the population size as well as on the selection and dominance coefficients.
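The three notions can be made concrete with a minimal sketch of a single Wright-Fisher generation, using the exact binomial distribution of offspring counts. It assumes that A is deleterious (s < 0), takes "reliability" to mean the probability that the deleterious allele count drops below its parental value, and takes "thoroughness" to mean the probability of immediate loss. These operational readings and the function names are illustrative assumptions.

```python
from math import comb


def expected_offspring_frequency(x, s, h):
    """Sampling probability x' = x + 2sx(1-x)(h + x(1-2h))."""
    return x + 2.0 * s * x * (1.0 - x) * (h + x * (1.0 - 2.0 * h))


def one_generation_summary(x, s, h, two_n):
    """Summaries of one generation for a deleterious allele (s < 0)
    in a population of two_n haploid genomes."""
    if s >= 0:
        raise ValueError("this sketch assumes a deleterious allele (s < 0)")
    p = expected_offspring_frequency(x, s, h)
    # Exact binomial probabilities of observing k copies among offspring.
    pmf = [comb(two_n, k) * p**k * (1.0 - p) ** (two_n - k)
           for k in range(two_n + 1)]
    parental_count = x * two_n
    return {
        # Expected frequency gain of the favorable allele; independent of two_n.
        "intensity": x - p,
        # Probability that the deleterious allele count decreases.
        "reliability": sum(q for k, q in enumerate(pmf) if k < parental_count),
        # Probability that the deleterious allele is lost in one generation.
        "thoroughness": pmf[0],
    }
```

Calling `one_generation_summary(0.5, -0.3, 0.5, 100)` and `one_generation_summary(0.5, -0.3, 0.5, 500)` gives the same intensity for both sizes, a higher reliability for the larger population, and a thoroughness that is negligible in both.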
If we let these populations evolve further, however, we will eventually find that deleterious allele frequencies decrease more slowly in the smaller population. This is because selection requires favorable alleles to outcompete deleterious ones, and has little effect on the frequency of very common deleterious variants. If drift pushes a deleterious variant to fixation, selection can no longer act to reduce its frequency. Mathematically, the average increase in frequency of favorable alleles after one generation, Δx = 2sx(1 − x)(h + x(1 − 2h)), is a convex function of x when ⅓ ≤ h ≤ ⅔ (Figure 2). The smaller population, having accumulated more variance in allele frequency, will be able to eliminate fewer deleterious alleles. The intensity of selection does not depend on the current amount of drift in a population, but on the drift accumulated in previous generations. Conversely, drift during one generation can affect the intensity of selection for many future generations. When 0 ≤ h ≤ ⅓ or ⅔ ≤ h ≤ 1, the increase in favorable allele frequency Δx is not a convex function of x, and drift can lead to increased intensity of selection and eventually to increased fitness, as discussed below.
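The role of the thresholds h = ⅓ and h = ⅔ can be checked directly. Writing Δx = 2s f(x) with f(x) = x(1 − x)(h + x(1 − 2h)), the second derivative is f″(x) = 2(1 − 3h) − 6(1 − 2h)x, which is linear in x. The sketch below (function names are illustrative) tests whether f″ keeps a constant sign on [0, 1]; the sign of s then decides whether Δx itself curves up or down, but not whether the curvature changes sign.

```python
def curvature(x, h):
    """Second derivative of f(x) = x(1-x)(h + x(1-2h)) with respect to x."""
    return 2.0 * (1.0 - 3.0 * h) - 6.0 * (1.0 - 2.0 * h) * x


def curvature_has_constant_sign(h):
    """True if f'' does not change sign on [0, 1].

    Because f'' is linear in x, it is enough to compare the endpoints:
    f''(0) = 2(1 - 3h) and f''(1) = 6h - 4.
    """
    return curvature(0.0, h) * curvature(1.0, h) >= 0.0
```

With these definitions, `curvature_has_constant_sign(0.5)` is true, whereas the recessive and dominant cases h = 0 and h = 1 return false: for those, the effect of added variance in x on the expected Δx depends on where the allele frequencies sit.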
Figure 2 also suggests that substantial differences in the efficacy of genic selection require deleterious alleles to reach appreciable frequencies. Thus, rare alleles may contribute to load, but their contribution is relatively insensitive to recent demography as long as they are not pushed to high frequency.
2 Asymptotics
To quantify these arguments, we calculate the moments of the allele frequency distribution ϕ(x,t) under the diffusion approximation. Specifically, ϕ(x,t) represents the number of alleles with frequency x at time t. In a randomly mating population of size Ne = Ne(t) ≫ 1, it obeys

$$\frac{\partial \phi}{\partial t} \simeq \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left[x(1-x)\,\phi\right] - 2s\,\frac{\partial}{\partial x}\left[x(1-x)\left(h + (1-2h)x\right)\phi\right] + 2N_e u\,\delta\!\left(x - \frac{1}{2N_e}\right), \qquad (1)$$

where u is the total mutation rate. The first term describes the effect of drift; the second term, the effect of selection; and the third term, with Dirac’s delta distribution δ, describes the influx of new mutations. From this equation we can easily calculate evolution equations for moments of the expected allele frequency distribution, $\mu_k(t) = \int_0^1 x^k \phi(x,t)\,dx$. For example, the rate of change in allele frequencies, $\dot\mu_1$, is driven by mutation and selection:

$$\dot\mu_1 = u + \frac{s}{2}\,\Gamma_{1,h}, \qquad (2)$$

where Γi,h = 4(μi − μi+1)h + 4(1 − 2h)(μi+1 − μi+2) is a function of the diversity in the population that generalizes the heterozygosity π1,0 = Γ1,1/2 (see Appendix for detailed calculations). We can define the contributions of selection and mutation as the terms (s/2)Γ1,h and u, respectively. The effect of mutation is constant and independent of population size, but the effect of selection is modulated by Γ1,h, and therefore depends on the history of the population.
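To make the diversity statistic Γi,h concrete, the sketch below evaluates it from a list of per-site allele frequencies, treating the moment μk as the sum of x^k over sites. This discrete reading of the moments and the helper names are assumptions made for illustration.

```python
def moment(freqs, k):
    """k-th moment of the frequency distribution: sum of x**k over sites."""
    return sum(x**k for x in freqs)


def gamma_stat(freqs, i, h):
    """Gamma_{i,h} = 4h(mu_i - mu_{i+1}) + 4(1-2h)(mu_{i+1} - mu_{i+2})."""
    mu_i = moment(freqs, i)
    mu_i1 = moment(freqs, i + 1)
    mu_i2 = moment(freqs, i + 2)
    return 4.0 * h * (mu_i - mu_i1) + 4.0 * (1.0 - 2.0 * h) * (mu_i1 - mu_i2)


def heterozygosity(freqs):
    """Expected heterozygosity summed over sites: sum of 2x(1-x)."""
    return sum(2.0 * x * (1.0 - x) for x in freqs)
```

For h = 1/2 the second term vanishes and `gamma_stat(freqs, 1, 0.5)` coincides with `heterozygosity(freqs)`; for a single site at frequency f it equals 4f(1 − f)(h + (1 − 2h)f).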
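Similarly, changes in the expected fitness F can be decomposed into contributions from mutation, drift, and selection:

$$\frac{dF}{dt} \simeq \underbrace{4shu}_{\text{mutation}} + \underbrace{\frac{s(1-2h)}{2N_e}\,\pi_1}_{\text{drift}} + \underbrace{2s^2\left[h\,\Gamma_{1,h} + (1-2h)\,\Gamma_{2,h}\right]}_{\text{selection}}, \qquad (3)$$

where π1 = Γ1,1/2 is the expected heterozygosity and a mutation contribution of order su/Ne has been neglected.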
Favorable mutations increase fitness, drift increases fitness when fitness of the heterozygote is below the mean of the homozygotes, and selection always increases fitness.
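The sign of the drift contribution can be verified without any diffusion machinery. In the sketch below, one generation of pure binomial sampling (no selection on the sampling step) leaves the expected frequency unchanged but raises the expected x² by x(1 − x)/(2N), so the expected fitness 2s(2hx + (1 − 2h)x²) shifts by 2s(1 − 2h)x(1 − x)/(2N). The function names and the choice to pass the number of haploid genomes `two_n` are illustrative assumptions.

```python
def fitness(x, s, h):
    """Mean fitness contribution 2s(2hx + (1-2h)x^2) at frequency x."""
    return 2.0 * s * (2.0 * h * x + (1.0 - 2.0 * h) * x * x)


def expected_fitness_change_from_drift(x, s, h, two_n):
    """Expected change in fitness after one generation of neutral
    binomial sampling of two_n gametes from a parental frequency x."""
    # For x' = Binomial(two_n, x) / two_n:
    #   E[x'] = x   and   E[x'^2] = x^2 + x(1-x)/two_n.
    mean_x_squared = x * x + x * (1.0 - x) / two_n
    expected_after = 2.0 * s * (2.0 * h * x + (1.0 - 2.0 * h) * mean_x_squared)
    return expected_after - fitness(x, s, h)
```

For a deleterious allele (s < 0), the change is negative when h = 0, positive when h = 1, and zero when h = 1/2, and its magnitude scales inversely with population size.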
It is therefore natural to define the cumulative effect of selection on load as the time integral of the selection term in Equation (3), that is, the change in fitness caused directly by selection. Fisher’s Fundamental Theorem, for example, equates the instantaneous effect of selection on fitness change to the additive variance in fitness [16]. The relationship does not hold for total fitness changes [17]: in the presence of dominance and drift, fitness can decrease despite the favorable action of selection. Similarly, a population in a fluctuating environment is constantly adapting, in the sense that the selection term remains positive, but may not gain fitness in the long term [17, 18].
Now consider an ancestral population that splits into two randomly mating populations with initial sizes N1 and N2. The populations may experience migration and continuous size fluctuations. Using the moments method, the Appendix shows that the difference in fitness Δ(t) = F1(t) − F2(t) between populations 1 and 2 is

$$\Delta(t) = \frac{s(1-2h)}{2}\left(\frac{1}{N_1} - \frac{1}{N_2}\right)\pi_{1,0}\,t + O(t^2), \qquad (4)$$

where t is the time in generations, π1,0 is the expected heterozygosity in the source population, and O(t²) represents terms at least quadratic in t; these will eventually dominate, but are small right after the split. This rapid, linear differentiation is entirely driven by drift coupled with dominance, and is independent of the effect of selection after the split. The smaller population has higher fitness when the heterozygote is at a disadvantage.
By contrast, the effect of selection on load differences Δs(t) grows only quadratically:

$$\Delta_s(t) = -\frac{s^2}{4}\left(\frac{1}{N_1} - \frac{1}{N_2}\right)\Pi_h\,t^2 + O(t^3), \qquad (5)$$

where Πh is a moment of the ancestral frequency distribution that also reduces to π1,0 when h = 0.5 (see Appendix). This slower response is the mathematical consequence of the intuition provided by Figures 1 and 2: Differences in drift need to accumulate before differences in the rate of selection can be observed, and differences in the rate of selection need to accumulate to produce differences in fitness.
The effect of new mutations is even slower to appear, since the direct effect of mutation on load is independent of demography [Equation (3)], as is the combined effect of mutation and selection (Table 1). An extra factor of ut accounts for the time necessary for new mutations to accumulate in the population, before drift and selection can induce differences in load:
Finally, even though a bottleneck inexorably leads to increased load when no dominance is present, we show in the Appendix that the generic intermediate-term effect of a bottleneck is to reduce the genetic load caused by recessive variants. However, we will see in simulations that the familiar short-term increase in recessive load can last hundreds or thousands of generations for weakly deleterious variants.
3 Simulations
Evolution was simulated using ∂a∂i [19] and the Out-of-Africa demographic model inferred from synonymous variation in [20] and illustrated in Figure 3. Simulated genetic loads were obtained for all combinations of selection coefficients γ ≡ 2Nrs ∈ {0, −0.01, −0.1, −0.3, −1, −3, −10, −30, −100}, with Nr = 7300, and dominance coefficients h ∈ {0, 0.05, 0.5, 1}. The contributions of selection and drift were obtained using Equation (3). Simulations were carried forward beyond the present time by assuming equal and large population sizes (Ne = 20Nr) to emphasize the long-lasting effect of past drift on the efficacy of selection. In all cases, Equations (4), (5), and (6) capture the initial increase in load (Figures 4B, 5B, 6B, and 7B).
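The parameter grid described above is small enough to enumerate explicitly. The following sketch converts each scaled coefficient γ = 2Nrs into a per-generation selection coefficient s and pairs it with each dominance coefficient. It does not call the simulation software itself; the variable and function names are illustrative.

```python
from itertools import product

N_REF = 7300  # reference population size N_r quoted in the text
GAMMAS = [0, -0.01, -0.1, -0.3, -1, -3, -10, -30, -100]  # gamma = 2 * N_r * s
DOMINANCE = [0, 0.05, 0.5, 1]  # dominance coefficients h


def selection_coefficient(gamma, n_ref=N_REF):
    """Convert a scaled selection coefficient gamma = 2*N_r*s back to s."""
    return gamma / (2.0 * n_ref)


# One (gamma, h, s) tuple per simulated combination.
PARAMETER_GRID = [
    (gamma, h, selection_coefficient(gamma))
    for gamma, h in product(GAMMAS, DOMINANCE)
]
```

The grid has 36 combinations, and |γ| between 0.3 and 30 corresponds to |s| between roughly 2 × 10⁻⁵ and 2 × 10⁻³.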
3.1 Genic selection; h = 1/2
The models predict differences in load that are small and limited to intermediate-effect variants (0.3 < |γ| < 30, 2 × 10⁻⁵ < |s| < 0.002). Assuming the distribution of fitness effects from Boyko et al. [22], the excess load in the OOA population is about 0.27 per Gb of amino-acid–changing variants, compared to a total load of 18.69 per Gb. If we consider the 24 Mb of exome covered by the 1000 Genomes project, and assume that 70% of mutations are coding in that region [23], the model predicts a non-synonymous load difference of 0.01. The total estimated non-synonymous load, excluding mutations fixed in the ancestral state, is 0.7. In this model, the reduced efficacy of selection caused by the OOA bottleneck leads to a relative increase in non-recessive load of 1.4%. Since we did not consider fixed ancestral deleterious alleles in the total load, we expect the relative increase in load due to the bottleneck to be even smaller. The relative increase reaches a maximum of 8% for mutations with −20 < γ < −10.
The contribution of new mutations is important for very deleterious variants, but these contribute a small fraction of total load differences. The fast response of very deleterious variants to changes in population sizes described in Equation (5) can also be seen at the time of the second bottleneck in Figures 4B and 4C. By contrast, load due to benign additive variants reacts much more slowly: To date, its evolution is almost entirely determined by ancient diversity and the size of the early bottleneck.
3.2 Partial and complete dominance
The picture changes dramatically when we consider recessive deleterious variants (h = 0, Figure 5). Reactions to changes in population size are linear rather than quadratic, and they are more substantial than in the additive case. The load due to variants with γ = −100 almost doubles after 500 generations, excluding fixed ancestral deleterious alleles. This excess load in the OOA population is due entirely to drift, and leads to an increased intensity of selection in the OOA population (Figure 5C), since a higher proportion of deleterious alleles are now visible to selection. The difference in load for the most deleterious variants is not sustained. Both the number of very deleterious variants and the associated genetic load eventually become higher in the Yoruba population model. In this case, both the early bottleneck and later growth in the European population model act to reduce recessive load from strong-effect variants. By contrast, weak-effect deleterious variants contribute more load in the European population model.
The opposite occurs for dominant deleterious variants (Figure 6). Drift tends to increase fitness by combining more of the deleterious alleles into homozygotes, reducing their average effect on fitness. The difference is much less pronounced and less sustained than in the recessive case. Equation (4) shows that the reduced magnitude is caused by reduced ancestral heterozygosity π1,0: Dominant deleterious alleles are much less likely to reach appreciable allele frequencies before drift. Here again, the population with the highest load depends on the selection coefficient, with a higher load in the European population model for strongly deleterious variants and a higher load in the Yoruba population model for the weakly deleterious variants.
If a deleterious allele is not completely recessive (h = 0.05, Figure 7), load differentiation is suppressed for the most deleterious alleles. This effect can be traced back to a reduction in the initial diversity. However, load differentiation is largely unchanged compared to that in the recessive case for weakly deleterious alleles.
4 Discussion
Selection affects evolution in many ways. It tends to increase the frequency of favorable alleles and the overall fitness of a population living in a constant environment, and it often reduces diversity. The rates at which it performs these tasks vary across populations, and population geneticists like to frame these differences in terms of the ‘efficacy’ of selection.
Defining such an ‘efficacy’ suggests a purpose to evolution, and unacknowledged disagreement about this purpose can lead to confusion.
One class of definitions focuses on the outcome of the entire evolutionary process: Selection is deemed effective if the final product looks ‘selected’—as measured by fitness, by the frequency of deleterious alleles, or by the relative impact of fitness and drift. This approach is convenient, because it enables us to compare the ‘efficacy’ of selection across populations without having to worry about what exactly led to the current state. On the other hand, because it is not tied to what selection is actually doing, it can lead to paradoxes. For example, considering only fitness differences would force us to conclude that selection is ‘effective’ even during a period when it is entirely turned off.
The other class of definitions attempts to measure what selection is actually doing. Here we define the efficacy of selection as the instantaneous rate of fitness increase due to selection, that is, the selection term of Equation (3). This is the definition implicitly used by Fisher in the derivation of his fundamental theorem [17], which relates the direct effect of selection on fitness to the additive variance in fitness. Importantly, this rate does not correspond to the total rate of fitness increase, because drift and mutation can impact fitness as well. Unfortunately, Fisher’s efficacy is difficult to measure even for constant fitness, because it requires dense time-series data or accurate models.
An alternate observable is the intensity of selection, that is, the rate of increase in favorable allele frequency, whose expectation is given by the selection term of Equation (2). The intensity of selection can be directly attributed to selection, and it can be compared across populations without the need for time-series data. In addition, because the weight we assign to each variant can be an arbitrary constant, we can use any a priori information about the importance of the variant in our definition. A PolyPhen-weighted intensity of selection is as valid a measure of the effects of selection as a GERP-weighted one or a selection-coefficient–weighted one.
There are many ways to assess the role of selection in shaping genetic diversity. The optimal measure ultimately depends on the biological phenomenon we are attempting to model. When seeking to determine whether selection is ‘less effective’ in a population going through a bottleneck, I believe that the effect that we are trying to demonstrate is the one illustrated in Figures 1 and 2: Drift tends to reduce the number of beneficial alleles that an average deleterious allele has to outcompete. This effect can be measured equally as a change in the efficacy of selection, as in Fisher’s Fundamental Theorem, or as a change in the intensity of selection. In cases with dominance, drift initially affects the efficacy of selection by increasing homozygosity. The more subtle action of drift—the reduction in the number of competitors per deleterious allele—can be the dominant long-term effect, but it may be extraordinarily difficult to measure at short time scales given the overbearing effect of drift on homozygosity.
Recessive, fairly deleterious alleles have the most potential to cause substantial differences in load across populations, because drift causes rapid and pronounced increase in recessive load in a bottleneck population. However, this increase in load is tied to an increase in the efficacy and intensity of selection: Drift pushes rare recessive alleles to frequencies where selection is more likely to remove them. These competing effects create complex dynamics. In simulations, the CEU model population showed a reduced load caused by very deleterious recessive variants—and an increased load due to intermediate-effect recessive variants—compared to the YRI model. By contrast, differences in load for additive variants systematically favor the larger population, but are smaller and limited to a narrow range of fitness effects. In both cases, the effect of the bottleneck on the efficacy of selection continues long after the bottleneck is over and is modulated by the later history of the populations: The OOA bottleneck has had a different effect on the efficacy of selection for all the populations that have experienced it.
5 Acknowledgements
I thank S. Baharian, M. Barakatt, B. Henn, and D. Nelson for useful comments on this manuscript.
6 Appendix
6.1 Background
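To derive the asymptotic results in the text, we start with the diffusion approximation

$$\frac{\partial \phi}{\partial t} \simeq \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left[x(1-x)\,\phi\right] - 2s\,\frac{\partial}{\partial x}\left[x(1-x)\left(h + (1-2h)x\right)\phi\right] + 2N_e u\,\delta\!\left(x - \frac{1}{2N_e}\right). \qquad (7)$$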
The mutational model, described using Dirac’s δ, is an infinite-sites model that neglects the possibility of back mutations. A complete solution of this problem can be expressed as a superposition of Gegenbauer polynomials [24]. However, here we are looking for simple asymptotic results that will help us understand the dynamics of the problem. We integrate both sides using $\int_0^1 dx\,x^k$ to obtain the time dependence of the moments of the allele frequency distribution: $\mu_k(t) = \int_0^1 x^k \phi(x,t)\,dx$. Thus, μ0 is the number of segregating sites, and μ1 is the average allele frequency over all sites. Using some integration by parts, we get

$$\dot\mu_0 = -\frac{\phi(0,t) + \phi(1,t)}{4N_e} + 2N_e u,$$

where ϕ(0,t) and ϕ(1,t) are defined by continuity from the function values in the open interval (0,1).
That is, we lose segregating sites due to fixation, and we gain them due to mutation. Similarly, if h = 1/2,

$$\dot\mu_1 = -\frac{\phi(1,t)}{4N_e} + s\left(\mu_1 - \mu_2\right) + u.$$
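Segregating alleles can disappear due to fixation, change in frequency due to selection, or appear due to mutation. Of course, alleles that fix at frequency 1 do not really disappear; they just stop segregating. If we keep track of fixed alleles, the first term vanishes. If we account for both segregating and fixed non-ancestral sites, the general term is

$$\dot\mu_k = \frac{k(k-1)}{8N_e}\,\pi_{k-1} + \frac{sk}{2}\,\Gamma_{k,h} + \frac{u}{(2N_e)^{k-1}}, \qquad (8)$$

where πk and Γk,h are moments of the distribution:

$$\pi_k = 2\left(\mu_k - \mu_{k+1}\right)$$

and

$$\Gamma_{k,h} = 4h\left(\mu_k - \mu_{k+1}\right) + 4(1-2h)\left(\mu_{k+1} - \mu_{k+2}\right).$$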
6.2 Response in allele frequencies
Solving Equation (8) is challenging, but it can be used to calculate the response of allele frequency to a sudden change in demographic or selective conditions. Consider a population of size N0 that experiences a change in size to N1 at time t = 0. We can expand μk for short times: where μk,0 is the kth moment prior to the population size change. The other terms can be evaluated by collecting powers of t in Equation (8). For example, we get
Load increases even if N1 = N0, since our model assumes a constant supply of irreversible mutations. Since this linear term is independent of N1, it does not contribute to differences across populations that share a common ancestor. Differences appear at the next order:
6.3 Response in genetic load
To compute the fitness in the diploid case, we write

$$F = 2s\left[2h\,\mu_1 + (1-2h)\,\mu_2\right].$$
Using (8), we get

$$\frac{dF}{dt} \simeq 2s^2\left[h\,\Gamma_{1,h} + (1-2h)\,\Gamma_{2,h}\right] + 4shu + \frac{s(1-2h)}{2N_e}\,\pi_1,$$

where the three terms are, in order, the contributions of selection, mutation, and drift to changes in fitness, and a mutation contribution of order su/Ne has been dropped. The mutation term is constant in time and independent of population size; it will not lead to differences across populations. The drift term, by contrast, has an explicit dependence on the population size; this leads to differentiation between populations that grows linearly in time:

$$\Delta(t) = \frac{s(1-2h)}{2}\left(\frac{1}{N_1} - \frac{1}{N_2}\right)\pi_{1,0}\,t + O(t^2),$$

where π1,0 is the heterozygosity in the ancestral population. This reduction in load is driven purely by drift and dominance, and would happen even if the selection term were removed from the diffusion equation. It does not lead to ‘natural selection’, in that it does not increase the frequency of the favorable allele.
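The changes in load due to selection are quadratic in time:

$$\Delta_s(t) = -\frac{s^2}{4}\left(\frac{1}{N_1} - \frac{1}{N_2}\right)\Pi_h\,t^2 + O(t^3),$$

where Πh is a moment of the ancestral frequency distribution ϕ0(x):

$$\Pi_h = -4\int_0^1 x(1-x)\,\frac{d^2}{dx^2}\left[x(1-x)\left(h + (1-2h)x\right)^2\right]\phi_0(x)\,dx,$$

which reduces to the heterozygosity π1,0 when h = 1/2:

$$\Pi_{1/2} = 2\int_0^1 x(1-x)\,\phi_0(x)\,dx = \pi_{1,0}.$$

The statistic Πh depends only on the ancestral frequency distribution and the dominance coefficient. If we imagine that the source population has frequency distribution ϕ(x) = δ(x − f), we can compute the efficacy of selection as a function of h and the frequency f (Figure 8).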
6.4 Effect of new mutations
If we set πi,0 = 0 in the equations above, we can calculate the impact of new mutations on the genetic load. The leading term is again due to drift and dominance: while the leading term describing the efficacy of selection is now cubic in t:
6.5 Intermediate and long-term effects of a bottleneck under a completely recessive model
Under recessive models, drift will make deleterious alleles more visible to selection, and therefore selection will become more intense. However, drift also increases the expected genetic load by increasing homozygosity. The short-time asymptotic results tell us that drift acts faster than selection, and that the initial effect of the bottleneck will be an increase in load. Can the increased intensity of selection eventually overcome the increased homozygosity and lead to an overall reduction in load? Kimura found that for h = 0.05, smaller populations could have a slightly reduced load compared to larger populations [1], leading us to expect that drift can indeed increase fitness in some cases. In fact, increased fitness to deleterious alleles may be the generic response to a bottleneck for all but the very shortest and very longest times.
To see this in the recessive case (h = 0), consider two randomly mating populations with identical frequency spectrum ϕ0(x) at t = 0 and identical infinite population sizes for t > 0, except for a brief but intense bottleneck in population 1. The expected load contributed by mutations that occur after the bottleneck is identical in the two populations, so differences in load can be traced back to the effect of drift on the initial distribution. Furthermore, drift after the bottleneck can be ignored, and the frequency of an allele with initial frequency x0 can be approximated by
If −st ≫ 1, we have
The fitness is simply F(t) = 2sx(t)², and therefore
To study the effect of drift on fitness, we can use intuition similar to that suggested by Figure 2: If the genetic load — F(x0,t) is a concave function of x0, drift leads to higher load. Conversely, a convex function means that drift leads to reduced load. Figure 10 shows that the function becomes convex after a finite amount of time, at least for variants with low frequency. We have
For variants with , a small spread in allele frequency leads to a decrease in load. However, even a slight amount of drift can lead a minuscule number of variants to fixation. After a long enough time, unfixed deleterious variants will be eliminated by selection, and only fixed variants will be kept. It may take a very long time to reach this situation, but this guarantees that the very long-term effect of the bottleneck in the infinite-sites model is a reduction in fitness. However, even though the short- and long-term consequence of a bottleneck is an increase in genetic load, the generic behavior for intermediate times is an increase in fitness.
6.6 Microscopic and macroscopic efficacy and intensity of selection
The main text discusses the evolution of the expected allele frequency distributions by averaging over all possible allelic trajectories. Here we want to measure the efficacy and intensity of selection for individual allelic trajectories. If the allele frequency has trajectory {xt}t=1,…,T, with t the time in generations, we can write the fitness change ΔFt at generation t as

$$\Delta F_t = F'(x_t)\,\Delta x_t + \tfrac{1}{2}F''(x_t)\,(\Delta x_t)^2,$$

where Δxt = xt+1 − xt and F′ represents the partial derivative of the fitness function with respect to frequency x. In the constant-fitness models discussed above, F′(xt) = 4s(h + (1 − 2h)xt) and F″(xt) = 4s(1 − 2h), so the expansion is exact. It is not possible in general to attribute specific changes in frequency to the effect of drift or to selection, just as it is impossible to attribute the precise number of offspring borne by an individual to drift or fitness alone; these attributions can be made only in an average sense. We note that the expectation of F′(xt)Δxt gives the macroscopic contribution of selection to fitness change, and the expectation of ½F″(xt)(Δxt)² gives the contribution of drift when |s| ≪ 1. We therefore define the quantities σt = F′(xt)Δxt and vt = ½F″(xt)(Δxt)² as microscopic analogs of the macroscopic efficacy of selection and effect of drift. These are not the only possible analogs. For example, we could consider the expectation of the linear term as the microscopic effect of selection, and the remainder of ΔFt as the microscopic effect of drift, without changing the expected values. The combination σt + vt corresponds to the change in fitness of existing alleles, or ‘fitness flux’ ρt (labeled ϕt in [18]). The fitness flux definition is almost identical to our definition for σ, namely a sum of terms F′(yi)(yi+1 − yi), where yi is a dense allele frequency trajectory. Whereas our trajectory {xt} is labeled by the time in generations, the time steps in {yi} are chosen so that the increments yi+1 − yi are small. In other words, frequencies yi are interpolated within generations so that the quadratic terms can be neglected.
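The decomposition of a single trajectory can be sketched as follows. The code assumes the constant-fitness model F(x) = 2s(2hx + (1 − 2h)x²) and assumes that the second-order term carries the usual Taylor factor of 1/2, so that the two pieces add up exactly to the per-generation fitness change. Function names are illustrative.

```python
def fitness(x, s, h):
    """F(x) = 2s(2hx + (1-2h)x^2)."""
    return 2.0 * s * (2.0 * h * x + (1.0 - 2.0 * h) * x * x)


def fitness_slope(x, s, h):
    """F'(x) = 4s(h + (1-2h)x)."""
    return 4.0 * s * (h + (1.0 - 2.0 * h) * x)


def fitness_curvature(s, h):
    """F''(x) = 4s(1-2h), constant in x."""
    return 4.0 * s * (1.0 - 2.0 * h)


def decompose_trajectory(trajectory, s, h):
    """Split each generation's fitness change into a first-order piece
    (sigma_t) and a second-order piece (v_t, assumed to include 1/2)."""
    sigma, v = [], []
    for x_now, x_next in zip(trajectory, trajectory[1:]):
        dx = x_next - x_now
        sigma.append(fitness_slope(x_now, s, h) * dx)
        v.append(0.5 * fitness_curvature(s, h) * dx * dx)
    return sigma, v
```

Because F is quadratic in x, `sum(sigma) + sum(v)` equals the total fitness change between the first and last points of the trajectory, whatever mixture of drift and selection produced it.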
The intensity of selection i measures the rate of increase in frequency of favorable alleles. At a microscopic level, the intensity of selection is simply When 0 ≤ h ≤ 1 and selection coefficients do not change sign over time, is 1 for favorable alleles, and −1 for deleterious ones: The intensity of selection is simply the average change in frequency of favorable alleles. If h < 0 or h > 1, we have overdominance, and the favored allele is frequency dependent. In that case, the intensity of selection is also independent of the trajectory followed, and is given by , where represents the frequency of optimal fitness. As discussed in the main text, there are many ways to weight the intensity of selection when the effect is to be measured across multiple sites. As long as the weighting scheme is not frequency- or time-dependent, it remains possible to directly compare the intensity of selection across populations without the need for detailed modeling.