Abstract
The extent to which populations experiencing shared selective pressures adapt through a shared genetic response is relevant to many questions in evolutionary biology. In a number of well-studied traits and species, it appears that convergent evolution within species is common. In this paper, we explore how standing, deleterious genetic variation contributes to convergent genetic responses in a geographically spread population, extending our previous work on the topic. Geographically limited dispersal slows the spread of each selected allele, hence allowing other alleles (newly arisen mutants or those present as standing variation) to spread before any one comes to dominate the population. When such alleles meet, their progress is substantially slowed; if the alleles are selectively equivalent, they mix slowly, dividing the species range into a random tessellation, which can be well understood by analogy to a Poisson process model of crystallization. In this framework, we derive the geographic scale over which a typical allele is expected to dominate, the time it takes the species to adapt as a whole, and the proportion of adaptive alleles that arise from standing variation. Finally, we explore how negative pleiotropic effects of alleles before an environment change can bias the subset of alleles that get to contribute to a species' adaptive response. We apply the results to the many geographically localized G6PD deficiency alleles thought to confer resistance to malaria, whose large mutational target size and deleterious effects make them likely candidates to have been present as deleterious standing variation. We find that the numbers and geographic spread of these alleles match our predictions reasonably well, which suggests that they arose both from standing variation and from new mutations since the advent of malaria. Our results suggest that much of adaptation may be geographically local even when selection pressures are widespread.
We close by discussing the implications of these results for arguments of species coherence and the nature of divergence between species.
1 Introduction
There are an increasing number of examples where different populations within a species have adapted to similar environments by means of independent genetic changes. In some cases this convergent evolution has produced different phenotypes despite shared selection pressures; in other cases independent adaptations are identical down to the same nucleotide change (Jeong & Rienzo, 2014; Stern, 2013; Martin & Orgogozo, 2013). Such convergent evolution within populations has been seen for many carefully studied phenotypes across a range of species, including drug resistance in pathogens, resistance to pathogens or pesticides, and the molecular basis of pigmentation changes. The phrase “parallel evolution” is also used to refer to such convergent evolution; here we use these synonymously, as we are concerned with adaptation within a single species that can occur via different or shared genetic routes (see Arendt & Reznick, 2008, for more discussion).
The issue of convergent adaptation within species touches on a number of important questions in evolutionary biology. These include the extent to which adaptation is shaped by pleiotropic constraints (Haldane, 1932; Orr, 2005), whether adaptation is mutation-limited (Bradshaw, 1991; Karasov et al., 2010), and to what degree species should be regarded as cohesive units. Convergent evolution also affects our ability to detect adaptation from population genomic data, since no single allele sweeps to fixation over the entire area affected by the selection pressure (Pennings & Hermisson, 2006b).
Convergent evolution can occur even within a well-mixed population subject to a constant selection pressure, either through selection on multiple mutations present as standing variation within the population before selection pressures switch (Orr & Betancourt, 2001; Hermisson & Pennings, 2005), or due to multiple adaptive alleles that arise after selection pressures switch (Pennings & Hermisson, 2006a). Previous work has shown that a primary determinant of the probability that multiple alleles contribute to adaptation is the product of the population size and the mutation rate (see Messer & Petrov, 2013, for a review).
Spatial population structure, as caused for example by geographically limited dispersal, also increases the chance of convergent evolution. For example, geographically patchy selection pressures can lead to a much higher probability of parallel adaptation than uniform pressures, since alleles are unable to spread through intervening populations (Ralph & Coop, 2014). In Ralph & Coop (2010) we formulated a simple model of parallel adaptation in a geographic setting where the selection pressure was constant across the entire environment, and a single mutational change was sufficient to adapt after the change in environment. We assumed that there was no standing variation for the adaptive allele, so that parallel mutation must be due to multiple new mutations occurring after the environmental shift. In that setting, we found the characteristic geographic scale over which multiple instances of the adaptive allele are expected to arise in parallel. This characteristic length could be expressed in terms of a simple compound parameter determined by our parameters of interest. In this paper, we extend this spatial model to include standing variation present at mutation-selection balance before the selection pressures switch.
Below, we show that convergent adaptation within a wide-spread species is likely to be common, as has already been seen for a number of traits. On the basis of this we argue that the genetics of adaptation may often be geographically local, and that most widespread selective sweeps should occur when adaptation is highly constrained (e.g. by small mutation rate or the need for a linked combination of alleles). We discuss some implications, and history, of this view for the evolutionary coherence of species and molecular evolution.
1.1 Model description
We assume that the species range is a large, homogeneous, one- or two-dimensional region. There are two selective classes: the neutral type and the mutated type. We assume that separately arising mutations are distinguishable, either as selectively equivalent mutations or by linked variation. We also suppose that the mutated type has been at a selective disadvantage for a sufficiently long time in the past to be at selection-mutation equilibrium, but that at a certain time the selective regime changes, so that the mutated type has a selective advantage and quickly spreads to fixation. After fixation, alleles are either descended from families of mutants present as standing variation when the selective regime changed, or from new mutants arising since that time.
For concreteness, suppose that before time $t = 0$, the mutant type has fitness $1 - s_d$ relative to the neutral type (i.e. it produces on average $1 - s_d$ times the number of offspring per generation), and that after time $t = 0$, the mutant type has fitness $1 + s_b$, where $s_b > 0$ will usually be assumed to be small, and $0 < s_d < 1$. We assume that diploid fitness is additive, or at least that the important early dynamics are determined by the heterozygous fitness, with no reference to the fitness of the homozygote for the mutant alleles. (We leave aside the case where mutations are recessive, however.) The number of offspring has finite variance; this variance would divide the probability of establishment of advantageous alleles below, but to keep the formulas simple we assume it to be equal to one. As for the other parameters, suppose that each offspring of a neutral parent is of the mutant type with probability $\mu$, and that the mean squared geographic distance between parent and child is $\sigma^2$. The species occupies an area with mean density $\rho$ of haploid individuals (or chromosomes) per unit area.
Rates of origination of standing and new mutations
We make use of the common approximation that neglects competition between close relatives, treating the offspring of a new mutant that appears in an area not already occupied by the mutated type as a branching process. After $t = 0$, the offspring of each individual mutant thus form (approximately) a branching process with growth rate $s_b$, so each new mutant establishes locally with probability $p_s \approx 2 s_b$. As in Ralph & Coop (2010), each new offspring has a very small probability of being a mutant and establishing locally, so the collection of times and locations at which mutants appear and establish locally is well approximated by a Poisson process in space and time. The rate of this Poisson process, i.e. the mean number of new mutants per unit area and per generation that appear and establish locally after $t = 0$ in areas not already occupied by the mutant type, is approximately $\lambda = 2\mu\rho s_b$.
Before $t = 0$, on the other hand, the allele is deleterious. The genetic descendants of each new mutation are (with high probability) doomed to extinction, but may persist for some time (note that we have assumed that $s_d$ is not too small). Since the times and locations of mutants before $t = 0$ are also well approximated by a Poisson process with rate $\mu\rho$, the locations of all mutant families extant at $t = 0$ whose descendants are destined to fix locally also form, by the Poisson Mapping Theorem, a Poisson process, with a rate we define to be $\lambda_0$. If we assume that the descendants of at most a few members of any extant mutant family at $t = 0$ will survive, and that these progenitors are near to each other in space, we can then treat each such family as equivalent to a single new mutation, but with a somewhat larger probability of local establishment. (This approximation will be good if the logarithm of the size of each extant family is small relative to the establishment time, and the spatial extent of each family is small relative to the distance between families.) To find $\lambda_0$, consider a mutation that arose at time $-T < 0$, and let $Z_s$ be the number of its descendants at time $s - T$. At time $t = 0$, when the environment shifts, there are $Z_T$ individuals present with the mutation, and each has probability $p_s$ of establishing, approximately independently. Therefore, the probability that at least one descendant of this mutation establishes and fixes locally is $1 - (1 - p_s)^{Z_T}$, so defining $\zeta(u, t) = \mathbb{E}[u^{Z_t} \mid Z_0 = 1]$ to be the generating function of $Z_t$, the mean number of clusters of standing variants destined to fix locally, per unit area, is
$$\lambda_0 = \mu\rho \int_0^\infty \left(1 - \zeta(1 - p_s, T)\right) dT. \tag{1}$$
For $s_b$ small, using $p_s \approx 2 s_b$ and $\mathbb{E}[Z_T] = (1 - s_d)^T$, we know that $\zeta(1 - 2 s_b, T) \approx 1 - 2 s_b \mathbb{E}[Z_T] = 1 - 2 s_b (1 - s_d)^T$, resulting in the approximation
$$\lambda_0 \approx 2\mu\rho s_b \int_0^\infty (1 - s_d)^T \, dT = \frac{2\mu\rho s_b}{\log\left(1/(1 - s_d)\right)}. \tag{2}$$
Note that this is closely related to the equilibrium frequency of an allele at mutation-selection balance, $\mu/s_d$ (Haldane, 1927, 1937): since $\log(1/(1 - s_d)) \approx s_d$ for small $s_d$, the density $\lambda_0 = \lambda/\log(1/(1 - s_d))$ is approximately $2 s_b \rho \mu / s_d$.
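As a minimal sketch of how the two origination rates relate, the snippet below evaluates $\lambda = 2\mu\rho s_b$ and $\lambda_0 = \lambda/\log(1/(1 - s_d))$. The function name `establishment_rates` and the numerical values are illustrative assumptions, not part of the model description.

```python
import math

def establishment_rates(mu, rho, s_b, s_d):
    """Return (lam, lam0) for the Poisson model described above.

    lam  : new mutants that establish, per unit area per generation (2*mu*rho*s_b)
    lam0 : standing variants destined to fix locally, per unit area at t = 0
    """
    if not (0.0 < s_d < 1.0):
        raise ValueError("s_d must lie strictly between 0 and 1")
    lam = 2.0 * mu * rho * s_b
    lam0 = lam / math.log(1.0 / (1.0 - s_d))
    return lam, lam0

# Illustrative values only (assumed here, not estimates).
lam, lam0 = establishment_rates(mu=1e-6, rho=2.0, s_b=0.05, s_d=0.05)

# lam0 / lam is the number of generations of new-mutation input that the
# standing variation is "worth"; it approaches 1/s_d as s_d becomes small.
print(f"lambda = {lam:.3g}, lambda_0 = {lam0:.3g}, ratio = {lam0 / lam:.1f}")
```

The ratio $\lambda_0/\lambda$ depends only on $s_d$, which is why the prior cost of the allele alone sets how many generations of new mutation the standing variation is equivalent to.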
Geographic spread of alleles
Once an allele has become locally established it can begin to spread across space. We assume that the allele, once established, quickly settles down to spread spatially as a traveling wave of constant speed. The behavior of this wave of advance of a beneficial allele was first described by Fisher (1937) and Kolmogorov, Petrovskii & Piscunov (1937). Under reasonably general conditions, the speed of advance of this wave is $v = \sigma\sqrt{2 s_b}$. (See Ralph & Coop (2010) for a more thorough review of these traveling waves.) Note that the speed of the wave will vary with details of the space that individuals migrate across (e.g. see Slatkin, 1976; Slatkin & Charlesworth, 1978, for comparisons to migration on discrete grids).
Putting it together
Now, we can put these ingredients together for a simple model of the geographic spread of alleles, a cartoon example of which is shown in Figure 1. Initially, when the selection pressures change at $t = 0$, a set of standing variants can start to spread, having escaped loss through drift. The originating mutations of these variants are depicted by lightning bolts, and occur at density $\lambda_0$ across space. They spread at velocity $v$, carving out cones in space-time. As these alleles proceed in their geographic spread, other new alleles, whose origins are indicated by stars, can arise and become established in parallel. These new mutations arise and become established at rate $\lambda$.
As we outlined in Ralph & Coop (2010), this model of geographic convergent evolution, when $\lambda_0 = 0$, is analogous to a model of crystallization due to Kolmogorov (1937). In this model, nucleation sites form at random at a constant rate in time and space and initiate the radial growth of new crystals. After their initial spread, the different orientations of crystals form a random tessellation of space, whose properties have been studied by Møller (1992, 1995) and others (Bollobás & Riordan, 2008; Gilbert, 1962). The generalized version of this process, for non-constant wave speeds and inhomogeneous Poisson processes, is known as the Kolmogorov–Johnson–Mehl–Avrami (KJMA) tessellation (Fanfoni & Tomellini, 1998). Our combined process with both standing variation and new mutation is a special case of the KJMA tessellation, in which the spatiotemporally homogeneous Poisson origination process of new mutations is supplemented by a single pulse of origination points at time zero with spatial density $\lambda_0$. (For the purposes of analogy, we could imagine that before time $t = 0$ the temperature is high enough that nucleation sites appear but do not persist long.)
If we ignore the effects of new mutation, then everything about the process is relatively simple: each point in space will be first reached by the wave whose origination point lies closest to it. This random tessellation of space is known as a Poisson-Voronoi tessellation (Møller, 1994) (i.e. the cells formed by assigning regions of space to the nearest point in a Poisson process). The properties of this tessellation by alleles are determined by the spatial locations of the initiation points, which are sampled from a spatially homogeneous Poisson process, independent of the rate of spatial spread. Introducing new mutations causes some qualitative changes beyond dependence on new parameters: the cells formed by a Voronoi tessellation have straight sides, but the introduction of new mutations causes these to curve.
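The cartoon process can be sketched as a small simulation: sample a pulse of standing origins at $t = 0$, sample new origins uniformly in space-time, and assign each probe location to the origin whose wave arrives first. This is only a rough illustration under stated assumptions: a square region with no edge correction, probes on a regular grid, and hypothetical names (`poisson_count`, `simulate_tessellation`) and parameter values chosen for the example.

```python
import math
import random

def poisson_count(rng, mean):
    """Sample a Poisson(mean) count by counting unit-rate exponential arrivals before `mean`."""
    n, s = 0, rng.expovariate(1.0)
    while s < mean:
        n += 1
        s += rng.expovariate(1.0)
    return n

def simulate_tessellation(lam0, lam, v, width, n_grid=30, seed=1):
    """Return (fraction of probes first reached by standing variants, number of patches)."""
    rng = random.Random(seed)
    # Standing variants: a single pulse of origins at t = 0 with spatial density lam0.
    origins = [(rng.uniform(0, width), rng.uniform(0, width), 0.0, "standing")
               for _ in range(poisson_count(rng, lam0 * width ** 2))]
    if not origins:
        raise ValueError("no standing variants sampled; enlarge the region")
    probes = [((i + 0.5) * width / n_grid, (j + 0.5) * width / n_grid)
              for i in range(n_grid) for j in range(n_grid)]

    def arrival(origin, x, y):
        ox, oy, t0, _ = origin
        return t0 + math.hypot(x - ox, y - oy) / v

    # Every probe is reached by t_end even without new mutations,
    # so origins arising later than t_end can never be first anywhere.
    t_end = max(min(arrival(o, x, y) for o in origins) for x, y in probes)
    # New mutations: homogeneous Poisson process in space-time with rate lam.
    for _ in range(poisson_count(rng, lam * width ** 2 * t_end)):
        origins.append((rng.uniform(0, width), rng.uniform(0, width),
                        rng.uniform(0, t_end), "new"))

    # A new origin born inside already-occupied territory never arrives first
    # anywhere (triangle inequality), so no explicit thinning step is needed.
    winners = [min(range(len(origins)), key=lambda k: arrival(origins[k], x, y))
               for x, y in probes]
    frac_standing = sum(origins[k][3] == "standing" for k in winners) / len(probes)
    return frac_standing, len(set(winners))

# Illustrative values (per km^2, per generation, km per generation, km).
frac, n_patches = simulate_tessellation(lam0=3.9e-6, lam=2e-7, v=15.8, width=3000.0)
print(f"standing fraction = {frac:.2f}, patches = {n_patches}")
```

Setting `lam=0` recovers the Poisson-Voronoi case, in which every probe belongs to its nearest standing origin.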
1.2 G6PD example
Below we describe a number of properties of our process, but we will first introduce a motivating example to provide some concrete numbers to use for illustrative purposes.
Over roughly the past ten thousand years, alleles conferring resistance to malaria have arisen in a number of genes and spread through human populations in areas where malaria is endemic (Kwiatkowski, 2005). A number of these alleles appear to be examples of convergent adaptation, as different derived mutations in the same gene are seen in different individuals. For example, a number of changes that confer malaria resistance have been observed in the β-globin gene, and the sickle cell allele may plausibly have arisen by up to five independent occurrences of the same base pair mutation at different locations within Africa (Flint et al., 1998; Ralph & Coop, 2010). Another particularly impressive case of convergent evolution in response to selection pressures imposed by malaria is the set of numerous changes throughout the X-linked G6PD gene, with upward of 50 polymorphic variants (above 1% local frequency) having so far been described that lower the activity of the enzyme (Howes et al., 2013; Minucci et al., 2012). These alleles are now found at a combined frequency of around 8% in malaria endemic areas, rarely exceeding 20% (Howes et al., 2012). Whether these all confer resistance to malaria is unknown, but malaria is thought to be the primary driver of these polymorphisms (see Hedrick, 2011, for a general review). Three G6PD deficiency alleles are particularly common and relatively well studied: the A– allele found in much of sub-Saharan Africa; the Med allele found in the Mediterranean and Middle East; and the Mahidol allele found in Myanmar and Thailand. The A– and Mahidol alleles have been shown to be protective against Plasmodium falciparum and P. vivax in hemizygous males and in both heterozygous and homozygous females (Ruwende et al., 1995; Louicharoen et al., 2009).
Haplotype-based analyses of genetic diversity surrounding A–, Med, and Mahidol suggest that they have spread over the past few thousand years (Tishkoff et al., 2001; Slatkin, 2008; Saunders et al., 2005; Louicharoen et al., 2009), consistent with the age of other known malaria resistance alleles. Population genetic analyses suggest that these three variants each have a hemizygote/heterozygote selection coefficient of 0.05 to 0.3 (Tishkoff et al., 2001; Slatkin, 2008; Saunders et al., 2005; Louicharoen et al., 2009). This is in reasonable agreement with the selection coefficients calculated by Ruwende et al. (1995) on the basis of the present day levels of resistance to malaria due to the A– allele.
Given such strong selection, these alleles should have risen quickly to fixation, so their presence at intermediate frequency over a broad geographic area makes G6PD deficiency a good candidate for a recently balanced polymorphism due to heterozygote advantage (note that the conditions for a balanced polymorphism are complicated by the hemizygosity of males, see Hedrick, 2011; Pamillo, 1979). Indeed, hemizygous males and homozygous females suffer from G6PD deficiency, leading to hemolytic anemia when exposed to a variety of compounds, notably those present in fava beans (although it is unclear if this selective pressure alone suffices to maintain the polymorphism in the face of the considerable malaria resistance advantage). The theory we use regarding the “wave of advance” (Fisher, 1937) applies as well in the case of heterozygote advantage (Aronson & Weinberger, 1975), with the selected allele spreading locally to the equilibrium frequency (rather than fixation). Therefore, our framework is applicable to the spread of G6PD, with speed determined by the advantage of heterozygotes when rare. We assume that before malaria became prevalent, G6PD deficiency alleles suffered a decrease in relative fitness of $s_d$ in heterozygous and homozygous females and hemizygous males. Assuming that the underlying causes and strength of this drop in fitness have not changed, we estimate that $s_d$ has to have been upward of ∼ 0.05 (if $s_b \ge 0.05$), in order to have resulted in the equilibrium frequency seen today in areas with endemic malaria (see also Ruwende et al., 1995).
The geographic area of Central and Eastern Asia with malaria is on the order of ten million square kilometers. In that area there are at least 15 common variants (see Figure 2 of Howes et al., 2013). Therefore, the average width of an area occupied by an allele is approximately $\sqrt{10^7/15} \approx 800$ km. The coding region of G6PD is 515 codons long, and around 140 distinct deficiency alleles have been observed. Assuming a mutation rate of $\approx 10^{-8}$ per base pair per generation, we can take as an order-of-magnitude estimate $\mu \approx 1 \times 10^{-6}$ per generation. The dispersal and demographic parameters of humans in the past few thousand years are unclear, particularly as we are concerned with the “effective” population density (i.e. population density divided by variance in offspring number). We therefore will use two reasonable values for the effective population density, $\rho = 2$ and $0.2$ people per km$^2$, and three values for the dispersal distance, $\sigma = 10$, $50$ and $100$ kilometers per generation. Clearly, human migration has been shaped both by local dispersal and larger-scale expansions (see Pickrell & Reich, 2014, for a recent discussion), so these parameters only provide a rough view of the process.
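The order-of-magnitude arithmetic behind these numbers can be checked in a few lines. Two simplifications are assumptions of this sketch rather than statements from the text: patches are treated as squares, so that width is the square root of area, and each of the roughly 140 observed alleles is counted as one mutable site when forming the mutational target.

```python
import math

area_km2 = 1.0e7          # malaria-affected area of Central and Eastern Asia (order of magnitude)
n_variants = 15           # common variants observed in that area

mean_patch_area = area_km2 / n_variants
mean_width = math.sqrt(mean_patch_area)   # side of a square patch (assumed shape)

per_bp_rate = 1e-8        # mutations per base pair per generation
n_alleles = 140           # observed deficiency alleles, used as a crude target size (assumption)
mu_target = per_bp_rate * n_alleles

print(f"mean width ~ {mean_width:.0f} km, mu ~ {mu_target:.1e} per generation")
```

Both figures are rough: the width is on the order of 800 km and the mutation rate on the order of $10^{-6}$, which is all the later comparisons rely on.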
1.3 The geographic resolution of adaptation from new and standing variation
In Ralph & Coop (2010), studying the model without standing variation, we defined a characteristic length which gave the spatial scale across which mutants with distinct origins would establish. This was proportional to the mean distance between neighboring established mutants, but had the advantage of being easier to calculate. Furthermore, the time scale over which adaptation occurred could be found by dividing the characteristic length by the speed at which the mutants spread. We first define a similar characteristic length for this new model.
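Suppose we fix our attention on a particular new mutation that happens to be the first to occur in some region. If it does not encounter other locally fixed, beneficial mutants, it will cover a distance $L$ in time $L/v$. The number of other mutations appearing in the circle it has covered up until this time is Poisson with mean $\lambda_0 \pi L^2 + \lambda \pi L^3/(3v)$ in two dimensions (and $2\lambda_0 L + \lambda L^2/v$ in one dimension); the second term in each case is $\lambda$ multiplied by the volume of the space-time cone of radius $L$ and height $L/v$. Therefore, if we define $\chi$ for a two-dimensional model to be the unique positive solution to
$$\lambda_0 \pi \chi^2 + \frac{\lambda \pi \chi^3}{3v} = 1, \tag{3}$$
then $\chi$ gives the distance spread unobstructed by the descendants of a new mutant before it is expected that one other successful mutation would have arisen in the area covered so far. The explicit formula for $\chi$ in two dimensions is cumbersome; here we omit it. (In one dimension the characteristic length is $\chi = v\left(\sqrt{\lambda_0^2 + \lambda/v} - \lambda_0\right)/\lambda$.) Substituting expressions for $\lambda_0$, $\lambda$, and $v$ from above, we can rewrite this as
$$2\pi\mu\rho s_b \chi^2 \left( \frac{1}{\log\left(1/(1 - s_d)\right)} + \frac{\chi}{3\sigma\sqrt{2 s_b}} \right) = 1. \tag{4}$$
From this we see that $\chi$ decreases with $\rho$ and $\mu$. Furthermore, for large $\sigma$, the characteristic length is close to the value obtained just from standing variation:
$$\chi \approx \frac{1}{\sqrt{\pi\lambda_0}} = \sqrt{\frac{\log\left(1/(1 - s_d)\right)}{2\pi\mu\rho s_b}}. \tag{5}$$
On the other hand, if the mutant allele is highly deleterious before $t = 0$, then the characteristic length is close to the value from Ralph & Coop (2010):
$$\chi \approx \left(\frac{3v}{\pi\lambda}\right)^{1/3} = \left(\frac{3\sigma\sqrt{2 s_b}}{2\pi\mu\rho s_b}\right)^{1/3}. \tag{6}$$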
These two end points help build our intuition for the interaction of parameters in shaping the geographic scale of convergent evolution. By the above calculation, we know that the relevant mutations occur about distance χ apart, and occur within the first χ/v generations. Said another way, if we look in a circular region of space of radius χ over χ/v generations, we expect to find roughly one mutational origin.
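Because the two-dimensional characteristic length has no convenient closed form, it can be found numerically. The sketch below solves $\lambda_0 \pi \chi^2 + \lambda \pi \chi^3/(3v) = 1$ by bisection. It assumes that the new-mutation term is $\lambda$ times the cone volume $\pi \chi^3/(3v)$ and that the wave speed is $v = \sigma\sqrt{2 s_b}$; the function name `characteristic_length_2d` is hypothetical, and the parameter grid mirrors the G6PD values quoted earlier.

```python
import math

def characteristic_length_2d(mu, rho, sigma, s_b, s_d):
    """Return (chi, chi_standing_only, chi_new_only) in the units of sigma."""
    lam = 2.0 * mu * rho * s_b
    lam0 = lam / math.log(1.0 / (1.0 - s_d))
    v = sigma * math.sqrt(2.0 * s_b)          # assumed Fisher-KPP wave speed

    def excess(x):
        # Expected number of competing origins inside distance x, minus one.
        return lam0 * math.pi * x ** 2 + lam * math.pi * x ** 3 / (3.0 * v) - 1.0

    chi_standing = 1.0 / math.sqrt(math.pi * lam0)             # new mutation ignored
    chi_new = (3.0 * v / (math.pi * lam)) ** (1.0 / 3.0)       # standing variation ignored
    # excess() is increasing, negative at 0, and nonnegative at the smaller endpoint.
    lo, hi = 0.0, min(chi_standing, chi_new)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), chi_standing, chi_new

for rho in (2.0, 0.2):
    for sigma in (10.0, 50.0, 100.0):
        chi, chi_sv, chi_nm = characteristic_length_2d(1e-6, rho, sigma, 0.05, 0.05)
        print(f"rho={rho:<4} sigma={sigma:<6} chi={chi:7.1f} km "
              f"(standing only {chi_sv:7.1f}, new only {chi_nm:7.1f})")
```

The combined value is always below both single-source values, since each source of mutations on its own already supplies one expected competitor within its own characteristic length.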
In Figures 3 and 4 we show the range of characteristic lengths as a function of various parameters chosen to match the evolution of malaria resistance at G6PD. These curves match our intuition that higher population densities result in smaller characteristic lengths (as would higher mutation rates). Allowing standing variation increases the mutational input and so decreases the characteristic length below that predicted by only new mutations (i.e. equation (6), Ralph & Coop (2010)). In turn, increasing the prior deleterious effect of the allele ($s_d$) acts to increase the characteristic length until it reaches that predicted by new mutation alone. Higher dispersal distances lead to larger characteristic lengths, since more rapid geographic spread blocks others from arising. A larger selective advantage ($s_b$) acts in two conflicting ways: aiding the rapid geographic spread of established alleles, and also helping more independent copies escape drift and become established. The effect of helping establish alleles wins out: increasing the selective benefit $s_b$ decreases the characteristic length. (This can be shown in general by differentiating (4) with respect to $z = \sqrt{s_b}$, showing that $\partial_z \chi = -(C_1 z \chi + C_2 \chi^2)/(C_3 z^2 + C_4 z \chi) \le 0$ for appropriate nonnegative constants $C_1, \ldots, C_4$.) This effect is strongest when only standing variation contributes ($\chi \propto s_b^{-1/2}$, equation (5)), as in that case the speed of spread does not matter. The dependence of the characteristic length on $s_b$ when only new mutations contribute is much weaker ($\chi \propto s_b^{-1/6}$, equation (6)). Overall, the range of characteristic lengths observed is reasonably consistent with the average diameter of a G6PD variant in Eurasia of 800 km, especially for the lower population density, as long as the fitness cost of G6PD-deficiency alleles before malaria ($s_d$) was not too low.
Note that the conflicting roles of $\rho$ and $\sigma$ mean that even in species where levels of neutral differentiation are low, geographic convergent adaptation may be common. This is because low levels of neutral genetic differentiation can be due to high population densities rather than high dispersal distances, and high population densities would allow convergent adaptation. As such, geographic convergent evolution may be common even in species with little neutral population structure.
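The two limiting power laws in $s_b$ can be checked numerically from the limiting forms of the characteristic length. The closed forms used here, $\chi = \sqrt{\log(1/(1 - s_d))/(2\pi\mu\rho s_b)}$ for standing variation only and $\chi = (3\sigma\sqrt{2 s_b}/(2\pi\mu\rho s_b))^{1/3}$ for new mutation only, are assumptions of this sketch, derived from $\lambda_0 \pi \chi^2 = 1$ and $\lambda \pi \chi^3/(3v) = 1$ with $v = \sigma\sqrt{2 s_b}$; the function names are hypothetical.

```python
import math

def chi_standing_only(mu, rho, s_b, s_d):
    """Characteristic length when only standing variation contributes."""
    return math.sqrt(math.log(1.0 / (1.0 - s_d)) / (2.0 * math.pi * mu * rho * s_b))

def chi_new_only(mu, rho, sigma, s_b):
    """Characteristic length when only new mutation contributes."""
    return (3.0 * sigma * math.sqrt(2.0 * s_b) / (2.0 * math.pi * mu * rho * s_b)) ** (1.0 / 3.0)

def log_slope(f, s_b, factor=2.0):
    """Slope of log f against log s_b, estimated between s_b and factor * s_b."""
    return math.log(f(factor * s_b) / f(s_b)) / math.log(factor)

slope_standing = log_slope(lambda s: chi_standing_only(1e-6, 2.0, s, 0.05), 0.05)
slope_new = log_slope(lambda s: chi_new_only(1e-6, 2.0, 50.0, s), 0.05)
print(f"d log(chi) / d log(s_b): standing only {slope_standing:.3f}, new only {slope_new:.3f}")
```

Because both limits are exact power laws, the estimated slopes do not depend on the base value of $s_b$ or on the factor used.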
1.4 Time to adaptation
It is also straightforward to compute the mean time until adaptation. Imagine ourselves at some geographic location $x$, and let $\tau \ge 0$ be the time at which we are first reached by some advantageous mutation. Then, as can be seen from the perspective of the grey dot in Figure 1, $\tau > t$ if and only if the cone with point at $(x, t)$ and slope $v$ extending back to $t = 0$ (light grey dashed lines) is empty of successful mutations. Since we assume these are a Poisson process, in two dimensions
$$\mathbb{P}\{\tau > t\} = \exp\left(-\lambda_0 \pi v^2 t^2 - \frac{\lambda \pi v^2 t^3}{3}\right), \tag{7}$$
i.e. the combined probability that the area of a circle of radius $vt$ surrounding our point at $t = 0$ was free of successful standing variants, and that no successful new mutations arose in the cone (which has radius $r = vt$ at $t = 0$ and height $h = t$, and hence volume $\pi r^2 h/3$). Since $\mathbb{E}[\tau] = \int_0^\infty \mathbb{P}\{\tau > t\} \, dt$,
$$\mathbb{E}[\tau] = \int_0^\infty \exp\left(-\lambda_0 \pi v^2 t^2 - \frac{\lambda \pi v^2 t^3}{3}\right) dt. \tag{8}$$
For means of evaluating this integral, see Appendix A.
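Independently of the appendix, the mean time to adaptation can be approximated by direct numerical quadrature of the survival probability $\exp(-\lambda_0 \pi v^2 t^2 - \lambda \pi v^2 t^3/3)$. The sketch below uses composite Simpson's rule with a cutoff where the exponent reaches 50; this is an assumed, generic method, not the one described in Appendix A, and the helper names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def mean_time_to_adaptation(lam0, lam, v):
    """Integrate P(tau > t) = exp(-a t^2 - b t^3) over t >= 0 (two dimensions)."""
    a = lam0 * math.pi * v ** 2          # standing variation: circle of radius v*t
    b = lam * math.pi * v ** 2 / 3.0     # new mutation: cone of radius v*t, height t
    cutoffs = []
    if a > 0.0:
        cutoffs.append(math.sqrt(50.0 / a))
    if b > 0.0:
        cutoffs.append((50.0 / b) ** (1.0 / 3.0))
    if not cutoffs:
        raise ValueError("at least one of lam0 and lam must be positive")
    t_max = min(cutoffs)                 # beyond this the integrand is below exp(-50)
    return simpson(lambda t: math.exp(-a * t * t - b * t ** 3), 0.0, t_max)

# Illustrative G6PD-like values: mu = 1e-6, rho = 2, sigma = 50, s_b = s_d = 0.05.
lam = 2.0 * 1e-6 * 2.0 * 0.05
lam0 = lam / math.log(1.0 / (1.0 - 0.05))
v = 50.0 * math.sqrt(2.0 * 0.05)
e_tau = mean_time_to_adaptation(lam0, lam, v)
print(f"E[tau] ~ {e_tau:.1f} generations")
```

Two limits give a quick sanity check: with $\lambda = 0$ the integral is Gaussian and equals $1/(2v\sqrt{\lambda_0})$, and with $\lambda_0 = 0$ it equals $\Gamma(4/3)\,(\lambda\pi v^2/3)^{-1/3}$.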
In Figure 5 we show the mean time until adaptation for various values of the parameters chosen to match the case of adaptation at G6PD. Increasing $\sigma$ and decreasing $s_d$ lower the time to adaptation, as alleles spread geographically more quickly and are present as standing variation more often, respectively. Increasing $s_b$ strongly decreases the time to adaptation, as it causes more alleles both to escape drift and to spread rapidly. Given that the G6PD alleles likely spread over a few thousand years, i.e. less than a few hundred generations, this time scale seems quite plausible, except perhaps for the lowest dispersal distances.
1.5 The contribution of standing variation
We can use our framework to address what proportion of new adaptive variants arise from standing variation. We have defined $\lambda_0$ to be the mean density of standing variants that are present and escape loss when the environment shifts. We will define $\nu_+$ to be the mean density of newly arising alleles that spread having arisen in an area free of other adaptive alleles. Since the probability that a mutant arising at location $x$ and time $t$ is lucky enough to be born in a location not already occupied by mutants is $\mathbb{P}\{\tau > t\}$, we can see that $\nu_+ = \lambda \int_0^\infty \mathbb{P}\{\tau > t\} \, dt$, and hence $\nu_+ = \lambda \mathbb{E}[\tau]$. Using that $\lambda_0 = \lambda/\log(1/(1 - s_d))$, this gives us that
$$\nu_+ = \lambda_0 \log\left(\frac{1}{1 - s_d}\right) \mathbb{E}[\tau]. \tag{9}$$
Therefore, the mean proportion of patches that come from standing variation is
$$\frac{\lambda_0}{\lambda_0 + \nu_+} = \frac{1}{1 + \mathbb{E}[\tau] \log\left(1/(1 - s_d)\right)}. \tag{10}$$
There are $\lambda_0 + \nu_+$ patches per unit area, so the typical patch (i.e. with distribution given by the Palm measure) occupies area $1/(\lambda_0 + \nu_+)$.
We can also find the mean proportion of space covered by standing variants (we will restrict ourselves to two dimensions). At time $t$ a geographic location has not yet been reached by the mutation with probability given by equation (7). The probability, given that it has not been reached by $t$, that it will be reached by time $t + dt$ by a standing variant is approximately $2\lambda_0 \pi v^2 t \, dt$, which is $\lambda_0$ multiplied by the thin slice of extra area in our expanded circle at $t = 0$, which has gone from radius $vt$ to $v(t + dt)$. The corresponding probability that the point is reached by a new variant is $\lambda \pi v^2 t^2 \, dt$, which is $\lambda$ multiplied by the sliver of extra volume in our space-time cone at time $t + dt$ compared to that at time $t$. Therefore, as is standard for competing exponentials, the probability that a given location is reached first by a standing variant, and therefore the mean proportion of space covered by standing variants, is
$$z_0 = \int_0^\infty 2\lambda_0 \pi v^2 t \exp\left(-\lambda_0 \pi v^2 t^2 - \frac{\lambda \pi v^2 t^3}{3}\right) dt. \tag{11}$$
To evaluate this integral, again see Appendix A.
Furthermore, if we define $a_0$ to be the mean area occupied by a typical standing variant, then $a_0$ is given by the proportion of the range occupied by standing variants divided by the mean density of unique standing variants, i.e. $a_0 = z_0/\lambda_0$. By the same argument, the corresponding mean area occupied by a given new variant is $a_+ = (1 - z_0)/\nu_+$, so that $a_0 \lambda_0/(a_+ \nu_+) = z_0/(1 - z_0)$.
In Figure 6 we show the proportion of alleles that spread from standing variation, and the proportion of geographic space covered by standing variants, for parameters chosen to match our G6PD example. Even for relatively large deleterious costs prior to the environmental switch, standing variants still make up quite a large proportion of the adaptive alleles, and an even larger proportion of the range (they get to occupy a larger area than new mutations, since they start earlier).
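The quantities in this section can be evaluated together by quadrature. The sketch below computes $\mathbb{E}[\tau]$, $\nu_+ = \lambda\mathbb{E}[\tau]$, the fraction of patches from standing variation, the covered fraction $z_0$, and the mean patch areas. It assumes the survival probability $\exp(-\lambda_0\pi v^2 t^2 - \lambda\pi v^2 t^3/3)$, the wave speed $v = \sigma\sqrt{2 s_b}$, and the relation $a_+ = (1 - z_0)/\nu_+$; the function and key names are hypothetical, and Simpson's rule stands in for whatever method Appendix A uses.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def standing_variation_summary(mu, rho, sigma, s_b, s_d):
    lam = 2.0 * mu * rho * s_b
    lam0 = lam / math.log(1.0 / (1.0 - s_d))
    v = sigma * math.sqrt(2.0 * s_b)                 # assumed wave speed
    a = lam0 * math.pi * v ** 2                      # coefficient of t^2 in the exponent
    b = lam * math.pi * v ** 2 / 3.0                 # coefficient of t^3 in the exponent
    surv = lambda t: math.exp(-a * t * t - b * t ** 3)
    t_max = min(math.sqrt(50.0 / a), (50.0 / b) ** (1.0 / 3.0))

    e_tau = simpson(surv, 0.0, t_max)
    nu_plus = lam * e_tau                            # density of successful new patches
    z0 = simpson(lambda t: 2.0 * a * t * surv(t), 0.0, t_max)          # reached first by standing
    z_new = simpson(lambda t: 3.0 * b * t * t * surv(t), 0.0, t_max)   # reached first by new
    return {
        "e_tau": e_tau,
        "patch_fraction": lam0 / (lam0 + nu_plus),
        "z0": z0,
        "z_new": z_new,
        "a0": z0 / lam0,            # mean area of a standing-variant patch
        "a_plus": z_new / nu_plus,  # mean area of a new-mutation patch
    }

summary = standing_variation_summary(mu=1e-6, rho=2.0, sigma=50.0, s_b=0.05, s_d=0.05)
for key, value in summary.items():
    print(f"{key:>14}: {value:.4g}")
```

Computing the new-variant coverage by its own integral, rather than as $1 - z_0$, gives a built-in consistency check: the two coverages should sum to one up to quadrature error.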
1.6 Multiple variant types
Another problem that we can seek to address with this work is the extent to which pleiotropy biases adaptation towards the repeated use of particular subset of loci (i.e. convergence at genetic level). While many alleles may confer the beneficial phenotype, not all will contribute equally to adaptation if they have negative pleiotropic consequences. There are at least two ways that negative pleiotropy can contribute to high rates of convergence when adapting if a single change is sufficient for adaptation. First, negative pleiotropic effects can reduce the overall beneficial selection coefficient of an allele in the new environment, making them unlikely to become established and slow to spread (and in the worst case making them deleterious). This first effect has been well studied by a number of authors (Orr, 2000; Otto, 2004; Welch & Waxman, 2003; Chevin et al., 2010) and its role in precise genetic convergence examined (Orr, 2005; Chevin et al., 2010; Unckless & Orr, 2009). A second contribution is that alleles that have less negative pleiotropy are more likely to be present as standing variation before the environmental shift, and so are more able to respond immediately.
Here we focus primarily on the second effect. Let’s imagine for the moment that all classes of beneficial allele have the same beneficial selection coefficient. Or, at least that beneficial selection coefficients are similar enough that our selective exclusion approximation holds over the time scale on which we examine the process. Each class of mutations j has its own mutation rate μj and selective disadvantage sd,j prior to the environmental switch. As they have the same beneficial selection coefficient after the switch, all of the waves travel outward at a rate v. Then, the density of type j standing variants per unit area and the input rate of de novo variants per unit area per generation are, respectively,
Using these rates, and an argument analogous to that used to derive equation (11), at the time when every location has been reached by an adaptive allele, the proportion of the species range covered by alleles of type j is
If we only allow standing variation this collapses to while if we only allow new variation, i.e. if all variants are highly deleterious before the environment switches, pj = μj/(Σk μk).
To illustrate some of the properties of this model, let’s imagine the somewhat extreme scenario that there is a single base pair at which a possible mutation is relatively free of negative pleiotropy (call this class 1); and a larger mutational target where changes have more serious pleiotropic onsequences in the ancestral environment (class 2). We set sd,1 ≤ sd,2 = 0.05 and μ1 = 1 × 10−8, assume that both classes share a beneficial selection coefficient of sb = 0.05, and think of the second class of alleles as arising at one of ten, a hundred, or one thousand base pairs. We show the expected proportion of space covered by the rarer, class 1 mutations in Figure 7. Intuitively, the contribution of the rarer mutation decreases as the mutational target of the second class becomes larger, and as the difference in the negative pleiotropic consequences of the two classes of alleles decreases. The case with standing variation only is the best case scenario for the rarer mutation, so its rate of introduction after t = 0 is necessarily lower. However, the standing-variation-only case does seem to provide a reasonable rule of thumb, especially for parameter combinations, like higher population densities and high dispersal distances, that increase the contribution of standing variation (and similarly for high sb).
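The two-class scenario can be explored numerically. The sketch below integrates, for each class $j$, the rate at which a location is first reached by a type $j$ allele, $(2\pi v^2 t\,\lambda_{0,j} + \pi v^2 t^2 \lambda_j)$, against the probability that the location is still unoccupied, and compares the result with the standing-only and new-only limits. The value $s_{d,1} = 0.01$, the choice of a 100 base pair target for class 2, the density and dispersal values, and the function names are all assumptions made for illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def coverage_by_class(mus, s_ds, rho, sigma, s_b):
    """Return (p, p_standing_only, p_new_only): range fractions covered by each class."""
    v = sigma * math.sqrt(2.0 * s_b)                       # shared wave speed (assumed form)
    lam = [2.0 * mu * rho * s_b for mu in mus]             # new-mutation rates per class
    lam0 = [l / math.log(1.0 / (1.0 - sd)) for l, sd in zip(lam, s_ds)]
    big_a = math.pi * v ** 2 * sum(lam0)
    big_b = math.pi * v ** 2 * sum(lam) / 3.0
    surv = lambda t: math.exp(-big_a * t * t - big_b * t ** 3)
    t_max = min(math.sqrt(50.0 / big_a), (50.0 / big_b) ** (1.0 / 3.0))

    p = []
    for l0, l in zip(lam0, lam):
        rate = lambda t, l0=l0, l=l: (2.0 * math.pi * v ** 2 * t * l0
                                      + math.pi * v ** 2 * t * t * l) * surv(t)
        p.append(simpson(rate, 0.0, t_max))
    p_standing = [l0 / sum(lam0) for l0 in lam0]
    p_new = [mu / sum(mus) for mu in mus]
    return p, p_standing, p_new

# Class 1: one base pair with mild prior cost; class 2: 100 base pairs with cost 0.05.
p, p_standing, p_new = coverage_by_class(
    mus=[1e-8, 100 * 1e-8], s_ds=[0.01, 0.05], rho=2.0, sigma=50.0, s_b=0.05)
print(f"class 1 share: full {p[0]:.4f}, standing only {p_standing[0]:.4f}, new only {p_new[0]:.4f}")
```

The full value for each class is a weighted average of its standing-only and new-only shares, with weights given by the overall fractions of space first reached by standing and by new variants, so it always lies between the two limits.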
It is natural to also incorporate differences in the beneficial selection coefficients of the different classes of alleles, to allow for negative pleiotropic effects acting to suppress the advantage of an allele once the environment switches. One simple way to do this is to replace sb with sb,j, resulting in class-specific allele establishment rates (λj) and rates of spread (vj). These could then be used in equation (13). For instance, we could extend the two class model above so that class 1 has the additional advantage of sb,1 > sb,2. This would further increase the contribution of the rarer class, because this class would both overcome drift more often and spread more rapidly. However, a straightforward application of the logic used to construct equation (13) fails once the different allelic types meet, since the assumption of selective exclusion no longer holds: alleles with higher sb will spread, at a lower speed, into regions occupied by alleles with lower sb. Given enough time, the most advantageous type (type 1, in this case) would spread everywhere, and so substituting multiple values for sb in equation (13) would only provide a short-term approximation to a longer term dynamic. Even if the initial tessellation has formed with purely class 2 alleles, class 1 alleles would have a selective advantage δs = sb,1 − sb,2, and so would arise at rate 2ρμ1δs and would spread at the correspondingly lower speed obtained by using δs in place of sb. An extension of our Poisson process model could incorporate these effects by thinning the Poisson process of establishing mutations appropriately, but it is considerably less tractable. Whether allele 2 persists would depend on the linkage arrangements between loci. If the loci underlying allele 1 and 2 are unlinked, then allele 1 can spread without disrupting allele 2. However, if they are linked, the spread of allele 1 may push allele 2 out of the population.
More complicated dynamics, including spatial Dobzhansky-Muller incompatibilities (Kondrashov, 2003; Ralph & Coop, 2010) could ensue if there are epistatic interactions between the alleles.
1.7 Local establishment and comparison to panmixia
In the above (and in Ralph & Coop (2010)) we have assumed that once the mutation appears, conditional on eventual fixation, it begins to spread spatially at speed v instantly, effectively neglecting the time it must first spend escaping demographic stochasticity. In Ralph & Coop (2010) we addressed this by noting that there would be no change at all in our results if all mutations had to wait the same amount of time before fixing locally, and that this time was short relative to the time it took the wave to spread across the characteristic length; we then showed via simulation that this was reasonable in certain situations. In this section we examine this assumption in more detail, although mostly through heuristic arguments, and also compare the results above to the results without geographic structure of Pennings & Hermisson (2006a).
We are assuming that shortly after a new mutation appears, it can be approximated by a branching process growing at rate $s_b$ until the point that it grows large enough to “feel” spatial structure, at which point it begins to spread as a more-or-less deterministic wave. Although we are not aware of a good analysis of this transition, the relevant size when spatial structure becomes important should be something close to $\sigma^2\rho$ (i.e. Wright’s “local effective population size”, Wright (1943)). Let $Z_t$ be a continuous-time branching process with $Z_0 = 1$ and $\mathbb{E}[Z_t] = e^{s_b t}$. Then we know that there exists a random variable $W$ such that $\lim_{t\to\infty} e^{-s_b t} Z_t = W$ almost surely, so that if $\tau$ is the time $Z_t$ reaches size $\sigma^2\rho$, then $\sigma^2\rho = Z_\tau \approx e^{s_b \tau} W$ (Jagers, 1975). From this we know that $\tau \approx (1/s_b)(\log(\sigma^2\rho) - \log W)$; although more detailed information is available (e.g. a central limit theorem for $\tau$, Nagaev (1971)), we will stick to the loose interpretation.
So, roughly speaking, we need to evaluate the importance of a delay of about $T = (1/s_b)\log(\sigma^2\rho)$. New mutations will appear and become established during this time if $4\rho\sigma^2\mu s_b \ge 1/T$, i.e. if $4\rho\sigma^2\mu \ge 1/\log(\sigma^2\rho)$. Our model will still be a good approximation, however, as long as $T$ is short relative to the time a wave takes to spread between nearby mutational origins. This can be worked out, but it is simpler to note that if the converse is true, the process is largely unaffected by spatial structure, and so the panmictic model is a good approximation for the true process. Since both our spatial model and the panmictic model underestimate the true degree of parallel adaptation, it suffices to compare results from the two models to avoid applying the wrong model.
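As a rough illustration, the delay T and the establishment criterion above can be evaluated directly. This is a minimal sketch; the function names and the example parameter values are hypothetical, chosen only to show the arithmetic.

```python
import math


def local_establishment_delay(s_b, sigma, rho):
    """Heuristic delay T = (1/s_b) * log(sigma^2 * rho) before the wave starts."""
    neighborhood = sigma ** 2 * rho
    if neighborhood <= 1:
        raise ValueError("the heuristic assumes sigma^2 * rho > 1")
    return math.log(neighborhood) / s_b


def new_mutations_establish_during_delay(mu, sigma, rho):
    """Check the criterion 4 * rho * sigma^2 * mu >= 1 / log(sigma^2 * rho)."""
    neighborhood = sigma ** 2 * rho
    if neighborhood <= 1:
        raise ValueError("the heuristic assumes sigma^2 * rho > 1")
    return 4 * neighborhood * mu >= 1 / math.log(neighborhood)


# Hypothetical values: s_b = 0.05, sigma = 10 distance units, rho = 2 per unit area.
T = local_establishment_delay(s_b=0.05, sigma=10.0, rho=2.0)
print(f"T is about {T:.1f} generations")
print(new_mutations_establish_during_delay(mu=1e-8, sigma=10.0, rho=2.0))
```

Note that sb cancels from the criterion, so whether further mutations establish during the delay depends only on μ and the neighborhood size σ²ρ.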
Pennings & Hermisson (2006a) show that under a panmictic model with certain assumptions on the parameters, the number of independent origins due to both standing variation and new mutation seen in a sample of size n has approximately the Ewens distribution with parameters n and θ = 4Nμ. As n increases, the total number of types seen grows as log n. In our model, we can increase total population size by increasing either the density ρ or the total amount of area. In either case, the predicted number of distinct types grows linearly with n, much faster than under panmixia.
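The logarithmic growth under panmixia can be seen from the standard expression for the expected number of distinct types under the Ewens sampling formula, E[K] = Σ θ/(θ + i) for i = 0, …, n − 1. The sketch below uses that textbook result, not a formula quoted from Pennings & Hermisson (2006a), and the value of θ is illustrative.

```python
import math


def ewens_expected_types(n, theta):
    """Expected number of distinct types in a sample of size n under the
    Ewens sampling formula: sum over i = 0..n-1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))


theta = 1.0  # illustrative value of theta = 4 * N * mu
for n in (10, 100, 1000, 10000):
    print(n, round(ewens_expected_types(n, theta), 3))

# For large n, doubling the sample adds roughly theta * log(2) types.
gain = ewens_expected_types(2000, theta) - ewens_expected_types(1000, theta)
print(gain, theta * math.log(2))
```

Each tenfold increase in n adds about θ log 10 types, in contrast to the linear growth expected in the spatial model.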
The results of Hermisson & Pennings (2005) and Pennings & Hermisson (2006a) suggest that in a panmictic population the number of independent alleles (and their frequencies), in a sample, is nearly independent of sb and sd (although this breaks down with fluctuating population size, Wilson et al., 2014). In the panmictic model the lack of dependence on sb comes about because while increasing sb increases the rate at which independent mutations become established, it also accelerates the frequency gain of established alleles, hence decreasing the time period in which new alleles can arise and hope to be at significant frequency in the population. These two effects cancel each other out leading to no strong effect of sb on the number of independent alleles. Decreasing sd increases the number of standing variants within a population, increasing the number of alleles that manage to establish and spread from standing variation (Hermisson & Pennings, 2005; Orr & Betancourt, 2001). However, having more established standing alleles acts to exclude the spread of new alleles that arise once the environment switches. These two opposing effects again cancel out leading to little overall effect of sd on the number of independent alleles. In contrast, our results show that the characteristic length (closely related to the density of independent alleles) depends on both sb and sd in a geographically spread population. Like the panmictic model, in our model alleles also act to exclude each other; however, the geographic spread of an allele is slow compared to the initial exponential growth of an allele in a panmictic population. That means that the role of selection in helping alleles become established can dominate, leading to more independent origins both when selection against the alleles is weaker before, and when selection for them is stronger after, the environmental switch.
2 Discussion
Our results further suggest that convergent evolution among populations may be quite common, especially when the relevant alleles are not so deleterious that they are absent as standing variation before environmental shifts. When the geographical area where a species experiences a selection pressure is greater than our characteristic length we should expect multiple independent alleles to arise and spread in that area. Allowing standing variation in our model decreases the characteristic length, i.e. it increases the extent of convergence among populations. While at face value this increase seems unsurprising, this relationship differs markedly from the case of multiple competing alleles in a panmictic population (Hermisson & Pennings, 2005; Pennings & Hermisson, 2006a). Importantly, allowing standing variation may greatly lower the time until the species becomes adapted across the geographic range of the selection pressure. Adaptation through standing variation also biases the type of variation towards those alleles with fewer pleiotropic effects, since these are more common as standing variation before the environmental shift. This bias can in some cases easily overcome quite significant differences in mutational target sizes among loci, allowing the same locus to be repeatedly the source of adaptation even if there are seemingly many different routes to adaptation. (See Figure 7.)
The confusing signal of geographic convergent evolution
As we have argued in Ralph & Coop (2010), the ease with which geographic convergent adaptation occurs means that we should incorporate it more widely into our thinking about the genetic basis of adaptation. For example, the absence of European skin pigmentation alleles in ancient DNA recovered from Europeans who lived several thousand years ago has led to the suggestion that these individuals had dark pigmentation (Olalde et al., 2014; Lazaridis et al., 2014; Wilde et al., 2014). However, given our results and the partially convergent basis of skin pigmentation between Europeans and East Asians (Norton et al., 2007; Edwards et al., 2010) it seems just as plausible that these ancient individuals adapted to high latitudes via a different complement of “light-skin” pigmentation alleles; to our knowledge, we have no strong evidence either way. Such convergence may considerably complicate the exploration of phenotypes and adaptation among populations using variants mapped in a limited set of populations (Berg & Coop, 2014).
More generally, if geographic convergence is common we should often expect to see selected alleles which are strongly geographically restricted as they have simply not had time for neutral mixing to spread them across the landscape. This pattern may be very hard to distinguish from local adaptation using population genomic approaches alone. This is especially problematic as boundaries between convergent alleles may often occur where gene flow rates are low, i.e. historical and ecological breaks, even if the alleles concerned have no bearing on the ecological differences across these breaks (see Bierne et al., 2011, for a wide-ranging discussion of how allelic differentiation may build along particular zones). We are rarely so fortunate as to know as much about the genetics, phenotypic distributions, and potential selection agents as we do for malaria resistance in humans. Therefore, we must be wary of mistaking the strange spatial distributions of particular alleles for adaptation to some very specific selection pressure (e.g. the distribution of the Mahidol allele in Figure 2), when they are simply elements of a larger geographic mosaic of alleles responding to a broadly shared selection pressure.
Each of these local sweeps will be associated with the haplotype on which the particular allele arose. Under the parameter regime we study, standing variants are still quite young, so we do not expect a strongly reduced hitchhiking effect. As such, following the initial period of adaptation, we should expect the population to be partitioned into a set of geographically restricted long haplotypes. Given sufficient time these haplotypes will mix together through migration and drift, potentially leading to a signal of a sweep from multiple independent mutations if our selected allele occurs at the same locus (Pennings & Hermisson, 2006b), or to multiple partial sweeps if the loci are scattered across the genome (Coop & Ralph, 2012).
Our results are predicated on the idea that adaptive variants are initially rare within populations, i.e. they are reasonably deleterious before the environment switches. In contrast, even distant populations could have a shared basis of adaptation if they adapt via common variation, e.g. previously neutral (or nearly neutral) variation, shared among populations. Our ability to detect such shared convergent events is at the moment quite limited. If many loci contribute to variation in a trait, then selection on any one allele may be weak, which might lead adaptation to use the same alleles in different populations. However, sufficiently differentiated populations may still adapt via different genetic routes, as the constellation of alleles best able to respond to selection will be somewhat different (Barton, 1989). Therefore, it may be the case that polygenic traits are even less susceptible to a shared genetic basis to adaptation across populations than simple traits. However, we currently lack good models and methods with which to test this.
Are species held together by widespread selective sweeps?
Our results touch on an old debate on the evolutionary coherence of species. Mayr and many others have argued that species are coherent evolutionary units because they are united by shared gene flow (pages 521–522 in Mayr, 1963). However, this argument has been questioned by a number of authors based on relatively high levels of differentiation, and low rates of dispersal, in many species (Ehrlich & Raven, 1969; Levin, 1979). Even if gene flow is not high enough to prevent neutral differentiation or local adaptation, a number of authors have argued that species are cohesive if gene flow is high enough for globally selected alleles (and their hitchhiking haplotypes) to spread across entire species (see also Rieseberg & Burke, 2001; Morjan & Rieseberg, 2004; Ellstrand, 2014). At present, large scale genotyping and sequencing projects, along with more sophisticated methods, are highlighting ever more signals of gene flow between populations and species (Patterson et al., 2012; Sousa & Hey, 2013; Hellenthal et al., 2014). However, our work on geographic convergent adaptation (see also Ralph & Coop, 2010, 2014) suggests that species should often adapt to wide-spread selection pressures through convergent evolution rather than waiting for a single allele to migrate across the range.
In support of this idea, putative recent selective sweeps seem to often be geographically restricted (Pickrell et al., 2009; Coop et al., 2009; Granka et al., 2012), rather than species-wide (but see Clark et al., 2007; Long et al., 2013, for a potential example). This is likely in part due to the relatively low incidence of wide-spread selection pressures, but as noted above even when we know of wide-spread selection pressures (e.g. malaria) the response is usually convergent, not shared, across large spatial scales. On the other hand, introgression of adaptive alleles across species and sub-species boundaries suggests that selected alleles do sometimes spread despite low migration rates (see Hedrick, 2013, for a recent review). However, at least some of these cases are likely caused by introgression of haplotype complexes consisting of many, tightly linked, beneficial alleles (that perhaps are inaccessible by mutation over reasonable time-scales for a population in a new environment). Currently we can only scan genomes for species-wide sweeps in those few organisms with population-scale sequence data, and so we do not know if these observations generalize to most species. This is rapidly changing, and will allow us to form a much improved picture of the relationship between the level of neutral population structure, and the age and geographic spread of selected alleles across many species.
Even if selective sweeps only bring alleles to fixation locally, they are still potentially a stronger homogenizing force than neutral mixing through migration. Under neutral mixing, the mean number of generations back to the most recent common ancestor is on the order of the total effective population size. This quantity has not been worked out for a model with simultaneous local sweeps, but will be somewhat analogous to the “spatial λ-Fleming–Viot” models of Barton et al. (2013b), in which local sweeps occur independently across the range. Lineages that are close to the sweep (∼v/χ Morgans) will be moved towards the center of the sweep (a displacement O(χ)), and pairs of lineages caught up in the same sweep could be forced to coalesce (see Barton et al., 2013a, for work on geographic hitchhiking). In this case lineages and alleles are literally hitchhiking across space. The overall rate of lineage movement and coalescence depends on the rate of sweeps, their geographic scale, and the rate of recombination, and could be calculated by combining the result presented here with Barton et al. (2013b) and Barton et al. (2013a). However, if geographic sweeps are common then this may substantially speed up the rate of mixing compared to neutral drift and migration.
How then do substitutions occur?
If it is rare for gene flow to rapidly spread selected alleles across a species range, how then do selected alleles ever become fixed within species? Drift alone will act only slowly to sort variants within species into divergence among species. Furthermore, repeated bouts of adaptation in particular genomic regions may act to push a subset of previously selected alleles to fixation across the species range, through the spread of the genetic backgrounds on which they arise. However, it seems likely that this is a slow process compared to the initial rapid spread of selected alleles.
Speciation and extinction as phases of molecular evolution?
One potential resolution is that many selected alleles achieve fixation, not through their own species-wide spread, but rather through subsequent large-scale changes in geographic range size induced by extirpation of the species over parts of its range (see Barton et al., 2013b, and references therein for how such a model could be constructed). Such drops in range size may fix, or radically change the species-wide frequency of, alleles previously restricted to a small portion of a species range. Furthermore, many modes of speciation are proposed to occur through a geographically-limited subset of populations forming the basis of new species, e.g. the splitting off of part of the range through a vicariance event or dispersal of a subset of individuals. In this case, speciation will cause geographic assortment of polymorphic ancestral variation, again acting to fix variants within newly formed species that were previously polymorphic across ancestral species ranges.
Such ideas are not completely new and represent a perhaps logical consequence of an allopatric or parapatric view of the biogeography of speciation. However, it is worth revisiting this idea as geographically broad population genomic sampling allows us to return to themes in biogeography. Along similar lines, Futuyma has argued that much of the adaptive differentiation within species, e.g. adaptation to local conditions, may be ephemeral and subject to loss due to local extinction and the mixing following the collapse of population structure (Futuyma, 2010, 1987). Futuyma offered this as an explanation of the pattern of punctuated equilibrium (Eldredge & Gould, 1972), and argued that the observation of stasis and rapid anagenesis associated with speciation were consistent with micro-evolution. Futuyma argued that despite rapid adaptation over short time-scales, we may observe morphological stasis in the fossil record as much of this adaptation is lost to local extinction and the collapse of population structure (see also Lieberman & Dudgeon, 1996; Eldredge et al., 2005; Futuyma, 2010). Furthermore, he suggests that speciation may act as a ratchet to prevent the loss of differentiation, acting to maintain adaptive changes among populations, and prevent their loss by interbreeding. At face value the rate of species formation seems too low to contribute to this process. However, Rosenblum et al. (2012), and many others, have argued that the rate of speciation may well be quite high, but that the majority of incipient species do not persist long due to reabsorption or extinction. Changes in range size, due to local extinction, can also be very rapid on the time-scales over which alleles may spread on the landscape (Gaston, 2003; Hewitt, 1996). Repeated bouts of extinction and speciation will send waves of alleles to fixation along particular lineages.
Such a link between speciation and substitution would not imply that substitutions should necessarily be thought of as being clustered at splits in inferred phylogenies (see Pennell et al., 2014a; Venditti & Pagel, 2014; Pennell et al., 2014b, for a recent exchange on this). Neutral substitutions are unaffected by this process, because they accumulate in a clocklike manner along lineages, as dictated by the mutation rate, regardless of the geographic details of their polymorphic stage. Turning to the accumulation of adaptive substitutions, it is likely that splits in phylogenies are only a tiny proportion of all incipient speciation events, because extinction rates may be high (Rosenblum et al., 2012), and so every lineage has likely passed through many “speciation events” in addition to the observed ones. Under these assumptions, spatial polymorphisms could accumulate gradually in geographically restricted populations between the large-scale biogeographic events that cause their fixation or loss. This effectively decorrelates the time at which new alleles arise and when they fix in the species, an effect similar to that pointed out by Gillespie (1994).
This view would not imply that adaptive evolution or speciation is driven by the shifting balance or genetic revolutions (Wright, 1932; Mayr, 1954), whereby genetic drift allows populations to cross fitness valleys and substitute novel epistatic combinations. However, although geographic lineage sorting via speciation and extinction can be thought of as very large-scale genetic drift events, in the models we study here the initial spread of alleles is due to selection, not drift (see also Futuyma, 1989, for discussion).
There is evidence that a reasonable fraction of genome-wide substitutions are fixed by positive selection in a number of species (most notably Drosophila, Sella et al., 2009). Under the geographic view of fixation, selection has played a strong role in the establishment of these alleles locally. As we get geographically broader population genomic sampling for a range of species we will have the opportunity to study whether the class of alleles that contribute to local differentiation are similar to those underlying species divergence, and the extent to which the answer to this depends on the age and type of population structure within species.
Finally, we close by noting that range expansion and speciation are obviously not separate from adaptive differentiation. The invasion of new geographic areas may lead to a burst of adaptive differentiation, at least in a subset of genes, and speciation may be associated with rapid adaptation to novel environments. Conversely, the slow geographic spread of adaptive alleles within ranges, e.g. if they only offer a local advantage or if they are selectively excluded, may allow Dobzhansky-Muller incompatibilities to arise within species, effectively offering a mechanism for hybrid incompatibilities to evolve in parapatry, and fracturing the range (Bank et al., 2012; Kondrashov, 2003; Bierne et al., 2011). The alleles that act as components in many of the Dobzhansky-Muller incompatibilities studied to date are geographically restricted (see Cutter, 2012). Therefore, it seems possible that populations within species may often be tending towards speciation, and that, as outlined here, this may drive some proportion of molecular divergence.
A Integrals appearing in the text
In the text at several points appear integrals of the form where c is a positive integer, d is the dimension, and a and β are positive real numbers. This could be evaluated through standard numerical methods; below we describe a power series expansion. Changing variables to $u = \beta t^{d+1}$, this becomes so it suffices to evaluate the function
$$G(a, b, x) = \int_0^\infty u^a e^{-u - x u^b} \, du$$
in the case that $a = (c - d + 1)/(d + 1)$, $b = d/(d + 1)$, and $x$ is a function of the demographic parameters. Since $a$ and $b$ only depend on the dimension and the quantity being computed, we are interested in $G$ as a function of $x$. At least in the case $b = d/(d + 1)$ it is possible to express $G$ as a finite sum of gamma functions, but we proceed with a simpler method. Note that $\partial_x G(a, b, x) = -G(a + b, b, x)$, and that at $x = 0$, the function $G$ is the gamma function $G(a, b, 0) = \Gamma(a + 1)$. Therefore, a Taylor series for $G$ would be
$$G(a, b, x) = \sum_{n \ge 0} \frac{(-x)^n}{n!} \, \Gamma(a + nb + 1).$$
It is easy to check using Stirling’s formula that $\limsup_{n\to\infty} (x^n \Gamma(a + nb + 1)/n!)^{1/n} = 0$ if $b < 1$, so the sum converges.
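The series can be checked numerically. The sketch below takes G(a, b, x) to be the integral of u^a exp(−u − x u^b) over u from 0 to infinity, which is the form consistent with the two properties stated above (the x-derivative rule and G(a, b, 0) = Γ(a + 1)). The function names, the stopping rule, and the Simpson's-rule comparison are illustrative choices rather than the authors' implementation, and the example values of a, b and x are arbitrary.

```python
import math


def G_series(a, b, x, tol=1e-14, max_terms=500):
    """Sum over n of (-x)^n * Gamma(a + n*b + 1) / n!, for a > -1, 0 < b < 1, x >= 0.
    Intended for moderate x; for large x the alternating terms cancel badly."""
    if a <= -1 or not (0 < b < 1) or x < 0:
        raise ValueError("need a > -1, 0 < b < 1 and x >= 0")
    total = math.gamma(a + 1)          # n = 0 term
    if x == 0:
        return total
    for n in range(1, max_terms):
        # Work with logarithms so large gamma values and factorials do not overflow.
        log_mag = n * math.log(x) + math.lgamma(a + n * b + 1) - math.lgamma(n + 1)
        term = math.exp(log_mag)
        total += -term if n % 2 else term
        if term < tol:
            break
    return total


def G_quadrature(a, b, x, upper=60.0, steps=20000):
    """Composite Simpson's rule for the assumed integral form, for a >= 0.
    The integrand decays like exp(-u), so the tail beyond `upper` is negligible."""
    def f(u):
        return u ** a * math.exp(-u - x * u ** b)
    h = upper / steps                   # steps must be even
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


# Example with d = 2, so b = d / (d + 1) = 2/3; a and x are arbitrary.
print(G_series(1.0, 2 / 3, 0.5), G_quadrature(1.0, 2 / 3, 0.5))
```

The two evaluations should agree closely for moderate x; for large x, direct quadrature is the safer choice because the alternating series loses precision to cancellation.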
Acknowledgements
We thank Michael Turelli, Jon Seger, and the Coop lab for helpful conversations.