Abstract
Populations arrayed along broad latitudinal gradients often show patterns of clinal variation in phenotype and genotype. Such population differentiation can be generated and maintained by a combination of demographic events and adaptive evolutionary processes. Here, we investigate the evolutionary forces that generated and maintain clinal variation genome-wide among populations of Drosophila melanogaster sampled in North America and Australia. We contrast patterns of clinal variation in these continents with patterns of differentiation among ancestral European and African populations. We show that recently derived North American and Australian populations were likely founded by both European and African lineages and that this admixture event generated genome-wide patterns of parallel clinal variation. The pervasive effects of admixture meant that only a handful of loci could be attributed to the operation of spatially varying selection using an FST outlier approach. Our results provide novel insight into a well-studied system of clinal differentiation and provide a context for future studies seeking to identify loci contributing to local adaptation in D. melanogaster.
Introduction
All species live in environments that vary through time and space. In many circumstances, such environmental heterogeneity can act as a strong selective force driving adaptive differentiation among populations. Thus, a major goal of evolutionary and ecological genetics has been to quantify the magnitude of adaptive differentiation among populations and to identify loci underlying adaptive differentiation in response to ecologically relevant environmental variation.
Phenotypic and genetic differentiation between populations has been examined in a variety of species. In some cases, patterns of differentiation are directly interpretable in the context of circumscribed environmental differences that occur over short spatial scales (1). For instance, differences in salinity experienced by freshwater and marine populations of sticklebacks have led to the identification of key morphological, physiological, and genetic differences between replicate pairs of populations (2, 3). Similarly, pigmentation morph frequency closely tracks variation in substrate color for a variety of species (4), thereby providing an excellent opportunity to directly relate environmental variation to phenotypic and genetic differentiation.
Patterns of genetic and phenotypic variation have also been examined in species arrayed along broad geographical transects such as latitudinal clines (5). In this paradigm, the goal has often been to identify the phenotypic and genetic basis for adaptation to temperate environments. In certain cases it has been possible to directly relate latitudinal variation in specific environmental variables to aspects of phenotypic and genetic differentiation (e.g., photoperiod and critical photoperiod or flowering time; (6, 7)). In general, the collinearity of multiple ecological and environmental variables along latitudinal clines often complicates the direct relation of environmental variation to specific phenotypic and genetic differences. Nonetheless, because many genetically based phenotypic clines within species often mirror deeper phylogenetic differentiation between endemic temperate and tropical species, it is clear that populations distributed along latitudinal clines have adapted to aspects of temperate environments (8).
Latitudinal clines have been extensively studied in various drosophilid species, most notably Drosophila melanogaster. Parallel clines in morphological (9, 10), stress tolerance (11, 12), and life-history traits (12, 13) have been identified in D. melanogaster populations distributed along multiple continents. These phenotypic clines demonstrate that flies from poleward locales are generally more hardy albeit less fecund, reflecting a classic trade-off between somatic maintenance and reproductive output (12) that would be alternately favored between populations exposed to harsh winters versus more benign tropical environments. Extensive clinal variation in various genetic markers has also been identified (14, 15). In some cases clinal genetic variants have been directly linked to clinally varying phenotypes (16–19), whereas in other cases parallel clinal variation at genetic markers has been documented across multiple continents (20–22). Taken as a whole, there is abundant evidence that local adaptation to spatially varying selection pressures associated with temperate environments has shaped clinal patterns of phenotypic and genetic variation in D. melanogaster.
Demographic forces can also heavily shape patterns of clinal variation (5), and the recent demographic history of D. melanogaster may be particularly germane to our understanding of clinal patterns of genetic variation in this species. D. melanogaster is an Afro-tropical species (23) that has colonized the world in the wake of human migration. Population genetic inference suggests that D. melanogaster first migrated out of Africa to Eurasia approximately 15,000 years ago (24) and eventually migrated eastward across Asia, arriving in Southeast Asia approximately 2,500 years ago (25). D. melanogaster invaded the Americas and Australia within the last several hundred years and likely colonized these continents in their entirety quickly (26, 27). Historical records suggest that D. melanogaster colonized North America and Australia each in a single event (26, 27). However, population genetic (28, 29) and morphological evidence (30) suggest that, for the Americas at least, there were multiple colonization events with some migrants coming from Africa and some from Europe. While there is less evidence that Australia experienced multiple waves of colonization by D. melanogaster, such a scenario is plausible given the high rates of human migration and inter-continental travel during the 19th century.
Evidence of multiple waves of colonization of North America comes from morphological and genetic observations. It has been noted that Caribbean populations of D. melanogaster are more phenotypically similar to African populations than continental North American populations are (30). Moreover, population genetic evidence demonstrates that at least one mid-latitude North American population of D. melanogaster is a mixture of European and African lineages (28). These observations suggest that African populations of D. melanogaster colonized the Caribbean and then southern North America while European populations colonized northern North America. If true, North America would represent a secondary contact zone between diverged European and African flies given the high degree of differentiation between these ancestral populations (31). A natural consequence of such a scenario is that many genetic variants would appear clinal, even in the absence of spatially varying selection pressures (5).
Investigating whether North America and Australia represent secondary contact zones is, therefore, crucial for our understanding of the extent of spatially varying selection operating on this species. Classic work in Drosophila population genetics has suggested that a large number of polymorphisms are clinal (e.g., 14) and recent genomic work has further confirmed that a large fraction of the genome is highly differentiated (21, 22, 32) and clinal (33) between temperate and tropical populations both within North America and Australia. Analyses based on a limited number of markers suggested that there is limited population structure among D. melanogaster populations (34, 35). Consequently, patterns of clinal variation genome-wide are often thought to be generated and maintained by spatially varying selection (e.g., 21, 22, 32, 33). However, if secondary contact has occurred in both North America and Australia, clinality at many loci throughout the genome could have been generated by demographic forces due to the high divergence between European and African populations of flies (31).
The models of dual colonization and adaptive differentiation as evolutionary forces that generate and maintain clinal variation in North America and Australia are not mutually exclusive. Notably, one plausible model is that dual colonization of these continents generated patterns of clinal variation and that spatially varying selection has slowed the rate of genetic homogenization among populations. Accordingly, we sought to investigate whether genome-wide patterns of clinal genetic variation in North America and Australia show signals of dual colonization and local adaptation. We find that both North American and Australian populations show several genomic signatures consistent with secondary contact and suggest that this demographic process is likely to have generated patterns of clinal variation at a large fraction of the genome in both continents. Despite this genome-wide signal of recent admixture, we find evidence that spatially varying selection has shaped patterns of allele frequencies at some loci along latitudinal clines. We discuss these findings in relation to the well-documented evidence of spatially varying selection acting on this species as well as the interpretation of patterns of genomic variation along broad latitudinal clines in general.
Results
Data
We examined genome-wide estimates of allele frequencies from ∼30 populations of D. melanogaster sampled throughout North America, Australia, Europe and Africa (Fig. 1A). Our analyses largely focused on patterns of variation in North American and Australian populations and, consequently, we primarily focus on two sets of SNP markers. First, we utilized allele frequency estimates at ∼500,000 high quality SNPs that segregate at intermediate frequency (MAF > 15%) in North America (36). The second set was composed of ∼300,000 SNPs that segregate at intermediate frequency in Australia (32). For analyses that examine patterns of polymorphism in both North America and Australia, we examined SNPs that were at intermediate frequency in both continents, yielding a dataset of ∼190,000 SNPs. Because of the low sequencing coverage in the Australian populations, it is unclear if the reduced polymorphism in that continent reflects the demographic history of those populations or experimental artifact. Although our analysis primarily focused on patterns of polymorphism in North America and Australia, we also examined allele frequency estimates at both sets of polymorphic SNPs in populations sampled in Europe (37, 38) and Africa (31).
(A) Map of collection locales (squares) and proposed colonization routes of D. melanogaster (arrows). Colors of the arrows indicate that populations locally adapted to either tropical environments (red) colonized similarly tropical locales, whereas populations locally adapted to temperate environments (blue) colonized similarly temperate locales. (B) Neighbor-joining tree of sampled populations. (C) Proportion of African ancestry among sampled populations. Note, the proportion of European ancestry is equal to one minus the proportion of African ancestry. The red (blue) point represents the proportion African ancestry of the Pennsylvanian samples collected in the fall (spring).
Genomic signals of secondary contact
We performed a series of independent analyses to examine whether North America and Australia represent secondary contact zones of European and African populations of D. melanogaster. First, we constructed a neighbor-joining tree based on genome-wide allele frequency estimates from populations sampled world-wide using 500 sets of 10,000 randomly sampled SNPs (Fig 1B). As expected, African populations exhibited the greatest diversity (31) and clustered at the base of the tree while European populations clustered at the tip. North American and Australian populations clustered between African and European populations (Fig. 1A), a pattern that supports the model (39) that both North American and Australian populations result from secondary contact of European and African ones.
Next, we calculated the proportion of African ancestry in North American and Australian populations by modeling these populations as a linear combination of African and European ancestry. Proportion African ancestry is negatively correlated with latitude in North America and Australia (Fig. 1C). Within the Pennsylvanian population, the proportion of African ancestry was not different between samples collected during the spring and the fall (Fig. 1C). Note, in this analysis the proportion of European ancestry is the complement (i.e., 1-α; see Materials and Methods) of the proportion African ancestry; thus the proportion of European ancestry is positively correlated with latitude in North America and Australia.
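As an illustration of the linear-combination model, the following minimal sketch estimates α by ordinary least squares from per-SNP allele frequencies. It is not the procedure described in Materials and Methods; the function name `african_ancestry` and the toy frequencies are hypothetical, and the estimator ignores sampling error in the allele frequency estimates.

```python
def african_ancestry(target, african, european):
    """Least-squares alpha in: target ~ alpha * african + (1 - alpha) * european.

    Each argument is a list of per-SNP allele frequencies (same SNP order).
    Hypothetical helper for illustration only.
    """
    # Rearranged model: (target - european) = alpha * (african - european)
    num = sum((t - e) * (a - e) for t, a, e in zip(target, african, european))
    den = sum((a - e) ** 2 for a, e in zip(african, european))
    if den == 0:
        raise ValueError("source populations have identical allele frequencies")
    return num / den


# Toy data (made up): a population built as 30% African, 70% European.
african = [0.10, 0.80, 0.35, 0.60]
european = [0.50, 0.20, 0.75, 0.10]
target = [0.3 * a + 0.7 * e for a, e in zip(african, european)]

alpha = african_ancestry(target, african, european)
print(f"African ancestry: {alpha:.2f}; European ancestry: {1 - alpha:.2f}")
```

With real data the estimate is not constrained to lie between 0 and 1, so values outside that range would signal a poor fit of the two-source model.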
Finally, we calculated the f3 (40) statistic – a formal test of admixture – for each North American and Australian population using each sampled European and African population as a putative source population. We observe a significantly negative f3 statistic (Table 1) for each North American population when using the Italian and Cameroonian populations as donor populations. Significantly negative f3 statistics for subsets of North American populations were observed when using other European and African populations as donor sources (Data S1). A negative f3 statistic can be taken as conclusive evidence of secondary contact between these, or closely related, donor populations (41). We did not observe significantly negative f3 statistics for either Australian population using any combination of European or African source populations (Data S1).
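The logic of the f3 test can be sketched as follows. For a target population C and putative sources A and B, the statistic is the average over SNPs of (c - a)(c - b); if C is a mixture of A and B, this product is negative at every SNP where the sources differ. The sketch below is a simplified illustration with made-up frequencies: the function name `f3` is hypothetical, and it omits the finite-sample bias correction and the block-jackknife standard errors that a formal test requires.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies.

    Illustration only: no sample-size correction and no significance test.
    """
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


# Toy frequencies (made up) for two diverged source populations.
source_a = [0.10, 0.80, 0.35, 0.60]
source_b = [0.50, 0.20, 0.75, 0.10]

# An equal mixture of the two sources yields a negative statistic.
mixed = [0.5 * a + 0.5 * b for a, b in zip(source_a, source_b)]
f3_mixed = f3(mixed, source_a, source_b)
print(f"f3 for the admixed target: {f3_mixed:.6f}")
```

In this toy case the value equals -α(1 - α) times the mean squared frequency difference between the sources, which also shows why weakly diverged or poorly chosen source populations make the signal harder to detect.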
Formal admixture analysis
We note that the absence of evidence of admixture using the f3 statistic for the Australian populations should not be taken as evidence of the absence of admixture. Notably, the choice of donor populations can influence the value of the f3 statistic. For instance, we do not observe significantly negative f3 statistics for all North American populations when using alternate founder populations (Data S1). Therefore, we speculate that Australian populations were founded by European (likely British, see Discussion) and/or African populations that are not included in our dataset.
Taken together, these results support the view that both North America and Australia represent secondary contact zones between European and African lineages of D. melanogaster. Moreover, our results confirm an earlier model (23) that European D. melanogaster colonized high latitude locales in North America and Australia whereas African flies colonized low latitude locales in these continents. Genome-wide, low-latitude populations are more similar to African ones whereas high-latitude populations are more similar to European ones.
Under this dual-colonization scenario, we would expect that a large fraction of the genome varies clinally. Indeed, among North American populations of D. melanogaster approximately one third of all common SNPs, on the order of 10^5, are clinal (36). The vast extent of clinal variation in North America, then, is consistent with a dual colonization scenario, which would generate patterns of clinal variation at a large fraction of the genome. However, these results do not preclude spatially varying selection acting among these populations. Such selection could explain patterns of differentiation reported for some loci (e.g., 14, 15, 17) and could slow the rate of homogenization of allele frequencies at neutral polymorphisms throughout the genome among clinally distributed populations. We note that a similar analysis of the extent of clinality in Australia is not possible because we lack genome-wide allele frequency estimates from intermediate latitude populations in that continent.
Genomic signals of parallel local adaptation along latitudinal gradients
Our previous analysis supports the model that the demographic history of D. melanogaster has contributed to genome-wide patterns of differentiation among temperate and tropical populations of D. melanogaster living in North America and Australia. Regardless of this putative demographic history, multiple lines of evidence suggest that populations of flies living along broad latitudinal gradients have adapted to local environmental conditions that may be associated with aspects of temperate environments (see Discussion). Accordingly, we performed several tests to assess whether there is a strong, observable genomic signal of local adaptation.
First, we sought to identify FST outliers. We used a novel technique, OutFLANK (42), that attempts to identify SNPs subject to spatially varying selection while maintaining a low false positive rate. This method models the distribution of FST values after trimming the extreme tails under the assumption that the central portion (e.g., the 5th-95th quantiles) of the FST distribution largely reflects the demographic history of the sampled populations. Then, using the inferred FST distribution as a null distribution, OutFLANK seeks to identify SNPs that are more differentiated than expected by chance. At a false discovery rate (FDR) of 5%, we did not identify any SNPs with FST values significantly higher than expected by chance among temperate and tropical populations in Australia. At a similar FDR of 5%, we identified ∼200 SNPs with FST values significantly higher than expected by chance (Data S2) among North American populations. Note that the genome-wide average FST among North American populations is lower than among Australian populations (0.025 vs. 0.08 respectively), suggesting that the lack of significantly elevated FST values in Australia is not due to a lack of population differentiation but rather a high genome-wide differentiation likely caused by recent secondary contact.
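The trimming idea behind this approach can be sketched in a few lines. The example below computes a simple variance-based FST per SNP and extracts the central portion of a distribution; it does not reproduce OutFLANK's likelihood-based fit of the null distribution or its outlier test. The function names `fst` and `trimmed_core`, the uncorrected estimator, and the toy values are all assumptions made for illustration.

```python
def fst(freqs):
    """Variance-based FST for one SNP: var(p) / (mean_p * (1 - mean_p)).

    `freqs` holds the allele frequency in each population. No correction
    for sample size or sequencing depth is applied (illustration only).
    """
    n = len(freqs)
    mean_p = sum(freqs) / n
    if mean_p in (0.0, 1.0):
        return 0.0  # monomorphic across populations
    var_p = sum((p - mean_p) ** 2 for p in freqs) / n
    return var_p / (mean_p * (1.0 - mean_p))


def trimmed_core(values, lower=0.05, upper=0.95):
    """Return the sorted values lying between the lower and upper quantiles."""
    ordered = sorted(values)
    lo = round(lower * len(ordered))
    hi = round(upper * len(ordered))
    return ordered[lo:hi]


# Toy check: identical frequencies give 0, fixed differences give 1.
print(fst([0.5, 0.5, 0.5]), fst([0.0, 1.0]))

# The core of 100 toy values keeps the middle 90.
core = trimmed_core(list(range(100)))
print(len(core), core[0], core[-1])
```

A null distribution fitted to the core is then used to ask which SNPs in the upper tail are more differentiated than that core predicts.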
The exact number of SNPs with significantly elevated FST in any particular continent will be subject to a variety of considerations including the number of sampled populations, the precision of allele frequency estimates, and the power of particular analytic methods to detect outlier FST. Some of these factors vary between our North American and Australian samples and thus our power to detect significant elevation of FST will vary between continents. Therefore, we investigated the general patterns of differentiation and parallelism between the sets of populations sampled in North America and Australia. In addition, we also examined patterns of differentiation and parallelism between these continents and populations sampled from the Old-World (i.e., Europe and Africa).
For these analyses, we first examined whether SNPs that were highly differentiated among one set of populations were also differentiated in another set (hereafter, ‘co-differentiated’). To perform this analysis, we calculated the odds ratio (see Materials and Methods) that SNPs fell above a particular quantile threshold of the FST distribution in any two sets of populations (Fig 2A). We performed this analysis for SNPs that fell either within or outside of the large cosmopolitan inversions (43). We find that SNPs that are highly differentiated in North America are also highly differentiated in Australia. In addition, we find that SNPs that are highly differentiated in either North America or Australia are also highly differentiated between Europe and Africa. Although patterns of co-differentiation are higher among SNPs within the large, cosmopolitan inversions than for SNPs outside the inversions, the qualitative patterns remain the same for either SNP class, suggesting that clinal variation in inversions per se does not drive the observed high levels of co-differentiation.
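One way to picture the co-differentiation test is as a 2x2 contingency table: each SNP is classified by whether its FST falls above the chosen quantile in the first set of populations and in the second, and the odds ratio measures the excess of SNPs above the cut-off in both. The sketch below is an assumption about the general form of such a calculation, not the procedure in Materials and Methods; the function names are hypothetical, and the 0.5 added to each cell is a standard continuity correction chosen here only to avoid division by zero.

```python
import math


def quantile_cutoff(values, q):
    """Value at quantile q (0..1) of `values`, using a simple sorted index."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def log2_odds_ratio(fst_x, fst_y, q):
    """log2 odds ratio that a SNP is above quantile q in both FST lists.

    `fst_x` and `fst_y` hold per-SNP FST for the same SNPs in two sets of
    populations. Hypothetical helper for illustration only.
    """
    cut_x = quantile_cutoff(fst_x, q)
    cut_y = quantile_cutoff(fst_y, q)
    both = only_x = only_y = neither = 0
    for x, y in zip(fst_x, fst_y):
        high_x, high_y = x >= cut_x, y >= cut_y
        if high_x and high_y:
            both += 1
        elif high_x:
            only_x += 1
        elif high_y:
            only_y += 1
        else:
            neither += 1
    # Add 0.5 to every cell so that empty cells do not break the ratio.
    odds = ((both + 0.5) * (neither + 0.5)) / ((only_x + 0.5) * (only_y + 0.5))
    return math.log2(odds)


# Toy data (made up): perfectly concordant versus perfectly discordant ranks.
ranks = [i / 100 for i in range(100)]
concordant = log2_odds_ratio(ranks, ranks, 0.9)
discordant = log2_odds_ratio(ranks, ranks[::-1], 0.9)
print(f"concordant: {concordant:.2f}, discordant: {discordant:.2f}")
```

A log2 odds ratio above zero indicates that SNPs highly differentiated in one set of populations are over-represented among those highly differentiated in the other.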
SNPs that are co-differentiated among temperate and tropical populations in North America, Australia, or the Old-World can be differentiated in a parallel way or at random among each geographic region. We show here that there is a high degree of parallelism at the SNP level, genome-wide, among polymorphisms that are highly differentiated in any two sets of populations (Fig 2B). Patterns of parallelism at highly co-differentiated SNPs are similar among SNPs within or outside the large cosmopolitan inversions, again suggesting that clinal variation in inversions is not driving genome-wide patterns of parallelism.
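Parallelism at the SNP level can be illustrated as the fraction of co-differentiated SNPs whose allele frequency difference has the same sign in both sets of populations. The sketch below assumes that each difference is computed for the same allele and in the same direction (for example, temperate minus tropical); the function name `proportion_parallel` and the toy values are hypothetical. If the direction of differentiation were unrelated between regions, this fraction would be expected to be about one half.

```python
def proportion_parallel(delta_x, delta_y):
    """Fraction of SNPs whose frequency differences share a sign.

    `delta_x` and `delta_y` are per-SNP allele frequency differences
    (same allele, same direction of comparison) in two sets of populations.
    SNPs with a zero difference in either set are skipped.
    """
    pairs = [(dx, dy) for dx, dy in zip(delta_x, delta_y) if dx != 0 and dy != 0]
    if not pairs:
        raise ValueError("no SNPs with a nonzero difference in both sets")
    same_sign = sum(1 for dx, dy in pairs if (dx > 0) == (dy > 0))
    return same_sign / len(pairs)


# Toy differences (made up): three of four SNPs change in the same direction.
delta_na = [0.2, -0.1, 0.3, -0.4]
delta_aus = [0.1, -0.3, -0.2, -0.1]
parallel = proportion_parallel(delta_na, delta_aus)
print(f"proportion parallel: {parallel:.2f}")
```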
Patterns of co-differentiation and parallelism between North American, Australian, and Old-world populations. (A) log2 odds-ratio that SNPs fall above the FST quantile cut-off (x-axis) in both sets of populations (NA: North America; AUS: Australia; OW: Old-World). (B) Proportion of SNPs that vary in a parallel way given that they fall above the FST quantile cut-off in both sets of populations. Confidence bands represent 95% confidence intervals.
High rates of co-differentiation and parallelism among temperate and tropical populations sampled throughout the world can be interpreted in two ways. On the one hand, these patterns could be taken as evidence of parallel adaptation to aspects of temperate environments. On the other hand, these patterns are consistent with the model presented above that North American and Australian populations are the result of recent secondary contact between European and African lineages of flies (see Results: Genomic signals of secondary contact).
To differentiate these alternative interpretations, we estimated the enrichment of highly co-differentiated SNPs and the rates of parallelism at highly co-differentiated SNPs among classes of polymorphisms that we expect, a priori, to be more or less likely to contribute to local adaptation. We reasoned that SNPs falling in short introns, which have been previously shown to evolve neutrally (44), would be the least likely to contribute to local adaptation. In contrast, SNPs in other functional classes (e.g., coding, UTR, intron) might be more likely to contribute to local adaptation along latitudinal clines (21). We contrasted rates of co-differentiation and parallelism at these putatively functional SNP classes with rates at the short-intron (hereafter ‘neutral’) SNPs and at control SNPs matched to each class by several important biological and experimental features. These comparisons take into account the spatial distribution of SNPs along the chromosome (see Materials and Methods). We reasoned that if parallel adaptive processes have contributed to genome-wide signals of co-differentiation and parallelism in Australia and North America, (1) some functional SNP classes would show a higher rate of co-differentiation and parallelism than neutral SNPs, (2) functional SNPs would show a higher rate of co-differentiation and parallelism than their control SNPs, and (3) neutral SNPs would show a lower rate of co-differentiation and parallelism than their control SNPs.
We find little evidence that rates of co-differentiation or parallelism in the various functional classes differ from those of either neutral SNPs or their matched controls (Fig. 3). Moreover, neutral SNPs show similar rates of co-differentiation and parallelism as their matched controls (Fig. 3). There is suggestive evidence that SNPs falling in 5’ UTRs show greater rates of co-differentiation than expected by chance, but this comparison is not significant after correcting for multiple tests (see FST > 95% Fig. 3A; naive p = 0.01; corrected p = 0.24). Moreover, highly co-differentiated SNPs in 5’ UTRs are not more likely to be parallel than expected by chance (Fig. 3B), suggesting that the observed excess of co-differentiation may be a statistical artifact. All other tests of excess co-differentiation or parallelism at different SNP classes were not significantly different from expectation (p > 0.05).
Patterns of (A) co-differentiation and (B) parallelism among various classes of SNPs relative to their matched controls. Vertical lines represent 95% confidence intervals. Horizontal dotted lines represent the null expectations. See Materials and Methods for details.
Taken together, the tests we performed to identify strong genomic signals of parallel adaptation along latitudinal clines were equivocal. We show that there were a modest number of FST outliers among North American populations sampled along a broad latitudinal cline and no observable FST outliers among Australian populations. While the outlier detection method we used is highly conservative, the fact that so few outliers were detected suggests that the bulk of the FST distribution is determined by the demographic history of this species. We show that SNPs with high FST among any one set of populations are likely to have high FST among other sets of populations. Furthermore, SNPs that are highly co-differentiated are likely to vary in a parallel fashion among geographic regions. While this result could suggest parallel adaptation, it is also consistent with the dual colonization model we present above. Finally, we show that rates of co-differentiation and parallelism at highly co-differentiated SNPs are similar between functional SNPs, neutral SNPs, and their matched control SNPs, suggesting that the evolutionary forces shaping allele frequencies along latitudinal clines are similar across SNPs that are more or less likely to contribute to local adaptation.
Discussion
Herein we report results from a series of analyses that (1) examine whether populations of D. melanogaster sampled throughout North America and Australia show signatures of recent secondary contact between European and African lineages, and (2) examine whether there is a genomic signal of spatially varying selection acting along latitudinal gradients. We find that both North America and Australia show several signatures of secondary contact (Fig. 1BC, Table 1). Notably, high latitude populations are more closely related to European populations, whereas low latitude populations are more closely related to African ones. This result implies that a large portion of clinal variation within these continents could, in principle, be generated by the dual colonization of both North America and Australia (Fig. 1A). Consistent with this view, SNPs that are highly differentiated between temperate and tropical locales in either North America or Australia are also highly likely to be differentiated in a parallel way between Europe and Africa (Fig. 2AB). In addition, we report that genome-wide scans for significantly differentiated polymorphisms identified a limited number of outlier loci. Taken together, our results support the model that recent secondary contact in North America and Australia has generated clinal variation at a large fraction of polymorphisms genome-wide and that spatially varying selection acting at a moderate number of loci slows the rate of genomic homogenization between geographically separated populations.
Secondary contact and the generation of clinal variation in allele frequencies
Recent secondary contact between formerly (semi-) isolated populations is a potent force that can generate clinal variation genome-wide (5). In D. melanogaster, high levels of genetic differentiation have been observed between temperate and tropical populations sampled in North America and Australia (21, 22, 32, 33). In North America at least, most of these highly differentiated SNPs vary clinally (i.e., in a roughly monotonic fashion along latitudinal gradients;(36)). Moreover, surveys of allele frequencies along latitudinal clines in both North America and Australia at allozymes (14), SNPs (14, 36, 45), microsatellites (46), and transposable elements (20) have repeatedly demonstrated that approximately one third of all surveyed polymorphisms are clinal in either continent. At face value the high proportion of clinal polymorphisms throughout D. melanogaster’s genome suggests that demographic processes such as secondary contact have contributed to the generation of clinal variation in this species among recently colonized locales (26, 27).
Accordingly, we tested if newly derived populations of D. melanogaster show signatures of recent secondary contact. Using a variety of tests, we show that genome-wide patterns of genetic variation from populations sampled in North America and Australia are consistent with recent secondary contact (Fig. 1BC, Table 1). While historical records from North America (26) and Australia (27) suggest a single point of colonization of D. melanogaster, results from morphological, behavioral, and genetic studies reported here and elsewhere (28-30) suggest that a dual colonization scenario is more likely. At least for the Americas, active trade between Europe and western Africa supports the model that North America represents a secondary contact zone. Australia did not experience the same types of trade with the Old World, and throughout the 19th century intercontinental travel to Australia was primarily restricted to British ships. However, British ships traveling to Australia ported in South Africa and India and then, after the opening of the Suez Canal, in East Africa (47). This raises the possibility that secondary contact between European and African fruit fly lineages could have occurred immediately prior to the successful colonization of Australia by D. melanogaster in the mid 19th century (27). Under this mixed-lineage, single colonization scenario, rapid ecological sorting of colonizing lineages to temperate and tropical niches (48) may have created a gradient where European flies were initially predominant at high latitudes and African flies predominant at low latitudes within Australia.
Although secondary contact is capable of generating patterns of clinal variation genome-wide, clines generated through this demographic process are transient. As admixed populations approach migration-selection equilibrium, clines at neutral loci should attenuate. Moreover, once at equilibrium, neutral differentiation should be minimal (49) for species such as D. melanogaster where Nm has been estimated to be on the order of ∼1 (50, 51) and long-distance dispersal is believed to be frequent (52).
Thus, the critical question in determining whether the vast amount of clinal variation in North American and Australian flies has been generated by demography or selection is whether or not this species is at migration-selection equilibrium in these continents. There are several reasons why we suspect this species is not at equilibrium. First, D. melanogaster appeared in North America and Australia in the mid- to late 19th century (26, 27), or on the order of 1000 generations ago, assuming approximately 10 generations per year. Estimates of local, demic N are on the order of 10^4 (53–55), implying that m is on the order of 10^-4 (if Nm ∼ 1). If these estimates are accurate to the order of magnitude, it would take approximately 2500 generations to get about half way to equilibrium (56) or ∼10,000 generations to fully approach equilibrium (57). Thus, from a simple demographic perspective, it would seem unlikely that D. melanogaster has reached migration-selection-drift equilibrium.
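The order-of-magnitude argument can be checked with a short calculation. The sketch below assumes a neutral island model in which the deviation of FST from its equilibrium value shrinks each generation by the factor (1 - m)^2 (1 - 1/(2N)); this is an assumption made for illustration and is not necessarily the calculation behind the figures cited from (56) and (57). The 150-year colonization time is likewise an assumed round number for "mid- to late 19th century", and all variable names are hypothetical.

```python
import math

# Assumed inputs, taken as round numbers from the text.
years_since_colonization = 150  # assumption: mid- to late 19th century
generations_per_year = 10
deme_size = 1e4                 # local, demic N
nm = 1.0                        # migrants per generation (Nm ~ 1)

generations_elapsed = years_since_colonization * generations_per_year
migration_rate = nm / deme_size  # m = Nm / N = 1e-4

# Island-model assumption: per-generation decay factor of the deviation of
# FST from its equilibrium value.
decay = (1 - migration_rate) ** 2 * (1 - 1 / (2 * deme_size))
half_time = math.log(0.5) / math.log(decay)

print(f"generations elapsed: {generations_elapsed}")
print(f"migration rate m: {migration_rate:g}")
print(f"generations to get halfway to equilibrium: {half_time:.0f}")
```

Under these assumptions the half-time is a few thousand generations, the same order of magnitude as the value quoted above and longer than the time elapsed since colonization.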
A second piece of evidence that D. melanogaster has yet to reach equilibrium comes from contrasting patterns of migration and differentiation between drosophilid species. Singh and Rhomberg (51) contrasted estimates of population differentiation and Nm between North American populations of D. melanogaster and D. pseudoobscura. They note that both species have similar estimates of Nm (∼1) but that D. melanogaster shows higher levels of differentiation than D. pseudoobscura. These authors suggested that the discrepancy in levels of differentiation between these species is a function of their ecology and adaptive evolutionary dynamics. They argued that high levels of differentiation among D. melanogaster populations result from local adaptation driven by the varied selection pressures associated with human commensalism, whereas low levels of differentiation in D. pseudoobscura might result from habitat selection. Both species, however, appear to evolve rapidly in response to subtle shifts in selection pressures experienced in the field (36, 58). Therefore, we conclude that differences in patterns of differentiation between these species reflect the fact that D. pseudoobscura is a Nearctic endemic and is thus likely to be closer to equilibrium than emigrant populations of D. melanogaster.
Finally, it is worth noting that others have suggested that non-African populations of D. melanogaster are not at equilibrium. In general, non-African populations of D. melanogaster show a reduction in diversity coupled with an excess of rare variants (59). This genome-wide pattern is consistent with a population bottleneck during colonization followed by population expansion. Others have noted that non-African populations of D. melanogaster also have higher levels of linkage disequilibrium (LD) than expected under the standard neutral model (60–62), whereas LD in African populations is more consistent with neutrality (62 cf. 63). Although genome-wide elevation of LD could be caused by various factors, including pervasive positive or negative selection, admixture could also generate this signal.
Previous studies examining departure from equilibrium models in D. melanogaster have concluded that caution should be taken when conducting genome-wide scans for positive selection given the non-equilibrium nature of this species (60). Notably, demographic forces such as population bottlenecks can, in principle, mimic many of the signatures left by some types of adaptive evolution. A complementary approach to quantify the magnitude of adaptive evolution and to identify loci subject to selection is to identify polymorphisms that are differentiated between populations that are subject to divergent selection pressures. However, results presented here demonstrate that, for D. melanogaster at least, signatures of adaptive evolution inferred from genome-wide patterns of differentiation along latitudinal clines in newly derived populations in North America and Australia should be treated with a degree of caution similar to, or even greater than, that applied to traditional scans for recent, positive selection.
Spatially varying selection and the maintenance of clinal variation in allele frequencies
Whereas secondary contact is capable of generating clinal variation, spatially varying selection is required for its long-term maintenance. There is little doubt that populations of D. melanogaster living along broad latitudinal clines in temperate environments have adapted to spatially varying selection pressures. Support for the idea of local adaptation along latitudinal clines comes from three main lines of evidence.
First, certain phenotypes show repeatable clines along latitudinal and altitudinal gradients that mirror deeper phylogenetic variation among temperate and tropical species. For instance, aspects of body size vary clinally in North America (64) and Australia (65) as well as along altitudinal/latitudinal clines in India (66) and altitudinal clines within Africa (67, 68). Given such patterns of parallelism within and among continents, including within the ancestral African range, the most plausible explanation is that parallel selection pressures have generated these patterns of latitudinal and altitudinal variation. These intraspecific clines mimic interspecific patterns among temperate and tropical endemic drosophilids following Bergmann’s rule (69, 70), again implying that natural selection has shaped these patterns of genetically based, phenotypic variation.
Second, certain genetic and phenotypic clines in D. melanogaster have shifted over decadal scales. Shifts in these clines are consistent with adaptation to aspects of global climate change wherein alleles common in low-latitude populations have become more prevalent in high-latitude ones over the last 20 years (15).
Finally, using a conservative outlier detection approach (42), we identify several hundred polymorphisms in North America that are significantly differentiated (see Results and Data S2). Although the function of many of these polymorphisms is presently unknown, several are within genes known to affect life-history traits that vary among temperate and tropical populations. For instance, one significantly differentiated SNP in North America (3R:17433977) resides within the first intron of the Insulin receptor gene (InR). Natural polymorphisms in InR have recently been shown to contribute to local adaptation between temperate and tropical populations of flies (16, 17). Two additional significantly differentiated SNPs (3R:13749473 and 3R:13894182) reside within introns of couch-potato (cpo), a gene that has been shown to be associated with diapause incidence in natural populations (13 cf. 18). Intriguingly, the SNP in cpo that has been previously associated with diapause incidence (3R:13793588) is not among the significantly differentiated SNPs within North America. In our dataset, this SNP has an observed FST of 0.1 among North American populations, which falls in the upper 1.5% of the FST distribution. However, the associated false discovery rate of this SNP under the model used in OutFLANK is 80%. Similarly, the extensively studied threonine/lysine polymorphism (53, 71) that encodes the Fast and Slow allozyme variants at Alcohol dehydrogenase (Adh, 2L:14617051) falls in the upper 3.5% of the North American FST distribution (FDR 99%).
The identification of significantly differentiated SNPs within North America can be taken as evidence of local adaptation to spatially varying selection pressures. However, two SNPs that each likely contribute to local adaptation (one in cpo and one in Adh) fall in an upper, but not extreme, tail of the FST distribution, which suggests that many more ecologically relevant and functional polymorphisms have contributed to local adaptation in D. melanogaster than are detected as outliers. The signal of high differentiation caused by spatially varying selection at these SNPs is likely masked by recent admixture, which has contributed to a high level of differentiation genome-wide. In light of these results, we suggest that scans for local adaptation based on patterns of genetic differentiation in D. melanogaster are an important first step in identifying adaptively differentiated clinal polymorphisms but that additional evidence, such as functional validation (17, 18), should be gathered before concluding that differentiation is caused by adaptive processes.
Conclusions
It has long been recognized that genetic differentiation among populations can be caused by both adaptive and demographic (neutral) processes (72). Due to D. melanogaster’s large effective population size (73) and high migration rate (52), others have concluded that differentiation among populations sampled along latitudinal gradients is primarily caused by spatially varying selection. Work presented here supports the notion that spatially varying selection does contribute to some differentiation among populations (see Results: Spatially varying selection…). However, several genome-wide signatures presented here (Fig 1BC, Table 1) and elsewhere (28, 29, 36) indicate that populations of flies in North America and Australia result from admixture of European and African lineages. High-latitude (temperate) populations in North America and Australia are more closely related to European populations, whereas low-latitude (tropical) populations are more closely related to African ones (Fig 1BC), suggesting that admixture occurred along a latitudinal gradient and that this demographic event generated clinal genetic variation at roughly 1/3 of all common SNPs (36). These colonizing lineages of flies were likely already differentially adapted to the temperate and tropical conditions that they encountered in North America and Australia. Consequently, the recent demographic history of this species in North America and Australia is collinear with local adaptation both within these newly colonized continents and among the ancestral ranges.
One practical consequence of the collinearity of demography and adaptation is that the identification of clinality at any particular locus cannot be taken exclusively as evidence of spatially varying selection. We propose that an alternative approach to identify loci that contribute to aspects of adaptation to temperate environments in D. melanogaster is to identify alleles that vary over spatial and seasonal gradients (36) that are orthogonal to the demographic history of this species.
Materials and Methods
Genome-wide allele frequency estimates
We utilized novel and publicly available genome-wide estimates of allele frequencies of D. melanogaster populations sampled worldwide (Figure 1A, Table S1). Allele frequency estimates of six North American populations are described in Bergland et al. (36). Allele frequency estimates of three European populations are described in Bastide et al. (37) and Tobler et al. (38). Allele frequency estimates from 22 African populations are described in Pool et al. (31). Allele frequency estimates of two Australian populations are described in Kolaczkowski et al. (32). Allele frequency estimates from an additional two Australian populations are reported here for the first time. Allele frequency estimates from these additional Australian populations were made by pooling ten individuals from each of 22 isofemale lines originating from Innisfail (17°S) or Yering Station (37°S), Australia (isofemale lines kindly provided by A. Hoffmann). Sequencing library preparation and read mapping followed methods outlined in (36). Because Australian data were low coverage (∼10X per sample, on average), we combined the two northern populations and the two southern populations into two new, synthetic populations, which we refer to as ‘tropical’ and ‘temperate,’ respectively.
We performed SNP quality filtering similar to the methods presented in Bergland et al. (36). Briefly, we excluded SNPs within 5 bp of polymorphic indels, SNPs within repetitive regions, SNPs with average minor allele frequency less than 15% in both North America and Australia, SNPs with low (<5) or excessively high (>2 times median) read depth, and SNPs not present in the Drosophila Genetic Reference Panel (59). African samples were not quality filtered for read depth because allele frequency estimates from these samples were derived from sequenced haplotypes and not pooled samples. Regions of inferred admixture (31) in African samples (i.e., introgression of European haplotypes back to African populations) were removed from analysis.
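The filtering rules listed above can be written as a single predicate. The following is a minimal sketch only: the record layout and every field name are hypothetical, and the reading of the minor allele frequency rule (exclude a SNP only when it is below 15% in both continents) is an assumption about how the criterion was applied.

```python
from dataclasses import dataclass

@dataclass
class SnpRecord:
    # All field names are hypothetical and only mirror the filters in the text.
    dist_to_indel_bp: int      # distance to the nearest polymorphic indel
    in_repeat: bool            # True if the SNP lies in a repetitive region
    maf_north_america: float   # average minor allele frequency, North America
    maf_australia: float       # average minor allele frequency, Australia
    read_depth: float          # read depth in a pooled sample
    in_dgrp: bool              # True if the SNP is present in the DGRP

def passes_filters(snp, median_depth):
    """Return True if a SNP survives all of the quality filters."""
    if snp.dist_to_indel_bp <= 5:
        return False
    if snp.in_repeat:
        return False
    # Assumed reading: drop only if rare in both continents.
    if snp.maf_north_america < 0.15 and snp.maf_australia < 0.15:
        return False
    if snp.read_depth < 5 or snp.read_depth > 2 * median_depth:
        return False
    return snp.in_dgrp

example = SnpRecord(120, False, 0.22, 0.10, 40.0, True)
print(passes_filters(example, median_depth=35.0))
```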
Estimation of the population tree
We calculated Nei’s genetic distance (74) between each pair of populations and generated a population tree using the neighbor-joining algorithm implemented in the R (75) package ape (76). To generate bootstrap values for each node, we randomly sampled 10,000 SNPs 500 times.
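The distance and resampling steps can be illustrated as follows. This sketch assumes that the distance in question is Nei's standard genetic distance applied to biallelic SNP frequencies, and that each bootstrap replicate draws SNPs without replacement; both are assumptions, as are the function names. Tree building itself (neighbor-joining in ape) is not reproduced here.

```python
import math
import random

def nei_distance(p, q):
    """Nei's standard genetic distance for biallelic SNPs.

    p, q: alternate-allele frequencies in two populations, same SNP order.
    """
    n = len(p)
    j_x = sum(a * a + (1 - a) ** 2 for a in p) / n    # mean homozygosity, pop X
    j_y = sum(b * b + (1 - b) ** 2 for b in q) / n    # mean homozygosity, pop Y
    j_xy = sum(a * b + (1 - a) * (1 - b) for a, b in zip(p, q)) / n
    return -math.log(j_xy / math.sqrt(j_x * j_y))

def bootstrap_distances(p, q, n_snps, n_reps, seed=0):
    """Recompute the distance on n_reps random subsets of n_snps SNPs."""
    rng = random.Random(seed)
    indices = range(len(p))
    distances = []
    for _ in range(n_reps):
        pick = rng.sample(indices, n_snps)  # assumed: sampling without replacement
        distances.append(nei_distance([p[i] for i in pick],
                                      [q[i] for i in pick]))
    return distances

pop_a = [0.1, 0.5, 0.9, 0.3]
pop_b = [0.9, 0.5, 0.1, 0.4]
print(nei_distance(pop_a, pop_b))
print(len(bootstrap_distances(pop_a, pop_b, n_snps=3, n_reps=5)))
```

In the analysis described above, each replicate would use 10,000 SNPs and there would be 500 replicates, with a tree rebuilt from the full pairwise distance matrix each time.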
Estimation of the proportion African and European ancestry
We obtained maximum-likelihood estimates of the proportion of African and European ancestry in North American and Australian populations. For these estimates, we modeled each North American and Australian population as a linear combination of African and European populations. Note, for the Pennsylvanian population, we pooled together flies collected over the course of three years in either the spring or fall.
To estimate the proportion of African ancestry in each sampled population, we maximized the likelihood,
where α is the estimated proportion of African ancestry (and 1 − α the proportion of European ancestry); x_ij is the count of alternate (non-reference) reads at SNP i in North American or Australian population j; nEff_ij is the number of effective reads at SNP i in population j; fq_i,Af and fq_i,Eu are the observed allele frequencies of SNP i averaged over all sub-populations in each of the African and European continents, respectively; and n is the number of SNPs under investigation. We define nEff_ij, the effective number of reads at SNP i in population j, as,
where rd_i is the number of reads covering the ith SNP and nChr_j is the number of chromosomes sampled from the jth population. We use the effective number of reads rather than read depth to account for the double binomial sampling that can occur during pooled resequencing, which can lead to inflated precision unless this correction is applied. α was estimated by maximizing this likelihood function using the optimize procedure in R. In order to generate confidence intervals of α, we performed bootstrap resampling by randomly sampling 500 sets of 10,000 SNPs.
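The estimation procedure described above can be sketched as follows. This is a minimal illustration, assuming that read counts at each SNP are binomially distributed with expected frequency α·fq_Af + (1 − α)·fq_Eu and that the effective read number takes the form (rd × nChr − 1)/(rd + nChr). Both forms, along with all function names, are assumptions made for this sketch, and the golden-section search merely stands in for a one-dimensional optimizer such as R's optimize.

```python
import math

def n_eff(read_depth, n_chr):
    """Effective number of reads (assumed form: (rd*nChr - 1) / (rd + nChr))."""
    return (read_depth * n_chr - 1.0) / (read_depth + n_chr)

def log_likelihood(alpha, alt_freq, n_eff_reads, fq_af, fq_eu, eps=1e-9):
    """Binomial log-likelihood of alpha, up to a constant not involving alpha."""
    total = 0.0
    for f, ne, af, eu in zip(alt_freq, n_eff_reads, fq_af, fq_eu):
        p = alpha * af + (1.0 - alpha) * eu   # expected frequency under admixture
        p = min(max(p, eps), 1.0 - eps)       # keep log() finite at 0 and 1
        x = f * ne                            # alternate reads on the effective scale
        total += x * math.log(p) + (ne - x) * math.log(1.0 - p)
    return total

def golden_section_max(func, lo, hi, tol=1e-6):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2.0

def estimate_alpha(alt_freq, n_eff_reads, fq_af, fq_eu):
    """Maximum-likelihood proportion of African ancestry on [0, 1]."""
    return golden_section_max(
        lambda a: log_likelihood(a, alt_freq, n_eff_reads, fq_af, fq_eu),
        0.0, 1.0)

# Toy data: observed frequencies equal their expectation at alpha = 0.3.
fq_af = [0.1, 0.8, 0.3, 0.6]
fq_eu = [0.7, 0.2, 0.5, 0.1]
obs = [0.3 * af + 0.7 * eu for af, eu in zip(fq_af, fq_eu)]
depths = [n_eff(40, 44)] * len(obs)
print(round(estimate_alpha(obs, depths, fq_af, fq_eu), 3))
```

Because the expected frequency is linear in α, this log-likelihood is concave in α, so a bracketing search on [0, 1] is sufficient. Confidence intervals would come from repeating the estimate on resampled sets of SNPs.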
Formal tests of admixture
We used the f3 statistic (40) to test if North American and Australian populations show signatures of admixture between African and European populations. For each North American and Australian population, we calculated f3 using each European population as one putative donor population and each African population with more than 5 haplotypes as the other putative donor population. f3 statistics were calculated using TreeMix version 1.13 (77) with 500 bootstrap replicates sub-sampling one of every 500 SNPs.
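For orientation, the f3 statistic for a target population C and two putative donors A and B is the average over SNPs of (c − a)(c − b), where a, b, and c are allele frequencies; significantly negative values indicate that C is admixed between lineages related to A and B. The sketch below computes only this naive average. It omits the finite-sample bias correction and the block-resampled standard errors that a full implementation such as TreeMix provides, and the function name is hypothetical.

```python
def f3_naive(target, donor_a, donor_b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    No correction for finite sample size is applied, so this is an
    illustration of the statistic rather than an unbiased estimator.
    """
    n = len(target)
    return sum((c - a) * (c - b)
               for c, a, b in zip(target, donor_a, donor_b)) / n

# A target lying between the two donors at every SNP gives a negative value.
donor_a = [0.1, 0.8]
donor_b = [0.9, 0.2]
admixed = [0.5, 0.5]
print(f3_naive(admixed, donor_a, donor_b))

# A target lying outside the range of the donors gives a positive value.
outside = [0.0, 1.0]
print(f3_naive(outside, donor_a, donor_b))
```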
FST outlier test
We used OutFLANK (42) to test for the presence of polymorphisms with higher FST than expected by chance. Under an island model, the classic Lewontin-Krakauer test for FST outliers assumes that the distribution of FST is proportional to a χ² distribution with degrees of freedom equal to one less than the number of populations examined (78). The assumptions underlying this model have been criticized (79), and OutFLANK seeks to identify FST outliers by inferring the degrees of freedom of the observed FST distribution after trimming the distribution of high and low FST SNPs. This method has been shown to have a low false positive rate. We used OutFLANK to identify FST outliers among either North American populations (average pair-wise FST between samples collected in FL, GA, SC, NC, PA, and ME) or Australian populations (temperate vs. tropical) by trimming the top and bottom 5% of the observed FST distribution.
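The trimming step and the Lewontin-Krakauer scaling can be illustrated with a short sketch. It is not OutFLANK: here the degrees of freedom are fixed at the number of populations minus one, as in the classic test, whereas OutFLANK infers them from the trimmed distribution. The function names are hypothetical, and no p-values are computed.

```python
import math

def trimmed_core(fst_values, trim_fraction=0.05):
    """Drop the lowest and highest trim_fraction of FST values."""
    ordered = sorted(fst_values)
    k = math.floor(trim_fraction * len(ordered) + 1e-9)
    return ordered[k:len(ordered) - k]

def lk_scaled_statistics(fst_values, n_pops, trim_fraction=0.05):
    """Scale each FST as df * FST / mean(FST), with df = n_pops - 1.

    The mean is taken over the trimmed core so that extreme SNPs do not
    inflate it. Under the Lewontin-Krakauer assumptions the scaled values
    would be compared with a chi-square distribution with df degrees of
    freedom (comparison not done here).
    """
    core = trimmed_core(fst_values, trim_fraction)
    mean_fst = sum(core) / len(core)
    df = n_pops - 1
    return [df * f / mean_fst for f in fst_values]

# Toy FST values for 100 SNPs among six populations.
toy_fst = [i / 1000 for i in range(1, 101)]
scaled = lk_scaled_statistics(toy_fst, n_pops=6)
print(len(trimmed_core(toy_fst)), round(max(scaled), 2))
```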
Differentiation and rates of parallelism at various SNP classes
We tested if SNPs at neutral sites (short introns) or in various functional categories (Fig. 3) were more likely than expected by chance to be co-differentiated or to show parallel changes in allele frequency between temperate and tropical locales in both North America and Australia, conditional on them being co-differentiated. To assess rates of co-differentiation, we calculated the odds that SNPs fell above one of three FST quantile thresholds (85, 90, 95%) in both North America and Australia. We compared this value to the odds of co-differentiation from 500 sets of randomly selected SNPs that were matched to the focal SNPs by recombination rate (80), chromosome, inversion status (at the large, cosmopolitan inversions In(2L)t, In(2R)NS, In(3L)Payne, In(3R)K, In(3R)Payne, In(3R)Mo, In(X)A, and In(X)Be), average read depth in North America and Australia, and heterozygosity in both continents. To control for possible autocorrelation in signal along the chromosome, we divided the genome into non-overlapping 50 kb blocks and randomly sampled, with replacement, one SNP per block. To assess rates of parallelism, we calculated the fraction of SNPs that were significantly co-differentiated and varied in a parallel fashion between North America and Australia for each SNP class and their matched, genomic controls, again controlling for the spatial distribution of SNPs along the chromosome. We report the difference in rates of parallelism (Fig. 3B). Standard deviations of the log2(odds-ratio) of co-differentiation and of the differences in the rates of parallelism are calculated as in (36).
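The core quantities in this comparison can be sketched as below. The sketch covers the odds of co-differentiation, the fraction of parallel SNPs, the log2 odds ratio against a control set, and the one-SNP-per-block draw; the matching of control SNPs on recombination rate, inversion status, read depth, and heterozygosity is not shown. All function names and the toy values are hypothetical.

```python
import math
import random

def odds_codifferentiated(fst_na, fst_au, thr_na, thr_au):
    """Odds that a SNP exceeds the FST threshold in both continents."""
    hits = sum(1 for a, b in zip(fst_na, fst_au) if a > thr_na and b > thr_au)
    p = hits / len(fst_na)
    return math.inf if p == 1.0 else p / (1.0 - p)

def parallel_fraction(delta_na, delta_au):
    """Fraction of SNPs whose temperate-minus-tropical allele frequency
    difference has the same sign in both continents."""
    same = sum(1 for a, b in zip(delta_na, delta_au) if a * b > 0)
    return same / len(delta_na)

def log2_odds_ratio(odds_focal, odds_control):
    """log2 of the ratio of focal to control odds."""
    return math.log2(odds_focal / odds_control)

def one_snp_per_block(positions, block_size=50_000, rng=random):
    """Pick one SNP index at random from each non-overlapping block."""
    blocks = {}
    for idx, pos in enumerate(positions):
        blocks.setdefault(pos // block_size, []).append(idx)
    return [rng.choice(members) for members in blocks.values()]

# Toy example with four SNPs and an FST threshold of 0.1 in each continent.
fst_na = [0.20, 0.01, 0.30, 0.02]
fst_au = [0.25, 0.30, 0.01, 0.02]
print(odds_codifferentiated(fst_na, fst_au, 0.1, 0.1))
print(parallel_fraction([0.1, -0.2, 0.3], [0.2, 0.1, 0.4]))
print(one_snp_per_block([10, 20_000, 60_000, 120_000, 130_000]))
```

Repeating the block-wise draw many times for both the focal SNPs and their matched controls would give the sampling distribution from which the reported standard deviations could be derived.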
Acknowledgements
We thank Joyce Kao, Heather Machado, and Annalise Paaby for insightful comments on earlier versions of this manuscript. We thank David Lawrie for providing lists of neutrally evolving SNPs and Ary Hoffmann for graciously providing isofemale lines from Australia. AOB was supported by an NIH National Service Research Award (F32 GM097837). JG is a Ramon y Cajal fellow (RYC-2010-07306) supported by grants from the European Commission (Marie Curie CIG PCIG-GA-2011-293860) and from the Spanish Government (Fundamental Research Projects Grant BFU-2011-24397). This work was supported by NSF DEB 0921307 (awarded to PS) and NIH R01GM089926 (awarded to PS and DP).