Abstract
Changes in gene regulation at multiple levels may comprise an important share of the molecular changes underlying adaptive evolution in nature. However, few studies have assayed within- and between-population variation in gene regulatory traits at a transcriptomic scale, and therefore inferences about the characteristics of adaptive regulatory changes have been elusive. Here, we assess quantitative trait differentiation in gene expression levels and alternative splicing (intron usage) between three closely-related pairs of natural populations of Drosophila melanogaster from contrasting thermal environments that reflect three separate instances of cold tolerance evolution. The cold-adapted populations were known to show population genetic evidence for parallel evolution at the SNP level, and here we find significant although somewhat limited evidence for parallel expression evolution between them, and less evidence for parallel splicing evolution. We find that genes with mitochondrial functions are particularly enriched among candidates for adaptive expression evolution. We also develop a method to estimate cis- versus trans-encoded contributions to expression or splicing differences that does not rely on the presence of fixed differences between parental strains. Applying this method, we infer important roles of both cis- and trans-regulation among our putatively adaptive expression and splicing differences. The apparent contributions of cis- versus trans-regulation to adaptive evolution vary substantially among population pairs, with an Ethiopian pair showing pervasive trans-effects, suggesting that basic characteristics of regulatory evolution may depend on biological context. These findings expand our knowledge of adaptive gene regulatory evolution and our ability to make inferences about this important and widespread process.
Introduction
Different species or populations often evolve similar phenotypes when adapting to similar environments (Schluter 2000; Losos, 2011). Although such parallel phenotypic evolution can be caused by structural mutations changing amino acids (Hoekstra and Coyne, 2007), there is increasing evidence that regulatory mutations altering gene expression underlie many cases of phenotypic evolution (Wittkopp & Kalay, 2012; Jones et al. 2012; Stern 2013; Sackton et al. 2019). Most studies on gene expression focus on expression abundance (the number of transcripts for a whole gene). However, alternative splicing resulting in the difference of transcript proportions can also contribute to adaptation (Barbosa-Morais et al. 2012; Gamazon and Stranger 2014; Smith et al. 2018). Therefore, understanding the transcriptomic basis of parallel phenotypic evolution requires studies of both expression abundance and alternative splicing, although the latter aspect has rarely been studied.
The level of parallelism for gene expression abundance changes varies across study systems. In some taxa and natural conditions, significantly more genes show parallel changes (repeatedly up- or down-regulated in one ecotype relative to the other among independent population pairs) than anti-directional changes (Zhao et al. 2015; Hart et al. 2018; Kitano et al. 2018; McGirr and Martin 2018). However, other cases did not show significant parallel patterns, or even showed anti-parallel patterns (Derome et al. 2006; Lai et al. 2008; Hanson et al. 2017). The varying degree of parallelism may partly be explained by the level of divergence among ancestors: more closely related ancestors are expected to show a higher degree of parallel genetic evolution underlying similar phenotypic evolution (Conte et al. 2012; Rosenblum et al. 2014).
Furthermore, gene expression evolution can be caused by the same or different molecular underpinnings. Because of the difficulties of mapping expression QTL, a first step is to classify the expression evolution into two regulatory classes. Cis-regulatory changes are caused by local regulatory mutations and result in allele-specific expression differences in a hybrid of divergent parental lines (Cowles et al. 2002; Wittkopp et al. 2004). Trans-regulatory changes are caused by regulatory mutations at other loci. They modify the expression of both alleles in hybrid diploids and do not result in allele-specific expression differences (Gilad et al. 2008). The relative importance of cis- and trans-effects to parallel evolution varies among study systems (Wittkopp et al. 2008; McManus et al. 2010; Wittkopp and Kalay 2012; Chen et al. 2015; Osada et al. 2017; Hart et al. 2018; Nandamuri et al. 2018; Verta and Jones 2019). Most previous studies have focused on regulatory evolution between relatively distantly related lineages such as different species, from which population genetic evidence of adaptive evolution may not be available. Hence, the contributions of cis- and trans-effects to the recent adaptive divergence between populations remain mostly unknown.
In part because of the interest in the evolutionary response to climate change, Drosophila has been used as a model system to study the genetic basis of thermal adaptation (Hoffmann et al. 2003). Because temperature is an important environmental variable along latitudinal clines, clinal populations of Drosophila melanogaster have been studied for decades (Adrion et al. 2015). Along these clines, populations exhibit different degrees of cold tolerance in the expected direction, suggesting spatially varying selection related to temperature (Hoffmann and Weeks 2007; Schmidt and Paaby 2008). The recent development of genomics has allowed identification of clinal genomic variants, which are candidates for thermal adaptation (e.g., Kolaczkowski et al. 2011; Fabian et al. 2012; Bozicevic et al. 2016; Mateo et al. 2018). There is also evidence of parallel evolution at the genomic and transcriptomic level (Reinhardt et al. 2014; Bergland et al. 2015; Machado et al. 2015; Zhao et al. 2015; Juneja et al. 2016; Zhao and Begun 2017). Some of these studies compared clines between species (which may have somewhat distinct biology), while others compared clines between Australia and North America (which both feature primarily European ancestry with clinally variable African admixture). Other transcriptomic studies have identified genes showing differential expression between sub-Saharan African and European populations (e.g., Catalan et al. 2012; Huylmans and Parsch 2014), which are separated by moderately strong neutral genetic differentiation associated with the out-of-Africa bottleneck.
More broadly, populations of Drosophila melanogaster from contrasting environments offer an excellent opportunity to study parallel gene regulatory evolution and its underlying mechanisms. Originating from a warm sub-Saharan ancestral range (Lachaise et al. 1988; Pool et al. 2012), D. melanogaster has occupied diverse habitats, including environments with contrasting temperature ranges. There are at least three instances in which the species expanded to cold environments: from Africa into higher latitude regions in Eurasia, from the Ethiopian lowlands to higher altitudes, and from the South African lowlands to higher altitudes. Populations were collected from these six regions, representing three warm-cold population pairs: a Mediterranean pair (Med), collected in Egypt (EG, warm) and France (FR, cold); an Ethiopian pair (Eth), collected in the Ethiopian lowlands (EA, warm) and highlands (EF, cold); and a South African pair (SAf), collected in the South African lowlands (SP, warm) and highlands (SD, cold). Importantly, each of these population pairs has the advantage of low genetic differentiation between its warm- and cold-adapted members (Pool et al. 2017). Although the cold populations have occupied colder habitats for only ~1000-2000 years (~15k-30k generations) (Sprengelmeyer et al. 2019), they have shown signals of parallel adaptation for cold tolerance and allele frequency changes (Pool et al. 2017). In the present study, this unique system allows us to assess the degree of parallelism for transcriptomic changes underlying parallel cold tolerance evolution.
Here, we generate RNA sequencing (RNA-seq) data for multiple outbred genotypes from each of the six population samples listed above, from larval, pupal, and adult stages. We estimate gene expression and alternative intron usage levels for each sample, then identify cases of unusually high quantitative trait differentiation between each pair of warm- and cold-adapted populations. We find evidence for parallel evolution for expression abundance at the larval and female adult stages, but less parallel signal for splicing. We further tease apart cis- vs. trans-regulatory effects by sequencing the transcriptomes of parental lines from different populations and their F1 offspring. Applying our resampling approach to study cis- and trans-regulatory effects, we find that the relative contribution of cis- vs. trans-effects to adaptive expression differentiation varies notably across population pairs. Finally, we identify several candidate genes with both cis-effects and high FST as potential targets of local adaptation.
Results
Phenotypic evolution related to cold adaptation
The cold populations have been shown to have a higher proportion of recovered female adults after prolonged cold exposure than the respective warm populations (Pool et al. 2017). Here for egg-to-adult survival at 15°C, we found the FR and EF populations have significantly higher survival than the ancestral range Zambia ZI population, while at 25°C benign temperature all the populations have relatively high survival (75%). Although SD is not significantly better than ZI at cold temperature for this assay, it follows the same trend. Together the results for survival and adult cold tolerance suggest the cold populations have evolved to adapt to low temperature.
Co-directional evolution in gene expression between population pairs
To focus on the transcriptomes of outbred genotypes, we generated eight within-population crosses from each population under a derived cold environment (15 °C). We then surveyed the transcriptomes on larvae, pupae and female adults for each cross using high-throughput RNA sequencing (RNA-Seq). We used a quantitative genetic index, PST, to quantify phenotypic differentiation of expression and splicing between populations in each pair. PST, analogous to FST for genetic variation, measures the amount of trait variance between populations versus total variance for a phenotype (Merila et al. 1997; Brommer 2011; Leinonen et al. 2013). The genes/introns with highest PST quantiles are more likely to be under ecological differential selection between populations than those with lower PST quantiles (Leder et al. 2015).
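As an illustration of the PST index described above, the following is a minimal sketch using the commonly cited form PST = (c/h²)σ²B / ((c/h²)σ²B + 2σ²W) (e.g., Brommer 2011). The exact estimator is specified in Materials and Methods rather than in this section, so the variance-component estimates used here (variance of the two population means for σ²B, mean within-population variance for σ²W), the default c/h² = 1, and the function name `pst` are assumptions made only for illustration.

```python
from statistics import mean, variance


def pst(pop_a, pop_b, c_over_h2=1.0):
    """Sketch of PST for one trait (e.g., expression of one gene) in two populations.

    pop_a, pop_b: lists of trait values, one per cross (at least two values each).
    c_over_h2: assumed ratio c/h^2 scaling the between-population component.
    """
    # Between-population component: variance of the two population means
    # (a simplification; an ANOVA-based estimate could be used instead).
    var_between = variance([mean(pop_a), mean(pop_b)])
    # Within-population component: average of the two sample variances.
    var_within = mean([variance(pop_a), variance(pop_b)])
    denom = c_over_h2 * var_between + 2.0 * var_within
    if denom == 0:
        return 0.0  # no variation at all: treat as no differentiation
    return c_over_h2 * var_between / denom


# Toy example with hypothetical expression values for a warm and a cold population.
warm = [1.0, 2.0, 3.0]
cold = [11.0, 12.0, 13.0]
print(pst(warm, cold))
```

Values near 1 indicate that most trait variance lies between populations, while values near 0 indicate that variation is mostly within populations.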
The numbers of genes that passed the filters for analysis were (same across population pairs): 4699 genes for larva, 5098 genes for pupa and 6786 genes for adult. To study gene expression divergence potentially under ecologically differential selection, we calculated PST (Materials and Methods). The top 20 PST outliers for each population/stage for expression and for splicing are listed in Table S1.
We used the upper 5% of PST quantile as outliers for each population pair. We found signals of parallel expression divergence in all three pairwise comparisons (Med vs. Eth; Med vs. SAf; Eth vs. SAf), where the shared outliers with co-directional changes were more than expected by chance. Across the three developmental stages, adult stage showed the highest level of parallelism (on average 0.34% of outliers were shared and changed consistently).
To explore the broader patterns of parallel changes, we used the upper 5% PST outliers in a population pair (Outlier Pair) and examined whether the expression for this set of genes changed in the same direction in another pair (Directional Pair), regardless of outlier status in the latter pair. There were excesses of co-directional changes in the Directional Pairs for the larval stage (Figure 1). However, the patterns were weaker for the adult stage and there were excesses of anti-directional changes for the pupal stage.
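The Outlier Pair / Directional Pair comparison can be sketched as follows. This is a hedged illustration, not the analysis pipeline itself: the function names, the dictionary-based inputs, and the use of the cold-minus-warm mean difference as the direction of change are assumptions, and the control comparison used to assess significance is not reproduced here.

```python
def top_outliers(pst_by_gene, frac=0.05):
    """Return the set of genes in the upper `frac` of PST values."""
    n_top = max(1, int(len(pst_by_gene) * frac))
    ranked = sorted(pst_by_gene, key=pst_by_gene.get, reverse=True)
    return set(ranked[:n_top])


def codirectional_fraction(outliers, diff_outlier_pair, diff_directional_pair):
    """Fraction of outlier genes changing in the same direction in a second pair.

    diff_*: dict mapping gene -> (cold mean - warm mean) in that population pair.
    Genes missing from the Directional Pair are skipped.
    """
    shared = [g for g in outliers if g in diff_directional_pair]
    if not shared:
        return None
    same_sign = sum(
        1 for g in shared if diff_outlier_pair[g] * diff_directional_pair[g] > 0
    )
    return same_sign / len(shared)


# Toy example with hypothetical gene names and values.
pst_med = {"g1": 0.9, "g2": 0.8, "g3": 0.2, "g4": 0.1}
diff_med = {"g1": 1.5, "g2": -0.7, "g3": 0.1, "g4": -0.2}
diff_eth = {"g1": 0.4, "g2": 0.3, "g3": -0.1, "g4": -0.5}

outliers = top_outliers(pst_med, frac=0.5)  # frac inflated for the toy data
print(codirectional_fraction(outliers, diff_med, diff_eth))
```

In the toy data, g1 changes in the same direction in both pairs and g2 does not, so half of the outliers are co-directional.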
We also performed a similar analysis for PST outliers of alternative exon junction usage. The numbers of exon junctions that passed the cutoffs for PST calculation were 976 for larva, 4604 for pupa and 7059 for adult. The patterns of co-directional changes were qualitatively similar to those for gene expression (Fig. S1). The fractions of co-directional changes were still highest for the larvae among the three stages; all of the comparisons except one showed an excess of co-directional changes relative to the control comparisons. For pupae, there was evidence for both co-directional and anti-directional changes. For female adult stages, the major pattern was an excess of anti-directional changes.
Enriched functional categories for the PST outliers for gene expression and exon usage
Significant Gene Ontology (GO) terms enriched in different sets of PST outliers for gene expression are listed in Table S2. Among the significant GO terms for different population pairs, we found six terms shared between Med pair and Eth pair at the adult stage. The level of sharing is significantly more than we expect by chance based on permuted outlier sets (p < 0.001), suggesting functional convergence for adult development to the cold environment for Med pair and Eth pair. Further, similar GO terms were identified from different pairs at different stages such as terms related to mitochondria, nucleoside metabolic process, and oxidoreductase complex. However, the majority of GO terms were unique for different pairs, suggesting that many functional changes for adaptation to cold environments may be population-specific.
Cis- and trans-acting contributions to differential gene expression abundance
One major goal is to distinguish the contributions of cis- and trans-regulatory effects on expression differentiation. First, we compared the overall strengths of cis- and trans-effects by estimating the absolute values of cis- and trans-effects for all analyzed genes. The magnitudes of trans-effects are significantly larger than the cis-effects in all three population pairs (mean absolute cis effects and trans effects are: Med pair, 0.09 vs. 0.14, p < 2.2e-16; Eth pair, 0.27 vs. 0.32, p =1.5e-14; SAf pair, 0.14 vs. 0.15, p < 2.2e-16. ‘Mann-Whitney’ paired test.). Moreover, we found strong negative relationships between cis- and trans-effects within each population pair (Fig. S2), where the cis- and trans-effects are generally in the opposite directions.
Next, we used our conservative permutation approach (see Materials and Methods) to study how many genes show a significant cis-effect, trans-effect or both. Averaged across population pairs, we found that for expression abundance, 12.6% of genes show cis only regulatory effects while 26.2% show trans only effects, consistent with trans-effects being stronger on average than cis-effects (Table 2). Because we are interested in the regulatory contributions to adaptive evolution of gene expression, we further compared the ratio of trans only to cis only genes between PST outliers and non-outliers. The ratio is significantly lower in PST outliers than in non-outliers for Med pair (p = 0.003), but the pattern reverses for Eth pair (p = 8.5e-10; Fig. 3 left; Fig. 4), while the ratio does not differ for SAf pair (p = 0.999). Hence, there is not a consistent pattern of greater usage of cis- versus trans-regulatory changes in putatively adaptive expression changes compared with transcriptome-wide differentiation.
On average across population pairs, about 31% of all genes in the analysis showed both effects (Table 2). Among the outlier genes showing both effects (Fig 4), the vast majority (85%) of them were in opposite directions (i.e. compensatory). Similarly, most of the control genes with both effects showed apparent compensation (88%), which is consistent with the transcriptome-wide negative relationship between cis- and trans-effects (Fig. S2). Although the pattern can be biologically meaningful, it may also represent an artifact from using the same F1 expression data for allele specific expression (ASE) estimation to infer both cis- and trans-effects. Any measurement error on ASE will introduce artifactual negative correlation between cis- and trans-acting changes (see Discussion below).
Since the cis-regulatory mutations contributing to local adaptation may show differentiation in allele frequency between populations, we examined whether genes with cis-effects (including cis only genes and genes with both cis- and trans-effects) show association with high FST between the two warm and cold populations – for both window FST (FST_winmax) and SNP FST (FST_SNPmax). We found that only genes with high FST_winmax are enriched in cis-regulated genes in Med pair (the proportion for high FST_winmax is: cis-effect genes, 22.5%; control non-outliers, 11%; p = 0.037). However, there was no significant enrichment for high FST_SNPmax in cis-regulated genes. Moreover, there was no enrichment for either window or FST_SNPmax with cis-regulated genes in the other population pairs.
We then focused more narrowly on a set of outlier genes that showed both a significant cis-effect only and a high FST quantile (upper 5%), which could reflect adaptive regulatory evolution targeting the surveyed sequences or nearby sites. For Med pair, there were three cis-genes showing high window FST (Ciao1, Cyp6a17, and NiPp1) and one cis-gene showing high FST_SNPmax (spin). Interestingly, Cyp6a17 encodes a cytochrome P450 protein that is required for temperature preference behavior (Kang et al. 2011). Cyp6a17 variants have also been associated with insecticide resistance (Battlay et al. 2018; Duneau et al. 2018). Cyp6a17 is impacted by a polymorphic whole-gene deletion with contrasting frequencies between populations (Chakraborty et al. 2018), underscoring its likely role in local adaptation. The spin gene is essential for mTOR reactivation and lysosome reformation after starvation and has important effects on the nervous system and courtship behavior (Nakano et al. 2001; Rong et al. 2011). For Eth pair, there were two genes with high window FST (CG3529 and mle) and one with high FST_SNPmax (Aldh-III), which encodes a protein that confers xenobiotic stress resistance and neutralises the lipid aldehydes formed after the attack of reactive oxygen and radicals (Arthaud et al. 2011; Mateo et al. 2014). For SAf pair, one cis-gene showed both high window FST and high FST_SNPmax (AGO2) and one showed high FST_SNPmax (eca). AGO2 is involved in antiviral defense and developmental regulation (Deshpande et al. 2005; Nayak et al. 2010) and was previously found to contain fixed differences between European and African populations (Pool 2015). For the genes showing high FST_SNPmax (spin, Aldh-III, AGO2, and eca), we plotted the SNP FST along the gene region to show the sites that are the most likely targets of selection (Fig. 5).
Interestingly, for spin, Aldh-III and eca, the highest FST sites are located in noncoding regions (in an intron for spin, downstream of the gene for Aldh-III and upstream of the gene for eca), whereas for AGO2, the highest FST site is located in the protein coding sequence.
Further, we identified seven genes showing consistent cis-effects across two population pairs (cis-effects favored expression of the same cold or warm parental alleles). Such shared cis-effect genes might also show high genetic differentiation specific to the cold populations in the two focal pairs. Using the “Population Branch Excess” statistic (PBE) results from Pool et al. 2017, we found that one gene, Tollo, contained SNPs showing high cold-population specific differentiation (PBE quantile < 0.05) in both Eth and SAf pairs. Tollo is known to have several important functions: innate immune response, glucose and protein metabolism regulation, and peripheral nervous system development (Seppo et al. 2003; Yagi et al. 2010; Akhouayri et al. 2011; Ballard et al. 2014).
Cis- and trans-acting contributions to differential intron usage
For all intron usage, we found the magnitude of trans-effects on average to be higher than that of cis-effects (mean absolute cis effects and trans effects are: Med pair, 0.15 vs. 0.18, p = 6.2e-06; Eth pair, 0.32 vs. 0.34, p < 3.3e-5; SAf pair, 0.20 vs. 0.22, p = 0.00027. ‘Mann-Whitney’ paired test.). Although few outlier introns could be tested for cis- and trans-regulatory effects (Table 3) because of the limited number of diagnostic SNPs located within the intron regions, we found that, summing across the three population pairs, significant trans only introns outnumbered significant cis only introns among outliers, whereas among non-outlier introns, significant trans only introns were fewer than significant cis only introns (the numbers of cis only vs. trans only introns are 10 vs. 13 for outliers; 311 vs. 132 for non-outliers; X2 = 6.1; df = 1; P = 0.014). Thus, trans-regulated splicing changes appear to be relatively more common for putatively adaptive than for putatively neutral population differences, although the pattern varies geographically (Figure 3).
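The 2×2 comparison of cis only versus trans only counts between outlier and non-outlier introns can be checked with a short calculation. The sketch below computes a chi-square statistic for a 2×2 table in pure Python. The text does not state whether a continuity correction was applied; applying Yates' correction (an assumption here) gives a statistic close to the reported X2 = 6.1, whereas the uncorrected statistic is larger. The function name `chi2_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    yates=True applies Yates' continuity correction (an assumption here).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            dev = abs(observed[i][j] - expected)
            if yates:
                dev = max(0.0, dev - 0.5)
            stat += dev * dev / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Rows: outlier and non-outlier introns; columns: cis only, trans only.
stat, p = chi2_2x2(10, 13, 311, 132)
print(round(stat, 2), round(p, 3))
```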
For the outlier introns showing cis-effects (including only cis and both cis and trans), the maximum FST (FST_SNPmax) around their splice sites tends to be higher than that for non-outliers (average FST_SNPmax for cis outlier vs. non-outliers: Med: 0.184 vs. 0.153; Eth: 0.152 vs. 0.134; SAf: 0.081 vs. 0.055). Because there are few cis outlier introns with SNPs located around splice sites, all three comparisons are non-significant based on Wilcoxon signed-rank test. Across the three comparisons, four genes contained cis-regulated introns with high FST_SNPmax around splice sites (top 15% quantile of FST_SNPmax). One identified in Med pair is Usp10, which is known to regulate Notch Signaling during development (Zhang et al. 2012). One gene identified in SAf pair is Sdc, which has been shown to have neuromuscular functions (Johnson et al. 2006; Chanana et al. 2009). The other two genes (DOR and Jabba) were related to lipid metabolism (Francis 2010; McMillan et al. 2018). Since the two lipid related genes were identified in highland pairs (DOR in SAf and Jabba in Eth), putative changes in lipid metabolism might facilitate adaptation to high altitude environments.
Discussion
Parallel evolution has often been studied at the population genetic and trait levels, but it has less frequently been analyzed at the transcriptome level (Stern 2013; Juneja et al. 2016). In this study, we used three recent instances of adaptation to colder climates in Drosophila melanogaster to study the evolution of gene expression and alternative splicing. The signal of parallel evolution in expression abundance varied among developmental stages, with a higher degree of parallelism for larva and adult stages than pupa. Further, we studied cis- and trans-regulatory evolution in the context of this ecological adaptation. For gene expression abundance, we found geographically variable patterns of cis- versus trans-effects for highly differentiated expression outliers relative to the other genes. Specifically, PST outliers show enrichment of cis-effects relative to background genes in Med pair, while outliers show enrichment of trans-effects in Eth pair. For splicing, we also found PST outliers enriched for trans-effects in Eth pair. This pattern of trans-effects contributing to differential expression in the Ethiopian pair raises the possibility of large-scale gene regulatory network changes in this phenotypically distinctive highland population, which might result from a few genetic changes or from many.
Although there are significant patterns of parallel evolution in expression abundance between population pairs, the majority of outlier genes/intron usages are not shared between pairs. The low level of detected parallelism could reflect a high false negative rate, for example due to limited spatiotemporal expression of relevant differences (perhaps contributing to the greater parallelism detected in larvae, which have somewhat less tissue diversity). Alternatively, it might reflect the different selection agents in the different natural habitats as well as the demographic histories of these populations. The cold FR population colonized a higher latitude environment than the related warm population EG, whereas the other two cold populations colonized higher altitude environments where the selection agents may include air pressure, desiccation and ultraviolet radiation (Pool et al. 2017). Also, the Med pair has experienced the trans-Saharan bottleneck (Pool et al. 2012; Sprengelmeyer et al. 2019) and its standing genetic variation may have been altered, potentially resulting in a distinct evolutionary path for FR compared to the other two cold populations. Although EF and SD have both adapted to higher altitudes (EF at 3,070 meters above sea level, SD at 2,000), SD is seasonally cold (like FR) whereas EF is perpetually cool. Notably, the EF population exhibits distinct phenotypic evolution such as darker pigmentation (Bastide et al. 2014), larger body size (Pitchers et al. 2013; Lack et al. 2016), and reduced reproductive rate (Lack et al. 2016). Therefore, the underlying transcriptomic evolution for EF may partly reflect its unique phenotypic evolution. Indeed, the Eth pair shows the least parallelism at the gene level with the other pairs (Table 1; Fig. 2), although it shared some parallel functional categories with the Med pair (Table S1).
Compared to expression abundance, the pattern of parallelism is much weaker for intron usage (Fig. 2, Fig S1), which may partly stem from lower power to detect intron usage change (only a small proportion of reads are informative for exon junctions). However, we still found that the Med pair and SAf pair show more parallel changes than the combinations with the Eth pair, which is consistent with results for expression abundance. Given the increasing evidence for alternative splicing contributing to environmental response and adaptation (e.g., Singh et al. 2017; Signor and Nuzhdin 2018; Smith et al. 2018), we need to study both expression abundance and splicing to fully understand evolution at the transcriptome level. The development of sequencing approaches with long reads that cover entire transcripts will enable us to quantify isoform frequencies directly and broaden the scope of alternative splicing variation that can readily be quantified. Since splicing changes during development and among tissues (Brown et al. 2014; Gibilisco et al. 2016), detailed sampling of different tissues throughout development will also be necessary to understand the role of splicing in ecological adaptation.
We found trans-effects are generally larger than the cis-effects across the transcriptome, which is consistent with some previous studies (e.g., McManus et al. 2010; Coolon et al. 2014; Albert et al. 2018; Hart et al. 2018) but not with others (e.g., Lemmon et al. 2014; Mack et al. 2016; Verta and Jones 2018). The transcriptome-wide stronger trans-effects can be caused by random regulatory changes biased toward trans-regulation because of the larger trans-mutational target size (Landry et al. 2007). To focus on the evolved changes related to adaptation, we compared the ratios of genes with trans-effects to those with cis-effects between PST outliers and non-outliers and saw patterns varied among population pairs (Fig. 3). Cis only genes are enriched in the outliers of Med pair while trans only genes are enriched in the outliers of Eth pair, suggesting different adaptive regulatory mechanisms responding to ecological shifts. These results suggest that both cis- and trans-acting expression changes may be viable mechanisms of adaptive evolution. For intron usage, we found more differences showing cis-effects than trans-effects across the transcriptome (Table 3), consistent with splicing differences between Drosophila species studied by McManus et al. 2014. These results may be unsurprising since alternative splicing in Drosophila is mostly regulated by nearby sequences (Venables et al. 2011; Kurmangaliyev et al. 2015). However, particularly for the Ethiopian pair, we observed a relative excess of trans-regulation among PST outliers, which is consistent with expression abundance results for this same population pair (Fig. 3). Therefore, the genetic basis of gene regulatory evolution may depend on the mechanism (e.g. transcription vs. splicing), the evolutionary scale, and population-specific evolutionary events.
When we considered genes/introns showing both cis- and trans-effects, we observed that the two types of effects were generally in opposite directions (anti-directional; Table 3). This is consistent with the idea that gene expression is under stabilizing selection in general and gene regulatory networks evolve negative feedback to buffer effects of regulatory changes (Denby et al. 2012; Coolon et al. 2014; Bader et al. 2015; Fear et al. 2016). With regard to our PST outliers, it is possible that cis-acting changes might have evolved to compensate for unfavorable pleiotropic impacts of adaptive trans-regulatory evolution. However, negative correlations between cis- and trans-effects can also be an artifact of measurement error in the F1 expression data. Because the F1 data were used to estimate ASE, which was then compared both to 0.5 (cis-effect null) and to the parental expression proportion (trans-effect null), measurement error will introduce an artifactual negative correlation between cis- and trans-acting changes. Therefore, determining whether the opposing effects of cis- and trans-acting changes are biologically meaningful will require further study. As Fraser (2019) and Zhang and Emerson (2019) proposed, using independent F1 replicates or other approaches such as eQTL mapping to infer cis- and trans-effects separately is necessary to affirm evidence of compensatory evolution.
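The artifact described in the paragraph above can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the estimator used in this study: the cis-effect is taken as the F1 allelic proportion minus 0.5, the trans-effect as the parental expression proportion minus the F1 allelic proportion, and all genes are simulated with no true cis- or trans-effect and with independent Gaussian measurement noise of equal size on both measurements. The function names and noise parameters are hypothetical.

```python
import random
from math import sqrt


def cis_trans_estimates(f1_cold_fraction, parental_cold_fraction):
    """Simplified estimates from cold-allele proportions (an assumed form).

    cis:   deviation of F1 allele-specific expression from the 0.5 null.
    trans: parental expression proportion not explained by F1 ASE.
    """
    cis = f1_cold_fraction - 0.5
    trans = parental_cold_fraction - f1_cold_fraction
    return cis, trans


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_null(n_genes=2000, noise_sd=0.05, seed=1):
    """Correlation of estimated cis and trans effects when no true effects exist."""
    rng = random.Random(seed)
    cis_hat, trans_hat = [], []
    for _ in range(n_genes):
        f1 = 0.5 + rng.gauss(0.0, noise_sd)        # noisy F1 ASE measurement
        parental = 0.5 + rng.gauss(0.0, noise_sd)  # noisy parental proportion
        c, t = cis_trans_estimates(f1, parental)
        cis_hat.append(c)
        trans_hat.append(t)
    return pearson(cis_hat, trans_hat)


print(simulate_null())
```

Because the same noisy F1 measurement enters the cis estimate with a positive sign and the trans estimate with a negative sign, the two estimates are negatively correlated even though no gene has a true effect. With equal noise on both measurements, the expected correlation in this toy model is -1/√2, or about -0.71.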
We expect that the adaptive expression divergence caused by cis-regulatory changes should leave a signal in the nearby genomic region. Therefore, we used FST statistics to quantify genetic differentiation for the region around the focal genes. Window FST is sensitive to classic hard sweeps, and relatively useful for incomplete sweeps and moderately soft sweeps, but it is less useful for soft sweeps with higher initial frequencies of the beneficial allele (Lange and Pool 2016), for which SNP FST may be more sensitive. Here, we only found enrichment of window FST outliers in cis-effect genes for the Med pair. Interestingly, a previous genomic study on these populations found a stronger signal of parallel change for SNP FST than for window FST genome-wide (Pool et al. 2017). In light of the lack of elevated SNP FST among our cis-regulatory PST outliers, the previously-observed population genetic parallelism may primarily reflect changes other than the cis-regulatory events identified from our whole-organism RNAseq data.
Methods and Materials
Ecologically and phenotypically differentiated populations
The three Drosophila melanogaster cold-warm population pairs used in this study, France-Egypt (Med), Ethiopia (Eth) and South Africa (SAf), were described in previous publications (Pool et al. 2012; Lack et al. 2015; Pool 2017). The three cold derived populations have evolved increased cold tolerance in parallel. A previous study has shown that female adults from the cold populations were more likely to recover after 96 hours at 4 °C than the respective warm populations (Pool et al. 2017). Here to confirm increased cold tolerance for the cold populations for egg-to-adult survival, we selected three strains from each of the FR, EF and SD populations as well as from the ancestral warm population ZI as control.
Developmental success was assayed at 15 °C as the cold environment and 25 °C as the warm control environment. Forty mated female flies were allowed to lay eggs in a half pint glass milk bottle with standard medium at room temperature for 15 hours. Each strain had ~8 bottles. After the flies were removed and the number of eggs was counted, about half of the bottles were incubated in the warm environment and the other half in the cold environment. The numbers of adult flies that emerged from each bottle were counted after 14 days and 42 days for the warm and cold environments, respectively. Bottles with more adults than recorded eggs were scored as 100% survival. Developmental success for each strain was measured as the average emergence proportion among bottles, where the emergence proportion is the number of emerged adults divided by the number of eggs. To determine significance, unpaired t-tests between each cold population and the ZI population were performed for both temperature conditions.
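The developmental success measure described above can be sketched as follows. The function names and the example counts are hypothetical, and the t-test step is not included.

```python
from statistics import mean


def bottle_survival(n_eggs, n_adults):
    """Emergence proportion for one bottle, capped at 100% survival."""
    if n_eggs <= 0:
        raise ValueError("a bottle needs at least one counted egg")
    return min(n_adults / n_eggs, 1.0)


def developmental_success(bottles):
    """Average emergence proportion among a strain's bottles.

    bottles: list of (n_eggs, n_adults) tuples for one strain and temperature.
    """
    return mean(bottle_survival(eggs, adults) for eggs, adults in bottles)


# Hypothetical counts: the second bottle has more adults than recorded eggs,
# so it is scored as 100% survival.
print(developmental_success([(100, 80), (50, 60)]))
```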
RNA sample collections and sequencing
Within each population of the three warm-cold pairs (six populations in total), we selected 16 strains and assigned them into eight crosses. Before the crossing, all the strains had been inbred for eight generations. The criterion for choosing parental strains for a cross was based on minimal genomic regions of overlapping heterozygosity. Among the strains chosen within each population, we used similar criteria to select four strains to perform crosses between the warm and the respective cold populations. Two of the four strains were used as the maternal lines and the other two were used as paternal lines in the between-population crosses. One cross between SD and SP populations was lost. We also collected adult female samples from the parental inbred lines used in the crosses.
All the flies were reared at 15°C, which approximated the derived cold condition. 20 virgin females and 20 males were collected from maternal and paternal lines respectively for each cross and allowed to mate and lay eggs for a week in half pint bottles. Each bottle contained standard Drosophila medium (containing molasses, cornmeal, yeast, agar, and antimicrobial agents). For the within-population crosses, samples at three developmental stages were collected: larva, pupa and female adult. Third-instar larvae were collected on the surface of the medium. For pupa, new yellow pupae were collected within one day of pupation. For adult, female flies were collected 4-5 days after eclosion. For samples from between-population crosses and parental lines, only female adults were collected. All the samples were shock-frozen in liquid nitrogen immediately after collection.
Approximately 50 larvae, 50 pupae or 30 female adults were used for RNA extraction for each sample. Total mRNA was extracted using the Magnetic mRNA Isolation Kit (New England Biolabs, Ipswich, MA) and RNeasy MinElute Cleanup Kit (Qiagen, Hilden, Germany). Strand-specific libraries were prepared using the NEBNext mRNA Library Prep Reagent Set for Illumina. Libraries were size-selected for approximately 150 bp inserts using AMPureXP beads (Beckman Coulter, CA, USA). The libraries were quantified using Bioanalyzer and manually multiplexed for sequencing. All libraries were sequenced on a HiSeq2500 (V4) with 2×75bp paired-end reads in two flow cells.
Quantifying gene expression and exon usage frequency
The paired-end sequence reads for the within-population cross samples were mapped to the transcribed regions annotated in D. melanogaster (release 6, BDGP6.84) using STAR with parameters from ENCODE3’s STAR-RSEM pipeline (Li and Dewey 2011; Dobin et al. 2013). For gene expression, the number of reads mapped to each gene was quantified using RSEM (Li and Dewey 2011). Reads mapped to rRNA were excluded from the analysis. The expression abundance for each gene was standardized by the number of reads mapped to the total transcriptome of the sample.
To quantify exon usage, we used Leafcutter (Li et al. 2018) to estimate the excision frequencies of alternative introns. This phenotype summarizes different major splicing events, including skipped exons and 5’ and 3’ alternative splice-site usage. Leafcutter took the alignment files generated by STAR as input to quantify the usage of each intron. Leafcutter then formed clusters that contained all overlapping introns sharing a donor or acceptor splice site. The default parameters were used: ≥ 50 reads supporting each intron cluster and ≤ 500 kb for intron length. The exon usage frequency is the number of intron excision events divided by the total number of events per cluster. It is worth noting that Leafcutter only detects exon-exon junction usage and is unable to quantify 5’ and 3’ end usage or intron retention (Alasoo et al. 2018), which were not examined here.
Identifying outliers in gene expression and intron usage differentiation
To identify candidate genes under differential evolution between the warm and cold populations in each pair, we first controlled for the potential transcriptome skew caused by very highly expressed genes. For each expressed gene, we calculated the average expression of the cold samples (AvgExpcold) and that of the warm samples (AvgExpwarm). We then obtained the median of the ratio AvgExpcold/AvgExpwarm across all expressed genes for the population pair. Gene expression for the warm samples was normalized by multiplying it by this median before subsequent analysis. This correction was designed to avoid a scenario in which either the cold population or the warm population had important expression changes in one or more highly expressed genes that caused the relative expression of all other genes to shift, even if their absolute expression level did not.
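The median-ratio correction can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the dictionary layout (gene name mapped to a list of per-sample expression values) and the function name are hypothetical, and genes with zero average warm expression are skipped to avoid division by zero.

```python
from statistics import mean, median

def normalize_warm(cold, warm):
    """Scale warm-population expression by the median cold/warm ratio.
    `cold` and `warm` map gene -> list of per-sample expression values
    (hypothetical layout). Returns (scale factor, rescaled warm values)."""
    ratios = []
    for gene in cold:
        avg_cold, avg_warm = mean(cold[gene]), mean(warm[gene])
        if avg_warm > 0:  # assumption: skip genes with no warm expression
            ratios.append(avg_cold / avg_warm)
    scale = median(ratios)
    scaled = {g: [v * scale for v in vals] for g, vals in warm.items()}
    return scale, scaled

# Gene "c" is a strong outlier but does not move the median ratio
cold = {"a": [2, 2], "b": [4, 4], "c": [100, 100]}
warm = {"a": [1, 1], "b": [2, 2], "c": [10, 10]}
scale, warm_scaled = normalize_warm(cold, warm)
print(scale, warm_scaled["c"])
```

Using the median rather than the mean keeps a handful of very highly expressed genes from dominating the scale factor, which is the scenario the correction is meant to guard against.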
We used PST statistics to quantify gene expression divergence between cold and warm populations in each population pair: PST = Vbetween / (Vbetween + 2Vwithin), where Vbetween is the between-population variance for expression abundance and Vwithin is the average within-population variance for expression abundance. Although both within- and between-population components of variance can be confounded by environmental variance, PST is still a useful statistic to quantify phenotypic differentiation (Merila 1997; Brommer 2011; Leinonen et al. 2013). Here, environmental variance should be reduced by the common laboratory environment. To reduce sampling variance before calculating PST, for each gene, we required the total mapped reads across all 48 within-population samples to exceed 200 for a given developmental stage. Then, for each population/stage, we excluded the crosses/samples with the highest and lowest expression for each gene (to avoid high PST values being driven by single anomalous values), resulting in six samples per population/stage. The PST quantile based on data excluding extreme samples is concordant with the PST quantile calculated using all the crosses in most cases (Fig. S3).
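A minimal Python sketch of the trimmed PST calculation follows. It rests on two assumptions that should be checked against the original analysis: that PST takes the commonly used form Vbetween / (Vbetween + 2·Vwithin), and that Vbetween is the sample variance of the two population means. The function names are hypothetical.

```python
from statistics import mean, variance

def trim_extremes(values):
    """Drop the single highest and single lowest value (8 crosses -> 6)."""
    return sorted(values)[1:-1]

def pst(cold, warm):
    """PST for one gene from per-cross expression values in two populations.
    Assumes PST = Vb / (Vb + 2*Vw), with Vb the sample variance of the two
    population means and Vw the mean within-population variance."""
    cold_t, warm_t = trim_extremes(cold), trim_extremes(warm)
    v_between = variance([mean(cold_t), mean(warm_t)])
    v_within = mean([variance(cold_t), variance(warm_t)])
    denom = v_between + 2 * v_within
    return v_between / denom if denom > 0 else 0.0

# The anomalous values 100 and -50 are removed by trimming before PST is computed
cold = [10, 11, 12, 13, 14, 15, 16, 100]
warm = [1, 2, 3, 4, 5, 6, 7, -50]
print(pst(cold, warm))
```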
We chose the above PST-based approach instead of simply testing for differential expression in part because our within-population samples reflect real variation as opposed to technical replicates. Also, many alternative methods make assumptions about the data (e.g., negative binomial distribution for transcript counts) which are difficult to apply to splicing, even if they hold for expression. PST and the population genetic index FST are under the same theoretical framework, and are often directly compared to search for evidence of adaptive trait differentiation. However, environmental and measurement variance will downwardly bias PST, making targets of local adaptation less likely to reach a threshold defined by genome-wide high FST outliers. Hence, in this study we simply focus on the highest quantiles of PST for a given trait/population comparison, as detailed below.
As with gene expression, we used PST to estimate the intron usage differentiation between cold and warm populations, with Vbetween as the between-population variance for a given intron’s usage frequency and Vwithin as the average within-population variance for intron usage frequency. Before calculating PST, for each exon-exon junction, we summed the intron excision events (ni) and the alternative events (nj) of the cluster across all samples in a developmental stage. Both types of event had to number at least 5 (min(ni, nj) ≥ 5) for the exon-exon junction to be included in subsequent analysis. Then, for each exon-exon junction, we excluded the samples with the highest and lowest intron usage in a population/stage and calculated PST.
Examining co-directional change for outliers shared between population pairs
For gene expression differentiation, we used the upper 5% quantile of PST as outlier cutoff to identify candidate genes potentially under geographically differential selection. To study the degree of parallel evolution in gene expression, we identified outlier genes shared between two population pairs and showing consistent changes in the cold populations relative to the warm ones (co-directional). Whether the number of shared outliers with co-directional change was significantly greater than expected by chance from the total shared genes between population pairs was determined by a one-tailed binomial test. The statistics here and those below assume the expression changes are independent among genes/introns, which is not always the case (genes can interact with each other via regulatory networks).
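The one-tailed binomial test on co-directional counts can be sketched with the Python standard library. The null probability of 0.5 for a shared outlier being co-directional by chance is an assumption made for this illustration, and the function name is hypothetical.

```python
from math import comb

def binom_upper_tail(k, n, p=0.5):
    """One-tailed binomial p-value: P(X >= k) for X ~ Binomial(n, p).
    k = shared outliers with co-directional change, n = all shared outliers.
    The default p = 0.5 is an assumed chance expectation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Example: 8 of 10 shared outlier genes change in the same direction
print(binom_upper_tail(8, 10))
```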
The second approach used to examine parallelism of gene expression evolution was to focus on the outlier genes for a specific population pair (outlier pair) and examine whether the expression changes in the other pair (directional pair) follow the same directions. If cold adaptation causes similar evolution in gene expression, those genes in the directional pair should have changes in the same directions as in the outlier pair. Each of the pairwise population combinations had two comparisons; a population pair was assigned as the outlier pair in one comparison and as the directional pair in the other comparison. To generate a control set of genes for the null expectation of the co-directional change proportion, we identified genes in the bottom 50% quantile for PST in both the outlier pair and the corresponding directional pair. We tested whether the proportion of co-directional change was higher in the outliers than in the control using a chi-squared test.
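The comparison of co-directional proportions between outliers and controls amounts to a test on a 2×2 table. The sketch below computes the Pearson chi-squared statistic and its p-value for one degree of freedom using only the standard library. It is the standard two-sided test without continuity correction, which is an assumption here; the function name and argument order are hypothetical.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
        outliers: a co-directional, b anti-directional
        controls: c co-directional, d anti-directional
    Returns (statistic, p-value) for 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.f., P(chi2 > x) equals P(|Z| > sqrt(x)) for a standard normal Z
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Example: 30/40 outliers vs 50/100 controls are co-directional
print(chi2_2x2(30, 10, 50, 50))
```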
To identify exon usage outliers, a cutoff of the upper 5% of PST was used. If multiple exon junctions from the same gene passed the top 5% PST cutoff, only the exon junction with the highest PST was kept as an outlier to control for nonindependence. Because the numbers of shared exon usage outliers between population pairs were small (<10), we only performed the second type of analysis, examining the proportion of co-directional changes between the outlier pair and the directional pair for the top 5% exon usage events. We identified exon usage events in the bottom 50% of PST in both population pairs as the control.
GO enrichment test for PST outlier genes
The Gene Ontology enrichment tests were performed using the R package “clusterProfiler” (Yu et al. 2012) based on the fly genome annotation (Carlson 2018). The GO terms tested covered all three sub-ontologies: Biological Process (BP), Cellular Component (CC) and Molecular Function (MF). Overrepresented GO terms were selected based on an adjusted p-value < 0.1 using the “BH” method (Benjamini and Hochberg 1995) for each sub-ontology. For gene expression, the upper 5% PST outliers for each population pair were tested for GO enrichment. To determine whether the number of significant GO terms shared between pairs was greater than expected by chance, we randomly sampled the same numbers of genes as the outliers, performed the GO test for both pairs, and identified the significant GO terms shared between pairs. We repeated the process 1000 times to obtain a null distribution for the number of shared significant GO terms and compared it to the observed number of shared significant GO terms to obtain a permuted p-value.
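The final step of the permutation procedure, turning the 1000 null counts into a p-value, can be sketched as follows. The enrichment test itself is not reproduced here; the function name is hypothetical, and the plain proportion (without a +1 correction) is an assumption about how the permuted p-value was defined.

```python
def permuted_p_value(observed, null_counts):
    """Fraction of permutations whose number of shared significant GO terms
    is at least as large as the observed number (assumed definition)."""
    at_least = sum(1 for count in null_counts if count >= observed)
    return at_least / len(null_counts)

# Toy example with five permutations instead of 1000
print(permuted_p_value(5, [0, 1, 2, 5, 7]))
```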
To assess the functional categories of differential intron usage, we calculated the PST quantile for each exon usage. To rank the differentiation for a gene, we used the highest quantile (the most extreme differentiation) among the exon usages within the gene as the gene quantile (qgene). To account for the multiple testing of the exon usages for a gene, an adjusted total number of tests (nsum) was calculated from ni, the number of tests for each cluster, across the j clusters of the gene. The adjusted gene quantile is then q’gene = 1 − (1 − qgene) × nsum. The upper 5% of q’gene was used to identify the most differentiated genes for intron usage, and these genes were tested for GO enrichment as described above.
Cis- and trans-effects of regulatory divergence
To study the contributions of cis- and trans-regulatory effects to expression and exon usage divergence, we focused our analysis on the upper 5% PST outliers for gene expression/exon usage. For each gene/exon junction in each population pair, we selected a representative cross showing the greatest difference between parental strains for this analysis. In addition, this difference needed to be larger than the average difference between the cold and warm populations for its pair.
To study allele-specific expression/exon junction usage, we obtained the genomic sequences of the two parental strains aligned separately to the FlyBase D. melanogaster 5.77 assembly (Lack et al. 2015; 2016). SNPs were called against the reference genome using samtools (Li et al. 2009). To avoid mapping bias for the RNAseq reads (Degner et al. 2009; Stevenson et al. 2013), we updated the reference based on the SNPs for the two parental strains by masking the SNPs as “N”. The F1 female adult RNA-seq reads were mapped to the updated reference using STAR with options: --chimFilter None --outFilterMultimapNmax 1 (Dobin et al. 2013). Because of the high level of heterozygosity within our inbred lines (Lack et al. 2015), we used a parental ancestry proportion statistic (f) to study allele-specific expression instead of focusing on fixed differences between parental strains. The parental proportion in gene expression level/exon usage in the F1 RNA-seq sample was estimated as f = (pF1 − pw) / (pc − pw), where pF1 is the allele frequency in the RNA reads for the F1 sample, and pc and pw are the allele frequencies in the genomic reads for the cold- and warm-adapted parental lines, respectively. SNPs were retained if read counts were ≥ 10 in the F1 RNA-seq sample and in the parental samples, and if the parental frequency difference was |pc – pw| ≥ 0.25. The parental proportion for each candidate gene was the average f across all sites located in the gene (f̄).
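The per-gene parental proportion can be illustrated with a short Python sketch. It assumes that f follows from the linear mixing relation pF1 = f·pc + (1 − f)·pw, so that f = (pF1 − pw)/(pc − pw); the tuple layout and function name are hypothetical.

```python
from statistics import mean

def parental_proportion(sites, min_depth=10, min_diff=0.25):
    """Average cold-parent proportion f across the SNPs of one gene.
    `sites` is a list of tuples (hypothetical layout):
        (p_f1, p_c, p_w, depth_f1, depth_c, depth_w)
    Assumes f = (p_f1 - p_w) / (p_c - p_w) at each SNP.
    Returns None if no SNP passes the filters."""
    f_values = []
    for p_f1, p_c, p_w, d_f1, d_c, d_w in sites:
        if min(d_f1, d_c, d_w) < min_depth:
            continue  # read count filter (>= 10 in F1 and both parents)
        if abs(p_c - p_w) < min_diff:
            continue  # parental frequency difference filter (>= 0.25)
        f_values.append((p_f1 - p_w) / (p_c - p_w))
    return mean(f_values) if f_values else None

sites = [
    (0.5, 1.0, 0.0, 20, 20, 20),   # fixed difference, balanced F1 reads
    (0.7, 0.9, 0.1, 30, 30, 30),   # polymorphic in both parents
    (0.5, 0.6, 0.5, 50, 50, 50),   # removed: |p_c - p_w| < 0.25
    (0.5, 1.0, 0.0, 5, 20, 20),    # removed: F1 depth < 10
]
print(parental_proportion(sites))
```

A value near 0.5 indicates that the two parental alleles contribute equally to F1 transcripts, while values above 0.5 indicate excess expression from the cold-parent allele.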
We tested two null hypotheses corresponding to cis only and trans only regulatory differences. Under the null hypothesis that cis-regulatory effects are absent, f̄ is expected to be near 0.5 because the cold parental strain contributes half of the alleles to F1 offspring, and alleles from different parents are expressed similarly in these F1s (Cowles et al. 2002; McManus 2010; Meiklejohn et al. 2014). Under the null hypothesis that trans-regulatory effects are absent, f̄ is expected to approximate the ratio of the cold parental strain expression to the total expression of both parental strains (Wittkopp et al. 2004): rF0 = Ec/(Ec + Ew). However, sampling effects can cause f̄ to deviate from the null expectations.
We accounted for different types of uncertainty in estimating f. The first is the uncertainty in estimating parental strain frequencies pc and pw from the genomic data. For each SNP used in the calculation, we resampled 60 alleles based on the estimated allele frequency, representing the 30 individuals used for genome sequencing (Lack et al. 2015). Then we sampled reads by drawing with replacement among the resampled 60 alleles until we reached the observed read depth of the site to calculate pc’ and pw’. To account for the measurement uncertainty in F1 expression, we sampled with replacement from the F1 reads mapped to each gene (based on pc’ and pw’) until we reached the number of reads mapped to the gene. Then we recalculated pF1’ for each SNP and used it together with pc’ and pw’ to calculate f̄’ for each gene. We repeated the above process 1000 times to get a distribution of f̄’. A 95% confidence interval of the distribution not overlapping with 0.5 suggested the existence of a cis-effect.
However, there is another type of sampling effect if the regulatory variants are not fixed differences between parental strains. For example, one strain may be heterozygous for a causative regulatory variant, which might be located outside the exons and hence absent from the RNAseq data. The null hypothesis for inferring a cis-effect is that only trans-effects are present and f̄ is 0.5. Sampling of trans-regulatory polymorphism does not affect this null expectation since a trans-effect influences both target alleles similarly. However, the sampling of cis-regulatory polymorphism affects the null expectation for a trans-effect because the F1 expression proportion can deviate from the parental expression ratio rF0, potentially causing false positive inferences of trans-effects. Although there is no information about the frequency and effect size of the cis-regulatory mutations, we chose simple assumptions about them to make a relatively conservative approach for inferring trans-effects. We assumed that the frequency of the cis-regulatory allele is 0.5 in the cold-adapted strain (heterozygous, Aa) and 0 in the warm-adapted strain (homozygous, aa). This simplest polymorphism condition maximized the sampling effect within the cold strain. Then we assigned the effect size of the a cis-regulatory allele as the expression level of the warm-adapted strain (Ew). The effect size of the A cis-regulatory allele is 2Ec − Ew, where Ec is the expression level of the cold-adapted strain. Then we sampled 30 alleles randomly from Aa with replacement to create diploid individuals and calculated the average expression of the sampled individuals from the cold strain, Ec’. The updated rF0’ is calculated as Ec’/(Ec’ + Ew). The sampling and calculation were repeated 1000 times. Each time, rF0’ was paired with an f̄’ described above to calculate the difference D’ = f̄’ − rF0’. A 95% confidence interval of D’ not overlapping with 0 suggested the existence of a trans-effect.
Based on the tests above, the set of candidate genes was classified into categories (McManus 2010; Schaefke et al. 2013; Chen et al. 2015) including no significant cis- or trans-effect, cis only, and trans only. For genes showing both cis- and trans-effects, we further classified them based on whether these two effects favored expression of the same (co-directional) or different parental alleles (anti-directional). For exon usage differentiation, we applied a similar approach to classify the differentiated exons into the five categories, accounting for different sampling effects and measurement errors. Instead of analyzing the expression level of the parental strains (E), we analyzed their exon usage frequency for the sets of outlier exon junctions.
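The resampling scheme for the cis-effect test can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: F1 reads are resampled independently at each SNP from the observed F1 allele frequency, f is assumed to equal (pF1 − pw)/(pc − pw), replicates in which the resampled parental frequencies coincide are skipped, and all names and the tuple layout are hypothetical.

```python
import random
from statistics import mean

def resample_freq(p, depth, rng, n_alleles=60):
    """Resample 60 alleles (30 diploid individuals) at frequency p, then draw
    `depth` reads with replacement from those alleles."""
    alleles = [1 if rng.random() < p else 0 for _ in range(n_alleles)]
    return sum(rng.choice(alleles) for _ in range(depth)) / depth

def bootstrap_f_ci(sites, n_rep=1000, seed=0):
    """95% percentile interval for the mean parental proportion of one gene.
    `sites`: list of (p_f1, depth_f1, p_c, depth_c, p_w, depth_w) tuples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_rep):
        f_values = []
        for p_f1, d_f1, p_c, d_c, p_w, d_w in sites:
            pc = resample_freq(p_c, d_c, rng)
            pw = resample_freq(p_w, d_w, rng)
            if pc == pw:
                continue  # f is undefined when parental frequencies coincide
            # Simplification: binomial resampling of F1 reads at this SNP
            pf1 = sum(rng.random() < p_f1 for _ in range(d_f1)) / d_f1
            f_values.append((pf1 - pw) / (pc - pw))
        if f_values:
            draws.append(mean(f_values))
    if not draws:
        raise ValueError("no informative replicates")
    draws.sort()
    n = len(draws)
    return draws[(25 * n) // 1000], draws[(975 * n) // 1000 - 1]

# Five SNPs with fixed parental differences and 90% cold-allele F1 reads
sites = [(0.9, 200, 1.0, 50, 0.0, 50)] * 5
low, high = bootstrap_f_ci(sites, n_rep=200, seed=1)
print(low, high, "cis-effect suggested:", not (low <= 0.5 <= high))
```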
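The conservative null for the trans-effect test can be sketched as follows. The sketch assumes additive allelic effects within each sampled diploid individual and takes the difference as D’ = f̄’ − rF0’ (the sign convention is an assumption). Function names are hypothetical, and the f̄’ draws are taken as given.

```python
import random
from statistics import mean

def resample_r_f0(e_c, e_w, rng, n_alleles=30):
    """One draw of rF0' when the cold strain is Aa and the warm strain is aa.
    Allele a has effect Ew and allele A has effect 2*Ec - Ew, so a perfectly
    balanced sample reproduces Ec. Alleles are paired into diploid
    individuals, assuming additive effects."""
    effect_A, effect_a = 2 * e_c - e_w, e_w
    alleles = [rng.choice((effect_A, effect_a)) for _ in range(n_alleles)]
    individuals = [mean(alleles[i:i + 2]) for i in range(0, n_alleles, 2)]
    e_c_prime = mean(individuals)
    return e_c_prime / (e_c_prime + e_w)

def percentile_ci(values):
    """95% percentile interval using integer index arithmetic."""
    s = sorted(values)
    n = len(s)
    return s[(25 * n) // 1000], s[(975 * n) // 1000 - 1]

def trans_effect_suggested(f_draws, r_draws):
    """True if the 95% interval of D' = f' - rF0' excludes 0."""
    low, high = percentile_ci([f - r for f, r in zip(f_draws, r_draws)])
    return not (low <= 0 <= high)

rng = random.Random(0)
r_draws = [resample_r_f0(20.0, 10.0, rng) for _ in range(1000)]
print(percentile_ci(r_draws))
```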
For the PST outliers identified as cis only or as showing both cis- and trans-effects, we hypothesized that causative cis-regulatory elements may show elevated allele frequency differentiation between the warm and cold populations. For expression abundance, the majority of cis-regulatory SNPs are located within 2 kb upstream of the transcription start site and downstream of the transcription end site (Massouras et al. 2012). Therefore, we used the interval from 2 kb upstream to 2 kb downstream as the focal region of a gene for this analysis. We calculated window FST and SNP FST using sequenced genomes from the Drosophila Genome Nexus (Lack et al. 2015; 2016). For window FST, the division of windows within a gene region is based on 250 non-singleton variable sites per window in the ZI population (Pool et al. 2017). The highest FST among the windows overlapping the focal region was assigned as its FST_winmax. To determine the statistical significance of FST_winmax, we calculated FST_winmax for all other blocks of the same number of windows along the same chromosome arm where cross-over rates were above 0.5 cM/Mb (Comeron et al. 2012), excluding those within 10 windows of the focal region. The specific non-low recombination regions are: 2.3–21.4 Mb for the X chromosome, 0.5–17.5 Mb for arm 2L, 5.2–20.8 Mb for arm 2R, 0.6–17.7 Mb for arm 3L, and 6.9–26.6 Mb for arm 3R. SNP FST was calculated for all sites within the focal region, and the highest value (FST_SNPmax) was assigned to the focal gene. Analogous to our FST_winmax permutation, we also calculated FST_SNPmax for permuted regions with the same number of SNPs as the focal region, along the non-low cross-over rate region on the same chromosome arm. For both FST_winmax and FST_SNPmax, we then focused on regions in the upper 10% quantile of permuted values for further analysis.
We tested whether the proportion of genes with high FST is higher in the cis-effect genes than that in control non-outliers using the Fisher’s Exact Test because of the low counts.
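For small counts, the Fisher’s Exact Test on a 2×2 table can be computed directly from the hypergeometric distribution. The sketch below uses the one-sided alternative (a higher proportion of high-FST genes among cis-effect genes), which is an assumption about the test direction; the function name and argument order are hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table
        cis-effect genes: a high FST, b not high FST
        control genes:    c high FST, d not high FST
    i.e. the probability of observing a or more high-FST cis-effect genes
    given fixed row and column totals."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total

# Example: 3 of 4 cis-effect genes vs 1 of 4 control genes have high FST
print(fisher_exact_greater(3, 1, 1, 3))
```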
We also identified genes with significant cis-effects shared in two population pairs and examined whether the effects favored expression of the same cold or warm parental alleles (consistent cis-effect). Also, we tested whether the shared cis-effect genes also show elevated population genetic differentiation in the two pairs. We obtained “Population Branch Excess” statistic (PBE) specific for cold populations for SNPs from Pool et al. 2017. We used ±2kb around the gene regions to look for any shared cis-effect genes containing SNPs with high PBE statistic for cold population in both pairs (PBE quantile < 0.05 in both pairs).
For exon usage, because cis-regulation is largely contributed by the splice sites (Kurmangaliyev et al. 2015), we calculated FST for the splice sites, which are located within ±15 base pairs of the two intron/exon boundaries. The maximum FST among the splice sites for each intron was chosen as the SNP FST for the focal intron. We compared the FST_SNPmax of the cis outlier introns and the non-outlier introns to see whether the cis outlier introns showed elevated FST based on the Wilcoxon signed-rank test. To examine the potential function of splicing differentiation, genes containing high SNP FST (upper 15% quantile of FST_SNPmax) flanking cis-regulated introns were identified as candidate genes.
Acknowledgements
We thank Colin Dewey for helpful discussions and the UW-Madison Center for High Throughput Computing (CHTC) for cluster usage. This work was funded by NSF DEB grant 1754745 to JEP and by NIH NIGMS grant F32GM106594 to JBL.