Abstract
The role of phenotypic plasticity in evolution is contentious, in part because different types of plasticity (adaptive, neutral, or non-adaptive) are often not distinguished. Adaptive plasticity is expected to facilitate expansion into new environments, while non-adaptive plasticity will result in a mean phenotype further from the adaptive optimum and/or an increase in variance due to the expression of variation that was neutral and shielded from selection in the prior environment. We explore these patterns here by exposing Drosophila melanogaster and D. simulans to high ethanol concentrations, with the knowledge that D. melanogaster associates with high concentrations of ethanol in nature while D. simulans does not. Using changes in gene expression and splicing, we find that in D. simulans there is a large genotype-specific response to ethanol, orders of magnitude larger than any effect that is not genotype-specific. In D. melanogaster the response to ethanol is limited, and it is concordant among different genotypes. The response to ethanol in D. simulans is enriched for non-protein coding nested genes that do not have orthologs in D. melanogaster, suggesting rapid evolution of transcription. Sequence variation in ethanol-implicated genes is consistent with balancing selection or relaxed selection in D. simulans, and these genes are more divergent from D. melanogaster than the genome average. Overall, these patterns are consistent with a maladaptive passive response to ethanol-induced stress in D. simulans, while in D. melanogaster the reduced response and lack of genotype-specific variation suggest selection for an optimal response.
Introduction
Genetic variation (G) occurs when there are differences in a phenotype due to genotypic differences among individuals within a population (Figure 1A). In contrast, plasticity, i.e. environment-specific (E) adjustment in phenotype, can occur without genetic variation between individuals and is a change in a given phenotype in response to an external stimulus (Figure 1A). When components of variance are estimated, plasticity is the part of phenotypic variation attributable to environmental variation. Plasticity is a ubiquitous property of organisms, and it may be adaptive, neutral, or maladaptive with regard to the fitness of a plastic individual (Ghalambor et al. 2007; Lande 2009; Marais et al. 2013). Genetic variation in plasticity (G x E) describes variation between genotypes for the response to the environment (Figure 1A). Quantifying genetic variation, plasticity, and genetic variation for plasticity is important for understanding the genotype to phenotype map and adaptation. For example, the evolution of a reduced plastic response (E) can reconfigure the relationship between fitness and the environment, redrawing the genotype to phenotype map. In another example, genetic variation in plasticity can make the evolution of locally adapted populations unpredictable in response to environmental change.
It is often assumed that plasticity is adaptive; however, phenotypic plasticity can be adaptive, neutral, or maladaptive. This depends on the evolutionary history of exposure to the agent causing plasticity. Were such exposure infrequent, plasticity could arise as a passive consequence of environmental stress. This is likely an important, or perhaps prevalent, form of phenotypic plasticity (Ghalambor et al. 2007; 2015). One potential prediction arising from stress inducing passive phenotypic plasticity is that within-population variation for performance is expected to be greater, as stress breaks down buffering mechanisms that might normally reduce between-individual variation (Joshi & Thompson 1997). Thus under lower stress conditions, populations will exhibit less individual and genetic variation than under rare, high stress conditions (Ghalambor et al. 2007). The novel or rare high stress environment is essentially uncovering cryptic variation that was allowed to neutrally accumulate in lower stress environments where selection was unable to shape the reaction norm (Rutherford 2000).
When a population colonizes a new habitat, this could manifest as a transient increase in G x E, followed by a reduction in genetic variation for plasticity, assuming that there is selection for a single optimal phenotype in response to the environment (Via & Lande 1985; Guntrip & Sibly 1998; Lande 2009; Matzkin 2012; Huang & Agrawal 2016). The maintenance of plasticity, or loss of plasticity, would then depend upon the frequency with which the population continues to experience multiple environments, or the cost of maintaining the plastic response (Joshi & Thompson 1997; Lande 2009; Scoville & Pfrender 2010; Lee et al. 2011; Hodgins-Davis et al. 2012). Thus if phenotypic plasticity is costly to maintain, or the organism no longer experiences the previous environment, the expectation is that phenotypic plasticity will transition to local adaptation.
In this manuscript we will examine plasticity and genetic variation for plasticity in response to ethanol in D. simulans and D. melanogaster, two species with divergent adaptive histories with respect to ethanol. D. melanogaster and D. simulans are thought to have originated in or around Africa (D. melanogaster, southern-central Africa; D. simulans, Madagascar) followed by an out-of-Africa expansion ~10,000 years ago; however, D. melanogaster may have colonized Europe and America earlier than D. simulans (David and Capy 1988; Lachaise et al. 1988; Baudry et al. 2004; Li and Stephan 2006; Thornton and Andolfatto 2006). Drosophila simulans is thought to have colonized the Americas in the last 100 years, creating a large and more recent population bottleneck than in D. melanogaster (Sturtevant 1920; Signor et al. 2017c).
D. melanogaster and D. simulans are cosmopolitan species and human commensals, however while D. melanogaster is commonly found inside houses, breweries, and wineries, D. simulans is more often observed in orchards and parks (though their niches overlap significantly and they are often found on the same patches) (David & Bocquet 1976; Parsons & King 1977; McKenzie & McKechnie 1979; Parsons & Spence 1981; Dickinson et al. 1984; Gibson & Wilks 1988; Thomson et al. 1991; Chakir et al. 1993; Mercot et al. 1994; Joshi & Thompson 1997). D. melanogaster is considerably more ethanol tolerant than D. simulans, and is found regularly feeding upon and ovipositing in resources with ethanol concentrations greater than 8% (Hoffmann & McKechnie 1991; Fry 2014; Zhu & Fry 2015). At concentrations of 4% ethanol D. simulans shows reductions in survivorship and increased development time relative to D. melanogaster (Joshi & Thompson 1997). However, at low concentrations of ethanol (0.5 - 3.0%) adult D. simulans show increases in longevity (less so than adult D. melanogaster, which increases longevity from 0.5 - 9.0%) (Parsons et al. 1979). During multi-generation exposure to ethanol-rich substrates in the lab, D. simulans evolves tolerance (Joshi & Thompson 1997; Lefèvre et al. 2012). It has been demonstrated that ethanol tolerance comes as a trade-off that reduces the efficacy of processing other biochemical targets, thus it is possible that in D. simulans this trade-off is under different selective pressure (Chakraborty & Fry 2016). However, ethanol is by no means a novel resource for D. simulans.
In general genetic adaptation and phenotypic plasticity are considered as alternative adaptive strategies in response to the environment (Schlichting & Pigliucci, 1998). For a variety of traits D. simulans has been observed to be ‘plastic’ while D. melanogaster is described as being locally adapted (David & Bocquet 1975; Hyytia et al. 1985; Watada et al. 1986; Singh 1989). For example, for temperature tolerance D. simulans has been shown to exhibit less difference between populations and to produce ‘optimal’ phenotypes under a wider range of environments (consistent with adaptive plasticity) (Austin & Moehring 2013). This includes less differentiation between populations, in that populations from different temperatures show similar levels of plasticity across temperatures. Population genetic surveys generally corroborate these phenotypic observations, where D. melanogaster also shows greater differences between populations, and more evidence of clinal variation than in D. simulans (Singh 1989; Weeks et al. 2002; Hoffmann & Weeks 2006; Schmidt & Paaby 2008; Machado et al. 2015; Sedghifar et al. 2016; Bergland et al. 2016; Signor et al. 2017c). However, D. simulans has generally been observed to harbor more within population diversity than D. melanogaster (Machado et al. 2015; Sedghifar et al. 2016; Signor et al. 2017c).
Gene expression is both affected by the environment and mediates an organism's response to the environment. It has been shown to be plastic in response to many environmental variables such as heat stress, ethanol, and the presence of predators (Hodgins-Davis & Townsend 2009; Levine et al. 2011; Yampolsky et al. 2012; Hodgins-Davis et al. 2012). While many studies have quantified the effect of the environment on gene expression, far fewer have performed these experiments in multiple genotypes within a common garden to obtain the interaction between genotype and the environment for gene expression (DeBiasse & Kelly 2015). Indeed, most work on gene expression plasticity has not quantified G x E, and has found inconsistent results with regard to theoretical expectations for the maintenance of plasticity (Cheviron et al. 2008; McCairns & Bernatchez 2009; Hodgins-Davis et al. 2012; DeBiasse & Kelly 2015; Heckel et al. 2016; Mathur & Schmidt 2016). Furthermore, many plastic responses are time dependent, and experiments performed in a common garden without sampling across time points may not detect early plastic responses that usher in ‘maintenance’ phenotypes, or dynamic responses to the environment (Aubin-Horth & Renn 2009; Lewis et al. 2014). In addition, alternative splicing has been shown to respond to changes in the environment, and it is thought to diverge more rapidly than gene expression between lineages (Barbosa-Morais et al. 2012; Merkin et al. 2012; Gueroussov & Gonatopoulos-Pournatzis 2015; Jakšić & Schlötterer 2016; Wang et al. 2017; Pajoro et al. 2017; Singh et al. 2017; Calixto et al. 2018). As such it may be an important component of plasticity, necessary for adaptation to rapidly changing and heterogeneous environments (Marais et al. 2013; Preußner et al. 2017; Price et al. 2018; Calixto et al. 2018). Furthermore, it has been previously implicated in the response to ethanol, and in D. melanogaster may be a more important component of the response than gene expression changes (Oomizu et al. 2003; Newton et al. 2004; Pietrzykowski et al. 2008; Sasabe & Ishiura 2010; Hemby 2012; Zaharieva et al. 2012; Robinson & Atkinson 2013; Signor & Nuzhdin 2018). However, alternative splicing is rarely accounted for in studies of plasticity.
Overall, we compare plasticity in response to ethanol, including genetic variation for plasticity over time and between genotypes, among Drosophila species with different patterns of ethanol utilization. We measure plasticity and variation in expression and splicing in response to ethanol treatment, and we observe that in D. melanogaster, which regularly exploits ethanol-rich substrates, there is essentially no genetic variation for plasticity. This suggests the following scenario: for D. simulans high ethanol concentrations are essentially a novel stressful environment that reveals cryptic variation for phenotypic plasticity, while D. melanogaster does not experience high stress under 15% ethanol and cryptic variation is not revealed. Furthermore, D. melanogaster does not experience high stress as it is adapted to ethanol and genetic variation has been removed in favor of an optimal, and reduced, plastic response.
Methods
Fly lines
Male flies for D. melanogaster originated from six genotypes collected from an orchard in Winters, California in 1998 and made isogenic by 40 generations of full sibling inbreeding (Yang & Nuzhdin 2003; Campo et al. 2013; Signor et al. 2017a; b). Male flies for D. simulans came from six genotypes collected from the Zuma Organic Orchard in the winter of 2012 and made isogenic by 15 generations of full sibling inbreeding (Signor et al. 2017b; c). Residual heterozygosity is similar between these lines of D. melanogaster and D. simulans (Campo et al. 2013; Signor et al. 2017c). In natural conditions flies will not be homozygous at all loci, thus each inbred genotype was crossed to a white eyed ‘tester’ strain to create the F1 flies used in the ethanol exposure assays (D. melanogaster, w1118, Bloomington stock number 3605; D. simulans, w501, Cornell species stock number 14021-0251.011). This design allows us to replicate observations of gene expression because the flies within a cross are genetically identical, while also maintaining heterozygosity such that they more closely resemble wild type flies. Rearing occurred on a standard medium at 25 °C with a 12-h light/12-h dark cycle. Several measures were taken to standardize offspring quality: F1 flies were produced from females of the same age, these females were held at the same density (10 individuals per sex/vial), males used for ethanol exposure assays were collected as virgins and reared in single sex vials, and males used for assays were held at a standard density (24-30 per vial).
During the ethanol exposure assay a female was included as a stimulus, but was not collected for RNA-seq (Signor et al. 2017a; b). The females were from y1w1 mutants for both D. simulans and D. melanogaster (D. melanogaster, Bloomington stock number 1495, D. simulans, Cornell species stock number 14021-0251.013). Quality of the females was controlled in the same manner as previously described for males, and both males and females were aged three to four days before being used in the ethanol exposure assay.
Experiment setup
Ethanol exposure took place in a circular arena, each of which was part of a larger chamber containing 12 arenas each with a diameter of 2.54 cm (VWR cat. no. 89093-496). Prior to the assay the flies were sedated through exposure at 4 °C for ten minutes, to avoid the confounding effects of CO2 exposure on behavior and gene expression. Once the flies were sedated they were placed in the arenas with a paintbrush (two males and one female per arena) and the chambers were secured with two small pieces of cryogenic tape. It is standard to allow organisms to recover in a new location for ten minutes prior to beginning behavioral assays, which was a portion of the goal of this study, so the chambers were left upside down for ten minutes while the flies regained consciousness and oriented themselves (Signor et al. 2017a). The bottom of each chamber contained a standard amount of either grapefruit medium or medium in which 15% of the water has been replaced with ethanol. During ethanol administration the flies were recorded using VideoGrabber (http://code.google.com/p/video-grabber/), and set-up of the assays was facilitated with FlyCapture (PointGrey, Canada). In order to better standardize the transcriptional response, these videos were also used to determine if any male/female pairs mated during the assay or any flies were damaged during setup, and flies in those chambers were not collected for RNA-seq.
Assays were conducted for ten, 20, or 30 minutes for three replicates of each of the two conditions. Flies are most active during the hours following dawn, thus to standardize behavior and circadian rhythms all assays were conducted within a two-hour window after dawn. Replicates for both species were conducted randomly under standardized conditions (25 °C, 70% humidity). At the conclusion of the assay the chambers were flash frozen in liquid nitrogen, allowed to freeze through, transferred to dry ice, and all of the males were collected for RNA-seq. For both D. melanogaster and D. simulans the expectation is that intoxication is occurring through inhalation of ethanol vapors, and evidence of the behavioral effects in both species and the efficacy of this approach have been published previously (Signor et al. 2017a; b).
Sample preparation and RNA sequencing
Sample preparation has been described previously, and will be briefly summarized here (Signor & Nuzhdin 2018). Flash-frozen flies were freeze dried and ten to 12 heads were placed into a 96-tube plate (Axygen MTS-11-C-R). mRNA purification, cDNA synthesis and library preparation were carried out by RAPiD GENOMICS (http://rapid-genomics.com) using a robot. mRNA was purified using Dynabeads mRNA DIRECT Micro kit (Invitrogen # 61021) with slight modifications. To fragment the RNA, mRNA-beads were resuspended in 10 µL 2X first strand buffer (Invitrogen # 18064-014), incubated at 80 °C for two minutes and placed on ice, then the supernatant was collected after five minutes on a magnetic stand. First strand synthesis was performed using standard protocols for Superscript II (Invitrogen #18640-014) and reverse transcription (25 °C for 10 min, 42 °C for 50 min, 70 °C for 15 min, 10 °C hold). Second strand synthesis was carried out using standard protocols with DNA Pol I and incubated at 16 °C for 2.5 hours. cDNA was purified with 1.8 volume of AMPure XP following the manufacturer's instructions (Beckman Coulter A63880). Illumina RNAseq libraries were prepared by Rapid Genomics (http://rapid-genomics.com/home/) using dual barcodes. Sequencing was performed using the Illumina HiSeq 2500 as both 2×150 bp and 2×50 bp reads. The two run lengths (and runs) were intended to provide extra coverage, and all replicates were sequenced in both runs.
Gene expression analysis
It is common in organisms with alternative splicing for exons from different isoforms of a single gene to overlap with one another, or be shared between all or most isoforms (Figure 1B). Short read data fundamentally cannot resolve these exons to individual isoforms; however, one approach is to quantify each exon separately and decompose exons overlapping between isoforms into those which are shared and unique. When the differences between overlapping exons are less than 10 bp, there is no appreciable amount of information loss in not decomposing overlapping exons, and this approach has been taken in the past (Dalton et al. 2013; Graze et al. 2014; Newell et al. 2016; Fear et al. 2016). However, in many cases the differences in exon overlap are much larger than this, so to address this issue we use a classification scheme where reads may be assigned to exons, exonic regions, or exon fragments (Signor & Nuzhdin 2018) (Figure 1B). Exon boundaries were determined using the D. melanogaster FlyBase 6.17 genome features file and the D. simulans 2.02 genome features file. A single exon is one that does not overlap any other exons, and it may be unique to a single isoform, shared between several (common), or shared between all isoforms (constitutive). If a gene has only a single transcript then every exon it contains will be both unique and constitutive. When exons overlap other exons from different isoforms, they are grouped together into an exonic region, and they may be common to some isoforms or constitutive to all, but they are not unique given that by definition they require overlap between exons. When overlapping exons differ measurably, we used the 5’ and 3’ positions of the exons within the region to create exon fragments. An exon fragment may be exclusive to a single isoform (unique) or common/constitutive (Figure 1B).
For exon fragments there is only one unique situation in which it may be both constitutive and unique – when the exons of two genes overlap, and one of those genes has a single transcript, such that the non-overlapping portion of the exon belonging to the gene with one transcript will become a unique and constitutive exon fragment. Alignment was performed using BWA-MEM version 0.7.15 and BED files were used to count reads in each region and obtain the length adjusted read count (reads in region divided by the length of region), and the APN (average per nucleotide) (Li 2015).
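The classification above can be illustrated with a minimal Python sketch. It assumes exons are supplied as half-open (start, end) coordinate pairs for a single gene; the function names `exonic_regions` and `exon_fragments` are hypothetical and are not taken from the pipeline used here. The 10 bp threshold and the unique/common/constitutive labels are omitted for brevity.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # half-open interval [start, end); an assumed convention


def exonic_regions(exons: List[Exon]) -> List[Exon]:
    """Merge overlapping exons into exonic regions."""
    regions: List[Exon] = []
    for start, end in sorted(set(exons)):
        if regions and start < regions[-1][1]:
            # Overlaps the current region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def exon_fragments(exons: List[Exon]) -> List[Exon]:
    """Split exons at every exon start/end position into fragments."""
    breakpoints = sorted({p for exon in exons for p in exon})
    fragments: List[Exon] = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Keep a piece only if at least one exon fully covers it.
        if any(s <= left and right <= e for s, e in exons):
            fragments.append((left, right))
    return fragments


example = [(100, 200), (150, 250), (400, 500)]
regions = exonic_regions(example)
fragments = exon_fragments(example)
```

In this toy example the first two exons overlap and form one exonic region that is split into three fragments, while the third exon stands alone. A non-overlapping exon is returned unchanged by `exon_fragments`, so a caller would need to separate single exons from true fragments.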
The APN was summed for technical replicates of the same read length, then averaged between read lengths to handle the mixture of read lengths for each sample (2×150 bp and 2×50 bp). An exonic region was considered detected if its APN was greater than zero in at least half of all samples per condition. While we considered several approaches to normalize coverage counts, upper-quartile normalization with log-transformation and median centering within time × treatment × genotype was selected due to better performance of the residuals (Bullard et al. 2010; Dillies et al. 2013).
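The coverage summary and normalization steps described above might be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: APN is taken as summed per-base depth divided by region length, the upper quartile is computed over nonzero values only, and the log transform uses log(x + 1). All function names are hypothetical.

```python
import math
import statistics


def apn(per_base_depth):
    """Average per nucleotide: summed depth divided by region length."""
    return sum(per_base_depth) / len(per_base_depth)


def combine_read_lengths(apn_by_read_length):
    """Sum technical replicates within a read length, then average
    the per-read-length sums (e.g. keys 150 and 50)."""
    sums = [sum(values) for values in apn_by_read_length.values()]
    return sum(sums) / len(sums)


def is_detected(apn_values):
    """Detected if APN > 0 in at least half of the samples of a condition."""
    return sum(1 for v in apn_values if v > 0) >= len(apn_values) / 2


def upper_quartile_log(sample_apn):
    """Scale one sample by the upper quartile of its nonzero APN values,
    then log-transform (the +1 offset is an assumption)."""
    nonzero = [v for v in sample_apn if v > 0]
    uq = statistics.quantiles(nonzero, n=4)[2]
    return [math.log(v / uq + 1) for v in sample_apn]


def median_center(values):
    """Subtract the median, e.g. within one time x treatment x genotype group."""
    med = statistics.median(values)
    return [v - med for v in values]
```

For example, `combine_read_lengths({150: [1.0, 2.0], 50: [3.0, 4.0]})` sums each read length to 3.0 and 7.0 and averages them to 5.0.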
To test the significance of components of expression variation, the log APN for each exonic region was modeled as $Y_{ijkl} = \mu + g_i + t_j + m_k + (gt)_{ij} + (gm)_{ik} + (tm)_{jk} + (gtm)_{ijk} + \varepsilon_{ijkl}$ for the ith genotype ($g_i$), jth treatment ($t_j$; j = ethanol or no ethanol), kth time point ($m_k$; k = 10 min, 20 min, 30 min), and lth replicate. For the interaction between treatment and time point, the log APN for each exonic region was modeled as $Y_{ij} = \mu + c_i + \varepsilon_{ij}$ for the ith condition ($c_i$; time × treatment) and jth replicate. Contrasts to compare treatments within time point (ethanol versus no ethanol, for 10 min, 20 min and 30 min) were conducted. Residuals were evaluated for conformity with normality assumptions, and assumptions were met in excess of 95% of the models.
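As a simplified illustration of the condition-level model (log APN as a function of time × treatment condition, with replicates as error), the F statistic for a one-way layout can be computed by hand. This is a sketch only: the function name `one_way_f` is hypothetical, the values are invented, and the full factorial model with genotype would require a dedicated statistics package.

```python
def one_way_f(groups):
    """F statistic for a one-way model: one group of replicate
    log-APN values per condition. Returns (F, df_between, df_within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Mean square attributed to the effect tested (condition).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Mean square attributed to error (replicates within condition).
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented log-APN values: three conditions, three replicates each.
f_stat, df1, df2 = one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

The p-value would then come from the F distribution with (df1, df2) degrees of freedom, which is left to a statistics library.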
To evaluate whether there was evidence for splicing differences among times or treatments, exonic regions for each gene and for each sample were ranked, with the most expressed region ranked as one, the least expressed exonic region as three, and all others as two. Exon ranks for each gene were modeled as $\gamma_{ijk} = \mu + r_i + t_j + (rt)_{ij} + \varepsilon_{ijk}$, where $\gamma_{ijk}$ is the exon rank (1, 2, 3) of the ith exonic region of the gene, jth condition (time × treatment), and the kth replicate; $r_i$ is the exonic region of the gene; $t_j$ is condition; and $(rt)_{ij}$ is the interaction between exonic region and condition. A more traditional GLM test could not be used due to a lack of normality in the distribution of model residuals. Accordingly, a non-parametric test must be relied upon to look for changes in exon or exonic region representation between exons of a gene, and we used a rank test to summarize changes in exon representation (Supplemental File 3). F-tests for the significance of the mean square attributed to the effect tested versus the mean square attributed to error, or the appropriate interaction term, were used. The false discovery rate was controlled using the Benjamini-Hochberg procedure, with a significance cutoff of α = 0.05 (Benjamini & Hochberg 1995).
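The three-level ranking and the Benjamini-Hochberg step can be sketched in a few lines of Python. The function names and the toy values are hypothetical; ties in expression are not handled specially here, which is an assumption of the sketch rather than a statement about the analysis.

```python
def rank_regions(apn_by_region):
    """Rank exonic regions of one gene in one sample:
    most expressed = 1, least expressed = 3, all others = 2."""
    top = max(apn_by_region, key=apn_by_region.get)
    bottom = min(apn_by_region, key=apn_by_region.get)
    ranks = {}
    for region in apn_by_region:
        if region == top:
            ranks[region] = 1
        elif region == bottom:
            ranks[region] = 3
        else:
            ranks[region] = 2
    return ranks


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking tests rejected at FDR alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


ranks = rank_regions({"e1": 5.0, "e2": 1.0, "e3": 3.0, "e4": 2.0})
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

Collapsing intermediate regions to a single rank keeps the response on a common 1 to 3 scale regardless of how many exonic regions a gene has.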
GO Analysis
When a gene had more than one ortholog in D. melanogaster, only one ortholog was included for the GO analysis, so as not to inflate the number of genes involved in a given process. This does presume that orthologs will be annotated with the same GO terms, and this is generally the case. For example, in D. simulans there is only AOX4, while in D. melanogaster there is AOX3 and AOX4, but the GO terms for each are the same. However, as D. simulans genes are generally not independently annotated, especially those without D. melanogaster orthologs, if there was no D. melanogaster ortholog the gene was not included in the enrichment analysis. This is a significant fraction of the overall genes that were involved in the response to treatment, treatment by genotype, and genotype by treatment by time; however, there is no viable alternative. Lists of significant genes were tested for GO enrichment using the PANTHER classification system (Mi et al. 2017). Results were corrected for multiple testing and a p-value of 0.01 was required for significance.
Functional class enrichment
To test for functional class enrichment, multigene exonic regions were not included. Every test of functional class enrichment compared the frequency of a given subcategory among all exons and exonic regions detected in the dataset to its frequency within a significant list of exons and exonic regions. A χ² test was performed in R to test the significance of the enrichment of each of these categories.
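For a single subcategory, an enrichment test of this kind reduces to a 2×2 table. The Python sketch below is illustrative only: the layout of the table (significant versus remaining detected regions, in versus out of the category) and the counts are assumptions, and no continuity correction is applied, whereas R's `chisq.test` applies Yates' correction to 2×2 tables by default.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
        [[a, b],    a = significant, in category;  b = significant, not in category
         [c, d]]    c = remaining, in category;    d = remaining, not in category
    Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts for illustration.
stat, p_value = chi_square_2x2(15, 5, 5, 15)
```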
D. melanogaster polymorphism and divergence
To compare polymorphism between the significant sets of genes and genome-wide averages we calculated Tajima’s D genome-wide for the source populations of D. melanogaster and D. simulans. We obtained the VCF files from the Winters population sequenced by Campo et al. (2013), which includes all six of the genotypes assayed here as well as 29 other inbred lines sampled from the same population at the same time. The coordinates of the genes implicated for exons, exonic regions, and exon fragments were converted from the current assembly coordinates (v6) to those used in the previous study (v5). For D. simulans we used data previously obtained from 170 individuals from this population (Signor et al. 2017c). To obtain estimates that were consistent with our dataset (which is gene regions) we extracted regions from genome-wide VCFs that corresponded to genes as annotated in the latest assembly. Note that we are considering gene regions here rather than trying to include regulatory regions for a number of reasons: 1) Gene regions include introns, which will include some regulatory regions and splice sites. 2) The location of regulatory regions for these genes is not well established, much less so between species where we do not know if they may have shifted locations. This approach is more conservative than including arbitrary amounts of putative upstream regulatory regions. 3) If there has been recent selection on a regulatory region it may still be linked to the gene region and show the same differences in polymorphism frequency. We excluded regions of reduced recombination near the centromeres, either 1 Mb or more if there were significant reductions in diversity for a broader region, calculated as extended negative Tajima’s D or values of π < ½ the chromosomal average (Sedghifar et al. 2016; Signor et al. 2017c). SNPs within these coordinates were separated using bedtools intersect (v2.26.0).
We excluded regions from the fourth chromosome and unassembled scaffolds for the analysis of population genetic differences. In both species this is a trivial number of significant regions. Alternate SNPs that were present in > 99% of the mapping population were excluded, as were SNPs with more than 10% missing data. Tajima’s D was calculated in windows of 1 kb using VCFtools v1.12a. Windows of Tajima’s D that overhung the ends of genes were included in the analysis.
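For readers unfamiliar with the statistic, the windowed calculation can be sketched in Python using the standard formula from Tajima (1989). This is not the VCFtools implementation: the function names are hypothetical, the sample size n (number of chromosomes) is assumed to be the same at every site, and the frequency and missing-data filters described above are not applied.

```python
import math


def tajimas_d(alt_counts, n):
    """Tajima's D from alternate-allele counts at segregating sites.
    alt_counts: alternate-allele count per site; n: sampled chromosomes."""
    sites = [k for k in alt_counts if 0 < k < n]
    s = len(sites)
    if s == 0 or n < 4:
        return None  # undefined without segregating sites or with tiny samples
    pairs = n * (n - 1) / 2
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(k * (n - k) / pairs for k in sites)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def tajimas_d_by_window(snps, n, size=1000):
    """snps: iterable of (1-based position, alt_count).
    Returns {window_index: D} for non-overlapping windows of `size` bp."""
    windows = {}
    for pos, k in snps:
        windows.setdefault((pos - 1) // size, []).append(k)
    return {w: tajimas_d(ks, n) for w, ks in sorted(windows.items())}
```

An excess of rare variants (for example, all singletons) gives negative values, while an excess of intermediate-frequency variants gives positive values, which is the pattern associated with balancing selection.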
We calculated per-gene DXY for the significant exons, exonic regions, and genes with significant changes in rank abundance, to determine if there were any unusual patterns of divergence for these subsets. To calculate divergence, comparable regions of the genome need to be identified, and there has been considerable evolution of transcription start sites between D. melanogaster and D. simulans (Main et al. 2013). Comparing regions annotated as the gene regions in each species introduces large and unexpected gaps in the start of the alignments. While the evolution of transcription start sites is of interest, if unaccounted for it will artificially inflate estimates of DXY. Thus for genes implicated in differences in D. melanogaster the regions annotated as genes were blasted using NCBI blastn (v2.4.0) to the D. simulans assembly and then back to the D. melanogaster assembly and used for divergence statistics, and vice versa for D. simulans genes. This means that for different genes small non-coding regions may be included depending upon the direction of evolution of transcription start sites, or small portions of coding regions may be excluded. Furthermore, for genes from D. simulans with more than one ortholog in D. melanogaster the top blast hit was used, and while there may have been orthologous sequence in D. melanogaster (or D. simulans), if there was no annotated ortholog it was excluded from the analysis. For example, some of the non-protein coding genes from D. simulans have very conserved BLAST hits in D. melanogaster that are not annotated as containing genes. The sequences were aligned using the R package DECIPHER (Wright 2016) and if necessary they were reverse complemented using Biostrings (Pages et al. 2017), for example in the region on 3R in D. melanogaster that is an inversion. Following alignment the D. simulans and D. melanogaster reads were output separately in aligned fasta format and DXY was calculated using an R script courtesy of Dr. Emily Delaney, which incorporates commands from the R packages ape and pegas (Supplemental File 4; Paradis et al. 2004; Paradis 2010).
This was done for every orthologous gene in the genome from D. melanogaster and D. simulans, and each significant subset from each component of variance, to compare distributions of divergence.
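As an illustration of the divergence statistic, DXY for one gene can be computed from two sets of aligned sequences as the mean proportion of differing sites over all between-species pairs. The Python sketch below is not the R script referenced above; the function name `dxy` is hypothetical, and skipping any site with a gap or ambiguous base in either sequence is an assumption.

```python
def dxy(seqs_x, seqs_y):
    """Mean per-site divergence between all pairs of aligned sequences,
    one drawn from each species. Returns None if no sites are comparable."""
    valid = set("ACGT")
    per_pair = []
    for x in seqs_x:
        for y in seqs_y:
            compared = 0
            diffs = 0
            for a, b in zip(x.upper(), y.upper()):
                # Skip gaps and ambiguous bases in either sequence.
                if a in valid and b in valid:
                    compared += 1
                    if a != b:
                        diffs += 1
            if compared:
                per_pair.append(diffs / compared)
    if not per_pair:
        return None
    return sum(per_pair) / len(per_pair)


# Toy alignment: two sequences from one species, one from the other.
d = dxy(["ACGT", "ACGA"], ["ACGT"])
```

Here one pair is identical and the other differs at one of four sites, so the mean is 0.125.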
Results
Gene expression and isoform usage
It is difficult to decouple alternative isoform usage from gene expression, given that many exons are shared between isoforms or overlap other exons. To infer isoforms from short read data, one must rely upon unique junctions or regions of individual isoforms and extrapolate to shared regions. This requires accurate isoform annotation (knowing that any given exon/junction is found in combination with other exons/junctions) and in general can be very noisy. Accordingly, we subscribe to a simpler but more robust approach and summarize the abundance of different exons and exonic regions separately. We detail overall abundance of exons, exonic regions, and exon fragments, and changes in the rank abundance of exons within a gene (Figure 1 C&D). As illustrated in Figure 1 C&D, these approaches summarize different features of expression and splicing – for example differences in the expression level of exon fragments from cabut suggest both that there are overall expression differences for the gene between environments, and that in both D. simulans and D. melanogaster one of the alternative isoforms is more abundant with ethanol (Figure 1C). While exons within Drat change their expression in response to ethanol by time, and between environments at 30 minutes, that doesn’t capture the fact that the third exon is most abundant in ethanol while the first is most abundant without it, suggesting differences in isoform abundance that may also belong to unannotated isoforms (Figure 1D). In the following sections we will first summarize changes in exon, exonic region, and exon fragment abundance followed by differences in rank abundance between environments.
Exons, exonic regions, and exon fragments
In D. simulans 1994 exons, 406 exonic regions, and 608 exon fragments changed their expression in response to genotype, while in D. melanogaster 1445 exons, 631 exonic regions, and 1135 exon fragments altered their representation. Note that the results for D. melanogaster are summarized in Signor & Nuzhdin (2018), but are included here for comparison. In response to ethanol, 76 exons, 23 exonic regions, and 18 exon fragments changed expression in D. simulans, compared to 15 exons, 13 exonic regions, and 21 exon fragments in D. melanogaster. Seven exons and exonic regions were shared between species for treatment: Drat, cabut, CG11741, CG32512, CG4607, Pino, and sugarbabe. For all discussion of shared genes, the particular exon or exonic region may or may not be the same, as well as the direction or nature of the change in expression. The complexity of this comparison is shown in Figure 1C, where in D. melanogaster cabut has three annotated transcripts and four exon fragments (one exonic region), while in D. simulans cabut there are two transcripts, one exon, and two exon fragments (one exonic region). In addition, in D. melanogaster cabut is nested within the gene ush, while in D. simulans it is not nested. In D. melanogaster an area annotated as an intron in D. simulans is unique to its third isoform, and this is more frequent with ethanol than without; in D. simulans there is one unique exon fragment belonging to one isoform which is more frequent with ethanol than without, suggesting that isoform is more common in ethanol environments (Figure 1C). In D. melanogaster the exon fragment which is unique in D. simulans is not unique, and only increased frequency of its unique third isoform can be inferred (Figure 1C). Fifteen exons and exonic regions and three fragments had no ortholog in D. melanogaster, 12 exons and exonic regions and one exon fragment of which were non-protein coding genes, and one exon fragment and one exon/exonic region of which are labeled as pseudogenes. The interaction between genotype and treatment was significant for 387 exons, 98 exonic regions, and 81 exon fragments in D. simulans and three exons, no exonic regions, and no fragments in D. melanogaster (Figure 2 A&B). Of these, 82 exons and exonic regions and 9 exon fragments originate from genes that do not have an ortholog in D. melanogaster, and 58 exons and exonic regions and seven exon fragments of these were non-protein coding, while ten exons and exonic regions and one exon fragment were pseudogenes. No fusions or fragments were shared between D. simulans and D. melanogaster.
24 exons, seven exonic regions, and six exon fragments were significantly different in response to the interaction between ethanol and time in D. simulans, compared to 22 exons, eight exonic regions, and 12 fragments in D. melanogaster. Three genes were shared between exons and exonic regions for these species, Drat, cabut, and CG43366. CG43366 is the Drosophila homolog of human Serpina2, which has previously been implicated in susceptibility to chemical dependence (Agrawal et al. 2008). In D. simulans 1158 exons, 299 exonic regions, and 225 exon fragments were significantly different for the interaction between genotype, ethanol, and time, while in D. melanogaster no exons, two exonic regions, and four exon fragments were significantly different (Figure 2 A&B). No exon fusions or fragments were shared between species for this component of variance. Of the exons and exonic regions 24 were pseudogenes, 172 were non-protein coding genes, and 265 had no ortholog in D. melanogaster. Among exon fragments 39 had no ortholog in D. melanogaster, of which 29 were non-protein coding genes and four were pseudogenes. None of those with orthologs were non-protein coding genes. At ten minutes nine exons, three exonic regions, and four exon fragments are significantly different between treatments, compared to two exons, one exonic region, and three exon fragments in D. melanogaster (Figure 2A). cabut is shared between exons and exonic regions in D. melanogaster and D. simulans. At 20 minutes three exons, one exonic region, and two exon fragments were different in D. simulans, two of which belong to cabut in both exons/exonic regions and exon fragments. In D. melanogaster one exon, four exonic regions, and five exonic fragments are different at this time point, and the gene cabut is shared between species for both categories. At 30 minutes 29 exons, four exonic regions, and six fragments in D. simulans were different between treatments, all of which have D. melanogaster orthologs. 
In D. melanogaster 46 exons, 24 exonic regions, and 20 exon fragments were significantly different between treatments at 30 minutes. Among exons and exonic regions Drat, CG32103, sugarbabe, cabut, Pinocchio, CG32512, and CG4607 are shared between species. Among exon fragments cabut and Pinocchio are shared.
Shared and unique exons, exonic regions, and fragments
Comparisons between D. simulans and D. melanogaster are shown in Figure 2C for the three components of variance in which enough genes are implicated in D. melanogaster to make the frequency of different categories meaningful. When an exon is unique/constitutive it belongs to the only annotated transcript for that gene, and as such no exonic regions are unique/constitutive. The combined counts are nevertheless shown for the common and constitutive categories. In D. melanogaster constitutive exons and exonic regions are much more common than in D. simulans, where unique/constitutive exons are overwhelmingly implicated. This could potentially be explained by differences in annotation, for example if fewer genes have multiple transcripts annotated in D. simulans. While this does appear to be the case (Supplemental Figure 1), it is unclear whether it is enough to explain the discrepancy between species. It is also possible that genes with alternative splicing are more important in the response to the environment in D. melanogaster than in D. simulans. The proportion of genes in each category in D. simulans is consistent between components of variance, including ethanol by genotype and ethanol by genotype by time, again suggesting the possibility that differences in annotation are responsible. If annotation differences are responsible, this does not affect the overall results (for example, the number of exons implicated), but it may make between-species comparison of the number of unique versus constitutive differences uninformative. It also underlines the importance of not relying upon isoform annotation when trying to understand differences in expression and alternative splicing.
GO enrichment analysis
We report here only the results of enrichment for exons and exonic regions (Table 2). In D. simulans the response to treatment, treatment by time, and the three time points were not enriched for any GO terms. Much of the lack of enrichment is likely due to annotation issues – for example 9% of the exons implicated in treatment over time could not be resolved to a single gene, and of the remaining genes 14% do not have a D. melanogaster ortholog. Of those with annotated orthologs, 20% do not have any gene ontology terms associated with them. A summary of significant GO terms for the remaining components of variance is shown in Table 2. In D. melanogaster exons and exonic regions were not significantly enriched for any category of genes. In general, the small number of genes implicated for many categories precludes any conclusion of enrichment.
Changes in rank abundance
In response to treatment 54 genes showed changes in rank abundance in D. simulans, two of which are without a D. melanogaster ortholog and one of which is non-protein coding. 94 genes change the rank abundance of their exons for the interaction between treatment and time, including seven genes with no ortholog in D. melanogaster, five of which are non-protein coding (three are pseudogenes). In D. melanogaster 71 genes changed the rank abundance of their exons in response to ethanol and 145 changed for treatment by time (Signor & Nuzhdin 2018). No genes were shared between species for the response to treatment. For the interaction between treatment and time lola, Mhc, and Prm were shared between species. lola is well established as being involved in the response to ethanol and is the Drosophila ortholog of ZBTB20, hypermethylation of which has been associated with major depressive disorder (Davies et al. 2014), alcohol-related cancer (Shi et al. 2018), and the development of fatty liver disease (Liu et al. 2017). Figure 1D illustrates an example of a change in rank abundance for the gene Drat in D. simulans, where with ethanol the third exon is most abundant, and without ethanol the first exon is most abundant. While the differences in exon abundance summarized above capture some of this variation, as some exons of Drat are significant for ethanol by time and at 30 minutes, the change in rank abundance highlights potential differences in exon inclusion between environments.
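The idea of a change in rank abundance can be illustrated with a minimal sketch. It orders the exons of one gene by abundance within each environment and reports whether the ordering differs. The exon names and abundance values are hypothetical toy numbers chosen to mimic the Drat pattern described above (first exon most abundant without ethanol, third exon most abundant with ethanol). The sketch does not reproduce the statistical test used to call significance, which is not described in this section.

```python
def rank_exons(abundance):
    """Return exon names ordered from most to least abundant.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    return sorted(abundance, key=lambda exon: (-abundance[exon], exon))


def rank_changed(control, ethanol):
    """True if the abundance ordering of exons differs between environments."""
    if set(control) != set(ethanol):
        raise ValueError("both environments must report the same exons")
    return rank_exons(control) != rank_exons(ethanol)


# Hypothetical toy abundances (not data from the study).
control = {"exon1": 120.0, "exon2": 40.0, "exon3": 75.0}
ethanol = {"exon1": 80.0, "exon2": 35.0, "exon3": 140.0}

print(rank_exons(control))             # ordering without ethanol
print(rank_exons(ethanol))             # ordering with ethanol
print(rank_changed(control, ethanol))  # whether the ordering differs
```

A gene can show a change in rank without any single exon differing strongly in abundance, which is why this summary is reported separately from exon-level expression differences.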
GO enrichment analysis
Changes in rank abundance were not enriched for any GO terms in D. simulans, with the exception of peroxiredoxin activity in response to ethanol, albeit slightly above the more conservative cut-off used in our other tests (p = 0.017). Peroxiredoxin activity has been associated with protection against alcohol-induced liver damage (Bae et al. 2011; Chattopadhyay et al. 2015). In D. melanogaster genes implicated in changes in rank abundance in response to ethanol by time were enriched for the cellular component term actin cytoskeleton, again somewhat above our more conservative cut-off (p = 0.018) (Signor & Nuzhdin 2018).
In D. simulans genotype-specific reactions to the environment are abundant
In D. melanogaster components of variance for interaction terms have very few significant genes, with the largest category being exons and exonic regions that respond to ethanol and that are different at 30 minutes. D. simulans is roughly comparable for the majority of these categories. However, many more exons and exonic regions are significant for components of variance that interact with genotype: 1457 for the interaction between genotype, treatment and time, and 486 for the interaction between genotype and treatment, compared to two and three exons and exonic regions respectively in D. melanogaster (Figure 2 A&B). This suggests that in D. simulans interactions with genotype are a far more important component of the response to ethanol than in D. melanogaster. It is also worth noting that in D. simulans the response to ethanol (plasticity) is more than three times as large (99 exons and exonic regions compared to 28 in D. melanogaster), though this difference is orders of magnitude smaller than the difference in genetic variation for plasticity.
Genotype-specific responses are enriched for non-coding genes in D. simulans
A large number of non-protein coding genes were implicated in gene expression changes in D. simulans in these analyses, so we applied a χ2 test to understand if our gene lists were enriched for this functional category. The numbers of non-protein coding genes that were significant for ethanol, ethanol by genotype, and ethanol by genotype by time were greater than would be expected by chance in D. simulans (13, χ2 = 49.598, p < 0.0005; 68, χ2 = 235.21, p < 2.2 x 10^-16; 196, χ2 = 727.17, p < 2.2 x 10^-16). Other than the pseudogenes, these are long non-coding RNAs (lncRNAs), as the shortest is annotated at 713 bp and the majority are over 4,000 bp (long non-coding RNAs being any non-protein coding genes over 200 nt). The interaction between treatment and time, and the differences in expression at ten, 20, and 30 minutes were not enriched for non-protein coding genes, nor were changes in rank abundance.
D. melanogaster is not enriched for non-protein coding genes in response to ethanol, ethanol by time, ethanol by genotype, ethanol by genotype by time, ten or 20 minutes, exons or exonic regions expressed only in one environment, or genes implicated in changes in rank abundance. However, at 30 minutes D. melanogaster is enriched for non-protein coding genes, though this concerns far fewer genes than in D. simulans (5, χ2 = 12.831, p = 0.005).
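As an illustration of this kind of enrichment test, the following minimal sketch computes a Pearson χ2 statistic for a 2 x 2 table (non-protein coding versus protein coding, significant versus not significant) with one degree of freedom. The exact table layout, any continuity correction, and the software used in the study are not stated here, so the layout below is an assumption, and the counts are hypothetical toy values rather than data from the study.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                     non-coding   coding
        significant      a          b
        background       c          d

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if min(margins) == 0:
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical toy counts (not data from the study).
statistic, p_value = chi2_2x2(20, 80, 100, 1800)
print(f"chi2 = {statistic:.2f}, p = {p_value:.3g}")
```

With very small expected counts an exact test would usually be preferred over the χ2 approximation; this sketch does not check for that.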
Non-coding genes that respond to the environment in D. simulans do not have D. melanogaster orthologs
Given that a considerable number of non-protein coding genes were implicated in this analysis for D. simulans, we suspected that the apparent lack of orthologs might be an annotation issue in D. melanogaster. Using the exons associated with every transcript from each of these non-protein coding genes as a reference, we did not find that D. melanogaster reads mapped to these exons across a range of relaxed mapping parameters allowing for mismatches and gaps (using bwa mem, from defaults to -B 2 -O 3; using both the D. simulans exons and existing homologous regions in D. melanogaster). In several cases, such as CG30377, an annotated protein coding gene in D. melanogaster is annotated as a pseudogene in D. simulans, with no noted relationship between them. Therefore there may be a small number of cases in which an ortholog exists but is not recognized, but in general these non-protein coding genes appear to be unique to D. simulans. It has been noted previously that lncRNAs can share high sequence similarity between related species, but be expressed in only one (Ulitsky 2016). Determining the orthology of these genes and the dynamics of non-protein coding gene evolution on the Drosophila phylogeny is an interesting question that will require additional research.
The response to the environment is enriched for nested non-protein coding genes in D. simulans
In analyzing the non-protein coding genes identified in D. simulans we noted that many of them were nested in the introns of other genes. We investigated the frequency with which nested genes, and nested non-protein coding genes in particular, were implicated in our analysis, to understand the possibility that differences in expression could also be changes in intron retention in the parental genes (as ‘nestedness’ generally refers to exons or entire genes found within the introns of other genes). Using the criterion that an exon nested in an intron must overlap the intron by at least 80 bp or 10%, we found that in D. melanogaster 9.2% of exons were nested within introns, while in D. simulans 9.7% were nested, similar to what has been previously reported (Lee & Chang 2013). This was reflected in our data, where for both D. melanogaster and D. simulans 6.8% of exons and exonic regions were nested within introns (lower because multi-gene exons were excluded). However, among significant exons and exonic regions in D. melanogaster 18.6% were nested, while in D. simulans 33.5% were nested. This is a significant enrichment of exons that are nested within the introns of other genes, for both D. melanogaster (χ2 = 12.344, p < 0.0004) and D. simulans (χ2 = 1721.1, p < 2.2 x 10^-16).
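The nesting criterion can be sketched as a simple interval-overlap check. This is a minimal illustration under stated assumptions: coordinates are half-open (start inclusive, end exclusive), and the 10% threshold is taken relative to the exon length, which the text does not specify. The function and parameter names are hypothetical.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def is_nested(exon, intron, min_bp=80, min_fraction=0.10):
    """Apply the 'at least 80 bp or 10%' overlap criterion.

    exon and intron are (start, end) tuples on the same chromosome.
    Assumption: the fraction is measured against the exon length.
    """
    exon_length = exon[1] - exon[0]
    if exon_length <= 0:
        raise ValueError("exon must have positive length")
    shared = overlap_bp(exon[0], exon[1], intron[0], intron[1])
    return shared >= min_bp or shared / exon_length >= min_fraction


# Hypothetical toy coordinates (not from either annotation).
print(is_nested((1000, 1200), (900, 2000)))   # exon fully inside the intron
print(is_nested((1000, 1200), (1190, 3000)))  # 10 bp overlap, 5% of the exon
```

Strand is deliberately ignored here, since the text treats strandedness relative to the parental gene as a separate question.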
The remaining question then is whether nested genes are more likely to be non-protein coding, or whether the dataset is enriched for both. Indeed, compared to the total number of nested genes that are noncoding within the dataset, the number that are significant for the response to ethanol is enriched for nested, non-protein coding genes in D. simulans (χ2 = 237.49, p < 2.2 x 10^-16), but not in D. melanogaster. Overall D. melanogaster has more annotated non-protein coding genes (2963) than D. simulans (1675), and similarly more exons from noncoding genes are nested within introns (1772 in D. melanogaster, 1066 in D. simulans), suggesting that this pattern is not reflective of annotation issues between the two species. However, the question remains as to why non-protein coding genes are so much more likely to be nested than other genes: in the total dataset of D. simulans 9.7% of exons are nested, while of the annotated non-coding genes in D. simulans 63% are nested. Furthermore, the importance of nested non-protein coding genes for the response to the environment, or potentially of delays in splicing that are specific to certain components of variance, is unclear.
Nested non-protein coding genes involved in the response to ethanol are regulated independently of their parental gene
In D. simulans 328 significant nested genes (375 nested exons and exonic regions) are on the opposite strand from their parental gene, 75% of the total significant nested genes. Among non-protein coding genes this bias is stronger, with 83% on the opposite strand from their parental gene. In D. melanogaster 13 significant genes are nested (16 total exons and exonic regions) and of these nine are on the opposite strand from their parental gene (69%). Only one non-protein coding gene shares strandedness with its parental gene, CR44660/drp1. It has been observed that many nested non-protein coding genes require expression of the parental gene, share strandedness, and depend on splicing out of the parental intron for activation (e.g. miRNAs; Boivin et al. 2018). In both D. simulans and D. melanogaster, however, we do not observe non-protein coding genes as being more likely to share strandedness with their parental gene among our significant nested exons and exonic regions. This suggests that whatever the reason for the observed enrichment in nested non-protein coding genes, it is likely not because the parental genes are being expressed and the nested non-protein coding genes are being spliced out of the introns. We could not test the D. melanogaster dataset for correlation between the expression of parental and nested genes as it is too small. However, in D. simulans genes that were both non-coding and shared strandedness with their parental gene had highly correlated expression differences between treatments (0.83, 14 genes). All other categories (e.g. nested non-protein coding, opposite strand) were essentially uncorrelated (0.09-0.27). We note that of the parental genes only three are also represented as significant in the main dataset: sima, mira, and bru-3 (significant in response to ethanol or genotype by ethanol).
Thus it is unlikely that overall the observed differences in expression are due to changes in the expression of the parental gene, as this would likely result in differences in expression detected at both loci.
Polymorphism and divergence in D. melanogaster and D. simulans
We were interested to determine if the patterns of polymorphism and divergence in the genes implicated in the response to ethanol suggested that they were evolving under a particular selection regime or were unusually diverged compared to background levels of polymorphism or divergence. If the genes implicated in expression differences in D. melanogaster had unusually low Tajima’s D, for example, this could suggest that directional selection for adaptation to ethanol is responsible for the observed expression differences. This might help to clarify whether the observed expression differences are adaptive or show evidence of adaptation. For both polymorphism and divergence in the following sections we compare the genome-wide background distribution of polymorphism and divergence to differences in expression and rank abundance as a whole, rather than by components of variance, for two reasons. First, for the majority of categories other than genotype interaction terms in D. simulans, the proportion of independent genes on each list is ~50%, confounding any attempt to separate out components of variance. Second, in D. melanogaster too few genes are significant for many components of variance for them to be considered separately.
Polymorphism
We calculated Tajima’s D in windows of 1 kb for all annotated gene regions in the genome and for the subset of genes implicated in differences in ethanol response, in both species, and consider any window in the upper or lower 2.5% of the genome-wide distribution to be an outlier. Outliers for D. simulans and D. melanogaster, separated between the X and the autosomes, are shown in Table 4, and the percentage of outliers for each category is graphed in Figure 3A.
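For readers unfamiliar with the statistic, the following minimal sketch computes Tajima’s D for a single window from aligned haplotype sequences, using the standard formula from Tajima (1989). It is an illustration only: the software used in the study is not named in this section, missing data and gaps are not handled, and the input sequences are hypothetical toy alignments.

```python
import math
from collections import Counter


def tajimas_d(haplotypes):
    """Tajima's D for one window of equal-length aligned haplotypes.

    Returns None when the window has no segregating sites.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences
    for column in zip(*haplotypes):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            differing_pairs = (n * n - sum(c * c for c in counts.values())) / 2
            pi += differing_pairs / n_pairs
    if seg_sites == 0:
        return None

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical toy windows (not data from the study).
rare_variants = ["TAAA", "ATAA", "AATA", "AAAT"]      # every variant is a singleton
balanced_variants = ["AAAA", "AAAA", "TTTT", "TTTT"]  # every variant at 50%
print(tajimas_d(rare_variants))      # negative: excess of rare variants
print(tajimas_d(balanced_variants))  # positive: excess of intermediate frequencies
```

Negative values reflect an excess of rare variants and positive values an excess of intermediate-frequency variants, which is why the direction of the outliers matters for the interpretation given in the text.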
D. melanogaster
For changes in abundance of exons and exonic regions on the X and autosomes, D. melanogaster is not enriched for outliers of Tajima’s D, but there are more outliers than expected in D. simulans (Autosomes: χ2 = 5.26, p < 0.02, X: χ2 = 29.361, p < 0.0005). In D. melanogaster these intervals correspond to 27 genes, while in D. simulans they cover a total of 25, including both the X and autosomes. Between the two lists there is considerable overlap in the location of outlier intervals, at CG43366, AOX4, dsb, fumble, CG31875, Mical, Ada2b, forked, CG1986, and CG42749.
For genes implicated in changes in rank abundance in D. melanogaster, there is also no significant enrichment for outliers of Tajima’s D relative to the genome-wide frequency in D. melanogaster or D. simulans. In D. melanogaster on the autosomes these intervals cover 59 genes, while in D. simulans they cover 56. The genes that contain outlier intervals for Tajima’s D in both D. simulans and D. melanogaster are 14-3-3ε, AcCoAs, boi, CG31522, CG34398, eff, eIF4EHP, fru, grh, kay, lds, lola, mRpL21, osp, siz, Syp, Tep2, and tweek. In D. simulans on the X chromosome these intervals cover four genes, while in D. melanogaster they cover nine, and two are in common between the species (CG34417, Ten-a).
D. simulans
Many more genes were implicated in the response to ethanol in D. simulans, and for exons and exonic regions that changed in abundance there was an excess of outliers for Tajima’s D for the autosomes and the X chromosome (Table 4; Figure 3; Autosomes: χ2 = 9.41, p = 0.0022, X: χ2 = 14.03, p = 0.00018). These intervals correspond to 428 unique genes, and many outlier intervals overlap the same gene region. For example, every 1 kb interval that overlaps the gene Cubilin is an outlier, with an average of −2.41. Polymorphisms in the human ortholog of Cubilin, CUBN, have been associated with lifetime rates of heavy drinking (Hamidovic et al. 2013). It is worth noting here that the outliers that were significantly enriched in D. simulans for the D. melanogaster dataset were all in the positive direction, while these are both positive and negative. Given the tendency towards positive Tajima’s D in this population of D. simulans, negative values of Tajima’s D may be a more suggestive measure of selection, either purifying or directional (Signor et al. 2017c). In D. melanogaster the genes that were implicated in changes in the expression of exons and exonic regions in response to ethanol in D. simulans had fewer outliers on the X chromosome than expected, and were not enriched on the autosomes (X: χ2 = 6.38, p = .0012).
For genes implicated in changes in rank abundance in D. simulans there was a significant enrichment compared to genome-wide frequency for both the autosomes and the X (Autosomes: χ2 = 19.78, p < 8.7 x 10^-16, X: χ2 = 458.73, p < 2.2 x 10^-16). This includes intervals in 44 unique genes, including 16 intervals in the gene Mondo. In D. melanogaster this did not constitute an enrichment relative to the genome-wide frequency. Outlier intervals that occur within shared genes between the two species include CG7881, lola, mbf1 (a stress response gene), Mhc, PI31, and Pka-1 (which has been implicated in the response to ethanol previously (Chen et al. 2008)). Note that for changes in the abundance of exons and exonic regions in D. simulans, if regions of low recombination are excluded from the candidate regions the enrichment is no longer significant, but there is no change among the other tests.
Divergence
We calculated DXY for orthologous genes on the autosomes (10,435 genes) and the X (1,943 genes) between D. melanogaster and D. simulans (Figure 3B). DXY was similar to that previously reported, with an average of 0.052 on the autosomes and 0.055 on the X (compared to 0.048 on the autosomes and 0.054 on the X (Nolte et al. 2013)). For the subsets of genes implicated in expression differences the mean was 0.047 on the autosomes and 0.057 on the X for expression differences in D. melanogaster and 0.053 on the autosomes and 0.06 on the X in D. simulans. The genes implicated in changes in rank abundance in D. melanogaster had a mean DXY of 0.047 on the autosomes and 0.05 on the X. In D. simulans this was 0.049 on the autosomes and 0.059 on the X. To determine if the distribution of DXY for any of the significant subsets of genes differed from the genome-wide distribution we used a Kolmogorov-Smirnov two-sample test, separately for genes on the X and the autosomes. In D. melanogaster exons and exonic regions that changed abundance in response to ethanol were not significantly different on the autosomes or the X. However, exons and exonic regions with significant differences in rank abundance did differ in distribution on the autosomes, but not the X (Autosomes: D = 0.14, p = 0.002). The genes implicated in expression differences in D. simulans differed significantly from the genome-wide distribution on the autosomes, but not on the X (Autosomes: D = 0.061, p = 0.005). However, those implicated in changes in rank abundance did not, on the autosomes or the X. A difference in distribution does not indicate the direction of effect, and in fact in D. melanogaster the mean DXY for changes in rank abundance is lower than genome-wide, while for expression differences in D. simulans it is slightly higher. This suggests that the genes implicated in changes in rank abundance in response to ethanol in D. melanogaster are somewhat more conserved than expected based on genome-wide distributions, while those implicated in expression differences in D. simulans are slightly less conserved. Note that because we can only include genes with orthologs here this does not include any of the abundant non-protein coding genes implicated in D. simulans, which are likely to be among the more rapidly evolving genes.
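The two-sample Kolmogorov-Smirnov statistic D reported above is the largest vertical distance between the empirical cumulative distributions of the two samples. The following minimal sketch computes only that statistic, without a p-value; in practice a library routine such as scipy.stats.ks_2samp would supply both. The DXY values are hypothetical toy numbers, and the accompanying comparison of means illustrates the point made in the text that D alone carries no information about direction.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum ECDF distance)."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest = 0.0
    # The maximum distance between two step functions occurs at a data point.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest = max(largest, abs(ecdf_a - ecdf_b))
    return largest


# Hypothetical toy DXY values (not data from the study).
genome_wide = [0.030, 0.045, 0.050, 0.055, 0.060, 0.070]
candidates = [0.050, 0.060, 0.065, 0.075]

d = ks_statistic(genome_wide, candidates)
direction = "higher" if mean(candidates) > mean(genome_wide) else "lower"
print(f"D = {d:.3f}; candidate mean is {direction} than genome-wide")
```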
Population genetic patterns, or a lack thereof, among genes that respond to ethanol are difficult to interpret, in part because plastic responses may be due to a single regulatory change at an upstream gene. As the number of regulatory (or coding) changes involved in the plastic response is not known, it is difficult to interpret a lack of enrichment; it is possible, for example, that only one of the genes whose expression changed is under selection. Furthermore, it is not known whether the response to ethanol is adaptive: while it is presumed to be so in D. melanogaster due to its relative fitness on ethanol compared to D. simulans, functional links between gene expression differences and the phenotype were not established.
Discussion
In this study both D. melanogaster and D. simulans have a plastic response to the environment, but D. melanogaster is lacking in genetic variation for plasticity. Given the observed patterns, it appears most likely that for D. simulans 15% ethanol is a novel environment, and that the marked G x E is a passive stress response. This maladaptive plasticity would allow for the expression of previously cryptic variation that had accumulated in the absence of selection. In D. melanogaster this environment may not be stressful or novel, resulting in past selection removing genetic variation for plasticity that in D. simulans was allowed to neutrally accumulate. In this sense D. melanogaster may be more ‘locally adapted’, as it has evolved a reduced plastic response compared to D. simulans. However, it could also be that because D. melanogaster colonized the Americas a few thousand years prior to D. simulans, D. simulans has not yet reached an optimal level of plasticity (a nonequilibrium situation). Also, the frequency with which they encounter ethanol rich environments may vary for D. simulans and D. melanogaster, resulting in different selection pressures for developing the optimal level of plasticity – particularly if there is a trade-off with traits that are beneficial to the semi-domestic habitat of D. simulans. Trade-offs among genetically correlated traits may mean that a genotype with optimal phenotypic plasticity in one environment is constrained from evolving the optimal phenotype in another environment, resulting in the maintenance of variation for phenotypic plasticity.
Joshi and Thompson (1997) found that in D. simulans there was greater between-family variation for the phenotypic response to ethanol substrate prior to selection for tolerance to ethanol. Compared to D. melanogaster and control populations, after selection in an ethanol environment the greatest change was a reduction in variation for plasticity between D. simulans families. Joshi and Thompson (1997) quantified phenotypes such as development time, which are not easily generalizable to expression patterns, but the result is suggestive. However, in studies of other phenotypes in presumably adapted and non-adapted populations of D. melanogaster, no increase in genotype by environment response was observed in non-adapted populations (Heckel et al. 2016). This could be due to differences in the degree of stress or novelty of different environments, or it is possible that the response in D. simulans is due to something other than stress-induced maladaptive plasticity.
The abundance of lncRNAs which are involved in genotype by environment interactions in D. simulans is perhaps not surprising, given that they have been implicated previously in the response to stress (Valadkhan & Valencia-Hipólito 2016). It is also possible they are more frequently involved because lncRNAs are often less conserved, as there is no requirement for the maintenance of ORFs and codon synonymy (Chodroff et al. 2010; Ulitsky et al. 2011; Quinn et al. 2016; Ulitsky 2016). It has been observed previously that transcription evolves more quickly than sequences, and lncRNA are commonly homologous to non-transcribed sequences in other species, however these species are typically more diverged than D. melanogaster and D. simulans (Main et al. 2013; Ulitsky 2016). If neutral variation was allowed to accumulate without selection, and then uncovered in a stressful environment, preferential accumulation within less constrained sequences would be expected. D. melanogaster is also enriched for lncRNAs for the response to ethanol at 30 minutes, which could be explained by increasing ethanol stress over time. While it is possible that lncRNAs are indicative of stress-induced maladaptive plasticity this cannot be separated from a more general involvement in the stress response, which would not necessarily occur as a result of maladaptive plasticity.
Why the lncRNAs are preferentially nested in D. simulans is less easily envisioned, though four potential scenarios for an increase in nested lncRNAs are depicted in Figure 4. The simplest explanation is only that there has been a change in the expression of nested lncRNAs, and perhaps nested lncRNAs are more commonly involved in less essential processes than other lncRNAs and are therefore less constrained. It is also possible that ethanol causes a change in the intron stability of the spliced transcript, causing increased (or decreased) detection of the nested lncRNA (Figure 4). It is also possible that a change in the processing of the parental gene occurred, causing a change in the number of reads mapping to unspliced introns, or that the parental gene simply changed expression. These latter two explanations would predict correlation between the parental gene expression and nested gene expression, which was only observed for the very small fraction of lncRNAs that shared strandedness with their parental gene. Thus it is more likely that either nested lncRNAs are less constrained, or ethanol alters the stability of introns during or after the process of splicing.
The population genetic patterns observed in these two species are not easily interpreted. D. melanogaster does not show an excess of outliers for Tajima’s D in either direction for changes in rank abundance or expression (with the exception of fewer outliers than expected on the X chromosome for D. simulans genes implicated in expression differences). D. simulans shows an excess of positive outliers at the genes implicated in differential expression in D. melanogaster, which could indicate that the genes involved in expression differences in D. melanogaster are subject to relaxed purifying selection or balancing selection in D. simulans. As these are all in the positive direction, it is worth noting that genome-wide D. simulans is biased towards positive values of Tajima’s D, likely due to recent population contraction (Signor et al. 2017c). While they are still more positive than expected given background levels of Tajima’s D, caution is warranted in interpreting these patterns as due to selection. Among genes with expression differences and changes in rank abundance in D. simulans, there is an enrichment of outliers for expression differences on the autosomes and X. This is due in large part to negative Tajima’s D outliers, which given genome-wide patterns are overall more suggestive of selection, in this case directional selection. It may be that D. simulans phenotypic plasticity is currently not in equilibrium, and there is selection for an optimal phenotypic response. This can be true whether or not the observed response is due to maladaptive plasticity: diversity of the passive stress response to ethanol implies that some responses will be more beneficial than others, and there may be selection against the less adaptive stress responses.
Between-species divergence (DXY) suggests that in D. melanogaster the genes involved in changes in rank abundance are less diverged than expected compared to background levels of divergence. In D. simulans the genes implicated in expression differences for exons and exonic regions are more diverged than expected based on background levels of divergence, which combined with being outliers for largely negative Tajima’s D could indicate that they have been important for adaptation in D. simulans. However, given that the number of nucleotide differences involved in the response to the environment is unknown (for example, all the observed patterns could be due to a single trans variant), it is difficult to interpret the results of Tajima’s D and DXY in terms of selection.
Inferring whether gene expression differences are adaptive or non-adaptive remains a major challenge in the study of gene expression reaction norms, given the lack of direct correlation between gene expression phenotypes and organismal phenotypes. However, the patterns observed in D. simulans do suggest maladaptive plasticity in response to ethanol exposure. In this scenario abundant genotype by environment interactions are expected to have accumulated neutrally and become uncovered in response to environmental stress. In contrast, in D. melanogaster this ethanol environment is not novel and maladaptive plasticity has been selected out in favor of an adaptive phenotypic response. lncRNAs are preferentially differentially expressed in D. simulans in response to ethanol either because they are less constrained and can accumulate more neutral variation, or because they are involved in the general stress response. It is also possible that environmental heterogeneity has caused D. simulans to maintain balanced polymorphisms for plasticity in a way that has not occurred in D. melanogaster, though we believe there is less evidence in favor of this interpretation. In the future, comparing African, non-ethanol-adapted populations of D. melanogaster to cosmopolitan populations may be a way of distinguishing between these hypotheses.
Competing interests
The authors declare that they have no competing interests.
Funding
This work was supported by grants GM102227 and MH09156.
Authors’ contributions
S.S. performed the experiments, analyzed the dataset, and wrote the manuscript. S.V.N. conceived of the experiment and assisted in writing the manuscript.
Acknowledgements
The authors would like to thank Jeremy Newman and Lauren McIntyre for contributions to the manuscript. The authors would also like to thank our undergraduate researchers for assistance in producing these data: N. Shadman, V. Paterson, Z. Polonus, K. Cortez, L. Hassanzadeh, L. Cline, A. Khokhar, E. Lee, K.L. Yee, M. Ling, S. Sarva, O. Akintonwa, A. Gupta, R. Manson, P. Hassanzadeh, and K. Kavoussi. Lastly, the authors would like to thank B. & C. Emery for thoughtful commentary.
Footnotes
* Communicating author: ssignor{at}usc.edu
D. melanogaster sequence data have been submitted to GenBank: accession number PRJNA482662. D. simulans sequence data will be made available upon acceptance.