Abstract
All but the simplest phenotypes are believed to result from interactions between two or more genes forming complex networks of gene regulation. Sleep is a complex trait known to depend on the system of feedback loops of the circadian clock, and on many other genes; however, the main components regulating the phenotype and how they interact remain an unsolved puzzle. Genomic and transcriptomic data may well provide part of the answer, but a full account requires a suitable quantitative framework. Here we conducted an artificial selection experiment for sleep duration with RNA-seq data acquired each generation. The phenotypic results are robust across replicates and previous experiments, and the transcription data provide a high-resolution, time-course data set for the evolution of sleep-related gene expression. In addition to a Hierarchical Generalized Linear Model analysis of differential expression that accounts for experimental replicates, we develop a flexible Gaussian Process model that estimates interactions between genes. We find 145 gene pairs with interactions that differ from controls. Our method is not only considerably more specific than standard correlation metrics but also more sensitive, finding correlations not significant by other methods. Statistical predictions were compared to experimental data from public databases on gene interactions.
Introduction
Despite the plethora of modern and increasingly refined molecular biology assays – from DNA to metabolites and beyond – systematically uncovering the molecular bases of phenotypes remains one of the thorniest challenges in biology. “Omics” approaches allow whole genome, transcriptome, proteome, and other “omes” to be generated and candidate genes to be fished out of these high dimensional data, but understanding how these biomolecules interact even in the simplest pathways requires painstaking follow-on experimentation, construction of databases, and an immense collective effort to make connections from disjointed assays into a coherent model. Despite the large number of studies and data generated for many systems, identifying underlying processes is still very rare; this is a clear indication that better methods are needed to obtain understanding of biological processes from data. For complex traits the task is even more difficult. Sleep is a complex phenotype the evolution of which remains a classic mystery in biology. Although sleep and sleep-like behavior are conserved among species, their main purpose is not completely understood, and hypotheses for that purpose span functions like conservation of resources (Berger and Phillips, 1995; Scharf et al., 2008; Schmidt, 2014), pruning of synapses and memory formation (Krueger and Obál, 1993; Tononi and Cirelli, 2014; Joiner, 2016; Ly et al., 2018), and management of metabolite and waste products (Xie et al., 2013; Hill et al., 2020). It is plausible that sleep is a manifestation of multiple functions, and that it involves the activity of many genes to regulate a complex higher-level function; indeed many genes have been implicated in sleep (Harbison et al., 2017, 2013; Laing et al., 2019; Dashti et al., 2019; Jones et al., 2016; Jansen et al., 2019; Lane et al., 2019; Hammerschlag et al., 2017; Diessler et al., 2018; Joshi et al., 2019; Boyle et al., 2017).
Assuming anything but the simplest possible model would therefore require a description that accounts for this complexity in the interactions of genes and gene products.
Artificial selection plus sequencing/resequencing is a powerful approach for identifying heritable variation in phenotypes and their underlying molecular bases (Schlötterer et al., 2015), typically assaying DNA or RNA expression in the initial and evolved populations and comparing them to controls (Faria et al., 2015, 2016). Coupling selection with gene expression identified candidate genes for diurnal preference (Pegoraro et al., 2020), olfactory behavior (Brown et al., 2017, 2020), food consumption (Garlapow et al., 2017), mating behavior (Mackay et al., 2005), resistance to parasitism (Wertheim et al., 2011), environmental stressors (Telonis-Scott et al., 2009; Sørensen et al., 2007), ethanol tolerance (Morozova et al., 2007), and aggressive behavior (Edwards et al., 2006). Caveats of that method include often not having molecular data on the intermediate generations, and relying on traditional statistical methods to assess the significance of polymorphic variants. In the case of gene expression, RNA levels are often modeled for each gene individually using linear models, without further consideration of the processes involved or interactions between genes. Inferring interaction between genes (as opposed to individual changes) requires observations of how the genes covary in time. Correlation or information theory-based methods (and others, reviewed in Emmert-Streib et al. (2012); Villaverde and Banga (2014); Liu (2015)) could be applied to estimate the relationship between the genes when that information is present, but neither is time course data usually available, nor are these methods standard in artificial selection experiments.
In this work we artificially selected Drosophila melanogaster for increased or decreased night sleep duration and sequenced the mRNA of the flies from each generation of selection. The selection procedure produced both long- and short-sleeping fly populations that deviated significantly from unselected controls. The RNA sequence data, which consisted of expression levels as a function of time (measured in generations), were analyzed using a Multi-Channel Gaussian Process (Melkumyan and Ramos, 2011; Bonilla et al., 2008) in which each gene is described by one “channel”, and the relationships between genes are estimated through an underlying covariance structure in the model. We describe the expression of 85 genes whose expression changed significantly across generations under the long or short selection schemes in both males and females. We used this model to infer the magnitude of all 3,570 possible pairwise interactions among these genes. Results from this analysis and comparison to unselected controls suggest that multiple shifts in interactions underlie the increase and decrease of night sleep duration, with 145 interactions not being observed in the controls.
Methods and Materials
Construction of outbred population
We constructed an outbred population of flies using ten lines from the Drosophila Genetic Reference Panel (DGRP) (Mackay et al., 2012; Huang et al., 2014) with extreme night sleep phenotypes (Harbison et al., 2013). Five lines had the shortest average night sleep for both males and females combined in the population: DGRP_38, DGRP_310, DGRP_365, DGRP_808, DGRP_832. The other five lines had the longest average night sleep in the population: DGRP_235, DGRP_313, DGRP_335, DGRP_338, and DGRP_379. The ten lines were crossed in a full diallel design, resulting in 100 crosses. Two virgin females and two males from the F1 of each cross were randomly assigned into 20 bottles, with 10 males and 10 females placed in each bottle. At each subsequent generation, 20 virgin females and 20 males from each bottle were randomly mixed across bottles to propagate the next generation. The census population size was 800 for each generation of random mating. This mating scheme was continued for 21 generations, resulting in the Sleep Advanced Intercross Population, or SAIP (Harbison et al., 2017; Serrano Negron et al., 2018). The SAIP was maintained by pooling the flies from each bottle together, then randomly assigning 20 males and 20 females to each bottle each generation.
Artificial selection procedure for night sleep
At generation 47 of the SAIP, we began the artificial selection procedure, which we defined as generation 0. We seeded six bottles with 25 males and 25 females mixed from all bottles of the outbred population. Two replicate bottles were designated for the short-sleeping protocol (S1 and S2), two for the long-sleeping protocol (L1 and L2), and two for a control (unselected) protocol (C1 and C2). Each generation, 100 virgin males and 100 virgin females were collected from each of the six population bottles. Virgins were maintained at 20 individuals to a same-sex vial for four days to control for the potential effects of social exposure on sleep (Ganguly-Fitzgerald et al., 2006). Flies were placed into Trikinetics (Waltham, MA) sleep monitors, and sleep and activity were recorded continuously for four days. We used an in-house C# program (R. Sean Barnes, personal communication) to calculate sleep duration, bout number, and average bout length during the night and day, as well as waking activity. We also calculated sleep latency, defined as the number of minutes prior to the first sleep bout after the incubator lights turn off. In addition, we computed the coefficient of environmental variation (CVE) for each sleep trait as the standard deviation in each replicate population (σ) divided by the mean (μ), multiplied by 100 (Mackay and Lyman, 2005).
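The CVE definition above can be sketched in a few lines. This is a minimal illustration, not the in-house program; the function name and the example values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, stdev

def coefficient_of_environmental_variation(values):
    """Return CVE = 100 * sigma / mu for one trait in one replicate population."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1); an assumption
    return 100.0 * sigma / mu

# Hypothetical night sleep durations (minutes) for a handful of flies
night_sleep = [510.0, 540.0, 495.0, 560.0, 525.0]
cve = coefficient_of_environmental_variation(night_sleep)
```

Because the CVE is a ratio, it can rise either through a larger standard deviation or through a smaller mean, which matters when interpreting its trajectory in populations selected for short sleep.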
All sleep traits including night sleep duration were averaged over the four-day period. For the short (long)-sleeping populations, we chose the 25 males and 25 females in each replicate population having the lowest (highest) average night sleep as parents for the next generation. Any flies found dead were discarded, and the next shortest (longest)-sleeping fly was used in order to ensure that 25 females and 25 males were used as parents. For the control populations, we chose 25 males and 25 females at random to start the next generation. Flies were not mixed across replicate populations. We repeated this procedure for 13 generations.
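The parent-selection rule described above can be sketched as follows for one sex in one replicate population. This is an illustrative sketch only: the function name, the dictionary layout, and the use of `None` to mark dead flies are hypothetical choices, not part of the original protocol.

```python
import random

def choose_parents(sleep_by_fly, scheme, n_parents=25, rng=None):
    """Pick parents of one sex within a single replicate population.

    sleep_by_fly: dict mapping fly id -> four-day average night sleep (minutes),
                  or None if the fly was found dead (hypothetical encoding).
    scheme: "short", "long", or "control".
    """
    # Dead flies are discarded, so the next-ranked fly takes their place
    alive = {fly: sleep for fly, sleep in sleep_by_fly.items() if sleep is not None}
    if scheme == "control":
        rng = rng or random.Random()
        return sorted(rng.sample(sorted(alive), n_parents))
    # Ascending order for short sleepers, descending order for long sleepers
    ranked = sorted(alive, key=alive.get, reverse=(scheme == "long"))
    return ranked[:n_parents]
```

Calling the function separately for males and females of each replicate mirrors the rule that flies were not mixed across replicate populations.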
Quantitative genetic analyses of selected and correlated phenotypic responses
We analyzed the differences in night sleep among selection populations as well as other potentially correlated sleep traits using a mixed analysis of variance (ANOVA) model: where Y is the phenotype; μ is the overall phenotypic mean; Sel, Sex, and Gen are the fixed effects of selection scheme (short- or long-sleeper), sex, and generation, respectively; Rep is the random effect of replicate population; and ε is the error term. The CVE traits were assessed using the same model with the replicate terms removed. A statistically significant Sel term indicates a response of the trait to selection for night sleep; a significant Sel × Sex term indicates a sex-specific response to selection. We repeated the analysis for sexes separately using the reduced model where the terms are as defined above. We also analyzed the response to selection in each generation separately using the reduced model and the reduced model for each sex separately per generation.
Finally, we analyzed the change in sleep parameters over generations in the control populations using the model where each factor is as defined above.
RNA extraction and sequencing
As described above, sleep was monitored in 100 virgin males and 100 virgin females each generation. Twenty-five flies of either sex were used as parents for the next generation, leaving 75 flies of each sex in each selection and control population. Four pools of 10 flies of each sex were chosen at random from these 75 flies and frozen for RNA extraction at 12:00 pm. RNA was extracted from two of these pools; the remaining two pools were kept as back-up samples and used if needed. Samples were collected for the initial generation (0), and all subsequent generations. RNA was extracted using Qiazol (Qiagen, Hilden, Germany), followed by phenol-chloroform extraction, iso-propanol precipitation, and DNase digestion (Qiagen, Hilden, Germany). Qiagen RNeasy MinElute Cleanup kits (Qiagen, Hilden, Germany) were used to purify RNA according to the manufacturer’s instructions. With the exception of generation 1, which had RNA that was degraded, RNA from all other generations was sequenced. This produced 312 RNA samples (6 populations × 13 generations × 2 sexes × 2 replicate RNA samples).
Poly-A selected stranded mRNA libraries were constructed from 1 μg total RNA using the Illumina TruSeq Stranded mRNA Sample Prep Kits (Illumina, San Diego, CA) according to manufacturer’s instructions with the following exception: PCR amplification was performed for 10 cycles rather than 15 in order to minimize the risk of over-amplification. Unique barcode adapters were applied to each library. Libraries were pooled for sequencing. The pooled libraries were sequenced on multiple lanes of an Illumina HiSeq2500 using version 4 chemistry to achieve a minimum of 38 million 126 base read pairs. The sequences were processed using RTA version 1.18.64 and CASAVA 1.8.2.
RNA alignment of reads
Sequences were assessed for standard quality parameters using fastqc (0.11.4) (Babraham Institute, Cambridge, UK). Reads were aligned to the FB2015_04 Release 6.07 reference annotation of the Drosophila melanogaster genome using STAR (Dobin et al., 2013). Default parameters were used except that the minimum intron size was specified as 2, and the maximum intron size was specified as 268,107, consistent with the largest intron size in the D. melanogaster genome. STAR outputs aligned sequence to a SAM file format, which contains the code ‘NH’ (Dobin et al., 2013). An NH of 1 indicates a uniquely mapped read, while NH > 1 indicates that the read did not map uniquely. HTSeq was used to count only the uniquely mapped reads (NH = 1) (Anders et al., 2015).
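The role of the NH tag can be illustrated with a small parser for a single SAM alignment line. This is a sketch of the tag logic only, not of how HTSeq is implemented; the function name and the shortened example records are hypothetical, and treating a missing NH tag as "not unique" is an assumption.

```python
def is_uniquely_mapped(sam_line):
    """Return True when a SAM alignment line carries the optional field NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow
    for field in fields[11:]:
        if field.startswith("NH:i:"):
            return int(field[5:]) == 1
    return False  # no NH tag: treated as not unique here (an assumption)

# Hypothetical, shortened alignment records
unique_read = "read1\t0\t2L\t1000\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1\tHI:i:1"
multi_read = "read2\t256\t3R\t500\t3\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:2\tHI:i:1"
```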
Principal Component Analysis (PCA)
It was expected from previous studies of gene expression that there would be large differences in gene expression due to sex (Lin et al., 2016; Jin et al., 2001; Arbeitman et al., 2002; Parisi et al., 2003; Harbison et al., 2005; Wayne et al., 2007; Zhang et al., 2007; Ayroles et al., 2009; Huylmans and Parsch, 2014; Huang et al., 2015). We performed Principal Component Analysis to assess those differences (Supplementary Figure S1). The principal components of the normalized RNA-seq count matrix were computed, with each gene being treated as a different variable, and each sample a different observation. Samples were projected onto the planes of the first three components, and clustering according to the experimental labels was inspected visually.
Gene normalization and filtering
The combined genic and intergenic counts were normalized by the expression of a pseudo-reference sample computed from the geometric mean of all samples, using the method described by Love et al. (2014). Filtering was performed by computing the 95th percentile of the distribution of normalized, log2-transformed expression levels in the intergenic regions for males and females and using those values as cut-off levels for the genic regions; i.e., any gene that did not have expression above this level in at least one sample was removed from further analyses (Zhang et al., 2010). The (linear scale) cutoff expression value was 48.6 for males and 102 for females.
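The normalization and filtering steps can be sketched as below. This is a simplified illustration of the median-of-ratios idea behind the Love et al. (2014) scheme, not the implementation used in the analysis; all function names are hypothetical, skipping genes with any zero count in the pseudo-reference is an assumption, and the percentile convention (linear interpolation) is one of several in use.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one entry per sample).
    """
    # Only genes with non-zero counts in every sample enter the pseudo-reference
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    n_samples = len(counts[0])
    # Log of the per-gene geometric mean across samples (the pseudo-reference)
    log_ref = [sum(math.log(c) for c in gene) / n_samples for gene in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(gene[j]) - ref for gene, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def percentile(values, q):
    """Percentile (q in [0, 100]) with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    low, high = math.floor(pos), math.ceil(pos)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)

def passes_filter(normalized_gene, cutoff):
    """Keep a gene if at least one sample exceeds the intergenic cutoff."""
    return any(level > cutoff for level in normalized_gene)
```

Dividing each sample's counts by its size factor gives the normalized levels; the cutoff would then be `percentile(intergenic_levels, 95)` computed separately for males and females.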
Generalized Linear Model analysis of expression data
Analysis of differential expression between selection schemes was initially performed for each gene independently. Given the separation of the expression levels by sex seen in the PCA analysis, analyses were conducted separately for the subsets of male or female flies.
We implemented a generalized linear model (GLM) with a hierarchical structure to account for non-independent, replicate-specific parameters. The description is similar to a generalized linear mixed model (GLMM), but uses a Bayesian formulation to specify the hyper-priors and is fully described below. Normalization of the RNA levels was performed using the scheme described by Love et al. (2014). A negative binomial likelihood was used and parameterized with the mean (given by the prediction of the linear model) and dispersion parameters; the number of samples (156 for each sex) allowed estimation of the latter together with the model coefficients, dispensing with the need for other schemes, commonly implemented in some packages, that are applied when the number of samples is small.
Bayesian inference was used and parameter priors were exploited to treat replicate effects in a hierarchical formulation (Gelman et al., 2013). Specifically, for each replicate-dependent parameter (say $\beta_{short,rep}$), two parameters were specified at the top level ($\mu_{short}$ and $\sigma_{short}$), given (hyper-)priors, and estimated from the data together with all other parameters. Below that, both replicate-specific model parameters ($\beta_{short,1}$ and $\beta_{short,2}$) are given the same Gaussian prior using the top-level parameters (e.g. $\beta_{short,1} \sim \mathcal{N}(\mu_{short}, \sigma_{short})$ for that coefficient in replicate 1, and likewise in replicate 2). Under this formulation the full model for the expression of a gene $j$ is given by $\log \mu_j \propto \mathrm{sel}_{rep} + \mathrm{gen} + \mathrm{sel} \times \mathrm{gen}_{rep}$, where a relationship between each set of replicate-dependent parameters is enforced hierarchically through their higher level common parameters and hyper-priors. Explicitly, we have: where $X$ is the design matrix, with binary 0/1 variables indicating parameters that apply to specific treatments (e.g. the entries multiplying $\beta_1$ and $\beta_2$ are present for all, the entry multiplying $\beta_{short,1}$ is present for short sleepers from replicate 1, etc.) except for parameters dependent on the gen variable, which takes the value of the generation (e.g. 0 through 13 for the entries multiplying the $\beta_{gen}$ parameter in all treatments, and for those multiplying $\beta_{short \times gen,1}$ for short sleepers from replicate 1, etc.). Table 1 lists all parameters, their descriptions, design matrix values associated with them, and priors.
Maximum a posteriori probability (MAP) estimates and confidence intervals were obtained using the Stan package (Carpenter et al., 2017). Significance was calculated using a likelihood ratio test comparing the point estimates from the full model to a reduced model not including the interaction terms (i.e. $\log \mu_{j,rep} = \mathrm{sel}_{rep} + \mathrm{gen}$). Model p-values were corrected for multiple testing using the Benjamini-Hochberg method (Benjamini and Hochberg, 1995), with significance defined at the 0.001 level.
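The multiple-testing correction can be illustrated with a short sketch of the Benjamini-Hochberg step-up adjustment. The function name and the example p-values are hypothetical; the sketch shows the standard procedure and is not taken from the analysis code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene likelihood ratio test p-values
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A gene would then be called significant when its adjusted p-value falls below the chosen threshold (0.001 in the text).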
Calculation of non-parametric correlations between genes
The correlation coefficient ($\rho$) between any pair of genes can be computed directly from the data. Pearson correlation assumes the relationship between the two variables is linear, while Spearman correlation is rank-based and therefore accommodates non-linear relationships, although it still assumes the relationship is monotonically increasing or decreasing. We therefore computed Spearman correlations between genes that were found to be significant for both males and females in the GLM analysis; one correlation coefficient was obtained for the data subset from each sex-selection combination. The significance of each correlation coefficient is tested using the null hypothesis that $\rho = 0$. Because the main interest is in interactions between genes in the selected populations that differ from controls, we compare the coefficients by computing and comparing the confidence intervals for $\rho_{sel}$ (where sel can be “short” or “long”) and $\rho_{control}$ using the normal approximation to $\mathrm{arctanh}(\rho)$ (Ruscio, 2008). We note that this is not exactly equivalent to testing the null hypothesis that $\rho_{sel} = \rho_{control}$ (Austin and Hux, 2002) (which relies on computing the confidence interval for $\rho_{sel} - \rho_{control}$ using the same method), since it over-estimates the total variance (i.e., one would find fewer significant instances). Nevertheless, the approach is valid and more broadly applicable, in that it can be computed when a joint distribution of the two variables cannot be obtained; we use the term “significant” for either kind of difference, but explicitly state which one is used.
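The comparison of confidence intervals can be sketched as follows. This is a minimal illustration under stated assumptions: all function names are hypothetical, and the standard error 1/sqrt(n − 3) is the usual Fisher approximation for Pearson correlations, used here as a simplifying assumption (other variance estimates have been proposed for Spearman coefficients).

```python
import math
from statistics import NormalDist

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def fisher_interval(rho, n, level=0.95):
    """Confidence interval from the normal approximation to arctanh(rho)."""
    z = math.atanh(rho)                      # undefined for |rho| = 1
    se = 1.0 / math.sqrt(n - 3)              # an assumption, see lead-in
    half = NormalDist().inv_cdf(0.5 + level / 2.0) * se
    return math.tanh(z - half), math.tanh(z + half)

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]
```

Under this scheme a gene pair would be flagged as different from controls when the interval for the selected population does not overlap the interval for the control population, which is the more conservative of the two criteria discussed above.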
Gaussian Process regression
Gaussian Processes (GP) are an alternative function-space formulation to the well-known weight-space linear models of the form y = f(x) + ε; their use dates back to the 19th century and they have been covered extensively in the statistical and information theory literature (MacKay, 2003), becoming popular in machine learning applications (Bishop, 2006; Rasmussen and Williams, 2006), and more recently implemented in less technical contexts like the life sciences (Schulz et al., 2018). We give a brief overview of their usefulness, motivate their use in this work, and point to the references above for a formal description of the method.
The weight-space linear model expresses the observations in terms of explicit linear coefficients (or weights) of the independent variable, $x$, possibly with further basis function expansions (e.g. square, $x^2$, or higher order polynomials, $x^n$), for instance $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ (where $\varepsilon$ is normally distributed noise). Gaussian Processes describe the basis functions implicitly instead, with $y \sim \mathcal{N}(\mu, K)$; that is, a set $y$ of $N$ observations is distributed according to a multivariate normal distribution with mean given by the vector $\mu$ (of size $N$) and covariance between the values of $x$ given by the matrix $K$ (with dimension $N \times N$). The entries of this matrix in row $i$, column $j$ are defined by some covariance function such that $k_{ij} = \mathrm{cov}(x_i, x_j)$; if the covariance function is linear in the values of $x$, for instance, the prediction for $y$ is a straight line similar to $y = \beta_0 + \beta_1 x$. Formulating the model in function space enables the use of flexible sets of basis functions; this approach of describing the basis functions only implicitly, thus avoiding specification of a potentially large basis, is called the “kernel trick”. Functions like the commonly used squared exponential kernel can be shown to be equivalent to an infinite number of basis functions (Rasmussen and Williams, 2006), and therefore cannot be incorporated in the explicit terms of the weight-space formulation.
While Gaussian Processes are a classic formulation in statistics, the recent surge in machine learning applications has popularized their use in the natural sciences. They have been used to analyze gene expression by using their flexible output in combination with ordinary differential equations (Honkela et al., 2010; Äijö et al., 2013; Aalto et al., 2020), with clustering approaches (McDowell et al., 2018), within other regression models (Kontio and Sillanpää, 2019), or modeling spatial covariance (Arnol et al., 2019). In the context of our experimental design Gaussian Process Regression could be used as a flexible alternative to GLMs, with each selection scheme having a different mean function $\mu_{sel}$ and a squared exponential covariance function $k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)$, where $x$ takes the values of the generations in our experiment. The exponentiated term gives the correlation $c(x, x')$ between a pair of time points, with parameter $\ell$ modulating the correlation level given a distance $r = x - x'$, and $\sigma_f^2$ being the signal variance of the data. Under this model, unlike with the GLM analysis, the change in RNA-seq counts is a function not of slope coefficients but of the signal variance $\sigma_f^2$. It is worth noting that the signal variance is a scalar constant for all terms in the covariance matrix, so it can also be written as $K = \sigma_f^2 C$, where $C$ is analogous to $K$ but with correlations instead of covariances, a notation that will be useful shortly.
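The single-channel covariance structure can be made concrete with a short sketch. It assumes the standard squared exponential form, k(x, x′) = σ² exp(−(x − x′)² / (2ℓ²)), as given in Rasmussen and Williams (2006); the function names and the parameter values in the usage note are hypothetical.

```python
import math

def correlation_matrix(x, length_scale):
    """C[i][j] = exp(-(x_i - x_j)^2 / (2 * length_scale^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2)) for b in x]
            for a in x]

def covariance_matrix(x, signal_variance, length_scale):
    """K = signal_variance * C, with C the correlation matrix above."""
    return [[signal_variance * c for c in row]
            for row in correlation_matrix(x, length_scale)]

# Generations with RNA-seq data: 0 and 2 through 13 (generation 1 was not sequenced)
generations = [0] + list(range(2, 14))
```

For example, `covariance_matrix(generations, 3.0, 2.0)` returns a 13 × 13 matrix whose diagonal equals the signal variance and whose off-diagonal entries decay with the distance between generations at a rate set by the length scale.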
Multi-channel Gaussian Processes
Despite the extensive use of Gaussian Processes, most applications in the life sciences have been restricted to single-channel GPs; that is, models that only describe one set of observations at a time (here the expression time series for a single gene). These models, in this aspect not unlike GLMs, describe the expression of genes independently, i.e. they implicitly assume genes do not interact in any way. Gaussian Processes can however be extended to include covariance between two or more sets of observations, a formulation that seems to be underexploited in the biological literature (but see Velten et al. (2020) and Bahg et al. (2020)). The different dependent variables $y_i$ are sometimes called channels or tasks, and the resulting model is called a multi-task or multi-channel Gaussian Process. The details of the specification of this model can be found in Bonilla et al. (2008) and Melkumyan and Ramos (2011), which we summarize below. For an array of two genes only, for instance, instead of describing each vector $y_1$ and $y_2$ separately as multivariate Gaussians of dimension $N_1$ and $N_2$, respectively, the concatenated vector $[y_1\ y_2]^T$ with $N_1 + N_2$ observations can be modeled as a single multivariate Gaussian with a covariance matrix $K$ of dimensions $(N_1 + N_2) \times (N_1 + N_2)$, or $[y_1\ y_2]^T \sim \mathcal{N}(\mu, K)$. The diagonal blocks of the covariance matrix with dimensions $N_1 \times N_1$ and $N_2 \times N_2$ are the same as above, and the off-diagonal blocks of dimensions $N_2 \times N_1$ and $N_1 \times N_2$ specify the correlations between points $i$ and $j$ from channels 1 and 2 (Melkumyan and Ramos, 2011). Finally, the signal variance for each of those blocks needs to be specified, and the final matrix is given by $K = \begin{bmatrix} \sigma_{f,11}^2 C_{11} & \sigma_{f,12}^2 C_{12} \\ \sigma_{f,21}^2 C_{21} & \sigma_{f,22}^2 C_{22} \end{bmatrix}$ (Bonilla et al., 2008), and the mean of the multivariate Gaussian is specified by a concatenated vector $\mu = [\mu_1\ \mu_2]^T$. The number of parameters is reduced by recognizing that the covariance matrix is symmetric, so in this example $\sigma_{12}^2 = \sigma_{21}^2$, where we also dropped the subscript $f$.
For this model, the variation in the RNA levels of, say, gene 1 is a function not only of $\sigma_{11}^2$, but also of $\sigma_{12}^2$. Therefore, fitting the data with this model infers interactions between genes from scratch, without any external information not contained in the array of RNA-seq counts.
The model can be extended to any number of genes, although computational requirements for performing the necessary matrix operations on K also grow with its size and may be limiting – the computational and mathematical limitations of this approach are discussed in the appendix.
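The block structure of the two-gene covariance matrix can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function and parameter names are hypothetical, and the cross-channel correlation uses one closed form for squared exponential kernels with different length scales, obtained by convolving Gaussian smoothing kernels, which reduces to the single-channel kernel when the two length scales are equal.

```python
import math

def se_block(xa, xb, ell_a, ell_b):
    """Correlation block between inputs xa and xb (assumed cross-kernel form)."""
    denom = ell_a ** 2 + ell_b ** 2
    scale = math.sqrt(2.0 * ell_a * ell_b / denom)  # equals 1 when ell_a == ell_b
    return [[scale * math.exp(-((a - b) ** 2) / denom) for b in xb] for a in xa]

def two_channel_covariance(x1, x2, var1, var2, cov12, ell1, ell2):
    """Assemble the (N1 + N2) x (N1 + N2) covariance matrix for two genes.

    var1, var2: signal variances of genes 1 and 2 (diagonal blocks).
    cov12: signal covariance shared by both off-diagonal blocks (symmetry).
    """
    c11 = se_block(x1, x1, ell1, ell1)
    c22 = se_block(x2, x2, ell2, ell2)
    c12 = se_block(x1, x2, ell1, ell2)
    c21 = [list(column) for column in zip(*c12)]  # transpose of the 1-2 block
    top = [[var1 * c for c in r11] + [cov12 * c for c in r12]
           for r11, r12 in zip(c11, c12)]
    bottom = [[cov12 * c for c in r21] + [var2 * c for c in r22]
              for r21, r22 in zip(c21, c22)]
    return top + bottom
```

Setting `cov12` to zero recovers two independent single-channel models, which is the sense in which the off-diagonal blocks carry the inferred interaction between the two genes.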
Bayesian MCMC inference of Gaussian Processes
Analogously to GLM models, we maintain the negative binomial likelihood for the Gaussian Process inference, but unlike the transition between linear models and their generalized versions, the incorporation of non-gaussian likelihoods is not as straightforward, and requires methods to approximate the underlying latent Gaussian Process model, leading to what is sometimes referred to as Gaussian Process Classification (Rasmussen and Williams, 2006). Because of the Bayesian inference implemented for this model we chose to infer the latent function via Markov Chain Monte Carlo sampling as these variables can be estimated jointly with the other parameters and have priors that by design are standard gaussian, and therefore are straightforward to specify. Table 2 gives the description of all parameters in the Multi-Channel Gaussian Process model and their priors.
The number of covariance parameters in a multi-channel Gaussian Process model with $M$ channels is $(M^2 - M)/2$, and the total number of parameters scales roughly as $\mathcal{O}(M^2)$ as the number of channels becomes large. For 100 genes, for instance, that would result in about 5,000 covariances. Due to the statistical challenge of exploring a parameter space with a dimension of several thousand, as well as the computational demand of factorizing a large matrix at each MCMC step, the estimation of the signal covariance parameters between genes was not performed jointly. Instead, each pair of genes was fitted separately, with a single-channel Gaussian Process first being used to estimate the signal variance and bandwidth parameters for each gene, and this estimate being used as a prior for the (pairwise) joint inference. This procedure effectively breaks down a Gaussian Process inference of any size into several smaller inference problems requiring factorization of a matrix of size $2N$, with a total number of parameters of the order of $N$, which are computationally much more manageable and can be run in parallel. Because the covariance parameters depend only on the relationship between two variables (here, genes), separate estimation does not affect inference of the parameters; in fact, it removes the constraint of positive-definiteness on the matrix of covariances of all genes (which instead applies to the matrix of two genes only, see Appendix I).
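The counting argument and the pairwise decomposition can be checked with a few lines. The gene labels are placeholders; only the counts (85 genes, 100 genes) come from the text.

```python
from itertools import combinations

def n_pairwise(m):
    """Number of distinct covariance parameters between m channels: (m^2 - m) / 2."""
    return (m * m - m) // 2

# Placeholder labels for the 85 genes analyzed; each pair is one separate fit
genes = [f"gene_{i}" for i in range(85)]
pairs = list(combinations(genes, 2))
```

Here `len(pairs)` equals `n_pairwise(85)`, i.e. 3,570 independent two-gene problems, while `n_pairwise(100)` gives 4,950, the "about 5,000" covariances mentioned above. Because each pair is fitted on its own, the list of pairs can be distributed across parallel workers.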
Eight parallel chains were run for each estimation with 40 thousand samples each; half were excluded as warm-up and 1 out of every 40 was kept for further calculations. Convergence was assessed using the $\hat{R}$ metric and by observing the number of effective samples (ESS) (Gelman et al., 2013). The annotated model implemented in the Stan probabilistic language is made available in the supplementary material. Because inference was done separately for each selection scheme, differences between them were assessed by comparing the posterior distributions of the parameters of interest.
Results
Phenotypic response to artificial selection
The selection procedure for night sleep was very effective. Long-sleeper and short-sleeper populations had significant differences in night sleep across all generations ($P_{Sel}$ = 0.0003); in fact, night sleep differed between the two selection schemes in each generation considered separately, except for generations 0 and 1 (Supplementary Tables S1 and S2). Both males and females responded equally to the selection procedure. Figure 1A shows the phenotypic response to 13 generations of selection for night sleep. At generation 13, the long-sleeper populations averaged 642.2 ± 3.83 and 667.8 ± 2.97 minutes of night sleep for Replicate 1 and Replicate 2, respectively. The short-sleeper populations averaged 104.3 ± 6.71 and 156.2 ± 8.76 minutes of night sleep for Replicate 1 and Replicate 2, respectively. The average difference between the long- and short-sleeper lines was 537.9 minutes for Replicate 1, and 511.6 minutes for Replicate 2. In contrast, the two control populations did not have differences in their night sleep after 13 generations of random mating ($P_{Gen}$ = 0.7083; Supplementary Table S3). In the initial generation, night sleep was 519.6 ± 10.57 minutes in the Replicate 1 control and 567.9 ± 7.63 minutes in the Replicate 2 control. At generation 13, night sleep was 563.4 ± 7.62 and 542.3 ± 7.91 minutes in Replicates 1 and 2, respectively, a difference of only 43.8 and 25.6 minutes. These negligible changes in night sleep in the control populations suggest that little inbreeding depression occurred over the course of the experiment (Falconer and Mackay, 1996). Selection was asymmetric, with a greater phenotypic response in the direction of reduced night sleep. Note also that night sleep is bounded from 0 to 720 minutes, and the initial generation had 515.39 minutes of night sleep on average across all populations, a fairly long night sleep phenotype. This high initial sleep may explain why the response to selection for short night sleep was more effective.
Night sleep is sexually dimorphic (Harbison and Sehgal, 2008; Harbison et al., 2009, 2013); yet both males and females responded to the selection protocol equally ($P_{Sel \times Sex}$ = 0.9492; Supplementary Table S1). Thus, we constructed a set of selection populations with a difference of nearly 9 hours in night sleep.
In an artificial selection experiment, some amount of inbreeding will necessarily take place. Only a subset of the animals are selected each generation as parents; thus phenotypic variance is expected to decrease as selection proceeds (Falconer and Mackay, 1996).
However, this is not the case for all artificial selection experiments (Falconer and Mackay, 1996). We calculated the coefficient of environmental variation (CVE) (Mackay and Lyman, 2005) and evaluated its trajectory across time in order to determine whether the populations were becoming more or less variable over time. As Figure 1B shows, night sleep CVE increased over time in the short sleepers, and decreased over time in the long sleepers (P < 0.0001; Table S4). The increase in CVE in short sleepers was largely due to a decrease in the population mean, as the standard deviation also decreased over time, indicating that the phenotypic variance decreased (Figure S2). Likewise, the standard deviation decreased in the long sleepers over time, even as the mean night sleep increased, indicating decreased variability in these populations as well. These changes in CVE mimic previous observations in populations artificially selected for sleep (Harbison et al., 2017). Regressions of the cumulated response on the cumulated selection differential were used to estimate heritability ($h^2$). Long-sleeper population $h^2$ (±SE of the coefficient of regression) were estimated as 0.145 ± 0.021 and 0.141 ± 0.014 (all P < 0.0001) for Replicates 1 and 2, respectively (Figure 1C); short-sleeper population $h^2$ were 0.0169 ± 0.013 and 0.183 ± 0.019 (all P < 0.0001) for Replicates 1 and 2 (Figure 1D). In contrast, estimated regression coefficients for the control populations were non-significant, with high standard errors associated with the regression estimates: 0.405 ± 0.695 (P = 0.57) and −0.078 ± 0.487 (P = 0.88) for Replicates 1 and 2, respectively (Figure 1E).
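The heritability estimate described above is the slope of a regression of cumulated response on cumulated selection differential. A minimal sketch follows; the function name and the example values are hypothetical, and fitting an intercept (rather than forcing the line through the origin) is an assumption, since the text does not state which was used.

```python
import math

def realized_heritability(cum_differential, cum_response):
    """Least-squares slope (and its standard error) of cumulated response
    on cumulated selection differential."""
    n = len(cum_differential)
    mx = sum(cum_differential) / n
    my = sum(cum_response) / n
    sxx = sum((x - mx) ** 2 for x in cum_differential)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cum_differential, cum_response))
    slope = sxy / sxx
    intercept = my - slope * mx  # intercept fitted; an assumption
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(cum_differential, cum_response))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

# Hypothetical cumulated values (minutes), for illustration only
h2, h2_se = realized_heritability([0, 100, 200, 300, 400], [0, 16, 29, 46, 59])
```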
Correlated response of other sleep traits to selection for night sleep
Traits that are genetically correlated with night sleep might also respond to selection for long or short night sleep (Falconer and Mackay, 1996). Indeed, some sleep and activity traits have been previously shown to be phenotypically and genetically correlated (Harbison and Sehgal, 2008; Harbison et al., 2009, 2013). We examined the other sleep and activity traits for evidence of a correlated response to selection. Night and day average bout length (P = 0.0008 and P = 0.0391, respectively) and sleep latency (P = 0.0023) exhibited a correlated response to selection for night sleep across generations 0–13, while night and day bout number, day sleep, and waking activity did not (Figure S2; Supplementary Table S1). In the case of day average bout length, the correlated response was sex-specific to males (P = 0.0140) (Supplementary Table S1). Significant correlated responses for night and day average bout length and sleep latency did not occur in all generations (Supplementary Table S2).
Night average bout length responded to selection for night sleep in most generations, while day average bout length responded in only four of the last six generations. Sleep latency responded to selection after the second generation. In addition, we observed significant differences between the long-sleeping and short-sleeping populations for the CVE of all sleep traits except waking activity CVE (Figure S2; Table S4). However, the pattern of the CVE for each trait appeared to be more random across time.
Phenotypes in flies used for RNA-Seq
Every generation, we harvested RNA from flies chosen at random from the 200 measured for sleep in each selection population, with the exception of the flies chosen as parents for the next generation. We extracted RNA from two replicates of 10 flies each per sex and selection population. Since these flies amount to only 20% of the flies measured for sleep each generation, their sleep may or may not be representative of the group as a whole. We therefore correlated the mean night sleep for each generation in the flies harvested for RNA with the mean night sleep of all flies measured to determine how similar night sleep was to the total in the group (Figure S3). The correlations were very high for the selected populations: long-sleeper flies harvested for RNA were very well correlated with the total measured in each population [r2 = 0.99 and 0.96 (all P < 0.0001) for Replicate 1 and 2 respectively], as were short-sleepers [r2 = 0.99 for Replicate 1 and 0.97 for Replicate 2 (all P < 0.0001)]. The control populations, which did not undergo selection, were somewhat less well correlated. Replicate 1 of the control population had an r2 of 0.75 (P = 0.0001) and Replicate 2 had an r2 of 0.85 (P < 0.0001). Thus, the flies harvested for RNA are very good representatives of each population as a whole.
Hierarchical Generalized Linear Model analysis reveals that selection for night sleep impacts gene expression
For each gene, the linear model analysis produced posterior distributions for the parameters as well as log-likelihood values for the full and reduced models. Point estimates (MAP) are shown in Tables S5 and S6 (for females and males, respectively). For the male flies, 11,778 genes passed the filtering for low expression, of which 405 were found to have a significant selection scheme effect over the generations of artificial selection (i.e., a significant likelihood ratio test for the sel × gen term). Thus, for these genes the expression level shift given by the slope of the generalized linear model is different from controls and attributable to selection for long and/or short sleep. For the females, 820 genes out of 9,370 with detectable expression were found to be significant. Genes with opposite trends in the short and long selection schemes were compared using the group-level parameters μshort × gen and μlong × gen (i.e., the effect that best explains both replicates): 204 genes in the males and 384 in females showed opposite trends by that criterion. Tables S7 and S8 list those genes for females and males, respectively. Eighty-five genes were common to both sexes. Known functions of these 85 genes from the DAVID gene ontology database are presented in Table S9. We used these 85 genes in subsequent analyses; see below. Figure 2 shows the fit for one gene.
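As an illustration of the likelihood ratio test mentioned above, the following minimal sketch converts the log-likelihoods of a full and a reduced model into a p-value. It assumes the usual chi-square reference distribution and only supports 1 or 2 degrees of freedom, for which the tail probability has a closed form. The degrees of freedom used in the study, the multiple-testing correction, and the function name `lrt_pvalue` are not taken from the text and are assumptions.

```python
import math

def lrt_pvalue(loglik_full, loglik_reduced, df):
    """P-value of a likelihood ratio test under a chi-square reference.

    The statistic is 2 * (loglik_full - loglik_reduced). Closed-form
    chi-square survival functions are used for df = 1 and df = 2 only.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    if df == 1:
        # P(Z^2 > stat) for a standard normal Z
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Hypothetical log-likelihoods for one gene
p_one_df = lrt_pvalue(-100.0, -101.92, df=1)
p_two_df = lrt_pvalue(-100.0, -102.995, df=2)
```

For a genome-wide screen, p-values of this kind would still need a multiple-testing correction before genes are declared significant.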
Pairwise Spearman correlation is non-specific and significant for a large fraction of genes
We computed Spearman correlations for all pairwise combinations of the 85 genes common between sexes (Supplementary Table S10). Spearman correlations were significant at 95% confidence for 2,999 of the 3,570 possible pairs. The confidence intervals for the correlation coefficients showed no overlap with controls for either short sleepers, long sleepers, or both populations in 1,348 of 3,570 pairs. Thus, a simple correlational analysis identifies a minimum of 38% of the possible interactions among genes as relevant.
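The pair counts quoted above follow from simple combinatorics, and the rank correlation itself is easy to sketch. The code below is a minimal pure-Python illustration, not the pipeline used in the study; the gene labels and expression values are hypothetical, and no significance test or confidence interval is computed.

```python
import math
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def pairwise_spearman(expr):
    """expr maps a gene label to its expression series across generations."""
    return {(a, b): spearman(expr[a], expr[b])
            for a, b in combinations(sorted(expr), 2)}

n_pairs = math.comb(85, 2)  # number of unordered pairs among 85 genes

# Hypothetical expression series for three genes
expr = {"geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [1.0, 4.0, 9.0, 16.0],
        "geneC": [4.0, 3.0, 2.0, 1.0]}
rho = pairwise_spearman(expr)
```

Because any two genes that both drift monotonically over generations receive a high rank correlation, a large share of significant pairs is expected from this metric alone.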
Gaussian Process model analysis uncovers nonlinear trends and specifically identifies covariance in expression between genes
As noted above, a simple correlational analysis suggested that large numbers of genes are potentially interacting to alter sleep. Because direct computation of linear model-based correlations cannot account for non-linear effects or spurious confounding trends, we fit Gaussian Process models that can account for temporal variation in multiple genes even in the absence of actual interactions between them. The 85 significant genes overlapping between males and females potentially have 3,570 pairwise interactions. The parameter of interest in the Gaussian Process model is therefore the signal covariance between each pair of genes, which is a measure of the degree of their interaction. We applied the Gaussian Process model to each of the 3,570 pairs for each selection scheme (long, short, and control). As an example, the model fit for one pair of genes from the female gene expression data is shown in Figure 3.
Convergence for all three runs was on the order of , and close to the 4,000 samples expected for each run; therefore, the wide confidence intervals are likely a product of the large dispersion in the data itself. The correlation between the expression patterns of the two genes is computed by dividing the signal covariance by the square roots of the signal variances of the two genes, that is, in the same way a correlation coefficient is computed from variances and covariances, but taken as the expectation over the posterior distribution obtained from MCMC.
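The normalization described above can be sketched as follows. The example assumes that each posterior draw provides the two signal variances and the signal covariance for a gene pair; the tuple layout, the function name, and the numbers are hypothetical. The ratio is computed per draw and then averaged, so the result is an expectation over the posterior rather than a ratio of posterior means.

```python
import math

def gp_correlation(draws):
    """Posterior expectation of cov12 / sqrt(var1 * var2).

    draws: sequence of (var1, var2, cov12) tuples, one per MCMC sample.
    Returns the posterior mean and the per-draw correlations.
    """
    rhos = [cov12 / math.sqrt(var1 * var2) for var1, var2, cov12 in draws]
    return sum(rhos) / len(rhos), rhos

# Hypothetical posterior draws for one gene pair
draws = [(1.0, 4.0, 1.0), (1.0, 1.0, 0.5), (4.0, 4.0, 2.0)]
rho_mean, rho_draws = gp_correlation(draws)
```

Keeping the per-draw values is useful because the same draws can later be summarized as a credible interval and compared between selection schemes.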
Figure 3 illustrates the nonlinear trajectories of gene expression that cannot be detected by the GLM model. The two trajectories exhibited high signal covariance between the expression of the two genes in the long sleepers (ρl = 0.89) that was significantly different from controls; however, intermediate covariance in the short sleepers (ρs = 0.53) did overlap with that of controls, and therefore was not significantly different.
Figure 3 - supplement 1 shows a pair where interactions in both short and long selection schemes are different from controls, and Figure 3 - supplement 2 shows another pair of genes where neither scheme is different from controls. These examples illustrate a range of possibilities, including a case where Spearman correlations are significant but GP correlations are not (the opposite also occurs). Figure 3 - supplements 3 and 4 fit each gene individually; the fit does not change substantially between single- and multiple-channel models.
The 85 single-channel fits were good despite varying levels of dispersion and occasional outliers, indicating no issues with the Gaussian Process's ability to fit the temporal patterns of any one gene. For the two-channel inference, upwards of 90% of the chains initially converged under the criterion that ; because the inference method is stochastic, it is expected that by chance some chains may not converge and/or mix well with their replicates. Chains that initially failed were rerun up to two times. After three runs, over 99% of the chains converged; the reasons for lack of convergence of the remaining chains were not investigated further. Figure 4 shows six heat maps (one for each sex and selection scheme combination) with the correlations for all pairs of genes, calculated as described for Figure 3, summarizing the inferred interactions. Of the 3,570 correlations, 1,612 were greater than 0.5 and 98 greater than 0.9.
In addition to computing expected values, the posterior distributions were used to compare the signal covariances between selection schemes and set a cutoff. Distributions of the parameter for each sex-selection scheme combination were assembled from the parallel MCMC runs; 145 gene pairs in the selected populations were found to be different from controls (i.e., they do not overlap with controls at 95% credibility for either short, long, or both populations). Out of the 145, twelve gene pairs were common to males and females selected for long night sleep and one pair to males and females selected for short sleep; one gene pair was common to females in both selection schemes, and three pairs were common to males. Table S10 shows the expected values of signal covariances normalized by the variances for all two-way interactions side by side with the Spearman correlations. Table S11 shows the subset of significant Gaussian Process correlations.
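The non-overlap criterion can be illustrated with a minimal sketch. It assumes equal-tailed 95% intervals taken from sorted posterior draws; the text does not say whether equal-tailed or highest-density intervals were used, and the function names and draws below are hypothetical.

```python
import math

def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws (conservative index rounding)."""
    s = sorted(draws)
    n = len(s)
    lo_idx = int(math.floor((1.0 - level) / 2.0 * (n - 1)))
    hi_idx = int(math.ceil((1.0 + level) / 2.0 * (n - 1)))
    return s[lo_idx], s[hi_idx]

def differs_from_control(selected_draws, control_draws, level=0.95):
    """True when the two credible intervals do not overlap."""
    lo_s, hi_s = credible_interval(selected_draws, level)
    lo_c, hi_c = credible_interval(control_draws, level)
    return hi_s < lo_c or hi_c < lo_s

# Hypothetical posterior draws of the normalized signal covariance
long_draws = [0.80 + 0.001 * i for i in range(100)]
short_draws = [0.15 + 0.001 * i for i in range(100)]
control_draws = [0.10 + 0.001 * i for i in range(100)]

long_differs = differs_from_control(long_draws, control_draws)
short_differs = differs_from_control(short_draws, control_draws)
```

A pair would be retained if either selection scheme, or both, yields a non-overlapping interval, which matches the "either short, long, or both" wording above.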
We constructed a network for each sex/selection scheme combination based on the magnitude of the correlation between genes. The network for males selected for long sleep having significant gene interactions is shown in Figure 5 (supplements 1-3 show the networks for the remaining three sex-selection scheme combinations).
For comparison, retaining significant (ρsel ≠ 0) Spearman correlations keeps 2,999 of the 3,570 interactions (i.e., excludes only about 16% of the gene pairs), and comparing the distributions of ρsel versus ρcontrol, similar to how the Gaussian Processes are compared, still retains 1,348. Therefore, computing correlations between genes using covariance estimates from the Gaussian Processes greatly increases specificity over direct correlations. Furthermore, the Gaussian Processes are not only more specific but also more sensitive, finding 68 gene pairs that are not significant by the first Spearman approach and 18 that are not found by the second.
Finally, we examined known interactions between the 85 genes and any other genes using the Drosophila Interaction Database, DroID (Murali et al., 2011). We found 2,830 interactions; 8 of these were among the 3,570 possible pairs between the 85 genes, but none of them overlapped with the 145 gene pairs found to be different from controls. The gene interactions we observed may therefore be unique to extreme sleep.
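The database comparison amounts to intersecting sets of unordered gene pairs. The sketch below shows one way to do this, assuming interactions are available as simple (gene, gene) tuples; the actual DroID file format is not reproduced here, and all gene labels are hypothetical placeholders.

```python
def as_pairs(edges):
    """Convert (gene, gene) tuples to a set of unordered pairs, dropping self-loops."""
    return {frozenset(edge) for edge in edges if edge[0] != edge[1]}

def database_overlap(db_edges, candidate_genes, significant_pairs):
    """Return (database pairs within the candidate genes,
               those pairs that are also in the significant set)."""
    genes = set(candidate_genes)
    within = {pair for pair in as_pairs(db_edges) if pair <= genes}
    return within, within & as_pairs(significant_pairs)

# Hypothetical inputs
db_edges = [("g1", "g2"), ("g2", "g1"), ("g1", "x9"), ("g3", "g4")]
candidates = ["g1", "g2", "g3", "g4"]
significant = [("g4", "g3")]
within, shared = database_overlap(db_edges, candidates, significant)
```

Using `frozenset` makes the comparison independent of the order in which the two genes of a pair are listed in either source.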
Discussion
We have shown that robust, reproducible phenotypic changes in Drosophila melanogaster sleep are associated with hundreds (405 in males, 820 in females) of individual shifts in gene expression – and as a consequence hundreds of thousands of potential combinations [ and ]. Nevertheless, unique interactions important to the phenotypes are a comparatively small number (145 out of the 3,570 possible pairwise combinations of the 85 genes common to males and females). We have also shown that these interactions cannot be found with linear model analyses or conventional correlation calculations alone, but are specifically identified using a combination of an informative experimental design with densely sampled time points to generate a large-scale data set, and a nonparametric, nonlinear model-based approach that explicitly accounts for covariance in gene expression. That complex traits can be mostly explained by additive effects of individual genes (and their expression) is a common and sometimes useful assumption. While it underpins preliminary analyses that allow whole-transcriptome data to be understood, it eliminates the ability to infer interactions between genes from the data and stops short of identifying relevant processes. Complex traits involve multiple genes, and the actual interactions giving rise to phenotypes are likely to be highly nonlinear (Mackay, 2014). These nonlinearities are not a mathematical construct, but a biological reality arising from chemical kinetics. Favoring approaches that account for these features will not only increase statistical power, but also improve understanding of actual biological mechanisms beyond simple network representations of gene expression (DiFrisco and Jaeger, 2020).
In most correlation and information-theory based methods the dimension (e.g. time or space) across which samples covary is only implicit (Emmert-Streib et al., 2012); the only possible conclusion from a significant correlation between two sets of observations is that one may have an effect on the other – i.e. the data alone does not allow the distinction between actual interactions and spurious correlation. Bioinformatic pipelines that have correlation as their starting point – in addition to carrying over its limitations – are not straightforwardly comparable to our approach (see Appendix 1). In the context of Gaussian Processes, correlation between all pairs of data points – including within the same time series, i.e. autocorrelation – is explicit in time (or other dimension), so similar trends do not necessarily imply covariance between the sets of observations. Therefore, on the one hand GPs are a nonparametric method that requires no more biological knowledge than that for computing a linear correlation; on the other hand, while not an explicit description of dynamic biological processes, it is also a model-based approach that can be used within more mechanistic formalisms like differential equations (Äijö et al., 2013), or potentially be used to formulate specific hypotheses and build mechanistic models.
Although somewhat self-evident, it is important to highlight the fact that to describe correlations along time, multiple time points are needed – put another way, the use of a nonlinear model requires enough resolution in the data that the trajectory can be identified. To that end, a single high-resolution, large data set with a specific design, like the one generated in this work, will be more useful than several small data sets, for instance with only initial and final time points and allowing only two-sample linear comparison. Gene expression measured at the terminal generation of selection and compared among selected and control groups does identify candidate genes (Pegoraro et al., 2020; Brown et al., 2017; Mackay et al., 2005; Wertheim et al., 2011; Sørensen et al., 2007; Morozova et al., 2007; Edwards et al., 2006), but the relationship between pairs of genes is lost. Some studies evaluated gene expression during the last 2-3 generations of selection (Telonis-Scott et al., 2009; Garlapow et al., 2017); however, the additional sampling was used to confirm consistency rather than change across time. Our approach of sampling over time enabled us to derive interactions between genes and demonstrated that unique gene expression network profiles develop in long sleepers as compared to short sleepers.
When employing methods of increasing complexity or sophistication there is always the question of how relevant the inference is or, in other words, how “real” the parameters or processes in the model are. This pursuit of simplicity may favor methods based on linear models as more tangible approaches that are less prone to arbitrary assumptions about how the parameters are put together; however, it is important to realize that linear coefficients are no more real than those of any other model. On the contrary, biological processes are not restricted by our ability to comprehend them. Therefore, what may seem like an Occam’s Razor-like simplicity will probably hinder accurate description of nature. Systems-level understanding of complex biology requires not only more and more detailed data, but also better descriptions of the processes and methodology that captures higher-order phenomena. Correspondingly, experimental validation of these phenomena will be more technically challenging to accomplish. Despite the additional difficulties, it must be recognized that methods that cannot possibly match the complexity of nature are doomed to scratch the surface without reaching a deeper understanding.
Author Contributions
Conceptualization: C.S.-M., S.T.H.; Investigation: C.S.-M., Y.L.S.N., Y.L. Data curation and formal analysis: C.S.-M., Y.L., S.T.H. Writing: C.S.-M., S.T.H.
Data Availability
All RNA-Seq data from this study are available from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) under the accession number GSE—–.
Competing Interests
The authors have no competing interests to declare.
Supplemental Information
Table S1. Quantitative genetics of the response to selection for long or short night sleep and related sleep parameters. For each trait, the ANOVA analysis results are presented. Source indicates each factor in the model. gen, generation; rep, replicate; sel, selection; d.f., degrees of freedom; M.S., Type III mean squares; F, F ratio statistic; P, P-value.
Table S2. Quantitative genetics of the response to selection for long or short night sleep per generation. For each sleep trait, the ANOVA analysis results are presented for each generation. Source indicates each factor in the model. rep, replicate; sel, selection; d.f., degrees of freedom; M.S., Type III mean squares; F, F ratio statistic; P, P-value.
Table S3. Quantitative genetics of control populations. For each sleep trait, the ANOVA analysis results are presented. gen, generation; rep, replicate; sel, selection; d.f., degrees of freedom; M.S., Type III mean squares; F, F ratio statistic; P, P-value.
Table S4. Correlated response of sleep trait coefficient of environmental variance (CVE) to selection for long or short night sleep duration. For each sleep trait listed, the ANOVA results are presented. d.f., degrees of freedom; M.S., Type III mean squares; F, F ratio statistic; P, P-value.
Table S5. GLM analysis results for each gene in females are shown as a row; the Maximum a Posteriori (MAP) parameter estimates and log-likelihoods are shown as well as p-values computed from the likelihood ratio test. Significance statistics corrected for multiple testing are also included, as well as the normalized counts for all samples.
Table S6. GLM analysis results for each gene in males are shown as a row; the Maximum a Posteriori (MAP) parameter estimates and log-likelihoods are shown as well as p-values computed from the likelihood ratio test. Significance statistics corrected for multiple testing are also included, as well as the normalized counts for all samples.
Table S7. Genes with opposite slopes for the short and long interaction terms of generation in females.
Table S8. Genes with opposite slopes for the short and long interaction terms of generation in males.
Table S9. Gene Ontology analysis results for 85 significant genes common to males and females.
Table S10. Correlations obtained from normalizing Gaussian Process signal covariances (GP correlation) and from Spearman correlation for each of the six sex and selection scheme combinations.
Table S11. Expected values for the correlations obtained from normalizing Gaussian Process signal covariances (GP correlation) not overlapping with controls for each of the six sex and selection scheme combinations (value missing if overlapping in that condition).
Acknowledgments
We thank the members of the NISC Consortium for sequence data and helpful discussions. This work used the computational resources of the National Institutes of Health High-Performance Computing Biowulf cluster (http://hpc.nih.gov). This research was supported by the Intramural Research Program of the National Institutes of Health, the National Heart Lung and Blood Institute.