Abstract
Lifespan is a heritable trait with a polygenic architecture. Experimental evolution in combination with re-sequencing has often been used to identify candidate loci for lifespan in Drosophila melanogaster. Previous experiments showed that Drosophila populations experimentally evolved to increase late-life reproduction showed a correlated responses in development time, body size, but also lifespan. Subsequent whole genome sequencing allowed for the identification of candidate loci that correlated to lifespan differentiation. However, it remains difficult to assess whether candidate loci affect lifespan and to what extent such loci pleiotropically underpin multiple traits. Furthermore, recent studies indicate that lifespan effects of loci are often context dependent, but genotype-by-genotype interactions remain understudied. Therefore, here, we report on a study where we genotyped 3210 individuals for 32 candidate loci that emerged from our evolve and re-sequence experiment and tested, (1) whether these loci significantly affected lifespan, (2) the effect size of each locus, and, (3) how these loci mutually interact, i.e. determine the level of epistasis in moulding lifespan. Of the 32 loci, six showed significant main effect associations, of which three loci showed effects of 6.6 days difference in lifespan or larger, while the overall average lifespan was 41.7 days. Eight additional significant pairwise interactions between loci were found, of which four (single) main effects and one three-way interaction was significant. Lastly, we found that alleles that increased lifespan did not necessarily have higher frequencies in populations that showed increased lifespan, indicating that lifespan itself had not been the major target of selection. Our study indicates that individual genotyping following an evolve and re-sequencing study is essential to understand the mechanistic basis of polygenetic adaptation.
Introduction
Understanding how genetic variation is generated, maintained, and altered in natural populations is a main theme in evolutionary biology. Of great interest is to explain the mechanistic genetic basis of the evolution of quantitative traits that are associated to fitness, such as survival. Lifespan, as a measure of survival, is a complex trait with a heritability ranging between 0.1 to 0.4 (Roff & Mousseau, 1987, Lehtovaara et al. 2013). In Drosophila several association studies using natural standing genetic variation have been performed to elucidate the genetic architecture of lifespan. Overall, three observations stand out from such studies. Firstly, while initially relatively low numbers of natural variants have been found, increased experimental and statistical power allowed for the identification of many more loci as candidates affecting lifespan (Huang et al. 2020, Pallares et al. 2022, Wong & Holman, 2023). Secondly, the effects of such variants typically are dependent on sex, environment, and other loci (Nuzhdin et al. 1997, Leips and Mackay 2002, Pallares et al, 2022) which in combination can show three-way interactions. Thirdly, only a small proportion of the phenotypic variance is explained by the uncovered allelic variation (Mackay 2002), as is the case for complex trait such as body size, for which many loci explain only a proportion of the phenotypic variation (Yang et al. 2010). Many complex traits seem to follow the theoretical expectation from Fisher’s infinitesimal model (Barton et al. 2017), which states that many loci of small effect underpin the quantitative variation of traits.
Lifespan is a life history trait for which trade-offs with fecundity are expected (Williams 1957, Kirkwood 1977), but also other traits are hypothesized to determine its phenotypic value in natural populations (Roff 1992, Stearns 1992). Furthermore, next to loci that underpin trade-offs, deleterious mutations can accumulate over generations, especially when expressed late in life where natural selection against such mutations is very weak (Medawar 1952). To test which other traits are genetically correlated to lifespan in Drosophila, selection for increased lifespan or postponed reproduction has often been employed in the past. This consistently led to an increased lifespan (Luckinbill et al. 1984, Rose 1984, Zwaan et al. 1995, Partridge et al. 1999, Stearns et al. 2000, May et al. 2019). In a number of these studies the genomes of the evolved populations were re-sequenced (Remolina et al. 2012, Carnes et al. 2015, Fabian et al. 2018, Hoedjes et al. 2019) which resulted in the identification of hundreds of loci that are candidates for lifespan regulation. However, these selection experiments also resulted in divergence for other traits, such as development time, body size, age-dependent fecundity and lifespan. Therefore, a key question remains whether the identified candidate loci in these evolve and re-sequence experiments directly affect lifespan.
To furher elucidate the genetic underpinning of lifespan of these candidate loci we associated variation at 32 candidate loci with lifespan in Drosophila melanogaster. Previously, we performed selection on late reproduction which resulted in increased lifespan, but also in development time, and body size (May et al. 2019) compared to populations selected on early reproduction (control populations). Using whole genome pooled resequencing we identified a high number of loci to respond to the two evolution treatments (Hoedjes et al. 2019). This previous analysis allows us to choose among 298 candidate loci that were consistently differentiated between early and late reproduction populations. This allowed us to sieve out candidates that are potentially causal to the lifespan differences observed between the diverged populations. By associating lifespan to the genotypes of 3210 individuals, we tested (1) whether they were signifcantly associated with lifespan, (2) how large their effects were and (3) how these loci mutually interacted in moulding lifespan. By measuring these effects we can make the first step to further understand genetic architecture of lifespan in natural populations, which is an important quantitative trait in the the evolution life histories.
Method
Fly populations, culture conditions
Drosophila melanogaster flies used in this study are part of a continuously selected set of populations that combines selection for differential larval food concentrations and age at reproduction (early or late) in a factorial design (May et al. 2019). A detailed description of the experimental design can be found in May et al. (2019). The six founding populations with which the experimental evolution was started were caught in a traject between Austria and Greece and crossed for 40 generations to allow for laboratory evolution and recombination between ancestral populations to decrease linkage disequilibrium (May et al. 2019). In the current experiment we focussed on loci that significantly differed in allele frequency between populations selected for early and late reproduction (Hoedjes et al. 2019). To maximize the probability of finding loci that affect lifespan in a population, we used individuals from one of the 24 selected populations, of which replicate #3 of the experimental regime “1x larval diet and early reproduction” (1-E3) was chosen. This regime was chosen as it can be considered the standard for laboratory propagation of D. melanogaster populations, and thus may serve as a control that can be compared to the late reproduction regime. Moreover, replicate #3 was chosen because when compared to other populations more candidate loci segregated at frequencies relatively close to 50% (see below for details), allowing for more even genotype distributions and thereby increasing the power of the study. Furthermore this allowed us to detect gene by gene interactions. Prior to the experiment, replicate #3 underwent the early reproduction experimental evolution treatment for 179 generations (May et al. 2019).
All experiments were performed at 25°C and 50% humidity on a 12-h light:12-h dark cycle. All flies were reared in standard density culture (42 µL of eggs, Clancy and Kennington 2001) on either 1x medium containing per litre 100 g sucrose, 70 g dried yeast (Fermipan): 20 g agar (Sigma), 15 ml nipagin and 3 ml proprionic acid, or 0.25x medium (25g sucrose, 17,5 g dried yeast, other components similar).
Prior to the experiment, flies were kept at 1x medium for two generations under standardized larval densities (42 µL of eggs, Clancy and Kennington 2001) removing potential maternal effects due to the selection regime. Then eggs were harvested on a petri dish with 1x medium. Initially, interactive effects between larval diet and genetic variation on lifespan were of interest to us as also a large evolutionary response was measured between flies evolved on different larval conditions. Therefore, a controlled number of eggs (42 µL of eggs, Clancy and Kennington 2001) were put on 0.25x and 1x diet in four replicates each in 65mL population bottles.
Experimental setup
All flies were allowed to fully develop into adults and kept in mixed sex groups and emerged flies were kept in these conditions for at least a day to allow for mating. Then flies were pooled among four replicate bottles per larval diet, sedated with CO2 and female flies were randomly allotted to 34 half-pint bottles containing fresh (1x) medium 30 mL to a total of 100 females per bottle. After set up, flies were transferred to new bottles every two or three days. Dead flies were counted, removed, and individually frozen at -80°C until further analysis. Some flies escaped or were accidentally crushed during transfers from which no lifespan association could be made. In total we measured lifespan for 3210 individuals.
Candidate loci
All flies in the lifespan assay were individually genotyped for 32 SNPs. These SNPs were chosen from our previous whole genome resequencing experiment (Hoedjes et al. 2019) based on the following criteria. In our previous analysis we identified 298 SNPs with significant allele frequency differences based on the GLM statistic (Hoedjes et al. 2019) and consistent differences between all early and late reproduction populations. In this paper, we calculated the average SNP allele frequency difference between populations selected for early and late reproduction. The candidate had to have at least a 0.3 difference in allele frequency (between 1x diet early reproduction and 1x late reproduction) resulting in 254 candidates. Secondly, SNPs were annotated whether they were found in coding regions and how many SNPs were found per gene. Genes with more than one significant SNPs were prioritized over those with only a single significant SNP. Thirdly, we used the variant annotation and prioritized according to non-synonymous > synonymous > intron variant / promotor region. If based on these criteria candidate loci had equal priority, those with the largest allele frequency differences were chosen. For each gene, only the highest differentiated SNP per gene was selected. This procedure resulted in 29 candidate SNPs (Table S1). This is therefore a 10% sample of all significant loci initially identified as those that consistently differ between early and late reproduction populations (Hoedjes et al. 2019). Additionally, three SNPs in the genes hppy, Eip75B, and sick were included in our analysis because these genes have been implicated to functionally affect life history traits including lifespan (Bryk et al. 2010, Lam et al. 2010, Huang et al. 2020, Parker et al. 2020, Hoedjes et al. 2023). However, the SNPs in these genes were less divergent in allele frequency than all other loci. The resulting list of candidates and their properties are listed in Table S1.
DNA isolation, primer design, and KASP
To extract DNA, each sampled fly was put in a tube and grounded in 100 ul Squishing buffer (0.1 M Tris HCl pH 8, 1 mM EDTA, 0.25 M NaCl, 1 µg/ul Proteinase K), incubated for 45 minutes at 55°C followed by two minutes at 95°C to inactivate Proteinase K. The extract was centrifuged for three minutes at 13000 rpm and after centrifugation 50ul of the supernatant was transferred to a clean tube. The DNA concentration of each sample was normalized to 5 ng/ul per sample. Samples were stored in the freezer until use. All samples were genotyped using a Kompetitive Allele Specific PCR (KASP) genotyping assay (Semagn et al. 2014). In a KASP assay two allele-specific forward primers are designed that carry a tail which is labelled with a dye, FAM or HEX, by which for each candidate locus the genotype can be detected for each individual by relative dye detection in a qPCR machine. For 32 SNP loci, KASP primers were designed using the Kraken software (http://www.lgcgenomics.com) and ordered at Integrated DNA Technologies (Supplement File 1).
DNAs were further diluted to 1 ng/µL and analyzed on the LGC genomics SNP genotyping line according the manufactures’ instructions. Genotypes were automatically called using the Kraken software and manually adapted when necessary (Semagn et al. 2014).
Statistical analysis
All analysis were performed using the R software (v3.6.1, R core Team 2019).
Population genetics
First, we tested whether genotype frequencies deviated from Hardy-Weinberg equilibrium with a Chi-square test. Furthermore, we estimated the linkage disequilibrium between all pairwise SNPs using the LD() function from the genetics package (Warnes 2003). We used the r2 statistic for LD and the p-value to test whether this statistic was significantly different from 0. The p-values were adjusted for multiple testing using p.adjust() with the ‘BH’ method (Benjamini and Hochberg 1995).
Main effects
We fitted a linear mixed effects model for lifespan with genotypes of a particular SNP and larval condition as main effects and replicate bottle as random effect using lmer (lme4 package, Bates et al. 2015) using a normal distribution as error structure. Then we fitted a model without genotype effects. The difference between these models indicates the significance of genotype for each locus, and the significance of this difference was tested using the anova() method, using ‘chisq-test’. The p-values were adjusted for multiple testing using p.adjust() with the ‘BH’ method (Benjamini and Hochberg 1995). All adjusted p-values below 0.05 were considered significant. The variance explained was calculated using the r.squaredGLMM(model) function of the (MuMIn package, v 1.46.0, Barton 2022) where ‘model’ was an above-described lmer model containing only the fixed effects for which the variance needed to be explained.
Test for genotype-by-genotype interactions
To test for genotype-by-genotype interactions we fitted a model with lifespan as the dependent variable and genotype effects of SNP1, genotype of SNP2, and larval condition as main effects, including interaction (I1). Next to this, we fitted a model without interaction (I2). Similarly, as with the main effects, we tested for the differences using the anova() method, using ‘chisq-test’. The p-values were adjusted for multiple testing using p.adjust() with the ‘BH’ method (Benjamini and Hochberg 1995). All adjusted p-values below 0.05 were considered significant. We only tested for interactions between genotypes when a minimum of five individuals was counted for each pairwise genotype combination for both genes in each larval condition. In total, this led to 366 tests. Similarly, we tested all three-way interactions for which at least five individuals were present for each multilocus genotype, amounting to a total of 1758 tests. For three-way interactions, the models with three-way interactions were compared to the models with three two-way interactions, and p values were adjusted as described above.
Results
Allele frequencies, HW equilibrium
The minor allele frequency of only one SNP was below 0.1 and for three additional SNPs between 0.1 – 0.2. Six loci showed a significant deviation from Hardy Weinberg equilibrium (p.adj<0.05, Table S2). The reference allele frequencies of the current study (generation 179) were significantly and highly correlated to those of the previous pooled resequencing experiment (Hoedjes et al. 2019) generation 115, Pearson r =0.87, t30=9.87, p<0.001, Figure S1), indicating that the allele frequencies from this study matched well with our previous whole genome sequencing experiment.
Consistencies with whole genome sequencing
To study the similarity of our genotyping with our previous whole genome sequencing analysis, we calculated the correlation distance of allele frequencies for the 32 candidate genes with all other populations in the entire selection experiment (May et al. 2019). The current study was performed on the 1x diet early reproduction replicate #3 (1-E3). Using correlation distances of this study with all 24 populations, the allele frequencies of this study clustered most closely together with (1-E3) sequenced in generation 115 (Figure S1). This indicates not only that our genotyping has led to reproducible results but also that the population changes between these generations have been relatively small as compared to the differences between populations.
Linkage disequilibrium
Linkage disequilibrium was estimated using the r2 statistic. As expected, r2 was higher for loci that were sampled in closer physical proximity (Figure S2). All combinations of SNPs that showed a r2 higher than 0.05 were found on the same chromosomal arms. Because 23 values of r2 were higher than 0.2, the linkage disequilibrium might be relevant for the independence of the association between genotypes and lifespan (Table S3).
Main effects on lifespan
Not a single locus showed a significant interaction with larval diet. Flies reared on 1x diet had an average lifespan of 38.4 days, while flies reared on 0.25x diet had an increased lifespan of 6.8 days (Figure S3). Therefore, larval diet as main effect is considered in the statistical models.
From the 32 loci for which we tested the main effect of genotype, seven loci showed significant genotype effects on lifespan (p.adj<0.05, Table S4). However, as high levels of LD were found between some of the significant loci, we tested whether the effects of these loci were independent. The model with mRps22 and CG9997 showed such a dependence, and after this analysis six independent loci remained significant (Figure 1), of which three loci showed very large effects. For instance, for the gene hppy, the lifespan difference between the shortest and longest living genotypes was on average 12.0 days. This was 7.4 days for Eip75B and 6.6 days for mRps22. The percentage of phenotypic variance explained by these variants alone was 10.3%, 6.4%, and 5.2% for hppy, Eip75B and mRps22, respectively (Table S5). For the three remaining loci, lifespan differences between genotypes fell between 1.1 and 2.3 days (Figure 1).
Interestingly, for both Eip75B and mRps22, one homozygote genotype stood out as having a relatively low lifespan, while the heterozygote had a similar lifespan as the other homozygote, indicating dominance for these alleles. For hppy the heterozygote was associated with the lowest lifespan, therefore showing underdominance. Furthermore, allele frequencies from five out of the six loci that significantly affected lifespan deviated from Hardy-Weinberg equilibrium, while from the other 26 loci (i.e. not significantly associated with a lifespan increase), only one showed a deviation from Hardy-Weinberg equilibrium. This result was significantly different from a random association (Fisher exact = 79.43, p<0.05). Thus, loci that are significantly associated with lifespan were more likely to be out of Hardy-Weinberg equilibrium.
Gene-by-gene interaction
We tested whether the pairwise interactions between loci were significant (as compared to both main effects). Indeed, eight pairwise interactions were significant (i.e. taking the level of p.adj <0.05, Table 1). The three loci with the largest main effects, hppy, Eip75B, and mRps22, also showed significant interactions (Figure 2). Furthermore, four SNPs that did not affect lifespan as main effects showed a significant genotype-by-genotype interaction. For instance, lifespan effects of mrfn and BOD1 interacted significantly with mRps22, and the effects of su(var)3-7 and CG33346 were dependent on the capu genotype (Figure 2). Therefore, testing for interactions increased the number of loci associated with lifespan. Furthermore, the addition of these interactive effects increased the explained phenotypic variation by 2.9% compared to when only main effects were considered. However, everysingle interaction had a relatively small effect, around or below 1% variance explained (Table 1). Lastly, one of the 1758 three-way interactions tested was significant. This was between loci on the genes hppy, liprin-gamma, and Noa36 (Figure S4).
Direction of lifespan effects
Our experimental evolution populations were selected on the ability to reproduce late in life, with early reproduction serving as a control, as this is the common fly-maintenance practice in most fly laboratories. As a correlated response, lifespan was increased in late reproduction populations (May et al. 2019). If selection on the experimental evolution had acted on alleles that only increased lifespan, it would be expected that these alleles would have increased in frequency in late reproduction populations. We would therefore expect that alleles that positively associate with lifespan in this experiment would have an increased allele frequency in late reproduction populations. Given the result shown in Figure 1, we expect the longer-lived populations to have a decreased frequency of the C, T, C, C, T, and A alleles for the genes hppy, Eip75B, mRps22, capu, Liprin-gamma, and CG4554, respectively. However, this expectation is only born out for hppy and CG4554, thus the minority of the loci (Figure S5). This indicates that direct selection on lifespan may not causally underpin the allelic differentiation between the populations selected for early and late reproduction.
Discussion
Selection for postponed reproduction in an outbred population of Drosophila melanogaster resulted in extended lifespan (May et al. 2019) and subsequently, by using whole genome sequencing, candidate loci for lifespan have been identified (Hoedjes et al. 2019). Here, we report on a follow-up study of this ‘evolve and re-sequence’ project. We tested 32 candidate loci for their effects on lifespan in a large cohort (n=3210). From the 32 tested loci, we found 11 loci that were significantly associated with lifespan. With an average lifespan of 41.7 days over all genotypes and conditions combined, we identified three loci to show very large effects on lifespan (between 6.6 and 12 days), while for three additional loci we found significant albeit smaller effects (i.e. between 1.1 and 2.3 days). Unexpectedly, when fitted together in one model, the three loci with the largest effect explained together 15.5% of the phenotypic variation. On top of the main effects, an additional four loci were identified through pairwise interactions. Furthermore, a last locus was significant in a three-way interaction without showing a significant main effect or a two-way interaction. These results show that following up a ‘evolve and re-sequence’ project with a candidate gene approach is a successful method for identifying whether candidate loci are involved in lifespan regulation in a population harbouring natural variation. On the flip side, this also means that the other 21 loci were not attributed to variation in lifespan. This could result from too low power to detect small effects on lifespan. Alternatively, these loci could be involved in the regulation of other traits, such as fecundity, development time, and body size, which are also differentiated between early and late reproduction populations (May et al., 2019).
To understand how organisms evolve, knowledge of the genetic architecture of the traits under selection is critical. How many variants segregate in a population and their effects remains elusive for many traits. Predicting which variants respond to selection is difficult and becomes even harder when many loci segregate with small effects or when interactions between loci occur (epistasis). Although we found large effect loci (making it easier to predict what the genetic course of evolution would be), we also found epistasis between 10 of the 32 candidate loci. While the variance explained by these effects was often small, it increases the complexity of the architecture of traits that has consequences for the predictability of evolution. Indeed, as recently shown, explicitly addressing epistasis improves our understanding of the genetics of ageing and diseases (Kiel et al. 2021, Wang et al. 2021).
We find large effect loci and interactions that contribute to lifespan variation, which is in line with two recent studies. Huang et al. (2020) previously estimated that the effects of some loci on lifespan can be large, similar to our findings, although the significance in that study only reached a ‘nominal’ level (of 10−5). Furthermore, genotype effects on lifespan are context-dependent, namely with temperature (Huang et al. 2020) and diet (Pallares et al. 2022). This context dependence of the expression of single loci can be extended to genotype-by-genotype interactions, of which we found ample evidence. Ten out of 11 loci showed significant interactions, and 2.9% of the total variance can be attributed to all two-way interactions. If interactions at the SNP level are important, we would expect relatively low overlap between SNP candidates in evolve and re-sequence experiments. Indeed, both Fabian et al. (2018) and Hoedjes et al. (2019) reported significant overlap between genes when comparing the results of similar evolve and re-sequence studies, but not at the level of SNPs. Although currently beyond the scope of this paper, in future studies it should be addressed whether the main and interactive effects are also present in other experimental evolution populations.
Some of the significant SNPs were found in genes that have been functionally tested for lifespan effects before. For instance, recently, it was found that RNAi on Eip75B, CG4554, and capu altered lifespan and age-specific fecundity (Parker et al. 2020, Hoedjes et al. 2023, Huang et al. 2020). Genome editing of the exact similar nucleotide studied in this experiment resulted in age-dependent effects on fecundity of Eip75B, but without a lifespan effect (Hoedjes et al. 2023). The large effect on lifespan in our study was dependent on the genetic background of another locus, namely, liprin-gamma (Figure 2) and therefore it is expected that the effects of genome editing is also context-dependent. It is well described that hppy interacts with the mTor-signaling pathway regulating life-history traits and apoptosis (Bryk et al. 2010, Lam et al. 2010) and has been previously identified in a QTL screening of lifespan (Stanley et al. 2017). It is therefore not surprising to find this locus associated with lifespan. While some of the studied genes are private to insects, others are found in a wider range of organisms. For instance, mutations in mRps22 are known to cause several types of disorders in humans (Saada et al. 2007, Smits et al. 2011, Chen et al. 2018). In this study, we found a double non-synonymous mutation in mRps22 to be associated with lifespan with a relatively high frequency. Other genes liprin-gamma, su(var)3-7, BOD1, mfrn, CG33346 and Noa36 have not yet been identified for their lifespan effects and can therefore be considered new candidates for further lifespan studies.
Evolutionary explanations of ageing are either based on deleterious mutations (mutation accumulation, Medawar 1952) or trade-offs between lifespan and other traits such as fecundity (antagonistic pleiotropy; Williams 1957, Kirkwood 1977). Interestingly, if mutation accumulation would explain most of the variation in ageing rate and lifespan, one would expect that alleles with higher frequencies in long-lived populationss have a positive effect on lifespan. Conversely, alleles that have increased in frequency in late reproduction populationss should also have been associated with higher lifespan. In four out of the six significant main effect SNPs, this was not the case. Hence, it is more likely that these loci have changed in frequency between the early and late-reproducing populationss because they are associated with other traits that are under selection. Indeed, our selection regime, that of reproduction at 14 (early reproduction) or 28 (late reproduction) days after egg laying is unlikely to result in a large selection differential directly on lifespan; most flies die long after these 28 days after egg laying of late reproduction.
Some of the genes that are included in this study have been functionally analyzed and this indicated that more traits than lifespan alone are regulated by these genes (Parker et al. 2020, Hoedjes et al. 2023). This corresponds to lifespan being a life history trait, trading off with other traits (Roff 1992, Stearns 1992). While the evolve and resequencing approach is clearly a powerful tool for identifying potential causal loci segregating in natural populations, our study indicates that it remains important to identify which traits are associated to genetic variation. Our future work will focus on assessing whether loci under selection affect other traits such as development time, body size, and age-dependent reproduction, as life history theory suggests.
Supplemental figures and tables
Footnotes
Work was conducted at the Laboratory of Genetics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands, and the Institute of Biology, Leiden University, Sylviusweg 72, P.O. Box 9505, 2300 RA Leiden, The Netherlands