ABSTRACT
The question of why and how some species or individuals within a population live longer than others is among the most important questions in the biology of aging. A particularly useful model to understand the genetic basis and selective forces acting on the plasticity of lifespan are closely related species or ecologically diverse individuals of the same species widely different in lifespan. Here, we analyzed 76 diverse wild isolates of two closely related budding yeast species Saccharomyces cerevisiae and Saccharomyces paradoxus and discovered a diversity of natural intra-species lifespan variation. We sequenced the genomes of these organisms and analyzed how their replicative lifespan is shaped by nutrients and transcriptional and metabolite patterns. We identified sets of genes and metabolites to regulate aging pathways, many of which have not been previously associated with lifespan regulation. We also identified and characterized long-lived strains with elevated intermediary metabolites and differentially regulated genes for NAD metabolism and the control of epigenetic landscape through chromatin silencing. Our data further offer insights into the evolution and mechanisms by which caloric restriction regulates lifespan by modulating the availability of nutrients without decreasing fitness. Overall, our study shows how the environment and natural selection interact to shape diversity of lifespan.
INTRODUCTION
Organisms can respond to environmental cues by modifying the genotype to arrive at more than one phenotype, a process known as phenotypic plasticity [1, 2]. Consequently, many morphological, behavioral and physiological phenotypes vary within and between species in natural populations [3–7]. For example, genetic variation in natural populations of many organisms can differentially affect their neural and endocrine functions, leading to variation in quantitative life-history traits such as fitness and age at maturation [8, 9]. Variation in another fitness trait, lifespan, has also attracted much attention [10–12]. Across eukaryotes, species longevity can differ over many orders of magnitude, from the 2-day replicative lifespan (RLS) of budding yeast (Saccharomyces cerevisiae) to organisms that live for centuries, such as the Greenland shark (Somniosus microcephalus), which can live for more than 500 years [13–15]. Longevity also consistently varies among genetically diverse individuals of the same species, indicating that variability of lifespan is not constrained at the level of species, and that the molecular determinants of lifespan appear to be phenotypically plastic within the same genetic pool [16–21].
What, then, are the factors that favor long-lived or short-lived individuals? The molecular pathways that underlie the genetic and environmental determinants of lifespan are among the most intensely studied areas in the aging field [18, 19, 22–26]. Animal models of aging have shown that longevity of laboratory animals can be extended by environmental [27–30], dietary [31–33], pharmacological [34–37] and genetic interventions [38–40]. However, many of these laboratory-adapted populations are constrained by genetic and environmental background [41–43]. For example, artificially created mutant strains may show longer lifespan under laboratory setting, but demonstrate reduced fitness in their natural environment [44–47]. Therefore, a more integrated approach is needed to understand how the natural environment and natural selection interact to shape lifespan and associated life-history traits.
In order to understand the experiments that nature has carried out, modifying genotypes to arrive at different lifespans, we need to identify how environmental factors can trigger changes in the genome which in turn impact lifespan plasticity. Towards this goal, we employed 76 wild isolates of the budding yeast to capture evolved diversity of natural lifespan. This collection includes 40 diploid isolates of Saccharomyces cerevisiae and 36 diploid isolates of Saccharomyces paradoxus [48, 49]. These two species are closely related (sharing a common ancestor between 0.4 and 3 million years ago) with 90% genome identity and can mate and produce viable progeny [50, 51]. Earlier genome sequencing has shown that while some of these strains fall into distinct lineages with unique genetic variants, almost half of the strains have mosaic recombinant genomes arising from outcrosses between these genetically distinct lineages [49]. Therefore, these strains offer a great genetic pool to understand gene-environment interaction in shaping lifespan variation.
Accordingly, we assayed replicative lifespan (RLS) of ~4,000 individual cells from these isolates under three different conditions: yeast peptone dextrose (YPD, with 2% glucose), yeast peptone glycerol (YPG, 3% glycerol as a respiratory carbon source), and caloric restriction (CR-0.05% glucose). We identified up to 10-fold variation in median RLS. Although little is known about the natural and life histories of these wild isolates, they occupy diverse ecological niches ranging from trees to fruits to soil and face different evolutionary pressures for adaptation; thus, their natural lifespan variation must be encoded in their respective genomes [49, 51, 52]. We explored the relationship between gene expression, metabolite abundance, and longevity phenotypes to characterize the molecular patterns associated with condition-specific natural lifespan variation. We characterized genes and metabolites with significant positive or negative correlation to longevity and identified pathways associated with median RLS across these isolates under each condition. Our data showed that the environment can lead to different genotypes with wide differences in lifespan, which are regulated by distinct sets of genes, metabolites and connected pathways that have not been previously characterized in the laboratory setting. Altogether, this is the first comprehensive analysis of how the environment and natural selection interact to shape aging and the associated life-history traits.
RESULTS AND DISCUSSION
Wild isolates of Saccharomyces
Natural variation offers a powerful tool to assess the forces shaping sequence variation that underlies phenotypic diversity of natural populations. This approach could help explain how the phenotype is shaped by genotype and environment [2, 53]. Because of their wide ecological, geographical and genetic diversity, natural isolates of the budding yeast Saccharomyces have become an important model system for population/evolutionary genomics [52]. We took advantage of natural variation across 76 diploid wild yeast isolates of two closely related Saccharomyces species, including 40 S. cerevisiae and 36 S. paradoxus isolates, to examine variation in longevity. Their niches include human-associated environments, such as breweries and bakeries, and different types of wild ecological niches, such as trees, fruits, vineyards, and soils across different continents. There was also a group of pathogenic S. cerevisiae strains isolated from immunocompromised patients (Supplementary Table 1). To understand the forces that shape diversity of lifespan and the associated mechanisms, we first sequenced and assembled the genomes of these 76 isolates. Consistent with a previously published report [49], we found substantial genetic variation among these strains (~655,000SNPs). We also analyzed ploidy states of these strains and characterized those with increased copy number of specific chromosomes (Supplementary Fig. 1A, B).
Growth characteristics of wild isolates in liquid culture
We monitored growth characteristics of natural yeast isolates on standard glucose media (i.e. during fermentation) and on media with a non-fermentable substrate, glycerol, as a carbon source (i.e. during respiration), using an automated growth analyzer, and calculated the doubling time. Most wild isolates grew faster than the diploid laboratory wild type (WT) BY4743 strain under both conditions, with an average doubling time of 65 minutes in YPD and 125 minutes in YPG (Fig. 1A, Supplementary Table 1). Importantly, most of these strains grew at a similar rate on YPD, and variation in growth rate on YPG was also relatively small, indicating that these laboratory-optimized culture conditions are suitable for the nutritional needs of these strains and for RLS analysis (Fig. 1A).
Lifespan under glucose, glycerol, and calorie restricted growth conditions
We assayed RLS of these isolates at 30 °C using 3 different growth conditions: YPD medium, YPG medium, and medium with restricted glucose (calorie restriction, CR, 0.05% glucose). Under YPD conditions, we observed a remarkable ~10-fold variation in median RLS and 9-fold variation in maximum RLS across these isolates (Fig. 1B, Pearson correlation coefficient = 0.65 between medium and maximum lifespans, P < 0.00001). The average median RLS of S. paradoxus strains (29.7) was significantly higher (adjusted P value =2.7×10−9) than the average median RLS of S. cerevisiae strains (22.7), showing that the S. paradoxus mother cells can produce more daughter cells in this medium (Fig. 1B, C). S. paradoxus strain Q32.3 had the longest median RLS (median RLS=38.5), whereas S. cerevisiae strains Y10 and YS2 exhibited the shortest RLS (median RLS=4 for Y10 and 6 for YS2) (Supplementary Table 1).
Glycerol as a growth substrate can extend both RLS and chronological lifespan (CLS) of yeast [54, 57]. In the case of CLS, the increased longevity is clearly caused by a switch from fermentative metabolism to respiration [54]; however, for RLS the mechanism is unclear, as respiratory metabolism is not required for lifespan extension from CR in laboratory WT strains [58], and the requirement of respiratory metabolism for lifespan extension by glycerol has not been established. While we observed a median RLS increase in 39 strains on YPG; most other 27 strains showed no change and few strains decreased median RLS (Fig. 1D, Supplementary Fig. 2A, Supplementary Table 1). For example, S. cerevisiae strain BC187 showed an increased maximum and medium RLS when comparing growth on glucose and glycerol (maximum RLS = 62 in YPD and 92 in YPG; median R.LS = 32 in YPD and 37.5 in YPG). S. paradoxus strain KPN3829 also showed a similar increase (maximum RLS = 39 in YPD and 60 in YPG; median RLS = 21 in YPD and 35 in YPG) (Supplementary Table 1, Supplementary Fig. 2A). On the other hand, S. cerevisiae strain YJM975 showed a decrease in both measures (maximum RLS= 70 in YPD and 60 in YPG; median RLS = 31 in YPD and 23 in YPG) (Supplementary Table 1, Supplementary Fig. 2A).
We further dissected the role of carbon source metabolism in RLS by comparing median RLS changes between YPD and YPG conditions. Interestingly, strains with the shortest RLS on YPD tended to achieve the largest lifespan gains from switching to YPG, while the long-lived strains on YPD generally did not show a further RLS increase (Supplementary Fig. 2B). For example, 42 out of 51 short-lived strains (compared to laboratory WT strain on YPD) increased their median RLS when switched to glycerol (Supplementary Fig. 2A, Supplementary Table 1), whereas 17 out of 23 long-lived strains (compared to laboratory WT strain on YPD) did not show further median RLS extension (Supplementary Fig. 2A, Supplementary Table 1). This correlation matches prior studies of genetically diverse single gene deletions in the BY4742 background where the shortest-lived strains tended to receive the largest lifespan extension in response to CR [57]. Further analyses of a subset of wild isolates (36 strains) on the CR medium (0.05% glucose) revealed a similar pattern, with the shortest-lived strains on YPD tending to receive the largest lifespan increase from CR (Supplementary Fig. 2C, Supplementary Table 1). Consistent with this, the effects of CR and YPG on lifespan relative to YPD for each strain were significantly correlated (r= 0.62, Padj=2.29×10−4) (Supplementary Fig. 2C). CR is associated with increased respiration across different organisms, including yeast [59–61]. While we observed a 13% median RLS increase in the diploid laboratory WT strain, wild yeast isolates showed a wide range of median RLS under CR conditions (Fig. 1D). Among the 36 strains assayed under CR conditions, 16 did not respond to CR and decreased median RLS, while the other 20 strains showed various degrees of median RLS extension (Supplementary Table 1, Supplementary Fig. 3). For example, YJM978 strain showed a decreased median RLS (50% decrease), whereas CR caused a 75% median RLS increase of Y9 strain, compared to the 2% glucose medium (Supplementary Table 1, Supplementary Fig. 3). Overall, our data suggest that long-lived strains have distinct niche-specific nutritional and environmental adaptations, similar to CR condition that resulted in altered lifespan. Therefore, there was no consistent RLS extension in long-lived strains when switched to glycerol or subjected to CR.
Endophenotype variation across wild isolates
While many previous large-scale omics studies on aging focused on genome-wide association [62–65], recent comparative studies on transcriptomics [66, 67], proteomics [68–70], metabolomics [71–73] and ionomics [72] have begun to shed light on molecular patterns and mechanisms that link to endophenotypes. For example, it has been suggested that natural variation is associated with extensive changes in gene expression, translation and metabolic regulation, which in turn may affect selection under different stress conditions [74, 75]. In fact, gene expression variation has repeatedly been postulated to play a major role in adaptive evolution and phenotypic plasticity [75, 76]. Evolution of gene expression is known to exhibit selective constraints [77, 78], thereby supporting specific phenotypic outcomes such as changes in morphology [79] and lifespan [80]. Similarly, comparative studies on molecule abundance by metabolite profiling have been utilized to describe the genotype to phenotype relations in model organisms [81, 82]. Accordingly, we aimed to explain lifespan differences among these natural populations of closely related yeast isolates by analyzing their endophenotype differences that include gene expression variation (transcriptomics) and differences in their metabolite levels (metabolomics).
For the transcriptome analyses, we obtained ≥ 5 million 150-bp paired-end RNA-seq reads for each strain. For metabolomics analysis, we applied targeted metabolite profiling using liquid chromatography-mass spectrometry (LC-MS). After filtering and quality control, the data set contained RNA-seq reads for 5,376 genes and 166 metabolites identified commonly across all isolates (Supplementary Table 2). The expression profiles were similar to one another, with Spearman correlation coefficients of strain pairs ranging between 0.59 and 0.93 (except the pairing to Q59.1, CBS5829, YPS606, and UFRJ50791, with the range between 0.21 and 0.79) (Supplementary Fig. 4, Supplementary Table 2). To determine whether the previously published sequence-based evolutionary relationship [49] was reflected in their gene expression variation, we constructed gene expression phylograms using a distance matrix of 1 minus Spearman correlation coefficients and the neighbor-joining method [83]. The resulting topology was largely consistent with their phylogeny with a clear separation between S. cerevisiae and S. paradoxus strains (Fig. 2A).
To visualize endophenotypic variation between these two species and across the strains of the same species, we performed Principal Component analysis (PCA) on each type of data. PCA of the transcriptome revealed a pattern resembling the phylogenetic relationship, with the first three PCs explaining ~49 % of total variance in gene expression (Fig. 2B, Supplementary Fig. 5A). Although some S. paradoxus strains clustered with S. cerevisiae, we observed clear species segregation based on PC2 (except for some outlier strains that were separated by PC1). PCA of metabolomics data revealed a similar structure with the first three PCs explaining ~41% of total variance in metabolite levels, and PC2 somewhat separating the species (Fig. 2C, Supplementary Fig. 5B). However, the strains were segregated along PC1 largely according to their median RLS, and the effect was stronger across S. paradoxus strains.
To understand the basis of this segregation pattern, we performed pathway enrichment analysis by combining the 500 top genes (250 with positive weights and 250 with negative weights) and 40 top metabolites (20 with positive weights and 20 with negative weights) contributing to each PC, respectively. This integrative analysis of genes and metabolites contributing to PC1 revealed a distinct set of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, including RNA degradation, MAPK signaling pathway, cell cycle, pantothenate and CoA biosynthesis, ribosome biogenesis and pentose phosphate pathway (Fig. 2D, Supplementary Table 3). The analysis of genes and metabolites for PC2 revealed the KEGG pathways related to ribosome, autophagy, endocytosis, cell cycle, mRNA surveillance, and nucleotide excision repair (Fig. 2E, Supplementary Table 3). These results suggest that these processes diverged most significantly across the wild isolates of two species of Saccharomyces genus and may account for their phenotypic diversity, including lifespan. It should be noted, however, that we do not know exactly what each PC represents, unless it perfectly aligns or correlates with some known variables. Also, either biological (e.g. phylogenetic structure) or technical (e.g. data normalization or batch effect) or mixed effects of both may render PCA biased [84].
Gene expression and metabolite abundance correlate with longevity
To identify the endophenotypes (genes and metabolites) correlating with natural lifespan variation across wild isolates, we applied the phylogenetic generalized least-squares (PGLS) method to account for the phylogenetic relationships among the strains and test for different models of traits evolution [85, 86]. Regression was performed between endophenotypic values and median RLS under different models of trait evolution and the best-fit model was then selected based on maximal likelihood (Materials and Methods). To assess robustness of the relationship, calculation of regression slopes was repeated by taking out one yeast strain at a time. This ensured the overall relationship did not depend on a particular isolate.
With the PGLS approach, we identified 73 transcripts with significant correlation to median RLS (Padj ≤ 0.01; 39 with positive correlation and 34 with negative correlation) (Supplementary Table 2). Among the top hits with positive correlation were a putative zinc finger protein coding gene CMR3 (Padj=3.3×10−9), histone acetyltransferase (HAT) gene HPA2 (Padj=0.0002), O-methyltransferase COQ3 (Padj=0.006) and zing regulated protein gene ZRG8 (Fig. 3A, Supplementary Table 2). The top hits with negative correlation included the genes coding for cyclin-dependent kinase Pho85p interacting proteins PCL1 (Padj=0.0008) and PCL2 (Padj=0.001), regulator of Ty1 transposon protein coding gene RTT107 (Padj=0.007), and inositol monophosphatase gene INM1 (Padj=0.006) (Fig. 3B, Supplementary Table 2).
To assess if any of our transcript hits were previously implicated in yeast lifespan, we compared our list of significant genes (357 genes at Padj=0.05) to the genes associated with RLS regulation in laboratory WT strain listed in the GenAge database [87]. The GenAge list contains 611 genes from the published literature that were reported for decreased or increased RLS of laboratory yeast strains (595 deleted and 16 overexpressed genes) (Supplementary Table 4). There were 39 genes present in both our list and the GenAge list, 23 of which showed the same direction of correlation with RLS. For example, INM1, RTT107, PPH3 and BSC1 genes increase RLS when deleted (GenAge database) and increase RLS when their transcript levels decrease across wild isolates (this study). (Supplementary Table 4). However, the overall pattern did not reach statistical significance (Fisher exact test, p ≥ 0.05).
We further compared our results with the gene expression patterns across wild isolates and 1,376 laboratory knock-out strains (KO) strains [88], whose RLS was published [89] previously (Supplementary Fig. 6A, B Supplementary Table 1). We calculated an association of gene expression with different measures of RLS (mean RLS, median RLS, and maximum RLS) across KO strains. Our analysis revealed around 400 significant genes (Padj=0.05) commonly up or down regulated in all types of RLS measures and more than 1000 genes associated with median RLS (Fig. 4A). We then calculated a correlation matrix of RLS-associated gene expression changes across KO strains and wild isolates (Fig. 4B). We found no positive correlation between RLS associated gene expression changes across KO strains and RLS associated gene expression changes across natural isolates (Fig. 4B).
To further understand whether the same sets of biological pathways are involved in lifespan variation achieved by laboratory manipulations and by natural selection, we performed functional enrichment (GSEA) of genes associated with RLS across deletion and wild isolates (Fig. 4C). For the genes correlating positively with longevity across the KO strains, the enriched terms included cellular responses to stress, ribosome, translation, cellular senescence, and DNA repair (Fig. 4C). On the other hand, positive terms enriched in wild isolates included TCA cycle, oxidative phosphorylation, and lipid metabolic process, regulation of apoptosis, and autophagy (Fig. 4C). Most of these pathways have previously been implicated in aging and lifespan regulation in laboratory conditions [90].
With regard to metabolite measurements, we identified 11 metabolites that showed correlation with median RLS (Padj ≤ 0.01) (Supplementary Table 3). Among them, seven metabolites (tryptophan, lactate, 2-hydroxyglutarate, 3-hydroxypropionic acid, 2-hydroxyisobutyrate, 2-hydroxybutyrate, and phenyllactic acid) correlated positively (Fig. 5A, Supplementary Fig. 7B) and four metabolites (lysine, quinolinic acid, propionate, Se-methylselenocysteine) showed negative correlation (Fig. 5B, Supplementary Fig. 7B). This points to the potential roles of tryptophan metabolism and NAD+ biosynthesis in natural lifespan variation. Tryptophan degradation, also known as the kynurenine pathway, is required for de novo NAD+ biosynthesis from L-Tryptophan (Trp) [91, 92]. Recent studies have shown that inhibition of Trp degradation increases the lifespan in yeast [93], C. elegans [94, 95] and fruit flies [96]. Increased kynurenine pathway activity and alteration in kynurenine pathway metabolites with age have also been observed in rats [97] and humans [98, 99], indicating a conserved role of this pathway in lifespan regulation [100]. More recently, a study across 26 mammalian species also found that species characterized by kynurenine pathway activation were shorter-lived [72]. Interestingly, while our data showed that abundance of Trp correlates positively with RLS, we also found that quinolinic acid, an intermediate in the kynurenine pathway [91, 92], correlates negatively with median RLS, suggesting a possible inhibition of Trp degradation and inactivation of the kynurenine pathway. Similarly, increased quinolinic acid with age was associated with aging and neurodegeneration [98, 99]. Furthermore, our transcriptome data showed that the BNA2 (indoleamine 2,3-dioxygenase) gene, required for the first rate-limiting step of Trp catabolism in the kynurenine pathway [92], was significantly downregulated (Padj = 0.02) in long-lived wild isolates (Fig. 3B).
Other than being an electron carrier in metabolic pathways, NAD+ is a co-substrate of several NAD+-dependent enzymes responsible for lysine acetylation (KAC), a conserved protein post-translational modification (PTM) [101, 102]. Redox balance is critical for the availability of NAD+, and it is directly regulated by mitochondrial respiration [103]. Decreased levels of NAD+ are associated with aging [104] and age-related diseases, whereas NAD+ supplementation was found to extend lifespan in mice [105], yeast and worms [104]. Increased activity of NAD+-dependent histone deacetylase (KDAC), SIR2 (SIRT1 in mammals) has been associated with NAD+-mediated lifespan extension across different organisms [106–108]. Even though our transcriptome data showed no significant association between SIR2 transcript level and median RLS, SIR2 activity might be enhanced by a different mechanism. For example, CR induced PNC1 expression reduces cellular nicotinamide (NAM, an inhibitor of SIR2 [109]. Decreased concentration of nicotinamide to nicotinic acid (NA) might enhance SIR2 activity. Interestingly, our transcriptome data revealed a significant increase in mRNA abundance of PNC1 in long-lived isolates (Fig. 3A). It is also possible that the increased SIR2 activity might not be necessary for NAD+-regulated lifespan extension in wild isolates, as has been suggested previously [110–112]. Although our metabolite data showed no significant changes in NAD+ level across wild isolates (Fig. 5B), we found the increase in NADH, correlated (Padj=0.04) with increased median RLS (Supplementary Fig. 7B). However, it has been long established that the assessment of cellular NAD+ content and the NAD+/NADH ratio is extremely challenging. They are predominantly bound to intracellular proteins so that the total concentrations determined in protein-free tissue extracts do not reflect true concentrations of these metabolites. In addition, the NAD+ and NADH contents are likely altered in our samples, because they are highly sensitive to temperature, light, pH, or buffer contents during sample preparation, so that we do not know the exact effect of kynurenine pathway inhibition and increased PNC1 expression on NAD+ level [102].
Our metabolite list also included several related short chain fatty acids (SCFAs: 3-hydroxypropionic acid, 2-hydroxyisobutyrate, 2-hydroxybutyrate) and 2-hydroxyglutarate, which are known to be involved in histone modifications. Recently, two new types of post-translational modification of lysine (Lys) residues in histones (H2A, H2B, H3, and H), namely histone propionylation and histone 2-hydroxyisobutyrylation (Khib) (both also known as histone acylation), were identified as an evolutionarily conserved modification from yeast to human [113–115]. Earlier studies have shown that histone modification through Khib is directly related to metabolic regulation and can change the enzymatic activity of glycolytic enzymes to regulate glycolysis in response to the availability of carbon source [116, 117]. For example, 2-hydroxyisobutyrylation on histone H4K8 (H4K8hib) is found to be regulated by glucose availability in the budding yeast, and inhibition of H4K8hib was shown to reduce chronological RLS in yeast [117]. Regulation of H4K8hib in yeast was mediated by histone lysine deacetylases (KDAC) Rpd3p, Hos3p and Esa1p [117, 118]. Rpd3p and Hos3p have both deacetylase and de-2–hydroxyisobutyrylase activities; however, their regulation for selective removal of each group has not been documented. Yeast Esa1p (Tip60 human ortholog) also found to add Khib (2– hydroxyisobutyrylation) to substrate proteins [118]. Along with our metabolomics data, our transcriptomic data supports the distinct regulation of Khib in long-lived isolates, since transcript abundance of both HOS3 and ESA1 was found to be positively correlate with median RLS (Supplementary Fig. 7A). Together, these results suggest that NAD+ homeostasis and a regulatory network involving histone lysine acetylation (KAC) and Khib are involved in lifespan regulation in wild isolates. In addition, 2-hydroxyglutarate was found to enhance gene silencing through inhibition of specific H3K36 histone demethylases in yeast [1119]. Overall, the data point to the importance of transcriptional regulation through different types of histone modifications in long-lived isolates. It should be also noted here that proper epigenetic regulation of ribosomal DNA (rDNA) contributes to transcriptional regulation of rDNA (constitute ~80% of the total mass of cellular RNA) and genome stability, which are required for normal life span of yeast [120]. The unstable rDNA region on chromosome XII is organized into a tandem array of 9.1-kb units that are repeated 100 to 200 times. Deacetylation of rDNA region by NAD-dependent histone deacetylase, Sir2, contributes to the rDNA gene silencing and suppress rDNA recombination to prevent formation of extra chromosomal rDNA circles [121, 122]. Increased rDNA repeats and formation of extra chromosomal rDNA circles are known to be a main factor for yeast replicative aging [120–122]. Consistently, our analysis of genomic reads revealed that increased rDNA copy number negatively correlates with median RLS (Pearson correlation coefficient = −0.3, P = 0.01) across these wild isolates (Supplementary Fig. 8)
A link between our transcriptome and metabolome data could also be seen for the Lys biosynthesis pathway. We found a negative correlation for Lys abundance, (i.e., long-lived strains tend to have less Lys). Our transcriptome data also showed that the two genes controlling the first rate-limiting step of Lys biosynthesis, LYS20 and LYS21, are negatively correlated (Supplementary Fig. 7A), possibly explaining the decreased Lys levels in long-lived strains (Fig. 5B). It should also be noted these two enzymes catalyze condensation of Acetyl-CoA and 2-oxoglutarate (alpha-ketoglutarate) to produce homocitrate and CoA directly regulating Acetyl-CoA pool which is required Lys acetylation [101].
In addition to TCA cycle, oxidative phosphorylation and NAD+ metabolism, our data point to lipid metabolism, autophagy and apoptotic processes as possible regulators of natural lifespan variation across wild isolates (Fig. 4C). Interestingly, all these processes have some resemblance to pathways, regulated by CR/dietary restriction (DR) [59, 60], further supporting the findings that CR associated mechanisms may be responsible for lifespan extension in long-lived wild isolates. CR has been found to increase the NAD+/NADH ratio [123–125], and the inhibition of the kynurenine pathway by CR has also been reported [130, 131]. These findings suggest an interesting mechanism by which CR-like physiological stimuli manifest their lifespan effects through increased respiration and kynurenine pathway inhibition. Although the involvement of Khib in CR-mediated lifespan regulation has not yet been thoroughly studied, our findings suggest histone Khib modification might be competing with NAD+ regulated KAC to influence acetylation balance. Both NAD+ and Khib have been shown to be regulated by glucose availably [117, 123–125], indicating niche-specific nutritional and environmental cues for the long life of natural isolates. These long-lived strains might be under selection due to repeating seasonal and/or constant starvation cycles, mimicking CR conditions. As a proof of concept, we compared gene expression changes of CR responding strains (i.e. those that showed an increased RLS) and non-responding strains (decreased RLS). We found 207 genes differentially expressed, including 66 that significantly reduced expression and 139 that increased expression in the CR-responding group (Supplementary Table 4). Next, we compared our CR gene list to GenAge data that are based on 112 genes with significant expression changes under CR conditions and/or associated with CR mediated RLS extension in a laboratory WT strain [126] (Supplementary Table 4). Interestingly, there were only common 3 genes, including SCH9 (negatively correlated), which is a functional ortholog of mammalian S6 kinase known to be phosphorylated by Tor1p, NPT1 (positively correlated), which is a nicotinate phosphoribosyl transferase that acts in the salvage pathway of NAD+ biosynthesis and MDH1 (positively correlated), a mitochondrial malate dehydrogenase responsible for interconversion of malate to oxaloacetate using NAD+/NADH as a cofactor. All three genes were previously associated with the CR mediated RLS extension [111, 127, 128]. Furthermore, correlation of aggregated gene expression patterns (5,376 genes) of wild isolates to gene expression patterns of a laboratory WT strain, grown under CR condition, revealed no significant correlation between any types of pairwise comparison (Supplementary Fig. 4). However, our protein-protein interaction network of 207 genes, differentially expressed in CR-responding strains, revealed a highly connected dense network of pathways, such as ribosome (both mitochondrial and cytoplasmic), cell cycle, TCA cycle, oxidative phosphorylation, steroid biosynthesis and Trp metabolism (Fig. 6). All these pathways are known to be a hallmark of CR associated longevity mechanism [58–60]. The term “longevity regulating pathway” also appeared with connected 31 nodes of known aging regulating genes (out of 36) (Fig. 6), indicating known longevity regulators are interacting with the genes associated with longevity regulation in wild isolates. Overall, we conclude that evolution of CR dependent longevity follows different genetic trajectories to regulate known genes and related pathways of CR to extend lifespan (Fig. 6). Most probably, CR associated genes in wild isolates are favored over through the course of long adaptive evolution under CR like condition induced by the genetic variability that adaptive mechanisms never been evolved under laboratory conditions (Fig. 7). It would be also interesting to experiment long term adaptive evolution of laboratory WT strain under CR condition to see if similar genes would be targeted for adapted changes (mutations). Overall, our data indicates that life history trajectories of these strains adapted a mechanism, resembling CR at the level of pathways, although genes associated with these pathways differed from the genes induced by CR in laboratory WT strains.
CONCLUDING COMMENTS
The budding yeast contributed significantly to our understanding of genetics and cell biology and has become an important model of aging, since Mortimer introduced the yeast RLS phenotype [13]. With the power of genetics and experimental tools, yeast has provided clues for understanding the aging process in eukaryotes and yielded hypotheses that can be tested in other organisms, including mammals [129, 130]. However, the yeast aging community has adopted single reference isolates or strains derived from the S288c strain, introduced to aging research by Mortimer in 1959 [13]. He derived the S288c strain from the diploid strain isolated from a rotten fig in California in 1938 [131]. Since then, it has been over 60 years that the S288c strain and its derivatives have been kept in prolonged culture under laboratory conditions that induced relaxed selection for different traits [132]. Relaxed selection might also induce detrimental mutations which may in turn lead to lifespan reduction as has been shown for different organisms [133, 134]. For example, a widely used yeast knockout library has been created from the S288c derivate BY strain background (BY4716, BY4741, BY4742, BY4743), and genome sequencing revealed 39 mutations between the original S288c isolate and the diploid BY4716 strain, a progenitor of the yeast knockout library [135]. In addition, the BY strain series inherited a mutated copy of Hap1 and an allelic variant of MIP1, which are important for mitochondrial respiration and mitochondrial genome stability. Accordingly, this strain manifests several genotype-specific traits, including decreased sporulation and increased rates of mitochondrial genome loss [136]. It is important to note that mitochondria have an important role in a wide variety of metabolic and cellular processes, including energy production, amino acid synthesis, lipid metabolism, cell cycle regulation, apoptosis, autophagy and signaling processes and many of these processes are directly linked to lifespan regulation and aging [137, 138]. Finally, the presence of auxotrophic markers (i.e., mutations that prevent yeast cell unable to synthesize an essential compound such as amino acids) in its genome has been shown to have potential confounding effects for many traits [1136, 139]. These characteristics of S288c and the laboratory adapted BY strain series illustrate that the species has the potential genotype-specific lifespan traits, regulated by particular genes in laboratory settings. But these longevity-associated conditions might be detrimental and not be employed in the natural environment. For example, most of the long-lived yeast mutant strains, from the knockout library have been shown to have reduced fitness relative to isogenic wild-type cells [47]. Similar trade-off was observed across many long-lived laboratory mutants of other model organisms of aging, including nematodes, flies and mice, which may show reduced lifespan, reduced fecundity, slower movement or reduced rates of larval development under naturalistic conditions. These findings indicate that these long-lived strains lack a competitive advantage in the wild and might be eliminated because of selection pressure in their natural environment. These issues of laboratory adapted model organisms have renewed our interest in understanding how the environment modulates lifespan diversity, leading to extended lifespan without significant reduction in fitness or fecundity [45, 46, 140, 141].
From this perspective, the natural isolates we analyze in the current study offer an excellent tool that can serve as a new model for yeast aging studies, allowing to leverage gene-environment impact on lifespan variation. In fact, our comparison of gene expression changes and longevity signatures across laboratory-adapted long-lived mutants and long-lived natural isolates showed that different genes and pathways may be associated with longevity in wild isolates and deletion strains. Interestingly, the pathways associated with lifespan regulation in wild isolates are known to be regulated by CR in laboratory conditions, although the regulation differs at the gene expression level across wild yeast isolates.
Our findings also provide insights into the evolution of adaptive metabolic shift from fermentation to respiration under CR-like condition to regulate NAD+ metabolism to extend lifespan without decreasing the fitness of strains in the wild (Fig. 7). Although it is unknown if the CR-induced lifespan extension is achieved through common mechanisms in different species, strong conservation of the NAD+ pathway across different taxa suggests that the same mechanisms regulate NAD+ homeostasis under CR across other organisms. In addition, our data suggest co-regulation of Lys modification by Khib in long-lived strains, although its association with CR and NAD+ metabolism needs further investigation. Regulation of NAD+ and Khib suggests a contribution of epigenetic factors to evolution of gene expression that possibly originated as an adaptation to food availability and other niche-specific environmental conditions. Overall, our research has uncovered one of the experiments of Nature that employs environment to modify genotype and gene expression, arriving at different lifespans. Further understanding of how gene-environment interaction modulates genes and pathways associated with longevity may open new therapeutic applications to slow aging and age-related diseases through diet, lifestyle, or pharmacological interventions.
Materials and Methods
Yeast strains and growth conditions
Many of the diploid wild isolates of S. cerevisiae and S. paradoxus (68 isolates) were obtained from the Sanger Institute [49] and the remaining (8 isolates) was gifted by Justin Fay from Washington University [48]. Detailed information about strains used in this study is in Supplementary Table 1. The diploid laboratory WT strain BY4743 was purchased from American Type Culture Collection (ATCC). Growth rates were determined using a BioTEK Epoch 2 instrument by the analysis of optical density in the OD600 range, and doubling times were calculated with an R script by analyzing fitting spline function from growth curve slopes [142]. The maximum slope of the spline fit was used as an estimate for the growth rate and doubling time for each evolved line, in combination with the YODA software package [143].
Replicative lifespan assay
RLS was determined using a modification of our previously published protocol [144]. Yeast cell cultures for each strain were freshly started from frozen stocks on yeast extract peptone dextrose (YPD) plates and grown for 2 days at 30 °C prior to dissections. Several colonies were streaked onto new YPD with 2% glucose, YPD with 0.05% glucose or YPG plates with 3% glycerol using pipette tips. After overnight growth, ~100 dividing cells were lined up. After the first division, newborn daughter cells were chosen for RLS assays using a dissection microscope. For each natural isolate, at least two independent assays were performed. Each assay contained 20–80 mother cells of BY4743 strain, which was used in every experiment as a technical control.
RNA-sequencing and data analysis
Three independent cultures for each strain were collected at the OD600=0.4 to isolate DNA and RNA from each culture using Master Pure yeast DNA purification kit (Lucigene, Cat. number: MPY80200) and Quick-RNA 96 Kit from Zymo Research (Cat. number: R1053). Illumina TruSeq DNA and RNA library preparation kit according to user manual and loaded on Illumina HiSeq 4000 platform to produce 150 bp paired-end sequences for both DNA and RNA libraries. The average depth of coverage was ~100X per strain for both DNA and RNA sequencing. After quality control and adapter removal, STAR software package [145] was used to map the reads against a pseudo reference genome of each strains, in which we replaced identified nucleotide changes in S288c reference genome. Genome Analysis Toolkit (GATK) [146] was used to identify SNPs for each strain. Read alignment rate for transcriptome data against pseudo genome was varied between 92-97% across S. cerevisiae strains and 93-99% across S. paradoxus strains (Supplementary Fig. 9). Read counts per gene were calculated using Feature Counts [147]. To filter out genes with the low number of reads, we used filterByExpr function from edgeR package and only the genes with at least 1 count per million (cpm) in at least half of the strains were retained, resulted in an expression set of 5,376 genes across replicates of wild isolates. Filtered data was then subjected to trimmed mean of M-values (TMM) normalization [148]. Differential gene expression analysis was performed using the R package EdgeR [149]. We used Benjamini-Hochberg multiple-testing procedure and selected genes with a false discovery rate (FDR) of 5%.
Metabolite profiling and data analysis
We collected 25 ml cells from the same culture that was used for sample collection for RNA-seq analysis. 1 ml of MeOH:H2O mixture (8:2, v/v) was added to the samples, swirled at 550 rpm on a mixer for 5 minutes and then transferred to an Eppendorf tube, they were sonicated in an ice bath for 10 min, centrifuged at 4°C at 14,000 rpm for 15 min, and 600 μl of supernatant was collected into a new tube and dried in a vacuum centrifuge at 30 °C for 2.5 hrs. Samples were reconstituted in 1 mL and injected into a chromatography system consisting of a dual injection valve setup allowing injections onto two different LC columns with each column dedicated to an ESI polarity. 5 μL were injected on the positive mode column and 10 μL on the negative side column. The columns were a matched pair from the same production lot number and were both a Waters BEH amide column (2.1 x 150 mm). Auto sampler was maintained at 4 °C and column oven was set to 40 °C. Solvent A (95% H2O, 3% acetonitrile, 2% methanol, 0.2% acetic acid with 10 mM ammonium acetate and 5 μM medronic acid and Solvent B (5% H2O, 93% acetonitrile, 2% methanol, 0.2% acetic acid with 10 mM ammonium acetate 5 μM medronic acid) were used for sample loading. After completion of the 18-minute gradient, injection on the opposite column was initiated and the inactive column was allowed to equilibrate at starting gradient conditions. A set of QC injections for both instrument and sample QC were run at the beginning and end of the sample run. Data was integrated by Multiquant 3.0.2 software. Peaks were selected based on peak shape, a signal-to-noise of 10 or better and retention times consistent with previously run standards and sample sets. Analysis of the dataset was performed using R (version 3.6.0). All the metabolites with ≥ 40% missingness were excluded, and a total of 166 metabolites were included in the imputation step. We imputed the remaining missing values using the K-nearest neighbors imputation method implemented in the R impute package [150]. The log2-transformed abundance was median normalized prior to imputation.
Principal component analysis (PCA)
Principal component analysis was performed on preprocessed data (e.g. normalized and imputed log2 abundance of the metabolomic data, and the log2-counts per million (CPM) values of the filtered and TMM normalized RNAseq data) using the R prcomp function. To identify the underlying pathways, the factors in each of the first three principal components (PCs) were ranked by their contributions, and pathway enrichment analysis was performed on the top 500 transcripts using Network Analyst [151] and on the top 60 metabolites using Metabol Analyst [152] platforms.
Phylogenetic regression by generalized least squares
R packages ‘nmle’ and ‘phylolm’were used to perform phylogenetic regression by generalized least squares method [153]. We tested four models of trait evolution: (i) complete absence of phylogenetic relationship (‘Null’); (ii) Brownian Motion model (‘BM’); (iii) BM transformed by Pagel’s lambda (‘Lambda’); and (iv) Ornstein–Uhlenbeck model (‘OU’). The parameters for Lambda and OU models were estimated simultaneously with the coefficients using maximum likelihood. The best-fit model was selected based on maximum likelihood. Strength of correlation was based on the p-value of regression slope. To confirm robustness of results, regression was performed by leaving out each strain, one at a time, and computing P values using the remaining strains.
Differential expression between CR responding/non-responding groups
The differential expression of genes between CR-responsive (extending lifespan on CR conditions) and non-responsive (decreasing lifespan on CR conditions) strains of S. cerevisiae species were determined using R package limma [154]. Briefly, we used a linear mixed model approach, fitting the CR responding group as a fixed effect and the strain as the random effect where the intra-strain correlation was incorporated into the covariance. To use linear mixed models, we first converted the gene counts to log2 counts per million (log-CPM) using the TMM normalization data and then estimated both observation-level and sample-specific weights using the voom “WithQualityWeights” function from the limma package [154]. The observation-level weights allow us to use a linear mixed model (by accounting for the dependence between mean and variance), and the sample-specific weights enable us to weigh individual samples up or down. Benjamini-Hochberg (BH) adjustment was performed to account for multiple hypotheses [155]. Genes with adjusted p-value < 0.05 were considered significant.
Gene expression signature associated with RLS across deletion strains
Gene expression data on deletion mutants was obtained from GSE45115, GSE42527 and GSE42526 [88]. The corresponding RLS lifespan data for mutant strains was from [89]. Based on the raw data from the number of replicates, we calculated median, mean and maximum RLS, together with corresponding standard errors (SE) for each deletion strain. In total, this resulted in 1,376 deletion strains, for which both RLS and gene expression data were available. logFC of individual genes corresponding to each mutant strain compared to control samples were used for subsequent analysis.
To identify genes associated with RLS across KO strains linear models in limma were used [154]. We found genes associated with median, mean and maximum RLS both in linear and logarithmic scale. Benjamini-Hochberg (BH) adjustment was performed to account for multiple hypotheses [155]. Genes with adjusted p-value < 0.05 were considered significant. To determine statistical significance of the overlap between genes associated with different metrics of RLS, we performed Fisher exact test separately for up- and downregulated genes, considering 6,170 genes as background.
Comparison between signatures of RLS across deletion and natural strains
To compare gene expression signatures associated with different metrics of RLS across deletion and natural Saccharomyces strains, we calculated Spearman correlation coefficients between corresponding gene expression slope coefficients in a pairwise manner. Clustering within correlation matrix was performed with complete hierarchical approach and Spearman correlation distance.
To increase the signal of correlation matrix, the union of top 1000 statistically significant genes from each of the two signatures in a pair was used to calculate Spearman correlation coefficient. To get an optimal gene number for removal of noise, we looked how the total number of significantly correlated pairs of signatures depended on the number of genes used to calculate the correlation coefficient. As a threshold, we considered BH adjusted p-value < 0.05 and Spearman correlation coefficient > 0.1.
To determine statistical significance of the overlap between genes associated with different metrics of RLS across deletion and natural strains, we performed Fisher exact test, considering 4,712 genes as background. To identify genes, whose deletions are associated with longer or shorter lifespan in S. cerevisiae strains, we compared the distribution of RLS across samples corresponding to certain deletion strains with the distribution of median RLS across all measured deletion strains. For that we used Mann-Whitney test. Genes with BH adjusted p-value < 0.05 were considered significant. Overlap of these genes with lifespan-associated genes across natural strains was assessed with Venn diagram and Fisher exact test p-value (with BH adjusted p-value threshold of 0.05).
Functional enrichment analysis
For the identification of functions enriched by genes associated with RLS across deletion and natural strains, we performed gene set enrichment analysis (GSEA) [168] on a pre-ranked list of genes based on log10(p-value) corrected by the sign of regulation, calculated as: where pv and b are p-value and slope of expression of certain gene, respectively, and sgn is signum function (is equal to 1, −1 and 0 if value is positive, negative and equal to 0, respectively). REACTOME, KEGG and GO biological process (BP) from Molecular Signature Database (MSigDB) have been used as gene sets for GSEA [156]. We utilized fgsea package in R for GSEA analysis. Adjusted p-value cutoff of 0.1 was used to select statistically significant functions.
We visualized several manually chosen statistically significant functions with a heatmap colored based on normalized enrichment score (NES). Clustering of functions has been performed with hierarchical complete approach and Euclidean distance.
Supplementary File
Acknowledgements
This work was supported by NIH AG067782 and AG064223 to V.N.G. and 1K01AG060040 to A.K. Studies performed by D.P. and M.K. were funded by a University of Washington Nathan Shock Center of Excellence in the Basic Biology pilot award to A. K. (P30AG013280). M.B.L was supported by the HHMI Gilliam Fellowship for Advanced Study and the UW NIH Alzheimer’s Disease Training Program (T32 AG052354).
REFERENCES
- 1.↵
- 2.↵
- 3.↵
- 4.
- 5.
- 6.
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.
- 12.↵
- 13.↵
- 14.
- 15.↵
- 16.↵
- 17.
- 18.↵
- 19.↵
- 20.
- 21.↵
- 22.↵
- 23.
- 24.
- 25.
- 26.↵
- 27.↵
- 28.
- 29.
- 30.↵
- 31.↵
- 32.
- 33.↵
- 34.↵
- 35.
- 36.
- 37.↵
- 38.↵
- 39.
- 40.↵
- 41.↵
- 42.
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.
- 56.
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.
- 64.
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.
- 108.↵
- 109.↵
- 110.↵
- 111.↵
- 112.↵
- 113.↵
- 114.
- 115.↵
- 116.↵
- 117.↵
- 118.↵
- 119.
- 120.↵
- 121.↵
- 122.↵
- 123.↵
- 124.
- 125.↵
- 126.↵
- 127.↵
- 128.↵
- 129.↵
- 130.↵
- 131.↵
- 132.↵
- 133.↵
- 134.↵
- 135.↵
- 136.↵
- 137.↵
- 138.↵
- 139.↵
- 140.↵
- 141.↵
- 142.↵
- 143.↵
- 144.↵
- 145.↵
- 146.↵
- 147.↵
- 148.↵
- 149.↵
- 150.↵
- 151.↵
- 152.↵
- 153.↵
- 154.↵
- 155.↵
- 156.↵