ABSTRACT
Suspended animation states such as hibernation or diapause allow organisms to survive extreme environments. But the mechanisms underlying the evolution of these extreme survival states are unknown. The African turquoise killifish has evolved diapause as a form of suspended development to survive the complete drought that occurs every year in its habitat. Here we show that many gene duplicates – paralogs – exhibit specialized expression in diapause versus normal development in the African turquoise killifish. Surprisingly, paralogs with specialized expression in diapause are evolutionarily very ancient, and they are also present even in vertebrates that do not exhibit diapause. Profiling the chromatin accessibility landscape among different fish species reveals an evolutionarily recent increase in chromatin accessibility at these very ancient paralogs, suggesting rewiring of their regulatory landscape. The increase in chromatin accessibility in the African turquoise killifish is linked to the presence of new binding sites for transcription factors (e.g., FOXO, REST, and PPAR), due to both de novo mutations and transposable element insertion. Interestingly, accessible chromatin regions in diapause are enriched for lipid metabolism genes. By performing lipidomics in different fish species, we uncover a specific lipid profile in African turquoise killifish embryos in diapause. Notably, select very long-chain fatty acids are high in diapause, suggesting they may be used for long-term survival in this state. Together, our multi-omic analysis indicates that diapause is driven by regulatory innovation of very ancient gene programs that are critical for survival. Our work also suggests a mechanism for how complex adaptations evolve in nature and offers strategies by which a suspended animation program could be reactivated in other species for long-term preservation.
Introduction
Extremophiles – species that live in extreme environments – have evolved unique adaptations for survival. Understanding how extreme adaptations evolve can reveal new pathways with important ramifications for survival in all organisms. The African turquoise killifish Nothobranchius furzeri is an extremophile for embryo survival. This vertebrate species lives in ephemeral ponds in Zimbabwe and Mozambique that completely dry up for ~8 months each year (1). To survive this annual drought, the African turquoise killifish has evolved two key adaptations: a rapid life cycle that allows successful reproduction during the rainy season, and a long form of suspended animation, with embryos entering diapause and subsisting in the mud during the dry season (2–5). Diapause embryos survive for months or even years – longer than adult life – without any detectable tradeoff for future life (6). Remarkably, diapause embryos already have complex organs and tissues, including a developing brain and heart (6). Hence, diapause provides a unique form of long-term protection to a complex organism.
Like other suspended animation states (hibernation, torpor), diapause is a multifaceted and active adaptation. Diapause also exists in other vertebrate species, including mammals (e.g., bear, roe deer, mice) (7). As diapause is extreme in the African turquoise killifish, this species provides a model to understand the mechanism and evolution of this suspended animation trait in vertebrates. Many genes involved in chromatin remodeling, metabolism and stress resistance are upregulated during diapause in killifish (6, 8, 9). Yet how such an extreme and coordinated program evolved in nature is unknown. Using the lens of evolution to understand diapause could uncover new protective mechanisms for long-term survival and offer a framework for the evolution of extreme adaptations in nature.
Paralogs that specialize for expression in diapause are evolutionarily very ancient
We asked when the genes expressed in diapause originated in evolutionary time. To this end, we focused on paralogs – duplicated copies of genes (10, 11). Gene duplication is the primary mechanism by which new genes originate, and the resulting paralogs can specialize for new functions or states (12, 13) (Fig. 1A). Paralogs also allow for a precise timing of the evolutionary origin of specific genes, and they could help explain how the killifish genome can support two seemingly antagonistic traits – rapid life vs. suspended animation. Using phylogenetic inference (see Methods), we find that the African turquoise killifish genome contains 20,091 paralog pairs. We used our previously generated RNA-seq datasets of development and diapause in the African turquoise killifish (6) to test whether the expression pattern of paralogs has diverged between the diapause and normal development states. Interestingly, many paralog pairs show opposing expression, with one gene in the paralog pair highly expressed in diapause (‘diapause-specialized gene’, e.g., the chromatin modifier EZH1) and the other gene in the paralog pair highly expressed in development (‘development-specialized gene’, e.g., the chromatin modifier EZH2) (Fig. 1B and fig. S1). Overall, 6,247 paralog pairs show expression specialization in diapause versus development (Fig. 1C and fig. S1, Data Files S1 and S2).
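The notion of opposing expression within a paralog pair can be illustrated with a minimal sketch. It assumes each gene has a single log2 fold change (diapause over development) and uses a hypothetical cutoff of 1.0; the actual criteria are those given in the Methods, and the numbers below are toy values, not measurements.

```python
from typing import Dict, Tuple

# Hypothetical cutoff on |log2 fold change|; an assumption for illustration.
LFC_CUTOFF = 1.0


def classify_pair(lfc_a: float, lfc_b: float, cutoff: float = LFC_CUTOFF) -> str:
    """Label a paralog pair from each gene's log2(diapause / development)."""
    if lfc_a >= cutoff and lfc_b <= -cutoff:
        return "A diapause / B development"
    if lfc_b >= cutoff and lfc_a <= -cutoff:
        return "B diapause / A development"
    # Both genes change in the same direction, or at least one barely changes.
    return "not specialized"


def classify_all(
    pairs: Dict[Tuple[str, str], Tuple[float, float]]
) -> Dict[Tuple[str, str], str]:
    """Apply classify_pair to every (gene A, gene B) entry."""
    return {names: classify_pair(a, b) for names, (a, b) in pairs.items()}


# Toy values for illustration only; these are not data from the study.
toy_pairs = {
    ("EZH1", "EZH2"): (2.0, -2.0),
    ("geneX1", "geneX2"): (0.2, 0.1),  # hypothetical gene names
}
labels = classify_all(toy_pairs)
```

Requiring both genes to pass the cutoff in opposite directions keeps pairs in which only one copy changes out of the specialized set; whether that matches the criteria used for Fig. 1C is an assumption of this sketch.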
We next asked whether paralogs that exhibit expression specialization in diapause are evolutionarily recent or ancient. Diapause in the African turquoise killifish is a relatively recent specialization that evolved less than 18 million years (MY) ago (14). To date the paralogs, we generated a paralog classification pipeline to identify the evolutionary time when each of the African turquoise killifish paralogs originated relative to other species (Fig. 1D and fig. S2) (15). We distinguished i) very ancient paralogs (shared with all vertebrates, including mammals) that originated more than 473 MY ago, ii) ancient paralogs (shared with all other fish) that originated between ~111 and 473 MY ago, and iii) recent/very recent paralogs (killifish/African turquoise killifish specific) that originated less than ~111 MY ago (Fig. 1D). Surprisingly, very ancient paralogs were significantly more likely to specialize for diapause compared to the genome-wide average, even though diapause originated recently (Fig. 1E). In contrast, ancient and especially recent/very recent (killifish-specific) paralogs were significantly less likely to specialize for diapause compared to the genome-wide average, even though the recent/very recent paralogs originated around the time when diapause evolved (Fig. 1E). Consistently, paralogs that originated from the very ancient vertebrate-specific whole genome duplication or from ancient small-scale duplications were more likely to specialize for diapause (fig. S3). The enrichment of very ancient paralog pairs for specialization in diapause was robust to varying outgroups, phylogeny, method to identify paralogs, and paralog family size (fig. S4, A to E). As a specificity control, such an enrichment was not observed for paralogs that are expressed at the same level during development and diapause (in fact, those exhibited a depletion for very ancient paralogs) (fig. S4F).
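The age classes and the comparison to the genome-wide average can be sketched as follows. This is an illustration under stated assumptions, not the pipeline itself: the boundary handling at exactly 111 and 473 MY is arbitrary, and the one-sided hypergeometric test stands in for whatever statistic the Methods specify.

```python
from math import comb

# Age boundaries quoted in the text (million years, MY).
VERY_ANCIENT_MY = 473.0
ANCIENT_MY = 111.0


def age_class(duplication_age_my: float) -> str:
    """Bin a paralog pair by the estimated age of its duplication."""
    if duplication_age_my > VERY_ANCIENT_MY:
        return "very ancient"
    if duplication_age_my >= ANCIENT_MY:
        return "ancient"
    return "recent/very recent"


def enrichment_p(total: int, in_class: int, specialized: int,
                 specialized_in_class: int) -> float:
    """One-sided hypergeometric p-value (assumed test, for illustration).

    Probability of seeing at least `specialized_in_class` pairs from one age
    class among `specialized` pairs drawn at random from `total` pairs, of
    which `in_class` belong to that age class.
    """
    upper = min(in_class, specialized)
    tail = sum(
        comb(in_class, k) * comb(total - in_class, specialized - k)
        for k in range(specialized_in_class, upper + 1)
    )
    return tail / comb(total, specialized)
```

For example, `enrichment_p(10, 5, 4, 4)` asks how often all 4 specialized pairs would fall in an age class holding 5 of 10 pairs by chance alone. A small value indicates enrichment relative to the background; a depletion would be tested with the opposite tail.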
The genes specialized for expression in diapause did not exhibit increased positive selection at the protein level, raising the possibility of regulatory rewiring (fig. S4G). Thus, very ancient paralogs are co-opted in diapause, suggesting that ancestral programs are harnessed for this suspended animation state – perhaps by remodeling of the regulatory landscape.
Very ancient paralogs also specialize in diapause in other killifish species with diapause
Many killifish species populate the world, and their ability to undergo diapause is linked to their environment. Killifish species that live in ephemeral ponds exhibit diapause (e.g., African turquoise killifish, South American killifish), whereas killifish species that live in constant water do not undergo diapause, and instead they continuously develop (e.g., red-striped killifish and lyretail killifish, both of which are from Africa) (16–19) (Fig. 2A). To assess whether the specialization of ancient paralogs in diapause is generalizable to other species that evolved diapause independently, we used available RNA-seq data from diapause and development in the South American killifish with diapause, Austrofundulus limnaeus (9). We also generated new RNA-seq data from the developing embryos of the red-striped killifish Aphyosemion striatum and the lyretail killifish Aphyosemion australe – the closest relatives of the African turquoise killifish N. furzeri but without diapause (Fig. 2A). We found that in the South American killifish, paralogs also showed specialized expression in diapause versus development (Fig. 2, B and C, and fig. S5), and that their specialized expression correlated with that of paralogs in the African turquoise killifish (Fig. 2D and fig. S6, A and B). In contrast, killifish species without diapause expressed both paralogs during development (fig. S6C). Importantly, paralogs with specific expression in diapause in the South American killifish were also enriched for very ancient gene duplicates (Fig. 2E). Collectively, these results indicate that very ancient paralog pairs have been repeatedly co-opted for specialized expression in diapause during evolution. Together, our observations also raise the possibility that the recent specialization for diapause is driven at least in part by regulatory innovation of very ancient paralog pairs.
Evolutionarily recent remodeling of the chromatin landscape at very ancient paralogs
To characterize the regulatory landscape of the paralogs that specialize in diapause during evolution, we profiled the chromatin accessibility landscape in different species of killifish. We performed ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing), which assesses chromatin accessibility genome-wide (20), on embryos during diapause and development in killifish species with diapause (African turquoise killifish, South American killifish) and embryos during development in killifish species without diapause (lyretail killifish and red-striped killifish) at a similar developmental stage (Fig. 3A). We also used available ATAC-seq data for medaka and zebrafish development (21). We verified the quality of our ATAC-seq samples by transcription start site enrichment of open chromatin and fragment size periodicity (see Methods, fig. S8). Chromatin states easily separated diapause and development embryos by Principal Component Analysis (PCA) in the African turquoise killifish (Fig. 3B). Chromatin states also separated diapause and developmental samples of different killifish species (Fig. 3B). In the African turquoise killifish, 6,490 genomic regions were differentially accessible in diapause compared to development genome-wide (fig. S7A, Data Files S1 and S3), and they were located mostly in promoter, intronic, or distal intergenic (e.g., enhancer) regions (fig. S7B). There was a positive correlation between chromatin accessibility and gene expression levels in diapause in the African turquoise killifish (fig. S7, C and D). Together, these results indicate that our datasets for the chromatin accessibility landscape in diapause and development in several killifish species are of good quality.
We next examined accessible chromatin regions (ATAC-seq peaks) at paralogs that are differentially expressed in diapause versus development (Fig. 3C) (e.g., DNAJA4 and DNAJA2, Fig. 3D and fig. S1B). As paralogs that specialize in diapause are very ancient (> 473 MY), we asked when chromatin accessibility at these genes arose in evolutionary time (Fig. 3C and fig. S9). To quantify chromatin accessibility over evolutionary time, we developed a pipeline to identify the relative evolutionary origin of each ATAC-seq peak based on multi-genome alignment (see Methods), and classified each ATAC-seq peak as i) ancient/very ancient (i.e., chromatin accessible in all fish species evaluated, such as CBX8), ii) recent (i.e., chromatin accessible only in killifish species, such as HNRNPA3), or iii) very recent (i.e., chromatin accessible only in the African turquoise killifish, such as OSBPL5) (Fig. 3E and fig. S9). Interestingly, most regulatory regions of very ancient paralogs (> 473 MY) that are differentially regulated in diapause gained chromatin accessibility very recently (~18 MY), only in the African turquoise killifish (Fig. 3C and fig. S10). The very recent chromatin accessibility at very ancient paralogs specialized in diapause was generalizable to non-paralog genes (fig. S10A) and was more pronounced at distal regulatory elements (likely enhancers) (fig. S10B). Thus, the African turquoise killifish exhibits an evolutionarily recent remodeling of the chromatin accessibility landscape at very ancient genes.
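The three peak classes described above can be expressed as a simple rule on the set of species in which the aligned region is accessible. The sketch below is a hedged illustration: the species labels are plain strings chosen here, the handling of partial patterns (returned as "unresolved") is an assumption, and the real pipeline additionally relies on multi-genome alignment as described in the Methods.

```python
from typing import Set

FOCAL = "African turquoise killifish"
KILLIFISH = {
    FOCAL,
    "South American killifish",
    "red-striped killifish",
    "lyretail killifish",
}
OTHER_FISH = {"medaka", "zebrafish"}
ALL_SPECIES = KILLIFISH | OTHER_FISH


def classify_peak(accessible_in: Set[str]) -> str:
    """Assign a relative evolutionary origin to a focal-species ATAC-seq peak.

    `accessible_in` is the set of species in which the aligned region is
    accessible (a hypothetical input format for this sketch).
    """
    if FOCAL not in accessible_in:
        raise ValueError("peak must be accessible in the focal species")
    if accessible_in == {FOCAL}:
        return "very recent"  # accessible only in the focal species
    if accessible_in <= KILLIFISH:
        return "recent"  # accessible only within killifish species
    if accessible_in >= ALL_SPECIES:
        return "ancient/very ancient"  # accessible in every species evaluated
    # Patterns such as focal species + zebrafish only: rule assumed, not sourced.
    return "unresolved"
```

Calling `classify_peak({FOCAL})` returns `"very recent"`, the class that the text reports for most regulatory regions of diapause-specialized very ancient paralogs.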
Mechanisms underlying the evolution of chromatin accessibility in diapause
What are the mechanisms connecting evolutionarily recent chromatin accessibility with diapause? Chromatin regions that opened recently in diapause paralogs in the African turquoise killifish were enriched for transcription factor motifs, including REST, NR2F2, Forkhead transcription factors (e.g., FOXA1, FOXO3), and PPAR (e.g., PPARA) (Fig. 4A). Most diapause-specific transcription factor binding motifs were enriched only in the African turquoise killifish, and not in closely related fish without diapause (Fig. 4B and fig. S11). Thus, these transcription factor motifs arose very recently in the African turquoise killifish after divergence from other killifish species without diapause and could underlie the evolutionarily recent opening of chromatin at very ancient paralogs.
New transcription factor binding motifs can arise de novo by point mutation or by transposable element (TE) insertion (22, 23) (Fig. 4C). A majority (81%) of the transcription factor binding motifs associated with diapause accessible chromatin at specialized paralogs in the African turquoise killifish evolved de novo, via mutation of the ancestral sequence (Fig. 4D). For example, transcription factor motifs (e.g., FOXO3 motifs) were canonical binding sites (as defined by HOMER (24)) in the African turquoise killifish sequence but were slightly divergent in closely related fish without diapause and even more divergent or even absent in more distant fish species (Fig. 4, E and F, and fig. S12). Importantly, we found a signature of positive selection (25) at many of the diapause-specific accessible chromatin regions in the African turquoise killifish, with enrichment for binding motifs for FOXO3, REST, and PPAR (Fig. 4G and fig. S13). Thus, the African turquoise killifish may have selected for canonical transcription factor binding motifs at regulatory regions of genes beneficial for diapause.
Intriguingly, some binding motifs associated with diapause in the African turquoise killifish paralogs overlapped with transposable elements and were unique to this species (Fig. 4D). While overlaps with transposable elements represent a minority of cases (5%) (Fig. 4D), transposable elements can deliver a transcription factor binding motif to new regulatory sites faster than gradual mutation and selection. As transposable elements have exploded in the African turquoise killifish genome (26), they may represent an evolutionary mechanism to co-opt genes into the diapause expression program. Several transposable element families (e.g., DNA transposons and LINEs) were highly enriched at accessible chromatin regions in diapause in the African turquoise killifish (Fig. 4H). In some cases, these regions contained both a transposable element and a transcription factor binding site (e.g., PPAR) (Fig. 4I and fig. S14). Hence, transcription factor binding motifs underlying diapause-specialized paralogs may have originated not only through mutation and selection but also via a recent burst, in the African turquoise killifish, of transposon-mediated reshuffling.
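The two routes to a new binding motif (mutation of an ancestral sequence versus delivery by a transposable element) can be separated with a small decision rule. The following sketch is illustrative only: the `MotifSite` fields are hypothetical summaries of motif scanning, TE annotation, and multi-genome alignment, and the precedence given to TE overlap is an assumption rather than the rule used for Fig. 4D.

```python
from dataclasses import dataclass


@dataclass
class MotifSite:
    """Hypothetical per-site summary; field names are assumptions."""
    canonical_in_focal: bool      # canonical motif present in the focal species
    canonical_in_relatives: bool  # canonical motif also present in related fish
    overlaps_te: bool             # site overlaps an annotated transposable element
    aligned_in_relatives: bool    # orthologous sequence exists in related fish


def motif_origin(site: MotifSite) -> str:
    """Guess how a focal-species motif arose, under the assumptions above."""
    if not site.canonical_in_focal:
        return "no motif"
    if site.canonical_in_relatives:
        return "shared with relatives"
    if site.overlaps_te:
        # Species-specific motif carried by a TE: treated as TE-derived.
        return "transposable element insertion"
    if site.aligned_in_relatives:
        # Ancestral sequence exists but lacks the canonical motif elsewhere.
        return "de novo mutation"
    return "unresolved"
```

A site with an aligned but divergent sequence in related species, as described for the FOXO3 motifs, would be labeled `"de novo mutation"` by this rule, whereas a species-specific motif inside an annotated element would be labeled `"transposable element insertion"`.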
Functional enrichment and lipidomics reveal specific lipids in the diapause state
We asked whether ancient paralogs that are recently repurposed for diapause in the African turquoise killifish are associated with a specific biological function. Analysis of gene expression and chromatin accessibility at diapause-specific paralogs showed enrichment of several functions related to lipid metabolism (e.g., lipid storage, very long chain fatty acid metabolism and regulation of fatty acid beta oxidation) (Fig. 5A, Data Files S5 and S6). Both gene expression and chromatin accessibility datasets showed enrichment of upstream regulators of lipid metabolism (e.g., FOXO1 (27)) or transcriptional sensors of fatty acids (e.g., PPAR) (28) (Fig. 5B, Data File S7). These observations raise the possibility that lipids play an important role in diapause.
While some lipids and metabolites have been examined in killifish embryos and adults (29–31), a systematic profiling of lipids in diapause vs. development in killifish species with and without diapause has not been done. We therefore performed untargeted lipidomics on African turquoise killifish embryos at pre-diapause and at two time points in diapause (6 days and 1 month). As a comparison, we also performed lipidomics on embryos of another killifish species that does not undergo diapause (red-striped killifish) (Fig. 5C). The lipidome separated diapause from development in the African turquoise killifish and from development in the red-striped killifish by PCA (Fig. 5D). Glycerophospholipids (e.g., phosphatidylcholines [PC]), which are membrane lipids, and triglycerides (TGs), which are storage lipids, were both changed in diapause in comparison to development stages (fig. S15A). The triglyceride changes in diapause are consistent with expression differences of genes, including specialized paralogs, encoding triglyceride metabolism enzymes and regulators (fig. S15B). Interestingly, we observed an enrichment of TGs containing very long-chain fatty acids (fatty acids with chain lengths of 22 carbons) in diapause compared to development, and the majority of these very long-chain fatty acids have 5 (docosapentaenoic acid, DPA) or 6 (docosahexaenoic acid, DHA) double bonds (Fig. 5, E and F). The same TGs with very long-chain fatty acids were also more abundant in African turquoise killifish embryos, even at pre-diapause, than in red-striped killifish at the equivalent state of development (Fig. 5G and fig. S15, C and D). As very long-chain fatty acids are processed by peroxisomes and subsequently by mitochondria to produce energy (32), they may serve as a long-term energy reserve for diapause.
Other lipids, such as many ether-linked glycerophospholipids (plasmalogens), which can protect the brain and heart from oxidative stress (33–35), are more abundant in diapause than in development and higher in the African turquoise killifish than in the red-striped killifish (fig. S15D, Data File S8). Collectively, these data suggest that the African turquoise killifish has evolved to pack specific lipids, including very long-chain fatty acids and membrane lipids with antioxidant properties, in its embryos. The rewiring of key transcription factor binding sites (e.g., FOXO1 or PPAR binding sites) at specialized paralogs (and other genes) involved in lipid metabolism could modulate lipid management for long-term protection and efficient storage and usage of specific fatty acids (Fig. 5H).
Discussion
Our study shows for the first time that although diapause evolved recently (less than 18 MY ago), the paralogs that specialized for diapause are ancestral and shared by most vertebrates (>473 MY old). This paralog specialization in the African turquoise killifish diapause is likely achieved by recent co-opting of conserved transcription factors (such as REST, FOXOs, and PPAR) and repurposing of their regulatory landscape through mutation and selection as well as transposable element insertion. Our multi-omics analysis of the diapause state (transcriptomics, chromatin states, lipidomics) and comparative analysis with several fish species suggest a model for diapause evolution via specialization of very ancient paralogs. After duplication in the ancestor of most vertebrates, these very ancient paralogs likely specialized in the transient response to harsh environments (e.g., transient lack of food, temperature change, or other changes), which ensured their long-term maintenance in the genome. When the ancestors of the African turquoise killifish transitioned to ephemeral ponds over the past 18 million years (14), these paralogs evolved new transcription factor binding motifs driving further specialization, notably for lipid metabolism genes, for survival under extreme conditions in diapause (Fig. 5H).
Elucidating the mechanisms underlying the origin of complex adaptations and phenotypes (e.g., ‘suspended animation’, novel cell types/tissues, etc.) is a central challenge of evolutionary biology (36). Gene duplication is the primary mechanism to generate new genes, and the resulting genes act as a substrate for the evolution of new functions. For example, ancient gene duplicates (paralogs) are specialized for expression in different tissues (37, 38). Ancient gene duplicates (paralogs) can also contribute to the evolution of new organs such as the electric organ (39) and the placenta (40). Gene duplicates have also been correlated with exceptional resistance to cancer in long-lived species (41–43). The specialization of paralogs might also explain how the African turquoise killifish genome can support two seemingly antagonistic complex traits – e.g., rapid life and long suspended animation. However, how the divergence of duplicated genes or paralogs contributes to the evolution of complex adaptations is still poorly understood. Emerging evidence, including our study, suggests that complex adaptations can arise by rewiring gene expression through unique regulatory elements (22, 23, 44, 45). Our results indicate that such a rewiring can be achieved using de novo regulatory elements and in some cases transposon insertion.
Cis-regulatory elements such as enhancers and promoters are known to evolve rapidly (46–49), and they can in turn facilitate complex adaptations with the same set of conserved genes. Transposon insertion can be even faster in promoting the rearrangement of regulatory regions (50–57). Rapid reshuffling of regulatory regions by mutation or transposon insertion provides a framework for the evolution of complex traits in nature. Such a mechanism could extend to the evolution of other complex traits, including regenerative capacity, which involves new enhancers in killifish (58), although other mechanisms may also contribute, including positive selection of specific genes (8, 9, 26, 59–61).
Our work also reveals specific genes and lipids that could be critical for long-term survival in suspended development. Lipids that accumulate in a state of ‘suspended animation’ (e.g., very long-chain fatty acids) could serve as key substrates for long-lasting survival (62, 63). Alternatively, they could also serve as new signals to affect specific aspects of the diapause state (in addition to known signals in other species, such as vitamin D or dafachronic acid in the South American killifish (64) and in C. elegans, respectively) (65, 66). The pathways and regulatory mechanisms we identified could also apply to other states of suspended animation and even to adult longevity. For example, transcription factors whose motifs are enriched in the diapause state (e.g., FOXOs, PPARs) are genetically required for suspended animation states, such as C. elegans dauer (67–69), and are expressed in mammalian hibernation (70). Furthermore, lipids and lipid metabolism genes are expressed differentially in mammalian diapause (71) as well as hibernation and torpor (63, 72, 73), and are under positive selection in exceptionally long-lived mammals (74). Finally, several of the transcription factors we identified (e.g., FOXOs, REST) are genetically implicated in longevity in C. elegans and flies (75–80). Our results place these previous genetic and expression findings in a natural context and reveal how selective pressure can co-opt key metabolic programs to achieve extreme phenotypes. These observations also raise the possibility that a core program of lipid metabolism genes, regulated by specific transcription factors, can be deployed to achieve metabolic remodeling and stress resistance in diverse contexts, including in adults. Our study provides a new multi-omic resource for understanding the regulation and evolution of suspended animation states (hibernation, torpor, diapause). 
It also opens the possibility for strategies, including lipid-based interventions, to promote long-term tissue preservation and counter age-related diseases.
SUPPLEMENTARY MATERIALS
Material and Methods
References 81–131
Figs. S1 to S17
Table S1
Data Files S1 to S8
ACKNOWLEDGMENTS
We particularly thank J. Pritchard for invaluable discussion on paralogs and evolutionary analyses and for feedback on the manuscript. We thank J. Podrabsky and J. Wagner for providing A. limnaeus embryos and helpful suggestions with husbandry. We thank M. Robinson-Rechavi and J. Liu for help with positive selection analysis in ATAC-seq peaks. We thank J. Miklas, X. Zhao, K. Papsdorf, T. Ruetz, J. Chen, H. Fraser, G. Bejerano, S. Goenka, H. Chen, and all Brunet lab members for helpful discussion or comments. We thank C. Bedbrook, E. Sun, F. Boos, T. Ruetz, and X. Zhao for independent code-checking. Funding: This work was supported by the Glenn Foundation for Medical Research (A.B.), a fellowship from Stanford Center for Computational, Evolutionary, and Human Genomics (P.P.S.), and a National Science Foundation graduate fellowship (G.A.R.). Author contributions: P.P.S. designed the project with help from G.A.R. and A.B. P.P.S. performed all the killifish experiments and computational analyses unless otherwise indicated. G.A.R. generated all ATAC-seq libraries and performed multi-genome integration of ATAC-seq and transposon analyses. K.C. and M.E. generated the lipidomics data and helped with the analysis under the supervision of M.P.S. C-K.H. provided African turquoise killifish RNA-seq datasets pre-publication. P.P.S. wrote the manuscript with help from G.A.R. and A.B. All the authors provided intellectual input and commented on the manuscript.