DNA Methylation Networks Underlying Mammalian Traits

Epigenetics has hitherto been studied and understood largely at the level of individual organisms. Here, we report a multi-faceted investigation of DNA methylation across 11,117 samples from 176 different species. We performed an unbiased clustering of individual cytosines into 55 modules and identified 31 modules related to primary traits including age, species lifespan, sex, adult species weight, tissue type and phylogenetic order. Analysis of the correlation between DNA methylation and species allowed us to construct phyloepigenetic trees for different tissues that parallel the phylogenetic tree. In addition, while some stable cytosines reflect phylogenetic signatures, others relate to age and lifespan, and in many cases responding to anti-aging interventions in mice such as caloric restriction and ablation of growth hormone receptors. Insights uncovered by this investigation have important implications for our understanding of the role of epigenetics in mammalian evolution, aging and lifespan.


Introduction
The oldest common ancestor of all mammals lived approximately 300 million years ago, and over time diversified into more than 6,400 species that constitute the monotreme, metatherian, and eutherian lineages 1 . These species exhibit a great diversity in traits such as maximum lifespan, adult body weight, brain/body ratio, locomotory modes, sleep/wake cycles, diet, and social behaviors. The lineage that eventually led to the emergence of the first modern humans, Homo erectus, about two million years ago, with the acquisition of a larger brain, higher cognitive abilities, and several unique physical and social characteristics 2 . Comparative epigenomics is an emerging field that aims to combine epigenetic signatures and phylogenetic relationships to improve our understanding of gene-to-trait function 3,4,5 . Even though genomic regulatory regions are under sequence constraints 6 , epigenome evolution in these conserved regions seems to correlate with transcription in mammals 3 , potentially regulates fitness and facilitates selection. A recent study even suggested that DNA methylation marks in regulatory sequences partially relates to the phylogenetic differences of animals 5 . Previous DNA methylation studies in mammals were limited with regard to the sample size (relatively few DNA samples and few species) and the measurement platform (low sequencing depth at highly conserved stretches of DNA). To address these challenges, we profiled 176 mammalian species across 63 tissue and organ types using a methylation array platform that provides over 1k fold sequencing depth at highly conserved cytosines. This unique dataset could robustly identify co-methylation networks that underlie important species characteristics and allowed us to develop tissue-specific phyloepigenetic trees that match the topology and branch lengths of the mammalian phylogeny.

DNA methylation networks relate to individual and species traits
We used a custom methylation array platform (HorvathMammalMethylChip40) that profiles 36 thousand CpGs with flanking DNA sequences that are mostly conserved across mammalian species 7 . With this mammalian array, we generated a large-scale DNA methylation dataset, which consists of methylation profiles from 11,117 samples derived from 63 tissue types of 176 different mammalian species (Table S1). Of these 36,000 cytosines, 14,705 mapped to most eutherians and a subset of 7,956 also mapped to marsupials. Although analysis of individual cytosines can be highly effective, it ignores co-methylation relationships and clustering patterns that are associated with species traits. To address this, we used weighted correlation network analysis (WGCNA) 7 to cluster CpGs with similar methylation dynamics into co-methylation modules and summarize their methylation profiles using "module eigengenes", defined as the first principal component that positively correlates with methylation changes in each module. WGCNA also identifies intramodular hub CpGs, which are cytosines with the highest correlation within each module. As such, eigengenes and intramodular hubs are mathematical representations of a module. The respective eigengenes of these modules were subsequently used to identify their potential correlations with various traits within and across mammalian species, such as chronological age, sex, maximum lifespan, age at sexual maturity, adult body weight and other characteristics for which the corresponding information are available.
In total, nine module networks were constructed, which differed with respect to the underlying CpGs, tissues, and how they relate to different species (Extended Data Fig.1). The eutherian network (Net1) was formed from 14,705 conserved CpGs in 167 eutherian species (Fig.  1a). This network showed the strongest association with various traits and will be the main focus of the following sections. The Mammalian network (Net2) consisted of eutherians plus nine marsupial species, reducing the number of cytosines to 7,925, which are shared between eutherians and marsupials (Fig. 1b). This subset network was used to identify marsupial-related modules. To identify the most conserved modules across species and tissue types, we developed seven consensus networks (see methods). These networks can be interpreted as a meta-analysis of methylation networks to identify consensus modules that are preserved across mammalian tissues (Extended Data Fig.1).
The eutherian network (Net1) clustered the 14,705 CpGs into 55 modules (Fig. 1a). These modules were derived from unsupervised clustering of cytosine methylation levels, and were labelled by colors per the convention of WGCNA (Fig. 1a). The smallest module (lavenderblush3 color) consisted of 33 CpGs, while the largest (turquoise module) had 1,864 CpGs. As information on phylogenetic order, tissues, lifespan, age, sex and weight of each data point were available, it allowed us to directly assess if any of the modules were enriched for these characteristics. Of the 55 modules, 31 were found to be related to specific species or biological characteristics (linear regression p<10 -200 ) ( Fig. 1b; Extended Data Fig.2; Table S3). Specifically, 16 modules were related to phylogenetic orders. Eleven other modules related to tissue type, two modules to sex, one module to age, two modules to maximum lifespan, and two modules to average adult species weight. Some modules were simultaneously related to several characteristics. For example, the two Rodentia modules (green, and yellowgreen) were also related to species lifespan and weight. One of the marsupial modules (royalblue; identified from Net2) was also related to eutherian skin. For ease of comprehension, modules were labelled with the trait and direction of relationship by superscript +/-signs. For example, the green module was designated as LifespanWeight (+) Rodentia (-) module The plus sign in the superscript indicates that the cytosines in the green module tend to be hypermethylated in long living and heavy species. The minus sign indicates that the green module CpGs tend to be hypomethylated in rodents. Collectively, the WGCNA results demonstrate that numerous species-specific primary traits are manifested at the level of DNA methylation; a feature that was discovered here through an unsupervised clustering of related cytosines.
Beyond association with phenotypic traits, we sought to uncover what the modules represent at the cellular and molecular levels. We identified genes that are adjacent to cytosines within the clusters and ascertained the molecular and cellular activities that are associated with them. In general, the 500 top hub CpGs of the modules were adjacent to genes associated with a wide range of biological processes including development, immune system, metabolism, reproduction, stem cell biology, stress response, aging, or several signaling pathways (Fig. 1c). Thus, these modules, which were derived independent of any prior biological information, are a rich source of information on the underlying biological processes that are associated with characteristics and phenotypic differences between species. Modules that relate to specific phylogenetic orders, or even species, are expected to be valuable starting points for experimental interrogation of mammalian evolution. The modules with no relationship to any analyzed traits are still valuable for future studies. For example, the yellow module with 895 CpGs is the largest consensus module that is preserved in all mammalian tissues but did not relate to any of the primary traits (Fig. 1a). Hub CpGs of this module are found to be adjacent to genes involved in mRNA processing, cell cycle, melanocyte development and circadian rhythm ( Table S3). The prediction of the involvement of cytosines in the yellow module to circadian rhythm was tested in using a perturbation experiment mice. After 12 months of light pollution during night time, only the yellow module experienced a significant increase in liver methylation levels (Extended Data Fig.2b).

Evolution and DNA methylation
The evolutionary divergence of three main mammalian clades (eutherians, marsupials, monotremes) occurred over approximately 250 million years (Fig. 2a). Many phylogenetic trees of mammals have been constructed; based initially on morphology and later on DNA or protein sequences. Prior to this study, it was unknown whether there are stable and consistent species differences in DNA methylation levels at conserved CpG sites that could allow the construction of what could be termed as a phyloepigenetic tree. The DNA methylation profiles at our disposal presented a unique opportunity to address this issue. To avoid confounding by different tissue types, we constructed tissue-specific phyloepigenetic trees. Strikingly, the resulting phyloepigenetic trees parallel the evolutionary distances among taxa established genetically for phylogenetic trees ( Fig. 2b; Extended Data Fig.3). This holds true for phyloepigenetic trees constructed from different tissues (blood, liver and skin), as well as those from hierarchical clustering of modules based on order and species (Fig. 2c, d). This result was expected as the mammalian methylation array is designed to measure methylation in highly conserved genomic sequences. The close relationship between phylogenetic and phyloepigenetic trees reflects an intertwined evolution of genome and epigenome that mediates the biological characteristics of the different mammalian species.

Consensus modules relate to tissue type, age, and sex
The mammalian methylation dataset is an atlas of 63 tissue-types, cell-types, and tissue regions from a wide age range of mammalian species. This allowed us to create a tissue module atlas and gain biological insights into DNA methylation-based mechanisms of tissue differences (Fig.4a). We related the module eigengenes to the following sample characteristics: tissue type, sex, and relative chronological age at the time of tissue collection. DNA methylation could distinguish the tissues with large sample sizes such as blood, skin, liver, muscle, cerebellum, cerebral cortex, and brain. The tissue modules were largely preserved in individual phylogenetic orders (Fig.4a, Table S3, Extended Data Fig.5c). For example, the top blood modules in different phylogenetic orders were Blood (+) (magenta) and Blood (-) (tan) modules. Both modules are enriched with immune system genes such as those for B cell maturation ( Table S3). The cerebellum modules (violet, darkslateblue) on the other hand, are enriched for genes related to neuron development, projection, and differentiation pathways. Thus, tissue-specific modules corroborate the biological differences between tissue types.
Unsurprisingly, these modules consist mainly of X-chromosome CpGs (Fig.4c). The only autosomal CpG is located on the POU3F2 exon in human chromosome 6. This gene showed sexrelated differential methylation in primates 8 , and is also indicated to contribute to maternal behavior differences in mammals 9 .
We correlated the module eigengenes with two different measures of chronological age: age and relative age, which is defined as the ratio between age of the organism and maximum lifespan of its species (the relative age of a 61-year-old human is 0.5, as recorded maximum human age is 122.5). The purple module (denoted subsequently as RelativeAge (+) module) exhibits a significant positive correlation (r = 0.37, p<10 -200 ) with the relative age of all mammalian samples (Fig. 4d). This module also relates to the chronological age of the animal at the time of sampling (Extended Data Fig.4f). The RelativeAge (+) module was corroborated in different consensus networks including 57 species-tissues, 35 species, and 27 species blood networks. Thus, the RelativeAge (+) module is among the most conserved modules in mammalian tissues. The RelativeAge (+) module contains 470 CpGs with hub genes such as TFAP2D, CTNNA2, POU3F2, TFAP2A, and UNC79. IPA implicates POU5F1, NANOG, SHH, KAT6A, and SOX2 proteins as putative upstream regulators. Functional enrichment of this module highlighted embryonic stem cell regulation, axonal fasciculation, angiogenesis, and diabetes-related pathways ( Table S3). The CpGs in this module were adjacent to Polycomb repressor complex 2 (PRC2, EED) targets and H3K27me3 regions (Extended Data Fig.7b). The top hits from EWAS of age in mammals 10 are also enriched in PRC2 targets and similar biological processes as the RelativeAge (+) module. Additionally, we carried out an overlap analysis between module genes and genes implicated by available large-scale genome-wide association studies (GWAS) of complex traits in LD Hub 11 , OpenGWAS database API, etc. We found that genes of the RelativeAge (+) module overlapped significantly with top hits of GIANT Body fat distribution GWAS in humans (p<10 -3 , TableS3, TableS7).

Life expectancy in pure dog breeds
As described above, five modules relate to maximum lifespan, adult weight, and age across most phylogenetic orders in our dataset, thereby allowing us to further investigate these associations in specific mammalian orders. Using DNA methylation data from n=574 blood samples from 51 different dog breeds (Table S8), we assessed whether these modules correlate with a measure of life-expectancy provided by the American Kennel Club 12 . Interestingly, the LifespanWeight (-) Rodentia (+) (greenyellow) module showed an inverse correlation with breed lifeexpectancy (r=-0.42, p<2x10 -16 ) but a positive correlation with breed weight (r=0.37, p<2x10 -16 , Fig. 5a). The correlation pattern of this module is consistent with the well-known observation that larger dog breeds have shorter life expectancies 13 . This finding is particularly interesting as this module also relates to the general mammalian trend where larger species have longer lifespans (Fig. 3a, 3b). The constituents and regulators of this module provide an entry point to understand the relationship between body size and lifespan. For example, as described above, LifespanWeight (-) Rodentia (+) (greenyellow) module is partially related to retinoic acid receptor signaling and adipogenesis. Interestingly, a recent study showed that sphingomyelin levels are higher in large-short lived dogs 14 . A substantially weaker correlation could be observed between the RelativeAge (+) (purple) module and dog breed lifespan (r=-0.16, p=7x10 -5 ).

Interventional studies in mice
Our robust identification of epigenetic modules associated with traits such as age supports the notion that DNA methylation can serve as a dynamic molecular read-out of variations in individual-level phenotypes. Thus, we ascertained whether maximum lifespan, weight, and age modules are affected by interventions that are known to modulate the lifespan of mice. Growth hormone receptor knockout (GHRKO) mice, which have greatly increased lifespan 15 , exhibited a change in specific modules: increase in LifespanWeight (+) Rodentia (-) (green, p<0.05), decrease in LifespanWeight (-) Rodentia (+) (greenyellow, p<0.01), decrease in Weight (+) (lavenderblush3, p<0.0001), and decrease in RelativeAge (+) (purple, p<0.001) eigengene values (Fig. 5b). The signs of these associations are consistent with the increased lifespan and smaller size of GHRKO mice 15 .
Caloric restriction (CR) and high fat diet are known to increase or reduce the lifespan of mice, respectively. Caloric restriction increased LifespanWeight (+) Rodentia (-) (green, p<10 -5 ), decreased Weight (+) (lavenderblush3, p<0.0001), and decreased RelativeAge (+) (purple, p<10 -5 ) eigengenes in the expected direction (Fig. 5c, d). In contrast, CR moderately increased the LifespanWeight (-) Rodentia (+) (greenyellow, p<0.05) eigengene in the opposite expected direction for longevity. High-fat diet was associated with an expected increase in the RelativeAge (+) (purple, p<0.05) module but an unexpected increase in Lifespan (+) (paleturquoise). Collectively, these studies demonstrate that some of the modules, especially those associated with lifespan, relative age, and weight across mammalian species, are indeed dynamic and can be reporters of antiaging or pro-aging interventions within a species.
Additionally, we screened for any modules that showed a dynamic response to lifespan interventions. We identified a total of six modules with such a change. Future experiments would elucidate if these modules could serve as novel biomarkers of longevity and mortality risks in mammals.

Discussion
This unbiased network analysis of the largest collection of DNA methylation data from 176 mammalian species facilitated the identification of methylation modules based on unsupervised clustering of highly-conserved CpGs. Thirty-one of the 55 modules could be interpreted by their associations with individual traits (chronological age, tissue, sex) or species traits (phylogenetic order, maximum lifespan, average adult species weight). The module-based analysis demonstrates that DNA methylation is a highly informative molecular read-out, not only at the level of individual tissues and within an organism, but also across species. This is evident from the high degree of congruence between the phyloepigenetic and phylogenetic trees. It indicates that species-level conservation and divergence of DNA methylation profiles closely parallels that of genetics through evolution. Moreover, several methylation modules show strong association with life history traits (e.g. maximum lifespan, average adult species weight) across 176 mammalian species, suggesting graded methylation changes on conserved DNA elements are robust features underlying the evolution of these quantitative traits in mammals. Overall, these results indicate that cytosine methylation data are highly informative for understanding the molecular basis of mammalian diversity. Although higher levels of DNA methylation are often associated with transcriptional silencing, a positive relationship between DNA methylation and expression levels has been observed especially in case of bivalent chromatin 16 .
The disparities in aging rate and species lifespan are among the most intriguing biological mysteries that continue to engender debate over the relative importance of different aging theories, i.e. mutation accumulation 17 , antagonistic pleiotropy 18,19 , and disposable soma 20 , which are predicated on a trade-off between reproductive fitness and longevity. Factors such as food availability, population density, and reproductive cost have been shown to impact this trade-off 21 . This study supports the view that cytosine methylation relates to biological processes that underpin evolutionary differences in mammals [22][23][24][25] .
While the identified modules lend themselves for studying many species traits, we were particularly interested in characterizing life history traits surrounding aging and development. In this context, we mention the RelativeAge (+) (purple) module, which demonstrates conservation of DNA methylation aging biology across species and tissues. A decrease in the RelativeAge (+) (purple) eigengene value, which is a collective decrease in the methylation level of CpGs in this module, related to an increase in life expectancy of dog breeds, GHRKO dwarf mice and caloricrestricted mice. In contrast, the high-fat diet increased the RelativeAge (+) eigengene. Strikingly, the genes adjacent to the CpGs of this module are highly enriched for embryonic stem cell regulation, include targets of Polycomb repressor complex 2 (PRC2, EED) and H3K27me3 regions (Extended Data Fig.7b). These genomic features are strongly implicated in stem cell biology through key regulators such as SOX2 and NANOG transcriptional factors 26,27 . This is corroborated by IPA analysis, which also highlighted NANOG and SOX2 as regulators of the RelativeAge (+) (purple) module. Our findings extend similar findings from humans to other mammals [28][29][30] , and demonstrates the intricate connection between stem cells, development and aging. Experimental perturbation of candidate regions in this module could elucidate if there is a causal link between module CpGs and lifespan of mammalian species. Future studies could evaluate whether module eigengenes associated with the RelativeAge (+) (purple) and five other identified modules (Fig. 5e) lend themselves as indicators of biological age.
In summary, application of unsupervised machine learning through WGCNA-mediated clustering of CpG methylation has unveiled the stability of DNA methylation profiles at the species level while also identifying the responsiveness and dynamic nature of some methylation modules to anti-aging or pro-aging interventions in mice. The unsupervised approach provides a level of objectivity that gives a stamp of authenticity to the observation that DNA methylation is indeed a biologically meaningful molecular readout across all levels of mammalian life, from cells to species. The information-rich modules we have identified here form a roadmap to expand our approach towards cross-species DNA methylation analysis, and will undoubtedly open a new avenue to address many long-standing fundamental questions of biology, from evolution and lifespan to aging.

Acknowledgment
This work was mainly supported by the Paul G. Allen Frontiers Group (SH). Nicolas Cermakian, Steve Brown and Stuart Peirson were involved in the mouse study of light pollution.

Conflict of interest
SH is a founder of the non-profit Epigenetic Clock Development Foundation which plans to license several patents from his employer UC Regents. These patents list SH and JE as inventors. The other authors declare no conflicts of interest.

Data Availability
The data will be made publicly available as part of the data release from the Mammalian Methylation Consortium. Genome annotations of these CpGs can be found on Github https://github.com/shorvath/MammalianMethylationConsortium

Data description
The data included 11,117 samples from 63 tissues of 176 mammalian species (167 eutherians, 9 marsupials). These samples were collected from different age ranges of most of the species. Sample collection and ethical approval for each mammalian species is described in separate individual papers 8,31-47 . The species level characteristics such as maximum lifespan, average weight, and age at sexual maturity were chosen from anAge database 48 . This data also includes DNA methylation from different dog breeds and mouse data from experimental lifespan intervention. Additional data sets included 57 horse transcriptome data generated from 29 different horse tissues 46 . All DNA samples were analyzed by the novel custom-designed mammalian methylation array 49 . This new array contains 38,608 probes that also includes 1,116 control probes. Following data collection, the SeSaMe normalization method was used to define beta values for each probe 50 .

Mappable CpGs for eutherians and mammals
The conserved probes were selected based on alignment of probes to 11 mammalian species from different phylogenetic orders 49 ASM229v1.100). These species are selected based on a large sample size in our data, a relatively high genome quality, and also representation from different phylogenetic orders.
Two sets of probes were further selected from alignments in all these genomes. The first set was the probes that mapped to these ten eutherian species, and the second set were a subset that could also map to opossum as a marsupial representative. These two sets were additionally filtered using calibration data generated from the array's performance on human, mouse, and rat synthetic DNA at different methylation levels (from 0-100% methylated) 49 . Only the probes with a linear correlation of >=0.8 in all three species calibration data were kept as a mappable probe in mammalian species. The final number of remaining probes for the analysis were 14,705 in eutherians, and 7,956 that also mapped to marsupials.

Unsupervised WGCNA
First, we formed two WGCNA networks based on the two sets of probes in our data. The first network was generated from 14,705 conserved CpGs in 10,939 samples of 167 eutherian species. The second network was a subset of 7,956 probes in 11,117 samples from 167 eutherians and 9 marsupial species. Traditionally, WGCNA is applied for transcriptome data and uses an unsupervised clustering method to assign the co-expressed genes into modules 7 . In this study, we used the WGCNA method to define the modules of co-methylation CpGs in mammalian samples. First, the adjacency matrix (correlations between CpGs) was converted into a scale-free network using the soft threshold power (tuned value = 12) of the signed matrix. The result was converted into a topological overlap matrix (TOM), and 1-TOM distance measure (dissimilarity), which was used for Hierarchical clustering of the data. The trees were trimmed using a dynamic tree-cut algorithm to assign the modules containing at least 30 CpGs. Module eigengenes (MEs) were calculated as the maximum amount of the variance of the model that can be represented by a single variable for each module, based on the singular value decomposition method. The eigengenes in the eutherian network (Net 1) explained a range of 24-63% (average = 43%) of the variance in the methylation data in each module ( Table S3). The hub CpGs of the modules were defined based on eigengene connectivity (kME) to each module. The association of module eigengenes were examined for different traits using multivariate linear regression models. The module colors in both networks were matched using the matchLabels() function in WGCNA package. Module preservation for each network was estimated using the "modulePreservation" R function in the WGCNA R package using primates as the reference for comparison.

Consensus Networks
A total of seven consensus co-methylation networks were developed to effectively remove the confounding effects by conditioning on different species and tissue type combinations. The constructed consensus networks are as follows: cNet3, 57 species-tissue strata (network where tissue/species effects were removed); cNet4, 35 species but ignore tissue strata (only species effects were removed); cNet5, 15 tissue types but ignore species (only tissue effects were removed) ; cNet6, 27 species blood (species effects were removed in blood); cNet7, 7 species brain (species effects were removed in brain); cNet8, 10 species liver (species effects were removed in liver); and cNet9, 30 species skin data (species effects were removed in skin). Blood, skin, liver, and brain were tissues with data from >7 mammalian species.
Consensus WGCNA assumes that the DNA methylation network is conserved between multiple data strata. This network was generated following methods previously described 7,51 . Briefly, the adjacency matrices (correlation) were constructed using DNA methylation beta values in each data set. The matrices were converted into scale free networks using a tuned soft threshold power for each dataset. Results were converted into TOM, merged, and then used to form a consensus tree network using a hierarchical clustering of dissimilarity matrix (1-TOM). Similarly, the colors were matched to network 1 colors.

Hierarchical clustering
DNA methylation data from tissues with more than 50 species were used for hierarchical clustering and comparison with the evolutionary tree. The hierarchical clustering of tissue samples (as opposed to CpGs) was based on complete linkage coupled with a dissimilarity measure defined as 1-correlation. The distances in the hierarchical trees (i.e. the height values) were directly compared with the evolutionary distances (based on estimated time) in a publicly available evolutionary tree 52 .

Gene ontology enrichment
The genomic region level enrichment was performed using GREAT analysis 53 and the mappable probes as the background. The analysis used human hg19 annotations, a 50kb window for extending the gene regulatory domain, and default settings for the other options. For each module, the input included up to 500 hub CpGs. The biological processes were reduced to parent ontology terms using the "rrvgo" package. The larger ontology category was defined manually. The intra-module hub CpGs were also statistically tested for overlap with human GWAS results, in which gene p values were calculated by MAGENTA algorithm. Additionally, the hub genes were analyzed by ingenuity pathway analysis to identify the enriched canonical pathways and potential upstream regulators. Fig.1 | DNA methylation network relates to species and individual characteristics in mammalian species. a, the WGCNA network of 14,705 conserved CpGs in 167 eutherian species (Network 1). The data include around 63 tissue types, from all age ranges of most of the species. The identified modules related to species characteristics (e.g. phylogenetic order, maximum lifespan), or individual sample characteristics (e.g. tissue type, age, sex). Network 1 modules were compared to eight additional networks: based on subsets of probes that map to eutherians and marsupials (Extended Data Fig. 1); and seven consensus networks based on species and tissues (Extended Data Fig. 1). The modules with strong associations with species and sample characteristics were labeled below the dendrogram. b, summary of the modules that showed strong association with species and individual sample characteristics. Analyzed traits: age, sex, tissue type, species max age, species average adult weight, and different mammalian orders. The edge labels are the direction of association with each trait. c, Top defined functional biological processes related to network 1 modules. The gene level enrichment was done using GREAT analysis and human Hg19 background limited to 14,705 eutherian probes. The biological processes were reduced to parent ontology terms (Extended Data Fig. 6), and the larger ontology category was defined manually for this summary heatmap. Detailed enrichment results are reported in the supplementary excel file. Images of animals are from Phylopic ("http://phylopic.org") or Wikimedia, which are under public domains or CC BY 3.0 license ("https://creativecommons.org/licenses/by/3.0/").  Fig.3a,b). Distances: 1-cor. c, Heatmap of phylogenetic order specific modules. * indicated the top two modules related to each phylogenetic order with minimum absolute correlation of 0.5. The marsupial modules were identified in Network 2, which was based on the conserved CpGs in both eutherians and marsupials. d, DNA methylation modules associated with individual species. The top two modules for each species are labeled by *. The marsupial modules are identified in network 2. The color code of the rows shows the phylogenetic order of each species. The rows are clustered based on hierarchical clustering of Euclidean distances and complete method. The species with weak module association were not shown in the heatmap. e, The correlation distances of hierarchical clustering of blood DNA methylation and evolutionary tree. Additional analyses are reported in the supplement (Extended Data Fig. 4). Images of animals are from Phylopic ("http://phylopic.org"), which are under public domains or CC BY 3.0 license ("https://creativecommons.org/licenses/by/3.0/").  The tissues with no modules were excluded from the heatmap. * indicated the top two modules related to each tissue, cell type, or tissue region with minimum absolute correlation of 0.5. b, Sex-specific modules in different phylogenetic orders. c, Distribution of sex module CpGs on sex chromosomes. d, The top age-related module in all mammalian species (p<2.2e-16). The association with relative age (chronological age/maximum reported age) was examined after adjustment for tissue, sex and species differences. Eigengene: 1 st principal component of the module that positively correlates with DNA methylation levels. conserved CpGs in 167 eutherian species, showed the strongest module-trait associations. An additional unsupervised network was developed based on subsets of probes for future study applications: Network 2, 7,925 probes that map to both eutherians and marsupials. Additionally, seven consensus networks of 14,705 eutherian probes by different tissue and species combinations were formed to identify the most conserved modules for studying individual sample characteristics. All module colors were matched to network 1 for comparison. Fig.2 | DNA methylation Module-trait association in mammals. a, Association of different individual and species-level traits with network 1 modules. Network 1 consists of 14,705 conserved CpGs in 167 eutherian species. The data include around 62 tissue types, from all age ranges of most of the species. b, Long term exposure to light pollution increases the yellow module eigengene. Control group, N = 8, standard light/dark cycles of 12 hours light (100 lux) followed by 12 hours dark. Circadian rhythm disrupted group, N = 8, the light cycle included12 hours light (100), followed by 12 hours dim light (20 lux). Cohorts were exposed to these conditions from three months of age, for a period of 12 months. c Extended Data Fig.3 | DNA methylation is a good indicator of mammalian evolution. a, Phyloepigenetic trees in blood, liver and skin of mammalian species. Distances: 1-cor. b, The distances of phyloepigenetic and evolutionary trees are highly associated. The size of the dots indicates the number of overlapping points in the plot. c, Sensitivity analysis of phyloepigeneticphylogenetic trees relationship based on probe sequence conservation. 100 random probes were selected based on reduction in conservation and alignment to mammalian genomes. The phyloepigenetic distances (1-cor) were calculated from hierarchical clustering of blood DNA methylation data. The phylogenetic distances are based on the TimeTree database. 100 probes map to number of species correlation phyloepigenetic−phylogenetic trees Effects of probe conservation on phyloepigenetic tree Extended Data Fig.4 | DNA methylation modules association with age, max age and weight in mammals. The association of modules eigengenes with relative age (a), log species maximum age (b), log species average adult weight (c), chronological age (d), and log chronological age (e). Red line: p =0. Covariates: tissue, sex and species differences for relative age, chronological age and log chronological age; relative age, tissue, and sex for log(max age) and log(average weight). f, Purple module also relates to log chronological age in mammals. Module association with log maximum age (g) and log species adult weight (h) within each phylogenetic order. * indicates the top 3 modules for max age, and weight modules for each order. i, Phylogenetic regression analysis of the modules with mammalian maximum lifespan. The evolutionary tree for phylogenetic regression was downloaded from the TimeTree database. Only the top two modules are presented in the scatter plot. Covariates: relative age, tissue, and sex. Fig.5 | DNA methylation modules relate to mRNA changes and are preserved in different phylogenetic orders. a,b, DNA methylation modules and mRNA networks are canonically correlated in horse tissues. The mRNA and DNA methylation data originated from 57 samples from 29 different horse tissues. Canonical correlation is done for the 10 hub CpGs of each module and mRNA changes of their neighboring genes in the horse genome (EquCab3.0.100). The red lines indicate p<0.05. c, The DNA methylation modules are highly preserved in different phylogenetic orders. Module preservation is estimated with permutation of networks in each order and comparison with primates. The figures show the summary z score of preservation, summary p value of preservation, and median rank of preservation for each module. Fig.6 | Gene ontology enrichment of the hub CpGs in each module. The gene level enrichment was done using GREAT analysis and human Hg19 background. The background probes were limited to 14,705 conserved probes in eutherians. The biological processes were reduced to parent ontology terms, and the top 10 parent terms with p<10 -3 for each module were reported in the heatmap. The larger ontology category was defined manually and was used to arrange the ontology terms in the heatmap. * indicates the top ontology category for each module. The input includes the gene neighbors to up to 500 hub CpGs for each module. Fig.7 | Enrichment analysis of modules associated with phylogenetic orders, age, max age, and average weight. The gene level enrichment was done using GREAT analysis and human Hg19 background. The background probes were limited to 14,705 conserved probes in eutherians. The input includes the gene neighbors to up to 500 hub CpGs for each module. a, Enrichment analysis of phylogenetic order modules. The plot shows the top unique enriched terms among the modules. b, Enrichment analysis of modules associated with max age and species weight. In addition to the unique enriched terms for each module, additional terms were mined based on specific keywords: 'mortality', 'aging', 'survival', 'perinatal lethality', and 'weight'. The plot shows the top unique and top 20 overlapped enriched terms among the modules.