Genome-wide association studies identify 137 loci for DNA methylation biomarkers of ageing

Biological ageing estimators derived from DNA methylation (DNAm) data are heritable and correlate with morbidity and mortality. Leveraging DNAm and SNP data from >41,000 individuals, we identify 137 genome-wide significant loci (113 novel) from meta-analyses of four epigenetic clocks and epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels, respectively. We report strong genetic correlations with longevity and lifestyle factors such as smoking, education, and obesity. Significant associations are observed in polygenic risk score analysis and to a lesser extent in Mendelian randomization analyses. This study illuminates the genetic architecture underlying epigenetic ageing and its shared genetic contributions with lifestyle factors and longevity.


Introduction
Ageing is associated with an increased risk of a multitude of physical, cognitive and degenerative disorders [1]. While the rate of chronological ageing is constant between individuals, there are inter-individual differences in the risk of age-associated morbidities.
Biological ageing is influenced by both environmental and genetic factors [2]. Multiple measures of biological age exist, several of which have drawn information from DNA methylation (DNAm) across the genome. DNAm is a common epigenetic modification typically characterised by the addition of a methyl group to a cytosine-guanine dinucleotide (CpG). DNAm levels can be influenced by both genetic and environmental factors, and in recent years, DNAm signatures have become established correlates of multiple health-related outcomes [3,4,5]. Such signatures include "epigenetic clocks", accurate markers of ageing which associate with several health outcomes [6,7]. Epigenetic clocks use weighted linear combinations of CpGs to predict an individual's chronological age and have common SNPbased heritability estimates ranging from 0.15 to 0.19 [8,9]. Individuals with epigenetic . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint clock estimates greater than their chronological age display "age acceleration", and have been shown to be at a greater risk of all-cause mortality and multiple adverse health outcomes [10].
The first generation of epigenetic ageing clocks used penalized regression models to predict chronological age on the basis of DNA methylation data, e.g. the widely used clocks from Hannum (2013) and Horvath (2013) apply to blood and 51 human tissues/cell types, respectively [11,12,13]. A derivative of the Horvath clock, intrinsic epigenetic age acceleration (IEAA) has since been developed, conditioning out (i.e. removing) estimates of blood cell composition. An increasing literature supports the view that IEAA relates to properties of hematopoietic stem cells [8,2,14]. The second generation of epigenetic clocks move beyond estimating chronological age by incorporating information on morbidity and mortality risk (e.g., smoking, plasma protein levels, white blood cell counts), and chronological age. Two such predictors, termed PhenoAge (a DNAm predictor trained on a measure that itself was trained on mortality, using 42 clinical measures and age as input features) and GrimAge (trained on mortality, including a DNAm measure of smoking as a constituent part), outperform both Hannum and Horvath clocks in predicting mortality, and are associated with various measures of morbidity and lifestyle factors [15,16]. DNAm GrimAge outperforms PhenoAge and the first generation of epigenetic clocks when it comes to predicting time to death [8,17,18].
While nothing is known about the genetics of the second generation of epigenetic clocks, 13 genetic loci have been associated with the first generation of epigenetic clocks. A study of nearly ten thousand individuals revealed a regulatory relationship between human telomerase (hTERT) and epigenetic age acceleration [8]. More recently, a larger GWAS (n=13,493) revealed that metabolic and immune pathways share genetic underpinnings with epigenetic clocks [9].
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint Here, we greatly expand on these studies across several dimensions. First, we analyse a large, multi-ethnic dataset comprised of over 41k individuals from 29 European ancestry studies, seven African American studies, and one Hispanic ancestry study. Second, we characterize for the first time the genetic architecture of the second generation epigenetic clocks, GrimAge and PhenoAge. Third, we also conduct GWAS of two important DNAm based surrogate markers: DNAm plasminogen activator inhibitor-1 (PAI1) levels and granulocyte proportion, respectively. DNAmPAI1 was chosen because it exhibited stronger associations with cardiometabolic disease than the epigenetic clocks [15]. The DNAm based estimate of granulocyte proportions was chosen because a) it exhibited significant associations with several epigenetic clocks (including GrimAge and PhenoAge) and with health outcomes such as Parkinson's disease [15,16,19]. The unprecedented sample size of the current study allowed us to develop polygenic risk scores for these six epigenetic biomarkers.
We report 137 independent loci, including 113 novel loci (i.e. not previously identified in previous GWAS meta-analyses of epigenetic age estimators [8,9]), and examine the genetic and causal relationships between epigenetic ageing, lifestyle behaviours, health outcomes, and longevity.

Results
To identify genetic variants associated with six methylation-based biomarkers, genome wide association studies (GWAS) of 34,962 European ancestry and 6,482 African American individuals were performed (Appendix 1 and Tables S1-S2). A fixed-effects meta-analysis was performed to combine the summary statistics within each ancestry group. Genomic inflation factors ranged between 1.01 and 1.06 (Figure S1-S6; https://datashare.is.ed.ac.uk/handle/10283/3645) for the European-only meta-analyses and . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint from 1.11 and 1.21 for the meta-analyses comprising African American participants (Figures S7-S12; Table 1). Inflation was present and consistent across all allele frequencies in the African American analyses; there was much greater variability in the effect sizes in the African American analyses (Appendix 2). We examined the relationship between means and standard deviations of predicted age versus means and standard deviations of chronological age for each cohort, separated by ancestry group, observing weaker mean correlations in the African American cohorts. There was little difference in the relationship between the standard deviations of age acceleration and chronological age by ancestry group ( Figure   S13). Heterogeneity between studies may decrease power to detect genetic associations. We found little evidence of systematic between-study heterogeneity in both the European ancestry and African American meta-analyses, as determined by M-statistic outlier analysis, meta-regressions against cohort characteristics, and analysis of heterogeneity I 2 statistics [21] (Appendix 2).
The key findings of the post GWA analyses are summarised in Table 1.

European ancestry GWAS meta-analysis: 56 independently associated loci
We identified 56 conditionally independent associations (P <5x10 -8 ) across the six epigenetic biomarkers in European ancestry populations using a stepwise model (Figures S14-S19; Table 1; Table S3; [22,23]. We replicated 10/10 loci associated with IEAA and 1/1 locus associated with a cell-adjusted Hannum based measure of epigenetic age acceleration identified in an earlier GWAS (P < 0.05/11 = 0.0045) [9]. All but three loci (associated with IEAA) were replicated at the genome-wide significant level. To validate the associations with DNAm derived granulocyte counts, we compared our results to a previous GWAS of FACS granulocyte counts which identified 155 independent loci [20]. In the current meta-analysis, . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint we replicated 13/129 present loci (P < 0.05/129 = 3.88x10 -4 ; Table S4; Figures S20-S21), two of which replicated at the genome-wide significant level. Effect sizes at the 129 loci were strongly correlated between studies (r=0.85).
To examine whether genetic variation across the six epigenetic biomarkers was shared, we performed genetic colocalization analyses of the 56 loci [24]. There was evidence for colocalization at 29 loci between epigenetic biomarkers (posterior probability > 0.8; Table   S5). IEAA was associated with the greatest number of independent associations (n=24), whereas granulocyte proportion was associated with the fewest (n=2). The most significant independent association was observed between IEAA and rs10949481 (β T_allele = -0.20; P =4.5x10 -54 ), mapping to the 3'UTR of NHLRC1 on chromosome 6p22.3, within one of the top loci identified in previous meta-analyses of IEAA [8,9].

Trans-ethnic meta-analyses identify 69 loci
To determine if any loci were shared across the European and African American populations, a trans-ethnic meta-analysis was carried out for each of the six epigenetic biomarkers using MRMEGA [25]. There were between 6 and 23 risk loci common to all ancestries (69 loci in . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint total). Fifty-three of these loci contained independent lead SNPs from the ancestry-specific meta-analyses (38 and 15 lead SNPs from the European ancestry and African American meta-analyses were contained within these loci, respectively); 21 of the trans-ethnic loci did not contain a SNP identified in the ancestry-specific meta-analysis models (Table S6). We found 21 ancestry-specific lead SNPs (i.e. those identified in either European ancestry or African American analyses and not within the trans-ethnic loci; Table S7). Allele frequencies for 11/21 ancestry-specific lead SNP allele frequencies differed by >10%.
We compared effect sizes of the lead SNPs from these loci in a Hispanic-American ancestry subset of the MESA cohort (n=287). Correlations between the respective effect sizes ranged from 0.16 (granulocyte proportions) to 0.92 (Hannum age acceleration; Table S8 and Figure   S28).
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 1 1

Independently-associated loci are associated with DNA methylation levels
To explore whether any of the 56 loci from the European GWAS shared genetic variation influencing epigenetic clock DNAm sites, colocalization analyses using GoDMC summary statistics were conducted (Methods). We found strong evidence (posterior probability > 0.8) that 1/4 loci (25%) for GrimAge acceleration, 3/12 loci (25%) for PhenoAge acceleration, 11/24 loci (46%) for IEAA and 5/9 loci (56%) for Hannum age acceleration had shared genetic variation influencing epigenetic clock DNAm sites (Tables S10-S11 Utilising results from a published GWAS of IEAA in brain tissue [29], we tested whether genetic variation influencing IEAA in blood and brain was shared for the 24 blood-related IEAA loci (cross-tissue plot for lead SNPs shown in Figure S29). Colocalization analysis showed that there was no strong evidence (PP > 0.80) for a single SNP being associated with both traits. However, the true extent of sharing is difficult to estimate because the sample size of the brain study (n=1796) is much smaller than our blood-based study, limiting power to detect shared loci. Previous simulations using a sample size of 2000 individuals have indicated that the shared variant must explain close to 2% of the variance of a biomarker to attain a posterior probability >0.8 for shared genetic effects [24]. Nevertheless, we observed suggestive evidence for colocalization (PP=0.53; LocusZoom plot in Figure S30) for a locus . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint mapping to DSCR6 on chromosome 21. This locus also overlaps with a mQTL for an IEAA clock CpG, cg13450409 (PP = 0.99; Table S11).

Functional enrichment analysis
To gain an understanding of the functional and regulatory properties of the variants that underlie the six epigenetic biomarkers, we performed functional enrichment analyses across various gene annotations and regulatory and cell-type specific elements on the summary statistics for each of the European ancestry GWAS results (see Methods, Table S12) [30]. At an epigenetic biomarker-specific adjusted p-value calculated from the effective number of annotations, significant enrichments were present for IEAA (n=191) and Hannum age acceleration (n=3). Associations with IEAA were enriched in DHS hotspots in several tissues (which might reflect that Horvath's pan tissue clock applies to all tissues) but the strongest enrichment of associatios with IEAA could be observed for mobilized CD34 primary cells (OR=6.06, P=6.1x10 -12 ), which supports the view that IEAA reflects properties of hematopoietic stem cells.

Pathway enrichment analysis
In the African American analysis, genes associated with granulocyte proportions were enriched amongst 35 Gene Ontology (GO) terms (Bonferroni P<0.05), the majority of which were immune-related (e.g. adaptive immune response, lymphocyte activation, regulation of immune system process or skin development; Table S13). By contrast, there was no significant enrichment amongst KEGG pathways or GO terms for the significantly-associated genes in the European ancestry analysis.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint

Overlap with Mendelian disease genes
In order to characterise the potential overlap of Mendelian disease genes and associated pathways with our findings, a series of enrichment analyses were conducted (Methods) [31].
In the European ancestry analysis, enrichment of Mendelian disease genes was observed for IEAA and two gene-sets ('disorders of platelet function' and 'vascular skin abnormality': bootstrapped P=0.027 and 0.049, respectively). Enrichment analysis of PhenoAge loci revealed overrepresentation of methylation-related Mendelian disease genes (Table S14; bootstrapped p-value = 0.04) with MTR (methyltetrahydrofolate-homocysteine Smethyltransferase) present in addition to TPMT. No significant over-representations of Mendelian disease gene sets were observed for any of the genes identified in the African American analysis.

SNP-and gene-based enrichment within published GWAS
To determine whether any of the 56 lead SNPs in the European ancestry meta-analyses for the six epigenetic biomarkers showed evidence for pleiotropic associations, a look-up of published GWAS significant associations (P <5x10 -8 ) was carried out ( Table S15). Four of the SNPs (rs2736100 in TERT, rs2275558 in PBX1, rs144317085 in TET2 and rs2492286 in RPN1) were associated with 16 unique traits including multiple cancers (e.g. lung cancer, glioma) [32,33] and blood cell counts (e.g. platelet count, eosinophil count, red blood cell count) [20, 32] (Table S15). All four SNPs were associated with IEAA in the current study.
A gene-based test of enrichment among traits within the GWAS catalog output (Table S16) showed genes associated with IEAA, GrimAge and granulocyte proportion were enriched among those associated with white blood cell counts [20,34]. Several genes associated with granulocyte proportions were also enriched among those associated with inflammatory traits (e.g. inflammatory bowel disease, rheumatoid arthritis, asthma) [35,36,37,38,39,40].
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint IEAA-associated genes were also significantly enriched among those identified in a previous GWAS of IEAA [8,9].

Colocalization to identify GWAS SNPs that might regulate expression levels
Colocalization analyses were conducted to investigate whether any of the 56 loci from the European ancestry GWASs showed evidence of regulating gene expression levels (Methods). There was strong evidence (posterior probability > 0.8) that eight loci had shared genetic effects with eQTLs ( Table 1; Table S17). Of these, one was associated with GrimAge acceleration and seven were associated with IEAA. The locus associated with GrimAge acceleration was linked to the expression of C6orf183 whereas IEAA-associated loci were linked to the expression of 11 transcripts including genes related to lipid transport and immune function (e.g. ATP8B4, CD46, TRIM59).

Heritability and LD Score Regression
We quantified the proportion of variance in the six epigenetic biomarkers from the European ancestry meta-analyses that can be explained by our SNP sets using LD Score regression [41]. The GWAS summary statistic SNP-based heritability ranged from 0.02 (SE = 0.02) for PAI1 levels, to 0.17 (SE = 0.02) for IEAA (Table 1; Figure S31; Table S18). We omitted DNAm PAI1 from the genetic correlation analysis due to its low heritability estimate. Several of the remaining five epigenetic biomarkers exhibited significant (p<0.05) pairwise genetic correlation coefficients ranging from r=0.27 (GrimAge acceleration and IEAA) to r=0.66 (GrimAge acceleration and PhenoAge acceleration; Figure 1; Table S19). A selection of significant genetic correlations is presented in Figure 2.

Polygenic risk score (PRS) profiling
To determine how well SNP-based genetic scores can approximate the six epigenetic biomarkers, and to see if these genetic scores associate with health outcomes, a polygenic risk score analysis was conducted on the European ancestry data. Re-running the meta-analysis with an iterative leave-one-cohort-out process (and on the full summary statistics in a completely independent cohort -the Young Finns Study), the mean polygenic predictions explained between 0.21 and 2.37% of the epigenetic biomarkers (Table 1; Figure 3; Table   S21). The maximum prediction for a single cohort was 4.21% for PAI1 levels in ARIES.
Parsimonious predictors (built using SNPs with P <5x10 -8 ) performed well for IEAA, PAI1 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint levels and PhenoAge acceleration, whereas predictors including more SNPs (P <0.01 -P<1) tended to explain the most variance in GrimAge acceleration and granulocyte proportions.
In order to investigate the association between the polygenic risk scores and health outcomes, we utilised a PRS Atlas to model associations with 581 heritable traits (n range =10,299 to 334,915) from the UK Biobank study [49]. The PRS inputs included the independent SNPs with P<1 for GrimAge acceleration and granulocyte proportion and P<5x10 -8 for the other four epigenetic biomarkers, with thresholds based on the results from the leave-one-out predictions (Figure 3). Using an FDR-corrected p-value for each of the six epigenetic biomarkers, we found between 7 and 250 significant associations for GrimAge acceleration, Granulocyte proportions and Hannum age acceleration (P FDR < 0.05; Table S22). The strongest associations were between the GrimAge acceleration PRS and the following traits: adiposity-related traits (e.g. body fat percentage: β = 0.02; P FDR = 7.3x10 -39 ); education (e.g. college or university degree: β = -0.06; P FDR = 2.6x10 -43 ); and parental longevity (e.g. father's age at death: β = -0.02; P FDR = 5.7x10 -16 ; mother's age at death: β = -0.02; P FDR = 1.6x10 -11 ).

Mendelian Randomisation between age acceleration phenotypes and health and life style outcomes
To investigate if the epigenetic measures were causally influenced by lifestyle factors and had a causal effect on ageing and disease outcomes, we performed Mendelian randomization . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint (MR) analyses on 150 traits for the European ancestry data (Table S23). We found 12 inverse-variance weighted MR effects between the main exposures and epigenetic outcomes (GrimAge acceleration, PhenoAge acceleration and PAI1 levels), after adjustment for multiple testing ( Table 1; Table S24). Of these, three remained significant (P <0.05) across the other three MR methods. All of these consistent effects were with GrimAge as the outcome. Greater adiposity was associated with greater GrimAge acceleration: BMI (Beta IVW =0.76 years per SD increase in BMI, P =3.7x10 -16 ); hip circumference (Beta IVW =0.42 years per SD increase in hip circumference, P =2.5x10 -5 ); waist circumference (Beta IVW =0.59 years per SD increase in waist circumference, P =5.9x10 -6 ; Figure 4). Current tobacco smoking showed evidence for a causal effect on increased GrimAge in two of the MR methods (Beta IVW =3.42 years for smokers, P =9.0x10 -6 ; Figure 4), as anticipated given that it incorporates a DNAm-based estimator of smoking pack-years [8]. Past tobacco smoking showed evidence for an inverse causal effect (Beta IVW =-1.09 years, P =6.6x10 -9 ), indicating that GrimAge acceleration is reduced upon smoking cessation. As a DNAm-proxy for leptin was also included in the derivation of GrimAge, the smoking and adiposity findings may act as positive controls. There was also evidence from three of the four MR methods to support a link between higher educational attainment (both years of schooling and college/university degree) and lower GrimAge acceleration (Figure 4). For the secondary exposures, there was evidence across all methods for a causal effect of a greater body size at age 10 on higher GrimAge acceleration (Beta IVW =0.70, P =1.6x10 -4 ; Table S23; Table S25). Consistent findings across all four MR methods provided evidence to support a causal effect of 13 cell count traits and DNA methylation-estimated granulocyte proportions, and between lower lymphocyte proportions and higher Hannum and GrimAge acceleration (Table S26). There was evidence for heterogeneity in the causal effects for most of the cell types on epigenetic . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint age measures, as well as years of schooling on GrimAge acceleration, although weaker evidence for directional pleiotropy was detected based on the Egger intercept ( Table S27).
The biomarker analyses with epigenetic measures as outcomes identified no consistent effects across all four MR methods (Table S26). We found limited evidence to support any causal effects of the epigenetic measures (as exposures) on key disease and health outcomes, including longevity (Tables S29-S30).

Discussion
Epigenetic biomarkers of ageing and mortality have been extensively studied in relation to a plethora of health and disease outcomes. Here, we conducted a comprehensive suite of analyses in a meta-analysis sample of over 40,000 individuals, including the first GWA studies of two DNAm based estimators of mortality risk (PhenoAge and GrimAge), as well as for DNAm-based proxies for granulocyte proportion and plasminogen activation inhibitor 1 protein levels. We identified 137 loci, of which 113 were novel, related to the six epigenetic biomarkers. Although our comparison of genetic architectures across different ancestries was limited by sample size, our trans-ethnic meta-analyses implicated many shared genetic loci.
However, heterogeneity of effect sizes between European and African American ancestries was found for genome wide significant loci in African Americans which may reflect differential tagging of underlying causal variants. Alternatively, gene-environment interactions or poor prediction of epigenetic biomarkers in African Americans may explain heterogeneity in these effect sizes.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. The IEAA-associated locus on chromosome 3 (lead SNP rs1047210) colocalized with an eQTL for TRIM59. DNA methylation levels at TRIM59 have been robustly associated with chronological age and its expression has been noted in multiple cancers [53,54,55,56,57].
Apart from TRIM59, the eQTL colocalization also implicated other genes associated with cancer progression (e.g. KPNA4, SMC4, and CXXC4) [58,59,60]. SMC4 is known to inhibit cellular senescence in replicating cells [61,62]. Additional eQTL colocalization results for IEAA include CD46, a regulator of the complement system and T-cell function [63,64], and the lipid transporter gene ATP8B4, which contains variants that have been reported in relation to centenarian status in Italians and Alzheimer's disease [65,66].
Using the findings from the European ancestry meta-analyses, we observed shared genetic contributions between PhenoAge and GrimAge acceleration with education, cognitive ability, adiposity, and smoking; there were also significant epigenetic biomarker-specific genetic correlations with numerous other health and modifiable lifestyle factors. The best epigenetic . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint biomarker of mortality risk, GrimAge acceleration, exhibited strong genetic correlations with parental longevity and lung cancer. As GrimAge uses a DNA methylation based estimator of smoking pack-years in its definition, it is possible that the significant genetic correlation with GrimAge acceleration and lung cancer is mediated by smoking.
Four different Mendelian randomisation methods provide directional evidence of a causal influence of adiposity-related traits on GrimAge acceleration, while smoking cessation was inversely related to GrimAge acceleration. Several MR analyses indicate that increased educational attainment is associated with lower GrimAge acceleration. However, there was no causal evidence for associations with other lifestyle traits, such as alcohol consumption.
There was limited evidence from MR to implicate any of the DNAm predictors as playing an important causative role in longevity or disease risk. This is the first study to present polygenic risk scores for six epigenetic biomarkers of ageing.
A phenome-wide scan of the six polygenic risk scores (PRS) yielded highly significant associations between the PRS of GrimAge acceleration and adiposity-related traits, education, and parental longevity. Both the PRS analysis and the MR analyses suffer from two limitations i) the geographical structure in the UK Biobank cohort might confound these analyses [67] and ii) low heritability estimates for some of the phenotypes (e.g. longevity or epigenetic biomarkers such as PAI1). Limited statistical power due to low heritability or low sample size may help explain the disconnect between the genetic correlation analysis (which revealed a plethora of significant genetic correlations for GrimAge and PhenoAge) and the MR analysis (which led to a dearth of significant findings). In general, careful interpretation of the MR findings is required. In particular, causal MR analyses that modelled epigenetic biomarkers as exposures and disease states as outcomes suffered from weak genetic instruments (e.g., for GrimAge acceleration and granulocyte proportions, where the variance explained was <1%) or inadequate power in two-sample analysis. Future multivariate MR . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint analyses will be required to test whether the protective causal effect of education on GrimAge is mediated by smoking, obesity, or other factors.
Overall, this study highlights the shared genetic architecture between epigenetic ageing, lifestyle factors (smoking, obesity), and parental longevity, which shows that DNAm-based biomarkers are valuable endophenotypes of biological ageing. Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Acknowledgements
The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. Cohort specific acknowledgements are presented in Appendix 1.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint

Study cohort information
The meta-analysis sample comprised 37 datasets from 31 cohorts encompassing 41,731 participants (controls/healthy volunteers). Of these, 28 included individuals of European ancestry comprising 34,449 participants, seven were of individuals of African American comprising 6,148 participants, and one was of Hispanic ancestry comprising 287 participants.
The total meta-analysis sample age range was 27.2-79.1 years (mean 54.0 years overall; 54.8 years across European ancestry cohorts; 50.4 years across African American cohorts), and comprised 0-100% females (mean 58.26%; 57.3% across European ancestry cohorts; 58.3% across African American cohorts). Cohort-level descriptive data are presented in Table S1 and described in Appendix 1. Each of the studies was approved by their local Ethical Committee. All subjects provided written informed consent.

Data availability
Meta-analysis summary statistics for each epigenetic biomarker are publicly available at https://datashare.is.ed.ac.uk/handle/10283/3645. For cohort-specific details, please see Appendix 1, which contains information for each study.

Data preparation
Age-adjusted DNA methylation-based estimates of Hannum age, Intrinsic Horvath age, PhenoAge, GrimAge, plasminogen activator inhibitor-1 levels, and unadjusted granulocyte proportion were calculated using the Horvath epigenetic age calculator software (https://dnamage.genetics.ucla.edu/ or standalone scripts provided by Steve Horvath and Ake Lu). The following outputs were assessed: Intrinsic Epigenetic Age Acceleration -"IEAA", . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint Hannum age acceleration -"AgeAccelerationResidualHannum", PhenoAge acceleration -"AgeAccelPheno", GrimAge acceleration -"AgeAccelGrim", estimate levels of Plasminogen Activation Inhibitor 1, adjusted for age -"DNAmPAIadjAge", and estimated proportion of granulocytes -"Gran". For each cohort, an outlier threshold for methylation values of +/-5 standard deviations was applied and outlier samples were excluded from the analysis.

GWAS and meta-analysis
Quality control and imputation were done by each study separately (Appendix 1) Genotypes were imputed to either Haplotype Reference Consortium (HRC) or 1000 genomes phase 3 panels [68,69]. In each cohort, association testing was conducted using imputed dosages using an additive model. Linear models were adjusted for sex and genetic principal components. GWAS summary statistics were obtained for between 1,097,816-15,221,271 genetic markers. This was the case for all cohorts with the exception of GOLDN (wholegenome sequence data) and the Sister Study (imputed data not available at the time of analysis). For each cohort, summary statistics were processed and harmonised using the R package EasyQC [70]. Multi-allelic variants were filtered to contain the variant with the highest minor allele count. At the individual cohort level, variants that were monomorphic, with a minor allele count ≤ 25, genotyped in <30 individuals, or with an imputation quality score <0.6 were removed. Allele codes and marker names were harmonized, duplicate variants were removed, and allele frequency checks were performed against the appropriate population reference data. Meta-analyses were performed with METAL using an inverse variance fixed effects scheme [71]. Meta-analyses were performed on European ancestry and African American studies separately (n=34,962 and 6,482, respectively). Variants were omitted from the meta-analysis if they were absent from >50% of the total meta-analysis sample size. Cohort specific genomic inflation factors ranged from 0.86 to 1.07. Genome-. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint wide significance was defined as P < 5x10 -8 . To summarize the associations in terms of index SNP with the strongest association and other SNPs in linkage disequilibrium we used conditional and joint association analysis of GWAS summary data, including the HLA region, in the GCTA-COJO software. A stepwise selection model was used with default settings for SNP LD (R 2 <0.9), analysis window size (10Mb), and genome-wide significance (P<5x10 -8 ) using HRC imputed genotype data from Generation Scotland, and 1000G imputed genotype data from ARIC as the reference panels for the European ancestry and African American analyses, respectively. Heterogeneity I 2 statistics were obtained from the metaanalyses and plotted against both -log 10 p-values and effect sizes to determine if SNPs with heterogeneous effects across cohorts were more statistically significant or had larger effect sizes. Systematic between-study heterogeneity was also investigated [21]. Meta-analyses were re-run after excluding cohorts identified as outliers and effect sizes were visually compared with the full meta-analysis output. Forest plots were prepared for all significant loci.  [72]. Independent lead SNPs had P<5x10 -8 and were independent of each other at r 2 <0.6; lead SNPs within this subset were required to have r 2 <0.1. A locus was defined by . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Functional Annotation of Meta-Analysis Summary Statistics
The European ancestry and African American meta-analysis summary statistics were uploaded to FUMA (http://fuma.ctglab.nl) for further annotation and functional analysis [72].
Genes were annotated from SNP-level data using the "SNP2GENE" tool, permitting gene set and tissue expression analyses using MAGMA [26]. A Bonferroni-corrected significance threshold (adjusting for 18,606 tested genes) of P <2.69x10 -6 was set for the gene-based GWAS. Genes annotated to significant GWAS loci were further investigated using the "GENE2FUNC" tool in FUMA for enrichment of GWAS catalog gene sets [73]. Bonferronicorrected P-value thresholds were applied.

Functional enrichment
To test if the GWAS meta-analysis findings were associated with regulatory and functional features of interest, enrichment analyses were conducted using GARFIELD [30]. SNPs were first pruned (r 2 >0.1) then annotated to categories (e.g., chromatin states, histone modifications, DNaseI hypersensitive sites, and transcription factor binding sites). Statistical enrichment was then carried out for SNPs at two p-value thresholds (P<1x10 -5 and P<1x10 -8 ) while accounting for MAF, distance to the nearest TSS, and number of LD proxies.

Colocalization Analysis
We hypothesised that some of the lead loci from the meta-analyses will have shared variants 1) across the Epigenetic biomarkers, 2) with DNAm sites in blood, and 3) with gene expression levels in blood. We used GoDMC summary statistics on 190,102 DNAm sites . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint [74] to examine the overlap between loci and epigenetic clock DNAm sites, BMI-associated DNAm sites and smoking-associated DNAm sites. We used eQTL Gen summary statistics on 19,942 transcripts [75] that were available in the MR-Base database [76]. We used the Rpackage gwasglue (https://mrcieu.github.io/gwasglue/) to extract SNPs that were +/-1Mb of the lead SNP and to harmonise the datasets. For each Epigenetic biomarkers molecular trait pair (or pair of Epigenetic biomarkers) we then performed colocalization analysis using the coloc.abf function in the R/coloc package [24], using default parameters. We only kept colocalized pairs with more than 50 shared SNPs and a posterior probability above 0.8. We removed the HLA region from the eQTL colocalization analysis.

Disease and Phenotype Ontology Enrichment
The potential role of Mendelian disease genes and associated pathways in influencing the epigenetic clock was investigated with MendelVar [31], independently for each marker phenotype. We did not limit our analysis to any particular phenotype class amongst the Mendelian disease genes but looked for enrichment of any disease processes found to be strongly linked to genes in the GWAS loci. MendelVar analysis was run using intervals based on ±0.5 Mbp window around the lead SNPs using the 1000 Genomes EUR population as LD reference [66]. Inside MendelVar, INRICH was run in "target" enrichment mode, with the target gene set filter set at minimum 5 (-i option) and maximum of 20000 (-j option), and minimum observed threshold of 2 (-z option) [77]. The nominal p-values were corrected for multiple testing with two rounds of permutation in INRICH.

Heritability and Genetic Correlation Analysis
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint LD score regression, using LD scores and weights estimated from European ancestry populations (downloaded from https://data.broadinstitute.org/alkesgroup/LDSCORE/), was used to assess genetic correlations between the six epigenetic biomarkers. Genetic correlations were further assessed between the six epigenetic biomarkers and publiclyavailable GWAS summary statistics using the LDHub web interface (http://ldsc.broadinstitute.org/ldhub/) [41]. Meta-analysis results for each epigenetic biomarker were uploaded to the LDHub website, selecting all available traits for genetic correlation analysis. SNP heritability was estimated using univariate LD score regression. As the majority of large-scale GWA studies have been based on European ancestry populations, heritability and genetic correlation analyses were limited to this group to maximise statistical power. Filtering was performed to exclude traits where the LD Hub output came with the following warning messages: "Caution: using these data may yield less robust results due to minor departure of the LD structure" and "Caution: using this data may yield results outside bounds due to relative low Z score of the SNP heritability of the trait". This left a total of 693 unique traits from 708-711 studies per epigenetic clock. A Bonferroni-corrected significance threshold of P < 0.05/693 = 7.21x10 -5 was applied.

Polygenic risk scores
To determine the proportion of variance in the six epigenetic biomarkers that can be explained by common genetic variants, we carried out a polygenic risk score analysis using results from the European ancestry meta-analyses. Weights for the additive genetic scores were created by re-running the meta-analyses excluding one cohort (test cohort) at a time. Six weighted-PGR-scores (one for each epigenetic biomarker) were generated using default settings of the PRSice software (clump-kb=250, clump-p=1; clump-r2=0.25) [78]. P-value thresholds were set at <5x10 -8 , <0.01, <0.05, <0.1, <0.5, and 1. Linear regression models were built to calculate the incremental R 2 between the null model (epigenetic biomarker~ sex) . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint and full model (epigenetic biomarker ~ sex + polygenic risk score) in the test cohort. The procedure was iterated after excluding different test cohorts one by one (Lothian Birth Cohort 1921, Lothian Birth Cohort 1936, Framingham Heart Study, Born in Bradford, ARIES, and SABRE, respectively) from the meta-analysis. Finally, these steps were repeated, using the full meta-analysis summary statistic output to generate polygenic risk scores in a completely independent cohort (Young Finns Study, n =1,320).
For Born in Bradford, ARIES and SABRE, best-guess genotypes files with a MAF cut-off of 1% and info score>0.8 were generated and the polygenic risk score analyses were corrected for 20 genetic PCs.
A phenome-wide association study of 581 heritable traits from the UK Biobank study was then carried out for parsimonious polygenic risk scores based on independent SNPs with

Mendelian Randomisation
To investigate if the epigenetic biomarkers were i) causally influenced by lifestyle factors and ii) had a causal effect on ageing and disease outcomes, Mendelian Randomisation (MR) was performed in MRBase [76]. The epigenetic measures were considered as both exposures (i.e., causally influencing the outcome) and outcomes (i.e., the epigenetic measure being causally influenced by a trait of interest). The analyses were further split into four sections: primary . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint exposures/outcomes (common lifestyle risk factors and ageing/disease outcomes from the largest available GWAS in MR Base); secondary exposures/outcomes (traits identified as relevant via moderate genetic correlations from the LD regression analyses); 34 cell count exposures [20]; and 38 biomarker exposures [79]. SNPs instrumenting each exposure were clumped using a European LD reference panel and an r2 < 0.001. Harmonization of the SNP effects with the exposure and outcomes were performed so that palindromic SNPs were aligned when minor allele frequencies (MAFs) were <0.3 or were otherwise excluded.  . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint  . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint PhenoAge acceleration (Panel B) and a selection of GWAS traits. *This variable was originally coded with a high score representing lower health rating. We have multiplied the genetic correlation by -1 for interpretability.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint Figure 4: Causal effects of UK Biobank GWAS traits on GrimAge acceleration. Effects correspond to increase/decrease in GrimAge acceleration per SD increase in waist circumference, hip circumference and BMI; or per log odds increase for university/college education and current smoker status.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint Supplementary Tables:   Table S1: Cohort summaries.                             . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020.            . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint Figure S13: Mean/SD epigenetic age plotted against mean/SD chronological age in European ancestry and African American cohorts.          . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint        . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2020. ; https://doi.org/10.1101/2020.06.29.133702 doi: bioRxiv preprint