Abstract
Most age-related human diseases are accompanied by a decline in cellular organelle integrity, including impaired lysosomal proteostasis and defective mitochondrial oxidative phosphorylation. An open question, however, is the degree to which inherited variation impacting each organelle contributes to age-related disease pathogenesis. Here, we evaluate if organelle-relevant loci confer greater-than-expected age-related disease risk. As mitochondrial dysfunction is a “hallmark” of aging, we begin by assessing nuclear and mitochondrial DNA loci relevant to mitochondria and surprisingly observe a lack of enrichment across 24 age-related traits. Within nine other organelles, we find no enrichment with one exception: the nucleus, where enrichment emanates from nuclear transcription factors. In agreement, we find that genes encoding several organelles tend to be “haplosufficient,” while we observe strong purifying selection against protein-truncating variants impacting the nucleus. Our work identifies common variation near transcription factors as having outsize influence on age-related trait risk, motivating future efforts to determine if and how this variation contributes to age-related organelle deterioration.
Introduction
The global burden of age-related diseases such as type 2 diabetes (T2D), Parkinson’s disease (PD), and cardiovascular disease (CVD) has been steadily rising due in part to a progressively aging population. These diseases are often highly heritable1. Genome-wide association studies (GWAS) have led to the discovery of thousands of robust associations with common genetic variants2, implicating a complex genetic architecture as underlying much of the heritable risk. These loci hold the potential to reveal underlying mechanisms of disease and spotlight targetable pathways.
Aging has been associated with dysfunction in many cellular organelles3. Dysregulation of autophagic proteostasis, for which the lysosome is central, has been implicated in myriad age-related disorders including neurodegeneration, heart disease, and aging itself4, and mouse models deficient for autophagy in the central nervous system show neurodegeneration5,6. Endoplasmic reticular (ER) stress has been invoked as central to metabolic syndrome and insulin resistance in type 2 diabetes7. Disruption in the nucleus through increased gene regulatory noise from epigenetic alterations3 and elevated nuclear envelope “leakiness”8 has been implicated in aging. Dysfunction in the mitochondria has even been invoked as a “hallmark” of aging3 and has been nominated as a driver of virtually all common age-associated diseases. In particular, deficits in mitochondrial oxidative phosphorylation (OXPHOS) have been observed in aging and age-related diseases as evidenced by in vivo 31P-NMR measures9,10, enzymatic activity11–17 in biopsy material, accumulation of somatic mitochondrial DNA (mtDNA) mutations18–20, and a decline in mtDNA copy number (mtCN)21.
Given that a decline in organelle function is observed in age-related disease, a natural question is whether inherited variation in loci relevant for organelles is enriched for age-related disease risk. In the present study, we use a human genetics approach to assess common variation in loci relevant to the function of 10 cellular organelles. We begin with a deliberate focus on mitochondria given the depth of literature linking them to age-related disease. As mitochondria-localizing protein products from ~1100 nuclear DNA (nucDNA)-encoded genes22 and 13 mtDNA-encoded genes are critical for proper OXPHOS homeostasis23, we test both nucDNA and mtDNA loci relevant for mitochondrial function in 24 different age-related diseases and traits. We hypothesized that heritability for common, age-related traits would be overrepresented among mitochondria-relevant loci, namely variants near genes encoding the organelle’s proteome or loci associated with quantitative readouts of mitochondrial function.
To our surprise, we find no evidence of enrichment for genome-wide association signal in mitochondria-relevant loci across any of our analyses. Further, of ten tested organelles, only the nucleus shows enrichment among many age-associated traits, with the signal emanating from the transcription factors. Further analysis shows that genes encoding the mitochondrial proteome tend to be tolerant to heterozygous predicted loss-of-function (pLoF) variation and thus are surprisingly “haplosufficient,” whereas nuclear transcription factors are especially sensitive to gene dosage and are often “haploinsufficient”. Thus, we highlight variation influencing gene-regulatory pathways, rather than organelle physiology, in the inherited risk of common age-associated diseases.
Results
Age-related diseases and traits show diverse genetic architectures
To systematically define age-related diseases, we turned to recently published epidemiological data from the United Kingdom (U.K.)24 in order to match the U.K. Biobank (UKB)25 cohort. We prioritized traits whose prevalence increased as a function of age (Methods) and were represented in UKB (https://github.com/Nealelab/UK_Biobank_GWAS) and/or had available published GWAS meta-analyses26–35 (Figure 1A, Supplementary note). We used SNP heritability estimates from stratified linkage disequilibrium score regression (S-LDSC, https://github.com/bulik/ldsc)36 to ensure that our selected traits were sufficiently heritable (Methods, Supplementary note). We then computed pairwise genetic correlations between the age-associated traits to compare their respective genetic architectures (Figure 1B, Table S2, Methods). As expected, we find a highly correlated module of primarily cardiometabolic traits, with high-density lipoprotein (HDL) showing anti-correlation37. Interestingly, several other traits (gastroesophageal reflux disease (GERD), osteoarthritis) showed moderate correlation to the cardiometabolic trait cluster while atrial fibrillation, for which T2D and CVD are risk factors38, did not. Our final set of prioritized, age-associated traits included 24 genetically diverse, heritable phenotypes (Table S1). Of these, 11 traits were sufficiently heritable only in UKB, 3 were sufficiently heritable only among non-UKB meta-analyses, and 10 were well-powered in both UKB and an independent cohort.
No evidence for enrichment of age-related trait heritability in mitochondria-relevant loci
To test if age-related trait heritability was enriched among mitochondria-relevant loci, we began by simply asking if ~1100 nucDNA genes encoding the mitochondrial proteome from the MitoCarta2.0 inventory22 were found near lead SNPs for our selected traits represented in the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/)39 more frequently than expectation (Methods, Supplementary note). To our surprise, no traits showed a statistically significant enrichment of mitochondrial genes (Figure S1A); in fact, six traits showed a statistically significant depletion. Even more strikingly, MitoCarta genes tended to be nominally enriched in fewer traits than the average randomly selected sample of protein-coding genes (Figure S1B, empirical p = 0.014). This lack of enrichment was observed more broadly across virtually all traits represented in the GWAS Catalog (Figure S1C). We also tested several transcriptional regulators of mitochondrial biogenesis and function – TFAM, GABPA, GABPB1, ESRRA, YY1, NRF1, PPARGC1A, PPARGC1B. We found little evidence supporting a role for these genes in modifying risk for the age-related GWAS Catalog phenotypes, observing only a single trait (heel bone mineral density) for which a mitochondrial transcriptional regulator (TFAM) was nearest an associated genome-wide significant variant (Supplementary note).
To investigate further, we turned to U.K. Biobank (UKB). We compiled and tested three classes of “mitochondria-relevant loci” (Figure 2A) with which we interrogated the association between common mitochondrial variation and common disease. First, we curated literature-reported nucDNA quantitative trait loci (QTLs) associated with measures of mitochondrial function (Table S3): mtCN40,41, mtRNA abundance and modification42,43, and plasma levels of OXPHOS dysfunction biomarkers including GDF15 protein44,45, lactate, pyruvate, and lactate/pyruvate ratio46–48. Second, we considered all common variants in or near nucDNA MitoCarta genes, as well as two subsets of MitoCarta: mitochondrial Mendelian disease genes49 and nucDNA-encoded OXPHOS genes. Third, we obtained mtDNA genotypes at up to 213 loci after quality control (Methods) from 360,662 individuals.
First, we tested if published QTLs for mtCN, mtRNA abundance, and OXPHOS biomarkers (Table S3, S4) were enriched for an overlap with genome-wide significant loci for each of our age-related traits in UKB (Methods, Figure S2). We observed no evidence of enrichment among QTLs available in the literature (Figure 2B, Supplementary note; all q > 0.1).
Second, we used S-LDSC36,50 and MAGMA (https://ctg.cncr.nl/software/magma)51, two robust methods that can be used to assess gene-based heritability enrichment accounting for LD and several confounders, to test if there was any evidence of heritability enrichment among MitoCarta genes (Methods). We found no evidence of enrichment near nucDNA MitoCarta genes for any trait tested in UKB using S-LDSC (Figure 2C, S8A), consistent with our results from the GWAS Catalog. We replicated this lack of enrichment using MAGMA at two different window sizes (Figure S8C, S8E; all q > 0.1).
Given the lack of enrichment among the MitoCarta genes, we wanted to (1) verify that our selected methods could detect previously reported enrichments and (2) confirm that common variation in or near MitoCarta genes can lead to expression-level perturbations. We first successfully replicated previously reported enrichment among tissue-specific genes for key traits using both S-LDSC (Figure S3, S4) and MAGMA (Figure S5, S6, Supplementary note, Methods). We next confirmed that we had sufficient power using both S-LDSC and MAGMA to detect physiologically relevant enrichment effect sizes among MitoCarta genes (Figure S7, Methods, Supplementary note). We finally examined the landscape of cis-expression QTLs (eQTLs) for these genes and found that almost all MitoCarta genes have cis-eQTLs in at least one tissue and often have cis-eQTLs in more tissues than most protein-coding genes (Figure S9, Methods, Supplementary note). Hence, our selected methods could detect physiologically relevant heritability enrichments among our selected traits at gene-set sizes comparable to that of MitoCarta, and common variants in or near MitoCarta genes exerted cis-control on gene expression.
Third, we considered mtDNA loci genotyped in UKB, obtaining calls for up to 213 common variants passing quality control across 360,662 individuals (Methods, Supplementary note). We found no significant associations on the mtDNA for any of the 21 age-related traits available in UKB using linear or logistic regression (Methods, Figure 2E, S9).
As a control and to validate our approach, we also performed mtDNA-GWAS for specific traits with previously reported associations. A recent analysis of 147,437 individuals in BioBank Japan revealed four distinct traits with significant mtDNA associations52. Of these, creatinine and aspartate aminotransferase (AST) had sufficiently large sample sizes in UKB. We observed a large number of associations throughout the mtDNA for both traits (p < 1.15 × 10⁻⁵, Figure S9E). Thus, our mtDNA association method was able to replicate robust mtDNA associations among well-powered traits.
Finally, we sought to replicate our negative results in an independent cohort. We turned to published GWAS meta-analyses26–35 (Table S1) and successfully replicated the lack of enrichment for MitoCarta genes across all 10 traits with an available independent cohort GWAS using S-LDSC (Figure 2E, S8B) and MAGMA at two different window sizes (Figure S8D, Supplementary note; all q > 0.1). Importantly, while we were unable to pursue analyses for PD and Alzheimer’s disease in UKB due to limited case counts, we tested MitoCarta genes among well-powered meta-analyses for these disorders (Supplementary note) and observed no enrichment (Figure 2E; all q > 0.1).
In summary, we tested (1) QTLs for mitochondrial physiology in UKB, (2) nucDNA loci near genes that encode the mitochondrial proteome in the GWAS Catalog, UKB, and GWAS meta-analyses, (3) mtDNA variants in UKB, and (4) known transcriptional regulators of mitochondrial biogenesis and function in the GWAS Catalog. We found no convincing evidence of heritability enrichment for common age-associated diseases among these mitochondria-relevant loci (Table S8).
Enrichment of age-related trait heritability near genes encoding nuclear transcription factors
We next asked whether heritability for age-related diseases and traits clusters among loci associated with any cellular organelle. We used the COMPARTMENTS database (https://compartments.jensenlab.org) to define gene-sets corresponding to the proteomes of nine additional organelles53 besides mitochondria (Methods). We used S-LDSC to produce heritability estimates for these categories in the UKB age-related disease traits, finding evidence of heritability enrichment in many traits for genes comprising the nuclear proteome (Figure 3A, Methods). No other tested organelles showed evidence of heritability enrichment. Variation in or near genes comprising the nuclear proteome explained over 50% of disease heritability on average despite representing only ~35% of tested SNPs (Figure S10, Supplementary note). We successfully replicated this pattern of heritability enrichment among organelles using MAGMA in UKB at two window sizes (Figure S13A, S13B), again finding only enrichment among genes related to the nucleus.
With over 6,000 genes comprising the nuclear proteome, we considered largely disjoint subsets of the organelle’s proteome to trace the source of the enrichment signal54–56 (Figure 3B, Methods, Supplementary note). We found significant heritability enrichment within the set of 1,804 genes whose protein products are annotated to localize to the chromosome itself (q < 0.1 for 9 traits, Figure 3C, S12). Further partitioning revealed that much of this signal is attributable to the subset classified as transcription factors56 (1,523 genes, q < 0.1 for 10 traits, Figure 3D, S12). We replicated these results using MAGMA in UKB at two window sizes (Figure S13), and also replicated enrichments among TFs in several (but not all) corresponding meta-analyses (Figure S14) despite reduced power (Figure S7H). We generated functional subdivisions of the TFs (Methods, Supplementary note), finding that the non-zinc finger TFs showed enrichment for a highly similar set of traits to those enriched for the whole set of TFs (Figure S15D, S16B, S17B, S18B). Interestingly, the KRAB domain-containing zinc fingers (KRAB ZFs)57, which are recently evolved (Figure S15H), were largely devoid of enrichment even compared to non-KRAB ZFs (Figure S15E, S16C, S17C, S18C). Thus, we find that variation within or near non-KRAB domain-containing transcription factor genes has an outsize influence on age-associated disease heritability (Table S8).
Mitochondrial genes tend to be more “haplosufficient” than genes encoding other organelles
In light of observing heritability enrichment only among nuclear transcription factors, we wanted to determine if the fitness cost of pLoF variation in genes across cellular organelles mirrored our results. Mitochondria-localizing genes and TFs play a central role in numerous Mendelian diseases49,58–60, so we initially hypothesized that genes belonging to either category would be under significant purifying selection (i.e., constraint). We obtained constraint metrics from gnomAD (https://gnomad.broadinstitute.org)61 as the LoF observed/expected fraction (LOEUF). In agreement with our GWAS enrichment results, we observed that the mitochondrion on average is one of the least constrained organelles we tested, in stark contrast to the nucleus (Figure 4A). In fact, the nucleus was second only to the set of “haploinsufficient” genes (defined based on curated human clinical genetic data61, Methods) in the proportion of its genes in the most constrained decile, while the mitochondrion lay on the opposite end of the spectrum (Figure 4B). Interestingly, even the Mendelian mitochondrial disease genes had a high tolerance to pLoF variation on average in comparison to TFs (Figure 4C, S19A). Even across different categories of TFs, we observed that highly constrained TF subsets tend to show GWAS enrichment (Figure S19B, S15E) relative to unconstrained subsets for our tested traits. Indeed, explicit inclusion of LOEUF as a covariate in the enrichment analysis model (Methods) reduced the significance of (but did not eliminate) the enrichment seen for the TFs (Figure S20B, S21B, S20E, S20F). Thus, while disruption in both mitochondrial genes and TFs can produce rare disease, the fitness cost of heterozygous variation in mitochondrial genes appears to be far lower than that among the TFs. This dichotomy reflects the contrasting enrichment results between the mitochondrial genes and the TFs and supports the importance of gene regulation as it relates to evolutionary conservation.
Discussion
Pathology in cellular organelles has been widely documented in age-related diseases3,7,62–65. Using a human genetics approach, here we report the unexpected discovery that except for the nucleus, cellular organelles tend not to be enriched in genetic associations for common, age-related diseases. We started with a focus on the mitochondria as a decline in mitochondrial abundance and activity has long been reported as one of the most consistent correlates of aging9,14,19,20 and age-associated diseases10–13,15–18,21. We tested mitochondria-relevant common variants on the nucDNA and mtDNA and found no convincing evidence of heritability enrichment in any tested trait, cohort, or method. We systematically expanded our analysis to survey 10 organelles and found that only the nucleus showed enrichment, with much of this signal originating from nuclear transcription factors. Constraint analysis showed a substantial fitness cost to heterozygous loss-of-function mutation in genes encoding the nuclear proteome, whereas genes encoding the mitochondrial proteome were “haplosufficient.”
For highly polygenic and well-powered traits, any large fraction of the genome may explain a statistically significant amount of disease heritability66,67. Indeed, individual associations between mitochondria-relevant loci and certain common diseases have been identified previously68,69. As associations have also been identified among loci relevant for other organelles, enrichment analyses can place these complex genetic architectures in a broader biological context and prioritize pathways for follow-up. Importantly, both MAGMA and S-LDSC are capable of detecting an enrichment even in a highly polygenic background. Both methods have been used in the past to identify biologically plausible disease-relevant tissues36,50 and pathway enrichments70,71 in traits across the spectrum of polygenicity, and we identify enrichments among disease-relevant tissues using both methods in several highly polygenic traits.
While previous work has shown that common disease GWAS can be enriched for expression in specific disease-relevant organs50,72, our data suggest that this framework does not generally extend to organelles. This finding contrasts with our classical nosology of inborn errors of metabolism that tend to be mapped to “causal” organelles, e.g., lysosomal storage diseases, disorders of peroxisomal biogenesis, and mitochondrial OXPHOS disorders. The observed enrichment for transcription factors within the nucleus indicates that common variation influencing genome regulation impacts common disease risk more than variation influencing individual organelles.
Our analysis of common inherited mitochondrial variation represents, to our knowledge, the most comprehensive assessment of mitochondria-relevant nucDNA and mtDNA variation in age-related diseases. We replicated mtDNA associations with creatinine and AST observed previously in BioBank Japan52, further supporting our approach. While individual mtDNA variants have been previously associated with certain traits73–75, these associations appear to be conflicting in the literature, perhaps because of limited power and/or uncontrolled confounding biases such as population stratification76,77. Our negative results are surprising, but they are not inconsistent with a small number of isolated reports interrogating either mitochondria-relevant nucDNA78 or mtDNA52,79–81 loci in select diseases.
To our knowledge, we are the first to systematically document heterogeneity in average pLoF tolerance across cellular organelles. That MitoCarta genes are “haplosufficient” and pLoF tolerant (Figure 4A) is consistent with the observation that most of the ~300 inborn mitochondrial disease genes produce disease with recessive inheritance49 and healthy parents. The few mitochondrial disorders that show dominant inheritance are nearly always due to dominant negativity rather than haploinsufficiency. The intolerance of TFs to pLoF variation (Figure 4A) provides a stark contrast to the results from the mitochondria that is borne out in their associated Mendelian disease syndromes: TFs are known to be haploinsufficient82 and even regulatory variants modulating their expression can produce severe Mendelian disease83. We observe heritability enrichment among TFs for 10 different diseases, consistent with observed elevated purifying selection against pLoF variants in these genes. Our enrichment results combined with pLoF intolerance suggest that variation among TFs may produce disease-associated variants with larger effect sizes than expected, underscoring their importance as genetic “levers” for common disease heritability.
Why are mitochondria so robust to variation in gene dosage and hence “haplosufficient?” We propose three possibilities. First, one possibility is pathway redundancy. For example, in cell culture, defective OXPHOS can be supported thanks to the action of non-mitochondrial pathways such as cytosolic glycolysis and nucleotide salvage as long as key environmental nutrients are provided84. Second, mitochondrial pathways tend to be highly interconnected, and it was already proposed by Wright85 and later by Kacser and Burns86 that haplosufficiency arises as a consequence of physiology, i.e., network organization of metabolic reactions. Kacser and Burns in fact explicitly mention that noncatalytic gene products fall outside their framework, and we believe that our finding that nucleus-localizing and cytoskeletal genes are the two most pLoF-intolerant compartments is consistent with their assessment. Third, mitochondria were formerly autonomous microbes and hence may have retained vestigial layers of “intra-organelle buffering” against genetic variation. Numerous feedback control mechanisms, including respiratory control87, help to ensure organelle robustness across physiological extremes88,89. In fact, a recent CRISPR screen showed that of the genes for which knock-out modified survival under a mitochondrial poison, there is a striking over-representation of genes that themselves encode mitochondrial proteins90.
Throughout this study, we have tested inherited common variant associations via an additive genetic model. We acknowledge the limitations of focusing on a specific genetic model and variant frequency regime, though we note that common variation is the largest documented source of narrow-sense heritability, which typically accounts for a majority of disease heritability91,92. First, we consider only common variants. While rare variants may prove to be instructive, it is notable that a previous rare variant analysis in T2D93 failed to show enrichment among OXPHOS genes. Second, we consider only additive genetic models. A recessive model may be particularly fruitful for mitochondria-relevant loci given their tolerance to pLoF variation; however, these models are frequently power-limited and may not explain much more phenotypic variance than additive models94,95. Third, we have not considered epistasis. The effects of mtDNA-nucDNA interactions96 in common diseases have yet to be assessed. While there is debate about whether biologically-relevant epistasis can be simply captured by main effects92,94,97,98 at individual loci, it is possible that modeling mtDNA-nucDNA interactions will reveal new contributions. Finally, it is crucial not to confuse our results with previously reported associations between somatic mtDNA mutations and age-associated disease18–20 – the present work is focused on germline variation.
We emphasize that our study does not formally address the causality of mitochondrial dysfunction in common age-related disease. Rather, we have tested if common variants in mitochondrial pathways tend to explain a disproportionate amount of age-related disease heritability. The observed lack of heritability enrichment in mitochondrial pathways does not preclude the possibility of a therapeutic benefit in targeting the mitochondrion for age-related disease. For example, mitochondrial dysfunction is documented in brain or heart infarcts following blood vessel occlusion in laboratory-based models99,100. Though mitochondrial variants do not influence infarct risk in this laboratory model, pharmacological blockade of the mitochondrial permeability transition pore can mitigate reperfusion injury and infarct size101. Future studies will be required to determine if and how the mitochondrial dysfunction associated with common age-associated diseases can be targeted for therapeutic benefit.
Our finding that the nucleus is the only organelle that shows enrichment for common age-associated trait heritability builds on prior work implicating nuclear processes in aging. Most human progeroid syndromes result from monogenic defects in nuclear components102 (e.g., LMNA in Hutchinson-Gilford progeria syndrome, TERC in dyskeratosis congenita), and telomere length has long been observed as a marker of aging103. Heritability enrichment of age-related traits among gene regulators is consistent with the epigenetic dysregulation104 and elevated transcriptional noise3,105 observed in aging (e.g., SIRT6 modulation influences mouse longevity106 and metabolic syndrome63). An important role for gene regulation in common age-related disease is in agreement with both the observation that a very large fraction of common disease-associated loci corresponds to the non-coding genome and the enrichment of disease heritability in histone marks and transcription factor binding sites36,107. Given that a deterioration in several other cellular organelles has been linked to age-related traits, a future challenge lies in elucidating the connection between variation influencing transcription factors and organelle dysfunction in age-related disease.
Data Availability
Genetic correlation point estimates and standard errors plotted in Figure 1B are available in Table S2. Summary statistics from mtDNA-GWAS are available in Table S6. All gene-based enrichment analysis p-values and point estimates are available in Table S8. Literature-reported loci associated with biomarkers of mitochondrial function after clumping and QC are available in Table S4. Period prevalence data for diseases in the UK can be obtained from Kuan et al. 2019. Gene-sets can be found using COMPARTMENTS (https://compartments.jensenlab.org), MitoCarta 2.0 (https://www.broadinstitute.org/files/shared/metabolism/mitocarta/human.mitocarta2.0.html), Lambert et al. 2018 (DOI: 10.1016/j.cell.2018.01.029), Frazier et al. 2019 (DOI: 10.1074/jbc.R117.809194), Finucane et al. 2018 (https://alkesgroup.broadinstitute.org/LDSCORE/), Kapopoulou et al. 2015 (DOI: 10.1111/evo.12819), and the MacArthur laboratory (https://github.com/macarthur-lab/gene_lists). Gene age estimates were obtained from Litman and Stein 2019 (DOI: 10.1053/j.seminoncol.2018.11.002). GWAS Catalog annotations can be obtained from: https://www.ebi.ac.uk/gwas. UKB heritability estimates can be obtained at: https://nealelab.github.io/UKBB_ldsc/. UKB summary statistics can be obtained from Neale lab GWAS round 2: https://github.com/Nealelab/UK_Biobank_GWAS. Annotations for the Baseline v1.1 and BaselineLD v2.2 models as well as other relevant reference data, including the 1000G EUR reference panel, can be obtained from https://alkesgroup.broadinstitute.org/LDSCORE/. eQTL and expression data in human tissues can be obtained from GTEx (https://www.gtexportal.org). Constraint estimates can be found via gnomAD: https://gnomad.broadinstitute.org. See citations for publicly available GWAS meta-analysis summary statistics26–35.
Code Availability
Our analysis leverages publicly available tools including LDSC for heritability enrichment and genetic correlation (https://github.com/bulik/ldsc), MAGMA v1.07b for gene-set enrichment analysis (https://ctg.cncr.nl/software/magma), PLINK v1.07 for linkage disequilibrium clumping (https://zzz.bwh.harvard.edu/plink/), and Hail v0.2.51 for distributed computing and mtDNA GWAS (https://hail.is).
Competing Interests
VKM is an advisor to and receives compensation or equity from Janssen Pharmaceuticals, 5am Ventures, and Raze Therapeutics. BMN is a member of the scientific advisory board at Deep Genomics and RBNC Therapeutics. BMN is a consultant for Camp4 Therapeutics, Takeda Pharmaceutical and Biogen. KJK is a consultant for Vor Biopharma.
Author Contributions
R.G., B.M.N., and V.K.M. conceived of the project; R.G., K.J.K., D.H. designed analyses; R.G. performed analyses; B.M.N., V.K.M. supervised project; R.G. and V.K.M. wrote the manuscript with input from other authors.
Materials and Methods
Trait selection
Sex-standardized period prevalence of over 300 diseases was obtained from an extensive survey of the National Health Service in the UK as reported previously24. To select high prevalence late-onset diseases, we ranked diseases with a median onset over 50 years of age by the sum of the period prevalence of all age categories above 50. We selected the top 30 diseases using this metric and manually mapped these traits to similar or equivalent phenotypes with publicly available summary statistics from UKB and/or well-powered meta-analyses (e.g., Parkinson’s Disease and Alzheimer’s Disease for dementia) resulting in 24 traits with data available in UKB, meta-analyses, or both (Table S1).
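The ranking step described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the `Disease` record, its field names, the treatment of an age category as "above 50" when its lower bound is at least 50, and the toy prevalence values are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Disease:
    name: str
    median_onset: float       # median age of onset, in years
    prevalence_by_age: dict   # hypothetical layout: {lower bound of age bin: period prevalence}

def rank_late_onset(diseases, age_cutoff=50, top_n=30):
    """Rank late-onset diseases by summed period prevalence in the older age bins."""
    scored = []
    for d in diseases:
        if d.median_onset <= age_cutoff:
            continue  # keep only diseases with median onset over the cutoff
        # Sum prevalence over age bins at or above the cutoff (assumed bin convention).
        score = sum(p for age, p in d.prevalence_by_age.items() if age >= age_cutoff)
        scored.append((score, d.name))
    scored.sort(reverse=True)  # highest summed prevalence first
    return [name for _, name in scored[:top_n]]

# Toy data (made up): disease B is excluded because its median onset is below 50.
toy = [
    Disease("A", 62, {40: 0.01, 50: 0.05, 60: 0.10}),
    Disease("B", 35, {40: 0.20, 50: 0.30, 60: 0.30}),
    Disease("C", 70, {40: 0.00, 50: 0.02, 60: 0.20}),
]
print(rank_late_onset(toy))
```

The manual mapping of ranked diseases to GWAS phenotypes is a curation step and is not represented in this sketch.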
Criteria for inclusion of summary statistics
We manually mapped selected age-related diseases and traits to corresponding phenotypes in UKB. In parallel, we searched the literature to identify well-powered EUR-predominant GWAS (referred to as meta-analyses) that (1) used primarily non-targeted arrays, (2) had publicly available full summary statistics, and (3) did not enroll individuals from UKB to serve as independent replication (Supplementary note). For UKB, we obtained heritability estimates (https://github.com/Nealelab/UKBB_ldsc) previously computed using stratified linkage-disequilibrium score regression (S-LDSC, https://github.com/bulik/ldsc)36 atop the BaselineLD v1.1 model using reference LD scores computed from 1000G EUR. For meta-analyses, we computed heritability estimates with S-LDSC atop the updated BaselineLD v2.2 model using reference LD scores computed from 1000G EUR (https://alkesgroup.broadinstitute.org/LDSCORE/). We computed the heritability Z-score, a statistic that captures sample size, polygenicity, and heritability36, and included only traits with heritability Z-score > 4 (Supplementary note) for further analysis.
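The inclusion filter can be illustrated with a short sketch. It assumes that the heritability Z-score is the SNP-heritability point estimate divided by its standard error; the function names and the toy estimates below are hypothetical and are not taken from the study.

```python
def heritability_z(h2, se):
    """Heritability Z-score, assumed here to be the estimate divided by its standard error."""
    return h2 / se

def filter_traits(estimates, z_min=4.0):
    """Keep traits whose heritability Z-score exceeds z_min.

    estimates: {trait name: (h2 estimate, standard error)}
    """
    return sorted(
        trait
        for trait, (h2, se) in estimates.items()
        if se > 0 and heritability_z(h2, se) > z_min
    )

# Toy estimates (made up for illustration only).
toy_estimates = {
    "trait_a": (0.20, 0.02),  # Z = 10
    "trait_b": (0.05, 0.02),  # Z = 2.5, excluded
    "trait_c": (0.12, 0.02),  # Z is about 6
}
print(filter_traits(toy_estimates))
```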
Genetic correlations in UKB
Pairwise genetic correlations, rg, were computed using linkage-disequilibrium score correlation37 on all selected age-related traits with heritability Z-score > 4. We used UKB summary statistics (https://github.com/Nealelab/UK_Biobank_GWAS) for all sufficiently powered traits; summary statistics from meta-analyses were used for eGFR33, Alzheimer’s Disease35, and Parkinson’s Disease34 as these traits showed heritability Z-score > 4 within meta-analyses but not in UKB (Table S1). P-values for genetic correlation represented deviation from the null hypothesis rg = 0. Traits were ordered by their contribution to the first eigenvector of the absolute value of the correlation matrix, with point estimates and standard errors available in Table S2. Bonferroni correction was applied to set the p-value cutoff.
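The trait ordering can be sketched as below. This is an illustrative reading of the procedure, not the study's code: it interprets "contribution to the first eigenvector" as the loading of each trait on the leading eigenvector, sorts in descending order, and uses plain power iteration in place of a linear algebra library. The correlation values are invented for the example.

```python
def leading_eigenvector(matrix, n_iter=500):
    """Power iteration for the leading eigenvector of a symmetric non-negative matrix."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def order_traits(names, rg):
    """Order traits by their loading on the first eigenvector of |rg| (descending)."""
    abs_rg = [[abs(x) for x in row] for row in rg]
    loadings = leading_eigenvector(abs_rg)
    return [name for _, name in sorted(zip(loadings, names), reverse=True)]

# Toy genetic correlation matrix (values invented for illustration).
names = ["T2D", "CVD", "HDL", "AF"]
rg = [
    [1.0, 0.6, -0.5, 0.1],
    [0.6, 1.0, -0.4, 0.1],
    [-0.5, -0.4, 1.0, 0.0],
    [0.1, 0.1, 0.0, 1.0],
]
print(order_traits(names, rg))
```

Taking the absolute value before the decomposition places strongly anti-correlated traits next to the cluster they oppose, which is why a trait such as HDL in the toy matrix sorts alongside the cardiometabolic traits rather than at the far end.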
Assessment of mitochondria-localizing genes in the GWAS Catalog
We mapped variants in the GWAS Catalog (obtained on September 5th, 2019, https://www.ebi.ac.uk/gwas/) meeting genome-wide significance (p < 5e-8) to genes using provided annotations, producing a set of trait-associated genes for each trait. We manually selected phenotypes represented in the GWAS Catalog matching our set of age-associated traits with over 30 annotated trait-associated genes. For each trait, we computed the proportion of trait-associated genes that were mitochondria-localizing (defined via MitoCarta2.022) and tested for enrichment or depletion relative to the overall genome background using two-sided Fisher’s exact tests, correcting for multiple hypothesis tests with the Benjamini-Hochberg (BH) procedure at FDR q-value < 0.1.
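A two-sided Fisher's exact test on the resulting 2x2 table can be sketched as below. This is an illustrative implementation using only the standard library, not the code used in the study; the table layout and the helper `mito_table` are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def mito_table(trait_genes, mito_genes, background):
    """Cross-tabulate trait-associated vs. mitochondria-localizing genes."""
    trait = trait_genes & background
    mito = mito_genes & background
    a = len(trait & mito)          # trait-associated and mitochondrial
    b = len(trait - mito)          # trait-associated only
    c = len(mito - trait)          # mitochondrial only
    d = len(background) - a - b - c
    return a, b, c, d
```

The per-trait p-values from such a test would then be passed to the BH procedure.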
We also computed a test statistic, defined as the number of age-associated traits showing a nominal (not necessarily statistically significant) enrichment for a given gene-set g, for the MitoCarta genes. We then generated an empirical null distribution for this statistic. We drew 1,000 random samples of protein-coding genes, where each sample contained the same number of genes as the set of mitochondria-localizing genes, and computed the statistic for each of these gene-sets (Figure S1B). The one-sided p-value of the observed statistic under this null was subsequently obtained.
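The sampling scheme above can be sketched as follows. Several details are assumptions made for illustration: "nominal enrichment" is taken to mean that the proportion of a trait's genes falling in the gene-set exceeds the background proportion, the p-value is taken from the upper tail, and a +1 correction is applied to avoid a p-value of zero. All names are hypothetical.

```python
import random


def count_nominal_enrichments(gene_set, trait_genes, background):
    """Number of traits whose genes overlap gene_set more than expected
    from the background proportion (no significance test applied)."""
    frac_bg = len(gene_set & background) / len(background)
    count = 0
    for genes in trait_genes.values():
        genes = genes & background
        if genes and len(genes & gene_set) / len(genes) > frac_bg:
            count += 1
    return count


def empirical_p(observed_set, trait_genes, background, n_draws=1000, seed=0):
    """Upper-tail empirical p-value from size-matched random gene-sets."""
    rng = random.Random(seed)
    pool = sorted(background)
    size = len(observed_set & background)
    observed = count_nominal_enrichments(observed_set, trait_genes, background)
    null = [
        count_nominal_enrichments(set(rng.sample(pool, size)),
                                  trait_genes, background)
        for _ in range(n_draws)
    ]
    p = (1 + sum(x >= observed for x in null)) / (n_draws + 1)
    return observed, p
```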
We expanded our enrichment/depletion analysis to all 332 traits in the GWAS Catalog with over 30 trait-associated genes; for enrichment or depletion testing, we used two-sided Fisher’s exact tests and corrected for multiple hypothesis testing with the BH procedure at FDR q-value < 0.1.
Enrichment analysis of literature-curated mitochondria-associated phenotypes
We reviewed the literature for quantitative trait loci (QTLs) for mtDNA copy number (mtCN)40,41, mtRNA abundance/modification42,43, and biomarkers of OXPHOS dysfunction (namely lactate, pyruvate, lactate/pyruvate ratio46–48, and GDF15 abundance44,45) (Supplementary note). We subsequently used PLINK v1.07 (https://zzz.bwh.harvard.edu/plink/)108 to identify independent variants for each phenotype based on the 1000G EUR reference panel (Supplementary note). To test for overlap with UKB age-associated disease traits, we divided curated variants into three classes: mtCN-related (21 variants), mtRNA-related (78 variants), and OXPHOS biomarkers (62 variants). For each of the 21 UKB age-related disease traits, we computed the number of genome-wide significant (p < 5e-8) variants that overlapped the curated variants for each class c. We only considered variants with INFO > 0.8 and MAF > 0.001 or expected case MAC > 25. For significance testing, we generated an empirical null distribution for this overlap count, including only variants with INFO > 0.8. For each class, we drew variants at random 2,500 times matching on LD score, in-sample MAF, and distance to transcription start site (where the distance metric was set to 0 if the variant was located within a gene boundary). LD scores per variant were generated per-chromosome with a 1 cM window using the 1000G EUR reference panel. The overlap count was then computed for each category for each set of randomly selected variants, generating a category-specific empirical null distribution for the statistic (Figure S2). The one-sided p-value of the observed overlap count under this null was subsequently obtained. To correct for multiple hypothesis testing, we applied the BH procedure with FDR < 0.1 and also applied a Bonferroni-corrected threshold.
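One way to draw matched variant sets is sketched below. The text does not specify how matching was implemented, so the coarse binning on LD score, MAF, and TSS distance, the bin widths, and the field names are all assumptions made for illustration. Draws are made with replacement across curated variants.

```python
import random
from collections import defaultdict


def bin_key(v, ld_width=10.0, maf_width=0.05, tss_width=10_000):
    """Coarse matching bin for a variant; bin widths are hypothetical."""
    return (int(v["ld_score"] // ld_width),
            int(v["maf"] // maf_width),
            int(v["tss_dist"] // tss_width))


def draw_matched_sets(curated, pool, n_draws=2500, seed=0):
    """For each draw, pick one pool variant from the bin of each curated variant."""
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for v in pool:
        by_bin[bin_key(v)].append(v["id"])
    draws = []
    for _ in range(n_draws):
        draw = []
        for v in curated:
            candidates = by_bin.get(bin_key(v))
            if not candidates:
                raise ValueError(f"no matched variant for {v['id']}")
            draw.append(rng.choice(candidates))
        draws.append(draw)
    return draws
```

Computing the overlap count on each returned draw yields the class-specific null distribution against which the observed count is compared.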
Harmonization and filtering of summary statistics for LDSC and MAGMA
UKB summary statistics previously formatted for use with LDSC and filtered to HapMap3 (HM3) SNPs (https://github.com/Nealelab/UKBB_ldsc) were used for analysis with S-LDSC. For analysis with MAGMA v1.07b51, we included variants from the full Neale Lab UKB Round 2 GWAS summary statistics (https://github.com/Nealelab/UK_Biobank_GWAS) with INFO > 0.8 and MAF > 0.01, and excluded any variants flagged as low confidence (a heuristic defined by MAF < 0.001 or expected case MAC < 25).
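The variant filter for MAGMA can be expressed compactly. This sketch assumes each variant is a dictionary with hypothetical keys `info`, `maf`, and an optional `expected_case_mac`; the thresholds are those stated in the text.

```python
def is_low_confidence(maf, expected_case_mac=None):
    """Low-confidence heuristic: MAF < 0.001 or expected case MAC < 25."""
    if maf < 0.001:
        return True
    return expected_case_mac is not None and expected_case_mac < 25


def keep_for_magma(variant, info_min=0.8, maf_min=0.01):
    """Return True if the variant passes the INFO, MAF, and confidence filters."""
    if is_low_confidence(variant["maf"], variant.get("expected_case_mac")):
        return False
    return variant["info"] > info_min and variant["maf"] > maf_min


# Hypothetical variants for illustration.
variants = [
    {"id": "v1", "info": 0.95, "maf": 0.20},
    {"id": "v2", "info": 0.70, "maf": 0.20},                           # low INFO
    {"id": "v3", "info": 0.95, "maf": 0.005},                          # low MAF
    {"id": "v4", "info": 0.95, "maf": 0.05, "expected_case_mac": 10},  # low MAC
]
kept = [v["id"] for v in variants if keep_for_magma(v)]
```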
Summary statistics obtained from publicly available GWAS meta-analyses26–35 were reported in varied formats. We manually verified the genome build upon which each meta-analysis reported results and ensured that all sets of summary statistics contained columns listing P-value, variant rsID, genome-build specific coordinates, and if available, variant-specific sample size (Table S1). If variant coordinates or rsID were not provided, the relevant columns were obtained from dbSNP database version 130 (for hg18) or 146 (for hg19). We used the summary statistic munging script provided with S-LDSC (https://github.com/bulik/ldsc) to generate summary statistics compatible with S-LDSC, restricting to HM3 SNPs as these tend to be best behaved for analysis with LDSC. For use of meta-analyses with MAGMA51, we restricted analysis to variants with INFO > 0.8 and MAF > 0.01 if such information was provided.
Multiple testing correction for gene-set enrichment analysis
To account for the multiple hypothesis tests performed throughout this study, we obtained p-value thresholds via the BH procedure at FDR < 0.1 for all gene-sets assessed for a given method and cohort type (where the two cohort types were UKB and meta-analysis).
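For reference, the BH step-up procedure that yields such a p-value threshold can be sketched as below. This is a generic textbook implementation with hypothetical function names, not the study's code.

```python
def bh_threshold(pvalues, fdr=0.1):
    """Largest p-value rejected by the BH step-up procedure, or None."""
    m = len(pvalues)
    threshold = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        # Step-up rule: keep the largest p with p <= fdr * rank / m.
        if p <= fdr * rank / m:
            threshold = p
    return threshold


def bh_reject(pvalues, fdr=0.1):
    """Boolean rejection decision for each input p-value, in input order."""
    thr = bh_threshold(pvalues, fdr)
    return [thr is not None and p <= thr for p in pvalues]
```

Because the procedure is step-up, a p-value that fails its own rank-specific bound is still rejected when a larger p-value passes its bound, as with 0.03 in the set (0.04, 0.03, 0.5, 0.9) at FDR 0.1.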
Gene-set based enrichment analysis
We extensively use S-LDSC and MAGMA to perform gene-set enrichment analyses among GWAS summary statistics. To test enrichment with S-LDSC, SNPs were mapped to each gene with a 100kb symmetric window as recommended50 and LD scores were computed using the 1000G EUR reference panel (https://alkesgroup.broadinstitute.org/LDSCORE/) and subsequently restricted to the HM3 SNPs. We used S-LDSC to test for heritability enrichment controlling for 53 annotations including coding regions, enhancer regions, 5’ and 3’ UTRs, and others as previously described36 (baseline v1.1, referred to as baseline model hereafter). We also used MAGMA with both 5kb up, 1.5kb down and 100kb symmetric windows to test for enrichment. MAGMA gene-level analysis was performed with the 1000G EUR LD reference panel to account for LD structure, and gene-set analysis was performed including covariates for gene length, variant density, inverse minor allele count (MAC), as well as log-transformed versions of these covariates. Statistical tests for both S-LDSC and MAGMA were one-sided, considering enrichment only. For both methods, we included the relevant superset of genes as a control to ensure that our analysis was competitive (Supplementary note). We refer to this approach as the ‘usual approach’. All enrichment effect size estimates and p-values are available in Table S8.
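The two window definitions can be illustrated with a small helper. This sketch assumes that "up" and "down" are interpreted relative to the direction of transcription, so the window is mirrored for genes on the minus strand; the coordinates and function names are hypothetical.

```python
def gene_window(start, end, strand, upstream, downstream):
    """Genomic interval (lo, hi) covering a gene plus flanking windows.

    start/end are genomic coordinates with start <= end; upstream and
    downstream are window sizes in base pairs relative to transcription.
    """
    if strand == "+":
        return max(0, start - upstream), end + downstream
    if strand == "-":
        return max(0, start - downstream), end + upstream
    raise ValueError("strand must be '+' or '-'")


def snps_in_window(snp_positions, window):
    """Positions of SNPs falling inside the closed interval window."""
    lo, hi = window
    return [p for p in snp_positions if lo <= p <= hi]


# 5kb upstream, 1.5kb downstream window for a plus-strand gene.
asym = gene_window(100_000, 120_000, "+", 5_000, 1_500)
# 100kb symmetric window for the same gene.
sym = gene_window(100_000, 120_000, "+", 100_000, 100_000)
```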
Enrichment analysis of genes comprising the mitochondrial proteome
We obtained the set of nuclear-encoded mitochondria-localizing genes using MitoCarta2.022 and used the literature to obtain the subset of MitoCarta genes involved in inherited mitochondrial disease49 as well as those producing components of oxidative phosphorylation (OXPHOS) complexes. We used both S-LDSC and MAGMA to test for enrichment in the usual way (Methods) controlling for the set of protein-coding genes to ensure a competitive analysis (Supplementary note). We also tested mitochondria-localizing genes for enrichment in meta-analyses using S-LDSC and MAGMA with the same parameters as for UKB traits (Supplementary note).
Tissue-expressed gene-set enrichment analysis
To obtain the set of genes most expressed in a given tissue versus others, we obtained t-statistics computed from GTEx v6 gene-level transcript-per-million (TPM) data corrected for age and sex as published previously50. For each tissue, we selected the top 2485 genes (10%) with the highest t-statistics for tissue-specific expression, producing tissue-expressed gene-sets. We selected nine tissues based on expectation of enrichment for our tested traits in UKB (e.g., liver for LDL levels, esophageal mucosa for GERD). We used both S-LDSC and MAGMA to test for enrichment in the usual way (Methods) controlling for the set of tissue-expressed genes to ensure a competitive analysis (Supplementary note). Tissue-expressed gene-set analyses were performed on meta-analyses with S-LDSC and MAGMA on the same tissues using the same parameters as used in UKB.
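Selecting the tissue-expressed gene-set amounts to taking the top decile of genes by t-statistic. A minimal sketch, assuming a dictionary of per-gene t-statistics for one tissue and rounding the cutoff to the nearest integer (the tie-handling and names are hypothetical):

```python
def top_fraction(tstats, fraction=0.10):
    """Genes with the highest t-statistics, keeping the given fraction."""
    n_top = round(len(tstats) * fraction)
    ranked = sorted(tstats, key=tstats.get, reverse=True)
    return set(ranked[:n_top])


# Hypothetical t-statistics for 20 genes in one tissue.
example = {f"gene{i}": float(i) for i in range(20)}
tissue_set = top_fraction(example)
```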
Power analysis
To test for the effects of gene-set size on power, we selected ten positive control tissue-trait pairs based on (1) the presence of tissue enrichment in UKB with S-LDSC and MAGMA and (2) the biological plausibility of the observed enrichment. The pairs tested were liver-HDL, liver-LDL, liver-TG, liver-cholesterol, pancreas-glucose, pancreas-type 2 diabetes, atrial appendage-atrial fibrillation, sigmoid colon-diverticular disease, coronary artery-myocardial infarction, and visceral adipose-HDL. We then, in brief, used an empirical sampling-based approach, generating random subsamples of a selected set of tissue-expressed gene-sets at four different gene-set sizes (1523, 1105, 800, and 350 genes), defining power as the proportion of trials showing a significant enrichment (Supplementary note). We used the same sub-sampled gene-sets for enrichment analysis using both S-LDSC and MAGMA in the usual way (Methods) controlling for the set of tissue-expressed genes to ensure a competitive analysis (Supplementary note). We used the same gene-sets among the subset of the positive control traits that showed enrichment in the corresponding meta-analysis to verify power for the meta-analyses (Supplementary note).
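The sampling-based power estimate can be sketched as follows. The enrichment test itself is abstracted as a caller-supplied function `is_significant`, standing in for an S-LDSC or MAGMA run; the number of trials and all names are assumptions for illustration.

```python
import random


def estimate_power(gene_set, size, is_significant, n_trials=100, seed=0):
    """Proportion of random subsamples of gene_set showing significant enrichment.

    is_significant: callable taking a set of genes and returning True/False.
    """
    rng = random.Random(seed)
    genes = sorted(gene_set)
    hits = 0
    for _ in range(n_trials):
        subsample = set(rng.sample(genes, size))
        if is_significant(subsample):
            hits += 1
    return hits / n_trials
```

In practice the same subsamples would be reused across methods so that power estimates are comparable.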
Cross-tissue eQTL analysis
We obtained the set of eGenes from GTEx v8 across 49 tissues (https://www.gtexportal.org), filtering to only include cis-eQTLs with q-value < 0.05. To determine how the landscape of cis-eQTLs for MitoCarta genes compared to other protein-coding genes, we regressed the number of tissues with a detected cis-eQTL for a given gene x, $N^{\mathrm{eQTL}}_x$, onto an indicator for membership in a given organellar proteome, controlling for gene length, log gene length, breadth of expression ($\tau_x$), and the number of tissues with detected expression > 5 TPM ($N^{\mathrm{expr}}_x$, Supplementary note). To quantify breadth of expression, we obtained median-per-tissue GTEx v8 TPM expression values and computed τ109 after removing lowly-expressed genes with maximal cross-tissue TPM < 1, defined as:
$$\tau_x = \frac{\sum_{i=1}^{n} \left(1 - \hat{x}_i\right)}{n - 1}, \qquad \hat{x}_i = \frac{x_i}{\max_{1 \le j \le n} x_j}$$
where $x_i$ is the expression of gene x in tissue i with n tissues. τ ranges from 0 to 1, with lower τ indicating a broadly expressed gene and higher τ indicating more tissue-specific expression patterns. Because GTEx sampled multiple tissue subtypes (e.g., brain sub-regions) that show correlated expression profiles110 which bias $\tau_x$, $N^{\mathrm{eQTL}}_x$, and $N^{\mathrm{expr}}_x$ upward, for each broader tissue class (brain, heart, artery, esophagus, skin, cervix, colon, adipose) we selected a single representative tissue when computing these quantities (Figure S14B, Supplementary note). We used LD scores computed from the 1000G EUR reference panel.
The model, fit via OLS for each tested organelle, was:
$$N^{\mathrm{eQTL}}_x = \beta_0 + \beta_1 I_x + \beta_2 L_x + \beta_3 \log L_x + \beta_4 \tau_x + \beta_5 N^{\mathrm{expr}}_x + \varepsilon_x$$
where $I_x$ indicates membership of gene x in the tested organellar proteome, $L_x$ is the length of gene x, and $\varepsilon_x$ is the residual.
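The breadth-of-expression index τ used as a covariate can be computed as in the sketch below, which follows the standard definition of τ as the average, over n tissues, of one minus expression normalized to the maximum across tissues, with an n − 1 denominator. It assumes raw (untransformed) TPM values as input; the function name and example values are hypothetical.

```python
def tau(expression):
    """Tissue-specificity index: 0 for uniform expression, 1 for one tissue only.

    expression: per-tissue expression values for a single gene.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical median-per-tissue TPM profiles.
broad = tau([5.0, 5.0, 5.0, 5.0])      # uniformly expressed
specific = tau([10.0, 0.0, 0.0, 0.0])  # expressed in one tissue only
mixed = tau([10.0, 5.0, 0.0])
```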
mtDNA-wide association study
We obtained mtDNA genotype data on 265 variants genotyped on the UK Biobank Axiom array and the UK BiLEVE array from the full UKB release25. To perform variant QC, we used evoker-lite111 to generate fluorescence cluster plots per-variant and per-batch and manually inspected the results, removing 19 variants due to cluster plot abnormalities (Table S5, Supplementary note). We additionally removed any variants with heterozygous calls, with within-array-type call rate < 0.95, or with fewer than 20 individuals carrying an alternate genotype. For case-control traits, we removed any phenotype-variant pair with an expected case count of alternate genotype individuals of less than 20, resulting in a maximum of 213 variants tested per trait (Supplementary note). To perform sample QC, we restricted samples to the same samples from which UKB summary statistics were generated (https://github.com/Nealelab/UK_Biobank_GWAS), namely unrelated individuals within 7 standard deviations of the first 6 European sample selection PCs with self-reported white-British, Irish, or White ethnicity and no evidence of sex chromosome aneuploidy. We additionally removed any samples with within-array-type mitochondrial variant call rate < 0.95, resulting in 360,662 unrelated samples of EUR ancestry. We generated the LD matrix for mitochondrial DNA variants using Hail v0.2.51 (https://hail.is) pairwise for all 213 variants tested across all post-QC samples.
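The automated part of the variant QC can be sketched as a simple per-variant check. This is an illustration only: it assumes genotypes for a single array type coded as 0 (reference), 1 (heterozygous call), 2 (alternate), or None (missing), and it omits the manual cluster-plot review. The coding and names are hypothetical.

```python
def variant_passes_qc(genotypes, min_call_rate=0.95, min_alt=20):
    """Apply call-rate, heterozygous-call, and alternate-count filters."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) < min_call_rate:
        return False
    if any(g == 1 for g in called):
        # Heterozygous calls are unexpected for mtDNA array genotypes.
        return False
    return sum(g == 2 for g in called) >= min_alt
```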
We ran mtDNA-GWAS for all 21 UKB age-related phenotypes as well as creatinine and AST using Hail v0.2.51 via linear regression controlling for the first 20 PCs of the nuclear genotype matrix, sex, age, age², sex*age, and sex*age² as performed for the UKB GWAS (https://github.com/Nealelab/UK_Biobank_GWAS). We also used Hail to run Firth logistic regression with the same covariates for case/control traits (Table S1). As we observed that some mitochondrial DNA variants were specific to array type, we also ran linear regression including array type as a covariate; we did not perform logistic regression with array type as a covariate due to convergence issues secondary to complete separation of variants assessed on only one array type. We defined mtDNA-wide significance using a Bonferroni correction.
Enrichment analysis of components of organellar proteomes
COMPARTMENTS (https://compartments.jensenlab.org)53 is a resource integrating several lines of evidence for protein localization predictions including annotations, text-mining, sequence predictions, and experimental data from the Human Protein Atlas. We used this resource to obtain the degree of evidence (a number ranging from 0 to 5) linking each gene to localization to one of 12 organelles: nucleus, cytosol, cytoskeleton, peroxisome, lysosome, endoplasmic reticulum, Golgi apparatus, plasma membrane, endosome, extracellular space, mitochondrion, and proteasome. To avoid noisy localization assignments due to weak text mining and prediction evidence, we only considered localization assignments with a score > 2 as described previously53. We subsequently assigned compartment(s) to each gene by selecting the compartment(s) with the maximal score within each gene. We only included compartments containing over 240 genes due to limited power at these smaller gene-set sizes and used MitoCarta2.022 to obtain a higher confidence set of genes localizing to the mitochondrion, resulting in gene-sets representing the proteomes of 10 organelles. S-LDSC and MAGMA were used to test for enrichment across the UKB age-related traits for these gene-sets in the usual way, controlling for the set of protein-coding genes. S-LDSC was also used to obtain estimates of the percentage of heritability explained by each organelle gene-set.
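The compartment assignment rule can be sketched as follows, assuming a nested dictionary of per-gene, per-compartment confidence scores. The data layout and function name are hypothetical; the thresholds (score > 2, more than 240 genes) are those stated in the text.

```python
from collections import defaultdict


def assign_compartments(scores, min_score=2.0, min_genes=240):
    """Assign each gene to its top-scoring compartment(s).

    scores: dict gene -> dict compartment -> confidence score (0 to 5).
    Returns dict compartment -> set of genes, keeping only compartments
    with more than min_genes genes.
    """
    members = defaultdict(set)
    for gene, comp_scores in scores.items():
        kept = {c: s for c, s in comp_scores.items() if s > min_score}
        if not kept:
            continue
        best = max(kept.values())
        for compartment, score in kept.items():
            if score == best:  # ties assign the gene to every top compartment
                members[compartment].add(gene)
    return {c: g for c, g in members.items() if len(g) > min_genes}


# Hypothetical scores for four genes.
example = {
    "G1": {"nucleus": 5, "cytosol": 3},
    "G2": {"nucleus": 4, "cytosol": 4},
    "G3": {"lysosome": 2},
    "G4": {"cytosol": 2.5},
}
assigned = assign_compartments(example, min_genes=1)
```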
Enrichment analysis of spatial components of the nucleus
To produce interpretable sub-divisions of the nucleus, we used Gene Ontology (GO)54,55 to identify terms listed as children of the nucleus cellular component (GO:0005634). We used Ensembl version 99112 to obtain a first pass set of genes annotated to each sub-compartment of the nucleus (or its children). After manual review of sub-compartments with > 90 genes, we selected nucleoplasm (GO:0005654), nuclear chromosome (GO:0000228), nucleolus (GO:0005730), nuclear envelope (GO:0005635), spliceosomal complex (GO:0005681), nuclear DNA-directed RNA polymerase complex (GO:0055029), and nuclear pore (GO:0005643). We excluded terms listed as ‘part’ due to poor interpretability and manually excluded similar terms (e.g., nuclear lumen vs nucleoplasm). To generate a high confidence set of genes localizing to each of these selected sub-compartments, we then turned to the COMPARTMENTS resource, which assigns localization confidence scores for each protein to GO cellular component terms. We assigned members of the nuclear proteome to these selected nuclear sub-compartments using the same approach outlined for the organelle analysis (Methods). After filtering our selected sub-compartments to those containing > 240 genes, we obtained four categories: nucleoplasm, nuclear chromosome, nucleolus, and nuclear envelope. The nuclear chromosome annotation largely overlapped with a manually curated high-quality list of transcription factors56 but was not exhaustive; as such, we merged these lists to generate the chromosome and TF category. To improve interpretability, we removed genes from nucleoplasm that were also assigned to another nuclear sub-compartment, constructed a list of other nucleus-localizing proteins not captured in these four sub-compartments, and included only genes annotated as localizing to the nucleus (Methods).
S-LDSC and MAGMA were used to test for enrichment across the UKB age-related traits for these gene-sets in the usual way while controlling for the set of protein-coding genes (Methods).
Enrichment analysis of functionally distinct TF subsets
We used a published curated high-quality list of TFs56 to partition the Chromosome and TF category into transcription factors and other chromosomal proteins. To determine which TFs are broadly expressed versus tissue specific, we computed τ per TF across all selected tissues after removing lowly-expressed genes with maximal cross-tissue TPM < 1 (Methods, Supplementary note). The threshold for tissue-specific genes was set at τ ≥ 0.76 based on the location of the central nadir of the resultant bimodal distribution (Figure S14A). To identify terciles of TFs by age, we obtained relative gene age assignments for each gene previously generated by obtaining the modal earliest ortholog level across several databases mapped to 19 ordered phylostrata113. DNA binding domain (DBD) annotations for the TFs were obtained from previous manual curation efforts56. S-LDSC and MAGMA were used to test for enrichment across the UKB age-related traits for these gene-sets in the usual way while controlling for the set of protein-coding genes (Methods). We also tested TFs for enrichment in meta-analyses using S-LDSC and MAGMA with the same parameters as for UKB traits (Supplementary note).
Analysis of constraint across organelles and sub-organellar gene-sets
We obtained gene-level gnomAD v2.1.1 constraint tables (https://gnomad.broadinstitute.org), haploinsufficient genes, and olfactory receptors61 (https://github.com/macarthur-lab/gene_lists). Constraint values as loss-of-function observed/expected fraction (LOEUF) were mapped to genes within organelle, sub-mitochondrial, sub-nuclear, and TF binding domain gene-sets.
Enrichment analysis across age-related disease holding constraint as a covariate
To test for enrichment with constraint as a covariate, we used MAGMA with UKB age-related traits. We mapped variants to genes and performed the gene-level analysis as done previously for the mitochondria-localizing gene and organelle analysis. We included LOEUF and log LOEUF as covariates for the gene-set analysis in addition to the default covariates (gene length, SNP density, inverse MAC, as well as the respective log-transformed versions) via the –condition-residualize flag.
Acknowledgements
We thank D. Altshuler, S.E. Calvo, H. Finucane, E.S. Lander, M.E. MacDonald, D. Palmer, E.B. Robinson, A.V. Segrè, M.E. Talkowski, R.K. Walters, C.C. Winter, and members of the Mootha and Neale labs for critical feedback and discussions. This research has been conducted using the UK Biobank Resource under Application Number 31063. This project was supported in part by grants (NIH R35GM122455 to V.K.M. and NIH T32 AG000222 to R.G.) from the National Institutes of Health. VKM is an Investigator of the Howard Hughes Medical Institute.
Footnotes
In this manuscript update, the abstract, introduction, and discussion were modified to more clearly highlight the previous literature linking organelle dysfunction and the contrast we observe between this literature and our results.