Abstract
Cognitive functioning is heritable, with metabolic risk factors known to accelerate age-associated cognitive decline. Identifying the genetic underpinnings of cognition is thus crucial.
Here, we undertake single-variant and gene-based association analyses of six neurocognitive phenotypes spanning six cognitive domains in whole-exome sequencing data from 157,160 individuals in the UK Biobank to elucidate the genetic architecture of human cognition. We further identify genetic variants that interact with APOE, a significant genetic risk factor for cognitive decline, to influence cognition while controlling for lipid and glycemic risks. Additionally, we conduct bivariate analyses with lipid and glycemic traits to identify pleiotropic effects and to highlight suggestive mediation effects of metabolic risks on cognition.
We report 18 independent novel loci associated with five cognitive domains while controlling for APOE isoform-carrier status and metabolic risk factors. Most of our novel variants lie in genes whose functions in synaptic plasticity and connectivity, oxidative stress and neuroinflammation could also affect cognition. Variants in or near these loci show genetic links to cognitive functioning in association with APOE, to Alzheimer’s disease and related dementia phenotypes, and to brain morphology phenotypes, and are also eQTLs that significantly control expression of their corresponding genes in various brain regions. We further report four novel pairwise interactions between exome-wide significant loci and APOE variants influencing episodic memory and simple processing speed while accounting for serum lipid and glycemic traits. Our gene-based analysis identifies both APOC1 and LRP1 as significantly associated with complex processing speed and visual attention, and both also exhibit significant interaction effects with APOE variants on visual attention. Variants in APOC1 and LRP1 act as significant eQTLs regulating their expression in the basal ganglia and cerebellar hemispheres, regions crucial to visual attention. Taken together, our findings suggest that APOC1 and LRP1 have plausible roles along pathways of amyloid-β, lipid and/or glucose metabolism in affecting visual attention and complex processing speed. Interestingly, variants in MTFR1L, PPFIA1, PCDHB16 and ATP2A1 show evidence of pleiotropy and of mediation through serum glucose/HDL levels, affecting four different cognitive domains.
This is the first report from a large-scale exome-wide study with evidence underscoring the effect of LRP1 on cognition. Our research highlights a novel set of loci that augments our understanding of the genetic underpinnings of cognition during ageing, accounting for co-occurring metabolic conditions that, in addition to APOE, can confer genetic risk of cognitive decline; this can aid in finding causal determinants of cognitive decline.
Introduction
Cognition refers to a plethora of mental processes that guide the acquisition, transformation, storage, retrieval and implementation of information, and is key to good health. Understanding genetic predispositions for inter-individual differences in age-related cognitive decline is of paramount importance for healthy ageing. Genome-wide studies on cognition have shown that intelligence in humans is heritable and that individual differences can be explained by genetic variation.1–5 Non-invasive neuropsychological cognitive assessments serve as dependable endophenotypes to assess brain functioning in healthy ageing and dementia.6,7
The APOE locus confers the highest genetic risk for Alzheimer’s dementia and is also known to be associated with nonpathological cognitive ageing.8 ApoE is the major apolipoprotein that plays a central role in maintaining homeostasis in the brain via transport and clearance of lipids and amyloid beta (Aβ). Several other age-associated metabolic disorders, namely obesity, type 2 diabetes, dyslipidemia and cardiovascular disease, can act as modifiable risk factors for cognitive impairment.9 Interplay among ApoE, lipid homeostasis, brain glucose and Aβ trafficking has been reported in animal models of Alzheimer’s disease.10
In this study, we decipher the genetic underpinnings of cognitive functioning while considering the effects of putative interrelations with metabolic risk factors in the UK Biobank. We also identify variants in crucial genes that work in conjunction and interact with APOE in influencing cognitive functioning at a granularity of specific cognitive domains in the presence of lipid and glycemic metabolic risk factors.
Materials and Methods
Samples and participants
We present our analysis based on whole exomes of 200,643 individuals enrolled in UK Biobank (approved project-ID 55652).11
Phenotypes
We consider six cognitive domains (simple processing speed, episodic memory, fluid intelligence, working memory, visual attention and complex processing speed), assessed in UK Biobank participants with the ‘Reaction time’, ‘Pairs’, ‘Reasoning’, ‘Digit recall’, ‘Trail making’ and ‘Digit-symbol substitution’ cognitive tests, respectively (https://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=8481) (further details in Supplementary information).
Genetic data and quality control
We download the UK Biobank population-level exome OQFE files for ~200k exomes in pVCF format (Field id: 23156) using the ‘gfetch’ utility. After extensive quality checks (details in Supplementary information), we retain 157,160 individuals with 211,012 variants (Supplementary Table 1).
Heritability
Before proceeding to association analyses, we assess the heritability of the six cognition phenotypes in unrelated individuals using the LDAK12 model (Supplementary information). Our heritability estimates (Supplementary Table 2) show good concordance with evidence from previous family-based studies and the GWAS ATLAS resource.13
APOE-carrier status determination
Out of 157,160 samples, 93 have missing genotype information for APOE at rs7412, rs429358, or both. We determine APOE-carrier status by flagging samples with at least one copy of the ε4 allele as risk, those with at least one copy of ε2 as protective/beneficial, and ε1/ε3 and ε3/ε3 carriers as neutral, and include this status as a covariate in the association models (Supplementary Table 3).
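As a minimal sketch of how the two isoform-defining SNPs could be turned into the three-level covariate described above, the Python function below assumes genotypes are coded as allele dosages: copies of the rs429358 C allele (which tags ε4) and of the rs7412 T allele (which tags ε2). The function name and the `ambiguous` parameter are hypothetical. Because the genotypes are unphased, the double heterozygote cannot be resolved into ε2/ε4 versus ε1/ε3; the sketch labels it neutral by default, following the ε1/ε3 reading implied above, which is an assumption rather than the study’s documented rule.

```python
from typing import Optional


def apoe_carrier_status(
    rs429358_c: Optional[int],
    rs7412_t: Optional[int],
    ambiguous: str = "neutral",
) -> Optional[str]:
    """Map unphased APOE SNP dosages (0, 1 or 2) to a three-level carrier covariate.

    rs429358_c: copies of the rs429358 C allele (tags epsilon-4).
    rs7412_t:   copies of the rs7412 T allele (tags epsilon-2).
    ambiguous:  label for the (1, 1) double heterozygote, which unphased data cannot
                split into e2/e4 vs e1/e3 (assumption: the caller chooses).
    Returns None for missing genotypes or combinations that require an e1 allele.
    """
    if rs429358_c is None or rs7412_t is None:
        return None  # missing genotype: sample is excluded
    genotype = (rs429358_c, rs7412_t)
    if genotype == (0, 0):
        return "neutral"  # e3/e3
    if genotype == (1, 1):
        return ambiguous  # e2/e4 or e1/e3
    if rs7412_t == 0:
        return "risk"  # (1, 0) e3/e4 or (2, 0) e4/e4
    if rs429358_c == 0:
        return "protective"  # (0, 1) e2/e3 or (0, 2) e2/e2
    return None  # (1, 2), (2, 1), (2, 2): require an e1 allele
```

Rare combinations that would need an ε1 allele alongside ε2 or ε4 return `None` here; how such samples were handled is not stated in this section.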
Statistical Analyses
Single variant association
With genetic data on the resulting 157,067 samples and 211,012 variants, we perform single-variant Wald tests using rvtests.14 Our baseline model controls for age, gender, educational qualification, the top 10 principal components and APOE-carrier status (Supplementary information). Further, to control for age-related metabolic conditions that can adversely affect cognition, we separately add lipid levels (serum total cholesterol, HDL and LDL direct cholesterol, triglycerides), glucose and HbA1c levels as covariates to the baseline model in models 2 and 3 (Supplementary information). We obtain the residuals of each phenotype after regressing out the model covariates and test the inverse-normalized residuals against the genotype of each variant (Supplementary information). We use Manhattan plots (Supplementary Fig. 1-4) and QQ plots (Supplementary Fig. 5-6) to visualize our results and significant hits. To ascertain novelty, we use the LDtrait module of LDlink15 to check whether variants in high LD (r2 > 0.8) with our significant variant and within ± 500 kbp of it were previously associated with any trait listed in the EBI-GWAS catalogue.16
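The residualize-then-test procedure can be illustrated with a simplified Python sketch (NumPy/SciPy). It assumes a rank-based inverse normal transform with Blom’s offset and an ordinary least-squares Wald test per variant; this approximates the idea but is not the rvtests implementation, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def inverse_normal(x):
    """Rank-based inverse normal transform with Blom's offset (c = 3/8)."""
    ranks = stats.rankdata(x)  # ties receive average ranks
    return stats.norm.ppf((ranks - 0.375) / (len(x) + 0.25))


def residual_wald_test(y, covariates, genotype):
    """Regress y on covariates, inverse-normalize the residuals, then test each variant.

    y: (n,) phenotype; covariates: (n, k) without intercept; genotype: (n, m) dosages.
    Returns (beta, se, p) arrays of length m from per-variant simple regressions.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    z = inverse_normal(y - X @ coef)
    zc = z - z.mean()

    betas, ses, ps = [], [], []
    for g in genotype.T:
        gc = g - g.mean()
        sxx = gc @ gc
        beta = gc @ zc / sxx
        resid = zc - beta * gc
        se = np.sqrt(resid @ resid / (n - 2) / sxx)
        betas.append(beta)
        ses.append(se)
        ps.append(2 * stats.norm.sf(abs(beta / se)))  # two-sided Wald test
    return np.array(betas), np.array(ses), np.array(ps)
```

Testing the transformed residuals keeps heavy-tailed cognitive scores (such as reaction times) from dominating the per-variant tests.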
Gene-based association
We perform gene-based association tests in 157,067 individuals with a kernel-based method (SKAT17) and a unified kernel- and burden-based method (SKAT-O18) to detect cumulative burden in genes that work concomitantly with APOE. Figure 1 represents the possible pathways through which APOE affects neuronal dysfunction and hence cognition. Considering genes along these pathways helps capture the genetic basis of cognition in association with lipid homeostasis, glucose metabolism and amyloid-β pathogenesis, which plausibly play vital roles in modulating cognition with age.
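The SKAT statistic for a quantitative trait can be sketched in Python as below. This is a simplified illustration, not the SKAT package: it assumes a linear kernel with the commonly used Beta(1, 25) minor-allele-frequency weights and, instead of the exact mixture-of-chi-squares p-value that SKAT computes (e.g. via Davies’ method), it uses a Satterthwaite moment-matching approximation, which is less accurate in the extreme tail. SKAT-O would additionally combine this statistic with a burden statistic over a grid of weights ρ; that step is omitted. The function name `skat_test` is hypothetical.

```python
import numpy as np
from scipy import stats


def skat_test(y, G, X):
    """Linear-kernel SKAT score test for a quantitative trait (illustrative sketch).

    y: (n,) phenotype; G: (n, m) variant dosages in one gene;
    X: (n, p) covariates including an intercept column.
    Returns (Q, p_value).
    """
    n, p = X.shape
    # Null model y = X b + e, fitted by least squares
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)

    # Beta(MAF; 1, 25) weights up-weight rarer variants
    maf = G.mean(axis=0) / 2.0
    maf = np.minimum(maf, 1.0 - maf)
    w = stats.beta.pdf(maf, 1, 25)
    Gw = G * w  # column j scaled by w_j

    # Score statistic Q = r' G W^2 G' r / sigma^2
    s = Gw.T @ resid
    Q = s @ s / sigma2

    # Under H0, Q ~ sum_j lambda_j chi2_1, with lambda_j the eigenvalues of
    # W G' P G W, where P projects out the covariates X
    PGw = Gw - X @ np.linalg.lstsq(X, Gw, rcond=None)[0]
    lam = np.linalg.eigvalsh(Gw.T @ PGw)
    lam = lam[lam > 1e-10 * lam.max()]

    # Satterthwaite: match mean and variance with a scaled chi-square a * chi2_df
    a = (lam ** 2).sum() / lam.sum()
    df = lam.sum() ** 2 / (lam ** 2).sum()
    return Q, stats.chi2.sf(Q / a, df)
```

Because Q sums squared weighted per-variant scores, it stays powerful when variants in a gene act in opposite directions, which is the situation a pure burden test handles poorly.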
Pairwise epistasis
We further test the significant loci discovered by our single-variant association tests for pairwise interactions (epistasis) with each of the two APOE isoform-defining SNPs (rs429358 and rs7412), using plink-1.9.0 software.19 We control for all covariates used in models 2 and 3 except APOE status and obtain the inverse-normalized residuals before testing for interaction effects.
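The interaction test can be sketched as a linear model with a product term, y = b0 + b1·g1 + b2·g2 + b3·g1·g2 + e, testing H0: b3 = 0. To our understanding this is the allelic-dosage form PLINK’s `--epistasis` fits for quantitative traits, but the Python below is an illustrative re-implementation with a hypothetical function name, not PLINK itself. With residualized phenotypes, as described above, the optional covariates argument can be left empty.

```python
import numpy as np
from scipy import stats


def interaction_test(y, g1, g2, covariates=None):
    """Fit y ~ 1 + covariates + g1 + g2 + g1*g2 and t-test the interaction term.

    g1, g2: (n,) allele dosages; covariates: optional (n, k) array.
    Returns (beta_interaction, p_value).
    """
    n = len(y)
    parts = [np.ones(n)]
    if covariates is not None:
        parts.append(covariates)
    parts += [g1, g2, g1 * g2]  # interaction term is the last column
    X = np.column_stack(parts)

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    t = coef[-1] / np.sqrt(cov_beta[-1, -1])
    return coef[-1], 2 * stats.t.sf(abs(t), dof)
```

Keeping both main effects in the model matters: without them, a strong main effect of either SNP can masquerade as an interaction.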
Next, we conduct pairwise epistasis tests between the variants in significant gene hits from either the SKAT or SKAT-O model (p-value < 0.0025) and the APOE isoform-defining rs429358 and rs7412, as well as all APOE SNPs, in two different models (Supplementary information).
Bivariate association tests
For each significant variant from our single-variant association tests, we conduct bivariate association tests of the cognitive measure jointly with each lipid or glycemic trait, using the summary statistics from the single-variant Wald tests in metaMANOVA and metaUSAT20 (Supplementary information).
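The core of a summary-statistic bivariate test can be illustrated with the classic multivariate Wald statistic that MANOVA-type methods build on: for a variant with z scores z = (z_cognition, z_metabolic) and null correlation R between them, T = zᵀR⁻¹z follows a chi-square distribution with 2 degrees of freedom under H0. The sketch below assumes R is already known (in practice it is typically estimated from null variants); it does not reproduce metaMANOVA or the data-adaptive combination in metaUSAT, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def bivariate_chi2(z, trait_corr):
    """Omnibus test that a variant affects at least one of two traits.

    z: per-trait z scores for one variant; trait_corr: correlation of the
    z scores under the null (assumed known). Returns (T, p_value).
    """
    z = np.asarray(z, dtype=float)
    R = np.array([[1.0, trait_corr], [trait_corr, 1.0]])
    T = z @ np.linalg.solve(R, z)  # z' R^{-1} z
    return T, stats.chi2.sf(T, df=len(z))
```

Accounting for R prevents double counting when the two traits share signal, for example when cognition and glucose are themselves correlated in the sample.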
Annotation and tissue-expression analysis
We annotate our exome-wide significant hits by mapping them to their nearest genes (web resources) and assess their deleteriousness using CADD21 scores, where higher scores indicate greater deleteriousness. We perform functional annotation of the uncovered variants and calculate LoFtool scores (web resources); the lower the LoFtool score, the more intolerant the gene is to functional variation. We also use GTEx22 to investigate whether our variants are, or lie near, eQTLs that significantly regulate expression of their annotated genes in brain regions.
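CADD PHRED-scaled scores are defined as −10·log10 of a variant’s rank fraction among all possible single-nucleotide substitutions, so a score of 10 marks the top 10% and 20 the top 1%. The small helper below (hypothetical name) inverts that scale to express a score as a "top x%" fraction, which is how deleteriousness is phrased in the Results.

```python
def cadd_top_fraction(phred: float) -> float:
    """Fraction of possible SNVs scoring at least as deleterious as this PHRED score."""
    return 10 ** (-phred / 10)


# Scores reported in this study
for label, score in [("CPEB3 rs3824734", 11.74),
                     ("AMIGO1 rs146766120", 15.97),
                     ("KANSL1 rs17662853", 23.9)]:
    print(f"{label}: CADD {score} -> top {100 * cadd_top_fraction(score):.2f}%")
```

For the three scores above this gives roughly the top 6.7%, 2.5% and 0.4% of possible substitutions, respectively.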
Data availability
All phenotype and genotype data analysed in this study are available from the UK Biobank (https://www.ukbiobank.ac.uk). We will share our in-house scripts with other researchers upon request.
Results
Identification of exome-wide significant variants for cognitive domains
From our exome-wide analysis of 211,012 variants in 157,160 individuals, we identify 20 independent loci associated with five different cognitive domains (summarized in Table 1; detailed results in Supplementary Table 4), with and without controlling for metabolic risks.
Fluid intelligence
We identify a novel rare variant, rs115865641 in PCDHB16 (3’ UTR), associated with fluid intelligence (Supplementary Table 4). PCDHB16 localizes mainly to the post-synaptic compartment and serves as a candidate gene for the specification of synaptic connectivity and neuronal networks,23 a key element of cognition. Controlling for HDL and glucose separately, we obtain two independent significant novel hits: rs17876162 in PON2 (intronic) and rs3824734 (synonymous) in CPEB3 (Supplementary Table 4). PON2 (paraoxonase-2), a mitochondrial enzyme, is more highly expressed in dopaminergic regions such as the striatum, in striatal astrocytes and in cortical microglia,24 suggesting a role in protecting cells from oxidative damage and neuroinflammation.24 CPEB3 is involved in synaptic protein regulation, acting as a negative regulator of the AMPA receptor subunits GluA1 and GluA2 to maintain long-term synaptic plasticity.25 Our novel synonymous variant rs3824734 (CPEB3) has a CADD score of 11.74 (Supplementary Table 4), which places it among the ~7% most deleterious possible substitutions in the genome, and could be crucial for pinpointing the role of this gene in cognition.
Simple processing speed
We detect rs3813363 (5’ UTR of SAMD3) and rs17662853 (a missense variant in KANSL1; CADD score 23.9) (Supplementary Table 4) to be associated with simple processing speed in the baseline model as well as in all models controlling for lipid and glycemic traits. rs3813363 lies within 500 kbp of, and is in high LD (r2 > 0.8) with, both rs11154580 and rs6937866, which are known to be associated with reaction time5 (Fig. 2). The highly deleterious rs17662853 is in high LD (r2 = 0.856) with the intronic rs10775404 (CADD score: 1.782), previously associated with reaction time (Fig. 2),5 suggesting that our variant could be more impactful and more likely to be causal. Koolen-de Vries syndrome (17q21.31 microdeletion syndrome), characterized by intellectual disability, has been attributed to mutations in KANSL1.26 One study showed that autophagosome accumulation at excitatory synapses in KANSL1-deficient neurons leads to reduced synaptic density and reduced transmission via GRIA/AMPA receptors, along with impaired neuronal network activity.27 We also identify two novel variants, rs73922480 and rs77285514 (synonymous and intronic in GPR108, respectively, 80 bp apart), associated with mean reaction time in the baseline model as well as when controlling for HbA1c (Supplementary Table 4). Controlling for HbA1c, we additionally identify the novel rs201404149 (synonymous, MTFR1L), rs3825970 and rs1549317 (synonymous, LARP6) as associated with mean reaction time (Supplementary Table 4). A recent study showed that MTFR1L expression changed in the hippocampus and cerebral cortex of memantine-treated transgenic Alzheimer’s disease mice.28 Memantine is an FDA-approved prescription drug administered to improve learning and memory in moderate-to-severe Alzheimer’s disease, which suggests that our identified locus in MTFR1L is important for understanding cognition in the pathophysiology of Alzheimer’s disease.
We identify the novel hits rs11062991 (intronic, CCDC77) and rs2959174 (synonymous in THAP10; intronic in LRRC49) while controlling for serum HDL (Supplementary Table 4). rs11062991 and rs2959174 are also associated with mean reaction time when we control for glucose and total cholesterol, respectively. Although the immediate relevance of LRRC49/THAP10 to cognition is unclear, our variant in LRRC49/THAP10 is an eQTL for LARP6 in brain regions relevant to cognitive abilities.
Previously known statistically significant effects of our exome-wide significant cognition-associated loci (mapped to nearest genes) on related metabolic and brain structure traits (obtained from the EBI-GWAS catalogue) are highlighted in a pink-purple gradient. Darker colors signify more significant associations; grey signifies no significant association.
Complex processing speed
We find a novel variant, rs12932325 (intronic, GTF3C1), associated with complex processing speed in the baseline model. GTF3C1 loci have been significantly associated with entorhinal cortical thickness, an Alzheimer’s disease-related neuroimaging biomarker (Fig. 2).29,30 Our variant could potentially influence such related traits and thus points to shared genetic mechanisms among cognition, Alzheimer’s disease and grey matter density. Controlling for LDL and total cholesterol levels independently, we detect rs12301915 (intronic, PTPN11) as a novel hit (Supplementary Table 4). PTPN11 is a tyrosine phosphatase that activates the MAPK pathway, plays a critical role in synaptic plasticity and memory formation,31 and interacts with tau in Alzheimer’s patients.32 Mutations in PTPN11 have been associated with numerous syndromes, most commonly Noonan syndrome, which affects cognition,33 as well as with cardiovascular abnormalities and congenital heart defects.34 Controlling for HbA1c, we identify rs71467481 (intronic, PPFIA1) as another novel significant hit (Supplementary Table 4). PPFIA1 encodes the neuronal scaffold protein liprin-α1, which functions at active synaptic zones and post-synaptic sites,35 and has been proposed as a candidate gene in late-onset Alzheimer’s disease etiology.36
Episodic memory
We identify three novel variants, rs146766120 (missense, AMIGO1), rs7780766 (synonymous, PTPN18) and rs111522866 (intronic, ITPR3), associated with episodic memory in the baseline model as well as when controlling for serum HbA1c levels (Supplementary Table 4). From the HbA1c-controlled models, we additionally identify rs3754644 (missense, IQCA1) and rs73529530 (intronic, ATP2A1) as associated with episodic memory (Supplementary Table 4). Upon controlling for LDL, we find another novel variant, rs7725495 (intronic, POLR3G), associated with episodic memory (Supplementary Table 4). rs146766120 (CADD score: 15.97) is among the top ~3% of deleterious substitutions, and AMIGO1 is a commonly altered marker gene in Alzheimer’s patients.37 PTPN18 is a non-receptor tyrosine phosphatase expressed in neural tissues that likely influences Alzheimer’s disease progression.38 ITPR3 encodes inositol 1,4,5-trisphosphate receptor type 3, which mediates the release of intracellular calcium and facilitates crucial inter-organellar Ca2+ signal transmission from the endoplasmic reticulum (ER) to the mitochondria,39 important for maintaining proper cognition. IQCA1, found to be upregulated in the hippocampus of Alzheimer’s-like monkeys compared with normal aged monkeys, is postulated to be associated with brain AMPKα2 activity, which plays a pivotal role in de novo protein synthesis, a process indispensable for long-term synaptic plasticity and memory formation.40 ATP2A1 encodes proteins associated with the mitochondria-associated ER membrane (MAM),41 and disruption at this locus could perturb MAM function, which is posited to play a role in Alzheimer’s disease pathogenesis. Our results thus provide genetic insights into cognition in Alzheimer’s disease.
Visual attention
We detect a novel variant, rs11589562 (intronic, MAST2), associated with visual attention, measured by alphanumeric trail duration, when controlling for HDL levels. rs11589562 can significantly control expression of several nearby genes, including MAST2, in different brain regions.
Genes implicated in cognition from kernel and burden tests
We identify APOC1 as significantly associated with complex processing speed and visual attention in the baseline model and in models controlling for LDL, total cholesterol, triglycerides and HbA1c (Supplementary Table 5). APOC1 (~5 kb downstream of APOE) encodes the smallest of all apolipoproteins, which participates in lipid transport and metabolism, and is known to be pleiotropically associated with serum HDL, LDL, triglyceride, cholesterol and HbA1c levels.42 Animal model studies have indicated a role for APOC1, expressed in astrocytes and endothelial cells of the hippocampus, in cognitive processes in both APOE-dependent and APOE-independent manners.43 rs4420638 (APOC1) has been implicated in general intelligence44 and CSF biomarker levels.45 However, we report the first evidence of APOC1 influencing two specific cognitive domains through the collective burden of all variants in the gene in a human population. We could uncover this effect of APOC1 after accounting for its more plausible effects through lipid and glycemic pathways, thereby highlighting the independent role of APOC1 in cognition and the importance of considering appropriate co-occurring metabolic risks in genetic epidemiological studies.
Controlling for HDL in addition to the baseline covariates, we also identify LRP1 as a significant gene influencing visual attention (Supplementary Table 5). A few targeted studies indicate that LRP1 SNPs and haplotypes influence cognitive performance in Chinese patients at risk of Alzheimer’s disease.46,47 This gene encodes low-density lipoprotein receptor-related protein 1, an endocytic receptor with over 40 ligands, including ApoE and Aβ, that regulates Aβ uptake and clearance across the blood-brain barrier in addition to its signaling role in Alzheimer’s disease pathology.48 Our results provide the first evidence from a large-scale human whole-exome analysis for a role of the elusive LRP1 in visual attention.
Pleiotropy and mediation
Of the 20 independent loci (Fig. 3A), 15 (PCDHB16, PON2, MTFR1L, SAMD3, LARP6, KANSL1, GPR108, PPFIA1, PTPN11, AMIGO1, PTPN18, IQCA1, POLR3G, ITPR3, ATP2A1) exhibit pleiotropic effects on lipid and/or glycemic phenotypes (Supplementary Tables 6-10). Interestingly, four of these 20 loci show suggestive effects on their respective cognitive domains that are mediated through metabolic traits. rs115865641 (PCDHB16), associated with fluid intelligence in our baseline model, is also associated with HDL and glucose, is pleiotropically associated with lipid and glycemic traits, and shows effect sizes of reduced magnitude when we control for HDL and glucose levels separately (Supplementary Table 6). This suggests that, in addition to its pleiotropic effect, the effect of rs115865641 on fluid intelligence could be partially mediated by serum HDL and glucose levels. Similar effects are observed for rs201404149 (MTFR1L), associated with simple processing speed when controlling for HbA1c. rs201404149 is significant in the baseline model and pleiotropically associated with serum glucose levels, and it shows a reduced effect size on mean reaction time when controlling for serum glucose (Supplementary Table 7). This indicates that its effect on reaction time could be partially mediated through serum glucose levels, providing further evidence of metabolic risk affecting cognition. Similarly, the PPFIA1 variant rs71467481 is significant in the baseline model and associated with serum glucose levels, but shows a smaller effect size than in the baseline model when controlling for glucose, implying that its effect on complex processing speed may be mediated through glucose homeostasis. This variant also shows pleiotropic associations with HDL, LDL and glucose levels (Supplementary Table 9).
rs73529530 in ATP2A1 shows association with HDL and glucose levels, in addition to pleiotropic associations with the cognition phenotype, all lipid levels and serum glucose levels. rs73529530 may also affect episodic memory through mediation by serum HDL and glucose levels, as reflected by the reduced magnitude of its effect size relative to the baseline model when the phenotype is controlled for HDL and glucose levels (Supplementary Table 10).
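The "reduced effect size after adjustment" reasoning above corresponds to the difference-in-coefficients idea from mediation analysis. The sketch below (hypothetical function names, plain OLS, no other covariates) compares a variant’s coefficient with and without a candidate mediator, such as serum glucose, and reports the proportional attenuation. It yields a point estimate only, with no confidence interval or formal mediation test, which is why such findings remain suggestive.

```python
import numpy as np


def ols_coef(y, *columns):
    """OLS coefficients for y ~ 1 + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def attenuation(y, g, mediator):
    """Difference-in-coefficients sketch of mediation.

    Returns (beta_base, beta_adjusted, proportion_attenuated) for variant dosage g.
    """
    beta_base = ols_coef(y, g)[1]  # variant effect without the mediator
    beta_adj = ols_coef(y, g, mediator)[1]  # variant effect adjusting for the mediator
    return beta_base, beta_adj, (beta_base - beta_adj) / beta_base
```

In a simulated example where the variant raises the mediator by 0.5 per allele and the mediator raises the phenotype by 0.4 per unit, with a direct effect of 0.2, the base and adjusted coefficients approach 0.4 and 0.2, so about half of the effect is attributed to the mediator.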
(A) Variants and genes we have uncovered that are associated with diverse cognitive domains. Shown are the genes to which our single-variant hits are annotated and the genes identified from gene-based tests (in bold). Variants that are eQTLs for the genes they are mapped to are shown in red; those that are eQTLs for nearby genes, in green; and those that are eQTLs for both their annotated and nearby genes, in light blue. Genes corresponding to variants that are suggestive eQTLs (because of their proximity to eQTL variants) are shown in black. Missense variants are marked with an asterisk (*) beside their corresponding genes.
(B) Circos plot showing pairwise interactions of loci with APOE influencing diverse cognitive domains. Numbers on the periphery of the circle denote chromosomes. Purple lines represent interactions influencing episodic memory, yellow lines interactions influencing simple processing speed, and turquoise lines interactions affecting visual attention. Tables 2 and 3 contain related details.
Expression profile analysis
eQTL analysis of significant loci associated with fluid intelligence
We identify the novel rs115865641 (3’ UTR of PCDHB16) associated with fluid intelligence scores in the baseline model. Controlling for HDL and glucose levels separately, we obtain two independent significant novel hits, rs17876162 in PON2 and rs3824734 in CPEB3. rs3824734 is an eQTL significantly controlling expression of CPEB3 in the cerebellar hemispheres (NES = 0.22, p-value = 3.8 × 10^-5) (Supplementary Table 11), a region known for its role in influencing intelligence.49 Although rs115865641 (a rare variant) is not a significant eQTL for PCDHB16 in GTEx data, we find that all eQTL variants lying within ± 500 kb of our novel variant significantly control expression of PCDHB16 in the cerebrum, which contains the prefrontal cortex, the postulated seat of fluid intelligence,49,50 and also in the cerebellar hemispheres, hippocampus and basal ganglia (Supplementary Table 12). Tissue-specific expression data reveal that PON2 is highly expressed in the frontal cortex, anterior cingulate cortex and basal ganglia (Supplementary Fig. 7), brain areas correlated with fluid intelligence.50–52
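The ±500 kb window lookup used throughout this section can be sketched as a simple filter over eQTL records. The `EQTL` record layout and field names below are hypothetical and do not reproduce the GTEx file schema; positions are assumed to be on the same genome build as the lead variant.

```python
from dataclasses import dataclass


@dataclass
class EQTL:
    """One eQTL record; fields are illustrative, not a GTEx schema."""
    variant: str
    chrom: str
    pos: int
    gene: str
    tissue: str
    nes: float  # normalized effect size
    pval: float


def eqtls_near(lead_chrom, lead_pos, records, gene, window=500_000):
    """Return eQTLs for `gene` within +/- window bp of a lead variant, most significant first."""
    hits = [
        r for r in records
        if r.chrom == lead_chrom
        and abs(r.pos - lead_pos) <= window
        and r.gene == gene
    ]
    return sorted(hits, key=lambda r: r.pval)
```

The first element of the returned list corresponds to the "lead eQTL" for the gene near the association hit; grouping the result by `tissue` would give per-region leads.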
eQTL analysis of significant loci associated with simple processing speed
We find that rs3813363 in SAMD3, the association hit for mean reaction time in all models, is a significant eQTL controlling expression of SAMD3 in cortical regions of the brain (NES = -0.4, p-value = 1.1 × 10^-5) (Supplementary Table 11). Several studies have established that cortical regions are well correlated with reaction time phenotypes assessing the domain of simple processing speed.53 Similarly, rs17662853, the missense hit in KANSL1 for reaction time, is an eQTL significantly regulating KANSL1 expression in the cerebellum (NES = −0.4; p-value = 2.3 × 10^-5) and anterior cingulate cortex (NES = −0.49; p-value = 2.5 × 10^-5) (Supplementary Table 11), regions responsible for perception and motor response, whose coordination is necessary to complete a reaction time task.53,54 rs17662853 is also an eQTL controlling expression of NSFP1, LRRC37A, ARL17A, ARL17B, RP11-798G7.8, NSF and FAM215B in several brain regions, including the cortex and cerebellum (Supplementary Table 11), highlighting that significantly associated variants from exome-wide analysis could regulate expression of nearby genes relevant to the biology of the trait. eQTL variants in GPR108 significantly control expression of GPR108 in various brain tissues, with the lead eQTLs within 500 kbp of our lead SNP significantly controlling GPR108 expression in the cortex (Supplementary Table 12). Our eQTL analysis reveals that loci within ± 500 kbp of rs11062991 (intronic, CCDC77) most significantly regulate expression of CCDC77 in the hypothalamus and cerebellar hemispheres (Supplementary Table 12).
rs2959174 (synonymous in THAP10; intronic in LRRC49) is a significant eQTL regulating high expression of LARP6 in the cerebellum, cerebellar hemispheres, putamen of the basal ganglia, hippocampus and cortex (Supplementary Table 11). Thus, our eQTL analysis reveals another relevant gene, LARP6, for understanding the biology of cognition, even though the identified variant itself annotates to LRRC49 and THAP10, which have less relevance to cognition (https://maayanlab.cloud/Harmonizome/gene_set/Cognition+Disorders/CTD+Gene-Disease+Associations).55 The LARP6 variants identified in our analysis (rs3825970 and rs1549317) are also significant eQTLs controlling LARP6 expression in the cerebellum, cerebellar hemispheres and putamen of the basal ganglia (Supplementary Table 11). eQTLs within 500 kbp of rs201404149 (synonymous, MTFR1L) significantly regulate expression of MTFR1L in the cerebellum, cortex, frontal cortex, cerebellar hemispheres and caudate nucleus of the basal ganglia (Supplementary Table 12).
eQTL analysis of significant loci associated with complex processing speed
The baseline model for this domain yields one significant hit, rs12932325, in an intronic region of GTF3C1. rs12932325 is an eQTL for IL21R (Supplementary Table 11), which impacts Alzheimer’s disease pathology by enhancing brain and peripheral immune and inflammatory responses, leading to increased deposition of Aβ plaques.56 The models controlling for LDL and total cholesterol levels independently both yield a novel intronic variant in PTPN11 (rs12301915) as a significant hit for complex processing speed. The lead eQTL variant within ± 500 kb of this variant significantly controls expression of PTPN11 in the substantia nigra (Supplementary Table 12). Parkinson’s disease causes loss of dopamine-producing neurons in the substantia nigra, and dopaminergic processes have been shown to be involved in cognitive functions such as processing speed.57 Controlling for HbA1c, we identify rs71467481 in an intronic region of PPFIA1 as another novel significant hit for complex processing speed. Our eQTL analysis shows that variants within 500 kbp of this SNP significantly regulate expression of PPFIA1 in many brain regions (Supplementary Fig. 8, Supplementary Table 12).
eQTL analysis of significant loci associated with episodic memory
We identify a significant novel missense variant, rs146766120 in AMIGO1, associated with episodic memory with and without controlling for serum HbA1c levels. AMIGO1 is expressed in astrocytes, the hippocampus and cortical neurons, and is postulated to influence neuron survival.58 Our eQTL analysis likewise shows that variants within 500 kb of rs146766120 significantly influence expression of AMIGO1 in the brain, especially in the cortex (NES = 0.2, p-value = 8.2 × 10^-12) and hippocampus (NES = 0.17, p-value = 1.4 × 10^-11) (Supplementary Table 12, Supplementary Fig. 9), brain areas that interact to encode and retrieve episodic memory,59,60 highlighting the importance of this hit for episodic memory. Similarly, we identify another novel synonymous variant, rs7780766 in PTPN18, both with and without controlling for serum HbA1c levels. Significant eQTL variants within 500 kbp of rs7780766 regulate expression of PTPN18 in the cortex, prefrontal cortex, cerebellum, cerebellar hemispheres, caudate (basal ganglia) and nucleus accumbens (Supplementary Table 12 and Supplementary Fig. 10), consistent with the established crucial role of the cerebellum in episodic memory via cortical-cerebellar brain networks.61 Studies also suggest that memory formation in the hippocampus is guided by the motivational significance of events, whose effect on memory is thought to depend on interactions among the hippocampus, ventral tegmental area and nucleus accumbens.62 The baseline model, as well as the model controlling for HbA1c, also yields rs111522866 in an intronic region of ITPR3 as another novel significant variant for episodic memory. eQTL variants within 500 kb of this variant significantly influence expression of ITPR3 in the cerebellar hemispheres and the caudate (basal ganglia) (Supplementary Table 12). Upon controlling for LDL, we find another novel variant, rs7725495 in an intronic region of POLR3G, associated with episodic memory.
eQTL variants within 500 kb of rs7725495 significantly influence expression of POLR3G in the cerebellum, cortex, anterior cingulate cortex, hypothalamus, nucleus accumbens and putamen (Supplementary Table 12).
We identify two additional novel hits, missense rs3754644 (IQCA1) and intronic rs73529530 (ATP2A1), associated with episodic memory when controlling for HbA1c levels. eQTL variants within 500 kb of rs3754644 also significantly control IQCA1 expression in the amygdala, cerebellar hemispheres, cerebellum, cortex, frontal cortex, anterior cingulate cortex and nucleus accumbens (Supplementary Table 12). eQTL variants within 500 kb of rs73529530 significantly regulate expression of ATP2A1 in the hypothalamus (Supplementary Table 12).
eQTL analysis of significant loci associated with visual attention
We observe an association signal of rs11589562 for visual attention when we adjust for HDL levels. This variant is located in an intronic region of MAST2. eQTL analyses show that this variant controls expression of MAST2 in the cerebral cortex (NES = 0.21, p-value = 7.70E-06) and cerebellum (NES = 0.26, p-value = 4.6E-06) (Supplementary Table 12). The posterior parietal lobe of the cortex is known to assess the visual scene and to interact with the frontal lobes in choosing objects of interest to plan visually guided movement.63 This variant is also an eQTL significantly influencing expression of CCDC163, TESK2 and PIK3R3 in several brain regions (Supplementary Table 11). MAST2 is highly expressed in the hypothalamus and substantia nigra (Supplementary Fig. 11). Several studies have found that oxytocin, synthesized in several nuclei of the hypothalamus, regulates visual attention and eye movements toward external sensory/social stimuli.64 Additionally, dopamine-producing neurons in the ventral tegmental area and substantia nigra have been related to multiple aspects of visual attention.64
Interaction analyses
Our epistasis analysis of significant variants from the single-variant analysis reveals four significant pairwise interactions with the APOE isoform-defining variants (rs7412 and rs429358) for episodic memory and simple processing speed (Table 2; Fig. 3B). Each variant that interacts with either APOE variant also exerts a significant main effect on the phenotype in addition to its interaction effect. These variants are rare and have large effect sizes, consistent with the general expectation that rarer variants tend to have larger effects. Of these interactions, those between rs429358 (APOE) and rs14676612 (AMIGO1) and between rs429358 and rs77807661 (PTPN18) are of particular interest: in both the baseline and HbA1c-controlled models, both variants in each pair exert a significant main effect as well as an interaction effect on episodic memory, even after accounting for the effect of serum HbA1c. ITPR3 and GPR108 variants also show interaction effects with APOE for episodic memory and simple processing speed, respectively. The epistasis analysis based on the gene-based tests reveals nine significant interactions between five APOE SNPs (rs440446, rs143063029, rs769449, rs429358 and rs7412) and eight LRP1 SNPs (Table 3) associated with visual attention, adjusting for HDL levels. It also reveals one significant interaction between rs7412 (APOE) and rs1064725 (3′ UTR of APOC1) associated with alphanumeric trail duration, both in the baseline model and, separately, when controlling for HbA1c.
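The interaction results above distinguish a variant's main effect from its interaction effect with APOE. A minimal sketch on simulated data, assuming an additive linear model y = β0 + β1·g_APOE + β2·g_hit + β3·(g_APOE × g_hit) + β4·age + ε fitted by ordinary least squares, shows how both kinds of effect are estimated jointly and tested with Wald t statistics. This section does not specify the study's exact model, covariate set or software, so the allele frequencies, effect sizes, the single age covariate and the variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 20_000

# Simulated additive genotype dosages (0/1/2); allele frequencies are illustrative.
g_apoe = rng.binomial(2, 0.15, size=n)    # hypothetical APOE allele dosage
g_hit = rng.binomial(2, 0.30, size=n)     # hypothetical exome-wide hit dosage
age = rng.uniform(40, 70, size=n)         # one example covariate

# Hypothetical effects used only to simulate the phenotype:
# intercept, APOE main effect, hit main effect, interaction, age.
true_beta = np.array([0.0, -0.10, 0.20, 0.30, -0.02])

# Design matrix: both main effects plus their product (the epistatic term).
X = np.column_stack([np.ones(n), g_apoe, g_hit, g_apoe * g_hit, age])
y = X @ true_beta + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares estimates, classical standard errors and Wald t statistics.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se

names = ["intercept", "apoe", "hit", "apoe_x_hit", "age"]
for name, b, s, t in zip(names, beta_hat, se, t_stats):
    print(f"{name:>10}: beta = {b: .3f}, se = {s:.3f}, t = {t: .1f}")
```

Keeping both main-effect columns in the design is what allows the product term to be read as an interaction rather than absorbing part of either variant's marginal effect; with real data, metabolic covariates such as HbA1c or HDL would enter as additional columns.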
Discussion
Our study comprehensively examines the genetic architecture of human cognition through single-variant and gene-based association, pairwise interaction, mediation and pleiotropy analyses (Fig. 4).
Fig. 4 legend: Blue boxes represent the data and the quality checks performed; yellow boxes indicate the phenotypes and genes considered for gene-level analysis; red boxes highlight the statistical tests performed; and the purple box indicates downstream analyses such as annotation, gene expression analysis and mediation analysis.
Our single-variant and gene-based association analyses identify novel independent loci in PCDHB16, PON2, CPEB3, LRRC49/THAP10, CCDC77, LARP6, MTFR1L, GPR108, GTF3C1, PTPN11, PPFIA1, AMIGO1, ITPR3, PTPN18, IQCA1, ATP2A1, POLR3G, MAST2, APOC1 and LRP1, as well as previously known loci in KANSL1 and SAMD3, associated with diverse cognitive domains (Fig. 3A), both in baseline models and in models adjusting for serum lipid and glycemic levels, which are postulated to be modifiable metabolic risk factors for cognitive decline. We note that these implicated genes have been linked to Alzheimer’s disease and related dementias through their roles in synaptic plasticity and connectivity, oxidative stress and neuroinflammation. Interestingly, all risk alleles of the single-variant hits are common in the population (allele frequency > 5%), highlighting the relevance of our work for studying the genetic basis of cognitive abilities in the general population and, in turn, the risk factors for cognitive decline. We also obtained significant hits in coding regions that are in LD with genotyped variants associated with reaction time identified by Davies et al.,5 highlighting the value of exome-based analysis in uncovering likely causal associations.
Functional annotation of the significant variants reveals that the majority are rare, have possibly damaging effects on gene function (LoFTool score < 0.25) and explain a comparatively higher proportion of phenotypic variance (Fig. 5). As exceptions, a few common and low-frequency variants also have possibly deleterious effects yet explain a moderate or low proportion of variance owing to their smaller effect sizes (Fig. 5).
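Fig. 5 relates each variant's allele frequency and effect size to the variance it explains. Under the standard additive model with Hardy-Weinberg equilibrium, a variant with minor allele frequency p and per-allele effect β explains approximately 2p(1−p)β²/Var(y) of the phenotypic variance. This section does not state the exact formula used for Fig. 5, so the sketch below is an assumption, and its numbers are hypothetical; it only illustrates how a rare variant with a large effect can explain more variance than a common variant with a small effect.

```python
def pve(maf: float, beta: float, var_y: float = 1.0) -> float:
    """Approximate proportion of phenotypic variance explained by one biallelic
    variant with additive per-allele effect `beta`, assuming Hardy-Weinberg
    equilibrium: 2 * p * (1 - p) * beta**2 / Var(y)."""
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    if var_y <= 0.0:
        raise ValueError("var_y must be positive")
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / var_y


# Hypothetical variants on a standardized phenotype (Var(y) = 1); values are illustrative.
rare_large = pve(maf=0.005, beta=0.50)    # rare variant, large per-allele effect
common_small = pve(maf=0.30, beta=0.03)   # common variant, small per-allele effect

print(f"rare, large effect:   PVE = {rare_large:.6f}")
print(f"common, small effect: PVE = {common_small:.6f}")
```

Because the genotypic variance 2p(1−p) grows only linearly with frequency while the effect enters squared, a large enough effect size lets a rare variant outweigh a far more frequent one.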
Fig. 5 legend: Proportion of phenotypic variance explained by the exome-wide significant variants for diverse cognitive domains versus their intolerance to genic functional changes. Coral and green solid circles represent common and low-frequency variants, respectively, while turquoise and violet solid circles represent rare variants. Circle sizes are proportional to the allele frequencies of the corresponding variants.
While rare variants are generally thought to have larger effects and to be more likely deleterious, our results show that common variants associated with cognition can also map to genes that are intolerant to loss of function.
Of the 20 independent loci (Fig. 3A), rs3824734 (CPEB3), rs3813363 (SAMD3) and rs3825970 (LARP6) are eQTLs significantly controlling expression of their respective genes in the cerebellum, cortex and basal ganglia. rs2959174 (LRRC49/THAP10) and rs12932325 (GTF3C1) are significant eQTLs for nearby genes such as LARP6 and IL21R, respectively. rs11589562 (MAST2) and rs17662853 (KANSL1) are significant eQTLs for their mapped genes as well as nearby genes. For the remaining loci, we observe eQTLs in the vicinity (±500 kb) controlling the expression of the annotated genes in brain regions pertinent to cognition. Thus, our eQTL analysis of the exome-wide significant variants shows that the genes mapped to these variants are highly expressed in brain regions deemed responsible for completing the neuropsychological tasks of the corresponding cognitive domains, supporting the biological relevance of our results. In particular, variants in APOC1 and LRP1 act as significant eQTLs regulating their expression in the basal ganglia and cerebellar hemispheres (Supplementary Table 13), regions crucial to visual attention.64
To our knowledge, our study provides the first evidence of an association between LRP1 and cognition. Furthermore, six of the eight LRP1 SNPs that interact with APOE are rare and the remaining two are of low frequency. Targeted experiments have shown roles for APOC1 and LRP1 in cognitive decline or neurodegeneration;65 however, our interaction analysis (Fig. 3B, Table 3) is the first to identify SNPs in APOC1 and LRP1 acting in conjunction with APOE to govern cognitive abilities, thus providing direct evidence for a role of LRP1 in cognition. In total, we identified 14 pairwise interactions between APOE and our exome-wide associated hits relevant to episodic memory, simple processing speed and visual attention, many of which interestingly involve rare or low-frequency variants (allele frequency 0.12–4%). Our study is also the first to report interactions between APOE and AMIGO1, PTPN18, ITPR3 and GPR108 in influencing cognition or neurodegeneration.
Despite its strengths, this study has limitations that should be considered when interpreting the results. First, although the initial sample size is large (~157,000), the effective sample size of each test is smaller and varies (~27,000–121,000), because we required each participant to have non-missing data on all variables of interest (phenotype and covariates) for all models. Second, our analyses were restricted to individuals of European ancestry, so caution must be exercised when generalizing the results to other ancestries.
Web resources
UK Biobank: https://www.ukbiobank.ac.uk
1000 Genomes Project: https://www.internationalgenome.org/
LDtrait: https://ldlink.nci.nih.gov/?tab=ldtrait
Uniprot: https://www.uniprot.org/
GWAS ATLAS: https://atlas.ctglab.nl/
VEP (LoFTool): https://asia.ensembl.org/Homo_sapiens/Tools/VEP/
Harmonizome: https://maayanlab.cloud/Harmonizome/
GTEx: https://gtexportal.org/home/
NCBI: https://www.ncbi.nlm.nih.gov/
dbSNP: https://www.ncbi.nlm.nih.gov/snp/
UCSC: https://genome.ucsc.edu/
Funding
This study was funded by Science & Engineering Research Board (SERB), Government of India (ECR/2018/001429), Department of Biotechnology, Government of India (BT/RLF/29/2016) and National Supercomputing Mission, Government of India (DST/NSM/R&D_HPC_Applications/2021/03.12).
Competing interests
The authors report no competing interests.
Author’s contributions
B.K. conceived and designed the study. S.C. and B.K. performed the data analysis. S.C. and B.K. wrote the manuscript. S.C. and B.K. prepared the figures. Both authors have read and approved the final manuscript.
Supplementary material
Supplementary material is available at Brain online.
Acknowledgements
We are extremely thankful to Dr. Balaji Jayaprakash for his support and enthusiastic encouragement towards this work. We would also like to thank Mr. Sheldon D’Silva for his help in carrying out quality control processes for the genetic data. We are also grateful to our system administrators Mr. Naveenan Srinivasan, Mr. Karthik Sundaram and Mr. Anand Kumar E for their support in managing technical challenges encountered while carrying out this computational work. We also thank the following funding agencies for funding equipment and data acquisition for this study: Science & Engineering Research Board (SERB), Government of India (ECR/2018/001429); Department of Biotechnology, Government of India (BT/RLF/29/2016); and National Supercomputing Mission, Government of India (DST/NSM/R&D_HPC_Applications/2021/03.12).
We conducted this research using the UK Biobank Resource under application number 55652. UK Biobank’s database includes blood samples, heart and brain scans, and genetic data from its 500,000 volunteer participants and is globally accessible to approved researchers undertaking public health-related research. UK Biobank recruited 500,000 people aged 40 to 69 years across the UK between 2006 and 2010. The organization has more than 150 dedicated staff members based in multiple locations across the UK who collected and stored detailed information about participants’ lifestyle and physical measures, as well as blood, urine and saliva samples, with their consent. Since its inception in April 2012, over 20,000 researchers from 90+ countries have been approved to use this resource, and more than 2000 peer-reviewed papers using it have been published. This resource thus contributes significantly to advances in modern medicine and treatment, enabling better understanding of the prevention and diagnosis of a wide range of serious and life-threatening illnesses, including cancer, heart disease and stroke. To run its operations, UK Biobank receives generous support from its founding funders, the Wellcome Trust and UK Medical Research Council, as well as the British Heart Foundation, Cancer Research UK, Department of Health, the Northwest Regional Development Agency and the Scottish Government. We extend our sincere gratitude to all UK Biobank participants, researchers, clinicians, technicians, administrative staff and funding authorities who enabled the curation of this rich biomedical resource.
Abbreviations
- CADD: Combined Annotation Dependent Depletion
- eQTL: Expression Quantitative Trait Locus
- LD: Linkage disequilibrium
- GWAS: Genome-wide association studies
- HDL: High-density lipoprotein
- LDL: Low-density lipoprotein
- HbA1c: Haemoglobin A1c (glycated haemoglobin)
- SKAT: Sequence Kernel Association Test
- FDA: Food and Drug Administration