Abstract
Background Causes of the association between lower cognitive ability and poorer health remain unknown, but may reflect a shared genetic aetiology as indicated by previous research. This study examines the causal genetic associations between cognitive ability and physical health outcomes.
Method We carried out Mendelian randomization analyses using the inverse variance weighted method to test for causality between later life cognitive ability, educational attainment (as a proxy for cognitive ability in youth), BMI, height, systolic blood pressure, coronary artery disease, and type 2 diabetes in the UK Biobank sample (N = 112 151). Sensitivity analyses were performed using MR-Egger regression.
Results BMI, systolic blood pressure, coronary artery disease and type 2 diabetes showed negative associations with cognitive ability, while height was positively associated with cognitive ability. The Mendelian randomization analyses provided no evidence for a causal association from health to cognitive ability. In the other direction, higher educational attainment predicted lower BMI, systolic blood pressure, coronary artery disease, type 2 diabetes, and taller stature. The Mendelian randomization analyses indicated partly causal associations from educational attainment to health; however, when adjusting for bias using MR-Egger regression, these effects disappeared.
Conclusions The lack of consistent evidence for causal associations between cognitive ability, educational attainment, and physical health could be explained by violations of the Mendelian randomization assumptions, including biological pleiotropy.
Key messages
Cognitive ability and physical health outcomes are positively associated.
Mendelian randomization analyses indicated that educational attainment influenced physical health outcomes.
Sensitivity analyses, using MR-Egger regression, indicated that these associations were biased due to violations of the Mendelian randomization assumptions.
Introduction
Lower cognitive ability, lower educational attainment and greater cognitive decline are all associated with poorer health outcomes1-3. Some of these associations possibly arise because of the effect of lower cognitive ability in childhood on later life health, others because illnesses may lower cognitive ability in later life. The causes of these associations are unclear, but may reflect a shared genetic aetiology. Recent papers have reported genetic associations between cognitive ability and educational attainment, and a number of physical and mental health traits and diseases4-6. These4, 6, and other7-9, papers have shown successful use of educational attainment as a proxy for cognitive ability, showing phenotypic correlations between educational attainment and general cognitive ability around 0.509 and a genetic correlation of 0.724.
Some of the reciprocal phenotypic associations between cognitive and physical health variables, and their genetic correlations, are as follows. Short stature has been consistently linked with lower cognitive ability10, 11. Molecular genetic studies have indicated positive genetic correlations between height and cognitive ability4, 12, as well as between height and educational attainment4, 5. Higher polygenic scores for height have been associated with better cognitive ability in adulthood4.
Multiple studies have shown associations between cognitive ability and cardiovascular risk factors. For example, lower childhood cognitive ability is associated with subsequent high blood pressure13 and obesity14. However, higher BMI in mid-life15 and both hypertension and hypotension16 are associated with lower cognitive ability and greater cognitive decline in later life. A negative genetic correlation has been identified between BMI, but not blood pressure, and educational attainment and cognitive ability in mid to late life4, 5 and a polygenic score for higher BMI is associated with lower cognitive ability in mid to late life and lower educational attainment4, whereas a polygenic score for higher systolic blood pressure is associated with lower educational attainment, but higher cognitive ability in mid to late life4.
Similarly, associations have been identified between cognitive ability and cardio metabolic diseases. Childhood cognitive ability has been associated with developing diabetes17 and coronary artery disease18 later in life. Diabetes19 and coronary artery disease20, 21 in midlife have been associated with greater cognitive decline later in life. A polygenic risk score for type 2 diabetes is associated with lower educational attainment, but not with cognitive ability in mid to late life4, although one has been associated with reduced cognitive decline22. To date no significant genetic correlation between diabetes and cognitive ability has been identified4, 5. A polygenic risk score for coronary artery disease is associated with lower educational attainment and lower mid to late life cognitive ability4, and a negative genetic correlation was identified between coronary artery disease and educational attainment4, 5, but not cognitive ability in mid to late life4.
The question arises, are the genetic associations caused by: 1) genes influencing health traits/diseases, and then those health traits/diseases subsequently influencing cognitive ability; 2) genes influencing cognitive ability, and then cognitive ability subsequently influencing health traits/diseases; 3) genes influencing general bodily system integrity23 that influences both cognitive ability and health traits/diseases?
To make progress in understanding the causality of the correlations between cognitive ability and a number of physical and mental health traits, we used a bi-directional Mendelian randomization (MR) approach with Egger regression24. MR uses genetic variants as proxies for environmental exposures and is subject to the following assumptions: 1) the genetic variants are associated with the exposure; 2) the genetic variants are only associated with the outcome of interest via their effect on the exposure (i.e., there is no biological pleiotropy, also called the exclusion restriction); and 3) the genetic variants are independent of confounders. Figure 1 shows the Mendelian randomization study model; the instrumental variable, based on genome-wide significant SNPs from independent studies for the exposure, is used to estimate if the exposure (e.g. BMI) causally influences the outcome (e.g. cognitive ability). Individual single nucleotide polymorphisms (SNPs) are often found to be weak instruments for investigating causality because they often have small effect sizes. Using multiple SNPs can increase the strength of the instrument. However, this increases the chance of violating the MR assumptions, specifically the assumption that the genetic variants do not affect the outcome via any pathway other than the exposure. In the present study, Egger regression was used as a sensitivity analysis to test for such violations. We used multiple genetic variants for a number of health-related traits and diseases, previously identified in genome-wide association studies, as instrumental variables to see if they predicted cognitive ability (verbal-numerical reasoning) in mid to later life in the UK Biobank.
We then used genome-wide significant educational attainment SNPs as an instrumental variable to test whether genetic differences associated with educational attainment (a proxy measure of cognitive ability in early life6, 8) predict later life health outcomes in the UK Biobank.
Model for Mendelian randomization study. The instrumental variable, based on genome-wide significant SNPs from independent studies for the exposure, is used to estimate if the exposure (e.g. BMI) causally influences the outcome (e.g. cognitive ability). The instrumental variable should be unrelated to potential confounders of the exposure-outcome association and should only affect the outcome via the exposure.
Methods
Sample
This study uses baseline data from the UK Biobank Study, a large resource for identifying determinants of human diseases in middle aged and older individuals25. UK Biobank received ethical approval from the Research Ethics Committee (reference 11/NW/0382). This study has been completed under UK Biobank application 10279. Around 500 000 community-dwelling participants aged between 37 and 73 years were recruited and underwent assessments between 2006 and 2010 in the United Kingdom. This included cognitive and physical assessments, providing blood, urine and saliva samples for future analysis, and giving detailed information about their backgrounds and lifestyles, and agreeing to have their health followed longitudinally. For the present study, genome-wide genotyping data were available on 112 151 individuals (58 914 females) aged 40–70 years (mean age=56.9 years, s.d.=7.9) after the quality control process which is described in more detail elsewhere4.
Measures
Body mass index
Body mass index (BMI) was calculated as weight (kg)/height (m)², and was also measured using an impedance measure, i.e. a Tanita BC418MA body composition analyser, to estimate body composition. We used the average of the two methods when both measures were available (r = 0.99); if only one measure was available, that measure was used (N = 1629). 291 individuals did not have information on BMI. One outlier was excluded based on visual inspection of the BMI distribution (BMI > 50). 111 712 individuals had valid BMI and genetic data.
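The BMI derivation described above can be sketched as follows. This is only an illustration of the averaging rule, not the code used in the study: the function names are hypothetical, and a missing measure is assumed to be represented as None.

```python
from typing import Optional


def bmi_from_weight_height(weight_kg: float, height_m: float) -> float:
    """BMI as weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def combined_bmi(calculated: Optional[float], impedance: Optional[float]) -> Optional[float]:
    """Average the two BMI measures when both exist, else use whichever is present.

    Assumption: None marks a missing measure; returns None when both are missing.
    """
    available = [value for value in (calculated, impedance) if value is not None]
    if not available:
        return None
    return sum(available) / len(available)


# Illustrative values only, not UK Biobank data.
print(combined_bmi(bmi_from_weight_height(70.0, 1.75), 23.1))
```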
Height
Standing and sitting height (cm) were measured using a Seca 202 device. We used standing height and excluded one individual based on the visual inspection of the height distribution with a standing height < 125 cm and a sitting/standing height ratio < 0.75. 111 959 had valid height and genetic data.
Systolic blood pressure
Systolic blood pressure was measured twice, a few moments apart, using the Omron Digital blood pressure monitor. A manual sphygmomanometer was used if the digital blood pressure monitor could not be employed (N = 6652). Systolic blood pressure was calculated as the average of measures at the two time points (for either automated or manual readings). Individuals with a history of coronary artery disease were excluded from the analysis (N = 2513). Following the recommendation by Tobin, Sheehan et al26, 15 mmHg was added to the average systolic blood pressure of individuals taking antihypertensive medication (N = 10 988). Individuals with a systolic blood pressure (after correcting for medication) more than 4 SD from the mean were excluded from further analyses (N = 75). After all exclusions, 106 759 individuals remained with valid blood pressure and genetic data.
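A minimal sketch of the medication correction and outlier exclusion described above is given below. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import List, Sequence

MEDICATION_OFFSET_MMHG = 15.0  # added for participants on antihypertensive medication
OUTLIER_SD = 4.0               # exclusion threshold, in standard deviations from the mean


def corrected_sbp(readings: Sequence[float], on_medication: bool) -> float:
    """Average the readings, then add the medication offset if applicable."""
    average = mean(readings)
    return average + MEDICATION_OFFSET_MMHG if on_medication else average


def exclude_outliers(values: Sequence[float], n_sd: float = OUTLIER_SD) -> List[float]:
    """Keep values within n_sd standard deviations of the mean.

    Assumption: the sample standard deviation (statistics.stdev) is used.
    """
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


# Illustrative values only, not UK Biobank data.
print(corrected_sbp([130.0, 134.0], on_medication=True))
```

Applying the offset before the outlier step mirrors the order given in the text, where exclusion is based on the medication-corrected values.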
Coronary artery disease
UK Biobank participants completed a touch screen questionnaire on past and current health, which included the question “Has a doctor ever told you that you have had any of the following conditions? heart attack/angina/stroke/high blood pressure/none of the above/prefer not to answer”. This was followed by a verbal interview with a trained nurse who was made aware if the participant had a history of certain illnesses and confirmed these diagnoses with the participant. For the present study, coronary artery disease was defined as a diagnosis of myocardial infarct or angina, reported during both the touchscreen and the verbal interview (N = 5288). The control group (N = 104 784) consisted of participants who reported none of the following diseases (based on the non-cancer illness code provided by UK Biobank): myocardial infarction, angina, heart failure, cerebrovascular disease, stroke, transient ischaemic attack, subdural haemorrhage, cerebral aneurysm, peripheral vascular disease, leg claudication/intermittent claudication, arterial embolism.
Type 2 diabetes
Type 2 diabetes case-control status was created using the same method as described by Wood et al27. Cases included participants who reported type 2 diabetes or generic diabetes during the nurse interview, started insulin treatment at least one year after diagnosis, were older than 35 years at the time of diagnosis, and did not receive a diagnosis one year prior to baseline testing (N = 3764). The control group consisted of participants who did not fulfil these criteria, and did not report a diagnosis of type 1 diabetes, diabetes insipidus or gestational diabetes (N = 108 015).
Years of education
As part of the sociodemographic questionnaire in the study, participants were asked, “Which of the following qualifications do you have? (You can select more than one)”. Possible answers were: “College or University Degree/A levels or AS levels or equivalent/O levels or GCSE or equivalent/CSEs or equivalent/NVQ or HND or HNC or equivalent/Other professional qualifications e.g. nursing, teaching/None of the above/Prefer not to answer”. For the present study, a new continuous variable was created measuring ‘years of education completed’. This was based on the ISCED coding, using the 1997 International Standard Classification of Education (ISCED) of the United Nations Educational, Scientific and Cultural Organization28. See Table 1 for further details. For the current study, years of education was used as a proxy phenotype for cognitive ability4, 6, 8. A total of 111 114 individuals had valid data for the years of education variable.
Coding for years of education in UK Biobank based on the ISCED coding.
Information about instrumental variables.
Cognitive ability
Cognitive ability was measured using a 13-item touchscreen computerized verbal-numerical reasoning test. The test included six verbal and seven numerical questions, all with multiple-choice answers, with a two-minute time limit. An example verbal item is: ‘Stop means the same as?’ (possible answers: ‘Pause/Close/Cease/Break/Rest/do not know/prefer not to answer’). An example numerical item is: ‘Which number is the largest?’ (possible answers: ‘642/308/987/714/253/do not know/prefer not to answer’). The cognitive ability score was the total score out of 13 (further detail can be found in Hagenaars et al.4). A total of 36 035 individuals had valid cognitive ability and genetic data.
Covariates
All analyses were adjusted for the following covariates: age when attending assessment centre, sex, genetic batch and array, and the first ten genetic principal components for population stratification.
Instrumental variables
SNPs used in the instrumental variables were extracted from the imputed UK Biobank genotypes interim release including 112 151 individuals after quality control. Details on the quality control process have been published previously4. All instrumental variables were created based on SNPs that reached genome-wide significance in the largest available GWAS in European samples for the variables of interest (BMI29, height27, systolic blood pressure30, coronary artery disease31, type 2 diabetes32 and educational attainment33). SNPs out of Hardy–Weinberg equilibrium (HWE, p < 1×10⁻⁶), with an imputation quality below 0.9, or individual genotypes with a genotype probability below 0.9 were excluded from the instrumental variables. The individual variants were recoded as 0, 1 or 2 according to the number of trait increasing alleles. Table 2 includes information on the number of SNPs included, the reference paper, and the amount of variance explained by the instrumental variables for the corresponding variable of interest. Supplementary Table 1a-f provides details of the included SNPs.
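The genotype-level steps described above (dropping calls with a genotype probability below 0.9 and recoding to the number of trait increasing alleles) might look like the sketch below. The function names and the input layout are assumptions made for illustration: each imputed genotype is taken to be a triple of probabilities for carrying 0, 1 or 2 copies of the coded allele, which the text does not specify.

```python
from typing import Optional, Sequence

GENOTYPE_PROBABILITY_THRESHOLD = 0.9  # minimum probability for a genotype call to be kept


def hard_call(probabilities: Sequence[float],
              threshold: float = GENOTYPE_PROBABILITY_THRESHOLD) -> Optional[int]:
    """Return the most likely count (0, 1 or 2) of the coded allele, or None if uncertain.

    Assumption: probabilities holds P(0 copies), P(1 copy), P(2 copies).
    """
    best = max(range(3), key=lambda count: probabilities[count])
    return best if probabilities[best] >= threshold else None


def trait_increasing_count(call: Optional[int],
                           coded_allele_increases_trait: bool) -> Optional[int]:
    """Recode a call so that it counts trait increasing alleles."""
    if call is None:
        return None
    # If the coded allele lowers the trait, the other allele is the increasing one.
    return call if coded_allele_increases_trait else 2 - call


# Illustrative values only.
print(trait_increasing_count(hard_call((0.02, 0.03, 0.95)), coded_allele_increases_trait=False))
```

Coding every SNP in the trait increasing direction means that all SNP-exposure associations are expected to be positive, which is the orientation that MR-Egger regression relies on.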
Statistical analysis
Phenotypic associations
We performed linear regression analysis using BMI, height, systolic blood pressure, coronary artery disease, and type 2 diabetes to predict cognitive ability. We regressed BMI, height, and systolic blood pressure against educational attainment in a linear regression model; coronary artery disease and type 2 diabetes were regressed against educational attainment in logistic regression models.
Mendelian randomization analysis
The Mendelian randomization analysis was performed using inverse variance weighted regression analysis based on SNP level data, with each instrumental variable (IV) consisting of multiple SNPs24. The inverse variance weighted method is based on a regression of two vectors with the intercept constrained to zero: the vector of genetic variant-exposure associations and the vector of genetic variant-outcome associations (Figure 1). By constraining the intercept to zero, this method assumes that all variants are valid instrumental variables. We performed an association analysis between each SNP in the instrumental variable for the exposure and the exposure itself (IV - exposure), as well as between each SNP in the instrumental variable for the exposure and the outcome (IV - outcome). We then regressed the vector of the IV - outcome associations on the vector of the IV - exposure associations. This regression (vector IV - outcome ~ vector IV - exposure) was weighted by the inverse variance of the original IV-outcome association (derived from its standard error), to correct for minor allele frequency, as described by Bowden et al24. If the inverse variance weighted method indicated a significant effect, we used MR-Egger regression as a sensitivity analysis to test for violations of the instrumental variable assumptions24. This method is similar to the inverse variance weighted method, but uses an unconstrained intercept. The MR-Egger regression gives a bias-adjusted estimate because it uses a weaker version of the exclusion restriction: it assumes that the associations of the genetic variants with the exposure and the direct effects of the genetic variants on the outcome are independent of each other. Because the intercept is not constrained to zero, it also provides a test for directional pleiotropy: an intercept that differs from zero indicates that the instrumental variable assumptions are violated.
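The two estimators described above can be sketched in a few lines. This is a minimal illustration of the point estimates only, not the code used in the study: the function names are hypothetical, the weights are assumed to be 1/SE² of the SNP-outcome associations, and standard errors and significance tests are omitted.

```python
from typing import List, Sequence, Tuple


def _weights(se_outcome: Sequence[float]) -> List[float]:
    """Inverse variance weights: 1 / SE^2 of each SNP-outcome association."""
    return [1.0 / (se * se) for se in se_outcome]


def ivw_estimate(beta_exposure: Sequence[float], beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> float:
    """Weighted regression of SNP-outcome on SNP-exposure effects, intercept fixed at zero."""
    w = _weights(se_outcome)
    numerator = sum(wi * bx * by for wi, bx, by in zip(w, beta_exposure, beta_outcome))
    denominator = sum(wi * bx * bx for wi, bx in zip(w, beta_exposure))
    return numerator / denominator


def mr_egger(beta_exposure: Sequence[float], beta_outcome: Sequence[float],
             se_outcome: Sequence[float]) -> Tuple[float, float]:
    """The same weighted regression with a free intercept; returns (intercept, slope)."""
    # Orient each SNP so that its exposure effect is positive before fitting.
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_exposure]
    x = [s * bx for s, bx in zip(signs, beta_exposure)]
    y = [s * by for s, by in zip(signs, beta_outcome)]
    w = _weights(se_outcome)
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Made-up effect sizes in which every SNP has the same direct (pleiotropic)
# effect of 0.1 on the outcome, on top of a causal effect of 0.2.
bx = [0.10, 0.20, 0.30, 0.40]
se = [0.05, 0.04, 0.05, 0.04]
by = [0.1 + 0.2 * b for b in bx]
print("IVW estimate:", ivw_estimate(bx, by, se))
print("MR-Egger (intercept, slope):", mr_egger(bx, by, se))
```

In this constructed example the zero-intercept fit absorbs the shared direct effect into the slope and so exceeds 0.2, whereas the free intercept in the Egger fit captures it separately. This is the sense in which the Egger slope is described as bias-adjusted.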
Results
Health outcomes predicting cognitive ability
BMI, height, systolic blood pressure, and coronary artery disease predicted performance on the verbal-numerical reasoning test of cognitive ability (Table 3). A 1 SD higher BMI was associated with a 0.05 SD lower score for cognitive ability. A 1 SD greater height was associated with a 0.18 SD higher score for cognitive ability. A 1 SD higher systolic blood pressure was associated with a 0.05 SD lower score for cognitive ability. Individuals with coronary artery disease had, on average, a 0.27 SD lower score for cognitive ability. Individuals with type 2 diabetes had, on average, a 0.06 SD lower score for cognitive ability. The inverse variance weighted analyses, with the five health outcomes as the exposures, and cognitive ability as the outcome, did not provide any genetic causal evidence for any of these associations.
Phenotypic and genetic associations, using Mendelian randomization analysis, between five health instrumental variables and cognitive ability, using the verbal-numerical reasoning test. Significant associations are in bold. OR, odds ratio; MR-IVW, Mendelian randomization - inverse variance weighted method.
Education predicting health outcomes
Educational attainment, as measured by years of education, predicted BMI, height, systolic blood pressure, type 2 diabetes and coronary artery disease (Table 4, Figure 2). In every case, the inverse variance weighted method showed a causal effect of educational attainment on the health outcomes. However, the bias adjusted effect from the MR Egger regression indicated that the association between educational attainment and the five health outcomes was influenced by a violation of the MR assumptions. This suggested that there is no genetic causal association from educational attainment to health outcomes. The full results can be found in Table 4 and Figure 2.
Phenotypic and genetic associations, using Mendelian randomization analysis, between the educational attainment instrumental variable and five health outcomes. Significant associations are in bold. OR, odds ratio; MR-IVW, Mendelian randomization - inverse variance weighted method; MR-Egger, Mendelian randomization – Egger regression.
Comparison of phenotypic (x-axis) and genetic (y-axis) associations with 95% confidence intervals between educational attainment (exposure) and five health variables (outcomes). For the purpose of the figure, the association between height and educational attainment has been flipped, with a negative association indicating lower height and lower educational attainment.
Discussion
This study was designed to investigate causes of the well replicated finding that lower cognitive ability is associated with poorer health outcomes1-3. It used a bidirectional MR approach to investigate this. We found no consistent evidence for causal association between cognitive ability, in middle and older age, and several health outcomes. Inverse variance weighted MR analysis indicated that number of years of education (a proxy measure of cognitive ability in early life4, 8) may lead to lower BMI, greater height, lower systolic blood pressure and a reduced chance of coronary artery disease. However, the MR Egger sensitivity analyses indicated that these associations were driven by a violation of the MR assumptions.
The lack of causal associations could possibly indicate pleiotropy between health and cognitive ability, meaning that sets of genetic variants have independent effects on different phenotypes, rather than a set of genetic variants causally influencing cognitive ability via the related health exposure or vice versa. This would violate the assumption that the genetic variants only influence the outcome via the exposure, and not via other pathways. The idea of pleiotropy between health and cognitive ability is consistent with the theoretical construct of bodily system integrity23, whereby a latent trait is manifest as individual differences in how effectively people meet cognitive and health challenges from the environment, and which has some genetic aetiology.
The MR Egger regression results highlight the importance of performing sensitivity analyses in an MR framework. For example, the paper by Bowden et al.24 applied MR-Egger regression to data previously published on the causal effects of blood pressure on coronary artery disease30. The original study showed a causal effect of genetic variants for blood pressure on coronary artery disease, which indicated directional pleiotropy. The MR Egger regression by Bowden et al.24 subsequently showed that the bias-adjusted estimates were closer to null and non-significant. Many of the original causal estimates in our study between educational attainment and health outcomes indicated directional pleiotropy; however, the MR Egger regression indicated that the instrumental variable assumptions were violated.
Strengths of this study include the large sample size of UK Biobank, the participants of which all took the same cognitive tests, completed the same questionnaires and answered the same interview questions, in contrast to most genetic studies, where assessments across different cohorts often vary. A further strength is the fact that all of the UK Biobank genetic data were processed in a consistent manner, on the same platform and at the same location. The genetic variants on which the instrumental variables were based came from the largest GWAS available at the time of testing.
Limitations of this study include the fact that cognitive ability was only measured on a subset of the UK Biobank participants and that it was a bespoke test. A second major limitation was that there is no published large genome-wide association study of cognitive ability in early life from which we could obtain genetic variants to use as an instrumental variable. Therefore, we used genome-wide significant SNPs associated with educational attainment as our early life cognitive ability instrument.
Instrumental variables for cardiovascular disease, type 2 diabetes, blood pressure, and educational attainment explain a small amount of the variance in the exposure. A good/strong instrumental variable would be expected to explain a substantial amount of the variance of the exposure. The current study used the same dataset (UK Biobank) for estimating the association of the variants with the exposure and outcome. This could lead to bias in the results, as the association between the variants and the exposure is likely to be overestimated, because variants are chosen due to the association with the exposure in the analysed dataset. This could lead to overestimation of the association with the outcome, also called the Beavis effect or winner’s curse34.
Overall, this study found phenotypic cognitive-physical health associations, but did not find consistent evidence for causal associations between cognitive ability and physical health. This could be due to biological pleiotropy or violations of the instrumental variable assumptions. Future work should focus on stronger instrumental variables, as well as better case-control ascertainment.
Funding
This work was supported by The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross council Lifelong Health and Wellbeing Initiative (MR/K026992/1). Funding from the Biotechnology and Biological Sciences Research Council (BBSRC) and Medical Research Council (MRC) is gratefully acknowledged. This research was conducted using the UK Biobank Resource.