Abstract
Background Poorer self-rated health (SRH) predicts worse health outcomes, even when adjusted for objective measures of disease at time of rating. Twin studies indicate SRH has a heritability of 18-60% and that its genetic architecture may overlap that of personality and cognition.
Methods We carried out a genome-wide association study (GWAS) of SRH on 111 749 members of the UK Biobank sample. Univariate GCTA-GREML analyses were used to estimate the proportion of variance explained by all common autosomal SNPs for SRH. LD score regression and polygenic risk scoring were used to investigate pleiotropy between SRH in UK Biobank and up to 21 health-related and personality and cognitive traits from published GWAS consortia.
Results The GWAS identified 13 independent signals associated with SRH, including several in regions previously associated with diseases or disease-related traits. The strongest signal was on chromosome 2 (rs2360675, p = 1.77×10−10) close to KLF7, which has previously been associated with obesity and type 2 diabetes. A second strong peak was identified on chromosome 6 in the major histocompatibility region (rs76380179, p = 6.15×10−10). The proportion of variance in SRH that was explained by all common genetic variants was 13%. Polygenic scores for the following traits and disorders were associated with SRH: cognitive ability, education, neuroticism, BMI, longevity, ADHD, major depressive disorder, schizophrenia, lung function, blood pressure, coronary artery disease, large vessel disease stroke, and type 2 diabetes.
Conclusions Individual differences in how people respond to a single item on SRH are partly explained by their genetic propensity to many common psychiatric and physical disorders and psychological traits.
Key Messages
Genetic variants associated with common diseases and psychological traits are associated with self-rated health.
The SNP-based heritability of self-rated health is 0.13 (SE 0.006).
There is pleiotropy between self-rated health and psychiatric and physical diseases and psychological traits.
Introduction
There is considerable evidence that how individuals respond to one simple question asking them to evaluate their current state of health is a powerful predictor of future health outcomes. Poorer self-rated health (SRH) has been associated with increased mortality from all causes1–4 and from several specific causes including cardiovascular disease5–7, diabetes8, respiratory disease8, cancer8 and infectious disease.8 Poorer SRH has also been linked in prospective studies to an increased risk of the onset of certain diseases, in particular, heart disease9–11, cancer11 and type 2 diabetes12, with a higher likelihood of incident admission to psychiatric hospital11, and with increased incidence of cognitive or functional impairment13. People with a greater burden of chronic disease are more likely to rate their health as poor or fair14 but, in general, adjustment for objective measures of disease, common risk factors, and health behaviours at the time that individuals rated their health explains only a small part of the association between SRH and later morbidity or mortality.
Evidence for the heritability of SRH comes from several twin studies15–17, which provide estimates of the percentage variance explained by genetic factors which range from ~20% to ~60%18,19. Studies using molecular genetic methods also provide evidence for heritability: for instance, the GCTA-GREML method20 was used to estimate that common SNPs account for 18% of the variation in SRH (N = 4233).21 A multivariate twin study22 indicated appreciable genetic overlap between SRH and the phenotypically-correlated traits of optimism and self-rated mental health. However, there were also substantial genetic influences unique to SRH (see also23). In addition, Svedberg et al.24 showed that SRH and (measured) cognitive ability have a shared genetic basis using twin models; for older adults, genetic factors were entirely responsible for the phenotypic relation between SRH and cognitive ability. To date, studies have been insufficiently powered to detect variants from individual genes that relate to SRH. Previous research suggests that perceptions of health are driven in part by psychological factors. There is evidence that people who are higher in the personality trait neuroticism—the tendency to experience negative emotions—are more likely to rate their health as being poor25–27, and have a steeper decline in health ratings over time.28 Another psychological factor that has been linked with poorer SRH in cross-sectional surveys is lower cognitive ability. 
While there is some indication that poorer perception of health can be a risk factor for subsequent cognitive decline29, longitudinal evidence suggests that having lower cognitive ability in youth increases the risk of poorer SRH decades later.30 Part of this link may be due to lower educational attainment—itself consistently linked with poorer SRH.31–33 It has been suggested that psychosocial resources may enable the highly educated to cope better with the negative effects of worsening health, and that this may in part explain why such individuals have better SRH.33,35
The aim of the present study was to add substantially to the understanding of the genetic mechanisms and genetic architecture of SRH. Using the large UK Biobank genotyped sample, we conducted a genome-wide analysis of SRH, estimated its SNP-based heritability, and studied its pleiotropy with physical and mental health and with personality and cognitive traits.
Methods
Cohorts and measures
The UK Biobank is a health resource facilitating the study of the origins of a wide range of illnesses.36 Around 500 000 individuals aged between 37 and 73 years were recruited in the United Kingdom between 2006 and 2010. They underwent testing of cognitive abilities, physical and mental health examinations, completed questionnaires about lifestyle, socio-demographic background and family medical history, and agreed to have their health followed longitudinally. For the present study, genome-wide genotyping data were available for 112 151 individuals (58 914 females, 53 237 males) aged 40 to 73 years (mean = 56.91 years, SD = 7.93).
Ethics
UK Biobank received ethical approval from the Research Ethics Committee (REC reference 11/NW/0382). This study has been completed under UK Biobank application 10279.
Self-rated health
Participants were asked the question, “In general how would you rate your overall health?”. Possible answers were “Excellent/Good/Fair/Poor/Do not know/Prefer not to answer”. We created a four-category SRH variable indexing how each participant rated their health, ranging from “excellent” to “poor”, and excluded those who responded with “do not know” or “prefer not to answer”. For the phenotypic correlations, LD score regression and polygenic profile score analyses used in this study, a higher score for SRH indicates a better health rating.
Neuroticism
Participants completed 12 questions of the Eysenck Personality Questionnaire-Revised Short Form (EPQ-R Short Form)37,38 neuroticism scale. Neuroticism refers to the relatively stable personality trait that assesses individual differences in the tendency to experience negative emotions. A summary score was derived to obtain a measure of neuroticism. The EPQ-R Short Form has been shown to correlate highly with other well-validated Neuroticism scales39, and has shown a high genetic correlation (0.91) with psychological distress examined in a non-psychiatric population using the 30-item General Health Questionnaire.40
Education
Education was measured by the question, “Which of the following qualifications do you have? (You can select more than one)”. Possible answers were: “College or University Degree/A levels or AS levels or equivalent/O levels or GCSE or equivalent/CSEs or equivalent/NVQ or HND or HNC or equivalent/Other professional qualifications e.g. nursing, teaching/None of the above/Prefer not to answer”. For the present study, a binary education variable was created to indicate whether or not a participant had a college or university-level degree; excluding those who responded with “prefer not to answer”. Previous studies have used similar binary variables as a ‘proxy-phenotype’ for cognitive ability.41
Intelligence
Intelligence was measured by a thirteen item-test with a time limit of two minutes, completed by 36 035 individuals. Six items were verbal and seven numerical. An example of a verbal question is ‘Bud is to flower as child is to?’ (Possible answers: ‘Grow/Develop/Improve/Adult/Old/Do not know/Prefer not to answer’). An example of a numerical question is ‘If sixty is more than half of seventy-five, multiply twenty-three by three. If not subtract 15 from eighty-five. Is the answer?’ (Possible answers: ‘68/69/70/71/72/Do not know/Prefer not to answer’). The Intelligence score was the total score out of thirteen. The Cronbach α coefficient for the thirteen items was 0.62.
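For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's α is conventionally computed from item-level scores. The toy data are invented and the function name is hypothetical; nothing here reproduces the UK Biobank calculation.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-participant item-score lists.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(item_scores[0])
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented toy data: four participants, three items scored 0/1.
# The items agree perfectly here, so alpha is 1.
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
alpha = cronbach_alpha(toy)
```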
Phenotypic Correlations
Phenotypic correlation coefficients were calculated between SRH and each of neuroticism, education, intelligence and mortality in UK Biobank. Cox proportional hazard ratios were calculated for all-cause mortality according to the SRH categories (Poor to Excellent).
Genotyping and quality control
A total of 152 729 UK Biobank samples were genotyped using either the UK BiLEVE (N = 49 979) or the UK Biobank Axiom array (N = 102 750). Array design, genotyping details, and quality control details can be found elsewhere.42 Genotyping was performed on 33 batches of ~4700 samples by Affymetrix. Initial quality control (QC) of the genotyping data was also performed by Affymetrix. Further details are available of the sample processing specific to the UK Biobank project (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155583) and the Axiom array (http://media.affymetrix.com/support/downloads/manuals/axiom_2_assay_auto_workflow_user_guide.pdf). Prior to the release of the UK Biobank genetic data a stringent QC protocol was applied, which was performed at the Wellcome Trust Centre for Human Genetics (WTCHG). Details of this process can be found here (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155580). Prior to the analyses described below, further quality control measures were applied. Individuals were removed sequentially based on non-British ancestry (within those who self-identified as being British, principal component analysis was used to remove outliers), high missingness, relatedness, QC failure in UK BiLEVE, and gender mismatch. A sample of 112 151 individuals remained for further analyses.
Imputation
An imputed dataset was made available in which the UK Biobank interim release was imputed to a reference set combining the UK10K haplotype and 1000 Genomes Phase 3 reference panels. Further details can be found at the following URL: http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=157020. The association analyses were restricted to autosomal variants with a minor allele frequency greater than 0.1% and an imputation quality score of 0.1 or greater (N ~ 17.3m SNPs).
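The variant restriction described above amounts to a simple filter. The sketch below assumes a hypothetical record layout (the keys chrom, effect_allele_freq and info are illustrative and are not taken from the UK Biobank file format).

```python
AUTOSOMES = {str(c) for c in range(1, 23)}
MAF_MIN = 0.001   # minor allele frequency must be greater than 0.1%
INFO_MIN = 0.1    # imputation quality score must be 0.1 or greater


def keep_variant(variant):
    """Return True if a variant passes the autosome, MAF and quality filters."""
    freq = variant["effect_allele_freq"]
    maf = min(freq, 1.0 - freq)
    return variant["chrom"] in AUTOSOMES and maf > MAF_MIN and variant["info"] >= INFO_MIN


# Invented example records.
variants = [
    {"id": "rs_a", "chrom": "2", "effect_allele_freq": 0.25, "info": 0.98},
    {"id": "rs_b", "chrom": "6", "effect_allele_freq": 0.9995, "info": 0.90},  # MAF too low
    {"id": "rs_c", "chrom": "X", "effect_allele_freq": 0.30, "info": 0.95},    # not autosomal
    {"id": "rs_d", "chrom": "18", "effect_allele_freq": 0.01, "info": 0.05},   # poorly imputed
]
kept = [v["id"] for v in variants if keep_variant(v)]
```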
Curation of summary results from GWAS consortia on health-related variables
In order to conduct LD score regression and polygenic profile score analyses between the UK Biobank SRH and the genetic predisposition to psychiatric, physical and cognitive variables, we gathered 21 sets of summary results from international GWAS consortia and three sets of summary results from GWAS of the following UK Biobank variables: neuroticism, education and intelligence. Details of the health-related variables, the consortia’s websites, key references, and number of subjects included in each consortia’s GWAS are given in Supplementary Materials and Supplementary Table 1.
Association analyses
The UK Biobank measure of SRH was adjusted for age, gender, assessment centre, genotyping batch, genotyping array, and 10 principal components for population stratification prior to the association analyses. The distribution of SRH was visually inspected and no exclusions were made; 111 749 individuals with both SRH and genotype information remained for further analyses.
SNPTEST v2.5.143 was used to perform genotype-phenotype association analyses on the imputed dataset. SNPTEST v2.5.1 can be found at the following URL: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html#introduction. The ‘frequentist 1’ option was used to specify an additive model. Genotype dosage scores were used to account for genotype uncertainty.
The number of independent signals for the genotype-phenotype analyses was determined using LD clumping, using the 1000 genomes as a measure of LD between SNPs. First, SNPs with a genome-wide significant association with SRH (p < 5×10−8) were selected as index SNPs. Second, SNPs within 500kb and in LD of r2 > 0.1 with the index SNP were included in the clump. SNPs from within this region were assigned to the clump if they had a P-value <1×10−5.
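The clumping logic described above can be sketched as a greedy procedure. This is an illustrative simplification, not the software used in the study: the SNP records, the LD lookup table and the function name clump are all hypothetical, and in practice r2 would be taken from a reference panel such as the 1000 Genomes data.

```python
def clump(snps, r2, index_p=5e-8, member_p=1e-5, window=500_000, r2_min=0.1):
    """Greedy LD clumping.

    snps: list of dicts with keys 'id', 'chrom', 'pos', 'p'.
    r2:   function (id_a, id_b) -> LD r-squared between two SNPs.
    Returns {index SNP id: [ids of SNPs in its clump, index SNP first]}.
    """
    clumps = {}
    assigned = set()
    for index in sorted(snps, key=lambda s: s["p"]):
        if index["p"] >= index_p:
            break  # remaining SNPs are not genome-wide significant
        if index["id"] in assigned:
            continue  # already absorbed into a stronger signal
        members = [index["id"]]
        assigned.add(index["id"])
        for s in snps:
            if (s["id"] not in assigned
                    and s["chrom"] == index["chrom"]
                    and abs(s["pos"] - index["pos"]) <= window
                    and s["p"] < member_p
                    and r2(index["id"], s["id"]) > r2_min):
                members.append(s["id"])
                assigned.add(s["id"])
        clumps[index["id"]] = members
    return clumps


# Invented toy data: two independent signals on one chromosome.
LD = {frozenset(("rsA", "rsB")): 0.8, frozenset(("rsA", "rsC")): 0.5}
toy_snps = [
    {"id": "rsA", "chrom": "1", "pos": 1_000, "p": 1e-10},
    {"id": "rsB", "chrom": "1", "pos": 2_000, "p": 1e-9},
    {"id": "rsC", "chrom": "1", "pos": 3_000, "p": 1e-6},
    {"id": "rsD", "chrom": "1", "pos": 900_000, "p": 2e-8},
    {"id": "rsE", "chrom": "1", "pos": 4_000, "p": 1e-3},
]
result = clump(toy_snps, lambda a, b: LD.get(frozenset((a, b)), 0.0))
```

The number of keys in the returned dictionary corresponds to the number of independent signals.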
MAGMA44 was used to perform gene-based association analyses. The results of the GWAS were used to derive the gene-based statistics. Genetic variants were assigned to genes based on their position according to the NCBI 37.3 build with no additional boundary placed around the genes; this resulted in a total of 18 116 genes being analysed. The European panel of the 1000 Genomes data (phase 1, release 3) was used as a reference panel to account for linkage disequilibrium. A genome-wide significance threshold for gene-based associations was calculated using the Bonferroni method (α = 0.05/18 116; P < 2.76 × 10−6).
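The gene-based significance threshold quoted above follows directly from the Bonferroni correction, as this short check shows (the function name is illustrative):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 18 116 genes were analysed, with a family-wise alpha of 0.05.
gene_threshold = bonferroni_threshold(0.05, 18116)
formatted = f"{gene_threshold:.2e}"
```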
Functional annotation and gene expression
For the 13 independent genome-wide significant SNPs identified by LD clumping, evidence of expression quantitative trait loci (eQTL) and functional annotation were explored using publicly available online resources. The Genotype-Tissue Expression Portal (GTEx) (http://www.gtexportal.org) was used to identify eQTLs associated with the SNPs. Functional annotation was investigated using the Regulome DB database45 (http://www.regulomedb.org/). Regulome DB was used to identify regulatory DNA elements in non-coding and intergenic regions of the genome in normal cell lines and tissues.
Estimation of SNP-based heritability
Univariate GCTA-GREML20 analyses were used to estimate the proportion of variance explained by all common autosomal SNPs for SRH.
Genetic analyses: DEPICT
DEPICT46 was used to conduct three analyses; gene prioritisation, gene-set analysis, and tissue enrichment. The full GWAS output of SRH was clumped using PLINK to derive independent regions of the genome showing evidence of association. Next, DEPICT was used to determine if these independent regions overlapped with genes that share biological function by comparing the empirically-derived clumps with randomly-selected loci drawn from across the genome and matched for gene density. DEPICT tests the hypothesis that genes showing a true association with SRH will be involved in the same mechanisms that in turn contribute toward this phenotype. Clumping was performed using index SNPs of p < 1×10−5 with a 500kb boundary including SNPs in LD of r2 > 0.1.
Two methods have been used to compute genetic associations between health-related variables from GWAS consortia and SRH in UK Biobank: LD score regression and polygenic profile score analyses, both providing a different metric to examine pleiotropy between two traits. LD score regression was used to determine the degree of overlap in polygenic architecture between two traits by deriving genetic correlations. The polygenic profile score method was used to predict the phenotypic variance in SRH using summary data from GWASs of health-related variables to create polygenic profile scores in the UK Biobank sample. Both LD score regression and polygenic profile score analyses depend on traits being highly polygenic in nature, i.e. a large number of variants of small effect contributing toward phenotypic variation47,48. LD score regression was performed between the 16 health related traits from GWAS consortia and three UK Biobank traits, while the polygenic profile score analyses were performed on the complete set of 21 health related traits from GWAS consortia as this method requires independent samples.
LD score regression
LD score regression uses the information that for a given SNP, the effect size is a function of this particular SNP’s LD with other SNPs.47,49 Assuming a trait with a polygenic architecture, SNPs with high LD will have stronger effects on average than SNPs with low LD. LD score regression estimates the genetic effect on a trait by measuring the extent to which the observed effect sizes from a GWAS can be explained by LD. The covariance between the genetic effects in two traits can be indexed in a similar way; normalizing this genetic covariance by the heritabilities of the two traits yields an estimate of the genetic correlation between them.
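To make the idea concrete, the sketch below regresses GWAS chi-square statistics on LD scores and converts the slope to a heritability estimate, using the expectation E[chi2_j] = (N h2 / M) l_j + intercept, where N is the sample size, M the number of SNPs and l_j the LD score of SNP j. It is a deliberately simplified, unweighted version: the published method uses weighted regression and block-jackknife standard errors, neither of which is shown. All function names and the toy numbers are hypothetical.

```python
from math import sqrt


def ols(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def h2_from_ld_scores(ld_scores, chi2, n_samples, n_snps):
    """Unweighted LD score regression: slope = N * h2 / M, so h2 = slope * M / N."""
    slope, intercept = ols(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept


def genetic_correlation(genetic_cov, h2_a, h2_b):
    """Normalise a genetic covariance by the two trait heritabilities."""
    return genetic_cov / sqrt(h2_a * h2_b)


# Noise-free toy data generated with h2 = 0.2, N = 1000, M = 10, intercept = 1.
ld = [1.0, 2.0, 3.0, 4.0, 5.0]
chi2 = [1.0 + 1000 * 0.2 / 10 * l for l in ld]
h2_hat, intercept_hat = h2_from_ld_scores(ld, chi2, n_samples=1000, n_snps=10)
rg = genetic_correlation(0.1, 0.2, 0.2)
```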
In the present study, LD score regression has been used to derive genetic correlations between summary statistics from 16 health related GWAS consortia and three UK Biobank GWA studies (Intelligence, Education and Neuroticism), and the UK Biobank SRH measure. We followed the data processing pipeline devised by Bulik-Sullivan et al.47 In order to ensure that the genetic correlation for the Alzheimer’s disease phenotype was not driven by a single locus or biased the fit of the regression model, a 500kb region centred on the APOE locus was removed and this phenotype was re-run. This additional model is referred to in the Tables and Figures as ‘Alzheimer’s disease (500kb)’.
Polygenic profile score analyses
The UK Biobank genotyping data required recoding from numeric (1, 2) allele coding to standard ACGT format prior to being used in polygenic profile scoring analyses. This was achieved using a bespoke programme developed by one of the present authors (DCML), details of which are provided in the Supplementary Materials.
PRSice50 was used to create polygenic profile scores from 21 health-related phenotypes of published GWAS in all genotyped participants (Supplementary Table 1). SNPs with a minor allele frequency < 0.01, as well as strand-ambiguous SNPs were removed prior to creating the scores. Clumping was used to obtain SNPs in linkage equilibrium with an r2 < 0.25 within a 200bp window. The scores were calculated as the sum of alleles associated with the phenotype of interest across many genetic loci, weighted by their effect sizes estimated from the GWAS summary statistics. The conventional approach was used to create polygenic profile scores that included variants according to the significance of their association with the phenotype, exceeding five predefined p-value thresholds of 0.01, 0.05, 0.1, 0.5 and all SNPs. Throughout the paper, the most predictive threshold will be presented in the main tables; the full results, including all five thresholds, can be found in Supplementary Table 10.
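The scoring step can be sketched as a weighted sum over SNPs that pass a p-value threshold. This is not the PRSice implementation; the data structures and the function name are hypothetical, clumping and allele harmonisation are assumed to have been done beforehand, and treating "all SNPs" as a threshold of 1.0 with an inclusive comparison is an assumption.

```python
THRESHOLDS = [0.01, 0.05, 0.1, 0.5, 1.0]  # 1.0 stands in for "all SNPs"


def polygenic_score(dosages, weights, p_threshold):
    """Sum of effect-allele dosages weighted by GWAS effect size.

    dosages: {snp id: effect-allele dosage in [0, 2]} for one individual.
    weights: {snp id: (effect size, GWAS p-value)} from summary statistics.
    """
    return sum(
        beta * dosages[snp]
        for snp, (beta, p) in weights.items()
        if p <= p_threshold and snp in dosages
    )


# Invented summary statistics and one invented individual.
weights = {"rs1": (0.2, 0.001), "rs2": (-0.1, 0.03), "rs3": (0.05, 0.7)}
dosages = {"rs1": 2.0, "rs2": 1.0, "rs3": 0.5}
scores_by_threshold = {t: polygenic_score(dosages, weights, t) for t in THRESHOLDS}
```

One score per threshold is produced for each individual; the threshold whose score best predicts the outcome is the one reported in the main tables.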
Regression models were used to examine the associations between the 21 polygenic profiles and SRH, adjusting for age at measurement, sex, genotyping batch and array, assessment centre, and the first ten genetic principal components to adjust for population stratification. All polygenic profile score association analyses were performed in R51, and the obtained p-values from each test were corrected for multiple testing using the False Discovery Rate (FDR) method.52 Sensitivity analyses were performed in order to test whether the results are driven by individuals with a given illness. This was done by removing individuals with a self-reported clinical diagnosis of coronary artery disease (N = 5300), type 2 diabetes (N = 5800) and hypertension (N = 26 912) from the relevant analyses. More details can be found in the Supplementary Materials. Multivariate regression has been performed including all FDR significant polygenic profile scores and the covariates described earlier.
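As an illustration of the multiple-testing step, the sketch below implements the Benjamini-Hochberg step-up adjustment. That the FDR method cited in the text is this particular procedure is an assumption; the analyses themselves were run in R, and the function name and p-values here are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
```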
Results
Phenotypic correlations
Within UK Biobank, 111 749 individuals with genotype data completed the question ‘How would you rate your overall health?’ Their mean (SD) score for SRH was 2.14 (0.73). SRH showed a negative correlation with the measure of neuroticism (r = −0.25, p < 0.0001), indicating that individuals who rate their health as worse had higher levels of neuroticism. Correlations were also found for the UK Biobank measures of intelligence and education (r = 0.146 and 0.110, p < 0.0001), indicating that individuals with higher levels of intelligence or education are more likely to rate their health as better. Cox proportional hazard models for all-cause mortality adjusted for age and sex, indicated that, compared to people with excellent SRH, the risk of dying in those with good, fair or poor SRH is 1.37 (1.17, 1.62), 2.51 (2.12, 2.97), 6.95 (5.79, 8.36) respectively.
Genome-wide association study
A total of 109 SNPs from 12 genomic regions were associated with SRH (Figure 1, Figure 2, and Supplementary Table 2). Thirteen independent signals were identified. The strongest signal was on chromosome 2 and included the gene encoding Kruppel-Like Factor 7 (KLF7). Variants in this gene have previously been associated with obesity53 and type 2 diabetes.54 A second strong peak was identified on chromosome 6. Two independent SNPs were identified within the major histocompatibility complex region (MHC). The MHC consists of a large number of genes that encode a group of cell surface molecules, which have important roles in the immune system. Two independent signals were identified on chromosome 3, one of which was within Bassoon Presynaptic Cytomatrix Protein (BSN), a gene that encodes a scaffold protein expressed in the brain, is involved with neurotransmitter release and was previously associated with Crohn’s disease.55 A single SNP in Sterile Alpha Motif Domain Containing 12 (SAMD12) on chromosome 8 was associated with SRH. This region has previously been linked to diastolic blood pressure.56 A single SNP in Transcription Factor 4 (TCF4, chromosome 18), believed to be important in nervous system development and previously associated with neurodevelopmental disorders and psychiatric diseases was also associated with SRH.57 Single SNPs were also identified in SEC24 Family Member C (SEC24C) involved in vesicle trafficking and Shisa Family Member 9 (SHISA9) a regulator of short-term plasticity in the dentate gyrus, on chromosomes 10 and 16 respectively.58,59
Gene-based analyses (MAGMA)
The gene-based analysis identified 36 genes across 11 genomic regions associated with SRH (Supplementary Table 3). The most significantly associated gene was BSN. Eighteen other genes in this gene dense region of chromosome 3 are also included in the list. This same region previously showed suggestive significance with general cognitive function.60 Four major histocompatibility complex genes (HLA-DQA1, HLA-DQB1, HLA-DRB1 and HLA-DRB5) were also associated with SRH. Other genes of potential interest include: neurexin 1 (NRXN1, chromosome 2), a synaptic adhesion molecule previously associated with neurodevelopmental disorders61; autism susceptibility candidate 2 (AUTS2, chromosome 7), previously associated with neurodevelopmental disorders and cancer62; Zinc Finger Protein 652 (ZNF652, chromosome 17) a zinc finger protein previously associated with blood pressure63; and Additional Sex Combs Like Transcriptional Regulator 3 (ASXL3, chromosome 18), previously associated with cancer.64
GCTA-GREML analysis of SNP-based heritability
The proportion of variance in SRH that was explained by all common genetic variants was 13% (GCTA-GREML estimate 0.13, SE 0.006).
Functional annotation and gene expression
Using the GTEx database (http://www.broadinstitute.org/gtex/), three cis-eQTL associations were identified for the 13 independent genome-wide significant SNPs (Supplementary Table 4). rs907662 on chromosome 1 potentially regulates Mannosidase, Alpha, Class 1A, Member 2 (MAN1A2), previously identified as being differentially expressed in type 2 diabetes patients compared to normal controls.65 rs76380179 and rs7761182 on chromosome 6 potentially regulate a number of major histocompatibility genes. There was evidence of regulatory elements associated with all nine of the independent genome-wide significant SNPs included in the Regulome DB database (http://www.regulomedb.org/) (Supplementary Table 4).
Gene prioritisation, gene set analysis and tissue enrichment
The gene prioritisation analysis, gene set analysis and the tissue enrichment, performed in DEPICT, provided no evidence of association for any of the gene sets or tissue types considered. Full results for these analyses can be found in Supplementary Tables 5, 6 and 7.
To test for pleiotropy between SRH and health-related, personality and cognitive traits, we present LD score regression and polygenic profile analyses. For the purpose of these two analyses, a higher score for SRH indicates a better health rating.
LD score regression
LD score regression was performed to obtain genetic correlations between SRH in UK Biobank and the summary results of the 16 GWAS consortia and three UK Biobank traits (Neuroticism, Education and Intelligence) (Figure 3, Supplementary Table 8). Better SRH showed positive genetic correlations with intelligence (rg = 0.40), education (rg = 0.59), longevity (rg = 0.33), anorexia nervosa (rg = 0.11), and forced expiratory volume in one second (FEV1) (rg = 0.29). Negative genetic correlations were found between better SRH and neuroticism (rg = −0.38), BMI (rg = −0.41), ADHD (rg = −0.38), major depressive disorder (rg = −0.46), schizophrenia (rg = −0.17), systolic and diastolic blood pressure (rg = −0.14 and −0.16), coronary artery disease (rg = −0.33), ischaemic stroke (rg = −0.21), and type 2 diabetes (rg = −0.38). No associations were found for Alzheimer’s disease, bipolar disorder or the ischaemic stroke subtypes (Figure 3, Supplementary Table 8).
Polygenic profile analyses
The results of the polygenic profile score analyses are shown in Table 1, using the most predictive threshold for each trait. The numbers of SNPs included in each polygenic threshold score for each of the 21 health-related traits are shown in Supplementary Table 9. Higher polygenic profile scores for years of education, general and childhood cognitive ability, longevity, and FEV1 were associated with higher levels of SRH (standardised β between 0.01 and 0.06). Higher polygenic profile scores for neuroticism, BMI, ADHD, major depressive disorder, schizophrenia, diastolic and systolic blood pressure, coronary artery disease, large vessel disease stroke, and type 2 diabetes were associated with lower levels of SRH (standardised β between −0.07 and −0.009). No associations were found between SRH and the polygenic profile scores for Alzheimer’s disease, anorexia nervosa, bipolar disorder, ischaemic stroke, cardioembolic stroke, and small vessel disease stroke. The results showed very little change when individuals with self-reported clinical diagnoses of cardiovascular disease, diabetes, and hypertension were removed from the corresponding analyses (coronary artery disease, type 2 diabetes and systolic blood pressure). The results including all five thresholds can be found in Supplementary Table 10.
A multivariate regression model was run including 14 of 15 significant polygenic profile scores (years of education, childhood cognitive ability, general cognitive function, neuroticism, BMI, longevity, ADHD, major depressive disorder, schizophrenia, FEV1, systolic blood pressure, coronary artery disease, large vessel disease stroke, and type 2 diabetes) alongside the same covariates as described previously. Due to the high phenotypic correlation between systolic and diastolic blood pressure, only systolic blood pressure was included in the model. This tested the extent to which including all significant polygenic profile scores in a multivariate model would improve the prediction of SRH and discover which polygenic scores contributed independently. This was done by subtracting the r2 value of the model only including the covariates from the model with both covariates and polygenic profile scores. All polygenic profile scores remained significant, after FDR correction (p < 0.032), in this multivariate model, and together accounted for 1.03% of the variance in SRH (Table 2).
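The variance attributable to the polygenic profile scores is the difference in r2 between the full model and the covariate-only model. The sketch below shows that calculation with a small ordinary least squares fit solved through the normal equations. The function names and toy data are hypothetical, and this is not the authors' R code.

```python
def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(covariates, scores, y):
    """R-squared of (covariates + scores) minus R-squared of covariates alone."""
    full = [list(c) + list(s) for c, s in zip(covariates, scores)]
    return r_squared(full, y) - r_squared(covariates, y)


# Invented toy data in which y = 2 * covariate + 3 * score exactly.
toy_cov = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
toy_score = [[1.0], [0.0], [1.0], [0.0], [1.0], [0.0]]
toy_y = [2 * c[0] + 3 * s[0] for c, s in zip(toy_cov, toy_score)]
gain = incremental_r2(toy_cov, toy_score, toy_y)
```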
Discussion
In the present and other studies, a single-item measure of SRH is associated with mortality. Such SRH items are widely and successfully used in health research. Given their predictive validity, it is of interest to discover the causes of people’s differences in SRH. Here, in analyses of the large UK Biobank sample together with results from many GWAS consortia, we discovered many new genome-wide significant genetic variants associated with SRH. A robust estimate of the SNP-based heritability of SRH was provided. Extensive pleiotropy was found between SRH and many physical and psychiatric disorders and health-related, cognitive and personality traits, indicating that, to a significant degree, the same genetic variants are responsible for the heritability of these traits and SRH. This provides comprehensive new findings on the overlap between how individuals rate their health on a four-point scale and the genetic contributions to intelligence, personality, cardiovascular diseases, and many psychiatric and physical disorders and traits.
The present study identified novel genes/loci associated with individual differences in SRH. These include genes previously associated with diabetes (KLF7, MAN1A2)54,65, neurodevelopmental disorders (TCF4, NRXN1, AUTS2)57,61,62, autoimmune diseases (BSN)55, blood pressure (SAMD12, ZNF652)56,63 and cancer (ASXL3, AUTS2)64,62. These results indicate that genes previously associated with objectively measured diseases are also associated with SRH, perhaps indicating that people’s perception of their health does truly reflect their state of health. The MHC on chromosome 6 was also shown to be associated with SRH. The MHC is vital for the correct functioning of the immune system and therefore genetic variants in this region can have major health implications; for example, HLA-DQA1, which was associated with SRH in our gene-based analysis, has previously been associated with coeliac disease.66
Sensitivity analyses showed that the polygenic profile score analyses for systolic blood pressure, coronary artery disease, and type 2 diabetes were not confounded by individuals with the associated disease (hypertension, cardiovascular disease and diabetes mellitus). This indicates that even in healthy individuals, a higher polygenic profile score for systolic blood pressure, coronary artery disease and type 2 diabetes is associated with lower health ratings. The results of the present study indicate that genetic variants associated with better SRH are associated with a lower genetic risk of neuroticism, but a higher genetic risk of anorexia nervosa. From a previously published positive phenotypic association between anorexia nervosa and neuroticism67, and our finding that individuals who rate their health lower had higher levels of neuroticism, one might have expected that high polygenic risk of anorexia nervosa would be associated with lower SRH. However, the polygenic profile score for anorexia nervosa could be seen on a spectrum, where individuals on the lower end of the spectrum might be more conscious about their eating behaviour and health, leading to better SRH, without exceeding the threshold for a clinical diagnosis of anorexia nervosa. The summary results of the GWAS for anorexia nervosa used for both LD score regression and polygenic profile analyses were based on 2907 cases and almost 15 000 controls.68 It is possible that this GWAS is picking up some degree of predisposition to healthy behaviour. Another explanation for this finding is that individuals with anorexia nervosa potentially have a discrepancy between their SRH and their actual health, due to the body image distortion associated with the disorder. This study was unable to test this hypothesis.
This study shows that the SRH measure, consisting of only one question, is able to reflect the genetic variants of traits and disorders associated with actual health, such as intelligence, personality, cardio-metabolic disease and psychiatric disorders. Genetic variants associated with higher levels of intelligence and lower levels of cardio-metabolic disease are associated with better health ratings. This supports the theoretical construct of bodily system integrity, a latent trait indicating individual differences in meeting health and cognitive challenges from the environment.69 Individuals with better system integrity are likely to have higher levels of intelligence, fewer diseases, better overall health and greater longevity.
The strongest association found in this study is between SRH and the polygenic profile score for BMI, which accounts for 0.45% of the variance. When the polygenic liabilities for multiple traits and disorders are combined in a multivariate model, together they approximately double the amount of variance explained, to 1%. This implies that SRH is affected by risk alleles unique to each trait and disorder.
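The comparison between a single polygenic score and a multivariate model can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the three scores, their effect sizes, the continuous outcome and the helper `r_squared` are hypothetical, covariates such as age and sex are omitted, and none of the numbers are taken from this study. The sketch only shows that scores carrying partly independent signal explain more variance jointly than the best single score does alone.

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical standardised polygenic scores for three traits (simulated, independent).
scores = rng.standard_normal((n, 3))

# Illustrative effect sizes only; not estimates from the study.
effects = np.array([0.07, 0.05, 0.05])
outcome = scores @ effects + rng.standard_normal(n)

# Variance explained by each score alone, then by all scores together.
single = [r_squared(scores[:, [j]], outcome) for j in range(scores.shape[1])]
combined = r_squared(scores, outcome)

print("single-score R^2:", [round(r, 4) for r in single])
print("combined R^2:", round(combined, 4))
```

If the scores were strongly correlated with one another, the combined model would add little beyond the best single score, so an increase in variance explained is consistent with each score contributing partly unique signal.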
A strength of this study is the large sample size of UK Biobank, which permits powerful and robust tests of pleiotropy between SRH and many health-related traits. A further strength is that all individuals were of white British ancestry, minimising population stratification. Genotyping and quality control were performed in a consistent way across the whole sample. The use of summary data from many international GWAS consortia allowed a detailed examination of pleiotropy between SRH and a wide range of health-related traits, yielding many novel estimates of genetic correlations between traits.
The present study has some limitations. The GWAS summary data curated to perform LD score regression and create polygenic profile scores often originated from consortia studies, which involve meta-analyses across datasets with substantial heterogeneity in sample size, genome-wide imputation quality, and measurement of the traits. For the polygenic profile analyses, we might have overestimated the effects because of possible overlap between individuals in the UK Biobank sample and some of the cohorts within some of the GWAS consortia. We were unable to quantify the exact overlap, but the number of overlapping individuals is probably small and we judge that this will have a minor effect on the results. Because the analyses were restricted to individuals of white British ancestry, we are unable to generalize the results beyond that group. Therefore, these analyses should be replicated in large samples of individuals with different backgrounds.
Summary
Measuring people’s overall health is difficult, because the state of the body and mind can be disrupted in many ways, and people’s perceptions of the same objective bodily state can differ. Notwithstanding this complexity, the response to a single subjective question about whether a person is in good or poor health has proved valid and useful in health research. The present study has identified many genetic contributions to SRH, confirming both the complexity of the contributions to the phenotype and its partial foundations in genetic differences. The single subjective item of SRH picks up contributions from many background systems, including mental and physical health, as well as cognitive abilities and personality.
Funding
This work was supported by The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross council Lifelong Health and Wellbeing Initiative (MR/K026992/1). Funding from the Biotechnology and Biological Sciences Research Council (BBSRC) and Medical Research Council (MRC) is gratefully acknowledged. This research was conducted using the UK Biobank Resource. AMM, JMW and IJD are supported by Wellcome Trust Strategic Award 104036/Z/14/Z.