Abstract
Major depressive disorder (MDD) is a heritable condition (h2 = 37%)1 and a leading cause of disability worldwide2. MDD is clinically heterogeneous and comorbid with a variety of conditions and it has been hypothesised that this causal heterogeneity may have confounded previous attempts to elucidate its genetic architecture3-5. We applied a relatively new technique, Buhmbox6, to identify the presence of heterogeneous sub-groups within MDD using summary data from genome-wide association studies. We analysed two independent cohorts (ntotal = 31,981) and identified significant evidence (Pcorrected < 0.05) for 10 sub-groups across both cohorts, including subgroups with a liability for migraine, alcohol consumption and eczema. The most notable subgroups (Pcorrected ≤ 2.57 × 10−8 in both cohorts) were for blood levels of cholesterol and triglycerides, and blood pressure, indicating subgroups within MDD cases of individuals with a genetic predisposition for anomalous levels of these metabolic traits. Our findings provide strong evidence for novel causal heterogeneity of MDD and identify avenues for both stratification and treatment.
MDD is a complex and clinically heterogeneous condition that is characterised by symptoms including low mood and/or anhedonia persisting for at least two weeks. Many unique combinations of symptoms may lead to the same diagnosis and it has been suggested that this symptomatic heterogeneity may be due to, as yet unproven, causal heterogeneity7. In support of the causal heterogeneity hypothesis, MDD is frequently observed to be comorbid with many diseases including cancer8, cardiovascular disease,9 and other psychiatric illnesses10,11.
We sought to test the presence of causal heterogeneity in MDD according to a number of disease and quantitative traits using a newly available tool, Buhmbox6. Buhmbox examines the weighted pairwise correlations of the risk allele dosages for these diseases and traits within MDD cases and controls, based on effect size and frequency, and assigns a P-value based on the likelihood of the observed correlations between the cases and controls. We used two cohort studies, Generation Scotland: Scottish Family Health Study (GS:SFHS)12 and UK Biobank13, both of which have whole-genome genotyping data and information relating to MDD status. Study demographics for each cohort are provided in Table 1. Within each cohort, we examined 34 traits with a reported comorbidity with MDD and tested whether evidence of subgroups for these traits could be detected within our MDD cases. Further information regarding the 34 traits and their sources is provided in Supplementary Table 1. For the traits anorexia nervosa, neuroticism and MDD, summary statistics from different publications were assessed and are numbered accordingly, i.e. MDD 1, MDD 2 and MDD 3. In the case of MDD, this allowed us to examine whether different sets of associated loci, drawn from different populations and diagnostic criteria for MDD, would form a heterogeneous subgroup within our GS:SFHS and/or UK Biobank MDD cases.
Study demographics of Generation Scotland: Scottish Family Health Study (GS:SFHS) and UK Biobank
To account for multiple testing, P-values were adjusted using a false discovery rate adjustment and all reported values have been adjusted14. Ten traits showed significant evidence (P < 0.05) of MDD subgroup heterogeneity across both cohorts: total HDL cholesterol levels, total LDL cholesterol levels, serum triglycerides levels, diastolic blood pressure, systolic blood pressure, pulse pressure, alcohol consumption, migraine, eczema and MDD 3. Four traits: Alzheimer’s disease, neuroticism, schizophrenia and being a ‘morning person’ were identified as stratifying traits within one, but not both, cohorts. The subgroup heterogeneity P-values obtained within each cohort and for each trait are shown in Table 2 and Figure 1.
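As an illustration of the multiple-testing step, the following is a minimal Python sketch of a false discovery rate adjustment. It assumes the Benjamini-Hochberg step-up procedure, which the text does not name explicitly, and the raw P-values are hypothetical rather than taken from Table 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Assumption: the paper's "false discovery rate adjustment" is the
    Benjamini-Hochberg procedure; the text does not say so explicitly.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for four traits (illustrative only).
raw = [0.001, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
# A trait is called significant when its adjusted P-value is below 0.05.
significant = [p < 0.05 for p in adjusted]
```

In this toy example only the first trait remains significant after adjustment, although three of the four raw P-values are below 0.05.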
The false discovery rate adjusted P-values for subgroup heterogeneity of the shown disease or quantitative trait within MDD for Generation Scotland: Scottish Family Health Study (GS:SFHS) and UK Biobank. Bold values indicate statistical significance (P < 0.05).
The false discovery rate adjusted P-values for evidence of subgroup heterogeneity of the shown disease or quantitative trait within MDD for both Generation Scotland: Scottish Family Health Study (GS:SFHS; n = 6,946) and UK Biobank (n = 25,035).
The most striking results were those relating to metabolic traits (P ≤ 2.57 × 10−8 across both cohorts). Although these traits are unlikely to be independent of one another, the statistical significance obtained across both studies suggests robust subgroup heterogeneity. Buhmbox does not identify the individuals within the subgroup and future work to address this would aid in determining the degree of overlap between the subgroups underlying the observed P-values. Milaneschi, et al. 15 also reported evidence of a genetic correlation using profile risk scores between triglycerides and a severe atypical MDD subgroup and it would be beneficial to examine the blood pressure and cholesterol traits in other populations and in other social, economic and health care settings. Although elevated blood pressure and cholesterol levels are positively correlated with coronary artery disease (CAD), no evidence (P ≥ 0.05) for a subgroup for CAD was found. This apparent anomaly merits further study, but may reflect stratification independent of CAD.
There is substantial comorbidity between alcohol abuse and MDD16 with studies demonstrating a bidirectional relationship between alcohol and depression17,18. Both cohorts used in our study provided evidence (P < 0.05) of a novel alcohol consumption subgroup within MDD. Ellingson, et al. 19 have suggested that negative emotionality and behavioral control may mediate the genetic overlap between alcohol consumption and MDD. Further work could examine whether the subgroups within our MDD cases could also possess loci that influence those mediatory factors.
Similarly to alcohol consumption, migraine has also been shown to have a bidirectional relationship with depression20. We found evidence (P < 0.05) of subgroup heterogeneity within MDD that had a genetic predisposition for migraine in both of the cohorts that we studied. Ours is the first study to report the existence of a heterogeneous subgroup within MDD cases of individuals with a migraine-like genetic profile.
Subgroup heterogeneity for MDD was observed (P < 0.05) in both GS:SFHS and UK Biobank for the MDD 3 trait obtained from Hyde, et al. 21. MDD 3 was based on a self-reported diagnosis of MDD within a large and predominately European population. The diagnosis of MDD within UK Biobank was also self-reported and the existence of an MDD 3 subgroup suggests that there is a common shared genetic basis to this phenotype, but that it was not shared across all cases. MDD 1 was extracted from a study examining recurrent depression in Han Chinese women within a hospital setting and, although we did not find evidence of a subgroup (P ≥ 0.05), this was not completely unexpected within our population- and UK-based cohorts. No evidence (P ≥ 0.05) was found for an MDD 2 subgroup; however, a polygenic risk score approach has provided evidence of pleiotropy between MDD 2 cases and GS:SFHS cases (P < 1.37 × 10−10) and between MDD 2 cases and UK Biobank cases (P < 1.92 × 10−8) (Hall, et al., manuscript in preparation).
The body’s inflammatory response has been highlighted as a potential contributor to depression22. Crohn’s disease, inflammatory bowel disease and asthma were examined, but neither cohort provided evidence (P ≥ 0.05) for subgroup heterogeneity. Asthma is frequently comorbid with eczema and both cohorts examined in our study provided evidence (P < 0.05) for a subgroup of individuals with a genetic predisposition for eczema within our MDD cases. Associations between eczema and depression are well reported in the literature and potentially mediated by health anxiety23. Chronic inflammation seen in some cases of eczema, and a growing appreciation for the importance of inflammation in depression, provide another possible explanation for subgroup heterogeneity.
A degree of cognitive impairment has been reported in individuals who are currently experiencing a depressive episode24,25. However, we found no evidence (P ≥ 0.05) for MDD subgroup heterogeneity for general fluid cognitive ability. Alzheimer’s disease is also associated with a decline in cognitive ability and previous studies have demonstrated depression to be a risk factor for the disease26,27. Within GS:SFHS, there was evidence (P = 0.006) for a subgroup of MDD cases which harboured the loci associated with Alzheimer’s disease; however, this was not replicated in UK Biobank (P ≥ 0.05). Parkinson’s disease is another condition that is associated with neuropathology that is more likely to occur later in life. We found no evidence (P ≥ 0.05) in either cohort for a subgroup of Parkinson’s disease within MDD cases.
Subgroup heterogeneity was observed (P = 1.13 × 10−12) in UK Biobank for schizophrenia which substantiates the work of Milaneschi, et al. 7 who demonstrated correlated genomic profile risk scores between schizophrenia and a severe typical MDD subtype. However, Han, et al. 6 and our GS:SFHS cohort provided no evidence (P > 0.05) of a schizophrenia subgroup within MDD cases, which suggests that evidence of subgroup heterogeneity for schizophrenia is population and/or diagnosis dependent.
We also examined a number of developmental and personality traits due to the impact that depression can have on social interaction and feelings of self-worth. Evidence of subgroup heterogeneity (P = 3.20 × 10−15) was only found within UK Biobank for neuroticism 2 drawn from the Smith, et al. 28 study. The neuroticism 2 trait had a much greater number of associated loci compared to neuroticism 1 and therefore neuroticism 1 may have been underpowered to detect an effect, but this is also dependent on the effect sizes of the associated loci, the number of MDD cases and the size of any subgroup. There are similarities in the way that individuals respond to stressful events between neuroticism and MDD and it may be the heritable component that underpins this response that is driving the observed subgrouping within UK Biobank.
The diagnosis of MDD within GS:SFHS was based on the DSM-IV criteria29, which includes questions related to sleep and eating patterns. Therefore, sleep duration, being a ‘morning person’ and anorexia nervosa were included in our study. However, it was only within UK Biobank that a significant P-value for being a ‘morning person’ (P = 0.010) was observed. Anorexia nervosa 1 and 2 and sleep duration had low numbers of associated loci available (Supplementary Table 1) and were potentially underpowered to detect an effect.
Rheumatoid Arthritis (RA) is an autoimmune disease which, like many other chronic diseases, has been shown in multiple studies to be comorbid with depression30,31, with a potential subgroup of depressed individuals within RA sufferers32. However, no evidence (P ≥ 0.05) of subgroup heterogeneity was found for RA within either GS:SFHS or UK Biobank MDD cases. Multiple studies have suggested that morphological traits33,34 and type 2 diabetes35 may identify subgroups of individuals with MDD, but we found no evidence (P ≥ 0.05) for subgroup heterogeneity according to these traits in the current study.
The two cohorts used in this study reflect a subsample of the UK population, with additional steps taken to ensure that there were no overlapping individuals. An MDD diagnosis was made using a structured clinical interview within GS:SFHS, whereas UK Biobank cases were defined by a number of self-reported measures. A broad range of traits were assessed and the selection of the summary statistics used was based on the number of individuals analysed, the availability of summary statistics and publication date. Buhmbox measures the correlations within cases, which is independent (or orthogonal) information from the effect size (personal communication with Buhm Han). Therefore, UK Biobank could be used to obtain the summary statistics for neuroticism 2, alcohol consumption and the blood pressure traits, and then also used to assess the existence of MDD subgroup heterogeneity.
Multiple studies have suggested the presence of aetiology subgroups within MDD and Buhmbox provides a quantifiable measure of their existence. Our study has provided replicable evidence of novel subgroup heterogeneity within MDD for a range of disease and quantitative traits, including blood pressure, cholesterol and triglyceride levels, migraine, eczema and alcohol consumption. This research underlines the potential of using genomic data for developing stratified approaches to the diagnosis and treatment of depression.
COMPETING FINANCIAL INTERESTS
The authors declare that no competing financial interests exist.
AUTHOR CONTRIBUTIONS
AMMcI, DJP, BHS, ADM, IJD and CH were involved in the acquisition of the GS:SFHS cohort. Quality control of the GS:SFHS data was conducted by LSH, JDH, MJA, CH and DHM. Imputation of the GS:SFHS data was conducted by TB and CH. Quality control of the UK Biobank data was conducted by MJA and DHM. AMMcI and DMH conceived the initial design of the study with HCW, JDH, YZ, T-KC, MJA, EMW, JG, PAT, CSH, IJD and DJP involved in the ongoing development of the project. DMH conducted the analysis and wrote the paper and all authors have read and approved its submission.
Methods
Generation Scotland: Scottish Family Health Study (GS:SFHS)
The family and population-based Generation Scotland: Scottish Family Health Study (GS:SFHS) cohort12 consisted of 23,960 individuals, of whom 20,195 were genotyped with the Illumina OmniExpress BeadChip (706,786 SNPs). The genotypic data were uploaded to the Michigan Imputation Server36, phased using SHAPE IT v2.r83737 and imputed using the Haplotype Reference Consortium reference panel (HRC.r1-1)38. The imputation of GS:SFHS has been published previously39. We applied an imputation accuracy threshold (infoscore) of ≥ 0.8 and this provided us with a total of 8,633,288 genome-wide variant calls for 20,032 individuals.
A diagnosis of MDD was made using two initial screening questions and the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders (SCID)29. The diagnosis of MDD within GS:SFHS has been described previously40 and in our study, MDD was defined by at least one instance of a major depressive episode. Further to this, we used record linkage to the Scottish Morbidity Record41 to examine the psychiatric history of both case and control individuals. We identified 1,072 control individuals who had attended at least one psychiatry outpatient clinic and we excluded these individuals from our study. Using the psychiatric inpatient records, we identified 47 MDD cases who were also diagnosed with bipolar disorder or schizophrenia and these individuals were also excluded from our study. These participants provided us with prior consent for their anonymised data to be linked to clinical data. As GS:SFHS was a family-based cohort, we created an unrelated subsample using GCTA v1.2242, ensuring that no two individuals shared a genomic relatedness of ≥ 0.025. A further 186 individuals who were identified as population outliers through principal component analyses of their genotypic information43 were also removed. This left a total of 975 MDD cases and 5,971 controls (14.0% prevalence) in the GS:SFHS cohort.
UK Biobank
The population-based UK Biobank13 (provided as part of project #4844) consisted of 152,249 individuals with genomic data for 72,355,667 imputed variants44. This was the standard data release available to all approved researchers of UK Biobank. Detailed information regarding the imputation procedure45 and initial quality control46 is provided elsewhere. In summary, phasing was achieved using a modified version of SHAPE IT 247 with a combined reference panel of the 1000 Genomes phase 3 and the UK10K haplotype reference48 panels, and the IMPUTE2 package49 was used for imputation. We applied an infoscore threshold of ≥ 0.8 which left a total of 24,467,210 variants. We removed individuals listed as non-white British and those individuals who had also participated in GS:SFHS, identified using a checksum approach50 applied to the genotype data.
Of the remaining participants, 25,035 had completed a touchscreen assessment of depressive symptoms and previous treatment. We used the diagnostic definitions of Smith, et al. 51 and defined case status as either ‘probable single lifetime episode of major depression’ or ‘probable recurrent major depression (moderate and severe)’, with control status defined as ‘no mood disorder’. This provided us with a total of 8,508 cases and 16,527 controls (34.0% prevalence) within UK Biobank, a prevalence greater than that observed within GS:SFHS.
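The reported sample sizes and prevalences can be checked with a few lines of arithmetic. The sketch below uses only the case and control counts stated in the text; the function and variable names are illustrative.

```python
def prevalence(cases, controls):
    """Percentage of MDD cases among all analysed individuals."""
    return 100.0 * cases / (cases + controls)


# (cases, controls) as reported for each cohort in the Methods.
cohorts = {"GS:SFHS": (975, 5971), "UK Biobank": (8508, 16527)}

# Combined sample size across both cohorts.
n_total = sum(cases + controls for cases, controls in cohorts.values())

for name, (cases, controls) in cohorts.items():
    n = cases + controls
    print(f"{name}: n = {n}, prevalence = {prevalence(cases, controls):.1f}%")
print(f"n_total = {n_total}")
```

The totals agree with the per-cohort sample sizes of 6,946 and 25,035 and with the combined sample of 31,981 individuals.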
Statistical Approach
Buhmbox v0.336 was used to conduct the statistical analysis and this package requires raw genetic and phenotypic data (disease A) and also summary statistics relating to the additional disease and quantitative traits for testing (disease B). The disease B associated loci were drawn from either published material or from personal communications and are detailed in Supplementary Table 1. The pruning of disease B associated loci was conducted using Plink 1.9052 and the --indep-pairwise command. A 50-variant window, shifted by 5 variants at each step, was applied to the summary statistics and any variants with an r2 > 0.1 were pruned.
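The pruning settings described above correspond to a PLINK 1.9 call of roughly the following shape. This is a sketch only: the file names are hypothetical, and it assumes that linkage disequilibrium was estimated from cohort genotype data after restricting to the disease B variants, which the text does not spell out. The Python helper only assembles the argument list and does not run PLINK.

```python
def plink_prune_command(bfile, snp_list, out_prefix, window=50, step=5, r2=0.1):
    """Build a PLINK 1.9 argument list for LD pruning of disease B loci.

    All file names passed in are hypothetical placeholders.
    """
    return [
        "plink",
        "--bfile", bfile,        # genotype data in PLINK binary format
        "--extract", snp_list,   # restrict to the disease B associated variants
        "--indep-pairwise", str(window), str(step), str(r2),
        "--out", out_prefix,     # PLINK writes <out_prefix>.prune.in and .prune.out
    ]


cmd = plink_prune_command("cohort_genotypes", "diseaseB_snps.txt", "diseaseB_pruned")
print(" ".join(cmd))
# To execute on a machine with PLINK installed:
#   import subprocess; subprocess.run(cmd, check=True)
```

The variants retained after pruning would then be listed in the `.prune.in` output file.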
For GS:SFHS the first 20 principal components were derived from the genotypic data using GCTA v1.2242 and these were fitted within Buhmbox to account for population stratification. For UK Biobank the first 15 genetic principal components53 were fitted. Buhmbox examines whether there is a sharing of risk alleles between the disease B associated loci and the disease A cases (in our case MDD). Buhmbox uses the positive correlations between risk allele dosages in disease A cases to determine whether any sharing of risk alleles is driven by all individuals (pleiotropy) or by a subset of individuals (heterogeneity). The likelihood of observing such positive correlations is used to determine the reported P-values. The Buhmbox software and manual are freely downloadable from http://software.broadinstitute.org/mpg/buhmbox/. The data that support the findings of this study are available on reasonable request from the corresponding author, DMH. The data are not publicly available due to participant confidentiality and the terms of the existing mutual transfer agreements with the respective data repositories.
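To make the correlation-based test in the Statistical Approach more concrete, the following is a simplified Python sketch of a Buhmbox-style statistic. It is not the Buhmbox implementation: it omits the principal component adjustment, assumes a weighting of locus pairs by risk allele frequency and odds ratio in the spirit of Han, et al. 6, and uses toy dosages that are purely hypothetical. All function and variable names are illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (loci must be polymorphic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def buhmbox_like_statistic(case_dosages, control_dosages, raf, odds_ratio):
    """Weighted sum of case-minus-control dosage correlations over locus pairs.

    case_dosages / control_dosages: one list of risk allele dosages per locus.
    raf / odds_ratio: disease B risk allele frequency and odds ratio per locus.
    Returns (z, one-sided P-value); excess positive correlation in cases
    gives a large z, consistent with a subgroup carrying disease B alleles.
    """
    n_case = len(case_dosages[0])
    n_ctrl = len(control_dosages[0])
    scale = math.sqrt(n_case * n_ctrl / (n_case + n_ctrl))
    num, den = 0.0, 0.0
    for i, j in combinations(range(len(raf)), 2):
        delta = (pearson(case_dosages[i], case_dosages[j])
                 - pearson(control_dosages[i], control_dosages[j]))
        # Assumed pair weight: larger for common, strong-effect loci.
        w = 1.0
        for k in (i, j):
            p, g = raf[k], odds_ratio[k]
            w *= math.sqrt(p * (1 - p)) * (g - 1) / ((g - 1) * p + 1)
        num += w * scale * delta
        den += w * w
    z = num / math.sqrt(den)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of a standard normal
    return z, p_value


# Toy data: two loci, perfectly correlated in cases, uncorrelated in controls.
cases = [[0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2]]
controls = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 2, 1, 0]]
z, p_value = buhmbox_like_statistic(cases, controls, raf=[0.3, 0.4], odds_ratio=[1.2, 1.1])
```

Because only correlations between loci within cases and controls enter the statistic, it uses different information from the per-locus effect sizes, which is consistent with the orthogonality argument made in the text.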
ACKNOWLEDGEMENTS
Please refer to the supplementary information for full acknowledgments.
Footnotes
† Consortium members and their affiliations are listed in supplementary information