Abstract
Neuroticism is a personality trait of fundamental importance for psychological wellbeing and public health. It is strongly associated with major depressive disorder (MDD) and several other psychiatric conditions. Although neuroticism is heritable, attempts to identify the alleles involved in previous studies have been limited by relatively small sample sizes and heterogeneity in the measurement of neuroticism. Here we report a genome-wide association study of neuroticism in 91,370 participants of the UK Biobank cohort and a combined meta-analysis which includes a further 7,197 participants from the Generation Scotland Scottish Family Health Study (GS:SFHS) and 8,687 participants from a Queensland Institute of Medical Research (QIMR) cohort. All participants were assessed using the same neuroticism instrument, the Neuroticism scale of the Eysenck Personality Questionnaire-Revised Short Form (EPQ-R-S). We found a SNP-based heritability estimate for neuroticism of approximately 15% (SE = 0.7%). Meta-analysis identified 9 novel loci associated with neuroticism. The strongest evidence for association was at a locus on chromosome 8 (p = 1.28×10−15) spanning 4 Mb and containing at least 36 genes. Other associated loci included genes of interest on chromosome 1 (GRIK3, glutamate receptor ionotropic kainate 3), chromosome 4 (KLHL2, Kelch-like protein 2), chromosome 17 (CRHR1, corticotropin-releasing hormone receptor 1 and MAPT, microtubule-associated protein Tau), and on chromosome 18 (CELF4, CUGBP elav-like family member 4). We found no evidence for genetic differences in the common allelic architecture of neuroticism by sex. By comparing our findings with those of the Psychiatric Genomics Consortium, we identified a large genetic correlation between neuroticism and MDD (0.64) and a smaller genetic correlation with schizophrenia (0.22) but not with bipolar disorder.
Polygenic scores derived from the primary UK Biobank sample captured about 1% of the variance in trait liability to neuroticism. Overall, our findings confirm a polygenic basis for neuroticism and substantial shared genetic architecture between neuroticism and MDD. The identification of 9 new neuroticism-associated loci will drive forward future work on the neurobiology of neuroticism and related phenotypes.
Introduction
Neuroticism is a dimension of personality that has been studied for about 100 years, is present in most personality trait theories and questionnaires, and is found in the lexicons of most human cultures1. Individual differences in neuroticism are highly stable across the life course1,2. Higher neuroticism is associated with considerable public health and economic costs3, premature mortality4, and a range of negative emotional states and psychiatric disorders, including major depressive disorder (MDD), anxiety disorders, substance misuse disorders, personality disorders and schizophrenia5-9. Thus, the study of neuroticism is not only important for understanding an important dimension of personality but may also illuminate the aetiology of a range of psychiatric disorders10,11.
H. J. Eysenck suggested a biological basis for neuroticism over 50 years ago12. Although the biological underpinnings of personality traits are not understood, genetic factors are clearly involved. Twin studies suggest that about 40% of the trait variance for neuroticism is heritable13-18, of which between 15% and 37% is explained by variation in common single nucleotide polymorphisms (SNPs)18,19 and is potentially detectable using the genome-wide association study (GWAS) paradigm. The clear links between neuroticism, psychopathology and other adverse health outcomes - and the implications for global health that would result from a better understanding of its mechanisms20 - provide a strong rationale for large-scale GWAS to identify its genetic architecture (genetic aetiology).
To date, individual GWAS of neuroticism have been limited by modest sample sizes and have delivered equivocal findings. Large meta-analyses of GWAS have also delivered modest findings, possibly as a result of the use of different neuroticism assessment instruments. The Genetics of Personality Consortium, who addressed the issue of different assessment instruments by using item response theory analysis to harmonise neuroticism scores, conducted the largest and most recent study18. The final sample included 73,447 individuals from 29 discovery cohorts plus a replication cohort. Meta-analysis identified a single genome-wide significant associated locus at MAGI1 on chromosome 3 (p=2.38 × 10−8) and in two of the cohorts common genetic variants explained approximately 15% of the variance in neuroticism19.
In the current study, seeking additional associated loci, we used data from the UK Biobank cohort21 to conduct a GWAS of neuroticism. Based on 91,370 participants from the UK, this is the largest GWAS of neuroticism to date and the most homogeneous in terms of ascertainment strategy and assessment methodology. We sought to replicate and extend our UK Biobank GWAS findings within two independent samples (the Generation Scotland Scottish Family Health Study (GS:SFHS)22 and the QIMR Berghofer Medical Research Institute Study in Adults (QIMR) cohort13-15) by conducting meta-analysis across all three samples. Additionally, we evaluated the genetic relationship between neuroticism and three major psychiatric phenotypes for which there are large publicly accessible GWAS datasets: major depressive disorder (MDD); schizophrenia; and bipolar disorder (BD). Finally, we compared our findings with those from the recently published Genetics of Personality Consortium meta-analytic GWAS of neuroticism19.
Materials and methods
Sample
UK Biobank is a large prospective cohort of more than 502,000 residents of the United Kingdom, aged between 40 and 69 years21. Its aim is to study the genetic, environmental, medication and lifestyle factors that cause or prevent disease in middle and older age. Recruitment occurred over a four-year period, from 2006 to 2010. Baseline assessments included social, cognitive, personality (the trait of neuroticism), lifestyle, and physical health measures. For the present study, we used the first genetic data release (June 2015) based on approximately one third of UK Biobank participants. Aiming to maximise homogeneity, we restricted the sample to those who reported being of white United Kingdom (UK) ancestry and for whom neuroticism phenotype data were available (n=91,370).
We also made use of data provided by investigators from the GS:SFHS22 and QIMR cohorts13-15 to replicate and extend our GWAS findings and conduct a meta-analysis. The GS:SFHS sample comprised 7,196 individuals and the QIMR sample comprised 8,687 individuals. Individuals who had participated in both UK Biobank and GS:SFHS were removed from the latter based on relatedness checking using the genetic data.
Note that we were unable to use the data from the Genetics of Personality consortium for replication analysis as that study did not report either standardised regression coefficients (prohibiting inverse variance meta-analysis) or sample sizes (which varied considerably) for each SNP (prohibiting sample size weighted meta-analysis).
This study was conducted under generic approval from the NHS National Research Ethics Service (approval letter dated 17th June 2011, Ref 11/NW/0382) and under UK Biobank approvals for application 6553 “Genome-wide association studies of mental health” (PI Daniel Smith) and 4844 “Stratifying Resilience and Depression Longitudinally” (PI Andrew McIntosh).
Neuroticism phenotype
Neuroticism was assessed in all three cohorts (UK Biobank, GS:SFHS and QIMR) using the 12 items of the neuroticism scale from the Eysenck Personality Questionnaire-Revised Short Form (EPQ-R-S)23 (Supplementary Table S1). Respondents answered ‘yes’ (score 1) or ‘no’ (score 0) to each of the questions, giving a total neuroticism score for each respondent of between 0 and 12. This short scale23 has a reliability of more than 0.8 and high concurrent validity; for example, in a sample of 207 older people EPQ-R-S scores correlated 0.85 with the neuroticism score from the NEO-Five Factor Inventory, the scale most widely used internationally24,25.
Genotyping and imputation
In June 2015 UK Biobank released the first set of genotype data for 152,729 UK Biobank participants. Approximately 67% of this sample was genotyped using the Affymetrix UK Biobank Axiom® array and the remaining 33% were genotyped using the Affymetrix UK BiLEVE Axiom array. These arrays have over 95% content in common. Only autosomal data were available under the current data release. Data were pre-imputed by UK Biobank as fully described in the UK Biobank interim release documentation26. Briefly, after removing genotyped single nucleotide polymorphisms (SNPs) that were outliers, or were multi-allelic or of low frequency (minor allele frequency, MAF < 1%), phasing was performed using a modified version of SHAPEIT2 and imputation was carried out using IMPUTE2 algorithms, as implemented in a C++ platform for computational efficiency27,28. Imputation was based upon a merged reference panel of 87,696,888 bi-allelic variants on 12,570 haplotypes constituted from the 1000 Genomes Phase 3 and UK10K haplotype panels29. Variants with MAF < 0.001% were excluded from the imputed marker set. Stringent QC prior to release was applied by the Wellcome Trust Centre for Human Genetics (WTCHG), as described in UK Biobank documentation30.
Statistical analysis
Quality control and association analyses
Prior to all analyses, further quality control measures were applied. Individuals were removed based on UK Biobank genomic analysis exclusions (Biobank Data Dictionary item #22010), relatedness (#22012: genetic relatedness factor; a random member of each pair of individuals with KING-estimated kinship coefficient > 0.0442 was removed), gender mismatch (#22001: genetic sex), ancestry (#22006: ethnic grouping; principal component analysis identified probable Caucasians within those individuals that were self-identified as British and other individuals were removed from the analysis) and QC failure in the UK BiLEVE study (#22050: UK BiLEVE Affymetrix quality control for samples and #22051: UK BiLEVE genotype quality control for samples). A sample of 112,031 individuals remained for further analyses. Of these, 91,370 had neuroticism scores. Genotype data were further filtered by removal of SNPs with Hardy-Weinberg equilibrium p < 10−6, and of SNPs with MAF < 0.01, after which 9,181,138 variants were retained. Association analysis was conducted using linear regression under a model of additive allelic effects with sex, age, array, and the first 8 principal components (Biobank Data Dictionary items #22009.01 to #22009.08) as covariates. Genetic principal components (PCs) were included to control for hidden population structure within the sample, and the first 8 PCs, out of 15 available in the Biobank, were selected after visual inspection of each pair of PCs, taking forward only those that resulted in multiple clusters of individuals after excluding individuals self-reporting as being of non-white British ancestry (Biobank Data Dictionary item #22006). The distribution of the neuroticism score was assessed for skewness and kurtosis (coefficients were 0.56 and −0.61, respectively) and found to be sufficiently ‘normal’ (both coefficients were between −1 and 1) to permit analysis using linear regression.
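The normality check described above can be sketched as follows. This is a minimal illustration only: it assumes moment-based (population) definitions of skewness and excess kurtosis, which the text does not specify, and the function names are hypothetical rather than taken from any software used in the study.

```python
from statistics import fmean


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5 (assumed estimator)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (assumed estimator)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


def within_rule_of_thumb(skew, kurt, bound=1.0):
    """True if both coefficients lie strictly between -bound and +bound."""
    return -bound < skew < bound and -bound < kurt < bound


# Coefficients reported in the text for the neuroticism score.
print(within_rule_of_thumb(0.56, -0.61))
```

With the reported coefficients (0.56 and −0.61) the rule of thumb is satisfied, which is the basis given in the text for using linear regression.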
GWAS of neuroticism were additionally performed separately for females (N=47,196) and males (N=44,174) using linear regression (as above), with age, array, and the first 8 principal components as covariates.
Heritability, polygenicity, and cross-sample genetic correlation
Univariate GCTA-GREML analyses were used to estimate the proportion of variance explained by all common SNPs for the neuroticism phenotype31. We additionally applied Linkage Disequilibrium Score Regression (LDSR)32 to the summary statistics to estimate SNP heritability (h2SNP) and to evaluate whether inflation in the test statistics is the result of polygenicity or of poor control of biases such as population stratification. Genetic correlations between neuroticism scores in the three cohorts (UK Biobank, QIMR and GS:SFHS) were tested, and genetic correlations between neuroticism, schizophrenia, bipolar disorder (BD), and major depressive disorder (MDD) were evaluated in the UK Biobank sample using LD score regression (LDSR)33, a process that allows for potential sample overlap without relying on the availability of individual genotypes32. For the psychiatric phenotypes, we used GWAS summary statistics provided by the Psychiatric Genomics Consortium (http://www.med.unc.edu/pgc/)34-36.
Polygenic risk score analyses in the QIMR and GS:SFHS samples
In the QIMR sample (N = 8,687 individuals), Polygenic Risk Scores for neuroticism (PRS-N) based on the summary statistics from the UK Biobank GWAS were computed with PLINK 1.90 (Purcell, version Sep 3rd 2015, http://pngu.mgh.harvard.edu/purcell/plink/)37, for p value thresholds (PT) 0.01, 0.05, 0.1, 0.5, and 1, following the procedure described by Wray and colleagues38. All subjects had GWAS data imputed to 1000G v.3. Only SNPs with a minor allele frequency ≥0.01 and imputation quality r2≥0.6 were used in the calculation of the PRS-N. Genotypes were LD pruned using clumping to obtain SNPs in approximate linkage equilibrium with an r2<0.1 within a 10,000bp window. Since QIMR participants were related, predictions were calculated using GCTA (Genome-wide Complex Trait Analysis, version 1.22)39, using the following linear mixed model: EPQ-N = intercept + beta0 * covariates + beta2 * g + e, with g ~ N(0, GRM), where: EPQ-N is neuroticism measured by the EPQ (standardised sum score); covariates are age, sex, imputation chip, ten genetic principal components and the standardised PRS (PT 0.01, 0.05, 0.1, 0.5, or 1); e is error; and GRM is the genetic relationship matrix. P-values were calculated using the t-statistic on the basis of the Beta and SE from the GCTA output. Variance explained by the PRS was calculated using: var(x)*b^2/var(y), where x is the PRS, b is the estimate of the fixed effect from GCTA and y is the phenotype.
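The scoring and variance-explained steps above can be illustrated with a short sketch. It is not the PLINK or GCTA implementation: it assumes SNPs have already been clumped, uses an unscaled sum of effect-weighted allele dosages, treats the threshold as inclusive (p ≤ PT), and all function and variable names are hypothetical.

```python
from statistics import variance


def polygenic_score(dosages, effects, pvalues, threshold):
    """Sum effect-weighted allele dosages over SNPs with p <= threshold.

    dosages: SNP id -> effect-allele dosage for one individual (0 to 2)
    effects: SNP id -> discovery GWAS effect size (beta)
    pvalues: SNP id -> discovery GWAS p value
    """
    return sum(
        effects[snp] * dosages[snp]
        for snp in dosages
        if snp in effects and pvalues[snp] <= threshold
    )


def variance_explained(prs, phenotype, beta):
    """var(x) * b^2 / var(y), the formula given in the text."""
    return variance(prs) * beta ** 2 / variance(phenotype)


# Toy data for one individual and three clumped SNPs (illustrative only).
dosages = {"snp1": 2, "snp2": 1, "snp3": 0}
effects = {"snp1": 0.1, "snp2": -0.2, "snp3": 0.3}
pvalues = {"snp1": 0.001, "snp2": 0.04, "snp3": 0.6}
for pt in (0.01, 0.05, 0.1, 0.5, 1):
    print(pt, polygenic_score(dosages, effects, pvalues, pt))
```

Because the formula is a ratio of variances, the choice between sample and population variance does not affect the result as long as the same one is used for x and y.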
In the GS:SFHS sample, PRS-N based on the UK Biobank neuroticism GWAS results were created using PRSice from observed genotypes in 7,196 individuals22,40. SNPs with a minor allele frequency <0.01 were removed prior to creating PRS-N. Genotypes were LD pruned using clumping to obtain SNPs in linkage equilibrium with an r2<0.25 within a 200kb window. As above, five PRS-N were created containing SNPs according to the significance of their association with the phenotype, with PTs of 0.01, 0.05, 0.1, 0.5, and 1 (all SNPs). Linear regression models were used to examine the associations between the PRS-N and neuroticism score in GS:SFHS, adjusting for age at measurement, sex and the first 10 genetic principal components to control for population stratification. The False Discovery Rate method was used to correct for multiple testing across the PRS-N at all five thresholds41.
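As an illustration of the multiple-testing correction across the five thresholds, the sketch below assumes the Benjamini-Hochberg step-up procedure; the text names only "the False Discovery Rate method", so the specific procedure is an assumption, as are the function name and the toy p values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p values, one per PRS threshold (illustrative only, not study results).
toy_pvalues = [0.01, 0.04, 0.03, 0.005, 0.5]
print(benjamini_hochberg(toy_pvalues))
```

An association at a given threshold would then be declared significant if its adjusted p value falls below the chosen FDR level.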
Meta-analysis
Inverse variance-weighted meta-analysis of UK Biobank, GS:SFHS and QIMR results was performed, restricted to variants present in the UK Biobank sample, using the METAL package (http://www.sph.umich.edu/csg/abecasis/Metal). Differences in SNP coverage between studies meant that data were only available across all 3 studies for 7,642,044 of the original 9,181,138 variants from the primary analysis. Sample size therefore varies with SNP, but the total maximum sample size included in the meta-analysis was N = 106,716 (UK Biobank N = 91,370; GS:SFHS N = 6,659; QIMR N = 8,687).
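The inverse variance-weighted combination for a single variant can be sketched as below. This is a generic fixed-effect calculation under a normal approximation, not the METAL source code; the function name is hypothetical, and only the cohorts in which the variant is present would be passed in, which is why the effective sample size varies by SNP.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse variance-weighted meta-analysis for one SNP.

    betas: per-cohort effect estimates for the same effect allele
    ses:   matching standard errors
    Returns (pooled beta, pooled SE, z score, two-sided p value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Toy estimates for two cohorts (illustrative only, not study results).
print(inverse_variance_meta([0.1, 0.2], [0.1, 0.1]))
```

Cohorts with smaller standard errors, typically the larger samples such as UK Biobank here, receive proportionally more weight in the pooled estimate.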
Results
Neuroticism phenotype within UK Biobank
Sociodemographic details of the 91,370 UK Biobank participants used in this analysis, as well as the full UK Biobank sample, are provided in table 1 and the distributions of neuroticism scores for males and females in our sample are provided in figure 1. As expected42, mean neuroticism scores were lower for men than for women (men mean EPQ-R-S = 3.58, SD = 3.19; women mean EPQ-R-S = 4.58, SD = 3.26; p = 0.001). Principal component analysis of the 12 EPQ-R-S items showed that all items loaded highly on a single component, and the internal consistency (Cronbach alpha) coefficient was 0.84 (supplementary material, table S2). Analysis of the entire UK Biobank sample (N with data = 401,695) gave very similar results (supplementary material, table S2), suggesting the subsample analysed here is representative of the whole UK Biobank cohort.
Genome-wide association results in UK Biobank
Genome-wide association results from the UK Biobank cohort are summarized in supplementary materials: supplementary figure S1 (QQ plot); supplementary figure S2 (Manhattan plot); and supplementary table S3 (genome-wide significant loci associated with neuroticism).
Overall, the GWAS data showed modest deviation in the test statistics compared with the null (λGC = 1.152); this was negligible in the context of sample size (λGC1000 = 1.003) (figure S1). LDSR32 suggested that deviation from the null was due to a polygenic architecture in which h2SNP accounted for about 14% of the population variance in neuroticism (liability scale h2SNP = 0.136 (SE 0.0153)), rather than inflation due to unconstrained population structure (LD regression intercept = 0.982 (SE 0.014)). Estimates of heritability using GCTA were similar to those using LD score regression (h2 = 0.156, SE = 0.0074).
We observed a total of 8 independent loci exhibiting genome-wide significant associations with neuroticism (figure S2, table S3) with the strongest evidence for association coming from a locus on chromosome 8 (p = 1.28×10−15) at which there is an extensive LD block spanning 4 Mb (attributable to an inversion polymorphism which has suppressed recombination) containing at least 36 genes.
Meta-analysis of UK Biobank, GS:SFHS and QIMR samples
In the combined dataset, we obtained genome-wide significance for 11 independent loci (figure 2; supplementary table S4) but for 2 of these (chromosome 7 at around 7.7 Mb and chromosome 2 at around 58.1 Mb), the evidence relies on SNPs present only in the UK Biobank sample. Importantly, both loci contain highly correlated variants that were also genome-wide significant in UK Biobank but which are no longer significant where additional data are available (supplementary table S4), suggesting neither should be considered to be associated with neuroticism. One other locus that was originally associated in the UK Biobank sample (chromosome 17 at 8.9 Mb) was no longer supported by meta-analysis (figure 2, supplementary figure S2 and supplementary table S4).
Overall, the meta-analysis continued to support 5 of the 8 loci originally identified in the UK Biobank sample alone, while an additional 4 loci that were previously at a sub-threshold level of significance were now supported at genome-wide significance. It is worth noting that for the original loci identified within the UK Biobank GWAS that remained significant in meta-analysis, the best associated SNP from the meta-analysis may not be the same as that from the primary GWAS (compare tables S3 and S4).
Details of the final set of 9 associated loci are provided in table 2 and the associated regions are depicted graphically as region plots in supplementary figure S3 (S3a-S3i). Candidate genes of particular note mapping to the associated loci include: the glutamatergic kainate receptor GRIK343,44; CELF4, which regulates excitatory neurotransmission45; and CRHR1, encoding corticotropin-releasing hormone receptor 1, a protein that is central to the stress response46. Associated loci are considered in greater detail within the discussion.
Stratification by sex in UK Biobank
Neuroticism scores are in general higher in women than in men and it has been postulated that neuroticism may play a stronger etiologic role in MDD in women than in men47,48, potentially explaining the greater prevalence of depressive and anxiety disorders in women49. This suggests the possibility of sex-related genetic heterogeneity. We therefore conducted secondary analyses looking for sex-specific neuroticism loci in women (N = 47,196) and men (N = 44,174) respectively. To minimize heterogeneity, this analysis was restricted to the UK Biobank samples. SNP heritability (measured by LDSR) for each sex was comparable (female h2SNP = 0.149 (SE = 0.0169); male h2SNP = 0.135 (SE = 0.0237)), and was highly correlated between the sexes (genetic correlation = 0.911 (SE = 0.07); p = 1.07×10−38) at a level that was not significantly different from 1 (p=0.21). In both sexes separately, the chromosome 8 locus was associated at genome-wide significance but no other single locus attained significance. Overall, we found no evidence for genetic differences in the common allelic architecture of neuroticism by sex.
Genetic correlation of neuroticism with MDD, schizophrenia and bipolar disorder
LDSR showed strong genetic correlation between neuroticism and MDD (genetic correlation = 0.64, SE = 0.071, p = 3.31×10−19) and a smaller, but significant, correlation between neuroticism and schizophrenia (genetic correlation = 0.22, SE = 0.05, p = 1.96×10−05). We found no significant overlap between neuroticism and bipolar disorder (genetic correlation = 0.07, SE = 0.05, p = 0.15) (table 3).
Genetic correlations for neuroticism between UK Biobank, GS:SFHS and QIMR samples
The LDSR-calculated genetic correlations for neuroticism between the three samples were strong: between UK Biobank and GS:SFHS, genetic correlation = 0.91 (SE = 0.15, p = 4.04×10−09); between UK Biobank and QIMR, genetic correlation = 0.74 (SE = 0.14, p = 2.49×10−07); and between GS:SFHS and QIMR, genetic correlation = 1.16 (SE = 0.35, p = 0.0009).
Polygenic risk score (PRS) analysis for neuroticism in GS:SFHS and QIMR samples
Table 4 shows the results of PRS analysis (based on the UK Biobank-only GWAS) within the GS:SFHS and QIMR samples. At all thresholds tested, PRS-N predicted neuroticism, although the amount of variance explained was small (around 1%).
Discussion
To date, genetic association studies of neuroticism have identified only a single genome-wide significant locus, at MAGI119. Here, we considerably extend this number, with 9 independent loci showing genome-wide significant associations in the final meta-analysis. We additionally note that we do not robustly support the principal finding from the Genetics of Personality Consortium, in that we did not identify a genome-wide significant hit close to MAGI1 within 3p1419. However, within the UK Biobank sample, the same allele at the associated SNP from that study (rs35855737) did show a trend for association (p=0.035; 1-tailed) in the expected direction, suggesting that the association may be true.
The most significant associated locus on chromosome 8, which was independently associated at genome-wide significance for both men and women, spans a 4 Mb region of extended LD (the result of an inversion polymorphism) containing at least 36 genes (table 2 and supplementary figure S3e). The extended LD at this locus means that identifying the specific genes responsible for the association is likely to prove challenging. As an initial attempt to resolve the signal, we queried the index SNP (rs12682352) at the BRAINEAC (http://www.braineac.org/) brain eQTL resource. This identified ERI1 as the only protein coding gene within the locus whose expression was associated with the index SNP in brain, but only nominally so (p=0.019) and not at a level that would reliably point to this gene as likely explaining the association.
The locus on chromosome 17 (rs111433752 at 43.8 Mb; supplementary figure S3h) similarly maps to an inversion polymorphism spanning multiple genes. As with the locus on chromosome 8, inspection of eQTLs in the region in BRAINEAC did not help to resolve the signal. Nevertheless, this locus contains a notable candidate gene, CRHR1, encoding corticotropin-releasing hormone receptor 1. In the presence of corticotropin-releasing hormone (CRH), CRHR1 triggers the downstream release of the stress response-regulating hormone cortisol. CRHR1 is therefore a key link in the hypothalamic-pituitary-adrenal (HPA) pathway which mediates the body’s response to stress and which is abnormal in severe depression46. CRHR1 per se has been shown to be involved in anxiety-related behaviours in mice and has also been genetically associated with panic disorder in humans50.
Another potential candidate gene within the extended region of genome-wide significant association at the chromosome 17 locus is MAPT, which encodes the microtubule-associated protein Tau. There is evidence that Tau is present in the postsynaptic compartment of many neurons51 and MAPT knockout in mice leads to defects in hippocampal long-term depression (LTD)52, as well as mild network-level alterations in brain function53. The clearest candidate gene at one of the other loci, CELF4 on chromosome 18 at approximately 35Mb, encodes an mRNA binding protein known to participate in a major switch in Tau protein isoform distribution after birth in the mammalian brain54. It is expressed predominantly in glutamatergic neurones, and recent studies suggest it has a central role in regulating excitatory neurotransmission by modulating the stability and/or translation of a range of target mRNAs45.
The finding of an association with a locus on chromosome 1 (rs490647), which includes the glutamatergic kainate receptor GRIK3, is of considerable interest given that abnormalities of the glutamate system are implicated in the pathophysiology of MDD55-60. Further, a recent glutamate receptor gene expression study in a large cohort of post-mortem subjects, including some individuals with MDD who had completed suicide, found GRIK3 to be the strongest predictor of suicide44.
On chromosome 4, rs62353264 lies a short distance upstream of KLHL2, which encodes a BTB-Kelch-like protein. KLHL2 is an actin-binding protein and has also been reported to be part of a complex that ubiquitinates NPTXR, the neuronal pentraxin receptor61, amongst other targets. Expression of KLHL2 has been reported to be enriched in brain, and it is localised to cytoplasm and processes of neurons and astrocytes, being found at sites of ruffles and other actin network-containing membrane outgrowths62,63. The associated region at this locus is short (approximately 150kb), and although several other genes lie within 500kb of the peak association at this locus, none is as promising a candidate as KLHL2.
The associated region in chromosome 9p23, at around 11.2-11.7Mb (supplementary figure S3) contains no protein-coding genes; the nearest gene on the telomeric side, with its 5’-end located about 650 kb from the associated region is PTPRD. This gene encodes a receptor-type protein tyrosine phosphatase known to be expressed in brain and with an organising role at a variety of synapses64, including those that play a role in synaptic plasticity. PTPRD is also known to harbour variation associated with restless legs syndrome65. This is a credible candidate but particular caution is required given the distance between the associated locus and this gene.
In addition to identifying genome-wide significant loci, our study contributes further to understanding the general genetic architecture of neuroticism and its relationship to other disorders. Our SNP-based heritability estimate for neuroticism was around 0.15 as estimated using GCTA, and only slightly lower using LDSR. This is consistent with the estimates reported by the Genetics of Personality Consortium19 in the two homogeneous subsets of the data they tested, and considerably greater than some earlier reports of approximately 6%66,67. Despite differences in the distribution of neuroticism by sex, heritability was similar for both men and women and the genetic correlation between sexes was not significantly different from 1, suggesting a similar common variant architecture for both, and that differences in trait scores are likely to result from structural variants, rare alleles and/or environmental exposures.
PRS analysis of neuroticism within the GS:SFHS and QIMR samples supported the expected highly polygenic architecture of neuroticism; despite the large discovery UK Biobank sample, but consistent with the modest number of GWS findings identified in this large sample, extremely weakly associated alleles at relaxed association thresholds (e.g., PT up to at least 0.5) contributed to the variance captured by the signal.
By comparing the overall association analysis results in our study with those from the Psychiatric Genomics Consortium, we identified a strong genetic correlation between neuroticism and MDD (0.64), and a weaker but still significant genetic correlation with schizophrenia (0.22), although not with bipolar disorder. These findings are in line with evidence suggesting that neuroticism and MDD - as well as, to a lesser extent, neuroticism and schizophrenia - share genetic risk factors68. However, the present findings do not distinguish between a direct causal link between neuroticism and those other disorders5,7,8,69 versus pleiotropy, whereby a proportion of risk alleles that influence neuroticism also exert an effect on the clinical diagnoses. Nevertheless, our findings suggest neuroticism as a potentially fruitful measure for efforts such as the Research Domain Criteria (RDoC) initiative that seek to use fundamental and quantitative characteristics to investigate the etiology of psychiatric disorders across traditional nosological boundaries, in order to develop a more biologically-informed system of psychiatric classification70.
Our findings are of considerable interest in the context of the limited success to date of GWAS studies of MDD. A recent mega-analysis of genome-wide association studies for MDD (9,240 MDD cases and 9,519 controls in discovery phase, and 6,783 MDD cases and 50,695 controls in replication phase) failed to identify any genome-wide significant SNPs, suggesting that much larger samples are required to detect genetic effects for complex traits such as MDD36. Given the high genetic correlation between neuroticism and MDD, combining the two datasets in a meta-analysis may be a plausible strategy to optimise the power of population samples in the search for a proportion of MDD loci, while noting that the two phenotypes are not perfectly genetically correlated. The MDD locus identified in a recent study of Chinese women with recurrent (N = 5,303) and melancholic (N = 4,509) MDD by the CONVERGE consortium71 does not overlap with any of the loci reported here; given the apparent modest power to detect genome-wide significant loci in our sample, population differences between the studies and substantial differences between the phenotypes, the absence of overlap does not provide any evidence against the validity of the CONVERGE study finding. Given that neuroticism is a personality trait established as phenotypically and genetically strongly associated with MDD, the identification of several new genome-wide significant loci for neuroticism represents an important potential entry point into the biology of MDD.
Conclusion
Overall, our findings confirm a polygenic basis for neuroticism and substantial shared genetic architecture between neuroticism and MDD, and to a lesser extent with schizophrenia, though not with bipolar disorder. The identification of 9 new loci associated with neuroticism represents a significant advance in this field and will drive future work on the neurobiology of a personality trait which has fundamental importance to human health and wellbeing.
Acknowledgements
DJS is supported by an Independent Investigator Award from the Brain and Behaviour Foundation. AMM, IJD and MA are supported by Wellcome Trust Strategic Award 104036/Z/14/Z. This research was conducted using the UK Biobank resource. UK Biobank was established by the Wellcome Trust, Medical Research Council, Department of Health, Scottish Government and Northwest Regional Development Agency. UK Biobank has also had funding from the Welsh Assembly Government and the British Heart Foundation. Data collection was funded by UK Biobank. The funders had no role in the design or analysis of this study, decision to publish, or preparation of the manuscript. We acknowledge support (QIMR study) from Grant W. Montgomery and Andrew C. Heath.
Conflict of interest
JPP is a member of the UK Biobank Scientific Advisory Board and IJD was a participant in UK Biobank. None of the other authors have actual or potential conflicts of interest to declare.