Abstract
Background Diverse behaviour problems in childhood correlate phenotypically, suggesting a general dimension of psychopathology that has been called the p factor. The shared genetic architecture between childhood psychopathology traits also supports a genetic p. This study systematically investigates the manifestation of this common dimension across self-, parent- and teacher-rated measures in childhood and adolescence.
Methods The sample included 7,026 twin pairs from the Twins Early Development Study (TEDS). First, we employed multivariate twin models to estimate common genetic and environmental influences on p based on diverse measures of behaviour problems rated by children, parents and teachers at ages 7, 9, 12 and 16 (depressive symptoms, emotional problems, peer problems, autistic symptoms, hyperactivity, antisocial, conduct and psychopathic symptoms). Second, to assess the stability of genetic and environmental influences on p across time, we conducted longitudinal twin modelling of the first phenotypic principal components of childhood psychopathological measures across each of the four ages. Third, we created a genetic p factor in 7,026 unrelated genotyped individuals based on eight polygenic scores for adult psychiatric disorders to estimate how a general polygenic predisposition to adult psychiatric disorders relates to childhood p.
Results Behaviour problems were consistently correlated phenotypically and genetically across ages and raters. The p factor is substantially heritable (50-60%), and manifests consistently across diverse ages and raters. Genetic correlations of p components across childhood and adolescence suggest stability over time (0.49-0.78). A polygenic general psychopathology factor, derived from studies of adult psychiatric disorders, consistently predicted a general phenotypic p factor across development.
Conclusions Diverse forms of psychopathology consistently load on a common p factor, which is highly heritable. There are substantial genetic influences on the stability of p across childhood. Our analyses indicate genetic overlap between general risk for psychiatric disorders in adulthood and p in childhood, even as young as age 7. The p factor has far-reaching implications for genomic research and, eventually, for diagnosis and treatment of behaviour problems.
Introduction
The p factor, analogous to the concept of general intelligence (‘g’), reflects the observation that individuals who score highly for certain symptoms also score highly on others (Lahey et al. 2012; Caspi et al. 2014). Recent research suggests that this single continuous dimension can, in part, summarise and explain liability to a wide range of psychopathologies in childhood.
Interest in the p factor stemmed initially from high levels of psychopathological comorbidity in adults. The co-occurrence of psychiatric disorders is strikingly high, with up to 50% of individuals diagnosed with a mental illness going on to develop two or more comorbidities in a 12-month period (Kessler et al. 2005). Already during childhood and adolescence, forms of psychopathology are often comorbid. A recent report found that 1 in 20 British young people under 20 years of age met criteria for 2 or more mental disorders (NHS Digital 2017).
Quantitative genetic research suggests that shared genetic factors contribute substantially to the observed co-occurrence of psychopathological traits (Plomin et al. 2016). Several multivariate twin and family studies have replicated the finding that a common genetic factor influences a wide range of emotional and behavioural problems in childhood (Waldman et al. 2016; Tackett et al. 2013; Lahey et al. 2011). Many studies have investigated developmental genetic effects on specific psychopathological traits in childhood (e.g. Pingault et al. 2015), yet little is known about the genetic and environmental architecture of general psychopathology across development. Stability and change in p across time, and the extent to which genetic influences drive age-related patterns, remain largely unknown. Here, for the first time, we systematically investigate p across diverse ages, raters and measures in childhood and adolescence.
It is also unknown to what extent a general p factor across earlier development relates to adult psychopathology. In addition to genetic analyses using the twin and family designs, polygenic scores are a new genomic tool that can be used to test for shared genetic effects across traits. Polygenic scores are constructed by aggregating genetic risk across thousands of genetic variants, thus indexing the genetic liability that each individual carries for a specific trait. A landmark study in the field of psychiatric genetics (International Schizophrenia Consortium et al. 2009) first showed that a polygenic score for schizophrenia was also associated with bipolar disorder, suggesting a shared genetic component underlying these two disorders, which has been substantiated further more recently (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2019). Several studies have used polygenic scores for schizophrenia, ADHD, and other psychiatric disorders to predict general psychopathology in childhood. An increasing amount of evidence converges on the finding that few polygenic effects specific to individual aspects of psychopathology remain after conditioning on the p factor (Jones et al. 2018; Jones et al. 2016; Riglin et al. 2018; Brikell et al. 2017). These studies also suggest that genetic risk for psychiatric disorders emerges in childhood, in the form of continuously measured behavior problems. More recently, a study using different genomic methods provided evidence for a ‘polygenic p’ factor, yielding similarly high weights for psychiatric disorders across the different approaches (Selzam et al. 2018). However, no studies to date have empirically related ‘polygenic p’ to ‘phenotypic p’ or systematically tested the architecture of p across development and across different raters.
Here we investigated the structure of general psychopathology across childhood and adolescence. Our study has three aims:
Investigate the genetic architecture of p in childhood through common pathway twin models across ages and raters.
Test the stability of p across childhood and adolescence through longitudinal quantitative genetic analysis of first principal components of psychopathology across ages (7, 9, 12 and 16) and raters (parent-, teacher- and self-ratings).
Estimate associations between childhood phenotypic p and adult polygenic p. The latter can be constructed by principal component analysis of polygenic scores for adult psychiatric disorders created for each TEDS participant.
Methods
Sample
The sampling frame is the Twins Early Development Study (TEDS), a multivariate, longitudinal study of >10,000 twin pairs representative of England and Wales, recruited from 1994–1996 births (Haworth et al. 2013). Analyses were conducted on a sub-sample of unrelated individuals with available genotype data and their co-twins (N = 7,026). Genomic analyses were limited to unrelated individuals (one twin from each pair).
Genotyping
Data were available for 3,057 individuals genotyped on Affymetrix GeneChip 6.0 arrays and 3,969 individuals genotyped on HumanOmniExpressExome-8v1.2 arrays. Typical quality control procedures were followed (e.g., samples were removed based on call rate <0.98, MAF <0.5%). Genotypes from the two platforms were separately imputed and then harmonized (for detail see Selzam et al. 2018).
Measures
Twins Early Development Study (TEDS) measures have been described previously (Haworth et al., 2013). Measures administered at ages 7, 9, 12, and 16 were included in our analyses. Some of these measures (e.g. peer problems, prosocial behaviour (reversed), autistic traits) have not previously been used in other studies of general psychopathology, but we adopted a hypothesis-free approach in an attempt to capture a general trait that is pervasive across diverse domains. For similar reasons, we included all measures available at each age, even though some measures (e.g., aggression) were available only at one age.
For all phenotypes, z-standardised residuals were derived for each scale regressed on sex and age. Composite scores were calculated as unit-weighted means, with the requirement of complete data for at least half the individual measures contributing to each composite (i.e., 3 of 4, or 2 of 3, sub-scale measures). All procedures were executed using RStudio (Version 1.1.419; RStudio 2019).
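The residualisation and compositing steps described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the R code used for the study; the function names, the simulated data and the `min_observed` threshold are assumptions introduced here.

```python
import numpy as np

def residualise(y, covariates):
    """Regress y on the covariates (OLS) and return z-standardised residuals."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates])
    ok = ~np.isnan(y) & ~np.isnan(X).any(axis=1)   # rows usable for the regression
    beta, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)
    resid = np.full(len(y), np.nan)
    resid[ok] = y[ok] - X[ok] @ beta
    return (resid - np.nanmean(resid)) / np.nanstd(resid, ddof=1)

def composite(scales, min_observed):
    """Unit-weighted mean across columns; NaN when fewer than min_observed scales are present."""
    scales = np.asarray(scales, dtype=float)
    n_obs = (~np.isnan(scales)).sum(axis=1)
    means = np.nansum(scales, axis=1) / np.where(n_obs > 0, n_obs, 1)
    return np.where(n_obs >= min_observed, means, np.nan)

# Simulated (hypothetical) data: four sub-scales, with sex and age as covariates.
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, n)
age = rng.normal(7.0, 0.3, n)
scales = rng.normal(size=(n, 4)) + 0.5 * sex[:, None]
scales[0, :2] = np.nan                      # first child has only 2 of 4 sub-scales

covs = np.column_stack([sex, age])
z = np.column_stack([residualise(scales[:, j], covs) for j in range(4)])
p_composite = composite(z, min_observed=3)  # the "3 of 4" rule
```

With `min_observed=3`, the first simulated child, who has only two observed sub-scales, receives a missing composite, while children with complete data receive the mean of their four standardised residuals.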
Age 7 measures
We used both parent and teacher ratings of all subscales of the Strengths and Difficulties Questionnaire (SDQ) (Hyperactivity, Conduct Problems, Peer Problems, Emotional Problems, and Prosocial (reversed); Goodman 1997), as well as the Antisocial Process Screening Device (APSD) and autistic symptoms (ASD).
Age 9 measures
The 5 subscales of the SDQ and ASD were included in the set of self-, parent- and teacher-reported measures. In addition, we used parent- and teacher-rated APSD and aggression (a mean of proactive and reactive scales) measures.
Age 12 measures
The 5 subscales of the SDQ, the APSD, and the Childhood Autism Spectrum Test (CAST; Williams et al., 2005) were included in the set of self-, parent- and teacher-reported measures. Parent reports of the Moods and Feelings Questionnaire (MFQ), assessing depressive symptoms, and of the Conners' ADHD behaviours measure were also available.
Age 16 measures
The 5 subscales of the SDQ, CAST, MFQ and callous-unemotional measures were available from self reports and teacher reports. Parent-rated data on the Conners' ADHD measure were also included.
Statistical analyses
Common pathway twin models of behaviour problem measures for each rater at each age
To estimate the genetic and environmental influence on phenotypic variance in general psychopathology, and to examine loadings of individual psychopathology measures on p, we conducted multivariate twin model-fitting analyses. In the twin design, differences in within-pair trait correlations for monozygotic (MZ) and dizygotic (DZ) twins are used to estimate genetic, shared environmental, and non-shared environmental effects on traits. Greater MZ than DZ similarity indicates additive genetic influence (A). Within-pair similarity that is not due to genetic factors is attributed to shared environmental influences (C). Non-shared environment (E) accounts for individual-specific factors that influence differences among siblings from the same family, plus measurement error. We considered genetic and environmental associations between all psychopathology measures at each age and separately for each rater. Specifically, we fit the data to the common pathway model (Rijsdijk 2005). This is a multivariate twin model, in which common genetic and environmental variation influence all measures via a single common latent (p) factor. The model allows the estimation of genetic and environmental influences on a common factor (p), and of the factor loadings of each measure of psychopathology on the latent liability (p).
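The logic of the twin design can be illustrated with Falconer's formulas, which approximate A, C and E directly from the MZ and DZ within-pair correlations. This is only a back-of-the-envelope sketch: the study itself fits common pathway models by maximum likelihood, and the correlations below are hypothetical values, not results from TEDS.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate ACE variance components from twin correlations (Falconer's formulas)."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic: twice the MZ-DZ difference
    c2 = 2.0 * r_dz - r_mz     # shared environment: similarity not explained by A
    e2 = 1.0 - r_mz            # non-shared environment plus measurement error
    return a2, c2, e2

# Hypothetical within-pair correlations for a single measure.
a2, c2, e2 = falconer_ace(r_mz=0.80, r_dz=0.50)
print(f"A = {a2:.2f}, C = {c2:.2f}, E = {e2:.2f}")
```

The three components sum to 1 by construction. In the common pathway model the same decomposition is applied to the latent p factor rather than to a single observed measure, with measure-specific A, C and E estimated for the residual variance of each scale.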
Longitudinal twin analysis: Cholesky decomposition of phenotypic principal components
We performed a Cholesky decomposition of the parent-rated phenotypic p principal components, allowing for the investigation of stability and innovation in the genetic and environmental influences on our measures of p across the four ages. We focused on parent-rated data since measures were much more consistent across time than for self report and teacher report. The first genetic factor (A1) represents genetic influences on p at age 7. The extent to which these same genes also influence p at ages 9, 12, and 16 is also estimated, and is represented by the diagonal pathways from A1 to the other variables. The second genetic factor (A2) represents genetic influences on p at age 9 that are independent of those influencing age 7. The extent to which these genes also influence p at ages 12 and 16 is also estimated. The third genetic factor (A3) indexes genetic influences on p at age 12 that are independent of genetic influences shared with the previous ages. The impact of these genes on age 16 general psychopathology is also estimated. Finally, the fourth genetic factor (A4) represents residual genetic influences on age 16 general psychopathology. The same decomposition is done for the shared environmental and non-shared environmental influences (C1–4 and E1–4, respectively). All twin model-fitting analyses used full-information maximum likelihood and were carried out with the structural equation modelling software OpenMx (Neale et al., 2016).
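The structure of the Cholesky decomposition can be illustrated numerically. The sketch below factors a hypothetical 4 x 4 genetic covariance matrix for p at ages 7, 9, 12 and 16 into a lower-triangular matrix of paths; the matrix values are invented for illustration, and in the actual analysis the paths are estimated from twin data in OpenMx rather than computed from a known covariance matrix.

```python
import numpy as np

ages = [7, 9, 12, 16]

# Hypothetical genetic covariance matrix for p across the four ages.
A = np.array([
    [0.60, 0.42, 0.36, 0.30],
    [0.42, 0.58, 0.40, 0.33],
    [0.36, 0.40, 0.55, 0.38],
    [0.30, 0.33, 0.38, 0.52],
])

# Lower-triangular paths: column k holds the loadings of genetic factor A(k+1).
paths = np.linalg.cholesky(A)

# Share of genetic variance at each age attributable to each genetic factor.
share = paths**2 / np.diag(A)[:, None]

# Age-to-age genetic correlations implied by the same covariance matrix.
sd = np.sqrt(np.diag(A))
r_g = A / np.outer(sd, sd)

for i, age in enumerate(ages):
    print(age, np.round(share[i, : i + 1], 2))
```

Each row of `share` sums to 1: the first column gives the part of the genetic variance at that age carried forward from A1 (stability), and the diagonal entry gives the part that is new at that age (innovation).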
Extracting p: Principal Component Analyses (PCA)
In preparation for longitudinal analyses and genomic prediction analyses, we obtained the first principal component (1st PC) of behaviour problem phenotypes at each age separately for child, parent and teacher ratings. Only individuals with complete data were used to generate PCs. We report full results from PCA, which in themselves give insights into the phenotypic architecture of p in childhood. The variance explained by the first PC suggests how much the p factor underpins diverse forms of psychopathology, and loadings of each measure on the first PC indicate the extent to which variables reflect general psychopathology.
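A minimal sketch of extracting a first principal component from complete cases is given below, assuming a matrix of standardised measures for one rater at one age. The function name and the simulated data are hypothetical; the sketch uses an eigendecomposition of the correlation matrix, which is one common way to obtain the first unrotated PC and not necessarily the exact routine used in the study.

```python
import numpy as np

def first_pc(data):
    """Return scores, loadings and variance explained for the first unrotated PC."""
    data = np.asarray(data, dtype=float)
    complete = data[~np.isnan(data).any(axis=1)]          # complete cases only
    z = (complete - complete.mean(axis=0)) / complete.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)                # eigenvalues in ascending order
    top = np.argmax(eigvals)
    v = eigvecs[:, top]
    if v.sum() < 0:                                        # fix the arbitrary sign
        v = -v
    loadings = v * np.sqrt(eigvals[top])                   # correlations of measures with the PC
    scores = z @ v
    explained = eigvals[top] / eigvals.sum()
    return scores, loadings, explained

# Simulated data: five measures that share one common factor.
rng = np.random.default_rng(1)
n = 500
common = rng.normal(size=n)
X = 0.7 * common[:, None] + rng.normal(scale=0.7, size=(n, 5))
X[0, 0] = np.nan                                           # one incomplete case, dropped

scores, loadings, explained = first_pc(X)
print(np.round(loadings, 2), round(float(explained), 2))
```

Here `explained` plays the role of the variance accounted for by p, and `loadings` indicate how strongly each measure reflects the general factor.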
We also obtained the first PC from polygenic scores for psychiatric disorders (polygenic p). We used publicly available genome-wide association summary statistics for 8 major psychiatric traits: autism spectrum disorder (Grove et al. 2019); major depressive disorder (MDD; Wray et al. 2018); bipolar disorder (BIP); schizophrenia (SCZ; Pardiñas et al. 2018); attention deficit hyperactivity disorder (ADHD; Demontis et al. 2019); obsessive compulsive disorder (OCD; International Obsessive Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS) 2018); anorexia nervosa (AN; Duncan et al. 2017); and post-traumatic stress disorder (PTSD; Duncan et al. 2017). For each psychiatric disorder, polygenic scores for each TEDS participant were created in LDpred (Vilhjálmsson et al. 2015), assuming a fraction of causal markers of 1 (analysis steps were analogous to Selzam et al. (2018)).
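At its core, a polygenic score is a weighted sum of an individual's allele counts. The sketch below shows only that final scoring step, with invented genotypes and weights; it does not reproduce LDpred, which first re-weights GWAS effect sizes to account for linkage disequilibrium before scores are computed.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of allele counts per individual, returned raw and standardised."""
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    raw = dosages @ weights
    return raw, (raw - raw.mean()) / raw.std(ddof=1)

# Hypothetical data: 3 individuals x 3 variants, allele counts coded 0/1/2.
dosages = [[0, 1, 2],
           [2, 0, 1],
           [1, 1, 1]]
weights = [0.10, -0.20, 0.05]   # hypothetical per-variant effect sizes

raw, standardised = polygenic_score(dosages, weights)
```

Repeating this for each of the eight disorders gives one score per disorder per participant, and polygenic p is then the first principal component of those eight scores.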
Assessing the association between the polygenic 1st PC and the phenotypic 1st PC across childhood
To assess the extent to which the genetic predisposition for a general psychopathology factor relates to p in childhood, we performed ordinary least squares regression analyses of phenotypic p on polygenic p at each age, separately for each rater. Age, sex and the first 10 genomic principal components were regressed out of all dependent and independent variables, and standardized residuals were used in all linear models.
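The two-step procedure (residualise both variables on the covariates, then regress one set of standardised residuals on the other) can be sketched as follows. All data are simulated and the effect sizes are arbitrary assumptions; the function names are hypothetical.

```python
import numpy as np

def residualise(y, covs):
    """Standardised residuals of y after OLS regression on the covariates."""
    X = np.column_stack([np.ones(len(y)), covs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return (r - r.mean()) / r.std(ddof=1)

def simple_ols(y, x):
    """Slope, standard error and R-squared for a one-predictor OLS regression."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    xc = x - x.mean()
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (xc @ xc))
    return beta[1], se, r2

# Simulated data: 12 covariates standing in for age, sex and 10 genomic PCs.
rng = np.random.default_rng(2)
n = 3000
covs = rng.normal(size=(n, 12))
polygenic_p = rng.normal(size=n) + 0.1 * covs[:, 2]
phenotypic_p = 0.08 * polygenic_p + 0.2 * covs[:, 0] + rng.normal(size=n)

y = residualise(phenotypic_p, covs)
x = residualise(polygenic_p, covs)
beta, se, r2 = simple_ols(y, x)
print(f"beta = {beta:.3f}, SE = {se:.3f}, R2 = {100 * r2:.2f}%")
```

Because both variables are standardised residuals, the slope equals their correlation and its square equals the proportion of variance predicted, which is the quantity reported as a percentage in the Results.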
Results
Common pathway twin models
Common pathway twin models showed substantial heritability for the p factor at each age for all raters (50% to 60%; see Figure 1 for parent-rated measures and Supplementary Figures S1 and S2 for teacher-rated and child-rated measures, respectively). Shared environmental effects were moderate for the parent-rated common factors (~30%; Figure 1), absent for the teacher-rated common factors (~0%; Supplementary Figure S1), and weak for the self-rated common factors (~15%, declining with age; Supplementary Figure S2). Autistic symptoms, conduct problems, antisocial behaviour and psychopathic symptoms loaded the highest on the parent-rated and teacher-rated common factors, while emotional problems, depression and anxiety loaded the highest on the child-rated p factor. We also found substantial specific genetic and environmental variance for all measures, suggesting unique influences on psychopathological measures beyond the p factor. See Supplementary Table S1 for model-fitting parameters, including sample sizes of measures, which ranged from 2216 to 5592 twin pairs who also had genotype data, and see Supplementary Table S2 for model fit statistics.
Cholesky decomposition of p across development
The Cholesky decomposition of principal components suggests stability of genetic effects on general psychopathology across childhood and adolescence, in addition to new genetic components at each age, as shown in Figure 2 for parent ratings. Age-to-age genetic correlations, derived from a correlated factors solution, are high, ranging from 0.49 to 0.78 (see Supplementary Figure S5). Supplementary Figures S3 and S4 present the Cholesky model-fitting results for shared and non-shared environmental variance components, respectively. Supplementary Figures S6 to S15 show phenotypic correlations among psychopathology measures at all ages and for all raters. Supplementary Figure S16 shows that correlations between phenotypic principal components across age are also substantial, ranging from 0.47 to 0.68. These correlations are notably similar to the genetic correlations from the Cholesky model. Supplementary Table S3 lists loadings of observed measures on first principal components, which are consistently substantial for all measures, ages and raters. The first unrotated principal component of phenotypic measures accounted for 40% to 50% of the variance across ages and raters (see Supplementary Table S4, which also shows the sample sizes for each 1st PC, which ranged from 1391 to 4490).
Prediction of phenotypic p with polygenic p
A polygenic p score, defined as the first unrotated principal component of polygenic scores for adult psychiatric disorders, was significantly associated with phenotypic p scores in childhood, predicting 0.3% to 0.9% of the variance across ages and raters. See Supplementary Table S5 for full polygenic prediction results. Prediction was generally consistent across ages and raters, with largely overlapping standard errors (see Figure 3). Supplementary Figure S17 shows correlations between the polygenic scores in TEDS used to derive polygenic p. Although these correlations are modest (0.01 to 0.32), the first principal component of polygenic scores from psychiatric traits explained up to 20% of the polygenic score variability. The pattern of loadings for polygenic p is shown in Supplementary Figure S18.
Discussion
For the first time, we systematically quantified the extent to which a single common factor relates to diverse forms of psychopathology across childhood and adolescence using phenotypic, genetic and genomic methods. Phenotypically, our results confirm previous findings of a strong p factor involving all measures at all ages for all raters. Our genetic results support three main conclusions. Firstly, multivariate twin analyses revealed that 48% to 80% of the variance in the common factor was due to genetic influences, depending on age and raters considered. Secondly, longitudinal twin model-fitting showed that this genetic p factor was stable across time. Thirdly, polygenic prediction analyses demonstrate that there are shared genetic influences connecting childhood psychopathology to general risk for adult psychiatric disorders. In sum, these analyses provide further evidence that a common genetic substrate permeates the landscape of psychopathology, across measures, ages and raters. It is important to note that although we found a consistent and stable genetic p factor across childhood and adolescence, substantial unique genetic and environmental influences indicate that there are also genetic components specific to each trait and each age beyond p.
Our common pathway twin modelling analyses, for which we adopted a hypothesis-free approach to the inclusion of measures, show that diverse psychopathological traits contribute to p. Furthermore, it is commonly acknowledged that all psychopathological traits are dimensional traits both at the phenotypic and genetic levels (Plomin et al. 2009). Future research might investigate the extent to which p extends to other behavioural domains. For example, suggestive evidence of links between p and personality has begun to emerge (Rosenström et al. 2018).
Differences between raters in our common pathway twin analyses suggested some additional insights. Firstly, inspection of the loadings of psychopathology measures revealed that ‘externalising’ problems relating to conduct and antisocial behaviour contributed most to parent- and teacher-rated common factors, whereas ‘internalising’ problems such as depression and anxiety loaded the highest on the child-rated p factor. This could suggest that parents report on overt behaviours, which might stem from worry and sadness from the child’s perspective. Secondly, we observed that shared environmental influences were moderate for the parent-report-based p factor, but weak for self-rated p and negligible for teacher-rated p. This pattern of results is most likely due to rater bias, in that parent ratings are based on a single informant rating both twins, whereas for teacher and self ratings different informants rate each twin (Bartels et al. 2004).
Our longitudinal twin model-fitting and polygenic scoring revealed substantial genetic influences on stability of general psychopathology across childhood. Our polygenic score results suggest that these stable genetic influences overlap with those underlying adult psychiatric disorders. Future research could assess influences on different temporal trajectories of p across childhood and adolescence. One study recently showed that polygenic scores for neurodevelopmental disorders (schizophrenia, ADHD) and depression were associated with early adolescent onset depression, whereas later onset depression was only predicted significantly by depression polygenic scores (Rice et al. 2018). This could be repeated with more powerful polygenic p scores.
Naturally, through the course of multivariate longitudinal studies like TEDS, there are changes in available measures and informants, which in turn can introduce variability in the pattern of results. That is, our measures of p are not perfect indices of general liability to psychopathology, but reflect the specific measures and raters available at each age. This is problematic when estimating genetic and environmental influences on stability and change in p across time. Specifically, any innovation cannot solely be attributed to p, as it will reflect new influences on new measures that were not available at the previous age. This criticism is difficult to overcome even with the availability of consistent data: exactly the same measure at different time points does not necessarily reflect the same thing. We consider that the availability of varied measures is a strength rather than a limitation of the present study because this means that our strong evidence for genetic p and genetic stability for p emerges despite the use of different measures. In the cognitive literature on g, this phenomenon is known as the indifference of the indicator – any set of diverse cognitive measures yields a strong g factor (Spearman, 1904). Factor loadings were consistently substantial, not only across measures but also across ages and raters. Importantly, the phenotypic correlations between first principal components across time (ranging between ~0.5 and ~0.7) suggest that p indexes a consistent core psychopathology trait.
The fact that we can predict childhood p using polygenic p derived from adult case-control genome-wide association studies has several interesting implications. Firstly, it suggests that in young children there are already manifestations of genetic risk for adult psychiatric disorders. In other words, early onset behavioural and emotional problems are early signs of psychiatric genetic risk. This is particularly striking given that genetic stability for psychopathology often does not begin until adolescence, and supports other evidence for the usefulness of early intervention for psychiatric problems. The second implication of the genetic overlap between p in childhood and adulthood relates to research design. Specifically, researchers could increase the power of genome-wide association studies to detect DNA variation associated with general risk for psychopathology by aggregating diverse traits across wide age ranges. One way to implement this is a common-factor genome-wide association analysis using Genomic SEM (Grotzinger et al. 2018). Similarly, the modest power of psychiatric polygenic scores to predict traits in childhood could be enhanced using multi-trait frameworks to generate predictors that leverage the shared genetic risk between traits (e.g. SMTpred; Maier et al. 2018).
The current clinical zeitgeist focuses on specificity. The recognition that a common factor transcends diverse aspects of psychopathology in childhood is of primary importance, as this knowledge can inform early detection of children at risk in the general population.
Key points
*We investigated the underlying structure of p across diverse measures, ages and raters, and consistently found a substantial genetic component, in line with previous theory.
*We showed that this genetic component is stable across time, with influences in childhood being pervasive across development through to adolescence.
*Genomic analyses revealed shared genetic risk between p in children as young as 7 and general risk for adult psychiatric disorders.
*We provide further evidence that, in addition to residual variation specific to each trait, a common genetic substrate permeates the landscape of psychopathology.
Acknowledgements
We gratefully acknowledge the ongoing contribution of the participants in the Twins Early Development Study (TEDS) and their families. TEDS is supported by a program grant to R.P. from the UK Medical Research Council (MR/M021475/1 and previously G0901245), with additional support from the US National Institutes of Health (AG046938). The research leading to these results has also received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007–2013; grant agreement no. 602768 and ERC grant agreement no. 295366). R.P. is supported by a Medical Research Council Professorship award (G19/2). T.C. Eley is part funded by the above program grant from the UK Medical Research Council (MR/M021475/1). This study represents independent research part funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. High-performance computing facilities were funded with capital equipment grants from the GSTT Charity (TR130505) and Maudsley Charity (980). R.C. is supported by an ESRC studentship. AGA has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no. 721567. J.-B.P. is a fellow of MQ: Transforming Mental Health (MQ16IP16).
Abbreviations
- p: general psychopathology factor
- PCA: principal component analysis