Abstract
Understanding how parents’ cognitive and non-cognitive skills influence offspring education is vital for educational, family and economic policy. We use genetics (GWAS-by-subtraction) to assess a latent, broad non-cognitive skills dimension. To index parental effects controlling for genetic transmission, we estimate indirect parental genetic effects of polygenic scores on childhood and adulthood educational outcomes, using siblings (N=47,459), adoptees (N=6,407), and parent-offspring trios (N=2,534) in three UK and Dutch cohorts. We find that parental cognitive and non-cognitive skills affect offspring education through their environment: on average across cohorts and designs, indirect genetic effects explain 36-40% of population polygenic score associations. However, indirect genetic effects are lower for achievement in the Dutch cohort, and for the adoption design. We identify causes of higher sibling- and trio-based estimates: prenatal indirect genetic effects, population stratification, and assortative mating. Our phenotype-agnostic, genetically sensitive approach has established overall environmental effects of parents’ skills, facilitating future mechanistic work.
Introduction
Parents and children tend to have similar educational outcomes. Given the ties between education, social mobility and health1, 2, understanding the mechanisms underlying the intergenerational transmission of education could inform efforts to alleviate inequalities. Many studies have investigated how much certain parental characteristics influence offspring education, but relatively few have considered non-cognitive skills. Whereas cognitive skills relate to learning and problem solving, non-cognitive skills are ‘socio-emotional’. Given the growing recognition of the importance of individuals’ non-cognitive skills for their educational outcomes3, it follows that parents’ non-cognitive skills might also matter.
Parents’ non-cognitive skills appear to be less salient for children’s education than parents’ cognitive skills. In one study, sons’ standardised test scores at age 16 were more strongly associated with fathers’ cognitive than non-cognitive skills (0.47 and 0.09, respectively)4. Measures of parents’ non-cognitive skills also account for less of the intergenerational transmission of socioeconomic status than cognitive skills (10 vs 20%, respectively)5, and less of the socioeconomic gap in children’s achievement (8 vs 16%)6. Indicators of non-cognitive skills in these studies included self-esteem, locus of control5, attitudes and social skills6, and perseverance and extraversion4.
Two key limitations weaken this evidence on the relative effects of parents’ cognitive and non-cognitive skills on offspring education: poor phenotypic assessment of parents’ non-cognitive skills, and genetic confounding.
First, regarding assessment, cognitive skills can be directly measured by tests of domain-specific or general cognitive performance. Non-cognitive skills are more challenging to measure, and available measures are often inconsistent, incomplete or unreliable7, 8. There is little agreement on which non-cognitive skills to measure. Some researchers focus on personality, whereas others include self-control, self-esteem, motivation and interests. An alternative, broader definition of non-cognitive skills is all traits positively affecting educational success beyond cognitive skills9.
Important non-cognitive characteristics may have been neglected – for instance, in the study by Grönqvist et al. (2017), direct skill measures for mothers were unavailable, as were paternal measures of motivation, a key education-linked trait. Importantly, studies identifying partial effects of specific parental cognitive and non-cognitive skills are less informative about the overall influences of these domains. More severe measurement error for non-cognitive skills could also mean that their effects, relative to those of parents’ cognitive skills, have been underestimated.
Genetic methods offer a new approach to defining and estimating the importance of domains of parental skills for offspring education. Both cognitive and non-cognitive skills (as far as we know what they are) are substantially genetically influenced, with twin study heritability estimates of 40-70%10, 11. Non-cognitive skills assessed in these studies included grit, intellectual curiosity, the Big Five personality traits, and subject-specific enjoyment and ability. A new method – ‘GWAS-by-subtraction’ – makes it possible to ‘subtract’ cognitive ability-related genetic variation from educational attainment genetic variation and assess the remaining latent genetic non-cognitive construct12. These non-cognitive aspects of educational attainment are independent of cognitive skills, and associated with higher socioeconomic attainment, more open and conscientious personality, and some psychiatric disorders (e.g., higher risk for schizophrenia, lower risk for attention deficit/hyperactivity disorder). A GWAS-by-subtraction-derived measure of non-cognitive skills captures a broader construct that is not reliant upon measurement of specific traits. This method opens up the possibility of assessing the overall effect of all parent phenotypes that are influenced by common genetic variants linked to educational attainment, independent of cognitive skills. This could include parental phenotypes not traditionally classed as ‘non-cognitive’ or ‘skills’, such as mental health. This broad, phenotype-agnostic approach is a necessary first step towards characterizing pathways from parents’ skills to offspring educational outcomes. After establishing overall effects, subsequent studies can use phenotypic measures of parental non-cognitive skills to find specific mediating mechanisms.
Second, regarding genetic confounding, existing research relies on designs that cannot distinguish social (i.e., environmental) from genetic transmission. None of the associations between parental skills and offspring education cited above were estimated using genetically sensitive designs. This is problematic, because from parent-offspring correlations alone one cannot conclude that parents’ skills shape offspring education, for instance by providing resources, experiences and support. Ignoring any shared genetic influences on parents’ skills and child educational outcomes confounds estimation of the effects of parental phenotypes on offspring outcomes13. To establish the extent to which parents’ (non-)cognitive skills influence child educational outcomes socially, it is vital to control for inherited genetic effects.
Genetic study designs can isolate environmental effects of parental skills on offspring education, controlling for genetic transmission. Several designs estimate a genetic effect of the child’s genotype on the child phenotype (direct genetic effect), and an environmentally mediated effect of the parental genotype on the child’s phenotype (parental indirect genetic effect). For example, polygenic scores (individual-level indices of trait-specific genetic endowment; PGS) for educational attainment, constructed from parental variants that were not transmitted to offspring, are associated with offspring attainment14–16. Non-transmitted variants can only affect offspring attainment indirectly, via the environment that parents shape for their children’s development. Complementary evidence of indirect effects of parents’ education-linked genetics on offspring education has also accumulated from sibling and adoption designs14, 15, 17, 18. It is not known whether parental indirect genetic effects on offspring education occur through cognitive or non-cognitive pathways (or both), because studies have not parsed out the contributions of sub-components of the educational attainment PGS.
Importantly, we directly compare estimates of parental indirect genetic effects obtained from different designs. Estimation of genetic associations may involve numerous biases19. Sibling, adoption and non-transmitted allele designs rest on different assumptions and differ subtly in the biases and components that affect the estimated indirect genetic effect. As shown by our data simulations (see Supplementary Note and GitHub), indirect genetic effect estimates from the sibling and non-transmitted allele designs are more strongly biased by assortative mating and population stratification than those from the adoption design. Estimates obtained from the adoption design, however, do not capture prenatal parental environmental effects on child education. The sibling design may estimate parental indirect genetic effects with more bias from sibling indirect genetic effects than the other designs. Triangulation across designs and sensitivity analyses can help detect possible biases and quantify parental indirect genetic effects and other environmental effects.
In the current study (pre-registration: https://osf.io/mk938/), we use a novel approach to estimate the social effects of parents’ cognitive and non-cognitive skills on offspring education. We deploy GWAS-by-subtraction to estimate individuals’ genetic endowments (PGS) for cognitive and non-cognitive skills, and test how much these operate environmentally via parental influences on offspring educational outcomes. We provide a multi-cohort comparison of parental indirect genetic effects in three cohorts of genotyped families in two countries with different educational systems (UK Biobank, UK Twins Early Development Study, Netherlands Twin Register). Each cohort includes multiple achievement outcome measures (i.e., standardised test results and teacher-reported grades in childhood and adolescence) and attainment (i.e., years of completed education reported in adulthood). We triangulate across three complementary study designs for estimating parental indirect genetic effects and assess the presence of components and biases.
Results
GWAS-by-subtraction results
We identified the genetic components of cognitive and non-cognitive skills using Genomic SEM, following Demange et al. 2020, in samples that excluded participants used for polygenic score analyses. Educational attainment and cognitive performance meta-analytic summary statistics (see Methods) were regressed on two independent latent variables, Cog and NonCog (see Supplementary Figure 1). These two latent factors were then regressed on 1,071,804 HapMap3 SNPs in a genome-wide association (GWA) design. The LD score regression-based SNP heritabilities of Cog and NonCog were 0.18 (SE=0.01) and 0.05 (SE=0.00), respectively. More information on the GWAS is presented in Supplementary Table 1.
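To make the subtraction concrete: in the Genomic SEM model, Cog loads on both cognitive performance (CP) and educational attainment (EA), while NonCog loads on EA only, which is the structure of a Cholesky decomposition of the 2×2 genetic covariance matrix. A minimal sketch, assuming hypothetical genetic (co)variances rather than the values estimated here, and leaving out the SNP-level regressions entirely, recovers the implied loadings with NumPy:

```python
import numpy as np


def subtraction_loadings(var_cp, var_ea, cov_cp_ea):
    """Loadings of CP and EA on standardized latent Cog and NonCog factors.

    Rows: (CP, EA); columns: (Cog, NonCog). NonCog's loading on CP is fixed
    to zero, which is exactly the shape of a lower-triangular Cholesky factor.
    """
    genetic_cov = np.array([[var_cp, cov_cp_ea],
                            [cov_cp_ea, var_ea]])
    return np.linalg.cholesky(genetic_cov)


# Hypothetical genetic (co)variances, for illustration only
L = subtraction_loadings(var_cp=0.20, var_ea=0.15, cov_cp_ea=0.10)
print(L.round(3))
```

The EA loading on NonCog equals sqrt(var_ea - cov_cp_ea**2 / var_cp): the part of the EA genetic variance that remains once what EA shares with CP has been removed.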
Descriptive statistics
SNP associations with the Cog and NonCog latent variables provided the weights to create individual-level polygenic scores in 3 cohorts with family data and educational achievement and/or attainment outcomes. Sample sizes for individuals with polygenic score and educational outcome data were: 39,500 UK Biobank siblings, 6,409 UK Biobank adoptees, up to 4,796 DZ twins in the Twins Early Development Study (TEDS), up to 3,163 twins and siblings in the Netherlands Twin Register (NTR), and up to 2,534 NTR individuals with both parents genotyped. Full phenotypic descriptive statistics are available in Supplementary Table 2.
Overview of the three designs for estimating direct and indirect polygenic score effects
To estimate direct offspring-led and indirect parent-led effects of polygenic scores for cognitive and non-cognitive skills on educational outcomes, we considered three family-based genomic designs. The designs are illustrated in Figure 1. All models jointly included Cog and NonCog PGS. Note that population effects are equivalent to PGS effects estimated in standard population analyses that do not use within-family data. In contrast, within-family designs exploit the principles of Mendelian segregation or the natural experiment of adoption to separate direct and indirect/social components of the overall population PGS effect. Importantly, a direct genetic effect is only direct in the sense that it does not originate from another individual’s genotype. Direct effects are also not necessarily ‘purely’ genetic, but lead to educational outcomes via intermediate pathways, and are expressed in the context of environments.
Analytical designs to estimate direct and parental indirect genetic effects. Note: square = observed variable, circle = unobserved / latent variable; β = estimated effect of polygenic score (PGS) on outcome; the population effect of a PGS captures both direct and indirect genetic effects; direct genetic effects (controlling for indirect genetic effects) are represented with solid arrows.
First, the sibling design estimates indirect genetic effects by comparing population-level and within-family (i.e., within-sibling or within-DZ twin) polygenic score associations (equation (1))17. The direct effect of a polygenic score is estimated from genetic differences between siblings, which arise from the random segregation of parental genetic material and are therefore independent of shared family effects (including parental indirect genetic effects). Specifically, the direct effect is estimated using a variable representing each individual’s (i) polygenic score minus the average polygenic score for their family (j): the within-family beta (βWithin in equation (1)). The population effect of a polygenic score is estimated in a separate model, simply regressing the outcome variable on polygenic score differences between individuals from different families (equation (2)). The indirect genetic effect is obtained by subtracting the within-family PGS effect estimate from the population effect estimate.

Note: EA is the educational outcome, and PGS is the polygenic score (PGS(Cog) for Cog and PGS(NonCog) for NonCog). The family-mean PGS refers to the average polygenic score in family j, and i indexes the individual sibling. α0 is the intercept, and PCs are principal components capturing genetic ancestry. See Supplementary Note for a comparison of different versions of this sibling design, using data simulation.
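The subtraction at the heart of the sibling design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ pipeline: it handles a single score with no principal components or second PGS, computes family means with NumPy, and runs on hypothetical toy data in which the true direct effect is 0.15 and the parental indirect component of the population slope is 0.10.

```python
import numpy as np


def ols_slope(y, x):
    """Slope of y on x with an intercept (single predictor, no covariates)."""
    x_c = x - x.mean()
    return float(np.dot(x_c, y - y.mean()) / np.dot(x_c, x_c))


def sibling_indirect(y, pgs, family_id):
    """Population slope, within-family (direct) slope, and their difference (indirect)."""
    _, idx = np.unique(family_id, return_inverse=True)
    family_mean = np.bincount(idx, weights=pgs) / np.bincount(idx)
    b_within = ols_slope(y, pgs - family_mean[idx])  # own PGS minus family-mean PGS
    b_population = ols_slope(y, pgs)
    return {"population": b_population, "direct": b_within,
            "indirect": b_population - b_within}


# Hypothetical toy data: 20,000 families with two siblings each.
# Outcome = 0.15 * own PGS + 0.20 * parental mean PGS + noise, so the parental
# indirect component of the population slope is 0.20 * 0.5 = 0.10.
rng = np.random.default_rng(0)
n_fam = 20_000
parent_mean = np.repeat(rng.normal(0, np.sqrt(0.5), n_fam), 2)
pgs = parent_mean + rng.normal(0, np.sqrt(0.5), 2 * n_fam)
y = 0.15 * pgs + 0.20 * parent_mean + rng.normal(0, 1, 2 * n_fam)
result = sibling_indirect(y, pgs, np.repeat(np.arange(n_fam), 2))
```

Because deviations from the family mean sum to zero within each family, anything constant within a family (including the parental indirect effect) drops out of the within-family slope.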
Second, indirect genetic effects can be estimated by comparing polygenic score associations estimated in a sample of adoptees against those estimated for individuals who were reared by their biological parents18. To do so, we estimate the regression model shown in equation (2) separately for adoptees and for non-adopted individuals.
The population effect is estimated as the polygenic score effect on phenotypic variation among non-adopted individuals (i.e., a combination of direct and indirect genetic mechanisms). The direct genetic effect is the effect of the polygenic score among adoptees. Adoptees do not share genes by descent with their adoptive parents, so we expect their polygenic scores to be uncorrelated with the genotypes of their adoptive parents. Therefore, the polygenic score effect in adoptees cannot be inflated by environmentally mediated parental indirect genetic effects. In this design, the indirect genetic effect is estimated by subtracting this direct PGS effect from the population effect estimated in the non-adopted group. When taking the difference, it is important that the groups are similar in characteristics other than genetic relatedness to their parents. We did not find strong evidence for differences in several demographic and early-life characteristics of adoptees and non-adopted individuals in the UK Biobank (see Supplementary Table 11, Supplementary Note, and Supplementary Figure 2).
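A corresponding sketch of the adoption comparison, again on hypothetical toy data (direct effect 0.15, indirect component 0.10) and without covariates, also returns a standard error for the difference. Adding the two sampling variances assumes that the adoptee and non-adopted estimates are independent, which seems reasonable for non-overlapping groups but is an assumption here.

```python
import numpy as np


def slope_and_se(y, x):
    """OLS slope of y on x (with intercept) and its classical standard error."""
    x_c, y_c = x - x.mean(), y - y.mean()
    sxx = np.dot(x_c, x_c)
    beta = np.dot(x_c, y_c) / sxx
    resid = y_c - beta * x_c
    se = np.sqrt(np.dot(resid, resid) / (len(y) - 2) / sxx)
    return float(beta), float(se)


def adoption_indirect(y_reared, pgs_reared, y_adoptee, pgs_adoptee):
    """Indirect effect = population slope (non-adopted) minus direct slope (adoptees)."""
    b_pop, se_pop = slope_and_se(y_reared, pgs_reared)
    b_direct, se_direct = slope_and_se(y_adoptee, pgs_adoptee)
    # Different people in each group, so the sampling variances simply add
    return b_pop - b_direct, float(np.hypot(se_pop, se_direct))


# Hypothetical toy data: direct effect 0.15, effect of rearing parents' mean PGS 0.20
rng = np.random.default_rng(0)
n = 20_000
parent_mean = rng.normal(0, np.sqrt(0.5), n)
pgs_reared = parent_mean + rng.normal(0, np.sqrt(0.5), n)
y_reared = 0.15 * pgs_reared + 0.20 * parent_mean + rng.normal(0, 1, n)
pgs_adoptee = rng.normal(0, 1, n)  # unrelated to the adoptive parents
adoptive_parent_mean = rng.normal(0, np.sqrt(0.5), n)
y_adoptee = 0.15 * pgs_adoptee + 0.20 * adoptive_parent_mean + rng.normal(0, 1, n)
indirect, indirect_se = adoption_indirect(y_reared, pgs_reared, y_adoptee, pgs_adoptee)
```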
Third, indirect genetic effects can be estimated, and disentangled from direct genetic effects, using information on parental genetic variation that was not transmitted to offspring14, 15 (equation (3)).
The population effect is estimated from a polygenic score based on transmitted variants (βT). Transmitted genetic variants are present in an offspring and in at least one of their parents, and so may influence offspring education via both direct and indirect mechanisms. The parental indirect genetic effect is estimated as the effect of a polygenic score based on parental variants that were not transmitted to offspring (βNT). Non-transmitted variants can only take effect on offspring education through the environment. The direct genetic effect is estimated by partialling out the effect of the non-transmitted polygenic score from that of the transmitted polygenic score (βT - βNT). Maternal and paternal scores are averaged in order to create overall parental non-transmitted polygenic scores.
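For the non-transmitted allele design, one common way to build the scores (not necessarily the one used in NTR) exploits the fact that a child carries one allele from each parent. At every SNP, the two parental alleles that were not passed on therefore sum to g_mother + g_father - g_child, and no phasing is required. The sketch below uses hypothetical dosages and weights. Its non-transmitted score is a sum over both parents rather than the average described above, which only rescales βNT by a factor of two.

```python
import numpy as np


def trio_scores(g_child, g_mother, g_father, weights):
    """Transmitted and non-transmitted PGS from (n_trios, n_snps) allele dosages."""
    g_nt = g_mother + g_father - g_child  # alleles the parents did not pass on
    if np.any((g_nt < 0) | (g_nt > 2)):  # basic (not exhaustive) Mendelian check
        raise ValueError("Mendelian inconsistency in at least one trio")
    return g_child @ weights, g_nt @ weights


def trio_design(y, pgs_t, pgs_nt):
    """beta_T (population), beta_NT (indirect) and beta_T - beta_NT (direct)."""
    X = np.column_stack([np.ones_like(pgs_t), pgs_t, pgs_nt])
    (_, b_t, b_nt), *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"population": b_t, "indirect": b_nt, "direct": b_t - b_nt}


# Hypothetical toy trios: 2,000 trios, 30 SNPs. The child always takes haplotype 0
# from each parent, equivalent to a random pick because the haplotypes are i.i.d.
rng = np.random.default_rng(0)
mother_haps = rng.binomial(1, 0.3, (2_000, 30, 2))
father_haps = rng.binomial(1, 0.3, (2_000, 30, 2))
g_child = mother_haps[..., 0] + father_haps[..., 0]
g_mother, g_father = mother_haps.sum(axis=2), father_haps.sum(axis=2)
weights = rng.normal(0, 1, 30)
pgs_t, pgs_nt = trio_scores(g_child, g_mother, g_father, weights)
parent_mean = (g_mother @ weights + g_father @ weights) / 2  # equals (T + NT) / 2
y = 0.15 * pgs_t + 0.20 * parent_mean  # noise-free, so the betas are recovered exactly
result = trio_design(y, pgs_t, pgs_nt)
```

Since the parental mean score equals (T + NT) / 2, the outcome above is 0.25 * T + 0.10 * NT, so βT = 0.25, βNT = 0.10 and the direct effect is 0.15.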
Parents’ heritable cognitive and non-cognitive skills both influence offspring education indirectly via the environment
In the overall meta-analysis across cohorts, designs and outcomes, the Cog PGS showed a slightly stronger association with educational outcomes than the NonCog PGS (indicated by the total height of the bars in Figure 2a; population βNonCog=0.22, SE=0.01; population βCog=0.25, SE=0.01). We investigated environmental effects of parents’ non-cognitive and cognitive skills on offspring education by estimating the contribution of parental indirect genetic effects to the population effects of NonCog and Cog PGS. Figure 2a shows that, for both NonCog and Cog PGS, indirect genetic effects of parents on offspring education were present (meta-analytic indirect βNonCog=0.08, SE=0.03; indirect βCog=0.10, SE=0.01), in addition to direct genetic effects (direct βNonCog=0.14, SE=0.03; direct βCog=0.15, SE=0.02). Averaged across all designs, outcomes and cohorts, indirect environmentally-mediated effects explained 36% of the population effect of the NonCog PGS, and 40% of the population effect of the Cog PGS.
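The shares quoted above are the ratio of indirect to population effects, and the pooled estimates come from meta-analysis. A minimal sketch, assuming a fixed-effect inverse-variance model (the meta-analytic model is not specified in this section, so this is illustrative only and ignores correlation between estimates from overlapping samples), reproduces the two shares from the rounded estimates:

```python
import numpy as np


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted (fixed-effect) pooled estimate and its standard error."""
    betas, ses = np.asarray(betas, dtype=float), np.asarray(ses, dtype=float)
    w = 1.0 / np.square(ses)
    return float(np.sum(w * betas) / np.sum(w)), float(np.sqrt(1.0 / np.sum(w)))


def indirect_share(indirect, population):
    """Fraction of the population PGS effect attributed to parental indirect effects."""
    return indirect / population


print(round(indirect_share(0.08, 0.22), 2))  # NonCog: 0.36
print(round(indirect_share(0.10, 0.25), 2))  # Cog: 0.4
```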
a. Population effects of NonCog and Cog PGS on educational outcomes include both direct and indirect genetic mechanisms. Indirect genetic effects work through the environment that parents provide for their children. Notes: beta coefficients were obtained from meta-analysis of effects across cohorts, designs and outcome phenotypes; bars = 95% CIs.
b. Estimates of direct and indirect effects of NonCog and Cog PGS by cohort (for age 12 and adult outcomes), using the sibling design only. NTR is a Dutch cohort (N=1,631 for age 12 and N=3,163 for adult outcomes); TEDS (N=2,862) and UKB (N=16,624) are UK cohorts; bars = 95% CIs.
c. Estimates of direct and indirect effects of NonCog and Cog PGS by analytic design (for adult educational attainment outcomes only). Sample sizes: sibling design, N=42,663 (results meta-analysed across UKB and NTR); adoption design, N=6,407 adoptees and 6,500 non-adopted individuals (UKB); non-transmitted allele design, N=2,534 trios (NTR); bars = 95% CIs.
However, results varied depending on the methods used and outcomes investigated. Results per cohort, outcome and design, as well as population genetic effects and the ratio of indirect to population effects, are reported in Supplementary Table 3 and Supplementary Figures 3, 4 and 5. Meta-analytic results are reported in Supplementary Table 4. Z-test results comparing direct and indirect effects are reported in Supplementary Table 5.
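A standard way to compare two estimates is a z-test on their difference. The sketch below assumes the two estimates are independent. Estimates from the same model or from overlapping samples (such as a direct and an indirect effect from one design) are correlated, so a covariance term would be needed, and the tests in Supplementary Table 5 may have been computed differently. As a hypothetical usage, it compares the UK Biobank adoption-based (0.02, SE=0.02) and sibling-based (0.12, SE=0.01) NonCog indirect estimates for educational attainment reported in the following sections.

```python
import math


def z_test(b1, se1, b2, se2):
    """Two-sided z-test for b1 - b2, treating the two estimates as independent."""
    z = (b1 - b2) / math.sqrt(se1**2 + se2**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return z, p


z, p = z_test(0.02, 0.02, 0.12, 0.01)
print(f"z = {z:.2f}, p = {p:.1e}")
```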
Estimates of parental indirect genetic effects vary slightly by age, outcome and cohort
Figure 2b shows estimates of direct and indirect genetic effects of NonCog and Cog PGS for different cohorts and educational outcomes, holding the design constant (i.e., the sibling design, which was available for all cohorts and outcomes). Estimates were highly consistent across cohorts except for age 12 achievement in Dutch versus UK cohorts: indirect genetic effects were negligible and represented a small fraction of the population effect in NTR (3% and 23% for NonCog and Cog, respectively), whereas they accounted for 56% and 48% of the population effects of NonCog and Cog PGS in TEDS. For adult educational attainment, estimates of direct and indirect effects were more similar for the Dutch (NTR: indirect βNonCog=0.11, SE=0.03; indirect βCog=0.06, SE=0.03) and UK (UKB: indirect βNonCog=0.12, SE=0.01; indirect βCog=0.12, SE=0.01) cohorts. See Supplementary Table 3 for full results.
Estimates of indirect genetic effect depend on the analytical design: adoption-based estimates are lower
Figure 2c shows estimates of direct and indirect genetic effects of NonCog and Cog PGS for different designs, holding the phenotype constant (i.e., educational attainment, which was available for all three methods). While estimates obtained with the sibling and non-transmitted allele designs indicate similar indirect effect sizes (indirect βs for educational attainment ranged between 0.11-0.12; see Supplementary Tables 3 and 4), the adoption design yielded low to null indirect genetic effects for both NonCog and Cog PGS (indirect βNonCog=0.02, SE=0.02; indirect βCog=0.08, SE=0.02).
Figure 3 summarises how the three designs estimate parental indirect genetic effects in the presence of different contributors, thus highlighting explanations for lower adoption-based estimates. This information is based on simulations (see Supplementary Note, Supplementary Figure 9, and GitHub). First, unlike the sibling and non-transmitted allele designs, the adoption design does not capture indirect genetic effects occurring in the prenatal period. Second, the adoption design estimates indirect genetic effects with less bias from population stratification and assortative mating. Notably, the adoption design uniquely estimates parental indirect genetic effects without bias from assortative mating if there is no parental indirect genetic effect, and is slightly less biased by assortment than the other designs in the presence of a parental indirect genetic effect. Any excess indirect genetic effect estimated in the sibling/non-transmitted allele designs compared to the adoption design therefore indicates the overall impact of population stratification, assortative mating, and prenatal indirect genetic effects.
Estimates of parental indirect genetic effects from the three designs, based on data simulated to include different components (parental prenatal and postnatal indirect genetic effects) and biases (sibling indirect genetic effects, assortative mating, and population stratification). Boxplots of 100 replicates based on a simulated sample of 20,000 families. Red line is the true simulated (postnatal) parental indirect effect.
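The logic of Figure 3 can be explored with a toy simulation. The sketch below is not the simulation code from the Supplementary Note or GitHub. It is a simplified stand-in that includes only a direct effect and a postnatal parental indirect effect (no prenatal, sibling, assortative mating or stratification components), in which case all three designs should recover the same indirect component, gamma / 2 on this scale. Making the simulated parents’ scores correlated in the parent-generating helper would be one way to start probing the biases summarised above.

```python
import numpy as np


def ols_slopes(y, *predictors):
    """OLS of y on an intercept plus the given predictors; returns the slopes only."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


def simulate_designs(n_fam=20_000, direct=0.15, gamma=0.20, seed=1):
    """Toy families with a direct effect and a postnatal parental indirect effect only.

    Each parent has two haplotype scores (variance 1/2 each), so a parent's PGS has
    variance 1. Outcome = direct * own PGS + gamma * mean PGS of rearing parents + noise.
    """
    rng = np.random.default_rng(seed)
    rows = np.arange(n_fam)

    def new_parents():
        return (rng.normal(0, np.sqrt(0.5), (n_fam, 2)),
                rng.normal(0, np.sqrt(0.5), (n_fam, 2)))

    def child(mom, dad):
        # Mendelian segregation: one randomly chosen haplotype from each parent
        pick_m, pick_d = rng.integers(0, 2, n_fam), rng.integers(0, 2, n_fam)
        transmitted = mom[rows, pick_m] + dad[rows, pick_d]
        non_transmitted = mom[rows, 1 - pick_m] + dad[rows, 1 - pick_d]
        return transmitted, non_transmitted

    def outcome(own_pgs, rearing_mom, rearing_dad):
        parent_mean = (rearing_mom.sum(axis=1) + rearing_dad.sum(axis=1)) / 2
        return direct * own_pgs + gamma * parent_mean + rng.normal(0, 1, n_fam)

    mom, dad = new_parents()
    t1, nt1 = child(mom, dad)
    t2, _ = child(mom, dad)
    y1, y2 = outcome(t1, mom, dad), outcome(t2, mom, dad)

    # 1) Sibling design: population slope minus within-family slope
    pgs, y = np.concatenate([t1, t2]), np.concatenate([y1, y2])
    family_mean = np.tile((t1 + t2) / 2, 2)
    sibling = ols_slopes(y, pgs)[0] - ols_slopes(y, pgs - family_mean)[0]

    # 2) Adoption design: adoptees are reared by parents unrelated to them
    t_adoptee, _ = child(*new_parents())
    y_adoptee = outcome(t_adoptee, *new_parents())
    adoption = ols_slopes(y1, t1)[0] - ols_slopes(y_adoptee, t_adoptee)[0]

    # 3) Non-transmitted allele design: beta_NT is the indirect effect
    _, trio = ols_slopes(y1, t1, nt1)

    # Indirect component: gamma * cov(parent mean, PGS) / var(PGS) = gamma / 2 here
    return {"expected": gamma / 2, "sibling": float(sibling),
            "adoption": float(adoption), "trio": float(trio)}


results = simulate_designs()
```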
With the adoption design, the indirect genetic effect of the NonCog PGS on educational attainment in UK Biobank is 83% lower than with the sibling design, while it is only 33% lower for Cog. This suggests that estimates for NonCog are affected more strongly than Cog by population stratification, assortative mating and/or prenatal indirect genetic effects.
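The percentages follow from the rounded UK Biobank estimates for educational attainment given above (indirect β of 0.02 and 0.08 with the adoption design, versus 0.12 for both scores with the sibling design). A quick check, which may differ slightly from values computed on unrounded estimates:

```python
def percent_lower(adoption, sibling):
    """How much lower the adoption-based indirect estimate is, as % of the sibling-based one."""
    return 100 * (1 - adoption / sibling)


print(round(percent_lower(0.02, 0.12)))  # NonCog: 83
print(round(percent_lower(0.08, 0.12)))  # Cog: 33
```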
Indirect genetic effects from siblings are the only potential source of difference between sibling- and trio-based estimates: positive sibling effects inflate estimates from the sibling design but not from the non-transmitted allele design (see Supplementary Note, Supplementary Figure 9, and GitHub). Since we did not find evidence of differences between results from these two designs, sibling indirect genetic effects are likely to be small or non-existent.
Population stratification and assortative mating, but not sibling indirect effects, might inflate estimates of indirect genetic effects from sibling and non-transmitted allele designs
Although triangulating designs suggested that population stratification, assortative mating, and prenatal indirect genetic effects contribute to the higher estimated parental indirect genetic effects from non-transmitted alleles/sibling designs relative to the adoption design, this approach cannot disentangle the relative importance of these individual biases. To this end, we conducted additional sensitivity analyses to assess the magnitudes of these biases (not pre-registered).
First, we analysed the GWAS summary data on which the polygenic scores were based, using LD score regression to detect population stratification. The LD score regression ratio statistics of the uncorrected educational attainment (EA) and cognitive performance (CP) GWAS were 0.11 (SE=0.01) and 0.06 (SE=0.01), respectively (Supplementary Table 1). These non-null estimates indicated that a small but significant portion of the GWAS signal was potentially attributable to residual population stratification. As CP seems less prone to population stratification than EA, and NonCog captures the genetic variance in EA that is not shared with CP, our estimates of direct and indirect genetic effects of NonCog may have been more biased by population stratification than those of Cog.
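The LD score regression ratio compares the intercept’s excess over 1 with the overall inflation of the mean χ² statistic, ratio = (intercept - 1) / (mean χ² - 1), so a value near 0 means the inflation mostly reflects polygenic signal rather than confounding such as population stratification. A minimal sketch with hypothetical inputs (not the values from Supplementary Table 1):

```python
def ldsc_ratio(intercept, mean_chi2):
    """Share of the inflation in mean chi^2 attributed to the LD score regression intercept."""
    if mean_chi2 <= 1:
        raise ValueError("the ratio is undefined when mean chi^2 <= 1")
    return (intercept - 1) / (mean_chi2 - 1)


# Hypothetical summary statistics: intercept 1.10, mean chi^2 2.00
print(round(ldsc_ratio(1.10, 2.00), 3))
```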
Second, we detected slight evidence of assortative mating, which appeared stronger in the UK than in the Dutch cohorts. In NTR, parental PGS correlations were non-significant (NonCog r=0.03, Cog r=0.02). Sibling PGS intraclass correlations, which are expected to be 0.5 under random mating, ranged between 0.49-0.52 in NTR, and between 0.53-0.56 in TEDS and UK Biobank. This supports the presence of assortative mating on NonCog and Cog PGS, potentially biasing our estimates of indirect genetic effects in the UK cohorts, and less so in our Dutch cohort. See Supplementary Table 6 for full correlations.
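Why 0.5 serves as the benchmark can be shown with a toy model, assuming standardized additive scores, parents drawn from a randomly mating population, and a single generation of assortment with spouse PGS correlation r (real populations may have experienced many generations of assortment, which changes the numbers). Under these assumptions the sibling covariance equals the variance of the parental mean, (1 + r)/2, and each child adds independent segregation variance 1/2, giving an expected sibling correlation of (1 + r)/(2 + r). A Monte Carlo check is included:

```python
import numpy as np


def expected_sibling_corr(r):
    """Sibling PGS correlation after one generation of assortment (spouse correlation r)."""
    return (1 + r) / (2 + r)


def simulate_sibling_corr(r, n_fam=50_000, seed=0):
    """Monte Carlo check with haplotype scores of variance 1/2.

    The father's haplotypes are built to correlate r with the mother's, so the
    parents' PGS (haplotype sums, variance 1) correlate r.
    """
    rng = np.random.default_rng(seed)
    mom = rng.normal(0, np.sqrt(0.5), (n_fam, 2))
    dad = r * mom + np.sqrt(1 - r**2) * rng.normal(0, np.sqrt(0.5), (n_fam, 2))
    rows = np.arange(n_fam)

    def child():
        # One randomly chosen haplotype from each parent
        return mom[rows, rng.integers(0, 2, n_fam)] + dad[rows, rng.integers(0, 2, n_fam)]

    return float(np.corrcoef(child(), child())[0, 1])


print(round(expected_sibling_corr(0.06), 3), round(simulate_sibling_corr(0.3), 3))
```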
Third, we performed three sensitivity analyses, none of which supported the presence of indirect effects of siblings’ NonCog and Cog PGS on individuals’ educational outcomes. Our first approach leveraged sibling polygenic scores, the rationale being that in the presence of a sibling effect, a sibling’s PGS will influence a child’s outcome beyond child and parent PGS. In NTR, siblings’ NonCog or Cog PGS had non-significant effects on achievement and attainment (Supplementary Table 7). In a second approach, we tested the difference in PGS effects on EA between monozygotic (MZ) and dizygotic (DZ) twins. Since MZ twins are more genetically similar than DZ twins, their PGS should capture more of the indirect genetic effect of their twin. In NTR and TEDS, PGS effects were not significantly different between MZs and DZs (Supplementary Table 8 & Supplementary Figure 6). Finally, in UKB, we tested PGS effects on EA given the number of siblings individuals reported having. If having more siblings leads to a stronger sibling effect, this will be captured as an increased effect of an individual’s own PGS on the outcome in the presence of more genetically related siblings. As a negative control, we conducted the same analysis in adoptees. Since adoptees are unrelated to their siblings, their PGS do not capture sibling effects at any family size. NonCog PGS effects weakly increased with number of siblings, but this pattern was also present in adoptees, suggesting confounding by unobserved characteristics of families with numerous children (Supplementary Table 9 & Supplementary Figure 7).
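The logic of the first approach can be sketched with hypothetical toy families. Given both parents’ scores, a sibling’s PGS varies only through random segregation, so its coefficient in a joint model isolates a sibling (indirect) effect. This simplified stand-in for the NTR analysis uses normal noise in place of real segregation and omits covariates.

```python
import numpy as np


def sibling_pgs_model(y, pgs_self, pgs_sib, pgs_mother, pgs_father):
    """OLS of y on own, sibling, mother and father PGS (plus intercept); returns 4 slopes."""
    X = np.column_stack([np.ones_like(y), pgs_self, pgs_sib, pgs_mother, pgs_father])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


def simulate_sibling_effect(sib_effect, direct=0.15, parental=0.20, n_fam=20_000, seed=2):
    """Toy families: child and sibling PGS = parental mean + independent segregation noise."""
    rng = np.random.default_rng(seed)
    mother, father = rng.normal(0, 1, n_fam), rng.normal(0, 1, n_fam)
    parent_mean = (mother + father) / 2
    child = parent_mean + rng.normal(0, np.sqrt(0.5), n_fam)
    sibling = parent_mean + rng.normal(0, np.sqrt(0.5), n_fam)
    y = (direct * child + sib_effect * sibling + parental * parent_mean
         + rng.normal(0, 1, n_fam))
    return sibling_pgs_model(y, child, sibling, mother, father)


with_effect = simulate_sibling_effect(0.05)    # slopes: own, sibling, mother, father
without_effect = simulate_sibling_effect(0.0)
```

With a true sibling effect the second slope should be near 0.05; without one it should be near zero, while the own-PGS slope stays near the direct effect of 0.15.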
Discussion
We used genetic methods to study environmental effects of parents’ skills on child education. We found evidence that characteristics tagged by NonCog and Cog polygenic scores (PGS) are both involved in how parents provide environments conducive to offspring education. Indeed, indirect genetic mechanisms explained 36% of the population effect of the NonCog PGS, and 40% of the population effect of the Cog PGS (population βNonCog=0.22, SE=0.01; population βCog=0.25, SE=0.01). This result was consistent across countries, generations, outcomes and analytic designs, with two notable exceptions. First, estimated parental indirect genetic effects were null for childhood achievement in our Dutch cohort (NTR), but not for comparable outcomes in our UK cohort (TEDS). Second, parental indirect genetic effects estimated with the adoption design were lower than for the sibling and non-transmitted allele designs, particularly for the NonCog PGS. Given our evidence from data simulations that the adoption-based estimates of indirect genetic effects are more robust to population stratification and assortative mating, these biases may contribute substantially in the other two designs, especially for the NonCog PGS. This was supported by results from sensitivity analyses.
This study demonstrates the utility of genetic methods for assessing elusive phenomena: non-cognitive skills, and genuine environmental influences from parents, unconfounded by offspring-led effects of inherited genes. Compared to analysing a set of measured parental non-cognitive skills, our GWAS-by-subtraction approach captures a wider array of traits linked genetically to attainment, and therefore broadly quantifies the overall salience of parents’ non-cognitive skills. Our evidence that parents’ non-cognitive and cognitive skills are both important for children’s education complements the growing literature that has considered effects of specific measured skills within both of these domains4, 5. These studies found that effects of parents’ non-cognitive skills on offspring education were less than half the size of the effects of parents’ cognitive skills. In contrast, we found that indirect genetic effects of the NonCog PGS were almost as large as those of the Cog PGS. This discrepancy is likely to stem from our comprehensive definition of non-cognitive skills, as we do not rely on possibly unreliable and incomplete phenotypic measures. Importantly, the parental indirect genetic effects we have identified may capture proximal forms of ‘nurture’ (e.g., a parent directly training their child’s cognitive skills, or cultivating their child’s learning habits through participation and support) and/or more distal environmental effects (e.g., a parent’s openness to experience leading them to move to an area with good schools). The environmental effects of parents’ non-cognitive and cognitive skills are likely to be larger than we estimate, because our approach only captures effects of parent skills tagged by current GWAS. Polygenic scores index a subset of the common genetic component of parent skills, which is in turn a fraction of the total genetic component (missing heritability20, 21), and cannot account for the non-heritable component of parent skills.
The lower importance of parental indirect genetic effects for child achievement in the Netherlands compared to similar UK outcomes indicates that our UK achievement outcomes more strongly capture variation in family background. This difference could result from the design of these achievement measures: Dutch test results are standardized based on a representative population, whereas UK teacher reports might still be affected by student social background. Societal differences offer another explanation. Some argue that estimates of family shared-environmental variance in twin studies are indicators of social inequality, and this logic holds for indirect genetic effects22. For adult attainment, results were more consistent across UK and Dutch cohorts, corresponding with recent evidence for consistent shared-environment influence on educational attainment across social models23. This consistency also suggests that the difference in childhood is not due to a cohort or population difference. The higher indirect genetic effects for adult attainment than for childhood achievement in the Netherlands might reflect an increase in environmental variance due to tracking in Dutch secondary schools16. Socioeconomic disparities in achievement seem to increase more between ages 10 and 15 in the Netherlands than in the UK24. Despite no statistically significant parental indirect genetic effects on the achievement test at age 12, children whose parents have a higher education are more likely to enroll in a higher educational track25, suggestive of greater parental effects on secondary and later education, which should be tested in further studies.
We found that the choice of design used to estimate indirect genetic effects matters, with the adoption design giving systematically lower estimates. Direct comparison of results across designs suggested that 33% (for Cog) and 83% (for NonCog) of the indirect genetic effects on adult educational attainment estimated using the sibling design are due to population stratification, assortative mating, and prenatal indirect genetic effects. The importance of population stratification for genetic associations with educational attainment was suggested by recent UK Biobank studies26, 27, and was reflected in our sensitivity analyses. Our LD score regression results indicated residual population stratification, which was more severe for the NonCog GWAS. There was some evidence of assortative mating, with sibling PGS correlations above the 0.5 expected under random mating, particularly in the UK cohorts. This country difference in assortment is supported by the lower estimated spouse PGS correlations in NTR (0.02 for Cog, 0.03 for NonCog) than for the EA PGS in the UK Biobank (0.06)28. There was no statistically significant difference in assortative mating between Cog and NonCog, suggesting that population stratification explains the particularly large design-based discrepancy between estimates of indirect genetic effects for NonCog. Population stratification should therefore be carefully considered in studies using NonCog PGS. Methods should be developed to parse the contributions of assortative mating, population stratification, and indirect and direct genetic effects to complex traits. This could be achieved using genomic data on extended pedigrees, inspired by extended twin-family designs29. Additionally, indirect genetic effects on education might arise not only from parents but across more than a single generation, for example through the influence of grandparents. Since cumulative indirect genetic effects are all removed when a child is adopted, their presence would contribute to the observed difference in indirect effects between the adoption and other designs.
Regarding siblings, we found no evidence, using three different approaches, that siblings’ NonCog and Cog PGS have indirect effects on individual differences in educational outcomes. This corresponds with null findings regarding indirect effects of siblings’ educational attainment genetics in the UK Biobank26, 27. It does not rule out the existence of indirect sibling genetic effects in other populations (or effects such as parental compensation for sibling PGS differences30). Indeed, indirect genetic effects of sibling EA PGS have been reported in an Icelandic cohort15.
One extended twin study found that the sibling environment contributed 12% of the total phenotypic variation in educational attainment in Norway, whereas the environment provided by parents explained only 2.5% of the variance31. It is possible that our PGS analyses were not sufficiently powered to detect indirect genetic effects of siblings, since they were based on smaller samples than our main analyses. However, our results suggest that indirect genetic effects of siblings on education are small. Therefore, our methods provide good proxies for parental indirect genetic effects, with minimal inflation from sibling effects.
Our data suggest that the adoption design provides a useful lower-bound estimate of indirect genetic effects of parents. Given that there was no evidence for sibling effects of the Cog or NonCog PGS, our adoption-based estimates, which are less biased by population stratification and assortative mating, are likely a closer measure of parental indirect genetic effects. However, three factors may make the adoption-based estimates of indirect genetic effects too conservative. First, adoption-based estimates exclude prenatal indirect genetic effects (and indirect genetic effects taking place between birth and the moment of adoption), which might influence educational outcomes32, 33. While we are unable to test for prenatal indirect effects, these could be investigated in cohorts with pregnancy information, adjusting for postnatal indirect genetic effects. Second, adoptees may have been exposed to a narrower range of environments (e.g., family socioeconomic status) than non-adopted individuals34. This form of selection bias is likely to increase the genetic variance at the expense of the indirect genetic effect. Third, selective placement of children in adoptive families matching characteristics of their biological families could result in a correlation between child and (adoptive) parent genotypes, leading to an underestimation of the indirect genetic effect. There is modest evidence for selective placement of adoptees based on education in the US35. We cannot directly test for selection factors in the UK Biobank, since there is no information on the adoptive parents.
We acknowledge several limitations. First, while a strength of our study is the broad and phenotype-agnostic characterisation of non-cognitive skills, our GWAS-by-subtraction approach is unable to identify specific parental characteristics, and is still limited by the measures of cognitive performance and educational attainment in the original GWAS. Some cognitive skills might not be reflected in the available Cognitive Performance GWAS, so the NonCog factor could capture genetic influences affecting cognition. However, previous analyses have shown that the NonCog PGS predicts substantially less variation in cognition than the Cog PGS36. Additionally, our NonCog latent variable reflects the residual variance of adult educational attainment, and is therefore a measure of non-cognitive aspects of adult EA. Non-cognitive aspects of childhood achievement might differ somewhat, which might lead to an underestimation of indirect genetic effects of the NonCog PGS on these outcomes. Second, the generalisability of our results is limited. Highly educated individuals are over-represented in all cohorts. Participation bias also affects GWAS results37. Selection effects may be especially strong in the adoption design, as adoptions may depend on (partially heritable) phenotypes of the biological parents, and many adoptive parents are also selected on the basis of their (partially heritable) behavioural phenotypes. Additionally, only participants of European descent were included in the analysis. Third, replication efforts are needed, with special effort to include participants of diverse ancestries. While our overall estimates are well powered due to the aggregation of cohorts, some analyses rely on a single sample. Results from these analyses might therefore reflect specifics of these samples rather than design-specific biases, and should be replicated.
Finally, although our within-family methods allowed us to evaluate biases in polygenic score effects within the target samples, the same biases are likely to influence the effect size estimates from the original GWAS upon which our polygenic scores are based. Increasingly large within-sibship GWAS will allow this to be resolved.
Several future research directions emerge. First, given that we have quantified overall environmental effects of parents on offspring education tagged by NonCog and Cog PGS, the next step is to identify specific mediating parent characteristics, whether proximal or distal.
Researchers could also examine mediating child characteristics on the pathway between parents’ characteristics and children’s educational outcomes. We speculate that parents’ non-cognitive skills do not affect offspring education by affecting those same non-cognitive skills in offspring. This is because existing twin research shows no influence of shared environmental factors on individual differences in children’s measured non-cognitive skills such as grit and self-control38–40.
A second future direction is to incorporate gender and socioeconomic status into research on indirect genetic effects on education. Twin data show that shared environmental contributions to educational attainment are larger for women than for men23. It is unknown whether this finding holds for indirect genetic effects and for childhood achievement. Another gender aspect to consider is differential maternal and paternal indirect genetic effects41. There is some evidence (although not genetically informed) that mother and father skills show unique associations with offspring education4. Indirect effects of parents’ genetic endowment for non-cognitive skills on child education might be mediated or moderated by parents’ income and cultural capital (including school-related skills and habits). While the home learning environment has been found to be more stimulating in higher socioeconomic status families42, 43, there is recent evidence that low-income mothers report more frequent activities that facilitate cognitive stimulation44.
In sum, this study provides evidence for environmental effects of parents’ non-cognitive and cognitive skills on offspring educational outcomes, indexed by indirect genetic effects of polygenic scores. Combining three cohorts and three designs for estimating indirect genetic effects allowed us to obtain robust findings. These results have significance for human health, as parents’ roles in their children’s cognitive development and (mental) health development go hand in hand.
Methods
Our research complies with all relevant ethical regulations. Project approval for the Twins Early Development Study (TEDS) was granted by King’s College London’s ethics committee for the Institute of Psychiatry, Psychology and Neuroscience (PNM/09/10–104). Ethical approval for the Netherlands Twin Register (NTR) was provided by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Center, Amsterdam, and by an Institutional Review Board certified by the U.S. Office of Human Research Protections (IRB number IRB-2991 under Federal-wide Assurance-3703; IRB/institute codes 94/105, 96/205, 99/068, 2003/182, 2010/359), and participants provided informed consent. The UK Biobank has received ethical approval from the National Health Service North West Centre for Research Ethics Committee (reference: 11/NW/0382). Informed consent was obtained from all human participants.
The study methods were pre-registered on the Open Science Framework (https://osf.io/mk938/) on 24 February 2020. Additional analyses that were not pre-registered are indicated as such below and should be considered exploratory. Other deviations from the pre-registration are detailed in the Supplementary Note.
Samples
UK Biobank
The UK Biobank is an epidemiological resource including British individuals aged 40 to 70 at recruitment45. Genome-wide genetic data came from the full release of the UK Biobank data, and were collected and processed according to the UK Biobank quality control pipeline46.
We defined three subsamples of the UK Biobank to be used for polygenic score analyses: adopted participants, a control group of non-adopted participants, and siblings. Starting with UK Biobank participants with QC-ed genotype data and educational attainment data (N=451,229), we first identified 6,407 unrelated adopted individuals who answered yes to the question “Were you adopted as a child?” (Data-Field 1767). We restricted this sample to unrelated participants (kinship coefficient < 1/2^(9/2))47. Second, our comparison sample (N=6,500) was drawn at random from non-adopted participants who were unrelated to each other and to the adopted participants. Third, we identified 39,500 full siblings, excluding adopted individuals. We defined full siblings as participants with a kinship coefficient between 1/2^(5/2) and 1/2^(3/2) and a probability of zero IBS sharing > 0.0012, as suggested previously46, 47.
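The relatedness rules above can be sketched as a small classifier. This is a minimal illustration assuming per-pair KING kinship and IBS0 estimates are already available; the function name, labels and example pairs are hypothetical, and the "parent_offspring" branch reflects the usual KING reading of near-zero IBS0 rather than a step stated in the text.

```python
# KING-style kinship cut-offs on the 2^(-k/2) scale (ref. 47).
FIRST_DEGREE_UPPER = 1 / 2 ** (3 / 2)  # ~0.354; above this: duplicates or MZ twins
FIRST_DEGREE_LOWER = 1 / 2 ** (5 / 2)  # ~0.177
UNRELATED_UPPER = 1 / 2 ** (9 / 2)     # ~0.044; below this: treated as unrelated
IBS0_SIBLING_MIN = 0.0012              # full siblings share zero alleles IBS at some loci


def classify_pair(kinship: float, ibs0: float) -> str:
    """Label a pair of participants from its kinship coefficient and IBS0 proportion."""
    if kinship < UNRELATED_UPPER:
        return "unrelated"
    if kinship >= FIRST_DEGREE_UPPER:
        return "duplicate_or_mz"
    if kinship >= FIRST_DEGREE_LOWER:
        # First-degree pairs: parent-offspring pairs have IBS0 close to zero.
        return "full_sibling" if ibs0 > IBS0_SIBLING_MIN else "parent_offspring"
    return "other_relative"  # second- or third-degree relatives


# Hypothetical example pairs: (kinship, IBS0)
for pair in [(0.01, 0.20), (0.25, 0.004), (0.25, 0.0002), (0.09, 0.05)]:
    print(pair, classify_pair(*pair))
```

The IBS0 check matters only within the first-degree band, where kinship alone cannot separate siblings from parent-offspring pairs.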
After excluding the three sub-samples for polygenic score analyses and individuals related to these participants, we were left with 388,196 UK Biobank individuals with educational attainment (EA) data, and 202,815 individuals with cognitive performance (CP) data. We used these remaining individuals for the GWAS of EA and CP, and later meta-analysis with external GWASs48 (see ‘Statistical Analyses’ and Supplementary Note).
Twins Early Development Study (TEDS)
The Twins Early Development Study (TEDS)49 is a multivariate, longitudinal study of >10,000 twin pairs representative of England and Wales, recruited 1994–1996. The demographic characteristics of TEDS participants and their families closely match those of families in the UK. Analyses were conducted on a sub-sample of dizygotic (DZ) twin pairs with genome-wide genotyping and phenotypic data on school achievement at age 12 (1,431 DZ pairs) and age 16 (2,398 pairs).
Netherlands Twins Register (NTR)
The Netherlands Twin Register (NTR)50 was established by the Department of Biological Psychology at the Vrije Universiteit Amsterdam and recruits child and adult twins for longitudinal research. Data on health, personality, lifestyle and other domains, as well as genotype data, have been collected on participants and their families.
We included genotyped participants of European ancestry in our analyses. We created a subsample of full siblings. NTR contains information on numerous monozygotic (MZ) multiples (twins or triplets). Because MZ multiples share the same genes, we randomly retained one individual per MZ multiple and excluded the others. Only siblings with complete genetic and outcome data were subsequently included in the analyses: 1,631 siblings with CITO (achievement test taken during the last year of primary school) data (from 757 families) and 3,163 siblings with EA data (from 1,309 families).
We also created a subsample with complete offspring, maternal and paternal genotype data (i.e., trios). Among individuals with available parental genotypes, 1,526 (from 765 families) and 2,534 (from 1,337 families) had CITO and EA data, respectively.
The sibling and trio subsets are not independent: 823 participants are present in both subsets for CITO, and 1,374 for EA.
Phenotypic Measures
UK Biobank
Educational attainment and cognitive performance phenotypes were defined following Lee et al. 2018 48. Using data-field 6238, educational attainment was defined according to ISCED categories and coded as the number of years of education. The response categories were coded as follows: none of the above (no qualifications) = 7 years of education; Certificate of Secondary Education (CSEs) or equivalent = 10 years; O levels/GCSEs or equivalent = 10 years; A levels/AS levels or equivalent = 13 years; other professional qualification = 15 years; National Vocational Qualification (NVQ), Higher National Diploma (HND), Higher National Certificate (HNC) or equivalent = 19 years; college or university degree = 20 years. For cognitive performance, we used the (standardized) mean of the standardized fluid intelligence scores (data-field 20016 for the in-person and 20191 for the online assessment).
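A minimal sketch of the two phenotype definitions above, assuming responses have already been mapped to the short category labels used here (the labels are hypothetical). Two choices are assumptions not stated in the text: when a participant reports several qualifications the highest coded value is kept, and the cognitive score averages whichever assessments a participant completed.

```python
import statistics

# Years-of-education coding listed above; the dictionary keys are
# hypothetical short labels for the UK Biobank response categories.
YEARS_OF_EDUCATION = {
    "none_of_the_above": 7,
    "cse": 10,
    "o_level_gcse": 10,
    "a_level": 13,
    "other_professional": 15,
    "nvq_hnd_hnc": 19,
    "college_or_university_degree": 20,
}


def years_of_education(qualifications):
    """Assumption: when several qualifications are reported, keep the highest coded value."""
    return max(YEARS_OF_EDUCATION[q] for q in qualifications)


def zscore(values):
    """Standardize a list, leaving missing entries (None) as None."""
    observed = [v for v in values if v is not None]
    mean, sd = statistics.mean(observed), statistics.stdev(observed)
    return [None if v is None else (v - mean) / sd for v in values]


def cognitive_performance(in_person, online):
    """Standardize each assessment, average per person, then re-standardize the mean."""
    averaged = []
    for a, b in zip(zscore(in_person), zscore(online)):
        scores = [s for s in (a, b) if s is not None]
        averaged.append(statistics.mean(scores) if scores else None)
    return zscore(averaged)


print(years_of_education(["cse", "a_level"]))  # 13
```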
TEDS
Educational achievement at age 12 was assessed by teacher reports, aggregated across the three core subjects (Mathematics, English, and Science).
Educational achievement at age 16 was assessed by self-reported results for standardized tests taken at the end of compulsory education in England, Wales and Northern Ireland (General Certificate of Secondary Education, GCSE). GCSE grades were coded from 4 (G; the minimum pass grade) to 11 (A*; the highest possible grade). As with the age 12 measure, we analysed a variable representing the mean score for the compulsory core subjects.
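The grade coding and core-subject averaging can be sketched as follows. The subject names are hypothetical, and skipping missing or non-passing entries is an assumption for illustration rather than a documented TEDS rule.

```python
# Coding described above: G (minimum pass) = 4 up to A* (highest grade) = 11.
GCSE_POINTS = {"G": 4, "F": 5, "E": 6, "D": 7, "C": 8, "B": 9, "A": 10, "A*": 11}


def gcse_core_mean(grades):
    """Mean coded grade across core subjects; entries outside G-A* (e.g. None) are skipped."""
    points = [GCSE_POINTS[g] for g in grades.values() if g in GCSE_POINTS]
    return sum(points) / len(points) if points else None


# Hypothetical pupil with grades in the three core subjects.
print(gcse_core_mean({"mathematics": "A*", "english": "B", "science": "C"}))
```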
NTR
Educational attainment was measured by self-report of the highest obtained degree51. This measure was re-coded as the number of years in education, following Okbay et al. 2016 52.
Academic achievement is assessed in the Netherlands by a nationwide standardized educational performance test (CITO), taken around the age of 12 during the last year of primary education. In combination with teacher advice, CITO is used to determine tracking placement in secondary school. The total score ranges from 500 to 550, reflecting the child’s position relative to the other children taking the test in the same year.
Genotype quality control
UK Biobank
SNPs from HapMap3 CEU (1,345,801 SNPs) were extracted from the imputed UK Biobank dataset. We then performed a pre-PCA QC on unrelated individuals, filtering out SNPs with MAF < .01 or missingness > .05, which left 1,252,123 SNPs. After removing individuals with non-European ancestry, we repeated the SNP QC on unrelated Europeans (N = 312,927), excluding SNPs with MAF < .01, missingness > .05 or HWE p < 10⁻¹⁰, which left 1,246,531 SNPs. The HWE p-value threshold of 10⁻¹⁰ was based on: http://www.nealelab.is/blog/2019/9/17/genotyped-snps-in-uk-biobank-failing-hardy-weinberg-equilibrium-test. We then created a dataset of 1,246,531 QC-ed SNPs for 456,064 UKB subjects of European ancestry. Principal components were derived from a subset of 131,426 genotyped SNPs, after LD pruning (r² > 0.2) and removal of long-range LD regions53. PCA was conducted on unrelated individuals using flashPCA v2 54.
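The sketch below only restates the thresholds of the second QC round as a filter, assuming per-SNP MAF, missingness and HWE p-values have already been computed (in practice this is done with genotype QC software; the SnpStats record and passes_qc function are hypothetical).

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    maf: float      # minor allele frequency
    missing: float  # proportion of missing genotype calls
    hwe_p: float    # Hardy-Weinberg equilibrium test p-value


def passes_qc(s: SnpStats, min_maf=0.01, max_missing=0.05, min_hwe_p=1e-10) -> bool:
    """Keep a SNP only if it clears every threshold used in the second QC round."""
    return s.maf >= min_maf and s.missing <= max_missing and s.hwe_p >= min_hwe_p


snps = [
    SnpStats("rs_a", 0.20, 0.01, 0.30),   # kept
    SnpStats("rs_b", 0.004, 0.01, 0.30),  # rare: dropped
    SnpStats("rs_c", 0.20, 0.08, 0.30),   # poorly called: dropped
    SnpStats("rs_d", 0.20, 0.01, 1e-12),  # HWE failure: dropped
]
print([s.snp_id for s in snps if passes_qc(s)])  # ['rs_a']
```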
TEDS
Two different genotyping platforms were used because genotyping was undertaken in two separate waves. Affymetrix GeneChip 6.0 SNP arrays were used to genotype 3,665 individuals. Additionally, 8,122 individuals (including 3,607 DZ co-twin samples) were genotyped on Illumina HumanOmniExpressExome-8v1.2 arrays. After quality control, 635,269 SNPs remained for Affymetrix GeneChip 6.0 genotypes, and 559,772 SNPs for HumanOmniExpressExome genotypes.
Genotypes from the two platforms were separately phased and imputed into the Haplotype Reference Consortium (release 1.1) through the Sanger Imputation Service before merging. Genotypes from a total of 10,346 samples (including 3,320 DZ twin pairs and 7,026 unrelated individuals) passed quality control, including 3,057 individuals genotyped on Affymetrix and 7,289 individuals genotyped on Illumina. The identity-by-descent (IBD) between individuals was < 0.05 for 99.5% of pairs in the merged sample excluding the DZ co-twins (range = 0.00–0.12), and ranged between 0.36 and 0.62 for the DZ twin pairs (mean = 0.49). There were 7,363,646 genotyped or well-imputed SNPs (for full details of genotype processing and quality control, see ref. 55).
To ease the high computational demands of the current study, we excluded SNPs with MAF < 1% or info < 1. Following this, 619,216 SNPs were included in polygenic score construction.
Principal components were derived from a subset of 39,353 common (MAF > 5%), perfectly imputed (info = 1) autosomal SNPs, after stringent pruning to remove markers in linkage disequilibrium (r² > 0.1) and excluding high linkage disequilibrium genomic regions to ensure that only genome-wide effects were detected.
NTR
Genotyping was done on multiple platforms, following manufacturers’ protocols: Perlegen-Affymetrix, Affymetrix 6.0, Affymetrix Axiom, Illumina Human Quad Bead 660, Illumina Omni 1M and Illumina GSA. For each genotype platform, samples were removed if DNA sex did not match the expected phenotype, if the PLINK heterozygosity F statistic was < -0.10 or > 0.10, or if the genotyping call rate was < 0.90. SNPs were excluded if the MAF was < 1×10⁻⁶, if the Hardy-Weinberg equilibrium p-value was < 1×10⁻⁶, and/or if the call rate was < 0.95. The genotype data were then aligned with the 1000 Genomes reference panel using the HRC and 1000 Genomes checking tool, testing and filtering for SNPs with allele frequency differences larger than 0.20 compared to the CEU population, palindromic SNPs and DNA strand issues. The data from the different platforms were then merged into a single dataset, and one platform was chosen for each individual. Based on the ∼10.8k SNPs that all platforms have in common, DNA identity-by-descent state was estimated for all individual pairs using the Plink and King programs. Samples were excluded if these estimates did not correspond to expected familial relationships. CEU population outliers, identified per platform by projection onto 1000 Genomes PCs with the Smartpca software, were removed from the data. Then, per platform, the data were phased using Eagle and imputed to 1000 Genomes and TOPMed using Minimac, following the Michigan imputation server protocols. Post-imputation, the resulting separate platform VCF files were merged with Bcftools into a single file per chromosome for each reference panel, keeping SNPs present on all platforms. For polygenic scoring and parental re-phasing, the imputed data were converted to best-guess genotypes and filtered to include only ACGT SNPs, SNPs with MAF > 0.01, HWE p > 10⁻⁵ and a genotype call rate > 0.98, and to exclude SNPs with more than 2 alleles. All Mendelian errors were set to missing.
The remaining SNPs form the transmitted alleles dataset. Twenty PCs were calculated with Smartpca using LD-pruned 1000 Genomes-imputed SNPs genotyped on at least one platform, with MAF > 0.05 and not located in long-range LD regions. Using the --tucc option of the Plink 1.07 software, pseudo-controls were created for each offspring given the genotype data of their parents. This resulted in the non-transmitted alleles dataset, as these pseudo-controls correspond to the child’s non-transmitted alleles. To determine the parental origin of each allele, the transmitted and non-transmitted datasets were phased using the duoHMM option of the ShapeIT software. The phased datasets were then split by parental origin, resulting in paternal and maternal haploid datasets for the transmitted and non-transmitted alleles.
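For unphased biallelic genotypes coded as allele counts, a child's non-transmitted alleles follow from a simple identity: the two parents carry four alleles, the child carries the two that were transmitted, so the non-transmitted count is father + mother - child. The sketch below illustrates that identity with an explicit Mendelian-consistency check; it is not the PLINK --tucc implementation, and it does not recover parent of origin, which the text obtains by phasing with duoHMM. The function name is hypothetical.

```python
import numpy as np


def nontransmitted_dosage(father, mother, child):
    """Non-transmitted allele counts per SNP from unphased trio genotypes (0/1/2)."""
    father, mother, child = (np.asarray(g, dtype=int) for g in (father, mother, child))
    # A parent must transmit one copy if homozygous for it (2) and cannot if it has none (0).
    lowest = (father == 2).astype(int) + (mother == 2).astype(int)
    highest = (father >= 1).astype(int) + (mother >= 1).astype(int)
    if np.any((child < lowest) | (child > highest)):
        raise ValueError("Mendelian inconsistency in at least one SNP")
    # Four parental alleles in total; the child carries the two that were transmitted.
    return father + mother - child


print(nontransmitted_dosage([2, 1, 0], [1, 1, 0], [1, 2, 0]))  # [2 0 0]
```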
Statistical analyses
All statistical tests are two-sided, unless otherwise stated.
NonCog GWAS-by-subtraction
To generate NonCog summary statistics, we implemented a GWAS-by-subtraction using Genomic SEM following Demange et al. 2020 using summary statistics of EA and cognitive performance obtained in samples independent from our polygenic score samples.
We ran GWASs of educational attainment and cognitive performance in the UK Biobank, leaving out the polygenic score samples. We meta-analysed them with the EA GWAS by Lee et al. (excluding the 23andMe, UK Biobank and NTR cohorts) and with the CP GWAS by Trampush et al., respectively (EA total N=707,112 and CP N=238,113). More information on these methods and the intermediate GWASs can be found in the Supplementary Note and Supplementary Table 1.
Following Demange et al. 2020, we used the meta-analysed EA and CP summary statistics to create two independent latent variables: Cog, representing the genetic variance shared between EA and CP, and NonCog, representing the residual genetic variance of EA after regressing out CP (Supplementary Figure 1). Each latent factor was regressed on each SNP, yielding associations for 1,071,804 SNPs (HapMap3 SNPs, as recommended when comparing PGS analyses across cohorts). We calculated the effective sample size of these GWASs to be 458,211 for NonCog and 223,819 for Cog.
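The text does not state how the effective sample sizes were computed. One common approach for latent-factor GWAS from Genomic SEM, assumed here for illustration, inverts the expected sampling variance of a SNP effect on a standardized outcome and averages over SNPs with MAF between 10% and 40%. The function name and toy inputs are hypothetical.

```python
import numpy as np


def effective_n(se, maf, maf_range=(0.10, 0.40)):
    """Average implied sample size over SNPs with MAF inside maf_range.

    For a standardized outcome, var(beta_hat) ~ 1 / (N * 2p(1 - p)),
    so each SNP implies N ~ 1 / (2p(1 - p) * se^2).
    """
    se, maf = np.asarray(se, float), np.asarray(maf, float)
    keep = (maf >= maf_range[0]) & (maf <= maf_range[1])
    implied = 1.0 / (2 * maf[keep] * (1 - maf[keep]) * se[keep] ** 2)
    return float(implied.mean())


# Toy check: standard errors generated for N = 10,000; the MAF = 0.05 SNP is excluded.
maf = np.array([0.2, 0.3, 0.05])
se = np.sqrt(1 / (10_000 * 2 * maf * (1 - maf)))
print(round(effective_n(se, maf)))
```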
Polygenic Score construction in UK Biobank, TEDS and NTR
Polygenic scores of NonCog and Cog were computed with Plink software (version 1.9 for NTR, 2 for UKB and TEDS)56, 57 based on weighted betas obtained with the LDpred v1.0.0 software, using the infinitesimal prior, an LD pruning window of 250kb and the 1000 Genomes phase 3 CEU population as LD reference. Weighted betas were computed in a shared pipeline. In NTR, scores for non-transmitted and transmitted genotypes were obtained separately for fathers and mothers, which we averaged to obtain the mid-parent score.
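Conceptually, each score is a weighted sum of allele counts, and the NTR mid-parent score is the mean of the paternal and maternal scores. A minimal sketch with hypothetical weights and haploid non-transmitted genotypes (0/1 per parent); PLINK's own scaling options and missing-genotype handling are not reproduced, and constant rescaling does not matter once scores are standardized.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted allele count: (individuals x SNPs) dosages times per-SNP weights."""
    return np.asarray(dosages, float) @ np.asarray(weights, float)


def midparent_score(father_score, mother_score):
    """Average of the paternal and maternal scores."""
    return (np.asarray(father_score, float) + np.asarray(mother_score, float)) / 2


weights = np.array([0.10, -0.20, 0.05])       # hypothetical LDpred weights
father_nt = np.array([[0, 1, 1], [1, 1, 0]])  # paternal non-transmitted alleles, 2 families
mother_nt = np.array([[1, 0, 0], [0, 1, 1]])  # maternal non-transmitted alleles
print(midparent_score(polygenic_score(father_nt, weights), polygenic_score(mother_nt, weights)))
```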
Polygenic score model fitting
Each model included cognitive and non-cognitive polygenic scores simultaneously and controlled for: 10 ancestry principal components (PCs), sex and age, interaction between sex and age, and cohort-specific platform covariate (NTR: genotyping platform, UKB: array, TEDS: batch). Polygenic scores and outcome variables were scaled. Age was estimated by year of birth, age at recruitment or age at testing depending on the cohorts, see Supplementary Table 2.
Correlations between the NonCog and Cog PGS, as well as between- and within-family PGS correlations, are reported in Supplementary Table 10.
All regressions were linear models fitted with lm() in R, rather than mixed models as in previous analyses16, 17 and our pre-registered methods. See Supplementary Note: Deviation from pre-registered methods for the justification based on simulated data. We obtained bootstrapped standard errors and bias-corrected confidence intervals (normal approximation) for the population, indirect and direct effects, as well as for the ratios of indirect/direct and indirect/population effects. We ran ordinary non-parametric bootstraps using 10,000 replications with boot() in R. For the sibling design, where two separate regressions are used, we used the same bootstrap samples for both (both regressions were run within the same boot object). For the adoption design, the bootstrap samples were drawn from the adopted and non-adopted samples separately. The bootstrap estimates were used to test the difference between the direct and indirect effects for both Cog and NonCog, and the difference in the indirect/population ratio between Cog and NonCog, using Z-tests.
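To illustrate the adoption-design bootstrap described above, the sketch below resamples adoptees and non-adopted participants separately and forms bias-corrected normal-approximation intervals (estimate minus bootstrap bias, plus or minus z times the bootstrap SE, the same form as the normal interval from R's boot.ci). It assumes the usual adoption-design logic, where the PGS slope in adoptees estimates the direct effect and the slope in non-adopted participants estimates the population effect, so their difference is the indirect effect. Covariates are omitted, the data are simulated, and fewer replications are used than the 10,000 in the text; all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def slope(x, y):
    """OLS slope of y on x with an intercept (covariates omitted for brevity)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def adoption_estimates(pgs_a, y_a, pgs_n, y_n):
    """Population, direct, indirect effects and the indirect/population ratio."""
    direct = slope(pgs_a, y_a)      # adoptees: PGS effect without biological-parent nurture
    population = slope(pgs_n, y_n)  # non-adopted: direct + indirect
    indirect = population - direct
    return np.array([population, direct, indirect, indirect / population])


def stratified_bootstrap(pgs_a, y_a, pgs_n, y_n, reps=1000, level=0.95, seed=1):
    """Resample adoptees and non-adopted separately; normal-approximation bias-corrected CIs."""
    rng = np.random.default_rng(seed)
    pgs_a, y_a, pgs_n, y_n = (np.asarray(v, float) for v in (pgs_a, y_a, pgs_n, y_n))
    t0 = adoption_estimates(pgs_a, y_a, pgs_n, y_n)
    draws = np.empty((reps, t0.size))
    for r in range(reps):
        ia = rng.integers(0, y_a.size, y_a.size)   # resample adoptees
        i_n = rng.integers(0, y_n.size, y_n.size)  # resample non-adopted separately
        draws[r] = adoption_estimates(pgs_a[ia], y_a[ia], pgs_n[i_n], y_n[i_n])
    se = draws.std(axis=0, ddof=1)
    bias = draws.mean(axis=0) - t0
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return t0, se, t0 - bias - z * se, t0 - bias + z * se


# Simulated example: direct effect 0.2, indirect effect 0.1 (hypothetical values).
rng = np.random.default_rng(0)
pgs_a, pgs_n = rng.normal(size=3000), rng.normal(size=3000)
y_a = 0.2 * pgs_a + rng.normal(size=3000)
y_n = 0.3 * pgs_n + rng.normal(size=3000)
est, se, lo, hi = stratified_bootstrap(pgs_a, y_a, pgs_n, y_n, reps=500)
for name, e, l, h in zip(["population", "direct", "indirect", "ratio"], est, lo, hi):
    print(f"{name}: {e:.3f} [{l:.3f}, {h:.3f}]")
```

Resampling each group separately keeps the adoptee and non-adopted sample sizes fixed across replicates, mirroring the separate draws described in the text.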
Additional analyses (not pre-registered)
Meta-analyses
To estimate the overall indirect and direct effects of NonCog and Cog polygenic scores, we meta-analysed estimates across cohorts, designs and phenotypic outcomes.
To compare results obtained across the three different designs, we meta-analysed effect sizes obtained from each design across cohorts, but holding the outcome constant (educational attainment). The adoption design was only applied to EA in UKB, hence no meta-analysis was necessary.
Meta-analyses were conducted using the function rma.mv() in the R package metafor. Design was specified as a random intercept factor, except when results were meta-analysed within design.
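For intuition about random-effects pooling, the sketch below implements the simpler single-level DerSimonian-Laird estimator. This is a swapped-in technique for illustration: the analyses used metafor's rma.mv(), a multilevel model with design as a random intercept, which this function does not reproduce. The example estimates are hypothetical.

```python
import numpy as np


def random_effects_meta(estimates, ses):
    """DerSimonian-Laird random-effects pooling of independent estimates."""
    y = np.asarray(estimates, float)
    v = np.asarray(ses, float) ** 2
    w = 1 / v
    fixed = (w @ y) / w.sum()                # inverse-variance (fixed-effect) mean
    q = w @ (y - fixed) ** 2                 # Cochran's Q
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max(0.0, (q - (y.size - 1)) / c)  # between-estimate variance
    w_re = 1 / (v + tau2)
    pooled = (w_re @ y) / w_re.sum()
    return float(pooled), float(np.sqrt(1 / w_re.sum())), float(tau2)


# Hypothetical indirect-effect estimates from three cohorts.
print(random_effects_meta([0.08, 0.12, 0.05], [0.02, 0.03, 0.02]))
```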
Investigation of the presence of biases
Population stratification
Population stratification refers to the presence of systematic differences in allele frequencies across subpopulations, arising from ancestry differences due to non-random mating and genetic drift. This leads to confounding in genetic association studies. In a PGS analysis, bias due to population stratification can arise from both the GWAS used to create the scores and the target sample. We corrected for population stratification in the target sample by adjusting analyses for PCs (although this may not remove fine-scale stratification). For the GWAS summary statistics, the ratio statistic from LD score regression (LDSC) is a standard measure of population stratification58. As a rule of thumb, an LDSC intercept above 1 (inflation) indicates the presence of population stratification. Because we corrected the standard errors of the EA GWAS for inflation, and Genomic SEM also corrects for inflation, the ratio statistics of the Cog and NonCog GWASs are not a valid indication of population stratification (the ratio is <0 following GC correction). We therefore used the ratio statistics of the uncorrected EA and CP GWASs as proxies. The ratio and LD score intercept were assessed with the ldsc software58.
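The LDSC ratio compares the intercept's excess over 1 with the excess of the mean chi-square statistic over 1, that is, ratio = (intercept - 1) / (mean χ² - 1). A short sketch with hypothetical values shows why a GC-corrected GWAS, whose intercept can fall below 1, yields a negative ratio.

```python
def ldsc_ratio(intercept: float, mean_chi2: float) -> float:
    """LDSC ratio: share of the inflation in mean chi^2 attributed to causes other than polygenicity."""
    if mean_chi2 <= 1:
        raise ValueError("ratio is undefined when mean chi^2 <= 1")
    return (intercept - 1) / (mean_chi2 - 1)


# Hypothetical values: an uncorrected GWAS and a GC-corrected one (intercept < 1).
print(ldsc_ratio(1.05, 1.50))  # ≈ 0.1
print(ldsc_ratio(0.98, 1.50))  # negative, as after GC correction
```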
Assortative mating
Assortative mating refers to non-random mate choice, with a preference for spouses with similar phenotypes. If these preferred phenotypes have a genetic component, assortative mating leads to increased genetic correlations between spouses, as well as between relatives28.
Assortative mating can therefore be inferred from elevated correlations between polygenic scores in siblings (correlations would be 0.5 without assortative mating) and between parents (correlations would be 0 without assortative mating). We estimated sibling intraclass correlations of Cog and NonCog PGS in UKB, TEDS and NTR, and Pearson’s correlations of paternal and maternal Cog and NonCog PGS in NTR. Notably, these observed correlations cannot distinguish assortative mating from population stratification.
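A sibling PGS correlation can be estimated as a one-way intraclass correlation. The paper does not specify the ICC variant, so the sketch assumes ICC(1) for two siblings per family and checks it on simulated scores whose expected correlation is 0.5, the value expected under random mating and no stratification. The function name is hypothetical.

```python
import numpy as np


def pair_icc(first, second):
    """One-way random-effects ICC(1) for families with exactly two siblings."""
    pairs = np.column_stack([first, second]).astype(float)
    n = pairs.shape[0]
    family_means = pairs.mean(axis=1)
    msb = 2 * ((family_means - pairs.mean()) ** 2).sum() / (n - 1)  # between families
    msw = ((pairs - family_means[:, None]) ** 2).sum() / n           # within families
    return float((msb - msw) / (msb + msw))


# Simulated sibling scores under random mating: half the score variance is shared.
rng = np.random.default_rng(42)
shared = rng.normal(scale=np.sqrt(0.5), size=20_000)
sib1 = shared + rng.normal(scale=np.sqrt(0.5), size=20_000)
sib2 = shared + rng.normal(scale=np.sqrt(0.5), size=20_000)
print(round(pair_icc(sib1, sib2), 2))
```

A value clearly above 0.5 in real data would be consistent with assortative mating, although, as noted above, it cannot be distinguished from population stratification.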
Sibling effects
We performed three additional analyses to investigate indirect genetic effects of siblings on educational outcomes.
First, we ran a linear mixed model extending our main non-transmitted alleles design to include polygenic scores of siblings (equation (4)). To this end, we used data from NTR on DZ pairs and both of their parents. Sample sizes of genotyped ‘quads’ with offspring CITO or EA phenotypes were 657 and 788, respectively.

Second, we can also assess the presence of sibling genetic effects using monozygotic (MZ) and dizygotic (DZ) twins. Because MZ twins have the same genotypes, the genetically mediated environment provided by the co-twin is more strongly correlated with the twin’s own genotype in MZ twins than in DZ twins. The sibling genetic effect is therefore more strongly reflected in the polygenic score prediction of the educational outcome for MZ twins than for DZ twins. If the sibling genetic effect is negative, the polygenic score effect (beta) on the outcome will be lower in people with an MZ twin than in people with a DZ twin; if the sibling genetic effect is positive, it will be higher. We therefore compared betas from equation (2) in a subset of MZ twins and in a subset of DZ twins (one individual per pair) in both NTR (CITO: N=818 MZ and N=865 DZ; EA: N=1,600 MZ and N=1,369 DZ) and TEDS (N=546 MZ and N=2,709 DZ).
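The text does not give the test used to compare the MZ and DZ betas. Because the two subsets contain different individuals, one simple option, assumed here, is a z-test on the difference between two independent coefficients; the betas and standard errors below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_betas(beta_mz, se_mz, beta_dz, se_dz):
    """Two-sided z-test for a difference between coefficients from independent samples."""
    z = (beta_mz - beta_dz) / sqrt(se_mz ** 2 + se_dz ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical betas: a larger MZ beta would point to a positive sibling genetic effect.
z, p = compare_betas(0.30, 0.03, 0.26, 0.04)
print(f"z = {z:.2f}, p = {p:.3f}")
```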
Third, the presence of sibling genetic effects can be assessed using data on the number of siblings participants have. If an individual has more siblings, we expect their polygenic scores to be more strongly correlated with sibling effects. As the number of siblings increases (assuming a linear increase), so does the degree to which a PGS captures sibling effects. If the sibling genetic effect is positive, the effect of the Cog and NonCog PGS on the educational outcome should increase with the number of siblings. However, family characteristics and environment might differ across families depending on the number of children. Therefore, changes in the effect of the PGS on our outcome with the number of siblings could be due to factors other than sibling genetic effects (for example, there is a known negative genetic association between number of children and EA59, which could result in confounding). By also looking at changes in the effect of the Cog and NonCog PGS on the educational outcome in adopted (unrelated) sibships, we break the correlation between the PGS and any sibling effects. If the PGS effect on the educational outcome in adopted children changes with the number of (non-biological) siblings, we can assume this change to be caused by mechanisms other than a sibling effect. We finally contrast the change in PGS effect with family size between biological and adopted siblings to estimate the sibling effect net of other confounding effects of family size. We used the total number of reported siblings (full siblings for non-adopted and adopted siblings for adopted individuals; data-fields 1873, 1883, 3972 & 3982).
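One way to formalise the family-size contrast, assumed here because the exact model is not given in the text, is to estimate a PGS × number-of-siblings interaction separately in biological and adopted sibships and take the difference. The sketch omits covariates and uses noise-free toy data; all names are hypothetical.

```python
import numpy as np


def pgs_by_family_size(y, pgs, n_sibs):
    """Slope of the PGS x number-of-siblings interaction from OLS (no covariates)."""
    y, pgs, n_sibs = (np.asarray(a, float) for a in (y, pgs, n_sibs))
    X = np.column_stack([np.ones_like(y), pgs, n_sibs, pgs * n_sibs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[3])


def sibling_effect_contrast(biological, adopted):
    """Interaction slope in biological sibships minus that in adopted sibships."""
    return pgs_by_family_size(*biological) - pgs_by_family_size(*adopted)


# Noise-free toy data: the PGS slope rises by 0.02 per sibling only in biological families.
rng = np.random.default_rng(0)
pgs, n_sibs = rng.normal(size=500), rng.integers(0, 6, size=500)
y_bio = 0.1 + 0.3 * pgs + 0.05 * n_sibs + 0.02 * pgs * n_sibs
y_adopted = 0.1 + 0.3 * pgs + 0.05 * n_sibs
print(sibling_effect_contrast((y_bio, pgs, n_sibs), (y_adopted, pgs, n_sibs)))
```

Any family-size effect shared by both groups cancels in the difference, which is the intent of contrasting biological with adopted sibships.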
Data availability
Summary statistics of Cog and NonCog used in this paper are available upon request. Summary statistics of cognitive performance from the COGENT cohort, and of EA excluding the NTR and UK Biobank cohorts, are available upon request to the corresponding authors of the respective papers.
For UK Biobank dataset access, see: https://www.ukbiobank.ac.uk/using-the-resource/.
Netherlands Twin Register data may be accessed, upon approval of the data access committee, email: ntr.datamanagement.fgb{at}vu.nl
Researchers can apply for access to TEDS data: https://www.teds.ac.uk/researchers/teds-data-access-policy
Code availability
All scripts used to run the analyses (empirical and simulated) are available at: https://github.com/PerlineDemange/GeneticNurtureNonCog
All additional software used to perform the analyses are available online. The pre-registration of the study is available on OSF: https://osf.io/mk938/
Author Contributions
RC & PAD conceived and designed the study, with helpful contributions from MGN. PAD & RC analysed the data, with help from JJH to obtain polygenic score weights and AA to perform GWAS in UK Biobank. PAD, MGN, RC, and EME performed the simulation study. RC & PAD wrote the manuscript. JJH, AA, EME, MM, BWD, ELdZ, KR, TCE, DIB, EvB, and GB contributed to the interpretation of data, provided feedback on manuscript drafts and approved the final draft.
Competing Interests
The authors declare no competing interests.
Acknowledgments
We thank Dr. Aysu Okbay, the SSGAC and COGENT consortiums for sharing their summary statistics for GWAS of educational attainment and cognitive performance excluding specific cohorts. PAD is supported by the grant 531003014 from The Netherlands Organisation for Health Research and Development (ZonMW). RC is supported by an ESRC studentship. AA is supported by the Foundation Volksbond Rotterdam and by ZonMw grant 849200011 from The Netherlands Organisation for Health Research and Development. KR is supported by a Sir Henry Wellcome Postdoctoral Fellowship. DIB is supported by the Royal Netherlands Academy of Science (KNAW) Professor Award (PAH/6635). EvB is supported by ZonMW grant 531003014 and VENI grant 451-15-017. MGN is supported by R01MH120219, ZonMW grants 849200011 and 531003014 from The Netherlands Organisation for Health Research and Development, a VENI grant awarded by NWO (VI.Veni.191G.030) and is a Jacobs Foundation Research Fellow. This research has been conducted using the UK Biobank Resource under Application Number 40310. The Netherlands Twin Register is supported by NWO Groot (480-15-001/674): Netherlands Twin Register Repository: researching the interplay between genome and environment and the Avera Institute for Human Genetics, Sioux Falls, South Dakota (USA) for genotyping. We gratefully acknowledge the research program ‘Consortium on Individual Development (CID)’ which is funded through the Gravitation program of the Dutch Ministry of Education, Culture and Science and the Netherlands Organization for Scientific Research (NWO: 0240-001-003). We gratefully acknowledge ‘Open Data Infrastructure for Social Science and Economic Innovations (ODISSEI) (NWO: NRGWI.obrug.2018.008)’. The authors gratefully acknowledge the ongoing contribution of the participants in the Twins Early Development Study (TEDS) and their families. 
TEDS is supported by a programme grant to Thalia Eley from the UK Medical Research Council (MR/V012878/1 and previously to Robert Plomin MR/M021475/1 and G0901245), with additional support from the US National Institutes of Health (AG046938). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Footnotes
We added simulations to assess effects of biases on the three designs. We ran the sibling design analyses using the population effect instead of the between-sibling effect as "total" effect. We updated Figure 1 and 2 and changed Figure 3 to illustrate the simulations. We revised the manuscript to improve its clarity.