Abstract
Pedigree-based analyses of intelligence have reported that genetic differences account for 50-80% of the phenotypic variation. For personality traits, these effects are smaller with 34-48% of the variance being explained by genetic differences. However, molecular genetic studies using unrelated individuals typically report a heritability estimate of around 30% for intelligence and between 0% and 15% for personality variables. Pedigree-based estimates and molecular genetic estimates may differ because current genotyping platforms are poor at tagging causal variants, variants with low minor allele frequency, copy number variants, and structural variants. Using ~20 000 individuals in the Generation Scotland family cohort genotyped for ~520 000 single nucleotide polymorphisms (SNPs), we exploit the high levels of linkage disequilibrium (LD) found in members of the same family to quantify the total effect of genetic variants that are not tagged in GWASs of unrelated individuals. In our models, genetic variants in low LD with genotyped SNPs explain over half of the genetic variance in intelligence, education, and neuroticism. By capturing these additional genetic effects our models closely approximate the heritability estimates from twin studies for intelligence and education, but not for neuroticism and extraversion. From an evolutionary genetic perspective, a substantial contribution of genetic variants that are not common within the population to individual differences in intelligence, education, and neuroticism is consistent with mutation-selection balance.
The scores of different types of cognitive ability tests correlate positively and the variance that is shared between tests is termed general intelligence, general cognitive ability, or g.1 General intelligence typically accounts for around 40% of the overall variance among humans in batteries that contain tests of diverse cognitive abilities. The personality traits of extraversion and neuroticism are two of the five higher-order personality factors that are consistently identified in dimensional models of personality. High levels of extraversion are associated with positive affectivity and a tendency to engage with, and to enjoy, social situations, but extraversion also shows phenotypic and genetic associations with mental disorders such as attention deficit hyperactivity disorder.2 High levels of neuroticism are associated with stress sensitivity, as well as mental and physical disorders.2,3 All of these traits are partly heritable, but have also been linked to evolutionary fitness. This poses an apparent paradox that could be resolved if rare variants, which are less amenable to selection, are found to play a major role in the genetic contribution to variance in these traits. Using a recently developed analytic design for combined pedigree and genome-wide molecular genetic data, we test whether rare genetic variants, copy number variants (CNVs), and structural variants make an additional contribution to the genetic variance in intelligence, neuroticism, and extraversion.
General intelligence has been found to be heritable, with twin and family studies estimating that 50% to 80%4 of phenotypic variance is due to additive genetic factors, a proportion that increases with age from childhood to adulthood.5 Heritability can also be estimated from molecular genetic data. Using the genomic-relatedness-matrix restricted maximum likelihood (GREML) method, additive common single nucleotide polymorphisms (SNPs) are estimated to collectively explain between 20% and 50% of variation in general intelligence,6,7 with an estimate of around 30% in the largest studies.8 General intelligence is also a significant predictor of fitness components including mortality,9 fertility,10,11 higher social status,12 as well as mental and physical disease,5 and it is associated with developmental stability,13,14 suggesting that general intelligence is not selectively neutral.
As selective pressure on a trait is expected to deplete its genetic variation, the existence of such robust heritability findings seems paradoxical when evolutionary theory is considered.15 However, mutation-selection balance provides an explanation of how genetic variation can be maintained for quantitative traits that are under directional selective pressure. Mutation-selection balance describes instances where mutations that are deleterious to the phenotype occur within a population at the same rate that they are removed through the effects of selective pressure. Due to the removal of variants with deleterious effects on the phenotype, the existence of common variants of medium to large effect is not expected under mutation-selection balance. This is consistent with the current findings from large GWAS on cognitive phenotypes, including general intelligence and education, where common single nucleotide polymorphisms (SNPs) collectively explain a substantial proportion of phenotypic variance, but the individual effect size of each genome-wide significant SNP discovered so far is around 0.2%.16,17
Population genetic simulations show that very rare (Minor allele frequency, MAF < 0.1%) variants explain little of the population variance in traits that are not under selection.18 However, the contribution made by rare variants increases when their effects on a trait and on fitness are correlated either through pleiotropy, or by the trait directly affecting fitness.18 The genetically informative evidence that is available tends to show that variants associated with intelligence are also linked to better health,19,20 although these effects may be outweighed by a negative effect on fertility,21, 22 and that the regions of the genome making the greatest contribution to intelligence differences have undergone purifying selection.23 Whereas this does not necessarily imply that intelligence has been selected for or against across our evolutionary history, it does indicate that genetic variants that are associated with intelligence are also associated with fitness, which suggests that rare genetic variants and hence mutation-selection balance, may act to maintain intelligence differences.18
Empirical studies so far have failed to find evidence of a link between intelligence and rare variants.24 These studies have often been limited in scope, with only copy number variants or exonic regions being considered, or being limited in statistical power because all rare variants were treated as having the same direction of effect through the use of burden tests.24–27 Where such tests have found an association these have been in small samples and subsequently failed to replicate.28 However, in large samples, rare variants found within regions of the genome under purifying selection have been found to be associated with educational success,29 an effect that was greater for genes expressed in the brain. Hence, rare variants found in some genes might have an effect on intelligence.
Less is known about the genetics of personality.30 As with intelligence, extraversion and neuroticism have been found to have higher heritability estimates, of around 34-48%, using quantitative (twin- and family-based) genetic methods31 compared to molecular genetic estimates (4 – 15% for neuroticism32 and 0 – 18% for extraversion2,33). Both extraversion and neuroticism are predictive of social and behavioural outcomes as well as anxiety, well-being and fertility.34-37 Positive genetic correlations have been reported for extraversion with attention deficit hyperactivity disorder and bipolar disorder, and for neuroticism with depression and anorexia nervosa.2
In the current study, we quantify the total genetic effect from across the genome on intelligence (including education, which shows strong genetic correlations with general intelligence38 and is used as a proxy phenotype for it in genetic studies39), extraversion, and neuroticism. We are able to include genetic variants not normally captured using GWAS. As our sample included nominally unrelated individuals with varying degrees of genetic similarity, as well as family members who all provided genome-wide SNP data, we were able to decompose two genetic sources of variance corresponding to genetic effects associated with common SNPs in a population (h2g), and genetic effects associated with kinship (h2kin). Among related individuals, linkage disequilibrium is stronger and hence allows us to capture variation not tagged by common SNPs. This includes rare variants, CNVs, and other structural variants. As the inclusion of family members can introduce a confound between shared genetic effects and shared environmental effects,40 we use the method employed by Xia and colleagues 41 to simultaneously estimate three sources of environmental variance: sibling effects, spouse effects, and family effects. By using information from both nuclear family relationships and the many more distant pedigree relationships in the cohort we analyse, this novel design allows us to estimate kin-specific genetic variation net of common environmental effects.
Materials and Methods
Samples
Data were used from the Generation Scotland: Scottish Family Health Study (GS:SFHS).42–44 A total of 24 090 individuals (Nmale = 9 927, Nfemale = 14 163, Agemean = 47.6) were sampled from Glasgow, Tayside, Ayrshire, Arran, and North-East Scotland, of whom 23 919 donated blood or saliva for DNA extraction. These samples were collected, processed, and stored using standard procedures and managed through a laboratory information management system at the Wellcome Trust Clinical Research Facility Genetics Core, Edinburgh.45 The yield of DNA was measured with PicoGreen and normalised to 50 ng/μl prior to genotyping. Genotype data were generated using an Illumina HumanOmniExpressExome-8 v1.0 DNA Analysis BeadChip and Infinium chemistry.46 We then used the same quality control procedure as Xia et al.,41 which removed SNPs that were not on autosomes or that had a minor allele frequency (MAF) of <0.05, a Hardy-Weinberg equilibrium P-value <10−6, or a missingness of >5%. As per Xia et al.,41 this left 519 729 common SNPs from 22 autosomes. Following quality control, a total of 20 032 individuals (n females = 11 804) were retained; 18 293 of these individuals were part of 6 578 nuclear or extended families.47 The mean age of the sample was 47.4 years (SD = 15.0, range 18 to 99 years). As the variance attributable to the shared environment was explicitly modelled here, no relationship cut-off (typically, 0.025 is used) was applied to the genetic relationship matrix (GRM).
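The SNP-level filters described above can be sketched as follows. This is a minimal illustration, not the pipeline used for GS:SFHS: the function name and data layout are hypothetical, and the Hardy-Weinberg check uses a 1-df chi-square approximation, whereas genotype QC software typically uses an exact test.

```python
import math

def snp_passes_qc(genotypes, maf_min=0.05, hwe_p_min=1e-6, max_missing=0.05):
    """Apply MAF, missingness and HWE filters to one autosomal SNP.

    genotypes: allele counts (0, 1 or 2) per individual; None marks a missing call.
    Thresholds default to the values quoted in the text.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1.0 - len(called) / len(genotypes)
    p = sum(called) / (2.0 * len(called))   # frequency of the counted allele
    maf = min(p, 1.0 - p)                   # fold to the minor allele
    if maf < maf_min or missing_rate > max_missing:
        return False
    # Hardy-Weinberg: compare observed genotype counts with p^2, 2pq, q^2.
    n = len(called)
    expected = [n * (1 - p) ** 2, n * 2 * p * (1 - p), n * p ** 2]
    observed = [called.count(k) for k in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df (approximation, see lead-in).
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return hwe_p >= hwe_p_min
```

A SNP is kept only if it passes all three checks, which matches the "or" logic of the removal rule: failing any one criterion is enough to drop it.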
Ethics
The Tayside Research Ethics Committee (reference 05/S1401/89) provided ethical approval for this study.
Phenotypes
A total of eight phenotypes were examined here. Six of the phenotypes were cognitive in nature and included general intelligence (g), education, the Mill Hill Vocabulary Scale (MHVS),48 the Wechsler Digit Symbol Substitution Task (DST),49 Wechsler Logical Memory which measures Verbal declarative memory,50 and executive function (phonemic Verbal fluency, using letters C, F, L).51 The general factor of intelligence (g) was derived by extracting the first unrotated principal component from the four cognitive tests. This single component accounted for 42.3% of the variance in the total sample and each of the individual tests used demonstrated strong loadings on the first unrotated component (DST 0.58, Verbal Fluency 0.72, MHVS 0.67, and Verbal declarative memory 0.63). Education was calculated in the GS:SFHS as the years of full time formal education which was recoded into an ordinal scale from 0 to 10 (0: 0 years, 1: 1-4 years, 2: 5-9 years, 3: 10-11 years, 4: 12-13 years, 5: 14-15 years, 6: 16-17 years, 7: 18-19 years, 8: 20-21 years, 9: 22-23 years, 10: > 24 years of education). Education and general intelligence were positively correlated (r = 0.38, SE = 0.01, p < 2.20 × 10−16).
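To make the derivation of g concrete, the sketch below extracts the first unrotated principal component from a correlation matrix of test scores and reports the proportion of variance it explains and the loadings. It is an illustration only: the function names are hypothetical, power iteration stands in for whatever eigen-decomposition routine was actually used, and any correlation matrix passed in is the reader's own.

```python
import math

def first_principal_component(corr, n_iter=500):
    """Leading eigenvalue and unit eigenvector of a correlation matrix.

    Uses power iteration, which converges for the positive correlation
    matrices typical of cognitive test batteries.
    """
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue, v

def g_summary(corr):
    """Proportion of variance explained by the first component, and its loadings."""
    eigenvalue, v = first_principal_component(corr)
    variance_explained = eigenvalue / len(corr)         # eigenvalue / number of tests
    loadings = [x * math.sqrt(eigenvalue) for x in v]   # test-component correlations
    return variance_explained, loadings
```

With four tests, a reported 42.3% of variance corresponds to a leading eigenvalue of about 1.69 (0.423 × 4), and each loading is the correlation between a test and the component.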
The effects of age, sex and population stratification were controlled for by using regression prior to fitting the models using GREML. Supplementary section Figure 1 shows the number of principal components used to control population stratification for each of the phenotypes used.
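The pre-adjustment step can be sketched as an ordinary least-squares regression followed by standardisation of the residuals. The code below is a hedged illustration in plain Python: the function names are hypothetical, and the exact covariate coding (for example, whether non-linear age terms were included) is not stated in the text and is not assumed here.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def standardised_residuals(y, covariates):
    """Regress y on an intercept plus covariates and return z-scored residuals.

    covariates: one row per individual, e.g. (age, sex, PC1, PC2, ...).
    """
    X = [[1.0] + list(row) for row in covariates]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sd = math.sqrt(sum(e * e for e in resid) / (len(resid) - 1))
    return [e / sd for e in resid]
```

The returned values have mean zero, unit variance, and no linear association with the covariates, which is what the variance components are then fitted to.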
Selected models plotted for each of the phenotypes included. The proportion of variance explained is on the x-axis with each of the phenotypes used on the y-axis. Each component from the selected model is plotted individually, with the stacked bar plot showing the total proportion of the variance explained by the selected models. Error bars indicate standard error.
Statistical method
Partitioning phenotypic variance into five sources
For each of the phenotypes examined here, variance was partitioned into five corresponding effects plus residual variance. This variance components analysis is based on the work of Zaitlen and colleagues,40 who developed a method for estimating h2g and h2kin in a data set with a measured family structure. More recently this method has been extended by Xia and colleagues41 to include sibling, spouse, and nuclear family environmental effects. The two genetic matrices described by Zaitlen et al. and Xia et al. correspond to those associated with common SNPs (h2g) and those associated with pedigree (h2kin) genetic variants. These two genetic sources of variance were quantified using a genetic relationship matrix derived in the GCTA software.52 Whereas h2g describes variance associated with common SNPs, and with variants that are in LD with genotyped SNPs on a SNP chip, h2kin describes variance from the additional genetic effects associated with pedigree.
Matrix construction
Genetic matrices
The contribution made by common SNPs, h2g, was quantified using a genomic relationship matrix (GRMg, or G). This was derived in the manner set out by Yang and colleagues,52 where the estimated genomic relatedness between each pair of individuals is derived from identity by state SNP relationships and is found in each off diagonal entry in the GRM.
The minor allele frequency for SNP i is denoted p_i, and the allelic dose (x) for individual j or k at locus i is denoted x_ji or x_ki. N indicates the total number of SNPs.
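Using the symbols just defined, the off-diagonal relatedness estimator is usually written as A_jk = (1/N) Σ_i (x_ji − 2p_i)(x_ki − 2p_i) / (2p_i(1 − p_i)). The equation is not reproduced in the text above, so this form is an assumption based on the commonly stated estimator; the sketch below (with hypothetical function names) simply evaluates it.

```python
def grm_entry(x_j, x_k, freqs):
    """Genomic relatedness between individuals j and k.

    x_j, x_k: allelic doses (0, 1, 2) at each SNP for the two individuals.
    freqs: frequency p_i of the counted allele at each SNP (0 < p_i < 1).
    """
    total = 0.0
    for xj, xk, p in zip(x_j, x_k, freqs):
        # Centre each dose on its expectation 2p and scale by the SNP variance 2p(1-p).
        total += (xj - 2 * p) * (xk - 2 * p) / (2 * p * (1 - p))
    return total / len(freqs)

def grm(doses, freqs):
    """Full relatedness matrix for a list of individuals' dose vectors.

    The same expression is applied on the diagonal here for simplicity;
    software may use a different diagonal estimator.
    """
    n = len(doses)
    return [[grm_entry(doses[j], doses[k], freqs) for k in range(n)] for j in range(n)]
```

Because each term is scaled by 2p_i(1 − p_i), sharing a rarer allele contributes more to estimated relatedness than sharing a common one.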
The kinship relationship matrix, GRMkin, or K, was derived using the method described by Zaitlen et al., (2013) 40 by modifying the GRMg. Here, values in the GRMg that were equal to or less than 0.025 were set to 0.
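The thresholding step that turns the genomic matrix into the kinship matrix is simple enough to show directly. This is a minimal sketch with a hypothetical function name; it operates on a matrix stored as a list of lists and leaves the input untouched.

```python
def kinship_matrix(grm_g, threshold=0.025):
    """Copy of the genomic relationship matrix with entries at or below
    the threshold set to zero, so that only closer relatives remain."""
    return [[value if value > threshold else 0.0 for value in row] for row in grm_g]
```

Pairs of nominally unrelated individuals therefore contribute to the common-SNP matrix only, while related pairs contribute to both matrices.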
Environmental matrices
Three environmental matrices (ERM) were used to capture the variance associated with specific relationships between individuals. Each ERM was created by deriving an N by N matrix (where N is the number of participants) with diagonal entries set to 1 and non-diagonal entries set to 1 if the pair of individuals are part of the environmental relationship described, or set to 0 otherwise. The three ERMs derived here captured variance associated with the shared environment of spouses (ERMCouple, or C), siblings (ERMSibling, or S), and nuclear families (ERMFamily, or F).
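The construction of an environmental matrix can be sketched as below. The function name, the ID labels, and the representation of each shared environment as a set of participant IDs are illustrative assumptions, not details taken from the study.

```python
def environmental_matrix(ids, groups):
    """Build an N x N indicator matrix for one type of shared environment.

    ids: ordered list of participant IDs (defines rows and columns).
    groups: iterable of sets of IDs that share the environment,
            e.g. one set per couple, sibship, or nuclear family.
    """
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    # Diagonal entries are 1; all other entries start at 0.
    erm = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for group in groups:
        members = [index[p] for p in group if p in index]
        for a in members:
            for b in members:
                erm[a][b] = 1
    return erm
```

For a family of two parents and two children, the family matrix would link all four members, the couple matrix only the parents, and the sibling matrix only the children, which is why the couple and sibling matrices are nested within the family matrix.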
Deriving the quantity of phenotypic variance explained
For each trait we first fitted the two GRM and the three ERM simultaneously using a linear mixed model (LMM) implemented using the GCTA software.52, 53 This full model is referred to as the GKFSC model, as it includes the genetic, kinship, family, sibling, and couple matrices.
Here, Y is a vector of standardised residuals derived using one of the eight phenotypes examined here. Random genetic effects were explained by fitting the GRMg and the GRMkin, which captured variants in LD with common SNPs found across a population and the extra genetic effects captured by the increase in LD found between members of the same family, respectively. Random environmental effects that were shared between related pairs of individuals were captured by fitting the ERMFamily, ERMSibling, and ERMCouple to quantify the contributions made by environmental similarities between members of a nuclear family, siblings, and couples, respectively.
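The model equation itself is not reproduced above. As a sketch, a linear mixed model with these five random effects is conventionally written as follows; the lower-case effect vectors and the variance symbols are labels chosen here for illustration, with G and K denoting the two genetic matrices and F, S and C the family, sibling and couple environmental matrices.

```latex
\begin{aligned}
\mathbf{Y} &= \mathbf{g} + \mathbf{k} + \mathbf{f} + \mathbf{s} + \mathbf{c} + \boldsymbol{\varepsilon},\\
\operatorname{Var}(\mathbf{Y}) &= \mathbf{G}\sigma^2_g + \mathbf{K}\sigma^2_k + \mathbf{F}\sigma^2_f
  + \mathbf{S}\sigma^2_s + \mathbf{C}\sigma^2_c + \mathbf{I}\sigma^2_\varepsilon,\\
h^2_g &= \frac{\sigma^2_g}{\sigma^2_P}, \qquad
h^2_{kin} = \frac{\sigma^2_k}{\sigma^2_P}, \qquad
\sigma^2_P = \sigma^2_g + \sigma^2_k + \sigma^2_f + \sigma^2_s + \sigma^2_c + \sigma^2_\varepsilon
\end{aligned}
```

Under this formulation each reported proportion is one variance component divided by the total phenotypic variance, so the proportions and the residual sum to one.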
Restricted maximum likelihood (REML), implemented using the GCTA software,52 was used to estimate the variance explained by each of the matrices, with statistical significance being examined using a log-likelihood ratio test (LRT) and the Wald test. Model selection began with the full GKFSC model (referred to as the full model), and components were dropped if they were not statistically significant according to both the Wald test and the LRT. The model that contained only components that explained a significant proportion of variance is referred to as the selected model. If more than one component could be dropped from the model, we dropped the one with the poorer fit first and then tested the significance of the other. The full results of each model can be seen in Supplementary Table 1.
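The two significance tests can be sketched as follows. This is an illustration under stated assumptions rather than a description of the software's internals: the LRT p-value treats the statistic as a 50:50 mixture of a point mass at zero and a chi-square with 1 df (the usual treatment when a single variance component is tested on the boundary of its parameter space), and the Wald test is taken to be one-sided. The function names are hypothetical.

```python
import math

def lrt_p_value(loglik_full, loglik_reduced):
    """P-value for dropping one variance component.

    Assumes the LRT statistic follows a 50:50 mixture of 0 and chi-square(1),
    because a variance cannot be negative.
    """
    lrt = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # erfc(sqrt(x / 2)) is the upper tail of a chi-square with 1 df.
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))

def wald_p_value(estimate, se):
    """One-sided Wald p-value for a variance component estimate (assumed one-sided)."""
    z = estimate / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def keep_component(estimate, se, loglik_full, loglik_reduced, alpha=0.05):
    """A component is dropped only if it is non-significant on BOTH tests."""
    return (wald_p_value(estimate, se) < alpha
            or lrt_p_value(loglik_full, loglik_reduced) < alpha)
```

The `alpha` level of 0.05 is also an assumption; the text does not state the threshold used.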
The variance components corresponding to h2g (common SNP-associated effects), h2kin (pedigree-associated genetic effects), ef2 (shared family environment effect), es2 (shared sibling environment effect), and ec2 (shared couple environment effect) were estimated (Table 1).
Results of the variance components analysis for cognitive abilities using the full model and the final model selected from a stepwise selection procedure.
Results
The results of the full GKFSC models (consisting of the GRMg, GRMkin, ERMFamily, ERMSibling, ERMCouple), as well as the results of the selected models, can be seen in Table 1. For general intelligence (g) the final model was the GKSC model, allowing for a significant contribution from additive common genetic effects, additive pedigree-associated genetic variants, shared sibling environment, and a shared couple environment. For g, common SNPs (h2g) explained 23% (SE = 2%) of the phenotypic variation. Pedigree-associated genetic variants (h2kin) added an additional 31% (SE=3%) to the genetic contributions to g, yielding a total contribution of genetic effects on g of 54% (SE=3%). The net contribution of measured environmental factors to phenotypic variance in g was 35%. This was due to two sources of variance, shared sibling environment (es2) and shared couple environment (ec2), that accounted for 9 % (SE=1%), and 22% (SE=2%), respectively.
The GKSC model was also the selected model for education, vocabulary, verbal fluency, and the digit symbol test. As with general intelligence, pedigree-associated genetic variants accounted for the majority of the total genetic contribution to phenotypic variation in these traits. Pedigree-associated genetic variants explained between 15% and 30% of the variation, and common SNP effects explained between 16% and 26%. The genetic results, i.e. SNP and pedigree contributions combined, for g and education are similar to the heritability estimates derived using the traditional pedigree study design, which found a heritability estimate of 54% (SE=2%) for g54 and 41% (SE=2%) for education using the same data set (Figure 2). This indicates that the genetic variants with the greater estimated cumulative effect on cognitive abilities are those that are poorly tagged on current genotyping platforms.
Bar plots showing the proportion of variance explained using family-based methods and using molecular genetic data. Both of these analyses were performed using the same GS:SFHS data (n = 20 522, Education n = 22 406; current manuscript: g n = 19 036, Education n = 18 528). Estimates shown in red were derived in the current study using GREML and show two sources of genetic variance. Bright red shows the common genetic effects captured by the GRMg matrix and dark red shows the additional genetic effects captured by exploiting the higher level of linkage disequilibrium between family members using the GRMkin matrix. The estimates in blue are taken from Marioni and colleagues54 and show the total genetic effects using ASReml-R when relatedness is inferred using identity by descent.
For logical memory the effect of shared couple environment was non-zero, but not significant, with the final selected model being GKS. Here, common SNP effects explained 12% (SE = 2%) of the variation. As with the other cognitive phenotypes considered here, pedigree-associated variants made a greater estimated contribution to the net genetic effect on logical memory, explaining 20% (SE=3%) of the variation. Sibling effects explained 5% (SE=1%) of the variation in logical memory.
For neuroticism the final model consisted of the G (GRMg), and K (GRMkin) contributions. Additive common genetic effects explained 11% (SE=2%) of the variance with pedigree-associated variants explaining an additional 19% (SE=3%). Whereas none of the environmental components was statistically significant, the family component accounted for 2% of the variance in the full model and 1% in a model that included only the G and the K matrices in addition to F. A lack of power may have occluded this effect.
For extraversion the only detectable source of genetic variation came from GRMg, where G accounted for 13% (SE=2%), with ERMFamily explaining a further 9% (SE=1%) of the phenotypic variation. The lack of pedigree-associated genetic effects could be due to low statistical power, as K explained 5% of the variance in the full model and 6% in a GKF model, but with a relatively large SE, estimated at 5%.
In addition to our model selection procedure, we also fit all possible component combinations for all phenotypes, to show a more complete account of the data and to give readers the ability to explore the consequences of including different components for the results, even when some of those components were not significant. The results have been made interactively available at https://rubenarslan.github.io/generation_scotland_pedigree_gcta/.
Discussion
The aim of this study was to use molecular genetic and pedigree data on the same large sample in order to decompose and quantify genetic and environmental sources of variation in intelligence and personality in a novel manner. In doing so, we sought to identify reasons for the gap between pedigree-based and SNP-based estimates of heritability in samples of unrelated individuals, a difference which might be due to genetic variants in poor linkage disequilibrium with SNPs genotyped on current platforms. This permits us to draw inferences about the evolutionary pressures that maintain general intelligence and personality differences. By making use of a large Scottish cohort that consists of close, distant, and spousal relationships, we were able to partition phenotypic variance of cognitive and personality measures into two genetic and three environmental sources of variance. A number of novel findings speak to long-standing questions in behaviour genetics and evolutionary genetics of psychological differences.15,30,55
Firstly, taken together, the two variance components derived directly from genome-wide molecular genetic data can account for the entire narrow-sense heritability of general intelligence, as estimated in twin and family studies.54,56 For all of the cognitive variables measured here, a substantial and significant proportion of the phenotypic variance was found to be explained by pedigree-associated genetic effects (h2kin). With the exception of the digit symbol test, these pedigree-associated genetic variants accounted for over half of the genetic effects.
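The "over half" statement can be checked with simple arithmetic on the point estimates reported in the Results for g, logical memory, and neuroticism. The sketch below uses only those reported values; the function name is hypothetical, and standard errors are ignored because the covariance between the two estimates is not given.

```python
# (h2g, h2kin) point estimates as reported in the Results section.
reported = {
    "g": (0.23, 0.31),
    "logical memory": (0.12, 0.20),
    "neuroticism": (0.11, 0.19),
}

def pedigree_share(h2_g, h2_kin):
    """Fraction of the total estimated genetic variance captured by the kinship matrix."""
    return h2_kin / (h2_g + h2_kin)

for trait, (h2_g, h2_kin) in reported.items():
    total = h2_g + h2_kin
    print(f"{trait}: total genetic = {total:.2f}, "
          f"pedigree share = {pedigree_share(h2_g, h2_kin):.2f}")
```

For each of these three traits the pedigree-associated share exceeds one half of the summed genetic estimate.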
The SNP-based methods of estimating heritability from unrelated individuals often produce lower heritability estimates than those derived using family-based studies. One reason for this is that population-based SNP methods, such as GREML, rely on LD between genotyped SNPs and causal variants at the population level, and are sensitive to the frequency of causal alleles. Should LD between genotyped SNPs and causal variants be low, then the genetic similarity between a pair of individuals at the causal variant will differ from the genetic similarity at genotyped SNPs, resulting in a reduction in the heritability estimate in the studied group. In within-family and twin studies, relatedness is based on identity by descent (IBD), where segments of DNA have been inherited from a recent common ancestor. Should a region be IBD between a pair of individuals, then all variants except de novo mutations within that segment are shared. As population-based SNP methods are sensitive to allele frequency, whereas IBD methods are blind to such effects, the discrepancy between the heritability estimates is consistent with the idea that causal variants in low LD with genotyped SNPs account for this missing element of the heritability of intelligence differences.
In the current study we investigated whether variants in poor LD with genotyped SNPs account for additional heritability, unmeasured in GWAS on unrelated individuals, by using DNA from close family members. Higher genetic relatedness within families leads to an increase in the LD between genotyped SNPs and potentially causal variants, and this resulted in heritability estimates in our study that are comparable to those from pedigree-based methods. This provides evidence that, for intelligence, the gap between the heritability estimates derived using IBD methods and those derived using SNP-based population methods is most likely due to causal variants in low LD with genotyped SNPs. In addition, we were able to model this missing variance and separate it from the additive common genetic effects that are estimated in a GREML analysis based on unrelated individuals. The additional source of additive genetic variance from closely related family members, captured here in our kinship matrix (GRMkin), would go unnoticed in a GWAS on unrelated individuals. This shows a need for GWAS on related individuals and for methods such as whole-genome sequencing to capture the individual effects of such variants. Whilst the use of related individuals can result in the confounding of pedigree genetic effects with shared family environmental effects, here we were able to distinguish the contributions made to phenotypic variance by pedigree-associated genetic variants from those made by shared environment. Since we modelled three sources of environmental variance alongside the two genetic sources simultaneously, the variance that is due to a shared environment does not contribute towards our estimates of the genetic effects, as would be the case in instances where related individuals are included without adjusting for the shared environmental effects.41, 52
Furthermore, despite the level of confounding between the five matrices, we were able to correctly disentangle the contributions of each of the variance components. Simulations conducted by Xia et al. 41 show that this method provides robust results due to the dense relationships within the GS:SFHS cohort. The GS:SFHS is a family-based cohort and the participants are related to varying degrees, including 1,767, 18,320, 7,851, 4,129, 3,950 and 11,032 pairs of couples and 1st, 2nd, 3rd, 4th and 5th degree relatives, respectively. Therefore, what is shared between the ERMFamily matrix and the GRMkin matrix is merely the ~18k pairs of entries representing 1st degree relatives. However, ERMFamily holds ~1.8k pairs of unique entries (couple pairs) and GRMkin holds ~23k pairs of unique entries (2nd-5th degree relative pairs whose genetic relatedness was greater than 0.025); the unique entries from both matrices increase the power to correctly disentangle the variance from those two different sources. An additional point is that the pedigree-associated genetic effects decay as the distance of the relationship increases, whereas nuclear family environmental effects do not. Thus, the fact that GS:SFHS consists of different classes of relatives, as well as the unique entries within the GRMkin and ERMFamily, helps to capture the property of pedigree-associated genetic variants. This logic extends to separating the variance from each of the environmental matrices. Although ERMCouple and ERMSibling are nested within the ERMFamily, there are 9,853 pairs of unique entries (representing parent-offspring pairs) within the ERMFamily, which helps to separate the environmental effects. Therefore, there is a sufficient number of appropriate relationships in GS:SFHS for this method to work.
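The decay of genetic sharing with relationship distance, which is what separates the kinship matrix from the family environment matrix, can be illustrated with expected relatedness by degree. The sketch assumes the standard expectation that relatedness halves with each degree; the function name is hypothetical, and the pair counts are those quoted in the paragraph above.

```python
THRESHOLD = 0.025  # cut-off used when deriving the kinship matrix

def expected_relatedness(degree):
    """Expected proportion of the genome shared IBD by relatives of a given degree."""
    return 0.5 ** degree

# Numbers of relative pairs by degree, as quoted in the text.
pair_counts = {1: 18320, 2: 7851, 3: 4129, 4: 3950, 5: 11032}

for degree, n_pairs in pair_counts.items():
    r = expected_relatedness(degree)
    print(f"degree {degree}: expected relatedness {r:.5f}, "
          f"above threshold: {r > THRESHOLD}, pairs: {n_pairs}")
```

Expected relatedness stays above the 0.025 cut-off through the 5th degree and falls below it at the 6th, whereas an entry in the family environment matrix is 1 for every nuclear-family pair regardless of genetic distance. The 2nd to 5th degree counts sum to 26,962 pairs, somewhat more than the ~23k unique entries quoted; one possible reading, not stated in the text, is that realised relatedness for some distant pairs falls at or below the cut-off.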
Xia et al. 41 showed that our method reliably identifies the major sources of variance that contribute to trait architecture. However, as with any method, effects become harder to measure accurately as their size decreases, and more power is needed for the reliable detection of small signals. This means that if one of the matrices here contributes only a small proportion of the overall phenotypic variance (e.g. less than 5% in GS:SFHS), its contribution will not be estimated reliably and it will be dropped in the model selection procedure. However, any minor component excluded from the final model will have only a limited influence on the estimates of the major components that are retained. Thus, the major components we detected for each trait should be estimated reliably.
For personality traits, the genetic components can explain slightly more (30%) than the narrow-sense heritability (22%) that was meta-analytically derived from family and adoption studies with heterogeneous measurements of personality.31 However, it falls short of the broad-sense heritability (47%31; 45%57). As previous research has suggested,31,58 this is consistent with epistasis playing a major role in personality genetics, as a non-additive genetic component is not captured well outside of twin studies. Previous research58 did not discuss gene-environment correlation (rGE) and interaction (GxE) as a plausible cause for heritability estimates being higher in twin studies than in adoption and family studies, presumably because the shared environment contribution to personality variation was usually estimated not to be different from zero. A more recent meta-analysis gives an estimate of 13%57 for shared environment that seems to be stable over age, so the difference between twin estimates of heritability and those presented here may also be explained to some extent by gene-by-environment interactions (GxE) and gene-environment correlations (rGE)30.
The additional variance explained by the GRMkin is unlikely to be due to common SNPs that are poorly tagged on current arrays because even with imputed data SNP heritability estimates of general intelligence are around 66% of that explained by twin models in the same sample.59 Using GREML-LDMS,60 and data on 43,599 participants with ~17 million SNPs imputed based on the 1000 Genomes reference panel, the heritability for height was found to be 56%, with 8% of this estimate being traced to rare variants with a minor allele frequency between 0.001 and 0.01. By imputing to 1000 Genomes, the same study estimated that 97% of common genetic variation was being captured, but only 68% of rare variation.60 Using ~500 000 genotyped SNPs, Xia and colleagues41 found that common variants explained 43% of the variance for height, whereas 45% of the variance was explained by pedigree-associated variants. These two studies together show there is only a modest increase in heritability by measuring or imputing further common SNPs, and that with 97% of common variants the heritability estimate is lower than what can be found in family studies. As the h2g plus h2kin estimates for height from Xia and colleagues41 exceed the total contribution from common variants, and closely approximate the estimates from twin studies of h2 = 0.89-0.93,61 it seems more likely that the h2kin is not mainly driven by common genetic variants that are in low LD with genotyped variants but by rare variants, CNVs, and structural variation.
Our variance analyses are blind to the direction of effects and the number of variants involved in each genetic component. If, as we would predict, future work finds that variants with the lowest minor allele frequency tend to have the greatest negative effects, it would imply a coupling between the selection coefficient of alleles and their effect on intelligence, as selective pressure would act to minimise the frequency of highly deleterious variants. If this coupling were strong,62 future work might infer that selection on intelligence was important in the past, even though current selective pressure may differ. If the impact of intelligence on fitness were limited to instances of pleiotropy with, for example, health, as some initial research suggests,19, 20 the coupling between the selection coefficients of alleles and their effect sizes would be expected to be weaker. Selective pressure would act on the health-linked variants and intelligence-linked variants would only be selected to the extent of their pleiotropic effects on health. This would de-couple the selection coefficient of an allele and its effect on intelligence.
Future work can use the SNPs known to affect intelligence and personality16, 17 to quantify empirically the coupling between allele frequency (indicating selection strength) and effect size, testing directly whether rarer SNPs have stronger negative effects on these traits, as has been demonstrated for height and BMI.60 This would indicate whether selection has acted to minimise the frequency of variants that negatively affect these traits. Targeted re-sequencing of enriched genetic regions23, 63, 64 might be necessary to find very rare genetic variants associated with intelligence and personality, as has proven fruitful, for example, in prostate cancer research.65
The sibling component, which was retained in all models of intelligence, tracks the meta-analytic estimate of shared environmental variance (11%) from twin studies almost exactly. However, in our study the sibling component might also include the quarter of the dominance variation in intelligence that siblings share, because siblings are the only relationship in this study where dominance plays a significant role.41 In the classical twin design, dominance variation (which makes dizygotic twins less than half as similar as monozygotic twins) can be obscured by shared environment effects (which make dizygotic twins more similar). There is some evidence from other approaches that dominance plays only a minor role in intelligence differences.66–69
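How dominance and shared environment can offset one another in the classical twin design can be illustrated with the standard expectations for twin correlations under a model with additive genetic (A), dominance (D), and shared environmental (C) variance. The sketch below is not part of our analysis: the variance proportions are hypothetical values chosen for illustration, and the function names are ours.

```python
def expected_twin_correlations(a2, d2, c2):
    """Expected MZ and DZ correlations under an ACDE model.

    MZ twins share all additive and dominance effects; DZ twins (like full
    siblings) share half the additive and a quarter of the dominance variance.
    Both twin types are assumed to share the common environment fully.
    """
    r_mz = a2 + d2 + c2
    r_dz = 0.5 * a2 + 0.25 * d2 + c2
    return r_mz, r_dz


def falconer_estimates(r_mz, r_dz):
    """Falconer-style estimates, which ignore dominance."""
    h2 = 2 * (r_mz - r_dz)  # algebraically a2 + 1.5 * d2
    c2 = 2 * r_dz - r_mz    # algebraically c2 - 0.5 * d2
    return h2, c2


# Hypothetical variance proportions, for illustration only.
a2, d2, c2 = 0.50, 0.10, 0.10
r_mz, r_dz = expected_twin_correlations(a2, d2, c2)
h2_hat, c2_hat = falconer_estimates(r_mz, r_dz)
print(f"r_MZ = {r_mz:.3f}, r_DZ = {r_dz:.3f}")
print(f"h2 estimate = {h2_hat:.3f}, c2 estimate = {c2_hat:.3f}")
```

With these illustrative values the shared-environment estimate is roughly half the value that was put in, because each unit of dominance variance lowers it by half a unit. At the same time the dizygotic correlation stays above half the monozygotic correlation, so the twin correlations alone give no sign of dominance.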
The family component was only retained in the model for Extraversion, although the point estimate was non-zero in the full Neuroticism model as well. This is consistent with meta-analytic estimates of shared environment for adults,57 but may also be due to some level of confounding between K and F, where the association between extraversion and the GRMFamily is due to contributions of the genetic factors accounted for by the GRMkin.
The couple component is somewhat complex to interpret. For intelligence70 and education71, there is evidence of assortative mating, which will increase both the genetic and environmental similarity within couples. The spousal similarity could explain still uncaptured genetic and environmental variance. If recent research on similarity in genetic propensities for education71 is a good guide, spousal similarity in intelligence may be mostly explained by genetic similarity. Apart from this, the spouse’s trait value may also serve as a good aggregate indicator of any effects the current environment has on a person, so that the couple component would also reflect recent environmental influences. The importance of the environment shared with siblings appears to decline from childhood to adulthood,72 as individuals select their own environmental niches (i.e., active gene-environment correlation). It may be that the current environment, now shared with a spouse, still has causal influences. We find no couple component for personality, which is consistent with much weaker assortative mating on personality, especially neuroticism and extraversion.73–75
In the current study we were able to exploit the high LD found between members of the same family to measure the contribution of genetic effects that are normally missed in GREML analyses of GWAS data. We also simultaneously modelled the effect of the family, sibling, and couple environment to avoid potential environmental confounds inflating our estimates. For intelligence and education, we find that genetic variants poorly tagged on current genotyping platforms explained a substantial and significant proportion of the phenotypic variance, raising our heritability estimates to match those derived using pedigree-based methods. Such variants can include CNVs, structural variants, and rare variants. We find similar effects for neuroticism, a dimension of personality genetically correlated with many fitness traits,76 where pedigree-associated genetic variants explained 19% of variation. For extraversion, pedigree-associated variants appear to play a smaller role in phenotypic variation. These results suggest that mutation-selection balance has maintained heritable variation in intelligence and neuroticism, explaining why differences in these traits persist to this day despite selection. Future work should directly measure rare variants, as well as CNVs and structural variants, and test the direction of their effects.
Funding statement
This work was undertaken in The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology (CCACE), supported by the cross-council Lifelong Health and Wellbeing initiative (MR/K026992/1). Funding from the Biotechnology and Biological Sciences Research Council (BBSRC), the Medical Research Council (MRC), and the University of Edinburgh is gratefully acknowledged. CCACE funding supports I.J.D. W.D.H. is supported by a grant from Age UK (Disconnected Mind Project). C.X. is funded by the MRC and the University of Edinburgh. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates [CZD/16/6] and the Scottish Funding Council [HR03006]. Genotyping of the GS:SFHS samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, Edinburgh, Scotland and was funded by the Medical Research Council UK and the Wellcome Trust (Wellcome Trust Strategic Award “STratifying Resilience and Depression Longitudinally” (STRADL) Reference 104036/Z/14/Z) to AMM, IJD, CSH and DP. RCA and LP acknowledge support from the Bielefeld Center for Interdisciplinary Research group “Genetic and social causes of life chances”.
Footnotes
† These authors contributed equally.
Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK, T: +44 (131) 650 8405, E: David.Hill{at}ed.ac.uk