Functional consequences of genetic loci associated with intelligence in a meta-analysis of 87,740 individuals

Jonathan R. I. Coleman; Julien Bryois; Héléna A. Gaspar; Philip R. Jansen; Jeanne Savage; Nathan Skene; Robert Plomin; Ana B. Muñoz-Manchado; Sten Linnarsson; Greg Crawford; Jens Hjerling-Leffler; Patrick F. Sullivan; Danielle Posthuma; Gerome Breen

doi:10.1101/170712

Abstract

Variance in IQ is associated with a wide range of health outcomes, and 1% of the population are affected by intellectual disability. Despite a century of research, the fundamental neural underpinnings of intelligence remain unclear. We integrate results from genome-wide association studies (GWAS) of intelligence with brain tissue and single cell gene expression data to identify tissues and cell types associated with intelligence. GWAS data for IQ (N = 78,308) were meta-analyzed with an extreme-trait cohort of 1,247 individuals with mean IQ ∼170 and 8,185 controls. Genes associated with intelligence implicate pyramidal neurons of the somatosensory cortex and CA1 region of the hippocampus, and midbrain embryonic GABAergic neurons. Tissue-specific analyses find the most significant enrichment for frontal cortex brain expressed genes. These results suggest specific neuronal cell types and genes may be involved in intelligence and provide new hypotheses for neuroscience experiments using model systems.

Genome-wide association studies (GWAS) have successfully identified statistical associations with a wide range of behavioral phenotypes and neuropsychiatric disorders^1–3. Increasing sample sizes has begun to yield findings for intelligence^4–6. The largest and most recent study of intelligence reported 18 loci significantly associated with intelligence⁴. Significant genetic correlations were observed between intelligence and a variety of behavioral (educational attainment, smoking behaviors), anthropometric (cranial morphology, height, body composition), and psychiatric phenotypes (schizophrenia, autism, depressive symptoms), mirroring epidemiological evidence for correlations between intelligence and a broad range of health-related outcomes^4,7.

Considered alone, not all associations identified by GWAS precisely localize biological mechanisms amenable to subsequent experimentation. For instance, the most associated variant in a significant locus may not be the causal variant^8,9, there may be multiple causal variants in a locus¹⁰, loci may act through altering the expression of distant genes¹¹, and the associations identified by GWAS of complex traits are often spread across the genome^12–14. To extract meaningful biological inferences from GWAS results, it is necessary to integrate data from other sources, such as studies of gene expression¹⁵. Results from gene-wise analyses in Sniekers et al. (2017) identified expression predominant in the brain for 14 of the 44 genes with significant association, although some transcription was inferred for most genes across most tissues⁴. Integration of genomic results with data on biological pathways suggested a prominent role for nervous system development in intelligence.

In silico functional annotation of GWAS results is dependent on high-quality biological reference data. Recently, data from the Karolinska Institutet (KI) mouse superset of single-cell RNA sequencing (scRNAseq) of ∼10,000 single cells from multiple brain regions was used to map schizophrenia GWAS results to brain cell types¹⁶. Genes that previously showed association with schizophrenia were expressed with higher specificity in pyramidal cells, medium spiny neurons, and interneurons than in 20 other brain cell types. This demonstrates the potential of cell-type specific annotation to enable the construction of new functional hypotheses for complex traits.

We sought to develop a better understanding of the neurobiological underpinnings of intelligence through combining GWAS results with a number of data sources concerned with tissue and cell-type specific gene expression. To this end, we meta-analyzed the most recent GWAS of intelligence⁴ with an extreme-trait GWAS that compared individuals of very high intelligence to a group from the general population (HiQ)¹⁷. We then analyzed the enrichment of associations with intelligence in single cell expression data from the KI mouse superset¹⁶, and in brain genomic and transcriptomic data from the GTEx project¹⁸. Finally, we combined genomic and tissue-specific expression data with information from biological, disease-relevant and drug-target pathway databases to further assess the potential impact of biological mechanisms explaining variance in intelligence.

Results

Meta-analysis

The genetic correlation between Sniekers et al. (2017) and HiQ was 0.92 (SE: 0.07) and was sufficiently high to justify meta-analysis. 25 loci met genome-wide significance with intelligence (i.e., p < 5 × 10^-8; Table 1, Figure 1a). Of these, 13 were genome-wide significant in Sniekers et al. (2017) and 12 were novel (Table 1; Supplementary Table 1). The single locus previously identified in the HiQ sample was not genome-wide significant in this analysis (p = 0.00595;¹⁷).

View this table:

Table 1:

Genome-wide significant variants from single-variant association analyses. Loci significant in Sniekers et al. (2017) are in bold.

FreqA1 = A1 frequency in non-Finnish samples from the 1000 Genomes project. Dir = direction of effect in Sniekers and in TIP.

Figure 1:

a) Manhattan plot and b) QQ plot of meta-analysis results

Assessment of genome-wide inflation yielded a median λ_GC of 1.24. The LD score regression intercept from was 1.004 (SE: 0.01), suggesting that this inflation is caused by polygenicity rather than confounding (Figure 1b)¹⁹. Annotation of specific genomic loci to databases of interest suggested overlapping loci between intelligence and a variety of phenotypes (Supplementary Table 2). The largest overlap was observed with educational attainment (14/25 loci), but overlap at multiple loci was widespread, including with age at menarche, height, body mass index, autoimmune disease, and schizophrenia. Genes previously implicated in intellectual disability or developmental delay (ID/DD) were present within 9/25 loci.

Heritability and partitioned heritability

LD Score regression yielded a SNP-heritability estimate of 0.221 (SE: 0.01) in line with the previously reported SNP-heritability in Sniekers et al (2017). Partitioning this heritability across 58 functional SNP annotations identifies conserved regions as significantly enriched contributors to the heritability of intelligence (proportion of heritability = 0.340, enrichment = 13.3 fold, p = 3.26 × 10^-8), consistent with previous reports in a subset of our meta-analyzed cohorts²⁰. Four extended annotations were also significantly enriched (p < 8.62 × 10^-4; Figure 2), suggesting that genetic variation located in the vicinity of conserved regions, enhancers (specifically H3K4me1 elements) and open chromatin in brain dorsolateral prefrontal cortex²¹ and fetal cells) are enriched in heritability for intelligence.

Figure 2:

Heritability enrichment of genomic annotations. Horizontal line is p = 8.62×10^-4, Bonferroni-corrected threshold for 58 annotations. Vertical line is enrichment = 1 (that is, no enrichment)

Gene-wise analyses

93 genes at 41 loci were identified at genome-wide significance (p < 2.65 × 10^-6, Bonferroni threshold for 18,839 genes; Table 2, Supplementary Figure 1). 28 of these genes (20 loci) were also identified in Sniekers et al. (2017). 11 of the 93 genes were previously implicated in intellectual disability or developmental delay, although this overlap does not significantly from what is expected under the null hypothesis of no enrichment (p = 0.073, hypergeometric test; Table 2).

View this table:

Table 2:

Genome-wide significant genes from gene-wise association analyses. Genetic locus is that identified in single-variant analyses, or the genomic region otherwise. P value from competitive test in MAGMA. Loci in bold were significantly associated in Sniekers et al. (2017). Genes in italics have previously been implicated in developmental delay or intellectual disability.

Tissue- and cell-specific gene expression

Tissue-specific enrichment analysis identified an enrichment of genes with high brain-specific expression associated with intelligence (p_MAGMA = 4.43 × 10^-9, p_LDSC = 4.23 × 10^-6; Figure 3a, Supplementary Table 3a). Across 10 brain regions in GTEx¹⁸, stronger gene associations with intelligence were associated with increased specificity of gene expression to the frontal cortex (p_MAGMA = 0.00305, p_LDSC = 2.66 × 10^-4; Figure 3b, Supplementary Table 3b).

Figure 3:

Results of tissue- and cell-specific analyses. a) Whole-body analyses; b) Brain tissue analyses; c) KI level 1 cell analyses; d) KI level 2 cell analyses (only those significant in MAGMA competitive analyses after correction for multiple testing shown). Vertical lines are the Bonferroni threshold for each analysis. Results shown are from MAGMA analyses - tissues and cell-types also significant in LDSC analyses are indicated with †. Full results are shown in Supplementary Table 3.

In the KI level 1 (broad cell groups) cell-type specific analyses, both linear regression (MAGMA)²² and heritability enrichment-based analyses (LD Score)²³ supported enrichment of genes with high specificity to pyramidal neurons in the somatosensory neocortex (p_MAGMA = 1.41 × 10^-6, p_LDSC = 5.81 × 10^-4) and in the CA1 region of the hippocampus (p_MAGMA = 9.08 × 10^-5, p_LDSC = 1.12 × 10^-3), as well as to midbrain embryonic GABAergic neurons (p_MAGMA = 9.47 × 10^-5, p_LDSC = 1.61 × 10^-3; Figure 3c, Supplementary Table 3c). Level 2 analyses (narrowly-defined cell types) suggested significant enrichment of genes with high specificity to type pyramidal cells of the CA1 region (p_MAGMA = 2.16 × 10^-4, p_LDSC = 1.19 × 10^-4), although considerable variability was observed between methods at this level of granularity (Figure 3d, Supplementary Table 3d).

Cell-type specific analyses were repeated conditioning on each enriched cell-type in turn. When controlling for gene expression in pyramidal neurons of the somatosensory neocortex, the previously observed enrichment in CA1 pyramidal cells is lost. In contrast, the patterns of enrichment observed when conditioning on expression in CA1 pyramidal cells or in midbrain embryonic GABAergic neurons were consistent with those observed without conditioning (Figure 4).

Figure 4:

Conditional cell-type enrichment analysis. X-axis lists target cell-types. Y-axis lists other cell-types. Colors correspond to the enrichment probability of the other cell type after conditioning (P). Values of log(P) approaching zero indicate no enrichment after conditioning. Barplot on the right shows the minimum value of P for each cell-type across all conditional analyses (excluding analyses of the target cell-type with itself); the vertical line marks p=0.05. Red box highlights the loss of significant enrichment in CA1 pyramidal neurons when conditioning on somatosensory (SS) neurons.

Predicted tissue-specific gene expression

Across the 10 GTEx brain tissues, results from MetaXcan suggested significant effects on the expression of 16 genes (p < 1.60×10^-6; Table 3). Eight of these genes are situated at locus 11 (Chr3, 48.7-50.2 Mb). Genetic variation was predicted to upregulate 6 genes and downregulate 10 genes, with both single-region (9 genes) and multiple-region (7 genes) patterns of altered expression implied. Three genes (NAGA, TUFM and GMPPB) have previously been implicated in ID.

View this table:

Table 3:

Genes predicted to be upregulated (+) or downregulated (-) in specific tissues. Genetic locus is that identified in single-variant analyses, or the genomic region otherwise. Genewise = genome-wide significant in gene-wise analyses. Genes in italics have previously been implicated in developmental delay or intellectual disability.

Tissue-specific pathway analysis

Tissue-specific pathway analyses identified 32 nested pathways with p ≤ 5.34×10^-6 (Bonferroni correction for 9,361 effectively independent pathways), of which 29 contained genes that were significant in the gene-wise analysis (Supplementary Table 4; Supplementary Figure 2). 7 pathways remained significant after further correcting for all tissue-specific tests: “self-reported educational attainment”, “modulation of synaptic transmission”, “neurodegenerative disease”, “neuron spine”, “schizophrenia”, “rare genetic neurological disorder”, and “potentially synaptic genes” (Table 4).

View this table:

Table 4:

Pathways significantly enriched for genes associated with intelligence after correction for all tissue-effective pathways tests. Genes in italics have previously been implicated in developmental delay or intellectual disability.

Conditional gene set enrichment

Results from tissue-specific and gene-set analyses identified a number of gene sets associated with intelligence. Of specific interest were synaptic genes (post-synaptic density proteome list;²⁴), RBFOX family binding partners²⁵, CELF4 binding partners, and previously reported intellectual disability genes. Additional analyses were performed to assess whether the association of these gene sets was independent of the enrichment for gene expression in pyramidal cells of the somatosensory cortex. Bootstrapped analyses confirmed the association of each gene set prior to conditional analysis (empirical p < 0.05; Supplementary Table 5). Conditioning on expression in gene expression in pyramidal cells of the somatosensory cortex, the enrichment for synaptic genes is no longer significant, indicating that this enrichment is not independent of that observed in the pyramidal neurons. This effect was not observed for RBFOX or CELF4 targets. Intellectual disability genes were weakly enriched prior to conditioning, and conditioning had little effect on enrichment. However, subdividing ID/DD genes into those associated with severe and those associated with moderate ID/DD indicated that the enrichment stemmed predominantly from moderate ID/DD genes, and this was not altered by conditioning on somatosensory pyramidal neuron gene expression (Supplementary Table 5).

Discussion

The overall power of our GWAS meta-analysis was equivalent to a sample size of ∼99,000 individuals due to the inclusion of the extreme trait HiQ sample, which contributes equivalently to a population cohort of ∼21,000 individuals, and is likely to be enriched for alleles associated with intelligence in the normal range. We mapped results from our GWAS to tissue and cell-type specific gene expression data, identifying enrichment of specificity at multiple levels: in the brain, the frontal cortex, midbrain embryonic GABAergic neurons and pyramidal neurons, especially those in the somatosensory cortex. A number of genes previously implicated in intellectual disability or developmental delay are predicted from the GWAS results to show differential gene expression for normal range IQ in different brain regions, and are associated with variation in intelligence in the normal range from gene-wise analyses.

RNA sequencing data suggest that genes more strongly associated with intelligence are enriched for brain-specific expression in general. While the dominance of brain-specificity over other body tissue specificity is pronounced, assessing differences within the brain is more difficult. Genes more strongly associated with intelligence showed higher specificity for frontal cortical expression, but the differences in cell composition between brain tissues means that cell-type analyses may be more informative. This can be seen within our results, in that pyramidal neurons of the somatosensory cortex were significant in the cell-type specific analysis, but the cortex as a whole is not significant in the GTEx brain-tissue analysis. This is perhaps due to the fact that the cortex is a highly heterogeneous mixture of cell-types. Our results suggest expression in pyramidal neurons in one area of the cortex is relevant in intelligence, but expression in the other cell-types and areas of the cortex may not be²⁶. A further caveat to this interpretation is that the full cellular composition of the cortex (and the brain overall) is not reflected in the KI mouse superset, and as such our conclusions about wider effects must be constrained.

Genes with higher specificity to pyramidal neurons were enriched for associations with intelligence. However, the location of the most interesting neuron population is not yet clear. Observed enrichment in the pyramidal cells of the CA1 region of the hippocampus was lost when accounting for gene expression in pyramidal cells of the somatosensory cortex. An uncaptured population of pyramidal neurons (for example, in the frontal cortex) may exist that similarly overlaps in expression with the somatosensory pyramidal neurons, and accounts for the enrichment observed in the latter population. While the KI superset is the largest brain scRNAseq resource to date, it covers a limited set of regions and developmental stages¹⁶.

Tissue and cell-specific analyses were performed using related but distinct approaches, namely linear regression of binned specificity and heritability enrichment analysis of the top 10% of specific genes. Linear regression tended to give smaller p-values than heritability enrichment analysis (Supplementary Table 3). While this may be a statistical artefact generated by the differing assumptions of the methods, this could also result from the pattern of enrichment. For example, a small set of genes associated with intelligence, all with very high specificity to a given tissue would result in a lower p-value from heritability enrichment than if the association with intelligence was spread more broadly across genes with generally enriched tissue-specificity.

The results of analyses in the KI mouse superset for intelligence can be contrasted with those for schizophrenia¹⁶. Both phenotypes initially demonstrate enrichment for the same hippocampal and somatosensory pyramidal neuron populations. However, conditional analyses in schizophrenia demonstrated significant differences in the results between schizophrenia and intelligence - the enrichment observed in the somatosensory cortex in schizophrenia could be fully explained by that in the hippocampus, while the opposite pattern was observed in intelligence. Additionally, unlike in schizophrenia, genes implicated in intelligence also showed specificity in midbrain embryonic cells. Analyses in schizophrenia implicated more level two cell types than were significant in our analyses. Such differences may reflect differences in the common-variant contribution to variance in intelligence and schizophrenia; despite a similar effective sample size to the schizophrenia GWAS (40,675 cases, 64,643 controls, N_eff ∼ 100K), this GWAS identified only 25 loci compared with the 140 associated with schizophrenia²⁷ highlights the potentially more polygenic nature of the observed heritability of intelligence, which is approximately equal to that of schizophrenia.

The KI mouse superset has a number of strengths; it is the largest and broadest scRNAseq dataset to date, captures extra-nuclear as well as nuclear transcripts, and was generated using identical methods¹⁶. However, there are also limitations to the use of this dataset. One issue is that the expression data used is derived from mice rather than humans. Gene expression in the brain is conserved across mammals, such that the principal axes of variation in comparative studies of gene expression capture inter-tissue, rather than inter-species, variance²⁸. Furthermore, there is a high degree of conservation of gene expression between mouse and human brains specifically^16,29,30. Previous analyses using the KI mouse superset have made extensive comparison between mouse and human gene expression and found high concordance in mapping mouse to human genes¹⁶.

Nevertheless, cell types that are not enriched for genes associated with intelligence in this study should not be prematurely excluded, as some cell-types are not present in the dataset and others will have dissimilar functions or have been exposed to different evolutionary pressures in mouse and in human. Intelligence is a major characteristic that differentiates humans from other mammals³¹. As such, it may be the case that genes with higher specificity to regions dissimilar between humans and mice could be enriched for associations with intelligence, which would not be captured by this approach.

Our results highlight potential insights beyond those gleaned from tissue-specific expression patterns. Several genes previously implicated in ID/DD were present in loci associated in the GWAS. Perhaps the most interesting example of this is GMPPB (GDP-Mannose Pyrophosphorylase B), which is in locus 11 of the GWAS results, and was genome-wide significant in gene-wise analyses. It is a member of the “rare genetic neurological disorder” gene set (significantly enriched, specifically in neural tissues), and the expression of GMPPB was predicted to be downregulated in the cortex of individuals with higher intelligence. Rare loss of function mutations in the GMPPB gene have been identified as causal mutations in tens of individuals with muscular dystrophies and myasthenias, many of which present with mild to severe ID^32–36. The product of this gene is important for the glycosylation of a-DG (alpha-dystroglycan - the dystroglycan gene DAG is also present in locus 11 and is a significantly associated gene in this analysis;³²). Glycosylation is required for the interaction of a-DG with extracellular ligands, with a variety of consequences including the organization of axon guidance³⁷. Our results, in the context of the medical genetic literature, tentatively suggest the effects on axon guidance by glycosylated a-DG may be an area worthy of further exploration to understand the biology of intelligence.

However, the observed overall overlap of all ID/DD genes with loci from the GWAS does not differ from that expected by chance. This lack of significant enrichment is partly reflected in the pathway analysis - there are several overlapping gene sets designed to capture ID/DD genes. Of these, only “rare genetic neurological disorder” was significantly enriched following correction. Further insight is obtained from conditional gene-set analyses - although the overall “intellectual disability” gene set is only nominally associated with intelligence, stratifying this gene set demonstrates considerable enrichment of mild intellectual disability genes, independent of gene expression in somatosensory pyramidal neurons. The presence of genes causative of ID/DD in loci associated with intelligence in the normal range, and the enrichment of specific pathways and gene sets derived from the ID/DD literature, may indicate shared biology between ID/DD and normal intelligence. However, the lack of broad enrichment of all such genes suggests that there may also be distinct pathways contributing to normal intelligence that are not commonly affected in ID/DD. ID/DD is not a single condition, but a group of disorders with differing etiologies - our results are still consistent with a two-group model of ID/DD etiology^38–40.

The meta-analysis results presented herein extend previous findings⁴. The results are largely consistent with those previously reported, which is unsurprising. More genes were identified in our gene-wise analysis due to an analytical decision to extend the boundaries by which each gene is defined 35kb upstream and 10kb downstream of the coding region^41,42. Defining genes using this boundary (as opposed to no boundary extension) captures additional transcriptional elements - these may be specific to the target gene, but may also capture elements with more distal regulatory effects.

Deriving testable biological hypotheses from the statistical associations of GWAS results is one of the central challenges for the immediate future of complex genetics^3,15. The provision of high-quality reference datasets encompassing genetic information from variation to translation, and the integration of genomic data to such reference data is invaluable to this aim^16,18. We have demonstrated that some insights into the biology of intelligence can be derived from GWAS, and have suggested potential avenues for further exploration. Our results could indicate that intelligence represents optimal pyramidal neuron functioning. Cognitive tests are highly correlated with general intelligence (g), which may depend on pyramidal neuron function⁷.

Understanding how biology underlies variation in intelligence is an active area of research that is beginning to yield results. Unifying these new genetic results with data from multiple approaches, can increase the power of each approach, has the potential to yield greater understanding of the biology of intelligence, which in turn could inform the study of many health-related phenotypes with which intelligence is correlated.

Online Methods

Cohort descriptions

Sniekers intelligence GWAS⁴

The cohort analyzed in Sniekers et al. (2017) was drawn from 7 cohorts, primarily consisting of data from the UK Biobank pilot genotyping release (N = 54,119) and the Childhood Intelligence Consortium (N = 12,441) as well as seven additional cohorts (N = 11,748). The phenotype for analysis was Spearman’s g, or a primary measure of fluid intelligence that correlates highly with g^4,43. Summary statistics from this analysis are available at https://ctg.cncr.nl/software/summary_statistics. Full details on cohort characteristics, genotyping and analysis are supplied in the Supplementary Material.

HiQ high-intelligence GWAS

The Duke University Talent Identification Program (TIP) cohort has been described previously¹⁷. In brief, TIP is a non-profit organization that recruits and nurtures academically gifted children of extremely high intelligence (top 3%) from the US population. For genomic study, 1,247 participants from the top 1% of TIP (top 0.03% of population) were selected as a high-intelligence cohort (HiQ). IQ was inferred from performance on the Scholastic Assessment Test (SAT) or American College Test (ACT) taken at age 12 rather than the usual age of 18 years. A population comparison cohort (N = 8,185) was obtained from the The University of Michigan Health and Retirement Study (HRS). Participants were assumed to be drawn from the normal distribution of intelligence. Full details on genotyping and analysis are supplied in the Supplementary Material.