Abstract
Variance in IQ is associated with a wide range of health outcomes, and 1% of the population are affected by intellectual disability. Despite a century of research, the fundamental neural underpinnings of intelligence remain unclear. We integrate results from genome-wide association studies (GWAS) of intelligence with brain tissue and single cell gene expression data to identify tissues and cell types associated with intelligence. GWAS data for IQ (N = 78,308) were meta-analyzed with an extreme-trait cohort of 1,247 individuals with mean IQ ∼170 and 8,185 controls. Genes associated with intelligence implicate pyramidal neurons of the somatosensory cortex and CA1 region of the hippocampus, and midbrain embryonic GABAergic neurons. Tissue-specific analyses find the most significant enrichment for frontal cortex brain expressed genes. These results suggest specific neuronal cell types and genes may be involved in intelligence and provide new hypotheses for neuroscience experiments using model systems.
Genome-wide association studies (GWAS) have successfully identified statistical associations with a wide range of behavioral phenotypes and neuropsychiatric disorders1–3. Increasing sample sizes has begun to yield findings for intelligence4–6. The largest and most recent study of intelligence reported 18 loci significantly associated with intelligence4. Significant genetic correlations were observed between intelligence and a variety of behavioral (educational attainment, smoking behaviors), anthropometric (cranial morphology, height, body composition), and psychiatric phenotypes (schizophrenia, autism, depressive symptoms), mirroring epidemiological evidence for correlations between intelligence and a broad range of health-related outcomes4,7.
Considered alone, not all associations identified by GWAS precisely localize biological mechanisms amenable to subsequent experimentation. For instance, the most associated variant in a significant locus may not be the causal variant8,9, there may be multiple causal variants in a locus10, loci may act through altering the expression of distant genes11, and the associations identified by GWAS of complex traits are often spread across the genome12–14. To extract meaningful biological inferences from GWAS results, it is necessary to integrate data from other sources, such as studies of gene expression15. Results from gene-wise analyses in Sniekers et al. (2017) identified expression predominant in the brain for 14 of the 44 genes with significant association, although some transcription was inferred for most genes across most tissues4. Integration of genomic results with data on biological pathways suggested a prominent role for nervous system development in intelligence.
In silico functional annotation of GWAS results is dependent on high-quality biological reference data. Recently, data from the Karolinska Institutet (KI) mouse superset of single-cell RNA sequencing (scRNAseq) of ∼10,000 single cells from multiple brain regions was used to map schizophrenia GWAS results to brain cell types16. Genes that previously showed association with schizophrenia were expressed with higher specificity in pyramidal cells, medium spiny neurons, and interneurons than in 20 other brain cell types. This demonstrates the potential of cell-type specific annotation to enable the construction of new functional hypotheses for complex traits.
We sought to develop a better understanding of the neurobiological underpinnings of intelligence through combining GWAS results with a number of data sources concerned with tissue and cell-type specific gene expression. To this end, we meta-analyzed the most recent GWAS of intelligence4 with an extreme-trait GWAS that compared individuals of very high intelligence to a group from the general population (HiQ)17. We then analyzed the enrichment of associations with intelligence in single cell expression data from the KI mouse superset16, and in brain genomic and transcriptomic data from the GTEx project18. Finally, we combined genomic and tissue-specific expression data with information from biological, disease-relevant and drug-target pathway databases to further assess the potential impact of biological mechanisms explaining variance in intelligence.
Results
Meta-analysis
The genetic correlation between Sniekers et al. (2017) and HiQ was 0.92 (SE: 0.07) and was sufficiently high to justify meta-analysis. 25 loci met genome-wide significance with intelligence (i.e., p < 5 × 10-8; Table 1, Figure 1a). Of these, 13 were genome-wide significant in Sniekers et al. (2017) and 12 were novel (Table 1; Supplementary Table 1). The single locus previously identified in the HiQ sample was not genome-wide significant in this analysis (p = 0.00595;17).
Assessment of genome-wide inflation yielded a median λGC of 1.24. The LD score regression intercept from was 1.004 (SE: 0.01), suggesting that this inflation is caused by polygenicity rather than confounding (Figure 1b)19. Annotation of specific genomic loci to databases of interest suggested overlapping loci between intelligence and a variety of phenotypes (Supplementary Table 2). The largest overlap was observed with educational attainment (14/25 loci), but overlap at multiple loci was widespread, including with age at menarche, height, body mass index, autoimmune disease, and schizophrenia. Genes previously implicated in intellectual disability or developmental delay (ID/DD) were present within 9/25 loci.
Heritability and partitioned heritability
LD Score regression yielded a SNP-heritability estimate of 0.221 (SE: 0.01) in line with the previously reported SNP-heritability in Sniekers et al (2017). Partitioning this heritability across 58 functional SNP annotations identifies conserved regions as significantly enriched contributors to the heritability of intelligence (proportion of heritability = 0.340, enrichment = 13.3 fold, p = 3.26 × 10-8), consistent with previous reports in a subset of our meta-analyzed cohorts20. Four extended annotations were also significantly enriched (p < 8.62 × 10-4; Figure 2), suggesting that genetic variation located in the vicinity of conserved regions, enhancers (specifically H3K4me1 elements) and open chromatin in brain dorsolateral prefrontal cortex21 and fetal cells) are enriched in heritability for intelligence.
Gene-wise analyses
93 genes at 41 loci were identified at genome-wide significance (p < 2.65 × 10-6, Bonferroni threshold for 18,839 genes; Table 2, Supplementary Figure 1). 28 of these genes (20 loci) were also identified in Sniekers et al. (2017). 11 of the 93 genes were previously implicated in intellectual disability or developmental delay, although this overlap does not significantly from what is expected under the null hypothesis of no enrichment (p = 0.073, hypergeometric test; Table 2).
Tissue- and cell-specific gene expression
Tissue-specific enrichment analysis identified an enrichment of genes with high brain-specific expression associated with intelligence (pMAGMA = 4.43 × 10-9, pLDSC = 4.23 × 10-6; Figure 3a, Supplementary Table 3a). Across 10 brain regions in GTEx18, stronger gene associations with intelligence were associated with increased specificity of gene expression to the frontal cortex (pMAGMA = 0.00305, pLDSC = 2.66 × 10-4; Figure 3b, Supplementary Table 3b).
In the KI level 1 (broad cell groups) cell-type specific analyses, both linear regression (MAGMA)22 and heritability enrichment-based analyses (LD Score)23 supported enrichment of genes with high specificity to pyramidal neurons in the somatosensory neocortex (pMAGMA = 1.41 × 10-6, pLDSC = 5.81 × 10-4) and in the CA1 region of the hippocampus (pMAGMA = 9.08 × 10-5, pLDSC = 1.12 × 10-3), as well as to midbrain embryonic GABAergic neurons (pMAGMA = 9.47 × 10-5, pLDSC = 1.61 × 10-3; Figure 3c, Supplementary Table 3c). Level 2 analyses (narrowly-defined cell types) suggested significant enrichment of genes with high specificity to type pyramidal cells of the CA1 region (pMAGMA = 2.16 × 10-4, pLDSC = 1.19 × 10-4), although considerable variability was observed between methods at this level of granularity (Figure 3d, Supplementary Table 3d).
Cell-type specific analyses were repeated conditioning on each enriched cell-type in turn. When controlling for gene expression in pyramidal neurons of the somatosensory neocortex, the previously observed enrichment in CA1 pyramidal cells is lost. In contrast, the patterns of enrichment observed when conditioning on expression in CA1 pyramidal cells or in midbrain embryonic GABAergic neurons were consistent with those observed without conditioning (Figure 4).
Predicted tissue-specific gene expression
Across the 10 GTEx brain tissues, results from MetaXcan suggested significant effects on the expression of 16 genes (p < 1.60×10-6; Table 3). Eight of these genes are situated at locus 11 (Chr3, 48.7-50.2 Mb). Genetic variation was predicted to upregulate 6 genes and downregulate 10 genes, with both single-region (9 genes) and multiple-region (7 genes) patterns of altered expression implied. Three genes (NAGA, TUFM and GMPPB) have previously been implicated in ID.
Tissue-specific pathway analysis
Tissue-specific pathway analyses identified 32 nested pathways with p ≤ 5.34×10-6 (Bonferroni correction for 9,361 effectively independent pathways), of which 29 contained genes that were significant in the gene-wise analysis (Supplementary Table 4; Supplementary Figure 2). 7 pathways remained significant after further correcting for all tissue-specific tests: “self-reported educational attainment”, “modulation of synaptic transmission”, “neurodegenerative disease”, “neuron spine”, “schizophrenia”, “rare genetic neurological disorder”, and “potentially synaptic genes” (Table 4).
Conditional gene set enrichment
Results from tissue-specific and gene-set analyses identified a number of gene sets associated with intelligence. Of specific interest were synaptic genes (post-synaptic density proteome list;24), RBFOX family binding partners25, CELF4 binding partners, and previously reported intellectual disability genes. Additional analyses were performed to assess whether the association of these gene sets was independent of the enrichment for gene expression in pyramidal cells of the somatosensory cortex. Bootstrapped analyses confirmed the association of each gene set prior to conditional analysis (empirical p < 0.05; Supplementary Table 5). Conditioning on expression in gene expression in pyramidal cells of the somatosensory cortex, the enrichment for synaptic genes is no longer significant, indicating that this enrichment is not independent of that observed in the pyramidal neurons. This effect was not observed for RBFOX or CELF4 targets. Intellectual disability genes were weakly enriched prior to conditioning, and conditioning had little effect on enrichment. However, subdividing ID/DD genes into those associated with severe and those associated with moderate ID/DD indicated that the enrichment stemmed predominantly from moderate ID/DD genes, and this was not altered by conditioning on somatosensory pyramidal neuron gene expression (Supplementary Table 5).
Discussion
The overall power of our GWAS meta-analysis was equivalent to a sample size of ∼99,000 individuals due to the inclusion of the extreme trait HiQ sample, which contributes equivalently to a population cohort of ∼21,000 individuals, and is likely to be enriched for alleles associated with intelligence in the normal range. We mapped results from our GWAS to tissue and cell-type specific gene expression data, identifying enrichment of specificity at multiple levels: in the brain, the frontal cortex, midbrain embryonic GABAergic neurons and pyramidal neurons, especially those in the somatosensory cortex. A number of genes previously implicated in intellectual disability or developmental delay are predicted from the GWAS results to show differential gene expression for normal range IQ in different brain regions, and are associated with variation in intelligence in the normal range from gene-wise analyses.
RNA sequencing data suggest that genes more strongly associated with intelligence are enriched for brain-specific expression in general. While the dominance of brain-specificity over other body tissue specificity is pronounced, assessing differences within the brain is more difficult. Genes more strongly associated with intelligence showed higher specificity for frontal cortical expression, but the differences in cell composition between brain tissues means that cell-type analyses may be more informative. This can be seen within our results, in that pyramidal neurons of the somatosensory cortex were significant in the cell-type specific analysis, but the cortex as a whole is not significant in the GTEx brain-tissue analysis. This is perhaps due to the fact that the cortex is a highly heterogeneous mixture of cell-types. Our results suggest expression in pyramidal neurons in one area of the cortex is relevant in intelligence, but expression in the other cell-types and areas of the cortex may not be26. A further caveat to this interpretation is that the full cellular composition of the cortex (and the brain overall) is not reflected in the KI mouse superset, and as such our conclusions about wider effects must be constrained.
Genes with higher specificity to pyramidal neurons were enriched for associations with intelligence. However, the location of the most interesting neuron population is not yet clear. Observed enrichment in the pyramidal cells of the CA1 region of the hippocampus was lost when accounting for gene expression in pyramidal cells of the somatosensory cortex. An uncaptured population of pyramidal neurons (for example, in the frontal cortex) may exist that similarly overlaps in expression with the somatosensory pyramidal neurons, and accounts for the enrichment observed in the latter population. While the KI superset is the largest brain scRNAseq resource to date, it covers a limited set of regions and developmental stages16.
Tissue and cell-specific analyses were performed using related but distinct approaches, namely linear regression of binned specificity and heritability enrichment analysis of the top 10% of specific genes. Linear regression tended to give smaller p-values than heritability enrichment analysis (Supplementary Table 3). While this may be a statistical artefact generated by the differing assumptions of the methods, this could also result from the pattern of enrichment. For example, a small set of genes associated with intelligence, all with very high specificity to a given tissue would result in a lower p-value from heritability enrichment than if the association with intelligence was spread more broadly across genes with generally enriched tissue-specificity.
The results of analyses in the KI mouse superset for intelligence can be contrasted with those for schizophrenia16. Both phenotypes initially demonstrate enrichment for the same hippocampal and somatosensory pyramidal neuron populations. However, conditional analyses in schizophrenia demonstrated significant differences in the results between schizophrenia and intelligence - the enrichment observed in the somatosensory cortex in schizophrenia could be fully explained by that in the hippocampus, while the opposite pattern was observed in intelligence. Additionally, unlike in schizophrenia, genes implicated in intelligence also showed specificity in midbrain embryonic cells. Analyses in schizophrenia implicated more level two cell types than were significant in our analyses. Such differences may reflect differences in the common-variant contribution to variance in intelligence and schizophrenia; despite a similar effective sample size to the schizophrenia GWAS (40,675 cases, 64,643 controls, Neff ∼ 100K), this GWAS identified only 25 loci compared with the 140 associated with schizophrenia27 highlights the potentially more polygenic nature of the observed heritability of intelligence, which is approximately equal to that of schizophrenia.
The KI mouse superset has a number of strengths; it is the largest and broadest scRNAseq dataset to date, captures extra-nuclear as well as nuclear transcripts, and was generated using identical methods16. However, there are also limitations to the use of this dataset. One issue is that the expression data used is derived from mice rather than humans. Gene expression in the brain is conserved across mammals, such that the principal axes of variation in comparative studies of gene expression capture inter-tissue, rather than inter-species, variance28. Furthermore, there is a high degree of conservation of gene expression between mouse and human brains specifically16,29,30. Previous analyses using the KI mouse superset have made extensive comparison between mouse and human gene expression and found high concordance in mapping mouse to human genes16.
Nevertheless, cell types that are not enriched for genes associated with intelligence in this study should not be prematurely excluded, as some cell-types are not present in the dataset and others will have dissimilar functions or have been exposed to different evolutionary pressures in mouse and in human. Intelligence is a major characteristic that differentiates humans from other mammals31. As such, it may be the case that genes with higher specificity to regions dissimilar between humans and mice could be enriched for associations with intelligence, which would not be captured by this approach.
Our results highlight potential insights beyond those gleaned from tissue-specific expression patterns. Several genes previously implicated in ID/DD were present in loci associated in the GWAS. Perhaps the most interesting example of this is GMPPB (GDP-Mannose Pyrophosphorylase B), which is in locus 11 of the GWAS results, and was genome-wide significant in gene-wise analyses. It is a member of the “rare genetic neurological disorder” gene set (significantly enriched, specifically in neural tissues), and the expression of GMPPB was predicted to be downregulated in the cortex of individuals with higher intelligence. Rare loss of function mutations in the GMPPB gene have been identified as causal mutations in tens of individuals with muscular dystrophies and myasthenias, many of which present with mild to severe ID32–36. The product of this gene is important for the glycosylation of a-DG (alpha-dystroglycan - the dystroglycan gene DAG is also present in locus 11 and is a significantly associated gene in this analysis;32). Glycosylation is required for the interaction of a-DG with extracellular ligands, with a variety of consequences including the organization of axon guidance37. Our results, in the context of the medical genetic literature, tentatively suggest the effects on axon guidance by glycosylated a-DG may be an area worthy of further exploration to understand the biology of intelligence.
However, the observed overall overlap of all ID/DD genes with loci from the GWAS does not differ from that expected by chance. This lack of significant enrichment is partly reflected in the pathway analysis - there are several overlapping gene sets designed to capture ID/DD genes. Of these, only “rare genetic neurological disorder” was significantly enriched following correction. Further insight is obtained from conditional gene-set analyses - although the overall “intellectual disability” gene set is only nominally associated with intelligence, stratifying this gene set demonstrates considerable enrichment of mild intellectual disability genes, independent of gene expression in somatosensory pyramidal neurons. The presence of genes causative of ID/DD in loci associated with intelligence in the normal range, and the enrichment of specific pathways and gene sets derived from the ID/DD literature, may indicate shared biology between ID/DD and normal intelligence. However, the lack of broad enrichment of all such genes suggests that there may also be distinct pathways contributing to normal intelligence that are not commonly affected in ID/DD. ID/DD is not a single condition, but a group of disorders with differing etiologies - our results are still consistent with a two-group model of ID/DD etiology38–40.
The meta-analysis results presented herein extend previous findings4. The results are largely consistent with those previously reported, which is unsurprising. More genes were identified in our gene-wise analysis due to an analytical decision to extend the boundaries by which each gene is defined 35kb upstream and 10kb downstream of the coding region41,42. Defining genes using this boundary (as opposed to no boundary extension) captures additional transcriptional elements - these may be specific to the target gene, but may also capture elements with more distal regulatory effects.
Deriving testable biological hypotheses from the statistical associations of GWAS results is one of the central challenges for the immediate future of complex genetics3,15. The provision of high-quality reference datasets encompassing genetic information from variation to translation, and the integration of genomic data to such reference data is invaluable to this aim16,18. We have demonstrated that some insights into the biology of intelligence can be derived from GWAS, and have suggested potential avenues for further exploration. Our results could indicate that intelligence represents optimal pyramidal neuron functioning. Cognitive tests are highly correlated with general intelligence (g), which may depend on pyramidal neuron function7.
Understanding how biology underlies variation in intelligence is an active area of research that is beginning to yield results. Unifying these new genetic results with data from multiple approaches, can increase the power of each approach, has the potential to yield greater understanding of the biology of intelligence, which in turn could inform the study of many health-related phenotypes with which intelligence is correlated.
Online Methods
Cohort descriptions
Sniekers intelligence GWAS4
The cohort analyzed in Sniekers et al. (2017) was drawn from 7 cohorts, primarily consisting of data from the UK Biobank pilot genotyping release (N = 54,119) and the Childhood Intelligence Consortium (N = 12,441) as well as seven additional cohorts (N = 11,748). The phenotype for analysis was Spearman’s g, or a primary measure of fluid intelligence that correlates highly with g4,43. Summary statistics from this analysis are available at https://ctg.cncr.nl/software/summary_statistics. Full details on cohort characteristics, genotyping and analysis are supplied in the Supplementary Material.
HiQ high-intelligence GWAS
The Duke University Talent Identification Program (TIP) cohort has been described previously17. In brief, TIP is a non-profit organization that recruits and nurtures academically gifted children of extremely high intelligence (top 3%) from the US population. For genomic study, 1,247 participants from the top 1% of TIP (top 0.03% of population) were selected as a high-intelligence cohort (HiQ). IQ was inferred from performance on the Scholastic Assessment Test (SAT) or American College Test (ACT) taken at age 12 rather than the usual age of 18 years. A population comparison cohort (N = 8,185) was obtained from the The University of Michigan Health and Retirement Study (HRS). Participants were assumed to be drawn from the normal distribution of intelligence. Full details on genotyping and analysis are supplied in the Supplementary Material.
Meta-analysis
Summary statistics from Sniekers et al. (2017) and HiQ were meta-analyzed using METAL4,44. To account for the increased discovery power afforded by the extreme-sampling method used in TIP, analyses were weighted by their respective non-centrality parameters (NCP), estimated using the Genetic Power Calculator45–47. Specifically, NCPs were estimated assuming a causal variant of 20% frequency, capturing 0.1% of variance in each phenotype, assuming HiQ controls are drawn from the normal distribution (+/- 2 SD from the mean) and HiQ cases are sampled from 4 SD above the mean, consistent with IQ > 160 in 99% of the cohort17. The NCP of the Sniekers cohort (N = 78,308) was 78.4, while the NCP of the HiQ cohort was 21.6, suggesting the HiQ cohort contributes equivalently to a population cohort of ∼21,000 individuals. Only variants present in both cohorts were retained for analysis.
Following association analysis, genome-wide significant loci were defined via clumping in PLINK248. Index variants (p < 5×10-4) were merged into loci if in linkage disequilibrium (r2 > 0.1 within 500kb) with a variant with a lower p-value. Loci within 50kb of each other were merged. Manhattan and QQ plots were generated using FUMA (49. Annotation of genomic results with: data from the EBI GWAS catalog; OMIM; GENCODE genes; genes previously implicated in autism and in intellectual disability; copy-number variants previously implicated in psychiatric disorders; and mouse knockout phenotypes was performed with RegionAnnotator version 1.63 (https://github.com/ivankosmos/RegionAnnotator).
Heritability and partitioned heritability
The heritability of intelligence accounted for by common variants was estimated using LD Score, limited to the HapMap3 variants and pre-computed LD scores provided with the package19. Heritability was then partitioned across the 53 genomic annotations provided with the package23, with the addition of 5 annotations: open chromatin regions (ATAC and ATAC Bryois extend 500, which increases the window around the region by 500 bases in both directions), the intersection between ATAC and conserved regions of the genome (ATAC-conserved) and regions present in the Neanderthal genome (Neanderthal and Neanderthal extend 500;21,50. Regions of open chromatin were identified in prefrontal cortical tissue from 135 schizophrenic individuals and 137 controls using ATAC sequencing, which identifies stretches of DNA free of nucleosomes and other DNA-binding proteins21,51.
Gene-wise analyses
Results from the meta-analysis were filtered to retain only single nucleotide variants (SNPs) present in the European superpopulation of 1000 Genomes Phase 352. SNPs were annotated to a gene using MAGMA v1.06, assigning SNPs to genes if they lay between 35kb upstream and 10kb downstream of the gene location (as supplied on the MAGMA website, build 37;22. Gene-wise p-values were obtained from MAGMA as the aggregate of the mean and smallest p-value across all SNPs annotated to the gene. MAGMA accounts for possible confounders such as gene size, gene density, linkage disequilibrium and minor allele count. The threshold for genome-wide significance was defined as p = 2.65 × 10-6, the Bonferroni correction for the 18,839 genes tested. Genes passing genome-wide significance were defined as coming from the same locus if their locations were within 50kb of each other, or if they lay within clumped loci from the single variant analysis. Significant genes were cross-referenced to the intellectual disability (ID) gene list provided with RegionAnnotator. The significance of the observed overlap was quantified as a hypergeometric test in R53, using as background 1,366 ID/DD genes in 18,839 autosomal genes.
Tissue- and cell-specific gene expression
Tissue-specific and cell-type specific proportions of gene expression were calculated following the method described in detail in16). Tissue expression data was drawn from the GTEx Consortium18, and brain cell-type expression data was drawn from scRNAseq data from mouse brain16. For each gene, the value for each tissue (or cell-type) was calculated by dividing the mean Unique Molecular Identifier (UMI) counts for the given tissue by the summed mean UMI counts across all tissues16.
Associations between gene-wise p-values from the meta-analysis and tissue-specific (cell-type specific) gene expression were calculated using two methods, implemented in MAGMA22 and in LD Score19. In MAGMA, genes were grouped into 40 equal bins by specificity of expression, and bin membership was regressed on gene-wise association with intelligence in the meta-analysis (For these analyses, gene-wise association was defined as the mean p-value across all SNPs assigned to the gene.) In LD Score, the 10% of genes with the highest specificity within each tissue were used as a gene set for partitioned heritability analysis. Results were considered significant if the association p-values were smaller than the relevant Bonferroni threshold for both methods.
Conditional cell-specific analyses were performed as a secondary analysis to test whether each enriched cell-type observed was independent of all others. Full details of the method implemented are provided in Skene et al., 2017. In brief, for each enriched cell-type in turn (the target cell-type), z-scores from gene-wise association analyses with intelligence were randomly resampled without replacement. The mean z-score within each expression-specificity decile of the target cell-type was held constant, but the mean z-score of each specificity decile of other cell types was randomized. Empirical p-values were derived for each of the other cell-types, and this procedure was repeated 10,000 times. Expression in each of the other cell-types is considered to be associated with intelligence independently of expression in the target cell type if the observed p-value is lower than the 500th empirical p-value (i.e. 95% of the empirical distribution;16).
Predicted tissue-specific gene expression
Results from the variant-level meta-analysis were used to predict gene expression using MetaXcan and genomic and transcriptomic reference data from the brain regions assayed in the GTEx project18,54. Associations between predicted gene expression levels and intelligence were calculated. Significance was set at 1.60×10-6, the Bonferroni correction for the 31303 gene-tissue pairs tested54. Significant genes were cross-referenced to the intellectual disability gene list provided with RegionAnnotator.
Pathway analysis
A pathway matrix P was generated with elements Pg,p = 1 if gene g was in pathway p and Pg,p = 0 otherwise. The elements in the matrix were multiplied by binned gene expression weights obtained from GTEx data (as in the foregoing section on tissue-specific expression) for 13 brain regions and 32 tissues, generating 45 weighted gene/pathway matrices. Only genes with expression data were taken into account. Association between these tissue-weighted pathways and gene-wise associations with intelligence was computed using MAGMA. 13,564 pathways were drawn from OpenTargets (downloaded January 2017;55), GO ontologies, canonical pathways drawn from MSigSB v5.2 C2 and C5 datasets56, and biological pathways related to psychiatric disorders found in various scientific publications (a link to each pathway source is provided in Supplementary Table 4). The pathways assessed by this approach are related to each other in a complex fashion; GO pathways are hierarchical and MSigDB and OpenTargets pathways capture related gene sets. Accordingly, in order to control for multiple testing, the effective number of pathways tested was established by computing the number of principal components accounting for 99.5% of explained variance in the pathway similarity matrix, obtained by computing the Tanimoto similarity between pathways. This results in a Bonferroni-corrected threshold of p = 5.34×10-6 for 9,361 effectively independent tests for each matrix. A more stringent threshold was applied secondarily, taking into account all tissue-specific pathway matrices for a total for 9,361 × 46 tests and threshold p = 1.16×10-7.
Conditional gene set enrichment
Gene sets of interest were drawn from the results of pathway analyses. Specifically, the human postsynaptic density proteome gene set24 was used to capture the effects of synaptic and neuronal pathways. Gene sets of RBFOX and CELF4 targets were included as they showed enrichment that appears to be driven by brain-specificity, and have been previously implicated in other brain-related traits16,25,57. ID/DD gene sets were tested as they are of specific interest to the study of intelligence. To assess whether the enrichment of these gene sets is independent of gene expression in somatosensory pyramidal neurons, conditional analyses were performed following the method described above for conditional cell-type analyses, modified for the use of gene sets16.
Data availability
Summary statistics from the GWAS meta-analysis will be made available at [link available upon publication].
Author contributions
GB, DP and PFS conceived the study. JRIC, JB, HAG, PRJ, JS and NS performed statistical analyses. RP, ABM, SL, GC and JH acquired data. JRIC and GB wrote the manuscript. All authors reviewed the manuscript.
Acknowledgements
We gratefully acknowledge the contribution of all of the researchers and participants involved in the collection and analysis of the data included. This study includes data from Sniekers et al., 2017, which made use of the UK Biobank resource under application number 16406 (as previously acknowledged).
Analysis in this paper represents independent research funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. Analyses were performed using high performance computing facilities funded with capital equipment grants from the GSTT Charity (TR130505) and Maudsley Charity (980).
Analysis from Sniekers et al., 2017 was funded by the Netherlands Organization for Scientific Research (NWO VICI 453-14-005), and carried out on the Genetic Cluster Computer, which is financed by the Netherlands Scientific Organization (NWO:480-05-003), by VU University, Amsterdam, the Netherlands, and by the Dutch Brain Foundation and is hosted by the Dutch National Computing and Networking Services SurfSARA. PRJ is supported by the ‘Stichting Vrienden van Sophia’ (grant nr: 14-27) awarded to DP.
Research on the HiQ cohort was supported by a European Research Council Advanced Investigator award (295366) to RP. Collecting DNA from the highest-scoring TIP individuals was supported by an award from the John Templeton Foundation (13575) to RP.
Summary statistics from this analysis have been made available at [link available upon publication].
Footnotes
Conflict of Interest Statement PF Sullivan reports the following potentially competing financial interests: Lundbeck (advisory committee), Pfizer (Scientific Advisory Board member), and Roche (grant recipient, speaker reimbursement). G Breen reports consultancy and speaker fees from Eli Lilly and Illumina and grant funding from Eli Lilly. All other authors declare no financial interests or potential conflicts of interest.