Abstract
The associations between indices of brain structure and measured intelligence are not clear. In part, this is because the evidence to date comes from mostly small and heterogenous studies. Here, we report brain structure-intelligence associations on a large sample from the UK Biobank study. The overall N = 29,004, with N = 18,363 participants providing both brain MRI and cognitive data, and a minimum N = 7318 providing the MRI data alongside a complete four-test battery. Participants’ age range was 44-81 years (M = 63.13, SD = 7.48). A general factor of intelligence (g) was extracted from four varied cognitive tests, accounting for one third of the variance in the cognitive test scores. The association between (age-and sex-corrected) total brain volume and a latent factor of general intelligence is r = 0.275, 95% C.I. = [0.252, 0.299]. A model that incorporated multiple global measures of grey and white matter macro-and microstructure accounted for more than double the g variance in older participants compared to those in middle-age (13.4% and 5.9%, respectively). There were no sex differences in the magnitude of associations between g and total brain volume or other global aspects of brain structure. The largest brain regional correlates of g were volumes of the insula, frontal, anterior/superior and medial temporal, posterior and paracingulate, lateral occipital cortices, thalamic volume, and the white matter microstructure of thalamic and association fibres, and of the forceps minor.
1. Introduction
The association between brain volume and intelligence has been one of the most regularly-studied—though still controversial—questions in cognitive neuroscience research. The conclusion of multiple previous meta-analyses is that the relation between these two quantities is positive and highly replicable, though modest (McDaniel, 2005; Gignac & Bates, 2017; Pietschnig, Penke, Wicherts, Zeiler, & Voracek, 2015), yet its magnitude remains the subject of debate. The most recent meta-analysis, which included a total sample size of 8,036 participants with measures of both brain volume and intelligence, estimated the correlation at r = 0.24 (Pietschnig et al., 2015). A more recent re-analysis of the meta-analytic data, only including healthy adult samples (N = 1,758), found a correlation of r = 0.31 (Gignac & Bates, 2017). Furthermore, the correlation increased as a function of intelligence measurement quality: studies with better-quality intelligence tests—for instance, those including multiple measures and a longer testing time—tended to produce even higher correlations with brain volume (up to 0.39). In a meta-analysis, issues of cross-cohort heterogeneity might have an important bearing on the magnitude of the correlation.
Here, we report an analysis of data from a large, single sample with high-quality MRI measurements and four diverse cognitive tests. We use latent variable modelling to create a general intelligence (‘g’) factor from the cognitive test and estimate its association with both total brain volume and several more fine-grain imaging-derived indices of brain structure. We judge that the large N, study homogeneity, and diversity of cognitive tests relative to previous large scale analyses provides important new evidence on the size of the brain structure-intelligence correlation. By investigating the relations between general intelligence and characteristics of many specific regions and subregions of the brain in this large single sample, we substantially exceed the scope of previous meta-analytic work in this area.
There is considerable debate about what the association between brain size and general intelligence means. It is unclear, for example, whether brain size is a direct proxy for neuron number (discussed in Pietschnig et al., 2015). There is also an apparent paradox that there are substantial sex differences in total brain volume (on the order of 1.41 standard deviations; Ritchie et al., 2018) but no sex differences in mean intelligence (Johnson et al., 2009). More recent work indicates that multiple brain properties might be required to better explain individual differences in general intelligence, and some of these might be compensatory for differences in overall brain size (Luders et al., 2004; Deary et al., 2010; Kievit et al., 2012, 2014; Ritchie et al., 2015). For example, in an older cohort with a narrow age range (N = 672), Ritchie et al. (2015) found that incorporating multiple global, but tissue-specific, brain MRI measures (including tissue volumes, measures of white matter microstructure, and hallmarks of brain ageing) accounted for up to 21% of the variance in general intelligence, which was substantially higher than could be accounted for by total brain size alone (∼12%).
Such results, combined with indications that there is regional heterogeneity in the magnitude of intelligence associations across both grey and white matter (Jung & Haier., 2007; Deary et al., 2010; Basten et al., 2015; Karama et al., 2011; Cox et al., 2018; Ryman et al., 2016), extend the focus beyond a single, well-replicated proxy (total brain volume) and toward tissue-(and region-) specific associations with general intelligence. One of the most influential accounts of the neurobiological underpinnings of general intelligence (also known as general intelligence or “g”) has been the Parieto-Frontal Integration Theory (P-FIT; Jung & Haier, 2007). The P-FIT was initially based on a synthesis of disparate structural and functional brain imaging results. However, none of the Brodmann regions implicated in intelligence were supported by more than 60% of the studies reviewed, which the authors pointed out might be considered a relatively weak consensus.
The P-FIT model implicates, the following regions as being associated with intelligence differences: the lateral frontal, superior temporal, medial temporal, parietal and extrastriate (lateral occipital) regions, along with the white matter tracts that connect them. Specific reference was originally made to the arcuate fasciculus; this pathway is variously described as being just adjacent to the superior longitudinal fasciculus, or as one of the components thereof (Kamali et al., 2014; Dick and Tremblay, 2012). Together, these form a ‘dorsal stream’ of anterior-posterior cortical connectivity. Alongside other fibres such as the inferior longitudinal, inferior fronto-occipital, uncinate, and cingulum fasciculi, these ‘association’ fibres—along with the genu of the corpus callosum (forceps minor)—facilitate connectivity across the distal cortical regions highlighted by the P-FIT model. The model has generally received support from subsequent work (Deary et al., 2010; Basten et al., 2015; Karama et al., 2011; Cox et al., 2018; Ryman et al., 2016).
As with the meta-analyses on brain volume and intelligence described above (Pietschnig et al., 2015, Gignac and Bates, 2017), the broad heterogeneity of studies on the P-FIT might produce a less precise picture of the brain basis of cognitive abilities. Further evaluation of the model would greatly benefit from large-sample research that investigates the grey and white matter components of this putative intelligence framework, together in the same analysis. We conduct that analysis in the present study.
We capitalize on data from the UK Biobank study, a large-scale biomedical study of health and wellbeing, which includes brain MRI and various measures of cognitive ability. The UK Biobank participants have completed various cognitive measures; originally, they were administered a battery of bespoke tests with relatively poor reliability (Lyall et al., 2016). Using an earlier data release (Ritchie et al., 2018), we previously estimated the correlation between brain size and one of those tests, “Fluid Intelligence” (which we refer to as Verbal-Numerical Reasoning) to be r = 0.177. We found that the correlation did not differ by sex. Another study using an earlier release of UK Biobank imaging data examined the association between Verbal-Numerical Reasoning and brain size, reporting a correlation of r = 0.19 (N = 13,608; Nave et al., 2018). In addition, analyses of regional white and 10 grey matter measures have been reported with respect to Verbal-Numerical Reasoning in an earlier UK Biobank release; however, the authors of that study cited several reasons to doubt that this test, in isolation, is a valid indicator of fluid cognitive ability (Kievit et al., 2018; see also Hagenaars et al., 2016).
In this pre-registered study, we use a newer subset of UK Biobank participants who have completed an enhanced cognitive assessment battery at their brain imaging assessment. The overlap of the complete cognitive battery and the various MRI measures ranges from N = 8165 to 7318 following exclusions that are described below. Their data have only recently been released, and have not previously been analysed by our team. The enhanced cognitive battery includes three new measures based on standardised cognitive tests: Symbol-Digit Substitution, Matrix Reasoning, and Trail-Making. These three tests, combined with the previous Verbal-Numerical Reasoning measure, allows the estimation of brain imaging associations with a latent factor of general intelligence (g), that arguably gives coverage of the cognitive domains of reasoning, processing speed, working memory, and executive function. In a large sample size, the current study design thus: results in a better-quality cognitive measure than was previously possible in the UK Biobank data; mitigates variability in the administration and measurement of cognitive and brain imaging constructs (potentially allowing for stronger brain-intelligence correlations; Gignac & Bates, 2017); and, given the detailed brain imaging measures available, facilitates a detailed estimate of the global and regional brain correlates of latent general intelligence.
Our analyses followed a preregistered protocol and 4 hypotheses (https://osf.io/w7evd/). First, we tested whether the four cognitive tests were correlated moderately-highly (r > 0.40), and formed a latent general factor that explains 40% or higher of the variance across tests. Next, we examined the association between this cognitive factor and total brain volume, and we hypothesised that there would be no significant sex difference in the size of the brain-cognitive correlation for any of the models. We then hypothesised that different global measures of grey and white matter would each account for significant unique variance in g. We then aimed to test associations between general intelligence and brain grey and white matter regional measures, hypothesising that the strongest associations would concur with regions implicated by the Parieto-Frontal Integration Theory of intelligence (P-FIT; Jung & Haier, 2007).
2. Methods
The UK Biobank study is a large-scale biomedical study of health and wellbeing, which includes brain MRI and measures of cognitive function (Sudlow et al., 2015). Cognitive tests and brain imaging data were acquired on the same assessment day. The tests used here were administered at the UK Biobank brain imaging assessment. The imaging assessment took place at 3 different assessment centres. The majority were in Manchester, with more recent appointments now also taking place in Newcastle, and most recently in Reading. Cognitive tests were administered to participants working independently on a touchscreen computer with no tester observing. MRI data was acquired at the three sites using the same hardware and software. The current data release from UK Biobank initially included 30,316 participants who attended the scanning appointment, i.e. they had a record for age at scanning. Following exclusions, the total N = 29,004. The minimal N with complete cognitive-MRI overlap was N = 7318; further information is provided in Statistical Analysis, Table 1 and Figures 1 and S1. UK Biobank Field IDs are listed in Table S1. Most of them had scores for the Verbal-Numerical Reasoning test (always administered at the MRI appointment). Many fewer had the more recently-introduced cognitive tests (Matrix Pattern Completion, Symbol-Digit and Trail Making Test Part B). Missing MRI data was due to the lag between MRI acquisition and its subsequent processing for release by the UK Biobank. All data and materials are available via UK Biobank (http://www.ukbiobank.ac.uk).
2.1 Cognitive Tests
The four cognitive tests used in the current study were: Matrix Reasoning, Symbol-Digit Substitution, Verbal-Numerical Reasoning, and Trail-Making Test. Specifically, for the Trail-Making Test, we used part B, since this test includes both elements of speed and executive functioning (Salthouse, 2011).
Matrix Pattern Completion
The UK Biobank Matrix Reasoning test is an adapted version of the Matrices test in the COGNITO battery (Ritchie et al., 2014). This test of non-verbal fluid reasoning requires participants to inspect a grid pattern with a piece missing in the lower right-hand corner and select which of the multiple choice options at the bottom of the screen completes the pattern both horizontally and vertically. This 15-item test assesses participants’ ability to problem solve using novel and abstract materials. The score is the number of correctly answered questions in three minutes.
Symbol-Digit Substitution
was used as a measure of processing speed. It is similar in format to the Symbol Digit Modalities Test (Smith, 1991), which is a well-validated measure of processing speed. At the top of the screen, participants were shown a key pairing shapes with numbers. Beneath the key were rows of shapes with an empty box under each shape. Using the key, participants had 60 seconds to enter the number in the empty boxes that are paired with the shapes. Participants were instructed to work as quickly and as accurately as possible. The score is the number of correct symbol-digit matches made in 60 seconds.
Verbal-Numerical Reasoning (referred to as “Fluid Intelligence” in UK Biobank)
Participants were presented with 13 multiple choice questions assessing verbal and numerical reasoning abilities. The score is the number of questions answered correctly in 2 minutes.
Trail-Making Test Part B
This test is a computerised version of the Halstead-Reitan Trail-Making Test (Reitan & Wolfson, 1985). It is often said to be an assessment of executive function. In part B, participants were presented with the numbers 1-13, and the letters A-L arranged quasi-randomly on a computer screen. The participants were instructed to switch between touching the numbers in sequential order, and the letters in alphabetical order (e.g., 1-A-2-B-3-C) as quickly as possible. The score is the time (in deci-seconds) taken to successfully complete the test. Those with a score coded as 0 (denoting “Trail not completed”) had their score set to missing.
2.2 Brain Imaging Acquisition and Analysis
All brain MRI data were acquired on a Siemens Skyra 3T scanner with a standard Siemens 32-channel head coil, in accordance with the open-access protocol (http://www.fmrib.ox.ac.uk/ukbiobank/protocol/V4_23092014.pdf), documentation (http://biobank.ctsu.ox.ac.uk/crystal/docs/brain_mri.pdf), and publication (Alfaro-Almagro et al., 2018). T1-weighted MPRAGE data was acquired in the sagittal plane at 1mm isotropic resolution; the T2-weighted FLAIR acquisition at 1.05 × 1 × 1 mm resolution, was also acquired in the sagittal plane. The diffusion MRI (dMRI) data was acquired using a spin-echo echo-planar sequence with 10 T2-weighted (b ≈ 0 s mm2) baseline volumes, 50 b = 1000 s mm-2 and 50 b = 2000 s mm-2 diffusion-weighted volumes, with 100 distinct diffusion-encoding directions and 2 mm isotropic voxels. We used global and regional brain Imaging Derived Phenotypes (IDPs) provided by the UK Biobank brain imaging team: total brain volume (TBV, which is the sum of grey and white matter and excludes cerebrospinal fluid), grey matter volume (GM), and white matter volume (WM) from FSL FAST (Zhang et al., 2001), 14 subcortical volumes using FSL FIRST (Patenaude et al., 2011) and white matter hyperintensity volume (WMH) using BIANCA (Griffanti et al., 2016), which uses both T1-weighted and T2-weighted volumes. We estimated normal-appearing white matter volume (NAWMV) as the difference between total WMV and WMHV. Regional brain information was also available as UK Biobank IDPs in the form of tract-averaged fractional anisotropy and mean diffusivity for each of 27 white matter tracts using AutoPtx (de Groot et al., 2013), and as individual grey matter cortical segmentations, derived using FSL FAST, using the Harvard-Oxford Atlas (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases). These white matter tracts and cortical regions are shown in Figure 2.
2.3 Statistical Analyses
Outliers (+/-4SDs) were removed from brain and cognitive measures. All individuals who reported any of the neurological or neurodegenerative health conditions listed in Supplementary Material (Table S1) were removed prior to analysis. All participants were of White European ancestry. Of an initial 30,316 of the UK Biobank participants who had a record of age at attending the imaging assessment, 29,004 provided data on at least one of the primary variables of interest (global brain imaging or cognitive) following exclusions, and 18,363 had at least one cognitive test and MRI data. All structural equation modelling (SEM) analyses were conducted using Full Information Maximum Likelihood (FIML) estimation within R, using the lavaan package for SEM (Rosseel, 2012). Throughout, indicators were corrected for age and sex, with the MRI variables also corrected for scanner head positioning confounds (X, Y and Z coordinates provided by the UK Biobank team: UKB IDs: 25756, 25757, 25758). In contrast to our pre-registration, covariates (age, sex, MRI confounds) were applied to manifest variables within SEMs (i.e. we did not need to residualise data outside the model to enable model convergence and fit). FIML takes advantage of all available data, including data from participants who are missing data on some of the dependent variables. Model fits were assessed with a chi-squared test, Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the Standardized Root Mean Square Residual (SRMR).
2.3.1 Estimating a latent general factor of general intelligence, ‘g’
We performed a confirmatory factor analysis (CFA) of the 4 cognitive tests: Symbol-Digit Substitution, Matrix Reasoning, Trail-Making Test Part B, and Verbal-Numerical Reasoning. We hypothesised that the four tests would correlate moderately-highly (with intercorrelations of r > .40), and would form a single latent general factor explaining ∼40% of the variance across the 4 tests, with good fit to the data (CFI and TLI > 0.95, SRMR and RMSEA <0.05). We ran a version without, and then with age and sex correction at the manifest level. Since principal components analyses (PCAs) are commonly also used in intelligence research (e.g. Nave et al., 2018), but do not separate common and test-specific variance, we also provide a PCA estimate of g (tests not corrected for age and sex) using the first unrotated principal component, for comparison with the CFA.
2.3.2 Associations between general intelligence (g) and global brain MRI measures
Next, we estimated the association of the latent general intelligence factor (‘g’) with total brain volume, and then 6 global measures of grey and white matter: grey matter, normal-appearing white matter and white matter hyperintensity volumes (TBV, GM, NAWM, WMH), and general factors of fractional anisotropy and mean diffusivity (gFA and gMD). The general factors of white matter microstructure were formed from diffusion indices for white matter pathways of interest, which were extracted from confirmatory factor analysis, as previously described (Cox et al., 2016). We tested each individual brain-g association, i.e. we fitted a separate SEM for each brain MRI measure. We then fitted a single SEM in which all indicators contributed to g variance; a so-called Multiple Indicators Multiple Causes (MIMIC; Muthen, 1989) model. We excluded TBV from this, to avoid model fit and theoretical part-whole issues. Significant correlated residual paths among the imaging variables estimated from modification indices were included. False-Discovery Rate (FDR; Benjamini and Hochberg, 1995) correction of the p-values (implemented using the p.adjust function in R) was applied across the six bivariate associations of interest, and then across each of the path estimates in the multivariate SEM. Manifest variables were corrected as described above.
We then conducted an additional—non-pre-registered—analysis, to investigate whether the substantially lower proportion of g variance accounted for by multiple global MRI measures in this sample, when compared to our prior work in an older cohort (Ritchie et al., 2015), was due to moderating effects of age. We split the sample by age into groups with equal-sized cognitive-MRI overlap (middle age N = 10,164; older age N = 10,166) to ensure no imbalance in statistical power (above and below age 63.29 years). Initially, we tested for measurement invariance of g between the two age groups. Specifically, we were interested in weak factorial invariance (equal factor loadings) rather than strong (equal loadings and intercepts; as defined by Widaman, Ferrer & Conger, 2010), given the expected difference in cognitive performance between middle and older participants. We did this by comparing two multi-group SEMs; in the first, the cognitive test loadings on g were freely estimated, whereas in the second they were constrained to be equal in the middle and older-aged groups. We used a chi-squared test, the Akaike Information Criterion (AIC), and the sample-adjusted Bayesian Information Criterion (saBIC), and an additional check of factor congruence (coefficient of factor congruence; Lorenzo-Seva & ten Berge, 2006) using the ‘psych’ package, to test whether there was a difference between these sub-models.
Congruence coefficients index the similarity between factor solutions; a congruence coefficient greater than .90 indicates an extremely high level of similarity of the factor solutions. Next, we used a different set of two multi-group models to test whether the associations between g and the MRI measures differed as a function of age. In the first, the g-brain associations were freely estimated, and in the second, they were constrained to be equal between the two age groups. All measures were corrected for sex, and the MRI measures were also corrected for MRI head position. The group g loadings were set to equality. We ran this test for both a single g-TBV association, and then where multiple global MRI measures (GM, NAWM, WMH, gFA, gMD) predicted g. Differences between the two multi-group SEMs were assessed with a chi-squared test, the AIC, and the saBIC.
2.3.3 Sex differences in g-brain MRI associations
We then investigated sex differences in the size of the brain-cognitive associations. Before doing so, we tested for measurement invariance between the sexes by creating a multi-group SEM including just the cognitive tests, with sex as the grouping variable, and tested for strong measurement invariance (as defined by Widaman, Ferrer & Conger, 2010). If strong measurement invariance was found (i.e., the model with strong invariance does not fit significantly more poorly, by a chi-squared test, the AIC, and the saBIC, than one where factor loadings and intercepts are freely-estimated), we aimed to test a set of models where the brain-cognitive associations was fixed to equality across the two sub-models grouped by sex, and one where it was freely-estimated. We used a chi-squared test, the AIC, and the saBIC to test whether there was a difference between these sub-models (thus indicating that there is a sex difference). For these analyses, the variables were adjusted for all the above-mentioned covariates except sex.
2.3.4 Associations between g and regional brain MRI measures
Finally, we examined associations between g and regional brain measures: i) the fractional anisotropy and mean diffusivity in 27 white matter pathways, ii) cortical volumes of 48 regions according to the Harvard-Oxford cortical atlas segmentations, and iii) 14 subcortical volumes (bilateral nucleus accumbens, amygdala, caudate, hippocampus, pallidum, putamen, thalamus). We applied FDR correction within each family of tests (across all cortical tests, and separately across the 27 tests of WM tracts for FA, and then for MD, and across all subcortical tests). We hypothesised that the associations between general intelligence and brain volumes across the cortex would be consistent with the Parieto-Frontal Integration Theory (P-FIT; Jung & Haier, 2007), and be strongest in lateral frontal, superior parietal and temporal regions. Likewise, we hypothesised that thalamic and association fibres, plus forceps minor will show the statistically largest associations with general intelligence. The additional use of subcortical volumes was an addition to our pre-registered plan; subcortical structure did not figure largely in the P-FIT (Jung & Haier., 2007), though more recent work has reported associations between intelligence and overall subcortical volume (Ritchie et al., 2015), caudate (Basten et al., 2015; Grazioplene et al., 2015; Rhein et al., 2014), hippocampal (Valdés Hernández et al., 2017) and thalamic volume (Bohlken et al., 2014).
3. Results
3.1 Estimating a latent general factor of general intelligence, ‘g’
Participant characteristics are shown in Table 1. The cognitive tests were all correlated with medium effect sizes according to Cohen (1992): the Pearson’s r range was |0.300 to 0.405|. A first principal component (without age and sex correction) accounted for 55% of the variance, with loadings ranging from |0.71 to 0.80| (Table S2). The confirmatory factor analysis, in which each indicator was not corrected for age and sex, had two fit indices (TLI and RMSEA) outside our pre-registered criteria (CFI = 0.973, TLI = 0.918, RMSEA = 0.078, SRMR = 0.024). Modification indices suggested the addition of a residual correlation between Verbal-Numerical Reasoning and Matrix Reasoning (r = 0.170), following which model fit was above our pre-registered threshold across all fit indices (CFI = 0.995, TLI = 0.969, RMSEA = 0.048, SRMR = 0.010). The general factor of cognitive ability accounted for 40% of the cognitive test score variance (standardised loadings were Matrix Reasoning = 0.550, Symbol-Digit = 0.626, Verbal-Numerical Reasoning = 0.532, Trail-Making part B = −0.794). When we corrected each cognitive test within the SEM for age and sex, keeping the abovementioned residual correlation, the model fit the data well (CFI = 0.998, TLI = 0.978, RMSEA = 0.030, SRMR = 0.004). The general factor of cognitive ability accounted for 32% of the cognitive test score variance (standardised loadings were Matrix Reasoning = 0.505, Symbol-Digit = 0.479, Verbal-Numerical Reasoning = 0.592, Trail-Making part B = −0.666).
3.2 Associations between general intelligence (g) and brain MRI measures
Results between g and global brain MRI measures are shown in Table 2 and Figure 3. SEM fit statistics are reported in Table S3, and residual correlations among the global brain tissue measures from the MIMIC model are reported in Table S4. In all models, the cognitive and MRI indicators were adjusted for age and sex, and the MRI indicators also adjusted for head positioning confounds. The latent factor of general intelligence was correlated with TBV at r = 0.275, p < 0.001. Associations with GM (r = 0.270) and NAWM (r = 0.254) were significantly larger than the other three tissue-specific measures (i.e., WMH, gFA, gMD; p for comparisons < 0.001). The associations of g with white matter hyperintensities, general fractional anisotropy, and general mean diffusivity had effect sizes (r) ranging from |0.060 to 0.122|.
In a multivariate SEM (MIMIC model), we found that all MRI measures (GM, NAWM, WMH, gFA, gMD) accounted for 6.54% of the variance in g. The unique contributions to this variance were largest for grey matter volume (β = 0.181) with equivalent contributions of NAWM and WMH volumes (β = 0.128 and β = −0.121, respectively). Neither measure of white matter microstructure made significant unique contributions beyond this, following FDR correction (gFA β = −0.014, p = 0.403; and gMD β = −0.038, p = 0.035).
Ritchie et al. (2015) previously reported that the total variance in g explained by multiple MRI markers in an older cohort (all aged approximately 73 years) was 18-21%. To determine whether the difference between that estimate and the one reported here may be attributable to differences in the age range of the samples, we conducted a post-hoc supplementary test of differences in the proportion of variance explained by age. Initially, we tested whether g exhibited weak measurement invariance across the two age groups. Both models had excellent fit to the data, and were highly similar: saBIC indicated that weak invariance was preferred, contradicting the AIC results and the small but significant difference detected by the chi-squared test (Δχ2(3) = 27.617, p = <0.001, ΔAIC = 22, ΔsaBIC = −6.32). Comparing the magnitude and rank order of the freely estimated factor loadings between middle-aged (MR = 0.506, SDS = 0.539, VNR = 0.614, TMTb = −0.719) and older (MR = 0.522, SDS = 0.492, VNR = 0.569, TMTb = −0.701) participants also suggested that g exhibited weak factorial invariance between groups (coefficient of factor congruence = 1.00).
We then investigated whether the proportion of g variance accounted for by MRI measures was substantially greater in older than middle-aged participants. Results are shown in Table 3. A model with unconstrained g-MRI associations fitted the data significantly better than when the associations were constrained to equality between age groups (Δχ2(7) = 188.02, p = <0.001, ΔAIC = 174, ΔsaBIC = 138). In the model in which g-MRI associations were allowed to differ by age group, GM, NAWM, WMH, gFA and gMD together explained a total of 5.9% of the variance in g among the middle aged group, compared to 13.4% in older age. g-brain association magnitudes were all stronger in older age for GM (0.133 versus 0.293), WMH (−0.114 versus −0.149), gFA (0.032 versus −0.089) and gMD (0.001 versus −0.115); the one exception was NAWM volume (which showed stronger g associations in middle than older age; 0.165 versus 0.066). Notably, the global dMRI measures (gFA and gMD) were non-significant in the middle-age group. Moreover, we did not observe this significant age difference in variance explained in g by TBV alone, or by each individual MRI measure in isolation (all p-values for chi-squared tests were non-significant following FDR correction, with the exception of GM, which was more strongly associated with g in the older group, 0.266 versus 0.323; Δχ2(1) = 13.095, p < 0.001).
In summary, whereas the individual associations between brain measures and intelligence were of relatively similar magnitudes in middle and older age, when entered together they explained more than double the variance in g in older age. Inspection of the age differences in the correlational structure among these imaging markers (Figure S2) indicated that the source of increased variance explained is not likely to be attributable to their diverging collinearity (i.e. they overlap less in older age, and thus convey more unique information), given that the only notable differences were in associations between WMH and both gFA and gMD, which were stronger in older than younger age.
3.3 Sex differences in g-brain MRI associations
Before testing for sex differences in the size of the associations between g and brain MRI measures, we tested for measurement invariance between the sexes. We found that the model of strong factorial invariance of g did not fit more poorly than the model in which factor loadings and intercepts were freely-estimated (ΔAIC = 28, ΔsaBIC = −22, Δχ2 (4) = 40.027, p = <0.001). These results are reported in Table S5. We then tested for sex differences in the magnitude of associations between g and MRI measures. We did so by comparing two group models; one in which the brain-g association is fixed to equality between sexes, and the other in which it is freely estimated. We found that there were no significant sex differences in the magnitude of the association between g and any global brain MRI measure (p ≥ 0.171) except for NAWM (p = 0.011, standardised estimate for females = 0.185, males = 0.241), which was non-significant following FDR correction.
3.4 Associations between g and white matter microstructure
SEMs testing associations between g and the FA and MD of each white matter tract fitted the data well (all CFI ≥ 0.991, TLI ≥ 0.980, RMSEA ≤ 0.020, SRMR ≤ 0.016); results are shown in Figure 4, and Tables S6 and S7. Associations with g were in the expected direction, such that higher g was related to higher FA and lower MD (with the exception of the left medial lemniscus β = 0.027, FDR q = 0.018). Only a few pathways had non-significant associations with g (FA and MD in the bilateral acoustic radiations, FA in the middle cerebellar peduncle, and MD in the right parahippocampal cingulum, Forceps Major, bilateral inferior longitudinal fasciculus and left medial lemniscus). The effect sizes were not homogeneous across tracts (FA range = 0.001 to 0.112; MD range = −0.101 to 0.027). Consistent with our hypothesis, the magnitude of associations with g were numerically largest within thalamic pathways (FA mean = 0.080, MD mean = −0.081), and in association fibres and Forceps Minor (FA mean = 0.062, MD mean = −0.036) than within projection fibres and Forceps Major (FA mean = 0.033, MD mean = 0.009)1. However, it is also notable that both aspects of the cingulum bundle, showed among the weakest g relationship among association fibres, and that more generally there was a considerable amount of overlap between these classes of tract (for example, the right corticospinal tract MD was associated with g at levels comparable with most association fibres).
3.5 Associations between g and cortical regions
Associations between g and cortical regional volumes were all positive and all significant following FDR correction. The results are reported in Figure 5 and in Table S8; all models fitted the data well (CFI ≥ 0.990, TLI ≥ 0.979, RMSEA ≤ 0.021, SRMR ≤ 0.016). As with the white matter analyses above, there was regional heterogeneity in association magnitudes across the cortical surface. Substantial portions of the frontal lobe (frontal pole, frontal orbital, subcallosal) were among the numerically largest associations, bilaterally (range = 0.161 to 0.212), and these were significantly larger than other frontal regions (p < 0.001). Associations between the insula cortex and g (left = 0.186, right = 0.197) were also large compared to the average magnitude across all ROIs (M = 0.116, SD = 0.036). Associations within the temporal lobe were strongest in the temporal pole and anterior portions of the superior temporal gyrus (range = 0.140 to 0.160), and in more anterior portions of the parahippocampal and fusiform gyri (range = 0.148 to 0.154). Notably, the temporal lobe appeared to show a gradient of anterior > posterior for both lateral and medial portions, and the lateral surface also showed evidence of a superior > inferior gradient. Compared to the above-mentioned frontal, temporal and insula volumes, parietal regions were consistently and significantly more weakly associated with g (range = 0.066 to 0.101, p < 0.001). With the exception of the lingual, precuneus, and lateral occipital cortex (range = 0.109 to 0.150), occipital volumes were among the most weakly associated with g (range = 0.066 to 0.092).
3.6 Associations between g and subcortical volumes
As with the cortical analyses, subcortical volumes were all positively associated with g, and all were significant following FDR correction. The results are reported in Table S9. All models fitted the data well (CFI ≥ 0.990, TLI ≥ 0.979, RMSEA ≤ 0.020, SRMR ≤ 0.014), and standardised estimates ranged from 0.050 to 0.277. Largest effect sizes were found for the Thalamus (left = 0.274, right = 0.277), which was significantly larger than for all other subcortical regions (p < 0.001). The Amygdala showed the weakest associations of all (left = 0.063, right = 0.050), whereas the remaining volumes (Accumbens, Caudate, Hippocampus, Pallidum and Putamen) showed comparable magnitudes (range = 0.090 to 0.146).
Discussion
In this sample of 29,004 middle and older aged participants, we found that the association between total brain volume and a latent factor of general intelligence was r = 0.275. The current single-cohort analysis was not affected by the confounds of cross-cohort heterogeneity in the protocol for intelligence and brain size measurement which have affected recent meta-analyses of this association (McDaniel, 2005; Rushton & Ankney, 2009; Pietschnig et al., 2015; Gignac & Bates, 2017. This estimate is equal to the mid-point between the meta-analytic effect size estimate from Pietschnig et al., (2015; r = 0.24) and the quality-corrected estimate of Gignac and Bates (2017; r = 0.31). It is also considerably larger than previous estimates using a single cognitive indicator (verbal numerical reasoning) in an earlier UK Biobank release (r = 0.19, N = 13,608; Nave et al., 2018; r = 0.177, N = 5,216; Ritchie et al., 2018), emphasising the utility of our latent variable approach, which was also informed by a larger sample. The fact that the association between g and TBV was not significantly different between sexes is in contrast the results reported by McDaniel (2005), but not with a larger, more recent meta-analysis (Pietschnig et al., 2015), and prior work in an earlier UK Biobank release using the VNR only (Ritchie et al., 2018), adding weight to the hypothesis that more specific brain characteristics compensate for the large brain size difference between males and females.
We also ascertained that the g-TBV association belies important heterogeneity at both the global tissue level, and at the regional level across cortex, subcortex and white matter. GM and NAWM were the strongest global tissue correlates of g, with WMH and microstructural measures showing weaker but significant associations in separate models. However, when modelled simultaneously in a MIMIC model, unique contributions of WMH and NAWM were near identical (GM still largest), whereas information about white matter microstructure apparently not carrying any unique information about individual differences in g. Together, these measures explained only 6.54% of g variance. This was substantially lower than prior estimates using similar global structural brain metrics (Ritchie et al., 2015) which explained as much as 21% g variance. Given that their participants were from an older cohort – all participants were born in 1936 and were approximately 73 years old at scanning – we conducted a post-hoc analysis to ascertain whether the brain measures would account for more variance in older than in younger participants. We found that these measures accounted for more than double the g variance in older participants compared to those in middle-age (13.4% and 5.9%, respectively); thus while still smaller than that accounted for in Ritchie et al (2015), this does support the notion that age moderates the relationship between general intelligence and multiple aspects of brain tissue structure. This age moderation pattern was only observed in a multi-predictor analysis that simultaneously included multiple MRI-based predictors and not in an analysis of total brain volume or individual MRI predictor alone.
First, these age differences stood in contrast to the apparent age invariance of the association between g and total brain volume (as found by Pietschnig et al., 2015). This supports the notion that total brain volume is a proxy for several other aspects of brain integrity whose variances are i) uniquely informative for cognitive function and ii) more informative than brain size alone. Moreover, total brain volume is likely to be an age-varying indicator of brain integrity, raising questions about the value of considering the brain size-intelligence relationship, in isolation, for furthering our mechanistic insight into the cerebral basis of intelligence. Overall, the age moderation pattern suggests that g may be less sensitive to the variance around ‘healthier’ / ‘younger’ averages (higher GM and NAWM volume, lower WMH volume, higher FA and lower MD).
In our analysis of regional brain correlates of intelligence, the cortical and grey matter associations were stronger than for the regional white matter microstructural parameters, on average, though association magnitudes were all of small effect size (Cohen, 1992). Nearly all regional measures were significant following FDR correction, and our findings of regional heterogeneity of g associations across the tissues of the brain were partly consistent with our hypotheses based on the P-FIT (Jung & Haier, 2007). Specifically, we hypothesised that g would be most strongly positively associated with lateral frontal, superior parietal and temporal cortical volumes, and show stronger (positive for FA, negative for MD) relationships with g in thalamic and association fibres, plus forceps minor. In accordance with this, we found relatively stronger associations in regions such as the frontal pole, dorsolateral frontal cortex, paracingulate, anterior aspects of both lateral and medial temporal lobes, and lateral occipital cortex. However, the comparatively weaker associations in inferior frontal, anterior cingulate, and superior parietal / angular / supramarginal areas were less consistent with P-FIT. Moreover, medial frontal regions (orbitofrontal and subcallosal), central and precentral gyri were among the strongest associations here, but were not explicitly implicated in prior reviews (Jung & Haier, 2007; Basten et al., 2015), and we also found associations with the insula and precuneus / posterior cingulate volumes which were only more recently implicated in general intelligence (Basten et al., 2015), and concurs with more recent insights into the dense and wide-ranging connectivity profile of the insula (Nomi et al., 2018). With reference to the white matter pathways, magnitudes were consistently smaller than for cortical regional volumes, but were strongest among thalamic and most association pathways, along with the forceps minor, which facilitate connectivity across many of the distal cortical regions highlighted by the P-FIT model.
Finally, we opted to include subcortical volumes in a post-hoc (non-pre-registered) analysis. Consistent with prior reports (Basten et al., 2015; Grazioplene et al., 2015; Rhein et al., 2014), we found significant bilateral associations with the caudate, though these were not significantly larger than the magnitudes found for the majority of subcortical structures. In fact, thalamic volume was substantially more strongly related to general intelligence (≥1.5 times as large) than any other subcortical structure (r for left and right = 0.255 and 0.251). This finding is in line with the highly complex connectivity profile of the thalamus, whose various nuclei share connections across much of the cortex (including prefrontal and hippocampal pathways; Behrens et al., 2003; Aggleton et al., 2010), its role in orchestrating cortical activity as well as an information relay (Rikhye et al., 2018), and a prior report of its phenotypic and genetic associations with intelligence (Bohlken et al., 2014). It is also consistent with previously-reported associations of the thalamus and its radiations with ageing, and to potential determinants thereof, such as vascular risk (Cox et al., 2016; Cox et al., 2019). However, it is also notable that the association between g and all subcortical structures, though not as large as for the Thalamus, were still comparable or larger than those exhibited by white matter microstructural measures.
The study has several limitations. The information reported here is correlational in nature, and though it describes what intelligent brains look like (insofar as these are some of the axes along which brains differ as a function of intelligence), it cannot directly differentiate between regions that are and are not required to support the cognitive processes subsumed beneath the umbrella of g. Neverthelss, it continues to be of interest and value to robustly quantify how and where brain structure and intelligence are associated; along with longitudinal data, lesion studies and other methodologies, such studies will help to triangulate the contributions that brain regions play in giving rise to individual differences in g. The study sample is also range restricted in three respects. First, they are members of a voluntary research study (and UK Biobank is known to be range restricted in some ways compared to the general population; Fry et al., 2017), Second, we know that the brain imaging subset of UK Biobank participants tends to live in less deprived areas (Lyall et al., 2019), and third, the age range of the sample is restricted to middle and older age. These sample limitations may affect the generalisability of our results with respect to the total brain, global tissue or regional results, and the pattern of age moderation; these whole-life-course patterns could more optimally be addressed in a large scale multi-cohort mega-analytic framework. Though the reliability of the more recent cognitive tests that we used here is not known, we note that the loadings and proportion of variance explained would be very unlikely to occur if they were unreliable tests, and that these compare favourably with the correlational structure of other UK Biobank cognitive tests where test-retest reliability is known to be low (Lyall et al., 2014). Moreover, the tests selected here were based on well-validated cognitive tests, and a paper covering their design and reporting results of a validation study of this enhanced cognitive battery in UK Biobank is the subject of ongoing work by the authors (CFR and IJD). Finally, it could be argued that the brain imaging methods might limit the fidelity with which we can measure the regional specificity of g associations across the brain. The 27 major pathways have the advantage of being well characterised and aid consistent identification across subjects, but they do not allow a direct measure of the WM connectivity between two cortical or subcortical sites of the brain, which would allow for a more precise and stringent test of g associations with the WM pathways underlying the P-FIT, as well as a less biased set of pathways (for example, the current dataset has more information on thalamic connectivity than on other subcortical pathways). Similarly, the cortical parcellation used here was one of convenience and does not correspond directly onto the Brodmann Areas used by Jung and Haier (2007), which makes mapping the current findings onto prior hypotheses opaque. For example, whereas the Harvard-Oxford atlas includes a paracingulate region (Brodmann Area 32), this additional cortical fold is not always present, and thus is perhaps more usefully referred to as “superior medial” cortex (e.g. see Cox et al., 2014). Likewise, frontal pole region subsumes a large portion of the frontal lobe compared to that described by Brodmann; though this concordance issue is well-known, and there is no straightforward solution (Bohland et al., 2009; Cox et al., 2014), it is important to interpret the results with these limitations in mind.
In conclusion, this preregistered study provides a large single sample analysis of the global and regional brain correlates of a latent factor of general intelligence. Our study design avoids issues of publication bias and inconsistent cognitive measurement to which meta-analyses are susceptible, and also provides a latent measure of intelligence which compares favourably with previous single-indicator studies of this type. We estimate the correlation between total brain volume and intelligence to be r = 0.275, which applies to both males and females. Multiple global tissue measures account for around double the variance in g in older participants, relative to those in middle age. Finally, we find that associations with intelligence were strongest in frontal, insula, anterior and medial temporal, lateral occipital and paracingulate cortices, alongside subcortical volumes (especially the thalamus) and the microstructure of the thalamic radiations, association pathways and forceps minor.
Acknowledgements
We thank the UK Biobank participants and the UK Biobank team for their work in collecting, processing and disseminating these data for analysis. We also thank David C. Liewald for data and technical computing support. This research was conducted, using the UK Biobank Resource under approved project 10279, in The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology (CCACE) (http://www.ccace.ed.ac.uk), part of the cross-council Lifelong Health and Wellbeing Initiative (MR/K026992/1). Funding from the Biotechnology and Biological Sciences Research Council (BBSRC) and Medical Research Council (MRC) is gratefully acknowledged. SRC and IJD were supported by MRC grants MR/M013111/1 and MR/R024065/1. IJD is additionally supported by the Dementias Platform UK (MR/L015382/1), and he, SRC and SJR by the Age UK-funded Disconnected Mind project (http://www.disconnectedmind.ed.ac.uk). SRC, SJR, IJD and EMT-D were supported by a National Institutes of Health (NIH) research grant R01AG054628. EMT-D was also supported by National Institutes of Health (NIH) research grant R01HD083613, and is a member of the Population Research Center at the University of Texas at Austin, which is supported by NIH center grant P2CHD042849. CF-R is supported by Dementias Platform UK (DPUK), funded through the MRC (MR/L023784/2).
Footnotes
↵1 We note that the splitting of the commissural fibres in this way was a pre-registered hypothesis.