Abstract
Previous research points to the heritability of risk-taking behaviour. However, evidence on how genetic dispositions are translated into risky behaviour is scarce. Here, we report a genetically-informed neuroimaging study of real-world risky behaviour across the domains of drinking, smoking, driving, and sexual behaviour, in a European sample from the UK Biobank (N= 12,675). We find negative associations between risky behaviour and grey matter volume (GMV) in distinct brain regions, including amygdala, ventral striatum, hypothalamus, and dorsolateral prefrontal cortex (dlPFC). These effects replicate in an independent sample recruited from the same population (N=13,004). Polygenic risk scores for risky behaviour, derived from a genome-wide association study in an independent sample (N=297,025), are inversely associated with GMV in dlPFC, putamen, and hypothalamus. This relation mediates ~2.2% of the association between genes and behaviour. Our results highlight distinct heritable neuroanatomical features as manifestations of the genetic propensity for risk taking.
One Sentence Summary Risky behaviour and its genetic associations are linked to less grey matter volume in distinct brain regions.
Main
Taking risks—an essential element of many human experiences and achievements—requires balancing uncertain positive and negative outcomes. For instance, exploration, innovation and entrepreneurship can yield great benefits, but are also prone to failure1. Conversely, excessive risk-taking in markets can have enormous societal costs, such as speculative price bubbles2. Similarly, common behaviours such as smoking, drinking, sexual promiscuity, or speeding are considered rewarding by many but might expose individuals and those around them to deleterious health, social, and financial consequences. In 2010, the combined economic burden in the United States of these risky behaviours was estimated to be about $593.3 billion3–6. Although previous findings point to the partial heritability of risk tolerance and risky behaviours7 and neuroanatomical measures exhibit high heritability8,9, little is known about the brain features involved in translating genetic dispositions into risky behavioural phenotypes8.
Recent research using structural brain-imaging data from small, nonrepresentative samples (comprising up to a few hundred participants) has identified several neuroanatomical associations with risk tolerance10–12. However, this literature is limited by low statistical power13,14, and the generalizability of their findings to other populations is questionable. Small sample sizes have also limited the ability to control systematically for many factors that could confound observed relations between brain features and risky behaviour, such as height15 and genetic population structure16,17. Moreover, despite evidence that the effects of genetic factors are likely mediated by their influence on the brain and its development7,18, neuroscientific and genetic approaches to understanding the biology of risky behaviour have largely proceeded in isolation–perhaps due to the lack of large study samples that include both genetic and brain imaging measures.
Here, we utilize data obtained in a prospective epidemiological study of ~500,000 individuals aged 40 to 69 years [the UK Biobank (UKB)19,20] to carry out a pre-registered investigation (https://osf.io/qkp4g/, see Supplementary Methods for deviations from the analysis plan) of the relationship between individual differences in brain anatomy and the propensity to engage in risky behaviour across four domains (N= 12,675). We replicate our findings in an independent sample recruited from the same population (N=13,004). Further, we isolate specific differences in brain anatomy that are linked to the genetic disposition for risky behaviour— quantified via polygenic risk scores (PRS) derived from a genome-wide association study (GWAS) in an independent sample (N = 297,025)—and investigate how these neuroanatomical endophenotypes mediate the influence of genetics on the behavioural phenotype.
Results
Grey Matter Volume Associations with Risky Behaviour
Akin to a previous investigation7, we construct a measure of risky behaviour by extracting the first principal component from four self-reported measures of drinking, smoking, speeding on motorways, and sexual promiscuity (N = 315,855; see Figure 1A, Supplementary Methods 1.1, Supplementary Figures 1-2 and Supplementary Tables 1-2 for descriptive statistics). This measure of risky behaviour is genetically correlated with many other traits, including cannabis use (rg = 0.72, SE = 0.02), general risk tolerance (rg = 0.56, SE = 0.02), self-employment (rg = 0.52; SE = 0.30), suicide attempt (rg = 0.47, SE = 0.07), antisocial behaviour (rg = 0.45, SE = 0.14), extraversion (rg = 0.34, SE = 0.04), and age at first sexual experience (rg = −0.54, SE = 0.02) (Supplementary Methods 1.5 and Supplementary Table 3). Thus, our measure is partly rooted in genetic differences between people and relates to a broad range of relevant events and behaviours.
Our main analysis includes a sample of 12,675 European-ancestry participants from the UKB. We first regress our measure of risky behaviour on total (whole-brain) grey matter volume (GMV) while controlling for age, birth year, gender, handedness, height, total intracranial volume and the first 40 genetic principal components, which account for genetic population structure (see Supplementary Methods 1.2). To exclude confounding effects of excessive alcohol consumption21, we excluded from the analysis all current or former heavy drinkers (see Methods)22,23. We find an inverse association between total GMV and risky behaviour (standardized β = -.122; 95% confidence interval (CI) [-.156, -.087]; P < 4.86 × 10-12, two-sided).
To identify specific brain regions related to risky behaviour, we perform a wholebrain voxel-based morphometry (VBM)24 analysis that regresses our measure of risky behaviour separately on GMV in each voxel across the brain, adjusting for the same control variables. We identify localized inverse associations between risky behaviour and GMV in distinct regions, only some of which were expected based on previous small-scale studies (see Figure 1B, Supplementary Figure 3 and Supplementary Table 4). In subcortical areas, we identify associations bilaterally in expected areas such as the amygdala and ventral striatum (VS), as well as in less expected areas such as the posterior hippocampus, putamen, thalamus, hypothalamus, and cerebellum. We also identify bilateral associations between risky behaviour and GMV in cortical regions that include the ventral medial prefrontal cortex (vmPFC), dorsolateral prefrontal cortex (dlPFC), ventro-anterior insula (aINS) and the precentral gyrus. In all of these regions, GMV is negatively associated with the propensity to engage in risky behaviours. We find no positive associations between GMV and risky behaviour anywhere in the brain.
To quantify effect sizes of the associations between risky behaviour and GMV in anatomically-defined brain structures and to investigate the convergence of our findings across MRI processing pipelines25, we conduct a follow-up analysis at the region of interest (ROI) level. This analysis primarily relies on the imaging-derived phenotypes (IDPs) provided by the UKB brain imaging processing pipeline8,26,27, which used parcellations from the Harvard-Oxford cortical and subcortical atlases and the Diedrichsen cerebellar atlas. We derived additional IDPs using unbiased masks based on the results of the voxel-level analysis (see Supplementary Methods 1.3.3). This analysis identifies negative associations between risky behaviour and GMV in 23 anatomical structures, with standardized βs between −0.079 and −0.036 (Figure 1C and Supplementary Figure 4), the largest of which is in the right ventro-aINS; (β = −0.079; 95% CI [-.103, -.055]; Puncorr = 1.34 × 10-10, two-sided).
We carry out several additional analyses to assess the differential contributions of various factors to the associations we observe. First, we re-estimate the ROI-level regressions with additional controls for various socioeconomic and cognitive outcomes that may be linked to both brain anatomy and risky behaviour, either as antecedents or downstream consequences. These controls include participants’ years of education and fluid intelligence (13-item measure)17, a zip-code level measure of the Townsend social deprivation index28, household income and size, and birth location (binned into 100 geographical clusters, see Supplementary Methods 1.2 and Supplementary Figure 1B). The direction and magnitude of all ROI effects are comparable to the main analysis (range of standardized βs between −0.079 and −0.035), with the largest effect again located in the right ventro-aINS (β = −0.079; 95% CI [-.104, -.055]; Puncorr = 3.3 × 10-10, two-sided). Furthermore, 19 of the 23 ROIs (all except the left Cuneal Cortex, left Crus I of the Cerebellum, left Planum Polare and the Brain Stem) are statistically significant after correction for multiple comparisons in this analysis (see Supplementary Figure 5 and Supplementary Table 5).
Second, we re-estimate our ROI-level regressions with additional controls for current levels of drinking (binned into 10 deciles) and smoking (binned into 3 categories). Although introducing these controls into the model regresses out variance of interest from the main outcome measure, which likely decreases the size of the effects, this analysis allows us to test whether any of the identified associations can reliably be attributed to risky behaviour that is not limited to the substance-use domain. In this analysis, we find that all of the effects originally identified remain negative in sign, yet are smaller (range of standardized βs between −0.041 and −0.011; see Supplementary Table 7). Nonetheless, the effects in 9 subcortical ROIs (including the Amygdala, Putamen, VS and Cerebellum) remain statistically significant after correction for multiple comparisons (see Supplementary Figure 6), with the strongest association identified in the left Amygdala (β = −0.041; 95% CI [-0.059, −0.023]; Puncorr = 1.1 × 10-5, two-sided). Thus, these subcortical IDPs are reliably associated with risky behaviour in non-substance use domains.
Replication in an Independent Sample
Several months after the completion of our original analyses (in February 2020; https://biobank.ndph.ox.ac.uk/showcase/exinfo.cgi?src=timelines), the UKB released brain images of 20,316 additional participants— providing us with an opportunity to replicate our findings in an independent dataset that contains the same variables, and participants recruited in the same way from the same population29. After applying the same exclusion criteria as in our original analysis, our replication sample consists of 13,004 participants, roughly the same size as our original sample (see Methods for details). We repeat both the voxel-level and ROI-level analyses in this dataset. In the voxel-level analysis, we apply a significance threshold that corresponds to a familywise-error (FWE) rate of 5% in all voxels that showed significance in the original analysis (puncorr = 2.956 x 10-04, with tuncorr = 3.62, two-sided). We find that 92.6% of the original voxels (located in 20 of the 21 clusters originally identified, with the exception of a cluster in cerebellar lobules I-IV) successfully replicate (see Supplementary Figure 7). Furthermore, the un-thresholded t-map25 of the original dataset strongly correlates with the un-thresholded t-map of the replication dataset (r = .767; 95% CI [.766, .768]; P < 10-10, two-sided). Likewise, our ROI-level analysis successfully replicates 21 of the original 23 ROI-level findings (puncorr = 3.35 x 10-03, with tuncorr = 2.93, two-sided; see Supplementary Table 8). The two ROIs that do not replicate are the Cuneal Cortex (left) and the cerebellar lobule II (left).
Overlap between Grey Matter Volume Differences and Functional MRI (fMRI) Meta-Analysis
To investigate whether the neuroanatomical associations of real-world risky behaviour correspond spatially with the localized activation patterns commonly identified in fMRI studies of risky decision-making, we conduct a conjunction analysis to compare our VBM results with data obtained from a publicly available meta-analysis of fMRI studies of risky behaviour30 (N = 4,717 participants, K = 101 individual studies, see Supplementary Table 9). The analysis reveals several brain regions whose anatomical associations with risky behaviour converge with functional engagement, including the thalamus, amygdala, vmPFC, and dlPFC (Figure 1D).
Association of Polygenic Risk Scores for Risky Behaviour with Grey Matter Volume
Finally, we explore whether participants’ genetic disposition for risky behaviour, proxied via their polygenic risk scores (PRS), are associated with the neuroanatomical correlates of the trait, and test whether these associated neuroanatomical correlates mediate the relationship between genetic predisposition and behaviour. To this end, we first conducted a GWAS in an independent sample of UKB participants of European ancestry (N=297,025), exclusive of 18,796 genotyped individuals with usable MRI images (main sample) and their relatives. From the GWAS, we constructed a PRS that aggregated the effects of 1,176,729 single nucleotide polymorphisms (SNPs) on risky behaviour for all of the participants with MRI data in our independent target sample (see Supplementary Methods 1.4). The PRS predicts ~3% of the variance in risky behaviour in our target sample. Although the PRS are not associated with whole-brain GMV (standardized β = -.004; 95% CI [-.050 .020]; P > 0.41, two-sided), they are inversely associated with GMV in distinct regions, specifically the right dlPFC, right putamen and hypothalamus (Figure 2A.; regressions include all standard control variables, including total intracranial volume). Thus, GMV in these specific brain areas is negatively associated with the genetic disposition for risky behaviour.
Based on these results, we use the previously extracted GMV of these three ROIs to examine whether it mediates the observed gene-behaviour associations. A structural equation model including all standard controls reveals that ~2.2% of the association between the PRS and risky behaviour is mediated through individual differences in GMV in the three regions (indirect path c’; standardized β = 0.004, 95% CI [.002, .005], P = 1.4 × 10-06, two-sided) (Figure 2B).
Discussion
We investigate in a genetically-informed neuroimaging study (a) the association between GMV and real-world risky behaviour in a large population sample of European ancestry (main sample of 12,675 individuals and replication sample of 13,004 individuals), and (b) how the genetic disposition for risky behaviour is linked to GMV differences in a network of distinct brain areas. Several of the areas whose GMVs are linked to risky behaviour in this study have also often been functionally engaged during risky decision-making in small-scale fMRI studies that used stylized tasks. For instance, such correlations have been observed in the aINS, thalamus, dlPFC, vmPFC and VS31,32. These findings have led to proposals that upward and downward risks are encoded by distinct circuits, with upward risk mainly represented by areas encoding rewards (VS and vmPFC) and downward risk encoded by areas related to avoidance and negative arousal (aINS). Here, we substantiate previous functional studies with large-scale evidence that the structural properties of the same areas relate to risky behaviour in an ecologically valid setting33 when long-term health consequences are at stake.
Our results extend previous findings by showing that the neural foundation of risky behaviour is complex. Our analyses identify additional negative associations between risky behaviour and GMV in several areas, including the cerebellum, posterior hippocampus, hypothalamus, and putamen. While it is not yet clear how structural differences are manifested in properties of brain function34, our results suggest that risk taking draws on manifold neural processes, not just the representation and integration of upward and downward risks in the brain. Specifically, considering previous metaanalyses, the areas identified in this study are involved broadly in memory (posterior hippocampus), emotion processing (amygdala, ventro-aINS)35, neuroendocrine processes (hypothalamus)36,37, reward processing (vmPFC, VS and putamen)38 and executive functions (dlPFC)39. Thus, it appears that risky behaviour taps into multiple elements of human cognition, ranging from inhibitory control40 to emotion regulation41 and the integration of outcomes and risks42. This mirrors previous findings showing that risky behaviour is also a genetically complex trait7.
Additionally, our results underscore the long-suspected role of the hypothalamic–pituitary–adrenal (HPA) axis in regulating risk-related behaviours, in line with hormonal studies that link risky behaviour and sensation seeking to stress responsivity36,37,43–45. Furthermore, our finding that risky behaviour is linked to the structure of several cerebellar areas confirms the under-appreciated importance of the cerebellum for human cognition and decision-making, and highlights the need for further research on the specific behavioural contributions of this brain area46.
Of note, it remains an open question the extent to which the differences in GMV that we identify can be ascribed to specific heritable micro-anatomical traits, such as neuronal size, dendritic or axonal arborisation, or relative count of different cell types47. Moreover, it is not yet clear how these individual differences, in turn, influence behaviour. Nonetheless, our results provide evidence that neuro-anatomical structure constitutes the micro-foundation for neuro-computational mechanisms underlying individual differences in risky behaviours34.
Several of the observed neuroanatomical correlates of risky behaviour are also associated with the genetic disposition for the phenotype. Specifically, we find that GMV in the hypothalamus, the putamen and the dlPFC share variance with both risky behaviour and its polygenic risk score. This finding extends previous studies showing correlational36,42,43 and causal evidence37,48 of the involvement of these areas in risky behaviour by indicating that a genetic component partly underlies the associations. Although our analyses cannot identify the direction of the causal relationships (see Supplementary Discussion for an additional discussion of limitations), they show that risky behaviour and its genetic associations share variance with distinct GMV features and provide an overarching framework for how the genetic dispositions for risky behaviour may be expressed in the corresponding behavioural phenotype.
Our results are also in line with the bioinformatic annotation of the largest GWAS on risk tolerance to date, conducted in over 1 million individuals7, which implicated specific areas in the prefrontal cortex (BA9, BA24), striatum, cerebellum and the amygdala. However, bioinformatics tools used for GWAS annotation cannot be considered conclusive, as they rely on gene expression patterns in relatively small samples of (post-mortem) human brains or non-human samples49. Moreover, they cannot speak to whether changes in a particular tissue or cell type have negative or positive effects on the phenotype or how strong the effects are. Here, we show an alternative approach to annotating GWAS findings using a different type of data (large-scale population samples that include in-vivo brain scans and genetic data), relying on different assumptions than those used by bioinformatics tools. Our results add new insight by showing that lower GMV in specific brain areas is related to more risky behaviour and by implicating brain regions (i.e., putamen, hypothalamus, and dlPFC) in addition to those previously annotated. The effect sizes we observe here (standardized β < 0.08) are an order of magnitude larger than those found in GWAS on risky behaviour, but they nonetheless require very large samples for identification.
Finally, while many features of the brain are heritable, the environment indisputably plays an important role in brain development. We therefore see our results not as independent from, or of greater importance than, the effects of environmental and developmental factors. Rather, our study constitutes one step towards understanding how the complex development of human risky behaviour may be constrained by genetic factors.
Methods
Sample Characteristics and Selection Criteria
Main sample
We use publicly available data from the UKB, which recruited 502,617 people aged 40 to 69 years from the general population across the United Kingdom19,20. All UKB participants provided written informed consent and the study was granted ethical approval by the North West Multi-Centre Ethics committee. Our initial sample consists of 18,796 individuals with brain scans and genotype data, all of the imaged UKB participants as of October 2018. We excluded participants with putative sex chromosome aneuploidy (N = 6) or a mismatch between genetic and reported sex (N = 10), participants of non-European ancestry (N = 893), and participants who did not pass the UKB quality-control thresholds (N = 14), described in Bycroft et al.27. To minimize the potential influence of neurotoxic effects due to excessive alcohol intake21,22, we also excluded current heavy drinkers (531 females consuming more than 18 drinks per week and 793 males consuming more than 24 drinks per week)22,23. To exclude potential former drinkers, we also removed 426 participants who indicated that they did not drink alcohol.
All structural T1 MRI images used in the study underwent automated quality control by the UKB brain imaging processing pipeline26. We performed two additional quality checks using the Computational Anatomy Toolbox (CAT; www.neuro.uni-jena.de/cat/) for SPM 12 (www.fil.ion.ucl.ac.uk/spm/software/spm12/). First, we relied on the CAT12 automated Image and Pre-processing quality assessment, which included quality parameters for resolution, noise, and bias of images, 57 of which were automatically excluded and not pre-processed due to low image quality. Second, after pre-processing, we utilized an automated quality check of sample homogeneity to identify outliers that exhibited substantially different GMV patterns than the rest of the sample (see the CAT12 manual for details; http://www.neuro.uni-jena.de/cat12/CAT12-Manual.pdf, p. 17ff). In total, we excluded 690 individuals with scans of high image inhomogeneity (two standard deviations below the mean). Finally, we excluded 2,701 participants with incomplete behavioural data of interest or control variables. Our final dataset consists of N = 12,675 individuals.
Replication sample
The initial replication sample consisted of 20,316 individuals with usable brain scans and genotype data, all of the imaged UKB participants as of February 2020 who were not included in our main sample. Following the original analysis, we excluded participants with putative sex chromosome aneuploidy (N = 7), a mismatch between genetic and reported sex (N = 11), non-European ancestry (N = 1,143), heavy drinking or abstinence from alcohol (N = 1,376), and those that did not pass the UKB quality-control thresholds (N = 31)27. All T1 MRI images used in the study underwent the same quality control procedure as in our original sample, resulting in the removal of additional 391 individuals. Finally, we excluded participants with incomplete data (N = 4,353). Our final replication sample consists of N = 13,004. The empirical distributions of the main variables used in our replication analysis are depicted in Supplementary Figure 1B.
Measures
Risky Behaviour
We closely follow Karlsson Linnér et al.7 to derive a measure of risky behaviour across domains based on participants’ self-reports of: (1) Number of alcoholic drinks per week, (2) Ever having smoked, (3) Number of sexual partners, and (4) Frequency of driving faster than the motorway speed limit (see Supplementary Methods 1.1). We perform principal component analysis (PCA) on N = 315,855 UKB participants and extract the first principal component (PC) of the four measures as the main outcome of interest (referred to as “risky behaviour”). The first PC explains ~37% of the variance in the four measures, and it is the only PC that loads positively on all of them. Summary statistics and factor loadings of the PCA are available in Tables S1 and S2. The code for generating the variable of ‘risky behaviour’ is accessible at https://osf.io/qkp4g/.
Control variables
The full list of control variables and the methods used to generate them are available in Supplementary Methods 1.2.
T1 MRI Image Processing
We use T1-weighted structural brain MRI images in NIFTI format provided by the UKB. Images were acquired using 3-T Siemens Skyra scanners, with a 32-channel head coil (Siemens, Erlangen, Germany), with the following scanning parameters: repetition time = 2000 ms; echo time = 2.1 ms; flip angle = 8°; matrix size = 256 × 256 mm; voxel size =1 × 1 × 1 mm; number of slices = 208. A detailed description of the methods used to pre-process the images and derive voxel-level and ROI-level IDPs is available in Supplementary Methods 1.3.1-1.3.3.
Polygenic Risk Score (PRS) for Risky Behaviour
To construct a PRS, we first re-estimated the GWAS of our main measure described in ref7 after excluding 18,796 genotyped individuals with usable T1 MRI images and their relatives up to the third degree (final GWAS sample: N = 297,025 individuals of European ancestry). The GWAS was performed using linear mixed models (LMM), implemented via BOLT-LMM version 2.3.250. Next, we performed quality control (QC) of the GWAS results using a standardized QC protocol, described in detail in ref7. This protocol removes rare and low-quality single-nucleotide polymorphisms (SNPs) based on minor allele frequency (MAF) < 0.001, imputation quality (INFO) < 0.7, and SNPs that could not be aligned with the Haplotype Reference Consortium (HRC) reference panel, among other filters. After QC, a total of 11,514,220 SNPs remained in the GWAS summary statistics. Thereafter, we calculated a PRS for each participant by weighting his or her genotype across SNPs by the corresponding regression coefficients estimated in the GWAS (see Supplementary Methods 1.4 for further information).
Analysis
Voxel-based Morphometry (VBM)
We identify associations between risky behaviour and localized GMV across the brain using whole-brain voxel-based morphometry (VBM), a method that normalizes the anatomical brain images of all participants in one stereotactic space24. We regress risky behaviour separately on each voxel of the smoothed GMV images (see Supplementary Methods 1.3.1) and the control variables. We correct for multiple comparisons by adjusting the FWE rate to α = 0.01 using permutation tests (Puncorr = 1.248 x 10-06, with |tuncorr| = 4.85, two-sided). See Supplementary Figure 3 and Supplementary Table 4 for the summary statistics of each cluster and the coordinates of the peak voxel within that cluster.
Region of Interest (ROI)-level Analysis
We compute the associations between risky behaviour and 139 IDPs of GMV extracted by the UKB brain imaging processing pipeline26 using parcellations from the Harvard-Oxford cortical and subcortical atlases and Diedrichsen cerebellar atlas, in addition to 9 IDPs derived using unbiased masks based on the results of our voxel-level analysis (see Supplementary Methods 1.3.2 and 1.3.3). We regress risky behaviour separately on each IDP and the control variables and correct for multiple comparisons by adjusting the FWE rate of α = 0.01 using permutation tests (puncorr = 9.37 x 10-05, with |tuncorr| 3.91, two-sided).
Replication of the Voxel-level and ROI-level Analyses
We repeat the VBM analyses in the replication sample for all voxels that showed significance in the original sample, correcting for multiple comparisons by adjusting the FWE rate to α = 0.05 (two-tailed) using permutation tests (puncorr= 2.956 x 10-04, with tuncorr= 3.62). We also repeat the ROI-level analysis in all ROIs that showed significance in the original sample, and correct for multiple comparisons by adjusting the FWE rate to α = 0.05 (two-tailed) using permutation tests (puncorr = 3.35 x 10-03, with tuncorr = 2.93). We define replication success as observing a statistically significant effect at the 5% level (corrected for multiple hypothesis testing) in the same direction as the original finding51.
Comparison of VBM Results with a Meta-Analysis of Functional MRI (fMRI) Studies
We compare our VBM results with a publicly available meta-analysis of fMRI studies provided by Neurosynth30, an online platform for large-scale, automated synthesis of fMRI data (https://neurosynth.org/). The meta-analysis was based on the keyword ‘risky’. It consisted of K = 101 individual studies with a total of N = 4,717 participants (see Supplementary Table 9 for details), and was conducted using a uniformity test (assuming that random activations are evenly distributed across all voxels). The meta-analytic statistical image was corrected for multiple comparisons by applying a false discovery rate (FDR) of .01 (implemented by Neurosynth). The summary of studies included in the meta-analysis is available on Supplementary Table 9. The 3D activation map that resulted from the meta analysis is available on https://neurosynth.org/analyses/terms/risky/. To compare our VBM results (i.e. brain structure) with the meta-analysis of data on brain function, we perform a whole-brain voxel-level conjunction analysis of the two (see Supplementary Figure 8) that exhibits the spatial overlap of all voxels that are significant in both analyses (Figure 1D). Thus, both the structure and function of the brain regions identified by this conjunction are significantly associated with risky behaviour.
Voxel-based Morphometry (VBM) Analysis of Risky Behaviour PRS
We repeat the VBM analysis in all voxels identified to be associated with risky behaviour, using the PRS as the dependent variable. This approach allows the identification of brain regions that were likely to mediate the effect of genetic disposition on risky behaviour. The PRS was constructed using GWAS results from an independent sample, to ensure that the effect size estimates in this analysis are not inflated due to overfitting. We account for multiple comparisons using a permutation test with an FWE rate of PFWE < 0.05. This part of the analysis was not pre-registered.
Mediation Analysis
We conduct mediation analyses (implemented in STATA 14) to test whether GMV differences in the brain regions whose GMVs are associated with the PRS of risky behaviour (right dlPFC, right putamen and hypothalamus) mediate the association between the PRS and risky behaviour. We first extract GMV from these three ROIs using the same unbiased masks as in the ROI-level analysis (see Supplementary Methods 1.3.3). Then, we estimate a structural equation model (SEM) to quantify the effect of the PRS on risky behaviour mediated via GMV differences in these ROIs. All SEM equations include the aforementioned standard control variables (listed in Supplementary Methods 1.2). We carry out an additional robustness check by estimating an SEM that assumes one single path (i.e., the sum of all ROIs), which yields the same pattern of results (see Supplementary Figure 9).
Family-Wise-Error Correction using Permutation Tests
To account for multiple hypothesis testing, we determine the appropriate FWE corrected p-value threshold with a permutation test procedure in each of our analyses52. To this end, we generated 1,000 datasets with randomly permuted phenotypes (i.e., breaking the link between the outcome and explanatory variables), estimated regression models for all IDPs per analysis, and recorded the lowest p-value of each run to generate an empirical distribution of the test statistic under the null hypothesis. To obtain the FWE rate of any given alpha, we use the nth = ⍰ x 1000 lowest p-value from the 1,000 permutation runs as the FWE corrected p-value threshold.
Pre-registration of Analysis Plan and Unplanned Deviations
We pre-registered our analysis plan on Open Science Framework (OSF, https://osf.io/qkp4g/, registered Dec. 2018). Our pre-registered plan specified the construction of the dependent variable, the control variables, the inclusion criteria and quality controls, the VBM analyses and the main ROI-level analyses. We summarize all deviations from the analysis plan in Supplementary Methods 2.
Data Availability
Data and materials are available via UK Biobank at http://www.ukbiobank.ac.uk/.
Code Availability
The analysis code used in this study is publicly available on https://osf.io/qkp4g/.
Author contributions
G.A., R.D., J.W.K., P.D.K. and G.N. designed the research plan. G.N., P.D.K. and C.C.R. oversaw the study. G.A., R.D. and R.K.L analysed the data with critical input from P.D.K., G.N. and C.C.R.. G.A. G.N. and P.D.K. wrote the paper and Supplementary Materials. All authors contributed to and critically reviewed the manuscript.
Competing interests
Dr. Kranzler is a member of an advisory board for Dicerna Pharmaceuticals; a member of the American Society of Clinical Psychopharmacology’s Alcohol Clinical Trials Initiative, which was sponsored in the past three years by AbbVie, Alkermes, Amygdala Neurosciences, Arbor Pharmaceuticals, Ethypharm, Indivior, Lilly, Lundbeck, Otsuka, and Pfizer; and is named as an inventor on PCT patent application #15/878,640 entitled: “Genotype-guided dosing of opioid agonists,” filed January 24, 2018. All other authors declare no competing interests.
Supplementary Methods
1. Measures
1.1. Main risky behaviour measure
We closely follow the methods of ref 1 to derive a measure of risky behaviour based on participants’ self-reports across the drinking, smoking, driving, and sexual domains.
Specifically, we use the following UK Biobank variables:
Number of alcoholic drinks per week (Data-Fields: 1558, 1568, 1578, 1588, 1598, 1608, 5364, 4407, 4418, 4429, 4440, 4451, 4462)
Ever smoking (Data-Field 20116, 1249, 1239)
Frequency of driving faster than the motorway speed limit (Data-Field 1100)
Lifetime number of sexual partners (Data-Field 2149)1
The full description of each Data-Field can be found in the online data showcase of the UKB (http://biobank.ctsu.ox.ac.uk/crystal/search.cgi). The annotated STATA code used to derive all behavioural phenotypes and control variables can be found in our pre-registered analysis plan (https://osf.io/qkp4g/).
All variables above were measured on at least one of 3 occasions: (1) the initial assessment visit, (2) the first repeat assessment visit, and (3) the imaging visit. Data from (2) and (3) are only available for a subset of the original sample. In cases where participants provided answers across more than one visit, we compute the average of their reports.
To obtain a measure that captures the common variance in risky behaviour shared across domains, we perform principal component analysis (PCA) on N = 315,855 UKB participants and extract the first principal component (PC) as our main outcome of interest for this study (referred to as “risky behaviour”). Compared to experimental procedures that elicit risk tolerance, selfreported measures exhibit higher external validity and test-retest reliability3–5. Furthermore, by extracting the first principal component of the four risky behaviours, we reduce measurement noise due to the aggregation of signals across various measures, while capturing behavioural tendencies across domains that are independent of idiosyncratic differences in the four specific behaviours. The PCA summary statistics are available in Supplementary Table 1, and the component loadings are available in Supplementary Table 2. The first PC explained about 37% of the variance in the different phenotypes of risky behaviours in the sample, and it was the only PC that positively loaded on all of four phenotypes.
While the GWAS by Linnér et al. (2019)1, was primarily based on a meta-analysis of two very crude, noisy, single-item measures of risk taking that were available in the two largest samples (UKB and 23andMe, which had slightly different questions on risk taking), this choice (in the GWAS) was made to maximize the sample size for genetic discovery, following the logic outlined in ref 6, i.e., that in genetic discovery studies, sample size typically trumps phenotypic accuracy in terms of statistical power. (The supplementary material of ref 6 includes a mathematical derivation that illustrates this). In the current work, we also wanted to maximize statistical power, albeit the situation here is different, as the sample size was exogenously determined by the UKB. Thus, the only means to increase statistical power was via increasing the quality of the phenotypic measurement. We decided to focus on the first PC of the four risky behaviours introduced in Linnér et al. (2019) for the following reasons: (1) It is available for a large part of the scanned subsample. (2) Linnér et al. (2019) showed that this first PC has a higher SNP-based heritability than any of the general risk-taking measures or individual phenotypes, which is partly because the first PC is less affected by random measurement error than any input variable considered separately. (3) The high heritability of the first PC suggests that similar genetic factors influence risk taking across various domains, making this a promising trait to study in connection with other biomarkers such as brain anatomy. (4) This variable has been studied in the literature, limiting our degrees of freedom for the current study. (5) GWAS results for this variable were readily available.6
1.2. Control Variables
All of our analyses systematically control for several genetic, socio-demographic and anthropometric factors that could potentially confound the observed associations [e.g. sex7, height8 and genetic population structure9]. Specifically, we use the following control variables, as provided by the UKB:
Age at the time of brain scan (Data-Field 21003)
Birth year (Data-Field 33)
Sex (self reported and genetically identified, Data-Fields 31 & 22001, dummy coded)
Height (Data-Field 50)
Handedness (Data-Field 1707, categorical variable: Right-handed, Left-handed, ambidextrous, N/A)
Sex x birth year interactions (binned into fields containing at least 20 participants each)
The first 40 PCs of the genetic data (Data-Field 22009)
Total intracranial volume (TIV), derived using the CAT12 toolbox from T1 images.
We carry out an additional analysis that further controls for the following socio-economic and cognitive outcomes (provided by the UKB):
Educational attainment (Data-Field 6138)
A 13-item measure of fluid IQ (Data-Fields 20016 and 20191)
Zip-code level measure of the Townsend social deprivation index (Data-Field 189)
Household income (Data-Field 738)
Number of household members (Data-Field 709)
Place of birth, binned in 100 clusters based on North and East birth location coordinates (Data-Fields 129 and 130). Clusters were calculated using the k-means algorithm, which minimizes within-cluster variances (squared Euclidean distances) of k = 100 clusters with 10,000 iterations after random seeding.
The empirical distributions of the main variables used in our main analysis and the correlations between them are depicted in Supplementary Figures 1A-C and Supplementary Figure 2.
1.3. Imaging-derived Phenotypes (IDPs)
1.3.1 T1 MRI Image Processing
Our voxel-level analysis uses T1-weighted structural brain MRI images in NIFTI format provided by the UKB (data field 20252). The images were acquired using 3-T Siemens Skyra scanners, with a 32-channel head coil (Siemens, Erlangen, Germany), with the following scanning parameters: repetition time = 2000 ms; echo time = 2.1 ms; flip angle = 8°; matrix size = 256 × 256 mm; voxel size = 1 × 1 × 1 mm; number of slices = 208.
We preprocessed the data using the Computational Anatomy Toolbox (CAT; www.neuro.uni-jena.de/cat/) for SPM (www.fil.ion.ucl.ac.uk/spm/software/spm12/), a fully automated toolbox for deriving neuroanatomical measurements at voxel and ROI levels. Image pre-processing used the default setting of CAT12 (accessible online at http://www.neuro.uni-jena.de/cat12/CAT12-Manual.pdf). Images were corrected for bias-field inhomogeneities, segmented into gray matter, white matter, and cerebrospinal fluid (CSF), spatially normalized to the MNI space using linear and non-linear transformations, and were modulated to preserve the total amount of signal in the original image during spatial normalization (the specific SPM-processing parameters can be found in the pre-registered document on OSF https://osf.io/qkp4g/). We applied spatial smoothing with 8-mm Full-Width-at-Half-Maximum (FWHM) Gaussian kernel for the segmented, modulated images for grey matter volume (GMV). Finally, to ensure that only voxels that likely contain grey matter enter the analyses, we constructed a brain mask based on the average of all GMV images. Specifically, following standard VBM procedures (see SPM/CAT12 http://www.neuro.uni-jena.de/cat12/CAT12-Manual.pdf) we thresholded the average of all brain images at 250 GMV intensity units. The resulting image was binarized and applied as a premask to all individual images before running analyses. Additionally, on an individual level, we excluded all voxels that exhibited a lower grey matter volume than .1 from the analyses (see standard parameters of SPM/CAT12 http://www.neuro.uni-jena.de/cat12/CAT12-Manual.pdf).
1.3.2 Region of interest (ROI)-level IDPs Processed by the UKB
We use all of the GMV IDPs that were processed and provided by the UKB [for details see ref 10]. These IDPs include GMV of 139 ROIs derived using parcellations from the Harvard-Oxford cortical and subcortical atlases, and Diedrichsen cerebellar atlas.
1.3.3 Additional ROI-level IDPs
Based on our voxel-level results (see 2.1), we extracted 5 additional ROI-level IDPs that quantified GMV in anatomical substructures that were not derived by the UKB. These ROIs were extracted bilaterally from unbiased masks and included the dorsolateral prefrontal cortex (dlPFC; BA 46), hypothalamus, posterior hippocampus, ventro-anterior insula and ventromedial prefrontal cortex (vmPFC). For the dlPFC, ventro-anterior insula and vmPFC masks, we used recent functional parcellations based on resting state data. The dlPFC mask was derived using the Sallet Dorsal Frontal resting state connectivity-based parcellation (cluster 7/BA46)11. Functionally, this area exhibits coupling with the frontal-parietal network (incl. anterior cingulate cortex, parietal cortex and inferior parietal lobe), as well as with the vmPFC. Anatomically, its boundaries show resemblance to BA 46 — an area functionally related to executive function that shows distinct cytoarchitectonic properties.
We extracted GMV from the vmPFC using a parcellation of the medial wall of the prefrontal cortex, based on resting state functional coupling12. Specifically, we extracted GMV from 14m — an area linked to cost-benefit integration in value-based decision-making13–16, which maintains strong positive coupling with hypothalamus, ventral striatum, and amygdala17. The hypothalamus mask was derived from a high-resolution atlas of human subcortical brain nuclei18. The posterior hippocampus mask was derived according to recent recommendations for long-axis segmentation of the hippocampus in human neuroimaging19. We labeled hippocampal voxels posterior to the coronal plane at y = −21 in MNI space (which corresponds to the uncal apex of the parahippocampal gyrus), as posterior hippocampus. To ensure spatial precision across participants, we used a minimum 80% likelihood of each voxel being in the anatomical structure for all of the aforementioned masks. The ventro-anterior insula mask was derived following a recent parcellation of the insula based on a resting state functional connectivity analysis by ref 20, which reported that this brain region showed functional coactivation with limbic areas including amygdala, ventral tegmental area (VTA), superior temporal sulcus, and posterolateral orbitofrontal cortex. The raw mask was thresholded at z = 10.
1.4 Polygenic Risk Score (PRS) for Risky behaviour
We use the genetic data provided by the UKB to construct a polygenic risk score (PRS) for risky behaviour. As a first step, we rerun the genome-wide association study (GWAS) of risky behaviour (the same measure used in the current study) as reported in ref 1 after excluding the 18,796 genotyped individuals with usable T1 NIFTI structural brain images (UKB field 20252) and all of their relatives up to the third degree (defined using the KING coefficient21 based on a pairwise coefficient >0.0442). The final GWAS sample includes 297,025 individuals of European ancestry. We use BOLT-LMM version 2.3.222 to perform GWAS with linear mixed models (LMM), which outperforms linear regression in terms of statistical power and controlling for relatedness23.
Next, we perform quality control (QC) of the GWAS results using a standardized QC protocol, described in detail in ref 1. This protocol removes rare and low-quality single-nucleotide polymorphisms (SNPs) based on minor allele frequency (MAF) < 0.001, imputation quality (INFO) < 0.7, and SNPs that could not be aligned with the Haplotype Reference Consortium (HRC) reference panel. After QC, a total of 11,514,220 SNPs remains in the GWAS summary statistics.
Thereafter, we calculate for each participant i a PRS, Si by weighting his or her genotype across SNPs (j), gij, by the corresponding regression coefficients, βj estimated in the GWAS described above. Thus, the PRS is a linear combination of genetic effects, calculated as: where the set of SNPs, M, is restricted to the consensus genotype set of 1.4 million SNPs established by the International HapMap 3 Consortium24, which has been successfully employed for polygenic prediction in many previous studies. Furthermore, the PRS is constructed only with autosomal, bi-allelic SNPs with MAF > 0.01 and INFO > 0.9 in the UKB. The resultant PRS is based on a total of M=1,176,729 SNPs. The PRS is then standardized to mean zero and unit variance in the prediction sample.
1.5 Genetic Correlations of Risky behaviour
We rely on the results of the risky behaviour GWAS to estimate genetic correlations between this phenotype and 85 other traits, using bivariate LD Score regression25. The estimates are reported in Supplementary Table 3. For this purpose, we query the “GWAS ATLAS”26 to identify publicly archived GWAS results that we consider relevant. We supplement the publicly available GWAS with a soon-to-be published GWAS on diet composition27. Notably, the collected traits span across many different outcomes, including the anthropometric, behavioural, cognitive, psychiatric, medical, and socioeconomic domains.
We find moderate to strong genetic correlations between our main measure and a range of phenotypes that are considered risky behaviours, including ever consuming cannabis (rg = 0.72; SE = 0.03), self-employment (rg = 0.52; SE = 0.30), and age at first sexual experience (rg = −0.54; SE = 0.02). Our measure of risky behaviour is also genetically correlated with a range of mental disorders including bipolar disorder (rg = 0.23; SE = 0.03), major depressive disorder (rg = 0.22; SE = 0.03), and schizophrenia (rg = 0.17; SE = 0.02). Finally, risky behaviour is genetically correlated in the expected direction with the personality traits of conscientiousness (rg = −0.25; SE = 0.10) and extraversion (rg = 0.34; SE = 0.05).
2. Pre-registration of Analysis Plan and Unplanned Deviations
We pre-registered our analysis plan on Open Science Framework (OSF, https://osf.io/qkp4g/). Our pre-registered plan specifies the construction of the dependent variable, the control variables, the inclusion criteria and quality controls, the VBM analyses and the main ROI-level analyses.
We deviated from the pre-registered plan in several cases, which are outlined in the following. These deviations occurred when the computational burden of following the preregistered plan was unexpectedly high, and when alternative measures that we were not aware of at the time of the pre-registration were made available by the UKB. Specifically, we decided not to use alternative segmentations of the cortex (e.g. Hammer’s atlas) as robustness checks for our ROI-level analysis because of the significant computational burden in deriving those measures. Instead, based on the voxel-level analysis, we derived additional ROIs only when they were not derived in sufficient granularity in the IDPs provided by the UKB (see 1.3.3).
Similarly, we did not derive cortical thickness (CT) measures because of the high computational burden using FreeSurfer, which is the gold standard in cortical thickness estimation. While other means to derive CT would have been available (e.g. CAT toolbox), they would provide relatively lower quality data and would not allow analyses of subcortical areas. Additionally, the UKB was expected to release CT measures derived from FreeSurfer before this work was finalized (see the UKB Data Showcase website for public announcements). The lack of CT measures has also led us to decide to postpone the conduct of an additional preregistered multivariate analysis.
Finally, our pre-registered plan states that we would run additional robustness checks to control for potential neurotoxic effects of excessive alcohol intake. Upon examining the data for a different project that is focused on the effects of alcohol intake on the brain, we observed effects that were mainly driven by individuals who were heavy drinkers. We therefore decided to deviate from our original plan and exclude all participants who qualified as current or former regular heavy drinkers. However, we also provide additional analyses that include weekly alcohol intake and smoking habits as a covariate (see Supplementary Figure 6). Finally, the preregistered analysis of white-matter volume is not reported here, because we decided to focus our manuscript on GMV differences.
Supplementary Discussion
Our study highlights the importance of using large samples to study associations of neuroanatomy with complex behavioural traits. The largest effect we identify for the relationship between any cluster of voxels and risky behaviour is ΔR2 = 0.6% (see Supplementary Table 4). It would require more than 1,750 participants to have 90% statistical power at a liberal p-value threshold of 0.05 (uncorrected) to identify effects of this magnitude. This is a lower bound for the required sample size for such studies that does not reflect the upward bias in our effect size estimate due to the statistical “winner’s curse”, and the need to correct for multiple testing. Previous large-scale VBM studies (N > 1000) with other behavioural phenotypes28 found effect sizes of similar magnitude and suggest that large samples are a prerequisite to detect such an association reliably. Of note, the largest previous study of risk tolerance employed a sample of 108 participants29 and would have only 12% power to detect ΔR2 = 0.6% at α = 0.05 (uncorrected).
A possible limitation of our study is that, the specific features of the component phenotypes (e.g., smoking) rather than their first PC (risky behaviour) could have driven the associations we report (quantified via standardized regression coefficients). To further investigate this possibility, we repeat our ROI-based analysis with the individual phenotypic measures (instead of their first PC) as outcome variables (see Supplementary Table 6). We find that 22 out of 23 ROIs are significantly associated with more than one phenotype (the exception is IX Cerebellum (r), which is significantly associated only with the number of sexual partners, yet the standardized coefficient denoting its relationship with the first PC is greater in magnitude than the coefficient denoting its relationship with the number of sexual partners). Furthermore, the standardized coefficients quantifying the relationships between the ROIs and the individual phenotypes are either smaller than or at the same order of magnitude as the coefficients quantifying their relationship with the first PC.
While our study is larger and more representative than any previous investigation of the topic, and although we control for various potential confounds and replicate our findings in an independent sample, it was conducted in a population of UK individuals of European descent that were over 40 years old at the time of measurement, which limits the generalizability of our results to other populations. Moreover, our results do not exclude the possibility of bias due to other unobserved variables that our analyses do not account for. With the rise of large publicly available data sets [e.g. ref 30], we hope that future studies will be able to test the generalizability of our findings to populations of different ethnicities and age groups (e.g., adolescents).
Finally, while our analyses identify distinct brain areas that mediate gene-phenotype associations for risky behaviour (i.e., putamen, hypothalamus and dlPFC), they do not provide evidence for their causal relationship. For instance, it is possible that a person’s genetic disposition would lead them to select into environments that influence both risky behaviour and features of brain anatomy.
Supplementary Figures
Supplementary Tables
Acknowledgments
This research was carried out under the auspices of the Brain Imaging and Genetics in Behavioural Research (https://big-bear-research.org/) consortium. The authors thank Nadja C. Furtner for helpful comments and Dylan Manfredi for research assistance. The research was conducted using UK Biobank resources under application 40830. The study was supported by funding from an NSF Early Career Development Program grant (1942917) and The Wharton School Dean’s Research fund to G.N., an ERC Consolidator Grant to P.D.K. (647648 EdGe), and a Swiss National Science Foundation grant to C.C.R (100019L_173248). R.R.W. was financially supported by NIAAA K23 grant (K23 AA023894) and H.R.K. was supported by NIDA grant P30 DA046345. G.N. thanks Carlos and Rosa de la Cruz for ongoing support. The work was carried out on the Dutch national e-infrastructure with the support of the SURF Cooperative. Data can be accessed via the UKBiobank, and data analysis scripts are available on OSF (https://osf.io/qkp4g/).