Enhanced Brain Imaging Genetics in UK Biobank

UK Biobank is a major prospective epidemiological study that is carrying out detailed multimodal brain imaging on 100,000 participants, and includes genetics and ongoing health outcomes. As a step forwards in understanding genetic influence on brain structure and function, in 2018 we published genome-wide associations of 3,144 brain imaging-derived phenotypes, with a discovery sample of 8,428 UKB subjects. Here we present a new open resource of GWAS summary statistics, resulting from a greatly expanded set of genetic associations with brain phenotypes, using the 2020 UKB imaging data release of approximately 40,000 subjects. The discovery sample has now almost tripled (22,138), the number of phenotypes increased to 3,935 and the number of SNPs increased to 17 million. For the first time, we include associations on the X chromosome. Previously we had found 148 replicated clusters of associations between genetic variants and imaging phenotypes; here we find 692 replicating clusters of associations, including 12 on the X chromosome. We describe some of the newly found associations, focussing particularly on the X chromosome. All summary statistics are openly available for interactive viewing and download on the “BIG40” open web server.

UK Biobank (UKB) is now approximately halfway through imaging 100,000 volunteers; the early-2020 release of brain imaging data contained data from almost 40,000 participants. This spans 6 brain MRI (magnetic resonance imaging) modalities, allowing the study of many different aspects of brain structure, function and connectivity. In conjunction with other data being recorded by UKB, which includes health outcomes, lifestyle, biophysical measures and genetics, UKB is a major resource for understanding the brain in health and disease.
In Elliott et al. [2018], we presented genome-wide association studies (GWAS) of 3,144 brain imaging phenotypes, with a discovery sample of 8,428 subjects. At that point we identified 148 replicated clusters of associations between genetic variants and the phenotypes. These imaging-derived phenotypes (IDPs) are derived from structural magnetic resonance imaging (MRI), functional MRI (fMRI), diffusion MRI (dMRI) and susceptibility-weighted MRI (swMRI); for descriptions of the brain imaging modalities and IDPs, see Elliott et al. [2018]. We found links between IDPs and genes involved in: iron transport and storage, extracellular matrix and epidermal growth factor, and development, pathway signalling and plasticity.
We have now expanded and enhanced this work, with an almost threefold increase in sample size, an increase in the number of IDPs to almost 4,000, and with associations on the X chromosome [Wise et al., 2013] carried out for the first time. We have also greatly expanded our set of imaging confound variables [Alfaro-Almagro et al., 2020], reducing the likelihood of finding confounded associations. GWAS summary statistics and Manhattan plots for all 3,935 phenotypes are freely available for download from the Oxford Brain Imaging Genetics (BIG40) web server, which also includes detailed tables of all IDPs, all SNPs (single nucleotide polymorphisms) tested, all association clusters, and an interactive viewer allowing for fine-detailed investigations of IDP associations with SNPs and nearby genes.
We also conducted sex-specific X chromosome GWAS, followed by meta-analyses combining these, using Fisher's method [Fisher, 1948]. The X chromosome accounts for about 5% of the human genome and incorporates over 1,200 genes, including many which play a role in human cognition and development [Brenner, 2013]. However, testing for association with genetic variants on chromosome X requires special consideration [Clayton, 2008, Özbek et al., 2018, König et al., 2014]. While genetic females inherit two copies of the X chromosome, genetic males inherit only a single copy from their biological mother and a copy of the Y chromosome from their father (here we refer to people with two X chromosomes as genetic females, and people with one X and one Y chromosome as genetic males). The short pseudoautosomal regions (PAR) on the ends of chromosome X are homologous with parts of chromosome Y and can be analysed in the same way as autosomal chromosomes. For the non-pseudoautosomal region, a mechanism has evolved to balance allele dosage differences between the genetic sexes (X chromosome inactivation, or XCI); during female development, one copy is randomly inactivated in each cell. This means that maternally and paternally inherited alleles would be expected to be expressed in different cell populations within the body approximately 50% of the time. However, this dosage compensation mechanism (DC) is imperfect; it is currently thought that only 60-75% of X-linked genes have one copy completely silenced in this way [Sidorenko et al., 2019].
To account for this in GWAS, it is common to follow Clayton [2008] and assume full dosage compensation: males are treated as homozygous females with genotypes coded (0,2) according to whether they have 0 or 1 copy of the alternative allele. In a joint analysis of females and males, simulation studies in Özbek et al. [2018] and König et al. [2014] suggest that type-I error control under this approach is reasonably robust to deviations from other assumptions (such as no sex-specific differences in allele frequencies), provided genetic sex is included as a covariate.
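The genotype coding described above can be illustrated with a minimal sketch. The function name `code_x_genotype` and the parameter `male_alt_dosage` are hypothetical and introduced here only for illustration; a value of 2.0 corresponds to the full dosage compensation coding (0,2) for genetic males, and a value of 1.0 is assumed to represent the alternative of no dosage compensation.

```python
def code_x_genotype(alt_allele_count: int,
                    is_genetic_male: bool,
                    male_alt_dosage: float = 2.0) -> float:
    """Return the regression dosage for a non-pseudoautosomal X variant.

    male_alt_dosage=2.0 reproduces the full dosage compensation coding,
    in which genetic males are treated as homozygous females (0 or 2).
    male_alt_dosage=1.0 is an assumed coding for no dosage compensation.
    """
    if is_genetic_male:
        # Genetic males carry a single X, so only 0 or 1 copies are possible.
        if alt_allele_count not in (0, 1):
            raise ValueError("genetic males carry 0 or 1 copies of the allele")
        return alt_allele_count * male_alt_dosage
    # Genetic females carry two X chromosomes: 0, 1 or 2 copies.
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("genetic females carry 0, 1 or 2 copies of the allele")
    return float(alt_allele_count)


# A genetic male with one alternative allele is coded like a homozygous female.
male_dosage = code_x_genotype(1, is_genetic_male=True)
female_dosage = code_x_genotype(1, is_genetic_male=False)
```

In a joint analysis these dosages would enter the regression alongside genetic sex as a covariate, as noted above.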
Recent studies have used the large sample sizes afforded by UK Biobank to perform stratified analyses to estimate the degree of dosage compensation as a parameter across a broad variety of traits [Sidorenko et al., 2019, Lee et al., 2018]. These studies suggest that only a small proportion of genes escape XCI, although the appropriate amount of DC shows considerable variation amongst traits [Sidorenko et al., 2019]. For educational attainment, Lee et al. [2018] estimated a DC factor of 1.45, but concluded that little power was lost in their joint analysis irrespective of the assumed model. While a joint analysis under a full DC model is a reasonable default, the power afforded by biobank-scale datasets also permits examination of possible sex-specific effects via stratified analyses. These stratified analyses can subsequently be meta-analysed [Sidorenko et al., 2019, Lee et al., 2018, Luciano et al., 2019]. If the meta-analysis is based on estimated effects (regression beta values), Lee et al. [2018] show that appropriately chosen weights can give results almost the same as those from a joint genetic male/female analysis corresponding to any assumed DC model. Nevertheless, results will be biased if the assumed DC model differs from the truth. An unweighted meta-analysis based on p-values using Fisher's method (explored in this research), though potentially less powerful, should avoid this potential bias (as it is not sensitive to any relative scaling in the regression model, and hence in the betas, of the two sex-separated GWAS), but still have value in confirming signals from a joint analysis.
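To make the p-value-based meta-analysis concrete, the following is a minimal sketch of Fisher's method using only the Python standard library. The function name `fisher_combined_p` is an illustrative assumption and this is not the pipeline used to produce the results reported here. The sketch relies on the closed-form tail probability of a chi-square distribution with an even number of degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom, where k is the number of p-values.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # For even degrees of freedom 2k the survival function is
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Combining a female-only and a male-only p-value (illustrative values only).
p_meta = fisher_combined_p([0.05, 0.05])
```

Because only the p-values enter the statistic, the result does not depend on how the betas are scaled in either sex-specific regression, which is the property motivating its use above.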

Overview of GWAS Results
We conducted a genome-wide association study using the 39,691 brain imaged samples in UK Biobank. We divided these samples into a discovery (N=22,138) and a replication (N=11,086) cohort. The details for the imaging and genetics processing and the cohorts are given in Online Methods. We applied automated methods for identifying local peak associations for each phenotype, and also for aggregating peaks across phenotypes into clusters (described in Appendix A of the Supplementary Material). A cluster is a set of phenotype/variant pairs such that all of the phenotype/variant pairs have a −Log10(P) value for association that exceeds a 7.5 genome-wide significance threshold, and such that all of the pairs have variants that are close with respect to genetic distance. We assigned each of the phenotype/variant pairs to one and only one cluster. We defined a cluster as replicating if at least one of the phenotype/variant pairs had nominal significance in our replication cohort.
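The cluster and replication definitions above can be sketched as follows. This is only an illustration under stated assumptions: the actual algorithm is the one described in Appendix A of the Supplementary Material and may differ. The `Peak` class, the function names, the 0.25 cM window and the reading of nominal significance as P < 0.05 are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass

GWAS_THRESHOLD = 7.5                      # -log10(P) threshold from the text
MAX_SPAN_CM = 0.25                        # assumed genetic-distance window
NOMINAL_NEG_LOG10_P = -math.log10(0.05)   # assumed nominal level (P < 0.05)


@dataclass
class Peak:
    phenotype: str
    variant: str
    chrom: str
    pos_cm: float          # genetic position in centimorgans
    discovery_nlp: float   # -log10(P) in the discovery cohort
    replication_nlp: float # -log10(P) in the replication cohort


def cluster_peaks(peaks):
    """Greedily group significant peaks so that each cluster spans at most
    MAX_SPAN_CM on one chromosome; every peak lands in exactly one cluster."""
    significant = sorted(
        (p for p in peaks if p.discovery_nlp >= GWAS_THRESHOLD),
        key=lambda p: (p.chrom, p.pos_cm),
    )
    clusters = []
    for peak in significant:
        first = clusters[-1][0] if clusters else None
        if (first is not None and first.chrom == peak.chrom
                and peak.pos_cm - first.pos_cm <= MAX_SPAN_CM):
            clusters[-1].append(peak)
        else:
            clusters.append([peak])
    return clusters


def replicates(cluster):
    """A cluster replicates if at least one pair is nominally significant
    in the replication cohort."""
    return any(p.replication_nlp >= NOMINAL_NEG_LOG10_P for p in cluster)


# Made-up peaks for illustration; these are not values from the study.
example_peaks = [
    Peak("IDP_A", "snp1", "X", 10.00, 12.0, 3.0),
    Peak("IDP_B", "snp2", "X", 10.10, 8.0, 0.5),
    Peak("IDP_C", "snp3", "X", 11.00, 9.0, 0.2),
    Peak("IDP_D", "snp4", "X", 10.05, 5.0, 4.0),  # below the GWAS threshold
]
example_clusters = cluster_peaks(example_peaks)
```

Measuring distance from the first peak of each cluster, rather than from the previous peak, keeps every pair within a cluster inside the window, which matches the requirement that all pairs be close in genetic distance.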
With these methods, we found 10,889 peak associations among all phenotypes and chromosomes (8,446 replicating at nominal significance), and found 1,282 clusters (693 replicating) after clustering the peak associations according to our automated methods (the number of replicating clusters reported in Elliott et al. [2018] was 148). The 693 replicating clusters are distributed across all chromosomes, with between 8 and 60 clusters per chromosome. We grouped the IDPs into 17 categories (Supplementary Table 1). Of the replicated associations among these 693 clusters, 16 out of 17 categories are represented (the task fMRI activation category is the only category without at least one associated SNP). The number of associations per category ranges between 12 for the category volume of white matter hyperintensities (lesions), which consists of just one IDP, and 1,954 for the regional and tissue volume category. All of these associations are listed in Supplementary Table S4, and Manhattan plots along with quantile plots are provided on the BIG40 open web server.
Of the clusters of associations, 38 are on the X chromosome (12 replicating), and 4 of the X chromosome clusters have a phenotype/variant pair with a −Log10(P) value for association that exceeds the more stringent Bonferroni-corrected level of −Log10(P) ≥ 11.1.
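The 11.1 level appears consistent with adding log10 of the number of phenotypes tested (3,935) to the 7.5 genome-wide threshold. The short sketch below shows this arithmetic; the function name is hypothetical, and the interpretation of the correction as a Bonferroni adjustment over the number of IDPs is taken from the Discussion.

```python
import math


def bonferroni_neg_log10_threshold(base_neg_log10_p: float, n_tests: int) -> float:
    """Dividing a p-value threshold by n_tests is the same as adding
    log10(n_tests) to the threshold on the -log10(P) scale."""
    return base_neg_log10_p + math.log10(n_tests)


threshold = bonferroni_neg_log10_threshold(7.5, 3935)
print(round(threshold, 1))  # 11.1
```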

X Chromosome Results - Overview and Sex-specific Tests
Full details for the lead associations for the X chromosome clusters (including clusters that do not replicate) are provided in Supplementary Table S2. A summary of all of the peak associations included in the X chromosome clusters is provided in Supplementary Table S3, and the full results for peak associations on all chromosomes are provided in Supplementary Table S4. In these tables, cluster numbers are given in the first column, and clusters are ordered based on the chromosome number (in ascending order with the X chromosome first) and then by the −Log10(P) value of the lead association (in descending order). A summary of all replicating X chromosome clusters is provided in Table 1, and further details (genes and eQTL information) of the four Bonferroni-significant clusters are provided in Table 2. Figure 1 shows Manhattan plots for the lead associations in these 4 top X chromosome clusters. These top four clusters are explored further below.
Genetic sex affects the brain in fundamental ways [Nugent and McCarthy, 2011, Ruigrok et al., 2014, Saleem and Rizvi, 2017, Nguyen et al., 2019]. Our main GWAS analyses include sex as one of the confound variables.
To assess the quality of this deconfounding, and to explore associations on the X chromosome that are driven by sex, we conducted two additional GWAS in which we restricted our discovery cohort to just genetic females and (separately) just genetic males. We then combined these two additional GWAS in a meta-analysis using Fisher's method [Fisher, 1948]. Clusters in our main analysis that are significant in the meta-analysis but not significant in one of the sex-specific scans may indicate sex-driven associations. Of the 12 replicating X chromosome clusters in the main (all subjects pooled) analysis, one cluster (Cluster 2) is significant at the 11.1 level for one genetic sex, but not significant for the other genetic sex; Cluster 2 may therefore be driven by genetic males. To provide more direct evidence for this, we performed two-tailed z-tests to determine if the beta coefficients differ significantly between the genetic sexes [Clogg et al., 1995]. For Clusters 1 and 2, we found that the beta coefficients are nominally different (Cluster 1: P=1.2×10⁻², beta coefficient for genetic females: -0.14, beta coefficient for genetic males: -0.08; Cluster 2: P=1.0×10⁻², beta coefficient for genetic females: 0.07, beta coefficient for genetic males: 0.13). The differences between the sex-specific beta coefficients for the lead associations of Clusters 3 and 4 were not significant.
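A two-tailed z-test for the difference between two independently estimated regression coefficients can be sketched as below. This is a minimal illustration, assuming the usual form z = (b1 - b2) / sqrt(se1² + se2²); the function name and the numbers in the usage line are hypothetical, since standard errors are not reported in this section.

```python
import math


def beta_difference_z_test(beta_a: float, se_a: float,
                           beta_b: float, se_b: float):
    """Two-tailed z-test for equality of two independent regression betas.

    Returns the z statistic and the two-tailed p-value under a standard
    normal null distribution.
    """
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-tailed normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical betas and standard errors, not values from the study.
z_stat, p_value = beta_difference_z_test(1.0, 0.3, 0.4, 0.4)
```

The independence assumption is reasonable here because the female-only and male-only GWAS use disjoint sets of participants.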
Finally, we created an additional set of clusters (using the clustering method described above), based on the p-values of the meta-analysis of the X chromosome (thresholding at the genome-wide significance level).
The clustering of the meta-analysis X chromosome scan produced 23 clusters. Twenty of these had lead associations within 0.25cM of one of the discovery cohort clusters derived from pooling all subjects of both sexes (indicating an overlap of those clusters). Each of the four discovery cohort X chromosome clusters with lead association at the Bonferroni level (the first four rows of Table 1) overlaps with a meta-analysis cluster, suggesting that these main clusters are not confounded by genetic sex. The remaining three meta-analysis clusters do not overlap with any of the discovery cohort clusters. The lead-rsid/lead-phenotype pairs of these three non-overlapping clusters are rs5990961/V3742, rs142994659/V1233 and rs764953454/V3919 (the mapping between the phenotype numbers and phenotype names is given in Supplementary Table S1). None of these three meta-analysis clusters achieved Bonferroni significance, however.
The differences in sensitivity between original (all subjects pooled) GWAS, sex-specific GWAS and Fisher meta-analysis are visualised concisely in Figure 2. The histograms show the distributions of paired-difference −Log10(P) values. For the sex-specific comparisons, there are SNP-phenotype pairs having reduced sensitivity compared with the original all-subjects GWAS (likely due to reduced statistical power because of reduced subject numbers), and other pairs with increased sensitivity (likely because a given association is stronger for the sex in question than for the other sex). The meta-analysis paired-difference distribution demonstrates that sex-separated GWAS followed by meta-analysis gives increased sensitivity to finding genetic associations in the X chromosome.
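The paired differences summarised in Figure 2 can be computed along the following lines. This sketch assumes that each difference is taken as the comparison scan minus the pooled discovery scan, and it applies the filter stated in the Figure 2 caption (maximum −Log10(P) over the four analyses greater than 4.0). The function name and the input layout are hypothetical.

```python
def paired_differences(pooled, females, males, meta, min_max_nlp=4.0):
    """Return -log10(P) differences versus the pooled scan for each
    variant/phenotype pair whose maximum over the four analyses exceeds
    min_max_nlp. All inputs are equal-length sequences of -log10(P) values."""
    diffs = {"females": [], "males": [], "meta": []}
    for d, f, m, c in zip(pooled, females, males, meta):
        if max(d, f, m, c) > min_max_nlp:
            # Assumed sign convention: comparison scan minus pooled scan.
            diffs["females"].append(f - d)
            diffs["males"].append(m - d)
            diffs["meta"].append(c - d)
    return diffs


# Made-up -log10(P) values for three variant/phenotype pairs.
example = paired_differences(
    pooled=[5.0, 1.0, 3.0],
    females=[3.0, 0.5, 4.5],
    males=[4.0, 2.0, 2.0],
    meta=[6.0, 1.5, 4.0],
)
```

Positive values then indicate pairs for which the comparison scan is more sensitive than the pooled scan, and negative values indicate reduced sensitivity.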

Investigation of the Four Main X Chromosome Clusters
We now examine the four main X chromosome clusters in greater detail; these additional investigations are summarized in Supplementary Table S5.
Cluster 1 comprises 5 SNPs, associated in total with 96 IDPs, all capturing differences in the properties of white matter tracts distributed throughout the cerebrum. This is described more in Table 2 and Supplementary Table S5. The top SNP (rs2272737, P=3.5×10⁻²¹) is located about 10 kb away from, and is an eQTL of, FAM58A (or CCNQ). Mutations in this gene lead to STAR (syndactyly, telecanthus and anogenital and renal malformations), a recently identified rare X-linked developmental disorder [Unger et al., 2008], for which notable brain variations have been observed such as incomplete hippocampal inversion, thin corpus callosum, ventriculomegaly and cerebellar hypoplasia [Bedeschi et al., 2017, Orge et al., 2016]. In addition, while FAM58A codes for an orphan cyclin with undescribed function, it has been shown recently to interact with CDK10 [Guen et al., 2013]. Of particular relevance considering the many white matter IDPs associated with Cluster 1, mutation of the gene CDK10 has been observed in a case study to lead to a rudimentary corpus callosum and paucity of white matter surrounding the lateral ventricles [Guen et al., 2018].
The SNPs of Cluster 1 are further associated with an array of non-imaging-derived phenotypes (nIDPs) largely related to health (including diagnosed diseases and operative procedures), as well as some variables not necessarily health-related (Supplementary Table S5). Interestingly, one SNP in Cluster 1 (rs1894299) was seen previously in a GWAS of Type 2 diabetes [Suzuki et al., 2019]. This SNP is located in an intron of DUSP9 (MKP4), a gene that codes for a phosphatase whose overexpression specifically protects against stress-induced insulin resistance [Emanuelli et al., 2008]. This may be related to another consistent aspect of these nIDP associations with Cluster 1: the diet of UK Biobank participants with intake of sweet food and drinks (including desserts, puddings, beer and cider).

Figure 1:
Black dots indicate associations that are significant at the genome-wide level, −Log10(P) ≥ 7.5. Grey lines show the genome-wide+Bonferroni level (11.1) and the genome-wide significance level (7.5).

Figure 2:
Paired difference histograms for the sex-specific scans. We plot histograms for the differences between the −Log10(P) values for: genetic females (top), genetic males (middle), and the meta-analysis (bottom), vs. the discovery scan (which pooled all subjects together, but did include a sex confound covariate).
Differences are plotted for all associations for which the maximum −Log10(P) value over the four analyses is greater than 4.0, leading to the bimodal nature of the histogram. A total of 989,981 variants pass this maximum filter. The bottom plot shows that there is greater statistical sensitivity when carrying out sex-specific GWAS on the X chromosome, and then pooling results with a meta-analysis, than by combining all subjects together in a simple standard GWAS.
Cluster 2 comprises 9 SNPs associated altogether with 17 IDPs, all of which are grey matter vs white matter intensity contrast, in limbic and temporal regions, and brain areas making up the default-mode network. The top SNP (rs62595479, P=8.2×10⁻¹⁷) is located in a pseudoautosomal region of chromosome X (i.e., a genetic region homologous between chromosomes X and Y), in an intron of DHRSX, and is an eQTL of the same gene. The genetic association with the grey-white intensity contrast IDP for this SNP was mainly driven by the male UK participants (as described above). The male-dominated aspect of the association between DHRSX and the brain has also been observed in a study showing that four PAR genes, including DHRSX (and SPRY3, see below), are up-regulated in the blood of genetic male patients with ischemic stroke [Tian et al., 2012].
While the majority of the nIDP associations with the SNPs of Cluster 2 were related to diagnoses and operative procedures, half of these pointed at thyroid-related issues, in addition to the nIDP of workplace temperature, which may be related to thyroid function (Supplementary Table S5). Remarkably, in both positron emission tomography and fMRI studies, the distribution of thyroid function modulation in the brain appears to consistently follow that of the 17 regions associated with Cluster 2: mainly limbic and temporal areas including the posterior cingulate cortex, orbitofrontal cortex, parahippocampal and fusiform gyrus [Miao et al., 2011, Schreckenberger et al., 2006, Zhang et al., 2014, Göttlich et al., 2015].
Cluster 3 includes 9 SNPs associated with 28 IDPs of local brain volume. All 28 IDPs are located in the occipital lobe except for the volume of the brainstem and fourth ventricle. The top SNP (rs644138, P=4.8×10⁻¹⁵) is located in a PAR in an intron of SPRY3. It is also an eQTL in many brain regions of a variety of genes whose mutations are involved in developmental and neurodevelopmental disorders: RAB39B, which plays a role in normal neuronal development and dendritic process, is associated with cognitive impairment [Vanmarsenille et al., 2014], X-linked intellectual disability [Giannandrea et al., 2010] and Waisman syndrome in particular, an X-linked neurologic disorder characterised by delayed psychomotor development, impaired intellectual development, and early-onset Parkinson's disease [Wilson et al., 2014]; TMLHE is associated with X-linked autism [Celestino-Soper et al., 2011]; CLIC2, with X-linked intellectual disability [Takano et al., 2012]; and BRCC3, with an X-linked recessive syndromic form of moyamoya disease [Miskinyte et al., 2011]. It is also an eQTL of F8, and F8A1 (DXS522E/HAP40), a likely candidate for the aberrant nuclear localisation of mutant huntingtin in Huntington's disease. The distribution of the brain IDPs in the occipital lobe perhaps lends additional credence to the consistent, but not yet understood, observation of volumetric and sulcal differences in these visual grey matter regions in Huntington's disease gene carriers [Rosas et al., 2008, Mangin et al., 2020].
Another aspect of the genes for which the top SNP of Cluster 3 is an eQTL is cardiovascular issues: for instance, mutant CLIC2 leads to atrial fibrillation, cardiomegaly and congestive heart failure [Takano et al., 2012], F8 encodes a large plasma glycoprotein that functions as a blood coagulation factor, whereas mutations in BRCC3 are linked to moyamoya syndrome, a rare blood vessel disorder in which certain arteries in the brain are blocked or constricted, and that is accompanied by other symptoms including hypertension, dilated cardiomyopathy and premature coronary heart disease [Miskinyte et al., 2011]. In line with this, we found that Cluster 3 was associated in UKB participants with diagnosis of atrioventricular block and ventricular premature depolarisation, as well as an operative procedure consisting of the replacement of two coronary arteries. In addition, Cluster 3 was consistently associated with many measures of physical growth, including height, lung function and capacity and body mass (Supplementary Table S5), perhaps in line with the role of SPRY3, which functions as a fibroblast growth factor antagonist in vertebrate development.
Finally, Cluster 4 comprises 8 SNPs associated with volume of grey matter regions in the dorsolateral prefrontal cortex and the lateral parietal cortex (supramarginal gyrus and opercular cortex). The top SNP in this cluster (rs12843772, P=5.1×10⁻¹²) is located fewer than 150 base pairs from ZIC3, which plays a key role in body pattern formation and left-right asymmetry. Mutations in this gene are thought to be involved in 1% of heterotaxy (situs ambiguous and inversus) in humans [Ware et al., 2004]. This may explain why the distribution of the higher-order grey matter regions associated with Cluster 4 follows the pattern of the fronto-parietal networks, which are notoriously left-vs-right segregated [Witt et al., 2020]. In particular, the supramarginal gyrus is known to show the strongest asymmetries from an early developmental stage [Dubois et al., 2010], and to be connected by white matter tracts that share a genetic influence with human handedness [Wiberg et al., 2019]. ZIC3 is also involved in neural tube development and closure, and mutations in this gene cause, in addition to neural tube defects and cerebellum hypoplasia, consistent histological brain alterations with abnormal laterality and axial patterning, including a disorganised cerebral cortex [Purandare et al., 2002].
Cluster 4 is strongly associated with IGF-1 levels in the blood of UK Biobank participants (Supplementary Table S5). IGF-1 controls brain development, plasticity and repair [Dyer et al., 2016]. More recently, it has emerged as a risk factor for dementia and particularly Alzheimer's disease [Westwood et al., 2014], as it is a major regulator of Aβ physiology, and controls Aβ clearance from the brain [Bates et al., 2009].

Discussion
A major component in the expansion of the UK Biobank prospective epidemiological resource is the addition of tens of thousands of newly imaged participants, and the increase in the richness of phenotypes that can be derived from the imaging data. Since we published our first large-scale GWAS of UKB brain imaging in 2018, the brain imaging has almost reached its halfway point, having now scanned nearly 50,000 volunteers. As a result, the size of the available discovery sample for GWAS has nearly tripled, and the number of genetic variants passing reliability thresholds (such as minor allele frequency) has increased by 45%, now reaching 17 million. We believe therefore that this is a good time to update our large-scale GWAS, now with almost 4,000 imaging-derived phenotypes: thousands of distinct measures of brain structure, function, connectivity and microstructure. The number of replicated clusters of imaging-genetic associations identified has more than quadrupled since our previous work. We have made all of the GWAS summary statistics openly available via the new "BIG40" brain imaging genetics server.
We also studied brain imaging associations in the X chromosome. We identified 12 clusters of association that replicate at the genome-wide significance level of −Log10(P) ≥ 7.5, four of which replicate at the GWAS+Bonferroni level (increasing the standard 7.5 level according to Bonferroni correction for the number of IDPs tested). Among these four top X chromosome clusters, we find associations involving diffusion measures distributed in white matter tracts and grey/white matter intensity contrasts. We also find associations involving occipital lobe grey matter, and fronto-parietal grey matter. These associations are relevant to a diverse set of variations in brain development or pathologies including the recently identified STAR condition (syndactyly, telecanthus and anogenital and renal malformations), Waisman syndrome, early-onset Parkinson's disease, X-linked autism, Alzheimer's disease and Huntington's disease. Cardiovascular conditions such as ischemic stroke, moyamoya syndrome and premature coronary heart disease are also implicated. The X chromosome genes FAM58A (CCNQ), DHRSX, SPRY3, F8A1, F8, BRCC3, TMLHE, RAB39B, CLIC2 and ZIC3 are involved in the associations we report.
The X chromosome is typically understudied in GWAS, and many of the X chromosome loci we identify are now implicated in genetic associations with brain imaging phenotypes for the first time. Ploidy, the Barr body and potential confounding with genetic sex complicate X chromosome analysis. To address this, we performed sex-specific GWAS, and a meta-analysis re-combining the sex-separated analyses. We showed significant overlap between the sex-specific meta-analysis and the main (all subjects pooled) analysis of the discovery cohort, providing evidence against confounding. This also allowed us to investigate if a given association was sex-driven (the association may be significant for only one genetic sex, or have significantly different effect sizes between the two sex-separated GWAS). For example, we found evidence that our Cluster 2 of brain-gene associations is driven by associations in males.
Enhancing our understanding of the mapping between genotype and phenotype leads to advances in neuroscience and improvements in outcomes for brain pathologies. Deeply phenotyped resources such as UKB provide an opportunity to update known genotype/phenotype mappings for a wide spectrum of brain imaging phenotypes, as more samples and more phenotypes are released. It is crucial that such updates are provided on open platforms. We have done that here with the open BIG40 server (with a total of 1,282 clusters of associations), and with freely available summary statistics. We hope that this dissemination can be valuable for the next generation of neuroscience research.