Discovery of the first genome-wide significant risk loci for ADHD

Abstract
Attention-Deficit/Hyperactivity Disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of school-age children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no individual variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 ADHD cases and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, revealing new and important information on the underlying biology of ADHD.
Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes, as well as around brain-expressed regulatory marks. These findings, based on clinical interviews and/or medical records, are supported by additional analyses of a self-reported ADHD sample and a study of quantitative measures of ADHD symptoms in the population. Meta-analyzing these data with our primary scan yielded a total of 16 genome-wide significant loci. The results support the hypothesis that clinical diagnosis of ADHD is an extreme expression of one or more continuous heritable traits.

Background
Attention-Deficit/Hyperactivity Disorder (ADHD) is a neurodevelopmental psychiatric disorder that affects around 5% of children and adolescents and 2.5% of adults worldwide 1 . ADHD is often persistent and markedly impairing, with increased risk of harmful outcomes such as injuries 2 , traffic accidents 3 , increased health care utilization 4,5 , substance abuse 6 , criminality 7 , unemployment 8 , divorce 4 , suicide 9 , AIDS risk behaviors 8 , and premature mortality 10 .
Epidemiologic and clinical studies implicate, in the etiology of ADHD, genetic and environmental risk factors that affect the structure and functional capacity of brain networks involved in behavior and cognition 1 .
Consensus estimates from over 30 twin studies indicate that the heritability of ADHD is 70-80% throughout the lifespan 11,12 and that environmental risks are those not shared by siblings 13 . Twin studies also suggest that diagnosed ADHD represents the extreme tail of one or more heritable quantitative traits 14 . Additionally, family and twin studies report genetic overlap between ADHD and other conditions including antisocial personality disorder/behaviours 15 , cognitive impairment 16 , autism spectrum disorder 17,18 , schizophrenia 19 , bipolar disorder 20 , and major depressive disorder 21 .
Thus far, genome-wide association studies (GWASs) to identify common DNA variants that increase the risk of ADHD have not been successful 22 . Nevertheless, genome-wide SNP heritability estimates range from 0.10 to 0.28 23,24 , supporting the notion that common variants comprise a significant fraction of the risk underlying ADHD 25 and that with increasing sample size, and thus increasing statistical power, genome-wide significant loci will emerge.
Previous studies have demonstrated that the common variant risk, also referred to as the single nucleotide polymorphism (SNP) heritability, of ADHD is also associated with depression 25 , conduct problems 26 , schizophrenia 27 , continuous measures of ADHD symptoms 28,29 and other neurodevelopmental traits 29 in the population. Genetic studies of quantitative ADHD symptom scores in children further support the idea that ADHD is the extreme of a quantitative trait 30 .
Here we present a genome-wide meta-analysis identifying the first genome-wide significant loci for ADHD using a combined sample of 55,374 individuals from an international collaboration.
We also strengthen the case that the clinical diagnosis of ADHD is the extreme expression of one or more heritable quantitative traits, at least as it pertains to common variant genetic risk, by integrating our results with previous GWAS of ADHD-related behavior in the general population.

Genome-wide significantly associated ADHD risk loci
Genotype array data for 20,183 ADHD cases and 35,191 controls were collected from 12 cohorts. In total, 304 genetic variants in 12 loci surpassed the threshold for genome-wide significance (P < 5 × 10⁻⁸; Figure 1, Table 1, Table 2).

Homogeneity of effects between cohorts
No genome-wide significant heterogeneity was observed in the ADHD GWAS meta-analysis (see Supplementary Information). Genetic correlation analysis (see Online Methods) provided further evidence that effects were consistent across different cohort study designs. The estimated genetic correlation between the European ancestry PGC samples and the iPSYCH sample from LD score regression 37 was not significantly less than 1 (r g = 1.17, SE = 0.2). The correlation between European ancestry PGC case/control and trio cohorts estimated with bivariate GREML was close to one (r g = 1.02; SE = 0.32).
Polygenic risk scores (PRS) 38 also show consistency over target samples. PRS computed in each PGC study using iPSYCH as the training sample were consistently higher in ADHD cases as compared to controls or pseudo-controls (see Supplementary Figure 11). Increasing deciles of PRS in the PGC were associated with higher odds ratio (OR) for ADHD (Figure 2). A similar pattern was seen in five-fold cross validation in the iPSYCH sample, with PRS for each subset computed from the other four iPSYCH subsets and the PGC samples used as training samples (see Online Methods; Figure 2). Across iPSYCH subsets, the mean of the maximum variance explained by the estimated PRS (Nagelkerke's R²) was 5.5% (SE = 0.0012). The difference in standardized PRS between cases and controls was stable across iPSYCH subsets (OR = 1.56, 95% confidence interval (CI): 1.53-1.60; Supplementary Figure 9). These results further support the highly polygenic architecture of ADHD and demonstrate that the risk is significantly associated with the individual PRS burden in a dose-dependent manner.

Polygenic Architecture of ADHD
To assess the proportion of phenotypic variance explained by common variants we applied LD score regression 37 in the European ancestry meta-analysis (Online Methods). Assuming a population prevalence of 5% for ADHD 39  To further characterize the patterns of heritability from the genome-wide association data, we performed partitioning based on the functional annotations described in Finucane et al. 40

Genetic correlation with other traits
Pairwise genetic correlation with ADHD was estimated for 220 phenotypes using LD score

Biological annotation of significant loci
For the 12 genome-wide significant loci, Bayesian credible sets were defined to identify the set of variants at each locus most likely to include a causal effect (see Online Methods, Supplementary eTable 1). Biological annotations of the variants in the credible set were then considered to identify functional or regulatory variants, common chromatin marks, and variants associated with gene expression (eQTLs) or in regions with gene interactions observed in Hi-C data (see Online Methods, Supplementary eTable 2). Broadly, the significant loci do not coincide with candidate genes proposed to play a role in ADHD 58 .
Here we highlight genes that are identified in the regions of association (see also Supplementary Table 4). The loci on chromosomes 2, 7, and 10 each have credible sets localized to a single gene with limited additional annotations. In the chromosome 7 locus, FOXP2 encodes a forkhead/winged-helix transcription factor and is known to play an important role in synapse formation and neural mechanisms mediating the development of speech and learning 59-61 .
Comorbidity of ADHD with specific developmental disorders of language and learning is common (7-11%) 62,63 , and poor language skills have been associated with higher inattention/hyperactivity symptoms in primary school 64 . Genome

Analysis of gene sets
Competitive gene-based tests were performed for FOXP2 target genes, highly constrained genes, and for all Gene Ontology terms 81 from MsigDB 6.0 82 using MAGMA 83 (Online Methods).
Association results for individual genes are consistent with the genome-wide significant loci for the GWAS (Supplementary Table 5). Consistent with the partitioning of heritability, a set of 2,932 genes that are highly constrained and show high intolerance to loss of function 86 showed significant association with ADHD (beta = 0.062, P = 2.6 × 10⁻⁴). We also find little evidence for effects in previously proposed candidate genes for ADHD 58 ; of the nine proposed genes, only SLC9A9 showed weak association with ADHD (P = 3.4 × 10⁻⁴; Supplementary Table 6). None of the Gene Ontology gene sets were significant after correction for multiple testing, but the top pathways did include nominally significant pathways of interest such as "dopamine receptor binding" (P = 0.0010) and "Excitatory Synapse" (P = 0.0088) (Supplementary eTable 4).

Replication of GWAS loci
Here we describe the comparison of the GWAS meta-analysis of ADHD with two other ADHD-related GWASs: a 23andMe self-report cohort (5,857 cases and 70,393 controls) and a meta-analysis of childhood rating scales of ADHD symptoms performed by the EAGLE consortium (17,666 children < 13 years of age) 30 . We observed moderate concordance of genome-wide results between the ADHD GWAS and the cohort with a self-reported history of diagnosis for ADHD or Attention Deficit Disorder genotyped by 23andMe (see Supplementary Information).
The estimated genetic correlation between the two analyses was strong (r g =0.653, SE=0.114), but significantly less than 1 (P=2.   Table 13). In EAGLE, the direction of effect is concordant for 10 of 11 genome-wide significant loci from the ADHD GWAS meta-analysis (P =0.0159).
We then meta-analyzed the ADHD GWAS with the 23andMe and EAGLE results.  (Table 1). This is within the range of effect sizes for common genetic variants that has been observed for other highly polygenic psychiatric disorders e.g. schizophrenia 33 .

The current results further support the hypothesis that ADHD is the extreme expression of one or more heritable quantitative traits. We observe strong concordance between the GWAS of ADHD and previous GWAS of ADHD-related traits in the population from the EAGLE Consortium 30 , both in terms of genome-wide genetic correlation and concordance at individual loci. Polygenic risk for ADHD has previously been associated with inattentive and hyperactive/impulsive trait variation below clinical thresholds in the population 29 . Shared genetic risk with health risk behaviors may similarly be hypothesized to reflect an impaired ability to self-regulate and inhibit impulsive behavior 95,96 .
In summary, we report 12 independent genome-wide significant loci associated with ADHD in

GWAS meta-analysis
Quality control, imputation and primary association analyses were done using the bioinformatics pipeline Ricopili (available at https://github.com/Nealelab/ricopili), developed by the Psychiatric Genomics Consortium (PGC) 33 . In order to avoid potential study effects the 11 PGC samples and the 23 genotyping batches within iPSYCH were each processed separately unless otherwise stated (see Supplementary Information).
Stringent quality control was applied to each cohort following standard procedures for GWAS, including filters for call rate, Hardy-Weinberg equilibrium, and heterozygosity rates (see Supplementary Information). Each cohort was then phased and imputed using the 1000 Genomes Project phase 3 (1KGP3) 34,97 imputation reference panel using SHAPEIT 98 and IMPUTE2 99 , respectively. For trio cohorts, pseudocontrols were defined from phased haplotypes prior to imputation.
Cryptic relatedness and population structure were evaluated using a set of high quality markers pruned for linkage disequilibrium (LD). Genetic relatedness was estimated using PLINK. Variants were included if they had imputation quality (INFO score) > 0.8 and MAF > 0.01.
Only markers supported by an effective sample size $N_{eff} = 2/(1/N_{cases} + 1/N_{controls})$ 103 greater than 70% were included. After filtering, the meta-analysis included results for 8,047,421 markers.
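The effective sample size filter can be illustrated with a minimal Python sketch. It assumes that the 70% threshold is taken relative to the total effective sample size summed over cohorts, which the text does not state explicitly; the function names and the per-cohort counts in the usage comment are hypothetical.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """N_eff = 2 / (1/N_cases + 1/N_controls), as defined in the text."""
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 2.0 / (1.0 / n_cases + 1.0 / n_controls)


def total_effective_sample_size(cohorts) -> float:
    """Sum N_eff over (n_cases, n_controls) pairs, one pair per cohort."""
    return sum(effective_sample_size(cases, controls) for cases, controls in cohorts)


def passes_neff_filter(marker_neff: float, total_neff: float,
                       min_fraction: float = 0.70) -> bool:
    """Keep a marker only if its supporting N_eff exceeds min_fraction of the total.

    Assumption: the threshold is a fraction of the summed N_eff of all cohorts.
    """
    return marker_neff > min_fraction * total_neff


# Usage with hypothetical cohort sizes (not taken from the study):
# all_cohorts = [(1000, 2000), (500, 500), (3000, 9000)]
# total = total_effective_sample_size(all_cohorts)
# present = total_effective_sample_size(all_cohorts[:2])  # marker missing in one cohort
# passes_neff_filter(present, total)
```

Applied to the full sample as a single unit, `effective_sample_size(20183, 35191)` gives roughly 25,653, although the analysis itself would work with per-cohort values.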

Conditional analysis
Twelve independent genome-wide significant loci were identified by LD clumping and merging loci within 400 kb (see Supplementary Information). In two of these loci a second index variant persisted after LD clumping. The two putative secondary signals were evaluated by analysis conditional on the lead index variant in each locus. In each cohort, logistic regression was performed with the imputed genotype dosage for the lead index variant included as a covariate. All covariates from the primary GWAS (e.g. principal components) were also included. The conditional association results were then combined in an inverse-variance weighted meta-analysis.
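The final combination step can be sketched as a standard fixed-effects inverse-variance weighted meta-analysis. This is a generic illustration rather than the Ricopili implementation; the function name is hypothetical, and the inputs are assumed to be per-cohort log odds ratios and standard errors from the conditional logistic regressions.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effects inverse-variance weighted meta-analysis.

    betas: per-cohort effect estimates (e.g. log odds ratios).
    standard_errors: matching per-cohort standard errors.
    Returns (pooled beta, pooled SE, z-score, two-sided P-value).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need equally sized, non-empty inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]  # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled_beta, pooled_se, z, p_value
```

Cohorts with smaller standard errors receive proportionally more weight, so larger cohorts dominate the pooled estimate.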

Genetic correlations between ADHD samples
Genetic correlation between the European-ancestry PGC and iPSYCH GWAS results was calculated using LD Score regression 37

Polygenic Risk Scores for ADHD
The iPSYCH sample was split into five groups, and five leave-one-out association analyses were then conducted, each using four of the five groups together with the PGC samples as training datasets 38 . PRS were estimated for each target sample using variants passing a range of association P-value thresholds in the training samples. PRS were calculated by multiplying the natural log of the odds ratio of each variant by the allele-dosage (imputation probability) and whole-genome polygenic risk scores were obtained by summing values over variants for each individual.
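The scoring rule described above can be sketched for a single individual as follows. The dictionary-based data layout, the function name, and the variant identifiers in the usage comment are hypothetical; in practice, variants would first be LD-clumped and scored with dedicated software.

```python
import math


def polygenic_risk_score(dosages, odds_ratios, p_values, p_threshold):
    """Sum ln(OR) * allele dosage over variants passing a P-value threshold.

    dosages: variant ID -> imputed dosage of the effect allele (0 to 2) in the
             target individual.
    odds_ratios, p_values: variant ID -> training-sample OR and association P.
    Variants missing from the training results are skipped.
    """
    score = 0.0
    for variant, dosage in dosages.items():
        if variant not in odds_ratios or variant not in p_values:
            continue
        if p_values[variant] < p_threshold:
            score += math.log(odds_ratios[variant]) * dosage
    return score


# Usage with made-up values:
# dosages = {"var1": 2.0, "var2": 1.0, "var3": 0.5}
# ors = {"var1": 1.1, "var2": 0.9, "var3": 1.2}
# pvals = {"var1": 1e-6, "var2": 0.04, "var3": 0.5}
# polygenic_risk_score(dosages, ors, pvals, p_threshold=0.05)
```

Repeating the calculation over a grid of thresholds yields one score per threshold for each individual, matching the range of P-value thresholds described in the text.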
For each of the five groups of target samples, PRS were normalized and the significance of the case-control score difference was tested by standard logistic regression including principal components. For each target group and for each P-value threshold the proportion of variance explained (i.e. Nagelkerke's R²) was estimated by comparing the regression with PRS to a reduced model with covariates only. The OR for ADHD within each PRS decile group was estimated based on the normalized score across groups (using the P-value threshold with the highest Nagelkerke's R² within each target group) (Figure 3). OR was also estimated using logistic regression on the continuous scores for each target group separately and an OR based on all samples using the normalized PRS score across all groups (Supplementary Figure 9).
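A minimal sketch of the variance-explained calculation is given below. It assumes that the covariates-only model takes the place of the null model in Nagelkerke's formula, which is one common way of implementing the comparison described above; the log-likelihoods are assumed to come from any logistic regression software, and the function name is hypothetical.

```python
import math


def nagelkerke_r2(loglik_reduced: float, loglik_full: float, n: int) -> float:
    """Nagelkerke's pseudo-R^2 of a full model relative to a reduced model.

    loglik_reduced: maximized log-likelihood of the covariates-only model.
    loglik_full: maximized log-likelihood of the model that also includes PRS.
    n: number of individuals in the target sample.
    """
    # Cox-Snell R^2 from the likelihood ratio of the two models.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_reduced - loglik_full) / n)
    # Rescale by the maximum value Cox-Snell R^2 can attain.
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_reduced / n)
    return cox_snell / max_cox_snell
```

Because the reduced model is nested in the full model, the result is non-negative and equals zero when adding PRS does not improve the likelihood.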
Additionally PRS were evaluated in the PGC samples using the iPSYCH sample as training sample, following the approach described above (see Supplementary Information).

SNP heritability and intercept evaluation
LD score regression 37 was used to evaluate the relative contribution of polygenic effects and confounding factors, such as cryptic relatedness and population stratification, to deviation from the null in the genome-wide distribution of GWAS χ² statistics. Analysis was performed using pre-computed LD scores from European-ancestry samples in the 1000 Genomes Project (available on https://github.com/bulik/ldsc) and summary statistics for the European-ancestry ADHD GWAS to ensure matching of population LD structure. The influence of confounding factors was tested by comparing the estimated intercept of the LD score regression to one, its expected value under the null hypothesis of no confounding from e.g. population stratification.
The ratio between this deviation and the deviation of the mean χ² from one (i.e. its expected value under the null hypothesis of no association) was used to estimate the proportion of inflation in χ² attributable to confounding as opposed to true polygenic effects (ratio = (intercept - 1)/(mean χ² - 1)). SNP heritability was estimated based on the slope of the LD score regression, with heritability on the liability scale calculated assuming a 5% population prevalence of ADHD 39 .
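The two summary quantities in this section can be sketched as below. The ratio follows the definition given in the text. The liability-scale conversion is not spelled out in the text, so the version here is the commonly used transformation for ascertained case-control samples and should be read as an assumption; the function names and input values are hypothetical.

```python
from statistics import NormalDist


def confounding_ratio(intercept: float, mean_chi2: float) -> float:
    """Share of chi-square inflation attributed to confounding:
    (intercept - 1) / (mean chi2 - 1)."""
    return (intercept - 1.0) / (mean_chi2 - 1.0)


def liability_scale_h2(h2_observed: float, sample_prevalence: float,
                       population_prevalence: float) -> float:
    """Convert observed-scale SNP heritability to the liability scale.

    Assumed (standard) transformation:
    h2_liab = h2_obs * K^2 (1 - K)^2 / (P (1 - P) z^2),
    where K is the population prevalence, P is the proportion of cases in the
    sample, and z is the standard normal density at the liability threshold.
    """
    k, p = population_prevalence, sample_prevalence
    threshold = NormalDist().inv_cdf(1.0 - k)
    z = NormalDist().pdf(threshold)
    return h2_observed * (k ** 2) * ((1.0 - k) ** 2) / (p * (1.0 - p) * z ** 2)


# Usage with made-up inputs:
# confounding_ratio(intercept=1.04, mean_chi2=1.20)
# liability_scale_h2(0.20, sample_prevalence=0.5, population_prevalence=0.05)
```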

Partitioning of the heritability
SNP heritability was partitioned by functional category and tissue association using LD score regression 40 . Partitioning was performed for 53 overlapping functional categories, as well as 220 cell-type-specific annotations grouped into 10 cell-type groups, as described in Finucane et al. 40 . For each cell-type group and each H3K4Me1 cell-type annotation, the contribution to SNP heritability was tested conditional on the baseline model containing the 53 functional categories.

Genetic correlations of ADHD with other traits
The genetic correlations of ADHD with other traits were evaluated using LD Score regression 42 .
For a given pair of traits, LD score regression estimates the expected population correlation between the best possible linear SNP-based predictor for each trait, restricting to common SNPs.
Such correlation of genetic risk may reflect a combination of colocalization, pleiotropy, shared biological mechanisms, and causal relationships between traits. Correlations were tested for 219 phenotypes with publicly available GWAS summary statistics using LD Hub 41 (see Supplementary Information). Correlation with Major Depressive Disorder was tested using GWAS results from an updated analysis of 130,664 cases and 330,470 controls from the Psychiatric Genomics Consortium (submitted). As in the previous LD score regression analyses, this estimation was based on summary statistics from the European GWAS meta-analysis, and significant correlations reported are for traits analysed using individuals with European ancestry.

Credible set analysis
We defined a credible set of variants in each locus using the method described by Maller et al. 107 (see Supplementary Information), implemented by a freely available R script

Biological annotation of variants in credible set
The variants in the credible set for each locus were annotated based on external reference data in order to evaluate potential functional consequences. In particular, we identify: (a) Gene and regulatory consequences annotated by Variant Effect Predictor (VEP) using Ensembl with genome build GRCh37 108 . We exclude upstream and downstream consequences, and consequences for transcripts that lack an HGNC gene symbol (e.g. vega genes). across 127 tissue/cell types was annotated using FUMA (http://fuma.ctglab.nl/). We also evaluated the annotated chromatin state from fetal brain.

Gene-set analyses
Gene-based association with ADHD was estimated with MAGMA 1.05 83 . A set of evolutionarily highly constrained genes was also analysed. The set of highly constrained genes was defined using a posterior probability of being loss-of-function intolerant (pLI) based on the observed and expected counts of protein-truncating variants (PTV) within each gene in a large study of over 60,000 exomes (the Exome Aggregation Consortium; ExAC) 86 . Genes with pLI ≥ 0.9 were selected as the set of highly constrained genes (2,932 genes).

Replication of GWAS loci
To replicate the results of the ADHD GWAS meta-analysis we compared the results to analyses from 23andMe and EAGLE. We evaluated evidence for replication based on: (a) genetic correlation between the ADHD GWAS and each replication cohort; (b) sign tests of concordance between the ADHD GWAS meta-analysis and each replication cohort; (c) meta-analysis of the ADHD GWAS meta-analysis results with the results from analyses of the replication cohorts; and (d) tests of heterogeneity in the meta-analyses of the ADHD GWAS meta-analysis with the replication cohorts.
Genetic correlations were calculated using LD score regression 37 with the same procedure as described above. For the sign test, LD clumping was performed for all variants with P < 1 × 10⁻⁴ in the ADHD GWAS meta-analysis using LD estimated from European ancestry individuals from 1000 Genomes Phase 3 data. The proportion of variants with a concordant direction of effect in the two replication samples (p) was evaluated using a one sample test of the proportion with Yates' continuity correction against a null hypothesis of p = 0.50 (i.e. the signs are concordant between the two analyses by chance). This test was done for loci passing P-value thresholds of P < 5 × 10⁻⁸, P < 1 × 10⁻⁷, P < 1 × 10⁻⁶, P < 1 × 10⁻⁵, and P < 1 × 10⁻⁴ in the ADHD GWAS meta-analysis (see Supplementary Information).
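The sign test can be sketched as a one-sample test of a proportion with Yates' continuity correction, evaluated against a chi-square distribution with one degree of freedom. The function name is hypothetical, and the two-sided form is an assumption, although it agrees with the P-value reported for EAGLE in the results.

```python
import math


def sign_test_yates(n_concordant: int, n_total: int, p0: float = 0.5):
    """One-sample proportion test with Yates' continuity correction.

    Returns (chi-square statistic with 1 df, two-sided P-value) for the null
    hypothesis that the concordance proportion equals p0.
    """
    if not 0 <= n_concordant <= n_total or n_total <= 0:
        raise ValueError("need 0 <= n_concordant <= n_total and n_total > 0")
    expected = n_total * p0
    deviation = abs(n_concordant - expected)
    correction = min(0.5, deviation)  # never correct past zero
    chi2 = (deviation - correction) ** 2 / (n_total * p0 * (1.0 - p0))
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For 10 concordant loci out of 11, this gives a statistic of about 5.82 and P ≈ 0.0159.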
We performed three meta-analyses based on the ADHD GWAS meta-analysis result and the results from the two replication cohorts. First, we performed an inverse variance-weighted meta-analysis of the ADHD GWAS meta-analysis with the results of the 23andMe GWAS of self-reported ADHD case status. Second, we performed a meta-analysis combining the results from clinically ascertained ADHD with results from GWAS of ADHD-related behavior in childhood population samples (the EAGLE data). This was done using a modified sample size-based weighting method (see below). Third, we applied the modified sample size-based weighting method to meta-analyze the EAGLE GWAS with the ADHD+23andMe GWAS meta-analysis.
For meta-analyses including the EAGLE cohort, modified sample size-based weights were derived to account for the respective heritabilities, genetic correlation, and measurement scale of the GWASs (Supplementary Information). To summarize, given z-scores $Z_{1j}$ and $Z_{2j}$ resulting from GWAS of SNP $j$ in a dichotomous phenotype (e.g. ADHD) with sample size $N_1$ and a continuous phenotype (e.g. ADHD-related traits) with sample size $N_2$, respectively, we calculate an adjusted z-score $\tilde{Z}_{2j}$ and adjusted sample sizes $\tilde{N}_1$ and $\tilde{N}_2$ for use in the weighted meta-analysis. The adjusted sample sizes $\tilde{N}_1$ and $\tilde{N}_2$ reflect differences in power between the studies due to measurement scale and relative heritability that are not captured by sample size. The calculation of $\tilde{Z}_2$ reduces the contribution of the continuous phenotype's GWAS to the meta-analysis based on imperfect genetic correlation with the dichotomous phenotype of interest (i.e. ADHD). The adjustments are computed based on the sample prevalence ($P$) and population prevalence ($K$) of the dichotomous phenotype, the estimated liability scale SNP heritability of the two phenotypes ($h_1^2$ and $h_2^2$), and the genetic correlation ($r_g$) between the two phenotypes, as well as the average SNP LD score ($l_j$) and the number of SNPs ($M$). Heritability and genetic correlation values to compute these weights are computed using LD score regression. This meta-analysis weighting scheme is consistent with weights alternatively derived based on modelling the joint distribution of marginal GWAS beta across traits 113 .
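The combination step of a sample size-based meta-analysis can be sketched as follows. This shows only the generic sample size-weighted combination of two z-scores; the adjusted sample sizes and the adjusted z-score for the continuous phenotype are treated as given inputs, since their exact expressions are defined in the Supplementary Information and are not reproduced here. The function name and argument names are hypothetical.

```python
import math


def sample_size_weighted_z(z1: float, z2: float,
                           n1_adjusted: float, n2_adjusted: float) -> float:
    """Combine two z-scores with weights proportional to sqrt(sample size).

    z1: z-score from the dichotomous-phenotype GWAS.
    z2: z-score from the continuous-phenotype GWAS, assumed to be already
        adjusted for imperfect genetic correlation.
    n1_adjusted, n2_adjusted: adjusted sample sizes, assumed to be computed
        beforehand from prevalence, heritability, and genetic correlation.
    """
    if n1_adjusted <= 0 or n2_adjusted <= 0:
        raise ValueError("adjusted sample sizes must be positive")
    numerator = math.sqrt(n1_adjusted) * z1 + math.sqrt(n2_adjusted) * z2
    return numerator / math.sqrt(n1_adjusted + n2_adjusted)


def two_sided_p(z: float) -> float:
    """Two-sided P-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With this weighting, the combined statistic remains standard normal under the null when the two inputs are independent, and shrinking an adjusted sample size reduces that study's influence on the result.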
To test heterogeneity with each replication cohort, we considered Cochran's Q test of heterogeneity in the first two replication meta-analyses described above. Specifically, we evaluated the one degree of freedom test for heterogeneity between the ADHD GWAS meta-analysis and the replication cohort.

Table 1. Results for the genome-wide significant index variants in the 12 loci associated with ADHD identified in the GWAS meta-analysis. Index variants are LD independent (r² < 0.1), and are merged into one locus when located less than 400 kb apart. The location (chromosome [Chr] and base position [BP]), alleles (A1 and A2), allele frequency (A1 Freq), odds ratio (OR) of the effect with respect to A1, and association P-value of the index variant are given, along with genes within 50 kb of the credible set for the locus.