## Abstract

Identifying genetic correlations between complex traits and diseases can provide useful etiological insights and help prioritize likely causal relationships. The major challenges preventing estimation of genetic correlation from genome-wide association study (GWAS) data with current methods are the lack of availability of individual genotype data and widespread sample overlap among meta-analyses. We circumvent these difficulties by introducing a technique for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap. We use our method to estimate 300 genetic correlations among 25 traits, totaling more than 1.5 million unique phenotype measurements. Our results include genetic correlations between anorexia nervosa and schizophrenia, anorexia and obesity and associations between educational attainment and several diseases. These results highlight the power of genome-wide analyses, since there currently are no genome-wide significant SNPs for anorexia nervosa and only three for educational attainment.

## Introduction

Understanding the complex relationships between human behaviours, traits and diseases is a fundamental goal of epidemiology. In the absence of randomized controlled trials and longitudinal studies, many disease risk factors are identified on the basis of population cross-sectional correlations of variables at a single time point. Such approaches can be biased by confounding and reverse causation, leading to spurious associations [1, 2]. Genetics can help elucidate cause or effect, since inherited genetic effects cannot be subject to reverse causation and are biased by a smaller list of confounders.

The first methods for testing for genetic overlap were family studies [3–7]. The disadvantage of these methods is the requirement to measure all traits on the same individuals, which scales poorly to studies of a large number of traits, especially traits that are difficult or costly to measure (e.g., low-prevalence diseases). Genome-wide association studies (GWAS) produce effect-size estimates for specific genetic variants, so it is possible to test for shared genetics by looking for correlations in effect-sizes across traits, which does not require measuring multiple traits per individual.

A widely-used technique for testing for relationships between phenotypes using GWAS data is Mendelian randomization (MR) [1, 2], which is the specialization to genetics of instrumental variables [8]. MR is effective for traits where significant associations account for a substantial fraction of heritability [9, 10]. For many complex traits, heritability is distributed over thousands of variants with small effects, and the proportion of heritability accounted for by significantly associated variants at current sample sizes is small [11]. For such traits, MR suffers from low power and weak instrument bias [8, 12].

A complementary approach is to estimate genetic correlation, a quantity that includes the effects of all SNPs, including those that do not reach genome-wide significance (Methods). Genetic correlation is also meaningful for pairs of diseases, in which case it can be interpreted as the genetic analogue of comorbidity. The two main existing techniques for estimating genetic correlation from GWAS data are restricted maximum likelihood (REML) [13–18] and polygenic scores [19, 20]. These methods have only been applied to a few traits, because they require individual genotype data, which are difficult to obtain due to informed consent limitations.

In response to these limitations, we have developed a technique for estimating genetic correlation using only GWAS summary statistics that is not biased by sample overlap. Our method, cross-trait LD Score regression, is an extension of single-trait LD Score regression [21] and is computationally very fast. We apply this method to data from 25 GWAS and report genetic correlations for 300 pairs of phenotypes, demonstrating shared genetic bases for many complex diseases and traits.

## Results

### Overview of Methods

The method presented here for estimating genetic correlation from summary statistics relies on the fact that the GWAS effect-size estimate for a given SNP incorporates the effects of all SNPs in linkage disequilibrium (LD) with that SNP [21, 22]. For a polygenic trait, SNPs with high LD will have higher *χ*^{2} statistics on average than SNPs with low LD [21]. A similar relationship holds if we replace *χ*^{2} statistics for a single study with the product of *z*-scores from two studies of traits with non-zero genetic correlation.

More precisely, under a polygenic model [13, 15], the expected value of $z_{1j}z_{2j}$ is

$$
\mathbb{E}[z_{1j}z_{2j}] = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho N_s}{\sqrt{N_1 N_2}}, \tag{1}
$$

where $N_i$ is the sample size for study $i$, $\rho_g$ is genetic covariance (defined in Methods), $\ell_j$ is LD Score [21], $M$ is the number of SNPs (Methods), $N_s$ is the number of individuals included in both studies, and $\rho$ is the phenotypic correlation among the $N_s$ overlapping samples. We derive this equation in the Supplementary Note. If study 1 and study 2 are the same study, then Equation 1 reduces to the single-trait result from [21], because genetic covariance between a trait and itself is heritability, and $\chi^2 = z^2$. As a consequence of Equation 1, we can estimate genetic covariance using the slope from the regression of $z_{1j}z_{2j}$ on LD Score, which is computationally very fast (Methods). If there is sample overlap, it will only affect the intercept from this regression (the term $\rho N_s/\sqrt{N_1 N_2}$) and not the slope, so the estimates of genetic correlation will not be biased by sample overlap. Similarly, shared population stratification will alter the intercept but have minimal impact on the slope, for the same reasons that population stratification has minimal impact on the slope from single-trait LD Score regression [21]. If we are willing to assume no shared population stratification and we know the amount of sample overlap and phenotypic correlation in advance (i.e., the true value of $\rho N_s/\sqrt{N_1 N_2}$), we can constrain the intercept to this value, which reduces the standard error. We refer to this approach as constrained intercept LD Score regression. Normalizing genetic covariance by the SNP-heritabilities yields genetic correlation: $r_g := \rho_g/\sqrt{h_1^2 h_2^2}$, where $h_i^2$ denotes the SNP-heritability [13] from study $i$. Genetic correlation ranges between −1 and 1. Similar results hold if one or both studies is a case/control study, in which case genetic covariance is on the observed scale. There is no distinction between observed and liability scale genetic correlation for case/control traits, so we can talk about genetic correlation between a case/control trait and a quantitative trait and genetic correlation between pairs of case/control traits without difficulties (Supplementary Note).
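The structure of Equation 1 can be checked numerically. The following is a minimal sketch, not the `ldsc` implementation: it assumes the expectation has a slope of √(N₁N₂)·ρ_g/M on LD Score and an intercept of ρN_s/√(N₁N₂), and every function name and numeric value is hypothetical and chosen only for illustration.

```python
import math

def expected_z_product(n1, n2, rho_g, m, ld_score, n_s=0, rho=0.0):
    """Right-hand side of Equation 1 for one SNP with LD Score `ld_score`."""
    root = math.sqrt(n1 * n2)
    slope = root * rho_g / m        # multiplies the LD Score
    intercept = rho * n_s / root    # non-zero only with sample overlap
    return slope * ld_score + intercept

def genetic_correlation(rho_g, h2_1, h2_2):
    """Normalize genetic covariance by the two SNP-heritabilities."""
    return rho_g / math.sqrt(h2_1 * h2_2)

# Hypothetical values chosen only for illustration.
N, M, H2, ELL = 50_000, 1_000_000, 0.4, 100.0

# Same study twice: rho_g = h2, N_s = N and rho = 1, giving N*h2*l/M + 1.
same_study = expected_z_product(N, N, H2, M, ELL, n_s=N, rho=1.0)
single_trait = N * H2 * ELL / M + 1.0

# Sample overlap shifts the intercept but leaves the slope unchanged.
no_overlap = [expected_z_product(N, N, 0.2, M, l) for l in (50.0, 150.0)]
overlap = [expected_z_product(N, N, 0.2, M, l, n_s=N // 2, rho=0.5)
           for l in (50.0, 150.0)]
slope_no_overlap = (no_overlap[1] - no_overlap[0]) / 100.0
slope_overlap = (overlap[1] - overlap[0]) / 100.0
```

In this toy setting the two slopes agree while the expected products differ by the overlap term, which is the property that lets the regression slope remain unbiased under sample overlap.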

### Simulations

We performed a series of simulations to evaluate the robustness of the model to potential confounders such as sample overlap and model misspecification, and to verify the accuracy of the standard error estimates (Methods).

Table 1 shows cross-trait LD Score regression estimates and standard errors from 1,000 simulations of quantitative traits. For each simulation replicate, we generated two phenotypes for each of 2,062 individuals in our sample by drawing effect sizes for approximately 600,000 SNPs on chromosome 2 from a bivariate normal distribution. We then computed summary statistics for both phenotypes and estimated heritability and genetic correlation with cross-trait LD Score regression. The summary statistics were generated from completely overlapping samples. These simulations confirm that cross-trait LD Score regression yields accurate estimates of the true genetic correlation and that the standard errors match the standard deviation across simulations. Thus, cross-trait LD Score regression is not biased by sample overlap, in contrast to estimation of genetic correlation via polygenic risk scores, which is biased in the presence of sample overlap [20]. We also evaluated simulations with one quantitative trait and one case/control study and show that cross-trait LD Score regression can be applied to binary traits and is not biased by oversampling of cases (Table S1).

Estimates of heritability and genetic covariance can be biased if the underlying model of genetic architecture is misspecified, *e.g.*, if variance explained is correlated with LD Score or MAF [21, 23]. Because genetic correlation is estimated as a ratio, it is more robust: biases that affect the numerator and the denominator in the same direction tend to cancel. We obtain approximately correct estimates of genetic correlation even in simulations with models of genetic architecture where our estimates of heritability and genetic covariance are biased (Table S2).

### Replication of Psychiatric Cross-Disorder Results

As technical validation, we replicated the estimates of genetic correlations among psychiatric disorders obtained with individual genotypes and REML in [16], by applying cross-trait LD Score regression to summary statistics from the same data [24]. These summary statistics were generated from non-overlapping samples, so we applied cross-trait LD Score regression using both unconstrained and constrained intercepts (Methods). Results from these analyses are shown in Figure 1. As expected, the results from cross-trait LD Score regression were similar to the results from REML. Cross-trait LD Score regression with constrained intercept gave standard errors that were only slightly larger than those from REML, while the standard errors from cross-trait LD Score regression with unconstrained intercept were substantially larger, especially for traits with small sample sizes (e.g., ADHD, ASD).

### Application to Summary Statistics From 25 Phenotypes

We used cross-trait LD Score regression to estimate genetic correlations among 25 phenotypes (URLs, Methods). Genetic correlation estimates for all 300 pairwise combinations of the 25 traits are shown in Figure 2. For clarity of presentation, the 25 phenotypes were restricted to contain only one phenotype from each cluster of closely related phenotypes (Methods). Genetic correlations among the educational, anthropometric, smoking, and insulin-related phenotypes that were excluded from Figure 2 are shown in Table S4 and Figures S1, S2 and S3, respectively. References and sample sizes are shown in Table S3.

For the majority of pairs of traits in Figure 2, no GWAS-based genetic correlation estimate has been reported; however, many associations have been described informally based on the observation of overlap among genome-wide significant loci. Examples of genetic correlations that are consistent with overlap among top loci include the correlations between plasma lipids and cardiovascular disease [10]; age at onset of menarche and obesity [25]; type 2 diabetes, obesity, fasting glucose, plasma lipids and cardiovascular disease [26]; birth weight, adult height and type 2 diabetes [27, 28]; birth length, adult height and infant head circumference [29, 30]; and childhood obesity and adult obesity [29]. For many of these pairs of traits, we can reject the null hypothesis of zero genetic correlation with overwhelming statistical significance (*e.g.*, *p* < 10^{−20} for age at onset of menarche and obesity).

The first section of Table 2 lists genetic correlation results that are consistent with epidemiological associations, but, as far as we are aware, have not previously been reported using genetic data. The estimates of the genetic correlation between age at onset of menarche and adult height [31], triglycerides [32] and type 2 diabetes [32, 33] are consistent with the epidemiological associations.

The estimate of a negative genetic correlation between anorexia nervosa and obesity suggests that the same genetic factors influence normal variation in BMI as well as dysregulated BMI in psychiatric illness. This result is consistent with the observation that BMI GWAS findings implicate neuronal, rather than metabolic, cell-types and epigenetic marks [34, 35]. The negative genetic correlation between adult height and coronary artery disease agrees with a replicated epidemiological association [36–38]. We observe several significant associations with the educational attainment phenotypes from Rietveld *et al.* [39]: we estimate a statistically significant negative genetic correlation between college and Alzheimer’s disease, which agrees with epidemiological results [40, 41]. The positive genetic correlation between college and bipolar disorder is consistent with previous epidemiological reports [42, 43]. The estimate of a negative genetic correlation between smoking and college is consistent with the observed differences in smoking rates as a function of educational attainment [44].

The second section of Table 2 lists three results that are, to the best of our knowledge, new both to genetics and epidemiology. One, we find a positive genetic correlation between anorexia nervosa and schizophrenia. Comorbidity between eating and psychotic disorders has not been thoroughly investigated in the psychiatric literature [45, 46], and this result raises the possibility of similarity between these classes of disease. Two, we estimate a negative genetic correlation between ulcerative colitis (UC) and childhood obesity. The relationship between premorbid BMI and ulcerative colitis is not well understood; exploring this relationship may be a fruitful direction for further investigation. Three, we estimate a positive genetic correlation between autism spectrum disorder (ASD) and educational attainment, which itself has very high genetic correlation with IQ [39, 47, 48]. The ASD summary statistics were generated using a case-pseudocontrol study design, so this result cannot be explained by the tendency for the parents of children who receive a diagnosis of ASD to be better educated than the general population [49]. The distribution of IQ among individuals with ASD has a lower mean than the general population, but with heavy tails [50] (*i.e.*, an excess of individuals with low and high IQ). There is evidence that the genetic architectures of high-IQ and low-IQ ASD are dissimilar [51].

The third section of Table 2 lists interesting examples where the genetic correlation is close to zero with small standard error. The low genetic correlation between schizophrenia and rheumatoid arthritis is interesting because schizophrenia has been observed to be protective for rheumatoid arthritis [52], though the epidemiological effect is weak, so it is possible that there is a real genetic correlation, but it is too small for us to detect. The low genetic correlation between schizophrenia and smoking is notable because of the high prevalence of smoking among individuals with schizophrenia [53]. The low genetic correlation between schizophrenia and plasma lipid levels contrasts with a previous report of pleiotropy between schizophrenia and triglycerides [54]. Pleiotropy (unsigned) is different from genetic correlation (signed; see Methods); however, the pleiotropy reported by Andreassen *et al.* [54] could be explained by the sensitivity of the method used to the properties of a small number of regions with strong LD, rather than trait biology (Figure S5). We estimate near-zero genetic correlation between Alzheimer’s disease and schizophrenia. The genetic correlations between Alzheimer’s disease and the other psychiatric traits (anorexia nervosa, bipolar, major depression, ASD) are also close to zero, but with larger standard errors, due to smaller sample sizes. This suggests that the genetic basis of Alzheimer’s disease is distinct from psychiatric conditions. Last, we estimate near-zero genetic correlation between rheumatoid arthritis (RA) and both Crohn’s disease (CD) and UC. Although these diseases share many associated loci [55, 56], there appears to be no directional trend: some RA risk alleles are also risk alleles for UC and CD, but many RA risk alleles are protective for UC and CD [55], yielding near-zero genetic correlation. This example highlights the distinction between pleiotropy and genetic correlation (Methods).

Finally, the estimates of genetic correlations among metabolic traits are consistent with the estimates obtained using REML in Vattikuti *et al.* [17] (Supplementary Table S4), and are directionally consistent with the recent Mendelian randomization results from Wuertz *et al.* [57]. The estimate of 0.57 (0.074) for the genetic correlation between CD and UC is consistent with the estimate of 0.62 (0.042) from Chen *et al.* [18].

## Discussion

We have described a new method for estimating genetic correlation from GWAS summary statistics, which we applied to a dataset of GWAS summary statistics consisting of 25 traits and more than 1.5 million unique phenotype measurements. We reported several new findings that would have been difficult or impossible to obtain with existing methods, including a positive genetic correlation between anorexia nervosa and schizophrenia. Our method replicated many previously-reported GWAS-based genetic correlations, and confirmed observations of overlap among genome-wide significant SNPs, MR results and epidemiological associations.

This method is an advance for several reasons: it does not require individual genotypes, genome-wide significant SNPs or LD-pruning (which loses information if causal SNPs are in LD). Our method is not biased by sample overlap and is computationally fast. Furthermore, our approach does not require measuring multiple traits on the same individuals, so it scales easily to studies of thousands of pairs of traits. These advantages allow us to estimate genetic correlation for many more pairs of phenotypes than was possible with existing methods.

The challenges in interpreting genetic correlation are similar to the challenges in MR. We highlight two difficulties. First, genetic correlation is immune to environmental confounding, but is subject to genetic confounding, analogous to confounding by pleiotropy in MR. For example, the genetic correlation between HDL and CAD in Figure 2 could result from a causal effect HDL → CAD, but could also be mediated by triglycerides (TG) [10, 58], represented graphically [59] as HDL ← G → TG → CAD, where G is the set of genetic variants with effects on both HDL and TG. Extending genetic correlation to multiple genetically correlated phenotypes is an important direction for future work [60]. Second, although genetic correlation estimates are not biased by oversampling of cases, they are affected by other forms of selection bias, such as misclassification [16].

We note several limitations of cross-trait LD Score regression as an estimator of genetic correlation. First, cross-trait LD Score regression requires larger sample sizes than methods that use individual genotypes in order to achieve equivalent standard error. Second, cross-trait LD Score regression is not currently applicable to samples from recently-admixed populations. Third, we have not investigated the potential impact of assortative mating on estimates of genetic correlation, which remains as a future direction. Fourth, methods built from polygenic models, such as cross-trait LD Score regression and REML, are most effective when applied to traits with polygenic genetic architectures. For traits where significant SNPs account for a sizable proportion of heritability, analyzing only these SNPs can be more powerful. Developing methods that make optimal use of both large-effect SNPs and diffuse polygenic signal is a direction for future research.

Despite these limitations, we believe that the cross-trait LD Score regression estimator of genetic correlation will be a useful addition to the epidemiological toolbox, since it allows for rapid screening for correlations among a diverse set of traits, without the need for measuring multiple traits on the same individuals or genome-wide significant SNPs.

## Methods

### Definition of Genetic Covariance and Correlation

All definitions refer to narrow-sense heritabilities and genetic covariances. Let $S$ denote a set of $M$ SNPs, let $X$ denote a vector of additively (0-1-2) coded genotypes for the SNPs in $S$, and let $y_1$ and $y_2$ denote phenotypes. Define $\beta := \operatorname{argmax}_{\alpha \in \mathbb{R}^M} \operatorname{Cor}[y_1, X\alpha]^2$, where the maximization is performed in the population (*i.e.*, in the infinite data limit). Let $\gamma$ denote the corresponding vector for $y_2$. This is a projection, so $\beta$ is unique modulo SNPs in perfect LD. Define $h_S^2$, the heritability explained by SNPs in $S$, as $h_S^2(y_1) := \sum_{j \in S} \beta_j^2$ and $\rho_S(y_1, y_2)$, the genetic covariance among SNPs in $S$, as $\rho_S(y_1, y_2) := \sum_{j \in S} \beta_j \gamma_j$. The genetic correlation among SNPs in $S$ is $r_S(y_1, y_2) := \rho_S(y_1, y_2)/\sqrt{h_S^2(y_1)\,h_S^2(y_2)}$, which lies in [−1, 1]. Following [13], we use subscript $g$ (as in $h_g^2$, $\rho_g$, $r_g$) when the set of SNPs is genotyped and imputed SNPs in GWAS.

SNP genetic correlation ($r_g$) is different from family study genetic correlation. In a family study, the relationship matrix captures information about all genetic variation, not just common SNPs. As a result, family studies estimate the total genetic correlation ($S$ equals all variants). Unlike the relationship between SNP-heritability [13] and total heritability, for which $h_g^2 \le h^2$, no similar relationship holds between SNP genetic correlation and total genetic correlation. If $\beta$ and $\gamma$ are more strongly correlated among common variants than rare variants, then the total genetic correlation will be less than the SNP genetic correlation.

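Genetic correlation is (asymptotically) proportional to Mendelian randomization estimates. If we use a genetic instrument $g_i := \sum_{j \in S} X_{ij}\beta_j$ to estimate the effect $b_{12}$ of $y_1$ on $y_2$, the 2SLS estimate is $\hat{b}_{2SLS} := g^\top y_2 / g^\top y_1$ [8]. Up to a common factor of the sample size, the expectations of the numerator and denominator are $\rho_S(y_1, y_2)$ and $h_S^2(y_1)$. Thus, $\operatorname{plim}_{N \to \infty} \hat{b}_{2SLS} = \rho_S(y_1, y_2)/h_S^2(y_1) = r_S(y_1, y_2)\sqrt{h_S^2(y_2)/h_S^2(y_1)}$. If we use the same set $S$ of SNPs to estimate $b_{12}$ and $b_{21}$ (*e.g.*, if $S$ is the set of all common SNPs, as in the genetic correlation analyses in this paper), then this procedure is symmetric in $y_1$ and $y_2$.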
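The symmetry can be made explicit with a short worked step. This is a sketch under the assumption that, for a common SNP set $S$, the limiting 2SLS estimate in each direction equals the genetic covariance divided by the SNP-heritability of the exposure; $b_{21}$ denotes the effect of $y_2$ on $y_1$:

$$
b_{12} = \frac{\rho_S(y_1, y_2)}{h_S^2(y_1)} = r_S\sqrt{\frac{h_S^2(y_2)}{h_S^2(y_1)}}, \qquad
b_{21} = \frac{\rho_S(y_1, y_2)}{h_S^2(y_2)} = r_S\sqrt{\frac{h_S^2(y_1)}{h_S^2(y_2)}}, \qquad
b_{12}\,b_{21} = r_S^2 .
$$

Under these assumptions the two directional estimates carry the same sign as $r_S$ and differ only by the ratio of heritabilities, so neither direction is singled out by the calculation itself.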

Genetic correlation is different from pleiotropy. Two traits have a pleiotropic relationship if many variants affect both. Genetic correlation is a stronger condition than pleiotropy: to exhibit genetic correlation, the directions of effect must also be consistently aligned.
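The distinction can be illustrated with a toy calculation. This is a minimal sketch that assumes per-SNP effects are known exactly and uses the sums of squares and cross-products from the definitions above; the effect values are hypothetical.

```python
import math

def heritability(effects):
    """Sum of squared per-SNP effects."""
    return sum(e * e for e in effects)

def genetic_covariance(beta, gamma):
    """Sum over SNPs of the product of effects on the two traits."""
    return sum(b * g for b, g in zip(beta, gamma))

def genetic_correlation(beta, gamma):
    scale = math.sqrt(heritability(beta) * heritability(gamma))
    return genetic_covariance(beta, gamma) / scale

# Hypothetical effects for four SNPs, all of which affect both traits.
beta = [0.1, 0.2, 0.1, 0.2]
aligned = [0.2, 0.4, 0.2, 0.4]    # same direction of effect at every SNP
mixed = [0.2, -0.4, -0.2, 0.4]    # same SNPs, inconsistent directions

r_aligned = genetic_correlation(beta, aligned)
r_mixed = genetic_correlation(beta, mixed)
```

Both trait pairs are pleiotropic at every SNP, yet only the pair with consistently aligned directions shows a genetic correlation away from zero.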

### Cross-Trait LD Score Regression

We estimate genetic covariance by regressing $z_{1j}z_{2j}$ against $\ell_j\sqrt{N_{1j}N_{2j}}$ (where $N_{ij}$ is the sample size for SNP $j$ in study $i$), then multiplying the resulting slope by $M$, the number of SNPs in the reference panel with MAF between 5% and 50% (technically, this is an estimate of $\rho_{5\text{-}50\%}$; see the Supplementary Note).

If we know the amount of sample overlap ahead of time, we can reduce the standard error by constraining the intercept with the `--constrain-intercept` flag in `ldsc`. This works even if there is nonzero sample overlap, in which case the intercept should be constrained to $\rho N_s/\sqrt{N_1 N_2}$.
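The estimator can be sketched in a few lines. The code below is an unweighted illustration and not the `ldsc` code: it assumes the regressor is the LD Score times the square root of the product of the per-SNP sample sizes, omits the regression weights, and runs on simulated independent SNPs with hypothetical parameter values.

```python
import math
import random

def fit_line(x, y, fixed_intercept=None):
    """Unweighted least squares; optionally hold the intercept fixed."""
    if fixed_intercept is not None:
        num = sum(a * (b - fixed_intercept) for a, b in zip(x, y))
        return num / sum(a * a for a in x), fixed_intercept
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_gencov(z1, z2, ld, n1, n2, m, fixed_intercept=None):
    """Regress z1*z2 on ld*sqrt(n1*n2) and rescale the slope by m."""
    x = [l * math.sqrt(a * b) for l, a, b in zip(ld, n1, n2)]
    y = [a * b for a, b in zip(z1, z2)]
    slope, intercept = fit_line(x, y, fixed_intercept)
    return m * slope, intercept

# Toy data: independent SNPs with hypothetical parameters.
rng = random.Random(1)
M_SNPS, N, H2, RHO_G, INTERCEPT = 20_000, 50_000, 0.5, 0.25, 0.3
ld, z1, z2 = [], [], []
for _ in range(M_SNPS):
    l = rng.uniform(1.0, 200.0)
    var = N * H2 * l / M_SNPS + 1.0            # Var[z] for each trait
    cov = N * RHO_G * l / M_SNPS + INTERCEPT   # expected z1*z2, N1 = N2 = N
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    ld.append(l)
    z1.append(math.sqrt(var) * e1)
    z2.append(cov / math.sqrt(var) * e1 + math.sqrt(var - cov * cov / var) * e2)

n = [N] * M_SNPS
free_rho_g, free_intercept = estimate_gencov(z1, z2, ld, n, n, M_SNPS)
fixed_rho_g, _ = estimate_gencov(z1, z2, ld, n, n, M_SNPS,
                                 fixed_intercept=INTERCEPT)
```

Fixing the intercept removes one free parameter, which is why the constrained fit tends to give a tighter estimate when the overlap term is known.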

### Regression Weights

For heritability estimation, we use the regression weights from [21]. If effect sizes for both phenotypes are drawn from a bivariate normal distribution, then the optimal regression weights for genetic covariance estimation are the reciprocals of

$$
\operatorname{Var}[z_{1j}z_{2j} \mid \ell_j] = \left(\frac{N_1 h_1^2 \ell_j}{M} + 1\right)\left(\frac{N_2 h_2^2 \ell_j}{M} + 1\right) + \left(\frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho N_s}{\sqrt{N_1 N_2}}\right)^2
$$

(Supplementary Note). This quantity depends on several parameters which are not known a priori, so it is necessary to estimate them from the data. We compute the weights in two steps:

1. The first regression is weighted using heritabilities from the single-trait LD Score regressions, $\rho N_s = 0$, and $\rho_g$ estimated as $\hat{\rho}_g := M\sum_j z_{1j}z_{2j} \big/ \sum_j \ell_j\sqrt{N_{1j}N_{2j}}$.
2. The second regression is weighted using the estimates of $\rho N_s$ and $\rho_g$ from step 1. The genetic covariance estimate that we report is the estimate from the second regression.

Linear regression with weights estimated from the data is called feasible generalized least squares (FGLS). FGLS has the same limiting distribution as WLS with optimal weights, so WLS *p*-values are valid for FGLS [8]. We multiply the heteroskedasticity weights by $1/\ell_j$ (where $\ell_j$ is LD Score with sum over regression SNPs) in order to downweight SNPs that are overcounted. This is a heuristic: the optimal approach is to rotate the data so that it is de-correlated, but this rotation matrix is difficult to compute.
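The two-step weighting can be sketched as follows. This is an illustration under stated assumptions rather than the `ldsc` implementation: the conditional variance is taken to be the product of the two single-trait expected χ² values plus the squared expected *z*-score product, sample sizes are scalars, the starting value for genetic covariance assumes a zero intercept, and the floor of 1 on the LD Score is a guard added for this sketch. All names are hypothetical.

```python
import math

def gencov_weight(ld, n1, n2, m, h2_1, h2_2, rho_g, intercept):
    """1 / (Var[z1*z2 | LD Score] * LD Score) for one SNP."""
    var_1 = n1 * h2_1 * ld / m + 1.0
    var_2 = n2 * h2_2 * ld / m + 1.0
    cov = math.sqrt(n1 * n2) * rho_g * ld / m + intercept
    return 1.0 / ((var_1 * var_2 + cov ** 2) * max(ld, 1.0))

def weighted_fit(x, y, w):
    """Weighted least squares with an intercept; returns (slope, intercept)."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def two_step_gencov(z_products, ld, n1, n2, m, h2_1, h2_2):
    """Two-step FGLS estimate of (genetic covariance, intercept)."""
    x = [l * math.sqrt(n1 * n2) for l in ld]
    rho_g = m * sum(z_products) / sum(x)   # starting value, intercept = 0
    intercept = 0.0
    for _ in range(2):                     # step 1, then step 2
        w = [gencov_weight(l, n1, n2, m, h2_1, h2_2, rho_g, intercept)
             for l in ld]
        slope, intercept = weighted_fit(x, z_products, w)
        rho_g = m * slope
    return rho_g, intercept

# Noise-free toy input, so any positive weights recover the truth.
N, M = 50_000, 20_000
ld_scores = [float(l) for l in range(1, 201)]
products = [N * 0.25 * l / M + 0.3 for l in ld_scores]
rho_g_hat, intercept_hat = two_step_gencov(products, ld_scores, N, N, M, 0.5, 0.5)
```

With real data the weights change the efficiency of the estimate, not its target; the noise-free input here only confirms that the plumbing returns the slope and intercept that generated the data.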

### Assessment of Statistical Significance via Block Jackknife

Summary statistics for SNPs in LD are correlated, so the OLS standard error will be biased downwards. We estimate a heteroskedasticity-and-correlation-robust standard error with a block jackknife over blocks of adjacent SNPs. This is the same procedure used in [21], and gives accurate standard errors in simulations (Table 1). We obtain a standard error for the genetic correlation by using a ratio block jackknife over SNPs. The default setting in `ldsc` is 200 blocks per genome, which can be adjusted with the `--num-blocks` flag.
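A delete-one-block jackknife can be sketched as below. This is a generic illustration, not the `ldsc` code: it assumes SNPs are already ordered by genomic position so that blocks contain adjacent SNPs, it uses unweighted regressions, and the ratio estimator assumes a constant sample size so that the ratio of slopes equals the genetic correlation. The simulated SNPs are independent, so the example only shows the mechanics; all names and values are hypothetical.

```python
import math
import random

def ols_slope(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx

def rg_from_rows(rows):
    """Ratio estimator: each row is (LD Score, z1, z2) for one SNP."""
    ld = [r[0] for r in rows]
    s12 = ols_slope(ld, [r[1] * r[2] for r in rows])
    s11 = ols_slope(ld, [r[1] ** 2 for r in rows])
    s22 = ols_slope(ld, [r[2] ** 2 for r in rows])
    return s12 / math.sqrt(s11 * s22)

def block_jackknife(rows, estimator, num_blocks=200):
    """Return (jackknife estimate, standard error) over contiguous blocks."""
    n = len(rows)
    bounds = [i * n // num_blocks for i in range(num_blocks + 1)]
    full = estimator(rows)
    pseudo = []
    for b in range(num_blocks):
        kept = rows[:bounds[b]] + rows[bounds[b + 1]:]   # drop one block
        pseudo.append(num_blocks * full - (num_blocks - 1) * estimator(kept))
    mean = sum(pseudo) / num_blocks
    var = sum((p - mean) ** 2 for p in pseudo) / (num_blocks * (num_blocks - 1))
    return mean, math.sqrt(var)

# Toy data with a true genetic correlation of 0.5 and no sample overlap.
rng = random.Random(0)
rows = []
for _ in range(4000):
    ld = rng.uniform(1.0, 200.0)
    v = 0.625 * ld + 1.0     # Var[z] for each trait
    c = 0.3125 * ld          # Cov[z1, z2]
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1 = math.sqrt(v) * e1
    z2 = (c / math.sqrt(v)) * e1 + math.sqrt(v - c * c / v) * e2
    rows.append((ld, z1, z2))

rg_hat, rg_se = block_jackknife(rows, rg_from_rows, num_blocks=20)
```

Passing the whole ratio to the jackknife, rather than jackknifing the numerator and denominator separately, is what lets the standard error reflect the correlation between them.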

### Computational Complexity

Let *N* denote sample size and *M* the number of SNPs. The computational complexity of the steps involved in LD Score regression is as follows:

1. Computing summary statistics takes $O(MN)$ time.
2. Computing LD Scores takes $O(MN)$ time, though the $N$ for computing LD Scores need not be large. We use the $N = 378$ Europeans from 1000 Genomes.
3. LD Score regression takes $O(M)$ time and space.

For a user who has already computed summary statistics and downloads LD Scores from our website (URLs), the computational cost of LD Score regression is $O(M)$ time and space. For comparison, REML takes $O(MN^2)$ time for computing the GRM and $O(N^3)$ time for maximizing the likelihood.

Practically, estimating LD Scores takes roughly an hour parallelized over chromosomes, and LD Score regression takes about 15 seconds per pair of phenotypes on a 2014 MacBook Air with a 1.7 GHz Intel Core i7 processor.

### Simulations

We simulated quantitative traits under an infinitesimal model in 2062 controls from a Swedish study. To simulate the standard scenario where many causal SNPs are not genotyped, we simulated phenotypes by drawing causal SNPs from 622,146 best-guess imputed 1000 Genomes SNPs on chromosome 2, then retained only the 90,980 HM3 SNPs with MAF above 5% for LD Score regression.
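The effect-size draw can be sketched as follows. This is a minimal illustration and not the simulation code used for the paper: it assumes every SNP is causal, that each pair of effects is bivariate normal with per-SNP variances equal to heritability divided by the number of SNPs, and it stops before genotypes are used to build phenotypes. The function name, seed and parameter values are hypothetical.

```python
import math
import random

def draw_effects(m, h2_1, h2_2, r_g, rng):
    """Draw per-SNP effects on two traits with genetic correlation r_g."""
    sd_1, sd_2 = math.sqrt(h2_1 / m), math.sqrt(h2_2 / m)
    beta, gamma = [], []
    for _ in range(m):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        beta.append(sd_1 * e1)
        # Mix the shared and independent draws to reach correlation r_g.
        gamma.append(sd_2 * (r_g * e1 + math.sqrt(1.0 - r_g ** 2) * e2))
    return beta, gamma

beta, gamma = draw_effects(20_000, 0.5, 0.3, 0.6, random.Random(42))
realized_h2_1 = sum(b * b for b in beta)
realized_h2_2 = sum(g * g for g in gamma)
realized_r_g = (sum(b * g for b, g in zip(beta, gamma))
                / math.sqrt(realized_h2_1 * realized_h2_2))
```

With many SNPs the realized sums of squares and cross-products land close to the target heritabilities and genetic correlation, which is what makes the simulated truth a stable reference for the estimator.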

We note that the simulations in [21] show that single-trait LD Score regression is only minimally biased by uncorrected population stratification and moderate ancestry mismatch between the reference panel used for estimating LD Scores and the population sampled in GWAS. In particular, LD Scores estimated from the 1000 Genomes reference panel are suitable for use with European-ancestry meta-analyses. Put another way, LD Score is only minimally correlated with *F*_{ST}, and the differences in LD Score among European populations are not so large as to bias LD Score regression. Since we use the same LD Scores for cross-trait LD Score regression as for single-trait LD Score regression, these results extend to cross-trait LD Score regression.

### Summary Statistic Datasets

We selected traits for inclusion in the main text via the following procedure:

1. Begin with all publicly available non-sex-stratified European-only summary statistics.
2. Remove studies that do not provide signed summary statistics.
3. Remove studies not imputed to at least HapMap 2.
4. Remove studies that include heritable covariates [61].
5. Remove all traits with heritability *z*-score below 4. Genetic correlation estimates for traits with heritability *z*-score below 4 are generally too noisy to interpret.
6. Prune clusters of correlated phenotypes (*e.g.*, obesity classes 1-3) by picking the trait from each cluster with the highest heritability *z*-score.

We then applied the following filters (implemented in the script `sumstats_to_chisq.py` included with `ldsc`):

1. For studies that provide a measure of imputation quality, filter to INFO above 0.9.
2. For studies that provide sample MAF, filter to sample MAF above 1%.
3. In order to restrict to well-imputed SNPs in studies that do not provide a measure of imputation quality, filter to HapMap3 [62] SNPs with 1000 Genomes EUR MAF above 5%, which tend to be well-imputed in most studies. This step should be skipped if INFO scores are available for all studies.
4. If sample size varies from SNP to SNP, remove SNPs with effective sample size less than 0.67 times the 90th percentile of sample size.
5. Remove indels and structural variants.
6. Remove strand-ambiguous SNPs.
7. Remove SNPs whose alleles do not match the alleles in 1000 Genomes.
8. Because the presence of outliers can increase the regression standard error, we also removed SNPs with extremely large effect sizes (*χ*^{2} > 80, as in [21]).
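Several of the filters above can be sketched as a single pass over the summary statistics. This is an illustration and not the `sumstats_to_chisq.py` script: the record layout (dictionaries with hypothetical keys `a1`, `a2`, `info`, `maf`, `n`, `z`) and the nearest-rank percentile are assumptions, and the HapMap3 restriction and the allele match against 1000 Genomes are omitted because they need reference data.

```python
import math

# Allele pairs that read the same on both strands.
AMBIGUOUS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def passes_filters(snp, min_n):
    a1, a2 = snp["a1"].upper(), snp["a2"].upper()
    if len(a1) != 1 or len(a2) != 1 or a1 not in "ACGT" or a2 not in "ACGT":
        return False                      # indel or structural variant
    if (a1, a2) in AMBIGUOUS:
        return False                      # strand-ambiguous SNP
    if snp.get("info") is not None and snp["info"] <= 0.9:
        return False                      # poorly imputed
    if snp.get("maf") is not None and snp["maf"] <= 0.01:
        return False                      # rare in the sample
    if snp["n"] < min_n:
        return False                      # low per-SNP sample size
    return snp["z"] ** 2 <= 80            # drop extreme chi-square outliers

def filter_sumstats(snps):
    min_n = 0.67 * percentile([s["n"] for s in snps], 90)
    return [s for s in snps if passes_filters(s, min_n)]

# Hypothetical records, one per filter.
snps = [
    {"snp": "rs1", "a1": "A", "a2": "G", "info": 0.95, "maf": 0.20, "n": 1000, "z": 1.0},
    {"snp": "rs2", "a1": "A", "a2": "T", "info": 0.95, "maf": 0.20, "n": 1000, "z": 1.0},
    {"snp": "rs3", "a1": "AT", "a2": "A", "info": 0.95, "maf": 0.20, "n": 1000, "z": 1.0},
    {"snp": "rs4", "a1": "A", "a2": "G", "info": 0.50, "maf": 0.20, "n": 1000, "z": 1.0},
    {"snp": "rs5", "a1": "A", "a2": "G", "info": 0.95, "maf": 0.20, "n": 500, "z": 1.0},
    {"snp": "rs6", "a1": "A", "a2": "G", "info": 0.95, "maf": 0.20, "n": 1000, "z": 10.0},
    {"snp": "rs7", "a1": "A", "a2": "G", "info": 0.95, "maf": 0.005, "n": 1000, "z": 1.0},
    {"snp": "rs8", "a1": "C", "a2": "T", "info": None, "maf": None, "n": 1000, "z": -2.0},
]
kept = [s["snp"] for s in filter_sumstats(snps)]
```

Records that lack INFO or MAF pass those two checks unchanged, mirroring the conditional wording of the first two filters.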

Genomic control (GC) correction at any stage biases the heritability and genetic covariance estimates downwards (see the Supplementary Note of [21]). The biases in the numerator and denominator of genetic correlation cancel exactly, so genetic correlation is not biased by GC correction. A majority of the studies analyzed in this paper used GC correction, so we do not report genetic covariance and heritability.
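The cancellation can be written out in one line. This is a sketch that assumes GC correction divides every $\chi^2$ statistic in study $i$ by a constant $\lambda_i > 1$ (so every *z*-score is divided by $\sqrt{\lambda_i}$) and that the regression weights are held fixed, so each slope is simply rescaled:

$$
\hat{h}_i^2 \;\to\; \frac{\hat{h}_i^2}{\lambda_i}, \qquad
\hat{\rho}_g \;\to\; \frac{\hat{\rho}_g}{\sqrt{\lambda_1 \lambda_2}}, \qquad
\hat{r}_g \;\to\; \frac{\hat{\rho}_g/\sqrt{\lambda_1 \lambda_2}}{\sqrt{(\hat{h}_1^2/\lambda_1)(\hat{h}_2^2/\lambda_2)}} = \frac{\hat{\rho}_g}{\sqrt{\hat{h}_1^2\,\hat{h}_2^2}} .
$$

The heritability and genetic covariance estimates shrink, while their ratio is unchanged.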

Data on Alzheimer’s disease were obtained from the following source:
International Genomics of Alzheimer’s Project (IGAP) is a large two-stage study based upon genome-wide association studies (GWAS) on individuals of European ancestry. In stage 1, IGAP used genotyped and imputed data on 7,055,881 single nucleotide polymorphisms (SNPs) to meta-analyze four previously-published GWAS datasets consisting of 17,008 Alzheimer’s disease cases and 37,154 controls (The European Alzheimer’s Disease Initiative, EADI; the Alzheimer Disease Genetics Consortium, ADGC; The Cohorts for Heart and Aging Research in Genomic Epidemiology consortium, CHARGE; The Genetic and Environmental Risk in AD consortium, GERAD). In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer’s disease cases and 11,312 controls. Finally, a meta-analysis was performed combining results from stages 1 and 2.

We only used stage 1 data for LD Score regression.

## URLs

`ldsc` software: github.com/bulik/ldsc

This paper: github.com/bulik/gencor_tex

PGC (psychiatric) summary statistics: www.med.unc.edu/pgc/downloads

GIANT (anthropometric) summary statistics: www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files

EGG (Early Growth Genetics) summary statistics: www.egg-consortium.org/

MAGIC (insulin, glucose) summary statistics: www.magicinvestigators.org/downloads/

CARDIoGRAM (coronary artery disease) summary statistics: www.cardiogramplusc4d.org

DIAGRAM (T2D) summary statistics: www.diagram-consortium.org

Rheumatoid arthritis summary statistics: www.broadinstitute.org/ftp/pub/rheumatoid_arthritis/Stahl_etal_2010NG/

IGAP (Alzheimer’s) summary statistics: www.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php

IIBDGC (inflammatory bowel disease) summary statistics: www.ibdgenetics.org/downloads.html

We used a newer version of these data with 1000 Genomes imputation.

Plasma lipid summary statistics: www.broadinstitute.org/mpg/pubs/lipids2010/

SSGAC (educational attainment) summary statistics: www.ssgac.org/

## Author Contributions

MJD provided reagents. BMN and ALP provided reagents. CL, ER, VA, JP and FD aided in the interpretation of results. JP and FD provided data on age at onset of menarche. The caffeine molecule is responsible for all that is good about this manuscript. BBS and HKF are responsible for the rest. All authors revised and approved the final manuscript.

## Competing Financial Interests

We have no financial conflicts of interest to declare.

## Collaborators

Collaborators from the Psychiatric Genomics Consortium were, in alphabetical order: Devin Absher, Rolf Adolfsson, Ingrid Agartz, Esben Agerbo, Huda Akil, Margot Albus, Madeline Alexander, Farooq Amin, Ole A Andreassen, Adebayo Anjorin, Richard Anney, Dan Arking, Philip Asherson, Maria H Azevedo, Silviu A Bacanu, Lena Backlund, Judith A Badner, Tobias Banaschewski, Jack D Barchas, Michael R Barnes, Thomas B Barrett, Nicholas Bass, Michael Bauer, Monica Bayes, Martin Begemann, Frank Bellivier, Judit Bene, Sarah E Bergen, Thomas Bettecken, Elizabeth Bevilacqua, Joseph Biederman, Tim B Bigdeli, Elisabeth B Binder, Donald W Black, Douglas HR Blackwood, Cinnamon S Bloss, Michael Boehnke, Dorret I Boomsma, Anders D Borglum, Elvira Bramon, Gerome Breen, Rene Breuer, Richard Bruggeman, Nancy G Buccola, Randy L Buckner, Jan K Buitelaar, Brendan Bulik-Sullivan, William E Bunner, Margit Burmeister, Joseph D Buxbaum, William F Byerley, Sian Caesar, Wiepke Cahn, Guiqing Cai, Murray J Cairns, Dominique Campion, Rita M Cantor, Vaughan J Carr, Noa Carrera, Miquel Casas, Stanley V Catts, Aravinda Chakravarti, Kimberley D Chambert, Raymond CK Chan, Eric YH Chen, Ronald YL Chen, Wei Cheng, Eric FC Cheung, Siow Ann Chong, Khalid Choudhury, Sven Cichon, David St Clair, C Robert Cloninger, David Cohen, Nadine Cohen, David A Collier, Edwin Cook, Hilary Coon, Bru Cormand, Paul Cormican, Aiden Corvin, William H Coryell, Nicholas Craddock, David W Craig, Ian W Craig, Benedicto Crespo-Facorro, James J Crowley, David Curtis, Darina Czamara, Mark J Daly, Ariel Darvasi, Susmita Datta, Michael Davidson, Kenneth L Davis, Richard Day, Franziska Degenhardt, Lynn E DeLisi, Ditte Demontis, Bernie Devlin, Dimitris Dikeos, Timothy Dinan, Srdjan Djurovic, Enrico Domenici, Gary Donohoe, Alysa E Doyle, Elodie Drapeau, Jubao Duan, Frank Dudbridge, Naser Durmishi, Howard J Edenberg, Hannelore Ehrenreich, Peter Eichhammer, Amanda Elkin, Johan Eriksson, Valentina Escott-Price, Tonu Esko, Laurent Essioux, Bruno 
Etain, Ayman H Fanous, Stephen V Faraone, Kai-How Farh, Anne E Farmer, Martilias S Farrell, Jurgen Del Favero, Manuel A Ferreira, I Nicol Ferrier, Matthew Flickinger, Tatiana Foroud, Josef Frank, Barbara Franke, Lude Franke, Christine Fraser, Robert Freedman, Nelson B Freimer, Marion Friedl, Joseph I Friedman, Louise Frisen, Menachem Fromer, Pablo V Gejman, Giulio Genovese, Lyudmila Georgieva, Elliot S Gershon, Eco J De Geus, Ina Giegling, Michael Gill, Paola Giusti-Rodriguez, Stephanie Godard, Jacqueline I Goldstein, Vera Golimbet, Srihari Gopal, Scott D Gordon, Katherine Gordon-Smith, Jacob Gratten, Elaine K Green, Tiffany A Greenwood, Gerard Van Grootheest, Magdalena Gross, Detelina Grozeva, Weihua Guan, Hugh Gurling, Omar Gustafsson, Lieuwe de Haan, Hakon Hakonarson, Steven P Hamilton, Christian Hammer, Marian L Hamshere, Mark Hansen, Thomas F Hansen, Vahram Haroutunian, Annette M Hartmann, Martin Hautzinger, Andrew C Heath, Anjali K Henders, Frans A Henskens, Stefan Herms, Ian B Hickie, Maria Hipolito, Joel N Hirschhorn, Susanne Hoefels, Per Hoffmann, Andrea Hofman, Mads V Hollegaard, Peter A Holmans, Florian Holsboer, Witte J Hoogendijk, Jouke Jan Hottenga, David M Hougaard, Hailiang Huang, Christina M Hultman, Masashi Ikeda, Andres Ingason, Marcus Ising, Nakao Iwata, Assen V Jablensky, Stephane Jamain, Inge Joa, Edward G Jones, Ian Jones, Lisa Jones, Erik G Jonsson, Milan Macek Jr, Richard A Belliveau Jr, Antonio Julia, Tzeng JungYing, Anna K Kahler, Rene S Kahn, Luba Kalaydjieva, Radhika Kandaswamy, Sena Karachanak-Yankova, Juha Karjalainen, David Kavanagh, Matthew C Keller, Brian J Kelly, John R Kelsoe, Kenneth S Kendler, James L Kennedy, Elaine Kenny, Lindsey Kent, Jimmy Lee Chee Keong, Andrey Khrunin, Yunjung Kim, George K Kirov, Janis Klovins, Jo Knight, James A Knowles, Martin A Kohli, Daniel L Koller, Bettina Konte, Ania Korszun, Robert Krasucki, Vaidutis Kucinskas, Zita Ausrele Kucinskiene, Jonna Kuntsi, Hana Kuzelova-Ptackova, Phoenix Kwan, Mikael 
Landen, Niklas Langstrom, Mark Lathrop, Claudine Laurent, Jacob Lawrence, William B Lawson, Marion Leboyer, Phil Hyoun Lee, S Hong Lee, Sophie E Legge, Todd Lencz, Bernard Lerer, Klaus-Peter Lesch, Douglas F Levinson, Cathryn M Lewis, Jun Li, Miaoxin Li, Qingqin S Li, Tao Li, Kung-Yee Liang, Paul Lichtenstein, Jeffrey A Lieberman, Svetlana Limborska, Danyu Lin, Chunyu Liu, Jianjun Liu, Falk W Lohoff, Jouko Lonnqvist, Sandra K Loo, Carmel M Loughland, Jan Lubinski, Susanne Lucae, Donald MacIntyre, Pamela AF Madden, Patrik KE Magnusson, Brion S Maher, Pamela B Mahon, Wolfgang Maier, Anil K Malhotra, Jacques Mallet, Sara Marsal, Nicholas G Martin, Manuel Mattheisen, Keith Matthews, Morten Mattingsdal, Robert W McCarley, Steven A McCarroll, Colm McDonald, Kevin A McGhee, James J McGough, Patrick J McGrath, Peter McGuffin, Melvin G McInnis, Andrew M McIntosh, Rebecca McKinney, Alan W McLean, Francis J McMahon, Andrew McQuillin, Helena Medeiros, Sarah E Medland, Sandra Meier, Carin J Meijer, Bela Melegh, Ingrid Melle, Fan Meng, Raquelle I Mesholam-Gately, Andres Metspalu, Patricia T Michie, Christel M Middeldorp, Lefkos Middleton, Lili Milani, Vihra Milanova, Philip B Mitchell, Younes Mokrab, Grant W Montgomery, Jennifer L Moran, Gunnar Morken, Derek W Morris, Ole Mors, Preben B Mortensen, Valentina Moskvina, Bryan J Mowry, Pierandrea Muglia, Thomas W Muehleisen, Walter J Muir, Bertram Mueller-Myhsok, Kieran C Murphy, Robin M Murray, Richard M Myers, Inez Myin-Germeys, Benjamin M Neale, Michael C Neale, Mari Nelis, Stan F Nelson, Igor Nenadic, Deborah A Nertney, Gerald Nestadt, Kristin K Nicodemus, Caroline M Nievergelt, Liene Nikitina-Zake, Ivan Nikolov, Vishwajit Nimgaonkar, Laura Nisenbaum, Willem A Nolen, Annelie Nordin, Markus M Noethen, John I Nurnberger, Evaristus A Nwulia, Dale R Nyholt, Eadbhard O’Callaghan, Michael C O’Donovan, Colm O’Dushlaine, F Anthony O’Neill, Robert D Oades, Sang-Yun Oh, Ann Olincy, Line Olsen, Edwin JCG van den Oord, Roel A Ophoff, Jim 
Van Os, Urban Osby, Hogni Oskarsson, Michael J Owen, Aarno Palotie, Christos Pantelis, George N Papadimitriou, Sergi Papiol, Elena Parkhomenko, Carlos N Pato, Michele T Pato, Tiina Paunio, Milica Pejovic-Milovancevic, Brenda P Penninx, Michele L Pergadia, Diana O Perkins, Roy H Perlis, Tune H Pers, Tracey L Petryshen, Hannes Petursson, Benjamin S Pickard, Olli Pietilainen, Jonathan Pimm, Joseph Piven, Andrew J Pocklington, Porgeir Porgeirsson, Danielle Posthuma, James B Potash, John Powell, Alkes Price, Peter Propping, Ann E Pulver, Shaun M Purcell, Vinay Puri, Digby Quested, Emma M Quinn, Josep Antoni Ramos-Quiroga, Henrik B Rasmussen, Soumya Raychaudhuri, Karola Rehnstrom, Abraham Reichenberg, Andreas Reif, Mark A Reimers, Marta Ribases, John Rice, Alexander L Richards, Marcella Rietschel, Brien P Riley, Stephan Ripke, Joshua L Roffman, Lizzy Rossin, Aribert Rothenberger, Guy Rouleau, Panos Roussos, Douglas M Ruderfer, Dan Rujescu, Veikko Salomaa, Alan R Sanders, Susan Santangelo, Russell Schachar, Ulrich Schall, Martin Schalling, Alan F Schatzberg, William A Scheftner, Gerard Schellenberg, Peter R Schofield, Nicholas J Schork, Christian R Schubert, Thomas G Schulze, Johannes Schumacher, Sibylle G Schwab, Markus M Schwarz, Edward M Scolnick, Laura J Scott, Rodney J Scott, Larry J Seidman, Pak C Sham, Jianxin Shi, Paul D Shilling, Stanley I Shyn, Engilbert Sigurdsson, Teimuraz Silagadze, Jeremy M Silverman, Kang Sim, Pamela Sklar, Susan L Slager, Petr Slominsky, Susan L Smalley, Johannes H Smit, Erin N Smith, Jordan W Smoller, Hon-Cheong So, Erik Soderman, Edmund Sonuga-Barke, Chris C A Spencer, Eli A Stahl, Matthew State, Hreinn Stefansson, Kari Stefansson, Michael Steffens, Stacy Steinberg, Hans-Christoph Steinhausen, Elisabeth Stogmann, Richard E Straub, John Strauss, Eric Strengman, Jana Strohmaier, T Scott Stroup, Mythily Subramaniam, Patrick F Sullivan, James Sutcliffe, Jaana Suvisaari, Dragan M Svrakic, Jin P Szatkiewicz, Peter Szatmari, Szabolcs Szelinger, Anita Thapar, Srinivasa Thirumalai, Robert C Thompson, Draga Toncheva, Paul A Tooney, Sarah Tosato, Federica Tozzi, Jens Treutlein, Manfred Uhr, Juha Veijola, Veronica Vieland, John B Vincent, Peter M Visscher, John Waddington, Dermot Walsh, James TR Walters, Dai Wang, Qiang Wang, Stanley J Watson, Bradley T Webb, Daniel R Weinberger, Mark Weiser, Myrna M Weissman, Jens R Wendland, Thomas Werge, Thomas F Wienker, Dieter B Wildenauer, Gonneke Willemsen, Nigel M Williams, Stephanie Williams, Richard Williamson, Stephanie H Witt, Aaron R Wolen, Emily HM Wong, Brandon K Wormley, Naomi R Wray, Adam Wright, Jing Qin Wu, Hualin Simon Xi, Wei Xu, Allan H Young, Clement C Zai, Stan Zammit, Peter P Zandi, Peng Zhang, Xuebin Zheng, Fritz Zimprich, Frans G Zitman, and Sebastian Zoellner.

Genetic Consortium for Anorexia Nervosa (GCAN): Vesna Boraska Perica, Christopher S Franklin, James A B Floyd, Laura M Thornton, Laura M Huckins, Lorraine Southam, N William Rayner, Ioanna Tachmazidou, Kelly L Klump, Janet Treasure, Cathryn M Lewis, Ulrike Schmidt, Federica Tozzi, Kirsty Kiezebrink, Johannes Hebebrand, Philip Gorwood, Roger A H Adan, Martien J H Kas, Angela Favaro, Paolo Santonastaso, Fernando Fernández-Aranda, Monica Gratacos, Filip Rybakowski, Monika Dmitrzak-Weglarz, Jaakko Kaprio, Anna Keski-Rahkonen, Anu Raevuori-Helkamaa, Eric F Van Furth, Margarita C T Slof-Op’t Landt, James I Hudson, Ted Reichborn-Kjennerud, Gun Peggy S Knudsen, Palmiero Monteleone, Allan S Kaplan, Andreas Karwautz, Hakon Hakonarson, Wade H Berrettini, Yiran Guo, Dong Li, Nicholas J Schork, Gen Komaki, Tetsuya Ando, Hidetoshi Inoko, Tõnu Esko, Krista Fischer, Katrin Männik, Andres Metspalu, Jessica H Baker, Roger D Cone, Jennifer Dackor, Janiece E DeSocio, Christopher E Hilliard, Julie K O’Toole, Jacques Pantel, Jin P Szatkiewicz, Chrysecolla Taico, Stephanie Zerwas, Sara E Trace, Oliver S P Davis, Sietske Helder, Katharina Bühren, Roland Burghardt, Martina de Zwaan, Karin Egberts, Stefan Ehrlich, Beate Herpertz-Dahlmann, Wolfgang Herzog, Hartmut Imgart, André Scherag, Susann Scherag, Stephan Zipfel, Claudette Boni, Nicolas Ramoz, Audrey Versini, Marek K Brandys, Unna N Danner, Carolien de Kove, Judith Hendriks, Bobby P C Koeleman, Roel A Ophoff, Eric Strengman, Annemarie A van Elburg, Alice Bruson, Maurizio Clementi, Daniela Degortes, Monica Forzan, Elena Tenconi, Elisa Docampo, Geòrgia Escaramí, Susana Jiménez-Murcia, Jolanta Lissowska, Andrzej Rajewski, Neonila Szeszenia-Dabrowska, Agnieszka Slopien, Joanna Hauser, Leila Karhunen, Ingrid Meulenbelt, P Eline Slagboom, Alfonso Tortorella, Mario Maj, George Dedoussis, Dimitris Dikeos, Fragiskos Gonidakis, Konstantinos Tziouvas, Artemis Tsitsika, Hana Papezova, Lenka Slachtova, Debora Martaskova, James L Kennedy, Robert D 
Levitan, Zeynep Yilmaz, Julia Huemer, Doris Koubek, Elisabeth Merl, Gudrun Wagner, Paul Lichtenstein, Gerome Breen, Sarah Cohen-Woods, Anne Farmer, Peter McGuffin, Sven Cichon, Ina Giegling, Stefan Herms, Dan Rujescu, Stefan Schreiber, H-Erich Wichmann, Christian Dina, Rob Sladek, Giovanni Gambaro, Nicole Soranzo, Antonio Julia, Sara Marsal, Raquel Rabionet, Valerie Gaborieau, Danielle M Dick, Aarno Palotie, Samuli Ripatti, Elisabeth Widén, Ole A Andreassen, Thomas Espeseth, Astri Lundervold, Ivar Reinvang, Vidar M Steen, Stephanie Le Hellard, Morten Mattingsdal, Ioanna Ntalla, Vladimir Bencko, Lenka Foretova, Vladimir Janout, Marie Navratilova, Steven Gallinger, Dalila Pinto, Stephen W Scherer, Harald Aschauer, Laura Carlberg, Alexandra Schosser, Lars Alfredsson, Bo Ding, Lars Klareskog, Leonid Padyukov, Chris Finan, Gursharan Kalsi, Marion Roberts, Darren W Logan, Leena Peltonen, Graham R S Ritchie, Jeff C Barrett, Xavier Estivill, Anke Hinney, Patrick F Sullivan, David A Collier, Eleftheria Zeggini, and Cynthia M Bulik.

Wellcome Trust Case Control Consortium 3 (WTCCC3): Carl A Anderson, Jeffrey C Barrett, James A B Floyd, Christopher S Franklin, Ralph McGinnis, Nicole Soranzo, Eleftheria Zeggini, Jennifer Sambrook, Jonathan Stephens, Willem H Ouwehand, Wendy L McArdle, Susan M Ring, David P Strachan, Graeme Alexander, Cynthia M Bulik, David A Collier, Peter J Conlon, Anna Dominiczak, Audrey Duncanson, Adrian Hill, Cordelia Langford, Graham Lord, Alexander P Maxwell, Linda Morgan, Leena Peltonen, Richard N Sandford, Neil Sheerin, Frederik O Vannberg, Hannah Blackburn, Wei-Min Chen, Sarah Edkins, Mathew Gillman, Emma Gray, Sarah E Hunt, Suna Nengut-Gumuscu, Simon Potter, Stephen S Rich, Douglas Simpkin, and Pamela Whittaker.

The members of the ReproGen consortium are John RB Perry, Felix Day, Cathy E Elks, Patrick Sulem, Deborah J Thompson, Teresa Ferreira, Chunyan He, Daniel I Chasman, Tõnu Esko, Gudmar Thorleifsson, Eva Albrecht, Wei Q Ang, Tanguy Corre, Diana L Cousminer, Bjarke Feenstra, Nora Franceschini, Andrea Ganna, Andrew D Johnson, Sanela Kjellqvist, Kathryn L Lunetta, George McMahon, Ilja M Nolte, Lavinia Paternoster, Eleonora Porcu, Albert V Smith, Lisette Stolk, Alexander Teumer, Natalia Ternikova, Emmi Tikkanen, Sheila Ulivi, Erin K Wagner, Najaf Amin, Laura J Bierut, Enda M Byrne, Jouke-Jan Hottenga, Daniel L Koller, Massimo Mangino, Tune H Pers, Laura M Yerges-Armstrong, Jing Hua Zhao, Irene L Andrulis, Hoda Anton-Culver, Femke Atsma, Stefania Bandinelli, Matthias W Beckmann, Javier Benitez, Carl Blomqvist, Stig E Bojesen, Manjeet K Bolla, Bernardo Bonanni, Hiltrud Brauch, Hermann Brenner, Julie E Buring, Jenny Chang-Claude, Stephen Chanock, Jinhui Chen, Georgia Chenevix-Trench, J Margriet Collée, Fergus J Couch, David Couper, Andrea D Coviello, Angela Cox, Kamila Czene, Adamo Pio D’adamo, George Davey Smith, Immaculata De Vivo, Ellen W Demerath, Joe Dennis, Peter Devilee, Aida K Dieffenbach, Alison M Dunning, Gudny Eiriksdottir, Johan G Eriksson, Peter A Fasching, Luigi Ferrucci, Dieter Flesch-Janys, Henrik Flyger, Tatiana Foroud, Lude Franke, Melissa E Garcia, Montserrat García-Closas, Frank Geller, Eco EJ de Geus, Graham G Giles, Daniel F Gudbjartsson, Vilmundur Gudnason, Pascal Guénel, Suiqun Guo, Per Hall, Ute Hamann, Robin Haring, Catharina A Hartman, Andrew C Heath, Albert Hofman, Maartje J Hooning, John L Hopper, Frank B Hu, David J Hunter, David Karasik, Douglas P Kiel, Julia A Knight, Veli-Matti Kosma, Zoltan Kutalik, Sandra Lai, Diether Lambrechts, Annika Lindblom, Reedik Mägi, Patrik K Magnusson, Arto Mannermaa, Nicholas G Martin, Gisli Masson, Patrick F McArdle, Wendy L McArdle, Mads Melbye, Kyriaki Michailidou, Evelin Mihailov, Lili Milani, Roger L Milne, Heli Nevanlinna, Patrick Neven, Ellen A Nohr, Albertine J Oldehinkel, Ben A Oostra, Aarno Palotie, Munro Peacock, Nancy L Pedersen, Paolo Peterlongo, Julian Peto, Paul DP Pharoah, Dirkje S Postma, Anneli Pouta, Katri Pylkäs, Paolo Radice, Susan Ring, Fernando Rivadeneira, Antonietta Robino, Lynda M Rose, Anja Rudolph, Veikko Salomaa, Serena Sanna, David Schlessinger, Marjanka K Schmidt, Melissa C Southey, Ulla Sovio, Meir J Stampfer, Doris Stöckl, Anna M Storniolo, Nicholas J Timpson, Jonathan Tyrer, Jenny A Visser, Peter Vollenweider, Henry Völzke, Gerard Waeber, Melanie Waldenberger, Henri Wallaschofski, Qin Wang, Gonneke Willemsen, Robert Winqvist, Bruce HR Wolffenbuttel, Margaret J Wright, Australian Ovarian Cancer Study, The GENICA Network, kConFab, The LifeLines Cohort Study, The InterAct Consortium, Early Growth Genetics (EGG) Consortium, Dorret I Boomsma, Michael J Econs, Kay-Tee Khaw, Ruth JF Loos, Mark I McCarthy, Grant W Montgomery, John P Rice, Elizabeth A Streeten, Unnur Thorsteinsdottir, Cornelia M van Duijn, Behrooz Z Alizadeh, Sven Bergmann, Eric Boerwinkle, Heather A Boyd, Laura Crisponi, Paolo Gasparini, Christian Gieger, Tamara B Harris, Erik Ingelsson, Marjo-Riitta Järvelin, Peter Kraft, Debbie Lawlor, Andres Metspalu, Craig E Pennell, Paul M Ridker, Harold Snieder, Thorkild IA Sørensen, Tim D Spector, David P Strachan, André G Uitterlinden, Nicholas J Wareham, Elisabeth Widen, Marek Zygmunt, Anna Murray, Douglas F Easton, Kari Stefansson, Joanne M Murabito, and Ken K Ong.

## Acknowledgements

We would like to thank P. Sullivan, C. Bulik, S. Caldwell and O. Andreassen for helpful comments. This work was supported by NIH grants R01 MH101244 (ALP) and R03 CA173785 (HKF), and by the Fannie and John Hertz Foundation (HKF). The coffee that Brendan drank while writing this paper was roasted by Barismo in Arlington, MA and Blue Bottle Coffee in Oakland, CA.

Data on anorexia nervosa were obtained with funding from WTCCC3 grant WT088827/Z/09, titled “A genome-wide association study of anorexia nervosa”.

Data on glycaemic traits have been contributed by MAGIC investigators and have been downloaded from www.magicinvestigators.org.

Data on coronary artery disease / myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from www.CARDIOGRAMPLUSC4D.ORG.

We thank the International Genomics of Alzheimer’s Project (IGAP) for providing summary results data for these analyses. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in analysis or writing of this report. IGAP was made possible by the generous participation of the control subjects, the patients, and their families. The i-Select chips were funded by the French National Foundation on Alzheimer’s disease and related disorders. EADI was supported by the LABEX (laboratory of excellence program investment for the future) DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2 and the Lille University Hospital. GERAD was supported by the Medical Research Council (Grant 503480), Alzheimer’s Research UK (Grant 503176), the Wellcome Trust (Grant 082604/2/07/Z) and the German Federal Ministry of Education and Research (BMBF): Competence Network Dementia (CND) grants 01GI0102, 01GI0711 and 01GI0420. CHARGE was partly supported by the NIH/NIA grant R01 AG033193 and the NIA AG081220 and AGES contract N01-AG-12100, the NHLBI grant R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and Erasmus University. ADGC was supported by the NIH/NIA grants U01 AG032984, U24 AG021886 and U01 AG016976, and the Alzheimer’s Association grant ADGC-10-196728.

## Footnotes

↵* Co-first authors

↵† Co-last authors

↵8 A list of members and affiliations appears in the Supplementary Note.

↵1 We ignore the distinction between normalizing and centering in the population and in the sample, since this introduces only error.

↵2 The assumption that $\beta$ is drawn with equal variance for all SNPs hides an implicit assumption that rare SNPs have larger per-allele effect sizes than common SNPs. As discussed in the simulations section of the main text and in our earlier work [21], LD Score regression is robust to moderate violations of this assumption, though it may break down in extreme cases, *e.g.*, if all causal variants are rare. In situations where a different model for $\mathrm{Var}[\beta]$ is more appropriate, all proofs in this note go through with LD Score replaced by weighted LD Scores.

↵3 For instance, it is sufficient but not necessary to assume that $\beta$, $\gamma$, $\delta$ and $\epsilon$ are multivariate normal. More generally, the $z$-scores will be approximately normal if $\beta$ and $\gamma$ are reasonably polygenic. If the distribution of effect sizes is heavy-tailed, *e.g.*, if there are few causal SNPs, then the CVF may be larger.

↵4 Conditional on the marginal effect of $j$, the expected value of is not equal to $p_j$ unless $P = K$ or the marginal effect of $j$ is zero.

↵5 For $\ell_j = 100$ (roughly the median 1kG LD Score), $M = 10^{7}$ and $\rho_{g,obs} = 1$, we get $\rho_{g,obs}\ell_j/M = 10^{-5}$. A worst-case value for $N_s/N_1N_2$ might be $N_s = N_1 = N_2 = 10^{3}$, in which case $N_s/N_1N_2 = 10^{-3}$. Thus, $\rho_{g,obs}\ell_j/M$ and $N_s/N_1N_2$ will generally be at least 3 orders of magnitude smaller than 1.
