Abstract
Identifying genetic correlations between complex traits and diseases can provide useful etiological insights and help prioritize likely causal relationships. Two major challenges prevent estimation of genetic correlation from genome-wide association study (GWAS) data with current methods: individual-level genotype data are often unavailable, and sample overlap among meta-analyses is widespread. We circumvent these difficulties by introducing a technique for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap. We use our method to estimate 300 genetic correlations among 25 traits, totaling more than 1.5 million unique phenotype measurements. Our results include genetic correlations between anorexia nervosa and schizophrenia, between anorexia and obesity, and between educational attainment and several diseases. These results highlight the power of genome-wide analyses, since there are currently no genome-wide significant SNPs for anorexia nervosa and only three for educational attainment.
Footnotes
* Co-first authors
† Co-last authors
8 A list of members and affiliations appears in the Supplementary Note.
1 We ignore the distinction between normalizing and centering in the population and in the sample, since this introduces only $O(1/N)$ error.
2 The assumption that β is drawn with equal variance for all SNPs hides an implicit assumption that rare SNPs have larger per-allele effect sizes than common SNPs. As discussed in the simulations section of the main text and in our earlier work [21], LD Score regression is robust to moderate violations of this assumption, though it may break down in extreme cases, e.g., if all causal variants are rare. In situations where a different model for Var[β] is more appropriate, all proofs in this note go through with LD Scores replaced by weighted LD Scores.
3 For instance, it is sufficient but not necessary to assume that β, γ, δ and ϵ are multivariate normal. More generally, the z-scores will be approximately normal if β and γ are reasonably polygenic. If the distribution of effect sizes is heavy-tailed, e.g., if there are few causal SNPs, then the CVF may be larger.
4 Conditional on the marginal effect of j, the expected value of is not equal to pj unless P = K or the marginal effect of j is zero.
5 For $\ell_j = 100$ (roughly the median 1kG LD Score), $M = 10^7$ and $\rho_{g,\mathrm{obs}} = 1$, we get $\rho_{g,\mathrm{obs}}\ell_j/M = 10^{-5}$. A worst-case value for $N_s/(N_1 N_2)$ might be $N_s = N_1 = N_2 = 10^3$, in which case $N_s/(N_1 N_2) = 10^{-3}$. Thus, $\rho_{g,\mathrm{obs}}\ell_j/M$ and $N_s/(N_1 N_2)$ will generally be at least 3 orders of magnitude smaller than 1.
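The arithmetic in footnote 5 can be checked with a short Python sketch. The variable names (`ld_score`, `num_snps`, `rho_g_obs`, `n_shared`, `n1`, `n2`) and the helper `orders_below_one` are hypothetical labels chosen for illustration; only the numerical values come from the footnote.

```python
import math

def orders_below_one(x: float) -> float:
    """Number of orders of magnitude by which x is smaller than 1."""
    return -math.log10(x)

# Values taken from footnote 5; the names are illustrative assumptions.
ld_score = 100        # LD Score of SNP j (roughly the median 1kG LD Score)
num_snps = 10**7      # M, the number of SNPs
rho_g_obs = 1.0       # observed-scale genetic covariance

# First small term: rho_g_obs * l_j / M
ld_term = rho_g_obs * ld_score / num_snps

# Second small term in the worst case where all sample sizes equal 10^3:
# N_s / (N_1 * N_2)
n_shared = n1 = n2 = 10**3
overlap_term = n_shared / (n1 * n2)

print(f"rho_g_obs * l_j / M = {ld_term:.1e}")
print(f"N_s / (N_1 * N_2)   = {overlap_term:.1e}")
print(f"orders below 1: {orders_below_one(ld_term):.1f}, "
      f"{orders_below_one(overlap_term):.1f}")
```

Under these inputs, the sample-overlap term is the larger of the two, so it sets the bound of three orders of magnitude quoted in the footnote.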