An Atlas of Genetic Correlations across Human Diseases and Traits

Brendan Bulik-Sullivan; Hilary K Finucane; Verneri Anttila; Alexander Gusev; Felix R. Day; ReproGen Consortium; Psychiatric Genomics Consortium; Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3; Laramie Duncan; John R. B. Perry; Nick Patterson; Elise B. Robinson; Mark J. Daly; Alkes L. Price; Benjamin M. Neale

doi:10.1101/014498

Abstract

Identifying genetic correlations between complex traits and diseases can provide useful etiological insights and help prioritize likely causal relationships. The major challenges preventing estimation of genetic correlation from genome-wide association study (GWAS) data with current methods are the lack of availability of individual genotype data and widespread sample overlap among meta-analyses. We circumvent these difficulties by introducing a technique for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap. We use our method to estimate 300 genetic correlations among 25 traits, totaling more than 1.5 million unique phenotype measurements. Our results include genetic correlations between anorexia nervosa and schizophrenia and between anorexia nervosa and obesity, as well as associations between educational attainment and several diseases. These results highlight the power of genome-wide analyses, since there are currently no genome-wide significant SNPs for anorexia nervosa and only three for educational attainment.

Footnotes

↵* Co-first authors
↵† Co-last authors
↵8 A list of members and affiliations appears in the Supplementary Note.
↵1 We ignore the distinction between normalizing and centering in the population and in the sample, since this introduces only error.
↵2 The assumption that β is drawn with equal variance for all SNPs hides an implicit assumption that rare SNPs have larger per-allele effect sizes than common SNPs. As discussed in the simulations section of the main text and in our earlier work [21], LD Score regression is robust to moderate violations of this assumption, though it may break down in extreme cases, e.g., if all causal variants are rare. In situations where a different model for Var[β] is more appropriate, all proofs in this note go through with LD Scores replaced by weighted LD Scores.
↵3 For instance, it is sufficient but not necessary to assume that β, γ, δ and ϵ are multivariate normal. More generally, the z-scores will be approximately normal if β and γ are reasonably polygenic. If the distribution of effect sizes is heavy-tailed, e.g., if there are few causal SNPs, then the CVF may be larger.
↵4 Conditional on the marginal effect of j, the expected value of is not equal to p_j unless P = K or the marginal effect of j is zero.
↵5 For ℓ_j = 100 (roughly the median 1kG LD Score), M = 10⁷ and ρ_g,obs = 1, we get ρ_g,obsℓ_j/M = 10⁻⁵. A worst-case value for N_s/N₁N₂ might be N_s = N₁ = N₂ = 10³, in which case N_s/N₁N₂ = 10⁻³. Thus, ρ_g,obsℓ_j/M and N_s/N₁N₂ will generally be at least 3 orders of magnitude smaller than 1.
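
The arithmetic in footnote 5 can be checked with a short Python sketch. The function name `footnote5_terms` and its parameter names are hypothetical, chosen only for this illustration; the default values are the ones quoted in the footnote, and nothing here is part of the authors' software.

```python
import math


def footnote5_terms(ell_j=100, M=10**7, rho_g_obs=1.0,
                    N_s=10**3, N_1=10**3, N_2=10**3):
    """Return the two small quantities compared against 1 in footnote 5.

    Hypothetical helper for illustration; defaults are the footnote's values.
    """
    # rho_g,obs * ell_j / M, using the footnote's values for each symbol
    genetic_term = rho_g_obs * ell_j / M
    # N_s / (N_1 * N_2), using the footnote's worst-case value N_s = N_1 = N_2
    overlap_term = N_s / (N_1 * N_2)
    return genetic_term, overlap_term


genetic_term, overlap_term = footnote5_terms()

# Express each term as a power of ten to compare its magnitude with 1.
print("log10(rho_g,obs * ell_j / M) =", round(math.log10(genetic_term)))
print("log10(N_s / (N_1 * N_2))     =", round(math.log10(overlap_term)))
```

With the footnote's values, the first term is 10⁻⁵ and the second is 10⁻³, so the larger of the two is still three orders of magnitude below 1.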