Abstract
Genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease ($h^2$). While considerably more heritability is explained by all genotyped SNPs ($h_g^2$), for most traits, much heritability remains missing ($h_g^2 < h^2$). Rare variants, poorly tagged by genotyped SNPs, are a major potential source of the gap between $h_g^2$ and $h^2$. Recent efforts to assess the contribution of both sequenced and imputed rare variants to phenotypes suggest that substantial heritability may lie in these variants. Here we analyze sequenced SNPs, imputed SNPs and haploSNPs (haplotype variants constructed from within a sample, without using a reference panel) and show that studies of heritability from these variants may be strongly confounded by subtle population stratification. For example, when meta-analyzing heritability estimates from 22 randomly ascertained case-control traits from the GERA cohort, we observe a statistically significant increase in heritability explained by imputed SNPs even after correcting for principal components (PCs) from genotyped (or imputed) SNPs. However, this increase is eliminated when correcting for stratification using PCs from a larger number of haploSNPs. We note that subtle stratification may also impact estimates of heritability from array SNPs, although we find that this is generally a less severe problem. Overall, our results suggest that estimating the heritability explained by rare variants for case-control traits requires exquisite control for population stratification, but current methods may not provide this level of control.