Abstract
A central question in human genetics is to find the proportion of variation in a trait that can be explained by genetic variation. A number of methods have been developed to estimate this quantity, termed narrow-sense heritability, from genome-wide SNP data. Recently, it has become clear that estimates of narrow-sense heritability are sensitive to modeling assumptions that relate the effect size of a SNP to its minor allele frequency (MAF) and linkage disequilibrium (LD) patterns [3]. A principled approach to estimating heritability while accounting for variation in SNP effect sizes is to apply linear mixed models (LMMs) with multiple variance components, where each variance component represents the fraction of genetic variance explained by SNPs that fall in a given range of MAF and LD values. Beyond their importance in accurately estimating genome-wide SNP heritability, multi-component LMMs are useful for partitioning the contribution of genomic annotations to trait heritability, which in turn can provide insights into the biological processes associated with the trait.
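One standard way to write the multi-component LMM described above is sketched below. The notation is assumed for illustration and is not taken from the text: y is the centered phenotype of N individuals, X_k holds the standardized genotypes of the M_k SNPs assigned to the k-th MAF/LD bin (or annotation), and \mathbf{K}_k is the corresponding genetic relatedness matrix.

```latex
\begin{aligned}
y &= \sum_{k=1}^{K} X_k \beta_k + \epsilon, \qquad
\beta_k \sim \mathcal{N}\!\left(0, \frac{\sigma_k^2}{M_k} I_{M_k}\right), \qquad
\epsilon \sim \mathcal{N}\!\left(0, \sigma_e^2 I_N\right) \\
\operatorname{Cov}(y) &= \sum_{k=1}^{K} \sigma_k^2 \mathbf{K}_k + \sigma_e^2 I_N,
\qquad \mathbf{K}_k = \frac{X_k X_k^\top}{M_k} \\
h^2_{\mathrm{SNP}} &= \frac{\sum_{k=1}^{K} \sigma_k^2}{\sum_{k=1}^{K} \sigma_k^2 + \sigma_e^2},
\qquad
\text{share of } h^2_{\mathrm{SNP}} \text{ from bin } k = \frac{\sigma_k^2}{\sum_{l=1}^{K} \sigma_l^2}
\end{aligned}
```

Letting each \sigma_k^2 vary freely is what allows the per-SNP effect-size variance to depend on MAF and LD; a single-component model forces every SNP to share the same per-SNP variance.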
Existing methods for fitting multi-component LMMs rely on maximizing the likelihood with respect to the variance components. These methods pose major computational bottlenecks that make it challenging to apply them to large-scale genomic datasets such as the UK Biobank, which contains half a million individuals genotyped at tens of millions of SNPs.
We propose a scalable algorithm, RHE-reg-mc, to jointly estimate multiple variance components in LMMs. Our algorithm is a randomized method-of-moments estimator whose runtime is observed to scale as for N individuals, M SNPs and K variance components, where B ≈ 10 is a parameter that controls the number of random matrix-vector multiplications. RHE-reg-mc also efficiently computes standard errors. We evaluate the accuracy and scalability of RHE-reg-mc both for estimating total heritability and for partitioning heritability. Because it can fit multiple variance components to SNPs partitioned according to their MAF and local LD, RHE-reg-mc obtains relatively unbiased estimates of SNP heritability under a wide range of models of genetic architecture. On the UK Biobank dataset consisting of ≈ 300,000 individuals and ≈ 500,000 SNPs, RHE-reg-mc can fit 250 variance components, corresponding to the genetic variance explained by 1 Mb blocks, in ≈ 40 minutes on standard hardware.
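To make the idea of a randomized method-of-moments estimator concrete, here is a minimal NumPy sketch of a randomized multi-component Haseman-Elston regression. It is an illustration under stated assumptions, not the RHE-reg-mc implementation: it assumes standardized genotypes held as dense in-memory matrices, Gaussian probe vectors for Hutchinson's trace estimator, and hypothetical helper names (`standardize`, `randomized_he_multi`). It solves the (K+1)×(K+1) moment equations Σ_l tr(K_k K_l) σ_l² + tr(K_k) σ_e² = yᵀK_k y (one per component) plus Σ_l tr(K_l) σ_l² + N σ_e² = yᵀy, estimating each tr(K_k K_l) from B random vectors so that no N×N relatedness matrix is ever formed. Standard errors are not computed here.

```python
import numpy as np


def standardize(G):
    """Center each SNP (column) and scale it to unit variance."""
    mu = G.mean(axis=0)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for monomorphic SNPs
    return (G - mu) / sd


def randomized_he_multi(X_blocks, y, B=10, seed=0):
    """Randomized multi-component Haseman-Elston (method-of-moments) sketch.

    X_blocks: list of standardized N x M_k genotype matrices, one per component.
    y: centered phenotype vector of length N.
    Returns the array [sigma_1^2, ..., sigma_K^2, sigma_e^2].
    """
    rng = np.random.default_rng(seed)
    N, K = y.shape[0], len(X_blocks)
    Z = rng.standard_normal((N, B))  # B Gaussian probe vectors

    # K_k Z = X_k (X_k^T Z) / M_k, computed without forming the N x N matrix K_k.
    KZ = [X @ (X.T @ Z) / X.shape[1] for X in X_blocks]

    T = np.empty((K + 1, K + 1))
    for k in range(K):
        for l in range(k, K):
            # Hutchinson: tr(K_k K_l) ~= (1/B) * sum_b (K_k z_b)^T (K_l z_b)
            T[k, l] = T[l, k] = np.sum(KZ[k] * KZ[l]) / B
        # tr(K_k) = N exactly, since every standardized column has sum of squares N.
        T[k, K] = T[K, k] = N
    T[K, K] = N  # tr(I_N)

    # Right-hand side: y^T K_k y = ||X_k^T y||^2 / M_k, and y^T y for the noise term.
    q = np.empty(K + 1)
    for k, X in enumerate(X_blocks):
        q[k] = np.sum((X.T @ y) ** 2) / X.shape[1]
    q[K] = y @ y
    return np.linalg.solve(T, q)


# Toy example: two components of 750 SNPs each, true sigma^2 = (0.3, 0.2, 0.5).
rng = np.random.default_rng(1)
N, M = 3000, 1500
p = rng.uniform(0.05, 0.5, M)  # simulated allele frequencies
X = standardize(rng.binomial(2, p, size=(N, M)).astype(float))
blocks = [X[:, :750], X[:, 750:]]
true_sigma = [0.3, 0.2]
g = sum(Xk @ rng.normal(0.0, np.sqrt(s / Xk.shape[1]), Xk.shape[1])
        for Xk, s in zip(blocks, true_sigma))
y = g + rng.normal(0.0, np.sqrt(0.5), N)
y = y - y.mean()

est = randomized_he_multi(blocks, y, B=10)
h2 = est[:-1].sum() / est.sum()  # SNP heritability implied by the estimates
print("variance components:", np.round(est, 3), "h2:", round(h2, 3))
```

The only products with the genotype data are X_kᵀZ, X_k(X_kᵀZ) and X_kᵀy, so this sketch costs on the order of N·M·B operations instead of the N²·M needed to build the relatedness matrices explicitly; any further speedups used by RHE-reg-mc itself are not reproduced here.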