## Abstract

A central question in human genetics is to find the proportion of variation in a trait that can be explained by genetic variation. A number of methods have been developed to estimate this quantity, termed narrow-sense heritability, from genome-wide SNP data. Recently, it has become clear that estimates of narrow-sense heritability are sensitive to modeling assumptions that relate the effect size of a SNP to its minor allele frequency (MAF) and linkage disequilibrium (LD) patterns [3]. A principled approach to estimating heritability while accounting for variation in SNP effect sizes is to apply linear mixed models (LMMs) with multiple variance components, where each variance component represents the fraction of genetic variance explained by SNPs that belong to a given range of MAF and LD values. Beyond their importance in accurately estimating genome-wide SNP heritability, multiple variance component LMMs are useful in partitioning the contribution of genomic annotations to trait heritability which, in turn, can provide insights into biological processes that are associated with the trait.

Existing methods for fitting multi-component LMMs rely on maximizing the likelihood of the variance components. These methods pose major computational bottlenecks that make it challenging to apply them to large-scale genomic datasets such as the UK Biobank, which contains half a million individuals genotyped at tens of millions of SNPs.

We propose a scalable algorithm, RHE-reg-mc, to jointly estimate multiple variance components in LMMs. Our algorithm is a randomized method-of-moments estimator whose runtime is observed to scale as $\mathcal{O}\left(\frac{NMB}{\max(\log_3 N, \log_3 M)} + K^3\right)$ for *N* individuals, *M* SNPs, *K* variance components, and *B* ≈ 10 being a parameter that controls the number of random matrix-vector multiplications. RHE-reg-mc also efficiently computes standard errors. We evaluate the accuracy and scalability of RHE-reg-mc for estimating the total heritability as well as for partitioning heritability. The ability to fit multiple variance components to SNPs partitioned according to their MAF and local LD allows RHE-reg-mc to obtain relatively unbiased estimates of SNP heritability under a wide range of models of genetic architecture. On the UK Biobank dataset consisting of ≈ 300,000 individuals and ≈ 500,000 SNPs, RHE-reg-mc can fit 250 variance components, corresponding to genetic variance explained by 1 Mb blocks, in ≈ 40 minutes on standard hardware.

## 1 Introduction

Heritability is a central parameter in understanding the contribution of genetic variation to trait variation [22]. Narrow-sense heritability refers to the maximal proportion of variation in a trait that can be explained by a linear function of genetic variation [22]. Over the last decade, there has been substantial attention focused on estimating narrow-sense heritability from genome-wide SNP genotype data [25]. These SNP heritability estimates are of great interest in understanding the genetic basis of complex traits and can inform strategies for designing future genetic studies. While a number of methods have been developed to estimate SNP heritability [25, 27, 11, 5, 19], recent studies have shown that the estimates of SNP heritability from these methods can be highly sensitive to modeling assumptions such as the joint distribution of the effect sizes at causal variants, their allele frequencies, and the levels of correlation or linkage disequilibrium (LD) between causal variants and the genotyped SNPs [19, 3]. An observation from these studies is that methods that assume that the effect size of each SNP on a trait comes from the same distribution can yield biased estimates of SNP heritability [3]. One potential strategy to relax the assumption of identically distributed effect sizes consists of parametrizing the distribution of SNP effect sizes in terms of relevant covariates that are expected to influence their distributions. For example, methods that partition SNPs into bins based on minor allele frequency (MAF) and local LD patterns, thereby allowing the distribution of effect sizes to vary as a function of MAF and LD, tend to be robust to variation in the underlying genetic models [3]. Such partitioning strategies are also motivated by a number of recent studies suggesting that the distribution of effect sizes of SNPs on a trait can have a complex relationship with their allele frequencies and LD [3, 4].
Beyond the dependency of SNP effect sizes on MAF and LD, the SNP effect sizes have been observed to vary as a function of a number of genomic annotations such as whether a SNP lies in the protein coding regions or in regions of open chromatin [7, 6, 4]. These observations, together, motivate statistical models that allow for the flexible modeling of genetic effects on phenotype where the genetic effects vary as a function of covariates such as MAF, LD or other genomic annotations.

Linear mixed models (LMMs) are an important class of models that permit the representation of flexible relationships between genetic variation and complex traits, *i.e.*, traits that are modulated by multiple genetic and environmental factors. LMMs have been applied successfully in linkage analysis in family studies, in genomic selection and risk prediction, as well as in association analysis to control for individual relatedness and population stratification.

LMMs [15] have been applied to estimate SNP heritability attributed to genome-wide SNPs [25]. In the simplest setting, the LMM is endowed with two parameters, *i.e.*, variance components, corresponding to the phenotypic variance explained by genetic and residual factors respectively. These models, termed *single-component* LMMs because they employ a single variance component to capture all the genetic effects, assume that the effect of each SNP on the phenotype is drawn independently from the same underlying distribution. When this assumption is violated, single-component LMMs are expected to yield biased estimates of SNP heritability. To overcome this limitation, multi-component LMMs have been proposed which assign each SNP to one of several variance components and assume that the effect sizes at SNPs assigned to a given variance component are drawn independently from the same distribution [24]. By binning SNPs according to their MAF and LD and assigning a variance component to each bin, these multi-component LMMs have been shown to yield heritability estimates that are accurate across a range of underlying genetic architectures.

While multi-component LMMs appear to be well-suited to estimate genome-wide SNP heritability, estimating their parameters, *i.e.*, the variance components, is computationally demanding. The most common approach to estimate the variance components is to search for parameter values that maximize the likelihood. Usually a particular form of maximum likelihood, the restricted maximum likelihood (REML) estimator [16], is preferred due to its reduced bias relative to the full maximum likelihood estimator. Computing maximum likelihood or REML estimators, however, can be challenging: methods for computing them rely on iterative optimization algorithms that do not scale well to large data sets like the UK Biobank, which contains ≈ 100,000 individuals and over a million SNPs. While a number of algorithmic approaches have been proposed for efficient inference in LMMs, many of these are designed to leverage the specific structure of single-component LMMs and cannot be applied to the multi-component setting [13, 28]. Thus, efficient parameter estimation in multi-component LMMs remains a challenging computational problem.

### 1.1 Our contribution

We propose a fast algorithm for estimating the variance components of linear mixed models with many variance components, based on a randomized method-of-moments (MoM) estimator. Our estimator can be viewed as a generalization of the classic Haseman-Elston (HE) regression [8] estimator to the multi-component setting, as well as of a recently proposed randomized version of HE regression for the single-component setting [23].

Being a method-of-moments estimator, our proposed estimator is statistically less efficient than REML but it is computationally attractive. Recently, a scalable estimator of variance components for a single-component LMM based on a randomized version of HE regression (RHE-reg) was proposed. This method has a runtime complexity of $\mathcal{O}\left(\frac{NMB}{\max(\log_3 N, \log_3 M)}\right)$ for *N* individuals and *M* SNPs and a parameter *B* that controls the number of random matrix-vector multiplications [23]. In this paper, we extend RHE-reg to the multiple component setting where we assume that SNPs have been classified into one of *K* different non-overlapping functional categories. Our method, RHE-reg-mc, estimates the variance component of each category in time $\mathcal{O}\left(\frac{NMB}{\max(\log_3 N, \log_3 M)} + K^3\right)$.

The time complexity of RHE-reg-mc scales with the size of the genotype matrix as long as the number of components or partitions *K* is less than (*MN*) ^{1/3} where *M* and *N* are the number of SNPs and individuals respectively. For example, in the full set of UK Biobank genotypes (*M* ≈ 10^{6}, *N* ≈ 10^{5}), this allows us to define 10^{4} variance components while maintaining scalability.

We apply RHE-reg-mc to the problem of estimating genome-wide SNP heritability when the underlying genetic architecture assumes that the effect sizes are a function of the MAF and LD patterns at the SNP. To account for variation in the SNP effect sizes, we bin SNPs according to their MAF and LD patterns (following previous approaches [26]) and assign a distinct variance component to each bin. In small-scale experiments containing around 14,000 individuals, we show that RHE-reg-mc yields approximately unbiased estimates of SNP heritability, albeit with larger standard errors relative to multi-component REML methods. RHE-reg-mc remained relatively unbiased in large-scale experiments on about 337,000 individuals. Importantly, RHE-reg-mc was the only method using individual-level genotypes that could run on these datasets. Our benchmarking experiments show that RHE-reg-mc is around 400 times faster than other state-of-the-art methods such as BOLT-REML on a dataset of 100,000 individuals and 500,000 SNPs, even when the latter were required to estimate only a single component. Therefore, RHE-reg-mc is the only method using individual-level genotypes that can run efficiently on large datasets such as the full UK Biobank, which contains half a million individuals and millions of SNPs. Beyond estimating SNP heritability, we also show that RHE-reg-mc accurately partitions heritability across 1 Mb regions, which corresponds to jointly estimating 250 variance components.

## 2 Methods

### 2.1 Motivation

Various methods have been developed to estimate narrow-sense heritability [1]. According to recent studies [18, 3], assumptions about the levels of LD, MAF and effect sizes of the underlying causal variants have significant effects on the accuracy of estimates of narrow-sense heritability. However, methods which partition SNPs into bins based on minor allele frequency (MAF) and linkage disequilibrium (LD) [24] obtain reduced bias in their estimates of heritability.

To understand why the estimation of narrow-sense heritability is sensitive to assumptions about the underlying causal variants, and why single component LMMs are expected to yield biased estimates of SNP heritability, consider the following single-component linear mixed model:

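$$
\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \tag{1}
$$

$$
\mathbb{E}[\boldsymbol{\epsilon}] = \boldsymbol{0}, \quad \operatorname{cov}(\boldsymbol{\epsilon}) = \sigma_e^2 \boldsymbol{I}_N, \quad \mathbb{E}[\boldsymbol{\beta}] = \boldsymbol{0}, \quad \operatorname{cov}(\boldsymbol{\beta}) = \frac{\sigma_g^2}{M} \boldsymbol{I}_M \tag{2}
$$

Here $\boldsymbol{y}$ is an $N$-vector of centered phenotypes, and $\boldsymbol{X}$ is an $N \times M$ matrix of standardized genotypes obtained by centering and scaling each column of the genotype matrix $\boldsymbol{G}$, where $g_{i,j} \in \{0, 1, 2\}$ denotes the number of minor alleles carried by individual $i$ at SNP $j$, and $N$ and $M$ are the number of individuals and SNPs respectively. $\boldsymbol{\beta}$ is an $M$-vector of SNP effect sizes. $\sigma_e^2$ is the residual variance while $\sigma_g^2$ is the variance component corresponding to the $M$ SNPs. In this model, the SNP heritability is defined as $h^2_{SNP} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}$. This model assumes that the variance of the effect size is the same across all SNPs. When effect sizes are coupled with MAF or LD, the mismatch between the model assumptions and data can lead to biased heritability estimates.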

To tackle this problem, the LMM can be extended to include multiple components as follows:

$$
\boldsymbol{y} = \sum_{k=1}^{K} \boldsymbol{X}_k \boldsymbol{\beta}_k + \boldsymbol{\epsilon}, \quad \mathbb{E}[\boldsymbol{\epsilon}] = \boldsymbol{0}, \quad \operatorname{cov}(\boldsymbol{\epsilon}) = \sigma_e^2 \boldsymbol{I}_N, \quad \mathbb{E}[\boldsymbol{\beta}_k] = \boldsymbol{0}, \quad \operatorname{cov}(\boldsymbol{\beta}_k) = \frac{\sigma_k^2}{M_k} \boldsymbol{I}_{M_k} \tag{3}
$$

Here each of the $M$ SNPs is assigned to one of $K$ non-overlapping categories. Each category $k$ contains $M_k$ SNPs, $k \in \{1, \ldots, K\}$, $\sum_k M_k = M$. Let $\boldsymbol{G}_k$ denote the $N \times M_k$ genotype matrix for category $k$ such that $g_{k,i,j} \in \{0, 1, 2\}$ denotes the number of minor alleles carried by individual $i$ at SNP $j$ in category $k$. Let $\boldsymbol{X}_k$ be an $N \times M_k$ matrix of standardized genotypes obtained by centering and scaling each column of $\boldsymbol{G}_k$ so that $\sum_i x_{k,i,j} = 0$ and $\sum_i x_{k,i,j}^2 = N$ for $j \in \{1, 2, \ldots, M_k\}$. Let $\boldsymbol{\beta}_k$ be an $M_k$-vector of SNP effect sizes for the $k$-th category. In this model, $\sigma_e^2$ is the residual variance, and $\sigma_k^2$ is the variance component of the $k$-th category. The total SNP heritability is defined as:

$$
h^2_{SNP} = \frac{\sum_{k=1}^{K} \sigma_k^2}{\sum_{k=1}^{K} \sigma_k^2 + \sigma_e^2} \tag{4}
$$

The SNP heritability of category $k$ is defined as:

$$
h^2_k = \frac{\sigma_k^2}{\sum_{l=1}^{K} \sigma_l^2 + \sigma_e^2} \tag{5}
$$

The model in Equation 3 has *K* partitions such that the variance of effect sizes can differ among the partitions. Partitioning SNPs into bins based on minor allele frequency (MAF) and local LD patterns allows the distribution of effect sizes to vary as a function of MAF and LD; the resulting estimates tend to be robust to variation in the underlying genetic model and can achieve unbiased estimation of SNP heritability. However, this setting poses several challenges. First, can we estimate the parameters of the model efficiently? Second, for a given genotype matrix, what is the minimum value of *K* required for an accurate estimate of heritability?

### 2.2 Method-of-moments for estimating multiple variance components

To estimate the variance components of the multi-component LMM, we use a Method-of-Moments (MoM) estimator that searches for parameter values so that the population moments are close to the sample moments. Since 𝔼 [** y**] = 0, we derived the MoM estimates by equating the population covariance to the empirical covariance. The population covariance is given by:

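$$
\operatorname{cov}(\boldsymbol{y}) = \mathbb{E}[\boldsymbol{y}\boldsymbol{y}^T] = \sum_{k=1}^{K} \sigma_k^2 \boldsymbol{K}_k + \sigma_e^2 \boldsymbol{I}_N \tag{6}
$$

Here $\boldsymbol{K}_k = \frac{\boldsymbol{X}_k \boldsymbol{X}_k^T}{M_k}$ is the genetic relatedness matrix (GRM) computed from all SNPs of the $k$-th category. Using $\boldsymbol{y}\boldsymbol{y}^T$ as our estimate of the empirical covariance, we need to solve the following least squares problem to find the variance components:

$$
(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_K^2, \hat{\sigma}_e^2) = \operatorname*{arg\,min}_{\sigma_1^2, \ldots, \sigma_K^2, \sigma_e^2} \left\| \boldsymbol{y}\boldsymbol{y}^T - \left( \sum_{k=1}^{K} \sigma_k^2 \boldsymbol{K}_k + \sigma_e^2 \boldsymbol{I}_N \right) \right\|_F^2 \tag{7}
$$

Setting the derivative with respect to each variance component to zero shows that the MoM estimator satisfies the following normal equations:

$$
\begin{bmatrix} \boldsymbol{T} & \boldsymbol{b} \\ \boldsymbol{b}^T & N \end{bmatrix}
\begin{bmatrix} \hat{\sigma}_1^2 \\ \vdots \\ \hat{\sigma}_K^2 \\ \hat{\sigma}_e^2 \end{bmatrix}
=
\begin{bmatrix} \boldsymbol{c} \\ \boldsymbol{y}^T \boldsymbol{y} \end{bmatrix} \tag{8}
$$

Here $\boldsymbol{T}$ is a $K \times K$ matrix with entries $T_{k,l} = \operatorname{tr}(\boldsymbol{K}_k \boldsymbol{K}_l)$, $k, l \in \{1, \ldots, K\}$, $\boldsymbol{b}$ is a $K$-vector with entries $b_k = \operatorname{tr}(\boldsymbol{K}_k) = N$ (because the $\boldsymbol{X}_k$ are standardized), and $\boldsymbol{c}$ is a $K$-vector with entries $c_k = \boldsymbol{y}^T \boldsymbol{K}_k \boldsymbol{y}$.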

Every GRM *K*_{k} can be computed in time 𝒪(*N*^{2} *M*_{k}) and 𝒪(*N*^{2}) memory. Given *K* GRMs, the quantities *T*_{k,l}, *c*_{k}, *k, l* ∈ {1,…, *K*}, can be computed in 𝒪(*NM*). Given the quantities *T*_{k,l}, *c*_{k}, the normal equation 8 can be solved in 𝒪(*K*^{3}). Therefore, the total time complexity for estimating the variance components is 𝒪(*N*^{2}*M* + *K*^{3}).
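As a concrete illustration of the exact (non-randomized) estimator described above, the following is a minimal NumPy sketch that builds each GRM explicitly, fills in the trace, $b$ and $c$ terms, and solves the $(K+1) \times (K+1)$ normal equations. The function names `standardize` and `exact_mom` are hypothetical and not taken from the RHE-reg-mc software; the sketch assumes no monomorphic SNPs and is only practical for small *N* because it materializes the *N* × *N* GRMs.

```python
import numpy as np

def standardize(G):
    """Center and scale each SNP column so it has mean 0 and sum of squares N."""
    G = np.asarray(G, dtype=float)
    X = G - G.mean(axis=0)
    return X / X.std(axis=0)  # assumes every SNP is polymorphic (std > 0)

def exact_mom(y, X_blocks):
    """Solve the MoM normal equations using explicitly computed GRMs.

    y        : centered phenotype vector of length N
    X_blocks : list of K standardized genotype matrices, block k is N x M_k
    Returns [sigma_1^2, ..., sigma_K^2, sigma_e^2].
    """
    N = y.shape[0]
    K = len(X_blocks)
    grms = [X @ X.T / X.shape[1] for X in X_blocks]        # K_k = X_k X_k^T / M_k
    # tr(K_k K_l) equals the sum of elementwise products because GRMs are symmetric
    T = np.array([[np.sum(Ka * Kb) for Kb in grms] for Ka in grms])
    b = np.array([np.trace(Kk) for Kk in grms])            # equals N after standardizing
    c = np.array([y @ Kk @ y for Kk in grms])
    A = np.zeros((K + 1, K + 1))
    A[:K, :K] = T
    A[:K, K] = b
    A[K, :K] = b
    A[K, K] = N
    rhs = np.append(c, y @ y)
    return np.linalg.solve(A, rhs)
```

Given the returned vector `est`, the total SNP heritability estimate would be `est[:-1].sum() / est.sum()`, following the definition of total heritability in Section 2.1.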

#### 2.2.1 RHE-reg-mc: Randomized estimator of multiple variance components

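The key bottleneck in solving the normal equation 8 is the computation of $T_{k,l}$, $k, l \in \{1, \ldots, K\}$, which takes $\mathcal{O}(N^2 M)$. Instead of computing the exact value of $T_{k,l}$, we use Hutchinson's estimator of the trace [9]. This estimator uses the fact that, for a given $N \times N$ matrix $\boldsymbol{C}$, $\boldsymbol{z}^T \boldsymbol{C} \boldsymbol{z}$ is an unbiased estimator of $\operatorname{tr}(\boldsymbol{C})$, *i.e.*, $\mathbb{E}[\boldsymbol{z}^T \boldsymbol{C} \boldsymbol{z}] = \operatorname{tr}(\boldsymbol{C})$, where $\boldsymbol{z}$ is a random vector with mean zero and covariance $\boldsymbol{I}_N$. Hence, we can estimate the values $T_{k,l}$, $k, l \in \{1, \ldots, K\}$, as follows:

$$
T_{k,l} \approx \hat{T}_{k,l} = \frac{1}{B} \sum_{b=1}^{B} \boldsymbol{z}_b^T \boldsymbol{K}_k \boldsymbol{K}_l \boldsymbol{z}_b = \frac{1}{B} \frac{1}{M_k M_l} \sum_{b=1}^{B} \boldsymbol{z}_b^T \boldsymbol{X}_k \boldsymbol{X}_k^T \boldsymbol{X}_l \boldsymbol{X}_l^T \boldsymbol{z}_b \tag{9}
$$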

Here *z*_{1},…, *z*_{B} are *B* independent random vectors with zero mean and covariance *I*_{N}. In our method, we draw these random vectors independently from a standard normal distribution. Note that computing *T*_{k,l} by using the unbiased estimator involves four matrix-vector multiplications which is repeated *B* times. Therefore, the total running time for estimating the values *T*_{k,l} is 𝒪(*NMB*).
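The randomized trace estimate can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `randomized_T` is hypothetical, dense floating-point matrices are assumed, and plain matrix products are used in place of any specialized genotype multiplication routine. It relies on the symmetry of each GRM, so that $z^T K_k K_l z = (K_k z)^T (K_l z)$, and it never forms an *N* × *N* matrix.

```python
import numpy as np

def randomized_T(X_blocks, B=10, seed=0):
    """Estimate T[k, l] = tr(K_k K_l) with B random probes, never forming a GRM.

    X_blocks : list of K standardized genotype matrices, block k is N x M_k
    """
    rng = np.random.default_rng(seed)
    N = X_blocks[0].shape[0]
    K = len(X_blocks)
    Z = rng.standard_normal((N, B))          # columns are the probes z_1, ..., z_B
    # U[k][:, b] = K_k z_b = X_k (X_k^T z_b) / M_k, two matrix products per block
    U = [X @ (X.T @ Z) / X.shape[1] for X in X_blocks]
    T_hat = np.empty((K, K))
    for k in range(K):
        for l in range(K):
            # average over probes of (K_k z_b)^T (K_l z_b)
            T_hat[k, l] = np.sum(U[k] * U[l]) / B
    return T_hat
```

Computing the products `X.T @ Z` and `X @ (...)` once per block and reusing them for every pair $(k, l)$ keeps the cost of the matrix products at $\mathcal{O}(NMB)$ in total, independent of $K$.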

Moreover, we can leverage the structure of the genotype matrix, which only contains entries in {0, 1, 2}. For a fixed genotype matrix *X*_{k}, we can improve the per-iteration time complexity of matrix-vector multiplication from 𝒪(*NM*) to $\mathcal{O}\left(\frac{NM}{\max(\log_3 N, \log_3 M)}\right)$ by using the Mailman algorithm [12]. Solving the normal equations takes 𝒪(*K*^{3}) time so that the overall time complexity of our algorithm is $\mathcal{O}\left(\frac{NMB}{\max(\log_3 N, \log_3 M)} + K^3\right)$.

### 2.3 Computing the Standard Errors of the estimates

We obtain standard errors for RHE-reg-mc using a block jackknife [10]. The jackknife is a useful resampling technique for estimating the variance and bias of an estimator [21]. A jackknife subsample is created by leaving out a subset of observations from a dataset. The jackknife estimate of a parameter is obtained by re-estimating the parameter on each subsample, where the *j*-th subsample omits the *j*-th jackknife block. A naive way to compute the jackknife estimate requires rerunning the estimator on every subsample. For instance, in our problem, if we define *J* jackknife blocks, then we need to run RHE-reg-mc for every subsample, which takes $\mathcal{O}\left(J\left(\frac{NMB}{\max(\log_3 N, \log_3 M)} + K^3\right)\right)$. We propose an efficient way to compute the jackknife estimate in time $\mathcal{O}\left(\frac{NMB}{\max(\log_3 N, \log_3 M)} + JK^3\right)$.

Let $\boldsymbol{X}$ be an $N \times M$ matrix of standardized genotypes where $N$ and $M$ are the number of individuals and SNPs respectively. To generate $J$ jackknife subsamples, we partition $\boldsymbol{X}$ into $J$ non-overlapping blocks $\boldsymbol{X}^{(1)}, \ldots, \boldsymbol{X}^{(J)}$ such that $\boldsymbol{X} = [\boldsymbol{X}^{(1)}, \boldsymbol{X}^{(2)}, \ldots, \boldsymbol{X}^{(J)}]$. Note that for every $j$, $\boldsymbol{X}^{(j)}$ is an $N \times M_j$ matrix where $M_j$ is the number of SNPs in the $j$-th block.

We create the $j$-th jackknife subsample by removing the $j$-th block $\boldsymbol{X}^{(j)}$ from $\boldsymbol{X}$. To estimate the variance components of the $j$-th jackknife subsample, we need to compute the corresponding quantities of the $j$-th subsample in the normal equations 8. Let $\boldsymbol{K}_k^{(-j)}$ be the GRM of the $k$-th partition which is created by removing the $j$-th block $\boldsymbol{X}^{(j)}$ from $\boldsymbol{X}$, where $k \in \{1, \ldots, K\}$, $j \in \{1, \ldots, J\}$. In Appendix A we show that we can compute $\boldsymbol{y}^T \boldsymbol{K}_k^{(-j)} \boldsymbol{y}$ and $\operatorname{tr}(\boldsymbol{K}_k^{(-j)} \boldsymbol{K}_l^{(-j)})$, for all $k, l \in \{1, \ldots, K\}$, $j \in \{1, \ldots, J\}$, in time $\mathcal{O}(NMB)$, which does not affect the total running time of the algorithm.

Therefore, for every jackknife subsample, we can estimate the corresponding variance components in 𝒪(*K*^{3}) by solving the corresponding normal equations 8. Given the estimates of the variance components for each jackknife subsample, we can compute the jackknife estimate of the variance, the jackknife estimate of the bias, and the bias-corrected jackknife estimate of the variance components.
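Once the leave-one-block-out estimates are available, turning them into standard errors is a short computation. The sketch below uses the standard equal-weight delete-one jackknife variance formula, $\frac{J-1}{J} \sum_j (\hat{\theta}_{(-j)} - \bar{\theta})^2$; this is an assumption, since the text does not state which weighting is used, and the function name `jackknife_se` is hypothetical.

```python
import numpy as np

def jackknife_se(theta_loo):
    """Delete-one-block jackknife standard errors.

    theta_loo has shape (J, P): row j holds the P parameter estimates
    obtained with jackknife block j left out.
    """
    theta_loo = np.asarray(theta_loo, dtype=float)
    J = theta_loo.shape[0]
    center = theta_loo.mean(axis=0)                      # mean of the leave-one-out estimates
    var = (J - 1) / J * np.sum((theta_loo - center) ** 2, axis=0)
    return np.sqrt(var)
```

Applied to the sample mean with one observation per block, this formula reproduces the usual standard error of the mean, which is a convenient sanity check.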

### 2.4 Including covariates

We can extend our multi-component model in Equation 3 to include covariates as follows:

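$$
\boldsymbol{y} = \boldsymbol{W}\boldsymbol{\alpha} + \sum_{k=1}^{K} \boldsymbol{X}_k \boldsymbol{\beta}_k + \boldsymbol{\epsilon} \tag{10}
$$

Here $\boldsymbol{W}$ is an $N \times C$ matrix of covariates while $\boldsymbol{\alpha}$ is a $C$-vector of fixed effects. In Appendix B, we show that in this setting, we need to solve the following normal equations to estimate the variance components:

$$
\begin{bmatrix} \boldsymbol{T} & \boldsymbol{b} \\ \boldsymbol{b}^T & N - C \end{bmatrix}
\begin{bmatrix} \hat{\sigma}_1^2 \\ \vdots \\ \hat{\sigma}_K^2 \\ \hat{\sigma}_e^2 \end{bmatrix}
=
\begin{bmatrix} \boldsymbol{c} \\ \boldsymbol{y}^T \boldsymbol{V} \boldsymbol{y} \end{bmatrix} \tag{11}
$$

Here $\boldsymbol{V} = \boldsymbol{I}_N - \boldsymbol{W}(\boldsymbol{W}^T \boldsymbol{W})^{-1} \boldsymbol{W}^T$, $\boldsymbol{T}$ is a $K \times K$ matrix where $T_{k,l} = \operatorname{tr}(\boldsymbol{K}_k \boldsymbol{V} \boldsymbol{K}_l \boldsymbol{V})$, $\boldsymbol{b}$ is a $K$-vector where $b_k = \operatorname{tr}(\boldsymbol{V} \boldsymbol{K}_k)$, and $\boldsymbol{c}$ is a $K$-vector where $c_k = \boldsymbol{y}^T \boldsymbol{V} \boldsymbol{K}_k \boldsymbol{V} \boldsymbol{y}$. Commonly, the number of covariates $C$ is small (tens to hundreds) so that including covariates does not significantly affect the computational cost. The cost of computing the elements of the normal equations 11 includes the cost of inverting $\boldsymbol{W}^T \boldsymbol{W}$, which is a $C \times C$ matrix, and multiplying $\boldsymbol{W}$ by a real-valued $N$-vector, which can be done in $\mathcal{O}(C^3 + NC)$.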
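In practice the projection matrix $V = I_N - W(W^TW)^{-1}W^T$ never needs to be formed: multiplying a vector by $V$ amounts to taking the residual of a least-squares fit on the covariates. The following is a minimal sketch under that observation; the function name `project_out` is hypothetical, and it uses `numpy.linalg.lstsq` rather than an explicit inverse, which is a numerical-stability choice of this example and not something stated in the text.

```python
import numpy as np

def project_out(W, A):
    """Return V A with V = I - W (W^T W)^{-1} W^T, without forming the N x N matrix V.

    W : N x C covariate matrix (assumed to have full column rank)
    A : N-vector or N x B matrix to be projected
    """
    coef, *_ = np.linalg.lstsq(W, A, rcond=None)   # least-squares fit of A on the covariates
    return A - W @ coef                            # residual = V A
```

With this helper, a quantity such as $c_k = y^T V K_k V y$ could be computed as `r = project_out(W, y)` followed by `np.sum((X_k.T @ r) ** 2) / M_k`, where `X_k` and `M_k` stand for the standardized genotypes and SNP count of category $k$.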

## 3 Results

### 3.1 Estimating total heritability by partitioning based on MAF and LD

#### 3.1.1 Simulations

We performed simulations to compare the performance of RHE-reg-mc with several state-of-the-art methods for heritability estimation that cover the spectrum of methods that have been proposed. These include single-component as well as multi-component LMMs that use likelihood maximization for parameter estimation and require access to individual-level genotypes and phenotypes. We also compared to methods that only require access to summary statistics and are typically quite scalable. GCTA and BOLT-REML estimate heritability by maximizing the restricted likelihood (REML). BOLT-REML is a computationally efficient approximate method to compute the REML estimator. GCTA-ldms is the extension of GCTA to a multi-component LMM where the variance components are typically defined by binning SNPs according to their MAF as well as local LD. LDAK is similar to GCTA, except that it assumes allelic effects are a function of LD scores. Among the summary statistic methods, LD score regression (LDSC) uses the slope from the regression of GWAS *χ*^{2} statistics on the LD scores to estimate the SNP heritability. Stratified LD score regression (S-LDSC) is an extension of LDSC for partitioning heritability from summary statistics. SumHer is the summary statistic analog of LDAK [17].

We considered two simulation settings. The small-scale setting was designed so that we could compare the accuracy of our method to state-of-the-art methods on the same data. In this setting, we simulated phenotypes from a subsampled set of genotypes from the UK Biobank [20]. Specifically, we chose *M* = 14,000 SNPs from chromosome 1 of the UK Biobank Axiom array and a subset of *N* = 9,000 individuals. In the large-scale simulation setting, we simulated phenotypes for the full set of UK Biobank genotypes consisting of *M* = 590,000 array SNPs and *N* = 337,000 unrelated white British individuals.

We simulated phenotypes from genotypes using the following model for the genetic architecture:

$$
\operatorname{var}(\beta_m) = c \, w_m^{b} \, [f_m (1 - f_m)]^{a} \tag{12}
$$

where *a* ∈ {0, 0.75}, *b* ∈ {0, 1}, *c* is a constant, and *β*_{m}, *f*_{m} and *w*_{m} are the effect size, the minor allele frequency and the LDAK score of the *m*^{th} SNP respectively. The LD score of a SNP is defined to be the sum of the squared correlation of the SNP with all other SNPs that lie within a specific distance, and the LDAK score of a SNP is computed based on local levels of LD such that the LDAK score tends to be higher for SNPs in regions of low LD [19]. The above models relating genotype to phenotype are commonly used in methods for estimating SNP heritability: the GCTA Model (*a* = *b* = 0 in Equation 12), which is used by the software GCTA [26] and LD score regression (LDSC) [2], and the LDAK Model (*a* = 0.75, *b* = 1 in Equation 12), which is used by the software LDAK [19]. Moreover, under each model, we varied the proportion and minor allele frequency (MAF) of causal variants (CVs). The proportion of causal variants was set to either 100% or 1%, and the MAF of causal variants was drawn uniformly from [0, 1] or [0.01, 0.05], so as to consider genetic architectures that are either infinitesimal or sparse, as well as architectures that include a mixture of common and rare SNPs and architectures that include only common SNPs. The GCTA Model assumes that heritability is independent of LD, while the LDAK Model assumes that heritability varies according to local levels of LD [18].

We generated 100 sets of simulated phenotypes for each setting of parameters and report accuracies averaged over these 100 sets.
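To make the simulation design concrete, here is a hedged sketch of how phenotypes could be drawn under this family of architectures. It assumes that the per-SNP effect variance is proportional to $w_m^b [f_m(1-f_m)]^a$ for causal SNPs and zero otherwise, that effects and noise are Gaussian, and that the constant $c$ is chosen so the effect variances sum to the target heritability `h2`. The function name `simulate_phenotype` and all of its parameters are hypothetical and are not taken from the paper's simulation code.

```python
import numpy as np

def simulate_phenotype(X, maf, w, h2, a=0.75, b=1.0, causal_frac=1.0, seed=0):
    """Simulate one phenotype from standardized genotypes X (N x M).

    maf, w      : per-SNP minor allele frequencies and LD-related weights (length M)
    h2          : target total heritability in [0, 1]
    a, b        : exponents on f(1 - f) and on w; a = b = 0 gives equal variances
    causal_frac : expected fraction of SNPs with a nonzero effect
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    causal = rng.random(M) < causal_frac
    # per-SNP effect variance, up to the constant c
    var = causal * w ** b * (maf * (1 - maf)) ** a
    var = var * h2 / var.sum()            # fix c so the variances sum to h2
    beta = rng.standard_normal(M) * np.sqrt(var)
    noise = rng.standard_normal(N) * np.sqrt(1 - h2)
    return X @ beta + noise, beta
```

Setting `a=0, b=0` corresponds to the case in which every causal SNP has the same effect variance, while `a=0.75, b=1` ties the variance to both allele frequency and the LD-based weight.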

#### 3.1.2 Accuracy

We compared the accuracy of RHE-reg-mc with the most popular single-component methods, namely BOLT-REML [14], GCTA [26], LDAK [19], LDSC [2] and SumHer [17], as well as with multi-component methods such as GCTA-ldms and S-LDSC (BOLT-REML can also estimate multiple variance components but we did not explore this option). In the small-scale setting, we compared RHE-reg-mc with BOLT-REML, GCTA, GCTA-ldms, and LDAK, each of which uses individual-level phenotypes and genotypes [19, 2, 26, 17]. In the large-scale setting, we only compared RHE-reg-mc with methods that use summary statistics, namely SumHer, LDSC, and S-LDSC, because the existing methods based on individual-level phenotypes and genotypes do not scale to large data sets. For the multi-component methods, we applied GCTA-ldms by binning SNPs into 8 bins formed by the combination of 2 bins based on MAF (MAF < 0.05, MAF > 0.05) and 4 bins based on quartiles of the LD score of a SNP. Here MAF refers to the frequency at which the second most common allele occurs at a SNP, and the LD score of a SNP is defined to be the sum of the squared correlation of the SNP with all other SNPs that lie within a specific distance. We used the default LD score computation that is used by GCTA. For RHE-reg-mc, we used 16 bins formed by the combination of 4 bins based on MAF (MAF < 0.009, 0.009 < MAF < 0.011, 0.011 < MAF < 0.05, 0.05 < MAF < 0.5) and 4 bins based on quartiles of the LDAK score of a SNP. The LDAK score of a SNP is computed based on local levels of LD such that the LDAK score tends to be higher for SNPs in regions of low LD [19].
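The binning step can be sketched as follows. This is an illustrative example only: the function name `assign_bins` is hypothetical, the LD-related score is treated as a precomputed input, and the quantile cut points are computed over all SNPs rather than within each MAF bin, which is an assumption because the text does not say which convention is used.

```python
import numpy as np

def assign_bins(maf, ld_score, maf_edges=(0.05,), n_ld_bins=4):
    """Assign each SNP to a category index from its MAF bin and LD-score quantile bin.

    maf_edges : interior MAF cut points, e.g. (0.05,) for two MAF bins
    n_ld_bins : number of quantile bins for the LD-related score
    Returns integer labels in 0 .. (len(maf_edges) + 1) * n_ld_bins - 1.
    """
    maf = np.asarray(maf, dtype=float)
    ld_score = np.asarray(ld_score, dtype=float)
    maf_bin = np.digitize(maf, maf_edges)                        # 0 .. len(maf_edges)
    cuts = np.quantile(ld_score, np.linspace(0, 1, n_ld_bins + 1)[1:-1])
    ld_bin = np.digitize(ld_score, cuts)                         # 0 .. n_ld_bins - 1
    return maf_bin * n_ld_bins + ld_bin
```

The resulting labels can be used to split the columns of the standardized genotype matrix into the *K* blocks, for example with `X[:, labels == k]`. Passing `maf_edges=(0.009, 0.011, 0.05)` would give four MAF bins.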

Figure 1 and Table 1 show that both multi-component methods, *i.e.*, GCTA-ldms and RHE-reg-mc, have the least bias across the settings considered. RHE-reg-mc tends to have larger standard errors relative to GCTA-ldms consistent with the lower statistical efficiency of method-of-moments estimators relative to REML estimators. In the large-scale setting (Figure 2 and Table 2), RHE-reg-mc has reduced bias as well as low standard errors compared to other methods.

### 3.2 Computational Efficiency

We compared the running time and memory usage of RHE-reg-mc with those of GCTA [26] and BOLT-REML [14]. In this comparison, we used the UK Biobank genotypes consisting of around 500,000 SNPs over different sample sizes. For each data set, we ran RHE-reg-mc with *B* = 10 random vectors and 22 bins (SNPs partitioned by chromosome). We ran GCTA and BOLT-REML for a single variance component. All computations were restricted to a single core on a standard compute machine.

Figure 3 shows that we could not run single-component GCTA on sample sizes beyond 50,000 due to memory constraints. Single-component BOLT-REML took about 5 days to run on 100,000 individuals while RHE-reg-mc took about 20 minutes. Our benchmarking experiments show that RHE-reg-mc is around 400 times faster than other state-of-the-art methods such as BOLT-REML on a dataset of 100,000 individuals and 500,000 SNPs. Extrapolating this result, we expect that RHE-reg-mc could run efficiently on large datasets such as the full UK Biobank, which contains half a million individuals genotyped at tens of millions of SNPs. The memory usage of RHE-reg-mc is linear with respect to sample size. The running time and accuracy of RHE-reg-mc depend on the choice of the number of random vectors *B*. In practice, the estimator is highly accurate with a small *B* ≈ 10 across all datasets analyzed.

### 3.3 Partitioning heritability

To examine the ability of RHE-reg-mc to partition the heritability explained across genomic regions, we partitioned SNPs into 1 Mb regions such that the first 50 bins each explain 1% of the phenotypic variance while the remaining bins have zero heritability. Figure 4 shows that RHE-reg-mc accurately estimates heritability in each partition on 100,000 individuals and 460,000 SNPs. Further, RHE-reg-mc computed the partitioned heritability in 20 minutes.

### 3.4 Application to phenotypes in the UK Biobank

Finally, we used RHE-reg-mc to partition the heritability of three phenotypes (trunk fat percentage, diastolic blood pressure, and systolic blood pressure) on the UK Biobank dataset consisting of around 500,000 SNPs and 300,000 individuals. Partitioning SNPs according to the 22 chromosomes, we see that longer chromosomes have higher heritability, consistent with studies of other traits such as height [25] (Figure 5).

## 4 Discussion

We have described RHE-reg-mc, a scalable estimator of multiple variance components in linear mixed models. RHE-reg-mc uses a randomized Method-of-Moments estimator to estimate a large number of variance components on datasets with hundreds of thousands of individuals and SNPs in less than an hour. The ability to estimate multiple variance components efficiently is useful both in obtaining unbiased genome-wide SNP heritability estimates as well as in heritability partitioning analyses.

There are several ways to further improve the runtime and memory usage of RHE-reg-mc. First, in the context of multiple phenotypes measured for a fixed genotype matrix, the matrix *T* in normal equation 8 is the same for every phenotype. Hence, we only need to compute the sufficient statistics of the matrix *T* once in the multiple phenotype setting. Second, RHE-reg-mc can perform its computation in a streaming fashion, which can lead to a highly memory-efficient implementation.

## A Computing jackknife standard errors

For simplicity, here we explain how we can estimate the corresponding parameters of jackknife subsamples when we have only one variance component (one partition). It is easy to see how this approach can be generalized to the multiple variance components setting.

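Let $\boldsymbol{X}$ be an $N \times M$ matrix of standardized genotypes where $N$ and $M$ are the number of individuals and SNPs respectively. We partition $\boldsymbol{X}$ into $J$ non-overlapping blocks $\boldsymbol{X}^{(1)}, \ldots, \boldsymbol{X}^{(J)}$ such that $\boldsymbol{X} = [\boldsymbol{X}^{(1)}, \ldots, \boldsymbol{X}^{(J)}]$. Note that for every $j$, $\boldsymbol{X}^{(j)}$ is an $N \times M_j$ matrix where $M_j$ is the number of SNPs in the $j$-th block.

Let $\boldsymbol{K} = \frac{\boldsymbol{X}\boldsymbol{X}^T}{M}$ be the GRM. Moreover, let $\boldsymbol{K}^{(-j)} = \frac{1}{M - M_j}\left(\boldsymbol{X}\boldsymbol{X}^T - \boldsymbol{X}^{(j)}\boldsymbol{X}^{(j)T}\right)$ be the GRM of the subsample created by removing the $j$-th block from $\boldsymbol{X}$. We will show how the values $\boldsymbol{y}^T \boldsymbol{K}^{(-j)} \boldsymbol{y}$ and $\operatorname{tr}(\boldsymbol{K}^{(-j)}\boldsymbol{K}^{(-j)})$ can be computed for every $j \in \{1, \ldots, J\}$ efficiently.

Since $\boldsymbol{X}\boldsymbol{X}^T = \sum_{j=1}^{J} \boldsymbol{X}^{(j)}\boldsymbol{X}^{(j)T}$, we have:

$$
\boldsymbol{y}^T \boldsymbol{K} \boldsymbol{y} = \frac{1}{M} \sum_{j=1}^{J} \left\| \boldsymbol{X}^{(j)T} \boldsymbol{y} \right\|^2
$$

Therefore, for every $j$ we have:

$$
\boldsymbol{y}^T \boldsymbol{K}^{(-j)} \boldsymbol{y} = \frac{1}{M - M_j} \left( M \, \boldsymbol{y}^T \boldsymbol{K} \boldsymbol{y} - \left\| \boldsymbol{X}^{(j)T} \boldsymbol{y} \right\|^2 \right)
$$

Therefore, we can compute $\boldsymbol{y}^T \boldsymbol{K}^{(-j)} \boldsymbol{y}$, for every $j$, and $\boldsymbol{y}^T \boldsymbol{K} \boldsymbol{y}$ in $\mathcal{O}(NM)$.

We rewrite the randomized estimate of $\operatorname{tr}(\boldsymbol{K}\boldsymbol{K})$ as:

$$
\widehat{\operatorname{tr}}(\boldsymbol{K}\boldsymbol{K}) = \frac{1}{B M^2} \sum_{b=1}^{B} \boldsymbol{z}_b^T \boldsymbol{X}\boldsymbol{X}^T \boldsymbol{X}\boldsymbol{X}^T \boldsymbol{z}_b
$$

Assume that $\boldsymbol{v}_b = \boldsymbol{X}^T \boldsymbol{z}_b$, then we have:

$$
\widehat{\operatorname{tr}}(\boldsymbol{K}\boldsymbol{K}) = \frac{1}{B M^2} \sum_{b=1}^{B} (\boldsymbol{X}\boldsymbol{v}_b)^T (\boldsymbol{X}\boldsymbol{v}_b)
$$

Recall that $\boldsymbol{v}_b$ is a vector of length $M$. We partition $\boldsymbol{v}_b$ into $J$ sub-vectors corresponding to the partitions of $\boldsymbol{X}$: $\boldsymbol{v}_{b1}, \ldots, \boldsymbol{v}_{bJ}$ such that $\boldsymbol{v}_b = [\boldsymbol{v}_{b1}, \ldots, \boldsymbol{v}_{bJ}]^T$ and $\boldsymbol{v}_{bj} = \boldsymbol{X}^{(j)T}\boldsymbol{z}_b$ is a sub-vector of length $M_j$, for every $j \in \{1, \ldots, J\}$.

Assume that $\boldsymbol{c}_{bj} = \boldsymbol{X}^{(j)} \boldsymbol{v}_{bj}$ for every $j \in \{1, \ldots, J\}$ and $b \in \{1, \ldots, B\}$. Let $C_{total} = \sum_{b=1}^{B} \left\| \sum_{j=1}^{J} \boldsymbol{c}_{bj} \right\|^2$. We then have:

$$
\widehat{\operatorname{tr}}(\boldsymbol{K}\boldsymbol{K}) = \frac{C_{total}}{B M^2}
$$

Let $\boldsymbol{d}_b = \boldsymbol{c}_{b1} + \ldots + \boldsymbol{c}_{bJ}$ for every $b \in \{1, \ldots, B\}$. Now we can compute $\widehat{\operatorname{tr}}(\boldsymbol{K}^{(-j)}\boldsymbol{K}^{(-j)})$ for every $j \in \{1, \ldots, J\}$ based on the values of $C_{total}$, $\boldsymbol{d}_b$ and $\boldsymbol{c}_{bj}$, in a way that does not affect the total running time:

$$
\widehat{\operatorname{tr}}(\boldsymbol{K}^{(-j)}\boldsymbol{K}^{(-j)}) = \frac{1}{B (M - M_j)^2} \sum_{b=1}^{B} \left\| \boldsymbol{d}_b - \boldsymbol{c}_{bj} \right\|^2 = \frac{1}{B (M - M_j)^2} \left( C_{total} - 2 \sum_{b=1}^{B} \boldsymbol{d}_b^T \boldsymbol{c}_{bj} + \sum_{b=1}^{B} \left\| \boldsymbol{c}_{bj} \right\|^2 \right)
$$

It is not hard to see that we can compute $C_{total}$ in $\mathcal{O}(BNM)$. Moreover, we can compute $\boldsymbol{c}_{bj}$ for every $b = 1, \ldots, B$ and $j = 1, \ldots, J$ in $\mathcal{O}(NM_j)$, so all $\boldsymbol{c}_{bj}$ can be computed in $\mathcal{O}(BNM)$. We can again use the Mailman algorithm to speed up the vector-matrix multiplications. Therefore, the total running time for computing the variance components of all jackknife subsamples is $\mathcal{O}\left(\frac{NMB}{\max(\log_3 N, \log_3 M)} + JK^3\right)$.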
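The leave-one-block-out trace computation can be illustrated with a short NumPy sketch for the single-component case. It relies on the identity that the product of $\boldsymbol{X}\boldsymbol{X}^T$ with a probe vector is the sum of the per-block products, so dropping block $j$ only requires subtracting that block's contribution. The function name `loo_trace_estimates` is hypothetical, and the sketch stores all per-block products in memory, which is a simplification made for readability.

```python
import numpy as np

def loo_trace_estimates(X_blocks, Z):
    """Randomized estimates of tr(K^(-j) K^(-j)) for every jackknife block j.

    X_blocks : list of J arrays, block j has shape (N, M_j)
    Z        : (N, B) array whose columns are the random probes z_b
    """
    B = Z.shape[1]
    M = sum(X.shape[1] for X in X_blocks)
    # c[j][:, b] = X^(j) X^(j)^T z_b, computed once per block
    c = [X @ (X.T @ Z) for X in X_blocks]
    d = sum(c)                                   # d[:, b] = X X^T z_b
    out = np.empty(len(X_blocks))
    for j, X in enumerate(X_blocks):
        r = d - c[j]                             # product with block j removed, for all probes
        out[j] = np.sum(r * r) / (B * (M - X.shape[1]) ** 2)
    return out
```

Each block is touched once when forming `c`, after which every leave-one-out estimate costs only $\mathcal{O}(NB)$ additional work.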

## B Including Covariates

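We consider the multi-component model with covariates:

$$
\boldsymbol{y} = \boldsymbol{W}\boldsymbol{\alpha} + \sum_{k=1}^{K} \boldsymbol{X}_k \boldsymbol{\beta}_k + \boldsymbol{\epsilon}
$$

Here $\boldsymbol{W}$ is an $N \times C$ matrix of covariates while $\boldsymbol{\alpha}$ is a $C$-vector of coefficients. It is easy to see that the matrix $\boldsymbol{V} = \boldsymbol{I}_N - \boldsymbol{W}(\boldsymbol{W}^T\boldsymbol{W})^{-1}\boldsymbol{W}^T$ is symmetric and idempotent ($\boldsymbol{V}^2 = \boldsymbol{V}$) of rank $N - C$. Therefore, we consider the eigendecomposition $\boldsymbol{V} = \boldsymbol{E}\boldsymbol{D}\boldsymbol{E}^T$, where $\boldsymbol{D}$ is a diagonal matrix with $N - C$ ones and $C$ zeros on the diagonal (we can assume that the first $N - C$ elements are one). Now let the $N \times (N - C)$ matrix $\boldsymbol{U}$ represent the first $N - C$ columns of $\boldsymbol{E}$. It is not hard to see that $\boldsymbol{U}$ satisfies $\boldsymbol{U}^T\boldsymbol{U} = \boldsymbol{I}_{N-C}$, $\boldsymbol{U}\boldsymbol{U}^T = \boldsymbol{V}$, and $\boldsymbol{U}^T\boldsymbol{W} = \boldsymbol{0}$. Now we multiply both sides of the above equation by $\boldsymbol{U}^T$:

$$
\boldsymbol{U}^T\boldsymbol{y} = \sum_{k=1}^{K} \boldsymbol{U}^T\boldsymbol{X}_k\boldsymbol{\beta}_k + \boldsymbol{U}^T\boldsymbol{\epsilon}
$$

The matrix $\boldsymbol{U}^T$ is constant and the vector $\boldsymbol{y}$ is random. Therefore, we have $\mathbb{E}[\boldsymbol{U}^T\boldsymbol{y}] = \boldsymbol{U}^T\mathbb{E}[\boldsymbol{y}]$. Hence

$$
\mathbb{E}[\boldsymbol{U}^T\boldsymbol{y}] = \boldsymbol{U}^T\boldsymbol{W}\boldsymbol{\alpha} = \boldsymbol{0}
$$

Using $\mathbb{E}[\boldsymbol{U}^T\boldsymbol{y}] = \boldsymbol{0}$, we have:

$$
\operatorname{cov}(\boldsymbol{U}^T\boldsymbol{y}) = \mathbb{E}[\boldsymbol{U}^T\boldsymbol{y}\boldsymbol{y}^T\boldsymbol{U}] = \sum_{k=1}^{K} \sigma_k^2 \, \boldsymbol{U}^T\boldsymbol{K}_k\boldsymbol{U} + \sigma_e^2 \boldsymbol{I}_{N-C}
$$

The MoM estimator is obtained by solving the following ordinary least squares problem:

$$
(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_K^2, \hat{\sigma}_e^2) = \operatorname*{arg\,min}_{\sigma_1^2, \ldots, \sigma_K^2, \sigma_e^2} \left\| \boldsymbol{U}^T\boldsymbol{y}\boldsymbol{y}^T\boldsymbol{U} - \left( \sum_{k=1}^{K} \sigma_k^2 \, \boldsymbol{U}^T\boldsymbol{K}_k\boldsymbol{U} + \sigma_e^2 \boldsymbol{I}_{N-C} \right) \right\|_F^2
$$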

## 5 Acknowledgments

This research was conducted using the UK Biobank Resource under applications 33127 and 33297. We thank the participants of UK Biobank for making this work possible. We thank Rob Brown for feedback on this manuscript. This work was funded by NIH grants R01HG009120 (B.P. and K.S.B.), R35GM125055 (S.S.), an Alfred P. Sloan Research Fellowship (S.S.), and a NSF grant III-1705121 (Y.W. and S.S.).