## Abstract

In order to infer that a single-nucleotide polymorphism (SNP) either affects a phenotype or is in linkage disequilibrium (LD) with a causal site, we must have some assurance that any SNP-phenotype correlation is not the result of confounding with some environmental variable that also affects the trait. Here we provide a mathematical analysis of LD Score regression, a recently developed method for using summary statistics from genome-wide association studies (GWAS) to ensure that confounding does not inflate the number of false positives. We do not treat the effects of genetic variation as a random variable and thus are able to obtain results about the unbiasedness of this method. We demonstrate that LD Score regression can produce unbiased estimates of confounding at null SNPs under very general conditions. This robustness holds even in cases now thought to be unfavorable, such as a correlation over SNPs between LD Scores and the degree of confounding. LD Score regression is thus an even stronger technique for causal inference than foreseen by its developers. Additionally, we demonstrate that LD Score regression produces unbiased estimates of the genetic correlation, even when its estimates of the genetic covariance and the two univariate heritabilities are substantially biased.

## 1 Introduction

The goal of genome-wide association studies (GWAS) is to find loci in the genome where variation affects a phenotype. However, this must be accomplished from observed correlations, and inferring causation from correlation is a famously perilous endeavor (Freedman, 1999; Pearl, 2009). GWAS has been fortunate in that it offers a variety of methods to check whether confounding effects have produced spurious correlations between genetic and phenotypic variation. These methods have led to a strong consensus that confounding has a minimal impact on GWAS results (Goldstein, 2011; Visscher, Brown, McCarthy, & Yang, 2012; Lee, 2012; Lee, Vattikuti, & Chow, 2016).

One of the newer methods used to check the causal status of GWAS associations is known as LD Score regression (Bulik-Sullivan et al., 2015b), which can be applied to summary statistics assembled from the contributions of different research groups and thus does not require access to individual-level data. This ingenious technique relies on the simple linear regression of assayed single-nucleotide polymorphism (SNP) *j*’s association chi-square statistic on
the sum over all SNPs of each SNP’s squared correlation with the focal SNP *j*. This latter quantity is called SNP *j*’s “LD Score.” Empirically, the regression curve relating chi-square statistics to LD Scores is always very close to an upwardly sloping straight line. This result is explicable because a SNP tagging more of its neighbors—and, thus, having a higher LD Score—is more likely to tag one or more causal sites affecting the phenotype. The lowest possible LD Score of a SNP is in fact one, which is obtained when a SNP is in perfect linkage equilibrium (LE) with all other SNPs. A hypothetical SNP with an LD Score of zero, then, fails to tag the causal effect of any SNP in the genome—including whatever effect the SNP itself may have. Therefore, if the intercept of LD Score regression departs upward from unity (the theoretical expectation of the chi-square distribution with one degree of freedom), the departure must be due to confounding, poor quality control, overlapping samples in the meta-analysis, or other artifacts. This simple and insightful method of estimating the distribution of truly null SNPs should in most cases lead to a much better global correction of the association statistics than the overly conservative genomic control (Devlin & Roeder, 1999), and the first published application of this method suggested that the vast majority of chi-square inflation in the GWAS meta-analysis of schizophrenia was attributable to polygenic causal signal rather than confounding (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014).
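The mechanics of the method can be made concrete with a small numerical sketch. The code below is ours, not the developers' implementation: it builds a toy reference panel, computes each SNP's LD Score as the sum of its squared correlations with all SNPs, and regresses synthetic chi-square statistics, generated directly from the expected linear relation with a confounding term *a*, on those scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy reference panel: n_ref individuals, p SNPs with neighbor LD
n_ref, p = 5000, 200
base = rng.standard_normal((n_ref, p))
X = base.copy()
X[:, 1:] += 0.8 * base[:, :-1]        # mix each SNP with its left neighbor
X = (X - X.mean(0)) / X.std(0)        # standardize genotypes

Gamma = X.T @ X / n_ref               # SNP-SNP correlation matrix
ld_scores = (Gamma ** 2).sum(axis=1)  # l_j = sum_k Gamma_jk^2

# Synthetic statistics following E(chi2_j) = (n/p) h2 l_j + n a + 1
n_gwas, h2, a = 50_000, 0.5, 1e-5
chi2 = (n_gwas / p) * h2 * ld_scores + n_gwas * a + 1

slope, intercept = np.polyfit(ld_scores, chi2, 1)
print(round(intercept, 3))            # 1.5 = n*a + 1, the confounding term
print(round(slope * p / n_gwas, 3))   # 0.5, the heritability behind the slope
```

Because the synthetic statistics follow the linear relation exactly, the fit separates the confounding contribution (intercept) from the polygenic signal (slope).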

The slope obtained from LD Score regression could in principle also provide an estimate of the trait’s heritability, although the developers do not recommend this particular use of the method. We will show in detail why LD Score regression is not a reliable estimator of heritability below.

Another use of LD Score regression is the estimation of genetic correlations (Bulik-Sullivan et al., 2015a). The dependent variable in this case is not the chi-square statistic from the GWAS of a single trait but rather the product of two Z statistics, each taken from a GWAS of a distinct trait. In principle, this use offers a means of determining whether a trait-trait correlation (as opposed to a SNP-trait correlation) is attributable to the presence of confounders affecting both traits. If the genetic correlation is statistically and quantitatively significant, then we can be sure that the total phenotypic correlation is not attributable solely to confounders that are entirely environmental in nature. Many interesting relationships have been confirmed or discovered by bivariate LD Score regression, including a high genetic correlation (∼0.70) between years of education and age at first childbirth (Barban et al., 2016) and a moderate one (∼0.35) between years of education and intracranial volume (Okbay et al., 2016).

In the classical era of quantitative genetics, genetic correlations were most commonly estimated with twin data. Rather large samples of twinships are required for precise estimates with this design, and in some cases the estimates are not as robust against modeling assumptions as estimates of univariate heritabilities (Beauchamp, Cesarini, Johannesson, Lindqvist, & Apicella, 2011). For these reasons a welcome development in quantitative genetics has been the advent of GWAS, which can now reach sample sizes in the millions. The appearance of robustness offered by GWAS can be illusory, however, if estimates of genetic correlations are themselves subject to confounding. One can devise estimators of the genetic correlation that might be biased by environmental confounders that affect both phenotypes and happen to be correlated with genetic variation (Palla & Dudbridge, 2015; Okbay et al., 2016). An attractive feature of LD Score regression in this respect is that its control of confounding extends not just to the evidence of association at individual SNPs but also to its genome-wide estimates of genetic correlations. This is important because, again, it is precisely the issue of a phenotypic correlation's underlying causal nature that can call for an accurate estimate of the genetic correlation.

As appealing as the intuition behind LD Score regression may be, the mathematical justifications of this method given so far in the literature raise questions because of their assumption that the effects of genetic variants can be treated as a random variable. This assumption is a useful convenience for computations, but it is not biological; the effects of genetic polymorphisms should be invariant, and it is genotypes and phenotypic residuals that vary between individuals (Lee & Chow, 2014; de los Campos, Sorensen, & Gianola, 2015). The assumption also precludes a quantitative treatment of the method’s accuracy. Here we refrain from this assumption of random genetic effects and instead treat the effects as a vector of arbitrary fixed constants. Hence we are able to obtain precise expressions of the quantities estimated by LD Score regression, which can be compared with the quantities of actual interest to determine when they coincide. Here is a preview of our results:

- If the per-SNP heritability contributed by SNP *j* and its correlated neighbors is not related to SNP *j*'s LD Score, then the slope of LD Score regression provides an unbiased estimate of heritability. For evolutionary reasons, however, per-SNP heritability is typically smaller near SNPs with higher LD Scores (Gazal et al., 2017). LD Score regression is therefore not a reliable way to estimate the heritability of a trait (or, by extension, the genetic covariance between two traits).

- If the regression curve is perfectly linear, the intercept of LD Score regression is unaffected by how well the slope estimates the heritability and can thus be used to estimate the magnitude of confounding.

- Here is the most novel and important conclusion of our analysis. The intercept of LD Score regression reflects a useful measure of confounding in the GWAS even if there is a relationship between LD Scores and the correlations of SNPs with environmental confounders. The developers of LD Score regression warn that in this case the intercept will not accurately estimate the contribution of confounding to the GWAS statistics (Bulik-Sullivan et al., 2015b). If linearity characterizes the relevant relationships, however, then the intercept reflects the *conditional* extent of confounding at just those SNPs that neither affect the trait nor are in LD with any causal SNPs. This is the only piece of information needed to correct the association statistics of null SNPs so that they follow the proper Type 1 error rate with respect to the hypothesis of no causality.

- LD Score regression provides an accurate estimate of the genetic correlation between two traits, even if neither trait's heritability is well estimated.

We now substantiate these claims.

## 2 Materials and methods

Consider a meta-analytic sample of *n* individuals and *p* biallelic SNPs. The standard linear model of quantitative genetics is

*y* = *Xα* + *e*, (1)

where *y* ∈ ℝ^{n} is the vector of standardized phenotypes, *α* ∈ ℝ^{p} is a vector of fixed constants equaling the average effects of gene substitution (Fisher, 1941; Lee & Chow, 2013), *e* ∈ ℝ^{n} is the vector of non-genetic residuals, and *X* ∈ ℝ^{n×p} is the matrix of standardized genotypes. From these definitions the heritability of the phenotype attributable to the average effects of the *p* SNPs is

*h*^{2} = (1/*n*)*α*′*X*′*Xα*, (2)

although LD Score regression uses the definition *h*^{2} = *α*′*α*. These two definitions coincide if all causal sites are in linkage equilibrium (LE). As a result of LD induced by assortative mating and natural selection, this condition will often fail to be satisfied, but the resulting discrepancy between the two definitions is likely to be small (Tenesa, Rawlik, Navarro, & Canela-Xandri, 2016). Henceforth we will mostly ignore the distinction between these two quantities (and similar distinctions that arise in the consideration of the genetic correlation).
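A quick numerical check, under assumptions of our own choosing (neighbor-correlated Gaussian "genotypes" and equal effects at every SNP), illustrates how LD drives the two definitions of heritability apart:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10_000, 50

# Standardized genotypes with LD between neighboring SNPs
base = rng.standard_normal((n, p))
X = base.copy()
X[:, 1:] += 0.9 * base[:, :-1]
X = (X - X.mean(0)) / X.std(0)

alpha = np.full(p, p ** -0.5)            # fixed average effects; alpha'alpha = 1

h2_model = alpha @ X.T @ X @ alpha / n   # (1/n) a'X'Xa, approximately a'Gamma a
h2_ldsc = alpha @ alpha                  # the definition used by LDSC
print(round(h2_ldsc, 6))                 # 1.0
print(h2_model > h2_ldsc)                # True: the cross terms Gamma_kk' add on
```

With effects of consistent sign at correlated SNPs, the off-diagonal terms Σ_{k≠k′} Γ_{kk′}α_{k}α_{k′} are positive, so the first definition exceeds the second; in real data the discrepancy is expected to be much smaller.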

We consider two different types of averages: 1) the expectation over individuals and 2) the empirical average over some attribute of SNPs, such as their GWAS association statistics, represented by the symbols 𝔼_{n} and 𝔼_{p} respectively. With this convention, *X* and *e* are random variables with the properties

𝔼_{n}(*x*_{ij}) = 0, 𝔼_{n}(*x*^{2}_{ij}) = 1, 𝔼_{n}(*x*_{ij}*x*_{ik}) = Γ_{jk}, 𝔼_{n}(*e*_{i}) = 0, 𝔼_{n}[(1/*n*)*X*_{j}′*e*] = *υ*_{j}. (3)

The last condition represents confounding due to a correlation between SNP *j* and *e*. Note that our representation of confounding as a correlation between a SNP and the non-genetic residual *e* is extremely general, including as a special case the sampling of the individuals from different geographically defined subpopulations varying in allele frequencies and exposures to environmental causes. We will use *γ*_{j} to denote the *j*th column (row) of Γ ∈ ℝ^{p×p}, such that the *j*th LD Score is equal to *l*_{j} = *γ*_{j}′*γ*_{j} = Σ_{k} Γ^{2}_{jk}.
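A hypothetical two-subpopulation example shows how stratification produces exactly this kind of SNP-residual correlation; the frequency difference and environmental shift below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n_half, p = 20_000, 100

# Two subpopulations differing systematically in allele frequency
f1 = rng.uniform(0.2, 0.4, p)
g1 = rng.binomial(2, f1, (n_half, p)).astype(float)
g2 = rng.binomial(2, f1 + 0.2, (n_half, p)).astype(float)
G = np.vstack([g1, g2])
X = (G - G.mean(0)) / G.std(0)      # pooled standardization

# Residual shifted upward in subpopulation 2: a purely environmental effect
e = rng.standard_normal(2 * n_half)
e[n_half:] += 0.5

upsilon = X.T @ e / (2 * n_half)    # v_j = E_n((1/n) X_j' e)
print(upsilon.mean() > 0.03)        # True: systematically positive v_j
```

Every SNP whose allele frequency tracks the subpopulation structure acquires a nonzero *υ*_{j}, even though no SNP has any causal effect on the residual.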

Different populations, such as Europeans and East Asians, are characterized by different values of Γ. We assume throughout this work that the individuals studied in the GWAS can be regarded as members of the same population as the reference sample used to estimate Γ.

Any contribution to the chi-square statistic of a given SNP from a causal site not included in the computation of its LD Score will introduce some form of bias. Such omissions from Equation (1) might occur because the windows used in practice to compute LD Scores are too short or because some causal sites have properties that lead to their exclusion from the reference sample (rare alleles or being a type of polymorphism other than a SNP). Although it may be worthwhile to analyze these and other limitations, we do not do so here.

## 3 Results

### 3.1 The slope of univariate LD Score regression as an estimator of heritability

Although Bulik-Sullivan et al. (2015b) do not encourage using the slope of the regression as a heritability estimator, it is useful to see in further detail why, not least because we will reuse our primary result later. Let *X*_{j} be the *j*th column of *X*. In the regression of the GWAS phenotype on a single SNP *j*, the estimated marginal (univariate) regression coefficient is β̂_{j} = (1/*n*)*X*_{j}′*y*. Note that the multivariate regression coefficient is the *j*th element of *α*. Squaring gives β̂^{2}_{j} = (1/*n*^{2})(*X*_{j}′*y*)^{2}, which has the expected value over random sampling of individuals

𝔼_{n}(β̂^{2}_{j}) = (1/*n*^{2}) 𝔼_{n}(*α*′*X*′*X*_{j}*X*_{j}′*Xα* + *α*′*X*′*X*_{j}*X*_{j}′*e* + *e*′*X*_{j}*X*_{j}′*Xα* + *e*′*X*_{j}*X*_{j}′*e*). (4)

The problem with evaluating Equation (4) is that the fourth moment of the genotypes is required, and it is generally not known. However, if we assume that higher-order cumulants of the genotype distribution are small compared to the second cumulants, then the distribution governing the genotypes can be approximated with a multivariate normal distribution. We can then use Wick's theorem (sometimes called Isserlis's theorem), which states that if (*X*_{1},…,*X*_{2n}) follows a zero-mean multivariate normal distribution, then

𝔼(*X*_{1}*X*_{2}⋯*X*_{2n}) = ΣΠ 𝔼(*X*_{i}*X*_{j}),

where the notation ΣΠ means summing over all distinct ways of partitioning *X*_{1}, …, *X*_{2n} into pairs such as *X*_{i}*X*_{j} and each summand is the product of pair expectations.
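As a sanity check on the theorem in the four-variable case used below, one can compare the pairing formula against a Monte Carlo average (the covariance matrix here is an arbitrary choice of ours):

```python
import numpy as np

# Covariance matrix of a zero-mean quadrivariate normal
S = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])

# Wick/Isserlis: E[x1 x2 x3 x4] = S12*S34 + S13*S24 + S14*S23
wick = S[0, 1]*S[2, 3] + S[0, 2]*S[1, 3] + S[0, 3]*S[1, 2]

rng = np.random.default_rng(3)
x = rng.multivariate_normal(np.zeros(4), S, size=1_000_000)
mc = (x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3]).mean()
print(round(wick, 6))         # 0.27
print(abs(mc - wick) < 0.03)  # True: simulation agrees with the pairing sum
```

The three summands correspond to the three distinct ways of partitioning four variables into pairs; for 2*n* variables there are (2*n* − 1)!! such pairings.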

Applying Wick's theorem to the first expectation term of Equation (4) yields

(1/*n*^{2}) 𝔼_{n}(*α*′*X*′*X*_{j}*X*_{j}′*Xα*) = (*γ*_{j}′*α*)^{2} + *h*^{2}/*n*,

where we have applied Equation (3) and the identity 𝔼_{n}[(1/*n*)*X*′*X*] = Γ. The latter is true in the large-*n* limit or by the path-tracing rules (Wright, 1934). The last line assumes that Σ_{k≠k′} Γ_{kk′}*α*_{k}*α*_{k′}, the term distinguishing *h*^{2} = *α*′*α* from *α*′Γ*α*, is small; recall our assumption that these two quantities are close in value.

Similarly, the expected sum of the second, third, and fourth terms in Equation (4) is

(1/*n*^{2}) 𝔼_{n}(*α*′*X*′*X*_{j}*X*_{j}′*e* + *e*′*X*_{j}*X*_{j}′*Xα* + *e*′*X*_{j}*X*_{j}′*e*) = 2(*γ*_{j}′*α*)*υ*_{j} + *υ*^{2}_{j} + (1 − *h*^{2})/*n*.

Substituting all terms back into Equation (4) and assigning *β*_{j} := *γ*_{j}′*α* gives

𝔼_{n}(β̂^{2}_{j}) = *β*^{2}_{j} + 2*β*_{j}*υ*_{j} + *υ*^{2}_{j} + 1/*n*. (5)

Here we have used

*β*_{j} = *γ*_{j}′*α* = (*l*_{j}*h*^{2})^{1/2} cos *θ*_{j},

where *θ*_{j} is the angle between *γ*_{j} and *α*. Hence, the square of the estimated marginal regression coefficient equals the sum of the following quantities:

- the square of the regression coefficient induced by any true average effects of gene substitution;
- the square of the bias induced by confounding;
- twice the cross-product of the true coefficient and the bias; and
- sampling noise with a variance equal to 1/*n*.

We now consider the conditions under which the slope of the regression is proportional to *h*^{2}. We can compute this explicitly by using Equation (5) in the formula for the regression coefficient. However, a more informative way is to compare to the analogous expression in Bulik-Sullivan et al. (2015b), which in our notation is

𝔼_{n}(χ^{2}_{j}) = (*n*/*p*)*h*^{2}_{LDSC}*l*_{j} + *na* + 1, (6)

where χ^{2}_{j} = *n*β̂^{2}_{j} is SNP *j*'s association statistic and *a* measures the contribution of confounding biases such as population stratification. Our placement of the subscript LDSC on *h*^{2} emphasizes that this factor in the regression slope might not necessarily equal *h*^{2}. In the case of *υ* = 0, the equivalence of (6) to the average of (5) over all SNPs implies

*h*^{2}_{LDSC} = *p* 𝔼_{p}(cos^{2} *θ*_{j}) *h*^{2}, (7)

which gives a biased estimate of heritability unless 𝔼_{p}(cos^{2} *θ*_{j}) = 1/*p*. This condition can hold if the *γ*_{j} are uniformly distributed with respect to *α*. Thus, the slope of LD Score regression is proportional to the heritability if the average effects of gene substitution and LD Scores are uncorrelated.

The requirement of this null correlation for an unbiased estimate of *h*^{2} is quite reasonable. Regressing on *l*_{j} to estimate the heritability depends on the average per-SNP heritability being constant regardless of LD. If average per-SNP heritability declines in higher-LD regions, say, then the estimated heritability must fall short of the true heritability. This sensitivity to LD is a feature shared with the heritability-estimation method GREML (Speed, Hemani, Johnson, & Balding, 2012; Lee & Chow, 2014; Yang et al., 2015; Chen, 2016).

However, a negative correlation between LD and heritability tagged per SNP is expected. Mutations with larger effects on a given trait will tend to be selectively disfavored as a result of stabilizing selection or deleterious pleiotropic side effects. Such mutations will thus rarely drift to high allele frequencies, and SNPs where one allele is rare tend to have smaller LD Scores. The empirical evidence to date clearly bears out this evolutionary prediction (Kemper, Visscher, & Goddard, 2012; Yang et al., 2015; Gazal et al., 2017). In this case of SNPs with higher LD Scores tagging less heritability, the slope of the regression leads to *h*^{2}_{LDSC} < *h*^{2}, an underestimation of the true heritability.
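A stylized calculation, taking the expected statistic to be *n**l*_{j}*h*^{2} cos^{2} *θ*_{j} + 1 with no confounding and treating the cos^{2} *θ*_{j} profile as a free choice of ours, shows the resulting shortfall:

```python
import numpy as np

n, p, h2 = 50_000, 1000, 0.5
l = np.linspace(1, 20, p)                 # LD Scores

# Uniform architecture: cos^2(theta_j) = 1/p at every SNP
chi2_u = n * l * h2 / p + 1
slope_u = np.polyfit(l, chi2_u, 1)[0]

# Declining architecture: cos^2 shrinks with l_j (normalized to sum to 1)
cos2 = (l ** -0.5) / (l ** -0.5).sum()
chi2_d = n * l * h2 * cos2 + 1
slope_d = np.polyfit(l, chi2_d, 1)[0]

print(round(slope_u * p / n, 3))   # 0.5: unbiased under uniformity
print(slope_d * p / n < 0.4)       # True: high-LD SNPs tag less, slope shrinks
```

When high-LD SNPs tag disproportionately little heritability, the slope-based estimate falls well short of *h*^{2} even though both architectures carry the same total cos^{2} mass.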

### 3.2 The intercept of univariate LD Score regression as an estimator of confounding

A far more important use of LD Score regression is the estimation and correction of confounding (or any other bias that can inflate the association statistics, such as overestimation of the effective sample size). If the intercept of LD Score regression is truly equal to the average chi-square statistic of null SNPs that neither affect the phenotype nor tag any causal sites, then dividing all of the GWAS chi-square statistics by the intercept should restore the average chi-square statistic of null SNPs to the theoretically proper value of unity and bring the Type 1 error rate close to the targeted level.

It may seem that a failure of the regression slope to equal (*n*/*p*)*h*^{2} may lead the estimate of the regression intercept to be biased, by analogy to the negative sampling covariance between these two parameters. For instance, it may seem that a slope greater than (*n*/*p*)*h*^{2} should lead to a downwardly biased intercept and thus an underestimate of how much confounding and other biases are inflating the GWAS statistics (Bulik-Sullivan et al., 2015b, p. 293). Any such intuition is misleading, however, because of our argument in the previous subsection that there is no necessary relationship between the slope of the regression and *h*^{2}. Furthermore, we will argue that the intercept can be used for the correction of confounding in the situation where LD Scores and SNP-environment correlations are related (i.e., *l*_{j} and *υ*^{2}_{j} are correlated).

Recall that by definition for a null SNP, which is neither causal nor in LD with any causal sites, *β*_{j} = 0. The term 2*β*_{j}*υ*_{j} is thus also equal to zero for each such SNP. Thus, at *l*_{j} = 0, Equation (5) gives

𝔼_{n}(χ^{2}_{j}) = *n*𝔼_{n}(β̂^{2}_{j}) = *n**υ*^{2}_{j} + 1. (8)

Thus, if 𝔼_{n}(χ^{2}_{j}) is found to be linear in *l*_{j} (which is empirically observed), then the intercept of the regression will equal the average of Equation (8) over all null SNPs. If *υ*^{2}_{j} and *β*_{j}*υ*_{j} are both independent of *l*_{j}, then the intercept will equal the average over all SNPs in the GWAS. However, even if these confounding terms have an *l*_{j} dependence, we only need to know the extent of confounding at truly null SNPs to rescale the chi-square statistics such that the statistics of these SNPs have an expectation in line with the theoretical value under the null hypothesis (i.e., no causality or LD with a causal site). A failure to correctly factor out the contribution of *υ*^{2}_{j} to the chi-square statistics of non-null SNPs simply leaves us with more or less statistical power to detect such SNPs without affecting the Type 1 error rate.
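The correction procedure can be illustrated with expected (noise-free) statistics in which confounding is uniform and half the SNPs are null; the design below, our own construction, pairs a causal and a null SNP at each LD Score so that the fitted line is exact:

```python
import numpy as np

n, p, h2, a = 50_000, 1000, 0.4, 2e-5
l = np.repeat(np.linspace(1, 20, p // 2), 2)   # each LD Score appears twice
causal = np.tile([True, False], p // 2)        # one causal, one null SNP each

signal = np.where(causal, 2 * (n / p) * h2 * l, 0.0)
chi2 = signal + n * a + 1                      # confounding inflates all SNPs

slope, intercept = np.polyfit(l, chi2, 1)
corrected = chi2 / intercept
print(round(intercept, 3))                  # 2.0 = n*a + 1
print(round(corrected[~causal].mean(), 3))  # 1.0: null SNPs restored
```

Dividing all statistics by the intercept restores the null SNPs to their theoretical expectation of unity; the causal SNPs merely lose a little power.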

_{j}It is certainly possible to create gross violations of linearity in simulations (Bulik-Sullivan et al., 2015b, Supplementary Fig. 7). For example, if we depopulate high-LD regions of causal SNPs, then the regression curve can be non-monotonic, rising at first and then declining as *l _{j}* increases. In this case the slope of LD Score regression can be negative and the intercept greater than unity even in the absence of confounding (

*υ*= 0). However, no empirical application of LD Score regression has ever uncovered any situation remotely resembling this hypothetical one. Nevertheless it is a salutary practice to inspect the actual scatterplot for any evidence of pathology.

A mild degree of nonlinearity might have some effect on the intercept if the SNPs with largest LD Scores deviate from the linear trend extrapolated from the SNPs with the smallest LD Scores. For this reason it is fortunate that in practice LD Score regression is a weighted regression where the SNPs with the smallest LD Scores receive the largest weights. The purpose of this weighting is to address heteroskedasticity and non-independence; if the regression curve is perfectly linear, then the effect of this weighting is to improve the standard errors. If the curve is nonlinear, then an additional effect is to bring the entire regression line closer to the linear extrapolation from the SNPs with the smallest LD Scores and the intercept thereby closer to the average chi-square statistic of truly null SNPs.
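The following toy comparison, with a concave curve and weights of our own choosing (1/*l*_{j}, a simplification of the actual LDSC weighting scheme, which also accounts for heteroskedasticity), illustrates the effect:

```python
import numpy as np

l = np.linspace(1, 20, 500)

# A mildly concave regression curve: the linear trend among low-LD SNPs
# (slope 20, intercept 1) flattens for LD Scores above 10
chi2 = np.where(l <= 10, 1 + 20 * l, 201 + 5 * (l - 10))

ols = np.polyfit(l, chi2, 1)[1]             # unweighted intercept
wls = np.polyfit(l, chi2, 1, w=1 / l)[1]    # downweight high-LD SNPs

print(ols > 10)                    # True: OLS intercept pulled far above 1
print(abs(wls - 1) < abs(ols - 1)) # True: weighting moves it back toward 1
```

Because the weighted fit tracks the SNPs with the smallest LD Scores, its intercept lies closer to the value extrapolated from the low-LD linear trend, which is the quantity relevant to null SNPs.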

This conclusion regarding the extraordinary robustness of LD Score regression as a safeguard against confounding is a novel result of our analysis. Bulik-Sullivan et al. (2015b) went to some lengths to show that LD Scores are uncorrelated with *F _{ST}* (a measure of population differentiation in allele frequencies) at various geographical scales within Europe. This is very convincing evidence in support of the assumption that confounding is uncorrelated with LD Scores—at least when the confounding takes the form of “population stratification,” the sampling of the individuals in the study from geographically distinct subpopulations differing in both allele frequencies and exposure to environmental factors. But even if confounding is correlated with LD Scores, what we find is that the intercept of LD Score regression can still be used to ensure that null SNPs have the required average chi-square statistic of unity.

With all of these considerations in mind, we consider the recent work of de Vlaming, Johannesson, Magnusson, Ikram, and Visscher (2017). These authors found that a very large 𝔼_{p}(*υ*^{2}_{j}) in their simulations leads to an intercept falling short of 1 + *n*𝔼_{p}(*υ*^{2}_{j}) itself and also an overestimate of *h*^{2}. These *in silico* results are rather puzzling because, in addition to contradicting our deductions, they were not replicated by Bulik-Sullivan et al. (2015b) despite apparently similar simulation settings. One possibility is that SNPs with larger LD Scores tend to exhibit higher *F*_{ST} in the cohorts available to de Vlaming et al. (2017), perhaps because of higher-quality imputation leading to more accurate estimates of allele-frequency differences. This would lead to both a correlation over SNPs between *l*_{j} and *υ*^{2}_{j} and the inequality 1 + *n*𝔼_{p}(*υ*^{2}_{j} | *β*_{j} = 0) < 1 + *n*𝔼_{p}(*υ*^{2}_{j}). (Note that it is only the smaller of the two quantities in the latter inequality that is needed for the restoration of the Type 1 error rate.) Whatever the problem may be, evidence for it can be seen in the scatterplot, which shows a nonlinearity in the leftmost simulated data points that we have never observed in real empirical data. It is also worth noting that the problems in these simulations only arise when population stratification is quite extreme, leading to an intercept greater than 1.5 with rather small sample sizes. In this regime Wick's theorem may no longer provide a good approximation, although we think this unlikely to be the explanation of the simulation results. In any event intercepts of this magnitude have not yet been observed in actual GWAS.

### 3.3 Bivariate LD Score regression as an estimator of genetic correlations

We now consider LD Score regression as an estimator of the genetic correlation between two traits,

*r* := *α*_{1}′*α*_{2} / [(*α*_{1}′*α*_{1})(*α*_{2}′*α*_{2})]^{1/2}.

We will use *r*_{LDSC} to denote the genetic correlation as it is estimated by bivariate LD Score regression—which is not necessarily the same as the true genetic correlation *r*. Nevertheless, previous studies have found these two quantities to be consistently close (Bulik-Sullivan et al., 2015a; Shi, Mancuso, Spendlove, & Pasaniuc, 2017), and our goal now is to explain this robustness.

The dependent variable in bivariate LD Score regression is now the product of SNP *j*'s two *Z* statistics,

*Z*_{1j}*Z*_{2j} = *n*β̂_{1j}β̂_{2j},

which has the expected value

𝔼_{n}(*Z*_{1j}*Z*_{2j}) = (1/*n*) 𝔼_{n}[(*X*_{j}′*y*_{1})(*X*_{j}′*y*_{2})].

As before, we can use Wick's theorem to evaluate the expectation and obtain

𝔼_{n}(*Z*_{1j}*Z*_{2j}) = *n**β*_{1j}*β*_{2j} + *n*(*β*_{1j}*υ*_{2j} + *β*_{2j}*υ*_{1j}) + *n**υ*_{1j}*υ*_{2j} + (*n*_{s}/*n*)ρ_{g} + (*n*_{s}/*n*)ρ_{e} + (*n*_{s}/*n*)(*β*_{1j} + *υ*_{1j})(*β*_{2j} + *υ*_{2j}),

where ρ_{g} := *α*_{1}′*α*_{2} is the genetic covariance, ρ_{e} = 𝔼_{n}(*e*_{i1}*e*_{i2}) is the environmental covariance, ρ := ρ_{g} + ρ_{e}, and *n*_{s} is the number of individuals appearing in both GWAS samples. The last three terms arise from the coincidence of the person indices in the summations and thus become smaller with decreasing sample overlap. They vanish if the samples are independent. Henceforth we ignore these overlap-dependent terms. We are then left with

𝔼_{n}(*Z*_{1j}*Z*_{2j}) = *n**β*_{1j}*β*_{2j} + *n*(*β*_{1j}*υ*_{2j} + *β*_{2j}*υ*_{1j}) + *n**υ*_{1j}*υ*_{2j}. (10)

In LD Score regression (the regression of *Z*_{1j}*Z*_{2j} on *l*_{j}), the slope is naively expected to be proportional to the genetic covariance. In the absence of confounding and sample overlap, the intercept is zero, since the expected product of two independent and null-distributed *Z* statistics is zero. Any upward departure of the intercept from zero in this case is indicative of confounders affecting both traits, just as an upward departure from unity is analogously indicative of confounders affecting the focal trait in the univariate case.

As in the univariate case, we can compute the circumstances under which the regression slope is proportional to the genetic covariance explicitly, using Equation (10) in the formula for the regression coefficient, but it is more informative to compare directly to the analogous expression from Bulik-Sullivan et al. (2015a),

𝔼_{n}(*Z*_{1j}*Z*_{2j}) = (*n*/*p*)ρ_{LDSC}*l*_{j} + (*n*_{s}/*n*)ρ. (11)

Assume that *β*_{1j}*υ*_{2j}, *β*_{2j}*υ*_{1j}, and *υ*_{1j}*υ*_{2j} are all uncorrelated with *l*_{j}; a total absence of confounding, *υ*_{1} = *υ*_{2} = 0, meets this assumption. We have found that the robustness of bivariate LD Score regression holds in certain important cases of *l*_{j} dependence, such as a direct effect of parental phenotype discussed by Lee (2012), but these details are beyond the scope of this work. The output of bivariate LD Score regression is then

*r*_{LDSC} = ρ_{LDSC} / (*h*^{2}_{LDSC,1}*h*^{2}_{LDSC,2})^{1/2}. (12)

The average of (10) over all SNPs and (11) are equivalent if

(ρ_{g}/*p*)𝔼_{p}(*l*_{j}) = 𝔼_{p}[(*γ*_{j}′*α*_{1})(*γ*_{j}′*α*_{2})], (13)

which we will show is not generally true. As in the univariate case above, the righthand side of Equation (13) can be rewritten as

𝔼_{p}[(*γ*_{j}′*α*_{1})(*γ*_{j}′*α*_{2})] = *h*_{1}*h*_{2} 𝔼_{p}(*l*_{j} cos *θ*_{1j} cos *θ*_{2j}), (14)
where *h*_{k} cos *θ*_{kj} = (*γ*_{j}/*l*_{j}^{1/2})′*α*_{k} is the unit-vector projection of *α*_{k} onto *γ*_{j}. The average over SNPs in (14) is equivalent to taking the unit-vector projections of *α*_{1} onto the *γ*_{j} in turn, doing the same with *α*_{2}, and taking the *l*_{j}-weighted dot product of the two results. From (14) we can see two sources of bias, which can be interpreted geometrically. The first is the nontrivial correlation between *γ*_{j} and *α*_{k}, as in the univariate case, manifested as nonuniformity in the cos *θ*_{kj}. We will shortly see, however, that this bias cancels from the numerator and denominator of Equation (12). The second source of bias is that the *γ*_{j} vectors do not form an orthogonal basis over SNP space, which then distorts the angle between *α*_{1} and *α*_{2} after projecting onto the *γ*_{j} basis.

We will proceed as if the *γ*_{j} are indeed an orthogonal basis. In reality, they are nearly orthogonal; if the SNPs are numbered in order, then *γ*_{j}′*γ*_{k} will be virtually zero for |*j* − *k*| sufficiently large. Then the angle between *α*_{1} and *α*_{2} is preserved in the new basis, and we have the condition

cos *θ*_{12} = 𝔼_{p}(cos *θ*_{1j} cos *θ*_{2j}) / [𝔼_{p}(cos^{2} *θ*_{1j}) 𝔼_{p}(cos^{2} *θ*_{2j})]^{1/2},

where *θ*_{12} is the angle between *α*_{1} and *α*_{2}. We can then obtain

ρ_{LDSC} = *p* 𝔼_{p}(cos *θ*_{1j} cos *θ*_{2j}) *h*_{1}*h*_{2}. (15)

Inserting (15) and (7) into (12) then gives

*r*_{LDSC} = *p* 𝔼_{p}(cos *θ*_{1j} cos *θ*_{2j}) *h*_{1}*h*_{2} / [*p* 𝔼_{p}(cos^{2} *θ*_{1j}) *h*^{2}_{1} · *p* 𝔼_{p}(cos^{2} *θ*_{2j}) *h*^{2}_{2}]^{1/2} = cos *θ*_{12},

which is an unbiased estimator of the genetic correlation.
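A numerical illustration of this cancellation, under the orthogonal-basis idealization and with a shared LD-dependent scaling of the two traits' effects (all constants below are arbitrary choices of ours), can be run as follows:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50_000, 20_000
l = np.linspace(1, 20, p)

# Shared l-dependent scale: per-SNP effects shrink where LD Scores are high
s2 = 1.25e-5 + 7.93e-5 / l
u = rng.standard_normal(p)
v = 0.6 * u + 0.8 * rng.standard_normal(p)   # corr(u, v) = 0.6
a1, a2 = np.sqrt(s2) * u, np.sqrt(s2) * v    # effect vectors of traits 1, 2

# Expected association statistics in the orthogonal-basis idealization
chi2_1 = n * l * s2 * u**2 + 1
chi2_2 = n * l * s2 * v**2 + 1
z1z2 = n * l * s2 * u * v

h2_1 = np.polyfit(l, chi2_1, 1)[0] * p / n   # slope-based "heritability"
h2_2 = np.polyfit(l, chi2_2, 1)[0] * p / n
rho = np.polyfit(l, z1z2, 1)[0] * p / n      # slope-based "covariance"

r_ldsc = rho / np.sqrt(h2_1 * h2_2)
r_true = a1 @ a2 / np.sqrt((a1 @ a1) * (a2 @ a2))

print(h2_1 < 0.7 * (a1 @ a1))      # True: heritability badly underestimated
print(abs(r_ldsc - r_true) < 0.1)  # True: the shared bias cancels in the ratio
```

Both univariate slopes and the covariance slope are shrunk by the same LD-dependent factor, so the ratio recovers the correlation even though every ingredient is biased.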

If, on the other hand, it is unacceptable to treat the *γ*_{j} vectors as an orthogonal basis, then LD Score regression will not produce an unbiased estimator of the genetic correlation—at least when this quantity is defined as *r* = *α*_{1}′*α*_{2}/[(*α*_{1}′*α*_{1})(*α*_{2}′*α*_{2})]^{1/2}. We can estimate the bias by considering the eigenvalue decomposition *S*′Γ*S* = Λ, where *S* is the orthonormal matrix with columns of eigenvectors and Λ is the diagonal matrix of eigenvalues. We then have

𝔼_{p}[(*γ*_{j}′*α*_{1})(*γ*_{j}′*α*_{2})] = (1/*p*) *α*_{1}′Γ^{2}*α*_{2} = (1/*p*) *α*_{1}′*S*Λ^{2}*S*′*α*_{2}.

We now decompose Λ^{2} = λ^{2}*I* + Δ and obtain

*α*_{1}′Γ^{2}*α*_{2} = λ^{2}(*α*_{1}′*α*_{2}) + *α*_{1}′*S*Δ*S*′*α*_{2}, (16)

where λ^{2} represents the average correlation of *γ*_{j} and *α* and Δ represents the deviation from orthogonality.
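The decomposition can be verified numerically for any correlation matrix standing in for Γ:

```python
import numpy as np

rng = np.random.default_rng(6)
p = 60

# A positive-definite correlation matrix to play the role of Gamma
A = rng.standard_normal((p, 2 * p))
C = A @ A.T
d = np.sqrt(np.diag(C))
Gamma = C / np.outer(d, d)

a1, a2 = rng.standard_normal(p), rng.standard_normal(p)

evals, S = np.linalg.eigh(Gamma)             # S' Gamma S = Lambda
lam2 = (evals ** 2).mean()                   # average squared eigenvalue
Delta = np.diag(evals ** 2) - lam2 * np.eye(p)

lhs = a1 @ Gamma @ Gamma @ a2                # sum_j (g_j'a1)(g_j'a2)
rhs = lam2 * (a1 @ a2) + a1 @ S @ Delta @ S.T @ a2
print(np.isclose(lhs, rhs))                  # True: the decomposition is exact

# The multiplicative factor equals the mean LD Score, tr(Gamma^2)/p
print(np.isclose(lam2, (Gamma ** 2).sum(axis=1).mean()))  # True
```

Note that the multiplicative factor λ^{2} equals tr(Γ^{2})/*p*, which is the average LD Score; only the Δ term, the spread of the squared eigenvalues around this average, escapes cancellation in the correlation ratio.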

## 4 Discussion

The regression of GWAS association statistics on LD Scores partitions the statistics into a part that covaries with LD Scores (the slope) and a part that does not (the intercept). Polygenic causal signal contributes to the first part by necessity, whereas confounding and other biases spuriously inflating the statistics need not—and typically do not—make any such contribution. This insight lies at the heart of LD Score regression, the outstanding invention of Bulik-Sullivan et al. (2015b).

The reason that the slope of LD Score regression cannot be used to estimate the heritability of a trait (or the genetic covariance between two traits) is that per-SNP heritability (genetic covariance) will itself vary as a function of LD Score, such that naive estimates based on LD Score regression will typically fall short of the target quantities. However, if the dependence of the GWAS association statistics on LD Scores remains linear despite a non-constant heritability per SNP, then there is no coupling whatsoever between slope and intercept. The slope times the appropriate constant can thus exceed or fall short of the heritability (genetic covariance) without affecting the intercept in the slightest.

In order for the intercept to equal the average squared covariance between SNP and residual (“environment”) present in the GWAS (which can then be factored out from the association statistics), LD Scores must be uncorrelated over SNPs with squared SNP-residual covariance. In the framework of Bulik-Sullivan et al. (2015b), this is equivalent to the absence of a correlation between LD Scores and the *F _{ST}* characterizing the two subpopulations. There may be such a correlation, however, in certain cases such as when the phenotype of the parent affects the phenotype of the offspring through some environmental mechanism. Remarkably we found that LD Score regression remains a robust means of correcting the association statistics, for in such a case the intercept becomes the average squared confounding at just those SNPs that are neither causal themselves nor in LD with any causal sites—that is, at precisely those SNPs where otherwise an excess of false positives might occur.

These conclusions depend importantly on the linearity of the relationship between LD Scores and the GWAS chi-square statistics (product of *Z* statistics). This is essentially because without linearity there is no guarantee that the intercept of a particular simple least-squares regression equals the conditional expected value of the dependent variable characterizing observations with a zero value of the independent variable (i.e., truly null SNPs). In real-data applications of LD Score regression to date, the scatterplots have always borne out approximate linearity, and they should continue to be inspected in future applications. When users follow the developers’ recommendations for weighting of the SNPs in the regression, those SNPs with smaller LD Scores will receive larger weights, which in the case of nonlinearity brings the intercept closer to the conditional expected chi-square statistic of null SNPs.

Despite the inability of LD Score regression to estimate the heritability (genetic covariance) without bias, the method is able to estimate the genetic correlation quite accurately. Our argument on this point will be valid if the genetic correlation depends primarily on direct overlap of the causal sites affecting the two traits—and negligibly on SNPs in LD with more potential causal sites thereby being more likely to tag one site affecting trait 1 and a distinct site affecting trait 2, with the signs of the alleles coupled with the reference allele at the tagging SNP showing a consistency across the genome. This tagging of distinct sites with appropriately coupled alleles contributes to the second term of the genetic covariance in Equation (16), which is not a multiplicative bias and therefore cannot be canceled by any division in the calculation of the genetic correlation. Such a genome-wide pattern seems quite implausible; for example, if it is to create a misleading nonzero *r*_{LDSC} when *r* is in fact zero, it amounts to causal sites that affect the two traits occurring in the same genes and regulatory elements, with the appropriate coupling of alleles, but never coinciding. Furthermore, one might argue that this biologically implausible scenario does not necessarily invalidate *r*_{LDSC} as an estimator of *r* when the latter is defined correctly. We have adopted the definition *r* = *α*_{1}′*α*_{2}/[(*α*_{1}′*α*_{1})(*α*_{2}′*α*_{2})]^{1/2} because this seems most consistent with the definition of heritability given in the original LD Score regression paper (Bulik-Sullivan et al., 2015b, Supplementary Note, p. 1), but other authors have included contributions from LD and consistent coupling of allele signs in the definition of *r* (Lynch & Walsh, 1998).

A use of LD Score regression that we did not study in this work is the functional partition of heritability between different parts of the genome (Finucane et al., 2015). Simulation studies conducted by the authors suggest that this use is also quite robust, and this is probably the result of a similar cancellation of bias from numerator and denominator.

In a field already marked by remarkable progress toward the goal of elucidating the causal relationship between its variables of interest without undue hindrance by confounding, LD Score regression adds a powerful new tool that allows whatever confounding there may be in a GWAS to be estimated and removed. In addition, it is a robust estimator of the genetic correlation, which is valuable in its own right because of its relevance to the causal nature of the phenotypic correlation (Duffy & Martin, 1994). It is fascinating to speculate about why the inference of causation of correlation has proven to be so eminently possible in genetics when it has been elusive in so many other scientific fields (Lee, 2012; Plomin, DeFries, Knopik, & Neiderhiser, 2016). Whatever the reasons, researchers in genetics can be grateful that Nature seems to be willing to oblige their curiosity.

## Footnotes

The authors declare no conflict of interest.