Evidence of natural selection in human complex phenotypes from GWAS summary statistics

We propose an extended Gaussian mixture model for the distribution of causal effects of common single nucleotide polymorphisms (SNPs) for human complex phenotypes, taking into account linkage disequilibrium (LD) and heterozygosity (H), while also allowing for independent components for small and large effects. Using a precise methodology showing how genome-wide association studies (GWAS) summary statistics (z-scores) arise through LD with underlying causal SNPs, we applied the model to multiple GWAS. Our findings indicated that causal effects are distributed with dependence on a SNP’s total LD and H, whereby SNPs with lower total LD are more likely to be causal, and causal SNPs with lower H tend to have larger effects, consistent with the influence of negative pressure from natural selection. The degree of dependence, however, varies markedly across phenotypes.


INTRODUCTION
There is currently great interest in the distribution of causal effects among trait-associated single nucleotide polymorphisms (SNPs), and recent analyses of genome-wide association studies (GWAS) have begun uncovering deeper layers of complexity in the genetic architecture of complex traits [21,13,14,59,60]. This research is facilitated by using new analytic approaches to interrogate structural features in the genome and their relationship to phenotypic expression. Some of these analyses take into account the fact that different classes of SNPs have different characteristics and play a multitude of roles [42,43,12]. Along with different causal roles for SNPs, which in itself would suggest differences in distributions of effect-sizes for different sets of causal effects, the effects of minor allele frequency (MAF) of the causal SNPs and their total correlation with neighboring SNPs are providing new insights into the action of selection on the genetic architecture of complex traits [14,56,60].
Selection pressure can be assessed by measuring the extent of long-range linkage disequilibrium (LD), given heterozygosity and local recombination rates. In general, the signature of positive selection is higher heterozygosity of variants given the extent of LD (or higher extended LD, given the heterozygosity). Negative selection acts to keep variants (mutations) deleterious to fitness at low frequency (or ultimately remove them), a process that is facilitated by higher recombination rates [18]. There are more ways of being wrong than being right: any given mutation is more likely to be deleterious to, rather than aid, fitness. Since weakly deleterious variants will take a while to be removed, recent variants are likely to be deleterious [29]. In addition, the larger the effect of a deleterious variant the more efficient negative selection will be, which all suggests that the lower the MAF the larger the effect sizewhich indeed is expected under evolutionary models [36] and consistent with empirical findings [34]. On the other hand, older variants, being old, are likely to have been positively selected (perhaps pleiotropically with a fitnessrelated trait), and over time will acquire LD with recent variants [14]. So when negative selection is operating, we should expect to find more and more effects with larger and larger effect size at lower LD score (or total LD) and lower heterozygosity.
Several recent reports focused on individual aspects of genetic architecture and their potential roles in natural selection, or excluded important quantities like polygenicity, but it is important to treat disparate (and overlapping) architectural features simultaneously to avoid over-interpreting the role of any individual feature and more precisely predict quantities of interest like heritability. Here we present a unifying approach, using Bayesian analysis on GWAS summary statistics and maximum likelihood estimation of model parameters, to explore the role of total LD (TLD) on the distribution of causal SNPs, and the impact of MAF on causal effect size, while simultaneously incorporating multiple effect-size distributions (TLD used here and previously [22,13] is similar to LD score [3] but takes into account more neighboring SNPs; it is defined in the Methods section). Treating each of these factors independently is likely to provide a misleading assessment. Thus, we posit a model for the prior distribution of effect sizes, and building on previous work [21], fit it to the GWAS summary statistics for a wide range of phenotypes. From the estimated set of model parameters for a phenotype, we generate simulated genotypes and calculate the corresponding z-scores. With a wide range of model parameter values across real phenotypes, the specificity of the individual parameters for a given phenotype make them narrowly defining of the distribution of summary statistics for that phenotype; since each parameter is physically motivated, they allow for direct interpretation, and the more accurate the model fit the more accurate are estimates of physical quantities based on the parameter values. We find that the detailed distributions of GWAS summary statistics for real phenotypes are reproduced to high accuracy, while many of the parameters used to set up a simulated phenotype are accurately reproduced by the model. The distribution of causal SNPs with respect to their TLD will be shown to vary widely across different phenotypes. Additionally, accurate detailed model reproduction of empirical distributions of GWAS summary statistics, while taking into account complexity arising from selection pressure and categories of strong and week effects, argues in favor of greater accuracy in estimating "SNP heritabilty" [46], the measure of trait heritability that can in principle be gleaned from GWAS summary statistics.
In previous work [21], which builds on earlier reports of others (e.g., [61,10,15]), we presented a Gaussian mixture model to describe the distribution of underlying causal SNP effects (the "β" simple linear regression coefficients that indicate the strength of association between causal variants and a phenotype). Due to extensive and complex patterns of LD among SNPs, many non-causal SNPs will exhibit a strong association with phenotypes, resulting in a far more complicated distribution for the summary zscores. The basic model for the distribution of the causal β's is a mixture of non-null and null normal distributions, the latter (denoted N (0, 0)) being just a delta function (or point mass at zero): where π 1 is the polygenicity, i.e., the proportion of SNPs that are causal (equivalently, the prior probability that any particular SNP is causal), and σ 2 β is the discoverability, i.e., the variance of the non-null normal distribution, which was taken to be a constant across all causal SNPs. The distribution of z-scores arising from this shows strong heterozygosity and TLD dependence (Eqs. 21 and 29 in [22]). Correctly implementing this relatively simple model (Eq. 1) and applying it to GWAS summary statistics involved a good deal of complexity, but resulted in remarkably accurate estimation of the overall distribution of z-scores, as demonstrated by the fit between the actual distribution of z-scores for multiple phenotypes and the model predictions, visualized on quantile-quantile (QQ) plots. As noted above, the more accurate the fit the more correct are the values of the physically meaningful parameters; additionally, these values help distinguish the genetic architectures of different phenotypes, and enable prediction, for example of sample size required to detect the preponderance of causal SNPs.
Given that GWAS, due at least to lack of power, consistently failed to discover the SNPs that explain the bulk of heritability, the objective of the model was to characterize the likely large number of causal SNPs of weak signal that evaded detection -and likely would explain missing or hidden heritability. Thus, the model was not intended to accurately predict the tails of the distribution of zscores, which for large enough GWAS would correspond to already discovered SNPs (i.e., whose summary p-values reached genome-wide significance), and involved the simplifying assumption that the bulk of causal effects roughly followed the same Gaussian. Nevertheless, the predicted distributions captured the main characteristics of the SNP associations for many phenotypes. For some phenotypes, however, the fit was relatively poor, and even for phenotypes where the overall fit was good, a breakdown of the z-score distributions for SNPs stratified by heterozygosity and TLD indicated some unevenness in how well the tails of the sub-distributions were predicted. Additionally, recent work by others focusing on selection pressure [59,41] indicated that it is important to take SNP heterozygosity into account as a component in discoverability (initially explored by examining four fixed relationships in [48]; see also [26]), and that an additional Gaussian distribution for the β's might be appropriate if large and small effects are distributed differently [60]. These approaches, however, were not combined (the former involving a single causal Gaussian incorporating heterozygosity, the latter involving two Gaussians with no heterozygosity dependence). An extra complexity is that TLD might play an important role in the distribution of causal effects [14]. It is unclear how all these factors impact each other. A final and very important matter is that most analyses in the literature by other groups were conducted in the "infinitesimal" model framework [41,46,14,12,3], where all SNPs are causal, though there is compelling evidence that only a very small fraction of SNPs are in fact causal for any given phenotype [59,62,60,13,43,22].
In the current work, we sought to extend our earlier work to incorporate multiple Gaussians, while taking into account TLD and selection effects reflected in heterozy-gosity, in modeling the distribution of causal β's. Note that this is independent of the need to correctly take heterozygosity and TLD into account when building a distribution for z-scores even when the distribution of causal effects (β's) does not depend on these factors, as in our earlier work [22]. We then applied the model to study evolutionary aspects of complex human phenotypes, using large-GWAS summary statistics.

MODEL
The appropriate methodology calls for using an extensive reference panel that likely involves all possible causal SNPs with MAF greater than some threshold (e.g., 1%), and regard the z-scores for typed or imputed SNPs -a subset of the reference SNPs -as arising, directly or through LD, from the underlying causal SNPs in the reference panel.
Since the single causal Gaussian, Eq. 1, has provided an appropriate starting point for many phenotypes, it is reasonable to build from it. With additional terms included, if it turns out that this original term is not needed, the fitting procedure, if implemented correctly, should eliminate it. Also, anticipating extra terms in the distribution of causal β's, we introduce a slight change in labeling the Gaussian variance (σ 2 β → σ 2 b ), and write the distributions for the causal component only -it being understood that the full distribution will include the last term on the right side of Eq. 1 for the prior probability of being null.
Given that for some phenotypes there is strong evidence that rarer SNPs have larger effects, we next include a term that reflects this: a Gaussian whose variance is proportional to H S , where H is the SNP's heterozygosity and S is a new parameter which, if negative, will reflect the noted behavior [59]. With the addition of the new term, the total prior probability for the SNP to be causal is still given by π 1 . Thus, extending Eq. 1, we get: (ignoring, as noted, the null component (1 − π 1 )N (0, 0)), where p c (0 p c 1) is the prior probability that the SNP's causal component comes from the "c" Gaussian (with variance H S σ 2 c ), and p b ≡ 1 − p c is the prior probability that the SNP's causal component comes from the "b" Gaussian (with variance σ 2 b ). This extension introduces an extra three parameters p c , σ c , and S, assumed for the moment to be the same for all SNPs. Ignoring inflation and implementation details like choice of reference panel and parameter estimation scheme, setting p c ≡ 1 recovers the model distribution assumed in [59]; additionally setting π 1 ≡ 1 recovers the primary model distribution assumed in [41]; and additionally setting S ≡ −1 recovers the model assumed in [3], while instead setting S ≡ −0.25 partially recovers the "recommended" LD-adjusted kinships (LDAK) model distribution in [47,46].
The LDAK model of Speed et al also incorporates two predetermined fixed factors that scale the variance of the causal effect size distribution: SNP-specific local LD weightings and "information scores" (the latter a quantification of SNP quality [47]). The local LD weighting (which remains implicitly dependent on MAF) is designed to scale down the effect size in the context of the infinitesimal model, to try to compensate (in empirical z-scores) for the effect itself being replicated through LD with neighboring SNPs. It should be noted that this problem of LD is implicitly taken care of in our basic model by: (1) using a point-normal distribution for non-causal SNPs; (2) an extensive underlying reference panel; and (3) directly modeling the effects of LD on z-scores in our PDF for GWAS summary statistics -e.g., see Eq. 21 in [22]. The LDAK local LD weighting scheme is quite distinct from exploring the possibility of whether or not a SNP is causal depending on its TLD, which is the role of π 1 × p c in the extended model presented here (see below), or the degree to which its effect size is LD-dependent. Rather, within the infinitesimal framework (i.e., assuming a polygenicity of 1) and not explicitly taking the direct effects of LD on z-scores into account (as is done in Eq. 21 noted above), it unwinds the otherwise implicit problem -that SNPs with larger TLD would have over-inflated z-scores -by means of an assumed exponential decay function with respect to base-pair distance (such a function provides an approximation for the gross decay of long-range LD [37,24]).
In the LD Score regression model, which is another instance of the infinitesimal framework, Finucane et al [12] introduce an annotation-weighted LD score l(j, c) for variant j in LD with reference SNPs with (potentially overlapping) categorical annotation c, finding that different categories of SNPs are differentially enriched for heritability. Gazal et al [14] extend this to include continuous-valued annotations, and also introduce rank-based inverse normal transformation of the LD score, denoted LLD, which is calculated independently for different MAF bins and, in this way, unlike LD Score, is intended to be independent of MAF, with the effect size variance for SNP j given as a sum of terms: a baseline parameter common to all SNPs, plus one of ten parameters depending on which of ten bins the MAF of j is in, plus another parameter times the LLD of j.
Schoech et al [41] implement a variation of their main model, again in the infinitesimal framework, by augmenting their variance with an additional factor (1 − τ · LLD j ), searching over five values of the new parameter τ along with the original 20 values of the selection parameter in a fixed 2D grid.
Within these infinitesimal models [48,47,46,12,14,41], it is not clear the degree to which explicit LD-dependence in the variance of the causal effect size merely takes into account the effect on z-scores due to LD with causal SNPs, and how much it models any true effect of LD on underlying causal effect size. Also, such models preclude examining if the TLD of a variant has any bearing on whether the variant is causal. In contrast is the "BayesS" model of Zeng et al [59], a causal mixture model (i.e., not infinitesimal), using individual genotype data and a reference panel of ∼484k non-imputed SNPs, that examines the effects of heterozygosity on effect size. The model we present here is in some respects an extension of that, but based on summary statistics, adding TLD dependence and an additional Gaussian, and we fit the model from a reference panel of 11 million SNPs.
Along with heterozygosity, SNP effect size might independently depend on TLD (for which we use the variable L in equations below), and in principle this could be explored in a manner similar to how heterozygosity is incorporated in the "c" causal effect Gaussian variance. However, TLD and heterozygosity are often related (given TLD, the expected heterozygosity shows a distinct well-defined pattern for SNPs with TLD<200, i.e., about 80% of SNPs -see Supplementary Figure S8), and independent contributions -wich may exist [14] -might be difficult to disentangle. Instead, here we explore a separate potential role for TLD.
Patterns of TLD among SNPs are effectively constant in a population over a few generations, but there is no a priori reason why the probability (in a Bayesian sense) of a SNP's being causally associated with any particular trait in the populaiton should be independent of the SNP's TLD. For example, there might be lower probability of association for SNPs with larger TLD. Indeed, with the complex interaction of multiple genetic forces such as mutation, genetic drift, and selection, the net role of LD (particularly TLD), through mechanisms like background selection and genetic hitching, on causal association with a particular phenotype is not clear. As noted above, when negative selection is operating (when effects are deleterious to fitness), it is likely that there are increasing numbers of larger effects at lower TLD. If the "c" Gaussian is capturing larger effects from rarer SNPs, reflecting selection pressure, it seems reasonable to inquire if the prior probability for a causal SNP's contribution from the "c" Gaussian is TLD-mediated. Here, instead of treating p c as a constant, we explore the possibility that it is larger for SNPs with lower TLD (in the event that this probability is in fact constant, or increasing, with respect to TLD, we would at least not expect to find it decreasing). This can be accomplished by means of a generalized sigmoidal function that will have a maximum at very low TLD, might maintain that maximum for all SNPs (equivalently, p c is a constant), or decrease in amplitude slowly or rapidly, possibly to 0, for SNPs with higher TLD. Letting the variable L be the TLD of a SNP, such a function of TLD can be characterized by three parameters: its amplitude (at L = 1), the TLD at the midpoint of the sigmoidal transition, and the width of the sigmoidal transition (over a wide or narrow range of TLD). We use the following general form with three parameters, y max , x mid , and x width : defined in the range −∞ < x < ∞, for which y(x) is monotonically decreasing and bounded 0 < y(x) < y max , with  Table S10 for parameter values). These functions can be summarized by three quantities: the maximum value, p c1 , which occurs at L = 1; the total LD value, L = mc, where pc(mc) = p c1 /2, give by the gray dashed lines in the figure; and the total LD width of the transition region, wc, defined as the distance between where pc(L) falls to 95% and 5% of p c1 given by the flanking red dashed lines in the figure. Numerical values of p c1 , mc, and wc are given in Table 1  0 y max 1; −∞ < x mid < ∞ locates the mid point of the overall sigmoidal transition (y(x mid ) = y max /2), and 0 < x width < ∞ controls its width (y(x) smoothly changing from a Heaviside step function at x mid as x width → 0 to a constant function as x width → ∞). Examples are shown in Figure 1 (scaled by π 1 , giving the physically interestng total prior probability of a SNP being causal with respect to the selection "c" Gaussian, as a funciton of the SNP's TLD); parameter values are in Supplementary Table S10. Mathematically, the curve can continue into the "negative TLD" range, revealing a familiar full sigmoidal shape; since we are interested in the range 1 x max(TLD), below we report the actual mid point (denoted m c ) and width (w c , defined below) of the transition that occurs in this range. Then where p c (L) is the sigmoidal function (0 p c (L) 1 for all L) given by y(x) in Eq. 3 for L = x 1, which numerically can be found by fitting for its three characteristic parameters.
As a final possible extension, we add an extra term -a "d" Gaussian -to describe larger effects not well captured  Figure 2: QQ plots of (pruned) z-scores for qualitative phenotypes (dark blue, 95% confidence interval in light blue) with model prediction (yellow). See Supplementary Material Figures S15 to S22. The value given for p c1 is the amplitude of the full pc(L) function, which occurs at L = 1; the values (mc, wc) in parentheses following it are the total LD (mc) where the function falls to half its amplitude (the middle gray dashed lines in Figure 1 are examples), and the total LD width (wc) of the transition region (distance between flanking red dashed lines in Figure 1). Similarly for p d1 (m d , w d ), where given. h 2 b , h 2 c , and h 2 d are the heritabilities associated with the "b", "c", and "d" Gaussians, respectively. h 2 is the total SNP heritability, reexpressed as h 2 l on the liability scale for binary phenotypes. Parameter values are also given in Table 1 and heritabilities are also in Table 3; numbers of causal SNPs are in Table 2. Reading the plots: on the vertical axis, choose a p-value threshold for typed or imputed SNPs (SNPs with z-scores; more extreme values are further from the origin), then the horizontal axis gives the proportion, q, of typed SNPs exceeding that threshold (higher proportions are closer to the origin). See also Supplementary Figure  S1, where the y-axis is restricted to 0 −log 10 (p) 10 by the "b" and "c" Gaussians. This gives finally: where σ 2 d is a new parameter, p d (L) is another general sigmoid function (0 p d (L) 1 for all L) where now there is the added constraint 0 p c (L) + p d (L) 1, and the prior probability for the "b" Gaussian becomes Depending on the phenotype and the GWAS sample size, it might not be feasible, or meaningful, to implement the full model. In particular, for low sample size and/or low discoverability, the "b" Gaussian is all that can be estimated, but in most cases both the "b"and "c" Gaussians can be estimated, and β will be well characterized by Eq. 4.
As we described in our previous work [22], a z-score is given by a sum of random variables, so the a posteriori pdf (given the SNP's heterozygosity and LD structure, and the phenotype's model parameters) for such a composite random variable is given by the convolution of the pdfs for the component random variables. This facilitates an essentially exact calculation of the z-score's a posteriori pdf arising from the underlying model of causal effects as specified by Eq. 1, 4, or 5.
For our reference panel we used the 1000 Genomes phase 3 data set for 503 subjects/samples of European ancestry [6,5,52]. In ref. [22] we describe how we set up the general framework for analysis, including the use of the reference panel and dividing the reference SNPs into a sufficiently fine 10×10 heterozygosity×TLD grid to facilitate computations.
In our earlier work on the "b" model we gave an expression, denoted G(k), for the Fourier transform of the genetic contribution to a z-score, where k is the running Fourier parameter. The extra complexity in the "c" and "d" models here requires a modification only in this term, which we describe in the Methods. In addition to the parameters presented above, we also include an inflation parameter σ 0 : if z u denotes uninflated GWAS z-scores and z denotes actual GWAS z-scores, then σ 0 is defined by z = σ 0 z u [22,9]. The optimal model parameters for a particular phenotype are estimated by minimizing the negative of the log likelihood of the data (z-scores) as a function of the parameters. This is done with the "b" model as before [22], and then proceeding iteratively with the more complex models, continually re-estimating the new values of the earlier parameters that maximize the likelihood when a new parameter is introduced, with extensive single and multiple parameter searches (to avoid being trapped in local minima) until all parameters simultaneously are at a convex minimum. This involved many explicit and repeated coarse-and fine- grained linear (for each individual parameter) and grid (for multiple parameters simultaneously) searches as well as many Nelder-Mead multidimensional unconstrained nonlinear minimizations. There is no artificial constraining on the parameters; for example, no prior assumption is made about the relative sizes of the causal σ's, or the parameter values of the LD-dependent prior probabilities (the full range of amplitudes, transition widths, and location of the mid-point of the transitions are searched).
The total SNP heritability is given by the sum of heritability contributions of each SNP in the reference panel from each of the relevant Gaussians. In the Bayesian approach, we do not know the value of the causal effect of any particular SNP, but we assume it comes from a distribution, β(H, L), which characterize our ignorance of it, where H is the SNP's heterozygosity and L is its total LD. For a specific Gaussian ("b", "c", or "d") in our model, the contribution of the SNP to heritability is given by the prior probability that the SNP is causal with respect to that Gaussian, times the expected value of the square of the effect size, E(β 2 ), times H [22]. For the "c" Gaussian, for example, the prior probability that the SNP is causal is π 1 p c (L), and E(β 2 ) is just the variance, H S σ 2 c . Thus, the contribution of this SNP to the overall heritability associated with the "c" Gaussian, h 2 c , is π 1 p c (L)H S σ 2 c H. Below we report the sums over all such contributions. The number of causal SNPs associated with the "c" Gaussian, n c , is given by summing π 1 p c (L) for each reference panel SNP, and similarly for the other Gaussians. All heritabilities and discoverabilities are, as before [22], corrected with respect to the inflation parameter, i.e., divided by σ 2 0 .
All code used in the analyses, including simulations, is publicly available on GitHub [20,19].

Data Preparation
We analyzed summary statistics for fourteen phenotypes-  [54]. Most participants were of European ancestry. (A spreadsheet giving data sources is provided in the Supplement.) In the tables, we also report results for body mass index (GIANT 2015) (N = 233,554) [28], and height (UKB-GIANT 2018) [58].
In Figures 2 and S1 we report effective sample sizes, N ef f , for the case-control GWASs. This is defined as N ef f = 4/(1/N cases + 1/N controls ), so that when N cases = N controls , N ef f = N cases + N controls = N , the total sample size, allowing for a straightforward comparison with quantitative traits.
Confidence intervals for parameters were estimated using the inverse of the observed Fisher information matrix (FIM). The full FIM was estimated for up to eight parameters used in Model C, and for the remaining parameters that extend the analysis to Model D the confidence intervals were approximated ignoring off-diagonal elements. Additionally, the w d parameter was treated as fixed quantity, the lowest value allowing for a smooth transition of the p d (L) function to 0 (see Supplementary Material Figure S7; for CD, UC, and TC, however, the function p d (L) was a constant (=p d (1)). For the derived quantities h 2 and n causal , which depend on multiple parameters, the covariances among the parameters, given by the off-diagonal elements of the inverse of the FIM, were incorporated. Numerical values are in Supplementary Material Tables S5-S9.
In order to carry out realistic simulations (i.e., with realistic heterozygosity and LD structures for SNPs), we used HAPGEN2 [27,49,51] to generate genotypes for 10 5 samples; we calculated SNP MAF and LD structure from 1000 simulated samples.

Phenotypes
Summary QQ plots for pruned z-scores are shown in Figure 2 for seven binary phenotypes (for AD we separate out chromosome 19, which contains the APOE gene), and in Figure 3 for seven quantitative phenotypes (including two separate GWAS for height), with model parameter values in Table 1; breakdowns of these summary plots with respect to a 4×4 grid of heterozygosity×TLD (each grid a subset of a 10×10 grid) are in Supplementary Material Figures S15-S29. For each phenotype, model selection (B, C, or D) was performed by testing the Bayesian information criterion (BIC) -see Supplementary Table S4. For comparison, all QQ figures include the basic (B) model in green; the extended model C or D, consistently demon-  Table 1: Model parameters for phenotypes, case-control (upper section) and quantitative (lower section). π 1 is the overall proportion of the 11 million SNPs from the reference panel that are estimated to be causal. pc(L 1) is the prior probability multiplying the "c" Gaussian, which has variance H S σ 2 c , where H is the reference SNP heterozygosity. Note that pc(L) is just a sigmoidal curve, and can be characterized quite generally by three parameters: the value p c1 ≡ pc(1) at L = 1; the total LD value L = mc at the mid point of the transition, i.e., pc(mc) = p c1 /2 (see the middle gray dashed lines in Figure 1, which shows examples of the function pc(L)); and the width wc of the transition, defined as the distance (in L) between where the curve falls to 95% and 5% of p c1 (distance between the flanking red dashed lines in Figure  1). Note that for AD Chr19, AD NoC19, and ALS Chr9, π 1 is the fraction of reference SNPs on chromosome 19, on the autosome excluding chromosome 19, and on chromosome 9, respectively. For bipolar disorder, Crohn's disease, ulcerative colitis, and total cholesterol p d (L) = p d1 for all L. Examples of H S multiplying σ 2 c are shown in Supplementary Material Figure S9. Estimated Bayesian information criterion (BIC) values for three models (B, C, and D) are shown in Supplementary material Table S4: the 3-parameter model B with only the "b" Gaussian (π 1 , σ b , σ 0 ); the 8-parameter model C with both the "b" with "c" Gaussians (Eq. 4); and the 12-parameter model D with "b", "c" and "d" Gaussians (Eq. 5). 95% confidence intervals are in Supplementary Materials Tables S5-S7.  Table 2: Numbers of causal SNPs. n causal is the total number of causal SNPs (from the 11 million in the reference panel); n b , nc, and n d are the numbers associated with the "b", "c", and "d" Gaussians, respectively. 95% confidence intervals are in Supplementary Materials Table S8.
strating improved fits, in yellow; and the data in blue. The distributions of z-scores for different phenotypes are quite varied. Nevertheless, for most phenotypes analyzed here, we find evidence for larger and smaller effects being distributed differently, with strong dependence on total LD, L, and heterozygosity, H.
Our model estimates polygenicity as a one-dimensional function of L. We find that polygenicity is dominated by SNPs with low L. However, the degree of restriction varies widely across phenotypes, depending on the shapes and sizes of p c (L) and p d (L) in Eq. 5, the prior probabilities that a causal SNP belongs to the "c" and "d" Gaussians. These prior probabilities are shown in Figure 1 and Supplementary Figures S5-S7. Taking into account the underlying distribution of reference SNPs with respect to heterozygosity, these distributions lead to a varied pattern across phenotypes of the expected number of causal SNPs in equally-spaced elements in a two-dimensional H×TLD grid, as shown for height (2014) in Figure 4 C, and for all phenotypes in Supplementary Figures S11-S14 (third columns). Further, for any given phenotype, the effect sizes of causal variants come from distributions whose variances can be widely different -by up to two orders of magnitude. Thus, given the prior probabilities (p b , p c , and p d ) by which these distributions are modulated as a function of L, we are able to estimate the expected effect size per causal SNP, E(β 2 ), in each H×TLD grid element, as shown in Figure   , h 2 c , and h 2 d are the heritabilities associated with the "b", "c", and "d" Gaussians, respectively. The last column, labeled %c, gives the percentage of SNP heritability that comes from the "c" Gaussian. 95% confidence intervals are in Suporting Materials Table S9. A comparison of the total heritabilities with estimates from our basic model alone [22], and with estimates from other models [60,59,41] are given in Supplementary Table S1. shapes and sizes of p c (L) and p d (L), SNPs with lower L have larger E(β 2 ). However, the selection parameter S in the "c" Gaussian has a large impact on E(β 2 ) as a function of H (see Supplementary Figure S9). As a result, for most phenotypes, we find that the effect sizes for low MAF causal SNPs (H < 0.05) are several times larger than for more common causal SNPs (H > 0.1). We find that heritability per causal SNP is larger for lower L, a general pattern that follows from at least one of the prior probabilities, p c (L) or p d (L), being non-constant. However, because heritability per causal SNP is proportional to H, we find that, even with negative selection parameter, S (and thus larger E(β 2 ) for lower H), the heritability per causal SNP is largest for the most common causal SNPs (H > 0.45).
SNP heritability estimates from the extended model, as shown in Table 3, are uniformly larger than for the exclusive basic model [22]. Since they are calculated from an arguably more realistic distribution that gives a more accurate fit to the empirical z-scores, the values in turn are expected to be a more accurate assessment of the SNP heritability inherent in GWAS summary statistics. With the exception of BMI, generally a major portion of SNP heritability was found to be associated with the "selection" component, i.e., the "c" Gaussian -see the right-most column in Table 3.

Simulations
To test the specificity of the model for each real phenotype, we constructed simulations where, in each case, the true causal β's (a single vector instantiation) for all reference panel SNPs were drawn from the overall distribution defined by the real phenotype's parameters (thus being the "true" simulation parameters). We set up simulated phenotypes for 100,000 samples by adding noise to the genetic component of the simulated phenotype [22], and performed a GWAS to calculate z-scores. We then sought to determine whether the true parameters, and the component heritabilities, could reasonably be estimated by our model. In Supplementary Figures S3 and S4 we show the results for the simulated case-control and quantitative phenotypes, respectively. Overall heritabilities were generally faithful to the true values (the values estimated for the real phenotypes) -see Supplementary Table S3 -though for Crohn's disease the simulated value was overestimated due to the h d component. Note that for the case-control simulated phenotypes, the heritabilities on the observed scale, denotedĥ 2 in Supplementary Figure S3, should be compared with the corresponding values in Figure 2, not withĥ 2 l , which denotes heritability on the liability scale, i.e., adjusted for population prevalence; note also that for case-control phenotypes, we implicitly assume the same proportion P of cases in the real and simulated GWAS (for the basic model, assuming the liability-scale model is correct, one can easily see that discoverability σ 2 β ∝ P (1−P )see Eqs. 38 and 39 in [22]; this carries over to σ b , σ c , and σ d used here). Polygenicities and discoverabilities were also generally faithfully reproduced. However, for ALS re-stricted to chromosome 9, and BMI, the selection parameter was incorrectly estimated, owing to the weak signal in these GWAS (e.g., for BMI, π 1 p c (1) ≃ 8 × 10 −6 , Fig. S6 (A), and only around 5% of SNP heritability was found to be associated with the "c" Gaussian, Table 3) and very low polygenicity (small number of causal SNPs) for the "c" Gaussian. Given the wide variety and even extreme ranges in parameters and heritability components across diverse simulated phenotypes, the multiple simulated examples provide checks respect to each-other for correctly discriminating phenotypes by means of their model parameter estimates: the results for individual cases are remarkably faithful to the respective true values, demonstrating the utility of the model in distinguishing different phenotypes.

DISCUSSION
We propose an extended Gaussian mixture model for the distribution of underlying SNP-level causal genetic effects in human complex phenotypes, allowing for the phenotypespecific distribution to be modulated by heterozygosity, H, and total LD, L, of the causal SNPs, and also allowing for independent distributions for large and small effects. The GWAS z-scores for the typed or imputed SNPs, in addition to having a random environmental and error contribution, arise through LD with the causal SNPs. Thus, taking the detailed LD and heterozygosity structure of the population into account by using a reference panel, we are able to model the distribution of z-scores and test the applicability of our model to human complex phenotypes.
Complex phenotypes are emergent phenomena arising from random mutations and selection pressure. Underlying causal variants come from multiple functional categories [42], and heritability is known to be enriched for some functional categories [12,43]. Thus, it is likely that different variants will experience different evolutionary pressure -negative, neutral, or positive -due at least to pleiotropic roles. Indeed, heritability has been found partially to depend on functional annotations differentially correlated with levels of LD [14], consistent with the operation of negative selection pressure.
Here, we find evidence for markedly different genetic architectures across diverse complex phenotypes, where the polygenicity (or, equivalently, the prior probability that a SNP is causal) is a function of SNP total LD (L), and discoverability is multi-component and MAF dependent.
In contrast to previous work modeling the distribution of causal effects that took total LD and multiple functional annotation categories into account while implicitly assuming a polygenicity of 1 [14], or took MAF into account while ignoring total LD dependence and different distributions for large and small effects [59], or took independent distributions for large and small effects into account (which is related to incorporating multiple functional annotation categories) while ignoring total LD and MAF dependence, here we combine all these issues in a unified way, using an extensive underlying reference panel of ∼11 million SNPs and an exact methodology using Fourier transforms to relate summary GWAS statistics to the posited underlying distribution of causal effects [22]. We show that the distributions of all sets of phenotypic z-scores, including extreme values that are well within genome-wide significance, are accurately reproduced by the model, both at overall summary level and when broken down with respect to a 10×10 H×TLD grid -even though the various phenotypic polygenicities and per-causal-SNP heritabilities range over orders of magnitude. Improvement with respect to the basic model can be seen in the summary QQ plots Figs. 2 and 3, and also the H×TLD grids of subplots Figs. S15-S29. But even when model fits are similar, the extended model fit taking selection pressure into account results in higher likelihood and lower Bayesian informaiton criterion.
As noted in the Introduction, negative selection can be expected to result in an increasing number of effects with increasing effect size at lower total LD and lower heterozygosity. It was found in [14] -which, in addition to analyzing total LD, modeled allele age and recombination ratesthat common variants associated with complex traits are weakly deleterious to fitness, in line with an earlier model result that most of the variance in fitness comes from rare variants that have a large effect on the trait in question [11]. Thus, larger per-allele effect sizes for less common variants is consistent with the action of negative selection. Further, based on a model equivalent to Eq. 2 with p c ≡ 1, it was argued in [59], using forward simulations and a commonly used demographic model [16], that negative values for the selection parameter, S, which leads to larger effects for rarer variants, is a signature of negative selection. Similar results were found for the related infinitessimal model (p c ≡ 1 and π 1 ≡ 1) in [41].
We find negative selection parameter values for most traits, which is broadly in agreement with [59] with the exception of BMI, which we find can be modeled with two Gaussians with no or weakly positive (S 0) heterozygosity dependence, though it should be noted that the polygenicity for the larger-effects Gaussian (the "c" Gaussian with the S parameter) is very low, amounting to an estimate of < 100 common causal SNPs of large effect. A similar situation (S ≃ 0) obtains with ALS restricted to chromosome 9; here, the sample size is relatively low, leading to a weak signal, and we estimate only 7 common causal SNPs associated with the "c" Gaussian. Our BMI results are not in agreement with the results of others. is in reasonable agreement with S = −0.422 reported in [59], and with α = −0.45 reported in [41].
Compared with our basic model results [22], we generally have found that using the extended model presented here heritabilities show marked increases, with large contributions form the selection ("c") Gaussian -see Supplementary Table S1. However, due to the small effect sizes associated with the "b" Gaussian as a component of the extended model, the overall number of causal SNPs can be considerably larger. Supplementary Figures S11-S14 capture the breakdown in SNP contributions to heritability as a function of heterozygosity and total linkage disequilibrium.
Generally, we find evidence for the existence of genetic architectures where the per causal-SNP heritability is larger for more common SNPs and/or for SNPs with lower total LD. But the trend is not uniform across phenotypessee Supplementary Figures S11-S14, second columns. The observed-scale heritability estimates given by h 2 b correspond to effects not experiencing much selection pressure. The new final values presented here result from a model that gives better fits between data and model prediction of the summary and detailed QQ plots, and thus constitute more accurate estimates of SNP heritability. For 2010 and 2014 height GWASs, we obtain very good consistency for the model parameters and therefore heritability, despite considerable difference in apparent inflation. The 2018 height GWAS [58] has a much larger sample size (almost three quarters of a million); the slightly different parameter estimates for it might arise due to population structure not fully captured by our model [45,2]. Our h 2 estimates for height, however, remain consistently lower than other reported results (e.g., h 2 = 0.33 in [60], h 2 = 0.527 in [59], h 2 = 0.61 in [41]). For educational attainment, our heritability estimate agrees with [59] (h 2 = 0.182), despite our using a sample more than twice as large. The difference in selection parameter value, S = −0.335 in [59] versus S = −0.44 here, might partially be explained by the model differences (Zeng et al use one causal Gaussian with MAF-dependence but no LD dependence). A comparison of the total heritabilities reported here with estimates from the work of others [60,59,41], in addition our basic model alone [22], are given in Supplementary Table S1.
For most traits, we find strong evidence that causal SNPs with low heterozygosity have larger effect sizes (S < 0 in Table 1; the effect of this as an amplifier of σ 2 c in Eq. 5 is illustrated in Supplementary Material Figure S9)see also Supplementary Figures S11-S14, fourth columns. Thus, negative selection seems to play an important role in most phenotypes-genotypes. This is also indicated by the extent of the region of finite probability for variants of large effect sizes (which are enhanced by having S −0.4) being relatively rare (low H), which will be greater for larger p c1 , the amplitude of the prior probability for the "c" Gaussian (see Supplementary Figures S5 and S6).
The "b" Gaussian in Eq. 4 or 5 does not involve a selection parameter: effect size variance is independent of MAF. Thus, causal SNPs associated with this Gaussian are likely undergoing neutral (or very weakly negative) selection. It should be noted that in all traits examined here, whether or not there is evidence of negative selection (S < 0), the effect size variance of the "b" Gaussian is many times smaller -sometimes by more than an order of magnitude -than that for the "c" Gaussian. Thus, it appears there are many causal variants of weak effect undergoing neutral (or very weakly negative) selection. For the nine phenotypes where the "d" Gaussian could be implemented, its variance parameter was several times larger than that of the "c" Gaussian. However, the amplitude of the prior probability for the "d" Gaussian, p d1 , was generally much smaller than the amplitudes of the prior probabilities for the "b" or "c" Gaussians, which translated into a relatively small number of causal variants with very large effect associated with this Gaussian. (Due to lack of power, in three instances -CD, UC, and TC -p d (L) was treated as a constant, i.e., independent of L.) Interestingly, intelligence had the highest number of causal SNPs associated with this Gaussian, while the extent of total LD for associated SNPs was also liberal (m d = 561; see also Supplementary Figure S7). It is possible that some of these SNPs are undergoing positive selection, but we did not find direct evidence of that.
We find a diversity of genetic architectures across multiple human complex phenotypes. We find that SNP total LD plays an important role in the likelihood of the SNP being causal, and in the effect size of the SNP. In general, we find that lower total LD SNPs are more likely to be causal with larger effects. Furthermore, for most phenotypes, while taking total LD into account, we find that causal SNPs with lower MAF have larger effect sizes, a phenomenon indicative of negative selection. Additionally, for all phenotypes, we found evidence of neutral selection operating on SNPs with relatively weak effect. Apart from a weak effect in BMI, we did not find direct evidence of positive selection. Future work will explore SNP functional annotation categories and their differential roles in human complex phenotypes.

Total Linkage Disequilibrium
Sequentially moving through each chromosome in contiguous blocks of 5,000 SNPs in the reference panel, for each SNP in the block we calculated its Pearson r 2 correlation coefficients (that arise from linkage disequilibrium, LD) with all SNPs in the central bock itself and with all SNPs in the pair of flanking blocks of size up to 25,000 each. For each SNP we calculated its total linkage disequilibrium (TLD), given by the sum of LD r 2 's thresholded such that if r 2 < r 2 min we set that r 2 to zero (r 2 min = 0.05). The fixed window size corresponds on average to a window of ±8 centimorgans. This is deliberatly larger than the 1centimorgan window used to define LD Score [3], because the latter appears to exclude a noticeable part of the LD structure.
In applying the model to summary statistics, we calculated histograms of TLD (using 100 bins) and ignoring SNPs whose TLD was so large that their frequency was less than a hundredth of the respective histogram peak; typically this amounted to restricting to SNPs for which TLD 600. We also ignored summary statistics of SNPs for which MAF 0.01.
Since we are estimating three parameters from millions of data points, it is reasonable to coarse-grain the data. Knowing the TLD and heterozygosity (H) of each SNP, we divided the full set of GWAS SNPs into a H×TLD coarse-grained grid; we found that 10×10 is more than sufficient for converged results.

Model PDF
When implementing the discrete Fourier transform (DFT) to calculate model a posteriori probabilities for z-score outcomes for a single SNP, we discretize the range of possible z-scores into the ordered set of n (equal to a power of 2) values z 1 , . . . , z n with equal spacing between neighbors given by ∆z (z n = −z 1 − ∆z, and z n/2+1 = 0). Taking z 1 = −38 allows for the minimum p-values of 5.8 × 10 −316 (near the numerical limit); with n = 2 10 , ∆z = 0.0742. Given ∆z, the Nyquist critical frequency is f c = 1 2∆z , so we consider the Fourier transform function for the z-score pdf at n discrete values k 1 , ..., k n , with equal spacing between neighbors given by ∆k, where k 1 = −f c (k n = −k 1 − ∆k, and k n/2+1 = 0; the DFT pair ∆z and ∆k are related by ∆z∆k = 1/n).
In our earlier work on the "b" model [22] we gave an expression, denoted G(k j ), for the Fourier transform of the genetic contribution to a z-score, where k j is the discrete Fourier variable described above. In constructing the a posteriori pdf for a z-score, the extra complexity in the "c" and "d" models presented in the current work requires a modification only in this term.
The set of typed SNPs are put in a relatively fine 10×10 grid, called the "H-L" grid, based on their heterozygosity and total LD (coarser grids are refined until converged results are achieved, and 10 × 10 is more than adequate). Given a typed SNP in LD with many tagged SNPs in the reference panel (some of which might be causal), divide up those tagged SNPs based on their LD with the typed SNP into w max LD-r 2 windows (we find that w max = 20, dividing the range 0 r 2 1 into 20 bins, is more than adequate to obtain converged results); let r 2 w denoted the LD of the w th bin, w = 1, . . . , w max . Denote the number of tagged SNPs in window w as n w and their mean heterozygosity as H w . Then, given model "b" parameters π 1 , σ 2 b , and σ 2 0 , and sample size N , G(k j ) is given by G(k j ) = wmax w=1 π 1 exp(B w k 2 j ) + (1 − π 1 ) nw ,