Evaluating and improving heritability models using summary statistics
Doug Speed, John Holmes and David J. Balding

There is currently much debate regarding the best model for how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I model, the authors of LD Score Regression recommend the Baseline LD model, and we have recommended the LDAK model. Here we provide a statistical framework for assessing heritability models using summary statistics from genome-wide association studies. Based on 31 studies of complex human traits (average sample size 136,000), we show that the Baseline LD model is more realistic than other existing heritability models, but that it can be improved by incorporating features from the LDAK model. Our framework also provides a method for estimating the selection-related parameter α from summary statistics. We find strong evidence ($P < 1 \times 10^{-6}$) of negative genome-wide selection for traits, including height, systolic blood pressure and college education, and that the impact of selection is stronger inside functional categories, such as coding SNPs and promoter regions. Assessing heritability models using summary statistics from genome-wide association studies of 31 human traits shows that the Baseline LD model is realistic and can be improved by incorporating features from the LDAK model.


Results
For our main analysis, we use summary statistics from 31 genome-wide association studies (GWAS) to compare 12 heritability models (Table 1). We perform this analysis using SumHer, part of our software package LDAK (www.ldak.org) 4,6. Here, we briefly describe the heritability models, our proposed model likelihood and the data we use; for full details see Methods.
Heritability models. The heritability model specifies how $E[h^2_j]$, the expected heritability contributed by SNP j, varies across the genome. We consider nine existing heritability models. The one-parameter GCTA model 5 [...] $E[h^2_j]$ is proportional to $w_j [f_j(1 - f_j)]^{0.75}$, where $w_j$ is the LDAK weighting of SNP j ($w_j$ tends to be higher for SNPs in low-LD regions) and $f_j$ is its MAF. The two-parameter LDAK+1Fun 1 and 25-parameter LDAK+24Fun 4 models extend the LDAK model by adding either one or all 24 function indicators of the Baseline model.
We construct three novel heritability models. The 66-parameter BLD-LDAK model combines features of the Baseline LD and LDAK models; first, we add the LDAK weighting $w_j$ to the Baseline LD model, and then we remove the ten MAF indicators and scale the remaining 66 annotations by $[f_j(1 - f_j)]^{0.75}$. The 67-parameter BLD-LDAK+Alpha model is the same, except it scales the annotations by $[f_j(1 - f_j)]^{1+\alpha}$, where α is estimated from the data using a grid search. The one-parameter LDAK-Thin model is a simplified version of the LDAK model, obtained by setting the LDAK weightings to either zero or one.
Measuring model fit. Suppose we have summary statistics from a GWAS; let $S_j$ denote the $\chi^2(1)$ test statistic from regressing the phenotype on SNP j. Suppose also that we have genotype data from an ancestrally matched reference panel, from which we can estimate $r^2_{jl}$, the squared correlation between SNPs j and l. The authors of LDSC 3 derived that the marginal distribution of each $S_j$ is approximately gamma with shape ½ and scale $2E[S_j]$, where $E[S_j]$ is the expectation of $S_j$ (a function of the parameters of the chosen heritability model). Based on this, we propose the approximate joint log-likelihood

$$\mathrm{logl}_{SS} = \sum_j \frac{1}{u_j} \log \Gamma\left(S_j \mid \tfrac{1}{2},\, 2E[S_j]\right),$$

where the sum is over the regression SNPs and $\Gamma(X \mid a, b)$ is the probability density function of a gamma distribution with shape a and scale b. The weights $1/u_j$ are the same as those used by LDSC when regressing $S_j$ onto $E[S_j]$, and are included to allow for correlations between local SNPs 3.
We perform three analyses to support the use of $\mathrm{logl}_{SS}$ to compare heritability models. First, Extended Data Fig. 1 shows that for scenarios where both $\mathrm{logl}_{SS}$ and the REML likelihood can be computed, they are concordant. Second, the Supplementary Note shows that when we add a non-informative annotation to a heritability model, twice the increase in $\mathrm{logl}_{SS}$ is approximately $\chi^2(1)$ distributed, which is the distribution expected were $\mathrm{logl}_{SS}$ an exact likelihood (although this analysis indicates the approximation is less accurate for highly heritable traits). Third, Table 1 shows that the ranking of heritability models based on $\mathrm{logl}_{SS}$ is consistent with the ranking based on leave-one-chromosome-out prediction of test statistics. Additionally, Supplementary Table 1 shows that the ranking of models is unchanged if we instead compute an unweighted version of $\mathrm{logl}_{SS}$ using only SNPs in approximate linkage equilibrium and $u_j = 1$.
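The approximate log-likelihood described above can be sketched in a few lines of Python. This is a minimal illustration rather than the SumHer implementation: the function names and the toy inputs are assumptions, and the only formula used is the log density of a gamma distribution with shape 1/2 and scale 2E[S_j].

```python
import math


def log_gamma_density(s, expected):
    """Log density at s of a gamma distribution with shape 1/2 and scale 2*expected."""
    return -0.5 * math.log(2.0 * math.pi * expected * s) - s / (2.0 * expected)


def logl_ss(stats, expected, u):
    """Weighted sum of per-SNP log densities, with weights 1/u_j."""
    return sum(log_gamma_density(s, e) / uj for s, e, uj in zip(stats, expected, u))


# Toy inputs, made up for illustration: three test statistics, their
# expectations under some heritability model, and the corresponding u_j.
toy_stats = [1.0, 4.2, 0.3]
toy_expected = [1.0, 2.0, 1.5]
toy_u = [1.0, 2.0, 1.0]
print(logl_ss(toy_stats, toy_expected, toy_u))
```

When the expectation is 1, the per-SNP term reduces to the log density of a standard χ²(1) variable, which gives a simple check of the formula.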

Data.
We use summary statistics from two sets of GWAS. The first set comprises 14 traits from UK Biobank (UKBb) 13,14 : eight continuous (body mass index, forced vital capacity, height, impedance, neuroticism score, pulse rate, reaction time and systolic blood pressure), four binary (college education, ever smoked, hypertension and snorer) and two ordinal (difficulty falling asleep and preference for evenings). We performed these GWAS ourselves; after stringent quality control, 130,000 samples and 4.7 million SNPs remained. We additionally use summary statistics from 17 public GWAS 4,15 : ten continuous (including anthropometric measures and psychiatric scores) and seven binary (mostly complex diseases). The average sample size is 141,000 (range 21,000-329,000). As a reference panel, we use 489 European individuals from the 1000 Genomes Project 16 , recorded for 10.0 million SNPs (MAF > 0.005).
Performance of heritability models. Table 1 reports $\mathrm{logl}_{SS}$ for the 12 heritability models, averaged across either the 14 UKBb or 17 public GWAS (values for individual GWAS are in Supplementary Table 2). We rank models based on the Akaike Information Criterion 17 (AIC), equal to $2K - 2\,\mathrm{logl}_{SS}$, where K is the number of parameters.
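The ranking rule can be illustrated with a short Python sketch. The model names, log-likelihoods and parameter counts below are hypothetical placeholders, not values from Table 1; the `penalty` argument is an assumption added so that the same function also covers ranking with no penalty (0) or a doubled penalty (4) per parameter.

```python
def aic(logl, n_params, penalty=2.0):
    """Return penalty * K - 2 * logl; penalty=2 gives the usual AIC."""
    return penalty * n_params - 2.0 * logl


def rank_models(models, penalty=2.0):
    """Sort model names from best (lowest criterion) to worst."""
    return sorted(models, key=lambda name: aic(models[name][0], models[name][1], penalty))


# Hypothetical (logl, number of parameters) pairs, for illustration only.
toy_models = {
    "model_A": (-1000.0, 1),
    "model_B": (-900.0, 75),
    "model_C": (-850.0, 66),
}
print(rank_models(toy_models))               # usual AIC
print(rank_models(toy_models, penalty=0.0))  # no penalty on parameters
print(rank_models(toy_models, penalty=4.0))  # doubled penalty on parameters
```

Lower values are better, so the first name in each list is the preferred model under that penalty.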
When we restrict to the nine existing heritability models, the Baseline LD model performs best; it has average AIC 221 lower than the next best model and is the top-ranked model for 28 of the 31 GWAS. However, when we consider all 12 heritability models, the BLD-LDAK and BLD-LDAK+Alpha models are the best; they both have average AIC 124 lower than the Baseline LD model, and now these are the top two models for 28 of the 31 GWAS. These two models would remain the best if, instead of the AIC, we ranked models based on $-2\,\mathrm{logl}_{SS}$ or $4K - 2\,\mathrm{logl}_{SS}$ (that is, either removed or doubled the penalty on parameters).
Genetic architecture estimates. Figure 1, Extended Data Fig. 2 and Supplementary Tables 3-6 compare estimates of SNP heritability, confounding bias and functional enrichments from the 12 heritability models (note that only the seven models that include function indicators can be used to estimate functional enrichments). As heritability models have become more complex, estimates have tended to converge, so that the more complex models produce estimates of SNP heritability and confounding bias intermediate between those from the GCTA and LDAK models, and estimates of functional enrichments intermediate between those from the GCTA+1Fun and LDAK+1Fun models. Based on model fit (Table 1), we consider the BLD-LDAK and BLD-LDAK+Alpha models to be the most reliable of the 12 models; their estimates of confounding bias and functional enrichments are close to those from the Baseline LD model, while their estimates of SNP heritability tend to be between those from the LDAK model and those from the Baseline LD or GCTA-LDMS-I models.
Evidence of selection. Figure 2a and Supplementary Table 7 report estimates of α in the BLD-LDAK+Alpha model. This parameter specifies the assumed relationship between heritability and MAF 1,4 , and has been used to measure selection 10,18 (negative α indicates that less common SNPs tend to have larger effect sizes than more common SNPs, and vice versa). Across the 31 GWAS, the average estimate of α is [...]
We also investigated whether α varies across functional categories of SNPs. Ideally we would use a single heritability model containing 24 α, one for each category, but to solve this model would require a grid search across 24 parameters. Therefore, we instead use simpler models containing two α, corresponding to SNPs inside and SNPs outside each category. Specifically, these models are derived from the eight-parameter BLD-LDAK-Lite+Alpha model, a reduced version of the BLD-LDAK+Alpha model. Supplementary Table 8 explains how we constructed the BLD-LDAK-Lite+Alpha model by identifying the most important annotations of the BLD-LDAK+Alpha model, while Extended Data Fig. 3 shows that estimates of α from the BLD-LDAK-Lite+Alpha model are consistent with those from the BLD-LDAK+Alpha model. Figure 2b and Supplementary Table 9 show that the two α are significantly different from each other (P < 0.05/24) for 18 of the 24 functional categories. The largest differences are observed for coding SNPs, promoter regions, 3′ untranslated region (UTR) and transcription start sites (inside each of these four categories, the average estimate of α is below −0.5).

Discussion
When software for estimating SNP heritability was first developed, little attention was given to the heritability model, and it was standard to assume that all SNPs are expected to contribute equal heritability 2,3 . It is now recognized that this assumption is suboptimal 1,4 , and the best way to model how heritability varies across the genome has become a topic of much debate 7,8,10,19,20 . Heritability models have previously been compared based on REML likelihood 1,20 , prediction accuracy 4,20 and performance on simulated data 7,8 . However, all three approaches have shortcomings; the REML likelihood requires individual-level data and cannot be computed for complex heritability models, to measure prediction accuracy requires two independent datasets for each trait (one for training and one for testing) and there is no consensus regarding the best prediction method, while comparisons of heritability models based on simulated data are sensitive to the assumptions of the simulation model 1 .
We have proposed $\mathrm{logl}_{SS}$, an approximate model likelihood that can be computed from summary statistics and for complex heritability models. $\mathrm{logl}_{SS}$ can be used both to evaluate heritability models (for example, if two models produce contrasting estimates, then those from the model with lowest AIC should be preferred) and to improve them (for example, by combining pairs of models when this significantly improves $\mathrm{logl}_{SS}$). Using $\mathrm{logl}_{SS}$, we have shown that the Baseline LD model is better than other existing heritability models, but that it can be substantially improved by incorporating the SNP weightings and MAF scaling used by the LDAK model. Estimates of confounding bias and functional enrichments from the resulting BLD-LDAK model are close to those from the Baseline LD model, while its estimates of SNP heritability are between those of the Baseline LD, GCTA-LDMS-I and LDAK models. Our results support the conclusion of Gazal et al. 20 , that estimates of functional enrichments from the Baseline LD model are more accurate than those from the LDAK+1Fun model. They provide partial support for Evans et al. 8 , who argued that the GCTA-LDMS-I model produces more accurate estimates of SNP heritability. Although we found that the GCTA-LDMS-I model performs well compared with the other existing models, our analysis indicates that the BLD-LDAK model should now be preferred. Our results support our previous finding 4 that the LDAK model is more realistic than the GCTA model, but not that estimates of functional enrichments from the LDAK+24Fun model should be preferred to those from the Baseline and Baseline LD models. We discuss these three papers in detail in the Supplementary Note.
Recently, Hou et al. 21 proposed GRE, a method for estimating SNP heritability without specifying a heritability model. While we agree with the benefits of performing heritability analysis without requiring a heritability model, this is not feasible for most analyses. For example, GRE cannot be used on our public GWAS, because it requires individual-level data, nor can it be used on our (full) UKBb GWAS, because it requires that the number of samples be larger than the number of SNPs on the largest chromosome. For Extended Data Fig. 4, we run GRE using a subset of our UKBb GWAS (we restrict to 623,000 directly genotyped SNPs). Estimates of SNP heritability from the BLD-LDAK model are consistent (P > 0.05/14) with those from GRE for all 14 traits. However, this partly reflects that reducing the number of SNPs reduces the impact of the heritability model (the LDAK-Thin, GCTA-LDMS-R, GCTA-LDMS-I and Baseline LD models also produce consistent estimates for all 14 traits).
The BLD-LDAK+Alpha model is a generalization of the BLD-LDAK model. The two models have similar fit and produce similar estimates of SNP heritability, confounding bias and functional enrichments. For computational reasons, we generally recommend the BLD-LDAK model. However, the advantage of the BLD-LDAK+Alpha model is that it provides estimates of the selection-related parameter α. Our results broadly agree with those of Zeng et al. 18 ; using the software BayesS, they found significantly negative α for 23 out of 28 UK Biobank traits, while their average estimate of α was −0.38 (s.d. = 0.01). To our knowledge, SumHer is the only software to estimate α from summary statistics, and therefore can be viewed as a more computationally efficient alternative to BayesS (for a full comparison of the two methods and their results, see Supplementary Note).
Although its shortcomings have been well documented, the GCTA model continues to be widely used in statistical genetics. It remains the default model of both the GCTA and LDSC software, and is the model used by LD Hub 22 . More widely, the GCTA model is implicitly assumed by any penalized or Bayesian regression method that standardizes genotypes and then assigns the same penalty or prior distribution to each SNP, or in simulations when causal SNPs are picked at random and their standardized effect sizes are then drawn from the same distribution. Ideally, the GCTA model should be replaced by the BLD-LDAK or BLD-LDAK+Alpha models whenever it occurs. However, we recognize that for many methods, introducing a multiparameter heritability model would require substantial algorithmic changes and dramatically increase computational demands. When this is the case, we instead recommend using the LDAK-Thin model; this is a one-parameter model, so computational demands should not be affected, and it can be incorporated in any existing method simply by changing which predictors are included in the regression and how these are standardized.
We finish by highlighting three areas for future work. First, we have only considered common SNPs; with the increasing availability of sequence data, it will be necessary to examine whether the BLD-LDAK and BLD-LDAK+Alpha models remain better performing models when rare SNPs are included. Second, the ability to measure model fit for very large sample sizes means that we now have sufficient power to construct heritability models specific to either individual traits or groups of traits (Supplementary Table 2). Third, we have only considered the genomic annotations contained within the Baseline LD model. We expect it will be possible to find new annotations predictive of how heritability varies across the genome (that is, whose inclusion in the heritability model significantly increases logl SS ). Identifying these will both improve the performance of the heritability model and our understanding of the genetic architecture of complex traits.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-020-0600-y.

Methods
Let $h^2_j$ denote the heritability contributed (uniquely) by SNP j. Suppose we have summary statistics from a GWAS of n individuals; let $S_j$ denote the $\chi^2(1)$ test statistic from regressing the phenotype on SNP j. Suppose we also have access to an ancestrally matched reference panel, from which we can estimate $r^2_{jl}$, the squared correlation between SNPs j and l (genotypes coded additively).
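Linear heritability models. The heritability model describes how the expectation of $h^2_j$ varies across the genome. We first assume the model takes the form

$$E[h^2_j] = \sum_k a_{jk}\, \tau_k,$$

where the $a_{jk}$ are prespecified SNP annotations and the parameters $\tau_k$ are estimated in the analysis. Assuming no confounding bias 3,

$$E[S_j] = 1 + n \sum_l r^2_{jl}\, E[h^2_l], \qquad (1)$$

where the sum is over SNPs l near SNP j. The weights $1/u_j$ are computed from the $r^2_{jl}$ with $l \in \mathrm{Reg}_j$, where $\mathrm{Reg}_j$ indexes the regression SNPs (those used when regressing $S_j$ on $E[S_j]$) near SNP j. We propose the approximate log-likelihood

$$\mathrm{logl}_{SS} = \sum_j \frac{1}{u_j} \left( -\frac{1}{2} \log\left(2\pi\, E[S_j]\, S_j\right) - \frac{S_j}{2E[S_j]} \right).$$

The term within the large parentheses is the log-likelihood for a single SNP (the log density at $S_j$ of a gamma distribution with shape 1/2 and scale $2E[S_j]$); therefore, $\mathrm{logl}_{SS}$ computes a weighted sum of these, where the weights $1/u_j$ reflect local correlations. Extended Data Fig. 1 and Supplementary Note show that $\mathrm{logl}_{SS}$ is concordant with the exact likelihood computed from REML and can be used for likelihood ratio testing.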
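To make the linear model concrete, the following Python sketch computes expected heritabilities and expected test statistics for a toy example. It assumes the standard LDSC form without confounding, E[S_j] = 1 + n Σ_l r²_jl E[h²_l]; the function names and the two-SNP inputs are hypothetical, and a real analysis would sum only over nearby SNPs using a precomputed tagging file.

```python
def expected_heritability(annotations, tau):
    """E[h^2_j] = sum_k a_jk * tau_k for every SNP j."""
    return [sum(a * t for a, t in zip(row, tau)) for row in annotations]


def expected_stats(r2, annotations, tau, n):
    """E[S_j] = 1 + n * sum_l r2[j][l] * E[h^2_l], assuming no confounding bias."""
    h2 = expected_heritability(annotations, tau)
    return [1.0 + n * sum(r * h for r, h in zip(row, h2)) for row in r2]


# Toy example: two SNPs with squared correlation 0.5, one annotation equal to 1
# for both SNPs (so every SNP has the same expected heritability), n = 1000.
toy_r2 = [[1.0, 0.5], [0.5, 1.0]]
toy_annotations = [[1.0], [1.0]]
toy_tau = [0.001]
print(expected_stats(toy_r2, toy_annotations, toy_tau, n=1000))
```

Each SNP tags its own expected heritability plus half of its neighbour's, so both expected test statistics are approximately 1 + 1000 × 0.0015 = 2.5.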
Estimating parameters. LDSC estimates the parameters of the heritability model using weighted least-squares regression 3 , with regression weights $1/u_j \times 1/(2E[S_j]^2)$; weighting by $1/u_j$ allows for correlations between nearby SNPs (motivating our use of $1/u_j$ in the definition of $\mathrm{logl}_{SS}$), while weighting by $1/(2E[S_j]^2)$ allows for heteroscedasticity ($2E[S_j]^2$ is the variance of a gamma distribution with shape 1/2 and scale $2E[S_j]$).
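The variance quoted above follows from the standard moments of a gamma distribution with shape a and scale b (mean ab, variance ab²). As a brief worked check, with a = 1/2 and b = 2E[S_j]:

```latex
\begin{aligned}
\text{mean} &= ab = \tfrac{1}{2} \times 2E[S_j] = E[S_j], \\
\text{variance} &= ab^2 = \tfrac{1}{2} \times \left(2E[S_j]\right)^2 = 2E[S_j]^2 .
\end{aligned}
```

The mean confirms that the assumed distribution has the intended expectation, and the variance gives the inverse-variance weight $1/(2E[S_j]^2)$.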
When we proposed SumHer, we followed LDSC and estimated parameters using weighted least-squares regression 4 . However, now that we have an expression for the model likelihood, we can instead use maximum likelihood estimation. To identify the values that maximize $\mathrm{logl}_{SS}$, we use (multidimensional) Newton-Raphson 23 . Let θ denote the vector of parameters (the $\tau_k$ and, if allowing for confounding, either A or C). Starting from the null model ($\tau_k = 0$, A = 0, C = 1), we update θ iteratively until convergence using $\theta \rightarrow \theta - E^{-1} D$, where the vector D and matrix E contain, respectively, the first and second derivatives of $\mathrm{logl}_{SS}$ evaluated using the current parameter values (the required derivatives can be computed using the chain rule; for example, $\partial \mathrm{logl}_{SS} / \partial \theta_k = \partial \mathrm{logl}_{SS} / \partial E[S_j] \times \partial E[S_j] / \partial \theta_k$). Sometimes, a move causes a (nonnegligible) reduction in $\mathrm{logl}_{SS}$. When this happens, we cancel the move, then for the next iteration (only) update each parameter once individually using (one-dimensional) Newton-Raphson.
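The following Python sketch illustrates the Newton-Raphson idea for the simplest case, a one-parameter model with no confounding, where E[S_j] = 1 + τ x_j and x_j (an assumed precomputed quantity) equals n Σ_l r²_jl a_l. It is not the SumHer solver: all names are hypothetical, the data are simulated, and step halving is used as a simple safeguard in place of the fallback to individual parameter updates described above (which has no analogue with a single parameter).

```python
import math
import random


def logl_and_derivs(tau, x, stats, weights):
    """Return logl_SS and its first and second derivatives with respect to tau."""
    logl = d1 = d2 = 0.0
    for xj, s, w in zip(x, stats, weights):
        e = 1.0 + tau * xj  # E[S_j] under the one-parameter model
        logl += w * (-0.5 * math.log(2.0 * math.pi * e * s) - s / (2.0 * e))
        # Chain rule: d logl / d tau = (d logl / d E[S_j]) * (d E[S_j] / d tau)
        d1 += w * xj * (s - e) / (2.0 * e * e)
        d2 += w * xj * xj * (e - 2.0 * s) / (2.0 * e ** 3)
    return logl, d1, d2


def fit_tau(x, stats, weights, tol=1e-10, max_iter=100):
    """Maximize logl_SS over tau by Newton-Raphson, starting from the null model."""
    tau = 0.0
    logl, d1, d2 = logl_and_derivs(tau, x, stats, weights)
    for _iteration in range(max_iter):
        step = -d1 / d2
        # Halve the step until every E[S_j] is positive and logl does not drop.
        for _halving in range(50):
            cand = tau + step
            if min(1.0 + cand * xj for xj in x) > 0.0:
                new_logl, new_d1, new_d2 = logl_and_derivs(cand, x, stats, weights)
                if new_logl >= logl:
                    break
            step *= 0.5
        else:
            return tau  # no acceptable step was found
        converged = abs(cand - tau) < tol
        tau, logl, d1, d2 = cand, new_logl, new_d1, new_d2
        if converged:
            break
    return tau


# Simulated data: S_j equals E[S_j] times a chi-squared(1) draw, which is a
# gamma variable with shape 1/2 and scale 2 * E[S_j].
random.seed(1)
m, true_tau = 5000, 0.1
x = [20.0 * (j + 1) / m for j in range(m)]
stats = [(1.0 + true_tau * xj) * random.gauss(0.0, 1.0) ** 2 for xj in x]
weights = [1.0] * m  # 1/u_j, set to 1 as if SNPs were in linkage equilibrium
tau_hat = fit_tau(x, stats, weights)
print(tau_hat)
```

With several parameters, the scalar derivatives become the vector D and matrix E of the text, and the division is replaced by solving a linear system.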
Except where stated, our analyses use the maximum likelihood solver, regardless of the heritability model. In Extended Data Fig. 5, we compare the impact of changing the solver. This shows that for simple heritability models, the weighted least squares and maximum likelihood solver result in identical $\mathrm{logl}_{SS}$, but that for complex models the maximum likelihood solver often results in substantially higher $\mathrm{logl}_{SS}$.
Nonlinear heritability models. To date LDSC and SumHer have required that the heritability model be linear (this ensures that $E[S_j]$ can be expressed as a linear combination of the model parameters). However, SumHer can now accommodate a small number of non-linear parameters. We first crudely estimate the non-linear parameters using a grid search, selecting the values that result in highest $\mathrm{logl}_{SS}$. We then increase resolution and obtain standard deviations by fitting a Gaussian likelihood to the realizations of $\mathrm{logl}_{SS}$. The Supplementary Note demonstrates how we use this approach to estimate the selection-related parameter α.
Leave-one-chromosome-out cross-validation of test statistics. For each SNP we compute $E[S_j]$ in equation (1) using parameter estimates obtained from the other 21 chromosomes. In Table 1 we report ρ, a weighted correlation between predicted and observed test statistics (across regression SNPs, with weights $1/u_j$).
We estimated the standard deviation of ρ using block jackknifing with 200 blocks 3,24 . We consider it appropriate to include the weights $1/u_j$, as otherwise ρ will overweight high-LD regions; however, Supplementary Table 1 shows that the ranking of models is the same if we instead compare unweighted correlations.
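A weighted correlation and its block-jackknife standard deviation can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the code used for Table 1: the function names are hypothetical, the weights stand in for 1/u_j, and the blocks are taken to be contiguous and of near-equal size.

```python
import math


def weighted_corr(pred, obs, w):
    """Weighted Pearson correlation between predicted and observed statistics."""
    total = sum(w)
    mean_p = sum(wi * p for wi, p in zip(w, pred)) / total
    mean_o = sum(wi * o for wi, o in zip(w, obs)) / total
    cov = sum(wi * (p - mean_p) * (o - mean_o) for wi, p, o in zip(w, pred, obs))
    var_p = sum(wi * (p - mean_p) ** 2 for wi, p in zip(w, pred))
    var_o = sum(wi * (o - mean_o) ** 2 for wi, o in zip(w, obs))
    return cov / math.sqrt(var_p * var_o)


def drop_block(values, lo, hi):
    """Return a copy of the list with positions lo..hi-1 removed."""
    return values[:lo] + values[hi:]


def block_jackknife_sd(pred, obs, w, n_blocks=200):
    """Standard deviation of the weighted correlation from leave-one-block-out estimates."""
    m = len(pred)
    estimates = []
    for b in range(n_blocks):
        lo, hi = b * m // n_blocks, (b + 1) * m // n_blocks
        estimates.append(
            weighted_corr(drop_block(pred, lo, hi), drop_block(obs, lo, hi), drop_block(w, lo, hi))
        )
    mean = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return math.sqrt(variance)


# Toy usage with three SNPs and equal weights (values made up for illustration).
print(weighted_corr([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], [1.0, 1.0, 1.0]))
```

Setting all weights to 1 recovers the unweighted correlation, which is the alternative comparison mentioned in the text.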

GWAS.
We accessed UK Biobank 13,14 data via Project 21432. In total, we identified 20 phenotypes that were recorded for the majority of individuals: the 14 we retained were body mass index (data field 21001), forced vital capacity (3062), height (50), impedance (23106), neuroticism score (20127), pulse rate (102), reaction time (20023), systolic blood pressure (4080), college education (6138), ever smoked (20160), hypertension (20002), snorer (1210), difficulty falling asleep (1200) and preference for evenings (1180); the six we discarded were asthma, wears glasses, handedness, any mouth problem, basal metabolic rate and diastolic blood pressure (each either had estimated heritability less than 0.1 or was highly correlated with one of the retained phenotypes). The imputed dataset contains 487k individuals recorded for 93 M SNPs. However, after quality control, which included filtering individuals based on ancestry and relatedness (for the latter, we ensured that no pair remained with allelic correlation >0.02), and excluding SNPs with MAF < 0.01, info score < 0.99 or within the major histocompatibility complex (Chr6: 25-34 Mb), only 130,080 individuals and 4,725,151 SNPs remained 4,5,25 . As we have individual-level data, we could use our previously published protocols to confirm that confounding due to residual population structure, relatedness or genotyping errors was slight 1 . For the association analysis, we tested each SNP using linear regression (regardless of whether the phenotype was continuous, categorical or binary), having first regressed the phenotype on 13 covariates: age (data field 21022), sex (31), Townsend Deprivation Index (189) and ten principal components. For more details, see Supplementary Note and Extended Data Fig. 6.
The 17 public GWAS are coronary artery disease 26 , Crohn's disease 27 , ever smoked 28 , inflammatory bowel disease 27 , rheumatoid arthritis 29 , schizophrenia 30 , type 2 diabetes 31 , bone mineral density 32 , body mass index 33 , depressive symptoms 34 , height 35 , menarche age 36 , menopause age 37 , neuroticism 34 , subjective well-being 34 , waist/hip ratio 38 and years of education 39 . These are a subset of the 24 GWAS that we considered previously 4 ; we excluded the remaining seven GWAS as the authors of LDSC 9,40 recommend only using traits with a heritability Z-score above seven. For these GWAS we have to rely on the quality control choices of the original authors (Supplementary Table 10), which are generally less strict than ours, and without access to individual-level data, we could not test for confounding due to population structure, relatedness or genotyping errors.

Software settings.
When running an analysis using LDSC or SumHer it is necessary to choose the heritability and confounding models, provide a reference panel, select the regression and heritability SNPs and, if estimating enrichments, specify the expected proportion of SNP heritability contributed by each category (for an explanation of each option, see Supplementary Table 11). We describe the different heritability models we consider below. For the other options, our main analysis follows the recommendations of LDSC 3,9 . When analyzing the UKBb GWAS, we assume there is no confounding bias (the exception is for Fig. 1b, when we allow for additive confounding bias, then report the estimate of 1 + A); when analyzing the public GWAS, we always allow for additive confounding bias. Our reference panel is the 1000 Genomes Project 16 [...] Table 1 shows that the ranking of heritability models is the same if we construct a reference panel from the UKBb data (instead of the 1000 Genomes Project data), if we reduce the reference panel to the 4.7 M SNPs in our UKBb GWAS or if we reduce the regression SNPs to a subset in approximate linkage equilibrium. Supplementary Table 12 [...]
The one-parameter LDAK model 1,4 assumes

$$E[h^2_j] = w_j\, p_j^{0.75}\, \tau_1,$$

where $w_j$ is the LDAK weighting of SNP j, $f_j$ is its MAF and $p_j = f_j(1 - f_j)$. We recommend that the weightings be only computed over high-quality SNPs 1,6 (so low- and moderate-quality SNPs automatically get $w_j = 0$). We do not have SNP info scores for the 1000 Genomes Project reference panel, so when computing weightings, we restrict to the 4.7 M SNPs in our UKBb GWAS (that is, we assume that SNPs that are well genotyped in the UK Biobank are well genotyped in the 1000 Genomes Project).
The novel 66-parameter BLD-LDAK and 67-parameter BLD-LDAK+Alpha models both take the form

$$E[h^2_j] = p_j^{1+\alpha} \left( \tau_1 + w_j\, \tau_2 + \sum_{k=1}^{64} c_{jk}\, \tau_{2+k} \right),$$

where the $c_{jk}$ are the 64 LD-related and functional annotations from the Baseline LD model 10 ; the BLD-LDAK model fixes α = −0.25, while the BLD-LDAK+Alpha estimates α from the data. To construct the BLD-LDAK model we first added the LDAK weighting to the Baseline LD model (this increased average $\mathrm{logl}_{SS}$ by 11), then scaled all annotations by $p_j^{0.75}$ (this increased average $\mathrm{logl}_{SS}$ by a further 50). At this point we noted that the ten MAF indicators had limited value (excluding them reduced average $\mathrm{logl}_{SS}$ by only 8) so we removed them. We were unable to improve the model further by adding features from the GCTA-LDMS-R and GCTA-LDMS-I models (for example, if we incorporated the four LD or the 20 MAF-LD bins from the GCTA-LDMS-I model, this increased the number of parameters by 3 and 19, respectively, but increased average $\mathrm{logl}_{SS}$ by only 2 and 13). For more details, see Supplementary Table 16.
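Estimating α from the data combines a grid search with a Gaussian fit to the resulting log-likelihoods, as outlined under "Nonlinear heritability models". The Python sketch below shows one simple way this could work and is not the SumHer procedure: it assumes an evenly spaced grid from −1 to 0.5 in steps of 0.05 (31 values, an assumption), replaces the model fits with a made-up quadratic log-likelihood, and fits a parabola through the best grid point and its two neighbours. Because the log of a Gaussian likelihood is quadratic in α with curvature −1/σ², the curvature yields a standard deviation.

```python
import math


def refine_alpha(alphas, logls):
    """Estimate the peak and standard deviation from a parabola through three grid points."""
    best = max(range(len(alphas)), key=lambda k: logls[k])
    if best == 0 or best == len(alphas) - 1:
        raise ValueError("maximum lies at the edge of the grid; widen the grid")
    step = alphas[best + 1] - alphas[best]  # assumes an evenly spaced grid
    slope = (logls[best + 1] - logls[best - 1]) / (2.0 * step)
    curvature = (logls[best + 1] - 2.0 * logls[best] + logls[best - 1]) / step ** 2
    alpha_hat = alphas[best] - slope / curvature
    sd = math.sqrt(-1.0 / curvature)
    return alpha_hat, sd


def toy_logl(alpha, peak=-0.32, sd=0.04, offset=-5000.0):
    """Stand-in for the maximized logl_SS at a fixed alpha (hypothetical values)."""
    return offset - (alpha - peak) ** 2 / (2.0 * sd ** 2)


grid = [-1.0 + 0.05 * k for k in range(31)]
grid_logls = [toy_logl(a) for a in grid]
alpha_hat, alpha_sd = refine_alpha(grid, grid_logls)
print(alpha_hat, alpha_sd)
```

In a real analysis each grid value would require fitting the BLD-LDAK model with the annotations rescaled for that α, so the grid size directly determines the computational cost.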
The seven-parameter BLD-LDAK-Lite and eight-parameter BLD-LDAK-Lite+Alpha models are reduced versions of the BLD-LDAK and BLD-LDAK+Alpha models, respectively, obtained by removing two of the nine continuous annotations and all 57 binary annotations (Supplementary Table 8 explains how we used forward stepwise selection to decide which of the continuous annotations to retain). Extended Data Fig. 3 shows that estimates of SNP heritability and confounding bias from the BLD-LDAK-Lite model are consistent with those from the BLD-LDAK model, and that estimates of α from the BLD-LDAK-Lite+Alpha model are consistent with those from the BLD-LDAK+Alpha model. To allow α to vary based on functional annotations, we use a 16-parameter model obtained by concatenating two versions of the BLD-LDAK-Lite+Alpha model (one where only SNPs within the category contribute heritability, and one where only SNPs outside the category contribute).
When computing the LDAK weightings, the first step is to thin SNPs so that no pair remains within 100 kilobases with $r^2_{jl} > 0.98$ (excluding duplicate SNPs substantially improves the efficiency of the solver used to compute the weightings 1 ). The LDAK-Thin model assumes $E[h^2_j] = I_j\, p_j^{0.75}\, \tau_1$, where $I_j$ indicates whether SNP j remains after the thinning. To implement the LDAK-Thin model within an existing penalized or Bayesian regression method requires two changes: first, thin the SNPs (for the UKBb data, this reduced the number from 4.7 M to 1.4 M); second, center and scale the genotypes so that SNP j has variance $p_j^{0.75}$.
Computational demands. To run SumHer with a linear heritability model involves two steps 4 . The first is to compute the tagging file; this takes at most 1 d on a single CPU and requires 20 Gb memory. The second is to estimate the parameters of the heritability model; this takes at most 2 h and requires 20 Gb memory. To run SumHer with a non-linear heritability model, it is necessary to repeat the above process multiple times (for example, Supplementary Note explains how we fit the BLD-LDAK+Alpha model by calculating tagging files and estimating parameters for 31 versions of the BLD-LDAK model). The LDAK website (www.ldak.org) provides precomputed tagging files for the BLD-LDAK and BLD-LDAK-Lite+Alpha models, suitable for analyzing European, South Asian, East Asian or African populations.

Extended Data Fig. 6 | Reduced quality control for UKBb GWAS. For our main analysis of the UKBb GWAS, we first identified individuals with values for all 14 phenotypes, then filtered so that no pair remained with allelic correlation >0.02 (Supplementary Note 6). As a secondary analysis, we instead identified individuals with values for any of the 14 phenotypes, then filtered so that no pair remained with allelic correlation >0.03125. This increased the number of individuals from 130,080 to 246,655, with on average 236k phenotypic values per GWAS (range 201k to 247k). The first plot shows that increasing the sample size does not change the ranking of models based on the Akaike Information Criterion. The remaining three plots show that it does not significantly change estimates of SNP heritability or average functional enrichments from the BLD-LDAK model, nor estimates of the selection-related parameter α from the BLD-LDAK+Alpha model (horizontal and vertical segments indicate 95% confidence intervals; numbers indicate how many of the pairs of estimates are inconsistent either nominally or after Bonferroni correction).
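Returning to the LDAK-Thin model described in Methods, its second implementation step (centering and scaling genotypes so that SNP j has variance p_j^0.75) can be sketched in Python as follows. This is a minimal illustration with hypothetical names: it assumes genotypes coded 0/1/2 with no missing values, estimates the allele frequency from the sample, and does not perform the preceding thinning step.

```python
import math


def ldak_thin_scale(genotypes, power=0.75):
    """Center one SNP's genotypes and rescale them to variance [f(1 - f)] ** power."""
    n = len(genotypes)
    mean = sum(genotypes) / n
    variance = sum((g - mean) ** 2 for g in genotypes) / n
    if variance == 0.0:
        raise ValueError("monomorphic SNP cannot be scaled")
    f = mean / 2.0                     # sample allele frequency
    target = (f * (1.0 - f)) ** power  # p_j ** 0.75 when power is 0.75
    scale = math.sqrt(target / variance)
    return [(g - mean) * scale for g in genotypes]


# Toy genotypes for one SNP in ten individuals (made up for illustration).
toy_genotypes = [0, 1, 2, 1, 0, 0, 1, 2, 0, 1]
scaled = ldak_thin_scale(toy_genotypes)
print(sum(v * v for v in scaled) / len(scaled))  # variance after scaling
```

Because f(1 − f) is unchanged when f is replaced by 1 − f, it does not matter whether the counted allele is the minor one.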

Software and code

Data collection
No data were collected (we only used previously-collected data, from the UK Biobank and 1000 Genomes Project)

Data analysis
The majority of our analysis used LDAK version 5 (www.ldak.org). We additionally used LDSC version 1.0.0 (https://github.com/bulik/ldsc) and PLINK2 (https://www.cog-genomics.org/plink/2.0).

Data
We performed the UKBb GWAS using data applied for and downloaded via the UK Biobank website (www.ukbiobank.ac.uk). We obtained summary statistics for the public GWAS from the websites of the corresponding studies. We downloaded the 1000 Genomes Project data from the LDSC website (www.github.com/bulik/ldsc).