Multi-Ancestry Meta-Analysis yields novel genetic discoveries and ancestry-specific associations

We present a new method, Multi-Ancestry Meta-Analysis (MAMA), which combines genome-wide association study (GWAS) summary statistics from multiple populations to produce new summary statistics for each population, identifying novel loci that would not have been discovered in either set of GWAS summary statistics alone. In simulations, MAMA increases power with less bias and generally lower type-1 error rate than other multi-ancestry meta-analysis approaches. We apply MAMA to 23 phenotypes in East-Asian- and European-ancestry populations and find substantial gains in power. In an independent sample, novel genetic discoveries from MAMA replicate strongly.


INTRODUCTION
The past decade has seen the discovery of hundreds of thousands of credible genetic associations for complex traits and diseases 1 . Many of these discoveries were identified by meta-analyzing genome-wide association studies (GWAS) 2-4 conducted in multiple cohorts, boosting statistical power relative to GWAS in any one cohort. GWAS summary statistics are now facilitating progress in many areas, including novel drug development 5,6 and disease prediction for potential clinical application 7 .
Unfortunately, GWAS research to date has overwhelmingly been conducted in European-ancestry samples. Consequently, GWAS estimates for European populations are substantially more precise than those for other populations, potentially generating unequal gains from the scientific advances. Clearly, correcting these imbalances requires intensifying data-collection efforts in non-European populations 8,9 , and a number of promising efforts are underway [10][11][12][13][14] . However, it will take years before levels of precision are reached comparable to those currently available for European-ancestry studies. Even then, for many traits, further increases in precision would almost certainly still be valuable.
To complement the current efforts to improve the precision of GWAS effect-size estimates for non-European-ancestry populations, we have developed a new method, Multi-Ancestry Meta-Analysis (MAMA), that efficiently shares information across populations via cross-ancestry GWAS meta-analysis. Currently, researchers who wish to meta-analyze summary statistics from different-ancestry cohorts must confront two wellknown obstacles. First, the populations may differ in allele frequencies and linkage disequilibrium (LD) patterns. Second, the effects of SNPs may not be identical across the two populations. Both obstacles induce heterogeneity in marginal (i.e., GWAS) effect sizes between populations. Consequently, a naïve meta-analysis of summary statistics that ignores these obstacles would generally result in biased estimates for both populations.
In different contexts, the two obstacles above have each been addressed separately in previously developed methods. For example, POPCORN directly models allelefrequency and LD differences to estimate genetic correlation between populations 15 .
MTAG accounts for heterogeneity of marginal effect sizes across GWAS summary statistics in a one-population, multi-trait setting 16 . MAMA adapts and combines strategies from both methods, accounting for differences between populations in conditional effects, allele frequencies, and LD.
Compared to these, MAMA has a unique combination of attractive features: (i) it only requires GWAS summary statistics and a reference panel for each population; (ii) under plausible assumptions discussed below, it is the best linear unbiased estimator, but also appears robust under alternative assumptions; (iii) it has a linear, closed-form solution, ensuring fast computation time.
For MAMA's intuition, first consider an inverse-variance-weighted (IVW) metaanalysis 18 applied to GWAS summary statistics with no heterogeneity (e.g., each GWAS sample is from the same population). In such a case, IVW is unbiased and efficient.
However, if there is heterogeneity in the marginal effect sizes between populations, then the estimate from the alternate ancestry needs to be adjusted to avoid bias, and the relative weight applied to the alternate ancestry's estimate must be reduced to be efficient 16 . Before meta-analyzing, MAMA estimates the heterogeneity across ancestries for each SNP and optimally adjusts the estimates and relative weights on the alternate ancestries.
To evaluate MAMA, we apply it to GWAS summary statistics for a wide range of anthropometric, health, and behavioral phenotypes in populations with European and East-Asian ancestries. MAMA identifies many new loci that were not found using either of the single-ancestry GWAS results. For some phenotypes, we calculate that MAMA

The MAMA Framework
MAMA is an extension of the related method, MTAG 16 , adapting its assumptions to the multi-ancestry context and generalizing it to allow for differences across ancestries.
The We assessed three metrics of performance: bias, mean 4 statistic, and type-1 error rate. To evaluate bias, we orient each SNP such that the true marginal effect is positive and report the mean difference between the estimated effect size and the true marginal effect. For the type-1 error rate, we report the fraction of SNPs with null marginal effects that have a P value of less than 0.05. MANTRA, which is a Bayesian method, does not report a standard error, so we cannot calculate a standard mean 4 statistic or type-1 error rate. We therefore use the posterior standard deviation as an imperfect proxy for the standard error to calculate these performance metrics. Because the bias, mean 4 statistic, and type-1 error rate are a function of the LD correlation,   populations. The type-1 error rate is controlled if it is less than 0.05. MANTRA is the only method that has an uncontrolled type-1 error rate, but this just means that the posterior standard deviation is not a good proxy for the standard error of MANTRA estimates. Both MAMA and MTAG have a type-1 error rate less than 0.05. That is because both methods include a stratification correction (even though there is no stratification in the simulation), which slightly inflates the standard errors. Figure 3b reports the type-1 error rate of SNPs that were null in the EAS population but non-null in the AFR and EUR populations, a violation of the homogeneous-assumption. For these SNPs, MAMA has an uncontrolled type-1 error rate, especially for SNPs with a high LD correlation. However, MANTRA, RE2, and FE each have even higher type-1 error rates. MR-MEGA has higher type-1 error rates than MAMA in all but the highest LD correlation bin. In this scenario, MTAG has lower type-1 error rate across most of the LD correlation spectrum.
In summary, although some methods have greater statistical power (as measured by the mean 4 statistic) than MAMA, MAMA generally is the least biased across the LD correlation spectrum. Furthermore, although all methods considered except MANTRA control the type-1 error rate when the homogeneous-assumption holds, MAMA is more robust to a plausible violation of this assumption than all other methods except MTAG.

Application in Real Data
We applied MAMA to GWAS summary statistics for 23 phenotypes that were available from the China Kadoorie Biobank (CKB) and/or the Biobank Japan (BBJ) and for which we had access to comparable phenotypes in the UK Biobank (UKB To address the concern that our assumption of perfect genetic correlation could lead to false discoveries, we validate the robustness of our results to alternate assumptions. When we assume a cross-ancestry genetic correlation of 0.9, we find that 179 out of 201 lead SNPs for BMI and 80 of 105 lead SNPs for educational attainment remain genome-wide significant. Out of the 56 novel lead SNPs for BMI and 6 for EA, 48 and 3 remain genome-wide significant, respectively.
We also address concerns about false discoveries with a replication analysis. In  Third, the gains from MAMA are not uniform genome-wide; increases in power will be larger for SNPs that have more similar LD structure between populations 26 . As a result, MAMA is more likely to miss genetic signal that is concentrated in regions of high LD variability across populations.
Finally, although a polygenic predictor based on MAMA-generated weights for population j should generally outperform a polygenic predictor derived from population j's GWAS summary statistics alone, MAMA is not designed as a prediction tool. The unbiasedness of MAMA estimates is a strength for gene discovery, but it is a limitation for prediction. Methods that generate biased estimates but less sampling varianceincluding standard, inverse-variance-weighted meta-analysis of cross-ancestry GWAS summary statistics, not taking into account LD and other population differences-can outperform MAMA in prediction accuracy. Exploring optimal ways to use GWAS summary statistics from European-ancestry populations to improve polygenic prediction in non-European-ancestry populations is an active and important area of research 27,28 .
We have shown that MAMA can be a useful tool for multi-ancestry genetics research. In our simulations, MAMA did not always yield the greatest gains in reported statistical power, but we consistently found that MAMA estimates were less biased than other methods and more robust to violations of key assumptions. We anticipate that MAMA estimates will prove especially valuable when it is important to have approximately unbiased estimates of SNP associations. For example, many methods for fine-mapping 29 , partitioning heritability 30,31 , and biological annotation 32,33 depend on reliable estimates of effect sizes.

DATA AVAILABILITY
For each phenotype that we analyze, we report GWAS and MAMA summary statistics.
With the exception of the summary statistics for educational attainment (which include data from 23andMe), complete summary statistics for both the EAS and EUR samples will be available for download at the SSGAC website (http://www.thessgac.org/data) upon publication. SNP-level summary statistics from analyses based entirely or in part on 23andMe data can only be reported for up to 10,000 SNPs. Therefore, for this phenotype, we provide summary statistics for only the genome-wide-significant SNPs from that analysis. In addition, we provide complete summary statistics for an analysis that omits 23andMe. The full GWAS summary statistics for the 23andMe discovery data set will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data.

ACKNOWLEDGEMENTS:
This

Figure 1. Bias for cross-ancestry meta-analysis methods by LD correlation bin.
Note: Bias estimates in EAS defined as the difference between marginal SNP effects and meta-analyzed SNP effects averaged within LD correlation bins. All SNPs are oriented such that their marginal effects are positive. 95% confidence intervals are represented by vertical bars. In the three-population case, rLD is calculated as the average rLD value between the two population pairs that contain the target population (i.e., EAS-EUR and EAS-AFR for EAS). SNPs are binned into 5 groups of equal width between zero and one, and the bias is reported for SNPs within a bin.

Figure 2. Mean statistic for cross-ancestry meta-analysis methods by LD correlation bin.
Note: Mean estimates in EAS defined as the squared ratio of the meta-analyzed SNP effect over the SNP's standard error averaged within LD correlation bins. RE2 mean uses the reported RE2 P value and evaluates it on the inverse distribution with one degree of freedom. In the three-population case, rLD is calculated as the average rLD value between the two population pairs that contain the target population (i.e., EAS-EUR and EAS-AFR for EAS). SNPs are binned into 5 groups of equal width between zero and one, and the mean estimate is reported for SNPs within a bin. Figure 3. Type-1 error rate for cross-ancestry meta-analysis methods by LD correlation bin. Note: Type-1 error rate in (a) EAS for SNPs that are null in all three populations and (b) EAS for SNPs that are null in the EAS population but nonnull in the other populations. Type-1 error rate is defined as the fraction of null SNPs whose P value is less than 0.05. rLD is calculated as the average rLD value between the two population pairs that contain the target population (i.e., EAS-EUR and EAS-AFR for EAS). SNPs are binned into 5 groups of equal width between zero and one, and the type-1 error rate is reported for SNPs within a bin. Note: Type-1 error rate in (a) EAS for SNPs that are null in all three populations and (b) EAS for SNPs that are null in the EAS population but nonnull in the other populations. Type-1 error rate is defined as the fraction of null SNPs whose P value is less than 0.05. rLD is calculated as the average rLD value between the two population pairs that contain the target population (i.e., EAS-EUR and EAS-AFR for EAS). SNPs are binned into 5 groups of equal width between zero and one, and the type-1 error rate is reported for SNPs within a bin. Note: BMI for (a) EAS GWAS and (b) EAS MAMA. The x-axis is chromosomal position, and the y-axis is the P value on a -log10 scale (note the y-axes scale logarithmically). The dashed line marks the threshold for genome-wide significance (P = 5 × 10 -8 ). For the GWAS Manhattan plots, lead SNPs (i.e., approximately independent SNPs surpassing the genome-wide-significance threshold) are marked with a red ×. For the MAMA Manhattan plots, lead SNPs are binned into one of three mutually exclusive categories: matching a GWAS lead SNP in either population (marked with a red ×), in LD with a GWAS lead SNP in either population (marked with a black •), or a novel SNP that is independent of any GWAS lead SNP in either population (marked with a yellow ▲). Details on lead SNP and novel SNP identification can be found in Online Methods. Note: Educational Attainment for (a) EAS GWAS and (b) EAS MAMA. The x-axis is chromosomal position and the y-axis is the P value on a -log10 scale (note the y-axes scale logarithmically). The dashed line marks the threshold for genome-wide significance (P = 5 × 10 -8 ). For the GWAS Manhattan plots, lead SNPs (i.e., approximately independent SNPs surpassing the genome-wide-significance threshold) are marked with a red ×. For the MAMA Manhattan plots, lead SNPs are binned into one of three mutually exclusive categories: matching a GWAS lead SNP in either population (marked with a red ×), in LD with a GWAS lead SNP in either population (marked with a black •), or a novel SNP that is independent of any GWAS lead SNP in either population (marked with a yellow ▲). Details on lead SNP and novel SNP identification can be found in Online Methods. Note: QQ plot and sign-test replication for BMI of (a) all lead SNPs in MAMA EAS and (b) all novel lead SNPs in MAMA EAS. The x-axis corresponds to the uniform distribution of P values expected under the null hypothesis, and the y-axis corresponds to the observed P values in the replication GWAS output. P values are reported on the -log10 scale. The 45-degree line is plotted for reference. The sign of each MAMA SNP is compared against an independent replication GWAS. SNPs whose sign matches the sign reported in the replication GWAS are marked with a dark blue ▲, and SNPs whose sign does not match the sign reported in the replication GWAS are marked with a light blue •. Sign test P values are drawn from the binomial distribution with success probability 1/2. Details on lead SNP and novel SNP identification can be found in Online Methods. Note: QQ plot and sign-test replication for Educational Attainment of (a) all lead SNPs in MAMA EAS and (b) all novel lead SNPs in MAMA EAS. The x-axis corresponds to the uniform distribution of P values expected under the null hypothesis, and the y-axis corresponds to the observed P values in the replication GWAS output. P values are reported on the -log10 scale. The 45-degree line is plotted for reference. The sign of each MAMA SNP is compared against an independent replication GWAS. SNPs whose sign matches the sign reported in the replication GWAS are marked with a dark blue ▲, and SNPs whose sign does not match the sign reported in the replication GWAS are marked with a light blue •. Sign test P values are drawn from the binomial distribution with success probability 1/2. Details on lead SNP and novel SNP identification can be found in Online Methods.

Online Methods
This article is accompanied by a Supplementary Note with further details.

Data Generating Procedure for Simulations
In our simulations, we generate GWAS summary statistics for hypothetical AFR, EAS, and EUR populations. For the infinitesimal model used to measure bias and the mean 4 statistic, we generate true marginal effect sizes with the following approach.
First, for the corresponding subsample of the 1000 Genomes Project data 22 (consisting of all individuals with a genotype missingness rate no greater than 2%), we estimate the LD between each pair of SNPs that have an allele frequency greater than 1% and a missingness rate no more than 2% within their subsample in every population. We fix the LD between a pair of SNPs to zero if they are farther than 1 centimorgan apart or if they are on different chromosomes. We then draw true conditional effect sizes for each standardized SNP in these samples from a standard normal distribution. We assume that the cross-ancestry genetic correlation is 1 for each pair of populations, which means that the conditional effect of each standardized SNP is the same in each population.
True marginal effect sizes for each population are calculated using equation (3). For computational reasons, we retain only every 100 th SNP for the subsequent steps of the simulation, resulting in 60,107 SNPs.
For the spike-and-slab model used to measure type-1 error rate, we generate true marginal effect sizes using the same procedure as above with an additional step at the end. Specifically, we take the 60,107 SNPs for which we have marginal effects and we randomly set 10% of the marginal effects to zero in all three populations. We then randomly set the marginal effect to zero in just the EAS population for a random 10% set imputation are described elsewhere [34][35][36] . We retrieved individual phenotypes from medical records and applied standard quality control criteria followed by inverse-rank normalization as described previously 37 . We restricted our analysis to unrelated individuals and held out a random sample of 5,000 individuals as a replication sample.

Construction of LD scores
LD scores used in these analyses were constructed using the 1000 Genomes Project Phase 3 samples. The LD scores were constructed using the -std-geno-ldsc units option, which assumes that the genotypes in the model are in allele-count units (see

MAMA settings in empirical applications
The MAMA analyses in this paper are conducted using the following settings. We assume that the phenotype has a genetic correlation of 1 between populations but that the heritability may differ between populations. We also assume that the model is in standardized units. This assumption implies that the conditional effect of a onestandard deviation change in genotype is independently and identically distributed across the genome. This assumption also implies that the expected magnitude of effect sizes (in allele count units) is larger for SNPs that are rarer. Finally, because there is little to no variation in sample size across SNPs within each population's summary statistics in our data, there is also little variation in the standard errors. Thus, the intercept and squared standard error term are highly collinear in the LD score regression step of MAMA. To address this issue, we fix the intercept to zero and freely estimate the coefficient on the squared standard errors.

GWAS-equivalent sample size
As one of the metrics of performance, MAMA reports the "GWAS-equivalent sample size." It is parallel to the identically named metric used by MTAG 16  is the mean 4 statistic of the GWAS summary statistics, and HIGJ is sample size of the GWAS. Because MAMA standard errors are implicitly corrected for population stratification using the LD score intercept, to make the mean 4 statistic comparable between the GWAS and MAMA summary statistics, the GWAS standard errors were corrected using the LD score intercept before calculating HIGJ 4 . One difference between the GWAS-equivalent sample size in MAMA relative to MTAG is that the amount of weight put on the alternate set of summary statistics varies by SNP in MAMA, whereas it is constant in MTAG. As a result, the GWAS-equivalent sample size does not represent the sample size needed to obtain a certain level of power for any individual SNP, but instead it represents how large a sample would be needed to obtain an average amount of power genome-wide. For MAMA, the SNP-level effective sample size will be larger when the LD correlation is higher between the pairs of populations (see Supplementary Materials Section 1.5.1).

Identifying lead SNPs
Using the discovery GWAS and MAMA results for each phenotype, we identify a set of genome-wide-significant lead SNPs. If there are no SNPs with a P value less than 5 ´ 10 -8 , then there are no genome-wide significant lead SNPs for that set of summary statistics. Otherwise, we use the following algorithm to identify lead SNPs.
(1) The SNP with the smallest P value is set aside as a potential lead SNP.
(2) All SNPs that are on the same chromosome as the SNP identified in (1) and that are correlated with the SNP at R 2 > 0.1 are removed from the summary statistics. (3) Steps (1) and (2) are repeated until no SNP remains with a P value less than 5 ´ 10 -8 . (4) Among the list of potential lead SNPs, the SNP with the lowest P value is added to the final list of lead SNPs. (5) Any potential lead SNP within 500 kbp of the lead SNP identified in (4) is removed. (6) Steps (4) and (5) are repeated until no SNPs remain among the potential lead SNPs.
These steps are executed using two rounds of Plink's clumping algorithm. R 2 values were calculated using the 1000 Genomes Project reference panel sample corresponding to the population of the GWAS or MAMA summary statistics.

Identifying Novel SNPs
To assess whether the lead SNPs identified by MAMA are novel relative to the input GWAS summary statistics, we first remove any of the SNPs from the MAMA lead-SNP list that have the same rsID as any genome-wide-significant SNPs from the GWAS summary statistics used to generate the MAMA summary statistics. We then combine the reduced set of MAMA lead SNPs with the lead SNPs from the input GWASs, adjusting the P values of the lead SNPs from the GWASs so they are smaller than the minimum P value in the MAMA lead-SNP list. We then conduct the two-stage clumping algorithm described in the Methods section "Identifying lead SNPs" to this combined set of SNPs. Any of the SNPs that are retained from the reduced set of MAMA lead SNPs after clumping are considered "novel lead SNPs." Note that "novel" in this case does not imply that the SNPs are novel relative to the literature but rather that the SNPs are novel relative to the summary statistics used in the analysis.

Replication
To validate the lead SNPs and the novel lead SNPs identified by MAMA, we look up these SNPs in GWAS in an independent sample from the same population. We then compare the sign of the novel lead SNPs from MAMA to the signs of the GWAS in the replication sample corresponding to the same population and phenotype. Under the null hypothesis that the novel lead SNPs from MAMA are null in the replication sample, the number of times the MAMA estimates have a concordant sign with the replication GWAS estimates will have a Binomial(M,0.5) distribution, where M is the number of novel lead SNPs. We perform a one-sided test measuring whether the sign concordance is higher than expected given the null hypothesis. We did not perform a replication analysis for Physical Activity because no lead SNPs were found, nor for Schizophrenia because we did not have access to a replication sample.