## ABSTRACT

We introduce Multi-Trait Analysis of GWAS (MTAG), a method for the joint analysis of summary statistics from GWASs of different traits, possibly from overlapping samples. We demonstrate MTAG using data on depressive symptoms ($N_{\text{eff}}$ = 354,862), neuroticism ($N$ = 168,105), and subjective well-being ($N$ = 388,538). Compared to 32, 9, and 13 genome-wide significant loci in the single-trait GWASs (most of which are novel), MTAG increases the number of loci to 74, 66, and 60, respectively. Moreover, the association statistics from MTAG yield more informative bioinformatics analyses and, consistent with theoretical calculations, improve prediction accuracy by approximately 25%.

## INTRODUCTION

The standard approach in genetic association studies is to meta-analyze association statistics from cohort-level genome-wide association analyses of a single trait. Such single-trait analyses do not exploit information that may be available from genome-wide association studies (GWAS) of other, correlated traits. In this paper, we develop a method, Multi-Trait Analysis of GWAS (MTAG), which jointly analyzes GWAS results for several related traits, thereby boosting statistical power to detect genetic associations for *each* of the traits.

MTAG is a generalization of standard, inverse-variance-weighted meta-analysis. The estimator takes summary statistics from at least two single-trait GWASs and, after jointly analyzing these, outputs trait-specific association statistics. The set of results for each trait represents an optimal combination of the information from the single-trait summary statistics. The resulting *P* values can be interpreted like GWAS *P* values and used in the usual ways, for example, to prioritize SNPs for subsequent analyses such as biological annotation.

MTAG has several features that make it useful in many practical applications where existing multivariate methods may be infeasible. First, it can be applied without access to individual-level data, requiring only GWAS summary statistics from studies of quantitative or binary traits. Second, the summary statistics need not come from independent discovery samples: MTAG uses bivariate linkage disequilibrium (LD) score regression^{1} to account for (possibly unknown) sample overlap between the GWAS results for different traits. Third, running MTAG does not require powerful computational resources.

We examine the performance of MTAG in a range of simulation scenarios, and then we apply it to GWAS meta-analysis results for three traits: depressive symptoms (DEP, $N_{\text{eff}}$ = 354,862), neuroticism (NEUR, $N$ = 168,105), and subjective well-being (SWB, $N$ = 388,538). We compare the results from MTAG to the results from standard GWAS along three dimensions: the power to detect associations with individual single-nucleotide polymorphisms (SNPs), the predictive power of a polygenic score constructed from all available SNPs, and the findings from applying the bioinformatics tool DEPICT^{2} to the results. We show theoretically that the gains in statistical power from these MTAG analyses are equivalent to increasing the sample size of the original, single-trait GWASs by 37% (DEP), 96% (NEUR), and 85% (SWB). The corresponding gains in predictive power of a polygenic score are consistent with these theoretical expectations.

The standard errors of MTAG are calculated under the assumption that the correlation across traits in the effect size of a SNP is the same across all SNPs. In our simulations, we examine some scenarios in which—in violation of this assumption—a fraction of SNPs are related to one trait but not another. In these scenarios, MTAG generates anti-conservative standard errors and thus inflated test statistics, especially for such SNPs. Therefore, when applying MTAG in settings where the correlation in effect sizes across traits is expected to be systematically different for some SNPs than for others, we advise caution in interpreting associations with individual SNPs. Even in such settings, however, MTAG generates consistent effect-size estimates and hence is expected to produce gains in prediction accuracy.

The closest existing method we are aware of is a Bayesian analog of MTAG derived for the special case of no sample overlap^{3}. Multivariate methods have also been developed to enable identification of SNPs associated with some but not all of the traits^{4–6} and to discriminate between colocalization and different models of pleiotropy^{7–10}. Yet another family of methods tests the joint hypothesis of a zero effect on every trait against the alternative of association with at least one of the traits, sometimes by combining multiple traits into a single variable prior to analysis^{11–13}. A recent, independent paper combines DEP, NEUR, and SWB into GWAS analyses using a subset of the data we use here^{14}.

## RESULTS

### Overview of MTAG

The key idea underlying MTAG is that when GWAS estimates from two different traits are correlated, the effect estimates for each trait can be improved by appropriately incorporating information contained in the GWAS estimates for the *other* traits.

Correlation between GWAS estimates can arise for two reasons. First, the traits may be genetically correlated, in which case the *true effects* of the single-nucleotide polymorphisms (SNPs) are correlated across traits. We standardize all traits and genotypes to have mean zero and variance one. For SNP $j$, we denote the vector of marginal (i.e., not controlling for other SNPs) true effects on each of the $T$ traits by $\boldsymbol{\beta}_j$. We assume that these true effects follow an infinitesimal model and treat them as random effects: $\boldsymbol{\beta}_j \sim N(\mathbf{0}, \boldsymbol{\Omega})$, where $\boldsymbol{\Omega}$ is the genetic variance-covariance matrix of the effect sizes across traits. Correlation in the true effects means that the off-diagonal elements of $\boldsymbol{\Omega}$ are nonzero. The matrix $\boldsymbol{\Omega}$ does not depend on the SNP $j$, meaning that it is the variance-covariance matrix calculated across all SNPs.

Second, the *estimation error* of the SNPs' effects may be correlated across traits. Such correlation will occur if (a) the phenotypic correlations are nonzero and there is sample overlap across traits, or if (b) biases in the SNP-effect estimates (e.g., population stratification or cryptic relatedness) have correlated effects across traits. We denote the vector of estimates of SNP $j$'s effects on the traits by $\hat{\boldsymbol{\beta}}_j$. As long as the GWAS of each trait was conducted in a large sample, its distribution can be well approximated as $\hat{\boldsymbol{\beta}}_j \mid \boldsymbol{\beta}_j \sim N(\boldsymbol{\beta}_j, \boldsymbol{\Sigma}_j)$, where $\boldsymbol{\Sigma}_j$ is the variance-covariance matrix of the estimation error. The off-diagonal elements of $\boldsymbol{\Sigma}_j$ are nonzero whenever the estimation errors are correlated.
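A minimal simulation of this two-level model can make the two sources of correlation concrete. The sketch below assumes, for simplicity, that every SNP shares one error covariance $\boldsymbol{\Sigma}$, and the values of $\boldsymbol{\Omega}$ and $\boldsymbol{\Sigma}$ are hypothetical. Because the true effects and the estimation error are independent, the estimates are marginally distributed as $\hat{\boldsymbol{\beta}}_j \sim N(\mathbf{0}, \boldsymbol{\Omega} + \boldsymbol{\Sigma})$, so their empirical covariance should be close to that sum.

```python
import numpy as np


def simulate_gwas(omega, sigma, n_snps, seed=0):
    """Draw true effects beta_j ~ N(0, Omega) and GWAS estimates
    beta_hat_j | beta_j ~ N(beta_j, Sigma) for n_snps independent SNPs.

    Simplifying assumption: Sigma is the same for every SNP.
    """
    rng = np.random.default_rng(seed)
    n_traits = omega.shape[0]
    beta = rng.multivariate_normal(np.zeros(n_traits), omega, size=n_snps)
    error = rng.multivariate_normal(np.zeros(n_traits), sigma, size=n_snps)
    return beta, beta + error


# Hypothetical values: genetic correlation 0.7, error correlation 0.35 (e.g., from sample overlap)
omega = 1e-4 * np.array([[1.0, 0.7], [0.7, 1.0]])
sigma = 1e-3 * np.array([[1.0, 0.35], [0.35, 1.0]])

beta, beta_hat = simulate_gwas(omega, sigma, n_snps=50_000)
empirical = np.cov(beta_hat, rowvar=False)
print(empirical)       # should be close to the matrix below
print(omega + sigma)
```

Both off-diagonal contributions show up in the covariance of the estimates; MTAG's task is to attribute them correctly to $\boldsymbol{\Omega}$ and $\boldsymbol{\Sigma}_j$.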

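To derive the MTAG estimator of the effect of SNP $j$ on trait $t$, we obtain the marginal likelihood function^{15} $f(\hat{\boldsymbol{\beta}}_j \mid \beta_{j,t})$ and maximize it with respect to $\beta_{j,t}$. The solution (see Online Methods) is a weighted sum of the GWAS estimates:

$$
\hat{\beta}_{\text{MTAG},j,t} = \frac{\dfrac{\boldsymbol{\omega}_t'}{\omega_{tt}}\left(\boldsymbol{\Omega} - \dfrac{\boldsymbol{\omega}_t\boldsymbol{\omega}_t'}{\omega_{tt}} + \boldsymbol{\Sigma}_j\right)^{-1}\hat{\boldsymbol{\beta}}_j}{\dfrac{\boldsymbol{\omega}_t'}{\omega_{tt}}\left(\boldsymbol{\Omega} - \dfrac{\boldsymbol{\omega}_t\boldsymbol{\omega}_t'}{\omega_{tt}} + \boldsymbol{\Sigma}_j\right)^{-1}\dfrac{\boldsymbol{\omega}_t}{\omega_{tt}}} \tag{1}
$$

where $\boldsymbol{\omega}_t$ is a vector equal to the $t$th column of $\boldsymbol{\Omega}$ and $\omega_{tt}$ is a scalar equal to the $t$th diagonal element of $\boldsymbol{\Omega}$. The MTAG estimator is consistent.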
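The MTAG estimator can be read as a generalized-least-squares combination: with $\mathbf{a} = \boldsymbol{\omega}_t/\omega_{tt}$ and $\mathbf{V} = \boldsymbol{\Omega} - \boldsymbol{\omega}_t\boldsymbol{\omega}_t'/\omega_{tt} + \boldsymbol{\Sigma}_j$, the estimate is $\mathbf{a}'\mathbf{V}^{-1}\hat{\boldsymbol{\beta}}_j / (\mathbf{a}'\mathbf{V}^{-1}\mathbf{a})$ and its sampling variance is $1/(\mathbf{a}'\mathbf{V}^{-1}\mathbf{a})$. The NumPy sketch below computes both for a single SNP, treating $\boldsymbol{\Omega}$ and $\boldsymbol{\Sigma}_j$ as known. The function name `mtag_estimate` is our own and is not the interface of the MTAG software; the input values are hypothetical.

```python
import numpy as np


def mtag_estimate(beta_hat, omega, sigma, t):
    """MTAG estimate and standard error for one SNP's effect on trait t (0-based index).

    beta_hat: length-T vector of GWAS estimates for the SNP
    omega:    T x T covariance of true effects across SNPs
    sigma:    T x T covariance of estimation error for this SNP
    Omega and Sigma are treated as known (in practice they are estimated first).
    """
    omega_t = omega[:, t]
    a = omega_t / omega[t, t]                                  # omega_t / omega_tt
    v = omega - np.outer(omega_t, omega_t) / omega[t, t] + sigma
    v_inv_a = np.linalg.solve(v, a)                            # V^{-1} a, without forming V^{-1}
    precision = a @ v_inv_a                                    # a' V^{-1} a
    estimate = (v_inv_a @ beta_hat) / precision                # a' V^{-1} beta_hat / (a' V^{-1} a)
    return estimate, np.sqrt(1.0 / precision)


# Sanity check: zero genetic correlation and no sample overlap, so MTAG reproduces the GWAS result
omega = np.diag([1e-4, 2e-4])
sigma = np.diag([1e-3, 1e-3])
est, se = mtag_estimate(np.array([0.02, 0.01]), omega, sigma, t=0)
print(est, se)  # approximately 0.02 and sqrt(1e-3)
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical-stability choice; the result is the same quantity.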

For intuition, here we describe how MTAG works for two traits with a positive phenotypic correlation (see Supplementary Note). First, if the genetic correlation is zero and there is no sample overlap (that is, neither of the two sources of possible correlation in the GWAS estimates is present), then the MTAG estimates are identical to the GWAS estimates. Second, if the genetic correlation is positive but there is no sample overlap, then the MTAG estimator for the SNP effect on trait 1 will incorporate evidence from the GWAS estimate for trait 2. Finally, if the genetic correlation is zero but the GWAS samples overlap, thereby making the estimation error positively correlated, then the MTAG estimator for the SNP effect on trait 1 will assign a *negative* weight to the GWAS estimate for trait 2. In this circumstance, the GWAS estimate for trait 2 contains information about the estimation error in the GWAS estimate for trait 1 (cf. Evans^{16}). These are the two key forces at play with MTAG, but they can interact with each other in complex ways when both sources of correlation are present, when there are more than two traits, and when the GWASs for the traits are differently powered.
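The three cases can be checked numerically. This sketch returns the vector of weights that the MTAG estimator for trait 1 places on the two GWAS estimates, using the generalized-least-squares form $\mathbf{V}^{-1}\mathbf{a}/(\mathbf{a}'\mathbf{V}^{-1}\mathbf{a})$ with $\mathbf{a} = \boldsymbol{\omega}_t/\omega_{tt}$ and $\mathbf{V} = \boldsymbol{\Omega} - \boldsymbol{\omega}_t\boldsymbol{\omega}_t'/\omega_{tt} + \boldsymbol{\Sigma}_j$. The name `mtag_weights` and all numerical values are hypothetical illustrations.

```python
import numpy as np


def mtag_weights(omega, sigma, t):
    """Weights w such that the MTAG estimate for trait t (0-based) equals w @ beta_hat."""
    omega_t = omega[:, t]
    a = omega_t / omega[t, t]
    v = omega - np.outer(omega_t, omega_t) / omega[t, t] + sigma
    v_inv_a = np.linalg.solve(v, a)
    return v_inv_a / (a @ v_inv_a)


s = 1e-3                                              # hypothetical error variance of each estimate
no_overlap = s * np.eye(2)
overlap = s * np.array([[1.0, 0.5], [0.5, 1.0]])      # positively correlated estimation error
uncorrelated = np.diag([1e-4, 1e-4])                  # zero genetic correlation
correlated = 1e-4 * np.array([[1.0, 0.7], [0.7, 1.0]])

for label, omega, sigma in [
    ("r_g = 0, no overlap", uncorrelated, no_overlap),   # weights (1, 0): identical to GWAS
    ("r_g > 0, no overlap", correlated, no_overlap),     # positive weight on trait 2
    ("r_g = 0, overlap", uncorrelated, overlap),         # negative weight on trait 2
]:
    print(label, mtag_weights(omega, sigma, t=0))
```

In every case the weights satisfy $\mathbf{w}'\mathbf{a} = 1$, which is what makes the estimator unbiased for $\beta_{j,t}$.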

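There are several useful special cases of MTAG (see Online Methods). For example, when all estimates are for the same trait (implying $\boldsymbol{\omega}_t/\omega_{tt} = \mathbf{1}$ and $\boldsymbol{\Omega} - \boldsymbol{\omega}_t\boldsymbol{\omega}_t'/\omega_{tt} = \mathbf{0}$), equation (1) simplifies to $\hat{\beta}_{\text{MTAG},j,t} = (\mathbf{1}'\boldsymbol{\Sigma}_j^{-1}\mathbf{1})^{-1}\mathbf{1}'\boldsymbol{\Sigma}_j^{-1}\hat{\boldsymbol{\beta}}_j$. When the GWAS estimates are obtained from non-overlapping samples (i.e., $\boldsymbol{\Sigma}_j$ is diagonal), this case is the well-known formula for inverse-variance-weighted meta-analysis.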
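A quick numerical check of this special case, assuming three hypothetical non-overlapping cohorts that measure the same trait: setting $\boldsymbol{\Omega}$ proportional to a matrix of ones (perfect genetic correlation and equal heritabilities) and $\boldsymbol{\Sigma}_j$ diagonal, the MTAG formula should return the inverse-variance-weighted average. The scale `1e-4` for $\boldsymbol{\Omega}$ is arbitrary because it cancels in this case.

```python
import numpy as np


def mtag_estimate(beta_hat, omega, sigma, t):
    """Point estimate from the MTAG formula for trait t (0-based), with Omega and Sigma known."""
    omega_t = omega[:, t]
    a = omega_t / omega[t, t]
    v = omega - np.outer(omega_t, omega_t) / omega[t, t] + sigma
    v_inv_a = np.linalg.solve(v, a)
    return (v_inv_a @ beta_hat) / (a @ v_inv_a)


def ivw_meta(beta_hat, se):
    """Standard inverse-variance-weighted meta-analysis estimate."""
    w = 1.0 / se**2
    return np.sum(w * beta_hat) / np.sum(w)


# Hypothetical estimates of one trait from three non-overlapping cohorts
beta_hat = np.array([0.03, 0.01, 0.02])
se = np.array([0.01, 0.02, 0.015])

omega_same_trait = 1e-4 * np.ones((3, 3))  # perfect genetic correlation, equal heritabilities
sigma_diag = np.diag(se**2)                # no sample overlap

print(mtag_estimate(beta_hat, omega_same_trait, sigma_diag, t=0))
print(ivw_meta(beta_hat, se))              # the two numbers should agree
```

Replacing `sigma_diag` with a matrix that has positive off-diagonal entries shows how the same formula adjusts the weights for sample overlap.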

To make equation (1) operational, we use consistent estimates of **Σ**_{j} and **Ω**. In standard meta-analysis, the diagonal elements of **Σ̂**_{j} would be constructed using the squared standard errors from the GWAS results, and the off-diagonal elements of **Σ̂**_{j} would be set to zero. In MTAG, however, we want to allow the estimation error to include bias (in addition to sampling variation) and to be correlated across the GWAS estimates.

Therefore, MTAG proceeds by running linkage disequilibrium (LD) score regressions^{17} on the GWAS results and using the estimated intercepts to construct the diagonal elements of **Σ̂**_{j}. Next, bivariate LD score regressions^{1} are run using each pair of GWAS results, and the estimated intercepts are used to construct the off-diagonal elements of **Σ̂**_{j}. Under the assumptions of LD score regression, the resulting matrix **Σ̂**_{j} captures all relevant sources of estimation error, including population stratification, unknown sample overlap, and cryptic relatedness. Moreover, there is no need to adjust the results of MTAG using the LD score intercept (as is becoming standard for GWAS meta-analysis results) because the adjustment is already built into the MTAG estimates.
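One way to turn LD score regression intercepts into a per-SNP error covariance is sketched below. It assumes the GWAS estimates are on the standardized scale, $\hat{\beta} \approx z/\sqrt{N}$, so that an intercept $c_{st}$ (univariate when $s = t$, bivariate otherwise), which describes the non-genetic part of $\operatorname{Cov}(z_s, z_t)$, maps to $\hat{\Sigma}_{j,st} = c_{st}/\sqrt{N_{j,s}N_{j,t}}$. This scaling is our assumption for illustration; the exact construction used by the MTAG software is described in Supplementary Note 1.2. The intercepts and sample sizes are hypothetical.

```python
import numpy as np


def sigma_hat(intercepts, n):
    """Error covariance of standardized GWAS estimates for one SNP.

    intercepts: T x T symmetric matrix; diagonal = univariate LD score regression
                intercepts, off-diagonal = bivariate (cross-trait) intercepts
    n:          length-T per-trait sample sizes for this SNP
    Assumes beta_hat = z / sqrt(N), so Cov(z_s, z_t) = c_st maps to c_st / sqrt(N_s * N_t).
    """
    n = np.asarray(n, dtype=float)
    return np.asarray(intercepts, dtype=float) / np.sqrt(np.outer(n, n))


# Hypothetical intercepts for two traits with some sample overlap
intercepts = np.array([[1.02, 0.30], [0.30, 1.05]])
print(sigma_hat(intercepts, n=[100_000, 50_000]))
```

Because the per-SNP sample sizes enter directly, this construction also yields a different $\hat{\boldsymbol{\Sigma}}_j$ for each SNP, as the model requires.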

We estimate **Ω̂** by maximum likelihood, using as the likelihood function the marginal distribution *β̂*_{j} ~ *N*(**0**,**Σ**_{j} + **Ω**), with **Σ̂**_{j} substituted in place of **Σ**_{j}. Details and explanations about how **Σ̂**_{j} and **Ω̂** are estimated are in Supplementary Note 1.2.
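The likelihood above can be evaluated directly. This sketch computes the negative log-likelihood of a candidate $\boldsymbol{\Omega}$ given per-SNP error covariances; minimizing it numerically gives the maximum-likelihood estimate. One option, which is our suggestion and is not shown, is a general-purpose optimizer over a Cholesky factor so that $\boldsymbol{\Omega}$ stays positive semidefinite. The simulated data and sample sizes are hypothetical.

```python
import numpy as np


def neg_log_lik(omega, beta_hat, sigmas):
    """Negative log-likelihood of Omega under beta_hat_j ~ N(0, Sigma_j + Omega).

    beta_hat: (M, T) array of GWAS estimates
    sigmas:   (M, T, T) array of per-SNP error covariances (Sigma_hat_j)
    """
    cov = sigmas + omega                                        # Omega broadcast over SNPs
    _, logdet = np.linalg.slogdet(cov)                          # (M,) log-determinants
    solved = np.linalg.solve(cov, beta_hat[..., None])[..., 0]  # cov_j^{-1} beta_hat_j
    quad = np.sum(beta_hat * solved, axis=1)                    # beta_hat_j' cov_j^{-1} beta_hat_j
    n_traits = beta_hat.shape[1]
    return 0.5 * np.sum(logdet + quad + n_traits * np.log(2 * np.pi))


# Hypothetical simulation: per-SNP sample sizes vary, so Sigma_j differs across SNPs
rng = np.random.default_rng(0)
m = 20_000
n = rng.uniform(5_000, 20_000, size=(m, 2))
sigmas = np.zeros((m, 2, 2))
sigmas[:, 0, 0] = 1.0 / n[:, 0]
sigmas[:, 1, 1] = 1.0 / n[:, 1]
true_omega = 1e-4 * np.array([[1.0, 0.7], [0.7, 1.0]])
beta = rng.multivariate_normal(np.zeros(2), true_omega, size=m)
beta_hat = beta + rng.normal(size=(m, 2)) * np.sqrt(1.0 / n)

# The true Omega should fit better than one that ignores the genetic correlation
nll_true = neg_log_lik(true_omega, beta_hat, sigmas)
nll_diag = neg_log_lik(np.diag(np.diag(true_omega)), beta_hat, sigmas)
print(nll_true, nll_diag)
```

Because $\boldsymbol{\Sigma}_j$ varies across SNPs, there is no closed form here, which is why this step dominates the run time reported in Online Methods.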

In summary, the MTAG results for SNP *j* are obtained in three steps: (i) estimate the variance-covariance matrix of the GWAS estimation error, **Σ̂**_{j}, using a sequence of LD score regressions; (ii) estimate the variance-covariance matrix of the SNP effects, **Ω̂**, using maximum likelihood; and (iii) for each SNP, substitute these estimates into equation (1). We have made available for download a Python command line tool that runs our MTAG estimation procedure (see URLs). Because most of the above steps have closed-form solutions, genome-wide analyses using the MTAG software run quickly (see Online Methods).

### Simulations

We conduct simulations to investigate the performance of MTAG across $3^5 = 243$ different statistical scenarios. For simplicity, we analyze two traits. For each scenario, we generate GWAS results at 100,000 independent SNPs for each trait, and we perform 1,000 replications. The scenarios differ across five dimensions. First, the true effect sizes $\boldsymbol{\beta}_j$ are drawn from one of the following distributions: multivariate normal (as assumed in the derivation of MTAG), multivariate *t* (which has fatter tails), or multivariate spike-and-slab (which has a positive fraction of truly null SNPs). Second and third, we set the expected $\chi^2$-statistic of the GWAS results for each of the two traits equal to 1.1 (low power), 1.4 (medium power), or 2.0 (high power). (For comparison, the GWAS of DEP ($h^2$ = 0.064) would have these expected $\chi^2$-statistics if estimated in effective sample sizes of 81,189, 324,758, and 568,326 individuals, respectively.) Fourth, we set the correlation in SNP effect sizes between the traits equal to 0, 0.35, or 0.7. Fifth, we set the correlation of the estimation error equal to 0, 0.35, or 0.7.
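The five-dimensional design can be enumerated directly. The helper `scenario_covariances` is a hypothetical mapping from a scenario to $(\boldsymbol{\Omega}, \boldsymbol{\Sigma})$, working in units where each trait's error variance is 1; in those units $E[\chi^2] = 1 + \omega_{tt}$, so $\omega_{tt} = E[\chi^2] - 1$. It covers only the covariance structure, not the non-normal effect-size distributions.

```python
from itertools import product

import numpy as np

distributions = ["normal", "t", "spike-and-slab"]
expected_chi2 = [1.1, 1.4, 2.0]        # low, medium, high power
effect_corr = [0.0, 0.35, 0.7]         # correlation of true effects
error_corr = [0.0, 0.35, 0.7]          # correlation of estimation error

# Trait 1 power and trait 2 power vary independently, giving 3**5 combinations
scenarios = list(product(distributions, expected_chi2, expected_chi2, effect_corr, error_corr))
print(len(scenarios))  # 243


def scenario_covariances(chi2_1, chi2_2, rho_beta, rho_err):
    """Omega and Sigma in units where each trait's error variance is 1.

    With unit error variance, E[chi2] = 1 + omega_tt, so omega_tt = E[chi2] - 1.
    """
    v1, v2 = chi2_1 - 1.0, chi2_2 - 1.0
    cov_beta = rho_beta * np.sqrt(v1 * v2)
    omega = np.array([[v1, cov_beta], [cov_beta, v2]])
    sigma = np.array([[1.0, rho_err], [rho_err, 1.0]])
    return omega, sigma


# Covariances resembling the baseline Scenario A: low power, effect correlation 0.7, no error correlation
omega_a, sigma_a = scenario_covariances(1.1, 1.1, 0.7, 0.0)
print(omega_a, sigma_a)
```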

We assess the gain in performance of MTAG over GWAS in each scenario using the root mean squared error (RMSE) of the estimates over all SNPs for each trait. RMSE reflects both the bias and the variance of the effect estimates, but we find that the bias of the MTAG estimates is very close to zero in all scenarios (Supplementary Tables 2.1, 2.2, and 2.3). In Supplementary Note and Supplementary Figures 2.1, 2.2, and 2.3, we additionally assess the improvements from MTAG in terms of the mean *χ*^{2}-statistic and the number of top hits for each trait. The conclusions are the same from all three metrics.
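A stripped-down version of the baseline comparison (normal effects, effect correlation 0.7, uncorrelated errors, expected $\chi^2$ = 1.1 for both traits) fits in a few lines. Unlike the simulations reported here, this sketch plugs in the true $\boldsymbol{\Omega}$ and $\boldsymbol{\Sigma}$ rather than estimates, runs a single replication, and works in units where each error variance is 1. Under these simplifications the GWAS RMSE for trait 1 should be close to 1 and the MTAG RMSE close to 0.83, which is the square root of $1/(\mathbf{a}'\mathbf{V}^{-1}\mathbf{a})$ for these parameters.

```python
import numpy as np


def mtag_weights(omega, sigma, t):
    """Weights w such that the MTAG estimate for trait t (0-based) equals w @ beta_hat."""
    omega_t = omega[:, t]
    a = omega_t / omega[t, t]
    v = omega - np.outer(omega_t, omega_t) / omega[t, t] + sigma
    v_inv_a = np.linalg.solve(v, a)
    return v_inv_a / (a @ v_inv_a)


rng = np.random.default_rng(1)
n_snps = 100_000
omega = 0.1 * np.array([[1.0, 0.7], [0.7, 1.0]])   # E[chi2] = 1.1 for both traits
sigma = np.eye(2)                                  # uncorrelated errors with unit variance
beta = rng.multivariate_normal(np.zeros(2), omega, size=n_snps)
beta_hat = beta + rng.multivariate_normal(np.zeros(2), sigma, size=n_snps)

mtag_1 = beta_hat @ mtag_weights(omega, sigma, t=0)      # MTAG estimates for trait 1
rmse_gwas = np.sqrt(np.mean((beta_hat[:, 0] - beta[:, 0]) ** 2))
rmse_mtag = np.sqrt(np.mean((mtag_1 - beta[:, 0]) ** 2))
print(f"GWAS RMSE: {rmse_gwas:.3f}, MTAG RMSE: {rmse_mtag:.3f}")
```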

The patterns we will describe hold across all the scenarios we examined, but we illustrate them using a small selection of scenarios (for complete results, see Supplementary Note). For each scenario, Figure 1 shows the RMSE in the first trait’s GWAS estimates and in its MTAG estimates.

There are four main results illustrated in Figure 1. First, as expected, MTAG generates gains (i.e., reduces RMSE) when the traits are genetically correlated. This can be seen in Figure 1’s Scenario A, which we call the “baseline” scenario: effect sizes are drawn from a multivariate normal distribution, effect sizes have a correlation of 0.7, the estimation error has zero correlation, and both GWAS summary statistics have low power.

Second, MTAG improves the estimates even when one or both GWASs are low powered. Relative to the baseline scenario, Scenario B shows results for the scenario in which the GWAS of the first trait has low power and that of the second trait has high power, and Scenario C shows the scenario in which both have high power. The gains are larger in Scenario B.

Third, we confirm the theoretical result that MTAG yields improvements when the traits have correlated estimation error even when they are not genetically correlated. Scenario D shows results for the scenario where, relative to the baseline, we set the correlation of effect sizes to zero and the estimation error correlation to 0.7. The gains from MTAG in this scenario are roughly the same as in the baseline.

Fourth, across the alternative effect-size distributions we examined, MTAG reduces the RMSE even when the true effect-size distribution deviates from the assumption of multivariate normality. Scenario E shows the results for the scenario that is the same as the baseline, except that effect sizes are drawn from a spike-and-slab distribution with 60% of true effects being null for the first trait and 80% being null for the second. In this case, the gains from MTAG are smaller than in the baseline scenario but still positive. In fact, in every scenario in which MTAG produces gains over GWAS when effect sizes are normally distributed, MTAG also produces gains in the corresponding scenario with spike-and-slab or *t*-distributed effect sizes.

The simulations also shed light on two possible ways that MTAG could produce anti-conservative standard errors and hence spuriously inflated *χ*^{2}-statistics. First, MTAG does not take into account the estimation error in **Σ̂**_{j} and **Ω̂**. Across the range of scenarios we examined, however, this bias was minimal. Second, the correlation of the effect sizes across traits may be systematically different for some SNPs than for others, in which case the standard error does not take into account this source of variation across SNPs (see Online Methods). We study an extreme case: a spike-and-slab scenario where a fraction of SNPs is null for trait 1 and non-null for trait 2. In this case, MTAG is expected to produce an inflated Type I error rate for the effects of the null SNPs on trait 1. In our simulations, this bias can be substantial, particularly if the GWAS is lower-powered for the trait for which the SNP is null (Supplementary Note and Supplementary Table 2.4). We conclude that MTAG can generate anti-conservative standard errors when the correlation of effect sizes across traits varies systematically across SNPs. However, as noted above, the MTAG point estimates are consistent and have lower RMSE (taken over all SNPs) in all the scenarios we considered. We may therefore expect improvements in prediction accuracy even in settings where single-SNP associations should be interpreted with caution.

### GWAS Summary Statistics for Depression, Neuroticism, and Subjective Well-Being

For our empirical application of MTAG, we build on a recent study by the Social Science Genetic Association Consortium (SSGAC) of three traits that have been found to be highly polygenic and strongly genetically related: depressive symptoms (DEP), neuroticism (NEUR), and subjective well-being (SWB). Those analyses combined data from the largest previously published studies^{18–21} with new genome-wide analyses from the genetic testing company 23andMe, Inc. and the first release of the UK Biobank (UKB) data. Relative to the previous SSGAC study, we reran the association analyses in UKB using a slightly revised analysis protocol, and, much more importantly, we expanded the SSGAC meta-analyses for DEP and SWB. For DEP, we added the results from a recently published GWAS of depression in a large 23andMe cohort^{21}, and for SWB, we added new association analyses of SWB in a 23andMe cohort. As depicted in Figure 2, there is substantial overlap between the estimation samples for the three traits. For additional details, see Online Methods and Supplementary Note.

### MTAG Results

To enable a fair comparison between the MTAG and GWAS results, we restrict all analyses only to those SNPs that are available in all three GWASs and that survive our recommended filters for MTAG. These filters include omitting SNPs with low minor allele frequency, SNPs with small sample size, and SNPs in genomic regions with uncharacteristically large effect sizes. Details and explanations for these restrictions are in Online Methods.

For each trait, we assess the gain in average power from MTAG relative to the GWAS results by the increase in the mean *χ*^{2}-statistic. We use this increase to calculate how much larger the GWAS sample size would have to be to attain an equivalent increase in expected *χ*^{2} (Online Methods). We find that the joint analysis of DEP, NEUR, and SWB via MTAG yielded gains equivalent to augmenting the original sample sizes by 37%, 96%, and 85%, respectively. The resulting GWAS-equivalent sample sizes are thus 487,603 for DEP, 329,835 for NEUR, and 718,284 for SWB.
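The conversion from a gain in mean $\chi^2$ to a GWAS-equivalent sample size can be sketched as follows. The function assumes that, without confounding and at fixed heritability, the expected $\chi^2$ minus 1 grows in proportion to sample size; it illustrates that proportionality and is not a quotation of the procedure described in Online Methods. The mean $\chi^2$ inputs in the first call are hypothetical; the loop only checks that the reported GWAS-equivalent sample sizes correspond to the stated percentage gains.

```python
def gwas_equivalent_n(n_gwas, mean_chi2_gwas, mean_chi2_mtag):
    """Sample size a single-trait GWAS would need to match MTAG's mean chi-square.

    Assumes mean chi2 - 1 is proportional to N (no confounding, fixed heritability).
    """
    return n_gwas * (mean_chi2_mtag - 1.0) / (mean_chi2_gwas - 1.0)


# Hypothetical mean chi-square values, for illustration only
print(gwas_equivalent_n(100_000, 1.5, 2.0))  # 200000.0

# Consistency check on the reported numbers: GWAS-equivalent N relative to original N
reported = [("DEP", 354_862, 487_603), ("NEUR", 168_105, 329_835), ("SWB", 388_538, 718_284)]
for trait, n, n_eq in reported:
    print(trait, f"{100 * (n_eq / n - 1):.0f}% larger")  # 37%, 96%, 85%
```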

Figure 3 shows side-by-side Manhattan plots from the GWAS and MTAG analyses for each trait. Approximately independent genome-wide significant SNPs, hereafter **“lead SNPs,”** were defined by clumping (Online Methods). From GWAS to MTAG, the number of lead SNPs increases from 32 to 74 for DEP, from 9 to 66 for NEUR, and from 13 to 60 for SWB. A list of all clumped SNPs with a *P* value less than 10^{−5} is in Supplementary Table 4.1.

### Replication of MTAG-identified Loci

To test the lead SNPs for replication, we use the Health and Retirement Study (HRS) as our prediction sample because it contains high-quality measures of DEP (*N =* 8,307), NEUR (*N* = 8,197), and SWB (*N* = 6,857). Because HRS was included in the SSGAC discovery sample for SWB, we re-ran the GWAS and MTAG analyses for SWB after omitting it. Although replicating the lead SNPs individually would be underpowered, we are well powered to test them as a group: for the set of MTAG-identified lead SNPs for each trait, we regressed the effect sizes in HRS on the MTAG effect sizes, after correcting the MTAG effect-size estimates for the winner’s curse (Online Methods). The regression slopes are 0.81 (s.e. = 0.24) for DEP, 0.79 (s.e. = 0.22) for NEUR, and 0.90 (s.e. = 0.42) for SWB. In all cases, the slope is statistically significantly greater than zero (one-sided *P =* 6.28 × 10^{−4}, 3.22 × 10^{−4}, and 1.67 × 10^{−2}, respectively) and not statistically distinguishable from one, indicating that the lead SNPs replicate in an independent sample.
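The logic of the group-level replication test can be reproduced from the reported slopes and standard errors: a one-sided test that the slope exceeds zero, and a two-sided test against a slope of one. This sketch uses a normal approximation for both; because it starts from rounded estimates, the *P* values it prints need not match the reported ones exactly.

```python
import math


def normal_sf(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Reported replication slopes and standard errors (HRS on winner's-curse-corrected MTAG estimates)
slopes = {"DEP": (0.81, 0.24), "NEUR": (0.79, 0.22), "SWB": (0.90, 0.42)}

results = {}
for trait, (b, se) in slopes.items():
    p_gt_zero = normal_sf(b / se)                    # one-sided test of slope > 0
    p_vs_one = 2.0 * normal_sf(abs((b - 1.0) / se))  # two-sided test of slope = 1
    results[trait] = (p_gt_zero, p_vs_one)
    print(f"{trait}: P(slope > 0) = {p_gt_zero:.1e}, P(slope = 1) = {p_vs_one:.2f}")
```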

### Polygenic Prediction

We next compare the predictive power of polygenic scores constructed from the GWAS and MTAG results. Each score is constructed as the weighted sum of SNP genotypes using all SNPs in the analysis. These weights are calculated by passing the GWAS or MTAG estimates into the software LDpred^{22}, which corrects for LD across SNPs. We again use the HRS as our prediction sample (and we obtain the SNP effect estimates for SWB from the analyses that omit it from the discovery sample). In addition to our filters above, the SNPs used in the polygenic score are further restricted to genotyped SNPs that meet a number of other strict quality control filters in the HRS data (see Supplementary Note).

We measure the predictive power of each polygenic score by its incremental *R*^{2}, defined as the increase in the coefficient of determination (*R*^{2}) as we move from a regression of the trait only on a set of controls (year of birth, year of birth squared, sex, their interactions, and 10 principal components of the genetic data) to a regression that additionally includes the polygenic score as an independent variable. We use the bootstrap to estimate confidence intervals for the incremental *R*^{2}'s as well as for the differences in incremental *R*^{2} across polygenic scores.
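The incremental $R^2$ and its bootstrap confidence interval can be sketched as follows. The data are simulated stand-ins for phenotypes, controls, and polygenic scores; the percentile bootstrap, the number of resamples, and the function names are our choices for illustration, since the text does not specify them. Resampling the same individuals for both scores is what yields a confidence interval for the difference in incremental $R^2$.

```python
import numpy as np


def r_squared(y, x):
    """Coefficient of determination from an OLS regression of y on x plus an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def incremental_r2(y, controls, score):
    """Gain in R^2 from adding the polygenic score to a controls-only regression."""
    return r_squared(y, np.column_stack([controls, score])) - r_squared(y, controls)


def bootstrap_ci(stat, arrays, n_boot=200, seed=0):
    """Percentile bootstrap 95% CI; rows of all arrays are resampled together."""
    rng = np.random.default_rng(seed)
    n = len(arrays[0])
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        draws.append(stat(*(a[idx] for a in arrays)))
    return np.percentile(draws, [2.5, 97.5])


# Hypothetical data: 3 controls and two scores, the second a noisier version of the first
rng = np.random.default_rng(42)
n = 2_000
controls = rng.normal(size=(n, 3))
score_1 = rng.normal(size=n)
score_2 = score_1 + rng.normal(size=n)
y = controls @ np.array([0.5, 0.5, 0.5]) + 0.3 * score_1 + rng.normal(size=n)

inc_1 = incremental_r2(y, controls, score_1)
ci_1 = bootstrap_ci(incremental_r2, (y, controls, score_1))
ci_diff = bootstrap_ci(
    lambda y_, c_, s1, s2: incremental_r2(y_, c_, s1) - incremental_r2(y_, c_, s2),
    (y, controls, score_1, score_2),
)
print(inc_1, ci_1, ci_diff)
```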

Figure 4 and **Table 1** summarize the results. The GWAS-based polygenic scores have incremental *R*^{2}'s of 0.81% for DEP, 1.28% for NEUR, and 1.46% for SWB. The corresponding MTAG-based polygenic scores all have greater predictive power: 1.00% for DEP, 1.58% for NEUR, and 1.84% for SWB. The proportional improvement in incremental *R*^{2} is roughly 25% for each trait. In each case, the 95% confidence interval of the difference excludes a gain of zero. The improvements in *R*^{2} are close to those we would expect theoretically based on the observed increases in mean *χ*^{2}-statistics (see Online Methods).

### Biological Annotation

For a final comparison, we analyze both the GWAS and MTAG results using DEPICT^{2}. We present the prioritized genes, enriched gene sets, and enriched tissues identified by DEPICT at the standard false discovery rate threshold of 5%.

Table 1 summarizes the results (see Supplementary Tables 6.1 to 6.9 for the complete set of findings). In the GWAS-based analysis, very little enrichment is apparent. For DEP, 4 genes are identified, but no gene sets or tissues. For NEUR and SWB, no genes, gene sets, or tissues are identified. In contrast, the MTAG-based analysis is more informative. The strongest results are again for DEP, now with 99 genes, 351 gene sets, and 23 tissues. For NEUR, there are 7 genes, 1 gene set, and 13 tissues, and for SWB, 8 genes, 9 gene sets, and 14 tissues.

For brevity, we discuss the specific results only for DEP; the results for NEUR and SWB are similar but more limited. For the tissues tested by DEPICT, Figure 5a plots the *P* values based on both the GWAS and MTAG results. As expected, nearly all of the enrichment of signal is found in the nervous system. To facilitate interpretation of the enriched gene sets, we used a standard procedure^{23} to group the 351 gene sets into ‘clusters’ defined by degree of gene overlap. Many of the resulting 41 clusters, shown in Figure 5b, implicate communication between neurons (‘synapse,’ ‘synapse assembly,’ ‘regulation of synaptic transmission,’ ‘regulation of postsynaptic membrane potential’). This evidence is consistent with that from the DEPICT-prioritized genes, many of which encode proteins that are involved in synaptic communication. For example, *PCLO, BSN, SNAP25*, and *CACNA1E* all encode important parts of the machinery that releases neurotransmitter from the signaling neuron^{24}.

The results, however, also contain some intriguing findings. For example, while hypotheses regarding major depression and related traits have tended to focus on monoamine neurotransmitters, our results as a whole point much more strongly to glutamatergic neurotransmission. Moreover, the particular glutamate-receptor genes prioritized by DEPICT *(GRIK2, GRIK3, GRM1, GRM5*, and *GRM8*) suggest the importance of processes involving communication between neurons on an intermediate timescale^{25,26}, such as learning and memory. Such processes are also implicated by many of the enriched gene sets, which relate to altered reactions to stress and novelty in mice (e.g., ‘decreased exploration in a new environment,’ ‘increased anxiety-related response,’ ‘behavioral fear response’).

## DISCUSSION

We have introduced MTAG, a method for conducting meta-analysis of GWAS summary statistics for different traits which is robust to sample overlap. Both our simulation and real-data results confirm that MTAG can increase the statistical power to identify genetic associations with *each* trait. MTAG can be especially useful when the GWAS for the trait of interest is underpowered but shows substantial genetic correlation with other traits. In our real-data application to the traits DEP, NEUR, and SWB, we found that relative to the separate GWASs for the traits, MTAG led to substantial improvements in number of loci identified, predictive power of polygenic scores, and informativeness of a bioinformatics analysis. Table 1 summarizes the gains from MTAG across these analyses.

Although we have emphasized the use of MTAG to combine GWAS results across different traits, MTAG has two features that can also make it a valuable tool for conventional meta-analyses of a single trait (see Online Methods). First, the method is robust to sample overlap and cryptic relatedness across cohorts. Such overlap can occur if closely related individuals participate in different cohorts from the same country, a scenario that may become increasingly common as data from large, national biobanks are incorporated into genetic-association studies together with data from other cohorts. Second, even in meta-analyses that are purportedly of a single trait, different cohorts often have phenotypic data from different measures, which may have different levels of heritability (due to having different amounts of measurement error). In such cases, using MTAG weights instead of sample-size or inverse-variance weights will improve statistical power.

## URLs

Social Science Genetic Association Consortium (SSGAC) website: http://www.thessgac.org/#!data/kuzq8.

## CONTRIBUTOR LIST FOR THE 23andMe RESEARCH TEAM

Michelle Agee, Babak Alipanahi, Adam Auton, Robert K. Bell, Katarzyna Bryc, Sarah L. Elson, Pierre Fontanillas, Nicholas A. Furlotte, David A. Hinds, Bethann S. Hromatka, Karen E. Huber, Aaron Kleinman, Nadia K. Litterman, Matthew H. McIntyre, Joanna L. Mountain, Carrie A.M. Northover, J. Fah Sathirapongsasuti, Olga V. Sazonova, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, Vladimir Vacic, Catherine H. Wilson, and Steven J. Pitts.

## AUTHOR CONTRIBUTIONS

B.N., D.B., D.C. and P.T. oversaw the study. The theory underlying MTAG was conceived of and developed by P.T., with contributions from B.N., D.B., D.C., D.L., O.M. and R.W. O.M., P.T., and R.W. performed the simulations and developed the MTAG software. P.T. and P.V. designed the analyses comparing the observed MTAG gains to theoretical expectations. A.O., D.C., and O.M. played major roles in data analyses. J.J.L. designed and executed the bioinformatics analyses. D.B., D.C., and P.T. coordinated the writing of the manuscript. All authors provided input and revisions for the final manuscript.

## COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

## ONLINE METHODS

This article is accompanied by a Supplementary Note with further details.

### Theory

There are $T$ traits, which may be binary or quantitative. We standardize each trait and the genotype for each single-nucleotide polymorphism (SNP) $j$ so that they all have mean zero and variance one. The length-$T$ vector of marginal (i.e., not controlling for other SNPs) true effects of SNP $j$ on each of the traits is denoted $\boldsymbol{\beta}_j$. We assume that these are random effects drawn from $\boldsymbol{\beta}_j \sim N(\mathbf{0}, \boldsymbol{\Omega})$. The mean is zero because we treat the choice of reference allele as arbitrary. We make the common assumption^{17,22,27} that the $\boldsymbol{\beta}_j$'s are identically distributed across $j$. This assumption implies that each SNP's expected contribution to a trait's heritability is equal, regardless of SNP characteristics such as allele frequency.

The length-$T$ vector of GWAS estimates is denoted $\hat{\boldsymbol{\beta}}_j$. We assume that $\hat{\boldsymbol{\beta}}_j \mid \boldsymbol{\beta}_j \sim N(\boldsymbol{\beta}_j, \boldsymbol{\Sigma}_j)$, where the matrix $\boldsymbol{\Sigma}_j$ is the variance-covariance matrix of estimation error, which includes sampling variation and biases (such as population stratification or technical artifacts). The matrix $\boldsymbol{\Sigma}_j$ may differ across SNPs due to differences in the SNPs' sample sizes per trait and the SNPs' sample overlap between traits.

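By the properties of normal distributions, the distribution of the estimated effect vector $\hat{\boldsymbol{\beta}}_j$, conditional on the true effect of the SNP on *only one* of the traits, is

$$
\hat{\boldsymbol{\beta}}_j \mid \beta_{j,t} \sim N\left(\frac{\boldsymbol{\omega}_t}{\omega_{tt}}\,\beta_{j,t},\; \boldsymbol{\Omega} - \frac{\boldsymbol{\omega}_t\boldsymbol{\omega}_t'}{\omega_{tt}} + \boldsymbol{\Sigma}_j\right),
$$

where $\boldsymbol{\omega}_t$ is a vector equal to the $t$th column of $\boldsymbol{\Omega}$ and $\omega_{tt}$ is a scalar equal to the $t$th element of $\boldsymbol{\omega}_t$ (or equivalently, the $t$th diagonal element of $\boldsymbol{\Omega}$). The MTAG estimator is obtained by maximizing this marginal likelihood function^{15} with respect to $\beta_{j,t}$. In the Supplementary Note, we show that the solution is:

$$
\hat{\beta}_{\text{MTAG},j,t} = \frac{\dfrac{\boldsymbol{\omega}_t'}{\omega_{tt}}\left(\boldsymbol{\Omega} - \dfrac{\boldsymbol{\omega}_t\boldsymbol{\omega}_t'}{\omega_{tt}} + \boldsymbol{\Sigma}_j\right)^{-1}\hat{\boldsymbol{\beta}}_j}{\dfrac{\boldsymbol{\omega}_t'}{\omega_{tt}}\left(\boldsymbol{\Omega} - \dfrac{\boldsymbol{\omega}_t\boldsymbol{\omega}_t'}{\omega_{tt}} + \boldsymbol{\Sigma}_j\right)^{-1}\dfrac{\boldsymbol{\omega}_t}{\omega_{tt}}}.
$$

The MTAG estimator with known $\boldsymbol{\Sigma}_j$ and $\boldsymbol{\Omega}$ is unbiased: $E(\hat{\beta}_{\text{MTAG},j,t} \mid \beta_{j,t}) = \beta_{j,t}$. Therefore, when calculated with consistent estimates $\hat{\boldsymbol{\Sigma}}_j$ and $\hat{\boldsymbol{\Omega}}$, the MTAG estimator is consistent.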

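The sampling variance of the estimator is

$$
\operatorname{Var}\left(\hat{\beta}_{\text{MTAG},j,t} \mid \beta_{j,t}\right) = \left[\frac{\boldsymbol{\omega}_t'}{\omega_{tt}}\left(\boldsymbol{\Omega} - \frac{\boldsymbol{\omega}_t\boldsymbol{\omega}_t'}{\omega_{tt}} + \boldsymbol{\Sigma}_j\right)^{-1}\frac{\boldsymbol{\omega}_t}{\omega_{tt}}\right]^{-1}.
$$

For each trait $t$, the standard error of the estimate is the square root of this quantity. As is standard, we obtain a *P* value using the fact that in large samples, $\hat{\beta}_{\text{MTAG},j,t}$ divided by its standard error follows a standard normal distribution under the null hypothesis.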
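As a small illustration of this final step, the sketch below converts an estimate and its standard error into a two-sided *P* value under the standard normal distribution, written with `math.erfc` so that it needs only the Python standard library. The inputs are hypothetical.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided P value for H0: effect = 0, using the large-sample normal approximation."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


print(two_sided_p(0.0196, 0.01))  # about 0.05 (z = 1.96)
```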

The above formulas for the MTAG estimator and its standard error use the variance-covariance matrix of true SNP effects across all SNPs, **Ω**, to calculate the optimal MTAG weights for each SNP. If in fact there are different types of SNPs characterized by different variance-covariance matrices, then the standard error formula is anti-conservative because it does not account for this source of variation across SNPs. The MTAG estimator, however, remains consistent (but could be made more efficient if it took into account the different types of SNPs).

For expositional simplicity, our derivations above and in the Supplementary Note are parameterized in terms of the parameter vector *β̂*_{j}. We note, however, that the input to the MTAG software is the standard output from meta-analysis software: *z*-statistics and sample sizes (Supplementary Note, Section 1). Because MTAG is applied to *z*-statistics, the GWAS summary statistics do not need to have been estimated using traits and genotypes that were standardized.
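To connect the *z*-statistic inputs to the standardized parameterization, a common approximation (our assumption here, not necessarily the software's exact internal conversion) uses the fact that with standardized trait and genotype and small per-SNP effects, the standard error of $\hat{\beta}$ is roughly $1/\sqrt{N}$. The helper name `z_to_standardized` and the inputs are hypothetical.

```python
import numpy as np


def z_to_standardized(z, n):
    """Approximate standardized effect estimates and standard errors from z-statistics.

    With trait and genotype standardized and small per-SNP effects,
    se(beta_hat) is approximately 1 / sqrt(N), so beta_hat is approximately z / sqrt(N).
    """
    z = np.asarray(z, dtype=float)
    se = 1.0 / np.sqrt(np.asarray(n, dtype=float))
    return z * se, se


beta_hat, se = z_to_standardized([5.0, -2.0], [10_000, 40_000])
print(beta_hat, se)  # [0.05, -0.01] and [0.01, 0.005]
```

Because *z* is scale-free, any rescaling of the original trait or genotype coding drops out, which is why standardization before the GWAS is unnecessary.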

### Special Cases

There are four special cases of MTAG that may often be relevant in practice and for which the estimation procedure is made faster and more efficient. The MTAG software offers the option to specialize the analysis for these cases.

#### No sample overlap across traits

In this case, the off-diagonal elements of **Σ**_{j} can be set equal to zero, so LD score regression needs to be run only *T* rather than *T*(*T +* 1)/2 times. Note that this version of MTAG does not take into account correlation in estimation error across traits that is due to bias. For this reason, LD score regression should be run on the MTAG results, and the resulting MTAG standard errors should be inflated by the square root of the estimated intercept.
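The correction described above amounts to a rescaling. This sketch, with hypothetical inputs and a function name of our own, inflates the standard errors by the square root of the intercept estimated on the MTAG results, which is equivalent to dividing each $\chi^2$-statistic by that intercept.

```python
import numpy as np


def apply_intercept_correction(estimates, se, intercept):
    """Inflate MTAG standard errors by sqrt(LD score intercept) and recompute z-statistics.

    intercept: LD score regression intercept estimated on the MTAG results.
    """
    se_corrected = np.asarray(se, dtype=float) * np.sqrt(intercept)
    z_corrected = np.asarray(estimates, dtype=float) / se_corrected
    return se_corrected, z_corrected


se_c, z_c = apply_intercept_correction([0.03, -0.01], [0.005, 0.005], intercept=1.21)
print(se_c, z_c)  # SEs grow by a factor of 1.1; chi-square statistics shrink by a factor of 1.21
```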

#### Perfect genetic correlation but different heritabilities

This case arises when the “traits” are different measures of the same trait, some with more measurement error than others, or when the variance in the trait due to non-genetic factors differs. Because perfect genetic correlation makes **Ω** a rank-one matrix, **Ω** has only *T* rather than *T*(*T* + 1)/2 unique parameters to be estimated.

### Perfect genetic correlation and equal heritabilities

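This special case corresponds to the “traits” being (the same measure of) a single trait; in other words, applying MTAG instead of inverse-variance-weighted meta-analysis to *T* GWAS results. Doing so can be useful if there is sample overlap in the GWAS results. In this case, as noted in the main text, MTAG specializes to

$$\hat{\beta}_{\text{MTAG},j,t} = \left(\mathbf{1}'\boldsymbol{\Sigma}_j^{-1}\mathbf{1}\right)^{-1}\mathbf{1}'\boldsymbol{\Sigma}_j^{-1}\hat{\boldsymbol{\beta}}_j$$

for all *t*, where **1** is a *T*-vector of ones, and it is no longer necessary to estimate **Ω**.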
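When the *T* GWAS results all measure one trait but come from overlapping samples, the natural way to combine them is a generalized-least-squares (GLS) meta-analysis that weights by **Σ**_{j}^{-1}. The following is a minimal sketch, assuming **Σ**_{j} is known (in practice it would be estimated by LD score regression). The function name is illustrative.

```python
import numpy as np


def gls_meta(beta_hat, sigma):
    """Combine T estimates of one effect whose errors have covariance sigma.

    Returns the GLS estimate (1' S^-1 1)^-1 1' S^-1 beta_hat and its standard error.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    ones = np.ones(len(beta_hat))
    sinv_ones = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1 (sigma is symmetric)
    precision = ones @ sinv_ones              # 1' Sigma^{-1} 1
    estimate = (sinv_ones @ beta_hat) / precision
    return float(estimate), float(np.sqrt(1.0 / precision))


# Two overlapping-sample estimates with correlated errors.
est, se = gls_meta([0.01, 0.03], [[1e-4, 5e-5], [5e-5, 1e-4]])
```

With a diagonal **Σ**_{j} (no sample overlap), the weights reduce to the familiar inverse-variance weights.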

### Equal sample sizes across SNPs

In this case, the maximum-likelihood estimation of the **Ω** matrix has a closed-form solution, substantially reducing computation time. When applying MTAG to a large number of traits, numerical maximum-likelihood estimation becomes computationally intensive, so it may be preferable to instead restrict the set of SNPs to those with approximately equal sample sizes and use this special case.
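To see why equal sample sizes give a closed form, suppose every SNP shares one error covariance **Σ** and *β̂*_{j} ~ N(0, **Ω** + **Σ**) independently across SNPs. Then the maximum-likelihood estimate of **Ω** + **Σ** is the mean outer product of the estimates, and **Ω̂** follows by subtraction. The sketch below rests on those distributional assumptions. It does not enforce positive semidefiniteness, and the function name is hypothetical.

```python
import numpy as np


def omega_closed_form(beta_hat, sigma):
    """Closed-form estimate of Omega when all SNPs share one error covariance.

    beta_hat: (M, T) array of effect estimates for M SNPs and T traits.
    sigma:    (T, T) estimation-error covariance, assumed identical for every SNP.
    Assumes beta_hat_j ~ N(0, Omega + sigma) independently across SNPs, so the
    maximum-likelihood estimate of Omega + sigma is the mean outer product.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    second_moment = beta_hat.T @ beta_hat / beta_hat.shape[0]
    # The difference is not guaranteed to be positive semidefinite in finite
    # samples; this sketch does not enforce that constraint.
    return second_moment - np.asarray(sigma, dtype=float)
```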

### Computational Run-time

The only step of MTAG that may be computationally intensive is estimating the **Ω** matrix via numerical maximum likelihood (the other steps have closed-form solutions and therefore execute very quickly). In the real-data application described in this paper, with three traits and 6.1M SNPs, this step takes approximately 4.5 hours, and the MTAG estimation overall takes approximately 4.9 hours. For comparison, we apply MTAG with the option for equal sample sizes across SNPs to the same GWAS summary statistics, except that, for each trait, we treat the sample size for each SNP as if it were equal to the mean sample size across SNPs for that trait. In that case, the run time for estimating **Ω** falls to 2 seconds, and the overall run time falls to 31 minutes.

The reported run times are the medians of five identical runs using one core of a 2.20 GHz Intel(R) Xeon(R) CPU E5-2650 v4 processor. Of course, run time may vary as a function of the computing environment.

### Simulations

To speed computations, instead of simulating data and then estimating effect sizes, we directly generated effect-size estimates by adding multivariate normally distributed noise to the simulated effect sizes. The variance of the noise for each trait was pinned down by the scenario’s expected *χ*^{2}-statistic, and the covariance of the noise between the traits was pinned down by the scenario’s value for the correlation of the effect sizes.

In our simulation, we cannot estimate **Σ**_{j} using LD score regression because there is no LD between SNPs. Nonetheless, we would like to verify that MTAG is robust to the sampling variation in the estimate **Σ̂**_{j}. To accomplish this, in each replication we directly generated an estimate **Σ̂**_{j} by adding noise to the true value of **Σ**_{j}. The variance of the noise was set to be roughly equal to the sampling variance observed in the empirical application considered in this paper.
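The simulation design above can be sketched as follows. It assumes true effects drawn from N(0, **Ω**) and estimation error drawn from N(0, **Σ**). Each trait's error variance is chosen so that its expected *χ*^{2}-statistic, 1 + *ω*_{tt}/*σ*_{tt}, hits a target. In this sketch the error correlation and the noise scale for the perturbed **Σ̂** are free inputs, not the values used in the paper, and both function names are hypothetical.

```python
import numpy as np


def simulate_estimates(omega, expected_chi2, error_corr, n_snps, rng):
    """Draw true effects and noisy estimates for T traits, skipping genotypes.

    omega:         (T, T) covariance of true effects across SNPs.
    expected_chi2: length-T target mean chi^2 per trait. Since
                   E[chi^2_t] = 1 + omega_tt / sigma_tt, we set
                   sigma_tt = omega_tt / (E[chi^2_t] - 1).
    error_corr:    (T, T) correlation of estimation error across traits.
    """
    omega = np.asarray(omega, dtype=float)
    sd = np.sqrt(np.diag(omega) / (np.asarray(expected_chi2, dtype=float) - 1.0))
    sigma = np.asarray(error_corr, dtype=float) * np.outer(sd, sd)
    zeros = np.zeros(omega.shape[0])
    beta = rng.multivariate_normal(zeros, omega, size=n_snps)
    beta_hat = beta + rng.multivariate_normal(zeros, sigma, size=n_snps)
    return beta, beta_hat, sigma


def perturb_sigma(sigma, noise_sd, rng):
    """Noisy, symmetric stand-in for an estimated Sigma_j (hypothetical helper).

    The result is not guaranteed to be positive semidefinite.
    """
    noise = rng.normal(0.0, noise_sd, size=np.shape(sigma))
    return np.asarray(sigma) + (noise + noise.T) / 2.0
```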

### GWAS Meta-analyses of DEP, NEUR, and SWB

Details on the cohorts, phenotype measures, genotyping, quality-control filters, and association models are provided in Supplementary Note and Supplementary Table 1.1. As shown in Figure 2, there is substantial overlap in samples across the three GWAS meta-analyses.

All analyses were based on autosomal SNPs from cohorts with genotypes imputed against the 1000 Genomes reference panel. The input files in each meta-analysis were subject to a uniform set of quality-control and diagnostic procedures. These are described in the previous SSGAC study^{20} and Supplementary Note.

As expected under polygenicity^{28}, we observe inflation of the median test statistic in each GWAS (*λ*_{GC,DEP} = 1.36, *λ*_{GC,NEUR} = 1.24, *λ*_{GC,SWB} = 1.28; see Supplementary Fig 3.5). The intercept estimates from LD score regression are all below 1.02, however, suggesting that nearly all of the observed inflation is due to polygenic signal^{17}. When we report GWAS results, as in the SSGAC study^{20}, we account for the potential bias due to this small amount of stratification by inflating the standard errors of our GWAS estimates by the square root of the LD score regression intercept.
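Both quantities above are simple to compute once the test statistics and an LD score regression intercept are available. *λ*_{GC} is conventionally the median observed *χ*^{2} divided by the median of a *χ*^{2}_{1} variable (about 0.4549). The standard-error correction multiplies by the square root of the intercept. This is a minimal sketch: the intercept is taken as an input rather than estimated here, and the function names are illustrative.

```python
import statistics

import numpy as np

# Median of a chi-square(1) variable: (Phi^{-1}(0.75))^2, about 0.4549.
CHI2_1_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def lambda_gc(z):
    """Genomic-control lambda: median observed chi^2 over its null median."""
    return float(np.median(np.asarray(z, dtype=float) ** 2) / CHI2_1_MEDIAN)


def inflate_se(se, ldsc_intercept):
    """Scale standard errors by sqrt(intercept) to absorb bias-induced inflation."""
    return np.asarray(se, dtype=float) * np.sqrt(ldsc_intercept)
```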

Manhattan plots from each of the GWAS meta-analyses are shown in Supplementary Figs 3a, 3b and 3c. Our NEUR meta-analysis was based on the same cohort-level data as the SSGAC study^{20} and unsurprisingly yielded substantively identical results: 12 lead SNPs. Consistent with what studies have reported for other complex traits, the larger discovery samples for DEP and SWB relative to the SSGAC study increased the number of lead SNPs: from 2 to 32 for DEP (*N*_{eff} = 149,707 to 354,862) and from 3 to 13 for SWB (*N* = 298,420 to 388,538). Applying bivariate LD score regression^{1} to the GWAS results, we estimate the genetic correlations to be 0.72 (s.e. = 0.025) between DEP and NEUR, −0.67 (s.e. = 0.027) between NEUR and SWB, and −0.687 (s.e. = 0.024) between DEP and SWB.

### Clumping Algorithm

We applied the same clumping algorithm to the GWAS and MTAG results to identify each set of “lead SNPs.” Our clumping algorithm is the same as in the previous SSGAC study^{20}. First, the SNP with the smallest *P* value was identified in the meta-analysis results. This SNP was designated the index SNP of clump 1. Second, we identified all SNPs on the same chromosome whose LD with the index SNP exceeded *R*^{2} = 0.1 and assigned them to clump 1. To generate the second clump, we removed the SNPs in clump 1 and then iterated the process to identify further index SNPs and their corresponding clumps until no SNPs remained.
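The greedy procedure above can be sketched as follows. The sketch assumes a dense matrix of pairwise *R*^{2} values, which is only practical for small illustrative inputs; genome-wide analyses would use dedicated LD-clumping tools. The function name is hypothetical.

```python
import numpy as np


def clump(pvals, chrom, r2, r2_threshold=0.1):
    """Greedy clumping into (index SNP, members) groups.

    pvals: (M,) P values; chrom: (M,) chromosome labels;
    r2:    (M, M) pairwise LD (R^2) matrix.
    """
    pvals = np.asarray(pvals, dtype=float)
    chrom = np.asarray(chrom)
    r2 = np.asarray(r2, dtype=float)
    remaining = np.ones(len(pvals), dtype=bool)
    clumps = []
    while remaining.any():
        candidates = np.flatnonzero(remaining)
        index = candidates[np.argmin(pvals[candidates])]  # smallest remaining P
        linked = (chrom[candidates] == chrom[index]) & (r2[index, candidates] > r2_threshold)
        members = np.union1d(candidates[linked], [index])  # index joins its clump
        remaining[members] = False  # remove this clump before iterating
        clumps.append((int(index), members.tolist()))
    return clumps
```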

### MTAG SNP Filters

Since the derivation of MTAG relies on some assumptions regarding the distributions of the effect sizes and estimation error, its performance may be sensitive to violations of those assumptions. To reduce the risk of extreme violations, when we apply MTAG, we impose three additional SNP filters beyond the standard filters used in a GWAS.

First, we restrict the set of SNPs to those with a minor allele frequency greater than 1%. This filter is motivated by the assumption that each SNP contributes equally to heritability in expectation. Rare variants may follow a different effect-size distribution both in terms of the variance and covariance of their effect sizes, which could lead to biased MTAG estimates.

Second, for each trait, we restrict variation in sample size by calculating the 90^{th} percentile of the SNP sample-size distribution and removing SNPs with a sample size smaller than 75% of this value. This filter is similar to, though slightly more strict than, the sample-size filter recommended for LD Score regression^{17}. If a SNP’s effect is estimated in a relatively small sample, then the sample overlap across traits will likely be different for that SNP than for other SNPs in the sample. In that case, the covariance of the estimation error across traits as estimated by LD Score regression may not be a good approximation of the covariance of the estimation error for that SNP.

Third, we drop SNPs in genomic regions containing SNPs that are outliers with respect to their effect-size estimates. Because the effect sizes of these SNPs appear to have a different variance-covariance matrix than the rest of the genome, including these regions would likely lead to the biases and inefficiencies that can occur when **Ω** is not constant across SNPs. In our real-data application, in GWAS of neuroticism, the effect sizes of SNPs in a region of chromosome 8 that tag an inversion polymorphism have been found to be strongly inflated relative to the effects estimated for SNPs in all other regions of the genome^{20,29}. Therefore, we omit SNPs on chromosome 8 between base-pair positions 7,962,590 and 11,962,591.
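The three filters can be expressed as a single boolean mask per trait. This sketch makes three assumptions. The region bounds are treated as inclusive. The 90th percentile uses numpy's default linear interpolation. A SNP would in practice need to pass the filter for every trait. The function name is hypothetical.

```python
import numpy as np

# Region tagging the chromosome 8 inversion polymorphism (inclusive bounds assumed).
EXCLUDED_REGIONS = [(8, 7_962_590, 11_962_591)]


def mtag_snp_filter(maf, n, chrom, bp, min_maf=0.01, n_fraction=0.75,
                    n_percentile=90, excluded=EXCLUDED_REGIONS):
    """Boolean mask of SNPs passing the three MTAG-specific filters for one trait."""
    maf, n, chrom, bp = (np.asarray(a) for a in (maf, n, chrom, bp))
    keep = maf > min_maf                                      # 1) MAF above 1%
    keep &= n >= n_fraction * np.percentile(n, n_percentile)  # 2) N >= 75% of 90th pct
    for c, start, end in excluded:                            # 3) drop outlier region
        keep &= ~((chrom == c) & (bp >= start) & (bp <= end))
    return keep
```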

### GWAS-Equivalent Sample Size for MTAG

The increase in the mean *χ*^{2}-statistic for each trait from the GWAS results to the MTAG results can be used to calculate a “GWAS-equivalent sample size” for MTAG. Under the assumptions of LD score regression^{17}, the expected *χ*^{2}-statistic for some SNP *j* with LD score *ℓ*_{j} is

$$E\left[\chi^2_j\right] = \frac{N_j h^2}{M}\,\ell_j + N_j a + 1,$$

where *N*_{j} is the sample size for SNP *j*; *h*^{2} is the SNP-heritability of the trait; *M* is the number of SNPs for which we define the SNP-heritability; and *a* is the variance due to biases (e.g., due to population stratification). Note that *E*[*χ*^{2}_{j}] − 1 scales linearly with *N*_{j}. Thus, we can use the mean *χ*^{2}-statistic from the GWAS and the MTAG results to calculate how much larger the GWAS sample size would have to be to give a mean *χ*^{2}-statistic equal to that attained by MTAG:

$$\frac{N_{\text{GWAS-equivalent}}}{N_{\text{GWAS}}} = \frac{\bar{\chi}^2_{\text{MTAG}} - 1}{\bar{\chi}^2_{\text{GWAS}} - 1},$$

where *χ̄*^{2}_{GWAS} is the mean *χ*^{2}-statistic in the GWAS results, and *χ̄*^{2}_{MTAG} is the mean *χ*^{2}-statistic in the MTAG results. Put another way, conducting MTAG gives the same power (as measured by mean *χ*^{2}-statistic) as conducting GWAS in a sample that is larger by a factor of (*χ̄*^{2}_{MTAG} − 1)/(*χ̄*^{2}_{GWAS} − 1).

For DEP, going from GWAS to MTAG, the mean *χ*^{2}-statistic increases from 1.44 to 1.60, implying an increase in the GWAS-equivalent sample size by a factor of

$$\frac{\bar{\chi}^2_{\text{MTAG}} - 1}{\bar{\chi}^2_{\text{GWAS}} - 1} \approx 1.37.$$

Thus, the MTAG analysis has statistical power equivalent to a GWAS of DEP conducted in 354,862 × 137% ≈ 487,603 individuals. For NEUR, the mean *χ*^{2}-statistic rises from 1.284 to 1.557, implying a GWAS-equivalent sample size for MTAG that is 96% larger than the GWAS sample size: 168,105 × 196% ≈ 329,835 individuals. For SWB, the mean *χ*^{2}-statistic rises from 1.308 to 1.570, implying a GWAS-equivalent sample size 85% larger than the GWAS sample size: 388,538 × 185% ≈ 718,284 individuals.
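The factor calculation is a one-line ratio. The sketch below applies it to the rounded means reported above for NEUR and SWB. Because the inputs are rounded, the resulting sample sizes can differ slightly from the figures in the text. The function name is illustrative.

```python
def gwas_equivalent_factor(mean_chi2_gwas, mean_chi2_mtag):
    """Factor by which a GWAS sample must grow to match MTAG's mean chi^2.

    Relies on E[chi^2] - 1 scaling linearly with N under the LD score model.
    """
    return (mean_chi2_mtag - 1.0) / (mean_chi2_gwas - 1.0)


for trait, n_gwas, chi2_gwas, chi2_mtag in [("NEUR", 168_105, 1.284, 1.557),
                                            ("SWB", 388_538, 1.308, 1.570)]:
    factor = gwas_equivalent_factor(chi2_gwas, chi2_mtag)
    n_equivalent = n_gwas * factor  # GWAS-equivalent sample size
```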

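### Expected Increase in Mean *χ*^{2}-statistic from MTAG

The expected *χ*^{2}-statistic of the GWAS summary statistics for trait *t* is

$$E\left[\chi^2_{\text{GWAS},j,t}\right] = \frac{E\left(\hat{\beta}_{j,t}^2\right)}{\sigma_{j,tt}},$$

where *σ*_{j,tt} is the *t*^{th} diagonal element of **Σ**_{j}. The expected *χ*^{2}-statistic of the MTAG summary statistics for trait *t* is

$$E\left[\chi^2_{\text{MTAG},j,t}\right] = \frac{\mathbf{w}_{j,t}'\,E\left(\hat{\boldsymbol{\beta}}_j\hat{\boldsymbol{\beta}}_j'\right)\mathbf{w}_{j,t}}{\operatorname{Var}\left(\hat{\beta}_{\text{MTAG},j,t}\right)},$$

where **w**_{j,t} is the vector of MTAG weights for trait *t* (so that *β̂*_{MTAG,j,t} = **w**′_{j,t}*β̂*_{j}), Var(*β̂*_{MTAG,j,t}) is the squared MTAG standard error, and we can substitute *E*(*β̂*_{j}*β̂*′_{j}) = **Ω** + **Σ**_{j}.

Plugging our estimates **Ω̂** and **Σ̂** into the above equations, we can use the expected *χ*^{2}-statistics to calculate the theoretically expected gain in GWAS-equivalent sample size from applying MTAG (as derived previously):

$$\frac{E\left[\chi^2_{\text{MTAG},j,t}\right] - 1}{E\left[\chi^2_{\text{GWAS},j,t}\right] - 1}.$$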

Note that **Σ̂**_{j} is a function of the sample sizes used to generate the GWAS summary statistics, and we use the **Σ̂**_{j} corresponding to the maximum sample size among the SNPs used in MTAG. Based on this formula, the theoretically expected increases in the GWAS-equivalent sample sizes are 39%, 89%, and 92% for DEP, NEUR, and SWB, respectively. These are close to the observed increases of 37%, 96%, and 85%.
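The expected gain can be computed by treating each estimate as a linear combination **w**′*β̂*_{j} with a known sampling variance. The GWAS estimate for trait *t* corresponds to the unit vector **e**_{t}, with variance *σ*_{tt}. In the sketch below, the MTAG weights and variance are inputs rather than re-derived from the MTAG estimator, and the function names are hypothetical. The test case is a hypothetical illustration: two traits with identical true effects and independent, equal-variance errors. Equal weights of 1/2 with variance *σ*/2 then double the GWAS-equivalent sample size.

```python
import numpy as np


def expected_chi2(w, omega, sigma, var_est):
    """E[chi^2] of the linear estimate w' beta_hat with sampling variance var_est,
    using E(beta_hat beta_hat') = omega + sigma."""
    w = np.asarray(w, dtype=float)
    return float(w @ (omega + sigma) @ w / var_est)


def theoretical_gain(t, w_mtag, var_mtag, omega, sigma):
    """Expected GWAS-equivalent sample-size factor for trait t."""
    omega = np.asarray(omega, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    e_t = np.zeros(omega.shape[0])
    e_t[t] = 1.0  # the GWAS estimate for trait t is the t-th element of beta_hat
    chi2_gwas = expected_chi2(e_t, omega, sigma, sigma[t, t])
    chi2_mtag = expected_chi2(w_mtag, omega, sigma, var_mtag)
    return (chi2_mtag - 1.0) / (chi2_gwas - 1.0)
```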

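### Winner’s Curse Correction for Replication Analysis

MTAG estimates are corrected for winner’s curse following the same procedure as in Okbay et al.^{20} Briefly, for each trait, we use maximum likelihood to fit the MTAG results to a spike-and-slab distribution such that

$$\beta_j \sim \begin{cases} 0 & \text{with probability } \pi, \\ N\left(0, \tau^2\right) & \text{with probability } 1 - \pi. \end{cases}$$

For DEP, NEUR, and SWB, we estimate *π̂* to be 0.565, 0.563, and 0.549 and *τ̂*^{2} to be 3.04 × 10^{−6}, 4.72 × 10^{−6}, and 2.02 × 10^{−6}, respectively. We then use these estimates as the parameters of the prior distribution and calculate the posterior distribution of the effect size *β*_{j} given the estimate *β̂*_{MTAG,j} for each SNP as

$$\beta_j \mid \hat{\beta}_{\text{MTAG},j} \sim \begin{cases} 0 & \text{with probability } \pi_{\text{post},j}, \\ N\!\left(\dfrac{\hat{\tau}^2}{\hat{\tau}^2 + s_j^2}\,\hat{\beta}_{\text{MTAG},j},\ \dfrac{\hat{\tau}^2 s_j^2}{\hat{\tau}^2 + s_j^2}\right) & \text{with probability } 1 - \pi_{\text{post},j}, \end{cases}$$

where *π*_{post,j} is the posterior probability that *β*_{j} = 0 and *s*_{j}^{2} is the squared standard error of the MTAG estimate. For further details about the estimation procedure, see Section 8 of the Supplementary Note of Okbay et al.^{20}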
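The posterior update can be sketched for a single SNP under a spike-and-slab prior. The sketch assumes that π is the prior probability that *β*_{j} = 0, matching *π*_{post,j} being the posterior probability of a zero effect, and that the slab is N(0, *τ*^{2}). It also assumes *β̂*_{MTAG,j} is normal around *β*_{j} with variance *s*_{j}^{2}. This is an illustration, not the Okbay et al. code, and the function names are hypothetical.

```python
import math


def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def posterior_effect(beta_hat: float, se: float, pi0: float, tau2: float):
    """Posterior of beta under beta = 0 w.p. pi0, else N(0, tau2).

    Returns (pi_post, slab_mean, slab_var, posterior_mean).
    """
    s2 = se * se
    null_lik = pi0 * normal_pdf(beta_hat, s2)                 # beta_hat | beta = 0
    slab_lik = (1.0 - pi0) * normal_pdf(beta_hat, tau2 + s2)  # beta_hat | slab
    pi_post = null_lik / (null_lik + slab_lik)
    shrink = tau2 / (tau2 + s2)
    slab_mean = shrink * beta_hat  # shrunken estimate if the effect is nonzero
    slab_var = shrink * s2         # equals tau2 * s2 / (tau2 + s2)
    return pi_post, slab_mean, slab_var, (1.0 - pi_post) * slab_mean
```

The posterior mean, (1 − *π*_{post,j}) times the shrunken slab mean, is the winner's-curse-corrected point estimate in this sketch.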

### Polygenic Prediction

We used the Health and Retirement Study^{30} (HRS) as our holdout cohort for polygenic prediction. We applied the same SNP filters as in the main MTAG analyses. We also imposed several additional, HRS-specific filters in order to ensure that the polygenic scores are based only on SNPs that were reliably genotyped in the HRS. Specifically, SNPs were omitted unless they (i) were genotyped in the HRS cohort with a call rate greater than 98%, (ii) had a Hardy-Weinberg Equilibrium (HWE) test *P* value greater than 10^{−4} in the HRS, (iii) had a minor allele frequency (MAF) greater than 1% in the HRS, and (iv) were not strand ambiguous.

Bootstrapped confidence intervals were calculated by repeatedly drawing, with replacement, a sample of equal size to the prediction sample and calculating the incremental *R*^{2} for the GWAS-based polygenic score, the MTAG-based polygenic score, and the difference between them. As the bounds of the 95% confidence intervals, we use the 2.5^{th} and 97.5^{th} percentiles of the incremental *R*^{2} values across 500 bootstrap draws.
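The percentile bootstrap described above can be sketched with ordinary least squares. The sketch takes incremental *R*^{2} to mean the gain in *R*^{2} from adding a score to a covariates-only regression. The covariate set, the seed handling, and the function names are assumptions for illustration.

```python
import numpy as np


def r2(y, X):
    """R^2 from an OLS regression of y on X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def incremental_r2(y, covars, score):
    """Gain in R^2 from adding a polygenic score to a covariates-only model."""
    base = np.column_stack([np.ones(len(y)), covars])
    return r2(y, np.column_stack([base, score])) - r2(y, base)


def bootstrap_ci(y, covars, pgs_gwas, pgs_mtag, draws=500, seed=0):
    """Percentile 95% CIs for each score's incremental R^2 and their difference.

    Returns a (2, 3) array: rows are (lower, upper); columns are
    (GWAS score, MTAG score, MTAG minus GWAS).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty((draws, 3))
    for b in range(draws):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        g = incremental_r2(y[idx], covars[idx], pgs_gwas[idx])
        m = incremental_r2(y[idx], covars[idx], pgs_mtag[idx])
        stats[b] = (g, m, m - g)
    return np.percentile(stats, [2.5, 97.5], axis=0)
```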

### Expected Increase in Polygenic-score Predictive Power from MTAG

The phenotypic value of a trait in individual *i*, denoted *y*_{i}, can be decomposed into the sum of the additive genetic component *g*_{i} and a residual *e*_{i}:

$$y_i = g_i + e_i.$$

We denote the GWAS- and MTAG-based polygenic scores for the trait by *ĝ*_{GWAS,i} and *ĝ*_{MTAG,i}, respectively. Note that GWAS and MTAG produce unbiased estimates of the SNP effect sizes, and LDpred produces a consistent estimate of the additive genetic component. Therefore, each polygenic score *k* ∈ {GWAS, MTAG} is approximately equal to *g*_{i} plus estimation error:

$$\hat{g}_{k,i} \approx g_i + u_{k,i}.$$

By the central limit theorem, the estimation error is approximately normally distributed,

$$u_{k,i} \sim N\left(0, V_k\right),$$

with variance *V*_{k} that is inversely proportional to the GWAS-equivalent sample size of the estimator (which we can estimate for the MTAG polygenic score using the approach described in a previous section of the Online Methods).

The expected predictive power of a polygenic score is

$$R^2_k = \frac{\left(h^2\right)^2}{h^2 + V_k},$$

where *h*^{2} is the SNP heritability of the trait^{31} and the phenotype is standardized to unit variance. Using the GWAS results, we obtain an estimate of *h*^{2} using LD score regression^{17} and an estimate of *R*^{2}_{GWAS} from the predictive power of the GWAS-based polygenic score. Plugging these estimates into the above formula, we solve for an estimate of *V*_{GWAS}. We then multiply this value by (*χ̄*^{2}_{GWAS} − 1)/(*χ̄*^{2}_{MTAG} − 1) (which we showed previously is equal to the ratio of the GWAS sample size to the MTAG’s GWAS-equivalent sample size) to obtain an estimate of *V*_{MTAG}. Substituting this back into the above formula along with our estimate of *h*^{2} gives us the expected predictive power of the MTAG-based PGS.
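The back-of-the-envelope calculation can be sketched in a few lines. The sketch assumes the formula *R*^{2} = (*h*^{2})^{2}/(*h*^{2} + *V*) for a unit-variance phenotype, with the score error independent of *g*_{i} and *e*_{i}. It inverts that formula to recover *V*_{GWAS} and shrinks *V* by the GWAS-equivalent sample-size factor. The numbers in the test are made-up inputs, not the paper's estimates, and the function names are hypothetical.

```python
def expected_r2(h2: float, v: float) -> float:
    """Expected R^2 of a score g + u with Var(u) = v, for a unit-variance
    phenotype with SNP heritability h2."""
    return h2 * h2 / (h2 + v)


def predicted_mtag_r2(h2: float, r2_gwas: float, factor: float) -> float:
    """Implied R^2 of the MTAG score given the observed GWAS-score R^2 and the
    GWAS-equivalent sample-size factor of MTAG."""
    v_gwas = h2 * h2 / r2_gwas - h2  # invert expected_r2 for V
    v_mtag = v_gwas / factor         # error variance scales as 1 / (effective N)
    return expected_r2(h2, v_mtag)
```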

For DEP, we anticipated an increase in predictive power of 0.25 percentage points, which is within the estimated confidence interval [0.04, 0.36]. For NEUR, we expected an increase of 0.95 percentage points, which is greater than the upper bound of the confidence interval [0.05, 0.59]. For SWB, we expected an increase of 0.62 percentage points, which is again within the confidence interval [0.03, 0.72]. While not all of these predictions fall within the estimated confidence intervals, some of this discrepancy may be due to differences in heritability or imperfect genetic correlation between the discovery and prediction samples.^{32} As a whole, we find that the observed predictive power of the MTAG-based polygenic score is largely consistent with this theoretical expectation.

### Biological Annotation

For the identification of genes, we supplemented the DEPICT inventory with protein-coding genes that have a status of ‘known’ in GENCODE (downloaded February 26, 2015). Specifically, we assigned such a gene to a lead SNP in Supplementary Tables 6.1, 6.4, and 6.7 if it either encompasses the SNP in a DEPICT-defined locus or has the transcription start site closest to such a SNP.
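The gene-assignment rule above can be illustrated in simplified form. The sketch takes a list of genes with their spans and transcription start sites (a hypothetical input format). It returns the union of the genes whose span contains the SNP and the gene whose TSS is nearest. It ignores DEPICT's own locus definitions and GENCODE parsing, which a real pipeline would need, and the function name is hypothetical.

```python
def assign_genes(snp_pos, genes):
    """Genes assigned to a lead SNP under a simplified rule.

    genes: non-empty list of (name, start, end, tss) tuples on the SNP's
           chromosome. A gene is assigned if its span contains the SNP or
           if its transcription start site is the closest one to the SNP.
    """
    containing = {name for name, start, end, _ in genes if start <= snp_pos <= end}
    nearest_tss = min(genes, key=lambda gene: abs(gene[3] - snp_pos))[0]
    return containing | {nearest_tss}
```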

## ACKNOWLEDGMENTS

We thank Jonathan Beauchamp, Philipp Koellinger, Örjan Sandewall, Carl Shulman, and Ronald de Vlaming for helpful comments. This research was carried out under the auspices of the Social Science Genetic Association Consortium (SSGAC). The SSGAC seeks to facilitate studies that investigate the influence of genes on human behavior, well-being, and social-scientific outcomes using large genome-wide association study meta-analyses. The study was supported by the Ragnar Söderberg Foundation (E9/11 E42/15), the Swedish Research Council (421-2013-1061), The Jan Wallander and Tom Hedelius Foundation, an ERC Consolidator Grant (647648 EdGe), the Pershing Square Fund of the Foundations of Human Behavior, and the NIA/NIH through grants P01-AG005842, P01-AG005842-20S2, P30-AG012810, and T32-AG000186-23 to NBER and R01-AG042568-02 to the University of Southern California. This research has also been conducted using the UK Biobank Resource under Application Number 11425. We thank the research participants and employees of 23andMe for making this work possible. A full list of acknowledgments is provided in the Supplementary Note.