Testing and correcting for weak and pleiotropic instruments in two‐sample multivariable Mendelian randomization

Multivariable Mendelian randomization (MVMR) is a form of instrumental variable analysis which estimates the direct effect of multiple exposures on an outcome using genetic variants as instruments. Mendelian randomization and MVMR are frequently conducted using two‐sample summary data where the association of the genetic variants with the exposures and outcome are obtained from separate samples. If the genetic variants are only weakly associated with the exposures either individually or conditionally, given the other exposures in the model, then standard inverse variance weighting will yield biased estimates for the effect of each exposure. Here, we develop a two‐sample conditional F‐statistic to test whether the genetic variants strongly predict each exposure conditional on the other exposures included in a MVMR model. We show formally that this test is equivalent to the individual level data conditional F‐statistic, indicating that conventional rule‐of‐thumb critical values of F> 10, can be used to test for weak instruments. We then demonstrate how reliable estimates of the causal effect of each exposure on the outcome can be obtained in the presence of weak instruments and pleiotropy, by repurposing a commonly used heterogeneity Q‐statistic as an estimating equation. Furthermore, the minimized value of this Q‐statistic yields an exact test for heterogeneity due to pleiotropy. We illustrate our methods with an application to estimate the causal effect of blood lipid fractions on age‐related macular degeneration.


Introduction
Instrumental variables (IV) is a form of regression analysis which estimates the causal effect of an exposure on an outcome in the presence of unobserved confounding. Mendelian randomisation (MR) is a rapidly expanding application of the IV method in the field of epidemiology in which genetic variants are used as instruments. If genetic variants -usually single nucleotide polymorphisms (SNPs) -are available which reliably predict the exposure and are not associated with the outcome through any other pathway, then they are valid IVs. These genetic variants can then be used as instruments to obtain an estimate for the causal effect of a modifiable health exposure on a disease These assumptions are illustrated in Fig 1. A violation of IV1 induces 'weak instrument bias' in the resulting estimates 5,6 . In a conventional (univariable) MR analysis, the definition of instrument strength is straightforward and unambiguous. Assumption IV1 can be tested with an F-statistic, which tests the association between the SNP and the exposure. When univariable MR analysis based on individual level data from a single sample, if the F-statistic is larger than the rule-ofthumb value of 10 then the SNPs are said to be a 'strong' instrument. We can then reject the null hypothesis that the instruments are weak in the sense that the bias of the MR estimate is equal to or greater than 10% of the observational (OLS) association. 5,6 In any MVMR analysis it is also necessary that this F-statistic is large for each exposure included, but this is no longer sufficient; the SNPs also need to predict each exposure conditional on the other predicted exposures. This additional condition ensures that there is sufficient variation in association between the SNPs and each exposure, to avoid a problem of weak instrument bias in the MVMR model. Therefore when testing for weak instruments, verifying that F SW is greater than the rule-of-thumb of 10 means that we can reject the null hypothesis that the average bias of the MVMR estimates is at least 10% of the bias of the equivalent multivariable OLS estimates.
Application to two-sample summary data MR When individual level data on the genetic variants, exposure and outcome are not available two sample MVMR can be conducted using summary data estimates of SNP-exposure and SNP-outcome associations. In two-sample MR, weak instruments bias the causal estimates towards the null rather than the observational association. Sanderson  The two sample conditional F-statistic calculated for these data helped to highlight that it was not possible to strongly predict multiple lipid fractions from the same subgroup despite each lipid fraction having a moderately high F-statistic. Any analyst naively applying MVMR methods to such data without the correct diagnostic statistics to hand is in danger of generating poor quality and potentially misleading results.

A Test for Weak Instruments
Let X = (X 1 , X 2 , ..., X K ) be a set of K exposure variables and let G be a set of L instruments G = (G 1 , G 2 , ..., G L ). Define the K × L matrix of associations between each exposure and each instrument as; where for example π 32 represents the association between exposure 3 and SNP 2. Without loss of generality, testing whether the instrument set G can explain variation in a single exposure, X 1 , conditional on all other exposures (X 2 , ..., X K ) is equivalent to testing whether model (2) below is identified Here: δ 01 and each π 0m are scalar parameters; δ 1 is a K − 1 vector of parameters, and 1 and m are random error terms. Collecting π 2 , ..., π K into a single (K − 1) × L matrix, define Π −1 as the matrix Π minus its first row. If this model is overidentified then the rank of Π −1 is > K − (L − 1).
In two sample summary data settings we do not directly observe exposures X 1 , ..., X K , only estimates for the K × L SNP-exposure associations that defineΠ. However we can use these association estimates to define an analogous formula to (2) Where Π −1j is the jth column of Π. The Q statistic for exposure 1 based on the summary data estimates can be written as; where the variance term σ 2 x k j is given by; , andδ k is an efficient estimator for δ k , for example estimated through an inverse variance weighted least squares regression ofπ 1 onΠ −1 .
The matrix Σ V,j defines the covariance of the estimated effects of snp j on each of the exposures: If eachπ kj is obtained separately via univariable regressions with an intercept, then the error terms are obtained from the expressions: Under the null hypothesis that the instruments do not contain enough information to predict both exposure variables, Q x 1 will be asymptotically χ 2 L−1 distributed where L is the number of SNPs in the estimation. Rejecting the null hypothesis indicates that the SNPs can predict X 1 conditional on X 2 . Dividing the Q-statistic described above by the number of instruments, adjusted for the number of exposures, in the model gives a test statistic that is equivalent to the one sample conditional F statistic F SW .
In Supplementary Section S.1 we show that under the assumption that the instruments are uncorrelated the two sample conditional F-statistic F T S in (7) is equivalent to the one sample conditional F-statistic F SW .

Critical values
Comparing this statistic to standard critical values from the F-distribution provides a test for a lack of identification. However, even if the genetic instruments explain some of the variation in the exposure they could still be 'weak'. In this case the estimates obtained from the MVMR estimation could still be considerably biased. The one sample conditional F-statistic (F SW ) has the same distribution as the Stock-Yogo weak instrument test 6 . Therefore we can apply its weak instrument critical values to identify weak instrument bias for univariable and multivariable twosample MR 5,6,7 . The weak instrument critical values derived by Stock and Yogo (2005) for the bias of the 2SLS estimator relative to the OLS estimator have a non-central χ 2 distribution, with L degrees of freedom and a non-centrality parameter that is a function of L and K, divided by K.
These critical values are derived under the definition that the instruments are weak when the bias of the IV estimator relative to the OLS estimator is at least 10%. The measure of relative bias used is the squared bias of the IV estimator (β IV ) relative to the squared bias of the OLS estimator (β OLS ). This is given by the equation; Where Σ X = plim 1 n X X and X here represents the n×K matrix of all of the exposures included in the estimation. Calculating the bias in this way standardises the exposures X so they are orthogonal and have unit standard deviation. However it mean that the bias of the estimated effect of any particular exposure may differ from 10%. If F T S is larger than the relevant Stock-Yogo critical value we can reject the null hypothesis that the exposure is only weakly predicted by the instruments.
These critical values have only been derived for models including up to 30 instruments, therefore in Table 1 we provide critical values for a larger range of instruments to test for a 5%, 10% or 20% relative bias. These critical values are often approximated to a rule of thumb of F > 10 to test a null hypothesis that the bias is at least 10% of the bias of the OLS estimator. The critical values given above also show that the rule of thumb of 10 is slightly smaller than the true critical value for this test and would lead to the null hypothesis being rejected more frequently. The two sample F T S statistic tests the bias of the model as a whole, this means that the sign of the bias of an individual causal parameter may differ from that of the model's bias, which is averaged across all of its constituent parameters.

Weak Instrument Robust Two-sample MVMR Estimation in the presence of weak instruments
In the presence of weak instruments, standard inverse variance weighted estimation of the MVMR mode, which we refer to as MVMR-IVW, is biased. The reasons for this will be explained in more detail below. The LIML estimator has previously been proposed as an alternative estimator for individual-level MR as it is less biased when there are many weak instruments 15 . In the two-sample summary data setting, Bowden and colleagues 16 and Zhao and colleagues 17 show that weak instruments can be effectively mitigated through minimisation of an appropriate heterogeneity statistic using weights that account for the variance of the SNP-exposure associations is analagous to one-sample LIML estimation. It gives results that are substantially less biased than conventional regression based IVW estimates in the presence of a non-zero causal effect. The weak instrument robust estimation proposed by Bowden and colleagues can be extended to the MVMR setting as a minimisation of; over β, Where β is a vector of causal parameters (to be estimated),Γ j is the estimated effect of SNP j on the outcome,π j is a vector of effects of SNP j on each exposure included in the estimation (i.e. a column of the matrix Π) and; Here, σ 2 y,j is the variance of the estimated effect of the SNPs on the outcome, and Σ V,j is the variance-covariance matrix defined in equation (5). This is equivalent to minimisation of the Q A statistic to test for heterogeneity described in Sanderson et al 2019 4 extended to a model with more than two exposures. We label estimates for β obtained in this manner asβ Q . The standard MVMR-IVW estimate is vunerable to weak instrument bias because instead of minimising Q A in (8) using the full weights defined in (9) it incorrectly assumes that σ 2 A j = σ 2 y j . This ignores the component of variation from β Σ j β and is only valid if either all elements of β are zero or Σ j is negligable by comparison to σ 2 yj .

Testing for pleiotropy in the presence of weak instruments
Horizontal pleiotropy -where genetic variants influence the outcome through multiple phenotypes can lead to a violation of the IV assumptions if they are not included as exposures in the MVMR estimation. Under the assumption that not all the SNPs included in the estimation have a pleiotropic effect on the outcome through the same pathway, this will lead to greater variation in the estimated causal effect of the exposures on the outcome than would be expected by chance. This excess heterogeneity can be reliably tested for using the minimised Q A statistic. More formally if all SNPs used in the MVMR analysis are valid instruments, in the sense that they identify a common set of causal parameters β, we would expect the Q A statistic in (6) evaluated at β =β Q to follow a Chi-squared distribution with L-K degrees of freedom. Crucially, the test is exact in the sense that it will achieve its nominal type I error rate, even in the presence of weak instruments. 18 .
The standard Q-statistic used to generate the MVMR IV estimate by setting σ 2 A,j = σ 2 y,j , referred to here as Q IV W , will generally have an inflated type 1 error rate (i.e. will detect pleiotropy too often when none is present) unless all β Σ j β terms are negligible.

Estimation in the presence pleiotropy and weak instruments
Estimation of β through minimisation of (8) will give estimates of the direct effect of each exposure on the outcome that are robust to weak instruments. However, these estimates will still be biased in the presence of pleiotropy. In order to account for heterogeneity due to pleiotropy, we extend the estimation of β by adding a pleiotropy variance parameter τ 2 to the multivariable Q estimation and finding the joint value of (β, τ 2 ) which minimises; We refer to the causal estimates derived in this way asβ Q,het This is a extension of the method described in 16 for univariable MR to the MVMR setting. Although it will account for balanced pleiotropy, where there are equal positive and negative pleiotropic effects of the SNPs on the outcome, this method of estimation will not adjust for directional pleiotropy where the pleiotropic effects of the SNPs on the outcome all, or mostly, act in one direction to either increase or decrease the outcome. However, it is possible to look at the individual contribution of each SNP to Q A to identify the largest outliers. If a small number of SNPs are observed to have a large effect on Q A they can potentially be removed as a sensitivity analysis and the MVMR model re-estimated without them.

Obtaining confidence intervals for estimated effects
Estimation of β and τ 2 through minimisation of Q A , does not provide readily available and reliable standard errors. We therefore suggest that standard errors are obtained, and confidence intervals calculated, through a Jackknife procedure.
We propose the use of Jackknife rather than a bootstrap as with a moderate number of SNPs the repeated sampling in a bootstrap can lead to very weak instruments in any particular iteration even when the model has relatively strong instruments as a whole. A jackknife procedure estimates the model leaving out each SNP in turn and then calculates the standard deviation of the effect estimate from these results. As each iteration includes all but one of the SNPs and includes each SNP only once this is unlikely to be affected by weak instruments due to the exclusion of some SNPs.

Estimation of σ ij
So far we have assumed that the pairwise covariance between a SNPs estimated association with any two exposures is known for all exposures and all SNPs. However, this data is not generally reported by GWAS summary statistics. Similarly it would not be feasible for these studies to report this data due to the large number of potential covariances that could be required for all potential future MVMR analyses. Results from our simulations (given in section 5) show that estimation without these covariances (i.e. imposing the assumption σ i,j = 0, i = j) can have large effects on the results obtained, altering the interpretation of the results. Excluding these covariances will give the correct estimation only under the global null (β = 0).
Therefore, in this section we suggest three different solutions for dealing with the lack of co-variances in the GWAS summary results in order to estimate σ km,j : the covariance betweenπ k,j andπ m,j with respect to exposure, k, exposure, m (k = m) and SNP j.
Estimate σ km,j from the individual level data If some or all of the individual level data that was used in the GWAS to estimate the SNP -exposure associations is available then the covariances for the effect of each SNP on each exposure can be calculated from equation 6.
Estimate the phenotypic correlation between the exposures from individual level data The covariance for each SNP can then be approximated as; where ρ km is the correlation between X k and X m (or phenotypic correlation). As each SNP explains only a small proportion of the variation in each exposure, the covariance between the exposures provide a very good approximation for the covariance between the error terms in the SNP exposure associations. Although ideally this information would be calculated from the data used for the GWAS study, ρ km could also be estimated from only part of the data used in the GWAS or from an alternative dataset which is thought to have a similar structure.
Estimate the effect of the SNPs on each exposure from separate samples Estimating the effect of the SNPs on each exposure in this manner means that the covariances will be zero and so excluding this information will not affect the statistics calculated. For an MVMR analysis involving K exposures, this would require K + 1 separate samples and so is likely to only be practicable in a limited number of cases.
In any given scenario some of these solutions may be impossible (due to a lack of data) and of the solutions that are possible, one may be the most reasonable. We suggest that estimation of ρ km from phenotypic data, from which the appropriate covariances can then be calculated, is likely to be the most feasible and appropriate approach in many cases.

Simulation Results
To illustrate the methods presented so far give here results from simulating and fitting MVMR models with 200 SNPs and either 2 or 3 exposures. parameter σ i,j , i = j was estimated from calculation of the phenotypic correlation between X 1 and X 2 as described in section 4. The set up of this model is illustrated in Fig. 3 and results from the simulation are given in Table 2. Results for the same model without the pleiotropic effect of the SNPs on the outcome are given in Supplementary Table S Table 2 also gives F T S and β Q,het estimated without accounting for σ km , labelled F T S,0 andβ Q,het,0 respectively. This imposes the assumption that σ km = 0 k = m but not the assumption that σ 2 k = 0 and so is a point between standard MVMR-IVW estimation andβ Q,het . These results also show that in the presence of conditionally weak instruments, when there is correlation between the effect of the SNPs on each exposure, if these correlations are not taken into account F T S,0 does not reliably test the strength of the instruments andβ Q,het,0 produces biased estimates of the effect of each exposure on the outcome.  Table 2 Three exposure model  on the third exposure). This set up means that when only the first two exposures are included in the estimation there is directional pleiotropy present, however when all three exposures are included there is potential weak instrument bias. The model under which the data was generated is illustrated in Fig. 4 and results are given in Table 3.

MVMR model with two exposures
We give results from estimation of the model firstly including only two exposures,   Table 3 Heterogeneity Testing Table 4 gives the rejection rates when using Q IV W and Q A to test for pleiotropy for the model considered in Figure 2. In addition, we show rejections rates using a third heterogeneiyt statistic that attemps to improve Q σ 2 y by extending the weights so they take the form σ 2 y + β Σ j β. We

Application
In this section we illustrate the use of the methods described above through an application to the estimation of the effect of multiple metabolites to age-related macular degeneration (AMD). AMD is disease that causes loss of central vision and is a leading cause of blindness 19 . Elevated lipid serum levels have previously been associated with increased risk of AMD 20 . We use data from a Genome-wide association study (GWAS) of 118 metabolites by Kettunen et al 2016 14 as our The GWAS data included 118 potential metabolite exposures. For the purposes of illustration we restricted the analysis to 14 metabolites moderately well predicted by a large number of SNPs.
From the 150 SNPs included in the data we selected a set of 78 which were associated with at least one of our exposures with an F statistic greater than 5. Table 5 gives MVMR-IVW estimates of all of these metabolites against AMD. As well as the MVMR-IVW estimation results, Table 5 also includes; the mean F-statistic for the SNPs associated with each metabolite (F individual ), the mean F-statistic across all of the SNPs included in the analysis for each metabolite (F all ) and the conditional F-statistic for each metabolite (F T S ). The correlation between the metabolites was not available from the GWAS data used here. We therefore calculated these using external data on the same metabolites from the Avon Longitudinal Study of Mothers and Children (ALSPAC) 25,26 .
A description of the ALSPAC study is given in the supplementary material. The F-statistics and conditional F-statistics presented show that although each metabolite is strongly predicted by the SNPs associated with it, and most have a F all > 10, the conditional F-statistics for each exposure are very small and therefore the effect estimates are subject to weak instrument bias.
This set of metabolites can be divided into four groups, IDL, LDL, small VLDL and very small VLDL. In Table 6 we present the results for the calculation of the F-statistics F all and F T S for each of these groups. They show that, with the exception of IDL.PL and small VLDL, the SNPs  These results are given in Table 7. Although the exposures here are jointly moderately strongly predicted by the set of SNPs the conditional F-statistics for each exposures are still between 4.2 and 8.3 indicating that there may be some weak instrument bias. In Table 8 we re-estimate our MVMR model using the weak instrument robust methodology presented earlier. These results show that the initial MVMR including all of the metabolites could give misleading results due to weak instrument bias. Q A for this model is 118, the critical value at a 5% level of significance for a chi-squared distribution with 64 degrees of freedom is 84.7. It therefore indicates potential pleiotropy and we consider theβ Q,het to be the most appropriate estimates in this case. The results from this analysis suggest that none of the metabolites considered are causally associated with AMD but that the standard MVMR IVW estimates for the final model were biased due to both weak instruments and pleiotropic effects of the SNPs on the outcome. This null result is consistent with other results using an alternative method to analyse the same data which found that HDL (not included in this analysis) was the only lipid fraction type that was causally associated with AMD 24 .

Software
We have written an R package MVMR which facilitates the implementation of MVMR estimation and corresponding sensitivity analyses. The package requires summary data on instrumentexposure and instrument-outcome associations, as well as information on the pairwise covariances of the error in the estimated association between each SNP and each pair of exposures. As these covariances are often not available the software can be implemented in three ways; estimating the covariances from individual level data, approximating the covariances from the phenotypic correlation between the exposures or assuming that these covariances are zero.

Workflow
Fitting and interpreting MVMR using the methods described in this paper, including tests for instrument strength and horizontal pleiotropy, is performed using a five-step procedure. Initially, summary data should be provided, including a covariance matrix for the effect of the genetic variants on each exposure. As such covariances are not conventionally reported in publicly available data, two functions snpcov_mvmr() and phenocov_mvmr() can be used to generate the covariance matrix.
The function snpcov_mvmr() estimates the covariance terms directly from individual level data, whilst phenocov_mvmr() uses the phenotypic correlation and summary data (input by the user) to generate estimates of the covariances.
As a second stage, the summary data is reformatted using the function format_mvmr() into a data frame which is subsequently used as the input for estimation and sensitivity analyses.
We then provide the functions strength_mvmr() to evaluate instrument strength using the two sample conditional F-statistic described in Section 2. Tests for horizontal pleiotropy are performed using pleiotropy_mvmr(), performing both standard and Q-minimisation approaches simultaneously (see section 3 for more details). Finally, causal effects can be estimated using two different approaches; fitting an inverse variance weighted (IVW) MVMR model using ivw_mvmr()and minimising the Q-statistic allowing for heterogeneity using qhet_mvmr(). Each step in the MVMR workflow is illustrated in Figure 5. The MVMR package is available to download at https://github.com/WSpiller/MVMR/. The package also includes a detailed tutorial demonstrating functionality of the package in an analyses of the effects of lipid fractions upon systolic blood pressure using data from the Global Lipids Genetics Consortium and UK Biobank.

Discussion
In this paper we develop a general statistical framework for conducting two sample MVMR analyses for an arbitrary number of exposures in the presence of weak instrument bias and pleiotropy.
The methods presented here give ways to test for weak instruments in two-sample MVMR and to robustly test for heterogeneity due to pleiotropy in the presence of moderately weak instruments. We additionally give a method to estimate causal effects in the presence of moderately weak instruments which is robust to balanced pleiotropy.
Weak instruments are a potential issue in many applications where estimating direct effects of multiple exposures using MVMR is preferred over univariable MR analyses , which are thought to be likely to be affected by directional pleiotropy 27,28,29,30,31 . MVMR approaches are also used to gague the extent to which one exposure mediates the effect of another on the outcome 32,33 .Any application of MVMR will be biased by conditionally weak instruments and, as illustrated by our application, this can occur even when the genetic variants strongly predict each exposure individually. Therefore, the methods presented here are important as they provide a way to identify and correct for weak instruments in two-sample MVMR estimation.
The F T S statistic described here is calculated using estimates ofδ calculated from an IVW estimation of the effect ofπ −k onπ k . An alternative method of estimation, equivalent to that described for estimation of β, is to directly minimise its constituent Q x k to obtain LIML estimates for δ 16,17 . Whilst this procedure enacted on the Q A statistic furnishes attractive, weak instrument robust causal estimates, initial simulation results (not reported here) showed limited benefit of estimating δ in this way therefore we did not investigate potential implementation further.
There are a number of limitations to this work. The test statistic and weak instrument robust estimation requires an estimate of the covariance between the error in the estimated effect of each SNP on each exposure. Although this data is generally not available we propose a method to estimate it, using the phenotypic correlation between the exposures, which can be used to obtain a reasonable approximation if the relevant covariance when each SNP only explains a small proportion of each exposure. We hope that our work will influence future GWAS consortia to release this information as a matter of course, in order to enable the straightforward application of estimates of the direct effect of each exposure. We propose using a jackknife to estimate these standard errors. This does however make the estimation of these statistic more computationally intensive than would the case if the standard errors could be calculated analytically. Q IV W,up -A heterogeneity test for MVMR that uses the IVW point estimates but accounts for the uncertainty in the estimated SNP-exposure associations. This test over rejects the null in the presence of weak instruments, but to a lesser extent that Q IV W .
Q A -A heterogeneity test for MVMR that is robust to weak instruments, in the sense that it has the appropriate type 1 error rate in the presence of weak instruments.

Estimation statistics;
β IV W -Estimates of the causal effect of each exposure on the outcome, estimated using standard inverse variance weighting.
β Q -Estimates of the causal effect of each exposure on the outcome, estimated through minimisation of Q A . Robust to weak instruments.
β Q,het -Estimates of the causal effect of each exposure on the outcome, estimated through minimisation of Q A with an additional parameter to account for heterogeneity. Robust to weak instruments and pleiotropy.

Box 2: Recommended tests in Two-sample MVMR.
In all two-sample summary data MVMR estimation two statistics should be calculated; 1. Conditional F statistics, F T S , for each exposure.
These test the strength of the genetic variants to predict each exposure in the multivariable mode. F T S < 10 suggests potential weak instrument bias in the MVMR estimation.
2. A Q-statistic for heterogeneity, Q A , for the model.
Rejection of Q A using standard significant levels (e.g. p < 0.05) indicates potential pleiotropy in the form of excessive heterogeneity in the MVMR model. However, this test will often reject in the presence of weak instruments.
If weak instruments are detected, i.e. any of the F T S values are less than 10, IVW MVMR estimates are potentially biased. When large numbers of SNPs are available this can be corrected through; 3. Estimatingβ Q,het for each exposure This method gives estimates of the direct effect of each exposure on the outcome that are robust to (moderately) weak instruments.
4. An updated Q A,min which minimises the Q statistic over β Q .
This test provides a test for heterogeneity that has the correct size in the presence of weak instruments. Rejection of Q A,min using standard significant levels (e.g. p < 0.05) indicates potential pleiotropy in the MVMR model even in the presence of moderately weak instruments.
All of these tests and estimation statistics are provided in the MVMR R package.

Code availability
The code used to conduct the simulations and applied analysis is available at https://github.com/eleanorsanderson/MVMRweakinstruments. The MVMR package is available at https://github.com/WSpiller/MVMR/.

Author contributions
ES and JB devised the project. ES conducted the analysis and wrote the first draft of the paper.
WS developed the software package. All authors reviewed and approved the final version. This publication is the work of the authors and they will serve as guarantors for the contents of this paper.