Improving on a modal-based estimation method: model averaging for consistent and efficient estimation in Mendelian randomization when a plurality of candidate instruments are valid

Background: A robust method for Mendelian randomization does not require all genetic variants to be valid instruments to give consistent estimates of a causal parameter. Several such methods have been developed, including a mode-based estimation method giving consistent estimates if a plurality of genetic variants are valid instruments; that is, no subset of invalid instruments estimating the same causal parameter is larger than the subset of valid instruments.
Methods: We here develop a model averaging method that gives consistent estimates under the same 'plurality of valid instruments' assumption. The method considers a mixture distribution of estimates derived from each subset of genetic variants. The estimates are weighted such that subsets with more genetic variants receive more weight, unless variants in the subset have heterogeneous causal estimates, in which case that subset is severely downweighted. The mode of this mixture distribution is the causal estimate. This heterogeneity-penalized model averaging method has several technical advantages over the previously proposed mode-based estimation method.
Results: The heterogeneity-penalized model averaging method outperformed the mode-based estimation method in terms of efficiency and outperformed other robust methods in terms of Type 1 error rate in an extensive simulation analysis. The proposed method suggests two distinct mechanisms by which inflammation affects coronary heart disease risk, with subsets of variants suggesting both positive and negative causal effects.
Conclusions: The heterogeneity-penalized model averaging method is an additional robust method for Mendelian randomization with excellent theoretical and practical properties, and can reveal features in the data such as the presence of multiple causal mechanisms.
Key messages
We propose a heterogeneity-penalized model averaging method that gives consistent causal estimates if a weighted plurality of the genetic variants are valid instruments. The method calculates causal estimates based on all subsets of genetic variants, and upweights subsets containing several genetic variants with similar causal estimates. The method is asymptotically efficient and does not rely on bootstrapping to obtain a confidence interval, nor is the confidence interval constrained to be symmetric. In particular, the confidence interval can comprise multiple disjoint intervals, suggesting the presence of multiple causal mechanisms by which the risk factor influences the outcome. The method can incorporate biological knowledge to upweight the contribution of genetic variants with stronger plausibility of being valid instruments.


Introduction
Mendelian randomization is an epidemiological approach for making causal inferences from observational data by using genetic variants as instrumental variables [1,2]. If a genetic variant is a valid instrument for the risk factor, then any association of the variant with the outcome is indicative of a causal effect of the risk factor on the outcome [3]. When there are multiple genetic variants that are all valid instrumental variables, and under certain parametric assumptions (most notably that all relationships between variables are linear and there is no effect modification), an efficient test of the causal null hypothesis as the sample size increases can be obtained using the two-stage least squares method (based on individual-level data) [4] or equivalently the inverse-variance weighted (IVW) method (based on summarized data) [5]. With uncorrelated instruments, the IVW estimate (equal to the two-stage least squares (2SLS) estimate) is a weighted mean of the Wald (or ratio) estimates obtained separately from each individual instrumental variable.
While the 2SLS/IVW estimator is asymptotically efficient, it is not robust to violations of the instrumental variable assumptions. To see why: if a genetic variant is a valid instrument, then the ratio estimate based on that variant is a consistent estimate of the causal effect.
Hence the weighted mean of these ratio estimates is a consistent estimate of the causal effect if all genetic variants are valid instruments, but not in general if at least one variant is not a valid instrument [6]. This has motivated the development of robust methods for instrumental variable analysis based on only a subset of the genetic variants being valid instruments. For example, Kang et al. developed a method using L1-penalization that gives consistent estimates if at least 50% of the instrumental variables are valid [7]. Bowden et al. considered simple and weighted median methods that again are consistent if at least 50% of the instrumental variables are valid; the simple median method is a median of the variant-specific ratio estimates [8]. Most recently, Hartwig et al. have developed a mode-based estimation method that provides a consistent estimate under the zero modal pleiotropy assumption (ZEMPA) [9]. This assumption states that, in large sample sizes, the largest subset of variants with the same ratio estimate comprises the valid instruments. Invalid instruments may have different ratio estimates asymptotically, but no subset of invalid instruments with the same ratio estimate is larger than the subset of valid instruments.
Intuitively, this means that the true causal estimate can be identified asymptotically as the mode of the variant-specific ratio estimates.
While the idea of a modal-based estimate has merit, there are several issues with the implementation of Hartwig's modal-based estimate that could be improved upon. In particular, their implementation of this approach fits a kernel density-smoothed function to the variant-specific ratio estimates, and calculates confidence intervals based on the median absolute deviation of a bootstrapped distribution. Varying the bandwidth of the kernel density can result in substantial changes to the estimate and its confidence interval, as demonstrated later in this paper.
In this paper, we propose an alternative way of constructing a density function for the causal effect estimate as a heterogeneity-penalized weighted mixture distribution. This approach upweights estimates that are supported by multiple genetic variants, but severely downweights heterogeneity. We show that the mode of this distribution will be an asymptotically consistent estimator of the causal effect if a weighted plurality of the genetic variants are valid instruments. We first introduce this method, and then we demonstrate its performance in a simulation study compared to other robust methods. We consider its behaviour in two applied examples. Finally, we discuss the results of this paper and their relevance to applied research. In particular, we consider how to incorporate biological knowledge into the weighting procedure. Software code for implementing the proposed method is provided in the Supplementary Material.

Methods
In this section, we first introduce the data requirements and parametric assumptions necessary for summarized data Mendelian randomization. We then recall the inverse-variance weighted method, and subsequently introduce the model averaging procedure proposed in this paper.

Data requirements and assumptions
For practical reasons, many modern Mendelian randomization investigations are conducted using summarized data on genetic associations with the risk factor (X) and outcome (Y) taken from univariable regression models of the risk factor (or outcome) regressed on the genetic variants in turn [10]. We assume, as is common in applied practice, that the genetic variants are all uncorrelated (not in linkage disequilibrium). For each genetic variant G_j (j = 1, 2, . . . , J), we assume that we have an estimate β̂_Xj of the association of the genetic variant with the risk factor obtained from linear regression. Similar association estimates are assumed to be available for the outcome (β̂_Yj). The standard error of the association estimate with the outcome is se(β̂_Yj). If any of the variables is binary, then these summarized association estimates may be replaced with association estimates from logistic regression; as has been shown previously, the interpretation of the causal estimate in this case is not clear due to non-collapsibility, but estimates still represent valid tests of the causal null hypothesis [11,12]. See Bowden et al. [13] for a more detailed exposition of the parametric assumptions typically made in summarized data Mendelian randomization investigations that are also made here.

Inverse-variance weighted method
The ratio estimate based on genetic variant j is θ̂_j = β̂_Yj / β̂_Xj, with standard error taken as se(θ̂_j) = se(β̂_Yj) / β̂_Xj (the leading-order term from the delta expansion for the standard error of the ratio of two variables). The inverse-variance weighted (IVW) estimate is a weighted mean of the ratio estimates:

θ̂_IVW = Σ_j β̂_Xj^2 se(β̂_Yj)^-2 θ̂_j / Σ_j β̂_Xj^2 se(β̂_Yj)^-2.

The same estimate can be obtained from the weighted regression of the genetic associations with the outcome on the genetic associations with the risk factor, with the intercept set to zero and weights se(β̂_Yj)^-2:

β̂_Yj = θ β̂_Xj + ε_j,   ε_j ~ N(0, se(β̂_Yj)^2).

For uncorrelated variants, this estimate is also equivalent to the estimate obtained from two-stage least squares, a method typically used for instrumental variable analysis with individual-level data [5]. These estimates do not take into account uncertainty in the genetic associations with the risk factor; however, these associations are typically more precisely estimated than those with the outcome, and ignoring this uncertainty does not lead to inflated Type 1 error rates in realistic scenarios [14].
The standard error of the IVW estimate based on a fixed-effect meta-analysis model is:

se(θ̂_IVW) = sqrt( 1 / Σ_j β̂_Xj^2 se(β̂_Yj)^-2 ).

We also consider a multiplicative random-effects model based on the weighted linear regression above:

se(θ̂_IVWr) = ψ sqrt( 1 / Σ_j β̂_Xj^2 se(β̂_Yj)^-2 ),

where ψ is the residual standard error. Most statistical software packages estimate this additional parameter by default in a weighted linear regression model. A fixed-effect analysis can be performed by fixing the value of ψ to 1 [15]. To ensure that the standard error of the IVW estimate is never more precise than that from a fixed-effect analysis, we allow ψ to take values above 1 (corresponding to over-dispersion of the genetic association estimates), but not values below 1 (corresponding to under-dispersion). If all genetic variants estimate the same causal parameter, then ψ should tend to 1 asymptotically.
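As a concrete illustration, the IVW estimate with its fixed-effect and multiplicative random-effects standard errors can be computed from summarized data in a few lines. This is a minimal sketch of the formulas above; the function name and toy data are illustrative, not taken from the paper's software:

```python
import numpy as np

def ivw(beta_x, beta_y, se_y):
    # Inverse-variance weighted estimate from summarized data (illustrative).
    beta_x, beta_y, se_y = map(np.asarray, (beta_x, beta_y, se_y))
    ratio = beta_y / beta_x                      # variant-specific ratio estimates
    w = beta_x**2 / se_y**2                      # inverse-variance weights
    est = np.sum(w * ratio) / np.sum(w)          # IVW estimate
    se_fixed = np.sqrt(1 / np.sum(w))            # fixed-effect standard error
    # residual standard error from the zero-intercept weighted regression
    psi = np.sqrt(np.sum(w * (ratio - est)**2) / (len(ratio) - 1))
    psi = max(1.0, psi)                          # disallow under-dispersion
    return est, se_fixed, psi * se_fixed         # multiplicative random-effects SE

# toy summarized data for four variants (illustrative values)
est, se_f, se_r = ivw([0.1, 0.2, 0.15, 0.12],
                      [0.05, 0.11, 0.07, 0.06],
                      [0.01, 0.02, 0.015, 0.012])
```

Note that the random-effects standard error equals the fixed-effect one whenever the estimated residual standard error is below 1, reflecting the constraint described above.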

Heterogeneity-penalized model averaging method
We seek to estimate a distribution with the property that the mode (the maximum value) of the distribution will tend to the true causal effect when a plurality of the genetic variants are valid instruments. We construct this distribution as a weighted mixture of normal distributions, one for each subset of two or more genetic variants, where the kth normal distribution has mean and standard deviation corresponding to the IVW estimate and standard error based on all the variants in the kth subset:

θ̂_IVW,k = Σ_{j∈σ_k} β̂_Xj^2 se(β̂_Yj)^-2 θ̂_j / Σ_{j∈σ_k} β̂_Xj^2 se(β̂_Yj)^-2,
se(θ̂_IVWr,k) = ψ̂_k sqrt( 1 / Σ_{j∈σ_k} β̂_Xj^2 se(β̂_Yj)^-2 ),

where σ_k = (σ_k1, σ_k2, . . . , σ_kJ) : σ_kj ∈ {0, 1} represents a subset of the genetic variants, j ∈ σ_k when σ_kj = 1 (this means that θ̂_IVW,k is the IVW estimate based on all the variants in subset k), and ψ̂_k = max(1, sqrt( (1/(K−1)) Σ_{j∈σ_k} se(θ̂_j)^-2 (θ̂_j − θ̂_IVW,k)^2 )), where K is the number of variants included in subset k. The random-effects versions of the standard error se(θ̂_IVWr,k) are used in this mixture distribution to appropriately allow for heterogeneity between the variant-specific ratio estimates in the overall causal estimate.
The weight given to each of these normal distributions is calculated as:

w_k = ∏_{j∈σ_k} se(θ̂_j)^-1 exp( −(1/2) Σ_{j∈σ_k} se(θ̂_j)^-2 (θ̂_j − θ̂_IVW,k)^2 ).

Aside from the constant term, this is the likelihood assuming the variant-specific ratio estimates θ̂_j are normally distributed about a common mean θ̂_IVW,k with variant-specific standard deviation se(θ̂_j). Weights will be larger when more variants are included in the subset k due to the se(θ̂_j)^-1 terms, but they will reduce sharply if there is more heterogeneity between the variant-specific ratio estimates for variants in the subset than would be expected due to statistical uncertainty alone if all variants estimated the same causal parameter. If the variant-specific ratio estimates for variants in a particular subset substantially differ, then the weight for that subset will be low. Subsets with one variant are excluded because heterogeneity cannot be estimated for them. We then normalize the weights so that they sum to one:

w̄_k = w_k / Σ_k w_k.

The causal estimate is the mode of the mixture of normal distributions using these weights:

θ̂ = argmax_θ Σ_k w̄_k φ(θ; θ̂_IVW,k, se(θ̂_IVWr,k)^2),

where φ(·; μ, σ^2) is the normal density function with mean μ and variance σ^2.
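The subset enumeration, heterogeneity-penalized weighting, and modal estimate can be sketched as follows. This is a minimal illustration of the formulas above, assuming uncorrelated variants; the function name and toy data (four variants with ratio estimate 0.5 and one outlying invalid variant) are our own, not the paper's software:

```python
import numpy as np
from itertools import combinations

def hp_mode(beta_x, beta_y, se_y, grid):
    # Heterogeneity-penalized model averaging (illustrative sketch).
    beta_x, beta_y, se_y = map(np.asarray, (beta_x, beta_y, se_y))
    ratio = beta_y / beta_x                     # variant-specific ratio estimates
    se_r = se_y / np.abs(beta_x)                # their standard errors
    J = len(ratio)
    density = np.zeros_like(grid)
    total_w = 0.0
    for K in range(2, J + 1):                   # all subsets with >= 2 variants
        for subset in combinations(range(J), K):
            idx = list(subset)
            w_inv = se_r[idx]**-2
            est = np.sum(w_inv * ratio[idx]) / np.sum(w_inv)  # subset IVW estimate
            q = np.sum(w_inv * (ratio[idx] - est)**2)         # heterogeneity
            psi = max(1.0, np.sqrt(q / (K - 1)))              # over-dispersion only
            se_k = psi / np.sqrt(np.sum(w_inv))               # random-effects SE
            w_k = np.prod(1 / se_r[idx]) * np.exp(-0.5 * q)   # unnormalized weight
            density += w_k * np.exp(-0.5 * ((grid - est) / se_k)**2) / se_k
            total_w += w_k
    density /= total_w * np.sqrt(2 * np.pi)     # normalized mixture density
    return grid[np.argmax(density)], density    # modal estimate and density

grid = np.arange(-1, 1, 0.001)
# toy data: four variants with ratio estimate 0.5, one invalid variant at 2.5
mode, dens = hp_mode([0.1, 0.2, 0.15, 0.12, 0.1],
                     [0.05, 0.10, 0.075, 0.30, 0.05],
                     [0.01, 0.02, 0.015, 0.012, 0.01], grid)
```

In this toy example, every subset containing the invalid variant receives a weight close to zero from the exponential heterogeneity penalty, so the mode falls at the estimate shared by the four valid variants.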

Consistency and efficiency
In the asymptotic limit for a fixed number of genetic variants but as the sample size tends to infinity (and hence the standard errors of the ratio estimates decrease to zero), the weighted mixture distribution tends to a series of spikes about the IVW estimates based on each subset of variants. The height of each spike depends on the total weight of variants that have that causal estimate, and the tallest spike is the estimate with the greatest weight of evidence.
The modal estimate will be the IVW estimate corresponding to the subset k of variants all having the same ratio estimate which has the greatest product of the inverse standard errors of the ratio estimates ∏_{j∈σ_k} se(θ̂_j)^-1. Therefore a consistent estimate is obtained under a weighted version of Hartwig's ZEMPA assumption [9]. The intuition of this assumption is that a weighted plurality of the genetic variants is required to be valid instruments (as opposed to median-based methods that require a majority or weighted majority of variants to be valid instruments). The term 'plurality' is taken from the terminology of elections; a political party winning more votes than any other is said to have a plurality of the votes.
Under this assumption, the heterogeneity-penalized model averaging method is asymptotically efficient, as the weight of the IVW estimate based on all the valid instruments will increase to 1 as the sample size tends to infinity. This is because the weight for any subset containing variants with different ratio estimates decreases to zero rapidly. The weight of the largest subset of variants with the same ratio estimate will be the greatest of all subsets by the ZEMPA assumption, and the ratio of this weight to all other weights will increase to infinity as the sample size increases. However, asymptotic efficiency is not necessarily an important property in practice, as infinite sample sizes are rarely encountered in applied investigations. The model averaging estimate should be efficient for finite sample sizes when several variants have similar ratio estimates.

Inferences on the weighted model-averaged distribution
We perform causal inferences based on the model-averaged distribution using a generalized likelihood ratio test to construct a confidence interval. We take twice the log-likelihood function, and construct a confidence interval consisting of all points for which twice the log-likelihood is within a given vertical distance of its value at the modal estimate. For a 95% confidence interval, this distance is 3.841 (the 95th percentile of a chi-squared distribution with 1 degree of freedom). This is based on the result that twice the difference in the log-likelihood at the estimate and at the true value of the parameter has a chi-squared distribution (here with 1 degree of freedom, as the parameter is 1-dimensional). This allows inference without resampling techniques (such as bootstrapping). The confidence interval is not guaranteed to be symmetric, or to be a single range of values (see later for an example of a bimodal weighted distribution resulting in a composite confidence interval).
Practically, the modal estimate and confidence interval were obtained using a grid search approach. The likelihood was evaluated at a series of points (in the simulation study, from −1 to +1 at intervals of 0.001, so estimates and confidence intervals were obtained to 3 decimal places). The modal estimate was taken as the point with the greatest value of the likelihood function, and the 95% confidence interval was taken as the set of points for which twice the log-likelihood was within 3.841 of twice the log-likelihood at the modal estimate.
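The grid search for the modal estimate and the (possibly disjoint) 95% confidence set can be sketched as below. The helper name and the toy bimodal density are illustrative; any log-likelihood evaluated on the grid can be supplied:

```python
import numpy as np

def grid_ci(loglik, grid, crit=3.841):
    # Generalized likelihood ratio confidence set via grid search (illustrative).
    ll = np.asarray(loglik)
    mode = grid[np.argmax(ll)]                  # modal estimate
    inside = 2 * ll >= 2 * ll.max() - crit      # points kept in the 95% set
    intervals, start = [], None
    for i, keep in enumerate(inside):           # collect contiguous runs of points
        if keep and start is None:
            start = grid[i]
        elif not keep and start is not None:
            intervals.append((start, grid[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, grid[-1]))
    return mode, intervals

# toy bimodal mixture density on the grid: a bimodal likelihood can yield
# a confidence set that is the union of two disjoint ranges
grid = np.arange(-1, 1, 0.001)
dens = (0.7 * np.exp(-0.5 * ((grid - 0.3) / 0.1)**2)
        + 0.3 * np.exp(-0.5 * ((grid + 0.4) / 0.1)**2))
mode, ci = grid_ci(np.log(dens), grid)
```

For this toy density, the confidence set consists of two disjoint intervals, one around each mode, mirroring the composite confidence interval discussed above.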

Simulation study
To consider the expected performance of this proposed method in realistic situations, as well as in comparison to alternative robust methods, we perform a simulation study. We consider four scenarios:
1. no pleiotropy - all genetic variants are valid instruments;
2. balanced pleiotropy - some genetic variants have direct (pleiotropic) effects on the outcome, and these pleiotropic effects are equally likely to be positive as negative;
3. directional pleiotropy - some genetic variants have direct (pleiotropic) effects on the outcome, and these pleiotropic effects are simulated to be positive;
4. directional pleiotropy via a confounder - some genetic variants have pleiotropic effects on the outcome via a confounder. These pleiotropic effects are correlated with the instrument strength.
In the first three scenarios, the Instrument Strength Independent of Direct Effect (InSIDE) assumption [6] is satisfied; in Scenario 4, it is violated. This is the assumption required for the MR-Egger method to provide consistent estimates.
We simulate data for a risk factor X, outcome Y, confounder U (assumed unmeasured), and J genetic variants G_j, j = 1, . . . , J. Individuals are indexed by i. The data-generating model for the simulation study is as follows:

U_i = Σ_{j=1}^{J} φ_j G_ij + ε_Ui
X_i = Σ_{j=1}^{J} γ_j G_ij + U_i + ε_Xi
Y_i = θ X_i + Σ_{j=1}^{J} α_j G_ij + U_i + ε_Yi,

where the γ_j are the genetic effects on the risk factor, the α_j are direct (pleiotropic) effects on the outcome (zero for valid instruments; balanced about zero in Scenario 2 and positive in Scenario 3), and the φ_j are genetic effects on the confounder (non-zero in Scenario 4 only). In total, 10 000 datasets were generated in each scenario. We considered a two-sample setting in which genetic associations with the risk factor and outcome were estimated on non-overlapping groups of 20 000 individuals. We compared estimates from the proposed heterogeneity-penalized model averaging method with those from a variety of methods: the standard IVW method, MR-Egger [6] (both using random-effects), weighted and simple median [8], and the mode-based estimate (MBE) of Hartwig et al. [9]. Each of the methods was implemented using summarized data only.
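A data-generating sketch of these scenarios is shown below. The effect sizes, allele frequency, and error distributions here are placeholders chosen for illustration, not the values used in the simulation study:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=20000, J=10, n_invalid=3, theta=0.2, scenario=1):
    # Illustrative data-generating sketch; parameter values are placeholders.
    G = rng.binomial(2, 0.3, size=(n, J)).astype(float)  # genotypes (MAF assumed 0.3)
    gamma = rng.uniform(0.05, 0.15, J)        # genetic effects on the risk factor
    alpha = np.zeros(J)                       # direct (pleiotropic) effects on outcome
    phi = np.zeros(J)                         # genetic effects on the confounder
    invalid = np.arange(n_invalid)
    if scenario == 2:                         # balanced pleiotropy
        alpha[invalid] = rng.uniform(-0.1, 0.1, n_invalid)
    elif scenario == 3:                       # directional pleiotropy
        alpha[invalid] = rng.uniform(0.0, 0.1, n_invalid)
    elif scenario == 4:                       # pleiotropy via a confounder
        phi[invalid] = rng.uniform(0.0, 0.1, n_invalid)
    U = G @ phi + rng.normal(size=n)          # unmeasured confounder
    X = G @ gamma + U + rng.normal(size=n)    # risk factor
    Y = theta * X + G @ alpha + U + rng.normal(size=n)  # outcome
    return G, X, Y

G, X, Y = simulate(scenario=3)
```

Summarized genetic associations would then be obtained by regressing X (in one sample) and Y (in a second, non-overlapping sample) on each variant in turn.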

Results
Results for all of the methods are provided in Tables 1 (Scenario 1) and 2 (Scenarios 2 to 4).
We provide the mean estimate, the standard deviation of estimates, and the mean standard error of estimates.

Low-density lipoprotein cholesterol and CAD risk
We consider the causal relationship between low-density lipoprotein (LDL) cholesterol and coronary artery disease (CAD) risk based on 8 genetic variants having strong biological links with LDL-cholesterol. Each of these variants is located in a gene region that either encodes a compound biologically relevant to LDL-cholesterol, or acts as a proxy for an existing or proposed LDL-cholesterol-lowering drug. Genetic associations with LDL-cholesterol were obtained from the Global Lipids Genetics Consortium's 2013 data release [17], and associations with CAD risk from CARDIoGRAMplusC4D's 2015 data release [18]. These associations are displayed graphically in Figure 1 (left panel).

C-reactive protein and CAD risk
We also consider the causal relationship between C-reactive protein (CRP) and CAD risk based on 17 genetic variants previously demonstrated to be associated with CRP at a genome-wide level of statistical significance [19]. The biological rationale for this analysis is not to evaluate the causal role of CRP, as several of these genetic variants are not specifically associated with CRP and hence are not valid instruments. The causal role of CRP can be evaluated in a Mendelian randomization analysis using genetic variants in the CRP gene region, the region that encodes CRP [20]. Rather, the biological rationale for this analysis considers CRP as a proxy measure for inflammation more generally, and investigates whether there are any consistent causal relationships between inflammation and CAD risk. Genetic associations with CRP are obtained from Dehghan et al. [19], and associations with CAD risk from the CARDIoGRAM consortium [21]. These associations are displayed graphically in Figure 1 (right panel).

Results for both examples are presented in Table 3. Estimates represent log odds ratios for CAD.
a) The heterogeneity-penalized model averaging method does not estimate a standard error; for the risk factor LDL-cholesterol, and assuming normality, the standard error would be 0.062.
b) The confidence interval in this case is the union of two disjoint ranges.

Discussion
The aim of this manuscript was to develop a mode-based estimation method that provides a consistent estimate of the causal effect under the assumption that a plurality of the genetic variants are valid instruments. In comparison with the MBE method proposed by Hartwig et al., we believe that our method has several technical advantages: 1) it does not rely on the specification of a bandwidth parameter; 2) it makes inferences that do not rely on resampling methods; 3) it makes no asymptotic assumption about the distribution of the causal estimate for making inferences, in particular allowing confidence intervals to be asymmetric and to span multiple ranges; 4) it is asymptotically efficient, and should be efficient in finite samples, as the method seeks to upweight the IVW estimate based on the largest number of variants with homogeneous ratio estimates. One particular concern with the MBE method is that the precision of the estimate is highly variable depending on the choice of bandwidth parameter. An applied researcher would face a strong temptation to run the method for a variety of bandwidth values and report the one giving the most desirable estimate.
The proposed heterogeneity-penalized model averaging method also outperformed Hartwig's method in the simulation study, and in the applied examples. No sizeable inflation in Type 1 error rates was observed across the simulation scenarios when 2 or 3 of the 10 genetic variants were invalid, and bias and Type 1 error rates were generally either better or no worse than for other robust methods. The method was also at least as efficient as other robust methods when all variants were valid instruments, and had reasonable power to detect a causal effect throughout.
An extension of the method that could be valuable in applied practice is the use of prior information on particular variants. This can be achieved by multiplying the unnormalized weights w_k by a prior weighting π_0(k) before normalizing. For example, if an investigator is particularly confident that a genetic variant is likely to be a valid instrument, then models containing this variant can be upweighted. Alternatively, prior weightings of models containing specific variants could be based on biological characteristics of the variants. For example, exonic and/or non-synonymous variants could be upweighted, or variants with functional information relating them to the risk factor. If these variants truly are more likely to be valid instruments, then this prior weighting would add to the robustness of the method. Additionally, a prior could be set to more strongly upweight less parsimonious models (that is, upweight models based on more genetic variants). This could add efficiency to the analysis, as models including more genetic variants will have more precise IVW estimates. Equal prior weights correspond to a prior belief that 50% of genetic variants are valid instruments. If one instead believed that (say) 80% of genetic variants were valid instruments, then the prior for subset k could be set to π_0(k) = 0.8^K × 0.2^(J−K), where J is the total number of genetic variants and K is the number of variants in subset k. The option to set this prior probability is included in the software code.
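For illustration, the suggested prior π_0(k) = 0.8^K × 0.2^(J−K) can be applied to the unnormalized subset weights as follows (the helper name and placeholder weights are our own, not part of the paper's software):

```python
import numpy as np
from itertools import combinations

def prior_weight(K, J, p_valid=0.8):
    # Prior weight pi_0(k) for a subset of K of J variants, under a prior
    # belief that a proportion p_valid of the variants are valid instruments.
    return p_valid**K * (1 - p_valid)**(J - K)

J = 5
subsets = [s for K in range(2, J + 1) for s in combinations(range(J), K)]
w = np.ones(len(subsets))                     # placeholder unnormalized weights w_k
prior = np.array([prior_weight(len(s), J) for s in subsets])
w_post = w * prior                            # multiply by the prior ...
w_post /= w_post.sum()                        # ... then normalize to sum to one
```

With p_valid above 0.5, this prior upweights larger subsets, matching the suggestion that less parsimonious models receive more weight.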