Using genetic instruments to estimate interactions in Mendelian Randomization studies

BACKGROUND The interactive effect of two exposures on an outcome can be confounded. We demonstrate the use of Mendelian Randomization (MR) to estimate unconfounded additive interactions. METHODS Using simulation, we test an extension to multivariable MR using two-stage least squares to estimate the additive interaction between two continuous exposures on a continuous outcome, including scenarios where one exposure has a causal effect on the other (mediation). The interaction parameters were set to be one third of the main effect’s parameters to impose a limit on the variance explained by interaction terms. We compare the performance of the two-stage least squares estimator to a Factorial MR design, in which genetic risk scores for each exposure are dichotomised to create four groups, akin to a factorial randomized controlled trial. As an illustrative example, we apply factorial MR and the 2SLS estimator to the interactive effect of education and BMI on systolic blood pressure in UK Biobank. RESULTS Our simulations demonstrate that factorial MR has very low statistical power; 5-7% at N=50,000 and 8-23% at N=500,000 across the range of parameters tested. The two-stage least squares estimator had higher power to detect interactions than factorial MR and a lower type I error. For N=500,000, the two-stage least squares estimator had a power ranging from 29.7-92.9% and a type I error ranging from 4-6%. 95% Monte Carlo confidence intervals suggested that the estimator was unbiased to a reasonable degree of accuracy at this sample size. In comparison, the power at N=50,000 was 7-55%. CONCLUSIONS A two-stage least squares estimator using genetic risk scores for each exposure is a more powerful alternative for detecting an unconfounded additive interaction of two exposures on an outcome than existing approaches that rely on dichotomising genetic risk scores, but it requires a large sample size and instruments of adequate strength.


CONCLUSIONS
A two-stage least squares estimator using genetic risk scores for each exposure is a more powerful alternative for detecting an unconfounded additive interaction of two exposures on an outcome than existing approaches that rely on dichotomising genetic risk scores, but it requires a large sample size and instruments of adequate strength.

INTRODUCTION
Epidemiology typically relies on observational data, which can suffer from confounding and reverse causality. Mendelian Randomization (MR) is an application of instrumental variable (IV) analysis, in which genetic variants robustly related to an exposure of interest are used to infer causal associations between the exposure and other phenotypic outcomes (1,2).
Children inherit their genomes from their parents at random. Such randomization means that reverse causality can be ruled out and genetic variants are often less related to environmental confounders (1). Multivariable MR (MVMR) is a method that can be used to estimate the effect of two or more exposures on an outcome (3). The approach is suitable in a range of scenarios, e.g. when two exposures are closely related, or when one exposure is hypothesised to mediate the effect of the other. MVMR requires a set of genetic variants that are associated with the exposure variables but do not affect the outcome other than through these exposure variables.
In many epidemiological research topics, it is of interest to explore whether the effect of one exposure depends on another exposure, i.e. to investigate an interaction (4). A recent study (5) used genetic IVs to apply a factorial design to examine whether lowering LDL-C using ezetimibe, statins or a combination of both is most effective for lowering Coronary Heart Disease (CHD) risk. This '2 x 2 factorial mendelian randomization study design' mimics a factorial Randomized Controlled Trial (RCT) in which participants are randomly allocated to one of four treatment regimens (treatment 1, treatment 2, both treatments 1 and 2, or neither treatment). Genetic risk scores for each treatment exposure were generated and were split at the median, and the resultant four groups (reference, Ezetimibe, Statin, both) compared with respect to their association with CHD (5). While this approach can give an indication of whether interaction might be present, it does not quantify the interaction, and the dichotomisation of genetic risk scores means power is likely to be low. To date, MR has not been used to estimate interaction terms (with the exception of gene-environment interactions(6)) and such methods in an MR context have not yet been thoroughly investigated. External to the MR literature, instrumental variable approaches for interaction terms have been considered, and our proposed approach draws on the models used in these non-genetic IV analyses (7,8).
A common approach to implement MR is two-stage least squares (2SLS), where the exposure is predicted from its instrument and regressed on the outcome with statistical adjustment of the standard errors in the second stage. In this study, we propose an extension to MVMR, using a 2SLS estimator to estimate additive interactions between two continuous exposures on a continuous outcome. We use simulations to assess the performance of the estimator, including scenarios where one exposure has a causal effect on the second exposure, i.e. there is mediation in addition to interaction. We compare the 2SLS estimator with the two-by-two factorial MR design. As an illustrative example, we apply the proposed 2SLS method to the interactive effect of years of schooling and BMI on systolic blood pressure in UK Biobank (9).

METHODS
We explore an extension to MVMR that enables the estimation of an additive interaction between two exposures, by using the genetic risk scores for each exposure (Z1 and Z2) and the product of the two genetic risk scores, i.e. Z=(Z1,Z2,Z1Z2) as the instrumental variables (Z). When the first exposure has a causal effect on the second exposure, i.e. exposure 2 mediates the effect of exposure 1 on the outcome, the instrument is Z=(Z1,Z2,Z1Z2,Z1Z1). We use simulations to evaluate the performance of the 2SLS MVMR estimator for assessing additive interactions, with and without a causal effect of exposure 1 on exposure 2 (mediation), and compare these estimators with a factorial MR design. For both approaches, we consider power and type I error, and for the 2SLS estimators we also evaluate bias and coverage of the interaction coefficient.
Simulation study: data generating process  Standard random normal variables were generated to represent error terms (U, V and E), confounders (C) and genetic instruments (Z1 and Z2) (1 -3). Z1 is the genetic risk score for exposure 1 (A1) and Z2 is the (independent) genetic risk score for exposure 2 (A2).

Equation 4: Description of system
Interactions between the confounder variable (C) and A1 and A2 separately were included in the Y equation (10,11). A1 was allowed to affect A2, implying the effect of A1 on Y is partially 25 combinations of these parameters were tested, which were representative of the parameter space, including scenarios with and without an interaction between A1 and A2, and with and without a causal effect of A1 on A2. A table of parameter permutations is provided in the Supplement (Section 1). The interaction parameters in the Y equation (θ, λ and τ) were set to be one third of the main effect's parameters (β and γ) to impose a limit on the variance explained by interaction terms.
The coefficients of 0.14 and 0.18 for the effect of the instrument on the exposures (A1 and A2) in Equation 4 were chosen to impose an approximate proportion of variance explained by the genetic risk scores of 1-2%. This aims to emulate the R 2 from a typical genetic risk score in the literature.
We simulated twelve sample sizes: N=10,000 to N=100,000 (in steps of 10,000), N=500,000 and N=1,000,000. Models were run 1,000 times for each sample size and results pooled across repeats. All regression analyses and subsequent testing were performed using functions in the R systemfit package (12) version 1.1-22, the AER package (13) version 1.2-5, or using R's lm function.

Two-stage least squares estimation
Instrumental variable regression in AER (13) (using the ivreg command) or systemfit (12) was implemented. This uses the instruments Z to generate predicted values for A1, A2 and

Factorial MR estimation
We modelled the factorial MR approach, by splitting the two genetic instruments Z1 and Z2 at the median to generate four subgroups representing low A1-low A2, low A1-high A2, high A1-low A2 and high A1-high A2. An ordinary least squares regression of the outcome against these four categories was performed (low-low was treated as the reference category) and a linear combination of the regression coefficients tested using an F statistic and Wald test (using the -linearHypothesis-R function from the car package (14)) to conclude presence or absence of an interaction between A1 and A2 on Y. The linear hypothesis we tested was that the sum of the regression coefficients for the low A1-high A2 and the high A1-low A2 categories equalled the high A1-high A2 coefficient. Rejection of this hypothesis at p=0.05 was regarded as suggestive of an interaction between the two exposures. This formal interaction test was not included in the original application of factorial MR (5) but we have adopted it to enable comparisons with the 2SLS approaches.

Evaluation of approaches to estimate interaction
Coefficients and standard errors were collapsed across the 1,000 simulation runs; the regression coefficients were averaged to give ̂= ∑1 000 =1

1000
. This provides a measure of the bias. The 95% Monte Carlo confidence interval (MC CI) (15) for these coefficients was calculated using the formula ̂± 1.96 , where sd is the sample standard deviation. If the 95% MC CI includes the true value we conclude that the regression coefficient from this estimating procedure is unbiased (15). We also examined the mean estimated standard error of ̂ across simulation repeats and calculated the standard deviation of the regression coefficients ( (̂)) across repeats to evaluate precision (16). As the number of simulation repeats tends to infinity, (̂) estimates the true precision of the estimator and deviation from the mean estimated standard error of ̂ is indicative of bias in the estimate of precision. We also considered power, type I error and coverage. Power was defined as detecting an interaction at the 5% level (two-sided test) when an interaction (θ≠0) was present. A power of 68% therefore represents 680 of the 1,000 repeats detecting an interaction at p≤0.05 when the true interaction parameter was non-zero. Type I error was similarly calculated by counting the number of repeats where an interaction was detected at p≤0.05 when the true parameter value was zero. 'Interaction detection' was defined as the 95% confidence interval not including 0 for the 2SLS and observational linear regression approaches. Coverage was calculated as the percentage of repeats where the 95% confidence interval contained the true parameter value. Due to the nature of the factorial design, coverage could not be calculated because a confidence interval for the interaction parameter is not available. However, power and type I error were calculated by testing for an interaction using the linear contrast described previously.

Sensitivity analyses i. Pleiotropy
Pleiotropy refers to deviation from the assumption that an IV only affects the outcome through the exposure (17). A sensitivity analysis allowing for a pleiotropic effect of A2's instrument on A1 was run with N=500,000. We compared the performance of the 2SLS approach using the instrument Z=(Z1,Z2,Z1Z2,Z1Z1) under a model of no pleiotropy versus pleiotropy using data from 1,000 random variable simulations generated by ten new seeds.
We also considered the power and type I error of the factorial MR approach. The updated A1 equation for the pleiotropic scenario is given by equation 5. To explore whether the use of robust standard errors (e.g. to account for deviations from assumptions of normality or other potential violations of assumptions) affected our findings, the observational and 2SLS regressions (using Z=(Z1,Z2,Z1Z2,Z1Z1)) in the main analysis were repeated with heteroskedastic robust standard errors using the coeftest R function or the robust.se R function (the former from the lmtest (18) R package and the latter from the ivpack (19) R package) with the results examined at N=50,000 and N=500,000.

iv. Weak instrument bias
The Sanderson-Windmeijer F statistics (20) were examined after the 2SLS regressions in the main analysis to test for weak-instrument bias and the correlation with sample size.

ILLUSTRATIVE EXAMPLE
We used data from the UK Biobank (9) women (n=372) were removed from the analysis because of changes to both BMI and blood pressure during pregnancy. We created genetic risk scores for BMI and years of schooling for all UK Biobank participants using the results of recent GWAS studies (21,22). The discovery datasets excluded UK Biobank (the SNPs for years of schooling were taken from the dataset EduYears_Discovery_5000.txt available on the SSGAC website https://www.thessgac.org/data). We selected SNPs from each study with a P value below genome-wide significance (P ≤ 5 x 10-8). We pruned SNPs from the BMI genetic risk score to only include those independent from the education genetic risk score. Prior to pruning, the BMI genetic risk score included 69 SNPs and after pruning, this reduced to 67, while the education genetic risk score included 74 SNPs. Each genetic risk score was calculated as the sum of the effect alleles for all SNPs associated with BMI or education, with each SNP weighted by the regression coefficient from the GWAS from which the SNP was identified.
We searched for proxy SNPs with an R 2 above 0.8 for any SNP missing from UK Biobank using genotype data from European individuals (CEU) from phase 3 (version 5) of the 1000 Genomes project (23).
Age in whole years during baseline assessment centre attendance and sex were used as covariates in all analyses. Genetic analyses were also adjusted for the first 40 principal components to minimise the effects of population stratification. Sensitivity analyses were run with and without adjusting for genotyping array. Further information on UK Biobank and the quality control filtering applied to the data in this manuscript is provided in the Supplement. Quality Control filtering of the UK Biobank data was conducted by R.Mitchell, G.Hemani, T.Dudding, L.Paternoster (24).
STATA's -lincom-command was used to combine coefficients for the purposes of interaction detection assessment for the factorial MR approach. The main effects for BMI and years of schooling were included in the linear regression and instrumental variable models. Details of the predictors used in each model are provided in the supplement (Section 11.3).

Sensitivity analyses:
An additional simulation was run at N=500,000 where A2's instrument had a pleiotropic

DISCUSSION
Our simulation study showed that factorial MR has very low power to detect additive interactions. In contrast, an extension to multivariable MR (MVMR, an approach to estimate the effects of two or more exposures on an outcome, which allows for overlapping instruments between exposures) (27, 28) using genetic risk scores for each exposure and their product as the instrument in 2SLS had greater power. The 2SLS approach was generally unbiased, with reasonable coverage and type I error, but required large sample sizes and strong instrumental variables. In addition to having greater statistical power than factorial MR, 2SLS also has the advantage of being comparable with and providing estimates on the same scale as multivariable linear regression. Factorial MR has an implicit simplicity, which may be attractive to researchers wishing to present findings to clinical or non-statistical audiences, but our results suggest that in most cases, the 2SLS approach will be preferable.
Within the 2SLS framework, we tested two instrumental variables based on the genetic risk scores for two exposures (Z1 and Z2): Z=(Z1,Z2,Z1Z2,Z1Z1) and Z=(Z1,Z2,Z1Z2). The former includes the square of the genetic risk score for the first exposure, motivated by situations when exposure 1 has a causal effect on exposure 2. We found Z=(Z1,Z2,Z1Z2,Z1Z1) to exhibit less bias and have a smaller standard error and mean standard error at smaller sample sizes (N=50,000 and N=100,000); at N=500,000 the differences between the instrumental variables were minimal both in the presence and absence of causal effects of exposure 1 on exposure 2.
Our findings of low statistical power in factorial MR are similar to the literature on factorial randomised controlled trials, which indicates that a fourfold increase in sample size is generally necessary to detect with the same power an interaction of the same magnitude as the main effect (29). In our simulations, power of the 2SLS approaches increased when the simulation was repeated with genetic risk scores that explained a higher percentage of the variance in the exposures; this suggests that the increased power we see with higher sample sizes may be partially driven by increasing instrument strength.
Our simulations were limited to a continuous outcome and interactions on the additive scale; further work is required to test and develop approaches for binary outcomes and multiplicative interactions. Our approach uses individual participant data; assessing interactions within two-sample MR would be challenging and requires further methodological development. We have focused on the ability of the 2SLS approaches to detect additive interaction; we have not investigated the implications of these approaches for the main effects of each exposure. Our main analysis assumed no pleiotropy but a sensitivity analysis showed that pleiotropic effects of the instruments for one exposure on the other can affect the results from 2SLS, likely through making the instruments weaker.
Our simulations modelled scenarios when the first exposure has a causal effect on the second exposure, but we did not explore the potential for bidirectional causal effects.
In our illustrative example, in contrast to the ordinary least squares estimate, neither factorial MR nor either of the 2SLS approaches provided evidence of an interaction of BMI and years of schooling on systolic blood pressure. This could be because the observed interaction is induced by confounding. Alternatively, this could be because the 2SLS and factorial MR approaches are underpowered to detect the interactive effect. The sample size 27 for the analyses exceeded 300,000, and the genetic risk scores explained 1.5% (BMI) and 0.5% (years of education) of the variation in the exposures, suggesting power in this example was reasonable for the 2SLS approaches and unlikely to be the explanation for the null finding. Our findings are interesting in the context of a recent study (6) that examined the interaction of genetic risk for body size and education instrumented by policy reform on diastolic and systolic blood pressure, finding weak evidence for an interaction on diastolic but not systolic blood pressure.
In summary, we have shown that factorial MR has low statistical power in most realistic scenarios, and that 2SLS using genetic risk scores for each exposure is a more powerful technique to estimate an unconfounded additive interaction of two continuous exposures.
However, large sample sizes and instrument strength are crucial to conducting an analysis that is sufficiently powered to detect such effects.