Understanding genetic interactions and assessing the utility of the additive and multiplicative models through simulations

Lina-Marcela Diaz-Gallo; Boel Brynedal; Helga Westerlind; Rickard Sandberg; Daniel Ramsköld

doi:10.1101/706234

Abstract

Interaction analysis is used to investigate the effect which two risk factors have on each other, and on disease risk. To study interactions, both additive and multiplicative models have been used, although their interpretations are not universally understood. In this study, we simulated several scenarios of risk-factor relationships and investigated the resulting interactions using additive or multiplicative models. Independent risk factors approach an additive effect at low disease prevalence, and show a sub-additive relationship otherwise. However, risk factors that contribute to the same chain of events (i.e. have synergy) lead to multiplicative relative risk. Thresholds on the number of required risk factors lead to intermediaries between additive and multiplicative risk. We propose a novel metric of interaction consistent with additive, multiplicative and multifactorial threshold models. Finally, we demonstrate the utility of the simulation strategy and the discovered relationships by analyzing and interpreting gene-gene odds ratios obtained in a rheumatoid arthritis cohort.

Introduction

Screening the genetics of large cohorts of individuals can identify genetic loci that impact phenotypic traits on a gene-by-gene basis¹, e.g. linking single-nucleotide polymorphisms to traits in genome-wide association studies. For some diseases, these studies have resulted in lists containing over a hundred associated risk genes²⁻⁴. However, these genes have not been sufficient for understanding why a particular individual gets a certain disease. Although combinations of genetic and/or environmental risk factors have been observed, for several diseases, to carry a larger than expected risk when both factors are present, it is still a challenge to resolve if, and how, these multiple factors interact in shaping traits, and to biologically interpret the identified interactions⁵.

The association between individual genetic loci and an outcome (e.g. disease) is typically quantified as odds ratios or relative risks. Often the case-control design is used to query low-prevalence diseases, in which odds ratios approximate the relative risk in the population even if the samples are unevenly drawn. Interaction tests among risk factors are often examined pairwise, yielding three odds (or risk) ratios notated as⁶: OR₁₁ for carrying both risk factors, and OR₁₀ and OR₀₁ for the exclusive combinations. The lack of both risk factors, OR₀₀, is used as reference (OR₀₀=1). Confusingly, two different null models are commonly used, the additive (OR₁₁ = OR₁₀ + OR₀₁ − 1) and the multiplicative (OR₁₁ = OR₁₀ ⋅ OR₀₁). The additive null model builds on work by KJ Rothman⁷, who showed that if two factors are part of the disease’s cause and are part of the same sufficient cause (e.g. pathway), then their joint risk will be larger than the sum of their individual risks (often termed “departure from additivity”). This additive model has been criticized for always giving positive results⁸,⁹. On the other hand, the multiplicative model has been criticized as a statistical convenience without theoretical basis, boosted by the implicit multiplicativity in logistic regression⁸,¹⁰,¹¹.
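As a minimal sketch of the two null models just described, the expected double-risk odds ratio can be computed from the two single-factor odds ratios. The function names and the example values below are illustrative assumptions and do not come from the study.

```python
def expected_additive(or10: float, or01: float) -> float:
    """Expected OR11 under the additive null model: OR10 + OR01 - 1."""
    return or10 + or01 - 1.0


def expected_multiplicative(or10: float, or01: float) -> float:
    """Expected OR11 under the multiplicative null model: OR10 * OR01."""
    return or10 * or01


# Made-up single-factor odds ratios, used only for illustration.
or10, or01 = 2.0, 3.0
print(expected_additive(or10, or01))        # 4.0
print(expected_multiplicative(or10, or01))  # 6.0
```

An observed OR₁₁ between these two expectations would deviate from both null models, in opposite directions.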

In this study we strive to lessen the confusion by using simulations of various models of interaction. We thereafter show how additive and multiplicative risk scales compare to our simulation models, aiming to help a broader group of geneticists and epidemiologists to understand and interpret different models of interaction.

Results

Models used for simulation

To better understand risk interactions for qualitative traits, we performed simulations. As their basis, we designed five different models (Models I-V) of interactions with increasing complexity. We will later show how these models are related. Across all simulations the risk factors are dichotomous, and neither necessary nor sufficient for disease to occur.

In the single-group model (Model I), individuals are first split into cases and controls and then subject to independent “spiking in” (i.e. artificial creation) of two risk factors with higher frequency among cases than controls (Figure 1a). The etiological meaning of this model is unclear, but the model has been used in the past, in part due to its simplicity⁹.

Figure 1. Illustration of the five simulated scenarios investigated.

The numbers are example frequencies, and frequencies in bold highlight the higher frequencies of the spiked-in risk factors (X and Y) associated with disease. For example, “Y: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Y. The numbers in italics are the average frequency in the other group of simulated individuals; note that this will depend on the prevalence (which is adjusted in the models in the split to cases and controls). For Models IV and V, components 1 and 2 (comp1 and comp2) were used as a strategy to obtain probabilistic risk factors. In the intersection strategy (or ‘AND’), if the simulated individuals were in both comp1 and comp2, they were assigned as cases (in Model IV) or controls (in Model V), respectively. In the union strategy (or ‘OR’), if the simulated individuals were in either comp1 or comp2, they were assigned as cases (Model V) or controls (Model IV), respectively.

The next two models (separate-groups models, Model II and Model III) are versions of spiking-in two risk factors into two groups of simulated individuals. These two groups are random and independent from the case and control groups. Each split only has an increase in frequency of one of the risk factors, so the risk factors cannot interact and are thus forced to be independent of each other. In Model II (Figure 1b), the frequency of the non-risk factor corresponds to the overall frequency in each random split (i.e. the same frequency for cases and controls). In Model III (Figure 1c) the frequency of the non-risk factor in each split is instead set to the frequency among the controls in the other split. Model II was designed to mimic mathematical addition, whereas Model III is a low-(trait)prevalence simplification of Models II and V (the latter of which is described below) designed to work for case-control setups.

In Models IV and V, we simulated more explicit relationships between the two risk factors using the AND/OR relationships from Boolean logic. First, we assign randomly drawn cases and controls to two different types of groups, arbitrarily called component 1 and component 2 (comp1 and comp2, respectively). Then, we spike in one risk factor per component by increasing the frequency among the cases. Finally, we implement the respective Boolean logic to assign case/control status. In Model IV we applied AND to assign cases, meaning that an individual was a case only if it was present in both comp1 and comp2 (Figure 1d), whereas in Model V we assigned case status if it was in either comp1 or comp2 (Figure 1e). Because there will be simulated individuals that are comp1 cases and not exposed to the risk factor, and individuals that are comp1 controls yet exposed to the risk factor, this is a simulation of two risk factors that are neither necessary nor sufficient to develop disease, but that will have different risk levels. Thus, Model IV requires the two risk factors to be jointly present for disease to develop, where X and Y are part of different mechanisms (i.e. synergism between causes), while Model V corresponds to multiple mechanisms yielding disease, with the X and Y risk factors taking part in independent mechanisms which separately cause the same phenotype (i.e. heterogeneity of causes).
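The AND/OR construction can be sketched as follows. This is a simplified illustration and not the code used for the figures: the function name, the parameter values, the component fraction and the sample size are assumptions, and the relative risks use the stratum-based definition RR₁₁=risk(X=1,Y=1)/risk(X=0,Y=0).

```python
import numpy as np


def simulate_and_or(n, comp_frac, x_freqs, y_freqs, mode, rng):
    """Return (RR10, RR01, RR11) for an AND (Model IV-like) or OR (Model V-like) sketch.

    x_freqs and y_freqs are (lower, higher) risk-factor frequencies.
    """
    # Component membership is drawn independently for each simulated individual.
    comp1 = rng.random(n) < comp_frac
    comp2 = rng.random(n) < comp_frac
    # Spike in X according to comp1 and Y according to comp2.
    x = rng.random(n) < np.where(comp1, x_freqs[1], x_freqs[0])
    y = rng.random(n) < np.where(comp2, y_freqs[1], y_freqs[0])
    # Boolean logic decides the final case/control status.
    case = (comp1 & comp2) if mode == "AND" else (comp1 | comp2)

    def risk(x_val, y_val):
        # Fraction of cases among individuals with this exposure combination.
        stratum = (x == x_val) & (y == y_val)
        return case[stratum].mean()

    r00 = risk(False, False)
    return risk(True, False) / r00, risk(False, True) / r00, risk(True, True) / r00


rng = np.random.default_rng(0)
for mode in ("AND", "OR"):
    rr10, rr01, rr11 = simulate_and_or(500_000, 0.3, (0.1, 0.3), (0.2, 0.4), mode, rng)
    print(mode, "RR11:", round(rr11, 2),
          "additive:", round(rr10 + rr01 - 1, 2),
          "multiplicative:", round(rr10 * rr01, 2))
```

With these assumed settings the prevalence in the OR case is high, so the observed RR₁₁ is expected to fall below the additive expectation, while the AND case should track the multiplicative one.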

Relative risks

As we wanted to learn how additive and multiplicative risk scales compare to these simulation models, we calculated the relative risk of having both risk factors, for a range of simulated frequencies of the risk factors. We then compared these observed relative risks to the expected value based on additive or multiplicative combination of the relative risk for the individual risk factors, and varied the fraction of cases (corresponding to outcome/disease prevalence) (Figure 2a).

Figure 2. Relative risk and odds ratio relationships for five simulation models.

(a) The relative risks for double risk (RR₁₁) calculated from the simulation models, with boxes summarizing 1,000 simulation runs with different risk factor frequencies. The observed RR₁₁ were compared to the additive and multiplicative combinations of the relative risks for single risk (RR₁₀ and RR₀₁). Boxplots show median and quartiles for the simulations, but extreme values are omitted for clarity. Yellow arrows highlight where the median is visibly close to multiplicativity, while blue arrows do the same for additivity. (b-c) Algebra deriving the expected relative risks for Models IV and V respectively, for two risk factors here called X and Y. A different formula compared to (a) is used: instead of the population-based estimate, it uses RR₁₁=risk(X=1,Y=1)/risk(X=0,Y=0) etc. In (c), when the prevalence is decreased, the risk factors are assumed to remain responsible at an unchanged proportion, implying that their frequencies approach zero at the same rate. (d) As (a), but calculating odds ratios (OR) instead of relative risks. “Fraction of cases” need not be prevalence, as Models I-III can represent samples rather than a population. The blanching for Models IV and V is because ORs are not an appropriate risk measure at high disease prevalence.

For all fractions of cases, Model II and Model IV stood out. Model II, which had separated risk factors, followed additivity and thus showed no interaction term on the additive scale. Model IV, with full synergism between the two risk factors, instead followed multiplicativity, and showed interaction on the additive scale.

For the remaining models (Models I, III and V), as the fraction of cases decreased, they became indistinguishable with respect to relative risk, odds ratio (Figure 2) and correlations (Supplementary Figure 1), implying that they are approximations of one another under low disease prevalence. Model I, the simple model of spiking in two factors with higher frequency in a single group of cases, produced multiplicative effects, indicating that it becomes a version of the logical AND model (Model IV). The logical OR model (Model V) on the other hand showed additive behavior, as did the second separation model (Model III). Reassuringly, we could from algebra derive the same conclusions about multiplicative relative risks for Model IV and about the approaching additive relative risk as prevalence decreases for Model V (Figure 2b-c). The other three models (Models I-III) did not have obvious formulas we could work with. Model V had a lower-than-additive relative risk for the doubly exposed at higher frequency of cases (Figure 2a). This is consistent with the prediction from the algebra (Figure 2c), where high penetrance coefficients (i and k) had the same effect. We could also extend Models IV and V to three-factor models, as tested on mixed logical AND and OR models, which produced a relative risk in line with our results on two-factor models (Supplementary Figure 2).

Theoretically, the relative risk for the doubly exposed in Model IV should be RR₁₁=RR₁₀+RR₀₁−1−a⋅b⋅m/(1−m) where a=RR₁₀−1, b=RR₀₁−1, m=(−1+a⋅x−b⋅y−p ± (p²−2⋅a⋅x⋅p−2⋅b⋅y⋅p−4⋅a⋅x⋅b⋅y⋅p−2⋅p+a²⋅x²+2⋅a⋅x+b²⋅y²+2⋅b⋅y+2⋅a⋅x⋅b⋅y+1)^0.5)/2/(−1−a⋅x−b⋅y−a⋅x⋅b⋅y), p=frequency of outcome, x=frequency of X, y=frequency of Y. However, we could not get this formula, with or without the approximation m/(1−m)≈p, to agree with the observed values for RR₁₁ in the simulation (data not shown).

Odds ratios

We calculated odds ratios (Figure 2d), and performed the same comparisons as we had for relative risks, in order to identify good simulation setups for case-control studies. Two models, not the same ones as for relative risk, stably follow additive or multiplicative risk at all simulated frequencies of cases. The odds ratios for the double risk from Model I followed multiplicativity, and Model III produced additive odds ratios. At a low fraction of cases the remaining models (Model II, IV, V) converged the same way as they did for relative risk. The same was true for correlations between the risk factors (Supplementary Figure 1), where only the models that produced additive risk had negative correlation among cases.
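To illustrate why the single-group model yields multiplicative odds ratios, the following sketch spikes two risk factors independently into cases and controls and computes the three odds ratios against the unexposed reference. The function name, the frequencies and the sample size are illustrative assumptions, not the settings used for Figure 2d.

```python
import numpy as np


def model_one_odds_ratios(n, case_frac, x_freqs, y_freqs, rng):
    """Return (OR10, OR01, OR11) for a Model I-like spike-in sketch.

    x_freqs and y_freqs are (frequency in controls, frequency in cases).
    """
    # Individuals are first split into cases and controls.
    case = rng.random(n) < case_frac
    # Both risk factors are then spiked in independently, more often among cases.
    x = rng.random(n) < np.where(case, x_freqs[1], x_freqs[0])
    y = rng.random(n) < np.where(case, y_freqs[1], y_freqs[0])

    def odds(x_val, y_val):
        # Cases divided by controls within one exposure combination.
        stratum = (x == x_val) & (y == y_val)
        return np.count_nonzero(case & stratum) / np.count_nonzero(~case & stratum)

    ref = odds(False, False)
    return odds(True, False) / ref, odds(False, True) / ref, odds(True, True) / ref


rng = np.random.default_rng(0)
or10, or01, or11 = model_one_odds_ratios(500_000, 0.5, (0.1, 0.3), (0.2, 0.4), rng)
print("OR11:", round(or11, 2),
      "additive:", round(or10 + or01 - 1, 2),
      "multiplicative:", round(or10 * or01, 2))
```

Because X and Y are independent within cases and within controls, the odds for the doubly exposed factorize, so OR₁₁ should match OR₁₀ ⋅ OR₀₁ up to sampling noise regardless of the fraction of cases.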

Multifactorial thresholds

While we have already shown mechanistic relationships that give rise to additive or multiplicative relative risks and odds ratios, it would also be useful to know what kind of risk factor relationships give rise to intermediary interaction terms between additive and multiplicative. As we were interested in what multifactorial thresholds would do to mathematical risk relationships, we set up a simulation (Figure 3a) with five equally common components that, when reaching a certain threshold, cause disease. The extreme thresholds 1 and 5 correspond directly to Models V and IV respectively and therefore cause additive or multiplicative risk respectively (Figure 3b). More important are the intermediary thresholds, which give relative risk estimates for double risk between additivity and multiplicativity. Specifically, for a ratio f_thr=(t−1)/(F−1), where F is the number of components (factors) that can cross the threshold t, we found that the intermediary thresholds produced a relative risk for the doubly exposed of (1 − √f) ⋅ expected(additive) + √f ⋅ expected(multiplicative) (Figure 3b). Here √f is the square root of f_thr. This formula for the double risk can be inverted, and the observed OR₁₁ plugged in; this produced a metric √f_est (signed square root of the estimated multifactorial threshold fraction) with the convenient properties of having both a natural lower value (0 for additive) and a natural higher value (1 for multiplicative) as well as an interpretable scale in-between through its connection to the threshold fraction f_thr (Figure 3c). While the metric f_est = sign(√f_est) ⋅ (√f_est)² had a more natural scale, unlike √f_est it was not symmetric around the median (data not shown). The large spread for √f_est at threshold 1 (Figure 3c) could have resulted from having cases that did not involve components 1 or 2 in this simulation model (Figure 3a), thus lowering odds ratios, rather than intrinsically from additive risk.
√f_est is related to another measure of interaction size, relative excess risk due to interaction (RERI), by √f_est = RERI/(OR₁₀ − 1)/(OR₀₁ − 1) and could perhaps be used instead of measures like attributable proportion, synergy index and RERI, given that it can pinpoint multiplicative risk in testing on the additive scale (and vice versa).
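The metric can be sketched directly from the relation to RERI given above, using the standard definition RERI = OR₁₁ − OR₁₀ − OR₀₁ + 1. The function names and the example odds ratios are illustrative assumptions. Note that the metric is undefined when either single-factor odds ratio equals 1.

```python
import math


def reri(or10: float, or01: float, or11: float) -> float:
    """Relative excess risk due to interaction: OR11 - OR10 - OR01 + 1."""
    return or11 - or10 - or01 + 1.0


def sqrt_f_est(or10: float, or01: float, or11: float) -> float:
    """Signed square root of the estimated threshold fraction.

    Equals 0 when OR11 is additive and 1 when OR11 is multiplicative.
    """
    return reri(or10, or01, or11) / ((or10 - 1.0) * (or01 - 1.0))


def f_est(or10: float, or01: float, or11: float) -> float:
    """sign(sqrt_f_est) * sqrt_f_est**2, i.e. the estimate on the f_thr scale."""
    s = sqrt_f_est(or10, or01, or11)
    return math.copysign(s * s, s)


# Made-up odds ratios with OR10 = 2 and OR01 = 3.
print(sqrt_f_est(2.0, 3.0, 4.0))  # 0.0 (additive: 2 + 3 - 1)
print(sqrt_f_est(2.0, 3.0, 6.0))  # 1.0 (multiplicative: 2 * 3)
print(sqrt_f_est(2.0, 3.0, 5.0))  # 0.5 (halfway between the two)
print(f_est(2.0, 3.0, 5.0))       # 0.25
```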

Figure 3. A multifactorial threshold model.

There are many possible ways to set up a thresholding effect; here we use one where each simulated individual is assigned into having or lacking five different components, independently between the components so that one individual can have several. Risk factors are spiked in according to two of the components. Simulated individuals are considered cases or controls depending on how many components they were assigned, as in whether the number crossed the given threshold (t). In (c), the red curve is the square root of the threshold fraction; this and √f_est make up the y axis. Notches in the box plots show bootstrapped 95% confidence intervals for the medians.
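A thresholding setup of this kind can be sketched as below. This is one possible reading of the description and not the authors' implementation: the function names, the component frequency of 0.5, the risk-factor frequencies, and the rule that an individual is a case when the component count is at least t are all assumptions.

```python
import numpy as np


def threshold_fraction(t: int, n_components: int) -> float:
    """f_thr = (t - 1) / (F - 1) for threshold t and F components."""
    return (t - 1) / (n_components - 1)


def threshold_relative_risks(n, n_components, t, comp_frac, x_freqs, y_freqs, rng):
    """Return (RR10, RR01, RR11) for a multifactorial threshold sketch."""
    # Rows are individuals, columns are independently assigned components.
    components = rng.random((n, n_components)) < comp_frac
    # Spike in X according to component 1 and Y according to component 2.
    x = rng.random(n) < np.where(components[:, 0], x_freqs[1], x_freqs[0])
    y = rng.random(n) < np.where(components[:, 1], y_freqs[1], y_freqs[0])
    # Assumed rule: case status when the number of components reaches t.
    case = components.sum(axis=1) >= t

    def risk(x_val, y_val):
        stratum = (x == x_val) & (y == y_val)
        return case[stratum].mean()

    r00 = risk(False, False)
    return risk(True, False) / r00, risk(False, True) / r00, risk(True, True) / r00


rng = np.random.default_rng(0)
for t in range(1, 6):
    rr10, rr01, rr11 = threshold_relative_risks(
        500_000, 5, t, 0.5, (0.1, 0.3), (0.2, 0.4), rng)
    print("t =", t, "f_thr =", threshold_fraction(t, 5),
          "RR11:", round(rr11, 3),
          "additive:", round(rr10 + rr01 - 1, 3),
          "multiplicative:", round(rr10 * rr01, 3))
```

With t equal to the number of components the rule reduces to a logical AND over all components, and with t = 1 to a logical OR, mirroring the correspondence to Models IV and V.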

Example from rheumatoid arthritis

Both the synergism (Model IV) and heterogeneity (Model V) models represent interesting relationships between two risk factors, and the appropriate interaction model to use depends on the hypothesis one is interested in. Given the lack of insight into many RA risk loci, we needed a hypothesis-free approach, and the closest to that is evaluating both null hypotheses, as that would cover both models as well as threshold-based scenarios due to their intermediate nature (i.e. they would fail both types of tests in the opposite direction). We therefore evaluated both additive and multiplicative interaction on a case-control genome-wide association dataset for anti-citrullinated protein antibody positive (ACPA-positive) rheumatoid arthritis (RA), from the Swedish epidemiological investigation of RA (EIRA) cohort. For the two top genetic risk factors for RA in populations of European descent, HLA-DRB1 shared epitope and PTPN22 rs2476601 T, we tested the risk factor against all non-HLA risk SNPs. HLA-DRB1 shared epitope is a group of alleles with similar effect, and rs2476601 is a non-synonymous coding variant of the PTPN22 gene. Two tests were used, one which used additivity as null hypothesis and one that used multiplicativity as null hypothesis. We found that there was no detectable deviation from multiplicativity, but there was from additivity (Figure 4a-b). In the case of HLA-DRB1 shared epitope, we have published on the deviation from additivity before¹². The new simulation presented here has increased our ability to interpret this result as a widespread interaction between HLA-DRB1 shared epitope and all non-HLA genetic risk factors, in the common meaning of interaction where synergism is a type of interaction. From it, we can derive that the HLA-DRB1 shared epitope cannot be substituted for (i.e. phenocopied by) a non-HLA genetic risk factor for its part in the chain of ACPA-positive RA etiology (Figure 4c).
The same is the case for the PTPN22 risk allele, given the similarities in P-value distributions we observed (Figure 4a-b). For both sets of tests, there was a majority of tested loci where there was too little data to distinguish additive from multiplicative odds ratios. We followed up the results of multiplicativity by looking only at known risk SNPs (Figure 4c), but found results similar to a randomization based on Model I (and therefore bound to produce multiplicative odds ratios), with similar variability (P=0.6-0.8, Levene’s test), implying a dearth of non-multiplicative odds ratios (Figure 4d). This randomization is the same as Test III of Ignac et al¹³. We also devised a randomization scheme creating additive odds ratios based on Model III and tested it on the full SNP set (Supplementary Figure 3), where it, as expected, deviated very noticeably from the real data.
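A randomization that matches Model I can be sketched by permuting a risk factor separately among cases and among controls, which keeps its frequency in each group but removes any dependence on other risk factors within the group. The function name and the toy data are illustrative assumptions, and whether each SNP was permuted independently in the actual analysis is not stated here.

```python
import numpy as np


def shuffle_within_status(genotype, is_case, rng):
    """Permute a binary risk factor separately within cases and within controls."""
    genotype = np.asarray(genotype)
    is_case = np.asarray(is_case, dtype=bool)
    shuffled = genotype.copy()
    for status in (True, False):
        # Indices of individuals with this case/control status.
        idx = np.flatnonzero(is_case == status)
        shuffled[idx] = rng.permutation(genotype[idx])
    return shuffled


# Toy data: 1 = carries the risk allele, made up for illustration.
rng = np.random.default_rng(0)
is_case = np.array([True, True, True, False, False, False, False])
snp = np.array([1, 1, 0, 0, 1, 0, 0])
shuffled = shuffle_within_status(snp, is_case, rng)
# Carrier counts per group are unchanged by the shuffle.
print(shuffled[is_case].sum(), shuffled[~is_case].sum())  # 2 1
```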

Figure 4. Application to genome-wide association data for rheumatoid arthritis

(a) P-value distribution for two tests, one for deviation from additivity and one from multiplicativity. Two risk factors are tested in EIRA data against the rest of the genome except nearby SNPs. Each bin is 0.01 wide. A uniform distribution means a lack of deviation from the null model. (b) The P-values in (a) with Benjamini-Hochberg adjustment for multiple testing. Lack of adjusted P-values near zero means a lack of detected deviation from the null model. (c) Odds ratio for HLA-DRB1 shared epitope and one other risk factor, compared to the expected from an additive or multiplicative null model. Only known RA risk SNPs from the literature are shown. Black bars show median and 95% confidence intervals (bootstrap). (d) The same SNPs as for (c), but shuffled within cases and shuffled within controls to match Model I, thereby being a positive control for multiplicative odds ratios, allowing a comparison of dispersion with (c).

Discussion

We herein present a simulation approach intended to help interpretation of additive and multiplicative interaction of relative risks and odds ratios. We show that additivity of risk factors occurs when the risk is comprised of two independent risk factors (Model V, Figure 1e), or a process that approximates that setup at a given fraction of cases. Multiplicativity of risk factors, and deviation from additivity, as well as negative correlation between risk factors, follows if two different mechanisms are required for disease (Model IV, Figure 1d). Depending on the hypothesis, one should therefore choose the appropriate statistical test. For example, if the null hypothesis is that two factors are co-operating in causing a disease, this could be tested using deviation from a multiplicative effect. Often, however, we are interested in testing whether disease is caused by the interaction of two factors, and then it is appropriate to test for deviation from additivity.

Our inspiration for this work came from a simulation study⁹, which in turn discussed our previous research on RA¹², in which we detected deviation from additivity between risk factors. In the simulation study⁹, case and control status was randomly assigned, and one risk factor was spiked in to resemble the strongest genetic risk factor for ACPA-positive RA, and interaction with other risk factors (selected by p-value for risk) was computed. The simulation led to an overrepresentation of additive interactions (i.e. deviation from additive odds ratios). However, selecting by risk p-value from a large random set is equivalent to spiking-in, except for real-world allele frequencies and population substructures (as opposed to arbitrary frequencies and statistical independence). Thus, this simulation⁹ was set up equivalent to Model I (Figure 1a). The author⁹ noted that this model produced a multiplicative null model that does not match additivity and concluded that the additive interactions observed were erroneous, as no interaction should be present. However, as mentioned, the setup in the simulation is similar to our Model I, and if the results are put in a causal context, the assumption about no interaction and the following conclusion are incorrect. We instead propose an alternative interpretation, based on the convergences we found at low prevalence: Model I has an intrinsic interaction in the meaning of synergy between the risk factors¹⁴ as the model is equivalent to taking Model IV at low prevalence and subsampling it with a bias for cases (unchanged odds ratio means Model I is unaffected by biased sampling, and the equal match to multiplicative model at low prevalence, and in terms of correlations, means they become the same at a prevalence like that of RA: 0.7% for all RA in Sweden¹⁵, of which 60% are ACPA-positive¹⁶).
The simulation study⁹ is therefore in line with the confusion over additive and multiplicative interaction that can sometimes be found in the literature¹⁷, highlighting the need to understand the relationships to the risk factors that they imply. It should be noted that the recent simulation study⁹ does demonstrate the emergence of multiplicative odds ratios when there are false risk factors, if one is using a common study design that creates the same scenario as Model I. After all, this paper assumes X and Y are true risk factors, for models IV onwards, instead of simulating false risk factors⁹, causing a divergence in which type of results can be interpreted in the light of each paper.

There was a negative correlation among cases for two risk factors for additive models. Theoretically, two individually sufficient factors (meaning that there is heterogeneity of causation) should have a strongly negative correlation; a similar but attenuated pattern for non-sufficient risk factors with regard to correlation is therefore not surprising.

In this paper we also present the results when testing the multiplicative interaction between the strongest genetic risk for ACPA-positive RA and other risk-SNPs in the same material as our previous paper¹², and show that it always follows the multiplicative null. In light of the new understanding that our simulations give, the presence of deviation from additivity, along with no deviation from multiplicativity, supports the existence of widespread synergism between the genetic risk factors in causing ACPA-positive RA.

The fact that most loci showed no statistically significant deviation from either additive or multiplicative interaction will be the unfortunate reality for many applications of interaction testing. While statistical power for single risk factor testing scales with the inverse square of the number of samples, already requiring large sample sizes in genome-wide association studies, the statistical power for interaction testing scales to the inverse power of four¹⁸, thus requiring far larger sample sizes than standard association testing.

A multiplicative assumption would have merit in our testing against HLA-DRB1 shared epitope and PTPN22, if ACPA-positive RA were a homogeneous set of causes, rather than the kind of heterogeneity of causation that we have shown gives rise to additivity between risk factors. Despite ACPA-positive RA being defined based on a mediating risk factor, such homogeneity is not thought to be the case¹⁹.

There is a concept of testing for additive interaction to find multiplicative risk relationship. For example, KS Kendler studied gene-environment interactions and hypothesized that those at genetic risk of major depression should have a stronger effect of environmental factors and therefore the risk ought to be multiplicative. He then intentionally tested on a linear scale as that was the opposite hypothesis²⁰. For the same reason we added testing for multiplicative interactions to find additive risk relationships, which would imply heterogeneity of etiology, and in-between the two types of tests one might find multifactorial threshold relationships.

For both relative risk and odds ratios, the expected value for the double risk was always higher for multiplicativity. This is unsurprising, as the multiplicative expectation OR₁₀ ⋅ OR₀₁ (and likewise for relative risk) can be rewritten as OR₁₀ + OR₀₁ − 1 + (OR₁₀ − 1)(OR₀₁ − 1), which is the additive expectation plus a term that is positive whenever both single-factor ratios exceed 1. For Model V (heterogeneity), both the algebra and the simulations could produce a less-than-additive effect wherever the prevalence (in the simulation) or penetrance (in the algebra) was not approaching zero. This indicates that small negative interaction terms on the additive scale can be caused by the trivial reason of non-infinitesimal prevalence and penetrance.
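The rewriting of the multiplicative expectation can be spelled out in a short derivation. The shorthand a and b below is used only for this illustration, and the same steps hold with RR in place of OR.

```latex
\begin{aligned}
a &= OR_{10} - 1, \qquad b = OR_{01} - 1 \\
OR_{10} \cdot OR_{01} &= (1 + a)(1 + b) = 1 + a + b + ab \\
&= \underbrace{OR_{10} + OR_{01} - 1}_{\text{additive expectation}}
   + (OR_{10} - 1)(OR_{01} - 1)
\end{aligned}
```

The product term ab is positive when both factors are risk factors (a > 0 and b > 0), so the multiplicative expectation then exceeds the additive one.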

The algebra we employed was written as a deterministic model, for the convenience of simple equations. However, in the equation if ((X AND I) OR J) then comp1 case, I and J are there to make X neither necessary nor sufficient. This is the case for the typical risk factor, but the equation does not require that I and J are single risk factors, they could as well be compound risk factors, nor does it require that they are risk factors at all. They can for example be representations of chance, or stochastic factors, meaning our algebra covers probabilistic thinking. Considering the rules of Boolean logic, where (NOT a) AND (NOT b) = NOT (a OR b), and (NOT a) OR (NOT b) = NOT (a AND b), we expect processes which produce synergism among risk factors to show causal heterogeneity among protective factors and vice versa.
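The two Boolean identities invoked above (De Morgan's laws) can be checked exhaustively over all truth assignments. The function name is an illustrative choice.

```python
from itertools import product


def de_morgan_holds() -> bool:
    """Check both De Morgan identities for every combination of a and b."""
    for a, b in product([False, True], repeat=2):
        # (NOT a) AND (NOT b) = NOT (a OR b)
        if ((not a) and (not b)) != (not (a or b)):
            return False
        # (NOT a) OR (NOT b) = NOT (a AND b)
        if ((not a) or (not b)) != (not (a and b)):
            return False
    return True


print(de_morgan_holds())  # True
```

Read in terms of the models: if disease requires comp1 AND comp2, then remaining healthy requires (NOT comp1) OR (NOT comp2), which is why synergism among risk factors mirrors heterogeneity among protective factors.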

Model IV can be viewed as the chain of events scenario, whereas Model V corresponds to phenocopying. In terms of Rothman’s sufficient-cause model, the risk factors X and Y in Model IV correspond to risk factors in the same cause, referred to as causal co-action, joint action or synergism, whereas X and Y in Model V correspond to risk factors in different causes²¹. The multifactorial threshold model has been thought of in terms of genetic liability²². To describe Models I-III in very simplistic terms: Model I is placing the risk factors together, creating a type of interaction, while Models II and III are putting two groups together to get additivity.

In this simulation study we demonstrate the causal interpretations of additive and multiplicative interaction in both the relative risk and odds ratio setting. Some of this has been understood intuitively in the past, especially the connection between multiplicative effect and logical AND²³, but here we try to further show this to the reader through simulation and simple algebra. We hope that this will guide the interpretation of future interaction studies.

Methods

Simulations

For each of the different models (models I to V – Figure 2 and Supplementary Figure 1 and 3, multifactorial threshold model – Figure 3 and three risk factors model – Supplementary Figure 2) 1,000 simulations were performed. Each simulation consisted of 1 million data points (or simulated individuals) where the presence or absence of a risk allele was assigned, as well as the status of case or control according to each model. For the sake of simplicity, binary factors were used, corresponding to a dominant or recessive scenario in genetics. We used components with the same ratio of cases to controls, except for the three-factor model, where components 2 and 3 had the same ratio and component 1 had the same ratio as the OR combination of those two.

The allele frequency for the two factors tested in each model, named X and Y, was established before each simulation and set from a random value in a given range. For instance, the lower frequency for the risk factor Y was a random value between 5% and 15%. Then the higher frequency of the factor Y was the lower frequency multiplied by a random number between 1.1 and 4. Similarly, the lower frequency for the factor X was randomly designated between 5% and 25%. In turn, the higher frequency of the factor X was its lower frequency multiplied by a random number between 1.1 and 2.
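The frequency draw described above can be sketched as follows. The text only states that the values are random within the given ranges, so the uniform distribution and the function name used here are assumptions.

```python
import numpy as np


def draw_frequencies(rng):
    """Draw (lower, higher) frequencies for X and Y within the stated ranges.

    Uniform sampling is an assumption; the ranges follow the text.
    """
    y_low = rng.uniform(0.05, 0.15)
    y_high = y_low * rng.uniform(1.1, 4.0)
    x_low = rng.uniform(0.05, 0.25)
    x_high = x_low * rng.uniform(1.1, 2.0)
    return (x_low, x_high), (y_low, y_high)


rng = np.random.default_rng(0)
(x_low, x_high), (y_low, y_high) = draw_frequencies(rng)
print("X:", round(x_low, 3), round(x_high, 3), "Y:", round(y_low, 3), round(y_high, 3))
```

Under these ranges the higher frequency cannot exceed 0.5 for X or 0.6 for Y, so every drawn value is a valid probability.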

For Model II, we spiked in both risk factors with their risk frequencies into both groups, but then shuffled X in one group (between both cases and controls in the group) and shuffled Y in the other group. As a result, the frequency corresponding to the numbers in italics in Figure 1b (X: freq. 0.35 and Y: freq. 0.25) will be higher_freq ⋅ prevalence + lower_freq ⋅ (1 − prevalence). For example, for a 50% prevalence and using the numbers for X in Figure 1b, 0.35 = 0.6 ⋅ 0.5 + 0.1 ⋅ (1 − 0.5).
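The pooled frequency after shuffling is a prevalence-weighted average of the two spiked-in frequencies, which can be written as a small helper. The function name is an illustrative choice, and the numbers are the X example from the text.

```python
def pooled_frequency(higher_freq: float, lower_freq: float, prevalence: float) -> float:
    """Frequency of a shuffled risk factor: a prevalence-weighted average."""
    return higher_freq * prevalence + lower_freq * (1 - prevalence)


# X in Figure 1b at 50% prevalence: 0.6 among cases, 0.1 among controls.
print(round(pooled_frequency(0.6, 0.1, 0.5), 2))  # 0.35
```

At lower prevalence the pooled value moves toward the control frequency, which is why the italic numbers in the figure depend on the fraction of cases.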

Correlation between risk factors

The Pearson correlation was implemented to calculate the relationship between two risk factors in the five different models (Supplementary Figure 1).
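A minimal sketch of such a correlation calculation, using `numpy.corrcoef` on binary exposure vectors and optionally restricting to a subset such as the cases, is shown below. The function name and the toy data are illustrative assumptions, and the toy values are made up rather than simulation output.

```python
import numpy as np


def pearson_between_factors(x, y, subset=None):
    """Pearson correlation between two binary risk factors, optionally in a subset."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if subset is not None:
        subset = np.asarray(subset, dtype=bool)
        x, y = x[subset], y[subset]
    # Off-diagonal entry of the 2x2 correlation matrix.
    return np.corrcoef(x, y)[0, 1]


# Toy data: each case carries exactly one of the two risk factors.
x = [1, 1, 0, 0, 1, 0]
y = [0, 0, 1, 1, 1, 0]
is_case = [True, True, True, True, False, False]
print(pearson_between_factors(x, y, subset=is_case))  # negative among cases
print(pearson_between_factors(x, y))                  # weaker when controls are included
```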

Computational packages

We used the web calculator from Symbolab²⁴ to solve algebra, specifically the RR₁₁=RR₁₀+RR₀₁−1−a⋅b⋅m/(1−m) formula. Otherwise, calculations were done using Python, including the packages numpy, scipy, matplotlib and pandas.

Interactions in rheumatoid arthritis GWAS

The genotyped and imputed GWAS data from the EIRA study (see ¹² for sources included) were used in this part of the study. Only data from ACPA-positive RA patients were included. Standard data filtering was performed as previously described¹². Briefly, the filters were a missing rate of 5% or higher and p-values of less than 0.001 for Hardy-Weinberg equilibrium in controls. The SNPs located in the extended major histocompatibility complex (MHC) region (chr6:27339429-34586722, GRCh37/hg19) were removed, due to the high linkage disequilibrium and possible independent signals of association with ACPA-positive RA in the locus.

The departure from additivity or multiplicativity for two risk factors was estimated in the imputed GWAS data (3,138,911 SNPs for the test with HLA-DRB1 shared epitope and 3,308,784 SNPs for the test with PTPN22 rs2476601 T), using GEISA²⁵, where a dominant model was assumed. The first ten principal components and gender were used as covariables in this analysis, in order to control for population stratification and differences between allele frequencies due to sex, respectively. A cut-off of minimum five individuals per odds ratio (OR) combination was applied. The HLA-DRB1 shared epitope alleles included *01 (except *0103), *0404, *0405, *0408 and *1001. The p-values for interaction from these analyses are plotted in Figure 4a-b. We included only SNPs at risk allele frequencies between 10% and 50% in this testing to minimize the risk of including protective factors; however, when we tested all the SNPs at a minor allele frequency above 1%, the result provided the same conclusion.

To address both additive and multiplicative risk scales, and to evaluate the behavior of the ORs for double risk exposure (OR₁₁; Figure 4c-d and Supplementary Figure 4), we used genotyped EIRA GWAS data (281,195 SNPs). For this analysis, the data were transposed using PLINK 1.07²⁶, then risk SNPs were selected based on an OR higher than 1.1 together with the criterion of having been reported as associated with RA in published case-control RA GWAS^27–29.

Code availability

The code used to generate the figures is available at https://github.com/danielramskold/additive_risk_heterogeneity_multiplicative_risk_synergism. The repository also contains the code for the results not shown for the heterogeneity model at high prevalence, as well as a program designed to be user-friendly (Model_I-V_sim.py) that calculates the relative risk and odds ratio for each model from given allele frequencies and includes the codominant scenario.

Author contributions

D.R. and B.B. conceived the study. L.M.D.G. and R.S. provided feedback on study design. D.R. performed the simulations. L.M.D.G. and D.R. performed the other analyses. H.W. helped with analysis settings. D.R. drafted the manuscript, and all authors critically revised the manuscript.

Acknowledgements

Anton Larsson helped with the literature search. Lars Klareskog, Lars Alfredsson and Leonid Padyukov engaged in scientific discussions. This research was funded in part by the Ulla och Gustaf af Uggla Foundation (2018-02670), Reumatikerförbundet (R-861801) and Konung Gustaf V:s 80-årsfond (FAI-2018-0518).

References

1. Visscher, P. M. et al. 10 years of GWAS discovery: Biology, function and translation. Am J Hum Genet. 101, 5–22 (2017).
2. Beecham, A. H. et al. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet. 45, 1353–60 (2013).
3. Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet. 44, 483–9 (2012).
4. Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat Commun. 9, 2941 (2018).
5. Gilbert-Diamond, D., Moore, J. H. Analysis of gene-gene interactions. Curr Protoc Hum Genet. Chapter 1, Unit 1.14, doi:10.1002/0471142905.hg0114s70 (2011).
6. Erkan, D. et al. 14th International congress on antiphospholipid antibodies: task force report on antiphospholipid syndrome treatment trends. Autoimmun Rev. 13, 685–96 (2014).
7. Rothman, K. J. Causes. Am J Epidemiol. 104, 587–592 (1976).
8. Kendler, K. S., Gardner, C. O. Interpretation of interactions: guide for the perplexed. Br J Psychiatry 197, 170–1 (2010).
9. Kim, K. Massive false-positive gene-gene interactions by Rothman’s additive model. Ann Rheum Dis. 78, 437–439 (2019).
10. Clayton, D. Commentary: reporting and assessing evidence for interaction: why, when and how? Int J Epidemiol. 41, 707–10 (2012).
11. Weinberg, C. R. Less is more, except when less is less: Studying joint effects. Genomics 93, 10–2 (2009).
12. Diaz-Gallo, L. M. et al. Systematic approach demonstrates enrichment of multiple interactions between non-HLA risk variants and HLA-DRB1 risk alleles in rheumatoid arthritis. Ann Rheum Dis. 77, 1454–1462 (2018).
13. Ignac, T. M. et al. Discovering pair-wise genetic interactions: an information theory-based approach. PLoS ONE 9, e92310 (2014).
14. Sjölander, A. et al. Bounds on causal interactions for binary outcomes. Biometrics 70, 500–5 (2014).
15. Neovius, M. et al. Nationwide prevalence of rheumatoid arthritis and penetration of disease-modifying drugs in Sweden. Ann Rheum Dis. 70, 624–9 (2011).
16. Jiang, X. et al. An Immunochip-based interaction study of contrasting interaction effects with smoking in ACPA-positive versus ACPA-negative rheumatoid arthritis. Rheumatology 55, 149–55 (2016).
17. Källberg, H., Bengtsson, C. Chapter One - Terminology and definitions for interaction studies, in Between the Lines of Genetic Code, L. Padyukov, Editor. Academic Press, pages 3–23 (2014).
18. Zuk, O. et al. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A 109, 1193–8 (2012).
19. Sokolove, J. Rheumatoid arthritis pathogenesis and pathophysiology. Respiratory Medicine, 19–30, doi:10.1007/978-3-319-68888-6_2 (2017).
20. Dohrenwend, B. P. Adversity, stress, and psychopathology. Oxford University Press, New York, page 482 (1998).
21. Rothman, K. J., Greenland, S. Modern epidemiology, 2nd edition, page 12, Lippincott Williams Wilkins (Wolters Kluwer), Philadelphia USA, ISBN 0-316-75780-1 (1998).
22. Todorov, A. A., Suarez, B. K. Genetic liability model. Encyclopedia of Biostatistics, doi:10.1002/0470011815.b2a05036 (2005).
23. Li, W., Reich, J. A complete enumeration and classification of two-locus disease models. Hum Hered 50, 334–349 (2000).
24. https://www.symbolab.com/solver/algebra-calculator
25. Uvehag, D., Zazzi, H. “Geisa.” https://github.com/menzzana/geisa (2014).
26. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81, 559–75 (2007).
27. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
28. Karlson, E. W. et al. Cumulative association of 22 genetic variants with seropositive rheumatoid arthritis risk. Ann Rheum Dis. 69, 1077–1085 (2010).
29. Stahl, E. A. et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet. 42, 508–14 (2010).