Abstract
Epistasis plays a significant role in the genetic architecture of many complex phenotypes in model organisms. To date, there have been very few interactions replicated in human studies due in part to the multiple hypothesis burden implicit in genome-wide tests of epistasis. Therefore, it is of paramount importance to develop the most powerful tests possible for detecting interactions. In this work we develop a new gene-gene interaction test for use in trio studies called the trio correlation (TC) test. The TC test computes the expected joint distribution of marker pairs in offspring conditional on parental genotypes. This distribution is then incorporated into a standard one degree of freedom correlation test of interaction. We show via extensive simulations that our test substantially outperforms existing tests of interaction in trio studies. The gain in power under standard models of phenotype is large, with previous tests requiring more than twice the number of trios to obtain the power of our test. We also demonstrate a bias in a previous trio interaction test and identify its origin. We conclude that the TC test shows improved power to identify interactions in existing, as well as emerging, trio association studies. The method is publicly available at www.github.com/BrunildaBalliu/TrioEpi.
INTRODUCTION
Genetic association studies, especially in humans, have focused primarily on marginal effects of genetic variants. While this approach has successfully identified thousands of variants associated with hundreds of complex human diseases (Hindorffet al. 2009), it ignores the role of epistasis in shaping phenotypes. Recent work in model organisms has shown that epistasis is a major contributor to broad sense heritability (Ayroleset al. 2009; Bloomet al. 2013; Ackermann and Beyer 2012) and interactions have been repeatedly posited as a key component of missing heritability in humans (Zuket al. 2012; Gibson 2012). Furthermore, identification of epistatic interactions provides important insights into the functional organization of molecular pathways (Carlborg and Haley 2004; Bremet al. 2005; Cordell 2009; Ayroleset al. 2009; Maet al. 2012; Maet al. 2012; Maet al. 2013).
One of the major obstacles in the identification of interactions in genetic association studies is the multiple hypothesis correction penalty induced by the examination of millions of pairs of SNPs. Therefore, it is of fundamental importance to develop the most powerful possible test statistic when searching for interactions. In this work we are concerned with tests for epistasis in trio studies in which mother-father-offspring trios are genotyped and the offspring is a carrier for the disease of interest, or the phenotype of interest is fitness.
There currently exist three classes of test for interaction between pairs of markers in trio studies. First, case-only correlation tests have been proposed where the null hypothesis is no correlation between genotypes at the two loci. While these can easily be performed via χ2 tests of independence between genotypes, they fail to leverage the information available from the parents in the trios and are susceptible to inflation from population structure. Second, pseudo-controls can be created via the non-transmitted parental alleles and used as matched controls in a conditional logistic regression framework (Cordell 2002; Cordellet al. 2004; Schwenderet al. 2013). These tests account for population stratification that can induce long range LD between marker pairs and bias the case-only correlation tests. Third, and most recently, Ackermann and Beyer (2012) proposed the Imbalanced Allele Pair frequencies (ImAP) test. Their insight was that the expected counts of offspring alleles at a pair of SNPs could be computed conditional on parental genotypes. They incorporated this into a four degrees of freedom (d.f.) correlation test and used permutations to determine the null distribution (Ackermann and Beyer 2012).
The primary contributions of this work are two-fold. First, we show that in the presence of marginal effects but absence of interaction effects between pairs of markers, the ImAP test is biased. We identify that the normalization procedure used in ImAP is the source of this bias. Second, we develop a new interaction test, called the trio correlation test (TC) for use in trio studies. Using the insight of the ImAP approach, we begin by computing the expected distribution of the offspring’s genotype conditional on the parental genotypes and use this distribution to build a correlation test with one d.f. The TC has several advantages over previous approaches. First, it has one instead of four d.f. needed for ImAP. Second, it does not require expensive permutation tests to compute p-values. Third, we show via extensive simulations that our test is the most powerful among previously proposed tests under standard disease models. Indeed, in simulations where SNPs contained marginal and interaction effects, our average test statistic (16.4) was more than double the standard correlation based test statistic (7.8).
The rest of the paper is organized as follows. In Section 2, we introduce the existing tests and our proposed TC test. In Section 3, we evaluate the finite sample performance of the existing and proposed tests using an extensive simulation study. We close with a discussion in Section 4.
METHODS
Consider a trio study in which n mother-father-offspring trios are genotyped, and the off-spring are carriers for the disease of interest, or the phenotype of interest is fitness. In this section we present existing tests for detecting interaction between pairs of markers as well as our novel TC approach.
The Standard Independence (SI) Test
Consider a pair of bi-allelic markers with possible genotypes g1, g2 ∈ {0, 1, 2}. Let O(g1, g2) the observed counts for the nine possible genotype combinations at these two markers in the offspring. Further, let O(g1) = g2 O(g1, g2) and O(g2) = g1 O(g1, g2) the observed marginal counts of the three possible genotypes at each marker. Last, let E(g1, g2) the expected counts of all nine possible genotype combinations at the two markers, as computed from the products of the observed marginal counts at each marker. That is E(g1, g2) = O(g1) × O(g2)/n.
A χ2 test statistic can be obtained by first calculating the squared difference of observed and expected counts for each genotype combination of the two markers divided by the corresponding expected counts. The final score for a marker pair is the sum of these values over all nine possible genotype combinations,
Under the null hypothesis of marker independence, SI is assumed to be asymptotically distributed.
The Imbalanced Allele Pair Frequencies (ImAP) Test
Ackermann and Beyer (2012) proposed to calculate the expected counts in the children using the parental genotypes and the laws of Mendelian inheritance. Under Mendelian segregation, the offspring inherits alleles randomly from its parents and the expected genotype of each marker can be derived from the genotypes of the parents. The resulting probabilities for all possible parental genotype combinations are shown in Table 1.
Let the genotypes of the mother, father and child of trio i at a marker.
Moreover, let the probability of the offspring having genotype g1 conditional on the parental genotypes, as calculated using the probabilities in Table 1. For example, if and then Pi(0) =.25, Pi(1) = .5, Pi(2) = .25.
If the offspring are selected based on a phenotypic designation such as disease status, then a SNP increasing risk for the disease will be non-randomly inherited by the offspring. In order to correct for such main effects Ackermann and Beyer (2012) proposed to multiply each offspring’s expected genotype by the ratio of the sample-wide observed and expected counts for the corresponding marker, that is
The above computation of does not guarantee that Ackermann and Beyer (2012) proposed to use the following normalization:
Subsequently, the expected counts of each genotype combination using the adjusted for main effects and normalized genotype counts can be calculated as
The corresponding ImAP statistic is given as,
Because this χ2-like test statistic is not properly calibrated under the null hypothesis, Ackermann and Beyer (2012) assess the significance via a permutation approach. ImAP is nearly χ42 distributed and we use this approximation to compute p-values in addition to the permutation approach. However, we show in Section 3 that in the presence of main effects the ImAP test statistic is inflated, even when the permutation approach is used to compute the p-values. The source of this inflation is the normalization step (4) (see Supplementary Text).
The Standard Correlation (SC) Test
Let the expected value of the genotype at a marker and let the corresponding variance. Moreover, let /inline/the covariance of genotypes at the two markers and their Pearson’s correlation coefficient.
The test statistic is given as
Under the null hypothesis, SC is assumed to be asymptotically distributed.
The Trio Correlation (TC) Test
Let the expected genotype counts of genotypes at each of the two markers in the pair, computed based on the un-adjusted for main effects and un-normalized conditional genotype counts of the offspring in (2). In order to extend the standard correlation test based on the Pearson’s correlation coefficient such that information from parental genotypes is incorporated, we propose to compute μ1 and μ2 from the expected counts Ep(g1) and Ep(g2), rather than the observed genotype counts O(g1) and O(g2).
Letthe new expected value and the new variance. Moreover, let the new covariance and the new correlation coefficient. Then the TC test statistic is given as
Under the null hypothesis, T C is assumed to be asymptotically distributed. Unlike the ImAP test we do not utilize the normalized conditional probabilities and so we do not suffer from this source of inflation.
Conditional Logistic Regression with Pseudo-controls (CLRPC)
In addition to the correlation and independence based tests described above, tests based on CLRPC have been proposed for detecting epistasis in trio studies. Briefly, fifteen pseudo-controls are constructed via the Mendelian genotype realizations, given the parents’ genotypes at the two loci, which are then used as matched controls in a CLRPC. Here we consider two types of tests based on CLRPC. First, interactions might be tested with a likelihood ratio test such as the one proposed by Cordell (2002). In this case, two conditional logistic regression models are fitted to the cases and the respective matched pseudo-controls, one consisting of two coding variables for each of the two SNPs, and the other additionally containing the four possible interactions of these variables. Then, p-values can be computed by approximation to a χ2 distribution with four d.f. We refer to this test as CLRP C1.
Alternatively, a more simple approach can be used in which interactions might be tested with a likelihood ratio test comparing a conditional logistic regression model containing one parameter for each SNP and one parameter for the interaction of these two SNPs with a model only consisting of the two parameters for the main effects of the SNPs, where a single genetic mode of inheritance assumed for each SNP (e.g. additive, dominant, or recessive). In this work we use additive marginal effects for both SNPs. The p-values can be computed by approximation to a χ2 distribution with one d.f. We refer to this test as CLRP C2.
In this work we use the R (R Core Team 2014) implementation of these tests available in the R package trio (Schwenderet al. 2013) at http://cran.r-project.org.
SIMULATION STUDY
Data Generation
To evaluate the relative performance of the tests described in the previous section in terms of type I error rate and power, we performed a series of simulation studies under a liability threshold model of disease. Using random mating and Hardy-Weinberg Equilibrium assumptions we generated genotypes at two markers, each with a minor allele frequencies of 0.5, for a random sample of N trios, with N =10K, 50K and 100K. To generate yi, the liability of offspring i, we used the following regression model: where β1 and β2 are the main effects of each marker and γ is their interaction effect, ∊i ∼ N (0, σ2) a random error term with variance s2 chosen such that both markers explain 10% of the heritability of the liability. This large effect size is used to test for inflation of interaction tests under main effects only models of phenotype.
In order to study the impact of ascertainment and disease prevalence in the performance of the tests, we selected from the initial random sample of N trios the 1,000 offspring with the highest phenotype values and their parents. The higher the value of N, the stronger the ascertainment is and the lower the disease prevalence is, since the sample of 1,000 trios represents a smaller fraction of the initial sample.
For each choice of parameters 10,000 data sets were generated and we applied the SI, ImAP, SC, TC, as well as the two CLRPC tests to each data set. We also examined the ImAP test without the normalization step (4), which we refer to as ImAP2. The ImAP2 test statistic is computed as the ImAP test statistic in (5) but the expected counts of each genotype combination are computed using the product of the adjusted for main effects only genotype counts P′, as opposed to the product of the adjusted for main effects and normalized counts P * used for ImAP. The reason we show results for ImAP2 is to present the source of bias in the ImAP test, i.e. the normalization step in (4).
In addition, we also examined the ImAP test without the adjustment for main effects (3) and without the normalization step (4), which we refer to as ImAP3. Similarly to the ImAP2 test statistic, the ImAP3 is computed as the ImAP test statistic in (5) but now the expected counts of each genotype combination are computed using the product of P, i.e. the genotype counts without main effect adjustment or normalization. Results for ImAP3 are presented in order to study the impact of main effect adjustment when main effects are not present.
Results on Type I Error Rate
We first examined the type I error rate performance of the tests under the null hypothesis of no main effects and no interaction effects (β1=β2=γ=0). The results in Table 2.a show that all tests with the exception of the ImAP tests are well calibrated, with an average χ2 test statistic equal to the d.f. of the test. We then examined the robustness of the tests when main effects, but no interaction effects exist (β1=β2=0.1 and γ=0). The results displayed in Table 2.b show that all tests suffered some degree of inflation. This inflation is driven by the fact that marginally associated SNPs will become correlated in ascertained data (Zaitlenet al. 2012; Zaitlenet al. 2012). Ascertainment based inflation will increase as the disease prevalence decreases.
Interestingly, the TC test is the least biased of these tests for the highest prevalence disease and the most biased for the lowest prevalence disease. This known source of bias is relatively minor compared to the genome-wide testing threshold and therefore unlikely to result in false-positive interaction results. However, the ImAP test is severely inflated under this marginal effects only model and is very likely to produce false positives. As noted in the previous section, this inflation is driven by the normalization procedure as evidenced by the fact that without normalization (ImAP2), the test is similar to the other tests considered.
It is well known that the ImAP tests are not χ2 distributed and so we computed the permutation-based p-values for the ImAP, ImAP2, and ImAP3 test statistics as described in Ackermann and Beyer (2012). The type I error rate for each of the three test based on the permutation p-values are listed in Table 3. Under the null hypothesis of no interaction and the absence of main effects, all three tests are well calibrated (Table 3.a). In the presence of main effects, ImAP and ImAP3 are very inflated. ImAP3 is inflated due to the presence of main effects. ImAP2 adjusts to some degree for the presence of main effects and has a much lower type I error rate. However, in the recommended ImAP test, the normalization step is performed which leads to an increased type I error rate even after performing permutations.
Population structure can induce correlation between distant SNPs. In order to examine the effects of structure on the interaction tests we performed another a set of simulations in a pseudo-admixed population. We assumed minor allele frequencies of 0.5 and 0.1 in populations one and two respectively. For each mother and father we drew genome-wide ancestry uniformly on the interval [0.1,0.9]. We then drew local ancestry from a binomial according to the genome-wide ancestry. Finally, we drew genotypes based on the minor allele frequencies for each population conditional on the local ancestry at each SNP. The rest of the simulation proceeded as above.
The results of the pseudo-admixed population simulations under the null hypothesis of no interaction effect are presented in Table 4. As expected all interactions tests are inflated in structured populations. However, the CLRPC test is less inflated than all others and is therefore the recommended test when dealing with structured populations.
Results on Power
Tables 5 and 6 show the mean χ2 and power results respectively under the alternative hypothesis of interaction effect, i.e. γ=0.1, for both absence of main effects, i.e. β1=β2=0, and presence of main effects, i.e. β1=β2=.1. In order to compare the relative performance of the tests, we selected a significance threshold a such that the power of the SI test was 50%. For all tests power increases as disease prevalence decreases since a larger genetic burden is required to exceed the liability threshold.
The TC test is the most powerful test with a 40%, 70%, and 85% gain in test statistic relative to the SC test with a concomitant gain in power. The pseudo-control based interaction tests did not perform as well as the TC, SC and SI interaction tests. The SC test outperformed the SI tests due to the reduction in d.f and the additive generative model for the phenotype used in these simulations. The ImAP test is disqualified due to inflation introduced by the normalization. The ImAP2, which did not suffer this inflation, is underpowered relative to the other tests. By comparing ImAP2 to ImAP3 we can see that the reason ImAP2 is under-powered is mainly because of the ‘over-adjustment’ for main effects. ImAP3 is also disqualified as a proper test because it is severely inflated under the null hypothesis in the presence of main effects.
DISCUSSION
In this work we develop the TC test, a new one d.f. statistical interaction test for use in trio studies that leverages information form the parental genotypes. We compare our test with existing tests for epistasis and show via simulations that the TC test properly controls the type I error in the absence of main effects and is similarly biased to other tests for lowprevalence diseases. Our test substantially outperforms all other tests of interaction in trio studies. The gain in power under standard models of phenotype is large, with the SC test requiring up to twice the number of trios to obtain the power of our test.
Ackermann and Beyer (2012) showed that in the absence of main effects the significance of the ImAP test statistic for each marker pair can be properly assessed via a permutation approach. We showed here that the permutation approach used to calibrate the ImAP test statistic in the absence of main effects will not address the inflation caused in the presence of main effects. We identify as a source of inflation the normalization step in (4), which alters the correlation structure of the expected genotype probabilities and introduces bias in the test statistic.
All methods considered here will be slightly inflated due to ascertainment for low prevalence diseases. However, this inflation is minimal and not enough to pass a genome-wide significance threshold. Population structure can induce long range LD inflating correlation based tests. Permutations based on pseudo-controls can account for structure (Epsteinet al. 2012) if inflation is observed in the study. The logistic regression based pseudo-control method can directly account for population stratification and can flexibly model one d.f. up to four d.f. tests of interaction (Cordellet al. 2004). However, this test was the least powerful in our simulations and will lose power when there is assortative mating for the phenotype of interest (Kleiet al. 2012).
In this work we used a liability threshold model (i.e. probit model) with and additive by additive interaction term to simulate disease phenotypes. Our previous work suggests that replacing this with a logit model will not have substantial effect on outcomes (Zaitlenet al. 2012). There exist a myriad of alternative models and assumptions each of which can alter the definition of interaction and the relative power of statistical tests (Cordell 2002; Hallgrímsdóttir and Yuster 2008). For example if the true model is four d.f.,e.g. contains additive by dominance effects, our test, which only models additive by additive effects, may no longer be the most powerful test. Because the ImAP test is inflated in the presence of main effects, a powerful and unbiased four d.f. test leveraging trios remains an open research question as does a one d.f. allelic test.
While we developed a correlation based test leveraging the parental genotypes, it also is possible to construct a likelihood based approach for interaction conditional on parental genotypes. This requires an assumption about the underlying disease model (e.g. logit/probit model). Such assumptions can lead to false conclusions of epistasis when the assumed model is incorrect (Clayton 2012). However, these tests can be more powerful when the model choices are similar to the real disease model.
Similar to commonly used tests for marginal effects, the interaction tests presented here are inappropriate for rare variants. In the marginal case it is recommended to use a Fisher’s Exact Test when the minor allele frequency is small. The definition of rare depends on the sample size of the study, as the number of observations of the rare allele is the quantity of interest. For interaction tests one must use a threshold on the product of the minor allele frequencies of pairs of markers.
There has been a renewed interest in trio cohorts with affected offspring for the purposes of identifying denovo mutations and parent of origin effects (O’Roaket al. 2011; Arjomandiet al. 2011; O’Roaket al. 2012; Nealeet al. 2012; Sanderset al. 2012). While these collections have identified many denovo mutations, they have not yet been examined for the presence of interactions and our test is therefore of immediate benefit to these rapidly growing trio cohorts.
ACKNOWLEDGEMENT
We thank Marit Ackermann and Andreas Beyer for helpful discussions. N.Z. was funded by 1K25HL121295-01A1.