Epistasis and Entropy

Kristina Crona

doi:10.1101/014829

ABSTRACT

Epistasis is a key concept in the theory of adaptation. Indicators of epistasis are of interest for large system where systematic fitness measurements may not be possible. Some recent approaches depend on information theory. We show that considering shared entropy for pairs of loci can be misleading. The reason is that shared entropy does not imply epistasis for the pair. This observation holds true also in the absence of higher order epistasis. We discuss a refined approach for identifying pairwise interactions using entropy.

1. INTRODUCTION

Epistasis tends to be prevalent for antimicrobial drug resistance mutations. Sign epistasis means that the sign of the effect of a mutation, whether good or bad, depends on background Weinreich et al. (2005). Sign epistasis may be important for treatment strategies, both for antibiotic resistance and HIV drug resistance Goulart et al., 2013; Desper et al., 1999; Beerenwinkel et al., 2007 a). For instance, there are sometimes constraints on the order in which resistance mutations occur. A particular resistance mutation may only be selected for in the presence of another resistance mutation. It is important to identify such constraints. A first question is how one can identify pairwise epistasis in a large system. We will discuss entropy (Shannon, 1948) and epistasis. Information theory has been used for HIV drug resistance mutations (Gupta and Adami, 2015) and more extensively for analyzing human genetic disease (e.g. Dong et al., 2008; Kang et al., 2008; Streiloff et al., 2010). For recent review articles on epistasis and fitness landscapes see e.g. Hartl (2014); Kondrashov and Kondrashov (2014), and for an empirical perspective (Szendro et al., 2012).

2. RESULTS

It is well established that genotypes are expected to be in equilibrium proportions if there is no epistasis in the system, i.e., if fitness is multiplicative. For instance, if two rare mutations have frequencies p and q, then the frequency of the genotype combining the two mutations is expected to be close to pq. This statement holds true regardless if recombination occurs or not (Otto and Lenormand, 2002).

We will explore the relation between entropy and epistasis for a system with constraints as described in the introduction.

Consider a 3-locus balletic system where a mutation at the first locus confers resistance, whereas mutations at the second and third loci are only selected for in the presence of the first mutation (otherwise they are deleterious). We represent the case with a fitness graph (Crona et al., 2013) (Figure 1). As conventional, 000 denotes the wild-type. For instance, one obtains a system with the fitness graph as in Figure 1 for the log-fitness values

Figure 1.

systems

The gene interactions for a 3-loci system can be described by the sign pattern of 20 circuits, or minimal dependence relations (Beerenwinkel et al., 2007 b). The relevant two-way interactions in this context be described by the six circuits corresponding to the faces of the 3-cube. Specifically,

The four inequalities express that there is positive epistasis for the first and second loci, as well as for the first and third loci. The two equalities show that there is no epistasis for the second and third loci, regardless of background. The total 3-way epistasis is zero as well,

Higher order gene interactions have also been described using Walsh coefficients (Weinreich et al., 2013; Poelwijk et al, 2015). For this landscape the Walsh coefficient E₀₁₁ = 0, which indicates an absence of background averaged epistasis for the second and third loci.

We will consider entropy during the process of adaptation for this landscape. The starting point for adaptation is the wild-type 000. We use a standard Wright-Fisher model for an infinite population with mutation rate μ = 10⁻⁷. The gene frequencies and shared entropy after the given number of generations are listed in the table.

View this table:

Table 1.

Gene frequencies and shared entropy I(2, 3) for an infinite population with mutation rate 10⁻⁷.

The shared entropy for the second and third loci differs from zero. However, there is no 2-way epistasis for the pair of loci.

By extrapolation, consider an analogous system for L-loci. Then L − 1 mutations are selected for only if the first mutation has occurred, but there are no other interactions. One would get non-zero shared entropy for pairs of loci, although there is 2-way epistasis for L − 1 pairs of loci only.

2.1. A pair with no epistasis and maximal shared entropy

The landscape is closely related to the previous example. Indeed, the two-way interactions can be described by the sign pattern and the total 3-way epistasis is zero:

Also in this case, there is no epistasis for the second and third loci. Mutations at the second and third loci are selected for only in the presence of a mutation at the first locus. However, this fitness landscape differs from the previous example in that a mutation at the first locus is neutral for the wild-type.

Suppose that 50 percent of hosts start a new treatment with 000 viruses, and 50 percent start with the 100 genotype. That could be realistic, for instance if the 100 genotype had some resistance to a previously used drug. By assumption, eventually one would have about 50 percent 000 genotypes and 50 percent 111 genotype in the total population. Then I(2, 3) = 2 although there is no epistasis for the second and third loci. This example also points at a fundamental problem relating pairwise epistasis and entropy. At the time when we have 50 percent 000 genotypes and 50 percent 111 genotypes, obviously no method can reveal pairwise epistasis.

2.2. A refined approach

We will discuss a refined approach for identifying pairwise epistasis. Suppose that we have identified shared entropy for a particular pair of loci {k, l}. Let denote the set of loci such that the shared entropy

Let denote the set of loci with non-zero shared entropy for some locus in S₁, and so forth. Let .

Let v denote one of the 2^|S| possible states for S, and consider the subsystem of genotypes determined by v. If the shared entropy I^v(k: l) = 0 for all v, then there is no indication of of epistasis for {l, k}.

We can apply the refined approach for the second and third loci in our example where I(2, 3) = 2. Then

Consequently, there is no indication of epistasis for the second and third loci.

The described method could be useful for identifying cases with shared entropy and no epistasis. However, it remains to explore to what extent the method is useful in a more general setting.

3. DISCUSSION

We have demonstrated that shared entropy for two loci does not imply epistasis for the pair. This observation holds true also in the absence of 3-way epistasis in a single environment. Entropy based approaches to epistasis are coarse. We have discussed a refined approach which filters out some cases where shared entropy depends on states at other loci.

There are obviously other reasons for caution in interpretations of entropy for drug resistance mutations. Different drugs constitute different environments. Some resistance mutations may be correlated if they are beneficial in the presence of a particular drug, but not for other drugs. In such cases entropy would not not imply epistasis.

Our results show that observations on entropy and epistasis based on 2-locus systems can be misleading for general systems. From a theoretical point of view, a better understanding of large systems would be useful for handling drug resistance data.

4. METHODS

Let x and y be discrete random variables with states x₁, x₂ and y₁, y₂. Let p_i denote the frequency of x_i, and p_ij the frequency for the combination of x_i and y_j. The entropy (Shannon, 1948) H(x) and the joint entropy H(x, y) are defined as

The shared entropy is the quantity I(x : y) = H(x) + H(y) − H(x, y).

In general I (x : y) ≥ 0, and the shared entropy is a measure of dependence.

REFERENCES

↵
Beerenwinkel, N., Eriksson, N. and Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli; 13:893–909.
OpenUrl CrossRef Web of Science
↵
Beerenwinkel, N., Pachter, L. and Sturmfels, B. (2007). Epistasis and shapes of fitness landscapes. Statistica Sinica 17:1317–1342.
OpenUrl
↵
Crona, K., Greene, D. and Barlow, M. (2013). The peaks and geometry of fitness landscapes. J. Theor. Biol. 317: 1–13.
OpenUrl CrossRef PubMed Web of Science
↵
Desper, R., Jiang, F., Kallioniemi, O.P., Moch, H., Papadimitriou, C.H. and Schäffer, A.A. (1999). Inferring tree models for oncogenesis from comparative genome hybridization data. Comput. Biol 6 37–51.
OpenUrl
↵
Dong, C., Chu, X., Wang, Y., Wang, Y., Jin, L. (2008). Exploration of gene-gene interaction effects using entropy-based methods. Eur. J. Hum. Genet. 16: 229–235
OpenUrl CrossRef PubMed Web of Science
↵
Goulart, C. P., Mentar, M., Crona, K., Jacobs, S. J., Kallmann, M., Hall, B. G., Greene D. and Barlow M. (2013). Designing antibiotic cycling strategies by determining and understanding local adaptive landscapes. PLoS ONE 8(2): e56040. doi:10.1371/journal.pone.0056040.
OpenUrl CrossRef PubMed
↵
Gupta, A. and Adami, C. (2015). Changes in epistatic interactions in the long-term evolution of HIV-1 protease. arxiv: 14 0 8. 2 7 61.
↵
Hartl, D. (2014) What can we learn from fitness landscapes? Current Opinion in Microbiology 21 (2014): 51–57.
OpenUrl CrossRef PubMed
↵
Kang, G., Yue, W., Zhang, J, Cui, Y., Zuo, Y., Zhang, D. (2008) An entropy-based approach for testing genetic epistasis underlying complex diseases, J. Theor. Biol. 250
↵
Kondrashov, D. A. and Kondrashov, F. A. (2014). Topological features of rugged fitness landscapes in sequence space. Trends in Genetics
↵
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal vol. 27, pp. 379–423 and 623-656, July and October, 1948.
OpenUrl CrossRef Web of Science
↵
Otto, S. P. and Lenormand, T. (2002). Resolving the paradox of sex and recombination. Nature Reviews Genetics; 3:252–261.
OpenUrl CrossRef PubMed Web of Science
↵
Poelwijk, F., Krishna, V. and Ranganathan, R. (2015). The context-dependence of mutations: a linkage of formalisms. arXiv:1502.00726 (2015).
↵
Strelioff, C. C., Lenski R. E. and Ofria, C. (2010) J. Theor. Biol. 266 (4), pp. 584–594
OpenUrl PubMed
Szendro, I. G., Schenk, M. F., Franke, J. Krug, J. and de Visser J. A. G. M. (2013). Quantitative analyses of empirical fitness landscapes J. Stat. Mech. P01005.
De Visser, J. A. G. M, and Krug, J. (2014) Empirical fitness landscapes and the predictability of evolution.” Nature Reviews Genetics 15.7 (2014): 480–490.
OpenUrl
↵
Weinreich D. M., Lan, Y., Wily, C. S., Heckendorn, R. B. (2013). Should evolutionary geneticists worry about higher-order epistasis? Curr Opin Gen Dev 23: 700–7.
OpenUrl CrossRef PubMed
↵
Weinreich, D. M, Watson, R. A. and Chao, L. (2005) Sign epistasis and genetic constraint on evolutionary trajectories. Evolution., 9(6) pp 1165–74.
OpenUrl