Elsevier

Journal of Theoretical Biology

Volume 317, 21 January 2013, Pages 1-10
The peaks and geometry of fitness landscapes

https://doi.org/10.1016/j.jtbi.2012.09.028

Abstract

Fitness landscapes are central in the theory of adaptation. Recent work compares global and local properties of fitness landscapes. It has been shown that multi-peaked fitness landscapes have a local property called reciprocal sign epistasis interactions. The converse is not true. We show that no condition phrased in terms of reciprocal sign epistasis interactions only implies multiple peaks. We give a sufficient condition for multiple peaks phrased in terms of two-way interactions. This result is surprising since it has been claimed that no sufficient local condition for multiple peaks exists. We show that our result cannot be generalized to sufficient conditions for three or more peaks. Our proof depends on fitness graphs, where nodes represent genotypes and where arrows point toward more fit genotypes. We also use fitness graphs to give a new brief proof of the equivalent characterizations of fitness landscapes lacking genetic constraints on accessible mutational trajectories. We compare a recent geometric classification of fitness landscapes, based on triangulations of polytopes, with qualitative aspects of gene interactions. One observation is that fitness graphs provide information that is not contained in the geometric classification. We argue that a qualitative perspective may help relate the theory of fitness landscapes to empirical observations.

Highlights

► We study qualitative aspects of gene interactions and fitness landscapes.
► A sufficient local condition for multiple peaks is given.
► The fitness graph reveals sign epistasis and other coarse properties.
► The shape, as defined in the geometric theory, reveals all gene interactions.
► Fitness graphs and shapes provide complementary information.

Introduction

We will study qualitative aspects of gene interactions. In particular, it is of interest to what extent beneficial mutations combine well. This question relates to the concept of epistasis. Absence of epistasis means that the fitness effects of mutations sum, where fitness is defined as the expected reproductive success (different definitions of these concepts occur in the literature; see Mani et al., 2008). It is immediate that beneficial mutations combine well if there is no epistasis. However, it is well known that double mutants which combine beneficial single mutations may have very low fitness. Several examples from different species are given in Weinreich et al. (2005). Put briefly, “good+good=better” if there is no epistasis, but sometimes “good+good=not good” in nature. By a qualitative perspective we mean that one considers fitness ranks of genotypes, but not necessarily finer-scaled information, such as relative fitness values.

Fitness landscapes are central in the theory of adaptation and we will focus on the qualitative perspective. The fitness landscape was initially introduced as a metaphor for adaptation (Wright, 1931). Informally, the surface of the landscape consists of genotypes, where similar genotypes are close to each other, and the fitness of a genotype is represented as a height coordinate. Adaptation can then be pictured as an uphill walk in the fitness landscape.

A qualitative analysis is sufficient for several theoretical aspects of fitness landscapes. Coarse properties of fitness landscapes, such as the number of peaks, depend on fitness ranks of genotypes only. The relation between global and local properties can be analyzed from a qualitative perspective as well. From a more practical point of view, the qualitative perspective has several advantages. Fitness ranks are usually easier to determine than relative fitness values. Fitness ranks tend to be stable under small variations in the environment. Moreover, fitness data of a qualitative nature are already available. In particular, medical records on HIV drug resistance and antibiotic resistance provide indirect information about fitness ranks (see Section 5). It is frequently claimed that we know virtually nothing about fitness landscapes in nature. In our view, better methods for interpretation of fitness data are at least as important as new fitness measurements.

The concept of a fitness landscape has been formalized in different ways. Conventionally, a genotype is represented as a string in the 20-, 4- or 2-letter alphabet, depending on whether one considers amino acids, base pairs or a biallelic system. In many real systems at most two alternative alleles occur at each position (or locus), resulting in a biallelic system. Alternatively, a biallelic assumption may be a reasonable simplification. For simplicity, we will consider biallelic populations throughout the paper. Let $\Sigma=\{0,1\}$ and let $\Sigma^L$ denote bit strings of length $L$. The zero-string denotes the string with zero in all $L$ positions, and the 1-string denotes the string with 1 in all $L$ positions. We define the fitness landscape as a function $w:\Sigma^L\to\mathbb{R}$, which assigns a fitness value to each genotype. The metric we use is the Hamming distance, meaning that the distance between two genotypes equals the number of positions where the genotypes differ. In particular, two genotypes are adjacent, or mutational neighbors, if they differ at exactly one position.
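As an illustration of these definitions, the following minimal Python sketch represents genotypes as bit strings, computes the Hamming distance, and lists mutational neighbors. The function names and the fitness values in `w` are hypothetical choices made for this illustration only.

```python
from itertools import product


def genotypes(L):
    """All bit strings of length L, i.e. the elements of Sigma^L."""
    return ["".join(bits) for bits in product("01", repeat=L)]


def hamming(s, t):
    """Number of positions where genotypes s and t differ."""
    if len(s) != len(t):
        raise ValueError("genotypes must have equal length")
    return sum(a != b for a, b in zip(s, t))


def neighbors(s):
    """Mutational neighbors of s: the genotypes at Hamming distance 1."""
    flip = {"0": "1", "1": "0"}
    return [s[:i] + flip[s[i]] + s[i + 1:] for i in range(len(s))]


# A fitness landscape w: Sigma^L -> R stored as a dictionary.
# The values are hypothetical and serve only as an illustration (L = 2).
w = {"00": 1.00, "10": 1.04, "01": 1.02, "11": 1.06}
```

A dictionary is used here because for small L the full landscape has only 2^L entries; for larger L one would more likely pass a function from genotypes to fitness values.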

A walk in the fitness landscape has a precise interpretation. Consider a population after a recent change in the environment. Assume that the wild-type no longer has optimal fitness. Under the strong-selection weak-mutation (SSWM) regime, a beneficial mutation is assumed to go to fixation in the population before the next mutation occurs (Gillespie, 1983, 1984). The population is monomorphic for most of the time, so that one genotype dominates the population at a particular point in time. It follows that we can think of a Darwinian process as an adaptive walk in the fitness landscape, where each step represents a beneficial mutation going to fixation in the population. The described model of adaptation has been widely used and relies on work by Gillespie (1983, 1984). The sequence-based model of adaptation was introduced by Maynard Smith (1970). For more background and references, see also Orr (2002, 2006).
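The adaptive walk described above can be sketched in a few lines of Python. This is a simplified illustration, not the model used in the paper: it assumes that each step picks uniformly at random among the fitter mutational neighbors (actual fixation probabilities depend on fitness effects), it takes a peak to be a genotype with no fitter mutational neighbor, and the landscape values are hypothetical.

```python
import random


def neighbors(s):
    """Genotypes that differ from s at exactly one position."""
    flip = {"0": "1", "1": "0"}
    return [s[:i] + flip[s[i]] + s[i + 1:] for i in range(len(s))]


def is_peak(w, s):
    """True if no mutational neighbor of s has strictly higher fitness."""
    return all(w[t] <= w[s] for t in neighbors(s))


def peaks(w):
    """All peaks of the landscape w, in sorted order."""
    return [s for s in sorted(w) if is_peak(w, s)]


def adaptive_walk(w, start, rng=None):
    """Walk uphill one mutation at a time until a peak is reached.

    Assumption: the next genotype is drawn uniformly among the fitter
    neighbors, which ignores fitness-dependent fixation probabilities.
    """
    rng = rng or random.Random(0)
    path = [start]
    while True:
        current = path[-1]
        fitter = [t for t in neighbors(current) if w[t] > w[current]]
        if not fitter:
            return path
        path.append(rng.choice(fitter))


# Hypothetical two-locus landscape, for illustration only.
w = {"00": 1.00, "10": 1.04, "01": 1.02, "11": 1.03}
walk = adaptive_walk(w, "00")
```

In this example every adaptive walk from 00 ends at the single peak 10, either directly or via 01 and 11.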

For the qualitative perspective on fitness landscapes, one needs a refined version of the concept of epistasis. According to our definition, fitness is additive or non-epistatic if fitness effects of mutations sum. (In the literature non-epistatic fitness is sometimes defined as multiplicative fitness.) Suppose that $w(00)=1$, $w(10)=1.04$, $w(01)=1.02$. If one considers 00 as a starting point, then the fitness effect of a mutation at the first locus is +0.04, and at the second +0.02. If fitness is additive, then $w(11)=1.06$ since 0.04+0.02=0.06, meaning that the fitness effects sum. Epistasis exists if $w(11)\neq 1.06$. Sign epistasis means that a particular mutation is beneficial or deleterious depending on genetic background. For example, if $w(11)=1.03$, then there is sign epistasis. Indeed, in this case a mutation at the second locus is beneficial for the genotype 00 since $w(01)>w(00)$, and deleterious for the genotype 10 since $w(11)<w(10)$. In contrast, if $w(11)=1.05$ there is epistasis, but no sign epistasis since fitness increases whenever a 0 at some locus is replaced by 1. For more background about epistasis, see e.g. Weinreich et al. (2005), Beerenwinkel et al. (2007b), Poelwijk et al. (2007, 2011) and Kryazhimskiy et al. (2011).

Recent work that considers qualitative properties of fitness landscapes includes Weinreich et al. (2005) and Poelwijk et al. (2007, 2011). A central theme is how global properties of the fitness landscape, such as the number of peaks, relate to local properties, such as sign epistasis (see Sections 2 and 3). A related field is the study of constraints on the orders in which mutations accumulate (see e.g. Desper et al., 1999, Beerenwinkel et al., 2007a). It is well known that a drug resistance mutation is sometimes selected for only if a different mutation has already occurred. Such a phenomenon requires sign epistasis. Indeed, if a particular mutation is beneficial regardless of background, then it can occur before or after other mutations.
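The two-locus example with w(00)=1, w(10)=1.04 and w(01)=1.02 can be checked mechanically. The sketch below is an illustration under stated assumptions: the function name and the tolerance `tol` are hypothetical, and the label "reciprocal sign epistasis" follows the usual reading that both mutations change sign depending on the background.

```python
def classify_two_locus(w00, w10, w01, w11, tol=1e-9):
    """Classify the interaction between two biallelic loci.

    Epistasis is measured as the deviation from additivity,
    w11 - w10 - w01 + w00. The tolerance tol (an assumption of this
    sketch) absorbs floating-point rounding.
    """
    deviation = w11 - w10 - w01 + w00
    if abs(deviation) <= tol:
        return "no epistasis"
    # A mutation shows sign epistasis if its effect has opposite signs
    # on the two backgrounds defined by the other locus.
    sign_locus1 = (w10 - w00) * (w11 - w01) < 0
    sign_locus2 = (w01 - w00) * (w11 - w10) < 0
    if sign_locus1 and sign_locus2:
        return "reciprocal sign epistasis"
    if sign_locus1 or sign_locus2:
        return "sign epistasis"
    return "epistasis without sign epistasis"


additive = classify_two_locus(1.0, 1.04, 1.02, 1.06)
sign = classify_two_locus(1.0, 1.04, 1.02, 1.03)
magnitude = classify_two_locus(1.0, 1.04, 1.02, 1.05)
```

The three calls reproduce the cases w(11)=1.06, w(11)=1.03 and w(11)=1.05 discussed in the text.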

We will give an overview of classical models of fitness landscapes, and then compare with recent approaches and the qualitative perspective.

Several models of fitness landscapes have had a broad influence in evolutionary biology, primarily additive fitness landscapes, random fitness landscapes, the block model and Kauffman's NK model. Additive fitness landscapes are single peaked. In contrast, for a random (uncorrelated or rugged) fitness landscape (see e.g. Kingman, 1978, Kauffman and Levin, 1987, Flyvbjerg and Lautrup, 1992, Rokyta et al., 2006, Park and Krug, 2008) there is no correlation between the fitnesses of mutational neighbors, that is, genotypes that differ at one locus only. Random fitness landscapes tend to have many peaks. Random fitness and additivity can be considered as two extremes with regard to the amount of structure in the fitness landscapes.

For the block model (Macken and Perelson, 1995, Orr, 2006) the string representing a genotype can be subdivided into blocks, where each block makes an independent contribution to the fitness of the string. Each block has random fitness, and the fitness of the string is the sum of contributions from each block. In particular, if there is only one block, then the block model coincides with a random fitness landscape.
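A minimal sketch of the block model follows. It makes several assumptions that the description above leaves open: blocks are contiguous and of equal size, and the random fitness of each block state is drawn from a uniform distribution on [0, 1). The function name and parameters are hypothetical.

```python
import random
from itertools import product


def block_model(L, block_size, rng):
    """Return a block-model landscape on bit strings of length L.

    Assumptions of this sketch: contiguous blocks of equal size and
    uniform random block fitnesses.
    """
    if block_size <= 0 or L % block_size != 0:
        raise ValueError("block_size must be positive and divide L")
    n_blocks = L // block_size
    # One independent random table per block: each block state gets its own draw.
    tables = [
        {state: rng.random() for state in product("01", repeat=block_size)}
        for _ in range(n_blocks)
    ]
    w = {}
    for g in product("01", repeat=L):
        blocks = (g[b * block_size:(b + 1) * block_size] for b in range(n_blocks))
        # Fitness of the string is the sum of the block contributions.
        w["".join(g)] = sum(table[block] for table, block in zip(tables, blocks))
    return w


w = block_model(4, 2, random.Random(1))
```

With `block_size == L` there is a single block and every genotype receives an independent draw, which is the random landscape; with `block_size == 1` each locus contributes independently and the landscape is additive.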

Kauffman's NK model (see e.g. Kauffman and Weinberger, 1989) is defined so that the epistatic effects are random, whereas the fitness of a genotype is the average of the “contributions” from each locus. More precisely, for the NK model the genotypes have length $N$ (in our notation $L=N$), and the parameter $K$, where $0\le K\le N-1$, reflects interactions between loci. The fitness contribution from a locus is determined by its state and the states at exactly $K$ other loci. The key assumption is that this contribution, determined by the $2^{K+1}$ states (since we assume biallelic systems), is assigned at random from some distribution. The fact that the fitness of the genotype is the average of these $N$ contributions means that fitness effects of non-interacting mutations sum. Several important properties of NK landscapes depend mainly on $N$ and $K$, rather than the exact structure of the epistatic interactions.
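The NK construction can be sketched as follows. This is an illustration under assumptions not fixed by the description above: the K interaction partners of each locus are chosen uniformly at random, and contributions are drawn from a uniform distribution on [0, 1). The function name is hypothetical.

```python
import random
from itertools import product


def nk_landscape(N, K, rng):
    """Return an NK landscape on bit strings of length N.

    Assumptions of this sketch: random interaction partners and
    uniform random contributions.
    """
    if not 0 <= K <= N - 1:
        raise ValueError("K must satisfy 0 <= K <= N - 1")
    # For each locus, choose the K other loci it interacts with.
    partners = [
        sorted(rng.sample([j for j in range(N) if j != i], K))
        for i in range(N)
    ]
    # For each locus, one random contribution per combination of its own
    # state and the states of its K partners: 2^(K+1) entries.
    tables = [
        {states: rng.random() for states in product("01", repeat=K + 1)}
        for _ in range(N)
    ]
    w = {}
    for g in product("01", repeat=N):
        total = 0.0
        for i in range(N):
            key = (g[i],) + tuple(g[j] for j in partners[i])
            total += tables[i][key]
        # Fitness is the average of the N locus contributions.
        w["".join(g)] = total / N
    return w


additive = nk_landscape(3, 0, random.Random(0))   # K = 0: no interactions
rugged = nk_landscape(3, 2, random.Random(0))     # K = N - 1: all loci interact
```

For K = 0 each contribution depends on one locus only, so the landscape is additive; for K = N − 1 every contribution depends on the whole genotype.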

Notice that the NK model, as well as the block model, includes additive landscapes and random landscapes as special cases. More importantly, the models are similar in that there is a sharp division between effects which are completely random and effects which are additive.

In contrast to the models discussed, the Orr-Gillespie theory (e.g. Orr, 2002) follows the strategy of making minimal assumptions about the underlying fitness landscape, motivated by the fact that our knowledge about fitness landscapes is limited. The theory focuses on properties that hold for a broad category of fitness landscapes. Most results depend on extreme value theory. For more background and references on fitness landscapes in evolutionary biology, see e.g. Weinreich et al. (2005), Beerenwinkel et al. (2007b) and Kryazhimskiy et al. (2009). Fitness landscapes have been used in chemistry, physics and computer science, in addition to evolutionary biology. For a survey on combinatorial landscapes in general see Reidys and Stadler (2002). In combinatorial optimization the fitness function is referred to as the cost function.

The classical theory of fitness landscapes has been criticized for the lack of contact with empirical data (Kryazhimskiy et al., 2009). One sometimes encounters the misunderstanding that the block model, or Kauffman's NK model, would include almost all [theoretically possible] fitness landscapes since they include the two extremes. According to this view, the goal of empirical work would be to determine parameters: the block length in the first case, or the “K” value in the NK model. On the contrary, these two models are equipped with very special structures. Additive fitness and random fitness are of course even more special. The Orr-Gillespie theory, on the other hand, focuses on general properties of adaptation for a broad category of landscapes, rather than relating properties of fitness landscapes and fitness data.

Put briefly, the theory of fitness landscapes has been developed in isolation from data, and available fitness data have been left without systematic interpretations. We argue that qualitative methods could be of some help. Practical methods for checking if some of the standard models of fitness landscapes are compatible with fitness data are indicated in Section 5, along with concepts for interpretations of fitness data (see also Crona et al., 2012).

In general, methods for revealing and interpreting properties of fitness landscapes from data, without assumptions about the underlying fitness landscape are especially valuable in our view. A recent contribution in this category is the geometric theory of gene interactions (Beerenwinkel et al., 2007b). Conventionally, the study of epistasis is restricted to two-way interactions or average effects of mutations. A full description of the gene interactions for multiple loci requires an entirely different theory. The geometric classification in Beerenwinkel et al. (2007b) uses triangulations of polytopes. For mathematical background we refer to De Loera et al. (2010), see also Ziegler (1995) for general theory about polytopes. Briefly, a square is an example of a polytope, and the two triangles obtained by cutting the square along a diagonal constitute a triangulation of the square (see Section 4). The geometric approach has revealed previously unappreciated gene interactions (Beerenwinkel et al., 2007b, Beerenwinkel et al., 2007c). This approach is relevant for the theory of recombination. Geometric and qualitative information is compared in Section 4.

The paper is structured as follows. Sections 2 and 3 consider the relation between global and local properties of fitness landscapes. We prove our main results in Section 2, and introduce fitness graphs. We give a new proof of the main result in Weinreich et al. (2005) using fitness graphs in Section 3. In Section 4 we compare qualitative aspects of gene interactions with the geometric classification of fitness landscapes in terms of triangulations of polytopes. Section 5 is about applications, mainly the relation between models and fitness data.

A counterexample

As before, $\Sigma=\{0,1\}$ and $w:\Sigma^L\to\mathbb{R}$ is the fitness landscape. For simplicity we assume in this section that $w(s)\neq w(s')$ for any two strings $s$ and $s'$ which differ in one position only, in addition to the assumptions stated in the introduction.

An adaptive step in the fitness landscape corresponds to a change in exactly one position of a string so that the fitness increases strictly. An adaptive walk is a sequence of adaptive steps. A peak in the fitness landscape has the property that there are no

Fitness landscapes with no constraints

We will demonstrate the efficiency of fitness graphs by providing a brief proof of a result from Weinreich et al. (2005). For simplicity we assume in this section that $w(s)\neq w(s')$ if $s\neq s'$, in addition to the assumptions stated in the introduction. We refer to the global maximum of the landscape as “the fitness peak”. Moreover, define a general step similar to “adaptive step”, except that the fitness may decrease. A general walk, as opposed to “adaptive walk”, is a sequence of general steps. If a

Fitness graphs and the shapes of fitness landscapes

We will compare information derived from fitness graphs with the geometric classification of fitness landscapes. For the reader's convenience, we will give a brief description of the geometric theory of gene interactions introduced in Beerenwinkel et al. (2007b). Our discussion is somewhat informal, and we refer to the original article for concepts and theory about the geometric classification of fitness landscapes, and to De Loera et al. (2010) for theory about polytopes and triangulations.

As

Applications

Qualitative aspects of gene interactions are interesting for practical reasons. Suppose that two single mutants which confer resistance to a particular drug have been found frequently, but never the corresponding double mutant. From this observation one concludes that there is sign epistasis. Indeed, “good+good=not good”, since the combination of two beneficial single mutations was never found. This information is intrinsic, whereas fitness measurements tend to be sensitive to the environment

Discussion

Fitness landscapes are central in the theory of adaptation, and we have studied them from a qualitative perspective. Our main result relates global and local properties of fitness landscapes. The qualitative perspective has contributed to our understanding of coarse properties of fitness landscapes. Moreover, there are practical reasons for the qualitative perspective. We have indicated simple tests for checking if fitness data are compatible with some of the classical models of fitness

Acknowledgement

This work was supported by NIH Grant 1R15GM090164-01A1.

References (34)

  • S. Bonhoeffer et al. Evidence for positive epistasis in HIV-1. Science (2004)
  • M. Carnerio et al. Colloquium papers: Adaptive landscapes and protein evolution. Proc. Natl. Acad. Sci. USA (2010)
  • Crona, K., Patterson, D., Stack, K., Greene, D., Goulart, C., Mahmudi, M., Jacobs, S.D., Kallmann, M., Barlow, M.,...
  • J.A. De Loera et al. Triangulations: Applications, Structures and Algorithms. Number 25 in Algorithms and Computation in Mathematics (2010)
  • R. Desper et al. Inferring tree models for oncogenesis from comparative genome hybridization data. Comput. Biol. (1999)
  • H. Flyvbjerg et al. Evolution in a rugged fitness landscape. Phys. Rev. A (1992)
  • J.H. Gillespie. The molecular clock may be an episodic clock. Proc. Natl. Acad. Sci. USA (1984)