## Abstract

The study of fitness landscapes, which aims at mapping genotypes to fitness, is receiving ever-increasing attention. Novel experimental approaches combined with NGS methods enable accurate and extensive studies of the fitness effects of mutations – allowing us to test theoretical predictions and improve our understanding of the shape of the true underlying fitness landscape, and its implications for the predictability and repeatability of evolution.

Here, we present a uniquely large multi-allelic fitness landscape comprised of 640 engineered mutants that represent all possible combinations of 13 amino-acid changing mutations at six sites in the heat-shock protein Hsp90 in *Saccharomyces cerevisiae* under elevated salinity. Despite a prevalent pattern of negative epistasis in the landscape, we find that the global fitness peak is reached via four positively epistatic mutations. Combining traditional and extending recently proposed theoretical and statistical approaches, we quantify features of the global multi-allelic fitness landscape. Using subsets of this data, we demonstrate that extrapolation beyond a known part of the landscape is difficult owing to both local ruggedness and amino-acid specific epistatic hotspots, and that inference is additionally confounded by the non-random choice of mutations for experimental fitness landscapes.

**Author Summary** The study of fitness landscapes is fundamentally concerned with understanding the relative roles of stochastic and deterministic processes in adaptive evolution. Here, the authors present a uniquely large and complete multi-allelic intragenic fitness landscape of 640 systematically engineered mutations in yeast Hsp90. Using a combination of traditional and recently proposed theoretical approaches, they study the accessibility of the global fitness peak, and the potential for predictability of the fitness landscape topology. They report local ruggedness of the landscape and the existence of epistatic hotspot mutations, which together make extrapolation and hence predictability inherently difficult, if mutation-specific information is not considered.

## Introduction

Since first proposed by Sewall Wright in 1932 [1], the idea of a fitness landscape relating genotype (or phenotype) to the reproductive success of an individual has inspired evolutionary biologists and mathematicians alike [2, 3, 4]. With the advancement of molecular and systems biology towards large and accurate data sets, it is a concept that receives increasing attention across other subfields of biology [5, 6, 7, 8, 9]. The shape of a fitness landscape carries information on the repeatability and predictability of evolution, the potential for adaptation, the importance of genetic drift, the likelihood of convergent and parallel evolution, and the degree of optimization that is (theoretically) achievable [4]. Unfortunately, the dimensionality of a complete fitness landscape of an organism – that is, a mapping of all possible combinations of mutations to their respective fitness effects – is much too high to be assessed experimentally. With the development of experimental approaches that allow for the assessment of full fitness landscapes of tens to hundreds of mutations, there is growing interest in statistics that capture the features of the landscape, and that relate an experimental landscape to theoretical landscapes of similar architecture, which have been studied extensively [10]. It is, however, unclear whether this categorization allows for an extrapolation to unknown parts of the landscape, which would be the first step towards quantifying predictability – an advance that would yield impacts far beyond the field of evolutionary biology, in particular for the clinical study of drug resistance evolution in pathogens and the development of effective vaccine and treatment strategies [8].

Existing research in this rapidly growing field comes from two sides. Firstly, different empirical landscapes have been assessed (reviewed in [4]), generally based on the combination of previously observed beneficial mutations or on the dissection of an observed adaptive walk (i.e., a combination of mutations that have been observed to be beneficial in concert). Secondly, theoretical research has proposed different landscape architectures (such as the House-of-Cards, the Kauffman NK, and the Rough-Mount-Fuji model), studied their respective properties, and developed a number of statistics that characterize the landscape and quantify the expected degree of epistasis (i.e., interaction effects between mutations) [11, 12, 13, 10, 14].

The picture that emerges from these studies is mixed, reporting both smooth [15] and rugged [16, 17] landscapes with both positive epistasis (i.e., two mutations in concert are more advantageous than expected; [18]) and negative epistasis (i.e., two mutations in concert are more deleterious than expected; [19, 20]; but see [21, 22]). Current statistical approaches have been used to rank the existing landscapes by certain features [10, 14] and to assess whether they are compatible with Fisher’s Geometric Model [23]. A crucial remaining question is the extent to which the non-random choice of mutations for the experiment affects the topology of the landscape, and whether the local topology is indeed informative as to the rest of the landscape.

Here, we present an intragenic fitness landscape of 640 amino-acid changing mutations in the heat shock protein Hsp90 in *Saccharomyces cerevisiae* in a challenging environment imposed by high salinity. With all possible combinations of 13 mutations of various fitness effects at 6 positions, the presented landscape is not only uniquely large but also distinguishes itself from previously published work regarding several other experimental features – namely, by its systematic and controlled experimental setup using engineered mutations of various selective effects, and by considering multiple alleles simultaneously. We begin by describing the landscape and identifying the global peak, which is reached through a highly positively epistatic combination of four mutations. Based on a variety of implemented statistical measures and models, we describe the accessibility of the peak, the pattern of epistasis, and the topology of the landscape. In order to accommodate our data, we extend several previously used models and statistics to the multi-allelic case. Using subsets of the landscape, we discuss the predictive potential of such modeling and the problem of selecting non-random mutations when attempting to quantify local landscapes in order to extrapolate global features.

## Materials and Methods

Here, we briefly outline the materials and methods used. A more detailed treatment of the theoretical work is presented in the Supporting Information.

### Data generation

Codon substitution libraries consisting of 640 combinations (single, double, triple and quadruple mutants) of 13 previously isolated individual mutants within the 582-590 region of yeast Hsp90 were generated from optimized cassette ligation strategies as previously described in [24] and cloned into the p417GPD plasmid that constitutively expresses Hsp90.

Constitutively expressed libraries of Hsp90 mutation combinations were introduced into the *S. cerevisiae* shutoff strain DBY288 (can1-100 ade2-1 his3-11,15 leu2-3,12 trp1-1 ura3-1 hsp82::leu2 hsc82::leu2 ho::pgals-hsp82-his3) using the lithium acetate method [25]. Following transformation, the library was amplified for 12 hours at 30°C under nonselective conditions using galactose (Gal) medium with 100 *μ*g/mL ampicillin (1.7 g yeast nitrogen base without amino acids, 5 g ammonium sulfate, 0.1 g aspartic acid, 0.02 g arginine, 0.03 g valine, 0.1 g glutamic acid, 0.4 g serine, 0.2 g threonine, 0.03 g isoleucine, 0.05 g phenylalanine, 0.03 g tyrosine, 0.04 g adenine hemisulfate, 0.02 g methionine, 0.1 g leucine, 0.03 g lysine, 0.01 g uracil per liter with 1% raffinose and 1% galactose). After amplification, the library culture was transferred to selective medium similar to Gal medium but with raffinose and galactose replaced by 2% dextrose. The culture was grown for 8 hours at 30°C to allow shutoff of the wild-type copy of Hsp90 and then shifted to selective medium containing 0.5 M NaCl for 12 generations. Samples were taken at specific time points and stored at −80°C.

Yeast lysis, DNA isolation and preparation for Illumina sequencing were performed as previously described [26]. Sequencing was performed by Elim Biopharmaceuticals, Inc and produced ≈30 million reads of 99% confidence at each read position based on PHRED scoring [27, 28]. Analysis of sequencing data was performed as previously described [29].

### Estimation of growth rates

Individual growth rates were estimated according to the approach described by [20] using a Bayesian Markov chain Monte Carlo (MCMC) approach proposed in [30]. Nucleotide sequences coding for the same amino acid sequence were interpreted as replicates with equal growth rates. The resulting MCMC output consisted of 10,000 posterior estimates for each amino acid mutation, corresponding to an average effective sample size of 7,419 (minimum 725). Convergence was assessed using the Hellinger distance approach [31] combined with visual inspection of the resulting trace files.

### Adaptive walks

In the strong selection weak mutation (SSWM) limit [32], adaptation can be modeled as a Markov process consisting only of successive fitness-increasing one-step substitutions that continue until an optimum is reached (so-called adaptive walks). This process is characterized by an absorbing Markov chain with a total of *n* different states (i.e., mutants), consisting of *k* absorbing (i.e., optima) and *n* – *k* transient states (i.e., non-optima). Defining *w*(*g*) as the fitness of genotype *g*, and *g*_{[i]} as the genotype *g* carrying a mutant allele at locus *i*, the selection coefficient is denoted by *s*_{i}(*g*) = *w*(*g*_{[i]}) – *w*(*g*), such that the transition probabilities *p*_{g,g[i]} for going from any mutant *g* to any mutant *g*_{[i]} are given by the selection coefficient normalized by the sum over all adaptive, one-mutant neighbours of the current genotype *g*. If *g* is a (local) optimum, *p*_{g,g} = 1. Putting the transition matrix **P** in its canonical form and computing the fundamental matrix then allows us to determine the expectation and the variance of the number of steps before reaching *any* optimum, and to calculate the probability of reaching optimum *g* when starting from genotype *g*′ [33]. Robustness of the results and the influence of specific mutations were assessed by deleting the corresponding columns and rows in **P** (i.e., by essentially treating the corresponding mutation as unobserved), and re-calculating and comparing all statistics to those obtained from the full data set.
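As an illustration of how such a transition matrix can be assembled, the following minimal sketch builds **P** for a small landscape stored as a dictionary. The function name `transition_matrix`, the tuple encoding of genotypes, and the two-locus fitness values are hypothetical choices made for this example and are not taken from the study.

```python
import numpy as np

def transition_matrix(fitness):
    """Build an SSWM adaptive-walk transition matrix.

    fitness: dict mapping genotype tuples (one allele label per locus)
    to fitness values. One-mutant neighbours differ at exactly one locus.
    """
    genotypes = sorted(fitness)
    index = {g: k for k, g in enumerate(genotypes)}
    n_loci = len(genotypes[0])
    # Alleles observed at each locus (allows more than two per locus).
    alleles = [sorted({g[i] for g in genotypes}) for i in range(n_loci)]
    P = np.zeros((len(genotypes), len(genotypes)))
    for g in genotypes:
        gains = {}
        for locus, options in enumerate(alleles):
            for a in options:
                if a == g[locus]:
                    continue
                h = g[:locus] + (a,) + g[locus + 1:]
                # Keep only observed, fitness-increasing neighbours.
                if h in fitness and fitness[h] > fitness[g]:
                    gains[h] = fitness[h] - fitness[g]
        if not gains:
            P[index[g], index[g]] = 1.0  # local optimum: absorbing state
        else:
            total = sum(gains.values())
            for h, s in gains.items():
                P[index[g], index[h]] = s / total
    return genotypes, P

# Hypothetical two-locus, bi-allelic landscape.
toy = {(0, 0): 1.00, (0, 1): 1.02, (1, 0): 1.06, (1, 1): 1.10}
genotypes, P = transition_matrix(toy)
```

In this toy case the walk from `(0, 0)` moves to `(1, 0)` three times as often as to `(0, 1)`, because the fitness gain is three times larger.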

### Correlation of fitness effects of mutations

Strength and type of epistasis were assessed by calculating the correlation of fitness effects of mutations *γ* [14], which quantifies how the selective effect of a focal mutation is altered when put onto a different genetic background, averaged over all genotypes of the fitness landscape. Extending recent theory [14], we calculated the matrix of epistatic effects between different pairs of alleles (*A*_{i},*B*_{i}) and (*A*_{j},*B*_{j}) termed *γ*_{(Ai,Bi) → (Aj,Bj)} (eq. S1_8), the vector of epistatic effects between a specific pair of alleles (*A*_{i},*B*_{i}) on all other pairs of alleles *γ*(*A*_{i},*B*_{i})→ (eq. S1_9), the vector of epistatic effects between all pairs of alleles on a specific allele pair (*A*_{j},*B*_{j}) termed *γ*→(*A*_{j}, *B*_{j}) (eq. S1_12), and the decay of correlation of fitness effects *γ*_{d} (eq. S1_15) with Hamming distance *d* averaged over all genotypes *g* of the fitness landscape.

### Fraction of epistasis

Following [34] and [35], we quantified whether specific pairs of alleles between two loci interact epistatically, and if so whether these display magnitude epistasis (i.e., fitness effects are nonadditive, but fitness increases with the number of mutations), sign epistasis (i.e., one of the two mutations considered has an opposite effect in both backgrounds) or reciprocal sign epistasis (i.e., both mutations show sign epistasis). In particular, we calculated the type of epistatic interaction between mutations *g*_{[i]} and *g*_{[j]} (with *i* ≠ *j*) with respect to a given reference genotype *g* over the entire fitness landscape. There was no epistatic interaction if |*s*_{i}(*g*_{[j]}) – *s*_{i}(*g*)| < *ɛ* = 10^{-6}, magnitude epistasis if *s*_{j}(*g*)*s*_{j}(*g*_{[i]}) ≥ 0 and *s*_{i}(*g*)*s*_{i}(*g*_{[j]}) ≥ 0, reciprocal sign epistasis if *s*_{j}(*g*)*s*_{j}(*g*_{[i]}) < 0 and *s*_{i}(*g*)*s*_{i}(*g*_{[j]}) < 0, and sign epistasis in all other cases [36].
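The classification rules above can be sketched for a single pair of mutations on one reference background as follows. The function name `classify_epistasis` and its argument names are illustrative assumptions; only the thresholds and sign conditions are taken from the text.

```python
def classify_epistasis(w_ref, w_i, w_j, w_ij, eps=1e-6):
    """Classify the interaction of mutations i and j on a reference genotype.

    w_ref, w_i, w_j, w_ij: fitness of the reference genotype g, the two
    single mutants g_[i] and g_[j], and the double mutant.
    """
    s_i_ref = w_i - w_ref    # s_i(g)
    s_j_ref = w_j - w_ref    # s_j(g)
    s_i_on_j = w_ij - w_j    # s_i(g_[j])
    s_j_on_i = w_ij - w_i    # s_j(g_[i])
    if abs(s_i_on_j - s_i_ref) < eps:
        return "none"
    i_keeps_sign = s_i_ref * s_i_on_j >= 0
    j_keeps_sign = s_j_ref * s_j_on_i >= 0
    if i_keeps_sign and j_keeps_sign:
        return "magnitude"
    if not i_keeps_sign and not j_keeps_sign:
        return "reciprocal sign"
    return "sign"
```

Applying this function to every reference genotype and every pair of alleles at different loci, and tabulating the returned labels, yields the fractions of each epistasis type for a landscape.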

### Roughness-to-slope ratio

Following [11], we calculated the roughness-to-slope ratio *ρ* by fitting the fitness landscape to a multidimensional linear model using the least-squares method. The slope of the linear model corresponds to the average additive fitness effect [10, 23], whereas the roughness is given by the variance of the residuals. Generally, the better the linear model fit, the smaller the variance in residuals such that the roughness-to-slope ratio approaches 0 in a perfectly additive model. Conversely, a very rugged fitness landscape would have a large residual variance and, thus, a very large roughness-to-slope ratio (as in the House-of-Cards model). In addition, to evaluate the statistical significance of the roughness-to-slope ratio obtained from the data set, *ρ*_{data}, we calculated a test statistic *Z*_{ρ} by randomly shuffling fitness values in the sequence space, given by $Z_\rho = (\rho_{\mathrm{data}} - \mathrm{E}[\rho_{\mathrm{shuffled}}]) / \sqrt{\mathrm{Var}[\rho_{\mathrm{shuffled}}]}$, where the expectation and variance are taken over the shuffled landscapes.
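A minimal sketch of this calculation for a bi-allelic landscape is given below. Several choices are assumptions made for illustration rather than details confirmed by the text: genotypes are encoded as 0/1 vectors, the slope is taken as the mean absolute additive coefficient, the roughness as the root-mean-square residual, and the shuffling test is summarized as a standard z-score. The function names and the example effect sizes are hypothetical.

```python
import itertools
import numpy as np

def roughness_to_slope(genotypes, fitness):
    """Roughness-to-slope ratio from a least-squares additive fit."""
    X = np.column_stack([np.ones(len(genotypes)), genotypes])
    coef, *_ = np.linalg.lstsq(X, fitness, rcond=None)
    residuals = fitness - X @ coef
    slope = np.mean(np.abs(coef[1:]))             # assumed definition
    roughness = np.sqrt(np.mean(residuals ** 2))  # assumed definition
    return roughness / slope

def z_statistic(genotypes, fitness, n_shuffles=200, seed=0):
    """Standardize the observed ratio against shuffled landscapes."""
    rng = np.random.default_rng(seed)
    rho_data = roughness_to_slope(genotypes, fitness)
    rho_null = np.array([
        roughness_to_slope(genotypes, rng.permutation(fitness))
        for _ in range(n_shuffles)
    ])
    return (rho_data - rho_null.mean()) / rho_null.std(ddof=1)

# Hypothetical additive landscape on four bi-allelic loci.
genotypes = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
additive = 1.0 + genotypes @ np.array([0.05, 0.02, 0.03, 0.01])
rho_additive = roughness_to_slope(genotypes, additive)
z_additive = z_statistic(genotypes, additive)
```

For the additive example the ratio is numerically zero and the z-score is negative, meaning the landscape is smoother than its shuffled counterparts. A multi-allelic landscape would need one indicator column per non-reference allele instead of a single 0/1 column per locus.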

## Results and Discussion

We used the EMPIRIC approach [24, 26] to assess the growth rate of 640 mutants in yeast Hsp90 (see Materials and Methods). Based on previous screenings of fitness effects in different environments [29] and on different genetic backgrounds [20], and on expectations of their biophysical role, 13 amino-acid changing point mutations at 6 sites were chosen for the fitness landscape presented here (Fig. 1). The fitness landscape was created by assessing the growth rate associated with each individual mutation on the parental background, and all possible mutational combinations. A previously described MCMC approach was used to assess fitness and credibility intervals ([30]; see Materials and Methods).

### The fitness landscape and its global peak

Figure 2 presents the resulting fitness landscape, with each mutant represented based on its Hamming distance from the parental genotype and its median estimated growth rate. Lines connect single-step substitutions, with vertical lines occurring when there are multiple mutations at the same position (Fig. 1). With increasing Hamming distance from the parental type, many mutational combinations become strongly deleterious. This indicates strong negative epistasis between the substitutions that, as single steps on the background of the parental type, have small effects. This pattern is consistent with Fisher’s Geometric Model [37] when combinations of individually beneficial or small-effect mutations overshoot the optimum. It is also intuitively comprehensible on the protein level, where the accumulation of too many mutations is likely to destabilize the protein and render it dysfunctional [38].

Curiously, the global peak of the fitness landscape is located 4 mutational steps away from the parental type (Fig. 2B), with 98% of posterior samples identifying the peak. The fitness advantage of the global peak reaches nearly 10% over the parental type, and is consistent between replicates (see Materials and Methods; Supplementary Fig. 1). Though perhaps surprising given the degree of conservation of the studied genomic region ([24], Fig. S5), it is important to note that these fitness effects are measured under highly artificial experimental conditions including high salinity, which are unlikely to represent a natural environment of yeast. The effects of the individual mutations comprising the peak in a previous experiment without added NaCl were −0.04135, −0.01876, −0.03816, and −0.02115 for mutations W585L, A587P, N588L and M589A, respectively, emphasizing the potential cost of adaptation associated with the increased salinity environment (data from [20]; see also [29]).

The global peak is not reached by combining the most beneficial single-step mutations, but via a highly synergistic combination of one beneficial and three ‘neutral’ mutations (i.e., mutations that are individually indistinguishable from the parental type in terms of growth rate). Figure 2C demonstrates that a multiplicative combination of the four mutations involved in the peak (termed “opt”) predicts only a 4% fitness advantage. Furthermore, even a combination of the four individually most beneficial single-step mutations in the data set (“best”; considering at most one mutation per position) only predicts a benefit of 6%. Notably, the actual combination of these four mutations is highly deleterious and thus exhibits strong negative epistasis. Although negative epistasis between beneficial mutations during adaptation has been reported more frequently, positive epistasis has also been observed occasionally [18, 39], particularly in the context of compensatory evolution. In fact, negative epistasis between beneficial mutations and positive epistasis between neutral mutations have been predicted by de Visser *et al*. [40]. Furthermore, our results support the pattern recently found in the gene underlying the antibiotic resistance enzyme TEM-1 β-lactamase in *E. coli*, showing that large-effect mutations interact more strongly than small-effect mutations such that the fitness landscape of large-effect mutations tends to be more rugged than the landscape of small-effect mutations [13]. However, this conclusion is highly dependent on the measures of epistasis used and the selection of mutants for the landscape [10].

### Adaptive walks on the fitness landscape

Next, we studied the empirical fitness landscape within a framework recently proposed by Draghi and Plotkin [41]. Given the empirical landscape, we simulated adaptive walks and studied the accessibility of the six observed local optima. In addition, we evaluated the length of adaptive walks starting from any mutant in the landscape, until an optimum is reached. In the strong selection weak mutation limit [42], we can express the resulting dynamics as an absorbing Markov chain, where local optima correspond to the absorbing states, and in which the transition probabilities correspond to the relative fitness increases attainable by the neighboring mutations (see Materials and Methods). This allowed us to derive analytical solutions for the mean and variance of the number of steps to reach a fitness optimum (see extended Materials and Methods), and the probability to reach a particular optimum starting from any given mutant in the landscape (Figs. 3, S2_2, S2_3).

Using this framework, we find that the global optimum can be reached with non-zero probability from almost 95% of starting points in the landscape, and is reached with high probability from a majority of starting points – indicating high accessibility of the global optimum (Fig. 3). The picture changes when restricting the analysis to adaptive walks initiating from the parental type (Figs. 3, S2_2, S2_3). Here, although 73% of all edges and 78% of all vertices are included in an adaptive walk to the global optimum, it is reached with only 26% probability. A local optimum two substitutions away from the parental type (Fig. 3C) is reached with a much higher probability of 47%. Hence, adaptation on the studied landscape is likely to stall at a sub-optimal fitness peak. This indicates that the local and global landscape pattern may be quite different, an observation that is confirmed and discussed in more detail below. In line with the existence of multiple local fitness peaks, we find that pairs of alleles at different loci show pervasive sign (30%) and reciprocal sign (8%) epistasis [34], whereas the remaining 62% are attributed to magnitude epistasis (i.e., there is no purely additive interaction between alleles; for a discussion of the contribution of experimental error see Fig. S2_4).

### Epistasis measures and the topology of the fitness landscape

Next, we considered the global topology of the fitness landscape. Various measures of epistasis and ruggedness have been proposed, most of them correlated and hence capturing similar features of the landscape [10]. However, drawing conclusions has proven difficult because the studied landscapes were created according to different criteria. Furthermore, published complete landscapes are too small to be divided into subsets, preventing tests for the consistency and hence the predictive potential of landscape statistics. The landscape studied here provides us with this opportunity. Moreover, because multiple alleles at the same site are contained within the landscape, we may study whether changes in the shape of the landscape are site- or amino-acid-specific.

We computed various landscape statistics (roughness-to-slope ratio, fraction of epistasis, and the recently proposed gamma statistics; see Supplementary Material) [11, 10, 14], and compared them to expectations from theoretical landscape models (NK, Rough-Mount-Fuji (RMF), House-of-Cards (HoC), Egg-Box landscapes). Whenever necessary, we provide an analytical extension of the used statistic to the case of multi-allelic landscapes (see Materials and Methods). To assess consistency and predictive potential, we computed the whole set of statistics for (1) all landscapes in which one amino acid was completely removed from the landscape (a cross-validation approach [43], subsequently referred to as the ‘drop-one’ approach), (2) all possible 360 di-allelic sub-landscapes, and (3) for all 1,570 di-allelic 4-step landscapes containing the parental genotype, highlighting as special examples the three focal landscapes discussed.

We find that the general topology of the fitness landscape resembles that of a RMF landscape with intermediate ruggedness, which is characterized by a mixture of a random HoC component and an additive component (Fig. 4A,B). Whereas the whole set of landscape statistics supports this topology and our conclusions, the gamma statistics measuring landscape-averaged correlations in fitness effects, recently proposed by [14], proved to be particularly illustrative. We will therefore focus on these in the main text; we refer to the Supplementary Material for additional results.

### Predictive potential of landscape statistics

When computed based on the whole landscape and on a drop-one approach, the landscape appears quite homogeneous, and the gamma statistics show relatively little epistasis (Fig. 4B, 5B). At first sight, this contradicts our earlier statement of strong negative and positive epistasis, but it can be understood given the different definitions of the epistasis measures used: above, we measured epistasis based on the deviation from the multiplicative combination of the single-step fitness effects of mutations on the parental background. As these effects were small, epistasis was strong in comparison. Conversely, the gamma measure is independent of a reference genotype and captures the fitness decay with a growing number of substitutions as a dominant and quite additive component of the landscape.

Only mutation 588P has a pronounced effect on the global landscape statistics, and seems to act as an epistatic hotspot by making a majority of subsequent mutations (of individually small effect) on its background strongly deleterious (clearly visible in Fig. 4B, 5C). This can be explained by looking at the biophysical properties of this mutation. In wild-type Hsp90, amino acid 588N is oriented away from solvent and forms hydrogen bond interactions with neighboring amino acids [24]. Proline lacks an amide proton, which inhibits hydrogen bond interactions. As a result, substituting 588N with a proline could disrupt hydrogen bond interactions with residues that may be involved in main chain hydrogen bonding and destabilize the protein. In addition, the pyrrolidine ring of proline is extremely rigid and can constrain the main chain, which may restrict the conformation of the residue preceding it in the protein sequence [44].

The variation between inferred landscape topologies increases dramatically for the 360 di-allelic 6-locus sub-landscapes (Fig. 4C). Whereas they are still largely compatible with an RMF landscape, the decay of landscape-wide epistasis with mutational distance (as measured by *γ*_{d}) shows a large variance, suggesting large differences in the degree of additivity.

Interestingly, sub-sections of the landscape, typically carrying mutation 588P, show a relaxation of epistatic constraint with increasing mutational distance that cannot be captured by any of the proposed theoretical fitness landscape models, suggestive of non-random compensatory interactions. The variation in the shape of the fitness (sub-)landscapes is also reflected in the corresponding roughness-to-slope ratio (inset of Fig. 4C-D), further emphasizing inhomogeneity of the fitness landscape with local epistatic hotspots.

Finally, the 1,570 di-allelic 4-locus landscapes containing the parental genotype, though highly correlated genetically, reflect a variety of possible landscape topologies (Fig. 4D), ranging from almost additive to egg-box shapes, accompanied by an extensive range of roughness-to-slope ratios. The three focal landscapes discussed above do not differ strongly from one another relative to the overall variation, yet they show diverse patterns of epistasis between substitutions (Fig. 5).

Thus, predicting fitness landscapes is difficult indeed. Extrapolation of the landscape, even across only a single mutation, may fail due to the existence of local epistatic hotspot mutations. While the integration of biophysical properties into landscape models is an important step forward [e.g. 45], we demonstrate that such models need to be mutation-specific. Considering a site-specific model (e.g., BLOSUM matrix; [46]) is not sufficient. Newer models such as DeepAlign may provide the opportunity to allow integration of mutation-specific effects via aligning two protein structures based on spatial proximity of equivalent residues, evolutionary relationship and hydrogen bonding similarity [47].

## Conclusion

Originally introduced as a metaphor to describe adaptive evolution, fitness landscapes promise to become a powerful tool in biology to address complex questions regarding the predictability of evolution and the prevalence of epistasis within and between genomic regions. Due to their high-dimensional nature, however, the ability to extrapolate will be paramount to progress in this area, and the optimal quantitative and qualitative approaches to achieve this goal are yet to be determined.

Here, we have taken an important step towards addressing this question via the creation and analysis of a landscape comprising 640 engineered mutants of the Hsp90 protein in yeast. The unprecedented size of the fitness landscape along with the multi-allelic nature allows us to test whether global features could be extrapolated from subsets of the data. Although the global pattern indicates a rather homogeneous landscape, smaller sub-landscapes are a poor predictor of the overall global pattern because of ‘epistatic hotspots’.

In combination, our results highlight the inherent difficulty imposed by the duality of epistasis for predicting evolution. In the absence of epistasis (i.e., in a purely additive landscape) evolution is globally highly predictable as the population will eventually reach the single fitness optimum, but the path taken is locally entirely unpredictable. Conversely, in the presence of (sign and reciprocal sign) epistasis evolution is globally unpredictable, as there are multiple optima and the probability to reach any one of them depends strongly on the starting genotype. At the same time, evolution may become locally predictable with the population following obligatory adaptive paths that are a direct result of the creation of fitness valleys owing to epistatic interactions.

The empirical fitness landscape studied here appears to be intermediate between these extremes. Although the global peak is within reach from almost any starting point, there is a local optimum that will be reached with appreciable probability, particularly when starting from the parental genotype. From a practical standpoint, these results thus highlight the danger inherent to the common practice of constructing fitness landscapes from ascertained mutational combinations. However, this work also suggests that one promising way forward for increasing predictive power will be the utilization of multiple small landscapes used to gather information about the properties of individual mutations, combined with the integration of site-specific biophysical properties.

## Supporting Information 1: Extended Materials and Methods

### Adaptive walks

Under the strong selection weak mutation (SSWM) limit [42], adaptation follows an absorbing Markovian process characterized by a series of fitness-increasing substitutions along one-mutant neighbours until reaching a fitness optimum (forming a so-called adaptive walk), with a total of *n* different states (i.e., mutants), consisting of *k* absorbing (i.e., optima) and *n* ‒ *k* transient states (i.e., non-optima). Defining *w*(*g*) as the fitness of genotype *g*, and *g*_{[i]} as the genotype *g* carrying a mutant allele at locus *i*, the selection coefficient is denoted by

$$s_i(g) = w(g_{[i]}) - w(g),$$

such that the transition probabilities for going from any mutant *g* to any mutant *g*′ are given by the transition matrix

$$P_{g,g'} = \begin{cases} \dfrac{s_i(g)}{\sum_{j \in M(g)} s_j(g)} & \text{if } g' = g_{[i]} \text{ with } i \in M(g), \\ 1 & \text{if } g' = g \text{ and } M(g) = \emptyset, \\ 0 & \text{otherwise,} \end{cases}$$

where *M*(*g*) := {*j*: *s*_{j}(*g*) > 0, **D**_{HD}(*g*,*g*_{[j]}) = 1} denotes the set of all adaptive, one-mutant neighbours of the current genotype *g*.

The canonical form of **P** can then be obtained by permutation, such that

$$\mathbf{P} = \begin{pmatrix} \mathbf{Q} & \mathbf{R} \\ \mathbf{0} & \mathbf{I}_k \end{pmatrix},$$

where **Q** is a (*n* – *k*) × (*n* – *k*) matrix which contains the transition probabilities between transient states; **R** is a (*n* – *k*) × *k* matrix which gives the transition probabilities from any transient to any absorbing state; **0** is the *k* × (*n* – *k*) zero matrix; and **I**_{k} is the *k* × *k* identity matrix [33].

Using the above representation, all basic properties of the absorbing Markov chain can be calculated from the fundamental matrix

$$\mathbf{N} = (\mathbf{I}_{n-k} - \mathbf{Q})^{-1}.$$

In particular, the expected number of steps before absorption (i.e., the expected number of steps on the fitness landscape before reaching any optimum) is given by

$$E[t] = \mathbf{N}\mathbf{1},$$

where **1** is a column vector of length (*n* ‒ *k*) with all entries being 1, and the *i*^{th} entry of *E*[*t*] gives the expected number of steps when starting from state (mutant) *i*.

Similarly, the variance in the number of steps before being absorbed can be computed as

$$\mathrm{Var}[t] = (2\mathbf{N} - \mathbf{I}_{n-k})\,E[t] - E[t] \odot E[t],$$

where ⊙ denotes the Hadamard product.

Finally, the probability of being absorbed in state *j* when starting from transient state *i* (i.e., reaching optimum *j* when the initial genotype is *i*), is given by the (*n* – *k*) × *k* matrix

$$\mathbf{B} = \mathbf{N}\mathbf{R}.$$

Thus, these methods give an easy and computationally fast way of quantifying and predicting adaptive walks on fitness landscapes. Furthermore, the robustness of these results and the influence of particular mutants can be assessed by deleting the *i*^{th} column and row of **P** (i.e., by essentially treating mutant *i* as unobserved), recalculating the above statistics and comparing these to the statistics obtained from the full data set. Similarly, entire mutations (i.e., amino acids) can be left out to assess their relative effect on the fitness landscape and the generality of our analysis (and the statistics used).
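The absorbing-chain quantities described in this section can be computed in a few lines. The sketch below uses the standard results for absorbing Markov chains. The function name `absorbing_chain_stats` and the four-state example matrix are hypothetical, and absorbing states are identified by a diagonal entry of 1, which is an implementation choice made here.

```python
import numpy as np

def absorbing_chain_stats(P):
    """Expected steps, variance of steps, and absorption probabilities."""
    P = np.asarray(P, dtype=float)
    absorbing = np.isclose(np.diag(P), 1.0)
    transient = ~absorbing
    Q = P[np.ix_(transient, transient)]   # transient -> transient
    R = P[np.ix_(transient, absorbing)]   # transient -> absorbing
    identity = np.eye(Q.shape[0])
    N = np.linalg.inv(identity - Q)       # fundamental matrix
    expected = N @ np.ones(Q.shape[0])
    variance = (2 * N - identity) @ expected - expected * expected
    B = N @ R
    return expected, variance, B

# Hypothetical chain: states 0 and 1 are transient, 2 and 3 are optima.
P_example = np.array([
    [0.0, 0.5, 0.3, 0.2],
    [0.0, 0.0, 0.4, 0.6],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
expected, variance, B = absorbing_chain_stats(P_example)
```

For large landscapes, solving the linear systems with `np.linalg.solve` rather than forming the inverse explicitly would be the more numerically careful choice; the explicit inverse is kept here to mirror the formulas.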

### Measuring epistasis

We applied different metrics for quantifying epistasis over the entire fitness landscape as well as for particular mutations, and assessed their consistency in capturing the strength of gene × gene interactions. In particular, we follow the definition of epistasis by [48] (originally termed epistacy) as the deviation from additivity when combining two genetic effects, measured as the difference in log-fitness between the effect of the double mutant and the effects of the single mutants, each relative to the wild-type fitness.
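As a minimal sketch of this definition, the deviation from additivity on the log scale can be computed from four fitness values. The function name and the numerical values are hypothetical, and the sketch assumes that all fitnesses are strictly positive.

```python
import math


def pairwise_epistasis(w_wt, w_single_a, w_single_b, w_double):
    """Deviation from additivity in log-fitness for two mutations.

    Computes log(w_AB) + log(w_wt) - log(w_A) - log(w_B); a value of zero
    means the two effects combine additively on the log scale.
    """
    return (math.log(w_double) + math.log(w_wt)
            - math.log(w_single_a) - math.log(w_single_b))


# Hypothetical fitness values, for illustration only.
no_interaction = pairwise_epistasis(1.0, 1.1, 1.2, 1.1 * 1.2)  # approximately 0
negative = pairwise_epistasis(1.0, 1.1, 1.2, 1.2)              # below 0
```

A negative value indicates that the double mutant is less fit than expected from the two single mutants (negative epistasis); a positive value indicates the opposite.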

### Correlation of fitness effects of mutations: γ

The first measure has recently been introduced by [14] and is defined as the single-step correlation of fitness effects for mutations between neighbouring genotypes. It quantifies how the selective effect of a focal mutation is altered when it occurs in a different genetic background, averaged over all genotypes of the fitness landscape. Geometrically, *γ* measures the correlation between slopes (with respect to the genotype-fitness hypercube) of the same mutation put into different genetic backgrounds. Thus, if the fitness effect of a mutation is independent of its genetic background (i.e., if there is no epistasis), the correlation in slopes will be perfect (*γ* = 1), whereas it will be zero if the fitness slopes of each genotype are independent of the fitnesses of other genotypes (as in the House-of-Cards model; 49). Depending on the scale, *γ* can either be used to quantify the strength of gene × gene interactions between specific mutations or as an overall measure for the entire fitness landscape. However, in its original form *γ* was defined for bi-allelic data only and thus needs to be extended by considering pairs of specific alleles at different loci [14].
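The bi-allelic case can be sketched as follows. This is an illustration under stated assumptions, not the multi-allelic estimator developed in this section: it assumes a complete bi-allelic landscape stored as a dictionary keyed by 0/1 tuples, and it assumes that *γ* is estimated as the summed product of a mutation's fitness effect in a genotype and in its one-step neighbour, divided by the summed squared effects. The function names are hypothetical.

```python
from itertools import product


def flip(genotype, locus):
    """Return the one-mutant neighbour that differs at `locus`."""
    return genotype[:locus] + (1 - genotype[locus],) + genotype[locus + 1:]


def gamma_biallelic(fitness, n_loci):
    """Single-step correlation of fitness effects on a complete bi-allelic landscape.

    `fitness` maps every tuple in {0, 1}^n_loci to a (log-)fitness value.
    A flat landscape (all effects zero) would give a zero denominator.
    """
    numerator = 0.0
    denominator = 0.0
    for g in product((0, 1), repeat=n_loci):
        for j in range(n_loci):
            effect = fitness[flip(g, j)] - fitness[g]  # effect of mutation j in g
            for i in range(n_loci):
                if i == j:
                    continue
                background = flip(g, i)  # same genotype, one step away at locus i
                effect_bg = fitness[flip(background, j)] - fitness[background]
                numerator += effect * effect_bg
                denominator += effect * effect
    return numerator / denominator


# Additive landscape with hypothetical per-locus effects: no epistasis.
effects = (1.0, 2.0, 3.0)
additive = {g: sum(e * a for e, a in zip(effects, g))
            for g in product((0, 1), repeat=3)}

# Two-locus landscape with reciprocal sign epistasis.
reciprocal = {(0, 0): 1.0, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 1.0}
```

Under these assumptions the additive landscape yields *γ* = 1, because every mutation has the same effect in all backgrounds, whereas the reciprocal-sign example yields *γ* = −1, because each effect is exactly reversed in the neighbouring background.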

Let **A**_{i} = {1_{i}, 2_{i}, …, *m*_{i}} denote the set of different alleles present at locus *i* for all polymorphic loci *i* ∈ {1, 2, …, *n*}, such that |**A**_{i}| ≥ 2 for all loci *i*. Further, let *G* be the set of all genotypes that can be formed by combining all alleles, such that the total number of genotypes is |*G*| = ∏_{i=1}^{n} |**A**_{i}|.

Then the matrix of epistatic effects between loci *i* and *j* carrying alleles *A*_{i}, *B*_{i} ∈ **A**_{i} and *A*_{j}, *B*_{j} ∈ **A**_{j} is given by
where *g* := {*x* ∈ *G* | (*x*_{i} = *A*_{i} or *x*_{i} = *B*_{i}) and (*x*_{j} = *A*_{j} or *x*_{j} = *B*_{j})} ⊆ *G*, such that the sum is only calculated over the subset of genotypes carrying one of the two focal alleles at each focal locus. Thus, *γ*(*A*_{i}, *B*_{j}) is a quadratic matrix of dimension . Note that in the case where |**A**_{i}| = 2 for all loci *i*, we obtain equation 9 in [14].

Likewise, the epistatic effect of a mutation in locus *i* with alleles (*A*_{i},*B*_{i}) on other loci (and pairs of alleles) can be calculated as
where the summation index **a**_{j} = {(*A*_{j}, *B*_{j}) | *A*_{j}, *B*_{j} ∈ **A**_{j} and *A*_{j} ≠ *B*_{j}} runs over the set of subsets of size two that can be constructed from all alleles found at locus *j*. Note that the third summation index *g* changes depending on **a**_{j}.

An additional summation gives the epistatic effect of a mutation in locus *i* carrying allele *A*_{i} on other loci (and pairs of alleles) as
where **f**_{i} = {(*A*_{i}, *B*_{i}) | *B*_{i} ∈ **A**_{i} and *A*_{i} ≠ *B*_{i}}, such that the sum is only calculated over those subsets of size two, constructed from all alleles found at locus *i*, that contain allele *A*_{i}.

Then, summing over **l**_{i} = {(*A*_{i}, *B*_{i}) | *A*_{i}, *B*_{i} ∈ **A**_{i} and *A*_{i} ≠ *B*_{i}}, i.e., the elements of the set of subsets of size two that can be constructed from all alleles found at locus *i*, gives the epistatic effect of a mutation in locus *i*

Similarly, the epistatic effect of other mutations (again considering pairs of alleles first) on locus *j* with alleles (*A*_{j}, *B*_{j}) can be calculated as
the epistatic effect of other mutations on locus *j* carrying allele (*A*_{j}) is given by
and the epistatic effect of other mutations on locus *j* becomes

Finally, *γ*_{d}, the decay of the correlation of fitness effects with Hamming distance *d* (i.e., the cumulative epistatic effect of *d* mutations averaged over the entire fitness landscape), is calculated as
where the last summation is over all different alleles present at locus *j* except the one carried by genotype *g* at locus *j*. Note that there is unfortunately no multi-allelic analog to equation (14) of [14]. Furthermore, as desired, when there are only two alleles at a given locus, equations (S1_9 – S1_11) and equations (S1_12 – S1_14) collapse and give identical values, and reduce to their bi-allelic counterparts (i.e., eq. 7-8 in 14).

### Fraction of epistasis

The second statistic quantifies whether specific pairs of alleles between two loci interact epistatically, and if so whether they display magnitude epistasis (i.e., fitness effects are nonadditive, but fitness increases with the number of mutations), sign epistasis (i.e., one of the two mutations considered has opposite effects in the two backgrounds) or reciprocal sign epistasis (i.e., if both mutations show sign epistasis; 34, 35; for an implementation see [36]).

Using equation (S1_1), the type of epistatic interaction between mutations at loci *i* and *j* (with *i* ≠ *j*) when introduced on some reference genotype *g*, *e*(*g*,*i*,*j*), can formally be given as

Note that for numerical purposes we allowed for a very small deviation *ε* = 10^{‒6} in the case of no epistatic interaction. For ease of notation, we treat the elements in **A**_{i} as fixed and ordered, and define *I*(*g*_{[i]}) as the index of the allele present at locus *i* in genotype *g* in **A**_{i}, such that the fraction of epistasis over all *G* can be calculated as
where *x* ∈ {none, magnitude, sign, reciprocal sign}, *1*_{x}(*y*) is the indicator function that is 1 if *x* = *y* and 0 otherwise, and I is a normalization constant.
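The classification into the four types can be sketched for a single pair of mutations as follows. This is an illustration under assumptions: fitness values are taken to be on the log scale, the tolerance of 10^{‒6} follows the text, the function names are hypothetical, and the fractions are normalized simply by the number of quadruples supplied, which may differ from the normalization constant used above.

```python
from collections import Counter

TOLERANCE = 1e-6  # deviation allowed for "no epistasis", as stated in the text


def classify_epistasis(f_ab, f_Ab, f_aB, f_AB, tol=TOLERANCE):
    """Classify the interaction of two mutations from four log-fitness values.

    f_ab is the reference genotype, f_Ab and f_aB the two single mutants,
    and f_AB the double mutant.
    """
    deviation = f_AB + f_ab - f_Ab - f_aB
    if abs(deviation) < tol:
        return "none"
    # A mutation shows sign epistasis if its effect changes sign between
    # the two backgrounds defined by the other locus.
    sign_first = (f_Ab - f_ab) * (f_AB - f_aB) < 0
    sign_second = (f_aB - f_ab) * (f_AB - f_Ab) < 0
    if sign_first and sign_second:
        return "reciprocal sign"
    if sign_first or sign_second:
        return "sign"
    return "magnitude"


def epistasis_fractions(quadruples):
    """Fraction of each epistasis type over an iterable of fitness quadruples."""
    counts = Counter(classify_epistasis(*q) for q in quadruples)
    total = sum(counts.values())
    return {kind: counts[kind] / total
            for kind in ("none", "magnitude", "sign", "reciprocal sign")}


# Hypothetical quadruples (reference, single, single, double), one per type.
examples = [
    (0.0, 1.0, 1.0, 2.0),  # additive
    (0.0, 1.0, 1.0, 3.0),  # same signs, non-additive
    (0.0, 1.0, 2.0, 1.5),  # first mutation changes sign
    (0.0, 1.0, 1.0, 0.5),  # both mutations change sign
]
fractions = epistasis_fractions(examples)
```

In the multi-allelic setting, the quadruples would be generated by looping over all reference genotypes and all pairs of alternative alleles at two distinct loci.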

## Acknowledgements

This project was funded by grants from the Swiss National Science Foundation (FNS) and a European Research Council (ERC) Starting Grant to JDJ. Computations were performed at the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the Swiss Institute of Bioinformatics (SIB). We thank Ines Fragata and Ana-Hermina Ghenu for helpful comments and discussion.