Uninterpretable interactions: epistasis as uncertainty

Epistasis is a common feature of genotype-phenotype maps. Understanding the patterns of epistasis is critical for predicting unmeasured phenotypes, explaining evolutionary trajectories, and for inferring the biological mechanisms that determine a map. One common approach is to use a linear model to decompose epistasis into specific pairwise and high-order interactions between mutations. Such interactions are then used to identify important biology or to explain how the genotype-phenotype map shapes evolution. Here we show that the coefficients extracted from such analyses are likely uninterpretable. They cannot be extracted reliably from experimental genotype-phenotype maps due to regression bias. Further, we can generate epistatic “interactions” indistinguishable from those in experimental maps using a completely random process. From this, we conclude that epistasis should be treated as a random, but quantifiable, variation in these maps. This perspective allows us to build predictive models with known error from a small number of measured phenotypes. It also suggests that we need mechanistic, nonlinear models to account for epistasis and decompose genotype-phenotype maps.


Introduction 21
Epistasis-that is, non-additivity between mutations-is a ubiquitous feature of genotype-22 phenotype maps (Fowler et al., 2010;Weinreich, 2011;Weinreich et al., 2013;Yokoyama 23 et al., 2014;Anderson et al., 2015;Palmer et al., 2015;Podgornaia and Laub, 2015;Doud 24 and Bloom, 2016; Boyle et al., 2017;Hopf et al., 2017;Chan et al., 2017;Sailer and Harms, 25 2017a; Starr et al., 2017;Domingo et al., 2018;Weinreich et al., 2018). Epistasis can pro-26 vide mechanistic insight into the determinants of phenotypes (Schreiber and Fersht, 1995;27 Horovitz, 1996;Ritchie et al., 2001;Segrè et al., 2005); however, it also complicates predict-28 ing unmeasured phenotypes (de Visser and Krug, 2014;Miton and Tokuriki, 2016;Sailer and 29 Harms, 2017b; Nyerges et al., 2018), as the effect of a mutation changes depending on the 30 presence or absence of other mutations. Despite a century of work (Fisher, 1918), epista-31 sis remains challenging to analyze and interpret (Cordell, 2002;Phillips, 2008;Crow, 2010;32 Weinreich et al., 201332 Weinreich et al., , 2018. 33 One approach is to decompose epistasis into specific pairwise and high-order interactions 34 between mutations (Heckendorn and Whitley, 1999;Weinreich et al., 2013;Poelwijk et al., 35 2016; Sailer and Harms, 2017a;Poelwijk et al., 2017;Weinreich et al., 2018). This is often 36 done by treating each coefficient as a linear and independent perturbation to the additive 37 phenotype (Heckendorn and Whitley, 1999;Poelwijk et al., 2016). Such an approach is a 38 direct extension of classic approaches in quantitative genetics and biochemistry. In a genetics 39 context, one might measure the effect of a mutation in two genetic backgrounds to dissect 40 metabolic and regulatory pathways (Ritchie et al., 2001;Segrè et al., 2005). Likewise, mutant 41 cycles are a mainstay of biochemistry. Introducing mutations individually and together 42 allows one to infer the nature of physical interactions between residues in macromolecules 43 (Schreiber and Fersht, 1995;Horovitz, 1996). 44 Although linear epistasis models are very commonly used (Weinreich et al., 2013;Yokoyama 45 et al., 2014;Anderson et al., 2015;Palmer et al., 2015;Starr et al., 2017;Domingo et al., 46 2018), two recent observations raise questions about their utility. The first is that regression 47 can lead to biased estimates of linear epistatic coefficients, and thus poor predictive power 48 of epistatic models (Otwinowski and Plotkin, 2014). The second is that one can generate 49 maps with extensive pairwise and high-order epistasis using a toy model of proteins that do 50 not explicitly include such interactions (Sailer and Harms, 2017b). This indicates that there 51 may be no simple way to relate linear epistatic coefficients back to underlying biology, thus 52 undermining their utility as indicators of biological mechanism. 53 Motivated by these concerns, we set out to systematically investigate linear epistatic 54 models constructed from twelve published genotype-phenotype maps. We focused on two 55 criteria for utility: the ability of such models to predict unmeasured phenotypes and the 56 ability of such coefficients to provide mechanistic insight into the map. We studied maps for 57 which all 2 L combinations of L mutations were measured. Because these maps have the same 58 number of observations as coefficients in a high-order epistatic model, they can be readily 59 decomposed into epistatic coefficients from second to L th -order. Further, the selected maps 60 cover many different classes of genotypes, phenotypes, and total magnitudes of epistasis. 61 We find that the epistatic coefficients we extract by regression from such maps are 62 quite poor at predicting unmeasured phenotypes. This arises from bias in the regressed 63 coefficients-exactly as predicted by Otwinowski and Plotkin (Otwinowski and Plotkin,64 2014). Further, we find we can generate epistatic coefficients similar to experimental co-65 efficients by simply using randomly assigned phenotypes. This suggests that the pairwise 66 and high-order interactions we extract are likely decompositions of random noise. We there-67 fore propose that we should not decompose genotype-phenotype maps into specific interac-68 tions between mutations using linear models. Rather, in the context of a whole genotype-69 phenotype map, epistasis is best interpreted as a global metric capturing roughness (Szendro 70 et al., 2013;Ferretti et al., 2018). This translates directly to a measure of uncertainty on 71 predicted phenotypes, as well as an indication that an improved mechanistic model is re-72 quired.

74
Linear epistasis models 75 We used a linear epistasis model to decompose genotype-phenotype maps into up to L th -order 76 epistatic coefficients. The model is linear in that it consists of a collection of independent 77 epistatic coefficients that are summed to describe each phenotype (Fisher, 1918;Poelwijk 78 et al., 2016). (The assumption of linearity contrasts with other models, such as a Potts 79 model, in which mutations sum in a nonlinear fashion (Hopf et al., 2017)). There are two 80 common formulations a linear epistasis model, the Hadamard model (sometimes called a 81 Walsh or Fourier model) and the biochemical model (Poelwijk et al., 2016). The approaches 82 differ in their choice of coordinate origin. Each model has been described in detail elsewhere 83 (Heckendorn and Whitley, 1999;Weinreich et al., 2013;Poelwijk et al., 2016). The two mod-84 els are related by a simple set of linear transformations (Poelwijk et al., 2016). Throughout 85 the text, we describe our results using the Hadamard model, but our conclusions are robust 86 to the choice of model (see supplemental figures referenced throughout the text).

87
The Hadamard model uses the geometric center of the map as the coordinate origin 88 (Heckendorn and Whitley, 1999;Weinreich et al., 2013;Poelwijk et al., 2016;Sailer and 89 Harms, 2017a). Each genotype is made up of L sites. In a binary genotype-phenotype map, 90 the sites have two possible states: "wildtype" or "derived". Both states have equal effects but 91 opposite signs. Each mutation is treated as a linear perturbation away from the origin of 92 the map, where β origin is the origin of the genotype-phenotype map, β i is the effect of site i, and x i is 94 1 if site i is "wildtype" and −1 if "derived". We can then add linear coefficients to describe 95 interactions between mutations to Eq. 1. For pairwise interactions, this has the form: where β ij is a pairwise epistatic coefficient. For the high-order model, the expansion contin-97 ues: The model can be expanded all the way to L th -order interactions.

99
Linearizing experimental genotype-phenotype maps Prior to extracting epistatic coefficients from experimental genotype-phenotype maps, we 101 corrected each map for global epistasis, which arises when mutations combine on some scale 102 other than an additive scale (Chou et al., 2011;Tokuriki et al., 2012;Schenk et al., 2013;103 Sailer and Harms, 2017a; . This violates the assumption of linearity 104 inherent in the epistasis models (Fisher, 1918;Cordell, 2002;Sailer and Harms, 2017a). 105 Global epistasis manifests as a non-normal distribution of the residuals between the P obs 106 (the vector of observed phenotypes) and P add (the vector phenotypes calculated using an 107 additive model) (Sailer and Harms, 2017a;. Such epistasis can 108 be minimized by identifying a nonlinear function T that captures global curvature in the 109 relationship between P obs and P add , yielding normally distributed fit residuals (Box and Cox, 110 1964;Szendro et al., 2013;Sailer and Harms, 2017a;: where ε are the fit residuals. We linearized all experimental maps by fitting a second-order 112 spline to the P obs vs. P add curve for each map prior to extracting linear epistatic coefficients 113 .

114
Epistasis and linear regression 115 We used linear regression to regress epistasis models against experimental and simulated 116 genotype-phenotype maps. We formulated the problem as follows: where P obs is a vector of observed phenotypes (corrected for global epistasis), β is a vector of 118 epistatic coefficients, X is a matrix that encodes the sign of each coefficient according to Eq. 119 3, and ε is a vector of residuals. The goal was to estimate coefficients in β that minimized 120 the magnitudes of the values in ε.

121
We used three different regression approaches: ordinary least-squares, lasso, and ridge. 122 The number of coefficients in these maps grows rapidly with the number of sites. For a 123 binary map with L sites, there are 2 L possible fit coefficients. Lasso and ridge regression 124 are strategies to identify only those coefficients that contribute significantly to the variation 125 in the data. These strategies have been used previously to dissect linear epistatic models 126 (Otwinowski and Plotkin, 2014;Poelwijk et al., 2017). Throughout the text, we describe 127 results using lasso regression, but our conclusions are robust to the choice of regression 128 strategy (see supplemental figures referenced throughout the text).

129
Simulating epistatic genotype-phenotype maps 130 We constructed genotype-phenotype maps using Equations 1 and 3. First, we set the addi-131 tive coefficients to random values drawn from a normal distribution. We then added all 2 nd -132 through L th -order epistatic coefficients. We set the values of the coefficients to random values 133 drawn from a different normal distribution. The widths of the additive and epistatic distri-134 butions were tuned to match the relative magnitudes of epistatic coefficients extracted from 135 experimental maps. Further, we could tune the fraction of epistasis in a simulated genotype-136 phenotype map by changing the relative widths of the additive and epistatic distributions 137 with respect to one another.

154
Regression yields biased estimates of epistatic coefficients 155 We started with a straightforward question: What fraction of a genotype-phenotype map 156 must we observe to resolve a linear epistatic model that predicts unmeasured phenotypes? 157 We simulated a genotype-phenotype map consisting of all 2 8 binary combinations of 8 muta-158 tions. We then assigned random epistatic coefficients using an 8th-order Hadamard matrix, 159 such that epistasis accounted for 20% of the variation in phenotype (see methods). The 160 epistatic coefficients were similar in magnitude and sign to those extracted from experimen-161 tal genotype-phenotype maps ( Fig S1).

162
To test our ability to predict phenotypes, we masked a fraction of the genotypes, fit linear 163 epistatic models to the unmasked genotypes, and attempted to predict the masked genotypes. 164 We then calculated the correlation between the model and unmasked observations (ρ 2 train ) 165 and the model and masked observations (ρ 2 test ). We repeated this for 1,000 pseudo-replicate 166 training and test sets.

167
As a starting point, we fit the additive model (Eq. 1). We found that the additive model 168 converged on ρ 2 train = ρ 2 test = 0.8 when 30% of the map was used for the fit (red lines, Fig 169   1A). The model converges once each mutation has been observed across a sufficient number 170 of genetic backgrounds to average out the epistatic perturbations to the phenotype. Because, 171 by construction, 20% of the variation in the map is due to epistasis, the best the additive 172 model can do is explain 80% of the variation in phenotype. 173 We next tried to improve our predictive power by adding coefficients describing either 174 pairwise interactions between mutations (Eq. 2) or all interactions (up to eighth-order) (Eq. 175 3). Because Eq. 3 is the model we used to generate the map, this model should, in principle, 176 be able to explain all variation in the map.
additive model (green and blue lines, Fig 1A). Even when 90% of the genotypes were included 179 in the training set, the pairwise and high-order models had ρ 2 test of 0.73 and 0.62-much less 180 than the value of 0.80 achieved by the additive model. Worse, this failure to predict the 181 test set was accompanied by much higher ρ 2 train values. The high-order model, in particular, 182 had a correlation of 1.0 with the training set ( Fig 1A,  The divergence between these curves indicates that the regression fails to extract the correct 191 values for the epistatic coefficients.

192
This can also be seen by examining the values of the extracted epistatic coefficients. The 193 blue curve in Fig 1C shows  We found that all of these maps exhibited statistically significant pairwise and high-order 220 epistatic interactions. Epistasis contributed from 6% to 79% of the variation in these maps 221 (Fig 2A). 222 We next probed our ability to extract predictive epistatic coefficients from the linearized 223 maps. We created a training set consisting of 80% of the genotype-phenotype pairs in each 224 map, regressed models against this set of observations, and then predicted the phenotypes of 225 the remaining 20% of the genotypes. As above, we fit the additive, pairwise and high-order 226 models. We then repeated this for 1,000 pseudo-replicate training and test sets on each map. 227 As with our simulations, we found we could not reliably extract predictive epistatic 228 coefficients ( Fig 2B). In 11 of 12 maps, the additive model performed better than any other 229 model. In seven of the twelve maps (I, VII, VIII, IX, X, XI, and XII), ρ 2 test consistently 230 decreased with each addition of epistatic coefficients. In four of the maps (II, III, V, and 231 VI) the addition of pairwise epistasis led to a drop in ρ 2 test that was partially offset by the 232 addition of high-order coefficients. Ultimately, however, the high-order model did no better 233 than the additive model in these maps. Map IV was the the only map in which adding 234 epistatic coefficients had any positive effect: the addition of pairwise epistasis led to a small 235 increase in ρ 2 test (from 0.80 to 0.87). This is achieved, however, by increasing the number of 236 fit parameters from 10 to 40, implying that each epistatic coefficient contributed very little 237 to the overall model. As with the simulated maps, these observations were robust to the 238 choice of epistatic model and regression strategy ( Fig S3).

239
Experimental epistatic coefficients cannot be distinguished from a 240 random model

241
These results indicate that predictive, linear epistatic coefficients cannot be estimated by 242 regression in these genotype-phenotype maps. We must characterize essentially every pheno-243 type in a genotype-phenotype map to resolve the epistatic coefficients that describe the map. 244 But, if we have measured every phenotype, there are no more phenotypes to predict. One 245 might conclude that understanding epistasis requires measuring every genotype-phenotype 246 pair in a map.

247
Given the effort required to measure every phenotype, we posed another question: is it 248 worth exhaustively characterizing a map just to extract epistatic coefficients? Or, put dif-249 ferently, are the epistatic coefficients one can decompose from a complete map informative? 250 We approached this question by comparing the epistatic coefficients extracted from an ex-251 perimental genotype-phenotype map to those extracted from a null model. Our null model 252 was a random map: we generated phenotypes with an additive model and then perturbed 253 each phenotype by a random value drawn from a normal distribution centered at zero. This 254 is an appropriate null model because the generating model has no mechanistic interactions 255 at all; any correlations between mutations arise from noise. Such a map consists entirely of 256 "statistical" epistasis (Cordell, 2002). 257 We decomposed the epistasis in Map VIII using all 32 measured phenotypes and compared 258 the resulting epistasis to our null model. Fig 3A-C shows the epistasis extracted from the 259 experimental map. In this map 26% of the variation in phenotype is due to epistasis (Fig 260   3B). The residuals between the additive model and the observed phenotypes are normally 261 distributed ( Fig 3B). When we decompose the epistasis, we find that pairwise coefficients 262 capture 16.2% and high-order coefficients capture 9.2% of the variation in phenotype.

263
We next constructed our null map. We generated a collection of random additive co-264 efficients and calculated P add for each genotype. We then added random perturbations to 265 each phenotype, drawn from a normal distribution with a mean of 0 and a standard devia-266 tion selected to yield a total magnitude of epistasis similar to the experimental map. This 267 sampling procedure gave the P obs vs. P add curve shown in Fig 3E. As with the experimental 268 map, epistasis accounted for 26% of the total variation in the map. We then decomposed 269 this random epistasis with a high-order epistasis model (Fig 3F).

270
The overall structure of epistasis is indistinguishable between the experimental and a 271 random map, even though the values of the specific epistatic coefficients are different (Fig 272   3C vs. F). If we generate many random maps-effectively, sampling over the possible config-273 urations of epistatic coefficients that arise from a random variation in phenotype-we cannot 274 distinguish the experimental map from among the decoys (Fig S4). This suggests that the 275 linear epistatic coefficients extracted from this map should be viewed as decompositions of 276 random noise, unless this can be shown otherwise.

277
Using an additive model to treat epistasis

278
Our results speak against decomposing epistasis into collections of linear interaction 279 terms. So how should we treat epistasis? We will touch on nonlinear treatments in the 280 Figure 3: Experimental maps resemble random maps. Panels A and D show genotypephenotype maps. Each node is a genotype; each edge is a single point mutation. Colors indicate quantitative phenotype. Panels B and E show the correlation between the observed phenotypes and the additive model, with fit residuals shown below the plot. Panels C and F indicate the magnitude of epistasis in each map as in Fig 2A (top subpanel) and the values of all model coefficients (bottom subpanel). Colors indicate additive components (red), pairwise components (green), and high-order components (blue). Each bar shows the value of a single model coefficient: the red bars correspond to the 5 additive coefficients, the green bars to the 10 pairwise coefficients, and the blue bars to the 17 high-order coefficients. Panels A-C are for experimental map VIII; panels D-F are for a simulated map with random epistasis.
discussion, but before doing so, we will explore our top-performing epistasis model from 281 above: the additive model.

282
The additive model treats epistasis as residual variation not explicitly accounted for by 283 the model. If we measure the phenotypes of a set of combinatorial genotypes, we observe the 284 effect of each mutation in a large number of genetic backgrounds (Fig 4A). We can describe 285 the effect of mutation i with two numbers, its average effect β i and the variance of its 286 effect σ 2 i . This same logic applies at the level of whole genotypes. If we have linearized the 287 genotype-phenotype map (Szendro et al., 2013;Sailer and Harms, 2017a;Otwinowski et al., 288 2018), the residuals between P obs and P add will be normally distributed ( Fig 3B). As a result, 289 the phenotype of a genotype g is given by: 290 P obs,g = P add,g ±ξ (6) where ξ is the standard deviation of the residuals between P obs and P add . This is the basic 291 definition of epistasis given by Fisher (Fisher, 1918), applied across the whole map.

292
This view is particularly useful for predicting unmeasured phenotypes. First: it means 293 each predicted phenotype has a known, normally distributed uncertainty. Even if a large 294 amount of variation remains unexplained by the additive model, it is safely partitioned into 295 a random normal distribution. Put another way, ξ acts as a prediction interval. Second: 296 because the additive model has few terms, we can train it using a very small amount of data. 297 Following this line of reasoning, we asked how many phenotypes we would have to measure 298 to construct a maximally predictive additive model. We constructed additive maps with 299 different alphabet sizes (ranging from 2 to 5) and numbers of mutations (ranging from 6 to 300 8). We then injected random epistasis ranging in magnitude from 10% to 60% of the variation 301 in the phenotype. We simulated experiments where we measured one random genotype at a 302 time, added it to our observations, and predicted the phenotypes of the remaining genotypes. 303 We then plotted ρ 2 test as a function of the average number of times we saw each individual 304 Figure 4: Epistasis as uncertainty. A) A partially characterized map. Circles represent genotypes, some of which have been measured (filled), some of which have not (unfilled). Lines represent single point mutations. Given these observations, we measure the effect of mutation 1 in five different backgrounds (red arrows) and can thus calculate the mean and variance in its effect across the map ( β 1 and σ 1 ). B) ρ 2 test versus the average number of times each mutation is seen in randomly sampled genotype-phenotype maps with epistasis responsible for 10% (blue) to 60% (brown) of the variation in the maps. Points indicate where ρ 2 test is within 5% of the maximum predictive power of the additive model. C) A calibration curve indicating how many times, on average, one must observe each mutation in map to resolve the additive coefficients in a map with different fractions of epistasis. mutation across all genetic backgrounds ( n obs ).

305
When plotted as a function of n obs , ρ 2 test rapidly rises and then saturates at the mag-306 nitude of the epistasis in the map, independent of alphabet size and number of mutations 307 ( Fig 4B). We next asked, as a function of the magnitude of the epistasis in the map, when 308 our predictions would be within 0.05 of the best achievable ρ 2 test . This is indicated by the 309 points on Fig 4B. We plotted these values as a function of the magnitude of the epistasis in 310 the map. This reveals a linear relationship between the average number of times we need to 311 see each mutation and the total epistasis in the map (Fig 4C). 312 We set out to test this approach using a partially sampled, experimental genotype-313 phenotype map characterizing the binding specificity of dCas9 to 23-base-pair oligonu-314 cleotides ( Fig 5A). The published experiment sampled 59, 394 of the 7 × 10 13 (4 23 ) possible 315 oligonucleotides. Although all bases were sampled at all positions, there was significant bias 316 towards a specific base at each position in the library (Fig 5A). The map exhibited a highly 317 non-linear relationship between P obs and P add (Fig 5B), so we linearized the map with a 318 5th-order spline (Eq. 4), yielding normal residuals between P obs,linearized and P add (Fig 5C). 319 We then assessed the predictive power of the map: we added genotypes individually to a 320 training set and evaluated our ability to predict the test set. We found that we were able 321 to fit a model using ≈ 4, 000 genotypes to predict the remaining ≈ 55, 000 measurements. 322 Because of the biased sampling of genotypes in the map, it took 4, 000 genotypes to observe 323 each individual mutation a sufficient number of times to resolve the additive effects of all 324 mutations ( Fig 5D). Our prediction curve saturated after we had seen each mutation at least 325 39 times. This is in good agreement with our calibration curve on simulated data, which 326 indicated we would need to observe each mutation an average of 40 times (with random 327 sampling) to saturate an additive model in which epistasis was responsible for 38% of the 328 variation in the map.

329
The predictive power of this model is quite good considering its simplicity: we are able 330 to predict any phenotype to ±38% given we only sampled one, one-billionth of a percent 331 of the map. Extensive epistasis remains, but it follows a normal distribution with a known 332 standard deviation. While there are certainly more sophisticated models, an additive model 333 provides significant predictive power for this map.

335
Our results suggest that a linear model should not be used to extract pairwise and 336 multi-way interactions between mutations in a genotype-phenotype map. Regressed epistatic 337 coefficients are biased (Fig 1B), unstable to the addition of new data (Fig 1C), and not 338 useful for predicting unmeasured phenotypes (Fig 2A). Far from being an anomaly, this 339 appears to be a shared feature of a collection of a dozen high-precision, combinatorially-340 complete genotype-phenotype maps ( Fig 2B). Further, we can generate epistatic coefficients 341 very similar to those observed in these maps using a simulated map in which we added 342 random, normally distributed noise to each phenotype (Fig 3). This argues for viewing 343 epistatic coefficients as uninterpretable decompositions of random variation, unless shown 344 Figure 5: A predictive, additive model can be trained on a large genotypephenotype map. A) Summary of the genotype-phenotype map reported in (Boyle et al., 2017). Map consists of 23 sites, each with four bases with the frequency at each site shown in the sequence logo. The total map has 7 × 10 13 genotypes; the publication reports measured phenotypes for 59, 394 genotypes. B) Raw P obs vs. P add plot for the map. Each point is a genotype. The fit residuals are shown below the main plot. We fit an 5th-order spline to linearize the map (red curve). C) The linearized form of the map, with epistasis removed using the spline shown in panel B. D) A predictive model can be trained using ≈ 4, 000 genotypes.
The bottom x-axis shows the number of unique genotypes used to train the model (sampled randomly); the top x-axis shows the fewest number of times any mutation was seen in that sample given the bias in the frequencies of the input mutations. ρ 2 test was measured against the remaining 50, 000+ genotypes not used to train the model. otherwise.

345
Viewed mechanistically, this is unsurprising. The epistatic models under investigation 346 assume linearity, but biology is nonlinear (Bershtein et al., 2006;Lehner, 2011;Chou et al., 347 2011;Tokuriki et al., 2012;Schenk et al., 2013;Sailer and Harms, 2017a;Otwinowski et al., 348 2018). There is no reason to believe that a linear model will capture a complicated non-349 linear system in a predictive and interpretable way. For example, we showed recently that 350 we could generate high-order epistasis using a toy thermodynamic model of proteins with 351 only explicit pairwise interactions (Sailer and Harms, 2017b). The epistasis arise because 352 mutations have a nonlinear effect on the relative populations of individual protein confor-353 mations. As a result, epistatic coefficients cannot be interpreted mechanistically-they are 354 purely "statistical" (Cordell, 2002). 355 Further, our results indicate that the signs and magnitudes of specific epistatic interac-356 tions extracted from genotype-phenotype maps have no universal meaning. For example, 357 in Fig 1C, the selected pairwise coefficient flips between positive, zero, and negative. If 358 different genotypes of the map are characterized, we obtain different values for the pairwise 359 coefficient, and thus a different interpretation for the effect of epistasis on the phenotype.

360
Treating epistasis with an additive model 361 A simple way to treat epistasis is as the residual variation after fitting an additive model (Eq. 362 6). Despite its simplicity, this is a useful perspective. It can be used to predict unmeasured 363 phenotypes in a genotype-phenotype map with known uncertainty. This is because deviation 364 from the additive model is determined by the magnitude of epistasis in the map (Fig 4A). 365 Further, the simplicity of the model means we can characterize an extremely sparse sample 366 of combinations of mutations across a genotype-phenotype map and still predict missing 367 phenotypes ( Fig 4C).

368
This suggests that sparsely sampling combinatorial genotypes, rather than aiming to 369 exhaustively characterize point mutants, may be a powerful way to understand and predict 370 genotype-phenotype maps. As long as each mutation is seen across a sufficiently large number 371 of genetic backgrounds, we can resolve its average effect across a volume of the genotype-372 phenotype map. In contrast, exhaustively sampling point mutations in a single background-373 such as a deep mutational scan-will yield mutational effects specific to whatever genetic 374 background is used. Epistasis is not averaged out, meaning such coefficients should not 375 provide high predictive power when mutations are combined.

376
Interpretation of epistasis as a prediction interval only holds when the fit residuals 377 are normally distributed about zero. Curvature between P obs and P add will lead to non-378 normal residuals and, thus, a distorted picture of the uncertainty (Sailer and Harms, 2017a;379 Otwinowski et al., 2018). Multiple methods exist for linearizing genotype-phenotype maps, 380 including taking the log of phenotypes (Cordell, 2002), power transforms (Sailer and Harms,381 2017a), splines , and even mechanistic models (Schenk et al., 2013;382 Otwinowski, 2018). This is one area for improvement, as better global models will decrease 383 the amount of variation that must be explained by the additive model. For example, in 384 our analysis of the dCas9 binding specificity, there is still structure in the residuals, with 385 some clustering along P add despite linearization using an 5th-order spline (Fig 5C). A model 386 that captures such variation could improve predictive power. Further improvement of global 387 models will thus be an important area of investigation.

388
Moving away from linear models 389 We see three promising ways forward. The first is to view epistasis in terms of its conse-390 quences for evolutionary trajectories. This includes metrics like the number of accessible 391 trajectories, number of fitness peaks, and summary statistics such as the roughness to slope 392 ratio (Szendro et al., 2013;Ferretti et al., 2018;Crona et al.). These metrics generally do 393 not allow prediction of unmeasured phenotypes nor mechanistic understanding of the map, 394 but can provide useful insights into evolutionary trajectories and outcomes without the poor 395 behavior observed in linear epistasis models.

396
The second is to use non-biological, nonlinear models to extract information from each 397 map. These include tools such as Potts models (Figliuzzi et al., 2016;Hopf et al., 2017), 398 variational auto encoders (Riesselman et al., 2017;Sinai et al., 2017), and neural networks 399 (Wang et al., 2017;Ma et al., 2018). Such approaches can yield predictive models of genotype-400 phenotype maps, and will no doubt continue to grow in popularity and sophistication. One 401 downside to these models is a requirement for massive amounts of training data-which may 402 not always be feasible, even in the modern high-throughput era. Further, it may be difficult 403 to link such models to an underlying biological mechanism.

404
The third is to attempt to model the underlying mechanistic process that leads to the 405 map (Tokuriki et al., 2012;Schenk et al., 2013;Otwinowski, 2018;Dutta et al., 2018). 406 Rather than taking a "top-down" approach, in which one dissects epistasis into statistical 407 interactions that are hopefully meaningful, one can instead take a "bottom-up" approach, 408 in which one calculates phenotypes from genotypes using a mechanistic biological model. 409 This model can then be trained against measured phenotypes. This provides a predictive 410 model for unmeasured phenotypes, as well as providing mechanistic insight into map between 411 genotype and phenotype. A good example is that of Schenk et al, who dissected a genotype-412 phenotype map by explicitly modeling the effect of each mutation on protein stability and 413 enzymatic activity (Schenk et al., 2013). This model captured extensive variation in the map 414 that could not be described with a linear model, while also providing mechanistic insight 415 into the protein under investigation.

417
Epistasis was described by Fisher as residual variation left over after fitting an additive 418 model (Fisher, 1918). While it may sometimes be productive to separate these residuals into 419 specific statistical coefficients, a better approach is to build better model. In our view, the 420 long-term goal should not be interpreting epistatic interactions between mutations; rather, 421 the long-term goal should be building mechanistic models that fit experimental observations 422 and, ultimately, make epistasis disappear.