Predicting an epistasis-rich genotype-phenotype map with a coarse-grained bottom-up model of budding yeast polarity

Accurate phenotype prediction based on genotypical information has numerous societal applications, such as design of useful crops of cellular factories. However, the prevalence of epistasis, a phenomenon that prevents many biological systems to perform in accordance with the sum of its parts, necessitates modelling the complex path between genotype and phenotype. Defining intermediate levels in this path reduces the complexity of prediction, and may also elucidate the phenotype coupling to other levels by evolution. Inconveniently, the latter requires definitions that maintain biophysical justification from the bottom-up, which conflicts with tractability. By means of a cell growth model, we exemplify a resolution for this conflict by polarization of Cdc42p in budding yeast, a process requiring clustering of active Cdc42p to one zone on the membrane and known to generate ample epistasis. Concretely, our model parsimoniously encompasses constant membrane area growth, stochastic Cdc42p turnover and a simple, justifiable polarity rule we define as the ‘mesotype’. Through intuitively interpretable simulations, we describe previously documented, yet puzzling epistasis inside the polarity module. Moreover, we generate evolutionary relevant predictions e.g., on environmental perturbations, which are general enough to apply to other systems. We quantify how poor growth medium can equalize fitness differentials and enables, otherwise very distinct, evolutionary paths. For example, the fitness of the crippled Δbem1 relative to WT can easily be raised from 0.2 to above 0.95. Finally, we can extend our predictions on epistasis to other modules. We determine that modelled epistasis predictions only add predictive value when functional information of the involved modules is included. This inspires a road-map towards modelling the bidirectional genotype-phenotype map for other model systems with abundant interactions, where the intermediate levels reveal targets that evolution can optimize and facilitate a biophysical justifiable incorporation of epistasis. Author summary Efforts to understand how traits follow from genes facilitate a broad range of applications. For example, crops can be engineered faster to better resist drought, salt and heat stress, and medicines can be better tailored to individuals. Unfortunately, the path from genes to traits can generally involve a complex interplay of hundreds of genes and gene products whose individual contributions can be heavily context-dependent. In this work, we provide the proof-of-concept in a relatively simple system for a road-map towards elucidating this path. We have constructed a cell growth model for budding yeast, only involving simple rules on membrane growth, protein production and centrally, polarity, the process where yeast chooses the future division site. Despite the simplicity, the polarity rule is fully justifiable from underlying biophysics. Model simulations show good accordance with formerly puzzling traits, and also predict the ease with which the environment can change evolutionary paths. While lab conditions may prohibit the emergence of certain polarity mutations, this becomes much more feasible ‘in the wild’. The tractable model nature allows us to extrapolate the context dependence of mutational effects beyond polarity, showing that this method for understanding trait generation also helps to elucidate protein evolution.


68
Many fields, such as personalized medicine [1], agriculture [2], chemical production [3] 69 and forensics [4], will greatly benefit from advances in understanding of the so-called genotype-70 phenotype (GP) map, the way that traits are connected to genes. However, this connection can 71 be quite complex even for known heritable traits ("missing heritability") [5], limiting the power 72 of genome-wide association studies [6]. On the one hand, one gene can be responsible for 73 multiple traits, pleiotropy, although this may not always be very common [7]. On the other 74 hand, multiple genes can contribute to one trait. Frequently, their individual effects are non-75 4 additive in humans [8,9], but also in model systems as Escherichia coli [10] or Saccharomyces 76 cerevisiae (budding yeast) [11], a phenomenon known as epistasis. Theoretically, epistasis is 77 expected to surface very easily based on metabolic network analysis [12], and has some known 78 molecular origins [13]. While epistasis can be inconsequential for fitness evolution [14], its 79 presence complicates the predictions of phenotypes from genotypes and consequently gene 80 evolution [15,16]. Therefore, predictions on epistasis constitute an important challenge for GP-81 map models. for additional observables that fine-tune predictions, but an abstract, unobservable entity as a 87 definition is also possible. Most importantly, a level serves to break up and re-bundle the 88 intertwined paths from individual genes to traits such that a more modular and hence more 89 tractable picture arises. In that view, a suitable level definition acts as a tree which branches out 90 to otherwise difficult to connect genotypes and phenotypes. 91

92
Multiple level examples exist, such as the biofunctional gene ontology level (ontotypes) [19], 93 the network based trophic level [20], the diffuse endophenotypes [21] and the mathematical 94 system design space [22]. Ideally, a one-level-fits-all approach exists, where the level definition 95 facilitates understanding of the emergence of phenotypes from genotypes, while at the same 96 time elucidating the handles for evolution, the reverse path in the GP-map. This

Coarse-grained bottom-up model design 157
Coarse description of cell expansion. We modeled the yeast cell cycle as a process involving 158 three modules, namely coarse-grained cell growth, protein turnover and cell polarity ( Fig. 2A). 159 A parsimonious approach to cell membrane growth was chosen consisting of two stages of 160 constant membrane area growth, as several alternative formulations proved immaterial for 161 8 phenotype description (S1 Fig.). The membrane expansion rates (C1 for G1, C2 elsewhere) were 162 brought in decent agreement with literature [35,36] (see S2 Table for  The first stage involves isotropic growth of a spherical cell of radius rm(t), which corresponds 184 to the G1 phase including the Start transition, at the end of which a checkpoint must be passed, 185 which is further explained in the following paragraph. Thereafter, the cell switches to the second 186 stage of growth, where the membrane grows in a polarized manner defining a bud with radius 187 rb(t), while the rest of the mother cell retains its size. The bud membrane growth area is constant 188 for the modelled equivalent of the S, G2 and M phase but larger than for the mother in G1. Bud 189 growth lasts until the second checkpoint, at which the bud proceeds as an independent cell when 190 it has reached a sufficient size (rb=rm cr). performed. The observed trends in G1 times along the evolutionary trajectory from WT to the 279 fully evolved mutant in that paper were qualitatively matched, including the unusual inversion 280 for the Δbem1 (Fig. 3B). The logic behind this inversion is that for WT cells in slower growth 281 medium, the size requirement is the most important criterion for the first checkpoint of Fig

Genetic interaction predictions 297
Poorer medium quality reduces fitness differentials. As aforementioned, the effect of the 298 environmental effects such as changes in growth media quality can be integrated in the model 299 15 through a change in membrane area growth rates C1 and C2. To assess the evolutionary 300 consequences of poorer medium content, we considered a roughly three-fold area growth rate 301 range that caused WT fitness to span between 0.5 and 1 (normalized to maximum growth). The intuition for this result is as follows. As seen in Fig. 3A, the Δbem1 background suffers 315 from the high Cdc42p concentration threshold, relevant at checkpoint 1 ( fig. 2A), and recover 316 fitness when this threshold is lowered by successive GAP deletions. Fig. 2B had in turn shown 317 the strong negative influence of dilution on the ability to exceed this threshold. Therefore, 318 Δbem1 cells benefit greatly from reducing the speed of membrane growth, while WT cells, for 319 which the threshold is not a problem at all, only suffer from slowing down the membrane 320 growth. An unmodelled inhibitor of this effect would be a reduced Cdc42p production in 321 medium with lower quality. However, Cdc42p expression is at least known to remain stable 322 upon switching from dextrose to ethanol, an inferior carbon source [44]. 323 324 Information on biological function of mutated genes are a prerequisite for predicting 325 epistasis. To assess whether we can extend the model predictions beyond polarity, we focus 326 on predictions of epistasis. This is the most generalizable quantity to assess cross-modular 327 interactions and is as mentioned in the introduction critical for constructing GP-maps. For this 328 purpose, we considered high-throughput data on numerous mutants, with varying levels of 329 detail regarding the mutant phenotypes, which we define as the information content. This 330 information will determine the precision with which the mutant can be incorporated into the 331 model. 332 333 Concretely, we restrict ourselves to epistasis between general mutants and Δbem1, since we 334 suspected that fitness differences in this ill background are exaggerated and hence more likely 335 to have been picked up in literature. We used Bayesian analysis on the prevalence of epistasis 336 signs to determine what degree of information on the general mutants add value to sign 337 predictions. The general mutants were absorbed in the model in three different ways; either 338 using the coarse information on the single deletion phenotype (deleterious or beneficial), or the 339 mid-detail information on the single deletion phenotype (faster, slower, larger or smaller in G1), 340 or the functional information (proteasomal, phospholipid or ribosomal). Within these three 341 categories, there is a further subdivision into two sets, based on whether the model predicts 342 positive or negative epistasis with Δbem1 (Fig. 5A). Firstly, mutants of which the coarse information are incorporated through modifying the 356 membrane area growth rates, concretely smaller and larger rates for deleterious and beneficial 357 mutants respectively. As seen in Fig. 4, smaller rates reduce the deleterious effect of the Δbem1, 358 prompting the prediction that negative epistasis with Δbem1 is generally more prevalent for 359 deleterious mutants than for beneficial mutants. The analysis shows no evidence that this 360 statement is correct (only a 20% chance, Fig. 5B). 361 362 Analogously, integrating the mutants on mid-detail information implies changing tG1,min (shorter 363 when fast in G1, longer when slow) or rmin (lower when small in G1, higher when large). 364 Mutants with shorter tG1,min and lower rmin disproportionally benefit the Δbem1 which suffers 365 most from Cdc42p dilution before the mesotype checkpoint. Therefore, the model prediction is 366 that mutants fast or small in G1 have more negative epistasis with Δbem1 than mutants that are 367 slow or large in G1. Still, the experimental evidence is not compelling (70% chance). 368 369 Finally, when incorporating the mutants using functional information, we lower τh 370 (proteasomal), membrane growth rates (phospholipid) and mean burst size pb (ribosomal). The 371 former two, which mitigate the problematic lack of Cdc42p in the Δbem1 to some extent, should 372 exhibit more negative epistasis than the latter one, which deteriorates the Δbem1 situation. 373 There is strong positive evidence for this statement (using the rules-of-thumb on Bayesian odds 374 ratios [45]), which is true with around 95% certainty. This displays the benefit of integrating 375 mutants based on functional information. 376 377

378
We have constructed, verified, validated and applied a coarse-grained growth model 379 encompassing the newly defined mesotype in order to describe phenotypes (subject to epistasis) 380 from genotypes or predict these. When ample molecular information is present, as is the case 381 for Bem1p and the GAPs, this strategy is quite successful to predict cell cycle times, given the 382 largely good quantitative matches in Fig. 3A and C and qualitative match for the peculiar G1 383 time inversion for the Δbem1 compared to WT (Fig. 3B). 384 385 Additionally, the information content about the phenotypes, associated with mutated 386 genes, required for predicting epistasis was assessed as it is a general hurdle for GP-map 387 19 models. As the mutant is encapsulated in our model through more detailed phenotypes, the 388 prediction quality increases accordingly. Typically, functional information is required to make 389 meaningful epistasis sign predictions (Fig. 5) (such as in the case with Nrp1p) is used, predictions can still be of decent quality (Fig. 3A). The 394 efficacy of phenomenologically integrating Nrp1p into this model provided substance to the 395 claim that this protein is mechanistically involved in shortening G1. Since obtaining near-396 complete information on the function of proteins is not within reach for most organisms, it is 397 comforting that mildly positive results may be achieved with phenomenological information 398 when building an otherwise biophysically justifiable bottom-up model. 399

400
Because the yeast polarity example shows the feasibility of our modelling strategy, we aim to 401 provide a road-map to apply these to general genotype-phenotype maps (Fig. 6). The core 402 functional component, in this case polarity, is modelled by justifiable coarse-graining, which 403 results in the mesotype of the system. This mesotype in turn emerges from functional subunits 404 quantities. For example, the GAP epistasis is accurately retrieved (Fig. 3A), and the prediction 423 of the poor medium effect to reduce fitness differentials (Fig. 4) readily allows interpretation. 424 The benefit of slower medium for the ill mutant Δbem1 fits the picture that haploinsufficiency 425 in YPD is typically lifted in poorer medium [47], and opens up a distinct avenue for adaptation. 426 Given that laboratory conditions are much more comfortable than the conditions under which 427 historical evolution has taken and is taking place, the likelihood of fixation of a polarization 428 network optimized on Bem1p or an rescue mechanism (as experimentally occurring in [27]) 429 becomes much more similar than naively expected. Moreover, this insight is quantifiable, we 430 show e.g., that merely slowing WT down by a factor of 2 reduces the relative fitness differential 431 to 0.05. Given that BEM1 has comparable characteristics to an essential gene, the evolvability 432 of essential genes may be greater than anticipated.  Fig. 3A were fitted using the native fminsearch on a normalized score 454 objective for varying [Cdc42]min and manual inspection for setting tmut (to 0.75) for the nrp1 455 deletion. Interaction and phenotype data for Fig. 5