GroEL/S helps purge deleterious mutations and reduce genetic diversity during adaptive protein evolution

Chaperones are proteins that help other proteins fold. They also affect the adaptive evolution of their client proteins by buffering deleterious mutations and increasing the genetic diversity of evolving proteins. We study how the bacterial chaperone GroE (GroEL + GroES) affects the evolution of green fluorescent protein (GFP). To this end, we subjected GFP to multiple rounds of mutation and selection for its color phenotype in four replicate E. coli populations, and studied its evolutionary dynamics through high-throughput sequencing and mutant engineering. We evolved GFP both under stabilizing selection for its ancestral (green) phenotype and under directional selection for a new (cyan) phenotype. We did so under both low and high expression of the chaperone GroE. In contrast to prevailing wisdom, we observe that GroE does not just buffer but also helps purge deleterious mutations from evolving populations. In doing so, GroE helps reduce the genetic diversity of evolving populations. In addition, it causes phenotypic heterogeneity in mutants with the same genotype, potentiating their effect in some cells, and buffering it in others. Our observations show that chaperones can affect adaptive evolution through more than one mechanism.

Highlights
GroE reduces genetic diversity
GroE potentiates the effect of deleterious mutations
GroE intensifies purifying selection and leads to higher activity of client proteins


In most proteins, the majority of amino acids help provide a stable structural scaffold, whereas fewer amino acids are directly responsible for catalysis or other protein activities [1]. Protein evolution is thus constrained by mutations that destabilize a protein's three-dimensional fold [2,3]. Such mutations can reduce protein activity and organismal fitness, for example by reducing the amount of correctly folded and thus active protein. They can also increase a protein's propensity to form toxic aggregates of misfolded proteins [4-6]. Mutations that create a new protein activity are especially often destabilizing [7-10].

Cells encode multiple proteins called chaperones that are dedicated to helping other proteins fold correctly and maintain their fold. Chaperones act via various mechanisms, such as the stabilization of newly synthesized polypeptides, the acceleration of the folding process, and the refolding of misfolded proteins. This diversity of mechanisms is reflected in a diversity of chaperone structures [11-13]. Prominent chaperone classes include the protein family Hsp60 (heat shock protein with a molecular weight of 60 kDa), the Hsp70, Hsp90 and Hsp100 families, as well as the trigger factor. Chaperones from all these families exist in both bacteria and eukaryotes [11-13].

The GroEL/S complex (GroE) is one of the major chaperones in bacteria. It is composed of the essential proteins GroEL and GroES [14,15], and belongs to the Hsp60 family. Eukaryotes also express a GroE homolog, which helps mitochondrial and chloroplast proteins fold.
Structurally, GroE belongs to a class of chaperones known as chaperonins, which form a cylindrical cage that entraps an unfolded polypeptide molecule and allows it to refold [16].

During adaptive evolution, chaperones can facilitate the evolution of various organismal traits, [...] single protein mutations [28,29] or small populations of variants [17,18]. In contrast, we maintained large populations of more than 10^5 evolving proteins in which many variants segregate during multiple rounds of directed evolution.

Specifically, we studied the influence of GroE on the adaptive evolution of green fluorescent protein (GFP) in E. coli cells that overexpress GroE. We subjected GFP to directed evolution experiments in which we alternated cycles of (PCR-mediated) mutation with selection imposed by fluorescence-activated cell sorting (FACS), both with and without overexpression of GroE. In phase 1 of our experiments, we performed five rounds ("generations") of evolution under stabilizing selection on the ancestral green fluorescent phenotype. We followed phase 1 with a phase 2, in which we imposed directional selection on the new color phenotype of cyan fluorescence during an additional five rounds. We studied both stabilizing and directional selection, because a chaperone might have different effects under different types of selection.

We chose GFP in this study for several reasons. First, its light-emission phenotype can be easily measured at single cell resolution in a high throughput manner using flow cytometry. Second, it allows us to exert selection in a highly controlled manner via FACS. Third, GFP is non-native to the E. coli host, and interferes less with the host's cell physiology, growth, and metabolism than native proteins would. Fourth, GFP is a known GroE client, that is, the chaperone can promote GFP folding [34].

We studied the genotypic and phenotypic evolution of GFP via high-throughput single molecule real time (SMRT) sequencing, protein engineering, and phenotypic analysis. We focused on a key prediction that distinguishes the buffering and potentiation hypotheses: If a chaperone buffers the deleterious effects of mutations, then it should help increase genetic diversity in a population over time, because some mutations that would otherwise be deleterious would be tolerated in its presence. Conversely, if a chaperone potentiates the effect of such mutations, it should lead to a loss of genetic diversity, because it renders such mutations more deleterious. We note that a chaperone may buffer some mutations and potentiate others. Our experiments show that both buffering and potentiation can occur in the same population, but that potentiation far outweighs buffering in its effects on genetic diversity.

Experimental design

To evolve GFP under conditions of varying GroE (GroEL + GroES) expression, we first constructed an E. coli plasmid (Figure S2) that expresses GFP constitutively, and that allowed us to vary chaperone expression via an arabinose-inducible promoter. With this expression system, we studied GFP evolution at different chaperone expression levels. We note that GroEL and GroES are essential proteins, such that the chromosomal genes groS and groL cannot be deleted. Thus, when we refer to GroE expression throughout, we strictly refer to overexpression of GroE from the expression plasmid. Consistent with a previous demonstration that GFP is a client of GroE [34], we found that chaperone expression affects the fluorescence of our ancestral GFP protein (Figure S4).

We performed directed evolution in four replicate populations that overexpressed GroE (condition G+) and in four other populations that did not (G−). In each round (generation) of evolution and for each population, we introduced random mutations into GFP via error-prone PCR at a rate of ∼1 nucleotide substitution per GFP-coding gene, corresponding to approximately 0.95 amino acid changes per GFP protein (Materials and Methods). The size of the population bottleneck during our experiment was ∼10^5 individuals, such that genetic drift plays a negligible role on the time scale of the experiment.

We selected cells for survival using FACS (Figure 1) under two selection regimes that distinguish phase 1 from the later phase 2 of our experiments. In phase 1 we selected cells that showed the native (ancestral) GFP phenotype of green fluorescence. In phase 2, we selected for the new phenotype of cyan fluorescence. Each phase consisted of five rounds ("generations") of mutagenesis and selection.
In both phases we applied weak rather than strong selection for high fluorescence, because we reasoned that strong selection may favor mutants that fold well on their own and thus do not require chaperone assistance, which would subvert the intent of our study.

GroE expression slows the decay of fluorescence under weak stabilizing selection.

The vast majority of mutations affecting protein evolution are deleterious [36-38]. Because phase 1 evolution involved only weak selection on our ancestral green fluorescence phenotype, we would expect that such mutations accumulate in our phase 1 populations. This was indeed the case. We measured the distribution of green fluorescence of 10^5 single cells from G+ and G− populations at the end of each round of phase 1 evolution. During all five generations, green fluorescence consistently declined in all populations relative to the ancestor (Figure 2). However, median fluorescence of G+ populations declined significantly more slowly than in G− populations (p = 10^-7, linear mixed effects model [LMM], type-III analysis of variance [ANOVA] using Satterthwaite's method; Materials and Methods). As a result, at the end of phase 1 evolution, all G+ populations showed significantly higher median green fluorescence than G− populations (p = 0.0088, one-tailed Mann-Whitney U-test).

GroE slows genetic diversification under weak stabilizing selection.

We next turned to the mechanism by which the chaperone slows down the decay of green fluorescence. If this mechanism relies on buffering the effects of deleterious mutations, then chaperone expression should help increase genetic diversity over time, because some mutations that would otherwise be eliminated by purifying selection could remain in the population. Conversely, if the chaperone acts predominantly by potentiating the effect of deleterious mutations, it should help reduce genetic diversity, because more such mutations would be subject to purifying selection. We define a deleterious mutation as one that reduces fluorescence, because in our experiments selection acts on fluorescence.
We note that buffering and potentiation may occur simultaneously in the same population, i.e., GroE may buffer the effect of some mutations while it potentiates the effect of others. To find out which process dominates in its effect on genetic diversity, we se- [...] (LMM: ANOVA, p < 10^-5). We performed analogous analyses for the average pairwise distance between the genotypes in the same population (Figure 3B), and for the Shannon entropy (Figure 3C), an information-theoretic measure of genetic diversity. We found that both these diversity metrics also increase more slowly in G+ populations (LMM: ANOVA, p < 0.0012).
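
The two diversity metrics named above can be illustrated with a minimal Python sketch. It assumes equal-length aligned sequences, Hamming distance as the pairwise distance, and Shannon entropy computed per site and summed over positions. These definitions, the function names, and the toy sequences are illustrative assumptions, not necessarily the exact definitions used in this study.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mean_pairwise_distance(seqs):
    """Average Hamming distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def shannon_entropy(seqs):
    """Per-site Shannon entropy (in bits), summed over all alignment positions."""
    n = len(seqs)
    entropy = 0.0
    for column in zip(*seqs):
        for count in Counter(column).values():
            freq = count / n
            entropy -= freq * log2(freq)
    return entropy


# Hypothetical toy population of four aligned protein fragments.
population = ["MSKGE", "MSKGE", "ISKGE", "MGKGE"]
print(mean_pairwise_distance(population))
print(shannon_entropy(population))
```

Both metrics are zero for a population of identical sequences and grow as variants accumulate, which is why a slower increase indicates slower diversification.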

In sum, GroE reduces genetic diversity in our evolving populations. This supports the view that it predominantly potentiates rather than buffers the effects of deleterious mutations, and thus helps purge such mutations.

In addition to affecting the overall amount of genetic diversity, GroE may cause different kinds of genotypes to accumulate. To find out whether this is the case, we randomly sampled 200 sequences from each population, and displayed the location of these sequences in genotype space using [...] 1 in two conditions, one where the chaperone is not overexpressed and one where it is, then fluorescence should be higher when the chaperone is overexpressed. This is not necessarily expected under the potentiation hypothesis, where the chaperone may have simply helped eliminate deleterious mutations, and the remaining mutations may or may not be chaperone dependent. In addition, a potentiating chaperone may also decrease the fluorescence of some of the GFP variants that remain in the final population.

To find out whether fluorescence at the end of phase 1 evolution is chaperone dependent, we measured the fluorescence of those populations that had evolved while GroE was overexpressed, both with and without the induction of the chaperone (Figure 4), and compared their median fluorescence using a Mann-Whitney U-test. In three out of four populations chaperone expression increased fluorescence (p < 1.5 × 10^-4), and in one (replicate 3) it decreased fluorescence (p < 10^-15). Although these differences are statistically highly significant because of the large number of individuals we analyzed (> 77000), we also note that they are small in magnitude, ranging from 2 to 11%. They contrast with the much greater differences that emerge in fluorescence during evolution (Figure 2), most of which must be caused by potentiation.
In sum, some mutational buffering takes place in our evolving populations but its effect on overall fluorescence is small. This conclusion is reinforced by specific candidates for buffered mutants that we engineered and analyzed phenotypically (SOM section 8).

GroE disfavors the accumulation of deleterious mutations.

To further validate the hypothesis that GroE helps purge deleterious mutations, we examined our sequence data for single amino acid variants that attained significantly lower frequency in G+ than in G− populations at the end of phase 1 (Materials and Methods). To keep this analysis tractable, and to focus on those mutations that are likely to affect fluorescence most strongly, we restricted this analysis to variants whose frequency exceeded 3.5% at the end of evolution in at least one replicate population (Figure S10). We note that this frequency threshold is higher than the expected frequency of any one variant due to mutation pressure alone (N = 10^5, p < 10^-5, Monte-Carlo simulations).

In total we identified seven such variants (GLM:LRT, p < 10^-15 for the null hypothesis that they have equal frequency in G+ and G− populations). Specifically, these are the variants M1I, M1L, M1V, S2G, K52R, I128T and N198D. Of these seven variants, the first four had consistently high frequency (8.5-67%) in every replicate G− population (Figure 5A). More than 87% of individuals in every population had at least one of these four mutations. In contrast, the other three mutations, K52R, I128T and N198D, had comparatively lower frequencies (0.7-5.5%; Figure S10). Therefore, we chose to further investigate the mutations M1I, M1L, M1V and S2G.

To confirm that these mutations indeed reduce fluorescence, we engineered them individually into the ancestral GFP using site-directed mutagenesis, and measured their fluorescence.
They caused a 2.7- to 64-fold reduction in median green fluorescence relative to ancestral GFP (Figure 5B) [...] multi-mutant genotypes also shows that these mutations do not simply hitchhike to fixation with other, beneficial mutations (SOM section 6).

These observations raise the question why strongly fluorescence-deleterious mutations can become highly abundant in G− populations in the first place. Since these mutations do not increase fitness by enhancing GFP activity, the likely reason is that they provide a growth advantage to cells harboring them. For example, three of these mutations (M1I, M1L, M1V) are start codon mutations. Such mutations can reduce the translation initiation rate [40], the amount of synthesized protein, and hence also the protein expression cost [41]. Cells carrying these GFP mutations might have a lower metabolic burden and can outgrow other cells that synthesize more GFP [41]. To find out whether this is indeed the case, we measured the maximum growth rate of cells carrying the mutations M1I, M1L, M1V and S2G, relative to that of ancestral GFP (Materials and Methods), and found that these mutations indeed provide a significant growth advantage (Mann-Whitney U-test, p < 0.013). Thus, mutations that are deleterious for fluorescence can accumulate when GroE is not overexpressed.

We note that our choice of weak selection is advantageous to detect strongly fluorescence-reducing [...] Table S3).

These results suggest that both buffering and potentiation are possible for the same GFP variant, depending on the cell where it is expressed.

Phenotypic heterogeneity leads to the potentiation of some deleterious mutations but the buffering of others.

To understand how this phenotypic heterogeneity may affect the selection of deleterious mutations, we developed a statistical model that relates fluorescence to fitness (i.e. the likelihood to survive experimental selection). We define the fitness of a genotype as the fraction of cells in an isogenic (genotypically homogeneous) population whose fluorescence intensity lies above the selection threshold we used in our directed evolution experiments. In the absence of GroE expression, the fluorescence of individual cells of a given genotype follows a unimodal Gaussian distribution with a mean $\mu^{-}$ and variance $(\sigma^{-})^{2}$ that we can estimate from our engineered mutants (Figure S14, Table S3). In the presence of GroE expression, this distribution changes to a bimodal distribution whose parameters we can also estimate from data (Figure S14, [...]

Using this data-driven model, we found that GroE buffers strongly deleterious mutations whose fluorescence mean ($\mu^{-}$) lies no more than 5% above the threshold value that is needed for survival in our experiment. In contrast, GroE potentiated moderately deleterious mutations whose $\mu^{-}$ lies between 5 and 35.6% above this threshold (Figure 6). Outside this range, the value of $\Delta F$ is zero, and GroE neither buffers nor potentiates mutational effects.

This model can explain several of our experimental observations, if one keeps in mind that our populations evolved under weak selection for fluorescence, and that individuals can accumulate deleterious mutations and survive selection as long as they fluoresce above a low fluorescence threshold. Even mutants whose mean fluorescence lies slightly below the threshold can persist at low frequency, because a few individuals may cross the selection threshold every generation due to phenotypic heterogeneity (Figure S14).
Since most new mutations are deleterious [36,37], fluorescence in our populations declines continually (Figure 2; G− populations) until most genotypes fluoresce barely above the threshold. Our model predicts that GroE reduces the fitness of such deleterious but above-threshold genotypes, causing them to become depleted in G+ populations. This prediction is supported by our genetic diversity analysis (Figure 3). The model also predicts that mutations which are less deleterious and reduce fluorescence by a smaller amount can persist in G+ populations (SOM Section 8), because GroE has no effect on the fitness of these mutations.

In addition, the model can help explain why some highly deleterious mutations become enriched in G+ populations, because GroE can buffer the deleterious effects of such mutations. One such mutation is a start-codon mutation M1T discussed in SOM Section 8, which becomes enriched in G+ populations, even though its mean fluorescence lies 5 percent below the selection threshold.
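
The logic of this fitness model can be sketched in a few lines of Python. The sketch assumes that the bimodal distribution under GroE expression is a mixture of two Gaussians whose peaks lie at 93% and 107% of the mean without GroE (the average values reported in Materials and Methods). The equal mixture weight, the shared standard deviation, the function names, and the sign convention (positive difference = buffering, negative = potentiation) are all illustrative assumptions rather than the fitted parameters of this study.

```python
from math import erf, sqrt


def fraction_above(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd): the fraction of cells that survive sorting."""
    z = (threshold - mean) / (sd * sqrt(2.0))
    return 0.5 * (1.0 - erf(z))


def fitness_without_groe(threshold, mean, sd):
    """Fitness under a unimodal Gaussian fluorescence distribution."""
    return fraction_above(threshold, mean, sd)


def fitness_with_groe(threshold, mean, sd, rel_low=0.93, rel_high=1.07, weight_low=0.5):
    """Fitness under a two-peak Gaussian mixture (weight and shared sd are assumed)."""
    low = fraction_above(threshold, rel_low * mean, sd)
    high = fraction_above(threshold, rel_high * mean, sd)
    return weight_low * low + (1.0 - weight_low) * high


def delta_fitness(threshold, mean, sd):
    """Fitness with GroE minus fitness without GroE."""
    return fitness_with_groe(threshold, mean, sd) - fitness_without_groe(threshold, mean, sd)


# Hypothetical genotypes; the selection threshold is scaled to 1.0.
for mean in (0.95, 1.10, 2.00):
    print(mean, round(delta_fitness(1.0, mean, 0.05), 3))
```

With these toy parameters, a genotype whose mean lies below the threshold gains fitness because its upper peak crosses the threshold, a genotype moderately above the threshold loses fitness because its lower peak falls toward it, and a genotype far above the threshold is unaffected. The exact boundaries between these regimes depend on the fitted parameters and are not reproduced by this sketch.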

GroE leads to evolution of higher fluorescence intensity but lower color shift during directional selection towards a new phenotype.

Since mutations that bring forth a new protein phenotype are often deleterious and destabilize a protein [7-10], we also asked how GroE may affect the adaptive evolution of a new phenotype. We thus conducted a phase 2 of our evolution experiment, in which we selected for the new phenotype of cyan fluorescence. Since green and cyan fluorescence are correlated phenotypes (Figure S5), a green-fluorescing variant with high expression or stability could have a higher absolute cyan fluorescence than a cyan-fluorescing variant with low expression or stability. To avoid this problem, we selected cells whose cyan fluorescence increased relative to green fluorescence (Figure S5). Phase 2 started with populations from the end of phase 1 (round zero of phase 2). We subjected these populations to five additional rounds of directed evolution towards cyan fluorescence.

After every generation of phase 2 evolution, we measured cyan and green fluorescence of [...] Figure S6B). Thus, our populations can evolve increased cyan fluorescence.

Next, we asked if GroE expression influences the rate of evolution towards the new color. To this end, we compared the cyan fluorescence of G+ and G− populations. During every generation (including the starting population derived from the end of phase 1), G+ populations had higher median cyan fluorescence than G− populations (Mann-Whitney U test, p < 0.015; Figure S6A).

We next asked whether the faster evolution of cyan fluorescence in phase 2 G+ populations originated during phase 2, or whether it might stem from the already higher fluorescence of the starting G+ [...] Figure 7A). Moreover, at the end of evolution, normalized cyan fluorescence was not significantly higher in G+ than in G− populations.
This analysis suggests that the difference between G+ and G− populations during phase 2 may result from differences accumulated during phase 1. However, we also note that after generation one of phase 2, median cyan fluorescence increased more rapidly during every generation and remained somewhat higher in each of the last three generations (Figure 7A, Figure S6A).

We also analyzed a different aspect of the phenotype, which is the extent of the spectral shift from green to cyan that occurred during phase 2. To find out whether GroE expression can affect the rate of this spectral shift, we calculated the ratio of cyan to green fluorescence for each cell in the different phase 2 populations. We refer to this ratio as relative color. Just as cyan fluorescence increased during phase 2 (Figure 7A), so did the spectral shift in both G+ and G− populations (Figure 7B). However, this shift was lower for G+ populations than for G− populations during every round of evolution (Mann-Whitney U test, p < 0.03).
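
As a small illustration of the relative color defined above, the following Python sketch computes the per-cell ratio of cyan to green fluorescence and summarizes a population by its median. The function name, the toy intensities, and the choice to skip cells without a positive green signal are assumptions made for this example only.

```python
from statistics import median


def relative_color(cyan, green):
    """Per-cell ratio of cyan to green fluorescence (cells with no green signal are skipped)."""
    return [c / g for c, g in zip(cyan, green) if g > 0]


# Hypothetical per-cell intensities in arbitrary units.
cyan = [120.0, 150.0, 90.0]
green = [400.0, 300.0, 450.0]
ratios = relative_color(cyan, green)
print(median(ratios))
```

Because the ratio normalizes each cell's cyan signal by its own green signal, it separates a spectral shift from a mere increase in overall brightness.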

GroE reduces genetic diversity during evolution towards the new phenotype.

We next asked whether GroE helps buffer deleterious mutations during phase 2, thus increasing [...] sum, like in phase 1, GroE helps reduce genetic diversity, which is inconsistent with a net buffering of deleterious mutations, and supports the notion that GroE helps purge deleterious mutations.

We also found that G+ populations had lower phenotypic diversity than G− populations in every [...]

One limitation of our work is the assumption that different GFP variants do not affect the growth rate of the host. Such minimal interference with the host was one of the motivations to choose the non-native GFP, and to express it from a low-copy-number plasmid (Materials and Methods). However, some of our start-codon mutations actually increased the host's growth rate. To avoid this problem, future experiments might restrict mutagenesis to exclude such start-codon mutations. However, it may be difficult to eliminate growth-affecting mutations completely.

[...] with BamHI and re-ligated the larger fragment corresponding to the plasmid backbone so as to eliminate the GroE operon. We named this control plasmid pΔGro7.

We next identified a region in pGro7 that can be used to place a GFP expression cassette. This [...] it is weakly dimerizing, has a single excitation peak (488 nm), and undergoes fast maturation [46]. We obtained the GFP expression cassette, which consists of a promoter followed by a ribosome binding site and the GFP coding sequence, from plasmid pMSs201 [47]. The GFP coding sequence is additionally flanked by 5'-XhoI and 3'-XbaI restriction sites. Since these sites already exist in pGro7 and are thus not useful for cloning, we engineered a 5'-SalI site and a 3'-SacI site flanking the GFP coding sequence in addition to the original restriction sites.
We did so by PCR-amplifying the plasmid with the primers pMS-Sal1-GFP-F and pMS-GFP-SacI-R (Table S4), and cloned the PCR product back into the plasmid backbone. Next, we amplified the modified GFP expression cassette using the primers pMS-BglII-F and pMS-HindIII-R (Table S4), and cloned it into pGro7 and pΔGro7.

To identify the best promoters for GFP expression, we repeated this process with three variants of plasmid pMSs201, thus creating three pGro7 and three pΔGro7 plasmid variants that drive GFP expression from the ompA, rpsM and rplN promoters [47]. We quantified GFP expression from each promoter as explained in the next section.

Estimating growth rates associated with different promoters

The host organism for our experiments is E. coli strain BW27784 (CGSC 7881), which cannot metabolize arabinose. We cultured all cells hosting our expression plasmids in LB with 25 µg/ml chloramphenicol (LB+chl). Visual inspection of plated cells under blue light revealed green colonies and showed that all constructed plasmids expressed GFP. We corroborated this observation by measuring fluorescence on a plate reader (Tecan Spark 10M; Figure S1A). To this end, we diluted [...] Under arabinose induction, the growth rate was higher for the rplN promoter strain (Figure S1B) while the end point OD was comparable between the two promoter strains (Figure S1C). Therefore, we chose the pGro7-rplN-GFP (Figure S2) plasmid for all evolution experiments.

[...] 50% v/v methanol, 10% v/v acetic acid). Next, we destained the gel with destaining solution until the background was clean and the bands were clear.

We observed no induction of GroEL (60 kDa) in the absence of arabinose (Figure S3 [...]

For mutagenesis by error-prone PCR, we used the primers Gro-Mut-F and Gro-Mut-R to amplify GFP from pGro7-rplN-GFP (Table S4).
For the error-prone PCR itself, we used the following reaction mixture: 150 nM each of the nucleotide analogs 8-oxodeoxyguanosine triphosphate (8-oxo-dGTP, Trilink Biotechnologies) and [...]

Transformation of the mutant library using electroporation

We thawed frozen electrocompetent cells on ice and added the purified ligation products to them. We transferred the resulting suspension into a 2 mm electroporation cuvette (EP202, Cell Projects, [...]

Selection of transformed cells using FACS

We performed directed evolution in four replicate populations where GroE was expressed from our expression plasmid, along with four control populations in which it was not expressed from this plasmid. We applied the following selection protocol to each population. To prepare for selection, we inoculated 4 ml of LB+chl in a 20 ml glass tube with 80 µl of the appropriate transformed library. We allowed the cells to grow at 37°C with shaking at 220 rpm for 60 minutes, and then induced GroE expression in G+ populations with 0.1 µg/ml of L-arabinose. We allowed cells to continue their growth for another 10 hours. Subsequently, we transferred the tubes to ice and pelleted cells from [...] and 493V, respectively. We recorded 100,000 events and analyzed the data using both MATLAB (fca-Readfcs.m [48]) and the R package flowCore [49]. We note that GroE expression led to an increase in the number of non-fluorescent "events" (signals) even in an isogenic population (data not shown). We surmise that these non-fluorescent events could originate from nonviable cells, which in turn could arise due to protein overexpression stress. Therefore, we excluded all non-fluorescent cells from our analyses.

We measured the fluorescence of evolved populations after every round of directed evolution. [...] High-Fidelity polymerase (Thermo-Fisher).
We performed PCR using an initial denaturation at 95°C for 5 min, followed by 30 cycles of amplification using the following program: 95°C for 30 s, 62°C for 30 s, and 72°C for 40 s, and a final extension at 72°C for 2 min. We determined the purity of the PCR products through agarose gel electrophoresis, and found that most of the products were clean, without non-specific bands or primer dimers. We purified these products using a QIAquick PCR [...]

We then performed PCA on a matrix containing all these numerical sequences, using the prcomp function from the R package stats (v3.4.4) [58] (Figures S8A & S9A). The rows of this matrix harbor individual sequences (genotypes). Its columns correspond to individual positions in the sequence. We also performed PCA on a matrix harboring allele frequencies of all single amino acid mutations from each population at the end of evolution (Figures S8B & S9B) [...] alone.

We explain this procedure with the mutation S147P, which occurs at a frequency of less than [...] For an S147P mutation to occur, three events must take place. We calculate their probability as [...] The probability ($P$) that the mutation S147P occurs (i.e. all the above-mentioned events occur) in any one generation is the product of the above three probabilities: $P_{\mathrm{mut}} \times P_{\mathrm{pos}} \times P_{\mathrm{sub}} = 8.18 \times 10^{-4}$. We note that we can neglect amino acid changes caused by double or triple nucleotide mutations, because our sequencing data showed that every amino acid variant that exceeded our threshold frequency was caused by a single nucleotide change.

We next turn to the second part of our numerical analysis, where we use the probability that a specific variant arises to calculate how the expected mean frequency of this variant changes over time. To this end, we used a discrete time stochastic model of a population whose individuals mutate at a rate $P$, such that the number of unmutated individuals becomes progressively smaller.
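
To give a feel for the magnitudes involved, the following Python sketch computes the expected frequency of a variant under mutation pressure alone. It is a deterministic expectation, not the stochastic Monte-Carlo simulation described in this section, and it ignores back-mutation. The function name is hypothetical, the per-generation probability is the S147P value given above, and the choice of five generations is an assumption that mirrors the five rounds of phase 1.

```python
def expected_mutant_frequency(p_variant, generations):
    """Expected variant frequency from mutation pressure alone, without back-mutation."""
    unmutated = 1.0
    for _ in range(generations):
        # Each generation, a fraction p_variant of the still-unmutated individuals mutates.
        unmutated *= 1.0 - p_variant
    return 1.0 - unmutated


# Per-generation probability of the S147P example; five generations assumed.
freq = expected_mutant_frequency(8.18e-4, 5)
print(f"{freq:.5f}")
```

Under these assumptions the expected frequency stays near 0.4%, well below the 3.5% frequency threshold used to select variants for analysis.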
Our simulations neglect back-mutations to the wild-type allele, which will slightly overestimate the allele frequencies caused by mutation pressure. In consequence, our analysis below will be statistically conservative. That is, it might accept some variants as having a frequency consistent with mutation pressure alone, while they may actually be affected by selection.

Specifically, for each mutant whose frequency exceeded our threshold, we performed the [...]

Engineering specific mutations

We used PCR-based site-directed mutagenesis to engineer specific mutations into the GFP gene. To this end, we first created a "minimal" plasmid that expresses GFP constitutively (pMini-GFP) from the rplN promoter, but that did not contain the chaperone genes and the araC gene. We designed primer pairs to amplify this entire plasmid from the site of the desired mutation (Table S5). Specifically, we designed these primer pairs with 15 complementary nucleotides at their 3' end and a non-complementary region that did not exceed 25 nucleotides in length. We included the desired mutation in the complementary region. Whenever the difference in melting temperature (Tm) of the primers exceeded 5°C, we trimmed the non-complementary region of the primer with higher Tm from the 5' end. We used the software tool melting (ver. 5.1) [59] to calculate Tm. We designed the primers in this way to minimize inefficient amplification due to primer dimer formation.

We amplified pMini with different primer pairs for each mutation to be engineered (Table S5) using high fidelity Q5 polymerase (NEB). We transformed the PCR products into E. coli BW27784 cells made transformation-competent with the CaCl2 method [60], using a standard heat shock transformation method [60]. We isolated and purified plasmid from the clones thus obtained and sequenced the GFP gene to confirm the mutation.
Next, we cloned each mutated GFP gene into the GroE expression plasmid, pGro7-rplN-GFP.

We generated double mutants via the same procedure, by engineering the mutations serially via two rounds of PCR. Next, we cloned the mutated GFP gene into pGro7-rplN-GFP.

Modeling the effect of GroE overexpression on fitness
We define a genotype's fitness based on its fluorescence rather than its growth rate, because this is the criterion we used during directed evolution. More specifically, we define a genotype's fitness as

Secondly, for these mutations, the two fluorescence peaks that arose due to GroE expression were so close that bimodality was not clearly apparent. Their overlap with the autofluorescence distribution further hindered the discrimination of these peaks.

Our procedure resulted in an estimate of the parameters $\mu_-$, $\sigma_-$, $\mu_{+L}$, $\mu_{+H}$, $\sigma_{+L}$, $\sigma_{+H}$, $C_{+L}$, and $C_{+H}$, for each of the 19 mutants we analyzed, and for three biological replicates for each mutant (Figure S14).

Across these mutants, the value of $\mu_{+L}$ was on average ∼93% of that of $\mu_-$, and that of $\mu_{+H}$ was on average ∼107% of that of $\mu_-$ (Figure S16A). In addition, for any one mutation the values of $\mu_{+L}$ and $\mu_{+H}$ were clearly distinct from each other (Figure S16A). For these reasons, we chose to express $\mu_{+L}$ and $\mu_{+H}$ relative to $\mu_-$. One can then obtain the absolute value of each peak by multiplying the relative values with $\mu_-$. We denote these relative values by the symbols $\mu'_{+L}$ and $\mu'_{+H}$, respectively.

The weight coefficients, $C_{+L}$ and $C_{+H}$, did not depend on $\mu_-$, and their values showed a non-overlapping distribution across mutants, with means of 0.64 and 0.76, respectively (Figure S16B). The standard deviations $\sigma_-$, $\sigma_{+L}$ and $\sigma_{+H}$ also did not depend strongly on $\mu_-$, but their distributions across mutants overlapped (Figure S16C). Below, we will refer collectively to $\mu_{+L}$, $\mu_{+H}$, $\sigma_{+L}$, $\sigma_{+H}$, $C_{+L}$, and $C_{+H}$ as the parameters of the bimodal distribution.
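As an illustration of how the parameters of the bimodal distribution fit together, the following Python sketch evaluates a weighted sum of two Gaussians on log10-transformed fluorescence. It assumes that the bimodal distribution has this functional form and that the weights need not sum to one. The function names and the peak widths used in the example are hypothetical; only the relative peak positions (0.93 and 1.07) and the mean weights (0.64 and 0.76) are the averages quoted in the text.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def peaks_from_relative(mu_minus, rel_low=0.93, rel_high=1.07):
    """Absolute peak positions, obtained by multiplying the relative values with mu_minus."""
    return rel_low * mu_minus, rel_high * mu_minus


def bimodal_density(x, mu_low, mu_high, sd_low, sd_high, c_low, c_high):
    """Weighted sum of a low-intensity and a high-intensity Gaussian peak."""
    return (c_low * gaussian_pdf(x, mu_low, sd_low)
            + c_high * gaussian_pdf(x, mu_high, sd_high))


# Hypothetical mutant with mu_minus = 3.0 (log10 fluorescence units).
mu_low, mu_high = peaks_from_relative(3.0)
# The peak widths of 0.1 are assumed values chosen only for illustration.
density_at_low_peak = bimodal_density(mu_low, mu_low, mu_high, 0.1, 0.1, 0.64, 0.76)
density_between_peaks = bimodal_density(3.0, mu_low, mu_high, 0.1, 0.1, 0.64, 0.76)
```

With these assumed widths, the density between the two peaks is lower than at either peak, which is the signature of bimodality.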

Step 2: To map fluorescence distributions of arbitrary mutants onto fitness, we first represented different mutants through different mean fluorescence values $\mu_-$ in the absence of GroE expression. We explored $\mu_-$ values ranging from 10 to $10^5$, because this is the range of green

(Table S3). We

(Figure S16). We thus estimated each of these parameters by sampling them from a Gaussian distribution whose parameters we estimated from the experimental data (Table S3), exactly as we described above for $\sigma_-$.

At the end of this procedure we had obtained a total of 4000×4000×6 combinations of parameters.

Ancestral GFP is a GroE client
In this analysis, we determined whether GroE expression affected the fluorescence of the ancestral GFP. To this end, we measured the fluorescence of 10,000 cells expressing the ancestral GFP, with and without GroE expression, using flow cytometry (Methods). We found that GroE expression created phenotypic heterogeneity in GFP expression, which is evident from a bimodal distribution of fluorescence (Figure S4). This change in the fluorescence distribution suggests that our GFP variant is likely to be a GroE client. One of the two fluorescence intensity peaks has a higher intensity and the other one has a lower intensity than the unimodal peak of the fluorescence distribution without GroE induction.

increase their fluorescence when GroE is overexpressed compared to when it is not overexpressed. This is indeed the case (Figure S7). Cyan fluorescence was significantly higher in these populations when we overexpressed GroE (Mann-Whitney U-test, $P < 6.5 \times 10^{-4}$), although the extent of this increase was small (2-25%; Figure S7).
This suggests that at least some buffering of mutations takes place during phase 2.

GroE expression affects the spectrum of accumulated genotypes in both phase 1 and phase 2.
To find out whether the chaperone causes different kinds of genotypes to accumulate in phase 1 populations, we randomly sampled 200 sequences from each population at the end of phase 1, and displayed the location of these sequences in genotype space using PCA (Materials and Methods: Principal component analysis of the genotypes).
Populations that evolved under chaperone overexpression cluster in different regions of genotype space than the control populations (Figure S8A). These patterns are corroborated by a complementary PCA, which we conducted on the frequency spectrum of alleles that differ in specific amino acids from the ancestor (Figure S8B).
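To illustrate how a PCA on allele frequencies can separate populations, the following Python sketch extracts only the first principal component by power iteration. This is a simplification chosen to avoid external dependencies; it is not the prcomp procedure used in the analysis, which relies on a singular value decomposition. The function name and the allele frequencies are made up for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading vector, per-row scores) for the first principal component."""
    n_rows, n_cols = len(rows), len(rows[0])
    # Center every column, as PCA requires.
    means = [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in rows]

    def project(vector):
        return [sum(c * v for c, v in zip(row, vector)) for row in centered]

    # Deterministic, non-uniform start vector (an arbitrary choice).
    vector = [float(j + 1) for j in range(n_cols)]
    norm = math.sqrt(sum(v * v for v in vector))
    vector = [v / norm for v in vector]
    for _ in range(iterations):
        scores = project(vector)
        # Multiply by X^T X without forming the covariance matrix explicitly.
        updated = [sum(s * row[j] for s, row in zip(scores, centered))
                   for j in range(n_cols)]
        norm = math.sqrt(sum(u * u for u in updated))
        if norm == 0.0:  # all rows identical: no variance to explain
            break
        vector = [u / norm for u in updated]
    return vector, project(vector)


# Made-up allele frequencies: rows are populations, columns are mutations.
toy_frequencies = [
    [0.30, 0.20, 0.00, 0.01],  # hypothetical G- replicate
    [0.28, 0.22, 0.01, 0.00],  # hypothetical G- replicate
    [0.01, 0.00, 0.10, 0.20],  # hypothetical G+ replicate
    [0.00, 0.02, 0.12, 0.18],  # hypothetical G+ replicate
]
loadings, scores = first_principal_component(toy_frequencies)
```

In this toy example the two hypothetical groups of replicates receive scores of opposite sign on the first component, which corresponds to clustering in different regions of the PCA plot.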
We performed a similar analysis for populations at the end of phase 2 and found that G+ populations accumulate a different set of genotypes compared to G− populations (Figure S9).

Figure S11: Frequency of single amino acid mutations during phase 2 evolution. We show only those mutations whose frequencies exceed 5% in at least one replicate population by the end of evolution and that are differentially enriched between G+ (red) and G− (blue) populations. We included M1I in these plots to show that while it is abundant in phase 1 G− populations, it is lost in phase 2. Mutations that we analyzed further are highlighted in boldface. Vertical axes (in log scale) denote the mutation frequency and the horizontal axes denote the generation, i.e. round of evolution. Dotted lines indicate the frequency of mutations in individual replicates, whereas solid lines denote the median frequency.
Deleterious mutations do not hitchhike with potentially stabilizing mutations in the absence of GroE expression.
Most random mutations are likely to be deleterious [1][2][3], and they can persist in populations by hitchhiking with beneficial or stabilizing mutations. The high frequency of the deleterious mutations M1I, M1L, M1V and S2G in G− populations raises the possibility that these mutations hitchhike with other beneficial mutations. To exclude this possibility, we analyzed the frequencies of variant genotypes that harbor one or more mutations. We found that the most abundant genotype in all the G− populations contained just the mutation M1I (Table S1). This genotype exceeded a frequency of 10% in all the replicate populations. We next determined whether the other three abundant deleterious mutations (M1L, M1V and S2G) also frequently existed as single-mutation genotypes. This was indeed the case. Specifically, these single-mutation genotypes were among the six most frequent genotypes in all replicate populations, and had a frequency that ranged between 0.45% and 40% (Table S1). This suggests that these deleterious mutations can rise to appreciable frequencies on their own. We next analyzed the multi-mutation genotypes in G− populations and found that M1I+S2G and M1V+S2G were the most abundant multi-mutation genotypes in every replicate population, with a frequency range of 1.5-5%. No other multi-mutation genotype rose above a frequency of 0.9% in any replicate population. To further validate these findings, we calculated the pairwise co-occurrence frequency of different mutations in G− populations. Consistent with our previous finding, S2G co-occurred most frequently with M1I and M1V, in more than 6.5% of the genotypes in every replicate population. In contrast, no other mutation pair had a frequency higher than 4% in any replicate population. Taken together, these results indicate that the abundance of deleterious mutations in G− populations is not due to hitchhiking.
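The genotype and co-occurrence frequencies used in this analysis can be sketched in Python as follows. The sketch assumes that each sequenced genotype is represented as a set of amino acid mutations relative to the ancestor. The function names and the toy sample are hypothetical and do not reproduce the values in Table S1.

```python
from collections import Counter
from itertools import combinations


def genotype_frequencies(genotypes):
    """Frequency of every distinct genotype (set of mutations) in a sample."""
    counts = Counter(frozenset(g) for g in genotypes)
    total = len(genotypes)
    return {genotype: n / total for genotype, n in counts.items()}


def pairwise_cooccurrence(genotypes):
    """Fraction of genotypes in which each pair of mutations occurs together."""
    total = len(genotypes)
    pair_counts = Counter()
    for genotype in genotypes:
        for pair in combinations(sorted(set(genotype)), 2):
            pair_counts[pair] += 1
    return {pair: n / total for pair, n in pair_counts.items()}


# Toy sample of ten sequences (made-up data).
sample = ([{"M1I"}] * 4 + [{"M1I", "S2G"}] * 2
          + [set()] * 3 + [{"S2G"}])
single_genotype_freq = genotype_frequencies(sample)[frozenset({"M1I"})]
pair_freq = pairwise_cooccurrence(sample)[("M1I", "S2G")]
```

In this toy sample the single-mutation genotype M1I is twice as frequent as the M1I+S2G double mutant. A deleterious mutation that is most abundant on its own, rather than alongside a partner mutation, is the pattern that argues against hitchhiking.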
Table S2: Frequencies of the 10 most frequent genotypes in G+ populations at the end of phase 1.

Deleterious mutations rarely accumulate through GroE-mediated buffering.
Because the phenotype of populations evolved under GroE overexpression (G+) is not strongly dependent on chaperone expression (Main text: Figure 4), the presence of most deleterious mutants in these populations cannot be explained by GroE-mediated buffering. To further validate this observation, we analyzed mutations that are enriched in G+ populations, i.e. mutations that have a higher frequency in these populations compared to G− populations. They are the most promising candidates for mutations whose deleterious effects are buffered by GroE. In this analysis, we focused on mutations that attained a frequency exceeding 3.5% in at least one replicate population. We found 25 such mutations with significantly higher frequency in G+ populations (GLM: likelihood ratio test [LRT], $P < 10^{-3}$) at the end of phase 1 evolution (Figure S10). The majority of these 25 mutations met our selection criterion in only one out of four populations. Moreover, in some replicate populations these mutations initially increased in frequency only to become lost again (frequency less than 0.01%). With the exception of I161V and N212D (Figure S10), the mutations did not steadily increase in frequency. Thus, most mutations enriched in G+ populations did not show the kind of evolutionary dynamics expected from consistent and sustained buffering.
We nonetheless analyzed selected mutations in more detail to study their phenotypic effects. They include two classes of G+-enriched mutations that stand out because of a common characteristic. These are isoleucine to valine (I→V) and lysine to arginine (K→R) mutations. We identified four G+-enriched mutations in each class. These are I123V, I161V, I171V, I188V, K26R, K41R, K162R and K166R. In addition, one start codon mutation (M1T) was also G+-enriched. This mutation provides a contrast to our previously analyzed start codon mutations that were deleterious, potentiated, and thus G−-enriched (Main text: Figure 5). We hypothesized that M1T was deleterious and possibly buffered. Furthermore, we also analyzed the mutation N212D, because it was one of only two mutations that steadily increased in frequency during phase 1 evolution. (The other is I161V, which is included in the list of isoleucine to valine mutations we analyzed.) To identify how each of these 10 mutations affected the phenotype, we engineered them individually into ancestral GFP, and quantified their fluorescence in three biological replicate measurements. We found that M1T was indeed deleterious for fluorescence, reducing it 65-69 fold relative to ancestral GFP. Of the other nine mutations, only two (K166R and I171V) showed a modest (7-40%) increase in fluorescence relative to ancestral GFP in all three biological replicate measurements we conducted. The remaining seven mutations did not significantly affect fluorescence (Figure S12).

Figure S6C).
To find out which mutations may be involved in this change, we first identified single amino acid variants in the evolving populations that had very low frequency in phase 1 but high frequency in all replicate populations of phase 2, reasoning that these variants may be involved in the color shift (Figure S11). The three variants that satisfied this criterion are S147P, T203A and T203S (Figure S13). We engineered these variants into ancestral GFP to study their fluorescence. In addition, we engineered the double variants S147P/T203A and S147P/T203S. All three single variants and both double variants shifted color, i.e., they showed a significantly higher cyan fluorescence relative to green fluorescence than ancestral GFP (Mann-Whitney U-test, $P < 10^{-15}$, Figure S15A). Of the single variants, T203S showed the greatest color shift (1150% of the ancestral relative color) followed by S147P (130%) and T203A (105%).
Moreover, the two single amino acid changes in the double variants had a synergistic effect, such that the relative color of the double variant was higher than the sum of the relative colors of the individual single variants (Figure S15B).
Importantly, while G− populations accumulated all three single mutations and the double mutations at varying frequencies, G+ populations only accumulated the variant S147P, which swept through these populations. This can explain why the color shift is less pronounced in G+ populations. To explain how GroE expression could cause this phenomenon, we considered three hypotheses. First, the color-shifting variants may be deleterious and hence disfavored by GroE (Main text: Figure 6). This is not the case. T203A, T203S, and S147P, as well as both double mutants, are in fact beneficial, i.e., they increase the absolute cyan fluorescence (Figure S15C). Second, GroE might reduce the fluorescence of the T203A/S mutants and thus disfavor their selection. To find out whether this is the case, we measured the fluorescence of all three single mutants and the two double mutants under GroE expression.
We found that GroE expression led to the kind of phenotypic heterogeneity we previously discussed for phase 1 mutations (Figure S14B). As for phase 1 mutations, the distribution of log-transformed fluorescence, which was unimodal without GroE expression, became bimodal upon GroE expression. One of the two bimodal peaks had a lower fluorescence intensity ($\mu_{+L}$) than the unimodal peak ($\mu_-$), whereas the other peak ($\mu_{+H}$) had a higher fluorescence intensity than $\mu_-$. As we noted previously using our computational model, GroE expression does not affect the fitness of non-deleterious mutations, thus ruling out the possibility that GroE reduces the fluorescence and thereby the fitness of T203A/S. Third, GroE expression might itself reduce the color shift in T203A/S and thereby disfavor the accumulation of these variants in G+ populations. To find out, we compared the relative color of each genotype under GroE expression with that of the ancestral GFP (without GroE expression). We found that the relative color of T203A, T203S, S147P, and both double mutants was still significantly higher under GroE expression than that of ancestral GFP in the absence of GroE expression (Mann-Whitney U-test, $P < 10^{-15}$). This rules out the possibility that GroE expression suppresses the color shift of these mutations.
In sum, while it is clear that GroE can reduce the fluorescence color shift during evolution by affecting the spreading of color-shifting mutations, identifying the reasons for this observation remains a task for future work.

Fluorescence analysis of engineered mutants
We measured the fluorescence of each GFP mutant with and without GroE expression, using flow cytometry (Materials and Methods: Analysis of fluorescence of populations using flow cytometry). For each mutant, we measured the fluorescence of 10,000 cells to obtain a fluorescence distribution. Also for each mutant, we performed these measurements in three biological replicates, repeating the measurement starting from three different samples of the same mutant's glycerol stock. We also re-measured fluorescence of ancestral GFP along with that of the mutants for every biological replicate experiment, to correct for day-to-day variations in growth conditions and performance of the flow cytometer. We note that GroE expression led to heterogeneity in fluorescence for most mutants and replicates (Figure S14A-B), as we had observed for ancestral GFP (Figure S4).
To analyze the spectral shift in fluorescence in any one genotype harboring one or more mutations, we first calculated the ratio of cyan fluorescence to green fluorescence (relative color) for every cell in an isogenic population of this genotype, in the absence of GroE expression. In this way, we obtained the distribution of relative color for every variant genotype. Next, we compared the distribution of relative color of every genotype with that of the ancestral GFP using a Mann-Whitney U-test. We found that the genotypes S147P, T203A, T203S, S147P/T203A and S147P/T203S showed a significant increase in relative color (Mann-Whitney U-test, $P < 10^{-15}$; Figure S15).
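The relative color calculation and the comparison with ancestral GFP can be sketched in Python as follows. The sketch computes only the Mann-Whitney U statistic by direct pair counting, which is simple but slow for 10,000 cells per sample, and it does not compute a P-value. The function names and the fluorescence values are hypothetical and serve only as an illustration.

```python
from statistics import median


def relative_color(cyan, green):
    """Per-cell ratio of cyan to green fluorescence."""
    return [c / g for c, g in zip(cyan, green)]


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y.

    Counts the pairs in which a value from x exceeds a value from y;
    ties count as one half. Runtime is O(len(x) * len(y)).
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Made-up per-cell measurements for a mutant and for ancestral GFP.
mutant_color = relative_color([30.0, 36.0, 45.0], [100.0, 100.0, 100.0])
ancestor_color = relative_color([10.0, 12.0, 11.0], [100.0, 100.0, 100.0])
u_statistic = mann_whitney_u(mutant_color, ancestor_color)
median_shift = median(mutant_color) / median(ancestor_color)
```

A U statistic equal to the product of the two sample sizes means that every mutant cell has a higher relative color than every ancestral cell, which is the most extreme shift the test can register.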
Next, we asked if GroE expression reduces the color shift of some variants, thereby disfavoring their selection in phase 2 evolution. To this end, we compared the relative color of each genotype under GroE expression with that of the ancestral GFP (without GroE expression), using a Mann-Whitney U-test as described in the previous paragraph. This analysis showed that the relative color of each mutant genotype was still significantly higher than that of ancestral GFP (Mann-Whitney U-test, $P < 10^{-15}$).

Figure S15: Beneficial color-shifting mutations accumulate in phase 2. (A) Single mutants, S147P (blue), T203A (yellow) and T203S (brown), as well as the double mutants S147P/T203A (turquoise) and S147P/T203S (dark green) show a color shift towards cyan fluorescence. Ancestral GFP (black) is denoted by the symbol WT. The horizontal axis shows log10-transformed green fluorescence and the vertical axis shows log10-transformed cyan fluorescence. (B) Median relative color (cyan:green ratio; horizontal axis) for different variants (vertical axis) is in agreement with the scatterplot. (C) All the mutations (vertical axis), i.e. the three single mutations and the two double mutations, are beneficial, as they increase the median cyan fluorescence relative to ancestral GFP (horizontal axis). The three grouped bars in panels B-C corresponding to each mutation denote three biological replicate measurements.

GroE expression affects the fitness of different variants
We estimated the fitness of different variants, each represented by a unique value of $\mu_-$ (mean of Gaussian distributed log10-transformed fluorescence in the absence of GroE expression; Materials and Methods: Modeling the effect of GroE overexpression on fitness). The average fitness in the presence and absence of GroE expression ($F_+$ and $F_-$, respectively) increased with increasing $\mu_-$ (Figure S17A). However, also with increasing values of $\mu_-$, the effect of GroE ($\Delta F$) increased from zero, changed its sign from positive to negative, and eventually increased again to approach zero from below. This indicates that GroE buffers some mutations, potentiates other mutations, and has no effect on the fitness of the rest (Main text: Figure 6).

Figure S16: Similar qualitative changes in fluorescence distribution for different mutations due to GroE expression. Each panel shows a scatterplot of the correlation between $\mu_-$ (horizontal axes) and other parameters that characterize the fluorescence distribution (vertical axes). In panel (A) the vertical axis shows the position of the bimodal fluorescence intensity peaks $\mu_{+L}$ (yellow) and $\mu_{+H}$ (green) relative to $\mu_-$. In panel (B) the vertical axis shows the weight coefficients $C_{+L}$ (yellow) and $C_{+H}$ (green), and in panel (C) the vertical axis shows the peak width parameters $\sigma_-$ (black), $\sigma_{+L}$ (yellow), and $\sigma_{+H}$ (green). In all three plots, each data item (circle) is derived from the distribution of log10-transformed fluorescence of one flow cytometry data file. We excluded the fluorescence distributions of start codon mutations for reasons discussed in the main text.