Abstract
It is incompletely understood how biophysical properties such as protein stability affect molecular evolution and epistasis. Epistasis is defined as specific when a mutation exclusively influences the phenotypic effect of another mutation, often at physically interacting residues. By contrast, nonspecific epistasis results when the effect of a mutation is influenced by a large number of non-local mutations. Because most mutations are pleiotropic, basal protein stability is thought to determine tolerance to activity-enhancing mutations, which implies that nonspecific epistasis is dominant. However, evidence exists for both specific and nonspecific epistasis as the prevalent factor, and comprehensive datasets to validate either claim are limited. Here we use deep mutational scanning to probe how initial enzyme stability affects local fitness landscapes. We computationally designed two variants of the amidase AmiE whose catalytic efficiencies are statistically indistinguishable from wild type but which have lower probabilities of folding in vivo. Local fitness landscapes show only slight alterations among variants, with essentially the same global distribution of fitness effects. However, specific epistasis was predominant for the subset of mutations exhibiting positive sign epistasis. These mutations mapped to spatially distinct locations on AmiE, near the initial mutation or proximal to the active site. Most intriguingly, the majority of specific epistatic mutations were codon-dependent, with different synonymous codons resulting in fitness sign reversals. Together, these results offer a nuanced view of how protein stability affects local fitness landscapes, and suggest that transcriptional-translational effects are as important a determinant of evolutionary outcomes as thermodynamic stability.
Introduction
Understanding the mechanisms of molecular evolution is important to molecular biology, virology, evolutionary biology, and protein engineering. Researchers interested in evolving natural proteins, designing proteins de novo, or understanding the extent of contingency on extant proteins must contend with the implicit evolutionary limitations set forth by nature. The challenge, then, is to understand what constrains protein evolution and by what mechanisms. How do these factors interact with one another to alter the frequency of mutations with increased fitness in a given environment, and how do they govern evolvability for new functions?
A particularly important component of evolution is epistasis, or the non-additive combination of mutations (1). Epistasis impacts the rate of evolution and the spectrum of possible evolutionary pathways available to a protein (2). Epistasis is said to be specific if a mutation exclusively influences the phenotypic effect of other mutations, usually at physically interacting residues (3). By contrast, nonspecific epistasis results when a mutation impacts a global property like stability that can be rescued by large numbers of non-local mutations. Of the two classes, specific epistatic effects exert the greatest influence on the possible evolutionary outcomes (3). This is because specific epistatic mutations decrease the reversibility of a protein in an evolutionary trajectory. Specific epistasis also decreases the robustness of a protein to new mutations, reducing the number of possible evolutionary paths available. By modulating the evolutionary trajectories available, epistatic phenomena exert immense influence over the long-term evolution of proteins (4).
What is still incompletely understood is how biophysical parameters like thermodynamic stability constrain epistasis. Analyses at the genomic (5), protein (6–8), and organismal (9) levels have uncovered a complex and dynamic equilibrium between stabilizing and destabilizing mutations. For enzymes in particular, previous studies have shown that missense mutations often act pleiotropically: catalytically enhancing mutations are, on average, moderately destabilizing (6, 10–11). Consequently, high basal stability can buffer catalytically beneficial but destabilizing mutations (12–13), allowing their fixation. Deleterious destabilizing mutations can be repaired by reversion mutations (14), or by specific and nonspecific epistatic mutations that rescue stability (5, 15). These epistatic mutations are central to the stabilizing-destabilizing equilibrium, with significant consequences for long-term evolution (16–17). It is uncertain whether specific or nonspecific epistatic mutations are more likely to rescue a destabilized protein, as evidence exists for both (14–18).
Deep mutational scanning experiments provide a wealth of mutational data that can be used to address questions in molecular evolution (19). The approach couples large mutational libraries with selections and deep sequencing to evaluate the relative fitness of thousands of variants in a massively parallel fashion (11, 20–22). We previously used deep mutational scanning on the homohexameric aliphatic amidase AmiE from Pseudomonas aeruginosa to understand how local fitness landscapes change with different substrates (22). In that study, AmiE was chosen as a model because it is stable in its genetic background and has a high probability of folding upon translation. Here we used deep mutational scanning to comprehensively assess how stability constrains mutational outcomes by designing two variants of AmiE in which catalytic activity is unperturbed but the proteins have different probabilities of folding in vivo. We used these variants to assess how basal enzyme stability affects local fitness landscapes. While we found moderate epistasis, local fitness landscapes are largely insensitive to starting enzyme stability. In particular, the great majority of beneficial mutations were shared among all three starting points. However, positive sign epistasis was present and was dominated by specific epistasis. Remarkably, we found that the sign of the fitness metric for many mutations depends on codon choice, suggesting more complicated fitness landscapes than predicted from intrinsic protein biophysics alone. Together, these results provide a nuanced view of how local fitness landscapes are perturbed by slightly different initial stabilities.
Results
The experimental pipeline used in this study is shown in Fig. 1A. First, we designed destabilized variants of AmiE that possess wild-type catalytic activity. Second, we developed selection conditions for the variants such that cell growth is proportional to enzyme activity, using a growth selection with acetamide as the sole nitrogen source. Third, near-comprehensive single-site saturation mutant libraries for our variants were prepared (22) and growth selections performed. Fourth, pre- and post-selection populations were deep sequenced to extract mutant frequencies in the selected and reference populations. These frequencies were converted into a relative fitness metric (ζi) for each mutant i, defined as
$$\zeta_i = \frac{\mu_i}{\mu_{\mathrm{ref}}} - 1$$
where μi and μref are the specific growth rates in selection media of the mutant and of the unmutated AmiE variant, respectively. A relative fitness score above zero means that strains harboring the mutant have higher fitness than those carrying the unmutated variant.
Fig. 1. A. A graphical overview of this study. Two AmiE enzyme variants with single point mutations (I38V and I122L), WT catalytic function, and reduced stability were computationally designed and validated experimentally. Constitutive expression of each enzyme from a plasmid was tuned such that the growth rate of our bacterial selection strain in selection media depended on the expression of functional AmiE. Deep mutational scanning was performed on these variants and compared with WT AmiE to determine how stability constrains enzyme evolution. B-G. Design and validation of destabilized AmiE variants. B. A graphical representation of the computational enzyme design. C. Enzyme velocity as a function of acetamide concentration, and Michaelis-Menten parameters determined relative to WT AmiE. Error bars = 1 s.d., n = 2. D. Structural modeling of the cavities introduced into AmiE by the designed mutations. E. Enzyme yield following E. coli auto-induction expression. Error bars = 1 s.d., n = 3; *p = 0.0003, #p = 0.002. F. Specific growth rates of strains in M9 (unselective) and in minimal media with 10 mM acetamide as the sole nitrogen source (selective) (pEDA2, low expression; pAG, high expression). Error bars = 1 s.d., n ≥ 3. G. Enzyme reaction velocities at substrate saturation relative to a folded control. Grey dots represent biological replicates; error bars = 1 s.d., n = 2.
Destabilized AmiE variants with wild-type catalytic efficiencies engineered
We first sought to identify mutations to AmiE that, under the selection conditions, would decrease the in vivo folding probability of the protein while maintaining wild-type catalytic efficiency. To identify such mutants, we took a computational approach based on a modified version of PROSS (23). Briefly, PROSS designs a protein sequence with an improved probability of reaching the folded state in vivo relative to its input. This improved folding probability correlates with biophysical properties such as improved protein stability, a faster on-target folding rate, or reduced aggregation propensity. As our experimental objective is essentially the inverse problem, we modified the Rosetta FilterScan protocol underlying PROSS and then selected point mutations with higher (less favorable) energy scores than AmiE wild type (WT) (Fig. 1B and SI Appendix, Table S1). For each mutant, these scores were then cross-referenced with relative fitness scores previously determined experimentally for AmiE (22) to ensure that their relative fitness was below zero (SI Appendix, Table S1).
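The selection logic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the modified FilterScan script: it assumes each candidate point mutation already carries a Rosetta score difference (mutant minus WT, where positive means less favorable) and a relative fitness value from a prior WT-background scan. The `Candidate` record, its field names, the `min_delta` threshold, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one designed point mutation."""
    mutation: str         # illustrative label, not a real AmiE mutation
    delta_score: float    # Rosetta score(mutant) - score(WT); > 0 means less favorable
    prior_fitness: float  # relative fitness from an earlier WT-background scan


def select_destabilizing(candidates, min_delta=0.0):
    """Keep mutations scored as less favorable than WT whose previously
    measured relative fitness is below zero, most destabilizing first."""
    hits = [c for c in candidates
            if c.delta_score > min_delta and c.prior_fitness < 0.0]
    return sorted(hits, key=lambda c: c.delta_score, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("mutA", 2.5, -0.20),   # destabilizing and deleterious: kept
        Candidate("mutB", 1.0, 0.05),    # beneficial in the prior scan: dropped
        Candidate("mutC", -0.8, -0.10),  # predicted stabilizing: dropped
        Candidate("mutD", 4.1, -0.35),   # kept and ranked first
    ]
    print([c.mutation for c in select_destabilizing(pool)])  # ['mutD', 'mutA']
```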
Of thirteen variants with 1-3 mutations from WT selected for experimental characterization, nine expressed as soluble proteins in E. coli BL21* (DE3). We purified a subset of these nine variants and assessed their catalytic efficiency with the substrate acetamide. While most mutants showed reduced enzymatic activity, AmiE I38V and AmiE I122L both showed turnover numbers (kcat) and Michaelis constants (KM) statistically indistinguishable from WT (Fig. 1C and SI Appendix, Table S2). Furthermore, size exclusion chromatography showed no oligomeric differences between WT and AmiE I38V or AmiE I122L (SI Appendix, Fig. S1), suggesting that the variants maintain the expected homohexameric quaternary structure.
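Parameters such as kcat and KM are typically obtained by fitting initial velocities to the Michaelis-Menten equation, v = Vmax[S]/(KM + [S]), with kcat = Vmax/[E]total. The sketch below shows one dependency-free way to do such a fit and is not necessarily the procedure used here: for any fixed KM the least-squares Vmax has a closed form, so only KM is searched, by golden-section search on log10(KM). The function name, search bounds, and the synthetic data and enzyme concentration in the usage example are assumptions for illustration.

```python
import math


def fit_michaelis_menten(s, v, km_lo=1e-3, km_hi=1e3, iters=200):
    """Least-squares fit of v = Vmax * S / (KM + S).

    For a fixed KM the best Vmax is a linear least-squares solution, so only
    KM is searched (golden-section search on log10(KM)).
    Assumes a single SSE minimum inside [km_lo, km_hi].
    """
    def vmax_and_sse(km):
        g = [si / (km + si) for si in s]
        vmax = sum(gi * vi for gi, vi in zip(g, v)) / sum(gi * gi for gi in g)
        sse = sum((vi - vmax * gi) ** 2 for gi, vi in zip(g, v))
        return vmax, sse

    a, b = math.log10(km_lo), math.log10(km_hi)
    phi = (math.sqrt(5) - 1) / 2
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if vmax_and_sse(10 ** c)[1] < vmax_and_sse(10 ** d)[1]:
            b, d = d, c                  # minimum lies in [a, d]
            c = b - phi * (b - a)
        else:
            a, c = c, d                  # minimum lies in [c, b]
            d = a + phi * (b - a)
    km = 10 ** ((a + b) / 2)
    return vmax_and_sse(km)[0], km


if __name__ == "__main__":
    # Synthetic, noise-free data (Vmax = 10, KM = 5, arbitrary units)
    s = [0.5, 1, 2, 5, 10, 20, 50]
    v = [10 * x / (5 + x) for x in s]
    vmax, km = fit_michaelis_menten(s, v)
    enzyme_conc = 0.1                    # hypothetical total enzyme concentration
    print(round(vmax, 3), round(km, 3), "kcat =", round(vmax / enzyme_conc, 1))
```

Profiling only KM keeps the search one-dimensional; with noisy replicate data, a standard nonlinear least-squares routine would also report parameter uncertainties.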
AmiE I38V removes a methyl group to open a small cavity in the core, while I122L alters hydrophobic core packing within the monomer subunit (Fig. 1D). Complementary in vitro and in vivo experiments suggest that both variants are less stable than WT, in the general order I38V < I122L < WT. For both variants, all synonymous codons encoding the designed substitution had a fitness metric below zero (SI Appendix, Table S1), suggesting that the loss of fitness arises at the protein level rather than from codon-specific effects (22). When driven from the same T7 promoter under identical Studier auto-induction (24) expression conditions, both AmiE I38V and I122L give significantly lower purification yields of soluble protein than WT (Fig. 1E and SI Appendix, Table S2). Furthermore, E. coli harboring destabilized AmiE variants expressed from the same plasmid, pEDA2 (22), which keeps the same constitutive promoter, ribosome-binding site (RBS), and 5’ untranslated region (5’ UTR), showed lower specific growth rates than WT when grown with acetamide as the sole nitrogen source (Fig. 1F and SI Appendix, Table S2). These results suggest that the folding probability upon translation is lower for these variants than for WT. Finally, while denatured WT refolds into active enzyme at 14.2% yield, both I38V (0.06%) and I122L (0.11%) have vastly lower refolding yields (Fig. 1G and SI Appendix, Table S2). Together, these results support a model in which the I38V and I122L mutations destabilize AmiE, resulting in a lower probability of folding in vivo.
Deep mutational scans for AmiE variants
Deep mutational scanning of these variants was performed using a previously developed growth selection (22) in media with 10 mM acetamide as the sole nitrogen source. These growth selections required tuning constitutive amidase expression such that the specific growth rate of variant i expressed in E. coli MG1655 rph+ in the selection media, relative to that in defined M9 minimal media (μs,i/μM9,i), is 0.4-0.6. However, plasmid pEDA2, used for the AmiE WT selections, did not support sufficiently high growth rates for the destabilized variants (Fig. 1F and SI Appendix, Table S2). We therefore screened additional promoters for AmiE I38V and AmiE I122L while keeping the same 5’ UTR and RBS for all constructs, to minimize potential variant-dependent mRNA effects on fitness (SI Appendix, Table S3). Plasmid pAG, which has a stronger constitutive promoter than pEDA2, supported a growth rate ratio of 0.49 ± 0.03 for AmiE I122L and 0.42 ± 0.02 for AmiE I38V (Fig. 1F and SI Appendix, Table S2). By contrast, pAG AmiE WT had a nearly 2-fold higher growth rate ratio of 0.91 ± 0.04 (Fig. 1F and SI Appendix, Table S2).
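The tuning criterion above compares two specific growth rates. As a rough sketch, assuming optical density (OD) readings taken during exponential growth, the specific growth rate is the least-squares slope of ln(OD) versus time, and a construct passes when μs,i/μM9,i falls between 0.4 and 0.6. The function names and the simulated cultures below are hypothetical; real growth curves would also need lag and stationary phases trimmed.

```python
import math


def specific_growth_rate(times_h, od):
    """Specific growth rate (1/h): least-squares slope of ln(OD) vs. time.
    Assumes every point is in exponential phase."""
    y = [math.log(x) for x in od]
    n = len(times_h)
    t_mean, y_mean = sum(times_h) / n, sum(y) / n
    num = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times_h, y))
    den = sum((t - t_mean) ** 2 for t in times_h)
    return num / den


def selection_ratio(mu_sel, mu_m9, lo=0.4, hi=0.6):
    """Return mu_s / mu_M9 and whether it falls inside the target window."""
    ratio = mu_sel / mu_m9
    return ratio, lo <= ratio <= hi


if __name__ == "__main__":
    t = [0, 1, 2, 3, 4, 5]
    od_m9 = [0.02 * math.exp(0.8 * x) for x in t]   # hypothetical M9 culture
    od_sel = [0.02 * math.exp(0.4 * x) for x in t]  # hypothetical acetamide culture
    print(selection_ratio(specific_growth_rate(t, od_sel),
                          specific_growth_rate(t, od_m9)))
```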
Next, we generated near-comprehensive single-site saturation mutant libraries for the destabilized enzymes using nicking mutagenesis (25) (full library statistics are shown in SI Appendix, Table S4). For AmiE I38V, mutations at residues 32-44 flanking the site of the destabilizing mutation were not constructed, while for AmiE I122L, mutations at residues 115-130 and 132 were not made. Plasmids expressing mutant enzyme libraries were electroporated into E. coli MG1655 rph+ under conditions minimizing double transformants. Strains harboring AmiE libraries then underwent growth selections in replicate, with initial population sizes of >6 × 10⁶ cells, for approximately 8 generations at 37°C. A biological replicate for AmiE WT covering residues 171-255 was also performed for comparison with previously published results (22). The pre- and post-selection populations were barcoded and deep sequenced. The resulting data were processed using PACT (26) to obtain the fitness metric for each mutant in the library. Sequencing depth ranged from 155- to 300-fold coverage across libraries (SI Appendix, Fig. S2). In total, we recovered relative fitness values for 93.7% and 91.8% of all possible non-synonymous mutants for AmiE I122L and AmiE I38V, respectively (SI Appendix, Table S5).
To estimate reproducibility, we compared the AmiE WT replicate selection performed here with data from an identical selection experiment in Wrenbeck et al. (22). Correlation coefficients between these selections are ≥0.90 (SI Appendix, Fig. S3), comparable to the correlation between replicates performed for this work (AmiE I122L, R = 0.921; AmiE I38V, R = 0.952) (Fig. 2A). Additionally, there was essentially no correlation between relative fitness and the pre-selection frequency of a given mutant in the library (WT AmiE, R = 0.011; AmiE I122L, R = 0.026; AmiE I38V, R = −0.0376) (SI Appendix, Fig. S4), indicating that pre-selection read counts do not bias the fitness metrics obtained.
Fig. 2. A. Correlation between AmiE variant technical replicates. B. Distributions of fitness effects (DFE) of nonsense and missense mutations for the AmiE variants. Upper plots show the full DFE, while lower plots include only beneficial mutants with the best-fit exponential curve. C. Correlation of fitness between AmiE variants. D. Venn diagram of all unique and shared beneficial mutations for the respective variants.
Distributions of fitness effects are largely insensitive to protein stability
The shape of the distribution of fitness effects (DFE) governs the local protein fitness landscape. Because beneficial mutations are rare, Orr (27) predicted that their fitness effects follow the Pareto family of distributions. Using the set of beneficial mutations (variants with relative fitness above wild type in selective media), we previously described the shape of the DFE for beneficial mutations as exponential with high statistical power (22). The new datasets allow us to ask directly whether the shape of the DFE changes with enzyme stability. Consistent with expectations, all variants have very similar distributions of fitness effects (Fig. 2B), and the fraction of all possible mutations that are beneficial falls within a narrow range. For both destabilized enzymes, the Pareto family of functions also describes the distribution of beneficial fitness effects (SI Appendix, Table S6). Thus, at approximately the same relative fitness, the probability of finding rare beneficial mutations is independent of enzyme stability.
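To make the DFE fit concrete, here is a minimal sketch, assuming the beneficial subset is simply all mutants with ζ > 0. It fits an exponential (the shape-zero member of the generalized Pareto family) by maximum likelihood, where the rate is the reciprocal of the mean beneficial effect, and reports a Kolmogorov-Smirnov distance to the fitted curve. This illustrates the idea only and is not the statistical test behind SI Appendix, Table S6; the function name and inputs are hypothetical.

```python
import math


def fit_exponential_dfe(fitness):
    """Fit an exponential to the beneficial part of a DFE (zeta > 0).

    The exponential is the shape-zero member of the generalized Pareto
    family; its maximum-likelihood rate is 1 / mean(beneficial effects).
    Also returns the Kolmogorov-Smirnov distance to the fitted CDF.
    """
    ben = sorted(z for z in fitness if z > 0)
    n = len(ben)
    if n == 0:
        raise ValueError("no beneficial mutations to fit")
    rate = n / sum(ben)
    ks = 0.0
    for i, x in enumerate(ben):
        cdf = 1.0 - math.exp(-rate * x)
        # compare the fitted CDF with the empirical CDF on both sides of each step
        ks = max(ks, (i + 1) / n - cdf, cdf - i / n)
    return rate, ks, n


if __name__ == "__main__":
    # Hypothetical fitness metrics; only the positive values enter the fit.
    zetas = [-0.8, -0.3, -0.05, 0.02, 0.04, 0.05, 0.09, 0.15]
    rate, ks, n = fit_exponential_dfe(zetas)
    print(f"n = {n}, rate = {rate:.1f}, mean effect = {1 / rate:.3f}, KS D = {ks:.2f}")
```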
Moderate epistasis observed with decreasing enzyme stability
How does the local fitness landscape change in the background of the destabilized AmiE variants? If mutations were completely additive with the destabilizing mutations, we would expect the local fitness landscapes to be identical and their correlation to approach that of replicates (R ≈ 0.92). Conversely, complete non-additivity of mutations would lead to minimal correlation. We were able to compare 2,813 mutations above the lower bound of relative fitness (45.4% of possible mutations) shared among the three datasets. Pearson correlation analysis of the fitness metrics shows that the WT local fitness landscape is reasonably correlated with those of the destabilized variants (WT vs. I38V, R = 0.72; WT vs. I122L, R = 0.79), and this correlation is similar to that between I122L and I38V (R = 0.79) (Fig. 2C). Notably, these correlation coefficients are lower than those for replicates, indicating that while in bulk the majority of mutations behave similarly in different genetic backgrounds, there is some epistasis.
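The comparison above restricts each pairwise correlation to mutations measured in every background. A minimal sketch, assuming each dataset is a dictionary mapping a mutation label to its ζ value; the `floor` argument is a hypothetical stand-in for the lower bound of relative fitness, whose value is not given here, and all names and values are illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_on_shared(datasets, floor=None):
    """datasets: background name -> {mutation: zeta}.

    Correlates every pair of backgrounds over the mutations measured in all
    of them (optionally only those above a fitness floor in every background).
    """
    shared = set.intersection(*(set(d) for d in datasets.values()))
    if floor is not None:
        shared = {m for m in shared
                  if all(d[m] > floor for d in datasets.values())}
    shared = sorted(shared)
    return shared, {
        (a, b): pearson([datasets[a][m] for m in shared],
                        [datasets[b][m] for m in shared])
        for a, b in combinations(sorted(datasets), 2)
    }


if __name__ == "__main__":
    data = {  # hypothetical zeta values
        "WT": {"m1": 0.10, "m2": -0.40, "m3": 0.02, "m4": -0.90},
        "V1": {"m1": 0.05, "m2": -0.35, "m3": 0.06, "m4": -0.70},
        "V2": {"m1": 0.08, "m2": -0.50, "m3": -0.01},
    }
    shared, r = pairwise_on_shared(data)
    print(shared, {k: round(v, 2) for k, v in r.items()})
```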
Precise measurements of negative and positive epistasis are complicated by the relatively narrow range of fitness our experimental system captures. However, we can determine the sign of fitness in our datasets with high precision. To evaluate the relative prevalence of sign epistasis, we defined a mutant as beneficial if ζi > 0 in both replicates and the 95% confidence interval of ζi lies above zero (see Methods). Conversely, we defined a mutant as deleterious if ζi < 0 in both replicates and the 95% confidence interval lies below zero. We used these cutoffs to sort beneficial variants into the seven possible fitness bins, corresponding to the regions of a three-way Venn diagram (beneficial in one, two, or all three backgrounds) (Fig. 2D). Using a stricter requirement, in which beneficial mutants must increase the specific growth rate by ≥10% over the genetic background, leads to similar results (SI Appendix, Fig. S5).
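The classification and binning step can be sketched as follows. The calling rules mirror the text, but they assume the 95% confidence bounds are already computed (their derivation is described in the SI and is not reproduced); the function names and the toy background labels are hypothetical. Grouping each mutation by the exact set of backgrounds in which it is beneficial yields at most 2³ − 1 = 7 bins for three backgrounds.

```python
from collections import Counter


def is_beneficial(rep1, rep2, ci_low):
    """Assumed rule: zeta > 0 in both replicates and the lower bound
    of the 95% confidence interval above zero."""
    return rep1 > 0 and rep2 > 0 and ci_low > 0


def is_deleterious(rep1, rep2, ci_high):
    """Mirror rule: zeta < 0 in both replicates and the upper bound
    of the 95% confidence interval below zero."""
    return rep1 < 0 and rep2 < 0 and ci_high < 0


def venn_bins(calls):
    """calls: background -> set of mutations called beneficial there.

    Groups each mutation by the exact set of backgrounds in which it is
    beneficial; three backgrounds give at most 2**3 - 1 = 7 bins.
    """
    membership = {}
    for background, mutations in calls.items():
        for m in mutations:
            membership.setdefault(m, set()).add(background)
    return Counter(frozenset(bgs) for bgs in membership.values())


if __name__ == "__main__":
    calls = {"WT": {"a", "b", "c"}, "V1": {"b", "c", "d"}, "V2": {"c", "e"}}
    for backgrounds, count in venn_bins(calls).items():
        print(sorted(backgrounds), count)
```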
Most beneficial mutations in WT background are shared
Previous studies found that stable proteins can buffer destabilizing mutations that are otherwise beneficial (28–29). Our datasets allow us to assess the extent of this phenomenon. We find that 122/141 (86.5%) of beneficial mutations in the WT background are also beneficial in the I38V and/or I122L genetic backgrounds (Fig. 2D). Of these, six globally beneficial mutations (S9A, A28R, R89E, I165C, V201M, A234M) have been previously characterized biophysically (22) and are known to improve specific amidase flux under the selection conditions of 10 mM acetamide at 37°C. Conversely, only 19 of 141 beneficial mutations (13.5%) are unique to the WT background (Fig. 2D). Therefore, beneficial mutations that are buffered in the stable background are present but in the minority.
The 19 WT-specific beneficial mutations map to two predominant locations: eleven (G291C, L297H, D311C, E320A, S325Y/T/C, 326Y/H, R336L, G341Y; Fig. 3A) are at the extreme C-terminus, which forms extensive homodimer contacts, while four (M72T, E74G, A78H, E82I; Fig. 3A) are located on helices B and C, distal from any oligomeric contacts in the homohexamer. Most of the C-terminal mutations are predicted to destabilize the homodimerization interface, which we speculate could lead to subtle structural rearrangements in the AmiE active site. The mechanisms behind the helix B/C mutations are more obscure, as three of these mutations are at surface-exposed positions over 10 Å away from any active site residue. Nevertheless, these findings identify localized regions where small-scale mutational perturbations increase fitness in a more stable genetic background.
Fig. 3. A-C. Bar graphs of fitness metrics. Error bars represent 95% confidence intervals, and grey dots show the fitness metric for each replicate. Horizontal dotted lines mark the cutoff for mutations that increase the growth rate by ≥10%. Models show the trimer of dimers (wire + surface models), background residues for the respective enzyme (white mesh + sticks), the active site residues (magenta spheres), and, where applicable, the original destabilizing mutation (green or orange mesh + sticks). A. AmiE WT unique beneficial mutations are located at the C-terminal tail or in helices B/C. B. AmiE I122L unique beneficial mutations segregate to positions either adjacent to the destabilizing mutation (T84I/V) or adjacent to the active site. C. Locations of a subset of AmiE I38V unique beneficial mutations. In B and C, the blue transparent surface with sticks represents the residue in AmiE WT, and green or orange transparent surfaces with sticks represent the respective unique beneficial mutations.
Positive sign epistasis is overwhelmingly specific
Our datasets allow direct comparison of the prevalence of specific and nonspecific epistasis in a destabilized background. Contrary to previous literature on other model proteins (2, 12, 29–33), we find that specific reciprocal positive sign epistatic mutations dominate nonspecific mutations in the I38V and I122L destabilized backgrounds. In particular, we find 8 specific (unique beneficial) mutations for AmiE I122L and 37 specific mutations for AmiE I38V, compared with only 4 nonspecific (shared beneficial) mutations (Fig. 2D and 3B–3C). Analysis of the locations of the unique mutations in the structures of AmiE I122L and AmiE I38V suggests biophysical interpretations of their epistatic mechanism. For AmiE I122L, the strongest specific mutations, T84I/V, directly contact L122 (Fig. 3B), while for AmiE I38V similar mutations (K21P, L25C, L102I, M251N, I253Y, Q254T) occur on loops adjacent to the void created by the destabilizing mutation (Fig. 3C). Other specific mutations line the enzyme active site: three in AmiE I122L (P58H, P69Q, I136V; Fig. 3B) and five in AmiE I38V (I65V, G143T, Q190T, M193P, F230C; Fig. 3C). Notably, because our datasets do not include mutations at positions immediately adjacent to I38V and I122L, the extent of specific positive sign epistasis is probably underestimated.
A plurality of unique beneficial mutations is codon-dependent
Fitness conferred by weak-link enzymes depends not only on intrinsic protein biophysics but also on mRNA sequence-dependent effects. Perhaps the best appreciated of these are synonymous mutations in the first ten codons of a polypeptide, which can substantially alter mRNA stability and ribosome binding site accessibility in bacteria (22, 34). Additionally, synonymous codons can differentially affect cotranslational folding (35) and thereby fitness. Here we mapped the variance among synonymous codons encoding beneficial missense mutations (Fig. 4A). For all three variants, the majority of high-variance codons occur in the first 10 codons, as expected (SI Appendix, Fig. S6A). However, there were localized puncta of high variance at several downstream positions in both the WT and I38V datasets, corroborated in replicate measurements. While overall synonymous codon variance among beneficial mutations differs only weakly or not significantly between datasets (SI Appendix, Fig. S6B), contingency table analysis shows that synonymous codon fitness disparities are significantly associated with unique, rather than shared, beneficial mutations in the WT and I38V backgrounds (WT p = 2.5 × 10⁻⁸; I122L p = 0.053; I38V p = 0.00017; two-tailed Fisher exact test) (Fig. 4B). In fact, the vast majority of WT (84.2%) and many of the I122L (42.8%) and I38V (48.6%) unique beneficial mutations have synonymous codon fitness sign disparities (Fig. 4B). This indicates that transcriptional-translational effects impose significant evolutionary constraints.
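Both analyses in this paragraph reduce to simple operations. The sketch below assumes a mutation shows a synonymous codon sign disparity when at least one of its codons has ζ > 0 and another has ζ < 0, and it computes a two-tailed Fisher exact p-value for a 2 × 2 table (for example, rows for unique versus shared beneficial mutations, columns for with versus without a disparity) by summing all hypergeometric table probabilities no larger than that of the observed table. The function names and table layout are illustrative assumptions; a library routine such as SciPy's `scipy.stats.fisher_exact` would serve the same purpose.

```python
from math import comb


def has_sign_disparity(codon_fitness):
    """Assumed rule: synonymous codons for one substitution include at least
    one with zeta > 0 and at least one with zeta < 0 (zeros are ignored)."""
    return any(z > 0 for z in codon_fitness) and any(z < 0 for z in codon_fitness)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def p(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = p(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # small relative tolerance so ties with the observed table are included
    total = sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    print(has_sign_disparity([0.12, -0.05, 0.03]))
    # Hypothetical counts: rows = unique / shared beneficial,
    # columns = with / without a codon sign disparity.
    print(fisher_exact_two_tailed(9, 3, 4, 20))
```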
Fig. 4. A. Variance of fitness metrics for synonymous codons of beneficial mutations as a function of position in the primary sequence. B. Percentage of shared and unique beneficial mutations with synonymous codon fitness metric disparities. Reported p-values are from contingency table analysis with a two-tailed Fisher exact test.
Discussion
In this study we used deep mutational scanning to analyze how basal protein stability affects local fitness landscapes. AmiE I38V and AmiE I122L were designed and validated to have catalytic parameters statistically indistinguishable from WT but a lower probability of folding under selection conditions. We found that the DFE for both variants was largely similar to that of AmiE WT, and that most fitness-enhancing mutations are shared. However, analysis of the mutations exhibiting positive sign epistasis revealed two major surprises.
First, we expected a larger subset of beneficial mutations to be shared only between the I122L and I38V datasets, as current models of stability-induced epistasis posit that many nonspecific, globally distributed mutations can improve the probability of folding (3). Instead, we found that sign epistatic mutations were overwhelmingly specific to the I122L and I38V backgrounds. Second, we found that unique beneficial mutations depend strongly on codon choice, as approximately 50% of sign epistatic mutations in the I38V background show synonymous codon sign disparities. What could explain these apparent disagreements with the existing literature? We speculate that both unexpected results arise from the complex in vivo co-translational folding of homohexameric AmiE. Local, specific nonsynonymous mutations may recover on-target folding trajectories more efficiently than nonspecific, globally stabilizing mutations. Similarly, on-pathway folding kinetics may differ considerably between variants and could be selectively modulated by codon choice. Regardless of the exact mechanism, our results show that the simple biophysical models currently used to describe protein evolution are incomplete, and that such models may need to incorporate folding kinetics to account for the folding probability in vivo.
Materials and Methods
Full details are available in the SI Appendix, Materials and Methods.
Computational design of destabilizing mutants
The FilterScan Rosetta script from Goldenzweig et al. (23) was modified to predict destabilizing mutations.
Protein biochemistry
Plasmid assembly, protein expression and purification, kinetic analysis, and biophysical analysis were performed essentially as described in Klesmith et al. (21) and Wrenbeck et al. (22). For refolding experiments, the respective enzymes (50 µM) were denatured in PBS + 3 M GDN-HCl + 1 mM DTT at 4°C for ~16 hours. Enzymes were refolded by dilution to 1 µM in PBS + 0.1% (w/v) BSA + 1 mM DTT in BSA-blocked PCR plates at 4°C; the refolding mixture was then brought to 37°C and incubated for 20 minutes. Refolded enzymes were assayed exactly as in Wrenbeck et al. (22).
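The refolding protocol implies a few quantities worth making explicit. As a small worked sketch, assuming the refolding buffer contains no GDN-HCl, diluting from 50 µM to 1 µM is a 50-fold dilution that leaves about 0.06 M (60 mM) GDN-HCl in the refolding mixture. The refolding yield is assumed here to be the saturating-substrate velocity of the refolded sample as a percentage of that of an equal amount of folded control enzyme; the helper names and example velocities are hypothetical.

```python
def dilution_summary(stock_uM, final_uM, denaturant_M):
    """Fold dilution and residual denaturant (M) after refolding by dilution,
    assuming the refolding buffer itself contains no denaturant."""
    factor = stock_uM / final_uM
    return factor, denaturant_M / factor


def refolding_yield_pct(v_refolded, v_native):
    """Assumed definition: velocity of the refolded sample as a percentage
    of the velocity of an equal amount of folded control enzyme."""
    return 100.0 * v_refolded / v_native


if __name__ == "__main__":
    factor, residual = dilution_summary(50.0, 1.0, 3.0)
    print(factor, residual)  # 50.0 0.06
    print(refolding_yield_pct(2.0, 8.0))  # hypothetical velocities
```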
Deep mutational scanning
Mutational libraries were constructed using nicking mutagenesis (25). Growth selections were performed as in Wrenbeck et al. (22). Plasmid DNA from pre- and post-selection populations was purified and processed for deep sequencing using 300-bp paired-end reads on an Illumina MiSeq exactly as in Wrenbeck et al. (22). The resulting datasets were processed by PACT using the following equations. Briefly, enrichment ratios (εi) were calculated from the pre- and post-selection counts for each mutant:
$$\varepsilon_i = \frac{f_{f,i}}{f_{o,i}}$$
Here fo,i is the frequency of mutant i in the pre-selection population and ff,i is its frequency in the post-selection population. Next, normalized fitness metrics (ζi) for each mutant i were calculated using the following equation:
$$\zeta_i = \frac{\log_2\left(\varepsilon_i / \varepsilon_{\mathrm{ref}}\right)}{g_p}$$
where gp is the population-averaged number of doublings during the selection, and εi and εref are the enrichment ratios of the mutant and of the unmutated starting variant, respectively.
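The two PACT equations can be sketched as follows, assuming εi = ff,i/fo,i and ζi = log2(εi/εref)/gp. This is not PACT's implementation: pseudocounts, read-count filters, and confidence intervals are omitted, and variants with zero reads are simply skipped. The function name and the simulated counts are hypothetical.

```python
import math


def fitness_metrics(pre_counts, post_counts, ref, g_p):
    """Relative fitness (zeta) for every variant from read counts.

    pre_counts / post_counts: variant -> reads before / after selection
    ref: key of the unmutated starting variant
    g_p: population-averaged number of doublings during the selection
    Assumes eps_i = f_f,i / f_o,i and zeta_i = log2(eps_i / eps_ref) / g_p.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    eps = {}
    for variant, n_pre in pre_counts.items():
        n_post = post_counts.get(variant, 0)
        if n_pre == 0 or n_post == 0:
            continue  # undefined without a pseudocount; skipped in this sketch
        eps[variant] = (n_post / post_total) / (n_pre / pre_total)
    return {v: math.log2(e / eps[ref]) / g_p for v, e in eps.items()}


if __name__ == "__main__":
    # Idealized selection: the reference doubles 8 times; hypothetical
    # mutants grow 10% faster (m1) or 10% slower (m2) than the reference.
    g = 8
    rel_rate = {"ref": 1.0, "m1": 1.1, "m2": 0.9}
    pre = {v: 1000 for v in rel_rate}
    post = {v: pre[v] * 2 ** (g * r) for v, r in rel_rate.items()}
    print(fitness_metrics(pre, post, "ref", g_p=g))
```

In this idealized simulation ζ comes out at about +0.1 for m1 and −0.1 for m2, that is, μi/μref − 1, because gp was set equal to the reference's number of doublings; with real data gp is a population average, so the correspondence is approximate.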
Acknowledgements
Thanks to Dr. J. Klesmith for his PACT troubleshooting help, E. Maurer and J. Hosten for their help with assorted tasks, and members of the Whitehead lab for providing feedback on ideas and figures. This work was supported by NSF CBET Career Award #1254238 to T.A.W.