Abstract
Genetic variation among orthologous proteins can cause cryptic phenotypic properties that only manifest in changing environments. Such variation may also impact the evolutionary potential of proteins, but the molecular basis for this remains unclear. Here we perform comparative directed evolution in which four orthologous metallo-β-lactamases were evolved toward a new function. We found that genetic variation between these enzymes resulted in distinct evolutionary outcomes. The ortholog with the lower initial activity reached a 20-fold higher fitness plateau exclusively via increasing catalytic activity. By contrast, the ortholog with the highest initial activity evolved to a less-optimal and phenotypically distinct outcome through changes in expression, oligomerization and activity. We show that the cryptic molecular properties and conformational variation of residues in the initial genotypes cause epistasis, thereby constraining evolutionary outcomes. Our work highlights that understanding the molecular details relating genetic variation to protein functions is essential to predicting the evolution of proteins.
Introduction
Genetic diversity across orthologous proteins is thought to be predominantly neutral with respect to their native, physiological function, but it can cause “cryptic genetic variation”, i.e., variation in other, non-physiological phenotypic properties1–3. Cryptic genetic variation has been shown to play an important role in evolution, because genetically diverse populations are more likely to contain genotypes with a “pre-adapted” phenotype, e.g., a latent promiscuous function, that confers an immediate selective advantage when the environment changes and a new selection pressure emerges4–8. Beyond these examples, however, we have little understanding of how genetic variation affects the long-term adaptive evolutionary potential, or “evolvability”, of proteins9. Many biological traits, in particular promiscuous protein functions, often evolve by accumulating multiple adaptive mutations before reaching a fitness plateau or peak. Thus, the degree to which a trait can improve, and the peak fitness it can reach, ultimately determine its evolutionary potential and outcomes10. Recent studies have shown that, owing to intramolecular epistasis11–13, the same mutations can have different phenotypic effects when introduced into orthologous proteins14,15, even when the recipient genotypes differ by only a few other mutations16,17. However, because these studies examined only a small subset of mutations, the degree to which genetic starting points determine long-term evolutionary outcomes under the same selection remains unclear18,19. Moreover, we have very little understanding of the molecular mechanisms that underlie the relationship between genetic variation and evolvability. It has been suggested that protein fold20–22 and protein stability23–25 can determine evolvability, but these factors alone cannot explain the prevalence of epistasis or the enormous variation we observe in the evolvability of different genotypes.
Metallo-β-lactamases (MBLs) are a genetically diverse enzyme family that confers β-lactam antibiotic resistance on bacteria (Fig. 1a,b)26,27. Our previous work27,28 demonstrated that most MBLs exhibit phosphonate monoester hydrolase (PMH) activity, with catalytic efficiencies (kcat/KM) in the range of 0.1 to 10 M−1 s−1. β-Lactamase and PMH reactions differ in their scissile bond (C–N vs. P–O) and transition-state geometry (tetrahedral vs. trigonal bipyramidal), making PMH activity a catalytically distinct promiscuous activity for these enzymes. Here, we empirically test enzyme evolvability by performing directed evolution of four orthologous MBLs (NDM1, VIM2, VIM7 and EBL1) towards PMH activity. Through detailed analysis of the genotypic and phenotypic changes along the evolutionary trajectories, we demonstrate at the molecular level how the cryptic properties of these evolutionary starting points determine evolvability.
a, Catalytic efficiencies (kcat/KM) of six metallo-β-lactamases (MBLs) for β-lactamase and PMH activities. The phylogenetic relationship is shown with bootstrap values indicated at each node. b, Comparison of NDM1 (blue, PDB ID: 3SPU) and VIM2 (green, PDB ID: 1KO3) in overall 3D structure and biophysical properties. c, Overview of the comparative directed evolution experiment of MBLs towards PMH activity. d, PMH fitness improvements during directed evolution. PMH fitness is presented as PMH activity in cell lysate relative to NDM1-WT. The activity level of each variant is listed in Supplementary Table 3. e, Mutations of NDM1 and VIM2 over the ten rounds of directed evolution. The structural locations of the mutations are mapped onto the wild-type structures of NDM1 and VIM2, with the Cα atoms of mutated residues shown as spheres (upper panel). Mutations are shown on a partial alignment of the NDM1 and VIM2 sequences (bottom panel). The full alignment of MBLs and the mutations of individual variants are presented in Supplementary Fig. 1 and Supplementary Table 2.
Results
Comparative directed evolution of four different MBLs
We performed directed evolution using four different MBL orthologs, NDM1, VIM2, VIM7 and EBL1, as starting genotypes (Fig. 1, Supplementary Fig. 1 and Supplementary Table 1). The level of native β-lactamase activity is similar across these enzymes, but their promiscuous PMH activity varies by over 50-fold (Fig. 1a). From these starting points, the same directed evolution scheme was applied to improve PMH activity (Fig. 1c). Briefly, randomly mutagenized gene pools were transformed into Escherichia coli, and purifying selection was applied to enrich functional variants and purge non-functional ones by plating the library onto agar plates containing a low concentration (4 μg/ml) of ampicillin. Colonies were inoculated into 96-well plates (396 variants in total per round), regrown, lysed and screened for PMH activity. The “enzyme fitness” (the selection criterion) in our directed evolution scheme is defined as the level of PMH activity in E. coli cell lysate. The most improved variant(s) were isolated, sequenced and used as templates for the next round of evolution.
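The screening step of each round can be sketched as a normalization followed by a ranking. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `relative_fitness` and `pick_templates`, the well labels and the activity values are all hypothetical, and the purifying selection on ampicillin plates that precedes screening is not modelled.

```python
# Hypothetical sketch of the screening step in one directed-evolution round.
# Well labels and activity values below are invented placeholders, not data.


def relative_fitness(lysate_activity, reference_activity):
    """Express each variant's lysate PMH activity relative to a reference (e.g. NDM1-WT = 1.0)."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return {well: rate / reference_activity for well, rate in lysate_activity.items()}


def pick_templates(fitness, n_best=1):
    """Return the n most improved variants, which would seed the next round."""
    ranked = sorted(fitness.items(), key=lambda item: item[1], reverse=True)
    return [well for well, _ in ranked[:n_best]]


# Hypothetical initial rates (arbitrary units) measured in cleared lysates.
plate = {"A01": 0.8, "A02": 5.2, "A03": 0.0, "A04": 2.6}
fitness = relative_fitness(plate, reference_activity=0.4)
print(pick_templates(fitness, n_best=2))  # ['A02', 'A04']
```

Taking more than one template (`n_best > 1`) corresponds to carrying several “most improved variant(s)” forward; in the experiment these would be sequenced before the next round.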
Information about the enzymes characterized in this study.
We observed significant differences in the degree of fitness improvement among the four trajectories over the first two rounds of directed evolution (Fig. 1d). Interestingly, fitness improvement did not correlate with initial fitness. For example, the variant with the highest initial activity, VIM2, improved only 4-fold and was surpassed by the initially less fit NDM1 and EBL1, which improved 210- and 310-fold, respectively. We continued the directed evolution of VIM2 and NDM1 for an additional eight rounds (Fig. 1d and Supplementary Tables 2-3). Both trajectories showed diminishing returns in their evolution toward the new activity, i.e., each trajectory eventually reached a fitness plateau29,30. Note that fitness plateaued regardless of purifying selection for β-lactam resistance: purifying selection was not employed in the last two rounds (R9 and R10), yet no variant with further improved fitness was isolated (Fig. 1d). Despite this similar overall trend, the evolutionary outcomes differed substantially: NDM1 was initially 4-fold less fit than VIM2, but its fitness was 28-fold higher by the end of the experiment. The NDM1 trajectory improved by 3600-fold, whereas the VIM2 trajectory improved by only 35-fold, a 100-fold difference in their evolvability with respect to PMH fitness. Given that the two WT enzymes are almost identical in their physicochemical properties, protein solubility, stability and structure (Fig. 1a,b), the variation in evolutionary potential that separates these orthologous sequences is substantial.
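The fold values above can be cross-checked with simple arithmetic: multiplying the starting fitness ratio by the ratio of the two lineages' fold gains gives the end-point ratio. The sketch below uses only the rounded numbers quoted in this paragraph, and the function name `end_point_ratio` is our own. It gives about 26-fold for the end-point ratio, close to the reported 28-fold; the small gap presumably reflects rounding of the reported 4-fold starting difference.

```python
def end_point_ratio(start_ratio, gain_a, gain_b):
    """Fitness of A relative to B after evolution, from the starting ratio A/B and each fold gain."""
    return start_ratio * gain_a / gain_b


ndm1_gain = 3600.0             # NDM1-WT -> NDM1-R10 fold improvement in PMH fitness
vim2_gain = 35.0               # VIM2-WT -> VIM2-R10 fold improvement in PMH fitness
ndm1_over_vim2_start = 1 / 4   # NDM1-WT was ~4-fold less fit than VIM2-WT

relative_evolvability = ndm1_gain / vim2_gain
final = end_point_ratio(ndm1_over_vim2_start, ndm1_gain, vim2_gain)
print(round(relative_evolvability))  # 103
print(round(final, 1))  # 25.7
```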
Information on the directed evolution procedure and mutations that were accumulated during directed evolution.
PMH fitness values (cell lysate activity) of evolved variants and generated mutants.
Genotypic solutions vary across the evolutionary trajectories
Overall, NDM1 and VIM2 accumulated 13 and 15 mutations, respectively. Interestingly, the mutations that occurred along the two trajectories were entirely distinct (Fig. 1e and Supplementary Table 2). Only two positions (154 and 223) were mutated in both trajectories, and in each case to different amino acids. The mutations in the early rounds of evolution, which conferred the largest fitness improvements, were W93G, N116T and K211R in the NDM1 trajectory, and V72A and F67L in the VIM2 trajectory. In NDM1, most mutations are scattered relatively evenly around the active site: only one mutation (W93G) is located below loop 3, and several are located in and around loop 10 and other parts of the active site. By contrast, in VIM2, six mutations are tightly clustered within or next to loop 3. These results demonstrate that the distinct phenotypic outcomes of the two enzymes result from distinct mutational responses.
Repeatability and determinism of evolutionary adaptation
An important question that arises from the observation that the two enzymes followed different mutational paths is whether this divergence reflects differences in their innate molecular properties or random chance arising from experimental variation. To address this, we pursued several lines of evidence suggesting that the observed trajectories are largely deterministic (Fig. 2). First, we generated and screened two additional libraries from each wild-type enzyme, which repeatedly identified the same mutations observed in the early rounds of the original directed evolution (W93G in R1 for NDM1, and V72A and F67L in R1 and R2, respectively, for VIM2; Fig. 2a). Second, we assessed the epistatic effects of mutations by introducing those that occurred between R2 and R4 into the corresponding wild-type background, and found that later mutations in NDM1 confer a selective advantage only after the initial mutations along that trajectory have been fixed (Fig. 2b). Third, we introduced the initial mutation from each trajectory (W93G for NDM1 and V72A for VIM2) into the counterpart enzyme and assayed its effect on fitness, which revealed that each trajectory’s adaptive mutations are incompatible with the other’s. W93G, which increased the fitness of NDM1 by 25-fold, caused a 6-fold fitness decrease in VIM2-WT, explaining why VIM2 did not acquire this mutation during its evolution (Fig. 2c). Introducing other hydrophobic residues (A, V, L and F) at position Trp93 in VIM2 had similar negative effects (Fig. 2d). Similarly, V72A, which improved the fitness of VIM2 by around 2-fold, was largely neutral in NDM1 (Supplementary Fig. 2 and Supplementary Table 4).
Taken together, this evidence suggests that intramolecular epistasis leaves each genotype with only a limited and unique set of mutations available for adaptation toward PMH activity, producing largely repeatable and deterministic evolutionary trajectories.
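The incompatibility test in the third line of evidence compares one mutation’s fold effect on fitness in two genetic backgrounds. A minimal way to label such comparisons is sketched below. The ±1.5-fold neutral band is an arbitrary assumption, V72A’s “largely neutral” effect in NDM1 is approximated as exactly 1.0, and the helper names `effect_class` and `compare_backgrounds` are hypothetical.

```python
import math


def effect_class(fold, neutral_band=1.5):
    """Label a fold change in fitness; the neutral band is an arbitrary threshold."""
    if fold >= neutral_band:
        return "beneficial"
    if fold <= 1 / neutral_band:
        return "deleterious"
    return "neutral"


def compare_backgrounds(fold_a, fold_b, neutral_band=1.5):
    """Classify how a mutation's fitness effect differs between backgrounds A and B."""
    a = effect_class(fold_a, neutral_band)
    b = effect_class(fold_b, neutral_band)
    # Difference between the two effects in orders of magnitude.
    log_gap = abs(math.log10(fold_a) - math.log10(fold_b))
    if {a, b} == {"beneficial", "deleterious"}:
        kind = "sign epistasis"
    elif a != b:
        kind = "effect in one background only"
    else:
        kind = "same class in both backgrounds"
    return kind, round(log_gap, 2)


# W93G: ~25-fold gain in NDM1-WT vs ~6-fold loss in VIM2-WT (values from the text).
print(compare_backgrounds(25.0, 1 / 6))  # ('sign epistasis', 2.18)
# V72A: ~2-fold gain in VIM2-WT vs "largely neutral" in NDM1-WT (taken as 1.0).
print(compare_backgrounds(2.0, 1.0))  # ('effect in one background only', 0.3)
```

Working on a log scale makes gains and losses symmetric, so a 25-fold gain and a 6-fold loss differ by roughly 2.2 orders of magnitude.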
Changes in catalytic activity of purified enzymes relative to the corresponding wild-type enzyme, and melting temperatures of MBL mutants.
a, Additional screening of the wild-type libraries. Improved variants isolated from two additional screens of the mutagenized libraries of the wild-type sequences, NDM1-WT and VIM2-WT. Mutations highlighted in colour are those that were isolated in the original directed evolution experiment. b, Epistasis analysis of mutations. The mutations occurring in the NDM1 and VIM2 trajectories were introduced into the respective wild-type sequence, and the change in fitness was compared with that observed in the trajectory. The error bars represent the propagated standard deviation from three independent experiments. c, Fold change in the PMH fitness of W93G mutants compared with the WT variants. d, Fitness effect of introducing various hydrophobic residues at position Trp93 in VIM2.
Distinct molecular changes underlie the two evolutionary trajectories
The conventional paradigm of protein evolution holds that higher protein stability, or greater soluble expression, promotes evolvability by buffering the destabilizing effects of function-altering mutations, thereby allowing a greater number of adaptive mutations to accumulate23–25. This model, however, fails to predict the difference in evolvability between NDM1 and VIM2, because their stability and solubility are similar (Fig. 1b). To elucidate the molecular changes that enabled each trajectory’s optimization, we measured a range of molecular properties over the course of evolution, including catalytic efficiency (kcat/KM), solubility, melting temperature (Tm) and oligomeric state. The molecular changes underlying the fitness improvements differed substantially between the trajectories (Fig. 3, Supplementary Fig. 3 and Supplementary Tables 5-7). In NDM1, kcat/KM improved 20,000-fold, from 0.32 to 5,900 M−1 s−1 by R10 (Fig. 3a,b). This large improvement, however, was partly offset by a loss of protein solubility, most of which occurred in R1, where kcat/KM increased 300-fold but solubility decreased from 43% to 25% (Fig. 3c,d). Solubility never recovered, whereas kcat/KM increased gradually until reaching the plateau observed in R7. By contrast, the kcat/KM of VIM2 stagnated at only a 30-fold increase up to R6, and subsequent fitness improvements were due to increased solubility, from 40% to 70% (Fig. 3). Changes in solubility are only weakly correlated with changes in Tm, indicating that factors other than thermostability, such as kinetic stability or protein folding, have a greater effect on the level of soluble protein expression (Supplementary Fig. 3). NDM1 retains its monomeric quaternary structure throughout its trajectory to NDM1-R10, whereas the monomeric VIM2-WT evolved to exist in an equilibrium between monomer and dimer by VIM2-R10 (Supplementary Fig. 4a).
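The trade-off between catalytic efficiency and solubility can be illustrated with a simple, assumed model of lysate activity: at substrate concentrations well below KM, the initial rate is approximately (kcat/KM)[E][S], so lysate activity scales with kcat/KM times the amount of soluble enzyme. The sketch below applies that approximation to the NDM1 R1 and later VIM2 numbers quoted above, additionally assuming that total expression is unchanged. The function name `predicted_fold_change` is hypothetical; this is an illustration, not a fit to the measured fitness values, and it ignores effects such as oligomerization.

```python
def predicted_fold_change(kcat_km_fold, soluble_before, soluble_after, expression_fold=1.0):
    """Predicted fold change in lysate PMH activity, assuming v ~ (kcat/KM)[E_soluble][S].

    Valid only if [S] << KM. Solubilities are fractions of total protein, and
    expression_fold is the change in total expression (assumed 1.0 by default).
    """
    soluble_fold = (soluble_after / soluble_before) * expression_fold
    return kcat_km_fold * soluble_fold


# NDM1, round 1: kcat/KM up ~300-fold while solubility fell from 43% to 25%.
print(round(predicted_fold_change(300, 0.43, 0.25)))  # 174
# VIM2, later rounds: kcat/KM roughly flat while solubility rose from 40% to 70%.
print(round(predicted_fold_change(1.0, 0.40, 0.70), 2))  # 1.75
```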
We separated the monomeric and dimeric states by size-exclusion chromatography and determined their respective kinetic parameters. The dimer is less active than the monomer for PMH activity, indicating that dimer formation does not contribute to the improved PMH activity but may instead be associated with the increase in overall solubility along the VIM2 trajectory (Supplementary Fig. 4b).
Catalytic parameters of VIM2 variants.
a, Catalytic efficiencies (kcat/KM) of purified variants for PMH activity. Individual catalytic parameters are listed in Supplementary Tables 5-6. b, Correlation between fitness and catalytic efficiency (kcat/KM) along the evolutionary trajectories. c, Changes in solubility along the evolutionary trajectories. SDS-PAGE of each variant is presented in Supplementary Fig. 3. d, Correlation between fitness and solubility along the evolutionary trajectories.
Individual catalytic parameters of NDM1 variants.
Altogether, our results confirm that differences in protein stability cannot explain the differences in evolvability between the two enzymes. VIM2 variants consistently exhibit higher stability and solubility throughout the trajectory, yet their improvements in kcat/KM and fitness are far lower than those of NDM1. Moreover, the distinct phenotypic solutions highlight the qualitatively different evolutionary processes that led to each trajectory’s fitness plateau. Interestingly, the genetic differences between the evolutionary end points can create additional cryptic variation that becomes apparent when the environment changes: when the enzymes are expressed at 37°C (rather than the 30°C at which they evolved PMH activity), NDM1 variants are significantly less fit because of their lower soluble expression, whereas VIM2 variants maintain similar fitness levels even at the higher temperature (Supplementary Fig. 5).
Structural adaptation between the evolutionary trajectories
Having established that protein stability does not constrain the evolvability of the enzymes, we sought a molecular explanation by solving the crystal structures of the R10 variants from both trajectories, which allowed us to compare them with the previously published wild-type structures (Fig. 4 and Supplementary Table 8)31,32. For NDM1-R10, we obtained crystal structures of the apo-enzyme and of a complex with the phenylphosphonate product bound in the active site after in crystallo substrate turnover. Additionally, we conducted molecular dynamics (MD) simulations of the enzymes in complex with the PMH substrate (p-nitrophenyl phenylphosphonate, PPP). We identified three main structural adaptations relative to NDM1-WT that underlie the 20,000-fold improvement in PMH activity. First, W93G removes the steric clash between the side chain of Trp93 and the substrate and creates a complementary pocket for the substrate’s phenyl group below loop 3 (Fig. 4b and Supplementary Fig. 6). Second, loop 3 (Trp93 is located near the base of this loop) is displaced inward by ~6 Å, allowing improved π-π stacking interactions between Phe70 and the p-nitrophenol leaving group (Fig. 4c and Supplementary Fig. 6). Third, loop 10 is repositioned through reorganization of the local hydrogen-bond network (largely by K211R and G222D), allowing it to interact with the leaving group of the substrate (Fig. 4c). For VIM2, we were able to crystallize only the dimer fraction of VIM2-R10. This dimer reveals an unprecedented structural rearrangement: a full half of the structure is symmetrically domain-swapped between the two subunits (Fig. 4d and Supplementary Fig. 6). Besides domain swapping, the major structural rearrangement between VIM2-WT and VIM2-R10 involves the reorganization of loop 3, which is disordered in VIM2-R10; this is caused by the six mutations that occurred within and next to loop 3 (Figs. 1e and 4d).
Four of these six loop 3 mutations had accumulated by R5; we therefore speculate that these mutations and the resulting loop rearrangement were the major cause of the 30-fold increase in PMH activity in VIM2. Taken together, these results further emphasize that the two starting enzymes responded to the same selection pressure through substantially different molecular changes.
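Loop movements such as the ~6 Å inward shift of loop 3 in NDM1-R10 are commonly quantified as per-residue Cα displacements between structures that have already been superposed (for example, on the conserved core). The sketch below assumes the superposition has been done elsewhere and uses invented coordinates; the helper names are hypothetical, and this is not the analysis used for Fig. 4.

```python
import math


def ca_displacements(ref, moved):
    """Per-residue Cα displacement (Å) between two pre-superposed structures.

    ref and moved map residue number -> (x, y, z); only shared residues are compared.
    """
    shared = sorted(set(ref) & set(moved))
    return {res: math.dist(ref[res], moved[res]) for res in shared}


def rmsd(displacements):
    """Root-mean-square of the per-residue displacements."""
    values = list(displacements.values())
    return math.sqrt(sum(d * d for d in values) / len(values))


# Invented coordinates; residue 71 is present only in `ref` and is skipped.
ref = {65: (0.0, 0.0, 0.0), 70: (1.0, 2.0, 3.0), 71: (2.0, 2.0, 2.0), 73: (5.0, 5.0, 5.0)}
moved = {65: (0.0, 0.0, 0.0), 70: (1.0, 2.0, 9.0), 73: (5.0, 8.0, 9.0)}
disp = ca_displacements(ref, moved)
print(max(disp, key=disp.get), disp[70])  # 70 6.0
print(round(rmsd(disp), 2))  # 4.51
```

Reporting the maximum per-residue displacement, rather than only the RMSD, keeps a localized loop shift from being diluted by residues that do not move.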
Solubility and melting temperature of NDM1 and VIM2 variants.
Crystallographic data collection and refinement statistics.
a, Structural overlay of NDM1-WT (grey, PDB ID: 3SPU) and NDM1-R10 in the apo form (cyan, PDB ID: 5JQJ) and in complex with the PMH product phenylphosphonate (magenta sticks; blue, PDB ID: 5K4M). The active-site metal ions are shown as spheres. b, Surface views of the active sites of NDM1-WT and NDM1-R10 with the PMH product. The product in the NDM1-WT structure was placed based on the NDM1-R10 structure. The mFo−DFc omit electron density of the phenylphosphonate is shown (green mesh), contoured at 2.5σ. c, Comparison of active-site residues between NDM1-WT and NDM1-R10. d, Structural overlay of monomeric VIM2-WT (grey, PDB ID: 1KO3) and domain-swapped dimeric VIM2-R10 (green and pink, PDB ID: 6BM9). The disordered loop 3 in VIM2-R10 is indicated as a dashed line. The mutations accumulated during the VIM2 trajectory are depicted as light grey spheres. The mFo−DFc omit electron density of the structure is shown in Supplementary Fig. 7.
Molecular basis for the mutational incompatibility of the key mutation W93G
Finally, we investigated the molecular basis of a key mutation that differentiates the two trajectories: W93G. This mutation caused a 100-fold increase in the catalytic activity of NDM1, but when introduced into VIM2, it reduced PMH activity by 10-fold (Supplementary Fig. 2). We performed and compared MD simulations of NDM1-WT and VIM2-WT and of models of NDM1-W93G and VIM2-W93G in the presence of the PMH substrate (Fig. 5, Supplementary Fig. 8 and Supplementary Tables 9-10). NDM1-W93G showed structural adaptations similar to those observed in NDM1-R10 (described above): W93G eliminates the steric clash with the substrate and allows loop 3 to shift inward to form complementary interactions with the substrate, but without the rearrangement of loop 10, which presumably occurs later in that trajectory. By contrast, in VIM2-WT, Trp93 adopts a different orientation that avoids steric hindrance and instead forms complementary interactions with the phenyl ring of the substrate. Consequently, W93G in VIM2 removes beneficial enzyme-substrate interactions, causing a deleterious effect on PMH activity. Moreover, unlike in NDM1, loop 3 in VIM2-W93G does not form complementary interactions with the substrate, which is consistent with the observation that loop 3 is extensively mutated and reorganized by other mutations later in the VIM2 trajectory. By examining the wild-type crystal structures, we found that the “second-shell” residues around Trp93 cause different orientations of its indole side chain. In NDM1, Leu65, Gln123 and Asp124 constrain Trp93 to point into the active site (Fig. 5d). In VIM2, however, the second-shell residues (Gln65, Asp123 and Asp124) differ, stabilizing Trp93 in an alternative conformation (Fig. 5e). Thus, remote and seemingly neutral sequence variation between the enzymes has a substantial impact on the effect of mutating a key active-site residue.
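Fig. 5f reports the average distance between the substrate’s p-nitrophenyl ring and the Phe70 (NDM1) or Tyr73 (VIM2) ring in MD frames where the two rings stack. A minimal sketch of that kind of analysis is shown below, assuming ring-atom coordinates have already been extracted from each frame. The 5.0 Å centroid cutoff used to call a π-π interaction is a hypothetical choice (stricter definitions may also consider the angle between ring planes), and the toy frames are generated rather than taken from the simulations.

```python
import math


def centroid(atoms):
    """Geometric centre of a list of (x, y, z) atom coordinates."""
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in range(3))


def mean_stacking_distance(frames, cutoff=5.0):
    """Average ring-centroid distance over frames in which the rings are within `cutoff` Å.

    frames: list of (ring_a_atoms, ring_b_atoms) pairs, one per MD frame.
    Returns (mean distance or None, fraction of frames counted as stacked).
    """
    distances = [math.dist(centroid(a), centroid(b)) for a, b in frames]
    stacked = [d for d in distances if d <= cutoff]
    if not stacked:
        return None, 0.0
    return sum(stacked) / len(stacked), len(stacked) / len(frames)


def hexagon(z, radius=1.4):
    """Toy six-membered ring of radius `radius` Å centred on (0, 0, z) in the xy-plane."""
    return [(radius * math.cos(math.radians(60 * k)),
             radius * math.sin(math.radians(60 * k)), z) for k in range(6)]


# Three toy frames: the second ring sits 3.8, 4.2 and 7.0 Å above the first.
frames = [(hexagon(0.0), hexagon(z)) for z in (3.8, 4.2, 7.0)]
mean, fraction = mean_stacking_distance(frames)
print(round(mean, 2), round(fraction, 3))  # 4.0 0.667
```

Returning the stacked fraction alongside the mean distance makes it explicit how many frames the average is based on.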
Parameters used to describe the Zn2+ ions in our MD simulations.
List of relevant ionized states as well as the protonation patterns of histidine residues in our molecular dynamics simulations. All other residues were kept in their unionized forms as they were outside the simulation sphere (see main text).
Representative structures from MD simulations of (a) NDM1-WT (grey) and NDM1-W93G (blue), and (b) VIM2-WT (grey) and VIM2-W93G (green). c, Changes in the hydrophobic interactions of the phenyl ring of the PMH substrate upon the W93G mutation in MD simulations of NDM1, quantified as the average distances between the CE3 and CZ2 atoms of the substrate’s phenyl ring and the α-carbon atoms of Leu65 and Val73. d, Position of Trp93 and second-shell residues in the NDM1-WT crystal structure (PDB ID: 3SPU). e, Position of Trp93 and second-shell residues in the VIM2-WT crystal structure (PDB ID: 1KO3). f, Average distance between the p-nitrophenyl ring of the substrate and the side-chain rings of Phe70 in NDM1 and Tyr73 in VIM2 when the two rings form a π-π interaction during the MD simulations.
We extended our mutational analysis of W93G and V72A (the first mutation in the VIM2 trajectory) to four other orthologous enzymes (EBL1, FIM1, VIM1 and VIM7) and found that their effects on both PMH and β-lactamase activities vary substantially, even between orthologs with high sequence identity (Supplementary Fig. 2). For example, in VIM7, which is 80% identical to VIM2, W93G caused a 3-fold increase in activity, whereas the same mutation caused a 10-fold decrease in VIM2. Taken together, cryptic and subtle differences in sequence and structure can influence the conformation of a key active-site residue, causing an approximately 3000-fold difference in the phenotypic effect of a mutation and thereby leading to distinct evolutionary outcomes among orthologous enzymes.
Discussion
The great diversity of protein functions, and many contemporary examples of proteins that have promptly adapted to changing environments, suggest a remarkable degree of evolvability for biological molecules33,34. These successful cases of adaptation, however, may also obscure a wealth of cases where proteins failed, or were limited, in their evolution of a new function. Our observations highlight that not all enzymes are equally evolvable, and that seemingly innocuous genetic variation can have significant consequences for a protein’s ability to evolve a new function. Thus, this work elaborates on the known role of cryptic genetic variation in generating diverse, hidden “pre-adapted” properties5–7, extending this view to encompass the adaptive evolutionary potential of individual genes, with meaningful implications for the evolvability of organisms and populations as well.
Our observations, combined with those of others, indicate that the evolvability of proteins is generally and profoundly rooted in their genetic variation. Examples from nature and the laboratory have shown that independent evolutionary trials from a single genotype often follow very similar genetic trajectories, suggesting that evolution is largely deterministic from a particular genetic starting point35–40. On the other hand, the prevalence of epistasis causes potential adaptive mutations to have distinct effects in otherwise phenotypically similar orthologs14–17, suggesting that genetic variation epistatically “restricts” and/or “permits” access to certain adaptive mutations12,41,42. The successful evolution of new protein functions may therefore rely on genetic drift to explore sequence space and generate genotypes that differ in their evolvability16. The genetic diversity generated by neutral drift thereby provides a foundation for a population’s response to new selection pressures9,41. One complicating aspect of this model is the impact of the timing and location of selection pressures on the emergence of diverse genotypes. This model may explain a commonly observed phenomenon in bacterial adaptation, e.g., drug resistance and xenobiotic degradation43,44, in which typically only a small number of genotypes in a larger population emerge to confer new functions, after which these successfully adapted genes quickly disseminate to other bacteria via horizontal gene transfer.
Our observations suggest that the conventional paradigm, in which protein stability is the dominant factor determining evolvability23–25, cannot account for the variation in evolutionary potential between the MBL enzymes toward PMH activity. Instead, we found that cryptic and subtle molecular differences have a far greater impact on enzyme evolvability. Moreover, the initial fitness and phenotypes of the enzymes do not necessarily provide a good indicator of their eventual evolutionary outcomes. Thus, adaptive evolutionary potential can be truly “cryptic”, becoming apparent only after evolution has occurred, and is therefore extremely difficult to predict.
These results also have profound implications for protein design, engineering and laboratory evolution: protein engineers overwhelmingly choose a single starting genotype based on the availability of biochemical and structural information (often the one with the highest initial activity), and much effort has been devoted to developing technologies to overcome evolutionary dead ends45–47. Our observations suggest that it would be more effective to explore diverse genotypes and identify the most evolvable starting sequences in order to obtain an optimized functional protein. The biophysical rationalization of the cryptic properties of MBL enzymes described here contributes to the ambitious goal of understanding the molecular mechanisms linking neutral genetic variation to the evolvability we observe in nature, in a way that would allow us to predict evolutionary pathways and to obtain better biological molecules.
Author contributions
F.B. and N.T. conceived and designed this study. F.B. and G.Y. performed experimental evolution, enzymatic assays and mutational analysis. N.H., P.D.C. and C.J.J. collected structural data. A.P., A.B. and S.C.L.K. designed and conducted MD simulations. F.B. and N.T. wrote the paper with input from all authors.
Competing Financial Interests
The authors declare no competing financial interests.
Acknowledgements
We thank Dan S. Tawfik, Amir Aharoni, Joelle Pelletier and the members of the Tokuriki lab for comments on the manuscript. This work was supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants (RGPIN 418262-12 and RGPIN 2017-04909) and a Canadian Institutes of Health Research (CIHR) Foundation Grant to N.T. N.T. is a CIHR New Investigator and a Michael Smith Foundation for Health Research (MSFHR) Career Investigator. We also thank the Knut and Alice Wallenberg and Wenner-Gren Foundations for fellowships to S.C.L.K. and A.P., respectively, as well as the Swedish National Infrastructure for Computing (SNAC) for supercomputing resources.