Abstract
In the cell, proteins are synthesized from N- to C-terminus and begin to fold during translation. Co-translational folding mechanisms are therefore linked to elongation rate, which varies as a function of synonymous codon usage. However, synonymous codon substitutions can affect many distinct cellular processes, which has complicated attempts to deconvolve the extent to which synonymous codon usage promotes or frustrates proper protein folding in vivo. Although previous studies have shown that some synonymous changes can lead to different final structures, other substitutions will likely be more subtle, perturbing only the protein folding pathway without altering the final structure. Here we show that synonymous codon substitutions encoding a single essential enzyme lead to dramatically slower cell growth. These mutations do not prevent active enzyme formation but instead alter the folding mechanism, leading to enhanced degradation. These results support a model where synonymous codon substitutions can impair cell fitness by altering co-translational protein folding mechanisms. Synonymous codon changes can therefore have a significant impact on fitness even when the native structure is preserved.
Significance
Many proteins are incapable of refolding in vitro yet fold efficiently to their native state in the cell. This suggests that more information than the amino acid sequence is required for proper folding of these proteins. Here we show that synonymous mRNA mutations can alter the protein folding mechanism in vivo, leading to changes in cellular fitness. This work highlights the important role of synonymous codon selection in supporting efficient protein production in vivo.
Introduction
Synonymous codon substitutions alter the mRNA coding sequence but preserve the encoded amino acid sequence. For this reason, these substitutions were historically considered to be phenotypically silent and often disregarded in studies of human genetic variation [1,2]. In recent years, however, it has become clear that synonymous substitutions can significantly alter protein function in vivo, through a wide variety of mechanisms that can change protein level [3–5], translational accuracy [6,7], secretion efficiency [8,9], the final folded structure [1,10–12] and post-translational modifications [13]. The full range of synonymous codon effects on protein production is, however, still emerging and much remains to be learned regarding the precise mechanisms that regulate these effects.
One mechanism that has long been proposed but has scant evidence to support its significance in vivo is the extent to which synonymous codon substitutions can perturb co-translational folding mechanisms. In general, rare synonymous codons tend to be translated more slowly than their common counterparts [14–17]. Moreover, rare synonymous codons tend to appear in clusters, creating broader patterns of codon usage [18], many of which are conserved through evolution [19–21]. The folding rates of many protein secondary and tertiary structural elements are similar to their rate of synthesis [22,23], lending conceptual support to the hypothesis that even subtle changes in elongation rate could alter folding mechanisms [24]. In theory, reducing the rate of translation elongation by synonymous common-to-rare codon substitutions could provide the N-terminal portion of a nascent protein with more time to adopt a stable tertiary structure before C-terminal portions are synthesized and emerge from the ribosome exit tunnel [25–27]. Depending on the specific native structure of the encoded protein, such extra time could be either advantageous or detrimental to efficient folding [28]. However, cells contain an extensive network of molecular chaperones to facilitate the folding of challenging protein structures, including several that associate with nascent polypeptide chains during translation [29–32]. Thus, it remains unclear whether a synonymous codon-derived alteration to elongation rate and co-translational folding mechanism could be sufficiently perturbative to rise above the buffering provided by the cellular chaperone network.
Here we show that position-specific synonymous codon changes in the coding sequence of an enzyme essential for E. coli growth can have a dramatic impact on growth rate. We tested a variety of mechanistic origins for this growth defect, including changes to the folded protein structure, expression level, enzymatic activity, mRNA abundance and/or activation of a cell stress response. Our results are consistent with synonymous substitutions altering the co-translational folding mechanism, leading to the formation of intermediates that are more susceptible to post-translational degradation. These results demonstrate that changes to synonymous codon usage can significantly affect protein folding in vivo and therefore have broad implications for both protein design and the interpretation of disease-associated synonymous mutations.
Results
Synonymous codon substitutions impair E. coli growth rate
To develop a system to test connections between synonymous codon usage, co-translational folding and cell fitness, we used chloramphenicol acetyltransferase (CAT), a water-soluble, homotrimeric E. coli enzyme with a complex tertiary structure [33] (Fig. 1). An early study showed that synonymous codon substitutions near the middle of the coding sequence (Fig. 2a) led to lower specific activity for CAT synthesized by in vitro translation [11]. CAT is essential for E. coli growth in chloramphenicol (cam) [34], enabling us to use growth rate in the presence of cam as a convenient fitness assay. Furthermore, because CAT is not part of an operon or regulatory network, we hypothesized that it would be unlikely for feedback regulation of other genes to mask the effects of CAT synonymous codon changes on enzyme function [35]. Crucially, although CAT cannot be refolded to its native structure after dilution from chemical denaturants, the native structure is resistant to unfolding up to 80°C (Fig. S1), highlighting the importance of in vivo folding intermediates for attaining the native CAT structure.
We transformed E. coli with a plasmid encoding the previously described synonymous CAT coding sequence variant [11] under a titratable promoter but detected no discernible difference in growth versus E. coli producing CAT from the wild type (WT) coding sequence (Fig. 2b, S2a). Compared to WT-CAT, this synonymous construct contains a larger number of common codons (Fig. 2a), which leads to increased protein accumulation in vitro due to an overall faster translation elongation rate [11,16,25]. Consistent with these in vitro results, we detected more CAT in E. coli transformed with the coding sequence enriched in common codons (Fig. 2c). We hypothesized that this higher intracellular CAT concentration could mask a defect in specific activity. To test this, we used a Monte Carlo simulation method [18,36] (see Supplemental Methods) to design and select an alternative synonymous CAT coding sequence, Shuf1, which we predicted would be more likely to produce a WT-like amount of CAT. In the Shuf1 coding sequence, the local synonymous codon usage patterns are very different from WT but the global codon usage frequencies are very similar (Fig. 2a, S3). To avoid known effects of 5’ synonymous codon substitutions on translation initiation [5,37–40], the first 46 codons of Shuf1 are identical to the WT coding sequence. E. coli produced CAT from the Shuf1 coding sequence at levels indistinguishable from WT-CAT (Fig. 2c) but exhibited a significantly slower growth rate (Fig. S2a).
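The design constraints described above (unchanged amino acid sequence, unchanged first 46 codons, near-identical global codon usage, altered local codon patterns) can be illustrated with a minimal sketch. This is not the Monte Carlo method of the Supplemental Methods, which is not reproduced here. The function names, the permutation strategy and the selection criterion (maximizing the number of changed codon positions) are all assumptions made for illustration.

```python
import random

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split an uppercase DNA coding sequence into a list of codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence to a one-letter amino acid string."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def shuffle_synonymous(cds, n_fixed, rng):
    """Permute codons among positions encoding the same amino acid.

    Codons before index n_fixed are left untouched. Because codons are only
    exchanged between synonymous positions, the protein sequence and the
    global codon counts are preserved exactly (hypothetical strategy).
    """
    codons = split_codons(cds)
    positions_by_aa = {}
    for idx in range(n_fixed, len(codons)):
        positions_by_aa.setdefault(CODON_TABLE[codons[idx]], []).append(idx)
    shuffled = list(codons)
    for idxs in positions_by_aa.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def design_variant(cds, n_fixed=46, n_trials=1000, seed=0):
    """Sample many shuffles and keep the one with the most changed positions.

    The selection criterion is an assumption for this sketch only.
    """
    rng = random.Random(seed)
    wt_codons = split_codons(cds)
    best, best_changes = cds, -1
    for _ in range(n_trials):
        candidate = shuffle_synonymous(cds, n_fixed, rng)
        changes = sum(a != b for a, b in zip(wt_codons, split_codons(candidate)))
        if changes > best_changes:
            best, best_changes = candidate, changes
    return best
```

Called as `design_variant(wt_cds)` on a wild type coding sequence, the sketch returns a sequence that encodes the same protein with the same codon counts. Permuting existing codons, rather than drawing new ones from a usage table, is one simple way to hold global codon frequencies fixed while changing local patterns.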
We hypothesized that we could further exacerbate the Shuf1-CAT growth defect by adapting a strategy developed by Hilvert and coworkers to couple subtle changes in enzyme function to E. coli growth rate [41]. This strategy involves encoding an ssrA tag at the C-terminus of the protein of interest, selectively enhancing its degradation by the protease ClpXP and leading to correspondingly lower intracellular protein concentrations. The smaller number of remaining enzymes face increased selective pressure to be highly functional. Addition of the ssrA tag did not affect CAT structure, stability or specific activity (Fig. S2b-d) but did lead to a dramatic growth defect for E. coli expressing Shuf1-CAT, versus the ssrA-tagged WT CAT coding sequence, in the presence of cam (Fig. 2b). This defect also led to a lower minimum inhibitory concentration for E. coli expressing Shuf1- versus WT-CAT (Fig. S2e).
Shuf1 CAT mRNA, protein not toxic
A major challenge of all in vivo experiments is discerning the precise origin of an observed effect. For example, a recent study showed that synonymous codon substitutions can lead to toxicity at the mRNA level even in the absence of protein production [42]. To test whether production of Shuf1-CAT mRNA and/or protein is inherently toxic, we compared growth rates of E. coli expressing WT or Shuf1-CAT in the absence of cam. These growth rates were indistinguishable (Fig. 2d), indicating that the Shuf1 defect is specifically related to impaired CAT enzyme function. Moreover, in the presence of cam the growth defect was partially suppressed at higher inducer concentrations (Fig. 2b), contrary to the larger growth defect expected if the Shuf1-CAT mRNA and/or protein were inherently toxic.
To test whether Shuf1-CAT induces a cell stress response, we used mass spectrometry to compare the abundances of 1277 proteins in E. coli expressing ssrA-tagged CAT from either the WT or Shuf1 coding sequence. There was no significant difference detected in the level of most proteins, including known stress-associated molecular chaperones and proteases (Fig. 3a). Taken together, these results support a model where the Shuf1-CAT growth defect is due to a direct loss of CAT protein activity, rather than an indirect effect on other cell functions.
Native CAT proteins are indistinguishable
Synonymous codon substitutions can lead to a wide range of effects on the encoded protein, including changes to translational fidelity (decoding accuracy) [6] and the native structure [1,10,12,17]. As a next test of the mechanism by which Shuf1 codon changes alter cell growth rate, we compared the solubility of CAT produced from the WT and Shuf1 coding sequences. In both cases, CAT was detected only in the soluble fraction of the cell lysate (Fig. S4a), indicating the Shuf1 growth defect is not due to CAT aggregation. Likewise, the purified CAT proteins produced from each mRNA sequence had indistinguishable secondary and tertiary structure (Fig. S4b), indistinguishable resistance to chemical and thermal denaturation (Fig. 3b, S4c) and indistinguishable specific activity (Fig. 3c). We also used mass spectrometry to compare the molecular weights of CAT translated from these coding sequences. These masses were indistinguishable to within one mass unit and matched the expected molecular weight of 25,953 Da. Taken together, these results demonstrate that CAT production from the Shuf1 coding sequence does not prohibit formation of the native, active CAT structure.
Shuf1 coding sequence impairs native CAT protein production
We noticed that addition of the ssrA tag led to a larger reduction in intracellular accumulation of CAT produced from the Shuf1 versus WT coding sequence (Fig. 2c, S4a). To determine whether this decrease in Shuf1-CAT was due to a defect arising from Shuf1 transcription and/or mRNA half-life, we quantified the levels of WT and Shuf1 mRNA. These levels were indistinguishable (Fig. S5a), supporting a model where the Shuf1 synonymous codon changes affect intracellular CAT concentration at the translational level, likely due to a greater susceptibility to protein degradation.
To explore this further, we subjected native, purified ssrA-tagged CAT produced in vivo from the WT or Shuf1 coding sequences to an in vitro ClpXP degradation assay [43,44]. Consistent with our other analyses of the native CAT structures and stabilities (above), the native proteins were equally resistant to degradation by ClpXP (Fig. 4a). This result suggests CAT produced from the Shuf1 mRNA sequence is more susceptible to degradation by ClpXP only prior to acquiring its native structure. Because the ssrA tag is located at the CAT C-terminus, this degradation is presumably post-translational, occurring after release of the nascent chain from the ribosome. To further test this model, we next tested (i) whether the Shuf1 growth defect was dependent on ClpXP activity in vivo and (ii) whether we could develop a physical mechanism for the impact of synonymous codon substitutions on Shuf1-CAT folding.
ClpXP preferentially degrades Shuf1-CAT folding intermediates
A prediction of the model described above is that the Shuf1 codon-dependent growth defect is exacerbated by post-translational degradation of ssrA-tagged CAT by cellular proteases, specifically ClpXP. ClpXP is the major E. coli protease responsible for degrading ssrA-tagged polypeptides under log-phase growth [43,45]. In general, less stably-folded proteins are more susceptible to degradation by ClpXP than more stable substrates [46–48], presumably because less energy is required to unfold unstable protein structures and expose the chains to the ClpXP protease active sites [49]. To test whether ClpXP degradation is the key mechanism impairing growth when E. coli expresses CAT from the Shuf1 coding sequence, we induced expression of WT and Shuf1 CAT in E. coli W3110, which lacks ClpX [46,50], and compared growth in this background to the parent strain, in the presence of cam. ClpX deletion enhanced growth only of cells expressing ssrA-tagged CAT from the Shuf1 coding sequence (Fig. 4b). Likewise, omission of the ssrA tag enhanced growth only for E. coli expressing ClpX; there was no effect on cells lacking ClpX (Fig. 4c). These results confirm that the major effect of the Shuf1 synonymous codon substitutions is enhanced degradation of ssrA-tagged CAT by ClpXP.
Shuf1 synonymous codon substitutions alter the CAT folding mechanism
As the next step towards a physical mechanism for the impact of synonymous codon substitutions on Shuf1-CAT folding, we noticed that after five hours of high-level induction in the presence of cam the growth rate of cells expressing the Shuf1 mRNA sequence increased to match the growth rate of cells expressing the WT sequence (Fig. 2b, right panel; note similar slopes at induction times >5 hr). This increase in growth rate for E. coli transformed with the Shuf1 plasmid was not due to the appearance of suppressor mutations, as cells taken from the endpoints of these growth assays had a reproducible growth lag when subsequently diluted and grown under identical conditions (Fig. S5b). In all of the cell growth assays described above, E. coli were induced to express CAT and at the same time challenged with cam. We hypothesized that this simultaneous challenge to both produce CAT and acetylate the antibiotic might amplify the importance of rapidly producing a sufficient pool of native CAT, highlighting the defect created by the increased susceptibility of CAT folding intermediates to degradation. Further, we hypothesized that even if only a small fraction of CAT translated from the Shuf1 coding sequence attains its native fold, the protease resistance of the native CAT structure (Fig. 4a) will eventually lead to the accumulation of a pool of native CAT sufficient to support a WT-like growth rate, regardless of the precise folding mechanism.
A direct prediction of the hypothesis above is that providing cells with more time to accumulate native CAT prior to cam addition should reduce or eliminate the Shuf1 growth defect. To test this prediction, we modified our growth assay to induce CAT production in the overnight culture, then diluted cells into fresh growth medium in the presence of cam and inducer. Overnight induction was sufficient to suppress the Shuf1-CAT growth defect, but only when a high concentration of inducer was used (Fig. 5). Based on these results, we hypothesized that (1) high levels of induction enable more copies of Shuf1 CAT to fold to its native, active structure and (2) providing more time for CAT protein folding prior to cam addition can suppress the Shuf1 growth defect. In support of these hypotheses, we found that just 30 min of induction prior to cam addition was sufficient to suppress the Shuf1 growth defect, but only at high inducer conditions (Fig. S5c).
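The reasoning behind these hypotheses can be made concrete with a deliberately simple toy calculation. It assumes that CAT is synthesized at a constant rate, that a fixed fraction of chains escapes degradation and reaches the native state, and that native CAT is neither degraded nor diluted by cell growth. These assumptions, the function name and all parameter values are illustrative only and are not measurements from this study.

```python
def lag_to_threshold(synthesis_rate, fraction_native, threshold, preinduction_hr=0.0):
    """Hours after cam addition until native CAT reaches a protective threshold.

    synthesis_rate  : chains synthesized per hour (arbitrary units, assumed constant)
    fraction_native : fraction of chains that reach the native state (0 < f <= 1)
    threshold       : amount of native CAT assumed sufficient for WT-like growth
    preinduction_hr : hours of induction before cam is added
    """
    if synthesis_rate <= 0 or not 0 < fraction_native <= 1:
        raise ValueError("synthesis_rate must be > 0 and fraction_native in (0, 1]")
    accumulation_rate = synthesis_rate * fraction_native
    # Native CAT made before cam addition counts toward the threshold.
    already_made = accumulation_rate * preinduction_hr
    remaining = max(0.0, threshold - already_made)
    return remaining / accumulation_rate


# Illustrative comparison (arbitrary numbers): a lower native fraction
# lengthens the lag, while stronger induction or pre-induction shortens it.
wt_like = lag_to_threshold(100, 0.5, 200)
shuf_like = lag_to_threshold(100, 0.1, 200)
shuf_high_induction = lag_to_threshold(400, 0.1, 200)
shuf_preinduced = lag_to_threshold(400, 0.1, 200, preinduction_hr=16)
```

In this toy picture, a variant with a lower native fraction reaches the same protective pool later, and the delay shrinks with stronger induction or with time to accumulate native protein before the antibiotic challenge, which is the qualitative behavior the hypotheses predict.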
mRNA secondary structural stability does not explain Shuf1 growth defect
The results above suggest the Shuf1 synonymous codon substitutions impair CAT co-translational folding by altering the rate of translation elongation. In vitro, synonymous codons have been shown to alter elongation rate either by altering the rate of decoding [51] or by altering downstream mRNA stability, which can impede ribosome translocation [52]. In vivo, there is some evidence that stable mRNA stem-loop structures can alter the elongation rate of the ribosome [53–55], although other studies have detected no difference [37,56,57]. Although the overall predicted mRNA stabilities of the WT and Shuf1 genes are similar, a predicted stable 3’ stem-loop structure in Shuf1 is not present in the WT coding sequence (Fig. S6a). To test whether this structure is responsible for the Shuf1 growth defect, we created chimeric mRNA sequences with only the 5’, middle or 3’ portion of the wild type sequence substituted with the Shuf1 sequence (Fig. S7a). The chimera bearing the 3’ portion of Shuf1 had no impact on growth rate (Fig. S7b). Moreover, growth rates for these chimeras correlated more closely with the difference in relative codon usage frequencies than with measures of mRNA stability (Fig. S6b). Taken together, these results indicate that translation elongation rate differences arising from changes in codon usage frequencies are a more likely origin of the Shuf1 growth defect than changes in mRNA secondary structure.
Discussion
Most of our current understanding of protein folding mechanisms is derived from studies of small proteins that refold reversibly when diluted from chemical denaturants. However, only a small number of proteins can refold robustly in vitro, even though many more can be maintained in a stable state once extracted from the cell [24,58,59]. This suggests that (i) the conformations adopted early during the folding process are crucial to successful folding and (ii) the cellular environment supports the formation of early folding intermediates that are distinct from the conformations populated upon dilution from denaturant. Indeed, there is substantial evidence that molecular chaperones are crucial to the successful folding of many complex proteins in vivo [29–32]. Although it has been hypothesized that synonymous codon changes could alter elongation rate and modify folding mechanisms in vivo, it has thus far been challenging to find evidence to support this hypothesis from experiments performed in vivo, possibly due to buffering provided by molecular chaperones.
Our results demonstrate that, during synthesis, the folding of nascent CAT polypeptide chains is sensitive to synonymous codon-induced changes to translation elongation rate. Although in all cases the nascent chains produced using different synonymous codon patterns remain capable of achieving the native, protease-resistant CAT trimer structure, translation of the wild type mRNA sequence led to the formation of CAT folding intermediates that are less susceptible to degradation by cellular proteases once released from the ribosome (Fig. 6). Crucially, these results demonstrate that although the CAT native structure is indistinguishable regardless of synonymous codon usage, the folding mechanism differs significantly, leading to increased degradation and an adverse effect on cell fitness.
These results are consistent with a small but growing number of studies indicating that synonymous codon substitutions can perturb protein folding mechanisms [1,10,12,60] and highlight strategies for uncovering such perturbations even when they do not alter the final protein structure. In contrast, recent in vitro single molecule force-unfolding experiments have shown that some small, ribosome-bound natively-folded domains fold via similar mechanisms on and off the ribosome [61,62]. However, as these studies noted, forced unfolding measured by molecular tweezers cannot capture the transient folding of a nascent chain during its synthesis, and hence what is measured in these experiments is the effect of close proximity of the ribosome surface, rather than co-translational folding. The very robust folding behavior of these well-characterized, reversible folding models may indeed lead to indistinguishable folding behavior during translation, a model supported by recent force-feedback folding measurements [63]. Of note, the model proteins selected for these studies are smaller than >75% of proteins in the E. coli proteome [24], whereas all known examples of synonymous codon-derived alterations to co-translational folding are much larger (e.g., [1,9,10,64]). We are not aware of an in vitro folding mechanism for a protein >175 aa long that is preserved during co-translational folding.
Our CAT results demonstrate that synonymous changes to mRNA coding sequences can significantly perturb folding of the wild type protein sequence even in the presence of its repertoire of relevant molecular chaperones. This result suggests that mRNA sequences have likely evolved alongside molecular chaperones to most efficiently support folding of the broad repertoire of protein structures produced in vivo. Although our understanding of co-translational folding mechanisms is still in its infancy, these results imply that it should ultimately be possible to rationally design mRNA coding sequences in order to enhance co-translational folding and to identify disease-associated synonymous codon substitutions most likely to adversely affect protein folding, particularly for large or otherwise complex proteins.
Methods
Cell growth assays
A single colony of E. coli KA12 [66] or W3110 [50] transformed with a pKT-CAT plasmid from a freshly streaked LB-amp plate was used to inoculate 20 mL of LB plus 100 μg/mL ampicillin (LB-amp) and grown overnight with shaking at 37°C. Unless otherwise specified, all cultures contained 100 μg/mL ampicillin and no tetracycline. Overnight cultures were used to inoculate fresh LB-amp to an optical density at 600 nm (OD600) of 0.05. Chloramphenicol (35 μg/mL, unless otherwise specified) and the indicated concentration of tetracycline inducer (0-1600 ng/mL) were added, and the culture was transferred to one well of a 12-well plate and incubated at 37°C with continuous shaking in a Synergy H1 microplate reader (BioTek). Growth was measured as the increase in OD600. The linear portion of the growth curve was fit to a straight line and the slope was taken as the growth rate.
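The growth rate calculation can be sketched as an ordinary least-squares line fit. How the linear portion of each curve was selected is not stated above, so this sketch assumes a fixed OD600 window. The function names and the window bounds `od_min` and `od_max` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def growth_rate(times_hr, od600, od_min=0.1, od_max=0.5):
    """Slope (OD600 per hour) of the growth curve within an assumed linear window.

    Only readings with od_min <= OD600 <= od_max are used; the window is an
    assumption of this sketch, not a value taken from the Methods.
    """
    window = [(t, od) for t, od in zip(times_hr, od600) if od_min <= od <= od_max]
    if len(window) < 2:
        raise ValueError("fewer than two readings fall inside the OD600 window")
    ts, ods = zip(*window)
    slope, _intercept = fit_line(ts, ods)
    return slope
```

In practice the window would be chosen by inspecting each curve, since lag and saturation phases vary between conditions, and a fixed window is only a convenient default.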
Acknowledgements
We thank Matt Champion for performing the mass spectrometry experiments, Don Hilvert for the kind gift of the pKT and pKTS plasmids and Peter Chein, Don Hilvert, Jeff Nivala and Mark Akeson for sharing E. coli strains with us. We are grateful to Anabel Rodriguez, Gabriel Wright and Scott Emrich for helpful discussions. This project was supported by grants GM120733 and GM105816 from the National Institutes of Health.