Abstract
Many organisms are subject to selective pressure that gives rise to unequal usage of synonymous codons, known as codon bias. To experimentally dissect the mechanisms of selection on synonymous sites, we expressed several hundred synonymous variants of the GFP gene in Escherichia coli, and used quantitative growth and viability assays to estimate bacterial fitness. Unexpectedly, we found many synonymous variants whose expression was toxic to E. coli. Unlike previously studied effects of synonymous mutations, the effect that we discovered is independent of translation, but it depends on the production of toxic mRNA molecules. We identified RNA sequence determinants of toxicity, and evolved suppressor strains that can tolerate the expression of toxic GFP variants. Genome sequencing of these suppressor strains revealed a cluster of promoter mutations that prevented toxicity by reducing mRNA levels. We conclude that translation-independent RNA toxicity is a previously unrecognized obstacle in bacterial gene expression.
Significance statement
Synonymous mutations in genes do not change protein sequence, but they may affect gene expression and cellular function. Here we describe an unexpected toxic effect of synonymous mutations in Escherichia coli, with potentially large implications for bacterial physiology and evolution. Unlike previously studied effects of synonymous mutations, the effect that we discovered is independent of translation, but it depends on the production of toxic mRNA molecules. We hypothesize that the mechanism we identified influences the evolution of endogenous genes in bacteria, by imposing selective constraints on synonymous mutations that arise in the genome. Of interest for biotechnology and synthetic biology, we identify bacterial strains and growth conditions that alleviate RNA toxicity, thus allowing efficient overexpression of heterologous proteins.
Main text
Although synonymous mutations do not change the encoded protein sequence, they cause a broad range of molecular phenotypes, including changes in transcription1, translation initiation2, 3, translation elongation4, translation accuracy5, 6, RNA stability7, and splicing8. As a result, synonymous mutations are under subtle but non-negligible selective pressure, which manifests itself in the unequal usage of synonymous codons across genes and genomes9-11. Several recent experiments directly measured the effects of synonymous mutations on fitness in bacteria2, 12-17. It has been commonly assumed that fitness depends primarily on the efficiency, accuracy, and yield of translation. Here we show that in the context of heterologous gene expression in E. coli, large effects of synonymous mutations on fitness are translation-independent, and are mediated by RNA toxicity.
To study the effects of synonymous mutations on bacterial fitness, we used an IPTG-inducible, bacteriophage T7 polymerase-driven plasmid to express a collection of synonymous variants of the GFP gene2 in E. coli BL21-Gold(DE3) (henceforth referred to as BL21) cells (see Methods). Without IPTG induction, there were no discernible differences in growth between strains (Figure 1A). When induced with IPTG, the growth rate of GFP-producing strains was reduced, consistent with the metabolic burden conferred by heterologous gene expression. The growth phenotype varied remarkably between strains expressing different synonymous variants of GFP (Figure 1B, Supp Figure 1). “Slow” variants caused a long lag phase post-induction, indicating that at this stage the cells either stopped growing or died, while “fast” variants showed growth rates closer to those of non-induced cells. Several hours after induction, the slow variants appeared to resume growth (Figure 1B); we found that this was related to the emergence of suppressor strains that could tolerate the expression of these variants (Supp Figure 1D, and see below).
We quantified cell viability post-induction by assessing the colony-forming ability of cells (Figure 1C). Fast variants showed the expected increase in cell numbers post-induction, but slow variants caused a 1000-fold decrease in viable cell numbers. Similarly, spotting of non-induced cells onto LB plates with IPTG showed that the slow variants formed markedly fewer colonies than fast variants (Figure 1D). Microscopic analysis of slow variants showed a decrease in cell number, growth arrest, and in some cases massive cell death following IPTG induction. In the case of fast variants, we observed a normal increase in cell numbers and negligible cell death after induction (Supp Figure 2). These results indicate that certain synonymous variants of GFP cause significant growth defects when overexpressed in E. coli cells, and we will henceforth refer to these variants as “toxic”.
To test if toxicity was specific to T7 promoter-driven overexpression, we analysed growth phenotypes following the expression of a subset of GFP variants using a bacterial polymerase (trp/lac) promoter system (Methods). Although the growth phenotypes measured with bacterial promoter constructs were not as dramatic as with T7-based constructs, presumably because of lower GFP expression levels, growth rates with both types of promoters were correlated with each other (Figure 1E). Interestingly, toxicity increased at high temperature, and decreased at low temperature (Supp Figure 1C). Taken together, these results indicate that the toxic GFP variants cause growth defects in two different E. coli strains, with two types of promoters, possibly through a common mechanism.
To understand if toxicity depends on the process of translation, we selected several toxic and nontoxic variants of GFP and mutated their Shine-Dalgarno (SD) sequences from GAAGGA to TTCTCT to prevent ribosome binding and block translation initiation. As expected, mutation of SD sequences completely inhibited the production of functional GFP protein from all tested constructs (Figure 2A). To our surprise, GFP variants without SD sequences remained toxic, and their effects on growth were indistinguishable from variants with a functional SD sequence (Figure 2B). Western blot analysis confirmed that mutation of the SD sequences ablates GFP expression (Supp Figure 3). We considered the possibility that a cryptic SD element within the coding region allowed translation of a truncated fragment of GFP, which would be consistent with loss of GFP fluorescence and translation-dependent toxicity. However, analysis of the coding regions with the RBS Calculator18 revealed no strong SD consensus sequences. These results raise the possibility that toxicity might arise at the RNA level, rather than at translation or protein level.
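As a rough illustration of what a search for cryptic SD elements involves, the sketch below scans a sequence for windows that resemble the functional SD hexamer of these constructs (GAAGGA). It is a naive mismatch count, not the thermodynamic model used by the RBS Calculator, and the function name, the mismatch threshold, and the toy sequences are assumptions made for this example.

```python
def sd_like_sites(seq, motif="GAAGGA", max_mismatches=1):
    """Return (position, window, mismatches) for windows resembling the motif.

    Naive Hamming-distance scan; a stand-in for illustration only,
    not a model of ribosome binding.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Toy sequences (hypothetical, not taken from the constructs in this study).
print(sd_like_sites("CCGAAGGACC"))  # contains the functional SD hexamer
print(sd_like_sites("CCTTCTCTCC"))  # contains the mutated hexamer TTCTCT
```

In this toy case the first call reports one exact match at position 2, and the second call reports no sites, since TTCTCT differs from GAAGGA at every position.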
To identify sequence elements required for toxicity, we selected one of the toxic variants (GFP_170), and a nontoxic variant (GFP_012), and performed DNA shuffling19 to generate constructs that consisted of random fragments of GFP_170 and GFP_012. All the shuffled and non-shuffled constructs we generated encoded the same GFP protein sequence. Analysis of growth rate phenotypes of these shuffled constructs revealed a fragment near the 3′ end of the GFP_170 coding sequence (nt 514-645) that was sufficient to elicit the toxic phenotype (Figure 2C, Supp Figure 4A, B). Some mutations outside of the toxic region partially improved fitness, which might be explained by interactions of the RNA secondary structure between the toxic region and the mutated regions. The GFP_170 mRNA is predicted to have a very low translation initiation rate, due to strong RNA secondary structure near the mRNA 5′ end2. Nevertheless, replacement of the strongly structured 5′ region with an unstructured fragment did not affect toxicity (Supp Figure 4A, B).
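The requirement that all shuffled constructs encode the same protein can be checked computationally by translating each coding sequence and comparing the results. The following is a minimal sketch using the standard genetic code; the function names and the short example sequences are hypothetical and are not the GFP_170 or GFP_012 sequences.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence in frame 0; '*' marks a stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def are_synonymous(cds_a, cds_b):
    """True if both coding sequences encode the same protein."""
    return translate(cds_a) == translate(cds_b)


# Toy example: two different codon choices for the peptide M-S-K-G.
print(translate("ATGAGCAAAGGT"))                       # MSKG
print(are_synonymous("ATGAGCAAAGGT", "ATGTCTAAGGGC"))  # True
```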
The above results led us to hypothesize that the toxicity associated with GFP expression was independent of translation, but depended on the presence of a specific fragment of RNA. To test this hypothesis, we performed growth rate measurements with a series of constructs. First, we isolated the 132-nt toxic region identified in the DNA shuffling experiment, and expressed it on its own, with or without start and stop codons. The expression of the 132-nt fragment of GFP_170 was sufficient for toxicity, whereas the corresponding fragment of GFP_012 did not cause toxicity. The effect of the 132-nt fragments on growth did not depend on the presence of translation start and stop codons (Figures 2C, D), the fragments contained no cryptic translation initiation signals, and FLAG tag fusions showed no detectable protein expression from the GFP_170 fragment in any of the three reading frames (Supp Figure 3B). Second, we introduced stop codons upstream of the toxic fragment in the GFP_170 coding sequence, and in the corresponding positions of GFP_012. This placement of stop codons ensures that ribosomes terminate translation before reaching the putative toxic region of the RNA, while still allowing a full-length transcript to be produced. As expected, internal stop codons abrogated GFP protein production (Figure 2C), but despite the presence of premature stop codons, GFP_170_Stop still caused toxicity to bacterial cells while GFP_012_Stop remained nontoxic (Figure 2D). To rule out possible out-of-frame translation, we inserted stop codons into GFP_170 in all three frames, before and after the toxic region, and toxicity remained the same in all cases (Supp Figure 4C). Third, we introduced an efficient synthetic T7 transcription terminator20 upstream of the toxic region in GFP_170 and in the corresponding location in GFP_012. Notably, we found that both variants with internal transcription terminators became nontoxic, and GFP_170_TT grew slightly faster than GFP_012_TT (Figure 2D). The GFP_170 fragment also caused toxicity when fused to FLAG tags (in any of the three reading frames), and when fused to the fluorescent protein mKate2, it caused toxicity and reduced expression of mKate2 by 50-fold (Supp Figure 4D, E, F). Overall, these data suggest that toxicity is caused by the RNA itself, rather than by the process of translation or by the protein produced.
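The logic of placing stop codons in all three frames upstream of a region can be illustrated with a short check that reports, for each reading frame, the first in-frame stop codon that ends before a given position. This is a sketch under simple assumptions (0-based coordinates, DNA alphabet, hypothetical function name and toy sequences); it does not reproduce the actual construct designs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_upstream_stop(seq, region_start):
    """Map each frame (0, 1, 2) to the index of the first in-frame stop
    codon lying entirely before region_start, or None if there is none."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        result[frame] = None
        # The codon at i spans i..i+2, so it must satisfy i + 3 <= region_start.
        for i in range(frame, region_start - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                result[frame] = i
                break
    return result


# Toy sequences: the first carries a stop in every frame before position 11,
# the second has no stop codon in any frame.
print(first_upstream_stop("TAAATAGATGAGGGCCC", 11))  # {0: 0, 1: 4, 2: 8}
print(first_upstream_stop("ATGGCTGCTGCT", 12))       # {0: None, 1: None, 2: None}
```

A frame whose value is None could in principle be read through into the downstream region, which is the case the three-frame stop insertions are designed to exclude.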
To investigate the sequence determinants of RNA-mediated toxicity, we measured the growth phenotypes of single synonymous mutations within the 132-nt region of GFP_170. Close to half of these mutations reduced or abolished the toxic phenotype, whereas the remaining mutations had no effect (Figure 3A). There was no clear relationship between the position of mutations within the region and their effect on growth, nor was there any relationship between the type of nucleotide introduced and growth. RNA toxicity associated with triplet repeats has been described in eukaryotes21, but we found no triplet repeats in the toxic GFP mRNAs. Consistent with our observation that the toxic effect does not require translation, codon adaptation index was not associated with toxicity (Figure 3B). RNA folding energy, measured either in the immediate vicinity of each mutation, or for the entire 132-nt mutagenized region, was not correlated with toxicity, and we were unable to identify any RNA structural elements associated with the toxic phenotype (data not shown). We further probed the effects of sets of several mutations within the 132-nt toxic region. Of the 98 sets of mutations we introduced within the region, 75 reduced or abolished toxicity, whereas 23 had no effect (Supp Figure 5). In almost all cases, the phenotypes of sets could be deduced from the effects of individual mutations in a simple way: if any mutation in a set abolished toxicity, then the set also did. Four sets did not conform to this rule, indicating potential epistatic interactions between mutations (not shown). Mutations near the 3′ end of the 132-nt fragment had no effect on toxicity, identifying a minimal toxicity-determining region of about a hundred nucleotides that either consists of a single functional element or contains multiple elements whose cooperative action causes toxicity.
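The rule relating sets of mutations to single mutations can be written as a simple logical OR, and sets that disagree with it are candidates for epistasis. The sketch below illustrates this; the mutation labels and phenotypes are invented for the example and are not data from this study.

```python
def set_abolishes_toxicity(mutation_set, single_effects):
    """Rule from the text: a set abolishes toxicity if any member does."""
    return any(single_effects[m] for m in mutation_set)


def nonconforming_sets(observed_sets, single_effects):
    """Return sets whose observed phenotype disagrees with the rule."""
    return [
        muts
        for muts, observed in observed_sets.items()
        if set_abolishes_toxicity(muts, single_effects) != observed
    ]


# Hypothetical data. True means "toxicity reduced or abolished".
single_effects = {"m1": True, "m2": False, "m3": False, "m4": True}
observed_sets = {
    ("m1", "m2"): True,
    ("m2", "m3"): False,
    ("m3", "m4"): True,
    ("m2", "m3", "m4"): False,  # disagrees with the rule: candidate epistasis
}

print(nonconforming_sets(observed_sets, single_effects))
# [('m2', 'm3', 'm4')]
```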
Several recent studies examined the effects of synonymous mutations on fitness in bacteria, either in endogenous genes, or in overexpressed heterologous genes2, 12-16. Fitness had been found to correlate with the codon adaptation index (CAI), GC content, RNA folding, protein expression level, a codon ramp near the start codon, and measured or predicted translation initiation rates. We quantified these variables in a set of 190 synonymous variants of GFP, and analysed their impact on fitness. We also considered two candidate toxic RNA fragments (GFP_170, nt 514-645, and GFP_155, nt 490-720), both of which were common to several constructs and appeared to negatively influence fitness (Figures 3C, D). High protein expression was previously shown to correlate with slow growth14, whereas we found positive correlations of fitness with total protein yield or protein yield per cell. These correlations presumably reflect reduced protein yields and cell growth after the induction of toxic RNAs. As seen previously, growth rate and optical density were positively correlated with CAI, and GC content was correlated with optical density2, 16. However, in a multiple regression analysis aimed to disentangle the effects of these covariates, we found that the presence of candidate toxic RNA fragments predicted slow growth in both BL21 and DH5α cells, whereas CAI and GC3 did not (Methods). This suggests that the apparent correlation of CAI or GC content with fitness, observed in this and previous studies2, 16, might result from the confounding effect of toxic RNA fragments (Supp Figure 6A, B). Consistently, an experiment with 22 new, unrelated synonymous GFP constructs spanning a wider range of GC content showed no correlation between GC content and bacterial growth (Supp Figure 6C, D). 
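To illustrate how multiple regression can separate a confounded covariate from a causal one, the sketch below fits ordinary least squares to invented data in which growth depends only on the presence of a toxic fragment, while CAI happens to be lower in fragment-containing variants. The numbers, variable names, and the bare-bones solver are assumptions made for this example; the model actually used is described in the Methods, which are not reproduced here.

```python
def ols(predictors, response):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, coef_2, ...]. Uses Gauss-Jordan elimination
    with partial pivoting; assumes the predictors are not collinear.
    """
    rows = [[1.0] + list(r) for r in predictors]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, response))]
        for i in range(k)
    ]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Invented data: (has_toxic_fragment, CAI) per variant.
variants = [(1, 0.50), (1, 0.60), (1, 0.55), (0, 0.70), (0, 0.80), (0, 0.75)]
# Growth depends only on the fragment; CAI is merely correlated with it.
growth = [1.0 - 0.5 * frag for frag, _ in variants]

intercept, frag_coef, cai_coef = ols(variants, growth)
print(round(intercept, 6), round(frag_coef, 6), round(cai_coef, 6))
```

In this toy data set, growth and CAI are positively correlated when considered alone, yet the fitted CAI coefficient is zero (up to rounding) once the fragment term is included.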
To further test whether toxicity could be explained by unusually high expression of certain GFP variants, we measured the mRNA abundance of 79 toxic and nontoxic RNAs by Northern blots, and correlated GFP mRNA abundance per cell with OD. Although we observed differences in mRNA abundance, mostly related to mRNA folding2, we found no significant correlation between RNA abundance and toxicity (Spearman rho=0.12, p=0.29). Furthermore, we detected no consistent differences in plasmid abundance between toxic and nontoxic variants.
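For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation computed on ranks, with tied values receiving their average rank. A minimal standard-library sketch follows; the function names and toy values are illustrative and are not the measured mRNA abundances or OD readings, and no p-value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values only.
print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))  # 0.8
```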
To study the molecular mechanisms of toxicity caused by mRNA overexpression, we aimed to evolve genetic suppressors of this phenotype. We selected several GFP constructs that showed both strong toxicity and moderate or high GFP fluorescence, and plated bacteria containing these constructs on LB agar plates with IPTG and ampicillin. We observed a number of large white colonies that apparently expressed no GFP, and smaller bright green colonies producing high amounts of the GFP protein (Figure 4A). We hypothesized that the green colonies had acquired a genomic mutation that allowed cells to survive while expressing toxic RNAs. To test this, we cured the evolved strains of their respective plasmids and re-transformed the cured strains with the same plasmid. The re-transformed strains readily formed bright green colonies on IPTG+ampicillin plates, and exhibited faster growth rates in IPTG medium compared to the parental strain. This supported our hypothesis that the mutations were located on the chromosome and not the plasmid. We therefore selected 22 evolved strains and the parental strain for genome sequencing, and used the GATK pipeline for calling variants (Methods).
In all green suppressor strains, we found a single cluster of mutations in the Plac promoter of the T7 polymerase gene that explains the suppressor phenotype (Figure 4B, C, Supp Table 1). The parental BL21 strain contains two alleles of the Plac promoter: the wild-type allele PlacWT controls the lac operon, and a stronger derivative allele PlacUV5 controls T7 RNA polymerase. In the suppressor strains, recombination between these two loci associates PlacWT promoter with T7 polymerase, leading to reduced levels of polymerase and presumably to reduced transcription of GFP. The same Plac promoter mutations were recently observed in the C41(DE3) and C43(DE3) strains of E. coli (the ‘Walker strains’), and were responsible for the reduced T7 RNA polymerase expression, high-level recombinant protein production, and improved growth characteristics of those strains22-24. Similar to our suppressor strains, C41(DE3) and C43(DE3) allowed high protein expression of toxic GFP variants, and little toxicity was observed in these strains (Figure 4D). Taken together, these results support our conclusion that high levels of RNA, rather than RNA translation or protein, are responsible for toxicity.
To test whether translation-independent RNA toxicity might affect genes other than GFP, we turned to the ogcp gene, which encodes the membrane protein oxoglutarate-malate transport protein (OGCP), believed to be toxic for E. coli. OGCP overexpression was originally used to derive the C41(DE3) strain, now commonly used for recombinant protein expression22. As expected, we found that expression of OGCP was toxic to BL21 but not to C41(DE3) cells. In agreement with our observations for GFP, a translation-incompetent variant of OGCP lacking the Shine-Dalgarno sequence was just as toxic to BL21 cells as a translation-competent variant (Supp Figure 7). A translation-competent, codon-optimized variant of OGCP retained toxicity in BL21 cells. These experiments suggest that translation-independent RNA toxicity might be a widespread phenomenon associated with heterologous gene expression in E. coli.
Heterologous protein expression is known to inhibit growth of E. coli. Toxicity is typically attributed to the foreign protein itself, and it is often remedied by lowering expression, reducing growth temperature, or using special strains of E. coli such as C41(DE3). Here we demonstrate that the same strategies and strains also prevent toxicity when RNA, rather than protein, is the toxic molecule. We speculate that other cases of toxicity, previously attributed to proteins, may in fact be caused by RNA. Although the molecular mechanisms of RNA toxicity are presently unclear, we identified several GFP and OGCP variants with similar phenotypes, suggesting that the phenomenon may be common. Interestingly, induction of wild-type APE_0230.1 in E. coli inhibits growth, but a codon-optimized variant does not inhibit growth despite increased protein yield25. In addition, several recent high-throughput studies found unexplained cases of slow growth or toxicity upon the expression of various random sequences in E. coli14, 26, 27. Our results point to RNA toxicity as a possible cause of these observations.
Our results are relevant to the phenomenon of synonymous site selection in microorganisms. Synonymous mutations can influence fitness directly (in cis), by changing the expression of the gene in which the mutation occurs12, 13, 15, or indirectly (in trans), by influencing the global metabolic cost of expression2, 14, 16, 28. Experiments with essential bacterial genes predominantly uncover cis-effects, most of them mediated by changes in RNA structure or other properties that influence translation yield. For example, mutations in Salmonella enterica rpsT downregulated the gene, and could be compensated by additional mutations in or around rpsT or by an increase in gene copy number13. Similarly, mutations that disrupted mRNA structure of the E. coli infA gene, through local or long-range effects, explained much of the variation in fitness across a large collection of mutants12. Protein abundance and RNA structure contribute to the observed trans-effects of mutations14. Although our results are broadly consistent with a role of RNA structure, the specific structure is unknown, and the effects we uncovered are translation-independent, suggesting that a novel mechanism is involved. Toxic RNAs might interact with an essential cellular component, either nucleic acid or protein, and interfere with its normal function. Such interactions might be uncovered by pulldowns of toxic RNAs combined with sequencing or mass spectrometry. Alternatively, RNA phase transitions may be involved; such transitions have been shown to contribute to the pathogenicity of CAG-expansion disorders in eukaryotes, providing a mechanistic explanation for this phenomenon29. Further studies will address the mechanisms, biotechnology applications, and evolutionary consequences of RNA toxicity in bacteria.