Experimental determination of codon usage-dependent selective pressure on high copy-number genes in Saccharomyces cerevisiae

One of the central hypotheses in the theory of codon usage evolution is that in highly expressed genes particular codon usage patterns arise because they facilitate efficient gene expression and are thus selected for in evolution. Here we use plasmid copy number assays and growth rate measurements to explore details of the relationship between codon usage, gene expression level, and selective pressure in Saccharomyces cerevisiae. We find that when high expression levels are required optimal codon usage is beneficial and provides a fitness advantage, consistent with evolutionary theory. However, when high expression levels are not required, optimal codon usage is surprisingly and strongly selected against. We show that this selection acts at the level of protein synthesis, and we exclude a number of molecular mechanisms as the source for this negative selective pressure including nutrient and ribosome limitations and proteotoxicity effects. These findings inform our understanding of the evolution of codon usage bias, as well as the design of recombinant protein expression systems.

Translational efficiency has gained experimental support as one causative parameter that can lead to codon usage bias (Carlini & Stephan, 2003;Chu et al., 2014;Hense et al., 2010;Zhou et al., 2013). Since synonymous mutations can have substantial effects on expressed protein levels, selection for high gene expression levels will favour codon usage patterns compatible with such high levels, but will avoid patterns that restrict attainable expression levels. However, the exact relationship between codon usage, protein expression, other associated parameters and selective pressure are still unclear.
One issue is that selective pressure is difficult to measure directly as an experimental parameter. Moriya et al. used copy number variations in high copy number vectors in Saccharomyces cerevisiae to estimate the direction and extent of selective pressure when these vectors contained different genes (Makanae, Kintaka, Makino, Kitano, & Moriya, 2013;Moriya, Shimizu-Yoshida, & Kitano, 2006). While natural 2 micron circles employ a sophisticated system to stabilise copy numbers (McQuaid et al., 2017), yeast cloning vectors containing solely the 2 micron origin of replication exhibit strong copy number variations and appear to be inherited in a stochastic fashion (Moriya et al., 2006). In cases where such vectors contain genes that affect the growth rate of the cell in a copy-number dependent fashion, cells with reduced or increased copy number (depending on whether the gene in question has a negative or positive effect on growth rate) have a growth advantage. Since such cells are also more likely to produce daughter cells with copy numbers that are lower/ higher than the population average, the average copy number per cell changes until a new balance is achieved between the effect of the gene on growth rates, the metabolic cost of maintaining high numbers of vectors in the cell, and the ability to successfully inherit the vector and its selectable marker to the majority of daughter cells.
Here, we introduce reporter genes with differing codon usage into 2 micron cloning vectors and study the effect of varying codon usage on the high-copy number vector content and growth rate of transformed yeast cells. Surprisingly, our results suggest that codon optimisation is selected against unless high level expression of the gene provides a competitive growth advantage.

Results
We previously described a dual luciferase reporter system allowing us to assess the effect of codon usage variation on Firefly luciferase expression levels (Chu et al., 2014;Chu, Barnes, & von der Haar, 2011). The original reporter system was implemented using centromeric (low copy number) vectors, which contained three different Firefly luciferase variants encoded only by unfavourable codons (minFLuc), by the wild-type gene from Photinus pyralis which has random codon usage from the point of view of yeast (staFLuc), or by the fully codonadapted gene (maxFLuc). All vectors also contained an invariant Renilla luciferase gene for normalisation. For this vector system we had demonstrated that the expression ratio measured using dual luciferase assays corresponds well to the expression ratio measured by western blotting (Chu et al., 2011). We were therefore surprised to observe that, when the same expression system was implemented in multi-copy vectors based on the 2 micron origin of replication, the codon-dependent variation in expression levels was indistinguishable from the centromeric system when measured in dual luciferase assays, but differed markedly when assessed using western blots (figure 1). Note that all experiments shown in the current study were conducted with cytoplasmic firefly luciferase variants which contain deletions of the last three amino acids to remove a peroxisomal targeting signal (Chu et al., 2014).
We reasoned that our results could be explained if plasmid copy numbers changed substantially between expression plasmids containing the different codon usage variants. The dual luciferase assay measures expression normalised per plasmid, whereas the western blotting approach measures total numbers per cell, so that changes in plasmid copy number would only become visible in the latter assay. To test this assumption directly we measured the plasmid copy number for the different luciferase variants using a qPCR approach, and found that introduction of poorly adapted, wild type, and strongly codon adapted sequences sequentially reduced the steady-state plasmid copy number over a fourfold range compared to the vector control (a plasmid which only expresses Renilla luciferase but no firefly luciferase, figure 2b). This implies that expression of a strongly codon-adapted gene is less favourable and more strongly selected against than expression of weakly codon adapted genes. This effect is also visible at the levels of the growth rates of the respective transformants, which become successively reduced with increasing codon adaptation of the recombinant genes (figure 2c). This reduced growth rate appears due to lower cell division rates, as we do not detect any changes in the proportion of viable cells (data not shown).
To test whether selection against codon optimised genes in this system arose at the level of protein synthesis, we introduced a point mutation into the codon adapted maxFLuc construct that resulted in a premature stop signal at the fifth codon of the open reading frame (maxFLuc K5X). This prevents almost all of the translational activity on the mRNA, without significantly affecting the overall sequence or nucleotide composition of the construct nor the transcriptional activity of the promoter. The K5X mutation substantially rescued both the reduction in plasmid copy number and the reduction in growth rate observed with the translated maxFLuc construct (figure 3). Thus, the major effect on fitness requires ongoing translation of the full-length ORF. A, the steady state copy number of 2 micron plasmids is modulated by the genes they contain. B, increasingly optimised codon usage leads to a decrease in steady state plasmid copy numbers as assessed by real time PCR. C, increasingly optimised codon usage in the recombinant luciferase gene leads to a decrease in growth rate. In both panels B and C, "vector" refers to a vector expressing only Renilla luciferase without a firefly luciferase gene. Statistical significance of results was assessed by ANOVA and Tukeys' HSD test. Significance to the vector control is indicated as: ns, p>0.05; **, p<0.01; ***, p<0.001.
We explored a number of potential mechanisms which by our reasoning could have caused the selection against codon optimisation. We determined that the three constructs produced approximately 2.1x10 -14 , 4.5x10 -14 and 5.9x10 -14 g firefly luciferase protein per cell ( figure 4A). In other words, the highest expressing construct produced just over 1% of the 5x10 -12 g of protein contained in a haploid yeast cell (von der Haar, 2007). Most of the resources required for protein synthesis, including amino acids and energy, should scale linearly with the amount of protein made, and a 1-2% increase in resource usage would be unlikely to cause a 10-20% reduction in growth rate. Consistently, we do not observe any change in the effect of maxFLuc expression on the growth rate if we double the concentration of carbon, nitrogen, or amino acids in the medium (figure 4B).
We and others previously predicted that, under fast growth conditions, providing ribosomes for the expression of additional proteins could only be accommodated at the cost of reduced growth (Chu et al., 2011;Chu & von der Haar, 2012;Shah, Ding, Niemczyk, Kudla, & Plotkin, 2013). We therefore estimated the number of ribosomes required to produce the observed protein levels from the three FLuc variants. This number essentially scales with the number of proteins produced per unit time, and inversely scales with the time spent to complete translation of the ORF (ie more ribosomes are needed to make more protein at the same speed, but less ribosomes are needed to make the same amount of protein at higher speed). In the case of our FLuc constructs, the increased speed of translation outweighs the more frequent initiation events for the maxFLuc construct, resulting in a predicted net decrease of ribosome usage with increasing codon optimisation ( figure 4C). This decrease in ribosome usage for codon optimised genes is expected, and is thought to be one of the genome-wide drivers for codon optimisation as this maximises cell-wide ribosome availability (Shah & Gilchrist, 2011). In the context of our experiments, it rules out that ribosome shortages explain the selective pressure against codon optimised constructs. Lastly, we reasoned that simple proteotoxic effects might explain our results. However, the responsiveness of a maxFLuc expressing construct to the proteotoxic drug AZC (Trotter, Berenfeld, Krause, Petsko, & Gray, 2001) is unaltered ( figure 4D), even though we would expect that any pre-existing proteotoxicity should increase the sensitivity to this drug. We also did not detect any significant increases in the expression of the disaggregase Hsp104, which is usually induced under conditions of proteotoxic stress (figure 4E, Bösl, Grimminger, & Walter, 2006).
In summary, although we clearly observe negative selective pressure against codonoptimised firefly luciferase, our experiments do not offer a simple explanation for this effect.
While we have no immediate mechanistic explanation for the strong negative selective pressure arising from codon optimisation in our construct, selection against the optimisation of a gene that is of no benefit to the cell makes some sense from an evolutionary perspective. To test how the dynamics of negative and positive selection interplay in the case of proteins that are of benefit to the cell, we repeated the FLuc expression experiments with the HIS3 gene, a natural yeast gene which catalyses the 6 th step of the histidine biosynthesis pathway (Struhl & Davis, 1977). The yeast strain used in our experiments carries a chromosomal his3 deletion and can only grow in the absence of histidine if the gene is provided extrachromosomally.

Figure 5. The interplay between required gene expression levels and fitness effects arising from codon optimisation.
Growth rates of strains with a chromosomal deletion of the HIS3 gene but containing codon variants of the same gene on 2 micron plasmids are shown. The plasmid borne-gene is expected to be superfluous in -Ura medium as this contains histidine, essential but required at low expression levels in -Ura/-His medium; and essential and required at high expression levels in the same medium containing 3-aminotriazole (3-AT), a competitive inhibitor of His3 enzymatic activity. Statistical significance of results was assessed by ANOVA and Tukeys' HSD test.
Significance to the vector control (top panel) or to the minHIS3 strains (middle and bottom panels) is indicated as: ns, p>0.05; ***, p<0.001.
We determined the growth rates of yeast strains containing poorly adapted, wild-type, and well adapted HIS3 genes under three different growth conditions. In the presence of histidine in the growth medium, the gene is not beneficial to the cell, similar to the FLuc genes in the experiments described above. In contrast, in the absence of histidine HIS3 is essential, although it is not required at high expression levels (Chu et al., 2014). Lastly, in the absence of histidine and the presence of a competitive inhibitor of His3 enzymatic activity, 3-aminotriazole (3-AT), high levels of His3 protein are required to overcome the inhibitor (Durfee et al., 1993).
When His3 is not required, we observe a similar pattern of consecutively reduced fitness as for the FLuc genes ( figure 5, upper panel). This confirms that the fitness effects observed upon high level FLuc expression are not specific to heterologous proteins and hold also upon high-level expression of a homologous gene. When His3 activity is required, the effect of the different constructs on growth rates is quantitatively changed and initially becomes statistically insignificant (figure 5, middle panel). When very high His3 levels are required in the presence of 3-AT, the codon-optimised gene confers a clear fitness advantage over the less optimised genes (figure 5, bottom panel). Links between preferred codon usage and the requirement for high expression levels have previously been established largely by way of correlation, our observations provide experimental confirmation of this notion.
To ask whether the effect of codon usage on selective pressure is quantitatively determined by basic yeast physiology, or is dependent on the genetic background of the strain in question, we compared the growth rates of a number of genetically tractable strains from the Saccharomyces genome resequencing project (SGRP, Cubillos, Louis, & Liti, 2009) in the presence and absence of the maxFLuc gene ( figure 6). We found that the proportional reduction in growth rate upon expression of the gene ranged from 10% in the mildest affected strains, to 50% in the worst affected strains. Although many of the SGRP strains flocculate strongly and determination of growth rates for these strains is less accurate than for BY4741, this wide range indicates that the genetic make-up of strains strongly modulates the correspondence between codon usage and selective pressure.

Discussion
In interpreting our results, it is useful to define in how far plasmid copy number changes reflect selection mechanisms applying during the actual evolution of organisms. Codon usage is thought to have effects at both the individual gene level and at genome-wide levels.
Effects at individual gene level include permitting high translational activity or conversely restricting translational activity to low levels (Tarrant & von der Haar (2014) and references therein), as well as effects on RNA stability (Presnyak et al., 2015) and transcriptional activity (Zhou et al., 2016). Genome-wide effects include compliance with particular GC bias (Palidwor, Perkins, & Xia, 2010), optimisation of ribosome usage by reducing the average dwell time of ribosomes on mRNAs (Chu & von der Haar, 2012;Shah et al., 2013) and other effects (Plotkin & Kudla, 2011). Genes contained in high copy number plasmids are in principle subject to effects at both levels, but because changing the sequence of such a plasmid is equivalent to changing the sequence of multiple genes simultaneously in the genome, we expect the selective forces acting on this system to be more similar to genomewide selective forces than would be the case in a single copy plasmid.
The observation of particularly strong codon usage bias in highly expressed genes (Sharp, Tuohy, & Mosurski, 1986) implies that highly expressed genes show such bias, but also that low-expressed genes do not. It is thought that the absence of bias in genes with low expression levels arises because codon usage is at equilibrium, rather than being selected against (Sharp, Emery, & Zeng, 2010). Interestingly, our data suggest that codon bias can be selected against in genes with low expression levels. Moreover, the successive reduction in plasmid copy numbers with increasing codon optimisation (figure 2) suggests that negative selection is not restricted to extreme codon usage but applies to genes with close to equilibrium codon usage. Consistent with this notion, statistical enrichment of nonpreferred codons in low expressed genes has been shown in some studies (Neafsey & Galagan, 2007).
The principal underlying assumption for explaining codon usage bias in highly expressed genes, that codon usage is selected for when high expression levels are required, has widespread support and is generally accepted. However, to our knowledge this has never been directly experimentally tested. By linking codon usage variants of the endogenous yeast HIS3 gene to growth conditions where this gene is required at different expression levels, we provide an experimental test for this hypothesis. Our findings are entirely consistent with the prevailing hypothesis.
While our results are informative for understanding basic evolutionary mechanisms, they also provide useful information for the design of recombinant protein expression systems. 2 micron vector systems are used for the production of various yeast-derived biopharmaceuticals (Finnis et al., 2010;Gerngross, 2004;Thim et al., 1986). If plasmid copy numbers in these cases are subject to the same effects we describe here, productivity could be boosted by introducing measures that stabilise plasmid copy number, although this could also produce adverse effects by increasing the selection for non-expressing mutants. The observation of substantial strain variability in the effect of recombinant gene expression on growth rates (figure 6) also indicates that the genetic variability of yeast could be harnessed for balancing these effects in order to stabilise high level expression systems.

Materials and Methods
Strains, plasmids and media. The main yeast strain used was BY4741 (Mata ura3Δ0 his3Δ0 leu2Δ0 met15Δ0, Brachmann et al., 1998). Genetically tractable strains from the SGRP collection (Cubillos et al., 2009) were obtained from the National Collection of Yeast Cultures (NCYC, UK, haploid Mata strains from SGRP strain set 2). Luciferase expression vectors were generated by transferring XmaI/EcoRI RLuc and BamHI/SalI CFLuc fragments from the centromeric vectors described in (Chu et al., 2014) into pBEVY-U (Miller, Martinat, & Hyman, 1998). HIS3 expression vectors were generated by cloning the C-terminally HAtagged HIS3 variants described in (Chu et al., 2014) as BamHI/PstI fragments into the same vector. Plasmid sequences, maps, and the plasmids themselves are available through Addgene (table 1). Yeast strains were transformed as described (Gietz & Schiestl, 2007)  Growth rate measurements. Growth rates were measured in 24-or 48-well cell culture plates using automated plate readers. 1 ml medium per well (0.5 ml for 48-well plates) were inoculated with material from transformed yeast colonies grown on selective agar plates and grown overnight in a standard shaking incubator. Following overnight growth, 1 ml or 0.5 ml of fresh medium per well contained in a new plate was inoculated with 10 µl or 5 µl of culture from the overnight plate. Plates were then incubated in Spectrostar Nano plate readers (BMG Lab Tech, UK) and incubated at 30°C under constant shaking, with automated oD measurements every 30 minutes until the culture had reached stationary phase. To analyse the resulting data, logarithmic curves were fitted to seven-time-point windows along the entire incubation time, and the highest growth rate returned from this fitting exercise was reported as the maximum logarithmic growth rate for each well.
qPCR assays. These were adapted from the procedure described in (Moriya et al., 2006). Briefly, 2 oD600 units of cells were resuspended in 850 µl of 1.2 M sorbitol, 100 µl of 200 mM sodium phosphate buffer (pH 7.2), and 50 µl lyticase (Sigma Aldrich, UK, L2524, resuspended in sodium phosphate buffer at 5 units/ µl). The suspension was incubated at 37°C for 30 minutes, and then subjected to 3 cycles of incubation for ten minutes at 95°C followed by incubation for fifteen minutes at -80°. The extract was clarified by centrifugation in a microcentrifuge at 13,000 rpm for 5 minutes, and 0.1 µl of the supernatant were used as input for the qPCR reactions.
For the latter, we used primers against the URA3 marker gene on the pBEVY-U vectors (qURA3f, AGCAGAATTGTCATGCAAGG, qURA3r, TTCCACCCATGTCTCTTTGA) and against the LEU3 gene for standardisation (Chu et al., 2014). qPCR reactions were assembled using QuantiFast SYBR Green PCR Kits (QIAgen, UK) according to the manufacturer's instructions, and reactions were run using a two-step amplification protocol. Ct values were determined in the logarithmic amplification range and converted to fold change values as described (Pfaffl, 2001). All reported data were determined using triplicate biological replicates, each assayed in technical duplicate.
Data analysis and statistics. All data analysis scripts used to prepare individual figures are available for download from Github 1 or for execution online from MyBinder 2 . Statistical significance between samples was tested using ANOVA followed by Tukey's Honestly Significant Difference test.