Abstract
The degeneracy of the genetic code allows nucleic acids to encode amino acid identity as well as non-coding information for gene regulation and genome maintenance. The rare arginine codons AGA and AGG (AGR) present a case study in codon choice, with AGRs encoding important transcriptional and translational properties distinct from the other synonymous alternatives (CGN). We created a strain of Escherichia coli with all 123 instances of AGR codons removed from its essential genes. We readily replaced 110 AGR codons with the synonymous CGU, but the remaining 13 “recalcitrant” AGRs required diversification to identify viable alternatives. Successful replacement codons tended to conserve local ribosomal binding site-like motifs and local mRNA secondary structure, sometimes at the expense of amino acid identity. Based on these observations, we empirically defined metrics for a multi-dimensional “safe replacement zone” (SRZ) within which alternative codons are more likely to be viable. To further evaluate synonymous and non-synonymous alternatives to essential AGRs, we implemented a CRISPR/Cas9-based method to deplete a wild-type allele from a diversified population, allowing us to exhaustively evaluate the fitness impact of all 64 codon alternatives. Using this method, we confirmed the relevance of the SRZ by tracking codon fitness over time in 14 different genes, finding that codons falling outside the SRZ are rapidly depleted from a growing population. Our unbiased and systematic strategy for identifying unpredicted design flaws in synthetic genomes and for elucidating rules governing codon choice will be crucial for designing genomes exhibiting radically altered genetic codes.
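The first recoding step described above (replacing each in-frame AGR codon with CGU) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the input is a single DNA coding sequence that starts at the first base of the start codon and whose length is a multiple of 3, and it writes CGU as CGT because the sequence is DNA. The function name `recode_agr` is hypothetical. Note that it applies only the synonymous swap and ignores the RBS-like motif and mRNA-structure constraints that made 13 of the 123 AGRs recalcitrant.

```python
# Minimal sketch (hypothetical helper): in-frame replacement of AGR (AGA/AGG)
# arginine codons with a synonymous CGN codon, CGT by default (CGU in mRNA).
# Assumes `cds` is a coding sequence starting at the start codon, length % 3 == 0.
# It does NOT check RBS-like motifs or local mRNA secondary structure.

AGR_CODONS = {"AGA", "AGG"}                # AGR, where R = A or G (IUPAC)
CGN_CODONS = {"CGA", "CGC", "CGG", "CGT"}  # CGN, where N = any base


def recode_agr(cds: str, replacement: str = "CGT") -> tuple[str, list[int]]:
    """Return (recoded_cds, codon_indices) with 0-based indices of replaced AGRs."""
    seq = cds.upper().replace("U", "T")  # accept RNA or lowercase input
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if replacement not in CGN_CODONS:
        raise ValueError("replacement must be a synonymous CGN arginine codon")
    # Split into codons in the reading frame of the start codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Only in-frame AGA/AGG count; an "AGA" spanning two codons is not an AGR codon.
    positions = [i for i, codon in enumerate(codons) if codon in AGR_CODONS]
    for i in positions:
        codons[i] = replacement
    return "".join(codons), positions


recoded, replaced = recode_agr("ATGAGAAAAAGGTAA")
print(recoded, replaced)
```

Scanning codon by codon rather than searching for the substring "AGA" keeps out-of-frame matches (for example, the "AGA" inside `CAG AAA`) from being edited.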
Significance Statement
This work presents the genome-wide replacement of all rare AGR arginine codons in the essential genes of Escherichia coli with synonymous CGN alternatives. Synonymous codon substitutions can be lethal when they disrupt non-coding features such as mRNA secondary structure and ribosomal binding site-like motifs. Here we quantitatively define the range of tolerable deviation in these metrics and use this relationship to provide critical insight into codon choice in recoded genomes. This work demonstrates that genome-wide removal of AGR codons is likely to be possible, and provides a framework for designing genomes with radically altered genetic codes.
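The idea of a "range of tolerable deviation" can be illustrated as a box-shaped zone around the wild-type codon's values. The Python sketch below is a hypothetical model, not the study's actual SRZ definition. It assumes each candidate codon has already been scored on two properties: the strength of a local RBS-like motif and the local mRNA folding free energy. Computing these scores would require separate tools. The candidate is then accepted only if both absolute deviations from wild type fall within thresholds. The class names, metric fields, threshold values, and example numbers are all illustrative placeholders, not values reported in this work.

```python
# Hypothetical sketch of a "safe replacement zone" (SRZ) as a tolerance box
# around wild-type values. Metric definitions and thresholds are placeholders,
# NOT the values defined in the study.
from dataclasses import dataclass


@dataclass(frozen=True)
class CodonMetrics:
    rbs_like_score: float  # placeholder: strength of a local RBS-like motif
    folding_dg: float      # placeholder: local mRNA folding free energy (kcal/mol)


@dataclass(frozen=True)
class SafeReplacementZone:
    max_rbs_delta: float   # tolerated |change| in RBS-like motif strength
    max_dg_delta: float    # tolerated |change| in local folding energy

    def contains(self, wild_type: CodonMetrics, candidate: CodonMetrics) -> bool:
        """True if the candidate stays within tolerance on every metric."""
        d_rbs = abs(candidate.rbs_like_score - wild_type.rbs_like_score)
        d_dg = abs(candidate.folding_dg - wild_type.folding_dg)
        return d_rbs <= self.max_rbs_delta and d_dg <= self.max_dg_delta


def codons_in_zone(zone: SafeReplacementZone, wild_type: CodonMetrics,
                   candidates: dict[str, CodonMetrics]) -> list[str]:
    """Return the sorted candidate codons (synonymous or not) inside the zone."""
    return sorted(c for c, m in candidates.items() if zone.contains(wild_type, m))


# Illustrative numbers only.
zone = SafeReplacementZone(max_rbs_delta=1.0, max_dg_delta=2.0)
wild_type = CodonMetrics(rbs_like_score=0.5, folding_dg=-10.0)
candidates = {
    "CGT": CodonMetrics(0.6, -9.0),   # small change on both axes
    "CGC": CodonMetrics(2.0, -10.0),  # RBS-like motif changes too much
    "CTG": CodonMetrics(0.4, -11.5),  # non-synonymous but within tolerance
    "AAA": CodonMetrics(0.5, -13.0),  # folding energy changes too much
}
print(codons_in_zone(zone, wild_type, candidates))
```

Treating the zone as independent per-metric tolerances keeps the sketch simple. It also shows how a non-synonymous codon can fall inside the zone while a synonymous one falls outside, mirroring the observation that viability sometimes tracked local sequence context more than amino acid identity.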