Potential role of the X circular code in the regulation of gene expression

The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X, which are significantly enriched in the genes, allow identification and maintenance of the reading frame in genes. X motifs have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements in the regulation of gene expression. Surprisingly, the definition of a simple parameter identifies several relations between the X circular code and gene expression. First, we identify a correlation between the 20 codons of the X circular code and the optimal codons/dicodons that have been shown to influence translation efficiency. Using previously published experimental data, we then demonstrate that the presence of X motifs in genes can be used to predict the level of gene expression. Based on these observations, we propose the hypothesis that the X motifs represent a new genetic signal, contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.

Author Summary

The standard genetic code is used by (quasi-) all organisms to translate information in genes into proteins. Recently, other codes have been identified in genomes that increase the versatility of gene decoding. Here, we focus on the circular codes, an important class of genome codes, that have the ability to detect and maintain the reading frame during translation. Motifs of the X circular code are enriched in protein-coding genes from most organisms from bacteria to eukaryotes, as well as in important molecules in the gene translation machinery, including transfer RNA (tRNA) and ribosomal RNA (rRNA). Based on these observations, it has been proposed that the X circular code represents an ancestor of the standard genetic code, that was used in primordial systems to simultaneously decode a smaller set of amino acids and synchronize the reading frame. Using previously published experimental data, we highlight several links between the presence of X motifs in genes and more efficient gene expression, supporting the hypothesis that the X circular code still contributes to the complex dynamics of gene regulation in extant genomes.


[...] Equation (1)) of the X motifs in the mRNA sequences. To evaluate the significance of the enrichment, as in previous work [23-24], we used a randomization model in which we generated N=100 random codes that preserved most of the properties of the X code, except the circularity. We then identified all random motifs from the 100 random codes and calculated mean values for the 100 codes.

[...] (1)) of X motifs in each construct. Fig 5 clearly [...]

[...] in the complete set of 5450 genes and calculated the density of X motifs (defined in Equation (1)) in three subsets of the genes having different estimated translation rates (Fig 6). We observed that genes with higher translation rates had significantly more X motifs than those with lower translation rates.
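To make the notion of an X motif and its density concrete, the following Python sketch scans a sequence in a given frame for runs of consecutive X codons. It is an illustration under stated assumptions, not the procedure used in this work: the codon set is the X circular code as commonly listed in the circular-code literature (it is not spelled out in this excerpt), the threshold `min_codons` is a hypothetical parameter, and because Equation (1) is not reproduced here, the density is assumed to be the number of motifs per 1000 nucleotides.

```python
# Sketch only: the codon set below is the X circular code as commonly listed
# in the circular-code literature; it is not spelled out in this excerpt.
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def find_x_motifs(seq, frame=0, min_codons=4):
    """Return (start, end) spans of maximal in-frame runs of X codons.

    min_codons is a hypothetical threshold, not taken from the paper.
    """
    seq = seq.upper().replace("U", "T")  # accept mRNA or DNA alphabets
    n_codons = (len(seq) - frame) // 3
    motifs, run_start = [], None
    # One extra iteration (i == n_codons) closes a run that reaches the end.
    for i in range(n_codons + 1):
        pos = frame + 3 * i
        if i < n_codons and seq[pos:pos + 3] in X_CODE:
            if run_start is None:
                run_start = pos
            continue
        if run_start is not None and (pos - run_start) // 3 >= min_codons:
            motifs.append((run_start, pos))
        run_start = None
    return motifs


def x_motif_density(seq, frame=0, min_codons=4, per=1000):
    """Assumed density (NOT Equation (1)): X motifs per `per` nucleotides."""
    if not seq:
        return 0.0
    return per * len(find_x_motifs(seq, frame, min_codons)) / len(seq)


# Four X codons, one non-X codon (AAA), then a single X codon.
example = "GAAGACGAGGAT" + "AAA" + "GTC"
print(find_x_motifs(example))  # [(0, 12)]
print(x_motif_density(example))
```

The same function could be applied to motifs from a randomly generated code by swapping `X_CODE` for the random codon set, which is one way the comparison against random codes described above might be organized.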

The density of X motifs is higher for the sequences with medium translation rates than for those with low translation rates (one-sided Student's t-test, p < 10^-10) and for the sequences with high translation rates than for those with medium translation rates (one-sided Student's t-test, p < 10^-14). This result demonstrates the link between the total time needed for ribosome transition on an mRNA and the density of X motifs along the length of the sequence.

To investigate whether X motifs might play a role in modulating ribosome speed in specific regions in mRNA, we considered single protein studies, where local translation elongation rate has been studied in detail. The first example concerns the study of a gene in S. cerevisiae, to investigate the link between translational elongation and mRNA decay [55]. In this study, various HIS3 protein constructs (length of 699 nucleotides) were designed with increasing codon optimality (measured by the CSC index) from 0% to 100%. We identified X motifs in the different constructs as before and compared them to the experimentally measured mRNA half-life. As the authors point out, the mRNA half-life is largely determined by the codon-dependent rate of translational elongation, since mRNAs whose translation elongation rate is slowed by inclusion of non-optimal codons are specifically degraded. The density of X motifs ranges from 0 in the 0% optimized construct to more than 7 in the 100% optimized sequence (Fig 7). The results suggest that the introduction of individual X motifs in specific regions can be used to [...]

[...] and insects with different codon usage bias (codon usage tables for these organisms are provided in S1 Table), but in all the examples a strong correlation is observed between 'optimal' codons and X codons.

Taken together, the results support the idea that the use of X motifs is a conserved mechanism from viruses to animals that may participate in the modulation or regulation of the translation elongation rate along the mRNA.

In this work, we have combined two very distinct research domains: gene translation through the genetic code and the theory of circular codes which allows two processes simultaneously: reading frame retrieval and amino acid coding. Our hypothesis is that at least two codes operate in genes: the standard genetic code, experimentally proved to be functional, and the X circular code that has been shown to be statistically enriched in genes. For the first time here, we shed light on a number of biological experimental results by using the definition of a very simple parameter to analyze the density of X motifs in genes, i.e. motifs from the circular code X.
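The circularity property that underlies reading frame retrieval can be tested computationally. The Python sketch below is a minimal illustration and does not come from this work: it relies on a criterion from the graph-theoretical literature on circular codes, namely that a trinucleotide code is circular exactly when the directed graph with edges b1 -> b2b3 and b1b2 -> b3 for each codon b1b2b3 has no cycle. The codon set is again the X code as commonly listed in that literature, and the function name `is_circular` is hypothetical.

```python
# Sketch only: X as commonly listed in the circular-code literature
# (assumption; the 20 codons are not spelled out in this excerpt).
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def is_circular(code):
    """True if the graph associated with a trinucleotide code is acyclic.

    Assumed criterion: each codon b1b2b3 contributes the edges
    b1 -> b2b3 and b1b2 -> b3; the code is circular iff no directed
    cycle exists in the resulting graph.
    """
    graph = {}
    for b1, b2, b3 in code:
        graph.setdefault(b1, set()).add(b2 + b3)
        graph.setdefault(b1 + b2, set()).add(b3)

    state = {}  # node -> "open" while on the DFS path, "done" afterwards

    def reaches_cycle(node):
        state[node] = "open"
        for nxt in graph.get(node, ()):
            if state.get(nxt) == "open":  # back edge: a cycle exists
                return True
            if nxt not in state and reaches_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(n not in state and reaches_cycle(n) for n in list(graph))


print(is_circular(X_CODE))          # True
print(is_circular({"ACG", "CGA"}))  # False: the two codons are cyclic shifts
print(is_circular({"AAA"}))         # False: a periodic codon
```

A check of this kind could serve to confirm that a randomly generated control code lacks circularity, although the randomization model actually used is described in the cited previous work rather than here.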

We would first like to make some comments about the mathematical structure of these two codes. The