PT - JOURNAL ARTICLE
AU - Xiaolong Wang
AU - Quanjiang Dong
AU - Gang Chen
AU - Jianye Zhang
AU - Yongqiang Liu
AU - Yujia Cai
TI - Frameshifts and wild-type protein sequences are always highly similar because the genetic code is optimal for frameshift tolerance
AID - 10.1101/067736
DP - 2019 Jan 01
TA - bioRxiv
PG - 067736
4099 - http://biorxiv.org/content/early/2019/10/08/067736.short
4100 - http://biorxiv.org/content/early/2019/10/08/067736.full
AB - A frameshift mutation yields truncated, dysfunctional protein products, leading to loss of function, genetic disorders, or even death. Frameshift mutations have been considered mostly harmful and of little importance to the molecular evolution of proteins. Frameshift protein sequences, encoded by the alternative reading frames of a coding gene, have therefore been considered meaningless. However, existing studies have shown that frameshift genes and proteins are widespread and sometimes functional. It is puzzling how a frameshift protein can retain its structure and function while its amino acid sequence is substantially changed. Here we show that frameshift protein sequences are highly similar to the wild-type protein sequence, and that the similarities among the three protein sequences encoded in the three reading frames of a coding gene are determined mainly by the genetic code. In the standard genetic code, the amino acid substitutions assigned to frameshift codon substitutions are far more conservative than those assigned to random codon substitutions. The frameshift tolerability of the standard genetic code ranks in the top 1.0-5.0% of all possible genetic codes, showing that the genetic code is optimal in terms of frameshift tolerance. In some species, shiftability is further enhanced at the gene or genome level by biased usage of codons and codon pairs, with frameshift-tolerant codons and codon pairs overrepresented in their genomes.
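The abstract does not say how frameshift tolerance is scored. The Python sketch below is a rough illustration only, and it rests on three assumptions. First, a frameshift codon substitution replaces a codon XYZ with YZN (+1 shift) or NXY (-1 shift), for any neighbouring base N. Second, each substitution is scored by the squared change in Kyte-Doolittle hydropathy. This is a swapped-in measure and not necessarily the authors'. Third, random codes are built by shuffling amino acids among the synonymous codon blocks of the standard code, with stop codons held fixed. The function names (`shifted_codons`, `frameshift_cost`, `random_code`, `fraction_more_tolerant`) are hypothetical, and the numbers the script prints are not the paper's results.

```python
import itertools
import random

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy: an assumed stand-in for whatever similarity score the paper uses.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def shifted_codons(codon):
    """Yield the codons read in place of `codon` after a +1 or -1 frameshift,
    for every possible neighbouring base N (assumed definition)."""
    for n in BASES:
        yield codon[1:] + n  # +1 shift: XYZ|N... is read as YZN
        yield n + codon[:2]  # -1 shift: ...N|XYZ is read as NXY


def frameshift_cost(code):
    """Mean squared hydropathy change over all sense-to-sense frameshift codon
    substitutions; a lower value means a more frameshift-tolerant code."""
    diffs = []
    for codon, aa in code.items():
        if aa == "*":
            continue
        for fs in shifted_codons(codon):
            fs_aa = code[fs]
            if fs_aa != "*":  # skip shifts that land on a stop codon
                diffs.append((KD[aa] - KD[fs_aa]) ** 2)
    return sum(diffs) / len(diffs)


def random_code(rng):
    """Shuffle the 20 amino acids among the synonymous codon blocks of the
    standard code, keeping the stop codons fixed (one common null model)."""
    amino_acids = sorted(set(AMINO) - {"*"})
    mapping = dict(zip(amino_acids, rng.sample(amino_acids, len(amino_acids))))
    return {codon: aa if aa == "*" else mapping[aa] for codon, aa in STANDARD.items()}


def fraction_more_tolerant(n_codes=10_000, seed=0):
    """Fraction of random codes whose frameshift cost is lower than the standard code's."""
    rng = random.Random(seed)
    standard = frameshift_cost(STANDARD)
    better = sum(frameshift_cost(random_code(rng)) < standard for _ in range(n_codes))
    return better / n_codes


if __name__ == "__main__":
    print(f"standard code cost: {frameshift_cost(STANDARD):.3f}")
    print(f"random codes more tolerant: {fraction_more_tolerant(2_000):.2%}")
```

Under these assumptions, a small value from `fraction_more_tolerant()` means that few shuffled codes beat the standard code. The block-shuffling null model keeps the codon degeneracy pattern and stop codons fixed. As a result, only the assignment of amino acids to codon blocks varies between codes.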