ABSTRACT
Motivation Most amino acids can be encoded by several synonymous codons. For a given amino acid, certain codons are often used significantly more frequently than others, a phenomenon known as codon usage bias. The genomes of different species differ in the frequency with which they use each codon (e.g., a codon that is used frequently in one species may be used rarely in another). In addition, within a given genome, genes differ in their degree of codon bias, with highly expressed genes being more likely to use preferred codons. Knowing which codons a genome prefers, and how much codon bias each gene exhibits, has multiple applications (e.g., in heterologous expression, gene prediction, and phylogenetic inference).
Results We have developed the Codon Statistics Database, an online database that contains codon usage statistics for all species with reference or representative genomes in RefSeq. The user can search for any species and access two sets of tables. The first set lists, for each codon, its frequency, its Relative Synonymous Codon Usage (RSCU), and whether the codon is preferred. The second set lists, for each gene, its GC content, Effective Number of Codons (ENC), Codon Adaptation Index (CAI), and frequency of optimal codons (Fop). Equivalent tables can be accessed for (1) all nuclear genes, (2) nuclear genes encoding ribosomal proteins, (3) mitochondrial genes, and (4) chloroplast genes (if available in the relevant assembly). The user can also search for any taxonomic group (e.g., “primates”) and obtain a table comparing all the species in the group.
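As an illustration of one of the per-codon statistics listed above, the following is a minimal sketch of the standard RSCU definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the database's own pipeline, which is not described here. The names `SYNONYMOUS`, `count_codons`, `rscu`, and `toy_cds` are hypothetical, the codon table is deliberately partial (three amino acids from the standard genetic code), and the input sequence is a made-up toy example.

```python
from collections import Counter

# Assumption: a deliberately partial synonymous-codon table (standard genetic
# code); a real analysis would cover every amino acid with more than one codon.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}


def count_codons(cds):
    """Count in-frame codons in a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts):
    """Return RSCU per codon: observed count / mean count of its synonymous family."""
    values = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence; RSCU is undefined
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence (made up): AAA AAG AAA GCT GCT GCC TTT
toy_cds = "AAAAAGAAAGCTGCTGCCTTT"
toy_rscu = rscu(count_codons(toy_cds))
for codon, value in sorted(toy_rscu.items()):
    print(codon, round(value, 3))
```

An RSCU above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often. Within each amino acid the RSCU values sum to the number of synonymous codons.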
Availability The database is free to access without registration at http://codonstatsdb.unr.edu.
Competing Interest Statement
The authors have declared no competing interest.