A universal trend of amino acid gain and loss in protein evolution

Nature. 2005 Feb 10;433(7026):633-8. doi: 10.1038/nature03306. Epub 2005 Jan 19.

Abstract

Amino acid composition of proteins varies substantially between taxa and, thus, can evolve. For example, proteins from organisms with (G + C)-rich (or (A + T)-rich) genomes contain more (or fewer) amino acids encoded by (G + C)-rich codons. However, no universal trends in ongoing changes of amino acid frequencies have been reported. We compared sets of orthologous proteins encoded by triplets of closely related genomes from 15 taxa representing all three domains of life (Bacteria, Archaea and Eukaryota), and used phylogenies to polarize amino acid substitutions. Cys, Met, His, Ser and Phe accrue in at least 14 taxa, whereas Pro, Ala, Glu and Gly are consistently lost. The same nine amino acids are currently accrued or lost in human proteins, as shown by analysis of non-synonymous single-nucleotide polymorphisms. All amino acids with declining frequencies are thought to be among the first incorporated into the genetic code; conversely, all amino acids with increasing frequencies, except Ser, were probably recruited late. Thus, expansion of initially under-represented amino acids, which began over 3,400 million years ago, apparently continues to this day.
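The polarization step described in the abstract can be sketched with a simple parsimony rule. This example is a minimal sketch, not necessarily the authors' exact procedure. It assumes three pre-aligned, gap-free, equal-length protein sequences: two closely related ingroup species plus an outgroup. At each site, it treats the residue shared by two of the three sequences as ancestral. The function names `polarize_substitutions` and `net_change` and the toy sequences are hypothetical.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def polarize_substitutions(seq1, seq2, outgroup):
    """Count amino acid gains and losses on the two ingroup lineages.

    Hypothetical helper. Assumes the sequences are already aligned,
    equal in length, and gap-free. A site is used only when the two
    ingroup residues differ and one of them matches the outgroup; that
    shared residue is taken as ancestral (parsimony) and the other as derived.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    gains, losses = Counter(), Counter()
    for a, b, o in zip(seq1, seq2, outgroup):
        if a == b:
            continue  # identical in both ingroup species: no change on their lineages
        if b == o:
            ancestral, derived = b, a  # substitution on lineage 1
        elif a == o:
            ancestral, derived = a, b  # substitution on lineage 2
        else:
            continue  # all three residues differ: cannot polarize by parsimony
        losses[ancestral] += 1
        gains[derived] += 1
    return gains, losses


def net_change(gains, losses):
    """Gains minus losses for every amino acid with at least one event."""
    return {aa: gains[aa] - losses[aa]
            for aa in AMINO_ACIDS if gains[aa] or losses[aa]}


if __name__ == "__main__":
    g, l = polarize_substitutions("APCEM", "ASGEH", "APGDA")
    print(net_change(g, l))  # {'C': 1, 'G': -1, 'P': -1, 'S': 1}
```

In the toy alignment, site 2 (P/S/P) records a Pro to Ser change on lineage 2, and site 3 (C/G/G) records a Gly to Cys change on lineage 1. Site 4 is skipped because the two ingroup residues agree, and site 5 is skipped because all three residues differ. Summed over many orthologous alignments, a consistently positive net value would correspond to an amino acid being "accrued" in the abstract's sense, and a negative one to it being "lost".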

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • AT Rich Sequence / genetics
  • Amino Acid Substitution / genetics
  • Amino Acids / genetics*
  • Animals
  • Archaea / genetics
  • Bacteria / genetics
  • Base Composition
  • Eukaryotic Cells / metabolism
  • Evolution, Molecular*
  • GC Rich Sequence / genetics
  • Genome*
  • Humans
  • Phylogeny
  • Polymorphism, Single Nucleotide / genetics
  • Prokaryotic Cells / metabolism
  • Proteins / chemistry*
  • Proteins / genetics*

Substances

  • Amino Acids
  • Proteins