Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell

Nucleic Acids Res. 2005 Aug 16;33(14):4626-38. doi: 10.1093/nar/gki775. Print 2005.

Abstract

Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes: PQ values of 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs, which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e., homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate that a major increase in the level of gene paralogy was a hallmark of the early evolution of eukaryotes.
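The paralogy quotient reported above can be checked with a few lines of arithmetic. The sketch below assumes, based on the figures in the abstract, that PQ is the ratio of orthologous gene sets in the descendant ancestor (LECA) to those in the earlier ancestor (FECA). The function name `paralogy_quotient` and the variable names are hypothetical and are not taken from the paper.

```python
def paralogy_quotient(descendant_sets: int, ancestral_sets: int) -> float:
    """Ratio of orthologous gene sets in the later ancestor to the earlier one.

    Assumption: this simple ratio is how the PQ in the abstract is obtained;
    the paper's full text may define it with additional detail.
    """
    if ancestral_sets <= 0:
        raise ValueError("ancestral_sets must be a positive count")
    return descendant_sets / ancestral_sets


# Counts quoted in the abstract.
leca_sets = 4137  # orthologous gene sets reconstructed for LECA
feca_sets = 2150  # orthologous sets they map back to in FECA

pq_eukaryotes = paralogy_quotient(leca_sets, feca_sets)
print(f"Eukaryotic PQ: {pq_eukaryotes:.2f}")  # prints "Eukaryotic PQ: 1.92"
```

Under this reading, a PQ near 1 means almost no ancestral duplication, so the reported prokaryotic values (1.19 and 1.25) indicate far fewer ancestral paralogs per orthologous set than the eukaryotic value.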

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Eukaryotic Cells / physiology*
  • Evolution, Molecular*
  • Gene Duplication*
  • Gene Transfer, Horizontal
  • Genes, Archaeal
  • Genes, Bacterial
  • Genomics
  • Molecular Sequence Data
  • Phylogeny
  • Proteins / genetics*
  • Sequence Alignment

Substances

  • Proteins