Evidence Suggesting That a Fifth of Annotated Caenorhabditis elegans Genes May Be Pseudogenes

Andrew Mounsey, Petra Bauer, and Ian A. Hope¹

School of Biology, University of Leeds, Leeds, LS2 9JT, United Kingdom

Abstract

Only a minority of the genes, identified in the Caenorhabditis elegans genome sequence data by computer analysis, have been characterized experimentally. We attempted to determine the expression patterns for a random sample of the annotated genes using reporter gene fusions. A low success rate was obtained for evolutionarily recently duplicated genes. Analysis of the data suggests that this is not due to conditional or low-level expression. The remaining explanation is that most of the annotated genes in the recently duplicated category are pseudogenes, a proportion corresponding to 20% of all of the annotated C. elegans genes. Further support for this surprisingly high figure was sought by comparing sequences for families of recently duplicated C. elegans genes. Although only a preliminary analysis, clear evidence for a gene having been recently inactivated by genetic drift was found for many genes in the recently duplicated category. At least 4% of the annotated C. elegans genes can be recognized as pseudogenes simply from closer inspection of the sequence data. Lessons learned in identifying pseudogenes in C. elegans could be of value in the annotation of the genomes of other species where, although there may be fewer pseudogenes, they may be harder to detect.

[Online supplementary material available at www.genome.org.]

Footnotes

  • 1 Corresponding author.

  • E-MAIL i.a.hope{at}leeds.ac.uk; FAX (44) 113 343 2835.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr208802. Article published online before print in April 2002.

    • Received August 6, 2001.
    • Accepted March 6, 2002.