Using comparative genome analysis to identify problems in annotated microbial genomes

Maria S Poptsova; J Peter Gogarten

doi:10.1099/mic.0.033811-0

Microbiology (Reading). 2010 Jul;156(Pt 7):1909-1917. doi: 10.1099/mic.0.033811-0. Epub 2010 Apr 29.

Authors

Maria S Poptsova¹, J Peter Gogarten¹

Affiliation

¹ Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA.

PMID: 20430813

Abstract

Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes affect different types of analyses to different degrees. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is urgent and unresolved. Given the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and should consider additional quality control for their results.

Publication types

Comparative Study
Research Support, U.S. Gov't, Non-P.H.S.
Review

MeSH terms

Bacteria / chemistry
Bacteria / genetics*
Computational Biology / methods
Computational Biology / standards*
Fungi / chemistry
Fungi / genetics*
Genome, Bacterial*
Genome, Fungal*
Software