Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies

Brian J Haas; Arthur L Delcher; Stephen M Mount; Jennifer R Wortman; Roger K Smith Jr; Linda I Hannick; Rama Maiti; Catherine M Ronning; Douglas B Rusch; Christopher D Town; Steven L Salzberg; Owen White

doi:10.1093/nar/gkg770

Nucleic Acids Res. 2003 Oct 1;31(19):5654-66. doi: 10.1093/nar/gkg770.

Authors

Brian J Haas¹, Arthur L Delcher, Stephen M Mount, Jennifer R Wortman, Roger K Smith Jr, Linda I Hannick, Rama Maiti, Catherine M Ronning, Douglas B Rusch, Christopher D Town, Steven L Salzberg, Owen White

Affiliation

¹ The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA. bhaas@tigr.org

Abstract

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the approximately 27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
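The idea of merging overlapping, splice-compatible transcript alignments can be illustrated with a small sketch. This is not the PASA implementation and makes no claim about how the published algorithm works internally: it is a simplified greedy stand-in, written under the assumption that each alignment is a sorted list of exon coordinates on one strand and that two alignments are compatible when they overlap and share identical introns across the overlapping region. All names (`introns`, `compatible`, `merge`, `assemble`) are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]      # (start, end), inclusive genomic coordinates
Alignment = List[Exon]      # exons sorted by start; gaps between them are introns


def introns(aln: Alignment) -> List[Tuple[int, int]]:
    """Return each intron as (end of upstream exon, start of downstream exon)."""
    return [(left[1], right[0]) for left, right in zip(aln, aln[1:])]


def compatible(a: Alignment, b: Alignment) -> bool:
    """Assumed rule: the alignments overlap and agree on every intron
    that falls within their shared span."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[-1][1], b[-1][1])
    if lo > hi:
        return False  # no overlap, so nothing links the two alignments

    def within(aln: Alignment) -> List[Tuple[int, int]]:
        # Keep introns whose intronic bases intersect the shared span.
        return [(s, e) for s, e in introns(aln) if s < hi and e > lo]

    return within(a) == within(b)


def merge(a: Alignment, b: Alignment) -> Alignment:
    """Union the exons of two compatible alignments."""
    merged: Alignment = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def assemble(alignments: List[Alignment]) -> List[Alignment]:
    """Greedy simplification: add each alignment to the first assembly it is
    compatible with, otherwise start a new assembly. A greedy pass does not
    guarantee maximal assemblies."""
    assemblies: List[Alignment] = []
    for aln in sorted(alignments):
        for i, asm in enumerate(assemblies):
            if compatible(asm, aln):
                assemblies[i] = merge(asm, aln)
                break
        else:
            assemblies.append(list(aln))
    return assemblies


# Made-up coordinates: two ESTs share an intron; a third reads through it.
ests = [
    [(1, 100), (200, 300)],
    [(50, 100), (200, 300), (400, 500)],
    [(50, 150)],
]
result = assemble(ests)
```

In this toy input the first two alignments merge into one three-exon structure, while the third, which extends into the shared intron, is kept as a separate assembly, the kind of splicing variation the abstract refers to.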

Publication types

Evaluation Study
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Alternative Splicing
Arabidopsis / genetics*
Arabidopsis / metabolism
DNA, Complementary / analysis
Expressed Sequence Tags
Genome, Plant*
Introns
Plant Proteins / genetics
RNA, Plant / analysis*
RNA, Plant / chemistry
Sequence Alignment / methods*
Software*
Transcription, Genetic
Untranslated Regions

Substances

DNA, Complementary
Plant Proteins
RNA, Plant
Untranslated Regions
