A greedy algorithm for aligning DNA sequences

Z Zhang; S Schwartz; L Wagner; W Miller

doi:10.1089/10665270050081478

J Comput Biol. 2000 Feb-Apr;7(1-2):203-14. doi: 10.1089/10665270050081478.

Authors

Z Zhang¹, S Schwartz, L Wagner, W Miller

Affiliation

¹ Department of Computer Science and Engineering, The Pennsylvania State University, University Park 16802, USA.

PMID: 10890397
DOI: 10.1089/10665270050081478

Abstract

For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
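The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general greedy idea, not the authors' method. It assumes a simplified unit-cost model (each mismatch, insertion, or deletion costs 1) and uses the classic furthest-reaching-diagonal technique: for each difference count d, track how far each diagonal can be extended, sliding freely over runs of matches. The function names `slide` and `greedy_edit_distance` are hypothetical. When the two sequences differ in only a few positions, very few diagonals are ever touched, which is the intuition behind the speedup over filling a full dynamic programming table.

```python
def slide(a, b, i, j):
    """Extend along a diagonal while characters match; return the new row index."""
    while i < len(a) and j < len(b) and a[i] == b[j]:
        i += 1
        j += 1
    return i


def greedy_edit_distance(a, b):
    """Unit-cost edit distance via furthest-reaching points (illustrative sketch).

    Diagonal k holds cells (i, j) with k = i - j. frontier[k] is the largest i
    such that a[:i] and b[:i - k] can be aligned with at most d differences.
    """
    m, n = len(a), len(b)
    frontier = {0: slide(a, b, 0, 0)}
    d = 0
    while True:
        # The end cell (m, n) lies on diagonal m - n.
        if frontier.get(m - n) == m:
            return d
        d += 1
        nxt = {}
        for k in range(-d, d + 1):
            candidates = []
            if k in frontier:
                candidates.append(frontier[k])          # carry over (still <= d differences)
                candidates.append(frontier[k] + 1)      # mismatch: consume one char of each
            if k - 1 in frontier:
                candidates.append(frontier[k - 1] + 1)  # deletion: consume a char of a only
            if k + 1 in frontier:
                candidates.append(frontier[k + 1])      # insertion: consume a char of b only
            best = -1
            for i in candidates:
                j = i - k
                if 0 <= i <= m and 0 <= j <= n:         # discard cells outside the grid
                    best = max(best, i)
            if best >= 0:
                nxt[k] = slide(a, b, best, best - k)
        frontier = nxt


if __name__ == "__main__":
    # Two reads that differ by a single substitution.
    print(greedy_edit_distance("ACGTACGT", "ACGTTCGT"))  # prints 1
```

In this sketch the work per round is proportional to the number of diagonals visited, at most 2d + 1, plus the length of the match runs, so near-identical sequences finish after only a handful of rounds. A scoring scheme with distinct match, mismatch, and gap weights, as used in practical alignment tools, would need a different recurrence than the unit-cost one assumed here.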

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms*
Biometry
DNA / genetics*
Databases, Factual
Sequence Alignment / statistics & numerical data*
Sequence Analysis, DNA / statistics & numerical data
Software

Substances

DNA

Grants and funding

LM05110/LM/NLM NIH HHS/United States