Comparing sequences without using alignments: application to HIV/SIV subtyping

BMC Bioinformatics. 2007 Jan 2:8:1. doi: 10.1186/1471-2105-8-1.

Abstract

Background: In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment.

Results: In this paper, HIV (Human Immunodeficiency Virus) and SIV (Simian Immunodeficiency Virus) sequence data are used to evaluate this method. The program produces tree topologies that are identical to those obtained by a combination of standard methods detailed in the HIV Sequence Compendium. Manual alignment editing is not necessary at any stage. Furthermore, only one user-specified parameter is needed for constructing trees.

Conclusion: Extensive tests on HIV/SIV subtyping showed that the virus classifications produced by our method are in good agreement with our best taxonomic knowledge, even in non-coding LTR (Long Terminal Repeat) regions that are not tractable by regular alignment methods due to frequent duplications/insertions/deletions. Our method, however, is not limited to HIV/SIV subtyping. It provides an alternative approach to tree construction that avoids a time-consuming alignment procedure.
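
Illustration: the abstract does not describe how the dissimilarity matrix is computed, so the sketch below is not the authors' algorithm. It shows one generic alignment-free approach, comparing k-mer frequency profiles, purely to make concrete what "a dissimilarity matrix without preliminary alignment" can look like. The function names, the Euclidean distance, and the choice of k as the single parameter are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_profile(seq, k):
    """Return the relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def dissimilarity_matrix(seqs, k=3):
    """Pairwise dissimilarities for a dict {name: sequence}.

    k is the only user-chosen parameter in this sketch (an assumption,
    not necessarily the parameter used by the published method).
    """
    names = list(seqs)
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = profile_distance(profiles[a], profiles[b])
        matrix[a][b] = matrix[b][a] = d
    return names, matrix
```

A matrix of this kind can be passed to any distance-based tree-building method, such as neighbor joining. No step requires the sequences to be aligned or to have equal length.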

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence / genetics
  • HIV / classification*
  • HIV / genetics*
  • Humans
  • Molecular Sequence Data
  • Sequence Alignment / methods*
  • Serotyping / methods*
  • Simian Immunodeficiency Virus / classification*
  • Simian Immunodeficiency Virus / genetics*