Estimating mutation distances from unaligned genomes

Bernhard Haubold; Peter Pfaffelhuber; Mirjana Domazet-Loso; Thomas Wiehe

doi:10.1089/cmb.2009.0106

J Comput Biol. 2009 Oct;16(10):1487-500. doi: 10.1089/cmb.2009.0106.

Authors

Bernhard Haubold¹, Peter Pfaffelhuber, Mirjana Domazet-Loso, Thomas Wiehe

Affiliation

¹ Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, Plön, Germany. haubold@evolbio.mpg.de

PMID: 19803738

Abstract

Alignment-free distance measures are generally less accurate but more efficient than traditional alignment-based metrics. In the context of genome sequence analysis, the efficiency gain is often so substantial that it outweighs the loss in accuracy. However, alignment-free distances have a further disadvantage: their relationship to evolutionary events such as substitutions is generally unknown. We have therefore derived K(r), an estimator of the number of substitutions per site between two unaligned DNA sequences. Simulations show that this estimator works well with "ideal" data. We compare K(r) to two alternative alignment-free distances: a k-tuple distance and a measure of relative entropy based on average common substring length. All three measures are applied to 27 primate mitochondrial genomes, eight whole genomes of Streptococcus agalactiae strains, and 12 whole genomes of Drosophila species. In each case, the cluster diagrams based on K(r) are equivalent to or significantly better than those based on the two alternative measures. This is because, in contrast to the alternative measures, K(r) is derived from an explicit model of evolution. The computation of K(r) is efficiently implemented in the program kr, which is freely available for download.
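The two alternative measures compared against K(r) rest on simple sequence statistics: counts of k-tuples (k-mers), and the average length of common substrings (for every position in one sequence, the length of the longest substring starting there that also occurs in the other sequence, averaged over all positions). The sketch below illustrates these underlying quantities only; it does not implement K(r) or the kr program. It assumes a Euclidean distance between k-mer count vectors as the k-tuple distance (the abstract does not say which variant was used) and a naive quadratic substring search, whereas practical tools use suffix trees or suffix arrays. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-tuple (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between k-mer count vectors.

    Assumption: one common form of k-tuple distance, not necessarily
    the variant used in the paper.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = set(ca) | set(cb)
    # Counter returns 0 for absent keys without inserting them.
    return sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))


def longest_match_length(suffix: str, b: str) -> int:
    """Length of the longest prefix of `suffix` that occurs somewhere in b."""
    length = 0
    # Extending greedily is valid: if a prefix occurs in b, so do all shorter ones.
    while length < len(suffix) and suffix[:length + 1] in b:
        length += 1
    return length


def average_common_substring(a: str, b: str) -> float:
    """Mean over positions i of a of the longest prefix of a[i:] found in b.

    Naive sketch for illustration; real implementations use suffix
    trees or suffix arrays to avoid the quadratic search.
    """
    if not a:
        return 0.0
    return sum(longest_match_length(a[i:], b) for i in range(len(a))) / len(a)


if __name__ == "__main__":
    x, y = "ACGTACGTTGCA", "ACGAACGTTGCC"
    print(kmer_distance(x, y, k=3))
    print(average_common_substring(x, y))
```

For two identical sequences such as `"ACGTACGT"`, the k-mer distance is 0 and the average common substring length is (8 + 7 + ... + 1) / 8 = 4.5; for sequences sharing no nucleotides it is 0. Longer common substrings indicate closer relatives, which is why such lengths can be turned into a distance.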

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Cluster Analysis
Drosophila / classification
Drosophila / genetics
Genome*
Humans
Models, Genetic*
Mutation*
Phylogeny
Sequence Alignment
Streptococcus agalactiae / classification
Streptococcus agalactiae / genetics