Alignment-free phylogenetics and population genetics

Brief Bioinform. 2014 May;15(3):407-18. doi: 10.1093/bib/bbt083. Epub 2013 Nov 29.

Abstract

Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on comparative data, today usually DNA sequences. These have become so plentiful that alignment-free sequence comparison is of growing importance in the race between scientists and sequencing machines. In phylogenetics, efficient distance computation is the major contribution of alignment-free methods. A distance measure should reflect the number of substitutions per site, which underlies classical alignment-based phylogeny reconstruction. Alignment-free distance measures are based either on word counts or on match lengths; I apply examples of both approaches to simulated and real data to assess their accuracy and efficiency. While phylogeny reconstruction is based on the number of substitutions, population genetics also considers the distribution of mutations along a sequence. This distribution can be explored by match lengths, thus opening the prospect of alignment-free population genomics.
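The two families of alignment-free measures named above can be illustrated with a minimal sketch. The function names (`kmer_counts`, `word_count_distance`, `match_lengths`, `jukes_cantor`) and the choice of a Euclidean k-mer distance are illustrative assumptions, not the estimators evaluated in the review. The match-length routine uses a naive substring search where practical tools would use a suffix tree or suffix array. The Jukes-Cantor correction shows how a proportion of differing sites is converted into substitutions per site. How match lengths themselves are turned into such a distance depends on the particular estimator and is not shown here.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping words (k-mers) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_count_distance(a: str, b: str, k: int = 3) -> float:
    """Word-count approach (illustrative choice): Euclidean distance
    between the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = set(ca) | set(cb)
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))


def match_lengths(query: str, subject: str) -> list[int]:
    """Match-length approach: for each position i in query, the length of
    the longest prefix of query[i:] that occurs anywhere in subject.
    Naive search for clarity; real tools use a suffix tree or array."""
    lengths = []
    for i in range(len(query)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(query) and query[i:i + length + 1] in subject:
            length += 1
        lengths.append(length)
    return lengths


def jukes_cantor(p: float) -> float:
    """Convert a proportion p of differing sites into an estimate of
    substitutions per site under the Jukes-Cantor model."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    a, b = "ACGTACGTTA", "ACGTTCGTTA"  # differ at one of ten sites
    print(word_count_distance(a, b, k=3))
    print(match_lengths(a, b))
    print(jukes_cantor(1 / 10))  # approx. 0.1073 substitutions per site
```

Short mismatches break matches, so a sequence pair separated by more substitutions tends to yield shorter match lengths. This is also why, as noted above, match lengths carry information about where mutations fall along a sequence, not just how many there are.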

Keywords: match length; mutation distance; phylogenetics; population genetics; suffix tree.

Publication types

  • Review

MeSH terms

  • Animals
  • Computational Biology / methods
  • Evolution, Molecular
  • Genetics, Population / methods*
  • Genetics, Population / statistics & numerical data
  • Genome, Mitochondrial
  • Humans
  • Models, Genetic
  • Mutation
  • Phylogeny*
  • Recombination, Genetic
  • Selection, Genetic
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / statistics & numerical data