Measuring the distance between multiple sequence alignments

Bioinformatics. 2012 Feb 15;28(4):495-502. doi: 10.1093/bioinformatics/btr701. Epub 2011 Dec 23.

Abstract

Motivation: Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses.

Results: We introduce four metrics to compare MSAs, which can take into account the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them.
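
As a rough illustration of what a distance between two alignments can look like, the sketch below compares the sets of residue pairs that each alignment places in the same column. It is a generic pair-based measure written for this note, not one of the four metrics described above and not MetAl's implementation: it ignores gap positions and indel locations entirely. The names `aligned_pairs` and `pair_distance` are hypothetical, and both alignments are assumed to contain the same sequences in the same order.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    A residue is identified by (sequence index, index in the ungapped sequence),
    so the same residue gets the same label in any alignment of the same data.
    """
    counters = [0] * len(alignment)  # residues seen so far in each sequence
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for seq_idx, char in enumerate(column):
            if char != gap:
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every two residues in the same column are treated as aligned.
        pairs.update(combinations(residues, 2))
    return pairs


def pair_distance(aln_a, aln_b):
    """Fraction of aligned pairs found in exactly one of the two alignments.

    Assumption: this normalisation (symmetric difference over union) is an
    illustrative choice, giving 0.0 for identical pair sets and 1.0 for
    alignments that share no aligned pair.
    """
    pairs_a = aligned_pairs(aln_a)
    pairs_b = aligned_pairs(aln_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a ^ pairs_b) / len(union)
```

For example, `pair_distance(["AC-G", "ACTG"], ["A-CG", "ACTG"])` evaluates to 0.5: the two alignments agree on two aligned pairs and each contains one pair the other lacks. Because gaps are discarded, two alignments that place an indel differently but pair the same residues would be scored as identical, which is the kind of information the gap-aware metrics above are meant to recover.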

Availability: MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computers
  • INDEL Mutation
  • Phylogeny*
  • Proteins / chemistry
  • Proteins / genetics
  • Sequence Alignment / methods*
  • Software*

Substances

  • Proteins