Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood

Bioinformatics. 2005 Dec 15;21(24):4338-47. doi: 10.1093/bioinformatics/bti713. Epub 2005 Oct 18.

Abstract

Motivation: Maximum likelihood (ML) methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult datasets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing time and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs.

Methods: Subtree pruning and regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves.

Results: Experiments with real datasets comprising 35-250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best-known ML methods so far and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML's nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Databases, Genetic
  • Likelihood Functions*
  • Mathematics
  • Models, Genetic
  • Phylogeny*
  • Software