Estimating species trees from unrooted gene trees

Syst Biol. 2011 Oct;60(5):661-7. doi: 10.1093/sysbio/syr027. Epub 2011 Mar 28.

Abstract

In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated as the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is the number of internodes separating them in a gene tree, averaged across gene trees, that is, the average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating species tree topologies, whereas the Bayesian coalescence-based method BEST outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can infer species trees from unrooted gene trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies, where data sets often contain missing loci for some individuals.
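The two steps described in the abstract (average the gene-tree internode distances, then run NJ on that matrix) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each gene tree is given as an unrooted edge list with at most one leaf per species; a species pair's distance is averaged only over the gene trees that contain both species, which is one plausible way to accommodate missing data; and the NJ step is the textbook Saitou-Nei algorithm returning topology only. The function names (`tree_from_edges`, `internode_distances`, `average_internode_matrix`, `neighbor_joining`) and the toy gene trees are hypothetical.

```python
from collections import deque
from itertools import combinations


def tree_from_edges(edges):
    """Build an undirected adjacency dict (node -> set of neighbours)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def internode_distances(tree):
    """Map each ordered leaf pair (a, b) to the number of internal nodes on
    the path between a and b, i.e. the path's edge count minus one."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    out = {}
    for src in leaves:
        depth = {src: 0}
        queue = deque([src])
        while queue:  # BFS: in a tree this gives the unique path length
            node = queue.popleft()
            for nxt in tree[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if dst != src:
                out[(src, dst)] = depth[dst] - 1
    return out


def average_internode_matrix(gene_trees):
    """Average internode distance per species pair, taken only over the gene
    trees containing both species (assumption: every pair co-occurs at least once)."""
    totals, counts = {}, {}
    for tree in gene_trees:
        for pair, k in internode_distances(tree).items():
            totals[pair] = totals.get(pair, 0) + k
            counts[pair] = counts.get(pair, 0) + 1
    species = sorted({a for a, _ in totals})
    return species, {p: totals[p] / counts[p] for p in totals}


def neighbor_joining(species, dist):
    """Saitou-Nei neighbour joining on a symmetric distance dict. Returns the
    unrooted topology as a tuple of three subtrees (branch lengths dropped)."""
    d = {a: {b: dist[(a, b)] for b in species if b != a} for a in species}
    nodes = list(species)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a].values()) for a in nodes}  # row sums over current nodes
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        new = (i, j)
        rest = [k for k in nodes if k not in (i, j)]
        d[new] = {}
        for k in rest:
            # Distance from the new internal node to each remaining node.
            d[new][k] = d[k][new] = (d[i][k] + d[j][k] - d[i][j]) / 2
            del d[k][i], d[k][j]
        nodes = rest + [new]
    return tuple(nodes)


# Hypothetical toy data: two gene trees with split AB|CD, one with AC|BD,
# and one gene tree in which species D is missing.
gene_trees = [
    tree_from_edges([("A", 1), ("B", 1), (1, 2), ("C", 2), ("D", 2)]),
    tree_from_edges([("A", 1), ("C", 1), (1, 2), ("B", 2), ("D", 2)]),
    tree_from_edges([("A", 1), ("B", 1), (1, 2), ("C", 2), ("D", 2)]),
    tree_from_edges([("A", 1), ("B", 1), ("C", 1)]),
]
species, dist = average_internode_matrix(gene_trees)
species_tree = neighbor_joining(species, dist)
print(species_tree)
```

In this toy run the gene tree lacking D still contributes to the A, B, and C distances, and the recovered unrooted topology groups A with B (equivalently C with D), the split supported by most gene trees. Handling several individuals per species, as the full method allows, would additionally require averaging over individuals, which this sketch omits.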

Publication types

  • Evaluation Study

MeSH terms

  • Bayes Theorem
  • Biological Evolution*
  • Computational Biology / methods*
  • Computer Simulation
  • Models, Genetic
  • Phylogeny
  • Saccharomyces / classification*
  • Saccharomyces / genetics*