Sampling trees from evolutionary models

Syst Biol. 2010 Jul;59(4):465-76. doi: 10.1093/sysbio/syq026. Epub 2010 May 28.

Abstract

A wide range of evolutionary models for species-level (and higher) diversification have been developed. These models can be used to test evolutionary hypotheses and provide comparisons with phylogenetic trees constructed from real data. To carry out these tests and comparisons, it is often necessary to sample, or simulate, trees from the evolutionary models. Sampling trees from these models is more complicated than it may appear at first glance, necessitating careful consideration and mathematical rigor. Seemingly straightforward sampling methods may produce trees that have systematically biased shapes or branch lengths. This is particularly problematic as there is no simple method for determining whether the sampled trees are appropriate. In this paper, we show why a commonly used simple sampling approach (SSA)-simulating trees forward in time until n species are first reached-should only be applied to the simplest pure birth model, the Yule model. We provide an alternative general sampling approach (GSA) that can be applied to most other models. Furthermore, we introduce the constant-rate birth-death model sampling approach, which samples trees very efficiently from a widely used class of models. We explore the bias produced by SSA and identify situations in which this bias is particularly pronounced. We show that using SSA can lead to erroneous conclusions: When using the inappropriate SSA, the variance of a gradually evolving trait does not correlate with the age of the tree; when the correct GSA is used, the trait variance correlates with tree age. The algorithms presented here are available in the Perl Bio::Phylo package, as a stand-alone program TreeSample, and in the R TreeSim package.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Evolution*
  • Computer Simulation
  • Genetic Speciation
  • Models, Genetic*
  • Selection Bias