Inconsistency of phylogenetic estimates from concatenated data under coalescence

Syst Biol. 2007 Feb;56(1):17-24. doi: 10.1080/10635150601146041.

Abstract

Although multiple gene sequences are becoming increasingly available for molecular phylogenetic inference, the analysis of such data has largely relied on inference methods designed for single genes. One of the common approaches to analyzing data from multiple genes is concatenation of the individual gene data to form a single supergene to which traditional phylogenetic inference procedures - e.g., maximum parsimony (MP) or maximum likelihood (ML) - are applied. Recent empirical studies have demonstrated that concatenation of sequences from multiple genes prior to phylogenetic analysis often results in inference of a single, well-supported phylogeny. Theoretical work, however, has shown that the coalescent can produce substantial variation in single-gene histories. Using simulation, we combine these ideas to examine the performance of the concatenation approach under conditions in which the coalescent produces a high level of discord among individual gene trees and show that it leads to statistically inconsistent estimation in this setting. Furthermore, use of the bootstrap to measure support for the inferred phylogeny can result in moderate to strong support for an incorrect tree under these conditions. These results highlight the importance of incorporating variation in gene histories into multilocus phylogenetics.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Classification / methods*
  • Computer Simulation
  • Genes / genetics*
  • Likelihood Functions
  • Models, Genetic*
  • Phylogeny*
  • Research Design