Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased

Mol Phylogenet Evol. 2015 Nov:92:63-71. doi: 10.1016/j.ympev.2015.06.009. Epub 2015 Jun 24.

Abstract

The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).

Keywords: Concatenation methods; Gene informativeness; Gene tree estimation; Gene-tree-based coalescent methods; Multilocus bootstrap approach; PhyML.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • DNA / genetics
  • Datasets as Topic
  • Genes
  • Mammals / classification*
  • Mammals / genetics*
  • Phylogeny*
  • Software

Substances

  • DNA