Applying species-tree analyses to deep phylogenetic histories: challenges and potential suggested from a survey of empirical phylogenetic studies

Mol Phylogenet Evol. 2015 Feb:83:191-9. doi: 10.1016/j.ympev.2014.10.022. Epub 2014 Nov 4.

Abstract

Coalescent-based methods for species-tree estimation are becoming a dominant approach for reconstructing species histories from multi-locus data, with most of the studies examining these methodologies focused on recently diverged species. However, deeper phylogenies, such as the datasets that comprise many Tree of Life (ToL) studies, also exhibit gene-tree discordance. This discord may also arise from the stochastic sorting of gene lineages during the speciation process (i.e., reflecting the random coalescence of gene lineages in ancestral populations). It remains unknown whether guidelines regarding methodologies and numbers of loci established by simulation studies at shallow tree depths translate into accurate species relationships for deeper phylogenetic histories. We address this knowledge gap and specifically identify the challenges and limitations of species-tree methods that account for coalescent variance for deeper phylogenies. Using simulated data with characteristics informed by empirical studies, we evaluate both the accuracy of estimated species trees and the characteristics associated with recalcitrant nodes, with a specific focus on whether coalescent variance is generally responsible for the lack of resolution. By determining the proportion of coalescent genealogies that support a particular node, we demonstrate that (1) species-tree methods account for coalescent variance at deep nodes and (2) mutational variance - not gene-tree discord arising from the coalescent - posed the primary challenge for accurate reconstruction across the tree. For example, many nodes were accurately resolved despite predicted discord from the random coalescence of gene lineages and nodes with poor support were distributed across a range of depths (i.e., they were not restricted to a particular recent divergences). Given their broad taxonomic scope and large sampling of taxa, deep level phylogenies pose several potential methodological complications including difficulties with MCMC convergence and estimation of requisite population genetic parameters for coalescent-based approaches. Despite these difficulties, the findings generally support the utility of species-tree analyses for the estimation of species relationships throughout the ToL. We discuss strategies for successful application of species-tree approaches to deep phylogenies.

Keywords: (∗)BEAST; Coalescent variance; Data sufficiency; Incomplete lineage sorting; Mutational variance; Tree of Life (ToL) studies.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biological Evolution*
  • Models, Genetic*
  • Mutation
  • Phylogeny*
  • Sequence Analysis, DNA