Abstract
Despite the wealth of evolutionary information available from genomic and transcriptomic data, recalcitrant relationships persist throughout the tree of life in phylogenomic studies. Recent studies have demonstrated that conflict is common among gene trees, and that fewer than one percent of genes may ultimately drive species tree inference in supermatrix analyses. In this study, we examined plant and vertebrate datasets in which supermatrix and coalescent-based species trees conflict. Using a two-topology site-specific log-likelihood test, we identified two highly influential genes in each dataset. While the outlier genes in the vertebrate dataset have been shown to result from errors in orthology detection, we demonstrate that the outlier genes in the plant dataset may result from biological processes rather than model or methodological errors. When the outlier genes were removed from each supermatrix, the inferred trees matched the topologies obtained from coalescent analyses. Most tests of this nature limit the comparison to a small number of fixed topologies, often only two, yet gene tree topologies generated under processes such as incomplete lineage sorting are unlikely to match these topologies precisely. We therefore examined edges across a set of trees and recovered more support for the resolution favored by coalescent analyses. These results suggest that expanding beyond fixed-topology comparisons to more targeted, edge-based questions can dramatically improve our understanding of the underlying signal in phylogenomic datasets.
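As an illustration of the gene-wise comparison summarized above, the following minimal sketch sums per-site log-likelihood differences between two fixed topologies within each gene and flags genes whose totals are extreme. The function names, the toy values, and the median-absolute-deviation cutoff k are assumptions made for illustration only; they are not the procedure or data used in this study, and the per-site log-likelihoods are assumed to have been computed beforehand by a likelihood program.

```python
from statistics import median

# Toy per-site log-likelihoods under two fixed topologies (hypothetical values).
SITE_LNL_A = [-1.0, -2.0, -1.5, -1.5, -3.0, -1.0, -10.0, -2.0]
SITE_LNL_B = [-1.5, -2.0, -1.0, -1.5, -3.0, -1.5, -1.0, -2.0]

# Gene boundaries in the supermatrix as half-open (start, end) site indices.
PARTITIONS = {"g1": (0, 2), "g2": (2, 4), "g3": (4, 6), "g4": (6, 8)}


def gene_delta_lnl(site_lnl_a, site_lnl_b, partitions):
    """Sum per-site lnL differences (topology A minus B) within each gene.

    A positive total means the gene favors topology A; negative favors B.
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both topologies must cover the same sites")
    deltas = {}
    for gene, (start, end) in partitions.items():
        pairs = zip(site_lnl_a[start:end], site_lnl_b[start:end])
        deltas[gene] = sum(a - b for a, b in pairs)
    return deltas


def flag_outliers(deltas, k=5.0):
    """Flag genes whose delta lnL lies more than k MADs from the median.

    The MAD rule and the default k are illustrative assumptions, not a
    published criterion.
    """
    values = list(deltas.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread among genes, so nothing can be called extreme
    return sorted(g for g, v in deltas.items() if abs(v - med) > k * mad)


deltas = gene_delta_lnl(SITE_LNL_A, SITE_LNL_B, PARTITIONS)
outliers = flag_outliers(deltas)
```

In this toy example the summed difference across all four genes is negative and so favors topology B, whereas the total with the single flagged gene excluded is positive and favors topology A. This mirrors, in miniature, how a small number of genes can determine the supermatrix result.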