PT - JOURNAL ARTICLE
AU - Joseph F. Walker
AU - Joseph W. Brown
AU - Stephen A. Smith
TI - Site and gene-wise likelihoods unmask influential outliers in phylogenomic analyses
AID - 10.1101/115774
DP - 2017 Jan 01
TA - bioRxiv
PG - 115774
4099 - http://biorxiv.org/content/early/2017/03/10/115774.short
4100 - http://biorxiv.org/content/early/2017/03/10/115774.full
AB - Despite the wealth of evolutionary information available from sequence data, recalcitrant nodes in phylogenomic studies remain. A recent study of vertebrate transcriptomes by Brown and Thomson (2016) revealed that less than one percent of genes can have strong enough phylogenetic signal to alter the species tree. While identifying these outliers is important, the use of Bayes factors, advocated by Brown and Thomson (2016), is a heavy computational burden for increasingly large datasets. We do not find fault with the Brown and Thomson (2016) study, but instead hope to build on their suggestions and offer some alternatives. Here we suggest that site- and gene-wise likelihoods may be used to identify discordant genes and nodes. We demonstrate this in the vertebrate dataset analyzed by Brown and Thomson (2016) as well as a dataset of carnivorous Caryophyllales (Eudicots: Superasterids). In both datasets, we identify genes that strongly influence species tree inference and can overrule the signal present in all remaining genes, altering the species tree topology. By using a less computationally demanding approach, we can more rapidly examine competing hypotheses, providing a more thorough assessment of overall conflict. For example, our analyses highlight that, for the debated vertebrate relationship of Alligatoridae sister to turtles, only six genes have complete coverage for all species of Alligatoridae, birds, and turtles.
We also find that two genes (~0.16%) from the 1237-gene dataset of carnivorous Caryophyllales drive the topological estimate and, when they are removed, the species tree topology supports an alternative hypothesis supported by the remaining genes. Additionally, while the genes highlighted by Brown and Thomson (2016) were revealed to be the result of errors, we suggest that the topology produced by the outlier genes in the carnivorous Caryophyllales may not be the result of methodological error. Close examination of these genes revealed no obvious biases (i.e., no evidence of misidentified orthology, alignment error, or model violations such as significant compositional heterogeneity), suggesting that these genes may represent genuine, but exceptional, products of the evolutionary process. Bayes factors have been demonstrated to be helpful in addressing questions of conflict, but require significant computational effort. We suggest that maximum likelihood can also address these questions without the extensive computational burden. Furthermore, we recommend more thorough dataset exploration, as this may expose limitations in a dataset's ability to address primary hypotheses. While a dataset may contain hundreds or thousands of genes, only a small subset may be informative for the primary biological question.