RT Journal Article
SR Electronic
T1 So many genes, so little time: comments on divergence-time estimation in the genomic era
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 114975
DO 10.1101/114975
A1 Stephen A. Smith
A1 Joseph W. Brown
A1 Joseph F. Walker
YR 2017
UL http://biorxiv.org/content/early/2017/04/27/114975.abstract
AB Phylogenomic datasets have emerged as an important tool and have been used for addressing questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. Here, we examine these data sources for their utility in the estimation of divergence times. Divergence-time estimation can be complicated by the heterogeneity of molecular rates among lineages and through time. Despite the recent explosion of phylogenomic data, it is still unclear how gene- and lineage-specific rate heterogeneity is distributed across these genomic and transcriptomic datasets. Here, we examine rate heterogeneity across genes and determine whether clock-like or nearly clock-like genes are present in phylogenomic datasets that could be used to reduce error in divergence-time estimation. We address these questions with six published phylogenomic datasets including Birds, carnivorous Caryophyllales, broad Caryophyllales, Millipedes, Hymenoptera, and Vitales. We introduce a simple and fast method for identifying useful genes for constructing divergence-time estimates and conduct exemplar Bayesian analyses under both clock and uncorrelated log-normal (UCLN) models. We used a “gene shopping” approach (implemented in SortaDate) to identify genes with minimal conflict, lower root-to-tip variance, and discernible amounts of molecular evolution. We find that every empirical dataset examined includes genes with clock-like, or nearly clock-like, behavior. Many datasets have genes that are not only clock-like, but also have reasonable evolutionary rates and are mostly compatible with the species tree.
We used these data to conduct basic divergence-time analyses under strict clock and UCLN models. These exemplar divergence-time analyses show overlap in age estimates when using either clock or UCLN models, but with much larger credibility intervals for UCLN models. We find that “gene shopping” can be productive and successful in finding gene regions that minimize lineage-specific heterogeneity. By doing relatively simple assessments of root-to-tip variance and bipartition conflict, we not only explore datasets more thoroughly but also may estimate ages on phylogenies with lower error. We also suggest the need to explore more detailed and informative approaches to determine fit and deviation from a molecular clock, as existing approaches are exceedingly strict.
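
The three criteria named in the abstract (conflict with the species tree, root-to-tip variance, and amount of molecular evolution) can be illustrated with a minimal Python sketch. This is not SortaDate's implementation: the function names, the toy Newick trees, and the sort order are all assumptions made for illustration. As a simplification, it compares rooted clades rather than unrooted bipartitions and uses total tree length as a stand-in for "discernible amounts of molecular evolution".

```python
import re
from statistics import pvariance

DELIMS = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1
            children.append(node())
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume the closing ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in DELIMS:
            label = tokens[pos]
            pos += 1
        name, _, length = label.partition(":")
        return name.strip(), (float(length) if length else 0.0), children

    return node()

def root_to_tip(node, depth=0.0):
    """Return {tip_name: summed branch length from the root to that tip}."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    dists = {}
    for child in children:
        dists.update(root_to_tip(child, depth))
    return dists

def tree_length(node):
    """Sum of all branch lengths in the tree."""
    _, length, children = node
    return length + sum(tree_length(c) for c in children)

def collect_clades(node, found):
    """Add the tip set of every internal node to `found`; return this node's tip set."""
    name, _, children = node
    if not children:
        return frozenset([name])
    tips = frozenset().union(*(collect_clades(c, found) for c in children))
    found.add(tips)
    return tips

def internal_clades(tree):
    """Clades of a rooted tree, excluding the trivial clade containing every tip."""
    found = set()
    found.discard(collect_clades(tree, found))
    return found

def gene_stats(gene_newick, species_clades):
    """Compute the three illustrative criteria for one gene tree."""
    tree = parse_newick(gene_newick)
    return {
        "concordance": len(internal_clades(tree) & species_clades) / len(species_clades),
        "rtt_variance": pvariance(list(root_to_tip(tree).values())),
        "tree_length": tree_length(tree),
    }

# Toy data, invented for illustration only.
species = internal_clades(parse_newick("((A,B),(C,D));"))
gene_trees = {
    "gene1": "((A:0.10,B:0.10):0.05,(C:0.10,D:0.10):0.05);",  # clock-like, concordant
    "gene2": "((A:0.30,B:0.05):0.05,(C:0.10,D:0.20):0.05);",  # concordant, rate-variable
    "gene3": "((A:0.10,C:0.10):0.05,(B:0.10,D:0.10):0.05);",  # clock-like, conflicting
}
stats = {name: gene_stats(nwk, species) for name, nwk in gene_trees.items()}

# Assumed ordering: most concordant first, then lowest root-to-tip variance,
# then greatest tree length.
ranked = sorted(
    stats,
    key=lambda g: (-stats[g]["concordance"], stats[g]["rtt_variance"], -stats[g]["tree_length"]),
)
print(ranked)
```

In this toy case the concordant, clock-like gene ranks first and the conflicting gene ranks last. In practice the relative priority of the three criteria is a design choice that depends on the dataset, and gene trees would need to be rooted consistently (for example, on an outgroup) before root-to-tip distances are comparable.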