TY  - JOUR
T1  - Estimating rates and patterns of diversification with incomplete sampling: A case study in the rosids
JF  - bioRxiv
DO  - 10.1101/749325
SP  - 749325
AU  - Miao Sun
AU  - Ryan A. Folk
AU  - Matthew A. Gitzendanner
AU  - Robert P. Guralnick
AU  - Pamela S. Soltis
AU  - Zhiduan Chen
AU  - Douglas E. Soltis
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/08/29/749325.abstract
N2  - Premise of the Study: Recent advances in generating large-scale phylogenies enable broad-scale estimation of species diversification rates. These now-common approaches typically (1) are characterized by incomplete coverage without explicit sampling methodologies, and/or (2) sparse backbone representation, and usually rely on presumed phylogenetic placements to account for species without molecular data. Here we use an empirical example to examine effects of incomplete sampling on diversification estimation and provide constructive suggestions to ecologists and evolutionists based on those results. Methods: We used a supermatrix for rosids, a large clade of angiosperms, and its well-sampled subclade Cucurbitaceae, as empirical case studies. We compared results using this large phylogeny with those based on a previously inferred, smaller supermatrix and on a synthetic tree resource with complete taxonomic coverage. Finally, we simulated random and representative taxon sampling and explored the impact of sampling on three commonly used methods, both parametric (RPANDA, BAMM) and semiparametric (DR). Key Results: We find the impact of sampling on diversification estimates is idiosyncratic and often strong. As compared to full empirical sampling, representative and random sampling schemes either depress or exaggerate speciation rates depending on methods and sampling schemes. No method was entirely robust to poor sampling, but BAMM was least sensitive to moderate levels of missing taxa. Conclusions: We (1) urge caution in use of summary backbone trees containing only higher-level taxa, (2) caution against uncritical modeling of missing taxa using taxonomic data for poorly sampled trees, and (3) stress the importance of explicit sampling methodologies in macroevolutionary studies.
ER  - 