PT - JOURNAL ARTICLE
AU - Barbera, Pierre
AU - Czech, Lucas
AU - Lutterop, Sarah
AU - Stamatakis, Alexandros
TI - SCRAPP: A tool to assess the diversity of microbial samples from phylogenetic placements
AID - 10.1101/2020.02.28.969980
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.02.28.969980
4099 - http://biorxiv.org/content/early/2020/03/02/2020.02.28.969980.short
4100 - http://biorxiv.org/content/early/2020/03/02/2020.02.28.969980.full
AB - Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with the increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present SCRAPP, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. SCRAPP employs a novel approach to cluster phylogenetic placements, called placement space clustering, to efficiently perform dimensionality reduction, so as to scale to large data volumes. Furthermore, it utilizes the phylogeny-aware molecular species delimitation method mPTP to quantify diversity. We evaluated SCRAPP using both simulated and empirical datasets. We use simulated data to verify our approach. Tests on an empirical dataset show that SCRAPP-derived metrics can classify samples by their diversity-correlated features as well as or better than existing, commonly used approaches. SCRAPP is available at https://github.com/pbdas/scrapp