PT - JOURNAL ARTICLE
AU - Florian Plaza Oñate
AU - Alessandra C. L. Cervino
AU - Frédéric Magoulès
AU - S. Dusko Ehrlich
AU - Matthieu Pichaud
TI - Abundance-based reconstitution of microbial pan-genomes from whole-metagenome shotgun sequencing data
AID - 10.1101/173203
DP - 2017 Jan 01
TA - bioRxiv
PG - 173203
4099 - http://biorxiv.org/content/early/2017/10/04/173203.short
4100 - http://biorxiv.org/content/early/2017/10/04/173203.full
AB - Analysis toolkits for whole-metagenome shotgun sequencing data achieved strain-level characterization of complex microbial communities by capturing intra-species gene content variation. Yet, these tools are hampered by the extent of reference genomes that are far from covering all microbial variability, as many species are still not sequenced or have only few strains available. Binning co-abundant genes obtained from de novo assembly is a powerful reference-free technique for discovering and reconstituting gene repertoire of microbial species. While current methods accurately identify species core genes, they miss many accessory genes or split them in small separated clusters. We introduce MSPminer, a computationally efficient software tool that reconstitutes Metagenomic Species Pan-genomes (MSPs) by binning co-abundant genes across large-scale metagenomic datasets. MSPminer relies on a new robust measure for grouping not only species core genes but accessory genes also. In MSPs, an empirical classifier distinguishes core from accessory and shared genes. We applied MSPminer to the largest publicly available gene abundance table which is composed of 9.9M genes quantified in 1 267 stool samples. We show that MSPminer successfully reconstitutes in a matter of several hours gene repertoire of > 1600 microbial species (some hitherto unknown) and detects many more accessory genes than existing tools.
By compiling the information from thousands of samples, species gene content variability is better accounted for and their quantification is subsequently more precise.