RT Journal Article
SR Electronic
T1 PhenoGMM: Gaussian mixture modelling of microbial cytometry data enables efficient predictions of biodiversity
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 641464
DO 10.1101/641464
A1 Peter Rubbens
A1 Ruben Props
A1 Frederiek-Maarten Kerckhof
A1 Nico Boon
A1 Willem Waegeman
YR 2019
UL http://biorxiv.org/content/early/2019/05/18/641464.abstract
AB Motivation: Microbial flow cytometry allows rapid characterization of microbial community diversity and dynamics. Recent research has demonstrated a strong connection between cytometric diversity and taxonomic diversity based on 16S rRNA gene amplicon sequencing data. This creates the opportunity to integrate both types of data to study and predict microbial community diversity in an automated and efficient way. However, microbial flow cytometry data pose a number of unique challenges that need to be addressed. Results: The results of our work are threefold: i) We expand current microbial cytometry fingerprinting approaches with a model-based fingerprinting approach based on Gaussian Mixture Models, which we call PhenoGMM. ii) We show that microbial diversity can be rapidly estimated by PhenoGMM. In combination with a supervised machine learning model, diversity estimations based on 16S rRNA gene amplicon sequencing data can be predicted. iii) We evaluate our method extensively using multiple datasets from different ecosystems and compare its predictive power with a generic binning fingerprinting approach that is commonly used in microbial flow cytometry. These results confirm the strong connection between the genetic make-up of a microbial community and its phenotypic properties as measured by flow cytometry. Availability: All code and data supporting this manuscript are freely available on GitHub at: https://github.com/prubbens/PhenoGMM. Raw flow cytometry data are freely available on FlowRepository and raw sequences via the NCBI Sequence Read Archive.
The functionality of PhenoGMM has been incorporated into the R package PhenoFlow: https://github.com/CMET-UGent/Phenoflow_package. Supplementary information: Supplementary data are available as an attachment to this submission.