RT Journal Article
SR Electronic
T1 SIAMCAT: user-friendly and versatile machine learning workflows for statistically rigorous microbiome analyses
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.02.06.931808
DO 10.1101/2020.02.06.931808
A1 Jakob Wirbel
A1 Konrad Zych
A1 Morgan Essex
A1 Nicolai Karcher
A1 Ece Kartal
A1 Guillem Salazar
A1 Peer Bork
A1 Shinichi Sunagawa
A1 Georg Zeller
YR 2020
UL http://biorxiv.org/content/early/2020/02/06/2020.02.06.931808.abstract
AB The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers. However, computational tools tailored to such analyses are still scarce. Here, we present the SIAMCAT R package, a versatile and user-friendly toolbox for comparative metagenome analyses using machine learning (ML), statistical tests, and visualization. Based on a large meta-analysis of gut microbiome studies, we optimized the choice of ML algorithms and preprocessing routines for default workflow settings. Furthermore, we illustrate common pitfalls leading to overfitting and show how SIAMCAT safeguards against these to make statistically rigorous ML workflows broadly accessible. SIAMCAT is available from siamcat.embl.de and Bioconductor.