RT Journal Article
SR Electronic
T1 Prediction and inference diverge in biomedicine: Simulations and real-world data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 327437
DO 10.1101/327437
A1 Danilo Bzdok
A1 Denis Engemann
A1 Olivier Grisel
A1 Gaël Varoquaux
A1 Bertrand Thirion
YR 2018
UL http://biorxiv.org/content/early/2018/05/21/327437.abstract
AB In the 20th century, many advances in biological knowledge and evidence-based medicine were supported by p-values and accompanying methods. At the beginning of the 21st century, ambitions towards precision medicine put a premium on detailed predictions for single individuals. This shift creates tension between traditional methods used to infer statistically significant group differences and burgeoning machine-learning tools suited to forecasting an individual’s future. This comparison applies the linear model both to identify significantly contributing variables and to find the most predictive variable sets. In systematic data simulations and common medical datasets, we explored how statistical inference and pattern recognition can agree and diverge. Across analysis scenarios, even small predictive performance typically coincided with finding underlying statistically significant relationships. However, even statistically strong findings with very low p-values shed little light on their value for achieving accurate prediction in the same dataset. A more complete understanding of the different ways to define ‘important’ associations is a prerequisite for reproducible research findings that can serve to personalize clinical care.
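The contrast in the abstract, where a very low p-value need not imply accurate prediction, can be illustrated with a small simulation. This is a minimal sketch, not the authors' analysis pipeline. It assumes a single Gaussian predictor, unit-variance Gaussian noise, an ordinary least-squares fit, a normal approximation to the t-test for the slope, and 5-fold cross-validation. The function names (`simulate`, `fit_line`, `slope_p_value`, `cv_r2`) are hypothetical. With a weak effect (slope 0.1) and n = 20000, the slope's t-statistic is roughly 0.1·√20000 ≈ 14, so the p-value is vanishingly small. Yet the population R² is only 0.1²/(0.1² + 1) ≈ 0.0099, so the model explains about 1% of the variance in held-out data.

```python
import math
import random


def simulate(n, beta, seed=0):
    """Hypothetical data generator: y = beta * x + noise, x and noise ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def slope_p_value(xs, ys):
    """Two-sided p-value for H0: b = 0 (inference on the full sample).

    Assumption: the t distribution is approximated by a standard normal,
    which is only reasonable for large n.
    """
    n = len(xs)
    a, b = fit_line(xs, ys)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    t = b / se_b
    return math.erfc(abs(t) / math.sqrt(2.0))


def cv_r2(xs, ys, k=5):
    """Out-of-sample R^2 from k-fold cross-validation (prediction).

    Folds are contiguous blocks, which assumes the samples are in random order.
    Each test fold is scored against the training-fold mean as the baseline.
    """
    n = len(xs)
    fold = n // k
    ss_res = 0.0
    ss_tot = 0.0
    for i in range(k):
        lo = i * fold
        hi = (i + 1) * fold if i < k - 1 else n
        train_x = xs[:lo] + xs[hi:]
        train_y = ys[:lo] + ys[hi:]
        a, b = fit_line(train_x, train_y)
        baseline = sum(train_y) / len(train_y)
        for x, y in zip(xs[lo:hi], ys[lo:hi]):
            ss_res += (y - (a + b * x)) ** 2
            ss_tot += (y - baseline) ** 2
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    xs, ys = simulate(n=20000, beta=0.1)
    print(f"p-value for slope: {slope_p_value(xs, ys):.1e}")
    print(f"cross-validated R^2: {cv_r2(xs, ys):.3f}")
```

Increasing `n` drives the p-value further toward zero, while the cross-validated R² stays near 0.01. Significance grows with sample size, but predictive accuracy is bounded by the size of the effect relative to the noise.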