RT Journal Article
SR Electronic
T1 I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 078816
DO 10.1101/078816
A1 Michael Skocik
A1 John Collins
A1 Chloe Callahan-Flintoft
A1 Howard Bowman
A1 Brad Wyble
YR 2016
UL http://biorxiv.org/content/early/2016/10/03/078816.abstract
AB Machine learning is a powerful set of techniques that has enhanced the ability of neuroscientists to interpret data collected through EEG, fMRI, MEG, and PET. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers and genetic algorithms to demonstrate the ease with which overfitting can occur, despite the use of cross-validation. We demonstrate that comparable, non-generalizable results can be obtained on informative and non-informative (i.e., random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registration. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
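The overfitting pattern the abstract describes, and the lock-box remedy it recommends, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it uses a plain k-nearest-neighbour classifier in place of the paper's SVM so it needs only NumPy, and the dataset sizes, fold count, and hyperparameter grid are assumptions. Repeatedly consulting cross-validation scores while tuning a hyperparameter on purely random data yields an optimistic accuracy, while a lock box that is never touched during tuning reveals near-chance performance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))       # noise features: no real signal
y = rng.integers(0, 2, size=120)     # random labels

# Lock box: a held-out slice that is never consulted while tuning.
X_tune, y_tune = X[:90], y[:90]
X_lock, y_lock = X[90:], y[90:]

def knn_accuracy(X_tr, y_tr, X_te, y_te, k):
    """Plain k-nearest-neighbour accuracy (Euclidean distance)."""
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]            # indices of k nearest
    pred = (y_tr[nn].mean(axis=1) > 0.5).astype(int)
    return float((pred == y_te).mean())

def cv_accuracy(Xd, yd, k, folds=5):
    """Mean accuracy over contiguous cross-validation folds."""
    n = len(yd) // folds
    accs = []
    for f in range(folds):
        te = slice(f * n, (f + 1) * n)
        mask = np.ones(len(yd), dtype=bool)
        mask[te] = False
        accs.append(knn_accuracy(Xd[mask], yd[mask], Xd[te], yd[te], k))
    return float(np.mean(accs))

# "Trying a bunch of things": keep whichever k looks best under CV.
best_cv, best_k = -1.0, None
for k in [1, 3, 5, 7, 9, 11, 15, 21]:
    acc = cv_accuracy(X_tune, y_tune, k)
    if acc > best_cv:
        best_cv, best_k = acc, k

lock_acc = knn_accuracy(X_tune, y_tune, X_lock, y_lock, best_k)
print(f"best CV accuracy (tuned k={best_k}): {best_cv:.2f}")  # optimistic
print(f"lock-box accuracy: {lock_acc:.2f}")                   # near chance
```

Because the hyperparameter is chosen to maximize the cross-validated score, that score is a biased estimate even though each individual fit never saw its test fold; only the untouched lock box gives an honest estimate, which is the safeguard the paper advocates.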