RT Journal Article
SR Electronic
T1 I Tried a Bunch of Things: The Dangers of Unexpected Overfitting in Classification
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 078816
DO 10.1101/078816
A1 Michael Skocik
A1 John Collins
A1 Chloe Callahan-Flintoft
A1 Howard Bowman
A1 Brad Wyble
YR 2016
UL http://biorxiv.org/content/early/2016/10/03/078816.abstract
AB Machine learning is a powerful set of techniques that has enhanced the ability of neuroscientists to interpret data collected through EEG, fMRI, MEG, and PET. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers and genetic algorithms to demonstrate the ease with which overfitting can occur, despite the use of cross-validation. We demonstrate that comparable, non-generalizable results can be obtained on informative and non-informative (i.e., random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registration. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
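The overfitting pattern the abstract describes, and the lock-box remedy it recommends, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it uses a plain k-nearest-neighbour classifier in place of the paper's SVM so it needs only NumPy, and the dataset sizes, fold count, and hyperparameter grid are assumptions. Repeatedly consulting cross-validation scores while tuning a hyperparameter on purely random data yields an optimistic accuracy, while a lock box that is never touched during tuning reveals near-chance performance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))       # noise features: no real signal
y = rng.integers(0, 2, size=120)     # random labels

# Lock box: a held-out slice that is never consulted while tuning.
X_tune, y_tune = X[:90], y[:90]
X_lock, y_lock = X[90:], y[90:]

def knn_accuracy(X_tr, y_tr, X_te, y_te, k):
    """Plain k-nearest-neighbour accuracy (Euclidean distance)."""
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]            # indices of k nearest
    pred = (y_tr[nn].mean(axis=1) > 0.5).astype(int)
    return float((pred == y_te).mean())

def cv_accuracy(Xd, yd, k, folds=5):
    """Mean accuracy over contiguous cross-validation folds."""
    n = len(yd) // folds
    accs = []
    for f in range(folds):
        te = slice(f * n, (f + 1) * n)
        mask = np.ones(len(yd), dtype=bool)
        mask[te] = False
        accs.append(knn_accuracy(Xd[mask], yd[mask], Xd[te], yd[te], k))
    return float(np.mean(accs))

# "Trying a bunch of things": keep whichever k looks best under CV.
best_cv, best_k = -1.0, None
for k in [1, 3, 5, 7, 9, 11, 15, 21]:
    acc = cv_accuracy(X_tune, y_tune, k)
    if acc > best_cv:
        best_cv, best_k = acc, k

lock_acc = knn_accuracy(X_tune, y_tune, X_lock, y_lock, best_k)
print(f"best CV accuracy (tuned k={best_k}): {best_cv:.2f}")  # optimistic
print(f"lock-box accuracy: {lock_acc:.2f}")                   # near chance
```

Because the hyperparameter is chosen to maximize the cross-validated score, that score is a biased estimate even though each individual fit never saw its test fold; only the untouched lock box gives an honest estimate, which is the safeguard the paper advocates.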