RT Journal Article SR Electronic T1 Machine learning versus logistic regression methods for 2-year mortality prognostication in a small, heterogeneous glioma database JF bioRxiv FD Cold Spring Harbor Laboratory SP 472555 DO 10.1101/472555 A1 Panesar, Sandip S A1 D’Souza, Rhett N A1 Yeh, Fang-Cheng A1 Fernandez-Miranda, Juan C YR 2018 UL http://biorxiv.org/content/early/2018/11/17/472555.abstract AB Background Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization or prediction. ML techniques have been traditionally applied to large, highly-dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathological features. Recently the World Health Organization proposed a novel grading system for gliomas incorporating molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, highly-dimensional database of glioma patients.Methods We applied three machine learning techniques: artificial neural networks (ANN), decision trees (DT), support vector machine (SVM), and classical logistic regression (LR) to a dataset consisting of 76 glioma patients of all grades. We compared the effect of applying the algorithms to the raw database, versus a database where only statistically significant features were included into the algorithmic inputs (feature selection).Results Raw input consisted of 21 variables, and achieved performance of (accuracy/AUC): 70.7%/0.70 for ANN, 68%/0.72 for SVM, 66.7%/0.64 for LR and 65%/0.70 for DT. Feature selected input consisted of 14 variables and achieved performance of 73.4%/0.75 for ANN, 73.3%/0.74 for SVM, 69.3%/0.73 for LR and 65.2%/0.63 for DT.Conclusions We demonstrate that these techniques can also be applied to small, yet highly-dimensional datasets. Our ML techniques achieved reasonable performance compared to similar studies in the literature. Though local databases may be small versus larger cancer repositories, we demonstrate that ML techniques can still be applied to their analysis, though traditional statistical methods are of similar benefit.