PT - JOURNAL ARTICLE
AU - Boris Guennewig
AU - Zachary Davies
AU - Mark Pinese
AU - Antony A Cooper
TI - blkbox: Integration of multiple machine learning approaches to identify disease biomarkers
AID - 10.1101/123430
DP - 2017 Jan 01
TA - bioRxiv
PG - 123430
4099 - http://biorxiv.org/content/early/2017/04/03/123430.short
4100 - http://biorxiv.org/content/early/2017/04/03/123430.full
AB - Motivation: Machine learning (ML) is a powerful tool for creating supervised models that distinguish between classes and facilitate biomarker selection in high-dimensional datasets, including RNA Sequencing (RNA-Seq). However, the best-performing ML algorithm varies from dataset to dataset, and identifying the optimal match is time-consuming. blkbox is a software package, including a Shiny frontend, that integrates nine ML algorithms to select the best-performing classifier for a specific dataset. blkbox accepts a simple abundance matrix as input, includes extensive visualization, and provides an easy-to-use feature selection step that enables convenient and rapid selection of potential biomarkers, all without requiring parameter optimization. Results: Feature selection makes blkbox computationally inexpensive, while multi-functionality, including nested cross-fold validation (NCV), ensures robust results. blkbox identified algorithms that outperformed previously published ML results. Applying NCV identifies features that are then used to achieve high accuracy. Availability: The software is available as a CRAN R package and as a developer version with extended functionality on GitHub (https://github.com/gboris/blkbox). Contact: b.guennewig{at}garvan.org.au