Abstract
Machine learning algorithms identify patterns in high-dimensional molecular and clinical data that would otherwise be difficult to observe. For this reason, machine learning has the potential to profoundly influence clinical decision making and drug target discovery. Nevertheless, considerable technical challenges remain in adapting these tools for clinical use, including clinical feature engineering, model selection, and defining optimal strategies for model training. In cancer care, RNA sequencing of patient tumor biopsies has already proven to be a powerful molecular assay for characterizing tumor-intrinsic and -extrinsic phenotypes that influence therapeutic response. To improve the predictive performance of models trained on RNA-sequencing data, we developed the tauX machine learning framework, which refines gene expression features before they are passed to machine learning algorithms. The tauX framework uses aggregated ratios of positively and negatively associated predictive genes to simplify the prediction task. Using a large database of synthetic gene expression profiles, we show a significant improvement in predictive performance. We also show how the tauX framework can be used to elucidate the mechanisms of response and resistance to checkpoint blockade therapy using data from the Stand Up to Cancer (SU2C) Lung Response Cohort and The Cancer Genome Atlas (TCGA). The tauX strategy achieved superior predictive performance compared to models built upon established feature engineering strategies or widely used cancer gene expression signatures. The tauX framework is available as a freely deployable Docker container (https://hub.docker.com/r/pfeiljx/taux).
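To make the idea of an aggregated ratio of positively and negatively associated genes concrete, the following is a minimal illustrative sketch and not the tauX implementation. It assumes one plausible form of such a feature: the log2 ratio of the mean expression of the positive gene set to the mean expression of the negative gene set, computed per sample. The function name `ratio_feature`, the pseudocount, the choice of the mean as the aggregation, and the toy expression values are all hypothetical.

```python
import numpy as np

def ratio_feature(expr, pos_idx, neg_idx, pseudocount=1.0):
    """Hypothetical aggregated-ratio feature (an assumption, not the tauX method).

    expr    : samples x genes expression matrix
    pos_idx : column indices of genes positively associated with the outcome
    neg_idx : column indices of genes negatively associated with the outcome
    Returns one score per sample: log2(mean positive / mean negative).
    """
    expr = np.asarray(expr, dtype=float)
    # Aggregate each gene set per sample; the pseudocount avoids division by zero.
    pos = expr[:, pos_idx].mean(axis=1) + pseudocount
    neg = expr[:, neg_idx].mean(axis=1) + pseudocount
    return np.log2(pos / neg)

# Toy matrix: 3 samples x 4 genes (made-up values for illustration only).
expr = np.array([
    [7.0, 7.0, 1.0, 1.0],  # positive genes high, negative genes low
    [3.0, 3.0, 3.0, 3.0],  # balanced
    [0.0, 0.0, 3.0, 3.0],  # positive genes low, negative genes high
])
scores = ratio_feature(expr, pos_idx=[0, 1], neg_idx=[2, 3])
print(scores)  # one scalar feature per sample
```

Under these assumptions, many gene-level inputs collapse into a single per-sample score whose sign indicates which gene set dominates, which is one way a ratio-based feature could simplify the downstream prediction task.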
Competing Interest Statement
All authors are current employees of AbbVie. The study design and financial support for this research were provided by AbbVie. AbbVie participated in the interpretation of the data and the review and approval of the publication.