ABSTRACT
Background Cancer patients who die soon after starting chemotherapy incur the costs of treatment without its benefits. Accurately predicting mortality risk in patients starting chemotherapy is important, but few tools driven by patient data exist. We sought to create and validate a machine learning model predicting mortality for patients starting new chemotherapy.
Methods We obtained electronic health records (EHRs) for patients treated at a large cancer center between 2004 and 2014 (26,946 patients; 51,774 new chemotherapy regimen starts), linked to Social Security data with date of death. The predictive model was derived using 2004–11 data, and its performance was measured on non-overlapping 2012–14 data.
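The temporal derivation/validation design can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' pipeline: the record type `RegimenStart`, its fields, and the function `temporal_split` are hypothetical, and the split is assumed to be by calendar year of regimen start, matching the 2004–11 and 2012–14 periods described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegimenStart:
    # Hypothetical record: one new chemotherapy regimen start.
    patient_id: str
    start_date: date
    died_within_30d: bool


def temporal_split(starts, train_end_year=2011, test_start_year=2012):
    """Split regimen starts by calendar year of start.

    Starts up to and including train_end_year form the derivation set;
    starts from test_start_year onward form the validation set, so the
    two periods do not overlap in time.
    """
    train = [s for s in starts if s.start_date.year <= train_end_year]
    test = [s for s in starts if s.start_date.year >= test_start_year]
    return train, test
```

Splitting by time rather than at random evaluates the model only on later years it never saw, which more closely resembles prospective use than a random hold-out would.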
Findings Thirty-day mortality from chemotherapy start was 2·1%. Common cancers were breast (21·1%), colorectal (19·3%), and lung (18·0%). Model predictions were accurate for all patients (AUC 0·94). Predictions for patients starting palliative chemotherapy (46·6% of regimens), for whom prognosis is essential, remained highly accurate (AUC 0·92). To illustrate model discrimination, we ranked patients initiating palliative chemotherapy by model-predicted mortality risk and calculated observed mortality by risk decile. Thirty-day mortality in the highest-risk decile was 22·6%; in the lowest-risk decile, no patient died. Predictions remained accurate across a range of primary cancers, stages, and chemotherapies. Predictions were also accurate for clinical trial regimens that first appeared in the years after the model was trained (AUC 0·94). The model also performed well for predicting 180-day mortality (AUC 0·87; mortality 74·8% in the highest-risk decile vs. 0·2% in the lowest). Our predictions were more accurate than mortality estimates from randomized trials of individual chemotherapies or from SEER.
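The two discrimination measures reported above, AUC and observed mortality by decile of predicted risk, can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names `auc` and `mortality_by_decile` are hypothetical, and the AUC is computed with the Mann-Whitney pairwise formulation, which is quadratic in the number of patients. At the scale of this cohort, a rank-based method or a library routine such as scikit-learn's `roc_auc_score` would be more practical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen patient who died has a
    higher predicted risk than a randomly chosen survivor; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mortality_by_decile(scores, labels, n_groups=10):
    """Rank patients by predicted risk (highest first), cut the ranking into
    n_groups near-equal groups, and return observed mortality per group."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n = len(ranked)
    rates = []
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        # Empty groups (fewer patients than groups) are reported as NaN.
        rates.append(sum(y for _, y in group) / len(group) if group else float("nan"))
    return rates
```

The first element of `mortality_by_decile` corresponds to the highest-risk decile and the last to the lowest, mirroring the comparison of 22·6% versus 0% reported for palliative regimens.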
Interpretation A machine learning algorithm using EHR data accurately predicts short-term mortality in patients starting chemotherapy. Further work is needed to demonstrate the application of this algorithm in clinical workflows.