PT - JOURNAL ARTICLE
AU - Yiqing Zhang
AU - William Nock
AU - Meghan Wyse
AU - Zachary Weber
AU - Elizabeth Adams
AU - Sarah Asad
AU - Sinclair Stockard
AU - David Tallman
AU - Eric P. Winer
AU - Nancy U. Lin
AU - Mathew Cherian
AU - Maryam B. Lustberg
AU - Bhuvaneswari Ramaswamy
AU - Sagar Sardesai
AU - Jeffrey VanDeusen
AU - Nicole Williams
AU - Robert Wesolowski
AU - Daniel G. Stover
TI - Machine learning predicts rapid relapse of triple negative breast cancer
AID - 10.1101/613604
DP - 2019 Jan 01
TA - bioRxiv
PG - 613604
4099 - http://biorxiv.org/content/early/2019/04/21/613604.short
4100 - http://biorxiv.org/content/early/2019/04/21/613604.full
AB - Purpose: Metastatic relapse of triple-negative breast cancer (TNBC) within 2 years of diagnosis is associated with particularly aggressive disease and a distinct clinical course relative to TNBCs that relapse beyond 2 years. We hypothesized that rapid relapse TNBCs (rrTNBC; metastatic relapse or death <2 years) have genomic features distinct from those of late relapse TNBCs (lrTNBC; >2 years). Patients and Methods: We identified 453 primary TNBCs from three publicly available datasets and characterized each as rrTNBC, lrTNBC, or ‘no relapse’ (nrTNBC: no relapse/death with at least 5 years follow-up). We compiled primary tumor clinical and multi-omic data, including transcriptome (n=453), copy number alterations (CNAs; n=317), and mutations in 171 cancer-related genes (n=317), then calculated published gene expression and immune signatures. Results: Patients with rrTNBC had higher-stage disease at diagnosis (Chi-square p<0.0001), while lrTNBC were more likely to be of non-basal PAM50 subtype (Chi-square p=0.03). Among 125 expression signatures, five immune signatures were significantly higher in nrTNBC, while lrTNBC were enriched for eight estrogen/luminal signatures (all FDR p<0.05). There was no significant difference in tumor mutation burden or percent genome altered across the groups.
Among mutations, only TP53 mutations were significantly more frequent in rrTNBC than in lrTNBC (Fisher exact FDR p=0.009). To develop an optimal classifier, we used 77 significant clinical and ‘omic features to evaluate six modeling approaches encompassing simple, machine learning, and artificial neural network (ANN) models. A support vector machine outperformed the other models, with an average area under the receiver operating characteristic curve >0.75. Conclusions: We provide a new approach to define TNBCs based on timing of relapse. We identify distinct clinical and genomic features that can be incorporated into machine learning models to predict rapid relapse of TNBC.
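
The three relapse categories defined in the abstract can be written as a simple labeling rule. The following is a minimal sketch, not the authors' code: the function name `classify_relapse`, its parameters, and the constant names are hypothetical. The abstract does not say how an event at exactly 2 years, or an event-free case with under 5 years of follow-up, is handled, so both choices below are assumptions.

```python
from typing import Optional

# Thresholds stated in the abstract.
RAPID_CUTOFF_YEARS = 2.0   # rrTNBC: metastatic relapse or death <2 years
MIN_FOLLOWUP_YEARS = 5.0   # nrTNBC: no relapse/death with at least 5 years follow-up


def classify_relapse(event_years: Optional[float],
                     followup_years: float) -> Optional[str]:
    """Label one primary TNBC case by timing of relapse.

    event_years: years from diagnosis to metastatic relapse or death,
                 or None if neither occurred during follow-up.
    followup_years: total years of follow-up available for the case.
    """
    if event_years is not None:
        if event_years < RAPID_CUTOFF_YEARS:
            return "rrTNBC"
        # Assumption: an event at exactly 2 years counts as late relapse.
        return "lrTNBC"
    if followup_years >= MIN_FOLLOWUP_YEARS:
        return "nrTNBC"
    # Assumption: event-free cases with short follow-up stay unlabeled.
    return None


if __name__ == "__main__":
    cases = [(1.2, 1.2), (3.5, 3.5), (None, 6.0), (None, 3.0)]
    for event, followup in cases:
        print(event, followup, classify_relapse(event, followup))
```

Returning `None` for event-free cases with short follow-up keeps them out of all three groups, so they are not miscounted as ‘no relapse’.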