RT Journal Article
SR Electronic
T1 Deep learning detects virus presence in cancer histology
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 690206
DO 10.1101/690206
A1 Jakob Nikolas Kather
A1 Jefree Schulte
A1 Heike I. Grabsch
A1 Chiara Loeffler
A1 Hannah Muti
A1 James Dolezal
A1 Andrew Srisuwananukorn
A1 Nishant Agrawal
A1 Sara Kochanny
A1 Saskia von Stillfried
A1 Peter Boor
A1 Takaki Yoshikawa
A1 Dirk Jaeger
A1 Christian Trautwein
A1 Peter Bankhead
A1 Nicole A. Cipriani
A1 Tom Luedde
A1 Alexander T. Pearson
YR 2019
UL http://biorxiv.org/content/early/2019/07/05/690206.abstract
AB Oncogenic viruses like human papillomavirus (HPV) or Epstein-Barr virus (EBV) are a major cause of human cancer. Viral oncogenesis has a direct impact on treatment decisions because virus-associated tumors can demand a lower intensity of chemotherapy and radiation or can be more susceptible to immune checkpoint inhibition. However, molecular tests for HPV and EBV are not ubiquitously available. We hypothesized that the histopathological features of virus-driven and non-virus-driven cancers are sufficiently different to be detectable by artificial intelligence (AI) through deep learning-based analysis of images from routine hematoxylin and eosin (HE) stained slides. We show that deep transfer learning can predict the presence of HPV in head and neck cancer with a patient-level 3-fold cross-validated area under the curve (AUC) of 0.89 [0.82; 0.94]. The same workflow was used for EBV-driven gastric cancer, achieving a cross-validated AUC of 0.80 [0.70; 0.92] and similar performance in external validation sets. Reverse-engineering our deep neural networks, we show that the key morphological features can be made understandable to humans. This workflow could enable a fast and low-cost method to identify virus-induced cancer in clinical trials or clinical routine.
At the same time, our approach for feature visualization allows pathologists to look into the black box of deep learning, enabling them to check the plausibility of computer-based image classification.