Abstract
Tumor mutation burden (TMB) is a quantitative measure of the number of mutations present in the tumor cells of a patient, as assessed by next-generation sequencing (NGS). High TMB is used as a predictive biomarker to select patients who are likely to respond to immunotherapy in many cancer types; accurate measurement of TMB is therefore critical for guiding patients to immunotherapy. Recent studies have shown that image features from histopathology whole slide images can be used to predict genetic features (e.g., mutation status) or the clinical outcome of cancer patients. In this study, we develop a computational algorithm to predict the TMB level from cancer patients’ histopathology whole slide images. We formulate TMB prediction from whole slide images as a multiple instance learning (MIL) problem: a whole slide image (a bag) is divided into multiple small image blocks or patches (instances), but a single label (e.g., TMB level) is available only for the entire whole slide image, not for each image block. In particular, we propose a novel heteroscedastic noise model for MIL based on the Gaussian process (GP) framework, in which the noise variance is assumed to be a latent function of image-level features. This noise variance encodes the confidence in predicting the TMB level from each training image, and lets the method devote different levels of effort to each image according to how difficult it is to classify correctly. The proposed method tries to fit an easier image well, while it does not put much effort into correctly classifying a harder (ambiguous) image for TMB prediction. Expectation propagation (EP) is employed to perform inference in our model efficiently and to find the optimal hyper-parameters.
In experiments using whole slide images from synthetic and real-world data sets from The Cancer Genome Atlas (TCGA), we demonstrate that our method outperforms baseline methods for TMB prediction, including a special case of our method that omits the heteroscedastic noise model and multiple instance ordinal regression (MIOR), a method for ordinal regression in the MIL setting.
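The two ideas above, the bag-of-patches view of a slide and a per-image noise variance that down-weights hard images, can be illustrated with a minimal sketch. It is not the proposed model: it assumes a plain Gaussian likelihood as a stand-in, it takes the noise variance as a fixed number rather than a latent GP function of image-level features, and it performs no EP inference. The function names (`split_into_patches`, `gaussian_nll`, `nll_gradient_wrt_f`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def split_into_patches(image, patch):
    """Split a 2-D image (list of rows) into non-overlapping patch x patch blocks.

    The returned list plays the role of a MIL bag; each block is one instance.
    Rows/columns that do not fill a whole patch are dropped.
    """
    height, width = len(image), len(image[0])
    bag = []
    for r in range(0, height - patch + 1, patch):
        for c in range(0, width - patch + 1, patch):
            bag.append([row[c:c + patch] for row in image[r:r + patch]])
    return bag


def gaussian_nll(y, f, noise_var):
    """Negative log-likelihood of label y under N(f, noise_var).

    Assumption: a Gaussian likelihood is used here purely for illustration.
    """
    return 0.5 * math.log(2.0 * math.pi * noise_var) + 0.5 * (y - f) ** 2 / noise_var


def nll_gradient_wrt_f(y, f, noise_var):
    """Gradient of gaussian_nll with respect to the prediction f."""
    return (f - y) / noise_var


# A toy 4x4 "slide" whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
bag = split_into_patches(image, patch=2)  # one bag of four 2x2 instances

# Same prediction error on two slides, but different (hypothetical) noise variances.
easy_grad = nll_gradient_wrt_f(y=1.0, f=0.0, noise_var=0.1)   # confident, "easy" slide
hard_grad = nll_gradient_wrt_f(y=1.0, f=0.0, noise_var=10.0)  # ambiguous, "hard" slide
```

With an identical residual, the gradient for the low-variance slide is 100 times larger in magnitude than for the high-variance slide (-10 versus -0.1), which is the sense in which a heteroscedastic noise term lets a model fit easy images closely while spending little effort on ambiguous ones.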