TY - JOUR
T1 - Gaussian Process Based Heteroscedastic Noise Modeling for Tumor Mutation Burden Prediction from Whole Slide Images
JF - bioRxiv
DO - 10.1101/554261
SP - 554261
AU - Sunho Park
AU - Hongming Xu
AU - Tae Hyun Hwang
Y1 - 2019/01/01
UR - http://biorxiv.org/content/early/2019/02/18/554261.abstract
N2 - Tumor mutation burden (TMB) is a quantitative measurement of the number of mutations present in tumor cells from a patient's tumor, as assessed by next-generation sequencing (NGS) technology. High TMB is used as a predictive biomarker to select patients who are likely to respond to immunotherapy in many cancer types; it is therefore critical to measure TMB accurately for cancer patients who need to receive immunotherapy. Recent studies have shown that image features from histopathology whole slide images can be used to predict genetic features (e.g., mutation status) or clinical outcomes of cancer patients. In this study, we develop a computational method to predict the TMB level from cancer patients’ histopathology whole slide images. The prediction problem is formulated as multiple instance learning (MIL) because a whole slide image (a bag) has to be divided into multiple image blocks (instances) for computational reasons, but a label is available only for the entire whole slide image, not for each image block. In particular, we propose a novel heteroscedastic noise model for MIL based on the Gaussian process (GP) framework, where the noise variance is assumed to be a latent function of image-level features. This noise variance can encode the confidence in predicting the TMB level from each training image and lets the method devote different levels of effort to classifying images according to how difficult each image is to classify correctly. The method tries to fit an easier image well, while it does not put much effort into classifying a harder (ambiguous) image correctly.
Expectation propagation (EP) is employed to perform inference in our model efficiently and to find the optimal hyper-parameters. On synthetic and real-world data sets, we demonstrate that our method outperforms baseline methods on TMB prediction from whole slide images, including a special case of our method that omits the heteroscedastic noise modeling, and multiple instance ordinal regression (MIOR), one of the few algorithms that solve ordinal regression in the MIL setting.
ER -