Abstract
Immunotherapies have achieved phenomenal success in the treatment of cancer and promise even more breakthroughs in the near future. The need to understand the underlying mechanisms of immunotherapies and to develop precision immunotherapy regimens has spurred great interest in characterizing immune cell composition within the tumor microenvironment. Several methods have been developed to estimate immune cell composition using gene expression data from bulk tumor samples. However, these methods are not flexible enough to handle aberrant patterns of gene expression, e.g., when the expression of a cell type-specific gene differs between purified reference samples and the same cell type in tumor samples. In this paper, we present a novel statistical model for expression deconvolution called ICeD-T (Immune Cell Deconvolution in Tumor tissues), which models gene expression by a log-normal distribution that is appropriate for both microarray and RNA-seq data. ICeD-T automatically identifies aberrant genes whose expression is inconsistent with the deconvolution model and down-weights their contributions to cell type abundance estimates. We evaluated the performance of ICeD-T against existing methods in simulation studies and several real data analyses. ICeD-T displayed performance comparable or superior to that of these competing methods. When these methods were applied to assess the relationship between immunotherapy response and immune cell composition, ICeD-T identified significant associations that were missed by its competitors.