Abstract
Understanding the complex interactions between cells in their environment is a major challenge in genomics. Here we develop BayesPrism, a Bayesian method that uses scRNA-seq as prior information to jointly predict, from bulk RNA-seq, the cellular composition and the gene expression in each cell type, including heterogeneous malignant cells. We conducted an integrative analysis of 1,412 bulk RNA-seq samples from primary glioblastoma, head and neck squamous cell carcinoma, and melanoma, using single-cell datasets from 85 patients. We identified cell types correlated with clinical outcomes and explored spatial heterogeneity in tumor state and stromal composition. We refined tumor subtypes using gene expression in malignant cells alone, after removing the confounding contribution of non-malignant cell types. Finally, we identified genes whose expression in malignant cells correlated with infiltration of macrophages, T cells, fibroblasts, and endothelial cells across multiple tumor types. Our work introduces a new lens that uses scRNA-seq to accurately infer cellular composition and cell-type-specific expression in large cohorts of bulk data.
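The abstract states that BayesPrism jointly infers cellular composition and cell-type-specific expression from bulk RNA-seq using a single-cell reference. The intuition behind doing both at once can be illustrated with a minimal sketch, assuming a simple multinomial mixture model fitted by expectation-maximization (EM). This is a simplified maximum-likelihood stand-in, not BayesPrism's Bayesian inference, and the function `deconvolve_em` and its parameters are hypothetical. Each bulk read is treated as originating from one of K cell types, each with a reference expression profile (for example, averaged from scRNA-seq). The E-step splits each gene's reads across cell types, and the M-step re-estimates the fractions from those allocations; the same allocation gives a per-cell-type expression estimate.

```python
import numpy as np


def deconvolve_em(bulk, ref, n_iter=1000, tol=1e-10):
    """Hypothetical sketch: maximum-likelihood EM deconvolution of one bulk sample.

    NOT BayesPrism's algorithm; a simplified multinomial mixture for illustration.

    bulk: (G,) non-negative read counts for G genes.
    ref:  (K, G) reference profiles for K cell types; each row is normalized
          here to sum to 1. Assumes every gene has a positive reference value
          in at least one cell type.
    Returns (theta, Z):
      theta: (K,) fraction of reads attributed to each cell type (sums to 1).
      Z:     (G, K) expected reads of each gene allocated to each cell type;
             each row of Z sums to the corresponding bulk count.
    """
    bulk = np.asarray(bulk, dtype=float)
    phi = np.asarray(ref, dtype=float)
    phi = phi / phi.sum(axis=1, keepdims=True)  # per-cell-type gene distribution
    k = phi.shape[0]
    theta = np.full(k, 1.0 / k)  # start from uniform fractions

    for _ in range(n_iter):
        # E-step: probability that a read of gene g came from cell type k
        weighted = theta[:, None] * phi                          # (K, G)
        resp = weighted / weighted.sum(axis=0, keepdims=True)    # (K, G)
        # M-step: fractions proportional to the reads allocated to each type
        new_theta = (resp * bulk).sum(axis=1) / bulk.sum()
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break

    # Final allocation of each gene's reads across cell types
    weighted = theta[:, None] * phi
    resp = weighted / weighted.sum(axis=0, keepdims=True)
    z = (resp * bulk).T                                          # (G, K)
    return theta, z


if __name__ == "__main__":
    # Toy reference: 2 cell types x 3 genes (illustrative numbers only)
    ref = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]])
    # Noise-free bulk generated with read fractions (0.3, 0.7)
    bulk = 1000 * np.array([0.3, 0.7]) @ ref
    theta, z = deconvolve_em(bulk, ref)
    print("estimated fractions:", np.round(theta, 4))
    print("per-cell-type expression:\n", np.round(z, 1))
```

In this sketch, `theta` is the fraction of reads attributed to each cell type, which differs from the fraction of cells whenever cell types differ in total RNA content. A Bayesian treatment, as the abstract describes for BayesPrism, would instead place priors on these quantities and report posterior estimates rather than a single point estimate.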
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
We have made several changes following a rigorous peer review. To emphasize the general utility of our tool beyond cancer, we renamed it BayesPrism (it was called TED in our first submission). We support this general utility with substantial new benchmarks that go far beyond those included in our initial submission. We have also written a much more extensive description of how BayesPrism works, which we believe clarifies misconceptions raised by the reviewers. Finally, we have substantially revised our analysis of bulk cancer genomic RNA-seq data to emphasize the results that we believe will have the largest impact.