RT Journal Article
SR Electronic
T1 BayesDeBulk: A Flexible Bayesian Algorithm for the Deconvolution of Bulk Tumor Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.06.25.449763
DO 10.1101/2021.06.25.449763
A1 Francesca Petralia
A1 Anna P. Calinawan
A1 Song Feng
A1 Sara Gosline
A1 Pietro Pugliese
A1 Michele Ceccarelli
A1 Pei Wang
YR 2021
UL http://biorxiv.org/content/early/2021/06/25/2021.06.25.449763.abstract
AB Characterizing the tumor microenvironment is crucial for improving responsiveness to immunotherapy and developing new therapeutic strategies. The fraction of different cell-types in the tumor microenvironment can be estimated from transcriptomic profiling of bulk tumor data via deconvolution algorithms. One class of such algorithms, known as reference-based, relies on a reference signature containing gene expression data for various cell-types. The limitation of these methods is that such a signature is derived from the gene expression of pure cell-types, which might not be consistent with the transcriptomic profiling in solid tumors. On the other hand, reference-free methods usually require only a set of cell-specific markers to perform deconvolution; however, once the different components have been estimated from the data, their labeling can be problematic. To overcome these limitations, we propose BayesDeBulk, a new reference-free Bayesian method for bulk deconvolution based on gene expression data. Given a list of markers expressed in each cell-type (cell-specific markers), a repulsive prior is placed on the mean of gene expression in different cell-types to ensure that cell-specific markers are upregulated in a particular component. Contrary to existing reference-free methods, the labeling of different components is decided a priori through a repulsive prior. Furthermore, the advantage over reference-based algorithms is that the cell fractions and the gene expression of different cells are estimated simultaneously from the data.
Given its flexibility, BayesDeBulk can be utilized to perform bulk deconvolution beyond transcriptomic data, based on other data types such as proteomic profiles or the integration of both transcriptomic and proteomic profiles. Competing Interest Statement: The authors have declared no competing interest.