Abstract
Cell-type deconvolution methods aim to infer cell-type composition and cell abundances from bulk transcriptomic data. The proliferation of available methods (>50), coupled with the inconsistent results obtained in many cases, highlights the pressing need for guidance in selecting appropriate methods. Previously proposed tests have focused primarily on simulated data and have seen limited application to real datasets. The growing accessibility of systematic single-cell RNA sequencing datasets, often accompanied by bulk RNA sequencing from related or matched samples, makes it possible to benchmark existing deconvolution methods more objectively. Here, we present a comprehensive assessment of 29 available deconvolution methods, leveraging single-cell RNA-sequencing (scRNA-seq) data from different organs and tissues. We offer a new comprehensive framework to evaluate deconvolution across a wide range of scenarios, and we provide guidelines on the preprocessing of input matrices. We then validate deconvolution results on a gold-standard bulk polymorphonuclear/peripheral blood mononuclear cell (PMN/PBMC) dataset with well-known cell-type proportions. We show that single-cell regression-based deconvolution methods perform well, but that their performance is highly dependent on the reference selection and the tissue type. Our study also explores the significant impact on deconvolution of various batch effects, including those associated with sample, study, and technology, which have previously been overlooked. Importantly, we suggest a novel methodology for consensus prediction of cell-type proportions for cases in which ground truth is not available. The large-scale assessment of cell-type prediction methods is provided in a modularised pipeline for reproducibility (https://github.com/Functional-Genomics/CATD_snakemake).
Lastly, we recommend that the Critical Assessment of Transcriptomic Deconvolution (CATD) pipeline be employed for the efficient, simultaneous deconvolution of hundreds of real bulk samples, utilising various references. We envision it being used to speed up the evaluation of newly published methods in the future.
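To make the idea of regression-based deconvolution concrete, the following is a minimal sketch, not the implementation of any of the 29 assessed methods or of the CATD pipeline. It assumes a bulk expression vector can be modelled as a cell-type signature matrix multiplied by non-negative proportions, and solves the resulting non-negative least-squares problem by projected gradient descent. The function name `deconvolve`, the toy `signature` matrix, and the `true_props` values are hypothetical and chosen only for illustration.

```python
def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions from one bulk sample.

    signature: list of rows, one per gene, each with one value per cell type
               (hypothetical reference profile, e.g. averaged from scRNA-seq).
    bulk:      list with one expression value per gene.
    Solves min ||signature @ p - bulk||^2 subject to p >= 0 by projected
    gradient descent, then rescales p to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The sum of squared entries equals trace(S^T S), an upper bound on the
    # largest eigenvalue of S^T S, so its inverse is a safe step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    coef = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit, one value per gene.
        residual = [
            sum(signature[g][k] * coef[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||S p - b||^2 with respect to p.
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        coef = [max(0.0, coef[k] - step * grad[k]) for k in range(n_types)]
    total = sum(coef)
    return [c / total for c in coef] if total > 0 else coef


# Toy example (assumed values): 4 genes, 3 cell types.
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 6.0],
    [2.0, 2.0, 2.0],
]
true_props = [0.5, 0.3, 0.2]
# Noise-free bulk profile built from the known proportions.
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]

estimated = deconvolve(signature, bulk)
print([round(p, 3) for p in estimated])
```

In this noise-free toy case the estimate can be compared directly with `true_props`. With real data, the choice of reference and the batch effects discussed above would be expected to dominate the error.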
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- scRNA-seq: single-cell RNA sequencing
- PMN/PBMC: polymorphonuclear/peripheral blood mononuclear cell