Abstract
Single-cell sequencing is a crucial tool for dissecting the cellular intricacies of complex diseases. Its prohibitive cost, however, hampers its application in large-scale biomedical studies. Traditional cellular deconvolution approaches can infer cell type proportions from more affordable bulk sequencing data, yet they cannot provide the detailed resolution required for single-cell-level analyses. To overcome this challenge, we introduce “scSemiProfiler”, a computational framework that combines deep generative models with active learning strategies. The method infers single-cell profiles across large cohorts by fusing bulk sequencing data with targeted single-cell sequencing from a few carefully chosen representatives. Extensive validation across heterogeneous datasets shows that the semi-profiled data align closely with true single-cell profiling data and support refined cellular analyses. Originally developed for large disease cohorts, “scSemiProfiler” is adaptable to broad applications. It provides a scalable, cost-effective solution for single-cell profiling, facilitating in-depth cellular investigation in various biological domains.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
# Contributing authors: jingtao.wang@mail.mcgill.ca; gregory.fonseca@mcgill.ca
New experiments were carried out to better justify the effectiveness of the in silico inference performed by the deep generative learning model. The relevant results are presented in the Results section and the supplementary figures. More details and discussion of the methods have been added. New functionalities of the tool, such as the global mode and de novo cell type annotation of the generated data, are discussed in the new version. Additional validation experiments justifying each step of the tool have also been added.
https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-10026
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE226081
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE200596