Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, integrative analysis of scRNA-seq data remains challenging, largely because of batch effects. We present the single-cell Embedded Topic Model (scETM), an unsupervised deep generative model that recapitulates known cell types by inferring latent cell topic mixtures with a variational autoencoder. scETM scales to more than 10^6 cells and enables effective knowledge transfer across datasets. scETM is also highly interpretable and allows prior pathway knowledge to be incorporated into the gene embeddings. The topics inferred by scETM are enriched in cell-type-specific and disease-related pathways.
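To make "latent cell topic mixtures" and "pathway knowledge in the gene embeddings" concrete, the following minimal NumPy sketch shows an embedded-topic-model style decoder. It assumes the formulation of the original embedded topic model (ETM), which the abstract does not spell out. Under that formulation, each cell's topic mixture θ is a softmax of encoder-sampled logits δ, and each topic's distribution over genes is β = softmax(αρᵀ) computed from topic embeddings α and gene embeddings ρ. Prior pathway knowledge is modeled here, as an assumption, by appending a fixed gene-by-pathway membership matrix to the learned gene embeddings. The encoder, batch-effect handling, and training loop are omitted. All function names and sizes are hypothetical and for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def etm_decode(delta, alpha, rho):
    """Hypothetical ETM-style decoder (assumed formulation, not scETM's code).

    delta: (n_cells, n_topics) topic logits, e.g. a sample from the
           encoder's Gaussian posterior in a VAE.
    alpha: (n_topics, emb_dim) topic embeddings.
    rho:   (n_genes, emb_dim) gene embeddings.
    """
    theta = softmax(delta, axis=1)          # cell-topic mixtures; rows sum to 1
    beta = softmax(alpha @ rho.T, axis=1)   # topic-gene distributions; rows sum to 1
    return theta, theta @ beta              # per-cell distribution over genes

def recon_nll(counts, probs, eps=1e-10):
    # Multinomial negative log-likelihood per cell (up to a constant).
    return -(counts * np.log(probs + eps)).sum(axis=1)

rng = np.random.default_rng(0)
n_cells, n_genes, n_topics, emb_dim, n_pathways = 5, 20, 3, 4, 2

# Learned gene embeddings plus an assumed fixed binary gene-by-pathway matrix.
rho_learned = rng.normal(size=(n_genes, emb_dim))
pathways = (rng.random((n_genes, n_pathways)) < 0.3).astype(float)
rho = np.concatenate([rho_learned, pathways], axis=1)

# Topic embeddings must share the (extended) embedding dimension.
alpha = rng.normal(size=(n_topics, emb_dim + n_pathways))

delta = rng.normal(size=(n_cells, n_topics))
theta, probs = etm_decode(delta, alpha, rho)
counts = rng.poisson(3.0, size=(n_cells, n_genes))
loss = recon_nll(counts, probs)
print(theta.shape, probs.shape, loss.shape)
```

Because θ and β are both row-stochastic, their product gives each cell a valid probability distribution over genes. Interpretability under this assumed design comes from reading the top-weighted genes in each row of β, and the columns of α attached to the fixed pathway dimensions indicate how strongly each topic aligns with each pathway.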
Competing Interest Statement
The authors have declared no competing interest.