RT Journal Article
SR Electronic
T1 Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.01.13.426593
DO 10.1101/2021.01.13.426593
A1 Yifan Zhao
A1 Huiyu Cai
A1 Zuobai Zhang
A1 Jian Tang
A1 Yue Li
YR 2021
UL http://biorxiv.org/content/early/2021/06/10/2021.01.13.426593.abstract
AB The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge, largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of existing computational methods. We present the single-cell Embedded Topic Model (scETM). Our key contribution is combining a transferable neural-network-based encoder with an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell-type mixtures and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM scales to over 10^6 cells and achieves remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings. Competing Interest Statement: The authors have declared no competing interest.
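The abstract names the decoder's ingredients (cell-type mixtures, topic embeddings, gene embeddings, batch-effect intercepts) and says they combine through a linear matrix tri-factorization, but it does not give the formula. The NumPy sketch below shows one plausible way to wire such a decoder, under assumptions that are ours and not stated in this record: the cell-by-topic mixture theta, topic embeddings alpha, and gene embeddings rho combine as theta @ alpha @ rho.T; a per-batch gene intercept lam is added; and a softmax over genes yields per-cell expression proportions. The function and variable names are hypothetical, and the encoder is replaced by random topic mixtures.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def linear_decoder(theta, alpha, rho, lam, batch):
    """Hypothetical scETM-style linear decoder (illustrative sketch only).

    theta: (N, K) cell-by-topic mixtures, rows sum to 1 (encoder output)
    alpha: (K, L) topic embeddings
    rho:   (G, L) gene embeddings
    lam:   (S, G) batch-specific gene intercepts (assumed form)
    batch: (N,) integer batch index for each cell
    Returns an (N, G) matrix of per-cell gene expression proportions.
    """
    # Tri-factorization: (cells x topics)(topics x dims)(dims x genes),
    # plus the intercept row of each cell's batch (integer-array indexing).
    logits = theta @ alpha @ rho.T + lam[batch]
    return softmax(logits, axis=1)


rng = np.random.default_rng(0)
N, K, L, G, S = 5, 3, 4, 10, 2  # cells, topics, embedding dims, genes, batches
theta = softmax(rng.normal(size=(N, K)))  # stand-in for the encoder's output
alpha = rng.normal(size=(K, L))
rho = rng.normal(size=(G, L))
lam = rng.normal(size=(S, G))
batch = np.array([0, 0, 1, 1, 1])

recon = linear_decoder(theta, alpha, rho, lam, batch)
# Topic-by-gene scores: because the decoder is linear in these factors,
# each topic's top-scoring genes can be read off one row.
topic_gene = alpha @ rho.T
print(recon.shape, topic_gene.shape)  # (5, 10) (3, 10)
```

Keeping the decoder linear is what makes the learned factors inspectable: ranking genes within a row of alpha @ rho.T gives a gene signature per topic, which is the kind of output the abstract feeds into gene set enrichment analysis.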