Abstract
Integration of scRNA-seq datasets arises in several contexts, such as identifying cell type-specific differences in gene expression across conditions or species, or correcting batch effects. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping, or complete sets of cell labels, and that estimates per-cell differences in gene expression across datasets. scAlign achieves state-of-the-art performance and is robust to cross-dataset variation in cell type-specific expression and cell type composition. We demonstrate that scAlign identifies a rare cell population likely to drive malaria transmission. Our framework is widely applicable to integration challenges in other domains.
List of Abbreviations
- scRNA-seq: Single-cell RNA sequencing
- LT-HSC: Long-term hematopoietic stem cell
- ST-HSC: Short-term hematopoietic stem cell
- MPP: Multi-potent progenitor
- DEG: Differentially expressed gene
- LPS: Lipopolysaccharide
- PCA: Principal components analysis
- CCA: Canonical correlation analysis
- tSNE: t-distributed stochastic neighbor embedding
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.