PT - JOURNAL ARTICLE
AU - Lei Xiong
AU - Kang Tian
AU - Yuzhe Li
AU - Qiangfeng Cliff Zhang
TI - Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space
AID - 10.1101/2021.04.06.438536
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.04.06.438536
4099 - http://biorxiv.org/content/early/2021/10/11/2021.04.06.438536.short
4100 - http://biorxiv.org/content/early/2021/10/11/2021.04.06.438536.full
AB - Computational tools for integrative analyses of diverse single-cell experiments are facing formidable new challenges, including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-learning method that integrates single-cell data by projecting cells into a batch-invariant, common cell-embedding space in a truly online manner (i.e., without retraining the model). SCALEX substantially outperforms online iNMF and other state-of-the-art non-online integration methods on benchmark single-cell datasets of diverse modalities (e.g., scRNA-seq, scATAC-seq), especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We showcase SCALEX’s advantages by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19 patients, each assembled from diverse data sources and growing with every new dataset. The online data integration capacity and superior performance make SCALEX particularly appropriate for large-scale single-cell applications that build upon previously hard-won scientific insights. Competing Interest Statement: The authors have declared no competing interest.