PT - JOURNAL ARTICLE
AU - Wenkai Han
AU - Yuqi Cheng
AU - Jiayang Chen
AU - Huawen Zhong
AU - Zhihang Hu
AU - Siyuan Chen
AU - Licheng Zong
AU - Irwin King
AU - Xin Gao
AU - Yu Li
TI - Self-supervised contrastive learning for integrative single cell RNA-seq data analysis
AID - 10.1101/2021.07.26.453730
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.07.26.453730
4099 - http://biorxiv.org/content/early/2021/07/27/2021.07.26.453730.short
4100 - http://biorxiv.org/content/early/2021/07/27/2021.07.26.453730.full
AB - Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool to reveal the complex biological diversity and heterogeneity among cell populations. However, the technical noise and bias of the technology still have negative impacts on the downstream analysis. Here, we present a self-supervised Contrastive LEArning framework for scRNA-seq (CLEAR) profile representation and the downstream analysis. CLEAR overcomes the heterogeneity of the experimental data with a specifically designed representation learning task and thus can handle batch effects and dropout events. In the task, the deep learning model learns to pull together the representations of similar cells while pushing apart distinct cells, without manual labeling. It achieves superior performance on a broad range of fundamental tasks, including clustering, visualization, dropout correction, batch effect removal, and pseudo-time inference. The proposed method successfully identifies and illustrates inflammatory-related mechanisms in a COVID-19 disease study with 43,695 single cells from peripheral blood mononuclear cells. Further experiments to process a million-scale single-cell dataset demonstrate the scalability of CLEAR. This scalable method generates effective scRNA-seq data representation while eliminating technical noise, and it will serve as a general computational framework for single-cell data analysis. Competing Interest Statement: The authors have declared no competing interest.
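
The abstract describes a task that pulls together representations of similar cells and pushes apart distinct ones without labels. The record does not state the loss, the augmentations, or the network used, so the following is only a minimal sketch of one common way to set up such an objective (an InfoNCE-style loss), not the CLEAR implementation. The function names `augment` and `info_nce`, the masking and noise augmentations, and all parameter values are assumptions made for illustration. The loss is applied directly to augmented toy profiles rather than to the output of a trained encoder.

```python
import numpy as np

def augment(x, rng, mask_prob=0.2, noise_std=0.1):
    """Hypothetical augmentation (an assumption, not taken from the source):
    zero out random genes to mimic dropout, then add Gaussian noise."""
    keep = rng.random(x.shape) >= mask_prob
    return x * keep + rng.normal(0.0, noise_std, size=x.shape)

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: row i of z_a is the positive for row i of z_b,
    and every other row of z_b acts as a negative."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(8, 200)).astype(float)  # toy cells x genes matrix
profiles = np.log1p(counts)

view_a = augment(profiles, rng)  # two independently augmented views
view_b = augment(profiles, rng)  # of the same eight cells

loss_matched = info_nce(view_a, view_b)         # positives are the same cell
loss_shuffled = info_nce(view_a, view_b[::-1])  # positives are different cells
print(loss_matched, loss_shuffled)
```

In a full pipeline the two views would first pass through a trainable encoder and the loss would be minimized by gradient descent. Here the comparison between matched and shuffled pairings only illustrates that the objective rewards agreement between views of the same cell.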