RT Journal Article
SR Electronic
T1 inClust: a general framework for clustering that integrates data from multiple sources
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.05.27.493706
DO 10.1101/2022.05.27.493706
A1 Lifei Wang
A1 Rui Nie
A1 Zhang Zhang
A1 Weiwei Gu
A1 Shuo Wang
A1 Anqi Wang
A1 Jiang Zhang
A1 Jun Cai
YR 2022
UL http://biorxiv.org/content/early/2022/05/29/2022.05.27.493706.abstract
AB Clustering is one of the most commonly used methods in single-cell RNA sequencing (scRNA-seq) data analysis and in other fields of biology. Traditional clustering methods usually take data from a single source as input (e.g., scRNA-seq data). However, as data become increasingly complex and contain information from multiple sources, a clustering method that can integrate multiple data sources is required. Here, we present inClust (integrated clustering), a clustering method that integrates information from multiple sources based on a variational autoencoder and vector arithmetic in the latent space. inClust performs information integration and clustering jointly, and it can use label information from the data as regulation information. It is a flexible framework that can accomplish different tasks under different modes, ranging from supervised to unsupervised. We demonstrate the capability of inClust on three kinds of tasks: conditional out-of-distribution generation in supervised mode; label transfer in semi-supervised mode and guided clustering mode; and spatial domain identification in unsupervised mode. inClust performs well in all tasks, indicating that it is an excellent general framework for clustering and task-related clustering in the era of multi-omics. Competing Interest Statement: The authors have declared no competing interest.