RT Journal Article
SR Electronic
T1 CosTaL: An Accurate and Scalable Graph-Based Clustering Algorithm for High-Dimensional Single-Cell Data Analysis
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.11.10.516044
DO 10.1101/2022.11.10.516044
A1 Li, Yijia
A1 Nguyen, Jonathan
A1 Anastasiu, David
A1 Arriaga, Edgar A.
YR 2022
UL http://biorxiv.org/content/early/2022/11/10/2022.11.10.516044.abstract
AB With the aim of analyzing large multidimensional single-cell datasets, we describe CosTaL (Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm). As a graph-based clustering method, CosTaL transforms cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph in which each vertex represents a cell and an edge between two vertices indicates that the two cells are closely related. Specifically, CosTaL builds an exact kNN graph using cosine similarity and uses the Tanimoto coefficient as a refining strategy to re-weight the edges, which improves the effectiveness of clustering. We demonstrate that, compared with other state-of-the-art graph-based clustering methods including PhenoGraph, Scanpy, and PARC, CosTaL generally achieves equivalent or higher effectiveness scores on seven benchmark cytometry datasets and six single-cell RNA-sequencing datasets across six evaluation metrics. CosTaL is also the most efficient of the compared algorithms on large datasets, suggesting that it generally scales better than the other methods, which is beneficial for large-scale analysis. Competing Interest Statement: The authors have declared no competing interest.
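The graph-construction steps summarized in the abstract (an exact cosine-similarity kNN graph whose edges are re-weighted by a Tanimoto coefficient) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' CosTaL implementation: it assumes the Tanimoto coefficient is computed over each cell's kNN neighbor set (including the cell itself), which for binary set membership equals the Jaccard index; it finds exact neighbors by brute force; and all function names (`cosine_similarity`, `exact_knn`, `tanimoto`, `refined_knn_graph`) are hypothetical. The Leiden community-detection step is omitted.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_knn(cells: List[Sequence[float]], k: int) -> List[List[int]]:
    """Brute-force (exact) kNN: for each cell, the k most cosine-similar other cells."""
    neighbors = []
    for i, ci in enumerate(cells):
        sims = [(cosine_similarity(ci, cj), j) for j, cj in enumerate(cells) if j != i]
        # Sort by descending similarity; ties are broken by the smaller index.
        sims.sort(key=lambda t: (-t[0], t[1]))
        neighbors.append([j for _, j in sims[:k]])
    return neighbors


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two sets (identical to Jaccard for binary membership)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def refined_knn_graph(cells: List[Sequence[float]], k: int) -> Dict[Tuple[int, int], float]:
    """Undirected kNN graph as an edge dict {(i, j): weight} with i < j.

    Assumption: each edge weight is the Tanimoto coefficient of the two
    endpoints' neighbor sets, where each set also contains the cell itself.
    """
    knn = exact_knn(cells, k)
    neigh_sets = [set(nbrs) | {i} for i, nbrs in enumerate(knn)]
    edges: Dict[Tuple[int, int], float] = {}
    for i, nbrs in enumerate(knn):
        for j in nbrs:
            key = (min(i, j), max(i, j))  # store each undirected edge once
            edges[key] = tanimoto(neigh_sets[i], neigh_sets[j])
    return edges
```

For example, with two well-separated groups of three 2-D cells, `[(1, 0), (0.9, 0.1), (0.95, 0.05)]` and `[(0, 1), (0.1, 0.9), (0.05, 0.95)]`, and k = 2, every edge connects cells within the same group and every edge gets weight 1.0, because the members of each group share identical neighbor sets. A weighted edge list like this could then be passed to a Leiden implementation (for instance, the `leidenalg` package built on python-igraph) to detect communities.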