TY  - JOUR
T1  - PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells
JF  - bioRxiv
DO  - 10.1101/765628
SP  - 765628
AU  - Shobana V. Stassen
AU  - Dickson M. D. Siu
AU  - Kelvin C. M. Lee
AU  - Joshua W. K. Ho
AU  - Hayden K. H. So
AU  - Kevin K. Tsia
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/09/11/765628.abstract
N2  - Motivation: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity. Results: We introduce a highly scalable graph-based clustering algorithm PARC – phenotyping by accelerated refined community-partitioning – for ultralarge-scale, high-dimensional single-cell data (> 1 million cells). Using large single cell mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without sub-sampling of cells, including Phenograph, FlowSOM, and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single cell data set of 1.1M cells within 13 minutes, compared to >2 hours to the next fastest graph-clustering algorithm, Phenograph. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. Availability and Implementation: https://github.com/ShobiStassen/PARC
ER  - 