ABSTRACT
Metacells are cell groupings derived from single-cell sequencing data that represent highly granular, distinct cell states. Here, we present single-cell aggregation of cell-states (SEACells), an algorithm for identifying metacells that overcomes the sparsity of single-cell data while retaining heterogeneity obscured by traditional cell clustering. SEACells outperforms existing algorithms in identifying accurate, compact, and well-separated metacells in both RNA and ATAC modalities across datasets with discrete cell types and continuous trajectories. We demonstrate the use of SEACells to improve gene-peak associations, compute ATAC gene scores, and measure gene accessibility in each metacell. Metacell-level analysis scales to large datasets and is particularly well suited to patient cohorts, where it facilitates data integration. We use our metacells to reveal expression dynamics and gradual reconfiguration of the chromatin landscape during hematopoietic differentiation, and to uniquely identify CD4 T cell differentiation and activation states associated with disease onset and severity in a COVID-19 patient cohort.
Competing Interest Statement
Dana Pe'er is on the scientific advisory board of Insitro.