Abstract
One primary reason that makes the analysis of single-cell RNA-seq data challenging is dropouts, where the data only captures a small fraction of the transcriptome of each cell. Many computational algorithms developed for single-cell RNA-seq adopted gene selection and dimension reduction strategies to address the dropouts. Here, an opposite view is explored. Instead of treating dropouts as a problem to be fixed, we embrace it as a useful signal for defining cell types. We present an iterative co-occurrence clustering algorithm that works with binarized single-cell RNA-seq count data. Surprisingly, although all the quantitative information is removed after the data is binarized, co-occurrence clustering of the binarized data is able to effectively identify cell populations, as well as cell-type specific pathways. We demonstrate that the binary dropout patterns of the data provides not only overlapping but also complementary information compared to the quantitative gene expression counts in single-cell RNA-seq data.