PT - JOURNAL ARTICLE
AU - Peng Qiu
TI - Embracing the dropouts in single-cell RNA-seq data
AID - 10.1101/468025
DP - 2018 Jan 01
TA - bioRxiv
PG - 468025
4099 - http://biorxiv.org/content/early/2018/11/17/468025.short
4100 - http://biorxiv.org/content/early/2018/11/17/468025.full
AB - A primary reason that the analysis of single-cell RNA-seq data is challenging is dropouts, where the data capture only a small fraction of the transcriptome of each cell. Many computational algorithms developed for single-cell RNA-seq adopt gene selection and dimension reduction strategies to address the dropouts. Here, an opposite view is explored. Instead of treating dropouts as a problem to be fixed, we embrace them as a useful signal for defining cell types. We present an iterative co-occurrence clustering algorithm that works with binarized single-cell RNA-seq count data. Surprisingly, although all the quantitative information is removed when the data are binarized, co-occurrence clustering of the binarized data is able to effectively identify cell populations, as well as cell-type-specific pathways. We demonstrate that the binary dropout patterns of the data provide not only overlapping but also complementary information compared to the quantitative gene expression counts in single-cell RNA-seq data.
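
The binarization step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the co-occurrence statistic or the clustering procedure, so the Jaccard similarity used here, the function names `binarize` and `gene_cooccurrence`, and the toy count matrix are all assumptions made for illustration. The iterative clustering itself is not reproduced.

```python
import numpy as np


def binarize(counts):
    """Return a 0/1 matrix: 1 where a gene was detected (count > 0), 0 for a dropout."""
    return (np.asarray(counts) > 0).astype(np.int64)


def gene_cooccurrence(binary):
    """Pairwise Jaccard similarity between gene detection patterns.

    `binary` has cells as rows and genes as columns. Jaccard is an
    illustrative choice (an assumption), not the statistic from the paper.
    """
    both = binary.T @ binary                    # cells in which both genes are detected
    detected = binary.sum(axis=0)               # cells in which each gene is detected
    either = detected[:, None] + detected[None, :] - both
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(either > 0, both / either, 0.0)


# Hypothetical toy data: 4 cells (rows) x 3 genes (columns) of raw counts.
counts = np.array([
    [5, 0, 2],
    [1, 0, 0],
    [0, 3, 0],
    [0, 7, 1],
])

binary = binarize(counts)
similarity = gene_cooccurrence(binary)
print(binary)
print(similarity)
```

In this toy matrix the first two genes are never detected in the same cell, so their similarity is zero even though the magnitudes of the counts have been discarded. Genes with similar detection patterns could then be grouped, and cells clustered on those groups, which is the kind of signal the abstract refers to.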