Abstract
The number of cells measured in single-cell RNA sequencing (scRNA-seq) datasets is increasing exponentially. Concurrently, scRNA-seq datasets are becoming increasingly sparse, as more zero counts are measured for many genes. We argue that with increasing sparsity, the binarized representation of gene expression becomes as informative as count-based expression. We show that downstream analyses based on binarized gene expression give results similar to those of analyses based on count-based expression. Moreover, a binarized representation scales to 17-fold more cells when using the same amount of computational resources. Based on these observations, we recommend the development of specialized, bit-aware tools for downstream analysis tasks, creating opportunities to obtain a more fine-grained resolution of biological heterogeneity.
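As a minimal sketch of what a binarized, bit-packed representation could look like, the following example assumes a dense cells-by-genes count matrix held in NumPy. The function names `binarize` and `pack_bits` and the simulated Poisson counts are hypothetical and chosen only for illustration. The memory ratio it reports follows from comparing 32-bit integers with single bits in this toy setup and is not the 17-fold figure stated above.

```python
import numpy as np


def binarize(counts):
    """Return a boolean matrix that is True where at least one count was observed."""
    return np.asarray(counts) > 0


def pack_bits(binary):
    """Pack a boolean cells-by-genes matrix into bytes, 8 genes per byte."""
    return np.packbits(binary, axis=1)


# Simulated sparse counts (an assumption for illustration, not real scRNA-seq data).
rng = np.random.default_rng(0)
counts = rng.poisson(0.1, size=(1000, 2000)).astype(np.int32)

binary = binarize(counts)
packed = pack_bits(binary)

# Fraction of zero entries in the simulated matrix.
sparsity = 1.0 - binary.mean()

# Unpacking recovers the binarized matrix exactly.
restored = np.unpackbits(packed, axis=1, count=binary.shape[1]).astype(bool)

print(f"sparsity: {sparsity:.2f}")
print(f"int32 counts: {counts.nbytes} bytes, packed bits: {packed.nbytes} bytes")
```

Packing along the gene axis keeps each cell in one contiguous row of bytes, so per-cell operations can act on whole bytes at a time. This layout is one possible design choice, not necessarily the one a bit-aware tool would adopt.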
Competing Interest Statement
The authors have declared no competing interest.