TY  - JOUR
T1  - Varying-Censoring Aware Matrix Factorization for Single Cell RNA-Sequencing
JF  - bioRxiv
DO  - 10.1101/166736
SP  - 166736
AU  - F. William Townes
AU  - Stephanie C. Hicks
AU  - Martin J. Aryee
AU  - Rafael A. Irizarry
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/07/21/166736.abstract
N2  - Single cell RNA-Seq (scRNA-Seq) has become the most widely used high-throughput technology for gene expression profiling of individual cells. The potential of being able to measure cell-to-cell variability at a high-dimensional genomic scale opens numerous new lines of investigation in basic and clinical research. For example, by identifying groups of cells with expression profiles unlike those observed in cells with known phenotypes, new cell types may be discovered. Dimension reduction followed by unsupervised clustering are the quantitative approaches typically used to facilitate such discoveries. However, a challenge for this approach is that most scRNA-Seq datasets are sparse, with the percentages of measurements reported as zero ranging from 35% to 99% across cells, and these zeros are partially explained by experimental inefficiencies that lead to censored data. Furthermore, the observed across-cell differences in the percentages of zeros are partly due to technical artifacts rather than biological differences. Unfortunately, standard dimension reduction approaches treat these censored values as true zeros, which leads to the identification of distorted low-dimensional factors. When these factors are used for clustering, the distortion leads to incorrect identification of biological groups. Here, we propose an approach that accounts for cell-specific censoring with a varying-censoring aware matrix factorization (VAMF) model that permits the identification of factors in the presence of the above described systematic bias. We demonstrate the advantages of our approach on published scRNA-Seq data and confirm these on simulated data.
ER  - 