PT - JOURNAL ARTICLE
AU - Florian Wagner
AU - Yun Yan
AU - Itai Yanai
TI - K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data
AID - 10.1101/217737
DP - 2018 Jan 01
TA - bioRxiv
PG - 217737
4099 - http://biorxiv.org/content/early/2018/01/24/217737.short
4100 - http://biorxiv.org/content/early/2018/01/24/217737.full
AB - High-throughput single-cell RNA-Seq (scRNA-Seq) methods can efficiently generate expression profiles for thousands of cells, and promise to enable the comprehensive molecular characterization of all cell types and states present in heterogeneous tissues. However, compared to bulk RNA-Seq, single-cell expression profiles are extremely noisy and only capture a fraction of transcripts present in the cell. Here, we propose an algorithm to smooth scRNA-Seq data, with the goal of significantly improving the signal-to-noise ratio of each profile, while largely preserving biological expression heterogeneity. The algorithm is based on the observation that across protocols, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on variance-stabilized and partially smoothed expression profiles, and then aggregating their transcript counts. On data from human pancreatic islet tissue and peripheral blood mononuclear cells, we show that smoothing greatly facilitates the identification of clusters of cells and co-expressed genes. Using simulated datasets that closely mimic real expression data, we show that our algorithm drastically improves upon the accuracy of other smoothing methods. Our work implies that there exists a quantitative relationship between the number of cells profiled and the potential accuracy with which individual cell types or states can be characterized, and helps unlock the full potential of scRNA-Seq to elucidate molecular processes in healthy and disease tissues.
Reference implementations of our algorithm can be found at https://github.com/yanailab/knn-smoothing.