New Results

K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data

Florian Wagner, Yun Yan, Itai Yanai
doi: https://doi.org/10.1101/217737
Florian Wagner
1Institute for Computational Medicine, NYU School of Medicine, New York, NY, USA
Yun Yan
1Institute for Computational Medicine, NYU School of Medicine, New York, NY, USA
Itai Yanai
1Institute for Computational Medicine, NYU School of Medicine, New York, NY, USA

ABSTRACT

High-throughput single-cell RNA-Seq (scRNA-Seq) is a powerful approach for studying heterogeneous tissues and dynamic cellular processes. However, compared to bulk RNA-Seq, single-cell expression profiles are extremely noisy, as they only capture a fraction of the transcripts present in the cell. Here, we propose the k-nearest neighbor smoothing (kNN-smoothing) algorithm, designed to reduce noise by aggregating information from similar cells (neighbors) in a computationally efficient and statistically tractable manner. The algorithm is based on the observation that across protocols, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on partially smoothed and variance-stabilized expression profiles, and then aggregating their transcript counts. We show that kNN-smoothing greatly improves the detection of clusters of cells and co-expressed genes, and clearly outperforms other smoothing methods on simulated data. To accurately perform smoothing for datasets containing highly similar cell populations, we propose the kNN-smoothing 2 algorithm, in which neighbors are determined after projecting the partially smoothed data onto the first few principal components. We show that unlike its predecessor, kNN-smoothing 2 can accurately distinguish between cells from different T cell subsets, and enables their identification in peripheral blood using unsupervised methods. Our work facilitates the analysis of scRNA-Seq data across a broad range of applications, including the identification of cell populations in heterogeneous tissues and the characterization of dynamic processes such as cellular differentiation. Reference implementations of our algorithms can be found at https://github.com/yanailab/knn-smoothing.
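The smoothing procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the reference implementation from the repository above. The abstract does not specify the variance-stabilizing transform, the normalization, or how the neighborhood grows between steps, so the sketch assumes a Freeman-Tukey transform, scaling to the median total count, and a neighborhood size that roughly doubles per step. The function names, the `n_pcs` parameter, and the genes-by-cells layout are hypothetical. Passing `n_pcs` projects the stabilized profiles onto the leading principal components before neighbors are chosen, which is one possible reading of the kNN-smoothing 2 variant.

```python
import numpy as np


def freeman_tukey(x):
    # Assumed variance-stabilizing transform for Poisson-like counts.
    return np.sqrt(x) + np.sqrt(x + 1)


def knn_smooth_sketch(counts, k, n_pcs=None):
    """Sum each cell's raw counts with those of its k nearest neighbors.

    counts: genes x cells array of UMI counts (every cell needs >= 1 count).
    n_pcs:  if given, neighbors are found after projecting onto this many
            principal components (assumed reading of kNN-smoothing 2).
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    if not 0 <= k < n_cells:
        raise ValueError("k must satisfy 0 <= k < number of cells")

    smoothed = counts.copy()
    k_step = 0
    while k_step < k:
        # Assumed schedule: 1, 3, 7, ... neighbors, capped at k.
        k_step = min(2 * k_step + 1, k)

        # Normalize the partially smoothed profiles, then stabilize variance.
        totals = smoothed.sum(axis=0)
        stabilized = freeman_tukey(smoothed * (np.median(totals) / totals))

        feats = stabilized.T  # cells x genes
        if n_pcs is not None:
            centered = feats - feats.mean(axis=0)
            u, s, _ = np.linalg.svd(centered, full_matrices=False)
            feats = u[:, :n_pcs] * s[:n_pcs]  # principal component scores

        # Squared Euclidean distances between all pairs of cells.
        sq = (feats ** 2).sum(axis=1)
        dist = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
        np.fill_diagonal(dist, -np.inf)  # each cell ranks first for itself

        # Aggregate RAW counts over each cell and its k_step neighbors.
        nearest = np.argsort(dist, axis=1, kind="stable")[:, :k_step + 1]
        smoothed = counts[:, nearest].sum(axis=2)
    return smoothed


# Toy data: two groups of three cells with distinct marker genes.
toy_counts = np.array([
    [10, 12, 11, 0, 1, 0],
    [9, 11, 10, 1, 0, 0],
    [0, 1, 0, 10, 9, 12],
    [1, 0, 0, 11, 10, 9],
])
toy_smoothed = knn_smooth_sketch(toy_counts, k=2)
```

The sketch always aggregates the raw counts rather than the transformed values, so each smoothed profile remains a sum of counts. The dense distance matrix keeps the code short but scales quadratically with the number of cells, so it is suitable only for small illustrative inputs.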

Footnotes

  • † Email: florian.wagner@nyu.edu

  • * Email: itai.yanai@nyumc.org

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted April 09, 2018.