Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data

View ORCID ProfileFlorian Wagner, View ORCID ProfileYun Yan, View ORCID ProfileItai Yanai
doi: https://doi.org/10.1101/217737
Florian Wagner
1School of Medicine, New York University, New York, NY, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Florian Wagner
Yun Yan
1School of Medicine, New York University, New York, NY, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Yun Yan
Itai Yanai
1School of Medicine, New York University, New York, NY, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Itai Yanai
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Preview PDF
Loading

ABSTRACT

High-throughput single-cell RNA-Seq (scRNA-Seq) methods can efficiently generate expression profiles for thousands of cells, and promise to enable the comprehensive molecular characterization of all cell types and states present in heterogeneous tissues. However, compared to bulk RNA-Seq, single-cell expression profiles are extremely noisy and only capture a fraction of transcripts present in the cell. Here, we describe an algorithm to smooth scRNA-Seq data, with the goal of significantly improving the signal-tonoise ratio of each profile, while largely preserving biological expression heterogeneity. The algorithm is based on the observation that across platforms, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on variance-stabilized and partially smoothed expression profiles, and then aggregating their UMI counts. For multiple datasets, the application of our algorithm resulted in more stable cell type-specific expression profiles, and recovered correlations between co-expressed genes. More generally, smoothing improved the results of commonly used dimensionality reduction and clustering methods, greatly facilitating the identification of cell subsets and clusters of co-expressed genes. Our work implies that there exists a quantitative relationship between the number of cells profiled and the potential accuracy with which individual cell types or states can be characterized, and helps unlock the full potential of scRNA-Seq to elucidate molecular processes in healthy and disease tissues.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Back to top
PreviousNext
Posted November 21, 2017.
Download PDF

Supplementary Material

Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data
Florian Wagner, Yun Yan, Itai Yanai
bioRxiv 217737; doi: https://doi.org/10.1101/217737
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data
Florian Wagner, Yun Yan, Itai Yanai
bioRxiv 217737; doi: https://doi.org/10.1101/217737

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One
Subject Areas
All Articles
  • Animal Behavior and Cognition (4110)
  • Biochemistry (8813)
  • Bioengineering (6518)
  • Bioinformatics (23456)
  • Biophysics (11788)
  • Cancer Biology (9205)
  • Cell Biology (13318)
  • Clinical Trials (138)
  • Developmental Biology (7433)
  • Ecology (11407)
  • Epidemiology (2066)
  • Evolutionary Biology (15146)
  • Genetics (10433)
  • Genomics (14041)
  • Immunology (9169)
  • Microbiology (22152)
  • Molecular Biology (8808)
  • Neuroscience (47559)
  • Paleontology (350)
  • Pathology (1428)
  • Pharmacology and Toxicology (2491)
  • Physiology (3730)
  • Plant Biology (8079)
  • Scientific Communication and Education (1437)
  • Synthetic Biology (2220)
  • Systems Biology (6037)
  • Zoology (1252)