New Results

Improving replicability in single-cell RNA-Seq cell type discovery with Dune

Hector Roux de Bézieux, Kelly Street, Stephan Fischer, Koen Van den Berge, Rebecca Chance, Davide Risso, Jesse Gillis, John Ngai, Elizabeth Purdom, Sandrine Dudoit
doi: https://doi.org/10.1101/2020.03.03.974220
Hector Roux de Bézieux
1 Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
9 Center for Computational Biology, University of California, Berkeley, CA, USA
Kelly Street
2 Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
3 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Stephan Fischer
4 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
Koen Van den Berge
5 Department of Statistics, University of California, Berkeley, CA, USA
6 Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
Rebecca Chance
7 Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
Davide Risso
8 Department of Statistical Sciences, University of Padova, Padova, Italy
Jesse Gillis
4 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
John Ngai
7 Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
Elizabeth Purdom
5 Department of Statistics, University of California, Berkeley, CA, USA
Sandrine Dudoit
1 Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
5 Department of Statistics, University of California, Berkeley, CA, USA
9 Center for Computational Biology, University of California, Berkeley, CA, USA
For correspondence: sandrine@stat.berkeley.edu

Abstract

Single-cell transcriptome sequencing (scRNA-Seq) has allowed many new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into potentially novel cell types. Many approaches build on the existing clustering literature to develop tools specific to single-cell applications. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters identified. This affects both the resolution of the clusters within the original dataset and their replicability across datasets. While many recommendations exist for selecting these tuning parameters, most of them are ad hoc. In general, there is little assurance that any given set of parameters represents an optimal choice in the ever-present trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable, or in fewer clusters that are also less replicable.

Here, we propose a new method called Dune for optimizing the trade-off between the resolution of the clusters and their replicability across datasets. Our method takes as input a set of clustering results on a single dataset, derived from any set of clustering algorithms and associated tuning parameters, and iteratively merges clusters within partitions in order to maximize their concordance between partitions. As demonstrated on a variety of scRNA-Seq datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging to reduce the number of clusters, in terms of the replicability of the resulting merged clusters. It provides an objective approach for identifying replicable consensus clusters most likely to represent common biological features across multiple datasets.
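
The merging idea described above can be illustrated with a minimal Python sketch. It is not the Dune implementation: it assumes the adjusted Rand index (ARI) as the concordance measure and a greedy search that, at each step, applies the single within-partition merge that most increases the mean pairwise ARI. The abstract specifies neither choice, and the names `adjusted_rand_index`, `mean_ari`, and `greedy_merge` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label sequences of equal length."""
    total = comb(len(a), 2)
    if total == 0:
        return 1.0  # fewer than two cells: nothing to disagree on
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ij - expected) / (max_index - expected)


def mean_ari(partitions):
    """Mean ARI over all pairs of partitions (assumed concordance score)."""
    pairs = list(combinations(partitions, 2))
    return sum(adjusted_rand_index(p, q) for p, q in pairs) / len(pairs)


def greedy_merge(partitions):
    """Repeatedly merge the pair of clusters (within one partition) that
    most improves the mean ARI; stop when no merge improves it."""
    partitions = [list(p) for p in partitions]
    best = mean_ari(partitions)
    while True:
        best_move = None
        for k, labels in enumerate(partitions):
            for c1, c2 in combinations(sorted(set(labels)), 2):
                # Candidate: relabel cluster c2 as c1 in partition k only.
                merged = [c1 if x == c2 else x for x in labels]
                trial = partitions[:k] + [merged] + partitions[k + 1:]
                score = mean_ari(trial)
                if score > best + 1e-12:
                    best, best_move = score, (k, merged)
        if best_move is None:
            return partitions, best
        k, merged = best_move
        partitions[k] = merged


# Toy input: two clusterings of the same nine cells.
p1 = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # three clusters
p2 = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # two clusters
merged, score = greedy_merge([p1, p2])
```

In this toy input, merging clusters 1 and 2 of the first partition makes the two partitions identical, so the search stops with a mean ARI of 1. Real inputs would be label vectors produced by different clustering algorithms or tuning parameters on the same cells.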

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted March 04, 2020.
Subject Area

  • Bioinformatics