bioRxiv
New Results

clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets

Davide Risso, Liam Purvis, Russell Fletcher, Diya Das, John Ngai, Sandrine Dudoit, Elizabeth Purdom
doi: https://doi.org/10.1101/280545
Davide Risso
1 Division of Biostatistics and Epidemiology, Weill Cornell Medicine, New York, NY, USA
Liam Purvis
2 Department of Statistics, UC Berkeley, Berkeley, CA, USA
Russell Fletcher
3 Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, USA
Diya Das
3 Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, USA
4 Berkeley Institute for Data Science, UC Berkeley, CA, USA
John Ngai
3 Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, USA
Sandrine Dudoit
2 Department of Statistics, UC Berkeley, Berkeley, CA, USA
5 Division of Biostatistics, UC Berkeley, Berkeley, CA, USA
Elizabeth Purdom
2 Department of Statistics, UC Berkeley, Berkeley, CA, USA
For correspondence: epurdom@stat.berkeley.edu

Abstract

Clustering of genes and/or samples is a common task in gene expression analysis. The goals of clustering vary, but an important one is finding biologically meaningful subtypes among the samples. This application is particularly relevant when there are large numbers of samples, as in many human disease studies. With the increasing popularity of single-cell transcriptome sequencing (RNA-Seq), many more controlled experiments on model organisms are similarly producing large gene expression datasets, with the goal of detecting previously unknown heterogeneity among cells.

It is common in the detection of novel subtypes to run many clustering algorithms and to rely on subsampling and ensemble methods to improve robustness. We introduce a Bioconductor R package, clusterExperiment, that implements a general and flexible strategy we call Resampling-based Sequential Ensemble Clustering (RSEC). RSEC enables the user to easily create multiple, competing clusterings of the data based on different techniques and associated tuning parameters, with easy integration of resampling and sequential clustering, and then provides methods for consolidating the multiple clusterings into a final consensus clustering. The package is modular and allows the user to apply the individual components of the RSEC procedure separately, i.e., apply multiple clustering algorithms, create a consensus clustering or choose tuning parameters, and merge clusters. Additionally, clusterExperiment provides a variety of visualization tools for the clustering process, as well as methods for identifying possible cluster signatures or biomarkers.
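The idea of consolidating several competing clusterings into one consensus can be illustrated with a generic co-clustering approach. The following is a minimal Python sketch under stated assumptions. It is not the clusterExperiment API and not the exact RSEC consensus algorithm. Each clustering is assumed to be a list of integer labels over the same samples, -1 is an assumed marker for an unassigned sample, and the function names `coclustering_matrix` and `consensus_clusters` are hypothetical. For every pair of samples, the sketch computes the fraction of clusterings that place them together. It then links pairs at or above a chosen threshold and reports the connected groups as consensus clusters.

```python
from itertools import combinations


def coclustering_matrix(clusterings):
    """Fraction of clusterings in which each pair of samples shares a cluster.

    clusterings: list of equal-length label lists. A label of -1 marks an
    unassigned sample (an assumed convention for this sketch); such pairs are
    skipped for that clustering rather than counted as "apart".
    """
    n = len(clusterings[0])
    together = [[0] * n for _ in range(n)]
    counted = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == -1 or labels[j] == -1:
                continue
            counted[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    # Symmetric matrix of proportions; each sample trivially co-clusters with itself.
    prop = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if counted[i][j]:
            prop[i][j] = prop[j][i] = together[i][j] / counted[i][j]
    return prop


def consensus_clusters(prop, threshold=0.7):
    """Link samples whose co-clustering proportion is >= threshold and return
    the connected components as consensus labels 0, 1, 2, ...

    Uses a small union-find structure; this is a single-linkage style cut.
    """
    n = len(prop)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if prop[i][j] >= threshold:
            parent[find(i)] = find(j)

    # Relabel component roots in order of first appearance.
    ids, labels = {}, []
    for i in range(n):
        root = find(i)
        ids.setdefault(root, len(ids))
        labels.append(ids[root])
    return labels


if __name__ == "__main__":
    # Three toy clusterings of six samples; the third uses permuted labels.
    clusterings = [
        [0, 0, 1, 1, 2, 2],
        [0, 0, 1, 1, 1, 2],
        [1, 1, 0, 0, 2, 2],
    ]
    prop = coclustering_matrix(clusterings)
    print(consensus_clusters(prop, threshold=0.6))  # [0, 0, 1, 1, 2, 2]
```

Because the proportions depend only on whether two samples share a label, a clustering with permuted labels (the third one above) contributes the same information as the first. The connected-components cut is one simple choice. A stricter rule, such as requiring every pair within a cluster to exceed the threshold, would give tighter clusters and leave more samples unassigned.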

The package clusterExperiment is publicly available through the Bioconductor Project, with a detailed manual (vignette) as well as well-documented help pages for each function.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted March 12, 2018.

Subject Area

  • Bioinformatics