bioRxiv
New Results

Probabilistic Count Matrix Factorization for Single Cell Expression Data Analysis

Ghislain Durif, Laurent Modolo, Jeff E. Mold, Sophie Lambert-Lacroix, Franck Picard
doi: https://doi.org/10.1101/211938
Ghislain Durif, Inria (for correspondence: gd.dev@libertymail.net)
Laurent Modolo, CNRS
Jeff E. Mold, Karolinska Institutet
Sophie Lambert-Lacroix, Grenoble Alpes University
Franck Picard, CNRS

Abstract

The development of high-throughput single-cell technologies now allows the investigation of the genome-wide diversity of transcription. This diversity has shown two faces. First, the expression dynamics (gene-to-gene variability) can be quantified more accurately, thanks to the measurement of lowly expressed genes. Second, the cell-to-cell variability is high, with a low proportion of cells expressing the same gene at the same time or level. These emerging patterns are very challenging from a statistical point of view, especially when the goal is to represent single-cell expression data and provide a summarized view of them. PCA is one of the most powerful frameworks for providing a suitable representation of high-dimensional datasets, by searching for new axes that capture the most variability in the data. Unfortunately, classical PCA is based on Euclidean distances and projections that work poorly in the presence of over-dispersed counts that show zero-inflation. We propose a probabilistic Count Matrix Factorization (pCMF) approach for single-cell expression data analysis that relies on a sparse Gamma-Poisson factor model. This hierarchical model is inferred using a variational EM algorithm. We show how this probabilistic framework induces a geometry that is suitable for single-cell data, and produces a compression of the data that is very powerful for clustering purposes. Our method is compared with other standard representation methods such as t-SNE, and we illustrate its performance for the representation of single-cell data. We focus in particular on a publicly available data set of single-cell expression profiles of neural stem cells.
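
To make the kind of data discussed in the abstract concrete, the following is a minimal simulation sketch, not the authors' pCMF implementation. It assumes a plain Gamma-Poisson factor model in which counts are Poisson draws whose rates are the product of Gamma-distributed cell factors and gene loadings, followed by a Bernoulli dropout mask to mimic zero-inflation. The function name `simulate_gamma_poisson_counts`, all parameter values, and the dropout mechanism are illustrative assumptions; the sparsity component and the variational EM inference mentioned in the abstract are not shown.

```python
import numpy as np


def simulate_gamma_poisson_counts(n_cells=200, n_genes=50, n_factors=2,
                                  shape=1.0, rate=1.0, dropout=0.3, seed=0):
    """Simulate a zero-inflated count matrix from a Gamma-Poisson factor model.

    Hypothetical helper for illustration only; parameter values are arbitrary.
    """
    rng = np.random.default_rng(seed)
    # Non-negative latent factors: cells (U) and genes (V), both Gamma-distributed.
    U = rng.gamma(shape, 1.0 / rate, size=(n_cells, n_factors))
    V = rng.gamma(shape, 1.0 / rate, size=(n_genes, n_factors))
    # Poisson rate for each (cell, gene) entry is the inner product of factors.
    rates = U @ V.T
    counts = rng.poisson(rates)
    # Assumed dropout: each entry is independently set to zero with
    # probability `dropout`, which adds zeros beyond those of the Poisson.
    keep = rng.random((n_cells, n_genes)) >= dropout
    return counts * keep, U, V


X, U, V = simulate_gamma_poisson_counts()

# Per-gene mean and variance across cells: over-dispersion shows up as
# variance exceeding the mean, which a plain Poisson model would not allow.
gene_mean = X.mean(axis=0)
gene_var = X.var(axis=0)
print("fraction of zero entries:", (X == 0).mean())
print("fraction of genes with variance > mean:", (gene_var > gene_mean).mean())
```

Data simulated this way are non-negative integers with many zeros and variance larger than the mean for most genes, which is the setting where Euclidean projections such as classical PCA are described as working poorly.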

Copyright 
The copyright holder for this preprint is the author/funder. All rights reserved. No reuse allowed without permission.
  • Posted October 31, 2017.


Subject Area

  • Bioinformatics