bioRxiv
New Results

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

Marcus Alvarez, Elior Rahmani, Brandon Jew, Kristina M. Garske, Zong Miao, Jihane N. Benhammou, Chun Jimmie Ye, Joseph R. Pisegna, Kirsi H. Pietiläinen, Eran Halperin, Päivi Pajukanta
doi: https://doi.org/10.1101/786285
Author affiliations: Marcus Alvarez (1), Elior Rahmani (2), Brandon Jew (3), Kristina M. Garske (1), Zong Miao (1, 3), Jihane N. Benhammou (1, 5), Chun Jimmie Ye (4), Joseph R. Pisegna (1, 5), Kirsi H. Pietiläinen (6, 7), Eran Halperin (1, 2, 3), Päivi Pajukanta (1, 3, 8)
  • 1 Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  • 2 Computer Science Department in the School of Engineering, UCLA, Los Angeles, CA, USA
  • 3 Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
  • 4 Institute for Human Genetics, Department of Epidemiology and Biostatistics, Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, USA
  • 5 Vache and Tamar Manoukian Division of Digestive Diseases, UCLA, Los Angeles, CA, USA
  • 6 Obesity Research Unit, Research Programs Unit, Diabetes and Obesity, University of Helsinki, Biomedicum Helsinki, Helsinki, Finland
  • 7 Obesity Center, Endocrinology, Abdominal Center, Helsinki University Central Hospital and University of Helsinki, Helsinki, Finland
  • 8 Institute for Precision Health, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
For correspondence: ppajukanta@mednet.ucla.edu

Abstract

Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of whole cells, allowing unbiased cell type characterization in solid tissues. We observe that, unlike single-cell RNA-seq (scRNA-seq), snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which, if overlooked, can lead to the identification of spurious cell types in downstream clustering analyses. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and of cell types, whose parameters are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed varying degrees of extranuclear RNA contamination. We observed that existing methods fail to account for contaminated droplets and lead to spurious cell types. Compared with filtering by these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied it to single-cell data. In conclusion, DIEM removes debris-contaminated droplets from single-nucleus and single-cell data quickly and effectively, leading to cleaner downstream analyses. Our code is freely available at https://github.com/marcalva/diem.
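The abstract describes DIEM only at a high level: a likelihood model of debris and cell-type expression, fitted with EM. As a rough illustration of that general idea, and not the authors' implementation (which is in the GitHub repository linked above), the Python sketch below fits a two-component mixture of multinomials, one component for nuclei and one for debris, and treats droplets with very low total counts as known debris, which is one plausible reading of "semi-supervised". The multinomial model, the single nucleus component, and the names em_debris_filter and debris_max_total are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def _log_normalize(weights: Sequence[float]) -> List[float]:
    """Turn positive per-gene weights into log-probabilities that sum to 1."""
    total = sum(weights)
    return [math.log(w / total) for w in weights]


def _log_kernel(counts: Sequence[int], log_probs: Sequence[float]) -> float:
    """Multinomial log-likelihood without the multinomial coefficient.

    The coefficient depends only on the counts, so it is the same for both
    components and cancels when computing posterior probabilities.
    """
    return sum(c * lp for c, lp in zip(counts, log_probs) if c > 0)


def em_debris_filter(
    droplets: Sequence[Sequence[int]],
    debris_max_total: int,
    n_iter: int = 50,
    pseudocount: float = 1.0,
) -> List[float]:
    """Return P(nucleus | droplet) under a hypothetical 2-component
    (nucleus vs. debris) multinomial mixture fitted by EM.

    Assumed semi-supervision: droplets whose total count is at most
    debris_max_total are labeled debris and their labels stay fixed.
    """
    n_genes = len(droplets[0])
    fixed = [sum(d) <= debris_max_total for d in droplets]
    # Initial responsibilities: labeled debris -> 0, all other droplets -> nucleus.
    resp = [0.0 if f else 1.0 for f in fixed]

    for _ in range(n_iter):
        # M-step: expected gene counts per component (pseudocount avoids log(0)).
        nuc = [pseudocount] * n_genes
        deb = [pseudocount] * n_genes
        for d, r in zip(droplets, resp):
            for g, c in enumerate(d):
                nuc[g] += r * c
                deb[g] += (1.0 - r) * c
        log_p_nuc = _log_normalize(nuc)
        log_p_deb = _log_normalize(deb)
        # Mixing weight from unlabeled droplets only, clipped to stay inside (0, 1).
        free = [r for r, f in zip(resp, fixed) if not f]
        pi = sum(free) / len(free) if free else 0.5
        pi = min(max(pi, 1e-6), 1.0 - 1e-6)

        # E-step: posterior probability that each droplet is a nucleus.
        new_resp = []
        for d, f in zip(droplets, fixed):
            if f:
                new_resp.append(0.0)
                continue
            a = math.log(pi) + _log_kernel(d, log_p_nuc)
            b = math.log(1.0 - pi) + _log_kernel(d, log_p_deb)
            m = max(a, b)  # subtract the max before exponentiating for stability
            new_resp.append(math.exp(a - m) / (math.exp(a - m) + math.exp(b - m)))
        resp = new_resp
    return resp


# Toy usage: 8 droplets x 4 genes (made-up counts for illustration only).
toy = [
    [3, 2, 0, 0], [2, 2, 1, 0], [4, 1, 0, 0],        # low-count, labeled debris
    [1, 2, 40, 35], [0, 1, 30, 45], [2, 0, 38, 30],  # nucleus-like profiles
    [30, 35, 2, 1], [40, 28, 1, 0],                  # high count, debris-like
]
probs = em_debris_filter(toy, debris_max_total=10)
print([round(p, 3) for p in probs])
```

In the toy data, the last two droplets have high total counts but resemble the labeled debris profile, so they receive posterior nucleus probabilities near 0; a simple count threshold would have kept them. A realistic model would use several cell-type components and thousands of genes.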

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted October 02, 2019.
Subject Area

  • Bioinformatics