bioRxiv
New Results

Hera-T: an efficient and accurate approach for quantifying gene abundances from 10X-Chromium data with high rates of non-exonic reads

Thang Tran, Thao Truong, Hy Vuong, Son Pham
doi: https://doi.org/10.1101/530501
Author affiliations:
  • Thang Tran, Bioturing Inc.
  • Thao Truong, Bioturing Inc.
  • Hy Vuong, Bioturing Inc.
  • Son Pham, Bioturing Inc. (for correspondence: sonpham@bioturing.com)

Abstract

An important but rarely discussed phenomenon in single-cell data generated by the 10X-Chromium protocol is the very high fraction of non-exonic reads, which usually exceeds 30% of the total reads. If reads are not aligned to a complete genome reference, non-exonic reads can be erroneously aligned to the transcriptome reference, raising the error rate. To tackle this problem, Cell Ranger first aligns reads against the whole genome and, at a later step, uses a genome annotation to select the reads that align to the transcriptome. Despite its long running time and large memory consumption, Cell Ranger remains the most widely used tool for quantifying 10X Genomics single-cell RNA-Seq data because of its accuracy.
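
The genome-first idea described above (align to the genome, then use an annotation to decide which reads count toward the transcriptome) can be illustrated with a toy sketch. This is not Cell Ranger's or Hera-T's actual implementation: the interval representation, the `is_exonic` helper, the `min_fraction` threshold, and the example coordinates are all hypothetical and chosen only to show how an annotation can separate exonic from non-exonic alignments.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical toy format.
Interval = Tuple[str, int, int]


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def is_exonic(read: Interval, exons: List[Interval], min_fraction: float = 0.5) -> bool:
    """Label a genome-aligned read as exonic if enough of it lies in one annotated exon.

    min_fraction is an illustrative assumption, not a documented tool setting.
    """
    chrom, start, end = read
    length = end - start
    if length <= 0:
        return False
    best = max(
        (overlap(start, end, e_start, e_end)
         for e_chrom, e_start, e_end in exons
         if e_chrom == chrom),
        default=0,
    )
    return best / length >= min_fraction


# Toy annotation and toy alignments (made-up coordinates).
exons = [("chr1", 100, 200), ("chr1", 500, 650)]
reads = [
    ("chr1", 120, 180),  # fully inside the first exon
    ("chr1", 300, 360),  # intronic / intergenic
    ("chr1", 190, 250),  # mostly outside the exon
    ("chr2", 120, 180),  # chromosome without annotated exons
]

labels = [is_exonic(r, exons) for r in reads]
non_exonic_fraction = labels.count(False) / len(labels)
print(labels, non_exonic_fraction)
```

In this toy input, only the first read would be kept for gene quantification; the others would be treated as non-exonic rather than being forced onto a transcript.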

In this work, we introduce Hera-T, a fast and accurate tool for estimating gene abundances in single-cell data generated by the 10X-Chromium protocol. By devising a new strategy for aligning reads to both transcriptome and genome references, Hera-T reduces both running time and memory consumption by 10- to 100-fold while giving results similar to Cell Ranger's. Hera-T also handles some difficult spliced-alignment scenarios that Cell Ranger fails to address and therefore obtains better accuracy than Cell Ranger. When the reads in those scenarios are excluded, the Hera-T and Cell Ranger results have correlation scores > 0.99.

For a single-cell data set with 49 million reads, Cell Ranger took about 3 hours (179 minutes) while Hera-T took 1.75 minutes; for another single-cell data set with 784 million reads, Cell Ranger took about 25 hours while Hera-T took 32 minutes. For those data sets, Cell Ranger used all 32 GB of memory while Hera-T consumed at most 8 GB. The Hera-T package is available for download at: https://bioturing.com/product/hera-t
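
The fold changes implied by the reported figures can be worked out directly. The short sketch below uses only the numbers quoted in the paragraph above; the helper name `fold_change` is an illustrative choice, and "about 25 hours" is taken as exactly 25 hours, so the second ratio is approximate.

```python
def fold_change(baseline: float, improved: float) -> float:
    """Ratio of a baseline measurement to an improved one (higher is better)."""
    return baseline / improved


# Running times in minutes, as reported in the abstract.
speedup_49m = fold_change(179, 1.75)      # 49 million reads
speedup_784m = fold_change(25 * 60, 32)   # 784 million reads, "about 25 hours"

# Memory in GB: Hera-T used "at most 8 GB", so this ratio is a lower bound.
memory_ratio_min = fold_change(32, 8)

print(f"49M reads:  {speedup_49m:.1f}x faster")
print(f"784M reads: {speedup_784m:.1f}x faster")
print(f"memory:     at least {memory_ratio_min:.0f}x lower")
```

On these two data sets the running-time ratios come out at roughly 102x and 47x, and the memory ratio at 4x or more.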

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted January 26, 2019.

Subject Area

  • Bioinformatics