New Results

Determining sufficient sequencing depth in RNA-Seq differential expression studies

Andrew J. Bass, David G. Robinson, John D. Storey
doi: https://doi.org/10.1101/635623
Andrew J. Bass
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544 USA
David G. Robinson
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544 USA
John D. Storey
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544 USA
  • For correspondence: jstorey@princeton.edu

Abstract

RNA-Seq studies require sufficient read depth to detect biologically important genes. Sequencing below this threshold reduces statistical power, while sequencing above it provides only marginal improvements in power and incurs unnecessary sequencing costs. Although existing methodologies can help assess whether read depth is sufficient, they cannot indicate how many additional reads should be sequenced to reach this threshold. We provide a new method, superSeq, that models the relationship between statistical power and read depth. We apply the superSeq framework to 393 RNA-Seq experiments (1,021 total contrasts) in the Expression Atlas and find that the model accurately predicts the increase in statistical power gained by increasing the read depth. Based on our analysis, we find that most published studies (> 70%) are undersequenced, i.e., their statistical power can be improved by increasing the sequencing read depth. In addition, the extent of saturation is highly dependent on statistical methodology: only 9.5%, 29.5%, and 26.6% of contrasts are saturated when using DESeq2, edgeR, and limma, respectively. Finally, we find that there is no clear minimum per-transcript read depth that guarantees saturation for an entire technology. Therefore, our framework not only delineates key differences among methods and their impact on determining saturation, but will also remain necessary even as technology improves and the read depth of experiments increases. Researchers can thus use superSeq to calculate the read depth needed to achieve a required statistical power while avoiding unnecessary sequencing costs.
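
To make the idea of a power-versus-depth model concrete, the following is a minimal sketch and not the superSeq implementation. It assumes, purely for illustration, that the fraction of the maximum achievable discoveries follows a normal CDF in the logarithm of read depth. The functional form, the parameters `mu` and `sigma`, and both function names are hypothetical and do not come from the abstract. Given such a curve, one can read off how saturated the current experiment is and what multiple of the current depth would reach a chosen target.

```python
from math import exp, log
from statistics import NormalDist

# Standard normal distribution, used for its CDF and inverse CDF.
_STD_NORMAL = NormalDist()


def fraction_of_max_power(depth_multiple, mu, sigma):
    """Fraction of the maximum achievable discoveries at a given depth.

    depth_multiple: read depth relative to the current experiment
                    (1.0 means the depth already sequenced).
    mu, sigma:      hypothetical curve parameters on the log-depth scale
                    (an assumed model, not taken from the paper).
    """
    if depth_multiple <= 0 or sigma <= 0:
        raise ValueError("depth_multiple and sigma must be positive")
    return _STD_NORMAL.cdf((log(depth_multiple) - mu) / sigma)


def depth_multiple_for_target(target_fraction, mu, sigma):
    """Depth (as a multiple of the current depth) needed to reach a target
    fraction of the maximum, obtained by inverting the assumed curve."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return exp(mu + sigma * _STD_NORMAL.inv_cdf(target_fraction))


if __name__ == "__main__":
    # Illustrative parameter values only; real values would be estimated
    # from data.
    mu, sigma = 0.0, 1.0
    print("current saturation:", fraction_of_max_power(1.0, mu, sigma))
    print("depth multiple for 90%:", depth_multiple_for_target(0.9, mu, sigma))
```

Under this assumed curve, an experiment would count as undersequenced when `fraction_of_max_power(1.0, mu, sigma)` falls noticeably below 1, and the second function gives the additional sequencing implied by a chosen target.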

Footnotes

  • https://github.com/StoreyLab/superSeq-manuscript

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted May 13, 2019.

Subject Area

  • Genomics