bioRxiv
New Results

Probabilistic estimation of short sequence expression using RNA-Seq data and the “positional bootstrap”

Hui Y. Xiong, Leo J. Lee, Hannes Bretschneider, Jiexin Gao, Nebojsa Jojic, Brendan J. Frey
doi: https://doi.org/10.1101/046474
Hui Y. Xiong (1, 2); Leo J. Lee (1, 2); Hannes Bretschneider (1, 2); Jiexin Gao (1, 2); Nebojsa Jojic (3); Brendan J. Frey (1, 2, 4)
1 Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4
2 Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5S 3E1
3 Microsoft Research, Redmond, WA 98052, USA
4 Canadian Institute for Advanced Research, Toronto, Ontario, M5G 1Z8, Canada

Abstract

When estimating the expression of a transcript or part of a transcript using RNA-seq data, it is commonly assumed that reads are generated uniformly from positions within the transcript. While this assumption is acceptable for long transcript sequences, where reads from many positions are averaged, it frequently leads to large errors for short sequences, e.g., those shorter than 100 bp. Analysis of short sequences, such as when studying splice junctions and microRNAs, is increasingly important and necessitates addressing errors in short-sequence expression estimation. Indeed, when we examined RNA-seq data from diverse studies, we found that large errors are introduced by variations in RNA-seq coverage due to sequence content, experimental conditions and sample preparation.
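The averaging argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis from the paper: per-position coverage is modelled as the true expression multiplied by a lognormal bias with mean 1, and the function name `simulate_relative_error`, the bias model, and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_relative_error(length, true_expression=20.0, sigma=1.0,
                            trials=200, seed=0):
    """Mean relative error of the mean-coverage estimate for `length` positions.

    Assumption (not from the paper): coverage at each position equals the
    true expression times an independent lognormal bias with mean 1.
    """
    rng = random.Random(seed)
    # E[exp(N(mu, sigma^2))] = exp(mu + sigma^2 / 2), so this mu gives mean 1.
    mu = -0.5 * sigma ** 2
    errors = []
    for _ in range(trials):
        coverage = [true_expression * rng.lognormvariate(mu, sigma)
                    for _ in range(length)]
        # Estimate that implicitly assumes uniform coverage: the plain mean.
        estimate = sum(coverage) / length
        errors.append(abs(estimate - true_expression) / true_expression)
    return statistics.mean(errors)

short_error = simulate_relative_error(50)     # e.g. a splice junction window
long_error = simulate_relative_error(2000)    # e.g. a long transcript
print(f"short: {short_error:.3f}  long: {long_error:.3f}")
```

Under this toy model the relative error falls roughly with the square root of the number of positions averaged, which is why the same coverage variation matters far more for a 50 bp window than for a 2000 bp one.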

We developed a technique that we call the positional bootstrap, which quantifies the level of uncertainty in expression induced by non-uniform coverage. Unlike methods that attempt to correct for coverage biases by making strong assumptions about the form of those biases, the positional bootstrap can quantify the noise induced by all types of bias, including unknown ones. Results obtained using independently generated RNA-seq datasets show that the positional bootstrap increases the accuracy of estimates of alternative splicing levels, tissue-differential alternative splicing and tissue-differential expression by a factor of up to 10.
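The abstract does not spell out the algorithm, so the following is a minimal sketch of one plausible reading of the idea: resample the positions of a short sequence with replacement and recompute the expression estimate each time, so that the spread of the resampled estimates reflects how unevenly reads are distributed. The function names `positional_bootstrap` and `percentile_interval`, the use of mean read count as the estimate, and the example counts are all assumptions for illustration; this is not the BENTO-Seq API and may differ from the authors' method.

```python
import random

def positional_bootstrap(position_counts, n_resamples=1000, seed=0):
    """Resample positions with replacement; return one estimate per resample.

    `position_counts[i]` is the number of reads starting at position i.
    The estimate used here (mean count per position) is an assumption.
    """
    rng = random.Random(seed)
    n = len(position_counts)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(position_counts, k=n)
        estimates.append(sum(resample) / n)
    return estimates

def percentile_interval(samples, lower=0.05, upper=0.95):
    """Simple percentile interval over the bootstrap estimates."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Two hypothetical 20-position windows with the same total of 200 reads.
even_counts = [10] * 20            # reads spread evenly across positions
spiky_counts = [0] * 19 + [200]    # all reads piled on a single position

even_interval = percentile_interval(positional_bootstrap(even_counts))
spiky_interval = percentile_interval(positional_bootstrap(spiky_counts))
print("even :", even_interval)
print("spiky:", spiky_interval)
```

Both windows have the same naive estimate of 10 reads per position, but the evenly covered window yields a degenerate interval while the spiky one yields a wide interval, flagging that its estimate rests on a single position.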

A Python implementation of the algorithm to quantify splicing levels is freely available from github.com/PSI-Lab/BENTO-Seq.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted April 02, 2016.

Subject Area

  • Molecular Biology