New Results

Dumpster diving in RNA-sequencing to find the source of every last read

Serghei Mangul, Harry Taegyun Yang, Nicolas Strauli, Franziska Gruhl, Timothy Daley, Stephanie Christenson, Agata Wesolowska-Andersen, Roberto Spreafico, Cydney Rios, Celeste Eng, Andrew D. Smith, Ryan D. Hernandez, Roel A. Ophoff, Jose Rodriguez Santana, Prescott G. Woodruff, Esteban Burchard, Max A. Seibold, Sagiv Shifman, Eleazar Eskin, Noah Zaitlen
doi: https://doi.org/10.1101/053041
Serghei Mangul
1Department of Computer Science, University of California Los Angeles, Los Angeles, USA
2Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, USA
  • For correspondence: smangul@ucla.edu Noah.Zaitlen@ucsf.edu
Harry Taegyun Yang
1Department of Computer Science, University of California Los Angeles, Los Angeles, USA
Nicolas Strauli
3Biomedical Sciences Graduate Program, University of California, San Francisco, CA, USA
Franziska Gruhl
4Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
5Swiss Institute of Bioinformatics, Lausanne, Switzerland
Timothy Daley
6Department of Molecular and Computational Biology, University of Southern California, CA, USA
Stephanie Christenson
7Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California, San Francisco, CA, USA
Agata Wesolowska-Andersen
8Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
Roberto Spreafico
2Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, USA
Cydney Rios
8Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
Celeste Eng
9Department of Medicine, University of California, San Francisco, CA, USA
Andrew D. Smith
6Department of Molecular and Computational Biology, University of Southern California, CA, USA
Ryan D. Hernandez
10Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
11Institute for Quantitative Biosciences, University of California, San Francisco, CA, USA
12Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
Roel A. Ophoff
13Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA
14Department of Human Genetics, University of California Los Angeles, Los Angeles, USA
15Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
Jose Rodriguez Santana
16Pediatric Pulmonology, San Juan, Puerto Rico
Prescott G. Woodruff
7Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California, San Francisco, CA, USA
Esteban Burchard
9Department of Medicine, University of California, San Francisco, CA, USA
10Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
Max A. Seibold
8Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
17Department of Pediatrics, National Jewish Health, Denver, CO, USA
18University of Colorado School of Medicine, Denver CO, USA
Sagiv Shifman
19Department of Genetics, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
Eleazar Eskin
1Department of Computer Science, University of California Los Angeles, Los Angeles, USA
14Department of Human Genetics, University of California Los Angeles, Los Angeles, USA
Noah Zaitlen
9Department of Medicine, University of California, San Francisco, CA, USA

Abstract

High-throughput RNA sequencing technologies have provided invaluable research opportunities across distinct scientific domains by producing quantitative readouts of the transcriptional activity of both entire cellular populations and single cells. The majority of RNA-Seq analyses begin by mapping each experimentally produced sequence (i.e., read) to a set of annotated reference sequences for the organism of interest. For both biological and technical reasons, a significant fraction of reads remains unmapped. In this work we develop a read origin protocol (ROP) aimed at discovering the source of all reads, including those originating from complex RNA molecules, recombinant antibodies, and microbial communities. Our approach accounts for 98.8% of all reads across poly(A) and ribo-depletion protocols. Furthermore, using ROP we show that the immune profiles of asthmatic individuals differ significantly from those of control individuals, with decreased average per-sample T-cell/B-cell receptor diversity, and that immune diversity is inversely correlated with microbial load. This demonstrates the potential of ROP to exploit unmapped reads to better understand the functional mechanisms underlying the connections among the immune system, the microbiome, human gene expression, and disease etiology.

The ROP pipeline is freely available at https://sergheimangul.wordpress.com/rop/
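
As a reading aid for the two quantities the abstract reports, the fraction of reads accounted for and per-sample receptor diversity, the following is a minimal Python sketch. It is not the ROP implementation: the function names, the category labels, and the toy counts are all hypothetical, and Shannon entropy is used here only as one common diversity index, which is an assumption rather than something the abstract specifies.

```python
import math


def fraction_accounted(category_counts, total_reads):
    """Return the fraction of reads assigned to any category.

    category_counts: dict mapping a (hypothetical) category label to the
    number of reads assigned to it; categories are assumed disjoint.
    """
    assigned = sum(category_counts.values())
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if assigned > total_reads:
        raise ValueError("assigned reads exceed total reads")
    return assigned / total_reads


def shannon_diversity(clonotype_counts):
    """Shannon entropy (natural log) of a list of clonotype read counts.

    Assumption: entropy is used as an illustrative diversity index only.
    """
    total = sum(clonotype_counts)
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log(c / total) for c in clonotype_counts if c > 0
    )


# Toy numbers, invented for illustration; they are not results from the paper.
toy_sample = {
    "mapped_to_reference": 900,
    "repeat_sequences": 40,
    "immune_receptors": 30,
    "microbial": 20,
}
toy_fraction = fraction_accounted(toy_sample, total_reads=1000)
toy_diversity = shannon_diversity([5, 5])  # two equally abundant clonotypes
```

With per-sample values of this kind in hand, the comparisons described in the abstract (diversity between groups, diversity against microbial load) reduce to standard statistical tests on those values.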

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted May 13, 2016.

Subject Area

  • Genomics