bioRxiv
New Results

Comprehensive analysis of RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues

Serghei Mangul, Harry Taegyun Yang, Nicolas Strauli, Franziska Gruhl, Hagit T. Porath, Kevin Hsieh, Linus Chen, Timothy Daley, Stephanie Christenson, Agata Wesolowska-Andersen, Roberto Spreafico, Cydney Rios, Celeste Eng, Andrew D. Smith, Ryan D. Hernandez, Roel A. Ophoff, Jose Rodriguez Santana, Erez Y. Levanon, Prescott G. Woodruff, Esteban Burchard, Max A. Seibold, Sagiv Shifman, Eleazar Eskin, Noah Zaitlen
doi: https://doi.org/10.1101/053041
Serghei Mangul
1 Department of Computer Science, University of California Los Angeles, Los Angeles, USA
2 Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, USA
For correspondence: smangul@ucla.edu, Noah.Zaitlen@ucsf.edu
Harry Taegyun Yang
1 Department of Computer Science, University of California Los Angeles, Los Angeles, USA
Nicolas Strauli
3 Biomedical Sciences Graduate Program, University of California, San Francisco, CA, USA
Franziska Gruhl
4 Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
5 Swiss Institute of Bioinformatics, Lausanne, Switzerland
Hagit T. Porath
6 The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
Kevin Hsieh
1 Department of Computer Science, University of California Los Angeles, Los Angeles, USA
Linus Chen
7 Department of Bioengineering, University of California Los Angeles, Los Angeles, USA
Timothy Daley
8 Department of Molecular and Computational Biology, University of Southern California, CA, USA
Stephanie Christenson
9 Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California, San Francisco, CA, USA
Agata Wesolowska-Andersen
10 Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
Roberto Spreafico
2 Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, USA
Cydney Rios
10 Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
Celeste Eng
11 Department of Medicine, University of California, San Francisco, CA, USA
Andrew D. Smith
8 Department of Molecular and Computational Biology, University of Southern California, CA, USA
Ryan D. Hernandez
12 Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
13 Institute for Quantitative Biosciences, University of California, San Francisco, CA, USA
14 Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
Roel A. Ophoff
15 Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, USA
16 Department of Human Genetics, University of California Los Angeles, Los Angeles, USA
17 Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
Jose Rodriguez Santana
18 Pediatric Pulmonology, San Juan, Puerto Rico
Erez Y. Levanon
6 The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
Prescott G. Woodruff
19 Department of Pediatrics, National Jewish Health, Denver, CO, USA
Esteban Burchard
9 Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California, San Francisco, CA, USA
10 Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
Max A. Seibold
8 Department of Molecular and Computational Biology, University of Southern California, CA, USA
19 Department of Pediatrics, National Jewish Health, Denver, CO, USA
20 University of Colorado School of Medicine, Denver, CO, USA
Sagiv Shifman
21 Department of Genetics, The Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
Eleazar Eskin
1 Department of Computer Science, University of California Los Angeles, Los Angeles, USA
16 Department of Human Genetics, University of California Los Angeles, Los Angeles, USA
Noah Zaitlen
9 Division of Pulmonary, Critical Care, Sleep and Allergy, Department of Medicine, and Cardiovascular Research Institute, University of California, San Francisco, CA, USA

Abstract

High-throughput RNA sequencing technologies have provided invaluable research opportunities across distinct scientific domains by producing quantitative readouts of the transcriptional activity of both entire cellular populations and single cells. Most RNA-Seq analyses begin by mapping each experimentally produced sequence (i.e., read) to a set of annotated reference sequences for the organism of interest. For both biological and technical reasons, a significant fraction of reads remains unmapped. In this work, we develop the Read Origin Protocol (ROP) to discover the source of all reads, including those originating from complex RNA molecules, recombinant T and B cell receptors, and microbial communities. We applied ROP to 8,641 samples from 630 individuals across 54 tissues. Of these, 86 samples were generated in-house; the remaining samples were obtained from the Genotype-Tissue Expression (GTEx v6) project. To test whether the fraction of accounted-for reads generalizes beyond these datasets, we also applied ROP to thousands of randomly selected, publicly available RNA-Seq samples from the Sequence Read Archive (SRA). Across the merged dataset (n=10,641 samples), our approach accounts for 99.9% of 1 trillion reads of various read lengths. Using the in-house RNA-Seq data, we show that the immune profiles of asthmatic individuals differ significantly from those of control individuals, with lower average per-sample T and B cell receptor diversity. We also show that immune diversity is inversely correlated with microbial load. Our results demonstrate the potential of ROP to exploit unmapped reads to better understand the functional mechanisms connecting the immune system, the microbiome, human gene expression, and disease etiology. ROP is freely available at https://github.com/smangul1/rop and currently supports human and mouse RNA-Seq reads.
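The headline figure above (99.9% of reads accounted for) is a bookkeeping ratio: reads that some step attributes to a source count as accounted for, and the rest do not. The Python sketch below illustrates only that bookkeeping, under loudly stated assumptions: each read is assumed to be counted once, by the first step that claims it; the stage labels (`human_reference`, `immune_receptor`, `microbial`) and the prefix-matching predicates are hypothetical toy stand-ins. The actual ROP pipeline at https://github.com/smangul1/rop relies on aligners and reference databases that are not modeled here, and this abstract does not specify its step order.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# A "stage" pairs a hypothetical source label with a predicate that claims a read.
# In real ROP each step is backed by aligners and reference databases; these toy
# predicates are illustrative stand-ins only.
Stage = Tuple[str, Callable[[str], bool]]


def account_reads(reads: Iterable[str], stages: List[Stage]) -> Tuple[Counter, int]:
    """Attribute each read to the first stage that claims it (assumed sequential order).

    Reads that no stage claims are counted under "unaccounted".
    Returns per-label counts and the total number of reads seen.
    """
    counts: Counter = Counter()
    total = 0
    for read in reads:
        total += 1
        label = next((name for name, claims in stages if claims(read)), "unaccounted")
        counts[label] += 1
    return counts, total


def fraction_accounted(counts: Counter, total: int) -> float:
    """Share of reads attributed to some source; 0.0 for an empty input."""
    if total == 0:
        return 0.0
    return 1.0 - counts["unaccounted"] / total


# Toy example: reads are tagged strings, so each predicate is a trivial prefix check.
toy_stages: List[Stage] = [
    ("human_reference", lambda r: r.startswith("HUM")),
    ("immune_receptor", lambda r: r.startswith(("TCR", "BCR"))),
    ("microbial", lambda r: r.startswith("MIC")),
]
toy_reads = ["HUM_1", "HUM_2", "TCR_1", "MIC_1", "XYZ_1"]

toy_counts, toy_total = account_reads(toy_reads, toy_stages)
print(dict(toy_counts), fraction_accounted(toy_counts, toy_total))
```

Counting each read once keeps the per-category counts summing to the total, so the accounted fraction is simply one minus the unaccounted share.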

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 12, 2017.

Subject Area

  • Genomics