RT Journal Article
SR Electronic
T1 Dumpster diving in RNA-sequencing to find the source of every last read
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 053041
DO 10.1101/053041
A1 Mangul, Serghei
A1 Yang, Harry Taegyun
A1 Strauli, Nicolas
A1 Gruhl, Franziska
A1 Daley, Timothy
A1 Christenson, Stephanie
A1 Wesolowska-Andersen, Agata
A1 Spreafico, Roberto
A1 Rios, Cydney
A1 Eng, Celeste
A1 Smith, Andrew D.
A1 Hernandez, Ryan D.
A1 Ophoff, Roel A.
A1 Santana, Jose Rodriguez
A1 Woodruff, Prescott G.
A1 Burchard, Esteban
A1 Seibold, Max A.
A1 Shifman, Sagiv
A1 Eskin, Eleazar
A1 Zaitlen, Noah
YR 2016
UL http://biorxiv.org/content/early/2016/05/13/053041.abstract
AB High-throughput RNA sequencing technologies have provided invaluable research opportunities across distinct scientific domains by producing quantitative readouts of the transcriptional activity of both entire cellular populations and single cells. The majority of RNA-Seq analyses begin by mapping each experimentally produced sequence (i.e., read) to a set of annotated reference sequences for the organism of interest. For both biological and technical reasons, a significant fraction of reads remains unmapped. In this work we develop a read origin protocol (ROP) aimed at discovering the source of all reads, including those originating from complex RNA molecules, recombinant antibodies, and microbial communities. Our approach can account for 98.8% of all reads across poly(A) and ribo-depletion protocols. Furthermore, using ROP we show that the immune profiles of asthmatic individuals differ significantly from those of control individuals, with decreased average per-sample T-cell/B-cell receptor diversity, and that immune diversity is inversely correlated with microbial load.
This demonstrates the potential of ROP to exploit unmapped reads to better understand the functional mechanisms underlying the connection between the immune system, the microbiome, human gene expression, and disease etiology. The ROP pipeline is freely available at https://sergheimangul.wordpress.com/rop/