TY - JOUR
T1 - Bioinformatic analysis of endogenous and exogenous small RNAs on lipoproteins
JF - bioRxiv
DO - 10.1101/246900
SP - 246900
AU - Ryan M. Allen
AU - Shilin Zhao
AU - Marisol A. Ramirez Solano
AU - Danielle L. Michell
AU - Yuhuan Wang
AU - Yu Shyr
AU - Praveen Sethupathy
AU - MacRae F. Linton
AU - Greg A. Graf
AU - Quanhu Sheng
AU - Kasey C. Vickers
Y1 - 2018/01/01
UR - http://biorxiv.org/content/early/2018/01/11/246900.abstract
N2 - High-throughput small RNA sequencing (sRNA-seq) has facilitated the discovery of many classes of small RNAs (sRNA) and helped establish the field of extracellular RNA (exRNA). Although several tools are available for sRNA-seq analysis, exRNAs present unique analytical challenges that are not met by current software. Therefore, we developed a novel data analysis pipeline specifically for exRNAs entitled, “Tools for Integrative Genome analysis of Extracellular sRNAs (TIGER).” To demonstrate the power of this tool, sRNA-seq was performed on high-density lipoproteins (HDL), apolipoprotein B-containing particles (APOB), bile, urine, and liver samples collected from wild-type (WT) and scavenger receptor BI knockout (SR-BI KO) mice. TIGER was able to account for approximately 60% of reads on lipoproteins and >85% of reads in liver, bile, and urine, a significant advance compared to existing software, largely due to the identification of non-host sRNAs in these datasets. A key advance for the TIGER pipeline is the ability to analyze host and non-host sRNAs across many classes at the genome, parent RNA, and individual fragment levels. Moreover, disparate sample types were compared at each level using hierarchical clustering, correlations, betadispersions, principal coordinate analysis, and permutational multivariate analysis of variance.
TIGER analysis was also used to quantify distinct features of exRNAs, including 5’ microRNA (miRNA) variants, 3’ miRNA non-templated additions, parent RNA positional coverage, and length distributions by RNA class. Results suggest that the majority of sRNAs on lipoproteins are non-host sRNAs derived from bacterial sources in the microbiome and environment, specifically rRNA-derived sRNAs from proteobacteria. Here, we report novel discoveries of lipoprotein sRNAs that were facilitated by the new sRNA-seq analysis pipeline, TIGER, which has tremendous applicability for the field of exRNA.
ABBREVIATIONS: exRNA, extracellular RNAs; HDL, high-density lipoproteins; HMB, human microbiome project; lncRNA, long non-coding RNA; LDL, low-density lipoproteins; miscRNA, miscellaneous sRNA; ncRNA, non-coding RNA; NIH, National Institutes of Health; nts, nucleotides; osRNA, other sRNA; rDR, rRNA-derived sRNA; RPM, Reads Per Million total reads; rRNA, ribosomal RNA; sRNA, small RNAs; snDR, snRNA-derived sRNA; snoDR, snoRNA-derived sRNA; snoRNA, small nucleolar RNA; snRNA, small nuclear RNA; SR-BI, scavenger receptor BI; sRNA-seq, small RNA sequencing; tDR, tRNA-derived sRNA; tRNA, transfer RNA; yDR, Y RNA-derived sRNA; 3’ UTR, 3’ untranslated regions
ER -