RT Journal Article SR Electronic T1 Unsupervised analysis of multi-experiment transcriptomic patterns with SegRNA identifies unannotated transcripts JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.07.28.225193 DO 10.1101/2020.07.28.225193 A1 Mendez, Mickaël A1 Scott, Michelle S. A1 Hoffman, Michael M. YR 2021 UL http://biorxiv.org/content/early/2021/11/22/2020.07.28.225193.abstract AB Background: Exploratory analysis of complex transcriptomic data presents multiple challenges. Many methods rely on preexisting gene annotations, impeding identification and characterization of new transcripts. Even for a single cell type, comprehending the diversity of RNA species transcribed at each genomic region requires combining multiple datasets, each enriched for specific types of RNA. Currently, examining combinatorial patterns in these data requires time-consuming visual inspection using a genome browser. Method: We developed a new segmentation and genome annotation (SAGA) method, SegRNA, that integrates data from multiple transcriptome profiling assays. SegRNA identifies recurring combinations of signals across multiple datasets measuring the abundance of transcribed RNAs. Using complementary techniques, SegRNA builds on the Segway SAGA framework by learning parameters from both the forward and reverse DNA strands. SegRNA’s unsupervised approach allows exploring patterns in these data without relying on preexisting transcript models. Results: We used SegRNA to generate the first unsupervised transcriptome annotation of the K562 chronic myeloid leukemia cell line, integrating multiple types of RNA data. Combining RNA-seq, CAGE, and PRO-seq experiments captured a diverse population of RNAs throughout the genome. As expected, SegRNA annotated patterns associated with gene components such as promoters, exons, and introns. Additionally, we identified a pattern enriched for novel small RNAs transcribed within intergenic, intronic, and exonic regions.
We applied SegRNA to FANTOM6 CAGE data characterizing 285 lncRNA knockdowns. Overall, SegRNA efficiently summarizes diverse multi-experiment data. Competing Interest Statement: The authors have declared no competing interest.