RT Journal Article
SR Electronic
T1 Unsupervised reference-free inference reveals unrecognized regulated transcriptomic complexity in human single cells
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.12.06.519414
DO 10.1101/2022.12.06.519414
A1 Dehghannasiri, Roozbeh
A1 Henderson, George
A1 Bierman, Rob
A1 Chaung, Kaitlin
A1 Baharav, Tavor
A1 Wang, Peter
A1 Salzman, Julia
YR 2022
UL http://biorxiv.org/content/early/2022/12/07/2022.12.06.519414.abstract
AB Myriad mechanisms diversify the sequence content of eukaryotic transcripts at the DNA and RNA levels, with profound functional consequences. Examples include diversity generated by RNA splicing and V(D)J recombination. Today, these and other events are detected with fragmented bioinformatic tools that require predefining a form of transcript diversification; moreover, they rely on alignment to a necessarily incomplete reference genome, filtering out unaligned sequences, which can be among the most interesting. Each of these steps introduces blind spots for discovery. Here, we develop NOMAD+, a new analytic method that performs unified, reference-free statistical inference directly on raw sequencing reads, extending the core NOMAD algorithm to include a micro-assembly and interpretation framework. NOMAD+ discovers broad and new examples of transcript diversification in single cells, bypassing genome alignment and without requiring cell type metadata; these discoveries are impossible with current algorithms. In 10,326 primary human single cells in 19 tissues profiled with SmartSeq2, NOMAD+ discovers a set of splicing and histone regulators with highly conserved intronic regions that are themselves targets of complex splicing regulation, as well as unreported transcript diversity in the heat shock protein HSP90AA1. NOMAD+ simultaneously discovers diversification in centromeric RNA expression, V(D)J recombination, RNA editing, and repeat expansions that are missed by, or impossible to measure with, existing bioinformatic methods.
NOMAD+ is a unified, highly efficient algorithm enabling unbiased discovery of an unprecedented breadth of RNA regulation and diversification in single cells through a new paradigm to analyze the transcriptome. Competing Interest Statement: The authors have declared no competing interest.