PT - JOURNAL ARTICLE
AU - Dehghannasiri, Roozbeh
AU - Henderson, George
AU - Bierman, Rob
AU - Chaung, Kaitlin
AU - Baharav, Tavor
AU - Wang, Peter
AU - Salzman, Julia
TI - Unsupervised reference-free inference reveals unrecognized regulated transcriptomic complexity in human single cells
AID - 10.1101/2022.12.06.519414
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.12.06.519414
4099 - http://biorxiv.org/content/early/2022/12/07/2022.12.06.519414.short
4100 - http://biorxiv.org/content/early/2022/12/07/2022.12.06.519414.full
AB - Myriad mechanisms diversify the sequence content of eukaryotic transcripts at the DNA and RNA level with profound functional consequences. Examples include diversity generated by RNA splicing and V(D)J recombination. Today, these and other events are detected with fragmented bioinformatic tools that require predefining a form of transcript diversification; moreover, they rely on alignment to a necessarily incomplete reference genome, filtering out unaligned sequences, which can be among the most interesting. Each of these steps introduces blind spots for discovery. Here, we develop NOMAD+, a new analytic method that performs unified, reference-free statistical inference directly on raw sequencing reads, extending the core NOMAD algorithm to include a micro-assembly and interpretation framework. NOMAD+ discovers broad and new examples of transcript diversification in single cells, bypassing genome alignment and without requiring cell type metadata; such discoveries are impossible with current algorithms. In 10,326 primary human single cells in 19 tissues profiled with SmartSeq2, NOMAD+ discovers a set of splicing and histone regulators with highly conserved intronic regions that are themselves targets of complex splicing regulation, as well as unreported transcript diversity in the heat shock protein HSP90AA1.
NOMAD+ simultaneously discovers diversification in centromeric RNA expression, V(D)J recombination, RNA editing, and repeat expansions missed by or impossible to measure with existing bioinformatic methods. NOMAD+ is a unified, highly efficient algorithm enabling unbiased discovery of an unprecedented breadth of RNA regulation and diversification in single cells through a new paradigm to analyze the transcriptome. Competing Interest Statement: The authors have declared no competing interest.