Abstract
The proliferation of single cell transcriptomics has expanded our ability to uncover patterns that reflect dynamic cellular processes such as the regulation of gene transcription. In this study, we leverage a broad collection of single cell RNA-seq data to identify the gene partners whose expression is most coordinated with each human and mouse transcription regulator (TR). We assembled 120 human and 103 mouse scRNA-seq datasets from the literature (>28 million cells) and constructed a single cell coexpression network for each. We aimed to understand the consistency of TR coexpression profiles across a broad sampling of biological contexts, rather than to examine the preservation of context-specific signals. Our workflow therefore explicitly prioritizes the patterns that are most reproducible across cell types. Towards this goal, we characterize the similarity of each TR's coexpression within and across species. We create single cell coexpression rankings for each TR and demonstrate that this aggregated information recovers literature-curated targets on par with ChIP-seq data. We then combine the coexpression and ChIP-seq information to identify candidate regulatory interactions supported across methods and species. Finally, we highlight interactions for the important neural TR ASCL1 to demonstrate how our compiled information can be adopted for community use.
Author Summary
A common way to analyze gene expression (transcriptomics) data is to correlate gene transcript levels across samples for every pair of genes (coexpression). Coordinated expression between genes may imply a shared biological function, though this warrants cautious interpretation because it rests on assumptions about cellular processes inferred from RNA abundances alone. Still, coexpression inference is often used to nominate genes whose expression may be controlled by transcription regulators (TRs). The rapid generation of diverse single cell transcriptomics data has unlocked our ability to discover coexpression patterns across individual cells, though these signals are often noisy. Reproducible patterns across studies can help distinguish meaningful biological relationships from spurious correlations. In this study, we analyzed a broad collection of single cell data spanning numerous tissues in human and mouse to infer global TR coexpression patterns. We aimed to learn which interactions were generally observable, to better support future examinations of reproducible coexpression in specific contexts. We evaluate the predictive performance of these global single cell coexpression rankings using independent gene regulation evidence, and highlight TR-gene pairs that are supported across data modalities as well as species. By disseminating these rankings, we hope that other researchers can extract insight for their own TRs of interest.
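The coexpression idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the choice of Pearson correlation, the mean-rank aggregation, the function names, and the toy expression values and gene labels (`GENE_A`, `GENE_B`, `GENE_C`) are all assumptions made for illustration. The sketch correlates one TR with every other gene within each dataset, ranks the genes, and then averages those ranks across datasets so that reproducible partners rise to the top.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a gene with no variation is treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def tr_ranks(dataset, tr):
    """Rank genes by coexpression with `tr` in one dataset (1 = strongest).

    `dataset` maps a gene name to its expression values across cells.
    """
    scores = {g: pearson(dataset[tr], v) for g, v in dataset.items() if g != tr}
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def aggregate_ranks(datasets, tr):
    """Order genes by their mean rank across all datasets that measure `tr`."""
    per_dataset = [tr_ranks(d, tr) for d in datasets if tr in d]
    shared = set.intersection(*(set(r) for r in per_dataset))
    mean_rank = {g: sum(r[g] for r in per_dataset) / len(per_dataset) for g in shared}
    return sorted(mean_rank, key=lambda g: (mean_rank[g], g))


# Hypothetical toy data: two "datasets", four cells each.
toy_datasets = [
    {"ASCL1": [1, 2, 3, 4], "GENE_A": [2, 4, 6, 8],
     "GENE_B": [4, 3, 2, 1], "GENE_C": [1, 3, 2, 4]},
    {"ASCL1": [0, 1, 0, 1], "GENE_A": [1, 3, 1, 2],
     "GENE_B": [2, 0, 2, 1], "GENE_C": [5, 5, 5, 5]},
]

print(aggregate_ranks(toy_datasets, "ASCL1"))
```

Averaging ranks rather than raw correlations is one simple way to keep a single dataset with unusually strong correlations from dominating the aggregate; other aggregation schemes would be equally valid for this illustration.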
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
- Added saturation analysis (shown in Fig. 2; moved GO to supplement)
- Analysis updated to Animal TFDB V4 and DIOPT V9
- Supp. Fig. 2 goes over dataset relationship to global signal
https://borealisdata.ca/dataset.xhtml?persistentId=doi:10.5683/SP3/HJ1B24