Abstract
The utility of single-cell RNA sequencing (scRNA-seq) is premised on the notion that transcriptional state can faithfully reflect cell phenotype. However, scRNA-seq measurements are noisy and sparse, with individual transcript counts showing limited correlation with cell phenotype markers such as protein expression. To better characterize cell states from scRNA-seq data, researchers analyze gene programs (sets of covarying genes) rather than individual transcripts. We hypothesized that more accurate estimation of gene covariation, especially at a local (i.e., cell-state) rather than global (i.e., experimental) scale, could better capture cell phenotypes. However, the field lacks appropriate mathematical frameworks for analyzing gene covariation: coexpression is quantified as a symmetric positive-definite matrix, where even basic operations like arithmetic differences lack biological interpretability. Here we introduce Sceodesic, which exploits the Riemannian manifold structure of gene coexpression matrices to quantify cell state-specific coexpression patterns using the log-Euclidean metric from differential geometry. Unlike principal components analysis and non-negative matrix factorization, which infer only global covariation, Sceodesic efficiently discovers local covariation patterns and organizes them into interpretable, linear gene programs. Sceodesic outperforms existing approaches in predicting protein expression levels, distinguishing transcriptional responses to gene perturbations, and identifying biologically meaningful programs in fetal development. By respecting the mathematical structure of gene coexpression, Sceodesic bridges the gap between biological variability and statistical analysis of scRNA-seq data, enabling more accurate characterization of cell phenotypes. Software availability: https://singhlab.net/Sceodesic
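For readers unfamiliar with the log-Euclidean metric, the following is a minimal sketch of the general idea, not Sceodesic's implementation: each symmetric positive-definite coexpression matrix is mapped through the matrix logarithm, and distances are then taken as ordinary Frobenius norms in that log space. The function names `spd_log` and `log_euclidean_distance` and the eigenvalue floor `eps` are illustrative assumptions and do not come from the Sceodesic software.

```python
import numpy as np


def spd_log(C, eps=1e-6):
    """Matrix logarithm of a symmetric positive-definite matrix.

    Uses an eigendecomposition; eigenvalues are floored at `eps`
    (an assumed regularization) so the logarithm stays finite.
    """
    C = (C + C.T) / 2.0            # enforce exact symmetry
    w, V = np.linalg.eigh(C)       # C = V diag(w) V^T
    w = np.clip(w, eps, None)
    return (V * np.log(w)) @ V.T   # V diag(log w) V^T


def log_euclidean_distance(A, B):
    """Frobenius norm of the difference between matrix logarithms."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")


# Toy example: two 2-gene coexpression matrices (hypothetical values).
A = np.eye(2)
B = np.e * np.eye(2)
d = log_euclidean_distance(A, B)   # log(A) = 0, log(B) = I, so d = sqrt(2)
```

Because the logarithm maps coexpression matrices into a vector space, differences and averages taken there remain symmetric matrices and map back to valid positive-definite matrices under the matrix exponential, which is not guaranteed for arithmetic differences of the original matrices.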
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This revision adds analyses of additional datasets. The whole manuscript was edited as well.