scDVF: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations

Recent advances in single-cell RNA sequencing technology provided unprecedented opportunities to simultaneously measure the gene expression profile and the transcriptional velocity of individual cells, enabling us to sample gene regulatory network dynamics along developmental trajectories. However, traditional methods have been challenged in offering a fundamental and quantitative explanation of the dynamics as differential equations due to the high dimensionality, sparsity, and complex gene interactions. Here, we present scDVF, a neural-network-based ordinary differential equation that can learn to model non-linear, high-dimensional single-cell transcriptome dynamics and describe gene expression changes of individual cells across time. We applied scDVF on multiple published datasets from different technical platforms and demonstrate its utility to 1) formulate transcriptome dynamics of different timescales; 2) measure the instability of individual cell states; and 3) identify developmental driver genes upstream of the signaling cascade. Benchmarking with state-of-the-art methods shows that scDVF can improve velocity field representation accuracy by at least 50% in out-of-sample cells. Further, our perturbation studies revealed that single-cell dynamical systems may exhibit properties similar to chaotic systems. In summary, scDVF allows for the data-driven discovery of differential equations that delineate single-cell transcriptome dynamics. Teaser Using neural networks to derive the ordinary differential equations behind single-cell transcriptome dynamics.

a fundamental and quantitative explanation of the dynamics as differential equations due to 23 the high dimensionality, sparsity, and complex gene interactions. Here, we present scDVF, 24 a neural-network-based ordinary differential equation that can learn to model non-linear, 25 high-dimensional single-cell transcriptome dynamics and describe gene expression changes 26 of individual cells across time. We applied scDVF on multiple published datasets from 27 different technical platforms and demonstrate its utility to 1) formulate transcriptome 28 dynamics of different timescales; 2) measure the instability of individual cell states; and 3) 29 identify developmental driver genes upstream of the signaling cascade. Benchmarking with 30 state-of-the-art methods shows that scDVF can improve velocity field representation 31 accuracy by at least 50% in out-of-sample cells. Further, our perturbation studies revealed 32 that single-cell dynamical systems may exhibit properties similar to chaotic systems. In 33 summary, scDVF allows for the data-driven discovery of differential equations that 34 delineate single-cell transcriptome dynamics. . Moreover, single-cell dynamical systems have a high degrees-of-freedom due to the 67 high dimensionality of the data, which could lead to errors in any dimension (8).

68
Inspired by recent developments in neural ODEs and data-driven dynamical systems (9, 10), 69 we present a computational framework called scDVF that learns to formulate the dynamics silico studies to explore the behavior of biological processes over time. In this regard, 74 scDVF differs substantially from most single-cell methods, in that the objective of our 75 framework is to derive neural-network-based differential equations describing single-cell 76 gene expression dynamics. To illustrate the robustness and general validity of our approach, 77 we performed analyses on developmental mouse pancreatic endocrinogenesis, dentate 78 gyrus, and developing neocortex, representing scRNA-seq experiments from different 79 tissues, technical platforms, and developmental time scales (11,12). With two additional 80 data sources (mouse gastrulation, developing human forebrain), we demonstrate the ability 81 for scDVF to deconvolve gene co-expression networks and benchmarked our method 82 against the state-of-the-art vector-field learning approach SparseVCF (4,13,14 Hence, we evaluated scDVF with an out-of-sample initial condition. After training the VAE 123 on neocortex cell states and velocities, we generated an out-of-sample cell as the initial 124 condition. The out-of-sample cell is simulated by adding noise to the gene expression state 125 of an existing cell, thereby representing a cell state that did not previously exist in the data. . c) The CCI reveals unstable fixed points indicative of cell fate commitment. d) Simulating dynamic trajectories can effectively denoise gene co-expression networks in sparse and noisy scRNAseq experiments, especially when compared to the static baseline. e) Genes that highly correlate with the CCI reveal driving forces behind endocrine progenitor dynamics. In particular, Neurog3, which positively correlates with the CCI, is a known pre-endocrine master regulator, and ATP1b1, which negatively correlates with the CCI, has an important function in maintaining homeostasis in stable and stationary cell types. These genes support the CCI as a metric for the stability of single-cell states. f) Differential gene expression analysis on the simulated pre-endocrine cells reveals key putative genes that correlate with the fate of transforming into an alpha versus a beta cell. Early perturbation of the top differentially expressed genes associated with an alpha cell fate resulted in a higher proportion of alpha cells from the perturbed pre-endocrine cells, suggesting a causal relationship through in silico studies.  short, we randomly sampled pre-endocrine cells ( = 1,000) as the initial conditions. By 175 allowing these simulated pre-endocrine cells to naturally evolve according to the dynamics 176 learned by scDVF, we observed a baseline 9:1 ratio of terminal beta versus alpha cell states.

177
The ratio of terminal cell states indicates that the beta cell state is a stronger attractive 178 terminal state than the alpha cell state, which corroborates with previous conclusions (19).

179
Then, we performed differential gene expression between initial conditions of different c) Visualizing the low-dimensional manifold of the high-degrees-of-freedom single-cell dynamical system, with the simulated out-of-sample cell trajectory (in red and viridis). d) The CCI reveals unstable fixed points indicative of cell fate commitment. e) Simulating dynamic trajectories can effectively denoise gene co-expression networks in sparse and noisy scRNAseq experiments, especially when compared to the static baseline. f) Genes that highly correlate with the CCI reveal driving forces behind dentate gyrus dynamics. In particular, IGFBPL-1, which most positively correlates with the CCI, regulates neurodevelopment, and GRM5, which most negatively correlates with the CCI, encodes glutamate receptors in stable and stationary neurons. These genes further substantiate the CCI as a metric for the stability of single-cell states. g) Differential gene expression analysis on the simulated pre-endocrine cells reveals key putative genes that correlate with the fate of transforming into a pyramidal versus granule cells. Early perturbation of the top differentially expressed genes associated with a pyramidal cell fate resulted in a higher proportion of pyramidal cells from the perturbed Nbl2 cells, suggesting a causal relationship through in silico studies.
Page 8 of 19 using droplet-based scRNA-seq (Fig. 3a) (11). After obtaining a neural network 200 representation of the dentate gyrus dynamics, an out-of-sample cell was simulated by 201 perturbing the gene expression state of an Nbl2 cell. With the out-of-sample cell as the initial 202 condition, we used scDVF to simulate an out-of-sample cell trajectory, which moved along 203 the existing granule cell trajectory in the data (Fig. 3b). Further, the VAE embeddings 204 properly encoded the developmental hierarchy of cell types in the low-dimensional dynamic 205 manifold (Fig. 3c).

206
When examining critical cell states in the dentate gyrus, we observed an abrupt gene 207 expression shift in the developmental manifold, which can be visualized when ordering cells 208 in latent time (Fig. 3d). Specifically, the change in gene expression marks the transition 209 from nIPC to Nbl1 cells and suggests fate commitment during the transition. After  (Fig. 4a). The latent layer embeddings also properly encoded the 245 progression from neural progenitors to differentiated neurons (Fig. 4b).
In addition to the latent time ordering of cells, we also observed high criticality in IPs and associated with scRNA-seq (Fig. 5a). We hypothesize that cells in denoised trajectories 272 simulated from scDVF (with a representative initial condition) could amplify the 273 correlations within functional gene modules (Fig. 5b). Indeed, the gene co-expression 274 network of cells in retrograde trajectories has more significant gene correlations compared 275 to co-expression networks from static cells (Fig. 2e, Fig. 3e). Further, we biclustered the co- orders of magnitude higher enrichment for cell-type-specific GO terms compared to static 281 cell clusters (Fig. 5c) Explicitly deriving differential equations for all gene interactions is a challenging task.

308
Therefore, we tackled the problem with a data-driven approach. We considered each cell in 309 Fig. 5. Disentangling trajectory-specific gene co-expression networks. a) Schematic for reducing variability along a developmental trajectory due to sparsity and noise in scRNA-seq experiments with denoising VAEs in scDVF. b) We used a representative initial condition (e.g., the median expression profile for a cluster of cells) to simulate denoised cell trajectories according to the dynamics learned by scDVF. Compared to static cell clusters, the dynamic cells in denoised trajectories have stronger correlation between genes, which leads to better coherence between functional gene modules. c) GO enrichment on cell-type-specific terms from the most significant functional gene module. Our method improves upon existing gene co-expression network approaches on cell-type-specific GO term enrichment by at least two orders of magnitude. d) Benchmarking with state-of-the-art vector field learning method SparseVFC shows that scDVF can improve out-of-sample velocity vector prediction accuracy by at least 50%.
Page 12 of 19 scRNA-seq as an instance sampled from a dynamic system, composed of a state vector

Data Collection and Preprocessing
358 scRNA-seq data (pancreatic endocrinogenesis, dentate gyrus, mouse gastrulation, and 359 human forebrain) were downloaded. After computing the gene expression count matrix.

360
The top 3,000 variable genes and cells with a minimum of 20 transcripts were selected.

361
Velocity genes were found using log-transformed data, and the moments were estimated  In Silico Perturbation Studies 485 We divide an in silico perturbation study into three steps:

530
Hierarchical biclustering was performed on the co-expression matrices, and three gene 531 clusters were identified from each co-expression matrix, representing three functional 532 modules. We performed GO enrichment analysis on each functional module using 533 GOATOOLS with Fisher's exact test (48). Further, we calculated the Benjamin-Hochberg 534 false discovery rates to correct for multiple testing. To compare between two co-535 expression matrices, we considered the most significant enrichment out of the three 536 clusters for each GO term. In Fig. 3a and Fig. 5a, the most significantly enriched GO 537 terms associated with biological processes are listed next to each gene cluster.