## 0 Abstract

RNA velocity analysis can predict cell state changes from single-cell transcriptomics data. To interpret these cell state changes as part of underlying cellular trajectories, current approaches rely on visualization with 2D embeddings derived from principal components or t-distributed stochastic neighbor embedding, among others. However, these 2D embeddings can yield different representations of the underlying trajectories, hindering the interpretation of cell state changes. To address this challenge, we developed VeloViz to create RNA-velocity-informed 2D embeddings. We show that by taking into consideration the predicted future transcriptional states from RNA velocity analysis, VeloViz can help ensure a more reliable representation of underlying cellular trajectories. VeloViz is available as an R package at https://github.com/JEFworks-Lab/veloviz.

## 1 Introduction

Single-cell transcriptomics provides a static snapshot of transcriptional states for individual cells. The continuum of transcriptional states for cells along dynamic processes can be used to infer how cell states may change over time (Tritschler *et al*., 2019; Saelens *et al*., 2019). Notably, RNA velocity analysis can be applied to infer dynamics of gene expression and predict the future transcriptional state of a cell from single-cell RNA-sequencing and imaging data (La Manno *et al*., 2018; Xia *et al*., 2019).

To interpret cell state changes from RNA velocity analysis, current approaches project the observed current and predicted future transcriptional states onto 2-dimensional (2D) embeddings to visualize the putative directed cellular trajectory (La Manno *et al*., 2018; Zywitza *et al*., 2018; Bastidas-Ponce *et al*., 2019; Zhang *et al*., 2019). Previously used 2D embeddings include those derived from principal components (PC), t-distributed Stochastic Neighbor Embeddings (t-SNE), Uniform Manifold Approximation and Projection (UMAP), or diffusion maps (Coifman *et al*., 2005; Maaten and Hinton, 2008; McInnes *et al*., 2018). However, these approaches can yield different representations of the underlying trajectory. Furthermore, when intermediate cell states are not well represented, current 2D embeddings may not capture global relationships between cell subpopulations, thereby further hindering the interpretation of cell state changes (Kester and Oudenaarden, 2018; Weinreb *et al*., 2018).

Here, we developed VeloViz to visualize cellular trajectories by incorporating information from RNA velocity analysis. By taking into consideration cells’ predicted future transcriptional states inferred from RNA velocity analysis, VeloViz can help ensure that relationships between cell states are reflected in the 2D embedding, allowing for more reliable representation of underlying cellular trajectories.

## 2 Method

To create an RNA-velocity-informed 2D embedding, VeloViz uses each cell’s current observed and predicted future transcriptional states inferred from RNA velocity analysis to build a nearest neighbor graph between cells in the population (Figure 1A). Briefly, VeloViz computes a cell-cell composite distance between all cell pairs in the population (Figure 1A, Supplementary Information 1ii) and assigns graph edges from each cell to the *k* neighboring cells with the smallest composite distances. Edges are then pruned based on similarity thresholds (Supplementary Information 1iii). The resulting graph can be visualized as a 2D embedding using force-directed layout algorithms (Fruchterman and Reingold, 1991).
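The graph construction described above can be illustrated with a minimal Python sketch. It is not the VeloViz implementation, which is an R package and whose exact composite distance and pruning rules are given in the Supplementary Information. Here we assume, purely for illustration, a composite distance that adds the Euclidean distance between cell *i*'s projected state and cell *j*'s observed state to a penalty based on the cosine similarity between cell *i*'s velocity and the displacement from *i* to *j*. The function names and the parameters `weight` and `min_similarity` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def direction_similarity(cur_i, proj_i, cur_j):
    """Agreement between cell i's velocity and the direction from i to j."""
    velocity = [p - c for p, c in zip(proj_i, cur_i)]
    displacement = [t - c for t, c in zip(cur_j, cur_i)]
    return cosine(velocity, displacement)


def composite_distance(cur_i, proj_i, cur_j, weight=1.0):
    """Assumed (hypothetical) composite distance from cell i to cell j.

    Small when j lies close to i's projected future state and in the
    direction of i's velocity.
    """
    penalty = 1.0 - direction_similarity(cur_i, proj_i, cur_j)
    return math.dist(proj_i, cur_j) + weight * penalty


def velocity_knn_graph(current, projected, k=2, min_similarity=None):
    """Directed edges (i, j, distance) to each cell's k nearest neighbors.

    Edges whose direction similarity falls below min_similarity are pruned.
    """
    edges = []
    for i in range(len(current)):
        candidates = sorted(
            (composite_distance(current[i], projected[i], current[j]), j)
            for j in range(len(current)) if j != i
        )
        for dist, j in candidates[:k]:
            sim = direction_similarity(current[i], projected[i], current[j])
            if min_similarity is None or sim >= min_similarity:
                edges.append((i, j, dist))
    return edges


# Toy example: four cells on a line, each predicted to move one unit right.
current = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
projected = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
edges = velocity_knn_graph(current, projected, k=1, min_similarity=0.0)
print([(i, j) for i, j, _ in edges])  # prints [(0, 1), (1, 2), (2, 3)]
```

In this toy example the last cell's nearest neighbor lies against its velocity, so that edge is pruned and the remaining edges follow the direction of the trajectory. The resulting edge list is the kind of input a force-directed layout would take to produce 2D coordinates.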

## 3 Results

### 3.1 Comparing VeloViz to other embeddings

To evaluate the performance of VeloViz, we first assessed VeloViz’s ability to capture trajectories in simulated data representing cycling or branching trajectories (Supplementary Information 2i) and compared it to PC, t-SNE, UMAP, and diffusion map embeddings. We calculated a trajectory consistency (TC) score (Supplementary Information 2ii; Boggust *et al*., 2019), where TC scores closer to 1 indicate more accurate representations of the ground truth trajectory. Among evaluated trajectories, VeloViz embeddings had consistently high TC scores (Supplementary Figure 1). Next, we used VeloViz to visualize pancreatic endocrinogenesis single-cell RNA-sequencing (scRNA-seq) data, where cycling ductal cells give rise to endocrine progenitor-precursor (EP) cells, which then differentiate into hormone-producing endocrine cell types (Alpha, Beta, Delta, and Epsilon cells) (Bastidas-Ponce *et al*., 2019). We observed that while all evaluated embeddings captured the trajectory of endocrine progenitors, VeloViz was better able to capture the cycling structure of ductal cells (Supplementary Figure 2). VeloViz, UMAP, and t-SNE also captured the terminal branching differentiation into the different endocrine cell types, which was not clear in the PC or diffusion map embeddings. Overall, VeloViz is able to capture trajectories of diverse topologies.

### 3.2 Performance with incomplete trajectories

To evaluate the performance of VeloViz in visualizing trajectories with missing intermediate cell states, we used simulated and real scRNA-seq data where some intermediate cells were removed, creating a trajectory gap. Because t-SNE and UMAP preferentially preserve local cell-cell relationships, we expected that these embeddings would result in two distinct clusters of cells before and after the simulated gap (Kobak and Berens, 2019; Heiser and Lau, 2020). Therefore, in addition to TC scores, we calculated a gap distance (Supplementary Information 2iii), which measures the distance in the 2D embedding space between cells before and after the simulated gap in the trajectory. Embeddings that preserve the underlying trajectory despite this simulated gap will have a smaller gap distance. Indeed, for the cycling trajectory where cells corresponding to a segment of the cycle were removed, VeloViz was the only embedding able to preserve the cycling structure. Likewise, for branching trajectories with missing intermediates, only the VeloViz and PC embeddings were able to preserve the underlying topology, while t-SNE and UMAP split cells before and after the simulated gap into distinct clusters as expected (Supplementary Figure 3). TC scores were consistently higher and gap distances smaller for VeloViz than for t-SNE, UMAP, and diffusion map. Similarly, for the pancreatic endocrinogenesis scRNA-seq data, we removed pre-endocrine cells and used cell latent time (Bergen *et al*., 2020) to identify cells before and after pre-endocrine cells in the developmental trajectory and to calculate gap distances (Supplementary Information 2iii). Notably, the transition from endocrine progenitors into terminal endocrine cell types was best captured by VeloViz. As expected, t-SNE and UMAP split ductal and endocrine progenitor cells from terminal endocrine cell types, which is reflected in the gap distances (Figure 1B-F). Overall, VeloViz is able to provide a more reliable representation of underlying trajectories even when intermediate cell states are missing.
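To make the idea of a gap distance concrete, the following Python sketch shows one way such a measure could be computed. The definition used in this work is given in Supplementary Information 2iii and may differ. Here we assume, for illustration only, that the gap distance is the median distance from each cell before the gap to its nearest cell after the gap, divided by the diagonal of the embedding's bounding box so that embeddings with different scales are comparable. The function name `gap_distance` and the normalization are hypothetical.

```python
import math
from statistics import median


def gap_distance(embedding, before, after):
    """Assumed (hypothetical) gap distance in a 2D embedding.

    embedding: list of (x, y) coordinates, one per cell
    before, after: indices of cells on either side of the trajectory gap
    """
    # Distance from each pre-gap cell to its closest post-gap cell.
    nearest = [
        min(math.dist(embedding[i], embedding[j]) for j in after)
        for i in before
    ]
    # Normalize by the overall extent of the embedding.
    xs = [x for x, _ in embedding]
    ys = [y for _, y in embedding]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    return median(nearest) / diagonal


# Two toy embeddings of the same four cells: cells 0-1 precede the gap,
# cells 2-3 follow it.
connected = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
split = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]

print(gap_distance(connected, [0, 1], [2, 3]))  # prints 0.5
print(gap_distance(split, [0, 1], [2, 3]))      # prints 0.85
```

Under this assumed definition, the embedding that separates the two groups into distant clusters receives the larger value, matching the expectation that embeddings preserving the trajectory across the gap have smaller gap distances.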

## Funding

This work was supported by the National Institutes of Health [T32GM136577 to L.A.].

## Conflict of Interest

None declared.