0 Abstract
RNA velocity analysis can predict cell state changes from single-cell transcriptomics data. To interpret these cell state changes as part of underlying cellular trajectories, current approaches rely on visualization with 2D embeddings derived from methods such as principal component analysis and t-distributed stochastic neighbor embedding. However, these 2D embeddings can yield different representations of the underlying trajectories, hindering the interpretation of cell state changes. To address this challenge, we developed VeloViz to create RNA-velocity-informed 2D embeddings. We show that by taking into consideration the predicted future transcriptional states from RNA velocity analysis, VeloViz can help ensure a more reliable representation of underlying cellular trajectories. VeloViz is available as an R package at https://github.com/JEFworks-Lab/veloviz.
1 Introduction
Single-cell transcriptomics provides a static snapshot of transcriptional states for individual cells. The continuum of transcriptional states for cells along dynamic processes can be used to infer how cell states may change over time (Tritschler et al., 2019; Saelens et al., 2019). Notably, RNA velocity analysis can be applied to infer dynamics of gene expression and predict the future transcriptional state of a cell from single-cell RNA-sequencing and imaging data (La Manno et al., 2018; Xia et al., 2019).
To interpret cell state changes from RNA velocity analysis, current approaches project the observed current and predicted future transcriptional states onto 2-dimensional (2D) embeddings to visualize the putative directed cellular trajectory (La Manno et al., 2018; Zywitza et al., 2018; Bastidas-Ponce et al., 2019; Zhang et al., 2019). Previously used 2D embeddings include those derived from principal components (PC), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), or diffusion maps (Coifman et al., 2005; Maaten and Hinton, 2008; McInnes et al., 2018). However, these approaches can yield different representations of the same underlying trajectory. Furthermore, when intermediate cell states are not well represented, current 2D embeddings may not capture global relationships between cell subpopulations, thereby further hindering the interpretation of cell state changes (Kester and Oudenaarden, 2018; Weinreb et al., 2018).
Here, we developed VeloViz to visualize cellular trajectories by incorporating information from RNA velocity analysis. By taking into consideration cells’ predicted future transcriptional states inferred from RNA velocity analysis, VeloViz can help ensure that relationships between cell states are reflected in the 2D embedding, allowing for more reliable representation of underlying cellular trajectories.
2 Method
To create an RNA-velocity-informed 2D embedding, VeloViz uses each cell’s current observed and predicted future transcriptional states inferred from RNA velocity analysis to build a nearest neighbor graph between cells in the population (Figure 1A). Briefly, VeloViz computes a cell-cell composite distance between all cell pairs in the population (Figure 1A, Supplementary Information 1ii) and assigns graph edges from each cell to the k neighboring cells with the smallest composite distances. Edges are then pruned based on similarity thresholds (Supplementary Information 1iii). The resulting graph can be visualized as a 2D embedding using force-directed layout algorithms (Fruchterman and Reingold, 1991).
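To illustrate the final step, the following is a minimal sketch of a Fruchterman-Reingold style force-directed layout, written in plain Python. It is not the VeloViz implementation: the function name `force_directed_layout`, the unit-area frame, the linear cooling schedule, and the use of edge weights to scale attraction are all assumptions made for illustration.

```python
import math
import random

def force_directed_layout(n_nodes, edges, iterations=300, seed=0):
    """Lay out a graph in 2D with a basic Fruchterman-Reingold scheme.

    Hypothetical helper for illustration only.
    edges: iterable of (i, j, weight); weights scale the attraction.
    Returns a list of [x, y] positions, one per node.
    """
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n_nodes)]
    k = math.sqrt(1.0 / n_nodes)  # ideal node spacing for a unit-area frame
    temperature = 0.1             # maximum step length (assumed start value)
    cooling = temperature / iterations

    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n_nodes)]

        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f

        # Attraction: connected nodes pull together with force w * d^2 / k.
        for i, j, w in edges:
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = w * d * d / k
            disp[i][0] -= dx / d * f
            disp[i][1] -= dy / d * f
            disp[j][0] += dx / d * f
            disp[j][1] += dy / d * f

        # Move each node along its net displacement, capped by temperature.
        for i in range(n_nodes):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            step = min(length, temperature)
            pos[i][0] += disp[i][0] / length * step
            pos[i][1] += disp[i][1] / length * step

        temperature = max(temperature - cooling, 0.0)
    return pos
```

In this sketch, cells joined by an edge are drawn close together while unconnected groups drift apart, which is the property that lets a velocity-informed graph shape the 2D embedding.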
Figure 1. A) Workflow to create a VeloViz 2D embedding: 1) observed current (Xc) and predicted future (Xp) transcriptional cell states inferred from RNA velocity are reduced into a common PC space; 2) composite distances (D) between all cell pairs are computed. The composite distance from Cell A to Cell X (DA→X) takes into account the similarity in transcriptional profiles (dAX) between Cell X’s observed current state (Xc) and Cell A’s predicted future state (Ap), and the cosine correlation between Cell A’s RNA velocity (νA) and the change vector (tAX) representing a transition from Cell A’s current state (Ac) to Cell X’s current state (Xc). A distance weight (ω) adjusts the relative importance of transcriptional similarity and cosine correlation in the composite distance; 3) for each cell, graph edges are assigned to the k cells with the minimum composite distances to create a graph. Edge weights are computed from composite distances as weightAB = max(D) - DAB; 4) edges assigned in step 3 are removed (in grey, dashed) if they are above the similarity and/or distance thresholds. Edge shade corresponds to edge weight, with darker arrows representing edges with larger weights; 5) the resulting graph is visualized as a 2D embedding using a force-directed graph layout. B) VeloViz 2D embedding visualizing pancreatic endocrinogenesis with pre-endocrine intermediates removed, creating a gap in the developmental trajectory. Inset shows the VeloViz embedding of the full dataset. Cells are colored by cell state annotations provided in Bergen et al. (2020). Arrows show the projection of velocities derived from dynamical velocity modelling onto the VeloViz embeddings. Gap distances measure the median distance in the 2D embedding between the 300 cells before and after pre-endocrine cells in the developmental trajectory (Supplementary Information 2iii). White circle and square indicate the median coordinates of cells before and after pre-endocrine cells in the developmental trajectory, respectively. C-F) 2D embeddings visualizing pancreatic endocrinogenesis with pre-endocrine intermediates removed, using PCA, t-SNE, UMAP, and diffusion mapping, respectively, with arrows showing the projection of velocities derived from dynamical velocity modelling.
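Steps 2 and 3 of the workflow can be sketched in plain Python as follows. The exact formula for the composite distance is given in the Supplementary Information and is not reproduced here, so the additive form D = ω·d − cos used below is an assumption chosen only to show how the two terms could be combined. The function names `composite_distances` and `knn_edges` are hypothetical. Edge pruning (step 4) is omitted because the thresholds are not specified in the text.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def composite_distances(current, projected, omega=1.0):
    """Directed composite distance D[a][x] from cell a to cell x.

    current[a] and projected[a] are cell a's observed and predicted
    states in a shared reduced space. The combination rule
    omega * d - cos is an ASSUMED form, not the published formula.
    """
    n = len(current)
    D = [[math.inf] * n for _ in range(n)]
    for a in range(n):
        velocity = [p - c for p, c in zip(projected[a], current[a])]
        for x in range(n):
            if x == a:
                continue
            # d_AX: distance from A's predicted state to X's current state
            d_ax = math.dist(projected[a], current[x])
            # t_AX: change vector from A's current state to X's current state
            t_ax = [xc - ac for xc, ac in zip(current[x], current[a])]
            D[a][x] = omega * d_ax - cosine(velocity, t_ax)
    return D

def knn_edges(D, k):
    """For each cell keep the k smallest composite distances as edges.

    Edge weight follows the caption: weight_AB = max(D) - D_AB.
    Returns (source, target, weight) tuples.
    """
    d_max = max(d for row in D for d in row if math.isfinite(d))
    edges = []
    for a, row in enumerate(D):
        others = [x for x in range(len(row)) if x != a]
        for x in sorted(others, key=lambda x: row[x])[:k]:
            edges.append((a, x, d_max - row[x]))
    return edges
```

For example, with four cells spaced along a line and all velocities pointing the same way, `knn_edges(composite_distances(current, projected), k=1)` links each cell to the next cell downstream. The last cell has no downstream neighbor, so its edge points backwards and receives a lower weight.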
3 Results
3.1 Comparing VeloViz to other embeddings
To evaluate the performance of VeloViz, we first assessed its ability to capture trajectories in simulated data representing cycling or branching trajectories (Supplementary Information 2i) and compared it with PC, t-SNE, UMAP, and diffusion map embeddings. We calculated a trajectory consistency (TC) score (Supplementary Information 2ii; Boggust et al., 2019), where TC scores closer to 1 indicate more accurate representations of the ground truth trajectory. Among evaluated trajectories, VeloViz embeddings had consistently high TC scores (Supplementary Figure 1). Next, we used VeloViz to visualize pancreatic endocrinogenesis single-cell RNA-sequencing (scRNA-seq) data, where cycling ductal cells give rise to endocrine progenitor-precursor (EP) cells, which then differentiate into hormone-producing endocrine cell types (Alpha, Beta, Delta, and Epsilon cells) (Bastidas-Ponce et al., 2019). We observed that while all evaluated embeddings captured the trajectory of endocrine progenitors, VeloViz was better able to capture the cycling structure of ductal cells (Supplementary Figure 2). VeloViz, UMAP, and t-SNE also captured the terminal branching differentiation into the different endocrine cell types, which is not clear in the PC or diffusion map embeddings. Overall, VeloViz is able to capture trajectories of diverse topologies.
3.2 Performance with incomplete trajectories
To evaluate the performance of VeloViz in visualizing trajectories with missing intermediate cell states, we used simulated and real scRNA-seq data where some intermediate cells were removed, creating a trajectory gap. Because t-SNE and UMAP preferentially preserve local cell-cell relationships, we expected that these embeddings would result in two distinct clusters of cells before and after the simulated gap (Kobak and Berens, 2019; Heiser and Lau, 2020). Therefore, in addition to TC scores, we calculated a gap distance (Supplementary Information 2iii), which measures the distance in the 2D embedding space between cells before and after the simulated gap in the trajectory. Embeddings that preserve the underlying trajectory despite this simulated gap will have a smaller gap distance. Indeed, for the cycling trajectory where cells corresponding to a segment of the cycle were removed, VeloViz was the only embedding able to preserve the cycling structure. Likewise, for branching trajectories with missing intermediates, only VeloViz and PCA were able to preserve the underlying topology, while t-SNE and UMAP split cells before and after the simulated gap into distinct clusters as expected (Supplementary Figure 3). TC scores were consistently higher and gap distances smaller for VeloViz than for t-SNE, UMAP, and diffusion map embeddings. For the pancreatic endocrinogenesis scRNA-seq data, we removed pre-endocrine cells and used cell latent time (Bergen et al., 2020) to identify cells before and after pre-endocrine cells in the developmental trajectory and to calculate gap distances (Supplementary Information 2iii). Notably, the transition from endocrine progenitors into terminal endocrine cell types was best captured by VeloViz. As expected, t-SNE and UMAP split ductal and endocrine progenitor cells from terminal endocrine cell types, which is reflected in the gap distances (Figure 1B-F). Overall, VeloViz is able to provide a more reliable representation of underlying trajectories even when intermediate cell states are missing.
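As a rough illustration of the gap distance, the sketch below takes the cells immediately before and after the removed interval, ordered by latent time, and reports the median Euclidean distance between them in the embedding. The precise definition is in Supplementary Information 2iii. Interpreting it as the median over all before-after cell pairs is an assumption, and the function name `gap_distance` and its arguments are hypothetical.

```python
import math
from statistics import median

def gap_distance(embedding, latent_time, gap, n_cells=300):
    """Median 2D distance between cells flanking a trajectory gap.

    embedding:   list of (x, y) coordinates for the retained cells
    latent_time: per-cell ordering values along the trajectory
    gap:         (start, end) latent-time interval that was removed
    n_cells:     number of flanking cells on each side (300 in Figure 1)
    ASSUMPTION: the median is taken over all before/after cell pairs.
    """
    start, end = gap
    order = sorted(range(len(embedding)), key=lambda i: latent_time[i])
    # The n_cells latest cells before the gap and earliest cells after it.
    before = [i for i in order if latent_time[i] < start][-n_cells:]
    after = [i for i in order if latent_time[i] > end][:n_cells]
    if not before or not after:
        raise ValueError("no cells found on one side of the gap")
    return median(
        math.dist(embedding[b], embedding[a]) for b in before for a in after
    )
```

Under this reading, an embedding that splits the trajectory into two distant clusters yields a large value, while one that keeps the flanking cells adjacent yields a small one.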
Funding
This work was supported by the National Institutes of Health [T32GM136577 to L.A.].
Conflict of Interest
None declared.