Generative modeling of single-cell population time series for inferring cell differentiation landscapes

Existing computational methods that use single-cell RNA-sequencing for cell fate prediction either summarize observations of cell states and their couplings without modeling the underlying differentiation process, or are limited in their capacity to model complex differentiation landscapes. Thus, contemporary methods cannot predict how cells evolve stochastically and in physical time from an arbitrary starting expression state, nor can they model the cell fate consequences of gene expression perturbations. We introduce PRESCIENT (Potential eneRgy undErlying Single Cell gradIENTs), a generative modeling framework that learns an underlying differentiation landscape from single-cell time-series gene expression data. Our generative model framework provides insight into the process of differentiation and can simulate differentiation trajectories for arbitrary gene expression progenitor states. We validate our method on a recently published experimental lineage tracing dataset that provides observed trajectories. We show that this model is able to predict the fate biases of progenitor cells in neutrophil/macrophage lineages when accounting for cell proliferation, improving upon the best-performing existing method. We also show how a model can predict trajectories for cells not found in the model’s training set, including cells in which genes or sets of genes have been perturbed. PRESCIENT is able to accommodate complex perturbations of multiple genes, at different time points and from different starting cell populations. PRESCIENT models are able to recover the expected effects of known modulators of cell fate in hematopoiesis and pancreatic β cell differentiation.


Introduction 2
Waddington's epigenetic landscape posits that the state of differentiating cells can be modeled by balls rolling down a hilly landscape toward valleys corresponding to cell fates. In this view, topography of the differentiation landscape corresponds to biological 'potential energy', with cells traveling from regions of high energy toward regions of low energy (Ferrell, 2012). Mapping this landscape is essential to improving our understanding of how cells are driven to transient and terminal states in vivo and to enable precise manipulation of cell fates in vitro (Cohen and Melton, 2011).
Single-cell RNA-sequencing (scRNA-seq) technology has enabled the study of developmental landscapes by the observation of gene expression in single cells sampled at multiple stages of differentiation. However, these studies provide snapshots of heterogeneous cell states of a given differentiation process and typically do not directly observe lineage relationships between cells. Furthermore, because cells are destroyed in order to collect transcriptomic information, the ancestors and children of any observed cell are not measured. Recently, experimental lineage tracing methods that couple various barcoding strategies with scRNA-seq have been described that identify relationships between cells (Biddy et al., 2018;Wagner and Klein, 2020;Weinreb et al., 2020). However, these methods are still in relative infancy and have not yet been widely adopted.
Computational methods have also been developed to infer cell lineages from scRNA-seq data in the absence of experimental tracing (Figure 1a). However, these methods typically summarize observations of cell states and couplings emergent of the underlying process and have limited to no capacity for modeling differentiation as a stochastic process in physical time (Saelens et al., 2019;Tanay and Regev, 2017;Trapnell et al., 2012). The predominant approach is pseudo-temporal inference, which orders cells along an arbitrary one-dimensional measurement representing 'differentiation time', and hence cannot model differentiation dynamics with respect to real, physical time. Other methods have also emerged for the specific task of cell fate prediction. For example, Waddington-OT predicts long-range cell-cell probabilistic couplings by reframing the task of inferring cell relationships between population snapshots as an unbalanced optimal transport problem (Schiebinger et al., 2019). Another method, Fate-ID iteratively builds ensembled cell-type classifiers from labeled terminal cell states (Herman et al., 2018). However, these methods only summarize observations of cell states and couplings emergent of the underlying differentiation process.
Recently, a small number of methods have described approaches to modeling differentiation as a process, but they have been limited either in how the model is solved, or in modeling capacity. For example, Population Balance Analysis (PBA) solves a reaction-diffusion partial differential equation describing differentiation but is forced to use a non-parametric solution due to computational constraints (Weinreb et al., 2018). Similarly, pseudo-dynamics models a diffusion process but only in a one-dimensional cell state (Fischer et al., 2019).
Neither have shown to be able to simulate differentiation trajectories starting from an unseen cell state. However, a model that is able to generalize to data points outside of the training set is required to be able to make 3 predictions for the consequences of in silico perturbations that can result in a cell state that has not yet been observed.
We build upon a modeling framework by Hashimoto et al. in which cellular differentiation is modeled as a diffusion process over a potential landscape parameterized by a neural network (Hashimoto et al., 2016). Neural networks are powerful black-box function approximators that allow arbitrary and extremely complex parameterizations of the landscape. Since this modeling framework is fully generative, it is naturally able to sample trajectories for data points not present in the training set. However, the original work was limited to considering relatively simple models and small scRNA-seq datasets with up to 10 genes. It also did not evaluate trajectories sampled from the model and did not take into account cell proliferation, which is well-known to be an important factor in cellular differentiation (Ruijtenberg and van den Heuvel, 2016;Schiebinger et al., 2019).
We introduce PRESCIENT (Potential eneRgy undErlying Single Cell gradIENTs), a generative modeling framework that is fit using longitudinal scRNA-seq datasets that include irregularly sampled time-series with large numbers of cells and high-dimensional measurements. We validate our model on a newly published lineage tracing dataset. In particular, we evaluate our model on its ability to recover held-out time points, and to predict cell fate biases established by experimental lineage tracing. We build upon previous work in three ways. First, we study different parameterizations of underlying potential functions, and show that more complex models can improve the performance of recovering a held-out time point. Second, we extend the modeling framework to incorporate cell proliferation, and show that taking into account proliferation is crucial for fate prediction. Third, we demonstrate how PRESCIENT can be used to model perturbations of multiple genes, at different time points, and from different starting states. We are able to recover expected changes in final cell fate distributions when interrogating our models using perturbations of known regulators of cell fate in hematopoiesis and pancreatic β cell differentiation. This enables large unbiased in silico perturbation experiments, both individual and combinatorial, to aid the design of in vitro genetic perturbational screens.

Results
Learning a generative model of cellular differentiation from high-dimensional scRNA-seq data PRESCIENT models cellular differentiation as a diffusion process over a gene expression landscape parameterized by a potential function that we wish to identify given only time-series population snapshots of single-cell RNA expression. In this diffusion process, evolution of a cell's state at a given time is governed by a drift term, corresponding to the force acting on that cell given its current state, and a noise term, corresponding to stochasticity. In particular, we consider the case where the drift term is defined to be the negative gradient of 4 the potential function, such that the potential induces a force that naturally drives cells toward regions of low potential (Figure 1b; Methods). This stochastic process can then be simulated via first-order time discretization to sample trajectories for a given cell. The potential function is fit by minimizing a regularized Wasserstein loss between empirical and predicted populations at the observed time points (Figure 1c; Methods) (Hashimoto et al., 2016).
To enable the modeling framework to operate on large scRNA-seq datasets, we fit models on PCA projections of the scaled gene expression data rather than on the gene expression itself. Euclidean distance in PCA space has been successfully used in down-stream scRNA-seq analysis methods such as clustering and cross-dataset integration and thus use it as a reasonable distance metric for scRNA-seq data. We also take advantage of modern deep learning methods to study alternative parameterizations of the underlying potential function. We use automatic differentiation of a neural network potential function to calculate potential function gradients to determine the associated drift function of the model. Finally, we extend the framework to take into account cell proliferation by weighting each cell in the source population according to its expected number of descendants in the objective function. This is in contrast to Waddington-OT which uses estimates of cell proliferation to instead formulate an unbalanced optimal transport problem. To assess the importance of incorporating cell proliferation, we study models assuming a priori knowledge of cell proliferation, which can be directly estimated from the data by computing the number of descendants for each starting cell given lineage tracing data. However, since we are largely concerned with model estimation in the absence of lineage tracing, we also study models where cell proliferation is estimated from gene expression.
We first validate our model on a recently published lineage tracing dataset by Weinreb et al. which used DNA barcodes to track clonal trajectories during mouse hematopoiesis (Weinreb et al., 2020). We evaluate our model on two tasks: recovery of a population at a held-out time point, and cell fate prediction (Figure 1d  be classified as operating in pseudo-time or real time (x-axis), and the extent to which the method models the underlying differentiation process (y-axis). PRESCIENT is shown in red (b) Observations of population-level time-series data are used in a generative framework that models the underlying dynamic process in physical time. Evolution of a cell's state is governed by a drift term and a noise term. The drift, depicted by solid arrows, 6 is defined as the negative gradient of the potential function, depicted by the color gradient in the background.
Dashed lines correspond to noise. The model is fit using observations of population-level time-series data, depicted as solid circles. Simulations of cell states are depicted as dashed circles. (c) Cartoon depicting model fitting process. The neural network parameterizing the underlying drift function takes as input the PCA projections of gene expression data at observed time points (again depicted as solid lines). The stochastic process is then simulated via first-order time discretization to produce a population at the next time step, and so on. This proceeds until the next observed time point, at which the loss between the simulated and predicted population is minimized. The model was validated using two tasks (d) held-out recovery, where the model was asked to predict the marginal distribution of a held-out time point, and (e) fate prediction, where the model was asked to predict the fate distribution outcome of a given progenitor cell. Fate prediction can be applied to cells observed in the dataset (left) or cell states in which some perturbation has been applied in silico (right). As shown, the perturbation results in a shift of fate distribution outcomes.
Generative modeling recovers population state at a held-out time point Deep learning enables the modeling of complex potential functions with correspondingly large parameter sets.
We hypothesized that more expressive parameterizations of the potential function might improve modeling capacity of the diffusion process. We hence evaluated potential functions of differing complexity on the task of predicting the marginal distribution of gene expression at a held-out time point.
We fit models with four different architectures (fully connected neural network models with 1 layer of 1000 units, 1 layer of 4000 units, 2 layers of 200 units, or 2 layers of 400 units, all using softplus as the activation function), and using three different regularization strengths (tau = 1e-3, 1e-6, 1e-9). These architectures were chosen such that 2-layer models could be compared with a 1-layer model with roughly the same number of parameters.
Models were fit via pre-training on day 6, and then trained on days 2 and 6. In particular, we used only the subset of data for which lineage tracing data is available to enable comparison of models with and without ground truth cell proliferation rates. For evaluation, we computed the Wasserstein distance between the simulated cell population and the empirically observed cell population at days 4 (training) and 6 (testing) (Figure 2a). The testing distance was evaluated at the epoch with the lowest training distance ( Figure S1a-b).
As baselines, we also computed the distance of the simulated population at day 4 to the actual populations at day 2 and 6. We next computed the distance of the actual population at day 4 to a linearly interpolated population as implemented by Waddington-OT (indicated on plots by a dashed line) ( Figure 2b). Note that we should expect linear interpolation to perform well when populations are sampled at dense time-points. Overall, all models outperformed these baselines except when the regularization strength was too high (i.e. except when = 1e-3), ( Figure 2b-d, S1d).

7
For both models trained with or without cell proliferation, we observed that although 1-layer models may perform as well as 2-layer models on training, 2-layer models achieve lower testing distance even when the number of parameters in the 1-layer models was greater than or equivalent to in the 2-layer models (Figure 2c, S1b-c). This could suggest that the 2-layer models are either better able to approximate the potential function, or are easier to fit. We also observed that model training and testing distance was best with a moderate regularization strength of =1e-6 across model architectures (Figure 2d, Figure S1c-d). In general, models accounting for cell proliferation performed slightly worse at predicting the held-out time point than models that did not account for cell proliferation.
We reasoned that hyper-parameters selected on this task would be transferable to models for fate prediction because good recovery of the held-out time point should imply that the model had been able to find a good approximation of the underlying potential function. We hence use 2-layer models with 400 units and a regularization strength of 1e-6 for subsequent tasks.  Figure S1e). We measure performance as the Pearson correlation with respect to the actual clonal fate bias given by the lineage tracing data. Since we observed that the clonal fate bias as measured in the lineage tracing data was strongly bimodal ( Figure S1g), we also considered a second metric, which we define to be the AUROC of classifying a given cell as having a clonal fate bias of > 0.5.
We first fit a PRESCIENT model on the subset of data for which lineage tracing data is available (Figure 3c). We have shown that accounting for cell proliferation in the modeling framework improves the performance of the fate prediction task. However, these models were fit with ground truth cell proliferation rates derived from the lineage tracing data, which is not only generally unavailable for most scRNA-seq time-series datasets, but even missing for 62.3% of this dataset ( Figure 3c). We next looked at whether cell proliferation rates derived from gene expression could achieve similar performance. To this end, we modified an approach originally described by Schiebinger et al. to use KEGG annotations of cell cycle and apoptosis genes which were also highly variable in the dataset to estimate the number of descendants a cell is expected to have. These estimates correlated weakly but significantly (r = 0.207, p << 1e-45) with the ground-truth rates calculated from the lineage tracing data ( Figure 3g, S1d). We compared models fit to the same set of cells using either our gene-expression-derived or ground-truth proliferation estimates. We found that models incorporating gene-expression-derived estimates achieved r = 0.399 +/-0.025 and AUROC = 0.725 +/-0.008, a slight improvement over ground-truth proliferation -based models (Figure 3d-e).
Generative models can predict cell states not observed during training We next hypothesized that PRESCIENT should be able to predict the fate of cells not observed during training.
We expect that a PRESCIENT model has learned a good approximation of the underlying potential function and 10 hence should be able to generalize to data points that were not observed during training. To test our hypothesis, we used our proliferation estimates to fit models to all cells with and without lineage tracing data.
We found that models incorporating estimated proliferation rates vastly outperformed models that did not in predicting the future expression of the previously described test set. We also found that model performance was

Perturbation of transcription factors involved in the regulation of hematopoiesis
We hypothesized that a PRESCIENT model trained on the Weinreb et. al. dataset (2 layers of 400 units, seed 1, epoch 2500) should be able to recapitulate the effects on cell fate when perturbing transcription factors known to be involved in regulation of neutrophil or monocyte differentiation. The model predicts the outcome of each 13 perturbation on all cell types represented in the dataset ( Figure S2). In particular, we focus on a set of transcription factors previously identified to be potentially antagonistically correlated with either monocyte and neutrophil fate in progenitor cells by MetaCell analyses of a haematopoietic stem cell dataset, many of which are supported by existing literature. The same authors observed in CRISPR-seq experiments that Irf8 deficiency correlated with increase of neutrophils and blocked monocyte differentiation (Giladi et al., 2018). Existing literature also supports several of the transcription factors identified. For example, Feinberg et al. previously showed that forced expression of Klf4 in hematopoietic stem cells induces exclusive monocyte differentiation, while Klf4 deficiency increases granulocyte (of which neutrophils are the most abundant type) differentiation (Feinberg et al., 2007). As another example, Ly6C-monocytes have previously been found to be absent in Nrf4a1-/-mice (Hanna et al., 2011). With regards to transcription factors associated with neutrophil fate, CEBPE deficiency in -/-mice has similarly been shown to block maturation of granulocyte lineages (Yamanaka et al., 1997). Mutations in CEBPE are also one of the known genetic causes of specific granule deficiency, a rare disorder characterized by abnormal neutrophils (Serwas et al., 2018).
We first perturbed Lmo4, Cebpe, Mxd1, and Dach1, transcription factors previously identified to be involved in granulopoiesis, the production of mature neutrophils. These transcription factors were also present in the highly variable gene set from the Weinreb et. al. hematopoiesis time course (Figure 4c). To simulate over-expression, we set the scaled normalized expression of the target gene(s) to 5, while to simulate knock-down/out, we set the scaled normalized expression of the target gene(s) to -2.5. As expected, we observed that down-regulation of these TFs led to a relative decrease in the fraction of neutrophils while up-regulation of these TFs led to a relative increase in the fraction of neutrophils with respect to the unperturbed population ( Figure 4c). We next perturbed transcription factors involved in monocyte development, including Irf8, Irf5, Klf4, and Nr4a1, and observed similar results ( Figure 4c).
We next tested if different magnitudes of the perturbation had different effects. For overexpression, we tested zscore = 2, 5, 10, while for knock-down/out, we tested z-score = -0.5, -1, -2.5. In general, we found that increasing the magnitude of the perturbation resulted in larger changes in the relative fraction of neutrophils (Figure 4d-e).
Furthermore, while individual perturbations resulted in a mixture of significant and non-significant changes in the 14 final neutrophil populations, ensemble perturbations consistently resulted in significant changes (p<0.5) in the fraction of the neutrophil population (Figure 4f-g). We also tested multiple sets of randomly selected non-TFs to ensure the changes in final cell fate were not simply a result of perturbations causing random model changes.
These randomly selected non-TFs do not result in an observed shift in neutrophil and monocyte fates in the final time point (Figure 4h), suggesting that our model is robust to random effects.

Perturbation of gene sets involved in the regulation of β-cell development induced at multiple timepoints and developmental stages
We next applied PRESCIENT to another well-characterized differentiation system, the production of pancreatic We first hypothesized that PRESCIENT should be able to recapitulate the effects on cell fate when perturbing transcription factors known to be involved in the regulation of endocrine induction and specification in the starting population (Figure 5a-b). In endocrine/exocrine induction, previous work has shown that NEUROG3 and NKX6 activation is associated with the endocrine lineage, while PTF1A and HES1 is associated with the exocrine lineage (Dassaye et al., 2016;Gradwohl et al., 2000;Johansson et al., 2007;van der Meulen and Huising, 2015).
For example, (Jensen et al., 2000) showed that HES1 activation results in an enrichment of the exocrine cell lineage, while (Johansson et al., 2007) showed that overexpression of NEUROG3 results in general expansion 16 of endocrine cells. Furthermore, PTF1A and NKX6 have been shown to act antagonistically (Schaefer and Peifer, 2019), as do NEUROG3 and HES1 (Jensen et al., 2000), when modulating the endocrine/exocrine fates. When introducing ensembled in silico perturbations of NEUROG3 and NKX6.1 at day 0, we observe that this results in an increase in endocrine cell types that scales with the magnitude of perturbations (p<0.05) with a reciprocal decrease in exocrine cell types. We also show that PTF1A and HES1 overexpression result in the opposite effect hypoglycemia and an increase in β-cell production (Collombat et al., 2003(Collombat et al., , 2005. In CRISPR-Cas9-induced homozygous PAX4 knock-out rabbits, severe hyperglycemia was observed as a result of a major decrease in insulin-producing β-cells and a corresponding increase in α-cells (Xu et al., 2018). Introducing computational overexpression of ARX (z=5,10) to progenitor cells at day 0 resulted in significant increases in the final proportion of α cells at the expense of β cells (Figure 6a). The same overexpression of PAX4 resulted in a significant increase in the final proportion of β cells (p<0.05) at the expense of α cells (Figure 6a). We observed similar results when performing underexpression experiments ( Figure S4).
To demonstrate the scale at which in silico experiments can be conducted, we next did a larger screen of 200+ TFs (Figure 6b). At p < 0.01 & log2(FC)>0.5, we found that this unbiased in silico screen identified 10 TFs (α specification increase) and 18 TFs (β specification increase) that altered cell fate. The computational screen identified several known fate-specific factors, such as ARX and IRX2 for α-cells (Baron et al., 2016;Collombat et al., 2003) and NKX2.2, NKX6.1, PAX4, PDX1, and MAFA for β-cells. NKX2.2 has previously been associated with repression of the α-cell fate, along with NKX6.1 and PDX1 (Gannon et al., 2008;Papizan et al., 2011;Schaffer et al., 2010Schaffer et al., , 2013. MAFA expression is generally siloed to β-cells, and in MAFA-deficient mice about two-thirds of the β-cell population is lost (Artner et al., 2010). This screen also identified factors common to both 17 alpha and beta-cell lineages, including MAFB, PAX6, and ISL1. PAX6 has previously been reported to be associated with both α and β-cell fate, but Pax6 mutant mice show a more significant decrease in α-cells, which we also observe. In addition, we identified 19 TFs for EC specification (pval < 0.01 & log2(FC)>0.5), showing how PRESCIENT can be used to generate hypotheses for CRISPR genetic screens (Figure 6b).
We next asked if PRESCIENT could recapitulate the known timing of TFs during endocrine induction. To this end, we performed experiments in which we introduced perturbations to the same endocrine/exocrine axis TFs   (Figure 6c, e). In contrast, PDX1 has been shown to continue to promote β-cell neogenesis late into the endocrine induction pathway. Forced expression of PDX1 in NEUROG3 + cells increased the ratio of β-cells to α-cells in mouse embryos and adult mice (Yang et al., 2011). We show that computational perturbations to PDX1 increase the fraction of β-cells when introduced in progenitor cells and neurog3late cells (Figure 6c, e), recapitulating the multiple contexts in which perturbations to PDX1 can modulate endocrine fate. Similarly, MAFA and MAFB are known to continue to modulate fate late into endocrine induction (Artner et al., 2010;Hang and Stein, 2011), and we show that perturbations to both MAFA and MAFB result in increases in β-cell proportion when introduced to both early pancreatic progenitors and cells in late endocrine induction (Figure 6c,e). While ISL1 has been previously reported to stimulate both α and β-cell generation, the timing of ISL1 activation is less well characterized. Our model suggests that ISL1 has a role in endocrine specification in both early pancreatic progenitor phase and late in the endocrine induction pathway (Figure 6c,e).

20
Finally, we observed that modulation of α-cell fate largely occurs via perturbations induced in the pancreatic progenitor phase of the stage 5 differentiation protocol, with diminished effects of all endocrine specification TFs in neurog3early, neurog3mid, and neurog3late (Figure 6d,e). This could suggest that α-cell determination occurs early in the stage 5 differentiation protocol. We observe similar outcomes with in silico knockdown experiments ( Figure S5) as well as no effect of perturbations to randomly sampled non-TF genes that are not involved in apoptotic/proliferative signatures ( Figure S4c).

Discussion
PRESCIENT is a generative modeling framework for learning potential landscapes from population-level time series scRNA-seq data. We build upon previous work in two important ways. First, we used deep learning to approximate complex potential functions that underlie development. Second, we incorporated cell proliferation into the modeling framework via weighting of cells in the loss function. We validated PRESCIENT on a recent experimental lineage tracing dataset that provides gold standard trajectories for the mouse hematopoietic lineage, and show that it outperforms other existing fate prediction methods on the task of predicting the final cell fate distribution of a given progenitor cell.
We found that the incorporation of cell proliferation rates greatly improved the performance of PRESCIENT cell fate prediction on a gold-standard lineage tracing dataset. Cell proliferation is well-known to be an important part of cellular differentiation but is often neglected when modeling differentiation. When lineage tracing data is unavailable we estimate cell proliferation rates in an approach similar to that previously described by Schiebinger et al. Incorporating iterative updating of cell proliferation rates into our framework in the future would reduce reliance on our gene-expression-based estimates (Schiebinger et al., 2019).

22
PRESCIENT marks a departure from the predominant methods for analyzing scRNA-seq studies of cellular differentiation. Computational methods for lineage inference have been dominated by pseudo-time approaches that do not attempt to model the stochastic or dynamic nature of cell fate determination. More recent fate prediction methods such as Waddington-OT and Population Balance Analysis either summarize observations of the emergent process or suffer from modeling limitations (Schiebinger et al., 2019;Weinreb et al., 2018).
However, we have demonstrated an important predictive advantage to fully generational models that seek to describe the underlying differentiation landscape. After the model has learned the landscape, it can generate trajectories for unseen data points.
Generative models can hence be interrogated to propose hypotheses for possible perturbations. We show that inducing perturbations in well-studied regulatory genes of hematopoiesis and β-cell differentiation result in expected changes in fate outcome. We also show that we can introduce complex perturbations consisting of multiple genes, or at different time points or in different starting populations. The ability to study how these different factors interact enables large computational experiments that can help limit the number of in vitro experiments needed to achieve a desired cell fate, especially when considering combinatorial perturbations and timing of perturbations in a differentiation process. Using PRESCIENT to identify genetic targets of CRISPRscreens and small molecules can help aid the design of new directed differentiation and reprogramming protocols as well as fine-tuning existing approaches to improve cell fraction outcomes. We show an example of this type of unbiased, large-scale computational perturbational screen for in vitro β-cell differentiation in which we perturbed 200+ TFs and identified target genes that could cause significant shifts in β-cell, α-cell, and EC-cell fates. While this work was limited to TFs to show the utility of the method, non-TF targets, such as signaling pathway effectors, can also be tested using PRESCIENT. However, it is important to note that the model is subject to constraints and assumptions. For example, learning a model requires that the final time point of the dataset is at steady-state, and model performance improves with more densely sampled time-points. There also remain challenges to confidently suggest gene sets for experimental perturbation. One problem is that information is lost about individual genes when transforming data into PCA space, and lowly-expressed genes important to cell fate decisions may be dropped altogether in the scRNA-seq data. Another is that the association of certain genes with specific cell fates does not necessarily imply causality.

23
Future extensions of generative modeling can accommodate other forms of data. For example, with lineage tracing data a model's objective function can be modified to maximize the likelihood of observing individual trajectories. Data on RNA velocity (La Manno et al., 2018) can be used to constrain the drift function directly. We expect that constraining the model with additional sources of information should improve the quality of the underlying landscape inferred.
( ) given by the stochastic differential equation where ( )represents the k-dimensional state of a cell at time t, ( ( )) is a drift term representing the force acting on a cell given its state, and ( )corresponds to unit Brownian motion. In particular, the drift function is defined to be the negative of the gradient of a potential function, ( ) = − ( ), such that intuitively, the potential function ( ) can be thought of as inducing a gradient field driving cells from regions of high potential to low potential. Within the conceptual framework of Waddington's epigenetic landscape, this potential function corresponds to the height of the landscape. This process can be simulated via first-order time discretization Where lineage tracing data is available, we define the number of descendants a cell is expected to have at time t to be the number of cells sharing a particular clonal lineage barcode at t divided by the number of cells sharing that clonal lineage barcode at the current time. A pseudocount of 1 is added to allow for dropout.
In the absence of lineage tracing data, we estimate the number of descendants using a modified approach from Schiebinger et al.. Let n be the number of descendants, b be the birth rate, d be the death rate, and g be growth.
Then using a birth-death process, To estimate the birth and death scores ( , ) , we first calculate the mean of the z-scores of genes annotated to birth (KEGG_CELL_CYCLE) and death (KEGG_APOPTOSIS). We use these alternative annotations as many of the genes in the original gene sets proposed by Schiebinger et al. were not present in the set of variable genes in the Weinreb et al. datasets. Then, following Weinreb et al., we smooth these scores over the cells using an iterative procedure: Where is the score for a given cell i at the current iteration, are the scores for the k = 20 neighbors of the cell i computed on euclidean distance in PCA space, and = 0.1. This smoothing procedure is performed over 5 iterations.
Then, the birth and death scores are fit to logistic functions to obtain the birth and death rates, To simplify the fitting procedure, we reasoned that we might expect − to be close to 1 to flatten outliers.
Hence, we set k to be 0.001 = (− − ). Then we set 0 and based on the expected minimum and maximum growth rates. For the Weinreb et al. dataset, we set 0 = 0.3 and = 1.2, and found the minimum and maximum number of descendants to be similar to that observed in the clonal lineage tracing data at days 4 and 6. We used the same procedure for estimating growth rates on the Veres et al. dataset. 27 Model implementation and optimization We use deep learning methods to learn the parameters of the potential function ( ). Automatic differentiation can be used to evaluate the drift function ( ) = − ( ) without having to derive the analytical form. Then, by retaining the computation graph, losses can be calculated with respect to the drift function while gradients are accumulated with respect to the potential function. For all models, we set dt and sd to be 0.1.
Optimization of ( ) was performed using the Adam optimizer with a batch size of 1/10th the size of the training set. Models were pre-trained as previously described by Hashimoto et al. by first optimizing only the entropic regularizer via contrastive divergence using stochastic gradient descent with a learning rate of 1e-9. Then, training proceeds at the learning rate described below, with a scheduler that multiplied the learning rate by 0.9 every 100 iterations, and using gradient clipping with a max norm of 0.1. The Wasserstein error is approximated using a multi-scale Sinkhorn algorithm as implemented by the GeomLoss library (v0.2.3), with a scaling of 0.7 and a blur of 0.1. The GeomLoss library allows for efficient, stable computation of gradients that bypasses naive backpropagation (Feydy et al., 2019).
All models were implemented in PyTorch 1.4.0 (Paszke et al., 2019). Models were trained on either Titan RTX or GeForce RTX 2080 Ti GPUs.

Quantification and statistical analysis
Preprocessing of existing scRNA-seq datasets Data for the Weinreb et al. experiments was downloaded from https://github.com/AllonKleinLab/paperdata/blob/master/Lineage_tracing_on_transcriptional_landscapes_links_state_to_fate_during_differentiation/R EADME.md (Weinreb et al., 2020). Normalized gene expression for genes annotated as variable was scaled and projected to 50 dimensions via PCA, which was then used as input to the modeling framework. For experiments evaluating the model on the held-out time point, preprocessing was fit to only the training set consisting of days 2 and 6, and then used to transform all data including day 4. For all other experiments, preprocessing was fit to all data across time points. All visualizations using umap were fit with 30 neighbors.
Data for the Veres et al. experiments was downloaded from GEO (GSE114412) (Veres et al., 2019). Raw counts were first pre-processed using the standard Seurat pipeline (Butler et al., 2018) Figure S3a-b, e-f).

Comparing model architectures on recovering a held-out time point
Architectures were chosen such that the 2 layer models had a corresponding 1 layer model with a roughly equivalent total number of parameters. The smaller models (1 layer of 1000 units, 2 layers of 200 units) had ~50k parameters and were trained using Adam with a learning rate of 0.01. The larger models (1 layer of 4000 units, 2 layers of 400 units) had ~200k parameters and were trained with a learning rate of 0.005, as these models sometimes diverged when trained with a learning rate of 0.01 ( Figure S1a). All models were trained for 2500 epochs. Softplus was used as the activation function for all models.
To evaluate the models at a given time point, 10000 cells were sampled at day 2 with replacement according to the expected cell proliferation rate. Then, the model was used to sample a single trajectory for each of the sampled cells until the time point under evaluation. The Wasserstein distance was then computed between the simulated cell population and the empirically observed cell population. Models were evaluated at day 4 (training) 29 and day 6 (held-out, testing) every 100 training epochs. The testing distance is reported for the epoch with the lowest training error.
As a baseline, we compared to Waddington-OT (WOT), which uses a similar optimal transport formulation but lacks an explicit parametric form (Schiebinger et al., 2019). WOT enables recovery of a held-out time point via linear interpolation using transport maps built between sets of cells in early and late time points. To run WOT, we used python code available on GitHub (https://github.com/broadinstitute/wot). The input to WOT is a set of time-point labeled gene expression profiles and growth rates and the output is an optimal transport map. The optimal transport map was built with the full set of cells with lineage barcodes from day 2 (n=4,638 cells) to day 6 (n=29,679 cells). The ground-truth proliferation rates derived from clonal expansion of this set of cells from day 2 to 6 were provided to WOT and three growth iterations were permitted. The parameters for building the optimal transport map were as follows: λ 1 =1, λ 2 =50, ε=1, τ=10000. With the transport map built between day 2 and day 6, 10,000 cells at day 4 were interpolated using the interpolate_with_ot() function from WOT. This maps a point at the midpoint of each of the pairs in the optimal transport map. The testing distance of these interpolated points from the observed day 4 cells was computed as reported above.

Predicting clonal fate bias
To predict clonal fate bias, models with 2 layers of 400 units and a regularization strength of tau = 1e-6 were first trained on data from all three time points. Models were fit on three sets of data; (a) the subset of cells for which lineage tracing data is available, (b) the subset of cells for which no lineage tracing data is available, and (c) all cells. Models were trained for 2500 epochs and evaluated every 500 epochs. Then, cells were simulated until the final time point via the first-order discretization as parameterized by the trained model.
We evaluated the clonal fate bias metric described by Weinreb et al. on the test set also defined in their paper.
The approximate nearest neighbor (ANN) classifier that we used to classify cells as Neutrophil, Monocyte or other at the final time point was fit with 10 trees, 20 neighbors and using Euclidean distance in PCA space (50 pcs). The model was first fit on a random 80% split of the data. When evaluated on the held-out 20% test split, the model achieved a macro-average f1-score of 0.98. Splits were stratified by cell type. The model was then refit to the full dataset. After classification, the clonal fate bias was then computed as the number of neutrophils divided by the total number of monocytes and neutrophils. Since the model did not always predict any cell within the 2000 sampled trajectories to be a monocyte or neutrophil, we also added a pseudocount of 1. In those cases, clonal fate bias would hence be 0.5.
We observed variation in predictions made by models at each epoch ( Figure S1e). Since no validation set is available to formulate a stopping criteria, we chose to ensemble predictions over the last 5 epochs evaluated (i.e. epoch 2100, 2200, 2300, 2400, 2500) by taking the mean across the estimated clonal fate bias across those epochs.
In most cases, performance metrics for WOT, PBA and FateID were calculated using the predictions already pre-computed and made available by Weinreb et al. (Weinreb et al., 2020) Introducing and evaluating in silico perturbations Perturbation experiments were performed similarly to the clonal fate bias experiments, except using perturbed cells as input to the first-order discretization. Generally, perturbations were introduced in silico by setting the scaled normalized expression of target genes to z-score values less than 0 for knockdowns and greater than 0 for overexpression. The resulting perturbed gene expression profile was then transformed via PCA into lowerdimensional space for input to forward simulation of the trained model. In this section, we describe perturbational experiments performed by leveraging PRESCIENT models trained on longitudinal scRNA-seq datasets characterizing in vitro hematopoiesis and in vitro beta-cell differentiation.
For the in vitro hematopoiesis dataset (Weinreb et al., 2020), To test if there was a significant shift in neutrophil/monocyte cell fractions at the final time point, Welch's independent t-tests were performed between unperturbed simulations and perturbed simulations for each target TF. Next, the effect of perturbational magnitude was evaluated by introducing perturbations of -2.5, -1, -0.5, 2, 5, and 10 to each target gene individually. The same statistical test was performed in comparison to unperturbed simulations. Next, we tested the effect of ensembled perturbations to cell fate outcome by perturbing sets of TFs with z-scores of -2.5, -1, -0.5, 2, 5, and 10. As a control, the ensembled perturbation was repeated with following randomly selected non-TF control genes: Gch1, Pfn2, Dhrs2, Traf1, Lrrk2, Lgmn, Il13, and Sgk1. The same statistical test was performed in comparison to unperturbed simulations. Resource availability Data and code availability All code as well as preprocessed data and trained models are available at https://github.com/gifford-lab/prescient