Learning cell-specific networks from dynamical single cell data

Cell dynamics and biological function are governed by changing patterns of gene expression. Intricate gene interaction networks orchestrate these changes. Inferring these interactions from data is a notoriously diﬀicult inverse problem. The majority of existing network inference methods work at the population level. They construct static representations of gene regulatory networks, and they do not naturally allow us to infer differences in gene regulation across heterogeneous cell populations. Here we build upon recent dynamical inference methods that model single cell dynamics using Markov processes, which leads to an information-theoretic approach, locaTE, which employs the localised transfer entropy to infer cell-specific, causal gene regulatory networks. LocaTE uses high-resolution estimates of dynamics and geometry of the cellular gene expression manifold to inform inference of regulatory interactions. We find that this approach is superior to static inference methods, often by a significant margin. We demonstrate that factor analysis can give detailed insights into the inferred cell-specific GRNs. In application to three experimental datasets, we demonstrate superior performance and additional insights compared to stancard static GRN inference methods. For example, we recover key transcription factors and regulatory interactions driving mouse primitive endoderm formation, pancreatic development, and haematopoiesis. For both simulated and experimental data, we find that locaTE provides a powerful, eﬀicient and scalable network inference method that allows us to distil cell-specific networks from single cell data.


Introduction
Cell identity is controlled by the dynamics of gene regulation networks. Identifying and characterising gene regulatory interactions is a central aim of cell and systems biology. A frequently cited advantage of single cell assays is that population heterogeneity is captured, allowing for the detection and study of rare cellular phenotypes that would otherwise be unobservable from bulk studies [1]. This also allows us to detect variability within groups of cells which would conventionally be regarded as being of the same "type". The popularity of single cell data has spurred rapid advances in data analysis and modelling methods such as trajectory inference [2], dimensionality reduction [3], and network inference [4,5].
In contrast to bulk assays where only population-averaged expression can be measured, single cell data is more abundant, often vastly so. Modern datasets allow us to overcome the "large-p small-N problem" [6], in which the number of predictions outnumber the number of samples [7]. This problem has been plaguing gene regulatory network inference because the number of predictions grows quadratically with the number of genes considered. In this vein, a collection of methods have been developed for network inference from single cell data, some of which build on methods originally intended for bulk expression data, and others which are specifically tailored to single cell data. The recent surveys in [8,9,5] provide an overview of the current methodology. The overwhelming majority of network inference methods aim to reconstruct a single static network [5,8] that describes the set of possible interactions occurring within an observed population. Given the biological importance of cellular heterogeneity, a natural expectation is that variations in transcriptional state may correspond to variations in (cell state dependent) regulatory interactions which cannot be represented as static networks [7], and indeed has been observed empirically in previous studies [10,11]. Figure 1: Cell-specific network inference from estimated dynamics. LocaTE takes as input a transition matrix P that encodes inferred cellular dynamics as a Markov chain on the cell state manifold. By considering the coupling pX τ , X´τ q, locaTE produces an estimate of transfer entropy for each cell i and each pair of genes pj, kq.
The clear need for methods that can learn cell-specific networks [7,12], is now leading to new classes of network inference methods [13,14]. These methods seek to infer multiple networks (e.g. one per cell type, or even one per cell) from single-cell resolved data, thus allowing us to learn about variation in regulatory relationships over time and between conditions. Such methods generally rely on neighbourhood or cluster information to construct networks that are cell, time, or context-specific. Existing methods for cell-specific network inference have focused on inferring undirected networks. If from gene expression measurements we hope to infer causal or directed interactions, then information about underlying dynamics in a cell population is essential [15]. To this end, many methods for static network inference take advantage of pseudotemporal orderings to inform network inference [16,17,18,19,20]. The one-dimensional nature of pseudotime necessarily imposes a total ordering over cells, resulting in a loss of finer structure of the cell state manifold. Recent work on trajectory inference from a Markov process viewpoint [21,22,23,24] depart from this framework and are able to model more complex dynamics on the set of observed cell states directly, and are free from assumptions on the trajectory topology (and do not require coarse graining of the data).
Here we adopt the transfer entropy (TE) [25] as an information-theoretic measure of causality. Transfer entropy, like other information-theoretical measures, is a model-free framework which has been used widely in contexts such as neuroscience where abundant time-series data are available. In a Gaussian setting it reduces to the linear Granger causality [26]. Transfer entropy has been used to predict static networks using a pseudotemporal ordering [16,17]. We demonstrate that the transfer entropy can be adapted to infer cell-specific networks without imposing a prescribed ordering of cells.
Inference of cell-specific networks will help us to address problems in developmental biology, where it is often the transient dynamics that are the most interesting since these are likely to correspond to major epigenetic changes [7]. Our approach is tailored to such problems since we make no restricting simplifying assumptions: the cell state space is treated as a continuum, and a priori no partitioning or clustering is required to group or classify cells.

Method
Local transfer entropy of expression dynamics We represent the transcriptional state of a cell as a vector of expression levels X " pX p1q , . . . , X pN q q. Transfer entropy does not make assumptions on the specific form of a model, but for the sake of illustration let us consider a model of cellular dynamics where the expression level of each transcript X piq is governed by a stochastic differential equation (SDE) of the form (1) Note that this framework includes the chemical Langevin equation (CLE) [27], a biophysically derived model of single cell expression dynamics. Under the Itô interpretation [28,29], for a small time interval τ we may write approximately, where the Z pkq are independent and identically distributed Gaussian random variables. It is straightforward to condition on tX piq t " xu, For any other gene X pjq , j ‰ i, following [30,Section 4.2], the transfer entropy with time lag τ is defined to be where is the mutual information of two random variables, X, Y , conditional on a third variable, Z. Under the discrete approximation, the transfer entropy will be positive whenever f piq or σ piq depend on X pjq , since then one has IpX piq t`τ , X pjq t |X piq t q ą 0. In the biophysically relevant setting of the CLE, the functions f and σ share the same functional dependence on tX pkq u k when X piq " x, and so for infinitesimal time-scales τ a nonzero transfer entropy implies regulation, i.e. B j f piq ą 0, B j σ piq ą 0. In the practically relevant setting where the time-scale τ is short, the data processing inequality [31] guarantees that direct interactions will dominate indirect interactions, i.e. for a feed-forward loop X piq Ñ X pjq Ñ X pkq , one has that I ik ă mintI ij , I jk u [32].
The concept of transfer entropy has previously been used for inference of static networks where pseudotemporally ordered expression data are available [16,17]. However, in these methods the expectations in (5), (4) are taken with respect to the set of lagged tuples of cells along a total ordering. This introduces a strong dependence on the input pseudotemporal ordering; it also results in loss of resolution: any variation in regulatory activity over the expression space is integrated out to yield a static network. This can be problematic since biological trajectories may exhibit bifurcating, cyclic or converging trajectories, potentially requiring multiple runs of GRN inference on different subsets of a dataset.
We propose to take, for each observed cellX, the expectation (5) with respect to the evolution described by the transition kernel P τ of (1) starting from some neighbourhood N pXq of the cell. For short time scales, τ, and an appropriately sized neighbourhood, N pXq, (i.e. large enough to take expectations, but not so large as to lose local detail), the resulting dynamics are therefore local. This is best understood as a conditional transfer entropy, This enables us to infer regulatory interactions that are specific to the local cell context. Some informationtheoretic static inference methods [4,16] condition on a third gene in order to remove indirect interactions. Instead of conditioning separately on each potential indirect interaction, our framework conditions on a local region of the expression space. This can be interpreted as conditioning on some (potentially non-linear) function of the overall gene expression profile that is determined by the construction of the cellular manifold (e.g. in the case of PCA, this would be an affine combination).
Finally, since we model dynamics as a general Markov process, no assumptions on the ordering of cells or topology of trajectories is made or necessary.

Inference in practice
Typically we do not have information about the drift and diffusion terms [28] in (1); dynamical information must instead be inferred from observed expression data. This trajectory inference problem has been widely studied [22,33,2]. Since we are interested in cell-specific networks, we restrict our focus to a class of methods that model cellular dynamics as a Markov process involving drift and diffusion over a manifold of cell states [21,23,34,24]. Let X " tx i u N i"1 be a sample from a population of cells, where x i P R M is the vector of mRNA expression values for cell i; M denotes a graph constructed from X that can be thought of as approximating some underlying manifold of cell states [35]. We assume that we have access to a transition matrix P P R NˆN that is supported on M and that adequately describes the ("true") dynamics of (1). In practice, as we do in Section 3, we could use a range of recently reported methods to estimate P . Then M equipped with P encodes a discrete Markov chain that approximates the true cellular dynamics. In what follows, we will write vnw " t1, . . . , nu for n P N.
For any cell, x P M, we can construct (using e.g. a uniform distribution, truncated Gaussian, etc.) a probability distribution supported on its neighbourhood, π x 0 . We can then consider a Markov process X t , t ě 0 that starts from X 0 " π x 0 and evolves under P . For some fixed τ the resulting coupling is For a pair of genes, pj, kq, we quantify the local transfer entropy for the causal relationship j Ñ k from knowledge of the coupling γ x t and the gene expression states, Repeating this computation for all cells, x i P M, and gene pairs, j, k P vM w, we arrive at a tensor,Ĝ P R NˆMˆM , such that,Ĝ ijk , is the transfer entropy score of j Ñ k in the neighbourhood of cell, x i . Since the TE score matrix,Ĝ i , for each cell has been learned from a local neighbourhood, it is potentially noisy. Furthermore, it is impractical (due to limited data and computational resources) to condition on potential indirectly interacting genes [16]. Thus, TE scores may contain noise and spurious signals resulting from indirect interactions. To deal with this, we first filter interaction scores using context likelihood of relatedness (CLR) [36,4] and then solve a manifold-regularised optimisation problem to both de-noise the interaction signals, and to attenuate noise and indirect interactions [16,13].
Context likelihood of relatedness. Given raw TE scoresĜ, we use the context likelihood of relatedness [37] to filter scored interactions. The CLR algorithm has been shown to be an efficient mechanism for filtering out indirect interactions in the context of mutual information scores [37] and other information theoretical measures [4] For a pair of genes pi, jq and given a interaction score matrix, A, we compute z i (resp. z j ) to be the z-score of A ij with respect to A i¨( resp. A¨j). Then we define the weighted CLR score for pi, jq to be Applying CLR filtering along the first axis ofĜ, we obtain the filtered tensor, r G. We modify the original approach of [37] and weigh the CLR score by the initial MI valueĜ ij . This is important since CLR was originally designed to filter interactions in static networks. In cell-specific networks, very few edges may be "active" in any given cellular context; then entire rows or columns ofĜ may consist only of noise. Computing z-scores along those rows or columns would put both noise and signal on the same scale.
Smoothing and denoising The tensor r G contains a (noisy) matrix of filtered interaction scores, one for each cell. For notational convenience we write r G as r G P R NˆM 2 , i.e. each row is a length-M 2 unfolded score matrix. We propose to solve the optimisation problem G " arg min X LpX; L, r Gq where λ 1 , λ 2 are hyper-parameters, w is a vector of cell weights, and L is the symmetric graph Laplacian on M. The term associated with λ 1 is a manifold regularisation term corresponding to an expectation that regulatory relationships should vary smoothly with changes in cell state, i.e. trpX J LXq is large for rapidly fluctuating X. The term associated to λ 2 is a Lasso [6] term that encourages X to be sparse, reflecting our knowledge of biological networks [15,13]. Together, the objective (9) encourages both parsimony and sharing of information along the cell state manifold. The problem is a case of L1-L2-regularised least squares, and can be solved efficiently using the scheme described in the appendix.
Using a backward transition matrix Since above we consider couplings of the process at times p0, τ q, TE scores may reflect the underlying dynamics up to a shift in time. To remedy this, we consider couplings symmetric in time, i.e. between p´τ, τ q. While time-reversal of a diffusion-drift process away from equilibrium is generally not a well-posed problem, in practice, suitable backward operators have been constructed by transposing the transition matrix [23,38] (see Appendix A.2 for further discussions). Given an approximate backward operator, Q, a time-symmetric coupling, γ x τ " pQ J q τ diagpπ x 0 qP τ , can be constructed.
Extracting a static network Cell-specific networks can be aggregated across a cell population to generate a static network. Averaging over individual cell-specific networks can be understood as applying the so-called "tower" property of conditional expectation. Here we observe that inferred cell-specific networks become less informative near the endpoints of trajectories, where temporal information is lost. Similarly, averaging may result in signals from small subpopulations of cells to be diluted. Thus, some reweighting strategies (or heuristics) will be useful to infer appropriate static networks.

Computational considerations
The problem of cell-specific network inference is at least OpN M 2 q for N cells and M genes, since each cell-gene-gene triplet must be considered. For each triplet pi, j, kq locaTE calculates the joint distributions of pX pkq τ , X pjq 0 , X pkq 0 q under the coupling γ piq . While in the worst-case (if γ were dense) this would have complexity OpN 2 q, in all our applications we consider sparse γ where transitions only occur between nearby cells, in which the complexity is Op|γ piq |q ! N 2 . The CLR filtering step is also OpN M 2 q, and the regression step primarily involves iterative matrix products of sparse NˆN against NˆM 2 . We implemented locaTE using the Julia programming language [39] to take advantage of its speed for numerical computation.
We note that with exception of the regression step, which is global, all computational steps are easily parallelised across cells. The calculation of TE scores can alternatively be parallelised across gene pairs. This allows locaTE to take advantage of large scale parallelisation when available, potentially drastically reducing the compute time required. We find empirically that a large portion of computational burden arises from the construction of the joint distribution, pX pkq τ , X pjq 0 , X pkq 0 q. Since this is effectively an accumulation operation, this can be significantly accelerated using GPU computation. We observe speedups of 100´200ˆusing a NVIDIA V100 card 1 . Other steps in the locaTE pipeline (regression, factor analysis) amount to linear algebra operations and are also amenable to GPU acceleration. For datasets in excess of " 10 4 cells, the computational cost could be greatly reduced by first grouping sampled cells into metacells, using e.g. the approach in [41].

Results
We first discuss two simulation examples where the true regulatory interactions are known by design. To illustrate the advantages of cell-specific networks, we consider examples where the network is rewired over time. Results for a wider range of "simple" developmental trajectories are provided in Appendix B.2.

Simulated data: branch-dependent logic
Overview We start by considering a scenario where a static network cannot capture the mechanisms underlying gene regulation. In the network in Figure 2(a) a bifurcation driven by a toggle-switch feeds into one of two "gene expression modules". Modules A and B involve the same genes tg6, g7, g8, g9u, but with very different interactions among the genes: the flow of information in each module is the mirror image of the other.
A static network (in the graphical modelling sense) is insufficient to describe this system, since the directionality of some interactions depends on cellular context. Context dependence can alternatively be understood as arising from the presence of higher-order interactions. Consider gene 7, which may either be   Inference of higher-order interactions is known to be an even more challenging statistical problem than inference for pairwise interactions [15,4]. Without trying to address this problem in generality, cell-specific networks allow us to disentangle interactions for a certain subclass of higher-order interactions: those which can be understood to be locally first-order, conditional on some latent variable that is encoded in the cellular context. Since existing cell-specific network inference methods [13,14] infer undirected edges only, such methods are unable to distinguish the direction of information flow in this example and thus would predict identical networks for either branch.
Simulation We simulate the network in Figure 2(a) using the BoolODE software package [5]. This uses a chemical Langevin equation scheme for simulating biologically plausible expression dynamics: dX t " f pX t qdt`gpX t qdB t where f, g have specific forms dictated by the reaction network. From 1,000 independent realisations of the system's trajectories, we generate a set of 1,000 sampled cells by sampling a cell state from each trajectory at a time chosen uniformly at random. As a measure of ground truth interactions for each cell x i , we compute the corresponding Jacobians J ijk " B j f k px i q. In Appendix B.1 we document the Boolean rules used to implement this network.
Inferring dynamics As input our method requires a cell-state transition matrix P . We construct an assumed "ground truth" matrix P from the velocity of sampled cells in the expression space ("RNA velocity") as follows. For each observed cell, x i , we calculate the corresponding velocity vector, v i " f px i q. From this, we can construct transition probabilities to a neighbourhood of x i (in gene expression space), where 1 denotes the 0-1 indicator function. We set N to be the k-NN neighbourhood with k " 25, and chose the bandwidth σ following the median heuristic of [23]. We also consider other approaches for constructing P : as in [38], the inner product can be replaced with cosine similarity or Pearson correlation between x j´xi and v i in order to estimate transition probabilities depending only on the orientation of v i . In addition to velocity-based methods, we consider a transition matrix based on diffusion pseudotime [42], which corresponds to a random walk biased in the direction of increasing pseudotime. We also consider the methods StationaryOT [34] and PBA [21], which were shown to have good performance for inferring stochastic dynamics without depending on pseudotime or velocity estimates.

Inferring causal interactions
We apply the method described in Section 2 with a backward matrix obtained via the transpose (see Section A.2), for a range of parameter values pτ, λ 1 , λ 2 q. To construct the initial neighbourhood distribution π x 0 we use quadratically regularised optimal transport [43], which produces a local neighbourhood density similar in effect to a truncated Gaussian. In Figure 2(b) we show the individual cell-specific networks obtained for τ " 3, λ 1 " 25, λ 2 " 10´3 represented as vectors against the pseudotemporal ordering, and in Figure 2(c) we show the averaged networks over the cell clusters corresponding to the two branches. We observe overall that the the de-noised result G resembles the ground truth. The averaged networks for module A and module B are directed and reflect the mirrored connectivities described in Figure 2(a). Averaging networks is convenient for summarising the inference output for cellspecific networks, but may obscure the cell-specific nature of the inference result. To illustrate this, in Figure  S7(b) we show the interaction scores for the two edges g6 Ñ g7, g7 Ñ g6, which are specific to modules A and B, respectively. We observe that the flow of information between genes 6 and 7 is reversed on opposite branches, demonstrating that locaTE can resolve the cell-specific directionalities of edges.

Comparison to other methods
There is a lack of methods for inferring cell-specific GRNs. CeSpGRN [13] is perhaps the method most similar to locaTE in terms of the problem it seeks to address: inferring a NˆGˆG array of cell-specific gene-gene interaction scores. It is limited to undirected interactions due to use of the (Gaussian) Graphical Lasso. For a larger set of comparisons we also consider methods for static network inference. We consider three information-theoretic methods: TENET [17] and Scribe [16], both of which infer static directed networks from pseudotime-ordered single-cell data, as well as PIDC [4] which infers an undirected network using partial information decomposition. We additionally consider SCODE [18] and (for the experimental datasets) GRISLI [20], which are ODE-based models for directed inference.
Cell-specific edge detection For each triple pi, j, kq with i P vN w and j, k P vM w, we treat the problem of detecting an edge j Ñ k in cell x i as a binary classification problem with some threshold q. To construct the set of true positives, we consider the matrix ΠJ, where Π is a neighbourhood transition matrix such that Π i¨" π xi 0 . The motivation for considering this instead of simply using the raw Jacobians, J, is that cellspecific interactions are necessarily inferred using neighbourhood information. Strict cell-wise comparison to a ground truth would be overly stringent and sensitive to small perturbations in the expression space. By computing ΠJ, the ground truth signal is smoothed over the same neighbourhood that was used for inference, allowing for more robust assessment of classifiers. We classify all potential edges for which pΠJq ijk ą 0.5 to be "true" edges.
In Figure S7(c) we show the Precision-Recall (PR) curves calculated for locaTE, as well as raw transfer entropy (TE), transfer entropy followed by CLR filtering (TE+CLR), and the method CeSpGRN [13] which infers cell-specific undirected networks. We find that the CLR filtering step improves upon raw TE scores in terms of prediction performance, and that locaTE improves substantially upon both TE and TE+CLR. This demonstrates that both the CLR filtering and the regularised regression step are essential for inference: this can be understood as a combined effect of (1) removal of indirect interactions using the CLR algorithm, and (2) sharing of information along the cell state manifold and attenuation of noise using the smooth and sparse regression. We also compute a ground-truth static network by aggregating the cell-specific ground truth networks and calculate a static network AUPRC, corresponding to the best-case performance for a static method, i.e. if all cells were estimated to have the same static network.
To investigate the performance of locaTE further we use the BoolODE package to generate 10 datasets using different kinetic parameters for each simulation [5,Methods]. Thus, variation across datasets results both from the random sampling of cells, as well as differences in the simulated biophysics. For each dataset, we infer cell-specific networks using locaTE using several methods for trajectory inference. We consider three approaches using velocity information: dot-product, cosine similarity, and Pearson correlation. We also consider StationaryOT and PBA, as well as a transition kernel based on diffusion pseudotime [42]. In Figure 2(d) we summarise performance in terms of AUPRC ratio, that is the AUPRC for inferred cell-specific networks relative to that of a random predictor.
These results show that the trajectory reconstructions used as input to locaTE can affect the inference result: the dot-product velocity kernel outperforms the cosine and correlation velocity kernels. This is not unexpected, since cosine and correlation kernels depend only on the orientation of the velocity vector and not on its magnitude -thus losing information. StationaryOT performs similarly to cosine and correlation kernels, and the PBA and the pseudotime kernel perform worst. However, we find that all locaTE-based methods outperform the undirected method CeSpGRN, indicating that the availability of even potentially poor dynamical information can lead to improved inference of directed interactions. All methods perform at least as well as the static network baseline, illustrating the value of cell-specific network inference methods in general.

Static networks as aggregates
As most methods in the literature deal with the static inference problem, we investigate the performance of locaTE by averaging local networks across cells to form a static network (see Figure S7(e) for representative examples). We show AUPRC ratios in Figure S7(d) for networks inferred by locaTE, as well as other selected static inference methods. In this setting we find that locaTE performs similarly well across trajectory inference methods, and that CeSpGRN performs marginally better. This can be understood, at least in parts, because the aggregated networks for modules A and B together in Figure  2(a) would be indistinguishable from the corresponding undirected network (as we show in the next example, this is not always the case). Competing static methods all perform worse, with PIDC performing best, and SCODE performing worst, broadly in keeping with the conclusions of [5].
Downstream factor analysis Instead of using the de-noising regression described previously, we can also apply techniques from factor analysis to the tensor of TE values. Methods such as non-negative matrix factorisation (NMF) and non-negative tensor factorisation (NTF) seek a simplified, low-rank description of the data. In the context of cell-specific GRNs, we are faced with an NˆGˆG array. By applying NMF (or NTF) to the flattened NˆG 2 matrix (resp. NˆGˆG tensor), we effectively seek groups or modules of interactions that are active together in some region of the cell state space; here the regulatory state of a given cell is represented as a linear combination of some inferred archetypal regulation patterns. We derive a numerical scheme for finding matrix and tensor low-rank representations of the TE score matrix that incorporates the same smoothing and sparsity priors as (9). This can be used as a drop-in replacement for the (full-rank) problem (9). We describe these in detail in Appendix A.4 (NMF) and Appendix A.5 (NTF).
In Figure 3(a) we illustrate the utility of NMF and NTF for finding simplified representations of regulatory interactions inferred using locaTE for the bifurcating example. Applying locaTE-NMF with k " 8 factors, we find regulatory modules whose activities are localised in the cell state manifold. Following a similar approach to [44], an undirected graph was constructed from the locaTE-NMF gene regulatory modules with edge affinities calculated from the Wasserstein distance between the normalised module activities. We present a spectral layout of this graph, which exhibits the bifurcating structure that reflects the underlying process. Examining the top edges of selected modules shows that key portions of the regulatory dynamics are recovered. Two distinct phases of the bifurcation are found: the first corresponding activation of genes 4 and 5 by gene 3, and the second corresponding to deactivation of gene 1 by genes 4 and 5. Along each of the branches, locaTE-NMF uncovers elements of the different regulatory logics that involve the same genes.
Tensor factorisation differs from matrix factorisation in that the learned atoms have the additional constraint of being separable. Because of this added structure, tensor decompositions are often found to have better properties (identifiablity of factors, robustness to noise [45]). We apply locaTE-NTF to find 16 rankone factors of the scGRN tensor, and we apply k-means clustering to group the rank-one atoms into 8 clusters based on their activities. We find that the resulting clusters largely resemble the ones found by NMF. Unlike NMF however, the clusters can be decomposed into smaller rank-one components, and appear less noisy than the atoms found by NMF owing to the additional constraint. In Figure S8 we show how clusters 1 and 3 from the NTF representation split into rank-one components, demonstrating that the rank constraint leads to finer grained representations of regulatory states.

Simulated data: switch-like behaviour
We next consider a scenario where differential regulation occurs along a linear trajectory instead of a bifurcation. This inference problem is more subtle than a bifurcating process given the lack of a clear branching or clustering structure upon which regulatory interactions depend. We consider a model network depicted in Figure 2(e): this network either contains a feed-forward loop g2 Ñ g3 Ñ g4, or g2 Ñ g4 Ñ g3, depending on the position of a cell along the trajectory. The interaction g2 Ñ g4 is first indirect and then becomes direct: the nature of this differential regulation is such that static GRN inference methods would observe a superposition of these interactions, i.e. g2 Ñ g3 Ñ g4 as well as g2 Ñ g4. As an example, this could potentially lead to the g2 Ñ g4 interaction being erroneously identified as an indirect interaction and thus being pruned, if filtering heuristics such as CLR or the data-processing inequality were used [37,17]. Cell-specific networks may be able to disentangle these two effects.
We implement this network in BoolODE and simulate 1,000 cells. As before, we apply locaTE using a transition kernel derived from the simulated velocity information and the same parameters as in the bifurcating dataset. We find from Figure 2(f, g) that the inferred interactions accurately capture the true directed interactions. In Figure 2(h) we demonstrate that locaTE significantly outperforms competing methods in terms of AUPRC, for both cell-specific and static networks. locaTE substantially outperforms CeSpGRN in both settings for this example, in contrast to the previous example, where CeSpGRN performed well for static inference.
We next apply locaTE-NMF to find k " 5 regulatory modules following the same procedure as for the bifurcating example. In Figure 3(b) we show a force-directed layout of the regulatory module graph, upon which we label modules that correspond to the regulatory logic when the switch condition g6^p␣g5q is respectively on and off. In contrast to the results shown in Figure 2(g), which made use of a separate clustering step, this demonstrates that locaTE-NMF can identify distinct regulatory states in an unsupervised manner independent of clustering.
Together, our simulated network analyses demonstrate that locaTE is able to use dynamical information in the form of transition matrices to improve inference of directed interactions, often substantially so. Unsurprisingly, better estimates of dynamics lead to better results for network inference. Perhaps what is more remarkable is that, even after aggregating cell-specific networks produced by locaTE, the resulting static networks generally outperform competing static inference methods (see Figures S7(d, e), S9(d, e)).

mESC differentiation dataset using StationaryOT
We first consider the dataset of Hayashi et al. (2018) [40], consisting of 456 cells and 100 TFs. This dataset is a time-series of mouse embryonic stem cells differentiating through to primitive endoderm, sampled at 0, 12, 24, 48, 72h. Cells in this differentiation process follow a linear trajectory, as can be seen in Figure 4(a). To infer trajectories, we apply StationaryOT [34] using quadratically-regularised optimal transport to ensure a sparse transition kernel (see Section B.3 for details). Using this transition matrix, we infer cell-specific networks with locaTE, resulting in a 456ˆ100ˆ100 array of interaction scores.
Since no ground truth is available for cell-specific networks in this experimental dataset, we are only able to validate the output of locaTE against a static benchmark. In [19] the authors constructed a reference network from known regulatory interactions in the embryonic stem cell atlas from pluripotency evidence (ESCAPE) database [46], which consists of 19 regulators and 100 targets. Starting from the array of cellspecific interaction scores, we obtain a static 100ˆ100 GRN by averaging interaction scores across cells and comput precision-recall curves relative to the interactions in our benchmark network (Figure 4(c)). Cells from the last 10% of pseudotime were excluded since informative dynamics are lost at the end of the observed trajectory. For comparison, we run CeSpGRN, which infers cell-specific undirected networks, and a collection of static inference methods. We compare a subnetwork of the inferred networks to the ESCAPE reference. In the directed setting, we find that locaTE outperforms other methods in terms of AUPRC, and performs well in terms of EP (0.496, compared to a best EP of 0.553 for TENET and baseline of 0.343). For undirected inference locaTE is out-performed by SCODE (AUPRCs of 0.453 and 0.483 respectively). Together with the observation that SCODE performs poorly for directed inference, this suggests that, despite being a method for directed inference, SCODE may be prone to detecting interactions but with the wrong directionality; PIDC [4] is the second best performing method in terms of AUPRC (0.456).
Since locaTE infers cell-specific networks, we can ask questions at finer resolution than possible at the population level. To demonstrate the power of this type of analysis, we use locaTE-NMF with k " 8 factors to find regulatory modules that are active at different stages along the trajectory (Figure 4(d)). Hierarchical clustering of the inferred factor activities produces three groups, corresponding to "early", "mid", and "late" trajectory stages. For each cluster we compute a representative network by taking weighted averages of NMF factors, and the top 0.25% of edges for each stage are shown in Figure 4(e). It is clear that the nature of the inferred regulatory interactions changes drastically between stages. Included among the top inferred interactions are the canonical ones previously reported to be associated with maintenance of pluripotency and embryonic development, including Nanog-Zfp42 [47], Foxd3-Nanog [48] and Sall4-Klf2 [49] which are active early in the trajectory. Top inferred interactions for the late trajectory include Epas1 (Hif2a)-Pou5f1 (Oct4), which is known to play a role in embryonic development [50]; and Epas1 (Hif2a)-Bhlhe40 (Dec1), which plays a role in suppression of Deptor [51], a stemness factor known to be downregulated during ES differentiation [52].
For each stage, we rank potential regulators by their out-edge eigenvector centrality and show the 25 top-ranked genes for each stage (see Figure S13(a)). We find that many of the top genes are known to be associated with stem cell development, including Nanog, Pou5f1 (Oct4), Sox2 [48], Gata6 [53], Zfp42 [47]. These gene lists are input to Metascape [54] to find corresponding Gene Ontology (GO) annotations ( Figure  S13). Among the gene set annotations found for the early trajectory, we find that enriched pathways with the highest confidence are those associated with pluripotency and embryonic morphogenesis. Annotations for the mid and late trajectory largely correspond to cell fate commitment, endoderm formation, and organ development. These annotations reflect the developmental progression of embryonic stem cell differentiation to primitive endoderm.

Pancreas development dataset with RNA velocity
Next we consider a murine pancreatic development dataset of Bastidas-Ponce et al. [55]; here aspects of the cellular dynamics are captured in the form of RNA velocity estimates. The dataset consists of 2,531 cells, and we filter for 82 transcription factors following the preprocessing of [20]. We apply the scVelo [56] package to produce RNA velocity estimates, and CellRank [23] with a cosine kernel to produce a transition matrix encoding the inferred dynamics on the cell set.
We apply locaTE with the RNA-velocity derived transition matrix, and obtain a 2, 531ˆ82ˆ82 array of cell-specific interaction scores. In Figure 5(a) we show a UMAP embedding of the full dataset with streamlines visualising RNA-velocity dynamics. To illustrate the heterogeneity in the networks inferred by locaTE, we also colour cells by their corresponding cell-specific GRN using the first diffusion component in "scGRNspace" (see Figure S14(b)). This reveals a clear gradient along the differentiation trajectory, suggesting that the networks output by locaTE capture a progression in regulatory interactions over developmental time. The highest-scoring interactions involve the genes Fos and Egr1, which are well known to play a role in stress response [58]. Fos expression has been documented to be an dissocation-induced artifact in scRNA-seq [59], and we therefore conservatively filter out interactions involving Fos or Egr1 in subsequent analyses.
As a first summary of the inference output, we produce an 80ˆ80 static GRN by averaging (as before we exclude the last 10% of pseudotime). Figure 5(b) shows the top 1% of edges from this network. We observe that the genes with highest out-edge eigenvector centrality include Nkx6-1, known to be a master regulator of pancreatic beta cell development [57,60], as well as Isl1 and Pdx1, known determinants of alpha and beta cell fate [61]. In [57] the authors report evidence for a 4-gene network (henceforth referred to as the α/β network) involving Nkx6-1, Pdx1, Isl1 and Arx, that drives alpha and beta fate determination ( Figure 5(e)). We find that all 4 genes have high out-edge centrality in the inferred static network, and furthermore we find   [57]. that the α/β network is recapitulated in the top-1% edges inferred network, as highlighted by the coloured edges in Figure 5(a, b).
This dataset contains annotations for cell types. To investigate differences in regulatory state between subpopulations, we produce cluster-averaged networks for each celltype. In Figure 5(a) we show the top 1% edges of cluster-averaged, filtered networks for the Ngn3´EP (endocrine progenitor), Ngn3+ EP, beta and alpha clusters (see Appendix S14 for remaining clusters). We note that the inferred network for Ngn3É P cells includes Sox9-Hes1. Sox9 is known to play a key role in maintaining the pancreatic progenitor cell state by regulating Hes1 [62], an inhibitor of Neurog3 (Ngn3) [61]. This interaction is no longer present in the Ngn3+ network, further evidence for its role in maintaining the Ngn3´cell state. This interaction also does not appear in the static GRN in Figure 5(b), illustrating the need for cell-specific networks in capturing localised signals, which would be static networks or population averaging. Out-edge eigenvector centrality of the Ngn3´EP cluster identifies additional potential regulators including Foxa3, known to be implicated in endodermal organ formation [63]; and Hmga2, a chromatin-associated protein expressed during development that does not directly regulate downstream genes, but appears to have widespread indirect effects via chromatin interactions [64].
Clusters corresponding to alpha and beta committed cells yield networks that are distinct from the progenitor clusters. In particular, we find that the α/β network is prominent in both subpopulations. This implies that these genes play key roles in the fate determination mechanism. To investigate this at a finer scale we visualise expression levels and interaction scores for the Nkx6.1-Arx interaction in Figure 5(c). We observe a fate decision boundary from expression levels that reflects the competition between Nkx6.1 and Arx corresponding to beta and alpha fate. The interaction scores for both the Nkx6.1-Arx edge and the entire α/β subnetwork are also strongest near this boundary. Examining again the out-edge eigenvector centrality, we identify Xbp1 as a top regulator specific to the alpha and beta subpopulations, recently found to be critical to maintenance of beta-cell identity [65].
Modelling of cellular dynamics as Markov processes naturally allows estimation of fate commitment in terms of absorption probabilities [21,23,34] and so examining correlation of gene interactions with fate commitment can be used to construct a putative "lineage driver" network that is correlated to a cell fate of interest. In Figure 5(d) we construct such a network for the alpha and beta cell lineages. As we expect, the α/β network is strongly correlated to beta cell fate. We find that Isl1 and Nkx-6.1 have higher out-edge eigenvector centrality towards the alpha and beta lineages respectively, reflecting their roles in the known network.

Haematopoiesis with extensions of RNA velocity
One advantage of our Markov framework is that it is agnostic to how the dynamics are inferred. Currently, inference, interpretation and downstream use of RNA velocity estimates is a contentious topic [66,67] and remains an active research area. Marot-Lassauzaie et al. [68] propose κ-velo, an alternative method for estimating RNA velocity and provide an example application to a haematopoiesis dataset where the more established method scVelo [56] fails to recover expected trajectories. We apply locaTE to this dataset using inferred dynamics computed by κ-velo. A bifurcation in the trajectories between myeloid and erythroid lineages is a prominent feature in this data set (see Figure 6(a)). Erythroid-myeloid cell fate determination is governed by a genetic switch comprised of mutually antagonistic Gata1 and Spi1 (known also as PU.1) [69] (see illustration in Figure 6(f)). In Figure 6(b) we show the locaTE interaction score for the Gata1-Spi1 interaction alone along with the smoothed RNA velocity vector field. We find that the peak activity corresponds to a "dead zone" in the velocity field, while in the surrounding regions downstream, the velocity fields are coherent and indicative of lineage commitment. This is of practical interest [68] as one interpretation of this could be that it corresponds to cells which have high fate plasticity and are still in the "decisionmaking" process.
We apply locaTE-NMF to find k " 16 regulatory modules, and in Figure 6(d) we show a layout of the graph constructed from modules in the same manner as Figure 3 (see Figure S16(a) for all modules). From this we can see two clear branches corresponding to erythroid and myeloid fate. We find a module (see Figure 6(d), labelled Myel-Ery) which appears concentrated near the bifurcation point on the UMAP coordinates. This module is centered around activity of Spi1 and Gata1, as well as Mef2c, which is known to be involved in lymphoid-myeloid fate determination [70]. In this module locaTE-NMF detects the key Gata1- Spi1 (PU.1) interaction (edge highlighted in red). In Figure 6(d, e) we display other regulatory modules found corresponding to various degrees of fate commitment, including haematopoietic stem cells (HSCs). These networks uncover known hallmarks of haematopoietic development: the myeloid-lymphoid progenitor (labelled Myel-Lymph) module prominently features Irf8, understood to play a key role in the development of myeloid and lymphoid lineages [71,72] (note that the lymphoid lineage is not captured in this dataset). The erythroid module is centered on a different set of genes such as Gfi1b [73]. The HSC module features Junb prominently, which has been implicated in homeostasis in the long-term HSC population [74]. The HSC module contains interactions involving Baz2b, a master regulator which has been found to reprogram lineage-committed cells to a multipotent state [75]. In the static network produced by aggregating locaTE scGRNs (see Figure 6(c)) we find that interactions such as Gata1-Spi1 are obscured due to the population averaging. Cell-specific methods such as locaTE that are required to learn interpretable representations of regulatory states in an unsupervised fashion without the need for clustering.

Conclusion
Biology is dynamic and a static representation of a network can never do justice to what we observe phenotypically. To discern molecular causes requires experimental and theoretical advances to progress in lock-step. The power of single cell data, and the insights we can gain from such data, crucially rely on flexible statistical analyses that allow us to dissect the differences between gene regulatory networks (and programs) and how they differ between cells.
Gaining deeper mechanistic insights into single-cell data requires a framework to reconstruct cell-specific, causal networks. Our method, locaTE, relies on computing the transfer entropy (TE) [25,30] for pairs of genes to measure causal relationships. The approach uses approximations of the underlying dynamics but makes no extraneous assumptions on linearity of interactions, the distribution of gene expression counts, or the topology of the trajectory.
By using neighbourhoods in gene expression space we can infer cell-specific networks without going to the extreme N " 1 case. How to pool information across cells most efficiently will ultimately decide on the success of such methods and the insights they provide. Temporal ordering helps enormously, but we are still, by and large, not able to collect meaningful single cell transcriptomic time-course data. Pseudo-temporal ordering or some of the many versions of RNA velocity entering the literature can provide useful information, as can, related multi-omic data, such as scATAC-seq.
Directions for future work include investigating approaches using which local information can help dealing with technical zeros (dropout) in single cell data. We investigated the application of data-imputation methods for the pancreatic dataset; however, we found that imputation worsened performance for all inference methods considered. This effect has been reported previously to be due to spurious signals being amplified [76] between non-interacting genes.
The problem of inferring cellular dynamics, upon which application of methods such as locaTE crucially depend, is far from solved. In particular, accurately modelling and quantifying RNA velocity remains challenging in practice [66,38]. Metabolic labelling assays may also provide a more accurate source of dynamical information [24]. Lastly, a further route for improving locaTE and network inference in general is to use relevant prior information to guide network inference.

A.1 Code availability
An open-source Julia implementation of locaTE is available at https://github.com/zsteve/locaTE.jl.

A.2 Constructing a backward operator
It is straightforward to write: If P encoded a reversible Markov chain and we chose PrX´t "¨s to be the stationary distribution, this would give us time-reversal at equilibrium. Since we are interested in the behaviour of the process away from equilibrium, we must prescribe PrX´t "¨s away from equilibrium. In practice, we find that taking PrX´t "¨s " Unif works well -this is equivalent to simply taking the transpose of P and rescaling, as done in [23]. We refer the reader to the analysis presented in Section 4.3 of [38] for a more detailed discussion on the transpose as a backward operator and its interpretation in the continuous limit.

A.4 Regularised non-negative matrix factorisation
Now we consider the same problem setting as (9), but this time we also require that the NˆM 2 matrix of recovered interactions G be low rank. For a target rank k, we seek "tall" matrices U P R Nˆk , V P R M 2ˆk such that G « U V J . That is, for a cell i, its scGRN is given by a linear combination of the columns of V with coefficients U i1 , . . . , U ik : We consider a general loss function of the form We explain the loss function term by term. The first term encourages a fit to the matrix G of raw TE values. We impose a penalty on the smoothness of the low-rank reconstruction (w.r.t. the graph Laplacian L) with weight α. We add an (optional) affine term on the reconstruction, where the entries of the matrix H can be thought of as a prior on which interactions may occur. Positive (negative) entries of H would correspond to prior on edge existence (non-existence). Priors could be constructed from gene-gene correlations, prior knowledge, or additional assay data (e.g. scATAC or ChIP data). In addition, we can impose Tikhonov and sparsity penalties on the coefficients and atoms. One can derive multiplicative update equations for the gradient descent, following the approach of [78]. For positivity-preserving updates, we decompose any Laplacian matrices into their positive and negative parts:

A.5 Regularised non-negative tensor factorisation
Let G have dimensions NˆMˆM , but for notational convenience we do our derivations in a general setting. Write for convenience X " Sˆd i"1 A piq , where the multilinear product is defined as In order to derive a multiplicative update scheme, all matrices need to be decomposed into their positive and negative parts. We write X pkq for the matricization of X along its k-th mode. We want to derive multiplicative update scheme for the problem LpS, tA piq u 3 i"1 q " The final two terms are straightforward to differentiate. For the first term, note that Now employ the identity X pkq " " Sˆi ‰k A piq ‰ pkq for convenience, then one has The second term is more involved. First consider the case k " 1, so X p1q " A p1q B p1q .
For k ‰ 1, note that if we "re-fold" LX p1q along mode 1, we get Sˆ1 pLA p1q qˆj ą1 A pjq . So, For the third term, Combining the gradient expressions with the decomposition of Laplacians into positive and negative parts as done for NMF, we arrive at the update scheme where for each i ą 1 we define for convenience C`" pB piq pB piq q J`BpiqB piq q{2 C´" pB piq pB piq q J`BpiqB piq q{2.

B Results details B.1 Simulated data
For each boolean network considered, simulated datasets were generated using the BoolODE package [5], modified to record additionally the velocity vector of simulated cells as well as the Jacobian matrix of expression state. We provide below the boolean rules that we used for simulating the two examples in the main text; all other simulated trajectories were produced using rules provided as part of the BoolODE package.   Dimensionality reduction was performed using PCA and a k-NN constructed with k " 25. Diffusion pseudotime was then assigned with the root cell taken to be the simulated cell with the earliest simulation time. Various transition kernels P were then constructed as described below.

Boolean rules: switch
Velocity kernel As described in the main text, the velocity kernel was constructed from evaluations of the drift term of the chemical Langevin SDE using the cellrank.tl.kernels.{ DotProductScheme, CorrelationScheme, CosineScheme } classes within CellRank [23], with the previously computed k-NN graph and bandwidth σ estimated using the median heuristic.

Diffusion pseudotime kernel
The diffusion pseudotime (DPT) kernel was constructed from the previously estimated diffusion pseudotime ordering using the cellrank.tl.kernels.PseudotimeKernel class within CellRank.
Optimal transport kernel Cells in the top 92.5%-97.5% of pseudotime were assigned a negative flux rate R i so that the net negative flux was´50. Remaining cells were assigned a positive flux rate to satisfy the zero net flux requirement, ř i R i " 0. StationaryOT [34] with quadratically regularised optimal transport was applied with ∆t " 1.0, ε " 0.05. The cost matrix was taken to be the matrix of pairwise graph distances constructed from the k-NN graph. To prevent short-circuiting resulting from spurious edges, edges were ranked by their random-walk betweenness centrality and the top 5% of edges were pruned. [21] was applied using the previously constructed k-NN graph, the same flux rates R i as used in the optimal transport kernel, and diffusivity D " 1.0.
Evaluating scGRNs For each simulated dataset, ground truth scGRN networks were constructed by recording the Jacobian for each sampled cell state. Inferred scGRNs were evaluated against this ground truth by constructing Precision-Recall curves using the approach described in the main text (Section 3.1). Static GRNs were constructed by averaging scGRNs over the cell population.

B.2 Results: simple trajectories
We applied locaTE with several choices of kernel to the simulated trajectory examples from [5]. For comparison, we also applied CeSpGRN, TENET, PIDC, Scribe and SCODE. A grid search was performed for locaTE, CeSpGRN and TENET, and default parameters were used for the remaining methods. Performance for each method was assessed in terms of the AUPRC ratio for directed edge inference. We summarise performance with boxplots below, and show representative examples of the recovered static networks.

B.3 mESC dataset
Log-transformed expression values for 100 TFs and pseudotime estimates were fetched from the SCODE example "data1" [18]. Cells were embedded into the top 25 principal components and a k-NN graph was constructed with k " 25. Using this k-NN graph, a cost matrix C of squared pairwise shortest-path distances was calculated using the Floyd-Warshall algorithm. StationaryOT [34] was then applied using this cost matrix, treating cells in the final 10% of pseudotime as sink cells. The flux rate R i for sink cells was taken to be uniform such that ř iPsinks R i "´50, and similarly for non-sink cells so that ř iRsinks R i " 50. Quadratically regularised optimal transport was applied with ε " C, i.e. the mean value of C. The resulting transition matrix was used as the forward transition matrix for locaTE. The ESCAPE reference network of [19] was retrieved from https://github.com/gitter-lab/SINGE-supplemental.
LocaTE was applied to estimate TE using the transition kernel derived from StationaryOT with τ " 1. Expression values were discretised using the Bayesian blocks algorithm as implemented in [4], with artifactually small values (ă 10´2) set to zero in order to prevent many bins being created near zero. Raw TE values were first filtered using weighted CLR, and denoising regression was applied with λ 1 " 10, λ 2 " 10´3 using the normalised k-NN graph Laplacian.
For comparison, PIDC [4], GRISLI [20], TENET [17], SCODE [18] and CeSpGRN [13] were applied using the same log-transformed expression values and pseudotime estimates. For TENET a history length of 1 was used, following the usage guidelines in its documentation. We found that TENET tended to perform worse with longer history lengths. For SCODE, the rank parameter was set to D " 4 which is the recommended value. For CeSpGRN, we took a neighbourhood size |N piq| " 25, bandwidth σ " 5, and sparsity level λ " 0.1 in keeping with the suggested parameter ranges in [13,Section 3.1]. All other parameters were taken to be their default values.

B.4 Pancreatic dataset
Expression data was processed and RNA velocity was estimated following the CellRank tutorial available at https://cellrank.readthedocs.io/en/stable/cellrank_basics.html. A transition kernel derived from RNA velocity was produced using cr.tl.kernels.VelocityKernel using the cosine similarity scheme. Pseudotime estimates for visualisation were calculated using the function scvelo.tl.latent_time. Intersecting the genes that passed filtering with the list of TFs from [20] yielded a subset of 82 TFs. LocaTE was applied using log-transformed expression values of the TF subset and the transition kernel derived from RNA velocity, with τ " 1. All other parameters were chosen to be the same as for the mESC dataset.