## Abstract

Inspired by Waddington’s illustration of an epigenetic landscape, cell-fate transitions have been envisioned as bifurcating dynamical systems, wherein the dynamics of an exogenous signal couples to a cell’s enormously complex signaling and transcriptional machinery, eliciting a qualitative transition in the collective state of a cell: its fate. Single-cell RNA sequencing (scRNA-seq) measures the distributions of possible transcriptional states in large populations of differentiating cells, making it possible to interrogate cell-fate transitions at whole-genome scales with molecular-scale precision. However, it remains unclear how to bridge the disparate scales of the dynamics of whole transcriptomes to the molecules that define the collective fate transitions in Waddington’s geometric vision. We bridge these scales by showing that bifurcations in transcriptional states can be analytically pinpointed and their genetic bases revealed, directly from data. We demonstrate the power of our conceptual framework and analytical scheme in the context of a recent scRNA-seq based investigation of a classic case study of sequential fate decisions: the transition of hematopoietic stem cells to neutrophils. Our work provides a rigorous and model-independent mathematical framework for detecting and categorizing transitions in cell fate directly from sequencing data, aiding in gene network inference and in determining the salient properties of cellular differentiation, such as when cell-fate transitions are reversible and how these transitions generate a diversity of cell types.

## 1. Introduction

During development and tissue regeneration, cells progress through multiple transitions to ultimately adopt a distinct function. While each transition en route to a terminal fate involves the coordination of myriad molecules and complex gene regulatory networks interacting with external factors, there is a common view that cellular specification depends on significantly fewer underlying variables and control parameters. This view was notably explicated by Conrad Waddington, in an illustration of an epigenetic space as a tilted, bifurcating landscape, where a vast number of nodes (genes) provide the scaffold for the smooth hills and valleys (cell state) along which a pebble (cell) can reliably roll down until it finds a resting position (terminal fate) (Fig. 1A) (1).

Many of the characteristics of Waddington’s landscape can be codified into the language of dynamical systems, including that cell fates resemble valleys (attractors) in gene-expression, or transcriptomic, space (2–5), that a small number of stable states can emerge from large, interconnected networks of simple Boolean elements (6), and that known genetic interactions can yield multiple cell fates (bistability) (7, 8). Waddington’s illustration has also motivated analysis of the wealth of data captured by single-cell RNA sequencing (scRNA-seq), in which the transcriptomes of populations of cells are measured, often at multiple time points as they differentiate. For example, a mathematical model of a pitchfork bifurcation can be fit to scRNA-seq data to yield predictions for developmental perturbations (9), large transcriptomic matrices can be dimensionally reduced to enhance the resolution of bifurcations and precisely determine the genes enabling a cell-fate decision (10, 11), and well-characterized cell-lineage relationships can be used to extract predictive models of gene regulation (12, 13). More recently, it has been shown that the proliferation in the number of accessible cell fates during development, evident in Waddington’s landscape, is a statistical feature of time-course scRNA-seq data (14).

While it is now commonplace to employ statistical dimensionality-reduction methods to analyze scRNA-seq datasets spanning multiple cell fates and decisions, it is often not apparent where, in the low-dimensional representation, cell fates change (15–18). This has motivated the formulation of pseudotime algorithms, parametric curve-fitting tools that use the transcriptome similarity of cells to determine their relative ordering in developmental time, and yield transcriptomic trajectories that span developmental decisions (10, 19, 20). It remains unclear, however, how to bridge measurements of transcriptomic trajectories to dynamical systems theory and Waddington’s perspective that development comprises distinct, discontinuous changes of cell fate. Additionally, it is unclear whether and how transcriptomic trajectories can be used to learn, without bias, more about the genetic mechanisms driving cell-fate changes.

Our main insight in this paper is that developmental pseudotime can be viewed as a control, or bifurcation, parameter for studying changes in transcriptomic state. As an example of how control parameters relate to the transcriptional state within a cell, consider the classical relationship between thiomethylgalactoside (TMG), a molecule similar to lactose, and the cellular process of lactose metabolism (21). Lactose metabolism is maintained via the *lac* operon, a DNA functional unit comprising three genes: *lacZ*, which encodes enzymes that metabolize lactose; *lacY*, which encodes proteins that facilitate the uptake of TMG and similar molecules from outside the cell; and *lacA*, which encodes enzymes for sugar metabolism. TMG activates transcription of the operon by inhibiting LacI from repressing it. A positive feedback loop via *lacY*, whose product imports extracellular TMG, ensures that high concentrations of extracellular TMG enable the cell to adopt a lactose-metabolic state, while low concentrations yield a non-lactose-metabolic state. Most interestingly, intermediate extracellular TMG concentrations yield bistability between the two states, a signature of saddle-node bifurcations (21). Thus, dynamical molecular processes involving transport and the regulated expression of genes ultimately yield a cell’s adoption of a specific transcriptional state (e.g., high LacZ and LacY) and corresponding function (lactose-metabolic), but a functional change driven by a bifurcation can only be triggered if the control parameter (the amount of extracellular TMG) is varied past a threshold. While recent work has used pseudotime as a proxy for developmental time, we take this further and show that it can be viewed as an axis along which one or more control parameters steadily vary, permitting a dynamical-systems-style investigation of possible bifurcations in the collective transcriptomic state.
This dynamical systems view is further suggested by the observation that the dynamic molecular processes that lead to changes in the transcriptomic state, such as signal transduction and transcription, generically occur on the order of seconds and minutes (22), while cell fates change over the course of hours or days (4), suggesting that developmental control parameters vary at significantly slower rates than their downstream molecular processes.

Here, we lay out and demonstrate a statistical formalism for detecting and interrogating bifurcations in developmental fate transitions directly from transcriptomic pseudotime trajectories. Unlike previous studies (7–10), we do not assume any specific mathematical form for the underlying genetic interactions, but instead focus on the observation that at some point along a bifurcative developmental fate change, there must be a window of developmental time during which the observed cells exhibit multistability between the transitioning fates (Fig. 1B) (23, 24). We show that our data-driven approach enables us to distinguish between three different types of transcriptomic variation directly from systems-level data: a non-bifurcative cell-fate change that is due to continuous changes in gene expression without multistability (Fig. 1C, top); a cell-fate change that is due to a one-to-one state transition (Fig. 1C, middle), such as those that may occur during terminal-fate maturation (25); and a cell-fate change that is due to a one-to-many state transition, resembling those depicted in Waddington’s landscape that occur when pluripotent cells decide between multiple cell lineages (Fig. 1C, bottom). We apply our framework to a class of *in-silico* gene networks to demonstrate its ability to recover the salient features of a bifurcating dynamical system, and examine the effects of high dimensionality and noise. We demonstrate the utility of our framework in the context of a recently published scRNA-seq exploration of hematopoiesis, and show that cell-fate bifurcations can be pinpointed and analyzed in scRNA-seq data even without detailed knowledge of the underlying system’s dynamics and controls.
Finally, we demonstrate that our framework can be used to extract genetic relationships that are pivotal to the dynamics underlying a bifurcative cell-fate change, in order to recover genes of known importance in hematopoiesis and generate new testable hypotheses for genetic networks that underlie a cell-fate decision.

## 2. Theory: the relationship between the Jacobian and the Covariance

In this section we outline the theoretical foundations that enable us to glean key factors driving cellular differentiation from scRNA-seq data. Summarizing our key insights, we lay out a conceptual framework tailored to the analysis of sequencing data that, despite the absence of a generative mechanistic model for the dynamics of this high-dimensional and complex system, can reveal and investigate the system’s most salient dynamical features, its bifurcations, from transcriptomic trajectories alone (24, 26).

A scRNA-seq measurement yields a transcriptomic matrix, where each row is a different cell and each column is a different gene (Fig. 2A). Various dimensionality reduction and visualization tools, such as t-SNE (27) and k-nearest-neighbor maps (28), have been used to visualize these data and determine pairwise distances between cells based on their transcriptomes (for example, the methods described in Secn. S3). More recently, parametric curve-fitting tools have been developed to determine how cells in transcriptomic space vary with respect to a state variable, such as developmental time (pseudotime) (10, 19, 20) (Secn. S3). If a pseudotime trajectory contains a cellular state change, one could attempt to determine which genes are responsible by binning the cells along the trajectory to statistically compare **G**(*τ*), the transcriptomic matrix at pseudotime *τ*, with a neighboring matrix at a time Δ*τ* in the future, **G**(*τ* + Δ*τ*) (29). While such an analysis may reveal which genes exhibit dynamic signals, it will not reveal whether the underlying cell-fate landscape has bifurcated, and may be sensitive to the discretization (13). Here, we present an alternative method, fundamentally rooted in dynamical systems theory and aimed at pinpointing cell-fate bifurcations, that only relies on being able to calculate **C**, the gene-gene covariance at a particular pseudotime (Fig. 2A).

Our analytical framework hinges on a fundamental mathematical relationship between the covariance matrix **C** estimated from a data matrix and the most salient properties of the underlying, unknown, generative model for the dynamics (illustrated in Fig. 2B). Regarding the underlying biochemical processes codified in a mechanistic model, we assume that they (1) are stochastic and Markovian (30–33), and (2) occur at significantly faster timescales (seconds to hours) than the timescales over which transitions in cellular fates are observed (hours to days) (4, 22). A consequence of these assumptions (details in Secn. 5A) is that the local time evolution of a cell’s transcriptomic profile is controlled by a single matrix, the Jacobian (**J**), where *J*_{ij} is the effect of the amount of gene *j* on the dynamics of gene *i* (Fig. 2B). Thus, the Jacobian matrix encodes the update rules that determine the future state of the transcriptome based on its current state and the exogenous signals presented to the cell. Note that the Jacobian matrix, in general, changes with pseudotime.

Under these assumptions, **J** relates to **C** through the continuous-time Lyapunov equation (34),

$$\mathbf{J}\mathbf{C} + \mathbf{C}\mathbf{J}^{T} = -\mathbf{D}, \tag{1}$$

where **D** is the expected noise amplitude for individual genes and their interactions (derivation in Secn. 5A) (26). Eq. (1) is a remarkable relationship between the Jacobian and covariance matrices of a system, implying that some mechanistic details of a system are encoded in, and recoverable from, the fluctuations of the state variables alone! While leveraging this relationship does not make it possible to determine every element of **J**, since **D** is generically unknown and **J** (unlike **C**) is generically not a symmetric matrix, its most salient properties, those corresponding to its eigendecomposition, are inferable from the covariance matrix near bifurcations.
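For small systems, Eq. (1) can be solved directly for **C** once **J** and **D** are specified. The sketch below is a minimal illustration, assuming a hypothetical two-gene, non-symmetric Jacobian and independent unit-amplitude noise (neither is taken from this work). It vectorizes the Lyapunov equation using vec(**AXB**) = (**B**^T ⊗ **A**) vec(**X**); for larger systems, `scipy.linalg.solve_continuous_lyapunov(J, -D)` solves the same equation, since that routine solves **AX** + **XA**^H = **Q**.

```python
import numpy as np


def solve_lyapunov(J, D):
    """Solve the continuous-time Lyapunov equation J C + C J^T = -D for C.

    Column-major vectorization turns it into the linear system
    (I kron J + J kron I) vec(C) = -vec(D), which is n^2 x n^2: small n only.
    A unique solution exists when no two eigenvalues of J sum to zero,
    e.g. when J is stable (all eigenvalues have negative real part).
    """
    n = J.shape[0]
    I = np.eye(n)
    A = np.kron(I, J) + np.kron(J, I)
    vecC = np.linalg.solve(A, -D.reshape(-1, order="F"))
    return vecC.reshape((n, n), order="F")


# Hypothetical stable, non-symmetric Jacobian: gene 2 activates gene 1,
# but gene 1 has no effect on gene 2 (J[1, 0] = 0).
J = np.array([[-1.0, 0.8],
              [0.0, -0.5]])
D = np.eye(2)  # assumed independent, unit-amplitude noise on each gene

C = solve_lyapunov(J, D)
print(C.round(4))
print(np.allclose(J @ C + C @ J.T, -D))  # True
```

Here **C** ≈ [[0.927, 0.533], [0.533, 1]]: the one-way coupling in **J** produces a symmetric covariance between the two genes, which illustrates why **C** alone cannot reveal the direction of an interaction.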

Before we discuss the quantitative ramifications of this relationship in the context of a realistic high-dimensional dynamical system, we first investigate a simple, easy-to-visualize, one-dimensional toy model that undergoes an (imperfect pitchfork) bifurcation as a function of a control parameter (*a*) and is stochastic, with fluctuations controlled by the amplitude of the Langevin term, *η*. Fig. 2C displays the system before (*a* < 0), at (*a* = 0), and after (*a* > 0) the bifurcation. The slope of the potential function, drawn in black, determines the deterministic features of the dynamics of the system. In particular, parameter regimes (*a* ≪ 0 and *a* ≫ 0) where the potential has highly convex curvature exhibit stable fixed points, while parameter regimes near the bifurcation (*a* = 0), which have much flatter curvature, exhibit instability. This dramatic reduction in curvature (mathematically, the curvature goes to 0 at the bifurcation itself) is the geometric signature of a bifurcation. Stochastic simulations of the system as a function of the control parameter (drawn as open circles in Fig. 2C; color corresponds to the value of the control parameter) demonstrate that, owing to the reduction in curvature of the underlying potential, the data are spread maximally at the bifurcation and narrow on either side of it.
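A minimal simulation of such a toy model is sketched below. It assumes the standard pitchfork normal form, with potential *U*(*x*) = −*ax*²/2 + *x*⁴/4 and Langevin dynamics d*x* = −*U*′(*x*) d*t* + *η* d*W*, integrated with the Euler-Maruyama scheme; the exact potential behind Fig. 2C is not specified here, and starting every replicate at *x* = 1 (rather than adding an explicit imperfection term) is our own choice for selecting a single branch when *a* > 0.

```python
import numpy as np


def simulate_toy(a, eta=0.1, n_cells=200, dt=0.01, n_steps=20000, x0=1.0, seed=0):
    """Euler-Maruyama integration of dx = (a x - x^3) dt + eta dW.

    Assumption: the pitchfork normal form U(x) = -a x^2 / 2 + x^4 / 4 stands in
    for the toy potential. All replicates ("cells") start at x0 > 0, which
    selects the positive branch when a > 0.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_cells, x0)
    for _ in range(n_steps):
        drift = a * x - x**3  # -dU/dx
        x = x + drift * dt + eta * np.sqrt(dt) * rng.standard_normal(n_cells)
    return x  # cross-sectional snapshot: one value per cell


for a in (-1.0, 0.0, 1.0):
    print(f"a = {a:+.1f}: variance across cells = {simulate_toy(a).var():.4f}")
```

Away from *a* = 0 the snapshot variance is close to the linearized value *η*²/(2|*λ*|), with *λ* = *a* for *a* < 0 and *λ* = −2*a* on the positive branch for *a* > 0; at *a* = 0 only the quartic term confines *x*, and for *η* = 0.1 the spread across cells is roughly an order of magnitude larger.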

The above toy model extends naturally to higher dimensions and captures the essence of the idea put forward in this paper. Summarizing the mathematical argument below, if a complex high-dimensional dynamical system undergoes a bifurcation, then in its vicinity there must be, by definition, some direction in the high-dimensional space whose curvature reduces dramatically, or softens. Consequently, at the bifurcation point, the fluctuations in the system will be greatly enhanced along that soft direction. Generically, this direction need not point along any single dynamical variable (for example, individual genes in a gene interaction network) of the original system.

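We now elaborate on the mathematical details of our central argument. Generically, the landscape’s curvature can be obtained by diagonalizing **J**, such that, if

$$\mathbf{J} = \mathbf{P}^{T}\,\Lambda\,\left(\mathbf{P}^{T}\right)^{-1}, \tag{2}$$

where Λ is a diagonal matrix of eigenvalues {*λ*_{1}, *λ*_{2}, … , *λ*_{ng}} and **P**^{T} is the square matrix whose columns are the eigenvectors $\{\vec{p}_1, \vec{p}_2, \ldots, \vec{p}_{n_g}\}$, then *λ*_{i} is the curvature of the landscape in the $\vec{p}_i$ direction. *λ*_{i} < 0 ∀ *i* indicates that the landscape is convex in all directions, while *λ*_{d} = max(Λ) → 0 indicates that the landscape is flat in the $\vec{p}_d$ direction, allowing for a fixed-point exchange, or bifurcation. *λ*_{d} → 0 also considerably simplifies Eq. (1), such that it can be shown that

$$\mathbf{C} \simeq -\frac{\tilde{D}_{dd}}{2\lambda_d}\,\vec{p}_d\,\vec{p}_d^{\,T} \quad (\lambda_d \to 0), \qquad \tilde{\mathbf{D}} = \left(\mathbf{P}^{T}\right)^{-1}\mathbf{D}\,\mathbf{P}^{-1} \tag{3}$$

(details in Secn. 5B). The effect of *λ*_{d} → 0 on **C** is most clearly recognized by rewriting **C** in terms of its eigendecomposition,

$$\mathbf{C} = \sum_{k=1}^{n_g} \omega_k\,\vec{v}_k\,\vec{v}_k^{\,T}, \tag{4}$$

where {*ω*_{1}, *ω*_{2}, … , *ω*_{ng}} are its eigenvalues and $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_{n_g}\}$ are its eigenvectors. Since all $\vec{v}_k$ are, by definition, normalized to 1, Eq. (4) can only equal Eq. (3) if, for at least one *ω* (*ω*_{1}, without loss of generality),

$$\lim_{\lambda_d \to 0} \omega_1 = \infty. \tag{5}$$

Thus, the covariance diverges along the principal direction $\vec{v}_1$, as illustrated for the toy model in Fig. 2C. Furthermore, since the direction of landscape flattening (*x* in the toy model) is the same as the direction of covariance expansion, it can be shown, by equating Eq. (3) to Eq. (4) (where the *k* = 1 term will dominate), that, up to the sign ambiguity inherent to eigenvectors,

$$\lim_{\lambda_d \to 0} \vec{v}_1 = \frac{\vec{p}_d}{\lVert \vec{p}_d \rVert} \tag{6}$$

(details in Secn. 5C). Lastly, we note that a direct result of Eq. (3) is that at a bifurcation the Pearson correlation coefficient of the data along axes *i* and *j*, whose corresponding loadings on the eigenvector $\vec{p}_d$ are non-zero, becomes maximal, as

$$\lim_{\lambda_d \to 0} \lvert R_{ij} \rvert = \lim_{\lambda_d \to 0} \frac{\lvert C_{ij} \rvert}{\sqrt{C_{ii}\,C_{jj}}} = \frac{\lvert p_{d,i}\,p_{d,j} \rvert}{\lvert p_{d,i} \rvert\,\lvert p_{d,j} \rvert} = 1. \tag{7}$$
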
There is thus a firm correspondence between the fluctuations of a system and the underlying geometric landscape that determines its dynamics. In particular, three specific changes to the transcriptomic covariance, Eqs. (5)–(7), each determinable from observations of the dynamics alone, can inform us about the salient features of the system, its bifurcations, even when we have no direct access to the generative model for the dynamics or its corresponding underlying geometry. Notably, these features only rely on **G** being at steady state in the vicinity of an attractor, and do not rely on special circumstances, such as the Jacobian being symmetric or the noise being of a particular nature. Landscape geometry and gene dynamics are two sides of the same phenomenon, and we leverage one to learn about the other.
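The limits in Eqs. (3)–(7) can be checked numerically on a small example. The sketch below assumes a hypothetical three-gene Jacobian built via Eq. (2) from fixed, non-orthogonal eigenvectors, with isotropic noise **D** = **I**; it pushes *λ*_{d} toward 0, solves Eq. (1) for **C**, and reports *ω*_{1}, the sign-invariant alignment between $\vec{v}_1$ and $\vec{p}_d$, and the smallest |*R*_{ij}|.

```python
import numpy as np


def solve_lyapunov(J, D):
    """Stationary covariance C solving J C + C J^T = -D (Eq. (1)); small n only."""
    n = J.shape[0]
    I = np.eye(n)
    vecC = np.linalg.solve(np.kron(I, J) + np.kron(J, I), -D.reshape(-1, order="F"))
    return vecC.reshape((n, n), order="F")


# Hypothetical, non-orthogonal eigenvectors of J (columns of P^T); column 0 is p_d.
PT = np.array([[1.0, 0.5, 0.0],
               [1.0, -1.0, 0.3],
               [0.5, 0.2, 1.0]])
p_hat = PT[:, 0] / np.linalg.norm(PT[:, 0])
D = np.eye(3)  # assumed isotropic noise

rows = []
for lam_d in (-1.0, -1e-1, -1e-2, -1e-3, -1e-4):
    J = PT @ np.diag([lam_d, -1.0, -2.0]) @ np.linalg.inv(PT)  # Eq. (2): J not symmetric
    C = solve_lyapunov(J, D)
    omega, V = np.linalg.eigh(C)       # Eq. (4); eigenvalues in ascending order
    omega1, v1 = omega[-1], V[:, -1]
    align = abs(v1 @ p_hat)            # |cos| of the angle between v_1 and p_d
    sd = np.sqrt(np.diag(C))
    R = C / np.outer(sd, sd)           # Pearson correlation matrix
    min_abs_R = np.abs(R[np.triu_indices(3, k=1)]).min()
    rows.append((lam_d, omega1, align, min_abs_R))
    print(f"lambda_d={lam_d:8.0e}  omega_1={omega1:10.2f}  "
          f"|v1.p_d|={align:.6f}  min|R_ij|={min_abs_R:.4f}")
```

As *λ*_{d} shrinks by successive factors of ten, *ω*_{1} grows roughly as 1/|*λ*_{d}|, while the alignment and the correlation magnitudes approach 1, even though **J** is not symmetric.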

We first demonstrate that these theoretical results apply to high-dimensional noisy dynamical systems in the context of a simulated gene regulatory network, and that we can leverage them to detect and assess bifurcations in the underlying developmental landscape from observations of state variables alone. Following this, we demonstrate the power of this approach by directly applying it to scRNA-seq data for the neutrophil lineage in the hematopoietic system.

## 3. Results

### A. Covariance analysis recovers salient features of high-dimensional *in silico* gene regulatory networks

To better understand our mathematical framework in the context of scRNA-seq data, where the large number of discordant genes and biological noise may obscure the predicted covariance signal indicative of a bifurcation, and where cell-fate changes may take different geometric forms, we tested the framework on a noisy, high-dimensional gene regulatory network (GRN) whose dynamics are governed by a set of explicit ordinary differential equations, simulated with Poissonian noise (details in Secn. 6). In our model, cell-fate transitions result from two mutually inhibiting “driver” genes, *g*_{1} and *g*_{2}, via their dynamics
where *k*_{D} is their degradation rate and *m*_{1,2} determine the scales of their synthesis (35). This simple system is illustrated in Fig. 3A. Varying the control parameter *m*_{1} yields a saddle-node bifurcation in gene expression, while varying *k*_{D} yields a pitchfork bifurcation (Fig. S1). Similar networks have been analyzed to provide insight into gene inhibition and activation (36) in a diversity of biological systems, such as the lac operon (21) and cell-cycle control (37).

As GRNs typically involve hundreds of genes, we include *n*_{g} − 2 additional genes in the network that respond variably to the driver genes, according to
where *i* ∈ [3, *n*_{g}]. *k*_{i} is the degradation rate of the *i*^{th} gene, *g*_{i}. The variable *d*(*i*) is set to 1 if *g*_{i} responds to *g*_{1} and to 2 if it responds to *g*_{2}. *α*_{i} ∈ [0, 1] is the strength of the connection between the responder *g*_{i} and its driver gene, *g*_{d(i)} (*α*_{i} = 0 yields full inhibition and *α*_{i} = 1 yields full activation). Thus, each of these responder genes is connected to one of the two driver genes, to a greater or lesser extent, as indicated in Fig. 3A. Though this GRN can be made more complex by including feedback from the responding genes to the two driver genes, or by increasing the number of driver genes themselves, this simple model provides an interpretable demonstration of our proposed scheme.

We simulated Eqs. (8)–(9) for a fixed number of genes (*n*_{g}), statistical replicates, or cells (*n*_{c}), noise scale (*s*), duration (*N*_{t}), and time step (*δt*) for different values of the control parameters (*m*_{1}, *k*_{D}) (details in Secn. 6). We define the [*n*_{c} × *n*_{g}] transcriptomic matrix **G**(*m*_{1}, *k*_{D}) once the system has reached steady state in the simulation. We observed that the steady-state distributions of individual genes (for example, *g*_{1}, shown in Fig. 3B) shift their mean as the control parameter, *m*_{1}, is varied, and exhibit bimodality at the bifurcation point, *m*_{1} = *m*_{1c} = 3, as expected for saddle-node bifurcations (Fig. 1B). We verified that *N*_{t} was sufficiently large by averaging **G**(*m*_{1}) across cells and observing that individual genes switch their expression discontinuously, but predictably, at *m*_{1c} (Fig. 3C; genes sorted by *d*(*i*) and *α*_{i}), in contrast to the continuous and unpredictable transitions observed with low *N*_{t} (Fig. S2A).

Having verified that our model simulates a situation in which the system undergoes a high-dimensional saddle-node bifurcation driven by a two-gene driver core, we use it to examine the effects of noise and of a large number of responding genes on the theoretical predictions described in Secn. 2 (Eqs. (5)–(7)). As predicted, we found that *ω*_{1}(*m*_{1}), the largest eigenvalue of the covariance of **G**(*m*_{1}), is maximal at the critical value *m*_{1c} (darker line in Fig. 3D), and that the increase is significantly larger than can be obtained from a null distribution (lighter line in Fig. 3D). We generated this null by marginally resampling **G**(*m*_{1}), which explicitly destroys any correlation between genes (details in Secn. S2). The contrast between the data and the null can be understood by considering the bimodality of the transcriptomic distribution at the bifurcation. Far from the bifurcation, the transcriptomic distribution is unimodal, and all *ω*_{i} scale with the noise scale *s*, which is undirected and therefore unaffected by resampling, yielding $\omega_1 \approx \omega_1^{\text{null}}$ (Fig. S3, left and right panels). At the saddle-node bifurcation, however, the transcriptomic distribution is bimodal, so *ω*_{1} scales with the distance between the two modes (Fig. S3, center-top); marginal resampling of transcriptomes at the bifurcation yields new modes, and this increased dimensionality diminishes $\omega_1^{\text{null}}$ compared to *ω*_{1} (Fig. S3, center-bottom). While Fig. S3 only demonstrates the bimodality of *g*_{1,2} at the bifurcation, the full transcriptomic bimodality can be visualized by computing the normalized projection of each cell’s transcriptome onto the principal covariance eigenvector, $\vec{v}_1$. The distribution of this projection is densely centered around different fixed points to the left and right of *m*_{1c}, but widens significantly at *m*_{1c}, where both transcriptomic modes have non-zero probability (Fig. 3E).
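The contrast between *ω*_{1} and its null can be reproduced on synthetic data. The sketch below assumes that marginal resampling amounts to permuting each gene’s values independently across cells (the procedure actually used is described in Secn. S2), and builds two hypothetical 500-cell by 50-gene matrices: one unimodal with isotropic noise, and one bimodal, with its two modes separated along a direction shared by all genes. It also computes the projection of each centered transcriptome onto $\vec{v}_1$.

```python
import numpy as np


def omega1(X):
    """Largest eigenvalue of the gene-gene covariance of X (cells x genes)."""
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))[-1]


def marginal_null(X, rng):
    """Permute each gene (column) independently across cells.

    Keeps every gene's marginal distribution but destroys gene-gene
    correlations; an assumed stand-in for the resampling of Secn. S2.
    """
    return np.column_stack([rng.permutation(col) for col in X.T])


rng = np.random.default_rng(1)
n_cells, n_genes, sep = 500, 50, 10.0
u = np.ones(n_genes) / np.sqrt(n_genes)  # hypothetical bifurcation direction

# Away from a bifurcation: a single mode with undirected noise.
X_far = rng.standard_normal((n_cells, n_genes))
# At a bifurcation: two modes a distance `sep` apart along u.
side = rng.choice([-0.5, 0.5], size=(n_cells, 1))
X_bif = side * sep * u + rng.standard_normal((n_cells, n_genes))

results = {}
for name, X in (("far", X_far), ("bifurcation", X_bif)):
    null = np.mean([omega1(marginal_null(X, rng)) for _ in range(20)])
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    proj = (X - X.mean(axis=0)) @ V[:, -1]  # centered projection onto v_1
    results[name] = (w[-1], null, proj.std())
    print(f"{name:12s} omega_1={w[-1]:6.2f}  null={null:5.2f}  "
          f"projection sd={proj.std():.2f}")
```

For the unimodal matrix *ω*_{1} stays close to its null, whereas for the bimodal matrix it exceeds the null several-fold and the projection onto $\vec{v}_1$ is correspondingly wider, mirroring the behavior described for Fig. S3 and Fig. 3E.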

Since, in this example, we have an explicit generative model given by Eqs. (8)–(9), we can validate that just as *m*_{1c} resembles a bifurcation from analysis of the covariance matrix, it also resembles a bifurcation of the full, noisy GRN from analysis of the Jacobian. We show that the maximum eigenvalue (*λ*_{d}) of the Jacobian for this network approaches 0 as *m*_{1} → *m*_{1c} (Fig. 3F). We also show that at *m*_{1c} the direction of maximal covariance is given by the corresponding eigenvector of the Jacobian, $\vec{p}_d$: the Euclidean distance between $\vec{v}_1$, the principal eigenvector of the covariance, and $\vec{p}_d$ approaches 0 as *m*_{1} → *m*_{1c} (Fig. 3F). Thus, while the finite system size (*n*_{c}) prevents, or regularizes, *ω*_{1} from diverging completely and $\vec{v}_1$ from coinciding exactly with $\vec{p}_d$, *ω*_{1} is still at its largest, and the eigenvectors are in closest correspondence, at the bifurcation.

While the covariance eigendecomposition provides insight into the timing and direction of a bifurcation, our theory predicts that the (Pearson) correlation coefficients between genes, *R*_{ij}, may help determine which genetic relationships are most critical for the dynamics at the bifurcation. We found that for low and high *m*_{1}, when the network has only one fixed point, the distribution of *R*_{ij} is strongly centered around 0 (Fig. 3G). At the bifurcation, however, this distribution spreads out toward ±1, as predicted by Eq. (7) (Fig. 3G). To determine whether the gene pairs that yielded large *R*_{ij} corresponded to critical gene relationships in our network, we plot *R*_{i,d(i)}, the correlation between each responder gene and its driver, sorted by connection strength *α*_{i}. These correlation coefficients were much more strongly indicative of the responder-driver dependency (*α*_{i}) at the bifurcation (Fig. 3H, green) than far away from it (Fig. 3H, red and blue). Again, the model makes explicit that although the correspondence between geometry and dynamics is not universal, it provides a bridge in the vicinity of a bifurcation. Thus, entries of a correlation matrix with high magnitude at a bifurcation may be reliable indicators of mechanistic gene-regulatory features.
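The spreading of *R*_{ij} toward ±1 at a bifurcation can be illustrated with a deliberately simple stand-in for the responder genes. In the sketch below, each hypothetical responder depends linearly on a single driver with coefficient 2*α*_{i} − 1, so that *α*_{i} = 0 and *α*_{i} = 1 correspond to full inhibition and full activation; this is an assumption for illustration and not Eq. (9). Away from the bifurcation the driver fluctuates weakly about one state; at the bifurcation it is bimodal.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells, n_resp = 1000, 40
alpha = np.linspace(0.0, 1.0, n_resp)  # hypothetical connection strengths


def responder_driver_corr(driver):
    """Pearson R between each hypothetical responder and its driver.

    Responder i = (2 * alpha_i - 1) * driver + unit noise (NOT Eq. (9)).
    """
    noise = rng.standard_normal((n_cells, n_resp))
    responders = driver[:, None] * (2 * alpha - 1) + noise
    R = np.corrcoef(np.column_stack([driver, responders]), rowvar=False)
    return R[0, 1:]  # first row: driver vs. each responder


# Far from the bifurcation: weak fluctuations about a single state.
driver_far = 0.3 * rng.standard_normal(n_cells)
# At the bifurcation: the driver is bimodal (two states at -3 and +3).
driver_bif = rng.choice([-3.0, 3.0], size=n_cells) + 0.3 * rng.standard_normal(n_cells)

R_far = responder_driver_corr(driver_far)
R_bif = responder_driver_corr(driver_bif)
print("median |R|, far:        ", np.median(np.abs(R_far)).round(2))
print("median |R|, bifurcation:", np.median(np.abs(R_bif)).round(2))
print("sign(R) == sign(2a - 1):", np.mean(np.sign(R_bif) == np.sign(2 * alpha - 1)))
```

Because the bimodal driver carries a large variance, the responder-driver correlations move toward ±1 and their signs follow 2*α*_{i} − 1, except for responders with *α*_{i} close to 0.5, whose weak coupling keeps |*R*| small.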

We further used our model to understand Eq. (5) in the context of the pitchfork bifurcation induced by varying *k*_{D} (Fig. S4A). Unlike in the saddle-node case, we observed that *ω*_{1} does not peak at the bifurcation parameter *k*_{Dc} = 0.5, but rather begins to increase there (Fig. S4B). This feature follows directly from our interpretation of *ω*_{1} as the distance between the two modes of the transcriptomic distribution. While the bimodality of a saddle-node bifurcation results from a discontinuous transition between states, the bimodality of a pitchfork bifurcation emerges continuously from its root and becomes more pronounced as the control parameter is increased. Therefore, the distance between the modes, and hence *ω*_{1}, increases with the control parameter. By clustering the cells according to their transcriptomic mode, or branch, we are able to recover the bifurcation signature predicted by Eq. (5) (Fig. S4C), but we note that precise clustering requires prior knowledge (e.g., of how many clusters there are).

In this example, we have demonstrated the applicability and power of our theoretical infrastructure to analyze a high-dimensional and noisy dynamical system undergoing a variety of bifurcations, uncovering its crucial aspects, including the bifurcation’s location, its direction in gene space, and influential genetic relationships. These calculations are also computationally simple; while covariance matrices can be cumbersome to compute for large numbers of genes and cells, reduced singular value decomposition can be used to quickly determine the largest eigenvalue and eigenvector of the covariance, which is all our approach requires. Notably, our results only apply if the system is measured at steady state; otherwise, there is no reason to anticipate clear divergences in the distribution of eigenvalues, transient bimodality, or equivalence between the covariance and Jacobian principal directions (Fig. S2B-D).
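One way to obtain only the leading eigenpair is sketched below, assuming the cells × genes matrix fits in memory: with centered data **X**_c = **U S V**^T, the covariance equals **V S**² **V**^T/(*n*_{c} − 1), so the first right singular vector is $\vec{v}_1$ and *ω*_{1} = *s*_{1}²/(*n*_{c} − 1). NumPy’s thin SVD still computes all min(*n*_{c}, *n*_{g}) singular values; for very large matrices a truncated solver such as `scipy.sparse.linalg.svds(Xc, k=1)` could be substituted. The data below are hypothetical.

```python
import numpy as np


def principal_covariance(X):
    """Return (omega_1, v_1) of the gene-gene covariance via a thin SVD.

    With centered data Xc = U S V^T, cov(X) = V S^2 V^T / (n_cells - 1), so
    the first right singular vector is v_1 and omega_1 = s_1^2 / (n_cells - 1).
    The n_genes x n_genes covariance matrix itself is never formed.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values descending
    return s[0] ** 2 / (X.shape[0] - 1), Vt[0]


rng = np.random.default_rng(3)
X = rng.standard_normal((300, 500))                # hypothetical: 300 cells, 500 genes
X[:, :10] += 3.0 * rng.standard_normal((300, 1))   # one co-varying 10-gene module

omega1, v1 = principal_covariance(X)
print(f"omega_1 = {omega1:.1f}")
```

The result agrees with the top eigenpair of `np.cov(X, rowvar=False)` up to the sign of the eigenvector, while avoiding the *n*_{g} × *n*_{g} matrix.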

### B. Covariance analysis pinpoints a bifurcation in Hematopoietic stem cell development

Having verified that gene-gene covariance can be used to identify and classify a bifurcation in a simulated genetic context, we applied our analysis framework to a recently published scRNA-seq data set of mouse hematopoietic stem cell (HSC) differentiation (18). In this experiment, HSCs were isolated *in vitro*, barcoded, plated in a medium that supports multilineage differentiation (day 0), and subsequently sampled for single-cell sequencing using inDrops (38, 39) on days 2, 4, and 6. The resultant transcriptomic matrix (25,289 genes in 130,887 individual cells) was visualized in 2*D* via the SPRING method (28) (Fig. 4A), in which the matrix was filtered to only include highly variable, non-cell-cycle genes, and (*x, y*) coordinates are determined such that each cell is optimally placed closest to its 4 nearest neighbors in the space of the top 50 gene-wise principal components (PC) (28). Each cell was then ascribed to one of 11 different cell types (annotations in Fig. 4A) based on its position in the SPRING plot and expression of cell-type-specific marker genes (18). Cells that belonged to the developmental transition from multipotent progenitor (MPP) to neutrophil were identified by recategorizing cells as a cell-label distribution, and ranking cells by their similarity to fully committed neutrophils (details in Secn. S3). The 61,310 cells identified as belonging to the neutrophil transition were sorted into a neutrophil pseudotime trajectory (Fig. 4A) by ranking cells according to their similarity with the earliest pluripotent cells (Secn. S3). This data-specific pseudotime algorithm was validated via the clonal barcodes, by ensuring that the MPP cells in the trajectory included neutrophil clones, and via the sequencing time, by ensuring that cells collected earlier were ranked earlier in the trajectory.
We found that the expression of hundreds of highly expressed and variable genes in these cells varied temporally along this trajectory, with large groups of genes either monotonically increasing or decreasing (Fig. 4B).

There were several features of this trajectory that make it ideal for applying our analysis framework: it is consistent with the systematic, temporal controls of sequencing time and cellular barcodes, and it includes a large number of cells, enabling statistically reliable covariance and correlation measurements. Finally, this trajectory is part of hematopoiesis, a well-characterized developmental process, allowing us to compare our findings against past work.

To determine whether the transitions from HSC to neutrophil were due to bifurcations in transcriptomic space, we split the neutrophil trajectory into overlapping bins of 1000 cells (the last bin had 1310; details in Secn. S4 and Fig. S5) and applied our covariance analysis to the transcriptomic matrix **G**(*τ*) of each bin. We found that the largest eigenvalue of the covariance matrix (*ω*_{1}(*τ*), dark line in Fig. 4C) exhibited very little variation for *τ* < *τ*_{d} = 85, but began to increase at *τ*_{d} and exhibited a significant spike at *τ*_{m} = 109. To determine whether these changes in *ω*_{1} were statistically significant, we computed the corresponding statistical null (lighter line in Fig. 4C; details in Secn. S2) and found that the large peak at *τ*_{m} was easily distinguishable from the null. We first focus on the dynamics at *τ*_{m}, and then address those observed at *τ*_{d}. As this pattern of a statistically significant spike following near-constant *ω*_{1} echoed the behavior of the saddle-node bifurcation in our toy model (Fig. 3D), we speculated that at *τ*_{m} there was a one-to-one transcriptomic state transition. To further visualize the multimodality due to this one-to-one state transition, we computed the projection of normalized gene expression onto the first covariance eigenvector at the bifurcation, $\vec{v}_1(\tau_m)$. We found (Fig. 4D) that, as for the saddle-node bifurcation (Fig. 3E), the distribution of this projection was widest at the transition point, *τ*_{m}, providing further evidence that the cells switched from one transcriptomic state to another at *τ*_{m}, and that the transcriptomic state at *τ*_{m} was multimodal.
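A minimal sketch of this sliding-window analysis follows. The window width matches the 1000 cells quoted above, but the step of 500 cells is a hypothetical choice (the actual binning is described in Secn. S4); with it, the trailing remainder of the 61,310-cell trajectory is merged into a final window of 1310 cells. The input is assumed to be a cells × genes matrix already sorted by pseudotime.

```python
import numpy as np


def overlapping_bins(n_cells, width=1000, step=500):
    """(start, stop) index pairs of overlapping windows along pseudotime.

    Hypothetical scheme: windows of `width` cells advance by `step` cells and
    any trailing remainder is merged into the final window.
    """
    if n_cells < width:
        return [(0, n_cells)]
    bins = [(s, s + width) for s in range(0, n_cells - width + 1, step)]
    bins[-1] = (bins[-1][0], n_cells)  # absorb the leftover cells
    return bins


def omega1_along_pseudotime(X_sorted, bins):
    """Largest covariance eigenvalue of G(tau) in each pseudotime window."""
    out = []
    for start, stop in bins:
        Xb = X_sorted[start:stop]
        s = np.linalg.svd(Xb - Xb.mean(axis=0), compute_uv=False)
        out.append(s[0] ** 2 / (len(Xb) - 1))
    return np.array(out)


bins = overlapping_bins(61310)
print(len(bins), bins[-1], bins[-1][1] - bins[-1][0])  # 121 (60000, 61310) 1310
```

Applied to a pseudotime-sorted matrix, `omega1_along_pseudotime` yields a curve analogous to the dark line in Fig. 4C; overlapping windows trade statistical independence between neighboring points for a smoother curve.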

We focus now on our observations in proximity to *τ*_{d}. As the increase in *ω*_{1} at *τ*_{d} strongly resembled our toy model’s pitchfork bifurcation (Fig. S4B), as well as the proliferation of cell fates seen in high-resolution time-course scRNA-seq experiments (14), we sought to determine whether the increase of *ω*_{1} at *τ*_{d} was also due to transcriptomic state changes. While *ω*_{1} is not obviously well separated from its null during the increase, the difference between them switched from negative to positive shortly after *τ*_{d} (Fig. 4C, inset), indicating a regime crossover. Furthermore, even though gene expression was normalized per cell throughout the dataset, the distribution of each gene’s expression (across cells) begins to shift significantly toward higher values at *τ*_{d} (Fig. S5C). These findings suggest that with additional sequencing or prior knowledge, the genomic and geometric details of a one-to-many transcriptomic state transition at *τ*_{d} may be distinguishable.

To determine whether the transcriptomic state transitions we identified had biological significance, we compared our findings against the tree of cell fates that has been identified for neutrophil development (Fig. 5A). This tree includes multiple one-to-many cell fate changes. An example is the decision at which a granulocyte monocyte progenitor (GMP, or myeloblast) can be induced to divide into the progenitors of any of four terminal fates: neutrophil, monocyte, eosinophil, and basophil. The tree also includes multiple one-to-one cell fate changes, such as the maturation step in which a neutrophil promyelocyte is induced to become a neutrophil myelocyte (18, 40, 41). We found that *τ*_{d} corresponded well with the crossover point between pluripotent and neutrophil-commitment labels (Fig. 5B) identified in previous cell fate clustering (18). It also corresponded well with the point in pseudotime at which promyelocyte marker genes are maximal (Fig. 5C). This suggests that the increase in *ω*_{1} beginning at *τ*_{d} is a consequence of the cell fate decision that occurs between GMP and promyelocyte. Specifically, the resemblance of this increase to the unclustered pitchfork bifurcation (Fig. S4B) suggests that cells in this trajectory may include other GMP lineages, such as eosinophils or basophils, even though they were expected to include only the neutrophil lineage of GMP.

Conversely, the spike in *ω*_{1} identified at *τ*_{m} corresponds well with the switchover from high expression of promyelocyte marker genes (identified in Ref. (18)) to high expression of myelocyte marker genes (Fig. 5C). This expression-based result matches our geometric interpretation that the spike in *ω*_{1} at *τ*_{m} indicates a one-to-one fate transition. The promyelocyte-to-myelocyte transition is a maturation step of committed neutrophil progenitors, not a decision between two fates.

Thus, by using Eq. (5) to quantify the geometry of neutrophil development, we were able to recover the known GMP-neutrophil cell fate decision, identify the trajectory as likely including other lineages, and pinpoint a maturation step in neutrophil development. Broadly, these results refine the notions portrayed in Waddington's landscape. Not only do cell fate decisions correspond to (one-to-many) transcriptomic state changes, but even maturation steps can correspond to discontinuous transitions between transcriptomic states. Interestingly, the state changes at *τ*_{m} and *τ*_{d} appear to occur in similar directions: the distribution of gene expression projected onto the first covariance eigenvector at *τ*_{m} begins to widen at *τ*_{d} (Fig. 4D). This suggests that there may be generic transcriptomic modes that more readily enable state changes. Because the promyelocyte-to-myelocyte transition identified at *τ*_{m} is clearly distinct from its null, we use this bifurcation to further test our mathematical framework (Eq. (7)) and to present new hypotheses for mechanistic aspects of neutrophil development.

### C. Correlation analysis identifies genetic contributors to myeloid differentiation

Correlation analysis at a bifurcation correctly ranked important gene relationships in our example saddle-node bifurcation (Fig. 3G-H). We therefore used *R*_{ij}(*τ*_{m}), the Pearson correlation coefficient between genes *i* and *j* at *τ*_{m}, to identify pairwise gene relationships that are important for the promyelocyte-to-myelocyte transition. We verified that the distribution of *R*_{ij} widened at *τ*_{m} relative to nearby *τ* (Fig. 6A; see Secn. S4 for details of the correlation calculation). This widening was predicted for one-to-one state transitions (Eq. (7)) and was observed in the saddle-node bifurcation example (Fig. 3G).

We found that the genes in the tails of the *R*_{ij}(*τ*_{m}) distribution (details in Secn. S4) formed a well-connected correlation network comprising only two connected components. By clustering the genes according to their cellular functions, we determined that the larger component (28/31 genes) showed features of a developmental gene regulatory network. The genes in this cluster included markers that broadly classify neutrophils, as well as markers that more specifically classify neutrophilic promyelocytes. Some of these (S100a6, S100a8, S100a9, and Ngp) had previously been used to classify these cells (18), and others (Sirpa, Ccl6) had not. In particular, S100a9 was strongly negatively correlated with most other genes in the network, suggesting that it is pivotal for the observed bifurcation. This cluster also included genes involved in the mechanisms that drive fate changes, such as signaling and transmembrane processes necessary for cell-cell interactions. It further included housekeeping and metabolic genes responsible for maintaining basic cellular processes (Fig. 6B). Interestingly, the smaller component (3/31 genes) consisted exclusively of mitochondrial regulators (Fig. 6B). This indicates that background, highly covarying expression relationships may be difficult to filter out using correlation-matrix analysis alone.
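Extracting the tail network and its connected components can be sketched with SciPy's sparse graph routines. This is an illustrative sketch under several assumptions. It takes a precomputed gene × gene correlation matrix `R` at *τ*_{m}, with NaN for unmeasured pairs. It uses the tail thresholds reported in Secn. S4 (*R* > 0.65 or *R* < −0.35). The function name and toy data are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def tail_network_components(R, genes, upper=0.65, lower=-0.35):
    """Connected components of the network linking genes with tail correlations.

    R: genes x genes correlation matrix at tau_m (NaN where not measured).
    Returns a list of gene-name lists, one per connected component.
    """
    R = np.asarray(R, dtype=float)
    with np.errstate(invalid="ignore"):         # NaN comparisons evaluate to False
        A = (R > upper) | (R < lower)
    np.fill_diagonal(A, False)                  # ignore self-correlation
    keep = A.any(axis=1)                        # genes with at least one tail edge
    A = A[np.ix_(keep, keep)]
    kept = [g for g, k in zip(genes, keep) if k]
    n_comp, labels = connected_components(csr_matrix(A.astype(float)), directed=False)
    return [[kept[i] for i in np.flatnonzero(labels == c)] for c in range(n_comp)]


# Hypothetical toy matrix: a-b strongly positive, c-d strongly negative, e only weakly linked.
genes = ["a", "b", "c", "d", "e"]
R = np.eye(5)
R[0, 1] = R[1, 0] = 0.9
R[2, 3] = R[3, 2] = -0.5
R[0, 4] = R[4, 0] = 0.1
print(tail_network_components(R, genes))
```

Gene "e" drops out because none of its correlations fall in either tail.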

We also found that the gene expression patterns within the clusters reflected their importance in the promyelocyte-to-myelocyte transition (Fig. S6). Neutrophil and myelocyte cell fate markers, as well as genes implicated in signaling, appeared to switch between low and high expression at *τ*_{m}. This indicates that their expression patterns are tied to specific cell fates. Conversely, metabolic, membrane, and housekeeping genes almost exclusively exhibited local peaks in expression at *τ*_{m}. This indicates that they are transiently important for driving, or guiding, the cell fate transition but do not pertain to a specific fate. The mitochondrial cluster, as well as some miscellaneous and membrane genes (H2d1, Anxa4), exhibited gradual increases or decreases in expression. Their developmental patterns may therefore be generically tied to cell fate specification, or distance from pluripotency, rather than specifically to the promyelocyte-to-myelocyte transition.

Lastly, since the principal covariance eigenvector is equivalent to the soft mode of the Jacobian at a bifurcation (Eq. (6) and Fig. 3F), we examined how the loadings of network genes varied around *τ*_{m} to determine whether they implied a deeper mechanistic interpretation (Fig. S7). For nearly all genes in the developmental cluster, loadings were locally maximal at *τ*_{m}. This suggests that their relationships with other genes are most important at *τ*_{m}, even if their absolute expression is tied to a specific fate (e.g., neutrophil markers). Interestingly, some marker genes (Ngp, S100a8, S100a9) appeared to switch between high and low loading values at *τ*_{m}. This indicates that they may maintain the specific fates of promyelocyte or myelocyte through their interactions with other genes. The loadings also reiterated that genes in the mitochondrial cluster are unimportant for the promyelocyte-to-myelocyte fate change, as their loadings were independent of pseudotime.
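A covariance eigenvector is defined only up to sign, so comparing loadings across pseudotime bins requires a consistent orientation. The sketch below shows one possible approach, which is our assumption rather than a step stated in the text. Each bin's principal eigenvector is flipped so that its overlap with a reference vector (for example, the eigenvector at *τ*_{m}) is non-negative. The loadings of selected genes can then be read off per bin. The function name is hypothetical.

```python
import numpy as np


def aligned_loadings(vectors, reference, gene_idx):
    """Sign-align per-bin eigenvectors to `reference` and return selected loadings.

    vectors: (n_bins, n_genes) array of unit eigenvectors, one per bin.
    reference: (n_genes,) vector, e.g. the principal eigenvector at tau_m.
    gene_idx: indices of the genes whose loadings should be tracked.
    """
    V = np.asarray(vectors, dtype=float)
    signs = np.sign(V @ reference)
    signs[signs == 0] = 1.0          # orthogonal vectors: leave orientation unchanged
    V = V * signs[:, None]
    return V[:, gene_idx]            # (n_bins, len(gene_idx)) loading trajectories
```

Plotting absolute loadings is a simpler alternative, but it hides genuine sign changes of individual genes relative to the rest of the mode.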

In summary, our framework makes two predictions: gene relationships closely tied to a fate change are highly correlated at a bifurcation (Eq. (7)), and a bifurcation's principal eigenvector is mechanistically informative (Eq. (6)). Applying these predictions, we recovered genes known to be important in neutrophil development. We also placed them, in a data-driven manner, in the context of a larger, newly hypothesized gene regulatory network. We verified the importance of many of these genes by examining their expression in a pseudotime window around the promyelocyte-to-myelocyte fate change. We were thus able to predict whether individual genes are transiently important for a cell fate transition or characteristic of a specific cell fate. The eigenvector loadings were particularly useful in determining why gene pairs are highly correlated at a bifurcation: because of their importance to other genes, because they maintain specific cell fates, or because of pluripotency. Thus, we claim that the mathematical framework presented here can be highly informative about the pairwise gene relationships that underlie cell fate changes.

## 4. Discussion

A central challenge in understanding cellular fate transitions using transcriptomics has been dimensionality. Cell fates are a low-dimensional, functional description (a valley in Waddington's landscape), while gene-expression profiles are points in a space of myriad dimensions. How can gene expression possibly reveal the geometry of development? Here, we have introduced a robust, statistical, model-free approach for mapping transcriptomic changes onto fate decisions, based on first principles of fluctuating, dissipative systems. We have shown that the high dimensionality of transcriptomes is in fact a catalyst, not a hindrance, to understanding cell development, as gene-pair correlations reveal exactly when and how the cellular state space bifurcates. We applied this approach to a scRNA-seq transcriptomic trajectory encompassing a sequence of fate decisions from pluripotent stem cells to neutrophils. Two of these fate decisions corresponded well with bifurcations in gene-expression state space, exemplifying how neutrophil development can be mapped onto a low-dimensional cell-fate (Waddington) landscape.

Our finding that a transcriptomic trajectory can have distinct geometric signatures, including durations during which the principal covariance eigenvalue is constant or spikes, has considerable consequences for understanding scRNA-seq datasets that capture developmental transitions. That we saw any consistent behavior in the principal covariance eigenvalue lends significant support to our initial assumption that cell fate modifiers operate at a much slower rate than transcriptomic modifiers. If these occurred on similar timescales, no statistical signature would be evident, let alone signatures that align well with our current developmental understanding of the system. Previous statistical analyses of scRNA-seq data found that development appears as a monotonic proliferation of cell fates (14). By contrast, our focus on a single developmental trajectory enabled us to distinguish multiple developmental epochs: durations of development during which cell fates did not undergo qualitative changes, proliferated (the GMP-to-promyelocyte transition), and changed state (the promyelocyte-to-myelocyte transition). Finally, our evidence of bifurcations contrasts starkly with scRNA-seq visualizations that show gene expression varying smoothly along a developmental path. It underscores the importance of understanding noise and nonlinear dependencies when using transcriptomic profiles to classify a cell's fate (15).

Our analysis of the data in Ref. (18) also yielded intriguing implications regarding the specific geometry of neutrophil development. In particular, some known cell fate changes in neutrophil development were not distinguishable in the covariance eigenvalue trajectory (e.g., from common myeloid progenitors [CMP] to GMP). This indicates that these changes are less discontinuous than the GMP-to-promyelocyte or promyelocyte-to-myelocyte transitions. It could mean, for example, that when CMPs differentiate to GMPs, the transition lacks commitment and depends on a sustained developmental signal. By contrast, once cells transition from GMP to promyelocyte, they are committed to becoming neutrophils regardless of external signals. Furthermore, we projected gene expression onto the principal covariance eigenvector at the promyelocyte-to-myelocyte fate transition (Fig. 4E). This made it apparent that the direction along which that fate change happened was well aligned with the direction of the GMP-to-promyelocyte transition. This result may be a sign of distinct, soft directions in transcriptomic space along which cell fates are most likely to change.

Aside from these geometric implications, pinpointing the bifurcation in pseudotime may also enable efficient identification of the genes and molecular mechanisms that drive a cell fate transition. As shown here, important pairwise gene relationships are explicitly highlighted in the correlation matrix at a bifurcation. These correlations lack directionality and therefore cannot be used to infer all aspects of mechanism. However, they may aid in building regulatory network models when combined with prior protein-interaction data or new experimental perturbations (42). Additionally, the principal covariance eigenvector is equivalent to the Jacobian's soft-mode eigenvector at the bifurcation, and the Jacobian directly reflects gene dynamics. The eigenvector may therefore reveal critical dynamic information or constrain an inferred global Jacobian (43).

While we focus here on scRNA-seq data, our approach is broadly applicable. It could, in principle, aid in determining other aspects of developmental bifurcations, such as the genomic structural modifications necessary for fate transitions, from single-cell ATAC-seq data (44, 45). It may also be possible to combine our covariance analysis with other indicators of pseudotime rank, such as cellular barcodes and low-dimensional distance, to constrain developmental trajectories along bifurcative paths. Our analysis was only possible because scRNA-seq experiments can now measure the expression of tens of thousands of genes in hundreds of thousands of cells, enabling accurate covariance measurements. That we found bifurcative events in these data implies that low-dimensional, nonlinear dynamical systems are at play, and that sufficient biological sampling will reveal the knobs that controllably tilt developmental landscapes.

## Materials and Methods

Instructions and Python code for reproducing all figures in this manuscript are available at https://github.com/simfreed/sc_bifurc_figs.

## 5. Derivation of results in Secn. 2

### A. Continuous time Lyapunov equation for transcriptomic matrices

Let **G** be the steady state transcriptomic matrix at a single developmental time with *n*_{c} rows (cells) and *n*_{g} columns (genes) (Fig. 2A) and **F** be a set of differential equations describing the molecular interactions that generate **G**, such that
where d**G**/d*t* is the derivative of **G** with respect to time. Since all cells (columns) in **G** are at steady state at the same developmental time *τ*, we assume (for the purpose of contradiction) that they are all statistical replicates of the same transcriptomic state. The full matrix, **G**, is therefore in the vicinity of the hyperbolic fixed point
where 1_{nc} is a vector of *n*_{c} ones and *E* denotes the expectation operator. The dynamics of **G** can be linearized via the distance to the fixed point, **X** = **G** − **G***, such that
where **J** is the Jacobian of **F** with respect to **G**, evaluated at **G***, and we have used the fact that at steady state, **F**(**G***) = 0.

If **F** is stochastic and Markovian, then the dynamics of **X** can be described as a discretized Ornstein-Uhlenbeck (OU) process,
where Δ*t* is the molecular interaction timescale and *ζ*_{t;i,j} is sampled from *N* (0, *σ*_{i}) where *σ*_{i} is the variance of gene *i*. The gene-gene covariance matrix can then be defined as
where the superscript *T* denotes transpose and we have approximated *E*(**G**) = **G***. The stationary condition for an OU process (i.e., ∂**C**/∂*t* = 0) then yields
where and we have used the fact that *E*(*ζ*_{t}), *E*(**X**_{t}), *E*(*ζ*_{t}**X**_{t}), and *E*(**X**^{T}*ζ*^{T}) are all 0 (26).
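The stationary covariance implied by the linearized dynamics can be checked numerically. The sketch below assumes that Eq. (15) takes the standard continuous-time Lyapunov form **J C** + **C J**^{T} + **D** = 0, with **D** a positive-definite noise covariance. This form is an assumption because the exact normalization of the noise term is not reproduced here. The sketch solves the equation with `scipy.linalg.solve_continuous_lyapunov`, which solves *AX* + *XA*^{H} = *Q*. It then compares the result with the covariance across many independently simulated cells following an Euler-Maruyama discretization of the OU process. The Jacobian and noise matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def stationary_covariance(J, D):
    """Solve J C + C J^T + D = 0 (scipy solves A X + X A^H = Q, so Q = -D)."""
    return solve_continuous_lyapunov(J, -D)


def simulate_cells(J, D, n_cells=5000, dt=1e-2, n_steps=2000, seed=0):
    """Euler-Maruyama OU simulation; each row is one cell (a statistical replicate)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(D)          # noise increments have covariance D * dt
    X = np.zeros((n_cells, J.shape[0]))
    for _ in range(n_steps):
        noise = rng.standard_normal(X.shape) @ L.T
        X = X + dt * X @ J.T + np.sqrt(dt) * noise
    return X


J = np.array([[-1.0, 0.5], [0.0, -2.0]])   # hypothetical stable Jacobian
D = np.eye(2)                               # hypothetical noise covariance
C = stationary_covariance(J, D)
C_emp = np.cov(simulate_cells(J, D), rowvar=False)
print(np.round(C, 3))
print(np.round(C_emp, 3))
```

The empirical covariance across simulated cells should agree with the Lyapunov solution up to sampling error and an O(Δ*t*) discretization bias.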

### B. Covariance at bifurcation

If J is diagonalizable, such that,
where Λ is a diagonal matrix of eigenvalues (*λ*_{1}, *λ*_{2}, …, *λ*_{ng}) and **P** is the square matrix whose columns are the eigenvectors. Under this condition, Eq. (15), often referred to as the continuous-time Lyapunov (CL) equation, can be used to qualitatively assess **G**. Left-multiplying Eq. (15) by **P**^{−1} and right-multiplying by (**P**^{†})^{−1} yields
where the † superscript indicates conjugate transpose, and . Since Λ is diagonal, Eq. (17) can be rewritten elementwise,
which can be substituted to yield an expression for elements of the covariance
since . At a bifurcation, the leading eigenvalue max(Λ) = *λ*_{d} → 0, so the *k* = *l* = *d* term in Eq. (19) becomes dominant, and
where is the *d*^{th} column of **P**.
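The dominance of the soft mode in Eq. (20) can be illustrated numerically. In the sketch below, the eigenvector matrix **P**, the stable eigenvalues, and the identity noise covariance are all hypothetical choices. As the leading eigenvalue *λ*_{d} approaches 0 from below, the largest covariance eigenvalue grows roughly like 1/|*λ*_{d}|. At the same time, the corresponding covariance eigenvector aligns with the *d*^{th} column of **P**.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical (non-orthogonal) Jacobian eigenvectors, one per column.
P = np.array([[1.0, 0.3, 0.0],
              [0.5, 1.0, 0.2],
              [0.2, 0.0, 1.0]])
other_eigs = np.array([-1.5, -2.0])   # hypothetical stable eigenvalues
D = np.eye(3)                          # hypothetical noise covariance
p_d = P[:, 0] / np.linalg.norm(P[:, 0])


def soft_mode_summary(lam_d):
    """Largest covariance eigenvalue and its overlap with p_d for J = P Lambda P^-1."""
    Lam = np.diag(np.concatenate(([lam_d], other_eigs)))
    J = P @ Lam @ np.linalg.inv(P)
    C = solve_continuous_lyapunov(J, -D)   # J C + C J^T = -D
    C = 0.5 * (C + C.T)                     # remove round-off asymmetry
    w, V = np.linalg.eigh(C)                # ascending eigenvalues of symmetric C
    return w[-1], abs(V[:, -1] @ p_d)


for lam_d in [-1.0, -0.1, -0.01, -0.001]:
    omega1, overlap = soft_mode_summary(lam_d)
    print(f"lambda_d={lam_d:>7}: omega_1={omega1:10.2f}, |s_1 . p_d|={overlap:.4f}")
```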

### C. Bifurcation eigenvector equivalence

Since **C** is real and symmetric, the eigenvalue decomposition can be written as a single sum,
where {*ω*_{1}, *ω*_{2}, …, *ω*_{ng}} are its eigenvalues, and are its eigenvectors, which are normalized to 1. For large *n*_{g}, Ω and **S** are significantly easier to compute than **C**, as they can be obtained from the singular value decomposition of **X** = **G** − *E*(**G**). For Eq. (21) to be equivalent to Eq. (20) at a bifurcation, at least one eigenvalue must satisfy *ω*_{i} → ∞; without loss of generality, we refer to it as *ω*_{1}. If *ω*_{1} ≫ *ω*_{i} for *i* ∈ [2 … *n*_{g}], then the *k* = 1 term dominates the sum in Eq. (21), and by equating with Eq. (20) we obtain
where we have used the fact that and both normalize to 1.
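The computational shortcut mentioned above obtains Ω and **S** from the singular value decomposition of **X** rather than forming **C**. It can be sketched as follows. The rank-one fraction *ω*_{1}/Σ_{k}*ω*_{k} is a hypothetical diagnostic we add here to gauge whether the *k* = 1 term dominates Eq. (21). The synthetic data, a single shared direction plus weak isotropic noise with more genes than cells, are likewise hypothetical.

```python
import numpy as np


def covariance_eigs_via_svd(G):
    """Covariance eigenvalues (descending) and unit eigenvectors (columns) from an SVD.

    G is cells x genes; X = G - E(G) is centered per gene, and the
    n_genes x n_genes covariance matrix is never formed.
    """
    X = G - G.mean(axis=0)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s ** 2 / (X.shape[0] - 1), Vt.T


def rank_one_fraction(omegas):
    """Share of total variance carried by omega_1; near 1 when k = 1 dominates."""
    return omegas[0] / omegas.sum()


rng = np.random.default_rng(1)
n_cells, n_genes = 200, 500
u = rng.standard_normal(n_genes)
u /= np.linalg.norm(u)                            # hypothetical soft direction
amplitude = 20.0 * rng.standard_normal((n_cells, 1))
G = amplitude * u + 0.1 * rng.standard_normal((n_cells, n_genes))

omegas, S = covariance_eigs_via_svd(G)
print(rank_one_fraction(omegas), abs(S[:, 0] @ u))
```

The SVD route costs roughly O(*n*_{c}^{2}*n*_{g}) when *n*_{g} > *n*_{c}, compared with O(*n*_{g}^{2}*n*_{c}) just to form **C**, which is why it is preferable for whole-transcriptome matrices.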

## 6. Simulation methodology

To explore our analysis framework on a more biologically relevant gene network (Fig. 3A), we used a Fokker-Planck simulation method. We simulated *N*_{c} = 100 cells; this number was chosen by examining how many cells were necessary to accurately detect bifurcations in the neutrophil data (Fig. S5A-B). For each cell, the expression of gene *i*, *g*_{i}(*t*; *m*_{1}, *m*_{2}, *k*_{D}), is initialized uniformly at random in the interval (0, 4]. The expression at each subsequent timestep, *g*_{i}(*t* + Δ*t*), is sampled from a Gaussian distribution *N*(*μ*, *σ*), where
and is bounded to be non-negative. The simulation ran for *N*_{t} = 50,000 timesteps with Δ*t* = 0.01, which was sufficient for equilibration (Fig. 3). The last timestep of each simulation is the steady-state expression **G**. In the saddle-node example (Fig. 3), the remaining parameters were *k*_{D} = 1, *m*_{2} = 3, *m*_{1} ∈ [2, 4], and *s* = 20. In the pitchfork example (Fig. S4), they were *m*_{1,2} = 1, *k*_{D} ∈ [0.24, 5], and *s* = 200.
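The sampling loop described above can be sketched as follows. Eq. (8) and the expressions for *μ* and *σ* are not reproduced in this section, so the sketch relies on three assumptions. The drift below is a hypothetical Hill-type mutual-inhibition model. The noise amplitude `noise` is a hypothetical stand-in for the role of *s*. Clipping at zero is our assumed way of keeping expression non-negative. Only the overall structure follows the text: 100 cells, uniform initialization in (0, 4], Gaussian updates with Δ*t* = 0.01, and the final timestep taken as **G**.

```python
import numpy as np


def toggle_drift(g, m1, m2, kD):
    """HYPOTHETICAL mutual-inhibition drift; a stand-in for Eq. (8), not the paper's model."""
    g1, g2 = g[:, 0], g[:, 1]
    dg1 = m1 / (1.0 + (g2 / kD) ** 2) - g1
    dg2 = m2 / (1.0 + (g1 / kD) ** 2) - g2
    return np.column_stack([dg1, dg2])


def simulate_G(m1, m2, kD, n_cells=100, n_steps=50_000, dt=0.01, noise=0.1, seed=0):
    """Return the final-timestep expression matrix G (cells x 2 genes)."""
    rng = np.random.default_rng(seed)
    g = 4.0 - 4.0 * rng.random((n_cells, 2))   # uniform on (0, 4]
    sigma = noise * np.sqrt(dt)                # hypothetical per-step noise scale
    for _ in range(n_steps):
        mu = g + dt * toggle_drift(g, m1, m2, kD)
        g = np.maximum(rng.normal(mu, sigma), 0.0)   # keep expression non-negative
    return g
```

Sweeping one parameter (e.g., `m1`) and applying a covariance analysis to each returned `G` would mimic the structure of the toy-model experiments. Reproducing Fig. 3 itself requires the model in Eq. (8).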

## 7. Supporting Information

**S1. Bifurcation possibilities from two mutually inhibiting genes.** At steady state, Eq. (8) satisfies the quintic polynomial
which, depending on the parameter values, can have either one real solution that is an attractor (e.g., if *m*_{1,2} = 1 and *k*_{D} = 1) or three real solutions: two attractors (nodes) and one repeller (saddle) (e.g., *m*_{1,2} = 1, *k*_{D} = 1/3). By examining the nullclines,
it can be deduced that varying *m*_{1} while fixing *k*_{D} and *m*_{2} can yield a saddle-node bifurcation. Eq. (S2) moves vertically while Eq. (S3) does not, allowing either node to merge with the saddle (Fig. S1A). Conversely, varying *k*_{D} while fixing *m*_{1} and *m*_{2} can yield a pitchfork bifurcation. Here both nullclines move, such that below the bifurcation value all three real solutions remain (Fig. S1B). Solving Eq. (S1) numerically with the Python function numpy.roots and plotting the real solutions (Fig. S1C-D) yields the bifurcations used in Fig. 3 and Fig. S4 (46).
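Extracting real steady states with `numpy.roots`, as described above, can be sketched as follows. Eq. (S1) is not reproduced here, so the coefficient function used to sweep a parameter is hypothetical. The example uses the cubic normal form *x*^{3} − *px* of a pitchfork bifurcation, whose number of real roots changes from one to three as *p* crosses 0.

```python
import numpy as np


def real_roots(coeffs, tol=1e-8):
    """Sorted real roots of a polynomial with coefficients given highest power first."""
    r = np.roots(coeffs)
    return np.sort(r[np.abs(r.imag) < tol].real)


def sweep(coeff_fn, params, tol=1e-8):
    """Real steady states for each parameter value; coeff_fn is a hypothetical stand-in for Eq. (S1)."""
    return [(p, real_roots(coeff_fn(p), tol)) for p in params]


def pitchfork(p):
    # Normal form x^3 - p x = 0 (illustration only, not Eq. (S1)).
    return [1.0, 0.0, -p, 0.0]


for p, roots in sweep(pitchfork, [-1.0, 1.0]):
    print(p, roots)
```

Plotting every real root against the swept parameter gives a bifurcation diagram of the kind shown in Fig. S1C-D. The tolerance `tol` guards against roots with tiny spurious imaginary parts.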

**S2. Resampling principal eigenvalue.** Given the transcriptomic matrix where and *G*_{i,j} is the expression of the *j*^{th} gene in the *i*^{th} cell, we generate a null sample **G**^{null} by drawing each of its entries randomly, with replacement, from . In Figs. 3, 4, and S5, we compute the principal covariance eigenvalue for each of *n*_{s} = 20 null samples and compare this null distribution against the principal covariance eigenvalue of the data, *ω*_{1}. This resampling has little impact on *ω*_{1} for unimodal distributions, as the scale of *ω*_{1} is still determined by the system's noise (Fig. S3, left and right). It significantly decreases *ω*_{1} for multimodal distributions (Fig. S3, center), since the structure of the multimodality is scrambled. We therefore found it an effective method for determining whether a spike in *ω*_{1} is due to multimodality or to increased noise.
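A minimal sketch of this resampling null follows. The source of the resampled entries is not legible in the text above. We therefore assume that each entry of gene *j* is drawn with replacement from that gene's own values across cells. This preserves each gene's marginal distribution while scrambling cell-to-cell structure, which is what removes covariance due to multimodality. The synthetic examples are hypothetical.

```python
import numpy as np


def principal_eigenvalue(G):
    """Largest gene-gene covariance eigenvalue of a cells x genes matrix."""
    X = G - G.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    return s[0] ** 2 / (X.shape[0] - 1)


def resampled_null(G, n_samples=20, seed=0):
    """Null omega_1 values; ASSUMPTION: entries of gene j are resampled from column j."""
    rng = np.random.default_rng(seed)
    n_c, n_g = G.shape
    null = np.empty(n_samples)
    for k in range(n_samples):
        rows = rng.integers(0, n_c, size=(n_c, n_g))   # an independent draw for every entry
        null[k] = principal_eigenvalue(np.take_along_axis(G, rows, axis=0))
    return null


rng = np.random.default_rng(2)
labels = rng.choice([-2.0, 2.0], size=(400, 1))        # two hypothetical states
bimodal = labels + rng.standard_normal((400, 50))
unimodal = rng.standard_normal((400, 50))
for name, G in [("bimodal", bimodal), ("unimodal", unimodal)]:
    print(name, principal_eigenvalue(G), resampled_null(G).max())
```

For the bimodal example the data eigenvalue should far exceed every null sample, whereas for the unimodal example the two should be comparable.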

**S3. Algorithm for generating the pseudotime labels in Weinreb et al.** SPRING (x-y) positions, cell type annotations, and pseudotime ranks for the data presented in Fig. 4A-B were downloaded from https://github.com/AllonKleinLab/paper-data/tree/master/Lineage_tracing_on_transcriptional_landscapes_links_state_to_fate_during_differentiation. The algorithms used to generate these values are described in detail in Ref. (18) (Supplementary Materials) and are recapitulated here for completeness. Given the full *in vitro* hematopoiesis transcriptomic matrix (all cells and all genes), the SPRING positions in Fig. 4A were generated using the following procedure.

1. A filtered transcriptomic matrix was generated that excluded genes that
   - had low variability, as determined via the filter_genes function with parameters (85,3,3) from https://github.com/AllonKleinLab/SPRING_dev/blob/master/data_prep/spring_helper.py (28), or
   - correlated highly (R > 0.1) across all cells with any of the following cell cycle genes: Ube2c, Hmgb2, Hmgn2, Tuba1b, Ccnb1, Tubb5, Top2a, and Tubb4b.
2. The top 50 principal components (PCs) of the filtered transcriptomic matrix were computed.
3. 40,000 of the cells were selected at random, and a k-nearest-neighbors (KNN) graph between those cells was constructed using the top 50 PCs of the filtered transcriptomic matrix and k = 4.
4. X-Y positions of these 40,000 cells were generated using the ForceAtlas2 algorithm with 500 steps (47).
5. Positions for each of the remaining 90,887 cells were computed as the average position of their 40 nearest neighbors (in the 50-PC space) among the initial 40,000 cells.
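Two of the steps above can be sketched with NumPy and scikit-learn: the cell-cycle gene filter (step 1) and the placement of the remaining cells from their nearest neighbors (step 5). This is a simplified illustration, not the SPRING code itself. The low-variability filter (filter_genes) and the ForceAtlas2 layout are omitted, the anchor positions are assumed to be given, and the function names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def cell_cycle_mask(E, gene_names, cc_genes, r_max=0.1):
    """True for genes whose Pearson R with every listed cell-cycle gene is <= r_max.

    E is cells x genes; listed cell-cycle genes absent from gene_names are skipped.
    """
    Z = (E - E.mean(axis=0)) / (E.std(axis=0) + 1e-12)     # z-score each gene
    cc_idx = [gene_names.index(g) for g in cc_genes if g in gene_names]
    R = Z.T @ Z[:, cc_idx] / E.shape[0]                    # genes x cell-cycle genes
    return ~(R > r_max).any(axis=1)


def place_remaining_cells(pcs, anchor_idx, anchor_xy, k=40):
    """Put each non-anchor cell at the mean (x, y) of its k nearest anchors in PC space."""
    nn = NearestNeighbors(n_neighbors=k).fit(pcs[anchor_idx])
    rest = np.setdiff1d(np.arange(pcs.shape[0]), anchor_idx)
    _, nbrs = nn.kneighbors(pcs[rest])                     # indices into the anchor set
    xy = np.empty((pcs.shape[0], 2))
    xy[anchor_idx] = anchor_xy
    xy[rest] = anchor_xy[nbrs].mean(axis=1)
    return xy
```

In a full pipeline, `pcs` would be the top 50 PCs of the filtered matrix. `anchor_xy` would be the ForceAtlas2 layout of the 40,000 randomly selected cells, and `k=40` matches the text.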

Cells were annotated with their cell types (cluster annotation in Fig. 4A) based on their position in the SPRING plot and on their expression (terminal cell fates) or lack of expression (pluripotent) of pre-selected marker genes. Specifically, the marker genes used to identify neutrophils were S100a9, Itgb2l, Elane, Fcnb, Mpo, Prtn3, S100a6, S100a8, Lcn2, and Lrg1.

Neutrophil pseudotime rank was then determined by smoothly interpolating between cells in the pluripotent and neutrophil clusters. The interpolation method used throughout this procedure is an iterative, diffusive process defined as
where is a vector quantity defined for cell *i*, is the matrix of this quantity for all cells, *K*_{k}(*i*) are the cell indices of the *k* nearest neighbors of cell *i*, *n* > 0 is the number of iterations, and *b* is the neighbor weight (low *b* and high *n* both yield high diffusion) (11). The pseudotime ranking procedure is:

1. Cells are identified as part of the neutrophil trajectory:
   - Let the cell-type indicator vector of cell *i* have entries *t*_{ij} = 1 if cell *i* is of type *j* and 0 otherwise, and collect these vectors into a matrix for all cells.
   - Let *K*_{100} be the k-nearest-neighbor graph between cells for *k* = 100, computed using the top 50 PCs.
   - Compute the smooth cell type indicator by applying the interpolation above to the cell-type indicator matrix.
   - Compute *z*_{i}, the weighted average cell type of cell *i*, using cell-type-specific weights for each type *j*.
   - Define a neutrophil trajectory indicator that equals 1 for cell *i* if *z*_{i} > *Q*_{0.6}(*z*) and 0 otherwise, where *Q*_{0.6}(*z*) is the 60^{th} percentile of *z*.
   - Smooth this indicator with the interpolation above to obtain the smoothed neutrophil trajectory indicator.
   - Cells were considered part of the neutrophil trajectory if their smoothed neutrophil trajectory indicator exceeded its 60^{th} percentile.
2. The 61,310 cells identified as part of the neutrophil trajectory are sorted:
   - Define a pluripotency indicator that equals 1 if a cell in the trajectory is pluripotent and 0 otherwise, collected into a matrix for all cells in the trajectory.
   - Smooth this indicator with the interpolation above to obtain the smoothed pluripotency indicator.
   - The pseudotime of cell *i* is the rank (largest to smallest) of its smoothed pluripotency indicator among all cells in the trajectory.
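The smoothing and ranking steps can be sketched as follows. The interpolation equation is not legible above, so the update used here is an assumed form: *v*_{i} ← (*b v*_{i} + mean_{j∈K_k(i)} *v*_{j})/(*b* + 1), repeated *n* times. It was chosen only to match the stated behavior that low *b* and high *n* increase diffusion. The parameter defaults other than *k* = 100, and the toy data, are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_graph(pcs, k):
    """Indices of each cell's k nearest neighbors (excluding itself) in PC space."""
    nn = NearestNeighbors(n_neighbors=k).fit(pcs)
    _, idx = nn.kneighbors()      # with no query, each point is not its own neighbor
    return idx


def diffuse(V, nbrs, n_iter, b):
    """ASSUMED smoothing: V_i <- (b V_i + mean of V over K_k(i)) / (b + 1), n_iter times.

    V may be a vector (one value per cell) or a cells x types matrix.
    """
    V = np.asarray(V, dtype=float)
    for _ in range(n_iter):
        V = (b * V + V[nbrs].mean(axis=1)) / (b + 1.0)
    return V


def pseudotime_rank(pcs, is_pluripotent, k=100, n_iter=10, b=0.1):
    """Rank 0 = most pluripotent after smoothing (largest-to-smallest ordering)."""
    p = diffuse(is_pluripotent.astype(float), knn_graph(pcs, k), n_iter, b)
    order = np.argsort(-p, kind="stable")
    rank = np.empty(len(p), dtype=int)
    rank[order] = np.arange(len(p))
    return rank
```

The same `diffuse` function could smooth the cell-type indicator matrix in step 1, since it accepts either a vector or a cells × types matrix.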

**S4. Computing correlation coefficients.** A significant challenge arises in accurately measuring *R*_{ij}(*τ*), the Pearson correlation coefficient between genes *i* and *j* at pseudotime *τ* (Fig. 6). The low read depth of scRNA-seq experiments results in many genes having zero or very few reads in many cells, which leads to spuriously high correlations. For example, this happens if the read counts for genes *i* and *j* are almost all 0, but there is one cell in which both are highly expressed. To mitigate this issue, we:

- (a) used large bins of 1000 or more cells for all experimental calculations, even though smaller bins are sufficient to detect bifurcations (Fig. S5A-B);
- (b) used only cells with non-zero read counts for both genes *i* and *j*, effectively treating 0 reads as a lack of information rather than a measurement;
- (c) calculated correlations only between gene pairs that had 400 or more cells with non-zero read counts, so that correlations are not spuriously inflated by low read counts.

This filter substantially reduces the number of correlations measured: an average of 5.5 × 10^{4} per pseudotime window, out of a possible 3.2 × 10^{8}. However, it is independent of pseudotime. With respect to the widening of the *R*_{ij}(*τ*) distribution at *τ*_{m} (Fig. 6A), the retained pairs can therefore be thought of as a random statistical sample of correlation coefficients for which there is high confidence. Genes used to form the network in Fig. 6B had *R*_{ij}(*τ*_{m}) > 0.65 or *R*_{ij}(*τ*_{m}) < −0.35, as determined by inspection of Fig. 6A.
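Steps (b) and (c) above can be sketched as follows. This is an illustrative, unoptimized version using a double loop over candidate gene pairs, not the code used for the figures. We assume that the 400-cell threshold applies to cells in which both genes are detected. The toy data are hypothetical.

```python
import numpy as np


def nonzero_pearson(E, min_cells=400):
    """Pearson R between gene columns using only cells where both genes are non-zero.

    E is cells x genes (one pseudotime bin). Pairs sharing fewer than
    `min_cells` co-detected cells are left as NaN, as is the diagonal.
    """
    E = np.asarray(E, dtype=float)
    nz = E > 0
    R = np.full((E.shape[1], E.shape[1]), np.nan)
    # A pair can only pass the filter if each gene alone is detected in >= min_cells cells.
    candidates = np.flatnonzero(nz.sum(axis=0) >= min_cells)
    for a, i in enumerate(candidates):
        for j in candidates[a + 1:]:
            both = nz[:, i] & nz[:, j]
            if both.sum() < min_cells:
                continue
            x, y = E[both, i], E[both, j]
            if x.std() == 0 or y.std() == 0:
                continue              # correlation undefined for constant input
            R[i, j] = R[j, i] = np.corrcoef(x, y)[0, 1]
    return R
```

For tens of thousands of genes, the pairwise loop would need to be vectorized or parallelized. The per-gene pre-filter already skips genes that cannot pass criterion (c).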

## ACKNOWLEDGMENTS

We thank Richard Carthew, Yogesh Goyal, and Karna Gowda for reviewing the manuscript and providing suggestions. This work was supported by the NSF-Simons Center for Quantitative Biology at Northwestern University and the Simons Foundation (597491, MM). SG acknowledges the University of Toronto's Medicine by Design initiative, which receives funding from the Canada First Research Excellence Fund, and an NSERC Discovery Grant. MM is a Simons Foundation Investigator.