Network inference from perturbation time course data

Networks underlie much of biology from subcellular to ecological scales. Yet, understanding what experimental data are needed and how to use them for unambiguously identifying the structure of even small networks remains a broad challenge. Here, we integrate a dynamic least squares framework into established modular response analysis (DL-MRA), that specifies sufficient experimental perturbation time course data to robustly infer arbitrary two and three node networks. DL-MRA considers important network properties that current methods often struggle to capture: (i) edge sign and directionality; (ii) cycles with feedback or feedforward loops including self-regulation; (iii) dynamic network behavior; (iv) edges external to the network; and (v) robust performance with experimental noise. We evaluate the performance of and the extent to which the approach applies to cell state transition networks, intracellular signaling networks, and gene regulatory networks. Although signaling networks are often an application of network reconstruction methods, the results suggest that only under quite restricted conditions can they be robustly inferred. For gene regulatory networks, the results suggest that incomplete knockdown is often more informative than full knockout perturbation, which may change experimental strategies for gene regulatory network reconstruction. Overall, the results give a rational basis to experimental data requirements for network reconstruction and can be applied to any such problem where perturbation time course experiments are possible.


INTRODUCTION
Networks underlie much cellular and biological behavior, including transcriptional, protein-protein interaction, signaling, metabolic, cell-cell, endocrine, ecological, and social networks, among many others. As such, identifying and then representing their structure has been a focus of many for decades now. This is not just from experimental perspectives alone, but predominantly computational with a variety of statistical methodologies that integrate prior knowledge from interaction databases with new experimental data sets  . Alternatively, a variety of methods have investigated general ways to infer detailed reaction mechanisms-often a foundation of networks-from experimental data [25][26][27][28][29] . Such tasks may be considered a subset of network inference.
Network structure is usually represented as either an undirected or a directed graph, with edges between nodes specifying the system. There are five main areas where current approaches to reconstructing networks struggle to capture important features of biological networks. The first is directionality of edges 6,8,30,31 . Commonly employed correlational methods predominantly generate undirected edges, which impedes causal and other mechanistic analyses. Second is cycles. Cycles such as feedback or feedforward loops are nearly ubiquitous in biological systems and central to their function 32,33 . This also includes an important type of cycle: self-regulation of a node, that is, an edge onto itself, which is rarely considered 34 . Third is that biological networks are often dynamic. Two notable examples are circadian and p53 oscillators 35,36 , where dynamics are key to biological function. Directionality and edge signs (i.e. positive or negative) dictate dynamics. Fourth is pinpointing how external variables impinge on network nodes. For example, is the effect of a growth factor on a network node direct, or though other nodes in the network? Fifth, the design and method employed should be robust to typical experimental noise levels. The experimental design and data requirements to uniquely identify the dynamic, directed and signed edge structures in biological networks containing all types of cycles and external stimuli remains a largely open but significant problem. Any such design should ideally be feasible to implement with current experimental technologies.
Modular Response Analysis (MRA) approaches, first pioneered by Kholodenko and colleagues in 2002 37,38 inherently deal with cycles and directionality by prescribing systematic perturbation experiments followed by steady-state measurements. The premise for data requirements is to measure the entire system response to at least one perturbation for each node. Thus, an n node system requires n experiments, if the system response can be measured in a global fashion (i.e. all nodes measured at once). The original instantiations struggled with the impact of experimental noise, but total least squares MRA and Monte Carlo sampling helped to improve performance [39][40][41] . Incomplete and prior knowledge can be handled as well using both maximum likelihood and Bayesian approaches [42][43][44][45] . However, these approaches are based on steadystate data, or fixed time point data, limiting abilities to deal with dynamic systems. There is a formal requirement for small perturbations, which are experimentally problematic and introduce issues for estimation with noisy data. Subsequent approaches have recommended the use of large perturbations as a trade off in dealing with noisy data, but the theory still formally requires small perturbations 41 . Lastly, there are two classes of biologically relevant edges that MRA does not comprehensively address. First is self-regulation of a node, which is often normalized (to -1) causing it to not be uniquely identifiable. The other are the effects of stimuli external to the network (basally present or administered) on the modeled nodes.
In addition to perturbations, another experimental design feature that can inform directionality is a time-series, which have also been integrated into MRA. This work 37,46 uses time-series perturbation data to uniquely infer a signed, directed network that can predict dynamic network behavior. In an n node open system (e.g. protein levels are not constant), multiple nodes would either be distinctly perturbed more than once, such as both production and degradation of a transcript, or phosphorylation and dephosphorylation of a protein, or the system monitored before and after the perturbation (with one perturbation per node). This can be experimentally challenging both in terms of scale and finding suitable distinct perturbations for a node. Moreover, as is often the case, noise in the experimental data severely limits inference accuracy (due to required estimation of 2 nd derivatives). Subsequent work 47 recommends smaller perturbations and difference in timepoints but also does not address noisy data. Further work has demonstrated that larger perturbations produce better results due to inevitable experimental noise 41 . Thus, there remains a need for methods that can infer signed, directed networks from feasible perturbation time course experiments that capture dynamics, can uniquely estimate edge properties related to self-regulation and external stimuli, and finally that function in the presence of typical experimental noise levels.
Here we describe a novel, MRA-inspired approach called Dynamic Least-squares MRA (DL-MRA). For an n-node system, n perturbation time courses are required, and thus experimental requirements scale linearly as the network size increases. The approach uses an underlying network model that captures dynamic, directional, and signed networks that include cycles, self-regulation, and external stimulus effects. We test DL-MRA using simulated time-series perturbation data with known network topology under increasing levels of simulated noise. The approach has good accuracy and precision for identifying network structure in randomly generated two and three node networks that contain a wide variety of cycles. For the investigated cases, we find between 7 to 11 evenly distributed time points yielded reasonable results, although we expect this will strongly depend on time point placement. We apply the approach to models describing a cell state switching network 48 , a signal transduction network 49 , and a gene regulatory network 32 . Although signaling networks are often a focus in network biology, our analysis suggests they have unique properties that render them generally recalcitrant to reconstruction. Results from the gene regulatory network application suggest that incomplete perturbation (e.g. partial knockdown vs. knockout) is more informative than complete inhibition. While challenges remain for expanding to other and larger systems, the proposed algorithm robustly infers a wide range of networks with good specificity and sensitivity using feasible time course experiments, all while making progress on limitations of current inference approaches.

RESULTS
Formulation of sufficient experimental data requirements for network reconstruction Consider a 2-node network with four directed, weighted edges (Fig. 1a). An external stimulus may affect each of the two nodes differently and its effect is quantified by S 1,ex and S 2,ex , respectively (e.g. Methods, Eq. (15)). We also allow for basal/constitutive production in each node (S i,b ). Let x i (k) be the activity of node i at time point t k . The network dynamics can be cast as a system of ordinary differential equations (ODEs) as follows dx 1 dt f 1 ðx 1 ðkÞ; x 2 ðkÞ; S 1;ex ; S 1;b Þ f 1 ðkÞ; dx 2 dt f 2 ðx 1 ðkÞ; x 2 ðkÞ; S 2;ex ; S 2;b Þ f 2 ðkÞ: The network edges can be connected to the system dynamics through the Jacobian matrix J 37,38,46 , The network edge weights (F ij 's) describe how the activity of one node affects the dynamics of another node in a causal and direct sense, given the explicitly considered nodes (though not necessarily in a physical sense). In practice, however, causality can only be approached if every component of the system is included in the model, which is not typical (and even more so, there must be no model mismatch, which is almost impossible to guarantee) 6,30,31,50,51 . In MRA, these nodes may be individual species or "modules". In order to simplify a complex network it may often be separated into "modules" comprising smaller networks of inter-connected species with the assumption that each module is generally insulated from other modules except for information transfer through so-called communicating species 37 . Cases where such modules may not be completely isolated are explored elsewhere 52 .
What experimental data are sufficient to uniquely estimate the signed directionality of the network edges and thus infer the causal relationships within the system? Fundamentally, we know that perturbations and/or dynamics are important for inferring causality 6,37,46,51,52 . Consider a simple setup of three time-course experiments that each measure x 1 and x 2 dynamics in response to a stimulus ( Fig. 1b-g). One time course is in the presence of no perturbation (vehicle), one has a perturbation of Node 1, and one has a perturbation of Node 2. Consider further that the perturbations are reasonably specific, such that the perturbation of x 1 has negligible direct effects on x 2 , and vice versa, and that these perturbations may be large. Experimentally, this could be an shRNA or gRNA that is specific to a particular node, or that a small molecule inhibitor is used at low enough dose to predominantly inhibit the targeted node. A well-posed estimation problem can be formulated (see Methods) that, in principle, allows for unique estimation of the Jacobian elements as a function of time with the following set of linear algebra relations: Here, y i,j refers to a measured first-time derivative of node i in the presence of node j perturbation (if used), and Δ to a difference with respect to perturbation (subscript p) or time (subscript t) (see Methods). Since we do not use data from the perturbation of node i for estimation of node i edges, we do not have to impose assumptions on how the perturbation functionally acts on the system dynamics (see Methods). Moreover, constraints on the perturbation strength can be relaxed, following recent recommendations 41 (although accuracy of the underlying Taylor series approximation can affect estimation-see Methods). If these measurements with and without perturbations were each taken in their steady state as is done in MRA, the solution for F ij would be trivial. MRA gets around this by normalizing self-regulatory parameters F ii to -1. Using dynamic data allows unique estimation of self-regulatory parameters without such normalization. Estimation of the node-specific stimulus strengths or basal production rates (S's) requires evaluation after specific functional assumptions, but in general these effects are knowable from the data to be generated (see Methods and below results).
Note that this formulation is generalizable to an n dimensional network. With n 2 unknown parameters in the Jacobian matrix, n equations originate from the vehicle perturbation and n−1 equations originate from each of the n perturbations (discarding equations from Node i with Perturbation i). This results in n þ n Ã ðn À 1Þ ¼ n þ n 2 À n ¼ n 2 independent equations.
Using sufficient simulated data to reconstruct a network As an initial test of the above formulation, we used a simple 2 node, single activator network where Node 1 activates Node 2, one node has first-order degradation (-1 diagonal elements), and the other has negative self-regulation (-0.8 diagonal) (Fig. 1a-see Methods for equations). A stimulus at t = 0 (time-invariant; S ,ex = 1) increases the activity of each node, which we sample with an evenly spaced 11-point time course. This simulation was done for no perturbation (i.e. vehicle) and for each perturbation (Node 1 and Node 2) to generate the necessary simulation data per the theoretical considerations above (Fig. 1c, e, g, left panel).
Here, we modeled perturbations as complete inhibition; for example, a perturbation of Node 1 makes its value 0 at all times. Solving Eqs. (3) and (4) to infer the Jacobian elements at each time point yielded good agreement between the median estimates and the ground truth values (Fig. 1h, "Analytic Solution", No Noise). Using the node activity data corresponding to the last time point in the time course and the median estimates of Jacobian elements, the external stimuli S 1,ex and S 2,ex were also determined a b Cross Activation Self Regulation Self Regulation Fig. 1 Overall DL-MRA approach. a Two-node network with Jacobian elements labeled. Green arrows are stimuli and basal production terms. Time course experimental design with perturbations: vehicle (b), Node 1 (d), Node 2 (f). The vehicle may be the solvent like DMSO for inhibition with a drug, or a nontargeting si/shRNA for inhibition with si/shRNA. Simulated time course data for Vehicle perturbation (c), Node 1 perturbation (e), Node 2 perturbation (g) from the network in a. Left Column: no added noise; Right Column 10:1 signal-to-noise added. Actual versus inferred model parameters (S 1,b , S 1,ex , F 11 , F 12 , S 2,b , S 2,ex , F 21 , F 22 ) for direct solution of Eqs. 3-4 in the absence (h) or presence (i) of noise, or with noise and the least-squares approach (j). In h and i, error bars are standard deviation across time points.
How does this approach fare when data are noisy? We performed the estimation with the same data but with a relatively small amount of simulated noise added (10:1 signalto-noise- Fig. 1c, e, g). The resulting estimates are neither accurate nor precise, varying on a scale more than ten times greater than each parameter's magnitude with median predictions both positive and negative regardless of the ground truth value (Fig. 1i). The stimulus strengths S 1,ex and S 2,ex are estimated to be negative, while the ground truth is positive.
Although the analytic equations suggest the sufficiency of the perturbation time course datasets to uniquely estimate the edge weights, in practice even small measurement noise corrupts estimates obtained from direct solution of these equations. Therefore, we considered an alternative representation by employing a least squares estimation approach rather than solving the linear equations directly. For a given set of guesses for edge weight and stimulus parameters, one can integrate to obtain a solution for the dynamic behavior of the resulting model, which can be directly compared to data in a least-squares sense. Least squares methods were shown to improve traditional MRAbased approaches 39,40 , but had never been formulated for such dynamic problems. Two hurdles were how to model the effect of a perturbation without (i) adding additional parameters to estimate or (ii) requiring strong functional assumptions regarding perturbation action. We solved these here by using the already-available experimental measurements within the context of the leastsquares estimation (see Methods). We applied this approach to the single activator model, 10:1 signal-noise ratio case above where the analytic approach failed. This new estimation approach was able to infer the network structure accurately and precisely (Fig. 1j). We conclude that analytic formulations can be useful for suggesting experimental designs that should be sufficient for obtaining unique estimates for a network reconstruction exercise, but in practice directly applying those equations may not yield precise nor accurate estimates. Alternatively, using a least-squares formulation seems to work well for this application.

Reconstruction of random 2 and 3 node networks
To investigate the robustness of the least-squares estimation approach, we applied it to increasingly complex networks with larger amounts of measurement noise and smaller numbers of time points (Fig. 2). We focused on 2 and 3 node networks. We generated 50 randomized 2 and 3 node models, where each edge weight is randomly sampled from a uniform distribution over the interval [−2, 2], and the basal and external strength from [0, 2] (Fig. 2a, Supplementary Figs. S1a and S2a). Each random network was screened for stability. Many networks (29/50 for 2 node and 3 node) displayed potential for oscillatory behavior (non-zero imaginary parts of the eigenvalues of the Jacobian matrix). However, since the real parts of the eigenvalues are non-zero and negative, these oscillations should dampen over time, and no sustained oscillatory behavior was analyzed. For each random model, we generated a simulated dataset based on the prescribed experimental design, using complete inhibition as the perturbation. We considered evenly-spaced sampling within the time interval of 0-10 AU (approximate time to reach steady-state-Supplementary Figs. S1b and S2b) with different numbers of time points (3, 7, 11 and 21), and added 10:1 signal-to-noise, 5:1 signalto-noise, and 2:1 signal-to-noise to the data. Non-uniform time point spacing may change inference results but that was not explored at these first investigations.
For each random network model, number of time points, and noise level, we evaluated the fidelity of the proposed reconstruction approach in terms of signed directionality (Fig. 2c-f). We overall found reasonable agreement between inferred and ground truth values, even at the higher noise levels and low number of timepoints. Expectedly, the overall classification accuracy increases with more time points and decreases with higher noise levels. But, surprisingly, even in the worst case investigated of 3 timepoints and 2:1 signal-to-noise ratio, classification accuracy was above 85% for 2 node models and 70% for 3 node models. Increasing the number of nodes decreases performance, with 3-node reconstruction being slightly worse than 2-node reconstruction, other factors held constant.
We wondered whether the magnitude of an edge weight influenced its classification accuracy, since small edge weights may be more difficult to discriminate from noise. We found that edge weights with greater absolute values, which are expected to have a greater influence on the networks, were more likely to be classified correctly (Supplementary Figs. S1c-f and S2c-f). Also, for models with damped oscillatory behavior, the classification accuracy is very similar to that of all 50 random models ( Supplementary Fig. S3a, b).
How does this method compare to similar network reconstruction methods? There are limited methods to compare to which also use dynamic data and sequential perturbations. MRA 37 , from which this method was inspired, uses steady-state data. However, we could use MRA methods requiring dynamic perturbation data as is used in our method 46,53 .To compare, we further generated another set of perturbation data with 50% perturbation (as opposed to 100%). We then used the two sets of perturbation data to estimate the network node edges with dynamic modular response analysis (Fig. 2g). Even in absence of noise, for low to medium numbers of timepoints (3-11) the network is not always accurately inferred (Fig. 2g). In the presence of noise, DL-MRA performs better, although the difference between the two methods becomes lower at high number of timepoints. Thus, DL-MRA not only outperforms with half the data, but it also estimates 6 additional parameters-basal production and external stimulus for each node. Although Cho's approach 47 builds upon MRA methods by recommending smaller time point intervals and smaller perturbations, for our purposes, the time intervals and perturbations are fixed and this would not affect the results obtained here. Moreover, further work has actually recommended larger perturbations while dealing with noisy data 41 .
To explore a scenario where data from a node might be unavailable, we removed the data from one of the nodes in the 50 random 3 node models and used the remaining data to reconstruct a 2-node system ( Supplementary Fig. S4). Comparing with corresponding model parameters in the 3 node system, we find a good but expectedly reduced classification accuracy (No Noise-94.75%, 10:1 Signal: Noise-93.75%, 5:1 Signal: Noise-91.25%, 2:1 Signal: Noise-87).
A part of the inference process is performing parameter estimation using multiple starting guesses (i.e. multi-start), and we wanted to determine how robust the estimated parameters were across the multi-start processes. We looked at the distribution of coefficient of variation (CV) among the parameters in the multi-start results in the 50 random 3 node models where either the data generated from the estimated parameters had low sum of squared errors (SSE) compared to the original data (<10 −4 ) or with SSE less than twice the minimum SSE. We find that the CVs peak around zero and generally have a small spread, especially for low noise scenarios ( Supplementary Fig. S5). This implies a good convergence of the parameter sets obtained through multi-start.
We conclude that the network parameters of 2 and 3 node systems can be robustly and uniquely estimated using DL-MRA. However, these were ideal conditions where there was no model mismatch that is expected in specific biological applications. How does DL-MRA perform when applied to data reflective of different biological use cases? D. Sarmah et al.

Application to cell state networks
Cell state transitions are central to multi-cellular organism biology. They are commonly transcriptomic in nature and underlie development and tissue homeostasis and can also play roles in disease, such as drug resistance in cancer 48,[54][55][56][57][58][59][60][61] . Could DL-MRA reconstruct cell state transition networks? As the application, we use previous data on SUM159 cells that transition between luminal, basal and stem-like cells 48 . Pure populations of luminal, basal and stem-like cells eventually grow to a stable final ratio amongst the three. The authors used a discrete time Markov transition probability model to describe the data and estimate a cell state transition network (Fig. 3a). Thus, we seek to compare DL-MRA to such a Markov model in this case.
We hypothesized that perturbations to the system in this case, in contrast to above, did not have to change node activity (i.e. edges). Rather, we thought that perturbing the equilibrium cell state distribution could serve an equivalent purpose. Thus, the data for reconstruction consisted of observing the cell state proportions evolve over time from "pure" populations ( Fig. 3b), in addition to equal proportions. DL-MRA is capable of explaining the data (Fig. 3b). Interpretation of the estimated network parameters to DL-MRA depends on the transformation of the original discrete  Fig. 2 Application to linear two and three node models. a Connections around a Node i in an n-Node Model. S i,b and S i,ex are the basal production and external stimulus terms acting on Node i, respectively. F ii is the self-regulation term; Fij the effect of Node j on Node i and F ji the effect of Node i on Node j. b Example of different signal-to-noise ratio effects on time course data. Ground truth versus estimated edge weights across all 50 random networks and noise levels for data from four different total timepoints (3,7,11,21) for 2 node (c) and 3 node (d) networks. Quadrant shading indicates edge classification. Fraction of network parameters correctly classified in 50 randomly generated 2 node networks (e) and 3 node networks (f) with different noise levels and total timepoints. g Fraction of network parameters correctly classified in 50 randomly generated 3 node networks with dynamic MRA using two sets of perturbation data.

Connections affecting 'i'th Node in a Random n-Node Model
time Markov probabilities to a continuous time formulation (see Methods-there are constraints on self-regulatory parameters), but DL-MRA correctly infers the cell state transition network as well (Fig. 3c). Conveniently, DL-MRA is not constrained to 1-day time point spacing as is the original discrete time Markov model. How does noise and the number of timepoints affect the reconstruction? As above, we generated data for 50 random cell state transition models with 3, 7, 11 and 21 timepoints within 5 days, as the models generally seemed to reach close to equilibrium within 5 days. Noise levels of 10:1, 5:1 and 2:1 were used. All parameters were classified accurately ( Fig. 3d) (although additional constraints in the estimation-see Methods-facilitates this classification performance). With 3 timepoints, there was deviation from perfect fit even with no noise in the data. At 7 and higher number of timepoints, the estimates matched ground truth well, and noise expectedly reduced the accuracy (Fig. 3d). We conclude that DL-MRA can robustly infer cell state networks given perturbation data in the form of nonequilibrium proportions as initial conditions.
Application to intracellular signaling networks How does the method perform for intracellular signaling networks? The Huang-Ferrell model 49 (Fig. 4a) is a well-known  29) and (30)). d Ground truth versus estimated edge weights across 50 random cell transition networks and noise levels for data from four different total timepoints (3,7,11,21). intracellular signaling pathway model and has been investigated by different reconstruction methods, including previous versions of MRA 37 is typical for reconstruction efforts (Fig. 4b) 37,39,41,46,52,62 . Second, to model perturbations, we sequentially perturbed the activation parameters of each of the observable species (k3, k15 and k27 respectively). Such perturbations, although hard to achieve experimentally, are important because modules must be "insulated" from one another and perturbations must be specific to the observables 37,52 . Even specific inhibitors do not have such kinetic specificity. Third, in the simplification of the reaction scheme, the observables are shown to influence each other but in the actual scheme, they conduct their effects through the unphosphorylated and semi-phosphorylated species. We sought to keep the levels of these two species relatively constant between different perturbations, so that they wouldn't add to non-linearities in the estimation. Therefore, we used a stimulus which only activated the observables to a maximum of about 5% of the total forms of the protein 52 . Estimation with DL-MRA under the above conditions fits the data (Fig. 4c) and predicts positive node edges down the reaction cascade (F 21 , F 32 ), negligible direct relation between p-MAPKKK and pp-MAPK (F 13 , F 31 ), negative self-regulation of each of the observables (F 11 , F 22 , F 33 ) negative feedbacks from pp-MAPKK to p-MAPKKK (F 12 ) and from pp-MAPK to pp-MAPKK (F 23 ), and negligible external stimuli on pp-MAPK to pp-MAPKK (F 13 , F 31 ). All these effects are consistent with the reaction scheme. The negative feedback effects, although not immediately obvious, are consistent with ground truth sequestration effects. For instance, pp-MAPK has an overall negative effect on pp-MAPKK as the existence of pp-MAPK lowers the amount of species MAPK and p-MAPK which sequester pp-MAPKK and makes it avoid deactivation by its phosphatase.
How do the estimation results for the Huang-Ferrell model in our method compare with those obtained from other methods? Previous work using MRA also reported negative feedbacks from successive modules to the preceding ones 37,46,52 . Similarly, selfregulation parameters in most preceding MRA based methods are also estimated to be negative but are fixed at -1 37,39,52 .
Besides MRA inspired methods, SELDOM is another network reconstruction method which can also deal with dynamic data 62 . SELDOM is a data-driven method which uses ensembles of logic based dynamic models followed by training and model reduction steps to predict state trajectories under untested conditions. However, when dealing with the Huang-Ferrell network, the true value model of SELDOM does not map the effects of selfregulation, nor feedback effects between nodes (Fig. 4e). This may be explained by the fact that although SELDOM uses an extensive number of models to test the data obtained from multiple different stimuli, perturbation data was not included to test the Huang-Ferrell Model. This implies that systematic perturbation of each of the nodes, as prescribed by MRA-based methods, are necessary in order to unearth feedbacks and self-regulation effects.
Although application of DL-MRA to the Huang-Ferrell model was able to unearth latent network structure, the simulation conditions were restrictive. First, the perturbation scheme chosen in this paper, although specifically targeted at the observable species, is hard to produce experimentally. In practice, knockdown/out, overexpression, or specific inhibitors could be used as suitable perturbations, but do not have the preciseness needed to be compatible with MRA-imposed constraints. The feedback effect observed could depend on the perturbation scheme chosen-for instance knockdown of an entire module as a perturbation would likely have manifested as positive feedback to the preceding module. That is because such a knockdown would have reduced the effect of sequestration of the module on the preceding observable and would have made it more available for dephosphorylation. Second, we assumed a low stimulus to avoid effects from the unphosphorylated version of the proteins. A higher activation may increase non-linearities adding to the complexity of the model, whereas a lower stimulus may not activate enough proteins to be well detected in experiments. The degree of activation needed for an experiment may be hard to predict beforehand. Such specific perturbations and stimulus had to be done to reduce the effects arising from the non-observable species behavior. Hence application of DL-MRA to intracellular signaling networks with multiple physical interactions needs to be carefully considered before modeling or experiments.
Application to gene regulatory networks: partial perturbations are more informative than full perturbations Here, we applied DL-MRA further to a series of well-studied nonlinear feed forward loop (FFL) gene regulatory network models that have time-varying Jacobian elements (Fig. 5a, Table 1) 32,33 . Such FFL motifs are strongly enriched in multiple organisms and are important for signaling functions such as integrative control, persistence detection, and fold-change responsiveness [63][64][65] .
The FFL network has three nodes (x 1 , x 2 , and x 3 ), and the external stimulus acts on x 1 (S 1,ex ). There is no external stimulus on x 2 and x 3 ; however, there may be basal production of x 2 (S 2,b ) and x 3 (S 3,b ) , . Each node exhibits first-order decay (F ii = −1). The parameters F 12 , F 13 , and F 23 represent connections that do not exist in the model; we call these null edges, but we allow them to be estimated. The relationship between x 1 and x 2 (F 21 ), between x 1 and x 3 (F 31 ), or between x 2 and x 3 (F 32 ) can be either activating or inhibitory. Furthermore, x 1 and x 2 can regulate x 3 through an "AND" gate (both needed) or an "OR" gate (either sufficient) (Fig. 5a). These permutations give rise to 16 different FFL structures (Table 1).
To generate simulated experimental data from these models, we first integrated the system of ODEs starting from a zero initial condition to find the steady state in the absence of stimulus. We then introduced the external stimulus and integrated the system of ODEs (see Methods) to generate time series perturbation data consistent with the proposed reconstruction algorithm, using full inhibitory perturbations. We used 11 evenly spaced timepoints for all 16 non-linear models, based on the random 3-node model analysis above, and also added noise as above.
We first noticed that even in the absence of added noise, a surprising number of inferences were incorrect (Fig. 5b, f). Model #1 (Table 1, Fig. 5b, c) is used as an example, where F 21 , F 31 and F 32 are activators with an AND gate, and F 31 is incorrectly predicted as null (Fig. 5b-compare ground truth to 100% inhibition). To understand the reason for the incorrect estimation, we looked at the node activity dynamics across the perturbation time courses (Fig. 5d). All three nodes start from an initial steady state of zero, but Node 3 is zero for all three perturbation cases. This is because of the following. Since x 1 is required for the activation of x 2 and x 3 , complete inhibition of x 1 completely blocks both x 2 and x 3 activation. But, because both x 1 and x 2 are required for the activation of x 3 , completely inhibiting x 2 activity also completely inhibits x 3 . Thus, given this experimental setup, it is impossible to discern if x 1 directly influences x 3 or if it acts solely through x 2 .
We thus reasoned that full inhibitory perturbation may suppress the information necessary to correctly reconstruct the network, but that a partial perturbation experiment may contain enough information available to make a correct estimate. If this were true, then upon applying partial perturbations (we chose 50% here), Node 3 dynamics should show differences across the perturbation time courses. Simulations showed that this is the case (Fig. 5e). Subsequently, we found that for partial perturbation data, F 31 is correctly identified as an activator. More broadly, we obtain perfect classification from noise-free data across all 16 FFL networks when partial perturbation data are used, as opposed to 5/16 networks having discrepancy with full perturbation data (Fig. 5f). The fits to simulated data from the reconstructed model align very closely, despite model mismatch (Supplementary Fig.  S6). We conclude that in these cases of non-linear networks, a partial inhibition is necessary to estimate all the network parameters accurately. Thus, moving forward, we instead applied 50% perturbation to all simulation data and proceeded with least squares estimation.
Application to gene regulatory networks: performance The above analysis prompted us to use a partial (50%) perturbation strategy, since it classified each edge for each model in the absence of noise correctly. What classification performance do we obtain in the presence of varying levels of experimental noise? We first devised the following strategy to assess classification performance. We generated 50 bootstrapped datasets for each network structure/signal-to-noise pair, and thus obtained 50 sets of network parameter estimates. To classify the network parameters, we used a symmetric cutoff of a percentile window around the median of these 50 estimates (Fig. 6a). We illustrate this approach with three different example edges and associated estimates, one being positive (Edge 1), one being negative (Edge 2), and one being null   positives and negatives are likely to be accurate while the null parameters are likely to be incorrectly categorized as either positive or negative (Fig. 6a). If we increase the percentile window span slightly (e.g. between the 40th and 60 th percentile, middle panel), we can categorize null edges better, while maintaining good classification accuracy of both true positive and negative edges. However, if we relax the percentile window too much, (e.g. between the 10th and 90 th percentile, far right panel) we may categorize most parameters as null, including the true positive and negatives. Thus, it is clear there will be an optimal percentile cutoff that maximizes true positives and minimizes false positives as the threshold is shifted from the median to the entire range. Now, we applied this classification strategy to the 16 FFL model estimates from data with different noise levels. We varied the percentile window from the median only (50) to the entire range of estimated values (100) and calculated the true and false positive rates for all edges across all 16 FFL models, which allowed generation of receiver operator characteristic (ROC) curves (Fig. 6b). For each noise level, we chose the percentile window that yielded a 5% false positive rate (13-87 percentile for 10:1 Signal:Noise, 19-81 percentile for 5:1 Signal:Noise, and 21-79 percentile for 2:1 Signal:Noise). Using this simple cutoff classifier, we observed good classification performance across all noise levels according to traditional area under the ROC curve metrics (10:1 AUC = 0.99, 5:1 AUC = 0.9, 2:1 AUC = 0.92).
How does classification accuracy break down by FFL model and edge type? To evaluate the performance for each of the 16 FFL cases, we calculated the fraction of the 12 links in each FFL model that was classified correctly as a function of signal-to-noise, given the percentile windows determined above (Fig. 6c). We also looked at the fraction of the 16 models where each of the 12 links were correctly classified (Fig. 6d). Perfect classification is a value of one, which is the case for no noise, and for many cases with 10:1 signal-to-noise. In general, as noise level increases, prediction accuracy decreases, as expected. Although for some models and parameters, performance at 2:1 signal-to-noise is poor, in some cases it is surprisingly good. This suggests that the proposed method can yield information even in high noise cases; this information is particularly impactful for null, self-regulatory, and stimulus edges. High noise has strong effects on inference of edges that are either distinct across models, time variant or reliant on other node activities (F 21 , F 31 , F 32 ) (Fig. 6c, d, Supplementary Fig. S7). F 21, which is reliant on activity of x 1 , is inferred better than F 31 and F 32 . This may be caused by the fact that x 3 dynamics depend on both x 1 and x 2 , whereas x 2 dynamics only depend on x 1 .
Comparing across models, we find that Models 1-8 are reconstructed slightly better than Models 9-16 (Fig. 6c) when noise is high. This performance gap is predominantly caused by S 3,b misclassification-basal production of Node 3 ( Supplementary  Fig. S7). What is the reason for the possible misclassification of S 3,b in Models 9-16? We know that S 3,b depends on the initial values of x 1 , x 2 and x 3 and the estimated values of F 31 , F 32 and F 33 (See Methods, Eq. (19)). For Models 1-8, x 1 (t = 0) and x 2 (t = 0) are both zero and therefore S 3,b is effectively only dependent on estimated value of F 33 and x 3 (t = 0) ( Supplementary Fig. S6 and Methods). But for Models 9-16, x 2 (t = 0) is non-zero and S 3,b is dependent on the estimated values of both F 32 and F 33 , in addition to x 2 (t = 0) and x 3 (t = 0), which increases the variability of S 3,b estimates. Therefore with high levels of noise, S 3,b is more likely to be misclassified in Models 9-16, whereas this does not happen in Models 1-8 (Fig. 6c, d, Supplementary Fig. S7). In the future, including stimulus and basal production parameters in the least squares estimations themselves, rather than further deriving algebraic relations to estimate them, will likely help improve reliability.
We conclude that (i) when dealing with non-linear gene regulatory networks, complete perturbations such as genetic knockouts may fundamentally impede one's ability to deduce network architecture and (ii) this class of non-linear networks can be reconstructed with reasonable performance using the proposed strategy employing partial perturbations.

DISCUSSION
Despite intensive research focus on network reconstruction, there is still room to improve discrimination between direct and indirect edges (towards causality), particularly when biologicallyubiquitous feedback and feedforward cycles are present that stymie many statistical or correlation-based methods, and given that experimental noise is inevitable. The presented DL-MRA method prescribes a realistic experimental design for inference of signed, directed edges when typical levels of noise are present. It allows estimation of self-regulation edges as well as those for basal production and external stimuli. For 2 and 3 node networks, the method can successfully handle random linear networks, cell state transition networks, and gene regulatory networks, and, under certain limiting conditions, signaling networks. Prediction accuracy improved with more timepoints, which in our case accounted for more relevant dynamic data. However, we would like to stress that here we did not explore time point placement, which likely underlies the performance increase rather than simply number of timepoints. Prediction accuracy was strong in many cases even with simulated noise that exceeds typical experimental variability (2:1 signal-to-noise). The method presented here is quite general and could be applied not only to cell and molecular biology, but also vastly different fields where perturbation time course experiments are possible, and where network structures are important to determine.
One type of non-linear model that we did not investigate is one with sustained oscillations, such as those found in the cell cycle 66 , or sometimes even MAPK signaling pathways [67][68][69] . We found that in our application to general two and three node linear models, DL-MRA could reconstruct multiple networks that have damped oscillatory behavior (Fig. 1b). However, we expect time point measurement selection and frequency to be much more important for inferring networks that give rise to sustained oscillations, with time points comprehensively covering peaks and troughs, and the frequency high enough to do so. We do expect that the method could infer the structure of such networks given appropriate sampling, but this requires a much deeper investigation. MRA and its subsequent methods allow for inference of direct edges by prescribing systematic perturbation of each node 37,39,41,43,45 and the idea of directionality has been followed through in DL-MRA. Often, such edge directness is equated to causality, but this is not necessarily the case, especially when the entire system is not explicitly represented. In practice, the causality and strength of an edge may be dependent on how well the model represents the underlying phenomenon and might be affected by simplification of larger networks, non-linearities in the actual model and even by noise in the data. Secondly, in discussions about causal system inferences, consideration of the counterfactuals is important 30,31,50,51 . For a network of nodes going through dynamics, the counterfactuals to intrinsic network edges causing the dynamics would be the environmental factors extrinsic to the network edges. In DL-MRA, by evaluating external stimuli and basal production as well as the network edges, we have mapped some counterfactuals to node dynamics, thus presenting a more complete map of the causal factors to the network dynamics compared to methods which only show network edges. This also allows for a concise mapping of the environmental contexts in which the network edges are reconstructed.
Application of DL-MRA could reconstruct cell state transition networks based on discrete time Markov transition models, with the added benefit of not being constrained to specific time intervals. It can also successfully handle noisy data. The additional constraints in DL-MRA in the context of cell state transitions (summations of transition rates-see Methods) implies that the underlying network may be estimated even with less data requirements than in other cases. This method can be a useful tool to model cell state transitions and predict cell state. Perturbations were modeled as a difference in initial states, and that worked well in this case, suggesting that such modeling of perturbations may work in other cell state transition or biological networks.
Although application of DL-MRA to an intracellular signaling network (Huang-Ferrell MAPK) was able to explain its ground truth, including feedback due to sequestration, the method was constrained to specific, difficult-to-implement perturbations and a low stimulus which may not always be feasible experimentally. Specific inhibitors could be a source of perturbation, but even they influence more kinetic parameters than was required here for a clean solution. In MRA, a larger reaction scheme is often simplified into modules with one species in the module representing the activity of the module. But often, the activity of the other species in the module is implicit and becomes significant in dictating how perturbations and stimulus affect the network dynamics. Moreover, the type of perturbation chosen, such as specific inhibitors versus knock-down, also may yield different network inference results. Therefore, the use of MRA methods on simplified large intracellular signaling networks, especially while dealing with experiments, have significant caveats that should be carefully considered 41,70 .
Although complete inhibition is often used for perturbation studies of gene regulatory networks (e.g. CRISPR-mediated gene knockout), we found that partial inhibition is important to fully reconstruct the considered non-linear gene regulatory networks. It is important to distinguish here, however, small perturbations vs. partial perturbations. Small perturbations are formally recommended for both MRA and other techniques 70 where the effects of noise are not extensively explored. In practice however, there is a tradeoff between perturbation strength and feasibility, since the effects of small perturbations are masked by noise 41 . Partial perturbations, as considered in this work (~50%) are much larger than what are typically considered small perturbations. The theoretical formulation of DL-MRA reduces the impact of not having small perturbations, because perturbation data from a particular node is not used for inference of edges connected to that node. Yet, DL-MRA still uses linearizations of the Jacobian which are are always subject to greater inaccuracy the further away from reference points such perturbations take the system. Since many biological networks share the same types of nonlinear features contained within the considered FFL models, this is not likely to be the only case when partial inhibition will be important. We are thus inclined to speculate that large partial perturbations may be a generally important experimental design criterion moving forward. Partial inhibition is often "built-in" to certain assay types, such as si/shRNA or pharmacological inhibition that are titratable to a certain extent.
One major remaining challenge is scaling to larger networks. Here, we limited our analysis to 2 and 3 node networks. Conveniently, the number of necessary perturbation time courses needed grows linearly (as opposed to exponentially) with the number of considered nodes. Furthermore, as long as system-wide or omics-scale assays are available, the experimental workload also grows linearly. This is routine for transcriptome analyses 71 , and is becoming even more commonplace for proteomic assays (e.g. mass cytometry 72 , cyclic immunofluorescence, mass spectrometry 73 , RPPA 74 . Thus, the method is arguably experimentally scalable to larger networks.
However, the computational scaling past 2 and 3 node models remains to be determined and is likely to require different approaches for parameter estimation. Increasing the network size will quadratically increase the number of unknown parameters, which will significantly increase the computational requirements for obtaining robust solutions. Yet, recent work has shown that large estimation problems in ODE models may be broken into several smaller problems 75 , which may be applicable here, and is likely to yield large computational speed up by allowing parallelization of much smaller tasks. However, theory on how to merge potentially discrepant results between independently estimated overlapping subnetworks would need to be derived. Importantly, we saw in the linear 2 and 3 node model examples that the impact of experimental noise was larger for 3 node models, implying that increasing the number of nodes past 3 will further increase the impact of experimental noise. Another synergistic avenue could be imposing prior knowledge to improve initial parameter guesses and even reduce the parametric space, such as in Bayesian Modular Response Analysis 45 , or with functional database information 76 . Such prior knowledge could also help inform emergent network properties as network size grows, such as degree distributions for scale-free networks 2 . Here, we only investigated dense subnetworks, so sparseness patterns and judicious allocation of non-zero Jacobian elements could also have great impact on estimation for large networks. Overall, application to larger networks is of great interest but these nontrivial computational roadblocks must be solved first.
In conclusion, the proposed approach to network reconstruction is systematic and feasible, robustly operating in the presence of experimental noise and accepting data from large perturbations. It addresses important features of biological networks that current methods struggle to account for: causality/directionality/ sign, cycles (including self-regulation), dynamic behavior and environmental stimuli. It does so while leveraging dynamic data of the network and only requires one perturbation per node for completeness. We expect this approach to be broadly useful not only for reconstruction of biological networks, but to enable using such networks to build more predictive models of disease and response to treatment, and more broadly, to other fields where such networks are important for system behavior.

Deriving sufficiency conditions for unique estimation of Jacobian elements
The first-order partial derivatives comprising J (Eq. (2)) can be approximated by a first-order Taylor series expansion of Eq. (1) about a time point k Equations (5) and (6) may be written more succinctly as where y i ðk þ 1Þ f i ðk þ 1Þ À f i ðkÞ; Δ t x i ðk þ 1Þ x i ðk þ 1Þ À x i ðkÞ: The approximation in Eq. (7) becomes more accurate as more time points are measured. Also, the edge weights are potentially time-dependent, although this is rarely considered when describing biological networks.
How do we estimate the edge weights F in Eq. (7) and thus reconstruct the network? Time series data can inform x i 's and f i 's as a function of time, following application of a stimulus. Given such stimulus-response data, however, for each time point there are only two equations for four unknowns, an underdetermined system for which more data are needed.
Consider now stimulus-response time course data in the presence of single perturbations. Let pi be a variable that reflects the strength and/or presence of different potential perturbations: p 1 represents perturbation of x 1 and p 2 represents perturbation of x 2 . If p j is not explicitly written, its value is zero and/or it has no effect. Now, the ODEs become a function of the perturbation variables where y i;j ðkÞ f i;j ðkÞ À f i ðkÞ; Δ p;j x i ðkÞ x i ðk; p j Þ À x i ðkÞ Here, we have expanded with respect to the perturbation, rather than with respect to time as previously. However, since the reference point is the same, the Jacobian elements remain identical in these equations. It is also interesting to note that the Jacobian elements, or network, may be affected by the perturbation, but we do not necessarily have to know those effects mathematically, since the reference point is the same. Now we have six potential equations with which to estimate the four Jacobian elements. However, we must make some determination as to how the perturbations p 1 and p 2 directly affect Node 1 and Node 2 dynamics f 1 and f 2 to account for the perturbation variable partial derivatives.
By design, the Node 1 perturbation has significant direct effects on Node 1 dynamics, and similarly for the Node 2 perturbation on Node 2 dynamics. Using equations including ∂f 1 =∂p 1 and ∂f 2 =∂p 2 require precise definition of perturbation strength and their effects on dynamics, which could be difficult to determine experimentally and implement in simulations. Therefore, we do not employ equations involving such terms. On the other hand, if the Node 1 perturbation has negligible direct effect on Node 2 dynamics, that is, the effects on Node 2 dynamics are through the network (i.e. p 1 ) and not explicit in f 2 ), and similarly the Node 2 perturbation has negligible direct effect on Node 1 dynamics, then ∂f 2 =∂p 1 and ∂f 1 =∂p 2 are approximately zero. This mild condition is often the case experimentally. The only determining factors for the suitability of the Taylor series truncation are the spacing of time points and the accuracy of the expansion about the perturbation difference. From this, the main set of linear equations presented in Eqs. (3) and (4) are obtained.

General estimation model equations
We employ the following general model for a two-node network: - Here, S 1 and S 2 are the stimuli strengths on Node 1 and Node 2 respectively, and F 11 , F 12 , F 21 and F 22 are the network edge weights (Fig. 1a). In many systems, there may be a basal or constitutive production driving the node activities, besides an external stimulus. For these cases, the Stimulus term (S i ), may be considered as an addition of these two effects-the basal production term (S i,b ) and the external stimulus (S i,ex ). Then the two-node model can be represented by the following equations- Or more generally, where n is the total number of nodes. When a steady state exists, the dx i /dt terms become zero and it becomes easy to represent the stimulus terms as a function of the node activities (x i ) and network edges (F ij ).
This is helpful to understand that the perturbation time course data also generally constrains not only the edge weights, but also the stimulus terms. For a system at a steady state without an external stimulus, for example at t = 0: The two-node single activator model The two-node single activator model (Fig. 1a, Supplementary Fig.  S1a) is described by Here, S 1,ex = 1, F 11 = −1, F 12 = 0, S 2,ex = 1, F 21 = 1.5, F 22 = −0.8. The basal production terms are both zero, for simplicity, and the initial conditions for x 1 (t = 0) and x 2 (t = 0) are zero. The stimulus terms S i,ex are calculated through Eq. (18), using the median values of F ij and the x i (t = 10), when the system reaches near steady state.
Random two-node and three-node models The random 2 node network is described by Values for S 1,b , S 2,b , S 1,ex and S 2,ex are sampled from a uniform distribution over the range [0,2] and values for F 11 , F 12 , F 21 , and F 22 are sampled from a uniform distribution over the range [−2,2] using the MATLAB function rand. To capture basal activity, we use a two-step approach. First, starting from node activity values of zero, without the external stimulus on Node 1 and Node 2 (S 1,ex = S 2,ex = 0 in Eq. (22)) we simulate until the network reaches steady-state with just basal production driving the network behavior. Then, we introduce the external stimulus on Node 1 and Node 2, integrate the ODEs, and sample evenly spaced timepoints using ode15s in MATLAB with default settings. We sample 3,7, 11, and 21 evenly spaced time points across a time course, from 0 to 10 arbitrary time units in all the cases.
The random 3 node networks use the same sampling rules as the 2 node networks with the following equations.

Intracellular signaling networks
In the simplification of the Huang-Ferrell network to three nodes, p-MAPKKK, pp-MAPKK and pp-MAPK were taken as nodes. Since, in absence of external stimuli, the basal values of the nodes are zero, the basal production was estimated as zero beforehand and not considered in the estimation of the rest of the network. Aside from the basal production edges, a full 3 node network ( Fig. 4b) was estimated from the simulation data of each of the observables. After estimation, parameters with values less than 1/100th of the largest parameter, were considered negligible.

Cell state transition models
The cell transition model 48 is a discrete time Markov probability model. Here, we show how this form is related to the ODE model used in DL-MRA. Starting at any initial value, each next step representing a time difference of one day follows from the previous time point as follows- Where M ij denotes the Markov transition probabilities of species j into species i. In matrix form it may be represented as follows- Representing the Markov parameter matrix as M and the species relative concentration variables as vector X, the equation becomes The Markov transition probabilities for a species must add up to 1. In experimental terms, a species can either transition to other species or stay the same and the sum of all those probabilities is 1. X i¼1:3 As a first step in relating these equations to the ODE form underlying DL-MRA, we put the variables in terms in terms of Δx (with respect to time), Where M' is M-I, and I is the identity matrix. M' is M, except that 1 is subtracted from all its diagonal elements. Hence Eq. (26) for M' becomes X i¼1:3 This also implies that the diagonal term for M' is negative of the sum of the other two terms in the same column. In experimental terms, the amount of reduction of a species is equal to how much it got converted to other species.
The above equations apply for the cases where Δt is 1. We can incorporate arbitrary time steps as Where Δt is the scalar value of time difference and M' Δt is the matrix of the set of parameters, specific to the time difference chosen. For a case where Δt tends to 0, the equation becomeslim Δt!0 Where M' dt is the matrix of the set of parameters specific to the case where Δt is infinitesimally small. Note that Eq. (33) is similar in form to Eq. (22), only without the extra stimulus terms and where M' dt is equivalent to the Jacobian matrix F with terms F ij . There would be an added constraint that the sum of the terms in the same column would add up to zero, or that the diagonal term is the negative of the sum of the other two terms in the same column.

Non-linear models
The non-linear feedforward loop models 32 are described by: When an AND gate is present When an OR gate is present For a given u, v ϵ {x 1 , x 2 , x 3 } and K, K u , K v ϵ {K x1x2 , K x1x3 , K x2x3 }: If u activates its target, then: If u represses its target, then: Effectively, an external stimulus of 'S 1,ex = 1', acts on Node 1 at t = 0 and is propagated through the network. There is no external stimulus acting on Node 2 and Node 3. However, in many cases there is basal production in one or both of Node 2 and Node 3. This leads to a non-zero steady-state of the network before the external stimulus is introduced.
To capture basal activity, we use a two-step approach. First, starting from node activity values of zero, without the external stimulus on Node 1 (S 1,ex = 0), we simulate until the network reaches steady-state. Then, we introduce the external stimulus on Node 1, integrate the ODEs, and sample 11 evenly spaced timepoints using ode15s in MATLAB with default settings and steadystate node values without the external stimulus as the initial conditions. We chose 11 timepoints because it yields good classification accuracy for the above random 3 node model even in presence of noisy data. For each of the 16 non-linear models, the values of the parameters (K, Ku, Kv), were varied and chosen so that the resulting node activity data are responsive to the stimulus and perturbations (Supplementary Fig. S6, See Supplementary Code for values).

Modeling perturbations
Precisely modeling perturbations can be a challenge, since experimentally, there may be several ways of causing a perturbation with different mechanisms such as siRNAs, competitive/noncompetitive/uncompetitive inhibition, etc. It may be hard to quantify how much a perturbation is affecting a node, in terms of its dynamics (i.e. right-hand sides of the ODEs). Therefore, we employ the following approaches which circumvent the need to model how each perturbation mechanistically manifests in the ODEs during parameter estimation. There are two cases to consider: (i) when we have a perturbation of node i and we need to simulate node i dynamics; (ii) when we have a perturbation of node i and we need to simulate other node j dynamics. To illustrate the approach, we use the above-described 2 node model with an example of a Node 1 perturbation. Recall that For case (i), we have to obtain values for x 1 under perturbation of Node 1. We refer to the perturbed time-course as x 1,1 . In experimental situations, x 1,1 would be measured directly. To obtain simulation data for x 1,1 we use the following: where x 1 is obtained from the simulations without perturbations, and recall that k refers to time point k. For a 50% inhibition, p = 0.5 and for a complete inhibition, p = 0. For case (ii), we have to obtain the values for x 2 under perturbation of Node 1, which we refer to as x 2,1 . To do this, we have to integrate the ODE for dx 2 /dt, but using x 1,1 values, as follows dx 2;1 dt ¼ S 2;b þ S 2;ex þ F 21 x 1;1 þ F 22 x 2;1 (43) Here, x 2 has been replaced with x 2,1 to represent x 2 under perturbation of Node 1, for clarity. To solve this equation, we simply use the "measured" x 1,1 time course directly in the ODE.
When data are generated by simulations, there is little practical limit to temporal resolution, but with real data, to solve Eq. (43) one may need values for x 1,1 at multiple time points where measurements are not available, depending on the solver being used. We therefore fit x 1,1 data to a polynomial using polyfit in MATLAB, and use the polynomial to interpolate given a required time point. In this work, we have used an order of 5 to fit the data as well as avoid overfitting, but the functional form is quite malleable so long as it captures the data trends.
For modeling perturbations of the cell transition model, the initial value of the simulated data for the perturbed node was taken as zero during simulation. The estimation was performed in a similar way as a random 3 node network as described above.
For modeling perturbations for the Huang-Ferrell model, the parameters k3, k15 and k27 were sequentially set as zero. The estimation was performed in a similar way as a random 3 node network as described above.

Simulated noise
Normally distributed white (zero mean) noise is added to simulated time courses point-wise with where x is the simulation data point, y is the noisy data point, and d represents the noise level. Signal-to-noise ratio of 10:1, 5:1 and 2:1 are, respectively d = 0.1, 0.2, and 0.5. Normally distributed samples are obtained using randn in MATLAB. While there are many different distributional options for modeling noise, we chose this for simplicity and to capture the effects generically of noisier data. We do not intend to answer questions related to whether specific distributional assumptions about the form of the noise have significant impact of the methods performance.

Parameter estimation
For the two-node model, the entire network, with and without perturbations, can be explained by the following system of equations where x 1,1 and x 2,2 are the perturbed node values, from either simulated or experimental data. Eight parameters (S 1,b , S 1,ex , F 11 , F 12 , S 2,b , S 2,ex , F 21 , F 22 ) need to be estimated to fully reconstruct this network. We seek a set of parameters that minimizes deviation between simulated and measured dynamics.
For an initial guess, the node edge parameters (F ij ) are randomly sampled from a uniform distribution over the interval [−2,2] and the stimulus parameters (S i,ex ) are sampled from a uniform distribution over the interval [0,2]. Using data at t = 0, which corresponds to a steady-state without S i,ex , the S i,b can be estimated during each iteration of the estimation as follows-S 1;b ¼ ÀðF 11 x 1 ðt ¼ 0Þ þF 12 x 2 ðt ¼ 0ÞÞ S 2;b ¼ ÀðF 21 x 1 ðt ¼ 0Þ þF 22 x 2 ðt ¼ 0ÞÞ (46) For an n-node model, this equation can be scaled accordingly to obtain eachŜ i,b .
For these initial guesses we compute the activity data using Eq. (45). The perturbation data x k,k is used in the perturbation equations as detailed above (Eq. (43)). Letx i , andx i,j denote the predicted node activity values for non-perturbed and perturbed cases respectively. For a total of n nodes and N t timepoints, the objective function is the sum of squared errors Φ X j≠i x i;j ðkÞ Àx i;j ðkÞ À Á 2 " # Note here that we do not use data from node j, when perturbation j was used (per the derivation). The MATLAB function fmincon is used to minimize Φ by changing edge weights and stimulus terms within the range [−10,10].
We employ "multi-start" by running the estimation 10 times, starting from different randomly generated initial starting points 77 . The estimated parameters and their respective final sum of squared errors (Φ) are saved and the estimated parameter set corresponding to the minimum Φ is chosen as the final parameter set. Variability of parameter estimates across multi-start runs is explored in Supplementary Fig. S5.

Parameter estimation for non-linear models
For estimating the Non-Linear models, we start with a prior knowledge that S 1,b is always zero and S 2,ex and S 3,ex are always zero as well, which is directly evident from x 1 initial conditions and x 2 , x 3 stimulus response in the presence of a complete Node 1 perturbation. The equations for the non-perturbation case become as follows Since the system is at steady-state before the external stimulus, the basal production parameter can be estimated during each iteration of the estimation as-S 2;b ¼ ÀðF 21 x 1 ðt ¼ 0Þ þF 22 x 2 ðt ¼ 0Þ þF 23 x 3 ðt ¼ 0ÞÞ whereF i,j are the current model parameter estimates and x i (t = 0) are the x values at the initial system steady state before the induction of external stimulus.
Bootstrapping simulated data for the FFL model cases To generate multiple parameter set estimates to classify edge weights for the FFL model cases, we employ a bootstrapping approach. In an experimental scenario, each data point will have a mean and a standard deviation, and upon a distributional assumption (e.g. normal), one can then resample datasets to obtain measures of estimation uncertainty. We use the simulated data as the mean, and then vary the standard deviation as described above to generate 50 bootstrapped datasets for each of the 16 considered models. Estimation is carried out for each of the 50 datasets using multi-start, which yields 50 best-fitting parameter sets for each model. Uncertainty analysis and classification error is based on these sets.

DATA AVAILABILITY
All relevant simulated data used in the paper are provided and can be accessed along with the code at https://doi.org/10.5281/zenodo.6516238. Any other relevant data can be obtained from the authors.