## Summary

Quantitative, directional network structure inference remains challenging even for small systems, particularly when loops and cycles are present. We report a method that robustly infers direct, signed connections between network nodes from noisy, sparse perturbation time course data requiring only one perturbation per node. We find good sensitivity and specificity for classification, as well as quantitative agreement in randomized 2- and 3-node systems having varied and complex dynamics. Experimental application of the method to the ERK and AKT pathways, widely important in mammalian signaling, reveals evidence of bi-directional cross-talk coupled with strong negative feedback on both pathways, consistent with prior knowledge. Systematic application of this method can reduce important subnetwork structural uncertainty, enabling better prediction of dynamics, response to perturbations such as drugs, and understanding of biological networks. The method is general and can be applied to any network inference problem where perturbation time course experiments are possible.

## Introduction

Networks underlie much cellular and biological behavior, including transcriptional, protein-protein interaction, signaling, metabolic, cell-cell, ecological, and social networks, among many others. As such, identifying and then representing their structure has been a focus of research for decades. Some of this work is purely experimental, but most is computational, using a variety of statistical methodologies that integrate new data with prior knowledge from interaction databases and experimental data (Angulo et al., 2017; Barabási and Albert, 1999; Califano et al., 2012; Calvano et al., 2005; Hein et al., 2015; Hill et al., 2017, 2017; Ideker et al., 2001, 2002; Liu et al., 2013; Ma’ayan et al., 2005; Margolin et al., 2006; Mazloom et al., 2011; Mehla et al., 2015; Molinelli et al., 2013; Pe’er et al., 2001; Pósfai et al., 2013; Shannon et al., 2003; Stein et al., 2015; Wynn et al., 2018). Network structure can be represented as either undirected or directed graphs with edges between nodes, and the edges can be weighted or not. The edges in undirected graphs are often non-causal, and even directed graphs may have edges that are purely correlational, rather than causal. A complete and predictive understanding of the biological networks that dictate cellular behavior often requires information relating to causality, which is not typically guaranteed by commonly employed correlational and Bayesian methods. This is particularly the case for networks with cycles such as feedback or feedforward loops, which are nearly ubiquitous in biological systems. Moreover, dynamic and quantitative behavior of a network is important for prescribing biological function (e.g. circadian or p53 oscillators (Bell-Pedersen et al., 2005; Stewart-Ornstein et al., 2017)). To describe and predict such behavior, networks must often also be weighted. Identifying such networks largely requires perturbation time course data to gain information related to causality, but how to uniquely identify such networks and what data are needed to do so is not as well explored.

Modular Response Analysis (MRA) approaches, pioneered by Kholodenko and colleagues in 2002 (Kholodenko et al., 2002; Santra et al., 2018), inherently deal with cycles and causality by prescribing systematic perturbation experiments. The original instantiations struggled with noise, but total least squares MRA and Monte Carlo sampling helped to improve performance (Andrec et al., 2005; Santos et al., 2007). Incomplete and prior knowledge can be handled as well using both maximum likelihood and Bayesian approaches (Halasz et al., 2016; Klinger et al., 2013; Santra et al., 2013). However, most of these approaches are based on steady-state or fixed time point data, limiting their ability to deal with dynamic systems and infer causality. One MRA variant, dynamic Modular Response Analysis (DMRA) (Sontag et al., 2004), uses time-series perturbation data to uniquely infer a weighted, directed network that in principle can deal with cycles, and can also predict time-dependent network behavior. However, two separate perturbation experiments are reported to be required for each node, which can be experimentally challenging both in terms of scale and finding suitable distinct perturbations for a node (e.g. both production and degradation of a transcript, or phosphorylation and dephosphorylation of a protein). Moreover, as is often the case, noise in the experimental data can severely limit the inference accuracy. These drawbacks have largely precluded any widespread application of DMRA. Thus, there remains a need for inference methods that can reconstruct weighted, directed networks from realistic perturbation time course experiments and that function in the presence of typical noise levels.

Here we describe a novel network inference approach that builds on MRA and DMRA, and that addresses these challenges of experimental noise, sparse time courses, and correctly inferring cycles. We test our approach on a series of simulated time-series perturbation data with known network topology under increasing levels of simulated noise. The approach has good accuracy and precision for identifying network structure in random two and three node networks that contain a wide variety of cycles. We also demonstrate the approach works well for a family of 16 non-linear feedforward loop network models. We then apply our approach to experimental data to infer pathway interaction strengths between ERK and AKT signaling from ELISA-based perturbation time-course experiments, revealing both previously known and new network connections. While challenges remain for expanding to larger systems with more parameters and dynamic network connection strengths, the proposed algorithm accurately reconstructs networks with feasible time course experiments.

## Results

### Formulation

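Consider a two-node network with four directed, weighted edges (Fig. 1a), where a stimulus induces measurable dynamic changes in the nodes (Fig. 1b). Let *x*_{i}(*k*) be the activity of node *i* at time point *t*_{k}. The network dynamics can be cast as a system of ordinary differential equations (ODEs)

$$\frac{dx_1}{dt} = f_1(x_1, x_2), \qquad \frac{dx_2}{dt} = f_2(x_1, x_2) \tag{1}$$

The network edge weights can be connected to the system dynamics through the so-called Jacobian matrix (Santra et al., 2018; Sontag et al., 2004),

$$\mathbf{J} = \begin{pmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 \end{pmatrix} = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix} \tag{2}$$

Thus, the network edge weights (*F*_{ij}’s) describe how the activity of one node affects the dynamics of another node in a causal and direct sense (though not necessarily in a physical sense).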

Consider three time course experiments that each measure *x*_{1} and *x*_{2} dynamics in response to a common stimulus (Fig. 1c,d): one with no perturbation (vehicle), one with a perturbation of *x*_{1}, and one with a perturbation of *x*_{2}. Consider further that the perturbations are reasonably specific, such that the perturbation of *x*_{1} has small direct effects on *x*_{2}, and vice versa. Experimentally, this would assume that an shRNA or gRNA is specific to a particular considered node, or that a kinase inhibitor is used at low enough dose to predominantly inhibit the targeted node. A well-posed estimation problem can be formulated (see Methods) that, in principle, allows for unique estimation of the Jacobian elements as a function of time from such data, using a set of linear algebra relations (Eqs. 3-4).

In these relations, *y*_{i} refers to a measured first time derivative of node *i*, superscripts refer to a particular perturbation, and Δ to a difference with respect to perturbation (subscript *p*) or time (subscript *t*) (see Methods for details). Importantly, this formulation is generalizable to any *n*-dimensional network. With *n*^{2} unknown parameters in the Jacobian matrix, *n* equations originate from the vehicle perturbation and *n* − 1 equations originate from each of the *n* perturbations (following the above requirement, we discard equations from node *i* with perturbation *i*). This results in *n* + *n*(*n* − 1) = *n* + *n*^{2} − *n* = *n*^{2} independent equations.
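For the two-node case, the linear relations can be sketched as follows. This is a reconstruction from the definitions in this section and the derivation in Methods, not a verbatim copy of the typeset equations; the placement of the time index *k* and the perturbation superscripts is an assumed notation.

```latex
% Differences (assumed notation):
%   \Delta_t x_{i,k}       = x_i(k+1) - x_i(k)        (vehicle, time difference)
%   \Delta_p x_{i,k}^{(j)} = x_i^{(j)}(k) - x_i(k)    (perturbation j vs. vehicle)
% and likewise for the measured time derivatives y_i.

% Node 1: vehicle time difference plus the perturbation of node 2
\[
\begin{pmatrix} \Delta_t y_{1,k} \\ \Delta_p y_{1,k}^{(2)} \end{pmatrix}
\approx
\begin{pmatrix}
  \Delta_t x_{1,k}       & \Delta_t x_{2,k} \\
  \Delta_p x_{1,k}^{(2)} & \Delta_p x_{2,k}^{(2)}
\end{pmatrix}
\begin{pmatrix} F_{11,k} \\ F_{12,k} \end{pmatrix}
\]

% Node 2: vehicle time difference plus the perturbation of node 1
\[
\begin{pmatrix} \Delta_t y_{2,k} \\ \Delta_p y_{2,k}^{(1)} \end{pmatrix}
\approx
\begin{pmatrix}
  \Delta_t x_{1,k}       & \Delta_t x_{2,k} \\
  \Delta_p x_{1,k}^{(1)} & \Delta_p x_{2,k}^{(1)}
\end{pmatrix}
\begin{pmatrix} F_{21,k} \\ F_{22,k} \end{pmatrix}
\]
```

Each node contributes one 2×2 system per time point: two unknowns (its row of the Jacobian) and two equations, one from the vehicle time course and one from the time course in which the other node was perturbed.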

### Directly Solving for Edge Weights

As an initial test of the above formulation, we used a simple two-node, single activator network where Node x_{1} activates Node x_{2}, and both nodes have first-order degradation (−1 diagonal elements), all with time-independent Jacobian elements (Fig. 2a; see Methods for equations). Following a stimulus that increases the activity of each node, we simulated an 11-point time course experiment (chosen for experimental feasibility, despite being on the high side of time points typically gathered). This procedure was done for the vehicle control and for each perturbation case (Node x_{1} and Node x_{2}) to generate the necessary simulated data per the theoretical considerations above (Fig. 2b). Here, we modeled perturbations as complete inhibition (although incomplete perturbations are explored later). After solving Eqs. 3-4 to infer the Jacobian elements at each time point, we saw excellent agreement between the median estimates and the ground truth values (Fig. 2c, “Analytic Solution”).
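The direct solve described above can be illustrated with a minimal Python sketch. Several choices here are assumptions for illustration and are not taken from the original analysis (which used MATLAB): the model is taken to be linear, dx/dt = S + F·x, with hypothetical stimulus strengths `S`; integration uses a fixed-step RK4 scheme; and the "measured" derivative *y* is evaluated from the model rather than estimated from data. All function names are hypothetical.

```python
def rhs(x, F, S, inhibited=None):
    """Right-hand side of the assumed linear model dx/dt = S + F x."""
    n = len(x)
    dx = [S[i] + sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]
    if inhibited is not None:
        dx[inhibited] = 0.0  # complete inhibition: scale that node's ODE by zero
    return dx


def simulate(F, S, inhibited=None, t_end=5.0, n_points=11, substeps=100):
    """RK4 integration from zero initial conditions.

    Returns [(x, y), ...] at n_points evenly spaced times, where y is
    dx/dt evaluated at x.
    """
    x = [0.0] * len(S)
    h = t_end / ((n_points - 1) * substeps)
    samples = [(x[:], rhs(x, F, S, inhibited))]
    for _ in range(n_points - 1):
        for _ in range(substeps):
            k1 = rhs(x, F, S, inhibited)
            k2 = rhs([a + 0.5 * h * b for a, b in zip(x, k1)], F, S, inhibited)
            k3 = rhs([a + 0.5 * h * b for a, b in zip(x, k2)], F, S, inhibited)
            k4 = rhs([a + h * b for a, b in zip(x, k3)], F, S, inhibited)
            x = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
                 for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
        samples.append((x[:], rhs(x, F, S, inhibited)))
    return samples


def solve_row(i, k, vehicle, perturbed_other):
    """Estimate (F_i1, F_i2) at time index k from two equations:
    a time difference in the vehicle data, and a perturbation difference
    from the time course where the *other* node was perturbed."""
    (x_ref, y_ref), (x_next, y_next) = vehicle[k], vehicle[k + 1]
    x_pert, y_pert = perturbed_other[k]
    a = [[x_next[0] - x_ref[0], x_next[1] - x_ref[1]],
         [x_pert[0] - x_ref[0], x_pert[1] - x_ref[1]]]
    b = [y_next[i] - y_ref[i], y_pert[i] - y_ref[i]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # Cramer's rule for the 2x2 system a @ [F_i1, F_i2] = b
    return ((b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det)


F_true = [[-1.0, 0.0], [1.0, -1.0]]  # node 1 activates node 2; first-order decay
S = [1.0, 1.0]                       # hypothetical stimulus strengths
vehicle = simulate(F_true, S)
pert = [simulate(F_true, S, inhibited=0), simulate(F_true, S, inhibited=1)]
# Skip k = 0, where all activities are zero and the system is singular.
estimates = [(solve_row(0, k, vehicle, pert[1]), solve_row(1, k, vehicle, pert[0]))
             for k in range(1, 10)]
```

Because the assumed model is linear and the derivatives are exact, the estimates match the ground truth at every usable time point. With real data, *y* must be estimated from noisy measurements, and the differences in the system matrix become small and error-prone.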

How does this approach fare when data are noisy? We added a relatively small amount of simulated noise (10:1 signal-to-noise, typical for high precision experimental data) to the time course data and reapplied the inference algorithm. The resulting parameter predictions over the simulated time course became considerably less accurate, varying on a scale more than ten times greater than each parameter’s magnitude with predictions both positive and negative regardless of the ground truth value (Fig. 2d).

### Least Squares Estimation Leveraging Data Across Time Points

Although the aforementioned theoretical approach suggests the sufficiency of the perturbation time course datasets to uniquely estimate the edge weights, in practice even small measurement noise corrupts such estimates. Therefore, we considered an alternative representation by employing a least squares estimation approach to predict edge weights (Andrec et al., 2005), rather than solve the linear equations directly. For a given set of edge weight parameters, one can integrate Eq. 1 to obtain a solution to the dynamic behavior of the resulting model, which can be directly compared to data. Thus, our problem reduces to finding a set of time-invariant parameters *F*_{ij} that best fit simulation to data (see Methods). We applied this new approach to the single activator model in the 10:1 signal-to-noise case above, where the analytic approach failed. This new estimation approach was able to infer the network structure accurately and precisely (Fig. 2e).

### Random Two and Three Node Networks

To investigate the robustness of the least squares estimation approach, we tested it on increasingly complex networks with larger amounts of measurement noise. We generated 50 randomized two-node models where each edge weight is randomly sampled from a uniform distribution over the interval [-2,2] (Fig. 3a and Fig. S1, left). Each random network was checked for stability (see Methods). Many (21/50) displayed potential for oscillatory behavior (non-zero imaginary parts of eigenvalues), and multiple networks with positive feedback were present (16/50). For each random two-node model, we generated a simulated dataset based on the prescribed time series perturbation experimental design, using complete inhibition as the perturbation. The stimulus strength for each node, which we also attempted to estimate, was randomly sampled from a uniform distribution over the interval [0,2]. We added noise to the data at 10:1, 5:1, and 2:1 signal-to-noise (see Fig. 3b for an example).
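The sampling and stability screen can be sketched in Python as follows. The text does not say how unstable draws were handled, so the rejection-sampling loop here is an assumption, as are the function names and the seed. For a 2×2 Jacobian the eigenvalue conditions reduce to closed-form tests on the trace and determinant.

```python
import random


def sample_two_node_network(rng):
    """Draw edge weights F_ij ~ U[-2, 2] and stimulus terms F_i0 ~ U[0, 2]."""
    F = [[rng.uniform(-2.0, 2.0) for _ in range(2)] for _ in range(2)]
    stimulus = [rng.uniform(0.0, 2.0) for _ in range(2)]
    return F, stimulus


def classify_dynamics(F):
    """Return (is_stable, can_oscillate) for a 2x2 Jacobian.

    The eigenvalues solve lambda^2 - trace*lambda + det = 0, so both have
    negative real parts exactly when trace < 0 and det > 0, and they are
    complex (non-zero imaginary parts) when trace^2 - 4*det < 0.
    """
    trace = F[0][0] + F[1][1]
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    return (trace < 0 and det > 0), (trace * trace - 4.0 * det < 0)


def sample_stable_networks(n_models=50, seed=0):
    """Assumed procedure: redraw until n_models stable networks are collected."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        F, stimulus = sample_two_node_network(rng)
        if classify_dynamics(F)[0]:
            models.append((F, stimulus))
    return models


models = sample_stable_networks()
n_oscillatory = sum(1 for F, _ in models if classify_dynamics(F)[1])
```

For three-node networks the same screen would need a general eigenvalue routine, since the trace and determinant shortcut only applies to the 2×2 case.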

For each randomized model and noise level, we applied least squares estimation and found reasonable agreement between inferred and ground truth values, even at the higher noise levels (Fig. 3c). The algorithm was also able to estimate the contributions of stimulus to each node; we refer to these quantities as *F*_{i0}, where *i* is the node the stimulus is acting on (see Methods). The no-noise case shows an almost exact match between prediction and ground truth for the edge weights. While the predictability of edge weights decreases as noise increases (especially as the magnitude of the ground truth value increases), in general, larger ground truth values translate to larger predicted values and vice versa. Importantly, we find that incorrect classification of an edge (i.e. predicting an edge as positive/activator when it is negative/repressor, or vice versa) is extremely rare (Fig. 3d).

We next expanded to random three node models (Fig. 4a). We similarly generated 50 randomized three-node models and the same noise levels were applied. As above, each random network was checked for stability (see Methods) and many (23/50) displayed potential oscillatory behavior (non-zero imaginary parts of eigenvalues), with multiple types of motifs present (Fig. S1, right). We generated simulated noisy perturbation time course data using complete inhibition. After applying the least squares estimation approach to the simulated noisy data from each random model, we again find high consistency between the predicted and ground truth parameter values (Fig. 4b). Directionality prediction accuracy is still strong (Fig. 4c), although performance is noticeably worse with the extreme case of 2:1 signal-to-noise.

### Non-Linear Feed Forward Loop Networks with Time-Varying Jacobians

The previous applications were idealized cases, where the parameters *F*_{ij} were time-invariant and the model used to generate simulated data was the same as the one used to fit the data. Here, we challenged the proposed algorithm further with a series of non-linear feed forward loop (FFL) network models that have time-varying Jacobian elements (Fig. 5a, Table 1). Such FFL motifs are strongly enriched in multiple organisms and are important for signaling functions such as integrative control, persistence detection, and fold-change responsiveness (Goentoro and Kirschner, 2009; Goentoro et al., 2009; Nakakuki et al., 2010).

The FFL network has three nodes (x_{1}, x_{2}, and x_{3}). In the original set of models, the stimulus to this network acts only on x_{1} (*F*_{10}). However, we wanted to challenge the algorithm by potentially allowing the stimulus to act on nodes independently, as may be the case in real situations. Each node exhibits first-order decay (*F*_{ii} = −1). The parameters F_{12}, F_{13}, and F_{23} represent connections that do not exist in the model; we call these null edges, but we allow them to be estimated. The relationship between x_{1} and x_{2} (F_{21}), between x_{1} and x_{3} (F_{31}), or between x_{2} and x_{3} (F_{32}) can be either activating or inhibitory. Furthermore, x_{1} and x_{2} can regulate x_{3} through an “AND” gate (both needed) or an “OR” gate (Fig. 5a). These permutations give rise to 16 different FFL structures (Table 1).

For each of these structures, we integrated the system of ODEs to generate time series perturbation data consistent with the proposed reconstruction algorithm, again with varying levels of signal-to-noise (no noise, 10:1, 5:1, 2:1) and using complete perturbations, and then applied the estimation algorithm. We did not use any prior knowledge about the network; we allowed the algorithm to choose values for all nine edge weights plus stimulus strengths *F*_{i0}. To assess the performance of the algorithm and account for uncertainty in estimated edge weights, we generated 50 bootstrapped datasets for each network structure/signal-to-noise pair, and thus obtained 50 sets of estimates, similar to previous work (Santos et al., 2007). Edges are classified as activators (>0), repressors (<0), or null (~0).

We first noticed that even in the absence of added noise, some inferences were incorrect (Fig. 5b). Model #4 (Table 1) is used as an example, where F_{21} is an activator and F_{31} and F_{32} are repressors with an AND gate; F_{31} is incorrectly predicted as activating (Fig. 5b; compare ground truth to complete inhibition). We reasoned this is due to the following. Since x_{1} is required for the activation of x_{2}, and both x_{1} and x_{2} are required for the activation of x_{3}, complete inhibition of x_{1} removes all regulated activity from x_{2} and x_{3}. Completely inhibiting x_{2} activity also removes all activity from x_{3}. Thus, given this experimental setup, it becomes difficult to discern whether x_{1} directly influences x_{3} or acts solely through x_{2}. If instead of applying complete inhibition we use partial (50%) inhibition, F_{31} is correctly identified as a repressor in this case, and more broadly, we obtain perfect classification from noise-free data across all 16 FFL networks (top row, Fig. 5e). Thus, we instead applied 50% perturbation to all simulation data and proceeded with least squares estimation. Example time courses and fits for the no noise case are shown in Fig. S3 and S4.

Next, we sought a rationale for the classification of null, activating and repressing edges from the least squares estimation results. From the bootstrapped datasets, there are 50 estimates for each edge weight. We chose a simple approach of a strict, symmetric cutoff around zero based on the percentile of these 50 estimates. For example, if the percentile cutoff was 0.9, and if 90% of an edge’s weight estimates were greater than zero, then the edge would be classified as an activator, and if 90% were below zero, the edge would be classified as a repressor. Anything between is classified as a null link, meaning that there is not sufficient evidence to support inference of a connection. We varied the percentile cutoff from the median only (0.5) to the entire range of estimated values (1) and calculated the true and false positive rates for all edges across all 16 FFL models (Fig. 5c). For each noise level, we chose the percentile cutoff that yielded a 5% false positive rate (0.69 for 10:1 signal-to-noise, 0.66 for 5:1, and 0.7 for 2:1) to generate receiver operating characteristic (ROC) curves (Fig. 5d). We observed excellent classification performance across all noise levels (10:1 AUC=0.996, 5:1 AUC=0.992, 2:1 AUC=0.959).
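The percentile-cutoff rule can be written compactly. The following Python sketch is one reading of the rule described above; the function name and the +1/−1/0 encoding are hypothetical conveniences.

```python
def classify_edge(estimates, cutoff):
    """Classify one edge from its bootstrap estimates.

    Returns +1 (activator) if at least `cutoff` of the estimates are
    positive, -1 (repressor) if at least `cutoff` are negative, and
    0 (null) otherwise. For cutoff > 0.5 the two conditions cannot both
    hold; at exactly 0.5 a tie resolves to activator in this sketch.
    """
    n = len(estimates)
    frac_pos = sum(1 for e in estimates if e > 0) / n
    frac_neg = sum(1 for e in estimates if e < 0) / n
    if frac_pos >= cutoff:
        return 1
    if frac_neg >= cutoff:
        return -1
    return 0


# 45 of 50 bootstrap estimates positive: activator at a 0.9 cutoff
call_strong = classify_edge([0.8] * 45 + [-0.1] * 5, cutoff=0.9)
# 44 of 50 positive falls short of 0.9, so the edge is called null
call_weak = classify_edge([0.8] * 44 + [-0.1] * 6, cutoff=0.9)
```

Raising the cutoff trades sensitivity for specificity: more edges fall into the null class, which lowers both the true and false positive rates.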

To evaluate the performance for each of the 16 FFL cases, we calculated the fraction of the 12 links in each FFL model that was classified correctly as a function of signal-to-noise, given the cutoffs determined above (Fig. 5e and S2). Perfect classification is a value of one, which is the case for no noise, and for most cases with 10:1 signal-to-noise. In general, as noise level increases, prediction accuracy decreases, as expected. This is particularly the case for the edges that are distinct across models (F_{21}, F_{31}, F_{32}), which are difficult to infer with 2:1 signal-to-noise (Fig. S2). Although for some models, performance at 2:1 signal-to-noise is quite poor, in some cases it is surprisingly good and even perfect. This suggests that the proposed method is very accurate in low-to-medium noise cases and, even when the data are essentially unacceptably noisy (2:1) and thus more difficult to analyze, the non-zero predictions that are made have a surprisingly low likelihood of error.

### Application to Experimental Data

We wanted to test the approach with experimental data. We decided to investigate the interaction strengths (i.e. crosstalk) between the ERK and AKT pathways, two broadly important mammalian signaling pathways regulating cell proliferation, death, migration and differentiation (Manning and Toker, 2017; Shaul and Seger, 2007). We used In-Cell ELISA to measure the dynamic response of ERK and AKT phosphorylation in serum- and growth-factor starved MCF10A cells to an Epidermal Growth Factor (EGF) and Insulin combination stimulus, which robustly activates both ERK and AKT over ~hour time scales to drive the cell cycle (Bouhaddou et al., 2018). We did this in the presence of vehicle (Fig. 6a, top), MEK inhibition (Fig. 6a, middle, directly upstream of ERK), or AKT inhibition (Fig. 6a, bottom).

After applying least squares estimation, we obtained the Jacobian which characterizes the network (Fig. 6b). There are two aspects that are corroborated by prior knowledge. First, F_{10} and F_{20} are both positive; it is well known that EGF+Insulin can activate ERK and AKT with independent mechanisms of Ras and PI-3K activation (Manning and Toker, 2017; Shaul and Seger, 2007). Second, negative feedback has been widely reported for both pathways through post-translational and transcriptional mechanisms (Efeyan and Sabatini, 2010; Fritsche-Guenther et al., 2011; Sturm et al., 2010). F_{11} and F_{22} are both negative and less than −1, which implies negative auto-regulation in excess of simple first-order decay. Cross-talk between these pathways is context dependent, with reports of none, positive and negative links (Dent, 2014; Mendoza et al., 2011; Wang et al., 2009). In this context of serum-starved MCF10A cells stimulated with EGF+Insulin, our results suggest AKT positively regulates ERK, whereas ERK negatively regulates AKT, and both have similar magnitudes. These results can form the basis for future mechanistic investigations into the biochemistry and physical reactions comprising these putative links. They also show that our approach can be applied to real experimental time course data both to recover known biological network structure and to suggest new structure.

## Discussion

Despite intensive research focus on network reconstruction, there is still room to improve discrimination between direct and indirect edges (i.e. causality), particularly when biologically-ubiquitous feedback and feedforward cycles are present that stymie many statistical or correlation-based methods, and also given that experimental noise is inevitable. The presented method prescribes a realistic experimental design for causal edge weight inference with experimental noise. For small 2 and 3 node networks, the method can successfully handle both random networks with constant edge weights over a time course as well as time-variant edge weights in more complex, non-linear feed forward loop structures. Prediction accuracy was strong in many cases even with simulated noise that exceeds typical experimental variability (2:1 signal-to-noise). The method presented here is applied to molecular biology examples, but it is in fact quite general, and could be applied to vastly different fields where perturbation time course experiments are possible, and where network structures are important to determine.

Although complete inhibition is often used for practical perturbations (e.g. CRISPR), we found that partial inhibition is important to fully deconvolve the explored FFL networks. This is not likely to be the only case when partial inhibition will be important, so we are inclined to speculate it will be a generally important experimental design criterion moving forward. However, there is likely to be a sharp tradeoff between perturbation strength and feasibility, since the effects of small perturbations may be masked by noise. Partial inhibition is often “built-in” to certain assay types, such as si/shRNA or pharmacological inhibition that are titratable to certain magnitudes.

One major remaining challenge is scaling to larger networks. Here, we limited our analysis to 2 and 3 node networks. Conveniently, the number of perturbation time courses needed grows linearly with the number of considered nodes. Furthermore, as long as system-wide or omic-scale assays are available, so too does the experimental workload. This is routine for transcriptome analyses, and is becoming even more commonplace for proteomic assays (e.g. mass cytometry, cyclic immunofluorescence, mass spectrometry, RPPA) (Lin et al., 2016).

Increasing the network size will quadratically increase the number of unknown parameters. Reducing this parameter space and obtaining good initial guesses (such as with MRA, as we did here; see Methods) will be important. Imposing prior knowledge can also reduce the parametric space, such as in Bayesian Modular Response Analysis (Santra et al., 2013). As network size grows, sparseness of the Jacobian will increase, so judicious allocation of non-zero elements will be important. Checking estimated Jacobians for emergent properties such as degree distributions for scale-free networks (Barabási and Albert, 1999) can provide additional important constraints. Lastly, large estimation problems may be broken into several smaller problems to be merged subsequently, which is likely to yield large computational speed up by allowing parallelization of much smaller tasks.

## Author Contributions

MRB and GRS conceived of the work. GRS, MB, MRB, CMA, OMZ and JE performed analyses. MB and GRS made the figures. GRS, MB, and MRB wrote the manuscript. MB and ADS performed the ELISA experiments.

## Declaration of Interests

The authors declare no competing interests.

## Funding

MRB acknowledges funding from Mount Sinai, Clemson University, the National Institutes of Health Grants R01GM104184 and R21CA196418, and an IBM faculty award. MB and ADS were supported by a National Institute for General Medical Sciences-funded Integrated Pharmacological Sciences Training Program grant (T32GM062754).

## Methods

### Deriving Sufficiency Conditions for Unique Estimation of Jacobian Elements

The first-order partial derivatives comprising **J** (Eq. 2) can be defined by a first-order Taylor series expansion of Eq. 1 about a time point *k*

Eq. 3 may be written more succinctly as where

We refer to time point *k* as the reference point with respect to time (RP_{t}) and time point *k*+*1* as the evaluation point with respect to time (EP_{t}) (Fig. 1d). The approximation in Eq. 7 becomes more accurate as RP’s and EP’s become closer; however, in practice experimentally the perturbations must induce measurable changes. Also, it is evident that the edge weights are potentially time-dependent, although this is rarely considered when describing biological networks.
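As a sketch of the expansion described above for the two-node case, the first-order Taylor series and its compact difference form can be written as follows. The notation is assumed, since the typeset equations are not reproduced in this text; *y*_{i} denotes the time derivative *f*_{i}, and the Jacobian elements are evaluated at the reference point *k*.

```latex
% First-order expansion of f_i about the reference point k, evaluated at k+1
\[
f_i\big(x_1(k+1),\, x_2(k+1)\big) \approx f_i\big(x_1(k),\, x_2(k)\big)
  + F_{i1,k}\,\big[x_1(k+1) - x_1(k)\big]
  + F_{i2,k}\,\big[x_2(k+1) - x_2(k)\big],
\qquad i = 1, 2
\]

% Compact form with y_i = f_i,
%   \Delta_t y_{i,k} = y_i(k+1) - y_i(k),  \Delta_t x_{j,k} = x_j(k+1) - x_j(k)
\[
\Delta_t y_{i,k} \approx F_{i1,k}\,\Delta_t x_{1,k} + F_{i2,k}\,\Delta_t x_{2,k}
\]
```

For a linear system with constant Jacobian the relation is exact for any spacing between the reference and evaluation points; for non-linear systems the error grows with that spacing.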

How do we estimate the edge weights in Eq. 7 and thus reconstruct the network? Time series data can inform *x*_{i}’s and *f*_{i}’s as a function of time, following application of a stimulus. Given such stimulus-response data, however, for each time point there are only two equations for four unknowns, an underdetermined system for which more data are needed.

Consider now stimulus-response time course data in the presence of single perturbations (Fig. 1c). Let *p*^{(i)} be a variable that reflects the strength and/or presence of different potential perturbations: *p*^{(1)} represents inhibition of x_{1} and *p*^{(2)} represents inhibition of x_{2}. If *p*^{(j)} is not explicitly written, its value is zero and/or it has no effect. Now, the ODEs also become a function of the perturbation variables, that is, d*x*_{i}/d*t* = *f*_{i}(*x*_{1}, *x*_{2}, *p*^{(1)}, *p*^{(2)}). The Taylor series expansions for cases with perturbations take the same first-order form, with additional terms for the partial derivatives with respect to the perturbation variables.

Here, we have chosen the reference point with respect to perturbations (RP_{p}) to be the same as RP_{t}, but the evaluation point with respect to perturbations (EP_{p}) to be defined by the perturbation time course (Fig. 1d). Since the reference point is the same, the Jacobian elements remain identical in these equations. Thus, now we have six potential equations with which to estimate the four Jacobian elements. It is interesting to note that the Jacobian elements, or network, may be affected by the perturbation, but we do not necessarily have to know those effects mathematically, since the reference point is the same. However, we must make some determination as to how the perturbations *p*^{(1)} and *p*^{(2)} directly affect Node 1 and Node 2 dynamics *f*_{1} and *f*_{2} to account for the perturbation variable partial derivatives.

By design, the Node 1 perturbation has significant direct effects on Node 1 dynamics, and similarly for the Node 2 perturbation on Node 2 dynamics. Using equations that include ∂*f*_{1}/∂*p*^{(1)} and ∂*f*_{2}/∂*p*^{(2)} requires a precise definition of perturbation strength and its effects on dynamics, which is difficult, and also requires using perturbation data that are likely to be extremely far from the reference point. Therefore, we do not employ equations involving these terms here. On the other hand, if the Node 1 perturbation has negligible direct effect on Node 2 dynamics, that is, the only effects on Node 2 dynamics are through its effects on Node 1 (i.e. *p*^{(1)} is not explicit in *f*_{2}), and similarly the Node 2 perturbation has negligible direct effect on Node 1 dynamics, then ∂*f*_{2}/∂*p*^{(1)} and ∂*f*_{1}/∂*p*^{(2)} can be neglected from the above equations. This mild condition is often met experimentally. From this, the main set of linear equations presented in Eqs. 3-4 is obtained.

### ODE Model Equations and Simulation

The two-node single activator model is described by

The random two node network is described by

Values for F_{10} and F_{20} are sampled from a uniform distribution over the range [0,2] and values for F_{11}, F_{12}, F_{21}, and F_{22} are sampled from a uniform distribution over the range [-2,2] using the MATLAB function `rand`.

The random three node networks use the same sampling rules with the following equations

The feedforward loop models (Mangan and Alon, 2003) are described by:

When an AND gate is present

When an OR gate is present

For a given u, v ϵ {x_{1}, x_{2}, x_{3}} and K, K_{u}, :

If u activates its target, then:

If u represses its target, then:

Models were integrated using `ode15s` in MATLAB with default settings and zero initial conditions.

### Time Point Selection

Unless otherwise noted, we sample 11 evenly spaced time points across a time course. This number is chosen to be on the high end of experimental feasibility. To determine the length of a time course, we calculate the time necessary for each element to reach within 1% of steady-state across perturbation conditions. If not all elements reach a steady state (e.g. oscillations), then a default time of T = 10 arbitrary units is used.

### Modeling Perturbations

We model perturbations by scaling the differential equation for the perturbed node. For example, a complete inhibition of x_{1} multiplies dx_{1}/dt by zero, and a 50% inhibition multiplies by 0.5. Because simulations in this study start from a zero initial condition, this perturbation model suffices, but in other scenarios, different considerations might be important for perturbation models.

### Simulated Noise

Normally distributed white (zero mean) noise is added to simulated time courses point-wise with
where *x* is the simulation data point, *y* is the noisy data point, and *d* represents the noise level (standard deviation of the normal random variable). Signal-to-noise ratios of 10:1, 5:1, and 2:1 correspond to *d* = 0.1, 0.2, and 0.5, respectively. Normally distributed samples are obtained using `randn` in MATLAB.
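The noise equation itself is not reproduced in this text. One reading consistent with *d* being the reciprocal of the signal-to-noise ratio is noise proportional to the signal, *y* = *x*(1 + *d*·N(0,1)). The Python sketch below implements that reading; the proportional form, the function name, and the example values are all assumptions, and an additive form with fixed standard deviation *d* would be a different model.

```python
import random


def add_noise(x, d, rng):
    """Return a noisy copy of data point x at relative noise level d.

    Assumed form: y = x * (1 + d * z), with z drawn from a standard normal,
    so the noise standard deviation is d * |x| (signal-to-noise of 1/d).
    """
    return x * (1.0 + d * rng.gauss(0.0, 1.0))


rng = random.Random(0)            # fixed seed for reproducibility
clean = [0.0, 0.4, 0.8, 1.0]      # hypothetical simulated time course
noisy_10_to_1 = [add_noise(x, 0.1, rng) for x in clean]
noisy_2_to_1 = [add_noise(x, 0.5, rng) for x in clean]
```

Under this form a data point of exactly zero stays at zero, which matters here because the simulations start from zero initial conditions.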

### Parameter Estimation

Consider experimental data matrices *X*_{p0}, *X*_{pk}, where *k* ∊ {1 … *m*} is a perturbation index, describing the activity level of each node at each time point *t*_{i}, *i* ∊ {1 … *n*}, for each perturbation condition. For a given set of parameters *F*_{uv}, *u* ∊ {1 … *m*}, *v* ∊ {0 … *m*}, in a system with *m* nodes, we want to generate a corresponding set of simulated data matrices. To that end, we design a system of differential equations by declaring initial values and, for a given time point *t*_{i}, *i* ∊ {1 … *n*}:

We integrate this system of ODEs at each experimental time point by the MATLAB function `ode15s`. We model perturbations by replacing simulated output for the variable being perturbed with the corresponding experimental value at each time point. That is,

This ensures we are accurately capturing the change in activity as a result of the perturbation. We make this correction prior to performing `ode15s` on each time interval so that changes in other nodes reflect the perturbed values. This allows us to efficiently calculate simulated time courses that reflect each perturbation experiment.

We then define the objective function as the sum of squared errors *y*

Note here that we do not use data from node *k* when perturbation *k* was used (as explained above).
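A short Python sketch of such an objective is given below. The data layout (a dictionary keyed by condition index, with 0 for the vehicle and *k* for perturbation of node *k*) and the function name are assumptions for illustration; the only behavior taken from the text is that squared errors are summed and that node *k* is skipped under perturbation *k*.

```python
def sum_squared_error(simulated, experimental):
    """Both arguments map a condition index (0 = vehicle, k = perturbation of
    node k, 1-based) to an m x n nested list: rows are nodes, columns are
    time points. Node k is excluded from the sum under perturbation k."""
    total = 0.0
    for cond, exp_matrix in experimental.items():
        for node, exp_row in enumerate(exp_matrix, start=1):
            if node == cond:
                continue  # perturbed node values are imposed, not fitted
            sim_row = simulated[cond][node - 1]
            for s, e in zip(sim_row, exp_row):
                total += (s - e) ** 2
    return total

# Two nodes, two time points, three conditions (vehicle plus one per node).
experimental = {0: [[1, 2], [3, 4]], 1: [[0, 0], [1, 1]], 2: [[1, 1], [0, 0]]}
simulated = {0: [[1, 2], [3, 5]], 1: [[9, 9], [1, 3]], 2: [[2, 1], [7, 7]]}
error = sum_squared_error(simulated, experimental)
```

In this toy case the large mismatches on node 1 under perturbation 1 and on node 2 under perturbation 2 do not contribute, so `error` reflects only the remaining entries.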

The MATLAB function `fmincon` is used to minimize *y* by changing edge weights and stimulus terms (*F*_{i0}) within the range [-100, 100]. To generate initial guesses for each of the edge weights, we apply steady-state modular response analysis (MRA) (Kholodenko *et al.*, 2002) using the final values of the time course. Briefly, for each node *x*_{i} in the system, we calculate the *i*^{th} column of the global response matrix **Rp** by the central fractional differences in the final time point levels for each node *j*, comparing the value of node *j* with the vehicle perturbation to the value of node *j* with the perturbation of node *i*. The initial edge weight estimates are contained in the local response matrix. If an edge weight is predicted to be identically zero by MRA, it is held fixed in the estimation. The remaining parameters (effects of stimulus) are initially set to one.
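The MRA initialization can be sketched in Python as follows. The sketch assumes the standard steady-state MRA expressions from Kholodenko et al. (2002): a central fractional difference 2(*a* − *b*)/(*a* + *b*) between perturbed (*a*) and vehicle (*b*) levels for the global response matrix, and a local response matrix obtained as −[dg(**Rp**⁻¹)]⁻¹**Rp**⁻¹. The function names and input layout are hypothetical, and the small Gauss-Jordan routine is only there to keep the example self-contained.

```python
def invert(A):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular global response matrix")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mra_initial_guess(vehicle, perturbed):
    """vehicle[j]: final level of node j under the vehicle condition.
    perturbed[i][j]: final level of node j when node i is perturbed.
    Assumes perturbed[i][j] + vehicle[j] is never zero."""
    m = len(vehicle)
    # Global response matrix: column i holds the central fractional
    # differences caused by perturbing node i.
    Rp = [[2.0 * (perturbed[i][j] - vehicle[j]) / (perturbed[i][j] + vehicle[j])
           for i in range(m)] for j in range(m)]
    Rinv = invert(Rp)
    # Local response matrix: scale each row so the diagonal equals -1.
    return [[-Rinv[i][j] / Rinv[i][i] for j in range(m)] for i in range(m)]

# Toy data: inhibiting node 1 lowers node 2; inhibiting node 2 leaves node 1 alone.
r = mra_initial_guess(vehicle=[1.0, 1.0],
                      perturbed=[[0.5, 0.8], [1.0, 0.5]])
fixed_at_zero = [[r[i][j] == 0 for j in range(2)] for i in range(2)]
```

Here `r[1][0]` is positive (node 1 activates node 2) while `r[0][1]` is zero, so under the rule described above the edge from node 2 to node 1 would be held fixed at zero during estimation.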

### Cell Culture

MCF10A cells were cultured in DMEM/F12 (Gibco# 11330032) medium supplemented with 5% (v/v) horse serum (Gibco# 16050–122), 20 ng/mL EGF (PeproTech# AF-100-15), 0.5 mg/mL hydrocortisone (Sigma# H-0888), 10 μg/mL insulin (Sigma# I-1882), and 100 ng/mL cholera toxin (Sigma# C-8052). Cells were cultured at 37°C in 5% CO2 in a humidified incubator and passaged every 2–3 days with 0.25% trypsin (Corning# 25-053-CI) to maintain subconfluency. Starvation medium was DMEM/F12 alone. Cell line identity was confirmed by STR profiling.

### In-Cell ELISA Perturbation Time-Course Experiments

MCF10A cells were seeded in 96-well plates (Corning# 3603) at 10,000 cells per well in serum-containing medium (see above). Cells were serum- and growth factor-starved overnight (~18 hours) and stimulated with 20 ng/mL EGF (PeproTech# AF-100-15) and 10 μg/mL insulin (Sigma# I-1882) for several different time periods (5, 30, 60, 90, 120, 240, and 360 minutes). For conditions receiving inhibitors, cells were pre-incubated with either 0.1 nM MEK inhibitor (PD0325901, Sigma# PZ0162) or 10 nM AKT inhibitor (MK2206, ChemieTek# CT-MK2206) for 30 minutes prior to stimulation with growth factors. Cells were then fixed with 4% paraformaldehyde (EMS# 15710-S) for 15 minutes at room temperature, washed 3x with 200 μL of 1x PBS, and permeabilized with 200 μL of a 1:100 dilution of Triton X-100 (Sigma# X100-500ML) in 1x PBS for 30 minutes at room temperature. Permeabilization buffer was then removed and each well was incubated with 200 μL of Odyssey Blocking Buffer (TBS; LICOR #927-5000) for 2 hours at room temperature. The blocking buffer was removed and cells were incubated overnight at 4°C with 100 μL of a primary antibody solution composed of either anti-ppERK (Cell Signaling# 4370, 1:1000) or anti-pAKT(S473) (Cell Signaling# 4060, 1:1000) antibodies diluted directly in the Odyssey Blocking Buffer. The next day, primary antibody solution was removed and cells were washed 3x with 250 μL of 1x wash buffer (625 μL of Tween-20, Sigma# P1379-500ML, diluted in 250 mL of 1x PBS to make a stock solution). Cells were then incubated with 100 μL of secondary antibody solution (Goat Anti-Rabbit IgG Horseradish Peroxidase, Thermo# 31460, diluted 1:1000 in Odyssey Blocking Buffer and then 1:10 in 1x PBS) for 2 hours at room temperature. Secondary antibody solution was removed and cells were washed 4x with 250 μL of 1x wash buffer.
Next, 100 μL of OPD substrate solution (5 mg of o-Phenylenediamine, Sigma# P-2903, diluted in 10 mL of citric phosphate buffer, pH 5, with 10 μL of 30% hydrogen peroxide added immediately prior to use) was added to each well, and the plate was immediately placed in the plate reader (PerkinElmer Ensight Model# H3570) for kinetic reading. Absorbance was measured at 450 nm every two minutes, with shaking between readings, for a total of 30 minutes. Absorbance values obtained during the linear range of the reaction were used. Serum-starved (control) absorbance values were subtracted from all values (separately for pERK and pAKT), which normalized the baseline (zero time point) signal to zero. The 60-minute time point was removed due to the presence of outliers and unacceptable quality control metrics.

## Acknowledgements

We would like to thank Achla Gupta and Gomathi Jayaraman for help with ELISA experiments.