Qualitative modeling of signaling networks in replicative senescence by selecting optimal node and arc sets

Signaling networks are an important tool of modern systems biology and drug development. Here, we present a new methodology to qualitatively model signaling networks by combining experimental data and prior knowledge about protein connectivity. Unlike other methods, our approach does not focus solely on selecting which reactions are involved but also on whether a protein is present. This allows the user to model more complicated experiments and incorporate more knowledge into the model. To demonstrate the capabilities of our method we compared the signaling networks of young and replicative senescent human primary HFL-1 fibroblasts, whose differences are expected to be due mainly to changes in the expression of the proteins rather than the reactions involved. The resulting networks indicate that, compared to young cells, aged cells are not as responsive to insulin stimulation and activate pathways that establish and maintain senescence. Author summary Cells have developed a complex network of biochemical reactions to monitor their environment and react to changes. Although multiple pathways, tuned to identify specific stimuli, have been discovered, it is generally understood that the signaling process typically involves multiple pathways and is context depended. Consequently, reconstructing the signaling network utilized by cells at any given moment is not a trivial task. In this article, we report on a novel logic-based method for identifying signaling network by combining experimental data with prior knowledge about the connectivity of the involved proteins. Unlike other methods proposed so far, our method uses data to evaluate the presence or absence of reactions and proteins alike. We reconstructed and compared the signaling network of human primary HFL-1 fibroblasts as they undergo replicative senescence in the presence of 6 different stimuli. The resulting networks indicate that, compared to young cells, senescent cells are not responsive to insulin stimulation and activate pathways that are known to establish and maintain senescence.


Introduction 1
A central challenge of Systems Biology is to develop mechanistic models that explain 2 cell behavior as illustrated by high-throughput experimental data. Models describing 3 cell signaling are of particular importance in understanding how a cell interacts with its 4 environment and have been widely used in drug discovery [1,2] since many drugs target 5 the signaling process by inhibiting specific interactions [3]. A major challenge in dealing 6 with signaling networks is that there are no golden standards [4] as they are context 7 dependent. Information about interactions is scattered among many databases which 8 have different policies with respect to what constitutes an interaction, which 9 interactions are included and the format to encode them [5,6]. Moreover, several 10 researchers have pointed out that there is a low consensus among these databases, 11 presumably due to the low specificity of the currently available techniques to identify 12 interactions [7][8][9]. Thus, the currently available interactome serves more as a palette of 13 possible interactions rather than a snapshot of what is going on inside a cell. 14 As a result, many algorithms have been proposed for constructing signaling networks 15 by adapting the interactome (prior-knowledge about connectivity) to experimental data 16 in order to produce a network specific to the mechanism under investigation [10]. 17 Experimental data are usually derived from perturbation experiments aiming to 18 highlight the dependencies between some nodes of the network. There are two main 19 steps in designing these algorithms; first, a mathematical formalism is selected to 20 interpret the signal transduction mechanism and second an optimization process is 21 carried out to tune the parameters of the model according to the data. Deciding the 22 appropriate formalism depends primarily on the questions being asked and the type of 23 data available [11]. Some approaches consider the prior knowledge network fixed and 24 use data to tune its dynamic behavior while others use data to adjust the connectivity. 25 Methods belonging to the latter category are more suitable for dealing with large 26 signaling networks due to the limitations described in the previous paragraph. Once a 27 formalism is selected, a network can be used to simulate the signaling process and thus 28 make predictions about the involved entities. This predictive ability allows one to access 29 the quality of the network, and ultimately modify it to improve it, by comparing its 30 predictions with experimentally observed data. 31 To the best of our knowledge, the methods proposed thus far in order to reconstruct 32 signaling networks focus on selecting a subset of the known reactions that can optimally 33 simulate the observed behavior. This approach is justified for two main reasons. First, 34 as discussed above, a lot of the reported reactions are considered to be false positives or 35 context dependent. Second, cells can be probed in the presence of inhibitors, like drugs 36 or small molecules, that inhibit specific reactions and thus the effect of these 37 interactions can be evaluated directly. However, signaling networks can also be altered 38 in cells by the responsiveness of the involved proteins, either because they are saturated, 39 the cell under-expresses them or researchers have knocked them down, using CRISPR or 40 shRNA for example.

41
In this paper, we propose a novel logic-based method for identifying signaling 42 networks using signed directed graphs, also known as interaction graphs (IG) [12].

43
Interaction graphs provide a description powerful enough to capture global dependencies 44 and make qualitative predictions so they fall under the second category of data-driven 45 methods. We use mixed integer programming (MIP) to optimize the networks which, 46 despite being NP-hard in principle, have been proven effective in dealing even with large 47 instances of these problems [13][14][15]. Our method expands upon our previous work [16] 48 on the subject by expanding the scope of the optimization to include both 49 reactions/arcs and proteins/nodes of the network. As a case study, we reconstructed the 50 signaling network of human primary fetal lung fibroblasts (HFL-1) as they undergo 51 replicative senescence. In the case of aging, we suspect that the reactions involved in We assume that we are given a signed directed graph G = (N, A, σ) representing our 57 prior knowledge about the connectivity of the entities involved. Sign directed graphs, 58 also known as interaction graphs (IG), extend the notion of directed graphs by assigning 59 a sign to every arc σ : A → {−1, +1}. In our model, the nodes (N ) of an IG can assume 60 one of three possible states: up-regulated (+1), down-regulated (−1) and 61 unperturbed/basal state (0). The sign of an arc represents the effect or influence the 62 parent-node exerts on the child-node (see sign consistency later). To allow for more 63 complicated scenarios to be modeled, each node and arc is associated with a-possibly 64 empty-set of inhibitors I n and I a respectively. nodes and 262 interactions (Fig 1). Finally, we used the number of references to 78 prioritize which reactions would be included in the final network. Apart from prior knowledge about the connectivity, our method also requires a set of 80 experimentally observed dependencies in order to refine the PKN so as to simulate them 81 as closely as possible. Dependencies can be observed via perturbation experiments 82 where some nodes of the network are perturbed (set in a non-basal state) and the state 83 of some other nodes is recorded. In the context of signaling networks, these experiments 84 correspond to measuring the phosphorylation status of intra-cellular proteins upon 85 perturbing the cells. We assume that these experiments capture a steady-state shift.  On the other hand, a measurement of zero is not equivalent to no measurement, that is 92 why m e are not defined over all nodes N . Another important distinction between p e 93 and m e is that the first maps to discrete values indicating that perturbations are known 94 with certainty while measurements are not. The image of m e is a "fuzzy" relaxation of 95 the three distinct states where the distance from each discrete value {−1, 0, 1} is 96 inversely proportional to the modeler's belief that the node measured was in the

98
To simulate the outcome of a perturbation experiment, we must produce a sign 99 consistent labeling of the nodes. The notion of sign consistency is discussed in detail 100 in [17]. Here, we briefly outline the aspects that we are going to use to simulate the 101 signaling process. A sign consistent labeling l : N → {−1, 0, 1} abides by a set of rules 102 that restrict the possible state-combinations of adjacent nodes. First, every perturbed 103 node must be labeled with the intended sign 104 ∀n ∈ N : p e (n) = 0 ⇒ l(n) = p e (n) Second, the state of every non-perturbed node must be equal to the state of at least one 105 of its predecessors multiplied by the sign of the connecting arc 106 ∀t ∈ N : p e (t) = 0 ⇒ ∃a = (s, t) ∈ A : l(t) = σ(a)l(s) Notice that this rule produces a "weakly consistent" network.

108
In this section, we describe a mixed integer program to prune the original PKN in order 109 to identify a subnetwork that can be labeled sign consistently in a way that minimizes 110 the divergence between labels and measured states. The main idea of the formulation is 111 to use the constraints in order to force a sign consistent labeling and then use decision 112 (binary) variables to select nodes and arcs in order to minimize the objective function. 113 We begin by presenting the indices and the variables utilized together with their 114 interpretation and then proceed to describe the constraints and finally the objective 115 function of the problem.

116
Variables are indexed by n ∈ N to indicate the node, a ∈ A to indicate the arc and 117 e ∈ E to indicate the experiment. For each node n, a binary variable v n is introduced to 118 model whether it is part of the optimal sub-network. To model a node's state in 119 experiment e, two binary variables are introduced x + n,e and x − n,e modeling whether the it 120 is up or down regulated respectively. For each arc a, a binary decision variable y a is 121 introduced to model whether it belongs to the optimal sub-network and two continuous 122 variables z + a,e and z − a,e to model whether, in experiment e, the signal originated from an 123 up or a down regulated node. The asymmetry in the x and z variables, the former being 124 a decision variable while the latter not, is meant to accommodate the degrees of freedom 125 that sign consistency allows. In particular, the fact that each node has to be consistent 126 with at least one of each predecessor allows the optimizer to select the label that would 127 result in a better overall fit. The relationship between the x and z variables is 128 summarized visually in 2. The constraints modeling sign consistency can be grouped into two main groups, 130 those regulating the state of the nodes and those regulating the state of the arcs. For 131 notational convenience, we use the symbols a s and a t to indicate the source and target 132 node of an arc a = s → t and ± to indicate "double" constraints both for the positive 133 and negative variables involved. 134 We begin with the arc regulating constraints. For every arc a ∈ A in every 135 experiment e ∈ E the following constraints must hold: The first four constraints implement an AND gate of necessary and sufficient 137 conditions for an arc a to transduce a signal in experiment e. The three conditions are; 138 first, the arc must be included in the final sub-network, second, the source node must be 139 in the appropriate state and third no inhibitors should be active. If all of these 140 conditions are satisfied then the fourth, constraint ensures that the arc will become 141 active. The first and the last constraint also ensure that z + and z − behave as mutually 142 exclusive binary variables. The reason we do not have to explicitly designate z as binary 143 is that their value is uniquely determined for every combination of the actual decision 144 variables of the problem.

145
For every node n ∈ N and every experiment e ∈ E the following constraints must 146 hold: where 1 is the identity function-equals 1 when the condition is met.

148
Again, the first four constraints implement an AND gate of necessary and sufficient 149 conditions for a node to be up or down regulated. These conditions are; first, the node 150 must be included in the final sub-network, second, unless it's perturbed, at least one 151 reaction must activate it and third, no inhibitor should be active. If all of the three 152 requirements are met then the fourth constraint will force the node to assume a 153 non-basal state. The first constraint also enforces mutual exclusivity between the two 154 active states of a node, as it cannot be both up and down regulated. The fifth 155 constraint guarantees that the perturbed nodes have the intended state and initiates the 156 signaling process.

157
The sum at the left-hand side of the fourth constraint allows the optimizer an extra 158 degree of freedom in labeling a node. In particular, unless forced by another constraint, 159 a node can become either up or down regulated regardless of the sign of z, and the 160 optimizer has to decide the best option. If there are no conflicting influences, then there 161 is no decision to be made since the second constraint would have already forced one of 162 the two x variables to zero. Because of this constraint, xs have to be binary variables, 163 unlike the zs.

164
The final subnetwork will always be an acyclic graph rooted at the perturbed nodes and branching towards the measured nodes. Nonetheless, the prior knowledge network may contain loops. Loops present a challenge for logic-based methods since they can be self-activated without external perturbation. Because the up and down regulated states are mutually exclusive, negative feedback loops are automatically handled by the constraints presented thus far. However, for the case of positive feedback loops we have to include an extra set of constraints and variables. In particular, for every node n ∈ N we introduce a variable d n and the following constraints: In the presence of a cycle, these constraints will push the d variables involved to grow 165 without bound and thus by bounding them up by M we force the optimizer to break 166 the cycle by removing some arcs. M represents a "big" constant and can be thought of 167 as the longest distance between any pair of nodes and thus the best "uninformative" Sparsity incentives were also encoded in the objective function in the form of weights 179 over the variables v and y. The weights were derived from the number of citation   Finally, because of the discrete nature of the problem, many solution can be optimal 191 or nearly optimal. To address this issue, we generated multiple near optimal solutions 192 and for each variable we averaged its value across all weighted by the value of the senescence at about 50 CPD, as described before [18]. In all experimental procedures  Technical variability for bead-based phosphorylation assays has been studied before 238 in [19], where they estimated the coefficient of variance due to the experimental 239 protocol and pipetting errors to average in the range of 5-15%. To access the impact of 240 such variability in our results, we generated artificial datasets by adding noise to our 241 MFI readouts and rerun the analysis. We repeated this processes 10 times for three 242 different levels of noise 5, 10 and 15%, and computed the Jaccard similarity index of the 243 resulting network with the original network. We observed that a CV of 5% has virtually 244 no effect in the results while for larger values the networks start to diverge. The results 245 are summarized in Figure 3.    Factor Kappa B Kinase Subunit Beta (IKBKB ) and RelA/p65 is slightly more likely to 290 be up-regulated. These signaling events can lead to the activation of NFKB signaling 291 that establishes and maintains cellular senescence [22]. Additionally, signaling through 292 the TNF receptor 1 (TNFRSF1A) that leads to downstream activation of SRC, is also 293 predicted to happen predominantly in young cells, while the TNF receptor 2 294 (TNFRSF1B ) is equally responsive in both types of cells.

295
On the other hand, in young cells we predict higher probability of activation of 296 proteins involved in cell growth and proliferation such as STAT1, STAT5, and PTK2, 297 which is confirmed for all three by our experimental data. The basal phosphorylation 298 levels of these proteins are also significantly higher in young cells. In response to IL1A, 299 MYD88 is more likely to be phosphorylated in senescent cells while IL1 receptor 300 activated kinases (IRAK1/2 ) are more likely to be responsive in young cells. MYD88 301 activation through the interleukin receptor, followed by recruitment of IRAKs has been 302 associated with the activation of the senescence-associated secretory phenotype 303 (SASP ) [29].

305
In this article, we present a novel framework for reconstructing signaling networks using 306 phosphoproteomic data and prior knowledge about their connectivity. The framework 307 presented expands our previous work on the subject by allowing species and reactions to 308 be modeled independently. This decomposition renders the framework more flexible to 309 deal with evolving systems, like aging and senescence, and also allows the user to 310 incorporate more information into the model both with respect to species and reactions. 311 For example, cells can be interrogated in the presence of drugs or small molecules that 312 inhibit specific interaction but also upon knocking down specific genes and thus 313 removing their respective proteins.

314
For the case study presented, the reaction set was constrained to be the same across 315 both populations while the node set was allowed to vary but the opposite configuration 316 could have been implemented just as easily. The configuration used allowed networks to 317 differ only due to changes in the responsiveness levels of the proteins while also allowing 318 more data points to inform which reactions should be included. More generally, within 319 the presented framework, one can impose soft or hard constraints on nodes and/or 320 reactions without necessarily compromising the accuracy of the model, since the two 321 mechanisms allow the optimizer to work around them if necessary. From the sensitivity 322 analysis presented, we observed that, despite the fact that reactions were constrained to 323 be the same for the 2 populations, they were more sensitive to noise. This is due to 324 PLOS 9/13 their relatively moderate effect on the overall connectivity compared to the effect of 325 removing a node from the network. Thus by extending the scope of optimization to 326 species as well as reactions we render our model less sensitive to overfitting.

327
Similarly to our previous work, the proposed formulation can handle both positive 328 and negative interactions as well as cycles in the prior knowledge network transparently. 329 However, the resulting networks are acyclic so it cannot model dynamical behavior 330 within a network. Also, as with all logic-based approaches, our framework lacks the 331 ability to describe the kinetic aspects of the network [11]. Because of the MIP 332 implementation, the networks constructed are optimal with respect to the proposed 333 framework or at least within a pre-specified distance from optimal [21]. Moreover, Pareto-optimal or near optimal, solution. We run our method on synthetic data, that 336 respected sign consistency, and were able to reconstruct the generated signatures in 337 every occasion (data not shown). To demonstrate our framework, we modeled how cells alter their signaling network 348 as they senesce in vitro. In particular, we measured the fold-change in the concentration 349 of 18 phosphoproteins, in young and senescent cells, upon stimulation with 6 cytokines 350 and constructed a signaling network for both populations simultaneously. The

351
MetaCore database was used to assemble the pool of possible interactions.

352
The responses for young and senescent fibroblasts were compared and we observed 353 significantly higher overall responsiveness of proliferative signals in young cells and mainly in response to EGF, IL1A and TNF stimulation. SASP involves the secretion of 357 inflammatory cytokines, growth factors, and proteases and is a characteristic feature of 358 senescent cells. SASP has been associated with many age-related diseases, including 359 type 2 diabetes and atherosclerosis. In our model, the NFKB pathway, which is highly 360 involved in the establishment of SASP is activated through downregulation of NFKBIA 361 by upstream signals and upregulation of RelA/p65 [22]. The activation of ERK1/2 is 362 also known to affect the secretion of interleukin 6, the most prominent cytokine of the 363 SASP [30] and MYD88 activation through the interleukin receptor has been associated 364 with SASP as well [29]. 365 Another highly studied pathway in aging is the insulin and IGF1 signaling (IIS ) 366 pathway. The attenuation of IIS has been shown to promote longevity and extend the 367 lifespan of various organisms [26]. However, IIS is also known to decline during normal 368 and accelerated aging [27]. The complexity of the IIS pathway is further increased by 369 evidence that hybrid INSR-IGF1R complexes are formed and have distinct affinities for 370 insulin, IGF-1 and IGF-2 [31]. In this model, we propose that the attenuation of the IIS 371 response in senescent cells is mediated through lower responsiveness of INSR but not 372 IGF1R. While IGF1R and INSR share common downstream pathways, their biological 373 activity is not identical. INSR mainly controls glusose uptake and metabolism, while 374 IGF1R plays a proliferative and anti-apoptotic role [20]. 375 In conclusion, this paper introduced a new logic-based framework for constructing 376

PLOS
10/13 biological networks that allow researchers to model more complex experiments as well as 377 dependencies across networks. To demonstrate its capabilities, we modeled the signaling 378 network of HFL-1 cells as they age. The results presented are preliminary and in the 379 future, we plan to model more types of "-omics" data and time-points in our model.