Exploring Extreme Signaling Failures in Intracellular Molecular Networks

Developing novel methods for the analysis of intracellular signaling networks is essential for understanding interconnected biological processes that underlie complex human disorders. A fundamental goal of this research is to quantify the vulnerability of a signaling network to the dysfunction of one or multiple molecules, when the dysfunction is defined as an incorrect response to the input signals. In this study, we propose an efficient algorithm to identify the extreme signaling failures that can induce the most detrimental impact on the physiological function of a molecular network. The algorithm basically finds the molecules, or groups of molecules, with the maximum vulnerability, i.e., the highest probability of causing the network failure, when they are dysfunctional. We propose another algorithm that efficiently accounts for signaling feedbacks in this analysis. The algorithms are tested on two experimentally verified ERBB and T cell signaling networks. Surprisingly, results reveal that as the number of concurrently dysfunctional molecules increases, the maximum vulnerability values quickly reach to a plateau following an initial increase. This suggests the specificity of vulnerable molecule (s) involved, as a specific number of faulty molecules cause the most detrimental damage to the function of the network. Increasing a random number of simultaneously faulty molecules does not further deteriorate the function of the network. Such a group of specific molecules whose dysfunction causes the extreme signaling failures can better elucidate the molecular mechanisms underlying the pathogenesis of complex trait disorders, and can offer new insights for the development of novel therapeutics.


INTRODUCTION
Cellular functions are largely regulated by signaling events within the complex intracellular molecular networks [1][2][3]. Signals are typically transmitted from the cell membrane to the nucleus via intracellular signaling networks, to regulate some target molecules and alter different cellular functions. Intracellular signaling networks have been studied to address a variety of different questions [1][2][3][4][5][6][7][8].
An important application of the studies on intracellular signaling networks is in target discovery and drug development. In the area of targeted therapies in pharmaceutical industry, a wide variety of highly effective therapeutics have been successfully developed that can target the function of a selective set of molecules within the complex intracellular signaling networks. Fault diagnosis is a platform technology for finding such selective targets by using computational and systems biology techniques that have been developed and optimized over the past few years [9][10][11][12]. The main purpose of such technology developments is to understand how vulnerable the entire network is to the dysfunction of each molecule, or a specific group of molecules. In the realm of fault diagnosis technology development, the dysfunctional state of a molecule can be defined as the failure to respond correctly to the input signals, which may further propagate into incorrect responses at the output of the network. For this analysis, we have defined the vulnerability level of a molecule as the probability of having incorrect network responses when that specific molecule is dysfunctional. Vulnerability analysis can be performed for the dysfunction of a single molecule, as well as a group of molecules. The importance of the latter can be attributed to the widely known observations that the most common human disorders, such as cancer or schizophrenia, are reported to be associated with the dysfunction of multiple molecules rather than a specific single molecule [13,14]. This is in contrast to some rare genetic disorder when a single molecule or a genetic mutation is sufficient to cause the pathology [15]. By computing the vulnerability level of a molecule or a group of molecules, one can identify and rank the key molecular components for gaining a desired response, and then compute how much they contribute to the failure of the network. From the drug development stand points, the highly vulnerable molecules can be used as the molecular targets of novel therapeutics.
To analyze a molecular network, first a biologically relevant model needs to be adopted. Here we consider the class of discrete models such as Boolean models [1][2][3][4][5][6]8,16]. They are particularly useful as they do not need detailed kinetic information and still provide certain biologically relevant insights and predictions. Compared to continuous differential equations models [7], discrete models do not require the knowledge of many mechanistic details and numerous kinetic parameters, and are more appropriate for our study in this paper. 4 The main goal of this paper is to develop a systematic method to explore the extreme failures of intracellular signaling networks. We define the extreme or the worst possible signaling failure as a pathological phenomenon that results in the highest probability of network failure, where the network failure is defined as the level of departure of the network response from its normal or expected response.
The said pathological phenomenon is characterized to be emerged from the presence of one or more dysfunctional molecules in the network. It is conceivable that different individual dysfunctional molecules can result in different levels of probabilities of the network failure. It is not clear, however, what happens if two or more molecules are concurrently dysfunctional, and if the network failure probability increases with the number of simultaneously dysfunctional molecules or not. It is also of interest to have an efficient algorithm to determine the maximum possible network failure probability, over the large number of all possible groups of dysfunctional molecules. The computational complexity of an exhaustive search approach for a network with K molecules is extremely high, in the order of /2 K K , which is unmanageable as K increases. Here we introduce a computationally efficient algorithm with a much less running time in the order of 3 K , for identifying the worst possible signaling failures, considering multiple dysfunctional molecules. This study is particularly important in the context of complex disorders with unknown molecular sources, where more than one molecule is observed to be involved in the pathology [13,14].
In this study we first present our extreme signaling failure analysis results on the ERBB signaling network of [17] and the T cell signaling network of [2]. Then, we analyze the computational complexity of the proposed algorithm and compare it with the exhaustive search. Afterwards, we provide the details of the vulnerability analysis equations and the proposed extreme signaling failure analysis algorithm in Methods, Sections A and B, respectively. Moreover, we propose another algorithm in Methods, Section C, that determines the number of time points (clock cycles) needed for network analysis and simulation while computing the vulnerability levels, so that we prevent running network simulations longer than what is needed. We present the results of this algorithm, when applied to the ERBB and the T cell signaling networks, in Methods, Sections D and E, respectively, and finally conclude the paper with some concluding remarks.

RESULTS OF THE EXTREME SIGNALING FAILURE ANALYSIS
The extreme signaling failure algorithm was applied to two networks: a small signaling network that regulates the transmembrane tyrosine kinase ERBB [17], a therapeutic target in breast cancer, and a larger T cell signaling network [2] involved in a variety of immune system response. Results reveal the impact of the number of dysfunctional molecules in causing the extreme signaling failure.

A. Extreme Signaling Failure Study of ERBB Network
The ERBB network ( Figure 1) has one input and one output. The input molecule is the Epidermal Growth Factor (EGF), whereas the output molecule is the Retinoblastoma Protein (pRB). This network has been studied in the context of breast cancer and understanding the mechanism of action of few drug molecules [17]. Equations that specify how the activity of each molecule is regulated by its inputs are listed in Table   1. To model a dysfunctional molecule, we assume its activity state is either stuck-at-0, SA0, or stuck-at-1, SA1, each with a probability of 1/2 [9].
Let N be the number of molecules that are simultaneously faulty, i.e., dysfunctional, in the network.
According to the developed vulnerability analysis equations (Methods, Section A) and using the proposed al [17]. The green arrows represent activatory interactions and the red circle-ended edges represent inhibitory interactions. The input and output nodes represent EGF and pRB, respectively.

6
where the network failure is defined as the level of departure of the network response from its normal values. In other words, the network maximum vulnerability for a given N is a parameter that quantifies the extreme possible signaling failure when there are N faulty molecules in the network.
An unexpected observation is that as the number of faulty molecules N increases, the maximum vulnerability values do not increase accordingly ( Figure 2). While we see a maximum vulnerability increase going from single faults to double faults, i.e., 1 and 2 N = respectively, the maximum vulnerability quickly reaches a plateau and no longer increase further afterwards. Another interesting observation is that the smallest N for which we see the highest maximum vulnerability in this network is 2 N = , i.e., double faults.
This means there are some pairs of faulty molecules that cause the most detrimental damages to the signaling network, and increasing the number of faulty molecules can no further deteriorate the function of outputs.
Using the developed vulnerability analysis equations (Methods, Section A) and the proposed extreme signaling failure analysis algorithm (Methods, Section B), the network maximum vulnerability is computed for each N (Figure 4), where N is the number of molecules that are simultaneously faulty. For any N, the network maximum vulnerability is the highest probability of network failure.
Similar to the ERBB network, but this time for all the outputs, we see as the number of faulty molecules N increases, again maximum vulnerability values do not increase accordingly ( Figure 4).
Moreover, for the network outputs ap1, bcat and p70s, while we see a maximum vulnerability increase going from single faults to double faults, 1 and 2, N = respectively, the maximum vulnerability does not increase more afterwards. For these outputs, the smallest N for which we see the highest maximum Figure 3. The experimentally-verified T cell signaling network reproduced from the study of Saez-Rodriguez et al [2]. The green arrows are activatory interactions and the red circle-ended edges are inhibitory interactions. The input nodes represent cd28, cd4, and tcrlig, whereas the output nodes stand for shp2, bclxl, p70s, ap1, sre, bcat, cyc1, p21c, p27k, fkhr, p38, cre, nfat and nfkb (see the main text for the acronyms and abbreviations).
vulnerability in this network is 2, N = i.e., double faults. This means that there are some pairs of faulty molecules that cause the most detrimental damage for these outputs, and increasing the number of faulty molecules does not further deteriorate the function of these outputs. However, for some other network outputs, including cre, nfat, p38, shp2 and sre, the results are different. For these outputs, the highest maximum vulnerability occurs when 1 N = , which implies that there are some single faulty molecules that cause the can individually create the extreme signaling failures at these specific outputs ( Figure 4). In what follows, we show that the proposed extreme signaling failure algorithm is much less complex than the exhaustive search.
To determine and compare the computational complexities, let K be the number of molecules in a network, and N be the number of molecules that are simultaneously faulty in the network. The total number of groups of N faulty molecules out of K molecules, 1, NK  is given by the following equation, where ( , ) C K N represents the number of possible combinations Here the O-notation stands for the asymptotic upper bound [20], with K being large. The term ( ) N OK represents the computational complexity of ( , ) C K N as a function of K and N.
The computational complexity of the exhaustive search () K  is the overall number of all possible groups of N faulty molecules for which vulnerabilities have to be computed, 1, , , To simplify the notation and without loss of generality, assume K is even.
We note that ( , ) C K N has a maximum at With ( , / 2) C K K being the dominant term in Equation (2) when K is large, and also using Equation (1), the exhaustive search computational complexity can be finally written as To determine the computational complexity of the proposed extreme signaling failure algorithm (Methods, Section B), we note that initially all groups of one, two and three faulty molecules are considered, And for the rest, 4,..., , NK = only ( 1) KN −− vulnerabilities are computed. This is inspired by the observation [11] that typically a molecule with a high vulnerability appears in larger groups of molecules with high vulnerabilities, and based our experiments, 4 N  is large enough and provides good accuracy, as discussed in the paragraph immediately after Equation (5). Therefore, the computational complexity () K  of the algorithm can be written as It can be verified that ( ,3) CK in the above expression is the dominant term, when K is large. This simplifies the proposed algorithm computational complexity to Upon comparing Equations (3) and (5), we note that since ( ) ( )

A. Equations for Computing Vulnerability Levels
Computing the vulnerability level of a molecule or a group of molecules in a network can help to identify and to rank the key components of the network, and to discover appropriate therapeutics targets.
Vulnerability of a molecule can be defined as the probability of having incorrect network responses when the molecule is dysfunctional. The dysfunction state of a molecule can be defined as failure to respond correctly to its input signals. In this paper, we consider stuck-at-0 (SA0) and stuck-at-1 (SA1) fault models to model the dysfunction of molecules, which are constantly 0, inactive, or 1, active, regardless of the activity state of the input signals of each molecule.
To compute the vulnerability level of a molecule, one needs to first introduce the sample space  In addition, we define CC number of events as follows: 1 E = the event of having an erroneous network response at the output in the 1 st clock cycle, …, and CC E = the event of having an erroneous network response at the output in the CC th clock cycle. Note that 1 E is the set of those v elements in l S in Equation (7) that have an e as the 1 st entry, 2 E is the set of those v elements in l S that have an e as the 2 nd entry, and so on. We define the vulnerability level of the l-th molecule Since we consider SA0 and SA1 as the fault models, Equation (8) can be expanded as follows While we assume equi-probable SA0 and SA1 faults for each molecule, i.e., in our computations, Equations (8) and (9) can be extended to other fault models and fault probabilities.
Equation (9) where   Similarly, when 2, N = a pair of faulty molecules, there are four events associated with each pair of faulty molecules   II. Next, we use Equation (10)  Then, we repeat Step III for 5,..., NK = , to complete the extreme signaling failure analysis.
Note that this algorithm is not limited to a specific molecular network. Furthermore, in addition to the vulnerability parameter used in this paper, other parameters that quantify and rank the importance of a molecule or a group of molecules can be used in the algorithm as well.

C. Proposed Algorithm for Determining the Number of Required Clock Cycles to Compute the Vulnerabilities
Modeling and analysis of molecular networks become more challenging if there are positive or negative feedback paths. Due to the feedback mechanisms, network responses may change over time because of some internal compensatory or regulatory mechanisms [18,19]. Feedbacks can cause delays in propagation of signals to the network outputs, while passing through the feedback paths. Therefore, analysis of the effects of feedback in computing vulnerability levels of the network molecules is of interest. More precisely, in this paper we are interested in determining how many clock cycles are needed to compute the vulnerability level of a molecule or a group of molecules, when there are feedbacks in the network. For this purpose, in this section we propose an algorithm that computes an upper bound on the number of clock cycles needed to generate the network response, to calculate its molecular vulnerabilities. Using this algorithm, one can specify how many times the network needs to be simulated, for a normal or abnormal signal to complete its propagation to the network output. This is needed in the proposed main algorithm for the extreme signaling failure analysis (Methods, Section B), to minimize the overall simulation time.
The feedback paths in a network can be modeled by unit-delay memory elements called flip-flops [9]. In a network, if there is only one feedback path, then we intuitively need at most two clock cycles to then the maximum clock cycles needed is bounded above by 1, L + i.e., the number of clock cycles needed for an error to show its full effect is less than or equal to 1. L + Therefore, to determine the maximum number of required clock cycles, it is sufficient to examine the connections between the network feedback paths and determine how many of them are serially connected in a single pathway. To do this, we define a graph theory topological metric called closeness (CL). The ( ) 0 , 1 CL X Y  parameter for quantifying the closeness of two molecules X and Y in a network is the inverse of the distance ( ) , d X Y between the two molecules, defined as the length of the shortest path between the two molecules in the network graph Some examples of how to calculate the closeness parameter are presented in Figure 5.
To determine the maximum number of clock cycles needed for computing the vulnerability levels in a network, we propose the following algorithm with four steps I. Identify the feedback paths and the nodes that initiate the feedbacks, in the network. If they are not known a priori, identify them by finding the loops in the network graph, using a graph algorithm such as the depth-first search algorithm [20].

IV.
Repeat Step III until all the feedback initiating nodes are examined.
Using the information obtained from executing the above steps, the algorithm finds the pathway that contains the largest number of feedback initiating nodes on it, in series. With this specific number being called L, then at most L + 1 clock cycles are needed for computing the vulnerability levels.
Toy examples: Here we compute vulnerability levels in two toy networks ( Figure 6A and Figure   6C) that have different number of feedback paths, to describe the relation between F and vulnerabilities.
Note that since each network has a single pathway, we have LF = in both networks.
The first toy network ( Figure 6A) has four nodes: 1 () xt is the input node (molecule), the where "  " is used for the AND operation and "~" is used for the NOT operation, and 3 (0) 0. x = Herein, the node 3 x initiates a negative feedback to the node 2 x . Since there is only one feedback path in the network, 1, F = at most two clock cycles are enough, 1 2, F += to observe the full error effect at the output.
To demonstrate this, we compute the vulnerability level of the node 2 x ( Figure 6B), for different number of clock cycles CC 1,2,3,4, = using Equation (9) (Methods, Section A). We observe that the vulnerability of 2 x with 1 clock cycle is 0.5 and it becomes 0.75 with 2 clock cycles, and then remains at 0.75 with 3 and 4 clock cycles. These indicate that the full vulnerability of 2 x is 0.75 that is determined by analyzing the network having feedback for two clock cycles ( 1 2) F += . In other words, two clock cycles are needed for the erroneous 2 x signal values to show their full effects at the network output 4 x . Additionally, more clock cycles are not needed.
The second toy network ( Figure 6C) has five nodes: 1 () xt is the input node,

the intermediate nodes, and 5 () xt
is the output node, where " + " is used for the OR operation, and 34 (0) (0) 0. xx == Here the nodes 3 x and 4 x initiate a negative feedback and a positive feedback to the node 2 x , respectively. Since there are two feedback paths in the network, 2, F = at most three clock cycles are needed, 1 3, F += to observe the full error effect of 2 x at the output. The computed vulnerability level of 2 x ( Figure 6D) for different number of clock cycles corroborates what we stated earlier in this section, that is, 1 F + is indeed an upper bound and less number of clock cycles may be needed for computing vulnerabilities in a network with feedbacks. In fact, we observe that the full vulnerability of 2 x is 0.75, obtained using 2 clock cycles only, and analyzing the network for the upper bound of 13 F += clock cycles is not needed ( Figure 6D). In other words, 2 clock cycles are enough for errors originated from 2 x to show their complete effects at the output 5 x .

D. ERBB Signaling Network -Vulnerability and the Number of Clock Cycles
In this section, we apply the proposed algorithm in Methods Section C, to the ERBB signaling network x . Note: The dashed lines ending in an arrowhead or a blunt line represent positive and negative feedbacks, respectively. existence of five loops, there are only four feedback initiating nodes, AKT1, ER-, p21 and p27, because p27 is common between two loops, i.e., the loops (iii) and (iv). Note that in the absence of prior information about the feedbacks, the feedback initiators may not be uniquely determined within the identified loops.
For example, based on these five loops, one can identify IGF1R, MEK1, CDK4 and CDK2 as feedback initiators as well. Nevertheless, different choices for the feedback initiating molecules within these loops do not affect the algorithm that aims at finding the pathway that contains the largest number of feedback initiating nodes on it, in series. This is because if the feedback initiating nodes i f and ij CL f f  Thus, the algorithm to determine L is independent of the choice of the feedback initiating nodes.
Using the identified feedback initiators, the algorithm outputs the upper bound of 15 L += clock cycles that may be needed for computing the vulnerability level of a molecule. This is because the algorithm identifies a pathway that contains all the feedback initiating molecules on it, in series. For instance, AKT1 CDK2 → pRB is a pathway that contains all of the feedback initiating molecules on it. Through this specific pathway, erroneous signals originated from an upstream molecule of AKT1 may require five clock cycles to show their full effects on the output molecule pRB, due to the signal propagation delays introduced by the feedback paths connected in series on the same pathway. Computed using Equation (9) (Methods, Section A), Figure 7 presents the single-fault vulnerability levels versus the number of clock cycles CC for some molecules in the ERBB signaling network. We observe that the Figure 7. Vulnerability versus the number of clock cycles CC for some molecules in the ERBB signaling network.
vulnerability levels of the molecules can be computed in less than five clock cycles, which confirms that it is sufficient to simulate and analyze the network for at most five clock cycles, as specified by the algorithm.

E. T Cell Signaling Network -Vulnerability and the Number of Clock Cycles
In this section, we apply the proposed algorithm in Methods, Section C to the T cell signaling network ( Figure 3). In the network, first we identify four feedback initiating nodes that are shp1, ccblp1, pag, and gab2, using the time indices in Table 2 and also by visual inspection of Figure 3. After using the proposed algorithm, we obtain the upper bound of 15 L += clock cycles, because there is one pathway that contains all the four feedback initiating nodes in series. Next, we compute the single-fault vulnerability levels of some molecules versus the number of clock cycles CC ( Figure 8A), with cre considered as the output molecule, and using Equation (9) (Methods, Section A). We observe that the vulnerability levels of the molecules can be computed in less than five clock cycles, which confirms that it is sufficient to simulate and analyze the network for at most five clock cycles, as determined by the algorithm.  Overall, the proposed algorithms have the potential to uncover certain aspects of abnormal signaling network behaviors that can contribute to the development of the pathology, and may suggest some new therapeutic strategies in the area of targeted therapy in pharmaceutical industry. This study is particularly important in the context of complex trait disorders with poorly understood molecular sources when more than one molecule is often reported to be involved in the pathogenesis of the disease.

Key Points
• The main goal of this research is to quantify the vulnerability of a signaling network to the dysfunction of one or multiple molecules, when the dysfunction is defined as an incorrect response to the input signals.
• An efficient algorithm to identify the extreme signaling failures, defined as a pathological phenomenon that results in the highest probability of network failure, is developed and applied to the two experimentally verified ERBB and T cell signaling networks.
• The results reveal that as the number of concurrently dysfunctional molecules increases, the maximum vulnerability values quickly reach to a plateau following an initial increase, which suggests the specificity of vulnerable molecule (s) involved. A specific number of faulty molecules cause the most detrimental damage to the function of the network. Increasing a random number of simultaneously faulty molecules does not further deteriorate the function of the network.
• Such specific group of molecules whose dysfunction causes the extreme signaling failures of the intracellular molecular networks can elucidate the molecular mechanisms underlying the pathogenesis of complex trait disorders, and can offer new insights for the development of novel therapeutics. Table 1. The Boolean equations for the ERBB signaling network (Figure 1) provided in [17]. In the equations, "x" is used for the AND operation, "+" is used for the OR operation, and "~" is used for the NOT operation.  Table 2. The Boolean equations for the T cell signaling network (Figure 3), provided in [2]. In the equations, "x" is used for the AND operation. "+" is used for the OR operation, and "~" is used for the NOT operation.