Abstract
Despite their direct impact on nearly all biological processes, gene-gene interaction networks have so far been investigated mainly in terms of pairwise connections. To address this, we explore the gene interaction networks of the yeast Saccharomyces cerevisiae beyond pairwise interactions using structural balance theory (SBT). Specifically, we ask whether essential and nonessential gene interaction networks are structurally balanced. We study triadic interactions in the weighted, signed, undirected gene networks and observe that balanced triads are overrepresented and unbalanced triads are underrepresented in both networks, in line with the strong notion of balance. Moreover, the energy distribution of triads differs significantly from that of the shuffled networks in both the essential and the nonessential network. This difference is greater in the essential network, in terms of both the frequency and the energy of triads. Additionally, triads in the essential gene network are more interconnected through shared links, whereas in the nonessential network they tend to be isolated. Finally, we investigate the contribution of signed walks of all lengths and its impact on the degree of balance. Our findings reveal that, when longer cycles are considered, the nonessential gene network is more balanced than the essential network.
Introduction
Many studies investigate genomic information on the basis of pairwise connections in gene interaction networks [1]. However, the collective behaviors that emerge from these interactions cannot be described by considering pairs of genes alone. In other words, while studying pairwise connections has broadened our view of gene function, higher-order organization is yet to be explored. Specifically, studies show that genes fall into two main groups, essential and nonessential [2]. Functionally, essential genes play a more vital role in biological processes, and locally they form a denser network than nonessential genes. The crucial question is whether there is a structure beyond pairwise interactions in these two networks and, if so, how the underlying structure differs between the essential and nonessential networks. Suppose that genes A, B, and C are connected in a signed interaction network. Is it reasonable to consider the interaction AB detached from its context, that is, the triad ABC? What is the impact of the interactions AC and BC on the interaction between genes A and B? Triadic interactions are known to play a significant role in the construction of real-world networks [3, 4], and structural balance theory (SBT) addresses exactly these interactions. In this work, we apply SBT to the gene interaction networks to answer the following questions: Is there a structure beyond pairwise interactions in the gene interaction networks? Which types of triads, balanced or unbalanced, are over- or underrepresented in these networks compared with the shuffled networks, in terms of both the frequency and the energy distributions? Do the essential and nonessential networks differ in the pattern of connections between triads? When cycles of all lengths are considered, which network is more balanced? And do all genes have an equal impact on the networks' final degree of balance?
These questions are the basis of this study.
SBT was introduced in social psychology by Heider to investigate the structure of tension in networks whose mutual relationships are described in terms of friendship and hostility [5]. The theory was later generalized to graphs by Cartwright and Harary, who considered triads as low-dimensional motifs [6]. One standard application of balance theory is to measure the degree of balance (stability) of a network [7–12]. Conversely, quantifying the degree of unbalance (frustration) in a signed network has been proposed as well [13]. Similarly, the distance to exact balance has been computed for biological networks [14–17]. Moreover, several researchers have studied the dynamics by which an unbalanced network achieves balance through the reduction of unbalanced triads [18–25]. Some studies extend balance theory further by employing methods from Boltzmann-Gibbs statistical physics to unravel the dynamics behind structural balance [4,26,27]. A recent application of balance theory predicts which coefficients of a correlation matrix are likely to change sign in the high-dimensional regime [28]. Overall, there have been two main trends in the SBT literature: 1) studying its analytical aspects theoretically [19,29–35], and 2) applying it empirically to a wide variety of real signed social, economic, ecological, and political networks to clarify their structures [36–43]. Among these applications, it has been pointed out that understanding the structure as a whole, rather than in part, calls for considering not only short-range interactions but also longer-range cycles [44–47]. Accordingly, we analyze the structural balance of gene interaction networks. We study the genetic interaction profile similarity matrices of the yeast Saccharomyces cerevisiae [48, 49], whose genes have been categorized into two main classes, namely, essential and nonessential.
Of the roughly 5500 genes, approximately 1000 are essential because of their vital functional role in biological processes. With the threshold chosen by Costanzo et al. [48], essential genes have higher degrees and are considered hubs in the global network. These genes therefore play a considerable role in the local structure of the network. In addition, essential genes have higher predictive power than nonessential genes [50, 51].
Here, we investigate the weighted, signed, and undirected networks of genetic interactions for essential and nonessential genes of the yeast Saccharomyces cerevisiae. Primarily, we are interested in probing the existence of structure beyond pairwise gene interactions in these networks. To this end, as in our previous study [52], we compare the eigenvalue spectra of the genetic interaction matrices with those of their shuffled versions. The rest of the paper is organized as follows. First, we explore the frequency of triads in the gene networks in terms of the over- and underrepresentation of different types of triads compared with the shuffled networks. Afterward, we assign energy levels to the distinct configurations of triads and present the triads' energy distributions. Then, the energy-energy mixing patterns between triads are analyzed to systematically investigate how triads with different energies are connected in the networks. Additionally, we examine the balance of the gene interaction networks by considering cycles of all lengths. Finally, we propose a list of significant genes that have a high impact on the networks' global degree of balance.
Materials and methods
Data
Saccharomyces cerevisiae is a useful model yeast for studying eukaryotes. One of its outstanding characteristics is that almost all biological processes found in eukaryotes are also present in Saccharomyces cerevisiae [53]. Here, we analyze the gene interaction similarity networks of about 5500 genes. Around 1000 genes are identified as essential, and the rest as nonessential [54,55]. The data were provided by Costanzo and colleagues [48], who published three gene interaction similarity matrices: one for essential genes, one for nonessential genes, and one for their combination in the global network. The data file used is available at http://boonelab.ccbr.utoronto.ca/supplement/costanzo2016/. We worked with data file S3, titled “Genetic interaction profile similarity matrices”. The steps they took to produce these data are as follows:
Based on the growth rate of the colony consisting of two specific mutated genes, the genetic interaction score (epsilon) between them has been obtained.
A genetic interaction profile for each gene is constructed by considering the genetic interaction score between that gene and a set of other genes in the colony.
The similarity between every pair of profiles is obtained by calculating the Pearson correlation coefficient (PCC).
A positive value in the PCC matrix indicates how functionally similar the two genes are, and vice versa. Moreover, zero elements indicate that the two genes are not functionally related. The procedure used to obtain the PCC matrices is illustrated in Fig 1.
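As an illustration only, the profile-similarity step can be sketched in a few lines of Python. The interaction scores below are made-up toy numbers, and `pearson` and `similarity_matrix` are hypothetical helper names; this is not the pipeline used to produce the published matrices.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def similarity_matrix(profiles):
    """Symmetric matrix of pairwise PCCs; the diagonal is left at zero."""
    n = len(profiles)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = S[j][i] = pearson(profiles[i], profiles[j])
    return S

# Toy profiles (rows: genes, columns: interaction scores against a common panel).
profiles = [
    [0.10, -0.20, 0.30, 0.00],
    [0.20, -0.40, 0.60, 0.00],   # proportional to gene 0, so PCC is +1
    [-0.10, 0.20, -0.30, 0.00],  # mirror image of gene 0, so PCC is -1
]
S = similarity_matrix(profiles)
```

Genes with proportional profiles obtain a similarity close to +1 and genes with mirrored profiles a similarity close to −1, which is the signed, weighted quantity the rest of the analysis builds on.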
Network analysis
Before the main analysis, we calculated several features to characterize the networks. The topological and statistical measures analyzed here are the mean degree 〈k〉, the ratio between the mean of squared degrees and the square of the mean degree 〈k²〉/〈k〉², modularity, the assortativity coefficient, the average path length, and the clustering coefficient. The squared mean degree 〈k〉² holds information about the values around the mean degree, whereas the mean squared degree 〈k²〉 includes information about the tail of the degree distribution. Hence, a large ratio 〈k²〉/〈k〉² (equivalently, a small 〈k〉²/〈k²〉) indicates that the tail carries a higher share of the couplings. Modularity measures the strength of the division of a network into modules. Assortativity (positive coefficient) means that high-degree nodes tend to be connected to other high-degree nodes, and low-degree nodes to low-degree ones [56]. Disassortativity (negative coefficient) implies that high-degree nodes tend to link to low-degree ones. The average path length states how many steps, on average, separate two nodes [57]. Finally, the clustering coefficient quantifies the extent to which the nodes of the system tend to remain in their clusters [58].
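To make the degree-based measures concrete, the following minimal Python sketch computes the mean degree and the ratio 〈k²〉/〈k〉² for two toy graphs. The function name `degree_moment_ratio` and both example graphs are illustrative assumptions, not part of the original analysis.

```python
def degree_moment_ratio(adj):
    """Return (<k>, <k^2>/<k>^2) for an undirected graph given as an
    adjacency matrix with zero diagonal; any nonzero entry counts as a link."""
    degrees = [sum(1 for x in row if x != 0) for row in adj]
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k, mean_k2 / mean_k ** 2

# 4-node ring: every node has degree 2, so the degree distribution has no tail.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]

# 4-node star: one hub of degree 3 and three leaves of degree 1.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]

ring_mean, ring_ratio = degree_moment_ratio(ring)
star_mean, star_ratio = degree_moment_ratio(star)
```

The homogeneous ring gives a ratio of exactly one, while the hub in the star pushes the ratio above one, which is the sense in which the ratio reflects the weight of the tail of the degree distribution.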
After analyzing the network features, it is important to examine whether the construction of the network is random. We therefore analyze the existence of structure beyond pairwise interactions in the gene interaction network. A network with no structure beyond pairwise interactions can be regarded as random. In a random network, the distribution of eigenvalues has a semicircular form with a bulk centered around zero [59]. In a nonrandom network, some eigenvalues lie outside the bulk [60]. In addition, there is one large eigenvalue that typically lies far from the bulk of the eigenvalues [61, 62]. This eigenvalue plays a significant role and reflects the global trend of the system.
Structural balance theory
To go beyond the assumption that pairwise interactions are independent, and to examine triads as the shortest motif, we apply structural balance theory (SBT) [29]. To consider local triangles, we focus on groups of three interacting genes in the network. There are four kinds of triads, two balanced and two unbalanced. The idea that “the friend of my friend is also my friend” [+ + +] refers to a strongly balanced triad. The idea that “the enemy of my enemy is my friend” [− − +] refers to a weakly balanced triad. Of the two other signed triadic configurations, [+ + −] is a strongly unbalanced triad and [− − −] is a weakly unbalanced triad. These triads give rise to frustration in the network [44]. In other words, a triangle is balanced if the sign of the product of its links is positive; otherwise, the triangle is unbalanced, or frustrated. An efficient computational method is used to speed up the counting of triads in large signed networks [63]. It is based on the connectivity (G) and adjacency (A) matrices. In the connectivity matrix, G(i, j) = 1 if nodes i and j are connected, and G(i, j) = 0 otherwise. In the adjacency matrix, A(i, j) = 1 denotes a positive interaction and A(i, j) = −1 a negative interaction. The following two equations count the number of balanced triads b and frustrated triads u, respectively:

$$ b = \frac{\mathrm{Tr}(G^3) + \mathrm{Tr}(A^3)}{12} \tag{1} $$

$$ u = \frac{\mathrm{Tr}(G^3) - \mathrm{Tr}(A^3)}{12} \tag{2} $$
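The classification of triads by the sign of the product of their links can be illustrated on a toy network. The sketch below enumerates triads directly and then cross-checks the totals against the standard trace identities Tr(G³) = 6(b + u) and Tr(A³) = 6(b − u). The four-node matrix and the helper names `count_triads` and `trace_cubed` are assumptions made for illustration; brute-force enumeration is only practical for small networks.

```python
from itertools import combinations

def count_triads(A):
    """Count closed triads by their number of positive links.
    counts[3] = [+ + +] and counts[1] = [+ - -] are balanced;
    counts[2] = [+ + -] and counts[0] = [- - -] are unbalanced.
    A is a symmetric matrix with entries in {-1, 0, +1}."""
    counts = {3: 0, 2: 0, 1: 0, 0: 0}
    for i, j, k in combinations(range(len(A)), 3):
        signs = (A[i][j], A[j][k], A[i][k])
        if 0 in signs:
            continue  # the three nodes do not form a closed triangle
        counts[sum(1 for s in signs if s > 0)] += 1
    return counts

def trace_cubed(M):
    """Tr(M^3) computed directly from the definition."""
    n = len(M)
    return sum(M[i][j] * M[j][k] * M[k][i]
               for i in range(n) for j in range(n) for k in range(n))

# Toy signed network on four fully connected nodes.
A = [[0, 1, 1, -1],
     [1, 0, 1, -1],
     [1, 1, 0, 1],
     [-1, -1, 1, 0]]
G = [[abs(x) for x in row] for row in A]  # connectivity matrix

counts = count_triads(A)
b = counts[3] + counts[1]  # balanced triads
u = counts[2] + counts[0]  # unbalanced (frustrated) triads
```

For this toy network the enumeration finds two balanced and two unbalanced triads, and the same totals follow from the traces of G³ and A³, which is what makes the matrix formulation attractive for large networks.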
Following Leskovec et al. [3], we built a null model against which the empirical frequencies of triads are compared. In generating the null model, it is important to keep the exact fraction of positive (negative) signs. Each selected link randomly connects two existing nodes, so the resulting null model has no organization in its structure. We then calculate the fraction of each kind of triad in the shuffled network, p0(Ti). Triad i is overrepresented if its fraction in the original network, p(Ti), is larger than that in the shuffled one; otherwise, it is underrepresented. Next, the surprise is calculated as

$$ s(T_i) = \frac{T_i - E[T_i]}{\sqrt{\Delta\, p_0(T_i)\,\bigl(1 - p_0(T_i)\bigr)}} \tag{3} $$

where Ti is the number of triads of type i, E[Ti] = Δp0(Ti) is the expected number of triads of type i, Δ is the total number of triads, and p0(Ti) is, as before, the fraction of triad i in the shuffled network. To eliminate the effect of network size, s(Ti) is subsequently normalized so that the two networks can be compared.
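A minimal sketch of the comparison with a null model is given below. It assumes one common reading of the procedure, namely that signs are permuted over the existing links so that the fraction of positive and negative signs is preserved; whether links are also rewired is not assumed here. The function names, the toy edge list, and the numbers in the example are hypothetical.

```python
import random
from math import sqrt

def shuffle_signs(edges, seed=0):
    """edges: dict {(i, j): sign}. Return a copy with the same links and the
    same number of + and - signs, with the signs reassigned at random."""
    rng = random.Random(seed)
    signs = list(edges.values())
    rng.shuffle(signs)
    return dict(zip(edges.keys(), signs))

def surprise(t_i, total, p0):
    """Number of standard deviations by which the observed count t_i differs
    from its expectation total * p0 under the null model."""
    expected = total * p0
    return (t_i - expected) / sqrt(total * p0 * (1 - p0))

# Toy signed edge list and its sign-shuffled counterpart.
edges = {(0, 1): 1, (0, 2): 1, (1, 2): -1, (1, 3): -1, (2, 3): 1}
shuffled = shuffle_signs(edges)

# Made-up example: 2800 triads of one type observed among 10000 triads,
# while the null model predicts a fraction of 0.25 for that type.
s = surprise(2800, 10000, 0.25)
```

A positive surprise corresponds to overrepresentation and a negative one to underrepresentation. Because the surprise grows with the square root of the number of triads, raw values from networks of different sizes are not directly comparable.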
A balanced network has been defined as a network in which all triads are positive, that is, in which the product of the links of every triad is positive [8]. However, the probability that a real-world network contains only positive triads is close to zero, so a common approach is to measure the degree of balance of a signed network. To this end, the concept of balance enables us to define an energy landscape for such networks. The energy describes how structurally balanced a network is. The network energy is obtained as the negative sum of the products of the triads' links (SijSjkSki) divided by the total number of triads (Δ) [21,64]:

$$ E = -\frac{1}{\Delta} \sum_{i<j<k} S_{ij} S_{jk} S_{ki} \tag{4} $$

If the network energy E is −1, the network is fully balanced; if it equals +1, the network is fully unbalanced. Consequently, in real-world networks the energy lies between −1 and +1. According to SBT, a network evolves towards the minimum level of tension among its triads [64].
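The energy of individual triads and of a whole network can be illustrated as follows. The sketch assumes a symmetric weight matrix S with zero diagonal in which a zero entry means that no link exists; the function names and the three toy triangles are illustrative assumptions.

```python
from itertools import combinations

def triad_energies(S):
    """Energy -S_ij * S_jk * S_ki of every closed triad in the symmetric
    weight matrix S, keyed by the node triple (i, j, k)."""
    energies = {}
    for i, j, k in combinations(range(len(S)), 3):
        product = S[i][j] * S[j][k] * S[i][k]
        if product != 0:  # all three links exist
            energies[(i, j, k)] = -product
    return energies

def network_energy(S):
    """Mean triad energy; assumes the network contains at least one triad."""
    energies = triad_energies(S)
    return sum(energies.values()) / len(energies)

balanced = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]        # [+ + +]
frustrated = [[0, 1, 1], [1, 0, -1], [1, -1, 0]]    # [+ + -]
weighted = [[0, 0.5, 0.4], [0.5, 0, -0.2], [0.4, -0.2, 0]]

e_balanced = network_energy(balanced)
e_frustrated = network_energy(frustrated)
e_weighted = network_energy(weighted)
```

With unit weights, the balanced triangle has energy −1 and the frustrated one +1. With correlation-like weights, the sign of the energy still indicates whether the triad is balanced, while its magnitude is much smaller, so many triads with small energies are to be expected in a weighted network.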
The energy landscape introduced above considers triads individually and does not describe how they are connected. The energy-energy mixing pattern between triads shows which triads with energy E1 share a common link with triads of energy E2. It thus reveals, in terms of energy, which triangles tend to be adjacent. This pattern shows whether specific types of triangles are packed together and form a kind of module, and whether the triads exhibit a heterogeneous pattern of connections. Moreover, triangles with higher energies prefer to connect to triangles with lower or equal energies.
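One possible way to collect the raw data behind such a mixing pattern is sketched below: every link is mapped to the triads that contain it, and each pair of triads on the same link contributes one energy pair (E1, E2). Two distinct triads can share at most one link, so no pair is counted twice. The function name and the toy matrix are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

def energy_mixing_pairs(S):
    """Return a list of (E1, E2) energy pairs, one for every pair of closed
    triads in the symmetric weight matrix S that share a link."""
    triads_on_edge = defaultdict(list)
    for i, j, k in combinations(range(len(S)), 3):
        product = S[i][j] * S[j][k] * S[i][k]
        if product == 0:
            continue  # not a closed triad
        for edge in ((i, j), (j, k), (i, k)):
            triads_on_edge[edge].append(-product)
    pairs = []
    for energies in triads_on_edge.values():
        pairs.extend(combinations(energies, 2))
    return pairs

# Toy signed network on four fully connected nodes (two balanced and two
# unbalanced triads, every pair of which shares exactly one link).
A = [[0, 1, 1, -1],
     [1, 0, 1, -1],
     [1, 1, 0, 1],
     [-1, -1, 1, 0]]
pairs = energy_mixing_pairs(A)
```

A two-dimensional histogram of the resulting pairs, possibly on a logarithmic scale, then gives one representation of how triads of different energies are connected.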
The walk-based measure of balance and detecting lack of balance
SBT provides useful information about the structural balance of signed networks, but it is biased towards short cycles. Through triads, our analysis captures the frustration on the shortest possible cycle, but it overlooks the unbalance associated with longer-range cycles [33]. Whether a cycle is balanced or unbalanced depends on the product of the signs of its edges. If the sign of the product is negative, that is, if the number of negative links in the cycle is odd, the cycle is unbalanced. If all cycles in a network are positive, the signed network is considered balanced [44–46]. For real data, the probability that all cycles of a network are positive is close to zero. Following Estrada and Benzi [47], we calculate the walk-balance index K over walks of all lengths, assigning larger weights to shorter walks. This measure relates the real-world signed network to its underlying unsigned version:

$$ K = \frac{\mathrm{tr}\, e^{A(\Sigma)}}{\mathrm{tr}\, e^{A(|\Sigma|)}} \tag{5} $$
where A(Σ) and A(|Σ|) are the signed and unsigned adjacency matrices, respectively. An element of A(Σ) is +1 when the corresponding value of the interaction matrix is greater than zero and −1 when it is less than zero. In the unsigned adjacency matrix A(|Σ|), an element is 1 if the corresponding element of the interaction matrix is nonzero. Another index, U, measures the extent of the lack of balance in the network [47]:

$$ U = \frac{1 - K}{1 + K} \tag{6} $$
When a network is highly unbalanced, K ≈ 0, which implies U ≈ 1. Conversely, a balanced network has K = 1 and U = 0. Finally, the participation of each node in the balance of the network can be quantified by the degree of balance of a given node i, Ki [47]:

$$ K_i = \frac{\bigl[e^{A(\Sigma)}\bigr]_{ii}}{\bigl[e^{A(|\Sigma|)}\bigr]_{ii}} \tag{7} $$
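The walk-based indices can be tried out on small examples with the sketch below. It assumes the usual definitions in terms of the matrix exponential, K = tr exp(A(Σ)) / tr exp(A(|Σ|)), U = (1 − K)/(1 + K), and Ki as the ratio of the corresponding diagonal entries. The truncated Taylor series used for the exponential is a toy substitute chosen to keep the example self-contained; for real networks a library routine such as `scipy.linalg.expm` would be the natural choice. All function names are hypothetical.

```python
from math import exp

def mat_mul(X, Y):
    """Product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(M, terms=30):
    """Truncated Taylor series exp(M) = sum_k M^k / k!, so a walk of
    length k is weighted by 1/k!. Adequate for small toy matrices only."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result

def walk_balance(A):
    """Return (K, U, [K_i]) for a signed adjacency matrix A."""
    signed = mat_exp(A)
    unsigned = mat_exp([[abs(x) for x in row] for row in A])
    n = len(A)
    K = sum(signed[i][i] for i in range(n)) / sum(unsigned[i][i] for i in range(n))
    U = (1 - K) / (1 + K)
    K_nodes = [signed[i][i] / unsigned[i][i] for i in range(n)]
    return K, U, K_nodes

balanced = [[0, 1, -1], [1, 0, -1], [-1, -1, 0]]       # [+ - -]
frustrated = [[0, -1, -1], [-1, 0, -1], [-1, -1, 0]]   # [- - -]

K_bal, U_bal, nodes_bal = walk_balance(balanced)
K_fr, U_fr, nodes_fr = walk_balance(frustrated)

# Closed form for the all-negative triangle, from its eigenvalues (-2, 1, 1)
# versus (2, -1, -1) for the unsigned triangle.
K_fr_exact = (exp(-2) + 2 * exp(1)) / (exp(2) + 2 * exp(-1))
```

The balanced triangle gives K = 1 and U = 0 up to rounding, whereas the all-negative triangle gives K below one and a positive U. By symmetry, every node of the frustrated triangle has the same local index Ki.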
Results
First, important features of the essential and nonessential gene networks are compared in Fig 2. Despite differences in the individual measures, the two networks share some similarities. As shown in Fig 2, the mean degree 〈k〉 is higher in the nonessential gene network than in the essential network. In both networks, the ratio between the mean of squared degrees and the square of the mean degree is close to one. This implies that neither high-degree nor medium-degree nodes significantly dominate. In addition, both networks have similar modularity, a measure of a network's tendency to cluster into multiple sets of strongly interacting parts, with a slightly higher value for the essential gene network. Moreover, as listed in Table 1, the assortativity coefficient is negative but very close to zero in both networks; that is, both networks show weak disassortative behavior. However, the magnitude of the assortativity is one order of magnitude higher in the essential network. The radar plot (Fig 2) shows the absolute values of the assortativity coefficients. Another significant feature of the networks is the average path length, which represents the average number of steps along the shortest path between pairs of nodes. Its small value in both networks shows that these networks are densely connected, although it is slightly longer for the nonessential network. Finally, the tendency to form clusters, as measured by the clustering coefficient, is higher in the essential network.
The radar plot for the essential gene network is plotted in blue and the nonessential gene network in yellow.
Next, we investigated the existence of clusters in the essential and nonessential gene networks. Within groups, genes cooperate to carry out a common bioprocess efficiently. Clusters in the essential and nonessential gene networks are illustrated in cluster maps (Fig 3). The essential network has higher modularity, in line with the previous result that the essential network is more densely connected than the nonessential network. In other words, although clusters exist in both networks, the structure in the essential gene network (Fig 3A) is considerably stronger than in the nonessential network (Fig 3B). This is also confirmed by our previous work, in which we observed a significant difference between the eigenvalue distributions of the original matrices and those of the shuffled networks [52]. Specifically, some of the eigenvalues of the original networks are not confined to the narrow bulk of the shuffled matrices' eigenvalues. Thus, it can be concluded that the structure of the gene interaction networks is far from random.
A) Cluster map of essential gene network, B) Cluster map of nonessential gene network.
We then analyze the structural balance of the gene interaction networks to study the structure beyond pairwise interactions. Table 2 lists the size, the percentages of positive and negative links, and the total number of triads in both networks. Eq (1) and Eq (2) are used to count the balanced triads b and unbalanced triads u. To compare the dominance of balanced and unbalanced triads in our networks, we used the method proposed by Leskovec et al. [3]. If the fraction of a triad in the original network is higher than in the shuffled one, the triad is overrepresented, and vice versa. The fraction of triad Ti in the original network is denoted by p(Ti) and that in the shuffled network by p0(Ti). Moreover, they proposed the concept of surprise, s(Ti) in Eq (3), to assess how significant these over- and underrepresentations are. Owing to the size of the networks, s(Ti) is of the order of tens. Balanced triads are overrepresented in both the essential and the nonessential gene interaction network. In contrast, unbalanced triads are underrepresented compared with the shuffled networks. These results are presented in Table 3.
Number of nodes, edges, triads in both essential and nonessential gene networks with threshold Sij < 0.051.
|Ti|, the total number of Ti; p(Ti), the fraction of Ti; p0(Ti), the fraction of Ti in the null model; s(Ti), the amount of surprise, i.e., the number of standard deviations by which the actual number of Ti differs from its expected number under the null model.
After analyzing the frequency of triads, we examined the energy distributions of the different types of triads, with the energy of triads calculated by Eq (4). The energy distributions of strongly balanced triads (Fig 4A), weakly balanced triads (Fig 4B), strongly unbalanced triads (Fig 4C), and weakly unbalanced triads (Fig 4D) are presented for both original networks in comparison with their shuffled versions. The results indicate the following: 1) For all kinds of triads, in both the essential and the nonessential network, many triads have small energies. 2) As shown in Fig 4E, the average energy of every type of triad is larger in the essential gene network than in the nonessential network. 3) In the essential gene network, as in the nonessential network, the average energy of balanced triads is higher than in the shuffled networks, whereas the average energy of unbalanced triads is lower than in the shuffled networks. 4) As shown in Fig 4F, in the essential gene network the relative frequency of the balanced triad with one positive link is individually equal to the relative frequency of the other three triads.
A) Energy distribution for strongly balanced triads, B) Energy distribution for weakly balanced triads, C) Energy distribution for strongly unbalanced triads, D) Energy distribution for weakly unbalanced triads. (The energy distributions of triads for the original essential gene network and its shuffled network are plotted in blue and red, respectively. The energy distributions of triads for the original nonessential gene network and its shuffled network are plotted in yellow and gray, respectively.) E) The average energy for all four kinds of triangles; from left to right, essential gene network and nonessential gene network. F) The relative frequency for all four kinds of triangles; from left to right, essential gene network and nonessential gene network. (Green bars for original networks and purple ones for shuffled networks.)
As a further analysis, we look for triads that share one link in the networks and display the energy-energy mixing pattern between the triangles. Fig 5 shows how many triangles with different energies are connected. For a more detailed view, the result is plotted on a logarithmic scale, which magnifies the differences between elements with small values. Both networks show the same qualitative behavior. However, the plot is sparser for the nonessential gene network than for the essential gene network, that is, fewer pairs of triads share a link, whereas triads in the essential gene network show a stronger preference to participate in modules. This result is notable because the number of triads in the nonessential gene network is much larger than in the essential gene network. Moreover, triads with higher absolute energies tend to share an edge with triads that also have high-magnitude energies.
From left to right, essential gene network and nonessential gene network
We now extend our analysis by considering walks of all possible lengths and measuring the amount of balance or unbalance through these walks. We used the two indices introduced by Estrada and Benzi [47] so as not to limit ourselves to triads as the shortest cycle. The walk-balance index, Eq (5), quantifies how close an unbalanced network is to balance. The second index, Eq (6), represents the extent to which a given signed interaction network lacks balance. Table 4 presents the values of the walk-balance index for both the essential and the nonessential gene network. The result indicates that, when all walks are considered, the nonessential gene network is closer to balance. Correspondingly, the lack of balance in the essential gene network is much larger than in the nonessential gene network. We also shuffled the interaction matrices and calculated these indices again. For each index, there is a clear difference between the results for the original and the shuffled matrices. Furthermore, the degree of balance of a given node is characterized by the index in Eq (7). A table in the supplementary material classifies the highlighted essential and nonessential genes according to their significant role in the balance.
Discussion
We analyzed gene interactions in the weighted, undirected, and signed networks of the yeast Saccharomyces cerevisiae. The pre-processed data set includes two matrices, namely, the essential and nonessential gene interaction networks. We explored these two gene networks beyond pairwise interactions in the context of structural balance theory (SBT) and reached the following conclusions. In both the essential and the nonessential gene network, balanced triads are overrepresented while unbalanced triads are underrepresented. This finding agrees with Heider's balance theory. Specifically, our results empirically support the strong notion of structural balance theory (Table 3). By contrast, in some social networks the weak formulation of structural balance has been reported as well [3].
Additionally, we observed T1 and T0 triads (triads with one and zero positive links, respectively) in both gene networks, with higher average energy and higher relative frequency in the essential network. This can be interpreted from the perspective of SBT, in which the presence of T1 and T0 triads in the organization of a network is related to a higher degree of modularity. In other words, the presence of T1 or T0 in the stable state of a network indicates that densely connected modules are also connected to each other through negative links. This result corresponds to the presence of specialized clusters in the gene interaction network, which is also reflected in the energy-energy mixing pattern between triads with one common link (Fig 5). This pattern is more pronounced in the essential network, as genes in this network are more densely interconnected.
Moreover, although the energies of the essential and nonessential networks are not significantly different from each other, the underlying distributions of triads that lead to these final energies are not similar. As mentioned earlier, the average energy and the relative frequency of unbalanced triads are higher in the essential gene network than in the nonessential network. Thus, the essential network is more likely to experience different possible states. It can therefore be concluded that unbalanced triads provide the essential gene network with the structure needed to sustain the dynamism that is crucial for vital biological mechanisms. For nonessential genes, which have fewer unbalanced triads, the likelihood of being trapped in a local minimum is higher.
Finally, to extend our analysis, we calculated two indices that take walks of all possible lengths into account: one quantifies how close an unbalanced network is to balance, and the other quantifies the extent to which a given signed network lacks balance when longer-range cycles are considered. The results suggest that, when walks of all lengths are taken into account, the nonessential gene network is more balanced and stable than the essential network. In other words, essential genes favor shorter-range connections, while in nonessential genes long-range interactions have a higher impact. As mentioned earlier, the combination of essential and nonessential interactions constructs the global gene network as a whole. For this network, we propose in the supplementary file a list of genes that have an influential role in determining the network's final degree of balance. Our findings thus highlight genes that are structurally notable and for which further biological analysis would be valuable.
Author contributions
Conceptualization: Nastaran Allahyari, Gholam Reza Jafari, Ali Hosseyni.
Formal analysis: Nastaran Allahyari, Gholam Reza Jafari, Ali Hosseyni, Amir Kargaran.
Methodology: Nastaran Allahyari, Gholam Reza Jafari, Ali Hosseyni. Resources: Nastaran Allahyari, Amir Kargaran, Gholam Reza Jafari. Software: Nastaran Allahyari, Amir Kargaran.
Supervision: Gholam Reza Jafari, Ali Hosseyni. Visualization: Nastaran Allahyari.
Writing: Nastaran Allahyari.
Acknowledgments
N.A. would like to express her appreciation to Z. Moradimanesh and S. Salekzamankhani for constructive comments that improved the manuscript.