IMCC: Quantitative Analysis of the Inter-module Connectivity for Bio-network

Inter-module connectivity, which tends to link different communities and maintain the architectural integrity of a network, contributes to functional coordination and information flow between modules under perturbation. Detecting the strength of inter-module connections is therefore essential for characterizing how biological systems vary in response to perturbation, and a quantitative evaluation method for inter-module connections is needed. Here, based on high-throughput microarray data from mice, we developed an evaluation approach for inter-module connectivity, named IMCC. The IMCC model integrates direct and indirect inter-module connections, excludes inter-module connections that lack statistical significance or fall below a cutoff value, and provides a more comprehensive landscape of inter-module relationships. We showed that IMCC reflects functional coordination between modules more precisely, according to the KEGG database, and validated it with a topological parameter. Applying IMCC to genome-scale stroke networks revealed a characteristic pathological "core-periphery" structure in the modular map and a functionally coordinated module pair.

Author summary
Inter-module connectivity, which tends to link different communities and maintain the architectural integrity of a network, contributes to functional coordination and information flow between modules under perturbation. Moreover, because modular rearrangements provide more efficient routes to phenotypic change, inter-module connections have been regarded as “evolutionary interaction switches”. Such rewiring of the modular map can serve as a network biomarker that characterizes the dynamics of drug responses. Detecting the strength of inter-module connections is essential for characterizing how biological systems vary in response to disease or drugs. We aim to construct a quantitative evaluation method for inter-module connections. Thus, this study has implications for the systematic exploration of detailed variations in the inter-module pharmacological modes of action of drugs.

The procedure for computing the IMCC model. Based on the genetic interactome and the modules derived from microarray data of MCAO mice, the parameters for direct and indirect inter-module connections were calculated, screened and integrated. The IMCC model was then optimized against the KEGG database. The box at the bottom of the diagram shows the modular map based on IMCC, in which vertices denote modules and the thickness of each edge linking a pair of modules is proportional to the corresponding IMCC. In this modular map, the modules circled in red represent the "core" module pair (module-blue and module-brown).
In formula (2), x is the number of observed inter-module connections; k and n are the numbers of inter-module connections and of all possible edges between the two modules, respectively; and M and N are the total numbers of inter-module connections and of all combinatorial gene pairs between any two modules in the module-to-module network, respectively. We set the P-value threshold at 0.05: an inter-module connection is considered present if P ≤ 0.05, in which case its SW is defined as a valid direct measurement.
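The direct-connection screen can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes SW for a module pair is the sum of the weights of edges with one end in each module, that n is the product of the two module sizes, and that formula (2) takes the standard upper-tail hypergeometric form P = Σ_{x=k}^{min(n,M)} C(M, x)·C(N − M, n − x) / C(N, n), with k the observed count. All function names are hypothetical.

```python
from math import comb

def module_pair_counts(edges, mod_x, mod_y):
    """Return (SW, k, n) for one module pair (assumed definitions).

    edges: iterable of (gene_a, gene_b, weight) from an undirected network.
    SW: sum of weights of edges with one end in each module.
    k:  number of such edges; n: number of possible gene pairs between the modules.
    """
    sw, k = 0.0, 0
    for a, b, w in edges:
        if (a in mod_x and b in mod_y) or (a in mod_y and b in mod_x):
            sw += w
            k += 1
    n = len(mod_x) * len(mod_y)
    return sw, k, n

def hypergeom_upper_tail(k, n, M, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes M, draws n)."""
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("require 0 <= M <= N and 0 <= n <= N")
    hits = range(max(k, 0), min(n, M) + 1)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(M, x) * comb(N - M, n - x) for x in hits) / comb(N, n)

def sw_is_valid(k, n, M, N, alpha=0.05):
    """Keep SW as a valid direct measurement only if P <= alpha (0.05 in the text)."""
    return hypergeom_upper_tail(k, n, M, N) <= alpha
```

For each module pair, `module_pair_counts` supplies SW, k and n, while M and N are taken from the whole module-to-module network; `sw_is_valid` then retains SW only when P ≤ 0.05.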
In a network, a path consists of a sequence of vertices and the links between them [58]. To simplify the problem, we restricted path length and considered only paths of three nodes, an outset (o), a mediation node (m) and an end (e), joined by two links.
The path strength (PS) of a path is defined as the product of the weighted probabilities that the mediation node chooses the outset and the end. The weighted probability from m to o is the ratio of the weight between m and o (W_{m,o}) to the sum of the weights between m and its first neighbors (W_m); the probability from m to e is defined in the same way, so that PS = (W_{m,o} / W_m) × (W_{m,e} / W_m).
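A minimal Python sketch of the path-strength definition above follows. The per-path formula mirrors the text; how PS is aggregated over all o-m-e paths of a module pair is not specified in this section, so the hypothetical helper `module_pair_ps` simply sums PS over paths whose mediation node lies outside both modules, which is an assumption.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected weighted adjacency: adj[u][v] = weight."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj

def path_strength(adj, o, m, e):
    """PS of the path o - m - e: (W_mo / W_m) * (W_me / W_m)."""
    w_m = sum(adj[m].values())   # W_m: total weight between m and its first neighbors
    p_mo = adj[m][o] / w_m       # weighted probability that m chooses o
    p_me = adj[m][e] / w_m       # weighted probability that m chooses e
    return p_mo * p_me

def module_pair_ps(adj, mod_x, mod_y):
    """Hypothetical aggregate: sum PS over all o in mod_x and e in mod_y,
    with the mediation node m outside both modules (an assumption).
    Returns the summed PS and the set of mediation nodes used."""
    total, mediators = 0.0, set()
    for m in list(adj):
        if m in mod_x or m in mod_y:
            continue
        outsets = [o for o in adj[m] if o in mod_x]
        ends = [e for e in adj[m] if e in mod_y]
        for o in outsets:
            for e in ends:
                total += path_strength(adj, o, m, e)
        if outsets and ends:
            mediators.add(m)
    return total, mediators
```

The returned set of mediation nodes is one possible source for the observed node count in the node-based hypergeometric screen of PS, again an assumption rather than the documented procedure.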
The hypergeometric distribution was also used to screen for statistically significant PS. Unlike the SW test, however, the quantities in formula (2) are counted in nodes: x is the observed number of nodes connecting a pair of modules; k and n are the numbers of nodes connecting the pair of modules and of all possible nodes connecting the two modules, respectively; and M and N are the total numbers of nodes connecting any pair of modules and of all possible nodes between any two modules in the module-to-module network, respectively. The P-value threshold was again set at 0.05.
We also employed the consistency score (CT) to measure inter-module connectivity, as described in [57]. Here G is the set of all genes in the network and C is the number of genes in G; CL_i is the total number of links of gene i, and W_i is the weight of gene i in the network. S and T are the numbers of genes in modules Mx and My, respectively, and C_{Mx,i} and C_{My,i} are the observed numbers of links connecting gene i to modules Mx and My, respectively. Formula (4) compares the weights of genes linked to both modules of a pair with the weights of genes linked to only one of them [57]. Because CT is obtained by comparison with a theoretical value, we applied a cutoff of 10 to retain valid CT values.
Supplementary Tables 1 and 2 list the parameters and screening procedures.

Measurement integration
To obtain a more accurate and objective measure of the relationship between modules, we merged direct (DIMC) and indirect (IIMC) inter-module connections.
First, for inter-module connections with a hypergeometric P-value of 0.05 or less, we used SW as the weight of the DIMC. IMCC1 integrates SW and CT, which have different dimensions, so we normalized both measurements to the range [0, 1]. The two normalized parameters were then weighted with coefficients α and β for SW and CT, respectively.
SW and PS have the same dimension, so in the second integration model (IMCC2) we simply added them without weighting: IMCC2 = SW + PS.
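The two integration schemes can be sketched in Python as follows. The normalization formula is not reproduced in this section, so min-max scaling to [0, 1] is an assumption, as are treating ρ as the ratio α/β and assigning 0 to a pair that is valid under only one measure. Scores are dicts keyed by module pair, and the helper names are hypothetical.

```python
def min_max(values):
    """Min-max scale a dict {pair: value} to [0, 1] (assumed normalization).
    If all values are equal, every pair is mapped to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {p: (v - lo) / span if span else 0.0 for p, v in values.items()}

def imcc1(sw, ct, alpha=1.0, beta=1.0):
    """Weighted sum of normalized SW and CT; rho = alpha / beta (assumed).
    A pair valid under only one measure gets 0 for the other (assumption)."""
    sw_n, ct_n = min_max(sw), min_max(ct)
    pairs = set(sw) | set(ct)
    return {p: alpha * sw_n.get(p, 0.0) + beta * ct_n.get(p, 0.0) for p in pairs}

def imcc2(sw, ps):
    """Unweighted sum SW + PS over the union of valid pairs."""
    pairs = set(sw) | set(ps)
    return {p: sw.get(p, 0.0) + ps.get(p, 0.0) for p in pairs}
```

Each module pair needs one consistent key, for example `frozenset({"blue", "brown"})`, so that (A, B) and (B, A) are merged rather than counted twice.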

Data sources
We used the genetic interactome and the modules constructed with WGCNA from microarray datasets of MCAO (middle cerebral artery occlusion) mice to illustrate the performance of the IMCC method in calculating inter-module connections.

Data analysis and weighting coefficient optimization
Because different types of networks have different characteristics, whether IMCC should emphasize DIMC or IIMC depends on the practical application. In biological networks, communication between certain modules is commonly mediated by components with important functions; for example, a gene may be a target competitively regulated by two modules.
All summarized or predicted inter-module correlations are presumed to contribute to biological functions, so biological data must be introduced to determine the best weighting coefficient and thereby select the optimal IMCC. We therefore employed the KEGG (Kyoto Encyclopedia of Genes and Genomes) database for this purpose (Supplementary Table 3). By fitting one linear and six nonlinear curve-estimation models, we compared the coefficients of determination (R²) and quantitatively identified the best ρ value and the best-fitting model (Figure 2C). The optimal ρ value was 1/1 and the best-fitting model was logarithmic, with R² = 0.616; setting ρ = 1/1 simplifies the final IMCC formula. We also plotted IMCC2 against JS and compared the R² of the fitted curves of IMCC1 and IMCC2 to select the optimal integration method.
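The curve-estimation step can be illustrated with ordinary least squares on a transformed predictor. This Python sketch covers only the models that become linear after transforming x (linear, logarithmic and inverse), a subset of the seven models mentioned above; it assumes x is the integrated score and y is JS, and the helper names are hypothetical.

```python
import math

def r_squared_fit(xs, ys):
    """Ordinary least squares y = a*x + b; returns (a, b, R^2).
    Requires at least two distinct x values and non-constant y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

# Curve-estimation models that are linear after transforming x (a subset only).
TRANSFORMS = {
    "linear": lambda x: x,           # y = a*x + b
    "logarithmic": math.log,         # y = a*ln(x) + b, needs x > 0
    "inverse": lambda x: 1.0 / x,    # y = a/x + b, needs x != 0
}

def best_model(xs, ys):
    """Fit every model and return (name, a, b, R^2) with the highest R^2."""
    fits = []
    for name, f in TRANSFORMS.items():
        a, b, r2 = r_squared_fit([f(x) for x in xs], ys)
        fits.append((name, a, b, r2))
    return max(fits, key=lambda t: t[3])
```

Repeating the logarithmic fit for each candidate ρ and keeping the ρ with the highest R² mirrors the selection described above; the logarithmic and inverse transforms require strictly positive and nonzero scores, respectively.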

Comparison and verification of inter-module average shortest path
Because previous algorithms for inter-module connectivity were mainly based on the sum of the weights of interactions (SW) between pairs of modules, we compared IMCC with SW by plotting each score against KEGG accuracy (JS, calculated as in formula 8; Figure 2D and E, Supplementary Table 3). We also compared IMCC with the inter-module average shortest path (IMASP), a topological parameter proposed to evaluate the distance between a pair of modules: we plotted IMCC against IMASP and calculated the coefficient of determination of the fitted curve (Figure 2F, Supplementary Table 4).
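IMASP can be sketched with breadth-first search, treating every edge as simply present or absent (0 or 1). This Python sketch assumes IMASP is the mean hop distance over all gene pairs (i, j) with i in one module and j in the other, and that unreachable pairs are skipped; both choices are assumptions, and the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path lengths (hop counts) from source.
    adj: dict mapping each node to an iterable of its neighbors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def imasp(adj, mod_x, mod_y):
    """Average shortest path length over gene pairs (i in mod_x, j in mod_y).
    Unreachable pairs are skipped (assumption); returns None if none are reachable."""
    lengths = []
    for i in mod_x:
        dist = bfs_distances(adj, i)
        lengths.extend(dist[j] for j in mod_y if j in dist)
    return sum(lengths) / len(lengths) if lengths else None
```

A smaller IMASP means the two modules sit closer in the network, so a stronger IMCC would be expected to accompany a shorter IMASP if the two assessments agree.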
In this paper, based on high-throughput microarray data from mice, we integrated two types of inter-module connections, DIMC and IIMC, with statistical methods and developed a novel quantitative algorithm termed IMCC. Using IMCC, we coarse-grained the molecular network, integrating microscopic molecules into mesoscopic modules.

The IMCC model screens out noise
Biological networks constructed from a DNA microarray data set and a mathematical model [60] are considered highly noisy because of the false positives inherent in the data.
Taking our experimental network as an example, the co-expression network contained 374 nodes; the most common shortest-path length between two nodes was 2 (55.93% of node pairs), and the third most common was 1 (13.41%) (Supplementary Table 5). Thus more than 60% of node pairs were connected either directly or through a single intermediate node. Correspondingly, module-module connections in the modular map also showed false positives: every module pair could be connected directly or indirectly, producing a densely interconnected modular map. In our data set there were 575 direct inter-module connections (DIMCs), so each module had on average 24 neighbors, and there were as many as 1128 indirect inter-module connections (IIMCs), meaning that every pair of modules was connected by an IIMC. With so many inter-module connections, the key problem is to remove noise and identify the genuine module connections, so random noise fluctuations in inter-module connections must be screened out. We therefore applied a hypergeometric test (see Methods), which calculates the probability of selecting the specified targets from the whole population, to identify statistically significant values of SW (a DIMC parameter) and PS (an IIMC parameter).
Because the CT value (another IIMC parameter) is derived by comparison with the expectation for the whole network, we set the CT cutoff at 10 (an inter-module connection with CT > 10 was considered valid). The numbers of module-module interactions differed greatly before and after screening. Of the 575 SW values, 412 (71.65%) were screened out as invalid according to the hypergeometric P-value, and 163 (28.35%) valid SW values remained.
Meanwhile, of the 1128 indirect module-module interactions, 249 (22.07%) valid CT values and 358 (31.74%) valid PS values remained after screening by the cutoff value and the hypergeometric test, respectively (Figure 2A). The screening thus excluded a large number of inter-module connections that lacked statistical significance or fell below the cutoff value.
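The counts reported above can be cross-checked with a few lines of arithmetic. The number of modules is not stated in this section; the sketch infers it by assuming that the 1128 IIMCs cover every unordered pair of modules, C(m, 2) = 1128, which gives m = 48 and an average of 2 × 575 / 48 ≈ 24 neighbors per module, consistent with the text.

```python
from math import comb

# Counts reported in the text.
n_dimc, n_iimc = 575, 1128
valid_sw, valid_ct, valid_ps = 163, 249, 358

# Assumption: the 1128 IIMCs are all unordered module pairs, C(m, 2) = 1128.
n_modules = next(m for m in range(2, 200) if comb(m, 2) == n_iimc)  # 48
mean_neighbors = 2 * n_dimc / n_modules  # average degree, about 23.96

def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)

screening = {
    "invalid SW": pct(n_dimc - valid_sw, n_dimc),  # 71.65
    "valid SW": pct(valid_sw, n_dimc),             # 28.35
    "valid CT": pct(valid_ct, n_iimc),             # 22.07
    "valid PS": pct(valid_ps, n_iimc),             # 31.74
}
```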

The IMCC method provides a comprehensive landscape
Integrating the parameters of direct and indirect connections provides a more complete landscape of inter-module relationships. Two functional modules may interact directly or target the same molecule, forming competitive regulation; both patterns are common in biological networks. SW and CT shared 144 inter-module connections, while 19 (11.66% of all valid SW) and 105 (42.17% of all valid CT) connections were specific (non-overlapping) to SW and CT, respectively. SW and PS shared 120 inter-module connections, while 43 (26.38% of all valid SW) and 238 (66.48% of all valid PS) were specific to SW and PS, respectively (Figure 2B). Thus a considerable number of module pairs were linked only by direct or only by indirect connections.
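The overlap analysis amounts to set operations on the valid module pairs of each measure. In this Python sketch, `overlap_summary` is a hypothetical helper and the pair sets are integer placeholders sized to the reported counts; it reproduces the figures above, and by the same arithmetic the union of valid SW and CT covers 268 module pairs and that of valid SW and PS covers 401.

```python
def overlap_summary(pairs_a, pairs_b):
    """Compare two sets of valid module pairs (e.g. valid SW vs valid CT).
    Returns shared and measure-specific counts; specific counts are also
    given as a percentage of each measure's own total."""
    shared = pairs_a & pairs_b
    only_a, only_b = pairs_a - pairs_b, pairs_b - pairs_a
    return {
        "shared": len(shared),
        "only_a": len(only_a),
        "only_b": len(only_b),
        "only_a_pct": round(100 * len(only_a) / len(pairs_a), 2),
        "only_b_pct": round(100 * len(only_b) / len(pairs_b), 2),
        "union": len(pairs_a | pairs_b),
    }

# Placeholder sets sized to the reported counts (identities are arbitrary).
valid_sw = set(range(163))
valid_ct = set(range(19, 19 + 249))  # shares 144 pairs with valid_sw
summary = overlap_summary(valid_sw, valid_ct)
# summary: shared 144, only_a 19 (11.66%), only_b 105 (42.17%), union 268
```

With real data, each pair should be a `frozenset` of two module names so that the same pair from different measures compares equal.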
As a result, our integrated model covers more inter-module connections than direct-only or indirect-only measures and may provide a more comprehensive landscape of inter-module connections. Details of the integration of SW with CT, and of SW with PS, are given in Supplementary Tables 1 and 2.

The IMCC method reflected inter-module functional coordination
When integrating these parameters, we compared two integration models of IMCC (IMCC1 and IMCC2). Because all summarized or predicted inter-module correlations are presumed to contribute to biological functions, we used the KEGG database to select the best-fitting integration model (Supplementary Table 3). Our results (Figure 2C and D) suggested that the optimal ρ value for IMCC1 was 1/1 and that the best-fitting model was logarithmic, with R² = 0.616. In the logarithmic model, R² showed a clear peak across ρ values: it was highest at ρ = 1/1, declined on both sides, and reached its minima at ρ = 1/10 and ρ = 10/1 (Supplementary Figure 1). The integrated parameter IMCC1 is therefore more consistent with the KEGG classification than either single index (SW or CT) and should evaluate the relationship between modules more accurately. Using the same method, we plotted IMCC2 against JS and found no correlation between the two; its fitted R² was much lower than that of IMCC1 at ρ = 1/1 (Figure 2C, Supplementary Figure 2). We therefore chose IMCC1 with ρ = 1/1 as the final model. To some extent, this integrated method, which is based on network topology, reflects the functional coordination of module pairs.

The IMCC method is more precise and validated by a topological parameter
In this comparison, the integrated IMCC score performed better than SW. The best-fitting model for SW was y = 0.1374 ln(x) + 0.9807, with R² = 0.580, lower than the R² of 0.616 for IMCC (Figure 2E). Overall, IMCC performed better on the weighted gene co-expression data, meaning that IMCC-based results were more consistent with biological function than SW-based results. IMCC therefore provides a more precise evaluation of inter-module connections.
Unlike protein-protein interaction networks, gene co-expression networks are weighted networks.
Thus, inter-module connections in such networks should be described not only by dichotomous edges (0 or 1) but also by quantitative weight information. We introduced SW, CT and PS to analyze these inter-module connections quantitatively, taking into account both the weights and the number of edges in the molecular network. We compared IMCC with a dichotomous topological parameter, the inter-module average shortest path (also known as average characteristic path length). The results of the two assessments were generally consistent (Figure 2F