Abstract
The identification of disease-associated modules based on protein-protein interaction networks (PPINs) and gene expression data has provided new insights into the mechanistic nature of diverse diseases. A major problem hampering their identification is the detection of protein communities within large-scale, whole-genome PPINs. Current strategies solve the maximal clique enumeration (MCE) problem, i.e., the enumeration of all non-extendable groups of proteins in which every pair of proteins is connected by an edge. However, the MCE problem is NP-hard (non-deterministic polynomial-time hard) and can thus be computationally overwhelming for large-scale, whole-genome PPINs.
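To make the MCE problem concrete, the following minimal Python sketch enumerates all maximal cliques of a small, hypothetical five-protein network using the Bron-Kerbosch algorithm with pivoting, a standard exact MCE method. It is illustrative only: the protein names, `TOY_EDGES`, `build_adjacency` and `maximal_cliques` are hypothetical and are not part of ModuleDiscoverer. Because the number of maximal cliques can grow exponentially with network size, exhaustive enumeration of this kind is what becomes costly on whole-genome PPINs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map (protein -> set of interaction partners)."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj: Graph) -> List[Set[str]]:
    """Enumerate every maximal clique with Bron-Kerbosch and pivoting.

    r: the clique currently being built
    p: candidates adjacent to every vertex in r (they can still extend r)
    x: already-explored vertices that are also adjacent to all of r;
       if x is non-empty when p runs out, r is not maximal.
    """
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        # Pick the pivot with the most neighbours in p; only vertices that are
        # not neighbours of the pivot need their own branch.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy PPIN: a triangle A-B-C plus a chain C-D-E.
TOY_EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

if __name__ == "__main__":
    for clique in maximal_cliques(build_adjacency(TOY_EDGES)):
        print(sorted(clique))
```

The toy network has exactly three maximal cliques, {A, B, C}, {C, D} and {D, E}; the order in which they are printed may vary between runs because Python set iteration order is not fixed.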
We present ModuleDiscoverer, a novel approach for the identification of regulatory modules from PPINs in conjunction with gene expression data. ModuleDiscoverer is a heuristic that approximates the community structure underlying PPINs. Based on a high-confidence PPIN of Rattus norvegicus and publicly available gene expression data, we apply our algorithm to identify the regulatory module of a rat model of diet-induced non-alcoholic steatohepatitis (NASH). We validate the module using single-nucleotide polymorphism data from independent genome-wide association studies. Structural analysis of the module reveals 10 sub-modules, which are associated with distinct biological functions and pathways relevant to the pathological and clinical situation in NASH.
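The abstract describes ModuleDiscoverer only as a heuristic that approximates community structure without solving the full MCE problem; it does not specify the procedure. As a generic illustration of that idea, and not as ModuleDiscoverer's published algorithm, the minimal Python sketch below grows maximal cliques greedily from randomly chosen seed proteins, so that each sampled clique is found in polynomial time instead of by exhaustive enumeration. All names (`grow_clique`, `sample_cliques`, `n_seeds`, `rng_seed`, the toy network) are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map (protein -> set of interaction partners)."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def grow_clique(adj: Graph, seed: str, rng: random.Random) -> FrozenSet[str]:
    """Greedily extend a clique from `seed` until no vertex can be added."""
    clique = {seed}
    # Candidates are the vertices adjacent to every current clique member.
    candidates = set(adj[seed])
    while candidates:
        v = rng.choice(sorted(candidates))  # sorted() keeps runs reproducible
        clique.add(v)
        candidates &= adj[v]  # keep only vertices also adjacent to v
    # No remaining vertex touches all members, so the clique is maximal.
    return frozenset(clique)


def sample_cliques(adj: Graph, n_seeds: int, rng_seed: int = 0) -> Set[FrozenSet[str]]:
    """Grow one clique from each of `n_seeds` randomly drawn seed proteins."""
    rng = random.Random(rng_seed)
    proteins = sorted(adj)
    return {grow_clique(adj, rng.choice(proteins), rng) for _ in range(n_seeds)}


# Hypothetical toy PPIN: a triangle A-B-C plus a chain C-D-E.
TOY_EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

if __name__ == "__main__":
    adj = build_adjacency(TOY_EDGES)
    for clique in sample_cliques(adj, n_seeds=10):
        print(sorted(clique))
```

Every clique returned is maximal, but, unlike exhaustive enumeration, sampling offers no guarantee that all maximal cliques are found; drawing more seeds raises coverage at higher cost. How sampled cliques would be combined into communities is not shown here.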
ModuleDiscoverer is freely available upon request from the corresponding author.