Abstract
Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet biological networks, commonly used to define such modules are incomplete and biased toward some well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without assuming the prior knowledge of a biological network. To this end we train a deep auto-encoder on a large transcriptional data-set. Our hypothesis is that such modules could be discovered in the deep representations within the auto-encoder when trained to capture the variance in the input-output map of the transcriptional profiles. Using a three-layer deep auto-encoder we find a statistically significant enrichment of GWAS relevant genes in the third layer, and to a successively lesser degree in the second and first layers respectively. In contrast, we found an opposite gradient where a modular protein-protein interaction signal was strongest in the first layer but then vanishing smoothly deeper in the network. We conclude that a data-driven discovery approach, without assuming a particular biological network, is sufficient to discover groups of disease-related genes.