iRegulon: from a gene list to a gene regulatory network using large motif and track collections

PLoS Comput Biol. 2014 Jul 24;10(7):e1003731. doi: 10.1371/journal.pcbi.1003731. eCollection 2014 Jul.

Abstract

Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and identify an extensive up-regulated network controlled directly by p53. Similarly, we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
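
The ranking-and-recovery idea mentioned in the abstract can be illustrated with a minimal sketch. It is not iRegulon's implementation: the function names, the toy gene rankings, the rank cutoff, and the z-score style normalization are all assumptions made for illustration. Each hypothetical motif ranks all genes; the input gene set is "recovered" along that ranking, and the area under the recovery curve within the top ranks is compared across motifs.

```python
from statistics import mean, pstdev


def recovery_auc(ranking, gene_set, rank_cutoff):
    """Normalized area under the recovery curve within the top ranks.

    Assumes a non-empty gene_set; the normalization constant is a simple
    upper bound chosen for this sketch.
    """
    hits = 0
    area = 0
    for gene in ranking[:rank_cutoff]:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    return area / (rank_cutoff * len(gene_set))


def enrichment_scores(rankings, gene_set, rank_cutoff):
    """Z-score each motif's AUC against the AUCs of all motifs (assumed scoring)."""
    aucs = {m: recovery_auc(r, gene_set, rank_cutoff) for m, r in rankings.items()}
    values = list(aucs.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return {m: 0.0 for m in aucs}
    return {m: (a - mu) / sd for m, a in aucs.items()}


# Toy data: ten hypothetical genes and four hypothetical motif rankings.
genes = [f"g{i}" for i in range(1, 11)]
rankings = {
    "motif_A": genes,        # input genes ranked at the very top
    "motif_B": genes[::-1],  # input genes ranked at the very bottom
    "motif_C": ["g4", "g1", "g5", "g6", "g2", "g7", "g8", "g3", "g9", "g10"],
    "motif_D": ["g5", "g6", "g7", "g1", "g8", "g9", "g10", "g2", "g3", "g4"],
}
gene_set = {"g1", "g2", "g3"}

scores = enrichment_scores(rankings, gene_set, rank_cutoff=5)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

In this toy setup the motif whose ranking places the input genes first receives the highest score. A real analysis would use genome-wide rankings over thousands of motifs and a principled threshold on the score, neither of which is modeled here.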

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms
  • Cell Line, Tumor
  • Chromatin Immunoprecipitation
  • Computational Biology / methods*
  • Databases, Genetic
  • Gene Expression Regulation / genetics*
  • Gene Regulatory Networks / genetics*
  • Genes, p53
  • Humans
  • Models, Genetic
  • Sequence Analysis, RNA
  • Transcription Factors / genetics*

Substances

  • Transcription Factors

Associated data

  • GEO/GSE47043

Grants and funding

This work is funded by FWO (www.fwo.be) (grants G.0704.11N and G.0640.13 to SA), Special Research Fund (BOF) KU Leuven (http://www.kuleuven.be/research/funding/bof/) (grants PF/10/016 and OT/13/103 to SA), HFSP (www.hfsp.org) (grant RGY0070/2011 to SA), and Foundation Against Cancer (http://www.cancer.be) (grants 2010-154 and 2012-F2 to SA). RJ is supported by postdoc fellowships from Belspo, KU Leuven Research Fund (F+) and FWO. AV and LS have PhD fellowships from FWO. BVdS was supported by a 1-year fellowship from the Vlaamse Liga tegen Kanker (VLK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.