Abstract
Transcription factors (TFs) often function together as cis-regulatory modules (CRMs), which include both master factors and mediator coactivators, to activate enhancers or promoters and regulate target gene transcription. Cell type-specific ChIP-seq profiling of multiple TFs makes it feasible to infer functional CRMs for a particular cell type. However, existing approaches infer CRMs from the co-localization of TF ChIP-seq peaks, and peak callers miss many weak binding events, especially those of mediators, so CRMs are incompletely identified. We developed a ChIP-seq data-based CRM inference approach with Gibbs sampling (ChIP-GSM). In a Bayesian framework, ChIP-GSM iteratively samples the read counts of TFs to assess the joint effect of each potential TF combination. Using the inferred CRMs as novel features, ChIP-GSM employs a logistic regression model to predict active regulatory elements. Validation on FANTOM5 enhancer or promoter regions showed that CRMs predict regulatory region activity better than individual TFs. Finally, by integrating CRMs inferred for K562 cells with gene expression data, we found that CRMs are likely to activate regulatory regions or genes at different time points to mediate distinct cellular functions.
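To make the prediction step concrete, the following is a minimal sketch of logistic regression on binary CRM-membership features. It is not the ChIP-GSM implementation: the CRM columns, the toy regions, the labels, and the function names `fit_logistic` and `predict_proba` are all hypothetical, and plain batch gradient descent is used here only as an assumed, simple fitting procedure.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    # w[0] is the bias; w[1:] are the per-CRM weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: each row is a candidate regulatory region, each
# column indicates whether a (made-up) CRM was inferred to bind there.
X = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
# Hypothetical activity labels (1 = active region); in this toy set the
# label simply follows the first CRM column.
y = [1, 1, 0, 0, 0, 1, 0, 0]

w = fit_logistic(X, y)
p_active = predict_proba(w, [1, 0, 0])
p_inactive = predict_proba(w, [0, 1, 0])
```

In this toy setting, a region bound by the first hypothetical CRM receives a predicted activity probability above 0.5, while a region bound only by the second receives one below 0.5. In practice, the features would come from the inferred CRMs and the labels from annotated active regions.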
Author Summary
Accurately inferring cis-regulatory modules (CRMs) from a large set of TFs is a challenging task because the binding signals of TFs are often weak, noisy, and sensitive to the cellular environment. Nevertheless, investigating TF associations may help explain how the activation mechanisms of enhancers and promoters differ. In this paper, we develop a computational method (ChIP-GSM) to infer CRMs acting on regulatory elements at enhancer and promoter regions. The method is built upon a Bayesian framework with Gibbs sampling and can be used to infer CRMs reliably and, in turn, to predict regulatory elements. We compare ChIP-GSM with existing methods and show that it performs better. Experimental results indicate that CRMs identified by ChIP-GSM are likely to activate regulatory regions at different time points to mediate distinct cellular functions.
Competing Interest Statement
The authors have declared no competing interest.