Abstract
Complex gene expression patterns are mediated by binding of transcription factors (TF) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF’s DNA binding interaction partners, motivating genomic context based models of TF occupancy. However, the approaches thus far have assumed a uniform binding model to explain genome wide bound sites for a TF in a cell-type and as such heterogeneity of TF occupancy models, and the extent to which binding rules underlying a TF’s occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble based approach (TRISECT) to identify heterogeneous binding rules of cell-type specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-Seq data in 4-12 cell-types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy (93%) substantially improving upon previous methods. Importantly, many of the binding rules derived from individual cell-types are shared across cell-types and reveal distinct yet functionally coherent putative target genes in different cell-types. Closer inspection of the predicted cell-type-specific interaction partners provides insights into context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising interaction partners within a cell-type, many of which nevertheless transcend cell-types. Notably, the putative targets of shared binding rules in different cell-types, while distinct, exhibit significant functional coherence.