Automated structure-based learning to model co-operativity and protein-DNA interactions in cis-regulatory modules

O Fornes; A Meseguer; J Aguirre-Plans; P Gohl; P Mireia-Bota; R Molina-Fernández; J Bonet; A Chinchilla; F Pegenaute; O Gallego; N Fernandez-Fuentes; B Oliva

doi:10.1101/2022.04.17.488557

ABSTRACT

Transcription factor (TF) binding is a key component of genomic regulation. Numerous high-throughput experimental methods exist to characterize TF-DNA binding specificities, but applying them is both laborious and expensive, which makes profiling all TFs challenging. For instance, the binding preferences of ∼25% of human TFs remain unknown: they have been neither determined experimentally nor inferred computationally. Here, we introduce a structure-based learning approach to predict the binding preferences of TFs, together with a web server (ModCRE) that automatically models higher-order TF regulatory complexes. Our approach uses high-throughput TF binding data, such as data from protein binding microarrays, to address the scarcity of protein-DNA structures when learning the binding preferences of TFs. We show that, under certain conditions, our approach has an advantage over the state-of-the-art nearest-neighbor method for predicting TF binding sites, and that prediction accuracy improves when using an enrichment selection system based on many neighbors or structural models. Starting from a TF sequence or structure, ModCRE predicts the TF's binding preferences in the form of motifs. The predicted motifs are then used to scan a DNA sequence for occurrences, and the best matches are either profiled with a binding score or collected for subsequent modeling into a higher-order regulatory complex with DNA. Co-operativity is modeled by: i) the co-localization of TFs; and ii) the structural modeling of protein-protein interactions between TFs and with co-factors. As case examples, we apply our approach to automatically model the interferon-β enhanceosome and the pioneer complex of OCT4, SOX2 and SOX11 with a nucleosome, and we compare both models with the experimentally known structures.
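The abstract's scanning step (using predicted motifs to find occurrences in a DNA sequence and ranking the best matches by a binding score) can be illustrated with a minimal sketch. It assumes a conventional log-odds position weight matrix (PWM) built from per-position base counts, a uniform background, and a score threshold of 0. ModCRE derives its motifs and binding scores from structure-based learning, so this generic PWM scan is a swapped-in technique, not ModCRE's own scoring; the names `log_odds_pwm`, `reverse_complement` and `scan` are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into a log2 odds PWM.

    Hypothetical helper: assumes a uniform background probability per base
    and adds a pseudocount so that unseen bases do not give log(0).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def scan(sequence, pwm, threshold=0.0):
    """Score every window on both strands; return (start, strand, score), best first.

    `start` is the 0-based position of the window on the forward strand.
    Windows containing ambiguous bases (e.g. N) are skipped.
    """
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(pwm[i][b] for i, b in enumerate(site))
            if score >= threshold:
                hits.append((start, strand, score))
    # Highest-scoring matches first, as candidates for further modeling.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy GATA-like motif: each column strongly favors one base.
consensus = "GATA"
counts = [{b: (10 if b == c else 0) for b in BASES} for c in consensus]
pwm = log_odds_pwm(counts)
print(scan("TTGATAGG", pwm))  # forward-strand GATA at position 2
print(scan("CCTATCCC", pwm))  # TATC at position 2 is GATA on the reverse strand
```

Scanning both strands matters because a TF binds double-stranded DNA, so a site can appear as the motif or as its reverse complement. In a real pipeline the threshold would be calibrated (for example against a background distribution of scores) rather than fixed at 0.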

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

Changes in this version: 1) a more descriptive title; 2) fewer technical abbreviations.
https://sbi.upf.edu/modcre

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.