ABSTRACT
Plants have significantly more transcription factor (TF) families than animals and fungi, and plant TF families tend to contain more genes; these expansions are linked to adaptation to environmental stressors (1, 2). Many members of a TF family bind similar or identical sequence motifs, such as the G-box (CACGTG), which makes it difficult to predict which TFs regulate which target genes. We find that the sequences flanking G-boxes contribute to in vitro binding specificity, but that this information is insufficient to predict the transcription patterns of genes near G-boxes. We therefore construct a gene regulatory network that identifies the basic leucine zipper (bZIP) and basic helix-loop-helix (bHLH) TFs most predictive of the expression of genes downstream of perfect G-boxes. This network accurately predicts transcriptional patterns and reconstructs known regulatory subnetworks. Finally, we present Ara-BOX-cis (araboxcis.org), a website that provides interactive visualisations of the G-box regulatory network and serves as a resource for generating hypotheses about gene regulatory relationships.
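As an illustration only, and not the authors' pipeline, the notion of a "perfect G-box" and its flanking sequence can be made concrete with a short scan. The function name `gbox_flanks` and the default flank length of 3 bp are hypothetical choices. Because CACGTG is palindromic (its reverse complement is itself), a single search on one strand finds every perfect G-box; the flanks are reported as they read on the input strand.

```python
import re

# Perfect G-box motif. It is palindromic, so one strand's scan covers both strands.
GBOX = "CACGTG"


def gbox_flanks(seq: str, flank: int = 3) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (start, upstream, downstream) for each perfect G-box.

    Flanks are read on the input strand and truncated at the sequence ends.
    The flank length is an illustrative assumption, not a value from the study.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(GBOX, seq):
        start, end = m.start(), m.end()
        upstream = seq[max(0, start - flank):start]
        downstream = seq[end:end + flank]
        hits.append((start, upstream, downstream))
    return hits


if __name__ == "__main__":
    # One G-box at position 4, flanked by ATT upstream and GCC downstream.
    print(gbox_flanks("aattCACGTGgcc"))
```

Grouping G-boxes by their flanking bases in this way is one simple means of asking whether flanks correlate with binding or expression, under the stated assumptions.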