Abstract
Cells respond to environmental stimuli through transcriptional responses orchestrated by transcription factors (TFs), which interpret cis-regulatory DNA sequences to determine the timing and location of gene expression. Diversification in TFs and cis-regulatory element (CRE) interactions results in unique gene regulatory networks (GRNs) that underpin plant adaptation. A primary challenge is identifying the transcription factor binding motifs (TFBMs) responsible for temporal and condition-specific gene expression in plants. While the Multiple EM for Motif Elicitation (MEME) suite identifies stress-responsive CREs in Arabidopsis, its predictive power for gene expression remains uncertain. Alternatively, the k-mer approach identifies CRE sites and consensus TF motifs, thereby improving gene expression prediction models. In this study, we applied a k-mer pipeline to sequence-to-expression prediction problems across diverse abiotic stresses in both bryophytic and vascular plants, including monocots and dicots. Moreover, we characterized both ungapped and gapped CREs and, by coupling these with GRN analyses, pinpointed key TFs within transcriptional cascades. Lastly, we developed the Predictive Regulatory Element Database for Identifying Cis-regulatory elements and Transcription factors (PREDICT), a web tool for efficient k-mer identification. This advancement will enrich our understanding of the cis-regulatory code that shapes gene regulation in plant adaptation. The PREDICT web tool is available at http://predict.southerngenomics.org/kmers/kmers.php.
Competing Interest Statement
The authors have declared no competing interest.