TY  - JOUR
T1  - SArKS: Discovering gene expression regulatory motifs and domains by suffix array kernel smoothing
JF  - bioRxiv
DO  - 10.1101/133934
SP  - 133934
AU  - Dennis C. Wylie
AU  - Hans A. Hofmann
AU  - Boris V. Zemelman
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/05/03/133934.abstract
N2  - Experiments designed to assess differential gene expression represent a rich resource for discovering how DNA regulatory sequences influence transcription. Results derived from such experiments are usually quantified as continuous scores, such as fold changes, test statistics and p-values. We present a de novo motif discovery algorithm, SArKS, which uses a nonparametric kernel smoothing approach to identify promoter motifs correlated with elevated differential expression scores. SArKS has the capability to smooth over both motif sequence similarity and, in a second pass, over spatial proximity of multiple motifs to identify longer regions enriched in correlative motifs. We applied SArKS to simulated data, illustrating how SArKS can be used to find motifs embedded in random background sequences, and to two published RNA-seq expression data sets, one probing S. cerevisiae transcriptional response to anti-fungal agents and the other comparing gene expression profiles among cortical neuron subtypes in M. musculus. For both RNA-seq sets we successfully identified motifs whose kernel-smoothed scores were significantly elevated compared to the permutation-estimated background distributions. We found strong similarities between these identified motifs and known, biologically meaningful sequence elements which may help to provide additional context for the results previously published regarding these data sets. Finally, because eukaryotic transcription regulation is highly combinatorial, we also outline how SArKS methods might be extended to discover synergistic motifs.
ER  - 