RT Journal Article
SR Electronic
T1 SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 127878
DO 10.1101/127878
A1 Ivan Dotu
A1 Scott Adamson
A1 Benjamin Coleman
A1 Cyril Fournier
A1 Emma Ricart-Altimiras
A1 Eduardo Eyras
A1 Jeffrey H. Chuang
YR 2017
UL http://biorxiv.org/content/early/2017/04/16/127878.abstract
AB RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited, and in particular identification of the RNA motifs that bind proteins has long been a difficult problem. To address this challenge, we have developed a novel semi-automatic algorithm, SARNAclust, to computationally identify combined structure/sequence motifs from immunoprecipitation data. SARNAclust is, to our knowledge, the first unsupervised method that can identify RNA motifs at full structural resolution while also being able to simultaneously deconvolve multiple motifs. SARNAclust makes use of a graph kernel to evaluate similarity between sequence/structure objects, and provides the ability to isolate the impact of specific features through the bulge graph formalism. SARNAclust includes a key method for predicting RNA secondary structure at CLIP peaks, RNApeakFold, which we have verified to be effective on synthetic motif data. We applied SARNAclust to 30 ENCODE eCLIP datasets, identifying known motifs and novel predictions. Notably, we predicted a new motif for the protein ILF3 similar to that for the splicing factor hnRNPC, providing evidence for interaction between these two proteins.
To validate our predictions, we performed a directed RNA bind-n-seq assay for two proteins: ILF3 and SLBP, in each case revealing the effectiveness of SARNAclust in predicting RNA sequence and structure elements important to protein binding. Availability: https://github.com/idotu/SARNAclust