PT - JOURNAL ARTICLE
AU - Ivan Dotu
AU - Scott Adamson
AU - Benjamin Coleman
AU - Cyril Fournier
AU - Emma Ricart-Altimiras
AU - Eduardo Eyras
AU - Jeffrey H. Chuang
TI - SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data
AID - 10.1101/127878
DP - 2017 Jan 01
TA - bioRxiv
PG - 127878
4099 - http://biorxiv.org/content/early/2017/04/16/127878.short
4100 - http://biorxiv.org/content/early/2017/04/16/127878.full
AB - RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited, and in particular identification of the RNA motifs that bind proteins has long been a difficult problem. To address this challenge, we have developed a novel semi-automatic algorithm, SARNAclust, to computationally identify combined structure/sequence motifs from immunoprecipitation data. SARNAclust is, to our knowledge, the first unsupervised method that can identify RNA motifs at full structural resolution while also being able to simultaneously deconvolve multiple motifs. SARNAclust makes use of a graph kernel to evaluate similarity between sequence/structure objects, and provides the ability to isolate the impact of specific features through the bulge graph formalism. SARNAclust includes a key method for predicting RNA secondary structure at CLIP peaks, RNApeakFold, which we have verified to be effective on synthetic motif data. We applied SARNAclust to 30 ENCODE eCLIP datasets, identifying known motifs and novel predictions. Notably, we predicted a new motif for the protein ILF3 similar to that for the splicing factor hnRNPC, providing evidence for interaction between these two proteins.
To validate our predictions, we performed a directed RNA bind-n-seq assay for two proteins: ILF3 and SLBP, in each case revealing the effectiveness of SARNAclust in predicting RNA sequence and structure elements important to protein binding. Availability: https://github.com/idotu/SARNAclust