RT Journal Article
SR Electronic
T1 Empowering the annotation and discovery of structured RNAs with scalable and accessible integrative clustering
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 550335
DO 10.1101/550335
A1 Milad Miladi
A1 Eteri Sokhoyan
A1 Torsten Houwaart
A1 Steffen Heyne
A1 Fabrizio Costa
A1 Björn Grüning
A1 Rolf Backofen
YR 2019
UL http://biorxiv.org/content/early/2019/02/26/550335.abstract
AB RNA plays essential regulatory roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 provides an integrative solution by incorporating diverse types of experimental and genomic data in an accessible fashion via the Galaxy framework. We demonstrate that the tasks of clustering and annotation of structured RNAs can be considerably improved through a scalable methodology that also supports structure probing data. Based on this, we further introduce an off-the-shelf procedure to identify locally conserved structure candidates in long RNAs. In this way, we suggest the presence and the sparsity of phylogenetically conserved local structures in some long non-coding RNAs. Furthermore, we demonstrate the advantage of scalable clustering for discovering structured motifs under inherent and experimental biases, and uncover prominent targets of the double-stranded RNA binding protein Roquin-1 that are evolutionarily conserved.