PT - JOURNAL ARTICLE
AU - Lionel Morgado
AU - Ritsert C. Jansen
AU - Frank Johannes
TI - Learning sequence patterns of AGO-sRNA affinity from high-throughput sequencing libraries to improve in silico functional small RNA detection and classification in plants
AID - 10.1101/173575
DP - 2017 Jan 01
TA - bioRxiv
PG - 173575
4099 - http://biorxiv.org/content/early/2017/09/02/173575.short
4100 - http://biorxiv.org/content/early/2017/09/02/173575.full
AB - The loading of small RNA (sRNA) into Argonaute (AGO) complexes is a crucial step in all regulatory pathways identified so far in plants that depend on such non-coding sequences. Important transcriptional and post-transcriptional silencing mechanisms can be activated depending on the specific AGO protein to which an sRNA binds. It is known that sRNA-AGO associations are at least partly encoded in the sRNA primary structure, but the sequence features that drive this association have not been fully explored. Here we train support vector machines (SVMs) on sRNA sequencing data obtained from AGO-immunoprecipitation experiments to identify features that determine sRNA affinity to specific AGOs. Our SVMs reveal that AGO affinity is strongly determined by complex k-mers in the 5’ and 3’ ends of sRNA, in addition to well-known features such as sRNA length and the base composition of the first nucleotide. Moreover, we find that these k-mers tend to overlap known transcription factor (TF) binding motifs, thus highlighting a close interplay between TF- and sRNA-mediated transcriptional regulation. We embedded the learned SVMs in a computational pipeline that can be used for de novo functional classification of sRNA sequences. This tool, called SAILS, is provided as a web portal accessible at http://sails.eu.nu.
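
The feature types named in the abstract (sRNA length, identity of the first nucleotide, and k-mers near the 5’ and 3’ ends) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `srna_features`, the k-mer size `k=3`, and the end-window size `window=8` are assumptions chosen only for illustration, and the abstract does not state the values used in the study.

```python
def srna_features(seq, k=3, window=8):
    """Build a sparse feature dict for one sRNA sequence.

    Hypothetical illustration only: k and window are assumed values,
    not parameters reported in the abstract.
    """
    # Normalise to upper-case RNA alphabet (sequencing reads often use T).
    seq = seq.upper().replace("T", "U")

    # Well-known features: read length and identity of the first base.
    feats = {"length": len(seq)}
    for base in "ACGU":
        feats["first_" + base] = 1 if seq[:1] == base else 0

    # Count k-mers separately inside the 5' and 3' end windows,
    # so the same k-mer at different ends yields different features.
    regions = (("5p", seq[:window]), ("3p", seq[-window:]))
    for name, region in regions:
        for i in range(len(region) - k + 1):
            key = name + "_" + region[i:i + k]
            feats[key] = feats.get(key, 0) + 1
    return feats


if __name__ == "__main__":
    # Example 20-nt sequence, used here purely as sample input.
    for key, value in sorted(srna_features("TGACAGAAGAGAGTGAGCAC").items()):
        print(key, value)
```

Feature dictionaries of this kind could then be vectorised and passed to any standard SVM implementation, with one classifier per AGO protein; that training step is omitted here because the abstract does not specify the kernel or the training setup.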