Principles for Predicting RNA Secondary Structure Design Difficulty

https://doi.org/10.1016/j.jmb.2015.11.013
Under a Creative Commons license
open access

Highlights

  • We outline several secondary structure elements and structural idiosyncrasies that lead to difficult RNA design problems, and we test a principle of least elements.

  • We describe the contributions of symmetry to design difficulty and its interplay with the presence and repetition of difficult secondary structure elements.

  • We introduce the Eterna100 benchmark for evaluating RNA design methods and evaluate six existing algorithms using this benchmark.

  • This is the first paper based on dominant writing contributions, and co-lead authorship, by non-expert citizen scientists recruited through a video game.

Abstract

Designing RNAs that form specific secondary structures is enabling better understanding and control of living systems through RNA-guided silencing, genome editing and protein organization. Little is known, however, about which RNA secondary structures are tractable for downstream sequence design, so inefficient secondary structure choices increase the time and expense of design efforts. Here, we present insights into specific structural features that increase the difficulty of finding sequences that fold into a target RNA secondary structure, summarizing the design efforts of tens of thousands of human participants and three automated algorithms (RNAInverse, INFO-RNA and RNA-SSD) in the Eterna massive open laboratory. Subsequent tests with three independent RNA design algorithms (NUPACK, DSS-Opt and MODENA) confirmed the hypothesized importance of several features in determining design difficulty, including sequence length, mean stem length, symmetry and specific difficult-to-design motifs such as zigzags. Based on these results, we have compiled the Eterna100 benchmark of 100 secondary structure design challenges that span a large range of design difficulty to help test future efforts. Our in silico results suggest new routes for improving computational RNA design methods and for extending these insights to assess the “designability” of single RNA structures, as well as of switches for in vitro and in vivo applications.
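
Two of the features named above, sequence length and mean stem length, can be read directly off a target structure. The sketch below is illustrative only: it assumes the target is given in dot-bracket notation without pseudoknots, and it takes a "stem" to be a maximal run of directly stacked base pairs. Neither that definition nor the function names (`pair_table`, `stem_lengths`, `features`) come from this paper, which may define these quantities differently.

```python
def pair_table(structure):
    """Map each paired position to its partner (dot-bracket input assumed)."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def stem_lengths(structure):
    """Lengths of maximal runs of stacked pairs (i, j), (i+1, j-1), ..."""
    pairs = pair_table(structure)
    lengths = []
    for i in sorted(pairs):
        j = pairs[i]
        if i > j:
            continue  # visit each pair once, from its 5' side
        if pairs.get(i - 1) == j + 1:
            continue  # this pair continues a stem that started earlier
        n = 0
        while i + n < j - n and pairs.get(i + n) == j - n:
            n += 1
        lengths.append(n)
    return lengths


def features(structure):
    """Hypothetical feature summary: length, stem count, mean stem length."""
    stems = stem_lengths(structure)
    mean = sum(stems) / len(stems) if stems else 0.0
    return {
        "length": len(structure),
        "stems": len(stems),
        "mean_stem_length": mean,
    }


if __name__ == "__main__":
    # A two-stem toy target; not one of the Eterna100 puzzles.
    print(features("((..((...))..))"))
```

Under this definition a bulge or internal loop splits a helix into separate stems, so a structure with many short interrupted helices has a low mean stem length even when its total number of base pairs is high.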

Keywords

RNA design
RNA secondary structure
inverse folding
benchmark
citizen science

J.A-L., E.F., V.K. and M.W. contributed equally to this work.

Present address: M. Lee, Department of Computer Science, Stanford University, Stanford, CA 94305, USA.

A complete list of the Eterna players can be found in Dataset S1.