The four ingredients of single-sequence RNA secondary structure prediction. A unifying perspective

Elena Rivas

doi:10.4161/rna.24971

RNA Biol. 2013 Jul;10(7):1185-96. doi: 10.4161/rna.24971. Epub 2013 May 10.

Author

Elena Rivas¹

Affiliation

¹ Janelia Farm Research Campus; Howard Hughes Medical Institute; Ashburn, VA USA.

Abstract

Any method for RNA secondary structure prediction is determined by four ingredients. The architecture is the choice of features implemented by the model (such as stacked basepairs, loop length distributions, etc.). The architecture determines the number of parameters in the model. The scoring scheme is the nature of those parameters (whether thermodynamic, probabilistic, or weights). The parameterization stands for the specific values assigned to the parameters. These three ingredients are referred to as "the model." The fourth ingredient is the set of folding algorithms used to predict plausible secondary structures given the model and the sequence of a structural RNA. Here, I make several unifying observations drawn from looking at more than 40 years of methods for RNA secondary structure prediction in the light of this classification. As a final observation, there seems to be a performance ceiling that affects all methods with complex architectures, a ceiling that impacts all scoring schemes with remarkable similarity. This suggests that modeling RNA secondary structure by using intrinsic sequence-based plausible "foldability" will require the incorporation of other forms of information in order to constrain the folding space and to improve prediction accuracy. This could give an advantage to probabilistic scoring systems, since a probabilistic framework is a natural platform for incorporating different sources of information into a single inference problem.
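As an illustration only, the four-ingredient classification can be sketched with a toy Nussinov-style predictor. This is not a method taken from the article: the architecture (one feature per basepair type, with no stacking or loop terms), the scoring scheme (arbitrary additive weights), the parameter values in PAIR_WEIGHTS, the MIN_HAIRPIN constraint, and the function name fold are all hypothetical choices made for the sketch. The folding algorithm is a standard maximum-score dynamic program with traceback.

```python
from typing import Dict, Tuple

# Ingredients 1-3 ("the model"); every choice here is an illustrative assumption.
# Architecture: one feature per basepair type (no stacking, no loop lengths).
# Scoring scheme: arbitrary additive weights (not energies or probabilities).
# Parameterization: the specific values below.
PAIR_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("G", "C"): 3.0, ("C", "G"): 3.0,
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "U"): 1.0, ("U", "G"): 1.0,
}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a pair


def _pair_score(best, seq, weights, i, k, j):
    """Score of pairing i with k inside seq[i..j], or None if not allowed."""
    w = weights.get((seq[i], seq[k]))
    if w is None:
        return None
    right = best[k + 1][j] if k < j else 0.0
    return w + best[i + 1][k - 1] + right


def fold(seq: str,
         weights: Dict[Tuple[str, str], float] = PAIR_WEIGHTS,
         min_hairpin: int = MIN_HAIRPIN) -> Tuple[float, str]:
    """Ingredient 4: a folding algorithm (maximum-score dynamic programming).

    Returns the best score and one optimal structure in dot-bracket notation.
    """
    n = len(seq)
    if n == 0:
        return 0.0, ""
    # best[i][j] holds the maximum score attainable on seq[i..j].
    best = [[0.0] * n for _ in range(n)]
    for span in range(min_hairpin + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case: position i is unpaired
            for k in range(i + min_hairpin + 1, j + 1):  # case: i pairs with k
                s = _pair_score(best, seq, weights, i, k, j)
                if s is not None and s > score:
                    score = s
            best[i][j] = score

    # Traceback: recover one structure that achieves best[0][n-1].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_hairpin:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_hairpin + 1, j + 1):
            if _pair_score(best, seq, weights, i, k, j) == best[i][j]:
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return best[0][n - 1], "".join(structure)


print(fold("GGGAAACCC"))  # (9.0, '(((...)))')
```

Passing a different weights dictionary changes only the parameterization and leaves the architecture and the folding algorithm untouched, which is the separation the classification is meant to make visible.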

Keywords: RNA secondary structure prediction; context-free grammars; probabilistic models; statistical training; thermodynamic parameters.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computational Biology / methods*
Nucleic Acid Conformation*
RNA / chemistry*
RNA Folding
Software*

Substances

RNA

Grants and funding

Howard Hughes Medical Institute/United States