Secondary Structure Prediction of Interacting RNA Molecules

doi:10.1016/j.jmb.2004.10.082

Journal of Molecular Biology

Volume 345, Issue 5, 4 February 2005, Pages 987-1001

Computational tools for prediction of the secondary structure of two or more interacting nucleic acid molecules are useful for understanding mechanisms for ribozyme function, determining the affinity of an oligonucleotide primer to its target, and designing good antisense oligonucleotides, novel ribozymes, DNA code words, or nanostructures. Here, we introduce new algorithms for prediction of the minimum free energy pseudoknot-free secondary structure of two or more nucleic acid molecules, and for prediction of alternative low-energy (sub-optimal) secondary structures for two nucleic acid molecules. We provide a comprehensive analysis of our predictions against secondary structures of interacting RNA molecules drawn from the literature. Analysis of our tools on 17 sequences of up to 200 nucleotides that do not form pseudoknots shows that they have 79% accuracy, on average, for the minimum free energy predictions. When the best of 100 sub-optimal foldings is taken, the average accuracy increases to 91%. The accuracy decreases as the sequences increase in length and as the number of pseudoknots and tertiary interactions increases. Our algorithms extend the free energy minimization algorithm of Zuker & Stiegler for secondary structure prediction, and the sub-optimal folding algorithm by Wuchty et al. Implementations of our algorithms are freely available in the package MultiRNAFold (http://www.rnasoft.ca/download.html).

Introduction

Computational tools for prediction of the secondary structure (a set of base-pairs, with each base occurring in at most one pair) of a single nucleic acid molecule provide insight into the structure of RNA molecules,¹ aid in comparative analysis and alignment of RNA sequences,² and are used to help design and screen libraries of antisense or primer oligonucleotides.³,⁴,⁵ Such tools include the Mfold server⁶ and the Vienna package,⁷ which, for a given input sequence, calculate the pseudoknot-free secondary structure that has minimum free energy (MFE) according to a standard thermodynamic model. These tools have been significantly enhanced over the years. For example, a list of sub-optimal secondary structures whose energies are close to that of the MFE secondary structure is provided with Mfold; this is useful, since the MFE predictions are not always correct, and some sequences have more than one stable secondary structure.
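The definition of a secondary structure given above (a set of base-pairs, each base in at most one pair) and the pseudoknot-free restriction can be made concrete with a small check. The following is a minimal sketch, not part of the published tools; the function name and the 0-based index convention are assumptions made for illustration.

```python
def is_pseudoknot_free_structure(pairs, length):
    """Check that `pairs` is a pseudoknot-free secondary structure.

    pairs  : iterable of (i, j) base indices, 0-based (assumed convention)
    length : number of bases in the sequence
    """
    pairs = [tuple(sorted(p)) for p in pairs]
    used = set()
    for i, j in pairs:
        # Indices must be distinct and inside the sequence.
        if i == j or i < 0 or j >= length:
            return False
        # Each base may occur in at most one pair.
        if i in used or j in used:
            return False
        used.update((i, j))
    # Pseudoknot-free: no two pairs (i, j) and (k, l) with i < k < j < l.
    ordered = sorted(pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return False
    return True


nested = [(0, 9), (1, 4), (5, 8)]   # nested and side-by-side pairs only
crossing = [(0, 5), (3, 8)]         # these two pairs cross (a pseudoknot)
print(is_pseudoknot_free_structure(nested, 10))
print(is_pseudoknot_free_structure(crossing, 10))
```

The quadratic crossing test is chosen for readability; it is adequate for the short structures discussed here.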

In some applications, it is desirable to predict the secondary structure of two or more interacting nucleic acids. We focus on this problem here. Such predictions aid in understanding mechanisms for ribozyme function, in determining the affinity of an oligonucleotide primer to its target,⁵ and in designing good antisense oligonucleotides,⁸ novel ribozymes⁹ or nanostructures.¹⁰

A method for predicting the MFE secondary structure of two or more sequences was first proposed briefly, but not implemented, by Hofacker et al.⁷ Mathews et al.⁸ provide an implementation for two sequences; their algorithm calculates the MFE secondary structure formed by a probe and its target, and is used in their OligoWalk software, which selects a good probe for a particular target. However, neither work provides the algorithmic details or handles the task of predicting sub-optimal secondary structures, and the OligoWalk software cannot handle more than two strands. Another related piece of software is the two-state hybridization server described by Zuker,⁶ a simple extension of the Mfold program, but this tool is less general, in that it considers a limited range of potential secondary structures. Finally, the HyTher software tool† calculates the free energy of stacked pairs or mismatches at the corresponding positions in the two input sequences. It performs no energy minimization, and the input sequences must have the same length.

In this work, we describe two algorithms, PairFold and MultiFold, for prediction of the MFE pseudoknot-free secondary structure of two or more interacting nucleic acids. PairFold is the first tool to predict sub-optimal secondary structures of two interacting strands, and MultiFold is the first to handle multiple strands. Both programs use the standard thermodynamic parameters of the Turner group¹¹,¹² for RNA molecules and of SantaLucia Jr¹³ for DNA molecules. Our algorithms have been implemented using C++. The package is open source and can be downloaded‡. We provide a detailed analysis of the performance of PairFold and MultiFold on several data sets, in order to benchmark the quality of MFE thermodynamic predictions for complexes of two or more interacting RNA or DNA molecules by our algorithms.

PairFold predicts the MFE secondary structure that can be formed by two interacting nucleic acid molecules. The structure may include inter-molecular pairing (base-pairing between the two molecules) as well as intra-molecular pairing (base-pairing within each molecule); see Figure 1(b). The PairFold algorithm takes as input a pair of RNA molecules S₁ and S₂, and extends the dynamic programming algorithm by Zuker & Stiegler¹⁴ for single molecules, which underlies the Mfold software.¹² The idea is straightforward: the two given sequences S₁ and S₂ are concatenated, and the linkage location is recorded.
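The concatenation idea can be illustrated with a short sketch. It shows only the bookkeeping (joining the strands, recording the linkage location, and telling inter-molecular from intra-molecular pairs); it does not reproduce the PairFold recurrences. The function names and the choice to store the linkage as the index of the first base of S₂ are assumptions made for this example.

```python
def concatenate_strands(s1, s2):
    """Join two strands and record where the second one starts.

    Returns (joined_sequence, linkage), where `linkage` is the 0-based index
    of the first base of s2 in the joined sequence (assumed convention).
    """
    return s1 + s2, len(s1)


def classify_pairs(pairs, linkage):
    """Split base pairs of the joined sequence by the strands they connect."""
    result = {"intra_first": [], "intra_second": [], "inter": []}
    for i, j in (tuple(sorted(p)) for p in pairs):
        if j < linkage:
            result["intra_first"].append((i, j))    # both bases in S1
        elif i >= linkage:
            result["intra_second"].append((i, j))   # both bases in S2
        else:
            result["inter"].append((i, j))          # pair spans the linkage
    return result


joined, linkage = concatenate_strands("GGGAAA", "UUUCCC")
example_pairs = [(0, 11), (1, 10), (2, 4), (7, 9)]   # illustrative indices only
print(joined, linkage)
print(classify_pairs(example_pairs, linkage))
```

Any loop of the structure that contains the position between index `linkage - 1` and index `linkage` is the one a two-strand algorithm has to treat differently from the single-strand case.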

The MFE secondary structure is calculated, where the energy of a structure is the sum of the energies of its component elementary structures; see Figure 1. A pair of interacting molecules has the same elementary structures as a single molecule, except that, in addition, a “special” loop contains the location at which the molecules are linked. A special loop is treated as an external loop, except that a penalty for inter-molecular interaction is added. Handling special loops that form multi-loops, should the linkage location be “sealed”, is the most significant extension to the single-molecule algorithm, and requires two new dynamic programming arrays. The output of the program is independent of the order of concatenation. Roughly, this is because, for a given secondary structure, the set of its elementary structures is the same regardless of the order of concatenation; the only difference is which of the two external loops is treated by the algorithm as a special loop. PairFold also calculates the duplex melting temperature as a function of the reactants' concentrations and ionic concentration.¹⁵
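One way to see why the concatenation order need not matter is to re-index a structure from the order S₁S₂ to the order S₂S₁ and observe that the same set of base pairs is still free of crossings. The sketch below does only this index bookkeeping and says nothing about energies; the function names and 0-based indices are assumptions for illustration.

```python
def swap_concatenation_order(pairs, n1, n2):
    """Re-index pairs from the joined sequence S1+S2 to the sequence S2+S1.

    n1, n2 : lengths of S1 and S2. Indices are 0-based (assumed convention).
    """
    def remap(x):
        # Bases of S1 move behind S2; bases of S2 move to the front.
        return x + n2 if x < n1 else x - n1

    return sorted(tuple(sorted((remap(i), remap(j)))) for i, j in pairs)


def has_crossing(pairs):
    """True if two pairs (i, j), (k, l) satisfy i < k < j < l."""
    ordered = sorted(tuple(sorted(p)) for p in pairs)
    return any(
        i < k < j < l
        for a, (i, j) in enumerate(ordered)
        for k, l in ordered[a + 1:]
    )


pairs_s1s2 = [(0, 11), (1, 10), (2, 4), (7, 9)]       # illustrative indices only
pairs_s2s1 = swap_concatenation_order(pairs_s1s2, 6, 6)
print(pairs_s2s1)
print(has_crossing(pairs_s1s2), has_crossing(pairs_s2s1))
```

Swapping the order amounts to a circular rotation of the indices, which preserves the nesting of pairs; applying the same function again with the lengths exchanged recovers the original numbering.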

To predict sub-optimal structures in PairFold, we extend the algorithm of Wuchty et al.¹⁶ for single molecules in two ways: by handling two strands as input; and by outputting a number k of sub-optimal structures whose free energies are closest to that of the MFE secondary structure, where k is specified by the user. In contrast, the algorithm of Wuchty et al. outputs all secondary structures whose free energy values are within a specified distance from the free energy of the optimal structure. Our method gives the user more direct control over the number of sub-optimal structures that are output.
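The difference between the two output policies can be sketched on an already enumerated list of candidate structures. This is only an illustration of the selection criteria: the actual algorithms generate structures by backtracking through the dynamic programming arrays rather than filtering a list, and the candidate names and energies below are made up for the example.

```python
import heapq


def within_energy_window(candidates, delta):
    """Energy-window policy: keep every structure within `delta` of the minimum.

    candidates : list of (structure, free_energy) tuples
    """
    mfe = min(energy for _, energy in candidates)
    kept = [c for c in candidates if c[1] <= mfe + delta]
    return sorted(kept, key=lambda c: c[1])


def k_lowest_energy(candidates, k):
    """Fixed-count policy: keep the k structures of lowest free energy."""
    return heapq.nsmallest(k, candidates, key=lambda c: c[1])


# Hypothetical structures and energies (kcal/mol), for illustration only.
candidates = [("A", -10.0), ("B", -9.5), ("C", -7.0), ("D", -9.8)]
print(within_energy_window(candidates, 0.5))
print(k_lowest_energy(candidates, 2))
```

With the window policy the number of structures returned depends on how densely the energy landscape is populated near the minimum, whereas the fixed-count policy always returns at most k.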

MultiFold predicts the MFE secondary structure formed by several interacting nucleic acid molecules. The MultiFold algorithm is similar to that of PairFold, with the method for handling special elementary structures generalized to manage the case where more than one linkage location may lie in the structure. Furthermore, the input strands are concatenated in all possible orders, and the one having the lowest MFE is returned.
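The search over concatenation orders can be sketched as follows. The `fold` argument stands in for an MFE folding routine and is hypothetical: here it is any callable that takes the strands in a given order and returns a free energy. The toy energy function in the usage lines exists only to make the example runnable and has no physical meaning.

```python
from itertools import permutations


def best_concatenation_order(strands, fold):
    """Try every ordering of the strands and return the lowest-energy one.

    strands : list of sequence strings
    fold    : hypothetical callable, fold(ordered_strands) -> free energy
    Returns (order, energy), where `order` is a tuple of strand indices.
    """
    best_order, best_energy = None, float("inf")
    for order in permutations(range(len(strands))):
        energy = fold([strands[i] for i in order])
        if energy < best_energy:
            best_order, best_energy = order, energy
    return best_order, best_energy


def toy_energy(ordered_strands):
    # Placeholder scoring, NOT a thermodynamic model: it simply favours
    # orders that put longer strands first.
    return float(sum(pos * len(s) for pos, s in enumerate(ordered_strands)))


print(best_concatenation_order(["A", "CC", "GGG"], toy_energy))
```

The number of orderings grows factorially with the number of strands, so this exhaustive loop is practical only for small complexes such as the three- and four-strand cases discussed in this article.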

In order to assess the quality of predictions by PairFold, we analyzed the accuracy (i.e. sensitivity and specificity) of predicted structures, compared with secondary structures reported in the literature. Our test data include ribozymes and their RNA targets;⁹,¹⁷⁻²⁹ small nuclear RNAs of the U2, U4 and/or U6 snRNPs;³⁰,³¹ and small DNA and RNA duplexes used by Peyret et al.³² and by Xia et al.³³ to experimentally determine thermodynamic parameters for mismatched bases. Overall, we found that PairFold predictions on relatively short duplexes (43 to 170 nucleotides in length), such as ribozyme–target complexes, are good, with 87% sensitivity and 82% specificity, on average, on pseudoknot-free structures. In cases where the prediction quality of PairFold is poorer, such as for pseudoknotted structures, we found that the best prediction from the top 100 sub-optimal structures was significantly better than the MFE prediction.
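The two measures can be computed from sets of base pairs, as in the sketch below. The definitions used are an assumption, since this excerpt does not state them: sensitivity is taken as the fraction of reference pairs that are predicted, and specificity as the fraction of predicted pairs that appear in the reference, which is a common convention in RNA structure benchmarking.

```python
def sensitivity_specificity(predicted, reference):
    """Compare predicted base pairs against a reference structure.

    Assumed definitions (not taken from this excerpt):
      sensitivity = correctly predicted pairs / pairs in the reference
      specificity = correctly predicted pairs / pairs in the prediction
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


reference = [(0, 9), (1, 8), (2, 7)]
predicted = [(0, 9), (1, 8), (3, 6), (10, 15)]
print(sensitivity_specificity(predicted, reference))
```

Taking the best of several sub-optimal structures, as described above, would correspond to evaluating this function on each candidate and keeping the highest-scoring one.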

To test PairFold on longer complexes (length over 1000 nucleotides), we used two data sets from Yu et al.⁵: a combinatorial library of ribozymes for a target viral mRNA, and a small library of primers for the same target. While the complete secondary structures of the ribozyme–target duplexes are not known, it can be inferred from the experimental data whether or not the secondary structure at the site where the ribozyme is active conforms to the known secondary structure of ribozymes. We tested whether PairFold predicted the active part of the ribozyme–target duplex correctly. We found that PairFold predictions are quite accurate when the target is short, but they are poor for long (e.g. 1.1 kb) targets. We expected this poor performance, as all the existing approaches for RNA folding (including Mfold) perform more poorly on long structures than on short ones.

We tested MultiFold on five complexes that are variations of hairpin and hammerhead ribozyme constructs, each with three or four interacting molecules.²⁰,³⁴,³⁵ The sensitivity of predictions of MultiFold on these complexes was also very high, over 93% in every case. Finally, we ran MultiFold on DNA strands designed for a molecular automaton that diagnoses high or low levels of mRNA strands in vitro.¹⁰ The strands are designed to form certain secondary structures, which are essential to correct functioning of the automaton. When the sequences are ordered so that the designed secondary structures can form without pseudoknots, the MFE secondary structure reported by MultiFold matches the designed secondary structures with accuracy between 0.91 and 1. With other permutations, the secondary structures reported by MultiFold also have high accuracy, supporting the good quality of the strand design.

First, we give a thorough analysis of our algorithms' accuracy on experimental data found in the literature. Then, we outline the key ideas behind the algorithms and analyse their computational complexity. Finally, we discuss the accuracy and limitations of the tools we propose, and we present conclusions and future work.

Section snippets

Results

In this section we analyze the accuracy of PairFold and MultiFold predictions on several reference structures from the biological literature. Both programs can also take a single RNA sequence as input, rather than two or more, in which case the result is equivalent to the one returned by Mfold, except that at this point we do not incorporate coaxial stacking calculations for multi-loops. Since our goal here is to measure the accuracy of folding for two or more interacting RNA

Algorithms

In this section, we briefly describe our algorithms. Full details, including the recurrences for our algorithms, are given in the Supplementary Materials.

Accuracy and limitations

Our results show that PairFold has an overall accuracy of 79% on sequences of up to 200 nucleotides in length that do not form pseudoknots. When 100 sub-optimal foldings are generated and the best is selected, the accuracy increases to 91%. These statistics are roughly consistent with the data reported by Mathews et al.¹² for single sequences, which show that the lowest free energy structure predicted by Mfold contains 73% of known base-pairs, on single sequences of length up to 700 nucleotides;

Acknowledgements

We thank Holger Hoos, Dan Tulpan, Sanja Rogic, Jérémy Barbay, Greg Lakatos, and Lloyd Smith for valuable input on this work. This material is based upon work supported by the National Science Foundation under grant numbers 0130108 and 0203892. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References (42)

I.L. Hofacker et al. Secondary structure prediction for aligned RNA sequences. J. Mol. Biol. (2002)
J.-Y. Wang et al. Modelling hybridization kinetics. Math. Biosci. (2003)
Q. Yu et al. Cleavage of highly structured viral RNA molecules by combinatorial libraries of hairpin ribozymes. J. Biol. Chem. (1998)
D.H. Mathews et al. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. (1999)
K.-Y. Chang et al. The structure of an RNA “kissing” hairpin complex of the HIV TAR hairpin loop and its complement. J. Mol. Biol. (1997)
A. Mougin et al. Direct probing of RNA structure and RNA–protein interactions in purified HeLa cells and yeast spliceosomal U4/U6.U5 tri-snRNP particles. J. Mol. Biol. (2002)
E. Rivas et al. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. (1999)
M. Zuker et al. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide
O.V. Matveeva et al. Thermodynamic criteria for high hit rate antisense oligonucleotide design. Nucl. Acids Res. (2003)
M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucl. Acids Res. (2003)
I.L. Hofacker et al. Fast folding and comparison of RNA secondary structures. Chem. Monthly (1994)
D.H. Mathews et al. Predicting oligonucleotide affinity to nucleic acid targets. RNA (1999)
A. Barroso-delJesus et al. Selection of targets and the most efficient hairpin ribozymes for inactivation of mRNAs using a self-cleaving RNA library. EMBO Rep. (2001)
Y. Benenson et al. An autonomous molecular computer for logical control of gene expression. Nature (2004)
M.J. Serra et al. Predicting thermodynamic properties of RNA. Methods Enzymol. (1995)
J. SantaLucia. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest neighbour thermodynamics. Proc. Natl Acad. Sci. USA (1998)
M. Zuker et al. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucl. Acids Res. (1981)
J.G. Wetmur. DNA probes: applications of the principles of nucleic acid hybridization. Crit. Rev. Biochem. Mol. Biol. (1991)
S. Wuchty et al. Complete sub-optimal folding of RNA and the stability of secondary structures. Biopolymers (1999)
Y. Kasai et al. Measurements of weak interactions between truncated substrates and a hammerhead ribozyme by competitive kinetic analyses: implications for the design of new and efficient ribozymes with high sequence specificity. Nucl. Acids Res. (2002)
N.K. Vaish et al. Recent developments in the hammerhead ribozyme field. Nucl. Acids Res. (1998)
