RNA structure prediction including pseudoknots through direct enumeration of states

Ofer Kimchi; Tristan Cragnolini; Michael P. Brenner; Lucy J. Colwell

doi:10.1101/338921

Abstract

The accurate prediction of RNA secondary structure from primary sequence has had an enormous impact on research over the past forty years. While many algorithms are available to make these predictions, the inclusion of non-nested loops, termed pseudoknots, still poses challenges. Here, we describe a new method to compute the entire free energy landscape of secondary structures of RNA resulting from a primary RNA sequence, by combining a polymer physics model for the entropy of pseudoknots with exhaustive enumeration of the set of possible structures. Our polymer physics model can address arbitrarily complex pseudoknots and has only two free loop entropy parameters that correspond to concrete physical quantities, over an order of magnitude fewer than even the sparsest state-of-the-art algorithms. Our model outperforms previously published methods in predicting pseudoknots, while performing on par with current methods in the prediction of non-pseudoknotted structures. For RNA sequences of ~45 nucleotides, or ~90 with minimal heuristics, the complete enumeration of possible secondary structures can be accomplished quickly despite the NP-complete nature of the problem.

RNA molecules play physiological roles that extend far beyond translation. In human cells, most RNA molecules are not translated [1]. Non-coding RNAs interact functionally with mRNA [2], DNA [3], and proteins [4], and can exceed 200 nucleotides (ntds) in length [5, 6]. However, a substantial fraction are < 40 ntds in length, including miRNAs and siRNAs, which regulate the translation of mRNA [2, 7], and piRNAs, which form RNA-protein complexes that regulate the germlines of mammals [8]. The in vitro evolution of RNA, especially through SELEX [9–11], has led to an explosion of applications for short RNA molecules, due to their ability to bind tightly and specifically to a remarkable range of target ligands [12].

Overwhelmingly, the properties of short non-coding RNA molecules are tied to their three-dimensional, or tertiary, structures [5, 13–16]. Such structures are formed because of the energetic favorability of bonds between complementary nucleotides (primarily A to U, C to G, and G to U). However, these bonds impose an entropic cost; therefore, the conformations most frequently adopted balance the energetic gain of maximal base-pairing with the entropic cost of structural constraints. In equilibrium, the RNA adopts each possible structure with a Boltzmann-weighted probability.

Because of the relevance of RNA structure to function [17, 18], current research aims to predict the minimum free energy structures given the sequence. Algorithms typically predict “secondary structure”, a list of the base pairings [19]. The early Pipas-McMahon RNA structure prediction algorithm sought to completely enumerate and evaluate the free energy of all possible secondary structures, thereby constructing the entire energy landscape [20]. This NP-complete approach was quickly supplanted by dynamic programming, which has since dominated RNA structure prediction [21–25]. These algorithms efficiently consider an enormous number of structures without explicitly generating them, by iteratively finding the optimal structure for subsequences [26].

However, such algorithms have difficulty predicting RNA secondary structures that include pseudoknots, i.e. structural elements with at least two non-nested base pairs (see Fig. S1A for an example) that make up roughly 1.4% of base pairs [26] and are overrepresented in functionally important regions [27] of RNA. Pseudoknots are disallowed from the most popular RNA structure prediction algorithms (e.g. Refs. [28–30]) due to computational cost; indeed, structural prediction including all pseudoknots has been shown to be NP-complete [31–33]. Significant advances have been made with heuristics, which do not guarantee finding the minimum free energy structure [34–38], and by disallowing all but a narrow class of pseudoknots [39–46].

FIG. 1: Schematic overview of the algorithm.

Given an RNA sequence, the algorithm first enumerates all potential stems (sequences of base pairs) that can form. It then searches for all possible combinations of stems such that no nucleotide is paired with more than one other, thus forming all possible secondary structures. For each structure, it calculates the free energy, which comprises a bond energy term and an entropy term. The histogram of free energies for the sequence shown is plotted with an arrow pointing to the Minimum Free Energy (MFE). Given the entire free energy landscape, the algorithm calculates the probability of any arbitrary secondary structure forming in equilibrium. Finally, we coarse grain over similar structures described by the same topology (described in Section III), arriving at a probability distribution for every possible topology forming in equilibrium.

A major challenge for predicting pseudoknotted structures is the relative lack of experimental data [47]. Thus, until recently, theoretical approaches have largely been limited to simple H-type pseudoknots [39, 45, 48, 49]. A recent strategy uses machine learning on large experimental datasets [45, 50, 51]. Although these approaches can be useful, they risk compounding experimental errors and often use an enormous number of parameters, which can hamper generalizability. A sketch of a theoretical description of pseudoknot entropies based on polymer physics was developed by Isambert and Siggia [34, 52]; however, their derivations have not been published.

In this study, we demonstrate that for short RNA sequences, it is possible to exactly solve for the probability that the RNA will fold into any given structure, including those with pseudoknots. Complete enumeration of the RNA structure landscape is feasible even for biologically relevant RNA sequences (Section I). Our approach combines a method based on the work of Isambert and Siggia (Section II) with a novel graph-theoretical depiction of the RNA (Section III) to exactly calculate the entropy of each structure, treating both pseudoknotted and non-pseudoknotted RNA structures equivalently. The entropies of structures of arbitrary complexity can be analytically computed with just two experimentally derived physical parameters: the persistence length of single-stranded RNA, and the volume within which two RNA nucleotides are considered bound. This represents an enormous parameter reduction compared to state-of-the-art algorithms like the Cao-Chen or Dirks-Pierce models, which have 258 and 11 parameters, respectively, for H-type pseudoknots alone, and ~18 parameters for non-pseudoknotted loops [51]. We test our model predictions on molecules from the RNAStrand [53], PseudoBase++ [54], and CompaRNA [55] databases and find good agreement with experimental results (Section IV). Although we fit our entropy model to data from non-pseudoknotted structures, we find that our model outperforms previously published methods in predicting pseudoknots, while performing on par with current methods in the prediction of non-pseudoknotted structures.

I. ENUMERATING RNA STRUCTURES

The Pipas-McMahon algorithm [20] first enumerates all possible secondary structures for a given sequence (sans pseudoknots), and then evaluates the free energy for each, to construct the entire free energy landscape for non-pseudoknotted structures. A major shortcoming is the significant computer time required for long sequences. However, the exponential increase in computer power over the past forty years, coupled with increased appreciation for the physiological and engineering relevance of short RNA strands, suggests revisiting this approach. In this section, we describe the process by which we exhaustively enumerate the secondary structures into which an arbitrary given sequence can fold. We first number the nucleotide sequence from 1 to N from the 5’ end. We define an N × N symmetric matrix B which describes which nucleotides can bind to each other: B_i,j = 1 if nucleotides i and j can bind to make base pair i · j (i.e., they belong to the set {(A,U), (C,G), (G,U)}), and 0 otherwise.
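As an illustration only, the construction of B can be sketched in Python as follows. This is not the authors' MATLAB code: the function name `pairing_matrix` is hypothetical, and indices are 0-based rather than 1-based.

```python
import numpy as np

# Pairs allowed by the model: A·U, C·G, G·U, in either 5'->3' order.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq: str) -> np.ndarray:
    """Return the symmetric N x N matrix B with B[i, j] = 1 if nucleotides i and j can pair."""
    seq = seq.upper()
    n = len(seq)
    B = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if (seq[i], seq[j]) in ALLOWED_PAIRS:
                B[i, j] = B[j, i] = 1  # fill both triangles to keep B symmetric
    return B


B = pairing_matrix("GGGAAAUCC")
print(B[0, 8], B[0, 3])  # prints: 1 0  (G·C can pair, G·A cannot)
```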

Next, we search for all possible stems (strings of consecutive base pairs) that could form. We define a parameter m to be the minimum allowed stem length (m ≥ 1; m = 1 throughout unless otherwise specified). We also impose the physical constraint that hairpin loops (the single-stranded region closed by the innermost base pair of a stem) have a minimum length of 3 nucleotides. We include not only the longest possible stems that can form, but all contiguous subsets of those stems [56, 57]. We denote the number of stems found by N_stems.
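A minimal sketch of the stem search follows, assuming (our choice, not the source's) that a stem is stored as a hypothetical (i, j, length) tuple of 0-based indices encoding the pairs (i + k, j − k) for k < length. Growing inward from every possible outermost pair reports each contiguous sub-stem exactly once.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # minimum number of unpaired nucleotides enclosed by a stem


def enumerate_stems(seq: str, m: int = 1):
    """Return all stems (i, j, length) with length >= m, including every contiguous sub-stem."""
    seq = seq.upper()
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            length = 0
            # Extend inward from the outer pair (i, j) while the next pair can form
            # and would still enclose at least MIN_HAIRPIN unpaired nucleotides.
            while True:
                a, b = i + length, j - length
                if b - a - 1 < MIN_HAIRPIN or (seq[a], seq[b]) not in ALLOWED_PAIRS:
                    break
                length += 1
                if length >= m:
                    stems.append((i, j, length))
    return stems


print(len(enumerate_stems("GGGAAAUCC")))   # prints: 14
print(enumerate_stems("GGGAAAUCC", m=3))   # prints: [(0, 8, 3)]
```

Raising m shrinks N_stems quickly, which is why m is a convenient knob for controlling run time.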

We next define the N_stems × N_stems symmetric compatibility matrix C, where C_p,q = 1 if a structure could be made with both stems p and q (C_q,q = 1 ∀ q). We impose the constraint that each nucleotide may be paired with, at most, one other nucleotide by setting C_p,q = 0 if stems p and q share at least one nucleotide.
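The shared-nucleotide rule can be sketched as below, again using the hypothetical (i, j, length) stem encoding. The sketch implements only the rule stated in this paragraph; any further restrictions (for example on pseudoknot complexity, as in Section IV) would have to be added separately.

```python
import numpy as np


def stem_nucleotides(stem):
    """Set of nucleotide indices occupied by a stem (i, j, length)."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))


def compatibility_matrix(stems):
    """C[p, q] = 0 if stems p and q share a nucleotide, else 1; the diagonal is 1."""
    n = len(stems)
    nts = [stem_nucleotides(s) for s in stems]
    C = np.ones((n, n), dtype=int)
    for p in range(n):
        for q in range(p + 1, n):
            if nts[p] & nts[q]:  # any shared nucleotide forbids combining p and q
                C[p, q] = C[q, p] = 0
    return C
```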

Finally, we explicitly enumerate the remaining possible secondary structures by identifying all compatible combinations of stems. Starting from a single stem s₁, we consider stems s₂ where 1 ≤ s₁ < s₂ ≤ N_stems and add the first stem for which C_s₁,s₂ = 1. Then, we repeat the process, adding the first stem s₃ > s₂ compatible with both s₁ and s₂, and so forth, continuing until we can add no more stems. We add the resulting structure, composed of say M stems, to the list of possible structures, then remove the last stem added (to obtain the structure composed of stems s₁, s₂, …, s_M−1) and continue the process. This algorithm returns all possible secondary structures resulting from the primary sequence.
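The depth-first procedure above can be sketched as a recursive generator over a compatibility matrix C. Two details are our own assumptions rather than statements from the text: the empty (fully unfolded) structure is included, and, because contiguous sub-stems can combine into the same set of base pairs (for example, one stem of length 2 versus two adjacent stems of length 1), the sketch removes such duplicates by comparing base-pair sets. Stems use the hypothetical (i, j, length) encoding.

```python
def enumerate_structures(C):
    """Yield every tuple of pairwise-compatible stem indices, in increasing index order."""
    n = len(C)

    def extend(current, start):
        yield tuple(current)                  # the current combination is itself a structure
        for s in range(start, n):
            if all(C[s][t] for t in current):  # s must be compatible with every stem so far
                current.append(s)
                yield from extend(current, s + 1)
                current.pop()                 # remove the last stem and keep searching

    yield from extend([], 0)


def base_pairs(stem):
    i, j, length = stem
    return frozenset((i + k, j - k) for k in range(length))


def unique_structures(stems, C):
    """Yield each distinct set of base pairs once."""
    seen = set()
    for combo in enumerate_structures(C):
        pairs = frozenset().union(*(base_pairs(stems[s]) for s in combo))
        if pairs not in seen:
            seen.add(pairs)
            yield pairs
```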

The algorithm described here was implemented in MatLab and all code is available on the GitHub repository https://github.com/ofer-kimchi/RNA-FE-Landscape.

Having completely enumerated the possible secondary structures, we calculate the probabilities that the RNA will fold into each of them by calculating their free energies.

II. CALCULATING FREE ENERGIES

The probability of the RNA sequence folding into a given equilibrium structure σ is given by the Boltzmann factor

$$p(\sigma) = \frac{e^{-\beta F_\sigma}}{Z}, \qquad Z = \sum_\sigma e^{-\beta F_\sigma}, \qquad (1)$$

where β = 1/k_BT (T is the temperature and k_B is Boltzmann’s constant), and the partition function, Z, is defined such that the probability distribution is normalized: Σ_σ p(σ) = 1. Here F_σ, the free energy of structure σ, is a function of the energy E_σ and entropy S_σ of the structure:

$$\Delta F = \Delta E - T\,\Delta S, \qquad (2)$$

where we drop the subscripts for notational convenience and introduce the Δ's to signify that free energies are measured with respect to the free chain. We separate the free energy calculation into the free energy of stems and the free energy of loops.
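Given the free energies of all enumerated structures, the Boltzmann probabilities can be computed as in the Python sketch below. The units (kcal/mol, with k_B replaced by the molar gas constant) and the default temperature of 37 °C are our assumptions. Subtracting the largest exponent before exponentiating leaves p(σ) unchanged but avoids overflow.

```python
import numpy as np

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol K)


def boltzmann_probabilities(free_energies, temperature=310.15):
    """p(sigma) = exp(-beta F_sigma) / Z over all enumerated structures."""
    beta = 1.0 / (R_KCAL * temperature)
    x = -beta * np.asarray(free_energies, dtype=float)
    x -= x.max()          # shift exponents for numerical stability (cancels in the ratio)
    w = np.exp(x)
    return w / w.sum()    # dividing by the (shifted) partition function normalizes p
```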

A. Calculating bond energies

We make the simplifying assumption that the energy ΔE in Eq. (2) is determined solely by the base pairs in the structure, ignoring higher order corrections to the energy. Thus, each stem, s, contributes an energy ΔE_S such that ΔE = Σ_s ΔE_S. To calculate the terms ΔE_S, we consider nearest-neighbor interactions among base pairs [58]. Previous work has shown that it is reasonable to include (whenever appropriate) the contribution of unpaired nucleotides on both sides of each stem in the nearest-neighbor terms for the first and last base pairs of the stem [25]. Specifically, we used tabulated parameters for ΔH from Refs. [50, 59, 60], well documented by Turner and Mathews in the Nearest Neighbor Database [61]. Our entropy model (described below) was used in place of the entropies of hairpin, bulge, internal, and multibranch loops, and we set the enthalpy terms of these loops (aside from nearest-neighbor interactions) to zero; we did not consider mismatch-mediated coaxial stacking, symmetry penalties, or penalties for specific closures of stems; and we implemented coaxial stacking terms in place of terminal mismatches or dangling ends whenever possible in multibranch loops.
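The core stacking sum for a single stem can be sketched as below: a stem of L base pairs has L − 1 stacked pairs, each contributing ΔH − TΔS. The table `nn_table`, its key format, and the values in `dummy_table` are hypothetical placeholders, not Turner parameters, and the sketch omits the dangling-end, terminal-mismatch, and coaxial-stacking terms described above.

```python
def stem_free_energy(seq, stem, nn_table, temperature=310.15):
    """Sum nearest-neighbor stacking terms for a stem (i, j, length), 0-based.

    nn_table maps a stack key "XY/ZW" (5'->3' on the first strand, 3'->5' on the
    second) to a (dH, dS) pair; the key format is an illustrative assumption.
    """
    i, j, length = stem
    dG = 0.0
    for k in range(length - 1):  # L base pairs form L - 1 stacks
        a, b = i + k, j - k
        key = f"{seq[a]}{seq[a + 1]}/{seq[b]}{seq[b - 1]}"
        dH, dS = nn_table[key]
        dG += dH - temperature * dS
    return dG


dummy_table = {"GG/CC": (-1.0, -0.002), "GG/CU": (-0.5, -0.001)}  # placeholder values only
dG = stem_free_energy("GGGAAAUCC", (0, 8, 3), dummy_table, temperature=300.0)
```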

B. Calculating entropies

The entropy is treated as the sum of two independent parts, the entropic cost of forming stems and the entropic cost of forming loops, such that ΔS = ΔS_loops + Σ_stems ΔS_stem.

The entropies of stems represent the entropy lost when an RNA forms base pairs. This entropy is considered in the same fashion as the energetic parameters (each energetic parameter has an accompanying entropic parameter). Therefore, as for the energies, the entropic parameters consider pairwise RNA base pair interactions, and ΔS_stem thus depends on the specific nucleotides comprising the stem. In contrast, we make the approximation that ΔS_loops is independent of the identities of the nucleotides comprising the single-stranded regions.
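Putting these pieces together, a minimal sketch of how per-stem terms and the sequence-independent loop entropy combine into ΔF = ΔE − TΔS is shown below. It assumes all quantities are already in consistent units (hypothetically kcal/mol and kcal/(mol K)); the function name is ours.

```python
def total_free_energy(stem_terms, loop_entropy, temperature=310.15):
    """Return dF = dE - T dS for one structure.

    stem_terms:   list of (dE_stem, dS_stem) pairs, one per stem (sequence dependent).
    loop_entropy: dS_loops for the whole structure (independent of nucleotide identity).
    """
    dE = sum(e for e, _ in stem_terms)
    dS = loop_entropy + sum(s for _, s in stem_terms)
    return dE - temperature * dS
```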

III. CALCULATING LOOP ENTROPIES: RNA FEYNMAN DIAGRAMS

We model single-stranded regions comprised of x unpaired nucleotides (ntds) as a random walk of (x + 1)/b steps, where b ≈ 2.4 ntds is the Kuhn length of single stranded RNA [34, 62]. Since the entropic cost of forming base pairs has already been considered in ΔS_stem, for the purposes of calculating ΔS_loops we consider stems as rigid rods. This approximation is justified because of the extremely long persistence length of double-stranded RNA (~ 200 ntds [63]) compared to both single-stranded RNA and the length of any stem we consider.

The entropy of a single-stranded region of length s_i is given by k_B log ω_i(s_i), where ω_i(s_i) is the number of ways of arranging the region consistent with the topology of the overall structure. Defining Ω(s) as the total number of conformations a random walk of length s can take, for a free chain, ω = Ω. For structures which include constraints, ω(s_i) = Ω(s_i) × p(s_i), where p(s_i) is the probability that the random walk of length s_i will yield a conformation consistent with the topology of the overall structure being considered. Since free energies are measured relative to the free chain, factors of Ω cancel out in equations for ΔS_loops (see further discussion in Section S3). The entropy of the single-stranded regions in a given structure is thus given by

$$\Delta S_{\text{loops}} = k_B \sum_i \log p(s_i), \qquad (3)$$

where s_i is the number of nucleotides in the i^th single-stranded region. The sum is generally over non-independent terms; we will describe how to address these sums via a Feynman diagram-like approach in this section.

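As demonstrated in Eq. (3), the physics of the problem is contained in p(s), which is best calculated by considering the end-to-end vector $\vec{r}$ of the random walk undergone by the single-stranded RNA, as

$$p(s) = \int P(\vec{r}; s)\, d^3 r, \qquad (4)$$

where the integral runs over the end-to-end vectors consistent with the structure, and we define $P(\vec{r}; s)$ as the probability of a random walk of length s to have end-to-end vector $\vec{r}$:

$$P(\vec{r}; s) = \left(\frac{3}{2\pi s b}\right)^{3/2} \exp\!\left(-\frac{3 r^2}{2 s b}\right). \qquad (5)$$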

We have assumed s ≫ b in order to arrive at the Gaussian formula above through the central limit theorem. The mean of the Gaussian is zero by symmetry. In order to find the variance we first consider a single step of length b in three dimensions which has variance in the x, y, and z coordinates of b²/3 by symmetry. For a random walk of N = s/b steps, by independence of subsequent steps, the total variance is equal to Nb²/3 = sb/3, leading to Eq. (5).
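The variance argument can be checked numerically with a short Monte Carlo sketch of a freely jointed chain of N = s/b steps of length b with isotropic step directions (an idealization we adopt for illustration). Each Cartesian component of the end-to-end vector should have sample variance close to sb/3.

```python
import numpy as np


def end_to_end_vectors(n_steps, b, n_walks, rng):
    """Sample end-to-end vectors of freely jointed chains with n_steps steps of length b."""
    steps = rng.normal(size=(n_walks, n_steps, 3))
    steps /= np.linalg.norm(steps, axis=2, keepdims=True)  # isotropic unit vectors
    return b * steps.sum(axis=1)


rng = np.random.default_rng(0)
b = 2.4    # Kuhn length in nucleotides, as quoted in the text
s = 24.0   # contour length in nucleotides (illustrative choice, N = 10 steps)
R = end_to_end_vectors(round(s / b), b, 100_000, rng)
print(R.var(axis=0), s * b / 3)  # each sample variance should be close to 19.2
```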

As described in Section S5 of the supplement, we can systematically consider higher order corrections to Eq. (5) while maintaining its Gaussian nature. Eq. (5) is accurate for non-self-avoiding random walks; self-avoiding random walks cannot be treated analytically in this way. However, for sufficiently short walks, the probability of self-interaction is low. While the assumption s ≫ b does not always hold in the problems considered, we ultimately find very good agreement between results using Eq. (5) and experiment, and find that corrections to Eq. (5) as described in Section S5 are negligible.

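In order to demonstrate how Eqs. (3)–(5) are applied, we first consider the simple hairpin loop. Following Jacobson and Stockmayer [65], we allow that base pairing can occur as long as the two nucleotides are within a small volume $v_s = \frac{4}{3}\pi r_s^3$ of one another, where r_s roughly corresponds to the bond length.¹ We assume that r_s is small enough that $P(\vec{r}; s) \approx P(\vec{0}; s)$ for all $|\vec{r}| \le r_s$. Therefore, Eqs. (3)–(5) yield

$$S_{\text{closed-net-0}} = k_B \log\left[v_s \left(\frac{3}{2\pi s b}\right)^{3/2}\right]. \qquad (6)$$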

We have called the LHS of the equation S_closed-net-0 (the zero references the lack of stems enclosed by the loop) following [34, 52] (rather than, say, S_hairpin) to emphasize that this formula is applicable to hairpin loops, bulge loops, internal loops, and multiloops – all of which can be thought of as closed loops of RNA. Aside from the appropriate inclusion of v_s terms to account for the finite and variable width of RNA stems, RNA stems are treated as having negligible width by performing the approximation .
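For a single closed loop, multiplying the Gaussian density at zero separation by the bonding volume gives ΔS/k_B = log v_s + (3/2) log[3/(2π s b)]. The sketch below evaluates this with the values quoted in the text (b ≈ 2.4 ntds, v_s = 0.0201 ntds³); the function name is hypothetical, and s is taken as the loop length in nucleotides.

```python
import math

B_KUHN = 2.4   # Kuhn length b of ssRNA, in nucleotides (value from the text)
V_S = 0.0201   # bonding volume v_s, in nucleotides^3 (fitted value from the text)


def closed_net_0_entropy(s, b=B_KUHN, v_s=V_S):
    """Closed-loop entropy in units of k_B: log[v_s * (3 / (2 pi s b))^(3/2)]."""
    return math.log(v_s) + 1.5 * math.log(3.0 / (2.0 * math.pi * s * b))
```

Because the entropy falls as −(3/2) log s, doubling a loop's length lowers its entropy by (3/2) k_B log 2 ≈ 1.04 k_B, independent of v_s and b.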

We estimate v_s by fitting experimental measurements of the entropy of hairpin loops of variable lengths to Eq. (6). Although Eq. (6) implies that the entropy of a hairpin should increase monotonically as a function of its length, the experimental measurements are nonmonotonic, and their nonmonotonicity exceeds the error bars [64]. This non-monotonicity may be due to enthalpic effects [66] which were neglected in our analysis following Ref. [25]. Nevertheless, Fig. 2 shows that Eq. (6) gives a reasonable fit to the experimental data with v_s = 0.0201 ± 0.0036 ntds³.² If one ignores all angular dependences of bond formation, this leads to a naive underestimate of the length of a hydrogen bond of 0.56 Å, which nonetheless is well within an order of magnitude of the true length of hydrogen bonds.
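Because v_s enters the hairpin entropy only as an additive log v_s, an unweighted least-squares fit reduces to averaging the residuals. The sketch below ignores the experimental error bars that a weighted fit would use, and recovers a known v_s from synthetic data (not the measurements of Ref. [64]). It then converts v_s to a sphere radius r_s = (3v_s/4π)^{1/3} in nucleotide units. Treating the bonding volume as a sphere is our reading of "ignoring all angular dependences", and converting r_s to Å would additionally require a length per nucleotide, which is not given here.

```python
import numpy as np


def fit_log_vs(lengths, entropies_kB, b=2.4):
    """Least-squares estimate of log(v_s) from loop entropies given in units of k_B."""
    lengths = np.asarray(lengths, dtype=float)
    shape_term = 1.5 * np.log(3.0 / (2.0 * np.pi * lengths * b))
    return np.mean(np.asarray(entropies_kB) - shape_term)  # best-fit intercept


# Synthetic check: noiseless data generated from a known v_s.
true_vs = 0.02
lengths = np.arange(3, 10)
synthetic = np.log(true_vs) + 1.5 * np.log(3.0 / (2.0 * np.pi * lengths * 2.4))
vs_hat = np.exp(fit_log_vs(lengths, synthetic))
r_s = (3.0 * vs_hat / (4.0 * np.pi)) ** (1.0 / 3.0)  # radius of a sphere of volume v_s
```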

FIG. 2: v_s estimated from experimental data.

Experimental estimates for the free energy of hairpin loops of length s from Table 1 of Ref. [64] were converted to entropy estimates (blue points and error bars) by assuming ΔH = 0 as in Ref. [25]. These data were fit to Eq. (6), yielding an estimate of v_s = 0.0201 ± 0.0036 ntds³.

Finally, we consider pseudoknots. To calculate the entropy of a pseudoknot of arbitrary complexity, we introduce a novel graph formulation inspired by Feynman diagrams from quantum field theory. First, the RNA structure being considered is translated into a graph. Nodes represent the two end points of a stem, and two types of edges represent single- and double-stranded RNA.

Defined in this way, the graph of the RNA structure directly represents the integrals necessary to compute its entropy. The positions of the nodes are integrated over all of space, while the constraints of the structure are included in the integrand: a double-stranded edge of length l between nodes i and j leads to a term that fixes their separation to $|\vec{r}_i - \vec{r}_j| = l$ (a rigid rod), and a single-stranded edge of length s between these nodes leads to a term $P(\vec{r}_i - \vec{r}_j; s)$ in the integrand. Note that two bonded nucleotides in isolation are considered a stem of length l → 0.
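The translation from a structure to this graph can be sketched as follows. The assumptions are ours, not the source's:

- stems are hypothetical (i, j, length) tuples of 0-based indices;
- node 2k is the outer end of stem k and node 2k + 1 is its inner end;
- single-stranded edges are labelled by their number of unpaired nucleotides x (the text models such a region as (x + 1)/b random-walk steps);
- the dangling 5′ and 3′ ends, which are free chains that do not affect the entropy, are omitted.

```python
def structure_graph(stems):
    """Return (ds_edges, ss_edges), each a list of (node_a, node_b, length)."""
    ds_edges, strands = [], []
    for k, (i, j, length) in enumerate(stems):
        outer, inner = 2 * k, 2 * k + 1
        ds_edges.append((outer, inner, length))
        strands.append((i, i + length - 1, outer, inner))  # 5' strand: enter outer, leave inner
        strands.append((j - length + 1, j, inner, outer))  # 3' strand: enter inner, leave outer
    strands.sort()  # order paired strands along the backbone
    ss_edges = []
    for (_, end, _, leave), (start, _, enter, _) in zip(strands, strands[1:]):
        # single-stranded edge between consecutive paired strands, with x unpaired nucleotides
        ss_edges.append((leave, enter, start - end - 1))
    return ds_edges, ss_edges
```

For a hairpin, the loop becomes a self-loop on the inner node. For an H-type pseudoknot such as `structure_graph([(0, 10, 3), (4, 16, 3)])`, the four nodes form a single cycle with a chord, matching the shape of the graph in Fig. 3A.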

As a concrete example, we consider the canonical H-type pseudoknot, an instance of which is shown in Fig. 3A (LHS). As described above, its conformational entropy can be calculated by translating the structure into a graph (Fig. 3A RHS), where each node represents an end of a stem; blue edges represent regions of double-stranded RNA of length l_i; red edges represent regions of single-stranded RNA of length s_i. For example, here, s₃ = 5 ntds and l₁ = 3 ntds. We set the origin of our coordinate system to node 0 and call the distance between node i and the origin r_i. Integrating over the possible placements of nodes 1–3 (while including the constraints of the structure in the integrand as described previously), we obtain a Gaussian integral formulation of the entropy (Eq. (7)), in which, using the assumption s ≫ b, we allow the integrals to extend over all of space. A more comprehensive derivation of this formula, including the origin of the v_s terms, can be found in Section S4. This integral can be calculated analytically (Sec. S5) [34].
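To see why such integrals are analytically tractable, consider a simplifying assumption that Eq. (7) itself does not make: every stem is collapsed to a point (the l → 0 limit the text uses for an isolated base pair), and the v_s factors are set aside. The graph integral then becomes purely Gaussian. With node 0 fixed at the origin, ∫ ∏_e P(r_a − r_b; s_e) ∏_{i≥1} d³r_i = ∏_e (α_e/π)^{3/2} · π^{3(n−1)/2} · det(L₀)^{−3/2}, where α_e = 3/(2 s_e b) and L₀ is the weighted graph Laplacian with node 0's row and column removed. The Python sketch below implements this formula; the function name is hypothetical.

```python
import numpy as np


def log_gaussian_network_integral(n_nodes, ss_edges, b=2.4):
    """log of the integral over nodes 1..n-1 (node 0 at the origin) of prod_e P(r_a - r_c; s_e).

    Assumes stems are already collapsed to points (l -> 0) and omits v_s factors.
    ss_edges: list of (a, c, s) with node indices in range(n_nodes).
    """
    L = np.zeros((n_nodes, n_nodes))
    log_prefactor = 0.0
    for a, c, s in ss_edges:
        alpha = 3.0 / (2.0 * s * b)            # P(r; s) = (alpha/pi)^{3/2} exp(-alpha r^2)
        log_prefactor += 1.5 * np.log(alpha / np.pi)
        if a != c:                             # self-loops have no r-dependence
            L[a, a] += alpha
            L[c, c] += alpha
            L[a, c] -= alpha
            L[c, a] -= alpha
    if n_nodes == 1:
        return log_prefactor
    sign, logdet = np.linalg.slogdet(L[1:, 1:])  # reduced Laplacian (node 0 removed)
    if sign <= 0:
        raise ValueError("graph must be connected")
    # Each of the 3 Cartesian directions contributes pi^{(n-1)/2} det(L0)^{-1/2}.
    return log_prefactor + 1.5 * ((n_nodes - 1) * np.log(np.pi) - logdet)
```

In this limit, a hairpin reduces to one node with a self-loop and reproduces the closed-loop Gaussian factor (3/(2πsb))^{3/2}. An H-type pseudoknot reduces to two nodes joined by three single-stranded edges, for example `log_gaussian_network_integral(2, [(0, 1, s1), (1, 0, s2), (0, 1, s3)])`.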

FIG. 3: RNA Feynman Diagrams.

(A) The canonical pseudoknot: an instance of the canonical H-type pseudoknot. Bold lines represent the RNA backbone; thin lines represent hydrogen bonds. The entropy of this structure can be calculated by converting it to a graph, as shown in the RHS of the panel. The nodes of the graph represent the first and last base pairs of each stem, and two types of edges represent single- and double-stranded RNA. The graph directly represents the integral in Eq. (7). (B) Graph decomposition: the entropy of a sample RNA structure (top left) can be computed by converting the structure to a graph as defined in the text (top right). The graph directly represents the integrals necessary to compute the entropy. Separable integrals are represented by graphs which can be disconnected by the removal of any one edge (bottom right). Thus, once appropriate factors of v_s are included (one for each stem in the original structure), the entropy of the structure in question is given by (bottom left) the sum of four closed-nets-0 (originating from the three hairpins and multiloop) and one open-net-0.

Graphs that can be disconnected by the removal of any one edge correspond to separable integrals, and thus to distinct motifs in the RNA structure. The decomposition of a structure into its component graphs is depicted in Fig. 3B for a classical cloverleaf RNA. The RNA in question decomposes into four instances of closed-net-0 (originating from the three hairpins and multiloop) and one instance of an open-net-0, or free chain (which by definition does not affect the entropy). As shown in the figure, once appropriate factors of v_s are included in the integrals (one for each stem) the stems can be treated as having negligible width; thus, nodes which can be removed without changing the topology can be removed in the graph decomposition process. See Section S4 for further discussion.
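Finding the edges whose removal disconnects a graph is the classic bridge-finding problem. The sketch below uses Tarjan's depth-first low-link method. It tracks edge indices rather than parent nodes, so that parallel edges and self-loops (such as the self-loop a hairpin contributes) are handled correctly. The edge-list encoding and the example graphs in the checks are our own illustration, not taken from the figures.

```python
from collections import defaultdict


def find_bridges(n_nodes, edges):
    """Return sorted indices of edges whose removal disconnects the (multi)graph."""
    adj = defaultdict(list)
    for idx, (a, b) in enumerate(edges):
        adj[a].append((b, idx))
        if a != b:
            adj[b].append((a, idx))
    disc, low, bridges = {}, {}, []
    timer = [0]

    def dfs(u, parent_edge):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v, idx in adj[u]:
            if idx == parent_edge:       # skip only the tree edge we arrived by
                continue
            if v in disc:                # back edge, parallel edge, or self-loop
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:     # no path back around edge idx
                    bridges.append(idx)

    for u in range(n_nodes):
        if u not in disc:
            dfs(u, None)
    return sorted(bridges)
```

Removing the bridges leaves the irreducible pieces. In a two-hairpin structure, for example, each stem edge and the single-stranded linker are bridges, leaving two self-loops, that is, two closed-net-0 motifs. An H-type pseudoknot graph has no bridges and stays one inseparable integral.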

In Fig. S2 we display all possible graphs of up to two stems and their respective RNA structures. As in Fig. 3, single-stranded edges are displayed in red and double-stranded edges in blue. For each graph, the integral formulation of its entropy is displayed in the figure alongside what it evaluates to.

IV. COMPARISON WITH PUBLISHED TOOLS

We use experimentally determined structures to compare the predictions of our model with other current methods; results are shown in Fig. 4. For sequences of length ≤ 80 ntds from the RNAStrand [53], PseudoBase++ [54], and CompaRNA [55] databases (186 non-pseudoknotted structures with 58 different topologies; 235 pseudoknotted structures with 52 different topologies) which had a sequence dissimilarity ≥ 0.2 (using Jukes-Cantor), we measured the number of base pairs correctly predicted by our algorithm’s MFE structure and compared it to fourteen other current algorithms. Seven of these cannot predict pseudoknots and serve as useful benchmarks for the non-pseudoknotted results (detailed methods in Section S1).

While the entropy model presented here can give an integral expression for arbitrarily complex pseudoknots, the integral may need to be solved numerically for sufficiently complex structures. For this large-scale comparison, we disallowed pseudoknots more complex than those displayed in Fig. S2, and our algorithm therefore did not require any numerical integration. We similarly disallowed parallel stems, which can be stable in neutral and acidic pH conditions [73]. We also set the minimum stem length for each sequence (m) to the smallest value for which the total number of possible stems is less than a fixed threshold (150 stems for the results in Fig. 4). These choices were all made to speed up computation time; each sequence took between several seconds and ~1 hour to run on a 2012 MacBook Pro laptop. Details of the computation time of our algorithm can be found in Fig. S4.

FIG. 4: Summary statistics for comparison to other prediction tools.

To assess the relative success of our algorithm, we compare its performance to that of 14 other current prediction tools: RNAFold [29, 67], ViennaRNA (Andronescu parameters) [68], Mfold [28], CONTRAFold [69], PPfold [70], CentroidFold [71], ContextFold [72], HotKnots (Dirks-Pierce parameters), HotKnots (Rivas-Eddy parameters), HotKnots (Cao-Chen parameters) [51], ProbKnot [37], pknots [39], RNAPKplex [29, 67], and ILM [35]. We measure sensitivity, PPV, the fraction of topologies predicted correctly by the MFE structure, the average per-base topology accuracy (defined in the main text), and the proportion of the time the MFE structure contains a pseudoknot. We separate the results for sequences which form into pseudoknots and those which don’t. Error bars show the standard error. Despite the fact that our algorithm requires only two parameters to describe the entropy of any arbitrary secondary structure (at least an order of magnitude – and often several – fewer than the other algorithms tested against), and that the parameters were trained on non-pseudoknotted structures, our algorithm outperforms the other algorithms tested in predicting pseudoknotted structures, and performs on par with them in predicting non-pseudoknotted structures. See main text for further discussion.

While these practical constraints were chosen to speed up the computation time, they also led to errors in the algorithm’s predictions. 64 of the tested pseudoknots were topologically more complex than any of those presented in Fig. S2. Furthermore, 33 of the non-pseudoknotted sequences tested (and 8 of the pseudoknotted) include base pairs outside of those allowed by the algorithm (A·U, G·C, and G·U). Removing such structures from our comparison analysis leads to our algorithm performing even better compared to current tools (see Fig. S3).

Further errors were due to our choice of m, which was not optimized and was too high compared to the length of the shortest stem in the experimental structure for 58 non-pseudoknotted cases and 54 pseudoknotted. By increasing the maximum allowed number of stems from 150 to 200, these numbers decreased to 46 for both pseudoknotted and non-pseudoknotted sequences, but the results with the higher threshold were practically identical to those of Fig. 4 (see full results in Supplementary Table S1). For larger thresholds, the computation time increased significantly (to ~17 hours for one sequence).

The sensitivity, TP/(TP + FN), and PPV, TP/(TP + FP), of our algorithm were measured to be 0.80 and 0.75, respectively, for the non-pseudoknotted cases, and 0.75 and 0.76, respectively, for the pseudoknotted cases. Our algorithm outperformed all other prediction tools tested in the prediction of pseudoknots, and performed on par with other tools in the prediction of non-pseudoknotted sequences. The full results can be found in Supplementary Table S1.
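Both metrics can be computed from sets of base pairs, as in the sketch below. The sketch counts only exact (i, j) matches as true positives; the source does not state whether shifted ("slipped") pairs were credited, and the function name is hypothetical.

```python
def sensitivity_ppv(predicted, reference):
    """Return (TP/(TP+FN), TP/(TP+FP)) for sets of (i, j) base pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # pairs in both structures
    fp = len(predicted - reference)   # predicted but not observed
    fn = len(reference - predicted)   # observed but not predicted
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, ppv
```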

While sensitivity and PPV are the most common metrics used to establish the success of an RNA prediction algorithm [74], we sought to develop a test that measures success on the scale of the full RNA, rather than on the scale of individual base pairs. To this end, we measured how frequently each algorithm was able to correctly predict the topology of the experimentally measured structure, where the topology of a structure is defined by its graph (Section III). We found for our algorithm that the experimental topology is within the top 1, 5, and 10 topologies at frequencies of (49%, 65%, and 70%) for nonpseudoknotted structures, and (34%, 59%, and 62%) for pseudoknotted, demonstrating a sharp increase between top 1 and top 5, and a plateau between top 5 and top 10.

Considering whether an algorithm correctly predicts the full topology can lead to errors arising from small variations in structure. For example, the opening of a single bond on the edge of a stem can lead to a different topology as we have defined it, if that stem includes one of the ends of the molecule. In order to arrive at a per-base measure of topology, we consider for each bond along the RNA backbone to which of the minimal graphs of Fig. S2 it belongs. For example, the bond between the second and third nucleotides of Fig. 3A belongs to a stem of an open-net-2a graph. We then measure for each sequence the fraction of correct per-base topology predictions made by each algorithm’s predicted MFE structure. We find that our algorithm averages a 76% per-base topology prediction accuracy for non-pseudoknotted sequences, and a 49% accuracy for pseudoknotted.

Finally, we compare how frequently each algorithm predicts an MFE structure containing a pseudoknot. Our algorithm correctly predicted pseudoknots in 174/235 of the pseudoknotted cases, far more than any other algorithm tested. However, it also erroneously predicted pseudoknots in 35/186 of the non-pseudoknotted cases. We have found that the probability of predicting pseudoknots can be significantly decreased with minor changes to the Turner-parameter energy function, and these parameters may need to be re-examined in order to be used most effectively with the entropy model presented here.

Our algorithm also provides the probability of folding into a pseudoknotted structure for each sequence. These data for the 421 sequences tested are presented in Fig. 5. Each datapoint represents a different sequence and the total probability calculated of that sequence folding into a pseudoknotted structure. For figure clarity, a lower bound of pseudoknot probability was set at 2 × 10⁻¹⁰.
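Given the enumerated structures and their Boltzmann probabilities, the total pseudoknot probability is a sum over the structures that contain at least one pair of crossing base pairs, i < k < j < l. A minimal sketch follows, assuming structures are collections of (i, j) pairs with i < j. The floor of 2 × 10⁻¹⁰ mentioned above is only a plotting choice and is not applied here.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """True if any two base pairs (i, j), (k, l) cross, i.e. i < k < j < l."""
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            return True
    return False


def pseudoknot_probability(structures, probabilities):
    """Sum the equilibrium probabilities of all pseudoknotted structures."""
    return sum(p for s, p in zip(structures, probabilities) if has_pseudoknot(s))
```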

FIG. 5: Probability of folding into a pseudoknot.

The predicted probability of each of the 421 sequences tested folding into a pseudoknot is presented. Of these sequences, 186 were experimentally found not to form pseudoknots (blue) and 235 were found to form pseudoknots (red). Our algorithm successfully predicts pseudoknots forming in the latter category far more frequently than in the former. For figure clarity, a lower bound of pseudoknot probability was set at 2 × 10⁻¹⁰.

The algorithm’s predictions for the six longest RNA molecules less than 89 ntds in length from the Pseudobase++ database are presented in Fig. 6. We considered only those sequences whose structure was directly supported by experiments and which could be decomposed into the minimal topologies shown in Fig. S2. We display the experimental structure (green background) alongside the MFE predicted structure (light blue background) and the top six predicted topologies (out of several hundred, depending on the sequence; dark blue) where the experimental topology is highlighted (purple). RNA secondary structure was plotted using the PseudoViewer package [83]. Our results demonstrate successful predictions even for long pseudoknotted sequences, especially in terms of the predicted topology. Detailed methods are provided in Section S1.

FIG. 6: Comparison to experiments for long sequences.

Six long sequences were chosen from the Pseudobase++ database as described in the main text. The sequences are derived from (starting from the top left and moving across): tobacco mosaic virus [75–77], Bacillus subtilis [78], tobacco mild green mosaic virus [76, 79], Bacillus subtilis [80], Giardiavirus [81], and Visna-Maedi virus [82]. We show the experimental structure (green background) and the MFE predicted structure (light blue background) plotted using the PseudoViewer software [83]. We also display the top six topologies (out of several hundred, depending on the particular sequence) and their respective predicted probabilities, with the topology corresponding to the experimental structure highlighted in purple. Overall, our results demonstrate successful predictions even for these long pseudoknotted sequences, especially in terms of the predicted topology.

V. DISCUSSION AND CONCLUSIONS

The accurate prediction of the ensemble of secondary structures explored by an RNA or DNA molecule has played a major role in shaping modern molecular biology and DNA nanotechnology over the past several decades. In this work, we showed that the modern ubiquity of extremely powerful computers can be used alongside novel polymer physics techniques to completely enumerate the free energy landscape of an RNA molecule including complex pseudoknots. This NP-complete algorithm can be used to tackle even relatively long (~ 90 ntds) RNA sequences, and aside from the enumeration procedure (which is relatively fast for long sequences; see Fig. S4) is easily parallelizable.

Remarkably, the entropy model discussed in this work requires only two parameters – orders of magnitude fewer than other current algorithms – corresponding to clearly measurable physical quantities. Despite this, and despite the fact that all parameters used in our model were derived using experiments on non-pseudoknotted RNA, our algorithm is more successful in predicting pseudoknotted structures than any of the other algorithms tested, and on par with all predictors tested in predicting non-pseudoknotted structures. Although we have not done so in this work, we expect that our results can be even further improved by optimizing the energy function given the entropy model presented here. The success of our algorithm is particularly notable given that the entropy model developed in this work can be used to address any RNA secondary structure regardless of complexity.

The algorithm presented here can also be easily generalized to multiple interacting strands (see discussion in the supplement). The sequences considered can be any combination of DNA and RNA; their identities affect the model's energy parameters, which have been tabulated previously, and, to a lesser extent, the two entropy parameters (b and v_s).
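For intuition about how two physical parameters can set a loop entropy, the sketch below evaluates the textbook Gaussian-chain estimate for closing a single loop, in the spirit of Jacobson and Stockmayer [65]. It assumes the loop behaves as an ideal chain of n segments of length b, and that v_s is the small volume within which the two chain ends must meet to pair; the closure probability is then approximately v_s (3 / (2π n b²))^{3/2}, and the entropy change is k_B times its logarithm. This covers only the single-loop limit: it is not the paper's integral formulation for pseudoknots, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def loop_closure_probability(n_segments, b, v_s):
    """Probability that the two ends of an ideal Gaussian chain meet within v_s.

    Assumes n_segments freely jointed segments of length b (in the same length
    unit as v_s ** (1/3)) and v_s small compared with the coil volume, so the
    end-to-end density can be evaluated at zero separation:
        P ~= v_s * (3 / (2 * pi * n * b**2)) ** 1.5
    """
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    return v_s * (3.0 / (2.0 * math.pi * n_segments * b * b)) ** 1.5


def loop_entropy(n_segments, b, v_s):
    """Entropy change (J/K per molecule) of closing the loop: k_B * ln(P)."""
    return K_B * math.log(loop_closure_probability(n_segments, b, v_s))
```

The resulting entropy falls off as -(3/2) k_B ln n with loop size, the familiar Jacobson-Stockmayer scaling, while b and v_s only shift it by constants.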

Our finding that the integral formulation of the entropy of arbitrarily complex RNA secondary structures can be represented graphically is reminiscent of Feynman diagrams in quantum field theory. The topologies defined by these graphs can also serve as useful biological constructs for grouping similar RNA structures. The depiction of RNA structure as a graph has played an important role in the prediction of RNA secondary structure [84–87], in the search for novel RNAs [88, 89], and in describing the similarity between RNA structures [90–93], which is especially useful in studying the effects of mutations [94, 95]. A common approach among these graphical depictions of RNA has been to represent loops (e.g. hairpins and internal loops) as vertices and stems as edges [88, 92, 93]. However, this depiction does not always distinguish between pseudoknotted and non-pseudoknotted structures [88]. Other approaches have represented each nucleotide as a separate node and each bond (hydrogen or covalent) as an edge [89, 91]; while useful in many contexts (for example, secondary structure visualization), this approach lacks the coarse-graining that would group similar structures under the same graph [90]. Our approach, described in Section III, can be viewed as a middle ground and may be useful in the contexts described above.
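The difference between loop-level and nucleotide-level descriptions can be illustrated with two simple checks on base pairs. The sketch below tests whether any two pairs cross (i < k < j < l), which is what makes a structure pseudoknotted, and builds a coarse stem-order string (e.g. "ABAB" for a simple H-type pseudoknot, "ABBA" for nested stems) that groups structures sharing the same arrangement of stems. The function names and the letter encoding are illustrative assumptions; this is not the graphical representation of Section III.

```python
from itertools import combinations


def crossing(pair_a, pair_b):
    """True if base pairs (i, j) and (k, l) cross, i.e. i < k < j < l or k < i < l < j."""
    i, j = sorted(pair_a)
    k, l = sorted(pair_b)
    return i < k < j < l or k < i < l < j


def is_pseudoknotted(pairs):
    """A structure is pseudoknotted if any two of its base pairs cross."""
    return any(crossing(a, b) for a, b in combinations(pairs, 2))


def stem_order_label(stems):
    """Read stem letters along the sequence (5' to 3').

    Each stem is a list of (i, j) pairs, represented by its outermost 5' and 3'
    positions. Letters are assigned in 5' order; assumes at most 26 stems and
    that stems share no nucleotides.
    """
    spans = sorted((min(min(p) for p in s), max(max(p) for p in s)) for s in stems)
    ends = []
    for index, (five, three) in enumerate(spans):
        letter = chr(ord("A") + index)
        ends.append((five, letter))
        ends.append((three, letter))
    return "".join(letter for _, letter in sorted(ends))
```

Two structures with stems of different lengths or positions but the same letter string fall into the same coarse-grained class, whereas a nucleotide-level graph would treat them as distinct.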

VI. ACKNOWLEDGEMENTS

We thank Elena Rivas and Yohai Bar Sinai for fruitful discussions. This research was funded by the National Science Foundation through the Harvard Materials Research Science and Engineering Center Grant DMR-1420570, DMREF Grant DMR-123869 and ONR Grant N00014-17-1-3029. OK acknowledges support from an NDSEG fellowship and Molecular Biophysics Training Grant NIH/NIGMS T32 GM008313 (PI: James M. Hogle). M.P.B. is an investigator of the Simons Foundation.

Footnotes

* Electronic address: okimchi{at}g.harvard.edu
† Electronic address: ljc37{at}cam.ac.uk
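1 More generally, we can define a probability $p(\mathbf{r})$ of a nucleotide at the origin being base paired with a nucleotide a vector $\mathbf{r}$ away. Then, v_s is defined as $v_s = \int p(\mathbf{r})\, d^3\mathbf{r}$, and r_s is the value of $|\mathbf{r}|$ for which $p(\mathbf{r})$ is non-negligible.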
2 A more precise definition of v_s might include a dependence on the closing base pairs of the hairpin loop; we expect that the penalties placed on specific closing base pairs and first mismatches in e.g. Refs. [64] and [25] play a similar role, though such penalties were not included here.

References

[1] Philipp Kapranov, Jill Cheng, Sujit Dike, David A. Nix, Radharani Duttagupta, Aarron T. Willingham, Peter F. Stadler, Jana Hertel, Jörg Hackermüller, Ivo L. Hofacker, Ian Bell, Evelyn Cheung, Jorg Drenkov, Erica Dumais, Sandeep Patel, Gregg Helt, Madhavan Ganesh, Srinka Ghosh, Antonio Piccolboni, Victor Sementchenko, Hari Tammana, and Thomas R. Gingeras. RNA Maps Reveal New RNA Classes and a Possible Function for Pervasive Transcription. Science, 316(5830):1484–1488, 2007.
[2] Yukinori Okada, Tomoki Muramatsu, Naomasa Suita, Masahiro Kanai, Eiryo Kawakami, Valentina Iotchkova, Nicole Soranzo, Johji Inazawa, and Toshihiro Tanaka. Significant impact of miRNA-target gene networks on genetics of human complex traits. Scientific Reports, 6:1–9, 2016.
[3] Bharat Sridhar, Marcelo Rivas-Astroza, Tri C. Nguyen, Weizhong Chen, Zhangming Yan, Xiaoyi Cao, Lucie Hebert, and Sheng Zhong. Systematic Mapping of RNA-Chromatin Interactions In Vivo. Current Biology, 27(4):602–609, 2017.
[4] F. Butter, M. Scheibe, M. Morl, and M. Mann. Unbiased RNA-protein interaction screen by quantitative proteomics. Proceedings of the National Academy of Sciences, 106(26):10626–10631, 2009.
[5] Stefan E Seemann, Susan M Sunkin, Michael J Hawrylycz, Walter L Ruzzo, and Jan Gorodkin. Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain. BMC Genomics, 13(214), 2012.
[6] Tim R. Mercer, Marcel E. Dinger, and John S. Mattick. Long non-coding RNAs: Insights into functions. Nature Reviews Genetics, 10(3):155–159, 2009.
[7] Michael T. McManus and Phillip A. Sharp. Gene silencing in mammals by small interfering RNAs. Nature Reviews Genetics, 3(10):737–747, 2002.
[8] Celina Juliano, Jianquan Wang, and Haifan Lin. Uniting Germline and Stem Cells: The Function of Piwi Proteins and the piRNA Pathway in Diverse Organisms. Annual Review of Genetics, 45(1):447–469, 2011.
[9] Andrew D. Ellington and Jack W. Szostak. In vitro selection of RNA molecules that bind specific ligands. Nature, 346:818–822, 1990.
[10] C Tuerk and L Gold. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science, 249(4968):505–510, 1990.
[11] Debra L. Robertson and Gerald F. Joyce. Selection in vitro of an RNA enzyme that specifically cleaves single-stranded DNA. Nature, 344(6265):467–468, 1990.
[12] Charles Olea and Gerald F. Joyce. Real-Time detection of a self-replicating RNA Enzyme. Molecules, 21(10):1–12, 2016.
[13] Miriam H. Huntley, Arvind Murugan, and Michael P. Brenner. Information capacity of specific interactions. Proceedings of the National Academy of Sciences, 113(21):5841–5846, 2016.
[14] Vera Pancaldi and Jürg Bähler. In silico characterization and prediction of global protein-mRNA interactions in yeast. Nucleic Acids Research, 39(14):5826–5836, 2011.
[15] Jiamin Xiao, Yizhou Li, Kelong Wang, Zhining Wen, Menglong Li, Lifang Zhang, and Xuanmin Guang. In silico method for systematic analysis of feature importance in microRNA-mRNA interactions. BMC Bioinformatics, 10:1–13, 2009.
[16] Nancy Martínez-Montiel, Laura Morales-Lara, Julio M. Hernández-Pérez, and Rebeca D. Martínez-Contreras. In silico analysis of the structural and biochemical features of the NMD factor UPF1 in Ustilago maydis. PLoS ONE, 11(2):1–26, 2016.
[17] P O Ilyinskii, T Schmidt, D Lukashev, A B Meriin, G Thoidis, D Frishman, and A M Shneider. Importance of mRNA secondary structural elements for the expression of influenza virus genes. Omics, 13(5):421–430, 2009.
[18] R. A. Poot, N. V. Tsareva, I. V. Boni, and J. van Duin. RNA folding kinetics regulates translation of phage MS2 maturation gene. Proceedings of the National Academy of Sciences, 94(19):10110–10115, 1997.
[19] Maximillian H Bailor, Xiaoyan Sun, and Hashim M. Al-Hashimi. Topology links RNA secondary structure with global conformation, dynamics, and adaptation. Science, 327(5962):202–206, 2010.
[20] J M Pipas and J E McMahon. Method for predicting RNA secondary structure. Proceedings of the National Academy of Sciences, 72(6):2017–2021, 1975.
[21] Michael S. Waterman. Secondary Structure of Single-Stranded Nucleic Acids. Studies in Foundations and Combinatorics, Advances in Mathematics Supplementary Studies, 1:167–212, 1978.
[22] Michael S. Waterman and Temple F. Smith. Rapid dynamic programming algorithms for RNA secondary structure. Advances in Applied Mathematics, 7(4):455–464, 1986.
[23] Ruth Nussinov, George Pieczenik, Jerrold R. Griggs, and Daniel J. Kleitman. Algorithms for Loop Matchings. SIAM Journal on Applied Mathematics, 35(1):68–82, 1978.
[24] Michael Zuker and Patrick Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9(1):133–148, 1981.
[25] Martin J. Serra and Douglas H. Turner. Predicting thermodynamic properties of RNA. Methods in Enzymology, 259:242–261, 1995.
[26] David H. Mathews and Douglas H. Turner. Prediction of RNA secondary structure by free energy minimization. Current Opinion in Structural Biology, 16(3):270–278, 2006.
[27] C. E. Hajdin, S. Bellaousov, W. Huggins, C. W. Leonard, D. H. Mathews, and K. M. Weeks. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proceedings of the National Academy of Sciences, 110(14):5498–5503, 2013.
[28] Michael Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13):3406–3415, 2003.
[29] I. L. Hofacker, W. Fontana, P. F. Stadler, L. S. Bonhoeffer, M. Tacker, and P. Schuster. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie, 125(2):167–188, 1994.
[30] Raheleh Salari, Chava Kimchi-Sarfaty, Michael M. Gottesman, and Teresa M. Przytycka. Sensitive measurement of single-nucleotide polymorphism-induced changes of RNA conformation: Application to disease studies. Nucleic Acids Research, 41(1):44–53, 2013.
[31] Rune B. Lyngsø and Christian N. S. Pedersen. Pseudoknots in RNA Secondary Structures. Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, pages 201–209, 2000.
[32] Rune B. Lyngsø and Christian N. S. Pedersen. RNA Pseudoknot Prediction in Energy-Based Models. Journal of Computational Biology, 7(3-4):409–427, 2000.
[33] Biao Liu, David H Mathews, and Douglas H Turner. RNA pseudoknots: folding and finding. F1000 Biology Reports, 5(January):1–5, 2010.
[34] H. Isambert and E. D. Siggia. Modeling RNA folding paths with pseudoknots: Application to hepatitis delta virus ribozyme. Proceedings of the National Academy of Sciences, 97(12):6515–6520, 2000.
[35] Jianhua Ruan, Gary D. Stormo, and Weixiong Zhang. An Iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics, 20(1):58–66, 2004.
[36] Jihong Ren, Baharak Rastegari, Anne Condon, and Holger H Hoos. HotKnots: Heuristic prediction of RNA secondary structures including pseudoknots. RNA, 11(1):1494–1504, 2005.
[37] S. Bellaousov and D. H. Mathews. ProbKnot: Fast prediction of RNA secondary structure including pseudoknots. RNA, 16(10):1870–1880, 2010.
[38] Kengo Sato, Yuki Kato, Michiaki Hamada, Tatsuya Akutsu, and Kiyoshi Asai. IPknot: Fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics, 27(13):85–93, 2011.
[39] Elena Rivas and Sean R. Eddy. A dynamic programming algorithm for RNA structure prediction including pseudoknots. Journal of Molecular Biology, 285(5):2053–2068, 1999.
[40] Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, and Takashi Yokomori. Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science, 210(2):277–303, 1999.
[41] Tatsuya Akutsu. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Applied Mathematics, 104(1-3):45–62, 2000.
[42] Anne Condon, Beth Davy, Baharak Rastegari, Shelly Zhao, and Finbarr Tarrant. Classifying RNA pseudoknotted structures. Theoretical Computer Science, 320(1):35–50, 2004.
[43] Robert M Dirks and Niles A. Pierce. A partition function algorithm for nucleic acid secondary structure including pseudoknots. Journal of Computational Chemistry, 24(13):1664–1677, 2003.
[44] Jens Reeder and Robert Giegerich. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics, 5:1–12, 2004.
[45] Song Cao and Shi Jie Chen. Predicting RNA pseudoknot folding thermodynamics. Nucleic Acids Research, 34(9):2634–2652, 2006.
[46] Song Cao and Shi-Jie Chen. Predicting structures and stabilities for H-type pseudoknots with interhelix loops. RNA, 15(4):696–706, 2009.
[47] F H van Batenburg, A P Gultyaev, C W Pleij, J Ng, and J Oliehoek. PseudoBase: a database with RNA pseudoknots. Nucleic Acids Research, 28(1):201–204, 2000.
[48] Daniel P. Aalberts and Nathan O. Hodas. Asymmetry in RNA pseudoknots: Observation and theory. Nucleic Acids Research, 33(7):2210–2214, 2005.
[49] Adam Lucas and Ken A. Dill. Statistical mechanics of pseudoknot polymers. Journal of Chemical Physics, 119(4):2414–2421, 2003.
[50] D H Mathews, J Sabina, M Zuker, and D H Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288:911–940, 1999.
[51] Mirela S Andronescu, Cristina Pop, and Anne E Condon. Improved free energy parameters for RNA pseudoknotted secondary structure prediction. RNA, 16(1):26–42, 2010.
[52] A. Xayaphoummine, T. Bucher, and Herve Isambert. Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Research, 33(SUPPL. 2):605–610, 2005.
[53] Mirela Andronescu, Vera Bereg, Holger H. Hoos, and Anne Condon. RNA STRAND: The RNA secondary structure and statistical analysis database. BMC Bioinformatics, 9:1–10, 2008.
[54] Michela Taufer, Abel Licon, Roberto Araiza, David Mireles, F. H D van Batenburg, Alexander P. Gultyaev, and Ming Ying Leung. PseudoBase++: An extension of PseudoBase for easy searching, formatting and visualization of pseudoknots. Nucleic Acids Research, 37(SUPPL. 1):127–135, 2009.
[55] Tomasz Puton, Lukasz P. Kozlowski, Kristian M. Rother, and Janusz M. Bujnicki. CompaRNA: A server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Research, 41(7):4307–4323, 2013.
[56] Gary M Studnicka, Georgia M Rahn, Ian W Cummings, and Winston A Salser. Computer method for predicting the secondary structure of single-stranded RNA. Nucleic Acids Research, 5(9):3365–3388, 1978.
[57] Michael Zuker and D Sankoff. RNA secondary structures and their prediction. Bulletin of Mathematical Biology, 46(4):591–621, 1984.
[58] William Bialek and Rama Ranganathan. Rediscovering the power of pairwise interactions. arXiv, 2007.
[59] Tianbing Xia, John SantaLucia, Mark E. Burkard, Ryszard Kierzek, Susan J. Schroeder, Xiaoqi Jiao, Christopher Cox, and Douglas H. Turner. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry, 37(42):14719–14735, 1998.
[60] Tianbing Xia, David H. Mathews, and Douglas H. Turner. Thermodynamics of RNA Secondary Structure Formation. In Dieter Soll, Susumu Nishimura, and Peter B. Moore, editors, RNA, chapter 2, pages 21–48. Pergamon, 1st edition, 2001.
[61] Douglas H. Turner and David H. Mathews. NNDB: The nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Research, 38(SUPPL.1):2009–2011, 2009.
[62] S. B. Smith, Y. Cui, and C. Bustamante. Overstretching B-DNA: The Elastic Response of Individual Double-Stranded and Single-Stranded DNA Molecules. Science, 271(5250):795–799, 1996.
[63] J. A. Abels, F. Moreno-Herrero, T. Van Der Heijden, C. Dekker, and Nynke H. Dekker. Single-molecule measurements of the persistence length of double-stranded RNA. Biophysical Journal, 88(4):2737–2744, 2005.
[64] D. H. Mathews, M. D. Disney, J. L. Childs, S. J. Schroeder, M. Zuker, and D. H. Turner. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proceedings of the National Academy of Sciences, 101(19):7287–7292, 2004.
[65] Homer Jacobson and Walter H. Stockmayer. Intramolecular reaction in polycondensations. I. The theory of linear systems. The Journal of Chemical Physics, 18(12):1600–1606, 1950.
[66] Zhi John Lu, Douglas H. Turner, and David H. Mathews. A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Research, 34(17):4912–4924, 2006.
[67] Ronny Lorenz, Stephan H. Bernhart, Christian Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F. Stadler, and Ivo L. Hofacker. ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1):1–14, 2011.
[68] Mirela Andronescu, Anne Condon, Holger H. Hoos, David H. Mathews, and Kevin P. Murphy. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics, 23(13):19–28, 2007.
[69] Chuong B. Do, Daniel A. Woods, and Serafim Batzoglou. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22(14):90–98, 2006.
[70] Zsuzsanna Sükösd, Bjarne Knudsen, Jorgen Kjems, and Christian N.S. Pedersen. PPfold 3.0: Fast RNA secondary structure prediction using phylogeny and auxiliary data. Bioinformatics, 28(20):2691–2692, 2012.
[71] Kengo Sato, Michiaki Hamada, Kiyoshi Asai, and Toutai Mituyama. CentroidFold: A web server for RNA secondary structure prediction. Nucleic Acids Research, 37(SUPPL. 2):277–280, 2009.
[72] S Zakov, Y Goldberg, M Elhadad, and M Ziv-Ukelson. Rich parameterization improves RNA structure prediction. Journal of Computational Biology, 18(11):1525–1542, 2011.
[73] V Rani Parvathy, Sukesh R Bhaumik, Kandala V R Chary, Girjesh Govil, Keliang Liu, Frank B Howard, and H Todd Miles. NMR structure of a parallel-stranded DNA duplex at atomic resolution. Nucleic Acids Research, 30(7):1500–1511, 2002.
[74] Z. J. Lu, J. W. Gloor, and D. H. Mathews. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA, 15(10):1805–1813, 2009.
[75] K Rietveld, K Linschooten, C W Pleij, and L Bosch. The three-dimensional folding of the tRNA-like structure of tobacco mosaic virus RNA. A new building principle applied twice. The EMBO Journal, 3(11):2613–2619, 1984.
[76] Ruud M.W. Mans, Cornelis W.A. Pleij, and Leendert Bosch. tRNA-like structures: Structure, function and evolutionary significance. European Journal of Biochemistry, 201(2):303–324, 1991.
[77] B Felden, C Florentz, R Giegé, and E Westhof. A central pseudoknotted three-way junction imposes tRNA-like mimicry and the orientation of three 5’ upstream pseudoknots in the 3’ terminus of tobacco mosaic virus RNA. RNA, 2(3):201–212, 1996.
[78] Garrett A. Soukup. Core requirements for glmS ribozyme self-cleavage reveal a putative pseudoknot structure. Nucleic Acids Research, 34(3):968–975, 2006.
[79] Fernando García-Arenal. Sequence and structure at the genome 3’ end of the U2-strain of tobacco mosaic virus, a histidine-accepting tobamovirus. Virology, 167(1):201–206, 1988.
[80] S R Wilkinson and M D Been. A pseudoknot in the 3’ non-core region of the glmS ribozyme enhances self-cleavage activity. RNA, 11(12):1788–1794, 2005.
[81] Srinivas Garlapati and Ching C. Wang. Identification of an essential pseudoknot in the putative downstream internal ribosome entry site in giardiavirus transcript. RNA, 8(5):601–611, 2002.
[82] Simon Pennell, Emily Manktelow, Andrew Flatt, Geoff Kelly, Stephen J Smerdon, and Ian Brierley. The stimulatory RNA of the Visna-Maedi retrovirus ribosomal frameshifting signal is an unusual pseudoknot with an interstem element. RNA, 14(7):1366–1377, 2008.
[83] Yanga Byun and Kyungsook Han. PseudoViewer3: Generating planar drawings of large-scale RNA structures with pseudoknots. Bioinformatics, 25(11):1435–1437, 2009.
[84] Denise R. Koessler, Debra J. Knisley, Jeff Knisley, and Teresa Haynes. A predictive model for secondary RNA structure using graph theory and a neural network. BMC Bioinformatics, 11(SUPPL. 6):1–10, 2010.
[85] Michaël Bon and Henri Orland. TT2NE: A novel algorithm to predict RNA secondary structures with pseudoknots. Nucleic Acids Research, 39(14), 2011.
[86] Henri Orland and A. Zee. RNA folding and large N matrix theory. Nuclear Physics B, 620(3):456–476, 2002.
[87] Jizhen Zhao, Russell L. Malmberg, and Liming Cai. Rapid ab initio prediction of RNA pseudoknots via graph tree decomposition. Journal of Mathematical Biology, 56(1-2):145–159, 2008.
[88] Hin Hark Gan, Samuela Pasquali, and Tamar Schlick. Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic Acids Research, 31(11):2926–2943, 2003.
[89] Christian Laing and Tamar Schlick. Computational approaches to RNA structure prediction, analysis, and design. Current Opinion in Structural Biology, 21(3):306–318, 2011.
[90] C. Haslinger and P. F. Stadler. RNA structures with pseudo-knots: Graph-theoretical, combinatorial, and statistical properties. Bulletin of Mathematical Biology, 61(3):437–467, 1999.
[91] Clara I. Bermúdez, Edgar E. Daza, and Eugenio Andrade. Characterization and comparison of Escherichia coli transfer RNAs by graph theory based on secondary structure. Journal of Theoretical Biology, 197(2):193–205, 1999.
[92] Giorgio Benedetti and Stefano Morosetti. A graph-topological approach to recognition of pattern and similarity in RNA secondary structures. Biophysical Chemistry, 59(1-2):179–184, 1996.
[93] Shu Yun Le, Ruth Nussinov, and Jacob V. Maizel. Tree graphs of RNA secondary structures and their comparisons. Computers and Biomedical Research, 22(5):461–473, 1989.
[94] Walter Fontana and Peter Schuster. Continuity in evolution: On the nature of transitions. Science, 280(5368):1451–1455, 1998.
[95] Lauren W Ancel and Walter Fontana. Plasticity, Evolvability and Modularity in RNA. Journal of Experimental Zoology, 288(3):242–283, 2000.
[96] Mathai Mammen, Eugene I. Shakhnovich, John M. Deutch, and George M. Whitesides. Estimating the Entropic Cost of Self-Assembly of Multiparticle Hydrogen-Bonded Aggregates Based on the Cyanuric Acid-Melamine Lattice. Journal of Organic Chemistry, 63(12):3821–3830, 1998.
[97] Huan-xiang Zhou and Michael K Gilson. Theory of Free Energy and Entropy in Noncovalent Binding. Chemical Reviews, 109(9):4092–4107, 2009.
[98] Hatim T. Allawi and John SantaLucia. Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry, 36(34):10581–10594, 1997.
[99] Hatim T. Allawi and John SantaLucia. Nearest neighbor thermodynamic parameters for internal G·A mismatches in DNA. Biochemistry, 37(8):2170–2179, 1998.
[100] Hatim T. Allawi and John SantaLucia. Thermodynamics of internal C·T mismatches in DNA. Nucleic Acids Research, 26(11):2694–2701, 1998.
[101] Hatim T. Allawi and John SantaLucia. Nearest-neighbor thermodynamics of internal A·C mismatches in DNA: Sequence dependence and pH effects. Biochemistry, 37(26):9435–9444, 1998.
[102] Nicolas Peyret, P. Ananda Seneviratne, Hatim T. Allawi, and John SantaLucia. Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A·A, C·C, G·G, and T·T mismatches. Biochemistry, 38(12):3468–3477, 1999.
[103] Naoki Sugimoto, Shu-ichi Nakano, Misa Katoh, Akiko Matsumura, Hiroyuki Nakamuta, Tatsuo Ohmichi, Mari Yoneyama, and Muneo Sasaki. Thermodynamic Parameters To Predict Stability of RNA/DNA Hybrid Duplexes. Biochemistry, 34(35):11211–11216, 1995.
[104] Norman E. Watkins, William J. Kennelly, Mike J. Tsay, Astrid Tuin, Lara Swenson, Hyung Ran Lee, Svetlana Morosyuk, Donald A. Hicks, and John SantaLucia. Thermodynamic contributions of single internal rA·dA, rC·dC, rG·dG and rU·dT mismatches in RNA/DNA duplexes. Nucleic Acids Research, 39(5):1894–1902, 2011.

View the discussion thread.

Posted June 04, 2018.


Subject Area: Biophysics

Subject Areas

All Articles

Animal Behavior and Cognition (5201)
Biochemistry (11718)
Bioengineering (8724)
Bioinformatics (29132)
Biophysics (14936)
Cancer Biology (12051)
Cell Biology (17360)
Clinical Trials (138)
Developmental Biology (9406)
Ecology (14146)
Epidemiology (2067)
Evolutionary Biology (18269)
Genetics (12223)
Genomics (16768)
Immunology (11844)
Microbiology (28016)
Molecular Biology (11560)
Neuroscience (60822)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3231)
Physiology (4940)
Plant Biology (10401)
Scientific Communication and Education (1680)
Synthetic Biology (2878)
Systems Biology (7333)
Zoology (1642)

[1] [1].↵
Philipp Kapranov, Jill Cheng, Sujit Dike, David A. Nix, Radharani Duttagupta, Aarron T. Willingham, Peter F. Stadler, Jana Hertel, Jörg Hackermüller, Ivo L. Hofacker, Ian Bell, Evelyn Cheung, Jorg Drenkov, Erica Dumais, Sandeep Patel, Gregg Helt, Madhavan Ganesh, Srinka Ghosh, Antonio Piccolboni, Victor Sementchenko, Hari Tammana, and Thomas R. Gingeras. RNA Maps Reveal New RNA Classes and a Possible Function for Pervasive Transcription. Science, 316(5830):1484–1488, 2007.
OpenUrl Abstract/FREE Full Text

[2] [2].↵
Yukinori Okada, Tomoki Muramatsu, Naomasa Suita, Masahiro Kanai, Eiryo Kawakami, Valentina Iotchkova, Nicole Soranzo, Johji Inazawa, and Toshihiro Tanaka. Significant impact of miRNA-target gene networks on genetics of human complex traits. Scientific Reports, 6:1–9, 2016.
OpenUrl

[3] [3].↵
Bharat Sridhar, Marcelo Rivas-Astroza, Tri C. Nguyen, Weizhong Chen, Zhangming Yan, Xiaoyi Cao, Lucie Hebert, and Sheng Zhong. Systematic Mapping of RNA-Chromatin Interactions In Vivo. Current Biology, 27(4):602–609, 2017.
OpenUrl CrossRef

[4] [4].↵
F. Butter, M. Scheibe, M. Morl, and M. Mann. Unbiased RNA-protein interaction screen by quantitative proteomics. Proceedings of the National Academy of Sciences, 106(26):10626–10631, 2009.
OpenUrl Abstract/FREE Full Text

[5] [5].↵
Stefan E Seemann, Susan M Sunkin, Michael J Hawrylycz, Walter L Ruzzo, and Jan Gorodkin. Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain. BMC Genomics, 13(214), 2012.

[6] [6].↵
Tim R. Mercer, Marcel E. Dinger, and John S. Mattick. Long non-coding RNAs: Insights into functions. Nature Reviews Genetics, 10(3):155–159, 2009.
OpenUrl CrossRef PubMed Web of Science

[7] [7].↵
Michael T. McManus and Phillip A. Sharp. Gene silencing in mammals by small interfering RNAs. Nature Reviews Genetics, 3(10):737–747, 2002.
OpenUrl CrossRef PubMed Web of Science

[8] [8].↵
Celina Juliano, Jianquan Wang, and Haifan Lin. Uniting Germline and Stem Cells: The Function of Piwi Proteins and the piRNA Pathway in Diverse Organisms. Annual Review of Genetics, 45(1):447–469, 2011.
OpenUrl CrossRef PubMed Web of Science

[9] [9].↵
Andrew D. Ellington and Jack W. Szostak. In vitro selection of RNA molecules that bind specific ligands. Nature, 346:818–822, 1990.
OpenUrl CrossRef PubMed Web of Science

[10] [10].
C Tuerk and L Gold. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science, 249(4968):505–510, 1990.
OpenUrl Abstract/FREE Full Text

[11] [11].↵
Debra L. Robertson and Gerald F. Joyce. Selection in vitro of an RNA enzyme that specifically cleaves single-stranded DNA. Nature, 344(6265):467–468, 1990.
OpenUrl CrossRef PubMed Web of Science

[12] [12].↵
Charles Olea and Gerald F. Joyce. Real-Time detection of a self-replicating RNA Enzyme. Molecules, 21(10):1–12, 2016.
OpenUrl CrossRef

[13] [13].↵
Miriam H. Huntley, Arvind Murugan, and Michael P. Brenner. Information capacity of specific interactions. Proceedings of the National Academy of Sciences, 113(21):5841–5846, 2016.
OpenUrl Abstract/FREE Full Text

[14] [14].
Vera Pancaldi and Jürg Bäahler. In silico characterization and prediction of global protein-mRNA interactions in yeast. Nucleic Acids Research, 39(14):5826–5836, 2011.
OpenUrl CrossRef PubMed Web of Science

[15] [15].
Jiamin Xiao, Yizhou Li, Kelong Wang, Zhining Wen, Menglong Li, Lifang Zhang, and Xuanmin Guang. In silico method for systematic analysis of feature importance in microRNA-mRNA interactions. BMC Bioinformatics, 10:1–13, 2009.
OpenUrl CrossRef PubMed

[16] [16].↵
Nancy Martínez-Montiel, Laura Morales-Lara, Julio M. Hernández-Pérez, and Rebeca D. Martínez-Contreras. In silico analysis of the structural and biochemical features of the NMD factor UPF1 in Ustilago maydis. PLoS ONE, 11(2):1–26, 2016.
OpenUrl CrossRef PubMed

[17] [17].↵
P O Ilyinskii, T Schmidt, D Lukashev, A B Meriin, G Thoidis, D Frishman, and A M Shneider. Importance of mRNA secondary structural elements for the expression of influenza virus genes. Omics, 13(5):421–430, 2009.
OpenUrl CrossRef PubMed Web of Science

[18] [18].↵
R. A. Poot, N. V. Tsareva, I. V. Boni, and J. van Duin. RNA folding kinetics regulates translation of phage MS2 maturation gene. Proceedings of the National Academy of Sciences, 94(19):10110–10115, 1997.

[19]
Maximillian H Bailor, Xiaoyan Sun, and Hashim M. Al-Hashimi. Topology links RNA secondary structure with global conformation, dynamics, and adaptation. Science, 327(5962):202–206, 2010.

[20]
J M Pipas and J E McMahon. Method for predicting RNA secondary structure. Proceedings of the National Academy of Sciences, 72(6):2017–2021, 1975.

[21]
Michael S. Waterman. Secondary Structure of Single-Stranded Nucleic Acids. Studies in Foundations and Combinatorics, Advances in Mathematics Supplementary Studies, 1:167–212, 1978.

[22]
Michael S. Waterman and Temple F. Smith. Rapid dynamic programming algorithms for RNA secondary structure. Advances in Applied Mathematics, 7(4):455–464, 1986.

[23]
Ruth Nussinov, George Pieczenik, Jerrold R. Griggs, and Daniel J. Kleitman. Algorithms for Loop Matchings. SIAM Journal on Applied Mathematics, 35(1):68–82, 1978.

[24]
Michael Zuker and Patrick Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9(1):133–148, 1981.

[25]
Martin J. Serra and Douglas H. Turner. Predicting thermodynamic properties of RNA. Methods in Enzymology, 259:242–261, 1995.

[26]
David H. Mathews and Douglas H. Turner. Prediction of RNA secondary structure by free energy minimization. Current Opinion in Structural Biology, 16(3):270–278, 2006.

[27]
C. E. Hajdin, S. Bellaousov, W. Huggins, C. W. Leonard, D. H. Mathews, and K. M. Weeks. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proceedings of the National Academy of Sciences, 110(14):5498–5503, 2013.

[28]
Michael Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13):3406–3415, 2003.

[29]
I. L. Hofacker, W. Fontana, P. F. Stadler, L. S. Bonhoeffer, M. Tacker, and P. Schuster. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie, 125(2):167–188, 1994.

[30]
Raheleh Salari, Chava Kimchi-Sarfaty, Michael M. Gottesman, and Teresa M. Przytycka. Sensitive measurement of single-nucleotide polymorphism-induced changes of RNA conformation: Application to disease studies. Nucleic Acids Research, 41(1):44–53, 2013.

[31]
Rune B. Lyngsø and Christian N. S. Pedersen. Pseudoknots in RNA Secondary Structures. Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, pages 201–209, 2000.

[32]
Rune B. Lyngsø and Christian N. S. Pedersen. RNA Pseudoknot Prediction in Energy-Based Models. Journal of Computational Biology, 7(3-4):409–427, 2000.

[33]
Biao Liu, David H Mathews, and Douglas H Turner. RNA pseudoknots: folding and finding. F1000 Biology Reports, 5(January):1–5, 2010.

[34]
H. Isambert and E. D. Siggia. Modeling RNA folding paths with pseudoknots: Application to hepatitis delta virus ribozyme. Proceedings of the National Academy of Sciences, 97(12):6515–6520, 2000.

[35]
Jianhua Ruan, Gary D. Stormo, and Weixiong Zhang. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics, 20(1):58–66, 2004.

[36]
Jihong Ren, Baharak Rastegari, Anne Condon, and Holger H Hoos. HotKnots: Heuristic prediction of RNA secondary structures including pseudoknots. RNA, 11(1):1494–1504, 2005.

[37]
S. Bellaousov and D. H. Mathews. ProbKnot: Fast prediction of RNA secondary structure including pseudoknots. RNA, 16(10):1870–1880, 2010.

[38]
Kengo Sato, Yuki Kato, Michiaki Hamada, Tatsuya Akutsu, and Kiyoshi Asai. IPknot: Fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics, 27(13):85–93, 2011.

[39]
Elena Rivas and Sean R. Eddy. A dynamic programming algorithm for RNA structure prediction including pseudoknots. Journal of Molecular Biology, 285(5):2053–2068, 1999.

[40]
Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, and Takashi Yokomori. Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science, 210(2):277–303, 1999.

[41]
Tatsuya Akutsu. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Applied Mathematics, 104(1-3):45–62, 2000.

[42]
Anne Condon, Beth Davy, Baharak Rastegari, Shelly Zhao, and Finbarr Tarrant. Classifying RNA pseudoknotted structures. Theoretical Computer Science, 320(1):35–50, 2004.

[43]
Robert M Dirks and Niles A. Pierce. A partition function algorithm for nucleic acid secondary structure including pseudoknots. Journal of Computational Chemistry, 24(13):1664–1677, 2003.

[44]
Jens Reeder and Robert Giegerich. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics, 5:1–12, 2004.

[45]
Song Cao and Shi-Jie Chen. Predicting RNA pseudoknot folding thermodynamics. Nucleic Acids Research, 34(9):2634–2652, 2006.

[46]
Song Cao and Shi-Jie Chen. Predicting structures and stabilities for H-type pseudoknots with interhelix loops. RNA, 15(4):696–706, 2009.

[47]
F H van Batenburg, A P Gultyaev, C W Pleij, J Ng, and J Oliehoek. PseudoBase: a database with RNA pseudoknots. Nucleic Acids Research, 28(1):201–204, 2000.

[48]
Daniel P. Aalberts and Nathan O. Hodas. Asymmetry in RNA pseudoknots: Observation and theory. Nucleic Acids Research, 33(7):2210–2214, 2005.

[49]
Adam Lucas and Ken A. Dill. Statistical mechanics of pseudoknot polymers. Journal of Chemical Physics, 119(4):2414–2421, 2003.

[50]
D H Mathews, J Sabina, M Zuker, and D H Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288:911–940, 1999.

[51]
Mirela S Andronescu, Cristina Pop, and Anne E Condon. Improved free energy parameters for RNA pseudoknotted secondary structure prediction. RNA, 16(1):26–42, 2010.

[52]
A. Xayaphoummine, T. Bucher, and Herve Isambert. Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Research, 33(SUPPL. 2):605–610, 2005.

[53]
Mirela Andronescu, Vera Bereg, Holger H. Hoos, and Anne Condon. RNA STRAND: The RNA secondary structure and statistical analysis database. BMC Bioinformatics, 9:1–10, 2008.

[54]
Michela Taufer, Abel Licon, Roberto Araiza, David Mireles, F. H D van Batenburg, Alexander P. Gultyaev, and Ming Ying Leung. PseudoBase++: An extension of PseudoBase for easy searching, formatting and visualization of pseudoknots. Nucleic Acids Research, 37(SUPPL. 1):127–135, 2009.

[55]
Tomasz Puton, Lukasz P. Kozlowski, Kristian M. Rother, and Janusz M. Bujnicki. CompaRNA: A server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Research, 41(7):4307–4323, 2013.

[56]
Gary M Studnicka, Georgia M Rahn, Ian W Cummings, and Winston A Salser. Computer method for predicting the secondary structure of single-stranded RNA. Nucleic Acids Research, 5(9):3365–3388, 1978.

[57]
Michael Zuker and D Sankoff. RNA secondary structures and their prediction. Bulletin of Mathematical Biology, 46(4):591–621, 1984.

[58]
William Bialek and Rama Ranganathan. Rediscovering the power of pairwise interactions. arXiv, 2007.

[59]
Tianbing Xia, John SantaLucia, Mark E. Burkard, Ryszard Kierzek, Susan J. Schroeder, Xiaoqi Jiao, Christopher Cox, and Douglas H. Turner. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry, 37(42):14719–14735, 1998.

[60]
Tianbing Xia, David H. Mathews, and Douglas H. Turner. Thermodynamics of RNA Secondary Structure Formation. In Dieter Soll, Susumu Nishimura, and Peter B. Moore, editors, RNA, chapter 2, pages 21–48. Pergamon, 1 edition, 2001.

[61]
Douglas H. Turner and David H. Mathews. NNDB: The nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Research, 38(SUPPL.1):2009–2011, 2009.

[62]
S. B. Smith, Y. Cui, and C. Bustamante. Overstretching B-DNA: The Elastic Response of Individual Double-Stranded and Single-Stranded DNA Molecules. Science, 271(5250):795–799, 1996.

[63]
J. A. Abels, F. Moreno-Herrero, T. Van Der Heijden, C. Dekker, and Nynke H. Dekker. Single-molecule measurements of the persistence length of double-stranded RNA. Biophysical Journal, 88(4):2737–2744, 2005.

[64]
D. H. Mathews, M. D. Disney, J. L. Childs, S. J. Schroeder, M. Zuker, and D. H. Turner. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proceedings of the National Academy of Sciences, 101(19):7287–7292, 2004.

[65]
Homer Jacobson and Walter H. Stockmayer. Intramolecular reaction in polycondensations. I. The theory of linear systems. The Journal of Chemical Physics, 18(12):1600–1606, 1950.

[66]
Zhi John Lu, Douglas H. Turner, and David H. Mathews. A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Research, 34(17):4912–4924, 2006.

[67]
Ronny Lorenz, Stephan H. Bernhart, Christian Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F. Stadler, and Ivo L. Hofacker. ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1):1–14, 2011.

[68]
Mirela Andronescu, Anne Condon, Holger H. Hoos, David H. Mathews, and Kevin P. Murphy. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics, 23(13):19–28, 2007.

[69]
Chuong B. Do, Daniel A. Woods, and Serafim Batzoglou. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22(14):90–98, 2006.

[70]
Zsuzsanna Sükösd, Bjarne Knudsen, Jorgen Kjems, and Christian N.S. Pedersen. PPfold 3.0: Fast RNA secondary structure prediction using phylogeny and auxiliary data. Bioinformatics, 28(20):2691–2692, 2012.

[71]
Kengo Sato, Michiaki Hamada, Kiyoshi Asai, and Toutai Mituyama. CentroidFold: A web server for RNA secondary structure prediction. Nucleic Acids Research, 37(SUPPL. 2):277–280, 2009.

[72]
S Zakov, Y Goldberg, M Elhadad, and M Ziv-Ukelson. Rich parameterization improves RNA structure prediction. Journal of Computational Biology, 18(11):1525–1542, 2011.

[73]
V Rani Parvathy, Sukesh R Bhaumik, Kandala V R Chary, Girjesh Govil, Keliang Liu, Frank B Howard, and H Todd Miles. NMR structure of a parallel-stranded DNA duplex at atomic resolution. Nucleic Acids Research, 30(7):1500–1511, 2002.

[74]
Z. J. Lu, J. W. Gloor, and D. H. Mathews. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA, 15(10):1805–1813, 2009.

[75]
K Rietveld, K Linschooten, C W Pleij, and L Bosch. The three-dimensional folding of the tRNA-like structure of tobacco mosaic virus RNA. A new building principle applied twice. The EMBO Journal, 3(11):2613–2619, 1984.

[76]
Ruud M.W. Mans, Cornelis W.A. Pleij, and Leendert Bosch. tRNA-like structures: Structure, function and evolutionary significance. European Journal of Biochemistry, 201(2):303–324, 1991.

[77]
B Felden, C Florentz, R Giegé, and E Westhof. A central pseudoknotted three-way junction imposes tRNA-like mimicry and the orientation of three 5’ upstream pseudoknots in the 3’ terminus of tobacco mosaic virus RNA. RNA, 2(3):201–212, 1996.

[78]
Garrett A. Soukup. Core requirements for glmS ribozyme self-cleavage reveal a putative pseudoknot structure. Nucleic Acids Research, 34(3):968–975, 2006.

[79]
Fernando García-Arenal. Sequence and structure at the genome 3’ end of the U2-strain of tobacco mosaic virus, a histidine-accepting tobamovirus. Virology, 167(1):201–206, 1988.

[80]
S R Wilkinson and M D Been. A pseudoknot in the 3’ non-core region of the glmS ribozyme enhances self-cleavage activity. RNA, 11(12):1788–1794, 2005.

[81]
Srinivas Garlapati and Ching C. Wang. Identification of an essential pseudoknot in the putative downstream internal ribosome entry site in giardiavirus transcript. RNA, 8(5):601–611, 2002.

[82]
Simon Pennell, Emily Manktelow, Andrew Flatt, Geoff Kelly, Stephen J Smerdon, and Ian Brierley. The stimulatory RNA of the Visna-Maedi retrovirus ribosomal frameshifting signal is an unusual pseudoknot with an interstem element. RNA, 14(7):1366–1377, 2008.

[83]
Yanga Byun and Kyungsook Han. PseudoViewer3: Generating planar drawings of large-scale RNA structures with pseudoknots. Bioinformatics, 25(11):1435–1437, 2009.

[84]
Denise R. Koessler, Debra J. Knisley, Jeff Knisley, and Teresa Haynes. A predictive model for secondary RNA structure using graph theory and a neural network. BMC Bioinformatics, 11(SUPPL. 6):1–10, 2010.

[85]
Michaël Bon and Henri Orland. TT2NE: A novel algorithm to predict RNA secondary structures with pseudoknots. Nucleic Acids Research, 39(14), 2011.

[86]
Henri Orland and A. Zee. RNA folding and large N matrix theory. Nuclear Physics B, 620(3):456–476, 2002.

[87]
Jizhen Zhao, Russell L. Malmberg, and Liming Cai. Rapid ab initio prediction of RNA pseudoknots via graph tree decomposition. Journal of Mathematical Biology, 56(1-2):145–159, 2008.

[88]
Hin Hark Gan, Samuela Pasquali, and Tamar Schlick. Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic Acids Research, 31(11):2926–2943, 2003.

[89]
Christian Laing and Tamar Schlick. Computational approaches to RNA structure prediction, analysis, and design. Current Opinion in Structural Biology, 21(3):306–318, 2011.

[90]
C. Haslinger and P. F. Stadler. RNA structures with pseudo-knots: Graph-theoretical, combinatorial, and statistical properties. Bulletin of Mathematical Biology, 61(3):437–467, 1999.

[91]
Clara I. Bermúdez, Edgar E. Daza, and Eugenio Andrade. Characterization and comparison of Escherichia coli transfer RNAs by graph theory based on secondary structure. Journal of Theoretical Biology, 197(2):193–205, 1999.

[92]
Giorgio Benedetti and Stefano Morosetti. A graph-topological approach to recognition of pattern and similarity in RNA secondary structures. Biophysical Chemistry, 59(1-2):179–184, 1996.

[93]
Shu Yun Le, Ruth Nussinov, and Jacob V. Maizel. Tree graphs of RNA secondary structures and their comparisons. Computers and Biomedical Research, 22(5):461–473, 1989.

[94]
Walter Fontana and Peter Schuster. Continuity in evolution: On the nature of transitions. Science, 280(5368):1451–1455, 1998.

[95]
Lauren W Ancel and Walter Fontana. Plasticity, Evolvability and Modularity in RNA. Journal of Experimental Zoology, 288(3):242–283, 2000.

[96]
Mathai Mammen, Eugene I. Shakhnovich, John M. Deutch, and George M. Whitesides. Estimating the Entropic Cost of Self-Assembly of Multiparticle Hydrogen-Bonded Aggregates Based on the Cyanuric Acid-Melamine Lattice. Journal of Organic Chemistry, 63(12):3821–3830, 1998.

[97]
Huan-xiang Zhou and Michael K Gilson. Theory of Free Energy and Entropy in Noncovalent Binding. Chemical Reviews, 109(9):4092–4107, 2009.

[98]
Hatim T. Allawi and John SantaLucia. Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry, 36(34):10581–10594, 1997.

[99]
Hatim T. Allawi and John SantaLucia. Nearest neighbor thermodynamic parameters for internal G·A mismatches in DNA. Biochemistry, 37(8):2170–2179, 1998.

[100]
Hatim T. Allawi and John SantaLucia. Thermodynamics of internal C·T mismatches in DNA. Nucleic Acids Research, 26(11):2694–2701, 1998.

[101]
Hatim T. Allawi and John SantaLucia. Nearest-neighbor thermodynamics of internal A·C mismatches in DNA: Sequence dependence and pH effects. Biochemistry, 37(26):9435–9444, 1998.

[102]
Nicolas Peyret, P. Ananda Seneviratne, Hatim T. Allawi, and John SantaLucia. Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A·A, C·C, G·G, and T·T mismatches. Biochemistry, 38(12):3468–3477, 1999.

[103]
Naoki Sugimoto, Shu-ichi Nakano, Misa Katoh, Akiko Matsumura, Hiroyuki Nakamuta, Tatsuo Ohmichi, Mari Yoneyama, and Muneo Sasaki. Thermodynamic Parameters To Predict Stability of RNA/DNA Hybrid Duplexes. Biochemistry, 34(35):11211–11216, 1995.

[104]
Norman E. Watkins, William J. Kennelly, Mike J. Tsay, Astrid Tuin, Lara Swenson, Hyung Ran Lee, Svetlana Morosyuk, Donald A. Hicks, and John SantaLucia. Thermodynamic contributions of single internal rA·dA, rC·dC, rG·dG and rU·dT mismatches in RNA/DNA duplexes. Nucleic Acids Research, 39(5):1894–1902, 2011.