Abstract
In “The ends of a large RNA molecule are necessarily close”, Yoffe et al. (Nucleic Acids Res 39(1):292–299, 2011) used the programs \({\tt RNAfold}\) [resp. \({\tt RNAsubopt}\)] from the Vienna RNA Package to calculate the distance between the 5′ and 3′ ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5′–3′ distance is defined to be the length of the shortest path from the 5′ node to the 3′ node in the undirected graph whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and a heuristic theoretical argument, Yoffe et al. conclude that the 5′–3′ distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous mathematical framework to study the expected distance from the 5′ to the 3′ end of an RNA sequence. We present recurrence relations that precisely define this expected distance, both for the Turner nearest neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of the Vienna RNA Package) the expected distance between the 5′ and 3′ ends of a given RNA sequence, with respect to the Turner energy model. Using methods of analytic combinatorics, which depend on complex analysis, we prove that the asymptotic expected 5′–3′ distance \({\langle d_n \rangle}\) of length n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5′–3′ distance for secondary structures from the STRAND database, and conclude that the 5′–3′ distance is correlated with RNA sequence length.
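The graph definition of the 5′–3′ distance given above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`pair_table`, `end_to_end_distance`, `structures`, `mean_distance`), the dot-bracket input format, and the brute-force enumeration are assumptions made for illustration only. In particular, `mean_distance` assumes that every homopolymer secondary structure is equally likely and is feasible only for small n; it does not reproduce the recurrence relations or the asymptotic analysis of the paper.

```python
from collections import deque


def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    partner, stack = {}, []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return partner


def end_to_end_distance(structure):
    """Shortest-path length from the 5' node to the 3' node.

    Edges are backbone bonds {i, i + 1} and base pairs {i, j};
    breadth-first search is used because all edges have unit length.
    """
    n = len(structure)
    if n == 0:
        raise ValueError("empty structure")
    partner = pair_table(structure)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            return dist[i]
        neighbors = [i - 1, i + 1]
        if i in partner:
            neighbors.append(partner[i])
        for j in neighbors:
            if 0 <= j < n and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)


def structures(n, theta=1):
    """Yield every dot-bracket string of length n in which each hairpin
    loop has at least theta unpaired bases (hypothetical brute force)."""
    if n == 0:
        yield ""
        return
    # Case 1: the first position is unpaired.
    for rest in structures(n - 1, theta):
        yield "." + rest
    # Case 2: the first position pairs with position k.
    for k in range(theta + 1, n):
        for inner in structures(k - 1, theta):
            for outer in structures(n - k - 1, theta):
                yield "(" + inner + ")" + outer


def mean_distance(n, theta=1):
    """Average 5'-3' distance over all structures, weighted uniformly
    (an assumption of this sketch)."""
    dists = [end_to_end_distance(s) for s in structures(n, theta)]
    return sum(dists) / len(dists)
```

For example, `end_to_end_distance(".((...)).")` follows one backbone bond, one base pair, and one more backbone bond, whereas an unpaired chain of length n has distance n − 1. Because the enumeration grows exponentially with n, the sketch is useful only for checking small cases by hand.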
References
Andronescu M, Bereg V, Hoos HH, Condon A (2008) RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform 9: 340
Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD, Zardecki C (2002) The protein data bank. Acta Crystallogr D Biol Crystallogr 58(Pt): 899–907
Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C (2003) The nucleic acid database. Methods Biochem Anal 44: 199–216
Cormen T, Leiserson C, Rivest R (1990) Algorithms. McGraw-Hill, New York
Corver J, Lenches E, Smith K, Robison RA, Sando T, Strauss EG, Strauss JH (2003) Fine mapping of a cis-acting sequence element in yellow fever virus RNA that is required for RNA replication and cyclization. J Virol 77(3): 2265–2270
Darty K, Denise A, Ponty Y (2009) VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25(15): 1974–1975
Flajolet P, Sedgewick R (2009) Analytic Combinatorics. Cambridge University Press, Cambridge. ISBN-13: 9780521898065
Gallie DR (1991) The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes Dev 5(11): 2108–2116
Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37(Database): D136–D140
Gerland U, Bundschuh R, Hwa T (2001) Force-induced denaturation of RNA. Biophys J 81: 1324–1332
Gutell R, Lee J, Cannone J (2005) The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol 12: 301–310
Hofacker I (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13): 3429–3431
Hofacker IL, Schuster P, Stadler PF (1998) Combinatorics of RNA secondary structures. Discret Appl Math 88: 207–237. http://citeseer.nj.nec.com/1454.html
Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading
Hsu MT, Parvin JD, Gupta S, Krystal M, Palese P (1987) Genomic RNAs of influenza viruses are held in a circular conformation in virions and in infected cells by a terminal panhandle. Proc Natl Acad Sci USA 84(22): 8140–8144
Kneller EL, Rakotondrafara AM, Miller WA (2006) Cap-independent translation of plant viral RNAs. Virus Res 119(1): 63–75
Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1): 31–63
McCaskill J (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29: 1105–1119
Miller WA, White KA (2006) Long-distance RNA–RNA interactions in plant virus gene expression and replication. Annu Rev Phytopathol 44: 447–467
Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77(11): 6309–6313
Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S (1998) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 26: 148–153
Stein PR, Waterman MS (1978) On some new sequences generalizing the Catalan and Motzkin numbers. Discret Math 26: 261–272
Xia T, SantaLucia J, Burkard M, Kierzek R, Schroeder S, Jiao X, Cox C, Turner D (1999) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry 37: 14719–14735
Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1): 292–299
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13): 3406–3415
Additional information
Research supported by the Digiteo Foundation, and National Science Foundation grants DMS-0817971, DBI-0543506 and DMS-1016618.
Source code (Python, Maple, Mathematica and C programs) is available at http://bioinformatics.bc.edu/clotelab/Expected5to3distance.
Cite this article
Clote, P., Ponty, Y. & Steyaert, JM. Expected distance between terminal nucleotides of RNA secondary structures. J. Math. Biol. 65, 581–599 (2012). https://doi.org/10.1007/s00285-011-0467-8
DOI: https://doi.org/10.1007/s00285-011-0467-8