PT - JOURNAL ARTICLE AU - Guangyao Zhou AU - Jackson Loper AU - Stuart Geman TI - Base-pair Ambiguity and the Kinetics of RNA Folding: a Hypothesis-Driven Statistical Analysis AID - 10.1101/329698 DP - 2018 Jan 01 TA - bioRxiv PG - 329698 4099 - http://biorxiv.org/content/early/2018/05/25/329698.short 4100 - http://biorxiv.org/content/early/2018/05/25/329698.full AB - Non-coding RNA molecules contribute to cellular function through diverse roles, including genome regulation, DNA and RNA repair, RNA splicing, catalysis, protein synthesis, and intracellular transportation [1,2]. The mechanisms of these actions can only be fully understood in terms of native secondary and tertiary structures. When provided with a sufficient number of homologous sequences, the gold standard for secondary structure prediction continues to be comparative analysis [3]. Alternatively, the prevailing computational approach to secondary structure is through the Gibbs (thermal) equilibrium, by Monte Carlo sampling or approximating the minimum free energy (MFE) configuration [4,5]. Aside from the necessary approximations, an enduring debate concerns the biological relevance of equilibrium configurations [6–9]. Here we adopt a kinetic perspective and argue that the existence of reliable folding on biologically relevant time scales suggests an intra-molecular statistical relationship between secondary and primary structures: as compared with other locations, nucleotide sequences in and around secondary-structure stems will have fewer Watson-Crick matches that are inconsistent with the native structure. An “ambiguity index”, one for each pair of molecule and presumed secondary structure, measures the prevalence of false matches and hence the tendency to form metastable structures incompatible with native structures. 
The ambiguity index statistically separates an ensemble of RNA molecules that operate as single entities (Group I and II Introns) from an ensemble that operates as protein-RNA complexes (SRP and tmRNAs), and ensembles of secondary structures determined by comparative analysis from ones based on thermal equilibrium. We find lower average ambiguity in single-entity RNAs than in protein-RNA complexes, and, among single-entity RNAs, lower ambiguity with comparative analyses than with equilibrium analyses. Both comparisons are supported by exact and highly significant hypothesis tests. These experiments, motivated by a hypothesized mechanism of folding, and the first of their kind, are consistent with folding to metastable but not necessarily equilibrium structures. Author summary: Recent discoveries indicate that, in addition to being a messenger between DNA and protein, RNA molecules assume a wide range of biological functions. For biological macromolecules, structure is function. Experimental determination of RNA structures is still time-consuming, and computational approaches are of great importance. The prevailing computational approach tries to find the structure with the minimum energy, yet the relevance of this minimum energy structure as the native structure is still hotly debated. In this paper, we adopt a kinetic perspective, and argue that more emphasis should be placed on the folding process when trying to develop computational methods for RNA structure prediction. We present some statistical analyses using the primary and secondary data (sequence and base-pairs data) of RNA molecules, based on the concept of “local ambiguity”, i.e., the molecule’s tendency to “make a mistake” at a certain location when forming secondary structures. 
Our results show the deficiencies of the minimum energy approach, and demonstrate the importance of considering the kinetics as well as protein-RNA interactions in developing computational approaches for RNA secondary structure prediction.