RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble

  1. YE DING¹,
  2. CHI YU CHAN¹, and
  3. CHARLES E. LAWRENCE¹,²

  ¹ Bioinformatics Center, Wadsworth Center, New York State Department of Health, Albany, New York 12208, USA
  ² Center for Computational Molecular Biology and Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA

Abstract

Prediction of RNA secondary structure by free energy minimization has been the standard for over two decades. Here we describe a novel method that forsakes this paradigm for predictions based on a Boltzmann-weighted ensemble of structures. We introduce the notion of a centroid structure as a representative for a set of structures and describe a procedure for its identification. In a comparison with the minimum free energy (MFE) structure across diverse types of structural RNAs, the centroid of the ensemble makes 30.0% fewer prediction errors, as measured by the positive predictive value (PPV), with marginally improved sensitivity. The Boltzmann ensemble can be separated into a small number (3.2 on average) of clusters. Among the centroids of these clusters, the “best cluster centroid,” as determined by comparison to the known structure, simultaneously improves PPV by 46.5% and sensitivity by 21.7%. For the 58% of the studied sequences for which the MFE structure is outside the cluster containing the best centroid, the improvements by the best centroid are 62.5% for PPV and 31.4% for sensitivity. These results suggest that the energy well containing the MFE structure under the current incomplete energy model is often different from the well that, under the unavailable complete model, presumably contains the unique native structure. Centroids are available on the Sfold server at http://sfold.wadsworth.org.
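The abstract does not spell out the centroid procedure, so the following is only a minimal sketch under stated assumptions: each structure is represented as a set of base pairs (i, j), the distance between two structures is the number of base pairs in which they differ, and the centroid is taken to be the structure with the smallest total distance to the set. Under those assumptions the centroid consists of the base pairs present in more than half of the structures. The function names and the toy sample are hypothetical and are not taken from Sfold.

```python
from collections import Counter


def centroid(structures):
    """Return the base pairs found in more than half of the structures.

    Assumption: each structure is a set of (i, j) base pairs with i < j.
    """
    counts = Counter(pair for s in structures for pair in s)
    half = len(structures) / 2
    return {pair for pair, n in counts.items() if n > half}


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(a ^ b)


def total_distance(candidate, structures):
    """Sum of base-pair distances from the candidate to every structure."""
    return sum(base_pair_distance(candidate, s) for s in structures)


def ppv(predicted, known):
    """Fraction of predicted base pairs that occur in the known structure."""
    return len(predicted & known) / len(predicted) if predicted else 0.0


def sensitivity(predicted, known):
    """Fraction of known base pairs that are recovered by the prediction."""
    return len(predicted & known) / len(known) if known else 0.0


# Hypothetical toy sample standing in for structures drawn from the ensemble.
sample = [
    {(1, 10), (2, 9), (3, 8)},
    {(1, 10), (2, 9)},
    {(1, 10), (4, 7)},
]
# Hypothetical known structure used only to illustrate the two metrics.
known = {(1, 10), (2, 9), (3, 8)}

c = centroid(sample)
print(sorted(c))
print(ppv(c, known), sensitivity(c, known))
```

In this toy case the centroid keeps only the pairs supported by a majority of the sample, which tends to favor PPV over sensitivity, consistent in spirit with the trade-off the abstract reports.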
