Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs

Bioinformatics. 2000 Jul;16(7):583-605. doi: 10.1093/bioinformatics/16.7.583.

Abstract

Motivation: Several results in the literature suggest that biologically interesting RNAs have secondary structures that are more stable than expected by chance. Based on these observations, we developed a scanning algorithm for detecting noncoding RNA genes in genome sequences, using a fully probabilistic version of the Zuker minimum-energy folding algorithm.

Results: Preliminary results were encouraging, but certain anomalies led us to do a carefully controlled investigation of this class of methods. Ultimately, our results argue that for the probabilistic model there is indeed a statistical effect, but it comes mostly from local base-composition bias and not from RNA secondary structure. For the thermodynamic implementation (which evaluates statistical significance by doing Monte Carlo shuffling in fixed-length sequence windows, thus eliminating the base-composition effect) the signals for noncoding RNAs are still usually indistinguishable from noise, especially when certain statistical artifacts resulting from local base-composition inhomogeneity are taken into account. We conclude that although a distinct, stable secondary structure is undoubtedly important in most noncoding RNAs, the stability of most noncoding RNA secondary structures is not sufficiently different from the predicted stability of a random sequence to be useful as a general genefinding approach.
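The window-shuffling significance test mentioned for the thermodynamic implementation can be illustrated with a minimal sketch. This is not the authors' code. It assumes a simple Nussinov-style count of the maximum number of nested base pairs as a stand-in for a real minimum-energy folding score, and it uses plain mononucleotide shuffling, which preserves each window's base composition exactly. The function names (max_pairs, shuffle_zscore, scan), the window and step sizes, and the number of shuffles are all hypothetical choices made for illustration.

```python
import random
import statistics

# Watson-Crick and G-U wobble pairs (assumption: RNA alphabet A, C, G, U).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in seq.

    Stand-in for a folding score; a real analysis would use a
    thermodynamic minimum-energy calculation instead.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            # case: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def shuffle_zscore(window, n_shuffles=100, rng=None):
    """Z-score of the window's score against composition-preserving shuffles.

    With pair counts, a large positive z means "more structured than
    shuffled controls"; with folding energies the sign would be reversed.
    """
    rng = rng or random.Random(0)
    observed = max_pairs(window)
    letters = list(window)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same bases, random order
        scores.append(max_pairs("".join(letters)))
    sd = statistics.pstdev(scores)
    if sd == 0:
        return 0.0  # no variation among shuffles: no evidence either way
    return (observed - statistics.mean(scores)) / sd


def scan(seq, window=50, step=25, n_shuffles=50, seed=0):
    """Slide a fixed-length window along seq; return (start, z) pairs."""
    rng = random.Random(seed)
    return [
        (start, shuffle_zscore(seq[start:start + window], n_shuffles, rng))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Because every shuffled control has the same base composition as its window, a high z-score here cannot be explained by composition alone. The abstract's point is that, even with this control, scores for real noncoding RNAs are usually hard to distinguish from noise.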

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Animals
  • Base Composition
  • Caenorhabditis elegans / genetics
  • Data Interpretation, Statistical
  • Methanococcus / genetics
  • Models, Statistical
  • Nucleic Acid Conformation*
  • RNA / chemistry*
  • RNA, Helminth / analysis

Substances

  • RNA, Helminth
  • RNA