LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs

  1. Rolf Backofen (1, 9, 10)
  1. Chair for Bioinformatics, Institute of Computer Science, Albert-Ludwigs-Universität, D-79110 Freiburg, Germany
  2. Computation and Biology Group, CSAIL and Mathematics Department, MIT, Cambridge, Massachusetts 02139, USA
  3. Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
  4. Department of Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
  5. Bioinformatics Group, Department of Computer Science, Interdisciplinary Center of Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany
  6. Max-Planck-Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
  7. Fraunhofer Institute for Cell Therapy and Immunology, D-04103 Leipzig, Germany
  8. Santa Fe Institute, Santa Fe, New Mexico 87501, USA
  9. Center for Biological Signaling Studies (BIOSS), University of Freiburg, D-79104 Freiburg, Germany

    Abstract

    Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. Analyzing these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising ncRNA candidates from weak predictions. Existing methods struggle with both goals because they rely on sequence-based multiple alignments, which frequently misalign RNA structures and therefore do not support the identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based on both sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of an alignment, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used ncRNA gene finder RNAz by a factor of about 3, reducing the median deviation from 47 nt to 13 nt. When used to post-process RNAz predictions, LocARNA-P's STAR score discriminates between true- and false-positive predictions much more strongly than RNAz's own evaluation: in this scenario, the AUC increased from 0.71 to 0.87, which significantly reduces the cost of subsequent analysis steps. The ready-to-use software tool LocARNA-P produces structure-based multiple RNA alignments with associated columnwise STARs and predicts ncRNA boundaries. Additional results, a web server for LocARNA/LocARNA-P, and the software package, including documentation and a pipeline for refining screens for structural ncRNAs, are available at http://www.bioinf.uni-freiburg.de/Supplements/LocARNA-P/.
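    The abstract does not say exactly how boundaries are derived from a STAR profile. Purely as an illustration, here is a minimal sketch that makes two assumptions. First, a profile is a list of per-column reliabilities in [0, 1]. Second, the predicted ncRNA is the contiguous column range that maximizes the summed excess reliability over a threshold `theta`. The range is found with Kadane's maximum-subarray algorithm. The function name `predict_boundaries` and the parameter `theta` are hypothetical, and this is not LocARNA-P's actual implementation.

```python
from typing import Optional, Sequence, Tuple


def predict_boundaries(
    profile: Sequence[float], theta: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Hypothetical boundary predictor (not LocARNA-P's code).

    Returns the 0-based, inclusive (start, end) alignment columns of the
    contiguous segment maximizing sum(r - theta) over a reliability
    profile, or None if no column exceeds theta.
    """
    best_score = 0.0
    best: Optional[Tuple[int, int]] = None
    run_score = 0.0
    run_start = 0
    for i, r in enumerate(profile):
        # A non-positive running sum cannot help; start a new segment here.
        if run_score <= 0.0:
            run_start = i
            run_score = 0.0
        run_score += r - theta
        # Keep the best-scoring segment seen so far.
        if run_score > best_score:
            best_score = run_score
            best = (run_start, i)
    return best


if __name__ == "__main__":
    # Made-up toy profile: a reliable core flanked by low-reliability columns.
    toy = [0.1, 0.2, 0.9, 0.8, 0.95, 0.3, 0.1]
    print(predict_boundaries(toy))  # (2, 4)
```

    This sketch works on alignment columns only. A real tool would still have to map the column range back to sequence positions, skipping gaps, and would need a principled choice of `theta` rather than the arbitrary default used here.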


    Footnotes

    • Received July 1, 2011.
    • Accepted January 18, 2012.

    Freely available online through the RNA Open Access option.
