New Results

Caveats to deep learning approaches to RNA secondary structure prediction

Christoph Flamm, Julia Wielach, Michael T. Wolfinger, Stefan Badelt, Ronny Lorenz, Ivo L. Hofacker
doi: https://doi.org/10.1101/2021.12.14.472648
Author affiliations
1Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Vienna, Austria
2Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Währingerstraße 29, 1090 Vienna, Austria
  • Christoph Flamm (1)
  • Julia Wielach (1)
  • Michael T. Wolfinger (1, 2)
  • Stefan Badelt (1)
  • Ronny Lorenz (1)
  • Ivo L. Hofacker (1, 2)
  • For correspondence: ivo@tbi.univie.ac.at

Abstract

Machine learning (ML), and in particular deep learning, has gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, for which well-established biophysics-based methods exist. These methods even yield exact solutions under certain simplifying assumptions. Nevertheless, the accuracy of these classical methods is limited and has seen little improvement over the last decade. This makes the problem an attractive target for machine learning, and consequently several deep learning models have been proposed in recent years. In this contribution we discuss limitations of current approaches, in particular those due to biases in the training data. Furthermore, we propose to study the capabilities and limitations of ML models by first applying them to synthetic data, which can not only be generated in arbitrary amounts but are also guaranteed to be free of biases. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
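
The scaling argument in the last sentence can be illustrated with a short sketch. It is not taken from the paper or from the RNAdeep code. The function names are hypothetical, and the minimum hairpin loop size of 3 unpaired nucleotides is an assumption (a common convention in secondary structure models). Since each nucleotide takes part in at most one base pair, a single structure of length n holds at most about n/2 pairs, whereas a model that scores every position pair independently chooses among n(n-1)/2 candidates.

```python
def max_pairs_in_structure(n: int, min_loop: int = 3) -> int:
    """Upper bound on base pairs in one secondary structure of length n.

    Each nucleotide pairs at most once, and any structure with at least one
    pair contains a hairpin loop with at least `min_loop` unpaired bases
    (min_loop=3 is an assumed convention, not stated in the abstract).
    """
    return max(0, (n - min_loop) // 2)


def candidate_pairs(n: int) -> int:
    """Number of distinct position pairs (i, j) with i < j."""
    return n * (n - 1) // 2


def count_pairs(dot_bracket: str) -> int:
    """Count base pairs in a dot-bracket string, checking it is balanced."""
    open_positions = []
    pairs = 0
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            open_positions.pop()
            pairs += 1
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


if __name__ == "__main__":
    # The linear bound and the quadratic candidate count diverge quickly.
    for n in (50, 100, 200, 400):
        print(n, max_pairs_in_structure(n), candidate_pairs(n))
```

A predicted structure can be checked against the linear bound by comparing `count_pairs(prediction)` with `max_pairs_in_structure(len(prediction))`. A pair count that grows like `candidate_pairs(n)` cannot correspond to a valid secondary structure.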

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/ViennaRNA/RNAdeep

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted December 16, 2021.

Subject Area

  • Bioinformatics