PT - JOURNAL ARTICLE
AU - Flamm, Christoph
AU - Wielach, Julia
AU - Wolfinger, Michael T.
AU - Badelt, Stefan
AU - Lorenz, Ronny
AU - Hofacker, Ivo L.
TI - Caveats to deep learning approaches to RNA secondary structure prediction
AID - 10.1101/2021.12.14.472648
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.12.14.472648
4099 - http://biorxiv.org/content/early/2021/12/16/2021.12.14.472648.short
4100 - http://biorxiv.org/content/early/2021/12/16/2021.12.14.472648.full
AB - Machine learning (ML), and in particular deep learning, techniques have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well-established biophysics-based methods exist. These methods even yield exact solutions under certain simplifying assumptions. Nevertheless, the accuracy of these classical methods is limited and has seen little improvement over the last decade. This makes RNA secondary structure prediction an attractive target for machine learning, and consequently several deep learning models have been proposed in recent years. In this contribution we discuss limitations of current approaches, in particular those due to biases in the training data. Furthermore, we propose to study the capabilities and limitations of ML models by first applying them to synthetic data, which can not only be generated in arbitrary amounts but are also guaranteed to be free of biases. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs. Competing Interest Statement: The authors have declared no competing interest.