PT  - JOURNAL ARTICLE
AU  - Marcell Szikszai
AU  - Michael Wise
AU  - Amitava Datta
AU  - Max Ward
AU  - David H. Mathews
TI  - Deep learning models for RNA secondary structure prediction (probably) do not generalise across families
AID  - 10.1101/2022.03.21.485135
DP  - 2022 Mar 21
TA  - bioRxiv
PG  - 2022.03.21.485135
4099  - http://biorxiv.org/content/early/2022/03/21/2022.03.21.485135.short
4100  - http://biorxiv.org/content/early/2022/03/21/2022.03.21.485135.full
AB  - Motivation: The secondary structure of RNA is important to its function. Over the last few years, several papers have attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions, but seldom address the much more difficult (and practical) inter-family problem. Results: We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modeled after structure mapping data, that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalisation despite the widespread assumption in the literature, and provide strong evidence that many existing learning-based models have not generalised inter-family. Availability: Source code and data are available at https://github.com/marcellszi/dl-rna. Competing Interest Statement: The authors have declared no competing interest.