PT - JOURNAL ARTICLE
AU - Szymborski, Joseph
AU - Emad, Amin
TI - RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks
AID - 10.1101/2021.08.13.456309
DP - 2022 Jan 01
TA - bioRxiv
PG - 2021.08.13.456309
4099 - http://biorxiv.org/content/early/2022/04/18/2021.08.13.456309.short
4100 - http://biorxiv.org/content/early/2022/04/18/2021.08.13.456309.full
AB - Motivation: Computational methods for the prediction of protein-protein interactions, while important tools for researchers, are plagued by challenges in generalising to unseen proteins. Datasets used for modelling protein-protein predictions are particularly predisposed to information leakage and sampling biases. Results: In this study, we introduce RAPPPID, a method for the Regularised Automatic Prediction of Protein-Protein Interactions using Deep Learning. RAPPPID is a twin AWD-LSTM network which employs multiple regularisation methods during training time to learn generalised weights. Testing on stringent interaction datasets composed of proteins not seen during training, RAPPPID outperforms state-of-the-art methods. Further experiments show that RAPPPID’s performance holds regardless of the particular proteins in the testing set and its performance is higher for biologically supported edges. This study serves to demonstrate that appropriate regularisation is an important component of overcoming the challenges of creating models for protein-protein interaction prediction that generalise to unseen proteins. Additionally, as part of this study, we provide datasets corresponding to several data splits of various strictness, in order to facilitate assessment of PPI reconstruction methods by others in the future.
Availability and Implementation: Code and datasets are freely available at https://github.com/jszym/rapppid. Contact: amin.emad{at}mcgill.ca. Supplementary Information: Online-only supplementary data is available at the journal’s website. Competing Interest Statement: The authors have declared no competing interest.