Abstract
RNA-small molecule interactions play a critical role in biological processes and have emerged as attractive targets for drug discovery. Despite recent improvements in RNA molecular docking for virtual screening, the computational cost of docking remains prohibitive. Previous work has shown that combining a suitable graph representation of RNA structures with machine learning rapidly yields enrichment in small-molecule ligand prediction. However, the performance of this approach is limited by the scarcity of available structures of RNA in complex with a small molecule.
In this paper, we propose a competitive virtual screening pipeline to rapidly identify candidate ligands. We conduct large-scale docking experiments as a form of data augmentation and leverage unsupervised pre-training of our RNA and ligand encoders to compensate for the limited data. Our pipeline can also be used in combination with molecular docking, either by simply prioritizing compounds or by mixing the predicted score with the docking score.
We show that our new model substantially outperforms previous ones. We also introduce a virtual screening test case for RNA and evaluate the performance of our tool in a realistic drug discovery setting. Our affinity prediction model achieves native ligand recovery rates competitive with full docking experiments, with runtimes on the order of seconds. When the affinity prediction model is used to prioritize docking candidates, we show a 1200% acceleration of native ligand recovery with no performance loss. Finally, we show that using our model in combination with docking enhances virtual screening results with almost no overhead. Our source code and data, as well as a Google Colab notebook for inference, are available on GitHub.
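The two ways of combining the model with docking described above (prioritizing compounds and mixing scores) can be illustrated with a minimal sketch. This is not the pipeline's actual implementation: the function names `normalized_rank`, `mix_scores`, and `prioritize`, the rank-based normalization, and the `weight` parameter are all assumptions made for illustration. The sketch also assumes that a higher predicted score is better and that a lower docking score (for example, an energy) is better.

```python
def normalized_rank(scores, higher_is_better=True):
    """Map scores to ranks in [0, 1], where 1.0 marks the best compound."""
    n = len(scores)
    # Sort indices from worst to best under the chosen convention.
    order = sorted(range(n), key=lambda i: scores[i],
                   reverse=not higher_is_better)
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = position / max(n - 1, 1)
    return ranks


def mix_scores(predicted, docking, weight=0.5):
    """Hypothetical mixing rule: weighted average of normalized ranks.

    Assumes higher predicted scores and lower docking scores are better.
    """
    pred_rank = normalized_rank(predicted, higher_is_better=True)
    dock_rank = normalized_rank(docking, higher_is_better=False)
    return [weight * p + (1.0 - weight) * d
            for p, d in zip(pred_rank, dock_rank)]


def prioritize(predicted, fraction):
    """Return indices of the top `fraction` of compounds by predicted score.

    Only these compounds would then be passed to the docking program.
    """
    n_keep = max(1, round(fraction * len(predicted)))
    best_first = sorted(range(len(predicted)),
                        key=lambda i: predicted[i], reverse=True)
    return best_first[:n_keep]


if __name__ == "__main__":
    predicted = [0.9, 0.1, 0.5]    # illustrative model scores
    docking = [-8.0, -2.0, -5.0]   # illustrative docking energies
    print(mix_scores(predicted, docking))
    print(prioritize(predicted, fraction=0.34))
```

Ranks are used here rather than raw values because predicted and docking scores generally live on different scales; other normalizations would serve the same purpose.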
Competing Interest Statement
The authors have declared no competing interest.