ABSTRACT
Most drugs are small molecules, with their activities typically arising from interactions with protein targets. Accurate predictions of these interactions could greatly accelerate pharmaceutical research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two types of molecules during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform all previous models for predicting drug-target interactions, and the model demonstrates unprecedented generalization capabilities to unseen proteins. We further show that the superior performance of ProSmith is not limited to drug-target interaction predictions, but also leads to improvements in other protein-small molecule interaction prediction tasks, the prediction of Michaelis constants KM of enzyme-substrate pairs and the identification of potential substrates for enzymes. The Python code provided can be used to easily implement and improve machine learning predictions of interactions between proteins and arbitrary drug candidates or other small molecules.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
In this version we slightly revised the structure of the manuscript. Changes include the content of the title, of the Abstract, and of the Introduction, as well as the order of subsections in the Results section.