Abstract
Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance on one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pre-trained protein language models (“PLex”) and employing a novel protein-anchored contrastive co-embedding (“Con”) to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Furthermore, ConPLex is interpretable, which enables us to visualize the drug-target lexicon and use embeddings to characterize the function of human cell-surface proteins. We anticipate ConPLex will facilitate novel drug discovery by making highly sensitive and interpretable in-silico drug screening feasible at genome scale. Con-PLex is available open-source at https://github.com/samsledje/ConPLex.
Significance Statement In time and money, one of the most expensive steps of the drug discovery pipeline is the experimental screening of small molecules to see which will bind to a protein target of interest. Therefore, accurate high-throughput computational prediction of drug-target interactions would unlock significant value, guiding and prioritizing promising candidates for experimental screening. We introduce ConPLex, a machine learning method for predicting drug-target binding which achieves state-of-the-art accuracy on many types of targets by using a pre-trained protein language model. The approach co-locates the proteins and the potential drug molecules in a shared feature space while learning to contrast true drugs from similar non-binding “decoy” molecules. ConPLex is extremely fast, which allows it to rapidly shortlist candidates for deeper investigation.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The authors have no competing interests to declare.