RT Journal Article
SR Electronic
T1 Using BERT to identify drug-target interactions from whole PubMed
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.09.10.459845
DO 10.1101/2021.09.10.459845
A1 Jehad Aldahdooh
A1 Markus Vähä-Koskela
A1 Jing Tang
A1 Ziaurrehman Tanoli
YR 2021
UL http://biorxiv.org/content/early/2021/09/11/2021.09.10.459845.abstract
AB Background: Drug-target interactions (DTIs) are critical for drug repurposing and elucidation of drug mechanisms, and they are collected in large databases, such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the number of studies providing this data (~0.1 million) likely constitutes only a fraction of all studies on PubMed that contain experimental DTI data. Finding such studies and extracting the experimental information is a challenging task, and there is a pressing need for machine learning for the extraction and curation of DTIs. To this end, we developed new text mining document classifiers based on the Bidirectional Encoder Representations from Transformers (BERT) algorithm. Because DTI data intimately depends on the type of assays used to generate it, we also aimed to incorporate functions to predict the assay format. Results: Our novel method identified and extracted DTIs from 2.1 million studies not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy for identifying studies containing drug-target pairs. The accuracy for the prediction of assay format is ~90%, which leaves room for improvement in future studies. Conclusion: The BERT model in this study is robust and the proposed pipeline can be used to identify new and previously overlooked studies containing DTIs and automatically extract the DTI data points. The tabular output facilitates validation of the extracted data and assay format information. Overall, our method provides a significant advancement in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing. Competing Interest Statement: The authors have declared no competing interest.