PT - JOURNAL ARTICLE
AU - Jehad Aldahdooh
AU - Markus Vähä-Koskela
AU - Jing Tang
AU - Ziaurrehman Tanoli
TI - Using BERT to identify drug-target interactions from whole PubMed
AID - 10.1101/2021.09.10.459845
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.09.10.459845
4099 - http://biorxiv.org/content/early/2021/09/11/2021.09.10.459845.short
4100 - http://biorxiv.org/content/early/2021/09/11/2021.09.10.459845.full
AB - Background: Drug-target interactions (DTIs) are critical for drug repurposing and elucidation of drug mechanisms, and they are collected in large databases, such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the number of studies providing this data (~0.1 million) likely constitutes only a fraction of all studies on PubMed that contain experimental DTI data. Finding such studies and extracting the experimental information is a challenging task, and there is a pressing need for machine learning for the extraction and curation of DTIs. To this end, we developed new text mining document classifiers based on the Bidirectional Encoder Representations from Transformers (BERT) algorithm. Because DTI data intimately depends on the type of assays used to generate it, we also aimed to incorporate functions to predict the assay format. Results: Our novel method identified and extracted DTIs from 2.1 million studies not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~99% accuracy for identifying studies containing drug-target pairs. The accuracy for the prediction of assay format is ~90%, which leaves room for improvement in future studies. Conclusion: The BERT model in this study is robust and the proposed pipeline can be used to identify new and previously overlooked studies containing DTIs and automatically extract the DTI data points. The tabular output facilitates validation of the extracted data and assay format information.
Overall, our method provides a significant advancement in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing. Competing Interest Statement: The authors have declared no competing interest.