Abstract
(1) Background: Identifying the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine-learning algorithms can help accelerate this search and limit the number of required experiments. However, the drug-target interaction databases used for training exhibit strong statistical bias, which leads to a high number of false-positive predicted targets and thus increases the time and cost of experimental validation campaigns. (2) Methods: To minimize the number of false-positive predicted proteins, we propose a new scheme for choosing negative examples, such that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for 3 particular drugs, and more globally for 200 approved drugs. (3) Results: For the 3 detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for choosing negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased, and overall the rank of the true targets improved. (4) Conclusion: Our method corrects for the statistical bias of the databases and reduces the number of false-positive predictions, and therefore the number of useless experiments potentially undertaken.
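The balancing constraint stated in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_balanced_negatives`, the greedy ordering, and the toy drug and protein identifiers are assumptions made for illustration only. The sketch tries to give every drug and every protein as many negative pairs as it has positive pairs, and it returns whatever imbalance the greedy pass could not resolve, since a perfect balance is not always reachable this way.

```python
import random
from collections import Counter


def sample_balanced_negatives(positives, seed=0):
    """Greedy sketch (hypothetical helper, not the paper's code).

    positives: iterable of (drug, protein) pairs known to interact.
    Returns (negatives, drug_left, prot_left), where the two counters
    hold the number of negatives still missing per drug and per protein.
    """
    rng = random.Random(seed)
    pos = set(positives)
    # Each entity needs as many negatives as it has positives.
    drug_need = Counter(d for d, _ in pos)
    prot_need = Counter(p for _, p in pos)
    negatives = set()

    # Handle proteins with the most positives first (assumed heuristic).
    for prot in sorted(prot_need, key=lambda p: (-prot_need[p], p)):
        # Drugs that still need negatives and are not known to bind prot.
        candidates = sorted(
            d for d in drug_need
            if drug_need[d] > 0 and (d, prot) not in pos
        )
        rng.shuffle(candidates)  # random tie-break among equal needs
        candidates.sort(key=lambda d: -drug_need[d])  # stable sort
        for drug in candidates[: prot_need[prot]]:
            negatives.add((drug, prot))
            drug_need[drug] -= 1
            prot_need[prot] -= 1

    return negatives, drug_need, prot_need


# Toy data: identifiers are placeholders, not real drugs or proteins.
positives = [("d1", "p1"), ("d1", "p2"), ("d2", "p1"),
             ("d3", "p3"), ("d4", "p4")]
negatives, drug_left, prot_left = sample_balanced_negatives(positives)
```

Any nonzero entry in `drug_left` or `prot_left` marks a drug or protein for which the greedy pass could not find enough non-interacting partners; a real pipeline would need a policy for those cases.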
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
https://github.com/njmmatthieu/dti_negative_examples_data.git