Abstract
We present ZairaChem, an artificial intelligence (AI)- and machine learning (ML)-based tool for training small-molecule activity prediction models. ZairaChem is fully automated, has low computational requirements and works across a broad spectrum of datasets, ranging from whole-cell growth inhibition assays to drug metabolism properties. The tool has been implemented end-to-end at the Holistic Drug Discovery and Development (H3D) Centre, the leading integrated drug discovery unit in Africa, where no AI/ML capabilities were previously available. Using in-house data collected over more than a decade of drug discovery research in malaria and tuberculosis, we built models to predict the outcomes of 15 key checkpoint assays. We subsequently deployed these models as a virtual screening cascade at an organisational scale to increase the hit rate of current experimental assays. We show that computational profiling of compounds, prior to synthesis and experimental testing, can increase the rate of progression by up to 40%. Moreover, we demonstrate that the approach can be applied to prioritise small molecules within a chemical series and to assess the likelihood of success of novel chemotypes, promoting efficient use of limited experimental resources. This project is part of a first-of-its-kind collaboration between the H3D Centre, a research centre operating in a low-resource setting, and the Ersilia Open Source Initiative, a young tech non-profit devoted to building data science capacity in the Global South.
Competing Interest Statement
The authors have declared no competing interest.