Abstract
Accurately predicting the mechanisms and properties of potential drug molecules is essential for advancing drug discovery. However, traditional methods often require the development of specialized models for each specific prediction task, resulting in inefficiencies in both model training and integration into work-flows. Moreover, these approaches are typically limited to predicting pharmaceutical attributes represented as discrete categories, and struggle with predicting complex attributes that are best described in free-form texts. To address these challenges, we introduce DrugChat, a multi-modal large language model (LLM) designed to provide comprehensive predictions of molecule mechanisms and properties within a unified framework. DrugChat analyzes the structure of an input molecule along with users’ queries to generate comprehensive, free-form predictions on drug indications, pharmacodynamics, and mechanisms of action. Moreover, DrugChat supports multi-turn dialogues with users, facilitating interactive and in-depth exploration of the same molecule. Our extensive evaluation, including assessments by human experts, demonstrates that DrugChat significantly outperforms GPT-4 and other leading LLMs in generating accurate free-form predictions, and exceeds state-of-the-art specialized prediction models.
Competing Interest Statement
The authors have declared no competing interest.