Abstract
Word models (natural language descriptions of molecular mechanisms) are a common currency in spoken and written communication in biomedicine but are of limited use in predicting the behavior of complex biological networks. We present an approach to building computational models directly from natural language using automated assembly. Molecular mechanisms described in simple English are read by natural language processing algorithms, converted into an intermediate representation and assembled into executable or network models. We have implemented this approach in the Integrated Network and Dynamical Reasoning Assembler (INDRA), which draws on existing natural language processing systems as well as pathway information in Pathway Commons and other online resources. We demonstrate the use of INDRA and natural language to model three biological processes of increasing scope: (i) p53 dynamics in response to DNA damage; (ii) adaptive drug resistance in BRAF-V600E mutant melanomas; and (iii) the RAS signaling pathway. The use of natural language for modeling makes routine tasks more efficient for modeling practitioners and increases the accessibility and transparency of models for the broader biology community.
Standfirst text
INDRA uses natural language processing systems to read descriptions of molecular mechanisms and assembles them into executable models.
Highlights
INDRA decouples the curation of knowledge as word models from model implementation
INDRA is connected to multiple natural language processing systems and can draw on information from curated databases
INDRA can assemble dynamical models in rule-based and reaction network formalisms, as well as Boolean networks and visualization formats
We used INDRA to build, from natural language, models of p53 dynamics, resistance to targeted inhibitors of BRAF in melanoma, and the RAS signaling pathway
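The read, represent, and assemble stages described in the abstract can be illustrated with a toy sketch. This is not INDRA's API and does not use any of its natural language processing systems: the `Statement` class, the regex-based `read_sentences` reader, and both assembler functions are hypothetical stand-ins written only to show how one intermediate representation can feed several model formats. The example sentences are illustrative inputs, not text taken from the models in this work.

```python
import re
from dataclasses import dataclass


# Hypothetical intermediate representation: one mechanism extracted from text.
@dataclass(frozen=True)
class Statement:
    kind: str     # mechanism type, e.g. "Phosphorylation"
    subject: str  # agent that acts
    obj: str      # agent that is acted upon


# Toy "reader": a fixed verb vocabulary stands in for a real NLP system.
VERBS = {
    "phosphorylates": "Phosphorylation",
    "activates": "Activation",
    "inhibits": "Inhibition",
}
PATTERN = re.compile(
    r"^\s*(\w+)\s+(" + "|".join(VERBS) + r")\s+(\w+)\s*\.?\s*$"
)


def read_sentences(text):
    """Convert simple 'X verb Y.' sentences into Statements; skip the rest."""
    statements = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        match = PATTERN.match(sentence)
        if match:
            subject, verb, obj = match.groups()
            statements.append(Statement(VERBS[verb], subject, obj))
    return statements


def assemble_edges(statements):
    """Assemble a network model: labeled, directed edges between agents."""
    return [(s.subject, s.kind, s.obj) for s in statements]


def assemble_reactions(statements):
    """Assemble reaction strings; only phosphorylation is handled here."""
    reactions = []
    for s in statements:
        if s.kind == "Phosphorylation":
            reactions.append(
                f"{s.subject} + {s.obj} -> {s.subject} + {s.obj}_p"
            )
    return reactions


if __name__ == "__main__":
    text = "BRAF phosphorylates MAP2K1. MAP2K1 phosphorylates MAPK1."
    stmts = read_sentences(text)
    print(assemble_edges(stmts))
    print(assemble_reactions(stmts))
```

Keeping the statements separate from both assemblers mirrors the decoupling named in the first highlight: the same extracted knowledge can be rendered as a network or as reactions without re-reading the text.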