Abstract
Motivation: Biomedical identifier resources (ontologies, taxonomies, controlled vocabularies) commonly overlap in scope and contain equivalent entries under different identifiers. Maintaining mappings between these equivalent entries is crucial for interoperability and for the integration of data and knowledge. However, there are substantial gaps in the available mappings, which motivates their semi-automated curation.
Results: Biomappings implements a curation workflow for missing mappings that combines automated prediction with human-in-the-loop curation. It supports multiple prediction approaches and provides a web-based user interface for reviewing predicted mappings for correctness, combined with automated consistency checking. Predicted and curated mappings are made available in public, version-controlled resource files on GitHub. Biomappings currently makes available 8,560 curated mappings and 41,178 predicted ones, providing previously missing mappings between widely used resources covering small molecules, cell lines, diseases and other concepts. We demonstrate the value of Biomappings in case studies that involve predicting and curating missing mappings among cancer cell lines and among small molecules tested in clinical trials. We also show how previously missing mappings curated using Biomappings were contributed back to multiple widely used community ontologies.
Availability: The data and code are available under the CC0 and MIT licenses, respectively, at https://github.com/biopragmatics/biomappings.
Contact: benjamin_gyori{at}hms.harvard.edu
Competing Interest Statement
The authors have declared no competing interest.