bioRxiv
New Results

RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine

E. C. Wood, Amy K. Glen, Lindsey G. Kvarfordt, Finn Womack, Liliana Acevedo, Timothy S. Yoon, Chunyu Ma, Veronica Flores, Meghamala Sinha, Yodsawalai Chodpathumwan, Arash Termehchy, Jared C. Roach, Luis Mendoza, Andrew S. Hoffman, Eric W. Deutsch, David Koslicki, Stephen A. Ramsey
doi: https://doi.org/10.1101/2021.10.17.464747
E. C. Wood
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
Amy K. Glen
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
  • For correspondence: glena@oregonstate.edu
Lindsey G. Kvarfordt
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
Finn Womack
2 School of Electrical Engineering and Computer Science, Penn State University, State College, Pennsylvania USA
Liliana Acevedo
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
Timothy S. Yoon
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
Chunyu Ma
3 Huck Institutes of the Life Sciences, Penn State University, State College, Pennsylvania USA
Veronica Flores
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
Meghamala Sinha
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
Yodsawalai Chodpathumwan
7 King Mongkut’s University of Technology North Bangkok, Thailand
Arash Termehchy
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
Jared C. Roach
4 Institute for Systems Biology, Seattle, Washington USA
Luis Mendoza
4 Institute for Systems Biology, Seattle, Washington USA
Andrew S. Hoffman
5 Interdisciplinary Hub for Digitalization and Society, Radboud University, Nijmegen NL
Eric W. Deutsch
4 Institute for Systems Biology, Seattle, Washington USA
David Koslicki
2 School of Electrical Engineering and Computer Science, Penn State University, State College, Pennsylvania USA
3 Huck Institutes of the Life Sciences, Penn State University, State College, Pennsylvania USA
6 Department of Biology, Penn State University, State College, Pennsylvania USA
Stephen A. Ramsey
1 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon USA
8 Department of Biomedical Sciences, Oregon State University, Corvallis, Oregon USA

Abstract

Background Biomedical translational science increasingly relies on computational reasoning over large repositories of structured knowledge (such as the Unified Medical Language System (UMLS), the Semantic Medline Database (SemMedDB), ChEMBL, DrugBank, and the Small Molecule Pathway Database (SMPDB)) and data to facilitate discovery of new therapeutic targets and modalities. Since 2016, the NCATS Biomedical Data Translator project has been working to federate autonomous reasoning agents and knowledge providers within a distributed system for answering translational questions. Within that project and within the field more broadly, there is an urgent need for an open-source framework that can efficiently and reproducibly build an integrated, standards-compliant, and comprehensive biomedical knowledge graph. Such a graph should be available either for download in standard serialized form or for querying via a public application programming interface (API) that accords with the FAIR data principles.

Results To create a knowledge provider system within the Translator project, we have developed RTX-KG2, an open-source software system for building a biomedical knowledge graph and for hosting a web API for querying it. RTX-KG2 uses an Extract-Transform-Load (ETL) approach to integrate 70 knowledge sources (including the aforementioned sources) into a single knowledge graph. The semantic layer and schema for RTX-KG2 follow the standard Biolink metamodel to maximize interoperability within Translator. RTX-KG2 is currently being used by multiple Translator reasoning agents, both in its downloadable form and via its SmartAPI-registered web interface. JavaScript Object Notation (JSON) serializations of RTX-KG2 are available for download in both pre-canonicalized form and canonicalized form (in which synonymous concepts are merged). The current canonicalized version (KG2.7.3) of RTX-KG2 contains 6.4M concept nodes and 39.3M relationship edges with a rich set of 77 relationship types.
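
To make the idea of canonicalization (merging synonymous concepts) concrete, here is a minimal sketch. It is not the RTX-KG2 implementation. It assumes a set of synonym pairs is already known, groups nodes with a union-find structure, and picks one representative identifier per group using a prefix priority list loosely modeled on the "gene" example in footnote 1. All identifiers, the `EXAMPLE` prefix, the function names, and the priority list are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical prefix priority, loosely modeled on the "gene" example in footnote 1.
# Simplification: the same priority is applied to every node regardless of its type.
PREFIX_PRIORITY = ["ENSEMBL", "NCBIGene", "HGNC"]


def find(parent, x):
    """Return the representative of x's group, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def canonicalize(nodes, synonym_pairs, edges, priority):
    """Merge synonymous nodes and remap edges onto one identifier per group."""
    parent = {n: n for n in nodes}
    for a, b in synonym_pairs:
        parent[find(parent, a)] = find(parent, b)

    # Collect the members of each synonym group.
    clusters = defaultdict(list)
    for n in nodes:
        clusters[find(parent, n)].append(n)

    def rank(curie):
        # Lower rank wins; unknown prefixes sort last, ties break alphabetically.
        prefix = curie.split(":", 1)[0]
        position = priority.index(prefix) if prefix in priority else len(priority)
        return (position, curie)

    canonical = {}
    for members in clusters.values():
        best = min(members, key=rank)
        for m in members:
            canonical[m] = best

    # Remapping can produce duplicate edges; a set removes them.
    merged_edges = sorted({(canonical[s], p, canonical[o]) for s, p, o in edges})
    return canonical, merged_edges


# Toy input: three placeholder identifiers for one gene, plus one other node.
nodes = ["HGNC:g1", "NCBIGene:g1", "ENSEMBL:g1", "EXAMPLE:drug1"]
synonym_pairs = [("HGNC:g1", "NCBIGene:g1"), ("NCBIGene:g1", "ENSEMBL:g1")]
edges = [
    ("EXAMPLE:drug1", "biolink:related_to", "HGNC:g1"),
    ("EXAMPLE:drug1", "biolink:related_to", "NCBIGene:g1"),
]
canonical, merged_edges = canonicalize(nodes, synonym_pairs, edges, PREFIX_PRIORITY)
```

In this toy input, the three gene identifiers collapse onto `ENSEMBL:g1`, and the two edges become a single edge after remapping. This illustrates why a canonicalized graph can have fewer nodes and edges than its pre-canonicalized counterpart.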

Conclusion RTX-KG2 is the first open-source knowledge graph of which we are aware that integrates UMLS, SemMedDB, ChEMBL, DrugBank, SMPDB, and 65 additional knowledge sources into a single knowledge graph whose semantic layer and schema conform to the Biolink standard at the intersections of these databases. RTX-KG2 is publicly available for querying via its API at arax.ncats.io/api/rtxkg2/v1.2/openapi.json. The code to build RTX-KG2 is publicly available at github:RTXteam/RTX-KG2.
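
As a rough illustration of what a query against such an API might look like, the sketch below builds a one-hop query payload without sending it. The text above gives only the OpenAPI document location. The `https://` scheme, the `/query` path, and the payload layout are assumptions based on the Translator Reasoner API (TRAPI) 1.2 conventions, and the CURIE, category, and predicate are illustrative values only. The function name `build_one_hop_query` is hypothetical.

```python
import json

# Assumption: base URL derived from the OpenAPI location quoted in the text.
BASE_URL = "https://arax.ncats.io/api/rtxkg2/v1.2"
# Assumption: TRAPI-style services accept POSTed queries at "/query".
QUERY_URL = BASE_URL + "/query"


def build_one_hop_query(subject_curie, object_category, predicate):
    """Build a TRAPI-style one-hop query graph: a known node linked to an unknown node."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_curie]},            # pinned node
                    "n1": {"categories": [object_category]},   # node to be found
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    }
                },
            }
        }
    }


# Illustrative values only; they are not taken from the source text.
payload = build_one_hop_query(
    "CHEMBL.COMPOUND:CHEMBL112", "biolink:Protein", "biolink:interacts_with"
)
body = json.dumps(payload, indent=2)
```

The serialized `body` could then be sent as a JSON POST request to `QUERY_URL` with any HTTP client. The OpenAPI document at the address above should be consulted for the endpoints and fields the service actually supports.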

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Two authors added.

  • 1 As an example of prioritization, for the semantic type “gene” the preferred identifier types would be those from Ensembl Gene, National Center for Biotechnology Information (NCBI) Gene, and the HUGO Gene Nomenclature Committee (HGNC).

  • 2 This will be transitioning to the original_predicate property in the next release of RTX-KG2, for compatibility with recent changes in the Biolink standard.

  • 3 Note, however, that one API is used in constructing RTX-KG2; see Sec. 2.1.

  • 5 List of Abbreviations

    ARAX: Autonomous Relay Agent X
    AWS: Amazon Web Services
    D2J: direct-to-JSON method
    EC2: Elastic Compute Cloud
    ETL: extract–transform–load paradigm
    GO: Gene Ontology
    ICD: International Classification of Diseases
    JSON: JavaScript Object Notation
    KEGG: Kyoto Encyclopedia of Genes and Genomes
    NCATS: National Center for Advancing Translational Sciences
    NCBI: National Center for Biotechnology Information
    OBO: Open Biomedical Ontologies
    OWL: Web Ontology Language
    RBM: RDF-based method
    RDF: Resource Description Framework
    REST: REpresentational State Transfer
    RTX-KG2: Reasoning Tool X, Knowledge Graph Generation Two
    RTX-KG2c: Reasoning Tool X, Knowledge Graph Generation Two, Canonicalized
    RTX-KG2pre: Reasoning Tool X, Knowledge Graph Generation Two, Pre-canonicalization
    S3: Simple Storage Service
    SemMedDB: Semantic Medline Database
    SMPDB: Small Molecule Pathway Database
    SQL: Structured Query Language
    Translator: NCATS Biomedical Data Translator
    TSV: tab-separated value
    TTL: Terse RDF Triple Language
    UMLS: Unified Medical Language System
    XML: eXtensible Markup Language
    (See also Table 2.1, Table 3, and Table S1.)
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
    Posted November 01, 2021.
    Subject Area

    • Bioinformatics