New Results

CLEP: A Hybrid Data- and Knowledge-Driven Framework for Generating Patient Representations

Vinay Srinivas Bharadhwaj, Mehdi Ali, Colin Birkenbihl, Sarah Mubeen, Jens Lehmann, Martin Hofmann-Apitius, Charles Tapley Hoyt, Daniel Domingo-Fernández
doi: https://doi.org/10.1101/2020.08.20.259226
Vinay Srinivas Bharadhwaj
1Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
2Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
Mehdi Ali
3Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
4Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Dresden and Sankt Augustin, Germany
Colin Birkenbihl
1Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
2Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
Sarah Mubeen
1Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
2Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
5Fraunhofer Center for Machine Learning, Germany
Jens Lehmann
3Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
4Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Dresden and Sankt Augustin, Germany
Martin Hofmann-Apitius
1Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
2Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
Charles Tapley Hoyt
1Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
3Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
5Fraunhofer Center for Machine Learning, Germany
Daniel Domingo-Fernández
1Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
3Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn 53113, Germany
5Fraunhofer Center for Machine Learning, Germany
  • For correspondence: daniel.domingo.fernandez@scai.fraunhofer.de

Abstract

As machine learning and artificial intelligence become more useful in the interpretation of biomedical data, their utility depends on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge of known biological interactions with patient data. Here, we present CLEP, a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in distinguishing patients from healthy controls for a variety of machine learning models, compared with using the original transcriptomics data. Furthermore, we show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open-source Python package together with examples and documentation.
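
The first step described in the abstract (adding patients to the knowledge graph as nodes linked to their most characteristic features) can be illustrated with a minimal sketch. It is not the CLEP implementation: the abstract does not state how "most characteristic" is determined, so the z-score against healthy controls, the threshold of 2.0, the relation names "up" and "down", the function name patient_triples, and all gene and sample identifiers below are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def patient_triples(patients, controls, threshold=2.0):
    """Link each patient to features that deviate strongly from controls.

    Hypothetical helper: the z-score rule and the "up"/"down" relation
    names are illustrative assumptions, not taken from the CLEP package.
    """
    features = sorted(next(iter(controls.values())))
    # Per-feature mean and sample standard deviation in the control group.
    stats = {
        f: (
            mean(c[f] for c in controls.values()),
            stdev(c[f] for c in controls.values()),
        )
        for f in features
    }
    triples = []
    for patient_id, values in sorted(patients.items()):
        for f in features:
            mu, sigma = stats[f]
            if sigma == 0:
                continue  # A constant feature carries no signal here.
            z = (values[f] - mu) / sigma
            if z >= threshold:
                triples.append((patient_id, "up", f))
            elif z <= -threshold:
                triples.append((patient_id, "down", f))
    return triples


# Toy prior knowledge: (head, relation, tail) triples between features.
knowledge_graph = [
    ("GENE_A", "increases", "GENE_B"),
    ("GENE_B", "decreases", "GENE_C"),
]
# Toy patient-level data: one measurement per feature per sample.
controls = {
    "c1": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0},
    "c2": {"GENE_A": 1.2, "GENE_B": 2.1, "GENE_C": 2.9},
    "c3": {"GENE_A": 0.8, "GENE_B": 1.9, "GENE_C": 3.1},
}
patients = {
    "p1": {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 3.0},
    "p2": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 1.0},
}

# The hybrid graph holds both the prior knowledge and the patient edges.
hybrid_graph = knowledge_graph + patient_triples(patients, controls)
```

In this toy setting, the combined triple list is what a knowledge graph embedding model would be trained on; the embedding learned for each patient node would then serve as that patient's new representation. The embedding step itself is not shown here.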

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/hybrid-kg/clep/

  • https://github.com/hybrid-kg/clep-resources/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted November 04, 2020.
Subject Area

  • Bioinformatics