bioRxiv
New Results

Enabling Semantic Queries Across Federated Bioinformatics Databases

Ana Claudia Sima, Tarcisio Mendes de Farias, Erich Zbinden, Maria Anisimova, Manuel Gil, Heinz Stockinger, Kurt Stockinger, Marc Robinson-Rechavi, Christophe Dessimoz
doi: https://doi.org/10.1101/686600
Ana Claudia Sima (1, 2, 3, 4)
Tarcisio Mendes de Farias (2, 3, 4, 5)
Erich Zbinden (1, 4)
Maria Anisimova (1, 4)
Manuel Gil (1, 4)
Heinz Stockinger (4)
Kurt Stockinger (1)
Marc Robinson-Rechavi (4, 5)
Christophe Dessimoz (2, 3, 4, 6, 7)

1 ZHAW Zurich University of Applied Sciences, Switzerland
2 Department of Computational Biology, University of Lausanne, Switzerland
3 Center for Integrative Genomics, University of Lausanne, Switzerland
4 SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
5 Department of Ecology and Evolution, University of Lausanne, Switzerland
6 Department of Genetics, Evolution, and Environment, University College London, UK
7 Department of Computer Science, University College London, UK

For correspondence: Marc.Robinson-Rechavi@unil.ch, Christophe.Dessimoz@unil.ch

Abstract

Motivation: Data integration promises to be one of the main catalysts for drawing new insights from the wealth of publicly available biological data. However, the heterogeneity of the data sources, at both the syntactic and the semantic level, still poses significant challenges for interoperability among biological databases.

Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: 1) Bgee, a gene expression relational database; 2) OMA, a Hierarchical Data Format 5 (HDF5) orthology data store; and 3) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialised RDF data of OMA, expressed in terms of the Orthology ontology, are made available through a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These links make it possible to run joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data by providing a natural language template-based search interface.
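
As a rough illustration of what a joint query across two SPARQL endpoints can look like, the sketch below assembles a SPARQL 1.1 federated query string with one SERVICE clause per endpoint. It is not the authors' implementation: the endpoint URLs, the `ex:` vocabulary, and the helper names (`service_block`, `federated_query`, `build_example_query`) are placeholders assumed for this example, and the shared `?gene` variable only stands in for the kind of intersection point described above. The code builds the query text and does not contact any server.

```python
from textwrap import indent

# Placeholder endpoint URLs and vocabulary: assumptions for illustration,
# not the projects' real addresses or ontology terms.
OMA_ENDPOINT = "https://example.org/oma/sparql"
BGEE_ENDPOINT = "https://example.org/bgee/sparql"
PREFIXES = "PREFIX ex: <http://example.org/vocab#>"


def service_block(endpoint_url, patterns):
    """Wrap triple patterns in a SPARQL 1.1 SERVICE clause for one endpoint."""
    body = indent("\n".join(patterns), "  ")
    return f"SERVICE <{endpoint_url}> {{\n{body}\n}}"


def federated_query(select_vars, blocks):
    """Combine SERVICE blocks into a single SELECT query."""
    where = indent("\n".join(blocks), "  ")
    return f"{PREFIXES}\nSELECT {' '.join(select_vars)} WHERE {{\n{where}\n}}"


def build_example_query():
    # ?gene appears in both blocks, so it plays the role of the join point.
    orthologs = service_block(OMA_ENDPOINT, ["?gene ex:isOrthologOf ?ortholog ."])
    expression = service_block(BGEE_ENDPOINT, ["?gene ex:isExpressedIn ?anatomy ."])
    return federated_query(["?gene", "?ortholog", "?anatomy"], [orthologs, expression])


if __name__ == "__main__":
    print(build_example_query())
```

In a real setting, the identifiers used by each source rarely match directly, so the join would typically go through an explicit cross-reference pattern rather than a bare shared variable.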

Project URL: http://biosoda.expasy.org, https://github.com/biosoda/bioquery


Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted June 28, 2019.
Subject Area

  • Bioinformatics