bioRxiv
New Results

Galaxy-Kubernetes integration: scaling bioinformatics workflows in the cloud

Pablo Moreno, Luca Pireddu, Pierrick Roger, Nuwan Goonasekera, Enis Afgan, Marius van den Beek, Sijin He, Anders Larsson, Daniel Schober, Christoph Ruttkies, David Johnson, Philippe Rocca-Serra, Ralf JM Weber, Björn Gruening, Reza M Salek, Namrata Kale, Yasset Perez-Riverol, Irene Papatheodorou, Ola Spjuth, Steffen Neumann
doi: https://doi.org/10.1101/488643
Pablo Moreno
(a) European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
Luca Pireddu
(b) Distributed Computing Group, CRS4, Italy
Pierrick Roger
(c) CEA, Laboratory for Data Analysis and Systems Intelligence, MetaboHUB, France
Nuwan Goonasekera
(f) Melbourne Bioinformatics & EMBL Australia Bioinformatics Resource, University of Melbourne, Australia
Enis Afgan
(g) Johns Hopkins University, Department of Biology, USA
Marius van den Beek
(l) Institut Curie, PSL Research University, Paris, France
Sijin He
(a) European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
Anders Larsson
(f) Melbourne Bioinformatics & EMBL Australia Bioinformatics Resource, University of Melbourne, Australia
Daniel Schober
(d) Leibniz Institute of Plant Biochemistry, Dept. of Stress and Developmental Biology, Halle, Germany
Christoph Ruttkies
(d) Leibniz Institute of Plant Biochemistry, Dept. of Stress and Developmental Biology, Halle, Germany
David Johnson
(h) Department of Engineering Science, University of Oxford, United Kingdom
Philippe Rocca-Serra
(h) Department of Engineering Science, University of Oxford, United Kingdom
Ralf JM Weber
(i) School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
Björn Gruening
(j) University of Freiburg, Department of Computer Science, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
Reza M Salek
(k) International Agency for Research on Cancer, Lyon CEDEX 08, France
Namrata Kale
(a) European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
Yasset Perez-Riverol
(a) European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
Irene Papatheodorou
(a) European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
Ola Spjuth
(e) Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Sweden
Steffen Neumann
(d) Leibniz Institute of Plant Biochemistry, Dept. of Stress and Developmental Biology, Halle, Germany

Summary

Building reproducible, auditable and scalable data-processing workflows is an important challenge in bioinformatics. Software containers and cloud computing have recently emerged as a way to address this challenge: containers simplify software installation, management and reproducibility by packaging tools together with their dependencies. In this work we implemented a cloud-provider-agnostic and scalable container orchestration setup for the popular Galaxy workflow environment. This solution enables Galaxy to run on, and offload jobs to, most cloud providers (e.g. Amazon Web Services, Google Cloud or OpenStack, among others) through the Kubernetes container orchestrator.
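
As a rough illustration of what offloading a containerized tool run to Kubernetes can involve, the sketch below builds a Kubernetes batch/v1 Job manifest as a plain Python dictionary. It is not the Galaxy job runner described in this work. The function name build_job_manifest, the "tool-job-" naming scheme, the image name and the default resource requests are all hypothetical and chosen only for this example.

```python
import json


def build_job_manifest(job_id, image, command, cpu="1", memory="1Gi"):
    """Return a Kubernetes batch/v1 Job manifest (a dict) for one tool run.

    All names and defaults here are illustrative assumptions, not Galaxy's
    actual conventions.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"tool-job-{job_id}"},
        "spec": {
            # Do not retry failed pods; a workflow engine would typically
            # decide for itself whether to resubmit.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Job pod templates must use "Never" or "OnFailure".
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "tool",
                            # The image packages the tool and its dependencies.
                            "image": image,
                            "command": ["/bin/sh", "-c", command],
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image and command, for illustration only.
    manifest = build_job_manifest(
        42, "example.org/tools/samtools:1.0", "samtools --version"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be submitted to a cluster with a standard client such as kubectl. Because the manifest refers only to a container image and generic resource requests, the same description applies on any provider that exposes a Kubernetes API, which is the sense in which such a setup is cloud-provider agnostic.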

Availability: All code has been contributed to the Galaxy Project and is available (since Galaxy 17.05) at https://github.com/galaxyproject/ in the galaxy and galaxy-kubernetes repositories. https://public.phenomenal-h2020.eu/ is an example deployment.

Supplementary Information: Supplementary files are available online.

Contact: pmoreno{at}ebi.ac.uk, European Molecular Biology Laboratory, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, Tel: +44-1223-494267, Fax: +44-1223-484696.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 12, 2019.

Subject Area

  • Bioinformatics