bioRxiv
New Results

Reproducible genomics analysis pipelines with GNU Guix

Ricardo Wurmus, Bora Uyar, Brendan Osberg, Vedran Franke, Alexander Gosdschan, Katarzyna Wreczycka, Jonathan Ronen, Altuna Akalin
doi: https://doi.org/10.1101/298653
Ricardo Wurmus
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Bora Uyar
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Brendan Osberg
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Vedran Franke
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Alexander Gosdschan
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Katarzyna Wreczycka
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Jonathan Ronen
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Altuna Akalin
Max Delbrueck Center for Molecular Medicine, Berlin, Germany
  • For correspondence: aakalin@gmail.com

Abstract

In bioinformatics, as in other compute-heavy research fields, there is a need for workflows that can be relied upon to produce consistent output, independent of the software environment or configuration settings of the machine on which they are executed. This consistency is essential for making controlled comparisons between different observations and for distributing software to be used by others. Providing this type of reproducibility, however, is often complicated by the need to accommodate the myriad dependencies of a larger body of software, each of which often exists in multiple versions. In bioinformatics, as in many other fields, these versions are subject to continual change due to rapidly evolving technologies, which further complicates reproducibility. We propose a principled approach for building analysis pipelines and managing their dependencies. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines for the analysis of RNA-seq, ChIP-seq, Bisulfite-seq, and single-cell RNA-seq data. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these pipelines as packages and apply them to their own datasets without any special computational expertise beyond using the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own datasets and to bioinformaticians who want to automate part or all of their analysis. Our approach to reproducibility may also serve as a blueprint for reproducible workflows in other areas. Our pipelines, their documentation, and sample reports are available at http://bioinformatics.mdc-berlin.de/pigx

Copyright 
The copyright holder for this preprint is the author/funder. It is made available under a CC-BY-ND 4.0 International license.
  • Posted April 11, 2018.

Subject Area

  • Bioinformatics