bioRxiv
New Results

GEMBS: high-throughput processing for DNA methylation data from Whole Genome Bisulfite Sequencing (WGBS)

Angelika Merkel, Marcos Fernandez-Callejo, Eloi Casals, Santiago Marco-Sola, Ronald Schuyler, Ivo G. Gut, Simon C. Heath
doi: https://doi.org/10.1101/201988
Affiliation (all authors): Centro Nacional de Analisis Genomico (CNAG-CRG)
For correspondence: angelika.merkel@cnag.crg.es

Abstract

DNA methylation is essential for normal embryogenesis and development in mammals. Currently, whole-genome sequencing of bisulfite-converted DNA (WGBS) is the gold standard for studying DNA methylation at the genome-wide level. Unlike other techniques, it provides an unbiased view of the entire genome at single-base-pair resolution. In practice, however, because of its comparatively high cost (until recently), its application to large data sets (i.e., > 50 samples) has lagged behind more cost-efficient platforms such as the Illumina Infinium microarrays (27K, 450K and EPIC). Consequently, despite the variety of software tools available for WGBS analysis, processing large data sets remains cumbersome. We present GEMBS, a bioinformatics pipeline specifically designed for the analysis of large WGBS data sets. GEMBS is built on two core modules: GEM3, a high-performance read aligner, and BScall, a variant caller designed specifically for bisulfite sequencing data. Both components are embedded in a highly parallel workflow that enables efficient and reliable execution in a high-performance computing (HPC) environment. In this study, we benchmark GEMBS against other common analysis tools and show how GEMBS can be used for accurate variant calling from WGBS data.
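Why a variant caller needs to be bisulfite-aware can be illustrated with a minimal sketch. It assumes a directional WGBS library and per-strand read counts at a position where the reference has a C on the top strand. On top-strand reads, a T at that position can mean either an unmethylated (converted) C or a genuine C>T variant. Bottom-strand reads report the complementary G, which bisulfite treatment of the bottom strand leaves untouched, so A calls there point to a variant rather than to an unmethylated C. This is a toy illustration, not the GEMBS or BScall model (the abstract does not describe it). `SiteCounts`, both functions and the 0.2 cutoff are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteCounts:
    """Read counts at a position where the reference has C on the top strand.

    Hypothetical container for illustration; not a GEMBS data structure.
    """
    top_c: int     # top-strand reads showing C: methylated C, protected from conversion
    top_t: int     # top-strand reads showing T: unmethylated C OR a C>T variant
    bottom_g: int  # bottom-strand reads showing G: pairs with C, unaffected by conversion
    bottom_a: int  # bottom-strand reads showing A: evidence for a C>T variant


def methylation_level(s: SiteCounts) -> Optional[float]:
    """Naive methylation estimate C / (C + T) from top-strand reads.

    Returns None when the site has no top-strand coverage.
    """
    total = s.top_c + s.top_t
    return s.top_c / total if total else None


def c_to_t_snp_suspected(s: SiteCounts, min_fraction: float = 0.2) -> bool:
    """Flag the site if enough bottom-strand reads show A instead of G.

    min_fraction is an arbitrary illustrative cutoff, not a BScall parameter.
    """
    total = s.bottom_g + s.bottom_a
    return total > 0 and s.bottom_a / total >= min_fraction


if __name__ == "__main__":
    # All top-strand reads show T, and all bottom-strand reads show A.
    site = SiteCounts(top_c=0, top_t=10, bottom_g=0, bottom_a=10)
    print(methylation_level(site))     # 0.0: naively looks fully unmethylated
    print(c_to_t_snp_suspected(site))  # True: bottom strand indicates a C>T variant
```

Taken alone, the top-strand counts would report this site as unmethylated. Only the opposite-strand evidence separates the two explanations. A production caller would weigh such evidence probabilistically rather than with a fixed threshold.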

Copyright 
The copyright holder for this preprint is the author/funder. All rights reserved. No reuse allowed without permission.
Posted December 12, 2017.

Subject Area

  • Bioinformatics