Pipeline Olympics: continuable benchmarking of computational workflows for DNA methylation sequencing data against an experimental gold-standard

Abstract
DNA methylation is a widely studied epigenetic mark and a powerful biomarker of cell type, age, environmental exposures, and disease. Whole-genome sequencing following selective conversion of unmethylated cytosines into thymines via bisulfite treatment or enzymatic methods remains the reference method for DNA methylation profiling genome-wide. While numerous software tools facilitate processing of DNA methylation sequencing reads, a comprehensive benchmarking study has been lacking thus far. In this study, we systematically compared complete computational workflows for processing DNA methylation sequencing data using a dedicated benchmarking dataset generated with five genome-wide profiling protocols. As an evaluation reference, we employed highly quantitative locus-specific measurements from our preceding benchmark of targeted DNA methylation assays. Based on this experimental gold-standard assessment and a number of comprehensive metrics, we ranked the evaluated workflows, identified workflows that consistently demonstrated superior performance, and revealed global workflow development trends. To facilitate the sustainability of our benchmark, we implemented an interactive workflow execution and data presentation platform, adaptable to user-defined criteria and seamlessly expandable to future workflows.
Competing Interest Statement
The authors have declared no competing interest.