PT - JOURNAL ARTICLE
AU - Patrick Hüther
AU - Jörg Hagmann
AU - Adam Nunn
AU - Ioanna Kakoulidou
AU - Rahul Pisupati
AU - David Langenberger
AU - Detlef Weigel
AU - Frank Johannes
AU - Sebastian J. Schultheiss
AU - Claude Becker
TI - MethylScore, a pipeline for accurate and context-aware identification of differentially methylated regions from population-scale plant WGBS data
AID - 10.1101/2022.01.06.475031
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.01.06.475031
4099 - http://biorxiv.org/content/early/2022/01/06/2022.01.06.475031.short
4100 - http://biorxiv.org/content/early/2022/01/06/2022.01.06.475031.full
AB - Whole-genome bisulfite sequencing (WGBS) is the standard method for profiling DNA methylation at single-nucleotide resolution. Many WGBS-based studies aim to identify biologically relevant loci that display differential methylation between genotypes, treatment groups, tissues, or developmental stages. Over the years, different tools have been developed to extract differentially methylated regions (DMRs) from whole-genome data. Often, such tools are built upon assumptions from mammalian data and do not consider the substantially more complex and variable nature of plant DNA methylation. Here, we present MethylScore, a pipeline to analyze WGBS data and to account for plant-specific DNA methylation properties. MethylScore processes data from genomic alignments to DMR output and is designed to be usable by novice and expert users alike. It uses an unsupervised machine learning approach to segment the genome by classification into states of high and low methylation, substantially reducing the number of necessary statistical tests while increasing the signal-to-noise ratio and the statistical power. We show how MethylScore can identify DMRs from hundreds of samples and how its data-driven approach can stratify associated samples without prior information. We identify DMRs in the A. thaliana 1001 Genomes dataset to unveil known and unknown genotype-epigenotype associations. MethylScore is an accessible pipeline for plant WGBS data, with unprecedented features for DMR calling in small- and large-scale datasets; it is built as a Nextflow pipeline and its source code is available at https://github.com/Computomics/MethylScore. Competing Interest Statement: The authors declare the following competing interests: J.H. is currently an employee of Computomics GmbH. S.J.S. is currently the CEO of and holds shares in Computomics GmbH. A.N. is currently an employee of ecSeq Bioinformatics GmbH. D.L. is currently the CEO of and holds shares in ecSeq Bioinformatics GmbH.