PT - JOURNAL ARTICLE
AU - Akanksha Srivastava
AU - Yuliya V Karpievitch
AU - Steven R Eichten
AU - Justin O Borevitz
AU - Ryan Lister
TI - HOME: A histogram based machine learning approach for effective identification of differentially methylated regions
AID - 10.1101/228221
DP - 2017 Jan 01
TA - bioRxiv
PG - 228221
4099 - http://biorxiv.org/content/early/2017/12/02/228221.short
4100 - http://biorxiv.org/content/early/2017/12/02/228221.full
AB - DNA methylation is a covalent modification of DNA that plays an important role in regulating gene expression, cell identity, and organism development. Localized changes in DNA methylation are observed between different cell types, during development and aging, in various disease states, and under different stress conditions, and are often associated with functionally important genomic regions, including promoters and enhancers. The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single base resolution throughout an entire genome. A persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) in the genome between samples. Sensitive and specific identification of DMRs between different conditions requires accurate and efficient algorithms, and while various tools have been developed to tackle this problem, they frequently suffer from limitations in sensitivity and accuracy. Here, we present a novel Histogram Of MEthylation (HOME) based method that exploits the inherent difference in distribution of methylation levels between DMRs and non-DMRs to robustly discriminate between the two via a linear Support Vector Machine. HOME produces accurate DMR boundaries and few spurious DMRs, and provides the ability to determine DMRs in time-series data. HOME can identify DMRs among any number of treatment groups in experiments with or without replicates with high accuracy. We demonstrate that HOME produces more accurate DMRs than the current state-of-the-art methods on both simulated and biological datasets, and provide a user-friendly implementation of the tool.