Abstract
Motivation Epigenetic assays using next-generation sequencing (NGS) have furthered our understanding of functional genomic regions and the mechanisms of gene regulation. However, a single assay produces billions of data points, represented as nucleotide-resolution signal tracks. The signal strength at any given nucleotide is subject to numerous sources of technical and biological noise and thus conveys limited information about the underlying biological state. To draw biological conclusions, the data are typically summarized into higher-order patterns. Numerous specialized algorithms for summarizing epigenetic signal have been proposed, including methods for peak calling and for finding differentially methylated regions. A key principle unifying these approaches is that they all leverage the strong prior that the signal must be locally consistent.
Results We propose L0 segmentation as a universal framework for extracting locally coherent signals from diverse epigenetic sources. L0 segmentation both compresses and smooths the input signal by approximating it as piecewise constant. We implement a highly scalable L0 segmentation algorithm with additional loss functions designed for NGS epigenetic data types, including a Poisson loss for single tracks and a binomial loss for methylation/coverage data. In contrast to the widely used L1 segmentation problem, known as the fused lasso, the L0 solution does not induce global attenuation of the signal and captures the salient features of the data over a wide range of compression levels. Finally, we show that L0 segmentation can be used as an effective prior inside other machine learning models, such as matrix factorization.
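To make the piecewise-constant objective concrete, here is a minimal Python sketch; it is not the package's R/C++ code. It assumes the L0 problem takes the penalized form "minimize the summed per-position loss plus lam times the number of change points" and uses the Poisson loss named above. It solves this exactly with the textbook O(n^2) optimal-partitioning dynamic program, which is far slower than a scalable implementation. The function names `poisson_segment_cost` and `l0_segment` and the penalty parameter `lam` are hypothetical, chosen for illustration.

```python
import math
from typing import List, Tuple


def poisson_segment_cost(s: float, n: int) -> float:
    """Poisson negative log-likelihood of n counts summing to s, evaluated at the
    segment MLE mu = s / n (the constant log(y!) terms are dropped)."""
    if s <= 0:
        return 0.0  # an all-zero segment is fit exactly by mu = 0
    return s - s * math.log(s / n)


def l0_segment(y: List[float], lam: float) -> Tuple[List[int], List[float]]:
    """Exact L0 segmentation via O(n^2) optimal partitioning (illustrative sketch).

    Minimizes sum_i loss(y_i, beta_i) + lam * (#segments - 1) over
    piecewise-constant beta. Returns (segment start indices, fitted beta).
    """
    n = len(y)
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)  # prefix sums give O(1) segment totals

    # best[j]: optimal penalized cost of y[0:j]; last[j]: start of its final segment
    best = [0.0] + [math.inf] * n
    last = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            # final segment y[i:j]; every segment after the first adds one change point
            cost = best[i] + poisson_segment_cost(prefix[j] - prefix[i], j - i)
            if i > 0:
                cost += lam
            if cost < best[j]:
                best[j], last[j] = cost, i

    # backtrack the segment start positions
    starts, j = [], n
    while j > 0:
        starts.append(last[j])
        j = last[j]
    starts.reverse()

    # each segment is fit by its own mean, the Poisson MLE, with no shrinkage
    beta = [0.0] * n
    bounds = starts + [n]
    for a, b in zip(bounds[:-1], bounds[1:]):
        mean = (prefix[b] - prefix[a]) / (b - a)
        for k in range(a, b):
            beta[k] = mean
    return starts, beta


if __name__ == "__main__":
    counts = [0, 1, 0, 1, 0, 10, 12, 9, 11, 10]
    print(l0_segment(counts, lam=5.0))
```

Because each fitted level is the plain mean of its segment, the size of the jump between segments is preserved, whereas the fused lasso's L1 penalty on differences shrinks jumps toward zero. A binomial variant for methylation data could, under the same assumptions, replace `poisson_segment_cost` with a cost computed from prefix sums of methylated and total read counts.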
Availability Our approach is implemented as an R package, “l01segmentation”, with a C++ backend. It is available at https://github.com/boooooogey/l01segmentation.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
© The Author 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions{at}oup.com