ABSTRACT
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) can detect read-enriched DNA loci for point-source factors (e.g., transcription factor binding) and broad-source factors (e.g., several histone modifications). Although numerous quality metrics for ChIP-seq data have been developed, the resulting “peaks” remain difficult to assess, both in terms of signal-to-noise ratio (S/N), especially for broad-source factors, and in terms of peak reliability. Here we introduce SSP (strand-shift profile), a tool that assesses the quality of ChIP-seq data without peak calling. SSP provides metrics that quantify the S/N for both point- and broad-source factors and estimate peak reliability from the distribution of mapped reads across the genome. We carried out an in-depth validation of our method using over 1,000 publicly available ChIP-seq datasets, along with virtual data. This validation demonstrates that SSP is more sensitive than existing tools for both point- and broad-source factors, owing to the larger dynamic range of its S/N score, and that it is robust across various cell types and sequencing depths. We also found that SSP can identify low-quality samples that currently available quality metrics fail to detect. Finally, SSP provides an additional metric to guard against “hidden-duplicate reads”, which cause aberrantly high S/Ns in the strand-shift profile. This metric also helps estimate the peak mode (point- or broad-source) of each sample. SSP thus offers a practical way to obtain information about sample quality and characteristics in ChIP-seq analyses.
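To give a concrete sense of what a strand-shift profile is, the sketch below computes a generic strand cross-correlation: the Pearson correlation between forward-strand and reverse-strand read-start counts as the reverse track is shifted by d bp. This is an illustrative assumption, not SSP's actual scoring, which may use a different similarity measure and normalization; the function name `strand_shift_profile` and the toy simulation (fixed 150-bp fragments, a 10-kb "genome") are hypothetical.

```python
import numpy as np


def strand_shift_profile(fwd, rev, max_shift):
    """Hypothetical helper: Pearson correlation between forward-strand counts
    at position x and reverse-strand counts at position x + d, for d = 0..max_shift.
    This is a generic strand cross-correlation, not SSP's exact metric."""
    fwd = np.asarray(fwd, dtype=float)
    rev = np.asarray(rev, dtype=float)
    n = len(fwd)
    profile = []
    for d in range(max_shift + 1):
        a = fwd[: n - d]  # forward-strand counts at x
        b = rev[d:]       # reverse-strand counts at x + d (same length as a)
        profile.append(float(np.corrcoef(a, b)[0, 1]))
    return np.array(profile)


# Toy simulation (assumed setup): each fragment has a fixed length of 150 bp.
# The forward read starts at the fragment's left end; the reverse read's
# 5' end lies at the fragment's right end (left end + 149).
rng = np.random.default_rng(0)
n, frag = 10_000, 150
fwd = np.zeros(n)
rev = np.zeros(n)
for s in rng.integers(0, n - frag, size=500):
    fwd[s] += 1
    rev[s + frag - 1] += 1

prof = strand_shift_profile(fwd, rev, max_shift=300)
print("shift with maximal correlation:", int(np.argmax(prof)))
```

In this idealized example the profile peaks at a shift of frag - 1 = 149 bp, where the two strands align perfectly. In real data, the height of such a peak relative to the background level at long shifts is what strand-based S/N metrics try to capture. Exact duplicate reads also inflate that peak, which is why duplicate handling matters for this kind of profile.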