De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly

Aaron T L Lun; Gordon K Smyth

doi:10.1093/nar/gku351

Nucleic Acids Res. 2014 Jun;42(11):e95. doi: 10.1093/nar/gku351. Epub 2014 May 22.

Authors

Aaron T L Lun¹, Gordon K Smyth²

Affiliations

¹ The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia.
² The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia; Department of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia. smyth@wehi.edu.au

Abstract

A common aim in ChIP-seq experiments is to identify changes in protein binding patterns between conditions, i.e. differential binding. A number of peak- and window-based strategies have been developed to detect differential binding when the regions of interest are not known in advance. However, careful consideration of error control is needed when applying these methods. Peak-based approaches use the same data set to define peaks and to detect differential binding. Done improperly, this can result in loss of type I error control. For window-based methods, controlling the false discovery rate over all detected windows does not guarantee control across all detected regions. Misinterpreting the former as the latter can result in unexpected liberalness. Here, several solutions are presented to maintain error control for these de novo counting strategies. For peak-based methods, peak calling should be performed on pooled libraries prior to the statistical analysis. For window-based methods, a hybrid approach using Simes' method is proposed to maintain control of the false discovery rate across regions. More generally, the relative advantages of peak- and window-based strategies are explored using a range of simulated and real data sets. Implementations of both strategies also compare favourably to existing programs for differential binding analyses.
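
The window-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each window already has a p-value from some differential binding test and a label naming the region it belongs to; how windows are grouped into regions is outside the sketch. The function names (`simes`, `benjamini_hochberg`, `region_fdr`) and the example values are hypothetical. Window p-values within each region are combined into a single region-level p-value with Simes' method, and the Benjamini-Hochberg adjustment is then applied across regions rather than across windows.

```python
from collections import defaultdict


def simes(pvalues):
    """Combine p-values with Simes' method: min over i of n * p_(i) / i."""
    ordered = sorted(pvalues)
    n = len(ordered)
    combined = min(n * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def region_fdr(window_pvalues, window_regions):
    """Map each region label to its adjusted, Simes-combined p-value.

    window_pvalues: per-window p-values (assumed to be given).
    window_regions: region label for each window (assumed to be given).
    """
    grouped = defaultdict(list)
    for p, region in zip(window_pvalues, window_regions):
        grouped[region].append(p)
    regions = sorted(grouped)
    combined = [simes(grouped[r]) for r in regions]
    return dict(zip(regions, benjamini_hochberg(combined)))


# Hypothetical example: five windows falling into two regions.
result = region_fdr([0.01, 0.04, 0.5, 0.2, 0.3], ["A", "A", "A", "B", "B"])
```

Because the adjustment is made on one p-value per region, the resulting threshold applies to the list of regions, which is the quantity the abstract says must not be confused with control over windows.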

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Binding Sites
Chromatin Immunoprecipitation / methods*
DNA-Binding Proteins / metabolism*
Histones / metabolism
Sequence Analysis, DNA / methods*
Software
Transcription Factors / metabolism

Substances

DNA-Binding Proteins
Histones
Transcription Factors