RT Journal Article
SR Electronic
T1 Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.01.26.428252
DO 10.1101/2021.01.26.428252
A1 Koen Van den Berge
A1 Hsin-Jung Chou
A1 Hector Roux de Bézieux
A1 Kelly Street
A1 Davide Risso
A1 John Ngai
A1 Sandrine Dudoit
YR 2021
UL http://biorxiv.org/content/early/2021/05/20/2021.01.26.428252.abstract
AB Modern assays have enabled high-throughput studies of epigenetic regulation of gene expression using DNA sequencing. In particular, the assay for transposase-accessible chromatin using sequencing (ATAC-seq) allows the study of chromatin configuration for an entire genome. Despite the gain in popularity of the assay, there have been limited studies investigating the analytical challenges related to ATAC-seq data, and most studies leverage tools developed for bulk transcriptome sequencing (RNA-seq). Here, we show that GC-content effects are omnipresent in ATAC-seq datasets. Since the GC-content effects are sample-specific, they can bias downstream analyses such as clustering and differential accessibility analysis. We introduce a normalization method based on smooth-quantile normalization within GC-content bins, and evaluate it together with eleven different normalization procedures on eight public ATAC-seq datasets. Our work clearly shows that accounting for GC-content effects in the normalization is crucial for common downstream ATAC-seq data analyses, leading to improved accuracy and interpretability of the results. Using two case studies, we show that exploratory data analysis is essential to guide the choice of an appropriate normalization method for a given dataset. Competing Interest Statement: The authors have declared no competing interest.