Abstract
Enhanced cross-linking and immunoprecipitation (eCLIP) sequencing is a powerful approach for detecting binding sites of RNA-binding proteins (RBPs) transcriptome-wide. However, the high background noise of CLIP protocols, combined with current peak-calling strategies that compare each sequenced library to a single control, results in low replicability and high false-positive rates. Moreover, identified crosslink sites can deviate profoundly from experimentally established functional elements of well-studied RBPs. Here, we present the DEWSeq method, which detects significantly enriched crosslink regions using a sliding-window approach that makes full use of replicate information and input control data. We benchmarked DEWSeq on all 107 RBPs for which sequence motifs are reported by ENCODE. Using DEWSeq, the number of motif-containing binding sites detected increased 1.64-fold relative to standard eCLIP processing, accompanied by a slight improvement in the fraction of binding sites containing known RNA sequence motifs, from 56.1% to 59.1%. Further, we demonstrate stronger enrichment than with standard methods in known histone stem-loop structures for the histone stem-loop binding protein (SLBP) and in mitochondrial targets for the known mitochondrial RBP FASTKD2. Comparison with HITS-CLIP, PAR-CLIP and iCLIP data also shows a consistent improvement in overlap with these orthogonal methods when DEWSeq, rather than the standard data analysis approach, is applied to eCLIP data.
DEWSeq is a well-documented R/Bioconductor package, scalable to adequate numbers of replicates, and tends to substantially increase the proportion and total number of RBP binding sites containing biologically relevant features such as known RNA sequence motifs and secondary structures. We envision that the binding site datasets produced as part of our study will be a valuable resource for the community.
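To make the sliding-window idea concrete, the following is a minimal Python sketch, not the DEWSeq implementation (which is an R/Bioconductor package). It tiles a region with overlapping windows, counts crosslink positions per window in each IP and input replicate, and reports a log2 ratio of the replicate means. The function names, the window size of 50 nt, the step of 20 nt and the pseudocount are all illustrative assumptions; the sketch omits library-size normalisation and any statistical testing, so it only illustrates how windows, replicates and input controls can be combined.

```python
from math import log2


def sliding_windows(start, end, size=50, step=20):
    """Yield half-open (w_start, w_end) windows covering [start, end)."""
    pos = start
    while pos < end:
        yield pos, min(pos + size, end)
        pos += step


def count_in_window(sites, w_start, w_end):
    """Count crosslink positions that fall inside [w_start, w_end)."""
    return sum(1 for s in sites if w_start <= s < w_end)


def window_enrichment(ip_reps, input_reps, start, end,
                      size=50, step=20, pseudo=1.0):
    """Return (w_start, w_end, mean IP count, mean input count, log2 ratio).

    ip_reps and input_reps are lists of replicates; each replicate is a
    list of crosslink positions. The pseudocount (an assumption of this
    sketch) avoids division by zero in windows without input signal.
    """
    rows = []
    for w_start, w_end in sliding_windows(start, end, size, step):
        ip_mean = sum(count_in_window(r, w_start, w_end)
                      for r in ip_reps) / len(ip_reps)
        in_mean = sum(count_in_window(r, w_start, w_end)
                      for r in input_reps) / len(input_reps)
        lfc = log2((ip_mean + pseudo) / (in_mean + pseudo))
        rows.append((w_start, w_end, ip_mean, in_mean, lfc))
    return rows


if __name__ == "__main__":
    # Toy data: two IP replicates and two size-matched input replicates.
    ip = [[10, 12, 15, 60], [11, 14, 61]]
    smi = [[12], [70]]
    for row in window_enrichment(ip, smi, 0, 100):
        print(row)
```

Because every window uses counts from all replicates and from the input controls at once, a single noisy library has less influence than it would in a one-IP-versus-one-control comparison.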
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- adj: adjusted
- Orig.: original
- IP: immunoprecipitated
- CIMS: cross-link-induced mutation sites
- CLIP: cross-linking immunoprecipitation
- nt: nucleotide(s)
- PAR: photoactivatable ribonucleoside enhanced
- Rep.: reproducible
- SMI: size-matched input
- TPM: tags per million
- UTR: untranslated region