Leveraging cross-link modification events in CLIP-seq for motif discovery

Nucleic Acids Res. 2015 Jan;43(1):95-103. doi: 10.1093/nar/gku1288. Epub 2014 Dec 10.

Abstract

High-throughput protein-RNA interaction data generated by CLIP-seq has provided an unprecedented depth of access to the activities of RNA-binding proteins (RBPs), the key players in co- and post-transcriptional regulation of gene expression. Motif discovery forms part of the necessary follow-up data analysis for CLIP-seq, both to refine the exact locations of RBP binding sites, and to characterize them. The specific properties of RBP binding sites, and the CLIP-seq methods, provide additional information not usually present in the classic motif discovery problem: the binding site structure, and cross-linking induced events in reads. We show that CLIP-seq data contains clear secondary structure signals, as well as technology- and RBP-specific cross-link signals. We introduce Zagros, a motif discovery algorithm specifically designed to leverage this information and explore its impact on the quality of recovered motifs. Our results indicate that using both secondary structure and cross-link modifications can greatly improve motif discovery on CLIP-seq data. Further, the motifs we recover provide insight into the balance between sequence- and structure-specificity struck by RBP binding.
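To make concrete how cross-link events could inform motif discovery, the sketch below weights candidate motif positions by nearby cross-link-induced events. It is an illustration only and not the Zagros algorithm. The function names, the fixed `offset` of the cross-link within the motif, the uniform background, and the pseudocount are all assumptions introduced here.

```python
import math

def site_scores(seq, pwm, events, offset, pseudo=1.0):
    """Score each candidate motif start in one CLIP-seq peak sequence.

    seq    : RNA sequence of the peak (string over A, C, G, U)
    pwm    : list of dicts, one per motif column, mapping base -> probability
    events : per-position counts of cross-link-induced events (len == len(seq))
    offset : assumed position of the cross-link inside the motif (hypothetical)
    pseudo : pseudocount so positions without events keep nonzero weight
    """
    width = len(pwm)
    total = sum(events) + pseudo * len(events)
    scores = []
    for start in range(len(seq) - width + 1):
        # Sequence term: log-likelihood ratio against a uniform background.
        llr = sum(math.log(pwm[i][seq[start + i]] / 0.25) for i in range(width))
        # Cross-link term: smoothed share of events at the expected position.
        prior = (events[start + offset] + pseudo) / total
        scores.append(llr + math.log(prior))
    return scores

def best_site(seq, pwm, events, offset):
    """Return the start index with the highest combined score."""
    scores = site_scores(seq, pwm, events, offset)
    return max(range(len(scores)), key=scores.__getitem__)
```

When two sites match the motif equally well in sequence, the one with more cross-link events at the assumed offset is preferred. A secondary structure term, such as the probability that each base is unpaired, could be added to the score in the same way.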

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • 3' Untranslated Regions
  • Algorithms*
  • Binding Sites
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Immunoprecipitation
  • Models, Statistical
  • Nucleic Acid Conformation
  • Nucleotide Motifs
  • RNA / chemistry*
  • RNA / metabolism
  • RNA-Binding Proteins / metabolism*
  • Sequence Analysis, RNA / methods

Substances

  • 3' Untranslated Regions
  • RNA-Binding Proteins
  • RNA