Modeling RNA-Binding Protein Specificity In Vivo by Precisely Registering Protein-RNA Crosslink Sites

Mol Cell. 2019 Jun 20;74(6):1189-1204.e6. doi: 10.1016/j.molcel.2019.02.002.

Abstract

RNA-binding proteins (RBPs) regulate post-transcriptional gene expression by recognizing short and degenerate sequence motifs in their target transcripts, but precisely defining their binding specificity remains challenging. Crosslinking and immunoprecipitation (CLIP) allows mapping of the exact protein-RNA crosslink sites at single-nucleotide resolution, and these sites frequently reside at specific positions within RBP motifs. Here, we have developed a computational method, named mCross, to jointly model RBP binding specificity while precisely registering the crosslinking position in motif sites. We applied mCross to 112 RBPs using ENCODE eCLIP data and validated the reliability of the discovered motifs by genome-wide analysis of allelic binding sites. Our analyses revealed that the prototypical SR protein SRSF1 recognizes clusters of GGA half-sites in addition to its canonical GGAGGA motif. Therefore, SRSF1 regulates splicing of a much larger repertoire of transcripts than previously appreciated, including HNRNPD and HNRNPDL, which are involved in multivalent protein assemblies and phase separation.
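To make the idea of "registering" crosslink sites within motifs more concrete, here is a minimal Python sketch, assuming crosslink sites are given as (RNA sequence, 0-based index of the crosslinked nucleotide) pairs. It is not mCross and does not reproduce its joint model; it only aligns sequence windows on the crosslinked nucleotide, tallies per-position base frequencies, and counts at which offset inside a candidate motif (here the canonical SRSF1 GGAGGA motif) the crosslink falls. The function names, the `flank` parameter, and the toy sequences are hypothetical illustrations, not eCLIP data.

```python
from collections import Counter
from typing import Dict, List, Tuple

BASES = "ACGU"

# Hypothetical input convention: (RNA sequence, 0-based crosslink index).
Site = Tuple[str, int]


def crosslink_windows(sites: List[Site], flank: int) -> List[str]:
    """Return windows of length 2*flank+1 centered on each crosslink position.

    Sites closer than `flank` nucleotides to either sequence end are skipped,
    so every returned window is aligned on the crosslinked nucleotide.
    """
    windows = []
    for seq, xl in sites:
        if xl - flank >= 0 and xl + flank < len(seq):
            windows.append(seq[xl - flank : xl + flank + 1])
    return windows


def position_frequency_matrix(windows: List[str]) -> List[Dict[str, float]]:
    """Per-position A/C/G/U frequencies across equal-length aligned windows.

    Characters outside ACGU (e.g. N) are ignored; a column containing only
    such characters gets all-zero frequencies.
    """
    if not windows:
        return []
    pfm = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        total = sum(counts[b] for b in BASES)
        pfm.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return pfm


def crosslink_offsets(sites: List[Site], motif: str) -> Counter:
    """Count where, inside each motif occurrence, the crosslink site falls.

    Offset 0 means the crosslinked nucleotide is the first motif position.
    Occurrences (including overlapping ones) that do not cover the crosslink
    site are ignored.
    """
    offsets = Counter()
    k = len(motif)
    for seq, xl in sites:
        start = seq.find(motif)
        while start != -1:
            if 0 <= xl - start < k:
                offsets[xl - start] += 1
            start = seq.find(motif, start + 1)
    return offsets


if __name__ == "__main__":
    # Toy sites: every crosslink sits on the first G of GGAGGA.
    toy_sites = [("CUGGAGGAUU", 2), ("AAGGAGGACC", 2), ("UCAGGAGGAU", 3)]
    print(crosslink_offsets(toy_sites, "GGAGGA"))  # Counter({0: 3})
    print(crosslink_windows(toy_sites, 1))  # ['UGG', 'AGG', 'AGG']
```

A sharply peaked offset histogram, as in the toy input, is the kind of positional signal that a model jointly fitting the motif and the crosslink position can exploit; a real analysis would presumably also need background models and statistical testing, which this sketch omits.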

Keywords: CLIP; RNA-binding protein; SRSF1; allelic interaction; alternative splicing; hnRNP proteins; mCross; motif discovery; phase separation; protein-RNA crosslink sites.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Binding Sites
  • Cross-Linking Reagents / chemistry
  • Gene Expression
  • HeLa Cells
  • Hep G2 Cells
  • Heterogeneous Nuclear Ribonucleoprotein D0
  • Heterogeneous-Nuclear Ribonucleoprotein D / chemistry*
  • Heterogeneous-Nuclear Ribonucleoprotein D / genetics
  • Heterogeneous-Nuclear Ribonucleoprotein D / metabolism
  • Humans
  • K562 Cells
  • Models, Molecular*
  • Nucleic Acid Conformation
  • Protein Binding
  • Protein Conformation, alpha-Helical
  • Protein Conformation, beta-Strand
  • Protein Interaction Domains and Motifs
  • RNA / chemistry*
  • RNA / genetics
  • RNA / metabolism
  • Sequence Alignment
  • Sequence Homology, Nucleic Acid
  • Serine-Arginine Splicing Factors / chemistry*
  • Serine-Arginine Splicing Factors / genetics
  • Serine-Arginine Splicing Factors / metabolism

Substances

  • Cross-Linking Reagents
  • HNRNPD protein, human
  • Heterogeneous Nuclear Ribonucleoprotein D0
  • Heterogeneous-Nuclear Ribonucleoprotein D
  • SRSF1 protein, human
  • Serine-Arginine Splicing Factors
  • RNA