Towards In-Silico CLIP-seq: Predicting Protein-RNA Interaction via Sequence-to-Signal Learning

Marc Horlacher; Nils Wagner; Lambert Moyon; Klara Kuret; Nicolas Goedert; Marco Salvatore; Jernej Ule; Julien Gagneur; Ole Winther; Annalisa Marsico

doi:10.1101/2022.09.16.508290

Abstract

Unraveling the sequence determinants that drive protein-RNA interaction is crucial for studying binding mechanisms and the impact of genomic variants. While CLIP-seq allows transcriptome-wide profiling of in vivo protein-RNA interactions, it is limited to expressed transcripts, so missing binding information must be imputed computationally. Existing classification-based methods predict binding at low resolution and depend on prior labeling of transcriptome regions for training. We present RBPNet, a novel deep learning method that predicts the CLIP crosslink count distribution from RNA sequence at single-nucleotide resolution. By training on up to a million regions, RBPNet generalizes well across eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. CLIP-seq is also subject to various technical biases, which complicate downstream interpretation. RBPNet performs bias correction by modeling the raw signal as a mixture of the protein-specific signal and the background signal. Through model interrogation via Integrated Gradients, RBPNet identifies predictive sub-sequences corresponding to known binding motifs and enables variant-impact scoring via in silico mutagenesis. Taken together, RBPNet improves the inference of protein-RNA interaction as well as the mechanistic interpretation of predictions.
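The variant-impact scoring mentioned above can be illustrated with a minimal sketch of in silico mutagenesis. This is not the RBPNet implementation: `toy_model`, the motif `UGCAUG`, and the example sequence are hypothetical stand-ins chosen only to make the example runnable. A trained sequence-to-signal model would take the place of `toy_model`.

```python
import numpy as np

ALPHABET = "ACGU"

def toy_model(seq, motif="UGCAUG"):
    """Hypothetical stand-in for a trained model (assumption, not RBPNet).

    Returns a scalar score: the best number of matching positions between
    the motif and any window of the sequence.
    """
    width = len(motif)
    scores = [
        sum(a == b for a, b in zip(seq[i:i + width], motif))
        for i in range(len(seq) - width + 1)
    ]
    return float(max(scores))

def in_silico_mutagenesis(seq, model):
    """Score every single-nucleotide substitution of `seq`.

    Returns a (len(seq), 4) matrix whose entry [i, j] is the change in the
    model score when position i is replaced by ALPHABET[j]. Entries for the
    reference base stay at 0.
    """
    ref_score = model(seq)
    impact = np.zeros((len(seq), len(ALPHABET)))
    for i, ref_base in enumerate(seq):
        for j, alt_base in enumerate(ALPHABET):
            if alt_base == ref_base:
                continue
            mutated = seq[:i] + alt_base + seq[i + 1:]
            impact[i, j] = model(mutated) - ref_score
    return impact

# Example: the motif occupies positions 3 to 8 of this made-up sequence.
example_seq = "AAAUGCAUGAAA"
impact = in_silico_mutagenesis(example_seq, toy_model)
# Most disruptive substitution per position (most negative score change).
worst_per_position = impact.min(axis=1)
```

In this toy setting, substitutions inside the motif lower the score while substitutions in the flanks leave it unchanged, which is the qualitative pattern one would look for when scoring variants with a real model.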

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

wagnern{at}in.tum.de, lambert.moyon{at}helmholtz-muenchen.de, klara.kuret{at}crick.ac.uk, nicolas.goedert{at}helmholtz-muenchen.de, marco.salvatore{at}bio.ku.dk, jernej.ule{at}crick.ac.uk, gagneur{at}in.tum.de
Revised abstract and title page layout.