RT Journal Article
SR Electronic
T1 Towards In-Silico CLIP-seq: Predicting Protein-RNA Interaction via Sequence-to-Signal Learning
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.09.16.508290
DO 10.1101/2022.09.16.508290
A1 Marc Horlacher
A1 Nils Wagner
A1 Lambert Moyon
A1 Klara Kuret
A1 Nicolas Goedert
A1 Marco Salvatore
A1 Jernej Ule
A1 Julien Gagneur
A1 Ole Winther
A1 Annalisa Marsico
YR 2022
UL http://biorxiv.org/content/early/2022/09/28/2022.09.16.508290.abstract
AB Unraveling sequence determinants which drive protein-RNA interaction is crucial for studying binding mechanisms and the impact of genomic variants. While CLIP-seq allows for transcriptome-wide profiling of in vivo protein-RNA interactions, it is limited to expressed transcripts, requiring computational imputation of missing binding information. Existing classification-based methods predict binding with low resolution and depend on prior labeling of transcriptome regions for training. We present RBPNet, a novel deep learning method, which predicts CLIP crosslink count distribution from RNA sequence at single-nucleotide resolution. By training on up to a million regions, RBPNet achieves high generalization on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. CLIP-seq suffers from various technical biases, complicating downstream interpretation. RBPNet performs bias correction by modeling the raw signal as a mixture of the protein-specific and background signal. Through model interrogation via Integrated Gradients, RBPNet identifies predictive sub-sequences corresponding to known binding motifs and enables variant-impact scoring via in silico mutagenesis. Together, RBPNet improves inference of protein-RNA interaction, as well as mechanistic interpretation of predictions. Competing Interest Statement: The authors have declared no competing interest.
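
The abstract mentions variant-impact scoring via in silico mutagenesis. As a rough illustration of that general idea only, the sketch below substitutes every base of an RNA sequence in turn and records how the summed predicted signal changes. It is not the RBPNet implementation: `toy_predict`, its `UGCAUG` motif, and the choice of summed signal as the score are all hypothetical stand-ins for a trained sequence-to-signal model and its scoring scheme.

```python
from typing import Callable, Dict, List, Tuple

BASES = "ACGU"


def toy_predict(seq: str, motif: str = "UGCAUG") -> List[float]:
    """Hypothetical stand-in for a trained model (assumption, not RBPNet).

    Returns a per-nucleotide signal: 1.0 at every position covered by an
    exact occurrence of `motif`, 0.0 elsewhere.
    """
    signal = [0.0] * len(seq)
    for start in range(len(seq) - len(motif) + 1):
        if seq[start:start + len(motif)] == motif:
            for pos in range(start, start + len(motif)):
                signal[pos] = 1.0
    return signal


def in_silico_mutagenesis(
    seq: str,
    predict: Callable[[str], List[float]] = toy_predict,
) -> Dict[Tuple[int, str], float]:
    """Score every single-nucleotide substitution of `seq`.

    The score of a variant (position, alternative base) is the summed
    predicted signal of the mutant minus that of the reference, so
    negative values mean the variant reduces the predicted signal.
    """
    ref_total = sum(predict(seq))
    scores: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            scores[(pos, alt)] = sum(predict(mutant)) - ref_total
    return scores


if __name__ == "__main__":
    scores = in_silico_mutagenesis("AAUGCAUGAA")
    # Print the variants with the largest loss of predicted signal first.
    for (pos, alt), delta in sorted(scores.items(), key=lambda kv: kv[1])[:5]:
        print(pos, alt, delta)
```

With a real model, `predict` would be replaced by the model's forward pass, and the summed-signal difference could be swapped for whatever distance between reference and mutant profiles suits the analysis.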