Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

A sequence-based, deep learning model accurately predicts RNA splicing branchpoints

Joseph M. Paggi, Gill Bejerano
doi: https://doi.org/10.1101/185868
Joseph M. Paggi
1Department of Computer Science, Stanford University, Stanford, California, USA.
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Gill Bejerano
1Department of Computer Science, Stanford University, Stanford, California, USA.
2Department of Developmental Biology, Stanford University, Stanford, California, USA.
3Department of Pediatrics, Stanford University, Stanford, California, USA.
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Preview PDF
Loading

Abstract

Experimental detection of RNA splicing branchpoints, the nucleotide serving as the nucleophile in the first catalytic step of splicing, is difficult. To date, annotations exist for only 16-21% of 3’ splice sites in the human genome and even these limited annotations have been shown to be plagued by noise. We develop a sequence-only, deep learning based branchpoint predictor, LaBranchoR, which we conclude predicts a correct branchpoint for over 90% of 3’ splice sites genome-wide. Our predicted branchpoints show large agreement with trends observed in the raw data, but analysis of conservation signatures and overlap with pathogenic variants reveal that our predicted branchpoints are generally more reliable than the raw data itself. We use our predicted branchpoints to identify a sequence element upstream of branchpoints consistent with extended U2 snRNA base pairing, show an association between weak branchpoints and alternative splicing, and explore the effects of variants on branchpoints.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Back to top
PreviousNext
Posted September 07, 2017.
Download PDF
Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
A sequence-based, deep learning model accurately predicts RNA splicing branchpoints
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
A sequence-based, deep learning model accurately predicts RNA splicing branchpoints
Joseph M. Paggi, Gill Bejerano
bioRxiv 185868; doi: https://doi.org/10.1101/185868
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
A sequence-based, deep learning model accurately predicts RNA splicing branchpoints
Joseph M. Paggi, Gill Bejerano
bioRxiv 185868; doi: https://doi.org/10.1101/185868

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Bioinformatics
Subject Areas
All Articles
  • Animal Behavior and Cognition (4095)
  • Biochemistry (8786)
  • Bioengineering (6493)
  • Bioinformatics (23386)
  • Biophysics (11766)
  • Cancer Biology (9167)
  • Cell Biology (13290)
  • Clinical Trials (138)
  • Developmental Biology (7422)
  • Ecology (11386)
  • Epidemiology (2066)
  • Evolutionary Biology (15119)
  • Genetics (10413)
  • Genomics (14024)
  • Immunology (9145)
  • Microbiology (22108)
  • Molecular Biology (8793)
  • Neuroscience (47445)
  • Paleontology (350)
  • Pathology (1423)
  • Pharmacology and Toxicology (2483)
  • Physiology (3711)
  • Plant Biology (8063)
  • Scientific Communication and Education (1433)
  • Synthetic Biology (2215)
  • Systems Biology (6021)
  • Zoology (1251)