bioRxiv
New Results

DeepBound: Accurate Identification of Transcript Boundaries via Deep Convolutional Neural Fields

Mingfu Shao, Jianzhu Ma, Sheng Wang
doi: https://doi.org/10.1101/125229
Mingfu Shao
Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Jianzhu Ma
School of Medicine, University of California San Diego, La Jolla, CA 92093
Sheng Wang
Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
Abstract

Motivation Reconstructing full-length expressed transcripts from the short reads produced by RNA-seq (the transcript assembly problem) plays a central role in identifying novel genes and transcripts and in studying gene expression and function. A crucial step in transcript assembly is to accurately determine the splice junctions and the boundaries of the expressed transcripts from the read alignment. Whereas splice junctions can be detected efficiently from spliced reads, identifying transcript boundaries remains an open and challenging problem, because the signals associated with boundaries are weak and noisy.

Results We present DeepBound, an effective approach for identifying the boundaries of expressed transcripts from RNA-seq read alignments. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To accurately model the transition probabilities and to address label imbalance, we incorporate the AUC (area under the ROC curve) score into the objective function being optimized. Because deep probabilistic graphical models require a large number of labeled training samples, we propose training the model on simulated RNA-seq datasets. Through extensive experiments on simulated datasets from two species and on biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods.
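Why AUC helps with label imbalance can be illustrated with a small example. Boundary positions are rare, so per-position accuracy can be high for a model that never predicts a boundary, whereas AUC measures how well boundary positions are ranked above non-boundary ones. The Python sketch that follows is illustrative only. It assumes binary per-position labels (1 = boundary) and real-valued scores in [0, 1]. The function names `exact_auc`, `smooth_auc` and `accuracy`, and the sigmoid surrogate with temperature `tau`, are hypothetical choices, not necessarily the formulation DeepBound uses.

```python
import math
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of boundary (label 1) and non-boundary (label 0) positions."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative position")
    return pos, neg


def exact_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Wilcoxon-Mann-Whitney form of ROC AUC: the fraction of
    (boundary, non-boundary) pairs ranked correctly, ties counted as 1/2."""
    pos, neg = _split(scores, labels)
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def smooth_auc(scores: Sequence[float], labels: Sequence[int], tau: float = 0.1) -> float:
    """Hypothetical differentiable surrogate: the step 1[p > n] is replaced
    by sigmoid((p - n) / tau). A trainer would minimise 1 - smooth_auc(...).
    Scores are assumed to lie in [0, 1] so math.exp cannot overflow for tau = 0.1."""
    pos, neg = _split(scores, labels)
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 / (1.0 + math.exp(-(p - n) / tau))
    return total / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of positions whose thresholded prediction matches the label."""
    hits = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1))
    return hits / len(labels)


if __name__ == "__main__":
    # Toy track of 20 positions with 2 boundaries (heavy label imbalance).
    labels = [0] * 20
    labels[8] = labels[18] = 1
    flat = [0.0] * 20                              # never predicts a boundary
    ranked = [0.4 if y else 0.1 for y in labels]   # ranks boundaries highest
    for name, s in (("flat", flat), ("ranked", ranked)):
        print(name, accuracy(s, labels), exact_auc(s, labels), round(smooth_auc(s, labels), 3))
```

On this toy track both predictors reach 90% accuracy at threshold 0.5, yet only the second ranks the boundaries correctly (exact AUC 1.0 versus 0.5; smooth AUC about 0.953 versus 0.5). An AUC-based objective is meant to reward that ranking. The pairwise loops cost O(P·N) for P boundary and N non-boundary positions, which is acceptable for a sketch but would usually be vectorised or subsampled in practice.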

Availability DeepBound is freely available at https://github.com/realbigws/DeepBound.

Contact mingfu.shao{at}cs.cmu.edu, realbigws{at}gmail.com

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted April 07, 2017.
Subject Area

  • Bioinformatics