New Results

Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing

Ruibang Luo, Fritz J. Sedlazeck, Tak-Wah Lam, Michael C. Schatz
doi: https://doi.org/10.1101/310458
Ruibang Luo
1Department of Computer Science, The University of Hong Kong, Hong Kong
2Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  • For correspondence: rbluo@cs.hku.hk
Fritz J. Sedlazeck
3Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Tak-Wah Lam
1Department of Computer Science, The University of Hong Kong, Hong Kong
Michael C. Schatz
2Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA

Abstract

The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5%-15%. To meet this challenge, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieved 99.73%, 97.68% and 95.36% precision on known variants, and 98.65%, 92.57% and 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows that Clairvoyante is sample agnostic and finds variants in less than two hours on a standard server. Furthermore, we identified 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available as open source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
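
The abstract reports precision and F1-score. As a reminder of how these metrics relate, the following is a minimal sketch using the standard definitions. The function name and the counts are hypothetical and are not taken from the paper or from the Clairvoyante code base.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from counts of true positive,
    false positive and false negative variant calls."""
    # Precision: fraction of called variants that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true variants that were called.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a caller with high precision can still have a noticeably lower F1-score if recall is weak.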

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted September 26, 2018.
Subject Area

  • Bioinformatics