New Results

Toward perfect reads: short reads correction via mapping on compacted de Bruijn graphs

Antoine Limasset, Jean-Francois Flot, Pierre Peterlongo
doi: https://doi.org/10.1101/558395
Antoine Limasset
CNRS;
  • For correspondence: antoine.limasset@univ-lille.fr
Jean-Francois Flot
Universite Libre de Bruxelles;
Pierre Peterlongo
INRIA

Abstract

Motivations: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well to large data sets or treat reads as mere sequences of k-mers, ignoring the information carried by the full-length read.

Results: We propose a new method to correct short reads using de Bruijn graphs, and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered first by k-mer abundance and then by unitig abundance, which removes most sequencing errors. The cleaned graph is then used as a reference onto which the reads are mapped in order to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while scaling to human-size genomic datasets and beyond.

Availability and Implementation: The implementation is open source and available at http://github.com/Malfoy/BCOOL under the Affero GPL license and as a Bioconda package.

Contact: Antoine Limasset antoine.limasset@gmail.com & Jean-Francois Flot jflot@ulb.ac.be & Pierre Peterlongo pierre.peterlongo@inria.fr
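
To make the pipeline described in the Results part of the abstract more concrete, here is a minimal Python sketch of its steps: count k-mers, discard low-abundance k-mers, compact the remaining graph into unitigs, discard low-abundance unitigs, and map reads onto what is left. This is a toy illustration, not Bcool's implementation. All function names and thresholds are hypothetical. Reverse complements are ignored. Unitig abundance is assumed here to be the mean count of the unitig's k-mers. The mapping step is simplified to a Hamming-distance search within a single unitig, whereas a real mapper would follow paths spanning several unitigs.

```python
from collections import Counter

BASES = "ACGT"

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def successors(kmer, kmers):
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]

def predecessors(kmer, kmers):
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmers]

def compact(kmers):
    """Merge non-branching paths of the de Bruijn graph into unitigs."""
    visited = set()

    def is_start(kmer):
        # A unitig starts where the path cannot be extended backwards unambiguously.
        preds = predecessors(kmer, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    def extend(kmer):
        path = kmer
        visited.add(kmer)
        while True:
            succs = successors(kmer, kmers)
            if len(succs) != 1:
                break
            nxt = succs[0]
            if nxt in visited or len(predecessors(nxt, kmers)) != 1:
                break
            path += nxt[-1]
            visited.add(nxt)
            kmer = nxt
        return path

    unitigs = [extend(km) for km in sorted(kmers) if is_start(km)]
    # Remaining k-mers belong to isolated cycles with no natural start.
    unitigs += [extend(km) for km in sorted(kmers) if km not in visited]
    return unitigs

def unitig_abundance(unitig, counts, k):
    """Assumption: abundance of a unitig = mean count of its k-mers."""
    kmers = [unitig[i:i + k] for i in range(len(unitig) - k + 1)]
    return sum(counts[km] for km in kmers) / len(kmers)

def build_reference(reads, k, min_kmer, min_unitig):
    """Filter by k-mer abundance, compact, then filter by unitig abundance."""
    counts = count_kmers(reads, k)
    solid = {km for km, c in counts.items() if c >= min_kmer}
    return [u for u in compact(solid)
            if unitig_abundance(u, counts, k) >= min_unitig]

def correct_read(read, unitigs, max_mismatches=2):
    """Replace the read by its closest window in a unitig, if close enough."""
    best = None
    for u in unitigs:
        for i in range(len(u) - len(read) + 1):
            window = u[i:i + len(read)]
            dist = sum(a != b for a, b in zip(read, window))
            if dist <= max_mismatches and (best is None or dist < best[0]):
                best = (dist, window)
    return best[1] if best else read

# Toy data: three error-free reads plus one read with a single substitution.
genome = "ATGCGTACCTGA"
noisy_read = "ATGCGTTCCTGA"
reads = [genome] * 3 + [noisy_read]
unitigs = build_reference(reads, k=4, min_kmer=2, min_unitig=2)
corrected = correct_read(noisy_read, unitigs)
```

In this toy data set the k-mers created by the substitution occur only once, so the k-mer abundance filter removes them before compaction, and the noisy read is corrected against the remaining unitig. A read that matches no unitig within `max_mismatches` is returned unchanged.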

Footnotes

  • https://github.com/Malfoy/BCOOL

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 28, 2019.
Subject Area

  • Bioinformatics