New Results

ABySS 2.0: Resource-Efficient Assembly of Large Genomes using a Bloom Filter

Shaun D Jackman, Benjamin P Vandervalk, Hamid Mohamadi, Justin Chu, Sarah Yeo, S Austin Hammond, Golnaz Jahesh, Hamza Khan, Lauren Coombe, Rene L Warren, Inanc Birol
doi: https://doi.org/10.1101/068338

Abstract

The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps towards elucidating and characterizing whole genomes. Downstream applications, including the analysis of genomic variation between species and between or within individuals, critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically. Coupled with established and planned large-scale personalized medicine initiatives to sequence genomes in the thousands and even millions, this makes the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes timely.

With ABySS 1.0, we originally showed that assembling the human genome using short 50 bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its re-design, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements.
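To illustrate the idea of representing a de Bruijn graph with a Bloom filter, here is a minimal Python sketch. It is not the ABySS 2.0 implementation: the class and function names, the use of salted BLAKE2b hashes, and the filter size are all assumptions chosen for illustration, and the sketch ignores reverse complements and sequencing-error handling. It only shows that k-mers can be stored as bits and that graph edges can be recovered implicitly by querying the four possible one-base extensions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array probed by several hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(item.encode(), salt=salt).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every substring of length k from seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_graph(reads, k, num_bits=1 << 20, num_hashes=3):
    """Insert every k-mer of every read into a Bloom filter (hypothetical helper)."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def successors(bf, kmer):
    """Edges are implicit: test the four possible one-base extensions."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]


bf = build_graph(["ACGTACGGA"], k=4)
branch = successors(bf, "TACG")  # contains at least "ACGG" and "ACGT"
```

The memory footprint is fixed by `num_bits` rather than by the number of distinct k-mers, which is the property that makes this kind of structure attractive for large genomes. The cost is a small false-positive rate, so a practical assembler needs some way to discount spurious branches.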

We present assembly benchmarks of human Genome in a Bottle 250 bp Illumina paired-end and 6 kbp mate-pair libraries from a single individual, yielding an NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using less than 35 GB of RAM, a modest memory requirement by today’s standards that is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics’ Chromium data to further improve the scaffold contiguity of this assembly to 42 (15) Mbp.
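For readers unfamiliar with the contiguity metric quoted above, the following Python sketch computes NG50 under its usual definition: the length L such that sequences of length L or longer together cover at least half of the genome size. The function name and the toy lengths are assumptions for illustration, not values from the paper. NGA50 is commonly computed the same way, but over aligned blocks broken at misassemblies rather than over whole scaffolds.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of sequence lengths.

    Walks the lengths from longest to shortest and returns the first
    length at which the running total reaches half of genome_size.
    Returns None if the sequences cover less than half the genome.
    """
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= genome_size:
            return length
    return None


# Toy example: five scaffolds against a genome of size 200.
toy_ng50 = ng50([50, 40, 30, 20, 10], genome_size=200)  # 30
```

Passing the total assembly length as `genome_size` gives the ordinary N50 instead, which is why NG50 is the fairer metric when comparing assemblies of different total size.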

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted August 07, 2016.

Subject Area

  • Bioinformatics