bioRxiv
New Results

Pan-genomic Matching Statistics for Targeted Nanopore Sequencing

Omar Ahmed, Massimiliano Rossi, Sam Kovaka, Michael C. Schatz, Travis Gagie, Christina Boucher, Ben Langmead
doi: https://doi.org/10.1101/2021.03.23.436610
Omar Ahmed (Department of Computer Science, Johns Hopkins University)
Massimiliano Rossi (Department of Computer and Information Science and Engineering, University of Florida)
Sam Kovaka (Department of Computer Science, Johns Hopkins University)
Michael C. Schatz (Department of Computer Science, Johns Hopkins University)
Travis Gagie (Faculty of Computer Science, Dalhousie University)
Christina Boucher (Department of Computer and Information Science and Engineering, University of Florida)
Ben Langmead (Department of Computer Science, Johns Hopkins University)
For correspondence: oahmed6@jhu.edu, langmea@cs.jhu.edu

Abstract

Nanopore sequencing is an increasingly powerful tool for genomics. Recent computational advances allow nanopores to sequence in a targeted fashion: as the sequencer emits data, software can analyze it in real time and signal the sequencer to eject “non-target” DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing with the help of efficient pangenome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics (half-maximal exact matches) in a streaming fashion. When used to target a specific strain in a mock community, SPUMONI achieves accuracy similar to that of minimap2 when both are run against an index containing many strains per species. However, SPUMONI is 12 times faster than minimap2, and its index and peak memory footprint are 15 and 4 times smaller, respectively, than minimap2’s. These improvements become more pronounced with larger reference databases, since SPUMONI’s index size scales sublinearly with the number of reference genomes included. This could enable accurate targeted sequencing even when the targeted strains have not necessarily been sequenced or assembled previously. SPUMONI is open source software available from https://github.com/oma219/spumoni.
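The matching statistics mentioned above have a simple definition: for each position i of a read P, MS[i] is the length of the longest prefix of P[i..] that occurs somewhere in the reference text T. The following Python sketch illustrates that definition only. The function name `matching_statistics` is hypothetical and not part of SPUMONI, and the brute-force scan is a deliberately naive stand-in: SPUMONI computes these values over a compressed pangenome index in a streaming fashion, which this sketch does not attempt to model.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Naive matching statistics (illustrative only, not SPUMONI's algorithm).

    ms[i] is the length of the longest prefix of pattern[i:] that occurs
    as a substring of text. Runs in roughly O(m^2 * n) time.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    # Toy "reference" and "read"; real inputs would be genomes and nanopore reads.
    print(matching_statistics("ACGGTT", "ACGTACGGA"))  # [4, 3, 2, 2, 1, 1]
```

In this toy example the read prefix ACGG occurs in the text, so MS[0] = 4, and each later value can drop by at most one from its predecessor (MS[i+1] ≥ MS[i] − 1). Intuitively, reads drawn from the indexed references tend to produce long matching statistics and reads from elsewhere short ones; how SPUMONI turns these values into an eject decision is described in the full text, not in this abstract.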

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://benlangmead.github.io/aws-indexes/spumoni

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted March 23, 2021.

Subject Area

  • Bioinformatics