New Results

Asymptotically optimal minimizers schemes

Guillaume Marçais, Dan DeBlasio, Carl Kingsford
doi: https://doi.org/10.1101/256156
Guillaume Marçais
Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
Dan DeBlasio
Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
Carl Kingsford
Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA

Abstract

Motivation: The minimizers technique is a method to sample k-mers that is used in many bioinformatics software tools to reduce computation, memory usage and run time. The number of applications using minimizers keeps growing. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density depends strongly on the choice of the order on the k-mers. Different applications use different orders, but none of these orders is optimal. A better understanding of minimizers schemes, and of the related local and forward schemes, will allow the design of schemes with lower density, thereby making existing and future bioinformatics tools even more efficient.

Results: From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. The resulting orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three types of schemes.
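
The notions of order and density used in the abstract can be made concrete with a minimal sketch. It is an illustration only and is not taken from the preprint. It assumes the usual definition of a minimizers scheme with parameters k and w: in every window of w consecutive k-mers, the smallest k-mer under a chosen order is selected. The function names `minimizer_positions` and `density`, the `key` parameter used to express the order, and the leftmost tie-breaking rule are all hypothetical choices made for this example. The density here is measured on one particular sequence, which is a simplification of the expected or asymptotic density the abstract refers to.

```python
from typing import Any, Callable, Set


def minimizer_positions(
    seq: str,
    k: int,
    w: int,
    key: Callable[[str], Any] = lambda kmer: kmer,
) -> Set[int]:
    """Return the start positions of the k-mers selected as minimizers.

    Assumption: a window is w consecutive k-mers, and the selected k-mer
    is the smallest one under `key` (lexicographic order by default).
    """
    n_kmers = len(seq) - k + 1
    selected: Set[int] = set()
    # Slide over every window of w consecutive k-mer start positions.
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # min() returns the first minimal element, so ties go to the
        # leftmost k-mer in the window (an assumption of this sketch).
        best = min(window, key=lambda i: key(seq[i:i + k]))
        selected.add(best)
    return selected


def density(
    seq: str,
    k: int,
    w: int,
    key: Callable[[str], Any] = lambda kmer: kmer,
) -> float:
    """Fraction of the k-mers of `seq` that are selected as minimizers."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        raise ValueError("sequence is shorter than k")
    return len(minimizer_positions(seq, k, w, key)) / n_kmers


if __name__ == "__main__":
    seq = "ACGTACGT"
    # Lexicographic order on 2-mers, windows of 3 consecutive 2-mers.
    print(sorted(minimizer_positions(seq, k=2, w=3)))
    print(density(seq, k=2, w=3))
```

Passing a different `key` function changes the order on the k-mers, and therefore which positions are selected and how many. This is the dependence of density on the order that the abstract describes.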

Contact: gmarcais@cs.cmu.edu, ckingsf@cs.cmu.edu

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted January 30, 2018.
Subject Area

  • Bioinformatics