bioRxiv
New Results

Theory of local k-mer selection with applications to long-read alignment

Jim Shaw, Yun William Yu
doi: https://doi.org/10.1101/2021.05.22.445262
Jim Shaw
1 Department of Mathematics, University of Toronto, Toronto, M5S 2E4, Canada
  • For correspondence: jshaw@math.toronto.edu
Yun William Yu
1 Department of Mathematics, University of Toronto, Toronto, M5S 2E4, Canada
2 Computer and Mathematical Sciences, University of Toronto at Scarborough, Scarborough, M1C 1A4, Canada

Abstract

Motivation Selecting a subset of k-mers in a string in a local manner is a common way for bioinformatics tools to speed up computation. Arguably the best-known and most common method is the minimizer technique, which selects the ‘lowest-ordered’ k-mer in a sliding window. Recently, it has been shown that minimizers are a sub-optimal method for selecting subsets of k-mers when mutations are present. There is, however, a lack of theoretical understanding of why certain methods perform well.
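
As an illustration of the minimizer technique described above, here is a minimal Python sketch. It is not the authors' code or minimap2's implementation. The function name `minimizers`, the window size parameter `w` (counted in k-mers), the use of plain lexicographic order as the k-mer ordering, and the leftmost tie-breaking rule are all assumptions made for this example; practical tools typically order k-mers by a hash value instead.

```python
def minimizers(seq, k, w):
    """Return sorted start positions of k-mers selected as window minimizers.

    Assumptions for this sketch: k-mers are ordered lexicographically,
    a window spans w consecutive k-mers, and ties go to the leftmost k-mer.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window over every run of w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Pick the lowest-ordered k-mer; the position breaks ties.
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add(best)
    return sorted(selected)


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))  # prints [0, 1, 2, 4]
```

The selection is local in the sense that whether a k-mer is chosen depends only on the k-mers inside the windows that contain it, not on the rest of the string.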

Results We first theoretically investigate the conservation metric for k-mer selection methods. We derive an exact expression for calculating the conservation of a k-mer selection method. This turns out to be tractable enough for us to prove closed-form expressions for a variety of methods, including (open and closed) syncmers, (α, b, n)-words, and an upper bound for minimizers. As a demonstration of our results, we modified the minimap2 read aligner to use a better-performing k-mer selection method and observed up to an 8.2% relative increase in the number of mapped reads.

Availability and supplementary information Simulations and supplementary methods are available at https://github.com/bluenote-1577/local-kmer-selection-results. os-minimap2 is a modified version of minimap2 and is available at https://github.com/bluenote-1577/os-minimap2.

Contact jshaw@math.toronto.edu

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/bluenote-1577/local-kmer-selection-results

  • https://github.com/bluenote-1577/os-minimap2

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted May 23, 2021.