New Results

Strobemers: an alternative to k-mers for sequence comparison

Kristoffer Sahlin
doi: https://doi.org/10.1101/2021.01.28.428549
Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden
  • For correspondence: ksahlin@math.su.se

Abstract

K-mer-based methods are widely used in bioinformatics for various types of sequence comparison. However, a single mutation alters k consecutive k-mers, which makes most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overcome this sensitivity, e.g., spaced k-mers and k-mer permutation techniques, but these techniques do not handle indels well. For indels, pairs or groups of small k-mers are commonly used, but these methods first produce k-mer matches and only in a second step pair or group the k-mers. Such techniques produce many redundant k-mer matches because k is small.
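
The claim that one mutation affects k consecutive k-mers can be checked with a small example. The following is a minimal sketch, not code from the paper; the function names, the toy sequence, and the choice of k = 5 are illustrative assumptions.

```python
def kmers(seq, k):
    """Return all k-mers of seq in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def changed_kmer_positions(a, b, k):
    """Start positions at which the k-mers of two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(kmers(a, k), kmers(b, k))) if x != y]


# Toy sequence (hypothetical); substitute one base in the interior.
original = "ACGTTGCATGCCGATAGCTA"
pos = 10
new_base = "T" if original[pos] != "T" else "A"
mutated = original[:pos] + new_base + original[pos + 1:]

k = 5
changed = changed_kmer_positions(original, mutated, k)
# Every k-mer overlapping the substituted position is altered: k of them,
# at consecutive start positions pos - k + 1 .. pos.
print(changed)
```

For a substitution far enough from the sequence ends, the changed start positions form one run of length k, so a region with several nearby mutations can lose all of its k-mer matches.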

Here, we propose strobemers as an alternative to k-mers for sequence comparison. Intuitively, strobemers consist of linked minimizers. We use simulated data to show that strobemers provide more evenly distributed sequence matches and are less sensitive to different mutation rates than k-mers and spaced k-mers. Strobemers also produce a higher match coverage across sequences. We further implement a proof-of-concept sequence matching tool StrobeMap, and use synthetic and biological Oxford Nanopore sequencing data to show the utility of using strobemers for sequence comparison in different contexts such as sequence clustering and alignment scenarios. A reference implementation of our tool StrobeMap together with code for analyses is available at https://github.com/ksahlin/strobemers.
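
The abstract gives only the intuition that strobemers consist of linked minimizers. The sketch below illustrates that intuition under stated assumptions and is not the construction defined in the paper or implemented in StrobeMap: it links the k-mer at each position to the k-mer with the smallest hash value in a downstream window. The function names, the parameters `w_min` and `w_max`, and the use of BLAKE2b as the hash are all hypothetical choices made for illustration.

```python
import hashlib


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (illustrative choice of hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def linked_minimizer_seeds(seq, k, w_min, w_max):
    """Return (pos1, pos2, seed) tuples.

    The first part of each seed is the k-mer at pos1. The second part is the
    k-mer with the smallest hash among start positions in
    [pos1 + w_min, pos1 + w_max], clipped to the end of the sequence.
    Assumes w_min >= k so that the two parts do not overlap.
    """
    n_kmers = len(seq) - k + 1
    seeds = []
    for i in range(n_kmers):
        lo = i + w_min
        hi = min(i + w_max, n_kmers - 1)
        if lo > hi:
            break  # no room left for a second k-mer
        j = min(range(lo, hi + 1), key=lambda p: kmer_hash(seq[p:p + k]))
        seeds.append((i, j, seq[i:i + k] + seq[j:j + k]))
    return seeds


seq = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGATCGGA"
seeds = linked_minimizer_seeds(seq, k=4, w_min=4, w_max=10)
for pos1, pos2, seed in seeds[:5]:
    print(pos1, pos2, seed)
```

Because the gap between the two linked k-mers is chosen by hash value rather than fixed, an indel between them does not necessarily destroy the seed, which is the property that fixed-shape spaced k-mers lack.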

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • More analyses on biological data
  • More informative analysis metrics
  • Added spaced k-mers to comparison
  • Added analysis of proof-of-concept tool StrobeMap
  • Implemented and evaluated new strobemer class hybridstrobes
  • Modifications to definitions/construction of strobemers
  • Added runtime and memory comparison
  • Added thinning analysis

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted April 09, 2021.
Subject Area

  • Bioinformatics