Abstract
K-mer-based methods are widely used in bioinformatics for various types of sequence comparison. However, a single mutation alters k consecutive k-mers, which makes most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overcome this sensitivity, e.g., spaced k-mers and k-mer permutation techniques, but these techniques do not handle indels well. For indels, pairs or groups of small k-mers are commonly used, but these methods first produce k-mer matches and only in a second step pair or group the matched k-mers. Because k is small, such techniques produce many redundant k-mer matches.
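To make the sensitivity concrete, the following minimal sketch counts how many k-mers differ after a single substitution. The sequence, the value of k, and the helper names (`kmers`, `changed_kmer_positions`) are illustrative assumptions, not taken from the paper.

```python
def kmers(seq, k):
    """Return all k-mers of seq, ordered by start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def changed_kmer_positions(a, b, k):
    """Start positions at which the k-mers of two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(kmers(a, k), kmers(b, k))) if x != y]


# Hypothetical example data: a 20-nt sequence with one substitution at index 10.
k = 5
original = "ACGTACGGTCAATGCCTAGA"
mutated = original[:10] + "T" + original[11:]  # A -> T at index 10

changed = changed_kmer_positions(original, mutated, k)
print(changed)  # start positions of every k-mer that overlaps index 10
```

Away from the sequence ends, every k-mer that overlaps the mutated position changes, so one substitution removes k consecutive k-mer matches.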
Here, we propose strobemers as an alternative to k-mers for sequence comparison. Intuitively, strobemers consist of linked minimizers. We show that under a certain minimizer selection technique, strobemers provide more evenly distributed sequence matches than k-mers and are less sensitive to different mutation rates and distributions. Strobemers also give a higher total coverage of matches across sequences. Strobemers are a useful alternative to k-mers for performing sequence comparisons as commonly used in read alignment, clustering, classification, and error correction.
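The idea of linking minimizers can be illustrated with a simplified sketch. It is one possible reading of the intuition above, not the selection technique evaluated in the paper: the first strobe is the substring of length `l` at position `i`, and the second strobe is the substring of length `l` with the smallest hash in a downstream window. The function names, the parameters `l`, `w_min`, and `w_max`, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib


def strobe_hash(s):
    """Deterministic stand-in hash (assumption: any well-mixing hash would do)."""
    return zlib.crc32(s.encode())


def linked_minimizers(seq, l, w_min, w_max):
    """Return (i, j, seed) tuples for a two-strobe seed at each start position.

    The first strobe is seq[i:i+l]. The second strobe is the minimizer, i.e.
    the l-mer with the smallest hash, among start positions i+w_min..i+w_max.
    """
    seeds = []
    last_start = len(seq) - l
    for i in range(last_start + 1):
        lo = i + w_min
        hi = min(i + w_max, last_start)  # truncate the window at the sequence end
        if lo > hi:
            break  # no room left for a second strobe
        j = min(range(lo, hi + 1), key=lambda p: strobe_hash(seq[p:p + l]))
        seeds.append((i, j, seq[i:i + l] + seq[j:j + l]))
    return seeds


# Hypothetical example data and parameters.
seq = "ACGTACGGTCAATGCCTAGACCGTTAGC"
l, w_min, w_max = 3, 3, 8
seeds = linked_minimizers(seq, l, w_min, w_max)
for i, j, seed in seeds[:5]:
    print(i, j, seed)
```

Because the gap between the two strobes varies with the window content, a seed can still match when an indel or substitution falls between the strobes, which a contiguous k-mer of the same total length cannot do.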
Competing Interest Statement
The authors have declared no competing interest.