PT - JOURNAL ARTICLE
AU - Pellow, David
AU - Pu, Lianrong
AU - Ekim, Baris
AU - Kotlar, Lior
AU - Berger, Bonnie
AU - Shamir, Ron
AU - Orenstein, Yaron
TI - Efficient minimizer orders for large values of k using minimum decycling sets
AID - 10.1101/2022.10.18.512682
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.10.18.512682
4099 - http://biorxiv.org/content/early/2022/10/21/2022.10.18.512682.short
4100 - http://biorxiv.org/content/early/2022/10/21/2022.10.18.512682.full
AB - Minimizers are ubiquitously used in data structures and algorithms for efficient searching, mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select a minimum k-mer in every L-long sub-sequence of the target sequence, where minimality is with respect to a predefined k-mer order. Commonly used minimizer orders select more k-mers overall than necessary and therefore provide limited improvement to runtime and memory usage of downstream analysis tasks. The recently introduced universal k-mer hitting sets produce minimizer orders resulting in fewer selected k-mers. Unfortunately, generating compact universal k-mer hitting sets is currently infeasible for k > 13, and thus cannot help in the many applications that need minimizers of larger k. Here, we close this gap by introducing decycling set-based minimizer orders. We define new orders based on minimum decycling sets, which are guaranteed to hit any infinitely long sequence. We show that in practice these new minimizer orders select a number of k-mers comparable to that of minimizer orders based on universal k-mer hitting sets, and can also scale up to larger k. Furthermore, we developed a query method that avoids the need to keep the k-mers of a decycling set in memory, which enables the use of these minimizer orders for any value of k.
We expect the new decycling set-based minimizer orders to improve the runtime and memory usage of algorithms and data structures in high-throughput DNA sequencing analysis. Competing Interest Statement: The authors have declared no competing interest.
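The abstract's two core definitions (a minimizer scheme driven by a k-mer order, and a decycling set that every infinitely long sequence must hit) can be made concrete with a small sketch. This is not the authors' implementation: the function names `kmers_of`, `is_decycling`, `decycling_order`, and `minimizer_positions` are hypothetical, the toy example uses a binary alphabet, and the set is held explicitly in memory, which is exactly what the query method described above avoids. The decycling property is checked as acyclicity of the de Bruijn graph (nodes are k-mers, edges are (k+1)-mers) after the set is removed. The order assumed here ranks set members before all other k-mers and breaks ties lexicographically, chosen only for determinism.

```python
from itertools import product
from typing import Any, Callable, Iterable, List


def kmers_of(seq: str, k: int) -> List[str]:
    """All k-mers of seq, in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def is_decycling(kmer_set: Iterable[str], k: int, alphabet: str = "ACGT") -> bool:
    """True if deleting kmer_set from the de Bruijn graph (nodes: all k-mers over
    alphabet; edge u -> v when u[1:] == v[:-1]) leaves an acyclic graph, so any
    infinitely long sequence must contain a k-mer from the set."""
    nodes = {"".join(p) for p in product(alphabet, repeat=k)} - set(kmer_set)
    # Kahn's algorithm: count in-edges coming from surviving nodes only.
    indeg = {v: 0 for v in nodes}
    for v in nodes:
        for c in alphabet:
            u = v[1:] + c
            if u in nodes:
                indeg[u] += 1
    stack = [v for v in nodes if indeg[v] == 0]
    peeled = 0
    while stack:
        v = stack.pop()
        peeled += 1
        for c in alphabet:
            u = v[1:] + c
            if u in nodes:
                indeg[u] -= 1
                if indeg[u] == 0:
                    stack.append(u)
    # All nodes can be peeled off iff no cycle (self-loops included) survives.
    return peeled == len(nodes)


def decycling_order(kmer_set: Iterable[str]) -> Callable[[str], Any]:
    """Hypothetical order key: k-mers in the set sort before all others;
    ties are broken lexicographically (an assumption, for determinism)."""
    members = frozenset(kmer_set)
    return lambda kmer: (0 if kmer in members else 1, kmer)


def minimizer_positions(seq: str, k: int, w: int, key: Callable[[str], Any]) -> List[int]:
    """Start positions selected by a (w, k) minimizer scheme: each window of w
    consecutive k-mers (L = w + k - 1 characters) contributes its smallest k-mer
    under `key`, taking the leftmost one on ties."""
    kms = kmers_of(seq, k)
    picked = set()
    for start in range(len(kms) - w + 1):
        best = min(range(start, start + w), key=lambda i: (key(kms[i]), i))
        picked.add(best)
    return sorted(picked)


if __name__ == "__main__":
    D = {"00", "11", "01"}  # toy binary set: hits both self-loops and the 01 <-> 10 cycle
    print(is_decycling(D, k=2, alphabet="01"))             # True
    print(is_decycling({"00", "11"}, k=2, alphabet="01"))  # False: 01 -> 10 -> 01 survives
    print(minimizer_positions("0110100", k=2, w=3, key=decycling_order(D)))  # [0, 3, 5]
```

In this toy case the set has 3 k-mers, one per binary necklace of length 2 (00, 01/10, 11), so it is also minimum. Density can be read off as the number of selected positions divided by the number of k-mers in the sequence.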