RT Journal Article
SR Electronic
T1 Dataset-adaptive minimizer order reduces memory usage in k-mer counting
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.12.02.470910
DO 10.1101/2021.12.02.470910
A1 Flomin, Dan
A1 Pellow, David
A1 Shamir, Ron
YR 2021
UL http://biorxiv.org/content/early/2021/12/03/2021.12.02.470910.abstract
AB The rapid, continuous growth of deep sequencing experiments requires the development and improvement of many bioinformatics applications for the analysis of large sequencing datasets, including k-mer counting and assembly. Several applications reduce RAM usage by binning sequences. Binning is done by employing minimizer schemes, which rely on a specific order of the minimizers. It has been demonstrated that the choice of the order has a major impact on the performance of the applications. Here we introduce a method for tailoring the order to the dataset. Our method repeatedly samples the dataset and modifies the order so as to flatten the k-mer load distribution across minimizers. We integrated our method into Gerbil, a state-of-the-art memory-efficient k-mer counter, and were able to reduce its memory footprint by 50% or more for large k, with only a minor increase in runtime. Our tests also showed that the orders generated by our method gave superior results when transferred across datasets from the same species, with little or no order change. This enables memory reduction with essentially no increase in runtime. Competing Interest Statement: The authors have declared no competing interest.