PT - JOURNAL ARTICLE
AU - Saurabh Kalikar
AU - Chirag Jain
AU - Vasimuddin Md
AU - Sanchit Misra
TI - Accelerating long-read analysis on modern CPUs
AID - 10.1101/2021.07.21.453294
DP - 2022 Jan 01
TA - bioRxiv
PG - 2021.07.21.453294
4099 - http://biorxiv.org/content/early/2022/02/03/2021.07.21.453294.short
4100 - http://biorxiv.org/content/early/2022/02/03/2021.07.21.453294.full
AB - Long-read sequencing is now routinely used at scale for genomics and transcriptomics applications. Mapping long reads or a draft genome assembly to a reference sequence is often one of the most time-consuming steps in these applications. Here, we present techniques to accelerate minimap2, a widely used software for mapping. We present multiple optimizations using SIMD parallelization, efficient cache utilization and a learned index data structure to accelerate its three main computational modules, i.e., seeding, chaining and pairwise sequence alignment. These result in a reduction of the end-to-end mapping time of minimap2 by up to 1.8× while maintaining identical output. Competing Interest Statement: SK, VM and SM are employees of Intel Corporation.