Abstract
Recent advancements in long-read sequencing and assembly methods have ushered in an era of high-quality genome assemblies. Modern assemblies commonly feature megabase-long sequences that frequently span entire chromosomes. The increased assembly contiguity and the correspondingly smaller number of contigs also imply that whole-genome alignment is no longer an embarrassingly parallel problem. The conventional approach parallelizes alignment across the sequences of the query genome, assigning a single thread to each sequence. When the query consists of only a few very long sequences, this results in poor CPU utilization and long runtimes. In this work, we designed optimizations to accelerate whole-genome alignment on multi-core processors and implemented them in a commonly used aligner, minimap2. Our improvements include a fine-grained parallel chaining method and a fast mechanism for differentiating primary and secondary chains. Our approach accelerates alignment of human, plant, and primate genomes by 1.6× to 7.2× without compromising accuracy.
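The utilization problem described above can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from minimap2. It assumes, purely for illustration, that alignment work is proportional to sequence length and that each whole sequence is handled by exactly one thread. The function name `per_sequence_schedule` and the contig lengths are hypothetical.

```python
import heapq


def per_sequence_schedule(lengths, n_threads):
    """Assign whole sequences to threads, longest first, each to the
    currently least-loaded thread.

    Assumption (illustrative only): work is proportional to sequence length.
    Returns (makespan, utilization), where makespan is the load of the
    busiest thread and utilization is the fraction of thread-time doing work.
    """
    loads = [0] * n_threads  # min-heap of per-thread accumulated work
    heapq.heapify(loads)
    for length in sorted(lengths, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + length)
    makespan = max(loads)
    utilization = sum(lengths) / (n_threads * makespan)
    return makespan, utilization


# Hypothetical chromosome-scale assembly: six contigs, lengths in Mbp.
contigs = [250, 200, 150, 100, 50, 50]

makespan, utilization = per_sequence_schedule(contigs, n_threads=32)
speedup_bound = sum(contigs) / makespan
print(makespan, utilization, speedup_bound)
```

Under these assumptions the longest contig alone determines the runtime on 32 threads, so the speedup over a single thread cannot exceed the total length divided by the longest contig, regardless of how many cores are available. This is the kind of imbalance that parallelism within a single sequence, rather than only across sequences, is meant to address.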
Competing Interest Statement
V.M. and S.M. are employees of Intel Corporation.
Footnotes
Email: ghanshyamc@iisc.ac.in
Emails: vasimuddin.md@intel.com, sanchit.misra@intel.com