PT - JOURNAL ARTICLE
AU - Chirag Jain
AU - Sergey Koren
AU - Alexander Dilthey
AU - Adam M. Phillippy
AU - Srinivas Aluru
TI - A Fast Adaptive Algorithm for Computing Whole-Genome Homology Maps
AID - 10.1101/259986
DP - 2018 Jan 01
TA - bioRxiv
PG - 259986
4099 - http://biorxiv.org/content/early/2018/02/05/259986.short
4100 - http://biorxiv.org/content/early/2018/02/05/259986.full
AB - Motivation: Whole-genome alignment is an important problem in genomics for comparing different species, mapping draft assemblies to reference genomes, and identifying repeats. However, for large plant and animal genomes, this task remains compute and memory intensive. Results: We introduce an approximate algorithm for computing local alignment boundaries between long DNA sequences. Given a minimum alignment length and an identity threshold, our algorithm computes the desired alignment boundaries and identity estimates using kmer-based statistics, and maintains sufficient probabilistic guarantees on the output sensitivity. Further, to prioritize higher scoring alignment intervals, we develop a plane-sweep based filtering technique which is theoretically optimal and practically efficient. Implementation of these ideas resulted in a fast and accurate assembly-to-genome and genome-to-genome mapper. As a result, we were able to map an error-corrected whole-genome NA12878 human assembly to the hg38 human reference genome in about one minute total execution time and < 4 GB memory using 8 CPU threads, achieving more than an order of magnitude improvement in both runtime and memory over competing methods. Recall accuracy of computed alignment boundaries was consistently found to be > 97% on multiple datasets. Finally, we performed a sensitive self-alignment of the human genome to compute all duplications of length ≥ 1 Kbp and ≥ 90% identity.
The reported output achieves good recall and covers 5% more bases than the current UCSC genome browser’s segmental duplication annotation. Availability: https://github.com/marbl/MashMap. Contact: adam.phillippy{at}nih.gov, aluru{at}cc.gatech.edu