PT  - JOURNAL ARTICLE
AU  - Ilia Minkin
AU  - Paul Medvedev
TI  - Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ
AID - 10.1101/548123
DP  - 2019 Jan 01
TA  - bioRxiv
PG  - 548123
4099 - http://biorxiv.org/content/early/2019/02/13/548123.short
4100 - http://biorxiv.org/content/early/2019/02/13/548123.full
AB  - Multiple whole-genome alignment is a fundamental and challenging problem in bioinformatics. Despite many ongoing successes, today’s methods are not able to keep up with the growing number, length, and complexity of assembled genomes. Approaches based on using compacted de Bruijn graphs to identify and extend anchors into locally collinear blocks hold the potential for scalability, but current algorithms still do not scale to mammalian genomes. We present a novel algorithm SibeliaZ-LCB for identifying collinear blocks in closely related genomes based on the analysis of the de Bruijn graph. We further incorporate it into a multiple whole-genome alignment pipeline called SibeliaZ. SibeliaZ shows drastic run-time improvements over other methods on both simulated and real data, with only a limited decrease in accuracy. On sixteen recently assembled strains of mice, SibeliaZ runs in under 12 hours, while other tools could not run to completion for even eight mice, given a week. SibeliaZ makes a significant step towards improving scalability of multiple whole-genome alignment and collinear block reconstruction algorithms and will enable many comparative genomics studies in the near future.