Abstract
We present an algorithm for the optimal alignment of sequences to genome graphs. It works by phrasing the edit distance minimization task as finding a shortest path on an implicit alignment graph. To find a shortest path, we instantiate the A⋆ paradigm with a novel domain-specific heuristic function that accounts for the upcoming subsequence in the query to be aligned, resulting in a provably optimal alignment algorithm called AStarix.
Experimental evaluation of AStarix shows that it is 1–2 orders of magnitude faster than state-of-the-art optimal algorithms on the task of aligning Illumina reads to reference genome graphs. Implementations and evaluations are available at https://github.com/eth-sri/astarix.
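To make the shortest-path formulation concrete, the following is a minimal sketch and not the AStarix implementation. It assumes an edge-labeled graph given as `(u, v, letter)` triples, unit edit costs, and an alignment that may start at any node. The function name `align_cost` and the toy heuristic are illustrative assumptions: the heuristic only counts upcoming query letters that occur nowhere in the graph, which is a simple admissible lower bound and far weaker than the domain-specific heuristic the abstract describes. States `(node, query position)` are generated on demand, so the alignment graph stays implicit.

```python
import heapq


def align_cost(edges, query):
    """Minimum edit distance between `query` and any path in the graph.

    `edges` is an iterable of (u, v, letter) triples; node ids are assumed
    to be mutually comparable (e.g. all ints) so heap ties can be broken.
    Returns None if the graph has no nodes.
    """
    adj, nodes = {}, set()
    for u, v, c in edges:
        adj.setdefault(u, []).append((v, c))
        nodes.update((u, v))
    alphabet = {c for _, _, c in edges}
    n = len(query)

    # Toy heuristic (assumption, not the AStarix heuristic):
    # h[i] = number of letters in query[i:] that appear on no graph edge.
    # Each such letter must cost at least 1, so h never overestimates.
    h = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        h[i] = h[i + 1] + (query[i] not in alphabet)

    inf = float("inf")
    dist, heap = {}, []
    for u in sorted(nodes):  # the alignment may start at any node
        dist[(u, 0)] = 0
        heapq.heappush(heap, (h[0], 0, u, 0))

    while heap:
        _, g, u, i = heapq.heappop(heap)
        if g > dist.get((u, i), inf):
            continue  # stale queue entry
        if i == n:
            return g  # whole query consumed: optimal cost found

        # Expand the implicit alignment graph around state (u, i).
        successors = [(u, i + 1, 1)]  # skip a query letter
        for v, c in adj.get(u, []):
            successors.append((v, i + 1, 0 if c == query[i] else 1))  # match / substitution
            successors.append((v, i, 1))  # skip a graph letter
        for v, j, cost in successors:
            ng = g + cost
            if ng < dist.get((v, j), inf):
                dist[(v, j)] = ng
                heapq.heappush(heap, (ng + h[j], ng, v, j))
    return None
```

For example, with the linear graph `[(0, 1, "A"), (1, 2, "C"), (2, 3, "G"), (3, 4, "T")]`, the call `align_cost(graph, "ACGT")` gives 0 and `align_cost(graph, "AGT")` gives 1. Replacing `h` with all zeros turns the search into Dijkstra's algorithm; a tighter admissible heuristic expands fewer states while still returning an optimal cost.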
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
pesho.ivanov{at}inf.ethz.ch, benjamin.bichsel{at}inf.ethz.ch, harun.mustafa{at}inf.ethz.ch, andre.kahles{at}inf.ethz.ch, gunnar.ratsch{at}inf.ethz.ch, martin.vechev{at}inf.ethz.ch
The optimal algorithm from the GraphAligner tool is referred to as BitParallel.