RT Journal Article
SR Electronic
T1 GraphChainer: Co-linear Chaining for Accurate Alignment of Long Reads to Variation Graphs
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.01.07.475257
DO 10.1101/2022.01.07.475257
A1 Jun Ma
A1 Manuel Cáceres
A1 Leena Salmela
A1 Veli Mäkinen
A1 Alexandru I. Tomescu
YR 2022
UL http://biorxiv.org/content/early/2022/01/07/2022.01.07.475257.abstract
AB Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit (Garrison et al., Nature Biotechnology, 2018) is a popular aligner of short reads, GraphAligner (Rautiainen and Marschall, Genome Biology, 2020) is the state-of-the-art aligner of long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds. We present a new algorithm to co-linearly chain a set of seeds in an acyclic variation graph, together with the first efficient implementation of such a co-linear chaining algorithm into a new aligner of long reads to variation graphs, GraphChainer. Compared to GraphAligner, at a normalized edit distance threshold of 40%, GraphChainer aligns 9% to 12% more reads, and 15% to 19% more total read length, on real PacBio reads from human chromosomes 1 and 22. On both simulated and real data, GraphChainer aligns between 97% and 99% of all reads, and of total read length. At the more stringent normalized edit distance threshold of 30%, GraphChainer aligns up to 29% more total real read length than GraphAligner. GraphChainer is freely available at https://github.com/algbio/GraphChainer. Competing Interest Statement: The authors have declared no competing interest.
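
The abstract contrasts extending seeds individually with co-linearly chaining several seeds. As a rough illustration of the chaining idea only, the sketch below picks a subset of seed hits that appear in increasing order along the read and along a path of an acyclic graph, maximizing the read length they cover. It is not GraphChainer's algorithm or code: the names `Anchor` and `chain_anchors` are hypothetical, graph nodes are treated as atomic, overlapping anchors are simply disallowed, and the quadratic dynamic program is a deliberate simplification of the efficient method the abstract refers to.

```python
from functools import lru_cache
from typing import Dict, List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """Hypothetical seed hit: a read interval matched to a path in the graph."""
    first_node: int  # first graph node of the matched path
    last_node: int   # last graph node of the matched path
    read_start: int  # inclusive start position in the read
    read_end: int    # inclusive end position in the read


def chain_anchors(
    dag: Dict[int, List[int]], anchors: List[Anchor]
) -> Tuple[int, List[Anchor]]:
    """Return (covered read length, chain) for a best non-overlapping chain.

    `dag` maps each node to its out-neighbours and is assumed to be acyclic.
    """

    @lru_cache(maxsize=None)
    def reachable(u: int) -> frozenset:
        # All nodes reachable from u by at least one edge.
        out = set()
        for v in dag.get(u, []):
            out.add(v)
            out |= reachable(v)
        return frozenset(out)

    def precedes(a: Anchor, b: Anchor) -> bool:
        # a may come before b if it ends earlier in the read and
        # b's path starts at a node reachable from the end of a's path.
        return a.read_end < b.read_start and b.first_node in reachable(a.last_node)

    if not anchors:
        return 0, []

    # If a precedes b then a.read_end < b.read_end, so this order is safe.
    order = sorted(anchors, key=lambda a: a.read_end)
    length = [a.read_end - a.read_start + 1 for a in order]
    best = list(length)        # best[j]: best coverage of a chain ending at order[j]
    prev = [-1] * len(order)   # back-pointers for reconstructing the chain

    for j, b in enumerate(order):
        for i in range(j):
            if precedes(order[i], b) and best[i] + length[j] > best[j]:
                best[j] = best[i] + length[j]
                prev[j] = i

    j = max(range(len(order)), key=best.__getitem__)
    total = best[j]
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    chain.reverse()
    return total, chain
```

Calling `chain_anchors(dag, anchors)` with a small bubble graph and a handful of anchors returns the covered length together with the chosen anchors in read order. A real aligner would then align the read to the graph region spanned by that chain.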