PT - JOURNAL ARTICLE
AU - Jouni Sirén
AU - Erik Garrison
AU - Adam M. Novak
AU - Benedict Paten
AU - Richard Durbin
TI - Haplotype-aware graph indexes
AID - 10.1101/559583
DP - 2019 Jan 01
TA - bioRxiv
PG - 559583
4099 - http://biorxiv.org/content/early/2019/02/24/559583.short
4100 - http://biorxiv.org/content/early/2019/02/24/559583.full
AB - Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes. Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt, and https://github.com/jltsiren/gcsa2. Contact: jouni.siren{at}iki.fi Supplementary information: Supplementary data are available.