RT Journal Article
SR Electronic
T1 Pangenome-based genome inference
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.11.11.378133
DO 10.1101/2020.11.11.378133
A1 Jana Ebler
A1 Wayne E. Clarke
A1 Tobias Rausch
A1 Peter A. Audano
A1 Torsten Houwaart
A1 Jan Korbel
A1 Evan E. Eichler
A1 Michael C. Zody
A1 Alexander T. Dilthey
A1 Tobias Marschall
YR 2020
UL http://biorxiv.org/content/early/2020/11/12/2020.11.11.378133.abstract
AB Typical analysis workflows map reads to a reference genome in order to detect genetic variants. Generating such alignments introduces reference biases, in particular against insertion alleles absent from the reference, and comes with a substantial computational burden. In contrast, recent k-mer-based genotyping methods are fast, but struggle in repetitive or duplicated regions of the genome. We propose a novel algorithm, called PanGenie, that leverages a pangenome reference built from haplotype-resolved genome assemblies in conjunction with k-mer count information from raw, short-read sequencing data to genotype a wide spectrum of genetic variation. The given haplotypes enable our method to take advantage of linkage information to aid genotyping in regions poorly covered by unique k-mers and provide access to regions otherwise inaccessible by short reads. Compared to classic mapping-based approaches, our approach is more than 4× faster at 30× coverage and, at the same time, reaches significantly better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (> 50 bp), where we are able to genotype > 99.9% of all tested variants with over 90% accuracy at 30× short-read coverage, whereas the best competing tools either typed less than 60% of variants or reached accuracies below 70%. PanGenie now enables the inclusion of this commonly neglected variant type in downstream analyses. Competing Interest Statement: The authors have declared no competing interest.