RT Journal Article
SR Electronic
T1 Pangenome-based genome inference
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.11.11.378133
DO 10.1101/2020.11.11.378133
A1 Jana Ebler
A1 Wayne E. Clarke
A1 Tobias Rausch
A1 Peter A. Audano
A1 Torsten Houwaart
A1 Jan Korbel
A1 Evan E. Eichler
A1 Michael C. Zody
A1 Alexander T. Dilthey
A1 Tobias Marschall
YR 2020
UL http://biorxiv.org/content/early/2020/11/12/2020.11.11.378133.abstract
AB Typical analysis workflows map reads to a reference genome in order to detect genetic variants. Generating such alignments introduces reference biases, in particular against insertion alleles absent from the reference, and comes with a substantial computational burden. In contrast, recent k-mer-based genotyping methods are fast, but struggle in repetitive or duplicated regions of the genome. We propose a novel algorithm, called PanGenie, that leverages a pangenome reference built from haplotype-resolved genome assemblies in conjunction with k-mer count information from raw, short-read sequencing data to genotype a wide spectrum of genetic variation. The given haplotypes enable our method to take advantage of linkage information to aid genotyping in regions poorly covered by unique k-mers and provide access to regions otherwise inaccessible by short reads. Compared to classic mapping-based approaches, our approach is more than 4× faster at 30× coverage and, at the same time, reaches significantly better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (> 50 bp), where we are able to genotype > 99.9% of all tested variants with over 90% accuracy at 30× short-read coverage, whereas the best competing tools either typed less than 60% of variants or reached accuracies below 70%. PanGenie now enables the inclusion of this commonly neglected variant type in downstream analyses. Competing Interest Statement: The authors have declared no competing interest.