RT Journal Article
SR Electronic
T1 Fast and Accurate Genomic Analyses using Genome Graphs
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 194530
DO 10.1101/194530
A1 Goran Rakocevic
A1 Vladimir Semenyuk
A1 James Spencer
A1 John Browning
A1 Ivan Johnson
A1 Vladan Arsenijevic
A1 Jelena Nadj
A1 Kaushik Ghose
A1 Maria C. Suciu
A1 Sun-Gou Ji
A1 Gülfem Demir
A1 Lizao Li
A1 Berke Ç. Toptaş
A1 Alexey Dolgoborodov
A1 Björn Pollex
A1 Irina Glotova
A1 Péter Kómár
A1 Yilong Li
A1 Milos Popovic
A1 Wan-Ping Lee
A1 Morten Källberg
A1 Amit Jain
A1 Deniz Kural
YR 2017
UL http://biorxiv.org/content/early/2017/09/29/194530.abstract
AB The human reference genome serves as the foundation for genomics by providing a scaffold for sequencing read alignment, but currently only reflects a single consensus haplotype, impairing read alignment and downstream analysis accuracy. Reference genome structures incorporating known genetic variation have been shown to improve the accuracy of genomic analyses, but have so far remained computationally prohibitive for routine large-scale use. Here we present a graph genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million indels. Our graph genome aligner and variant calling pipeline consume around 5.5 and 2 hours per high coverage whole-genome-sequenced sample, respectively, comparable to those of state-of-the-art linear reference genome-based methods. Using orthogonal benchmarks based on real and simulated data, we show that using a graph genome reference improves read mapping sensitivity and produces a 0.5 percentage point increase in variant calling recall, which extrapolates into 20,000 additional variants being detected per sample, while variant calling specificity is unaffected. Structural variations (SVs) incorporated into a graph genome can be directly genotyped from read alignments in a rapid and accurate fashion.
Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is the first practical step towards fulfilling the promise of graph genomes to radically enhance the scalability and precision of genomic analysis by incorporating prior knowledge of population characteristics. One Sentence Summary: Genome graphs incorporating common genetic variation enable efficient variant identification at population scale.