TY  - JOUR
T1  - Fast and Accurate Genomic Analyses using Genome Graphs
JF  - bioRxiv
DO  - 10.1101/194530
SP  - 194530
AU  - Goran Rakocevic
AU  - Vladimir Semenyuk
AU  - James Spencer
AU  - John Browning
AU  - Ivan Johnson
AU  - Vladan Arsenijevic
AU  - Jelena Nadj
AU  - Kaushik Ghose
AU  - Maria C. Suciu
AU  - Sun-Gou Ji
AU  - Gülfem Demir
AU  - Lizao Li
AU  - Berke Ç. Toptaş
AU  - Alexey Dolgoborodov
AU  - Björn Pollex
AU  - Péter Kómár
AU  - Yilong Li
AU  - Milos Popovic
AU  - Wan-Ping Lee
AU  - Morten Källberg
AU  - Amit Jain
AU  - Deniz Kural
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/09/27/194530.abstract
N2  - The human reference genome serves as the foundation for genomics by providing a scaffold for sequencing read alignment, but currently only reflects a single consensus haplotype, impairing read alignment and downstream analysis accuracy. Reference genome structures incorporating known genetic variation have been shown to improve the accuracy of genomic analyses, but have so far remained computationally prohibitive for routine large-scale use. Here we present a graph genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million indels. Our graph genome aligner and variant calling pipeline consume around 5.5 and 2 hours per high coverage whole-genome-sequenced sample, respectively, comparable to those of state-of-the-art linear reference genome-based methods. Using orthogonal benchmarks based on real and simulated data, we show that using a graph genome reference improves read mapping sensitivity and produces a 0.5 percentage point increase in variant calling recall, which extrapolates into 20,000 additional variants being detected per sample, while variant calling specificity is unaffected. Structural variations (SVs) incorporated into a graph genome can be directly genotyped from read alignments in a rapid and accurate fashion. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is the first practical step towards fulfilling the promise of graph genomes to radically enhance the scalability and precision of genomic analysis by incorporating prior knowledge of population characteristics. One Sentence Summary: Genome graphs incorporating common genetic variation enable efficient variant identification at population scale.
ER  - 