New Results

Fast and Accurate Genomic Analyses using Genome Graphs

Goran Rakocevic, Vladimir Semenyuk, James Spencer, John Browning, Ivan Johnson, Vladan Arsenijevic, Jelena Nadj, Kaushik Ghose, Maria C. Suciu, Sun-Gou Ji, Gulfem Demir, Lizao Li, Berke C. Toptas, Alexey Dolgoborodov, Bjoern Pollex, Peter Komar, Yilong Li, Milos Popovic, Wan-Ping Lee, Morten Kallberg, Amit Jain, Deniz Kural
doi: https://doi.org/10.1101/194530
Author affiliation (all authors): Seven Bridges Genomics, Inc, Cambridge, MA 02140
For correspondence: deniz.kural@sbgdinc.com

Abstract

The human reference genome serves as the foundation for genomics by providing a scaffold for sequencing read alignment, but it currently reflects only a single consensus haplotype, which impairs read alignment and downstream analysis accuracy. Reference genome structures incorporating known genetic variation have been shown to improve the accuracy of genomic analyses, but have so far remained computationally prohibitive for routine large-scale use. Here we present a graph genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million indels. Our graph genome aligner and variant calling pipeline take around 5.5 and 2 hours per high-coverage whole-genome-sequenced sample, respectively, runtimes comparable to those of state-of-the-art linear reference genome-based methods. Using orthogonal benchmarks based on real and simulated data, we show that using a graph genome reference improves read mapping sensitivity and produces a 0.5 percentage point increase in variant calling recall, which extrapolates into 20,000 additional variants being detected per sample, while variant calling specificity is unaffected. Structural variations (SVs) incorporated into a graph genome can be directly genotyped from read alignments in a rapid and accurate fashion. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is the first practical step towards fulfilling the promise of graph genomes to radically enhance the scalability and precision of genomic analysis by incorporating prior knowledge of population characteristics.
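
The extrapolation quoted in the abstract can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the function name `implied_variants_per_sample` is hypothetical, and the calculation assumes the recall gain applies uniformly to all true variants in a sample, which the abstract does not state. It simply divides the quoted 20,000 additional variants by the quoted 0.5 percentage point gain to recover the per-sample variant count those two figures imply.

```python
def implied_variants_per_sample(extra_variants: float, recall_gain_pp: float) -> float:
    """Per-sample variant count implied by a recall gain (in percentage points)
    and the number of additional variants attributed to that gain.

    Assumption (not from the abstract): the gain applies uniformly across
    all true variants in the sample.
    """
    if recall_gain_pp <= 0:
        raise ValueError("recall_gain_pp must be positive")
    # extra = total * (gain_pp / 100)  =>  total = extra * 100 / gain_pp
    return extra_variants * 100.0 / recall_gain_pp


# Figures quoted in the abstract: 20,000 extra variants, 0.5 percentage points.
total = implied_variants_per_sample(20_000, 0.5)
print(f"Implied variants per sample: {total:,.0f}")

# Combined aligner + variant calling wall time per sample, also from the abstract.
print(f"Total pipeline time per sample: {5.5 + 2} hours")
```

Under that assumption, the two quoted figures are consistent with roughly four million variants per sample.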

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted September 27, 2017.

Subject Area

  • Bioinformatics