bioRxiv
New Results

Genome Graphs

Adam M. Novak, Glenn Hickey, Erik Garrison, Sean Blum, Abram Connelly, Alexander Dilthey, Jordan Eizenga, M. A. Saleh Elmohamed, Sally Guthrie, André Kahles, Stephen Keenan, Jerome Kelleher, Deniz Kural, Heng Li, Michael F. Lin, Karen Miga, Nancy Ouyang, Goran Rakocevic, Maciek Smuga-Otto, Alexander Wait Zaranek, Richard Durbin, Gil McVean, David Haussler, Benedict Paten
doi: https://doi.org/10.1101/101378
Adam M. Novak
UCSC Genomics Institute

Abstract

There is increasing recognition that a single, monoploid reference genome is a poor universal reference structure for human genetics, because it represents only a tiny fraction of human variation. Adding this missing variation results in a structure that can be described as a mathematical graph: a genome graph. We demonstrate that, in comparison to the existing reference genome (GRCh38), genome graphs can substantially improve the fraction of reads that map uniquely and the fraction that map perfectly. Furthermore, we show that this fundamental simplification of read mapping transforms the variant calling problem from one in which many non-reference variants must be discovered de novo to one in which the vast majority of variants are simply re-identified within the graph. Using standard benchmarks as well as a novel reference-free evaluation, we show that a simplistic variant calling procedure on a genome graph can already call variants at least as well as, and in many cases better than, a state-of-the-art method on the linear human reference genome. We anticipate that graph-based references will supplant linear references in humans and in other applications where cohorts of sequenced individuals are available.
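To make the idea of a genome graph concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the node labels, the `spell_paths` and `maps_perfectly` helpers, and the example sequences are all hypothetical and chosen only for illustration. It shows how adding one known variant as an alternative branch lets a read carrying that variant match perfectly, whereas the same read cannot match a linear reference that holds only one allele.

```python
from typing import Dict, List

# Toy sequence graph: node id -> sequence label (all values are illustrative).
NODES: Dict[int, str] = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
# Directed edges; nodes 2 and 3 form a single-base "bubble" between 1 and 4.
EDGES: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [4], 4: []}


def spell_paths(node: int) -> List[str]:
    """Return the sequence spelled by every path from `node` to a sink node."""
    label = NODES[node]
    successors = EDGES[node]
    if not successors:
        return [label]
    return [label + tail for nxt in successors for tail in spell_paths(nxt)]


def maps_perfectly(read: str, haplotypes: List[str]) -> bool:
    """True if the read occurs exactly in at least one haplotype sequence."""
    return any(read in hap for hap in haplotypes)


# The linear reference corresponds to the single path 1 -> 2 -> 4.
LINEAR_REFERENCE = "ACGTATTCA"
# The graph additionally encodes the path 1 -> 3 -> 4 (alternate allele G).
GRAPH_HAPLOTYPES = spell_paths(1)
# A hypothetical read sampled from an individual carrying the G allele.
READ_WITH_VARIANT = "GTGTT"

if __name__ == "__main__":
    print(maps_perfectly(READ_WITH_VARIANT, [LINEAR_REFERENCE]))
    print(maps_perfectly(READ_WITH_VARIANT, GRAPH_HAPLOTYPES))
```

Enumerating every path is only workable for a toy graph this small, since the number of paths grows quickly with the number of variant sites. Practical graph mappers rely on indexing instead, which this sketch does not attempt to model.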

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted January 18, 2017.