New Results

A Draft Human Pangenome Reference

Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, Julian K. Lucas, Jean Monlong, Haley J. Abel, Silvia Buonaiuto, Xian H. Chang, Haoyu Cheng, Justin Chu, Vincenza Colonna, Jordan M. Eizenga, Xiaowen Feng, Christian Fischer, Robert S. Fulton, Shilpa Garg, Cristian Groza, Andrea Guarracino, William T Harvey, Simon Heumos, Kerstin Howe, Miten Jain, Tsung-Yu Lu, Charles Markello, Fergal J. Martin, Matthew W. Mitchell, Katherine M. Munson, Moses Njagi Mwaniki, Adam M. Novak, Hugh E. Olsen, Trevor Pesout, David Porubsky, Pjotr Prins, Jonas A. Sibbesen, Chad Tomlinson, Flavia Villani, Mitchell R. Vollger, Human Pangenome Reference Consortium, Guillaume Bourque, Mark JP Chaisson, Paul Flicek, Adam M. Phillippy, Justin M. Zook, Evan E. Eichler, David Haussler, Erich D. Jarvis, Karen H. Miga, Ting Wang, Erik Garrison, Tobias Marschall, Ira Hall, Heng Li, Benedict Paten
doi: https://doi.org/10.1101/2022.07.09.499321
Wen-Wei Liao
1McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
2Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
3Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
Mobin Asri
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Jana Ebler
5Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Daniel Doerr
5Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Marina Haukness
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Glenn Hickey
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Shuangjia Lu
3Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
Julian K. Lucas
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Jean Monlong
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Haley J. Abel
6Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
Silvia Buonaiuto
7Institute of Genetics and Biophysics, National Research Council, Naples 80111, Italy
Xian H. Chang
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Haoyu Cheng
8Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
9Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
Justin Chu
8Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
Vincenza Colonna
7Institute of Genetics and Biophysics, National Research Council, Naples 80111, Italy
10Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Jordan M. Eizenga
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Xiaowen Feng
8Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
9Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
Christian Fischer
10Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Robert S. Fulton
1McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
Shilpa Garg
11Department of Biology, University of Copenhagen, Denmark
Cristian Groza
12Quantitative Life Sciences, McGill University, Montreal, Québec H3A 0C7, Canada
Andrea Guarracino
13Genomics Research Centre, Human Technopole, Milan 20157, Italy
William T Harvey
14Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Simon Heumos
15Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
16Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
Kerstin Howe
17Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
Miten Jain
18Northeastern University, Boston, MA 02115, USA
Tsung-Yu Lu
19University of Southern California, Quantitative and Computational Biology, Los Angeles, CA, USA
Charles Markello
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Fergal J. Martin
20European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
Matthew W. Mitchell
21Coriell Institute for Medical Research, Camden, NJ 08103, USA
Katherine M. Munson
14Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Moses Njagi Mwaniki
22Department of Computer Science, University of Pisa, Pisa 56127, Italy
Adam M. Novak
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Hugh E. Olsen
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Trevor Pesout
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
David Porubsky
14Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Pjotr Prins
10Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Jonas A. Sibbesen
23Center for Health Data Science, University of Copenhagen, Denmark
Chad Tomlinson
1McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
Flavia Villani
10Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Mitchell R. Vollger
14Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
24Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
Guillaume Bourque
25Department of Human Genetics, McGill University, Montreal, Québec H3A 0C7, Canada
26Canadian Center for Computational Genomics, McGill University, Montreal, Québec H3A 0G1, Canada
27Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto 606-8501, Japan
Mark JP Chaisson
19University of Southern California, Quantitative and Computational Biology, Los Angeles, CA, USA
Paul Flicek
20European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
Adam M. Phillippy
28Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Justin M. Zook
29Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20877, USA
Evan E. Eichler
14Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
30Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
David Haussler
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
30Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
Erich D. Jarvis
31The Rockefeller University, New York, NY 10065, USA
30Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
Karen H. Miga
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA
Ting Wang
32Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Erik Garrison
10Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
  • For correspondence: egarris5@uthsc.edu tobias.marschall@hhu.de ira.hall@yale.edu hli@jimmy.harvard.edu bpaten@ucsc.edu
Tobias Marschall
5Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Ira Hall
3Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
Heng Li
8Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
9Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, USA
Benedict Paten
4UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, USA

Abstract

The Human Pangenome Reference Consortium (HPRC) presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces small-variant discovery errors by 34% and increases the number of structural variants detected per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted July 09, 2022.

Subject Area

  • Genomics