A Draft Human Pangenome Reference

Abstract
The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at both the structural and base-pair levels. From alignments of these assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces small-variant discovery errors by 34% and increases the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, and by 34% compared with workflows using previous diversity sets of genome assemblies.
Competing Interest Statement
The authors have declared no competing interest.