bioRxiv
New Results

Accurate chromosome-scale haplotype-resolved assembly of human genomes

Shilpa Garg, Arkarachai Fungtammasan, Andrew Carroll, Mike Chou, Anthony Schmitt, Xiang Zhou, Stephen Mac, Paul Peluso, Emily Hatas, Jay Ghurye, Jared Maguire, Medhat Mahmoud, Haoyu Cheng, David Heller, Justin M. Zook, Tobias Moemke, Tobias Marschall, Fritz J. Sedlazeck, John Aach, Chen-Shan Chin, George M. Church, Heng Li
doi: https://doi.org/10.1101/810341
Shilpa Garg
¹ Department of Genetics, Harvard Medical School, Boston, MA 02215
² Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215
¹⁰ Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215
For correspondence (Shilpa Garg, Chen-Shan Chin, George M. Church, Heng Li): shilpa_garg@hms.harvard.edu, jchin@dnanexus.com, gchurch@genetics.med.harvard.edu, hli@ds.dfci.harvard.edu
Arkarachai Fungtammasan
³ DNAnexus, Mountain View, CA 94040
Andrew Carroll
⁴ Google, Mountain View, CA 94043
Mike Chou
¹ Department of Genetics, Harvard Medical School, Boston, MA 02215
Anthony Schmitt
⁵ Arima Genomics, San Diego, CA 92121
Xiang Zhou
⁵ Arima Genomics, San Diego, CA 92121
Stephen Mac
⁵ Arima Genomics, San Diego, CA 92121
Paul Peluso
⁶ Pacific Biosciences, Menlo Park, CA 94025
Emily Hatas
⁶ Pacific Biosciences, Menlo Park, CA 94025
Jay Ghurye
⁷ Dovetail Genomics, 100 Enterprise Way, Scotts Valley, CA 95066
Jared Maguire
⁷ Dovetail Genomics, 100 Enterprise Way, Scotts Valley, CA 95066
Medhat Mahmoud
⁹ Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
Haoyu Cheng
² Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215
¹⁰ Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215
David Heller
¹² Max Planck Institute for Molecular Genetics, Berlin, Germany 14195
Justin M. Zook
⁸ Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899
Tobias Moemke
¹³ Saarland University, Saarbrücken, Germany, 66123
Tobias Marschall
¹¹ Max Planck Institute for Informatics, Saarbrücken, Germany, 66123
¹³ Saarland University, Saarbrücken, Germany, 66123
Fritz J. Sedlazeck
⁹ Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
John Aach
¹ Department of Genetics, Harvard Medical School, Boston, MA 02215
Chen-Shan Chin
³ DNAnexus, Mountain View, CA 94040
George M. Church
¹ Department of Genetics, Harvard Medical School, Boston, MA 02215
Heng Li
² Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215
¹⁰ Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215

Abstract

Haplotype-resolved or phased sequence assembly provides a complete picture of genomes and complex genetic variations. However, current phased assembly algorithms either fail to generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method that leverages long accurate reads and long-range conformation data for single individuals to generate chromosome-scale phased assemblies within a day. Applied to three public human genomes, PGP1, HG002 and NA12878, our method produced haplotype-resolved assemblies with contig NG50 up to 25 Mb and phased ∼99.5% of heterozygous sites to 98–99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for discovering structural variants (SVs), including thousands of new transposon insertions, and for resolving highly polymorphic and medically important regions such as HLA and KIR. Our improved method will enable high-quality precision medicine and facilitate new studies of individual haplotype variation and population diversity.
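The abstract reports contiguity as contig NG50. As a point of reference (the definition is standard and is not taken from this preprint), NG50 is the length L such that contigs of length L or longer together cover at least half of an assumed genome size. Plain N50 uses the total assembly length as the denominator instead. The minimal Python sketch below computes NG50 for one set of contigs. The function name `ng50` and the toy contig lengths are hypothetical, and the caller must supply the genome size.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Hypothetical helper: return the NG50 of a set of contig lengths.

    NG50 is the length L such that contigs of length >= L together cover
    at least half of genome_size (an assumed reference genome size, not
    the assembly size that plain N50 uses). Returns None if the contigs
    cover less than half of genome_size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        # Compare 2 * covered with genome_size to avoid float division.
        if 2 * covered >= genome_size:
            return length
    return None


if __name__ == "__main__":
    toy_contigs = [10, 8, 6, 4, 2]  # hypothetical contig lengths
    print(ng50(toy_contigs, 40))    # NG50 against an assumed genome size
    print(ng50(toy_contigs, sum(toy_contigs)))  # equals plain N50
```

With these toy contigs and an assumed genome size of 40, the function returns 6. Passing the assembly total (30) as the denominator reproduces plain N50 and returns 8. Dividing by a fixed genome size keeps the value comparable across assemblies whose total lengths differ.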

Competing Interest Statement

F.J.S. obtained a PacBio SMRT grant in 2019 and has had multiple trips sponsored by Pacific Biosciences and Oxford Nanopore Technologies. E.H. and P.P. are employees of Pacific Biosciences. C-S.C. and A.F. are employees of DNAnexus. A.S., X.Z. and S.M. are employees of Arima Genomics. J.G. and J.M. are employees of Dovetail Genomics. A.C. is an employee of Google. G.M.C. is a co-founder of Editas Medicine and has other financial interests listed at arep.med.harvard.edu/gmc/tech.html.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 01, 2020.
Subject Area

  • Bioinformatics