bioRxiv
New Results

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

Mitchell R. Vollger, Glennis A. Logsdon, Peter A. Audano, Arvis Sulovari, David Porubsky, Paul Peluso, Aaron M. Wenger, Gregory T. Concepcion, Zev N. Kronenberg, Katherine M. Munson, Carl Baker, Ashley D. Sanders, Diana C.J. Spierings, Peter M. Lansdorp, Urvashi Surti, Michael W. Hunkapiller, Evan E. Eichler
doi: https://doi.org/10.1101/635037
Mitchell R. Vollger
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Glennis A. Logsdon
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Peter A. Audano
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Arvis Sulovari
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
David Porubsky
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Paul Peluso
2Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
Aaron M. Wenger
2Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
Gregory T. Concepcion
2Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
Zev N. Kronenberg
2Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
Katherine M. Munson
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Carl Baker
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
Ashley D. Sanders
3European Molecular Biology Laboratory, Genome Biology Unit, 69117, Heidelberg, Germany
Diana C.J. Spierings
4European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
Peter M. Lansdorp
4European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
5Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
6Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
Urvashi Surti
7Department of Pathology, University of Pittsburgh School of Medicine, and University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA
Michael W. Hunkapiller
2Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
Evan E. Eichler
1Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
8Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
  • For correspondence: eee@gs.washington.edu

Abstract

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.
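
The continuity figures in the abstract rest on the NG50 statistic. As a rough illustration only (not the authors' pipeline), the sketch below computes NG50 under its usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of a target size. The function name `ng50` and the toy contig lengths are hypothetical; for the pericentromeric comparison, the target size would be the size of the region considered rather than the whole genome. The last lines reproduce the reported fold change from the two values given in the abstract.

```python
def ng50(contig_lengths, target_size):
    """Return the NG50 of an assembly relative to target_size.

    NG50 is the contig length at which the cumulative length of contigs,
    taken from longest to shortest, first reaches half of target_size.
    Returns None if the assembly covers less than half of target_size.
    """
    half = target_size / 2
    cumulative = 0
    for length in sorted(contig_lengths, reverse=True):
        cumulative += length
        if cumulative >= half:
            return length
    return None


# Toy example with made-up contig lengths (not data from the paper).
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, target_size=200)

# Fold change between the NG50 values quoted in the abstract (in kbp).
hifi_ng50_kbp = 480.6
clr_ng50_kbp = 191.5
fold_change = hifi_ng50_kbp / clr_ng50_kbp
print(f"toy NG50: {toy_ng50}, HiFi/CLR fold change: {fold_change:.2f}")
```

Unlike N50, which is normalized by the total assembly length, NG50 is normalized by a fixed target size, so two assemblies of the same genome (here HiFi and CLR) can be compared on the same scale.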

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted August 13, 2019.
Subject Area

  • Genomics