Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

Single haplotype assembly of the human genome from a hydatidiform mole

Karyn Meltz Steinberg, Valerie A. Schneider, Tina A. Graves-Lindsay, Robert S. Fulton, Richa Agarwala, John Huddleston, Sergey A. Shiryev, Aleksandr Morgulis, Urvashi Surti, Wesley C. Warren, Deanna M. Church, Evan E. Eichler, Richard K. Wilson
doi: https://doi.org/10.1101/006841
Karyn Meltz Steinberg
1The Genome Institute at Washington University, St. Louis, MO
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Valerie A. Schneider
2National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Tina A. Graves-Lindsay
1The Genome Institute at Washington University, St. Louis, MO
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Robert S. Fulton
1The Genome Institute at Washington University, St. Louis, MO
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Richa Agarwala
2National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
John Huddleston
3Department of Genome Sciences, University of Washington, Seattle, WA
4Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Sergey A. Shiryev
2National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Aleksandr Morgulis
2National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Urvashi Surti
5University of Pittsburgh, Pittsburgh, PA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Wesley C. Warren
1The Genome Institute at Washington University, St. Louis, MO
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Deanna M. Church
6Personalis, Inc. Menlo Park, CA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Evan E. Eichler
3Department of Genome Sciences, University of Washington, Seattle, WA
4Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Richard K. Wilson
1The Genome Institute at Washington University, St. Louis, MO
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Preview PDF
Loading

ABSTRACT

An accurate and complete reference human genome sequence assembly is essential for accurately interpreting individual genomes and associating sequence variation with disease phenotypes. While the current reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can help overcome these problems, even the longest available reads do not resolve all regions of the human genome. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones, an optical map, and 100X whole genome shotgun (WGS) sequence coverage using short (lllumina) read pairs. We used the WGS sequence and the GRCh37 reference assembly to create a sequence assembly of the CHM1 genome. We subsequently incorporated 382 finished CHORI-17 BAC clone sequences to generate a second draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene and repeat content show this assembly to be of excellent quality and contiguity, and comparisons to ClinVar and the NHGRI GWAS catalog show that the CHM1 genome does not harbor an excess of deleterious alleles. However, comparison to assembly-independent resources, such as BAC clone end sequences and long reads generated by a different sequencing technology (PacBio), indicate misassembled regions. The great majority of these regions is enriched for structural variation and segmental duplication, and can be resolved in the future by sequencing BAC clone tiling paths. This publicly available first generation assembly will be integrated into the Genome Reference Consortium (GRC) curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Back to top
PreviousNext
Posted July 03, 2014.
Download PDF

Supplementary Material

Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Single haplotype assembly of the human genome from a hydatidiform mole
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Single haplotype assembly of the human genome from a hydatidiform mole
Karyn Meltz Steinberg, Valerie A. Schneider, Tina A. Graves-Lindsay, Robert S. Fulton, Richa Agarwala, John Huddleston, Sergey A. Shiryev, Aleksandr Morgulis, Urvashi Surti, Wesley C. Warren, Deanna M. Church, Evan E. Eichler, Richard K. Wilson
bioRxiv 006841; doi: https://doi.org/10.1101/006841
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
Single haplotype assembly of the human genome from a hydatidiform mole
Karyn Meltz Steinberg, Valerie A. Schneider, Tina A. Graves-Lindsay, Robert S. Fulton, Richa Agarwala, John Huddleston, Sergey A. Shiryev, Aleksandr Morgulis, Urvashi Surti, Wesley C. Warren, Deanna M. Church, Evan E. Eichler, Richard K. Wilson
bioRxiv 006841; doi: https://doi.org/10.1101/006841

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Genomics
Subject Areas
All Articles
  • Animal Behavior and Cognition (3686)
  • Biochemistry (7774)
  • Bioengineering (5668)
  • Bioinformatics (21245)
  • Biophysics (10563)
  • Cancer Biology (8162)
  • Cell Biology (11915)
  • Clinical Trials (138)
  • Developmental Biology (6738)
  • Ecology (10388)
  • Epidemiology (2065)
  • Evolutionary Biology (13843)
  • Genetics (9694)
  • Genomics (13056)
  • Immunology (8123)
  • Microbiology (19956)
  • Molecular Biology (7833)
  • Neuroscience (42973)
  • Paleontology (318)
  • Pathology (1276)
  • Pharmacology and Toxicology (2256)
  • Physiology (3350)
  • Plant Biology (7208)
  • Scientific Communication and Education (1309)
  • Synthetic Biology (1999)
  • Systems Biology (5528)
  • Zoology (1126)