A hybrid approach for de novo human genome sequence assembly and phasing

Abstract

Despite tremendous progress in genome sequencing, the basic goal of producing a phased (haplotype-resolved) genome sequence with end-to-end contiguity for each chromosome at reasonable cost and effort is still unrealized. In this study, we describe an approach to performing de novo genome assembly and experimental phasing by integrating the data from Illumina short-read sequencing, 10X Genomics linked-read sequencing, and BioNano Genomics genome mapping to yield a high-quality, phased, de novo assembled human genome.

Figure 1
Figure 2: Schematic of the relative sizes of scaffolds produced during the assembly process.
Figure 3: Alignment and phasing of the hybrid assembly.

Accession codes

Primary accessions

BioProject

Sequence Read Archive

Acknowledgements

This work was supported in part by R01 HG005946 (P.-Y.K.). The DNA sample was obtained from the Coriell Institute for Medical Research, and the Illumina sequence data were obtained from the US National Institute of Standards and Technology (NIST). We thank the expert sequencing staff at the Institute for Human Genetics at UCSF for generating some of the sequencing data.

Author information

Contributions

P.-Y.K., J.D.W., and Y.M. conceived the project and provided resources and oversight for sequencing and algorithmic analysis. K.G. prepared long libraries for 10XG GemCode sequencing. C.C. and C.L. performed long DNA preparation and BNG genome mapping experiments. E.T.L., A.R.H., Ž.D., J.Lee, and H.C. built initial genome maps and performed BNG alignment and structural variant calling. Y.M. and J.Lam performed scaffold analysis. E.T.L., A.R.H., and J.Lee performed hybrid genome assembly. P.M., K.G., and M.S.-L. performed scaffold phasing. Y.M., M.L.-S., E.T.L., J.Lam, J.Lee, and S.A.S. performed validation and quality measure analyses of the assembled data. Y.M., E.T.L., M.L.-S., and P.-Y.K. primarily wrote the manuscript and revisions, though many coauthors provided edits and Online Methods sections.

Corresponding author

Correspondence to Pui-Yan Kwok.

Ethics declarations

Competing interests

E.T.L., A.R.H., J.Lee, Ž.D., and H.C. are employees of BioNano Genomics. P.M., K.G., and M.S.-L. are employees of 10X Genomics, and P.-Y.K. is on the scientific advisory board of BioNano Genomics.

Integrated supplementary information

Supplementary Figure 1 Ideograms showing scaffold boundaries and segmental duplication locations.

Blue lines mark the boundaries of assembly scaffolds. Black marks show the locations of segmental duplications. Magenta regions mark unassembled regions around the centromeres and telomeres. Ideograms were generated using The Genome Decoration Page, NCBI.

Supplementary Figure 2 Architecture of complex regions at the MHC and Amylase loci.

(a) MHC region (chr6: 28-32 Mb). Upper panel: green bar = reference, blue bars = hybrid assembly (bottom). Bottom panel: green phase blocks separated by SNVs in the hybrid assembly in the middle. (b) Amylase region (chr1: 160-163 Mb). Top panel: green bar = reference, blue bars = assembly. Assembly in the red box expanded to show nicking pattern in 450 kb region (bottom panel). (c) Haplotypes in increasing resolution to show alleles on the same phase block (green line = allele 1, grey line = allele 2).

Supplementary Figure 3 Assembly and phasing of an inversion.

Inversion at chr10q: 46-47.5 Mb with partial haplotyping.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–3 (PDF 786 kb)

Supplementary Data 1

NA12878 BNG assembly (TXT 16137 kb)

Supplementary Data 2

NA12878 hybrid assembly SV calls (TXT 278 kb)

Cite this article

Mostovoy, Y., Levy-Sakin, M., Lam, J. et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods 13, 587–590 (2016). https://doi.org/10.1038/nmeth.3865

  • DOI: https://doi.org/10.1038/nmeth.3865
