Abstract
Despite tremendous progress in genome sequencing, the basic goal of producing a phased (haplotype-resolved) genome sequence with end-to-end contiguity for each chromosome at reasonable cost and effort is still unrealized. In this study, we describe an approach to performing de novo genome assembly and experimental phasing by integrating the data from Illumina short-read sequencing, 10X Genomics linked-read sequencing, and BioNano Genomics genome mapping to yield a high-quality, phased, de novo assembled human genome.
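A toy sketch can make the idea of linked-read phasing concrete. It assumes a simplified input: each linked-read molecule (one barcode) reports the allele it carries, 0 for reference and 1 for alternate, at every heterozygous SNV it overlaps. The function name `phase_blocks` and this input format are hypothetical. This is a greedy illustration of the general idea, not the 10X Genomics software or the authors' pipeline. SNVs connected through shared molecules form one phase block. Within a block, each SNV receives a label naming the haplotype (0 or 1) that carries its alternate allele, relative to the block's first SNV. A block ends wherever no molecule spans two neighbouring SNVs.

```python
from collections import defaultdict, deque
from itertools import combinations


def phase_blocks(molecules):
    """Toy linked-read phaser (hypothetical; not a production algorithm).

    molecules: list of dicts {snv_position: allele}, allele 0 = ref, 1 = alt,
    one dict per linked-read molecule (barcode).
    Returns a list of phase blocks, each a dict {snv_position: haplotype}
    giving the haplotype (0 or 1) that carries the alt allele, relative to
    the block's first SNV.
    """
    # Vote on the relative phase of every SNV pair seen on a shared molecule:
    # +1 if the molecule carries the same allele at both SNVs, -1 otherwise.
    votes = defaultdict(int)
    for mol in molecules:
        for (p, a), (q, b) in combinations(sorted(mol.items()), 2):
            votes[(p, q)] += 1 if a == b else -1

    # Keep only pairs with a clear majority; ties carry no phase information.
    graph = defaultdict(list)
    for (p, q), v in votes.items():
        if v != 0:
            same = v > 0
            graph[p].append((q, same))
            graph[q].append((p, same))

    snvs = sorted({p for mol in molecules for p in mol})
    label, blocks = {}, []
    for start in snvs:
        if start in label:
            continue
        # Breadth-first traversal: each connected component is one phase block.
        # The first SNV of a block is arbitrarily assigned haplotype 0.
        label[start] = 0
        block, queue = [], deque([start])
        while queue:
            p = queue.popleft()
            block.append(p)
            for q, same in graph[p]:
                if q not in label:
                    label[q] = label[p] if same else 1 - label[p]
                    queue.append(q)
        blocks.append({p: label[p] for p in sorted(block)})
    return blocks


if __name__ == "__main__":
    reads = [
        {100: 1, 250: 1},    # alt alleles at 100 and 250 on one molecule
        {250: 1, 400: 0},    # alt at 250 travels with ref at 400
        {100: 0, 250: 0},    # the other haplotype, consistent with the first
        {9000: 1, 9100: 0},  # no molecule bridges 400 and 9000: new block
    ]
    print(phase_blocks(reads))
    # [{100: 0, 250: 0, 400: 1}, {9000: 0, 9100: 1}]
```

Because the traversal keeps the first relation it reaches for each SNV, conflicting votes elsewhere in a block are silently ignored. A practical phaser would instead weigh all molecules against one another, for example by maximizing agreement across the whole block.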
Acknowledgements
This work was supported in part by R01 HG005946 (P.-Y.K.). The DNA sample was obtained from the Coriell Institute for Medical Research, and the Illumina sequence data were obtained from the US National Institute of Standards and Technology (NIST). We thank the expert sequencing staff at the Institute for Human Genetics at UCSF for generating some of the sequencing data.
Author information
Contributions
P.-Y.K., J.D.W., and Y.M. conceived the project and provided resources and oversight for sequencing and algorithmic analysis. K.G. prepared long libraries for 10XG GemCode sequencing. C.C. and C.L. performed long DNA preparation and BNG genome mapping experiments. E.T.L., A.R.H., Ž.D., J.Lee, and H.C. built initial genome maps and performed BNG alignment and structural variant calling. Y.M. and J.Lam performed scaffold analysis. E.T.L., A.R.H., and J.Lee performed hybrid genome assembly. P.M., K.G., and M.S.-L. performed scaffold phasing. Y.M., M.L.-S., E.T.L., J.Lam, J.Lee, and S.A.S. performed validation and quality measure analyses of the assembled data. Y.M., E.T.L., M.L.-S., and P.-Y.K. primarily wrote the manuscript and revisions; many coauthors provided edits and contributed Online Methods sections.
Ethics declarations
Competing interests
E.T.L., A.R.H., J.Lee, Ž.D., and H.C. are employees of BioNano Genomics. P.M., K.G., and M.S.-L. are employees of 10X Genomics, and P.-Y.K. is on the scientific advisory board of BioNano Genomics.
Integrated supplementary information
Supplementary Figure 1 Ideograms showing scaffold boundaries and segmental duplication locations.
Blue lines mark the boundaries of assembly scaffolds. Black marks show the locations of segmental duplications. Magenta regions mark unassembled regions around the centromeres and telomeres. Ideograms were generated using The Genome Decoration Page, NCBI.
Supplementary Figure 2 Architecture of complex regions at the MHC and Amylase loci.
(a) MHC region (chr6: 28–32 Mb). Upper panel: green bar = reference; blue bars = hybrid assembly (bottom). Bottom panel: green phase blocks, separated by SNVs, in the hybrid assembly (middle). (b) Amylase region (chr1: 160–163 Mb). Top panel: green bar = reference; blue bars = assembly. The assembly in the red box is expanded to show the nicking pattern across a 450-kb region (bottom panel). (c) Haplotypes shown at increasing resolution to display alleles on the same phase block (green line = allele 1, grey line = allele 2).
Supplementary Figure 3 Assembly and phasing of an inversion.
Inversion at chr10q (46–47.5 Mb) with partial haplotyping.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 and Supplementary Tables 1–3 (PDF 786 kb)
Supplementary Data 1
NA12878 BNG assembly (TXT 16137 kb)
Supplementary Data 2
NA12878 hybrid assembly SV calls (TXT 278 kb)
About this article
Cite this article
Mostovoy, Y., Levy-Sakin, M., Lam, J. et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods 13, 587–590 (2016). https://doi.org/10.1038/nmeth.3865