De novo assembly of human genomes with massively parallel short read sequencing
- Ruiqiang Li1,2,3,
- Hongmei Zhu1,3,
- Jue Ruan1,3,
- Wubin Qian1,
- Xiaodong Fang1,
- Zhongbin Shi1,
- Yingrui Li1,
- Shengting Li1,
- Gao Shan1,
- Karsten Kristiansen1,2,
- Songgang Li1,
- Huanming Yang1,
- Jian Wang1 and
- Jun Wang1,2,4
- 1 Beijing Genomics Institute at Shenzhen, Shenzhen 518083, China;
- 2 Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark
3 These authors contributed equally to this work.
Abstract
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads they produce are very short, which makes de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
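For readers unfamiliar with the N50 statistic quoted above, the following minimal Python sketch shows one common way to compute it: the length of the contig (or scaffold) at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions, not part of SOAPdenovo or of the reported results.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: the largest length L such that sequences of
    length >= L together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, lengths in bases (not data from the paper).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

Because N50 is weighted by length, a few long scaffolds can raise it substantially even when many short sequences remain in the assembly.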
Footnotes
4 Corresponding author.
E-mail wangj@genomics.org.cn; fax 86-755-25274247.
[Supplemental material is available online at http://www.genome.org. SOAPdenovo is freely available at http://soap.genomics.org.cn/soapdenovo.html. The genome assembly results for the Asian and African individuals have been submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) under accession nos. ADDF000000000 and DAAB000000000, respectively. The versions described in this study are the first versions, ADDF010000000 and DAAB010000000. The assembly and analysis results are also available at http://yh.genomics.org.cn.]
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.097261.109.
- Received June 14, 2009.
- Accepted October 19, 2009.
- Copyright © 2010 by Cold Spring Harbor Laboratory Press