Whole-Genome Sequence Assembly for Mammalian Genomes: Arachne 2

David B. Jaffe¹,², Jonathan Butler¹, Sante Gnerre¹, Evan Mauceli¹, Kerstin Lindblad-Toh¹, Jill P. Mesirov¹, Michael C. Zody¹, and Eric S. Lander¹,³

¹ Whitehead Institute/MIT Center for Genome Research, Cambridge, Massachusetts 02141, USA; ³ Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Abstract

We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program that allow assembly of mammalian-size genomes and also improve the assembly of smaller genomes. Three principal changes were made simultaneously and applied to the assembly of the mouse genome during a six-month period of development: (1) supercontigs (scaffolds) were iteratively broken and rejoined using several criteria, yielding a 64-fold increase in supercontig length (N50) and the apparent elimination of all global misjoins; (2) gaps between contigs in supercontigs were filled (partially or completely) by insertion of reads, as suggested by pairing within the supercontig, increasing the N50 contig length by 50%; (3) memory usage was reduced fourfold. The outcome of this mouse assembly and its analysis are described elsewhere (Mouse Genome Sequencing Consortium 2002).
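The N50 statistic cited above can be illustrated with a minimal sketch. It assumes the common definition (the largest length L such that pieces of length at least L contain at least half of all assembled bases); the exact convention used for the reported figures may differ in detail. The function name `n50` and the toy lengths are hypothetical and are not data from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or supercontig lengths.

    Assumed definition: walk the pieces from longest to shortest and
    report the length at which the running total first reaches half
    of the summed length of all pieces.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy data only: joining two pieces raises the N50 although the
# total number of bases is unchanged.
before = [2, 3, 4, 5, 6]   # five separate pieces, 20 bases in total
after = [2, 3, 4, 11]      # the 5 and 6 pieces joined into one
print(n50(before), n50(after))
```

Because N50 depends only on the distribution of lengths, joining or extending pieces can raise it sharply even when the total amount of sequence is unchanged.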

Footnotes

  • 2 Corresponding author.

  • E-MAIL jaffe{at}genome.wi.mit.edu; FAX (617) 258-9108.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.828403.

    • Received September 19, 2002.
    • Accepted October 30, 2002.