Sequencing of natural strains of Arabidopsis thaliana with short reads

  1. Stephan Ossowski¹,
  2. Korbinian Schneeberger¹,
  3. Richard M. Clark¹,²,
  4. Christa Lanz,
  5. Norman Warthmann, and
  6. Detlef Weigel³

Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany

  • ¹ These authors contributed equally to this work.

Abstract

Whole-genome hybridization studies have suggested that the nuclear genomes of accessions (natural strains) of Arabidopsis thaliana can differ by several percent of their sequence. To examine this variation, and as a first step in the 1001 Genomes Project for this species, we produced 15- to 25-fold coverage in Illumina sequencing-by-synthesis (SBS) reads for the reference accession, Col-0, and two divergent strains, Bur-0 and Tsu-1. We aligned reads to the reference genome sequence to assess data quality metrics and to detect polymorphisms. Alignments revealed 823,325 unique single nucleotide polymorphisms (SNPs) and 79,961 unique 1- to 3-bp indels in the divergent accessions at a specificity of >99%, and over 2000 potential errors in the reference genome sequence. We also identified >3.4 Mb of the Bur-0 and Tsu-1 genomes as being either extremely dissimilar, deleted, or duplicated relative to the reference genome. To obtain sequences for these regions, we incorporated the Velvet assembler into a targeted de novo assembly method. This approach yielded 10,921 high-confidence contigs that were anchored to flanking sequences and harbored indels as large as 641 bp. Our methods are broadly applicable for polymorphism discovery in moderate to large genomes even at highly diverged loci, and we established by subsampling the Illumina SBS coverage depth required to inform a broad range of functional and evolutionary studies. Our pipeline for aligning reads and predicting SNPs and indels, SHORE, is available for download at http://1001genomes.org.
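The fold-coverage figures and the subsampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that fold coverage is the total number of sequenced bases divided by the genome size, and that subsampling keeps each read independently with probability target/current. The function names `fold_coverage` and `subsample_reads` are hypothetical and are not part of SHORE, and the abstract does not state how the authors drew their subsamples.

```python
import random


def fold_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def subsample_reads(reads, current_fold, target_fold, seed=0):
    """Randomly thin a read set so its expected coverage drops to target_fold.

    Assumption: each read is kept independently with probability
    target_fold / current_fold (simple Bernoulli subsampling).
    """
    if not 0 < target_fold <= current_fold:
        raise ValueError("target_fold must be positive and <= current_fold")
    keep_fraction = target_fold / current_fold
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [read for read in reads if rng.random() < keep_fraction]
```

Running polymorphism prediction on subsamples at several target depths, then comparing the calls against those from the full data set, is one way to estimate how much coverage a given analysis needs.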

Footnotes

  • 2 Present address: Department of Biology, University of Utah, Salt Lake City, UT 84112, USA.

  • 3 Corresponding author. E-mail weigel{at}weigelworld.org; fax 49-7071-601-1412.

  • [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA001168 and GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html) under accession nos. FI160450–FI160637. Polymorphism and reference base predictions, contigs from targeted de novo assembly, and a version of the reference sequence masked for oversampled regions are available at http://1001genomes.org; polymorphism predictions are also available at TAIR (http://www.arabidopsis.org).]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.080200.108.

    • Received April 29, 2008.
    • Accepted September 18, 2008.
  • Freely available online through the Genome Research Open Access option.
