The Release 6 reference sequence of the Drosophila melanogaster genome

  1. Susan E. Celniker1
  1. Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA;
  2. Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia 20147, USA;
  3. Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z 4S6, Canada;
  4. Dipartimento di Biologia e Biotecnologie “Charles Darwin” and Istituto Pasteur Fondazione Cenci-Bolognetti, Sapienza Università di Roma, 00185 Roma, Italy;
  5. Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Universidad Autónoma de Madrid, 28049 Madrid, Spain;
  6. Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Novosibirsk, 630090, Russia;
  7. Departamento de Genética, Universidade Federal do Rio de Janeiro, CEP 21944-970, Rio de Janeiro, Brazil;
  8. Novosibirsk State University, Novosibirsk, 630090, Russia;
  9. Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  • Corresponding authors: RHoskins{at}lbl.gov, celniker{at}fruitfly.org
  • 10 Present address: Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

Abstract

Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.

Footnotes

  • [Supplemental material is available for this article.]

  • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.185579.114.

    Freely available online through the Genome Research Open Access option.

  • Received October 8, 2014.
  • Accepted January 13, 2015.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0.
