Centrifuge: rapid and sensitive classification of metagenomic sequences

Steven L. Salzberg (1,2,3)

  1. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA;
  2. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  3. Departments of Biomedical Engineering and Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205, USA

Corresponding author: infphilo{at}gmail.com

(4) These authors contributed equally to this work.

Abstract

Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space.
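For readers unfamiliar with the data structures named above, the following is a minimal sketch of exact-match lookup with a BWT and an FM index. It is a toy illustration only, not Centrifuge's implementation: it builds the suffix array by naive sorting and stores full occurrence tables, so it has none of the space optimizations the abstract describes. All function names (`bwt_via_suffix_array`, `build_fm_index`, `backward_search`, `find_occurrences`) are hypothetical and chosen for this example.

```python
from collections import Counter


def bwt_via_suffix_array(text):
    """Return (suffix_array, bwt) for text + '$' using a naive sort.

    Assumption: '$' does not occur in text and sorts before every other
    character (true for the ASCII letters A, C, G, T).
    """
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for suffix i is the character just before it;
    # for i == 0 the index -1 wraps around to the terminator '$'.
    bwt = "".join(text[i - 1] for i in sa)
    return sa, bwt


def build_fm_index(bwt):
    """Build the two FM-index tables from a BWT string.

    c_table[ch] = number of characters in the text smaller than ch
    occ[ch][i]  = number of occurrences of ch in bwt[:i]
    """
    counts = Counter(bwt)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for key in occ:
            occ[key][i + 1] = occ[key][i] + (1 if key == ch else 0)
    return c_table, occ


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval [lo, hi) for pattern."""
    lo, hi = 0, n
    # Extend the match one character at a time, right to left.
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_occurrences(pattern, text):
    """Return the sorted start positions of pattern in text."""
    sa, bwt = bwt_via_suffix_array(text)
    c_table, occ = build_fm_index(bwt)
    lo, hi = backward_search(pattern, c_table, occ, len(bwt))
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    print(find_occurrences("ACG", "ACGTACGTTACG"))
```

Each step of `backward_search` narrows the suffix-array interval using only the two lookup tables, so the cost of a query depends on the pattern length rather than on the size of the indexed text. Production FM indexes typically sample the occurrence table and the suffix array instead of storing them in full, trading a little lookup time for a much smaller index.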

Footnotes

  • [Supplemental material is available for this article.]

  • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.210641.116.

  • Freely available online through the Genome Research Open Access option.

  • Received May 28, 2016.
  • Accepted October 13, 2016.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
