PT - JOURNAL ARTICLE
AU - Daehwan Kim
AU - Li Song
AU - Florian P. Breitwieser
AU - Steven L. Salzberg
TI - Centrifuge: rapid and sensitive classification of metagenomic sequences
AID - 10.1101/054965
DP - 2016 Jan 01
TA - bioRxiv
PG - 054965
4099 - http://biorxiv.org/content/early/2016/05/24/054965.short
4100 - http://biorxiv.org/content/early/2016/05/24/054965.full
AB - Centrifuge is a novel microbial classification engine that enables rapid, accurate and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4,078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI non-redundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer based indexing schemes, which require far more extensive space. Centrifuge is available as free, open-source software from www.ccb.jhu.edu/software/centrifuge
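
The abstract names the Burrows-Wheeler transform and the FM index as the basis of the indexing scheme. For readers unfamiliar with these structures, the following is a minimal textbook sketch of exact-match lookup by backward search over a toy FM index. It is not Centrifuge's implementation: the function names `build_fm_index` and `backward_search` are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored uncompressed, so none of the space optimizations described in the abstract are reflected here.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM index (suffix array, C table, occurrence table)."""
    text += "$"  # unique sentinel; sorts before A, C, G, T in ASCII
    # Naive suffix array construction; acceptable only for tiny inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (index -1 wraps to "$").
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of suffix array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []  # character never occurs in the indexed text
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # interval is empty: no match
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, c_table, occ = build_fm_index("ACGTACGT")
    print(backward_search("CGT", sa, c_table, occ))
```

Each step of the search narrows an interval of suffix array rows using one character of the query, so lookup time depends on the query length rather than on the size of the indexed text. Practical FM indexes sample the occurrence table and the suffix array instead of storing them in full, which is one general route to the compact index sizes that BWT-based schemes can reach.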