A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples

Samia N. Naccache; Scot Federman; Narayanan Veeraraghavan; Matei Zaharia; Deanna Lee; Erik Samayoa; Jerome Bouquet; Alexander L. Greninger; Ka-Cheung Luk; Barryett Enge; Debra A. Wadford; Sharon L. Messenger; Gillian L. Genrich; Kristen Pellegrino; Gilda Grard; Eric Leroy; Bradley S. Schneider; Joseph N. Fair; Miguel A. Martínez; Pavel Isa; John A. Crump; Joseph L. DeRisi; Taylor Sittler; John Hackett; Steve Miller; Charles Y. Chiu

doi:10.1101/gr.171934.113

¹Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA;
²UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA;
³Department of Computer Science, University of California, Berkeley, California 94720, USA;
⁴Department of Biochemistry, UCSF, San Francisco, California 94107, USA;
⁵Abbott Diagnostics, Abbott Park, Illinois 60064, USA;
⁶Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA;
⁷Department of Family and Community Medicine, UCSF, San Francisco, California 94143, USA;
⁸Viral Emergent Diseases Unit, Centre International de Recherches Médicales de Franceville, Franceville, BP 769, Gabon;
⁹Metabiota, Inc., San Francisco, California 94104, USA;
¹⁰Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, 62260, Mexico;
¹¹Division of Infectious Diseases and International Health and the Duke Global Health Institute, Duke University Medical Center, Durham, North Carolina 27708, USA;
¹²Kilimanjaro Christian Medical Centre, Moshi, Kilimanjaro, 7393, Tanzania;
¹³Centre for International Health, University of Otago, Dunedin, 9054, New Zealand;
¹⁴Department of Medicine, Division of Infectious Diseases, UCSF, San Francisco, California 94143, USA

Abstract

Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI (“sequence-based ultrarapid pathogen identification”), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7–500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
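As a reading aid, the two analysis modes summarized above can be laid out as ordered stage lists. The following is a minimal sketch based only on the abstract, not SURPI's actual code: the names `Mode` and `planned_stages` are hypothetical, and pairing SNAP with the nucleotide alignment step and RAPSearch with the protein homology step is an assumption drawn from what those aligners are generally used for.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    """Hypothetical labels for the two modes named in the abstract."""
    FAST = "fast"
    COMPREHENSIVE = "comprehensive"


def planned_stages(mode: Mode) -> List[str]:
    """Return illustrative stage descriptions for a mode.

    These strings are descriptive labels only; they are not SURPI
    commands, flags, or function names.
    """
    if mode is Mode.FAST:
        # Fast mode: detection limited to viruses and bacteria.
        return ["nucleotide alignment (SNAP, assumed): viruses and bacteria"]
    # Comprehensive mode: all known microorganisms, followed by de novo
    # assembly and protein homology searches for divergent viruses.
    return [
        "nucleotide alignment (SNAP, assumed): all known microorganisms",
        "de novo assembly",
        "protein homology search (RAPSearch, assumed): divergent viruses",
    ]


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value)
        for step, stage in enumerate(planned_stages(mode), start=1):
            print(f"  {step}. {stage}")
```

The sketch makes the trade-off in the abstract explicit: fast mode stops after a single restricted alignment stage, whereas comprehensive mode adds assembly and protein-level searching, which is consistent with its longer reported run times.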

Footnotes

¹⁵Corresponding author

E-mail charles.chiu@ucsf.edu
[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.171934.113.

Freely available online through the Genome Research Open Access option.

Received December 31, 2013.
Accepted March 26, 2014.

This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.