Abstract
Background: Metagenomic sequencing experiments require a number of preprocessing and analytical steps to interpret the microbial and genetic composition of biological samples. Such steps include quality control, adapter trimming, host decontamination, metagenomic classification, read assembly, and alignment to reference genomes.
Results: We present Sunbeam, an extensible and modular pipeline that performs these steps in a consistent and reproducible way. It features one-step installation and novel tools for eliminating artifactual sequences that may interfere with downstream analysis, including Komplexity, a software tool that removes potentially problematic low-complexity nucleotide sequences from metagenomic data. Another unique component of Sunbeam is an easy-to-use extension framework that lets users add custom processing or analysis steps directly to the workflow.
Conclusions: Sunbeam provides a foundation for building more in-depth analyses and enables comparisons between disparate sequencing experiments by standardizing routine preprocessing and analytical steps. Sunbeam is written in Python using the Snakemake workflow management software and is freely available at github.com/sunbeam-labs/sunbeam under the GPLv3.