PT - JOURNAL ARTICLE
AU - Nicole M. Davis
AU - Diana Proctor
AU - Susan P. Holmes
AU - David A. Relman
AU - Benjamin J. Callahan
TI - Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data
AID - 10.1101/221499
DP - 2017 Jan 01
TA - bioRxiv
PG - 221499
4099 - http://biorxiv.org/content/early/2017/11/17/221499.short
4100 - http://biorxiv.org/content/early/2017/11/17/221499.full
AB - The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants, DNA sequences not truly present in the sample. Contaminants come from a variety of sources, including reagents. Appropriate laboratory practices can reduce contamination in MGS data, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package which implements a statistical classification procedure for identifying contaminants in MGS data. Contaminants are identified on the basis of two widely reproduced signatures: contaminants are more frequent in low-concentration samples, and are often found in negative controls. In a dataset from the human oral microbiome, the classification of amplicon sequence variants by decontam was strongly consistent with prior microscopic observations of microbial taxa in that environment. In both metagenomics and marker-gene measurements of a mock community dilution series, the removal of contaminants identified by decontam substantially reduced technical variation due to differences in reagents and sequencing centers. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome, and that some low-frequency taxa seemingly associated with preterm birth were run-specific contaminants.
decontam integrates easily with existing MGS workflows, and allows researchers to generate more accurate profiles of microbial community composition at little to no additional cost.
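
The two contaminant signatures named in the abstract can be illustrated with a minimal Python sketch. This is not the decontam API or its exact procedure: the function names `frequency_score` and `prevalence_pvalue` are hypothetical, and the specific statistics (a ratio of residual sums of squares on the log scale, and a one-sided Fisher exact test) are assumptions chosen to make each signature concrete.

```python
import math


def frequency_score(freqs, concs):
    """Illustrative score for the frequency signature (hypothetical helper).

    Compares two models on the log scale:
      contaminant model:      log(freq) = a - log(conc)  (frequency ~ 1/concentration)
      non-contaminant model:  log(freq) = b              (frequency independent of concentration)
    Returns rss_contaminant / (rss_contaminant + rss_non_contaminant), so values
    near 0 favour the contaminant model and values near 1 favour the
    non-contaminant model.
    """
    pairs = [(math.log(f), math.log(c)) for f, c in zip(freqs, concs) if f > 0 and c > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two samples with positive frequency and concentration")

    # Contaminant model: (log f + log c) should be constant across samples.
    shifted = [lf + lc for lf, lc in pairs]
    a = sum(shifted) / len(shifted)
    rss_contam = sum((s - a) ** 2 for s in shifted)

    # Non-contaminant model: log f should be constant across samples.
    log_freqs = [lf for lf, _ in pairs]
    b = sum(log_freqs) / len(log_freqs)
    rss_true = sum((lf - b) ** 2 for lf in log_freqs)

    total = rss_contam + rss_true
    if total == 0:
        return 0.5  # both models fit perfectly; the data cannot distinguish them
    return rss_contam / total


def prevalence_pvalue(present_controls, n_controls, present_samples, n_samples):
    """Illustrative one-sided Fisher exact p-value for the prevalence signature.

    Small values indicate that the sequence is present in negative controls
    more often than expected if presence were unrelated to sample type.
    """
    total_present = present_controls + present_samples
    n_total = n_controls + n_samples
    denom = math.comb(n_total, total_present)
    upper = min(n_controls, total_present)
    numer = sum(
        math.comb(n_controls, k) * math.comb(n_samples, total_present - k)
        for k in range(present_controls, upper + 1)
    )
    return numer / denom


# Made-up example data: a sequence whose frequency falls tenfold with each
# tenfold increase in DNA concentration, and one whose frequency stays constant.
concentrations = [1.0, 10.0, 100.0, 1000.0]
contaminant_like = frequency_score([0.1, 0.01, 0.001, 0.0001], concentrations)
sample_like = frequency_score([0.2, 0.2, 0.2, 0.2], concentrations)
```

In this sketch a sequence would be flagged when `frequency_score` is close to 0 or `prevalence_pvalue` is small; the threshold for either is a separate choice that the abstract does not specify.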