PT - JOURNAL ARTICLE
AU - Sean M. Gibbons
AU - Claire Duvallet
AU - Eric J. Alm
TI - Correcting for batch effects in case-control microbiome studies
AID - 10.1101/165910
DP - 2017 Jan 01
TA - bioRxiv
PG - 165910
4099 - http://biorxiv.org/content/early/2017/07/24/165910.short
4100 - http://biorxiv.org/content/early/2017/07/24/165910.full
AB - High-throughput data generation platforms, such as mass spectrometry, microarrays, and second-generation sequencing, are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch-correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare multiple batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to ComBat, a widely used batch-correction model developed for RNA microarray data, and to traditional meta-analysis methods for combining independent p-values. Overall, we show that percentile-normalization is a simple, model-free approach for removing batch effects and improving sensitivity in case-control meta-analyses. Author Summary: Batch effects present a significant obstacle to comparing results across independent studies. Traditional meta-analysis techniques for combining p-values from independent studies, like Fisher’s method, are effective, but statistically conservative. If batch effects can be corrected, then statistical tests can be performed on data pooled across studies, increasing sensitivity to detect differences between treatment groups. Here, we show how a simple, model-free approach corrects for batch effects in case-control datasets.
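
The percentile-normalization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the data layout (each sample as a dict mapping taxon to relative abundance), the tie-handling rule (mean rank), the treatment of missing taxa as zero abundance, and the toy numbers are all assumptions made for illustration. The sketch also scores control samples against their own study's control distribution so that both groups share a scale before pooling, which is an assumption beyond what the abstract states.

```python
from bisect import bisect_left, bisect_right


def percentile_of_score(reference, value):
    """Percentile (0-100) of `value` within `reference`.

    Ties are handled with a mean-rank rule (an assumption): the result is the
    average of the percent strictly below and the percent at or below `value`.
    """
    ref = sorted(reference)
    below = bisect_left(ref, value)         # count of reference values < value
    at_or_below = bisect_right(ref, value)  # count of reference values <= value
    return 100.0 * (below + at_or_below) / (2 * len(ref))


def percentile_normalize(study):
    """Convert every sample in one study to percentiles of that study's controls.

    `study` is a hypothetical layout: {"control": [sample, ...], "case": [...]},
    where each sample is a dict {taxon: relative_abundance}. A taxon missing
    from a sample is treated as abundance 0.0 (an assumption).
    """
    taxa = sorted({t for samples in study.values() for s in samples for t in s})
    # Per-taxon reference distribution, built from control samples only.
    control_ref = {t: [s.get(t, 0.0) for s in study["control"]] for t in taxa}
    return {
        group: [
            {t: percentile_of_score(control_ref[t], s.get(t, 0.0)) for t in taxa}
            for s in samples
        ]
        for group, samples in study.items()
    }


def pool_studies(studies):
    """Normalize each study separately, then concatenate samples by group."""
    pooled = {"control": [], "case": []}
    for study in studies:
        normalized = percentile_normalize(study)
        for group in pooled:
            pooled[group].extend(normalized[group])
    return pooled


# Toy data: study_b measures the same taxon on a shifted scale (a batch effect).
study_a = {"control": [{"taxon_x": 0.1}, {"taxon_x": 0.2}], "case": [{"taxon_x": 0.3}]}
study_b = {"control": [{"taxon_x": 1.0}, {"taxon_x": 2.0}], "case": [{"taxon_x": 3.0}]}
pooled = pool_studies([study_a, study_b])
```

Because each study is normalized against its own controls, the scale difference between `study_a` and `study_b` disappears in `pooled`, and a single statistical test can then be run on the pooled percentile values instead of combining per-study p-values.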