PT - JOURNAL ARTICLE
AU - M. Senthil Kumar
AU - Eric V. Slud
AU - Kwame Okrah
AU - Stephanie C. Hicks
AU - Sridhar Hannenhalli
AU - Héctor Corrada Bravo
TI - Analysis and correction of compositional bias in sparse sequencing count data
AID - 10.1101/142851
DP - 2017 Jan 01
TA - bioRxiv
PG - 142851
4099 - http://biorxiv.org/content/early/2017/05/26/142851.short
4100 - http://biorxiv.org/content/early/2017/05/26/142851.full
AB - Count data derived from high-throughput DNA sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.