PT - JOURNAL ARTICLE
AU - Michelle Badri
AU - Zachary D. Kurtz
AU - Richard Bonneau
AU - Christian L. Müller
TI - Shrinkage improves estimation of microbial associations under different normalization methods
AID - 10.1101/406264
DP - 2020 Jan 01
TA - bioRxiv
PG - 406264
4099 - http://biorxiv.org/content/early/2020/04/04/406264.short
4100 - http://biorxiv.org/content/early/2020/04/04/406264.full
AB - Consistent estimation of associations in microbial genomic survey count data is fundamental to microbiome research. Technical limitations, including compositionality, low sample sizes, and technical variability, obstruct standard application of association measures and require data normalization prior to estimating associations. Here, we investigate the interplay between data normalization and microbial association estimation by a comprehensive analysis of statistical consistency. Leveraging the large sample size of the American Gut Project (AGP), we assess the consistency of the two prominent linear association estimators, correlation and proportionality, under different sample scenarios and data normalization schemes, including RNA-seq analysis work flows and log-ratio transformations. We show that shrinkage estimation, a standard technique in high-dimensional statistics, can universally improve the quality of association estimates for microbiome data. We find that large-scale association patterns in the AGP data can be grouped into five normalization-dependent classes. Using microbial association network construction and clustering as examples of exploratory data analysis, we show that variance-stabilizing and log-ratio approaches provide for the most consistent estimation of taxonomic and structural coherence. Taken together, the findings from our reproducible analysis workflow have important implications for microbiome studies in multiple stages of analysis, particularly when only small sample sizes are available.