RT Journal Article
SR Electronic
T1 LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.10.03.462964
DO 10.1101/2021.10.03.462964
A1 Yingtian Hu
A1 Glen A. Satten
A1 Yi-Juan Hu
YR 2021
UL http://biorxiv.org/content/early/2021/10/04/2021.10.03.462964.abstract
AB Motivation: Compositional analysis is based on the premise that a relatively small proportion of taxa are “differentially abundant”, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods of compositional analysis such as ANCOM or ANCOM-BC use log-transformed data, but log-transformation of data with pervasive zero counts is problematic, and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data such as 16S amplicon or metagenomic sequencing are subject to experimental biases that are introduced in every step of the experimental workflow. McLaren, Willis and Callahan [1] have recently proposed a model for how these biases affect relative abundance data. Methods: Motivated by [1], we show that the (log) odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. With this motivation, we propose LOCOM, a robust logistic regression approach to compositional analysis, that does not require pseudocounts. We use a Firth bias-corrected estimating function to account for sparse data. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for continuous and/or discrete confounding covariates is supported. Results: Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods.
In contrast, ANCOM often had inflated FDR; ANCOM-BC largely controlled FDR but still had modest inflation occasionally; ALDEx2 generally had low sensitivity. LOCOM and ANCOM were robust to experimental biases in every situation, while ANCOM-BC and ALDEx2 had elevated FDR when biases at causal and non-causal taxa were differentially distributed. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Availability and implementation: Our R package LOCOM is available on GitHub at https://github.com/yijuanhu/LOCOM in formats appropriate for Macintosh or Windows. Competing Interest Statement: The authors have declared no competing interest.