PT - JOURNAL ARTICLE
AU - Kimberly E. Roche
AU - Sayan Mukherjee
TI - The accuracy of absolute differential abundance analysis from relative count data
AID - 10.1101/2021.12.06.471397
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.12.06.471397
4099 - http://biorxiv.org/content/early/2021/12/07/2021.12.06.471397.short
4100 - http://biorxiv.org/content/early/2021/12/07/2021.12.06.471397.full
AB - Concerns have been raised about the use of relative abundance data derived from next-generation sequencing as a proxy for absolute abundances. In the differential abundance setting, compositional effects are hypothesized to contribute to increased rates of spurious differences (false positives). In practice, however, total abundance can be partially reconstructed by renormalizing observed per-sample abundance. Given the renormalized data, differential abundance can be called not on the relative counts themselves but on estimates of absolute counts. We use simulated data to explore the consistency of differential abundance calls made on these adjusted relative abundances and find that, while overall rates of false positive calls are low, substantial error is possible. Conditions consistent with microbial community profiling are the most at risk of error induced by compositional effects. Increasing complexity of composition (i.e., increasing feature number) is generally protective against this effect. In real data sets drawn from 16S metabarcoding, expression array, bulk RNA-seq, and single-cell RNA-seq experiments, results are similar: though median accuracy is high, microbial community profiling and single-cell transcriptomic data sets can have poor outcomes. However, we show that problematic data sets can often be identified by summary characteristics of their relative abundances alone, giving researchers a means of anticipating problems and adjusting analysis strategies where appropriate. Competing Interest Statement: The authors have declared no competing interest.