Feature matrix normalization, transformation and calculation of β-diversity in metagenomics: Theoretical and applied perspectives on your decisions

Casper Sahl Poulsen; Frank Møller Aarestrup; Christian Brinch; Claus Thorn Ekstrøm

doi:10.1101/859157

Abstract

Microbial metagenomics using next-generation sequencing is a powerful experimental approach that enables detailed and potentially complete descriptions of the microbial world around and within us. Choosing how to normalize and transform feature data and how to calculate β-diversity is a critical step in the analysis of metagenomic data, and one for which a multitude of methods are available. Researchers need a broad overview of the methods that exist in the field and an understanding of the consequences of applying them. In this perspectives article, some of the most widely used metagenomic feature data normalizations, transformations and β-diversity metrics are discussed in the context of multivariate visualizations. We provide a framework that other researchers can use to evaluate how robust their test data are to different normalizations, transformations and β-diversity metrics, and to compare the results of the methods visually. We constructed an in silico test dataset to evaluate the setup and to clarify how the theoretical discussion transfers to these data. We urge other researchers to implement their own test data, normalization, transformation, β-diversity metric and visualization methods, in the hope that this will improve decision making in both study design and analysis strategy.
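
To make the kind of comparison described above concrete, the following is a minimal Python sketch, not the authors' framework (which is available at the repository listed under Footnotes). It assumes a samples-by-features count matrix and compares Bray-Curtis dissimilarity on raw counts, Bray-Curtis after total-sum scaling, and Euclidean distance after a centered log-ratio (CLR) transform. The function names, the pseudocount of 1 and the toy counts are all hypothetical choices made for illustration.

```python
import numpy as np

def total_sum_scale(counts):
    """Normalize each sample (row) to relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform per sample.

    The pseudocount is an assumption used here to avoid log(0).
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def bray_curtis(x):
    """Pairwise Bray-Curtis dissimilarity between rows of a non-negative matrix."""
    x = np.asarray(x, dtype=float)
    diff = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    total = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return diff / total

def euclidean(x):
    """Pairwise Euclidean distance between rows."""
    x = np.asarray(x, dtype=float)
    d = x[:, None, :] - x[None, :, :]
    return np.sqrt((d ** 2).sum(axis=2))

# Toy data (invented): 3 samples x 4 features.
# Sample 1 has the same composition as sample 0 at 10x the sequencing depth;
# sample 2 has the reversed composition at the same depth as sample 0.
counts = np.array([
    [10, 20, 30, 40],
    [100, 200, 300, 400],
    [40, 30, 20, 10],
])

bc_raw = bray_curtis(counts)                   # sensitive to sequencing depth
bc_tss = bray_curtis(total_sum_scale(counts))  # depth removed by normalization
aitchison = euclidean(clr(counts))             # Euclidean distance on CLR values

print("Bray-Curtis, raw counts:\n", bc_raw.round(3))
print("Bray-Curtis, relative abundance:\n", bc_tss.round(3))
print("Euclidean on CLR:\n", aitchison.round(3))
```

In this toy case, samples 0 and 1 differ only in depth, so their Bray-Curtis dissimilarity is large on raw counts and zero after total-sum scaling. This illustrates why the choice of processing steps can change which samples appear similar in a downstream ordination.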

Footnotes

https://github.com/csapou/DataProcessinginMetagenomics

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.