RT Journal Article
SR Electronic
T1 Compositional data analysis of microbiome and any-omics datasets: a revalidation of the additive logratio transformation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.05.15.444300
DO 10.1101/2021.05.15.444300
A1 Michael Greenacre
A1 Marina Martínez-Álvaro
A1 Agustín Blasco
YR 2021
UL http://biorxiv.org/content/early/2021/05/17/2021.05.15.444300.abstract
AB Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional, since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratio transformation, which takes logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, in the sense that it does not exactly reproduce the geometry of all pairwise logratios, but the departure from isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and high-dimensional data offer many candidate references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the resulting logratios even easier to interpret. Finally, the reference should preferably be a well-populated component rather than a rare one, and substantive biological reasons may also guide the choice when several candidate references are identified. Results: On each of three high-dimensional datasets, the additive logratio transformation was performed using references identified according to the above-mentioned criteria.
For each dataset the compositional data structure was successfully reproduced, that is, the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9977 and 0.9997, respectively. In the third case, where the objective was to distinguish between three groups of samples, the approximation was made to the restricted logratio space of the between-group variance. Conclusions: We show that for high-dimensional compositional data, additive logratios can provide a valid choice of transformed variables that (1) are subcompositionally coherent, (2) explain 100% of the total logratio variance, and (3) come measurably very close to being isometric, that is, they approximate the exact logratio geometry almost perfectly. The interpretation of additive logratios is simple and, when the variance of the log-transformed reference is very low, simpler still, since each additive logratio can then be identified with a corresponding compositional component.
Competing Interest Statement: The authors have declared no competing interest.
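The reference-selection criteria summarized in the abstract can be sketched in Python with NumPy. This is a minimal, unweighted sketch under stated assumptions, not the authors' implementation: it assumes strictly positive counts (zeros already replaced), takes the centred logratios (with equal component weights) as the exact logratio geometry, and uses hypothetical helper names (`close`, `alr`, `clr`, `procrustes_correlation`, `rank_references`). It does not cover the restricted between-group approximation used for the third dataset.

```python
import numpy as np


def close(counts):
    """Scale each row (sample) to relative abundances that sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def alr(props, ref):
    """Additive logratios log(x_j / x_ref) for every component j != ref."""
    logp = np.log(props)
    return np.delete(logp, ref, axis=1) - logp[:, [ref]]


def clr(props):
    """Centred logratios. Assumption: with equal component weights, their
    Euclidean geometry matches that of all pairwise logratios up to scale."""
    logp = np.log(props)
    return logp - logp.mean(axis=1, keepdims=True)


def procrustes_correlation(x, y):
    """Procrustes correlation between two configurations of the same samples:
    the sum of singular values of Xc^T Yc after column-centring both matrices
    and scaling each to unit total sum of squares."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    xc /= np.linalg.norm(xc)  # Frobenius norm
    yc /= np.linalg.norm(yc)
    return np.linalg.svd(xc.T @ yc, compute_uv=False).sum()


def rank_references(counts):
    """Score each candidate reference by (index, Procrustes correlation with
    the exact logratio geometry, variance of the log relative abundance)."""
    props = close(counts)
    exact = clr(props)
    scores = []
    for ref in range(props.shape[1]):
        r = procrustes_correlation(alr(props, ref), exact)
        v = np.var(np.log(props[:, ref]))
        scores.append((ref, r, v))
    # Primary criterion: highest correlation; tie-break on lowest variance.
    return sorted(scores, key=lambda s: (-s[1], s[2]))
```

Candidates at the top of the ranking would still need to be screened for abundance (avoiding rare components) and biological relevance, which this sketch leaves to the analyst.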