Multivariate Welch t-test on distances

Bioinformatics. 2016 Dec 1;32(23):3552-3558. doi: 10.1093/bioinformatics/btw524. Epub 2016 Aug 11.

Abstract

Motivation: Permutational non-Euclidean analysis of variance (PERMANOVA) is routinely used in exploratory analysis of multivariate datasets to draw conclusions about the significance of patterns visualized through dimension reduction. The method recognizes that the pairwise distance matrix between observations is sufficient to compute the within- and between-group sums of squares needed to form the (pseudo) F statistic. Moreover, arbitrary distances, not only Euclidean ones, can be used. The method, however, suffers from loss of power and type I error inflation in the presence of heteroscedasticity and sample size imbalance.
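The claim that a distance matrix alone is enough to form the pseudo F statistic can be illustrated with a minimal sketch. It assumes the standard PERMANOVA decomposition, in which each sum of squares is the sum of squared pairwise distances divided by the number of observations involved; the function name `pseudo_f` and its arguments are hypothetical and are not taken from the article's source code.

```python
import numpy as np


def pseudo_f(dist, labels):
    """Pseudo F statistic computed from a square distance matrix only.

    Hypothetical helper for illustration: `dist` is an (n, n) symmetric
    matrix of pairwise distances and `labels` assigns each row to a group.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = dist.shape[0]
    groups = np.unique(labels)
    a = len(groups)
    d2 = dist ** 2

    # Total sum of squares: squared distances over unordered pairs, divided by n.
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    # Within-group sum of squares: the same quantity, one group at a time.
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)

    # Between-group sum of squares follows by subtraction.
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))
```

With Euclidean distances on univariate data this reduces to the classical one-way ANOVA F statistic, which gives a convenient sanity check. No raw coordinates are needed, so any other distance can be substituted for the Euclidean one.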

Results: We develop a solution in the form of a distance-based Welch t-test, $T_w^2$, for two-sample, potentially unbalanced and heteroscedastic data. We demonstrate empirically the desirable type I error and power characteristics of the new test. We compare the performance of PERMANOVA and $T_w^2$ in a reanalysis of two existing microbiome datasets, the field in which the methodology originated.
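As a rough illustration of how a Welch-type statistic can be built from distances, the sketch below replaces the squared mean difference and the per-group variances of the univariate Welch t-test with their pairwise-distance equivalents. It is written under stated assumptions and is not the author's implementation: the names `welch_t2` and `permutation_pvalue` are hypothetical, the statistic is called T_w^2 here after the name of the linked repository, and assessing significance by permuting group labels is an assumption carried over from PERMANOVA practice.

```python
import numpy as np


def _pair_sum(d2, idx):
    """Sum of squared distances over unordered pairs within `idx`."""
    sub = d2[np.ix_(idx, idx)]
    return sub[np.triu_indices(len(idx), k=1)].sum()


def welch_t2(dist, labels):
    """Distance-based Welch-type statistic for exactly two groups (sketch)."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    if len(groups) != 2:
        raise ValueError("exactly two groups are required")
    d2 = dist ** 2
    i1 = np.flatnonzero(labels == groups[0])
    i2 = np.flatnonzero(labels == groups[1])
    n1, n2 = len(i1), len(i2)
    n = n1 + n2

    s_all = _pair_sum(d2, np.arange(n))
    s1 = _pair_sum(d2, i1)
    s2 = _pair_sum(d2, i2)

    # Numerator: between-group sum of squares rescaled so that, for
    # univariate Euclidean data, it equals the squared mean difference.
    numerator = (n / (n1 * n2)) * (s_all / n - s1 / n1 - s2 / n2)
    # Denominator: distance analogue of s1^2 / n1 + s2^2 / n2, keeping
    # the two group variances separate instead of pooling them.
    denominator = s1 / (n1 ** 2 * (n1 - 1)) + s2 / (n2 ** 2 * (n2 - 1))
    return numerator / denominator


def permutation_pvalue(dist, labels, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling group labels (assumed scheme)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = welch_t2(dist, labels)
    count = 0
    for _ in range(n_perm):
        if welch_t2(dist, rng.permutation(labels)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

For univariate data with Euclidean distances, `welch_t2` returns the square of the ordinary Welch t statistic, so unequal group sizes and variances enter the denominator separately, which is the property the abstract contrasts with the pooled pseudo F.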

Availability and implementation: The source code for the methods and analysis in this article is available at https://github.com/alekseyenko/Tw2. Further guidance on the application of these methods can be obtained from the author.

Contact: alekseye@musc.edu.

MeSH terms

  • Animals
  • Computer Simulation
  • Gastrointestinal Tract / microbiology
  • Humans
  • Mice
  • Microbiota*
  • Models, Statistical*
  • Psoriasis / microbiology
  • Sample Size
  • Skin / microbiology