Abstract
Motivation A key issue in the omics literature is the search for statistically significant relationships between molecular markers and phenotype. The aim is to detect disease-related discriminatory features while controlling for false positive associations at adequate power. Metabolome-wide association studies have revealed significant relationships of metabolic phenotypes with disease risk by analysing hundreds to tens of thousands of molecular variables, leading to multivariate data which are highly noisy and collinear. In this context, Bonferroni or Sidak corrections are of limited use, as they are designed for independent tests, while permutation procedures allow for the estimation of p-values from the null distribution without assuming independence among features. Nevertheless, under the permutation approach the distribution of p-values may present systematic deviations from the theoretical null distribution, which leads to a biased adjusted threshold estimate, e.g. one smaller than that given by a Bonferroni or Sidak correction.
Methods We use parametric approximation methods based on a multivariate Normal distribution to derive stable estimates of the metabolome-wide significance level. The estimates are obtained within a univariate approach based on a permutation procedure that controls the maximum overall type I error rate at the α level.
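As an illustration of the permutation step only, the following is a minimal sketch in Python rather than the MWSL package's own R code. It assumes a continuous outcome tested against each feature by a Pearson correlation, approximates each p-value with the Fisher z transform, and takes the α-quantile of the per-permutation minimum p-values as the significance level. The multivariate Normal approximation described above is not included, and the names `pearson_r`, `p_value` and `estimate_mwsl` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value for r (Fisher z normal approximation, an assumption)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def estimate_mwsl(features, outcome, alpha=0.05, n_perm=200, seed=0):
    """Permutation minP estimate of the significance level.

    features: list of M feature columns, each of length n.
    Returns (mwsl, ent), where ent = alpha / mwsl is the implied
    effective number of independent tests.
    """
    rng = random.Random(seed)
    n = len(outcome)
    y = list(outcome)
    min_p = []
    for _ in range(n_perm):
        # Shuffling the outcome breaks any feature-outcome association
        # while leaving the correlation among features untouched.
        rng.shuffle(y)
        min_p.append(min(p_value(pearson_r(col, y), n) for col in features))
    min_p.sort()
    # Empirical alpha-quantile of the minimum p-values.
    k = max(int(math.floor(alpha * n_perm)) - 1, 0)
    mwsl = min_p[k]
    return mwsl, alpha / mwsl


# Synthetic example: 15 strongly correlated features, unrelated outcome.
demo_rng = random.Random(1)
n_obs, n_feat = 60, 15
latent = [demo_rng.gauss(0, 1) for _ in range(n_obs)]
features = [
    [0.9 * latent[i] + 0.3 * demo_rng.gauss(0, 1) for i in range(n_obs)]
    for _ in range(n_feat)
]
outcome = [demo_rng.gauss(0, 1) for _ in range(n_obs)]
mwsl, ent = estimate_mwsl(features, outcome)
print(f"MWSL = {mwsl:.4g}, effective number of tests = {ent:.2f}")
```

Because the features are permuted jointly against the outcome, their correlation structure is preserved, so the resulting threshold can be compared directly with the Bonferroni value α/M for the same number of features M.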
Results We illustrate the results for different model parametrizations and distributional features of the outcome measure, as well as for diverse correlation levels within the features and between the features and the phenotype in real data and simulated studies.
Availability MWSL is an open-source R software package for the empirical estimation of the metabolome-wide significance level, available at https://github.com/AlinaPeluso/MWSL. It includes the original metabolomic dataset employed in this study and the main function for the MWSL estimation. A user guide tutorial is provided to detail the procedure.