Abstract
Motivation The increasing availability of multi-omic datasets poses both new opportunities and new challenges for developing quantitative methods that discover novel mechanisms in biomedical research. One natural approach to analyzing such datasets is mediation analysis, which originated in the causal inference literature. Mediation analysis can help unravel the mechanisms through which exposure(s) exert their effects on outcome(s). However, existing methods fail to handle the setting in which (1) both exposures and mediators are potentially high-dimensional and (2) some important confounding variables are likely unmeasured or latent, even though both situations are common in practice. To the best of our knowledge, no method has yet been developed that addresses these challenges with statistical guarantees.
Results In this article, we propose a new method for HIgh-dimensional LAtent-confounding Mediation Analysis, abbreviated as “HILAMA”, which accommodates both high-dimensional exposures and high-dimensional mediators and, more importantly, the possible presence of latent confounding variables. HILAMA achieves finite-sample false discovery rate (FDR) control when testing multiple mediation effects. We evaluate the proposed method through extensive simulation experiments, which demonstrate more stable FDR control and higher power in finite samples than existing competing methods. We further apply HILAMA to proteomics-radiomics data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), identifying key proteins and brain regions related to Alzheimer’s disease. These results show that HILAMA can effectively control the FDR and provide valid statistical inference for high-dimensional mediation analysis with latent confounding variables.
Availability The R package HILAMA is publicly available at https://github.com/Cinbo-Wang/HILAMA.
Contact cinbo_w{at}sjtu.edu.cn
Competing Interest Statement
The authors have declared no competing interest.