TY  - JOUR
T1  - An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures
JF  - bioRxiv
DO  - 10.1101/102475
SP  - 102475
AU  - Sahir R Bhatnagar
AU  - Yi Yang
AU  - Mathieu Blanchette
AU  - Budhachandra Khundrakpam
AU  - Alan Evans
AU  - Luigi Bouchard
AU  - Celia MT Greenwood
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/01/23/102475.abstract
N2  - Computational approaches to variable selection have become increasingly important with the advent of high-throughput technologies in genomics and brain imaging studies, where the data has become massive, yet where it is believed that the number of truly important variables is small relative to the total number of variables. Although many approaches have been developed for main effects, less attention has been paid to interaction models. Here, starting from the hypothesis that a binary exposure variable can alter correlation patterns between clusters of high-dimensional variables, i.e. alter network properties of the variables, we explore whether such exposure-dependent clustering relationships can improve predictive modelling of an outcome or phenotype variable. Hence, we propose a modelling framework called ECLUST to test this hypothesis, and evaluate performance through extensive simulations. We see improved model fit in many scenarios. We further illustrate the framework through the analysis of three data sets from very different fields, each with high dimensional data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.
ER  - 