PT - JOURNAL ARTICLE
AU - Bhatnagar, Sahir R
AU - Yang, Yi
AU - Khundrakpam, Budhachandra S
AU - Evans, Alan
AU - Blanchette, Mathieu
AU - Bouchard, Luigi
AU - Greenwood, Celia MT
TI - An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures
AID - 10.1101/102475
DP - 2017 Jan 01
TA - bioRxiv
PG - 102475
4099 - http://biorxiv.org/content/early/2017/01/23/102475.short
4100 - http://biorxiv.org/content/early/2017/01/23/102475.full
AB - Computational approaches to variable selection have become increasingly important with the advent of high-throughput technologies in genomics and brain imaging studies, where the data have become massive, yet the number of truly important variables is believed to be small relative to the total number of variables. Although many approaches have been developed for main effects, less attention has been paid to interaction models. Here, starting from the hypothesis that a binary exposure variable can alter correlation patterns between clusters of high-dimensional variables, i.e. alter network properties of the variables, we explore whether such exposure-dependent clustering relationships can improve predictive modelling of an outcome or phenotype variable. We propose a modelling framework called ECLUST to test this hypothesis, and evaluate its performance through extensive simulations. We see improved model fit in many scenarios. We further illustrate the framework through the analysis of three data sets from very different fields, each with high-dimensional data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.