RT Journal Article
SR Electronic
T1 An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures
JF bioRxiv
FD Cold Spring Harbor Laboratory Press
SP 102475
DO 10.1101/102475
A1 Bhatnagar, Sahir R
A1 Yang, Yi
A1 Khundrakpam, Budhachandra S
A1 Evans, Alan
A1 Blanchette, Mathieu
A1 Bouchard, Luigi
A1 Greenwood, Celia MT
YR 2017
UL http://biorxiv.org/content/early/2017/01/23/102475.abstract
AB Computational approaches to variable selection have become increasingly important with the advent of high-throughput technologies in genomics and brain imaging studies, where the data has become massive, yet where it is believed that the number of truly important variables is small relative to the total number of variables. Although many approaches have been developed for main effects, less attention has been paid to interaction models. Here, starting from the hypothesis that a binary exposure variable can alter correlation patterns between clusters of high-dimensional variables, i.e. alter network properties of the variables, we explore whether such exposure-dependent clustering relationships can improve predictive modelling of an outcome or phenotype variable. Hence, we propose a modelling framework called ECLUST to test this hypothesis, and evaluate performance through extensive simulations. We see improved model fit in many scenarios. We further illustrate the framework through the analysis of three data sets from very different fields, each with high-dimensional data, a binary exposure, and a phenotype of interest. Our method is available in the eclust CRAN package.