Abstract
Model averaging of partial regression coefficients has been criticized on the grounds that coefficients conditioned on different covariates estimate regression parameters with different interpretations from model to model. This criticism ignores (or rejects) the long tradition of using a partial regression coefficient to estimate an effect parameter (or Average Causal Effect), which gives the direct generating, or causal, effect of an independent variable on the response variable. The regression parameter is a descriptor whose meaning is conditional on the covariates in the model; it makes no claims about causal or generating effects. By contrast, an effect parameter derives its meaning from a causal model, not from a set of covariates. A multiple regression model implicitly specifies a causal model with direct causal paths from each predictor to the response. Consequently, the partial regression coefficient for any predictor has the same meaning across all sub-models if the goal is estimation of the causal effects that generated the response. In a recent article, Cade (2015) went beyond this “different parameter” criticism and suggested that, in the presence of any multicollinearity, averaging partial regression coefficients is invalid because the averaged coefficients have no defined units. I argue that Cade’s interpretation of the math is incorrect. Although partial regression coefficients may be meaningfully averaged, model averaging may not be especially useful; to clarify this, I compare effect estimates in a small Monte Carlo simulation. The simulation results show that model-averaged (and ridge) estimates increasingly outperform full-model estimates as multicollinearity increases, despite the full regression model correctly specifying the causal effect structure (that is, even when we know the truth, a method that averages over incorrectly specified models outperforms the correctly specified model).
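The paper's actual simulation design is not reproduced here, but the kind of comparison described can be sketched in a few lines. The following is a hypothetical minimal illustration, with assumed true causal effects of 0.5 for two correlated predictors, AIC (Akaike) weights over all four submodels, and a fixed, arbitrary ridge penalty; none of these choices are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(rho, n=100, n_reps=200):
    """Mean squared error of three estimators of the first causal effect,
    with two predictors correlated at rho. Purely illustrative settings."""
    b = np.array([0.5, 0.5])                  # assumed true causal effects
    cov = np.array([[1.0, rho], [rho, 1.0]])  # predictor correlation matrix
    sq_err = {"full": [], "avg": [], "ridge": []}
    for _ in range(n_reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = X @ b + rng.normal(size=n)

        # 1) full-model OLS: the correctly specified causal structure
        b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        sq_err["full"].append((b_full[0] - b[0]) ** 2)

        # 2) AIC-weighted model average over the four submodels
        aics, coefs = [], []
        for subset in ([], [0], [1], [0, 1]):
            if subset:
                Xs = X[:, subset]
                bs, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                resid = y - Xs @ bs
            else:
                bs, resid = np.array([]), y
            rss = float(resid @ resid)
            aics.append(n * np.log(rss / n) + 2 * (len(subset) + 1))
            padded = np.zeros(2)
            padded[subset] = bs               # absent predictors enter as 0
            coefs.append(padded)
        w = np.exp(-0.5 * (np.array(aics) - min(aics)))  # Akaike weights
        w /= w.sum()
        b_avg = w @ np.array(coefs)
        sq_err["avg"].append((b_avg[0] - b[0]) ** 2)

        # 3) ridge regression with a fixed, arbitrary penalty
        lam = 1.0
        b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
        sq_err["ridge"].append((b_ridge[0] - b[0]) ** 2)
    return {k: float(np.mean(v)) for k, v in sq_err.items()}
```

Calling, say, `simulate(0.0)` and `simulate(0.95)` shows the variance inflation of the full-model estimate under strong multicollinearity; how the model-averaged and ridge estimators fare against it depends on the true effect sizes and the penalty, which is exactly the comparison the simulation in the paper is designed to make under its own settings.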