Abstract
Many researchers want to report an R2 to measure the variance explained by a model. When the model includes correlation among data, as in mixed models and phylogenetic models, defining an R2 faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the R2 to include the variance explained by the covariances, asking questions such as “What is the partial R2 for random effects in a linear mixed model?” or “How much of the variance is explained by phylogeny?”
I propose using three R2s for mixed and phylogenetic models. A least-squares R2ls is an extension of the ordinary least-squares R2 that weights residuals by variances and covariances estimated by the model. The likelihood ratio R2lr was first used by Cragg and Uhler (1970) for logistic regression, and here is used with the standardization proposed by Nagelkerke (1991). The conditional expectation R2ce is based on “predicting” each residual from the remaining residuals of the fitted model. These three R2s can be formulated as partial R2s to compare the contributions of mean components (fixed effects in mixed models and regression coefficients in phylogenetic models) and variance components (random effects and phylogenetic signal) to the fit of models.
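As an illustration of the likelihood ratio R2lr, the following is a minimal sketch of the Cragg and Uhler form with the Nagelkerke (1991) standardization, computed from two log-likelihoods. The function name r2_lr and its arguments are hypothetical and chosen only for this example. The sketch assumes a discrete response (such as binary data), for which the likelihood cannot exceed 1, so that the largest attainable unstandardized value is 1 - exp((2/n) * loglik_reduced). For a partial R2, the reduced model would be the full model with the component of interest removed.

```python
import math


def r2_lr(loglik_full, loglik_reduced, n, standardize=True):
    """Likelihood-ratio R2 from the log-likelihoods of a full and a reduced model.

    loglik_full    -- log-likelihood of the full model
    loglik_reduced -- log-likelihood of the reduced (reference) model
    n              -- number of observations
    standardize    -- apply the Nagelkerke (1991) standardization; this
                      assumes a discrete response, so the likelihood is <= 1
    """
    # Unstandardized form: 1 - exp(-(2/n) * (LL_full - LL_reduced))
    raw = 1.0 - math.exp(-2.0 / n * (loglik_full - loglik_reduced))
    if not standardize:
        return raw
    # Largest value raw can take, reached when the full model fits
    # perfectly (likelihood = 1, so LL_full = 0)
    r2_max = 1.0 - math.exp(2.0 / n * loglik_reduced)
    return raw / r2_max


# Made-up log-likelihoods, used only to show the call
example = r2_lr(loglik_full=-50.0, loglik_reduced=-69.3, n=100)
```

With this standardization the value is 0 when the full model fits no better than the reduced model and 1 when the full model fits a discrete response perfectly.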
I investigated the properties of the R2s for linear and generalized linear mixed models (LMMs and GLMMs), and phylogenetic models for continuous and binary response data (PGLS and phylogenetic logistic regression). For LMMs and GLMMs, I compared the R2s to R2glmm from Nakagawa and Schielzeth (2013), and for LMMs also to the ordinary least-squares R2 computed by treating random effects as fixed effects.
R2ls, R2lr, and R2ce all perform reasonably well, and each has advantages and disadvantages for different applications. Overall, R2lr showed less variation among repeated simulations of the same model than R2ls and R2ce (and also R2glmm), making it the most precise estimate of goodness-of-fit. Nonetheless, all three can be used with a wide range of models for correlated data.