Abstract
Many researchers want to report an R2 to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an R2 faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the R2 to include the variance explained by the covariances by asking questions such as “How much of the variance is explained by phylogeny?” Here, I investigate three R2s for phylogenetic and mixed models. A least-squares R2ls is an extension of the ordinary least-squares R2 that weights residuals by the variances and covariances estimated by the model; it is closely related to the R2glmm proposed by Nakagawa & Schielzeth (2013). The conditional expectation R2ce is based on “predicting” each residual from the remaining residuals of the fitted model. The likelihood ratio R2lr was first used by Cragg & Uhler (1970) for logistic regression, and is used here with the standardization proposed by Nagelkerke (1991). These three R2s are formulated as partial R2s, making it possible to compare the contributions of mean components (regression coefficients in phylogenetic models and fixed effects in mixed models) and variance components (phylogenetic correlations and random effects) to the fit of models. The properties of the R2s for phylogenetic models were assessed using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). Because the R2s are designed broadly for any model for correlated data, the R2s were also compared for LMMs and GLMMs. R2ls, R2ce, and R2lr all have good performance, and each has advantages and disadvantages for different applications. These R2s are computed in the R package rr2 (https://github.com/arives/rr2).
[Binomial regression, coefficient of determination, non-independent residuals, phylogenetic model, pseudo-likelihood]
INTRODUCTION
Researchers often want to calculate a coefficient of determination, an R2, to give a measure of the amount of variance in their data explained by a statistical model. For ordinary least-squares models (OLS), such as regression and ANOVA, the R2 is simple to calculate and interpret. Many types of models, however, assume that the errors among response variables are correlated. Phylogenetic generalized least squares models (PGLS) allow the possibility of phylogenetically related species being more similar to each other, leading to phylogenetic correlations in the errors. PGLS models are structurally similar to linear mixed models (LMMs) that include random effects to account for correlations in the residual variation; for example, LMMs can account for correlation between residuals of experimental replicates within the same block. The situation is more complex for models for discrete response variables, such as phylogenetic logistic regression models (PLOG) and generalized linear mixed models (GLMMs). For models of discrete distributions, even perfectly fitting models have residual variation due to the discreteness of the data, and this complicates the interpretation of an R2.
Correlated errors in statistical models cause two issues for defining an R2. The first involves assessing the goodness-of-fit of predictor variables (fixed effects) in terms of the explained variance. For standard OLS models, the errors are assumed to be independently and identically distributed, and therefore the variance in the residuals can be calculated directly to give the total variance that is not explained by the model. In models for correlated data, however, the errors are not independently distributed. Therefore, to calculate the “unexplained variance” given by the residuals, it is necessary to deal with the covariances among errors; applying the OLS R2 to estimates from a model with covariances among errors gives values that are bounded below by -∞ rather than zero; that is, they can be arbitrarily negative (Judge et al. 1985, p. 32).
The second issue for defining an R2 involves assessing the goodness-of-fit of the covariances (random effects) estimated in the model. For phylogenetic models, this is embodied by the question “How much of the data is explained by phylogeny?” The difficulty is that a phylogenetic model can be used to estimate the strength of phylogenetic signal (covariances) in the errors, but the phylogenetic signal does not directly lead to predictions of the fitted data. Desdevises et al. (2003) propose an R2 in which a phylogeny is decomposed into principal components, and the principal components are used as predictor variables in a set of regressions; however, it is not clear how the resulting R2 maps back to the statistical fit of a model and hence the statistical confidence in its results. An R2 should ideally be tightly coupled to the fitted model and use this fit to quantify how much of the data are “explained” by the phylogeny.
Here, I assess three R2s for models that specify non-zero covariances among errors. Although the definitions of the R2s are broad enough to encompass any model specifying an error covariance matrix, I will focus on application to phylogenetic models for continuous (PGLS) and binary (PLOG) data. In addition, I will compare the properties of the R2s applied to phylogenetic and mixed models, which makes it possible to explore the R2s in detail. This comparison also validates the R2s as viable measures of goodness-of-fit for a broad class of models. This is important, because R2s should make it possible to assess and compare as wide a range of models as possible (Kvalseth 1985).
The general form of the investigated models is

Yi ~ ℱ(µi)
g(µi) = β0 + β1xi + ei        (1)
e ~ Gaussian(0, σ2Σ(θ))

where data Yi (i = 1, …, n) are distributed by a member ℱ of the exponential family of distributions (McCullagh & Nelder 1989). The parameter µi of distribution ℱ is itself a random variable, and applying the link function g() to µi gives a linear equation in terms of the predictor variable xi and an error term ei. The error term ei has a multivariate Gaussian distribution with means 0 and covariance matrix σ2Σ(θ) that may depend on a parameter θ. When the link function g() is the identity function, then equation (1) becomes a linear model (e.g., PGLS or LMM). PLOG can be modeled as a phylogenetic GLMM (PGLMM) in which Yi has outcomes 0 and 1, and the link function g() is logit (Ives & Helmus 2011; Ives & Garland 2014; Hadfield 2015). Note that other approaches to phylogenetic logistic regression (Ives & Garland 2010; Ho & Ane 2014) are not structured as PGLMMs, although calculating one of the three R2s is still possible. Although equation (1) is written for only a single predictor variable xi and parameter θ (a single random effect in a GLMM), all of the results presented below extend in the obvious way to multiple variables xi and parameters θ.
The covariances in the residual errors are contained within σ2Σ(θ). In phylogenetic models, the structure of σ2Σ(θ) is typically generated under a specific model of evolution (Martins & Hansen 1997b; Lavin et al. 2008). For example, in a PGLS using Pagel’s λ branch-length transform (Pagel 1997; Housworth, Martins & Lynch 2004), Σ(λ) is the sum of two matrices, Σ(λ) = λΣBM + (1-λ)I, where ΣBM is the covariance matrix derived under the assumption of Brownian motion (BM) evolution (Felsenstein 1985; Grafen 1989), and I is the identity matrix. If λ=1, then the covariance between the errors of two tips given by BM is proportional to the shared branch length between the root and their most recent common ancestor on the phylogenetic tree. If λ=0, the errors are uncorrelated, and 0 < λ < 1 gives intermediate levels of phylogenetic signal. The effect of λ on the covariances among errors can be depicted by adding tip branches to the BM phylogenetic tree that are scaled to make up a proportion 1-λ of the base-to-tip distance (Fig. 1).
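To make the transform concrete, here is a minimal sketch in Python (the analyses in this paper use R packages; the 3-species ΣBM below, with base-to-tip distance 1 and a shared branch of length 0.6 for two sister species, is a hypothetical toy example):

```python
# Pagel's lambda transform: Sigma(lambda) = lambda * Sigma_BM + (1 - lambda) * I.
# The 3-species Sigma_BM is hypothetical (base-to-tip distance 1; the first two
# species share a branch of length 0.6).

def pagel_lambda(sigma_bm, lam):
    n = len(sigma_bm)
    return [[lam * sigma_bm[i][j] + (1 - lam) * (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

sigma_bm = [[1.0, 0.6, 0.0],
            [0.6, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

sig_star = pagel_lambda(sigma_bm, 0.0)  # lambda = 0: uncorrelated errors (identity)
sig_half = pagel_lambda(sigma_bm, 0.5)  # lambda = 0.5: off-diagonals halved, diagonal still 1
```

Note that the diagonal is unchanged by λ; only the shared (phylogenetic) portion of the variance is rescaled.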
The similarity between PGLS and LMMs can be seen by depicting the LMM as a tree with branch lengths giving the strength of covariances among errors (Fig. 1). For a model with a single random effect b, the covariance matrix is Σ(σ2b) = σ2b Σb + I, where Σb is a block-diagonal matrix whose values are 1 for each row i and column j corresponding to the same level (block) of the random effect (Gelman & Hill 2007). The greater the variance of the random effect σ2b, the greater the covariances among errors within the same level, and the smaller the relative contribution of the residual errors given by the length of the tips of the tree. For comparison with LMMs, GLMMs can also be depicted as a tree; however, rather than the residual errors at the tips of the tree having length 1, for models of discrete data the error variance depends on the unavoidable differences between the observation (0 or 1 for binary data) and the probability of the observation (taking any value between 0 and 1). The lengths of these tip variances σ2w therefore depend on the probability of the observation. I show the variances σ2w only to illustrate the similarities between LMM and GLMM models, and by extension PGLS and PLOG models. In some methods for implementing GLMMs (Schall 1991; Breslow & Clayton 1993), the σ2w (the inverses of the GLM weights; McCullagh & Nelder 1989) are used in the fitting algorithms; in other methods, for example those in the R package ‘lme4’ (Bates et al. 2014), they are not used, although they can nonetheless be extracted from fitted models.
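The block structure of Σ(σ2b) = σ2b Σb + I can be sketched the same way; the level assignments and variance below are hypothetical:

```python
# Covariance matrix Sigma(sigma2_b) = sigma2_b * Sigma_b + I for a single random
# effect: Sigma_b[i][j] = 1 when observations i and j share a level (block).
# Level assignments and sigma2_b = 0.5 are hypothetical.

def lmm_cov(levels, var_b):
    n = len(levels)
    return [[var_b * (1.0 if levels[i] == levels[j] else 0.0) +
             (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

levels = ["A", "A", "B", "B"]
cov = lmm_cov(levels, 0.5)
# diagonal: 1 + sigma2_b = 1.5; within-block off-diagonal: 0.5; between blocks: 0
```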
The three R2s presented here partition the “explained” and “unexplained” variances for models with correlated errors such as those depicted in figure 1. Because models can contain multiple parameters, the R2s compare a full model with a reduced model in which one or more of the parameters are removed; thus, they are partial R2s that give the explained variance by the components that differ between full and reduced models. The total R2s are obtained by selecting the reduced model in which there is only an intercept and residuals are independent. Defining partial R2s has the advantage of being able to ask about the contribution of a single or subset of components to the fit of a model. This makes it possible to exclude coefficients in a model that are not of explicit interest; for example, many phylogenetic models for species traits include body size as one of the predictor variables to factor out body size, and partial R2s make it possible to assess the goodness-of-fit for the remaining predictor variables. By comparing a model with a phylogeny to a model without, partial R2s also make it possible to answer the question “How much of the data is explained by phylogeny?”
R2s can be assessed on multiple grounds (Kvalseth 1985), and here I consider three. First, does the R2 give a good measure of fit of a model to data? To serve as a basis for assessment, I use the log-likelihood ratio (LLR) of the full and reduced models. The LLR approaches a χ2 distribution for large samples and is therefore used for hypothesis tests of full versus reduced models (Judge et al. 1985). Also, the LLR is linearly related to the AIC and other measures used for model selection (Burnham & Anderson 2002). Therefore, the LLR is a natural choice to assess R2s: a good R2 should be monotonically related to the LLR. Second, can the R2 separate the contribution of different components of the model to the overall model fit? For the simple case of equation (1) in which there is only a single regression coefficient (β1) and a single variance parameter (θ), I ask whether the R2s can distinguish between the two in their contributions to the fit of the model. Third, does the R2 give similar values when applied to data generated by the same statistical process? If the values of R2 when applied to data generated from the same statistical process are all similar, then the R2 gives a precise measure of goodness-of-fit.
MATERIALS AND METHODS
There is an extensive literature on R2s for GLMs and LMMs, and a growing literature for GLMMs (Buse 1973; Cameron & Windmeijer 1996; Cameron & Windmeijer 1997; Kenward & Roger 1997; Menard 2000; Xu 2003; Kramer 2005; Edwards et al. 2008; Liu, Zheng & Shen 2008; Orelien & Edwards 2008; Nakagawa & Schielzeth 2013; Jaeger et al. 2017), and this literature forms the basis for the R2s that can be applied to phylogenetic models. The three R2s take three different approaches to defining “explained variance”, the same general approaches considered for LMMs by Xu (2003). R2ls is based on the variance of the residuals in a way that explicitly incorporates their covariances. For models with discrete data, R2ls is defined to closely match R2glmm presented by Nakagawa & Schielzeth (2013). R2ce is based on the difference between the observed data and model predictions (Kvalseth 1985), where the predictions use information from the covariances among errors. R2lr is based upon the information that is gained by adding parameters (regression parameters or covariance terms); it uses the likelihood ratio of full to reduced models as was first proposed for logistic regression (Cragg & Uhler 1970; Maddala 1983; Cox & Snell 1989). For ordinary linear models without correlated errors, these three R2s are identical to the OLS R2, but they differ for models with correlated errors.
Here I give a heuristic explanation for the R2s, and Appendix 1 gives details about their implementation. The starting point in the derivations of the three R2s is the standard R2 for continuous data (Buse 1973; Judge et al. 1985),

R2 = 1 – mSEf/mSEr        (2)

where mSEf is the mean squared error for the full model, and mSEr is that for the reduced model. For the unadjusted R2, the mSEs are the mean squared errors without correcting for degrees of freedom, so I have used the abbreviation mSE rather than the normal MSE, the mean squared error corrected for degrees of freedom. Both full and reduced models may contain parameters in vectors θf and θr that involve the variances and covariances among samples, which are estimated when the model is fit; for calculating the mSE, these parameters are assumed to be fixed. For a generalized least-squares model (Judge et al. 1985),

mSE = (1/n)(Y – Xβ̂)′V(θ)⁻¹(Y – Xβ̂)        (3)

where Y is the n × 1 vector of response values Yi, X is the n × p matrix for p predictor variables (including the intercept), β̂ is the p × 1 vector of estimated regression coefficients (fixed effects), and V(θ)⁻¹ is the inverse of the n × n matrix V(θ) = σ2Σ(θ) that contains the variances and covariances of the errors. The mSE for OLS models is the special case in which Σ(θ) = I, which gives the standard R2.
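A toy numeric sketch of equations (2) and (3) in Python, using a diagonal V(θ) so that the inverse is elementwise; all values (data, fitted values, variances) are hypothetical illustrations, not output of a fitted model:

```python
# mSE = (1/n) (Y - X*beta_hat)' V(theta)^{-1} (Y - X*beta_hat), specialized to a
# diagonal V(theta), so the quadratic form reduces to a weighted sum of squares.

def mse_gls(y, yhat, v_diag):
    n = len(y)
    return sum((yi - fi) ** 2 / vi for yi, fi, vi in zip(y, yhat, v_diag)) / n

y      = [1.0, 2.0, 3.0, 4.0]   # response values
yhat_f = [1.1, 1.9, 3.2, 3.8]   # fitted values, full model (hypothetical)
yhat_r = [2.5, 2.5, 2.5, 2.5]   # fitted values, intercept-only reduced model
v_diag = [1.0, 1.0, 2.0, 2.0]   # diagonal of V(theta)

r2 = 1 - mse_gls(y, yhat_f, v_diag) / mse_gls(y, yhat_r, v_diag)  # equation (2)
```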
The key issue in defining R2s is the scaling of V(θ) = σ2Σ(θ). Setting V(θ) = Σ(θ) in equation (3), the mSE gives the maximum likelihood estimate of the variance term σ2 from equation (1). However, Σ(θ) can be rescaled by a constant without changing the fit of the statistical model; the only effect of multiplying Σ(θ) by a constant is to change the value of σ2 by 1/constant. This would not be an issue if the scaling were the same for full and reduced models, because the scaling would cancel out when dividing mSEf by mSEr. However, it will generally be the case that Σ(θf) ≠ Σ(θr); for example, even for LMMs that include the same random effects, removing fixed effects from the full model changes the estimated variances of the random effects in the reduced model. Therefore, the scaling is not removed by dividing mSEf by mSEr. Because the scaling determines the estimate of σ2, it affects the R2. The three R2s presented below differ in how they address the issue of scaling Σ(θ).
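The scaling issue can be checked numerically: multiplying Σ(θ) by a constant c divides the mSE, and hence the estimate of σ2, by c (toy numbers, diagonal V so the inverse is elementwise; all values hypothetical):

```python
# Multiplying Sigma(theta) by a constant c rescales the mSE by 1/c without
# changing the fit of the model; residuals and variances here are hypothetical.

def mse_gls(y, yhat, v_diag):
    n = len(y)
    return sum((yi - fi) ** 2 / vi for yi, fi, vi in zip(y, yhat, v_diag)) / n

y, yhat = [1.0, 2.0, 3.0], [1.2, 1.8, 3.1]
v = [1.0, 2.0, 1.5]
c = 4.0

m1 = mse_gls(y, yhat, v)
m2 = mse_gls(y, yhat, [c * vi for vi in v])
# m2 equals m1 / c: the chosen scaling of Sigma(theta) determines the estimate of sigma^2
```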
R2ls (for least squares) extends equation (2) in a way that closely matches R2s that have been proposed for LMMs and GLMMs (Buse 1973; Xu 2003; Edwards et al. 2008; Nakagawa & Schielzeth 2013; Jaeger et al. 2017). For LMMs, a natural scaling of Σ(θ) is to let Σ(θ) = I + G(θ), where G(θ) is the block-diagonal matrix containing the variances of the random effects divided by the residual variance (i.e., σ2b in Fig. 1). With this scaling, the residual variance σ2 estimated from the LMM equals exactly the mSE from equation (3). For the total R2, the reduced model contains only the intercept, and therefore mSEr equals the variance of the data. This means that the total R2ls for LMMs differs slightly from R2glmm(c) (Nakagawa & Schielzeth 2013), in which the variance in the data is estimated from the full model rather than taken as the actual variance of the data. Nonetheless, the values of R2ls and R2glmm(c) will be very close.
For phylogenetic models, scaling Σ(θ) is more complicated. For some common branch-length transforms in PGLS that are used to measure the strength of phylogenetic signal, such as the OU transform (Martins & Hansen 1997a; Blomberg, Garland & Ives 2003), the covariance matrix Σ(θ) does not separate additively to give terms for the explained versus unexplained variance. Even though Pagel’s λ branch-length transform, Σ(λ) = λΣBM + (1-λ)I, does break down into the sum of phylogenetic and non-phylogenetic terms, the non-phylogenetic term (1-λ)I cannot be interpreted as the unexplained variance. This is because, for many data sets, the estimate of λ will be 1, which is the expectation under the assumption of BM evolution. If this occurs, it would force an R2 treating (1-λ)I as the unexplained variance to be 1, regardless of the explanatory power of the predictor variables (fixed effects); this property would make the R2 uninformative. To solve this problem, I propose scaling Σ(θ) so that the total branch lengths equal 1; this is equivalent to assuming that the total amount of independent evolution is the same. For a fitted tree with strong phylogenetic signal, scaling Σ(θ) to have the total branch lengths equal to 1 will make the base-to-tip distances greater than for a fitted tree with no phylogenetic signal. Because this scaling increases the diagonal elements in Σ(θ) for greater phylogenetic signal, it will reduce the estimates of σ2 and decrease the variance in the residuals that is unexplained by the model. Although this is only a convention (as opposed to a scaling derived from theory), the resulting R2ls performs for PGLS models in a similar way as it does for LMMs.
For discrete models, it is necessary to account for the variation introduced by discrete data (Fig. 1). While there are different ways to do this (Appendix 1), an approach that makes R2ls conform to R2glmm(c) for GLMMs (Nakagawa & Schielzeth 2013) is to replace mSE in equation (2) with

σ2d / (σ2β + σ̂2 + σ2d)

where σ2d gives the distribution-specific variance attributed to the discreteness of the data; σ2d is needed because the predictions from the model are probabilities whereas the observations are discrete values (0 or 1 for binary data). For binary data with a logit link function, σ2d = π2/3. The variance that σ2d measures is at the level of the transformed parameter value (i.e., g(µi) in equation (1)) for which the errors are normally distributed. To scale σ2d by the variances of the data in the transformed space of g(µi), it is necessary to divide by the total variance that is given by the regression coefficients (fixed effects), σ2β, and the errors, σ̂2. For GLMMs, σ̂2 is the estimate of the variance of the random effects. For PGLMMs, σ̂2 is the estimate of the variance in the Gaussian error term σ2Σ in which Σ is scaled to have base-to-tip branch lengths of 1. Note that in contrast to R2glmm(c) (Nakagawa & Schielzeth 2013), R2ls is explicitly defined as a partial R2; this not only makes it possible to assess the contributions of different components of the model to goodness-of-fit, it also simplifies application and interpretation of R2ls to LMMs and GLMMs with complex random effects, such as random slope effects (Johnson 2014).
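A toy calculation of the resulting total R2ls for a binary model, with hypothetical variance-component estimates and σ2d = π2/3 for the logit link:

```python
import math

# mSE in equation (2) is replaced by sigma2_d / (sigma2_beta + sigma2 + sigma2_d);
# sigma2_beta = 1.0 and sigma2 = 0.5 below are hypothetical estimates from a
# fitted binary model, not values from the paper's simulations.

def mse_analog(var_beta, var_err, var_d):
    return var_d / (var_beta + var_err + var_d)

var_d = math.pi ** 2 / 3                 # distribution-specific variance, logit link
full    = mse_analog(1.0, 0.5, var_d)    # full model: fixed and random effects
reduced = mse_analog(0.0, 0.0, var_d)    # intercept-only reduced model: ratio is 1
r2_ls = 1 - full / reduced
```

With the intercept-only reduced model the ratio equals 1, so the total R2ls reduces to the share of the total variance explained by the fixed and random effects.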
R2ce (for conditional expectation) is based on the variance of the difference between observed and predicted values, Yi – Ŷi. This approach conceptually comes the closest to answering the question of how much variation in the data is explained by the covariances in the model. For the case of LMMs, the predicted values Ŷ can be taken as the sum of the fixed effects and the estimates of the values of the random effects. For the LMM in figure 1, this corresponds to the estimated value at the polytomy formed at the node shared by all observations within the same level of the random effect. As the number of observations within each level of the random effect increases, R2ce for LMMs converges to R2ls because the estimates of the values of the random effects become more precise.
As it does for R2ls, PGLS poses a complication for R2ce: what is the predicted value Ŷi? To parallel R2ce for LMMs, Ŷi could be taken as the estimated value at the node immediately below the tip on the phylogeny containing Yi. For phylogenies with some short terminal branch lengths, however, the estimates for the node underneath Yi will be determined largely by the value of Yi itself, leading to very high (and uninformative) R2s. Therefore, for PGLS I defined R2ce using the estimates Ŷi computed by removing the point Yi from the data set and then estimating Ŷi from the predictor variables and the remaining data points. Specifically for equation (1), the expected value of residual Ri = Yi – (β0 + β1xi) from the remaining residuals R[-i] is

R̂i = R̅ + V[i,-i] V[-i,-i]⁻¹ (R[-i] – R̅)

where R̅ is the GLS mean of the residuals, V[i,-i] is row i of V with column i removed, and V[-i,-i] is V with row i and column i removed (Petersen & Pedersen 2012). The predicted value of Yi is then Ŷi = β0 + β1xi + R̂i. Note that this procedure (removing Yi before predicting Ŷi) could be used for LMMs (and other models), although to make R2ce conform most closely to the structure of LMMs, the LMM R2ce makes predictions Ŷi while keeping Yi in the data set.
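A minimal numeric sketch of this conditional-expectation step (toy 3 × 3 covariance matrix and residuals, with the GLS mean of the residuals taken as 0 for simplicity; all numbers hypothetical):

```python
# Predict residual R[0] from the remaining residuals:
# Rhat_i = Rbar + V[i,-i] V[-i,-i]^{-1} (R[-i] - Rbar), here with Rbar = 0.

def inv2(m):
    # closed-form inverse of a 2 x 2 matrix
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

V = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.3],
     [0.3, 0.3, 1.0]]
R = [0.5, 0.4, -0.2]
i = 0

V_i_rest = [V[i][j] for j in range(3) if j != i]          # V[i,-i]
V_rest   = [[V[r][c] for c in range(3) if c != i]
            for r in range(3) if r != i]                  # V[-i,-i]
R_rest   = [R[j] for j in range(3) if j != i]             # R[-i]

Vinv  = inv2(V_rest)
w     = [sum(V_i_rest[k] * Vinv[k][j] for k in range(2)) for j in range(2)]
R_hat = sum(w[j] * R_rest[j] for j in range(2))
# R_hat is the conditional expectation of R[0] given R[1] and R[2]
```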
The same approach as used for LMMs can be used for GLMMs and PGLMMs. In these cases, the variances are calculated for the untransformed values of Yi, rather than in the Gaussian space of g(µi) as was done for R2ls. These predicted values are given from the estimation algorithms for GLMMs by glmer in the R package lme4 (Bates et al. 2014) and for PGLMMs by binaryPGLMM (Ives & Helmus 2011; Ives & Garland 2014) in the R package ape (Paradis & Paradis 2012). Because GLMMs and PGLMMs include the unavoidable variance in Yi – Ŷi due to the discreteness of the data, the estimates Ŷi correspond to those values a distance σ2w from the tips of the tree in figure 1.
R2lr (for likelihood ratio) is the application of an R2 proposed for logistic regression (Cragg & Uhler 1970; Maddala 1983; Cox & Snell 1989) and generalized by Magee (1990) and Nagelkerke (1991), and used for LMMs by Kramer (2005). R2lr is computed for a range of models in the MuMIn package of R (Barton 2016). For LMMs, R2lr differs from R2ls only in the scaling of V(θ). If V(θ) is scaled so that the determinant det(V(θ)) = 1, then the maximum log likelihood is

logL = –(n/2)(log(2π) + log(mSE) + 1)        (5)
Substituting mSE from equation (5) into equation (2) then leads to

R2lr = 1 – exp(–(2/n)(logLf – logLr))        (6)
This definition of R2lr in terms of likelihoods extends immediately to any model fit by maximum likelihood estimation, including PGLS and GLMM. However, for discrete data equation (6) does not have a maximum of 1, because the maximum attainable log-likelihood for discrete data is zero. Therefore, Nagelkerke (1991) and Cameron and Windmeijer (1997) proposed dividing by the maximum attainable value, which is equation (6) with logLf = 0; throughout, I have used this Nagelkerke standardization.
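In code, equation (6) plus the Nagelkerke standardization is one line each; the log-likelihood values below are hypothetical illustrations:

```python
import math

# R2_lr = 1 - exp(-(2/n)(logLf - logLr)); for discrete data, divide by the
# maximum attainable value 1 - exp((2/n) logLr) (equation (6) with logLf = 0),
# which is the Nagelkerke standardization.

def r2_lr(loglik_f, loglik_r, n, discrete=True):
    r2 = 1.0 - math.exp(-2.0 / n * (loglik_f - loglik_r))
    if discrete:
        r2 /= 1.0 - math.exp(2.0 / n * loglik_r)
    return r2

r2 = r2_lr(loglik_f=-55.0, loglik_r=-65.0, n=100)  # hypothetical binary-model fit
```

When the full model adds nothing (logLf = logLr), R2lr is 0; as logLf approaches 0, the standardized R2lr approaches 1.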
The algorithm used by binaryPGLMM to fit equation (1) for binary phylogenetic data uses quasi-likelihoods and does not give a true maximum likelihood that could be used to compute R2lr. Therefore, for PLOG models I used phyloglm in the R package phylolm (Ho & Ane 2014), fitting the model with penalized ML but then using the provided ML values to calculate R2lr.
Simulations for assessment
The simulations to assess the statistical properties of the R2s applied to LMM, PGLS, GLMM, and PLOG all follow the same strategy. For each, data were simulated using equation (1) for the case with variation in a predictor variable and no covariances (β1 > 0, θ = 0), only covariances (β1 = 0, θ > 0), or both (β1 > 0, θ > 0). For each case, the model parameters were the same for all simulations, so that variation in values of a given R2 among datasets is caused by random sampling from the same statistical process.
For LMM, data were simulated with the model

Yi = β0 + β1xi + b[ui] + ei        (7)

where xi follows a Gaussian distribution with mean 0 and variance 1, the random effect ui has 10 levels, and b follows a normal distribution with mean 0 and variance θ. I selected parameter values to generate moderate R2 values. For GLMMs, values from equation (7) without the residual error term ei were used through a logit link function (equation (1)) to produce binomial probabilities for a binary model. Models were fit using lmer and glmer in the lme4 package of R (Bates et al. 2014).
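A sketch of this simulation in Python (the study itself used R); the parameter values β0, β1, θ, and the number of replicates per level are hypothetical choices, not the values used in the paper:

```python
import random

# Simulate equation (7): Y_i = beta0 + beta1*x_i + b[u_i] + e_i, with
# x_i ~ N(0, 1), a random effect with 10 levels, b ~ N(0, theta), and unit
# residual variance. Parameter values are hypothetical.

random.seed(1)
beta0, beta1, theta = 0.0, 0.5, 0.4
n_levels, reps = 10, 4

b = {lev: random.gauss(0.0, theta ** 0.5) for lev in range(n_levels)}
data = []
for lev in range(n_levels):
    for _ in range(reps):
        x = random.gauss(0.0, 1.0)
        y = beta0 + beta1 * x + b[lev] + random.gauss(0.0, 1.0)
        data.append((lev, x, y))
# data holds (level, x, y) triples ready to be fit with an LMM
```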
For the PGLS model, to obtain the covariance matrix Σ(θ) in equation (1), I first simulated random phylogenetic trees using the rtree function of the ape package of R (Paradis, Claude & Strimmer 2004); thus, a different tree was simulated for each dataset. The strength of phylogenetic signal was varied using Pagel’s λ transformation, which served as the variance parameter θ. Values of xi were simulated under the BM assumption using the rTraitCont function (Paradis, Claude & Strimmer 2004). The simulated data were fit by maximum likelihood with the function phylolm, assuming Pagel’s λ transformation, in the package phylolm in R (Ho & Ane 2014). The PLOG model was similar to the PGLS model, but in contrast the predictor variable xi was assumed to be independently distributed; including phylogenetic signal in xi caused challenges for model fitting for some simulated datasets, making the simulation studies difficult. Phylogenetic signal in the residuals ei was controlled by setting Σ(θ) = θΣBM, so that in the absence of phylogenetic signal (θ = 0) the simulations conformed to simple logistic regression. To simulate binary data, a logit link function was used in equation (1).
Simulated examples
I illustrate the three R2s for two simulated examples. The first involves a PGLS model for two predictor variables and computes the partial R2s for one of them. For example, suppose a researcher has data on sprint speed in lizards, and the predictor variables are hind leg length and body size (Bauwens et al. 1995). Hind leg length is the variable of interest, while body size is a covariate, and therefore the interesting R2 compares the models with and without hind leg length. I simulated the case of 30 species under BM evolution and compared the cases in which hind leg length (as a proportion of body size) either did or did not show phylogenetic signal. Specifically, the model was

log(sprint speed) = β0 + β1x1 + β2x2 + e        (8)

where x1 = log(body size) was selected from a Gaussian distribution with covariance matrix ΣBM, and x2 = log(hind leg length) was selected from a Gaussian distribution with covariance matrix either I or ΣBM. The data were fit using phylolm with Pagel’s λ transformation.
As a second example, I simulated LMMs and binary GLMMs using equation (7), and fit them not only as LMM and GLMMs, but also as PGLS and PLOG models. The fitting as phylogenetic models was performed by converting the covariance matrix given by the random effects in the LMM and GLMM into a phylogenetic tree (Fig. 1) using the vcv2phylo function in ape (Paradis & Paradis 2012). This simulation allows a direct comparison between the R2s applied to mixed versus phylogenetic models.
RESULTS
The R2s were assessed according to the three properties: (i) their ability to measure goodness-of-fit as benchmarked by the LLR of the full model and the model with only an intercept, (ii) whether they can partition sources of variation in the model as partial R2s, and (iii) how precise their inference about goodness-of-fit is. Property (iii) treats the R2s as if they were estimators of goodness-of-fit and asks how variable the estimates are when applied to repeated simulations from the same model (e.g., Cameron & Windmeijer 1996). A more comprehensive assessment is given in Appendix 2.
Goodness-of-fit
Figure 2 plots the total R2s against the corresponding LLR. All R2s were positively related to the LLR, which is a minimum requirement for an R2. R2lr shows a monotonic relationship with LLR, which is necessarily the case due to the definition of R2lr (equations (5), (6)). For the remaining R2s, values for a given LLR were generally lower for simulations in which variation was produced only by the fixed effect (β1 > 0, θ = 0; Fig. 2, blue circles). This implies that, relative to the LLR, these R2s were attributing less “explained” variance to regression coefficients (fixed effects) than to covariance parameters (phylogeny and random effects).
For the LMM, I included the adjusted R2, R2adj computed from OLS regression by treating the random effect as a categorical fixed effect. R2ls and R2adj were almost identical. This correspondence implies that R2ls gives an R2 that is interpretable in the same way as the standard R2adj but generalized to LMMs.
All of the R2s showed greater scatter in their relationships with LLR for the simulations of binary data (GLMM and PLOG). In part, this is due to the difficulty of estimating variance parameters θ in binomial models. For example, there is more scatter in R2lr for GLMM simulations than LMM simulations, even though the criterion for calculating the R2s (the log likelihoods) is the same. The scatter seems particularly large for R2ls and R2ce applied to PLOG simulations, although this case requires some technical discussion. For PLOG, the LLR was obtained from phyloglm using penalized maximum likelihood, whereas R2ls and R2ce were estimated from the model fit by binaryPGLMM using the pseudo-likelihood. The phyloglm estimate of phylogenetic signal, λ, tended to be absorbed at zero even when the estimate of λ from binaryPGLMM was positive; therefore, R2ls and R2ce could be positive even when the LLR was zero. Previous comparisons between phyloglm and binaryPGLMM showed that they have similar performance but do not necessarily give the same conclusions about the presence of phylogenetic signal for the same dataset (Ives & Garland 2014).
Partitioning sources of variation
The partial R2ls, R2ce, and R2lr were generally able to partition sources of variation between components of a model, in particular between regression coefficients (fixed effects) and covariance parameters (random effects). Simulations with β1 > 0 and θ = 0 should have partial R2s for β1 that are positive and partial R2s for θ that are zero (blue circles, Fig. 2). Simulations with β1 = 0 and θ > 0 should have partial R2s for β1 that are zero and partial R2s for θ that are positive (red triangles, Fig. 2). Simulations with β1 > 0 and θ > 0 should have both partial R2s > 0 (black x’s, Fig. 2). Because the values of β1 and θ were the same whether or not the other was zero, the partial R2s for β1 should be the same for simulations with θ = 0 (blue circles) as for simulations with θ > 0 (black x’s), and the partial R2s for θ should similarly be the same for β1 = 0 (red triangles) and β1 > 0. For continuous data (LMM and PGLS), all three R2s had similar performance and similar values of the partial R2s (see also Appendix 2). For binary data (GLMM and PLOG), the three R2s showed more scatter, which in large part is due to the greater statistical challenge of estimating regression coefficients and variance parameters from discrete data. This is seen, for example, in the GLMM and PLOG simulations with β1 > 0 and θ > 0, which sometimes gave a partial R2lr for θ of zero (black x’s); these cases occur when the estimate of θ was zero even though a non-zero value was used in the simulations.
Inference about underlying process
The ability of R2s to infer the fit of the statistical process to the model depends on the precision of the estimates of R2. Figure 4 plots the mean values of the R2s with 66% and 95% inclusion intervals for simulated datasets with sample sizes 40, 60, …, 160. For LMMs and GLMMs, there were 10 levels of the random effect; datasets were produced by first simulating 160 samples (16 replicates at each level) and then randomly removing two replicates at each level to reduce the sample size in steps of 20. For PGLS and PLOG, each dataset at each sample size was simulated independently.
For LMM simulations, R2ls, R2ce, and R2adj showed similar patterns (Fig. 4), reflecting the fact that they give very similar values (Fig. 2, Appendix 2). Mean values did not change with sample size, and there was only a moderate increase in variability among simulations as sample size decreased. In contrast, mean values of R2lr decreased with decreasing sample size. This probably reflects the information that is lost when estimating the model parameters. In contrast to the LMM simulations, the PGLS simulations showed less change in the means of R2lr and R2ce with sample size, presumably because there were more covariances among samples (i.e., the covariance matrix had more non-zero elements) than in the LMM with few replicates per level.
For the GLMM, both R2ls and R2ce had somewhat higher variances (less precision) than R2lr. The greater variation in values of R2ls and R2ce compared to R2lr may occur because R2ls and R2ce depend on estimates from the models (random effects for R2ls and fitted values for R2ce), whereas R2lr depends only on likelihoods. Thus, R2ls and R2ce are compromised when the estimates are poor, as is particularly the case when sample sizes are small. For PLOG, the variances of R2ls and R2ce were similar to that of R2lr (Fig. 4). This is likely because estimates of phylogenetic signal (λ = θ) were well-bounded, in contrast to the variance in random effects in the GLMMs.
Simulated examples
In the simulated example of sprint speed regressed on log body mass (x1) and log hind limb length (x2) (equation (8)), whether or not log hind limb length showed phylogenetic signal had a large effect on the partial R2s for the effect of log hind limb length (Table 1). In the fitted PGLS models for both datasets, the parameter estimates and log likelihoods were similar, and the only indication that phylogenetic signal in hind limb length affected the fit of the model was the p-value for the regression coefficient for hind limb length (P = 0.012 when x2 had phylogenetic signal and P << 0.001 when it did not). The partial R2ls, R2ce, and R2lr for hind limb length were 0.21-0.23 when hind limb length had phylogenetic signal, and 0.71-0.89 when it did not. In contrast, the partial R2s for phylogenetic signal (reduced model with λ = 0) and the total R2s (reduced model with x1 = x2 = λ = 0) did not differ much between simulations. The partial R2s for hind limb length depended upon the phylogenetic signal in hind limb length because, when there is phylogenetic signal and hind limb length is removed from the model, much of the information is recaptured in the phylogenetic signal of the residual variation. This example illustrates the value of having partial R2s that can assess the role of predictor variables separately from other variables, like body size, that are not of specific interest.
In the second example (Table 2), an LMM and a GLMM were simulated, and the data were then fit using LMM and GLMM, and also PGLS and PLOG by converting the covariance matrix given by the random effects into a phylogeny (Fig. 1). R2ls, R2lr, and R2ce were computed for the total model, as well as partial R2s for the fixed effect x and the random effect θ. As expected, R2lr was the same for mixed and phylogenetic models. Values of R2ce were also close between mixed and phylogenetic models. For the LMM simulation, R2ls calculated for the LMM was very close to both R2ce and the R2ols computed by treating the random effect as a categorical fixed effect. However, the values of R2ls calculated from PGLS were lower. This occurred because the scaling of the covariance matrix Σ differs between LMM and PGLS: for LMM, R2ls scales Σ so that the residual error corresponds to 1, whereas for PGLS, R2ls scales Σ so that the total branch length equals 1. Although this gives different values of R2ls for LMM and PGLS, it avoids forcing R2ls to equal 1 when the residual error matches its expectation under Brownian motion (BM) evolution. For the GLMM simulation, R2ls from the fitted GLMM is higher than R2ls from the fitted PLOG, although R2ls for both models is higher than R2lr and R2ce.
DISCUSSION
R2ls, R2ce, and R2lr are presented here with focus on phylogenetic models, although they are broadly applicable to models with correlated errors. Below, I first address their specific application to phylogenetic models using mixed models as a reference, and then give general recommendations.
Applications to LMM, PGLS, GLMM, and PLOG
For both continuous data (LMM and PGLS) and discrete data (GLMM and PLOG), all R2s had good performance. For the simple model with a single regression coefficient β1 and a single covariance parameter θ (equation (7)), all R2s were reasonable measures of goodness-of-fit, as assessed against the log likelihood ratio between full and reduced models (Fig. 2). Nonetheless, R2ls and R2ce gave lower partial R2 values for the regression coefficient β1 relative to the partial R2 values for the covariance parameter θ, in comparison to R2lr and the log likelihood ratio, LLR (Fig. 2). Also, although all three R2s gave very similar values for the same dataset with continuous data, the values differed more for discrete data fit with either GLMM or PLOG (Fig. 2 and Appendix 2). This is reflected in general by the decreased precision of the R2s applied to discrete data, as measured by the variation in values when fit to data simulated under the same parameter values (Fig. 4). All R2s were capable of identifying whether β1 or θ was responsible for the fit of the model to the data as determined by the partial R2s; when β1 or θ was zero in the simulations, the partial R2s for β1 or θ, respectively, were low (Fig. 3). However, the partial R2s for GLMM and PLOG tended to be more variable and less conclusive than for LMM and PGLS (Fig. 3). Finally, R2lr decreased as sample sizes decreased, especially for LMMs but also for GLMMs (Fig. 4). This is an understandable consequence of the loss of information with which to separate full and reduced models when there are fewer data. Nonetheless, it is an undesirable property, just as the dependence of the unadjusted OLS R2 on sample size is undesirable.
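The full-versus-reduced structure shared by all three R2s is easiest to see in the ordinary least-squares case, where a partial R2 for one predictor is 1 − SSE(full)/SSE(reduced). The sketch below uses invented data with two predictors; it is an OLS analogue of the partial R2s discussed here, not the phylogenetic or mixed-model versions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 + 1.0 * x2 + rng.normal(size=n)  # invented coefficients

def sse(predictors, y):
    """Residual sum of squares from an OLS fit (intercept always included)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

full = sse([x1, x2], y)
reduced_x2 = sse([x1], y)  # reduced model: drop x2
total = sse([], y)         # intercept-only model

partial_R2_x2 = 1 - full / reduced_x2   # contribution of x2 given x1
total_R2 = 1 - full / total             # classic OLS R2
print(round(partial_R2_x2, 3), round(total_R2, 3))
```

The three R2s in this paper keep this same "one minus full over reduced" form but replace the sum of squares with variance-weighted residuals (R2ls), conditional predictions (R2ce), or likelihoods (R2lr), which is what lets the reduced model drop a covariance parameter such as θ rather than only a predictor.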
The poorer performance of all three R2s for GLMM and PLOG relative to LMM and PGLS, in terms of partitioning sources of variation (Fig. 3) and precision (Fig. 4), is due to the greater challenges of fitting discrete data. This will affect the three R2s differently to the extent that they are sensitive to different parts of the fit: R2ls is calculated from fitted variances in a model; R2ce is calculated from the fitted values of Yi; and R2lr is calculated from the likelihood. Therefore, each R2 will be sensitive to the precision with which its corresponding attribute is estimated. For GLMMs, the precision of R2lr was slightly greater than that of the other two R2s, although this did not appear to be the case for PLOG.
Although it is hard to argue in favor of one R2 over the others on the basis of their performance in the simulations, R2ls has the advantage of producing R2glmm(c) (Nakagawa & Schielzeth 2013) as a special case when applied to LMMs and GLMMs. Furthermore, because R2ls (like the other R2s) is defined as a partial R2, it makes application to LMMs and GLMMs more flexible and general, allowing subsets of fixed and random effects to be analyzed, and also more-complex structures like random slopes (Johnson 2014). Nonetheless, a partial R2glmm(c) comparable to R2ls can easily be defined as
Although the relationship between R2ls and R2glmm(c) might argue in favor of R2ls over R2ce and R2lr, when applied to data simulated from LMM and GLMM, R2ls values calculated from fitted LMM and GLMM were different from the values calculated from PGLS and PLOG fitted to the same data (Table 2). This highlights a weakness of R2ls: a decision has to be made about how to scale the covariance matrix Σ depending on the fitted model, and the resulting values of R2ls will depend on this decision.
Recommendations
An ideal R2 would make it possible to compare among different models and among different methods used to fit the same model (Kvalseth 1985, properties #4 and #5 of a good R2). R2ls and R2ce can be used for any model and fitting method that estimates the covariance matrix (R2ls) and/or fitted values (R2ce); for example, they could be used to compare LMMs fit with ML vs. REML, or binary phylogenetic models fit with ML (e.g., phyloglm; Ho & Ane 2014) or quasi-likelihood (e.g., binaryPGLMM; Ives & Garland 2014). Nonetheless, R2ls and R2ce have a disadvantage in terms of generality. For R2ls, a decision must be made about how to scale the covariance matrix V(θ) (equation (3)), and for R2ce, a decision must be made about how values of Yi are predicted; the conventions I used for LMMs and PGLS differed. In contrast, R2lr is restricted to models that are fit with ML estimation; however, if ML is used for fitting, then values of R2lr can be compared across different types of models. This applies to any type of data and model fit with ML estimation.
An ideal R2 should also be intuitive (Kvalseth 1985, property #1). However, intuitiveness is in the eye of the beholder. R2ls is the most similar to the OLS R2, which grounds R2ls in the familiar and intuitive OLS framework. R2ce predicts the data from covariances estimated in the model, and therefore could be viewed as the most intuitive way to relate the variance explained by regression coefficients (fixed effects) to that explained by variance parameters (random effects). R2lr is also related to the OLS R2: in LMMs and PGLS, R2lr only differs from R2ls by the way in which the covariance matrix V(θ) (equation (3)) is scaled, and this provides a link between R2lr and the OLS R2 through R2ls. This said, however, I suspect that different researchers would rank the intuitiveness of R2ls, R2ce, and R2lr differently.
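One concrete way to see the link between R2lr and the OLS R2: for an ordinary linear regression fit by ML, the likelihood-ratio form R2lr = 1 − exp(−(2/n)(logL_full − logL_reduced)) (the Cragg & Uhler construction; the Nagelkerke rescaling used for binary data additionally divides by its maximum) reduces exactly to the classic OLS R2, because the profiled Gaussian log likelihood depends on the data only through the residual sum of squares. A quick numerical check on simulated data (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)  # invented slope, unit noise

def ols_sse(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ones = np.ones((n, 1))
sse_full = ols_sse(np.column_stack([ones, x]), y)
sse_reduced = ols_sse(ones, y)  # intercept-only reduced model

def logL(sse):
    # Profiled Gaussian log likelihood with sigma^2 = SSE/n
    return -n / 2 * (np.log(2 * np.pi * sse / n) + 1)

R2_lr = 1 - np.exp(-2 / n * (logL(sse_full) - logL(sse_reduced)))
R2_ols = 1 - sse_full / sse_reduced
print(np.isclose(R2_lr, R2_ols))  # prints: True
```

The equivalence is algebraic: logL_full − logL_reduced = −(n/2) log(SSE_full/SSE_reduced), so the exponential collapses to SSE_full/SSE_reduced, which is the OLS residual ratio.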
R2s are often used as “summary statistics” to describe the fit of a model to data in a way that does not involve statistical inference about the underlying stochastic process that generated the data: “How does the model fit these data?” rather than “How much does the model infer about the process that generated the data?” Should R2s be judged as summary statistics? I think not. All the R2s showed high variation among simulations of the same model with the same parameters, especially when sample sizes were small (Fig. 4). This means that how the model fits a specific dataset involves a lot of chance, and hence one should not get too excited about a high R2, or too discouraged about a low one. R2s are best treated as inferential statistics, that is, as functions of a data-generating process that are themselves random variables (Cameron & Windmeijer 1996; Nakagawa & Schielzeth 2013). As an inferential statistic, R2lr ties most directly to hypothesis testing between full and reduced models using a likelihood ratio test. For me, this tips the balance to favor R2lr over the others.
FUNDING
Financial support came from the US National Science Foundation, DEB-LTREB-1052160 and DEB-1240804.
SUPPLEMENTARY MATERIAL
Appendix 1: Details about the implementation of the R2s.
Appendix 2: More comparisons among the R2s in the R package rr2.
Supplementary files: source R scripts for computing R2ls, R2lr, and R2ce, with examples.
ACKNOWLEDGMENTS
I thank Ted Garland, Daijiang Li, Shinichi Nakagawa, Eric Pedersen, and Joe Phillips for wonderfully insightful comments that helped to clarify this article.