Doors and corners of variance partitioning in statistical ecology

Variance partitioning is a common tool for statistical analysis and interpretation in both observational and experimental studies in ecology. Its popularity has led to a proliferation of methods with sometimes confusing or contradictory interpretations. Here, we present variance partitioning as a general tool in a model-based Bayesian framework for summarizing and interpreting regression-like models. To demonstrate our approach, we present a case study comprising a simple occupancy model for a metapopulation of the Glanville fritillary butterfly. We pay special attention to the thorny issue of correlated covariates and random effects, and highlight uncertainty in variance partitioning. We recommend several alternative measures of variance, which can jointly be used to better interpret variance partitions. Additionally, we extend the general approach to encompass partitioning of variance within and between groups of observations, an approach very similar to analysis of variance. While noting that many troublesome issues relating to variance partitioning, such as uncertainty quantification, have been neglected in the literature, we also feel that the rather general applicability of these methods as an extension of statistical model-based analyses has not been fully utilized by the ecological research community.


Introduction
In ecology, we are often interested in determining the processes and conditions that modify or maintain the distribution of species and the composition of ecological communities (Hubbell 2001, Ives et al. 2003, Vellend 2010, Ovaskainen et al. 2016). Key questions are what abiotic and biotic variables affect individual species and communities, what have been the relative roles of these variables in the past, and what will they be in the future (e.g. MacKenzie et al. 2007, Peres-Neto et al. 2006, Estay et al. 2014, Vuorinen et al. 2015, Kahilainen et al. 2018). These questions are intertwined but still largely separate. For example, an environmental variable such as temperature can affect species survival and reproduction rate in general but, if the study population experiences only moderate temperature variability, the contribution of temperature to variation in population size might be negligible in a particular study site (see e.g. Tomkiewicz et al. 1998, Veneranta et al. 2011). It is partially this distinction between potential variation and realized variation that has motivated our work and probably contributed to the large array of variance partitioning methods. In experimental research, the relative importance of alternative variables in explaining variation in a response variable is traditionally studied using the analysis of variance (ANOVA) approach, which provides a direct method for separating background variation from treatment effects under controlled conditions (Berger 2002, Gelman 2005). In theoretical studies, the relative importance of alternative driving processes can be examined by mechanistic (first principles) models, either by analytically studying their behaviour or by simulation (Ovaskainen et al. 2016).
However, mechanistic models and the large ecological experiments that would be necessary to directly study ecological processes at large scales and over longer time spans are feasible only in limited cases (cf. Cressie and Wikle 2011, Péron and Koons 2012, McCallum 2016).
Consequently, empirical studies usually rely on observational data, often combined with model-based statistical inference (Mac Nally 1996, Ives et al. 2003). For example, species distribution models and a variety of regression-based approaches are used to associate the effects of covariates and structured random effects with the presence and abundance of species or the composition of communities. In these circumstances, where direct interpretation of effect sizes is hampered by lack of experimental control and correlations among model predictors, statistical variance partitioning methods enter the stage as an aid for interpreting the results of these regression-like models (Pedhazur 1997, pp. 241-282, Legendre and Legendre 2012). Variance partitioning is used towards various ends, such as understanding how variation is distributed at different hierarchical levels in nested experiments or multi-scale studies (Searle et al. 1992, Cushman and McGarigal 2002, Goldstein et al. 2002, McMahon and Diez 2007) and determining the relative importance of a covariate in terms of model performance or contribution to the process studied (Chevan and Sutherland 1991, Bring 1995, Mac Nally 1996). In other cases, the interest lies simply in separating the unique and shared contributions of covariates (Ray-Mukherjee et al. 2014). We also find methods such as ordination (Borcard et al. 1992), standardized regression coefficients (Bring 1994, 1995), variance partitioning coefficients (Goldstein et al. 2002), and factor analysis (Riitters et al. 1995) being used for variance partitioning.
The abundance of applications and methods goes hand in hand with the ad-hoc nature of many approaches to variance partitioning and the determination of the relative importance of covariates.
Often, the use of variance partitioning arises from a specific need, or for a specific kind of model. In such an applied context, the potential for rather wide applicability of the general approach does not become evident. These applications may also take shortcuts that do not respect the model structure or do not directly answer the question they are expected to answer (Pedhazur 1997, pp. 241-282). Furthermore, clear recommendations for how to interpret results and what should be reported when utilizing variance partitioning are lacking. To give an example, only a few studies report uncertainties in variance partition estimates or indicate the strength of correlations between variance components (but see Peres-Neto et al. 2006, Ray-Mukherjee et al. 2014, Yuan et al. 2017, Schulz et al. 2020).
As a further challenge, statistical approaches rarely fulfill the dream of causal understanding (Wright 1921, Burks 1926, Levins and Lewontin 1982, Cooper 1998, Cressie and Wikle 2011), and meaningful interpretation of estimated effect sizes of covariates and variance parameters of so-called random effects is challenging when modelling observational data. While at best the effect sizes reflect the actual importance of the covariates for the species or community studied, even then they might not reflect how important these covariates are for predicting variability in population size or community composition over space or time. The importance of the covariates in any specific context depends not only on the existence of some underlying, and usually unconfirmed, causal relationship but also on the variability in the covariates themselves. To give a concrete example, the Baltic Sea is a brackish water basin hosting partially endemic fish species, such as Baltic herring, sprat, pikeperch, and turbot, which are all sensitive to water salinity. Even though the salinity gradient is large throughout the Baltic Sea, salinity predicts only very little of the interannual variation in the species' reproduction rates, as there is generally little interannual variation in water salinity in the regions where these fish species occur (e.g. Lappalainen and Lehtonen 1995, Casini et al. 2006, Nissling et al. 2006). It is also often the case that the survey data at hand do not represent the full extent of variation in the covariates in the areas where the species occur (Vernier et al. 2008, cf. Foster et al. 2021). In either case, if the data contain little or no variation with respect to a covariate, then there is no statistical way to assess its effect, and thus no variation in the outcome can be attributed to that covariate, whether it is important or not (Pratt 1987).
In extreme cases the situation is akin to the lack of genetic variability in an allele fixed either due to its high fitness advantage or simply due to genetic drift (cf. Whitlock and Gomulkiewicz 2005).
Keeping the preceding admonitions in mind, we follow Gelman (2005), Gelman et al. (2014, pp. 381-400) and Gelman et al. (2019) in presenting a model-based approach to variance partitioning. Our goal is to quantify the contributions of individual model components to variation in the quantity of interest, such as species abundance or community composition. A similar Bayesian approach has been utilized, e.g., by Yuan et al. (2017), who used it as a model checking tool for a spatiotemporal species distribution model, by Schulz et al. (2020), where we used it to explore occupancy-abundance relationships in a butterfly metapopulation, and in the joint species distribution modelling software HMSC (Ovaskainen and Abrego 2020, Tikhonov et al. 2020).
Instead of focusing only on a specific application, we present variance partitioning as a more general tool for the analysis and interpretation of statistical models. We also demonstrate how to avoid some of the pitfalls of conducting and interpreting variance partitioning studies present in earlier work, and show its applicability to predictive scenarios. We specifically address topics such as correlations between covariates, confounding by random effects, and correlated parameters, which have received little attention in the contemporary literature (but see Wakefield 2003, 2004, Yuan et al. 2017). Moreover, we propose variance partitioning for groupings of covariates as well as conditioning on observations, and present some common as well as new visualization techniques for variance partitions. We derive and discuss the methods in a generalized linear model setting, but variance partitioning is, at least partially, applicable to a wider class of statistical models, namely any hierarchical additive model (Gelman 2005), including traditional GLMs and GAMs as well as their extensions with structured and unstructured random effects (Hodges et al. 2007).
The rest of the paper is organized as follows. We first introduce the basic concepts behind variance partitioning in Section 2.1, after which we introduce the theory for its extensions in Sections 2.2-2.7. In Section 3, we demonstrate the different variance partitioning techniques with a case study using a simple species distribution modeling (SDM)-like regression model, which we apply to the presence-absence of an insect herbivore in a metapopulation context. After Section 2.1, practically oriented readers can jump directly to the case study and refer to the theory after reading the motivating examples. For researchers interested in applying these methods, we also provide worked examples, including all necessary code, in the Programming supplement.

Introduction to variance partitioning
To better understand how variance partitioning works and is easiest to interpret, we first look at an idealized scenario from population ecology, where a closed population reproduces according to a simple time-varying Ricker population model (Ricker 1954). The Ricker model for a population of size $n_t$ at time $t$ can be written as (Ruppert and Carroll 1985, Collie et al. 2012, Weigel et al. 2021)

$$n_{t+1} = n_t \exp\!\left(\beta_1^\top x_t + \beta_2 n_t + \varepsilon_t\right),$$

where $\beta_1^\top x_t$ stands for the intrinsic population growth as a function of environmental covariates $x_t$, $\beta_2$ $(<0)$ is a parameter governing density-dependent survival, and $\varepsilon_t$ models intrinsic stochasticity in the population dynamics. Thus, we can describe the log growth rate $r_t = \log(n_{t+1}/n_t) = \beta_1^\top x_t + \beta_2 n_t + \varepsilon_t$ of this idealized population with a linear model whose parameters have an intuitive ecological interpretation. Hence, we would hope that calculating and interpreting the variance partitioning were straightforward as well. However, this is not the case in practice.
Let us first assume that we know the parameters of the model. A natural first attempt is to partition the variance of the log growth rate among the three model components, for example

$$\text{VP}_x = \frac{\text{Var}[\beta_1^\top x]}{\text{Var}[r]},$$

where the variances are sample variances over the observations collected in vectors, such as $r = (r_1, \dots, r_T)$. In order for these terms to summarize the proportions of variation, they should sum up to one. However, they sum up to one only in the special case that the three components of the Ricker model, $\beta_1^\top x$, $\beta_2 n$ and $\varepsilon$, are mutually uncorrelated (Ives et al. 2003). If the correlation between $\beta_1^\top x$ and $\beta_2 n$ is positive, then the sum of the variance terms is smaller than $\text{Var}[r]$, and if they are negatively correlated, then the sum of the variance terms is larger than the variance of $r$.
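As a numerical illustration, the following Python/NumPy sketch simulates a Ricker model with a temporally autocorrelated environmental covariate and computes the naive variance partition. All parameter values are hypothetical and the code is only a sketch of the idea, not the model of the case study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Time-varying Ricker model with an autocorrelated environment
# (all parameter values below are hypothetical).
T = 2000
a, b1, b2, sigma = 0.5, 0.5, -0.05, 0.2
x = np.zeros(T)
for t in range(1, T):                      # AR(1) environmental covariate
    x[t] = 0.8 * x[t - 1] + rng.normal(0.0, 0.6)
eps = rng.normal(0.0, sigma, size=T)
n = np.empty(T)
n[0] = 10.0
for t in range(T - 1):                     # n_{t+1} = n_t exp(a + b1 x_t + b2 n_t + eps_t)
    n[t + 1] = n[t] * np.exp(a + b1 * x[t] + b2 * n[t] + eps[t])

r = np.log(n[1:] / n[:-1])                 # realized log growth rates
env, dens, noise = b1 * x[:-1], b2 * n[:-1], eps[:-1]

# Naive variance partition: component sample variances over Var[r].
vp = np.array([np.var(env), np.var(dens), np.var(noise)]) / np.var(r)
print(vp, vp.sum())
```

Because population density tracks the autocorrelated environment, the component covariances are non-zero and the naive partitions generally do not sum to one.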
In ecological applications, it is common to have temporally or spatially correlated environmental covariates, and we refer to this phenomenon as the correlated covariates challenge (see also Cox and Snell 1974, Pedhazur 1997, pp. 241-282, Graham 2003, Dormann et al. 2013, Morrisey and Ruxton 2018). The example of fisheries in the Baltic Sea above serves to illustrate the problem: salinity, water temperature, and depth can all play a role in determining spawning areas for many of the fish species, but they are hard to disentangle as they are all highly correlated with each other (Tomkiewicz et al. 1998, Janssen et al. 1999, MacKenzie et al. 2007, Veneranta et al. 2016). A related challenge that has received less attention in the ecological literature is the challenge posed by correlation between the random effects and the covariates (Wakefield 2004, Yuan et al. 2017). The random effects are typically defined to be a priori independent from the other model components. However, most often our model does not contain all relevant covariates: we may have missed a covariate that is correlated with $x$. When estimating the random effects from data, the effect of this missing covariate will be captured, at least partially, by $\varepsilon$, leading to correlation between $\beta_1^\top x$ and $\varepsilon$. Correlation between $\beta_1^\top x$ and $\varepsilon$ can also arise for other reasons, such as when a model includes both a spatial random effect and covariates with spatial structure (Teng et al. 2018). We call this phenomenon the confounded random effects challenge.
One approach to surpass the correlated covariates and confounded random effects challenges is to calculate the variance partitioning over the posterior distribution of the model parameters, that is, to summarize it by the posterior expectation $E[\text{VP}(\theta) \mid y]$ together with its posterior uncertainty. The uncertainty in variance partitioning is not, however, typically reported. What is traditionally done (e.g. Cushman and McGarigal 2002, Legendre and Legendre 2012) is that the model parameters are summarized with a point estimate $\hat{\theta}$, such as the posterior mean, or more commonly the least squares or maximum likelihood estimate, after which $\text{VP}(\hat{\theta})$ or some similar measure is reported (but see e.g. Goldstein et al. 2002, Yuan et al. 2017). These estimates suffer from the correlated covariates and confounded random effects challenges as discussed above. However, there is even more to the story. In addition to neglecting the uncertainty in variance partitioning, there is no guarantee that $\text{VP}(\hat{\theta})$ is actually a good summary for the posterior expectation $E[\text{VP}(\theta) \mid y]$. These summaries can easily be approximated using samples from the posterior distribution $p(\theta \mid y)$, as it is sufficient to know the distribution of the parameters to calculate the expectation of functions of these parameters (Casella and Berger 2002, p. 55; see also Section 3, Mathematical supplement 2.7, and the Programming supplement). This approach avoids the bias that substituting point estimates into the formulas directly introduces, and is used for example in HMSC (Table 1). However, if we then only communicate the posterior means, we lose knowledge of how the uncertainty in model parameters propagates to the variance partitions. The least we should do is to also report the posterior variance of the estimates.
In the Bayesian context, we must also consider what happens when the model parameters are mutually correlated in their posterior distribution, that is, for example, $\text{Cor}[\beta_1, \beta_2 \mid y] \neq 0$. The posterior correlations between parameters translate into posterior correlations between the variance partition measures, such as $\text{VP}_1$ and $\text{VP}_2$. In that case, even the marginal distributions of the variance partitions are not necessarily good summaries of their joint posterior distribution (see Sections 2.6 and 3.6).
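The difference between plugging a point estimate into the variance partition and averaging the partition over posterior draws can be sketched as follows. The posterior draws below are hypothetical stand-ins for MCMC output:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior draws for two coefficients; in a real analysis
# these would come from an MCMC sampler.
S = 4000
beta = rng.multivariate_normal([1.0, 0.5],
                               [[0.04, -0.03], [-0.03, 0.04]], size=S)
X = rng.multivariate_normal([0.0, 0.0],
                            [[1.0, 0.6], [0.6, 1.0]], size=200)

def vp_first_term(b):
    """Sample variance partition of the first linear term f1 = b[0] * x1."""
    F = X * b                      # linear terms as columns
    eta = F.sum(axis=1)            # linear predictor
    return np.var(F[:, 0]) / np.var(eta)

# One variance partition per posterior draw, then posterior summaries.
vp_draws = np.array([vp_first_term(b) for b in beta])
post_mean, post_sd = vp_draws.mean(), vp_draws.std()
plug_in = vp_first_term(beta.mean(axis=0))  # plug-in estimate VP(beta_hat)
print(post_mean, post_sd, plug_in)
```

Reporting `post_mean` together with `post_sd` (or posterior quantiles) communicates how parameter uncertainty propagates to the partition, which the plug-in value alone cannot do.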

Variance partitioning of the linear predictor
Moving on to a general regression context, we discuss variance partitioning in the context of the linear predictor. In a generalized linear model, the linear predictor is $\eta = g(E(y))$, a transformation of the expected value of the outcomes to the linear scale using a link function $g$.
The linear predictor $\eta$ is a vector of length $n$ that is formed as the sum of vectors of linear terms, $\eta = \sum_{j=1}^{J} f_j$, where $f_j = \beta_j x_j$ and $x_j$ is a length-$n$ vector of covariates. As far as variance partitioning is concerned, the linear terms could equally be additive or random effects.
The variance over the observations of the linear predictor in (generalized) linear models is $\text{Var}[\eta]$. Hence, its decomposition follows from the properties of variances of linear combinations (Mulaik 2010, pp. 73-76, 83-84) and can be written as a sum of the covariances of the linear terms:

$$\text{Var}[\eta] = \sum_{j=1}^{J} \sum_{k=1}^{J} \text{Cov}[f_j, f_k] = \sum_{j=1}^{J} \left( \text{Var}[f_j] + \sum_{k \neq j} \text{Cov}[f_j, f_k] \right), \tag{5}$$

and the variance partition for the $j$'th linear term is $\text{VP}_j = \text{Var}[f_j] / \text{Var}[\eta]$. We could also calculate the variance partitioning for non-linear joint effects of two or several covariates, such as $f(x_j, x_{j'})$. This is used, for example, when calculating the variance contribution of continuous spatial random effects.
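The decomposition can be verified numerically. The sketch below, with hypothetical coefficients and covariate distribution, checks that the sample variance of the linear predictor equals the sum of all pairwise covariances of the linear terms:

```python
import numpy as np

rng = np.random.default_rng(3)

# Three correlated covariates and (assumed known) coefficients.
n = 500
beta = np.array([1.0, -0.5, 0.8])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=n)

F = X * beta                 # columns are the linear terms f_j
eta = F.sum(axis=1)          # linear predictor

# Var[eta] equals the sum of all pairwise covariances Cov[f_j, f_k].
C = np.cov(F, rowvar=False, bias=True)
print(np.var(eta), C.sum())
```

The diagonal of `C` holds the variance terms of the partition; the off-diagonal entries are the covariance terms that prevent the plain partitions from summing to one.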
The approach is not limited to partitioning the variation in the linear predictor, but can also be used for the more traditional partitioning of explained variation, $R^2$, when working with a linear model or a model with linearizable residuals. If we partition variance while accounting for the residual variance, we obtain a measure that is equal to the traditional $R^2$ when defined as $R^2 = \text{Var}[\eta] / (\text{Var}[\eta] + \text{Var}[\varepsilon])$, where $\varepsilon$ denotes the residuals. The residuals can be calculated either with respect to the observations $y$ or the posterior predictive distribution $\tilde{y}$, as is done in the Bayesian $R^2$ approach (Gelman et al. 2019), but we do not consider this choice further here.
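A minimal sketch of this residual-inclusive partition, using simulated data and hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(9)

# Linear model with residuals; partitioning against total variation gives
# the familiar R^2 (coefficients and noise level are hypothetical).
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
eta = 1.0 * x1 + 0.5 * x2                 # linear predictor
y = eta + rng.normal(0.0, 0.8, size=n)    # observations
res = y - eta                             # residuals w.r.t. the observations

total = np.var(eta) + np.var(res)
r2 = np.var(eta) / total                  # explained proportion of variation
vp_x1 = np.var(1.0 * x1) / total          # x1's share of the total variation
print(r2, vp_x1)
```

Dividing each term's variance by the total (predictor plus residual) variance scales the partitions so that they describe shares of explained rather than predicted variation.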

Grouping linear terms in variance partitioning
Often analyses include many environmental covariates, which can be divided into multiple groups, such as habitat quality or climatic variables. These groups of linear terms are sometimes called predictor groups (Ovaskainen and Abrego 2020, pp. 69-70) or simply sets of variables in the ecological literature (e.g. Legendre and Legendre 2012, cf. Wisler 1969, Beaton 1973, Cox and Snell 1974). We can calculate the variance partitioning at the level of these groups of linear terms by summing over the variance terms of the covariates included in that group (cf. Mulaik 2010, pp. 72-84). To present this explicitly, we define the following notation. If each covariate is associated with only one of $G$ groups, the linear term group effects can be written formally as $h_g = \sum_{j \in \mathcal{G}_g} f_j$, where $g = 1, \dots, G$ and $\mathcal{G}_g$ is the set of covariate indexes associated with the $g$'th group. We can rewrite the linear predictor in terms of sums of the linear term groups, $\eta = \sum_{g=1}^{G} h_g$, which can be used to express the variance of the linear predictor. The linear predictor group variance partition is then $\text{VP}_g = \text{Var}[h_g] / \text{Var}[\eta]$ for some group $g$ (Mathematical supplement 2.3).
In addition to simplifying the presentation of models with many covariates by summarizing them at a more abstract thematic level, grouping can also help to interpret models with correlated covariates when the correlations between covariates are greater within the linear term groups than between them: grouping related covariates incorporates the covariances between those covariates into the variance term of that group and thus reduces the covariance between the (grouped) variance components (Ovaskainen and Abrego 2020, pp. 69-70). This can also reduce the posterior correlation between the variance partitions (see Section 2.6) compared to a situation without groups. However, grouping does not solve either problem entirely, since correlations can also occur between thematically unrelated covariates, leading to covariances and posterior correlations between the groups.
It is to be noted that the grouping of the covariates is not a property of the observations, or of the current sample of data at hand, but rather a choice made by the analyst, as will be illustrated in the case study (Section 3.1). It thus incorporates external knowledge into the interpretation of results. Also, depending on how the groups of linear terms are defined, a group might consist of a mixture of covariates and random effects (cf. Schulz et al. 2020).
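A sketch of grouped variance partitioning, with hypothetical "climate" and "habitat" groups of within-group-correlated covariates:

```python
import numpy as np

rng = np.random.default_rng(4)

# Four covariates: the first two ("climate") are strongly correlated with
# each other, as are the last two ("habitat"); cross-group correlation is weak.
n = 1000
Sigma = np.array([[1.0, 0.8, 0.1, 0.1],
                  [0.8, 1.0, 0.1, 0.1],
                  [0.1, 0.1, 1.0, 0.7],
                  [0.1, 0.1, 0.7, 1.0]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=n)
beta = np.array([0.6, 0.5, -0.4, 0.5])
F = X * beta                                  # linear terms f_j
eta = F.sum(axis=1)                           # linear predictor

groups = {"climate": [0, 1], "habitat": [2, 3]}
# Group effect h_g is the sum of the group's linear terms; within-group
# covariances are thereby absorbed into Var[h_g].
vp_group = {name: np.var(F[:, idx].sum(axis=1)) / np.var(eta)
            for name, idx in groups.items()}
print(vp_group)
```

Only the (weaker) between-group covariances remain outside the reported variance terms, which is why the grouped partitions come closer to summing to one than the per-covariate partitions would.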

Variance partitioning over populations and its use in prediction
The approach to variance partitioning presented thus far corresponds to partitioning with respect to the observations at hand, such as data from specific time points and sites. Hereafter, we refer to these as sample variance partitions, since they describe the properties of a finite sample from some larger domain, for example a set of potentially randomized sampling locations in a larger landscape or sampling times of some continuous process. In some cases, it will be possible to partition variance with respect to the full domain of interest (see Section 3.3). The domain can, for example, be the area and time period over which a survey was conducted (see Yuan et al. 2017). This population variance partition will often correspond more closely to what is of ecological interest, especially when modelling observational data obtained without a well-designed survey.
In such data, the distribution of environmental covariates in the observations does not correspond to that of the whole study region and the sample variance partition will not be representative.
However, whether population variance partitions can be calculated depends on the availability of covariate data over the domain, and they are thus often not easily realizable. The distinction between sample and population variance partitioning is demonstrated in the case study (Section 3.3) and Fig. 2 (a and b).
The notion of variance partitioning over an existing, but only partially observed, region or population can be extended to predictive variance partitioning over new domains. For example, in climate change studies, we could be interested in predicting how much variation in a species distribution is contributed by the environment under a future climate compared to the current climate. When predicting variance contributions for a new location or time, the distribution, and hence variability, of the predictors often differs from the original study location or time, even if, in terms of the range of values of individual covariates, the prediction would have the character of "interpolation". Hence, population variance partitioning can also be a tool for summarizing ecological predictions and scenarios.
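The contrast between sample and population variance partitions can be sketched with a deliberately biased sample; all values and coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

# Full domain: covariate x over the whole landscape; a second covariate z
# whose distribution is the same in the sample and the population.
pop_x = rng.uniform(0, 10, size=20000)
pop_z = rng.normal(size=20000)

# Surveyed sites happen to cover only the low end of the x gradient.
idx = pop_x < 4
sample_x, sample_z = pop_x[idx], pop_z[idx]

def vp_x(x, z):
    """Variance partition of the x term, with assumed coefficients 0.5 and 0.8."""
    f1, f2 = 0.5 * x, 0.8 * z
    return np.var(f1) / np.var(f1 + f2)

sample_vp = vp_x(sample_x, sample_z)   # sample variance partition
population_vp = vp_x(pop_x, pop_z)     # population variance partition
print(sample_vp, population_vp)
```

Because the sample truncates the range of `x`, the sample partition understates the covariate's contribution over the full domain; the same mechanism drives differences between current and predicted future domains.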

Conditional variance partitioning and partitioning between and within conditions
If we group our observations according to some categorical variable, or some function that assigns observations to groups based on the covariates, we can calculate variance conditional on that grouping. For example, our case study (Section 3.4) includes a covariate for the abundance of the insect herbivore's host plant. It is divided into three categories, with all observations within the same abundance category grouped together. Formally, we define a conditioning function $c(i)$ that assigns each observation $i$ into one of $K$ predefined groups. The variance partitioning conditional on group $k \in \{1, \dots, K\}$ is then calculated over the subset of observations (or the subset of the population) for which $c(i) = k$. For example, the conditional sample variance partitioning for the covariate $x_j$ would be

$$\text{VP}_{j \mid c=k} = \frac{\text{Var}[f_j \mid c = k]}{\text{Var}[\eta \mid c = k]},$$

where $\text{Var}[\cdot \mid c = k]$ denotes the sample variance over the subset of observations for which $c(i) = k$ ($i = 1, \dots, n$ indexes the observations; see also Mathematical supplement 2.4). The definition for groups of linear terms, $\text{VP}_{g \mid c=k}$, arises analogously.
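The conditional partition can be sketched as follows; the three conditions, coefficients, and within-condition covariate spreads are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)

# Three conditions (e.g. host plant abundance classes) in which the
# covariate x1 varies by different amounts; x2 varies uniformly everywhere.
n = 900
cond = rng.integers(0, 3, size=n)                  # conditioning function c(i)
x1 = rng.normal(size=n) * np.array([0.3, 1.0, 2.0])[cond]
x2 = rng.normal(size=n)
f1, f2 = 0.7 * x1, 0.5 * x2                        # linear terms
eta = f1 + f2

# Conditional variance partition of f1 within each condition:
# VP_{1|c=k} = Var[f1 | c = k] / Var[eta | c = k].
vp_cond = {k: np.var(f1[cond == k]) / np.var(eta[cond == k]) for k in range(3)}
print(vp_cond)
```

The covariate's share of variation differs sharply between conditions even though its coefficient is the same everywhere, which is exactly the information the unconditional partition averages away.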
When, for example, a predictor varies only over some subset of the observations, conditional variance partitioning can be used to study its role in the subset where it varies. However, conditional partitioning should be used with some meaningful question in mind, not just to search for an arbitrary division of the data that makes up for the minor role of a predictor in the whole data. Unlike the groups of linear terms, the conditioning variable is a property of the data; that is, the group membership of each observation depends on the values of the covariates (or functions thereof) available for the observations. Usually these variables would be part of the covariates of the linear predictor. Nonetheless, the choice of the conditioning variable is still made by the analyst.
We can extend conditional variance partitioning to see how much of the total variation is partitioned within the groups defined by the conditioning variable and how much variation is due to the differences between the conditions. This approach has the character of analysis of variance (cf. Pedhazur 1997, pp. 679-683, Nakagawa and Schielzeth 2013, Whittaker 1990, pp. 124-125).
Specifically, in applied variance partitioning we utilize the sample covariances of the linear terms within and between conditions. The variance within conditions is simply the weighted average of the conditional variances,

$$\text{Var}_{\text{wi}}[f_j] = \frac{1}{n} \sum_{k=1}^{K} n_k \, \text{Var}[f_j \mid c = k],$$

where $n_k$ is the number of observations belonging to condition $k$ and $n = \sum_{k=1}^{K} n_k$ is the total number of observations. The variation between groups is derived from the weighted variance of the per-condition means of the linear terms,

$$\text{Var}_{\text{btw}}[f_j] = \frac{1}{n} \sum_{k=1}^{K} n_k \left( \bar{f}_{j \mid c=k} - \bar{f}_j \right)^2,$$

where $\bar{f}_{j \mid c=k}$ is the per-condition mean and $\bar{f}_j$ is the unconditional or grand mean of the linear term over all observations. The within and between condition diagonal variance partitions $\text{VP}_{j \mid \text{wi}}$ and $\text{VP}_{j \mid \text{btw}}$ are defined analogously. Note that the sum of the variance partitions within and between conditions equals the variance partitioning over all observations, i.e. $\text{VP}_j = \text{VP}_{j \mid \text{wi}} + \text{VP}_{j \mid \text{btw}}$. For the derivation of the within and between condition variances, see the Mathematical supplement 2.5.
Differences between the conditional variance partitions imply that the distributions of covariates and random effects differ between the conditions. The partitioning of variation into within- and between-condition (co)variance is a way to summarize these differences for individual linear terms or predictor groups. When the conditioning variable is a categorical predictor or the level of an iid random effect, the corresponding linear term will contribute only to between-condition variation. At the other extreme, a predictor that is constrained to sum to zero within each condition would contribute only to within-condition variation.
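The within/between decomposition and its additivity can be checked numerically; the grouping and effect sizes below are simulated and hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

# A linear term f over n observations grouped into K conditions, with
# condition-specific mean shifts.
n, K = 1200, 4
cond = rng.integers(0, K, size=n)
f = rng.normal(size=n) + np.array([0.0, 0.5, 1.0, 1.5])[cond]

counts = np.bincount(cond, minlength=K)
cond_vars = np.array([np.var(f[cond == k]) for k in range(K)])
cond_means = np.array([f[cond == k].mean() for k in range(K)])

# Within: weighted average of the conditional variances.
var_wi = np.sum(counts * cond_vars) / n
# Between: weighted variance of the per-condition means around the grand mean.
var_btw = np.sum(counts * (cond_means - f.mean()) ** 2) / n
print(var_wi, var_btw, np.var(f))
```

This is the sample version of the law of total variance: the within and between parts add up exactly to the overall sample variance, so the corresponding partitions add up to the unconditional one.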

Correlation between covariates and posterior correlation between parameters
The variance partitions $E[\text{VP}_j(\theta) \mid y]$ and $E[\text{VP}_g(\theta) \mid y]$ summarize the marginal posterior distributions of the variance components, and as such they can mask posterior correlations between the components. The grouping of linear terms often ameliorates this issue, as the within-group correlations will be subsumed by the marginal distribution of the grouped variance partitions ($\text{VP}_g$; Section 2.3). Since this does not totally remove the problem, it is recommendable to explicitly report and discuss the correlations between different variance components (see Section 3.6).
The problem caused by the correlations is analogous to an over a century old problem of how to interpret variance partitions in a regression context in the presence of correlations between the model's covariates (Wright 1921, Burks 1926, Cox and Snell 1974, Bring 1995, Graham 2003, Dormann et al. 2013). Hence, by exploiting this literature, in addition to the measures presented above, we present two other measures which can be used to summarize the covariance relationships between the linear terms and can also be interpreted as variance partitions themselves (Table 2). To simplify the notation, we denote here the linear terms by $f_j = \beta_j x_j$ such that $\eta = \sum_{j=1}^{J} f_j$. We begin by rewriting the variance of the linear predictor in two alternative forms,

$$\text{Var}[\eta] = \sum_{j=1}^{J} \left( \text{Var}[f_j] + \sum_{k \neq j} \text{Cov}[f_j, f_k] \right) = \sum_{j=1}^{J} \text{Cov}[f_j, \eta] \tag{10}$$

and

$$\text{Var}[\eta] = \text{Var}[\hat{\eta}(f_{\setminus j})] + \text{Var}[f_j - \hat{f}_j(f_{\setminus j})], \tag{11}$$

where $f_{\setminus j}$ collects all other linear terms except $f_j$ and the hat-notation denotes the linear least squares predictor so that, for example, $\hat{f}_j(f_{\setminus j})$ is the linear least squares prediction for $f_j$ using $f_{\setminus j}$; since $\eta = \sum_{j=1}^{J} f_j$, it holds that $\hat{\eta}(f_1, \dots, f_J) = \eta$. The first equality above arises by just rearranging the covariance terms in (5), and the second equality is proved for example by Whittaker (1990, pp. 134-138). In linear regression, we call the sum of the terms inside the parentheses on the right hand side of (10) the marginal variance of the corresponding linear term, since it is a measure of the contribution of a linear term while accounting for the covariances with all the other linear terms (Chase 1960, Pratt 1987, Bring 1995). The second term on the right hand side of (11), which when scaled by $\text{Var}[\eta]$ is the squared semi-partial correlation coefficient between the linear term $f_j$ and the linear predictor $\eta$, measures the variance contribution of the linear term that cannot be accounted for by a linear combination of the other linear terms (Whittaker 1990, pp. 134-137, Legendre and Legendre 2012, pp. 173-180). Hence, the squared semi-partial correlation coefficient between a linear term and the linear predictor can be seen as the unique variance of that linear term.
Motivated by the above considerations, we define the marginal variance partition as

$$\text{VP}^{\text{M}}_j = \frac{\text{Cov}[f_j, \eta]}{\text{Var}[\eta]}$$

and the unique variance partition as

$$\text{VP}^{\text{U}}_j = \frac{\text{Var}[f_j - \hat{f}_j(f_{\setminus j})]}{\text{Var}[\eta]}.$$
The marginal and unique variance partitions, $\text{VP}^{\text{M}}_g$ and $\text{VP}^{\text{U}}_g$, for a group of linear terms are calculated analogously. The posterior distributions of both measures are remarkably simple to calculate by exploiting the properties of the sample covariances between the linear predictor and the individual linear terms, and the approach is easy to extend to groups of linear terms, as will be illustrated in Section 3.6 and described in detail in the Mathematical supplement (2.6).
The unique variance partition $\text{VP}^{\text{U}}_j$ is equal to $\text{VP}_j$ only in the absence of correlations between the linear terms. In all other cases, its value is less than the value of $\text{VP}_j$, and it is zero if the linear term under investigation can be perfectly explained by the other linear terms. Hence, the difference between $\text{VP}^{\text{U}}_j$ and $\text{VP}_j$ is a measure of the effect of correlations between linear terms on the variance partition (see also Burks 1926, Whittaker 1990). $\text{VP}^{\text{U}}_j$ on its own can be thought of as a lower bound of the variability associated with that linear term. In the case of treatment effects in an experimental context, it would be the part of variation in the outcome that is unequivocally attributable to the treatment corresponding to that linear term (i.e., the direct effect). In the case of auxiliary variables or observational data, it simply implies that that proportion of variation in the linear term is not confounded with the other linear terms in the model. However, that "unique variation" could still be confounded with covariates missing from the model. We note also that the semi-partial correlation can be interpreted as a causal effect only in some special cases. The marginal variance partition $\text{VP}^{\text{M}}_j$, in turn, is equal to the scaled covariance between the linear term and the linear predictor (see Mathematical supplement 2.6.3). Hence, when this measure is close to zero, that linear term has no direct effect on the linear predictor, only an indirect effect via its correlations with the other terms. That indirect effect is sometimes interpreted as improving the model due to removal of "noise" from other covariates (Bring 1995). At the same time, the marginal variance partition can also equal the variance partition $\text{VP}_j$ even in the presence of covariances, if they "cancel out".
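The three measures can be computed side by side from sample (co)variances. The sketch below uses two correlated linear terms with hypothetical coefficients and ordinary least squares for the predictions of one term from the others:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two correlated linear terms forming the linear predictor eta.
n = 2000
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
F = X * np.array([1.0, 0.8])
eta = F.sum(axis=1)

def partitions(j):
    """Plain, marginal, and unique variance partitions for linear term j."""
    f = F[:, j]
    others = np.column_stack([F[:, k] for k in range(F.shape[1]) if k != j])
    A = np.column_stack([np.ones(n), others])          # least squares design
    fhat = A @ np.linalg.lstsq(A, f, rcond=None)[0]    # prediction of f_j from the rest
    vp = np.var(f) / np.var(eta)                       # plain variance partition
    vp_m = np.cov(f, eta, bias=True)[0, 1] / np.var(eta)  # marginal
    vp_u = np.var(f - fhat) / np.var(eta)              # unique (squared semi-partial)
    return vp, vp_m, vp_u

for j in range(2):
    print(j, partitions(j))
```

With positively correlated terms, the marginal partitions exceed the plain ones (and sum to one by construction), while the unique partitions fall below them; reporting all three exposes the covariance structure that any single measure hides.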
The marginal variance partition shares some properties with commonality analysis, a widely used method of variance partitioning in ecology (Legendre and Legendre 2012, pp. 570-583): namely, it sums up to one, represents the direct effect of the linear term, and comprises the unique and shared variance contributions of that linear term (Chase 1960, see also Beaton 1973, Mathematical supplement 2.6.4).
We illustrate possible conclusions from comparing different values of these three measures for a single linear term or a group of linear terms in Table 3 and present some of them here. When the marginal variance partition $\text{VP}^{\text{M}}_j$ is larger than the variance partition $\text{VP}_j$, this implies that the covariances between the linear term and the other linear terms are positive on average and increase the variability of the linear predictor $\eta$. When $\text{VP}^{\text{M}}_j$ is smaller than $\text{VP}_j$, the covariances are negative on average. If $\text{VP}^{\text{M}}_j$ is still positive, there is some confounding between the linear terms, but the linear term still contributes to variability in the linear predictor independently. When $\text{VP}^{\text{M}}_j$ is close to zero although $\text{VP}_j$ is not, the linear term has no direct effect on the linear predictor, and all variance attributed to it can be explained by other covariates. When $\text{VP}^{\text{M}}_j$ is negative, the linear term acts to suppress variability in the linear predictor. All the previous cases imply that the unique variance partition $\text{VP}^{\text{U}}_j$ is smaller than $\text{VP}_j$. Lastly, when the variance partition $\text{VP}_j$ is equal to $\text{VP}^{\text{M}}_j$ even though $\text{VP}^{\text{U}}_j$ is smaller than $\text{VP}_j$, the effect of the covariances averages out. To summarize, the difference between $\text{VP}_j$ and $\text{VP}^{\text{U}}_j$ serves as an indicator for the presence of covariances, and hence confounding, between the linear terms, while the difference between $\text{VP}_j$ and $\text{VP}^{\text{M}}_j$ shows whether the covariances increase or suppress the variability of the linear term or the linear predictor.

Variance partition measures revisited
For any variance partitioning measure there are many properties that would be desirable (Pratt 1987, Bring 1995, Nathans et al. 2012). We consider here only four of them (Table 2). None of the measures discussed satisfies all four, which is why it can be useful to report variance partitions in terms of more than one measure (cf. Pratt 1987, Whittaker 1990, p. 130, Bring 1995).

Variance partitioning in practice: Case study on the Glanville fritillary butterfly
We demonstrate the application of variance partitioning using a simple species distribution model to explore factors contributing to patch occupancy in a Glanville fritillary butterfly metapopulation in the Åland Islands, SW Finland (Fig. 1). We flesh out the methodological details by concisely explaining how the theory is applied in practice to this case study. In this section we walk through the analysis steps, present the results, and discuss them from the point of view of ecological interpretation; in Section 4 we provide a thorough discussion of the findings from the methodological point of view.

Study system and species distribution model
The study system consists of some four thousand potential habitat patches in which the butterfly's larval host plants, Plantago lanceolata and Veronica spicata, occur. These habitat patches range from dry meadows, pastures, and rocky outcrops to roadsides and yards. The butterfly, Melitaea cinxia, has a single generation per year at this northern latitude of its range; the adult butterflies emerge in early summer and the larvae, which live gregariously for most of their lives, build silken nests for overwintering. The nests are easily distinguishable and are counted during an annual autumn survey. At the same time, data about the habitat are gathered, such as measures of host plant abundance and indicators of grazing (Ojanen et al. 2013).
The model and data are simplified from our previous study analysing the relationship between occupancy and abundance in this long-term ecological study system (Schulz et al. 2019, 2020). For our analyses, we select only a subset of the available data in order to mimic a scenario where only a part of the potential study domain is surveyed (see Fig. 1). This core area has the highest occupancy rates (average 17.5 % within vs. 14.5 % outside) and comprises n = 1259 of the total 4379 habitat patches in the data set. When we refer to unobserved patches or locations below, we mean the patches outside this core area; in reality, the full data set contains observations from those patches as well. Note that we use previous occupancy as a covariate in our model, so it is missing whenever the corresponding habitat patch was not surveyed in the previous year. For demonstration purposes we sidestep this problem and simply omit observations for which there is no corresponding observation from the same patch in the previous year. We also exclude observations with no host plant present (n = 635), as the absence of the host plant indicates that the patch is no longer habitat for the butterfly.
Here, the first term is the intercept of the regression, the second is the effect of the occupancy status of the patch in the previous year (coded 0/1, with 0 representing unoccupied), and the occupancy, metapopulation, and habitat quality index sets collect the "fixed effects" into covariate predictor groups (see section 2.3; note that the occupancy set contains only one covariate; Model supplement). The random effects are collected into an additional random effects group. We use this thematic grouping of the linear terms to study which general factors affect our study system, rather than individual covariates or random effects. The random effects comprise per-patch and per-year iid random effects and a spatio-temporal random effect.
Throughout this running example, we assume that posterior inference is implemented so that we can produce random samples from the joint posterior distribution of the model parameters and random effects. This is the most common approach to implementing Bayesian inference, and it allows all the variance partitioning measures below to be computed directly from the posterior samples (see the Model supplement, where we also report the posterior distributions of the model's parameters). The full code for the analyses is publicly available. For code examples using a simpler model, see the Programming supplement.

Sample variance partitioning for the regression model
We are interested in partitioning the variance of the linear predictor into the groups of linear terms indicated by the brackets above the model (Eq. (12)); that is, we partition the variance of the linear predictor with respect to the four predictor groups: occupancy, metapopulation, habitat quality, and random effects. For each posterior sample, each group effect is the sum of the linear terms belonging to that group, and the variance partitions are calculated from these group effects sample by sample; the approach is identical for every predictor group.
The biased variance partitioning is instead calculated by first computing the posterior means of the predictor group effects: for each group, the posterior mean of each linear term is the average over the posterior samples, and the group effect is the sum of these means over the terms in the group; the posterior mean of the random effects group is constructed analogously. These posterior-mean group effects are then plugged into the same equations to calculate the variance partitions.
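The difference between the full and the biased calculation can be sketched with a simulation (the posterior draws below are fabricated for illustration, not taken from the case-study model). We compute a random-effect-like term's variance partition per posterior draw and compare the posterior mean of those values to the biased plug-in estimate obtained from posterior means.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_post = 500, 2000

# Fabricated posterior draws for two groups of linear terms: a
# well-identified covariate effect, and a random-effect-like term whose
# posterior mean is near zero although each draw has substantial variance.
x = rng.normal(size=n_obs)
beta = rng.normal(1.0, 0.05, size=n_post)       # posterior draws of a slope
cov_term = beta[:, None] * x[None, :]           # shape (n_post, n_obs)
re_term = rng.normal(size=(n_post, n_obs))      # posterior draws of a random effect
eta = cov_term + re_term                        # linear predictor per draw

# Full variance partition: computed per posterior draw, then summarized.
vp_re = re_term.var(axis=1) / eta.var(axis=1)

# Biased plug-in estimate: posterior means first, variance ratio second.
vp_re_plugin = re_term.mean(axis=0).var() / eta.mean(axis=0).var()

print(vp_re.mean(), vp_re_plugin)   # the plug-in estimate is far too small
```

Because the random effect's posterior mean is near zero, the plug-in estimate collapses towards zero even though the term contributes roughly half of the variance in every posterior draw.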
The posterior distributions of all variance partitioning measures exhibit considerable uncertainty (Fig. 2 (a)), and the posterior means and posterior variances summarize these distributions well. The random effects, the metapopulation covariates, and the habitat quality covariates all contribute a large proportion of the total variation in occupancy in the core study area (posterior mean variance partitions 0.32, 0.26, and 0.21, respectively; Fig. 2 (a)). The role of occupancy in the previous year in the variation of current occupancy is very low (0.02). Partly this is due to expected correlations with the metapopulation covariates connectivity (correlation 0.48) and area (0.27), as well as host plant abundance (0.22). Given the high rates of extinction and colonization, previous occupancy alone is not a very good predictor of current occupancy in this system; moreover, connectivity is also a predictor of colonizations, while area and host plant abundance are good surrogates for population size, which does predict extinction risk (Schulz et al. 2020). In Section 3.6 we take a more formal look at the role of these correlations.

Population variance partitioning
In our case study, model inference is performed only on a subset of the study region (Section 3.1). To calculate the population variance partitioning over all habitat patches, we need to construct the predicted random effects at the unobserved locations. In our case, the sample data cover the same years as the predictions, so the yearly random effects can be reused as such. For the patch effects we must, for each unobserved habitat patch, sample new values from a univariate normal distribution with variance corresponding to each posterior sample of the variance parameter of the patch random effect. For the spatio-temporal random effect the approach is similar, but more complicated due to the covariance structure of the spatio-temporal random field, which depends both on the hyperparameters defining the covariance matrix and on the existing random effects at the observed patches. The population variance partitioning differs considerably from the sample variance partitioning in terms of the role of the metapopulation predictors (Fig. 2 (a) and (b)). Over the whole habitat range in the Åland Islands, this predictor group accounts for approximately half of the variation in the occupancy of the butterfly, with correspondingly reduced roles for the random effects and the habitat quality covariates (posterior mean population variance partitions: random effects 0.21, metapopulation 0.50, habitat quality 0.14).
This suggests that outside of the core study area connectivity and habitat patch size determine variation in occupancy, as there is less variation in habitat quality (population estimate 0.13 vs. sample estimate 0.21). This is consistent with the fact that the core study area contains most of the more viable habitat of the butterfly in the Åland Islands (Hanski et al. 2017). Also, the habitat patches outside of the core area tend to be more isolated, accentuating the role of connectivity in predicting occupancy.
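The resampling step for the iid patch effects can be sketched as follows (with fabricated posterior draws of the standard deviation parameter; the conditional simulation of the spatio-temporal field is omitted). The key point is that new effects for unobserved patches are drawn separately for each posterior sample, so that parameter uncertainty propagates into the population variance partition.

```python
import numpy as np

rng = np.random.default_rng(7)
n_post, n_new = 1000, 300   # posterior draws, unobserved patches

# Fabricated posterior draws of the patch effect's standard deviation;
# in the real analysis these come from the fitted model's posterior.
sigma_patch = rng.lognormal(mean=-0.5, sigma=0.2, size=n_post)

# For each posterior draw s, sample fresh iid effects for every unobserved
# patch from N(0, sigma_patch[s]^2), keeping each row aligned with the
# rest of that posterior sample when the variances are computed later.
new_effects = sigma_patch[:, None] * rng.normal(size=(n_post, n_new))
```

Each row of `new_effects` is then combined with the predicted fixed effects for the unobserved patches from the same posterior sample before the population variance partition is computed.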

Variance partitioning conditional on host plant abundance
The amount of host plant vegetation within the habitat patch is known to be crucial for the specialist larvae of the butterfly (Opedal et al. 2020). We therefore partition the variance of the linear predictor conditional on the host plant abundance class: within each abundance class, the conditional variance partition of a predictor group is the proportion of the conditional variance of the linear predictor attributable to that group, and it is calculated equivalently for the other predictor groups.
Differences in the conditional variance partitions between abundance classes would imply that the other covariates or the random effects covary with host plant abundance, or, more precisely, that the distribution of the covariates (and random effects) differs between the levels of host plant abundance. Here, we see mostly small differences between the abundance classes, except for the random effects: less variation is associated with the random effects in the highest abundance class (Fig. 2 (c)).
The minor role of the habitat quality predictor group indicates that this group's variance contribution is mostly due to the conditioning variable, host plant abundance. Again, the posterior means and variances summarize the posterior distributions of the variance partitions well. However, especially for the random effects, the biased point estimates greatly underestimate the conditional variance proportions.
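A minimal sketch of the conditional calculation, using fabricated abundance classes and linear terms: the variance partition formula is simply applied within each class of the conditioning variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30_000

# Fabricated data: three host plant abundance classes (0 = lowest);
# the random effect term has a smaller spread in the highest class.
cls = rng.integers(0, 3, size=n)
f_meta = rng.normal(size=n)                            # metapopulation term
f_re = rng.normal(scale=np.where(cls == 2, 0.5, 1.0))  # random effect term
eta = f_meta + f_re

# Conditional variance partition: the same ratio, restricted to each class.
vp_re_by_class = [f_re[cls == c].var() / eta[cls == c].var() for c in range(3)]
print(vp_re_by_class)   # smallest share in the highest abundance class
```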

Variance within and between patches
We go one step further and ask whether more variation in occupancy is due to differences between patches or to annual variation within patches, and which factors drive these differences (see also section 2.5). To do this, we define the conditioning function to return, for each observation, an indicator of the habitat patch it belongs to (1259 patches in total). When the variance contributions are separated into within and between habitat patch components, we see that the metapopulation covariates contribute more to differences in occupancy between patches than to interannual variability within the patches (Fig. 2 (d)). This is to be expected, as connectivity varies substantially between patches simply due to varying isolation, though annual variation is also present (Hanski et al. 2017, Kahilainen et al. 2018). Moreover, as the data use a single area measurement for each patch, habitat area cannot contribute at all to within-patch variation. For the random effects the situation is reversed: they contribute somewhat more to yearly variation in occupancy within the patches. The habitat quality covariates contribute almost equally to variation within and between patches. This dual role for habitat quality is consistent with the observation that, in addition to habitat patches differing vastly in quality (Schulz et al. 2020), there is also substantial annual variation in host plant availability within these patches (Opedal et al. 2020).
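The within/between decomposition rests on the law of total variance: the variance of a group effect splits exactly into the variance of the patch means (between) and the mean of the within-patch variances (within). A sketch with hypothetical patch-by-year values:

```python
import numpy as np

rng = np.random.default_rng(5)
n_patch, n_year = 200, 10

# Fabricated linear-term values for one predictor group, indexed by
# patch (rows) and year (columns).
patch_level = rng.normal(size=(n_patch, 1))              # between-patch signal
annual = rng.normal(scale=0.5, size=(n_patch, n_year))   # within-patch signal
f = patch_level + annual

# Law of total variance (exact with equal group sizes and ddof=0):
# Var(f) = Var(patch means) + mean(within-patch variances).
between = f.mean(axis=1).var()
within = f.var(axis=1).mean()
total = f.var()
print(between, within, total)
```

Dividing `between` and `within` by the total variance of the linear predictor gives the between- and within-patch variance partitions for the group.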

The biased point estimates are rather close to the actual variance partitions in all cases but one: the estimate for the random effects' contribution to variation between patches is severely underestimated. A likely explanation lies in the fact that the posterior mean of the random effects is close to zero, which is both a function of the random effects' priors having a mean of zero and of a constraint for the random effects to sum to zero (Model supplement). As a consequence, a large proportion of their variance contribution is due to the posterior variance, which is ignored by the biased point estimate. This is a general problem whenever the mean estimate is close to zero but the uncertainty about the estimate is large; this is almost by design true of random effects, but can affect covariate terms as well.

Correlation between covariates and posterior correlation between parameters
To understand the effects of correlated covariates and random effects, we start by visualizing both the posterior distributions of the covariances between predictor groups and the correlations between the predictor groups' variance partitions (Fig. 3). The unique variance partition for a group of linear effects is the proportion of variation due to the linear terms in that group which cannot be expressed as a linear combination of the linear terms belonging to the other groups. The measure can be calculated from the sample covariance matrix of the linear terms (Mathematical supplement 2.6.2): for the occupancy group, the numerator is the Schur complement S_{O,O} - S_{O,\O} S_{\O,\O}^{-1} S_{\O,O}, where S is the sample covariance matrix between the linear terms in the model, O indexes the rows and columns belonging to occupancy, and \O indexes all the linear terms in the other groups. Note that in general the numerator is the sum of the variances in the above expression, but the sum is omitted here as the occupancy group consists of a single covariate. The random effects' variance partition is negatively correlated with the other variance partitions (Fig. 3) precisely because its variance contribution is mostly unique (Fig. 4): any increase in its proportion of the total variation has to be offset by the other groups' variance partitions. The three groups of covariate linear terms are all correlated, but only the metapopulation and habitat quality variance partitions have a clear negative correlation (Fig. 3). The random effects' unique variance partition is approximately equal to its variance partition, and the marginal variance partition is correspondingly only slightly higher (Fig. 4). This means there is only minor confounding between the random effects and the covariates, even though the posterior distribution of the random effects' variance partition is clearly correlated with the other variance partitions. All of the marginal variance partitions are higher than their corresponding variance partitions, which means that the covariances between the groups of linear terms are, overall, positive.
The clearest example is occupancy, whose main contribution to variability is due to its covariances with the other linear terms. This is not unexpected: due to the high extinction and colonization rates in this system, occupancy alone is not a generally reliable predictor of future occupancy states, and occupancy is highly correlated with habitat quality, patch area, and connectivity. It is interesting to note that the metapopulation covariates and the habitat quality covariates have approximately equal unique contributions, but both the variance partition and the marginal variance partition are higher for the metapopulation covariates. A potential explanation is the higher correlation of occupancy with the metapopulation covariates than with the habitat quality covariates (Fig. 3).
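The unique variance partition calculation described above can be sketched as follows, using a fabricated covariance structure for four linear terms (column 0 standing in for the occupancy term).

```python
import numpy as np

rng = np.random.default_rng(11)
n = 50_000

# Fabricated correlated linear terms; column 0 plays the role of the
# occupancy term and columns 1..3 the other groups' terms.
C = np.array([
    [1.0, 0.5, 0.3, 0.0],
    [0.5, 1.0, 0.2, 0.1],
    [0.3, 0.2, 1.0, 0.0],
    [0.0, 0.1, 0.0, 1.0],
])
F = rng.normal(size=(n, 4)) @ np.linalg.cholesky(C).T
eta = F.sum(axis=1)

S = np.cov(F, rowvar=False)   # sample covariance matrix of the linear terms
j, rest = 0, [1, 2, 3]
# Schur complement: variance of term j not explained by the other terms.
schur = S[j, j] - S[j, rest] @ np.linalg.solve(S[np.ix_(rest, rest)], S[rest, j])
vp_unique = schur / eta.var(ddof=1)
vp_plain = F[:, j].var(ddof=1) / eta.var(ddof=1)
print(vp_unique, vp_plain)    # the unique share is below the plain share
```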

Discussion
Variance partitioning in its many forms has been an important tool in ecology for decades, and we believe it will continue to hold its central role in the future as well. As often happens with highly influential methods, variance partitioning has evolved in many directions over the years. Some of these directions, however, have lost their connection to the original theory, whereas others have improved and developed the core ideas significantly further. In this work, we have discussed the pros and cons of some of the modern versions of variance partitioning and proposed a few extensions to them that we consider useful. Our work is not intended to be a review of the subject; rather, we have taken a practical stance by pointing our criticism at popular approaches that we think are problematic, while appraising approaches that we think should be more widely adopted in ecological research. We treat the subject from a Bayesian perspective, since it allows a transparent and straightforward treatment of uncertainties.
The central idea in variance partitioning is to examine the relative importance of alternative ecological processes. As shown in our earlier presentation of the methods and exemplified by our case study, the form of variance partitioning should change according to the question asked.
For example, the sample variance partitioning for individual model components (the individual covariates in our case study) informs us about the relative roles of those covariates only (Section 2.2). However, distinct covariates often fall into broader sets of ecological processes, such as the occupancy, metapopulation, habitat, and random processes in our case study. If we are interested in the relative importance of these higher-level processes in our study system, we should decrease the resolution of the analysis by using variance partitioning for the groups of linear terms formed by these upper-level processes (Sections 2.3, 3.2). At the same time, the conditional variance partitioning and its extension to between- and within-condition variance partitioning increase the resolution of the answers available by allowing the analyst to zoom into different compartments of the study system. This provides a method to assess the relative roles of the ecological processes in the subset of cases falling into a particular compartment, and whether the relative roles of the ecological processes are consistent between the compartments (low between-condition variance) or not (high between-condition variance) (Sections 2.5-2.6, 3.4-3.5).
Moreover, variance partitioning extends straightforwardly beyond the study system from which the data are collected. Here we demonstrated this by calculating the population variance partitioning for the whole larger system, of which we had (hypothetically) observed only a small fraction (section 3.3). We could analogously extend the idea to predictive variance partitioning analyses related to, for example, climate or land use change. The approaches to variance partitioning presented here take a finite-population view of variation by looking at sample and population variance partitions for a specific study system. One essential question in ecology is, however, to distinguish between context dependency and generality. Variance partitioning can be turned to this question as well by adopting the so-called superpopulation view of variance (Gelman et al. 2014, pp. 201-202). For example, we could model spatially distinct study systems (such as Glanville fritillary populations in Åland or the UK) in a hierarchical Bayesian framework by assigning variance parameters to each study system and to a higher hierarchical level describing the consistent environmental response across systems. The variance parameters could then be interpreted as measures of variability across a larger domain than that encompassed by any single study system. For specific questions, such a model could also make predictions for unobserved locations; in our context, this could, for example, involve predicting occupancy in populations elsewhere in the study species' distribution.
Superpopulation variance partitioning requires a more carefully thought out model structure and is more sensitive to the choice of priors (Gelman 2005, 2006). Developing these methods further is beyond the scope of this work.
Our approach to variance partitioning is strictly model based, which has several consequences. All of the analyses presented can be obtained from a single model after a single parameter inference step. The model based approach allows us to coherently encode ecological understanding into the analysis. It also allows, when combined with a Bayesian approach to parameter inference, a transparent and theoretically justified treatment of uncertainty. These properties are clearly an advantage when our model explains the system well, but they are also a reminder that we should not jump into variance partitioning before scrutinizing our model.
The treatment of uncertainty is one of the central themes of our work. In our experience, it is typically neglected at many mutually connected levels of variance partitioning. Firstly, and most evidently, the variance partitioning summaries themselves are uncertain. This simple fact seems to be forgotten in most applications of variance partitioning, even by authors who treat uncertainty carefully in other parts of their analysis. As shown in our results (Fig. 2), ignoring the uncertainty in variance partitioning by using simple parameter estimates may change not only the quantitative but also the qualitative conclusions of a study, compared to treating the uncertainty correctly. Variance partitions should be treated as random variables and should be interpreted either by studying their distributions directly or by condensing these distributions into well-defined summary statistics such as posterior means and variances or credible intervals. As shown in our case study, propagating the parameter uncertainty is rather simple in practice, as it can be done after model inference and involves only the same steps as calculating the variance partition in the first place. Hence, there is no reason to ignore the uncertainty in variance partitioning.
The above uncertainty considerations apply to the marginal distributions of variance partitions, but another aspect of uncertainty is driven by the correlations between variance partitions. These are induced by correlations between model covariates and by the correlations between model parameters in their posterior distribution. Even in the applied Bayesian variance partitioning studies that report uncertainties of marginal distributions, this aspect of uncertainty seems to have received barely any attention. For this reason we advocate explicitly reporting the posterior correlations between variance partitions and linear terms, as well as comparing the marginal distribution of the variance partitions to the unique variance partition and the marginal variance partition. These latter measures summarize the covariance relationships between the linear terms and, together with the basic variance partition, help interpret possible confounding effects on the results.
When our research involves observational data, we will inevitably encounter situations where our covariates are correlated. The first step in approaching analyses of such data is acceptance.
Correlations might be a nuisance from the analyst's perspective, but they are also a real feature of the ecological system, with implications for how the system works. In terms of partitioning variation, the resulting covariances between the linear terms are a feature, not a deficiency, of the method: they summarize the effects of the covariate distribution on variability and delimit which effects can be identified independently. Hence, we recommend not removing covariates from an analysis simply because they are correlated (cf. Cox and Snell 1974, Morrisey and Ruxton 2018), as this only hides the problem and gives a false sense of certainty. We also do not recommend principal components analysis (PCA) as a general solution to correlated covariates.
To remain meaningfully interpretable, the covariates included in a PCA should be sufficiently similar. Moreover, in attempting to interpret the PCA one has not escaped the correlations after all, but merely moved the problem to the interpretation of the loadings. And when working with non-linear or additive effects, it does not seem sensible to use anything but the original covariates.
If the aim is to reduce the number of covariates to a smaller set of meaningful features, this can be achieved by grouping the linear terms. Especially when the interest is less in interpreting regression coefficients or prediction and more in examining variance contributions, the problem of correlations is reduced by grouping the linear terms. Even when the correlation between covariates is strong enough that their independent effects cannot be estimated in a model, by grouping the linear terms one can still estimate their joint contribution to the model (cf. Mood 1971). Generally, we recommend grouping only thematically related covariates, although depending on the research question the analyst might well be interested in reporting variances even for predictor groups consisting of rather dissimilar covariates. Either approach provides an alternative to excluding correlated covariates, a practice often arising from a misunderstanding of the effect and role of collinearity (Cox and Snell 1974, Morrisey and Ruxton 2018).
The simple fact that the variance of a sum decomposes into the variances and covariances of its terms gets us surprisingly far. We like to think of the linear terms and their variances as the common currency of regression models: they are all on the same scale and can be meaningfully added together, which makes variance partitioning such a versatile approach. While we should not add apples and oranges, we can freely add the growth from apples eaten to the growth from oranges eaten to get total growth. Unlike working directly with regression coefficients, partial regression coefficients, or especially non-linear effects, variance partitioning can deal with both individual predictors and groups of them. And the within-between condition comparison is not available from simple regression coefficients, which also do not account for effect size versus realized variability.