Mitigating pseudoreplication and bias in resource selection functions with autocorrelation-informed weighting

Resource selection functions are among the most commonly used statistical tools in both basic and applied animal ecology. They are typically parameterized using animal tracking data, and advances in animal tracking technology have led to increasing levels of autocorrelation between locations in such data sets. Because resource selection functions assume that data are independent and identically distributed, such autocorrelation can cause misleadingly narrow confidence intervals and biased parameter estimates. Data thinning, generalized estimating equations, and step selection functions have been suggested as techniques for mitigating the statistical problems posed by autocorrelation, but these approaches have notable limitations that include statistical inefficiency, unclear or arbitrary targets for adequate levels of statistical independence, constraints in input data, and (in the case of step selection functions) scale-dependent inference. To remedy these problems, we introduce a method for likelihood weighting of animal locations to mitigate the negative consequences of autocorrelation on resource selection functions. In this study, we demonstrate that this method weights each observed location in an animal’s movement track according to its level of non-independence, expanding confidence intervals and reducing bias that can arise when there are missing data in the movement track. Ecologists and conservation biologists can use this method to improve the quality of inferences derived from resource selection functions. We also provide a complete, annotated analytical workflow to help new users apply our method to their own animal tracking data using the ctmm R package.

In this study, we introduce a method for autocorrelation-informed likelihood weighting of animal 126 locations to mitigate pseudoreplication and bias in parameter estimates of RSFs. We provide 127 a description of the mathematical principles underlying our method, demonstrate its practical 128 advantages over conventional approaches using simulations and empirical animal tracking data for To the conventional IID log-likelihood, we incorporate our RSF log-likelihood weights, which 155 are given by where AKDE ( ) denotes weights optimized for non-parametric autocorrelated kernel density where ∫ r denotes the -dimensional volume integral and · · · denotes the expectation value 166 with respect to the distribution of the data, r( ), which may be autocorrelated. Because the true 167 density function (r) is unknown and kernel density estimation is non-parametric, the MISE (4) 168 must be approximated, and different approximations correspond to different methods of kernel 169 density estimation-all being asymptotically optimal (Izenman, 1991, Silverman, 1986 1993).

171
The inclusion of area terms in the RSF model is a form of parametric Gaussian density estimation, which allows weights to serve a similar purpose in RSFs as for kernel density estimation.
estimation, indicating that they provide a bias correction for the Gaussian model. The weights ( ) 175 are used to weight each sampled location so that the final log-likelihood is of the form These weights can be used to re-weight data points for any method of approximating inhomogeneous

264
To demonstrate the application of our method on real-world animal location data, we performed  parameter estimates of weighted RSFs remained similar across all sampling rates (Fig. 3). Notably, 317 weighted RSFs were better able to successfully identify a lack of habitat selection (i.e., weighted 318 RSF parameter estimates became increasingly closer to zero and a higher proportion of confidence 319 intervals contained zero) as the sampling rate increased, while IID RSFs performed worse as the 320 sampling rate increased (i.e., parameter estimates were largely consistent, but confidence intervals 321 around the [biased] estimates contracted sharply as the sampling rate increased). The benefits of 322 weighted RSFs are also enhanced when habitat is more highly clustered, but likelihood weighting 323 still improves both parameter estimates and confidence intervals when habitat is unclustered (Table   324 1).

326
When analyzing the water mongoose data (Fig. 3A), we found that both IID and weighted RSFs

333
When analyzing the caracal data (Fig. 3B), we found that although the IID RSF identified

345
When analyzing how data thinning would influence inferences of habitat selection by the caracal, 346 we found that IID RSFs parameterized using subsets of the caracal movement track exhibited 347

Figure 5: Parameter estimates and confidence intervals around those parameter estimates for one weighted RSF performed on a complete caracal movement track and 27 IID RSFs performed on subsamples of the movement track in which each sequential location was independent. The larger point and bar on the far left represent the weighted RSF, while the smaller points and bars represent individual subsets. The light blue box outlines the confidence interval of the weighted RSF, while the darker blue horizontal line identifies the selection parameter identified by the weighted RSF.
As expected, IID RSFs parameterized using thinned data sets exhibit substantial variation in their estimated parameters.
substantial variation around the parameter estimate provided by the weighted RSF (Fig. 5) RSFs to generate more robust confidence intervals, but with more variation in parameter estimates 357 than weighted RSFs. regularly sampled location data (e.g., VHF telemetry or visual searches for marked individuals). In the future, our approach could be extended further to address other statistical problems associated 373 with RSFs, which we discuss below.

374
Rigorous estimation of confidence intervals on RSF parameters is a challenge using conven- in RSFs (e.g., random effects). The method we propose in this study offers an objective way to 387 estimate effective sample sizes for generating reliable confidence intervals.

388
In addition to estimating more robust confidence intervals, our method of weighting also offers 389 potential to mitigate bias that can arise from inadequate sampling of landscape features that cause 390 triangulation failure, irregular sampling designs intended to document animal behavior at multiple 391 temporal scales, or technologies that do not allow regular sampling intervals. As shown by our 392 simulations (Table 1; Fig. 3), RSFs that assume IID data cannot handle frequent habitat-induced 393 triangulation failure-estimates of selection of those habitats are negatively biased, and models 394 become artificially more confident in this bias as the sampling rate increases. Our method of 395 weighting reduces this bias, and the quality of estimated parameters improves rather than declines 396 as the sampling rate increases (however, we note that the extent to which inference is improved it should be noted that because conventional IID RSFs assume stationary distributions of resource 440 use and availability, conventional RSFs also assume range residency (but this assumption is often 441 ignored).
Step-selection functions are therefore more appropriate for studying resource selection where is the domain of integration and ( ) is the normalization constant. For the Poisson point 708 process, the intensity function ( , ) determines the rate of events (e.g., animal visits) at a location 709 given the history of events in a stationary process. Equation (6) is the basic RSF model for typical The exponential model is usually used to describe resource selection, so under a log link function 712 the intensity function can be represented by: where R( , ) represents spatially explicit covariates or resources, and are their corresponding terms as the area available to an animal; hereafter, "available area"). The conventional model is 720 therefore given by which is explicitly a function of the available area . This means that any animal locations 722 outside of -which may be many locations if includes less than 100% of the area an animal 723 travels-necessarily occur in areas where the probability of use is assumed to be zero.
to numerically estimate integrals of interest. In RSFs, this occurs by sampling locations within the 728 available area (hereafter, "availability sampling" of "available locations"). In terms of available in terms of the spatial variance of the intensity function, VAR [ ], which can be estimated from 734 the available locations according to With some rescaling, the likelihood function of the data can be represented as that of a condi-

741
In contrast to the conventional two-stage method for parameterizing RSFs, it is also possible to 742 include spatial terms in the RSF model that determine the domain of availability simultaneously 743 with model fitting. Although the value of explicitly incorporating area in RSF models has been 744 recognized before (Fieberg et al., 2021), the benefits of this approach are not widely appreciated 745 among ecologists who conduct resource selection studies, and in practice, spatial terms are rarely, if 746 ever, included in RSFs. Incorporating spatial terms into RSFs can also account for animal movement 747 in the RSF model (albeit at coarser scales than iSSAs). Below, we describe our approach.

749
In an integrated RSF, we start with the same iPPP modeling framework (6), but rather than fixing 750 availability as a uniform distribution within an arbitrary spatial boundary-as in (4)-we substitute 751 a more gradually decaying intensity function ( , ), such that the integral converges without ad hoc truncation. By explicitly modeling non-uniform availability with the 753 intensity function ( , ), the availability parameters (which define the spatial extent of the area 754 that is available to an animal) are estimated consistently and simultaneously with the resource-755 selection parameters. A number of different models can be used to achieve this goal, but we will covariates included in the model, the probability density of its location at any given time is still 762 estimated as a symmetrical Gaussian utilization distribution, which serves as the null model.

763
Availability parameters are thus estimated consistently with the resource-selection parameters, 764 which increases the statistical efficiency of the fitted model, reducing bias in the parameter estimates 765 and improving the coverage of their confidence intervals. In particular, because we do not treat ,

766
, and as known parameters on which to condition a second-stage of analysis, this approach 767 also accounts for uncertainty in the unknown available area, by accounting for the correlation 768 between errors in the availability-parameter estimates and resource-selection parameter estimates, 769 and enabling estimation of confidence intervals around .