movedesign: Shiny R app to evaluate sampling design for animal movement studies

Projects focused on movement behavior and home range are commonplace, but beyond a focus on choosing appropriate research questions, there are no clear guidelines for such studies. Without these guidelines, designing an animal tracking study to produce reliable estimates of space-use and movement properties (necessary to answer basic movement ecology questions) is often done in an ad hoc manner. We developed ‘movedesign’, a user-friendly Shiny application that can be used to investigate the precision of three estimates regularly reported in movement and spatial ecology studies: home range area, speed, and distance traveled. Conceptually similar to statistical power analysis, this application enables users to assess the degree of estimate precision that may be achieved with a given sampling design, i.e., the choices regarding data resolution (sampling interval) and battery life (sampling duration). Leveraging the ‘ctmm’ R package, we use two methods proven to handle many common biases in animal movement datasets: autocorrelated kernel density estimators (AKDE) and continuous-time speed and distance (CTSD) estimators. Longer sampling durations are required to reliably estimate home range areas, because they allow a sufficient number of home range crossings to be detected. In contrast, speed and distance estimation requires a sampling interval short enough to ensure that a statistically significant signature of the animal’s velocity remains in the data. This application addresses key challenges faced by researchers when designing tracking studies. These include the trade-off between long battery life and high resolution of the GPS locations collected by the devices, which may force a compromise between reliably estimating home range and reliably estimating speed and distance.
‘movedesign’ has broad applications for researchers and decision-makers. It helps them focus efforts and resources on achieving the optimal sampling design for their research questions and prioritize the correct deployment decisions for insightful and reliable outputs, while understanding the trade-offs associated with these choices.

The position autocorrelation parameter ($\tau_p$) can be interpreted as the home range crossing time, or the time it takes on average for an animal to cross the linear extent of its range (Silva et al., 2022). As $\tau_p$ increases, we can expect an animal to take longer to travel this linear extent. Range-resident animals tend to travel away from their point of origin at a rate controlled by the location variance parameter ($\sigma_p$) while simultaneously reverting back to it at a rate driven by $\tau_p$ (Péron et al., 2017). As $\tau_p \to \infty$, movement becomes endlessly diffusive ($\sigma_p \to \infty$), with no range residency behavior. For range-resident species, $\sigma_p$ is asymptotically constant and proportional to home range area (Calabrese et al., 2016; Fleming et al., 2014a).

The velocity autocorrelation parameter ($\tau_v$) describes how velocity persists through time (directional persistence; Fleming et al., 2014). Animals with strong directional persistence (longer bouts of constant speed and constant direction), such as migratory species, will tend to have a large $\tau_v$. Animals with more tortuous movement (less linear, more diffusive) tend towards smaller $\tau_v$. Speed and distance traveled are two of the properties of an animal's velocity process, with variance $\sigma_v = \sigma_p/(\tau_p \times \tau_v)$ for stationary processes (Noonan, Fleming, et al., 2019). […] (Fieberg & Börger, 2012), and an understanding of existing constraints. These include constraints related to the study species (characterized by $\tau_p$, $\tau_v$, $\sigma_p$, and $\sigma_v$) and to the sampling parameters (duration and interval).

Our choice of sampling parameters largely determines our ability to detect the characteristic timescales of the movement process. The sampling duration and interval, relative to these characteristic timescales, determine whether the data will retain any signature of the animal's range crossing time ($\tau_p$) or its directional persistence ($\tau_v$).
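The parameter relationships above can be made concrete with a small calculation. The following is a minimal Python sketch, not ctmm code. It assumes an isotropic OUF process with per-axis location variance $\sigma_p$ and computes three quantities. The first is the implied velocity variance $\sigma_v = \sigma_p/(\tau_p \tau_v)$. The second is the area of the 95% contour of the Gaussian stationary distribution, using the standard result that $r^2/\sigma_p$ follows a chi-squared distribution with 2 degrees of freedom. The third is the mean of the resulting Rayleigh-distributed speed. The function name `ouf_summary` and the example values are hypothetical. ctmm derives these quantities from fitted models rather than from this closed form.

```python
import math

def ouf_summary(sigma_p, tau_p, tau_v, quantile=0.95):
    """Derived quantities for an isotropic OUF process (illustrative sketch).

    sigma_p : per-axis location variance (m^2)
    tau_p   : position autocorrelation timescale (s)
    tau_v   : velocity autocorrelation timescale (s)
    """
    # Per-axis velocity variance: sigma_v = sigma_p / (tau_p * tau_v), in (m/s)^2
    sigma_v = sigma_p / (tau_p * tau_v)
    # Area inside the `quantile` contour of an isotropic bivariate Gaussian:
    # r^2 / sigma_p ~ chi-squared(2), so area = -2 * pi * sigma_p * ln(1 - quantile)
    area = -2.0 * math.pi * sigma_p * math.log(1.0 - quantile)
    # Speed of an isotropic 2-D Gaussian velocity is Rayleigh distributed,
    # with mean sqrt(pi * sigma_v / 2)
    mean_speed = math.sqrt(math.pi * sigma_v / 2.0)
    return {"sigma_v": sigma_v, "area": area, "mean_speed": mean_speed}

# Hypothetical animal: 1 km standard deviation per axis, tau_p = 1 day, tau_v = 1 hour
print(ouf_summary(sigma_p=1e6, tau_p=86400.0, tau_v=3600.0))
```

Holding $\sigma_p$ and $\tau_p$ fixed, a larger $\tau_v$ lowers $\sigma_v$ and therefore the mean speed. The home range area depends only on $\sigma_p$.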
Typically, sampling duration ($T$) should be at least as long as $\tau_p$ (and ideally many times longer) for home range estimation (Fleming et […] and we collect locations for 100 days, our effective sample size ($N_\text{area}$) will be roughly 100. This equivalence is irrespective of sampling interval $\Delta t$ (provided that $\Delta t < 1$ day), as […]

There are two objective metrics that quantify the performance of these estimators. The first is the effective sample size for home range estimation ($N_\text{area}$): how many independent location fixes would be required to calculate a home range estimate of the same quality. The second is the effective sample size for speed estimation ($N_\text{speed}$): how many independent velocity fixes would be required to calculate a mean speed estimate of the same quality. In addition, the accuracy and precision of the estimates above, calculated from any
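The claim that $N_\text{area}$ depends on $T/\tau_p$ rather than on $\Delta t$ can be checked with a simplified model. This minimal sketch assumes that each coordinate is a discretely sampled Ornstein-Uhlenbeck position process with a known range centre. Under that assumption, successive fixes have correlation $\rho = e^{-\Delta t/\tau_p}$. A variance (and hence area) estimate from $n = T/\Delta t$ fixes then carries the information of about $n(1-\rho^2)/(1+\rho^2) = (T/\Delta t)\tanh(\Delta t/\tau_p)$ independent fixes. This is a textbook large-$n$ AR(1) approximation, not the model-based $N_\text{area}$ that ctmm reports. The helper `approx_n_area` is hypothetical.

```python
import math

def approx_n_area(duration, interval, tau_p):
    """Approximate effective sample size for home range (variance) estimation.

    Assumes an OU position process with a known range centre, sampled every
    `interval` for `duration` (same time units as tau_p). Uses the large-n AR(1)
    result n_eff = n * (1 - rho^2) / (1 + rho^2) with rho = exp(-interval / tau_p),
    which equals (duration / interval) * tanh(interval / tau_p).
    """
    n = duration / interval                   # absolute number of fixes
    rho2 = math.exp(-2.0 * interval / tau_p)  # squared lag-one autocorrelation
    return n * (1.0 - rho2) / (1.0 + rho2)

# tau_p = 1 day and 100 days of tracking, at several sampling intervals (in days)
for label, dt in [("1 min", 1 / 1440), ("1 h", 1 / 24), ("6 h", 0.25), ("1 day", 1.0)]:
    print(f"{label:>6}: n = {100 / dt:8.0f}, N_area ~ {approx_n_area(100, dt, 1.0):6.1f}")
```

Across these intervals, the absolute sample size changes by more than three orders of magnitude. The approximate $N_\text{area}$, however, stays close to 100 for intervals well below $\tau_p$ and drops to about 76 at $\Delta t = \tau_p$.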

The application consists of four main sections, organized as easy-to-navigate tabs in the left sidebar. Once the user has set their data source and research question(s), the sidebar automatically narrows to only the required steps. We advise users to follow the guided tutorial available in the "Home" tab until they feel confident with the application's workflow. Each run-through of the entire workflow may take from a few minutes up to several hours. Computation bottlenecks arise from the ctmm::ctmm.select() and ctmm::speed() functions, particularly if the absolute sample size exceeds $10^3$ locations or for very short sampling intervals. […] are also available to assist in additional validation steps that cannot be done automatically. Examples are variograms to confirm range residency if the goal is home range estimation (Fleming et al., 2014a), and plots to identify outliers (which should be removed and the data re-uploaded before proceeding).

For option three ("Simulate data"), the application will simulate autocorrelated tracking data using an isotropic Ornstein-Uhlenbeck with foraging (OUF) Gaussian process, which incorporates both correlated velocities and constrained space use. This OUF model is also the most frequently selected across modern GPS tracking datasets […] the new model fit, which will assist in decision making (to continue with the current sampling design, or to make further adjustments before proceeding).
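The following Python sketch shows what simulated isotropic OUF data can look like outside the app. It is not how movedesign or ctmm simulate data; ctmm has its own simulation routines. The sketch assumes the second-order form $x'' + (1/\tau_p + 1/\tau_v)\,x' + x/(\tau_p \tau_v) = \text{white noise}$. For this form, the stationary per-axis position variance is $\sigma_p$ and the velocity variance is $\sigma_p/(\tau_p\tau_v)$. The sketch integrates the equation with small semi-implicit Euler steps, then keeps one location per sampling interval. The function `simulate_ouf` and its `substeps` argument are hypothetical.

```python
import numpy as np

def simulate_ouf(tau_p, tau_v, sigma_p, duration, interval, substeps=10, seed=None):
    """Simulate a 2-D isotropic OUF track; return (times, xy) every `interval`.

    Approximate sketch (assumes tau_v <= tau_p): integrates
    x'' + a x' + b x = sqrt(q) * white noise, with a = 1/tau_p + 1/tau_v and
    b = 1/(tau_p * tau_v), using small semi-implicit Euler steps.
    """
    rng = np.random.default_rng(seed)
    a = 1.0 / tau_p + 1.0 / tau_v
    b = 1.0 / (tau_p * tau_v)
    q = 2.0 * a * b * sigma_p                  # stationary var(x) = q / (2ab) = sigma_p
    # At least `substeps` internal steps per sampling interval and per tau_v
    steps_per_fix = max(substeps, int(np.ceil(substeps * interval / tau_v)))
    dt = interval / steps_per_fix
    n_fixes = int(round(duration / interval)) + 1

    # Start in the stationary state: var(x) = sigma_p, var(v) = sigma_p * b
    x = rng.normal(0.0, np.sqrt(sigma_p), size=2)
    v = rng.normal(0.0, np.sqrt(sigma_p * b), size=2)
    xy = np.empty((n_fixes, 2))
    xy[0] = x
    for i in range(1, n_fixes):
        for _ in range(steps_per_fix):
            # Velocity is pulled back towards zero (tau_v) and towards the centre (tau_p)
            v = v + (-a * v - b * x) * dt + np.sqrt(q * dt) * rng.normal(size=2)
            x = x + v * dt
        xy[i] = x                              # record one fix per sampling interval
    return np.arange(n_fixes) * interval, xy

times, xy = simulate_ouf(tau_p=1.0, tau_v=0.05, sigma_p=1.0, duration=100.0, interval=0.1, seed=1)
```

Time units are arbitrary but must be consistent. If they are days, the example above has $\tau_v$ = 1.2 hours. The track then moves smoothly and persistently over sub-hour scales and stays range-resident over days.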

Running estimators and building the report
In the Analyses section, one or two tabs will be available to the user, depending on the chosen outputs: (1) Home range and/or (2) Speed & distance. Each tab runs an estimator on the simulated dataset created in the previous section. For each output, the main output box shows two sets of values. "Estimate" is the point estimate followed by the 95% confidence intervals (CIs). "Expected error" is the relative error (in %) of the point estimate (and of the 95% CIs) in relation to the expected values. Users can use these outputs, along with the effective sample sizes, to plan a sampling duration and interval that give sufficiently narrow confidence intervals and acceptably low relative errors. Typically, a more reliable home range estimate requires a longer sampling duration than the one specified, while a more reliable speed and distance estimate requires shorter sampling intervals. However, we advise caution in two scenarios for speed and distance estimation: (1) producing a CTSD estimate at $\Delta t > 3\tau_v$ does not guarantee a meaningful output, and (2) $\Delta t \ll \tau_v$ may only yield a marginal benefit over $\Delta t < \tau_v$ while markedly increasing computation time (see Noonan, Fleming, et al., 2019 for more details). Users can then navigate to the last section, the "Report" tab. It summarizes all previous inputs and outputs and gives the final recommendations regarding home range and/or speed and distance estimation.
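The "Expected error" values and the two CTSD cautions can be expressed as simple checks. This minimal Python sketch makes two assumptions. First, it assumes the relative error is $(\text{estimate} - \text{expected})/\text{expected} \times 100$; the app's exact formula is not stated here. Second, it uses the $\Delta t > 3\tau_v$ and $\Delta t < \tau_v$ thresholds from the text and labels everything in between as "intermediate". The function names and example values are hypothetical, not movedesign code.

```python
def relative_error_pct(estimate, expected):
    """Relative error (%) of an estimate with respect to the expected (true) value."""
    return 100.0 * (estimate - expected) / expected

def ctsd_interval_check(interval, tau_v):
    """Classify a sampling interval against tau_v, following the cautions in the text."""
    ratio = interval / tau_v
    if ratio > 3.0:
        return "interval > 3 tau_v: CTSD output may not be meaningful"
    if ratio >= 1.0:
        return "tau_v <= interval <= 3 tau_v: intermediate; check N_speed and the CIs"
    return "interval < tau_v: adequate; much shorter intervals mainly add computation time"

# Hypothetical home range output: true area 100 km^2, estimate 85 (95% CI 60 to 120)
expected = 100.0
for name, value in [("low CI", 60.0), ("estimate", 85.0), ("high CI", 120.0)]:
    print(f"{name:>8}: {relative_error_pct(value, expected):+.1f}%")

# Hypothetical interval check: 60-minute fixes for an animal with tau_v = 42 minutes
print(ctsd_interval_check(interval=60, tau_v=42))
```

Reporting the error of the CI bounds as well as of the point estimate shows whether the truth lies inside the interval. It also shows how wide the interval is relative to the truth.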

We selected the individual "Cilla" for parameter extraction. Once the fit was validated, we extracted parameters from the fitted anisotropic OUF model: a $\tau_p$ of approximately 7.5 days (CI: 4.4–12.7 days) and a $\tau_v$ of 42 minutes (CI: 39.6–44.7 minutes). We plotted our sampling design options based on the chosen tracking device parameters (Figure 3). The plot revealed that a rough estimate of $N_\text{area}$ for any sampling interval was substantially smaller than $N_\text{speed}$, as expected since $\tau_p \gg \tau_v$. As our focus was both home range and speed/distance estimation, we wanted to maximize both $N_\text{area}$ and $N_\text{speed}$, so we selected a sampling interval of one hour. Because of the battery/resolution trade-off discussed earlier, this interval sets our sampling duration to 5.7 months. Once we successfully validated and ran our new simulation, our sample sizes were: absolute sample size $n$ = 4,012, $N_\text{area}$ = 18.1, and $N_\text{speed}$ = 3,991.

Figure 4. Example outputs shown in the "Analyses" tabs, including the point estimates (with the corresponding confidence intervals below the point estimate in small, black text) and/or expected errors associated with both home range and speed/distance. We can also visualize the home range estimate (with the true 95% area for comparison) and the instantaneous speed estimates at different time scales; other visualizations (such as those related to distance) are not shown here but are available within the app.
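The battery/resolution trade-off in this example can be sketched as a fixed budget of fixes, where halving $\Delta t$ halves $T$. This minimal Python sketch assumes a hypothetical budget of 4,000 fixes, which is not a real device specification. It takes $\tau_p \approx 7.5$ days and $\tau_v \approx 42$ minutes from the fitted model above. For each candidate interval, it reports three values: the implied duration, $T/\tau_p$ as a rough guide to $N_\text{area}$ (valid only when $\Delta t \ll \tau_p$), and $\Delta t/\tau_v$ as a guide to whether CTSD is feasible. These heuristics ignore anisotropy and parameter uncertainty. They should not be expected to reproduce the $N_\text{area}$ = 18.1 computed in the app.

```python
# Hypothetical battery budget: total number of GPS fixes the device can record
FIX_BUDGET = 4000

TAU_P_DAYS = 7.5   # position autocorrelation timescale (approximate, from the fitted model)
TAU_V_MIN = 42.0   # velocity autocorrelation timescale (approximate, from the fitted model)

def design_summary(interval_min, fix_budget=FIX_BUDGET):
    """Duration implied by a fixed fix budget, plus rough timescale ratios."""
    duration_days = fix_budget * interval_min / (60 * 24)
    return {
        "interval_min": interval_min,
        "duration_days": duration_days,
        "T_over_tau_p": duration_days / TAU_P_DAYS,  # rough guide to N_area
        "dt_over_tau_v": interval_min / TAU_V_MIN,   # above 3: CTSD may not be meaningful
    }

for dt in (15, 30, 60, 120, 240):
    s = design_summary(dt)
    print(f"dt = {dt:>3} min: T = {s['duration_days']:6.1f} d, "
          f"T/tau_p = {s['T_over_tau_p']:5.1f}, dt/tau_v = {s['dt_over_tau_v']:4.2f}")
```

Under this assumed budget, only the 240-minute option exceeds the $\Delta t > 3\tau_v$ caution. The one-hour option keeps $\Delta t/\tau_v$ near 1.4 while giving $T/\tau_p$ above 20.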

With an appropriate movement model available, the next step was to estimate home range area, mean speed, and total distance traveled (Figure 4). Once the final report was built, the AKDE estimates showed high uncertainty, while the CTSD estimates did not. Based on these results and the effective sample sizes, we determined that our sampling design was adequate for speed/distance estimation, but could be insufficient for home