## Abstract

A rich set of statistical techniques have been developed over the last several decades to estimate the spatial extent of animal home ranges from telemetry data, and new methods to estimate home ranges continue to be developed.

Here we investigate home-range estimation from a computational point of view and aim to provide a general framework for computing home ranges, independent of specific estimators.

We show how such a workflow can help make home-range estimation easier and more intuitive, and we provide a series of examples illustrating how different estimators can be compared easily, so that one can perform a sensitivity analysis to determine the degree to which the choice of estimator influences qualitative and quantitative conclusions.

By providing a standardized, tidy implementation of home-range estimators, we hope to equip analysts with the tools needed to explore how estimator choice influences answers to biologically meaningful questions.

## Introduction

The biological concept of an animal’s *home range* has served as a useful construct for organizing our thinking about how animals use and interact with space since the time of Darwin (Kie et al. 2010; Horne et al. 2020). Today, most people associate the term home range with Burt (1943)’s definition, “that area traversed by the individual in its normal activities of food gathering, mating and caring for young. Occasional sallies outside the area, perhaps exploratory in nature, should not be considered as in part of the home range.” A variety of statistical and modeling approaches have been developed to quantify the spatial extent and intensity of landscape use by individual animals and to gain insights into factors that structure their home ranges (see e.g., Powell 2012 and associated papers in a special feature on the topic). Recently, Fleming et al. (2016) and Horne et al. (2020) have argued for classifying statistical home-range methods by whether they estimate one of two estimation targets: the *range distribution* or long-term (equilibrium) distribution that would result from an animal continuing to move in a consistent manner and an *occurrence distribution* that captures the path of movement an animal takes during a specific observation window, along with its uncertainty. This dichotomy is appealing from a theoretical point of view, and several new statistical estimators have been developed for targeting these quantities while also addressing issues related to autocorrelation, a prominent feature of modern day Global Positioning System (GPS) data (Fleming et al. 2014, 2015, 2016).

Despite these advances, many biologists continue to use a variety of “old” estimators (e.g., minimum convex polygons [MCP]; Mohr 1947; or kernel density estimators [KDE] that assume independent location data; Worton 1989) without explicit discussion of a particular estimation target (e.g., Froy et al. 2018; Ranc et al. 2020). We suspect there may be multiple reasons why, including: 1) some ecologists may not be familiar with recent literature on home-range estimators; 2) current estimators that account for autocorrelation require an extra step of fitting an animal movement model to location data, which can also can take considerable time and computational resources when applied to large data sets involving many animals; some researchers may feel this extra step is unnecessary or they may not feel confident in their use of these methods; and 3) researchers may be interested in estimating something other than a range or occurrence distribution. The new methods developed by Fleming and co-authors are major contributions to this area of research, and these authors have done a nice job providing open-source software and training for implementing their estimators (Fleming and Calabrese 2020; Calabrese et al. 2020). We conjecture, however, that many biologists continue to view traditional home-range estimators as convenient, though imperfect, indices that capture the spatial extent of the area used by individuals during specific tracking periods. For convenience, and as is common in the literature, we will refer to the suite of methods used in this context as *home-range estimators*, even though these methods may have different statistical estimation targets (Horne et al. 2020).

Whereas there are many studies that compare different methods for quantifying space use with the goal of determining a single “best” estimator (e.g., Lichti and Swihart 2011; Walter, Onorato, and Fischer 2015; Noonan et al. 2019), we aim in this paper to be largely estimator agnostic. As we have argued previously, we think researchers should carefully consider estimators and their properties (e.g., variance or statistical power), and choose one or more depending on the specific biological questions of interest (Fieberg and Börger 2012; Signer et al. 2015). We also encourage users to consider multiple estimators, when possible, to evaluate the sensitivity of their results to estimator choice.

To accomplish this goal, we propose a general and consistent framework for home-range estimation that should be able to accommodate *most* home-range estimators. We propose two classes of home-range estimators and a set of properties for each class. Having a standardized treatment of home-range estimators facilitates their computation, visualization, and comparisons among estimators. This proposal goes hand in hand with calls for more reproducible and standardized workflows in (wildlife) ecology (Gula and Theuerkauf 2013; Lewis, Vander Wal, and Fifield 2018; Archmiller et al. 2020). After introducing the framework conceptually, we demonstrate how to estimate home-ranges using this framework following the principles of tidy data (Wickham and others 2014) using the R package `amt` (Signer, Fieberg, and Avgar 2019; R Core Team 2020) and a previously published data set of fishers from New York (LaPoint et al. 2013a).

### A conceptual framework for home ranges

Home-range estimators can be divided into two classes: geometric and probabilistic estimators (Figure 1; Fleming et al. 2015). Geometric estimators are constructed following a set of rules and are often hull-based, i.e., the home range is a polygon that is constructed using (all) points where an animal was observed. Typical examples of geometric estimators are minimum convex polygons (Mohr 1947) or local convex hulls (LoCoH; Getz and Wilmers 2004). On the other hand, probabilistic estimators have an underlying probabilistic model and estimate an utilization distribution, the two-dimensional relative frequency distribution of an animal’s spatial locations (Van Winkle 1975), from which a hull-based home range can be retrieved for a given isopleth level. Typical examples of probabilistic home-range estimators include uniform or bi-variate normal models (Van Winkle 1975; Horne and Garton 2006), traditional KDEs (Worton 1989; Fieberg 2007), and autocorrelated KDEs (aKDE; Fleming et al. 2015).

Each home-range estimator, regardless of its class, has several attributes (values stored within the object) and methods (functions to work with the estimate). Estimators of both classes should have the following three attributes: the coordinate reference system (`crs`), the data that were used to construct the home-range estimate (`data`), and the home-range isopleths or levels (`levels`). The coordinate reference system is inherited from the data used to estimate the home range and will be needed to ensure that the home range is correctly positioned in space and that the units of the home-range area are correct. The attribute `data` contains the original data used to calculate the home range which can be especially useful for plotting home ranges. Finally, home-range areas are calculated for a pre-specified home-range level (or isopleth). For probabilistic estimators, the cumulative distribution function of the utilization distribution is truncated at given quantile, with associated 1 *− α* level. For hull-based methods the outermost points are excluded (given it is possible to identify the outermost points). It is common to use the 95% isopleth to determine the home range (although arguments for 90% levels exist e.g., Börger, Franconi, Ferretti, et al. 2006a). Further, we propose the following methods for all home-range estimators: `hr_area()` to calculate the home-range area, `hr_isople``th()` to calculate the home-range isopleths at the specified `levels`, `hr_overlap()` to calculate the home-range overlap between two or more home ranges, and `plot` to plot the home range.

In addition to the four global methods, specific methods or classes can have additional properties and/or methods. Probabilistic home ranges, for example, have a property that gives information about the spatial extent and resolution of the utilization distribution (here and in `amt` this is termed template raster, or `trast` for the argument name). Probabilistic home ranges should also have a method to obtain the utilization distribution (`hr_ud()`), the cumulative utilization distribution (`hr_cud()`), and to quantify volumetric intersections of two utilization distributions (`hr_intersection()`). Examples of estimator-specific properties are the number of neighbors used for the local convex hull method, the bandwidth for kernel density estimation, or the movement model used for autocorrelated kernel density estimation (Fig. 1).

An established set of methods makes it easy to work with home ranges. For example, the function `hr_area()` in the package `amt`, will always return a `tibble` (Müller and Wickham 2020) with two columns (the home-range level and area), regardless of the estimator. A `tibble` is very similar to a `data.frame` in R (i.e., a two-dimensional data structure) but with improved properties. A `tibble` makes it is easy to work with list-columns, which as we will demonstrate later in this paper, help to facilitate analyses of data from multiple animals or sampling instances. Similarly, the function `hr_isopleth()` in `amt` will always return a `tibble` with a simple feature column of class `sfc_POLYGON` from the `sf` package (Pebesma 2018), allowing further GIS-related work.

### One individual or sampling instance

In the first set of examples, we demonstrate how home ranges and derived quantities can be calculated for a single individual or sampling instance (e.g., a home range for one individual using data collected during a single tracking period). We use a data set containing locations of fisher from New York, USA (LaPoint et al. 2013a). These data are freely available from Movebank (LaPoint et al. 2013b), and include observations of six individuals (three males and three females) tracked between January and March 2011, with a sampling rate of 10 minutes or less. We use a preprocessed data set here. All steps to prepare the data set are provided in Supplement 1. For the first few examples, we will use data from one female (F1); for the second set of examples, we will use data from all six individuals.

First we load required packages, including `amt` for calculating home ranges (Signer, Fieberg, and Avgar 2019), `tidyverse` for data manipulation (this includes `ggplot2` for plotting; Wickham 2016, 2017), and `lubridate` for working with dates (Grolemund and Wickham 2011). After loading the data, we use the function `make_track()` to create a `track` – an object class used by `amt`, and then use `filter` to filter only those relocations that belong to the fisher where `id == “F1”`.

With the `fisher.f1` data set, we can now calculate different home-ranges estimates. We demonstrate by calculating MCP and KDE home ranges here (with the default reference bandwidth). For both home-range estimators, we estimate home ranges at two different home-range levels (50% and 95%).

Results from applying any estimator in `amt` – not just the ones illustrated here – are stored in a `list`. All estimators have three entries in common: `crs`, `data` and `levels`; `crs` stores the coordinate reference system of the home range estimate, inherited from the data used to estimate the home range. The attribute `data` contains the track that was used to estimate the home range (a `track_xy*` from package `amt`), unless during estimation the argument `keep.data` was set to `FALSE`, then the attribute `data` is `NULL`. Finally, the argument `levels` contains the home-range levels that were used when estimating the home range.

All estimators also have at least four generic functions for working with the results and for basic plotting. The `plot()` function plots the home range isopleths with the observed points unless `keep.data` is set to `FALSE` or the argument `add.points` is set to `FALSE`. Below, we plot KDE- and MCP-based home ranges. When plotting the MCP, we use the arguments `add.relocations = FALSE` to avoid plotting the observed locations twice; further, we set the argument `add = TRUE` to draw the MCP home range to the existing plot and `border = “red”` to distinguish the KDE home range from the MCP home range by border color (Fig. 2).

Furthermore, we can now continue to work with these home-range estimates. For example, we can query the home-range area with the function `hr_area()`, which returns a `tibble` with two columns: the home-range level and the corresponding area.

The function `hr_isopleth()` returns a `tibble` with a simple feature column of class `sfc_POLYGON` from the `sf` package (Pebesma 2018), which we can use to conduct further spatial analyses, visually inspect the home range, or export it to a GIS.

Finally, we may want to calculate the extent of overlap between two or more home ranges (Fieberg and Kochanny 2005), for which we provide the function `hr_overlap()`. This function can be used to calculate overlap between any two sampling instances (e.g., time periods, animals or even estimators). Below we calculate overlap of the MCP and KDE home ranges, which were previously estimated.

`hr_overlap()` always calculates the fraction of the first home range (i.e., the first argument) that is intersected by the second home range (second argument). Hence, changing the order of arguments will lead to a different result (Fieberg and Kochanny 2005).

For probabilistic estimators, several other home-range overlap indices have been proposed (Fieberg and Kochanny 2005); these are also implemented in the `amt` package.

### Many individuals or sampling instances

Most telemetry studies collect data on several individuals and/or during several sampling instances (e.g., time periods). To compare estimates across sampling instances and to facilitate population-level inference, it is therefore important that the methods discussed so far for individual home ranges scale easily to situations with many animals and/or time intervals. The package `amt` does not provide an infrastructure for multiple instances, but instead relies on general data structures for such situations: *list columns* from the `tibble` package. A list column is a column of a `tibble` that contains a list. And a list, in turn, is a very flexible data structure that can hold almost any object. Thus, we can store results from applying home-range estimators in a list column of a `tibble`, together with meta-information (such as the name of the individual, its sex or age, or the time period it when was tracked) in other columns.

To demonstrate list columns, we will consider the full data set containing locations from all six fishers and illustrate workflows addressing three different example questions:

Do estimates of home-range size differ between sexes?

Is there a correlation between environmental covariates and estimates of home-range size?

How do daily “home ranges” change over time?

The aim of these examples is twofold: 1) we illustrate the benefit of standardized classes for home-range estimation (Fig. 1) and list columns, and 2) we highlight that some results are sensitive to the choice of estimator whereas others are not. In particular, estimates of home-range size tend to vary considerably among estimators, but relative comparisons over time or space are often robust to estimator choice (Signer et al. 2015).

All three of the above questions require that we iterate over several animals (question 1 and 2), and animals and days (question 3). List columns that organize data for each individual or sampling instance provide a simple way to facilitate these analyses. The function `nest()` from package `tidyr` (Wickham and Henry 2020) can be used to create a list column; `nest()` only requires the name of the list column and the columns that should be nested into the list. When using the syntax `nest(data = c(x_, y_, t_))`, below, all columns that are not named in the `nest()` call act as grouping variables. In our first example, the only column not listed is `id`, so it serves as a grouping variable; later we will show to group by more than one grouping variable. Alternatively, we could have used `nest(data = -c(id))` to specify that we want to use `id` as a grouping column, and that all other columns should be nested. These two approaches will result in identical results. Also note that we could choose a different name for the list column (i.e., it does not need to be labeled `data`).

The result of `nest` is a `tibble` with three columns: `id`, `sex` and `data`.

`data` is a list column that contains a tibble with all the relocations for a given animal (`id`). This list can be accessed in the regular way. For example to obtain relocations for the first animal, we can use:

With the `mutate()` function, we can create a new list column that contains the home-range estimates for each animal. To achieve this goal, we have to iterate over each element in the column `data`, apply a home-range estimator, and save the result in a list. In base R, the function `lapply()` is well suited to this task. An alternative is the function `map()` from the `purrr` package (Henry and Wickham 2020).

The function `map()` always iterates over a data structure (e.g., a vector or a list) that is provided as its first argument. The second argument to `map()` is a function that is to be applied to each element of this data structure. There are three different snytaxes we may use to specify this function: 1) we can simply supply the function name, as was done for the new column `hr_mcp`. In this case, `map()` is given the tracking data of each animal (stored in the column `data`) and the data set for each animal is then passed to the function `hr_mcp()`. This syntax works because the function `hr_mcp()` does not require the specification of further arguments (note, the default value of 0.95 for the home-range level is used). 2) A formula (`∼`) notation can be used to pass a function to `map()`. The advantage of this notation is that it is possible to access the data under evaluation – i.e., the relocation data of the current animal can be accessed either through a ., as we illustrate above, or through the predefined variable `.x` or `..1`. For example, for the local convex hull method, we want to choose `n` (the number of neighbors) as the square root of the number of observations. Thus, we count the number of rows with `nrow(.)` and then take the square root. Similarly, for the aKDE home-range estimator, we first want to fit a continuous-time movement model to the relocation data and then use this model when estimating the home range. Thus, we first pass the data, again using the ., to the function `fit_ctmm()` and then pass the result to the function `hr_akde()`. 3) `map()` can be used analogously to `lapply()`, by passing an anonymous function. We did not use this approach here, but if we would use this for the MCP home ranges, the call would change from `map(data, hr_mcp)` to `map(data, function(x) hr_mcp(x))`. `x` is just a local variable (i.e., a placeholder) for the current animal’s data and could also be named differently.

The data set `hr1` has now gained a new list column for each home-range estimator (in total there are now five new columns).

We now want to obtain the home-range size for each animal and each estimator using the same `map`-strategy. Taking advantage of the previously introduced framework, we know that a function `hr_area()` exists for each estimator, and that it will return the home-range size as a `tibble`. However, we would have to apply `hr_area()` separately to each column (`hr_mcp` to `hr_akde_ou`). Instead, we would like to apply the function `hr_area()` to one list containing the home-range estimates for all of the different estimators. To accomplish this task, we need to first change from wide to long format using the function `pivot_longer()` from the package `tidyr` (Wickham and Henry 2020) so that we end up with a tibble that has a column that records the estimator (MCP, KDE, LoCoH, etc) and a second column with the estimates.

We first removed the tracking data (column `data` as these data are no longer needed) and then pass the resulting `tibble` to the function `pivot_longer`. Here we need to say which columns should be turned from the wide format to the long format (`hr_mcp:hr_akde_ou`). The new data set, `hr2`, will have four columns. The first two columns are `id` and `sex` from the old data set. The third column is called `estimator` (this can be controlled with the argument `names_to`) and identifies the estimator type (i.e., the old column names). The fourth column is called `hr` and contains the actual home-range estimates (the name for this column be controlled again with the argument `values_to`).

The new long data format allows us to apply the `hr_area` function to each element of the new column `hr` of `hr2`.

We now undo the list column (with the function `unnest()`). This step is necessary to obtain a `tibble` without a list column, that is suitable for plotting.

We can now visually explore differences in home-range size between males and females, and consider how these differences are influenced by our choice of home-range estimator (Fig. 3A; full code to reproduce the plot is given in the Supplement 1). Estimates of home-range size differ considerably across the 5 estimators (Fig. 3A). As a result, estimates of the absolute difference in mean home-range size between sexes also varies depending on the chosen estimator (Fig. 3B). By contrast, differences between estimators becomes negligible if we quantify relative differences (i.e., ratios of mean home-range size; Fig. 3C); regardless of estimator choice, we find that male home ranges were 2.4 – 2.8 times larger than female home ranges. With a larger sample of individuals, we could quantify uncertainty in the ratio of mean home-range sizes using a bootstrap (Fieberg, Vitense, and Johnson 2020). If other additional animal-specific covariates were available and of interest, we could use a linear (mixed) model to quantify the relative importance of different covariates in determining home-range size. Again, a larger sample size (i.e., more animals) and a standardized collection scheme would be desirable (Börger, Franconi, De Michele, et al. 2006).

In a second example, we explore whether home-range size correlates with the amount of forest within an animal’s home range. To address this question, we load a preprocessed land use raster (see Supplement 1 for full details) and assign it to the object `env`. As before, we make use of `pivot_longer` to obtain one list column with all home-range estimates and then obtain the isopleth levels with `hr_isopleth()` function. Again this works for all implemented estimators in the package `amt` and results in an `sf`-object. With the function `extract` from the raster package (Hijmans 2020), the pixel values within each of the home-range isopleths can be queried.

In a final step for this analysis, we calculate the proportion of forest pixels within each individual’s estimated home range and the home-range size. For the proportion, we iterate again over the list `env`, but this time we use the function `map_dbl()`, a variant of `map()` that will always return a numeric vector.

We need to make a call to the function `unlist()` within the function `mean()` because `extract()` returns a list, allowing for more than one polygon per feature as is common with some home-range estimators (e.g., LoCoH). For this application, however, we can safely combine the land cover classes for different polygons belonging to the same animal. We use the resulting `tibble hr1.env1` to plot estimates of home-range size against the proportion of the estimated home-range composed of forest (Fig. 4). Similar to the previous example (Fig. 3), we observe that different home-range methods result in vastly different estimates in absolute terms, but the observed pattern (i.e., home-ranges with more forest tend to be larger in size) is consistent among all estimators. In situations with more tracked animals and more (environmental) covariates, linear (mixed) models could be used to simultaneously explore multiple determinants of home-range size.

Lastly, we consider an example exploring the extent to which individual space-use patterns change over time. To do so, we first add a new column to the `tibble` with the day of the year (called `yday`) and then group our data set by animal id (`id`) and the day of the year (`yday`). This results in a new `tibble` where the relocations for each animal and year-day are stored in the column `data`. We will only consider days with > 10 relocations. To do this, we first count the number of relocations of each instance (animal-year-day) and then filter for days with > 10 relocations. We then calculate five different home-range estimates as before, but for the new grouping and swapping out Fleming et al. (2015)’s aKDE estimator of the range distribution with Fleming et al. (2016)’s estimator of the occurrence distribution [OD]; the latter is more appropriate for use with short tracking periods where interest lies in estimating the actual path of the animal rather than its equilibrium distribution.

We then follow the same design pattern as before, combining all animals into one list column in the long format and applying the function `hr_area` to all estimates. As with the other 2 examples, there are large consistent differences between the 5 estimators, but all exhibit similar trends over time (Fig. 5). The full code is again given in the Supplement 1.

## Discussion

The home range is an important biological concept that has and will continue to be highly influential. Fieberg and Börger (2012) argued we should clearly distinguish the biological concept of a home range from the statistical methods used to gain insights into this concept. It is also important to recognize that home-range estimates are not always the end goal, but rather, estimates of home-range size are often used to explore questions regarding how various factors influence animals’ use of space (Börger, Franconi, De Michele, et al. 2006; Börger, Franconi, Ferretti, et al. 2006b); often, these questions involve comparisons of home-range estimates over space or time and for different population segments. For example, researchers may correlate estimates of home-range size with demographic traits, landscape features that also vary in time, or the presence or absence of predator species (Beest et al. 2011; Tingley et al. 2014; Ditmer et al. 2018). Estimates of animal home-ranges are also often used to determine habitat availability when studying habitat selection. Although we agree with Fleming et al. (2016) and Horne et al. (2020) that *range* and *occurrence* distributions are useful estimation targets that can help end users choose an appropriate statistical home-range method, for some situations it may not be clear which of the two concepts (if either) is most suitable for addressing a particular research question. For example, a range distribution will not be appropriate for studying temporarily varying space-use patterns. On the other hand, researchers may want to incorporate areas that were likely known and accessible to the animal but not used during a specific observation window when studying habitat selection (i.e., they may be interested in more than just the animal’s movement path, which would be estimated by the occurrence distribution).

Analysts often seek simple measures to detect changes in spatial extent of movements over time, between sexes or habitats. Like Signer et al. (2015), we found that answers to questions that involve relative comparisons of home-range size were robust to estimator choice. Yet, differences in the tracking regime (VHF or GPS) and sampling rate (i.e., how often is an animal tracked) can lead to vastly different home-range estimates depending on one’s choice of estimator (Noonan et al. 2019; Peris et al. 2020). These differences can also influence estimates of derived quantities and observed relationships, for example scaling laws between home-range size and body size (e.g., Noonan et al. 2020). Thus, it is important for researchers to conduct sensitivity analyses to determine how choice of estimator influences their quantitative and qualitative results. The standards we outline here should make this task simple to accomplish.

In the spirit of allowing users to freely apply and evaluate multiple estimators, we suggest approaching home-range estimation in a consistent manner, using a coherent and tidy work flow that facilitates quantification of space use of different animals when various grouping instances are present (e.g., individuals, temporal units such as days or weeks, or both). Animal tracking is still in its early stages (Kays et al. 2015), standards for tracking data are still in flux (Campbell et al. 2016), and new methods are constantly being developed. As wildlife biology enters an era of big data, a coherent, scriptable and reproducible workflow is needed to ensure reproducibility of results (Lewis, Vander Wal, and Fifield 2018; Archmiller et al. 2020), and a standardized, tidy implementation of home-range estimators should help facilitate that vision.

## Authors contribution

JS and JF conceived the ideas and designed methodology; JS analyzed the data; JS and JF led the writing of the manuscript. Both authors gave final approval for publication.

## Data availability

All data used in this manuscript were already published an are available here: https://zenodo.org/record/3991482.

## Acknowledgments

J.Fieberg received partial salary support from the Minnesota Agricultural Experimental Station.

## Footnotes

Added data statement