Elsevier

Ecological Modelling

Volume 220, Issue 4, 24 February 2009, Pages 589-594
Short Communication
Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know?

https://doi.org/10.1016/j.ecolmodel.2008.11.010

Abstract

An important decision in presence-only species distribution modeling is how to select background (or pseudo-absence) localities for model parameterization. The selection of such localities may influence model parameterization and thus the appropriateness and accuracy of model predictions when extrapolating the species distribution across time and space. We used 12 species from the Australian Wet Tropics (AWT) to evaluate the relationship between the geographic extent from which pseudo-absences are taken and model performance, as well as the shape and importance of predictor variables, using the MAXENT modeling method. Model performance was lower when pseudo-absence points were taken from either a restricted or a broad region relative to the species occurrence data than when they were taken from an intermediate region. Furthermore, variable importance (i.e., contribution to the model) changed such that models became increasingly simplified, dominated by just two variables, as the area from which pseudo-absence points were drawn increased. Our results suggest that it is important to consider the spatial extent from which pseudo-absence data are taken. We suggest that species distribution modeling exercises begin with exploratory analyses evaluating which extent provides both the most accurate results and a biologically meaningful fit between species occurrence and predictor variables. This is especially important when modeling across space or time, a growing application for species distribution modeling.

Introduction

Appropriate selection of pseudo-absence or background locations is essential for presence-only species distribution modeling (SDM) (Chefaoui and Lobo, 2008). Recent studies have highlighted several methods for selecting pseudo-absence points, including: random selection (e.g., Stockwell and Peters, 1999); random selection with geographic-weighted exclusion (e.g., Hirzel et al., 2001); random selection with environmentally weighted exclusion (e.g., Zaniewski et al., 2002); locations that have been visited (i.e., occurrences for other species) but where the target species was not recorded (e.g., Elith and Leathwick, 2007); and occurrences for an entire group of species collected using the same methods, which encapsulate the sampling bias of the data (e.g., Phillips and Dudik, 2008). While the relative merits of these methods have been discussed previously (e.g., Lütolf et al., 2006; Chefaoui and Lobo, 2008; Phillips and Dudik, 2008), one important methodological step that has not been properly evaluated is the extent of the geographic region from which background or pseudo-absence points are taken. We suspect that, in practice, the decision to set spatial constraints on the background is typically made unconsciously: modelers simply default to the extent of an arbitrarily defined study area. But does this really matter?

There are several reasons why pseudo-absences selected at large distances from known occurrences may be problematic. Essentially, pseudo-absences are meant to provide a comparative data set that allows the conditions under which a species occurs to be contrasted with those where it is absent. If pseudo-absences are geographically distant from the presence locations, predictive models will be dominated by parameters that coarsely discriminate regional conditions, with a weakened ability to tease out the fine-scale conditions that actually limit the species distribution. This is in direct conflict with the purpose of generating pseudo-absences in the first place.

The objective of this study was to ask whether background size really matters and, if so, how far from presence localities pseudo-absence points should be selected. We address both questions by selecting random pseudo-absences from increasingly larger background areas and monitoring the impact this has on the predictions of species distribution models. Specifically, we examine 12 rainforest vertebrates from the Australian Wet Tropics (AWT) and employ a common presence-only ecological niche modeling method, MAXENT (Phillips et al., 2006). In this application MAXENT is used to represent both background and pseudo-absence modeling. We explore changes in model accuracy, predicted distributional area and the relative importance of predictor variables with increasing background size.
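The idea of drawing random pseudo-absences from increasingly larger background areas can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes that the background is the union of circular buffers of a given radius around the presence points, that coordinates are planar and expressed in km, and that simple rejection sampling is acceptable. The function name `sample_background`, the toy presence coordinates and the radii are all hypothetical.

```python
import math
import random


def sample_background(presences, radius_km, n_points, seed=0):
    """Draw n_points uniformly from the union of discs of radius_km
    centred on the presence points (planar coordinates in km, assumed)."""
    rng = random.Random(seed)
    xs = [p[0] for p in presences]
    ys = [p[1] for p in presences]
    # Bounding box of all buffers; candidates are drawn uniformly inside it.
    x_min, x_max = min(xs) - radius_km, max(xs) + radius_km
    y_min, y_max = min(ys) - radius_km, max(ys) + radius_km
    points = []
    while len(points) < n_points:
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        # Keep the candidate only if it lies within radius_km of a presence.
        if any(math.hypot(x - px, y - py) <= radius_km for px, py in presences):
            points.append((x, y))
    return points


# Hypothetical presence localities (km) and a series of growing backgrounds.
presences = [(0.0, 0.0), (30.0, 10.0), (5.0, 40.0)]
backgrounds = {r: sample_background(presences, r, 200) for r in (10, 100, 500)}
```

Each entry of `backgrounds` could then be paired with the same presence records to fit one model per background size, so that any change in accuracy or variable importance is attributable to the background extent alone.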

Methods

The AWT of northeastern Australia is an ideal candidate region for testing our objectives (Fig. 1). The region contains a diverse, well-studied vertebrate fauna and encompasses strong environmental gradients. The AWT supports 1.8 million ha of rainforest-dominated vegetation that was once widespread in Miocene Australia but now forms a distinct and isolated environmental domain of high diversity surrounded by drier and warmer environments (Nix, 1991; Moritz, 2005).

We modeled vertebrate species

Results

The results are summarized in Fig. 2. Model predictions and performance changed in at least four important ways as the area of the background from which pseudo-absences were drawn increased. First, the flexible area AUC increased. Specifically, the AUC increased rapidly as background size expanded from 10 to 100 km. Subsequent expansions resulted in only minor increases in AUC (i.e., at 100 km all models already had an AUC > 0.93 and by 500 km AUC > 0.99). Second, in 50% of the species, the fixed area
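For readers unfamiliar with the statistic, the AUC can be read as the probability that a randomly chosen presence point receives a higher model score than a randomly chosen background point, with ties counted as one half. The sketch below computes it directly from that definition. It is a generic illustration with made-up scores, not the evaluation code used in the study, and it does not implement the flexible-area versus fixed-area distinction made above.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs in which the
    presence point scores higher (ties count as half a win)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Made-up suitability scores: three of the four pairs are ranked correctly.
example = auc([0.9, 0.4], [0.5, 0.1])  # 0.75
```

Because the statistic depends only on how presence scores rank against background scores, adding background points from environments very unlike the occupied ones tends to raise it without the model becoming better at fine-scale discrimination.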

Discussion

Here we show that the size of background from which pseudo-absences are drawn has important ramifications for predictions and performance of SDMs. We have focused on predictions of current distributions but this issue will likely be even more problematic for models that are projected onto different geographic space or under different climate scenarios. For example, inappropriate background selection may unduly affect studies of invasive species (e.g., Mau-Crimmins et al., 2006, Steiner et al.,

Acknowledgements

This research was supported by the James Cook University Research Advancement Program, the Marine and Tropical Sciences Research Facility, Earthwatch Institute and the Queensland Smart State Program.
