Short Communication
Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know?
Introduction
Appropriate selection of pseudo-absence or background locations is essential for presence-only species distribution modeling (SDM) (Chefaoui and Lobo, 2008). Recent studies have highlighted several methods for selecting pseudo-absence points: at random (e.g., Stockwell and Peters, 1999); at random with geographically weighted exclusion (e.g., Hirzel et al., 2001); at random with environmentally weighted exclusion (e.g., Zaniewski et al., 2002); at locations that have been visited (i.e., occurrences of other species) but where the target species was not recorded (e.g., Elith and Leathwick, 2007); and at occurrences for an entire group of species collected using the same methods, which encapsulates the sampling bias of the data (e.g., Phillips and Dudik, 2008). While the relative merits of these methods have been discussed previously (e.g., Lütolf et al., 2006; Chefaoui and Lobo, 2008; Phillips and Dudik, 2008), one important methodological step has not been properly evaluated: the extent of the geographic region from which background or pseudo-absence points are drawn. We suspect that, in practice, the decision to set spatial constraints on the background is typically made unconsciously, with modelers simply defaulting to the extent of an arbitrarily defined study area. But does this really matter?
There are several reasons why pseudo-absences selected at large distances from known occurrences may be problematic. Essentially, pseudo-absences are meant to provide a comparative data set so that the conditions under which a species occurs can be contrasted with those where it is absent. If pseudo-absences are geographically distant from the presence locations, predictive models will be dominated by parameters that coarsely discriminate regional conditions, with a weakened ability to tease out the fine-scale conditions that actually limit the species' distribution. This is in direct conflict with the purpose of generating pseudo-absences in the first place.
The objective of this study was to ask whether background size really matters and, if so, how far from presence localities pseudo-absence points should be selected. We address both questions by selecting random pseudo-absences from increasingly large background areas and monitoring the impact this has on the predictions of species distribution models. Specifically, we examine 12 rainforest vertebrates from the Australian Wet Tropics (AWT) and employ a common presence-only ecological niche modeling methodology, MAXENT (Phillips et al., 2006). In this application MAXENT is used to represent both background and pseudo-absence modeling. We explore changes in model accuracy, predicted distributional area and relative importance of predictor variables with increasing background size.
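The idea of drawing random pseudo-absences from increasingly large background areas can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: it assumes the background is the union of circular buffers around the presence records, that coordinates are planar and expressed in km, and that points are drawn by simple rejection sampling. The function names, the presence coordinates and the set of radii are all hypothetical.

```python
import math
import random


def sample_background(presences, radius_km, n_points, seed=0):
    """Draw n_points uniformly from the union of discs of radius_km
    around the presence points (planar coordinates in km)."""
    rng = random.Random(seed)
    xs = [x for x, _ in presences]
    ys = [y for _, y in presences]
    # Bounding box that contains every disc; candidates are drawn from it.
    x_min, x_max = min(xs) - radius_km, max(xs) + radius_km
    y_min, y_max = min(ys) - radius_km, max(ys) + radius_km

    points = []
    while len(points) < n_points:
        cx = rng.uniform(x_min, x_max)
        cy = rng.uniform(y_min, y_max)
        # Keep the candidate only if it lies within radius_km of a presence.
        if any(math.hypot(cx - px, cy - py) <= radius_km for px, py in presences):
            points.append((cx, cy))
    return points


def nearest_presence_distance(point, presences):
    """Distance (km) from a point to its closest presence record."""
    return min(math.hypot(point[0] - px, point[1] - py) for px, py in presences)


# Hypothetical presence records (x, y in km), not data from the study.
presences = [(0.0, 0.0), (12.0, 5.0), (30.0, -8.0)]

# One background set per buffer radius; the radii are illustrative.
backgrounds = {r: sample_background(presences, r, 200) for r in (10, 50, 100, 500)}
```

Each entry of `backgrounds` could then be passed, together with the presence records, to the modeling step, so that the only thing changing between runs is the extent of the background.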
Methods
The AWT of northeastern Australia is an ideal region in which to address our objectives (Fig. 1). The region contains a diverse, well-studied vertebrate fauna and encompasses strong environmental gradients. The AWT supports 1.8 million ha of rainforest-dominated vegetation that was once widespread in Miocene Australia but now forms a distinct and isolated environmental domain of high diversity surrounded by drier and warmer environments (Nix, 1991; Moritz, 2005).
We modeled vertebrate species
Results
The results are summarized in Fig. 2. Model predictions and performance changed in at least four important ways as the area of the background from which pseudo-absences were drawn increased. First, the flexible area AUC increased. Specifically, the AUC increased rapidly as background size expanded from 10 to 100 km. Subsequent expansions resulted in only minor increases in AUC (i.e., at 100 km all models already had an AUC > 0.93 and by 500 km AUC > 0.99). Second, in 50% of the species, the fixed area
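As a rough illustration of why AUC can rise as the background expands, the sketch below computes a generic rank-based AUC: the probability that a randomly chosen presence location receives a higher model score than a randomly chosen background location, with ties counted as one half. It does not reproduce the flexible area or fixed area AUC variants, which are not defined in this excerpt, and all scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of (presence, background) pairs in which the
    presence scores higher; tied pairs count as half a win."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model scores, not results from the study.
presence = [0.9, 0.8, 0.6]
near_background = [0.7, 0.3, 0.2, 0.1]   # sampled close to the presences
far_background = [0.05, 0.02, 0.01, 0.0]  # sampled from distant, dissimilar areas

auc_near = auc(presence, near_background)
auc_wide = auc(presence, near_background + far_background)
```

In this toy case `auc_near` is 11/12 and `auc_wide` is 23/24: adding distant background points that are easy to separate from the presences raises AUC even though the model's ability to discriminate nearby locations is unchanged.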
Discussion
Here we show that the size of background from which pseudo-absences are drawn has important ramifications for predictions and performance of SDMs. We have focused on predictions of current distributions but this issue will likely be even more problematic for models that are projected onto different geographic space or under different climate scenarios. For example, inappropriate background selection may unduly affect studies of invasive species (e.g., Mau-Crimmins et al., 2006, Steiner et al.,
Acknowledgements
This research was supported by the James Cook University Research Advancement Program, the Marine and Tropical Sciences Research Facility, Earthwatch Institute and the Queensland Smart State Program.
References (23)
- et al. Ensemble forecasting of species distributions. Trends Ecol. Evol. (2007)
- et al. Assessing the effects of pseudo-absences on predictive distribution model performance. Ecol. Model. (2008)
- et al. Classification of aquatic bioregions through the use of distributional modelling of freshwater fish. Ecol. Model. (2008)
- et al. The utility of artificial neural networks for modelling the distribution of vegetation in past, present and future climates. Ecol. Model. (2001)
- et al. Assessing habitat-suitability models with a virtual species. Ecol. Model. (2001)
- et al. Can the invaded range of a species be predicted sufficiently using only native-range data? Lehmann lovegrass (Eragrostis lehmanniana) in the southwestern United States. Ecol. Model. (2006)
- et al. Maximum entropy modeling of species geographic distributions. Ecol. Model. (2006)
- et al. Predicting species spatial distributions using presence-only data: a case study of native New Zealand ferns. Ecol. Model. (2002)
- et al. Novel methods improve prediction of species’ distributions from occurrence data. Ecography (2006)
- et al. Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines. Divers. Distrib. (2007)
- Redefining biodiversity conservation priorities. Conserv. Biol.