RT Journal Article
SR Electronic
T1 Evaluating Bayesian spatial methods for modelling species distributions with clumped and restricted occurrence data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 105742
DO 10.1101/105742
A1 David W. Redding
A1 Tim C. D. Lucas
A1 Tim Blackburn
A1 Kate E. Jones
YR 2017
UL http://biorxiv.org/content/early/2017/02/06/105742.abstract
AB 1. Statistical approaches for inferring the spatial distribution of taxa (Species Distribution Models, SDMs) commonly rely on available occurrence data, which are often non-randomly distributed and geographically restricted. Although available SDM methods address some of these problems, the errors could be more directly and accurately modelled using a spatially-explicit approach. Software for implementing spatial autocorrelation terms in SDMs is now widely available, but whether such approaches improve on existing methodologies is unknown. 2. Here, within a simulated environment using 1000 generated species’ ranges, we compared the performance of two commonly used non-spatial SDM methods (Maximum Entropy Modelling, MAXENT, and Boosted Regression Trees, BRT) to a spatially-explicit Bayesian SDM method (Integrated Nested Laplace Approximation, INLA), when the underlying data exhibit varying combinations of clumping and geographic restriction. Finally, we tested whether any recommended methodological settings for all methods were further impacted by spatially non-random patterns in these data. 3. Spatially-explicit INLA was the most consistently accurate method, being the most accurate or tied for most accurate in 5 out of 8 data sampling scenarios. Within high-coverage sample datasets, all methods performed fairly similarly, but when sampling points were randomly spread, BRT was 1-3% more accurate than the other methods, and when samples were clumped, spatial INLA had a 4-8% better AUC score.
Alternatively, when sampling points were restricted to a small section of the true range, all methods were on average 10-12% less accurate, with higher variation among the methods. None of the recommended settings for the different methods were found to be sensitive to clumping or restriction of data, except the complexity of the INLA spatial term. 4. INLA-based modelling approaches can be successfully used to account for spatial autocorrelation in an SDM context and, by taking account of random effects, produce outputs that can better elucidate the role of covariates in predicting species occurrence. Given that it is often unclear what the drivers are behind data clumping in an empirical occurrence dataset, or indeed how geographically restricted these data are, spatially-explicit INLA-based SDMs may be the better choice when modelling the spatial distribution of target species.