Abstract
We are in the midst of a sixth mass extinction, but little is known about global patterns of biodiversity when taxonomic, phylogenetic and functional information are considered together. Here, we present the first integrated analysis of global variation in the taxonomic, functional and phylogenetic diversity of more than 17,000 tetrapod species (terrestrial mammals, amphibians, reptiles and birds). We used a new metric (z-Diversity) that synthesizes taxonomic, functional and phylogenetic information across different sets of species to provide a comprehensive estimate of biodiversity. Our analyses reveal that hotspots of tetrapod diversity are clustered in specific regions of the world, such as central Africa and the Indian peninsula, and that climate stability and energy availability are of overarching importance in explaining tetrapod spatial patterns. Future research might take advantage of these methods to perform an informed prioritization of protected areas.
Introduction
Humans drive patterns of biodiversity in the Anthropocene to the point that the world is facing the sixth mass extinction1, in which nearly 1 million species are estimated to be threatened with extinction, with severe consequences for ecosystem health and human wellbeing2,3. Biodiversity is a multidimensional concept4, and species loss not only entails a reduction in species richness but potentially also affects the evolutionary history (phylogenetic diversity – PD5) and the functional structure (functional diversity – FD6) of natural communities7,8. While PD can provide information on how past dispersal events may have shaped current species assemblages9, FD depicts ecosystem functions and associated services that simple patterns of species richness and turnover might not completely disclose10. In particular, the regional loss of PD or FD may lead local assemblages to lose evolutionary history or important functions, likely jeopardizing crucial ecosystem processes and potentially leading to greater homogenization11. In recent years, increased data availability (e.g. species spatial distribution, functional or genetic data) has improved our understanding of global diversity patterns across the tree of life9,12–14, including the development of conservation targets based on the assumption that conserving species with unique evolutionary history indirectly preserves other diversity facets as well (e.g. EDGE project15). Nevertheless, recent findings suggest that focusing on PD alone might not ensure the conservation of all facets of diversity16, although the strength of the relationship between PD and FD is still debated in the literature17,18. Given these premises, the inclusion of different diversity facets beyond taxonomic diversity is essential for a thorough understanding of the processes shaping life on Earth19,20, and ideally to reevaluate global priority areas for biodiversity conservation21–24.
Despite the pivotal role of FD and PD in ecosystem functioning and stability10,25,26, little is known about how biodiversity conservation could benefit from an integration of its different diversity facets21,27.
Here, we provide the first integrated analysis of global variation in the taxonomic, functional and phylogenetic diversity of extant tetrapods (terrestrial mammals, amphibians, reptiles and birds) by presenting a new metric (z-Diversity) that integrates species richness, PD and FD in a single measure and can be combined across different groups of species to provide a comprehensive estimate of biodiversity. We focused on tetrapods, which represent half of the vertebrate species living on our planet and are among the best-described taxa in terms of spatial distribution, conservation status and functional traits. There is continuing evidence of ongoing global decline for all these groups28–32, to the point that approximately one third of tetrapod species are threatened with extinction, ranging from 14% of birds to 40% of amphibians33. Tetrapods have important ecological roles within natural ecosystems34,35; thus, preserving higher tetrapod diversity should buffer the effects of accelerated global change36,37, promoting ecosystem stability38.
Many studies have tried to disentangle tetrapod spatial patterns, mainly focusing on mammals and birds (21,35,39,40, but see 19,41) and on their taxonomic patterns42–44, whereas little attention has been paid to the spatial patterns of the other diversity facets (i.e. PD and FD)9,39,40. Several hypotheses (reviewed in Fine45) have been postulated to explain broad-scale patterns of species diversity, usually relying solely on species richness, with no general consensus so far. These relate diversity to variation in water-energy dynamics46,47 or link it with macroevolutionary aspects48, historical factors44 and species coexistence49. Nevertheless, there are no well-established mechanistic hypotheses about the drivers of broad-scale patterns of PD and FD, or about whether they respond to factors different from those described for species richness. Given these premises, an integrated metric such as z-Diversity might help to identify global priority areas whose protection would maximize tetrapod diversity. In addition, testing the relationship between z-Diversity and variables related to past climate change, biogeographic history, energy availability and land-use legacies might shed light on their relative influence in shaping global tetrapod spatial patterns. Our analyses reveal that hotspots of tetrapod diversity are clustered in specific regions of the world, such as central Africa and the Indian peninsula. Finally, climate stability and energy availability proved to be the best predictors of spatial variation across all tetrapod groups.
Results
Spatial mismatch between diversity facets
For our analysis, we collated a large database of 17,341 tetrapod species encompassing 3,912 terrestrial mammals, 3,239 amphibians, 3,338 reptiles and 6,852 birds for which accurate range estimates were available based on International Union for Conservation of Nature (IUCN) data50. Ranges were converted to hexagonal equal-area grid cells (cell area 23,322 km2), and we compiled the species list in each cell for each taxonomic group. We then selected a set of functional traits characterizing tetrapod species from public databases51,52, along with their phylogenies20,40,53,54. Because some trait values were missing, for each group we performed a phylogenetically informed trait imputation procedure followed by a sensitivity analysis to evaluate imputation performance following Carmona et al.8, once using phylogenetic information and once using functional traits only. Briefly, for each taxonomic group we first computed the functional space using principal component analysis (PCA); we then artificially removed trait values in a reduced set of species, which were later imputed using the complete database. The ability to retrieve species positions in the functional space was used as an indicator of the performance of the imputation process. Our simulations showed that the imputation procedure performed well in retrieving the positions of species in the functional space for all groups, and that using phylogenetic information halved the errors on average relative to imputation based on trait information only (Supplementary Figure 1; see Methods for more details).
For each grid cell and each group, we then estimated species richness (SR), Faith’s PD5 and FD, which was expressed as functional richness (FRic). Since both PD and FRic depend on species richness55, we performed null model simulations to obtain standardized effect sizes (SES), computed as SES = (Metric_obs − mean(Metric_null)) / SD_null. SES indicates how far a given metric deviates from the simulated values, expressed in SD units. The three diversity metrics thus obtained (SR, sesPD, sesFRic) were then centered and scaled to unit variance (zSR, zPD, zFRic) and averaged into a single indicator of diversity (z-Diversity). The arithmetic mean of the z-Diversities of the four taxonomic groups provided a new overall metric that synthesizes the total diversity (taxonomic, functional and phylogenetic aspects) contained in a set of species.
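The chain from observed metrics to z-Diversity can be illustrated with a minimal Python sketch. It is not the R code used in the study: the function names and the input values are hypothetical, and the use of the sample standard deviation for both the null distribution and the scaling step is an assumption.

```python
from statistics import mean, stdev

def ses(observed, null_values):
    """Standardized effect size: (observed - mean(null)) / SD(null)."""
    return (observed - mean(null_values)) / stdev(null_values)

def zscore(values):
    """Center to zero mean and scale to unit variance (sample SD, an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def z_diversity(sr, ses_pd, ses_fric):
    """Average the three z-scored facets cell by cell."""
    z_sr, z_pd, z_fric = zscore(sr), zscore(ses_pd), zscore(ses_fric)
    return [mean(cell) for cell in zip(z_sr, z_pd, z_fric)]

# Hypothetical values for four grid cells of one taxonomic group.
sr = [10, 25, 40, 55]
ses_pd = [-1.2, 0.3, 0.8, -0.4]
ses_fric = [0.5, -0.7, 1.9, 0.1]
zdiv = z_diversity(sr, ses_pd, ses_fric)
```

Because each facet is z-scored before averaging, no single facet dominates the index merely through its units or range.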
Overall, we observed congruent spatial patterns in species richness and sesPD for all taxonomic groups. In contrast, sesFRic showed some striking differences, especially between mammals and reptiles (see for instance central Africa and the Indian peninsula in Supplementary Figure 2, where higher sesFRic was associated with lower sesPD). Moreover, negative correlations between species richness and sesPD were detected across all taxonomic groups, while there was a slight positive correlation between sesPD and sesFRic (Supplementary Table 1). Tetrapod z-Diversity was strongly correlated with zFRic (Pearson’s correlation r = 0.76, p < 0.001; all correlations were spatially corrected) and to a lesser extent with zSR (r = 0.34, p < 0.01), whereas the correlation with zPD was not significant (r = 0.17, p > 0.05). z-Diversity was also strongly correlated with zFRic within every group; in addition, we observed a significant correlation with zSR for mammals and with zPD for birds (Supplementary Table 2). Notably, the Afrotropical and Indomalayan realms showed overall even dispersion in both sesPD and sesFRic relative to the other realms, whereas the Neotropical realm was dominated by both phylogenetic and functional clustering across all groups (Supplementary Figure 3).
Global priority areas
Global tetrapod z-Diversity is highest in Africa and South-East Asia, followed by Central and South America, Japan and the Mediterranean basin (Figure 1A). Looking at the single groups (Figures 1B,C,D,E), mammal z-Diversity was higher in Africa and the Indian peninsula, whereas amphibians showed higher z-Diversity especially in the Amazon basin. Reptiles displayed the highest variation in Africa and South-East Asia, while bird assemblages showed higher z-Diversity in the southern hemisphere, with peaks especially in Africa and Oceania. Interestingly, hotspots of tetrapod z-Diversity (the richest 5% of grid cells) were largely clustered in the African continent, with a few spots in the Indian peninsula and South America (tropical Andes, northeastern coast; Figure 2A). These patterns were mirrored by all the considered groups (Figure 2B,C,D,E), except for amphibians, whose highest z-Diversity was largely clustered in South America.
Global patterns of z-Diversity, expressed by averaging the z-scores of the single diversity facets (zSR, zPD, zFRic) within each taxonomic group. These were then averaged across groups to obtain tetrapod diversity. (A) Tetrapoda, (B) Amphibia, (C) Aves, (D) Mammalia, (E) Reptilia. Silhouettes were retrieved from PhyloPic (www.phylopic.org).
Global hotspots of z-Diversity. Darkest tones denote 10% of the richest grid cells while darker tones 5% and 2.5%, respectively. (A) Tetrapoda, (B) Amphibia, (C) Aves, (D) Mammalia, (E) Reptilia. Silhouettes were retrieved from PhyloPic (www.phylopic.org).
Climate stability and energy availability shape tetrapod diversity
The global patterns of tetrapod z-Diversity were highly predictable from the set of variables that we chose (R2 = 0.85 ± 0.04, Root Mean Square Error, RMSE = 0.24 ± 0.06; average ± SD). Our model showed that the global pattern of z-Diversity was mainly driven by energy availability and climate variation since the Late Quaternary, rather than by current or past anthropogenic factors (Figure 3, Table 1). Within taxonomic groups (Supplementary Figures 4–7), results were relatively concordant; only amphibians departed from this general pattern, probably due to their higher dependency on water. In addition, whereas the diversity of mammals, birds and reptiles increased with potential evapotranspiration (PET), the diversity of amphibians showed a negative relationship with PET (Figure 3). Birds, in turn, were primarily driven by a positive relationship with PET, while all other variables showed a comparable influence in the model. In terms of model performance, RMSE within individual groups was higher than that of the tetrapod model (≈ 0.41), coupled with a small reduction in R2 (≈ 0.78).
Variable importance ranked by the RMSE loss after permutations (left panel) and marginal effects of the different predictors (right panel) of the random forest model using tetrapod z-Diversity as response variable. ClimVar and ClimVel represented the average rate of change during the time-series (expressed in °C/century and m/yr, respectively) since Last Glacial Maximum. BiomeShan described the variation in biome patterns over the last 140 ka expressed using the Shannon index. SoilRH, PET, and TreeCovFract2019 represented soil humidity, Potential Evapotranspiration and forest cover updated to 2019, respectively. LULC expresses the fraction of grid cell under anthropogenic land use since 8000 BC, while HFP2009 is the 2009 Human Footprint index.
Model output showing the variable importance expressed as Root Mean Square Error (RMSE) loss (average ± SD) for each variable, considering all tetrapods pooled and each taxonomic group independently. ClimVar and ClimVel represented the average rate of change during the time-series (expressed in °C/century and m/yr, respectively) since Last Glacial Maximum. BiomeShan described the variation in biome patterns over the last 140 ka expressed using the Shannon index. SoilRH, PET, and TreeCovFract2019 represented soil humidity, Potential Evapotranspiration and forest cover updated to 2019, respectively. LULC expresses the fraction of grid cell under anthropogenic land use since 8000 BC, while HFP2009 is the 2009 Human Footprint index. RMSE and R2 were obtained using spatial cross-validation. N represents the number of grid cells used to train the models. Please note that z-Diversity was computed only in the cells where all three metrics (zSR, zPD, zFRic) were available.
Discussion
We collated, for the first time at a global scale, the taxonomic, phylogenetic and functional characteristics of all groups of terrestrial vertebrates and summarized them in a single index. Accounting for all three diversity facets across different taxonomic groups revealed conservation priority areas that are usually overlooked in global conservation schemes that use less comprehensive information41. These new hotspots of diversity include arid and semi-arid environments, especially in the Mediterranean basin, central Asia, the southern coast of Australia and South America (e.g. the Brazilian caatinga). Interestingly, despite hosting relatively fewer tetrapod species than the Neotropics, the Afrotropical and Indomalayan realms stand out as hotspots of high diversity (Supplementary Figure 3). This result agrees with previous studies on individual taxonomic groups (e.g. amphibians20, mammals39,56, reptiles41), but here we present the first comprehensive assessment showing this trend across all terrestrial vertebrates and considering multiple facets of diversity.
Interestingly, the pattern of z-Diversity is primarily driven by functional diversity, as suggested by the high correlation between z-Diversity and zFRic (r = 0.76), highlighting the importance of considering functional information to provide a reliable evaluation of species diversity patterns. The Afrotropics showed the highest values of sesFRic with respect to random expectations, especially for mammals and reptiles (Supplementary Figure 3). This pattern might be explained by the high intrinsic megafaunal diversity reported for this continent56,57. Africa was probably the first continent to experience moderate megafaunal loss (e.g. carnivores and proboscideans), already in the Early Pleistocene (~ 1 Ma), likely due to the appearance of Homo erectus58; later losses were somewhat dampened thanks to coevolution with Homo sapiens. In contrast, outside the area of origin of H. sapiens, subsequent extinction waves broadly coincided with the expansion of humans across the globe59. In addition, the Great American Biotic Interchange (GABI), the interchange between North and South American faunas associated with the formation of the Isthmus of Panama, seems to have enhanced extinction and the consequent reduction of diversity in South American mammals60.
We also detected dominant phylogenetic clustering, suggesting that environmental filtering and inter-clade competition have shaped local assemblages61. Indeed, species from clades with rapid speciation rates, such as primates in Africa and ovenbirds (Furnariidae) in Central and South America, or other closely related species tended to co-occur more frequently at smaller scales as a result of local radiation and dispersal limitation62. Nonetheless, multiple processes can interact in defining local assemblages in space and time, and more studies mechanistically linking trait evolution and biogeographic history can help in this sense (e.g. process-based models63). Moreover, the relatively low correlation between sesPD and sesFRic implies a spatial mismatch in global diversity patterns, also suggesting that phylogenetic diversity captures only a portion of functional diversity, in agreement with recent works16,64.
Energy availability and climate stability were confirmed to be of overarching importance in explaining tetrapod diversity. Water–energy dynamics are important in describing species richness patterns for plants65 and animals46,66, but their relationship with the other diversity facets has been poorly investigated at a global scale (but see ref. 67). Generally, higher energy (i.e., higher PET) is linked to higher resource availability, which in turn promotes greater species packing (i.e., more species coexisting with narrower niches68) and larger population sizes, which may lessen extinction rates47. When groups were considered individually, only amphibians departed from this general pattern, due to their higher dependency on water. The high importance of soil humidity for amphibians (Supplementary Figure 5) is not surprising, since it helps them keep their hydric state balanced69. The negative relationship with PET, in contrast to the positive relationship found for all other groups, could also be explained by a property of this metric, which tends to increase towards dry environments and does not reflect water balance as accurately as Actual Evapotranspiration (AET)70. Model outputs also indicated that climate stability promotes higher diversity, probably through the combination of lower extinction rates and high levels of speciation71,72, occurring also at a larger spatial scale. There is compelling evidence of higher extinction rates towards the poles for different taxonomic groups67,73, further corroborating the idea that climate stability and evolutionary processes influence the latitudinal gradient of species richness42 through region-specific accumulation of diversity74, which is consistent with the climate stability hypothesis (CSH). Accordingly, species inhabiting more stable regions tend to display restricted thermal preferences and higher specialization48,75,76, thanks also to the higher frequency of speciation events77 driven by the intimate link between temperature and ecological and evolutionary rates78.
In contrast, extinctions might be higher in climatically unstable regions79, being triggered by variations in Earth’s orbit that cause recurrent climatic shifts across the globe80. For instance, higher extinction rates occurred during cold periods, especially for taxonomic groups with poor dispersal abilities81 (e.g. reptiles). To the best of our knowledge, this is the first evidence of how climate stability influences broad-scale patterns of species diversity when all three diversity facets are considered. Lastly, we found no consistent effect of past and recent land use and land cover (LULC) changes, similar to what has been observed for genetic diversity12, even though projected future land-use changes are expected to strongly affect Earth’s biodiversity82,83. A possible explanation for this lack of signal might lie in the relatively coarse scale used in this study, along with the lack of finer spatio-temporal data able to depict these patterns. Even though the data for some taxa (e.g. small-ranged species) or regions (e.g. tropics) might carry spatial biases84, and despite the potential omission of important evolutionary or ecological variables (e.g. speciation and dispersal rates), our models indicated that the selected variables are able to describe most of the global variation in tetrapod diversity.
Our novel approach makes it possible to consider all components of biodiversity and average them across taxonomic groups. Future research can take advantage of these methods to perform an informed prioritization of protected areas23,24, which could support the achievement of the Aichi Biodiversity Targets, for which progress on some indicators is still not satisfactory2. More importantly, the cells hosting higher tetrapod diversity are often located in regions under high human pressure (e.g. Southeast Asia, Mediterranean coast)19,85, reinforcing the need for transnational cooperation, especially in countries with lower GDP, in order to also preserve the “option-value” of natural ecosystems.
Methods
Species spatial distribution and environmental data
We obtained expert-verified range maps of 23,848 tetrapod species from the International Union for Conservation of Nature (IUCN)50. Even though these maps might underestimate the complete extent of occurrence of the species, especially in poorly surveyed regions84, they currently represent the best information available. We then excluded marine mammals and converted the range maps to hexagonal equal-area grid cells with a cell area of 23,322 km2 using the ‘dggridR’86 R package. We chose this resolution because it is close to the finest resolution justifiable for global data without incurring false presences87. Species names were standardized against the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy88 using the R package ‘taxize’89.
For each grid cell, we computed several environmental predictors depicting spatiotemporal effects of past climate change/biogeography history, current ecosystem features, and anthropogenic disturbances. Specifically, we gathered the following environmental data: climate stability since Last Glacial Maximum (ca. 20 kya) was retrieved using two complementary indices reflecting the median rate of change during the time-series expressed in °C/century (climate variation90, ClimVar) and m/yr (climate velocity43, ClimVel). Biome variation (BiomeShan)91, expressed using the Shannon index, described the variation in biome patterns over the last 140 ka. Gridded databases of Soil humidity (SoilRH) and Potential Evapotranspiration (PET) were obtained from TerraClimate92, while forest cover (TreeCovFract2019) updated to 2019 was retrieved from Copernicus Global Land Cover products93. Land cover land use (LULC) legacy effects were assessed by means of the data of Kaplan et al.94, which reported the fraction of grid cell under anthropogenic land use since 8000 BC, while the 2009 Human Footprint index (HFP2009)95 was used to depict the spatial distribution of the current human pressure across the globe. HFP2009 reports for each grid cell a measure of the intensity of eight metrics of human pressure (i.e., human population density, roads, railways, navigable waterways, built environments, crop land, pasture land, night-time lights), weighted based on the relative human pressure on that cell 95.
Functional traits
Functional trait data for the different groups were collected from public databases. See ref. 8 for a detailed description of the traits used in this study.
Mammals, reptiles and birds
Data were retrieved from the Amniote database51, which includes traits for 4953 species of mammals, 6567 species of reptiles, and 9802 species of birds. Specifically, this database contains information on 29 life history traits, from which we selected a subset of traits with information available for at least 1000 species (see Table S1 in Carmona et al.8 for more details about traits and their completeness in each group). For mammals, eight traits were chosen: longevity (long, years), number of litters per year (ly), adult body mass (bm, g), litter size (ls, number of offspring), weaning length (wea, days), gestation length (gest, days), time to reach female maturity (fmat, days), and snout–vent length (svl, cm). For birds, we selected the following traits: number of clutches per year, adult body mass (bm, g), incubation time (gest, days), clutch size (ls, number of eggs), longevity (long, years), egg mass (em, g), snout-vent length (svl, cm), and fledging age (fa, days). For reptiles, six traits were selected: number of clutches per year, longevity (long, years), adult body mass (bm, g), clutch size (ls, number of eggs), incubation time (inc, days), and snout-vent length (svl, cm).
Amphibians
Functional trait data for amphibians were retrieved from the AmphiBIO database52. We selected four traits that capture information similar to that collected for the other three groups (i.e. traits related to body size, pace of life and reproductive strategies): age at maturity (am, years); body size (bs; measured in Anura as snout-vent length – SVL – and in Gymnophiona and Caudata as total length in mm); maximum litter size (ls, number of individuals); and offspring size (os, mm).
Phylogenies
Phylogenies for each group were downloaded from published papers20,40,53,54. Species absent from the phylogeny were manually added to the root of their genus using the R package ‘phytools’96. Since for mammals and birds multiple phylogenetic trees were available, for these groups we computed a maximum clade credibility tree (MCC) using the ‘phangorn’97 R package. To assess the reliability of the information contained in the MCC, we performed a simulation where we correlated PD obtained from this MCC with those obtained with 100 phylogenies randomly selected from the original posterior distribution. This test proved that using the MCC tree is unlikely to affect the computation of PD (Supplementary Figure 8).
Trait imputation and sensitivity analysis
Since there were gaps in the functional trait data, we imputed missing traits for each group using the ‘missForest’98 R package. This procedure relies on the random forest algorithm to impute trait data, also taking advantage of the phylogenetic relationships among species, following the procedure described in Penone et al.99. To further validate this procedure, we performed a sensitivity analysis similar to the one performed in ref. 8, but repeating the imputation process both with and without phylogenetic information… Our simulations showed that the imputation procedure performed well in retrieving the positions of species in the functional space for all groups, and that using phylogenetic information halved the errors on average relative to imputation based on trait information only (Supplementary Figure 1).
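The error measure used in the sensitivity analysis is described in the Supplementary Information as a normalized root mean square error (NRMSE) between original and post-imputation positions on each PC axis, relative to the range of values on that axis. A minimal Python sketch of that measure, with hypothetical function and variable names and made-up coordinates, could look like this:

```python
from math import sqrt

def nrmse(original, imputed, axis_values):
    """RMSE between original and imputed positions, divided by the axis range."""
    squared = [(o - i) ** 2 for o, i in zip(original, imputed)]
    rmse = sqrt(sum(squared) / len(squared))
    return rmse / (max(axis_values) - min(axis_values))

# Hypothetical PC1 coordinates of all species (used only for the axis range).
pc1_all = [-2.0, -1.0, 0.0, 1.0, 2.0]
# PC1 positions of the species with artificially removed traits.
pc1_original = [-1.0, 1.0]
pc1_imputed = [-0.5, 1.5]
error = nrmse(pc1_original, pc1_imputed, pc1_all)  # 0.5 / 4 = 0.125
```

Dividing by the axis range makes errors comparable across PC axes and taxonomic groups whose functional spaces differ in extent.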
Calculation of diversity metrics
Extinct species and species totally lacking evolutionary, functional trait or spatial data were removed from the database, leaving 17,341 species for subsequent analysis (N = 3,912 for mammals, N = 3,239 for amphibians, N = 3,338 for reptiles and N = 6,852 for birds; see Supplementary Table 3). To map global patterns of tetrapod diversity, we first computed diversity metrics for each taxonomic group independently. Species richness was estimated as the number of species in each cell; PD represented the sum of branch lengths between the root node and the tips for the subtree comprising all species in the grid cell, and was computed using the ‘caper’100 R package. FD was estimated as described in ref. 8: we first built a two-dimensional functional space based on a Principal Component Analysis of the log-transformed and scaled trait values; then, by means of the TPD framework101 and the ‘TPD’ and ‘ks’ R packages102,103, we estimated cell-based functional richness (FRic, i.e. the amount of the functional space occupied by an assemblage101). Since both PD and FRic are strongly dependent on species richness, we performed null model simulations to break this relationship55 and to compute standardized effect sizes (SES) as SES = (Metric_obs − mean(Metric_null)) / SD_null. To obtain the null distribution, we randomized the community composition of each cell 1000 times, preserving marginal totals, using the quasiswap algorithm in the R package ‘vegan’104. After computing the SES, we centered and scaled to unit variance the three diversity indices (i.e., species richness, sesPD, sesFRic) for each group in order to obtain comparable ranges of variation, and then averaged them to calculate within-group z-Diversity only for the cells where all three metrics were available. Finally, tetrapod z-Diversity was obtained as the arithmetic mean of within-group z-Diversity for each cell where all within-group z-Diversity values were available.
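As an illustration of the PD definition given above (the sum of branch lengths connecting the species of a cell to the root), the following Python sketch computes it on a toy tree. The tree, its branch lengths and the function name are hypothetical; the study itself used the ‘caper’ R package.

```python
def faith_pd(tree, species):
    """Sum the branch lengths on the paths from the root to the given tips,
    counting each branch only once."""
    visited = set()
    total = 0.0
    for tip in species:
        node = tip
        # Walk towards the root until reaching it or an already counted branch.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

# Hypothetical tree ((A:1,B:1)n1:2,C:3)root, stored as node -> (parent, branch length).
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 3.0),
}
pd_ab = faith_pd(tree, ["A", "B"])   # 1 + 1 + 2 = 4.0
pd_ac = faith_pd(tree, ["A", "C"])   # 1 + 2 + 3 = 6.0
```

Two sister species (A and B) share most of their path to the root and therefore add less PD than two distantly related species (A and C), which is why PD tracks, but is not identical to, species richness.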
We further computed the Pearson’s correlation (r) among all diversity facets by taking into account their spatial structure since all these metrics were measured on the same cells. Specifically, we used a modified t-test of spatial association105 implemented in the SpatialPack 106 R package to test the spatial association between z-Diversity and the three diversity metrics underlying it (zSR, zPD, zFRic) as well as the correlation among their original values (species richness, sesPD, sesFRic).
Drivers of diversity
Random forest (RF) is a machine learning algorithm consisting of an ensemble of classification or regression trees107. RFs are well suited for modeling large-scale patterns, since they can deal with large amounts of data, are robust to overfitting and multicollinearity, and perform well in the presence of complex interactions or non-linear relationships108. RFs are used effectively in different research fields such as climate modelling109, species conservation110 and landscape genetics111, among others. We built five models of z-Diversity as a function of environmental variables (one for tetrapods plus one for each individual taxonomic group) using the framework provided in the ‘mlr3’112 and ‘mlr3spatiotempcv’113 R packages. We started building trees using the following parametrization: ntree = 500, mtry = 1, min.node.size = 1, sample.fraction = 0.6; these values were later tuned using the ‘paradox’114 R package. Variable importance was determined by measuring the mean change in a loss function (i.e., Root Mean Square Error, RMSE) after variable permutations (N = 500) using the ‘DALEX’ R package115. This method assumes that if a variable is relevant for a given model, model performance will worsen after its values are randomly permuted (see ref. 116 for more technical details). In other words, this method assesses variable importance as the loss in explanatory ability of the model when that variable is randomized. We also displayed the marginal effects of the different predictors using partial dependence plots computed with the ‘iml’117 R package.
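The logic of permutation-based variable importance can be sketched in a few lines of Python. This is an illustration of the general technique, not of the ‘DALEX’ implementation: the `predict` function is a hypothetical stand-in for a fitted random forest, and the data are made up.

```python
import random
from math import sqrt

def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, rows, y, n_perm=50, seed=0):
    """Mean increase in RMSE after shuffling each predictor column in turn."""
    rng = random.Random(seed)
    base = rmse(y, [predict(r) for r in rows])
    importance = {}
    for name in rows[0]:
        losses = []
        for _ in range(n_perm):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Replace only the permuted column; all other predictors stay intact.
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            losses.append(rmse(y, [predict(r) for r in permuted]) - base)
        importance[name] = sum(losses) / n_perm
    return importance

# Hypothetical stand-in for a fitted model: the response depends on PET only.
def predict(row):
    return 2.0 * row["PET"]

rows = [{"PET": float(i), "HFP2009": float((i * 7) % 10)} for i in range(10)]
y = [predict(r) for r in rows]
imp = permutation_importance(predict, rows, y)
```

In this toy case, shuffling `HFP2009` leaves the RMSE unchanged because the stand-in model ignores it, whereas shuffling `PET` increases the RMSE; ranking predictors by that increase gives the importance ordering.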
Spatial cross validation
Failing to account for spatial autocorrelation in ecology might lead to biased conclusions118,119 or to an overoptimistic evaluation of model predictive power120,121. For this reason, we performed an internal spatial cross-validation (spCV), splitting the data into training (70%) and validation (30%) sets. We created five spatially disjoint subsets (i.e., partitions), introducing a spatial distance between training and validation sets so that these sets are more distant than they would be under random partitioning122. To perform the spCV, we used a nested resampling approach as described in ref. 123, where outer resampling evaluated model performance while inner resampling tuned model hyperparameters for each outer training set. Because nested resampling is computationally expensive, we selected 5 folds with 5 repetitions each in outer resampling, to reduce the variance introduced by partitioning, and 5 folds in inner resampling coupled with 50 evaluations of model settings.
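The idea of spatially disjoint partitions can be illustrated with a deliberately simple Python sketch that splits cells into contiguous longitude bands. This banding rule is an assumption chosen for brevity, not the partitioning method of ‘mlr3spatiotempcv’; the coordinates and function names are hypothetical.

```python
def spatial_folds(coords, k=5):
    """Assign each cell to one of k contiguous longitude bands of near-equal size."""
    order = sorted(range(len(coords)), key=lambda i: coords[i][0])
    folds = [0] * len(coords)
    for rank, idx in enumerate(order):
        folds[idx] = rank * k // len(coords)
    return folds

def train_test_splits(folds, k=5):
    """Yield (train_indices, test_indices), holding out each band once."""
    for f in range(k):
        test = [i for i, g in enumerate(folds) if g == f]
        train = [i for i, g in enumerate(folds) if g != f]
        yield train, test

# Hypothetical cell centroids as (longitude, latitude).
coords = [(lon, lat) for lon in range(-180, 180, 36) for lat in (-30, 0, 30)]
folds = spatial_folds(coords)
splits = list(train_test_splits(folds))
```

Holding out whole bands keeps neighbouring, spatially autocorrelated cells on the same side of each split, so the validation error is less optimistic than under random partitioning. In a nested design, the same splitting would be applied again inside each training set to tune hyperparameters.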
Author contributions
E.T. and C.P.C. co-led and designed the study. E.T., A.T. and C.P.C. extracted and prepared the data; E.T. performed the statistical analyses. M.P. and D.N.B. contributed to the interpretation of results. E.T. led the writing of the manuscript with inputs from all the co-authors. All authors approved the submitted version.
Supplementary Information
Sensitivity analysis on the trait imputation procedure for each taxonomic group. We simulated missing traits (100 repetitions) starting from a subset of species with complete trait data. We randomly selected 10% of these species, assigning to each the pattern of missing values of a random species from the subset of species with missing trait values. We then combined the three datasets (90% of species with complete traits, 10% with simulated NA and the remaining species with incomplete trait information). On the combined data we performed two imputation processes: one based solely on the variance-covariance structure of functional traits and another based on phylogenetic information, as described in the methods in the main text. For each dataset obtained, we then computed a functional space using a PCA, onto which we projected the position of all species. For the species with artificial NA only, we evaluated the normalized root mean square error (NRMSE) between the original position in the functional space and the position calculated after trait imputation, expressed relative to the range of values on the corresponding PC axis.
Global Patterns of species richness (upper panels), sesPD (central panels) and sesFRic (lower panels). A-E-I) Amphibians, B-F-J) Birds, C-G-K) Mammals, D-H-L) Reptiles. Please note that species richness is expressed on logarithmic scale and the color scale is centered on the median value.
Boxplots showing the distributions of species richness, sesPD and sesFRic for each realm. Please note that sesPD and sesFRic represent standardized effect sizes of the original metrics.
Variable importance ranked by the RMSE loss after permutations (left panel) and marginal effects of the different predictors (right panel) of the random forest model using mammal zDiversity as response variable. ClimVar and ClimVel represented the average rate of change during the time-series (expressed in °C/century and m/yr, respectively) since Last Glacial Maximum. BiomeShan described the variation in biome patterns over the last 140 ka expressed using the Shannon index. SoilRH, PET, and TreeCovFract2019 represented soil humidity, Potential Evapotranspiration and forest cover updated to 2019, respectively. LULC expresses the fraction of grid cell under anthropogenic land use since 8000 BC, while HFP2009 is the 2009 Human Footprint index.
Variable importance ranked by the RMSE loss after permutations (left panel) and marginal effects of the different predictors (right panel) of the random forest model using amphibians zDiversity as response variable. ClimVar and ClimVel represented the average rate of change during the time-series (expressed in °C/century and m/yr, respectively) since Last Glacial Maximum. BiomeShan described the variation in biome patterns over the last 140 ka expressed using the Shannon index. SoilRH, PET, and TreeCovFract2019 represented soil humidity, Potential Evapotranspiration and forest cover updated to 2019, respectively. LULC expresses the fraction of grid cell under anthropogenic land use since 8000 BC, while HFP2009 is the 2009 Human Footprint index.
Variable importance ranked by the RMSE loss after permutations (left panel) and marginal effects of the different predictors (right panel) of the random forest model using reptilian zDiversity as response variable. ClimVar and ClimVel represented the average rate of change during the time-series (expressed in °C/century and m/yr, respectively) since Last Glacial Maximum. BiomeShan described the variation in biome patterns over the last 140 ka expressed using the Shannon index. SoilRH, PET, and TreeCovFract2019 represented soil humidity, Potential Evapotranspiration and forest cover updated to 2019, respectively. LULC expresses the fraction of grid cell under anthropogenic land use since 8000 BC, while HFP2009 is the 2009 Human Footprint index.
Variable importance ranked by the RMSE loss after permutations (left panel) and marginal effects of the different predictors (right panel) of the random forest model using avian zDiversity as response variable. ClimVar and ClimVel represented the average rate of change during the time-series (expressed in °C/century and m/yr, respectively) since Last Glacial Maximum. BiomeShan described the variation in biome patterns over the last 140 ka expressed using the Shannon index. SoilRH, PET, and TreeCovFract2019 represented soil humidity, Potential Evapotranspiration and forest cover updated to 2019, respectively. LULC expresses the fraction of grid cell under anthropogenic land use since 8000 BC, while HFP2009 is the 2009 Human Footprint index.
Comparison of phylogenetic diversity values calculated with a maximum clade credibility tree (PDMCC) and PD calculated by averaging the values from 100 trees selected from the posterior distribution of mammal and bird phylogenies (PDsim). The red line represents the perfect fit. In both groups, PD values across assemblages were very similar regardless of the method used (Spearman’s ρ > 0.99). We conclude that using an MCC tree should not affect our results.
Pearson’s correlations between diversity metrics in each taxonomic group. All the correlations were spatially corrected.
Pearson’s correlation between zDiversity of each taxonomic group and of all tetrapods and the related diversity metrics obtained after centering and scaling to unit variance species richness (zSR), sesPD (zPD) and sesFRic (zFRic). Please note that overall zDiversity was calculated as the arithmetic mean of zSR, zPD and zFRic. All the correlations were spatially corrected.
Median diversity metric scores for each taxonomic group and for all tetrapods, and the relative coverage in terms of number of species. SR, PD and FRic represent the median values across cells of species richness, phylogenetic diversity and functional diversity (expressed as functional richness), respectively. Please note that for mammals and birds PD was derived using a maximum clade credibility tree.
Acknowledgments
E.T., C.P.C., A.T. and M.P. were supported by the Estonian Ministry of Education and Research (PSG293, IUT20-29, PRG609, and PSG505) and the European Regional Development Fund (Centre of Excellence EcolChange).