Integrating global patterns and drivers of tree diversity across a continuum of spatial grains

What drives biodiversity and where are the most biodiverse places on Earth? The answer critically depends on spatial scale (grain), and is obscured by lack of data and mismatches in their grain. We resolve this with cross-scale models integrating global data on tree species richness (S) from 1338 local forest surveys and 287 regional checklists, enabling estimation of drivers and patterns of biodiversity at any desired grain. We uncover grain-dependent effects of both environment and biogeographic regions on S, with a positive regional effect of Southeast Asia at coarse grain that disappears at fine grains. We show that, globally, biodiversity cannot be attributed to purely environmental or regional drivers, since regions are environmentally distinct. Finally, we predict global maps of biodiversity at two grains, identifying areas of exceptional species turnover in China, East Africa, and North America. Our cross-scale approach unifies disparate results from previous studies regarding environmental versus biogeographic predictors of biodiversity, and enables efficient integration of heterogeneous data.

[...] on global distribution of tree biodiversity, yet the lack of methods to address differences in sampling has so far prevented their integration.

Further, as could be said for many problems in ecology, attempts to map global biodiversity and to assess its potential drivers are severely complicated by issues of spatial scale [8-11]. The most straightforward issue is the non-linear increase of the number of species (S) with area [12], which is why patterns of biodiversity cannot be readily inferred from sampling locations of varying area. The second issue concerns sets of sampling locations that do have a constant area (hereafter grain); even then, a spatial pattern of S observed at a small grain may differ from a pattern at a large grain [13-15]; an example is the grain dependence of the latitudinal diversity gradient [16] (but see ref. 17). The reason is that beta diversity (the ratio between fine-grain alpha diversity and coarse-grain gamma diversity) varies over large [...] should ideally be studied, mapped, and explained at multiple grains [14].

Although the abovementioned scaling issues are well known [13,19,24,25], methods are lacking that explicitly incorporate grain dependence within a single model, allowing cross-grain inference and predictions. Furthermore, it is still common to report patterns and drivers of biodiversity at a single grain, resulting in pronounced mismatches of spatial grain among studies, and hindering synthesis. An example is the debate over whether biodiversity is more associated with regional proxy variables for macroevolutionary diversification and historical dispersal limitation, or with ecological drivers that include climatic and other environmental drivers, as well as biotic interactions [25-29]. While climate and other ecological factors usually play a strong role (but see ref. 30), studies differ in whether they view residual regional forces as weak [31-33] or strong [34-36]. Even within the same group of organisms (trees), there is debate regarding whether environment [23,37-40] or regional history [41-44] drives global patterns. And yet, these studies are rarely done at a comparable spatial grain and, perhaps not surprisingly, plot-scale analyses [39,40] typically conclude a strong role for environmental variation, whereas large-grain analyses [43,45] show a strong role of historical biogeographic processes.

Here, we propose a cross-grain approach that allows estimation of contemporary environmental and regional predictors, as well as global patterns, of tree species richness across a continuum of grains, from plots of 10 × 10 m up to entire continents. Our study has three main goals. (i) By explicitly considering spatial grain as a modifier of the influence of ecology versus regional biogeography, we aim to synthesize results among studies, and illustrate how the importance of these processes varies with grain. Apart from the well-known grain-dependent effects of environment, we also focus on the so far overlooked grain-dependent effects of biogeographic regions. (ii) The novelty of the approach is to model the grain dependence of every predictor (spatial, regional, or ecological) within a single model as a statistical interaction with area, which enables integration of an unprecedented volume of heterogeneous data from local surveys and country-wide checklists. Although such an interaction has been occasionally tested [16,17,36], to our knowledge it has not been applied to both spatial and environmental effects, nor to data integration and cross-grain predictions. (iii) We take advantage of being able to predict biodiversity patterns at any arbitrarily chosen grain, and we map the estimates of alpha, beta, and gamma diversity of trees across the entire planet.

Results and Discussion
Macroecological patterns. To explain the observed global variation of tree diversity (Fig. 1 [...] Fig. 1). This is in line with other studies from large geographical extents, where 70-90% model fits are common even for relatively simple climate-based models [23,40,46-48]. Next, we used model SMOOTH to predict patterns of S and beta diversity over the entire mainland, at a regular grid of large hexagons of 209,903 km² and at a grid of local plots of 1 ha (Fig. 2A-C). We [...]

Grain-dependent effects of region. Although model REALM treats the regional biogeographic effects on S as discrete, while model SMOOTH treats them as continuous, both models reveal similar grain dependence of these regional effects. At the coarse grains (i.e. in larger regions), model REALM shows that the anomaly of S that is independent of environment (and thus attributed to the effect of regions) is highest in the Indo-Malay region, followed by parts of the Neotropics, Australasia, and the Eastern Palaearctic (Fig. 3). A similar pattern emerges at the coarse grain from model SMOOTH, where particularly China, and to some degree Central America, are hotspots of environmentally independent S (i.e., strong effects of biogeographic regions) (Fig. 2D). This follows the existing narrative [44,46] in which tree diversity is typically highest, and anomalous from the climate-driven expectation, in eastern Asia. However, at the smaller plot grain, a different pattern emerges in both the REALM (Fig. 3) and SMOOTH (Fig. 2E) models: the regional biogeographic effects are present, but weaker. Further, they shift away from the Indo-Malay and the Neotropical regions (REALM model) or China and Central America (SMOOTH model) at the coarse grains towards the equator, particularly to Australasia, at the plot grain (Fig. 2F, 3).

These results can be viewed through the logic of the species-area relationship (SAR), and its link to alpha, [...] historically accumulated species that are spatially segregated with relatively small ranges, for example due to allopatric speciation [44], climate refugia (as in Europe [54]), or due to dispersal barriers and/or large-scale habitat heterogeneity [44]. This would lead to increased regional richness but contribute less to local richness, leading to stronger regional effects at larger than at smaller grains, as we observed.

We also found pronounced autocorrelation in the residuals of the REALM model at the country grain, but low autocorrelation at both grains in the residuals of model SMOOTH (Supplementary Fig. 5).

Residual autocorrelation in S is the spatial structure that was not accounted for by environmental predictors; it can emerge as a result of dispersal barriers or a particular evolutionary history in a given location or region [55,56]. The autocorrelation in REALM residuals thus indicates that the discrete biogeographical regions (Fig. 3A) fail to delineate areas with unique effects on S; these are better derived directly from the data, for example using the splines in model SMOOTH (Fig. 2D, E). As such, the smoothing not only addresses a prevalent nuisance (i.e. biased parameter estimates due to autocorrelation [57]), but can also be used to delineate the regions relevant for biodiversity more accurately than the use of a priori defined regions.

[...] and environment at the global extent in plants. However, the latter study lacked data from local plots (i.e. had a limited range of areas). We detected the clearest grain dependence in the effects of Gross Primary Productivity (GPP, a proxy for energy input) and tree density (Fig. 4); both effects decrease with area. The reason is that, as area increases, large parts of barren, arid, and forest-free land are included in large countries such as Russia, Mongolia, Saudi Arabia, or Sudan, diluting the importance of the total tree density at large grains.

Further, we failed to detect an effect of elevation span at fine grain, but it emerged at coarse grains (Fig. 4). This is in line with other studies [21,22], and it shows that topographic heterogeneity is most important over large areas where clear barriers (mountain ranges and deep valleys) limit colonization and promote diversification [58]. Also note the wide credible intervals (i.e. high uncertainty) around the effects of islands and most of the climate-related variables across grains (Fig. 4). A likely source of this uncertainty is the collinearity between environmental and regional predictors (see below). This [...]

Regions vs environment. We used deviance partitioning [59,60] to assess the relative importance of biogeographic regions versus environmental conditions in explaining the variation of S across grains.

At the global extent, the independent effects of biogeographic realms strengthened towards coarse grain, from 5% at the plot grain to 20% at the country grain in model REALM (Fig. 5A). In contrast, the variation of S explained uniquely by environmental conditions (around 14%, Fig. 5A) showed little grain dependence. Importantly, however, at both grains roughly 50% of the variation of S is explained by an overlap between biogeographic realms and environment, and it is impossible to tease these apart due to the collinearity between them. In other words, biogeographic realms also tend to be [...] versa. Thus, we urge caution in interpreting analyses, such as ours and others [30,31,33,46,61], that infer the relative magnitude of biogeographic versus environmental effects merely from contemporary observational data.

Given this covariation, we cannot clearly say whether environmental or regional effects are more important in driving patterns of richness. We can, however, make statements about the grain dependence of both environment and region, as above. The climate-realm collinearity is likely [...] (Fig. 4) and biogeographic realms (Fig. 3), but there remains enough certainty about the effects of some predictors, such as tree density or GPP (Fig. 4), which are more orthogonal to climate and regions.
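To make the logic of the unique and overlapping fractions concrete, the following is a minimal sketch of variation partitioning on simulated data. It is an illustration under stated assumptions, not the analysis used here: it substitutes ordinary least-squares R² for the deviance of the REALM model, and the two-realm indicator, the single environmental variable, and all effect sizes are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def partition(env, region, y):
    """Split explained variation into unique and shared fractions."""
    r_env = r_squared(env, y)                              # environment only
    r_reg = r_squared(region, y)                           # region only
    r_full = r_squared(np.column_stack([env, region]), y)  # both together
    return {
        "unique_env": r_full - r_reg,
        "unique_region": r_full - r_env,
        "shared": r_env + r_reg - r_full,
        "unexplained": 1.0 - r_full,
    }

# Simulated (hypothetical) data: two realms that differ in environment,
# so region and environment are collinear by construction.
rng = np.random.default_rng(0)
n = 500
region = rng.integers(0, 2, n).astype(float)
env = region + rng.normal(0, 0.5, n)
log_S = 1.0 + 0.8 * env + 0.5 * region + rng.normal(0, 0.5, n)

parts = partition(env[:, None], region[:, None], log_S)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
```

Because the simulated realms differ in their environment, part of the explained variation cannot be assigned to either predictor set alone and ends up in the shared fraction, which mirrors the overlap described above.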

To overcome the global collinearity problem and to better answer the classical question of whether diversity is more influenced by historical or contemporary processes, we suggest the following [...]

Implications. We have compiled a global dataset on tree species richness, and used it to integrate highly heterogeneous data in a model that contains grain dependence as well as spatial autocorrelation, and predicts hotspots of biodiversity across grains that span 11 orders of magnitude, from local plots to entire continents. This is an improvement of data, methods, and concepts, and importantly, we reveal a critical grain dependence in both regional and environmental predictors. We propose that [...] smaller-grained data tend to find a strong influence of environment [39,40], whereas those that use larger-grained data find a strong effect of historical biogeography [43,45]. We reconcile this with a grain-explicit analysis and show that smaller-grain (alpha-diversity) patterns are less strongly influenced by regional biogeography than larger-grained (gamma-diversity) patterns. Finally, we suggest that the advantages of having a formal statistical way to directly embrace grain dependence are twofold: not only will it allow ecologists to test grain-explicit theories, but it is precisely the same grain dependence that will also allow integration of heterogeneous, messy, and haphazard data from various taxonomic groups, especially the data-deficient ones. This is desperately needed in a field that has restricted its global focus to a small number of well-surveyed taxa.

Methods
The complete data and R code used for all analyses are available under a CC-BY license in a GitHub repository at https://github.com/petrkeil/global_tree_S. An extended description of methods is in SI Text.

[...] Table 1). For each plot we also noted the minimum DBH (diameter at breast height) that was used as a criterion to include tree individuals in a study. All continuous predictors were standardized to zero mean and unit variance prior to the statistical modelling.
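The standardization step can be sketched as follows. This is an illustrative example only: the predictor values are made up, and the use of the sample standard deviation (n - 1 denominator) is an assumption rather than something stated in the text.

```python
import numpy as np

def standardize(X):
    """Center each column to zero mean and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    # ddof=1 gives the sample standard deviation (an assumption here).
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical predictors: mean annual temperature (deg C), precipitation (mm).
predictors = np.array([
    [12.0, 800.0],
    [18.5, 1500.0],
    [25.1, 2300.0],
    [7.3, 450.0],
])
Z = standardize(predictors)
print(Z.round(2))
```

After this transformation, coefficients of different predictors are on a comparable scale, since each one describes the change in the response per one standard deviation of its predictor.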

Cross-grain models. Our core approach is that 'grain dependence' of an effect of a predictor can be modelled as a statistical interaction between the predictor and area. For example, imagine a linear relationship between species richness $S$ and temperature $T$, defined as $S = a + bT$. Now let us assume that the coefficient $b$ also depends linearly on area (grain) $A$ as $b = \alpha + \beta A$; by substitution we get $S = a + \alpha T + \beta A T$ [...]

Model REALM. This model follows the traditional approach to assess regional effects on S, that is, [...] where REALM is a factor identifying the regions.
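The idea of treating grain dependence as a predictor-by-area interaction can be illustrated with a small simulation. This is a sketch under assumptions, not the fitted model: the symbols T (temperature), A (area), a, alpha, and beta, the Gaussian noise, and the ordinary least-squares fit are all illustrative choices, and the main effect of area is omitted to keep the example minimal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
T = rng.normal(size=n)  # standardized temperature (hypothetical)
A = rng.normal(size=n)  # standardized log-area, i.e. grain (hypothetical)

# True values: the slope of T changes linearly with area, b = alpha + beta * A.
a, alpha, beta = 2.0, 0.5, 0.3
S = a + (alpha + beta * A) * T + rng.normal(0, 0.1, n)

# Substituting b into S = a + b*T gives S = a + alpha*T + beta*(A*T),
# so the grain dependence is estimated as the coefficient of A*T.
X = np.column_stack([np.ones(n), T, A * T])
coef, *_ = np.linalg.lstsq(X, S, rcond=None)
a_hat, alpha_hat, beta_hat = coef

def slope_at(area):
    """Estimated effect of temperature at a given (standardized) area."""
    return alpha_hat + beta_hat * area

print(f"effect of T at small grain (A = -1): {slope_at(-1.0):.2f}")
print(f"effect of T at large grain (A = +1): {slope_at(1.0):.2f}")
```

Once the interaction coefficient is estimated, the effect of the predictor can be read off at any grain, which is what allows observations of very different areas to inform a single model.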

Model SMOOTH. In this model we avoid using discrete biogeographic regions; instead, we use thin-[...] (2)

The notation is the same as in the previous model, with the exception of and now being constant,