MoB (Measurement of Biodiversity): a method to separate the scale-dependent effects of species abundance distribution, density, and aggregation on diversity change

Little consensus has emerged regarding how proximate and ultimate drivers such as productivity, disturbance, and temperature may affect species richness and other aspects of biodiversity. Part of the confusion is that most studies examine species richness at a single spatial scale and ignore how the underlying components of species richness can vary with spatial scale. We provide an approach for the measurement of biodiversity (MoB) that decomposes changes in species rarefaction curves into proximate components attributed to: 1) the species abundance distribution, 2) density of individuals, and 3) the spatial arrangement of individuals. We decompose species richness by comparing spatial and nonspatial sample- and individual-based species rarefaction curves that differentially capture the influence of these components to estimate the relative importance of each in driving patterns of species richness change. We tested the validity of our method on simulated data, and we demonstrate it on empirical data on plant species richness in invaded and uninvaded woodlands. We integrated these methods into a new R package (mobr). The metrics that mobr provides will allow ecologists to move beyond comparisons of species richness in response to ecological drivers at a single spatial scale towards a dissection of the proximate components that determine species richness across scales.


Introduction 50
Species richnessthe number of species co-occurring in a specified areais one of the most 51 widely-used biodiversity metrics. However, ecologists often struggle to understand the 52 mechanistic drivers of richness, in part because multiple ecological processes can yield 53 qualitatively similar effects on species richness (Chase and Leibold 2002, Leibold and Chase 54 2017). For example, high species richness in a local community can be maintained either by 55 species partitioning niche space to reduce interspecific competition (Tilman 1994), or by a 56 balance between immigration and stochastic local extinction (Hubbell 2001). Similarly, high 57 species richness in the tropics has been attributed to numerous mechanisms such as higher 58 productivity supporting more individuals, higher speciation rates, and longer evolutionary time 59 since disturbance (Rosenzweig 1995). 60 Although species richness is a single metric that can be measured at a particular grain 61 size or spatial scale, it summarizes the underlying biodiversity information that is contained in 62 the individual organisms, each of which are assigned to a particular species, Operational 63 Taxonomic Unit, or other taxonomic grouping. Variation in species richness can be decomposed 64 dependent shifts in species richness and its components: a simple-to-interpret two-scale analysis 138 and a more informative but necessarily more complex multi-scale analysis. 139 The two-scale analysis provides a big-picture view of the changes between the treatments 140 by focusing exclusively on the α (plot-level) and γ (across all plots) spatial scales. It provides 141 diagnostics for whether species richness and its components differ between treatments at the two 142 scales. The multi-scale scale analysis expands the two-scale analysis by taking advantage of 143 three distinct types of rarefaction curves: 1) spatially constrained, sample-based rarefaction 144 (Chiarucci et al. 2009), where the order in which plots are sampled depends on their spatial 145 proximity (these are referred to as species accumulation curves in Gotelli and Colwell 2001); 2) 146 the non-spatial, sample-based rarefaction, where individuals are randomly shuffled across plots 147 within a treatment while maintaining average density in the plots; and 3) the individual-based 148 rarefaction curve where again individuals are randomly shuffled across plots within a treatment, 149 but in this case average plot density is not maintained (Hurlbert 1971, Gotelli andColwell 2001). 150 The differences between these curves are used to isolate the effects of the SAD, density of 151 individuals, and spatial aggregation on richness and document how these effects change as a 152 function of scale. This is true when considering both spatially contiguous and non-contiguous sampling designs. 160 Theoretical treatments of the species-area relationship established the connection between the 161 number of individuals and sampled area (Arrhenius 1921, Williams 1943, Hubbell 2001, Harte 162 2011. The MoB framework uses this connection to synthesize the information provided by the 163 three types of rarefaction curves which differ only in how they define sampling effort (i.e., 164 scale). 165

Mathematical nomenclature 166
Consider T = 2 treatments, with K replicated plots per treatment (See supplemental Table  167 S1.1). Within each plot, we have measured the abundance of each species, which can be denoted 168 by nt,k,s, where t = 1, 2 for treatment, k = 1, 2, … K for plot number within the treatment, and s = 169 1, 2, … S for species identity, with a total of S species recorded among all plots and treatments. 170 The experimental design does not necessarily have to be balanced (i.e., K can differ between 171 treatments) so long as the spatial extent is similar between the treatments. However, it is 172 important to note that inferences can only be made over a common number of individuals or 173 samples, so large differences in sampling effort will decrease the range of scales useful for 174 assessing treatment effects (but this can be remedied by standardization of sampling among 175 treatments post-hoc). For simplicity of notation we describe the case of a balanced design here. 176 St,k is the number of species observed in plot k in treatment t (i.e., number of species with nt,k,s > 177 0), and Nt,k is the number of individuals observed in plot k in treatment t (i.e., , = ∑ , , ). 178 The spatial coordinates of each plot k in treatment t are xt,k and yt,k. We focus on spatial patterns 179 but our framework also applies analogously to samples distributed through time. 180 For clarity, we focus here on a single-factor design with two (or more) categorical 181 treatment levels. The method can and will be extended to accommodate crossed designs and

Two-scale analysis 184
The two-scale analysis provides a simple decomposition of species richness while still 185 emphasizing the three components and change with spatial scale. In the two-scale analysis, we 186 compare observed species richness in each treatment and several other summary statistics at the 187 α and γ scales (Supplemental Table S2). The summary statistics were chosen to represent the 188 most informative aspects of individual-based rarefaction curves (Fig. 1). Individual-based 189 rarefaction curves plot the expected species richness Sn against the number of individuals when 190 individuals are randomly drawn from the sample at the α or γ scales. The curve can be calculated 191 precisely using the hypergeometric sampling formula given the SAD (nt,k,s at the α-scale, nt,+,s at 192 the γ-scale) (Hurlbert 1971). 193 We show how several widely-used diversity metrics are represented along the individual 194 rarefaction curve, corresponding to α and γ scales ( The treatment effect on these metrics can be visually examined with boxplots (see 232 Empirical example section) at the α-scale and with single points at the pooled γ-scale (unless 233 there is replication at the γ-scale as well). Quantitative comparison of the metrics can be made 234 with t-tests (ANOVAs for more than two treatments) or, for highly skewed data, nonparametric 235 tests such as Mann-Whitney U test (Kruskal-Wallis for more than two treatments). 236 We provide a non-parametric, randomization test where the null expectation of each 237 metric is established by randomly shuffling the plots between the treatments, and recalculating 238 the metrics for each reshuffle. The significance of the differences between treatments can then be 239 evaluated by comparing the observed test statistic to the null expectation when the treatment IDs 240 are randomly shuffled across the plots (Legendre and Legendre 2012). When more than two 241 groups are compared, the test examines the overall group effect rather than specific group 242 differences. At the α-scale where there are replicate plots to summarize over, we use the 243 ANOVA F-statistic as our test statistic (Legendre and Legendre 2012), and at the γ-scale in 244 which we only have a single value for each treatment (and therefore cannot use the F-statistic) 245 the test statistic is the absolute difference between the treatments (if more than two treatments 246 are considered then it is the average of the absolute differences, ̅ ). We use ̅ as a measure of 247 effect size at both scales. 248 Note that Nt,k and Nt,+ give the same information, because one scales linearly with the 249 other by a constant (i.e., Nt,+ is equal to Nt,k multiplied by the number of plots K within 250 treatment). However, the other metrics (S, Sn, and SPIE) are not directly additive across scales. 251 Evaluation of these metrics at different scales may yield different insights for the treatments, 252 sometimes even in opposite directions (Chase et al. 2018). However, complex scale-dependence 253 may require comparison of entire rarefaction curves (rather than their two-scale summary 254 statistics) to understand how differences in community structure change continuously across a 255 range of spatial scales. 256

Multiscale analysis 257
While the two-scale analysis provides a useful tool with familiar methods, it ignores that scale 258 itself is not discrete, but rather is a continuum. Such a discrete scale perspective can thus only 259 provide a minimal view of the scaling relationships of treatment differences. In this section, we 260 develop a method to examine the components of change across a continuum of spatial scale. We 261 define spatial scale by the amount of sampling effort (i.e., the number of individuals and/or the 262 number of plots sampled; see Scale and sampling effort subsection)

13
The three curves 264 The key innovation is to use three distinct types of rarefaction curves that capture different 265 components of community structure. By a carefully sequenced analysis, it is possible to tease 266 apart the effects of SAD shape, of changes in density of individuals (N), and of spatial 267 aggregation across a continuum of spatial scale. The three types of curves are summarized in 268 Table 1. Fig. 2 shows how they are constructed graphically. 269 The first curve is the spatial, sample-based rarefaction (sSBR) (spatially-constrained 270 rarefaction Chiarucci et al. 2009). It is constructed by accumulating plots sampled within a 271 treatment based on their spatial position such that the most proximate plots are collected first. 272 One can think of this as starting with a target plot and then expanding a circle centered on the 273 target plot until one additional plot is added, then expanding the circle until another plot is added, 274 etc. In practice, every plot is used as the starting target plot and the resulting curves are averaged 275 to give a smoother curve. If two or more plots are of equal distance to the target plot, they are 276 accumulated in random order. 277 The second curve is the non-spatial, sample-based rarefaction curve (nsSBR, Supplement 278 S3). It is constructed by randomly sampling plots within a treatment, but in which the individuals 279 in the plots have first been randomly shuffled among the plots within a treatment, while 280 maintaining the plot-level average abundance ( , ̅̅̅̅̅ ) and the treatment-level SAD (⃑ ,+ = 281 ∑ ⃑ , ). Note that this rarefaction curve is distinct from the traditional "sample-based rarefaction 282 curve" (Gotelli and Colwell 2001), in which plots are randomly shuffled to build the curve but 283 individuals within a plot are preserved (and consequently any within-plot spatial aggregation is 284 retained). The nsSBR contains much of the same information as the sSBR (plot density and 285 14 The third curve is the familiar individual-based species rarefaction (IBR) (Hurlbert 1971, 287 Gotelli and Colwell 2001). It is constructed by first pooling individuals across all plots within a 288 treatment, and then randomly sampling individuals without replacement. The shape of the IBR 289 reflects only the shape of the underlying SAD (⃑ ,+ ). 290 It can be computationally intensive to compute rarefaction curves, and therefore 291 analytical formulations of these curves are desirable to speed up computation. It is unlikely an 292 analytical formulation of the sSBR exists because it requires averaging over each possible 293 ordering of nearest sites; however, analytical expectations are available for the nsSBR and IBR. 294 Specifically, we used the hypergeometic formulation provided by Hurlbert (1971)  The sSBR includes all information in the data including effect of SAD, effect of density of individuals, and effect of spatial aggregation.
Nonspatial, sample-based rarefaction curve nsSBR E[ | , ⃑ ,+ , ⃑ ⃑ ] Random sampling of k plots after removing intraspecific spatial aggregation by randomly shuffling individuals across plots while maintaining average plot-level abundance ( , ̅̅̅̅̅ ) and the treatment-level SAD ( ,+, = ∑ , , ). We use an analytical extension of the hypergeometric distribution demonstrating this curve is a rescaling of the IBR based on the ratio: (average density across treatments) / (average density of treatment of interest) The nsSBR reflects both the shape of the SAD and the difference in density between the treatments. If density between the two treatments is identical then this curve converges on the IBR.   Table 1 for detailed description of each 315 rarefaction curve. 316 The mechanics of isolating the distinct effects of spatial aggregation, density, and SAD 317 The three curves capture different components of community structure that influence 318 richness changes across scales (measured in number of samples or number of individuals, both of 319 which can be easily converted to area, Table 1). Therefore, if we assume the components 320 contribute additively to richness, then the effect of a treatment on richness propagated through a 321 single component at any scale can be obtained by subtracting the rarefaction curves from each 322 other. For simplicity and tractability, we assume additivity to capture first-order effects. This 323 assumption is supported by Tjørve et al.'s (2008) demonstration that an additive partitioning of 324 richness using rarefaction curves reveals random sampling and aggregation effects when using 325 presence-absence data. We further validated this assumption using sensitivity analysis (see 326 Supplement S5). Below we describe the algorithm to obtain the distinct effect of each 327 component. Figure 5 provides a graphic illustration. 328

i) Effect of aggregation 329
The difference between the sSBRs of two treatments, ∆( 21 | , ⃑ ,+ , ⃑ ⃑ , ) = 330 E[ 2 | , ⃑ 2,+ , ⃑ ⃑ 2 , 2 ] − E[ 1 | , ⃑ 1,+ , ⃑ ⃑ 1 , 1 ], gives the observed difference in richness between 331 treatments across scales (Fig. 5a, d, purple shaded area and solid curve respectively). It 332 encapsulates the treatment effect propagated through all three components: shape of the SAD, 333 density of individuals, and spatial aggregation. Differences between treatments in any of these 334 factors could potentially translate into observed difference in species richness. 335 Similarly, the difference between the nsSBRs, ∆( 21 | , ⃑ ,+ , ⃑ ⃑ ) = E[ 2 | , ⃑ 2,+ , ⃑ ⃑ 2 ] − aggregation is removed (Fig. 5b, e, purple shaded area and dashed curve respectively). The 338 distinct effect of aggregation across treatments from one plot to k plots can thus be obtained by 339 taking the difference between the two ΔS values (Fig.5g,  This simple duality can be extended to the estimation of the density and SAD effects, but we will 354 only consider the approach laid out in Eqn 1 below. In Fig. 5, we separate each individual effect 355 using the approach of Eqn 1 while the code in the mobr package uses the approach of Eqn 2. 356 One thing to note is that the effect of aggregation always converges to zero at the 357 maximal spatial scale (k = K plots) for a balanced design. This is because, when all plots have 358 been accumulated, ∆( 21 | , ⃑ ,+ , ⃑ ⃑ , ) and ∆( 21 | , ⃑ ,+ , ⃑ ⃑ ) will both converge on the 359 difference in total richness between the treatments. However, for an unbalanced design in which one treatment has more plots than the other, ∆( 21 |aggregation) would converge to a nonzero 361 constant because E[ | , ⃑ ,+ , ⃑ ⃑ , ] − E[ | , ⃑ ,+ , ⃑ ⃑ ] would be zero for one treatment but not 362 the other at the maximal spatial scale (i.e., min(K1, K2) plots). This artefact is inevitable and 363 should not be interpreted as a real decline in the relative importance of aggregation on richness, 364 but simply as the diminishing ability to detect aggregation without sampling a larger region. Note that in Eqn 3, spatial scale is defined with respect to numbers of individuals sampled (N)

iii) Effect of SAD: 382
The distinct effect of the shape of the SAD on richness between the two treatments is simply the 383 difference between the two IBRs (Fig. 5c, f,  The formulae used to identify the distinct effect of the three factors are summarized in Table 2. 390  This is estimated directly by comparing the IBRs between two treatments.

Significance tests and acceptance intervals 393
In the multi-scale analysis, we also applied Monte Carlo permutation procedures to 1) construct 394 acceptance intervals (or non-rejection intervals) across scales on simulated null changes in Although the logic justifying the examination separating the effect of the three components is 408 rigorous, we tested the validity of our approach (and the significance tests) by simulations using 409 the R package mobsim (May et al. 2018). The goal was to establish the rate of type I error (i.e., 410 detecting significant treatment effect through a component when it does not differ between 411 treatments) and type II error (i.e., nonsignificant treatment effect through a component when it 412 does differ). This was achieved by systematically comparing simulated communities in which we 413 altered one or more components while keeping the others unchanged (see Supplement S5). 414 Overall, the benchmark performance of our method was good. When a factor did not differ 415 between treatments, the detection of significant difference was low (Supplemental Table S5.1). 416 Conversely, when a factor did differ, the detection of significant difference was high, but 417 decreased at smaller effect sizes. Thus, we were able to control both Type I and Type II errors at 418 reasonable levels. In addition, there did not seem to be strong interactions among the components 419 the error rates remained consistently low even when two or three components were changed 420 simultaneously. The code to carry out the sensitivity analysis is archived here: 421 https://github.com/MoBiodiv/mobr/blob/master/scripts/mobr_sensitivity.R 422

An empirical example 423
In this section, we illustrate the potential of our method with an empirical example 424 presented in Powell et al. (2013). Invasion of an exotic shrub, Lonicera maackii, has caused 425 significant, but strongly scale-dependent, decline in the diversity of understory plants in eastern 426 Missouri (Powell et al. 2013). Specifically, Powell et al. (2013) showed that the effect size of the 427 invasive plant on herbaceous plant species richness was relatively large at plot-level spatial 428 scales (1 m 2 ), but the proportional effect declines with increasing windows of observations, with 429 the effect becoming negligible at the largest spatial scale (500 m 2 ). Using a null model approach, 430 the authors further identified that the negative effect of invasion was mainly due to the decline in 431 plant density observed in invaded plots. To recreate these analyses run the R code archived here: 432 https://github.com/MoBiodiv/mobr/blob/master/scripts/methods_ms_figures.R. 433 The original study examined the effect of invasion across scales using the slope and 434 intercept of the species-area relationship. We now apply our MoB approach to data from one of 435 their sites from Missouri, where the numbers of individuals of each species were recorded from 436 50 1-m 2 plots sampled from within a 500-m 2 region in the invaded part of the forest, and another 437 50 plots from within a 500-m 2 region in an adjacent uninvaded part of the forest. Our method 438 leads to conclusions that are qualitatively similar to the original study, but with a richer analysis 439 of the underlying components and their scale dependence. Moreover, our methods show that 440 invasion influenced both the SAD and spatial aggregation, in addition to density, and that these To identify whether the increase in evenness due to invasion was primarily because of 462 shifts in common or rare species, we examined ENS of PIE (SPIE) which is more sensitive to 463 common species relative to comparisons of S, which is more sensitive to rare species. At the α 464 scale, invasion did not strongly influence SPIE (Fig. 4g), but at the γ scale, there was evidence that 465 invaded sites had greater evenness in the common species (Fig. 4i, ̅ = 5.78, p = 0.001). In other 466 words, the degree of dominance by any one species was reduced in the invaded sites. 467 The β diversity metrics were significantly higher (Fig. 4b,e,h p = 0.001) in the invaded 468 sites, suggesting that invaded sites had greater spatial species turnover and thus were more 469 heterogeneous. These increases in spatial turnover appeared to be only slightly due to the sole 470 effect of increased spatial intraspecific aggregation in invaded sites as βSn displayed the most 471 modest effect size ( ̅ = 0.4). Therefore, it appears that the shift in SAD and decreased N also are 472 playing a role in increasing β-diversity in the invaded treatment as they are reflected in βS and 473

βSPIE. 474
Overall the two-scale analysis indicates: 1) that there are scale-dependent shifts in 475 richness, 2) that these are caused by invasion decreasing N, increasing evenness in common 476 species, and increasing species patchiness. 477 Applying the multi-scale analysis, we further disentangled the effect of invasion on interest. Figure 5a-c present the three sets of curves for the two treatments: the sSBR, in which 480 plots accumulate by their spatial proximity (Fig. 5a); the nsSBR, in which individuals are 481 randomized across plots within a treatment (Fig. 5b); and the IBR, in which species richness is 482 plotted against number of individuals (Fig. 5c). The MoB framework that we have introduced here provides a comprehensive answer to 533 this question by taking a spatially explicit approach and decomposing the effect of the condition 534 (treatment) on richness into its individual components. The two-scale analysis provides a big-535 picture understanding of the differences and components of richness by only examining the 536 single plot (α) and all plots combined (γ) scales. The multi-scale analysis expands the endeavor 537 to cover a continuum of scales, and quantitatively decomposes change in richness into three 538 components: change in the shape of the SAD, change in individual density, and change in spatial 539 aggregation. As such, we can not only quantify how richness changes at any scale of interest, but 540 also identify how the change occurs and consequently push the ecological question to a more 541 mechanistic level. For example, we can ask to what extent the effects on species richness are 542 driven by numbers of individuals. Or instead, whether common and rare species, or their spatial 543 distributions, are more strongly influenced by the treatments. 544 Here we considered the scenario of comparing a discrete treatment effect on species 545 richness, but clearly the MoB framework will need to be extended to other kinds of experimental 546 and observational designs and questions (Supplement S6). The highest priority extension of the 547 framework is to generalize it from a comparison of discrete treatment variables to continuous 548 drivers such as temperature and productivity. Additionally, we recognize that abundance is 549 difficult to collect for many organisms and that there is a need to understand if alternative 550 measures of commonness (e.g., visual cover, biomass) can also be used to gain similar insights.
Finally, we have only focused on taxonomic diversity here, whereas other types of 552 biodiversity-most notably functional and phylogenetic diversity-are often of great interest, 553 and comparisons such as those we have overviewed here would also be of great importance for 554 these other biodiversity measures. Importantly, phylogenetic and functional diversity measures 555 share many properties of taxonomic diversity that we have overviewed here (e.g., scale-556 dependence, non-linear accumulations, rarefactions, etc) (e.g., Chao et al. 2014), and it would 557 seem quite useful to extend our framework to these sorts of diversities. We look forward to 558 working with the community to develop extensions of the MoB framework that are most needed 559 for understanding scale dependence in diversity change and overcome the limitations of the 560 framework as currently implemented (Supplement S6). 561 MoB is a novel and robust approach that explicitly addresses the issue of scale-562 dependence in studies of diversity, and quantitatively disentangles diversity change into its three 563 components. Our method demonstrates how spatially explicit community data and carefully 564 framed comparisons can be combined to yield new insight into the underlying components of 565 biodiversity. We hope the MoB framework will help ecologists move beyond single-scale 566 analyses of simple and relatively uninformative metrics such as species richness alone. We view 567 this as a critical step in reconciling much confusion and debate over the direction and magnitude 568 of diversity responses to natural and anthropogenic drivers. Ultimately accurate predictions of 569 biodiversity change will require knowledge of the relevant drivers and the spatial scales over 570 which they are most relevant, which MoB (and its future extensions), helps to uncover. 571