Macroecological laws in experimental microbial communities

Ecology has historically benefited by characterizing statistical patterns of biodiversity within and across communities. This approach, encompassed by the discipline of macroecology, has achieved considerable success in microbial ecology in recent years. Macroecological approaches have identified universal patterns of diversity and abundance that can be captured by effective models that do not include explicit interactions between community members. Experimentation has simultaneously played a crucial role in the development of our understanding of the ecology of microbes, as the advent of highly replicated time-series has allowed researchers to investigate how ecological forces govern community dynamics. However, there remains a gap between microbial experiments performed in the laboratory and macroecological patterns documented in natural systems, as we do not know if and how experimental manipulations produce macroecological effects. Here, we work to bridge the gap between the experimental manipulation of communities and their macroecological consequences. Using high-replication time-series of experimental microbial communities, we demonstrate that macroecological laws observed in nature can be readily recapitulated in a laboratory setting and unified under the Stochastic Logistic Model of growth (SLM). We found that demographic manipulations and their effect on community-level variation can alter empirical patterns in a manner that diverges from predictions obtained from the SLM. By incorporating experimental details (e.g., number of migrants), we were able to restore the predictive capacity of the SLM by linking demographic manipulations with macroecological effects. Finally, we demonstrate the extent that experimental manipulations are capable of altering macroecological patterns under the SLM, establishing a demarcation between macroecological outcomes we can and cannot observe in a laboratory setting.

Microbial communities inhabit virtually every environment on Earth. Through their 2 ubiquity, abundance, and diversity, microorganisms regulate the biogeochemical 3 processes that sustain vast amounts of life. Furthermore, microbial communities play a 4 crucial role in maintaining the health of many macroscopic forms of life, including 5 humans, who have harnessed microbial communities to promote their own 6 well-being [1,2]. Given their environmental, medical, and economic importance, it is 7 necessary to develop a quantitative theory of ecology that allows researchers to explain, 8 1/32 maintain, and alter the properties of microbial communities. A challenge of this scale is 9 daunting, and the complexity of microbial communities has promoted the engagement of 10 researchers from various disciplines with distinct approaches. Of these approaches, there 11 are two that have substantially contributed towards our quantitative understanding of 12 microbial communities in recent years: macroecology and experimental ecology. 13 Historically, the field of ecology has achieved considerable success by characterizing 14 generalized statistical patterns of biodiversity, an approach known as 15 macroecology [3][4][5][6][7][8]. The macroecological approach is, fundamentally, statistical in 16 nature, allowing for quantitative predictions to be made about the typical features of 17 ecological communities without having to specify microscopic ecological forces. The 18 unencumbered nature of this approach has allowed researchers to successfully 19 characterize a diverse array of microbial ecological patterns [9][10][11][12][13][14][15][16][17] and spurred the 20 development of mathematical models of microbial ecology grounded in statistical 21 physics [18][19][20][21][22][23][24][25][26]. Through this approach, disparate patterns were recently unified by the 22 observation that the typical microbial community follows three macroecological laws: 1) 23 the abundance of a given community member across sites follows a gamma distribution, 24 2) the mean abundance of a given community member is not independent of its variance 25 (i.e., Taylor's Law [27,28]), and 3) the mean abundance of a community member across 26 sites follows a lognormal distribution [29]. These three universal laws can be captured 27 by an intuitive mathematical model of density-dependent growth with environmental 28 noise, the stochastic logistic model (SLM) of growth [29][30][31]. Building on its utility, the 29 SLM has been successfully extended to quantitatively capture additional empirical 30 microbial macroecological patterns. Examples that explicitly use the SLM include 31 attempts to capture measures of ecological distances and dissimilarities between 32 communities [32], alternative stable-states [33], patterns of richness and diversity at 33 coarse-grained taxonomic and phylogenetic scales [34], and dynamics within and across 34 human hosts at the sub-species level (i.e., strains) [17,35]. The results of these studies 35 demonstrate that a minimal mathematical model of ecological dynamics can capture a 36 broad assemblage of microbial macroecological patterns. 37 Alongside the study of macroecological patterns in observational data, 38 experimentation has played a crucial role in the documentation and manipulation of 39 ecological patterns [36,37]. The advent of 16S rRNA amplicon sequencing permits 40 current researchers to advance past experimental efforts by investigating the ecological 41 dynamics of a large number of microbial communities in a laboratory setting. Through 42 this controlled approach, researchers have begun to determine the extent that the 43 assembly of microbial communities is reproducible and predicable [38][39][40][41][42][43]. Surprisingly, 44 seemingly simple environments have the capacity to form highly dissimilar microbial mechanistic explanations of universal empirical macroecological patterns alongside their 61 characterization and prediction requires both approaches and is arguably a central goal 62 of microbial ecology. Furthermore, there remains the need to develop and incorporate 63 ecological theory to the study of variation in experimental microbial communities as 64 well as the extent that extrinsic and intrinsic ecological forces contribute to said 65 variation. By applying and extending macroecological theory to highly replicated 66 temporally resolved experiments, we can begin to approach a quantitative theory of 67 microbial ecology. 68 Here, we attempted to bridge the gap between the experimental manipulation of 69 ecological forces and their resulting macroecological effects. Using high-replication 70 time-series of experimental microbial communities, we first demonstrated that the 71 macroecological laws observed in nature can be recapitulated in a laboratory setting. 72 By connecting these patterns to the predictions of the SLM, we identified a reasonable 73 null model of microbial macroecology which can be used to predict treatment-specific 74 effects. Specifically, we focused on migration treatments that correspond to 75 mainland-island and fully-connected metacommunity models [52,53]. We examined the 76  glycerol and cryopreserved at -80°C.

100
DNA extraction was performed using a QIAGEN DNeasy 96 Blood and Tissue kit. 101 Library preparation for 16S rRNA amplicon sequencing of the V4 region was performed 102 as previously described [41] and PCR products were purified and normalized using the 103 SequalPrep PCR kit (Invitrogen). Sequencing was performed on an Illumina MiSeq 104 (2x250 bp paired-end) and raw reads were processed for demultiplexing and barcode, 105 index, and primer removal using QIIME v1.9 [54]. The number of communities 106 sequenced at each transfer for each treatment can be found in Table S1. In this study 107 we reprocessed all raw FASTQ data from the original study to obtain Amplicon 108 Sequence Variants (ASVs) using DADA2 [55]. We processed our data using the pooled 109 3/32 option inference option so that ASVs with an abundance of one in a given site (i.e., 110 singletons) could be inferred, allowing us to examine the entirety of the empirical 111 sampling distribution. The attractor status of a given replicate community was assigned 112 as previously described [44]. Additional details can be found in the original 113 manuscript [44]. 114 Testing the macroecological predictions of the stochastic logistic 115 model 116 The SLM is a stochastic differential equation that describes the logistic growth of a 117 population with a time-dependent growth rate. In this context, the SLM for the relative 118 abundance of the ith ASV (x i ) is defined as the following Langevin equation where 1 τi , K i , and σ i represent the intrinsic growth rate, carrying capacity, and the 120 coefficient of variation of growth rate fluctuations.

121
Environmental noise is modeled as the product of a linear frequency term (as 122 opposed to demographic noise, which would be modeled as √ x i ), the compound 123 parameter σi τi , and a Brownian noise term η(t) that introduces stochasticity. The 124 expected value of η(t) is η(t) = 0 and the dependence of η(t ) at time t on an earlier 125 time t is defined as η(t)η(t ) = δ(t − t ) [56]. Under this standard definition if the 126 noise term is shifted in time it has zero correlation with itself.

127
The SLM has recently been shown to explain three empirical macroecological laws 128 that hold across dissimilar environments as follows: 1) predicting a gamma distributed 129 abundance fluctuation distribution, 2) interpreting the power law relationship between 130 mean relative abundance and the variance of relative abundance (i.e., Taylor's Law) in 131 terms of a community member's carrying capacity, and 3) interpreting the lognormally 132 distributed mean relative abundance distribution in terms of underlying carrying 133 capacities [29]. To provide necessary context, we briefly summarize these three laws and 134 their connection to the SLM.

135
The Abundance Fluctuation Distribution

136
The stationary form of the Abundance Fluctuation Distribution (AFD) P * predicted by 137 the SLM can be derived from the stationary state of the partial differential equation 138 obtained from the Fokker-Planck equation corresponding to Eq. 1 (interpreted as an Itô 139 SDE) [56] 140 which is the predicted form of the AFD under the SLM. This distribution is a gamma 141 distribution with the following expected mean and squared coefficient of variation Defining empirical estimates of x i and x 2 i − xi 2 xi 2 asx i and β −1 i , we obtain a form 143 of the gamma that can be used to generate predictions By using the Poisson limit of a binomial sampling process, the sampling form of the 145 AFD has been derived [29]. The derivation allows us to calculate the probability of 146 obtaining n reads out of a total sampling depth of N for the ith ASV as By setting n = 0 we obtain the expression for the probability of not detecting an 148 ASV in a site from which we can derive the expectation of the fraction of M sites where an ASV is 150 found (i.e., its occupancy).
The relative error of our predictions was estimated as 152 ε = Obs. − Pred. Obs.
The difference in relative error between treatments was estimated as Taylor's Law 154 Taylor's Law describes the empirical relationship between the mean and variance of the 155 relative abundance [27,28] 156 When b = 1 the empirical coefficient of variation is constant across ASVs, which in 157 the context of the SLM means that the parameter controlling the strength of 158 environmental noise is constant (σ i = σ).

159
The Lognormally distributed Mean Abundance Distribution 160 We consider that the Mean Abundance Distribution (MAD) follows a lognormal To account for finite sampling, we use a modified form of the lognormal that 163 considers mean abundances with a coverage greater than c. 164

5/32
where θ(·) is the Heaviside step function. Parameters of this modified lognormal were 165 fit to the empirical MAD as previously described (see Supplementary Note 7 in [29]). 166 Simulating the SLM with migration 167 In experimental microbial communities, after a period of time resources will become 168 depleted and an aliquot of the community will be transferred to a new environment 169 with replenished resources. This process of serial dilution introduces boom-bust cycles 170 of microbial growth. To account for this experimental detail, we considered a piece-wise 171 form of the SLM to describe the dynamics within the kth transfer where the dynamics occur over a period t ∈ [0, T ]. A transfer then occurs and the 173 dynamics start anew.

174
The time-dependent form of Eq. 13 can be derived (i.e., P (x i (t)|x i (t 0 ), t 0 ) and could 175 feasibly be extended to incorporate the effect of migration on initial conditions [57] 176 (description of solution in S2 Text). However, there are additional experimental details 177 and observations we would like to incorporate that can difficult to investigate 178 analytically. 179 We begin by modeling the relative abundance at the start of a given transfer cycle. 180 The abundances of ASVs at the start of the kth transfer cycle can be modeled as a 181 multinomial sampling process Where the vector of S abundances at the start of the kth transfer cycle ( n (k) (0)) are 183 drawn from the progenitor community for the first transfer cycle (k = 0) and are then 184 drawn from the previous transfer at T = 48 hr. for all subsequent transfer cycles 185 (k > 0). 186 We then turn to modeling migration manipulations performed by the experimentor. 187 Despite their different forms, both global and regional migration treatments are similar 188 in that migrants are not supplied at a constant rate within a given transfer cycle 189 (stationary distribution of the SLM with constant migration derived in S1 Text).

190
Rather, migration is manipulated by adding an aliquot at the start of each of the first 191 twelve transfer cycles. This experimental detail means that both regional and global 192 migration treatments alter the initial abundances of community members, which 193 subsequently grow according to the SLM. Our task then is to model the initial where abundances of migrants can again be modeled as a multinomial sampling 202 process 203 Pr n where n (k) migration (0) = 0 for k > 12. 204 We can then examine how the typical initial relative abundance of a given ASV 205 depends on the dilution rate of transfers. We begin by noticing that the final total 206 abundance of a community does not considerably vary from transfer to transfer ). Using the above definitions and dividing ASV 208 abundances by the total abundances to obtain relative abundances (T )N * (T )+D regional x i,regional N regional D transfer N * (T )+D regional N regional , Regional where parameters were set by the experimenter and are detailed in Table 1.

211
Beyond migration, there are dependencies between the final abundance of an ASV 212 and its abundance in the progenitor community that we would like to incorporate. The 213 set of K i was drawn from the parameters obtained by fitting Eq. 12 to the empirical 214 MAD. Dependence between x regional and the MAD, which is proportional to K, was 215 evaluated by performing logistic regression from scikit-learn v0.22.1 [58], obtaining 216 the formula where a is the intercept and b is the slope of the regression. From this relationship 218 we can model Pr[K i > 0|x i,progenitor ], allowing us to define the full distribution of 219 carrying capacities as follows The total number of ASVs with K i > 0 was drawn from the distribution communities relative to the progenitor.

224
The task of grafting these empirical considerations onto the time-dependent analytic 225 solution of the SLM is not straightforward. Therefore, the incorporation of experimental 226 7/32 details required the use of a simulation procedure. We obtain such a model by using an 227 approximation of the numerical solution of the SLM within each transfer cycle which 228 incorporates the above detail while requiring only two global parameters: σ and τ , the 229 values of the latter being constrained by the amount of growth that can possibly occur 230 over 48 hr (S3 Text). We set σ as a constant due to the existence of Taylor's Law and τ 231 as a constant due to the unknown nature of the distribution of growth rates. We used 232 logarithmically spaced parameter values from the following ranges: σ ∈ [0.01, 1.9] and 233 τ ∈ [1.7, 6.9]. The upper bound on the range of σ was set by the observation that 234 x i = 0 for σ ≤ 2. The range on τ was set by the observation that the serial dilution 235 factor sets the total number of generations per-transfer as log 2 (D −1 transfer ) ∼ 7, assuming 236 exponential growth. This translates to a maximum generation time of 237 τ max = 48hr./7 ∼ 6.9hr.. We identified a reasonable bound on the minimum generation 238 time by noticing that prior research efforts have established that these communities 239 have a maximum growth rate of ∼ 0.6 hr. −1 , translating to a minimum generation time 240 of τ min ≡ 0.6 −1 ≈ 1.7hr. [44].

241
Simulated relative abundances were calculated from simulated true abundances and 242 the process of sampling was modeled as a multinomial process The total number of reads N reads were sampled from the empirical distribution of 244 total read counts with replacement ( Fig. S7).

245
Statistical analyses 246 We identified appropriate statistics to capture macroecological changes specific to each 247 migration treatment. Standard tests were used when appropriate (e.g., change in slopes 248 of Taylor's Law), but two additional statistics were used to assess patterns relating to 249 change in correlation coefficients for the regional migration treatment and the change in 250 the CV of relative abundances for the global migration treatment due to the cessation 251 of migration manipulations. The change in correlation coefficients (ρ) between transfers 252 12 and 18 was assessed using Fisher's Z statistic [59] 253 where z t = log 1+ρt 1−ρt , M t is the number of replicate communities at transfer t, and 254 the denominator represents the standard error of the numerator.

255
The difference in CVs between transfers 12 and 18 was calculated using the following 256 previously described F CV statistic [60] 257 All regressions were performed using ordinary least squares regression with SciPy 266 v1.4.1.

268
Using the results of a high-replication community assembly experiment, we examined 269 the extent that universal patterns of microbial macroecology from observed data could 270 be recapitulated in a laboratory setting. Focusing on forms of migration corresponding 271 to island-mainland and metacommunity dynamics, we examined the extent that Distribution across independent sites (MAD) follows a lognormal distribution [29].

287
These three patterns can be unified through the lens of a mean-field model, the 288 Stochastic Logistic Model of growth [29].

289
To determine whether these patterns occur in an experimental system, we elected to 290 examine the microbial communities from a prior experiment where qualitatively 291 different forms of migration were manipulated [44]. Here a community grown from an 292 environmental sample was divided among a large number (∼ 100) of identical 293 environments containing a defined media with glucose as the sole source of carbon and 294 transferred after 48 hr. of growth 18 times. We first assessed the variation that emerged 295 in experimental communities. The average relative abundance varied over four orders of 296 magnitude across treatments and different transfers, meaning that a considerable degree 297 of community-level variation was maintained in the experimental communities. By 298 examining the relationship between the mean and variance of relative abundance, we 299 found that the two measures clearly follow a similar relationship on a log-log scale 300 across treatments, suggesting that Taylor's Law applies [27] (Fig. 1). Under Taylor's 301 Law a slope of two implies that the coefficient of variation (CV) remains constant 302 across ASVs. By plotting this slope value, we found that it reasonably captured the 303 data. Fitting a slope to each treatment for each transfer, we found that the mean slope 304 is 2.08 ± 0.057, suggesting that despite the variation in typical abundance the CV of 305 relative abundances remained roughly constant across ASVs. 306 We then turned towards examining the full distribution of abundances across 307 replicate communities, known as the Abundance Fluctuation Distribution (AFD). To 308 9/32 facilitate comparisons, we rescaled the logarithm of the AFD for each ASV by its mean 309 and variance (i.e., the standard score). We found that the rescaled AFDs tended to 310 overlap across treatments, implying that despite differences in experimental details, the 311 general shape of the AFD remained invariant. We found that the bulk of the 312 distributions generally followed the gamma distribution predicted by the SLM (Eq. 2, 313 Fig. 2a), though there remained a long tail which served as motivation for comparing 314 the AFDs of different treatments. 315 We then turned to the fraction of communities where a given ASV was found, a 316 measure known as occupancy. Assuming that ASVs are sampled as a Poisson process, 317 the distribution of observed read counts can be analytically derived with a gamma AFD, 318 from which a prediction of the expected occupancy can be obtained (Eq. 7). We found 319 that the predictions of the SLM generally held across treatments, with slight deviations 320 at high values of observed occupancies for treatments without migration and with 321 regional migration (Fig. S1a). This trend was reflected in the distribution of relative 322 errors of our prediction, where certain treatments appeared to have higher errors than 323 others (Fig. S1b). However, a paired inspection using ASVs that were present in both 324 treatments that underwent migration as well as the no migration control found that the 325 error did not considerably change due to migration (Fig. S1c). A permutation-based 326 test for the average change in the relative error of occupancy between treatments 327 revealed a significant, if slight, effect where migration reduced the error of our 328 predictions for both regional (∆ε regional = −0.236, P = 0.00140) and global migration 329 (∆ε global = −0.196, P = 0.0104). Regardless, we were able to predict the occupancy of 330 the typical ASV across different demographic treatments with high accuracy.

331
Leveraging this result, we investigated the relationship between the mean relative 332 abundance across replicates and the occupancy of an ASV, more commonly known as 333 the abundance-occupancy relationship [12,62,63]. We found that all treatments tended 334 to follow the same curve, which could be captured by binning and plotting the predicted 335 occupancy for a given empirical mean abundance (Fig. 2b). The existence of this 336 relationship in experimental communities is particularly striking, as it implies that the 337 probability of observing a given community member is primarily determined by 338 sampling effort, despite differences in experimental details. across-site extension of the observation that distribution of abundances within a single 343 microbial community can be captured by a lognormal [10]. While this approach is 344 difficult since the typical richness of a given treatment is ≈ 10, by pooling ASVs across 345 treatments and transfers we found that the empirical MAD could be captured by a 346 lognormal (Fig. 2c). A comparison of the two free parameters of the lognormal between 347 transfers 12 and 18 indicated that MADs approached a similar shape after the cessation 348 of migration manipulations (Fig. 2d). This result suggests that the MAD as a  Testing the macroecological effects of migration 356 We have interpreted the results of our macroecological analyses in the broadest possible 357 sense to convey that demographic manipulations of experimental communities are 358 unlikely to induce qualitative changes to macroecological patterns (e.g., removing 359

10/32
Taylor's Law). Most microbial communities are similar and many of their features can 360 be captured by minimal mathematical models [19,29]. Instead, we propose that the 361 utility of macroecology is in its role as a quantitative mediator between empirical 362 observation and predictions obtained from mathematical models. Given that our two 363 migration treatments can only be paired with a control that lacks migration, we omitted 364 the high inoculation no migration treatment from the remainder of our study. By 365 examining the quantitative deviations noted above we can identify how 366 experiment-specific forms of migration can be incorporated into the SLM as well as the 367 appropriate variables necessary to identify the quantitative outcomes of experimental 368 treatments.

369
To start, it is clear from the previous section that the SLM is a useful tool and that 370 a form of the SLM that incorporates migration would be appropriate. While a constant 371 rate of migration can be readily incorporated into the SLM and a stationary solution of 372 the AFD can be derived (S1 Text), this model does not reflect the details of the 373 experiment. Rather, migration occurs in this experiment only at the start of a given 374 transfer cycle. This detail corresponds to a model where the effect of migration can be 375 captured as an experimentally-induced perturbation on the initial conditions of a 376 system, namely the initial abundance of an ASV at the kth transfer (n (k) i (0)). The full 377 time-dependent solution of the SLM which depends on initial conditions has previously 378 been solved [57], meaning that the temporal evolution of the AFD in response to 379 experimental perturbations and its subsequent relaxation to the stationary state of the 380 gamma distribution can be modeled (S2 Text). We found that this is the case, where 381 migration as a perturbation of initial conditions drastically altered the AFD relative to 382 the constant migration case, with its effects then rapidly dissipating (Fig. S2).

383
The fact that we observe quantitative differences in macroecological patterns 384 between different treatments as well as over time once migration manipulations have 385 ceased tells us that the communities did not relax to their steady-state by the end of a 386 given transfer cycle. This observation suggests that while the stationary solution of the 387 SLM is sufficient to characterize certain macroecological patterns, the time-dependent 388 solution of the SLM would likely be more appropriate to explain differences between 389 treatments. However, there are additional experimental details that one would like to 390 incorporate before predicting the effect of migration in order to strike a balance between 391 the realism and tractability of a model.

392
The first observation is the number of ASVs in a given community, i.e., its richness. 393 Through rarefaction curves and previously established richness estimation 394 procedures [64], we found that the richness of the progenitor community was ∼ 100 fold 395 greater than the richness of a typical assembled community (Fig. S3). Migration had 396 little effect on richness, as estimates are only slightly higher among communities that 397 experienced migration but pale in comparison to the richness of the progenitor. This 398 result suggests that the carrying capacity of a given ASV played a primary role in its 399 survival, raising the question of how to specify the K i of a community and how it 400 relates to that of the progenitor. The observation that Taylor's Law holds in the no 401 migration treatments implied that the coefficient of variation of growth fluctuations was 402 constant across ASVs (i.e., σ i = σ), meaning thatx i ∝ K i and, consequently, 403 K i ∼ Lognormal. This result allowed us to simulate carrying capacities as draws from a 404 lognormal distribution. Weak correlations between the mean relative abundance after 405 the cessation of migration (x i (t = 18)) and x i,progenitor for all treatments supports the 406 view that K i is independent of the abundances in the regional community (Fig. S4).

407
However this observation represents a form of survivorship bias, as we can only compare 408 values ofx i and x regional i for ASVs that were present in both the regional and 409 descendant communities. We found that distributions of x i,progenitor differ depending on 410 whether a given ASV was present in descendant communities, with the distribution for 411

11/32
ASVs that were present having shifted to the right (Fig. S5a). A permutational 412 Kolmogorov-Smirnov test found that this shift was significant, a result that holds for all 413 treatments (Fig. S6). This conditional dependence between the probability that a 414 carrying capacity is non-zero and the relative abundance of an ASV in the progenitor 415 was captured by logistic regression (Fig. S5b). This statistical dependence, along with 416 the details regarding depth of sampling (i.e., number of reads, Fig. S7), were 417 incorporated into our numerical simulations of the SLM (Materials and Methods).

418
Experiment-agnostic macroecological patterns 419 We first examined the quantitative effect of migration on macroecological patterns that 420 have been previously unified by the SLM. Namely, we investigated the effect of 421 migration on the AFD and Taylor's Law. We found that log 10 -transformed AFDs 422 rescaled by their mean and variance only seem to differ between transfers 12 and 18 423 when migration is present (Fig. 4a-c). A KS test supports this observation, where parameters (Materials and Methods). The Euclidean distance was 0.003, 9 · 10 −5 , and 431 0.02 for no, regional, and global migration treatments respectively. We then used the 432 optimal set of parameters to generate null distributions of KS from 10 3 simulations.

433
While the distance between AFDs for the no migration treatment was too small to be 434 significant, its value lied outside the bulk of the simulated distribution (Fig. 4d). 435 Regarding migration, the effect of regional migration was reasonably captured by the 436 simulation (Fig. 4e) though the same could not be said for the global treatment (Fig.   437 4f). Before a conclusion can be reached it is worth evaluating whether the AFD is an 438 appropriate measure for assessing the impact of migration. By repeating our 439 simulations across a grid of τ, σ combinations we found that KS only changed in a 440 systematic manner across parameter regimes for the regional migration treatment (Fig. 441  S8). This result suggests that the inability of the SLM to capture the difference in 442 AFDs for the no and global migration treatments was likely driven by the uninformative 443 nature of the AFD for assessing the impact of migration. 444 We then turned towards Taylor's Law to determine whether the slope of the 445 relationship changed after the cessation of migration (Fig. 5a-f). We found that the 446 slope under regional migration was lower than that of the no migration at transfer 12 447 (Fig. 5b), but approached the result of the no migration treatment by transfer 18 (Fig. 448  5e). Similarly, the intercept of regional migration treatment was initially lower but then 449 approached the value of the no migration treatment at treatment 18, as seen by the 450 range of values on the y-axis. As a contrast, we observed little change in the values of 451 the slope or intercept for the global migration treatment. In line with our predictions, 452 there was no significant change in the slope (t slope = −1.12, P = 0.227) or the intercept 453 (t intercept = −0.605, P = 0.583) between transfers 12 and 18 for the communities that 454 did not undergo migration. We found that global migration failed to alter the slope 455 (t slope = −0.390, P = 0.692) or the intercept (t intercept = −0.380, P = 0.734). 456 Contrastingly, regional migration significantly altered both the slope (t slope = 3.34, 457 P = 0.0178) and intercept (t intercept = 2.75, P = 0.0271).

458
By repeating the same ABC procedure outlined for the AFD we found that the 459 distribution of slopes generated for an optimal set of parameters could capture the 460 observed values of t slope across all migration treatments (Fig. 5d-f). However, by 461 examining t slope and t intercept across a grid of parameter combinations we found that 462 12/32 the statistic was again only informative for the regional migration treatment (Figs. 463 S9,S10). These results demonstrate that the AFD and Taylor's Law are primarily 464 informative of the effects of regional (island-mainland) migration. patterns. Assuming that the effect of migration was sufficient to alter the initial relative 472 abundance of an ASV at the start of a given transfer, we would expect regional 473 migration to primarily alter the mean abundance, whereas global migration would 474 primarily alter fluctuations in ASV abundance.

475
Regional migration 476 We first considered regional migration and the macroecological patterns it might alter. 477 Under the framework of the SLM, the effect of migration on the initial conditions 478 should be detected if the timescale of growth was large enough such that the final state 479 of the community was not dependent on its initial conditions.

480
To test this prediction, we examined the paired MADs for the regional and no 481 migration treatments before and after the cessation of migration. We found that the 482 correlation between MADs was initially weak and non-significant at transfer 12 but then 483 considerably increased and was significant by transfer 18 (Fig. 6a,b), a result that is 484 consistent with the interpretation that ASVs revert to their carrying capacity once 485 migration has ceased. Using a permutation-based form of Fisher's Z-test for the 486 difference in correlation coefficients, we found that the increase in MAD correlation was 487 significant (Z ρ = 2.67, P = 0.0103) [59]. Our SLM simulations support this conclusion, 488 as we were able to recapture the observed value for the optimal parameter combination 489 of σ and τ from ABC (Fig. 6c) and over a range of parameter combinations (Fig.   490 S11a,b).

491
Our correlation analyses demonstrated how regional migration altered the MAD, but 492 it remained necessary to examine how the effect of migration depended on ASV 493 abundances in the progenitor community. To examine this dependence, we calculated 494 the ratio of the mean abundance of an ASV in the regional and no migration treatments 495 xi regional xi no mig and plotted its dependence on the its abundance in the progenitor community. 496 We found that at transfer 12 there was a strong relationship between the two quantities 497 that was statistically significant via permutation. By transfer 18 the slope has 498 dissipated and is no longer significant (Fig. 6d,e). We found that the change in slopes is 499 significant using a permutation-based t-test (t = −2.64, P = 0.0255). Similar to Z ρ , 500 these t statistics can be reproduced using a model of the SLM in certain σ, τ parameter 501 regimes, which overlap with those found to reproduce observed estimates of Z ρ (Fig. 6f) 502 Similar to what we observed with Z ρ , we found that successful predictions of σ, τ were 503 restricted to certain parameter regimes (Fig. S11c,d).

504
Global migration 505 We examined relationships that are analogous to those examined in the regional 506 migration case to evaluate the impact of global migration. First, under the framework 507 of the SLM we predicted that global migration as a form of fully-connected migration in 508 a metacommunity would strictly alter the fluctuations in ASV abundance across 509 13/32 replicate communities while leaving the MAD unchanged. In terms of measurables, we 510 predicted that the correlation in the MAD would remain unchanged between transfers 511 12 and 18, whereas the correlation in ASV CVs would increase after the cessation of the 512 migration treatment. We first found that within both transfers 12 and 18 for both the 513 MAD and distribution CVs that ASVs were significantly correlated (Fig. S12). By 514 repeating the permutation-based Z-test, we found that there was no significant change 515 in the correlation of the MAD between transfers 12 and 18, consistent with our 516 predictions (Z ρ = 0.203, P = 0.420). However, there was also no significant increase in 517 the strength of correlation for the distribution of CVs (Z ρ = 0.289, P = 0.621), meaning 518 that the cessation of migration did not considerably alter fluctuations in abundance.

519
It is possible that measurements taken at two sole timepoints is insufficient to detect 520 the effect of global migration. Unlike regional migration, here we are primarily 521 interested the effect that a treatment has in the fluctuation around a typical value, 522 rather than the typical value itself. Situations like these where one is interested in the 523 macroecology of fluctuations rather than typical values such as the mean often require 524 additional observations. As a solution, we leveraged the higher temporal sampling 525 resolution of the global migration treatment. Unlike regional migration communities, 526 several global migration communities were sequenced at each of the 18 transfers, 527 providing an opportunity to examine the fluctuation in abundance between subsequent 528 timepoints ∆ = log x(t+1) . As an analogous measure to those considered in Fig. S12, 529 we calculated the ensemble mean and CV of ∆ for each ASV at each timepoint and 530 examined how the distribution of each measure changed before and after the cessation 531 of migration manipulations. On first examination, we found that ∆ tended to relax 532 towards a value of zero around the sixth transfer and remained there throughout the 533 rest of the experiment (Fig. S14). This pattern indicated that after six transfers the 534 abundances of ASVs after 48 hr. growth within a given transfer cycle have reached 535 stationarity with respect to the initial conditions of the experiment, allowing us to 536 examine equal intervals [7,12] and [13,18] as the period of time before and after the 537 cessation of migration, respectively. We note that this pattern does not mean that the 538 effect of the initial conditions at the start of a given transfer cycle are no longer relevant. 539 We used permutational KS tests to determine whether distributions of ∆ varied 540 before and after the cessation of migration while controlling for ASV identity. There is 541 no evidence that ∆ changed in the communities without any migration 542 (KS ∆ = 0.101, P = 0.862) or within the global migration treatment (KS ∆ = 0.215, 543 P = 0.177) with simulations returning similar estimates (Fig. S14). This result was 544 consistent with our prediction that global migration would not affect the mean. Turning 545 to the question of fluctuations, we examined how CV ∆ changed with respect to time. 546 As predicted, the distribution of CV ∆ did not change between time windows for the no 547 migration treatment (KS CV ∆ = 0.1, P = 0.789; Fig. 7a). Furthermore, we observed 548 that CV ∆ tended to increase after the cessation of migration, generating a significant 549 difference between distributions (KS CV ∆ = 0.285, P = 0.0120; Fig. 7b). This result 550 was consistent with our predictions regarding the effect of global migration on ensemble 551 fluctuations. However, our ABC SLM simulations were generally unable to reproduce 552 values of KS CV ∆ that are consistent with what was observed (Fig. 7c,d). This result 553 was consistent with our search for regions of parameter space that generated successful 554 predictions (Fig. S13). 555 At face value, the inability of the SLM to reproduce the changes we observe in the 556 CV for the global migration treatment is concerning. The size of the inoculum presents 557 a reasonable explanation, as the global migration inoculum was nearly two orders of 558 magnitude smaller than that of regional migration (Table 1). However, this 559 experimental detail does not explain why we observed a higher CV after the cessation of 560 migration. An experimental detail can, again, provide us with insight: the presence of 561 14/32 multiple attractors in the no migration treatment (i.e., alternative stable-states).

562
Replicate communities that did not undergo migration tended to assemble such that 563 their composition was dominated by one of two bacterial families: Alcaligenaceae or 564 Pseudomonadaceae. Contrastingly, communities that were subjected to the global 565 migration treatment remained consistently Alcaligenaceae-dominated over the course of 566 the experiment. While a modified form of the SLM has been previously demonstrated 567 to be capable of explaining the existence of alternative stable-states within a single 568 community over an extended period of time, it is difficult to apply the same approach 569 to a system with 18 timepoints, which effectively reduces to six for global migration if 570 one is primarily concerned with the period before and after the cessation of the 571 migration treatment [33].

572
While the existence of attractors confounds the appropriateness of the SLM as a 573 model capable of describing macroecological properties across experimental communities 574 in the absence of additional information, it does not restrict its ability to capture 575 temporal properties within a community. Given this argument, we turn from estimates 576 of CV ∆ calculated over the ensemble at a given time t to those calculated over time for 577 a given replicate community. We again split the time-series into observations taken 578 before and after the cessation of migration, obtaining two estimates of the CV for each 579 ASV in each replicate community, CV < ∆ and CV > ∆ , from which we calculated an F 580 statistic describing the change in the CV [60]. By plotting the observed distributions 581 alongside nulls obtained by permuting transfer labels, we found that the distributions of 582 F generally overlaped with the null (Fig. S15).

583
While the observed distributions appear similar to the null, we performed a 584 statistical analysis. We ran a one-tailed t-test on the distributions of F where the null 585 was obtained by permuting treatment labels. We performed this test for all ASVs that 586 were present in at least three replicate communities in both the global and no migration 587 treatments. We found that the F -statistics in the global migration treatment were not 588 significantly greater than those in the control, meaning that the CV with respect to 589 time did not considerably increase after the cessation of migration (Table S2). We effects. We examined the effects of two different forms of migration: regional and 619 global [44]. As expected, regional migration altered macroecological patterns of typical 620 relative abundance in a manner that was captured by the SLM using minimal free 621 parameters. The results of our regional migration analyses demonstrate that modeling 622 frameworks can be employed to link experimental microbial communities and 623 macroecological patterns.  [72][73][74][75]. Extending microbial macroecology beyond patterns of abundance to 661 the level of function embodies the original physiological and energetic breadth that 662 allowed macroecology to advance our understanding of macroorganisms [4]. 663

16/32
Finally, it is worth taking a step back to consider how the results presented here 664 shape the microbial view of macroecology. The discipline of macroecology was originally 665 conceived as an explicitly non-experimental form of investigation [4]. Analysis of the 666 origin and development of macroecology provides two historical explanations for the 667 initial rejection of experimental approaches: 1) large-scale community-level experiments 668 are often impractical and 2) producing generalities from experiments has proven to be 669 difficult [76]. Our results demonstrate that these two constraints are ameliorated in the 670 study of microbial communities, running contrary to recent claims that statistical 671 distributions are uninformative of mechanism [77]. We, and others [78], propose that 672 the timescales, abundance, and comparative ease with which ensembles of communities 673 can be maintained and manipulated make microorganisms an ideal system for testing 674 quantitative macroecological predictions.  Figure 1. Emergent variation in experimental communities. Variation in abundance consistently arises across treatments and timescales in experimental communities. This variation can be captured by the relationship between the mean and variance of relative abundance, a pattern known as Taylor's Law. Figure 2. Macroecological laws hold in experimental communities. Empirical macroecological laws that were previously identified in natural microbial systems consistently arise in experimental communities [29]. a) The Abundance Fluctuation Distribution (AFD) tends to follow a gamma distribution across treatments. b) A gamma distribution that explicitly considers sampling successfully predicts the fraction of communities where an ASV is present (i.e., its occupancy). Measures of occupancy are primarily determined by the mean abundance of an ASV, implying that the presence or absence of an ASV in a given community with knownx i is primarily determined by sampling depth. c) The distributions of mean abundances are similar across treatments and largely follow a lognormal distribution and d) after the cessation of migration manipulations the two free parameters of the lognormal converged across treatments. We model the ecological dynamics within a given transfer cycle using the Stochastic Logistic Model of growth (Eq. 14). The random sampling of cells at the start of a transfer cycle does not alter the average relative abundance. However, migration alters the abundance of an ASV at the start of a transfer cycle as a perturbation (Eq. 17). This experimental detail reduces the relative abundance of a given ASV if the relative abundance in a migration inoculum was lower than the carrying capacity (K i ), where the ASV proceeds to increase abundance at a rate set by τ . Figure 4. Migration impacts the shape of AFDs. a-c) By rescaling and merging log 10 transformed AFDs of all ASVs for a given treatment we can examine how the shape of the AFD changes before and after the cessation of migration using a KS test. d-f) Using optimal parameter combinations identified by ABC, we found that the SLM predicted reasonable KS statistics (KS) for regional migration, but not for the global and no migration treatments.

28/32
Figure 5. Regional migration impacts the slope of Taylor's Law. a-f) By examining the slope of Taylor's Law for each treatment we found that the slope only considerably changed after the cessation of migration for the regional migration treatment. g-i) Using our ABC estimation procedure, the SLM succeeds in predicting this increase in the slope for regional migration. Figure 6. SLM simulations reproduce regional migration patterns. The properties of the MAD were examined to evaluate the impact of migration over time. a) At transfer 12, the last transfer with migration, we found no apparent correlation between the MADs of the regional and no migration treatments. b) Contrastingly, the strength of the correlation rapidly increased by transfer 18. The significance of this difference can be evaluated by calculating Fisher's Z-statistic, a statistic that can be applied to correlations calculated from simulated data. c) By performing simulations over a range of σ and τ , we can visualize how the observed value of Z ρ compares to simulated results and identify reasonable parameter regimes d) as well as the relative error of our predictions. e,f) The ratio of the MAD between regional and no migration provides a single axis that can then be compared to the abundance of an ASV in the progenitor community. We found that there was a positive significant slope at transfer 12 that dissipated by transfer 18, reflecting the cessation of migration. g,h) Analogous heatmaps of the t-test for the difference in slopes can be constructed for simulated t statistics and their relative error. Figure 7. SLM simulations are unable to reproduce global migration patterns. By leveraging the entire timeseries we can examine how the coefficient of variation of ∆ changes after the cessation of migration at transfer 12 using a KS test (KS). a) As expected, there is no change in the distribution of CV ∆ for the no migration treatment. b) However, the typical CV ∆ tends to slightly increase after transfer 12 for global migration, a significant result. c) Our ABC estimation procedure for the SLM succeeds for the no migration treatment, but is unable to capture observed values of KS under global migration.

31/32
Tables 994 Quantity Symbol Value Total abundance, transfer N * (T ) ∼ 10 8 Total # of migrants, regional N regional 7.92 · 10 6 Total # of migrants, global N global # reps. · (8 · 10 −5 ) · D transfer · N * ≈ 3.07 · 10 4 Dilution rate, transfer D transfer 4µL 500µL = 0.008 Dilution rate, global D global 504µL 60,000µL ≈ 0.0084 Dilution rate, regional D regional 504µL 60,000µL ≈ 0.0084 Table 1. Experimentally imposed parameters that were used in this study [44]. Estimates of total community size were previously obtained via CFU counts. The calculation of N global is the result the procedure used to pool samples from replicate communities into a single global pool at the end of each transfer, which was then used to inoculate communities at the start of the subsequent transfer.