## Abstract

Historically, ecology has benefited by characterizing statistical patterns of biodiversity within and across communities. This approach, known as macroecology, has achieved considerable success in microbial ecology in recent years, having identified universal patterns of diversity and abundance that can be captured by effective models that do not require interactions between community members. Experimentation has simultaneously played a crucial role in the field’s development, as the manipulation of high-replication time-series has revealed novel forces that govern community dynamics. However, there remains a gap between microbial experiments performed in the laboratory and macroecological patterns documented in natural systems. Here, we work to bridge the gap between the experimental manipulation of communities and their macroecological effects. Using high-replication time-series of experimental microbial communities, we demonstrate that macroecological laws observed in nature can be readily recapitulated in a laboratory setting and unified under the Stochastic Logistic Model of growth (SLM). We find that demographic manipulations and their effect on community-level variation can alter empirical patterns in a manner that diverges from our predictions, though the predictive capacity of the SLM can be restored by incorporating explicit experimental details. Finally, we demonstrate the extent that experimental manipulations are capable of altering macroecological patterns under the SLM, establishing a demarcation between macroecological effects we can and cannot observe in a laboratory setting.

## Introduction

Microbial communities inhabit virtually every environment on Earth. Through their ubiquity, abundance, and diversity, microorganisms regulate the biogeochemical processes that sustain vast amounts of life. Furthermore, microbial communities play a crucial role in maintaining the health of many macroscopic forms of life, including humans, who have harnessed microbial communities to promote their own well-being [1, 2]. Given their environmental, medical, and economic importance, it is necessary to develop a quantitative theory of ecology that allows researchers to explain, maintain, and alter the properties of microbial communities. A challenge of this scale is daunting, and the complexity of microbial communities has promoted the engagement of researchers from various disciplines with distinct approaches. Of these approaches, there are two that have substantially contributed towards our quantitative understanding of microbial communities: macroecology and experimental ecology.

Historically, the field of ecology has achieved considerable success by characterizing patterns of diversity and abundance among ecological communities, an approach known as macroecology [3–8]. The macroecological approach is, fundamentally, statistical in nature, allowing for quantitative predictions to be made about the typical features of ecological communities without having to specify microscopic ecological forces. The unencumbered nature of this approach has allowed researchers to successfully characterize a diverse array of microbial ecological patterns [9–17] and spurred the development of mathematical models of microbial ecology grounded in statistical physics [18–26]. Through this approach, disparate patterns were recently unified by the observation that the typical microbial community follows three macroecological laws: 1) the abundance of a given community member across sites follows a gamma distribution, 2) the mean abundance of a given community member is not independent of its variance (i.e., Taylor’s Law [27, 28]), and 3) the mean abundance of a community member across sites follows a lognormal distribution [29]. These three universal laws can be captured by an intuitive mathematical model of density dependent growth with environmental noise, the stochastic logistic model (SLM) of growth [29–31]. Building on its utility, the SLM has been successfully extended to quantitatively capture additional empirical microbial macroecological patterns. Examples that explicitly use the SLM include attempts to capture measures of ecological distances between communities [32], alternative stable-states [33], patterns of richness and diversity at coarse-grained taxonomic and phylogenetic scales [34], and dynamics within and across human hosts at the sub-species level (i.e., strains) [17, 35]. The results of these studies demonstrate that a minimal mathematical model of ecological dynamics can capture a broad assemblage of microbial macroecological patterns.

Alongside the study of macroecological patterns in observational data, experimentation has played a crucial role in the documentation and manipulation of patterns [36, 37]. The advent of 16S rRNA amplicon sequencing permits current researchers to experimentally investigate the ecological dynamics of large microbial communities in a laboratory setting. Through this controlled approach, researchers have begun to determine the extent to which the assembly of microbial communities is reproducible and predictable [38–43]. Surprisingly, purportedly simple environments have the capacity to form highly dissimilar microbial communities, even in systems harboring a lone exchangeable resource [44]. Results such as this have led researchers to investigate the degree to which the state of a community is susceptible to experimentally imposed ecological forces. A prominent example is the migration of individuals between communities, an ecological force that is amenable to experimental manipulation [45, 46] and can alter the heterogeneity of communities across replicates [44, 47, 48]. The effect of this consistent heterogeneity on macroscopic properties of microbial communities, as well as the overall macroecology of experimental communities, remains to be determined.

Macroecology and experimental ecology as distinct approaches have revealed substantial insight into the fundamental nature of microbial communities, though they are not without their own drawbacks. It is often difficult to identify microscopic mechanistic causes for the macroecological patterns we frequently observe [49]. Contrastingly, experimental ecology can inordinately focus on system-specific details and observations that may not be generalizable, a hindrance towards constructing an overall theory of microbial ecology [50, 51]. Providing mechanistic explanations of universal empirical macroecological patterns alongside their characterization and prediction requires both approaches and is arguably a central goal of microbial ecology. Furthermore, there remains the need to develop and incorporate ecological theory to the study of variation in microbial community diversity and assembly as well as the extent that extrinsic and intrinsic ecological forces contribute to said variation. By applying and extending macroecological theory to highly replicated temporally resolved experiments, we can begin to approach a quantitative theory of microbial ecology.

Here, we attempted to bridge the gap between the experimental manipulation of ecological forces and their resulting macroecological effects. Using high-replication time-series of experimental microbial communities, we first demonstrated that the macroecological laws observed in nature can be recapitulated in a laboratory setting. By connecting these patterns to the predictions of the SLM, we identified a reasonable null model of microbial macroecology that can be used to predict treatment-specific effects. Specifically, we focused on migration treatments that correspond to mainland-island and fully-connected metacommunity models [52, 53]. We examined the ways in which the ecological force of migration can be manipulated, the resulting macroecological outcomes, and to what degree these outcomes can be captured by the SLM. Using these results, we identified when and how the SLM succeeded at predicting the effects of experimental manipulation and where it fell short. By leveraging high-throughput ecological experiments alongside robust statistical patterns, we can strengthen the predictive and quantitative elements of microbial macroecological theory.

## Materials and methods

### Experimental data

Experimental data was obtained from a previous study where a large number of replicate ecological communities originating from a single progenitor soil sample were propagated in controlled laboratory conditions under regional or global migration regimes [44]. A given replicate community was initiated by inoculating 4 *µ*L (low inoculum) or 40 *µ*L (high inoculum) of the source community into 500 *µ*L of M9 minimal media with 0.2% glucose into a well of a 96 deep-well plate (VWR) at 30°C under static conditions. Transfers were performed every 48 hours, 18 times in total for all replicate communities with a dilution rate *D*_{transfer} = 0.0084 (Table 1).

The details of the transfer procedure varied to manipulate the effects of different forms of migration. For the regional migration treatment, a 4 *µ*L aliquot of the soil supernatant was added to the 4 *µ*L aliquot from the previous transfer. For the global migration treatment, 4 *µ*L aliquots of all replicate populations were pooled, vortexed, and diluted 10,000-fold. A 4 *µ*L aliquot of the diluted solution was added to the 4 *µ*L aliquot from the previous transfer for each replicate community. At the end of each 48 hr. transfer period samples from each replicate community were mixed with 40% glycerol and cryopreserved at -80°C.

DNA extraction was performed using a QIAGEN DNeasy 96 Blood and Tissue kit. Library preparation for 16S rRNA amplicon sequencing of the V4 region was performed as previously described [41] and PCR products were purified and normalized using the SequalPrep PCR kit (Invitrogen). Sequencing was performed on an Illumina MiSeq (2×250 bp paired-end) and raw reads were processed for demultiplexing and barcode, index, and primer removal using QIIME v1.9 [54]. The number of communities sequenced at the beginning and end of migration manipulations are as follows: 20 and 92 communities at transfers 12 and 18 for the no migration treatment, 93 for both transfers 12 and 18 for global migration, 92 for both transfers 12 and 18 for regional migration, and four and 93 communities for the no migration with high initial inoculation for transfers 12 and 18. The entire 18 transfer time series was sequenced for 20 communities in the no migration treatment and for eight communities in the regional migration treatment. Additional details can be found in the original manuscript [44].

In this study we reprocessed all raw FASTQ data from the original study to obtain Amplicon Sequence Variants (ASVs) using DADA2 [55]. We processed our data using the pooled inference option so that ASVs with an abundance of one in a given site (i.e., singletons) could be inferred, allowing us to examine the entirety of the empirical sampling distribution. The attractor status of a given replicate community was assigned as previously described [44].

### Testing the macroecological predictions of the stochastic logistic model

The SLM is a stochastic differential equation that describes the logistic growth of a population with a time-dependent growth rate. In this context, the SLM for the relative abundance of the *i*th ASV (*x*_{i}) is defined as the following Langevin equation

$$\frac{dx_i}{dt} = \frac{x_i}{\tau_i}\left(1 - \frac{x_i}{K_i}\right) + \sqrt{\frac{\sigma_i}{\tau_i}}\, x_i\, \xi(t) \tag{1}$$

where 1/*τ*_{i}, *K*_{i}, and *σ*_{i} are the intrinsic growth rate, carrying capacity, and the coefficient of variation of growth rate fluctuations, and *ξ*(*t*) is Gaussian white noise. The SLM has recently been shown to explain three empirical macroecological laws that hold across dissimilar environments as follows: 1) predicting a gamma distributed abundance fluctuation distribution, 2) interpreting the power law relationship between the mean and the variance of abundance (i.e., Taylor’s Law) in terms of a community member’s carrying capacity, and 3) interpreting the lognormally distributed mean abundance distribution in terms of underlying carrying capacities [29]. To provide necessary context, we briefly summarize these three laws and their connection to the SLM.
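
To illustrate how a Langevin equation of this kind can be integrated numerically, the following minimal sketch applies the Euler-Maruyama scheme to a single ASV. It assumes the standard Itô form of the SLM, dx/dt = (x/τ)(1 − x/K) + sqrt(σ/τ) x ξ(t). The function name `simulate_slm`, the step size, and the positivity floor are illustrative assumptions and are not taken from the simulation code described in S3 Text.

```python
import math
import random


def simulate_slm(x0, tau, K, sigma, dt=0.01, n_steps=1000, rng=None):
    """Euler-Maruyama integration of the SLM for one ASV (Ito convention).

    x0    : initial relative abundance
    tau   : generation time (1/tau is the intrinsic growth rate)
    K     : carrying capacity
    sigma : strength of growth-rate fluctuations
    Returns the trajectory as a list of n_steps + 1 abundances.
    """
    rng = rng or random.Random()
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Deterministic logistic drift.
        drift = (x / tau) * (1.0 - x / K)
        # Multiplicative noise; the Wiener increment has variance dt.
        noise = math.sqrt(sigma / tau) * x * rng.gauss(0.0, math.sqrt(dt))
        # Floor at a tiny positive value (an assumption of this sketch)
        # so that a large negative increment cannot produce x <= 0.
        x = max(x + drift * dt + noise, 1e-12)
        path.append(x)
    return path
```

With `sigma = 0` the noise term vanishes and the trajectory reduces to deterministic logistic growth toward `K`, which provides a simple sanity check of the integrator.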

### The Abundance Fluctuation Distribution

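The form of the AFD predicted by the SLM can be derived from the stationary state of the Fokker-Planck equation corresponding to Eq. 1 (interpreted as an Itô SDE) [56]

$$\rho(x_i) = \frac{1}{\Gamma(2\sigma_i^{-1} - 1)} \left(\frac{2}{\sigma_i K_i}\right)^{2\sigma_i^{-1} - 1} x_i^{2\sigma_i^{-1} - 2} \exp\left(-\frac{2 x_i}{\sigma_i K_i}\right) \tag{2}$$

which is the predicted form of the abundance fluctuation distribution. This distribution is a gamma distribution with the following expected mean and squared coefficient of variation

$$\langle x_i \rangle = K_i\left(1 - \frac{\sigma_i}{2}\right), \qquad \mathrm{CV}_{x_i}^{2} = \frac{\sigma_i}{2 - \sigma_i} \tag{3}$$

Defining empirical estimates of *⟨x*_{i}*⟩* and of the squared coefficient of variation as *x̄*_{i} and *β*_{i}^{−1}, we obtain a form of the gamma that can be used to generate predictions with zero free parameters

$$\rho(x_i \mid \bar{x}_i, \beta_i) = \frac{1}{\Gamma(\beta_i)} \left(\frac{\beta_i}{\bar{x}_i}\right)^{\beta_i} x_i^{\beta_i - 1} \exp\left(-\frac{\beta_i x_i}{\bar{x}_i}\right) \tag{4}$$

By using the Poisson limit of a binomial sampling process, the sampling form of the AFD has been derived [29]. The derivation allows us to calculate the probability of obtaining *n* reads out of a total sampling depth of *N* for the *i*th ASV as

$$P(n \mid N, \bar{x}_i, \beta_i) = \frac{\Gamma(\beta_i + n)}{n!\,\Gamma(\beta_i)} \left(\frac{\bar{x}_i N}{\beta_i + \bar{x}_i N}\right)^{n} \left(\frac{\beta_i}{\beta_i + \bar{x}_i N}\right)^{\beta_i} \tag{5}$$

By setting *n* = 0 we obtain the expression for the probability of not detecting an ASV in a site

$$P(0 \mid N, \bar{x}_i, \beta_i) = \left(1 + \frac{\bar{x}_i N}{\beta_i}\right)^{-\beta_i} \tag{6}$$

from which we can derive the expectation of the fraction of sites where an ASV is found (i.e., its occupancy)

$$\langle o_i \rangle = 1 - \frac{1}{M} \sum_{m=1}^{M} P(0 \mid N_m, \bar{x}_i, \beta_i) \tag{7}$$

where *N*_{m} is the sampling depth of the *m*th of *M* replicate communities.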
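
The occupancy prediction can be sketched in a few lines. The example below assumes a gamma AFD with mean *x̄* and shape *β* (the inverse squared CV) combined with Poisson sampling, under which the probability of zero reads at depth *N* is (1 + *x̄N*/*β*)^{−*β*}. The function names `gamma_params`, `prob_absent`, and `predicted_occupancy` are hypothetical and serve only as an illustration.

```python
from statistics import mean, pvariance


def gamma_params(rel_abundances):
    """Moment estimates of the gamma AFD: (mean, beta = mean^2 / variance)."""
    m = mean(rel_abundances)
    return m, m * m / pvariance(rel_abundances)


def prob_absent(mean_abundance, beta, depth):
    """Probability of observing zero reads of an ASV at a given sampling depth."""
    return (1.0 + mean_abundance * depth / beta) ** (-beta)


def predicted_occupancy(mean_abundance, beta, depths):
    """Expected fraction of communities in which the ASV is detected.

    depths : total read counts, one entry per replicate community.
    """
    absent = [prob_absent(mean_abundance, beta, n) for n in depths]
    return 1.0 - sum(absent) / len(absent)
```

For large `beta` the gamma AFD becomes narrow and `prob_absent` approaches the pure Poisson value exp(−*x̄N*), so any excess of absences relative to that limit reflects abundance fluctuations across replicates.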

### Taylor’s Law

Taylor’s Law describes the relationship between the mean and variance of the relative abundance [27, 28]

$$\mathrm{Var}(x_i) = a \langle x_i \rangle^{b} \tag{8}$$

When *b* = 2 the empirical coefficient of variation is constant across ASVs, implying that *σ*_{i} = *σ* in the context of the SLM. To provide a point of comparison, we calculated the upper bound on the variance for a given mean relative abundance ⟨*x*⟩ using the Bhatia–Davis inequality [57].
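
As a concrete illustration, the exponent of Taylor’s Law can be estimated by ordinary least squares of log variance on log mean, and the Bhatia-Davis bound for a variable confined to [0, 1] is ⟨*x*⟩(1 − ⟨*x*⟩). The sketch below is an assumption-laden example, not the fitting procedure used in this study; `taylors_law_fit` and `bhatia_davis_bound` are hypothetical names.

```python
import math
from statistics import mean


def taylors_law_fit(means, variances):
    """Least-squares fit of log10(variance) on log10(mean).

    Returns (slope, intercept). With variance regressed on mean,
    a slope of two corresponds to a constant coefficient of variation.
    """
    lx = [math.log10(m) for m in means]
    ly = [math.log10(v) for v in variances]
    mx, my = mean(lx), mean(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx


def bhatia_davis_bound(mean_abundance, lower=0.0, upper=1.0):
    """Maximum variance of a bounded variable with the given mean."""
    return (upper - mean_abundance) * (mean_abundance - lower)
```

Because the bound shrinks to zero as the mean approaches one, ASVs that dominate a community must fall below any power law extrapolated from rarer ASVs.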

### The Lognormally distributed Mean Abundance Distribution

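We consider that the Mean Abundance Distribution (MAD) follows a lognormal distribution, so that the logarithm of the mean abundance, *η* = ln *x̄*, is normally distributed with mean *μ* and standard deviation *s*

$$P(\eta) = \frac{1}{\sqrt{2\pi s^{2}}} \exp\left(-\frac{(\eta - \mu)^{2}}{2 s^{2}}\right) \tag{9}$$

To account for finite sampling, we use a modified form of the lognormal that considers mean abundances with a coverage greater than *c*

$$P(\eta \mid c) = \sqrt{\frac{2}{\pi s^{2}}}\; \frac{\exp\left(-\frac{(\eta - \mu)^{2}}{2 s^{2}}\right)}{\mathrm{erfc}\left(\frac{c - \mu}{\sqrt{2 s^{2}}}\right)}\; \theta(\eta - c) \tag{10}$$

where *θ*(·) is the Heaviside step function. Parameters of this modified lognormal were fit to the empirical MAD as previously described (see Supplementary Note 7 in [29]).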
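
A truncated density of this kind can be written compactly. The sketch below assumes that the modification amounts to a normal distribution on the log mean abundance *η*, truncated below at a cutoff *c* and renormalized with the complementary error function; the name `truncated_log_mad_pdf` and the symbols `mu` and `s` are illustrative assumptions, and the fitting procedure itself (Supplementary Note 7 in [29]) is not reproduced.

```python
import math


def truncated_log_mad_pdf(eta, mu, s, c):
    """Density of the log mean abundance under a lognormal MAD truncated at c.

    eta : log mean abundance
    mu  : mean of the untruncated normal on the log scale
    s   : standard deviation of the untruncated normal
    c   : lower cutoff; the Heaviside factor gives zero density below it
    """
    if eta < c:
        return 0.0
    gaussian = math.exp(-((eta - mu) ** 2) / (2.0 * s * s)) / math.sqrt(2.0 * math.pi * s * s)
    # Probability mass of the untruncated normal that lies above the cutoff.
    mass_above = 0.5 * math.erfc((c - mu) / (math.sqrt(2.0) * s))
    return gaussian / mass_above
```

Dividing by the mass above the cutoff guarantees that the truncated density integrates to one, which is what allows *μ* and *s* to be estimated from the observed, incomplete set of ASVs.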

### Simulating the SLM with migration

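To account for the boom-bust cycle of microbial growth in a serial dilution experiment, we considered a piece-wise form of the SLM to describe the dynamics within the *k*th transfer

$$\frac{dx_i^{(k)}}{dt} = \frac{x_i^{(k)}}{\tau_i}\left(1 - \frac{x_i^{(k)}}{K_i}\right) + \sqrt{\frac{\sigma_i}{\tau_i}}\, x_i^{(k)}\, \xi(t) \tag{11}$$

where *ξ*(*t*) is Gaussian white noise and the dynamics occur over a period *t* ∈ [0, *T*]. A transfer then occurs and the dynamics start anew.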

The time-dependent probability distribution of Eq. 11 (i.e., *P*(*x*_{i}(*t*) | *x*_{i}(*t*_{0}), *t*_{0})) can be derived and could feasibly be extended to incorporate the effect of migration on initial conditions [58] (description of solution in S2 Text). However, there are additional experimental details and observations we would like to incorporate that are difficult to investigate analytically.

We begin by modeling the relative abundance at the start of a given transfer cycle. The abundances of ASVs at the start of the *k*th transfer cycle can be modeled as a multinomial sampling process

where abundances are drawn from the progenitor community for the first transfer cycle (*k* = 0) and from the end of the previous transfer cycle (*T* = 48 hr) for all subsequent transfer cycles (*k >* 0).

We then turn to modeling migration. Despite their different forms, both global and regional migration treatments are similar in that migrants are not supplied at a constant rate within a given transfer cycle (stationary distribution of the SLM with constant migration derived in S1 Text). Rather, migration is manipulated by adding an aliquot at the start of each of the first twelve transfer cycles. This experimental detail means that both regional and global migration treatments alter the initial abundances of community members, which then follow the SLM, so our task is to model a form of the SLM where the initial conditions at the start of each transfer cycle are determined by migration.

The regional migration treatment can be viewed as a form of island-mainland migration, where the ancestral community is propagated into the descendant communities during the first twelve transfers [52]. Contrastingly, global migration can be viewed as a fully interconnected metacommunity, where aliquots of each replicate are intermixed and equally distributed among replicates [53]. In mathematical terms, each form of migration can be formalized for the *i*th ASV at transfer *k* as
where abundances of migrants can again be modeled as a multinomial sampling process
where for *k >* 12.

We can then examine how the typical initial *relative* abundance of a given ASV depends on the dilution rate of transfers. We begin by noticing that the final total abundance of a community does not considerably vary from transfer to transfer (*N* ^{(k−1)}(*T*) ≈ *N* ^{(k)}(*T*) *≡ N* ^{∗}(*T*)). Using the above definitions and dividing ASV abundances by the total abundances to obtain relative abundances , we obtain
where parameters were set by the experimenter and detailed in Table 1.
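
The start-of-cycle bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions: residents and migrants are each drawn by multinomial sampling, the regional treatment uses the progenitor community as the migrant pool, and the global treatment uses the average of all replicates from the previous transfer. The function names and the cell counts `n_transfer` and `n_migrants` are hypothetical and do not correspond to the values in Table 1.

```python
import random
from collections import Counter


def multinomial_draw(n, probs, rng):
    """Draw n individuals from categories with the given probabilities."""
    if n == 0:
        return [0] * len(probs)
    counts = Counter(rng.choices(range(len(probs)), weights=probs, k=n))
    return [counts.get(i, 0) for i in range(len(probs))]


def pooled_abundances(replicates):
    """Migrant pool for global migration: average over replicate communities."""
    n = len(replicates)
    return [sum(col) / n for col in zip(*replicates)]


def start_of_transfer(prev_final, migrant_pool, n_transfer, n_migrants, rng):
    """Relative abundances at the start of a transfer cycle.

    prev_final   : relative abundances at the end of the previous cycle
    migrant_pool : relative abundances of the migrant source
    n_transfer   : number of resident cells carried over by dilution
    n_migrants   : number of migrant cells added (0 once migration ends)
    """
    residents = multinomial_draw(n_transfer, prev_final, rng)
    migrants = multinomial_draw(n_migrants, migrant_pool, rng)
    totals = [r + m for r, m in zip(residents, migrants)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]
```

The returned vector would serve as the initial condition for the within-cycle dynamics; setting `n_migrants` to zero recovers the no migration treatment and the transfers after migration has ceased.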

Beyond migration, there are dependencies between the final abundance of an ASV and its abundance in the progenitor community that we would like to incorporate. The set of *K*_{i} was drawn from the parameters obtained by fitting Eq. 10 to the empirical MAD. Dependence between the abundance of an ASV in the regional community (*x*_{i,regional}) and the MAD, which is proportional to *K*_{i}, was evaluated by performing logistic regression using `scikit-learn v0.22.1` [59]. From this relationship we obtained an estimate of Pr[*K*_{i} *>* 0 | *x*_{i,regional}], allowing us to define the distribution of carrying capacities as follows

The total number of ASVs with *K*_{i} *>* 0 was drawn from the distribution *S*_{descendant} ∼ Binomial(*S*_{regional}, 0.016), where *S*_{regional} is the number of ASVs in the regional community and 0.016 is the mean fraction of ASVs in descendant communities relative to the regional.

Given the amount of detail provided by the experiment, we instead elected to simulate the SLM (S3 Text). We used logarithmically spaced parameter values from the following ranges: *σ* ∈ [0.01, 1.9] and *τ* ∈ [1.7, 6.9]. The upper bound on the range of *σ* was set by the observation that ⟨*x*_{i}⟩ = 0 for *σ* ≥ 2. The range on *τ* was set by the observation that the serial dilution factor sets the total number of generations per transfer as log_{2}(1/*D*_{transfer}) ≈ 7, assuming exponential growth. This translates to a maximum generation time of *τ*_{max} = 48 hr/7 ≈ 6.9 hr. We identified a reasonable bound on the minimum generation time by noticing that prior research efforts have established that these communities have a maximum growth rate of ∼ 0.6 hr^{−1}, translating to a minimum generation time of *τ*_{min} *≡* 0.6^{−1} *≈* 1.7 hr [44].
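
The arithmetic behind these bounds can be checked directly. The short sketch below recomputes the number of generations per transfer from the dilution rate and builds logarithmically spaced grids over the stated ranges; the grid size of 10 and the helper `log_spaced` are illustrative assumptions, since the number of grid points is not specified here.

```python
import math

D_TRANSFER = 0.0084        # dilution rate per transfer (Table 1)
TRANSFER_HOURS = 48.0      # length of one transfer cycle
MAX_GROWTH_RATE = 0.6      # maximum growth rate in 1/hr [44]

# Doublings needed to regrow after a 1/D dilution, assuming exponential growth.
generations = math.log2(1.0 / D_TRANSFER)

# Slowest generation time compatible with full regrowth in one cycle.
tau_max = TRANSFER_HOURS / round(generations)

# Fastest generation time, taken as the inverse of the maximum growth rate.
tau_min = 1.0 / MAX_GROWTH_RATE


def log_spaced(lo, hi, n):
    """Return n logarithmically spaced values from lo to hi inclusive."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(n)]


sigma_grid = log_spaced(0.01, 1.9, 10)   # grid size is an assumption
tau_grid = log_spaced(1.7, 6.9, 10)
```

Logarithmic spacing concentrates grid points at small *σ*, where a fixed absolute change in the parameter corresponds to a larger relative change.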

### Statistical tests

We identified reasonable statistics to assess macroecological changes in our experimental communities. Standard tests were used when appropriate (*e*.*g*., change in slopes), but two less commonly used statistics were used to assess patterns relating to correlation coefficients and the CV of relative abundances. The change in correlation coefficients between transfers 12 and 18 was assessed using Fisher’s Z statistic [60]

$$Z = \frac{z_{18} - z_{12}}{\sqrt{\frac{1}{M_{18} - 3} + \frac{1}{M_{12} - 3}}}$$

where *z*_{t} = ½ ln[(1 + *ρ*_{t})/(1 − *ρ*_{t})] is the Fisher transformation of the correlation coefficient *ρ*_{t} at transfer *t*, *M*_{t} is the number of replicate communities at transfer *t*, and the denominator represents the standard error of the numerator.
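
For readers who wish to reproduce this type of comparison, the following sketch computes a Fisher Z statistic for two independent correlation coefficients. It assumes the standard form in which each coefficient is transformed with the inverse hyperbolic tangent and the standard error is sqrt(1/(*M*_{12} − 3) + 1/(*M*_{18} − 3)); the sign convention (transfer 18 minus transfer 12) and the function name are assumptions of this example.

```python
import math


def fisher_z_statistic(r12, m12, r18, m18):
    """Fisher Z statistic for the change in correlation between two transfers.

    r12, r18 : correlation coefficients at transfers 12 and 18
    m12, m18 : number of replicate communities at each transfer
    """
    # Fisher transformation: atanh(r) = 0.5 * ln((1 + r) / (1 - r)).
    z12 = math.atanh(r12)
    z18 = math.atanh(r18)
    # Standard error of the difference of two transformed correlations.
    standard_error = math.sqrt(1.0 / (m12 - 3) + 1.0 / (m18 - 3))
    return (z18 - z12) / standard_error
```

The transformation stabilizes the variance of the correlation coefficient, so the statistic is approximately standard normal when the two underlying correlations are equal.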

The difference in CVs between transfers 12 and 18 was calculated using the following previously described *F*_{CV} statistic [61]
where *M*_{t} represents the number of replicate communities at transfer *t*. Optimal combinations of [*τ, σ*] were identified using Approximate Bayesian Computation, where we selected the simulation iteration with the lowest Euclidean distance between the observed and simulated statistic. For analyses where a set of test statistics **c** were considered, we used a weighted measure of Euclidean distance where the simulated values of each test statistic were scaled by their standard deviation to prevent the distance from being dominated by a statistic with high variation [62]
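
The selection step can be sketched as a nearest-simulation search. The example below assumes that each simulation is summarized by a tuple of test statistics, that each statistic is scaled by its standard deviation across simulations, and that the parameter set with the smallest scaled Euclidean distance to the observation is retained. The names `scaled_distance` and `abc_best`, and the fallback to a unit scale for a constant statistic, are hypothetical choices for illustration.

```python
import math
from statistics import pstdev


def scaled_distance(observed, simulated, scales):
    """Euclidean distance with each statistic divided by its scale."""
    return math.sqrt(sum(((o - s) / sc) ** 2
                         for o, s, sc in zip(observed, simulated, scales)))


def abc_best(observed, simulations):
    """Return (params, distance) of the simulation closest to the observation.

    observed    : tuple of observed test statistics
    simulations : list of (params, stats) pairs, stats ordered as in observed
    """
    n_stats = len(observed)
    # Scale each statistic by its spread across simulations; fall back to
    # 1.0 if a statistic is constant (an assumption of this sketch).
    scales = [pstdev([stats[j] for _, stats in simulations]) or 1.0
              for j in range(n_stats)]
    best_params, best_stats = min(
        simulations,
        key=lambda pair: scaled_distance(observed, pair[1], scales))
    return best_params, scaled_distance(observed, best_stats, scales)
```

Without the scaling, a statistic measured on a large numerical scale would dominate the distance regardless of how informative it is about *τ* and *σ*.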

## Results

Using the results of a high-replication community assembly experiment, we examined the extent to which universal patterns of microbial macroecology from observational data could be recapitulated in a laboratory setting. Focusing on forms of migration corresponding to island-mainland and metacommunity dynamics, we examined the extent to which qualitatively similar macroecological patterns differed in their quantitative details. We then extended the SLM to incorporate experimental details without introducing additional free parameters, allowing us to balance the effectiveness of the SLM as a minimal model with experimental realism. Through simulation results, we identified whether the macroecological effects of a given form of migration could be captured by the SLM and established its limitations as a model of biodiversity.

### Macroecological patterns emerge in experimental systems

Predicting the macroecological effects of experimental manipulation is a vital goal of microbial ecology. With this goal in mind, a reasonable first step is to determine the degree that empirical macroecological patterns from observational data hold in an experimental system. The number of potential patterns one can examine is myriad, though three stand out: 1) the mean and variance of species abundances are not independent (Taylor’s Law), 2) the Abundance Fluctuation Distribution across independent sites (AFD) is gamma distributed, and 3) the Mean Abundance Distribution across independent sites (MAD) follows a lognormal distribution [29]. These three patterns can be unified through the lens of a mean-field model, the Stochastic Logistic Model of growth [29].

To examine whether these patterns occur in an experimental system, we elected to examine the microbial communities from a prior experiment where qualitatively different forms of migration were manipulated [44]. To briefly summarize, here a community grown from an environmental sample was divided among a large number (∼100) of identical environments containing a defined media with glucose as the sole source of carbon and transferred after 48 hr. of growth 18 times. Building off of this standard procedure, migration was incorporated by adding an additional aliquot at the start of each of the first 12 transfer cycles. Regional migration was manipulated by adding an aliquot from the progenitor community, a form of migration that makes current communities more similar to the original community by altering the typical abundance. Contrastingly, in the global migration treatment aliquots were obtained by merging samples of all replicate communities from the previous transfer, a homogenizing form of migration that reduces fluctuations around the typical abundance. We examined these three treatments plus an additional no migration treatment with large initial inoculum to determine whether demographic manipulations have the capacity to alter broad macroecological patterns.

We first assess the variation that emerged in experimental communities. The average relative abundance tends to vary over four orders of magnitude across treatments and different transfers, meaning that a considerable degree of community-level variation was maintained in the experimental communities. By examining the relationship between the mean and variance of relative abundance, we see that the two measures clearly follow a similar relationship on a log-log scale across treatments, suggesting that Taylor’s Law applies [27] (Fig. 1). Under Taylor’s Law a slope of two implies that the coefficient of variation (CV) remains constant across ASVs. By plotting a line with this slope, we see that it captures the data reasonably well. If we fit a slope to each treatment for each transfer we find that the mean slope is 2.08 ± 0.057, suggesting that despite the variation in typical abundance the CV of relative abundances remains roughly constant across ASVs.

While it appears many ASVs with mean abundances ∼ 1 tend to fall below the slope, we can explain this by noting that relative abundance is, by definition, a bounded variable. This constraint means that there is a mathematical upper bound on the variance for a given mean abundance, the value of which can be lower than the value predicted by the linear slope between the mean and variance on a log-log scale. This bound is known as the Bhatia–Davis inequality and by plotting it alongside our experimental observations we see that it slightly curves along with the data when *x*_{i} *∼* 1 [17, 57]. This agreement suggests that the deviation from Taylor’s Law at very high abundances is primarily due to mathematical constraints rather than the ecological consequences of maintaining communities in an experimental setting.

We then turn towards examining the full distribution of abundances across replicate communities, known as the Abundance Fluctuation Distribution (AFD). To facilitate comparison, we rescale the logarithm of the AFD for each ASV by its mean and variance (i.e., the standard score). We see that the rescaled AFDs tend to overlap across treatments, implying that despite differences in experimental details, the general shape of the AFD remains invariant. We find that the bulk of the distributions generally follow the gamma distribution predicted by the SLM (Eq. 2, Fig. 2a), though there remains a long tail which serves as motivation for comparing the AFDs of different treatments.

We then turn to the fraction of communities where a given ASV is found, a measure known as occupancy. Assuming that ASVs are sampled as a Poisson process, the distribution of observed read counts can be analytically derived with a gamma AFD, from which a prediction of the expected occupancy can be obtained with zero free parameters (Eq. 7). We find that the predictions of the SLM generally hold across treatments, with slight deviations at high values of observed occupancies for treatments without migration and with regional migration (Fig. S1a). This trend is reflected in the distribution of relative errors of our prediction, where certain treatments appear to have higher errors than others (Fig. S1b). However, a paired inspection using ASVs that were present both in treatments that underwent migration and in the control suggests that the error does not considerably change due to migration (Fig. S1c). A permutation-based test for the average change in the relative error of occupancy between treatments reveals a significant, if slight, effect where migration reduced the error of our predictions for both regional (*P* = 0.00140) and global migration (*P* = 0.0104). Regardless, we are able to predict the occupancy of the typical ASV across different demographic treatments with high accuracy. Leveraging this result, we investigated the relationship between the mean relative abundance across replicates and the occupancy of an ASV, more commonly known as the abundance-occupancy relationship [12, 63, 64]. We find that all treatments tend to follow the same curve, which can be generally captured by binning and plotting the predicted occupancy for a given empirical mean abundance (Fig. 2b). The existence of this relationship in experimental communities is particularly striking, as it implies that the probability of observing a given community member is primarily determined by one’s sampling effort, despite differences in experimental details.

Finally, we turn to the distribution of mean abundances itself. Prior work has demonstrated that the Mean Abundance Distribution (MAD) across sites can be captured by a lognormal distribution that accounts for sampling ([29], Eq. 10), an across-site extension of the observation that the distribution of abundances within a single microbial community can be captured by a lognormal [10]. While this approach is difficult since the global richness of a given treatment is ≈ 10, by pooling ASVs across treatments and timepoints we see that the empirical MAD can be roughly captured by a lognormal (Fig. 2c). A comparison of the two free parameters of the lognormal between transfers 12 and 18 indicates that MADs approach a similar shape after the cessation of migration manipulations (Fig. 2d). This result suggests that the MAD as a macroecological pattern consistently converges to a similar form over time. However, it should be noted that the lognormality of the MAD itself does not validate or invalidate the SLM. Rather, under the SLM the observation that the mean relative abundance and variance are not independent implies that the mean is proportional to the carrying capacity (⟨*x*_{i}⟩ ∝ *K*_{i}), so evidence of a lognormal MAD informs us of the distribution of parameters used in the SLM, not the SLM itself.

### Testing the macroecological effects of migration

We have purposefully interpreted the results of our macroecological analyses in the broadest possible sense in order to convey the reality that it is unlikely that experimental manipulations are capable of inducing large-scale qualitative changes to macroecological patterns. Most microbial communities are similar and many of their features can be captured by minimal models [19, 29]. Instead, we believe that the utility of macroecology is in its role as a quantitative mediator between empirical observation and theoretical prediction. Given that our two migration treatments can only be paired with the control lacking migration, we omit the high inoculation no migration treatment from the remainder of our study. By examining the quantitative deviations noted above we can identify how to incorporate the experiment-specific interpretation of migration as well as the appropriate variables necessary to identify the quantitative effects of experimental treatments.

To start, it is clear from the previous section that the SLM is a useful tool and that a form of the SLM that incorporates migration would be suitable. While a constant rate of migration can be readily incorporated into the SLM and a stationary solution derived (S1 Text), this model does not reflect the details of the experiment. Rather, migration occurs in this experiment only at the start of a given transfer cycle. This detail corresponds to a model where the effect of migration can be captured as an experimentally-induced perturbation on the initial conditions of a system. The full time-dependent solution of the SLM has previously been solved [58], meaning that the temporal evolution of the AFD in response to experimental perturbations can be quantitatively captured (S2 Text). We see that this is the case, where migration as a perturbation of initial conditions can drastically alter the AFD relative to the constant migration case, with its effects rapidly dissipating (Fig. S2).

The fact that we observe quantitative differences in macroecological patterns between different treatments as well as over time once migration manipulations have ceased tells us that the communities did not relax to their steady-state by the end of a given transfer cycle. This observation suggests that while the stationary solution of the SLM is sufficient to characterize certain macroecological patterns, the time-dependent solution of the SLM would likely be more appropriate to explain differences between treatments. However, there are additional experimental details that we would like to incorporate before predicting the effect of migration in order to strike a balance between realism and tractability. The first observation is the number of ASVs in a given community, i.e., its richness. Through rarefaction curves and previously established richness estimation procedures [65], we find that the richness of the progenitor community is ∼ 100 fold greater than the richness of a typical assembled community (Fig. S3).

We also see that migration had little effect, with richness estimates slightly higher among communities that experienced migration but paling in comparison to the richness of the progenitor. This result suggests that the carrying capacity of a given ASV plays a major role in its survival, raising the question of how to specify the *K*_{i} of an assembled community and how it relates to that of the progenitor. The observation that Taylor’s Law holds in the no migration treatments implies that the coefficient of variation of growth fluctuations is constant across ASVs (i.e., *σ*_{i} = *σ*), meaning that ⟨*x*_{i}⟩ ∝ *K*_{i} and, consequently, *K*_{i} *∼* Lognormal. This result allows us to simulate carrying capacities as draws from a lognormal distribution. Weak correlations between the mean abundance after the cessation of migration and the abundance in the regional community for all treatments support the assumption that *K*_{i} is independent of the abundances in the regional community (Fig. S4). However, this observation represents a form of survivorship bias, as we can only compare these two quantities for ASVs that were present in both the regional and descendant communities. We see that distributions of regional abundance differ depending on whether a given ASV was present in descendant communities, with the distribution for ASVs that were present having shifted to the right (Fig. S5a). A permutational Kolmogorov–Smirnov test demonstrates that this shift is significant, a result that holds for all treatments (Fig. S6). This conditional dependence can be captured by logistic regression (Fig. S5b), providing us with the probability of an ASV having a non-zero carrying capacity in the assembled community given its abundance in the progenitor:
where *a* is the intercept and *b* is the slope of the regression. Finally, there is the issue of sampling, both through the process of transferring communities via serial dilution and the finite number of reads obtained from each community. The latter is particularly relevant since our occupancy results suggest that sampling depth plays a major role in determining whether a given ASV is observed. To account for this effect, we drew from the empirical distribution of sampling depths (Fig. S7). The task of grafting these empirical considerations onto the time-dependent solution of the SLM is not straightforward. Therefore, the incorporation of experimental constraints requires a simulation procedure that captures relevant experimental details and empirical observations without introducing additional free parameters (Materials and methods). We obtain such a model by using an approximation of the numerical solution of the SLM within each transfer cycle, which incorporates all the above results while requiring only two global parameters: *σ* and *τ*, the values of the latter being constrained by the amount of growth that can possibly occur over 48 hr.
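
A minimal sketch of such a simulation is given below. It assumes an Euler-Maruyama discretization of the SLM within each transfer cycle, deterministic dilution between cycles, and multinomial sampling of reads. The function names (`slm_cycle`, `simulate_community`), the step size, the dilution factor, the lognormal parameters, and the fixed read depth are illustrative assumptions rather than the implementation or values used in this study. The logistic-regression filter on carrying capacities, the migration treatments, and the empirical distribution of sampling depths are omitted for brevity.

```python
import math
import random
from collections import Counter


def slm_cycle(x, K, sigma, tau, duration, dt, rng):
    """Advance abundances through one growth cycle (Euler-Maruyama sketch).

    Each step applies logistic growth (x / tau) * (1 - x / K) plus
    multiplicative noise with amplitude sqrt(sigma / tau).
    """
    noise_scale = math.sqrt(sigma / tau * dt)
    for _ in range(int(round(duration / dt))):
        x = [
            max(xi + (xi / tau) * (1.0 - xi / Ki) * dt
                + noise_scale * xi * rng.gauss(0.0, 1.0), 0.0)
            for xi, Ki in zip(x, K)
        ]
    return x


def simulate_community(n_asvs=20, n_transfers=18, sigma=0.5, tau=2.0,
                       duration=48.0, dt=0.05, dilution=1e-2,
                       depth=5000, seed=0):
    """Simulate one replicate community across serial transfers.

    All default values are hypothetical placeholders.
    """
    rng = random.Random(seed)
    # Carrying capacities drawn from a lognormal distribution (assumed scale).
    K = [rng.lognormvariate(0.0, 1.5) for _ in range(n_asvs)]
    x = [dilution * Ki for Ki in K]  # assumed initial inoculum
    reads = []
    for _ in range(n_transfers):
        x = slm_cycle(x, K, sigma, tau, duration, dt, rng)
        # Finite sequencing depth: sample reads in proportion to abundance.
        draws = rng.choices(range(n_asvs), weights=x, k=depth)
        counts = Counter(draws)
        reads.append([counts.get(i, 0) for i in range(n_asvs)])
        # Serial dilution into fresh medium (deterministic in this sketch).
        x = [dilution * xi for xi in x]
    return K, reads
```

Calling `simulate_community(seed=1)` returns the simulated carrying capacities and one vector of read counts per transfer, from which relative abundances and occupancy can be computed in the same way as for the empirical data.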

### Experiment-agnostic macroecological patterns

We begin by examining the quantitative effect of migration on macroecological patterns that have been previously unified by the SLM. Namely, we investigated the effect of migration on the AFD and Taylor’s Law. We found that log_{10}-transformed AFDs rescaled by their mean and variance only seem to differ between transfers 12 and 18 when migration is present (Fig. 4a-c). A KS test supports this observation, where significance was assessed by permuting the transfer label for a given ASV in a given replicate community.
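
The permutation scheme can be sketched as follows, assuming that each observation is a pair of rescaled log abundances for the same ASV in the same replicate community at transfers 12 and 18. The function names and the paired-swap implementation are illustrative assumptions; the two-sample statistic itself is the standard maximum distance between empirical cumulative distributions.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: max distance between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d


def paired_permutation_ks(pairs, n_perm=1000, seed=0):
    """Permutation test that swaps transfer labels within each pair.

    pairs: list of (value_at_transfer_12, value_at_transfer_18).
    Returns the observed KS statistic and a one-sided P-value.
    """
    rng = random.Random(seed)
    observed = ks_statistic([p[0] for p in pairs], [p[1] for p in pairs])
    exceed = 0
    for _ in range(n_perm):
        # Swap the two labels of each pair with probability 1/2.
        shuffled = [p if rng.random() < 0.5 else (p[1], p[0]) for p in pairs]
        null = ks_statistic([p[0] for p in shuffled],
                            [p[1] for p in shuffled])
        exceed += null >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

Swapping labels within a pair, rather than shuffling all observations, keeps the identity of each ASV and replicate fixed under the null.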

Determining whether the observed KS statistic could be explained by the SLM required our experiment-informed simulation. We generated a null distribution of 10^{4} values of KS with *τ* and *σ* drawn from a uniform distribution for each iteration. Using Approximate Bayesian Computation (ABC) we identified the optimal combination of parameters (Materials and methods). The Euclidean distance was 0.003, 9 × 10^{−5}, and 0.02 for the no, regional, and global migration treatments, respectively. We then used the optimal set of parameters to generate null distributions of KS from 10^{3} simulations. While the distance between AFDs for the no migration treatment was too small to be significant, its value lay outside the bulk of the simulated distribution (Fig. 4d). Regarding migration, the effect of regional migration was reasonably captured by the simulation (Fig. 4e), though the same could not be said for the global treatment (Fig. 4f). Before a conclusion can be reached, it is worth evaluating whether the AFD is an appropriate measure for assessing the impact of migration. By repeating our simulations across a grid of (*τ, σ*) combinations we find that KS only changed in a systematic manner across parameter regimes for the regional migration treatment (Fig. S8). This result suggests that the inability of the SLM to capture the difference in AFDs for the no and global migration treatments is likely driven by the uninformative nature of the AFD for assessing the impact of migration.
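
The parameter search can be illustrated with a simple rejection-style ABC sketch: parameters are drawn from uniform priors, summary statistics are simulated, and the draw with the smallest Euclidean distance to the observed statistics is retained. The names `abc_best` and `toy_simulator`, the prior bounds, and the toy simulator itself are hypothetical stand-ins; in practice the simulator would be the experiment-informed SLM simulation.

```python
import math
import random


def abc_best(observed, simulate, priors, n_draws=10_000, seed=0):
    """Return (distance, parameters) of the draw closest to the observation.

    observed: tuple of observed summary statistics.
    simulate: function (theta, rng) -> tuple of simulated statistics.
    priors:   dict mapping parameter name to (lower, upper) uniform bounds.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_draws):
        theta = {name: rng.uniform(lo, hi) for name, (lo, hi) in priors.items()}
        distance = math.dist(simulate(theta, rng), observed)
        if best is None or distance < best[0]:
            best = (distance, theta)
    return best


def toy_simulator(theta, rng):
    """Hypothetical stand-in for the SLM simulation (noisy identity map)."""
    return (theta["tau"] + rng.gauss(0.0, 0.01),
            theta["sigma"] + rng.gauss(0.0, 0.01))
```

For example, `abc_best((3.0, 0.5), toy_simulator, {"tau": (1.0, 5.0), "sigma": (0.01, 1.9)})` returns the smallest distance found and the corresponding values of *τ* and *σ*.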

We then turned towards Taylor’s Law to determine whether the slope of the relationship changed after the cessation of migration (Fig. 5a-f). As predicted, there was no significant change in the slope (*t*_{slope} = −1.12, *P* = 0.227) or the intercept (*t*_{intercept} = −0.605, *P* = 0.583) between transfers 12 and 18 for the communities that did not undergo migration. We found that global migration failed to alter the slope (*t*_{slope} = −0.390, *P* = 0.692) or the intercept (*t*_{intercept} = −0.380, *P* = 0.734). Contrastingly, regional migration significantly altered both the slope (*t*_{slope} = 3.34, *P* = 0.0178) and intercept (*t*_{intercept} = 2.75, *P* = 0.0271). By repeating the same ABC procedure outlined for the AFD we find that the distribution of slopes generated for an optimal set of parameters can capture the observed *t*_{slope} across all migration treatments (Fig. 5d-f). However, by examining *t*_{slope} and *t*_{intercept} across a grid of parameter combinations we see that the statistic is again only informative for the regional migration treatment (Figs. S9, S10). These results demonstrate that, of the two forms of migration, the AFD and Taylor’s Law are informative only of the effects of island-mainland migration.
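
As an illustration of the quantities being compared, the sketch below fits Taylor’s Law by ordinary least squares on log_{10}-transformed means and variances and computes a *t* statistic for the difference in slopes between two fits. The function names are hypothetical, and the simple standard-error-based statistic is an assumed form that is not necessarily the exact test used here.

```python
import math


def fit_line(x, y):
    """Ordinary least squares; returns (slope, intercept, slope std. error)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, math.sqrt(resid / (n - 2) / sxx)


def taylors_law(means, variances):
    """Fit log10(variance) = intercept + slope * log10(mean)."""
    return fit_line([math.log10(m) for m in means],
                    [math.log10(v) for v in variances])


def slope_t(fit_before, fit_after):
    """t statistic for the change in slope between two fits (assumed form)."""
    return (fit_after[0] - fit_before[0]) / math.sqrt(
        fit_before[2] ** 2 + fit_after[2] ** 2)
```

Under the stationary SLM with constant *σ*, the variance scales as the square of the mean with prefactor *σ*/(2 − *σ*), so a fit to data generated under that assumption should return a slope close to two.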

### Experiment-specific macroecological patterns

The macroecological patterns we have examined up to now have not considerably changed under migration. It is useful to instead consider macroecological patterns that may not have received notice in the past, but are likely altered by the presence of a specific form of migration. In this section, we consider how the experimental manipulations of each treatment should affect specific macroecological patterns. Assuming that the effect of migration was sufficient to alter the initial relative abundance of an ASV at the start of a given transfer, we would expect regional migration to primarily alter the mean abundance, whereas global migration would primarily alter the fluctuations in ASV abundance.

#### Regional migration

We first consider regional migration and the macroecological patterns it might alter. Under the framework of the SLM, the effect of migration on the initial conditions should be detectable if the timescale of growth is large enough that the final state of the community remains dependent on its initial conditions.

To test this prediction, we examined the paired MADs for the regional and no migration treatments before and after the cessation of migration. We find that the correlation between MADs is initially weak and non-significant at transfer 12 and is considerable and significant by transfer 18 (Fig. 6a-d), a result that is consistent with the interpretation that ASVs revert to their carrying capacity once migration has ceased. Using a permutation-based form of Fisher’s *Z*-test for the difference in correlation coefficients, we found that the increase in MAD correlation was significant (*Z*_{ρ} = 2.67, *P* = 0.0103) [60]. Our SLM simulations support this conclusion, as we were able to recapture the observed value for the optimal parameter combination of *σ* and *τ* from ABC (Fig. 6c) and over a range of parameter combinations (Fig. S11a,b).
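
The test statistic can be sketched as follows, assuming that each ASV contributes one pair of mean abundances (regional, no migration) at each of the two transfers and that the null is generated by swapping transfer labels within ASVs. The function names and this particular permutation scheme are illustrative assumptions; only the Fisher transformation itself is standard.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r1, n1, r2, n2):
    """Fisher Z statistic for the difference r2 - r1."""
    return (math.atanh(r2) - math.atanh(r1)) / math.sqrt(
        1 / (n1 - 3) + 1 / (n2 - 3))


def z_stat(before, after):
    """Z statistic from two lists of (treatment_a, treatment_b) pairs."""
    r1 = pearson([p[0] for p in before], [p[1] for p in before])
    r2 = pearson([p[0] for p in after], [p[1] for p in after])
    return fisher_z(r1, len(before), r2, len(after))


def permutation_z_test(before, after, n_perm=1000, seed=0):
    """One-sided permutation test for an increase in correlation."""
    rng = random.Random(seed)
    observed = z_stat(before, after)
    exceed = 0
    for _ in range(n_perm):
        b, a = [], []
        for pb, pa in zip(before, after):
            if rng.random() < 0.5:  # swap transfer labels for this ASV
                pb, pa = pa, pb
            b.append(pb)
            a.append(pa)
        exceed += z_stat(b, a) >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

The sketch requires more than three ASVs per transfer and correlations strictly between −1 and 1, since the Fisher transformation diverges at those limits.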

Our correlation analyses demonstrate how regional migration altered the MAD, but we have not examined how this effect depends on ASV abundances in the progenitor community. To examine this dependence, we calculated the ratio of the mean abundance of an ASV in the regional and no migration treatments and plotted its dependence on the ASV’s abundance in the progenitor community. We find that at transfer 12 there is a strong relationship between the two quantities with a significant slope that dissipates by transfer 18 (Fig. 6d,e). We find that the change in slopes is significant using a permutation-based *t*-test (*t* = −2.64, *P* = 0.0255). Similar to *Z*_{ρ}, these *t* statistics can be reproduced using a model of the SLM in certain *σ, τ* parameter regimes, which overlap with those found to reproduce observed estimates of *Z*_{ρ} (Fig. 6f). As with *Z*_{ρ}, successful predictions are restricted to certain regimes of *σ* and *τ* (Fig. S11c,d).

#### Global migration

We examined relationships that are analogous to those examined in the regional migration case to evaluate the impact of global migration. First, under the framework of the SLM we predict that global migration, as a form of fully-connected migration in a metacommunity, would strictly alter the fluctuations in ASV abundance across replicate communities while leaving the MAD unchanged. In terms of measurables, we predicted that the correlation in the MAD would remain unchanged between transfers 12 and 18, whereas the correlation in ASV CVs would increase after the cessation of the migration treatment. We first find that both the MAD and the distribution of CVs were significantly correlated at both transfers 12 and 18, with similar values at each transfer (Fig. S12). By repeating the permutation-based *Z*-test, we find that there is no significant change in the correlation of the MAD between transfers 12 and 18, consistent with our predictions (*Z*_{ρ} = 0.203, *P* = 0.420). However, there was also no significant increase in the strength of correlation for the distribution of CVs (*Z*_{ρ} = 0.289, *P* = 0.621), meaning that the cessation of migration did not considerably alter fluctuations in abundance.

It is possible that measurements taken at only two timepoints are insufficient to detect the effect of global migration. Unlike regional migration, here we are primarily interested in the effect that a treatment has on the fluctuations around a typical value, rather than the typical value itself. Situations like these, where one is interested in the macroecology of higher statistical moments such as the variance, can require additional observations. As a solution, we leveraged the higher temporal sampling resolution of the global migration treatment. Unlike regional migration communities, several global migration communities were sequenced at each of the 18 transfers, providing an opportunity to examine the fluctuation in abundance between subsequent timepoints, ∆*ℓ*. As an analogous measure to those considered in Fig. S12, we calculated the ensemble mean and CV of ∆*ℓ* for each ASV at each timepoint and examined how the distribution of each measure changed before and after the cessation of migration manipulations. On first examination, we see that ⟨∆*ℓ*⟩ tends to relax towards a value of zero around the sixth transfer, where it remains for the remainder of the experiment (Fig. S14). This pattern indicates that after six transfers the abundances of ASVs after 48 hr of growth within a given transfer cycle have reached stationarity with respect to the initial conditions of the experiment, allowing us to examine equal intervals [7, 12] and [13, 18] as the period of time before and after the cessation of migration, respectively. We note that this pattern does not mean that the initial conditions at the start of a given transfer cycle are no longer relevant.
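
The two ensemble measures can be sketched as below, under the assumption that ∆ℓ is the log-ratio of an ASV’s relative abundance between consecutive transfers and that the CV is the standard deviation across replicates divided by the absolute value of the mean. Both the definition and the function names are assumptions made for illustration.

```python
import math
import statistics


def delta_log(series):
    """Log-ratio of abundance between consecutive timepoints (assumed form)."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


def ensemble_mean_cv(replicates):
    """Ensemble mean and CV of the log-ratio at each timepoint for one ASV.

    replicates: list of abundance time series, one per replicate community,
    all of equal length and strictly positive.
    """
    deltas = [delta_log(series) for series in replicates]
    summary = []
    for t in range(len(deltas[0])):
        values = [d[t] for d in deltas]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        cv = sd / abs(mean) if mean != 0 else math.inf
        summary.append((mean, cv))
    return summary
```

Because the logarithm is undefined at zero, timepoints at which an ASV is absent from a replicate would need to be excluded before applying this sketch.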

We used permutational KS tests to determine whether distributions of ⟨∆*ℓ*⟩ varied before and after the cessation of migration while controlling for ASV identity. There is no evidence that ⟨∆*ℓ*⟩ changed in the communities without any migration (KS_{⟨∆ℓ⟩} = 0.101, *P* = 0.862) or within the global migration treatment (KS_{⟨∆ℓ⟩} = 0.215, *P* = 0.177), with simulations returning similar estimates (Fig. S14). This result is consistent with our prediction that global migration would not affect the mean. Turning to the question of fluctuations, we examined how CV_{∆ℓ} changed with respect to time. As predicted, the distribution of CV_{∆ℓ} did not change between time windows for the no migration treatment (*P* = 0.789; Fig. 7a). In contrast, in the global migration treatment CV_{∆ℓ} tended to increase after the cessation of migration, generating a significant difference between distributions (*P* = 0.0120; Fig. 7b). This result was consistent with our predictions regarding the effect of global migration on ensemble fluctuations. However, our ABC SLM simulations were generally unable to reproduce values of the KS statistic that are consistent with what was observed (Fig. 7c,d). This result was consistent with our search for regions of parameter space that generated successful predictions (Fig. S13).

At face value, the inability of the SLM to reproduce the change we observe in the CV for the global migration treatment is concerning. The size of the inoculum presents a reasonable explanation, as the global migration inoculum was nearly two orders of magnitude smaller than that of regional migration (Table 1). However, this experimental detail does not explain why we observed a higher CV after the cessation of migration. An experimental detail can, again, provide us with insight: the presence of multiple attractors in the no migration treatment (i.e., alternative stable-states).

Replicate communities that did not undergo migration tended to assemble such that their composition was dominated by one of two families: Alcaligenaceae or Pseudomonadaceae. Contrastingly, communities that were subjected to the global migration treatment remained consistently Alcaligenaceae-dominated over the course of the experiment. While a modified form of the SLM has been previously demonstrated to be capable of explaining the existence of alternative stable-states within a single community over an extended period of time, it is difficult to apply the same approach to a system with 18 timepoints, which effectively reduces to six for global migration if one is primarily concerned with the period before and after the cessation of the migration treatment [33].

While the existence of attractors confounds the appropriateness of the SLM as a model capable of describing macroecological properties across communities, it does not restrict its ability to capture temporal properties within a community. Given this argument, we turn from estimates of CV_{∆ℓ} calculated over the ensemble at a given time *t* to those calculated over time for a given replicate community. We again split the time-series into observations taken before and after the cessation of migration, obtaining two estimates of the CV for each ASV in each replicate community, from which we calculated an *F* statistic describing the change in the CV [61]. By plotting the observed distributions alongside nulls obtained by permuting transfer labels, we see that the distributions of *F* generally overlap with the null (Fig. S15).

While the observed distributions appear similar to the null, we performed a formal statistical test to confirm. We ran a one-tailed *t*-test on the distributions of *F*, where the null was obtained by permuting treatment labels. We performed this test for all ASVs that were present in at least three replicate communities in both the global and no migration treatments. We found that the *F*-statistics in the global migration treatment were not significantly greater than those in the control, meaning that the CV with respect to time did not considerably increase after the cessation of migration (Table S1). We repeated the test to control for attractor status in the no migration treatment, obtaining similar results (Table S1). The inability of the SLM with demographic details to reproduce the observed increase in the CV of abundance fluctuations suggests that the existence of alternative stable-states is the primary detail that determines the ecological effect of global migration.

## Discussion

The decisive result of our study is the demonstration that communities propagated in artificial environments harboring a single carbon source maintain the ecological variation necessary for macroecological investigations. When viewed as an ensemble, this variation provides the means to assess distributions of typical abundances across several orders of magnitude, a prerequisite for examining broad probabilistic patterns of diversity. The fact that this criterion was met allowed us to document the existence of macroecological patterns that were previously observed in naturally occurring communities, suggesting a quantitative equivalence between experimental and observational studies despite the controlled nature of artificially maintained communities and the presence of distinct demographic treatments. However, the repeatable maintenance of variation observed in the face of demographic manipulations was likely contingent on the high level of variation present in the progenitor community. This figurative raw material is analogous to the need for genetic variation to exist before selection can occur [66–68], the absence of which would preclude the possibility of macroecological investigations.

Characterizing robust empirical patterns in artificial communities is a key step toward identifying appropriate statistical null models in microbial ecology. The existence of patterns predicted by the SLM in artificial communities provided an opportunity to evaluate the macroecological consequences of experimental manipulations. Our approach of modifying the SLM, an empirically validated null model of microbial community composition, using experimental details provided a useful framework for identifying treatments that are capable of generating macroecological effects. We examined the effects of two different forms of migration: regional and global [44]. As expected, regional migration altered macroecological patterns of typical relative abundance in a manner that was captured by the SLM using minimal free parameters and ecological assumptions [52].

Contrastingly, we predicted that global migration would alter fluctuations around typical abundance [53]. We observed no change in the temporal variation in community member fluctuations after the cessation of migration within a given replicate community. However, we found that variation across the ensemble of communities tended to increase after the cessation of migration. This trend was consistent with our hypothesis regarding the effect of global migration, but one that could not be reproduced by the SLM using experimentally-set parameters. This inconsistency can be explained by the observation that the abundance of certain taxonomic families exhibited considerable heterogeneity across replicates that did not undergo migration, likely due to the presence of alternative stable-states [47, 69, 70]. Alternative stable-states cannot be captured by the SLM as-is without additional assumptions, namely on the form of a potential function that depends on a community member’s abundance [71]. Additional environmental details may be necessary to incorporate as well, as the fluctuations imposed in a batch culture design can induce alternative stable-states [72]. However, we are able to draw two key conclusions. First, the quantity of migrants chosen for the global migration treatment was sufficient to alter the attractor status, but insufficient to alter fluctuations in abundance within a given replicate. Second, macroecological patterns explained by the SLM persisted despite the pervasiveness of alternative stable-states, suggesting that forms of fine-scale heterogeneity cannot considerably alter certain macroecological patterns.

The results of our global migration analysis illustrate how one’s choice of model limits the set of empirical patterns that can be sufficiently captured. We have primarily focused on patterns relating to the typical abundance and fluctuations in abundance across communities as well as over time. Notably, we did not examine patterns that are likely to be overwhelmingly determined by interactions between community members (e.g., the correlations of abundance fluctuations), as the addition of interactions into the SLM would require several assumptions about the network of interactions and their magnitude. The consideration of microscopic dynamics such as resource consumption may be necessary in order to address experimental questions relating to interactions such as correlations in abundance [11]. Indeed, consideration of resource consumption has proven critical for investigating the evolutionary dynamics of microorganisms in an ecological context [20]. However, recent developments on the predictability of community function (e.g., total biomass, polysaccharide hydrolysis, resource excretion, etc.) point towards new avenues of exploration for microbial macroecology. There is increasing evidence that the functional profiles of experimental communities tend to follow quantitative rules that are amenable to mathematical modeling [73–76]. Extending microbial macroecology beyond patterns of abundance to the level of function embodies the original physiological and energetic breadth that allowed macroecology to advance our understanding of macroorganisms [4].

Finally, it is worth taking a step back to consider how the results presented here shape the microbial view of macroecology. The discipline of macroecology was originally conceived as an explicitly non-experimental form of investigation [4]. Analysis of the origin and development of macroecology provides two historical explanations for the initial rejection of experimental approaches: 1) large-scale community-level experiments are often impractical and 2) producing generalities from experiments has proven to be difficult [77]. Our results demonstrate that these two constraints are ameliorated in the study of microbial communities, running contrary to recent claims that statistical distributions are uninformative of mechanism [78]. We, and others [79], propose that the timescales, abundance, and comparative ease with which ensembles of communities can be maintained and manipulated make microorganisms an ideal system for testing quantitative macroecological predictions.

## Data and code availability

All code written for this study is available on GitHub under a GNU General Public License: https://github.com/wrshoemaker/experimental_macroecology.

## Author contributions

W.R.S., Á.S., and J.G. conceptualized the project, developed the mathematical models, and wrote the manuscript. W.R.S. performed all analyses.

## Acknowledgments

We thank S. Estrela for her assistance in reprocessing the data. We thank S. Bubnovich, M. Dal Bello, A. Goyal and members of the qEcoEvo group at ICTP for helpful discussions. This work was supported by the NSF Postdoctoral Research Fellowships in Biology Program under Grant No. 2010885 (W.R.S.). Á.S. acknowledges support from Grant PID2021-125478NA-I00 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”.