## Abstract

A central question in ecology is to understand the ecological processes that shape community structure. Niche-based theories have emphasized the important role played by competition in maintaining species diversity. Many of these insights have been derived using MacArthur’s consumer resource model (MCRM) or its generalizations. Most theoretical work on the MCRM has focused on small ecosystems with a few species and resources. However, theoretical insights derived from small ecosystems may not scale up to large ecosystems with many resources and species, because large systems with many interacting components often display new emergent behaviors that cannot be understood or deduced from analyzing smaller systems. To address this shortcoming, we develop a sophisticated statistical-physics-inspired cavity method to analyze the MCRM when both the number of species and the number of resources are large. We find that in this limit, species generically and consistently perturb their environments and significantly modify available ecological niches. We show how our cavity approach naturally generalizes niche theory to large ecosystems by accounting for the effect of this emergent environmental engineering on species invasion and ecological stability. Our work suggests that environmental engineering is a generic feature of large, natural ecosystems and must be taken into account when analyzing and interpreting community structure. It also highlights the important role that statistical-physics-inspired approaches can play in furthering our understanding of ecology.

## I. INTRODUCTION

One of the most stunning aspects of the natural world is the diversity of species present in most ecosystems. The community structure of ecosystems is shaped through a complex interplay of the externally supplied resources available in an ecosystem, competition for these resources, and stochasticity [1–4]. A fundamental problem in community ecology is to understand how these processes give rise to observed patterns of species abundances. A rich theoretical framework has been developed to address this problem. Niche-based theories have emphasized the role of competition for resources [2, 5–10], while neutral theory has highlighted the role of stochastic effects [4, 11–13], and several works have investigated the interplay between stochasticity and competition [14–18].

Many of these theoretical insights have been synthesized in what is commonly referred to as contemporary niche theory. Contemporary niche theory highlights the role played by equalizing mechanisms, processes that decrease fitness differences between organisms, and stabilizing mechanisms, processes that decrease competition for resources. This basic organizational schema has been successfully applied to understand community structure in a wide range of settings [1–3].

One of the simplest and most influential mathematical models for niche theory is MacArthur’s consumer resource model (MCRM) [2, 7, 8, 10]. Most analyses of the MCRM, including those that inform contemporary niche theory and modern coexistence theory, have focused on small ecosystems with a few species and a few resources [2, 7, 8, 10]. However, it is unclear to what extent the theoretical insights derived from ecosystems with just a few species can be scaled up to diverse, natural ecosystems. One of the defining features of large complex systems is that they often display new “emergent behaviors” that cannot be understood or deduced from analyzing small systems with just a few parts [19–22]. For this reason, it is essential to directly analyze large ecosystems with many resources and species and ask how they differ from the few-species ecosystems that have been analyzed previously. Recently, several works have suggested that large ecosystems can exhibit unexpected behaviors such as phase transitions, emergent community-level cohesion, and the analogues of critical points [15, 23–27]. This highlights the need for new theoretical frameworks for directly analyzing large, heterogeneous ecosystems.

One of the most successful and ubiquitous approaches for analyzing large systems in statistical physics is mean field theory. We emphasize that what is meant by a mean field theory in statistical physics is distinct from the way the term is commonly understood in ecology [28, 29]. Unlike most usages in ecology, mean field theories in physics account for not only the means of various quantities but also fluctuations around the mean. In this paper, whenever we use the term mean field theory, we will mean it in this broader statistical physics sense rather than the narrow usage common in ecology. Mean field models have a long history in statistical physics and have played a central role in the study of phase transitions and collective emergent behaviors in physical systems [30, 31]. Most mean field theories in physics focus on homogeneous systems with identical components and couplings. However, more sophisticated variants such as the cavity method can be used to analyze heterogeneous “disordered systems” [32]. Here, we develop a statistical-physics-inspired mean field theory, based on a generalization of the cavity method, and use it to analyze diverse ecosystems. In this paper, we will refer to this as the cavity theory (CT).

Our methods are inspired by and build upon recent work showing the connection between community ecology and the physics of disordered systems [15, 23, 24, 27, 33–38]. They are also closely related to the statistical mechanics of interacting socio-economic agents [39]. However, unlike these previous works, our analysis explicitly incorporates resource dynamics, including resource heterogeneity and depletion. This allows us to naturally connect our results to contemporary niche theory and modern coexistence theory. One of the most striking aspects of our analysis is that we find that environmental engineering is a generic feature of all diverse ecosystems [40]. In diverse ecosystems, organisms can and do significantly reshape their environments by changing resource abundances and, importantly, depleting resources. Moreover, we show that many of the central theoretical quantities in our CT have natural ecological interpretations that generalize many classical quantities and results of niche theory to large ecosystems and quantify the role of environmental engineering in shaping community structure.

## II. MACARTHUR CONSUMER RESOURCE MODEL

In this work, we will analyze one of the canonical and most influential models in community ecology: MacArthur’s Consumer Resource Model (MCRM) [7, 8]. The MCRM consists of *S* species or consumers with abundances *N*_{i} (*i* = 1…*S*) that can consume one of *M* substitutable resources with abundances *R*_{α} (*α* = 1…*M*). The consumer preferences of species *i* for resource *α* are encoded by an *S* × *M* matrix, *c*_{iα}.

In the MCRM, the growth rate *g*_{i}(**R**) of a species depends on the concentrations of all the resources. To model the growth rate, following MacArthur, we assume that each species *i* has some minimum maintenance cost, *m*_{i}, that it must meet. The growth rate, *g*_{i}(**R**), is proportional to the amount of resources consumed, weighted by a quality factor *w*_{α}, minus this maintenance cost:

$$
g_i(\mathbf{R}) = \sum_{\alpha} w_\alpha c_{i\alpha} R_\alpha - m_i. \tag{1}
$$

If *g*_{i} > 0, then this is also the growth rate of species *i*.

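The resources have their own internal dynamics which, following MacArthur, we assume can be modeled using logistic growth. Furthermore, when a resource is consumed, its abundance is reduced. These ecological dynamics are captured by the following coupled, nonlinear differential equations

$$
\frac{dN_i}{dt} = N_i \Big( \sum_{\alpha} w_\alpha c_{i\alpha} R_\alpha - m_i \Big), \qquad
\frac{dR_\alpha}{dt} = F_\alpha(R_\alpha) - \sum_{i} N_i c_{i\alpha} R_\alpha, \tag{2}
$$

where *F*_{α}(*R*_{α}) = *R*_{α}(*K*_{α} − *R*_{α}) describes the resource dynamics in the absence of consumption and *K*_{α} is the carrying capacity of each resource *α*. In our model, both the species and resource abundances *N*_{i} and *R*_{α} must be non-negative. For our analysis, it will be useful to define an “effective resource capacity”

$$
K_\alpha^{\mathrm{eff}} = K_\alpha - \sum_{i} N_i c_{i\alpha} \tag{3}
$$

that accounts for depletion of resources by consumers [23]. The MCRM can be rewritten in terms of this effective capacity as

$$
\frac{dN_i}{dt} = N_i\, g_i(\mathbf{R}), \qquad
\frac{dR_\alpha}{dt} = R_\alpha \big( K_\alpha^{\mathrm{eff}} - R_\alpha \big). \tag{4}
$$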

A crucial property of these equations is that resources can be completely depleted from the environment. This will play an important role in what follows. Finally, we emphasize that these equations are identical to those analyzed by MacArthur, Chesson, and others in deriving modern niche theory.
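To make the dynamics concrete, the following is a minimal sketch of how the model could be integrated numerically. It assumes the standard form of the MCRM, with species growth *N*_{i}(Σ_{α} *w*_{α}*c*_{iα}*R*_{α} − *m*_{i}) and resource dynamics *R*_{α}(*K*_{α} − *R*_{α}) − Σ_{i} *N*_{i}*c*_{iα}*R*_{α}. The function name `simulate_mcrm`, the forward-Euler scheme, the step size, and the two-species toy parameters are illustrative choices of ours and are not the settings used for the simulations reported in this paper.

```python
def simulate_mcrm(c, K, m, w, n0=0.1, dt=0.01, steps=20000):
    """Forward-Euler integration of the MCRM (illustrative sketch).

    c[i][a] : consumption preference of species i for resource a
    K[a]    : carrying capacity of resource a
    m[i]    : maintenance cost of species i
    w[a]    : quality factor of resource a
    Returns the species and resource abundances after `steps` Euler steps.
    """
    S, M = len(c), len(K)
    N = [n0] * S          # every species starts at a small abundance
    R = list(K)           # resources start at their carrying capacities
    for _ in range(steps):
        # growth rate of each species given the current resources
        g = [sum(w[a] * c[i][a] * R[a] for a in range(M)) - m[i]
             for i in range(S)]
        # effective capacity: carrying capacity minus total consumption pressure
        K_eff = [K[a] - sum(c[i][a] * N[i] for i in range(S))
                 for a in range(M)]
        # Euler update; abundances are clamped so they never become negative
        N = [max(0.0, N[i] + dt * N[i] * g[i]) for i in range(S)]
        R = [max(0.0, R[a] + dt * R[a] * (K_eff[a] - R[a])) for a in range(M)]
    return N, R


# Toy example (hypothetical parameters): two specialists, one resource each.
N_ss, R_ss = simulate_mcrm(
    c=[[1.0, 0.0], [0.0, 1.0]],
    K=[2.0, 2.0],
    m=[1.0, 1.0],
    w=[1.0, 1.0],
)
```

For this toy case the steady state can be checked by hand: setting each growth rate to zero gives *R*_{α} = 1, and setting the resource equation to zero then gives *N*_{i} = *K*_{α} − *R*_{α} = 1.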

## III. STATISTICAL MECHANICS APPROACH TO MACARTHUR’S CONSUMER RESOURCE MODEL

Previous approaches to analyzing the MCRM have largely been confined to small ecosystems with a few species and resources. Here, we consider the opposite limit of large, diverse ecosystems where both the number of species and the number of resources are large, *S*, *M* ≫ 1. In this limit, the number of parameters needed to define the ecosystem dynamics becomes extremely large. To overcome this problem, we follow a long tradition in theoretical ecology, pioneered by Robert May, of looking at the case where the parameters are drawn from a random distribution [41]. This allows us to ask questions about the behavior of a generic, diverse ecosystem.

We consider the case where all the consumption coefficients *c*_{iα}, resource carrying capacities *K*_{α}, and maintenance costs *m*_{i} are drawn from a random distribution. Our analytic calculations depend only on the means and variances of the probability distributions. Denoting the expectation value of a parameter *x* over a distribution by ⟨*x*⟩, we denote the means and variances of our parameters by: , , and . We can also define a parameter *γ* = *M/S* that measures the ratio of resources to species.

### A. Invasion, ecological stability, and self-consistency

One of the cornerstones of community ecology is the idea of invasion [6, 42, 43]. In our analysis, we will ask under what circumstances a new species can invade an ecosystem. Denote the growth rate of species *i* when it tries to invade the ecosystem We will call this the invasion growth rate. Since we are interested in statistical properties, we will be primarily concerned with the mean and variances of the invasion growth rate averaged over all species *i* in the regional species pool: and .

The key idea that we will exploit in our analysis is the observation that as *S* and *M* get large, both the invasion growth rates and the effective carrying capacities are sums of a large number of small terms. Each individual resource makes only a small contribution (of order 1/*M*) to the growth of any consumer, and every consumer makes an order 1/*S* contribution to the effective resource capacity. Thus, from the central limit theorem, the distribution of growth rates and the distribution of effective resources in the ecosystem can each be well approximated by a normal distribution. In the language of the cavity method of statistical physics, this corresponds to the replica symmetric solution. For future reference, denote the mean and variance of the effective carrying capacity by and (see Figure 1).

This suggests the following intuition for thinking about our ecosystem. Each species, *i*, has an invasion growth rate drawn from a normal distribution. In other words, we can think of , where *z*_{i} is a standard normal variable. Similarly, each resource has an effective carrying capacity that is also drawn from a normal distribution, with , with a standard, normal variable. In general, the means and variances depend on the abundances of all other species and resources. Our statistical-mechanics-inspired mean field approach exploits this observation to self-consistently solve for the means and variances of the invasion growth rate and effective resource carrying capacity. In the physics literature, this is known as cavity theory (CT). In general, this is a very subtle calculation but can be done using a generalized cavity equation (see below and the Appendix).

In order to derive the CT self-consistency equations, we consider a system with *S* species and *M* resources and ask what happens when we add an additional species and resource to the system. We denote the abundances of the additional species and resource by *N*_{0} and *R*_{0}, respectively. This two-step cavity, where both a resource and a species are removed, is similar to the procedure employed to analyze the Hopfield model and compressed sensing [44, 45] and is necessary to correctly capture subtle correlations between resource and species dynamics due to environmental engineering. This approach is intimately related to classic works by MacArthur and Levins that analyzed ecological dynamics by asking if a new species could invade an ecosystem [6]. Whereas their analysis was applicable to small ecosystems with a few species, our analysis is valid for large, diverse ecosystems.

Since the number of species and resources in the original ecosystem is large (*S*, *M* ≫ 1), the addition of the new resource and consumer represents a small perturbation of the original system. For this reason, it is useful to define two susceptibilities, *χ* and *ν*, that measure the sensitivity of an ecosystem to small perturbations. The resource susceptibility, *χ*, measures the average change in the mean resource abundance at steady-state if we slightly increase the supply of all the externally supplied resources. Denoting the steady-state value of a quantity *X* by , we can mathematically define *χ*, as

The average species-cost susceptibility, *ν*, measures the change in mean species abundances if we slightly decrease the minimum fitness cost (or equivalently increase the growth rate),

These susceptibilities characterize the sensitivity of an ecosystem to perturbations and can be directly measured in experiments.

In terms of these quantities, one can derive a simple expression for the steady-state abundances of newly added consumer and resource (see Appendix):
where, as above, *z*_{0} and are independent, unit normal variables. These equations have a beautiful and straightforward interpretation. A new species added to the system will have an invasion growth rate , which is normally distributed. If the growth rate is negative, it will not be able to invade the system and will go extinct. If its growth rate is positive when introduced in the ecosystem, then it survives with an abundance proportional to its invasion growth rate. We emphasize that this proportionality constant can differ significantly from what would be expected in a single-species ecosystem and depends on all the other resources and species present in the ecosystem through the susceptibility *χ* and the variance of the consumption coefficients. For this reason, the invasion growth rate of a species when it invades an ecosystem is positively correlated with its abundance. Similarly, the new resource is depleted if its effective carrying capacity is negative. Otherwise, the steady-state abundance of the new resource is proportional to its effective carrying capacity. These equations extend the arguments of MacArthur and Levins on the necessary conditions for invasibility to large ecosystems [6]. They also generalize results for species abundances derived in [27] using the Lotka-Volterra equations and the results in [24, 38, 39], which ignored resource depletion and resource fluctuations.
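The interpretation above can be sketched numerically. If the invasion growth rate is normally distributed, the probability that a species can invade is the probability that a Gaussian variable is positive, and the surviving abundances follow a Gaussian cut off at zero. In the sketch below, `g_mean` and `g_std` stand for the mean and standard deviation of the invasion growth rate, and `scale` is a hypothetical placeholder for the proportionality constant between growth rate and abundance; the function names and numerical values are ours and purely illustrative.

```python
import math
import random


def survival_probability(g_mean, g_std):
    """P(g_mean + g_std * z > 0) for a standard normal variable z."""
    return 0.5 * (1.0 + math.erf(g_mean / (g_std * math.sqrt(2.0))))


def sample_abundances(g_mean, g_std, scale, n, seed=0):
    """Draw n steady-state abundances under the Gaussian picture.

    A species with a negative invasion growth rate goes extinct (abundance 0);
    otherwise its abundance is its invasion growth rate divided by `scale`.
    `scale` is a stand-in for the proportionality constant (an assumption here).
    """
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(g_mean, g_std)) / scale for _ in range(n)]


# Illustrative values only.
abundances = sample_abundances(g_mean=0.5, g_std=1.0, scale=2.0, n=100_000)
empirical_fraction = sum(1 for x in abundances if x > 0.0) / len(abundances)
analytic_fraction = survival_probability(0.5, 1.0)
```

The empirical fraction of non-zero abundances should agree with the analytic survival probability up to sampling noise.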

### B. Comparison with numerics

Unlike small ecosystems, we cannot analytically solve for all the resource and species abundances. However, we can take a statistical approach that allows us to calculate statistical properties of species and resource abundances at steady-state. We also restrict our analysis to uninvadable steady-states, defined as steady-states that cannot be invaded by any species. This both simplifies the mathematics and allows us to more directly relate our calculations to ecology.

Using (7), it is possible to derive self-consistency equations for the fraction of species in the regional species pool that survive, *ϕ*_{N}, the mean abundance of the species , and the variance and second moment of the surviving species abundances, 〈(*δN*)^{2}〉 and , respectively. We can also calculate the analogous quantities for resources: the fraction of resources with non-zero abundance, *ϕ*_{R}, the mean abundance of resources , and the variance and second moment of the resource abundances, , and . These equations are derived in Appendix C and can be solved numerically.

To check the accuracy of our CT, we compared our analytic predictions to numerical simulations (see Figure 2). We simulated (2) for two different choices of distributions for the *c*_{iα}. In the first set of simulations, the *c*_{iα} were binary random variables with *c*_{iα} = 1 with probability *p* and *c*_{iα} = 0 with probability 1 − *p*. The probability *p* can be viewed as the level of generalism in the regional species pool. As *p ⟶* 0, all organisms in the community are specialists and consume a handful of resources. When *p ⟶* 1, the community consists of generalists who can consume almost all resources. In the second set of simulations, we drew the consumption coefficients from a Gaussian distribution with the same mean and variance as the corresponding Bernoulli distribution with probability *p*.
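The two ensembles of consumption coefficients can be sketched as follows. A Bernoulli variable with probability *p* has mean *p* and variance *p*(1 − *p*), so the matched Gaussian is drawn with that mean and variance. The function names, the matrix size, and the seed below are our own illustrative choices and do not reproduce the settings of the simulations in Figure 2.

```python
import math
import random


def sample_binary(S, M, p, rng):
    """S x M matrix with entries equal to 1 with probability p, else 0."""
    return [[1.0 if rng.random() < p else 0.0 for _ in range(M)]
            for _ in range(S)]


def sample_matched_gaussian(S, M, p, rng):
    """S x M Gaussian matrix with the Bernoulli(p) mean p and variance p(1-p)."""
    sd = math.sqrt(p * (1.0 - p))
    return [[rng.gauss(p, sd) for _ in range(M)] for _ in range(S)]


def mean_and_variance(matrix):
    """Empirical mean and variance over all entries of a matrix."""
    flat = [x for row in matrix for x in row]
    mu = sum(flat) / len(flat)
    var = sum((x - mu) ** 2 for x in flat) / len(flat)
    return mu, var


rng = random.Random(1)          # fixed seed, for reproducibility only
p = 0.1
c_binary = sample_binary(100, 100, p, rng)
c_gauss = sample_matched_gaussian(100, 100, p, rng)
binary_stats = mean_and_variance(c_binary)
gauss_stats = mean_and_variance(c_gauss)
```

Both matrices share their first two moments, but only the Gaussian one contains negative entries.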

As shown in Fig. 2, our analytic results agree remarkably well with numerical simulations. The agreement between theory and numerics is nearly exact when the *c*_{iα} are drawn from a Gaussian and remains qualitatively good even when the consumption coefficients *c*_{iα} are binary random variables. This is a result of the Gaussianity assumptions used to derive the cavity equations (see Appendix). The discrepancy between the binary case and the Gaussian case stems from the fact that, for large *S* and *M*, the *c*_{iα} are non-negative in the binary case but generically contain some negative elements for Gaussian distributions. A negative *c*_{iα} implies that species *i* produces resource *α* at a fitness cost to itself. Thus, all simulations with Gaussian coefficients include a small fraction of public good producers that are accounted for in our theoretical calculations but are absent in the simulations with binary variables.

Despite these differences, for both choices of distribution the fraction of surviving species declines with increasing *p*. This is consistent with the basic idea of niche theory that as *p* increases, there is increased competition resulting in greater competitive exclusion. In contrast, the mean abundances of surviving species and resources show non-monotonic behavior as a function of *p* in both numerical simulations and analytics (see appendix and Fig. 5 for additional simulation results).

## IV. GENERALIZING NICHE THEORY TO LARGE ECOSYSTEMS

The MacArthur consumer resource model has played a central role in the development of niche-based theories of community assembly [2, 7–10]. However, most of these analyses have focused on small ecosystems with just a few species and resources. Here, we discuss the ecological implications of our analysis for understanding community assembly in large ecosystems with many species and resources.

### A. Relating MCRM parameters to ecology

We begin by relating the parameters of the MCRM to more ecologically meaningful quantities such as the niche overlap, fitness, zero net-growth isoclines (ZNGIs), and impact vectors. In ecology, the niche overlap, *ρ*, measures how much two species compete for the same resources. The larger the niche overlap, the more species compete. For small ecosystems, the niche overlap is bounded between 0 and 1, with a niche overlap of zero meaning the species do not compete for resources and a niche overlap of one indicating the species have identical consumption profiles. In the context of the two-species MacArthur resource model, the niche overlap between species can be thought of as the percentage of variance explained if one performs a regression of the first consumer’s consumption vector against the consumption vector of the second species [1, 7, 8]. Using this observation, we can naturally extend the idea of niche overlap to entire ecosystems by defining an ecosystem-level niche overlap *ρ* in terms of the mean and variances of the consumption coefficients *c*_{iα}:

One useful way of thinking about *ρ* is that it measures the niche-overlap between two species randomly drawn from the regional species pool. It is easy to see that When, all species have nearly identical consumption preferences and *ρ* 1. In contrast when species will have very distinct consumer preferences and *ρ →* 0.

Another fundamental quantity in contemporary niche theory is the ecological fitness of an organism, *f*_{i} = *g*_{i}(**K**). This fitness is the initial growth rate of organism *i* in the *absence* of other species, when every resource is at its carrying capacity. In general, the actual growth rate of a species will differ significantly from the fitness if the resource abundances differ significantly from the resource carrying capacities *K*_{α}. For this reason, we will refer to this as the “naive” fitness.
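As a small worked illustration, the naive fitness can be computed directly from the model parameters. The sketch below assumes that the naive fitness is the growth rate evaluated with every resource at its carrying capacity, that is Σ_{α} *w*_{α}*c*_{iα}*K*_{α} − *m*_{i}; the function name `naive_fitness` and the numerical values are hypothetical.

```python
def naive_fitness(c, K, m, w):
    """Growth rate of each species when all resources sit at carrying capacity.

    Assumes fitness_i = sum_a w[a] * c[i][a] * K[a] - m[i], i.e. the MCRM
    growth rate evaluated at R = K (no depletion by any consumer).
    """
    return [sum(w_a * c_ia * K_a for w_a, c_ia, K_a in zip(w, row, K)) - m_i
            for row, m_i in zip(c, m)]


# Hypothetical example: a specialist and a generalist on two resources.
fitness = naive_fitness(
    c=[[1.0, 0.0], [1.0, 1.0]],
    K=[2.0, 3.0],
    m=[1.0, 1.0],
    w=[1.0, 1.0],
)
# specialist: 1*2 - 1 = 1 ; generalist: 1*2 + 1*3 - 1 = 4
```

Because this quantity ignores depletion, a species with a large naive fitness is not guaranteed to survive once other consumers have reshaped the resource profile.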

We show in the appendix that it is also possible to relate our parameters directly to ZNGIs and generalized impact vectors.

### B. Niche overlap and coexistence

One of the fundamental results of niche-based theories is that as the niche overlap between species increases, coexistence becomes more and more difficult [1]. The underlying reason for this is that species with similar consumer preferences are more likely to compete with each other, resulting in competitive exclusion. Thus, increasing the niche overlap in the community should decrease the fraction of species *ϕ*_{N} that can coexist in a community. On the other hand, equalizing mechanisms that decrease the fitness differences between species should increase coexistence. We can parameterize the fitness differences in the community by the dimensionless quantity *σ*_{m}*/m*, equal to the standard deviation over the mean of the maintenance costs *m*_{i} over all species in the regional species pool. This choice of parameterization is in line with contemporary niche theory, where fitness differences are defined as the difference in growth rates when species have identical consumption preferences [1]. Figure 3 shows *ϕ*_{N} as a function of the niche overlap *ρ* and *σ*_{m}*/m*. This choice of niche overlap corresponds to varying the probability *p* for having a non-zero *c*_{iα} from 0.1 to 0.9 (see Fig. 2). As predicted by niche theories, increasing *ρ* leads to increased competition and a smaller *ϕ*_{N}. In contrast, decreasing *σ*_{m}*/m* at a fixed *ρ* leads to a larger fraction of species surviving. Thus, in this regard, large ecosystems behave quite similarly to predictions made by analyzing smaller models.

### C. Resource depletion and environmental engineering

One ubiquitous feature of our analysis that is often absent in smaller ecosystems is the large-scale depletion of resources. As shown in Fig. 2, species can significantly change the resource profile and deplete a large fraction of resources initially present in the environment. This environmental engineering can change which species survive and thrive in an environment. One way to measure the effect of environmental engineering and the reshaping of the resource profile is to measure the correlation between the naive fitness of an organism, *f*_{i}, and its steady-state abundance in the ecosystem. The fitness *f*_{i} measures the growth rate of organism *i* if it is introduced into an environment in the absence of other species. For this reason, we expect *f*_{i} to be highly predictive of the steady-state abundance when resource abundance profiles are not significantly perturbed by consumption. On the other hand, in the presence of significant environmental engineering, we expect the correlation between *f*_{i} and the steady-state abundance to decrease significantly.

Fig. 4 shows *f*_{i} versus the steady-state abundance for numerical simulations where the *c*_{iα} are drawn from a Bernoulli distribution with *p* = 0.1 and *σ*_{m}*/m* = 0.1, as well as the case where the parameters are Gaussian random variables with mean and variance matching the Bernoulli setting. From the figure, it is clear there is a significant correlation between *f*_{i} and the steady-state abundance. Organisms with higher fitness disproportionately survive in the ecosystem. However, a significant number of organisms that have a high naive fitness *f*_{i} can still go extinct in the ecosystem (black points). The difference between plots (A,B) and (C,D) is that in the former *K*_{α} and *m*_{i} are kept positive by ensuring they are drawn from a gamma distribution, and the consumer preferences *c*_{iα} are always non-negative since they are binary (1 with probability *p* or 0 otherwise). In (C,D), each of these parameters is drawn from a Gaussian distribution, but with the same mean and variance as in (A,B). This allows *K*_{α}, *m*_{i}, and *c*_{iα} to be negative. A negative *c*_{iα} means that species *i* produces resource *α* at a fitness expense to itself (i.e. public good production). As expected, this results in much more environmental engineering than the case where the *c*_{iα} are non-negative. In C, red points indicate species with negative fitness that can stably exist in the community by utilizing public goods. In D, red points correspond to resources with a negative capacity which end up in the environment due to public good production by high abundance species. Conversely, species that cannot survive in the environment in the absence of other species can fixate due to environmental engineering (red points). Importantly, this emergent environmental engineering is a collective property of the whole ecosystem and results from a complex interplay between organisms and environment. These simulations demonstrate how environmental engineering can dramatically modify community structure.

Additionally, Fig. 4 shows predictions from our CT for the correlation between *f*_{i} and the steady-state abundance. Within our replica-symmetric ansatz, these correlations are described by normal distributions whose variances and covariances can be calculated using our self-consistent equations. The contour lines represent half a standard deviation spread of our normal distribution. Our theory qualitatively captures the shape of the correlation between *f*_{i} and the steady-state abundance. We give explicit expressions for these correlations, as well as the mutual information between species abundances and naive fitness, in Appendix E.

## V. DISCUSSION

Niche-based theories have played a fundamental role in shaping our understanding of community assembly and community ecology. In this work, we use ideas and methods from statistical physics to analyze a canonical model in community ecology, MacArthur’s Consumer Resource Model (MCRM). Unlike previous works, our statistical-physics-inspired approach allows us to analyze large ecosystems with many species and resources. Our results suggest that organisms can significantly perturb their environments. The abundance of resources can be significantly altered, and resources can even be completely depleted. We find that such niche construction and environmental engineering are generic features of the MCRM. This suggests that in complex ecosystems, organisms actively construct their environment. To quote Levins and Lewontin, “they are not the passive objects of external forces, but creators and modulators of these forces” [22]. The effects of environmental engineering are even more dramatic when consumers can produce public goods at a fitness cost to themselves. In this case, species and resources that could not survive in isolation can fixate in the ecosystem.

To carry out our analysis, we developed a sophisticated theory based on the cavity method. One of the most striking things about our analysis is that many physical quantities that appear in the “cavity equations” have natural ecological interpretations in terms of invasion growth rates and effective carrying capacities. The underlying reason for this is that the cavity method is based on asking how ecosystems are perturbed when a new species and a new resource are introduced into the ecosystem. Conceptually, this is very similar in spirit to many classical arguments in community ecology pioneered by Levins and MacArthur that ask whether a new species can invade [2, 6]. This naturally allows us to generalize many of the results from niche-based theories to large, diverse ecosystems. However, the price we pay for using our cavity approach is that we are limited to making statistical predictions.

An important question for future investigation is to ask how our results change if we make the model more realistic. In the MCRM, all species are assumed to have a linear, Type I functional response. It will be interesting to generalize our model to non-linear functional responses. We have also neglected the effects of environmental and demographic stochasticity. Stochasticity can induce phase transitions in ecosystems from a niche-like phase, where competitive effects dominate community assembly, to an ecologically neutral-like phase, where stochasticity is the primary determinant of community structure [15, 23]. It will be interesting to see if the techniques developed here can be generalized to this more complicated setting. Finally, we have assumed that our population can be modeled as a well-mixed community. However, spatial effects can qualitatively change the behavior of cellular populations [46, 47] and are likely to play an important role in community assembly.

## VI. ACKNOWLEDGEMENTS

We would like to thank Josh Goldford, Kirill Korolev, Seppe Kuehn, Alvaro Sanchez, Daniel Segrè, and Cui Wenping for many useful discussions. PM was supported by NIH NIGMS grant 1R35GM119461, a Simons Investigator award in the Mathematical Modeling of Living Systems (MMLS), and a Scialog grant from the Simons Foundation and Research Corporation. MA was supported by the Swartz Program in Theoretical Neuroscience at Harvard.

## Appendix A: Basic Setup

We briefly summarize MacArthur’s classical consumer resource model (MCRM). Species *i* = 1 *… S* grows at a rate proportional to its utilization of resources, *R*_{α}, *α* = 1 *… M*, in the environment. This is described by the equation:
where *w*_{α} is the value of one unit of resource to a species (e.g. ATPs that can be extracted); *c*_{iα} is the rate at which species *i* consumes resource *α* and converts that into a “growth rate”; and *m*_{i} is the minimum amount of resources that must be consumed in order to have a positive growth rate. We have also added a small perturbation *h*_{i} to the system, in which we will later perform a linear expansion. The original MCRM corresponds to the choice *h*_{i} = 0. We define the growth rate to be

In the consumer resource model, resources satisfy their own dynamical equations:
where the first term (with *b*_{α} = 0) describes the resource dynamics in the absence of any species, the second term models the consumption of the resource by species in the environment, and *b*_{α} is a small perturbation. The original MCRM corresponds to the choice *b*_{α} = 0. Furthermore, define the effective carrying capacity

We will consider the case when the consumer preferences *c*_{iα} are random variables that can be characterized by their means and variances. In particular,

and

To derive the cavity equations, it is useful to define several other quantities. Let us define the fluctuating part of the consumer preferences *d*_{iα} as

Then, we have that and

We will also assume that the carrying capacities are drawn from a Gaussian distribution with and

Finally, we assume that the minimum survival coefficients are also drawn from a Gaussian distribution with

and

For future reference, it will also be helpful to define the ratio the average resource abundance, and the average species abundance

With these definitions, notice that we can rewrite (A1) as and rewrite (A3) as

We can define the mean growth rate of the population and the mean effective capacity of resources in the ecosystem to be as in the main text. In terms of these quantities, we can rewrite these equations as

The terms on the right-hand sides of the equations above have a natural interpretation as the “fluctuating parts” of the growth rate and effective carrying capacity. In particular, we have rewritten the growth rate for species *i* as the sum of the mean growth rate *g* and a fluctuating component *Δg*_{i} defined as

Similarly, we have split the effective carrying capacity of resource *α* into a mean *K*^{eff} and a fluctuating component defined as

## Appendix B: Deriving the Species and Resource Distributions

To derive the cavity equations, we will relate a system with *S* species and *M* resources to a new system where we add an additional resource *R*_{0} and an additional species *N*_{0}. Thus, the cavity equations relate an ecosystem with *S* + 1 species and *M* + 1 resources to an ecosystem with *S* species and *M* resources.

Then we can write equations for this new ecosystem (to leading order in *S*):
and

We can also write down the corresponding equations for the new resource and species:

and

We now focus on the steady state. Let us denote the steady-state value of a quantity *X* by . Then, we can define some susceptibilities that are extremely useful for what follows:
and

Now we are in a position to perform the cavity calculation. Let us denote the steady-state value of a quantity *X* in the absence of the new resource and species as . Then, since the addition of a resource and species represents a small perturbation (of order 1/*S*), we can write:
and

We can now plug these expressions into the steady-state equations for *N*_{0} and *R*_{0}. This gives:

If we now keep only the leading-order contributions in *S* and take expectation values, this expression reduces to

Notice that, to leading order in *S*, we can model the term , which is just the invasion growth rate minus the mean growth rate, as a Gaussian random field with mean 0 and variance
where

If we let *z*_{N} be a random field with mean zero and unit variance, and define the average susceptibility
then we can write the equation for *N*_{0} as

We can also derive a similar equation for *R*_{0}. This is given by

Using the same logic as above, to leading order we have where and with

We can solve these equations and get

Thus, the distributions for *N* and *R* are given by truncated Gaussians.
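To make the truncated-Gaussian statement concrete, the following Python sketch samples a variable of the form max(0, *μ* + *σz*), with *z* a unit Gaussian, and compares the nonzero fraction and the first two moments against their closed forms. The names `mu`, `sigma`, and both function names are placeholders introduced here for illustration; they stand in for the mean and width of the Gaussian before truncation and are not the paper's notation.

```python
import numpy as np
from scipy.stats import norm


def truncated_gaussian_moments(mu, sigma):
    """Closed-form statistics of X = max(0, mu + sigma * z) with z ~ N(0, 1)."""
    delta = mu / sigma
    frac_nonzero = norm.cdf(delta)  # P(X > 0)
    mean = sigma * (norm.pdf(delta) + delta * norm.cdf(delta))  # E[X]
    second = sigma**2 * ((1 + delta**2) * norm.cdf(delta) + delta * norm.pdf(delta))  # E[X^2]
    return frac_nonzero, mean, second


def sampled_moments(mu, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimates of the same three quantities."""
    rng = np.random.default_rng(seed)
    x = np.maximum(0.0, mu + sigma * rng.standard_normal(n_samples))
    return np.mean(x > 0), np.mean(x), np.mean(x**2)


# Hypothetical parameter values, chosen only for illustration.
exact = truncated_gaussian_moments(0.3, 1.0)
approx = sampled_moments(0.3, 1.0)
```

The nonzero fraction plays the role of the fraction of surviving species (or non-depleted resources), and the two moments correspond to the mean abundance and mean squared abundance.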

## Appendix C: Self Consistency Equations

Let us now write some self-consistency equations in the replica symmetric phase. Let us define the number of non-zero species and resources as *S** and *M** respectively.

Our goal is, given some parameters {*K, σ*_{K}*, m, σ*_{m}*, μ, σ, S, M*}, to find the values for {*ϕ*_{R}*, ϕ*_{N} *, ⟨ N ⟩, ⟨ R ⟩, q*_{R}*, q*_{N} *, χ, ν*}. Since there are eight unknowns we will need eight equations. It will also be useful to define:
and the function

First, let us write self-consistency equations for the susceptibilities. Taking derivatives with respect to *m* and *K* of (B21) and noting that the fraction of non-zero species and non-zero resources is *ϕ*_{N} and *ϕ*_{R} respectively gives

Notice now that if we define with *z* a Gaussian random variable we have that:

We can now use the fact that (B21) implies that the species and resource distributions are given by truncated Gaussians to write self-consistency equations for the fractions of nonzero resources and species, as well as the moments of their abundances:

Together (C4), (C7), and (C14) define the 10 self-consistency equations we need, along with the definitions (B11) and (B18).

We solve these mean-field equations numerically, using the sum of squared differences between the left and right sides of equations (C9-C14) as an energy function, which we minimize using the basinhopping optimization algorithm from scipy.optimize. The algorithm combines random perturbations, local minimization, and an accept-or-reject criterion to minimize a function that may be non-convex. The parameters we used were a temperature of 1, a step size of 0.5, and 5 iterations or initializations. Note that the equations can also be solved iteratively, but we found that the iterative solutions were stable for a smaller set of parameter values.
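The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the `residuals` function is a toy two-equation fixed-point system standing in for the actual self-consistency equations, and the starting point and local minimizer (`BFGS`) are choices made here. Only the `basinhopping` settings (temperature 1, step size 0.5, 5 iterations) are taken from the text.

```python
import numpy as np
from scipy.optimize import basinhopping


def residuals(x):
    """Left side minus right side of a toy fixed-point system.

    Placeholder for the self-consistency equations: a = 0.5*cos(b), b = 0.5*sin(a).
    """
    a, b = x
    return np.array([a - 0.5 * np.cos(b), b - 0.5 * np.sin(a)])


def energy(x):
    """Sum of squared residuals; it vanishes exactly at a self-consistent solution."""
    return float(np.sum(residuals(x) ** 2))


# Settings quoted in the text: temperature 1, step size 0.5, 5 iterations.
result = basinhopping(
    energy,
    np.array([1.0, 1.0]),  # hypothetical initial guess
    niter=5,
    T=1.0,
    stepsize=0.5,
    minimizer_kwargs={"method": "BFGS"},  # assumed local minimizer
)
solution, final_energy = result.x, result.fun
```

A final energy close to zero indicates that all equations are satisfied simultaneously; a value that stays well above zero would suggest either a poor starting point or that no solution exists for the chosen parameters.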

## Appendix D: Zero net-growth nullclines and generalized impact vectors

We can also easily relate our mean-field quantities to ZGNIs. Recall that ZGNIs delineate the range of resource conditions in which a species maintains a positive growth rate [9, 10]. Each species *i* defines a ZGNI in resource space as the set of resource conditions at which its growth rate vanishes. Geometrically, we can view the ZGNI as the hyperplane of points in resource space whose dot product with the consumption coefficients *c*_{iα} of species *i* equals *m*_{i} (see Eq. 1). If *m*_{i} *≪* 1, the ZGNI is well-approximated by the plane perpendicular to *c*_{iα}. We can calculate some statistical properties of these ZGNIs. Notice that the mean value of each component is just
and the expected value of the square is just

In Modern Niche Theory, another important quantity is the impact vector of a species *i*. The impact vector describes how resources are depleted by the addition of another individual. Here, we introduce the idea of generalized impact vectors that measure how the steady-state concentration of a resource *α* changes due to the introduction of a species *j*. Alternatively, we can consider a system without resource *β* and then ask how its addition changes the abundance of a species *i*. These define the generalized impact vectors (GIVs).

These are of course the leading-order contributions (in *S*) to the cavity equations under the replica symmetric assumption, namely (B8) and (B7). Thus, the components of the two “generalized” impact vectors are given by:
and

## Appendix E: Comparison of individual fitness to true growth rate and steady state abundance

We want to quantify how much the naive fitness (the growth rate of an organism without other species present) is correlated to the invasion growth rate, which in turn is closely related to the steady state abundance of each of the species in the community. From the definition of the invasion growth rate: and naive fitness: we can compute the level of correlation between the two using the CT. To begin, we compute the means of each of these distributions:

And

Note that the consumer preferences *c*_{iα} of an individual species are independent of the steady-state resource levels when that species *i* is not included in the community, as in the previous equation. We can also compute the correlation between fluctuations from the mean naive fitness and the mean invasion growth rate: if a species has a higher or lower individual fitness, how should we expect this to impact its growth rate in the community? To understand this correlation, we define *δf*_{i} = *f*_{i} − ⟨*f*_{i}⟩, together with the analogous fluctuation of the invasion growth rate, and compute:

In the large *S* limit, the important terms remaining in the average above are:
thus in the asymptotic limit we can write the correlation between the two forms of fitness as:

To compute the correlation between carrying capacity and resource level we modify (B21) from our cavity calculation, which yields such a relationship:

Using this relationship and letting *k* be drawn from the same distribution as *K*_{α}, where
we compute

The full form of this integral is thus:

By rewriting this as it may be simplified via integration by parts on the first term, yielding:

Where and

Note, we can also compute Pearson’s correlation coefficient for these two fitness metrics:

Using (C10) and (C12) we can write the preceding expression as:

### 1. Abundance vs naive fitness

Note that, since *N* is a scaled version of *g*^{inv} in which all negative values are truncated to zero (B21), we can compute the correlation between and *f*_{i}, where we let and *N*_{i} = 0 otherwise. To better understand the correlation between the abundance and fitness of a species, we compute the correlation between and *f*:
and

Using these covariances, along with the means:

And
we are able to generate theoretical predictions for the distribution of *f, N*. See Fig 4 where the theoretical plot of *f*, is compared with values of fitness *f* and abundance *N* for all species in a network over many realizations.

### 2. Resource capacity versus resource abundance

In the same Fig 4, we additionally plot a theoretical prediction overlaid with numerics of how the resource capacities *K*_{α} are related to the resource abundances *R*_{α}.

If we define which is equal to the prediction of *R*_{α} when the resource abundance is positive, and use the fact
then we may combine these two relations to yield:

We can thus compute the correlations:

Also the means are easy to compute:

This allows us to make theoretical predictions for how the resource abundances are correlated to resource capacities.

### 3. Mutual information between individual fitness and true growth rate

The mutual information between two Gaussian variables *x* and *z* is simply (note that the means of these random variables do not contribute, so we will assume they are zero-mean random variables):

Thus,

This gives us a theoretical prediction using the predicted form for the correlation coefficient (E15).
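For reference, the standard result for two jointly Gaussian variables with Pearson correlation coefficient *ρ* is *I* = −½ ln(1 − *ρ*²), measured in nats. The short Python sketch below evaluates this expression; the function name is a placeholder introduced here, and the value of *ρ* passed in would be the predicted correlation coefficient.

```python
import numpy as np


def gaussian_mutual_information(rho):
    """Mutual information (in nats) of two jointly Gaussian variables.

    rho is their Pearson correlation coefficient, with |rho| < 1.
    """
    rho = np.asarray(rho, dtype=float)
    # log1p(-rho**2) = log(1 - rho**2), evaluated accurately for small rho.
    return -0.5 * np.log1p(-rho**2)


# Hypothetical correlation values, chosen only for illustration.
example_values = gaussian_mutual_information([0.0, 0.5, 0.9])
```

The mutual information vanishes for uncorrelated variables and grows without bound as |*ρ*| approaches 1, so it depends on the two fitness measures only through their correlation.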

## Appendix F: Additional simulations and notes

We discussed how the theoretical curves were generated in Appendix C. The numerical simulations were performed by integrating the corresponding ODEs (4) numerically until time 50,000 with 1000 steps. Although it is not always needed, we improved the accuracy by including a small amount of migration noise, which we lowered linearly to a negligible, roundoff-level value over the course of the integration, to help ensure that a species that was favored to survive would not go extinct.
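A simulation of this kind might be sketched in Python as follows. Several pieces are assumptions made for illustration rather than details from the text: the specific form of the equations (a standard MacArthur form with *w*_{α} = 1 and logistic resource growth), the deterministic, linearly decaying migration term used as a stand-in for migration noise, the `LSODA` integrator, the initial conditions, and the small system size and short integration time in the example.

```python
import numpy as np
from scipy.integrate import solve_ivp


def simulate_mcrm(c, K, m, t_final=500.0, n_steps=1000, migration0=1e-6, seed=0):
    """Integrate an assumed MCRM form with a linearly decaying migration term.

    dN_i/dt = N_i * (sum_a c[i, a] * R_a - m_i) + lam(t)
    dR_a/dt = R_a * (K_a - R_a - sum_i N_i * c[i, a])
    """
    S, M = c.shape
    rng = np.random.default_rng(seed)
    # Hypothetical random positive initial abundances.
    y0 = np.concatenate([rng.uniform(0.1, 1.0, S), rng.uniform(0.1, 1.0, M)])

    def rhs(t, y):
        N, R = y[:S], y[S:]
        # Migration is lowered linearly from migration0 to zero at t_final.
        lam = migration0 * max(0.0, 1.0 - t / t_final)
        dN = N * (c @ R - m) + lam
        dR = R * (K - R - c.T @ N)
        return np.concatenate([dN, dR])

    t_eval = np.linspace(0.0, t_final, n_steps)
    sol = solve_ivp(rhs, (0.0, t_final), y0, method="LSODA",
                    t_eval=t_eval, rtol=1e-8, atol=1e-10)
    return sol.t, sol.y[:S], sol.y[S:], sol.success


# Small illustrative system; all parameter values here are hypothetical.
rng = np.random.default_rng(1)
S, M = 6, 4
c = rng.uniform(0.0, 1.0, size=(S, M)) / M
K = np.ones(M)
m = np.full(S, 0.1)
t, N, R, ok = simulate_mcrm(c, K, m)
```

The final columns of `N` and `R` approximate the steady-state abundances; setting `t_final=50_000` would match the integration time quoted above.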

We also ran simulations in other regimes, such as the one shown in Fig 5, where we fix *μ*_{c} = 1 while varying *σ*_{c}. In this setting we are less interested in comparing specialists to generalists and more interested in the effect of niche overlap, in particular how a high overlap in the generating distribution can reduce the number of surviving species.

## Footnotes

* pankajm{at}bu.edu