Diffuse cliques maintain biodiversity in species-rich ecological communities

High-dimensional phenomena, which often defy low-dimensional intuitions 1;2, are an essential and yet seldom explored frontier in our understanding of ecological communities 3. Ecologists have long speculated about how large numbers of species manage to coexist in rich assemblages. Most answers to date have focused on identifying particular dimensions along which species may organize to persist together 4. Here we instead ask: what is the characteristic structure of a community where coexistence arises from a large number of concurrent factors? In such communities, individual species might not follow any evident pattern in their interactions, yet the group as a whole exhibits a statistical structure that we call “diffuse clique”. We find remarkable quantitative evidence for this pattern across a range of plant biodiversity experiments. Our approach exploits the emergent simplicity of high-dimensional systems 5;6, a powerful idea originating in physics that has, so far, rarely been demonstrated unequivocally in ecological data. We conclude that a subtle form of collective order may underlie complex networks of species interactions. This diffuse order offers a new grasp on how ecological communities maintain their fascinating diversity.

ching framework of niche theory 9 . This framework sug-36 gests that we should identify particular trade-offs between 37 abilities such as resource exploitation 10 , defense against 38 predators 11 and tolerance of temporal fluctuations 12;13 . 39 Through these trade-offs, strict bounds are imposed upon 40 how species grow and interact, preventing any species 41 from overwhelming its competitors. 42 We propose to start from a different perspective. Co-43 existence in highly diverse communities likely involves a 44 large number of niches and trade-offs, some known and 45 many unknown a priori 14 . Each cross-species interaction 46 may be determined by a unique combination of factors, 47 precluding any simple and conspicuous (low-dimensional) 48 order in the community 3 . Some ecological theories there-49 fore make the assumption that interactions are essentially 50 random 5 -a bold move, yet one that parallels major suc-51 cesses in physics 15;16 . Fully random interactions, how-52 ever, do not allow many species to coexist 17 . The high 53 biodiversity observed in many natural communities there-54 fore implies some form of latent structure. 55 We first derive a theoretical prediction, the most par-56 simonious way to constrain species interactions in order 57 to achieve coexistence. We uncover it by asking: if one 58 samples many different interaction networks, and retains 59 only those where all species survive, what do the remain-60 ing networks have in common? Some may appear very 61 structured, others almost random. Yet, we find in Fig. 1 62 that most of these networks exhibit the same statistical 63 pattern. This pattern, expressed in equations below, is a 64 weak but crucial bias in how the most successful competi-65 tors interact with others 20 . We now derive this pattern 66 from a simpler probabilistic argument, and explain in in-67 tuitive terms how it allows coexistence. 68 Measuring species interactions is often difficult and 69 prone to high uncertainty [21][22][23] , and most empirical set-70 tings only give us access to aggregated statistics. The to-71 tal effect of interactions on one species i can be inferred 72 from its relative yield the ratio of its abundance B i in a community to its abun-74 Figure 1: Finding a general pattern of coexistence in species-rich interaction networks. Interaction networks of S species are represented by S × S matrices (square boxes) where each element β ij denotes the effect of species j on species i. Of all possible species interaction networks (bottom disk), only an infinitesimal fraction (shaded area) allows S species to coexist at some equilibrium η i . Zooming into this area of coexistence (upper disk), we find that most such networks appear almost random, yet they tend to follow a common trend which we call a "diffuse clique structure": an underlying pattern of biases (4) and correlations (6) hidden in the large spread of coefficients β ij . We define a metric ρ to quantify how well our predicted pattern is observed in a given interaction network. We find in simulations that this metric ρ gets closer to 1 as biodiversity increases (histograms show the distribution of ρ for 200 networks with S = 8 and 25 species). By contrast, some coexistence mechanisms, such as the one-dimensional competition-colonization tradeoff 18;19 , can give rise to highly atypical networks, showing unrelated or even opposite patterns.
dance without competitors K i (known as carrying capac-75 ity) in the same environment 9 . We interpret species with 76 higher η as successful competitors, as they benefit more 77 (or suffer less) in total from their interactions with oth-78 ers. The simplest way to model these interactions is by 79 assuming a linear dependence between species where β ij is the competitive effect of species j on species 81 i. This relationship, which can be tested empirically 24;25 , 82 holds between coexisting species at equilibrium in the 83 classic Lotka-Volterra model.

84
Many different interaction networks can generate the 85 same equilibrium community. Observing the coexistence 86 of S species with relative yields η i conveys some infor-87 mation about their interactions, but not enough to fully 88 determine them: the equations (2) impose S constraints, 89 while there are S(S − 1) unknown interaction coefficients 90 β ij . On the other hand, community-wide statistics, such 91 as the mean strength of competitionβ, can be reliably 92 deduced from that information 26 (Appendix E). 93 We therefore adopt a probabilistic approach, and ask 94 what is the most likely community structure, i.e. the set 95 of features most widely shared among the many possible 96 solutions. We first define a prior distribution P (β ij ) that 97 can be adapted to our biological knowledge of the commu-98 nity. For all experiments below, we simply assume that 99 each coefficient β ij is drawn independently from a normal 100 distribution with meanβ. We then compute how this 101 prior is modified once restricted to networks that admit 102 the equilibrium η i (Appendix B). Computing a posterior 103 distribution given a prior and linear constraints (2) is a 104 well-established problem in probability theory 27;28 . 105 We find that interactions β ij should follow two statis-106 tical patterns that both admit intuitive interpretations 107 (Fig. 2). First, competition must be biased to explain 108 which species are successful or not. If all interaction 109 strengths were equal to the prior mean, β ij =β, we would 110 expect any species to achieve the same relative yield, When η i > η * , we therefore expect that species i suffers 112 less competition thanβ, and conversely if η i < η * . In our 113 calculation, this appears in the conditional expectation of 114 the competitive effect of j on i, which deviates from the prior meanβ by a bias Figure 2: The diffuse clique structure is characterized by two statistical patterns in the competition of successful species. (a) Bias in competitive effects (4). If all species competed with equal strengthβ, we would expect any given species i to achieve the relative yield η i = η * (3). When η i differs from this baseline, we can infer how interactions most likely deviate fromβ. We show this deviation ∆(η i , η j ) for simulated data, in the form of a matrix which we now describe. Left to right: a species with low η j competes indiscriminately against others (white, ∆ = 0), whereas a species with high η j is a biased competitor. Top to bottom: species with η i < η * experience stronger competition on average (red, ∆ > 0) whereas species with η i > η * experience weaker competition (blue, ∆ < 0). Together, these biases indicate the existence of a "clique" of species that compete less against each other, and more against all others, thus achieving higher relative yield than the baseline η * . (b) Correlation structure between columns of the interaction matrix: the competitive effects of two successful species are anti-correlated, avoiding overlap in which species they affect, whereas competition from unsuccessful species is again indiscriminate.
We see that this bias is not evenly distributed. Compe-116 tition coming from unsuccessful species (low η j ) can be 117 random without compromising the equilibrium. On the 118 other hand, a species that is successful (achieving high 119 η j ) is likely to have biased interactions, competing less 120 on average against other successful species, and experi-121 encing weaker competition from them (Fig. 2a).

122
The second pattern imposes that successful species j 123 and k avoid competing against the same target i (Fig. 2b) 124 While the first pattern (4) determines the expected success 125 of each species, the second pattern guarantees that each 126 relative yield is exactly set to η i . We show in Appendix B 127 that this correlation pattern prevents η i from deviating 128 from its expectation, which would likely drive some low-η 129 species to extinction in a fully random community.

130
Taken together, these two patterns suggest that we will 131 generally observe a fuzzy "clique" of competitors that are 132 both biased and successful, surrounded by unsuccessful 133 species with arbitrary interactions. This picture differs 134 in multiple respects from classic explanations of coexis-135 tence. It does not suppose a measurable segregation of 136 species into distinct niches. By imposing only the weakest 137 possible constraints upon the many degrees of freedom in 138 β ij , it allows interactions to take almost arbitrary values. 139 It also represents a form of collective organization, where 140 coexistence arises, not from particular species traits, but 141 from statistical biases distributed over all interactions. 142 Accordingly, Fig. 1 shows that this structure becomes 143 increasingly prevalent (although more diffuse) in highly 144 diverse communities. 145 We now present an empirical validation of these pat-146 terns on experimental data in Fig. 3   To test our predictions, we split these data in two sets. 156 Relative yields η i in the full-diversity plots are used to 157 compute the theoretical expectations (4) and correlations 158 (6) of interactions. All other plots, comprising different 159 subsets of the species pool, are used to fit individual in-160 teraction coefficients β ij by a multilinear regression of 161 equation (2). From these fitted coefficients, we construct 162 empirical estimates of the theoretical statistics. 163 We show in Fig. 3 the interaction matrix computed in 164 the Wageningen grassland experiment. While lacking ap-165  Fig. 1 and 4.

185
The approach developed here provides a test of how 186 typical an empirical or theoretical interaction network is, 187 given the observed abundances of its species: how sim-188 ilar it is to the majority of possible networks admitting 189 the same equilibrium η i (Fig. 1). We also detail in  pendix B an algorithm for generating such typical net-191 works. A deviation from typicality may hint at low di-192 mensional mechanisms, such as particular trade-offs 4 .

193
We have introduced a novel methodology for thinking 194 about the collective organization of coexistence in ecolog-195 ical communities. This approach goes beyond the partic-196 ular theoretical predictions (4) and (6), which are simpli-197 fied results tied to our choice of unstructured prior distri-198 bution and linear interactions. When there is a positive 199 but nonlinear relationship between data and predictions, 200 our approach could be improved with more accurate in-201 ference and more realistic models, but it already captures 202 an important qualitative feature of community organiza-203 tion. The same methods could be expanded by adding 204 structure to the prior and nonlinearity to the dynamics. 205 This will allow extensions to more complex communities, 206 such as food webs, or networks that have been structured 207 by other ecological and evolutionary processes 32 . Future 208 work should explore how this approach, based on ideas 209 from statistical physics and generic properties of high-210 dimensional systems, can be generalized to other biologi-211 cal systems. 212 Figure 4: Cross-experiment validation of the two components of the diffuse clique structure. We compute the agreement between the measured and predicted values for: conditional expectations E[β ij |η i , η j ] (left) and correlations corr(β ij , β ik |η i , η j , η k ) (right), using the predictions in Fig. 2. We show the agreement metric ρ (see SI Appendix E), comprised between -1 and 1. Boxes and whiskers represent the 50% and 95% CI of bootstrapped values, with the median shown as the center line. The shaded rows show results for two coexistence mechanisms as in Fig. 1: a one-dimensional trade-off with ρ < 0 (top row) and our high-dimensional diffuse clique structure with ρ > 0 (bottom row), both aggregated over 5 simulated communities with S = 25 species. In the Wageningen experiment, ρ is significantly larger than zero for both means and correlations. The Big Bio and Bio-CON experiments comprise multiple treatments (see SI Appendix E), over which we aggregate to find that zero always lies below the 10th percentile of ρ.

213
Experimental data 214 We employ data from 3 grassland biodiversity experi-215 ments in Wageningen, Netherlands 29 and Cedar Creek, 216 MN, USA (the Big Biodiversity 31 and BioCON 30 exper-217 iments). Each experiment uses a pool of species seeded 218 or planted in various combinations, including some or all 219 possible monocultures (S = 1 species), some partial com-220 positions, and all species planted together. We removed 221 the first two year for all experiments as they showed clear 222 evidence of transient dynamics (Appendix C).

223
Interactions measured in the Wageningen experiment 224 showed much lower inference error than in other exper-225 iments. Therefore, we used this experiment to assess 226 our hypothesis that observed abundances are primarily 227 determined by fixed inter-species interactions (Appendix 228 D). The consistency of the equilibrium Lotka-Volterra de-229 scription (2) was shown through a series of tests: we em-230 ployed multiple inference procedures for the matrix β ij 231 and carrying capacities K i , using different subsets of the 232 data for prediction and validation, and found them all 233 statistically significant and in agreement within empirical 234 uncertainty. In particular, carrying capacities K i inferred 235 from all multispecies plots (S > 1) agreed with measure-236 ments in monocultures (S = 1). Likewise, interactions 237 β ij inferred from all plots with S < 8 were consistent 238 with the equilibrium values of η i = B i /K i in octoculture 239 (S = 8). This strongly supports the simple linear model 240 (2). Interaction estimates from other experiments were 241 less robust and might be affected by nonlinearity, tran-242 sient dynamics, stochasticity and errors.

Validation of theoretical predictions 244
For each experiment, we first split monoculture (S = 1) 245 replicates in two sets, and compute the species' carry-246 ing capacities K i for each set, to be used separately. 247 For a maximal plot biodiversity S max , all plots with 248 1 < S < S max and the first set of monocultures were 249 used to infer the interaction matrix β ij using the hyper-250 plane (multilinear) least-squares fit proposed by Xiao et 251 al 23 (see Appendix D). The second set of monocultures 252 and the species abundance B i in plots with S = S max 253 were then used compute the relative yields η i = B i /K i in 254 the full community. All calculations were performed 250 255 times, using different bootstrapped sample means as val-256 ues for K i and B i . Each calculation led to a different set 257 of η i , β ij and β (see Appendix A for calculation details). 258 We tested the two components of the diffuse clique pat-259 tern, starting with the pattern of means (4). We plotted 260 the measured values of β ij (hereafter y) against their the-261 oretical expectation E[β ij |η i , η j ] (hereafter x). To obtain an empirical estimate of the expectation for a single in-263 teraction coefficient, we performed a running average: for 264 each point (x, y), we replaced its y coordinate by the av-265 erage y within a window centered on x and spanning 10% 266 of the x-axis; we also measure the 90% CI over boot-267 strapped values (Fig. 3c). We then grouped all values 268 y associated with the same species pair (i, j), took their 269 median, and reconstructed an empirical matrix of expec-270 tations B ij (shown in Fig. 3d).

271
To construct a stringent test (see SI Appendix E), the 272 metric of agreement ρ used in Fig. 1 and 4