A simple linear relationship between resource availability and microbial community diversity

Microbial community diversity is pivotal for the functioning of our planet, but its drivers are still unclear, in particular the role of resource number and identity. To fill this gap, we studied the assembly of hundreds of soil-derived microbial communities on a wide range of well-defined resource environments, from single carbon sources to combinations of up to 16. We found a remarkable diversity in single resources but a linear one-by-one increase in the number of species with the number of additional resources. We show, both experimentally and theoretically, that both these observations could originate from generalist and specialist taxa interacting in a modular fashion within the community. Since generalists and specialists are ubiquitous in natural microbiomes, our results might apply to a variety of different ecological settings, providing a framework to predict how community diversity responds to changes in resource availability. One Sentence Summary While many species coexist in single resources, community diversity only increases one-by-one as more resources are added.

Main Text: 21 Microbial communities are extremely diverse and contribute to the health and function of every 22 ecosystem (1). Although photosynthetic bacteria are among the most abundant organisms on 23 Earth (2), many microbial communities, including those living in the human gut and in the soil, 24 rely on externally supplied carbon sources to survive and thrive (3). A major question is thus: do 25 larger numbers of available resources mean more coexisting species? This is still largely 26 unresolved because, traditionally, different facets of resource availability have been tackled 27 separately, with theory focusing on the effects of the number of resources on species coexistence, 28 e.g. (4-7), and experiments concentrating on the impact of resource identity on community 29 composition and function (8-10). Yet, understanding how the identity and number of available 30 resources shape microbial diversity is key to anticipating the response of ecosystem to alterations 31 in the environment, including global climate change and evolving human diets (11). 32 In order to illuminate how resource number and identity shape the richness of microbiomes, we 33 assayed the assembly of soil-derived bacterial communities in laboratory microcosms (8, 12). 34 We started by inoculating a rich microbial suspension obtained from a soil sample (  Table S1). The 16 carbon 37 sources represented a broad range of common soil compounds (e.g. mannose, xylose, cellulose 38 and hydroxyproline), encompassing simple sugars as well as complex polymers. We adopted a 39 daily-dilution protocol, whereby at the end of each 24-hour growth cycle the bacterial cultures 40 were diluted 1/30x into fresh media. We observed that the majority of microcosms reached 41 stability after 3 days from the inoculum (Fig. S3). We continued the experiment until day 7 and 42 measured the final richness as the number of ASVs (amplicon sequence variants) observed 43 within each community (Fig. S4). 44 Consistent with recent experimental studies (8,9,13), single-resource communities were 45 remarkably rich (mean richness = 22.5 ± 2 ASVs, Fig. 1B) and taxonomically diverse (Fig. S5). 46 This is in contrast with competitive exclusion predicting that the number of species cannot 47 exceed the number of resources (14, 15)-which, in single resources, would result into no more 48 than one species surviving. Interestingly, the variability in richness among different resources 49 was also high-with the average number of ASVs ranging from eight, in citrate, to ~40, in 50 cellulose (Fig. 1B)-and larger than the variability among replicates of the same carbon source 51 (ANOVA test, Fresource = 3.4339, p < 0.01). Richness in single carbon sources therefore depended 52 on resource identity. Community richness did not correlate significantly with molecular weight 53 (Fig. S6), but did correlate with the number of metabolites predicted to be generated from 54 biochemical reactions within the cell (Fig. S7), consistent with the idea that extensive cross-55 feeding allows for the stable coexistence of many species on single resources (16-18). 56 The large numbers of coexisting species in single resources generated the expectation that 57 community diversity would increase rapidly if more resources were provided. As previously 58 observed in marine bacteria (9,19), community composition could be roughly predicted as the 59 sum of the assemblages observed on each nutrient supplied in the mixture. To provide an 60 example, the expected richness of the community grown on glucose and hydroxyproline ( Fig.   61 1C), each alone supporting on average 24 and 11 ASVs, would be ~ 30 ASVs, i.e. the sum minus 62 the number of shared ASVs (union). Alternatively, niche overlap between the taxa found in the 63 single-resource media (20, 21) might bring the expected number of species down to the 64 maximum richness observed in the constituent singles; in the case of glucose + hydroxyproline, 65 24 ASVs. However, when we measured the richness of the communities grown in a media 66 supplied with equal amounts of glucose and hydroxyproline, we found only ~16 ASVs on 67 average, which is significantly lower than both expectations (Fig. 1C). Yet, our observed 68 richness is remarkably similar to the mean richness measured in the two constituent single 69 resources (17.5 ASVs), a trend that was consistent across many two-resource communities ( Next, we examined the full range of resource combinations included in the experiment. We 95 found that the richness predicted from the union of constituent singles significantly 96 overestimated the observed richness (Fig. 1D). The prediction based on the maximum of 97 constituent singles gave an increase with negative curvature that was not detected in our 98 experiment (Fig. 1D). A similar trend originated also when we estimated the number of 99 metabolites generated from resource combinations, with the same approach we used for single 100 carbon sources (Fig. S9). Instead, the observed average richness increased linearly with the supplied resources is an important driver of microbial diversity, the observed one-by-one relation 108 between richness and resource number was difficult to reconcile with the large diversity found in 109 single resources. Thus, we went back to the single-resource communities to gain a better 110 understanding of our observations.

111
A key feature that we did not consider while making predictions is that, in natural and 112 experimental communities, bacterial taxa exhibit a spectrum of resource-utilization strategies 113 (22, 23), ranging from generalists, for which resources are largely substitutable, to specialists, 114 showing specific resource requirements (24, 25). Importantly, these features can alter the 115 structure of the interactions among community members (26, 27). In light of this, we measured 116 the resource occupancy of the 275 ASVs observed in single resources, i.e. how many media a 117 given ASV was found in, to find a range of different resource-utilization strategies ( Fig. 2A, Fig. 118 S13).

119
Based on resource occupancy (23, 26), we considered specialists the ASVs that were observed in 120 less than 25% of single-resource media ( Fig. 2A), and generalists those that occupied the vast 121 majority of single-resource media (more than 75% of them) ( Fig. 2A). As a result of this   is calculated, for each ASV found in a single resource (target resource), using the number of multi-resource 168 media containing the target resource in which the ASV was found (X) and the number of media not containing 169 the target resource in which the ASV is found (Y), as (X -Y)/(X + Y). It ranges from 1, indicating that the 170 ASV is always present in a combination containing the resource, to -1, implying that the ASV is always absent 171 when the resource is supplied. A score of 0 is indicative of an ASV showing no specificity for that particular 172 resource. Bars indicate the mean specificity score ± SEM for generalists (pink) and specialists (teal) (N = 16).

173
Colored dots indicate the mean score for each resource (SEM are omitted for clarity, N varies for each 174 resource, see Materials and methods). 175 We implemented a version of the Lotka-Volterra model modified to account for resource 176 specialists and generalists (33). Briefly, we let the size of the species pool increase with the 177 number of resources under the constraint that specialists could grow only when the resource they 178 specialize on was present, while generalists could always grow (Fig. 3A, B). We tested different 179 interaction structures (Materials and methods, Table S2, Fig. S17) to find that the one where 180 competition between generalists and specialists was weak, but within generalists and specialists 181 was strong (Fig. 3B) was sufficient to reproduce our experimental results. In our simulations, we 182 could observe the stable coexistence of several species in single resources and an almost one-by-183 one relation between richness and number of resources (Fig. 3C), mostly driven by specialists 184 (Fig. 3D). We concluded that the existence of a modular interaction structure linked to resource-  impact on microbial assembly (12) and potentially explain the loss of specialistic taxa in two-197 resource media (Fig. 2D, S6B, S16). In support of this, we observed that the relative abundance   Table S1 for 239 the complete list and Fig. S2). The total concentration of carbon was kept the same and resources 240 were in all instances supplied in equal amounts, that was 100%, 50%, 25%, 12.5%, 6.7% and    in order to compare their slopes (Fig. S11A). The absolute value of the slope of the fitted 304 regression line informs on the abundance distribution of the ASVs in a community. More even 305 communities usually display smaller slopes (Fig. S11B). Since each community exhibited a 306 different richness, we normalized the RADS for richness (Fig. S11C) To do this, we used the 307 RADnormalization_matrix function in the RADanalysis package: from each RAD with an 308 observed richness, this function generates a "normalized RAD" with a richness corresponding to 309 the minimum richness observed in the experiment (7 ASVs) by randomly resampling the original 310 RADs for 10 times (45) . In this way, samples with different richness can be compared and 311 changes in evenness properly assessed. 313 ASVs found in single resources were classified in three categories based on how many media 314 containing a single resource they were found in, i.e. they exhibited abundance larger than 0 (23, 315 26). We considered specialists the ASVs that were observed in less than 25% of single-resource 316 media, i.e. in one, two or three resources. Generalists were those ASVs found in more than the 317 75% of media, i.e. in 13 or more resources. We defined intermediates the ASVs found between 318 four and twelve resources. These thresholds were chosen arbitrarily, but the resulted in about ~ 319 4% generalists and 80% specialists, consistently with proportions of generalists and specialists 320 observed in natural communities (23, 26, 28). We chose this simple way of assigning ASVs to 321 generalist, intermediate and specialist categories over other methods, e.g. as in (9) in order to 322 leave aside their relative abundance, which has been analyzed separately. 324 We predicted the possible number of metabolic byproducts that could be produced using the 325 resources present in each medium using a curated metabolic network. The metabolic network 326 contained a large set of metabolic reactions encompassing carbohydrate, sugar and amino acid 327 metabolism extracted from the KEGG database (46). We manually curated this large set of 328 reactions using the MetaCyc database (47) in order to limit it to reactions possible by most 329 microbial taxa common to the soil, such as Pseudomonas. We used this network to estimate all 330 the metabolic compounds that could be produced as byproducts, starting from the carbon sources 331 available in each medium. We assumed that a small set of "currency" molecules, such as water, 332 carbon dioxide and ATP, were always available as reactants when required (full list of currency 333 molecules: phosphate, oxygen, carbon dioxide, water, H + , ATP, NAD(P)H, Acetyl-CoA, CoA).

334
To estimate the possible byproducts in each medium, we employed the well-known scope 335 expansion algorithm (48-52). Each reaction in our curated metabolic network consisted of a set 336 of reactants and resulting products. For each medium, we first asked which reactions could be 337 performed using only the carbon sources available in the medium (i.e., the current "scope" of the 338 medium). We assumed that the products of these reactions could be produced and added them to 339 the set of reactants -the new scope -for the next step. In the next step, we again asked which 340 reactions could be performed using the new scope. We added their products to the scope for the 341 next step. We continued this process, step by step, until we could add no new products to the 342 scope. The resulting final scope of metabolites, minus the currency molecules provided in the 343 medium, was our estimated set of possible metabolic byproducts producible in that medium.

344
Inference of rRNA operon copy number for generalist and specialist taxa 345 To test for signatures of different life-history strategies of the generalist and specialist taxa in our 346 study, we estimated their 16S rRNA operon copy numbers. We estimated rRNA copy numbers at 347 the level of both genus and family, separately for generalist and specialist taxa. For each genus 348 identified, we queried rrnDB (53)-a database of rRNA operon copy number statistics-for the 349 median copy number corresponding to the genus. We used this as an estimate for the rRNA 350 operon copy number of that genus. For each family identified, we calculated the median copy 351 number of all genera in our dataset belonging to that family.

352
Calculation of the resource-specificity score 353 We used a resource-specificity score to test if the ASV-resource associations that we observed in 354 single resources were maintained when the single resource(s) in which the ASV was found was 355 combined with others. For each ASV present in a single resource (target resource), the resource 356 specificity score is calculated as the difference between the number of multi-resource media 357 containing the target resource in which the ASV is found and the number of media not 358 containing the target resource in which the ASV is found divided by the total number of media in 359 which the ASV is found (Fig. 2E). This is reminiscent of a preference index, which is a standard 360 measure in the behavioral sciences. Single resources are excluded from the count. The resource-361 specificity score ranges from 1, indicating that the ASV is always present when the target 362 resource is provided, to -1, implying that the ASV is always absent when that resource is 363 supplied. A score of 0 is indicative of an ASV showing no specificity for that particular resource 364 (Fig. 2E). We calculated a score for each ASV-resource pair, such that each ASV had as many 365 scores as the number of single resources is found in. Then, we computed the average of the 366 scores obtained for each single resource, separating between scores belonging to generalist and 367 specialist ASVs (Fig. 2E).

368
Detection of family-resource associations using an ensemble tree regression model 369 We calculated the relative abundance of the most prevalent families (37) in the 75 replicated 370 bacterial communities and ran an ensemble tree regression model to detect significant patterns of 371 variations in family abundance due to changes in the relative concentration of resources. 372 We chose to coarse-grain the abundance data at the family level because, while several ASVs 373 were lost and others were gained going from one to 16 resources in the growth media, the 374 families found across all combinations of resources were mostly the same. In addition, we 375 distinguished between generalist families, i.e. those that contained at least one generalist ASV, 376 and specialist families, i.e. containing only specialist ASV. Consistent with ASV-level definition, 377 generalist families displayed higher mean rRNA operon copy number compared to specialist 378 families. 379 We employed XGBoost, a gradient boosting framework based on decision trees (54).

380
Specifically, we implemented a regression model for each family in which the input was the 381 relative resource concentration and the output was the log-transformed relative family 382 abundance. We trained the model on two replicates by performing leave-one-out crossvalidation 383 of the XGBoost parameters "max_depth", "n_estimators" and "learning_rate" (55) and tested on 384 the third one with average mean-squared error across families of 6.05. We applied the Shapley 385 Additive exPlanations (SHAP) (56) to identify the resources that were more important in driving 386 changes in the abundance of each family. This analysis has been done using Python version 3.8. experimental results, which also hints on how some classes of resource explicit models might 412 also be able to do the same. 413 We used a modified version of the Lotka-Volterra model:  The interaction coefficient !" indicates the strength of the competitive inhibition on the i-th 420 species by the j-th species. Each population ! is normalized by the carrying capacity of each 421 species. The interaction coefficient !" is normalized as well, with !" = 1 for the interspecific 422 competition between the i-th and the j-th species and the same strength as the intraspecific 423 competition within the i-th species. 424 We assumed that a specialist species grows only when the resource it specializes on is provided 425 either as the sole supplied resource in the media or in combination with other resources. In 426 contrast, a generalist species grows on any resource. Although in the experiment we observed 427 ASVs that displayed an intermediate occupancy of single resources, for simplicity, we modelled 428 only two representative versions of specialists and generalists. We also assumed that the per 429 capita growth rate R in the model is the same for any species in any environment where it can 430 grow. Another assumption was the absence of dilution in the system, despite the fact that we 431 applied a daily dilution protocol in our experiments. However, we have found that the addition of 432 a dilution rate in the model does not qualitatively affect the results (results not shown). Finally, 433 we assumed that all interactions are antagonistic and that positive interactions, such as cross-434 feeding, do not enable specialists to grow when the supplied resources do not include the one 435 that they can grow on. 436 We implemented a modular interaction structure by differentiating among intra-group pairwise 437 interactions, i.e. within generalists and within specialists, and inter-group pairwise interactions, 438 i.e. between generalists and specialists. We tested the four interaction structures obtained by 439 keeping the strength of inter-group competitive interactions constant and varying the relative 440 strength of intra-group competitive interactions (Table S2). The interaction structure that we 441 expected to reproduce the linear one-by-one increase in richness with the number of supplied 442 resources is the one in which competition within generalists and within specialists are both 443 stronger than the competition between generalists and specialists. 444 We found that a model with the hypothesized modular interaction structure recapitulated our  (Fig. 3C, D, Fig. S17A). In contrast, when competitive interactions within specialists 448 were stronger than the inter-group interactions, while competitive interactions within generalists 449 were weaker than the inter-group interactions, generalists always took over the community and 450 the richness remained constant as more resources are provided (Fig. S17B). When the opposite 451 was true-competitive interactions within specialists were weaker than the inter-group 452 interactions, while competitive interactions within generalists were stronger than the inter-group 453 interactions-specialists took over as generalists went extinct when many resources were 454 available. This resulted in a U-shaped richness trend with the number of available resources (Fig.   455 S17C). Finally, when both intra-group competitive interactions were weaker than intergroup 456 competition, generalists outcompeted specialists when few resources were available and, in turn, 457 specialists took over when many resources were available (Fig. S17D).

458
The inter-group interaction coefficients were drawn from uniform distributions with range (0.2, 459 0.5). The intra-group interaction coefficients, instead, were drawn from uniform distributions 460 with range (0.1, 0.4) in case of weaker competition within groups than between groups and range 461 (0.3, 0.6) in the case of stronger competition within groups than between groups (Table S2) intermediates. The simulations were run for 2e4 unit time with R=1 starting from initial 471 population drawn from uniform distribution of (1e-6, 1e-7), and communities reached 472 equilibrium at the end of the simulations. Simulation were run in Python version 3.7.4.

473
In conclusion, our model suggests that weak inter-group interactions maintain the coexistence of 474 generalists and specialists in the communities. This is reminiscent of the conditions required for 475 the coexistence of two species in the Lotka-Volterra model. Namely, when interspecific 476 competition between two species is weaker than intraspecific competition within each species, 477 the two species coexist. In contrast, when interspecific competition is stronger than intraspecific 478 interactions, the stronger competitor cannot be invaded. This coexistence underpins a linear trend 479 in richness, consisting of a constant number of generalist species and a linear increase driven by 480 the specialist population.