Microcosm cultures of a complex synthetic community reveal ecology and genetics of gut microbial organization

The behavior of microbial communities depends on both taxonomic composition and physical structure. Metagenomic sequencing of fecal samples has revealed the composition of human gut microbiomes, but we remain less familiar with the spatial organization of microbes between regions such as lumen and mucosa, as well as the microbial genes that regulate this organization. To discover the determinants of spatial organization in the gut, we simulate mucosal colonization over time using an in vitro culture approach incorporating mucin hydrogel microcosms with a complex yet defined community of 123 human strains for which we generated high-quality genome assemblies. Tracking strain abundance longitudinally using shotgun metagenomic measurements, we observe distinct and strain-specific spatial organization in our cultures with strains enriched on mucin microcosms versus in supernatant, reminiscent of mucosa versus lumen enrichment in vivo. Our high taxonomic resolution data enables a comprehensive search for microbial genes that underlie this spatial organization. We identify gene families positively associated with microcosm-enrichment, including several known for biofilm and adhesion functions such as efflux pumps, gene expression regulation, and membrane proteases, as well as a novel link between a coenzyme F420 hydrogenase gene family and lipo/exopolysaccharide biosynthesis. Our strain-resolved abundance measurements also demonstrate that incorporation of microcosms yields a more diverse community than liquid-only culture by allowing co-existence of closely related strains. Altogether these findings demonstrate that microcosm culture with synthetic communities can effectively simulate lumen versus mucosal regions in the gut, providing measurements of microbial organization with high taxonomic resolution to enable identification of specific bacterial genes and functions associated with spatial structure.

Subdoligranulum sp. 4-3-54A2FAA and Subdoligranulum variabile DSM-15176 (Fig. 1G), and between 115 two closely related (ANI ∼ 80%) Firmicutes_C strains: Acidaminococcus fermentans DSM-20731 and 116 Acidaminococcus intestini D21 (Fig. 1H). These observations of co-existence (see Fig. S8 for additional 117 examples) concur with increased richness detected in microcosm cultures. 118 To better understand why community richness increases with mucin microcosms, we additionally grow 119 our inoculum in cultures with 1% agar microcosms (plain-agar, i.e., no mucin). We observe that 120 plain-agar microcosm culture also exhibits overall enhanced richness compared with liquid-only culture 121 (Fig. S6). However, we do note some specific strain-level differences between mucin-agar and plain-122 agar microcosm cultures (See Fig. S7,S8). This suggests that increased richness result largely -but 123 not entirely -from having a physical surface to colonize rather than the nutrients provided by mucin. 124 These results are reminiscent of similar effects in bacterial biofilms, where increased diversity has been 125 attributed to expanded spatial niches and reduced competition [36][37][38]. 126

Figure 1: (Previous page) Microcosm cultures yield stable, diverse communities with coexistence of closely related strains. (A)
We generate closed, high quality genomes for each strain in a 123-member microbial community, representative of taxa in the human gut. De novo generated genomes are more contiguous than closest previously available NCBI genomes, and represent exact matches to our strains. (B) We use this 123-member community to inoculate cultures incorporating mucin microcosms as well as non-microcosm controls. We passage (P) each culture 6 times (3 days between passages), independently sampling bacterial DNA from microcosm, supernatant, and no-microcosm control at each timepoint for downstream metagenomic sequencing. (C) We use NinjaMap to obtain community relative abundances from metagenomic sequencing data, here we plot median abundance of each strain at each passage timepoint, across experimental conditions. (D) Number of detected strains after culture stabilization (∼P3 and later) is higher in microcosm versus no microcosm cultures, indicating enhanced community richness. Grey dashed line indicates median number of strains detected (56) using same threshold with the 119-member hCom2 community in mice [26] (E) Addition of microcosms leads to broad taxonomic shifts in community composition relative to no-microcosm control, visualized here at phylum level. (F) Strain-resolved abundance patterns of 3 B. dorei strains (AN I > 99%) in our community demonstrates stable co-existence enabled by addition of microcosms, compared with dominance of a single B. dorei strain without microcosms. (G) Strain-resolved abundance patterns of 2 Subdoligranulum strains (AN I ∼ 80%) in our community demonstrates stable co-existence enabled by addition of microcosms. Subdoligranulum variabile DSM-15176 in particular also exhibits increasing abundance over passage timepoints. (H) Strain-resolved abundance patterns of 2 Acidaminococcus strains (AN I ∼ 80%) demonstrates more stable co-existence when cultured in the presence of microcosms. no obvious time-dependent signal (Fig. 2B). These results are largely consistent between mucin-agar 141 and plain-agar microcosms, with the exception of Desulfobacterota which is not enriched on plain-agar 142 microcosms. Certain individual strains also exhibit similar trends, such as Eubacterium ventriosum 143 ATCC-27560 which exhibits microcosm preference with mucin-agar microcosms but not plain-agar (Fig.   144 S9).

145
At strain level, we find a diverse range of enrichment profiles over time (Fig 2A)   4-3-54A2FAA (Fig. 2D) and Acidaminococcus intestini D21 (Fig. 2E). These findings support our 154 hypothesis that distinct strain-level spatial organization occurs within gut communities.  Table S6). We find our 157 in vitro microcosm-enrichment strain scores exhibit similarity to in vivo mucosal-enrichment species 158 scores within inter-subject variability (Fig. S10, Table S7). We also observe general agreement at

Phylogenetic regression predicts genes associated with mucosal colonization
We next test for statistical associations between microcosm enrichment and underlying microbial geno-166 types, evaluating our hypothesis that key microbial genes may regulate spatial organization in the gut.  Our approach identifies 244 KO families significantly associated with increased enrichment on mucin mi-176 crocosms relative to supernatant applying FDR correction at p<0.01 threshold (Fig. 3B, see also Meth-177 ods, Table S8). Out of these KOs, we highlight several illustrative examples whose genotype patterns     (Table S11), further supporting the importance of these gene functions in mucosal coloniza-201 tion.

202
Testing for clade-specific effects using within-phylum phylogenetic regression, we find K14440 and 203 K14743 to be among the most significant hits in Firmicutes/Firmicutes_A/Firmicutes_C, while K00441,

204
K14743 and K08217 are among the most significant hits for Bacteroidota (Table S9) (Table S10), and find statistically significant overlap between genes 207 associated with microcosm enrichment in vitro and genes associated with mucosal enrichment in vivo, 208 (log − odds − ratio = 4.0, p = 9.7 × 10 −21 , Fig. S11). Thus, we confirm that measurements from 209 our in vitro synthetic community cultures are sufficiently detailed to inform a computational gene-level 210 analysis of gut spatial organization, revealing that genes related to biofilm formation and adhesion likely 211 play key roles in modulating gut microbial structure.   Table S12). We then map presence/absence of each of these 256 BGC-groups against the 86 prevalent 219 strains in our experiment (Fig. S12), and apply phylogenetic regression to test for associations between 220 microcosm enrichment and BGC-groups.

221
Our approach yields a total of 7/256 significant BGC-groups positively associated with microcosm  Applying mucin microcosm culture with our defined community of human gut strains, we present here 234 the first strain-resolved measurements of spatial structure within the context of a complex gut microbial 235 community. By a priori generating a database of high quality closed reference genomes, our approach 236 enables high taxonomic resolution abundance measurements using metagenomic sequencing, while ef- between closely related strains, supporting our hypothesis that spatial organization in the gut occurs at 244 strain-level and trends would be missed at coarser taxonomic resolution.

245
Another benefit of using strain-resolved metagenomics is that we can identify gene families that specifi-246 cally occur in strains with microcosm enrichment (or depletion) phenotypes. We do so using phylogenetic 247 regression, a rigorous statistical approach that adjusts for evolutionary relationships between strains. This  We speculate that in mucosa-associated taxa, key regulator genes act as master switches for a host of 261 bacterial functions that alter outer membrane composition to enhance biophysical interactions with the 262 mucosal surface and thus increase mucosal colonization fitness, leading to global spatial organization of 263 these taxa towards the mucosa (Fig. 6). 264 We conclude by noting several limitations to our work and point to areas for further exploration. First, 265 we only show statistical associations -not causal mechanisms -between genotypes and microcosm 266 enrichment, meaning hits should be cautiously interpreted as potential genetic factors deserving of 267 followup investigation. Synthetic biology in genetically tractable gut strains can be used to test our 268 Figure 6: Schema for how mucosal associated genes may regulate spatial structure. Depiction of proposed framework where in mucosa-associated taxa, regulatory genes serve as master switches for microbial functions that increase mucosal colonization fitness such as LPS/EPS, membrane transporters / efflux pumps, and proteases.  Follow-on studies with our platform that incorporate strain dropout can address these questions. Finally, 279 in addition to strains from healthy Western guts, future work should incorporate taxa found in dysbiotic 280 and non-Western guts to explore how spatial structure varies between healthy and diseased states, and 281 across global geographic regions. Ultimately we believe the platform presented here has the potential 282 to transform the standard for in vitro investigation of gut microbiota, in a manner that recognizes the 283 important interplay between spatial structure and strain-level ecology.

285
Hybrid assembly of microbial isolates 286 Strains are cultured in isolation until stationary phase, followed by DNA extraction using phenol chlo-287 roform. DNA is sequenced using both Oxford Nanopore long-read and Illumina short-read sequencing, 288 followed by hybrid assembly using custom bioinformatic workflow (Fig. S2)  In vitro culture of synthetic community with mucin microcosms 302 To construct the full in vitro synthetic community, we first culture each strain in isolation in 1.8 mL of its 303 preferred media in a 96 well deep well plate (Table S1) pairs per sample. In addition to DNA derived from microbial communities, we also sequenced all input 340 strains used to construct the community to ensure strain purity and identity.

Read mapping and abundance estimation
Read mapping is performed with NinjaMap as previously described [27], using our de novo generated 343 genomes as reference database. Briefly, reads are aligned to genome sequences, with only perfect 344 unique matches considered in the first round. Ambiguous reads are held in escrow for the first round, 345 and subsequently assigned in a statistically weighted manner determined by initial abundance estimates 346 from the first round of alignment. This generates relative abundance and horizontal genome coverage  For each strain, and passage, microcosm enrichment scores are calculated as log ratio of microcosm to 358 supernatant abundance, for 3 biological replicates, replacing zeros with half-minimum non-zero value 359 prior to taking log. For each strain, a single aggregate microcosm is generated by taking the mean of 360 over mean over standard deviation of 12 log ratio scores in the late passages (P3-6, 4 passages × 3 361 biological replicates). Table S3 lists enrichment scores per strain. genomes, using bac120 multiple sequence alignment with GTDB-tk [73] (Fig. 1A). Therefore, to account 382 for this phylogenetic relatedness, for each KO we apply phylogenetic regression to test for significant 383 association between mucosal enrichment scores and maximum hmmer bitscore (standard scaled) across 384 strains. We implement this test using the R package phylolm [32], assuming a Brownian motion 385 model along evolutionary branches, using the bac120 phylogenetic tree as input. This generates effect 386 size estimates and p-values for every KO. We filter KOs for significance applying a p<0.01 cutoff 387 with Benjamini-Hochberg FDR correction. In addition to running phylogenetic regression across all 86 388 prevalent strains, we also run these models across subsets of these strains grouped by phylum to search  (Table S6). For these 13 individuals, we obtain abundance estimates 397 at all 6 sites by mapping reads to UHGG database using Kraken2 [2,35]. We then calculate normalized 398 mucosal enrichment scores for each species defined as log ratio of mucosal to lumen abundance. Score 399 are normalized by taking mean-over-standard-deviation across all individuals and sites (13 individuals × 400 3 sites -cecum, descending colon and terminal ileum -for 39 total measurements). We determine gene 401 presence absence for these species, across KOs, by using kofamscan to search the UHGG pangenome 402 database [2], and then apply phylogenetic regression as described above to test for associations be-403 tween gene presence absence and mucosal enrichment score across UHGG species. The regression uses 404 enrichment scores from 676 species detected with greater than 0.01% relative abundance in at least 405 10% of in vivo read libraries, which contain a total of 12,822 detected KEGG KO gene families (Table   406 S7,S10). coverage, as well as a bitscore greater than 0.5× the KO's bitscore threshold. We filter out BGCs with 412 fewer than 3 present KOs, and then use hierarchical clustering to cluster all remaining BGCs based 413 on their binary KO presence/absence profile into 256 BGC-groups, applying a Jaccard distance metric. 414 We then map presence of each BGC-group within community strains, and use this presence/absence 415 matrix to test for associations with microcosm-enrichment applying phylogenetic regression as described