Hadza Prevotella Require Diet-derived Microbiota Accessible Carbohydrates to Persist in Mice

Summary
Industrialization has transformed the gut microbiota, reducing the prevalence of Prevotella relative to Bacteroides. Here, we isolate Bacteroides and Prevotella strains from the microbiota of Hadza hunter-gatherers of Tanzania, a population with high levels of Prevotella. We demonstrate that plant-derived microbiota-accessible carbohydrates (MACs) are required for persistence of Prevotella copri but not Bacteroides thetaiotaomicron in vivo. Differences in carbohydrate metabolism gene content, expression, and in vitro growth reveal that Hadza Prevotella strains specialize in degrading plant carbohydrates, while Hadza Bacteroides isolates use both plant- and host-derived carbohydrates, a difference mirrored in Bacteroides from non-Hadza populations. When competing directly, P. copri requires plant-derived MACs to maintain colonization in the presence of B. thetaiotaomicron, as a no-MAC diet eliminates P. copri colonization. Prevotella's reliance on plant-derived MACs and Bacteroides' ability to use host mucus carbohydrates could explain the reduced prevalence of Prevotella in populations consuming a low-MAC, industrialized diet.



Statement on work with indigenous communities
In order to acquire scientific knowledge that accurately represents all human populations, rather than only reflecting and benefiting those in industrialized nations, it is necessary to involve indigenous populations in research in a legal, ethical, and non-exploitative manner (Abdill et al., 2022; Green et al., 2020). Here, we isolated live bacterial strains from anonymized fecal samples collected from Hadza hunter-gatherers in 2013/2014 (Fragiadakis et al., 2019; Merrill et al., 2022; Smits et al., 2017). Samples were collected with permission from the Tanzanian government, the National Institute for Medical Research (MR/53i 100/83, NIMR/HQ/R.8a/Vol.IX/1542), and the Tanzania Commission for Science and Technology, and with aid from Tanzanian scientists. A material transfer agreement with the National Institute for Medical Research in Tanzania specifies that collected samples are to be used solely for academic purposes. For more information on the consent practices followed and our ongoing work to communicate the results of these projects to the Hadza, please see Olm et al. (2022).

Introduction
The industrialized lifestyle is defined by the consumption of highly processed foods, high rates of antibiotic administration, cesarean section births, sanitation of the living environment, and reduced contact with animals and soil, all of which can impact the human gut microbiota. Certain taxa are influenced by industrialization; that is, they are prevalent and abundant in non-industrialized populations and diminished or absent in industrialized populations, or vice versa (De Filippo et al., 2010; Jha et al., 2018; Merrill et al., 2022; Olm et al., 2022; Smits et al., 2017; Vangay et al., 2018; Yatsunenko et al., 2012). The microbiome of 1000-2000-year-old North American paleofeces is more similar to the modern non-industrialized gut microbiome than to the industrialized one (Wibowo et al., 2021). The industrialized microbiota appears to be a product of both microbial extinction, as once-dominant taxa disappear, and the expansion of less dominant or new taxa (Sonnenburg and Sonnenburg, 2014).
The industrialized diet differs drastically from non-industrialized diets, notably in its reduced amount of microbiota-accessible carbohydrates (MACs), a major metabolic input for microbes in the distal gastrointestinal tract (Cordain et al., 2005; Flint et al., 2012; Sonnenburg and Sonnenburg, 2014). Some gut-resident microbes use host mucin, which is heavily glycosylated, as a carbon source, depending on the availability of dietary MACs (Bell and Juge, 2021; Desai et al., 2016; Pudlo et al., 2022; Salyers et al., 1977; Sonnenburg et al., 2005). Shifts in dietary MACs alter microbial relative abundances and may increase inflammation and susceptibility to intestinal pathogens (Desai et al., 2016; Earle et al., 2015; Martens et al., 2018). Taxa are lost over generations due to a lack of dietary MACs in a mouse model (Sonnenburg et al., 2016), and in humans as they immigrate to the U.S. (Vangay et al., 2018).
As human populations adopt an industrialized lifestyle, the prevalence of Prevotella decreases and that of Bacteroides increases (De Filippo et al., 2010; Jha et al., 2018; Kaplan et al., 2019). While Bacteroides are well studied, Prevotella species remain understudied, with few tools available for mechanistic investigation (Abdill et al., 2022; Accetto and Avguštin, 2015; Li et al., 2021; Xu et al., 2003). Both genera harbor well-documented carbohydrate utilization capabilities, encoded by carbohydrate-active enzymes (CAZymes) that are often organized into polysaccharide utilization loci (PULs) (Bjursell et al., 2006; Dodd et al., 2010; Fehlner-Peach et al., 2019). Characterization of intestinal Prevotella species has been limited by challenges with colonization, particularly mono-colonization of germ-free mice. Here we overcome these barriers to establish a causal link between diet and P. copri abundance in a gnotobiotic mouse model.
The decreased prevalence of Prevotella in industrialized populations is likely linked to a decline in relative abundance within individual microbiomes (Sprockett et al., 2020). Decreased abundance of bacterial taxa in individuals reduces the likelihood of transmission from mother to infant (Sonnenburg and Sonnenburg, 2019). When compounded over generations, decreased abundance can result in a population-level decline in prevalence and eventually in taxon loss or extinction (Vangay et al., 2018; Sonnenburg et al., 2016). The factors driving the decline in Prevotella and the increase in Bacteroides during industrialization remain to be defined. The abundance and prevalence of specific strains of P. copri, the dominant Prevotella species in the human gut, vary among populations based on host lifestyle, particularly diet (Tett et al., 2019). Here we use gnotobiotic mice to investigate the role of diet in sustaining Prevotella and Bacteroides colonization, and we demonstrate that dietary MACs play a key role in controlling the abundances of Bacteroides and Prevotella.

Bacteroides and Prevotella genomes from the Hadza microbiota vary in prevalence across populations
To compare Prevotella and Bacteroides from populations with non-industrialized lifestyles, we isolated and sequenced 6 Bacteroides strains (from 4 species) and 7 Prevotella copri strains from stool samples collected from 13 Hadza individuals. Importantly, P. copri is known to encompass extensive genomic diversity, and its division into multiple species has been discussed. Single-isolate genomes were assembled using both MiSeq-generated short reads (146 bp) and nanopore-generated long reads (10-100 kb) (Table 1).
To determine the relatedness of the Hadza genomes to previously sequenced genomes, we calculated the average nucleotide identity (ANI) distance to the closest genome present in the National Center for Biotechnology Information (NCBI) GenBank. The Hadza Prevotella genomes are statistically distinct from P. copri genomes classified as complete in NCBI (p = 0.0002; Wilcoxon rank-sum test). The Hadza Bacteroides genomes, however, are not statistically distinct from existing complete Bacteroides genomes in NCBI (p = 0.76; Wilcoxon rank-sum test). This difference in distinctness between the Hadza Prevotella and Bacteroides genomes is unlikely to be due to an underrepresentation of Prevotella genomes, as each genus has a similar number of published genomes (3482 and 4199 for Prevotella and Bacteroides, respectively). It is also worth noting that the vast majority of human gut microbiota genome sequences were collected from North American and European samples (Abdill et al., 2022).
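The genome comparisons above rely on a Wilcoxon rank-sum test. As an illustration only, the sketch below computes the rank-sum statistic with average ranks for ties and the large-sample normal approximation, the same form used by SciPy's `scipy.stats.ranksums`. The preprint does not state which implementation was used; R's `wilcox.test`, for example, uses an exact distribution for small samples without ties, so p-values can differ. The function name and its inputs are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the large-sample normal approximation.

    Tied values receive their average rank; no tie or continuity correction
    is applied. Returns (z, p_value); z > 0 means x tends to be larger than y.
    """
    # Pool both samples, remembering which group each value came from.
    combined = [(v, 0) for v in x] + [(v, 1) for v in y]
    combined.sort(key=lambda item: item[0])

    # Assign 1-based ranks, averaging over runs of tied values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    # Rank sum of the first sample, compared with its mean and SD under the null.
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

A call such as `rank_sum_test(hadza_distances, reference_distances)` would compare two lists of ANI distances; both argument names are placeholders, not data from this study.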
A phylogenetic comparison of the sequenced Hadza strains to representative genomes of the same species reveals that the Hadza Prevotella and Bacteroides strains cluster with, but are distinct from, the type strains and other representatives of their respective species (Fig. 1A, Fig. S1). All 7 Hadza P. copri strains belong to Clade A of the 4 proposed P. copri subgroups, which possess >10% inter-clade genetic divergence.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted March 9, 2023; https://doi.org/10.1101/2023.03.08.531063 doi: bioRxiv preprint.

Figure 1 (caption, partially recovered): … industrialized (California). Prevalence is defined as the % of gut metagenomes from a population (column) in which a particular strain (row) is detected.

Figure S1. Clustering subspecies of Bacteroides and Prevotella. Number of representative genomes from Prevotella (A) and Bacteroides (B) subspecies used in the genome comparisons in Fig. 1. Sub-species index: genomes were clustered at 98% average nucleotide identity (ANI). Each bar represents one of these subspecies bins. Blue indicates genomes isolated for this study.
To understand the prevalence of these genomes across human populations, we compared Prevotella and Bacteroides prevalence among Hadza adults and infants; four populations from Nepal living along a lifestyle gradient, including foragers (Chepang), recent agriculturalists (Raute, Raj), and longer-term agriculturalists (Tharu); and an industrial lifestyle population (California) (Fig. 1B). We chose these groups because of their varied lifestyles and the exceptional metagenomic sequencing depth achieved, averaging 23 Gbp per sample (Jha et al., 2018; Merrill et al., 2022). Of the populations analyzed, the prevalence of the Hadza Prevotella and Bacteroides isolate genomes is most similar to that in another foraging group, the Chepang, and most distinct from that in industrial lifestyle individuals (California). Prevotella genomes are rare or absent in the industrialized population, while being more prevalent and abundant in the Hadza and agriculturalist samples. Conversely, nearly all Bacteroides genomes, including those isolated from the Hadza, are more prevalent in the industrialized population. The clear lifestyle-associated shift in Bacteroides and Prevotella prevalence raises the question of which aspects of the industrial lifestyle have driven these changes.

Dietary MACs are necessary for P. copri persistence
While many factors differentiate the industrial and non-industrial lifestyles, diet is the top candidate for driving microbiome alterations (Sonnenburg and Sonnenburg, 2014). The Hadza diet is rich in dietary MACs from foraged tubers, berries, and baobab (Marlowe and Berbesque, 2009). In contrast, the industrialized diet is typified by high caloric intake and foods rich in fat and low in MACs (Monteiro et al., 2013). We wondered whether diet alone could affect the ability of Hadza Bacteroides and Prevotella to colonize mice. […] (Fig. 2A). Bt H-2622 colonized at a density of 10^9 CFU/ml in feces on the high MAC diet, and this density was maintained in all three diet conditions (Fig. 2B). Pc H-2477 colonized to a lower density on the high MAC diet (10^7 CFU/ml) and declined drastically following the switch to the Western or no MAC diet, with no fecal CFUs detectable 7 days after the diet switch (Fig. 2C). The lack of detectable Pc H-2477 in the absence of dietary MACs was particularly striking given the absence of competition from other microbes in this mono-associated state. To our knowledge, this is the first example of a strain's apparent eradication from a mono-associated host due to a diet change. Two other P. copri strains (Hadza Pc H-2497 and Pc N-01, a non-Hadza strain isolated from an individual of African origin) are also lost in vivo in the absence of dietary MACs (Fig. S2A, B), indicating that survival of P. copri in vivo depends on the presence of dietary MACs. To better understand the strategies used by Hadza Pc and Bt to persist in vivo, we analyzed transcriptional profiling data from the cecal contents of mice mono-colonized with either Pc H-2477 or Bt H-2622 and fed a high MAC diet, relative to in vitro growth in peptone yeast glucose broth (PYG). Bt H-2622 and Pc H-2477 upregulate many genes in vivo under high MAC diet conditions (Fig. S2C, D).
Although only 18% and 13% of genes in Bt H-2622 and Pc H-2477, respectively, encode predicted carbohydrate utilization proteins, 86% (Bt H-2622) and 65% (Pc H-2477) of the genes upregulated in vivo encode carbohydrate utilization functions (p < 3.7e-12 for Bt, p < 4.5e-13 for Pc, Fisher's exact test), indicating that carbohydrate utilization is the major metabolic function of these organisms in vivo (Fig. 2D).
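The enrichment claim above can be framed as a Fisher's exact test on a 2×2 table (upregulated vs. not upregulated; carbohydrate utilization vs. other genes). The sketch below computes the one-sided version, the upper tail of the hypergeometric distribution, using only the standard library. Whether the authors used a one- or two-sided test is not stated, so the one-sided form is an assumption, and the function name and arguments are hypothetical.

```python
from math import comb


def fisher_enrichment_p(up_in_category, up_total, category_total, genome_total):
    """One-sided Fisher's exact p-value for enrichment of a gene category.

    Returns P(X >= up_in_category), where X is hypergeometric: drawing
    `up_total` upregulated genes from `genome_total` genes, of which
    `category_total` belong to the category (e.g. carbohydrate utilization).
    """
    denominator = comb(genome_total, up_total)
    max_overlap = min(category_total, up_total)
    numerator = sum(
        comb(category_total, k) * comb(genome_total - category_total, up_total - k)
        for k in range(up_in_category, max_overlap + 1)
    )
    # Exact integer arithmetic until this final division.
    return numerator / denominator
```

The arguments are, in order: upregulated genes in the category, all upregulated genes, all genes in the category, and the genome's total gene count.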
A comparison of glycosidic linkage-breaking CAZymes, glycoside hydrolases (GHs) and polysaccharide lyases (PLs), reveals that Bt H-2622 upregulates a higher proportion of GHs and PLs devoted to animal-derived carbohydrate utilization than Pc H-2477 (Fig. 2E). Specifically, in vivo under high MAC diet conditions, Bt H-2622 upregulates 8 of its 22 encoded mucus-targeting GHs (3/10 GH18; 5/12 GH20), whereas Pc H-2477 encodes no GH18s and only one GH20, which is not upregulated in the high MAC diet condition. In addition to targeting mucus carbohydrates, Bt H-2622 also upregulates 40 of its 97 plant-targeting GHs and PLs, whereas Pc H-2477 upregulates all 38 of its plant-targeting GHs and PLs on the high MAC diet (Fig. 2E). On the no MAC diet, Bt H-2622 upregulates 2 additional GH20s (along with the other mucin CAZymes upregulated on the high MAC diet) as well as 27 plant-targeting GHs and PLs relative to the in vitro condition (Fig. S2E). In the comparison between the high MAC and no MAC in vivo conditions, Bt H-2622 upregulates only 3 GHs, 2 of which degrade mucin (GH18) (Fig. 2F) (Sonnenburg et al., 2005). These data indicate that in the absence of diet-derived MACs, Hadza Bt H-2622 relies on mucus carbohydrates, and that limited mucin-degrading capabilities render Pc H-2477 incapable of sustaining colonization in the absence of dietary MACs.

Differences in carbohydrate degradation capacity between Hadza Bacteroides and Prevotella mirror those of industrialized strains
Hadza Pc and Bacteroides isolates have a similar number and predicted function of GHs and PLs to reference strains of the corresponding species (Table 2). Unsupervised clustering of GHs and PLs reveals that the Hadza strains cluster with their type strain counterparts (Fig. 3A). When comparing the Hadza strains to non-Hadza strains, we found similar total numbers of GHs and PLs and a similar distribution of substrate specificity between strains of the same species (Fig. 3B).
While Hadza Bacteroides and Prevotella strains mirror the carbohydrate-degrading capacity of their non-Hadza counterparts, large differences exist between the Bacteroides and Prevotella strains. The Bacteroides strains encode more GHs and PLs than the Prevotella strains, even when corrected for genome size (average of 251 GHs/21 PLs in Bacteroides vs. 101 GHs/5 PLs in Prevotella; Welch two-sample t-test, p = 0.00561) (Table 2, Fig. 3B). In Bacteroides, the proportions of GHs and PLs predicted to target plant and animal carbohydrates are similar (average 34% and 37%, respectively), whereas Prevotella-encoded carbohydrate degradation is biased toward plant over animal carbohydrates (average 44% and 19%, respectively) (Fig. 3C). The Bacteroides strains also encode a greater breadth of GH and PL families (averaging 68 CAZy families per genome) than the Pc isolates (averaging 40 CAZy families per genome) (Fig. S3A), consistent with previously reported distributions for Bacteroides and Prevotella strains derived from industrial lifestyle populations.
The two genera also differ in their predicted mucin-degradation capacity. CAZyme families GH18 and GH20 target carbohydrates found within the intestinal mucus lining (Luis et al., 2021). All Hadza Bacteroides isolates harbor 11-14 GH20 and 1-13 GH18 CAZymes; however, the Hadza Prevotella isolates contain only 1 or 2 GH20s, and only one isolate, Pc H-2497, contains a single GH18 (Fig. 3D, S3B).
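The per-genome comparisons in this section (number of distinct families, share of GHs and PLs assigned to plant or animal substrates, and counts of the mucin-associated families GH18 and GH20) can be tallied from a list of family annotations. The sketch below assumes one family label per GH or PL gene; the small substrate map is a hypothetical placeholder, not the assignment table the study compiled from Desai et al. (2016) and Smits et al. (2017).

```python
from collections import Counter

# Hypothetical substrate map for illustration only.
SUBSTRATE = {"GH18": "animal", "GH20": "animal", "GH43": "plant", "PL1": "plant", "GH2": "mixed"}
MUCIN_FAMILIES = {"GH18", "GH20"}


def summarize_cazymes(families, substrate_map=SUBSTRATE):
    """Summarize one genome's GH/PL repertoire from a list of family labels."""
    counts = Counter(families)
    total = sum(counts.values())
    by_substrate = Counter()
    for family, n in counts.items():
        # Families absent from the map are kept in the total but not assigned.
        by_substrate[substrate_map.get(family, "unassigned")] += n
    return {
        "total_gh_pl": total,
        "n_families": len(counts),
        "plant_fraction": by_substrate["plant"] / total if total else 0.0,
        "animal_fraction": by_substrate["animal"] / total if total else 0.0,
        "mucin_gh18_gh20": sum(counts[f] for f in MUCIN_FAMILIES),
    }
```

Running it once per genome gives the quantities that the averages above summarize; unmapped families lower both fractions rather than being dropped, which is a design choice of this sketch.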
The CAZyme content of Hadza Bacteroides and Prevotella isolates is similar to that of their non-Hadza counterparts. Relative to the Hadza Prevotella isolates, the Hadza Bacteroides isolates contain more GHs and PLs overall as well as broader substrate-degrading capabilities that include both plant- and animal-derived carbohydrates. This difference between Hadza Bacteroides and Prevotella strains is similar to that seen in non-Hadza strains, suggesting that the Prevotella niche is more reliant upon plant carbohydrates than that of Bacteroides (Gálvez et al., 2020).

Dietary MACs are sufficient to maintain Pc colonization in the presence of Bt
To test whether Hadza Bacteroides and Prevotella isolates differ in their ability to use plant- and mucus-derived carbohydrates, we cultured Hadza and type strain Bacteroides and Pc isolates in media containing the plant carbohydrate inulin, porcine gastric mucin glycans, porcine intestinal heparin, or fructose as the sole carbon source. The strains vary in their ability to use inulin, consistent with previous work (Fig. 4A) (Sonnenburg et al., 2010). Growth on mucin, however, is divided by genus: most Bacteroides isolates grow well on mucin, but the P. copri isolates do not (Fig. 4A). These data are consistent with the lack of mucin-degrading capacity within the Pc genomes and the loss of Pc colonization in vivo when the host is the sole carbohydrate source.
To determine whether the lack of diet-derived MACs is responsible for the loss of Pc H-2477 colonization in vivo, we fed mice mono-colonized with Pc H-2477 a high MAC diet and then switched them either to a custom diet containing 34% inulin by weight as the sole fermentable carbohydrate, matching the MAC content of the high MAC diet (custom diets use gelatin as a binding agent and are denoted by "-g"; Inulin-g), or to a no MAC diet (no MAC-g) (Fig. 4B) (Dubos and Pierce, 1948). The no MAC-g diet did not sustain Pc H-2477 colonization, and the strain became undetectable within one week (Fig. 4B). However, on the Inulin-g diet Pc H-2477 maintained colonization at levels similar to those observed on the high MAC diet (Fig. 2C, 4B), consistent with the requirement of dietary MACs for Pc H-2477 colonization in vivo.
We were curious how dietary MACs affect the relative abundance of Pc and Bt when the two colonize mice together. Germ-free (GF) mice were co-colonized with Pc H-2477 and Bt H-2622 and fed a high MAC diet for 7 days, then either maintained on the high MAC diet or switched to the no MAC-g or Inulin-g diet for 2 weeks, followed by a one-week period in which all mice consumed the high MAC diet (Fig. 4C). Prior to the diet switch (Day 0), mice harbored both Pc H-2477 and Bt H-2622. However, 7 days after the switch to the no MAC-g diet, Pc H-2477 decreased dramatically in abundance relative to Bt H-2622. Pc also decreased on the Inulin-g diet, but the drop was less severe than on the no MAC-g diet, indicating that inulin provided support to this strain (Fig. 4D). When mice were returned to the high MAC diet, those fed the Inulin-g diet regained a relative abundance of Pc H-2477 equivalent to baseline and to that of mice fed the high MAC diet throughout the experiment (Fig. 4E). In mice switched to the high MAC diet from the no MAC-g diet, Pc H-2477 was detectable but remained at low abundance after 7 days on the high MAC diet. These data are consistent with the requirement of dietary MACs for Pc colonization in the presence of Bt, and suggest that the variety of carbohydrates in the high MAC diet (derived from wheat, corn, oats, and alfalfa) supports Pc colonization under competition from Bacteroides better than a single MAC source such as inulin. Furthermore, prolonged absence of dietary MACs restricts the ability of Pc to regain abundance when MACs are reintroduced.

Discussion
The tradeoff between a Bacteroides- or Prevotella-dominated microbiome based on host lifestyle has been well described, but its basis is not well understood (Gorvitovskaia et al., 2016; Yatsunenko et al., 2012). Here we demonstrate that Hadza isolates of Bacteroides and Prevotella do not differ dramatically from their non-Hadza counterparts in genome-wide average nucleotide identity or carbohydrate utilization, suggesting that differences in their relative abundance and prevalence across lifestyles are due not to an inherent property of the population-specific strains themselves but to differences in their environments. Furthermore, we demonstrate that dietary MACs are crucial for Prevotella to maintain colonization: even as the sole microbe, Prevotella is eradicated when dietary MACs are removed. Bacteroides species, however, can maintain colonization in the absence of dietary MACs owing to their ability to use both plant- and host-derived carbohydrates, enabling continued colonization under low-MAC industrialized diets. Our data demonstrate that in gnotobiotic models fed dietary MACs, Hadza Bacteroides and Prevotella can coexist, as is seen in the Hadza microbiome. However, removal of dietary MACs results in a precipitous decline in Prevotella, which is slow to recover when MACs are reintroduced. A single dietary MAC, inulin, was sufficient to maintain an intermediate level of colonization that then rebounded when a more complete palette of MACs was available. These data are reminiscent of the seasonal pattern of Prevotella abundance in the Hadza, which cycles with the seasonality of their diet.
Altogether, these data are consistent with a model in which, prior to industrialization, human microbiomes harbored both Bacteroides and Prevotella species. As diets shifted from high-MAC foraged foods to low-MAC industrially produced foods, the abundance and prevalence of Prevotella diminished, to the point of extinction in some individuals. How the loss of Prevotella and the increased abundance of Bacteroides within the industrialized microbiome impact human physiology remains an important question.

LEAD CONTACT AND MATERIALS AVAILABILITY
All information and requests for further resources should be directed to and will be fulfilled by the Lead Contact, Erica Sonnenburg (erica.sonnenburg@stanford.edu).

Data and code availability
Datasets and code for analysis are available at https://github.com/SonnenburgLab/. Raw data files for WGS are in the process of being uploaded to public databases and will be freely available upon publication of this manuscript.

Experimental Model Details
Bacterial Culture
Bacteria not isolated in this study were purchased from DSMZ (P. copri DSM 18205) or ATCC (all other reference strains). Glycerol stocks were struck out on Brain Heart Infusion plates with 10% defibrinated horse blood (BHIBA) and incubated anaerobically for 24-48 h at 37ºC. All growth and culturing of Bacteroides and Prevotella strains were performed anaerobically in a Coy anaerobic chamber containing 87% N2, 10% CO2, and 3% H2.
Mouse Husbandry
All mouse experiments were performed in accordance with the Stanford Institutional Animal Care and Use Committee. Mice were maintained on a 12-h light/dark cycle at 20.5 °C and ambient humidity, fed ad libitum, and housed in flexible film gnotobiotic isolators (Class Biologically Clean) for the duration of all experiments. Swiss-Webster mice were used for gnotobiotic experiments, and the sterility of germ-free mice was verified by 16S PCR amplification and anaerobic culture of feces. Sample sizes were chosen on the basis of litter numbers, and sex and age were controlled within experiments. Researchers were unblinded during sample collection (Pruss and Sonnenburg, 2021).

Method Details
Strain Isolation from Fecal Samples
Samples for strain isolation were chosen from the samples reported previously based on the 16S abundance of either the Bacteroides or Prevotella genus (Smits et al., 2017). All isolations were performed under anaerobic conditions on YCFA-Glucose and YCFA-Baobab agar. Visible colonies from the initial plates were identified via colony PCR and re-plated onto BBE and LKV plates (Anaerobe Systems).

Clustering Genomes into Subspecies
All public Bacteroides and Prevotella genomes were downloaded from NCBI GenBank on 1/26/2021 using the program ncbi-genome-download (https://github.com/kblin/ncbi-genome-download). For Bacteroides, all genomes marked as "representative genome" in RefSeq (n=53) and genomes with assembly level "Complete Genome" or "Chromosome" in GenBank (n=71) were retained for further analysis (n=113 genomes retained of 1,229 total genomes). For Prevotella, all available public genomes were retained (n=368). Public genomes were clustered along with the isolate genomes recovered in this study using dRep v3.2.1 (Olm et al., 2017) with the command "dRep dereplicate --S_algorithm fastANI -sa 0.98 --SkipMash -nc 0.65", so that genomes with ≥ 98% ANI and ≥ 65% alignment coverage according to FastANI (Jain et al., 2018) are considered to be the same "subspecies". These specific thresholds were chosen manually based on histograms of reported ANI and alignment coverage values. Representative genomes were chosen using dRep's default scoring system with the following adjustments: public genomes marked as "representative genome" in RefSeq were given an additional 50 points, and genomes recovered in this study were given an additional 200 points.
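The subspecies rule above can be sketched as follows, assuming pairwise ANI and alignment-coverage values (as fractions) are already available, for example from FastANI. This is a simplified illustration, not dRep's algorithm: it links any pair passing both thresholds and takes connected components (single linkage), whereas dRep's default is average-linkage hierarchical clustering. The `base_score` input stands in for dRep's default genome score, which is not recomputed here; only the +50 and +200 bonuses follow the text. All function names are hypothetical.

```python
def cluster_subspecies(genomes, comparisons, ani_min=0.98, cov_min=0.65):
    """Group genomes linked by any pair with ANI >= ani_min and coverage >= cov_min.

    comparisons: iterable of (genome_a, genome_b, ani, coverage) tuples.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, ani, cov in comparisons:
        if ani >= ani_min and cov >= cov_min:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return list(clusters.values())


def pick_representative(cluster, base_score, refseq_representative, this_study):
    """Highest-scoring genome after adding the +50 (RefSeq) and +200 (this study) bonuses."""
    def score(g):
        s = base_score[g]
        if g in refseq_representative:
            s += 50
        if g in this_study:
            s += 200
        return s

    return max(cluster, key=score)
```

The 200-point bonus means an isolate from this study is chosen as the representative unless another genome in its cluster outscores it by more than 200 points on the base score.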

Evaluating Subspecies Prevalence and Phylogenetic Analysis
Metagenomic reads were downloaded from Merrill et al. […] (all other populations). Metagenomic reads were mapped to Prevotella and Bacteroides subspecies representative genomes using Bowtie2 (Langmead and Salzberg, 2012), and the resulting .bam files were profiled using inStrain (Olm et al., 2021). Genomes detected with ≥ 65% genome breadth were considered "present" in a metagenome. The prevalence of each genome in each population was calculated as the percentage of metagenomes in which the genome was detected. Phylogenetic trees were made for all Bacteroides and Prevotella subspecies representative genomes detected in at least one metagenome using GToTree v1.5.36 with the command "GToTree -H Bacteria". One outgroup from a different genus was included in each tree. Tree leaves were labeled based on GTDB taxonomy release 202 (Chaumeil et al., 2020), which in some cases classified genomes into genera other than those under which they were deposited in GenBank. Trees were visualized using iTOL (Letunic and Bork, 2021).
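The prevalence calculation above amounts to a thresholded count. The sketch below assumes genome breadth values (fraction of the genome covered, such as the per-genome breadth inStrain reports) keyed by genome and metagenome, plus a mapping from each metagenome to its population; a genome with no entry for a metagenome is treated as absent. All names are hypothetical.

```python
from collections import Counter


def prevalence_by_population(breadth, population_of, min_breadth=0.65):
    """Percent of metagenomes in each population in which each genome is present.

    breadth       : {(genome, metagenome): fraction of the genome covered}
    population_of : {metagenome: population label}
    A genome counts as present when breadth >= min_breadth; pairs missing
    from `breadth` count as absent but still contribute to the denominator.
    """
    samples_per_population = Counter(population_of.values())
    detected = Counter()
    for (genome, metagenome), value in breadth.items():
        if value >= min_breadth:
            detected[(genome, population_of[metagenome])] += 1
    genomes = {genome for genome, _ in breadth}
    return {
        (genome, population): 100.0 * detected[(genome, population)] / n_samples
        for genome in genomes
        for population, n_samples in samples_per_population.items()
    }
```

The denominator is every metagenome in the population, so each table cell (genome × population) is directly comparable across populations sequenced at different sample sizes.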

CAZyme Annotation
CAZyme annotations were performed for each isolate. An additional 20 strains of Prevotella copri available at NCBI, with variable assembly levels, were also annotated for comparison with the isolates and two model strains. All amino acid sequences were first compared to the full-length sequences stored in the CAZy database (Sept. 2021) (Drula et al., 2022) using BlastP (version 2.3.0+) (Camacho et al., 2009). Queries obtaining 100% coverage, >50% sequence identity, and an E-value ≤ 10^-6 were automatically annotated with the same domain composition as the closest reference homolog. All remaining sequences were subjected to human curation to verify the presence of each putative module. During this process, the curator could rely on (i) bioinformatics tools, including BLAST against libraries of full-length proteins, modules only, or characterized modules only, and HMMER version 3.1 (Mistry et al., 2013) against in-house built models for each CAZy (sub)family; and (ii) human expertise on the appropriate coverage, sequence identity, and E-value thresholds, which vary across (sub)families, and ultimately on verification of catalytic amino acid conservation. Hierarchical clustering of the isolates' CAZyme repertoires was performed using ComplexHeatmap (Gu, 2022). Predicted substrate assignments were compiled from previously published works (Desai et al., 2016; Smits et al., 2017).
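The automatic-annotation rule above can be expressed as a filter over BLAST hits. In this sketch the `Hit` record and its field names, and the choice of the highest-identity passing hit as the "closest reference homolog", are assumptions; coverage is taken as the percentage of the query length aligned. Queries without a passing hit are simply routed to manual curation, which the sketch does not model. Sequences with no BLAST hit at all would also need curation but do not appear in the input.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BlastP hit of an isolate protein against a CAZy reference (hypothetical record)."""
    query: str
    reference_modules: str   # domain composition of the reference, e.g. "GH43_1|CBM6"
    pct_identity: float      # percent sequence identity
    query_coverage: float    # percent of query length covered by the alignment
    evalue: float


def triage_hits(hits, min_coverage=100.0, min_identity=50.0, max_evalue=1e-6):
    """Split queries into auto-annotated (query -> modules) and needing manual curation."""
    best = {}
    queries = set()
    for hit in hits:
        queries.add(hit.query)
        passes = (hit.query_coverage >= min_coverage
                  and hit.pct_identity > min_identity  # strictly greater than 50%
                  and hit.evalue <= max_evalue)
        if passes and (hit.query not in best
                       or hit.pct_identity > best[hit.query].pct_identity):
            best[hit.query] = hit
    auto = {q: h.reference_modules for q, h in best.items()}
    manual = queries - set(auto)
    return auto, manual
```

Keeping the thresholds as keyword arguments mirrors the text's point that the curator adjusts coverage, identity, and E-value cutoffs per (sub)family during manual review.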

In Vitro Polysaccharide Growth Assays
Glycerol stocks were streaked onto Brain Heart Infusion plates with 10% defibrinated horse blood and incubated anaerobically for 24 h at 37 ºC. Isolates were passaged overnight in BHI-S (Bacteroides) or YCFA-G (Prevotella). After 16 h, cultures were diluted 1:50 (Bacteroides) or 1:10 (Prevotella) into 200 µL of culture medium in a clear, flat-bottomed 96-well plate. The growth medium consisted of a YCFA background plus 0.5% carbohydrate, except for inulin, which was added at 1.5%. OD600 was measured every 15 minutes for 48 h using a BioTek Epoch2 plate reader, with 30 seconds of shaking before each reading. Normalized OD was calculated for each carbohydrate condition by subtracting the average blank OD600 from the raw OD600 of each isolate grown in the corresponding polysaccharide. Maximum OD was defined as the highest normalized OD in the first 24 h.
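The normalization and maximum-OD steps in the growth assay can be sketched as follows. This assumes the blank wells are averaged separately at each timepoint (the text does not say whether the average is per timepoint or over the whole run) and that the 24 h window is inclusive; the function names are hypothetical.

```python
def normalize_od(od_sample, od_blanks):
    """Subtract the mean blank OD600 at each timepoint from one isolate's raw OD600 series.

    od_sample: raw OD600 readings for one isolate in one carbohydrate.
    od_blanks: list of blank-well series for the same carbohydrate, aligned in time.
    """
    n_blanks = len(od_blanks)
    # zip(*od_blanks) groups the blank readings taken at the same timepoint
    mean_blank = [sum(col) / n_blanks for col in zip(*od_blanks)]
    return [raw - blank for raw, blank in zip(od_sample, mean_blank)]


def max_od(times_h, normalized, window_h=24.0):
    """Highest normalized OD at or before window_h hours."""
    return max(od for t, od in zip(times_h, normalized) if t <= window_h)


# Toy data (4 reads instead of the 193 reads a 15-min, 48-h run would give)
times = [0.0, 12.0, 24.0, 36.0]
sample = [0.10, 0.60, 0.90, 1.20]
blanks = [[0.08, 0.08, 0.09, 0.09], [0.10, 0.10, 0.11, 0.11]]
norm = normalize_od(sample, blanks)
print(max_od(times, norm))  # the 36 h read is excluded from the maximum
```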

Colonization and Enumeration of Gnotobiotic Mice
For colonization with B. thetaiotaomicron H-2622, mice were gavaged with 300 µL of a 3 mL culture grown for 16 h in BHI-S. For colonization with P. copri, mice were gavaged with 300 µL of a 3 mL culture grown for 16 h in YCFAC, in which 10-15 lawns (~1 per mouse) of P. copri grown on BHIBA for 48 hours were suspended. For Prevotella colonization, food was removed from mouse cages and bedding was changed 12 h before gavage. Before Prevotella gavage, mice were gavaged with 300 µL of 10% sodium bicarbonate in water. Food was returned 2 h post-gavage.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted March 9, 2023.
For bicolonization experiments, mice were first colonized with Pc H-2477 and then gavaged with Bt H-2622 7 days later. Bicolonization was allowed to stabilize for 5-7 days before the diet switch. Feces were collected from individual mice. Two biological replicates of 1 µL feces were resuspended in 200 µL sterile PBS and serially diluted 1:10, and 2 µL of each dilution was plated on BHIBA. CFUs were counted after 36 h of anaerobic growth at 37 °C.
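The plating scheme above implies a simple back-calculation from a colony count to CFU per µL of feces: multiply by the dilution factor, divide by the plated volume, and scale by the resuspension volume. This sketch assumes the suspension volume is taken as 200 µL (ignoring the added 1 µL of feces) and that `n_dilutions` counts the 1:10 steps from the undiluted suspension; the function names are hypothetical.

```python
def cfu_per_ul_feces(colonies, n_dilutions, feces_ul=1.0, resuspension_ul=200.0, plated_ul=2.0):
    """Back-calculate CFU per µL of feces from the colonies counted in one spotted dilution.

    colonies: colonies counted in the 2 µL spot.
    n_dilutions: number of 1:10 serial dilutions (0 = undiluted fecal suspension).
    """
    # CFU per µL of the undiluted fecal suspension
    cfu_per_ul_suspension = colonies * 10 ** n_dilutions / plated_ul
    # Whole-suspension CFU divided by the volume of feces it came from
    return cfu_per_ul_suspension * resuspension_ul / feces_ul


def mean_of_replicates(values):
    """Average the per-replicate estimates (two replicates per mouse above)."""
    return sum(values) / len(values)


# e.g. 25 and 31 colonies in the 10^-3 spots of the two replicates
print(mean_of_replicates([cfu_per_ul_feces(25, 3), cfu_per_ul_feces(31, 3)]))
```

With these volumes, each colony in the 10⁻ᵈ spot corresponds to 100 × 10ᵈ CFU per µL of feces.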

In Vivo Competition Assays
Feces were collected from individual mice. Genomic DNA was extracted from 2 biological replicates of fecal pellets using the DNeasy PowerLyzer PowerSoil kit (Qiagen). The concentrations of Pc and Bt DNA were assessed using species-specific qPCR primers (Key Resources Table). qPCR was performed using Brilliant III Ultra-Fast SYBR Green QPCR Master Mix on a Bio-Rad CFX thermocycler. Genomic DNA from Bt H-2622 and Pc H-2477 was used to generate a standard curve for each primer pair, and the standard curves were used to calculate the absolute quantity of Bt or Pc DNA in each sample. The efficiency value (E) for each primer pair was calculated as E = 10^(−1/slope), where slope is the slope of the standard curve of Ct value plotted against log10(DNA input). Competitive index was calculated using this equation:
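The standard-curve steps described in this paragraph (absolute quantification and primer efficiency, not the competitive index itself) can be sketched as an ordinary least-squares fit of Ct against log10(DNA input), followed by inverting that line. The plain least-squares fit and the function names are assumptions for illustration; quantities come out in whatever units the genomic DNA standards were expressed in.

```python
def fit_standard_curve(log10_input, ct):
    """Least-squares fit of Ct = slope * log10(DNA input) + intercept."""
    n = len(ct)
    mean_x = sum(log10_input) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency(slope):
    """E = 10^(-1/slope); E = 2 corresponds to perfect doubling each cycle."""
    return 10 ** (-1.0 / slope)


def quantity_from_ct(ct_value, slope, intercept):
    """Invert the standard curve to get absolute DNA quantity (units of the standards)."""
    return 10 ** ((ct_value - intercept) / slope)


# Idealized 10-fold dilution series with a slope of -3.32 (about 100% efficiency)
xs = [0, 1, 2, 3, 4]
cts = [30.0 - 3.32 * x for x in xs]
slope, intercept = fit_standard_curve(xs, cts)
print(slope, efficiency(slope), quantity_from_ct(25.0, slope, intercept))
```

A slope near −3.32 gives E close to 2; noticeably shallower or steeper slopes flag a primer pair whose amplification deviates from doubling.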

Mouse Diets
The Inulin-g and No MAC-g diets were made from 32% AIN-93G Basal Mix (CHO, Cellulose Free) and 68% carbohydrate, to match the carbohydrate content of the No MAC diet (TD.150689). The Basal Mix and carbohydrate components were suspended in water (1100 mL per 250 g package of Basal Mix) with 5% bovine gelatin as a binder. The carbohydrates (100% glucose for No MAC-g; 50% glucose and 50% inulin for Inulin-g) and gelatin were dissolved separately in MilliQ water and autoclaved. The gelatin mix and AIN-93G Basal Mix (CHO, Cellulose Free) (TD.200788) were added to the carbohydrate solution in a tissue culture hood, and the mixture was allowed to solidify at 4 ºC. Diets are listed in the Key Resources Table. One week after colonization, standard chow was removed and replaced with the desired test diet, and the bedding was changed. Gelatin chow was replaced every 3 days because it dried out.
et al., 2021). Transcripts were assembled using the StringTie commands "stringtie", "stringtie --merge", and "stringtie -e -B -p 11 -G" (Pertea et al., 2016). Differential expression was analyzed using DESeq2 (Love et al., 2014).
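The gelatin diet recipe in Mouse Diets is mass-balance arithmetic built around one 250 g package of Basal Mix. This sketch assumes the 32%/68% split refers to dry mass of Basal Mix versus carbohydrate; gelatin is left out because the text does not state what the 5% is relative to. The function `diet_batch` and its parameters are hypothetical.

```python
def diet_batch(basal_mix_g=250.0, inulin_fraction_of_carb=0.5,
               basal_fraction=0.32, carb_fraction=0.68, water_ml_per_250g=1100.0):
    """Ingredient amounts for one batch of gelatin diet, scaled to the Basal Mix used.

    inulin_fraction_of_carb: 0.5 for Inulin-g (50% glucose, 50% inulin), 0.0 for No MAC-g.
    Assumes basal_fraction and carb_fraction are dry-mass fractions (gelatin excluded).
    """
    total_dry_g = basal_mix_g / basal_fraction  # Basal Mix is 32% of the dry mix
    carb_g = total_dry_g * carb_fraction        # carbohydrate is the remaining 68%
    return {
        "basal_mix_g": basal_mix_g,
        "glucose_g": carb_g * (1.0 - inulin_fraction_of_carb),
        "inulin_g": carb_g * inulin_fraction_of_carb,
        "water_ml": water_ml_per_250g * basal_mix_g / 250.0,
    }


print(diet_batch())                             # Inulin-g
print(diet_batch(inulin_fraction_of_carb=0.0))  # No MAC-g
```

Under these assumptions, one package yields about 531 g of carbohydrate per batch, split evenly between glucose and inulin for Inulin-g.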