Distinct polysaccharide growth profiles of human intestinal Prevotella copri isolates

The gut-dwelling Prevotella copri, the most prevalent Prevotella species in human gut, have been associated with diet as well as disease, but our understanding of their diversity and functional roles remains rudimentary, as studies have been limited to 16S and metagenomic surveys and experiments using a single type strain. Here, we describe the genomic diversity of 83 P. copri isolates from 11 donors. We demonstrate that genomically distinct isolates, which can be categorized into different P. copri complex clades, utilize defined sets of plant polysaccharides that can be predicted in part from their genomes and from metagenomic data. This study thus reveals both genomic and functional differences in polysaccharide utilization between several novel human intestinal P. copri strains.

the oral cavity, and at least 17 species of intestinal Bacteroides, only five Prevotella species have been reported in human intestine, with Prevotella copri being the most abundant (Accetto and Avgustin, 2015;Ferrocino et al., 2015;Ibrahim et al., 2017;Li et al., 2009;Lin et al., 2013;Liu et al., 2018). Although mechanistic models of metabolism, growth, and colonization have been established for several intestinal Bacteroides and oral Prevotella, studies of intestinal Prevotella have lacked experimental tools and relied heavily on only two strains (Hayashi et al., 2007).
Many researchers have observed Prevotella's presence and abundance by 16S sequencing, but its genomic diversity and biological properties have not been described. Because Prevotella copri are highly sensitive to oxygen, and difficult to isolate and cultivate, previous studies have been limited to the single commercially available type strain, DSM 18205, to perform metagenomic comparisons and draw experimental conclusions. Prevotella spp. have extensive gene acquisition and loss (Gupta et al., 2015;Zhu et al., 2015). In fact, accessory genes within the P. copri DSM 18205 genome account for around 40% of all genes encoded within the genome, the highest proportion of accessory genes identified in a comparison of 11 gut-associated species (Zhu et al., 2015). This estimate should be revised, however, since a metagenomic study (supplemented by whole genome sequencing data from the present study) affirms that P. copri is diverse, comprising four genomically distinct clades . Further improvements in genomic comparisons of P. copri may be expected soon, since another study reports a closed genome of P. copri, generated from long-read sequencing of whole microbiomes (Moss and Bhatt, 2018).
Whether the genomic diversity in human P. copri translates to functional diversity has not yet been fully explored. This is particularly relevant in light of elevated frequency of P. copri colonization in some cohorts of treatment-naïve rheumatoid arthritis patients (Maeda et al., 2016;Scher et al., 2013). A better understanding of whether there are functional differences in strains from patients compared to healthy subjects will contribute to determining whether there is a causal relationship between P. copri and this autoimmune disease.
Several studies have found associations between diets and the presence of intestinal Prevotella.
Prevotella have been positively associated with diets high in plant fiber and carbohydrates, and negatively associated with fat and amino acid diets (De Filippo et al., 2010;Ruengsomwong S, 2016;Wu et al., 2011). Genes associated with carbohydrate catabolism were identified in P. copri metagenomes and correlated with vegan diets (De Filippis et al., 2019). The plant polysaccharide xylan has been used to select for growth of human Prevotella (Tan et al., 2018).
Prevotella isolates from livestock grow in the presence of plant polysaccharides and encode gene products capable of breaking down these polysaccharides (Accetto and Avgustin, 2019). Thus, diet may be an important factor promoting growth of intestinal Prevotella. Whether human P. copri are uniquely suited to digest dietary components highly abundant in plant-based diets remains unsubstantiated by experimental and genomic evidence.
Intestinal commensal bacterial genomes encode a system of genes, often clustered, that collaborate to harness energy from carbohydrates present in the host's intestinal lumen.
Extensive and careful work in Bacteroides species has identified several polysaccharide utilization loci (PULs) that are specific for glycans derived from plant, animal, and host sources (Martens et al., 2008;Martens et al., 2011). Bacteria lacking one of several critical genes in any polysaccharide-specific PUL fail to grow when the polysaccharide is provided as the sole carbon source (Martens et al., 2011;Raghavan et al., 2014). PULs have been linked to diet and subsequently exploited in experimental systems. Notably, a seaweed-utilizing PUL was horizontally transferred from marine bacteria to intestinal bacteria of seaweed-ingesting hosts (Hehemann et al., 2010). This PUL was experimentally introduced to B. thetaiotaomicron, and seaweed-selective growth was demonstrated in laboratory mice (Shepherd et al., 2018). Although by far the most extensive work on PULs has been performed in Bacteroides species, PULs have been annotated in Prevotella type strains and used to predict growth on plant polysaccharides in Prevotella isolates from livestock Avgustin, 2015, 2019). Despite their association with plant diet and PUL predictions, the functional capabilities of human intestinal Prevotella have not been tested, due to a dearth of cultured human intestinal Prevotella isolates.
Previous studies have been limited to analysis and experimentation with the available genomes and strains of Prevotella species found in human, two of which were isolated from the same individual (Hayashi et al., 2007). Here, we report whole genome sequencing of 83 Prevotella copri isolates from the stool of eleven individuals. We observe extensive genomic diversity of P. copri isolates within and between hosts. This genomic variation was exemplified by specific gene clusters and linked to functional growth of isolates on predicted plant polysaccharides, explaining the association of colonization of Prevotella with a plant-rich diet.

Isolation of Prevotella from human stool
To investigate the genomic diversity of Prevotella copri and also identify strains for functional analysis, we developed a qPCR screening and isolation strategy for P. copri from human stool ( Figure S1, Table S1, Methods). P. copri was identified in 31/63 individuals by qPCR ( Figure   S1). 16S rRNA gene sequencing of the hypervariable V3-V4 region found P. copri in 41/63 individuals, with 18/63 individuals harboring P. copri at levels >10% relative abundance ( Figure   1A). Relative abundance ranged from < 0.01 to 0.766. From the V3-V4 16S rRNA gene data, five P. copri 16S variants were identified ( Figure 1A, Supplemental file 1). The P. copri 16S variants displayed 2-6 single-nucleotide variants (SNVs) within the 253 bp region, corresponding to >97% sequence-sequence identity (Callahan et al., 2016). P. copri variants 1 and 2 were the most abundant ones in the stool samples. P. copri detection by qPCR positively correlated with P. copri detection by 16S sequencing (Figure 1B). The qPCR screen reliably detected P. copri in samples with >1% P. copri, as determined by 16S sequencing, but was less sensitive for samples containing P. copri at lower relative abundance. We collected 83 isolates from eleven individuals harboring P. copri ( Table 1), one of whom had <1% P. copri (Methods).

Variability in P. copri isolates
Our earlier study indicated differences in the P. copri genomes from new onset rheumatoid arthritis and healthy individuals (Scher et al., 2013). Other studies have since described genetic variation between strains from healthy individuals (De Filippis et al., 2019;Zhao et al., 2019).
We assessed whether genetic diversity existed between P. copri isolates from different individuals, or, furthermore, between isolates from the same individual, by sequencing the full genomes of the isolates. Alignment of filtered isolate sequencing reads to the P. copri PanPhlAn database revealed that fewer than 60% of reads mapped to the reference genome, indicating unexpected and unexplored diversity of P. copri genomes in our isolates (Figure 2A) (Scholz et al., 2016).
Evaluation of the de novo assembled genomes revealed low contamination and similar genome size among all isolates (Table S2) (Bankevich et al., 2012;Parks et al., 2015). Phylogenetic analysis based on the 16S rRNA gene revealed that most of the isolates clustered with Prevotella copri DSM 18205 ( Figure 2B). One isolate clustered with Paraprevotella, and three isolates clustered with other species of Prevotella; these four isolates were excluded from the rest of the analysis. A separate phylogenetic analysis based on 30 conserved ribosomal proteins further confirmed that the isolates clustered with P. copri, and also revealed distinct clades ( Figure 2C).
Comparison with a catalog of P. copri genomes from metagenomic assembly confirmed that most isolates were in a main clade (clade A), four isolates were in another clade (clade C), and one isolate each was in two additional clades (clades B and D) . Hosts AQ, A, and AA harbored isolates belonging to more than one clade ( Figure 2C).

P. copri genomic diversity between hosts
Despite close phylogenetic placement, most isolates exhibited striking genetic dissimilarities to the P. copri reference strain (Figure 2A). To better understand the differences in the P. copri genomes, we compared the presence and absence of predicted ORFs in all the isolates ( Figure   2D). This analysis revealed a core genome of approximately 1750 genes, suggesting accessory genomes ranging from approximately 1250-2250 genes per isolate.
Clustering of the isolates by presence/absence of predicted ORFs revealed that isolates often clustered by donor. When isolates from the same donor fell into the same clade, accessory genomes were often unique to isolates from individual hosts, as visualized by blocks of genes in the accessory genomes in Figure 2D. Host AK had isolates with distinct accessory genomes within the same clade (Figures 2C and 2D). Annotation of the genomes revealed that, among other functions, many of the host-specific genes were involved in polysaccharide transport, as indicated by the high frequency of susC genes ( Figure 2D).

Polysaccharide utilization prediction
SusC is the bacterial transmembrane protein that transports polysaccharides into the periplasm.
SusC collaborates with the outer-membrane glycan-binding protein SusD, as well as other proteins, to bind and digest polysaccharides. Conveniently, the genes for this polysaccharide utilization machinery usually occur grouped in close proximity to one another in the genome in polysaccharide utilization loci (PULs) in Bacteroidetes, although occasionally genes encoding polysaccharide catabolism enzymes are located elsewhere in the genome (Sonnenburg et al., 2010). We hypothesized that the genetic diversity of susC genes might indicate functional diversity in polysaccharide utilization by the isolates. We surveyed the susC and surrounding genes in the isolate genomes to predict the repertoire of PULs and, by extension, polysaccharide substrates for the P. copri isolates.
By clustering susC genes at 90% identity, and examining the neighboring gene products within each genome, we identified 125 susC gene clusters with a neighboring susD gene, which we called predicted PULs (pPULs) (Methods, Figure S2, Table S3). Based on the genes flanking susC in each pPUL, we predicted the substrate for 87 pPULs. Due to a high frequency of hypothetical proteins, we were unable to predict substrates for 38 pPULs, which were classified as "PULs of unknown substrate." The predicted substrate, as well as the distribution within isolate genomes, was analyzed for each pPUL ( Figure 3A). The same substrate was often predicted to be targeted by PULs with different susC clusters in different isolate genomes. For example, xylan was the pPUL substrate for susC clusters 2, 65, and 3. SusC cluster 2 was only in isolate genomes from individual BU, susC cluster 65 was only in the genome of isolate A622, and susC cluster 3 was only in isolate genomes from individual P. We identified unique pPULs for beta-glucanases and alpha-L-arabinofuranosidases ( Figure 3B). Although beta-glucanases break down linkages in beta-glucan, and alpha-L-arabinofuranosidases break down linkages in rhamnogalacturonan and arabinan, a polysaccharide substrate containing both linkages has not been described. In contrast with several Bacteroides species, which catabolize polysaccharides from both animal and plant sources, all identified pPUL substrates for P. copri isolates were from plant sources, and none were from animal sources (Martens et al., 2011;Salyers AA, 1982).
Consistent with this prediction, the V1-V3 16S sequences of select isolates were most closely related to two previously reported non-omnivore-associated P. copri Consistent with the 17 predicted PULs for P. copri DSM 18205, the number of predicted PULs per isolate ranged from 14-28, with a median of 23 (Terrapon et al., 2018). Intestinal Prevotella have been reported to utilize xylan and, as expected, xylan pPULs were ubiquitously represented throughout the isolate genomes (Tan et al., 2018). A survey of the xylan pPULs in our dataset revealed additional features ( Figure S4): All isolates contained xylan pPULs, as many as seven.
Each genome contained 4-5 xylan catabolism enzymes that were present in pPULs. SusC/D pairs in xylan pPULs often occurred in tandem.

Polysaccharide utilization in vitro
Based on the pPUL analysis, we predicted that P. copri isolates would grow in the presence of plant polysaccharides identified in the pPUL substrate repertoire ( Figure 3A). We therefore tested the ability of P. copri isolates to use predicted polysaccharides as the sole carbon source in vitro ( Table S4, substrates in bold in Figure 3A). Growth on individual polysaccharide substrates revealed functional diversity between isolates ( Figure 4A, 4B, Figure S5). For example, isolate AA108 reached a high OD when pullulan was provided as the sole carbon source, but isolates P54 and T2112 did not. Interestingly, isolates from the same individual grew in the presence of different polysaccharides: compared to isolate AK1212, isolate AK2718 grew to a higher OD in the presence of PGA and rhamnogalacturonan, and a lower OD in the presence of beta-glucan and starch. As predicted, none of the isolates grew when the animal polysaccharides chondroitin or heparin were provided as the sole carbon source ( Figure S5).
Growth (or lack of growth) on particular polysaccharides distinguished isolates from each clade ( Figure 4B, Table 2). In particular, arabinan, levan, inulin, xyloglucan, beta-glucan, and glucomannan growth distinguished isolates from each clade. All isolates except AQ1173 (which fell into clade B) grew on arabinoxylan and arabinan ( Figure 4A and 4B). Isolate AQ1173 also failed to grow on xyloglucan and glucomannan. Isolate A622 (which fell into clade D) failed to grow on both levan and inulin. Tested isolates from clade C failed to grow on beta-glucan, xyloglucan, glucomannan, and rhamnogalacturonan I. Tested isolates from clade A grew well on most substrates, although there was a range of growth among clade A isolates on several polysaccharide sources.
Isolates AQ1173 and A622 were the only isolates that fell into clades B and D, respectively.
Since growth data from single isolates limited our ability to generalize to those clades, we asked whether pPULs for relevant polysaccharides were present or absent in these clades using P. copri metagenome-assembled genomes (MAGs) reconstructed from metagenomic datasets (Tett et al., 2019) (Methods, Figure 4D). Consistent with our observation that isolate AQ1173 (Clade B) lacked arabinan, arabinoxylan, and glucomannan PULs and also did not grow in the presence of those polysaccharides (Figure 4), the Clade B P. copri MAGS lacked pPULs capable of degrading those polysaccharides. The Clade B MAGs did contain pPULs capable of degrading beta-glucan, however, consistent with our pPUL analysis and growth observations. pPULs were found for glucomannan and arabinan in the Clade D P. copri MAGs consistent with our predictions and growth observations for isolate A622 (Clade D) (Figure 4).

Polysaccharide growth predictions
We next evaluated how well the pPULs analysis predicted in vitro growth: 170 isolate/polysaccharide conditions matched predictions while 85 did not ( Figure 4C). We observed 42 instances in which we did not predict particular polysaccharide utilization by the isolate, yet we observed growth. For 29 of these cases of unpredicted growth, we found genes encoding relevant polysaccharide breakdown enzymes elsewhere in the genome. Perhaps the genes are part of fragmented PULs (as has been observed in Bacteroides fructan PULs (Sonnenburg et al., 2010)), or are simply located on a contig separate from the rest of the PUL.
More puzzlingly, we observed isolates that failed to grow on a particular substrate, despite presence of a corresponding intact pPUL in the genome. For example, we predicted arabinogalactan PULs in most of genomes of clade A, yet most failed to grow on this substrate, and those that did, particularly P. copri isolate BVe41210, grew poorly. Perhaps P. copri isolates prefer a different form or source of arabinogalactan. Additionally, substrate predictions were based on gene annotations that may be incomplete or inaccurate. Overall, the pPUL analysis performed well in predicting growth. Welch's t-test on predicted growth or unpredicted growth had a p-value < 0.0001 (Methods).

Discussion:
In this study, we observed genomic and functional diversity of human intestinal P. copri isolates.
Whole genome sequencing exposed diversity of P. copri isolates, demonstrated by the low percentage of isolate whole genome reads that mapped to the P. copri PanPhlAn database.
Within-species genomic variation, both between individuals and within individuals, was demonstrated by genetic diversity of susC genes and PULs and linked to functional growth of isolates on predicted plant polysaccharides. PULs predicted for several polysaccharides in large metagenomic datasets support our predictions and growth data for representative isolates, indicating that the results of this study may be generalizable beyond the isolates shown here.

Variability in P. copri genomes
ORF presence/absence among isolates revealed host-specific genes in the accessory genome, several of which were susC transport protein genes. The variability of the susC genes associated with P. copri strains in different hosts is intriguing, given their common function of transporting glycans across the outer membrane. Since susC gene products are on the cell surface, they may be susceptible to selective pressure from the immune system, e.g. antibodies. Another possibility is that susC products from different isolates have slightly different substrates, as has been observed in two closely related B. thetaiotaomicron strains (Joglekar et al., 2018).
In addition to susC genes, we also observed integrases, transposases, and transcriptional regulators among the host-specific genes in the accessory genome. These genes may reflect the way P. copri behaves within the host intestine, which may be influenced by the host diet or other microbes. Isolates from host AK all fell into the same clade and possessed two distinct accessory genomes (Figure 2D), suggesting distinct niches for each type of isolate. Perhaps one type of isolate inhabits the cecum, the other the colon; alternatively, one type could reside in the lumen, while the other could be mucosa-associated. Further investigation of the genes unique to each host or isolate may reveal unique roles for individual isolates.

P. copri presence in the human intestine may be diet-dependent
In humans, differences in P. copri 16S sequences and metagenomes have been correlated with host diet (De Filippis et al., 2019;De Filippis et al., 2016). Although our study lacks host diet information, the PUL analysis predicted and in vitro growth experiments confirmed that the P. copri isolates used only plant polysaccharides. A previous study found that particular P. copri V1-V3 16S sequences were associated with diet (De Filippis et al., 2016). The V1-V3 sequences of select isolates from different clades in this study were most closely related to two nonomnivore-associated sequences, consistent with their growth on plant polysaccharides ( Figure   S3). Although the dynamics of dietary and intestinal growth of P. copri within a complex microbiome require experimental validation, in vitro data presented here suggest that plant polysaccharides promote P. copri growth, but animal polysaccharides do not. These data provide a possible mechanism for the correlation between vegetable-rich diets and abundance of intestinal Prevotella: perhaps continuous availability of dietary plant polysaccharides is required to maintain Prevotella presence in the gut, and their absence may result in elimination of Prevotella from the human intestinal microbiome.
In hosts with more than one strain of P. copri, representative isolates often complemented each other to grow in the presence of more polysaccharides than any individual isolate. This point was illustrated best by isolates from donor AQ: isolates AQ1173 and AQ1179 did not grow on xyloglucan or glucomannan, but AQ1149 grew on both polysaccharides ( Figure 4A). These results suggest that the total P. copri population in hosts with multiple strains of P. copri was capable of catabolizing a greater diversity of polysaccharides than any individual P. copri strain.
Indeed, a separate study found that non-Western microbiomes contained multiple P. copri clades, and that some carbohydrate active enzymes were differentially represented between clades . Whether microbiomes of individuals with only one P. copri strain completely lack the ability to break down additional polysaccharides or rely on other bacterial species to maintain a full complement of polysaccharide digestion should be the subject of future metagenomic studies.

Prevotella polysaccharide utilization
Although the pPUL analysis predicted growth surprisingly well, incorrect predictions may indicate interesting biology or areas in need of technical improvements. In cases in which PULs were predicted but growth was not observed, perhaps mutations in the genes encoding the enzymes or regulators have rendered the gene products non-functional. Future studies should aim to evaluate how often and how rapidly such mutations might occur in intestinal commensals.
Open genomes may account for some cases in which polysaccharide utilization was not predicted by the presence of a PUL, yet growth was observed. If a PUL were split across contigs, it would be impossible to accurately predict its substrate. Recent closure of a P. copri genome from metagenomic analysis may help reduce such instances in future studies (Moss and Bhatt, 2018).
Utilization of xylan, which is found in cereal grains, has been repeatedly established in Prevotella species and in P. copri specifically (Accetto and Avgustin, 2019;da Silva et al., 2012;Dodd et al., 2011;Tan et al., 2018). Two distinct PULs encoding enzymes not previously associated with xylan catabolism were identified for corn xylan, birchwood xylan, and arabinoxylan in multiple B. ovatus and B. xylanisolvens strains (Rogowski et al., 2015). The numerous xylan-degrading enzymes and xylan pPULs identified in P. copri isolates in this study suggest that P. copri may have an expanded xylan degrading enzyme repertoire, and possibly a superior ability to target xylans, compared to other intestinal bacteria.
We identified three separate pPULs with enzymes capable of breaking down linkages found in beta-glucan, rhamnogalacturonan, and arabinan. The structure of the corresponding polysaccharide is mysterious. Arabinose and xylose decorate the beta-glucan backbone of xyloglucan. Accordingly, xyloglucan degrading PULs with beta-glucan, xylan, and arabinan catabolizing enzymes are well described in Bacteroides (Larsbrink et al., 2014;Nishinari et al., 2007). Structures and corresponding enzymes for beta-glucan, arabinan, and rhamnose, but not xylose have not yet been described. Arabinose and rhamnose cleavage enzymes imply that the PUL might act on a complex pectin, although beta-glucan linkages have not been described in pectins. One possibility is that some of the genes in these pPULs are non-functional. A more intriguing possibility is that these pPULs are unique to Prevotella and utilize a yet undescribed polysaccharide.
Xyloglucan is an abundant polysaccharide in vascularized plants, seeds, and food additives, yet xyloglucan utilization by known intestinal microbes is relatively rare, demonstrated only in a low proportion of Bacteroides species (Larsbrink et al., 2014;Nishinari et al., 2007). Despite its abundance in the human diet, xyloglucan utilization was overwhelmingly difficult to predict ( Figure 4C). In fact, we observed the keystone xyloglucan endo-beta-1,4-glucanase enzyme in pPULs only in isolates A624 and AQ1179 which did not grow in the presence of xyloglucan (Larsbrink et al., 2014). The same enzyme was encoded in the genomes of isolates AA615, AA917, and A622, although not in close proximity to susC and susD genes. All other genomes lacked this enzyme. Perhaps Prevotella possess novel xyloglucan PUL structures or enzymes for this abundant dietary polysaccharide. Given the rarity of xyloglucan catabolism genes reported to date in Bacteroides, perhaps P. copri is responsible for xyloglucan degradation in hosts lacking Bacteroides species possessing xyloglucan-breakdown genes.

Implications
The collection of P. copri isolates and their corresponding polysaccharide preferences described herein enable future experimentation in vitro and in vivo with isolates whose cultivation was previously elusive perhaps because of the absence of their required specific plant polysaccharides. The human genome encodes enzymes that break down only starch, sucrose, and lactose (El Kaoutari et al., 2013). Carbohydrates make up 45-65% of typical human diets (Board., 2005). Therefore, the ability of intestinal microbes to break down polysaccharides is critical for human survival. Whether P. copri strain-type and presence is meaningful to human health, or simply an indication of diet, requires further investigation. Future studies that aim to assess human microbiome interactions with disease will require dietary intervention or, at minimum, collection of patient dietary habits or records. To avoid confounding effects of diet, future metagenomic analyses may better illuminate disease-associated genes by excluding genes known to interact with host diet. If P. copri does exacerbate or ameliorate disease in specific populations, diet may be readily manipulated to decrease or increase P. copri growth.
Establishment of polysaccharide substrate preferences for genetically diverse human P. copri isolates allows for informed experimentation in vitro and in vivo, as well as clinical study design and interpretation.

Acknowledgements:
We thank Paul Zappile and Adriana Heguy at the NYU Genome Technology Center for sequencing support. We thank Rhina Medina and Lucy Alvarado for sample collection, Michael

Declaration of interests:
The authors declare no competing interests. Prevotella copri 16S SNVs in human stool quantified by 16S sequencing. Percentage of total reads from 16S rRNA gene sequencing is shown. Stars indicate individuals from whom P. copri was isolated. B. Correlation of P. copri abundance determined by 16S sequencing and qPCR using each qPCR primer set (Table S1).       Polysaccharides, see Table S4 see Table S4 see Table S4 LEAD CONTACT AND MATERIALS AVAILABILITY Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Dan R. Littman (Dan.Littman@med.nyu.edu).

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Subjects enrolled in this study included new onset rheumatoid arthritis patients and healthy controls, as described in STAR methods. Ethical approval was granted by the Institutional Review Board of New York University School of Medicine.

Sample collection
Stool was collected into anaerobic collection tubes (Anaerobe Systems) from healthy controls and new onset rheumatoid arthritis patients, deidentified, and frozen at -80°C. New onset rheumatoid arthritis patients were untreated, presenting with symptoms, seropositive for anticitrullinated protein antibodies, and not on antibiotics in the previous six months.

qPCR Screen for Prevotella
Genomic fecal DNA was extracted from fecal samples using the MoBio PowerSoil DNA extraction kit. Fecal DNA was screened for the presence of P. copri using qPCR primers designed for six conserved regions of the P. copri genome (Scher et al., 2013), as well as two conserved regions of the Prevotella 16S gene (  (Caporaso et al., 2012). Sequences generated from amplicon sequencing were trimmed (maxN=0, truncQ=2) and denoised using DADA2 (Callahan et al., 2016).
Sequences were annotated using the Silva nr v128 database.

Whole genome sequencing
Isolates were grown on BRU plates for 24-48 hours, until a healthy lawn was present. The lawn was suspended in PYG media (Anaerobe Systems), pelleted, and DNA was isolated with the MoBio PowerSoil DNA isolation kit. Libraries were prepared for sequencing with the TruSeq PCR-free library preparation kit and sheared to 500bp fragment length. Prepared libraries were sequenced on Illumina HiSeq2500 2x100 bp. Paired-end sequences were trimmed using Trimmomatic (Bolger et al., 2014), then whole genomes were assembled using SPAdes (Bankevich et al., 2012). Trimmed reads were also mapped to the panphlan_pcopri_16 database (https://bitbucket.org/CibioCM/panphlan/wiki/Pangenome%20databases) using PanPhlAn (default parameters, Figure 2A) (Scholz et al., 2016). Quality and completeness of the assemblies were evaluated using CheckM (Parks et al., 2015). Genomes were annotated in PATRIC and are available at https://www.patricbrc.org/ and under BioProject PRJNA559898.

SusC analysis and polysaccharide utilization prediction
All annotated susC genes in the isolate genomes were clustered at >90% identity. This analysis revealed 168 susC gene clusters throughout all the genomes ( Figure S2). For each susC gene cluster, ten flanking genes on either side were pulled from the genome and annotated (Table S3).
Flanking genes were examined and classified according to predicted substrate. SusC clusters lacking a companion susD gene, as well as those comprised of genes unrelated to polysaccharide catabolism, were excluded from the pPUL repertoire analysis.

PUL prediction in P. copri MAGs
SusC and susD homologs in P. copri MAGs  were identified by blastp (Identity>75% , Bitscore > 300) using susC and susD predicted proteins from the P. copri isolate genomes. Only MAG contigs containing both susC and susD homologs within 10 gene products were considered further. For each susC centroid, the PUL operon was extended to include the 10 gene products upstream and downstream of the first and last occurrence of a susC or SusD homolog. Each PUL was checked for CAZy annotations (Lombard et al., 2014) predicted using HMMSEARCH (version 3.1b2) (Eddy, 2011) against dbCAN HMMs v6 (Yin et al., 2012) with an E-value < 1e-18 and coverage > 0.35. PULs were predicted by searching for the presence of CAZymes known to act on known polysaccharides. Predictions are presented as the proportion of P. copri MAGs containing at least one pPUL out of all P. copri MAGs within each P. copri complex clade.

Prediction analysis
For each isolate used in the in vitro polysaccharide growth assay, a one-way ANOVA was performed on the maximum OD for each polysaccharide. A post hoc Dunnet's test was performed with the H2O condition (no carbon source) as the negative control. Isolatepolysaccharide conditions with OD values that were statistically significant compared to the H2O control were considered positive for growth. Isolate-polysaccharide conditions that were not significant were considered negative for growth.

Statistical analysis of prediction
Isolate-polysaccharide conditions were separated according to whether growth was predicted: if a pPUL was present in the genome of an isolate, the isolate-polysaccharide condition was considered "growth predicted." If a pPUL was absent in the genome of an isolate, the isolatepolysaccharide condition was considered "growth not predicted." Isolate-polysaccharide conditions that were statistically significant as determined by a one-way ANOVA with a post hoc Dunnet's test using the H2O condition (no carbon source) as the negative control were given a score of 1 and isolate-polysaccharide conditions that were not significant were given a score of 0. Welch's t test was performed on the data.

DATA AVAILABILITY
All 16S sequencing data and whole genome sequencing data are available under BioProject

PRJNA559898.
Supplemental Information: Figure S1. PCR-based screen for P. copri-containing fecal microbiota; related to Figure 1 Eight sets of primers were designed to amplify conserved regions of the P. copri genome. Fecal DNA from each donor was screened by qPCR using each primer set (Table S1). Stars indicate individuals from whom P. copri was isolated. SusC homologs were identified in the genomes and clustered by 90% sequence identity.
Adjacent polysaccharide genes were identified and used to annotate putative PULs from each susC cluster.   Supplemental Tables: Table S1. P. copri primers table, related to Figure 1 and Figure S1   Table S4. Polysaccharides used, related to Figure 4 Supplemental Files: File S1. P. copri 16S SNV sequences, related to Figure 1    Figure S1. PCR-based screen for P. copri-containing fecal microbiota; related to Figure 1 Eight sets of primers were designed to amplify conserved regions of the P. copri genome. Fecal DNA from each donor was screened by qPCR using each primer set (Table S1). Stars indicate individuals from whom P. copri was isolated.     Table S3, and Methods). Predicted substrate usage for each isolate or group of isolates is indicated by a blue box. Stars next to cluster IDs indicate susC genes that have different surrounding enzyme genes in different isolates. B. Schematic of modified beta-glucan pPULs (polysaccharide source not yet identified). Protein families are labeled, and specific predicted functions are listed beneath relevant genes. Hypothetical proteins are unlabeled in black. Genes unlikely to be relevant to polysaccharide utilization are labeled and colored gray. SusC homologs were identified in the genomes and clustered by 90% sequence identity. Adjacent polysaccharide genes were identified and used to annotate putative PULs from each susC cluster.    Isolates were cultured anaerobically at 37 degrees C in YCFA media supplemented with individual polysaccharides as the sole carbon source. OD600 was measured at 24, 48, and 72h. Data represent 2-3 experiments, with each condition tested in duplicate or triplicate. The highest OD from 24, 48, or 72h for each condition for each experiment is shown. AS: media prepared by Anaerobe Systems. Heatmap (A) shows the mean of the highest OD for each condition of each experiment. P. copri complex clades are indicated above the heatmap. Graphs (B) show the highest measured OD on select polysaccharide substrates. Isolates from the same P. copri complex clade are plotted in the same color . C. Growth data were matched to growth predictions based on the presence or absence of pPULs in the genomes. Growth was considered positive if the OD was statistically significant, as determined by one-way ANOVA and post hoc Dunnett's test using the H2O (no carbon source) as the control (Methods). D. PULs were predicted in P. copri MAGs. Proportion of P. copri MAGs within each clade containing pPULs for select polysaccharides is shown.   Table S1. P. copri primers table, related to Figure 1 and Figure S1 primer set description; F/R sequence 5' --> 3'