Genome-scale metabolic reconstruction analysis of Clostridioides difficile identifies conserved patterns of virulence-related metabolic reprogramming

Clostridioides difficile is a Gram-positive, sporulating anaerobe that has become the leading cause of hospital-acquired infections. Over the previous decade, many studies have demonstrated the importance of metabolism in numerous aspects of C. difficile biology from initial colonization to regulation of virulence factors. Additionally, due to growing threats of antibiotic resistance and recurrent infection, targeting components of metabolism presents a novel possible approach to combat this infection. In the past, genome-scale metabolic network analysis of bacteria has enabled systematic investigation of the genetic and metabolic properties that potentially contribute to downstream phenotypes as well as prediction of outcome from perturbations to these pathways. These predictions ultimately create a platform for high-throughput identification and screening of metabolic targets prior to laboratory testing. To accomplish these goals in C. difficile, we constructed highly-curated genome-scale metabolic network reconstructions (GENREs) for a well-studied laboratory strain of the pathogen (str. 630) as well as a more recently characterized hyper-virulent isolate (str. R20291). These computational modeling platforms account for key components of C. difficile core metabolism and nutrient acquisition systems to recapitulate metabolic behaviors within the complex milieu of the gut. Simulating the impact of single-gene deletions resulted in accuracies of ~89.9% for both GENREs compared with transposon mutant libraries. Further analysis of both strains also revealed significant correlations between in silico and experimentally measured growth in carbon source utilization screens (p-values ≤ 0.002), with positive predictive values of ~95.0%. Subsequently, we generated context-specific models by integrating transcriptomic data from C. difficile grown in vitro or during in vivo infection. 
Simulations also predicted consistent inverse patterns of carbohydrate and amino acid catabolism that corresponded with differential virulence factor expression measured experimentally. Collectively, our results indicate that GENRE-based analyses of C. difficile are an effective means of gaining novel insight into metabolism as it relates to pathogenesis and provide a platform for the identification of novel therapeutic targets.

GENREs must be revisited or remade entirely to improve the quality of the resultant metabolic predictions. As such, we began with the updated genome of the highly-characterized laboratory strain C. difficile str. 630 24 , first generating a de novo reconstruction followed by extensive literature-driven manual curation of catabolic pathways, metabolite transport, and a biomass objective function. We proceeded to use this reconstruction as a template to also create a curated GENRE for the more recently isolated hyper-virulent strain R20291 25 .
Predictions from both GENREs were subsequently compared against published in vitro gene essentiality and carbon utilization screens. These predictions indicated a high degree of agreement across experimental datasets. To then assess the application of our GENREs for in situ metabolic prediction, we integrated transcriptomic data collected from both in vitro and in vivo conditions into our models and assessed the emergent metabolic activities. Analysis of context-specific pathogen metabolism revealed conserved patterns of metabolism. Across states of increased virulence, both strains of C. difficile favored increased fermentation of amino acids and decreased capacity for glycolysis. These trends agreed with published phenotypes 10,26 , and supported the advantage provided by GENREs for delineating complex metabolic networks and patterns of gene expression into more tractable experimental targets. Additionally, in vivo gene essentiality analysis highlighted specific aspects of nucleotide scavenging as critical for growth during infection, which may provide preliminary targets for future inhibitor discovery. Overall, high-quality GENREs can greatly augment the discovery of novel therapeutics to treat C. difficile infection (CDI) due to the connections between metabolic signals and colonization or virulence induction in C. difficile. Finally, the current study lays the groundwork for systems-level analyses of CDI-associated metabolism in the context of complex extracellular environments like the gut microbiome during infection.

Current State of C. difficile Genome-scale Metabolic Modeling Efforts
We began by collecting and assessing the quality of existing C. difficile GENREs. The primary focus of curated C. difficile metabolic modeling efforts has been on the first fully sequenced strain of C. difficile, str. 630.
A high degree of additional genomic and phenotypic characterization was later performed for this isolate, making it an ideal candidate for representative GENRE creation. The first reconstruction effort (iMLTC806cdf 20 ) and subsequent revision (icdf834 20,21 ) were followed by a recent de novo creation following updated  28 . Another GENRE was developed for str. 630Δerm (iHD992 22 ), a strain derived from str. 630 by serial passage until erythromycin resistance was lost 29 . Four additional C. difficile strain GENREs were generated as a part of an effort to generate numerous new reconstructions for members of the gut microbiota 30 ; these reconstructions received only semi-automated curation performed without C. difficile-specific considerations.
To establish a baseline for the metabolic predictions possible with current C. difficile GENREs, we selected common criteria with large impacts on the quality of subsequent predictions for model performance (Fig. S1A). The first of these metrics is the level of consistency in the stoichiometric matrix [31][32][33] , which reflects proper conservation of mass and that no metabolites are incorrectly created or destroyed during simulations.
The next metric is a ratio for the quantity of metabolic reactions lacking gene-reaction rules to those possessing associated genes 34 , which may indicate an overall confidence in the annotation of the reactions.
These features reflect the importance of mass conservation and deliberate gene/reaction annotation which each drive confidence in downstream metabolic predictions, omics data integration, and likelihood for successful downstream experimentation. We found that each GENRE performed well in some categories, but unique challenges were found in each which made comparing simulation results across models challenging.
For example, neither iMLTC806cdf nor iHD992 has any detectable gene annotations associated with the reactions it contains. A high degree of stoichiometric matrix inconsistency was detected across icdf834, iHD992, and iCN900; in iHD992, many intracellular metabolites could be generated without acquiring necessary precursors from the environment. These findings reinforced the value of proper biochemical constraints for GENREs to allow for improved fidelity to the target organism's in situ metabolism.
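The two baseline metrics described above can be illustrated with a minimal sketch. It assumes each metabolite has a flat elemental formula and that gene-reaction rules are stored as strings; the function names, reaction identifiers, and gene names are hypothetical and are not taken from any of the GENREs discussed. It checks elemental balance only and is not the stoichiometric consistency test of the cited methods.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count elements in a flat formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def mass_imbalance(stoichiometry, formulas, tolerance=1e-9):
    """Net element change of one reaction; an empty dict means mass balanced.

    Reactants carry negative coefficients and products positive ones.
    """
    net = Counter()
    for metabolite, coefficient in stoichiometry.items():
        for element, count in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * count
    return {e: total for e, total in net.items() if abs(total) > tolerance}


def orphan_ratio(gene_rules):
    """Reactions without a gene-reaction rule divided by reactions with one."""
    without = sum(1 for rule in gene_rules.values() if not rule.strip())
    with_rule = len(gene_rules) - without
    return without / with_rule if with_rule else float("inf")


# Toy data: hexokinase written with charged formulas (illustrative only).
formulas = {"glc": "C6H12O6", "atp": "C10H12N5O13P3", "g6p": "C6H11O9P",
            "adp": "C10H12N5O10P2", "h": "H"}
balanced = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
missing_proton = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}

# Hypothetical gene-reaction rules; an empty string marks a reaction with no gene.
gene_rules = {"RXN1": "geneA", "RXN2": "geneB or geneC", "RXN3": "", "RXN4": ""}

print(mass_imbalance(balanced, formulas))        # balanced reaction
print(mass_imbalance(missing_proton, formulas))  # one hydrogen unaccounted for
print(orphan_ratio(gene_rules))
```

A reaction that returns a non-empty dictionary creates or destroys the listed elements, which is the kind of error that lets a model produce metabolites without environmental precursors.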
We went on to determine the cumulative MEMOTE quality score for each C. difficile GENRE (Fig. S1A).
MEMOTE is a recent series of model quality assessment guidelines, agreed upon by the research community, and developed into a single platform to create an independent comparable quality metric across GENREs 35 .
These percentages reflect a composite measurement of mass conservation, reaction constraint, and standardized component annotation that are necessary for carrying out reliable simulations 34 . The three oldest C. difficile reconstructions each scored <50%; conversely the most recent GENRE (iCN900) received a 74% cumulative MEMOTE score yet underperformed in the other metrics. Furthermore, the pre-curation draft C.
difficile GENREs generated for this study scored similarly (~40%) to the automatically curated AGORA reconstructions.
Finally, we assessed key metabolic functionalities and established general principles of C. difficile physiology within each of the existing GENREs. First, we compared imputed doubling times of each GENRE, derived from the optimal biomass objective flux value simulated in rich media 36 . While not strictly a measurement of GENRE quality, this value may generally reflect the degree of functional predictions possible with a given GENRE based on its deviation from measured values of ~29 minutes under similar conditions 37 .
This analysis uncovered that most GENREs indicated doubling times relatively close to the experimental measures; however, iMLTC806cdf and iHD992 gave times under 5 minutes and iCN900 was well over 500 minutes (Fig. S1D). We also detected structural inconsistencies across several GENREs. For example, those GENREs acquired from the AGORA database possessed several intracellular metabolic products not adequately accounted for biologically (Table S1), as well as mitochondrial compartments despite representing bacteria. Additionally, several key C. difficile metabolic pathways were either incomplete or absent from the curated models, including multi-step Stickland fermentation, membrane-dependent ATP synthase, dipeptide and aminoglycan utilization, and a variety of saccharide fermentation pathways 38 . Overall, the existing C.
difficile GENREs possessed numerous mass imbalances and annotation inconsistencies, lacked key functional capacities, and failed to phenotypically mimic C. difficile growth. These collective results motivated the generation of a new reconstruction for our intended analyses.
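The imputed doubling times compared above follow from the biomass objective flux if that flux is read as a specific growth rate. The sketch below assumes the conventional unit of per hour for the biomass flux, which the text does not state explicitly; the function names are illustrative.

```python
import math


def doubling_time_minutes(biomass_flux_per_hour):
    """Doubling time in minutes for a growth rate given in 1/h."""
    if biomass_flux_per_hour <= 0:
        return math.inf  # no growth means the population never doubles
    return math.log(2) / biomass_flux_per_hour * 60.0


def growth_rate_per_hour(doubling_minutes):
    """Inverse conversion: growth rate in 1/h from a doubling time in minutes."""
    return math.log(2) / (doubling_minutes / 60.0)


# Growth rate implied by the ~29 minute experimental doubling time cited above.
reference_rate = growth_rate_per_hour(29.0)
print(round(reference_rate, 3))
print(round(doubling_time_minutes(reference_rate), 1))
```

Under this assumption, a model reporting a doubling time under 5 minutes corresponds to a biomass flux several times larger than the experimental reference, and one over 500 minutes to a flux far smaller.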

C. difficile Metabolic Network Scaffold Construction
The existence of hypervirulent strains of C. difficile that have unique metabolism and virulence factors highlights the importance of equipping future modeling efforts to study and identify novel targets within these isolates. With this in mind, we focused on the most well-characterized hypervirulent isolate, str. R20291.
However, to maximize the utility of the bulk of published C. difficile metabolic research, we elected to generate a reconstruction for the lab-adapted str. 630 in parallel. This focus afforded the ability to continuously cross-reference curations between the models and to more readily identify emergent differences that are specifically due to genomic content. We began the reconstruction process by accessing the re-annotated genome of str. 630 from the Pathosystems Resource Integration Center database (PATRIC) 39 . Following a recent protocol for creating high-quality genome-scale models 40 , and utilizing the ModelSEED framework and reaction database 41 , we generated gap-filled scaffold reconstructions for both strains. Gap-filling refers to the automated process of identifying metabolic pathways that are incomplete due to an apparent lack of genetic evidence yet are necessary for in silico growth, followed by addition of the minimal functionality needed to achieve flux through these pathways 42 . The resultant scaffolds were stripped of reactions that were added due to gap-filling in order to be most reflective of original genomic content and partially reveal pathways in need of manual curation (Table S2). Additionally, to focus the reconstructions on bioconversion of metabolites, we removed genes that encoded enzymes involved in macromolecule synthesis (e.g. ribosomal genes). We subsequently performed complete translated proteome alignment between str. 630 and str. R20291, resulting in 684 homologous metabolic gene products and 22 and 33 unique gene products, respectively (Table S3). Among the distinctive features were additional genes for dipeptide import in str. 630 and glycogen import and utilization in str. R20291, which have both been linked to modulated levels of virulence across strains of C. difficile 43,44 . After resolving the dissimilarities between the strains by incorporating corresponding metabolism to each reconstruction, we moved on to extensive manual curation of both GENREs.

Metabolic Network Curation and Ensemble Gap-filling
Manual curation is required in order to ultimately produce high-quality GENREs and make meaningful biological predictions 45 . As such, we proceeded to manually incorporate 259 new reactions (with associated genes and metabolites) and altered the conditions of an additional 312 reactions already present within each GENRE prior to gap-filling (Table S2). Primary targets and considerations for the manual curation of the C. difficile GENREs included:
• Anaerobic glycolysis, fragmented TCA-cycle, and known molecular oxygen detoxification 38,46
• Minimal media components and known auxotrophies [47][48][49]
• Aminoglycan and dipeptide catabolism [50][51][52]
• Many Stickland fermentation oxidative and reductive pathways (Table S2)
• Periplasmic-associated H+ gradient and ATP synthase
• Additional pathogenicity-associated metabolites (e.g. p-cresol 55 and ethanolamine 66 )
Following the outlined manual additions, we created a customized biomass objective function with certain elements tailored to each strain of C. difficile. Our biomass objective function formulation was initially adapted from the well-curated GENRE of the close phylogenetic relative Clostridium acetobutylicum 67 with additional considerations for tRNA synthesis and formation of cell wall macromolecules, including teichoic acid and peptidoglycan (Table S2). Coefficients within the formulations of DNA replication, RNA replication, and protein synthesis component reactions were adjusted by genomic nucleotide abundances and codon frequencies in order to yield strain-specific biomass objective functions 68 . To successfully simulate growth, we next performed an ensemble-based pFBA gap-filling approach 69 (Table S2). With each step, new reactions found across an ensemble were collected and integrated into the draft reconstruction. A total of 68 new reactions allowed for robust growth across all conditions.
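As a rough illustration of how genomic nucleotide abundances could inform strain-specific coefficients for a DNA replication reaction, the sketch below computes deoxynucleotide mole fractions for a double-stranded genome. This is an assumption-laden simplification and not the procedure of the cited method; the sequence is a toy string and the function name is hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dntp_fractions(sequence):
    """Mole fraction of each base across both strands of a DNA sequence.

    Characters other than A, C, G, and T (for example N) are ignored.
    """
    forward = Counter(base for base in sequence.upper() if base in COMPLEMENT)
    both_strands = Counter()
    for base, count in forward.items():
        both_strands[base] += count              # base on the given strand
        both_strands[COMPLEMENT[base]] += count  # its partner on the other strand
    total = sum(both_strands.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: both_strands[base] / total for base in "ACGT"}


# Toy AT-rich sequence standing in for a genome (illustrative only).
fractions = dntp_fractions("AATTATGCAT")
print(fractions)
```

The resulting fractions could then scale the relative stoichiometric coefficients of dATP, dTTP, dGTP, and dCTP in a DNA synthesis reaction; analogous counting over codons would apply to the protein component.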
Final steps of the curation process were focused on limiting the directionality of reactions known to be irreversible, extensive balancing of the remaining incorrect reaction stoichiometries, and adding annotation data for all network components. We repeated the assessments that were performed for the earlier reconstructions and found that our GENREs had substantial improvements in all metrics including few, if any, flux or mass inconsistencies and now each received a cumulative MEMOTE score of 86% (Fig. S1C).
A standard measurement of GENRE performance is the comparison of predicted essential genes for growth in silico and those found to be essential experimentally through forward genetic screens 72 . This form of analysis moves past strict network quality criteria and into biologically tractable predictions. Many C. difficile strains have been historically difficult to manipulate genetically 73 ; however, methods were recently developed and a large-scale transposon mutagenesis screen was published for str. R20291 74 . As such, we first utilized the proteomic alignment from the previous section to determine those genes in str. 630 that possessed homologs within the str. R20291 dataset. We simulated single gene knockouts for all genes and evaluated for >1% optimal biomass objective flux in BHI medium after growth simulation 75 for both iCdR700 (str. R20291; Fig. 1A) and iCdG698 (str. 630; Fig. 1B), cross-referencing the results with those in the published study. These comparisons revealed overall accuracies of 89.1% and 88.9%, with negative-predictive values as high as 90.0% for iCdR700 and 89.9% for iCdG698. These results demonstrated that our GENREs correctly predicted with high accuracy the same genes determined to be essential for laboratory growth.
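The essentiality comparison above reduces to thresholding knockout growth and scoring the calls against the experimental screen. The sketch below assumes that "essential" is treated as the positive class and that knockout biomass fluxes are already available from simulation; all gene names and flux values are toy placeholders rather than results from either GENRE.

```python
def predicted_essential(knockout_flux, wild_type_flux, threshold=0.01):
    """Genes whose deletion leaves at most `threshold` of wild-type biomass flux."""
    cutoff = threshold * wild_type_flux
    return {gene for gene, flux in knockout_flux.items() if flux <= cutoff}


def classification_metrics(predicted, observed, all_genes):
    """Accuracy, PPV, and NPV with 'essential' as the positive class."""
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = len(all_genes - predicted - observed)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy knockout results: biomass flux remaining after each single-gene deletion.
knockout_flux = {"geneA": 0.0, "geneB": 0.95, "geneC": 0.005,
                 "geneD": 1.0, "geneE": 0.5}
observed = {"geneA", "geneE"}  # hypothetical essential calls from a screen

predicted = predicted_essential(knockout_flux, wild_type_flux=1.0)
metrics = classification_metrics(predicted, observed, set(knockout_flux))
print(sorted(predicted))
print(metrics)
```

With a curated model, the knockout fluxes would come from a constraint-based simulation of each deletion in the relevant medium rather than from a hand-written dictionary.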

Predicted growth substrate utilization profiles mirror in vitro screening results
To assess if GENRE requirements reflected the components of minimal medium derived experimentally, we identified the minimum subset of metabolites that our model required as an exogenous supply for growth. Importantly, the specific metabolite composition of C. difficile minimal medium has been defined across three separate laboratory studies [47][48][49] . Through in silico limitation of extracellular metabolites to only the experimentally determined requirements, followed by growth simulations with systematic omission of each component individually, we were able to determine the impact of each component on achieving some level of biomass flux (Fig. 1C). This analysis revealed that the majority of metabolites found to be essential during growth simulation have also been shown experimentally to be required for in vitro growth. In disagreement with two of the published studies, simulations indicated that neither iCdG698 (str. 630) nor iCdR700 (str. R20291) is auxotrophic for methionine. However, the study that published the BDM formulation, in which methionine is present, found the amino acid to be largely growth-enhancing and not essential for small levels of growth 48 . Additionally, it has been demonstrated in the laboratory that C. difficile is able to harvest sufficient bioavailable sulfur from excess cysteine instead of methionine 49,76 , further supporting that methionine is not necessarily required to support slow growth rates. Finally, our results also indicated that iCdR700, unlike iCdG698, was not auxotrophic for isoleucine, and indeed contained additional genes coding for synthesis of a precursor (3S)-3-methyl-2-oxopentanoate (ilvC, a ketol-acid reductoisomerase) which were not present in its counterpart GENRE (Table S3). Interestingly, increases in isoleucine consumption are associated with greater pathogenicity in some C. difficile strains 77 , which may contribute to the hypervirulence of str. R20291.
In summary, the in silico minimal requirements for iCdG698 and iCdR700 closely mirrored experimental results for both strains of C. difficile in addition to reconciling partially conflicting reports on experimentally-determined auxotrophies.
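The single-omission procedure described in this section can be sketched as a loop over medium components. The growth function below is a toy stand-in for a constraint-based simulation, with hard-coded rules chosen only to mimic the qualitative behavior discussed above (proline and cysteine required, methionine growth-enhancing); it is not derived from either GENRE, and all names are hypothetical.

```python
def single_omission_screen(medium, simulate_growth, minimum_flux=1e-6):
    """Map each medium component to True if growth is lost when it alone is removed."""
    results = {}
    for component in sorted(medium):
        reduced = set(medium) - {component}
        results[component] = simulate_growth(reduced) <= minimum_flux
    return results


def toy_growth(medium):
    """Toy stand-in for a biomass flux simulation (illustrative rules only)."""
    if not {"proline", "cysteine"} <= medium:
        return 0.0  # a required component is missing
    return 1.0 if "methionine" in medium else 0.4  # slower growth without methionine


medium = {"proline", "cysteine", "methionine", "glucose"}
essential = single_omission_screen(medium, toy_growth)
print(essential)
```

Replacing `toy_growth` with a function that closes the corresponding exchange reaction and re-optimizes biomass would reproduce the analysis on a real model.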

Metabolite-specific growth enhancement strongly correlates with in vitro results
We next assessed additional carbon sources that impact the growth yield predictions for both GENREs.
Utilizing previously published results for both C. difficile strains in a Carbon Source Utilization Screen 78 , we simulated the degree to which each metabolite influenced growth yield in minimal medium. Importantly, C. difficile is auxotrophic for specific amino acids (e.g. proline; Fig. 1C) that it is also able to catabolize through Stickland fermentation 79 , so the diluting background medium must be supplemented with small concentrations of these metabolites. As such, the values are reported as the ratio of the final optical density for growth with the given metabolite versus low levels of growth observed in the background medium alone. Despite this calculation not being a direct comparison of utilization capability as in traditional Biolog analyses 80 , it provides insight into an organism's metabolic preferences. We similarly calculated the influence of each metabolite on the optimal biomass flux at quasi-steady state of each model provided with the same background media conditions as the Biolog analysis (Fig. 2A). Across all of the 116 total metabolites that were in both the in vitro screen and the C. difficile GENREs, we identified significant predictive correlations in the amount of growth enhancement for iCdG698 (p-value < 0.001) and iCdR700 (p-value = 0.002) (Fig. 2B & 2C). This relationship was even more pronounced for carbohydrates and amino acids, primary carbon sources for C. difficile (Fig. S2). When these predictions were reduced to binary interpretations of either enhancement or non-enhancement of growth, we found that iCdG698 predicted 92.8% and iCdR700 predicted 96.6% true-positive enhancement calls (Fig. 2D). Importantly, this metric is the most valuable measure in this instance as it indicates that each GENRE possesses the machinery for catabolizing a given metabolite. Collectively, these data strongly indicated that both GENREs were well-suited for prediction of growth substrate utilization in either strain of C. difficile.
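The comparison of measured and simulated growth enhancement can be sketched with a rank correlation and a positive predictive value. The text does not state which correlation test was used, so Spearman's coefficient is an assumption here, and the enhancement cutoff of a ratio above 1 is likewise illustrative. The ratios below are toy numbers, not values from the screen.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def enhancement_ppv(predicted_ratio, measured_ratio, cutoff=1.0):
    """Fraction of predicted enhancers (ratio > cutoff) confirmed in vitro."""
    calls = [(p > cutoff, m > cutoff) for p, m in zip(predicted_ratio, measured_ratio)]
    true_pos = sum(1 for p, m in calls if p and m)
    false_pos = sum(1 for p, m in calls if p and not m)
    return true_pos / (true_pos + false_pos) if true_pos + false_pos else float("nan")


# Toy growth-enhancement ratios relative to background medium (illustrative only).
measured = [1.8, 1.0, 2.5, 0.9, 1.4]
predicted = [1.5, 1.0, 2.0, 1.2, 1.1]
rho = spearman(predicted, measured)
ppv = enhancement_ppv(predicted, measured)
print(round(rho, 3), ppv)
```

A significance test for the coefficient is omitted; in practice a statistics library would supply both the coefficient and its p-value.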

Context-specific metabolism reveals inverse metabolic patterns relating to virulence in vitro
Following GENRE validation, we sought to qualify the ability of each GENRE to predict in situ metabolic phenotypes across diverse experimental settings. As previously stated, GENREs have provided powerful platforms for the integration of transcriptomic data, creating greater context for the shifts observed between conditions and capturing the potential influence of pathways not obviously connected 81 . With this application in mind, we chose to generate context-specific models for both in vitro and in vivo experimental conditions characterized with RNA-Seq analysis, utilizing a recently published unsupervised transcriptomic data integration method 82 . Briefly, this approach calculates the most cost-efficient usage of the metabolic network in order to achieve growth given the pathway investments indicated by the transcriptomic data. This process is in line with the concept that natural selection generally selects against wasteful production of cellular machinery and affords the ability to make much more fine-scale predictions of metabolic changes that C. difficile undergoes as it activates pathogenicity. The resultant patterns also reveal central elements within context-specific metabolism that could lead to targeted strategies for intentional downregulation of virulence factors through metabolic circuitry.
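The idea of cost-efficient network usage given transcript-derived investments can be illustrated without an optimizer by scoring candidate flux distributions. The inverse linear weighting below is a simple choice made for illustration and is not the weighting scheme of the cited integration method; reaction names, transcript abundances, and flux values are all hypothetical.

```python
def transcript_weights(abundance):
    """Assign lower cost to reactions with higher transcript abundance.

    The least-transcribed reaction receives weight 1.0 and the most-transcribed
    reaction receives lowest/highest, so no reaction is ever free.
    """
    highest = max(abundance.values())
    lowest = min(abundance.values())
    if highest <= 0:
        raise ValueError("at least one reaction needs a positive abundance")
    return {rxn: (highest - value + lowest) / highest
            for rxn, value in abundance.items()}


def weighted_cost(fluxes, weights):
    """Total cost of a flux distribution: sum of weight times absolute flux."""
    return sum(weights[rxn] * abs(flux) for rxn, flux in fluxes.items())


# Toy transcript abundances for three hypothetical reactions.
abundance = {"glycolysis_step": 900, "stickland_step": 100, "transport": 500}
weights = transcript_weights(abundance)

# Two hypothetical flux distributions that reach the same growth rate.
candidates = {
    "glycolytic": {"glycolysis_step": 10.0, "transport": 5.0},
    "stickland": {"stickland_step": 10.0, "transport": 5.0},
}
cheapest = min(candidates, key=lambda name: weighted_cost(candidates[name], weights))
print(cheapest)
```

In a full implementation the weights would enter the objective of a linear program constrained to near-optimal growth, rather than being used to rank hand-written alternatives.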
A recent study determined that phase variation, a reversible mechanism employed by many bacterial pathogens to generate phenotypic heterogeneity and maximize overall fitness of the population, also occurs in C. difficile str. R20291 and influences virulence expression 83 . One aspect of this phase variation manifests as a rough or smooth-edged colony morphology on solid agar; the morphologies can be propagated via subculture and are associated with distinct motility behaviors and altered virulence 84 . The colony morphology variants are generated through the phase variable (on/off) expression of the cmrRST genes. With this in mind, we sequenced transcriptomes from experimentally grown rough and smooth phase variants of C. difficile str.
R20291 grown on solid BHI rich medium for 48 hours. Utilizing these data, we generated context-specific versions of iCdR700 in simulated rich media conditions. It has been previously shown that mutation of cmr-family genes does not significantly alter growth rate in vitro 84 . Accordingly, we found no significant difference in optimal biomass flux values between phase variants (Fig. 3A), which agrees with previously published experimental growth rate measurements for C. difficile 37 . We then calculated essential genes in each variant model as in the earlier analysis, which identified 81 core genes essential in both contexts (Table S4), another 13 genes essential to growth for both variants, and 5 genes that were conditionally essential between the morphologies in BHI rich medium (Fig. 3C). The conditionally essential gene set restricted to the smooth variant included an N-acetylglucosamine PTS system as well as pyruvate kinase, which mediates the last step of glycolysis and a bulk of the ATP generation. Notably, at the transcriptional level, reads mapped to pyruvate kinase were detected at nearly identical levels between the rough and smooth isolates (Table S4). These results indicate that glycolytic enzymes may be more active in the smooth colony variants. The essentiality of N-acetylglucosamine transport in the context-specific model for the smooth phase was of interest as this variant has been previously shown to generate biofilms 84 , in which N-acetylglucosamine is often a component 85 . We found that predicted exchange efflux of N-acetylglucosamine in the smooth variant was significantly greater than in rough (Fig. S3C). Conversely, the rough context-specific model contained multiple essential genes involved in Stickland fermentation (Fig. 3B). As with the pyruvate kinase gene, similar levels of transcription for these genes were also observed between smooth and rough variants (Table S4).
These data were indicative of a potential trade-off between glycolysis and amino acid (Stickland) fermentation between smooth and rough phases respectively. In addition to genes that were critical for growth, we also identified those that were only required to achieve high growth yields in each context. This gene set included additional carbohydrate transporters in the smooth variant and multiple amino acid transporters in the rough variant (Table S4), further supporting differential utilization of glycolysis and Stickland fermentation across phases with highly dissimilar flux distributions of core metabolic pathways (Fig. S3), in spite of largely similar optimal growth rates (Fig. 3A).
The trends for the opposing metabolic strategies were reinforced when we compared sampled flux distributions for the associated exchange reactions for the most common substrates of each respective pathway, glucose and proline. We found not only that the model predicted that glucose was imported in the smooth variant, but that this functionality was entirely inactive in the rough-associated model (Fig. 3C).
Alternatively, proline was utilized significantly more in the rough variant-specific model (Fig. 3D) and, unlike glucose import, proline import could not be entirely pruned from the opposing model as C. difficile is a proline auxotroph. It has been previously reported that this relationship between colony morphology phase variant and metabolism may occur in C. difficile 86 , and our collective results from contextualized iCdR700 analysis support discordant utilization of glycolysis or Stickland fermentation that may relate to phase variation. Based on these data, we hypothesized that access to easily catabolized carbohydrates influences colony morphology due to phase variation in C. difficile. To test this hypothesis, single colonies of either the rough or smooth variant, grown anaerobically for 48 hours on BHIS agar (Fig. S4A), were subcultured onto BDM (Materials & Methods) agar plates both with and without 2 mg/ml glucose (Fig. 3E & S4B). Following anaerobic incubation for 48 hours, we found that rough variants maintained their morphology across both media, with the rough phenotype even exacerbated on the minimal medium. However, while the smooth variant largely maintained its colony morphology upon subculture onto BDM + glucose, the colonies became much more analogous to their rough counterparts when glucose was absent. Further subculture of each altered morphology from minimal media back onto rich BHI medium also appeared to support consistent switching between the respective morphologies (Fig. S4C). Our data suggest that the absence of glucose provided a fitness advantage for variants that preferentially use Stickland metabolism, selecting for the rough variant. Furthermore, these results are consistent with the hypothesis that carbohydrate availability impacts phase variation in C. difficile, influencing the virulence-associated metabolic state, and that environmental stress due to limited nutrients may be a key factor in driving the shift between phases.

Predicted metabolism during infection also supports differential strategies relating to altered virulence
Given laboratory media conditions (as used in the results described above) are much more easily defined, we also wanted to examine GENRE performance and prediction quality under more complex in vivo infection conditions. Another previously published study assessed the differential transcriptional activity of C. difficile str. 630 in the gut during infection in a mouse model pretreated with either streptomycin or clindamycin to induce sensitivity to colonization. These distinct treatments have different impacts on the structure of the gut microbiota 87 and allow for identical levels of pathogen colonization and vegetative cell load in the cecum.
However, these different treatments result in highly dissimilar levels of sporulation (another phenotype linked to C. difficile virulence) where streptomycin is associated with undetectable spore CFUs and clindamycin with significantly higher levels 88 . The authors of this study also detected no significant difference in toxin activity, and paired untargeted metabolomics was used to correlate the transcriptional activity of metabolic pathways with changes in the abundance of their respective substrates and byproducts following infection. This analysis was performed for each antibiotic with both mock-infected and C. difficile-colonized groups to extract the specific impact of the infection on the gut metabolome, making this dataset extremely valuable for our purposes. Similar to the previous analysis, we overlaid these data onto our GENRE of str. 630 (iCdG698) and compared predicted doubling times, which were calculated from biomass objective flux in the sampled context-specific flux distributions (Fig. 4A). This comparison revealed a significantly faster growth rate in the lower sporulation context (p-value << 0.001), reflecting a potential focus on continued growth instead of spore formation and egress possibly due to preferred environmental conditions. To then quantify differential use of core metabolism, we compared the activity of those reactions conserved between conditions. We accomplished this analysis through unsupervised machine learning (Non-Metric Multidimensional Scaling) of Bray-Curtis dissimilarity for sampled flux distributions of all shared reactions (Fig. 4B). In agreement with the previous findings that C. difficile is able to adapt to distinct growth substrates 88 , we found a significant difference (p-value = 0.001) in the activity of core metabolism between high and low sporulation states.
Additionally, within-group dissimilarities indicated that much more variation was found within the low sporulation group, potentially indicating that conditions favoring increased sporulation also support a lower diversity of potential metabolic strategies.
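The dissimilarity underlying the ordination above, and the within-group comparison, can be sketched directly. The code assumes non-negative inputs, for example absolute flux values over shared reactions; how signed fluxes were handled is not stated in the text, so that preprocessing is an assumption. The flux samples are toy vectors.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative vectors (0 to 1)."""
    if any(value < 0 for value in list(a) + list(b)):
        raise ValueError("inputs must be non-negative, e.g. absolute fluxes")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def mean_within_group(samples):
    """Mean pairwise dissimilarity among the samples of one group."""
    pairs = list(combinations(samples, 2))
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


# Toy absolute flux samples over three shared reactions (illustrative only).
low_spore = [[10, 0, 5], [2, 8, 5], [6, 4, 5]]
high_spore = [[1, 9, 5], [2, 8, 5], [1, 8, 6]]

low_spread = mean_within_group(low_spore)
high_spread = mean_within_group(high_spore)
print(round(low_spread, 3), round(high_spread, 3))
```

A larger within-group mean indicates that sampled flux distributions from that context are more varied, which is the property interpreted above as a broader range of potential metabolic strategies. The ordination itself and the significance test would require a statistics library and are not shown.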
To support the unsupervised findings, we implemented a supervised machine learning approach where we identified those reactions which most readily delineate flux distributions from low and high spore context-specific models, and reported the importance of each reaction to the overall classification success (Fig. 4C).
The most prominent signals highlighted by this approach were differences in the catabolism of the host-derived mucus-associated aminoglycans N-acetylmannosamine, N-acetylneuraminate, and N-acetylglucosamine, which have been shown to be readily fermented by C. difficile and play a role in determining virulence factor expression 26,89 . Additionally, multiple nucleoside phosphatase reactions were highlighted; these contribute to maintenance of intracellular phosphorylated guanosine, which has also been associated with determining virulence phenotype expression 90,91 . Taken together, these results support that environmental conditions that favor increased glycolytic activity in C. difficile are inversely associated with virulence expression, which agrees with previous reports for the control of glucose over toxin expression 92 . We next cross-referenced exchange reactions that were differentially active across the high sporulation and low sporulation context-specific models (Fig. 4D), and compared changes in the concentration of associated metabolites from a paired untargeted metabolomics screen (Fig. 4E). This analysis predicted multiple Stickland fermentation substrates to be utilized at similar rates across both contexts. We found that proline was imported at higher rates in low spore-associated simulations (Table 4C; Table S5). This amino acid was also detected in significantly higher concentrations only in mock infection, supporting consumption by C. difficile 7 . These data agreed with findings from the previous section that amino acid catabolism may be associated with higher expression of certain virulence factors, despite previous reports that extracellular proline concentrations inversely correlated with expression of C. difficile toxin in vitro 9 . Leucine was also predicted to be imported at higher rates in this context, and its associated Stickland byproduct isovalerate was predicted to be produced only in the high spore model (Table S5).
This trend agreed with in vivo metabolomic measurements where isovalerate concentrations were significantly increased only in the context of higher spores (Fig. 4E). Conversely, another known Stickland substrate, tyrosine, was predicted to be utilized more in the low spore-associated model (Fig. 4D). The byproduct of tyrosine fermentation, p-cresol, was also predicted to be secreted more readily in the context of lower sporulation (Table S5). This largely C. difficile-specific metabolic byproduct may be associated with its pathogenicity 86,93 , and reinforces a potential link with virulence expression. Collectively, these results further support that while Stickland fermentation is a core metabolic strategy in C. difficile, this pathway is differentially utilized under conditions that favor altered virulence factor expression.
We also identified N-acetylneuraminate (Neu5Ac), a host-derived glycolysis substrate that C. difficile readily uses as a carbon source for growth 7 , as highly utilized in the lower sporulation context. This consumption was supported in the metabolomics screen where concentrations of this metabolite were significantly decreased following infection only in the lower spore condition (Fig. 4F). In agreement, this analysis also predicted that frequent byproducts of carbohydrate fermentation, acetate and formate, were more abundant in the lower sporulation context (Fig. 4D). Alternatively, both N-acetylglucosamine (GlcNAc) and N-acetylmannosamine (ManNAc) were predicted to be secreted at much higher rates from the low spore context-specific model (Fig. 4D, S5A, & S5B). Interestingly, these metabolites are integral components of biofilms 94 , and C. difficile has been previously shown to generate these structures under certain circumstances 85 . We  (Fig. SC), potentially indicating a lack of consumption by C. difficile. These combined results may indicate that increased reliance on glycolysis is associated with reduced sporulation but increased biofilm formation, supporting a complex metabolic regulation of distinct aspects of C. difficile virulence.
To then examine the utility of the str. R20291 GENRE for identifying potential gene targets that may be exploited to inhibit metabolism of the pathogen in vivo, we performed an in silico gene essentiality screen similar to that in the preceding section. We subsequently cross-referenced our results to limit our focus to those genes that are essential only in vivo and shared across high and low sporulation-favoring conditions. This analysis uncovered 35 genes that are essential only during infection (Table S5). Among the genes highlighted were many components of nucleotide metabolism, including the pyrimidine synthesis regulator PyrR and adenylate kinase. These genes are highly expressed during infection, and inhibition of specific enzymes within this pathway has been shown to downregulate toxin production 6,95 . Furthermore, proline racemase, which is an important part of Stickland fermentation in C. difficile and has been previously linked to virulence expression in vitro 96 , was also essential in both infection conditions. Alternatively, when we identified those genes that were discordantly essential between the conditions, we found additional essential genes in the higher sporulation context related to Stickland fermentation of glycine and proline, including glycine reductase and pyrroline-5-carboxylate reductase (Table S5). These results further highlight the relationship between Stickland fermentation and increased C. difficile sporulation. Additionally, these findings support that the GENREs were effective tools for identifying targetable metabolic components in C. difficile to limit colonization or pathogenicity.
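The cross-referencing step described above can be illustrated with a minimal Python sketch. It assumes that essentiality is called when a single-gene deletion yields less than 1.0% of the optimal biomass flux (the threshold stated in the figure legends) and that an in vitro simulation serves as the baseline for "essential only in vivo". The function names, gene labels, and flux values are hypothetical and are not taken from the study.

```python
# Hypothetical sketch: gene names and flux values are illustrative only.
ESSENTIAL_FRACTION = 0.01  # deletion yields <1.0% of optimal biomass flux


def essential_genes(knockout_flux, optimal_flux, fraction=ESSENTIAL_FRACTION):
    """Return genes whose deletion drops biomass flux below the cutoff."""
    cutoff = fraction * optimal_flux
    return {gene for gene, flux in knockout_flux.items() if flux < cutoff}


def cross_reference(in_vitro, high_spore, low_spore):
    """Group essential genes into shared in vivo-only and discordant sets."""
    return {
        # essential in both infection contexts but not in the baseline
        "shared_in_vivo_only": (high_spore & low_spore) - in_vitro,
        # discordantly essential between the two infection contexts
        "high_spore_only": high_spore - low_spore - in_vitro,
        "low_spore_only": low_spore - high_spore - in_vitro,
    }


# Toy biomass fluxes after each single-gene deletion (assumed values).
optimal = {"in_vitro": 80.0, "high_spore": 60.0, "low_spore": 65.0}
knockouts = {
    "in_vitro": {"geneA": 0.0, "geneB": 79.0, "geneC": 80.0, "geneD": 75.0},
    "high_spore": {"geneA": 0.0, "geneB": 0.2, "geneC": 0.1, "geneD": 58.0},
    "low_spore": {"geneA": 0.0, "geneB": 0.3, "geneC": 64.0, "geneD": 65.0},
}

essential = {
    context: essential_genes(knockouts[context], optimal[context])
    for context in knockouts
}
groups = cross_reference(
    essential["in_vitro"], essential["high_spore"], essential["low_spore"]
)
```

In this toy example, geneA is essential everywhere and is therefore excluded, geneB is the only shared in vivo-specific hit, and geneC is discordantly essential in the high spore context.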

DISCUSSION
Control of much of C. difficile's physiology and pathogenicity is subject to a coalescence of metabolic signals from both inside and outside of the cell. Historically, C. difficile research has suffered from a shortage of molecular tools and high-quality predictive models for highlighting new potential therapies. Over between C. difficile virulence and metabolism. Furthermore, as much of bacterial pathogenicity is now being attributed to shifts in metabolism, the analyses described here may provide large benefits to the identification of possible treatment targets in C. difficile and other recalcitrant pathogens 97 . In the current study, we develop and validate two highly-curated genome-scale metabolic network reconstructions for a well-described laboratory strain (str. 630) in addition to a more recently characterized hyper-virulent strain (str. R20291) of C. difficile. Both iCdG698 (str. 630) and iCdR700 (str. R20291) draw from numerous molecular and metabolic studies of C. difficile and Clostridial metabolism in order to accurately incorporate a large array of metabolic subsystems known to be present across strains of the pathogen. We further improved the quality of the models through careful curation of core metabolic strategies, including amino acid and carbohydrate fermentation, to ensure growth in all major defined growth conditions for C. difficile.
After the curation process was complete, we found a high degree of agreement between model predictions and validating experimental datasets. Both iCdG698 and iCdR700 indicate that the respective strains are able to catabolize amino acids as the sole carbon source through Stickland fermentation and require only those metabolites present in the experimentally determined minimal media to achieve growth.
Additionally, close correlations of in silico predictions with both gene essentiality and carbon source utilization screens supported that the GENREs accurately recapitulate C. difficile physiology and reconcile some previous inconsistencies in C. difficile metabolism literature. Following contextualization using in situ transcriptomic data, both GENREs were also able to demonstrate established complex metabolic phenotypes for both laboratory and infection conditions. These analyses collectively indicated a shift away from glycolytic metabolism, and toward amino acid fermentation, during periods of increased pathogenicity. These findings could lay the groundwork for novel approaches to curbing the expression of virulence factors by influencing environmental conditions to favor certain forms of metabolism over others. In vivo context-specific gene essentiality also predicted proline racemase to be critical for growth during infection, yet it was previously found to be dispensable in an animal model using a forward genetic screen 96  iCdR700 may be more limited than previous reconstructions; however, we elected to focus on those gene sets where the greatest amount of evidence and annotation data could be found to maximize confidence in functionality included here. Future efforts could be directed at increasing the genomic coverage each GENRE contains. Concordantly, both GENREs consistently underpredict the impact of some metabolite groups, primarily nucleotides and carboxylic acids (Fig. S2), which could be due to the absent annotation of the relevant cellular machinery. Furthermore, more complex regulatory networks ultimately determine final expression of virulence factors and these may be needed additions in the future to truly understand the interplay of metabolism and pathogenicity in C. difficile. 
In spite of these potential shortcomings, both iCdG698 and iCdR700 produced highly accurate metabolic predictions for their respective strains, and are strong candidate platforms for directing future studies of C. difficile metabolic pathways. Additionally, the contextualized growth simulation results indicated an inverse relationship between glycolysis and Stickland fermentation with respect to expression of pathogenicity. Our results indicated that fermentation of specific amino acids may be more associated with increased expression of C. difficile virulence factors. These changes also seem to be predicated on a degree of environmental nutrient stress as the switch in phase was only induced across formulations of minimal medium.
Systems-biology approaches have enabled the assessment of fine-scale changes to metabolism of single species within complex environments that may have downstream implications on health and disease.
Overall, the combined in vitro- and in vivo-based results demonstrated that our GENREs are effective platforms for gleaning additional understanding from omics datasets, beyond the standard analyses. Both GENREs were able to accurately predict complex metabolic phenotypes when provided context-specific omic data, and ultimately underscore the metabolic plasticity of C. difficile. The reciprocal utilization of glycolysis and amino acid fermentation indeed supports regimes of distinct metabolic programming associated with C. difficile pathogenicity. With this in mind, finding core metabolic properties in C. difficile strains may be key in identifying potential probiotic competitor strains or even molecular inhibitors of metabolic components. The current study is an example of the strength that systems-level analyses have in contributing to more rapid advancements in biological understanding, and in the future the metabolic network reconstructions presented here are well-suited to accelerate research efforts toward the discovery of more targeted therapies.

C. difficile GENRE Construction
We utilized PATRIC reference genomes from Clostridioides difficile str. 630 and Clostridioides difficile str. R20291 as initial reconstruction templates for the automated ModelSEED pipeline 39,98,99 . The automated ModelSEED draft reconstruction was converted utilizing the Mackinac pipeline (https://github.com/mmundy42/mackinac) into a form more compatible with the COBRA toolbox 100 . Upon removal of GENRE components lacking genetic evidence (i.e. gap-filled), extensive manual curation was performed in accordance with best practices agreed upon by the community 101 . We subsequently performed ensemble gap-filling as previously described, utilizing a stoichiometrically consistent anaerobic, Gram-positive ModelSEED universal reaction collection curated for this purpose and available alongside code associated with this study. Next, we corrected reaction inconsistencies and incorrect physiological properties (e.g. ensured free water diffusion across compartments). Final transport reactions were then validated with TransportDB 102 . All formulas are mass and charge balanced at an assumed pH of 7.0 using the ModelSEED database in order to maintain a consistent and supported namespace to augment GENRE interpretability and future curation efforts. We then collected annotation data for all model components (genes, reactions, and metabolites) from SEED 101,103 , KEGG 104 , PATRIC, RefSeq 105 , EMBL 106 , and BiGG 107 databases and integrated it into the annotation field dictionary now supported in the most recent SBML version 108 . Complete MEMOTE quality reports for both C. difficile GENREs are also available in the GitHub repository associated with this study, and full pipelines for model generation are explicitly outlined in Jupyter notebooks hosted there as well. Either iCdG698 (str. 630) or iCdR700 (str. R20291) can be downloaded from the study's GitHub repository or the Papin lab website (https://bme.virginia.edu/csbl/Downloads1.html).
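As an illustration of what a mass and charge balance check involves, the following minimal Python sketch sums elements and charge across a reaction and reports any non-zero remainder. It is not the curation code used in this study. The metabolite labels, the helper names, and the hexokinase example are assumptions chosen for clarity, and the formulas and charges are the commonly used protonation states near pH 7.0.

```python
import re
from collections import defaultdict


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = defaultdict(int)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def imbalance(reaction, metabolites):
    """Return the net element and charge totals that are not zero.

    reaction: {metabolite: coefficient}, negative for reactants.
    metabolites: {metabolite: (formula, charge)}.
    An empty result means the reaction is mass and charge balanced.
    """
    totals = defaultdict(int)
    for met, coeff in reaction.items():
        formula, charge = metabolites[met]
        for element, n in parse_formula(formula).items():
            totals[element] += coeff * n
        totals["charge"] += coeff * charge
    return {key: value for key, value in totals.items() if value != 0}


# Illustrative metabolite table (hypothetical labels, not ModelSEED IDs).
metabolites = {
    "glucose": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}

# Hexokinase: glucose + ATP -> glucose 6-phosphate + ADP + H+
hexokinase = {"glucose": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
# Same reaction with the proton omitted, a typical curation error.
missing_proton = {"glucose": -1, "atp": -1, "g6p": 1, "adp": 1}

balanced_report = imbalance(hexokinase, metabolites)
unbalanced_report = imbalance(missing_proton, metabolites)
```

Here `balanced_report` is empty, whereas `unbalanced_report` flags one missing hydrogen and one unit of charge, which is the kind of inconsistency that manual curation would need to resolve.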
Growth simulations, flux-based analyses, and GENRE quality assessment
All modeling analyses were carried out using the COBRA toolbox implemented in python 109 

C. difficile str. R20291 in vitro growth and microscopy
C. difficile str. R20291 growth was maintained in an anaerobic environment of 85% N2, 5% CO2, and 10% H2. The strain was grown on BHI-agar (37 g/L Bacto brain heart infusion, 1.5% agar) medium at 37 °C for 48 hours to obtain isolated colonies. Rough and smooth colonies were chosen for propagation on BHI-agar to ensure colony morphology maintenance 84 . Basal Defined Medium (BDM) was formulated as previously published 47 with the addition of 1.5% agar for plates, and incubated for 48 hours at 37 °C to generate isolated colonies. Microscopy images were taken on an EVOS XL Core Cell Imaging System at 4x magnification.

RNA isolation and transcriptome sequencing
For RNA isolation, rough and smooth isolates were subcultured in BHIS broth (37 g/L Bacto brain heart infusion, 5 g/L yeast extract) overnight (16-18 h)

Acknowledgements
The authors would like to acknowledge Bonnie Dougherty, Laura Dunphy, and Dawson Payne for their input and feedback on modeling parameters and biomass objective function formatting. The authors have declared that no competing interests exist. This work was supported by funding from The U.S. National Institutes of Health awards R01AT010253 to JP and R01AI143638 to RT. The funding agency had no role in study design, data collection/analysis, or preparation of the manuscript.

GENREs compared with experimentally determined C. difficile minimal medium components across three previously published studies. Essentiality was determined for those genes and metabolites that when absent resulted in a yield of <1.0% of optimal biomass flux during growth simulation utilizing components of the corresponding media used experimentally. Additional trace minerals required for bacterial growth can be found in Table S2.

(Table S5). (E & F) Liquid-chromatography mass spectrometry analysis from cecal content of mice with and without C. difficile str. 630 infection in antibiotic pretreatment groups that resulted in either high or low cecal spore CFUs for metabolites highlighted by growth simulation analysis: (E) Isovalerate and (F) N-Acetylneuraminate. Significant differences determined by Wilcoxon rank-sum test with Benjamini-Hochberg correction (* p-values ≤ 0.05).

between flux sampled distributions of shared reactions of context-specific models. Significant difference calculated by PERMANOVA (*** p-value < 0.001). Transcriptomic data from cmr operon mutants (described previously) was also utilized to generate context-specific models for phase-locked isolates. (B) Following the same trend as phase-favoring colony variants, optimal biomass objective flux from each context-specific model was not significantly different. (C) Exchange reaction flux associated with N-acetylglucosamine export for both context-specific models (* p-value = 0.015). Significant difference determined by Wilcoxon rank-sum test.

infection in antibiotic pretreatment groups that resulted in either high or low cecal spore CFUs. Significant differences determined by Wilcoxon rank-sum test with Benjamini-Hochberg correction (* p-values ≤ 0.05).

Table S1) Topology summary statistics for C. difficile GENREs from AGORA and those generated here.
Table S2) GENRE creation steps, Biomass formulation, Gap-filling media compositions, and GENRE statistics.
Table S4) Differential transcription and exchange fluxes for iCdR700 (str. R20291) with in vitro transcriptome.