Reveals Insights into Metabolic Capabilities and Therapeutic Strategies for Cystic Fibrosis

Background: Cystic fibrosis (CF) is an inherited genetic disorder caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, resulting in the production of sticky and thick mucosal fluids. This leads to an environment that facilitates the colonization of various microorganisms, some of which can cause acute and chronic lung infections, while others may have a positive influence on the disease process. Rothia mucilaginosa , an oral commensal, is relatively abundant in the lungs of CF patients. Recent studies have unveiled the anti-inflammatory properties of R. mucilaginosa using in vitro three-dimensional (3-D) lung epithelial cell cultures and in vivo mouse models relevant to chronic lung diseases. Apart from a potentially beneficial anti-inflammatory role in chronic lung diseases, R. mucilaginosa has been associated with severe infections. This dual nature highlights the bacterium’s complexity and diverse impact on health and disease. However, its metabolic capabilities and genotype-phenotype relationships remain largely unknown. Results: To gain insights into the cellular metabolism and genetic content of R. mucilaginosa , we developed the first manually curated genome-scale metabolic model, i RM23NL. Through growth kinetic experiments and high-throughput phenotypic microarray testings, we defined its complete catabolic phenome. Subsequently, we assessed the model’s effectiveness in accurately predicting growth behaviors and utilizing multiple substrates. We used constraint-based modeling techniques to formulate novel hypotheses that could expedite the development of antimicrobial strategies. More specifically, we detected putative essential genes and assessed their effect on metabolism under varying nutritional conditions. These predictions could offer novel potential antimicrobial targets without laborious large-scale screening of knock-outs and mutant transposon libraries. Conclusion: Overall, i RM23NL demonstrates a solid capability to predict cellular phenotypes and holds immense potential as a valuable resource for accurate predictions in advancing antimicrobial therapies. Moreover, it can guide metabolic engineering to tailor R. mucilaginosa ’s metabolism for desired performance. Data Availability: Supplementary data are available along with this article, whereas the metabolic model is accessible through the BioModels Database.


Introduction 1
Rothia mucilaginosa is a Gram-positive, encapsulated, non-2 motile, and non-spore-forming bacterium of the Micrococ-mutans, A. timonensis, and Streptococcus sp.) as well as four methicillin-resistant strains of S. aureus (MRSA).Enterobactin is a type of siderophore produced by bacteria to scavenge, chelate, and transport ferric irons from their surrounding environment.These are essential for bacteria when iron is scarce as they facilitate their acquisition necessary for their growth and metabolic processes.
Prior metagenomic sequencing analyses have unveiled the prevalence of R. mucilaginosa at high abundances and its enhanced metabolic activity in the lungs of cystic fibrosis (CF) patients 10,11 .CF is caused by the hereditary mutation of the cystic fibrosis transmembrane conductance regulator (CFTR)     gene that disrupts the transepithelial movement of ions, leading to an aberrant accumulation of thick and sticky mucus within the airways.The impaired immune clearance creates a hypoxic environment 12 promoting the polymicrobial colonization of opportunistic microbes together with fungi and viruses, ultimately resulting in persistent and recurring infections 13 .

Guss et al. and Bittar et al. declared R. mucilaginosa as an
emerging CF pathogen 14,15 , while Lim et al. provided evidence supporting that R. mucilaginosa is a frequently encountered and metabolically active inhabitant of CF airways 16 .
Additionally, a study from 2018 shows that the opportunistic pathogen Pseudomonas aeruginosa, which frequently causes infections in CF patients, builds essential primary metabolites, like glutamate, by utilizing compounds produced by R. mucilaginosa 17 .This symbiotic interaction implies that P. aeruginosa benefits from its neighboring microbes, which contributes to its pathogenesis in the CF lungs.On the other hand, Rigauts et al. revealed the anti-inflammatory activity of R. mucilaginosa in the lower respiratory tract, which could impact the seriousness of chronic lung diseases 18 .
In systems biology, genome-scale metabolic models (GEMs) represent comprehensive reconstructions of organisms' metabolic networks.They are built using genomic sequences and comprise all known biochemical reactions and associated genes.These models provide systems-level insights into cellular metabolism, allowing researchers to simulate and analyze the flow of metabolites through these networks 19 .The interactions among reactions and metabolites in a metabolic model are mathematically represented with a stoichiometric matrix 20 .In the past years, an array of in silico methods have been developed to analyze GEMs and derive valuable hypotheses.Flux balance analysis (FBA) is such a powerful computational technique that operates on the principle of achieving a steady state by optimizing the flux (rate) of metabolites through reactions while accounting for various constraints such as stoichiometry, thermodynamics, and uptake/secretion boundaries 21 .Applying FBA on a GEM provides insights into the intricate biological system interactions.This analytical approach facilitates the prediction of cellular phenotypes and identification of promising drug targets and contributes to optimizing biotechnological processes 22 .
Moreover, such models can guide genetic engineering by suggesting genetic modifications that could enhance desired product formation or cellular behavior.Further applications 78 include ameliorating culture media by incorporating compo-79 nents that increase bacterial growth rates.So far, GEMs have 80 been an invaluable resource in the systems biology field that 81 helped untangle the metabolism of various organisms and 82 especially of high-threat pathogens 23,24 .As described above, 83 R. mucilaginosa has gained great interest in the context of 84 polymicrobial CF environments.However, its metabolic ca-85 pabilities and genotype-phenotype relationships in isolated 86 monoculture settings remain largely unexplored.

87
Here, we present the first manually curated and high-quality 88 GEM of R. mucilaginosa, iRM23NL, striving to understand 89 its metabolism and unique phenotypes under diverse condi-90 tions.Our simulation-ready network accounts for thousands 91 of reactions and is available in a standardized format following 92 the community guidelines 25 .Through growth kinetic exper-93 iments and high-throughput phenotypic microarray assays, 94 we validated iRM23NL's accuracy in predicting growth and 95 substrate utilization patterns.We refined the reconstruction by 96 comparing the in vitro results to in silico simulations, resulting 97 in novel metabolic reactions and genes.To our knowledge, 98 this is the first study presenting high-throughput nutrient uti-99 lization and comprehensive growth data for R. mucilaginosa.100 Finally, we employed FBA to formulate novel gene essential-101 ity hypotheses that could expedite the development of antimi-102 crobial strategies.Figure 1 summarizes the experimental and 103 computational work presented here.The pipeline we previously developed 26 was used to build 108 the first high-quality and manually curated GEM of R. mu-109 cilaginosa DSM20746 (ATCC 25296).An initial draft meta-110 bolic model was derived with CarveMe 29 and is based on the 111 Biochemical, Genetical, and Genomical (BiGG) identifiers 30 .112 The translated sequence with over 1,700 proteins and the 113 Gram-positive-specific template were employed.This enabled 114 us to build a more precise reconstruction considering infor-115 mation on the peptidoglycan layer for the biomass objective 116 function (BOF).The draft network contained 1,015 reactions 117 (141 pseudo-reactions), 788 metabolites, and 265 genes (Fig- 118 ure 2).In the first gap-filling stage (Draft_2), we expanded 119 the list of reactions based on the annotated genome and growth 120 kinetics data in diverse growth environments.For this, we 121 extensively indexed organism-specific literature and databases 122 and included additional enzymatic reactions together with 123 47 new gene-protein-reaction associations (GPRs).Subse-124 quently, high-throughput nutrient utilization assays and model 125 validation incorporated further 71 reactions and their associ-126 ated metabolic genes.Non-metabolic genes, which take part 127 in other cellular processes e.g., signaling pathways or tran-128 scription, were not considered.In total, 82 reactions, together 129 with associated genes and metabolites, were newly added into 130 the model, along with 62 novel GPRs, increasing the genetic 131 species based on our phylogenomic analysis (Figure 3).According to the calculated average nucleotide identity (ANI) matrix, R. mucilaginosa exhibits a similarity to six out of the 13 tested Rothia genomes.More specifically, it shares a greater resemblance with R. aeria and R. dentocariosa underscoring a closer evolutionary relationship between these species.
R. mucilaginosa is primarily aerobic, efficiently generating ATP through oxic respiration; however, in low-oxygen or oxygen-absent conditions, it shifts to anaerobic metabolism Subsystem-level statistics within pathways along with the distribution of gene-and non-gene-associated reactions.The pathway analysis was limited to reaction identifiers that could be successfully mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) 28   Figure 3 | Phylogenomic all-vs-all analysis between 13 Rothia species.Based on the calculated ANI matrix, R. mucilaginosa is mostly similar to six out of 13 genomes, with higher similarity to R. aeria and R. dentocariosa.
148 in periplasm, and 168 in the extracellular space), and 176 372 genes (Figure 2).The model's metabolic coverage is 177 at 3.12 %, which indicates a high level of modeling detail infusion (BHI) and Luria-Bertani (LB), and tryptic soy broth environment for the growth of R. mucilaginosa and enabled us to compare the bacterium's growth characteristics to the newly tested media.For the in silico simulations, we applied FBA and added additional constraints to the linear programming problem defined in Equation (5).In more detail, we specified the flux constraints such that only extracellular metabolites defined in the medium of interest could flow freely through the system (unconstrained, finite fluxes) while the remaining fluxes were constrained to zero.We compared the in vitro to the in silico observed growth using the FC OD as a qualitative measure of growth (see Materials and Methods).Furthermore, we compared the OD at the start and the end of the experiment, considering a statistically significant difference between these measurements as an indication of growth.Our metabolic network, iRM23NL, simulated positive fluxes through the biomass reaction for all tested media except for the M9 pure medium, where a zero flux was observed.These findings align with the experimentally observed data.More specifically, there is no statistically significant difference in OD between the initial and final time-points in M9 pure medium (p-value = 0.1202 and FC OD < 1.4) indicating no significant growth.Conversely, in the remaining examined media, statistically significant growth was observed (p-value = 0.00006 -0.00142 and FC OD > 1.4) indicating significant growth in these settings.The highest aerobic growth rate was predicted in TSB (1.6 mmol/(g DW • h)), while the lowest biomass production flux was recorded for the M9 pure medium containing only essential salts.However, the RPMI medium followed as the second-highest in supporting bacterial in vitro cellular growth, offering a defined medium suitable for R. mucilaginosa's cultivation.Although R. mucilaginosa increased its biomass after 24 h, it slightly declined after 48 h.On the other hand, the simulated network resulted in a contrary outcome compared to the expected experimental effect.More specifically, iRM23NL simulated a lower flux through biomass (0.44 mmol/(g DW • h)) with RPMI when compared to LB.It is important to note here that in order to simulate growth in RPMI medium, six metal ions (cobalt (Co 2+ ), cooper (Cu 2+ ), manganese (Mn 2+ ), zinc (Zn 2+ ), ferric iron (Fe 3+ ), and ferrous iron (Fe 2+ )) were supplemented.These compounds were missing from the providers' medium formulation.Our findings underscored R. mucilaginosa's adaptability to various nutritional environments, growing best in nutrient-rich conditions while revealing specific growth requirements beyond minimal settings.
We further employed iRM23NL to examine whether it could generate biomass within the human nasal environment and the CF lungs.For this purpose, we performed in silico simulations using the synthetic cystic fibrosis sputum medium (SCFM) 39 and synthetic nasal medium (SNM) 40 (Figure 4 Panel C).Our computational model successfully simulated positive growth in both media, with a growth rate of 0.43 mmol/(g DW • h) in SNM and 0.45 mmol/(g DW • h) in SCFM.These results align with the documented metabolic activity of R. mucilaginosa in CF lungs and its frequent isolation (B) Experimentally-derived growth curves for R. mucilaginosa DSM20746 in multiple liquid growth media along with the respective fold changes (FCs) of the acquired optical densitys (ODs) at 590 nm, as defined in Equation (1).The data shown here are an average of three biological replicates (n=3).Based on the experimental results, a threshold of FC OD = 1.4 was established to qualitatively describe bacterial growth.We verified the correctness of the threshold by performing statistical analysis as described in Materials and Methods.All data are normally distributed, while there is no significant difference between their variances.The asterisks flag the significance levels.The BHI medium was used as a baseline, while the Control line represents blank measurements of pure media.Bacterial growth was aerobically measured by the OD at 590 nm (ordinate) at three distinct time points ranging from 0 h to 48 h (abscissa).(C) In silico-simulated growth rates using iRM23NL.Detailed in silico media formulations are provided in Table S3.from the human nasal cavity.Notably, the observed growth  5).These high-throughput assays serve as 276 proxies for bacterial growth by measuring cellular respiration 277 across several conditions.Active respiration in the minimal medium is detected by the reduction of tetrazolium dye over time, indicating the utilization of the provided sole source 41 .We cultivated our strain in a minimal medium supplemented with various sources, and growth was monitored over 48 hours to identify suitable nutrients for the bacterium (as described in Materials and Methods).The derived OD measurements were normalized according to the average growth over replicates per plate and converted to qualitative data representing nongrowth (NG) or growth (G).In total, we tested the uptake and utilization of 379 distinct carbon, nitrogen, phosphorus, and sulfur substrates.R. mucilaginosa demonstrated the ability to utilize 61 of 190 tested carbon substrates, including carboxylates, saccharides, and amino acids, while 10 of 95 were found to be viable nitrogen sources (Figure 5 Panel B).Furthermore, out of 59 tested phosphorus sources, R. mucilaginosa exhibited a loss of metabolic activity for 28 compounds, resulting in a non-viable phenotype, while only 71.4 % of all analyzed sulfur substrates supported positive growth.More specifically,  six inorganic phosphorus (IP), 14 organic phosphorus (OP), two cyclic nucleoside monophosphates (cNMPs), and nine nucleoside monophosphates (NMPs) were successfully utilized as sole phosphorus sources (Figure 5 Panel C).The experimentally defined nutrient utilization phenome of R. mucilaginosa can be found in Supplementary Figure S1.An overview of all experimentally tested substrates, along with the assay results, can be found in Table S4.We independently confirmed the Biolog nutrient utilization data by testing the ability of DSM20746 to grow on minimal media in the presence of ten compounds (see Materials and Methods, Figure S2).
Additionally, we evaluated the predictive performance of our metabolic model by using various C-, N-, P-, and S-containing substrates.All compounds from the highthroughput phenotypic data were mapped to BiGG 30 identifiers and subsequently to iRM23NL.In total, 286 could be successfully mapped to the BiGG database.From these, 126 existed as extracellular metabolites in iRM23NL and were considered for further analysis.Model simulations were performed under aerobic conditions with the minimal medium defined in Table S3 and FBA (see Materials and Methods).
An extracellular reaction was enabled for each tested substrate to force the model to use its transporters.Discrepancies between the Biolog data and the model simulations were utilized as basis for hypotheses to further improve and refine the network reconstruction.We resolved most inconsistencies via extensive literature mining and iterative gap analysis.
For this, we used the organism-and strain-specific BioCyc 31 database.Throughout this process, we encountered different scenarios regarding incorrect model predictions.These included compounds present in all compartments, including the extracellular space, as well as substrates defined within the intracellular space and periplasm, with no transporter defined towards the extracellular space.If the experimental results indicated utilization of an undefined compound, we searched BioCyc 31 to find strain-specific and gene-based missing transporters or enzymatic reactions.When no organismspecific evidence was available, we sought supporting data from genomically identical species (Figure 3).For instance, the compound 3-sulfino-L-alanine (3sala) was initially absent from any compartment in the preliminary draft model.Since no strain-specific information was available, we conducted a homology-based search using Basic Local Alignment Search Tool (BLAST) 42 to find genes with high similarity (similarity threshold: > 80 %) in related species.Subsequently, we identified cysteine desulfurase (SULFCYS) along with three associated transport reactions (proton-mediated; In total, 14 carbon and nitrogen sources failed to promote 377 growth in iRM23NL.Surprisingly, all of these sources had 378 corresponding transport reactions iRM23NL but still remained 379 ineffective (e.g., L-fucose, L-arabinose, and L-rhamnose) and 380 nitrogen (L-tyrosin).We could not find further information on 381 their transport or metabolic mechanism either in the genome 382 annotation or the literature.

383
In summary, the final prediction accuracy of nutrient assimi-384 lation and utilization achieved by iRM23NL was 77 % for car-385 bon sources (MCC for PM1 = 0.52 and PM2A = 0.58), 94.4 % 386 for nitrogen sources (MCC = 0.82), 97 % for phosphorus and 387 sulfur sources (ACC = 100 %; MCC = 1.0 and ACC = 92.3%; 388 MCC = 0.82 respectively) (Figure 6).Our model's perfor-389 mance was notably increased by 40 % post-comprehensive 390 curation compared to the initial draft model.Our refinement 391 reduced false positive predictions by 17, leaving only three un-392 resolved mismatches.The most remarkable improvement was 393 in nitrogen and phosphorus sources predictions.The high pre-394 dictive accuracy indicates that core metabolic pathways and 395 multiple catabolic routes of DSM20746 have been accurately 396 reconstructed within iRM23NL.Consequently, the network 397 can predict the catabolism of numerous common compounds, 398 such as sugars and amino acids.Nitrogen sources ACC = 94,4% True Positive Figure 6 | Predictive accuracy performance of iRM23NL using nutrient utilization data.Only substrates that exhibited complete mapping to both BiGG and model identifiers could be analysed.Green represents correct predictions, and orange represents inconsistent predictions.The overall prediction accuracy of iRM23NL was computed using Equation (6).
systematically removed each biochemical reaction from the network and optimized iRM23NL to produce biomass using FBA.To mitigate the inherent variability of the optimization algorithm, we repeated our FBA simulation 100 times.Additionally, we employed parsimonious enzyme usage flux balance analysis (pFBA), which involves solving two sequential linear optimization problems to determine the flux distribution of the optimal solution while minimizing the total sum of flux.
Then, we compared the predicted growth rates before and after introducing the simulated gene deletion.The FC gr between the knocked-out and wild-type growth rates was employed as a proxy for the gene's essentiality.We proceeded with in silico single gene deletions using a minimal and nutrient-rich medium (LB and M9 supplemented with glucose) as well as two growth media that mimic intra-human nasal passages and the lungs of CF patients (SNM 40 and SCFM 39 ) (Table S3).
Generally, when subjected to nutrient-limited conditions, the model predicted a higher number of genes as essential for growth, while the count of essential genes remained consistent among oxic and anoxic conditions (Figure 7 Panel A).
In total, four metabolic genes exhibited a partially essential effect across all tested media.This indicates that these genes promote cellular fitness, and their deletion partially impairs the bacterium's capacity to generate biomass.These genes are the TrkA family potassium uptake protein (WP_005506372.1),ribulose-phosphate 3-epimerase (WP_005507411.1),glucose-6-phosphate isomerase (WP_005508482.1), and transaldolase (WP_005509117.1).The majority of essential genes involved in nucleotide metabolism, peptidoglycan biosynthesis, or the energy metabolism.These over-represented subsystems among the identified essential genes suggest their importance in supporting the bacterium's respiration (Figure S4).
448 Subsequently, we conducted a protein sequence homol-449 ogy analysis with BLAST against the human proteome to 450 detect potential antimicrobial targets.For this, only genes 451 highlighted as essential in both laboratory and synthetically-452 defined media were considered (Figure 7 Panel B).Over-453 all, 35 essential genes were common in LB and M9, of 454 which 20 common genes reported homologous counterparts 455 in the human genome.Further analysis revealed that among 456 these genes, five genes exhibited over 50 % sequence simi-457 larity with homologous proteins, although none resulted in 458 over 80 % similarity.Similarly, when iRM23NL was simu-459 lated with SCFM and SNM in both aerobic and anaerobic 460 conditions, 45 shared genes were predicted to be essential.461 Homology analysis against the human genome yielded 31 462 genes with exhibited homology in the human genomes, with 463 seven demonstrating over 50 % sequence similarity.For in-464 stance, genes encoding proteins such as phosphopyruvate hy-465 dratase (WP_005506838.1),CTP synthase (WP_044141843.1),466 and adenylosuccinate synthase (WP_005509175.1)consis-467 tently exhibited human counterparts with similarity exceed-468 ing 50 % across all tested growth media and oxygen lev-469 els.Among the essential genes shared between both LB 470 and M9, 15 of them did not have any homologous hits.471 The same was observed for 20 common essential genes in 472 SCFM and SNM.Some examples of these genes include 473 orotate phosphoribosyltransferase (WP_005507935.1),type I 474 pantothenate kinase (WP_005505041.1),dihydroneopterin al-475 dolase (WP_005507619.1), and pantetheine-phosphate adeny-476 lyltransferase (WP_005508106.1).A more detailed compari-477 son can be found in Table S7.

478
Our in silico transponson mutant analysis using iRM23NL 479 could serve as a basis for several research and practical appli-480 cations from rational drug target development to biotechno-481 logical applications and metabolic engineering.Protein sequence homology analysis of genes predicted to be essential in the laboratory media (LB and M9 pure supplemented with glucose) and the synthetically defined SNM and SCFM in both oxygen-rich and -limited conditions.The percentage identity threshold was set to 50 % similarity to the human proteome.
unexplored.Investigating its metabolic traits is of great im- to nutrient uptake, substrate production, and growth dynamics, crucial for understanding its role in a broader ecosystem.Monoculture studies identify key genes and pathways, revealing how the bacterium functions autonomously.Such analysis serves as a valuable reference, differentiating inherent characteristics from those influenced by external interactions.To this end, we empirically analyzed the metabolic phenome of R. mucilaginosa DSM20746 and developed the first high-quality strain-specific GEM of R. mucilaginosa, called iRM23NL.We considered literature and database organism-specific evidence to manually gap-fill the model and include highly relevant biochemical reactions.Phylogenetic analysis of further Rothia species provided insights into the relationship and genetic diversity between these species and was utilized to extend the metabolic network's completeness.Our model is simulationready, follows strictly community standards 25 , and exhibits a high content quality MEMOTE score.
R. mucilaginosa is primarily aerobic and can perform oxic respiration by efficiently generating energy in the form of adenosine triphosphate (ATP) 1 .However, when oxygen is limited or absent, R. mucilaginosa switches to anaerobic in the survival of all living organisms 47 .For instance, they are involved in redox catalysis, needed for energy production through respiration, and in non-redox catalysis, necessary for many biosynthetic and metabolic processes.Additionally, transition metals are required for the activity of many enzymes, including those involved in genomic replication and repair and nitrogen fixation.However, since these compounds were absent from the providers' medium formulation for RPMI, we speculate that the provided medium definition may not be exact.In all cases, the suggested metal co-factor promiscuity in R. mucilaginosa by iRM23NL, needs to be examined to shed light on whether the bacterium could survive in the absence of one of the suggested metals.
Moreover, we experimentally characterized the strain's ability to assimilate and utilize substrates using high-throughput phenotypic microarray assays.The utilization of various nitrogen sources did not result in active respiration, indicating that the bacterial genome lacks genes encoding for respective transporters.We used the phenotypic results to validate and extend our metabolic reconstruction, iRM23NL.Inconsistencies between the model and the phenotypic microarray results served as a basis for further model refinement.We enriched the model with missing transport reactions and their respective GPRs by referring to the organism-and strain-specific BioCyc 31 registry and the General Feature Format (GFF) annotation file.All in all, characterizing and determining the repertoire of nutrient sources a strain can use or assimilate is a critical factor of pathogenesis.It provides valuable insights into how pathogens adapt to host environments and evade host defenses.Our transporter-augmented model reflects a high accuracy degree with the experimental data regarding using carbon, nitrogen, phosphorus, and sulfur sources.Discrepancies between computational and empirical results highlight areas of current uncertainty knowledge regarding the metabolism of R. mucilaginosa.They could be attributed to non-metabolic factors that fall beyond the metabolic models' scope, including regulatory processes, gene expression, and signaling pathways.However, targeted experiments are needed to fill the remaining network gaps and reveal novel enzymatic processes.
Considering the predictive precision of our metabolic reconstruction, we utilized iRM23NL to derive novel hypotheses.We examined the effects of single gene knock-outs on the bacterial capacity to produce biomass.We created a highthroughput in silico-derived transposon mutant library considering two standard growth media, LB and M9, along with two growth media formulated to mimic the environment within the human body, SNM and SCFM.In this regard, we identified putative essential and partially essential genes and assessed their potential vulnerability under varying nutritional environments.With this, we opted for detecting candidate genes that could be considered in future antimicrobial and -inflammatory strategies in immunocompromised and CF patients.Determining which essential genes have human counterparts is of great importance for antibiotic drug development, as it helps as- sess potential side effects and cross-species effects on human genes targeted by antibiotics.Moreover, it provides insights into the molecular mechanisms of host-pathogen interactions, explaining how pathogens manipulate host cells and evade the immune system.Utilizing our GEM offers promising venues for future targeted engineering strategies without the need for laborious large-scale screening of knock-outs and mutant libraries.This methodology would facilitate the rapid design of metabolic gene knockout strains by eliminating the associated reaction(s) from the model.
Altogether, creating a genome-scale metabolic network for R. mucilaginosa reveals insights that would have been resource-intensive to acquire using traditional wet-lab means.
Understanding the metabolic complexities of R. mucilaginosa is essential for expanding our basic understanding of bacterium's microbiology and would benefit various practical applications.In medicine, it could facilitate the development of strategies to deal with caused infections, while in biotechnology, it would allow us to use its metabolic abilities for bioprocessing and bioengineering purposes.Hence, our highquality metabolic network, iRM23NL, could provide a systematic and detailed framework for analyzing R. mucilaginosa's metabolism, yielding valuable insights with broad-reaching impacts.

Bacterial strain and growth conditions
The R. mucilaginosa DSM20746 (ATCC 25296) used for the experimental work in this study is a type strain, and it was purchased from the American Type Culture Collection (ATCC, US).To create an inoculum, the bacterium was streaked onto nutrient agar (NA, Neogen, Heywood, UK) plates from a cryopreserved glycerol stock stored at −80 • C using a sterile loop.Subsequently, the plates were incubated at 37 • C for 48 h to form colonies (pure cultures).It is important to note that each biological replicate was conducted using pure cultures derived from the initial frozen stock (no sub-culturing).This ensures maintaining the genetic and phenotypic characteristics of the strain without introducing any potential mutations or adaptations.
Growth kinetics protocol R. mucilaginosa overnight liquid cultures were prepared by adding bacterial colonies from pure cultures to 5 mL BHI (Neogen, Heywood, UK) and were put at 37 • C in a shaking incubator for 24 h.The initial OD was assessed and, if necessary, adjusted via up-concentration or dilution to achieve OD 590 nm = 0.25.Then, the bacterial suspension was subjected to centrifugation at 10,000 RPM for 5 min, and the resulting pellet was re-suspended in the medium of interest at a dilution of 1 : 10.Ultimately, the inoculated growth media were transferred to a sterile 96 well-plate, including three technical replicates for each tested condition together with their corresponding control conditions (sterile growth media).
The outer wells were filled with milliQ water (MQ) to prevent 691 evaporation.The respective OD 590 nm was measured aerobi-692 cally at three distinct time points (0 h, 24 h and 48 h) using 693 an EnVision microplate reader (Perkin Elmer, Waltham, MA, 694 US).The microplates were incubated at 37 • C during the in-695 terim periods between measurements.The final growth curves 696 were generated for three biological replicates (n=3) for the fol-697 lowing growth media: BHI (baseline medium), LB (Neogen, 698 Heywood, UK), M9 pure, RPMI medium (RPMI-1640 Sigma-699 Aldrich), and TSB (Neogen, Heywood, UK).In the M9 pure 700 medium only salts were considered.For detailed information 701 regarding the constitution of M9, see Table S1.The rest of the 702 media were prepared according to the providers' instructions.703 The raw data were normalized by subtracting the blank 704 values from the measured ODs and were summarized by cal-705 culating the arithmetic mean across all replicates.To interpret 706 the growth of bacterial cells in all tested media and compare 707 their growth characteristics, we employed the FC OD ratio, 708 which is defined as follows: In this context, we define FC OD below 1.4 as no growth, while 710 FC OD ratios greater than PM4A).These assays use a tetrazolium redox dye to enable 719 a colorimetric detection of active cell respiration across differ-720 ent nutrient sources 41 .Normal cell respiration is indicated by 721 the formation of a purple color as a result of the reduction of 722 the colorless dye during incubation.The PM plates were prepared following the manufacturer's 724 protocol for Gram-positive bacteria.Table 1 lists the assay 725 set up for of PM plates.However, modifications were made 726 during the cell suspension preparation.The strain was grown 727 on nutrient agar plates without undergoing sub-culturing.Us-out for each tested compound.The M9 pure medium was created as described in Table S1, while Table S2 lists the exact concentrations of added substrates.All bacterial cell suspensions were prepared in 1:10 dilutions, and the ODs 590 nm were measured at 0 h, 24 h and 48 h using an EnVision microplate reader (Perkin Elmer, Waltham, MA, US) and the associated software package.
We computed the arithmetic mean across the three replicates from the collected dataset for each measured time point.Additionally, we performed a background correction to mitigate the influence of background noise or unwanted signal interference present in the measured ODs.

Statistical Hypothesis Analysis
We conducted statistical tests to evaluate the chosen threshold and potential statistically significant differences between measurements at the initial and final time-points, thereby indicating the significant growth or no growth.Specifically, we employed the Student's t-test for each experimental condition, taking into account the data from the three biological replicates.The null hypothesis is formulated as following: there is no significant difference between the measured OD values in starting and end time-points.Prior to hypothesis testing, we checked the correctness of associated assumptions.More specifically, we assessed data normality through the Shapiro-Wilk test and verified the homogeneity of variances using the Levene's test.

Phylogenomic analysis
We supported the gap-filling process using evidence of closely related species within the Rothia genus.Employing ANIclustermap v.1.1.0 49, we conducted a comprehensive genomic comparison involving R. mucilaginosa DSM20746 and 12 distinct Rothia species: R. koreensis, R. kristinae, R. santali, R. halotolerans, R. aeria, R. dentocariosa, R. terrae, R. amarae, R. nasimurium, R. mucilaginosa, R. aerolata, R. nasisuis, and R. endophytica (see Figure 3).In brief, ANIclustermap creates an all-vs-all genome ANI clustermap and groups microbial genomes based on their genetic similarity.ANI is a pairwise measure to classify bacterial genomes according to their genetic similarity.It is defined as the genetic similarity across all orthologous genes shared between any two genomes 50,51 .It serves as a powerful tool for distinguishing strains of the same species or closely related species.

Draft model reconstruction and curation
The proteome of R. mucilaginosa DSM20746 (GCF_000175615.1)served as the basis for reconstructing a draft metabolic network.The DSM20746 (ATCC 25296) represents a type strain obtained from the throat, and its genetic and proteomic sequences were retrieved from National Centre for Biotechnology Information (NCBI) 1 .The genome sequence was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) 52 .An initial draft model was built using CarveMe 1.5.1 29 .CarveMe uses mixed-integer linear programming (MILP) to convert a universal model into an organism-specific one by deleting metabolites and reactions with low occurrence scores within the specific organism of interest.The universal BOF might yield incorrect gene essentiality predictions for biosynthesis pathways that rely on precursors unique to Gram-positive bacteria due to the absence of specific membrane and cell wall information.Hence, we chose the specialized Gram-positive template instead of the universal one to build our model more accurately.
We conducted an extensive two-staged iterative gap-filling to address incomplete or missing information in the metabolic model.Gaps or missing reactions can arise for various reasons, such as incomplete genome annotations or undiscovered enzymatic activities.For this purpose, we leveraged information from both the bibliome and biochemical databases, including BioCyc 31 .Thus, we ensured that the model could support the growth and viability of the organism under specific conditions.We applied our previously-published pipeline 26  KEGG 28 subsystems as groups:member (biological qualifier: BQB_OCCURS_IN), and gene annotations.The latter was done by mapping the gene locus tags to the old tags using the Gen-Bank GFF 53 .Finally, we checked the presence of potential EGCs that could bias the final predictions 54 .To manipulate the model structure, we employed the libSBML library 55 .
The SBML Validator from libSBML 55 was used to assure a correct syntax of the model, while the quality control was carried out employing MEMOTE 37 .However, it is worth noting that, as we discussed in our previous publication, MEMOTE considers only the parent nodes of the SBO directed acyclic graph excluding their respective children 26 .Hence, MEMOTE was used carefully and not as an absolute quality indicator.To address the under-determined nature of the system, constraints are imposed to define an allowable solution space that aligns with cellular functions.These constraints, encompassing mass balance, thermodynamics, and capacity, contribute to the FBA maximization problem.The linear programming problem used to obtain growth rates is described as follows: (5) where⃗ v is the vector of fluxes within the network, S is the stoi-878 chiometric matrix, Z is the linear objective function, ⃗ c is the 879 vector of coefficients, and I represents an index set containing 880 the indices of all irreversible reactions.The dimensionality of 881 vector ⃗ v matches the number of reactions, denoted as n in the 882 system, and is consistent with the n columns in the matrix S. 883

Bacterial growth analysis and nutrient utilization assays 884
Bacterial cell growth within various media and multiple sub-885 strate utilization evaluations were determined by solving Equa-886 tion (5).The medium and the nutrient source of interest de-887 fined additional constraints.To achieve this objective, adjust-888 ments were made to the upper and lower limits of exchange 889 reactions, as appropriate.We set specific uptake rates for 890 key components within the growth medium as follows: the 891 uptake rate of transition metals was set at 5.0 mmol/(g DW • h), 892 the uptake rate of oxygen under aerobic conditions was estab-893 lished at 20.0 mmol/(g DW • h), and the rest media components 894 equal to 10.0 mmol/(g DW • h).As previously mentioned, the 895 M9 pure medium was used for the substrate utilization as-896 says.Only substrates present in the metabolic network as 897 intra-or extracellular metabolites were considered for the in 898 silico validation.The results from the experimental and the in 899 silico growth tests were categorized into "growth" (G) or "non-900 growth" (NG).Here, "growt" indicates the network's ability 901 to generate biomass and, therefore, a positive growth rate.The 902 model's overall prediction performance was assessed using 903 the following statistical parameters: where true negative (TN) and true positive (TP) represent 907 accurate predictions, and false negative (FN) and false posi-908 tive (FP) indicate incorrect predictions.Inconsistencies were 909 resolved via iterative manual network gap-filling.For all FBA 910 simulations, we employed the Constraints-Based Reconstruc-911 tion and Analysis for Python (COBRApy) 56 package.All 912 growth media definitions are listed in Table S3. 913

Gene lethality analysis 914
The in silico single-gene knockouts were performed as de-915 scribed in our previous study using FBA 26 .To address the 916 degeneracy issue of optimization, we additionally ran our 917 FBA simulations in a total of 100 independent runs.Further-918 more, we utilized pFBA, a method that allows us to ascertain 919 the flux distribution of the optimal solution while concurrently 920 minimizing the overall flux sum 57 .The results were catego-921 rized as either essential FC gr = 0, inessential (FC gr = 1), or 922 partially essential (0 < FC gr < 1), where FC gr denotes the 923 FC bacterial growth rate before and after deletion 26 .Shared 924 essential genes between FBA and pFBA, as well as all tested 925 conditions, were further aligned against the human genome 926 using BLAST 42 .

Figure 1 |
Figure 1 | Construction and validation flowchart of the metabolic network for R. mucilaginosa, iRM23NL.The study is divided into the experimental and computational phases.The proteome-derived metabolic reconstruction and curation was done based on the workflow we described elsewhere26 .

NantiaFigure 2 |
Figure 2 | Properties of the R. mucilaginosa DSM20746 genome-scale metabolic model iRM23NL.(A) Evolution of metabolic network content from its initial draft to the final stage of extensive manual gap-filling.The shifts in the sets' sizes are also displayed in each stage.The first stage of gap-filling is denoted by Draft_2, while the final stage is upon validation with experimental data.(B) UpSet plots comparing sets between three model versions, created using the UpSetPlot package 27 .The numbers indicate the cardinality of the respective set.(C)Subsystem-level statistics within pathways along with the distribution of gene-and non-gene-associated reactions.The pathway analysis was limited to reaction identifiers that could be successfully mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG)28 reactions.

200(
TSB), and two defined media; M9 minimal medium (M9) 201 pure and Roswell Park Memorial Institute (RPMI) (Figure 4 202 Panel B).The BHI medium was used as a baseline for the 203 in vitro experiments since it is a known and well-established 204

NantiaFigure 4 |
Figure 4 | Investigation of R. mucilaginosa's growth behavior in different nutrient media.(A) Metabolic response of R. mucilaginosa under anaerobic stress as represented in iRM23NL.Reduction process of oxygen (O 2 -) generating ROS is indicated by red arrows, while pathways highlighted in green arrows represent reactions governed by ROS scavenging enzymes leading to bacterial cell detoxification.(B)Experimentally-derived growth curves for R. mucilaginosa DSM20746 in multiple liquid growth media along with the respective fold changes (FCs) of the acquired optical densitys (ODs) at 590 nm, as defined in Equation(1).The data shown here are an average of three biological replicates (n=3).Based on the experimental results, a threshold of FC OD = 1.4 was established to qualitatively describe bacterial growth.We verified the correctness of the threshold by performing statistical analysis as described in Materials and Methods.All data are normally distributed, while there is no significant difference between their variances.The asterisks flag the significance levels.The BHI medium was used as a baseline, while the Control line represents blank measurements of pure media.Bacterial growth was aerobically measured by the OD at 590 nm (ordinate) at three distinct time points ranging from 0 h to 48 h (abscissa).(C) In silico-simulated growth rates using iRM23NL.Detailed in silico media formulations are provided in TableS3.
260 rates closely resembled the flux rate predicted for biomass 261 production in RPMI medium.Additionally, we confirmed that 262 iRM23NL accurately represented R. mucilaginosa's capacity 263 for facultative anaerobic respiration.In more detail, when the 264 oxygen uptake was turned off iRM23NL could successfully 265 exhibit growth using alternative metabolic pathways across 266 all tested nutritional media.When the oxygen level was de-267 creased, the model predicted up to 68 % reduction in biomass 268 yield compared to aerobic conditions.Consequently, the re-269 markably lower anaerobic rates in all tested media mimic 270 R. mucilaginosa's inherent facultative anaerobic capabilities.271 Nutrient utilization profile of R. mucilaginosa and pre-272 dictive performance of iRM23NL 273 We experimentally characterized the metabolic phenotype of 274 R. mucilaginosa DSM20746 using four 96 well Biolog PM 275 microplates (Figure

Figure 5 |
Figure 5 | Complete experimentally-derived nutrient utilization phenome of R. mucilaginosa DSM20746.(A) Utilization of individual nutrients by the bacterium across four Biolog phenotypic microarrays.Bacterial growth was measured by OD at 590 nm.(B) Numerical summary nutrient sources experimentally tested in each Biolog phenotype microarray (PM), classified into those resulting in bacterial growth and those that R. mucilaginosa could not utilize.(C) Categorization of all tested phosphorous sources during the high-throughput Biolog assay.Utilization of totally 31 phosphorus sources resulted in positive phenotype (green chart), while the cell exhibited an inability to utilize the remaining 28 (orange chart).

399Formulating novel hypotheses using iRM23NL 400
Gene essentiality analysis and identification of novel targets 401Given the increased percentage of gene-associated reactions 402 (Figure2Panel C) and the high predictive accuracy of the me-403 tabolic reconstruction, we employed iRM23NL further to pre-404 dict exploitable single gene knock-outs.For this purpose, we 405

482483Figure 7 |
Figure 7 | Comparative analysis of novel gene essentialities in iRM23NL across four distinct growth media.(A) Classification of network-derived single gene deletions within iRM23NL, classified into essential, inessential, and partially essential genes, when subjected to aerobic (green) and anaerobic (orange) environments.Details regarding the classification schema can be found in Materials and Methods.(B)Protein sequence homology analysis of genes predicted to be essential in the laboratory media (LB and M9 pure supplemented with glucose) and the synthetically defined SNM and SCFM in both oxygen-rich and -limited conditions.The percentage identity threshold was set to 50 % similarity to the human proteome.
to curate further the model based on community standards.The pipeline consists of eight steps, from which five (steps 3step 4) are related to model curation and ensure a high quality of the final model.In Summary, ModelPolisher 35 and SBOannotator 36 were employed to enrich the model with multiple cross-references, while the mass-and charge-unbalanced reactions were fixed.Further annotations integrated into the model encompassed: Evidence and Conclusion Ontology (ECO) terms representing the confidence level and the assertion method (biological qualifier: BQB_IS_DESCRIBED_BY), Linear programming: formulation of assumptions and constraints FBA is used to determine the flux distribution through optimization of the objective function, typically the maximization of biomass production rate, under steady-state conditions21 .
r ∈ {1, . . ., n} ∀r ∈ I : 0 ≤ v r T N T P + T N + FP + FN (6) and Matthews Correlation Coefficient (MCC): 906 MCC = (T P • T N − FP • FN) (T P + FP)(T P + FN)(T N + FP)(T N + FN) More specifically, initial model 352 predictions indicated that iRM23NL could not sustain growth 353 when supplied with either L-cysteate (Lcyst) or AMP (amp) 354 as sole sources, while Biolog assays indicated the opposite.355 To rectify this, we introduced the corresponding irreversible 356 transporters (LCYStex and AMPt) and enabled their in silico 357 utilization of these compounds.Moreover, several metabolites 358 (e.g., phosphoenolpyruvate; pep, trimetaphosphate; tmp, hy-359 potaurine; hyptaur, and inorganic triphosphate; pppi) which 360 were absent from the initial draft model but exhibited positive 361 growth in utilization assays, were subsequently incorporated 362 into the final network, leading to additional true positives pre-363 dictions.All in all, over 50 transport reactions were added 364 into the network, while 37 wrongly added enzymatic functions 365 were removed.We also incorporated novel GPRs encoding 366 over 60 biochemical reactions.Nevertheless, we identified 367 approximately 20 instances where the resolution of incon-368 sistencies necessitated the inclusion of metabolic reactions 369 lacking associated gene evidence.For instance, to enable 370 the utilization of L-aspartate, we introduced a transporter via 371 diffusion from extracellular to periplasm (ASPtex), for which 372 no associated GPR was available.Similar scenarios arose 373 for other compounds, e.g., D-galactose, D-glucuronate, and 374 acetate.These instances underscore knowledge gaps in the 375 metabolism of DSM20746 that require in-depth investigation.376 SULFCYSpp, diffusion; SULFCYStex, and ABC transport; SULFCYSabc) that displayed a similarity over 80 % with R. dentocariosa.These components were consequently incorporated into iRM23NL, resulting in the expected positive utilization phenotype.Generally, false negative or false positive predictions arise from missing or erroneous involvement of transporters, respectively.We resolved false positives by removing transport reactions lacking supporting gene evidence or adjusting their reversibil-ity to facilitate export solely.
1.4 indicate a growth increase over 711 time.

Table 1 |
Assay configuration for diverse Biolog PM microplates combinations.Volumes are expressed in µL.The provided volume quantities are adequate for inoculating the specified number of plates in this study, using 100 µL/well with an additional excess.
Basic Local Alignment Search Tool Nantia Leonidou et al.Validated Metabolic Model of R. mucilaginosa: iRM23NL