Genome-scale metabolic model of Staphylococcus epidermidis ATCC 12228 matches in vitro conditions

Staphylococcus epidermidis, a commensal bacterium inhabiting collagen-rich areas, like human skin, has gained significance due to its probiotic potential in the nasal microbiome and as a leading cause of nosocomial infections. While infrequently leading to severe illnesses, S. epidermidis exerts a significant influence, particularly in its close association with implant-related infections and its role as a classic opportunistic biofilm former. Understanding its opportunistic nature is crucial for developing novel therapeutic strategies, addressing both its beneficial and pathogenic aspects, and alleviating the burdens it imposes on patients and healthcare systems. Here, we employ genome-scale metabolic modeling as a powerful tool to elucidate the lifestyle and capabilities of S. epidermidis. We created a comprehensive computational resource for understanding the organism’s growth conditions within diverse habitats by reconstructing and analyzing a manually curated and experimentally validated metabolic model. The final network, iSep23, incorporates 1,415 reactions, 1,051 metabolites, and 705 genes, adhering to established community standards and modeling guidelines. Benchmarking with the MEMOTE test suite yields a high score, highlighting the model’s high semantic quality. Following the FAIR data principles, iSep23 becomes a valuable and publicly accessible asset for subsequent studies. Growth simulations and carbon source utilization predictions align with experimental results, showcasing the model’s predictive power. This metabolic model advances our understanding of S. epidermidis as a commensal and potential probiotic and enhances insights into its opportunistic pathogenicity against other microorganisms.

Author summary
Staphylococcus epidermidis, a bacterium commonly found on human skin, has shown probiotic effects in the nasal microbiome and is a notable causative agent of hospital-acquired infections. While typically causing non-life-threatening diseases, the economic ramifications of S. epidermidis infections are substantial, with annual costs reaching billions of dollars in the United States. To unravel its opportunistic nature, we utilized genome-scale metabolic modeling, creating a detailed mathematical network that elucidates S. epidermidis’s lifestyle and capabilities. This model, encompassing over a thousand reactions, metabolites, and genes, adheres rigorously to established standards and guidelines, evident in its commendable benchmarking scores. Adhering to the FAIR data principles (Findable, Accessible, Interoperable, and Reusable), the model stands as a valuable resource for subsequent investigations. Growth simulations and predictions align closely with experimental results, showcasing the model’s predictive accuracy. This metabolic model not only enhances our understanding of S. epidermidis as a skin commensal and potential probiotic but also sheds light on its opportunistic pathogenicity, particularly in competition with other microorganisms.


A prevalent constituent of the human skin flora is the coagulase-negative commensal Staphylococcus epidermidis 1,2. This Gram-positive coccus predominantly inhabits the skin and mucosal membranes in areas such as the axillae, head, legs, arms, and nares. S. epidermidis plays a crucial role in maintaining a balanced microbiome within the human nasal cavity, where harmful pathogens like Staphylococcus aureus commonly establish colonization. There is ongoing discourse regarding whether S. epidermidis, through competition in nutritionally scarce environments like the human nose, may

Figure 1 | A new metabolic network for S. epidermidis ATCC 12228, called iSep23. The computational metabolic network was created and validated using a two-phase approach. The initial phase encompassed the mathematical representation of the metabolism through the deployment of genome-scale models. Subsequently, the second phase involved the functional validation, rooted in experimental data.
One way to better understand an organism's lifestyle and capabilities is the reconstruction and analysis of genome-scale metabolic models (GEMs). These models rely on the annotated genome sequence of the organism in question. Specifically, genes encoding proteins with metabolic significance are allocated to their respective reactions through gene-protein-reaction associations (GPRs). Within the resulting network, biochemical reactions establish connections between metabolites, with enzymatic activities guided by genes associated with these reactions. Such models enable the comprehension of an organism's metabolism at a systemic level. Díaz Calvo et al. reconstructed the metabolic network of RP62A, a slime-producing and methicillin-resistant biofilm isolate 8. However, the resulting model is available only upon request. Figure 1 summarizes the computational and experimental approach of this article. This work introduces iSep23, the first manually curated and experimentally validated GEM of S. epidermidis ATCC 12228. The model comprises 1,415 reactions, 1,051 metabolites, and 705 genes and is freely available from BioModels Database 9 with the accession identifier MODEL2012220002. Moreover, it aligns with current community standards 10,11,12 and modeling guidelines 13,14. Semantic benchmarking was conducted utilizing the MEMOTE genome-scale metabolic model test suite 15. Consequently, iSep23 upholds the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles 16, rendering it a valuable
Leonidou et al.
Metabolic model of S. epidermidis
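To illustrate how a GPR ties genes to a reaction, the following minimal Python sketch evaluates a Boolean gene-reaction rule against a set of available genes. It assumes the rule syntax commonly used in constraint-based modeling tools (gene identifiers joined by `and`/`or`, with optional parentheses). The gene names and the function name `reaction_is_active` are hypothetical and are not taken from iSep23.

```python
import re


def reaction_is_active(gpr, present_genes):
    """Evaluate a Boolean GPR rule such as '(geneA and geneB) or geneC'.

    Returns True if the genes in `present_genes` satisfy the rule.
    An empty rule is treated as "not gene-limited" (an assumption of this sketch).
    """
    if not gpr.strip():
        return True
    # Split the rule into parentheses, operators, and gene identifiers.
    tokens = re.findall(r"\(|\)|[^\s()]+", gpr)
    expr = []
    for tok in tokens:
        if tok in ("(", ")", "and", "or"):
            expr.append(tok)
        else:
            # Replace each gene identifier by its presence (True/False).
            expr.append(str(tok in present_genes))
    # The expression now contains only True/False, and/or, and parentheses.
    return bool(eval(" ".join(expr), {"__builtins__": {}}, {}))
```

In this reading, `and` represents subunits of an enzyme complex that are all required, whereas `or` represents isoenzymes, any one of which suffices.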

Properties of the constructed GEM
(ii) The ECO terms are stored under BQB_IS_DESCRIBED_BY.
(iii) Pathways associated with a reaction are saved with the biological qualifier type BQB_OCCURS_IN.
The metabolites and genes were annotated with twelve and three external databases, respectively, using the biological qualifier type BQB_IS (Table 1). The inclusion of ECO terms ensures a comprehensive understanding of evidence and assertion methodologies 36, thereby facilitating robust quality control measures and evidence queries. The ECO term with the lowest evidence level is ECO:0000001, coding for inference from background scientific knowledge (Figure 2). This term was ascribed to 30.2 % of the biochemical reactions within the network. Notably, this percentage encompasses pseudoreactions, such as exchanges, sinks, demands, and the biomass function. Within the group of 431 reactions associated with this ECO term, 170 pertained to pseudoreactions. The ECO term ECO:0000251 denotes similarity evidence used in automatic assertion and was assigned to 28.5 % of all reactions.
Moreover, the terms ECO:0000363 (computational inference used in automatic assertion) and ECO:0000044 (sequence similarity evidence) annotated 9.3 % and 31.9 % of all reactions, respectively.

The final curated metabolic model was stored as a Systems Biology Markup Language (SBML) Level 3 37 file. This format version supports the integration of various plugins, such as the fbc package 38 and the groups package 39, which are both enabled in iSep23. The groups package facilitates the incorporation of additional information without impacting the mathematical interpretation of the model. We defined all pathways and subsystems identified from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database 24 as an individual group and added corresponding reactions as members. Overall, we added 99 distinct groups to the model that facilitate pathway-related analysis.

Validation of the Metabolic Network
Besides the syntactic evaluation, data structure, and file format validation, the model also underwent assessment for its predictive value. A standard approach for such evaluations involves comparing simulation outcomes with empirical laboratory data. Given the adaptability of microbes to diverse environmental conditions, our focus was on investigating their growth behavior across various nutrient media. To enable comparability between simulation and laboratory results, ensuring that the simulated conditions represented those in the actual experiments was imperative.

Evaluation of Different Growth media
Therefore, we used chemically defined media for the growth simulation. In more detail, we utilized three synthetic minimal media: synthetic minimal medium (SMM) 40, AAM 41, and AAM- 42. Developed initially to explore the metabolic requirements of S. aureus, these media definitions served as the basis for our simulations. We used the compound concentrations as lower bounds for the corresponding exchange reactions (Table 2).

In AAM-, L-glutamate is not provided as an amino acid, but it can be synthesized

Discussion
Here, we present a manually curated GEM of S. epidermidis ATCC 12228, iSep23. Literature-based corrections and meticulous manual curation ensured accurate representation of enzymatic reaction directions, essential for precise constraints during simulations. Overall, our model aligns with experimental data and offers a comprehensive platform for exploring S. epidermidis's metabolic capabilities and behavior under diverse conditions. The inconsistency between the in silico and in vitro results regarding AAM- in the presence of glucose could be attributed to factors beyond the metabolic scope. For instance, such non-metabolic factors could be regulatory mechanisms and post-translational modifications. The observed discrepancy suggests a need for a more detailed understanding of the regulatory and metabolic factors influencing S. epidermidis growth in AAM-. Further experimental validation and exploration of regulatory mechanisms are crucial for resolving the observed differences between in silico predictions and experimental outcomes.
All in all, the refined network serves as a powerful tool for exploring S. epidermidis's metabolic capabilities and behavior under diverse conditions. Future perspectives involve leveraging the model for targeted studies, such as investigating metabolic pathways, assessing the impact of genetic modifications, and exploring potential drug targets. The model's compatibility with the fbc and groups packages in the SBML Level 3 Version 1 12 format enhances its flexibility, enabling the integration of additional plugins for more intricate analyses. Including 99 distinct groups representing pathways and subsystems from the KEGG database provides a foundation for comprehensive pathway-related analyses. Altogether, iSep23 aligns with experimental data and lays the groundwork for future investigations into the bacterium's metabolism. Its accuracy, comprehensibility, and flexibility make it a valuable resource for advancing our understanding of microbial physiology and metabolic engineering applications.

Reconstructing the draft model of S. epidermidis
The reconstruction of the GEM is based on protocols described in previous studies 45,46. The fast and automated reconstruction tool CarveMe 47, which reconstructs genome-scale metabolic models of microbial species and communities, was employed in a preliminary step. Leveraging the Biochemical, Genetic and Genomic (BiGG) Models database 20 identifiers of the model instances, the ModelPolisher systematically accessed the BiGG Models database, assimilating all available information for these instances into the network as annotations.

Manual refinement of the draft metabolic network
In the initial draft model, a total of 63 reactions were identified as exhibiting mass and charge imbalances. To rectify these imbalances, an investigation into the connectivity of metabolites was conducted, focusing on identifying those frequently participating in reactions characterized by imbalances.
To ensure the accuracy of the corrected model, the databases MetaNetX 22 and BioCyc 26 were browsed. These databases provided essential information about the correct charges and chemical formulas of the metabolites involved in the identified reactions, facilitating the precise adjustment of mass and charge imbalances within the model.
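The mass and charge check described above can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used for iSep23: the function names are hypothetical, and the formulas and charges in the usage example are illustrative values for a hexokinase-type reaction written with BiGG-style identifiers; they should be verified against MetaNetX or BioCyc before use.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple chemical formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoichiometry, metabolites):
    """Return (net element counts, net charge) of a reaction.

    `stoichiometry` maps metabolite id -> coefficient (negative for reactants).
    `metabolites` maps metabolite id -> (formula, charge).
    A balanced reaction yields ({}, 0).
    """
    net = Counter()
    charge = 0
    for met_id, coeff in stoichiometry.items():
        formula, met_charge = metabolites[met_id]
        for element, n in parse_formula(formula).items():
            net[element] += coeff * n
        charge += coeff * met_charge
    mass = {el: n for el, n in net.items() if n != 0}
    return mass, charge


# Illustrative metabolite data (formula, charge); assumptions of this sketch.
METS = {
    "glc__D_c": ("C6H12O6", 0),
    "atp_c": ("C10H12N5O13P3", -4),
    "g6p_c": ("C6H11O9P", -2),
    "adp_c": ("C10H12N5O10P2", -3),
    "h_c": ("H", 1),
}
BALANCED = {"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1, "h_c": 1}
MISSING_PROTON = {"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1}
```

A reaction that lacks a proton on the product side shows up with a net hydrogen count and a net charge of the same size, which is the typical signature of the imbalances that are resolved by correcting metabolite formulas and charges.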
Additionally, the network constraints were carefully reviewed. Enzymes frequently act as catalysts in metabolic reactions. However, some enzymes effectively catalyze the reaction only in one direction. Consequently, it becomes

of constraints whereby all uptake rates were set to zero, an optimization process was conducted on the dissipation reaction. The presence of a non-zero flux following optimization serves as an indicator of the existence of EGCs within the model.

Including gene annotations
The software ModelPolisher 50 was used to annotate the model instances. It is noteworthy, however, that this tool does not facilitate the annotation of model genes due to their strain-specific nature. We annotated the network genes using the associated National Center for Biotechnology Information (NCBI) protein identifiers 21. Notably, these gene identifiers underwent modifications during the reconstruction process due to the prokaryotic RefSeq genome re-annotation project 48.
To address this, we retrieved the updated NCBI protein identifiers from the NCBI database 21. Subsequently, leveraging these novel protein identifiers in conjunction with the organism's GenBank file 52, we extracted the corresponding KEGG gene identifiers, which align with the organism's locus tag and UniProt identifiers 23. The integration of cross-references was executed as annotations using libSBML 53. This comprehensive process ensures the accuracy and coherence of gene annotations within the model, thereby contributing to the reliability and accuracy of subsequent analyses.

Adding subsystems and groups
The reaction-associated pathways were retrieved using the annotated KEGG identifiers and the KEGG Representational State Transfer (REST) Application Programming Interface (API). Subsequently, these pathways were incorporated as annotations utilizing the biological qualifier BQB_OCCURS_IN. Furthermore, the groups package was activated for enhanced functionality. Each identified pathway was integrated as a group, and the corresponding reactions as members.
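As a rough sketch of the pathway lookup described above, the following Python helpers build a KEGG REST `link` query for a reaction and parse the tab-separated response into pathway identifiers. The exact queries used for iSep23 are not given in the text, so the URL pattern, the function names, and the sample response in the usage note are assumptions for illustration. No network request is made here; the response text could be fetched separately, for example with `urllib.request.urlopen`.

```python
KEGG_REST = "https://rest.kegg.jp"


def pathway_link_url(kegg_reaction_id):
    """Build the URL that links a KEGG reaction (e.g. 'R00299') to pathways."""
    return f"{KEGG_REST}/link/pathway/rn:{kegg_reaction_id}"


def parse_link_response(text):
    """Extract pathway identifiers from a tab-separated 'link' response.

    Each non-empty line is expected to look like 'rn:R00299<TAB>path:rn00010'.
    Returns a sorted list of unique pathway identifiers without the prefix.
    """
    pathways = set()
    for line in text.strip().splitlines():
        _source, target = line.split("\t")
        pathways.add(target.split(":", 1)[1])
    return sorted(pathways)
```

For a response such as `"rn:R00299\tpath:map00010\nrn:R00299\tpath:rn00010\n"` (an illustrative string, not retrieved data), the parser returns the two pathway identifiers, each of which could then be stored as a BQB_OCCURS_IN annotation and as a group with the reaction as a member.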

Adding ECO and SBO terms
To enhance the model's reusability, we incorporated ECO terms that annotate all metabolic reactions 36. This ontology comprises terms and classes of the various evidence and assertion methods. These terms elucidate, for instance, the nature of evidence associated with a gene product or reaction, thereby facilitating robust model quality control. The assignment of a suitable ECO term to each reaction involved the extraction of GPRs. In instances where a reaction lacked a GPR, the term ECO:0000001 was ascribed, denoting its inference from background scientific knowledge. Conversely, for all reactions with a GPR, the protein's existence was reviewed in the UniProt database 23. We distinguished the presence of proteins based on distinct categories, namely: (i) inferred from homology (ECO:0000044), (ii) predicted (ECO:0000363), (iii) evidence at the transcript level (ECO:0000009), or (iv) protein assay evidence. Genes not found in UniProt were assigned the term ECO:0000251, indicating the similarity evidence used in an automatic assertion. The relevant ECO term was incorporated as an annotation in instances where a biochemical reaction was associated with a GPR described by a single gene. In cases where the GPR involved multiple genes, the term of the gene associated with the lowest evidence score was appended. All ECO terms were supplemented with the biological qualifier BQB_IS_DESCRIBED_BY.
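The assignment rules above can be summarized in a short Python sketch. Several details are assumptions of this sketch rather than statements from the text: the relative ranking of the intermediate evidence levels, the category keys (`"predicted"`, `"homology"`, `"transcript"`, `"protein"`), the function name `eco_for_reaction`, and the placeholder label used for protein assay evidence, whose ECO identifier is not given in the text.

```python
# Ordered from lowest to highest evidence level. Only the lowest and highest
# positions are stated in the text; the intermediate order is an assumption.
EVIDENCE_ORDER = [
    "ECO:0000001",  # inference from background scientific knowledge (no GPR)
    "ECO:0000251",  # similarity evidence, automatic assertion (gene not in UniProt)
    "ECO:0000363",  # predicted
    "ECO:0000044",  # inferred from homology
    "ECO:0000009",  # evidence at the transcript level
    "protein assay evidence",  # placeholder label, identifier not given in the text
]

# Hypothetical keys for the UniProt protein-existence categories.
UNIPROT_TO_ECO = {
    "predicted": "ECO:0000363",
    "homology": "ECO:0000044",
    "transcript": "ECO:0000009",
    "protein": "protein assay evidence",
}


def eco_for_reaction(gpr_genes, uniprot_existence):
    """Choose the ECO term for a reaction.

    `gpr_genes` lists the genes in the reaction's GPR (empty if none).
    `uniprot_existence` maps gene -> protein-existence category for genes
    found in UniProt; genes missing from it receive ECO:0000251.
    """
    if not gpr_genes:
        return "ECO:0000001"
    terms = [
        UNIPROT_TO_ECO[uniprot_existence[g]] if g in uniprot_existence else "ECO:0000251"
        for g in gpr_genes
    ]
    # With several genes, keep the term with the lowest evidence level.
    return min(terms, key=EVIDENCE_ORDER.index)
```

Choosing the weakest term for multi-gene GPRs is a conservative design: the reaction's annotation never claims more support than its least-supported gene.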

The SBOannotator 19 was employed to assign SBO terms to all reactions, metabolites, and genes within the metabolic network. These terms offer clear and unambiguous semantic information, delineating the type or role of each individual model component.

Elimination of redundant information
CarveMe stores the annotation information on model instances and cross-references to external databases within the notes field. However, the annotation field in the form of the controlled vocabulary (CV) terms is more appropriate for this information. Hence, we transferred all cross-references to the annotation field using the ModelPolisher 50. Subsequently, to optimize file size and eliminate redundancy in information storage, the annotation information was systematically removed from the notes field.

Model extension
Model extension involved the integration of supplementary reactions sourced from established literature. The knowledge bases utilized for this purpose included BioCyc 26, KEGG 24, and ModelSEED 25. To identify relevant genetic information, locus tags from gene annotations were extracted and compared against the KEGG database. Reactions catalyzed by hypothetical enzymes were excluded from analysis. Candidate reactions were systematically cross-referenced with the BiGG 20 and ModelSEED databases, and were subsequently integrated into the network with BiGG identifiers and corresponding GPRs. If no entry in the BiGG database was specified, reaction identifiers from the source database were used.

Different growth media
The growth behavior of S. epidermidis was assessed in three distinct synthetic minimal media initially formulated for investigating the metabolic requirements of S. aureus: (i) SMM 40, (ii) AAM 41, and (iii) AAM- 42, a modified version of the AAM medium. The concentrations of the various components served as lower bounds for the corresponding exchange reactions of metabolites, as detailed in Table 2. In addition to the already provided salts and ions, we added minimal traces of zinc (EX_zn2_e), cobalt (EX_cobalt2_e), and copper (EX_cu2_e) to the simulated medium to enable growth. The lower bound of these reactions was set to −0.0001 mmol/(g DW • h). Oxygen availability was defined by setting the lower bound of the exchange reaction to −20 mmol/(g DW • h). The initial formulation of the three media involved the use of nicotinic acid. However, as nicotinic acid was substituted with nicotinamide in laboratory experiments, our simulated media also incorporated nicotinamide. In addition to the three minimal media, we tested S. epidermidis's growth in LB 47. The lower bounds of the exchange reactions of the compounds listed in the LB definition were set to −10 mmol/(g DW • h). All in silico simulations were evaluated with and without D-glucose as a carbon source.
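The way a medium definition translates into exchange-reaction bounds can be sketched as follows. This is a minimal illustration with plain dictionaries, not the simulation code used for iSep23. The trace-metal identifiers and the bound values of −0.0001 and −20 come from the text; the identifiers `EX_o2_e` and `EX_glc__D_e`, the function names, and the convention of closing every exchange that is not part of the medium are assumptions of this sketch. The base medium passed in would hold the Table 2 values, which are not reproduced here.

```python
TRACE_BOUND = -0.0001  # mmol/(g DW * h), trace metals (value from the text)
OXYGEN_BOUND = -20.0   # mmol/(g DW * h), oxygen uptake (value from the text)
TRACE_METALS = ("EX_zn2_e", "EX_cobalt2_e", "EX_cu2_e")


def build_medium(base_bounds, glucose_bound=None):
    """Return exchange id -> lower bound for one simulated medium.

    `base_bounds` holds the medium components (negative values allow uptake).
    `glucose_bound` is the D-glucose lower bound, or None to simulate the
    medium without D-glucose.
    """
    medium = dict(base_bounds)
    for ex_id in TRACE_METALS:
        medium[ex_id] = TRACE_BOUND
    medium["EX_o2_e"] = OXYGEN_BOUND  # assumed BiGG-style identifier
    if glucose_bound is None:
        medium.pop("EX_glc__D_e", None)  # assumed BiGG-style identifier
    else:
        medium["EX_glc__D_e"] = glucose_bound
    return medium


def apply_medium(exchange_ids, medium):
    """Close every exchange (lower bound 0), then open those in the medium."""
    return {ex_id: medium.get(ex_id, 0.0) for ex_id in exchange_ids}
```

Calling `build_medium` once with and once without a glucose bound yields the paired conditions that are compared in the simulations.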

Different carbon sources
Twelve different sugars were tested for their potential role as a carbon source: D-glucose, D-arabinose, maltose, lactose, raffinose, D-sucrose, trehalose, D-xylose, D-cellobiose, fructose, mannose, and D-ribose. For the growth simulations in different carbon sources, we used the SMM with nicotinamide instead of nicotinic acid as a basis (see Table 3). The concentrations reported in the medium were established as lower bounds for the simulation. The concentrations of the listed carbon sources were calculated to be equivalent in carbon content to the initial 5 g/L of glucose used in the defined SMM.
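The carbon-equivalence calculation can be illustrated with a short Python sketch. It assumes that equivalence means an equal molar amount of carbon atoms per liter, uses a molar mass of 180.156 g/mol for glucose, and relies on the standard carbon counts of the listed sugars. The function name is hypothetical, and the resulting values are concentrations in mmol/L rather than the entries of Table 3.

```python
GLUCOSE_MOLAR_MASS = 180.156  # g/mol, C6H12O6
GLUCOSE_G_PER_L = 5.0         # g/L in the defined SMM (value from the text)

# Number of carbon atoms per molecule.
CARBON_ATOMS = {
    "D-glucose": 6,
    "D-arabinose": 5,
    "maltose": 12,
    "lactose": 12,
    "raffinose": 18,
    "D-sucrose": 12,
    "trehalose": 12,
    "D-xylose": 5,
    "D-cellobiose": 12,
    "fructose": 6,
    "mannose": 6,
    "D-ribose": 5,
}


def carbon_equivalent_mmol_per_l(sugar):
    """Concentration (mmol/L) of `sugar` with the same carbon content as 5 g/L glucose."""
    glucose_mmol = GLUCOSE_G_PER_L / GLUCOSE_MOLAR_MASS * 1000.0
    total_carbon_mmol = glucose_mmol * CARBON_ATOMS["D-glucose"]
    return total_carbon_mmol / CARBON_ATOMS[sugar]
```

Under these assumptions, 5 g/L of glucose corresponds to roughly 27.75 mmol/L, so a disaccharide with twelve carbon atoms is supplied at half that molar concentration and a pentose at six fifths of it.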

Media preparation
The minimal media AAM, AAM-, and SMM were prepared as carbon-source free base media following the methods provided by Machado et al. after omitting glucose as the default carbon source 40. The carbohydrates to replace glucose as alternative carbon sources were dissolved in their respective base medium, and the resulting media were sterile filtered. Carbohydrates were obtained from Carl Roth (D-arabinose, D-glucose, trehalose, lactose, sucrose, raffinose),

The initial CarveMe draft comprised 1,295 reactions, 933 metabolites, and 722 genes, yielding a Metabolic Model Testing (MEMOTE) 15 score of 36 %. Subsequent manual refinement involved the addition of 120 reactions, 118 metabolites, and 63 genes, as illustrated in Figure 2, resulting in an overall MEMOTE score of 88 %. The 63 mass- and charge-imbalanced reactions were reduced to one mass-imbalanced and nine charge-imbalanced reactions, resulting in a MEMOTE mass balance score of 99.7 % and a charge balance score of 99.3 %. Based on literature evidence, we corrected the directionality of 34 enzymatic reactions in the model to ensure proper constraints during model simulations. Moreover, the final metabolic network does not include infeasible energy generating cycles (EGCs) that could inflate the simulation results (see Materials and Methods). We annotated the model instances with cross-references to various databases and additional information to increase the model's interoperability and re-usability. The reaction annotations are divided into three different biological qualifier types: (i) The cross-references to the nine databases are stored under the biological qualifier type BQB_IS.

Figure 2 | Properties of the network reconstructed for S. epidermidis ATCC 12228. (A) The initial draft network consisted of 1,295 reactions, 933 metabolites, and 722 genes. Further refinement and augmentation yielded the final metabolic model, comprising 1,415 reactions, 1,051 metabolites, and 785 genes. (B) To characterize the reactions, Evidence and Conclusion Ontology (ECO) terms were assigned based on the associated GPRs. The terms were allocated according to varying levels of evidentiary support. Notably, the term denoting inference from background scientific knowledge was assigned the lowest evidence level, while the term linked to protein assay evidence received the highest. (C-D) Coverage of Systems Biology Ontology (SBO) terms within the metabolic network before (C) and after (D) utilizing the SBOannotator 19.

Figure 3 illustrates the growth behavior of S. epidermidis in various environments both in silico and in vitro. In minimal media where D-glucose serves as the sole carbon source, S. epidermidis could not exhibit growth. However, in the LB, S. epidermidis demonstrates the ability to utilize alternative carbon sources when glucose is absent. In silico simulations show growth in all tested minimal media, while in vitro experiments reveal no growth in AAM-, a medium lacking L-arginine. Comparative analysis of AAM-, AAM, and SMM highlights the absence of L-arginine in AAM-, a compound crucial for S. epidermidis growth. Prior studies have identified

In AAM-, L-glutamate is not provided as an amino acid, but it can be synthesized from L-proline, an amino acid present in the medium. The biosynthetic pathway is illustrated in Figure 4. All available L-proline is taken up and subsequently metabolized to, amongst others, L-glutamate, L-ornithine, and L-arginine. Each reaction in this pathway is supported by genetic evidence through a gene-reaction rule, commonly known as GPR.

Growth in different carbon sources
In addition to evaluating S. epidermidis's growth behavior in different media, we assessed the utilization of various carbon sources. This involved employing SMM and substituting D-glucose with alternative sugars in amounts adjusted for carbon content. A total of 12 different sugars were subjected to evaluation, as illustrated in Figure 3. Except for cellobiose and D-mannose, S. epidermidis demonstrated the capability to utilize all tested sugars as a carbon source, both through computational simulations (in silico) and laboratory experiments (in vitro). This consistency between model predictions and experimental observations lends robust support to the accuracy of the computational model.

Figure 4 | Biosynthetic pathway of L-arginine via L-glutamate and L-proline. All available L-proline is actively taken up and subsequently metabolized to various products, including L-glutamate, L-ornithine, and ultimately L-arginine. Genetic evidence supporting each reaction is provided in the form of a driving enzyme associated with a gene-reaction rule. The values assigned to these reactions correspond to the flux distribution in AAM-. The graphical representation of the metabolic map was generated using Escher 43.

Evaluation and validation of growth capabilities
To assess the predictive capacity of the model, growth simulations in various media were compared against laboratory experiments. The model's predictions regarding the utilization of diverse carbon sources were cross-referenced with experimental findings.

Table 2 | Testing growth of S. epidermidis in different synthetic minimal media. The values were set as lower bounds for the respective reactions and carry the unit mmol/(g DW • h).

Table 1 | Cross-references to various reaction, metabolite, and gene databases.

Table 3 | Testing of different carbon sources. Twelve different sugars were tested for their potential to serve as a carbon source in S. epidermidis. All values are given in mmol/(g DW • h).