Exploring the metabolic profiling of A. baumannii for antimicrobial development using genome-scale modeling

With the emergence of multidrug-resistant bacteria, the World Health Organization published a catalog of microorganisms in 2017 for which new antibiotics are urgently needed. Within this list, the carbapenem-resistant pathogen Acinetobacter baumannii, belonging to the ESKAPE group, has been granted the “critical” status. Over the years, such isolates have been detected within healthcare units, posing a global threat to upcoming pandemics. One way to facilitate a systemic view of bacterial metabolism and allow the development of new therapeutics based on environmental and genetic alterations is to apply constraint-based modeling on metabolic networks. We developed a versatile workflow to build high-quality and simulation-ready genome-scale metabolic models. We applied our workflow to create a novel metabolic model for A. baumannii and validated its predictive capabilities using experimental nutrient utilization and gene essentiality data. Our analysis showed that our model i ACB23LX could recapitulate cellular metabolic phenotypes observed during in vitro experiments with an accuracy of over 80%, while positive biomass production rates were observed in growth media relevant to A. baumannii . Additionally, we identified putative essential genes with no human counterparts, which could serve as novel antibiotic candidates for the development of future antimicrobial strategies. Finally, we have assembled the first curated collection of available reconstructions for distinct A. baumannii strains and analyzed their growth characteristics. The presented models herein are in a standardized and well-curated format, facilitating their usability, while they can be used to guide the reconstruction of multi-strain networks. Ultimately, they serve as a knowledge base for reliable predictions under various perturbations and the development of effective drugs.


Introduction 1
In the 21st century, treating common bacterial infections has 2 become a global health concern.The rapid emergence of It targets exposed surfaces and mucous tissues, colonizes the human nose 11,12,13 and is closely related to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections 14,15,16,17 .The skin has shown to be a community reservoir for A. baumannii in a very small percentage of samples 18,19 , while its prevalence in the soil is a frequent misconception as species from the genus Acinetobacter are ubiquitous in nature 6,20 .Finally, it shows susceptibility to commonly used drugs, like β -lactams, aminoglycosides, and polymyxins.
Systems biology, and especially the field of genome-scale metabolic network analysis, is the key to exploring genotypephenotype relationships, better understanding mechanisms of action of such threatening pathogens, and ultimately developing novel therapeutic strategies.Genome-scale metabolic models (GEMs) combined with constraint-based modeling (CBM) provide a well-established, fast, and inexpensive in silico framework to systematically assess an organism's cellular metabolic capabilities under varying conditions having only its annotated genomic sequence 21 .As of today, they have numerous applications in metabolic engineering, leading to the formulation of novel hypotheses driving the detection of new potential pharmacological targets 22 .
It has been more than a decade since the release of the first mathematical simulation of A. baumannii metabolism.Kim et al. integrate biological and literature data to manually build AbyMBEL891, representing the strain AYE 23 .This model was further employed as an essential foundation for future reconstructions; however, its non-standardized and missing identifiers limited its use.Following a tremendous increase in the amount of literature and experimental data regarding A. baumannii (over 5,670 articles published between 2010 and 2017 according to PubMed), two novel strain-specific metabolic networks arose, iLP844 24 and the AGORA (Assembly of Gut Organisms through Reconstruction and Analysis) model 25 .
Both models were reconstructed in a semi-automated process and simulated the metabolism of two distinct strains: ATCC 19606 and AB0057, respectively.With the help of transcriptomic data of sampled colistin responses and iLP844, it was observed that the type strain ATCC 19606 underwent metabolic reprogramming, demonstrating a stress condition as a resistance mechanism against colistin exposure.Alterations in gene essentiality phenotypes between treated and untreated conditions enabled the discovery of putative antimicrobial targets and biomarkers.Moreover, the model for AB0057 was part of an extensive resource of GEMs built to elucidate the impact of microbial communities on host metabolism.
The amount of mass-and charge-balanced reactions in these models is very high; however, they carry few to no database references.Norsigian et al. improved and expanded AbyM-BEL891 to finally create the high-quality model iCN718 that exhibited a prediction accuracy of over 80% in experimentally data 26 , while Zhu et al. built a GEM for ATCC 19606 (iATCC19606) integrating multi-omics data 27 .Compared to iLP844, iATCC19606 incorporates metabolomics data to-gether with transcriptomic data enabling the deciphering of 94 bactericidal activity upon polymyxin treatment and the inter-95 play of various metabolic pathways.Last but not least, in 96 2020, the first in vivo study on A. baumannii infection was 97 published utilizing constraint-based modeling 28 .This time, 98 the collection of strain-specific models was enriched with the 99 first GEM for the hyper-virulent strain AB5075 (iAB5075).100  The model was validated using various experimental data, 101 while transcriptomics data was leveraged to identify critical 102 fluxes leading to mouse bloodstream infections.Our literature 103 search revealed one last metabolic model of A. baumannii 104 ATCC 17978, named iJS784, which, by the time of writing, 105 has not been officially published in a scientific journal or 106 been deposited in a mathematical models database.Instead, 107 it is currently available solely in the form of a dissertation 29 .108 Nonetheless, the model cannot produce biomass even when 109 all uptake reactions are open and all medium nutrients are 110 available to the cell, making it unusable and hampering repro-111 ducibility.

112
We expanded this collection by building a novel GEM for 113 the nosocomial strain ATCC 17978, named iACB23LX.The 114 presented model follows the FAIR data principles and commu-115 nity standards and recapitulates experimentally-derived pheno-116 types with high predictive capability and accuracy scores.We 117 enriched the model with numerous database cross-references, 118 and we computationally inferred the minimal nutritional re-119 quirements.Moreover, we used this model to investigate the 120 organism's growth ability in experimentally defined media and 121 within the human nose while we assessed its ability to predict 122 essential genes using two different optimization approaches.123 Among the discovered strains, ATCC 17978 is one of the most 124 well-studied, with a substantial amount of experimental data 125 available that can be used to direct model refinement and vali-126 dation.Besides that, we systematically refined and evaluated 127 all pre-existing reconstructions' performance to finally create 128 the first compendium of curated and standardized models for 129 A. baumannii.With this, we aim to promote further studies to 130 give new insights into this pathogen and promote strain-and 131 species-specific therapeutic approaches.

Reconstruction process of the novel metabolic network 134 iACB23LX 135
To build a high-quality model for A. baumannii ATCC 17978, 136 we developed a workflow shown in Figure 1 following the 137 community standards 30 (see Materials and Methods for de-138 tailed description).We named the newly reconstructed net-139 work iACB23LX, where i stands for in silico, ACB is the 140 organism-and strain-specific three-letter code from the Ky-141 oto Encyclopedia of Genes and Genomes (KEGG) database, 142 22 the year of reconstruction, and LX the authors' initials.143 Our protocol involves eight major stages starting from the 144 attainment of the annotated genomic sequence until the model 145 validation, applies to any organism from the tree of life (Ar-146 chaea, Bacteria, and Eukarya), and ensures the good quality 147  and correctness of the final model.CarveMe 31 was used to 148 build a preliminary model, which was subsequently extended 149 and curated manually.We resolved syntactical issues and while its stoichiometric consistency lies at 100% and contains no unconserved metabolites.Over 1,800 reactions have a gene-protein-reaction associations (GPR) assigned, while 149 are catalyzed by enzyme complexes (GPR contains at least two genes connected via a logical AND).
Furthermore, we tested our model for EGCs to prevent having thermodynamically infeasible internal loops that bias the final predictions 32 .We defined energy dissipation reactions (EDRs) for 15 energy metabolites and evaluated their production with FBA after disabling all nutrients from entering the system (see Materials and Methods).In our final model iACB23LX, none of the tested metabolites could be produced; thus, no EGCs are contained.As shown in Figure 1, a plethora of database cross-references was embedded in the model, while Systems Biology Ontology (SBO) Terms were defined for every reaction, metabolite, and gene (Figure 3) 34 .Additionally, each reaction was mapped to an Evidence and Conclusion Ontology (ECO) Term representing the confidence level and the assertion method.
To assess the model quality, Metabolic Model Testing (MEMOTE) 36 and the SBML Validator from the libSBML 37 were used.Our metabolic network, iACB23LX, has a MEMOTE score of 89%, and all syntactical errors are resolved.Our

Nantia Leonidou
A. baumannii presented in this publication.The ordinate represents the MEMOTE scores.The reconstruction process is divided into manual (M, no computational tool was used to reconstruct and refine the model) and semi-automated (S, draft obtained via an automated reconstruction tool, while further extension was done manually) and is written together with the publication year.Also, the respective strains annotate the abscissa labels.Our new model exhibits the highest quality score and is more comprehensive and complete than the preceding reconstructions.
model undoubtedly exhibits the highest quality score among its predecessors (Figure 2).It must be noted here that the MEMOTE testing algorithm considers only the parent nodes of the SBO directed acyclic graph and not their respective children.Assigning more representative SBO Terms (Figure 3) does not increase the final score but reduces it by 2%.
The model is available as a supplementary file in the latest SBML Level 3 Version 2 38 and JSON formats with the flux balance constraints (fbc) and groups plugins available.

Prediction of bacterial growth on various nutritional environments
Constraint-based modeling approaches such as FBA estimate flux rates with which metabolites flow through the metabolic network and predict cellular phenotypes for multiple growth scenarios.A. baumannii is known to be strictly aerobic and, compared to the majority of Acinetobacter species, it is not considered ubiquitous in nature.As a nosocomial pathogen, it has been mostly detected in hospital environments, particularly in the ICUs, and within the human nasal microbiota 11,12,13 .We examined various growth conditions to ensure that our model iACB23LX recapitulates these already known and fundamental phenotypes.
First, we tested our model's capability to simulate a strictly aerobic growth.For this purpose, we examined the directionality of all active oxygen-producing and -consuming reactions when the oxygen uptake was disabled (Figure 4).We observed an accumulation of periplasmic oxygen by reactions that carried remarkably high fluxes and resulted in growth when the oxygen import was turned off.We examined each reaction individually and removed those without evidence to correct this.More specifically, we removed the periplasmic The tested media are the computationally self-defined minimal medium (iMinMed), the Luria-Bertani (LB) medium, the M9 minimal medium, and the synthetic nasal medium (SNM).All flux rates are given in mmol/(gDW • h).The simulated media are available in the supplementary file S1.

iMinMed
LB M9 SNM lyahoo 0.1677 0.5926 0.3735 0.5099 and the M9 medium (supplementary file S1) as a reference.Minimal growth media typically consist of carbon, nitrogen, phosphate, and sulfate sources, as well multiple inorganic salts and transition metals.These metals are crucial for the growth and survival of all three domains of life; however, they can be transformed into toxic compounds in hyper-availability 41 .
The exact composition of our minimal medium (iMinMed) is shown in Table 1 (and in the supplementary file S1) and is already defined in the final model.It comprises nine transition metals, D-glucose as the carbon source, ammonium as a nitrogen source, sulfate as a sulfur source, and phosphate as a phosphate source.Oxygen is also a vital component, as A. baumannii is known to be a strictly aerobic bacterium.Previous studies have highlighted the importance of the nutrient metals for A. baumannii to survive within the host.More specifically, the bacterium utilizes these metals as co-factors for vital cellular processes 42 .Manganese and zinc have also been studied as essential determinants of host defense against A. baumannii-acquired pneumonia through their sequestering by calprotectin via a type of bonding called chelation 43 .
With iMinMed, the optimization of the growth function re-sulted in a flux rate of 0.1677 mmol/(gDW • h).Like most curation, the first draft model was reconstructed using the automated tool CarveMe 31 .This may result in the incorrect inclusion of transport reactions, which were consequently removed to reduce the number of false positive predictions.In both cases, our main objective was to improve the accuracy while keeping the number of orphan and dead-end metabolites low and removing only reactions with no gene evidence (lack of assigned GPR).Similarly, missing reactions were identified and included in the network to eliminate the false negative predictions.For instance, in accordance with the phenotypic data, the strain ATCC 17978 should not be able to grow when utilizing D-trehalose as the sole carbon source.
Our model initially predicted a growth phenotype for this carbon source.To overcome this conflict, we deleted the reaction TREP with no organism-specific gene evidence, meaning no GPR was assigned.However, it was not feasible to resolve all inconsistencies since adding transport reactions to resolve false positives or false negatives in the nitrogen testings led to more false predictions in the carbon sources.More specifically, when adenosine, inosine, L-homoserine, and uridine are utilized as sole carbon sources, the model should not predict growth, while sole nitrogen sources should result in a nonzero objective value.In this case, adding transporters would resolve false predictions in the nitrogen tests, while it would have induced more false predictions in the carbon sources tests.All in all, iACB23LX exhibited an overall accuracy 86.3% for the carbon and 79.2% for the nitrogen sources test (Figure 5c); however, after further curation, the accuracy was remarkably improved and reached 87.5%.By adding their corresponding transport reactions, we resolved discrepancies regarding uridine, inosine, adenosine, and L-homoserine.
Our model was able to catabolize 49 sole carbon and 40 sole nitrogen sources (see Figure 5a) and Figure 5b), recapitulating totally 69 and 38 experimentally-derived phenotypes, respectively.
We further assessed the ability of our model to predict known gene essentialities.First, 1, 164 in silico single gene deletions were conducted on both LB and rich growth media, respectively, to identify all lethal gene deletions.Subsequently, the ratio between the growth rate after and before the respective knockouts (FC gr ) was calculated, and the genes were accordingly classified (see Materials and Methods).For the optimization, two mathematics-based approaches from the Constraints-Based Reconstruction and Analysis for Python (COBRApy) 50 package were deployed: the FBA 51 and the Minimization of Metabolic Adjustment (MOMA) 52 .Between the two methods, a similar distribution of the FC gr values was observed (Figure 6a and Figure 6b).Using FBA, 97, 75, and 991 genes were predicted to be essential, partiallyessential, and inessential on the LB medium, respectively, whereas optimization with MOMA resulted in 110, 85, and 968 genes (Figure 6c and supplementary files S2 and S3).These genes were primarily associated with the biosynthesis of cofactors and vitamins, the amino acid/nucleotide metabolism, the energy metabolism, and the metabolism of  terpenoids and polyketides.Additionally, we examined in more detail how nutrition availability impacts the gene essentiality by conducting single-gene knockouts in the rich medium.Both optimization methods resulted in more essential genes when the model was required to alter its metabolic behavior due to the absence of nutrients, i.e., with the LB growth medium, compared to the rich medium (Figure 6c and supplementary files S2 and S3).In general, FBA detected more genes to be dispensable for growth in both nutritional environments.On the other hand, MOMA classified more genes as essential or partially-essential ( Figure 6c and supplementary files S2 and S3), while genes from FBA build a subset of the essential genes derived by MOMA.Furthermore, we validated the prediction accuracy of iACB23LX using already existing gene essentiality data.At the time of writing, the transposon mutant library by Wang et al. is the only ATCC 17978specific experimental dataset 49 .With this dataset together and the LB medium, our model demonstrated an accuracy of 87% with both optimization methods (Figure 7).We further analyzed the predicted false negative genes and probed their proteomes to investigate the existence of human orthologs (see Materials and Methods).With this, we aimed to eliminate cross-linkings to human-similar proteins since metabolic pathways or enzymes that are missing from the human host have been an important resource of druggable targets against infectious diseases 53 .From the 37 genes that our model predicted to be essential (with FBA and MOMA) contradicting the experimental results, 17 were found to be non-homologous (see Supplementary File S7 plays an important role in the growth of different microbes, especially due to its photosynthesizing property that marks it as a non-invasive and safe therapeutic strategy against bacterial infections 55 .Lastly, the phosphogluconate dehydratase catalyzes the dehydration of 6-Phospho-D-gluconate to KDPG, the precursor of pyruvate and 3-Phospho-D-glycerate 56 .This enzyme is part of the Entner-Doudoroff pathway that catabolizes glucose to pyruvate, similarly to glycolysis, but using a different set of enzymes 57 .
We further assessed the druggability of our essential nonhomologous proteins and investigated the existence of inhibitors or compounds known to interact with the enzymes.
For this, we used the online DrugBank database that contains detailed information on various drugs and drug targets 58 .In  S7 lists all non-homologous essential genes reported for iACB23LX.
Overall, iACB23LX exhibits high agreement to all validation tests and can, therefore, be used to systematically derive associations between genotypes and phenotypes.paving new ways towards its update and refinement 48,49,59,60 .463 Since then, a variety of GEMs was developed aiming at the 464 empowering of drug development strategies and the enforce-465 ment of metabolic engineering by formulating novel and reli-466 able hypotheses (Table 3).However, the amount and format of 467 information contained are inconsistent, with some being syn-468 tactically invalid or of older formats.Here, we systematically 469 analyzed the quality of all seven currently existing GEMs, 470 reporting their strengths and weaknesses and debugging them 471 to finally build a curated, standardized, and updated collec-472 tion.To do so, we developed a workflow with curation steps 473 applicable to all models aiming at the standardization and us-474 ability of published GEMs by the community (Figure 8).This 475 closely follows the community-driven workflow published 476 by Carey et al. for the reconstruction of reusable and trans-477 latable models 30 .The curation procedure includes a series 478 of stages aiming at modifying data format, data amount, and 479 information quality.It is important to note that no contextual 480 modifications were conducted that could affect the model's 481 prediction capabilities (see Materials and Methods).

482
Five A. baumannii strains have been created throughout the 483 years, with AYE and ATCC 19606 having two reconstructions 484 each.All models are publicly stored and can be downloaded 485 either from a database/repository (BioModels, Virtual Meta-486 bolic Human (VMH) 61 , BiGG 62 , and GitHub) or directly 487 from the publication's supplementary material.The use of 488 distinct identifiers prevents the metabolic networks from be-489 ing compared to each other.More specifically, iLP844 and 490 Table 3 | List of genome-scale metabolic models curated for A. baumannii, along with information relevant to the manual refinement.Default growth rates (i.e., model simulated as downloaded), the cellular compartments (C: cytosol, E: extracellular space, P: periplasm, and ER: endoplasmic reticulum), and the reactions and metabolites identifiers are listed in the table.MEMOTE scores before and after manual curation are given in the last column.Blue highlights the novel reconstruction for the strain ATCC 17978 presented in this publication.After manual curation, our model developed following our workflow in Figure 1 has the highest quality score and comes along with a minimal medium defined.9).The majority resulted in a biomass flux of 0.0 mmol/(gDW • h) with the M9 medium, while the AGORA model could not simulate growth in the LB and SNM as well.Thus, we investigated and identified minimal medium supplementations needed to enable cellular biomass production.As already mentioned, iJS784 was excluded from further analysis (Table 3), together with AbyMBEL891 that debilitated the analysis due to its non-standardized identifiers and its missing genes and GPRs.When the medium of iATCC1906 and iLP844 was supplemented with D-alanine and glycyl-L-glutamate, respectively, their biomass reactions carried a positive flux rate of 0.4989 mmol/(gDW • h) and 0.7357 mmol/(gDW • h).Supplementation of meso-2,6-diaminoheptanedioate, menaquinone-8, niacinamide, heme, siroheme, and spermidine into the LB medium of the AGORA model resulted in a positive growth rate (1.9430 mmol/(gDW • h)).Similarly, when supplementing the SNM with glycyl-L-asparagine, the derived growth rate was 1.5020 mmol/(gDW • h), while the M9 medium needed to be extended with glycyl-L-asparagine and thiamine (resulted growth rate: 0.5844 mmol/(gDW • h)).Lastly, like with iACB23LX, the LB medium, together with FBA and MOMA, were applied to detect lethal genes in all models (see Supplementary Files S5 and S6).Despite significant efforts, we could not derive a mapping scheme between the strain-specific gene identifiers of iLP844 and iATCC1906 to resolve PROKKA or HMPREF identifiers.Thus, a strainwise comparison of essential genes could be feasible only for the strain ATCC 17978.As already mentioned, iJS784 simulated continuously zero growth and was excluded from the analysis.Consequently, we examined which genes were necessary for growth among the remaining models across three different strains: AYE (iCN718), ATCC 17978 (iACB23LX), and AB0057 (AGORA).Totally, 392 genes were identified as essential, while 34 occurred in all three strains.For instance, when the genes encoding for dephospho-coenzyme A (CoA) kinase, phosphopantetheinyl transferase, shikimate kinase (A1S_3190), or chorismate synthase (A1S_1694) were together with the fact that it was detected to be vital for growth 551 across three distinct strains, increases its potential to be a drug to be essential, while 97 of them were also reported by FBA to impair the growth.Generally, after enriching the nutritional input with all available compounds (rich medium), less lethal genes resulted, meaning that A. baumannii undergoes metabolic alterations when nutrients are lacking.Our in silico results compared to the strain-specific gene essentiality data 49 resulted in 87% overall accuracy, which is remarkably higher than all GEMs built for A. baumannii (e.g., 80.22% for iCN718 and 72% for iLP844), except iAB5075 which performed comparably.Subsequently, we examined more carefully our false negative predictions and searched for putative drug targets that could be employed for future therapeutics.

Availability
More specifically, we focused on genes found to be essential for growth and encode proteins with no human counterparts (see Supplementary Table S7).Our study highlighted the EPSP and chorismate synthases from the shikimate pathway as prominent target candidates with no correlation to the human proteome.Several knock-out studies have highlighted the significance of enzymes from the shikimate metabolism as potential targets against infections caused by threatening microorganisms, e.g., Mycobacterium tuberculosis 70 , Plasmodium falciparum 71 , and Yersinia enterocolitica 72 .Umland et al. identify these two genes products as essential in an in vivo study using a clinical isolate of A. baumannii and a rat abscess infection model 73 .This increases the confidence of our results and indicates that novel genes found to be essential in silico should be considered as potential antimicrobial targets.Similarly, numerous studies have suggested one of our further candidates, riboflavin, as a potential antimicrobial agent 55 , while the Entner-Doudoroff pathway (in which our candidate targets phosphogluconate dehydratase and KDPG aldolase act to produce pyruvate) is similar to the glycolysis but with different member enzymes, has been firstly discovered in Pseudomonas saccharophila 57 and later in E. coli 74 .Meanwhile, it is vital for the survival of further pathogenic microorganisms, like Neisseria gonorrhoeae, Klebsiella pneumoniae, and Pseudomonas aeruginosa 75,76,77 .However, they have not yet been examined in the context of Acinetobacter species and could be a source of antimicrobial therapeutic strategies.Hence, these biosynthetic routes could be a valuable resource for targets to fight bacterial infectious diseases.Finally, we investigated the druggability of our essential nonhomologous genes.We searched the DrugBank database to find compounds known to inhibit these genes and that are already approved by the Food and Drug Administration (FDA).Our analysis resulted in only drugs that have been found to interact with the gene product of interest; however their pharmacological action is yet unknown.We further probed the hypothetical and putative non-homologous genes against the DrugBank's sequence database to find homologous proteins and determine their activity.Also in this case, the resulted drugs were listed with still undetermined pharmacological action.These putative and yet unexplored targets with inhibitory potential are of great interest in the context of developing novel classes of antibiotics.
Overall, our model reached a MEMOTE score of 89%, which is the highest score reported for this organism.
Moreover, we improved and assessed all previously published models and created the first curated collection of metabolic networks for A. baumannii.We created a debugging workflow consisting of four major steps to systematically analyze and curate constraint-based models focusing on their standardization and the FAIR data principles.We applied this workflow and curated a total of seven metabolic models for A. baumannii.In addition, most of the models simulated growth rates by default that were unrealistic when compared to the fastest growing organism (V.natriegens) 36 .Hence, we determined the minimal number of components needed for these models to result in non-inflated biomass production rates.The defined minimal media were mostly composed of metal ions (e.g., cobalt, iron, magnesium) that are essential for bacterial growth.For the model iJS784, the minimization process was infeasible; thus, the model was not considered for further analysis.We also examined the growth ability of these models in three media (SNM, LB, and M9) and compared them to our model, iACB23LX.When the models simulated a zero flux through the biomass reaction, we continued by detecting the minimal amount of metabolites supplemented in the medium that resulted in a non-zero growth rate.These would enable the detection of gaps and assist in future improvement of the models.It is important to note here that with this curation, we opted for a systematical assessment of the previously reconstructed models and the detection of their assets and liabilities.Consequently, we did not undertake any contextual modification that could alter the models' predictive capabilities.Finally, we in silico detected lethal genes among comparable and simulatable models of A. baumannii.
Our analysis incorporated three strains of A. baumannii (AYE, AB0057, and ATCC 17978), and we examined the effect of genetic variation across strains in the gene essentiality.Our analysis highlighted once again the shikimate pathway, as well as the purine metabolism, the pantothenate, and CoA biosynthesis, and the amino acid metabolism as candidate routes to consider for future new classes of antibacterial drugs with potential effect across multiple A. baumannii strains.The curated models, together with our novel model, would benefit the future prediction of candidate lethal genes by reducing the considerable resources needed for classical whole-genome essentiality screenings.All in all, this collection of simulationready models will forward the selection of a suitable metabolic network based on individual research questions and help define the entire species and new hypothesis.
Our new metabolic reconstruction and the curated collection of further strain-specific models will guide the formulation of ground-breaking and reliable model-driven hypotheses about this pathogen and help examine the diversity in the metabolic behavior of different A. baumannii species in response to genetic and environmental alterations.Additionally, they can be utilized to detect critical pathways related to responses against multiple antibiotic treatments.This will ultimately strengthen the development of advanced precision antimi-711 crobial control strategies against multidrug-resistant (MDR) 712 A. baumannii strains.

713
Taken together, our workflows and models can be employed 714 to expand this collection further with additional standardized 715 strain-specific metabolic reconstructions to finally define the 716 core and pan metabolic capabilities of A. baumannii.

718
The metabolic model reconstruction workflow 719 Figure 1 illustrates the workflow we developed to cre-720 ate the novel high-quality genome-scale metabolic network 721 iACB23LX, following the state-of-the-art protocol of Thiele 722 and Palsson 21 .Our workflow consists of eight major steps 723 starting from the extraction of an annotated genome until 724 the model validation using experimental data.Modifications 725 in the model structure, as well as the inclusion of cross-726 references to multiple functional databases, were done using 727 the libSBML 37 library, while all simulations were conducted 728 via the COBRApy-0.22.1 50 suite that includes functions com-729 monly used for simulations.

730
The individual steps are described below in more detail 731 with respect to the reconstruction of iACB23LX.

733
A first draft model was built with CarveMe 1.5.1 using the an-734 notated genome sequence of the strain ATCC 17978.This was 735 downloaded from the National Centre for Biotechnology In-736 formation (NCBI) at https://www.ncbi.nlm.nih.gov and 737 has the assembly accession number ASM1542v1 78 .Seven 738 strain-specific assemblies are registered in NCBI; however, 739 the chosen entry is also present in the KEGG database facili-740 tating the model extension.The genome is 3.9 Mio bp long 741 and has two plasmids (pAB1 and pAB2).We set the SBML 742 flavor to activate the SBML Level 3 extension for fbc ver-743 sion 2 that allows semantic descriptions for domain-specific 744 elements such as metabolite chemical formulae and charges 745 together with reaction boundaries and GPRs.Moreover, we 746 the CarveMe parameter gramneg to employ the specialized 747 template for the Gram-negative bacteria.Compared to the 748 Gram-positive template, the Gram-negative template comes 749 with phosphatidylethanolamines, murein, and a lipopolysac-750 charide unit.Its biomass reactions involve membrane and cell 751 wall components resulting in more accurate gene essentiality 752 predictions in the lipid biosynthesis pathways.

754
We started the manual refinement of the draft model by re-755 solving syntactical errors within the model file using SBML 756 Validator from the libSBML library 37 .Missing metabo-757 lite charges and chemical formulas were retrieved from the 758 BiGG 62 and ChEBI 79 databases, while mass-and charge-759 imbalanced reactions were corrected.The most intense part of 760 the workflow is the manual network extension and gap-filling.761 this was done using the organism-specific databases KEGG 64 762 and BioCyc 80 , together with ModelSEED 63 .We mapped the new gene locus tags to the old ones using the GenBank General Feature Format (GFF) 81 and added missing metabolic genes along with the respective reactions and metabolites into our model.The network's connectivity was ensured by resolving as many dead-ends (can only be produced but not consumed) and orphan (can only be consumed but not produced) metabolites as possible.Also, reactions with no connectivity were not included in the model, while reactions with no organism-specific gene evidence were removed from the model.

Erroneous energy generating cycles
Energy generating cycles (EGC) are thermodynamically infeasible loops found in metabolic networks and have not been experimentally observed, unlike futile cycles.EGCs charge energy metabolites like adenosine triphosphate (ATP) and uridine triphosphate (UTP) without any external source of nutrients and may result in incorrect and unrealistic energy increases.Their elimination is crucial while correcting the energy metabolism since they can inflate the maximal biomass yields and make the predictions unreliable.We checked their existence in iACB23LX applying an algorithm developed by Fritzemeier et al. 32 .
We created a Python script that (1) defines and adds energy dissipation reactions (EDRs) in the network: where X is the metabolite of interest and (2) maximizes each EDR while blocking all influxes.This can be formulated as follows: max(v edr ) subject to where edr is the index of the current dissipation reaction, S is the stoichiometric matrix, v the flux vector, E the set of all exchange reactions, and v min In the case of existing EGCs, we examined the directionality and the gene evidence of all participated reactions using the BioCyc organism-specific information as reference 80 .

Database annotations 803
In this stage, the model was enriched with cross-linkings to 804 various functional databases.Reactions and metabolites were 805 annotated with databases (e.g., KEGG 64 , BRENDA 82 , and 806 UniProt 83 ).These were included in the model as controlled 807 vocabulary (CV) Terms following the Minimal Information 808 Required In the Annotation of Models (MIRIAM) guide-809 lines 84 and the resolution service at https://identifiers. 810 org/.We used ModelPolisher 85 to complete the missing 811 available metadata for all metabolites and genes.Similarly, 812 metabolic genes were annotated with their KEGG, NCBI Pro-813 tein, and RefSeq identifiers using the GFF 81 .To reactions, 814 metabolites, and genes SBO Terms were assigned using the 815 SBOannotator 34 .SBO Terms provide unambiguous seman-816 tic information and specify the type or role of the individual 817 model component 86 .In addition, ECO Terms were added to 818 every reaction to capture the type of evidence of biological as-819 sertions with BQB_IS_DESCRIBED_BY as a biological qualifier.820 They are useful during quality control and mirror the curator's 821 confidence about the inclusion of a reaction.When multiple 822 genes encode a single reaction, an ECO Term was added for 823 every participant gene.Both terms were incorporated into the 824 model according to our mapping in Figure 3.

825
Finally, reactions were annotated with the associated sub-826 systems in which they participate using the KEGG database 827 and the biological qualifier BQB_OCCURS_IN.Moreover, the 828 "groups" plugin was activated 38 .Every reaction that ap-829 peared in a given pathway was added as a groups:member, 830 while each pathway was created as a group instance with 831 sboTerm="SBO:0000633" and groups:kind="partonomy".

833
MEMOTE 36 version 0.13.0 was used to assess and track the qual-834 ity of our model after each modification, providing us with in-835 formation regarding the model improvement.The final model 836 was converted into the latest SBML Level 3 Version 2 38 for-837 mat using the libSBML package, while the SBML Validator 838 tracked syntactical errors and ensured a valid format of the 839 final model 37 .

Constraint-based analysis 841
The most frequently used constraint-based modeling approach is the FBA that determines a flux distribution via optimization of the objective function and linear programming 51 .Prior to this, the metabolic network is mathematically encoded using the stoichiometric matrix S formalism.This structure delineates the connectivity of the network, and it is formed by the stoichiometric coefficients of all participating biochemical reactions.The rows and columns are represented by the metabolites and the mass-and charge-balanced reactions respectively.At steady state, the system of linear equations derived from the network is defined as follows: with S being the stoichiometric matrix and ⃗ v the flux vector.With no defined constraints, the flux distribution may be deter- mined at any point within the solution space.This space must be further restricted since the system is under-determined and algebraically insoluble.An allowable solution space is defined by a series of imposed constraints that are followed by cellular functions.Altogether the FBA maximization problem, with mass balance, thermodynamic, and capacity constraints, is defined as: Here, n is the amount of reactions, Z represents the linear of our model in the human nasal niche, as A. baumannii have been isolated from nasal samples within ICUs 11, 12,13 .For this purpose, we utilized the SNM that imitates the human nasal habitat 87 .In all cases, if macromolecules or mixtures were present, we considered the constitutive molecular components for the medium definition.As our model was initially unable to reproduce growth on the applied media, we deployed the gap-filling option from CarveMe to detect missing reactions and gaps in the network 31 .All simulated media are available in S1.

Rich medium definition
To investigate our model's growth rate when all nutrients are available to the bacterial cell, we defined the rich medium.
For this purpose, we enabled the uptake of all extracellular metabolites by the model setting the lower bound of their exchange reactions to −10 mmol/(gDW • h).

Evaluation of carbon and nitrogen utilization
We employed the previously published Biolog Phenotypic Array data by Farrugia et al. for A. baumannii ATCC 17978 to validate the functionality of our model 48 .According to the experimental guidelines provided by Farrugia et al., we utilized the M9 minimal medium for all simulations.The medium was then supplemented with D-xylose as a carbon source for the nitrogen testings, while ammonium served as the only nitrogen source for the carbon tests.As D-xylose was initially not part of the model, we conducted an extensive search in the organism-specific databases KEGG 64 and BioCyc 80 to include missing reactions.
The phenotypes were grouped by their maximal kinetic curve height.A trait was considered positive ("growth") if the height exceeded the 115 and 101 OmniLog units for a nitrogen and carbon source, respectively.The prediction accuracy was evaluated by comparing the in silico-derived phenotypes to the Biolog results.More specifically, the overall model's accuracy (ACC) was calculated by the overall agreement: where true positive (TP) and true negative (TN) are correct predictions, while false positive (FP) and false negative (FN) are inconsistent predictions.Discrepancies were resolved via iterative manual curation of the model.

Gene perturbation analysis
We performed in silico single-gene deletions on iACB23LX to detect essential genes.For this purpose, we utilized the single_gene_deletion function from the COBRApy 50 package.A gene is considered to be essential if a flux of 0.0 mmol/(gDW • h) was observed through the biomass reaction after setting the lower and upper bounds of the associated reaction(s) to 0.0 mmol/(gDW • h).
Additionally, we examined the effect of gene deletions using two different optimization approaches: FBA 51 and MOMA 52 .Contrary to FBA, MOMA is based on quadratic 924 programming, and the involved optimization problem is the 925 Euclidean distance minimization in flux space.Moreover, 926 it approximates the metabolic phenotype and relaxes the as-927 sumption of optimal growth flux for gene deletions 52 .

928
The results were compared to the ATCC 17978-specific gene essentiality dataset from 2014 49 .Wang et al. generated a random mutagenesis dataset including 15, 000 unique transposon mutants using insertion sequencing (INSeq) 49 .By the time of writing, this is the only A. baumannii ATCC 17978 library presenting gene essentiality information.Analogously to the experimental settings, the nutrient uptake constraints were set to the LB medium.From the 453 genes identified as essential by Wang et al., 191 could be compared to our predictions.The rest were not part of iACB23LX due to their non-metabolic functions.To measure the effect of a single deletion, we calculated the fold change (FC) between the model's doubling time after (gr KO ) and before (gr W T ) a single knockout.This is formulated as follows: To this end, if FC gr = 0, the deleted gene is classified as  This can be applied to any metabolic network in SBML format 958 and follows the community "gold standards" strictly as pro-959 posed by Carey et al. 30 .The curation steps involved changes in the format, amount, and quality of the included information.The context has not been altered in any way that could impact the models' prediction capabilities.We employed a combination of already existing tools to analyze, simulate, and quality-control the models (COBRApy 50 , MEMOTE 36 , and the SBML Validator 37 ).Different database cross-references were incorporated in the models using ModelPolisher 85 and following the MIRIAM guidelines 84 , while the libSBML library 37 was used to manipulate the file format and convert to the latest version.To resolve inflated growth rates, we determined computationally-defined minimal growth media.The growth capabilities were examined with respect to various experimentally-derived growth media, while the LB medium was applied to identify lethal genes.A strain-wise comparison was not feasible due to strain-specific identifiers, no successful growth, or missing genes.Hence, we investigated the essential genes across all models with identifiers that could not be mapped with the Pathosystems Resource Integration Center (PATRIC) ID mapping tool 89 .
To begin with the debugging, we examined the syntactical correctness and internal consistency of the downloaded files using the SBML Validator from the libSBML library 37 .Two models (iCN718 and iJS784) could not pass the validator check and reported errors since they were not in a valid SBML format right after their attainment.We made iCN718 valid by deleting the reaction DNADRAIN for which neither a reactant nor a product was assigned since the associated metabolite was not part of the model.Similarly, the empty groups attribute was removed from iJS784, converting the file into a valid format.Warnings were detected for iATCC1906, and iAB5075 due to missing definition of the fbc extension (became available at the latest Level 3 release 90 ) and the non-alphanumeric chemical formulas.We resolved these issues by defining the fbc list listOfGeneProducts and the species attribute chemicalFormula.In more detail, we extracted the given GPR from the notes field and defined individual geneProduct classes with id, name, and label.The attribute chemicalFormula was set equal to the species chemical formulas extracted from the notes and is particularly essential in reaction's validation and balancing.Following the SBML specifications regarding its constitution, in case of ambiguous formulas separated by a semicolon (;), the first molecular representation was chosen.With this, the genes and metabolites' chemical formulas became part of the file's main structure.Since iATCC1906 carried KEGG identifiers, we could extract the metabolites' chemical formulas from the database and add them to the model.Moving on with the file extension, we declared the remaining missing attributes from reactions, metabolites, and genes that are required according to the SBML language guidelines 38 .More specifically, we defined the metaid attribute when missing, while we fixed any errors regarding the identifiers nomenclature.Further extension involved the annotation of reactions, metabolites, and genes with a plethora of database cross-references following the MIRIAM guidelines 84 .For this, we employed Model-Polisher that complements and annotates SBML models with 1016 additional metadata using the BiGG Models knowledgebase 1017 as reference 85 .We also defined precise SBO Terms with the 1018 sboTerm attribute using the SBOannotator 34

Figure 1 |
Figure 1 | Workflow developed for the metabolic network reconstruction of iACB23LX.The created workflow consists of eight main steps: extraction of the annotated genome, draft model reconstruction, model refinement, gap-filling, investigation of energy generating cycles (EGCs), model annotation, quality control and quality assurance (QC/QA), and model validation using experimental data.The growth simulations include the examination of growth requirements and the definition of a minimal growth medium.The last six processes are continuously iterated until the model is of good quality and recapitulates known phenotypes.

Figure 2 |
Figure2| Properties of all metabolic networks of A. baumannii.Blue highlights the novel reconstruction for the strain ATCC 17978 presented in this publication.The ordinate represents the MEMOTE scores.The reconstruction process is divided into manual (M, no computational tool was used to reconstruct and refine the model) and semi-automated (S, draft obtained via an automated reconstruction tool, while further extension was done manually) and is written together with the publication year.Also, the respective strains annotate the abscissa labels.Our new model exhibits the highest quality score and is more comprehensive and complete than the preceding reconstructions.

Figure 5 |Figure 6 |
Figure5| Model predictions compared to the Biolog experimental measurements for various carbon and nitrogen sources.From the Biolog data, only substances mappable to model metabolites were included, while the M9 minimal medium was applied.a) and b) The model's ability to catabolize various carbon and nitrogen sources was assessed using the strain-specific phenotypic data by Farrugia et al.48 .Blue indicates no growth, and orange indicates growth.Totally, 80 and 48 compounds were tested as sole carbon and nitrogen sources, respectively.Out of these, 69 and 38 phenotypes were recapitulated successfully by iACB23LX.c) Confusion matrices of model predictions and Biolog experimental measurements.The overall accuracy of iACB23LX is 86.3% for the carbon (left matrix) and 79.2% for the nitrogen (right matrix) testings.Orange represents correct predictions, and grey represents wrong predictions.
all cases, the listed drugs are of unknown pharmacological action, and there is still no evidence indicating the enzymes' association with the molecule's mechanism of action.For instance, the flavin mononucleotide and the cobalt hexamine ion were listed as known inhibitors of yet unknown function against the chorismate synthase, while glyphosate, shikimate-3-Phosphate, and formic acid have been experimentally found to act with EPSP synthase.Six non-homologous genes were marked as hypothetical or putative in the KEGG database and/or lacked enzyme-associated information.We searched for drug leads by aligning the query sequences against the DrugBank's database to find homologous proteins.Two out of six were found to have a protein hit.More specifically, the protein encoded by A1S_0589 was found to have high sequence identity with the phosphocarrier protein HPr of Enterococcus faecalis (Bit-score: 48.5), while the translation product of A1S_0706 resembles the sugar phosphatase YbiV of Escherichia coli (Bit-score: 225.3).According to Drug-Bank, dexfosfoserine and aspartate beryllium trifluoride have been experimentally determined to bind to these enzymes; however, their pharmacological action is still unknown.The Supplementary Table

842
objective function, and ⃗ c is a vector of coefficients on the 843 fluxes ⃗ v used to define the objective function. 957

Table 1 |
Composition of the computationally defined minimal growth medium iMinMed.It consists of nine transition metals, a carbon source, a nitrogen source, a sulfur source, and a phosphate source.Oxygen was used to simulate aerobic conditions.

SBO:0000376 SBO:0000178 SBO:0000377 SBO:0000216 SBO:0000222 SBO:0000223 SBO:0000402 SBO:0000217 SBO:0000218 SBO:0000220 SBO:0000403 SBO:0000214 SBO:0000219 SBO:0000215 SBO:0000695
33hematic representation of the SBO and ECO Terms mapping.It follows the graphs defined in the repository for biomedical ontologies Ontology Lookup Service (OLS)33.The SBO terms were added using the SBOannotator tool 34 .The ECO Terms annotated metabolic reactions and were declared based on the presence of GPR along with KEGG and UniProt annotations.Providing UniProt identifiers, the Protein Existence Level guides the mapping to appropriate ECO Terms. Figure created with yEd 35 .

Table 2 |
Simulated doubling times in various growth media.
Accuracy of gene essentiality predictions based on experimental data.iACB23LX was employed to predict essential genes.The in silico results were compared the Wang et al. transponson library.The LB medium was applied to mirror the experimental settings.The model exhibited 87% accuracy with FBA (left) and 87% with MOMA (right).Beige indicates correct predictions; grey indicates incorrect predictions.
Debugging workflow to curate and evaluate already published models.Following the community standards, the already published A. baumannii models were curated and transformed into re-usable, simulatable, and translatable models.Quality controls and metabolic standardized tests were conducted using Metabolic Model Testing (MEMOTE), while the validity of the file format and syntax were examined with the SBML Validator.ModelPolisher enhanced the models with missing metadata.
23curated collection of already published A. baumannii 458 metabolic models 459In 2010, Kim et al. publish the first GEM for the multidrug-460 resistant strain A. baumannii AYE23.After that, multiple stud-461 ies provided new data and genomic analyses were published, 462 . The final step 1019 of debugging involved the conversion of all models to the 1020 newest available format SBML Level 3 Version 2 38 , as well 1021 as the quality control using MEMOTE 36 .