ABSTRACT
With the emergence of multidrug-resistant bacteria, the World Health Organization published a catalog of microorganisms urgently needing new antibiotics, with the carbapenem-resistant Acinetobacter baumannii designated as “critical”. Such isolates, frequently detected in healthcare settings, pose a global pandemic threat. One way to facilitate a systemic view of bacterial metabolism and allow the development of new therapeutics is to apply constraint-based modelling. Here, we developed a versatile workflow to build high-quality and simulation-ready genome-scale metabolic models. We applied our workflow to create a novel metabolic model for A. baumannii and validated its predictive capabilities using experimental nutrient utilization and gene essentiality data. Our analysis showed that our model i ACB23LX could recapitulate cellular metabolic phenotypes observed during in vitro experiments, while positive biomass production rates were observed and experimentally validated in various growth media. We further defined a minimal set of compounds that increase A. baumannii ‘s cellular biomass and identified putative essential genes with no human counterparts, offering novel candidates for future antimicrobial development. Finally, we assembled and curated the first collection of reconstructions for distinct A. baumannii strains and analysed their growth characteristics. The presented models are in a standardised and well-curated format, enhancing their usability for multi-strain network reconstruction.
Introduction
In the 21st century, treating common bacterial infections has become a global health concern. The rapid emergence of pathogens with newly developed resistance mechanisms led to the ineffectiveness of hitherto used antimicrobial drugs. According to their resistance patterns, bacteria are classified into three main categories: multidrug-resistant (MDR, resistant to at least one agent in more than three antibiotic categories), extensively drug-resistant (XDR, non-susceptible to one or two categories), and pandrug-resistant (PDR, non-susceptible to all drugs in all categories)1. Pathogens from the last two classes are called “superbugs”. In February 2022, Murray et al. developed predictive statistical models within a large-scale global study and estimated 1.27 million deaths directly associated with antimicrobial resistance (AMR)2. The same study underlines the highly virulent ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp.) as the primary cause of AMR-related deaths, while the World Health Organization (WHO) announced in 2017 the urgent need for novel and effective therapeutic strategies against these microorganisms, assigning them the “critical status”.
Over the years, numerous studies highlighted the Gramnegative human pathogen Acinetobacter baumannii of substantial concern in hospital environments attributable to its high intrinsic resistance against antimicrobial agents, including biocides3, 4, 5, 6. A. baumannii (from the Greek word akínētos, meaning “unmoved”) is a rod-shaped, non-motile, and strictly aerobic bacterium. It is an opportunistic pathogen whose adaptable genetic apparatus has caused it to become endemic in intensive care units (ICUs), affecting immunocompromised patients, causing pneumonia, bacteremia, endocarditis, and more. Especially the carbapenem-resistant A. baumannii poses a serious global threat with high mortality rates7, 8, 9. It targets exposed surfaces and mucous tissues, colonises the human nose10, 11, 12 and is closely related to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections13, 14, 15, 16. The skin has shown to be a community reservoir for A. baumannii in a very small percentage of samples17, 18, while its prevalence in the soil is a frequent misconception as species from the genus Acinetobacter are ubiquitous in nature5, 19. Finally, it shows susceptibility to commonly used drugs, like β -lactams, aminoglycosides, and polymyxins.
Systems biology, and especially the field of genome-scale metabolic network analysis, is the key to exploring genotypephenotype relationships, better understanding mechanisms of action of such threatening pathogens, and ultimately developing novel therapeutic strategies. Genome-scale metabolic models (GEMs) combined with constraint-based modelling (CBM) provide a well-established, fast, and inexpensive in silico framework to systematically assess an organism’s cellular metabolic capabilities under varying conditions having only its annotated genomic sequence20. As of today, they have numerous applications in metabolic engineering, leading to the formulation of novel hypotheses driving the detection of new potential pharmacological targets21.
It has been more than a decade since the release of the first mathematical simulation of A. baumannii metabolism. Kim et al. integrated biological and literature data to manually build AbyMBEL891, representing the strain AYE22. This model was further employed as an essential foundation for future reconstructions; however, its non-standardised and missing identifiers limited its use. Following a tremendous increase in the amount of literature and experimental data regarding A. baumannii (over 5,670 articles published between 2010 and 2017 according to PubMed), two novel strain-specific metabolic networks arose, iLP84423 and the AGORA (Assembly of Gut Organisms through Reconstruction and Analysis) model24. Both models were reconstructed in a semi-automated process and simulated the metabolism of two distinct strains: ATCC 19606 and AB0057, respectively. With the help of transcriptomic data of sampled colistin responses and iLP844, it was observed that the type strain ATCC 19606 underwent metabolic reprogramming, demonstrating a stress condition as a resistance mechanism against colistin exposure. Alterations in gene essentiality phenotypes between treated and untreated conditions enabled the discovery of putative antimicrobial targets and biomarkers. Moreover, the model for AB0057 was part of an extensive resource of GEMs built to elucidate the impact of microbial communities on host metabolism. The amount of massand charge-balanced reactions in these models is very high; however, they carry few to no database references. Norsigian et al. improved and expanded AbyMBEL891 to finally create the high-quality model iCN718 that exhibited a prediction accuracy of over 80 % in experimental data25, while Zhu et al. built a GEM for ATCC 19606 (iATCC19606) integrating multi-omics data26. Compared to iLP844, iATCC19606 incorporates metabolomics data together with transcriptomic data enabling the deciphering of bactericidal activity upon polymyxin treatment and the interplay of various metabolic pathways. Last but not least, in 2020, the first in vivo study on A. baumannii infection was published utilizing constraint-based modelling27. This time, the collection of strain-specific models was enriched with the first GEM for the hyper-virulent strain AB5075 (iAB5075). The model was validated using various experimental data, while transcriptomics data was leveraged to identify critical fluxes leading to mouse bloodstream infections. Our literature search revealed one last metabolic model of A. baumannii ATCC 17978, named iJS784, which, by the time of writing, has not been officially published in a scientific journal or been deposited in a mathematical models database. Instead, it is currently available solely in the form of a dissertation28. Nonetheless, the model cannot produce biomass even when all uptake reactions are open and all medium nutrients are available to the cell, making it unusable and hampering reproducibility.
We expanded this collection by building a novel GEM for the nosocomial strain ATCC 17978, named iACB23LX. The presented model follows the FAIR data principles and community standards and recapitulates experimentally-derived phenotypes with high predictive capability and accuracy scores. We enriched the model with numerous database cross-references and computationally inferred the minimal nutritional requirements. Moreover, we used this model to investigate the organism’s growth ability in defined media and a medium simulating human nasal secretions while we assessed its ability to predict essential genes using two different optimization approaches. Among the examined strains, ATCC 17978 is one of the most well-studied, with a substantial amount of experimental data available that can be used to direct model refinement and validation. Besides that, we systematically refined and evaluated all pre-existing reconstructions’ performance to finally create the first compendium of curated and standardised models for A. baumannii. With this, we aim to promote further studies to give new insights into this pathogen and promote strainand species-specific therapeutic approaches.
Results
Reconstruction process of the novel metabolic network iACB23LX
To build a high-quality model for A. baumannii ATCC 17978, we developed a workflow shown in Figure 1 adhering closely to the community standards29 (see Materials and Methods). We named the newly reconstructed network iACB23LX, where i stands for in silico, ACB is the organismand strainspecific three-letter code from the Kyoto Encyclopedia of Genes and Genomes (KEGG)30 database, 23 the year of reconstruction, and LX the modellers’ initials. Our protocol involves eight major stages starting from the attainment of the annotated genomic sequence until the model validation, applies to any organism from the tree of life (Archaea, Bac-teria, and Eukarya), and ensures the good quality and correctness of the final model. CarveMe31 was used to build a preliminary model, which was subsequently extended and curated manually. We resolved syntactical issues and mass and charge imbalances during manual refinement while we defined missing metabolite charges and chemical formulas. Our final model contains no mass-imbalanced reactions and only two charge-imbalanced reactions. After extensive efforts, resolving all charge imbalances was impossible since all participated metabolites are interconnected to multiple reactions within the network, and any modification in their charge resulted in newly introduced imbalances. The model extension process involved incorporating missing metabolic genes considering the network’s connectivity. Dead-end and orphan metabolites do not exist biologically in the species, implying knowledge gaps in metabolic networks. Moreover, reactions including such metabolites are not evaluated in flux balance analysis (FBA). Hence, reactions with zero connectivity and no organism-specific gene evidence were omitted from the gap-filling. We extended the draft model by 138 reactions, 77 genes, and 110 metabolites in three compartments (cytosol, periplasm, and extracellular space). All in all, iACB23LX comprises 2,321 reactions, 1,660 metabolites, and 1,164 genes (Figure 2). It is the most comprehensive model, while its stoichiometric consistency lies at 100 % and contains no unconserved metabolites. Over 1,800 reactions have a gene-protein-reaction associations (GPR) assigned, while 149 are catalysed by enzyme complexes (GPR contains at least two genes connected via a logical AND).
Furthermore, we tested our model for energy-generating cycles (EGCs) to prevent having thermodynamically infeasible internal loops that bias the final predictions32. We defined energy dissipation reactions (EDRs) for 15 energy metabolites and evaluated their production with FBA after disabling all nutrients from entering the system (see Materials and Methods). In our final model, iACB23LX, none of the tested metabolites could be produced; thus, no EGCs are contained. As shown in Figure 1, a plethora of database cross-references was embedded in the model, while Systems Biology Ontology (SBO) terms were defined for every reaction, metabolite, and gene (Figure 3)34. Additionally, each reaction was mapped to an Evidence and Conclusion Ontology (ECO) term representing the confidence level and the assertion method.
To assess the model quality, Metabolic Model Testing (MEMOTE)36 and the Systems Biology Markup Language (SBML)37 Validator from the libSBML38 were used. Our metabolic network, iACB23LX, has a MEMOTE score of 89 % and all syntactical errors are resolved. Our model undoubtedly exhibits the highest quality score among its predecessors (Figure 2). It must be noted here that the MEMOTE testing algorithm considers only the parent nodes of the SBO directed acyclic graph and not their respective children. Assigning more representative SBO terms (Figure 3) does not increase the final score but reduces it by 2 %.
The model is available as a Supplementary File in the latest SBML Level 3 Version 239 and JavaScript Object Notation (JSON) formats with the flux balance constraints (fbc) and groups plugins available.
iACB23LX is of high quality and exhibits an increased predictive accuracy
Prediction and experimental validation of bacterial growth on various nutritional environments
Constraint-based modelling approaches such as FBA estimate flux rates with which metabolites flow through the metabolic network and predict cellular phenotypes for multiple growth scenarios. A. baumannii is known to be strictly aerobic and, compared to the majority of Acinetobacter species, it is not considered ubiquitous in nature. As a nosocomial pathogen, it has been mostly detected in hospital environments, particularly in the ICUs, and within the human nasal microbiota10, 11, 12. We examined various growth conditions to ensure that our model iACB23LX recapitulates these already known and fundamental phenotypes.
First, we tested our model’s capability to simulate a strictly aerobic growth. For this purpose, we examined the directionality of all active oxygen-producing and -consuming reactions when the oxygen uptake was disabled (see Supplementary Figure S1). We observed an accumulation of periplasmic oxygen by reactions that carried remarkably high fluxes and resulted in growth when the oxygen import was turned off. We examined each reaction individually and removed those without gene evidence to correct this. More specifically, we removed the periplasmic catalase (CATpp), one of bacteria’s main hydrogen peroxide scavengers. This enzyme is typically active in the cytosol40 and was not part of any precursor A. baumannii GEM or was found only in cytosol (iLP84423). To fill the gap and enable the usage of the periplasmic hydrogen peroxide, we included the reaction phenethylamine oxidase (PEAMNOpp) in the model. Eventually, our model iACB23LX demonstrated growth only in the presence of oxygen using a rich medium (all exchange reactions are open).
Furthermore, we determined the minimal number of metabolites necessary for growth using iACB23LX and the M9 minimal medium (M9) as a reference. Minimal growth media typically consist of carbon, nitrogen, phosphorus, and sulfur sources, as well multiple inorganic salts and transition metals. These metals are crucial for the growth and survival of all three domains of life; however, they can be transformed into toxic compounds in hyper-availability41. The exact composition of our minimal medium (iMinMed) is shown in Table 1. It comprises nine transition metals, acetate as the carbon source, ammonium as a nitrogen source, sulfate as a sulfur source, and phosphate as a phosphorus source. Oxygen is also a vital component, as A. baumannii is known to be a strictly aerobic bacterium. Previous studies have highlighted the importance of the nutrient metals for A. baumannii to survive within the host. More specifically, the bacterium utilises these metals as co-factors for vital cellular processes42. Manganese and zinc have also been studied as essential determinants of host defense against A. baumannii-acquired pneumonia through their sequestering by calprotectin via a type of bonding called chelation43. For the computational simulations, growth rates below 2.81 mmol/(gDW ·h) were considered to be realistic since the doubling time of the fastest growing organism Vibrio natriegens is 14.8 minutes that correspond to 2.81 mmol/(gDW·h)36. Table 2 displays the predicted growth rates of iACB23LX in the respective culture media. In the LB medium, our model grew with the highest rate; 0.5926 mmol/(gDW ·h). With our self-defined minimal medium, iMinMed, our model exhibited the lowest rate; 0.2097 mmol/(gDW ·h). Notably, our experimental validation revealed similarity between the growth rates obtained from the in vitro respiratory curves (Table 2 and Supplementary Figure S2) and those predicted by our in silico simulations. Additionally, we examined the growth rate of our model in a rich medium, in which all nutrients are available to the model. With this, the flux through the biomass production was the highest, 2.1858 mmol/(gDW·h), as expected. This is still less than the growth rate of the fastest organism, increasing the confidence in our model’s consistency and simulation capabilities. Initially, iACB23LX could not predict any realistic growth rate for the simulated media. Using the gap-filling function of CarveMe31, we detected three enzymes whose addition into the metabolic network resulted in successful growth in all tested media. These reactions are PHPYROX, OXADC, and LCYSTAT.
Functional validation of iACB23LX using nutrient utilization data
Multiple in silico approaches have hitherto been employed to predict lethal genes and to assess growth metrics on different carbon/nitrogen sources for severe pathogenic organisms including Mycobacterium tuberculosis44, 45 and Staphylococcus aureus46, 47. In 2013 and 2014, two studies were published that examined the phenome and gene essentialities of the strain ATCC 1797848, 49. We used these datasets to evaluate the overall performance (functionality and accuracy) of iACB23LX.
Our first validation experiment evaluated the accuracy of our model’s carbon and nitrogen catabolism potentials. More specifically, we detected compounds that could serve as sole carbon and nitrogen sources using the large-scale phenotypic data by Farrugia et al.48. Although the authors tested more compounds in total, we could examine only 80 compounds as carbon sources and 48 metabolites as sole nitrogen sources. For the remaining molecules, either no Biochemical, Genetical, and Genomical (BiGG) identifier could be found, or they were not part of the metabolic network. According to the experimental protocol followed by Farrugia et al., we applied the M9 medium and enabled D-xylose as the sole carbon source for the nitrogen testings. As D-xylose was initially not part of the reconstructed network, we conducted extensive literature and database search to include missing reactions. This improved the prediction accuracy, especially for the carbon sources, where an amelioration of 19 % was achieved. In more detail, despite the comprehensive manual curation, the first draft model was reconstructed using the automated tool CarveMe31. This resulted in the incorrect inclusion of transport reactions, which were consequently removed to reduce the number of false positive predictions. In both cases, our main objective was to improve the accuracy while keeping the number of orphan and dead-end metabolites low and removing only reactions with no gene evidence (lack of assigned GPR). Similarly, missing reactions were identified and included in the network to eliminate the false negative predictions. For instance, in accordance with the phenotypic data, the strain ATCC 17978 should not be able to grow when utilizing D-trehalose as the sole carbon source. Our model initially predicted a growth phenotype for this carbon source. To overcome this conflict, we deleted the reaction TREP with no organism-specific gene evidence, meaning no GPR was assigned. However, it was not feasible to resolve all inconsistencies since adding transport reactions to resolve false positives or false negatives in the nitrogen testings led to more false predictions in the carbon sources. More specifically, when adenosine, inosine, L-homoserine, and uridine are utilised as sole carbon sources, the model should not predict growth, while sole nitrogen sources should result in a nonzero objective value. In this case, adding transporters would resolve false predictions in the nitrogen tests, while it would have induced more false predictions in the carbon sources tests. All in all, iACB23LX exhibited an overall accuracy of 86.3 % for the carbon and 79.2 % for the nitrogen sources test (Figure 4c); however, after further curation, the accuracy was remarkably improved and reached 87.5 %. By adding their corresponding transport reactions, we resolved discrepancies regarding uridine, inosine, adenosine, and L-homoserine.
Our model was able to catabolise 49 sole carbon and 40 sole nitrogen sources (see Figure 4 (a) and Figure 4 (b), recapitulating totally 69 and 38 experimentally-derived phenotypes, respectively.
We further assessed the ability of our model to predict known gene essentialities. First, 1,164 in silico single gene deletions were conducted on both LB and rich growth media, respectively, to identify all lethal gene deletions. Subsequently, the ratio between the growth rate after and before the respective knockouts (FCgr) was calculated, and the genes were accordingly classified (see Materials and Methods). For the optimization, two mathematics-based approaches from the Constraints-Based Reconstruction and Analysis for Python (COBRApy)50 package were deployed: the FBA51 and the Minimization of Metabolic Adjustment (MOMA)52. Between the two methods, a similar distribution of the FCgr values was observed (Figure 5 (a) and Figure 5 (b)). Using FBA, 97, 75, and 991 genes were predicted to be essential, partially essential, and inessential on the LB medium, respectively, whereas optimization with MOMA resulted in 110, 85, and 968 genes (Figure 5 (c) and Supplementary Files S2 and S3). These genes were primarily associated with the biosynthesis of cofactors and vitamins, the amino acid/nucleotide metabolism, the energy metabolism, and the metabolism of terpenoids and polyketides. Additionally, we examined in more detail how nutrition availability impacts the gene essentiality by conducting single-gene knockouts in the rich medium. Both optimization methods resulted in more essential genes when the model was required to alter its metabolic behavior due to the absence of nutrients, i.e., with the LB growth medium, compared to the rich medium (Figure 5 (c) and Supplementary Files S2 and S3). In general, FBA detected more genes to be dispensable for growth in both nutritional environments. On the other hand, MOMA classified more genes as essential or partially essential (Figure 5 (c) and Supplementary Files S2 and S3), while genes from FBA build a subset of the essential genes derived by MOMA. Furthermore, we validated the prediction accuracy of iACB23LX using already existing gene essentiality data. At the time of writing, the transposon mutant library by Wang et al. is the only ATCC 17978-specific experimental dataset49. With this dataset together and the LB medium, our model demonstrated an accuracy of 87 % with both optimization methods (Figure 5 (d)). We further analysed the predicted false negative genes and probed their proteomes to investigate the existence of human orthologs (see Materials and Methods and Supplementary Table S4). With this, we aimed to eliminate crosslinkings to human-similar proteins since metabolic pathways or enzymes that are missing from the human host have been an important resource of druggable targets against infectious diseases53. From the 37 genes that our model predicted to be essential (with FBA and MOMA) contradicting the experimental results, 17 were found to be non-homologous (see Supplementary Table S4). Some examples are the genes encoding the enolpyruvylshikimate phosphate (EPSP) synthase (A1S_2276), chorismate synthase (A1S_1694), riboflavin synthase (A1S_0223), phosphogluconate dehydratase (A1S_0483), and 2-keto-3-deoxy-6-phosphogluconate (KDPG) aldolase (A1S_0484). The EPSP synthase converts the shikimate3-phosphate together with phosphoenolpyruvate to 5-O-(1carboxyvinyl)-3-phosphoshikimic acid. Subsequently, the chorismate synthase catalyses the conversion of the 5-O-(1carboxyvinyl)-3-phosphoshikimic acid to chorismate, the seventh and last step within the shikimate pathway54. Chorismate is the common precursor in the production of the aromatic compounds tryptophan, phenylalanine, and tyrosine, as well as folate and menaquinones during the bacterial life cycle. The shikimate pathway is of particular interest due to its absence from the human host metabolome and its vital role in bacterial metabolism and virulence. Moreover, the enzyme riboflavin synthase catalyses the final step of riboflavin (vitamin B2) biosynthesis with no participating cofactors. Riboflavin can be produced by most microorganisms compared to humans, who have to externally uptake them via food supplements. Also, it plays an important role in the growth of different microbes, especially due to its photosynthesizing property that marks it as a non-invasive and safe therapeutic strategy against bacterial infections55. Lastly, the phosphogluconate dehydratase catalyses the dehydration of 6-Phospho-D-gluconate to KDPG, the precursor of pyruvate and 3-Phospho-D-glycerate56. This enzyme is part of the Entner–Doudoroff pathway that catabolises glucose to pyruvate, similarly to glycolysis, but using a different set of enzymes57.
We further assessed the druggability of our essential nonhomologous proteins and investigated the existence of inhibitors or compounds known to interact with the enzymes. For this, we used the online DrugBank database that contains detailed information on various drugs and drug targets58. In all cases, the listed drugs are of unknown pharmacological action, and there is still no evidence indicating the enzymes’ association with the molecule’s mechanism of action. For instance, the flavin mononucleotide and the cobalt hexamine ion were listed as known inhibitors of yet unknown function against the chorismate synthase, while glyphosate, shikimate3-phosphate, and formic acid have been experimentally found to act with EPSP synthase. Six non-homologous genes were marked as hypothetical or putative in the KEGG30 database and/or lacked enzyme-associated information. We searched for drug leads by aligning the query sequences against the DrugBank’s database to find homologous proteins. Two out of six were found to have a protein hit. More specifically, the protein encoded by A1S_0589 was found to have high sequence identity with the phosphocarrier protein HPr of Enterococcus faecalis (Bit-score: 48.5), while the translation product of A1S_0706 resembles the sugar phosphatase YbiV of Escherichia coli (Bit-score: 225.3). According to DrugBank, dexfosfoserine and aspartate beryllium trifluoride have been experimentally determined to bind to these enzymes; however, their pharmacological action is still unknown. The Supplementary Table S4 lists all non-homologous essential genes reported for iACB23LX.
Overall, iACB23LX exhibits high agreement to all validation tests and can, therefore, be used to systematically derive associations between genotypes and phenotypes.
A curated collection of already published A. baumannii metabolic models
In 2010, Kim et al. published the first GEM for the multidrugresistant strain A. baumannii AYE22. After that, multiple studies provided new data and genomic analyses were published, paving new ways towards its update and refinement48, 49, 59, 60. Since then, a variety of GEMs was developed aiming at the empowering of drug development strategies and the enforcement of metabolic engineering by formulating novel and reliable hypotheses (Table 3). However, the amount and format of information contained are inconsistent, with some being syntactically invalid or of older formats. Here, we systematically analysed the quality of all seven currently existing GEMs, reporting their strengths and weaknesses and debugging them to finally build a curated, standardised, and updated collection. To do so, we developed a workflow with curation steps applicable to all models aiming at the standardization and usability of published GEMs by the community (Figure 6 (a)). This closely follows the community-driven workflow published by Carey et al. for the reconstruction of reusable and translatable models29. The curation procedure includes a series of stages aiming at modifying data format, data amount, and information quality. It is important to note that no contextual modifications were conducted that could affect the model’s prediction capabilities (see Materials and Methods).
Five A. baumannii strains have been created throughout the years, with AYE and ATCC 19606 having two reconstructions each. All models are publicly stored and can be downloaded either from a database/repository (BioModels, Virtual Metabolic Human (VMH)61, BiGG62, and GitHub) or directly from the publication’s supplementary material. The use of distinct identifiers prevents the metabolic networks from being compared to each other. More specifically, iLP844 and iJS784 carry ModelSEED63 identifiers for reactions and metabolites, while iCN718 and iAB5075 BiGG 62 identifiers. AbyMBEL891 uses distinct identifiers not supported by any database, and iATCC1906 includes identifiers derived from KEGG30. Most of the models resulted in an unrealistic and inflated growth rate (reference: doubling time of the fastest growing organism V. natriegens) in their defined medium, while iJS784 showed a zero growth even when all imports were enabled. Hence, this model was excluded from further analysis (Table 3). For each of the remaining GEM, we defined the minimum growth requirements that result in a nonzero and realistic objective value. For instance, the AGORA model required at least 21 compounds (mostly metal ions), while oxygen was sufficient for AbyMBEL891 to simulate a non-zero growth (see Supplementary File S5).
Since these models should successfully reflect the A. baumannii metabolic and growth capabilities (see Supplementary Figure S2), we examined the flux through their biomass reaction in various growth media known to induce A. baumannii growth (Figure 6 (b)). The majority resulted in a biomass flux of 0.0 mmol/(gDW · h) in the iMinMed, while the AGORA model could not simulate growth in the LB and SNM as well. Thus, we investigated and identified minimal medium supplementations needed to enable cellular biomass production. As already mentioned, iJS784 was excluded from further examination (Table 3), together with AbyMBEL891 that debilitated the analysis due to its non-standardised identifiers and its missing genes and GPRs. When the medium of iATCC1906 and iAB5075 was supplemented with D-alanine and D-glucose 6phospahte as well as guanosine 5’-phosphate (GMP), respectively, their biomass reactions carried a positive flux rate of 0.5279 mmol/(gDW·h) and 0.6477 mmol/(gDW ·h). Supplementation of meso-2,6-diaminoheptanedioate, menaquinone8, niacinamide, heme, siroheme, and spermidine into the LB medium of the AGORA model resulted in a positive growth rate (1.9430 mmol/(gDW ·h)). Similarly, when supplementing the SNM with glycyl-L-asparagine, the derived growth rate was 1.5020 mmol/(gDW ·h), while the iMinMed needed to be extended with 12 additional components (resulted growth rate: 1.2789 mmol/(gDW·h)). Lastly, like with iACB23LX, the LB medium, together with FBA and MOMA, were applied to detect lethal genes in all models (see Supplementary Tables S6 and S7). Despite significant efforts, we could not derive a mapping scheme between the strain-specific gene identifiers of iLP844 and iATCC1906 to resolve PROKKA or HMPREF identifiers. Thus, a strain-wise comparison of essential genes would be feasible only for the strain ATCC 17978. As already mentioned, iJS784 simulated continuously zero growth and was excluded from the analysis. Consequently, we examined which genes were necessary for growth among the remaining models across three different strains: AYE (iCN718), ATCC 17978 (iACB23LX), and AB0057 (AGORA). Totally, 392 genes were identified as essential, while 34 occurred in all three strains. For instance, when the genes encoding for dephospho-coenzyme A (CoA) kinase, phosphopanteth-einyl transferase, shikimate kinase (A1S_3190), or chorismate synthase (A1S_1694) were deleted from the three strains, no growth could be simulated in the LB medium. As already mentioned, the gene encoding the chorismate synthase has no human-like counterpart. This, together with the fact that it was detected to be vital for growth across three distinct strains, increases its potential to be a drug candidate for future therapies. Generally, most essential genes are members of the purine metabolism and encode various transferases. Besides this, the pantothenate and CoA biosynthesis and the amino acid metabolism were found to be a prominent target pathways for further drug development.
Discussion
The historical timeline of past pandemics shows the imposed threat of bacteria in causing repetitive outbreaks with the highest death tolls64, such as cholera and plague. By 2050, antimicrobial-resistant pathogens are expected to kill 10 million people annually65, while the antibiotics misuse accompanied by the ongoing Coronavirus Disease 2019 (COVID-19) crisis exacerbated this global threat. It is noteworthy that elevated morbidity rates were ascribed to bacterial co/secondary infections during previous viral disease outbreaks66, 67, 68. Hence, developing effective antibiotic regimens is of urgent importance. Here we present the most recent and comprehensive ready-to-use blueprint GEM for the Gram-negative pathogen A. baumannii. For this, we developed a workflow that applies to any living organism and ensures the reconstruction of high-quality models following the community standards. Our model, iACB23LX, was able to simulate growth in the SNM that mimics the human nasal niche, the experimentally defined medium LB, and the model-derived iMinMed. With iMinMed we denoted the minimal number of compounds needed to achieve non-zero growth. This medium contains totally 14 compounds, including transition metals and energy sources. Transitions metals have been shown to participate in important biological processes and are vital for the survival of living organisms41. We confirmed the computationally predicted growth rates by comparing them to our empirically determined growth kinetics data. With this, we ensured that our model recapitulates growth phenotypes in media that reflect Acinetobacter-associated environments.
Furthermore, we validated iACB23LX quantitatively and qualitatively using existing experimental data and observed remarkable improvements compared to precursory models. More specifically, our model predicted experimental Biolog growth phenotypes on various carbon sources48 with 86.3 % overall agreement, which is higher than the predictions capability of iATCC19606 (84.3 %) and iLP844 (84 %), and comparable to that of iAB5075 (86.3 %). Similarly, iACB23LX exhibited 79.2 % predictive accuracy on nitrogen sources tests, while this increases to 87.5 % after further refinement.Improving and re-defining the biomass objective func-tion (BOF) based on accurate strain-specific experimental data would be the next step to diminish the number of inconsistent predictions and to further improve the network and its predictive potential. During gene lethality analysis in LB medium, our model predicted 110 genes with MOMA to be essential, while 97 of them were also reported by FBA to impair the growth. Generally, after enriching the nutritional input with all available compounds (rich medium), less lethal genes resulted, meaning that A. baumannii undergoes metabolic alterations when nutrients are lacking. Our in silico results compared to the strain-specific gene essentiality data49 resulted in 87 % overall accuracy, which is remarkably higher than all GEMs built for A. baumannii (e.g., 80.22 % for iCN718 and 72 % for iLP844), except iAB5075 which performed comparably. Subsequently, we examined more care-fully our false negative predictions and searched for putative drug targets that could be employed for future therapeutics. More specifically, we focused on genes found to be essential for growth and encode proteins with no human counterparts (see Supplementary Table S4). Our study highlighted the EPSP and chorismate synthases from the shikimate pathway as prominent target candidates with no correlation to the human proteome. Several knockout studies have highlighted the significance of enzymes from the shikimate metabolism as potential targets against infections caused by threatening microorganisms, e.g., Mycobacterium tuberculosis69, Plasmodium falciparum70, and Yersinia enterocolitica71. Umland et al. identified these two gene products as essential in an in vivo study using a clinical isolate of A. baumannii and a rat abscess infection model72. This increases the confidence of our results and indicates that novel genes found to be essential in silico should be considered as potential antimicrobial targets. Similarly, numerous studies have suggested one of our further candidates, riboflavin, as a potential antimicrobial agent55, while the Entner–Doudoroff pathway (in which our candidate targets phosphogluconate dehydratase and KDPG aldolase act to produce pyruvate) is similar to the glycolysis but with different member enzymes, has been firstly discovered in Pseudomonas saccharophila57 and later in E. coli73. Meanwhile, it is vital for the survival of further pathogenic microorganisms, like Neisseria gonorrhoeae, Klebsiella pneumoniae, and Pseudomonas aeruginosa74, 75, 76. However, they have not yet been examined in the context of Acinetobacter species and could be a source of antimicrobial therapeutic strategies. Hence, these biosynthetic routes could be a valuable resource for targets to fight bacterial infectious diseases. Finally, we investigated the druggability of our essential nonhomologous genes. We searched the DrugBank database to find compounds known to inhibit these genes and that are already approved by the Food and Drug Administration (FDA). Our analysis resulted in only drugs that have been found to interact with the gene product of interest; however their pharmacological action is yet unknown. We further probed the hypothetical and putative non-homologous genes against the DrugBank’s sequence database to find homologous proteins and determine their activity. Also in this case, the resulted drugs were listed with still undetermined pharmacological action. These putative and yet unexplored targets with inhibitory potential are of great interest in the context of developing novel classes of antibiotics. Overall, our model reached a MEMOTE score of 89 %, which is the highest score reported for this organism.
Moreover, we improved and assessed all previously published models and created the first curated collection of metabolic networks for A. baumannii. We created a debugging workflow consisting of four major steps to systematically analyse and curate constraint-based models focusing on their standardization and the FAIR data principles. We applied this workflow and curated a total of seven metabolic models for A. baumannii. In addition, most of the models simulated growth rates by default that were unrealistic when compared to the fastest growing organism (V. natriegens)36. Hence, we determined the minimal number of components needed for these models to result in non-inflated biomass production rates. The defined minimal media were mostly composed of metal ions (e.g., cobalt, iron, magnesium) that are essential for bacterial growth. For the model iJS784, the minimization process was infeasible; thus, the model was not considered for further analysis. We also examined the growth ability of these models in three media (SNM, LB, and iMinMed) and compared them to our model, iACB23LX. When the models simulated a zero flux through the biomass reaction, we continued by detecting the minimal amount of metabolites supplemented in the medium that resulted in a non-zero growth rate. These would enable the detection of gaps and assist in future improvement of the models. It is important to note here that with this curation, we opted for a systematical assessment of the previously reconstructed models and the detection of their assets and liabilities. Consequently, we did not undertake any contextual modification that could alter the models’ predictive capabilities. Finally, we in silico detected lethal genes among comparable and simulatable models of A. baumannii. Our analysis incorporated three strains of A. baumannii (AYE, AB0057, and ATCC 17978), and we examined the effect of genetic variation across strains in the gene essentiality. Our analysis highlighted once again the shikimate pathway, as well as the purine metabolism, the pantothenate, and CoA biosynthesis, and the amino acid metabolism as candidate routes to consider for future new classes of antibacterial drugs with potential effect across multiple A. baumannii strains. The curated models, together with our novel model, would benefit the future prediction of candidate lethal genes by reducing the considerable resources needed for classical whole-genome essentiality screenings. All in all, this collection of simulationready models will forward the selection of a suitable metabolic network based on individual research questions and help define the entire species and new hypothesis.
Our new metabolic reconstruction and the curated collection of further strain-specific models will guide the formulation of ground-breaking and reliable model-driven hypotheses about this pathogen and help examine the diversity in the metabolic behavior of different A. baumannii species in response to genetic and environmental alterations. Additionally, they can be utilised as knowledge bases to detect critical pathways related to responses against multiple antibiotic treatments. This will ultimately strengthen the development of advanced precision antimicrobial control strategies against multidrugresistant (MDR) A. baumannii strains.
Taken together, our workflows and models can be employed to expand this collection further with additional standardised strain-specific metabolic reconstructions to finally define the core and pan metabolic capabilities of A. baumannii.
Materials and Methods
Growth curves of A. baumannii
Growth curves for A. baumannii strains AB5075, ATCC 17978, ATCC 19606 and AYE were recorded in LB medium, iMinMed medium supplemented with acetate as the sole carbon source (0.2 % weight per volume, respectively), and SNM77. Overnight cultures of the strains grown in LB medium were harvested by centrifugation, and the cell pellet was washed once with 5 mL of phosphate buffered saline (PBS). Cells were then re-suspended in the medium used for the growth curves. The starting optical density (OD)600 nm was adjusted to 0.1 and growth curves were recorded in 2 mL of medium for 12 h in triplicates using a Tecan infinite M200 PRO plate reader and 12-well plates covered with a plastic lid. Plates were incubated with linear shaking at 37 °C and the OD600 nm was measured every 15 min. Growth rates were determined as the slope of the linear part of the curves plotting the natural logarithm of OD600 nm against time.
The metabolic model reconstruction workflow
Figure 1 illustrates the workflow we developed to create the novel high-quality genome-scale metabolic network iACB23LX, following the state-of-the-art protocol of Thiele and Palsson20. Our workflow consists of eight major steps starting from the extraction of an annotated genome until the model validation using experimental data. Modifications in the model structure, as well as the inclusion of crossreferences to multiple functional databases, were done using the libSBML38 library, while all simulations were conducted via the COBRApy -0.22.150 suite that includes functions commonly used for simulations.
The individual steps are described below in more detail with respect to the reconstruction of iACB23LX.
Draft reconstruction
A first draft model was built with CarveMe version 1.5.1 using the annotated genome sequence of the strain ATCC 17978. This was downloaded from the National Centre for Biotechnology Information (NCBI) at https://www.ncbi.nlm.nih.gov and has the assembly accession number ASM1542v178. Seven strain-specific assemblies are registered in NCBI; however, the chosen entry is also present in the KEGG30 database facilitating the model extension. The genome is 3.9 Mbp long and has two plasmids (pAB1 and pAB2). We set the SBML flavor to activate the extension for fbc version 279 that allows semantic descriptions for domain-specific elements such as metabolite chemical formulas and charges together with reaction boundaries and GPRs. Moreover, the CarveMe parameter gramneg was selected to employ the specialised template for the Gram-negative bacteria. Compared to the Gram-positive template, the Gram-negative template comes with phosphatidylethanolamines, murein, and a lipopolysaccharide unit. Its biomass reactions involve membrane and cell wall components resulting in more accurate gene essentiality predictions in the lipid biosynthesis pathways.
Manual refinement and extension
We started the manual refinement of the draft model by resolving syntactical errors within the model file using SBML Validator from the libSBML library38. Missing metabolite charges and chemical formulas were retrieved from the BiGG62 and ChEBI80 databases, while massand chargeimbalanced reactions were corrected. The most intense part of the workflow is the manual network extension and gapfilling. This was done using the organism-specific databases KEGG30 and BioCyc81, together with ModelSEED63. We mapped the new gene locus tags to the old ones using the GenBank General Feature Format (GFF)82 and added missing metabolic genes along with the respective reactions and metabolites into our model. The network’s connectivity was ensured by resolving as many dead-ends (are only produced but not consumed) and orphan (are only consumed but not produced) metabolites as possible. Also, reactions with no connectivity were not included in the model, while reactions with no organism-specific gene evidence were removed from the model.
Erroneous energy generating cycles
Energy generating cycles (EGC) are thermodynamically infeasible loops found in metabolic networks and have not been experimentally observed, unlike futile cycles. EGCs charge energy metabolites like adenosine triphosphate (ATP) and uridine triphosphate (UTP) without any external source of nutrients and may result in incorrect and unrealistic energy increases. Their elimination is crucial while correcting the energy metabolism since they can inflate the maximal biomass yields and make the predictions unreliable. We checked their existence in iACB23LX applying an algorithm developed by Fritzemeier et al.32.
We created a Python script that (1) defines and adds energy dissipation reactions (EDRs) in the network: where X is the metabolite of interest and (2) maximises each EDR while blocking all influxes. This can be formulated as follows: subject to where edr is the index of the current dissipation reaction, S is the stoichiometric matrix, the flux vector, E the set of all exchange reactions, and the upper and lower bounds. The existence of EGCs is indicated by a positive optimal value of vedr.
Totally we examined 14 energy metabolites: ATP, cytidine triphosphate (CTP), guanosine triphosphate (GTP), UTP, inosine triphosphate (ITP), nicotinamide adenine dinucleotide (NADH), nicotinamide adenine dinucleotide phosphate (NADPH), flavin adenine dinucleotide (FADH2), flavin mononucleotide (FMNH2), ubiquinol-8, menaquinol-8, demethylmenaquinol-8, acetyl-CoA, and L-glutamate. Moreover, we tested the proton exchange between cytosol and periplasm.
In the case of existing EGCs, we examined the directionality and the gene evidence of all participated reactions using the BioCyc organism-specific information as reference81.
Database annotations
In this stage, the model was enriched with cross-linkings to various functional databases. Reactions and metabolites were annotated with databases (e.g., KEGG30, BRENDA83, and UniProt84). These were included in the model as controlled vocabulary (CV) Terms following the Minimal Information Required In the Annotation of Models (MIRIAM) guidelines85 and the resolution service at https://identifiers.org/. We used ModelPolisher86 to complete the missing available metadata for all metabolites and genes. Similarly, metabolic genes were annotated with their KEGG30, NCBI Protein, and RefSeq identifiers using the GFF82. To reactions, metabolites, and genes SBO terms were assigned using the SBOannotator34. SBO terms provide unambiguous semantic information and specify the type or role of the individual model component87. In addition, ECO terms were added to every reaction to capture the type of evidence of biological assertions with BQB_IS_DESCRIBED_BY as a biological qualifier. They are useful during quality control and mirror the curator’s confidence about the inclusion of a reaction. When multiple genes encode a single reaction, an ECO term was added for every participant gene. Both terms were incorporated into the model according to our mapping in Figure 3.
Finally, reactions were annotated with the associated subsystems in which they participate using the KEGG30 database and the biological qualifier BQB_OCCURS_IN. Moreover, the “groups” plugin was activated39. Every reaction that appeared in a given pathway was added as a groups:member, while each pathway was created as a group instance with sboTerm=“SBO:0000633” and groups:kind=“partonomy”.
Quality control and quality assurance
MEMOTE36 version 0.13.0 was used to assess and track the quality of our model after each modification, providing us with information regarding the model improvement. The final model was converted into the latest SBML Level 3 Version 239 format using the libSBML package, while the SBML Validator tracked syntactical errors and ensured a valid format of the final model38.
Constraint-based analysis
The most frequently used constraint-based modelling approach is the FBA that determines a flux distribution via optimization of the objective function and linear programming51. Prior to this, the metabolic network is mathematically encoded using the stoichiometric matrix S formalism. This structure delineates the connectivity of the network, and it is formed by the stoichiometric coefficients of all participating biochemical reactions. The rows and columns are represented by the metabolites and the massand charge-balanced reactions respectively. At steady state, the system of linear equations derived from the network is defined as follows: with S being the stoichiometric matrix and the flux vector. With no defined constraints, the flux distribution may be deter-mined at any point within the solution space. This space must be further restricted since the system is under-determined and algebraically insoluble. An allowable solution space is defined by a series of imposed constraints that are followed by cellular functions. Altogether the FBA maximization problem, with mass balance, thermodynamic, and capacity constraints, is defined as: Here, n is the amount of reactions, Z represents the linear objective function, and is a vector of coefficients on the fluxes used to define the objective function.
Growth simulations
Strict aerobic growth check
At the time of writing, the utilised draft reconstruction tool, CarveMe31, does not include reconstruction templates to differentiate between aerobic and anaerobic species. The directionality of reactions that produce or consume oxygen may affect the model’s ability to grow anaerobically. A. baumannii is defined to be a strictly aerobic species. Hence, we tested whether our model could grow with no oxygen supplementation. For this purpose, we examined all active oxygenproducing reactions under anaerobic conditions. We corrected their directionality based on the organism-specific information found in BioCyc81 and kept only those with associated gene evidence.
Defining a minimal growth medium
To determine the minimal number of nutrients needed for the bacterium to grow, we defined a minimal medium using iACB23LX. We determined the minimal amount of metabolites needed for growth using the M9 medium (Supplementary File S1) as a reference. We modeled growth on the minimal medium by enabling the uptake of all metabolites that constitute the medium (lower bound of exchange reactions set to −10 mmol/(gDW · h)). The lower bound for the rest of the exchanges was set to 0 mmol/(gDW · h). The final minimal medium is listed in Table 1 and the Supplementary File S1. It consists of nine transition metals, a carbon source, a nitrogen source, a sulfur source, and a phosphorus source. The aerobic environment was simulated by setting the lower bound for the oxygen exchange to −10 mmol/(gDW · h).
Growth in chemically defined media
We utilised experimentally verified growth media to examine the growth capabilities of iACB23LX. The LB medium serves as a common medium for the cultivation of A. baumannii. Consequently, we conducted an assessment of our model’s capacity to accurately simulate growth in this particular medium Additionally, we inspected the growth of our model in the human nasal niche, as A. baumannii have been isolated from nasal samples within ICUs10, 11, 12. For this purpose, we utilised the SNM that imitates the human nasal habitat77. In all cases, if macromolecules or mixtures were present, we considered the constitutive molecular components for the medium definition. As our model was initially unable to reproduce growth on the applied media, we deployed the gap-filling option from CarveMe to detect missing reactions and gaps in the network31. All growth media formulations are available in the Supplementary File S1.
Rich medium definition
To investigate our model’s growth rate when all nutrients are available to the bacterial cell, we defined the rich medium. For this purpose, we enabled the uptake of all extracellular metabolites by the model setting the lower bound of their exchange reactions to −10 mmol/(gDW · h).
Model validation
Evaluation of carbon and nitrogen utilization
We employed the previously published Biolog Phenotypic Array data by Farrugia et al. for A. baumannii ATCC 17978 to validate the functionality of our model48. According to the experimental guidelines provided by Farrugia et al., we utilised the M9 medium for all simulations. The medium was then supplemented with D-xylose as a carbon source for the nitrogen testings, while ammonium served as the only nitrogen source for the carbon tests. As D-xylose was initially not part of the model, we conducted an extensive search in the organism-specific databases KEGG30 and BioCyc81 to include missing reactions.
The phenotypes were grouped by their maximal kinetic curve height. A trait was considered positive (“growth”) if the height exceeded the 115 and 101 OmniLog units for a nitrogen and carbon source, respectively. The prediction accuracy was evaluated by comparing the in silico-derived phenotypes to the Biolog results. More specifically, the overall model’s accuracy (ACC) was calculated by the overall agreement: where true positive (TP) and true negative (TN) are correct predictions, while false positive (FP) and false negative (FN) are inconsistent predictions. Discrepancies were resolved via iterative manual curation of the model.
Gene perturbation analysis
We performed in silico single-gene deletions on iACB23LX to detect essential genes. For this purpose, we utilised the single_gene_deletion function from the COBRApy50 package. A gene is considered to be essential if a flux of 0.0 mmol/(gDW ·h) was observed through the biomass reaction after setting the lower and upper bounds of the associated reaction(s) to 0.0 mmol/(gDW ·h).
Additionally, we examined the effect of gene deletions using two different optimization approaches: FBA51 and MOMA52. Contrary to FBA, MOMA is based on quadratic programming, and the involved optimization problem is the Euclidean distance minimization in flux space. Moreover, it approximates the metabolic phenotype and relaxes the assumption of optimal growth flux for gene deletions52.
The results were compared to the ATCC 17978-specific gene essentiality dataset from 201449. Wang et al. generated a random mutagenesis dataset including 15,000 unique transposon mutants using insertion sequencing (INSeq)49. By the time of writing, this is the only A. baumannii ATCC 17978 library presenting gene essentiality information. Analogously to the experimental settings, the nutrient uptake constraints were set to the LB medium. From the 453 genes identified as essential by Wang et al., 191 could be compared to our predictions. The rest were not part of iACB23LX due to their non-metabolic functions. To measure the effect of a single deletion, we calculated the fold change (FC) between the model’s growth rate after (grKO) and before (grWT) a single knockout. This is formulated as follows: To this end, if FCgr = 0, the deleted gene is classified as essential, meaning its removal prevented the network from producing at least one key biomass metabolite predicting no growth. Similarly, if FCgr = 1, the deletion of the gene from the network did not affect the growth phenotype (labeled as inessential), while when 0 < FCgr < 1, the removal of this gene affected partially the biomass production (labeled as partially essential). The complete lists of the gene essentiality results are available in the Supplementary Files S2 and S3.
To examine the potential of the in silico determined essential genes on becoming novel drug candidates to fight A. baumannii infections, we probed the queries of predicted false negative candidates against the human proteome using Basic Local Alignment Search Tool (BLAST)88. The protein sequences were aligned to the human protein sequences using the default settings of the NCBI BLASTp tool (word size: 6, matrix: BLOSUM62, gap costs: 11 for existence and 1 for extension). To eliminate adverse effects and ensure no interference with human-like proteins, queries with any non-zero alignment score with the human proteome were not considered. Lastly, we searched in the DrugBank database version 5.1.9 to find inhibitors or ligands known to act with the enzymes encoded by the non-homologous genes58.
Curation of existing metabolic networks
Previously reconstructed models of A. baumannii for multiple strains (Figure 2, Table 3) were collected and curated following community standards and guidelines. For this, we created a workflow shown in Figure 6 that consists of four main steps and utilises model validation and annotation tools. This can be applied to any metabolic network in SBML37 format and follows the community “gold standards” strictly as proposed by Carey et al.29. The curation steps involved changes in the format, amount, and quality of the included information. The context has not been altered in any way that could impact the models’ prediction capabilities. We employed a combination of already existing tools to analyse, simulate, and quality-control the models (COBRApy50, MEMOTE36, and the SBML Validator38). Different database cross-references were incorporated in the models using ModelPolisher86 and following the MIRIAM guidelines85, while the libSBML library38 was used to manipulate the file format and convert to the latest version. To resolve inflated growth rates, we determined computationally-defined minimal growth media. The growth capabilities were examined with respect to various experimentally-derived growth media, while the LB medium was applied to identify lethal genes. A strain-wise comparison was not feasible due to strain-specific identifiers, no successful growth, or missing genes. Hence, we investigated the essential genes across all models with identifiers that could not be mapped with the Pathosystems Resource Integration Center (PATRIC) ID mapping tool89.
To begin with the debugging, we examined the syntacti-cal correctness and internal consistency of the downloaded files using the SBML Validator from the libSBML library38. Two models (iCN718 and iJS784) could not pass the validator check and reported errors since they were not in a valid SBML37 format right after their attainment. We made iCN718 valid by deleting the reaction DNADRAIN for which neither a reactant nor a product was assigned since the associated metabolite was not part of the model. Similarly, the empty groups attribute was removed from iJS784, converting the file into a valid format. Warnings were detected for iATCC1906, and iAB5075 due to missing definition of the fbc extension (became available at the latest Level 3 release79) and the non-alphanumeric chemical formulas. We resolved these issues by defining the fbc list listOfGeneProducts and the species attribute chemicalFormula. In more detail, we extracted the given GPR from the notes field and defined individual geneProduct classes with id, name, and label. The attribute chemicalFormula was set equal to the species chemical formulas extracted from the notes and is particularly essential in reaction’s validation and balancing. Following the SBML37 specifications regarding its constitution, in case of ambiguous formulas separated by a semicolon (;), the first molecular representation was chosen. With this, the genes and metabolites’ chemical formulas became part of the file’s main structure. Since iATCC1906 carried KEGG30 identifiers, we could extract the metabolites’ chemical formulas from the database and add them to the model. Moving on with the file extension, we declared the remaining missing attributes from reactions, metabolites, and genes that are required according to the SBML37 language guidelines. More specifically, we defined the metaid attribute when missing, while we fixed any errors regarding the identifiers nomenclature. Further extension involved the annotation of reactions, metabolites, and genes with a plethora of database cross-references following the MIRIAM guidelines85. For this, we employed ModelPolisher that complements and annotates SBML37 models with additional metadata using the BiGG Models knowledgebase as reference86. We also defined precise SBO Terms with the sboTerm attribute using the SBOannotator34. The final step of debugging involved the conversion of all models to the newest available format SBML Level 3 Version 239, as well as the quality control using MEMOTE36.
Data availability
Supplementary tables in Microsoft Excel format are available along with this article. The model iACB23LX along with all curated and refined models are available at the BioModels Database90 as an SBML Level 3 Version 191 file distributed as Open Modelling EXchange format (OMEX) archive92 including annotation93. Access the model at https://www.ebi.ac.uk/biomodels/MODEL2309120001.
Author contributions
Conceptualization and idea, N.L.; model reconstruction and analysis, N.L. and Y.X.; curation and analysis of additional models, N.L.; experimental work, L.F. and M.S.; manuscript writing, N.L.; manuscript revision, N.L., Y.X., and A.D.; supervision and funding acquisition, A.D. All authors approved the publishing of the manuscript.
Competing interests
The authors declare no conflict of interest.
Supporting Information
S1 Figure. Oxygen-producing and -consuming reactions found in iACB23LX together with their anaerobic fluxes. All flux rates are written in orange and are given in mmol/(gDW ·h). The reaction abbreviations are as follows: O2tpp, O2 transport via diffusion between periplasm and cytosol; CATpp, periplasmatic catalase; H2O2tex, hydrogen peroxide transport via diffusion; CAT, Catalase; O2tex, O2 transport via diffusion between periplasm and extracellular space; EX_h2o2_e, hydrogen peroxide exchange and EX_o2_e, O2 exchange. Figure generated with Escher94.
S2 Figure. Experimentally-derived growth curves of
A. baumannii. The growth curves for A. baumannii strains AB5075, ATCC 73217978, ATCC 19606, and AYE were measured in LB medium, M9 medium supplemented with acetate, and acSNM. Additionally, the in silico-defined minimal medium (iMinMed) was tested for all strains.
S1 Table. In silico formulations of examined media compositions. Metabolites are described by BiGG 62 identifiers.
S2 Table. In silico gene knockout results using FBA. The ratio column describes the growth rate change before and after the respective knockout.
S3 Table. In silico gene knockout results using MOMA. The ratio column describes the growth rate change before and after the respective knockout.
S4 Table. Metabolic genes found to be essential for growth in iACB23LX and encode proteins with no human counterparts.
S5 Table. Computationally-defined minimal growth me-dia for previously published models. Due to inflated growth rates in most published A. baumannii GEMs, we established minimal media supporting non-zero biomass flux.
S6 Table. Gene lethality predictions using previously published A. baumannii models and FBA.
S7 Table. Gene lethality predictions using previously published A. baumannii models and MOMA. This offers a complementary perspective on the essential genes in the organism’s metabolism.
Acknowledgments
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC 2124 – 390838134 and supported by the Cluster of Excellence ‘Controlling Microbes to Fight Infections’ (CMFI). A.D. is supported by the German Center for Infection Research (DZIF, doi: 10.13039/100009139) within the Deutsche Zentren der Gesundheitsforschung (BMBF-DZG, German Centers for Health Research of the Federal Ministry of Education and Research (BMBF)), grant 𝒩0 8020708703. The authors acknowledge the support by the Open Access Publishing Fund of the University of Tübingen (https://uni-tuebingen.de/en/216529). The authors also thank Dr. Bernhard Krismer for providing the synthetic nasal medium.
Footnotes
Experimental data that validate the predicted growth rates in different media for various strains.
List of Abbreviations
- AGORA
- Assembly of Gut Organisms through Reconstruction and Analysis
- AMR
- antimicrobial resistance
- ATP
- adenosine triphosphate
- BiGG
- Biochemical, Genetical, and Genomical
- BLAST
- Basic Local Alignment Search Tool
- BMBF
- Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung)
- BMBF-DZG
- Deutsche Zentren der Gesundheitsforschung
- BOF
- biomass objective function
- CBM
- constraint-based modelling
- CMFI
- Controlling Microbes to Fight Infections
- COBRApy
- Constraints-Based Reconstruction and Analysis for Python
- CoA
- coenzyme A
- COVID-19
- Coronavirus Disease 2019
- CTP
- cytidine triphosphate
- CV
- controlled vocabulary
- DFG
- Deutsche Forschungsgemeinschaft
- DZIF
- German Center for Infection Research
- ECO
- Evidence and Conclusion Ontology
- EDR
- energy dissipation reaction
- EGC
- energy-generating cycle
- EPSP
- enolpyruvylshikimate phosphate
- FADH2
- flavin adenine dinucleotide
- FAIR
- Findable, Accessible, Interoperable, and Reusable
- FC
- fold change
- FMNH2
- flavin mononucleotide
- FBA
- flux balance analysis
- fbc
- flux balance constraints
- FDA
- Food and Drug Administration
- FN
- false negative
- FP
- false positive
- GEM
- genome-scale metabolic model
- GFF
- General Feature Format
- GMP
- guanosine 5’-phosphate
- GPR
- gene-protein-reaction associations
- GTP
- guanosine triphosphate
- ICU
- intensive care unit
- INSeq
- insertion sequencing
- ITP
- inosine triphosphate
- JSON
- JavaScript Object Notation
- KDPG
- 2-keto-3-deoxy-6-phosphogluconate
- KEGG
- Kyoto Encyclopedia of Genes and Genomes
- LB
- Luria-Bertani
- M9
- M9 minimal medium
- MDR
- multidrug-resistant
- MEMOTE
- Metabolic Model Testing
- MIRIAM
- Minimal Information Required In the Annotation of Models
- MOMA
- Minimization of Metabolic Adjustment
- NADH
- nicotinamide adenine dinucleotide
- NADPH
- nicotinamide adenine dinucleotide phosphate
- NCBI
- National Centre for Biotechnology Information
- OD
- optical density
- OLS
- Ontology Lookup Service
- OMEX
- Open Modelling EXchange format
- PATRIC
- Pathosystems Resource Integration Center
- PBS
- phosphate buffered saline
- DR
- pandrug-resistant
- SARS-CoV-2
- Severe Acute Respiratory Syndrome Coronavirus 2
- SBML
- Systems Biology Markup Language
- SBO
- Systems Biology Ontology
- SNM
- synthetic nasal medium
- TN
- true negative
- TP
- true positive
- UTP
- uridine triphosphate
- VMH
- Virtual Metabolic Human
- WHO
- World Health Organization
- XDR
- extensively drug-resistant