An updated genome-scale model for Xylella fastidiosa subsp. pauca De Donno

Xylella fastidiosa is a gram-negative phytopathogenic bacterium that has caused a significant economic impact around the world. In the last decade, genome-scale metabolic models have become important systems biology tools for studying the metabolic behaviour of different pathogens and driving the discovery of novel drug targets. This work is a second iteration of the iMS508 model for X. fastidiosa subsp. pauca De Donno. The model comprises 1138 reactions, 1234 metabolites, and 509 genes. In silico validation of the metabolic model was achieved by comparing simulations with available experimental data. Aerobic metabolism was simulated properly and fastidian gum production rates were predicted accurately.


Xylella fastidiosa is a commonly known and widespread gram-negative phytopathogenic bacterium [1]. It usually resides in the xylem of its host and is then transmitted to other plants by insects that feed on the xylem fluid [2]. Even though host specificity is a prevalent characteristic in many phytopathogenic bacteria [3], this is not the case for X. fastidiosa. According to the Xylella spp. host plant database [4], the infection has been recorded in over 500 plant species. Still, X. fastidiosa's subspecies are rather host-specific and usually related to certain plant diseases [5,6]. One key characteristic of the infection caused by this phytopathogen is that some plants may be infected whilst not showing any symptoms [6]. Hence, an X. fastidiosa outbreak is significantly harder to mitigate.

The X. fastidiosa infection was first detected in European territory in the Apulia region of Italy and was caused by X. fastidiosa subsp. pauca De Donno [7,8]. This led to a severe outbreak of a disease that would come to be known as olive quick decline syndrome. As a result, thousands of olive trees died, with a negative impact on the olive oil market and the local economy [8]. Future projections indicate, in a worst-case scenario, an economic impact of 5.2 billion Euros in Italy alone over the next 50 years [9]. Hence, X. fastidiosa was declared a quarantine pest by the European Commission, and was the highest-ranked crop-infesting pest according to the Impact Indicator for Priority Pests (I2P2) metric, leading all the economic, social, and environmental ranked domains in 2019 [10].

With the rise in popularity of high-throughput sequencing techniques, the availability of genomic and metabolic data has been increasing exponentially. As a result, fields such as Systems Biology have gained significant relevance.
Systems Biology aims at studying cells through a system-level approach, trying to uncover their structure and the dynamics between each of their components [11]. Genome-Scale Metabolic (GSM) models are Systems Biology tools that combine both genomic and metabolic data for a given organism. Throughout the years, these models have been extensively used to predict phenotypes for a given organism under specific conditions, and to drive new knowledge of the metabolic behaviour of the organism under study [12]. Moreover, GSM models are useful tools to unveil novel drug targets to kill pathogens with minimum effect on the host, leading to drug discovery or re-purposing [13,14].
In 2020, a GSM model of X. fastidiosa subsp. multiplex CFBP 8418 was published. That work aimed at developing a metabolic model to uncover metabolic characteristics connected to the fastidious growth of the phytopathogen [15]. Moreover, an in-house developed GSM model of X. fastidiosa subsp. pauca De Donno was published in conference proceedings in 2022 [16]. Considering this, this work aimed at providing a second iteration of the latter metabolic model and its respective validation [8]. This model is also to be compared

merlin's "automatic workflow" tool was used for the annotation, based on phylogenetic proximity to X. fastidiosa and the number of records associated with the organism at UniProtKB/SwissProt.

Briefly, the workflow explores the list of retrieved homologues for each candidate gene, comparing it to the defined set of similar organisms. If a match is found, the candidate gene is automatically annotated with the enzyme commission (EC) number [23] and the product name of the closest organism's homologous gene. The employed automatic workflow is shown in Figure S1, along with the list of selected organisms.

A draft metabolic network for X. fastidiosa was assembled by coupling the information obtained in the genome annotation step with metabolic data retrieved from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [24]. The EC numbers identified in the genome annotation step were used to select the reactions to be included in the draft network. Reactions involved in generic metabolic pathways, such as DNA methylation and rRNA modification, were not included in the model, as these are unnecessary (modelling-wise) for the reconstruction process. To finalize the initial draft metabolic network, KEGG's spontaneous reactions were automatically added to the model.

The subcellular compartment location of each protein was predicted using PSORTb 3.0 [25]. The resulting file was loaded into merlin, allowing the automatic integration of compartments into the model's reactions.

The Escherichia coli GSM model (iJO1366) [26] and the Xanthomonas campestris large-scale metabolic model [27] were used as templates to infer the macromolecular composition of the biomass equation. This information was complemented with data retrieved from the literature.
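The annotation-transfer rule described above for the automatic workflow (take the EC number and product name from the homologue belonging to the closest reference organism) can be illustrated with a minimal sketch. This is not merlin's implementation: the `Homologue` record, the `annotate_gene` function, the organism ranking and the example hits are all hypothetical and serve only to show the selection logic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Homologue:
    """One homology hit for a candidate gene (hypothetical record layout)."""
    organism: str
    ec_number: Optional[str]
    product: str


def annotate_gene(
    homologues: Iterable[Homologue],
    ranked_organisms: Sequence[str],
) -> Optional[Tuple[Optional[str], str]]:
    """Return (EC number, product) from the closest reference organism with a hit.

    `ranked_organisms` is ordered from closest to most distant; None is
    returned when no homologue belongs to any listed organism.
    """
    first_hit = {}
    for hit in homologues:
        # Keep only the first (best-ranked) hit reported for each organism.
        first_hit.setdefault(hit.organism, hit)
    for organism in ranked_organisms:
        if organism in first_hit:
            best = first_hit[organism]
            return best.ec_number, best.product
    return None


# Illustrative, made-up hits for a single candidate gene.
hits = [
    Homologue("Escherichia coli", "2.7.1.11", "ATP-dependent 6-phosphofructokinase"),
    Homologue("Xanthomonas campestris", "2.7.1.90",
              "Pyrophosphate--fructose 6-phosphate 1-phosphotransferase"),
]
ranking = ["Xanthomonas campestris", "Escherichia coli"]
print(annotate_gene(hits, ranking))
```

In this sketch, a gene without a hit in any listed organism is left unannotated (`None`), which would correspond to a candidate requiring manual inspection.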

The biomass reaction accounts for a total of seven macromolecular entities, namely deoxyribonucleic acid (DNA), ribonucleic acid (RNA), Protein, Cofactor, Peptidoglycan, Lipopolysaccharide, and Lipid. Deoxynucleotide, nucleotide, and amino acid contents were determined through merlin's e-Biomass tool [28]. Additionally, this tool formulated a template reaction for the Cofactor macromolecule, which was then rearranged considering the metabolic potential of the assembled metabolic network. However, as the composition of the remaining macromolecules cannot be obtained from genomic information, available literature was sought to determine the necessary precursors for the synthesis of peptidoglycans [29], lipopolysaccharides [30,31] and lipids [32]. Finally, as the lipidic portion of the biomass requires fatty acids for its synthesis, whose chain is of variable size, an average fatty acid metabolite was created based on data from multiple strains of X. fastidiosa [1], as described elsewhere [33].

Growth and non-growth associated energy requirements were included in the model, as these represent the energy required for several cellular processes, including the assembly of macromolecules and the maintenance of internal homeostasis. As there is a lack of information regarding energy requirements for X. fastidiosa, information from the iJO1366 model was adapted to this work. Therefore, growth-associated energy requirements, represented in the biomass equation, were set as the consumption of 53.5 mmol · g⁻¹ DW of ATP. Likewise, non-growth associated requirements were defined through an ATP hydrolysis reaction, with both its lower and upper bounds set to 3 mmol · g⁻¹ DW · h⁻¹.
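To make the two energy terms concrete, the short sketch below combines them in the usual way: the growth-associated cost scales with the specific growth rate, while the maintenance cost is fixed. The two constants come from the text; the function name and the example growth rate of 0.1 h⁻¹ are illustrative assumptions only.

```python
GAM = 53.5   # growth-associated ATP cost, mmol ATP per g DW (from the text)
NGAM = 3.0   # non-growth associated ATP hydrolysis, mmol ATP per g DW per h (from the text)


def total_atp_demand(growth_rate_per_h: float) -> float:
    """Total ATP drain (mmol per g DW per h) at a given specific growth rate (1/h)."""
    if growth_rate_per_h < 0:
        raise ValueError("growth rate must be non-negative")
    return GAM * growth_rate_per_h + NGAM


# Illustrative growth rate (assumption, not a model prediction).
print(round(total_atp_demand(0.1), 3))
```

At zero growth the demand reduces to the maintenance term alone, which is why fixing both bounds of the ATP hydrolysis reaction to the same value forces that flux in every simulation.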

To accurately represent an organism's metabolic behaviour, manual curation is required. Therefore, whenever in silico simulations did not corroborate experimental data, manual curation of the GSM model was carried out. Dead-end metabolites and blocked reactions were identified using merlin.

Through this phase, an initial assessment of the metabolic potential of X. fastidiosa was performed by reviewing organism-specific literature. Specifically, pathways with metabolic gaps were evaluated in greater depth. Whenever a gap was found, supporting evidence at the genome level was sought in the information obtained from the genome annotation step. If the enzyme that catalyses the missing reaction was indeed encoded in the genome, the reaction and its associated gene-protein-reaction (GPR) rule were added to the model. Thus, multiple missed annotations and misannotations were corrected during this comprehensive analysis.

Moreover, a tool recently added to the merlin software, BioISO [34], was also used during this process. BioISO tests the capability of the network to synthesise the products of a given reaction. Therefore, this tool is very valuable for testing the network's potential to produce biomass precursor molecules and can pinpoint potential issues within the network.

The reconstructed GSM model's performance was evaluated by comparing the predicted results to available information retrieved from the literature. As there is a limited amount of information regarding X. fastidiosa metabolism, X. campestris data were used whenever necessary. In this work, all in silico simulations were performed by setting one of the biomass reactions ('R_e-Biomass' or 'R_Coupled_Biomass') as the objective function and maximizing it using Parsimonious Flux Balance Analysis (pFBA) [35], unless otherwise indicated. Finally, multiple metabolic properties were tested (described in the sections below) by constraining the model accordingly.

The glucose flux pattern through the assembled metabolic network was assessed to allow direct comparison to X. campestris experimental data. Therefore, an environmental condition that replicates a minimal medium was formulated, where glucose and ammonia were used as carbon and nitrogen sources, respectively. Additionally, flux through the phosphofructokinase reaction (R00764, EC 2.7.1.90) was restrained to a minimal value (1% of the glucose intake rate), as the pyrophosphate-dependent phosphofructokinase identified in X. fastidiosa's genome displays low processivity and represents a minimal fraction of the total proteome in similar organisms [27,36,37].

A quantitative validation was performed by evaluating the GSM model's predicted growth and exopolysaccharide production rates. However, as there is limited information on this topic for X. fastidiosa, both the data published alongside the large-scale metabolic model of X. campestris [27] and the data reported by Letisse et al. [38] were used for validation purposes.

Nonetheless, the environmental conditions had to be properly constrained, as in previous validation steps, because the data retrieved from X. campestris concern its growth on sucrose, a metabolite absent from the reconstructed GSM model of X. fastidiosa. As a result, glucose was used as the carbon source instead of sucrose, and the carbon intake was normalized according to the number of carbon atoms (an uptake rate of 1.8 mmol · g⁻¹ · h⁻¹ of sucrose equals an uptake rate of 3.6 mmol · g⁻¹ · h⁻¹ of glucose).

For validation purposes, the model developed in this work was directly compared to the GSM model of X. fastidiosa subsp. multiplex CFBP 8418 [15]. The first comparison relied on the minimization of glutamine uptake for a predefined set of growth, exopolysaccharide production and LesA production rates. LesA, a lipase/esterase known to be secreted by X. fastidiosa and a key virulence factor during the phytopathogen's infection process [39], was inserted in the model by creating a biosynthetic reaction that accounted for its amino acid composition.
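The carbon-atom normalization used to replace sucrose with glucose amounts to keeping the carbon-mole uptake constant. A minimal sketch follows, reproducing the 1.8 to 3.6 mmol · g⁻¹ · h⁻¹ conversion quoted above; the function and dictionary names are hypothetical, and only the two carbon counts (C12 for sucrose, C6 for glucose) are chemical facts.

```python
# Carbon atoms per molecule: sucrose C12H22O11, glucose C6H12O6.
CARBON_ATOMS = {"sucrose": 12, "glucose": 6}


def carbon_equivalent_uptake(rate: float, source: str, target: str) -> float:
    """Uptake rate of `target` that supplies the same carbon as `rate` of `source`.

    Rates are in mmol per g DW per h; the carbon-mole flux is kept constant.
    """
    return rate * CARBON_ATOMS[source] / CARBON_ATOMS[target]


print(round(carbon_equivalent_uptake(1.8, "sucrose", "glucose"), 6))
```

The same helper could be applied to any other substrate by adding its carbon count to the dictionary, which is the idea behind normalizing every tested carbon source to a common carbon intake.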
Distinct carbon sources were tested to examine whether they supported X. fastidiosa's growth. Hence, the environmental conditions replicated a minimal medium, where ammonia, orthophosphate and Fe²⁺ would act as sources of nitrogen, phosphate, and iron, respectively. Additionally, carbon uptake was normalized according to the number of carbon atoms of the carbohydrate. Simulation outputs were then compared to Biolog Phenotype Microarray data, published along with a GSM model of X. fastidiosa subsp. multiplex CFBP 8418.

The macromolecular composition of the X. fastidiosa model reconstructed in this work, whose entities and respective fractions were inferred from the E. coli [26] and X. campestris [27] models, is compiled in Table 1. A detailed biomass composition is available in Table S1. Due to the phylogenetic closeness between X. campestris and X. fastidiosa, most biomass-relative contents were directly retrieved from the former model, whenever possible. Data from E. coli were used to determine the remaining macromolecular contents, but the missing portion of X. campestris was always considered during the calculations.

Despite the usage of glycogen in the iJO1366 [26] and X. campestris [27] models, the macromolecule was not added to the biomass equation of the X. fastidiosa model, as glycogen metabolism seems to be a lost trait in multiple parasitic bacteria, including X. fastidiosa [40]. This was further confirmed by the lack of biosynthetic and catabolic enzymes related to glycogen during the genome annotation step. Nevertheless, the glycogen fraction was divided in a 30-70% split and allocated to peptidoglycan and lipopolysaccharide, as both these macromolecules contain carbohydrates in their constitution. DNA, RNA and Protein compositions, in terms of precursors, were determined using the e-Biomass tool [28], which is embedded in merlin.

Peptidoglycan production was represented by creating a reaction that accounted for all of its precursors. As X. fastidiosa's peptidoglycan data are still lacking, information from E. coli [29] was retrieved to formulate peptidoglycan biosynthesis, which is composed of two monosaccharide units and five amino acids. One of these amino acids is D-Glutamate, which was replaced by the enantiomer L-Glutamate, as X. fastidiosa's metabolic network could not support the biosynthesis of D-Glutamate.

The lipidic portion of the biomass, which comprises membrane phospholipids, was adapted from data of X. campestris B-24 [32], as no information was available for X. fastidiosa. Phosphatidylinositol and lysophosphatidylethanolamine, compounds measured in that experiment, were ignored since they were not found in the assembled X. fastidiosa metabolic network. Therefore, their contents were distributed among the other phospholipids. Lipopolysaccharide (LPS), a constituent of gram-negative bacterial outer membranes, is composed of three moieties: Lipid A, core oligosaccharide and O-antigen. While Lipid A and the inner portion of the core oligosaccharide are conserved in different bacterial species, both the core oligosaccharide's outer section and the O-antigen, which is an important virulence determinant, are structurally variable [30,31]. These structural changes were inferred from data published in [29], and an LPS synthesis reaction specific for X. fastidiosa was assembled.

The remaining biomass portion was associated with the e-Cofactor macromolecule, which was first determined using the e-Biomass tool [28]. However, a rearrangement was performed according to the metabolic potential of the network. As no information regarding the amount of each molecule in the cofactor pool is available, it was assumed that each cofactor would have the same weight (in grams) in the final biomass formulation.
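The equal-weight assumption for the cofactor pool translates into stoichiometric coefficients that differ between cofactors, because each coefficient is the allotted mass divided by the molar mass. The sketch below shows this conversion under stated assumptions: the function name, the total cofactor fraction (0.03 g per g DW) and the rounded molar masses are illustrative placeholders, not values taken from the model.

```python
from typing import Dict


def equal_mass_coefficients(total_fraction: float,
                            molar_masses: Dict[str, float]) -> Dict[str, float]:
    """Split a biomass mass fraction equally (in grams) among cofactors.

    total_fraction: g of cofactor pool per g DW.
    molar_masses:   g per mol for each cofactor.
    Returns coefficients in mmol per g DW.
    """
    if not molar_masses:
        raise ValueError("at least one cofactor is required")
    share = total_fraction / len(molar_masses)  # grams allotted to each cofactor
    return {name: 1000.0 * share / mass for name, mass in molar_masses.items()}


# Illustrative inputs only (assumed pool size, approximate molar masses).
example_masses = {"NAD+": 663.4, "FAD": 785.6, "CoA": 767.5}
coefficients = equal_mass_coefficients(0.03, example_masses)
for name, value in coefficients.items():
    print(name, round(value, 5))
```

Under this scheme a heavier cofactor receives a smaller molar coefficient, while the mass contributions of all cofactors remain identical.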

The inclusion of energy requirements, which comprise both growth-associated and maintenance-related energy, completed the X. fastidiosa biomass formulation. As stated previously, energy requirements were retrieved from the iJO1366 E. coli model [26].

Due to the possibility of exopolysaccharide biosynthesis during the life cycle of X. fastidiosa, a reaction that couples biomass and exopolysaccharide production was inserted in the model ("R_Coupled_Biomass"), as these entities would compete for the available carbon source. This reaction forces the synthesis of biomass and fastidian gum (X. fastidiosa's exopolysaccharide) as follows: 0.203 Biomass + 0.797 Fastidian gum => 1.0 Coupled Biomass. The stoichiometries were inferred from published data for X. campestris [41], and employed in the large-scale metabolic model of X. campestris [27].

Several xanthomonads use the Entner-Doudoroff (ED) pathway as the main catabolic route for glucose [42], including X. campestris [37]. Although this has not yet been demonstrated for X. fastidiosa, due to the genetic and metabolic similarities among these organisms, the ED pathway has been proposed to be the main glucose degradation pathway for X. fastidiosa [43]. By assuming the maximisation of biomass production, the model predicts 88.6% and 8.6% of the glucose flux towards the Entner-Doudoroff and the non-oxidative branch of the pentose phosphate pathways, respectively (Figure 1), which is in the range of the experimental data available for other organisms: 81-93% and 7-19% [42]. A more detailed analysis of the glucose flux pattern from this model was published previously [16].
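As a small illustration of how the coupled reaction stated above ties the two products together, the sketch below computes how much Biomass and Fastidian gum are consumed for a given flux through "R_Coupled_Biomass". Only the two stoichiometric coefficients come from the text; the function name and the example flux are hypothetical.

```python
# Reactant coefficients of R_Coupled_Biomass as stated in the text:
# 0.203 Biomass + 0.797 Fastidian gum => 1.0 Coupled Biomass
COUPLED_STOICHIOMETRY = {"Biomass": 0.203, "Fastidian gum": 0.797}


def reactant_demand(coupled_flux: float) -> dict:
    """Amount of each reactant drained per unit time for a given coupled flux."""
    return {name: coeff * coupled_flux
            for name, coeff in COUPLED_STOICHIOMETRY.items()}


demand = reactant_demand(1.0)  # illustrative flux value (assumption)
ratio = demand["Fastidian gum"] / demand["Biomass"]
print(demand, round(ratio, 3))
```

Because the coefficients are fixed, the ratio between gum and biomass synthesis is the same at every flux value, which is what prevents the optimisation from diverting all carbon to biomass alone.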

Specific growth rate and exopolysaccharide production
Specific growth and exopolysaccharide production rates were directly compared with in silico simulations of the X. campestris large-scale metabolic model [27] and experimental data for the same species [38]. The X. fastidiosa subsp. pauca De Donno GSM model was constrained accordingly to make this direct comparison possible. Therefore, glucose was used for the in silico simulations of the model developed in this work.

As displayed in Table 2, the X. fastidiosa model predicted a growth rate of 0.11 h⁻¹ and a specific exopolysaccharide production rate of 0.44 mmol · g⁻¹ · h⁻¹. The results are similar to the experimental data [38] and agree with the known fact that X. fastidiosa displays slower growth and exopolysaccharide production rates in comparison to X. campestris.

Table 2. In silico predictions of growth and exopolysaccharide production rates for the X. fastidiosa iMS509 and X. campestris [27] metabolic models, in comparison to experimental data specific for X. campestris.

The other end-products of the X. fastidiosa subsp. pauca De Donno GSM model in silico simulations were assessed. As expected, the major outputs are water and carbon dioxide, which are by-products of respiratory metabolism. On the other hand, xanthomonadin was predicted to be produced at a low rate. Note that these metabolites were inferred to be produced by the organism after an analysis of the metabolic capabilities of the assembled network. Firstly, xanthomonadins are yellow pigments with photobiological damage protection properties produced by Xanthomonas spp. [44]. Since X. fastidiosa lacks a transaldolase (EC 2.2.1.2), D-Erythrose-4-phosphate cannot be completely recycled by the organism; hence, xanthomonadin production was inferred. According to the metabolic model, D-Erythrose-4-phosphate could only be metabolised in the shikimate pathway and, as a result, dead-end metabolites of this pathway were thoroughly analysed. The evidence found shows that chorismate was linked to the synthesis of both ubiquinone and xanthomonadin through a bifunctional chorismatase [45] found in X. fastidiosa's genome. Therefore, we assumed the production of xanthomonadin and added the necessary reactions.

Recently, a GSM model of X. fastidiosa multiplex CFBP 8418 was published [15]. Therefore, the two metabolic models were compared to assess metabolic differences between the two strains. A topological comparison between the models is displayed in Table 3 and, as can be seen, the models show a small difference in the number of genes, metabolites, and reactions.

Subsequently, a minimisation of L-glutamine uptake, while constraining the growth rate and the EPS and LesA production rates to a specific output, was performed and compared to the published data. As displayed in Table 4, the uptake of L-glutamine and other metabolites was identical, and the discrepancies can be explained by the intrinsic details of the biomass

A comparison to evaluate the usage of different carbon sources by X. fastidiosa was then performed. The results revealed different behaviours between both models for a subset of the tested carbon sources (Figure 3). While the X. fastidiosa subsp. pauca De Donno GSM model only predicts growth for 14 out of the 24 tested carbon sources, the X. fastidiosa multiplex CFBP 8418 GSM model simulates growth in all of them. In silico simulations of the iMS509 model do not indicate growth for the substrates L-Proline, L-Arginine, L-Histidine, L-Ornithine, GABA, Acetate, D-Galactose, D-Xylose, and myo-Inositol, due to the lack of genomic evidence concerning their catabolic pathways. On the other hand, the model published by Gerlin and colleagues [15] includes the necessary reactions, although without any genetic association, which may be valid for X. fastidiosa multiplex CFBP 8418 according to the data provided by Biolog Phenotype Microarrays, published alongside the model. Nevertheless, the same behaviour could not be assumed for the strain used in this work, not only due to the lack of genomic evidence, but also because other strains do not show growth in media supplemented with L-Histidine, L-Arginine and L-Lysine [48]. Furthermore, it is common for different strains of the same organism to display distinct metabolic behaviours, as shown in Streptococcus pneumoniae [33], which can be caused by both genetic diversity and gene regulation. Hence, we assumed that X. fastidiosa subsp. pauca De Donno was unable to degrade these compounds and the model was reconstructed accordingly.

Figure 3. Comparison of the growth rate (d⁻¹) predictions of the X. fastidiosa GSM models under different carbon sources. The lack of growth prediction in some carbon sources by the X. fastidiosa pauca De Donno GSM model is justified by the lack of genomic evidence for enzymes related to the catabolic pathways of the metabolites, while the other model included these catabolic reactions without any gene association.

As stated previously, the presence of a pyrophosphate-dependent phosphofructokinase (EC 2.7.1.90) was identified, allowing the usage of gluconeogenesis by the phytopathogenic organism. However, only the model developed in this work contains this gene-enzyme-reaction association, while in the X. fastidiosa multiplex CFBP 8418 model the utilization of this pathway was assumed through the more typical enzyme fructose-1,6-bisphosphatase (EC 3.1.3.11), although without any genetic evidence for, or association to, this enzyme.

The output of this work was the reconstruction of a GSM model for X. fastidiosa subsp. pauca De Donno, named iMS509. This work was a second iteration of a previously developed model named iMS508 [16]. The model was reconstructed using merlin, with an automatic genome annotation approach. Biological databases were also considered during the process of retrieving relevant information. The metabolic model has 509 genes, 1234 metabolites, 1138 reactions, and 3 compartments. The model was successfully validated according to the glucose flux pattern. Moreover, it was able to simulate aerobic growth of the bacterium and accurately predict production rates of fastidian gum and the LesA protein.

Supplementary Materials: Figure S1: Automatic workflow employed; Figure S2: murA amino acid sequence; Table S1: Detailed biomass composition of X. fastidiosa pauca De Donno; Table S2: Genome annotation results; Table S3: Complete list of drugs known to inhibit the found targets.

The following abbreviations are used in this manuscript: