Genome‐scale model of Rhodotorula toruloides metabolism

The basidiomycete red yeast Rhodotorula toruloides is a promising platform organism for production of biooils. We present rhto‐GEM, the first genome‐scale model (GEM) of R. toruloides metabolism, that was largely reconstructed using RAVEN toolbox. The model includes 852 genes, 2,731 reactions, and 2,277 metabolites, while lipid metabolism is described using the SLIMEr formalism allowing direct integration of lipid class and acyl chain experimental distribution data. The simulation results confirmed that the R. toruloides model provides valid growth predictions on glucose, xylose, and glycerol, while prediction of genetic engineering targets to increase production of linolenic acid, triacylglycerols, and carotenoids identified genes—some of which have previously been engineered to successfully increase production. This renders rtho‐GEM valuable for future studies to improve the production of other oleochemicals of industrial relevance including value‐added fatty acids and carotenoids, in addition to facilitate system‐wide omics‐data analysis in R. toruloides. Expanding the portfolio of GEMs for lipid‐accumulating fungi contributes to both understanding of metabolic mechanisms of the oleaginous phenotype but also uncover particularities of the lipid production machinery in R. toruloides.

A better understanding of lipid metabolism in R. toruloides has applications beyond the production of lipids for fuel and nutraceuticals. R. toruloides has also been used in screens for novel antiobesity drugs (Lee, Cheon, & Rhee, 2018), while lipid synthesis has been shown to be a promising target for antifungal therapies (Pan, Hu, & Yu, 2018). Several species within the Rhodotorula genus have been identified as emerging pathogens in both animals and people (Wirth & Goldani, 2012). Intracellular glycerol accumulation caused by hydrolysis of storage triglycerides by pathogenic fungi has been shown to play a crucial role in turgor generation for penetration and invasion of tissues (Nguyen et al., 2011;Wang & St. Leger, 2007).
Modeling of R. toruloides lipid metabolism, can, therefore, also aid in unraveling the pathobiology of this group of yeasts.
Genome-scale metabolic models (GEMs) are comprehensive summaries of the metabolic network in an organism, which are derived from the available genome sequence. GEMs can be used for the identification of potential metabolic engineering targets, or select the best performing metabolic network among alternatives (Kerkhoven, Lahtvee, & Nielsen, 2015). Extending the portfolio of GEMs for various organisms can enhance the benefits of bioprospecting and aid in the design and improvement of performance of synthetic organisms. The generation of a genome scale metabolic model of R. toruloides lipid synthesis would provide a deeper understanding of oleaginous ability and facilitate genetic engineering strategies. Although a number of metabolic models of lipid production in R. toruloides have been reported to date (Bommareddy, Sabra, Maheshwari, & Zeng, 2015;Castañeda, Nuñez, Garelli, Voget, & De Battista, 2018), this study represents the first genome-scale model of its metabolism.

| Draft model reconstruction
The genome-scale model, named rhto-GEM, is based on the genome sequence of R. toruloides strain NP11 (Zhu et al., 2012). The reconstruction of rhto-GEM was primarily performed using RAVEN 2.2.1, a MATLAB toolbox for genome-scale model reconstruction (H. Wang et al., 2018). All steps of the reconstruction are documented in detail on the GitHub repository under the folder Complementary-Scripts/reconstruction. More information on the GitHub repository is provided below.

| Gap-filling with Meneco
To obtain a functional model, a gap-filling step was performed to add reactions necessary to produce biomass from the preferred growth medium of R. toruloides. To avoid self-producing loops due to stoichiometric inconsistencies, we utilized Meneco 1.5.2 (Prigent et al., 2017) in combination with yeast-GEM (v. 8.2.0) as database of repair reactions. In Meneco, target compounds correspond to metabolites present in the biomass function, while seed compounds are composed of metabolites present in the growth medium, plus some cofactors and metabolites required for FBA growth. If no path exists between seed and target compounds, Meneco proposes one minimal set of reactions (or several minimal sets of same size) coming from a database of reactions to fill the gaps. The sets of seed and target compounds are given in on the GitHub repository under the folder ComplementaryScripts/meneco. The seed compounds included uncharged tRNAs as the biomass reaction explicitly represents protein translation as the transfer of amino acids from tRNAs. The union of all proposed completions was included in the draft model, while manual curation was performed to confirm the likelihood of those reactions and to identify their corresponding genes in R. toruloides.
Palmitoleate (16:1) is not modeled as it is only a minor contributor (<5%) to the overall acyl chain distribution  The growth-and nongrowth-associated energy requirements were fit to measured glucose uptake rates from continuous cultivations of R.
toruloides as reported in literature (Shen et al., 2013), and set at 132.7 mmol gDCW −1 and 3.39 mmol (gDCW h) −1 , respectively. The biomass composition was modified from yeast-GEM to include R.
toruloides lipid class and acyl chain distributions, as provided in ComplementaryData/data. The consumeSomething and produceSomething functions from RAVEN were used to ensure there is no net production or consumption of mass by any reaction in the model. The rhto-GEM model is hosted on a dedicated GitHub repository (http:// github.com/SysBioChalmers/rhto-GEM). Here, all scripts for model reconstruction are provided, in addition to the model in various file formats, for example, SBML, YAML, TXT, and XLSX, and scripts for performing the simulations detailed in this manuscript. This environment allows for versioning and comparison of the model, reporting and tracking of issues, organization of development, and continuous integration. Memote 0.9.2 is a model test suite (Lieven et al., 2018) used to assess model quality, which is automatically run via Travis CI with each new model release, currently skipping the consistency tests due to their long duration.

| Model simulations
Flux balance analysis was performed with RAVEN toolbox, using constraints on exchange fluxes as specified in the text, while also detailed in the relevant scripts in the ComplementaryScripts folder. All simulations here were performed with rhto-GEM version 1.2.1. Gene essentiality was predicted using singleGeneDeletion from COBRA toolbox 3.0.6 (Heirendt et al., 2017), where growth rates reduced by more than two-thirds were classified as lethal. Reactions significantly affecting TAG biosynthesis were identified using singleRxnDeletion from COBRA toolbox. To predict genetic targets for metabolic engineering, the flux scanning of enforced objective function (FSEOF; Choi, Lee, Kim, & Woo, 2010) implementation of RAVEN was used.
Exchange reactions were added for the products of interest, which were optimized with either glucose or xylose as carbon source.
The slope parameter derived from FSEOF is indicative of how strong each reaction is contributing toward a shift from growth toward production of the target compound, and thereby suggests which reactions are promising targets for overexpression to increase productivity. In addition to automated template-based reconstruction, it is imperative to curate organism-specific reactions and pathways to obtain a representative and comprehensive model. In the third step of rhto-GEM reconstruction, we included seven reactions from the carotene and torulene biosynthetic pathways (Buzzini et al., 2007), while fatty-acid degradation through mitochondrial β-oxidation introduced 67 further reactions (Zhu et al., 2012). Along with synthesis and degradation pathways of 18:2 and 18:3 fatty acids, the third step of model reconstruction increased the reaction and gene count to 3,362 and 842, respectively.

| Topological-based gap-filling
As the resulting draft model was unable to support the production of

| Representation of lipid metabolism
As particular interest on R. toruloides is focused on its oleaginous nature, attention was paid to accurately depict lipid metabolism. Recently, we have developed the SLIMEr formalism for describing lipids in genomescale models, which splits lipids into measurable entities . This formalism represents the flexibility of lipid metabolism while allowing incorporation of measurements of lipid classes and acyl chain distributions, briefly explained in Section 2, while a detailed analysis of the practical implications of this approach is provided in Sánchez et al., 2019.
In the fifth step of reconstruction, we applied the SLIMEr formalism as previously described for S. cerevisiae. As the acyl chain distribution of R. toruloides is different from S. cerevisiae, for example, the presence of 18:2 and 18:3 acyl chains, this required extensive manual curation of the SLIME reactions, culminating in a lipid-curated draft model with 2,781 reactions, which is less than before curation of lipid metabolism as nonrelevant reactions (based on lipid acyl chain compositions) were discarded. We subsequently populated the model with FAME and lipid class data obtained from mid-exponential phase bioreactor cultivations of R. toruloides on glucose .

| Quality control and validation of rhto-GEM
To transform this functional draft model to the first version of the R.
toruloides GEM, additional manual curation was performed wherein, step 6 reactions not connected to the main network, as introduced early in the reconstruction process, were removed. In step 7, remaining template-derived genes were replaced by their R. toruloides orthologs where possible and otherwise deprecated, while in step 8 of the reconstruction the (non-) growth associated maintenance energy was fitted to experimentally determined growth and glucose uptake rates (Shen et al., 2013), and in step 9, the annotation of metabolites Material S1). In particular, a low stoichiometric consistency is reported with a memote score of nearly 40%. This is at least partially an artifact of the SLIME reactions whose product stoichiometries are weightnormalized, which allows for direct integration of lipid measurements that are provided as grams per gram dry cell weight.
We structurally compared rhto-GEM with previously reported smallscale models of R. toruloides (Bommareddy et al., 2015;Castañeda et al., 2018) and identified a number of manually curated gene associations and reactions that were differently defined in the semiautomatically reconstructed rhto-GEM. These changes were used to curate the model, to yield version 1.2.1 that was used for further analysis. The maximum theoretical TAG production yields in the small-scale models were slightly lower than the yield predicted from rhto-GEM, which can be attributed to the absence of complex I of the oxidative phosphorylation resulting in lower energy yields (Figure 2a).
To validate rhto-GEM functionality, we compared its growth rate to experimental measured values. From literature, we gathered glucose, glycerol, and xylose uptake rates from R. toruloides bioreactor cultivations TIUKOVA ET AL. genes (Coradetti et al., 2018). A total of 1,337 genes were identified as putative essential genes due to their recalcitrance to T-DNA insertion, of which 207 genes are associated to reactions in rhto-GEM. As it is unclear whether recalcitrance to T-DNA insertion is solely based on gene essentiality, we took the more conservative assumption that genes not recalcitrant to insertion are not essential. Simulations with rhto-GEM indicated a specificity (or true negative rate) of 0.853 (Figure 2c).

| Essential reactions for oleaginous phenotype
R. toruloides is ranked as one of the best performing yeast species for lipid production together with the oleaginous yeasts, Lipomyces starkeyi and Trichosporon oleaginosus (Papanikolaou & Aggelis, 2011).
To evaluate which reactions are stoichiometrically most influential in the oleaginous phenotype of R. toruloides, we ran reaction essentiality analysis, where the production of one particular TAG (a major storage lipid) was set as cellular objective and the resulting TAG yield was indicative of the essentiality of each reaction. Comparing oleaginous-essential reactions between three carbon sources (Table 1), the largest differences are related to the carbon assimilation pathway; pentose phosphate pathway reactions are more affecting for lipid production on xylose compared to glycerol.
Acetyl-CoA carboxylase, which has previously been identified as an important target for increased TAG production (S. , was here identified as essential on all tested carbon sources.
Many reactions that might be perceived to be essential were not identified as such in our analysis, such as the last step of TAG biosynthesis using fatty acyl-CoA and diacylglycerol as catalyzed by diacylglycerol acyltransferase. However, TAGs can alternatively be generated by reshuffling acyl chains between phospholipids and diacylglycerols. The relatively small number of essential reactions, therefore, demonstrates the high flexibility of lipid metabolism.

| Prediction of potential targets for increased production of triacylglycerols
Genetic studies of R. toruloides have shown that its lipid accumulation capacity can be further enhanced (reviewed in Marella, Holkenbrink, Siewers, & Borodina, 2018). We employed FSEOF on rhto-GEM to identify potential metabolic engineering targets for improved production of TAG. FSEOF is based on the principle that increased production requires a redirection of flux, from originally going toward biomass generation, to ideally going (partially) toward our product of interest. However, reactions that already carry significant flux for biomass production are accounted for in FSEOF, as these are potentially less promising targets for overexpression. As there is interest in the use of hydrolyzed plant biomass as feedstock, we evaluated potential targets for both glucose and xylose as carbon source, in addition to the often used glycerol (Table 2 and  Gene targets that have previously been shown to enhance TAG accumulation in R. toruloides but were not predicted in the current analysis include nonmetabolic genes such as those involved in organelle morphogenesis, which are currently beyond the capacity of analysis of a purely metabolic model. This include genes such as RHTO_05627, which encodes the lipid droplet-associated protein Ldp1  whose expression has been shown to improve lipid production.

| Prediction of potential targets for increased production of linolenic acid
Oleaginous yeasts are a potential source of essential fatty acids such as linolenic and linoleic acid, while linolenic ω-3 fatty acids provide health benefits in nutrition and serve as precursors for synthesis of docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA) in humans (Innis, 2014). Natural strains of R. toruloides contain linolenic acid around 3% of total fatty acids and the level of fatty acids saturation can change in response to temperature (Suutari, Liukkonen, & Laakso, 1990). Genetic engineering targets for improved production of polyunsatured fatty acids (PUFAs) in micro-organisms have been reviewed previously (Gong et al., 2014).
We performed FSEOF analysis specifically for linolenic acid production, and as expected, most genes identified as targets were also identified for improved production of TAGs, in addition to oleoyl-CoA and linoleoyl-CoA desaturases (Table 2). This is furthermore supported by published reports, for example, overexpression of native Δ9 desaturase (Tsai et al., 2019; was Note: Lipid production as percentage of control strain, after removal of individual reactions, using glucose, xylose, or glycerol as carbon source.

| 3401
T A B L E 2 Comparison of targets predicted from flux scanning of enforced objective function for improved TAG and linolenic production on glucose, xylose, or glycerol as carbon source shown to result in increase of linolenic acid production in R.
toruloides. In the yeast cell, PUFAs may occur as constituent of phospholipids, sulfolipids, acylglycerols, or glycolipids (Jacob, 1992).  Note: Indicated are slopes derived from FSEOF, indicated of whether gene expression should be increased to direct flux from growth toward production.
Only reactions with gene associations are shown. First column refers to numbers in Figure 3. Abbreviation: TAG, triacylglycerol.
F I G U R E 3 Targets predicted by flux scanning of enforced objective function. Genetic engineering targets for improved production of TAGs (green circles) and torularhodin (red circles) in glucose-grown R. toruloides. The numbers refer to reactions indicated in Tables 2 and 3. TAG, triacylglycerol [Color figure can be viewed at wileyonlinelibrary.com] TIUKOVA ET AL.

| 3403
heterologous desaturases resulted in enhanced production of ω-3 fatty acids with EPA as final product at the highest content among known EPA sources (Xie, Jackson, & Zhu, 2015;Xue et al., 2013).
Many of these yeasts reside in the phylloplane and their ability to produce carotenoids is thought to serve as protection against solar radiation.
We performed FSEOF analysis to predict targets for carotenoid production, with torularhodin as representative product (Table 3 and Figure 3). Most of enzymes within the mevalonate pathway were predicted as targets for overexpression, as this pathway is responsible for the production of the prenyl pyrophosphate precursor. This indicates that a higher flux is required for carotenoids than what is obtained during biomass production. Overexpression of truncated 3-hydroxy-3-methylglutaryl-CoA reductase from Kluyveromyces marxianus in combination with other genes was shown to increase β-carotene production in Rhodotorula glutinis (Pi et al., 2018). Similarly, also all enzymes of isoprene biosynthetic pathway were predicted to correlate with improved carotenoid production in R. toruloides. The dimethylallyltranstransferase/ geranyltranstransferase (ERG20) is involved in synthesis of farnesyl pyrophosphate from geranyl pyrophosphate and isopentenyl pyrophosphate, which is a direct metabolic precursor of carotenoids as well as ergosterol, heme A, dolichols, and prenyl-adducts for prenylated proteins.
A recent study has demonstrated that overexpression of the X.
Carotenoid biosynthetic enzymes, such as phytoene synthase and dehydrogenase, were identified as targets for increased carotenoid production, as anticipated. The gene coding for phytoene dehydrogenase (RHTO_04602) has been experimentally verified to be involved in carotenoid biosynthesis (Sun et al., 2017). In addition, overexpression of the X. dendrorhous phytoene desaturase and synthase genes was shown to increase carotenoid yields in R. glutinis (Pi et al., 2018).
T A B L E 3 Comparison of targets predicted from FSEOF for improved torularhodin production on glucose, xylose, or glycerol as carbon source  Note: Indicated are slopes derived from FSEOF, indicated of whether gene expression should be increased to direct flux from growth toward production.
Only reactions with gene associations are shown. First column refers to numbers in Figure 3. Abbreviation: FSEOF, flux scanning of enforced objective function.
Little genetic analysis of the endogenous enzymes for carotenoid biosynthesis in R. toruloides has been carried out to date. A number of mutants with improved carotenoid yields have been generated but the exact genetic changes responsible have not been reported (Bao et al., 2019; The earlier mentioned T-DNA mutagenesis study reported decreased carotenoid production upon integration into the intron of R. toruloides hypothetical gene RTHO_00032 or the exon of hypothetical gene RTHO_07952, which is predicted to encode a bZIP transcription factor . The same study also reported that T-DNA insertion into the promoter of hypothetical gene RTHO_07650 (encoding a putative DUF1479 domain protein) increased carotenoid yields, exemplifying that regulation plays an important role in carotenoid biosynthesis, an assertion that would benefit greatly from integrative analysis of expression data with a comprehensive model of metabolism.
Collectively, the FSEOF results have demonstrated the ability of rhto-GEM to provide valuable predictions of targets for improved production of key products in R. toruloides, as many of these have experimentally been validated to increase production. This renders rhto-GEM as persuasive tool to aid in improving the production of less-studied high-value compounds, in addition as a framework for more detailed analysis of high-producing strains.
Our study presents the first reconstruction of GEM of lipidaccumulating basidiomycete R. toruloides, and while we used the S.
cerevisiae model as reference for the conserved parts of metabolism, rhto-GEM contains unique characteristics including ATP:citrate lyase, which is the main source of acetyl-CoA for lipid synthesis; mitochondrial β-oxidation; a cytoplasmic malic enzyme that provides an alternative to the pentose phosphate pathway for NADPH regeneration; and pathways related to polyunsaturated fatty acids and carotenoid biosynthesis.
The model incorporates knowledge obtained from genomics and proteomics data generated for R. toruloides (Zhu et al., 2012) and was validated using cultivation data (Azambuja et al., 2018;Bommareddy et al., 2015;Bonturi et al., 2017;Shen et al., 2013), demonstrating good agreement with experimentally reported growth rates. Analysis of the model allowed to identify potential genetic engineering strategies for enhanced lipid production. Some of these genetic targets were found to agree with published experimental studies (Díaz et al., 2018;. As such, rhto-GEM emerges as a valuable tool for future analysis of oleaginous and lipid metabolism. An important feature is its distribution through a Git repository, which allows for continuous improvement and tracking of model development.
By providing all relevant scripts to replicate the reconstruction of rhto-GEM, we have demonstrated how a new genome-scale model can conveniently be generated, a process greatly aided by RAVEN or alternative solutions such as AutoKEGGRec (Karlsen, Schulz, & Almaas, 2018