leADS: improved metabolic pathway inference based on active dataset subsampling

Metabolic pathways are composed of reaction sequences catalyzed by enzymes. The set of reactions within and between cells comprises a reactome. Pathways and reactomes can be predicted from organismal or multi-organismal genomes using rule-based or machine learning methods. While machine learning methods overcome issues of probability and scale associated with rule-based methods, several complications remain that can degrade performance, including inadequately labeled training data, missing feature information, and inherent imbalances in the distribution of pathways within a dataset. Here, we present leADS (multi-label learning based on active dataset subsampling), a machine learning method that uses subsampling to reduce the negative impact of training loss due to class imbalance. We demonstrate leADS performance using organismal and multi-organismal datasets in relation to other machine learning pathway prediction methods. Availability and implementation: leADS is available under the GNU license at github.com/hallamlab/leADS. A wiki, including a tutorial, is available at github.com


Introduction
The rise of next-generation sequencing technologies has motivated innovations in metabolic pathway prediction methods [1]. These innovations encompass rule-based or heuristic methods including PathoLogic [14], and machine learning (ML) methods including PtwML [6] and mlLGPR [20]. In the ML case, a class imbalance problem exists where certain pathways are more common than others because they conduct core metabolic functions conserved across the tree of life. These functions are overrepresented in labeled training data relative to more niche-defining or accessory metabolic functions and can result in training loss with decreased predictive performance. To address this problem, we developed leADS based on prior work in dataset subsampling [10]. leADS incorporates an ensemble of multi-label learners [32] to perform hard example mining [27], reducing the negative impact of training loss on pathway prediction.

Figure 1: A) The leADS training workflow. B) Predictive performance of MinPath, PathoLogic, mlLGPR, triUMPF, and leADS (with random sampling, full data, and four acquisition functions) on Tier 1 (T1) organismal genomes and Critical Assessment of Metagenome Interpretation (CAMI) datasets (x-axis); the z-axis is the average F1 score. Gray cells indicate that an algorithm was not run on that dataset.

leADS (Fig. 1A) performs training in three iterative steps:
1)-Building an acquisition model. At the first iteration, a set is initialized with randomly selected data from a given pathway dataset (Fig. 1A.1). Then, an ensemble consisting of g (∈ Z≥1) members is constructed, where each member in the ensemble is trained on a randomly selected subset of the data.
2)-Dataset sub-sampling. During this step, a subset of pathway data is selected using one of four acquisition functions including entropy (H), mutual information (M), variation ratios (V), or normalized propensity scored precision at k ∈ Z >1 (nPSP) (Fig. 1A.2). For each function, the top per% examples are retrieved, where per%(∈ (0, 100]) is a prespecified hyperparameter indicating the subsampling proportion.
3)-Train using sub-sampled data. The selected subset of pathway data from the previous step is used to train leADS using a multi-label 1-vs-All approach [20] (Fig. 1A.3).
These steps are repeated τ (∈ Z≥1) times (Fig. 1A). At each iteration q, some examples collected from the previous iteration q − 1 are randomly discarded to enable examples not selected in the top per% to be used in round q. Once training is complete: i)-pathway data with per% examples is produced; and ii)-the trained model is stored for pathway prediction on new datasets. For definitions, analytical expressions, and optimization, consult Supp. Sections A.1, A.2, and A.3.
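The three steps above can be sketched in code. This is a minimal illustration only, not the actual leADS implementation: function and variable names are ours, the ensemble members are plain 1-vs-All logistic models fit by gradient descent on 0/1 labels, and entropy serves as the acquisition function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_member(X, Y, lr=0.5, steps=100):
    # A 1-vs-All logistic model (one weight row per pathway label),
    # fit with plain gradient descent; Y holds 0/1 labels here.
    n, d = X.shape
    W = np.zeros((Y.shape[1], d))
    for _ in range(steps):
        P = sigmoid(X @ W.T)            # n x t predicted probabilities
        W -= lr * (P - Y).T @ X / n     # gradient of the logistic log-loss
    return W

def entropy_score(P, eps=1e-12):
    # Acquisition value per example: entropy of its predictive distribution.
    return -(P * np.log(P + eps)).sum(axis=1)

def train_leads(X, Y, per=0.7, gamma=0.1, g=3, tau=3, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    keep = int(per * n)
    # Step 0: initialize the working set with randomly selected examples.
    subset = rng.choice(n, size=keep, replace=False)
    for _ in range(tau):
        # 1) Acquisition model: an ensemble of g members, each trained on a
        #    randomly selected subset of the current working set.
        members = []
        for _ in range(g):
            idx = rng.choice(subset, size=max(2, len(subset) // 2), replace=False)
            members.append(fit_member(X[idx], Y[idx]))
        # 2) Subsampling: score every example with the marginalized ensemble
        #    prediction and keep the top per% hardest examples.
        P = np.mean([sigmoid(X @ W.T) for W in members], axis=0)
        subset = np.argsort(-entropy_score(P))[:keep]
        # 3) Randomly discard a gamma fraction so that examples not selected
        #    this round can re-enter in a later round.
        subset = rng.choice(subset, size=keep - int(gamma * keep), replace=False)
    # Final per% subsample and the trained ensemble.
    return members, np.argsort(-entropy_score(P))[:keep]
```

The discard step is what distinguishes this loop from one-shot hard example mining: without it, the top per% set can freeze after a few rounds.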
We evaluated leADS performance using a corpus of 10 experimental datasets manifesting diverse multi-label properties, including manually curated organismal genomes, synthetic microbial communities, and low complexity microbial communities. We trained leADS on BioCyc v21 Tier 2 and Tier 3 (T2 & 3) under three configurations: i)-random sampling (leADS+R) corresponding to 70% of BioCyc v21 T2 & 3 selected at random, ii)-a full configuration (leADS+F) where all BioCyc v21 T2 & 3 data were utilized without subsampling, and iii)-per% = 70% using four acquisition functions: entropy (leADS+H), mutual information (leADS+M), variation ratios (leADS+V), and normalized propensity scored precision (leADS+nPSP). Training for each configuration was run for 10 epochs using a member size of g = 10 and k = 50 (for leADS+V and leADS+nPSP). For detailed experimental settings, see Supp. Section A.5.3. Pathway prediction results are reported using the average F1 score. As shown in Fig. 1B, leADS resulted in competitive performance compared to other inference methods. Among the four acquisition functions, nPSP resulted in the highest performance on EcoCyc (0.8874) while random sampling resulted in the poorest. Interestingly, the full configuration was on par with random sampling, reinforcing the idea that BioCyc T2 & 3 contains noise that may hamper proper estimation of leADS coefficients. On CAMI low complexity metagenomes [24], nPSP outperformed other methods (0.6214; Fig. 1B). Based on these results we recommend using nPSP with g = 10 and k = 50 for optimal leADS performance. Extensive experimental analysis can be found in Supp. Sections A.4 and A.5.

Conclusion
leADS is a novel multi-label ensemble-based approach for hard example mining. It constructs a set of diverse multi-label base learners to jointly improve the subselection of examples and overcome class imbalance during metabolic pathway prediction from genomic sequence information at different levels of complexity and completion.

A Supplementary Material
This material is divided into five parts: i)-the problem definitions (Section A.1), ii)-the leADS framework (Section A.2), iii)-optimization and prediction (Section A.3), iv)-experimental settings (Section A.4), and v)-empirical analysis (parameter sensitivity, scalability to the ensemble size, and metabolic pathway prediction effectiveness) (Section A.5).

A.1 Definitions and Problem Formulation
Here the default vector is considered to be a column vector and is represented by a boldface lowercase letter (e.g., x), while matrices are represented by boldface uppercase letters (e.g., X). A subscript i attached to a matrix, such as X_i, indicates the i-th row of X, which is a row vector. A subscript on a vector, x_i, denotes the i-th cell of x. An occasional superscript, x^(i), indicates an index of an example or the current epoch during a learning period. With these notations in mind, we introduce information integral to the problem formulation, starting by defining the multi-label data.
Definition A.1. Multi-label Pathway Dataset [20]. A pathway dataset is represented by S = {(x^(i), y^(i)) : 1 ≤ i ≤ n}, consisting of n examples, where x^(i) is a vector indicating the abundance information corresponding to enzymatic reactions. An enzymatic reaction is denoted by c, which is an element of a set E = {c_1, c_2, ..., c_r} of r possible enzymatic reactions; hence, the size of the vector x^(i) is r. The abundance of an enzymatic reaction c for an example i is stored in the corresponding cell of x^(i). The label vector y^(i) ∈ {−1, +1}^t is a pathway label vector of size t, representing the total number of pathways derived from a set of universal metabolic pathways Y. The matrix forms of x^(i) and y^(i) are X and Y, respectively.
Both E and Y can be retrieved from trusted sources, such as KEGG [12] or MetaCyc [3]. Although the input space is assumed to be encoded as an r-dimensional vector, symbolized as X = R^r, through feature engineering it can be represented as X = R^d.
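For concreteness, a toy dataset in this shape (all sizes hypothetical) can be laid out as follows:

```python
import numpy as np

n, r, t = 4, 6, 3          # examples, enzymatic reactions, pathway labels
rng = np.random.default_rng(0)

# X: abundance counts of the r enzymatic reactions per example (n x r)
X = rng.integers(0, 5, size=(n, r))
# Y: multi-label pathway annotations in {-1, +1} per example (n x t)
Y = np.where(rng.random((n, t)) < 0.4, 1, -1)

x_i = X[2]                  # the i-th example's reaction-abundance vector
y_i = Y[2]                  # its pathway label vector
```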
Problem Statement 1. Given a multi-label dataset S_A, the goal is to select a subset of S_A, denoted by S_per%, where per% is a prespecified hyperparameter indicating the proportion of examples to be chosen from S_A, such that learning on S_per% incurs a similar (or better) predictive score as training on the full multi-label dataset S_A.

Figure 2: Overview of the leADS training process. (a-b) A set is initialized with randomly selected examples. (c) An ensemble is constructed, where each member is trained on a randomly selected portion of the training set. (d) leADS applies an acquisition function, based on either entropy, mutual information, variation ratios, or normalized propensity scored precision at k, to select per% sub-samples. (e) leADS performs parallel training steps. (f) The process (b-e) is repeated τ times, where during each iteration (g) a few examples from per% are discarded at random to give examples that were not selected in per% a chance to be used in the next round of training. (h) If the current iteration q reaches the desired number of rounds τ, training is terminated and the final model and per% results are presented.

A.2 The leADS Method
In this section, we provide a description of leADS components including: i)-building an acquisition model, ii)-active dataset sub-sampling, and iii)-learning using the reduced sub-sampled data. These three steps interact with each other in an iterative process as illustrated in Fig. 2. At the very first iteration, a set S^0_per% is initialized with randomly selected data (Fig. 2a-b). In the next iteration q, instead of re-initializing S^q_per% with randomly selected examples, the data S^(q−1)_per% collected from the previous iteration q − 1 is used, constituting a build-up scheme implemented in many active learning methods [5,10]. This process is repeated until the maximum number of rounds τ is reached.

A.2.1 Building an Acquisition Model
Given S_A, the objective of this step (Fig. 2c) is to estimate the posterior predictive uncertainty for a new test example x* and a pathway y_j as:

p(y_j | x*, S_A) = ∫ p(y_j | x*, Θ_j) p(Θ_j | S_A) dΘ_j,   (A.1)

where Θ ∈ R^{t×r} denotes the pathway parameters. Notice that Eq. A.1 involves marginalization over the Θ_j parameters, which is hard to compute [22]. One way to mitigate this issue is to approximate the above equation using Monte Carlo (MC) techniques [16] by constructing an ensemble, denoted by E, which consists of g (∈ Z≥1) models (Fig. 2c), according to the following formula:

p(y_j | x*, S_A) ≈ (1/g) Σ_{s=1}^{g} p(y_j | x*, Θ_j^s),   (A.2)

where Θ^s is sampled from q(Θ^s), which is considered to be in the same family of distributions as the true hidden variables p(Θ_j^s | S_A). The parameters Θ^s for the s-th model can be estimated according to the multi-label 1-vs-All approach [32].
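The Monte Carlo approximation in Eq. A.2 amounts to averaging per-member probabilities. A minimal sketch (function and variable names are ours, and a sigmoid link is assumed for each member):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predictive_distribution(x, thetas):
    """Approximate p(y_j | x, S_A) per Eq. A.2 by averaging the g ensemble
    members' probabilities; `thetas` holds g parameter matrices of shape t x r."""
    per_member = np.stack([sigmoid(theta @ x) for theta in thetas])  # shape g x t
    return per_member.mean(axis=0)                                   # length-t vector
```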

A.2.2 Subsampling Dataset
During this step (Fig. 2d), a subset of S_A, denoted as S^(q−1)_per% ⊆ S_A, is picked for each member in E using an acquisition function f : x → R, where per% is a pre-specified threshold indicating the proportion of examples to be chosen from S_A at iteration q − 1.
Four acquisition functions used in subsampling are described that incorporate the predictive uncertainty distribution from the previous step: entropy, mutual information, variation ratios, and normalized PSP@k. For each function, we retrieve the top per% examples that have high acquisition (or uncertainty) values.

1. Entropy (H) [25]. This function measures the uncertainty of an example given the predictive distribution of that example:

H = − Σ_{j=1}^{t} p_j log p_j,   (A.3)

where p is a vector of predictive probabilities over t pathways.
2. Mutual information (M) [28]. This function measures disagreement between the g models, encouraging examples with high disagreement to be selected during the data acquisition process:

M = H − (1/g) Σ_{s=1}^{g} H_s,   (A.4)

where H_s denotes the entropy obtained from an individual member s of E for an example before marginalization. Since entropy is always positive, the maximum possible value for M is H. However, when the models make similar predictions, (1/g) Σ_{s=1}^{g} H_s → H, resulting in M → 0, which is its minimum value [5]. Note that this formula is similar to multi-label negative correlation learning [26], which estimates the pairwise negative correlation of each learner's error with respect to the errors of other members in E.

3. Variation ratios (V) [7]. This function measures the number of members in E that disagree with the majority vote for an example, according to a desired pathway set size k, where larger values indicate higher uncertainty:

V = 1 − f_m / g,   (A.5)

where f_m is the frequency of the modal (majority vote) top-k pathway prediction across the g models, and k ∈ Z>0 is a prespecified number of pathways to be considered in computing the mode operation.
4. Normalized propensity scored precision at k (nPSP@k). This is a modified version of PSP@k [11], which measures the average precision of the top k relevant pathways given an instance i, where larger values indicate less uncertainty:

nPSP@k = Norm( (1/k) Σ_{j ∈ rank_k(p)} y_j / ps_j ),   (A.6)

where Norm(.) scales the score within [0, 1]. The term p is a vector of predictive probabilities over t pathways, and rank_k(p) returns the indices of the k largest values in p, ranked in descending order, where k ∈ Z>0 is a hyperparameter. ps_j is the propensity score for the j-th pathway, computed from n_j, the number of positive training instances for pathway j. In the context of extreme multi-label problems, PSP@k was used to derive an upper bound for missing/misclassified labels [30], and is reported to be the best suited metric for long-tail distributions in which a significant portion of labels are tail labels [23,2].
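Under these definitions, the four acquisition scores can be sketched as follows. This is a simplified reading, not the leADS implementation: in particular, treating the mode operation of variation ratios as the modal top-k pathway set and normalizing nPSP@k by the best attainable ranking are our assumptions.

```python
import numpy as np
from collections import Counter

def entropy(p, eps=1e-12):
    # H: uncertainty of one example's predictive distribution over t pathways.
    return float(-(p * np.log(p + eps)).sum())

def mutual_information(member_probs, eps=1e-12):
    # M = H(marginalized distribution) - mean of per-member entropies.
    mean_p = member_probs.mean(axis=0)
    member_H = np.array([entropy(p, eps) for p in member_probs])
    return entropy(mean_p, eps) - member_H.mean()

def variation_ratio(member_probs, k):
    # V = 1 - (frequency of the modal top-k pathway set) / g.
    votes = [tuple(sorted(np.argsort(-p)[:k])) for p in member_probs]
    modal_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - modal_count / len(member_probs)

def npsp_at_k(p, y, propensity, k):
    # PSP@k divided by its best attainable value (our assumed normalization).
    top = np.argsort(-p)[:k]
    psp = (y[top] / propensity[top]).sum() / k
    best = np.sort(y / propensity)[::-1][:k].sum() / k
    return psp / best if best > 0 else 0.0
```

Here `member_probs` is a g x t array of per-member predictive probabilities for one example, and `propensity` holds the per-pathway scores ps_j.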

A.2.3 Train on the Reduced Dataset
During this step (Fig. 2e), each member in E is assigned to train on randomly selected examples from S^(q−1)_per%, which is expected to contain hard examples that are difficult to learn and classify. This process is repeated τ times (Fig. 2f), where during each iteration the top per% examples are selected based on their acquisition values for the next round of training. However, leADS discards a few examples (a fraction γ ∈ (0, 1)) from S^(q−1)_per% to increase the coverage of information from all examples in the next round (Fig. 2g).

A.3 Optimization and Prediction
The objective function in Eq. A.2 can be solved by decomposing it into t independent binary classification problems according to the multi-label 1-vs-All approach, enabling parallel training. Consider the optimization for a member s:

min_{Θ^s}  Σ_{i=1}^{n} Σ_{j=1}^{t} log(1 + exp(−y_j^(i) Θ_j^{s,⊤} x^(i))) + λ ||Θ^s||_{2,1},   (A.7)

where ||.||_{2,1} is the L_{2,1} regularization term, which is the sum of the Euclidean norms of the columns of Θ. The L_{2,1} norm imposes sparsity on the model's parameters to minimize the negative effect of label correlations, and λ (∈ R>0) is employed to govern the relative contributions of the L_{2,1} and log-loss terms. Although the joint formula in Eq. A.7 is convex, the logistic log-loss function still poses a problem in that no analytical solution exists for it. To address this problem, we apply mini-batch gradient descent [18], which begins with an initial random guess for the leADS parameters and performs iterative updates to each individual parameter to minimize Eq. A.7, where the derivative of the log-loss term for each Θ_j^s ∈ Θ^s has the following form:

∂/∂Θ_j^s = − Σ_{i=1}^{n} y_j^(i) x^(i) (1 − p(y_j^(i) | x^(i), Θ_j^s)).   (A.8)

For prediction, we apply a cut-off threshold ξ ∈ R≥0 to retain only pathways having probability values higher than ξ, i.e., L(x) = {j : p(y_j = +1 | x, Θ_j^s) ≥ ξ, ∀j ∈ {1, ..., t}, ∀s ∈ {1, ..., g}}, where p(y_j = +1 | x, Θ_j^s) = 1 / (1 + exp(−Θ_j^{s,⊤} x)).
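A small sketch of the L_{2,1} term and the prediction rule. This is illustrative only: aggregating over ensemble members by simple averaging is our reading of the extracted text, and the names are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l21_norm(Theta):
    # Sum of the Euclidean norms of the columns of Theta (the L_{2,1} regularizer).
    return float(np.sqrt((Theta ** 2).sum(axis=0)).sum())

def predict_pathways(x, thetas, xi=0.5):
    """Return indices of pathways whose ensemble-averaged probability
    p(y_j = +1 | x) meets the cutoff xi; `thetas` holds g matrices (t x r)."""
    p = np.mean([sigmoid(Theta @ x) for Theta in thetas], axis=0)
    return [j for j, pj in enumerate(p) if pj >= xi]
```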

A.4 Experimental Setup
In this section, we describe an experimental framework used to demonstrate leADS pathway prediction performance across multiple datasets spanning the genomic information hierarchy [20]. leADS was written in the Python programming language (v3). Unless otherwise specified all tests were conducted on a Linux server using 10 cores of Intel Xeon CPU E5-2650.
3. Distinct pathway labels (DL(S)). This notation indicates the number of distinct pathways in S.

A.4.2 Parameter Settings
We used pathway2vec [19] to obtain pathway and EC features using "crt" as the embedding method with the following settings: the number of memorized domains was 3, the explore and in-out hyperparameters were 0.55 and 0.84, respectively, the number of sampled path instances was 100, the walk length was 100, the embedding dimension size was m = 128, the neighborhood size was 5, the number of negative examples was 5, and the MetaCyc configuration used was "uec", indicating that links among ECs were trimmed. The obtained features were used to leverage correlations among ECs and pathways for training leADS (see Section A.4.3). We then trained leADS using the following default settings (unless otherwise mentioned): the learning rate was 0.0001, the batch size was 50, the number of epochs was 3, the number of models was g = 3, the proportion of examples (per%) to be selected was 30%, the rate of examples to be discarded was γ = 0.1, the number of subsampled pathways for each member was 500, and the cutoff threshold for predictions was 0.5. For the regularization hyperparameter λ, we performed 10-fold cross-validation on BioCyc T2 & 3 data and found λ = 10 to be optimal according to results obtained on the golden T1 and CAMI datasets.
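For reference, the default leADS training settings above can be collected into a single configuration. The values are taken from the text; the key names are ours, not the actual leADS command-line flags.

```python
# Default leADS training settings from Section A.4.2 (key names are ours).
DEFAULTS = {
    "learning_rate": 0.0001,
    "batch_size": 50,
    "num_epochs": 3,
    "ensemble_size_g": 3,
    "per_percent": 30,          # proportion of examples selected (per%)
    "discard_rate_gamma": 0.1,  # fraction discarded between rounds
    "subsampled_pathways": 500, # per ensemble member
    "prediction_cutoff": 0.5,   # threshold xi
    "lambda_reg": 10,           # chosen by 10-fold cross-validation
}
```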

A.4.3 Incorporating enzyme features
We applied the RUST-norm ("crt") random walk method from pathway2vec [19], which uses a unit-circle equation, to obtain enzyme features with the settings provided in Section A.4.2. Then, we concatenate the enzyme features onto each example i according to:

x̃^(i) = x^(i) ⊕ e^(i),

where ⊕ indicates the vector concatenation operation, e^(i) ∈ R^m is derived from the enzyme feature matrix E ∈ R^{r×m}, and m = 128. The addition of features results in a dimension of size r + m, where r = 3650. We expect that by incorporating enzymatic reaction features into the original r-dimensional example x^(i), the modified x̃^(i) summarizes informative characteristics, which are expected to be useful in pathway prediction.

A.5 Empirical Analysis

A.5.1 Parameter Sensitivity

Experimental setup. The effect of varying k for the variation ratios and nPSP acquisition functions is shown in Fig. 3a, while the effect of the four acquisition functions and random sampling while varying the sample size according to per% ∈ {30%, 50%, 70%} is shown in Fig. 3b, using results obtained from the previous experiment. All other hyperparameters were set according to the configurations described in Section A.4.2, and results were reported using average F1 scores.

Experimental results. Fig. 3a shows the impact of k for both the variation ratios and nPSP acquisition functions. Although both functions have similar disagreement metrics, the optimum performance for variation ratios is at k = 15 while the optimum for nPSP is at k = 40. This discrepancy in k values likely results from the effects of subsampling pathways and examples that are allocated randomly to each member in E. After several rounds of experiments, we found k = 50 to be optimum for both variation ratios and nPSP. Next, we examined the effect of per% on leADS performance using the four acquisition functions and random sampling, where we fixed k = 50 for variation ratios and nPSP. From Fig. 3b, it is evident that leADS performance generally improves by including more examples for each acquisition function, although the entropy function resulted in only a marginal improvement. In contrast, random sampling had no performance benefit across the sample size range tested.
In summary, this experiment suggests using per% = 70% for all configurations and k = 50 for both variation ratios and nPSP.

A.5.2 Scalability to the Ensemble Size
Experimental setup. In this section, the time complexity of training was determined while varying the model size as g ∈ {1, 2, 3, 5, 10, 15, 20, 50}. Performance was evaluated on the CAMI dataset as described above using the average F1 score metric for each configuration of g. per% was set to 30% of BioCyc v21 T2 & 3 data for training under the four acquisition functions. In the case of random sampling, leADS was trained on 30% of randomly selected BioCyc v21 T2 & 3 data. Performance was expected to improve in proportion to the member size of E. Although random sampling was observed to have the lowest computational time complexity, this method resulted in the lowest performance according to the average F1 metric (Fig. 4b). Among the four acquisition functions, variation ratios required an additional mode operation, contributing to increased training time. Based on these results, we suggest setting the member size between 3 and 10 while increasing the pathway subsampling size accordingly (e.g., 2000 for 10 members) to improve prediction outcomes and reduce both computational complexity (training and inference) and model size.

A.5.3 Metabolic Pathway Prediction
Experimental setup. In this section, pathway prediction performance was evaluated using parameter settings described in Section A.4.2. Three training configurations were tested: i)-per% = 70% under the four acquisition functions, ii)-random sampling corresponding to 70% of BioCyc T2 &3 selected at random, and iii)-full configuration where all BioCyc T2 &3 data were utilized without subsampling. After training, pathway prediction results were reported on golden T1 data using four evaluation metrics: Hamming loss, average precision, average recall, and average F1 score. leADS performance was compared to three pathway prediction algorithms: i)-MinPath v1.2 [31], ii)-PathoLogic v21 [13]; and iii)-mlLGPR [20] on the T1 data. In addition, we compared leADS performance to other methods on multi-organismal datasets including symbiont, CAMI low complexity and HOTS datasets. For all experiments, the number of epochs was 10, the member size was g = 10, the subsampled pathway size was 2000, and k was 50 (for variation ratios and nPSP). See Section A.4.2 for additional configuration settings.
Experimental results. As shown in Table 2, leADS resulted in competitive performance compared to other pathway inference algorithms based on average F1 scores. For each column in Table 2, a boldface number represents the best evaluation metric score while an underlined number indicates the best score among leADS variants. Among the four acquisition functions, leADS+nPSP resulted in the highest average F1 scores for EcoCyc (0.8874), HumanCyc (0.8333), and TrypanoCyc (0.6897), which are also the highest scores among all models tested. On the other hand, random sampling achieved the poorest overall performance scores. Interestingly, leADS+F in Table 2 was on par with random sampling, reinforcing the idea that BioCyc v21 T2 & 3 contain noisy data that hampered proper estimation of leADS coefficients. By subsampling examples, leADS was able to reduce noise and improve prediction performance on golden T1 data.

Table 2: Predictive performance of each compared algorithm on 6 benchmark datasets. leADS+F: leADS with full data, leADS+R: leADS with random sampling, leADS+H: leADS with entropy, leADS+M: leADS with mutual information, leADS+V: leADS with variation ratios, and leADS+nPSP: leADS with normalized propensity scored precision. For each performance metric, '↓' indicates the smaller score is better while '↑' indicates the higher score is better. Values in boldface represent the best performance score while the underlined score indicates the best performance among leADS variants.

Metabolic interactions are integral to microbial community structure and function. In some cases these interactions are related to the production of public goods by a subset of community members that provision non-producing members, or to the removal of end products enabling unfavorable reactions to proceed [8]. In other cases enzymatic steps within a multi-step pathway are distributed between multiple community members, resulting in emergent metabolic properties that are robust to the loss of individual members [17]. To evaluate leADS performance on metabolic pathways distributed between organisms we used the reduced genomes of the mealybug symbionts Moranella (GenBank NC-015735) and Tremblaya (GenBank NC-015736) [21]. Fig. 5A illustrates the distributed genes for the L-lysine biosynthesis pathway. The two symbiont genomes in combination encode intact biosynthetic pathways for 9 essential amino acids (Fig. 5B, where a grey circle corresponds to a gene flagged as missing by the MetaPathways software [15]).

Figure 5: Schematic representation of distributed amino acid biosynthesis in the mealybug symbiotic system. A) A simplified version of the L-lysine biosynthetic pathway from MetaCyc, including locus ids and enzyme commission (EC) numbers for each step in the pathway starting from L-aspartate conversion. The specific steps encoded by Moranella (red), Tremblaya (blue), or both (purple) endosymbionts are shown as coloured circles in a simplified glyph structure based on McCutcheon [21]. Missing steps are indicated in grey. B) Breakdown of essential amino acid biosynthetic pathways in the mealybug symbiosis using simplified glyph structures. C) Corresponding pathway prediction outcomes using PathoLogic, mlLGPR, and leADS (with random sampling, full data, and four acquisition functions). Black circles indicate pathways predicted by the associated models while open circles indicate pathways that were not recovered by the models. The size of circles corresponds to the pathway coverage information used in metabolic inference.

Table 3: Predictive performance of mlLGPR and leADS on CAMI low complexity data. leADS+F: leADS with full data, leADS+R: leADS with random sampling, leADS+H: leADS with entropy, leADS+M: leADS with mutual information, leADS+V: leADS with variation ratios, and leADS+nPSP: leADS with normalized propensity scored precision. Values in boldface represent the best performance score while the underlined score indicates the best performance among leADS variants.
PathoLogic, mlLGPR, and leADS were used to predict pathways on individual symbiont genomes and a concatenated dataset consisting of both symbiont genomes, and resulting amino acid biosynthetic pathway distributions were determined (Fig. 5C). PathoLogic and leADS predicted 6 of the expected amino acid biosynthetic pathways on the composite genome while mlLGPR predicted 8 pathways. The L-phenylalanine biosynthesis I pathway was not inferred because the associated genes were reported to be missing during the ORF prediction process while Chorismate biosynthesis I was not incorporated during training (Fig. 6). All models inferred false positive pathways for individual symbiont genomes (Moranella and Tremblaya) despite reduced pathway coverage information (mapping enzymes onto associated 9 amino acid biosynthetic pathways) relative to the composite genome (Fig. 7). Although it is possible for leADS to reduce type I error by incorporating taxonomy-based predictions using rules, such pruning can also increase false-negative (type II error) pathway predictions in multi-organismal datasets [9].
To evaluate performance on more complex multi-organismal genomes we compared leADS to mlLGPR using the CAMI low complexity dataset [24] and to PathoLogic and mlLGPR using the HOTS dataset [29]. In the case of CAMI, leADS+nPSP outperformed other methods, resulting in an average F1 score of 0.6214 (Table 3). In the case of HOTS, leADS+R, leADS+F, leADS+H, leADS+M, leADS+V, and leADS+nPSP predicted a total of 60, 67, 63, 68, 67, and 68 pathways among a subset of 180 selected water column pathways [9], while PathoLogic and mlLGPR inferred 54 and 62 pathways, respectively (Figs 8, 9, 10, and 11). These observations indicate that leADS with subsampling improves pathway prediction outcomes by reducing training loss due to the class-imbalance problem in BioCyc v21 data. Based on these results we recommend using nPSP with g = 10 and k = 50 for optimal leADS performance.

Availability of Data and Materials
The leADS source code is available under the GNU License at github.com/hallamlab/leADS. A wiki, including a tutorial, is available at github.com/hallamlab/leADS/wiki.

Figure 6: Comparison of the distributed genes in 9 amino acid pathways in the Candidatus Moranella endobia and Candidatus Tremblaya princeps genomes reported by McCutcheon and colleagues [21] with MetaPathways v2.5 [15]. Circles represent detected presence of the pathway enzymes in Moranella (red), Tremblaya (blue), both genomes (purple), or missing (grey).

Figure 7: Comparative study of predicted pathways for symbiont data between PathoLogic, mlLGPR, and leADS (with random sampling, full data, and four acquisition functions). Filled red, blue, and purple circles represent detected presence of the pathway in Moranella, Tremblaya, and both genomes, respectively. The size of circles corresponds to the pathway coverage information.

Figure 8: Comparative study of predicted pathways for the HOTS 25m dataset between PathoLogic, mlLGPR, and leADS (with random sampling, full data, and four acquisition functions). Black circles indicate pathways predicted by the associated models while grey circles indicate pathways that were not recovered by the models. The size of circles corresponds to the pathway abundance information.

Figure 9: Comparative study of predicted pathways for the HOTS 75m dataset between PathoLogic, mlLGPR, and leADS (with random sampling, full data, and four acquisition functions). Black circles indicate pathways predicted by the associated models while grey circles indicate pathways that were not recovered by the models. The size of circles corresponds to the pathway abundance information.

Figure 10: Comparative study of predicted pathways for the HOTS 110m dataset between PathoLogic, mlLGPR, and leADS (with random sampling, full data, and four acquisition functions). Black circles indicate pathways predicted by the associated models while grey circles indicate pathways that were not recovered by the models. The size of circles corresponds to the pathway abundance information.

Figure 11: Comparative study of predicted pathways for the HOTS 500m dataset between PathoLogic, mlLGPR, and leADS (with random sampling, full data, and four acquisition functions). Black circles indicate pathways predicted by the associated models while grey circles indicate pathways that were not recovered by the models. The size of circles corresponds to the pathway abundance information.