Multiomics and quantitative modelling disentangle diet, host, and microbiota contributions to the host metabolome

Dietary nutrients, host metabolism, and gut microbiota activity each influence the host’s metabolic phenotype; however, the interplay between these factors remains poorly understood. We employed tissue-resolved metabolomics in gnotobiotic mice carrying a synthetic human gut microbiota and germfree mice in two dietary conditions to develop an intestinal flux model that quantifies diet, host, and bacterial contributions to the levels of 2,700 intestinal metabolites. While diet was the main factor affecting metabolite profiles, we identified 1,117 potential microbial substrates and products in the gut. By integrating metagenomics and metatranscriptomics data into genome-scale enzymatic networks, we linked 202 potential substrate-product pairs by a single enzymatic reaction. We further identified bacterial species and enzymes that can explain the differential abundance of 13% of the identified microbial products between the mouse groups. This quantitative modelling approach paves the way for controlling an individual’s metabolic phenotype by modulating their gut microbiome composition and diet.


Introduction
The gut microbiota comprises hundreds of microbial species residing along the intestine 1 , which come into contact with dietary and host-produced compounds that transit through the gastrointestinal tract during digestion. Although diet rapidly and reproducibly changes microbiota composition 2,3 , many microbial species are repeatedly detected in the gut for months, years and decades, indicating that microbes can adapt their metabolism to changing nutrient environments [4][5][6][7][8] . Some metabolic changes are restricted to the gut, with potential impacts on other microbes and gut epithelial cells 9,10 . Other microbe-derived metabolites are absorbed from the intestine and distributed systemically through blood circulation so that they can be detected in serum 11,12 and peripheral organs 13 , such as liver and brain [14][15][16] . Microbiota-produced metabolites span a broad range of chemical classes 17,18 , including amino acids and peptides, fatty acids and lipids 9,19 , glycans 20,21 , simple and complex polysaccharides 22 , bile acids 23 , hormones and neurotransmitters 14 .
Metabolites of different origin across tissues collectively define a host metabolic phenotype that can impact diverse aspects of health and disease risk.
Understanding the diet-microbiome relationship is essential for efforts to design dietary, microbial, or other interventions aimed at a desired beneficial metabolic phenotype. With the increasing availability of microbiome and metabolome data in humans, it becomes possible to associate metabolite variation with microbial abundance [24][25][26] . However, mechanistic insights into microbe-metabolite links are challenged by many confounding factors, including variations in host genotype, diet, environmental exposure, and others 27,28 . Although it is in principle possible to modulate the gastrointestinal and circulating metabolite pools through diet and microbiome interventions [29][30][31][32] , it remains challenging to predict these interactions because the contributions of diet, host, and microbiota to metabolite levels remain mostly undefined.
Here we combine gnotobiotic mouse experiments with dietary interventions and multiomics data integration to disentangle diet, microbiota and host contributions to the metabolic profiles in the host. Using six complementary mass spectrometry protocols, we measured 57,340 metabolic features corresponding to 4,649 annotated metabolites and classified them based on their distribution profiles across tissues. We developed a coarse-grained intestinal metabolic flux model, which quantifies diet, host and microbiome contributions to metabolite abundance in the gastrointestinal tract (GIT). We further explore metagenome-wide enzymatic reaction networks to identify candidate metabolite substrate-product pairs and bacterial enzymes potentially responsible for their conversion.
This approach for integrating spatial metabolomic measurements with quantitative mathematical modelling and multiomics integration can be applied across diets, perturbations, host genetic backgrounds and microbiome compositions to ultimately identify decisive factors affecting metabolite levels and inform rational design of microbiota therapies.

Diet influences both microbiota composition and gene expression.
To disentangle relationships between diet, host and microbiota, we conducted a controlled diet and microbiota experiment in a gnotobiotic mouse model. We colonized germfree C57BL/6 mice with a synthetic, defined community of 18 genome-sequenced human gut bacteria that represent four major phyla common in the human gut and were previously successfully used in synthetic communities 29,33 . This community encodes a diverse metabolic potential, including breakdown of complex polysaccharides, amino acid fermentation, and acetogenesis 29 . […] sequencing reached an average depth of 2.5 million reads (~750 Mb) and 5 million reads (~1,500 Mb) across samples, respectively. This allowed us to stably recover reads unique to 14 of the 18 species from the defined community with a high coverage of bacterial genomes (average of 90% (metagenomics) and 80% (metatranscriptomics) of genes); missing species were below detection at our limit of resolution (<0.25%, Figure 1b), and were likely outcompeted by the other community members. We also recovered sequences that mapped to Lactococcus lactis, which was not included in the defined community but is known to be present, though nonviable, in autoclaved casein-rich diets 35 (Supplementary Table 3) […]
Taken together, our controlled diet and microbiota experiment provides high coverage data for microbiota abundance and gene expression, and establishes that previously reported diet-specific changes in microbiota composition are recapitulated in this model system.

Diet induces transcriptional changes in all bacterial species in the community.
Differential analysis of transcriptional profiles of the 14 stably detected species revealed that all species changed their gene expression between the two diets, with some species differentially expressing up to 10% of their genes (abs(log2(fold change)) ≥ log2(1.5), Table 5). Fatty acid biosynthesis and metabolism genes were upregulated on HCD compared to HFD by representatives of Bacillota, suggesting that on HCD, these species need to synthesize fatty acids that they may be able to obtain directly from the HFD. This hypothesis is supported by previous reports that representatives of Bacillota, and specifically Clostridiaceae, are capable of uptake and utilization of long-chain fatty acids 36,37 . On HFD, most species upregulated amino acid biosynthesis pathways, possibly indicating that amino acids are less available for uptake in HFD compared to HCD. Most species also upregulated genes involved in thiamine metabolism (Figure 1f), which might reflect the increased need to synthesize thiamine on HFD compared to HCD. Notably, thiamine is supplemented to the latter diet (Supplementary Figure 1c). Consistent with previous reports 38 , we observed Bacteroidota-specific regulation of glycan degradation on HFD, suggesting that these bacteria shift from plant glycan to mucosal glycan foraging on HFD. Together, these results demonstrate an orchestrated general microbial response to the two different diets, and point to some phylum-specific adaptations linked to fatty acid, polyketide sugar unit or glycan metabolism.

Levels of metabolites in the host are tissue-, diet- and microbiota-dependent.
Next, we investigated the impact of diet and microbiota on metabolite levels across different tissues, employing six complementary protocols for untargeted metabolomics analysis. Altogether, we detected 57,340 unique metabolic features, 4,649 of which we could annotate using the Kyoto Encyclopedia of Genes and Genomes (KEGG) 39

Spatial profiles of metabolites across tissues are consistent across diet and microbiota conditions.
To identify sources of different metabolites, we next set out to investigate tissue-specific metabolite profiles starting with the GIT. For each experimental condition, we normalized metabolite intensities across the six sections of the GIT and performed unsupervised k-means clustering of the normalized metabolite profiles. For each of the four conditions, the clustering revealed six distinct clusters characterized by peak metabolite intensity in one of the six sections of the GIT (Figure 2c, Supplementary Figure 4a, b). These clusters are linked to host physiology; for example, the amino acid tyrosine is assigned to Cluster 1 with peak intensity in duodenum, consistent with its absorption from the small intestine as a dietary nutrient (Figure 2d). Androstanediol glucuronide, assigned to Cluster 2 with peak intensity in jejunum, and taurolithocholate, assigned to Cluster 3 with peak intensity in ileum, are known host metabolites produced in the liver, and are secreted to the small intestine via the biliary duct. A carnitine derivative isovalerylcarnitine assigned to Cluster 4, a fatty acyl cohibin assigned to Cluster 5, and cholesterol derivative 4α-carboxy-5α-cholesta-8-en-3β-ol assigned to Cluster 6, with peak intensities in cecum, colon and feces, […]
Metabolites detected in serum and liver were highly overlapping: 833 metabolites were detected in both tissues, corresponding to 67% of serum and 48% of liver metabolites (Supplementary Figure 3a). By contrast, serum and liver samples shared between 49% and 37% of metabolites with GIT samples, with the number of shared metabolites decreasing from ileum to feces, as expected given that intestinal uptake, and thus the exchange of metabolites between the intestinal lumen and serum, declines from ileum to feces.
Metabolites belonging to chemical classes of glycerophosphoethanolamines and terpene glycosides were shared between liver and serum, but not GIT, while o-methylated flavonoids, steroidal glycosides, and glycerophosphoserines were either serum- or liver-specific independent of diet or mouse colonization state (Supplementary Figure 4c).
Together, these results demonstrate that metabolite profiles across tissues reflect expected host physiology, and that the experimental parameters (germfree or defined community, HCD or HFD) do not globally disrupt the spatial distribution of metabolites in host tissues.

Development of intestinal flux model to assess sources of metabolites in the GIT.
While tissue distributions of metabolites were consistent across the four mouse groups, we next set out to compare metabolite levels in each tissue across conditions. Both diet and microbiota colonization affected the levels of several hundred metabolites, with the largest number of differentially abundant metabolites detected in the large intestine […]
Because metabolite profiles were generally consistent with expected host physiology, we reasoned that they should reflect diet, host, and microbiota contributions to metabolite levels in the GIT, and set out to establish a coarse-grained intestinal flux model that describes these processes. In each GIT compartment, we described the changes in […]
Since the spatial metabolite profiles were highly similar across these ad libitum-fed animals, we assumed that the measurements reflect a pseudo-steady state, i.e., […]

Intestinal flux model distinguishes sources of metabolite abundance in the GIT.
To quantify the relative contributions of diet, host and microbiota to a metabolite profile in the GIT, we estimated the model parameters and normalized the values of the eight condition-dependent metabolic flux parameters to the largest absolute value. While we did not explicitly include diet as a factor in our model, we reasoned that metabolites whose primary source is diet are characterized by large negative host metabolism parameters in the small intestine (f_SI_host), corresponding to host consumption or absorption in this tissue. Metabolites most influenced by host production are characterized by large positive host metabolism parameters in the small and/or large intestine (f_SI_host, f_Cecum_host, f_Colon_host), whereas metabolites whose levels are affected by microbiota production will have large positive microbiota metabolism parameter values (f_LI_microbe).
To test this approach, we first examined metabolites whose sources are known. For example, the major contributors to the profile of the amino acid glutamate are the negative host metabolism parameters in the small intestine (Figure 3d), suggesting that diet is the main source of glutamate in the GIT. The model also estimates negative bacterial metabolism parameters in the large intestine, suggesting that bacteria consume residual glutamate from the GIT. Propionate provides a second example: according to the model, its profile is mainly affected by bacterial production (positive bacterial metabolism parameter on both diets (Figure 3d)), consistent with reported production of propionate by the gut microbiota [41][42][43] . In a third example, the profile of L-octanoylcarnitine is influenced by host production in the large intestine, consistent with it being a product of fatty acid oxidation by the host. Similar to glutamate, negative microbial metabolism parameters indicate that bacteria consume this metabolite in the large intestine (Figure 3d). For each of these metabolites, metabolite profiles restored by solving the reverse problem correlated highly with the original metabolite measurements (PCC ≥ 0.93, Figure 3e).
Together, these results demonstrate that the coarse-grained intestinal flux model can disentangle relative contributions of diet, host and microbiome to the metabolite profiles, which we next set out to systematically assess for all metabolites detected in the GIT.

Diet is the major factor affecting metabolite profiles in the GIT, followed by host and microbiota metabolism.
We applied the intestinal flux modelling approach to all 3,716 annotated metabolites detected in the GIT. For 2,700 (73%) of these metabolites, the identified parameters could reproduce measured metabolite levels in the reverse problem (PCC ≥ 0.7). Random shuffling of these parameters removed this correlation (Figure 4a).
To get an overview of the predicted factors contributing to each of the metabolite profiles, we performed hierarchical clustering of the normalized values of the eight metabolic flux parameters, which allowed us to define 12 distinct metabolite groups characterized by the main contributing parameter (Figure 4b). Thus, Groups I and II contain metabolites with a high host metabolism parameter in the colon on HFD or HCD, respectively; Group III contains metabolites with a high negative host metabolism parameter in the small intestine on both diets; whereas Groups X and XI contain metabolites with a high positive microbial metabolism parameter on HCD and HFD (Figure 4b). Group III, characterized by the negative host metabolism parameter in the small intestine, is the largest of these groups, containing 38% (1,028/2,700) of the modelled metabolites; this suggests that diet is the major factor affecting metabolite profiles in the GIT. About one third of the analysed metabolites are potentially produced by the host, since they belong to the clusters with high host metabolism parameters in the small intestine (Groups IV and XII, in total 5%, or 135/2,700 metabolites), or in the large intestine (Groups I, II, VIII and IX, in total 28%, or 768/2,700 metabolites). We found that 9% of metabolites (244/2,700) belong to the clusters representing models in which microbial metabolism was the highest contributing parameter (Groups X and XI), suggesting that these metabolites are produced by (or in response to) gut bacteria.
To systematically characterize metabolites affected by different factors, we performed […] from Group X (characterized by a high microbial metabolism parameter) were enriched in organic acids, benzenoids, carbonyl compounds and polyketides, which is in line with prior reports of microbiota-associated compounds [44][45][46][47] .
This approach focuses on the largest model-identified parameter to highlight primary factors contributing to a metabolite profile in the GIT. However, we observed that for many metabolites (including glutamate and L-octanoylcarnitine), for which host parameters have the largest value, microbiota parameters can also have large values, indicating that both host and microbiota can jointly influence the profiles of these metabolites. Therefore, we decided to investigate metabolites whose profiles are potentially affected by microbiota activity in more detail.

Linking microbiota-associated metabolites with bacterial activities across conditions.
To investigate potential microbiota contributions to the metabolite profiles in the GIT, we inspected microbial metabolism parameters estimated by the intestinal flux model. We defined potential microbial substrates as those for which the normalized microbial metabolism parameter is ≤ -0.5 (a negative value that is at least 50% of the absolute value of the largest parameter for that metabolite), and potential microbial products with microbial metabolism parameters ≥ 0.5. In total, 547 metabolites in the GIT were defined as potential microbial substrates, and 570 as potential microbial products (Figure 5a). To test whether potential substrates and products can be linked by metabolic activity, we searched for enzymatic reactions that connect them in the KEGG database 39 (Figure 5b).
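The thresholding rule described above can be illustrated with a minimal Python sketch. It assumes that each metabolite comes with a dictionary of its eight fitted flux parameters; the key names (for example `f_LI_microbe_HCD`) and the numeric values are hypothetical and chosen only for illustration, and the sketch is not the authors' implementation.

```python
def normalize_parameters(params):
    """Scale flux parameters by the largest absolute value among them."""
    largest = max(abs(v) for v in params.values())
    if largest == 0:
        return {name: 0.0 for name in params}
    return {name: v / largest for name, v in params.items()}


def classify_microbial_role(params, microbe_keys, threshold=0.5):
    """Return the set of roles ('substrate', 'product') for one metabolite.

    A metabolite is a potential substrate if any normalized microbial
    parameter is <= -threshold, and a potential product if any is >= threshold.
    """
    norm = normalize_parameters(params)
    roles = set()
    for key in microbe_keys:
        if norm[key] <= -threshold:
            roles.add("substrate")
        if norm[key] >= threshold:
            roles.add("product")
    return roles


# Hypothetical parameter values for a glutamate-like metabolite
# (illustrative numbers, not taken from the study).
example = {
    "f_SI_host_HCD": -4.0, "f_SI_host_HFD": -3.0,
    "f_Cecum_host_HCD": 0.5, "f_Cecum_host_HFD": 0.2,
    "f_Colon_host_HCD": 0.1, "f_Colon_host_HFD": 0.3,
    "f_LI_microbe_HCD": -2.5, "f_LI_microbe_HFD": -1.0,
}
microbe_keys = ["f_LI_microbe_HCD", "f_LI_microbe_HFD"]
print(classify_microbial_role(example, microbe_keys))
```

In this toy case the normalized microbial parameter on HCD is -2.5/4.0 = -0.625, so the metabolite is flagged as a potential microbial substrate even though the largest parameter is the host term in the small intestine.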
We found 202 potential substrate-product pairs that can be connected by a single enzymatic reaction, and 1,503 pairs and 5,263 pairs that can be connected by two or three consecutive enzymatic reactions, respectively, suggesting that some of the identified substrate-product pairs can reflect microbial metabolic activity in the gut (Supplementary Table 9). Further, the levels of 21 (out of 547) potential substrates and 27 (out of 570) potential products were also significantly different between colonized and germfree mice in serum and/or liver, while in total 20% of systemic metabolites were affected by mouse colonization state (294/1,241 metabolites in serum and 160/1,733 metabolites in liver, Table 7).
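Counting the number of enzymatic steps between a potential substrate and a potential product amounts to a shortest-path search in a reaction network. The following Python sketch shows one way to do this with a breadth-first search; it assumes the network is given as a list of directed (reactant, product) compound pairs, and the compound identifiers used here are placeholders rather than real KEGG entries.

```python
from collections import deque


def shortest_reaction_steps(reaction_pairs, substrate, product, max_steps=3):
    """Return the minimal number of reactions linking substrate to product.

    reaction_pairs: iterable of directed (reactant, product) tuples.
    Returns None if no path with at most max_steps reactions exists.
    """
    graph = {}
    for reactant, prod in reaction_pairs:
        graph.setdefault(reactant, set()).add(prod)

    queue = deque([(substrate, 0)])
    seen = {substrate}
    while queue:
        node, steps = queue.popleft()
        if node == product and steps > 0:
            return steps
        if steps == max_steps:
            continue  # do not expand beyond the allowed path length
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None


# Toy linear pathway A -> B -> C -> D -> E (placeholder identifiers).
toy_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(shortest_reaction_steps(toy_pairs, "A", "B"))  # one reaction
print(shortest_reaction_steps(toy_pairs, "A", "D"))  # three reactions
print(shortest_reaction_steps(toy_pairs, "A", "E"))  # beyond max_steps
```

Treating reactions as directed is a simplifying assumption of this sketch; for reversible reactions both orientations would need to be added to the pair list.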
To investigate whether the levels of these metabolites can be directly linked to microbial activity, we performed correlation analysis between metabolite levels and relative species abundance, gene abundance detected with metagenomics, and gene expression detected with metatranscriptomics, taking advantage of the natural variation between individual animals. For the potential substrate-product pairs connected by a single enzyme, the correlation between product abundance and enzyme expression was the highest, followed by enzyme gene abundance and relative species abundance (median PCC = 0.34, 0.20 and 0.00, respectively, Figure 5c), suggesting that enzyme expression detected by metatranscriptomics can better explain the observed variation in metabolite levels than metagenomic measurements.
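The comparison of the three data layers rests on Pearson correlation coefficients computed across individual animals. A minimal, dependency-free Python sketch is given below; the per-animal values are invented for illustration and the layer names are hypothetical labels, not identifiers from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for constant input
    return sxy / math.sqrt(sxx * syy)


# Invented per-animal measurements for one product and one enzyme.
product_level = [1.0, 2.1, 2.9, 4.2, 5.0]
layers = {
    "species_abundance": [0.30, 0.28, 0.31, 0.29, 0.30],
    "gene_abundance_dna": [10.0, 12.0, 11.0, 14.0, 13.0],
    "enzyme_expression_rna": [5.0, 9.0, 14.0, 21.0, 24.0],
}
for name, values in layers.items():
    print(name, round(pearson(product_level, values), 2))
```

With real data, the same function would be applied to every product-enzyme pair and every species, and the resulting distributions of coefficients compared between layers.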
To identify which members of the synthetic community are affecting metabolite levels, we compared the correlations between the identified microbial products and the catalysing enzyme for each species. Among 66 potential products for which a potential substrate and bacterial enzyme is known, enzyme expression of at least one species significantly correlated with the levels of 13% (9/66) products (PCC p-value ≤ 0.1). We observed both general and species-specific correlations between products and catalysing enzymes. For example, enzyme expression of several species across phyla correlates with the abundance of taurine, suggesting that several species express bile salt hydrolases to cleave taurine from bile acids, as previously reported 48 . […]
This analysis identifies potential microbial substrates and products in the gut, which constitute 30% of detected metabolites (1,117/3,716), and suggests 202 direct metabolic links between 70 substrates and 66 products. Integrating metagenomics and metatranscriptomics data pinpoints candidate bacterial species and metabolic enzymes whose levels show significant associations with 13% (6/99) of potential microbial products, 50% of which (3/6) are also more abundant in systemic circulation of colonized mice. This approach can be generally applied to highlight metabolites most likely to be impacted by interpersonal differences in microbiota composition both in the GIT and at the systemic level.

Discussion
Separating diet, host and microbiome contributions to metabolite levels is a challenging task, especially since host and microbes share many metabolic capacities. Physiology-based pharmacokinetic models provide one strategy to disentangle host and microbiome contributions to metabolism of xenobiotics such as medical drugs 49,50 . However, this approach relies on drug and metabolite profiles collected across mouse tissues over a time-series following drug administration. For metabolites that may originate from diet, host, or microbiome, time-series measurements are not feasible to obtain because host and microbial metabolites are continuously produced. In this study, we found that spatial […] bacterial metabolism parameters as potential bacterial substrates or products, this approach cannot distinguish microbiome consumption or production from microbiome-dependent changes in the host consumption or production of these metabolites. We demonstrated that combining the modelling approach with other types of data reflecting microbial metabolic activity, such as metagenomics and metatranscriptomics, highlights metabolites that are more likely to be directly metabolized by the microbes.
Metatranscriptomics data has been underutilized in microbiome research due to experimental and computational challenges along with high processing costs 51 . Some studies comparing metagenomics and metatranscriptomics data from the same samples suggest that expression data may not provide much additional value to investigate microbiome function 29,52 . However, while metatranscriptomics and metagenomics estimates of species abundances are generally consistent in our study, we observed large discrepancies between relative RNA and DNA abundances in certain species (Supplementary Figure 2d). While we cannot exclude a technical explanation of this observation such as differences in RNA extraction efficacy or RNA degradation, our data suggests that at least for some species, metatranscriptomic and metagenomic contents of a microbial community can substantially vary. Further, our approach to integrate metatranscriptomics and metabolomics data based on the knowledge of enzymatic reactions connecting potential bacterial substrates and products demonstrated multiple cases in which the species-metabolite correlations produced from metagenomic data (species or gene abundances) were distinct from the correlations produced from metatranscriptomic data (enzyme expression). Notably, correlations of metabolite levels with gene abundances estimated from metagenomic data were distinct from correlations with species abundances, likely reflecting differences in genome replication. These results underline that while levels of some metabolites can be related to microbial species abundance, for others gene abundance is required to explain changes in metabolite levels, while in many cases information about enzyme expression best explains the differences in metabolite levels.
Finally, although our modelling and multi-omics integration framework mainly focussed on metabolites in the GIT, we found that many of these metabolites were detected in systemic circulation and affected by microbiota presence. This modelling approach can be generally applied to disentangle different factors affecting metabolite profiles in the host in other systems with synthetic or native microbial communities and across a range of diets, host genotypes, disease states, and other conditions. Understanding the relative contribution of different factors to metabolite levels will provide crucial information for the design of targeted intervention strategies and combinational therapies to modulate the host metabolic phenotype and improve human health and dietary response.

[…] Schematic representation for potential substrate-product pairs analysis, where enzymatic paths connecting potential substrates to potential products were extracted from the KEGG database, and the corresponding enzyme abundance and expression was calculated from metagenomics and metatranscriptomics data, respectively. c. Distributions of PCC between potential bacterial products and each of the three types of microbiota measurements: relative species abundance, gene abundance of the associated KEGG enzyme detected with metagenomics, or expression of the associated KEGG enzyme detected with metatranscriptomics, for all potential products connected by a single enzyme to a potential substrate. P-values correspond to nonparametric two-sided Wilcoxon signed rank test. d. Heatmap representing maximum PCC values between product levels and catalysing enzyme expression (RNA) for each species. Only potential products for which either correlation with enzyme abundance (DNA) or enzyme expression (RNA) is significant in at least one species (PCC p-value ≤ 0.1) are depicted. Gray color indicates that there is no catalysing enzyme annotated in the genome, white color indicates PCC ≤ 0.2. e. Heatmaps representing PCC values between metabolite levels and relative species abundance, gene abundance (DNA) or enzyme expression (RNA) for each detected species for selected metabolites. f,g. Normalized profiles of potential substrates and products connected by a single enzyme, and scatterplots depicting the relationships between potential product levels and each of the three types of microbiota measurements: relative species abundance, gene abundance (DNA), or enzyme expression (RNA) for selected species. P-value indicates significance of PCC (null hypothesis of non-correlation).

Acknowledgements
We thank the members of the Goodman lab and the Bork lab for helpful discussions;

Competing interests:
The authors declare no competing interests.

Gnotobiotic Animal Experiments
All experiments using mice were performed using protocols approved by the Yale University Institutional Animal Care and Use Committee in accordance with the highest scientific, humane, and ethical principles and in compliance with federal and state regulations, such as the Animal Welfare Act (AWA) and the Public Health Service (PHS).
Germfree (GF) 9-16 week old C57BL/6J mice were maintained in flexible plastic gnotobiotic isolators with a 12-hour light/dark cycle and GF status monitored by PCR and culture-based methods. Before the start of the diet experiment, individually caged animals (n = 5 per group, littermates of mixed sex were randomly assigned to experimental groups) were fed a standard, autoclaved mouse chow (5K67 LabDiet, Purina) ad libitum.

Mouse Colonization Experiments
For the colonization experiment, 1 mL aliquots of synthetic community mixture were thawed on ice, and 200 µL (~10^8 CFUs) were administered to each animal by oral gavage.
On the next day after gavage, standard mouse chow was switched to either irradiated […]
Four weeks after the start of diet administration, mice were euthanized. Liver, blood from the heart, and intestinal contents separated by section (duodenum, jejunum, ileum, cecum, colon, feces) were collected. Blood was kept on ice; all other samples were frozen on dry ice immediately. Blood samples were centrifuged for 5 min at 4°C and 2,500 × g, serum was transferred to 1.5 mL Eppendorf tubes, centrifuged again for 5 min, transferred into cryovials and stored at -80°C until further processing.

Metagenomics and Metatranscriptomics Sample Preparation and Sequencing
For DNA and RNA extraction, aliquots of ~100 mg of frozen cecal samples were thawed in 1 mL of RNAProtect, resuspended, incubated at RT for 2 minutes, and pelleted by centrifugation for 1 minute at 15,000 × g at 4°C. Nucleic acids were then purified as described 34 . Briefly, pellets were resuspended in a solution containing 250 µL of acid-washed glass beads (Sigma-Aldrich), 500 µL of extraction buffer A (200 mM NaCl, 20 mM EDTA), 210 µL of 20% SDS, and 500 µL of phenol:chloroform:isoamyl alcohol (125:24:1, pH 4.5; Ambion), and lysed in a bead beater (BioSpec Products) on the high setting for 4 min (2 min, 1 min on ice, 2 min). Cellular debris was removed by centrifugation (8,000 × g, 4°C, 5 min). The extraction was repeated, and the nucleic acids were precipitated with isopropanol and sodium acetate (pH 5.5). The samples were placed in a −80°C freezer for 7.5 min, then centrifuged at 12,700 rpm (~18,200 × g) and 4°C for 15 min. The supernatant was removed, 900 µL of cold 100% ethanol was added, and the samples were mixed, vortexed briefly, and centrifuged at 12,700 rpm and 4°C for 5 min. The supernatant was decanted, and the samples were dried in a speed-vac at ambient temperature for 5 min. The pellet was resuspended in 150 µL of nuclease-free water; 100 µL of the sample was used for RNA purification, and the remaining 50 µL for DNA purification.

Metabolomics Experiments
Extraction of solid tissue samples. 200 μL of 0.1 mm zirconia/silica beads (BioSpec Products) and 500 µL of organic solvent (acetonitrile:methanol, 1:1) were added to 50-350 mg of pre-weighed solid tissue material. Material was homogenized by mechanical disruption with a bead beater (BioSpec Products) set for 2 minutes on high setting at room temperature. After incubation for at least 1 h at -20°C, samples were centrifuged (3220 rcf, -9°C) for 15 min. 10 µL of supernatant were diluted with 10 µL H2O for analysis by LC-

Metagenomics and metatranscriptomics sequencing analysis
Bacterial abundance analysis. Species abundance analysis from metagenomics and metatranscriptomics samples was performed with MetaPhlAn 2 56 . For further analysis, bacterial species (filtered at the species taxonomic level, "|s"; "unclassified" species were removed) that were detected in at least 5 samples with abundance above 0.01% were selected.
Differential metabolite abundance analysis. For differential analysis of metabolomics data, quantile normalized metabolite intensities were compared between mouse groups in each diet condition, and between diet conditions within each mouse group with a two-sided t-test with equal variances (ttest2 function in MATLAB). P-values were corrected for multiple hypothesis testing by calculating the false-discovery rate (FDR) with the Benjamini-Hochberg procedure (mafdr function in MATLAB).
Spatial metabolite profile clustering. For spatial metabolite profile analysis, for each mouse group and dietary condition, mean metabolite intensity per GIT tissue was calculated, and mean intensities across the six GIT sections were normalized with the zscore function in MATLAB. To select the optimal k for k-means cluster analysis, the silhouette criterion was used in the evalclusters function in MATLAB (with parameters ('kmeans','silhouette', 'KList', 1:30) to test k values between 1 and 30) for each of the four conditions separately. To test robustness of cluster assignment, the k-means clustering procedure for the optimal k (k=6 for each group) was performed 100 times, and the final cluster assignment for each metabolite was chosen based on the most frequent cluster assignment. For serum and liver, three clusters were defined based on metabolite detection across tissues: Cluster 1 detected only in serum (or only in liver), Cluster 2 detected in serum and liver (or liver and serum), and Cluster 3 detected in serum and GIT (or liver and GIT).
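The Benjamini-Hochberg correction mentioned above can be sketched in a few lines of Python. This is a generic implementation of the standard step-up procedure, shown as an analogue of the MATLAB call named in the text rather than a reproduction of it; the input p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values from four hypothetical metabolite comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each adjusted value is the smallest FDR level at which the corresponding test would be called significant, so metabolites can be filtered by comparing these values with the chosen FDR threshold.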
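Two steps of the clustering procedure lend themselves to a short Python illustration: the z-score normalization of a profile across the six GIT sections, and the choice of a final cluster by majority vote over repeated runs. The sketch below uses the sample standard deviation, which matches the default behaviour of MATLAB's zscore, and it assumes that cluster labels have already been made comparable between runs (for example by naming each cluster after its peak section); that alignment step and the example values are assumptions, not details from the source.

```python
from collections import Counter
from statistics import mean, stdev

SECTIONS = ["duodenum", "jejunum", "ileum", "cecum", "colon", "feces"]


def zscore(profile):
    """Centre a profile and scale it by its sample standard deviation."""
    mu = mean(profile)
    sd = stdev(profile)
    if sd == 0:
        return [0.0] * len(profile)
    return [(value - mu) / sd for value in profile]


def peak_section(profile):
    """Name of the GIT section with the highest intensity."""
    return SECTIONS[max(range(len(profile)), key=lambda i: profile[i])]


def consensus_label(labels):
    """Most frequent cluster label across repeated clustering runs."""
    return Counter(labels).most_common(1)[0][0]


# Invented mean intensities of one metabolite along the six sections.
intensities = [900.0, 400.0, 250.0, 120.0, 100.0, 80.0]
print([round(z, 2) for z in zscore(intensities)])
print(peak_section(intensities))
# Invented labels from five repeated k-means runs (aligned beforehand).
print(consensus_label([1, 1, 2, 1, 1]))
```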

Pathway and chemical group enrichment analysis
For pathway enrichment analysis, KEGG pathway-gene assignments were extracted from the EggNOG annotation tables, and only pathways annotated in at least one species were analysed. For chemical group enrichment analysis for metabolites, chemical group information was downloaded from the HMDB database 40 . An enrichment procedure based on the gene set enrichment analysis (GSEA) method 66 was applied as described 67 . Significantly changing transcripts (|log2(fold change)| ≥ 1, FDR ≤ 0.05) were ranked either by fold change or by FDR, and enrichment p-values were calculated with the Fisher exact test for each subset of size varying from 1 to the total changing set size. For each pathway, the smallest p-value of all the subsets was retained, and p-values were adjusted for multiple hypothesis testing by calculating FDR with the Benjamini-Hochberg procedure. Chemical group enrichment analysis of metabolites belonging to different k-means clusters or different groups defined from hierarchical clustering of the model coefficients was performed in the same way without varying the set size.
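The subset-scanning enrichment described above can be sketched as follows in Python. The sketch assumes a one-sided (over-representation) Fisher exact test, which is equivalent to the upper tail of the hypergeometric distribution; whether the original analysis used a one- or two-sided test is not stated, so this choice, the function names, and the toy gene identifiers are all assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn from N, of which K are in the pathway."""
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


def min_pvalue_over_subsets(ranked_genes, pathway, background_size):
    """Smallest enrichment p-value over all top-n subsets of a ranked list.

    ranked_genes: changing genes ordered by fold change or FDR.
    pathway: genes annotated to the pathway (assumed to lie in the background).
    """
    pathway = set(pathway)
    best = 1.0
    hits = 0
    for n, gene in enumerate(ranked_genes, start=1):
        hits += gene in pathway
        p = hypergeom_upper_tail(hits, n, len(pathway), background_size)
        best = min(best, p)
    return best


# Toy example: 3 ranked changing genes, a 2-gene pathway, 4 background genes.
print(min_pvalue_over_subsets(["g1", "g2", "g3"], {"g1", "g2"}, 4))
```

In the toy example the top-2 subset contains both pathway genes, giving the smallest p-value of 1/6. The retained minimum p-values would then be adjusted across pathways with the Benjamini-Hochberg procedure, as stated in the text.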

Substrate-product enzyme path analysis
For KEGG substrate-product-enzyme network analysis, the KEGG reaction pair list, KEGG reaction pair, and KEGG compound list were downloaded from KEGG API […] Supplementary Table 8.

Data and code availability
The raw metagenomics and metatranscriptomics data have been deposited in the EMBL-

List of Supplementary Tables
Supplementary Table 1. List of 18 genome-sequenced human gut bacteria with metabolic characteristics that were used for community assembly.
[…] normalized metabolite profiles in the GIT of germfree mice consuming HFD and colonized mice consuming either HCD or HFD. c. Hierarchical clustering of the results of chemical class enrichment analysis of the metabolites belonging to each of the k-means clusters combined with clusters of metabolites detected in serum or liver only, overlapping between serum and liver, or present in serum and GIT or liver and GIT. Analysis was performed separately for each cluster in each of the four mouse groups. Only significantly enriched groups (FDR-adjusted p-value ≤ 0.1) are depicted. HFD, high-fat diet; HCD, high-carbohydrate diet; GF, germ-free mice; DC, mice colonized with the defined community.