Systems analysis of gut microbiome influence on metabolic disease in HIV and high-risk populations

Poor metabolic health, characterized by insulin resistance and dyslipidemia, is more prevalent in people living with HIV (PLWH) and has been linked with inflammation, anti-retroviral therapy (ART) drugs, and ART-associated lipodystrophy (LD). Metabolic disease is associated with gut microbiome composition outside the context of HIV but has not been deeply explored in HIV infection or in high-risk men who have sex with men (HR-MSM), who have a highly altered gut microbiome composition. Furthermore, the contribution of increased bacterial translocation and associated systemic inflammation that has been described in HIV-positive and HR-MSM individuals has not been explored. We used a multi-omic approach to explore relationships between gut microbes, immune phenotypes, diet, and metabolic health across ART-treated PLWH with and without LD; untreated PLWH; and HR-MSM. For PLWH on ART, we further explored associations with the plasma metabolome. Sixty-nine measures of diet, gut microbes, inflammation, and demographics were associated with impaired metabolic health, defined using fasting blood markers including lipids, glucose, and hormones. We found microbiome-associated metabolites associated with metabolic disease, including the microbially produced metabolites dehydroalanine and bacteriohopane-32,33,34,35-tetrol. Our central result was that elevated plasma lipopolysaccharide binding protein (LBP) was the most important predictor of metabolic disease in PLWH and HR-MSM, with network analysis of predictors showing that LBP formed a hub joining correlated microbial and immune predictors of metabolic disease. Our results suggest a role for inflammatory processes linked with bacterial translocation (measured by LBP), and for their interaction with dietary components and the gut microbiome, in metabolic disease among PLWH and HR-MSM.
Importance Statement
The role of the gut microbiome in the health of HIV-infected individuals is of interest because current therapies, while effective at controlling disease, still result in long-term comorbidities. Metabolic disease is prevalent in HIV-infected individuals even in well-controlled infection. Metabolic disease has been linked with the gut microbiome in previous studies, but little attention has been given to HIV-infected populations. Furthermore, integrated analyses that consider gut microbiome composition together with data on diet, systemic immune activation, metabolites, and demographic data have been lacking. By conducting a systems-level analysis of predictors of metabolic disease in people living with HIV and men who are at high risk of acquiring HIV, we found that increased LBP, an inflammatory marker indicative of compromised intestinal barrier function, was associated with worse metabolic health. We also found this relationship to be associated with dietary, microbial, and metabolic factors, suggesting a systemic gut microbiome influence on the presence of increased inflammatory markers which, in turn, influences the risk of metabolic disease. This work lays the framework for mechanistic studies aimed at targeting the microbiome and diet to prevent or treat metabolic endotoxemia in HIV-infected individuals.

The gut microbiome in people living with human immunodeficiency virus type 1 (HIV) (PLWH) is of interest as a potential contributor to infection, disease progression, and development of co-morbidities. Poor metabolic health characterized by insulin resistance and dyslipidemia is frequent in PLWH (1-3) and has been linked with chronic inflammation (4-7) and several anti-retroviral therapy (ART) drugs (8). Metabolic disease is particularly prevalent in HIV-positive individuals with lipodystrophy (LD), a disease linked with early ART drugs that is manifested by lipoatrophy in the face, extremities, and buttocks with or without visceral fat accumulation. LD can have a severe impact on the quality of life of PLWH and is associated with the development of diabetes and cardiovascular disease (9).

Metabolic disease has been linked with gut microbiome structure and function outside the context of HIV infection (10-14), but this relationship has not been explored deeply in PLWH. We and others have found an altered gut microbiome composition in both PLWH (15-17) and men who have sex with men at high risk of contracting HIV (HR-MSM) (16, 18).
Furthermore, we have demonstrated that the altered microbiomes in HIV (15) and HR-MSM (19) are pro-inflammatory in vitro and/or in gnotobiotic mice (15, 19). This is of interest as peripheral inflammatory signals have been implicated in both cardiovascular disease risk (7, 20) and insulin sensitivity (4, 5, 21-23) in PLWH. Increased peripheral immune activation in HIV-positive individuals is driven in part by bacterial translocation (24, 25), as indicated by higher levels of the bacterial product lipopolysaccharide (LPS) or LPS-binding protein (LBP) in blood. Increased blood LPS levels have also been observed in MSM and linked with recent sexual behavior (26). An association between LBP and metabolic disease in other diseases (e.g. hemodialysis patients) has been described (27); however, there are mixed data regarding a role in obesity-associated metabolic disease (28-30) [...]

We measured seven common clinical markers of metabolic health from fasting blood: triglycerides, glucose, insulin, LDL, HDL, leptin, and adiponectin. Since these markers are often correlated with each other, we used principal component analysis (PCA) to define a single continuous measure of overall metabolic health of study participants, as has been done previously [...] an outcome. Values of PC1 were shifted to a minimum of one and log transformed to define the metabolic disease score, which ranged from 0 (healthy) to 2.5 (impaired). To determine how this score related to metabolic health, we performed regressions between the metabolic disease score and individual measures to define a metabolic score threshold that corresponded with clinically defined cutoffs for normal levels (Supplemental Figure 1). For example, triglycerides positively correlated with metabolic disease score and almost all individuals with a score above 1.45 had triglyceride levels in the unhealthy range of greater than 200 mg/dL.
Similar patterns and cutoffs were true for HDL, LDL, and glucose (Supplemental Figure 1). The intersects of the regressions with these cutoffs were averaged to a single value of 1.4. Individuals below the cutoff were categorized as metabolically normal and those above were categorized as metabolically impaired.

When comparing the metabolic disease score across cohorts, we found that ART-treated, HIV-positive individuals with LD trended higher in both the average metabolic disease score and the proportion of individuals with scores in the metabolically impaired group but intergroup [...]

[...] percentages noted above groups are the percent of individuals with a score above our metabolic impairment cutoff (Supplemental Figure 1). There is no significant difference between the proportions in each group (Fisher's exact test, p = 0.11) or between mean ranks in each group (Kruskal-Wallis test, p = 0.13). C. Relationships between metabolic disease score and age stratified by cohort. Statistical significance of slopes is indicated and was calculated with the linear model: score ~ age + cohort + age*cohort. P-value annotations: ** < 0.01; * < 0.05.

To explore the complex relationships of the gut microbiome, peripheral immune activation, and diet to the metabolic disease score and to each other, we first selected features that were important predictors of the metabolic score using the tool VSURF (Variable Selection Using Random Forest) (35). The VSURF implementation of random forest is optimized for feature selection, returning all features that are highly predictive of the response variable, even when a smaller subset of highly predictive variables with redundant features removed could be just as accurate for prediction (35).
We input the following features into the VSURF tool: 1) 130 microbial features (99% identity Operational Taxonomic Units (OTUs) with highly co-correlated OTUs binned into modules as described in the methods (detailed in Supplemental [...] Table 4). These 69 features were sufficient to accurately predict metabolic disease score using traditional random forest (linear model: r^2 = 31.05%, p < 0.001). Additionally, permutation testing revealed that VSURF performed better at selecting explanatory variables than a null model where the outcome was randomly permuted (permutation test; p = 0.049, Supplemental Figure 2). We found that 21 of the 69 selected variables were positively or negatively correlated with the metabolic disease score, indicating either increased or decreased risk respectively (Spearman rank correlation, FDR p < 0.1, Supplemental Table 4). Since random forest can detect non-linear relationships and/or features that are only important when also considering another feature, it is not surprising that not all features were linearly correlated with the metabolic disease score.

All VSURF selected clinical measures were positively correlated with metabolic disease score and included age, BMI, lipodystrophy, and bloating (Supplemental Table 4). None of the six selected diet measures correlated with metabolic disease score (Supplemental Table 4). VSURF selected several inflammatory immune measures that were positively correlated with metabolic disease score: LBP, intercellular adhesion molecule 1 (ICAM-1), interleukin (IL) 16, IL-12, and granulocyte-macrophage colony-stimulating factor (GM-CSF) (Supplemental Table 4). The most important feature as determined by random forest importance score was LBP. [...] Table 5). This approach revealed many within-data-type associations, such as positive correlations within many selected dietary, microbiome, and immune features.
It also identified correlations between data types, such as a negative relationship between LBP and several Gram-negative bacteria or a positive correlation between age and immune measures such as LBP or IL-6 (Figure 3A, Supplemental Table 5).

The selected important microbes included many that positively or negatively correlated with the metabolic disease score (Supplemental Table 4) and that were highly co-correlated with each other and with dietary, clinical/demographic, and inflammatory phenotypes (Supplemental Figure 3A, Supplemental Table 5). For example, a module of bacteria identified within the Prevotella genus and the Paraprevotellaceae family was negatively associated with metabolic disease score and positively associated with dietary fiber (Supplemental Figure 3B). Because bacterial translocation is known to occur at increased levels in both HIV-positive individuals (36) and HR-MSM (26), we were specifically interested in looking at the selected microbes and other features that correlated with LBP, a marker of bacterial translocation. The network of LBP neighbors is shown in Figure 3B. All of the microbes correlating with LBP were classified to the order Clostridiales. More specifically, LBP was negatively correlated with several butyrate or putative butyrate producing bacteria/bacterial modules, such as OTUs in the genus Coprococcus (37, 38). LBP was also positively correlated with Dorea species (Supplemental Table 5).
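
The extraction of a hub's neighbors from a correlation edge table can be illustrated with a minimal sketch. It assumes an edge table with one row per feature pair (Spearman rho and FDR-corrected p value) and uses the FDR p < 0.25 threshold quoted in the network figure legend. The feature names, values, and function names below are hypothetical and are not study data or the authors' code.

```python
from collections import defaultdict

def build_network(edges, fdr_cutoff=0.25):
    """Return an adjacency map keeping only edges with FDR p below the cutoff."""
    adjacency = defaultdict(dict)
    for a, b, rho, fdr_p in edges:
        if fdr_p < fdr_cutoff:
            adjacency[a][b] = rho
            adjacency[b][a] = rho
    return adjacency

def neighbors(adjacency, hub):
    """Split a hub's neighbors into positively and negatively correlated lists."""
    positive = sorted(n for n, rho in adjacency[hub].items() if rho > 0)
    negative = sorted(n for n, rho in adjacency[hub].items() if rho < 0)
    return positive, negative

# Hypothetical edge table rows: (feature_a, feature_b, spearman_rho, fdr_p).
edges = [
    ("LBP", "Coprococcus_module", -0.31, 0.04),
    ("LBP", "Dorea_OTU", 0.28, 0.08),
    ("LBP", "age", 0.25, 0.12),
    ("age", "Dorea_OTU", 0.05, 0.80),  # fails the FDR cutoff, so no edge
]
network = build_network(edges)
pos, neg = neighbors(network, "LBP")
```

Separating positive from negative neighbors mirrors how the text reports LBP associations (negative with putative butyrate producers, positive with Dorea).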

In addition to correlations, we evaluated interactions between variables using the tool iRF (iterative Random Forest) (39). These interactions represent variables that are in adjacent nodes in a random forest tree in which the value of one influences the predictability of the other. Interactions between variables in more than 30% of the trees were kept for further analysis (Figure 3C). This analysis identified a group of 5 interactive features: LBP, age, BMI, an OTU in the Lachnospiraceae family, and microbiome module 24 (Coprococcus sp. and Blautia sp.). These features were also significantly correlated with LBP and suggest a subset of features that when taken together may be predictive of metabolic disease risk.

[...] (42)) to determine which of the measured metabolites could have been produced by the microbiome. Second, plasma from both germ-free (GF) and humanized mice was analyzed using metabolomics to determine metabolites that had significantly altered levels in mice with human microbiomes compared to GF mice, i.e. microbiome-influenced metabolites. For this purpose, GF mice were gavaged using fecal samples from eight men from the study cohort (humanized mice) while two mice were gavaged using PBS as control (Supplemental Table 6). Plasma was collected before and after gavage. All mice were fed a high-fat western diet.

We found that 820 metabolites were different in abundance between GF and humanized mice after multiple test corrections (Student's t-test, FDR p < 0.05), 493 of which were also present in the human plasma samples. However, only 376 of these 493 metabolites could be annotated (see Methods and Supplemental Table 7) while the remaining 148 were only assigned a mass. From the full set of 5,332 metabolites identified in the human plasma, 416 were able to be annotated with KEGG IDs. These were further analyzed using AMON. 146 microbiome-associated metabolites were identified that are putatively produced by the gut microbiome; however, many of these could also be produced by the host. Twenty-six of the 134 microbiome-associated metabolites identified by AMON were also identified in the gnotobiotic mouse analysis (Supplemental Table 7).
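
The "FDR p < 0.05" criterion used for these comparisons can be sketched as follows. The source does not state which FDR procedure was applied; this minimal example assumes the Benjamini-Hochberg step-up adjustment, and the p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values from per-metabolite t-tests (not study data).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
```

Note that a raw p value below 0.05 (here 0.03 and 0.04) need not remain significant after adjustment, which is why the corrected threshold is reported separately from the test itself.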

Of the 5,332 total measured metabolites in the human samples, 150 correlated with metabolic score (Spearman rank correlation, FDR p < 0.05; Supplemental Table 8 [...] Table 9). Consistent with the metabolic score being defined in part by dyslipidemia, 17 of the significant compounds were annotated as triglycerides. Of the 150 correlated metabolites, seven were associated with the microbiome either because they were predicted microbial products of the gut microbiome (as determined with AMON (40)), because they were significantly different when comparing the metabolome of germ-free mice to that of mice colonized with the feces of study participants, or by literature search. We confirmed the identity of five of these seven with MS/MS (Supplemental Table 8). Of these seven microbiome-associated metabolites, two could exclusively be explained by direct production by the microbiome. Specifically, dehydroalanine was identified as a microbial product with AMON and negatively correlated with metabolic disease score (Supplemental Table 8). Bacteriohopane-32,33,34,35-tetrol is a bacterial metabolite that positively correlated with the metabolic disease score. Two additional microbiome-associated compounds were triglycerides (TG(54:6) and TG(16:0/18:2/20:4)) that were positively correlated with metabolic disease score and elevated in humanized mice compared to GF mice. Another of these metabolites, 1-linoleoyl-2-oleoyl-rac-glycerol, is a 1,2-diglyceride in the triglyceride biosynthesis pathway. Finally, phosphatidylcholine (PC(17:0/18:2)) and phosphatidylethanolamine (PE(20:3/18:0)) compounds were identified as microbiome associated and were positively and negatively correlated with the metabolic disease score, respectively (44).

Discussion
In this study, we identified several bacterial, diet, and immune measures that predicted higher metabolic disease score in a cohort of MSM with and without HIV, ART, and LD. Notably, we identified a strong relationship between circulating LBP and higher metabolic disease score, which in turn correlated with other markers of systemic inflammation, a loss of beneficial microbes such as Gram-positive, butyrate-producing bacteria, and higher BMI, indicating that diverse modifiable factors may influence LPS/inflammation-driven metabolic disease in this population.

There was a positive association between metabolic disease score and age, as has been reported previously for non-HIV populations (45), but linear modeling suggested that this relationship was driven by an association in HIV-negative MSM and HIV-positive untreated MSM in our study, revealing a possibly larger effect size than in our other cohorts. Also, when controlling for age, HIV-negative and HIV-positive untreated MSM had the highest metabolic disease score, even compared to HIV-positive individuals on ART with LD, a population that has previously been reported to have higher incidence (46). This result is intriguing given our results supporting a role for LBP-driven inflammation in metabolic disease and prior research linking increased levels of LPS in blood with high-risk behavior in MSM (26). Larger cohorts and more detailed behavior information are required, however, to make any definitive claims on impaired metabolic health in ageing in HR-MSM.

Consistent with prior studies that have associated high BMI with dyslipidemia, insulin resistance, and/or metabolic syndrome (9, 47, 48), BMI was a positive predictor of metabolic disease score in our cohort even though our study excluded obese individuals but did include overweight individuals. This suggests the importance of weight management even among overweight, non-obese individuals as a strategy for reducing metabolic health impairment in this population.

We did not find a positive association between ART treatment status and metabolic disease score, but this may be because individuals in our study were on a wide variety of drug combinations with the potential to have varied/contrasting effects. For instance, both integrase strand transfer inhibitors (ISTI) (49) and regimens including the nucleoside reverse transcriptase inhibitor (NRTI) tenofovir have been shown to increase risk of weight gain (50). Conversely, the CCR5 antagonist maraviroc may confer a benefit to cardiovascular function and body weight maintenance, and evidence in mice suggests that this may be linked to differences in gut microbiome composition with treatment (21, 51). Thus, future studies more targeted to particular ART regimens will be required to look at factors important in particular drug contexts.

Several of the dietary components identified as important predictors of metabolic disease score in our cohort have been previously associated with metabolic health, including dietary carotenoid, lycopene, and fiber (52-56). Fiber's benefit in glucose response has been linked with the activity of Prevotella copri. Individuals who had improved glucose response upon 3 days of high-fiber diet consumption were characterized by a higher increase in P. copri (55), and beneficial effects of P. copri were confirmed in mice fed a high-fiber diet (55). Interactions between Prevotella, dietary fiber, and metabolic health were of particular interest in this cohort of HIV-positive and HIV-negative MSM since these individuals have much higher Prevotella, including P. copri, than non-MSM (16, 18). However, other published studies suggested that high Prevotella might predict increased risk of metabolic disease. One group observed that P.
copri in mice fed a western diet low in fiber could promote poor glucose response through the production of branched chain amino acids (BCAAs) (12). Additionally, our prior study using in [...] mediator of metabolic-disease associated immune phenotypes. These included 1) ICAM-1, whose expression in adipose tissue has been associated with diet-induced obesity in mice (59) and metabolic syndrome in humans (60); 2) IL-6, a pro-inflammatory cytokine that has been shown to play a direct role in insulin resistance (61); and 3) SAA, which is regulated in part by IL-6 and plays a role in cholesterol metabolism (60); SAA3 specifically has been shown to be produced in response to gut bacteria in obese mice (62). We observed a positive association between metabolic disease score and frequency of abdominal bloating, further supporting a role of intestinal dysfunction in this population. Taken together, these associations suggest that inflammation originating from an impaired intestinal barrier is promoting worse metabolic health.

Although prior studies have connected LBP-associated inflammation with worse metabolic health (27, 28, 63), the strength of this relationship is disease specific, with less clear results in obesity-associated metabolic disease (29, 30). The importance of bacterial translocation in HIV-associated metabolic syndrome was demonstrated in a recent study of metabolic comorbidities in HIV-positive individuals, which found that lower CD4 nadir and/or AIDS events, HIV-associated microbiota, and low alpha diversity were correlated with increases in sCD14 and LBP and increased risk of metabolic syndrome (31). Additionally, in our study LBP was correlated with age and BMI, a relationship that was previously observed in a cohort of HIV-negative men of African ancestry, with this trio being further associated with adiposity and pre-diabetes (64).
Lastly, the negative association of LBP with putative butyrate producing bacteria suggests that a lack of microbes that promote intestinal barrier integrity contributes to increased intestinal permeability and thus microbial components in circulating blood.

In our metabolomic analysis, we identified 150 metabolites in blood that correlated with metabolic disease score. In order to identify compounds whose prevalence may be related to the gut microbiome, we used two complementary approaches. First, we predicted which of these compounds could have been produced by the microbiome using information in KEGG and the bioinformatics tool AMON (40). Second, we measured which compounds changed in relative abundance in germ-free mice versus mice colonized with feces from our study cohort. The AMON analysis allows us to specifically evaluate which compounds could have been directly produced by the gut microbiome but is limited by a lack of KEGG annotations for many compounds (40). The gnotobiotic mouse experiments can identify microbial influence in unannotated compounds but cannot differentiate between direct production/consumption by microbes versus indirect influence. The results will also be influenced by physiological differences between mice and humans and the incomplete colonization of human microbes in humanized mice. Although these weaknesses may have led us to underestimate which of the 150 metabolic disease associated compounds may have been related to the microbiome, the approach still identified compounds that supported a mechanistic link between gut microbes, metabolites, and metabolic disease in HIV [...]

Thirdly, we identified a PC and a PE associated with both the microbiome and metabolic disease score. Changes in PCs and/or PEs have been previously implicated in atherosclerosis, insulin resistance, and obesity (44).
AMON analysis indicated that both PCs and PEs can be synthesized by intestinal bacteria; however, these compounds can also be synthesized in the host and may be found in the diet. In our analysis, PE(18:1/20:1) levels were higher in colonized compared to germ-free mice, indicating that intestinal bacteria do influence overall levels despite diverse potential sources.

Lastly, we observed increased levels of several plasma triglycerides in the humanized compared to germ-free mice, including two plasma triglycerides that were significantly associated with metabolic disease score. This confirms the influence of the gut microbiome on host plasma triglycerides (67-69). However, we did not find any strong associations between these triglycerides and specific microbes within our dataset, indicating a potential need for studies conducted in larger cohorts or with shotgun metagenomics to look for functional correlates.

[...] comparisons within the diet survey data, we binned highly co-correlating groups of measures within the data types into modules (Supplemental Table 3). These modules were defined using the tool SCNIC (72).

Immune Data Collection
Whole blood was collected in sodium heparin vacutainers and centrifuged at 1700 rpm for 10 minutes for plasma collection. Plasma was aliquoted into 1 mL microcentrifuge tubes and stored at -80 °C. For ELISA preparation, plasma was thawed, kept cold, and centrifuged at 2000 x g for 20 minutes before ELISA plating. Markers for sCD14, sCD163, and FABP-2 were measured from plasma using standard ELISA kits from R&D Systems (DC140, DC1630, and DFBP20). Positive testing controls for each ELISA kit were also included (R&D Systems QC20, QC61, and QC213).

Germ-free C57BL/6 mice were purchased from Taconic and bred and maintained in flexible film isolator bubbles, fed with standard mouse chow. Three days before they were gavaged, male mice between 5-7 weeks of age were switched over to a western high-fat diet and were fed this diet for the remainder of the experiment. Diets were all obtained from Envigo (Indiana): Standard chow: Teklad global soy protein-free extruded (item 2920X, https://www.envigo.com/resources/data-sheets/2020x-datasheet-0915.pdf); Western Diet: New Total Western Diet (item TD.110919). See Supplemental Table 10 for detailed diet composition. Mice were gavaged with 200 μL of fecal solutions prepared from 1.5 g of donor feces mixed in 3 mL of anaerobic PBS (19). Mice were housed individually following gavage for three weeks in a Tecniplast iso-positive caging system, with each cage having HEPA filters and positive pressurization for bioexclusion. Feces were collected from mice at day 21 for 16S rRNA gene sequencing. Mice were euthanized at 21 days post gavage using isoflurane overdose and all efforts were made to minimize suffering. Blood from euthanized animals was collected using cardiac puncture and cells were pelleted in K2-EDTA tubes; plasma was then aliquoted and stored at -80 °C.
The hydrophobic fractions were analyzed using reverse phase chromatography on an Agilent Technologies (Santa Clara, CA) 1290 ultra-high performance liquid chromatography (UHPLC) system on an Agilent Zorbax Rapid Resolution HD SB-C18, 1.8 µm (2.1 x 100 mm) analytical column as previously described (73, 74). The hydrophilic fractions were analyzed using hydrophilic interaction liquid chromatography (HILIC) on a 1290 UHPLC system using an Agilent InfinityLab Poroshell 120 HILIC-Z (2.1 x 100 mm) analytical column with gradient conditions as previously described (75) [...]

Microbiome-associated metabolites were defined using metabolites identified as significantly different in abundance between germ-free and humanized gnotobiotic mice and/or metabolites identified as microbially produced by the tool AMON (40).

For the gnotobiotic mouse analysis, aqueous and lipid metabolites were analyzed separately (see mouse protocol above for details on experimental set-up). Metabolites that were present in <20% of samples were filtered out before analysis. Significant difference was determined using a Student's t-test with FDR p-value correction. FDR-corrected p values < 0.05 were deemed significant. Significant metabolites also present in the human samples were retained for further analysis.

For the AMON-identified metabolites, the tool used an inferred metagenome, which was calculated using the PICRUSt2 QIIME2 plugin (42) [...]

Microbiome processing was performed using QIIME2 version 2018.8.0 (80). Samples were sequenced across five sequencing runs. Each run was demultiplexed and denoised separately using the DADA2 q2 plugin (81). Individual runs were then merged together and 99% de novo OTUs were defined using VSEARCH (82). Features were classified using the sklearn classifier in QIIME2 with a classifier that was pre-trained on GreenGenes13_8 (83).
The phylogenetic tree was built using the SEPP plugin (84). Features that did not classify at the phylum level or were classified as mitochondria or chloroplast were filtered from the analysis. Samples were rarefied to 19,986 reads. To reduce the number of comparisons within the microbiome, we binned highly co-correlating groups of measures within the data types into modules (Supplemental Table 1). These modules were defined using the tool SCNIC (72). For statistical analysis, features present in <20% of samples were filtered out.
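
The prevalence filter described above (dropping features present in fewer than 20% of samples) can be sketched as follows. This is a minimal illustration that assumes "present" means a non-zero count after rarefaction; the table layout, feature names, and function name are hypothetical.

```python
def filter_by_prevalence(table, min_prevalence=0.20):
    """Keep features detected (count > 0) in at least min_prevalence of samples."""
    kept = {}
    for feature, counts in table.items():
        prevalence = sum(1 for c in counts if c > 0) / len(counts)
        if prevalence >= min_prevalence:
            kept[feature] = counts
    return kept

# Hypothetical rarefied counts for three features across five samples.
table = {
    "OTU_a": [10, 0, 3, 7, 1],  # present in 4 of 5 samples
    "OTU_b": [0, 0, 0, 0, 5],   # present in 1 of 5 samples (exactly 20%), retained
    "OTU_c": [0, 0, 0, 0, 0],   # absent everywhere, dropped
}
filtered = filter_by_prevalence(table)
```

Filtering rare features before testing reduces the number of comparisons, which in turn makes the FDR correction less severe.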

Module definition
Modules were called on microbiome and diet data. Modules were defined using the tool SCNIC (72). The q2-SCNIC plugin was used with default parameters for the microbiome data and standalone SCNIC was used for the diet data (https://github.com/shafferm/SCNIC). Specifically, for each data type SCNIC was used to first identify pairwise correlations between all features. Pearson correlation was used for diet and SparCC (85) [...]

Metabolic disease score was calculated using PCA in R with prcomp. Data were scaled using the default method within prcomp. All random forest analysis tools were used in R. Standard random forest was performed using randomForest. Variable selection was performed in R using the tool VSURF (35). Interaction analysis was performed in R using the tool iRF (39).
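
The transformation from PC1 to the metabolic disease score described in the Results (shift PC1 to a minimum of one, log transform, then apply the 1.4 cutoff) can be sketched as follows. The sketch is in Python rather than the R used in the study, assumes a natural logarithm (the base is not stated in the text), and uses hypothetical PC1 values. The sign of PC1 is arbitrary in PCA, so the sketch also assumes PC1 is oriented so that higher values indicate worse metabolic health.

```python
import math

def metabolic_disease_score(pc1_values):
    """Shift PC1 so its minimum is 1, then log-transform (natural log assumed)."""
    shift = 1.0 - min(pc1_values)
    return [math.log(v + shift) for v in pc1_values]

def classify(scores, cutoff=1.4):
    """Label scores above the cutoff as impaired, otherwise normal."""
    return ["impaired" if s > cutoff else "normal" for s in scores]

# Hypothetical PC1 values for five participants (not study data).
pc1 = [-2.0, -0.5, 0.0, 1.5, 6.0]
scores = metabolic_disease_score(pc1)
labels = classify(scores)
```

Shifting to a minimum of one before taking the log guarantees that the lowest score is exactly 0, matching the reported lower bound of the score range.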

Data Availability
All data will be publicly available upon publishing. Microbiome data are in QIITA (https://qiita.ucsd.edu), Study ID 13338, are available upon request, and will be publicly available in EBI/ENA (https://www.ebi.ac.uk/ena) upon publishing. Immune and diet data are available along with the microbiome data as associated metadata. Metabolomics data will be available on Metabolomics Workbench (https://www.metabolomicsworkbench.org) upon publishing. Until publicly available, they are available upon request.

Statements
AJSA analyzed and interpreted all data. AJSA and CAL wrote the manuscript. NR guided generation and interpretation of metabolomics data. KQ and KAD prepared, ran, and processed metabolomics. KQ ran metabolic pathway analysis. SXL prepared and conducted mouse experiments. JMS ran immunological assays. NMN prepared and ran sequencing and coordinated fecal sample and metadata collection from study subjects. SF recruited subjects, collected samples, and maintained regulatory compliance. TJM and JH collected and aided in interpretation and processing of diet data. CAL, BEP, and TC conceptualized and led the study. TC guided all clinical data collection and subject recruitment and provided clinical insight into study populations. BEP guided generation and interpretation of immune data. CAL guided microbiome data generation and multi-omic data analysis. All authors read and approved the final manuscript.

Supplemental Tables
Supplemental [...] Metabolic disease score was permuted 1,000 times and passed through VSURF. The resulting variables were run through a standard random forest and the percent variation explained was calculated. The blue line represents the percent variation explained for the true VSURF. P value was calculated using a one-tailed test.

[...] Spearman rank correlations with an FDR p < 0.25 are shown. See Supplemental Table 5 for the edge table and Supplemental Table 4 for the node table.
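
The one-tailed permutation p value mentioned in the permutation-test legend above can be illustrated with a minimal sketch. The exact formula used in the study is not given; this version assumes the common convention of adding one to both numerator and denominator, and the null values are hypothetical rather than study results.

```python
def one_tailed_permutation_p(observed, permuted):
    """Fraction of permuted statistics at least as large as the observed one.

    One is added to the numerator and denominator so the estimate is never
    zero (a common convention; assumed here, not taken from the source).
    """
    exceed = sum(1 for value in permuted if value >= observed)
    return (exceed + 1) / (len(permuted) + 1)

# Hypothetical percent-variation-explained values from permuted outcomes.
null_values = [5.0, 12.0, 8.0, 31.5, 20.0, 3.0, 15.0, 9.0, 27.0]
p_value = one_tailed_permutation_p(31.05, null_values)
```

With 1,000 permutations, as in the legend, the smallest p value this estimator can return is 1/1001.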