The Core Human Fecal Metabolome

Among the biomolecules at the center of human health and molecular biology is the metabolome, a system of molecules that defines the human phenotype. Through an untargeted metabolomic analysis of fecal samples from human individuals from Africa and the Americas (the birthplace and the last continental expansion of our species, respectively), we present the characterization of the core human fecal metabolome. The majority of detected metabolite features were ubiquitous across populations, despite geographic, dietary, or behavioral differences. Such shared metabolite features included hyocholic acid and cholesterol. However, any characterization of the core human fecal metabolome is insufficient without exploring the influence of industrialization. Here, we show chemical differences along an industrialization gradient, where the degree of industrialization correlates with metabolomic changes. We identified differential metabolite features like leucyl-leucine dipeptides and urobilin as major metabolic correlates of these behavioral shifts. Our results indicate that industrialization significantly influences the human fecal metabolome, but diverse human lifestyles and behavior still maintain a core human fecal metabolome. This study represents the first characterization of the core human fecal metabolome through untargeted analyses of populations along an industrialization gradient.

[...] was most associated with non-industrial populations, while urobilin abundance was strongly associated with industrialized populations (Figure 1d-e). Leu-leu is a leucine dipeptide previously recognized as a human metabolite in a study comparing fecal metabolomes of individuals with and without colorectal cancer, where leu-leu showed 99% prevalence across both control and colorectal cancer groups 36. While leu-leu has not been mentioned in previous industrialization-focused studies of human fecal metabolomes, increased abundance of leucine was noted in fecal metabolomes of urban Nigerian adults as compared to rural adults 28, contrasting with the non-industrial association of leu-leu in our data. The second annotated differential metabolite feature, urobilin, is formed from the metabolic breakdown of hemoglobin 37. While previous industrialization-focused fecal metabolomics studies did not report this metabolite, urobilin has been identified as a common metabolite in human urine and fecal metabolomes 38,39. Importantly, urobilin abundance is affected by host diet and behavior 40, [...]

[...] in animal brains, including humans 6,57). While a number of the shared metabolite features listed above provide key biological functions, some metabolites appear to be derived from dietary sources. One metabolite possibly acquired from food products is conjugated linoleic acid (m/z 263.24; RT 6.68 min; commonly found in meat and dairy products, also recognized for anti-inflammatory capabilities 6,58).

To explore associations between the core human fecal metabolome and gut microbiome profiles, Spearman's rho correlation coefficients were calculated for the core metabolites and identified microbial operational taxonomic units (OTUs) derived from clustering sequences. Moderate to strong correlations were noted between 604 core human fecal metabolites and gut microbe pairs (Figure 3; Table 3; Supplementary Table 3 [...] Bacteroidia classes (16% of total nodes) (Figure 3d), which, respectively, have reduced and increased abundance in industrialized populations 19,32. Indeed, urobilin had higher abundance in industrialized than in non-industrialized populations in our analysis, and most of its strong correlations were with Clostridia microbes while most of its negative correlations were with Bacteroidia. Bilirubin, which was enriched in industrialized populations, was negatively correlated with Bacteroidia. This pattern highlights interactions between the core human fecal metabolome and the gut microbiome, especially as they are influenced by processes like industrialization.
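As an illustration of the correlation step described above, the following is a minimal sketch of Spearman's rho computed as a Pearson correlation on average ranks. It is written in standard-library Python for portability and is not the authors' code; the function names and the abundance values are hypothetical and chosen only to show the calculation.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Assumes both inputs have the same length and are not constant
    (a constant vector has zero rank variance and rho is undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical abundances across five samples (illustrative values only).
metabolite = [10.0, 20.0, 30.0, 40.0, 50.0]
taxon_up = [0.1, 0.4, 0.5, 0.9, 1.5]    # rises with the metabolite
taxon_down = [5.0, 4.0, 3.0, 2.0, 1.0]  # falls as the metabolite rises

rho_up = spearman_rho(metabolite, taxon_up)
rho_down = spearman_rho(metabolite, taxon_down)
```

Because rho depends only on rank order, it is insensitive to the very different scales of metabolite intensities and taxon relative frequencies, which is one reason it suits metabolite-microbe pairs.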

Our novel data thus represent a core human fecal metabolome from populations of diverse behaviors and lifestyles, yet we do not presume to have captured the range of diversity of industrial lifestyles or age groups seen in international metabolome initiatives. To broaden our analysis, we co-analyzed our data with a total of 1,286 samples from ten public fecal metabolome datasets 47,49,60-64 (Supplementary Table 4), using the Re-Analysis of Data User Interface (ReDU) 39. These datasets contained samples from male and female children and adults. Eight of the datasets consisted of samples collected from the United States, one contained samples from Venezuela, and one dataset did not report the geographic origin of its samples. Furthermore, the datasets included different MS platforms and different metabolite extraction methods, enabling us to assess the commonality of these metabolites across experimental methods. Indeed, every annotated core metabolite (Supplementary Table 2) was detected in this co-analysis, but only 31% were identified in every selected dataset. Such shared annotated molecules include palmitelaidic acid, urobilin, lithocholic acid, and cholesterol. We also examined the human fecal metabolome database (HFMDB) 65, which contains 6,810 metabolites identified across multiple datasets, for our annotated core metabolite features. Of our annotated core metabolite features, 65% were present in the HFMDB (Supplementary Table 2); examples of identified metabolites also found in the HFMDB include palmitoleic acid, hypoxanthine, and xanthosine. However, it should be noted that the HFMDB comprises data derived from various instrumental, analytical, and processing methods 65. The absence of some of our core metabolites from the HFMDB can be attributed to these methodological differences.
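The two detection rates reported above (detected somewhere in the co-analysis versus detected in every dataset) can be sketched as simple set membership counts. The snippet below is an illustrative Python sketch, not the ReDU workflow; the metabolite labels, dataset names, and memberships are hypothetical placeholders and do not reproduce any reported result.

```python
def detection_summary(core, datasets):
    """Return (fraction found in at least one dataset, fraction found in all).

    core: list of metabolite identifiers.
    datasets: mapping of dataset name -> set of identifiers detected in it.
    """
    detected_sets = list(datasets.values())
    in_any = [m for m in core if any(m in d for d in detected_sets)]
    in_all = [m for m in core if all(m in d for d in detected_sets)]
    return len(in_any) / len(core), len(in_all) / len(core)


# Hypothetical placeholders: four core metabolites and three public datasets.
core = ["metabolite_1", "metabolite_2", "metabolite_3", "metabolite_4"]
datasets = {
    "dataset_A": {"metabolite_1", "metabolite_2", "metabolite_3", "metabolite_4"},
    "dataset_B": {"metabolite_1", "metabolite_2", "metabolite_3"},
    "dataset_C": {"metabolite_1", "metabolite_2"},
}

frac_any, frac_all = detection_summary(core, datasets)
```

In this toy input every metabolite is seen in at least one dataset, but only half are seen in all three, which mirrors the distinction the text draws between the two rates.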

While we were able to reveal the core human fecal metabolome, only 6.1% of our complete dataset had putative compound-level annotations (level 2 according to the metabolomics standards initiative 66). Fifteen of these were validated using standards, enabling [...]

The sample preparation protocol used for this project was adapted from a global metabolite extraction protocol with proven success 67. Samples were thawed and 500 μl of chilled LC-MS grade water (Fisher Scientific) was added to 50 mg of fecal material. Next, samples were homogenized in a TissueLyzer at 25 Hz for three minutes. Following homogenization, chilled LC-grade methanol (Fisher Scientific) spiked with 4 μM sulfachloropyridazine as the internal standard (IS) was added, bringing the total concentration to 50% methanol. Samples were homogenized again in the TissueLyzer at 25 Hz for three minutes, followed by overnight incubation at 4 °C. The next day, samples were centrifuged at 16,000 × g at 4 °C for ten minutes. Aqueous supernatant was then removed and dried using a SpeedVac vacuum concentrator. Dried extracts were frozen at -80 °C until the day of MS analysis. Immediately prior to MS analysis, extracts were resuspended in 150 μl chilled LC-MS methanol:water (1:1) spiked with 1 μg/ml sulfadimethoxine as a second IS. After resuspension, samples were diluted to a 1:10 ratio. Diluted samples were sonicated using a Fisher Scientific Ultrasonic Cleaning Bath at maximum power for ten minutes.

[...] Scientific) with 0.1% formic acid. The elution gradient started at 5% Solvent B for one minute, increased to 100% Solvent B until minute nine, held at 100% Solvent B for two minutes, dropped to 5% Solvent B over 30 seconds, and held at 5% Solvent B for one minute as re-equilibration.
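The elution gradient above can be written as a small table of breakpoints. The sketch below, in Python, assumes that each change in Solvent B is a linear ramp between the stated time points (the text gives the endpoints but not the ramp shape), which yields a total program of 12.5 minutes. The names `GRADIENT` and `percent_solvent_b` are hypothetical.

```python
# (time in minutes, % Solvent B) breakpoints taken from the description;
# linear interpolation between breakpoints is an assumption.
GRADIENT = [
    (0.0, 5.0),     # start at 5% B
    (1.0, 5.0),     # hold for one minute
    (9.0, 100.0),   # increase to 100% B by minute nine
    (11.0, 100.0),  # hold for two minutes
    (11.5, 5.0),    # drop to 5% B over 30 seconds
    (12.5, 5.0),    # re-equilibrate for one minute
]


def percent_solvent_b(t):
    """Return % Solvent B at time t (minutes) under the assumed linear ramps."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


midpoint_of_ramp = percent_solvent_b(5.0)  # halfway through the 1-9 min ramp
```

Under these assumptions the composition halfway through the ramp (minute five) is 52.5% Solvent B, and the same value is crossed again at 11.25 minutes during the return to starting conditions.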

Samples were injected in random order with an injection volume of 5 μl. After elution, electrospray ionization was conducted with spray voltage of 3.8 kV, auxiliary gas flow rate of 10 L/min, auxiliary gas temperature at 350 °C, sheath gas flow rate at 35 L/min, and sweep gas flow at 0 L/min. Capillary temperature was 320 °C and S-lens RF was 50 V.

Each pure standard was diluted to 100 μM, 50 μM, 10 μM, 5 μM, and 1 μM concentrations to [...]

[...] filtering, additional processing was done in R 71 to remove any features that were not found in at least six samples from each population. The resulting files were also analyzed in GNPS as described above.
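The prevalence filter described above (keep a feature only if it is found in at least six samples from every population) can be sketched as follows. This is an illustrative Python version, not the authors' R script. It assumes that "found" means a nonzero intensity, and the table layout, function name, and population labels are hypothetical.

```python
def filter_features(table, sample_population, min_samples=6):
    """Keep features detected in >= min_samples samples of every population.

    table: mapping feature -> {sample: intensity}.
    sample_population: mapping sample -> population label.
    A feature counts as detected in a sample when its intensity is > 0
    (an assumption; the source does not define detection).
    """
    populations = set(sample_population.values())
    kept = {}
    for feature, intensities in table.items():
        counts = dict.fromkeys(populations, 0)
        for sample, value in intensities.items():
            if value > 0:
                counts[sample_population[sample]] += 1
        if all(count >= min_samples for count in counts.values()):
            kept[feature] = intensities
    return kept


# Hypothetical layout: two populations with six samples each.
samples_a = ["A%d" % i for i in range(6)]
samples_b = ["B%d" % i for i in range(6)]
sample_population = {s: "pop_A" for s in samples_a}
sample_population.update({s: "pop_B" for s in samples_b})

table = {
    # Detected in all twelve samples: passes the filter.
    "feature_1": {s: 100.0 for s in samples_a + samples_b},
    # Missing from one pop_B sample (five of six detections): removed.
    "feature_2": {s: (0.0 if s == "B0" else 50.0) for s in samples_a + samples_b},
}

kept = filter_features(table, sample_population)
```

Requiring the threshold in every population, rather than across the pooled samples, prevents a feature that is abundant in one population from being retained when it is nearly absent in another.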

For 16S rRNA gene sequencing data, we used AdapterRemoval v2 76 to filter out sequences < 90 bp in length. QIIME1 77 was used to perform closed-reference OTU picking using the EzTaxon database 78 as a reference. For OTU picking, the maximum number of database hits per sequence was eight and the maximum number of rejects for a new OTU was 12. After creating biom files, each sample file was rarefied to a depth of 10,000. Generated taxa summaries were limited to genus-level identifications. Only taxa with >0.5% relative frequency were included for correlation analyses. [...]

Principal coordinate analysis (PCoA) plots were created using Canberra distance metrics from Quantitative Insights Into Microbial Ecology 2 (QIIME2) 79 and visualized using Emperor 80.
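For reference, the Canberra distance between two samples $u$ and $v$ is $d(u, v) = \sum_i |u_i - v_i| / (|u_i| + |v_i|)$. The following is a minimal Python sketch of that formula, not the QIIME2 implementation. It assumes the common convention that features absent from both samples contribute zero to the sum, and the function name and input vectors are hypothetical.

```python
def canberra(u, v):
    """Canberra distance between two equal-length abundance vectors.

    Terms where both values are zero are skipped, i.e. 0/0 is treated
    as 0 (an assumed convention for features absent from both samples).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        if denominator > 0:
            total += abs(a - b) / denominator
    return total


# Hypothetical feature intensities for two samples.
sample_1 = [1.0, 0.0, 3.0]
sample_2 = [1.0, 2.0, 1.0]
distance = canberra(sample_1, sample_2)  # 0 + 2/2 + 2/4
```

Each feature contributes at most 1 regardless of its absolute intensity, so low-abundance features weigh as much as high-abundance ones in the resulting ordination.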

PERMANOVA via QIIME2 assessed statistical significance for beta diversity measures.

Kruskal-Wallis p-values were calculated in R 71 through Jupyter Notebook 72. Boxplots (Figure 1c-h, Supplementary Figures 1-2) were also generated using R 71 in Jupyter Notebook 72. For these boxplots, the center line represents the median, the upper and lower box lines reflect upper and lower quartiles, whiskers extend to 1.5 times the interquartile range, and outliers are shown as dots. R packages ggplot2 81 and rworldmap 82 were used to create Figures 1a, 1c- [...]

[...] Bruker Impact (n=447); G3 is Bruker maXis (n=143); G4 is the dataset from this study (n=105).

The co-analysis illustrates overlap across the datasets, despite instrumental differences. Colored box highlights the intersection of all datasets (855 total metabolite features).