A systematic analysis of metabolic pathways in the human gut microbiota

The gut microbiota produce hundreds of small molecules, many of which modulate host physiology. Although efforts have been made to identify biosynthetic genes for secondary metabolites, the chemical output of the gut microbiome consists predominantly of primary metabolites. Here, we systematically profile primary metabolic genes from the gut microbiome, identifying 19,885 gene clusters in 4,240 high-quality microbial genomes. We find marked differences in pathway distribution among phyla, reflecting distinct strategies for energy capture. These data explain taxonomic differences in short-chain fatty acid production and suggest a characteristic metabolic niche for each taxon. Analysis of 1,135 subjects from a Dutch population-based cohort shows that the level of 14 microbiome-derived metabolites in plasma is almost completely uncorrelated with the metagenomic abundance of the corresponding biosynthetic genes, revealing a crucial role for pathway-specific gene regulation and metabolite flux. This work is a starting point for understanding differences in how bacterial taxa contribute to the chemistry of the microbiome.


The detection rules were run on a test set, and all MGCs predicted by the same rule were grouped together and (3) run through BiG-SCAPE, which grouped the MGCs into gene cluster families (GCFs). (4) Based on literature analysis of GCF members, detection rules were manually fine-tuned to include MGC architectures related to specialized primary metabolism and to exclude those that were not. (5) Finally, fine-tuned detection rules were annotated and categorized into different MGC classes based on their metabolic end products.

As a starting point, we constructed a dataset of 51 primary metabolic pathways from the gut microbiome with biochemical or genetic literature support (including MGCs as well as pathways encoded by a single gene) and identified core enzymes (i.e., required for pathway function) to serve as a signature for the detection rules (Figure 1, Table S1; see Methods for details). To more accurately predict MGCs of interest, we performed three computational procedures. First, for core enzymes belonging to 12 of the protein superfamilies that are known to catalyze diverse types of reactions and were most commonly found across a wide range of pathways, we constructed phylogenies and used them to create clade-specific pHMMs to detect specific subfamilies (see SI results, Phylogenetic analysis of protein superfamilies to identify pathway-specific clades). Second, we designed pathway- (Table S2&S3). Third, although most specialized primary metabolic pathways are encoded in MGCs, there are also single-protein pathways that are responsible for the secretion of key specialized primary metabolites in the gut microbial ecosystem, such as serine dehydratase, which produces ammonia and pyruvate from serine (5). For this reason, we also built 10 clade-specific pHMMs to detect these (see Methods section Assessing single-protein pathway abundance within representative human gut bacteria). The above procedures led to the design of a robust set of detection rules to identify both known and putative MGCs that are potentially relevant for metabolite-mediated microbiome-associated phenotypes.
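To illustrate the idea of using core enzymes as a signature, the following is a minimal sketch of a co-occurrence rule: a cluster is called when every core domain of a pathway is found within a short stretch of neighbouring genes. It is not the gutSMASH rule syntax or implementation. The `DetectionRule` class, the `first_match` function, the `window` parameter and all domain names are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class DetectionRule:
    """Hypothetical rule: all core domains must co-occur within `window` genes."""
    name: str
    core_domains: FrozenSet[str]
    window: int  # assumed maximum number of consecutive genes spanned by a cluster


def first_match(genes: List[Set[str]], rule: DetectionRule) -> Optional[Tuple[int, int]]:
    """Return (start, end) gene indices of the first window holding every core domain.

    `genes` lists the domain hits per gene, in genomic order. A window may only
    start at a gene that carries at least one core domain.
    """
    for start, domains in enumerate(genes):
        if not domains & rule.core_domains:
            continue  # windows are anchored on a core gene
        seen: Set[str] = set()
        stop = min(start + rule.window, len(genes))
        for end in range(start, stop):
            seen |= genes[end] & rule.core_domains
            if seen == rule.core_domains:
                return (start, end)
    return None


# Toy contig with made-up domain names (not real pHMM identifiers).
rule = DetectionRule("toy_pathway", frozenset({"coreA", "coreB"}), window=3)
contig = [{"other1"}, {"coreA"}, {"other2"}, {"coreB"}, {"other3"}]
print(first_match(contig, rule))  # (1, 3)
```

In this sketch, a clade-specific pHMM would simply appear as a more specific domain name in `core_domains`, so that only the subfamily of interest satisfies the rule.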

To profile the metabolic capacity of strains from the human gut microbiome, we selected a set of 4,240 unique high-quality reference genomes consisting of 1,520 genomes from the Culturable Genome Reference (CGR) collection (6), 2,308 genomes from the Microbial Reference Genomes collection of the Human Microbiome Project (HMP) consortium (7) and 414 additional genomes from the class Clostridia to account for their metabolic versatility (8) (Table S4). We refrained from including metagenome-assembled genomes in this analysis, as they often lack the taxon-specific genomic islands (9) on which many specialized metabolic functions are encoded. In total, gutSMASH predicted

Our results provide insights into the metabolic strategies that microbes use to produce short-chain fatty acids (SCFAs). As expected, butyrate production is found exclusively in certain Firmicutes and Fusobacteria, whereas propionate production is largely confined to (and conserved in) the

Next, we set out to determine the prevalence and abundance of each pathway in a cohort of human samples. We used BiG-MAP (14) to profile the relative abundance of each MGC class across 1,135 metagenomes from the population-based LifeLines DEEP cohort (15), by mapping metagenomic reads against a collection of 6,836 non-redundant MGCs detected in our set of reference genomes (Figure 3A,B). Some pathways, such as CO2 to acetate (acetogenesis) and butyrate production from acetate or glutamate, as well as polyamine-forming pathways, were found in >99% of microbiomes. Others, such as 1,2-propanediol utilization and p-cresol production, both associated with negative effects on gut health (16, 17), were observed at detectable levels in only half of the samples. In terms of abundance, it is striking that, for example, the bile acid-induced (bai) operon for the formation of the secondary bile acids deoxycholic acid and lithocholic acid, which has been characterized from very low-abundance Clostridium scindens strains (18), was still shown to be present in relatively high abundance across a subset of subjects. Analysis of the mapped reads showed that the vast majority of these mapped to a homologous MGC from the genus Dorea instead (Suppl. Figure 2), for which the physiological relevance remains to be established. It is also interesting to see that, while two of the three acetate-forming pathways (PFL and WLP) were consistently found at high abundance levels, the abundance of all butyrate-forming pathways is highly variable across subjects, with a >20-fold difference between lower and upper quartiles in the abundance distribution of the glutamate-to-butyrate pathway, and a >440-fold difference between the 10th percentile and the 90th percentile.
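The fold differences between quartiles and between the 10th and 90th percentiles quoted above are ratios of two points in a per-subject abundance distribution. A minimal sketch of that arithmetic follows. The linear-interpolation percentile definition is an assumption (the text does not state which definition was used), the function names are illustrative, and the abundance values are invented toy numbers rather than cohort data.

```python
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between closest ranks, q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must lie in [0, 100]")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fold_difference(values: Sequence[float], low_q: float, high_q: float) -> float:
    """Ratio of the high_q-th percentile to the low_q-th percentile."""
    low = percentile(values, low_q)
    if low <= 0:
        raise ValueError("lower percentile must be positive to form a ratio")
    return percentile(values, high_q) / low


# Invented per-subject relative abundances of one pathway (toy data only).
toy_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
print(fold_difference(toy_abundance, 25, 75))  # 2.0 (upper quartile 4.0 / lower quartile 2.0)
```

A ratio of this kind is undefined when the lower percentile is zero, which is why the sketch raises an error in that case; how undetected pathways are handled in practice is a separate choice that the sketch does not address.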

The wide variability in the metagenome abundance of each pathway raises the question of whether metagenomic abundance of a pathway correlates with the level of its small molecule product in the host. To address this question, we systematically compared the level of each pathway with the quantity of the corresponding metabolite as determined by plasma metabolomics. We find a striking lack of correlation between pathway and metabolite levels (r ranging from -0.04 to 0.24, Figure 3C).

(raw data available at Table S9). (C) Limited correlation of genetic pathway abundance with abundance of metabolites in blood plasma.
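The comparison of pathway abundance with plasma metabolite levels comes down to one correlation coefficient per pathway-metabolite pair, computed across subjects. The sketch below uses the Pearson coefficient. This is an assumption, since the passage reports r without naming the coefficient, and a rank-based coefficient would be an equally plausible reading. The function name and the input values are illustrative only.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired observations x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Toy values for four subjects: pathway abundance vs. plasma metabolite level.
pathway_abundance = [1.0, 2.0, 3.0, 4.0]
metabolite_level = [1.0, -1.0, -1.0, 1.0]
print(pearson_r(pathway_abundance, metabolite_level))  # 0.0
```

The toy pair is built so that the metabolite level carries no linear trend in pathway abundance, which gives r = 0 and mirrors, in miniature, what a value of r near zero means for the cohort data.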

The gutSMASH software constitutes, to our knowledge, the first comprehensive automated tool designed to identify niche-defining primary metabolic pathways from genome sequences or metagenomic contigs. Even a full-fledged metabolic network reconstruction software like PathwayTools (21) (which uses the extensive MetaCyc database (22)) lacks detection capabilities for 3 out of the 41 MGC-encoded pathways detected by gutSMASH (Table S7)