Abstract
The most complex niche in the human microbiome is found in the distal gut, where communities harbor thousands of strains from across the microbial taxonomy. Although the microbiota plays important roles in immune response, nutrition, metabolism of pharmaceuticals, and enteric diseases, little is known of the intricate network of metabolic transformations that it mediates. Advances in high-throughput sequencing are enabling researchers to explore, with unprecedented resolution, the metabolic potential of microbial communities inhabiting the human body. The resulting datasets are reshaping how we perceive ourselves and opening new opportunities for prevention and therapeutic intervention. Several large-scale metagenomic datasets derived from hundreds of human microbiome samples and sourced from multiple studies are now publicly available.
However, each of these studies processed its sequence information with a different proprietary functional annotation pipeline, with its own choice of functional and metabolic reference databases and its own cut-off parameters for relevant hits. These choices introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a freely available compendium of environmental pathway genome databases (ePGDBs) constructed from metagenome assemblies of 431 human microbiome samples across three large-scale studies. The ePGDBs were constructed using the open-source MetaPathways metagenomic annotation pipeline, which enables reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, which allows researchers and clinicians to visualize and interpret the metabolic pathways, reactions, compounds, and transporters in the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions across these three studies, making possible reproducible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. We demonstrate the utility of GutCyc as a computational model in two ways: by reconstructing a missing metabolic route from a research study on the role of the microbiome in cardiovascular disease, and by performing statistical enrichment analysis and visualization of high-throughput data from a microbiome metabolomics study.
With GutCyc, the publicly deposited knowledge about human distal gut microbiotic transport and enzymatic reactions is integrated in a form that is both readily searchable by researchers and easily processed programmatically. GutCyc enables research on drug/target discovery, analysis of pharmaceutical fate in the lumen, and engineering of therapeutic microbiomes. GutCyc data products are searchable online, and may also be downloaded and explored locally using the MetaPathways graphical user interface and Pathway Tools.