Abstract
Enterotypes describe human fecal microbiomes grouped by similarity into clusters of microbial community composition, often associated with disease, medications, diet, and lifestyle. The number and determinants of enterotypes have been derived with diverse frameworks, applied to cohorts that often lack diversity or inter-cohort comparability. To overcome these limitations, we selected 16,772 fecal metagenomes collected from 38 countries to revisit the enterotypes using state-of-the-art fuzzy clustering and found robust clustering regardless of the underlying taxonomy, consistent with previous findings. Quantifying the strength of enterotype classifications enriched the enterotype landscape and also reflected some continuity of microbial compositions. As the classification strength was associated with the patient’s health status, we established an “Enterotype Dysbiosis Score” (EDS) as a latent covariate for various diseases. This global study confirms the enterotypes, reveals a dysbiosis signal within the enterotype landscape, and enables robust classification of metagenomes with an online “Enterotyper” tool, allowing reproducible analysis in future studies.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* Co-corresponding
Abstract corrected with the correct sample number; Typos in graphical abstract corrected
List of abbreviations
- AUC: Area under the curve
- CH-index: Calinski-Harabasz index
- DMM: Dirichlet multinomial mixtures
- EDS: Enterotype dysbiosis score
- FKM: Fuzzy K-means clustering
- glm: Generalized linear model
- GTDB: Genome Taxonomy Database
- LASSO: Least absolute shrinkage and selection operator
- JSD: Jensen-Shannon divergence
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- lmer: Linear mixed-effects model
- mOTU: Marker gene-based operational taxonomic unit
- NCBI: National Center for Biotechnology Information
- PAM: Partitioning around medoids clustering
- PCA: Principal components analysis
- PCoA: Principal coordinate analysis
- PERMANOVA: Permutational multivariate analysis of variance
- PHATE: Potential of heat-diffusion for affinity-based trajectory embedding
- UMAP: Uniform manifold approximation and projection
- XGBoost: Extreme gradient boosting