Abstract
Recent studies have shown that microbial communities are composed of groups of functionally cohesive taxa, whose abundance is more stable and better associated with metabolic fluxes than that of any individual taxon. However, identifying these functional groups in a manner that is independent of error-prone functional gene annotations remains a major open problem. Here, we develop a novel approach that identifies functional groups of taxa in an unsupervised manner, based solely on the patterns of statistical variation in species abundance and environmental parameters. We demonstrate the power of this approach on three distinct data sets. On data from replicate microcosms of heterotrophic soil bacteria, our unsupervised algorithm recovered experimentally validated functional groups that divide metabolic labor and remain stable despite large variation in species composition. When applied to ocean microbiome data, our approach discovered a functional group that combines aerobic and anaerobic ammonia oxidizers and whose summed abundance closely tracks nitrate concentrations in the water column. Finally, we show that our framework can detect species groups that are likely responsible for the production or consumption of metabolites abundant in animal gut microbiomes, serving as a hypothesis-generating tool for mechanistic studies. Overall, this work advances our understanding of structure-function relationships in complex microbiomes and provides a powerful approach to discover functional groups in an objective and systematic manner.
Competing Interest Statement
The authors have declared no competing interest.