ABSTRACT
The composition of the gut microbiome in industrialized populations differs from that of populations living traditional lifestyles. However, it has been difficult to separate the contributions of human genetic and geographic factors from those of lifestyle and modernization. Here, we characterize the stool bacterial composition of four Himalayan populations to investigate how the gut community changes in response to shifts in human lifestyles. These groups led seminomadic hunting-gathering lifestyles until transitioning to varying degrees of dependence on farming. The Tharu began farming 250-300 years ago, the Raute and Raji transitioned 30-40 years ago, and the Chepang retain many aspects of a foraging lifestyle. We assess the contributions of dietary and environmental factors to their gut microbiota and find that gut microbiome composition is significantly associated with lifestyle. The Chepang foragers harbor elevated abundances of taxa associated with foragers around the world. Conversely, the gut microbiomes of populations that have transitioned to farming are more similar to those of Americans, with agricultural dependence and several associated lifestyle and environmental factors correlating with the extent of microbiome divergence from the foraging population. For example, drinking water source and use of solid cooking fuel are significantly associated with gut microbiome composition. Despite the pronounced differences in gut bacterial composition across populations, we found little difference in alpha diversity. These findings in genetically similar populations living in the same geographical region establish the key role of lifestyle in determining human gut microbiome composition and point to the next challenging steps of isolating dietary effects from other factors that change during modernization.
INTRODUCTION
The human gut harbors a diverse community of bacteria, the microbiome or microbiota, that influences several aspects of human physiology, including nutrient metabolism, immune responses, and resistance to infectious pathogens [1–3]. This highly malleable microbial component of human biology can change rapidly, and in some cases irreversibly, in response to dietary and environmental factors [4–11]. Modern humans have experienced diverse environments since expanding out of Africa ~100,000 years ago, and over the past ~10,000 years hunting and gathering has largely given way to various forms of agriculturally supported lifestyles. Dietary changes, combined with a variety of other factors associated with the industrial revolution, have been credited with contributing to the alterations in the gut microbiome of industrialized populations [12]. However, interpretation of the current data is clouded by potential contributions of human genetic variation, environment, and geography [5,7,13]. The potential connection between the gut ecosystem and several chronic diseases calls for a better understanding of the extent to which modernization has contributed to population-wide community changes during industrialization [14,15].
Comparisons of the gut microbiomes of traditional human populations in Africa and South America with those of industrialized Western populations from Europe and the USA reveal that the human gut microbiome varies across geography and lifestyles [16–30]. One consistent trend from these studies is the higher diversity of gut bacteria in unindustrialized traditional populations. However, most of the traditional societies investigated thus far live within tropical latitudes [31]. Hence, whether differences in alpha diversity are due to contrasting lifestyles, residence in the tropics, or other factors remains unclear. Moreover, most studies to date compare the gut microbiomes of populations that reside in geographically distinct regions, represent extreme modes of human subsistence, and are genetically and culturally distinct [16,17,19–21,25,27]. Although some studies have attempted to mitigate these differences by comparing human populations that reside in close geographical proximity [23,24,28], these populations have been separated for tens of thousands of years, a period sufficient for genetic and cultural differences to arise [32]. Since the gut microbiome can be influenced by genetic, environmental, and cultural factors [23,28,33], these variables make it difficult to determine the impact of lifestyle changes on the gut microbiome in such distinct populations. Hence, understanding how transitions in human lifestyles lead to changes in the gut microbiome would be greatly aided by studying populations that have undergone recent changes in culture, lifestyle, and diet.
To explore how the gut microbiota changes as human populations transition from traditional to more urban lifestyles, we analyzed the gut microbiomes of four rural Himalayan populations and compared them with those of Americans of European ancestry. The Himalayan populations include the Chepang, a foraging population; the Raute and Raji, two foraging communities that are currently transitioning to subsistence farming; and the Tharu, former jungle dwellers who have completely transitioned to farming within the last two centuries. We assessed the contributions of lifestyle, diet, and environment to gut microbial variation in the rural Himalayan populations. Our results show that gut microbiome composition mirrors the transition from traditional to westernized lifestyles in the Himalaya. In addition to the dietary gradient across these populations, intra- and inter-population variability in lifestyle revealed additional environmental and lifestyle associations that may contribute to microbiota change.
RESULTS
Description of populations
Our participants included 54 individuals from four Himalayan groups, Chepang (N=14), Raji (N=9), Raute (N=11), and Tharu (N=20), with a median age of 40 years (SD ± 14 years), from rural villages in Nepal (Figure 1, Supplementary Table 1). These four populations are long-term residents of the Himalayan foothills (altitude below 1000 m) and carry varying degrees of East Asian ancestry [34–36]. Although all four of the Himalayan populations in this study were forest dwellers until recently [37–40], habitat loss due to rapid deforestation, population expansion of non-native groups, establishment of new settlements, and construction of modern highways led these groups to settle at various time points over the last 300 years.
Historical records indicate that the Tharu gradually transitioned to agrarian lifestyles beginning in the eighteenth century (250-300 years ago) [40]. They have fully transitioned to farming and are almost completely disengaged from foraging. Historically, the Raute, Raji, and Chepang were seminomadic foragers whose diets included native tubers, greens and fruits from the jungle, wild honey, fish, and occasional game [38,39,41]. The Raute and Raji abandoned their foraging lifestyles in the 1980s [37,38]. While the Raute have settled in the remote hills of Far-Western Nepal, the Raji have settled in the Terai plains, which are relatively more urbanized. The Chepang were fully nomadic until at least 1848 [42] and began supplementing foraging with subsistence agriculture less than a century ago [39]. The Chepang in this study inhabit a remote village that lacks modern amenities, including electricity, running water, irrigation, fertilizers, modern machines, and marketplaces. They still practice slash-and-burn agriculture and depend entirely on rainwater for farming. Because yields from such traditional farming are low, their daily diet relies heavily on wild plants such as sisnu (nettles) foraged from the forests.
Lifestyle gradients in the Himalayan populations
We conducted surveys to assess how lifestyles changed as these seminomadic populations transitioned to farming over the last few hundred years. The questionnaire included questions on current dietary practices, traditional and modern medicines, and several environmental factors, including sources of drinking water, alcohol use, and tobacco consumption (N=53, Supplementary Table 2). Combustion of solid biomass fuels such as firewood or animal dung produces particulate matter that increases indoor air pollution [43], and prolonged exposure to environmental pollutants has the potential to alter the gut microbiome [44]. Hence, we also recorded the types of fuel used for cooking and the location of the kitchen. Finally, we screened participants' stool samples microscopically for intestinal parasites.
Supervised learning using a random forest classifier (RFC) on the survey data (including intestinal parasite load) assigned individuals to their respective populations with high accuracy (100% for the Chepang and Tharu, 90% for the Raute and Raji, out-of-bag (OOB) error = 3.5%, Supplementary Table 3A), indicating that these populations have distinct lifestyles. A correspondence analysis (CA) of the same survey data also revealed lifestyle differences between these populations (Figure 2A). The first CA dimension (CA1) explained 15.8% of the variation in the data and tracked the lifestyle gradient: along CA1, samples progressed from the Chepang foragers at one extreme, through the Raute and Raji transitioning populations, to the Tharu farmers at the opposite extreme (Figure 2B). Despite the geographical distance between them, the Raji lifestyle appears more similar to that of the Tharu farmers, consistent with the Raji having settled in a more urbanized setting than the Raute. Conversely, although the Raute live in geographical proximity to the Raji, their lifestyle falls between those of the Raji and the Chepang, indicating that geographical proximity is not driving the lifestyle differences.
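Correspondence analysis of categorical survey answers is often run on an indicator (one-hot) matrix in which each row is an individual and each column is one possible answer. The minimal NumPy sketch below, written under that assumption, computes the standardized residuals, takes their singular value decomposition, and reports each axis's share of total inertia, the quantity behind statements such as "CA1 explained 15.8%". The indicator matrix and its column meanings are hypothetical; the study's actual coding of survey answers and its software are not described here.

```python
import numpy as np


def correspondence_analysis(table):
    """Correspondence analysis of a non-negative table (rows = individuals).

    Returns row principal coordinates and the share of total inertia per axis.
    """
    N = np.asarray(table, dtype=float)
    N = N[:, N.sum(axis=0) > 0]              # drop answer columns nobody chose
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                        # principal inertias, largest first
    row_coords = U * sv / np.sqrt(r)[:, None]
    return row_coords, inertia / inertia.sum()


# Hypothetical indicator matrix: 6 individuals x 3 questions with 2 answers each.
# Columns: river water, tube well, firewood, biogas, eats sisnu, no sisnu.
X = np.array([
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
])
coords, explained = correspondence_analysis(X)
print(np.round(explained[:2], 3))   # share of inertia on CA1 and CA2
print(np.round(coords[:, 0], 3))    # each individual's CA1 score
```

The sign of each CA axis is arbitrary, so only the relative ordering of individuals along CA1 carries meaning.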
Ten variables contributed most to the first two CA dimensions, and most of them are strongly associated with diet and modernity (Figure 2C). These differences are described in detail in Supplementary Figure 1. Briefly, foraged plants such as sisnu (nettles) and jaand, a slushy alcoholic beverage made from fermented millet or corn, are staples of the Chepang diet, whereas sisnu and jaand consumption was minimal among the Raute, Raji, and Tharu. Perceived food scarcity was higher in the Chepang and Raute than in the Raji and Tharu. Although meat consumption was low in all four populations, the Tharu consumed animal products such as yogurt more frequently than the other three populations. The Tharu and Raji also showed more signs of modernity. For example, they have installed tube wells at their homes, giving them access to groundwater for drinking, whereas the Chepang and Raute still fetch drinking water from rivers and streams. Likewise, use of solid biomass fuel was lower among the Tharu and Raji, while the Chepang and Raute remain completely dependent on burning firewood for cooking. Although overall levels of intestinal parasites were low, Ascaris, Entamoeba, Trichuris, Hymenolepis, and Coccidia were detected in some participants, most of whom were Chepang. Together, the diet and lifestyle assessments indicate that the four populations represent a gradient from traditional to increasingly agrarian and urban lifestyles.
Gut microbiome composition varies by lifestyles
To assess whether the gut microbiome varies across lifestyles, we characterized the gut bacterial composition of these populations by sequencing the V4 region of the 16S ribosomal RNA (rRNA) gene on the Illumina MiSeq from a total of 79 stool samples (including technical replicates), with an average of 11,570 (±4,653) high-quality reads per sample (Supplementary Figure 2, Supplementary Table 4). Because flash freezing was not possible at the remote sampling sites in the Himalaya, we collected stool samples from the four populations (N=54) with commercially available DNA Genotek OMNIgene kits. To evaluate whether the preservation method affected the microbiome profile, we also collected stool samples from 10 Americans of European descent in OMNIgene kits and compared them with frozen aliquots of the same samples. The 16S rRNA profiles of the same samples preserved by freezing or with OMNIgene were remarkably similar, with reproducible differences only in minor taxa (Euryarchaeota and Cyanobacteria), demonstrating reliable preservation of microbiome composition with the OMNIgene kits (Supplementary Figure 3). Because of these reproducible, albeit minor, differences between the two collection methods, we used the OMNIgene data from the Americans in all subsequent comparative analyses for consistency.
Comparison of community structure across the five study populations using unweighted UniFrac distances, a measure of compositional dissimilarity that accounts for the phylogenetic relatedness of taxa, showed that gut microbial composition varied across populations (P < 2.2 × 10⁻¹⁶, Kruskal-Wallis test). The four Himalayan populations were much more distant from the Americans than from one another (Supplementary Table 5). The Chepang were the most distant from the Americans, followed by the Raute, while the Raji and Tharu were equally close to the Americans. Within the Himalaya, the Chepang were more distant from the Tharu and Raji than from the Raute, while the Raute, Raji, and Tharu were equally distant from one another. Similar results were observed with weighted UniFrac and Bray-Curtis distances, both of which take taxon abundance into account (Supplementary Table 5).
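To make the distance measures concrete, here is a minimal pure-Python sketch of unweighted UniFrac and Bray-Curtis dissimilarity. It assumes a toy rooted tree stored as a hypothetical `{child: (parent, branch_length)}` dictionary and samples stored as `{taxon: count}` dictionaries. Unweighted UniFrac divides the length of branches leading to taxa found in only one of the two samples by the length of branches covered by either sample, so it uses presence/absence plus phylogeny; Bray-Curtis is computed here on relative abundances and ignores phylogeny. Weighted UniFrac, which additionally weights each branch by abundance differences, is not shown.

```python
def covered_branches(tree, taxa):
    """Branches (named by their child node) on the paths from `taxa` to the root."""
    covered = set()
    for node in taxa:
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]          # step to the parent node
    return covered


def unweighted_unifrac(tree, sample_a, sample_b):
    """Branch length unique to one sample / branch length covered by either sample."""
    a = covered_branches(tree, [t for t, n in sample_a.items() if n > 0])
    b = covered_branches(tree, [t for t, n in sample_b.items() if n > 0])
    total = sum(tree[e][1] for e in a | b)
    unique = sum(tree[e][1] for e in a ^ b)
    return unique / total if total else 0.0


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity on relative abundances."""
    ta, tb = sum(sample_a.values()), sum(sample_b.values())
    taxa = set(sample_a) | set(sample_b)
    diff = sum(abs(sample_a.get(t, 0) / ta - sample_b.get(t, 0) / tb) for t in taxa)
    return diff / 2   # each profile sums to 1, so the denominator sum(pa + pb) is 2


# Hypothetical tree:        root
#                          /    \
#                        n1      n2
#                       /  \    /  \
#                      A    B  C    D
tree = {
    "A": ("n1", 1.0), "B": ("n1", 1.0),
    "C": ("n2", 1.0), "D": ("n2", 1.0),
    "n1": ("root", 2.0), "n2": ("root", 2.0),
}
s1 = {"A": 10, "B": 5}
s2 = {"A": 3, "C": 7}
print(round(unweighted_unifrac(tree, s1, s2), 4))  # 0.5714 (= 4 / 7)
print(round(bray_curtis(s1, s2), 4))               # 0.7
```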
Visualization of these distances using principal coordinates analysis (PCoA) revealed separation of populations along the top two dimensions (P = 1 × 10⁻⁵, PERMANOVA, Figure 3A). Furthermore, the lifestyle gradient was reflected in the distribution of populations along the primary axis (PCoA1, Figure 3B). These patterns were consistent with Bray-Curtis and weighted UniFrac distances (P = 1 × 10⁻⁵ for both, PERMANOVA, Supplementary Figures 4 and 5). When American microbiomes were excluded from the principal coordinates analyses, the gradient among the Himalayan populations remained pronounced (P = 1 × 10⁻⁵, PERMANOVA, Supplementary Figure 6). Among the four Himalayan populations, the strongest separation was between the Chepang foragers and the Tharu farmers.
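The ordination and the test can be sketched as follows, assuming a precomputed distance matrix. Classical PCoA double-centers the squared distances and takes the eigenvectors of the result; PERMANOVA compares a pseudo-F statistic for the observed grouping with pseudo-F values after shuffling group labels. This is a minimal illustration rather than the authors' pipeline (in R, vegan's `adonis2` is a common PERMANOVA implementation), and the points are hypothetical. With the (hits + 1) / (permutations + 1) convention used below, the smallest attainable p-value is 1 / (permutations + 1); for example, 1 × 10⁻⁵ is the floor for 99,999 permutations.

```python
import itertools
import random

import numpy as np


def pcoa(D):
    """Classical (metric) principal coordinates analysis of a square distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                       # Gower-centered matrix
    eigvals, eigvecs = np.linalg.eigh(B)              # B is symmetric
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                            # drop null and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()   # share among positive axes only
    return coords, explained


def pseudo_f(D, groups):
    """PERMANOVA pseudo-F computed from squared pairwise distances."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(D[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(D[i][j] ** 2 for i, j in itertools.combinations(idx, 2)) / len(idx)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(D, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(D, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(D, shuffled) >= f_obs:
            hits += 1
    # The observed labeling counts as one of the permutations.
    return f_obs, (hits + 1) / (permutations + 1)


# Hypothetical example: two well-separated clusters of five samples each.
pts = np.array([[0, 0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.1, 0.0],
                [3, 3], [3.2, 2.9], [2.9, 3.1], [3.1, 3.3], [3.0, 2.8]])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
groups = ["X"] * 5 + ["Y"] * 5
coords, explained = pcoa(D)
f_stat, p_value = permanova(D, groups)
print(round(explained[0], 3), round(f_stat, 1), p_value)
```

Non-Euclidean distances such as Bray-Curtis can yield negative eigenvalues; packages differ in how they handle these, so "variance explained" per axis may not match exactly across tools.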
A random forest classifier based on 16S rRNA read sequence variant (RSV) data assigned the Chepang, Tharu, and American individuals to their respective source populations with 86%, 100%, and 100% accuracy, respectively (OOB error = 32%, Supplementary Table 3B). Classification accuracy was poor (<10%) for the Raute and Raji, the two populations that recently transitioned from foraging to farming: some of their individuals were classified as Chepang and others as Tharu. However, none of the Himalayan individuals were classified as American. Together, these results show that the gut microbiome compositions of the Himalayan populations are distinct from those of the Americans. They also indicate that within the Himalaya, the gut microbiome of the Chepang foragers differs from that of the Tharu farmers, while those of the Raute and Raji reflect their transitional lifestyles.
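Per-population accuracies and the overall OOB error are both derived from out-of-bag predictions: each tree in a random forest is grown on a bootstrap sample, and each individual is predicted only by the trees that did not see it. Because the overall error pools all individuals, a few poorly classified groups can raise it substantially even when other groups are classified perfectly. The sketch below summarizes hypothetical OOB predictions. With scikit-learn, such predictions could be obtained as the row-wise argmax of `oob_decision_function_` (mapped through `classes_`) on a fitted `RandomForestClassifier(oob_score=True)`, though the software used in the study is not stated here.

```python
from collections import Counter, defaultdict


def oob_summary(true_labels, oob_predictions):
    """Per-class accuracy, overall OOB error, and a confusion table (true -> predicted)."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, oob_predictions) if t == p)
    accuracy = {label: correct[label] / n for label, n in totals.items()}
    oob_error = 1 - sum(correct.values()) / len(true_labels)
    confusion = defaultdict(Counter)
    for t, p in zip(true_labels, oob_predictions):
        confusion[t][p] += 1
    return accuracy, oob_error, {t: dict(c) for t, c in confusion.items()}


# Hypothetical OOB predictions for three groups of unequal size.
truth = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
oob_pred = ["A", "A", "A", "A", "B", "B", "B", "A", "A", "B"]
accuracy, oob_error, confusion = oob_summary(truth, oob_pred)
print(accuracy)              # {'A': 1.0, 'B': 0.75, 'C': 0.0}
print(round(oob_error, 2))   # 0.3
print(confusion["C"])        # {'A': 1, 'B': 1}
```

Reading the confusion table row by row shows where members of a poorly classified group land, which is how individuals of a transitional group can be split between two neighbouring populations.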
To formally evaluate whether variation in gut microbiota reflects lifestyle differences within the Himalaya, we assessed associations between the primary dimensions of the lifestyle questionnaire and parasite analysis (CA1) and of the gut microbial composition analysis (PCoA1) (Figure 3C and Supplementary Figure 4). CA1 was significantly correlated with PCoA1 from all three distance matrices (Spearman's rho = 0.47, 0.44, and 0.28 for Bray-Curtis, unweighted UniFrac, and weighted UniFrac distances, respectively; P < 0.05 for all three, correlation test). CA1 was also correlated with PCoA2 (Spearman's rho = 0.26, 0.44, and 0.39; P = 0.06, 0.001, and 0.004 for Bray-Curtis, unweighted UniFrac, and weighted UniFrac distances, respectively, correlation test), although the Bray-Curtis correlation did not reach significance. In contrast, CA2 was not significantly correlated with either PCoA axis for any of the three distances (P > 0.05, correlation test). Notably, CA1, but not CA2, is associated with the lifestyle gradient (Figure 2). The consistent correlations between CA1 and the PCoA axes indicate that the gut microbiome compositions of the Himalayan populations mirror their lifestyles.
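Spearman's rho is the Pearson correlation of the ranks, with tied values assigned their average rank. The pure-Python sketch below computes it and, as a swapped-in alternative to the asymptotic test a statistics package would typically report, attaches a two-sided permutation p-value. The CA1 and PCoA1 scores are hypothetical.

```python
import random


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_perm_p(x, y, permutations=9999, seed=1):
    """Two-sided permutation p-value for Spearman's rho."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(y_shuffled)
        if abs(spearman_rho(x, y_shuffled)) >= observed - 1e-12:  # tolerate float noise
            hits += 1
    return (hits + 1) / (permutations + 1)


# Hypothetical per-individual scores on CA1 (lifestyle) and PCoA1 (microbiome).
ca1 = [-1.2, -0.9, -0.4, 0.1, 0.5, 0.8, 1.3]
pcoa1 = [-0.30, -0.35, -0.05, 0.02, 0.10, 0.08, 0.25]
rho = spearman_rho(ca1, pcoa1)
print(round(rho, 3))   # 0.929
print(spearman_perm_p(ca1, pcoa1))
```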
Gut bacterial diversity (alpha diversity) does not vary across lifestyles
Previous studies have suggested that elevated species diversity in the gut microbiome is a hallmark of traditional populations [19,28]. We assessed alpha diversity in the five study populations using four measures, namely species richness, Fisher's alpha, Shannon's H, and Simpson's D, at rarefaction depths ranging from 500 to 3000 reads (Figure 4). Species richness and Fisher's alpha did not differ significantly between any of the five populations (Bonferroni-adjusted P > 0.05, Kruskal-Wallis test). We did find marginally significant differences in the Shannon and Simpson indices between these populations (Bonferroni-adjusted P < 0.05, Kruskal-Wallis test). A post hoc pairwise comparison of all five populations showed that only the Tharu had slightly lower alpha diversity than the Americans (Bonferroni-adjusted P = 0.02 and 0.03 for the Shannon and Simpson indices, respectively, Dunn's test), and none of the four diversity measures differed significantly between the Chepang, Raute, or Raji and the Americans. Moreover, correlations between each of the four alpha diversity measures and lifestyle differences within the Himalaya, as measured by CA1, were not statistically significant (P > 0.05, correlation test). These results indicate that lifestyle differences among the Himalayan populations, or between these populations and Americans, have little effect on the alpha diversity of the gut microbiome.
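The four alpha diversity measures and rarefaction can be sketched in a few lines of Python. Assumptions: counts are `{taxon: count}` dictionaries; rarefaction draws reads without replacement with a fixed seed; Shannon's H uses natural logarithms; Simpson's D is taken in its Gini-Simpson form, 1 minus the sum of squared proportions (some studies report the sum itself or its inverse); and Fisher's alpha is the value of alpha solving S = alpha * ln(1 + N / alpha) for S observed taxa and N reads. The example sample is hypothetical.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a {taxon: count} dict."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return dict(Counter(random.Random(seed).sample(pool, depth)))


def richness(counts):
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon's H with natural logarithms."""
    total = sum(counts.values())
    return -sum(n / total * math.log(n / total) for n in counts.values() if n > 0)


def simpson(counts):
    """Gini-Simpson form, 1 - sum(p_i ** 2) (an assumed convention)."""
    total = sum(counts.values())
    return 1 - sum((n / total) ** 2 for n in counts.values())


def fisher_alpha(counts, iterations=200):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    s, n = richness(counts), sum(counts.values())
    if s >= n:
        raise ValueError("Fisher's alpha is undefined when every read is a distinct taxon")
    lo, hi = 1e-9, 1e9   # alpha * ln(1 + N / alpha) increases with alpha toward N
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * math.log(1 + n / mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical sample of 2,000 reads, rarefied to several depths.
sample = {"Prevotella": 1200, "Bacteroides": 300, "Treponema": 150, "Faecalibacterium": 350}
for depth in (500, 1000, 2000):
    sub = rarefy(sample, depth, seed=depth)
    print(depth, richness(sub), round(shannon(sub), 3), round(simpson(sub), 3),
          round(fisher_alpha(sub), 3))
```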
Bacterial taxa are associated with lifestyle transitions
Although lifestyle differences had little effect on alpha diversity, the gut microbiome compositions of the Himalayan populations reflected their lifestyle gradient. To identify taxa driving the differentiation of gut microbiomes across lifestyles, we compared the abundance of individual phyla across the five populations using a negative binomial generalized linear model (GLM) as implemented in DESeq2 [45]. Differential abundance was detected for 6 of 10 phyla (FDR-adjusted P < 0.05, GLM, Supplementary Table 6), and four of these six reflect a traditional-to-western lifestyle gradient. The Himalayan populations were characterized by higher abundance of Proteobacteria, while abundances of Actinobacteria, Firmicutes, and Verrucomicrobia were highest in the Americans, intermediate in the farming and transitioning populations (Tharu, Raji, and Raute), and lowest in the Chepang foragers (Figure 5A). Higher levels of Proteobacteria and lower levels of Actinobacteria and Verrucomicrobia are common features of many traditional human gut microbiomes across the world [19,24,28,30].
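Before fitting its negative binomial GLM, DESeq2 normalizes for sequencing depth with median-of-ratios size factors. The sketch below shows only that normalization step (not the GLM fit or dispersion estimation) for a hypothetical `{sample: {taxon: count}}` table. Taxa with a zero in any sample are excluded from the reference here; with sparse microbiome tables this can leave no reference taxa at all, which is one reason DESeq2's `estimateSizeFactors` also offers `type = "poscounts"`. Which option the study used is not stated here.

```python
import math
from statistics import median


def size_factors(count_table):
    """Median-of-ratios size factors, one per sample.

    count_table maps sample -> {taxon: count}. Only taxa with a non-zero count in
    every sample form the reference, because a zero makes the geometric mean zero.
    """
    samples = list(count_table)
    taxa = set().union(*(count_table[s] for s in samples))
    log_geo_mean = {}
    for t in taxa:
        values = [count_table[s].get(t, 0) for s in samples]
        if all(v > 0 for v in values):
            log_geo_mean[t] = sum(math.log(v) for v in values) / len(values)
    if not log_geo_mean:
        raise ValueError("no taxon is non-zero in every sample")
    return {
        s: math.exp(median([math.log(count_table[s][t]) - g for t, g in log_geo_mean.items()]))
        for s in samples
    }


# Hypothetical counts: sample B is sample A sequenced twice as deeply; C lacks taxon y.
counts = {
    "A": {"x": 10, "y": 20, "z": 40},
    "B": {"x": 20, "y": 40, "z": 80},
    "C": {"x": 5, "z": 10},
}
sf = size_factors(counts)
normalized = {s: {t: v / sf[s] for t, v in row.items()} for s, row in counts.items()}
print({s: round(v, 3) for s, v in sf.items()})
```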
To characterize taxonomic differences between populations at a finer level, we repeated the above analysis at the genus level and identified 52 of 116 genera that differed significantly in abundance across the five populations (FDR-adjusted P < 0.05, Figure 5B, Supplementary Table 7). Consistent with the differences observed at the phylum level, the rural populations were enriched for several members of Proteobacteria, including Ruminobacter, Campylobacter, Succinivibrio, and Escherichia/Shigella (Supplementary Figure 7). Among the rural populations, the Chepang foragers were enriched for Ruminobacter, Campylobacter, and Treponema. Although we did not detect significant differences in the abundance of Bacteroidetes across these populations, several members of this phylum distinguished the rural and western populations. The rural Himalayan communities were enriched for Prevotella, Alloprevotella, and Anaerophaga and significantly depleted in Bacteroides, Alistipes, Butyricimonas, Odoribacter, and Barnesiella. Twenty-nine genera belonging to Firmicutes differed significantly across the five populations, with complex distributions (Supplementary Figure 8). Traditional populations were enriched for Clostridium sensu stricto, Catenibacterium, Lactobacillus, Bulleidia, Sarcina, Enterococcus, Eubacterium, Oribacterium, Mogibacterium, Mitsuokella, Allisonella, Weissella, Papillibacter, and two unknown genera of the Erysipelotrichaceae and Veillonellaceae families. In contrast, abundances of several Clostridium genera, Oscillibacter, Blautia, Butyricicoccus, Anaerostipes, and Flavonifractor were elevated in the Americans. The Americans also showed the highest abundances of Bifidobacterium (Actinobacteria) and Akkermansia (Verrucomicrobia), both of which were extremely low in the Chepang foragers.
Elevated abundances of Treponema and Prevotella, together with reduced abundances of Bacteroides and Bifidobacterium, are characteristic features of the gut microbiomes of foraging communities [19,24,28,30].
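The FDR-adjusted P-values used to call differentially abundant phyla and genera can be illustrated with the Benjamini-Hochberg (BH) procedure, which DESeq2's `results()` applies by default (`pAdjustMethod = "BH"`). The sketch assumes BH adjustment; the raw p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min                 # enforce monotonicity across ranks
    return adjusted


# Hypothetical raw p-values for four genera.
raw = [0.01, 0.04, 0.03, 0.005]
q = bh_adjust(raw)
print([round(v, 3) for v in q])   # [0.02, 0.04, 0.04, 0.02]
print(sum(v < 0.05 for v in q))   # 4
```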
To evaluate whether these taxa reflect the lifestyle gradient, we measured correlations between genus abundances and PCoA1 coordinates from the unweighted UniFrac analysis and found strong correlations for 33 of the 52 differentially abundant genera (|Spearman's rho| > 0.29, q < 0.05, correlation test). Bacteroides showed the strongest positive correlation with PCoA1 (rho = 0.78, q = 1.9 × 10⁻¹², correlation test), while Ruminobacter, Treponema, Bulleidia, and Catenibacterium showed strong negative correlations (Figure 6), consistent with multiple genera varying with lifestyle differentiation across the five populations.
Factors affecting gut microbiome composition in the Himalaya
We next assessed whether any of the ten dietary and environmental factors that differentiate the Himalayan populations (Figure 2) correspond to variation in gut microbiome composition. A canonical correspondence analysis (CCA) revealed that the ten factors collectively explain 38% of the gut microbiome variation within the Himalaya, leaving 62% unexplained. Of the ten variables, the source of drinking water and use of solid biomass fuel were significantly associated with gut microbiome composition in the Himalayan populations (P = 0.009 and 0.028, respectively, ANOVA), suggesting that environmental factors may influence the gut microbiome. Both factors contributed most to the first CCA axis (CCA1), which distinguished the Chepang and Raute individuals, who drink river water and burn only solid biomass fuel for cooking, from the Raji and Tharu, who drink groundwater and use biogas for cooking (Figure 7). Individuals who drank river water had higher abundances of Treponema, and those who drank groundwater had higher levels of Fusobacterium (q < 0.05 for both, Kruskal-Wallis test). Although cooking fuel was significantly associated with overall composition, no individual genus reached statistical significance after correction for multiple testing.
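For a two-level factor such as drinking-water source, the per-genus Kruskal-Wallis test compares rank sums between two groups, and the H statistic is referred to a chi-square distribution with one degree of freedom. The sketch below computes H with the usual tie correction and the corresponding p-value via the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2)). The abundances are hypothetical; in a per-genus screen, the resulting p-values would then be adjusted for multiple testing (for example, with Benjamini-Hochberg) to obtain q-values.

```python
import math


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    ranks, tie_sum, i = [0.0] * n, 0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1            # tied values share their mean rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(rs ** 2 / len(vals) for rs, vals in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    return h / (1 - tie_sum / (n ** 3 - n))       # undefined if every value is tied


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical relative abundances (%) of one genus by drinking-water source.
river = [5.0, 7.0, 9.0, 11.0]
groundwater = [1.0, 2.0, 3.0, 4.0]
h = kruskal_wallis_h(river, groundwater)
print(round(h, 3), round(chi2_sf_1df(h), 4))
```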
DISCUSSION
Several previous reports show that the gut microbiomes of traditional populations differ from those of westerners [16,17,19–22,24,25,27–29]. These studies have emphasized that gut bacterial composition differs between traditional and westernized populations, that alpha diversity is higher in traditional populations, and that diet may be the primary driver of variation in the human gut microbiome. However, because these studies compare human populations that diverged tens of thousands of years ago, it has been difficult to separate the effects of host genetics and geography from those of lifestyle on the gut microbiome. In this study, we compared the gut microbiomes of four rural Himalayan populations with shared ancestries that led seminomadic lifestyles until recently and transitioned to farming at various time points over the last three hundred years. Although the individuals in our study have historically cohabited a geographically small region (less than 150,000 km²) in the Himalayan foothills and shared similar diets until recently, their current diets and lifestyles vary. Our results indicate that their gut microbiota strongly mirrors their lifestyles, suggesting that the human gut microbiome can undergo pronounced changes within a short time (decades) of departure from foraging (as seen in the Raute and Raji). As dependence on agriculture increases, these changes become more pronounced (as seen in the Tharu). Because these populations share ancestry and inhabit comparable latitudes, such changes in gut microbiota are unlikely to be attributable to host genetic differences or confounded by geography.
Variation in the gut microbiomes of the Himalayan populations is consistent with the general patterns observed in many traditional human populations. More importantly, our results suggest that certain genera represent conserved gut microbiota markers of human subsistence states (Figure 8). Previous studies of the industrialized microbiota have demonstrated that gut microbiome composition is associated with, and can be driven by, differences in host diet [4–6,8,15,16,22,26,30]. Genera such as Ruminobacter and Treponema, which are associated with metabolizing uncultivated plant products and were enriched in the Chepang foragers in this study, are also elevated in hunter-gatherers across the world [19,28,30,33]. Moreover, Prevotella and Eubacterium, which have previously been associated with vegetarian diets in westerners [5], were enriched in all Himalayan populations relative to Americans. In contrast, taxa associated with dietary animal protein, such as Bacteroides and Blautia [5,46], were enriched in the Americans relative to the Himalayan populations, consistent with the low animal protein content of diets across Nepal [47].
In addition to diet, environmental factors may also influence the human gut microbiome [7,23,44]. Consistent with these findings, we found that the source of drinking water may exert a detectable effect on the gut microbiota. Differences in the mineral and microbial content of drinking water in Nepal have been reported previously [48–51] and may affect the gut microbiome of our Himalayan participants. Moreover, prolonged exposure to air pollutants has been shown to alter the gut microbiome in mice [44]. In our study, use of solid biomass cooking fuel, which increases exposure to airborne particulate matter, was associated with gut microbiome composition, although whether this link is direct or indirect remains unclear. In addition, intestinal parasite load has been shown to alter the gut microbiota [23]. The association between gut microbiome composition and parasite load approached significance in our participants as well (P = 0.075, ANOVA), and likely fell short of significance because of the low parasite abundance in our participants.
Despite noticeable differences in gut microbiome composition, we did not observe significant differences in gut bacterial diversity (alpha diversity) across lifestyles in the study populations. Comparisons of populations that reside in similar geographical areas but practice different subsistence strategies, such as the BaAka hunter-gatherers and Bantu farmers [28] and the Matses hunter-gatherers and Tunapuco farmers [33], also showed little difference in alpha diversity. However, these and other traditional populations such as the Hadza [19] have elevated gut bacterial diversity relative to westerners. We did not observe higher alpha diversity in the traditional Himalayan populations relative to the Americans. One possible explanation for this discrepancy is that latitude is a primary factor influencing gut bacterial diversity: the traditional populations included in previous studies reside in tropical climate zones, whose higher biodiversity likely affects both diet and environmental microbial exposures.
In conclusion, our results emphasize the need to study additional traditional populations to understand how geography, climate, diet, and environment affect the gut microbiome. By comparing human populations that reside in a relatively small geographical area, shared a common diet and lifestyle until recently, and currently practice different subsistence strategies, we show that the human gut microbiome undergoes marked changes within decades of increasing urbanization. The extent to which the numerous factors associated with urbanization contribute to gut microbiome change remains to be determined, although experimental models have shown that a western diet, antibiotics, and chemical laxatives can cause gut microbiome extinction events [5–7]. Nevertheless, the global trends of gut bacterial taxa that are depleted or enriched upon lifestyle transitions are striking. The functional consequences of these changes, both for the intrinsic microbial ecology of the gut and for human biology, are critical questions for the field to address. Future work should incorporate metagenomics to characterize gut microbial variation at finer scales, metabolomics and strain culturing to assess functional differences, and immune and metabolic profiling of these populations. Pursuing the mechanisms by which the gut microbiome interacts with the ecosystems of these populations may reveal conserved connections between microbial and human biology, with large implications for industrialized humans who lack these microbes.
MATERIALS AND METHODS
Study sites, participating individuals, and sample collection
Stool samples were collected with informed consent from 56 adult participants (over 18 years old) from four indigenous Himalayan populations in Nepal and from 10 adult Americans of European descent. The indigenous Nepali participants comprised Chepang (N=14), Raji (N=10), Raute (N=12), and Tharu (N=20), inhabiting the Chitwan, Bardia, Dadeldhura, and Sarlahi districts, respectively. The samples were collected in March and April 2016. This work was approved by the Ethical Review Board of the Nepal Health Research Council (NHRC) and by the Stanford University Institutional Review Board (IRB).
In addition to collecting fecal samples, we obtained ethnolinguistic, demographic, environmental, and dietary data from the participants using a questionnaire designed specifically for this study. The questionnaire assessed participants' age, gender, diet, health status, medication use, and behavioral practices such as tobacco and alcohol consumption, along with several environmental variables (Supplementary Table 2). We also inspected each individual's stool sample under the microscope for intestinal parasites (triplicate slides per individual). Participants' responses to the questionnaire are included in Supplementary Table 4.
DNA extractions
Freshly produced stool samples from the Himalayan participants were collected on clean OMNIgene gut accessory collection paper (OM-AC1). About 500 mg of stool was transferred, using the clean spatula provided with the kit, into the OMNIgene gut kit collection tube containing stabilizing buffer. The tubes were shaken vigorously back and forth until the fecal samples were completely homogenized. Within 48-72 hours of collection, tubes were transported at room temperature to the Tribhuvan University Institute of Medicine, Kathmandu, Nepal, where they were stored at -80°C until DNA extraction. DNA was extracted using the MO BIO PowerSoil kit according to the manufacturer's protocol. Extracted DNA was shipped to Stanford University on dry ice and stored at -20°C until sequencing. Samples from Americans were collected from volunteers at Stanford University in 15 ml centrifuge tubes and transported to the laboratory on ice. Half of each sample was immediately frozen at -80°C. From the other half, 500 mg of stool was transferred to OMNIgene collection tubes and kept at room temperature for 48-72 hours, after which the tubes were stored at -80°C. DNA was extracted from both sets of samples simultaneously using the MO BIO PowerSoil kit according to the manufacturer's protocol and stored at -20°C until sequencing.
16S sequencing and analyses
The V4 region of the 16S rRNA gene was PCR-amplified using previously described primers and protocols [52]. The amplicons were multiplexed and subjected to paired-end sequencing on an Illumina MiSeq. Of the 66 samples, one yielded very little DNA and another failed paired-end sequencing. After discarding these two samples, the final dataset included 64 individuals (14 Chepang, 9 Raji, 11 Raute, 20 Tharu, and 10 Americans). The amplification primers and multiplexing barcodes are listed in Supplementary Table 4.
Paired-end reads were processed using DADA2 [53] and subsequently analyzed in R using phyloseq [54]. To retain high-quality sequence, reads were truncated to 150 bp and at the first base with a quality score of 2 or lower (truncQ=2); reads containing any ambiguous (N) bases or more than two expected errors were discarded (maxN=0, maxEE=2). Sequence variants were inferred by pooling reads from all samples (pool=TRUE), after which paired-end reads were merged and a sequence table was constructed. Taxonomy was assigned with the naïve Bayesian classifier [55] implemented in DADA2, using the RDP v14 training set [56]. A multiple sequence alignment was generated with the DECIPHER package [57] in R, and a maximum likelihood phylogenetic tree was constructed with phangorn [58], using a neighbor-joining tree as the starting point.
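The per-read filtering rule can be illustrated with a short Python sketch. It follows the documented meaning of the DADA2 `filterAndTrim` parameters (truncQ cuts a read at the first base with quality at or below the threshold, truncLen discards shorter reads and trims longer ones, maxN and maxEE reject reads), but it is not DADA2's code: the function names are hypothetical and the order of the checks is an assumption. The expected number of errors is computed as EE = Σ 10^(−Q/10) over the retained bases.

```python
from typing import Optional, Sequence


def expected_errors(quals: Sequence[int]) -> float:
    """Expected errors in a read: sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq: str, quals: Sequence[int], trunc_len: int = 150,
                max_n: int = 0, max_ee: float = 2.0,
                trunc_q: int = 2) -> Optional[str]:
    """Return the trimmed read if it passes all filters, otherwise None.

    Hypothetical helper mimicking truncQ, truncLen, maxN and maxEE
    (order of checks assumed, not taken from DADA2's source).
    """
    # truncQ: cut the read at the first base with quality <= trunc_q
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # truncLen: reads shorter than trunc_len are discarded, longer ones trimmed
    if len(seq) < trunc_len:
        return None
    seq, quals = seq[:trunc_len], quals[:trunc_len]
    # maxN: reject reads with more than max_n ambiguous bases
    if seq.upper().count("N") > max_n:
        return None
    # maxEE: reject reads whose expected errors exceed max_ee
    if expected_errors(quals) > max_ee:
        return None
    return seq


# A 150 bp read at uniform Q30 carries 150 * 0.001 = 0.15 expected errors and passes;
# at uniform Q10 it carries 15 expected errors and is rejected.
print(filter_read("A" * 200, [30] * 200) is not None)
print(filter_read("A" * 200, [10] * 200) is None)
```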
A total of 1,183,760 merged reads passed quality control, and 1,630 taxa were initially identified. Chimeric sequences constituted 22% of the reads; after their removal, 921,345 merged reads remained. Removing low-abundance phyla (Synergistetes and Deferribacteres) that were observed only once across all samples left 883 taxa in the dataset. After quality control, the mean (±SD) sequencing depth per sample was 11,570 (±4,653). We included three technical replicates of the frozen sample from one individual and a total of five replicates of the OMNIgene samples from two additional individuals. Because the technical replicates did not differ markedly (Supplementary Figure 3), we retained the replicate with the highest coverage for each of these individuals. After removing the remaining replicates, 64 individuals and 875 taxa remained in the final dataset.
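The read accounting and the replicate-selection rule above can be expressed in a few lines of Python. This is a sketch under stated assumptions: per-library depths are assumed to be available as (library ID, depth) pairs grouped by individual, the library IDs in the example are hypothetical, and the reported SD is assumed to be the sample standard deviation.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def chimera_fraction(reads_before: int, reads_after: int) -> float:
    """Fraction of merged reads removed during chimera filtering."""
    return 1.0 - reads_after / reads_before


def depth_summary(depths: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-sample sequencing depth."""
    return statistics.mean(depths), statistics.stdev(depths)


def keep_deepest_replicate(
    replicates: Dict[str, List[Tuple[str, int]]]
) -> Dict[str, str]:
    """For each individual, keep the replicate library with the most reads."""
    return {ind: max(libs, key=lambda lib: lib[1])[0]
            for ind, libs in replicates.items()}


# Read counts reported in the text: roughly 22% of merged reads were chimeric.
print(f"{chimera_fraction(1_183_760, 921_345):.1%}")  # 22.2%

# Hypothetical library IDs and depths, for illustration only.
reps = {"ind_A": [("ind_A_frozen_r1", 9800),
                  ("ind_A_frozen_r2", 12400),
                  ("ind_A_frozen_r3", 11100)]}
print(keep_deepest_replicate(reps))  # {'ind_A': 'ind_A_frozen_r2'}
```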
Random forest classifier model
One hundred random forest classifiers (RFCs) with 50 to 5000 trees were constructed from all 35 survey variables (Supplementary Table 3) using the ‘randomForest’ R package [59]. We repeated this analysis on the 16S data and, for both analyses, report the RFC with the smallest out-of-bag (OOB) error rate.
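The model-selection step can be sketched in Python with scikit-learn's `RandomForestClassifier` as a stand-in for the R randomForest package; its `oob_score_` attribute is the out-of-bag accuracy, so the OOB error is `1 - oob_score_`. Default hyperparameters differ between the two packages, so results would not match exactly. The evenly spaced grid `range(50, 5001, 50)` (which yields exactly 100 forests) is an assumption, the function name is hypothetical, and the features are assumed to be numeric; categorical survey answers would first need one-hot encoding, for example with `pandas.get_dummies`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_rf_by_oob(X, y, n_trees_grid, seed=0):
    """Fit one forest per tree count; return (oob_error, n_trees, model) with the lowest OOB error."""
    best = None
    for i, n_trees in enumerate(n_trees_grid):
        rf = RandomForestClassifier(
            n_estimators=n_trees,
            oob_score=True,        # score each sample only with trees that did not see it
            random_state=seed + i,
        )
        rf.fit(X, y)
        oob_error = 1.0 - rf.oob_score_  # oob_score_ is OOB accuracy
        if best is None or oob_error < best[0]:
            best = (oob_error, n_trees, rf)
    return best


# Assumed full grid: 100 forests with 50, 100, ..., 5000 trees (slow on real data).
FULL_GRID = range(50, 5001, 50)

# Toy usage with a small grid and synthetic features (not the study's data).
rng = np.random.default_rng(1)
X = rng.random((60, 5))
y = (X[:, 0] > 0.5).astype(int)
err, n_trees, model = select_rf_by_oob(X, y, [50, 100, 200])
print(f"best: {n_trees} trees, OOB error {err:.2f}")
```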
Statistical analyses
Correspondence analysis of the survey data was performed using the FactoMineR package in R [60]. Canonical correspondence analysis was performed at the genus level (taxa collapsed by genus name) by calling functions from the vegan package via phyloseq. Phylogenetic diversity was computed after rarefying the samples to depths ranging from 500 to 3000 sequences per sample. Alpha diversity was measured using species richness, Shannon’s H, Simpson’s D, and Fisher’s alpha, each calculated as the mean over 100 rarefaction iterations at each depth. Kruskal-Wallis tests were used to assess differences between populations in each alpha diversity metric at each rarefaction depth. Because the choice of rarefaction depth did not alter the significance of these comparisons, we report results at a depth of 3000, the maximum depth that retained all samples. Beta diversity was assessed using Bray-Curtis and unweighted and weighted UniFrac distances calculated on log-transformed, non-rarefied 16S count data. Permutational multivariate analysis of variance (PERMANOVA) was performed using the vegan package in R [61], with 10,000 permutations to assess statistical significance. To identify differentially abundant phyla and genera, we first agglomerated taxon counts at each of these taxonomic levels and then tested for differences in abundance using the DESeq2 package [45]. To correct for multiple testing, false discovery rates were computed using the Benjamini-Hochberg method, and adjusted p-values < 0.05 were considered statistically significant.
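Several of the per-sample computations described above can be illustrated with a NumPy-only Python sketch: repeated rarefaction with averaged richness, Shannon’s H (natural log) and Simpson diversity; Benjamini-Hochberg adjustment; and Bray-Curtis dissimilarity on log-transformed counts. The function names are hypothetical and several details are assumptions. Simpson is reported as 1 − Σp², the form returned by vegan’s `diversity(index = "simpson")`. The log transform is taken as log(1 + x), because the text does not state the pseudocount. Fisher’s alpha, UniFrac, PERMANOVA and DESeq2 are not reproduced here.

```python
import numpy as np


def rarefy(counts, depth, rng):
    """Subsample an integer vector of taxon counts to `depth` reads without replacement."""
    reads = np.repeat(np.arange(counts.size), counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=counts.size)


def alpha_metrics(counts):
    """Observed richness, Shannon's H (natural log) and Gini-Simpson 1 - sum(p^2)."""
    p = counts[counts > 0] / counts.sum()
    return {"richness": float(p.size),
            "shannon": float(-(p * np.log(p)).sum()),
            "simpson": float(1.0 - (p ** 2).sum())}


def rarefied_alpha(counts, depth=3000, iterations=100, seed=0):
    """Mean of each alpha metric over repeated rarefactions to the same depth."""
    rng = np.random.default_rng(seed)
    runs = [alpha_metrics(rarefy(counts, depth, rng)) for _ in range(iterations)]
    return {k: float(np.mean([r[k] for r in runs])) for k in runs[0]}


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (step-up) adjusted p-values, in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def bray_curtis_log(x, y):
    """Bray-Curtis dissimilarity on log(1 + count) transformed abundances (assumed transform)."""
    a, b = np.log1p(x), np.log1p(y)
    return float(np.abs(a - b).sum() / (a + b).sum())


# Toy usage on made-up counts (not the study's data).
sample = np.array([1200, 800, 500, 300, 200])
print(rarefied_alpha(sample, depth=1000, iterations=10))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```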
AUTHOR CONTRIBUTIONS
Conceived by A.R.J, E.R.D, J.S, and C.D.B; sample collection, DNA extraction, and sequencing were performed by A.R.J, Y.G, G.P.G, D.B, S.T, and K.N; parasite characterization by D.B and S.T under the supervision of J.B.S; data analysis was conducted by A.R.J and E.R.D under the supervision of S.H, J.S, and C.D.B. Resources were provided by Y.G, G.P.G, J.B.S, J.S, and C.D.B. Manuscript prepared by A.R.J with input from all authors.
ACKNOWLEDGEMENTS
We would like to thank the Nepal Health Research Council (NHRC) under the Government of Nepal Ministry of Health for providing research permits to conduct our work in Nepal. We are grateful to Mr. Biswash Chepang for community outreach and participant recruitment in the Chepang community. We express our gratitude to all the participants in this study. This work was partly supported by a Center for Human and Evolutionary Genomics (CEHG) Seed Award to A.R.J. and by grants from the National Institutes of Health (R01-DK085025 and DP1-AT00989201) to J.L.S. E.R.D is supported by NIH F32 DK109595. C.D.B and J.L.S are Chan Zuckerberg Biohub and Chan Zuckerberg Biohub Microbiome investigators.
REFERENCES
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].