Microbial diversity in raw milk and Sayram Ketteki from southern of Xinjiang, China

Raw milk and fermented milk are rich in microbial resources, which are essential for the formation of texture, flavor and taste. In order to gain a deeper knowledge of the bacterial and fungal community diversity in local raw milk and home-made yogurts from Sayram town, Baicheng county, Akesu area, southern of Xinjiang, China,30 raw milk and 30 home-made yogurt samples were collected and experiment of high-throughput sequencing was implemented.The results of experiments revealed the species of fungi in raw milk was the most, and the species of bacteria in fermented milk was the least.Based on principal component analysis (PCA), it was found that the bacterial and fungal community structure differed in samples from two types of dairy products.And the presence of 15 bacterial and 12 fungal phyla, comprising 218 bacterial and 495 fungal genera respectively, among all samples. Firmicutes and Ascomycota,Lactobacillus and Candida were the predominant phyla and genera of bacteria and fungi, respectively. The results indicated that the microbial community of raw milk differs from home-made yogurts due to sampling location and manufacturing process. The study suggested that high-throughput sequencing could provide a better understanding of microbiological diversity as well as lay a theoretical foundation for selecting beneficial microbial resources from this natural yogurt.

high-throughput sequencing to simultaneously sequence millions of DNA molecules, and constructs a metagenome library by detecting the genetic material sequences of variable regions in microorganisms, exploring the diversity of microbial species and community structure. This technology avoids non-cultivable and low-abundance microbial unpredictability, and objectively restores the community structure and abundance ratio, maximizing the development of microbial resources. This approach has been widely used in medical and environmental science [16,17], and gradually used in the food science [18].
Through metagenomics, the diversity of microorganisms, dominant flora and metabolic potential in Mexican Cotija cheese was discovered [19]. It indicated that the Lactobacillus, Leuconostoc and Weissella were the main dominant flora, the microbial metabolic activities related to the formation of multiple flavors mainly were involved the metabolism of branched chain amino acids and free fatty acids, and also discovered some genes related to bacteriocin production and immunity. Shewanella, Acinetobacter, Pelomonas, Dysgonomonas, Weissella and Pseudomonas were found in Tibetan kefir grains for the first time [20]. In Turkish Kefir grains, it was found that communities in the two samples showed high consistency. Lactobacillus including L. kefiranofaciens, L. buchneri and L. helveticus was the most abundant genus [21]. The diversity of fungi in French Tomme d'Orchies cheese was interesting. The results showed that it contained Yarrowia lipolytica and Galactomyces geotrichum, and some rare microorganisms had been discovered, such as Clavis-poralusitaniae, Kazachstania unispora and Cladosporium cladosporioides [22].
Sayram Ketteki (SK), a traditional and popular handcraft fermented food, was produced by local Uighur peasant woman in Xinjiang province of China. To generate SK, the milk from cows is used as the raw material, and a little yogurt reserved in the previous batch acts as starter of fermentation. Because of continuous making yogurt, the fermentation starters from some families even have decades of history. After fermenting in porcelain bowels at 10-25℃ about 6-8 hours, it obtains the products with the color of milky white, elastic, sweet and sour-tasting with a slightly alcoholic. SK was typical alcoholic fermented milk. Several studies have investigated the lactic acid bacteria and yeasts in SK using culture-based method and molecular biological identification. It was reported that the dominant strains of SK were L.Bulgaricus, Streptococcus thermophilus and Saccharomyces cerevisiae [28]. But some others studies demonstrated that L. helveticus, Streptococcus thermophilus and Kluyveromycus marxianus were the predominant [29]. Although there are some related studies [30], the mechanism of its flavor formation is still unclear. Undoubtedly, the formation of flavor has a lot to do with microorganisms, therefore, unraveling microbial community diversity of SK and its raw materials is very necessary.
In the current study, to explore the microbial species diversity and the microbial composition in RM and SK, 30 RM and 30 home-made yogurt samples were collected, metagenomic sequencing were carried out. Through bioinformatics analyses, it was found that microorganisms in both types of dairy products were various. Here we report the results.

Sample collection and DNA extraction
Sixty samples were collected aseptically from different national minority families of Sayram town Baicheng county which located in the southern of Xinjiang (approx 41°48' N, 81°87' E) of China, where the average temperature was 27°C in late July 2019. RM samples were named with the letter "X" (X1, X2....X34), while SK samples were named with the letter "S" (S1, S2....S34), respectively. Each RM sample and each SK sample was corresponding, because they came from the same family. These collected samples were stored at −20℃ for further use.
Every sample was centrifuged at 10000 rpm for 10 min. The supernatant was discarded and the precipitate was left. Total microbial DNAs were extracted from 500 mg sediment for each sample using the DNeasy Power Soil Kit (QIAGEN, Netherlands), according to instructions of the manufacturer, and kept at −20°C prior to future analysis. The quantity and quality of extracted DNAs were measured by NanoDrop ND-1000 spectrophotometer and 0.8% agarose gel electrophoresis.

PCR amplification of 16SrDNA and ITS sequences
The bacterial V3-V4 of 16SrRNA gene was amplified, the used primers were as follows:

Sequence quality control
In order to fulfill the requirements of subsequent analyses, the all readings were screened and filtered using Quantitative Insights into Microbial Ecology (QIIME) (V1.8.0) software [31].
Briefly, the raw reads with exact matches to the barcodes were assigned to respective samples and identified as valid sequences. The low-quality sequences that were less than 150 bp, average Phred scores of less than 20, ambiguous, and contained mononucleotide repeats more than 8 bp were discarded. Paired-end reads were assembled using FLASH [32]. After chimera detection, the remaining high quality sequences were clustered into operational taxonomic units (OTUs) at 97% sequence identity with UCLUST [33]. The highest sequence was taken as the representative sequence of the OTU. OTU taxonomic classification was conducted by BLAST searching and aligning the Prokaryotic sequences with the Greengenes database [34], and Eukaryotic sequences with the Unite database [35].According to the number of sequences included in each sample of each OTU, a matrix file of the abundance of OTU in each sample was constructed.

Bioinformatics and statistical analysis
Sequence data analyses were mainly conducted using QIIME and R packages (V3.2.0).

OTU-level alpha diversity indices, such as Chao1 and ACE richness index, Shannon and
Simpson diversity index, were calculated using the OTU table in QIIME. OTU-level ranked abundance curves were generated to compare the richness and evenness of OTUs among samples. Beta diversity analysis was performed to investigate the structural variation of microbial communities across samples visualized via Principal component analysis (PCA) .
The taxonomy compositions and abundances were visualized using GraPhlAn [36], and correlations with |RHO| > 0.6 and P < 0.01 were visualized as co-occurrence network using Cytoscape. Microbial functions were predicted by PICRUSt (Phylogenetic investigation of communities by reconstruction of unobserved states) [37].

Bacterial and fungal sequence richness and diversity
After filtering, we obtained 4,092,576 high-quality reads for bacteria, ranging from 60,078 The 16SrRNA of bacteria from SK had the largest number of sequences, followed by the number of ITS sequences of fungi from RM and 16S rRNA of bacteria from RM and the number of ITS sequences of fungi from SK was the smallest (Table 1). From the Chao1 index, the highest abundance of bacteria was in RM samples, followed by bacteria of SK, fungi of RM, and the lowest abundance was fungi of SK. From the Shannon index, the highest diversity was fungi of RM, followed by bacteria of RM and fungi of SK, and the bacteria of SK had the lowest diversity. The results of the numbers of OTUS were consistent with the results of diversity, which indicated that the species of fungi in RM was the most, and the species of bacteria in SK was the least.
The rarefaction curves, which can predict the total number of species and relative abundance of each species in a sample at a given sequencing depth indicated that when the prokaryotic sequencing volume was 40,000, the bacterial sequence of each sample had not entered the plateau period (Fig 1A), and Shannon index curve (Fig 1B) indicated that all samples were still sequencing. When the amount reached 10,000, all OTUs were saturated.
Although continued sequencing might discover new phylotypes, at this level, the bacterial diversity in the sample had already been captured. All the eukaryotic sequences at 10,000 had entered a gentle state, indicating that the sequencing volume of this experiment can truly reflect the diversity of fungi in the samples (Fig 1C).

Comparison of bacterial and fungal community structure between RM and SK
The contribution rates of the first principal component of the sample bacteria and fungi were 61.86% and 37.29%, respectively, and the contribution rates of the second principal component were 10.4% and 21.3% (Fig 2). The closer the distance between the samples in the PCA diagram indicated the closer the composition of the two microbial communities. The SK and RM samples were separated on both sides, demonstrating that there was a significant difference in the bacterial group structure between the SK and the RM. The SK samples were almost stacked in a straight line, indicating that these samples had a high similar bacterial flora (Fig 2A). Simultaneously, the scattered distribution of bacterial communities in RM samples revealed that there were large differences between samples. The fungal samples ( Fig   2B) showed a large aggregation and small dispersion. Most RM and SK samples were highly aggregated and overlapped, suggesting that these samples might have the similar fungal community, and a small part of the SK samples be scattered. This indicated that there was a difference in the fungal community among these samples.   Rhodotorula (1.00%) (Fig 4B). It indicated that Candida had the highest content in the two types of samples (Table 2 and Fig 4C). After fermentation, the genera contents of Candida and

Bacterial and fungal composition of RM and SK
Kluyveromyces increased significantly. The OTUS contents of Mucor, Clavispora, Yarrowia and Pichia increased slightly, and other 14 fungal genera decreased significantly.
In all, we found that the OTUs of predominant bacterial genera in SK was higher and obvious, and the OTUs of fungal genera were relatively uniform and small in both two types of dairy products (Table 2 ).

Spearman association network analysis of dominant genera interaction
The association network of dominant bacteria was constructed by using Cytoscape, and the most of the bacterial relationships were positive (Fig 5). The Lactococcus was positively correlated with more than 20 species of bacteria for example Klebsiella, Staphylococcus, Enterobacter and so on, while the Lactobacillus was negatively correlated with more than 20 species of bacteria such as Enterococcus, Lactococcus and Streptococcus. It suggested that the lactic acid bacteria might ferment lactose to produce lactic acid, which created an acidic environment. The acid tolerance of Lactobacillus was higher than other bacteria, especially Lactococcus. This also explained the high abundance of Lactobacillus in yogurt, and higher abundance of Lactococcus in RM.

Prediction of flora metabolism function
Using the PICRUSt, the prediction of bacterial metabolic functions can be achieved by comparing the existing 16SrRNA gene sequencing data with the microbial reference genome database with known metabolic functions. The composition of the KEGG two-level functional prediction information obtained from RM and SK was basically similar, but the abundance was obviously different (Fig 6A).
Bacterial community with higher abundance of RM was involved in the two classifications of carbohydrate metabolism and amino acid metabolism. SK had the largest variety of flora for carbohydrate metabolism. Venn diagram of common functional groups showed that although there were obvious differences in the composition of the microbial community, the functional groups in each type of sample were basically similar (Fig 6B).

4.Discussion
The multiple microorganism and species diversity exist in RM [38] and traditional fermented dairy products [39,40], and they play a crucial role in the formation of flavor.
Studies have shown that the main source of the flavor substance acetaldehyde in yogurt is the synthesis of threonine by threonine aldolase secreted by the Bulgaria subspecies, and Streptococcus thermophilus can metabolize citric acid to produce diacetyl during yogurt fermentation and storage [41,42]. There are various and popular fermentation dairy products in Xinjiang rural area, such as Kurut, Shubat, hence, there is no doubt that their particular flavor has a lot to do with the microbial polymorphism. However, the microbial diversity of these dairy products remain elusive. SK is one of the beloved yogurt and its microbial diversity has been not investigated yet, so, it's the first time to study its microbial polymorphism in the current study.
Previous studies revealed that Lactobacillus is the main flora in cow's milk [43,44]. In the current study, the Macrococcus, Acinetobacter, Pseudomonas and Lactococcus were the dominant bacterial genera in RM, accounting for 51.5% of the total sequences, but the proportion of each sample is different. The Lactobacillus and Streptococcus were the dominant bacterial genera in SK, which are the similar to other studies [45]. At the fungal level, Candida was the absolutely advantageous genus in both two types of dairy products, while Aspergillus and Kluyveromyces were the subsequent dominant genera in RM and SK samples, which differed with others study [46]. The Kluyveromyces marxianus used in fermented food had the characteristics of weak alcohol and good aroma production, and could increase the content of free fatty acids and flavor substances in yogurt [47,48]. Maybe, that's why the flavor of SK is different from the other yogurts.
There were huge differences in the types and contents of microbial flora between samples. The most SK samples contained more than 50% Lactobacillus and about 10% Streptococcus, but in S12, the contents of Lactobacillus In all, our findings partially explore the microbial diversity of RM and SK in south Xinjiang and contribute to providing a basis for industrialization SK. Besides, the microbial functional prediction can be helpful to search for new probiotics.