Human microbiome aging clocks based on deep learning and tandem of permutation feature importance and accumulated local effects
===============================================================================================================================

* Fedor Galkin
* Aleksandr Aliper
* Evgeny Putin
* Igor Kuznetsov
* Vadim N. Gladyshev
* Alex Zhavoronkov

## Graphical Abstract

![Figure1](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F1.medium.gif)

[Figure1](http://biorxiv.org/content/early/2018/12/28/507780/F1)

**Abstract**

The human gut microbiome is a complex ecosystem that both affects and is affected by its host status. Previous analyses of gut microflora revealed associations between specific microbes and host health and disease status, genotype and diet. Here, we developed a method of predicting biological age of the host based on the microbiological profiles of gut microbiota using a curated dataset of 1,165 healthy individuals (3,663 microbiome samples). Our predictive model, a human microbiome clock, has an architecture of a deep neural network and achieves the accuracy of 3.94 years mean absolute error in cross-validation. The performance of the deep microbiome clock was also evaluated on several additional populations. We further introduce a platform for biological interpretation of individual microbial features used in age models, which relies on permutation feature importance and accumulated local effects. This approach has allowed us to define two lists of 95 intestinal biomarkers of human aging. We further show that this list can be reduced to 39 taxa that convey the most information on their host’s aging. Overall, we show that (a) microbiological profiles can be used to predict human age; and (b) microbial features selected by models are age-related.
## Introduction

The human gut is colonized by a dense microbial community, estimated to consist of 10^14 cells, which is an order of magnitude higher than the number of cells in the host 1. Gut microbiota is a complex ecosystem that carries out multiple important functions in the organism. Apart from being a core element of the digestive system, microbiota regulates immunity, processes xenobiotics, produces important metabolites, and even affects higher neural functions 2–4. The influence, however, is not one-sided: microbiota does not simply determine certain host characteristics, as it also responds to signals from the host via multiple feedback loops 5.

Some of these feedback loops were found to be reflected in the microbiota composition. For example, multiple studies indicate that irritable bowel diseases can develop following an intense immune response to an intestinal infection. Microbiota responds to the proinflammatory milieu with a decreased number of beneficial bacteria that lack mechanisms to survive under such hostile conditions. In return, host immunity reacts to suppress the blooming pathogenic community, which produces chronic inflammation 6. Such changes happen constantly throughout an individual’s life and may be deleterious or beneficial, and may reflect strictly individual choices or the effects of more widespread factors across populations.

Metagenomic studies have provided valuable insights into how the gut microflora progresses with age. They revealed that gut colonization occurs during birth with the bacteria living in the birth canal. The “pioneer microbiome” consists of facultative aerobes (e.g. *Escherichia, Enterococcus, etc.*) that get replaced during breast feeding with obligate anaerobes (e.g. *Bifidobacterium infantis*) 7. Upon weaning, another community shift happens towards more adult-like microbiomes 8.
These early stages of colonization are extremely important, as normal infant microbiota promotes intestinal mucus formation, prevents pathogen blooming, and regulates T-cells. The importance of early colonization is further emphasized by studies that indicate higher occurrences of eczema and food allergies in children with atypical microbiota development 9 (e.g. increased abundance of *Clostridium* and *Escherichia* microbes) 10. Factors such as the mode of birth delivery (vaginal or cesarean), infant diet (breast milk or formula), and maternal microbiome greatly influence microbiome development.

Although infant microbiome succession is well studied and can be used to assess the risks of various health conditions, its transition to the adult microbiome is less understood. Moreover, composition variability attributed to geographic location, medical history, diet, and other factors makes it hard to analyze adult microbiomes as effectively as those of infants.

Age-related studies of the human microbiome have failed to produce a straightforward theory of gut flora aging. Some studies indicate decreasing biodiversity in the elderly gut 11,12. However, that is not the case for all data sets, and healthy elderly people may have microbiomes as diverse as those of the younger population 13,14. Other findings include changes in the abundance of specific taxa in aging microbiota. Bacterial genera such as *Bacteroides, Bifidobacterium, Blautia, Lactobacilli, Ruminococcus* have been shown to decrease in the elderly, while *Clostridium, Escherichia, Streptococci, Enterobacteria* increase 15,16. However, these patterns are not strictly established, as results vary greatly across different studies. This may be attributed to different methodologies as well as to unbalanced data sets that may contain people of different lifestyles 17.
Despite these complications, the consensus is that the elderly gut has lower counts of short chain fatty acid (SCFA) producers such as *Roseburia* and *Faecalibacterium* and an increased number of aerotolerant and pathogenic bacteria. Such shifts can lead to dysbiosis, which in turn contributes to the onset of multiple age-related diseases 9.

The idea that the gut microflora can be a major contributor to the aging process is not new. As early as the beginning of the 20th century, the Nobel Prize-winning Russian scientist Ilya Metchnikoff proposed that harmful microbes processing undigested food (especially peptolytic bacteria, e.g. *Escherichia* and *Clostridium*) lead to autointoxication. Treating autointoxication with pro- and pre-biotics (such as *Lactobacillus* preparations) was suggested to alleviate an age-associated decline in organismal function. Recent studies have demonstrated promising results in line with this century-old hypothesis 18–20.

The standard way of separating the gut microbiome into three chronological states (child, adult, and elderly microbiomes) lacks a clear set of rules. Among them, the adult microbiome remains the greatest mystery. It has no established succession stages, as in newborns, and does not normally reflect the gradual detrimental processes typical for an old organism. This raises the question of whether the normal adult microbiome progresses at all or is in a state of stasis. Considering that the aging process is gradual and involves accumulation of damage and other deleterious changes 21 (as also indicated by a number of biomarkers such as DNA methylation clocks 22,23), it is logical to suppose that gut microbiome succession is also gradual 24. However, attempts to use microbiome-derived features to predict chronological age have been inconclusive.
A support vector machine model trained on human metagenomic data to classify samples as young or old was shown to be only 10-15% more accurate than random assignment, as indicated by the Area Under the Curve (AUC) score 25. Another study, using a co-abundance clustering approach, demonstrated general trendlines of microbiota composition for hosts aged 0-100 26. According to the study, specific clades of the gut community significantly differ in abundance among young adults compared to the middle aged. However, the lack of dietary and lifestyle data prevents the authors from putting together a conclusive theory of gut microflora progression. Compared to the well-established DNAm aging clocks that achieve a mean absolute error (MAE) <5 years, these results of microflora-based age prediction suggest much room for improvement 27,28.

The renaissance of deep learning that started in 2015 resulted in unprecedented machine learning performance in image, voice, and text recognition, as well as in a range of biomedical applications 29 such as drug repurposing 30 and target identification 31. One of the most impactful applications of DL in biomedicine was the use of generative models for *de novo* molecular design 32–36. In the context of aging research, these new methods can be combined for geroprotector discovery 37–41. Indeed, since 2013, many aging clocks have been developed in both humans and other model organisms. The published aging clocks utilizing deep learning were developed using standard clinical blood tests 42, facial images 43, physical activity data 44, and transcriptomic data 45. These clocks were used to rank the most important features contributing to the accuracy of the prediction by using permutation feature importance (PFI), deep feature selection (DFS) and other techniques. These clocks were also used to assess the population-specificity of the various data types 42.
The goal of this study was to build an age predictor from whole genome sequencing (WGS) data aggregated from multiple sources, using various machine learning techniques, and to use it to examine patterns of continuous microflora succession. Here, we report a method to estimate a host’s age based on their microflora taxonomic profile, assess the importance of specific taxa in organismal aging, and suggest candidate geroprotective microbiological interventions.

## Methods

### Data acquisition

Only publicly available, fully anonymized data sets from WGS human metagenomic studies deposited in ENA and SRA were used. The corresponding project IDs are: ERP005534, SRP008047, ERP009422, ERP004605, ERP002061, ERP002469, ERP019502, SRP002163, ERP003612, ERP008729 46–49. Only healthy individuals with age metadata available were included in this study. These individuals were from Austria, China, Denmark, France, Germany, Kazakhstan, Spain, Sweden and the USA, and were aged 20-90 years. In total, 1,165 healthy individuals and 3,663 samples from 10 publicly available datasets were aggregated and analyzed (Figure 1).

![Figure 1:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F2.medium.gif)

[Figure 1:](http://biorxiv.org/content/early/2018/12/28/507780/F2) Age distributions for 3,663 runs (A) and 1,165 donors (B) used in this study.

### Abundance calculation

All acquired sequencing files were quality trimmed and quality filtered with BBTools 50. Human sequences were detected using an hg19 genome index. Additionally, a specimen dilution test was carried out as specified in 51. The resulting reads were analyzed with Centrifuge and mapped against the collection of bacterial and archaeal genomes 52. In certain cases, operational taxonomic unit tables were modified to exclude unreliably detected microbes (relative abundance < 1e-5) and minor microbial species (<1.3e-3 prevalence). No sample lost more than 5% of its abundance.
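The two exclusion thresholds above, together with the renormalization that closes this subsection, can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function name `filter_and_renormalize`, the array layout (samples in rows, taxa in columns) and the toy profiles are assumptions made for illustration only.

```python
import numpy as np

MIN_ABUNDANCE = 1e-5     # below this a taxon counts as unreliably detected
MIN_PREVALENCE = 1.3e-3  # minimal fraction of samples in which a taxon is detected


def filter_and_renormalize(profiles, min_abundance=MIN_ABUNDANCE,
                           min_prevalence=MIN_PREVALENCE):
    """Hypothetical helper; profiles is an (n_samples, n_taxa) abundance array.

    Returns the renormalized profiles, a boolean mask of kept taxa and the
    fraction of abundance that each sample lost to filtering.
    """
    profiles = np.asarray(profiles, dtype=float)
    detected = profiles >= min_abundance          # reliable detections only
    cleaned = np.where(detected, profiles, 0.0)   # zero out unreliable ones
    prevalence = detected.mean(axis=0)            # per-taxon detection rate
    keep = prevalence >= min_prevalence           # drop minor species
    cleaned = cleaned[:, keep]
    lost = 1.0 - cleaned.sum(axis=1) / profiles.sum(axis=1)
    # Divide each profile by the sum of the abundances that are left.
    renormalized = cleaned / cleaned.sum(axis=1, keepdims=True)
    return renormalized, keep, lost


# Toy profiles (3 samples x 3 taxa); the third taxon is never reliably detected.
toy_profiles = np.array([
    [0.6, 0.399995, 0.000005],
    [0.5, 0.5, 0.0],
    [0.7, 0.3, 0.0],
])
renormalized, kept_taxa, lost_fraction = filter_and_renormalize(toy_profiles)
```

Returning `lost_fraction` makes it easy to check the statement that no sample loses more than 5% of its abundance; a sample left with no detected taxa would cause a division by zero and would need separate handling.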
After all the modifications, individual taxonomic profiles were renormalized by dividing each vector by the sum of the remaining abundances.

### Neural networks training

#### Regression

All deep neural networks (DNNs) were implemented using the Python 3.6 Keras library with the TensorFlow backend. Feature selection models were trained using a full list of species-level features, which includes 1,673 microbial taxa. Training and validation sets contained 90% and 10% of all profiles, respectively, in all cases. Two regressors were built: one using taxonomic profiles derived from individual samples (sample-based model) and a second one using taxonomic profiles averaged among all the samples belonging to the same host (host-based model). Models were trained as regressors with five-fold cross-validation. After completing a grid search over various model configurations, the best performing model was selected based on the maximal R² score.

The best performing model architecture was determined in the sample-based setting. It contains three hidden layers with 512 nodes in each, with PReLU activation function, Adam optimizer, dropout fraction 0.5 at each layer, and 0.001 learning rate (Figure 2). The same architecture was applied within the host-based setting.

To verify the importance of features derived from the sample-based DNN model, gradient boosting was used, as implemented in the XGBoost Python library 53. The best performing XGBoost model was trained using the following parameters: linear\_nthread = 35, max\_depth = 6, max\_delta\_step = 2, lambda = 0, gamma = 0.1, eta = 0.1, alpha = 0.5. The XGBoost models’ performance was evaluated using MAE.

![Figure 2:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F3.medium.gif)

[Figure 2:](http://biorxiv.org/content/early/2018/12/28/507780/F3) Figure 2: The neural network configuration for the best performing DNN regressor.
The regressor takes in a full species-level taxonomic profile and estimates the donor’s exact chronological age. The first hidden layer is linear and was used only to assess feature importance in accordance with the deep feature selection method 55.

#### Classification

Age classifier models were trained using a subset of either 95 features or 39 features. Training and validation sets contained 80% and 20% of all donors, respectively. The age bracket classifier was implemented with the Python Keras library using the TensorFlow backend. A weighted F1-score was selected as the target metric to assess model performance. The best performing architectures are illustrated in Figure 3. For the 95-feature classifier: three hidden layers with 128, 32 and 8 nodes, respectively, a dropout rate of 0.5, PReLU activation function in hidden layers, and softmax activation function in the output layer 54. For the 39-feature classifier: two hidden layers with 64 and 8 nodes, respectively, a dropout rate of 0.5, PReLU activation function in hidden layers, and softmax activation function in the output layer.

![Figure 3:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F4.medium.gif)

[Figure 3:](http://biorxiv.org/content/early/2018/12/28/507780/F4) Figure 3: The best performing DNN configurations for age bracket classification (20-40, 41-60, 61-90 years) based on short marker sets: 39 taxa (A), 95 taxa (B).

### Oversampling

To solve the class imbalance problem while building models for age bracket prediction, we used oversampling. Self Organizing Maps (SOMs) based on presence/absence profiles (1 if a taxon is detected in a sample, 0 if it is not) were built for each age bracket with the Python library Somoclu. Each SOM consists of 100 cells placed on a toroidal lattice. To generate synthetic profiles for underrepresented classes, codebook vectors are picked at random with replacement according to the number of Best Matching Units (BMUs) mapped to them.
Codebook values are used as probabilities for including a taxon into a synthetic sample. Synthetic presence/absence profiles are then multiplied by a vector of mean abundances of the corresponding BMUs and normalized.

### Feature importance

To assess individual feature importance, we applied the Permutation Feature Importance (PFI) technique. PFI measures the change in prediction quality (measured as a decrease in R² score) upon permuting a single feature vector. A greater decrease in quality signals greater importance of the feature.

The features deemed most important were further assessed with the Accumulated Local Effects (ALE) method to determine the change in age prediction upon minor changes in the abundance of a microbial species. ALE was implemented following the algorithm described below. For each of the 95 selected species, a quantile value table (with 5% steps) was composed. Local Effects (LE) for each quantile bin were calculated by measuring the average change in prediction upon substituting the observed abundance of a feature with the right and left bin border values. ALEs for each quantile are calculated by adding up all the previous LEs and centering the result to make the average effect of each taxon zero.

## Results

### Age prediction using machine learning

To examine the relationship between human gut taxonomic profiles and chronological age, we prepared a collection of full metagenome sequences for 1,165 healthy individuals (3,663 samples total) from 10 publicly available datasets. All individuals in our data set were between 20 and 90 years old, with a median age of 46 years. After randomly separating the 3,663 samples into training (90%) and validation (10%) sets, we trained a deep neural network regressor to predict the donor’s age using a vector of relative abundances for 1,673 microbial species. The MAE achieved by the best model configuration was 3.94 years, with an R² of 0.81 (Figure 5A).
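The R² score reported here is also the quantity that PFI tracks (see Feature importance in Methods). The following is a minimal sketch of that procedure, not the authors' code: the function names, the `n_repeats` averaging over several permutations and the linear toy predictor standing in for the trained DNN are all assumptions made for illustration.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_feature_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in R^2 after shuffling each feature column in turn.

    predict is any callable mapping an (n_samples, n_features) array to
    predicted ages; n_repeats is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def toy_predict(X):
    """Hypothetical stand-in for a trained regressor; ignores feature 2."""
    return 40.0 + 30.0 * X[:, 0] - 5.0 * X[:, 1]


toy_rng = np.random.default_rng(42)
X_toy = toy_rng.random((200, 3))
y_toy = toy_predict(X_toy)
pfi_scores = permutation_feature_importance(toy_predict, X_toy, y_toy)
```

In this toy setting the feature that the predictor ignores receives an importance of zero, while the strongly weighted feature dominates. A threshold such as the 0.001 cut-off on the R² decrease used later in the paper would then be applied to scores of this kind.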
We then divided the samples into three age groups (20-39, 40-59 and 60-90 years) and found that the predicted age distribution generated by the model closely matched the actual age distribution (Figure 6).

![Figure 4:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F5.medium.gif)

[Figure 4:](http://biorxiv.org/content/early/2018/12/28/507780/F5) Figure 4: Accumulated Local Effects (ALE) method used in this paper to assess specific taxa influence on age prediction. Changes in predicted age upon substituting observed taxon abundance with quantile values are averaged and recorded for every quantile bin. Then, they are summed to produce ALEs, which are additionally centered for convenience.

![Figure 5:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F6.medium.gif)

[Figure 5:](http://biorxiv.org/content/early/2018/12/28/507780/F6) Figure 5: Age predictions derived from cross-validation of the sample-based DNN model (A) and the XGB model (B). Samples are colored by data source, and dashed lines mark the median of observed age (46 years).

![Figure 6:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F7.medium.gif)

[Figure 6:](http://biorxiv.org/content/early/2018/12/28/507780/F7) Figure 6: Density distribution for observed (blue) and predicted (orange) ages for two regressors: DNN (A) and XGB (B). “N” stands for the total number of samples per class. Dashed lines within violins stand for quantile borders. Mean Absolute Error (MAE) (in years) for each age group is marked below the graph.

To verify the results obtained with DNN, we implemented random forest, support vector machine and elastic net regressors. All of these methods performed poorly compared to the DNN approach, with mean absolute errors exceeding 11 years. Apart from them, we trained a gradient boosting (XGB) regressor with accuracy comparable to the DNN model (MAE = 4.69 years, R2 = 0.81) (Figure 5B).
Both approaches skew the predictions towards the median age of 46 years (Figure 6). While there are certain variations within taxonomic profiles due to differences in geographical location or diet types, the described predictors can be applied to adult people from various populations equally well (see Supplementary).

### Microbiological influence on age prediction

Using Permutation Feature Importance (PFI), we assessed which taxa abundances play the greatest role in microbiological age prediction. We identified 95 features that decrease both XGB and DFS models’ R2 score by >0.001 (Figure 7). According to PFI scores, the DNN regressor is more sensitive to highly abundant species, while the XGB regressor contains some minor taxa among its most important features. We consider this an indication of the DNN’s increased robustness compared to other methods. The complete list of 95 taxa with corresponding scores, abundances and prevalences can be found in Supplementary Table 1.

![Figure 7:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F8.medium.gif)

[Figure 7:](http://biorxiv.org/content/early/2018/12/28/507780/F8) Figure 7: Bubble plot of 95 microbial taxa with PFI importance score >0.001 in both regression models (average among 5 folds). Bubble size stands for taxon prevalence (fraction of samples where the taxon is reliably detected), bubble color stands for taxon abundance (its average fraction in the communities where it was detected).

To characterize how these 95 features affect age prediction, we utilized the Accumulated Local Effects (ALE) approach (Figure 4). The ALE approach measures the response of a regressor to changes in specific taxa abundance. Each feature’s ALE was calculated using only the independent profiles where it can be reliably detected (abundance > 1e-5). Some microbes showed steadily increasing age prediction with increasing abundance (e.g.
*[Eubacterium] hallii*); other microbes showed the opposite trend, inversely correlating with predicted age (e.g. *Bacteroides vulgatus*) (Figure 8). Interestingly, certain microbes that were previously identified as important by PFI showed little influence on predicted age (e.g. *[Eubacterium] rectale*) (Figure 8).

![Figure 8:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F9.medium.gif)

[Figure 8:](http://biorxiv.org/content/early/2018/12/28/507780/F9) Figure 8: Twelve most important features’ effects on age prediction. Plots contain only the 5-95% quantile segment due to extreme ALE values for extreme quantiles. N is the number of samples where a feature is reliably detected (abundance > 1e-05); the total number of samples used is 1,165. More ALE plots are available in Supplementary Information.

Using ALEs, all features can be classified into seno-positive (monotonically increasing ALE plot), seno-negative (monotonically decreasing ALE plot), and more complex groups (non-monotonic cases) (Figure 9). Among the 95 features, only 39 displayed an average change in predicted age of more than 1 year within the 5%-95% quantile bracket. Among those, 21 were seno-negative, 15 seno-positive and 3 non-monotonic.

![Figure 9:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F10.medium.gif)

[Figure 9:](http://biorxiv.org/content/early/2018/12/28/507780/F10) Figure 9: ALE range (maximum ALE minus minimum ALE within the 5-95% abundance bracket) for 95 selected microbial features. Red are monotonically increasing ALEs, blue are monotonically decreasing ALEs, and green are non-monotonic ALEs. Only 39 taxa affect age prediction by more than 1 year within the specified abundance bracket.
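The ALE computation and the seno-positive/seno-negative labelling described in this subsection can be sketched as follows. This is a simplified reading of the Methods text rather than the authors' implementation: the function names, the count-weighted centering, the tolerance `tol`, the mapping of ALE values to right bin borders (so that dropping the last value leaves the 5-95% segment) and the linear toy predictor are all assumptions.

```python
import numpy as np


def accumulated_local_effects(predict, X, feature, n_bins=20, min_abundance=1e-5):
    """ALE of one feature over quantile bins (20 bins give 5% steps)."""
    X = np.asarray(X, dtype=float)
    X = X[X[:, feature] > min_abundance]   # profiles with reliable detection
    x = X[:, feature]
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    local_effects = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for k in range(n_bins):
        members = X[bin_idx == k]
        counts[k] = len(members)
        if len(members) == 0:
            continue
        left, right = members.copy(), members.copy()
        left[:, feature] = edges[k]        # substitute left bin border
        right[:, feature] = edges[k + 1]   # substitute right bin border
        local_effects[k] = np.mean(predict(right) - predict(left))
    ale = np.cumsum(local_effects)             # accumulate local effects
    ale -= np.average(ale, weights=counts)     # center: average effect is zero
    return edges, ale


def ale_range(ale):
    """Max minus min ALE; ale[k] sits at the right border of bin k, so
    dropping the last value keeps quantiles up to 95% (assumption)."""
    trimmed = ale[:-1]
    return trimmed.max() - trimmed.min()


def classify_ale(ale, tol=1e-9):
    """Label a feature by the monotonicity of its trimmed ALE curve."""
    steps = np.diff(ale[:-1])
    if np.all(steps >= -tol):
        return "seno-positive"
    if np.all(steps <= tol):
        return "seno-negative"
    return "non-monotonic"


def toy_predict(X):
    """Hypothetical regressor: feature 0 raises and feature 1 lowers age."""
    return 50.0 + 10.0 * X[:, 0] - 4.0 * X[:, 1]


X_toy = np.random.default_rng(0).uniform(0.01, 1.0, size=(300, 2))
_, ale_up = accumulated_local_effects(toy_predict, X_toy, feature=0)
_, ale_down = accumulated_local_effects(toy_predict, X_toy, feature=1)
```

Because each sample is only moved between the borders of its own bin, correlated features are never pushed far outside the region where they were observed. A feature would enter a reduced marker set of the kind described above when `ale_range` exceeds 1 year.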
### Age bracket prediction with DNN

While DNN and XGB regressors displayed acceptable accuracy when trained on full taxonomic profiles, decreasing the number of features down to 100 during training produces poorly performing models (MAE > 11 years). To estimate the predictive value of the 95 and 39 marker taxa sets (Figure 9), we applied them to the much easier task of age bracket prediction. All donors were separated into three age groups: young (20-39 years, 32% of all donors), middle aged (40-59 years, 41% of all donors) and elderly (60-90 years, 27% of all donors). Underrepresented classes were oversampled (see Methods).

Within this setting, the best performing DNN architectures show significantly higher accuracy than either type of random age group assignment (equiprobable or weighted). While the mean weighted F-score for random models does not exceed 38±1%, the 95 marker set achieved an F-score of 67±4%. Downsizing this marker set using ALEs to 39 taxa reduced the score by 5 percentage points (to 62±3%). We additionally compared the classifier constructed using the ALE-defined 39 intestinal marker set to classifiers built on relative abundances for 39 randomly selected taxa. None of the 100 random sets produced a classifier as good as the one built on ALE-selected features (38±3% for the random sets) (Figure 10).

![Figure 10:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F11.medium.gif)

[Figure 10:](http://biorxiv.org/content/early/2018/12/28/507780/F11) Figure 10: F-scores for five age bracket classifiers: three random models and two models built with 95 and 39 marker taxa. The equiprobable random classifier assigns a test sample to each age group (20-39, 40-59, 60-90 years) with ⅓ probability; the weighted random classifier assigns samples with probabilities equal to the fraction of a class in the test sample. Models with randomly selected markers are built using 39 random taxa abundances as input. N stands for the number of cross validation folds.
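The two random baselines from the Figure 10 caption and the weighted F-score can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: labels are simulated from the reported class fractions (32%, 41%, 27%), and the names `to_bracket` and `weighted_f1` are hypothetical.

```python
import numpy as np


def to_bracket(ages):
    """Map ages to brackets: 0 = 20-39, 1 = 40-59, 2 = 60-90 years."""
    return np.digitize(ages, [40, 60])


def weighted_f1(y_true, y_pred, n_classes=3):
    """Per-class F1 averaged with weights equal to class support."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0.0
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        total += f1 * np.mean(y_true == c)   # weight by class fraction
    return total


rng = np.random.default_rng(0)
n_donors = 1165
# Simulated true labels drawn from the reported class fractions (assumption).
y_true = rng.choice(3, size=n_donors, p=[0.32, 0.41, 0.27])

# Equiprobable random classifier: each bracket with probability 1/3.
y_equi = rng.integers(0, 3, size=n_donors)

# Weighted random classifier: probabilities equal to observed class fractions.
class_fractions = np.bincount(y_true, minlength=3) / n_donors
y_weighted = rng.choice(3, size=n_donors, p=class_fractions)

f1_equiprobable = weighted_f1(y_true, y_equi)
f1_weighted = weighted_f1(y_true, y_weighted)
```

For three classes of roughly similar size, both baselines land close to one third, which is the order of magnitude against which the 95- and 39-marker classifiers are compared.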
### Host-based age prediction

While the DNN model is highly accurate, during its training all available samples were treated as independent due to data scarcity. By averaging the taxonomic profiles obtained from samples with a shared host, we eliminated the remaining data contamination. This reduced the data set to 1,165 profiles, one per host. The host-based model was trained using the best performing DNN configuration as identified during sample-based training (Figure 2). This model was less accurate than the sample-based one: it reached an MAE of only 8.56 years (Figure 11). However, the model still performed better than baseline age assignment (MAE = 12.47 years). Interestingly, the regressor processes female and male specimens with equal accuracy, and the predicted intestinal age positively correlates (r = 0.23) with BMI, which is in line with existing data on connections between BMI and biological age 56. However, this correlation is lower than the one between donor BMI and observed age: r = 0.3.

![Figure 11:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F12.medium.gif)

[Figure 11:](http://biorxiv.org/content/early/2018/12/28/507780/F12) Figure 11: Age predictions derived from cross-validation of the host-based DNN model. Average MAE for the best performing models in each of the 5 folds is 8.39 years, which indicates a much lower accuracy than in the case of the sample-based approach (3.96 years). Blue area contains 52% of all predictions and corresponds to the trendline ±6 years.

## Discussion

To the best of our knowledge, we present the first method to predict human chronological age using gut microbiota abundance profiles. We compare two approaches to age prediction: regression and classification.
We applied multiple methods to build a regressor that takes in profiles containing abundances for all 1,673 taxa reliably detected in at least 0.13% of samples, including random forest, support vector machine, elastic net, gradient boosting (XGB) and deep neural network (DNN). However, only the latter two models achieved predictions better than random (Figure 5).

Due to data scarcity, we initially trained our models treating all samples as independent, while some of them belonged to the same host. To further demonstrate the applicability of the suggested method for age prediction, we trained a DNN model reducing the number of samples to only one per host. Not surprisingly, the resulting accuracy of the predictor was significantly lower (MAE = 8.56, Figure 11), yet above random. Such factors as study protocols and host country of residence (integrating geographic location, genotype and lifestyle) can be expected to affect taxonomic profiles.

Despite the great performance of the XGB (MAE = 4.69 years) and DNN models (MAE = 3.94 years), extracting biologically relevant information from them presents a major challenge. We implemented the ALE approach using the DNN regressor as a reference and its 95 most important features to see how changes in abundance affect the predictions. ALE is a technique that theoretically surpasses PFI, as it takes into account the intrinsic interdependence of microbiological features. According to our ALE analysis, only 39/95 features could change the average predicted age by more than 1 year (Figure 9). Interestingly, reducing the number of features by 59% caused only a 5 percentage point drop in F-score for the age bracket classification task. This suggests that the ALE technique succeeded in selecting only the most relevant microbial features.

Table 1 provides information for each bacterium in the 39 ALE-selected marker set of intestinal aging. Interestingly, while it contains both beneficial (e.g. *Bifidobacterium*) and pathogenic (e.g.
*Pseudomonas aeruginosa*) microbes, seno-positive or seno-negative status is not determined by the nature of host-microbe interactions (Figure 12). For example, *Campylobacter jejuni* is known to cause campylobacteriosis, a foodborne diarrheal infection, yet it is seno-negative and can affect the average predicted age by more than 2 years (Figure 9) 57. On the other hand, both selected *Eubacterium* species are seno-positive and increase average predicted age by 1-3 years (Figure 9), despite having a generally beneficial effect on microbiota composition.

[Table 1:](http://biorxiv.org/content/early/2018/12/28/507780/T1) Table 1: 39 ALE-selected biomarkers for microbiological age prediction. Median abundance for a microbe is calculated excluding the samples where it is not detected. Total number of samples: 1,165

![Figure 12:](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507780/F13.medium.gif)

[Figure 12:](http://biorxiv.org/content/early/2018/12/28/507780/F13) Figure 12: Seno-positive/-negative status of a species is not determined by its function within the gut community. While some pathogens are associated with increased age prediction (red quadrant) and some beneficial bacteria are correlated with lower age prediction (green quadrant), many other species are seno-negative pathogens and seno-positive normobiotic species (blue quadrants). Moreover, some taxa are scarcely described in literature, yet they have a pronounced effect on age prediction (grey area). In the text, we suggest hypothetical explanations as to why taxa might be occupying their respective positions in the plane above.

Although surprising at first glance, a bacterium’s influence on age prediction is not determined by whether it is beneficial to the host or not. The proposed method of feature selection does not detect microbes that promote longevity or support useful functions of “youthful” microbiota.
In the case of *C. jejuni*, campylobacteriosis affects mostly children. Moreover, exposure to *C. jejuni* can lead to asymptomatic colonization and immunity acquisition 58. Taken together, these facts suggest a hypothetical explanation of why *C. jejuni* is a seno-negative feature. Older individuals have a lower count of these bacteria, as they are more likely to carry the memory of previous *C. jejuni* exposure (either in their immune system or microbiota composition) and can effectively prevent its extensive colonization. Meanwhile, younger individuals have not yet developed the means to oppose *C. jejuni* and let it multiply to a greater extent.

Beneficial bacteria that turn out to be seno-positive are, in fact, more intriguing than seno-negative pathogens. One possible explanation could be that these bacteria are more resilient in the context of increasingly detrimental cross-talk with the host. Another possible explanation questions the very concept of the microbiological aging clock. Since global dietary and lifestyle habits have changed significantly during the last century (increased sedentary time, sugar intake, processed food consumption, etc.), any microbiota changes observed in the elderly may not indicate natural progression through age, but may instead reflect generation-specific microflora parameters 59. In other words, any features identified today as associated with youth may become the signature of the elderly in 50 years, provided global diet and lifestyle keep changing. Depending on the extent of such a microbiological “generation gap,” any future intestinal aging clock may need to be regularly updated to account for an ever-changing environmental context.

Unfortunately, metagenomics is an extremely young branch of biology and there is little hope of learning what the microflora of the elderly looked like when they were young.
However, studies of multiple generations of migrants can help us estimate the persistence of microbiota obtained in early years. One such study, conducted on immigrants of Asian origin in the United States, indicates that first-generation migrants start to lose gut diversity as soon as nine months after relocation. This loss is even more pronounced in the second generation 60. A first-generation immigrant’s microflora memory may be the reason why their microbiota is different from that of their children. Studies in such a setting are extremely important to assess the hyper-parameters driving human microbiota progression.

Another interesting aspect of the ALE-based feature selection is that some features, such as environmental bacteria, are very poorly described in the context of the human microbiota, yet have great influence over age prediction. More knowledge on such microorganisms (e.g. *Ornithobacterium rhinotracheale*) may provide useful insights into the functions of human microbiota.

## Conclusion

We demonstrated the feasibility of age prediction by applying machine learning approaches to taxonomic microflora profiles. Our most accurate DNN regressor achieved an MAE of 3.94 years. This performance is comparable with previously published aging clocks: an MAE of 1.9 for the PhotoAgeClock, 2.7 for the state-of-the-art methylation aging clock, 7.8 for the transcriptomic aging clock and 5.5 for the hematological aging clock. We also developed a method for microbiological feature selection and annotation. It combines a two-step feature importance assessment, using the PFI and ALE approaches, applied to a trained DNN. This technique allows both selecting the most relevant features as biomarkers and quantifying their influence on the target variable, i.e. age. Using this method, we identified 95 and 39 prokaryotic taxa as the biomarkers of intestinal aging.
Despite the reduced predictive power of this set when compared to the whole taxonomic profiles, it let us assign individuals to three age groups (young, middle-aged and old) 86% more accurately than random classification (0.71 versus 0.34 F-score). The identified biomarkers include species whose abundance is positively or negatively correlated with predicted age. These species merit deeper investigation by the community to improve our understanding of human aging and its relationship with the gut microbiome.

## Conflict of Interest Statement

FG, AA, EP and AZ work for Insilico Medicine, a for-profit longevity biotechnology company developing the end-to-end target identification and drug discovery pipeline for a broad spectrum of age-related diseases. The company applied for a patent on the microbiomic aging clocks, and bacterial species combinations and the molecular products of these species for the treatment of age-related diseases and extending healthy longevity. The microbiomic aging clock is being integrated in the Young.AI system operated by the company. The company may have commercial interests in this publication. VNG provided academic contributions and does not have a conflict of interest.

## Acknowledgments

This publication contains free vector art created by Alice Noir from the Noun Project and by eucalyp from Flaticon. VNG is supported by NIH grants.

## List of abbreviations

* ALE : Accumulated local effect
* BMU : Best matching unit
* DFS : Deep feature selection
* DNN : Deep neural network
* IBD : Inflammatory bowel diseases
* IBS : Irritable bowel syndrome
* LE : Local effect
* MAE : Mean absolute error
* OTU : Operational taxonomic unit
* PFI : Permutation feature importance
* SCFA : Short chain fatty acids
* SOM : Self-organizing maps
* WGS : Whole genome shotgun [sequencing]
* XGB : Gradient boosting (XGBoost Python implementation)

* Received December 28, 2018.
* Revision received December 28, 2018.
* Accepted December 28, 2018.
* © 2018, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Suau, A. et al. Direct analysis of genes encoding 16S rRNA from complex communities reveals many novel molecular species within the human gut. Appl. Environ. Microbiol. 65, 4799–807 (1999).
2. De Palma, G. et al. Transplantation of fecal microbiota from patients with irritable bowel syndrome alters gut function and behavior in recipient mice. Sci. Transl. Med. 9, eaaf6397 (2017).
3. Rowland, I. et al. Gut microbiota functions: metabolism of nutrients and other food components. Eur. J. Nutr. 57, 1–24 (2018).
4. Wu, H.-J. & Wu, E. The role of gut microbiota in immune homeostasis and autoimmunity. Gut Microbes 3, 4–14 (2012).
5. Lozupone, C. A., Stombaugh, J. I., Gordon, J. I., Jansson, J. K. & Knight, R. Diversity, stability and resilience of the human gut microbiota. Nature 489, 220–30 (2012).
6. Azimi, T., Nasiri, M. J., Chirani, A. S., Pouriran, R. & Dabiri, H. The role of bacteria in the inflammatory bowel disease development: a narrative review. APMIS 126, 275–283 (2018).
7. Perez-Muñoz, M. E., Arrieta, M.-C., Ramer-Tait, A. E. & Walter, J. A critical assessment of the “sterile womb” and “in utero colonization” hypotheses: implications for research on the pioneer infant microbiome. Microbiome 5, 48 (2017).
8. Tanaka, M. & Nakayama, J. Development of the gut microbiota in infancy and its impact on health in later life. Allergol. Int. 66, 515–522 (2017).
9. Buford, T. W. (Dis)Trust your gut: the gut microbiome in age-related inflammation, health, and disease. Microbiome 5, 80 (2017).
10. Francino, M. P. Early development of the gut microbiota and immune health. Pathog. (Basel, Switzerland) 3, 769–90 (2014).
11. García-Peña, C., Álvarez-Cisneros, T., Quiroz-Baez, R.
& Friedland, R. P. Microbiota and Aging. A Review and Commentary. Arch. Med. Res. 48, 681–689 (2017).
12. Macfarlane, G. T. & Hopkins, M. J. Changes in predominant bacterial populations in human faeces with age and with Clostridium difficile infection. J. Med. Microbiol. 51, 448–454 (2002).
13. Bian, G. et al. The Gut Microbiota of Healthy Aged Chinese Is Similar to That of the Healthy Young. mSphere 2, e00327–17 (2017).
14. Maffei, V. J. et al. Biological Aging and the Human Gut Microbiota. Journals Gerontol. Ser. A 72, 1474–1482 (2017).
15. O’Toole, P. W. & Jeffery, I. B. Gut microbiota and aging. Science 350, 1214–1215 (2015).
16. Woodmansey, E. J., McMurdo, M. E. T., Macfarlane, G. T. & Macfarlane, S. Comparison of compositions and metabolic activities of fecal microbiotas in young adults and in antibiotic-treated and non-antibiotic-treated elderly subjects. Appl. Environ. Microbiol. 70, 6113–22 (2004).
17. Claesson, M. J. et al. Gut microbiota composition correlates with diet and health in the elderly. Nature 488, 178–184 (2012).
18. Bested, A. C., Logan, A. C. & Selhub, E. M. Intestinal microbiota, probiotics and mental health: from Metchnikoff to modern advances: Part I - autointoxication revisited. Gut Pathog. 5, 5 (2013).
19. Choi, J., Hur, T.-Y. & Hong, Y. Influence of Altered Gut Microbiota Composition on Aging and Aging-Related Diseases. J. lifestyle Med. 8, 1–7 (2018).
20. Kaur, H., Das, C. & Mande, S. S. In Silico Analysis of Putrefaction Pathways in Bacteria and Its Implication in Colorectal Cancer. Front. Microbiol. 8, 2166 (2017).
21. Gladyshev, V. N. Aging: progressive decline in fitness due to the rising deleteriome adjusted by genetic, environmental, and stochastic processes. Aging Cell 15, 594–602 (2016).
22. Field, A. E. et al. DNA Methylation Clocks in Aging: Categories, Causes, and Consequences. Mol. Cell 71, 882–895 (2018).
23. Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany. NY). 10, 573–591 (2018).
24. Xia, X., Chen, W., McDermott, J. & Han, J.-D. J. Molecular and phenotypic biomarkers of aging. F1000Research 6, 860 (2017).
25. Lan, Y., Kriete, A. & Rosen, G. L. Selecting age-related functional characteristics in the human gut microbiome. Microbiome 1, 2 (2013).
26. Odamaki, T. et al. Age-related changes in gut microbiota composition from newborn to centenarian: a cross-sectional study. BMC Microbiol. 16, 90 (2016).
27. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, R115 (2013).
28. Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–67 (2013).
29. Mamoshina, P., Vieira, A., Putin, E. & Zhavoronkov, A. Applications of Deep Learning in Biomedicine. Mol. Pharm. 13, 1445–1454 (2016).
30. Aliper, A. et al.
Deep Learning Applications for Predicting Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data. Mol. Pharm. 13, 2524–2530 (2016).
31. West, M. D. et al. Use of deep neural network ensembles to identify embryonic-fetal transition markers: repression of *COX7A1* in embryonic and cancer cells. Oncotarget 9, 7796–7811 (2018).
32. Putin, E. et al. Reinforced Adversarial Neural Computer for de Novo Molecular Design. J. Chem. Inf. Model. 58, 1194–1204 (2018).
33. Putin, E. et al. Adversarial Threshold Neural Computer for Molecular de Novo Design. Mol. Pharm. 15, 4386–4397 (2018).
34. Kuzminykh, D. et al. 3D Molecular Representations Based on the Wave Transform for Convolutional Neural Networks. Mol. Pharm. 15, 4378–4385 (2018).
35. Polykovskiy, D. et al. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery. Mol. Pharm. 15, 4398–4405 (2018).
36. Kadurin, A. et al. The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget 8, 10883–10890 (2017).
37. Zhavoronkov, A. Artificial Intelligence for Drug Discovery, Biomarker Development, and Generation of Novel Chemistry. Mol. Pharm. 15, 4311–4313 (2018).
38. Vanhaelen, Q. et al. Design of efficient computational workflows for in silico drug repurposing. Drug Discov. Today 22, 210–222 (2017).
39. Moskalev, A. et al. Developing criteria for evaluation of geroprotectors as a key stage toward translation to the clinic. Aging Cell 15, 407–415 (2016).
40. Aliper, A. et al. In search for geroprotectors: in silico screening and in vitro validation of signalome-level mimetics of young healthy state. Aging (Albany. NY). 8, 2127–2152 (2016).
41. Aliper, A. et al. Towards natural mimetics of metformin and rapamycin. Aging (Albany. NY). 9, 2245–2268 (2017).
42. Mamoshina, P. et al. Population Specific Biomarkers of Human Aging: A Big Data Study Using South Korean, Canadian, and Eastern European Patient Populations. Journals Gerontol. Ser. A 73, 1482–1490 (2018).
43. Bobrov, E. et al. PhotoAgeClock: deep learning algorithms for development of non-invasive visual biomarkers of aging. Aging (Albany. NY). 10, 3249–3259 (2018).
44. Pyrkov, T. V. et al. Extracting biological age from biomedical data via deep learning: too much of a good thing? Sci. Rep. 8, 5210 (2018).
45. Mamoshina, P. et al. Machine Learning on Human Muscle Transcriptomic Data for Biomarker Discovery and Tissue-Specific Drug Target Identification. Front. Genet. 9, (2018).
46. Costea, P. I. et al. Subspecies in the global human gut microbiome. Mol. Syst. Biol. 13, 960 (2017).
47. Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).
48. Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).
49. Integrative HMP (iHMP) Research Network Consortium. The Integrative Human Microbiome Project: dynamic analysis of microbiome-host omics profiles during periods of human health and disease. Cell Host Microbe 16, 276–89 (2014).
50. Bushnell, B. BBMap short-read aligner, and other bioinformatics tools. (2016). Available at: [https://sourceforge.net/projects/bbmap/](https://sourceforge.net/projects/bbmap/).
51. Plaza Onate, F. et al. Quality control of microbiota metagenomics by k-mer analysis. BMC Genomics 16, 183 (2015).
52. Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).
53. Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
doi:10.1145/2939672.2939785
54. He, K., Zhang, X., Ren, S. & Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. (2015).
55. Li, Y., Chen, C.-Y. & Wasserman, W. W. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters. J. Comput. Biol. 23, 322–336 (2016).
56. Yoo, J., Kim, Y., Cho, E. R. & Jee, S. H. Biological age as a useful index to predict seventeen-year survival and mortality in Koreans. BMC Geriatr. 17, 7 (2017).
57. Indikova, I., Humphrey, T. J. & Hilbert, F. Survival with a Helping Hand: Campylobacter and Microbiota. Front. Microbiol. 6, 1266 (2015).
58. Havelaar, A. H. et al. Immunity to Campylobacter: its role in risk assessment and epidemiology. Crit. Rev. Microbiol. 35, 1–22 (2009).
59. Popkin, B. M., Adair, L. S. & Ng, S. W. Global nutrition transition and the pandemic of obesity in developing countries. Nutr. Rev. 70, 3–21 (2012).
60. Vangay, P. et al. US Immigration Westernizes the Human Gut Microbiome. Cell 175, 962–972.e10 (2018).
61. Rogosa, M. Acidaminococcus gen. n., Acidaminococcus fermentans sp.
n., anaerobic gram-negative diplococci using amino acids as the sole energy source for growth. J. Bacteriol. 98, 756–66 (1969).
62. Martínez, I., Muller, C. E. & Walter, J. Long-term temporal analysis of the human fecal microbiota revealed a stable core of dominant bacterial species. PLoS One 8, e69621 (2013).
63. Saulnier, D. M. et al. Gastrointestinal microbiome signatures of pediatric patients with irritable bowel syndrome. Gastroenterology 141, 1782–91 (2011).
64. Wu, G. D. et al. Linking long-term dietary patterns with gut microbial enterotypes. Science 334, 105–8 (2011).
65. Wexler, H. M. Bacteroides: the good, the bad, and the nitty-gritty. Clin. Microbiol. Rev. 20, 593–621 (2007).
66. El Enshasy, H. et al. Anaerobic Probiotics: The Key Microbes for Human Health. in 397–431 (Springer, Cham, 2015). doi:10.1007/10\_2015\_5008
67. Kho, Z. Y. & Lal, S. K. The Human Gut Microbiome - A Potential Controller of Wellness and Disease. Front. Microbiol. 9, 1835 (2018).
68. Janssen, R. et al. Host-pathogen interactions in Campylobacter infections: the host perspective. Clin. Microbiol. Rev. 21, 505–18 (2008).
69. Park, G.-S. et al. Complete genome sequence of a keratin-degrading bacterium Chryseobacterium gallinarum strain DSM 27622T isolated from chicken. J. Biotechnol. 211, 66–67 (2015).
70. Argou-Cardozo, I. & Zeidán-Chuliá, F. Clostridium Bacteria and Autism Spectrum Conditions: A Systematic Review and Hypothetical Contribution of Environmental Glyphosate Levels. Med. Sci. (Basel, Switzerland) 6, (2018).
71. Farooq, S., Farooq, R. & Nahvi, N. Comamonas testosteroni: Is It Still a Rare Human Pathogen? Case Rep. Gastroenterol. 11, 42–47 (2017).
72. Rey, F. E. et al. Metabolic niche of a prominent sulfate-reducing human gut bacterium. Proc.
Natl. Acad. Sci. U. S. A. 110, 13582–7 (2013).
73. Jia, W., Faulkner, M., Cadby, I. & Cole, J. Why was Desulfovibrio fairfieldensis not found in faecal DNA from patients with gastric disease? Pathog. Dis. 67, 3–3 (2013).
74. Haiser, H. J. et al. Predicting and manipulating cardiac drug inactivation by the human gut bacterium Eggerthella lenta. Science 341, 295–8 (2013).
75. Engels, C., Ruscheweyh, H.-J., Beerenwinkel, N., Lacroix, C. & Schwab, C. The Common Gut Microbe Eubacterium hallii also Contributes to Intestinal Propionate Formation. Front. Microbiol. 7, 713 (2016).
76. Mahowald, M. A. et al. Characterizing a model human gut microbiota composed of members of its two dominant bacterial phyla. Proc. Natl. Acad. Sci. U. S. A. 106, 5859–64 (2009).
77. Ferreira-Halder, C. V., Faria, A. V. de S.
& Andrade, S. S. Action and function of Faecalibacterium prausnitzii in health and disease. Best Pract. Res. Clin. Gastroenterol. 31, 643–648 (2017).
78. Braune, A. & Blaut, M. Bacterial species involved in the conversion of dietary flavonoids in the human gut. Gut Microbes 7, 216 (2016).
79. Mégraud, F., Bébéar, C., Dabernat, H. & Delmas, C. Haemophilus species in the human gastrointestinal tract. Eur. J. Clin. Microbiol. Infect. Dis. 7, 437–8 (1988).
80. Song, H. S. et al. Complete genome sequence of a commensal bacterium, Hafnia alvei CBA7124, isolated from human feces. Gut Pathog. 9, 41 (2017).
81. Durand, G., Afouda, P., Raoult, D. & Dubourg, G. “Intestinimonas massiliensis” sp. nov, a new bacterium isolated from human gut. New microbes new Infect. 15, 1–2 (2017).
82. Mu, Q., Tavella, V. J. & Luo, X. M. Role of Lactobacillus reuteri in Human Health and Diseases. Front. Microbiol. 9, 757 (2018).
83. Luerce, T. D. et al. Anti-inflammatory effects of Lactococcus lactis NCDO 2118 during the remission period of chemically induced colitis. Gut Pathog. 6, 33 (2014).
84. Morgan, X. C. et al. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 13, R79 (2012).
85. Vandamme, P. et al. Ornithobacterium rhinotracheale gen. nov., sp. nov., Isolated from the Avian Respiratory Tract. Int. J. Syst. Bacteriol. 44, 24–37 (1994).
86. Zehr, E. S., Bayles, D. O., Boatwright, W. D., Tabatabai, L. B. & Register, K. B. Complete genome sequence of Ornithobacterium rhinotracheale strain ORT-UMN 88. Stand. Genomic Sci. 9, 16 (2014).
87. Stewart, C. S., Duncan, S. H. & Cave, D. R. Oxalobacter formigenes and its role in oxalate metabolism in the human gut. FEMS Microbiol. Lett. 230, 1–7 (2004).
88. Segata, N. et al. Composition of the adult digestive tract bacterial microbiome based on seven mouth surfaces, tonsils, throat and stool samples. Genome Biol. 13, R42 (2012).
89. Zitomersky, N. L. et al. Characterization of adherent bacteroidales from intestinal biopsies of children and young adults with inflammatory bowel disease. PLoS One 8, e63686 (2013).
90. Graf, D. et al. Contribution of diet to the composition of the human gut microbiota. Microb. Ecol. Health Dis. 26, 26164 (2015).
91. Rabah, H., Rosa do Carmo, F. L. & Jan, G. Dairy Propionibacteria: Versatile Probiotics. Microorganisms 5, (2017).
92. Ohara, T. & Itoh, K. Significance of Pseudomonas aeruginosa colonization of the gastrointestinal tract. Intern. Med. 42, 1072–6 (2003).
93. Ołdak, E. & Trafny, E. A. Secretion of proteases by Pseudomonas aeruginosa biofilms exposed to ciprofloxacin. Antimicrob. Agents Chemother. 49, 3281–8 (2005).
94. Li, C. et al. Biodegradation of Buprofezin by Rhodococcus sp. Strain YL-1 Isolated from Rice Field Soil. J. Agric. Food Chem. 60, 2531–2537 (2012).
95. Burton, J. P., Wescombe, P. A., Moore, C. J., Chilcott, C. N. & Tagg, J. R. Safety assessment of the oral cavity probiotic Streptococcus salivarius K12. Appl. Environ. Microbiol. 72, 3050–3 (2006).