bioRxiv
New Results

Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing

Tarmo Äijö, Christian L. Müller, Richard Bonneau
doi: https://doi.org/10.1101/076836
Tarmo Äijö
1Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
  • For correspondence: taijo@simonsfoundation.org rb113@nyu.edu
Christian L. Müller
1Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
Richard Bonneau
1Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
2Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
3Courant Institute of Mathematical Sciences, New York University, New York, NY 10003, USA
  • For correspondence: taijo@simonsfoundation.org rb113@nyu.edu

Abstract

The number of microbial and metagenomic studies has increased drastically due to advancements in next-generation sequencing-based measurement techniques. Statistical analysis and the validity of conclusions drawn from (time series) 16S rRNA and other metagenomic sequencing data are hampered by the presence of a significant amount of noise and missing data (sampling zeros). Accounting for uncertainty in microbiome data is often challenging due to the difficulty of obtaining biological replicates. Additionally, the compositional nature of current amplicon and metagenomic data differs from many other biological data types, adding another challenge to the data analysis.

To address these challenges in human microbiome research, we introduce a novel probabilistic approach to explicitly model overdispersion and sampling zeros by considering the temporal correlation between nearby time points using Gaussian Processes. The proposed Temporal Gaussian Process Model for Compositional Data Analysis (TGP-CODA) shows superior modeling performance compared to commonly used Dirichlet-multinomial, multinomial, and non-parametric regression models on real and synthetic data. We demonstrate that the nonreplicative nature of human gut microbiota studies can be partially overcome by our method with proper experimental design of dense temporal sampling. We also show that different modeling approaches have a strong impact on ecological interpretation of the data, such as stationarity, persistence, and environmental noise models.

A Stan implementation of the proposed method is available under MIT license at https://github.com/tare/GPMicrobiome.

1 Introduction

Microbial ecology involves the study of microorganisms’ relationships with each other and with their environment and aims to provide insights into structure and dynamics of ecological networks (Kurtz et al., 2015), ecological stability (Faith et al., 2013), biodiversity (Lozupone et al., 2012), and discovery of key taxa in ecosystems (Ivanov et al., 2009).

16S ribosomal RNA (rRNA) amplicon sequencing (targeted next-generation sequencing of the 16S rRNA gene) has proven to be a cost-effective, culture-free, and highly multiplexed method to identify and compare bacterial compositions present within biological samples across a wide range of habitats, including natural environments (Meron et al., 2012; Hell et al., 2013) and different host organisms (Kuczynski et al., 2012; Yatsunenko et al., 2012). While the majority of amplicon sequencing studies have been cross-sectional in nature or based on a few selected time points, it has been recognized that longitudinal studies with the aim of mapping the trajectories of microbiota over time are a prerequisite for a deeper understanding of ecological mechanisms in the microbiome and for the development of microbiome therapies (Gerber, 2014; Fisher and Mehta, 2014). Sparsely sampled microbial time series have already revealed dynamic reorganization of gut microbial compositions during early development in humans (Yatsunenko et al., 2012) and upon external perturbations through antibiotic treatment (Jernberg et al., 2010), and have identified significant differences in vaginal microbiota during pregnancy (Romero et al., 2014). The richest resources to date for long-term longitudinal amplicon studies are the landmark studies by Caporaso et al. (2011) and David et al. (2014), which provide human-associated microbial compositions on a daily time scale spanning hundreds of days. Caporaso et al. (2011) quantify natural variations of microbial compositions within and among four body sites across time. David et al. (2014) focus on the effects of host lifestyle, including travel, change of diet, and infection, on changes in the human gut microbiome.

While statistical time series analysis has an extensive and successful history in classical genomics (Aach and Church, 2001; Bar-Joseph et al., 2004; Bonneau et al., 2006; Leek et al., 2006; Ahdesmäki et al., 2007; Bar-Joseph et al., 2012; Äijö et al., 2014), few attempts have been made to model amplicon-based temporal data in a principled statistical manner (Gerber et al., 2012; Bucci et al., 2016). This may stem in part from the fact that standard multivariate techniques cannot be applied to amplicon-based sequencing data. Firstly, as compared to other technologies such as flow cytometry (Amann et al., 1990) and conventional plate counting that allow absolute taxa abundance measurements, standard 16S rRNA count data can only reveal relative abundances of taxa, thus rendering individual taxa counts not independent. Secondly, statistical analysis of 16S rRNA sequencing count data is complicated by the presence of overdispersion and missing data. Missing data manifests as an excessive number of zero counts due to imperfect sampling (i.e., zero-inflation and sampling zeros). Separation of sampling zeros (zeros due to imperfect sampling) from structural zeros (true, biologically meaningful zeros) is a common challenge in the analysis of many current biological data types, including single-cell RNA sequencing (Brennecke et al., 2013) and shotgun protein mass spectrometry data (Webb-Robertson et al., 2015). In the context of human-associated microbiome studies, amplicon-based sequencing studies face the additional restriction that well-controlled biological replicates (from different individuals) are not available due to the different genetic backgrounds, environmental exposures, and lifestyles of human subjects.

Different approaches have been proposed to deal with these intrinsic characteristics of (cross-sectional) 16S rRNA sequencing data (see, e.g., Xu et al. (2015) for a recent comparison). Methods based on the negative binomial (NB) distribution (popular in modeling RNA sequencing data) have been proposed for modeling overdispersion in 16S rRNA data, and zero-inflated negative binomial (ZINB) mixture models have been successfully used to fit excessive numbers of zeros. However, the NB and ZINB distributions model taxa as independent, thus ignoring the intrinsic compositional nature of the data. Moreover, the binary distribution component of ZINB only increases the probability of zeros instead of modeling the source of zeros (true vs. non-detected due to sequencing depth) (Mohri and Roark, 2005). The impossibility of obtaining well-controlled biological replicates of human microbiome samples limits the applicability of NB distribution and ZINB in that context because overdispersion of (taxon-specific) counts caused by biological variation cannot be reliably estimated. In light of these limitations, several methodologies have been proposed for simultaneous modeling of taxa through their relative abundances, such as the Dirichlet-multinomial (DM) (Holmes et al., 2012; Chen and Li, 2013) and logistic normal multinomial models (Xia et al., 2013). The logistic normal multinomial model is a generalized linear model (GLM) utilizing the logit link function, thus enabling the use of well-established theory and methods of linear models for modeling count data and relative abundances. Both models are extremely powerful for cross-sectional studies with proper biological replicates. 
Yet, extending these models to time course data analysis has thus far been limited to point-wise analysis, followed by projecting the dynamics using low-dimensional embedding (Caporaso et al., 2011) or calculating different diversity metrics or temporal summary statistics across pairs of time points (Flores et al., 2014; Faust et al., 2015). Recent approaches that utilize the full potential of the data by considering temporal dependencies among the data points include MC-TIMME (Gerber et al., 2012), which uses exponential relaxation processes to model time-varying counts, and BioMiCo (Shafiei et al., 2015), which uses a supervised hierarchical mixed-membership model to track groups of taxa over time. Other methods rely on deterministic regularized model fitting using generalized Lotka-Volterra equations (Stein et al., 2013; Buffie et al., 2015; Bucci et al., 2016).

In this study, we present a fully Bayesian probabilistic model, the Temporal Gaussian Process Model for Compositional Data Analysis (TGP-CODA), that tackles the compositionality, overdispersion, and zero-inflation in 16S rRNA sequencing data through temporal analysis. Our approach is based on the assumption that by sharing information across time points it is possible to improve inference of overdispersion and zero-inflation parameters. We demonstrate that our model can accurately distinguish sampling zeros from structural zeros by using the temporal correlation and the global effect of sampling zeros on the compositions. Our generative hierarchical model combines a multinomial distribution with Gaussian processes (for each taxon to model connections between time points), includes explicit model-based zero-inflation and overdispersion components, and can seamlessly integrate non-uniformly sampled time series (Section 2). We compare our temporal approach to the state-of-the-art DM model on realistic synthetic data and demonstrate more accurate composition estimation. We also model and reanalyze the long-term longitudinal gut microbiota data sets of four individuals (Caporaso et al., 2011; David et al., 2014) using TGP-CODA and maximum likelihood approaches (Section 3). We demonstrate (1) that the dynamical behavior of bacterial orders is globally stable but can accelerate upon environmental perturbations, (2) that our Bayesian model is robust to missing time points, and (3) that estimates of fundamental ecological indicators such as taxa persistence times and taxa stationarity are dependent on the underlying temporal model.

2 Methods

We first describe TGP-CODA, our Bayesian generative model that integrates temporal, overdispersion, and zero-inflation components for analyzing longitudinal 16S rRNA sequencing data (Figure 1).

Figure 1: Statistical model and prior distributions

A graphical representation of our model. Grey and white circles depict observed variables and latent variables, respectively. Grey squares represent user-definable parameters. The Gaussian processes, G, model noise-free real-valued “compositions” (log odds ratios), which are used as a basis for generating noisy real-valued “compositions” (log odds ratios), F. Noisy compositions, Θ, are obtained from F by applying the softmax transformation. Zero-inflation-aware compositions, Θzi, are obtained from Θ and β by Θzi = Φ(Θ; β) (Equation (13)). The likelihood of data is evaluated using the zero-inflation-aware composition parameters, Θzi. Underlying unobservable noise-free compositions, ΘG, are obtained from G by applying the softmax transformation.

2.1 Data likelihood

Let M be the number of taxa, T the number of measurement time points, and $\mathcal{T} = \{t_1, t_2, \ldots, t_T\}$ the set of measurement time points. Let $x_{i,t}$ be the number of observed reads assigned to the ith taxon at time point $t \in \mathcal{T}$ (the corresponding random variable is denoted by $X_{i,t}$), where every read is assigned to exactly one taxon. For notational simplicity, let $x_t = (x_{1,t}, \ldots, x_{M,t})^\top$ and $X_t = (X_{1,t}, \ldots, X_{M,t})^\top$. Additionally, let us denote the total number of reads assigned to taxa at time point t by $N_t = \sum_{i=1}^{M} x_{i,t}$. Next, let us assume: (1) the Nt reads are sampled independently of each other and (2) the M possible outcomes have fixed probabilities, $\Theta_t \in \mathcal{S}^M$ (M-dimensional simplex), at time point t. Then, Xt follows a multinomial distribution with the parameters Θt and Nt

$X_t \mid \Theta_t, N_t \sim \mathrm{Multinomial}(\Theta_t, N_t)$ (1)

The normal approximation to the multinomial (Severini, 2005), while computationally convenient, is not applicable in this case even for large values of Nt because Θt is empirically observed to be located close to a corner of the simplex $\mathcal{S}^M$ (i.e., there are many lowly abundant taxa).

Next, let us define the likelihood in the case of multiple time points. Let us denote the collection of Θt over T time points by:

$\Theta = (\Theta_{t_1}, \Theta_{t_2}, \ldots, \Theta_{t_T})$ (2)

The data likelihood assuming independence of observations at different time points (true for sequential sampling from a population) (Figure 1; see the “Likelihood” section) is

$P(X = x \mid \Theta, N) = \prod_{t \in \mathcal{T}} P(X_t = x_t \mid \Theta_t, N_t)$ (3)

which can be used to evaluate the likelihood of the data, $x = (x_{t_1}, x_{t_2}, \ldots, x_{t_T})$, given the parameters Θ.
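The product over time points in Equation (3) can be illustrated with a minimal sketch. The function names and the toy counts below are hypothetical and are not taken from the authors' Stan implementation; the sketch only evaluates a multinomial log-likelihood summed over independent time points.

```python
import math

def multinomial_log_pmf(counts, probs):
    """Log-probability of one count vector under Multinomial(probs, N), N = sum(counts)."""
    n_total = sum(counts)
    log_p = math.lgamma(n_total + 1)  # log N!
    for x, p in zip(counts, probs):
        log_p -= math.lgamma(x + 1)   # minus log x_i!
        if x > 0:
            if p <= 0.0:
                return float("-inf")  # observed reads for a taxon with zero probability
            log_p += x * math.log(p)
    return log_p

def log_likelihood(count_series, prob_series):
    """Sum of per-time-point log-probabilities (independence across time points)."""
    return sum(multinomial_log_pmf(x, th) for x, th in zip(count_series, prob_series))

# Toy data (hypothetical): three taxa observed at two time points.
counts = [[2, 1, 0], [0, 3, 1]]
thetas = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
total = log_likelihood(counts, thetas)
```

Working on the log scale avoids underflow when Nt is in the thousands, which is typical of sequencing depth.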

2.2 Temporal modeling of microbiome compositions

Modeling in compositional space is notoriously challenging (modeling fractions of population or fractions of reads, for example) (Aitchison, 1982): (1) the compositional space enforces restrictions on the modeling domain, which might not be easily expressible in the selected modeling framework (due to the intrinsic dependency among all taxa) and (2) the differences in relative abundances of taxa can vary over multiple orders of magnitude, which, combined with compositional effects, renders the direct modeling of relative abundances a hard task. To overcome these challenges, modeling log odds ratios between taxa in real space has been proposed, typically followed by a transformation to map the real values to a simplex (Aitchison, 1982; Holmes et al., 2012). In this study, we will use the commonly used softmax transformation (e.g., in multinomial logistic regression), which is a generalization of the logistic function (Bishop, 2006). The softmax transformation from $\mathbb{R}^{M-1}$ to $\mathcal{S}^M$ is defined as follows

$\mathrm{Softmax}(G_t) = \left( \frac{\exp(G_{1,t})}{\sum_{j=1}^{M} \exp(G_{j,t})}, \ldots, \frac{\exp(G_{M,t})}{\sum_{j=1}^{M} \exp(G_{j,t})} \right)^\top$ (4)

where $G_{M,t} = 0$ (Bishop, 2006). The explicit assumption $G_{M,t} = 0$ in Equation (4) makes the softmax transformation bijective. The softmax transformation is required because the multinomial likelihood parameters, Θt, are constrained to lie in the M-dimensional simplex. Next, let us denote the collection of Gt over T time points by

$G = (G_{t_1}, G_{t_2}, \ldots, G_{t_T})$ (5)

with the element-wise softmax transformation (see also Equation (2))

$\mathrm{Softmax}(G) = (\mathrm{Softmax}(G_{t_1}), \mathrm{Softmax}(G_{t_2}), \ldots, \mathrm{Softmax}(G_{t_T}))$ (6)
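To make the role of the fixed reference component concrete, here is a minimal sketch of a softmax that maps M − 1 real-valued log odds to an M-part composition by pinning the last component to zero, together with its inverse. The function names are hypothetical; the maximum is subtracted before exponentiation purely for numerical stability, which does not change the result.

```python
import math

def softmax_with_reference(g):
    """Map M-1 log odds to a composition of M parts; the last taxon is the reference (value 0)."""
    full = list(g) + [0.0]
    shift = max(full)                      # stabilizes exp() for large values
    weights = [math.exp(v - shift) for v in full]
    total = sum(weights)
    return [w / total for w in weights]

def inverse_softmax(theta):
    """Recover the M-1 log odds relative to the last component."""
    return [math.log(p / theta[-1]) for p in theta[:-1]]

theta = softmax_with_reference([1.0, -2.0])
recovered = inverse_softmax(theta)
```

Because the reference component is fixed, each composition with strictly positive parts corresponds to exactly one vector of log odds, which is the bijectivity the text refers to.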

Next, we will describe the temporal component of our generative model. It is unknown a priori how relative abundances of bacterial taxa vary over time and how treatments and abrupt changes in the environment might alter ecological dynamics. Therefore, we do not want to restrict the model and the resulting dynamics by strong assumptions on functional forms of temporal relative abundances. Thus, we will take a non-parametric approach and use a Gaussian process kernel to model temporal dynamics, requiring only weak assumptions (such as smoothness) on the temporal characteristics of the signal (Rasmussen and Williams, 2005).

We assume that G(i), i = 1,2,…,M − 1 (ith row of G) are smooth, and the time series data is well sampled (i.e., well-designed experiments to match the modeling objective). We will model G(i), i = 1,2,…,M − 1, using a Gaussian process (Rasmussen and Williams, 2005) Embedded Image where Embedded Image, i = 1,2,…,M − 1.

Embedded Image

The term Embedded Image is the covariance function given the hyperparameters Embedded Image. In this work, we use the squared exponential covariance function Embedded Image where Embedded Image with ηi denoting the signal variance parameter and ρi the characteristic length scale.
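The exact parametrization of the covariance function in the equation above is not recoverable from this copy of the text. As an illustration only, the sketch below assumes the widely used form k(t, t') = η² exp(−(t − t')² / (2ρ²)), with η² as the signal variance and ρ as the length scale; the paper's own equation may scale ρ differently. The function name and the example values are hypothetical.

```python
import math

def squared_exponential(times, eta_sq, rho):
    """Covariance matrix under the assumed form eta_sq * exp(-(t - t')^2 / (2 * rho^2))."""
    n = len(times)
    return [
        [eta_sq * math.exp(-((times[a] - times[b]) ** 2) / (2.0 * rho ** 2)) for b in range(n)]
        for a in range(n)
    ]

# Non-uniformly spaced time points (scaled to [0, 1]) are handled without any change.
K = squared_exponential([0.0, 0.1, 0.5, 1.0], eta_sq=2.0, rho=0.2)
```

Since the kernel depends only on the distance between time points, irregular sampling requires no special treatment, which is consistent with the claim that the model integrates non-uniformly sampled time series.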

2.3 Modeling overdispersion of counts

When the values Nt are large and no replicates are available, the data likelihood (Equation (3)) will dominate the Gaussian process prior (Equation (7)), leading to overfitting of Θt. Consequently, inherent biological and technical variations are severely underestimated. Notably, the DM and logistic normal multinomial models suffer from the same problem (this is apparent from the forms of maximum likelihood and Bayes estimators in SEquations (1) and (4), respectively). Thus, it is advantageous to explicitly model sampling variation in Embedded Image by introducing an additional level of random variables to the hierarchical model Embedded Image where Embedded Image, i = 1,2,…,M − 1 are row vectors that depend on G(i) and Embedded Image as follows: Embedded Image where Embedded Image is assumed to be constant over time (i.e., sampling variation is similar over the time series) in order to improve identifiability. In this extended model, Θ is obtained by applying the softmax transformation on F (see also Equation (2)) Embedded Image where Embedded Image, i = 1, 2,…, T.

In summary, the random variable Θ = Softmax(F) (see Equations (11) and (12)) is sample-specific (after sampling), whereas the random variable ΘG = Softmax(G) (see Equations (7) and (6)) models biological variation over samples (before sampling). The overdispersion component of the model is illustrated in Figure 1 (see the “Sampling and biological variation” and “Observed compositions” sections).
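The two levels described above can be sketched generatively. The sketch assumes that each noisy row F(i) equals G(i) plus independent zero-mean Gaussian noise with a taxon-specific standard deviation that is constant over time; this additive Gaussian form is an assumption made here for illustration, since the corresponding equations are not shown in this copy. All names are hypothetical.

```python
import math
import random

def softmax_with_reference(g):
    """Softmax over M-1 log odds with the last (reference) component fixed to 0."""
    full = list(g) + [0.0]
    shift = max(full)
    weights = [math.exp(v - shift) for v in full]
    total = sum(weights)
    return [w / total for w in weights]

def noisy_compositions(G, sigma, rng):
    """G: M-1 rows of noise-free log odds over T time points; sigma: M-1 noise SDs.

    Returns (theta, theta_g): sample-specific and noise-free compositions per time point.
    """
    n_times = len(G[0])
    # Assumed noise model: F = G + Gaussian noise, same SD at every time point.
    F = [[g + rng.gauss(0.0, s) for g in row] for row, s in zip(G, sigma)]
    theta = [softmax_with_reference([row[t] for row in F]) for t in range(n_times)]
    theta_g = [softmax_with_reference([row[t] for row in G]) for t in range(n_times)]
    return theta, theta_g

G = [[0.5, 0.6, 0.7], [-1.0, -1.1, -1.2]]  # two modeled taxa plus one reference taxon
theta, theta_g = noisy_compositions(G, sigma=[0.1, 0.3], rng=random.Random(0))
```

Keeping Θ and ΘG separate is what lets the counts be explained by a noisy sample-specific composition while the smooth trajectory remains the quantity of interest.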

2.4 Modeling zero-inflation and missing data

16S rRNA and other amplicon sequencing based count data have been empirically shown to suffer from severe zero-inflation (Xu et al., 2015). Zero-inflation can be seen as “salt” noise in the compositions Θt (i.e., zeroing of individual components of Θt); the “salt” term refers to the “salt-and-pepper” noise concept from the digital image processing literature (Jayaraman, 2009). To model zero-inflation, we introduce another level of simplex-valued latent variables, Embedded Image to the model (Figure 1). The variables Θt and Embedded Image model underlying proportions and “salty” proportions of taxa, respectively. The sampling and zero-inflation are modeled separately for modeling convenience and for identifying the source of zeros (sampling or structural).

To explicitly model the effect of imperfect sampling, we introduce random variables Embedded Image,i = 1,2,…,M and consider the following weighting based transformation: Embedded Image where the common denominator term ensures Embedded Image. For notational simplicity, let us denote β=Embedded Image. The zero-inflation component of the model is illustrated in Figure 1 (see the “Sampling zeros” and “Observed compositions” sections).
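The weighting-based transformation can be sketched as follows. The exact form of Equation (13) is not shown in this copy, so the sketch assumes that each proportion is multiplied by a weight β in [0, 1] and the result is renormalized, with β close to 0 marking a sampling zero (this reading matches the low β values reported for imposed zeros in Section 3.2). The function name is hypothetical.

```python
def zero_inflate(theta, beta):
    """Assumed form: theta_zi[i] = beta[i] * theta[i] / sum_j beta[j] * theta[j]."""
    weighted = [b * p for b, p in zip(beta, theta)]
    total = sum(weighted)  # common denominator keeps the result on the simplex
    if total <= 0.0:
        raise ValueError("at least one taxon must keep a positive weight")
    return [w / total for w in weighted]

theta = [0.5, 0.3, 0.2]
unchanged = zero_inflate(theta, [1.0, 1.0, 1.0])
dropped = zero_inflate(theta, [1.0, 0.0, 1.0])  # second taxon missed by sampling
```

Zeroing one component raises the remaining proportions through the shared denominator, which is the global effect of sampling zeros on the compositions that the model exploits.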

2.5 Posterior estimation

To carry out the Bayesian inference on the presented model (Figure 1), we first specify the parameter prior distributions, Embedded Image and Embedded Image (SFigure 1a). The parameters Embedded Image and Embedded Image determine the signal variance and how fast the correlation between time points diminishes, respectively. We select a relatively broad prior distribution for Embedded Image in order to support temporal correlations that vary from a few days to a few weeks (SFigure 1a). In this study, the time points ti (model inputs) are obtained by scaling the days of measurement (e.g., integers from 1 to D) by the total number of days (D); thus, the prior of Embedded Image is selected as Embedded Image (a positive truncated Gaussian distribution) (SFigure 1a). Since Gaussian processes model the log odds ratios, we assume that the variances of the log odds ratios of taxa over time are relatively small. We set the prior as Embedded Image ~ Gamma(1.0, 0.5) (SFigure 1b). The prior of the noise standard deviation is set to Embedded Image to support relatively low noise levels (SFigure 1b). Finally, we explicitly assume that the sampling zeros are relatively rare by defining the prior as Embedded Image ~ Beta(0.8, 0.4) (a broad distribution improves sampling efficiency) (SFigure 1b).

The posterior distribution function (up to a normalizing constant) is obtained as the product of the likelihood function and priors. The full posterior distribution function of our model is given in SEquation (5). We implemented the model in Stan (Carpenter et al., in press) and used its No-U-Turn Sampler (NUTS) to sample the posterior (SEquation (5)). The Stan probabilistic programming language enables cross-platform implementation, code interpretability, numerical stability, scaling, and efficient posterior inferences of various statistical models. Convergence of chains was monitored using the Gelman-Rubin statistic (Gelman and Rubin, 1992) Embedded Image. All relevant information (prior and data) about the parameters is summarized in the posterior distributions. We can thus use the obtained posterior samples to summarize the distributions, e.g., by calculating means and credible intervals (Gelman et al., 2014).
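For readers unfamiliar with the convergence diagnostic, the following is a minimal sketch of the classic Gelman-Rubin potential scale reduction factor for a single scalar parameter. It is not the code used by the authors; Stan reports its own variant of this diagnostic, and the function name and toy chains here are hypothetical.

```python
import math
from statistics import mean, variance

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equally long chains of one parameter."""
    n = len(chains[0])                                  # draws per chain
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])        # W: average within-chain variance
    between = n * variance(chain_means)                 # B: between-chain variance
    pooled = (n - 1) / n * within + between / n         # pooled posterior variance estimate
    return math.sqrt(pooled / within)

mixed = gelman_rubin([[0.1, 0.4, 0.2, 0.3], [0.3, 0.1, 0.4, 0.2]])
stuck = gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Values near 1 indicate that the chains agree with each other, whereas values well above 1 (as for `stuck`) indicate that the chains are exploring different regions.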

3 Results

3.1 Temporal analysis improves estimation accuracy

To validate the presented temporal compositional data analysis method, we first compare TGP-CODA to the DM model (Chen and Li, 2013) using synthetic data. To compare these two methods, we consider a scenario of 36 taxa with realistic dynamics and abundance distribution (see Supplementary Material). The generated synthetic data sets are analyzed using the temporal and DM models. The composition estimates at day 90 (common between six, nine, 14, and 27 time points to allow direct comparison) of both methods are compared to the noise-free ground truths (Figure 2a). Even in this simple scenario, the temporal approach consistently produces more accurate composition estimates than the DM model (Figure 2; STable 1). We find that the performance of the temporal approach improves (as expected) as the number of time points increases; e.g., the mean estimation errors and the corresponding standard deviations are 0.15±0.09 and 0.10±0.06 with six and 14 time points, respectively (STable 1). The estimation error of the DM model does not depend on the number of time points as it considers time points separately (STable 1). Our modeling of temporal correlations and thereby sharing information between time points leads to more accurate estimation of compositions from longitudinal count data.

Figure 2: Temporal correlation in composition estimation

(a) Box plots illustrate estimation errors of our temporal TGP-CODA (orange) and DM models (green). Six, nine, 14, and 27 time points are considered. Estimation error is defined to be the Euclidean distance between the first M − 1 components of the simplex-valued proportion vectors. Each box plot is calculated from 100 simulations. Outliers are not shown. The two-sided p-values from the Wilcoxon signed-rank tests are listed. (b) The results from the sensitivity analysis of the results depicted in (a) with respect to the prior distributions of η² and ρ² are illustrated. (c) Box plots illustrate the variation of β values of taxa (proportions ≥ 1e-4) and time points with added sampling zeros. The cases of 14 time points with either 10, 20, 40, or 120 added sampling zeros are considered. Each box plot shows the average from 100 simulations. Outliers are not depicted. (d) Box plots illustrate the estimation error of the temporal (orange) and DM (green) models at the time points with induced sampling zeros. The cases of 10, 20, 40, and 120 sampling zeros are considered. Estimation error is defined to be the Euclidean distance between the first M − 1 components of the simplex-valued proportion vectors. Each box plot is calculated from 100 simulations. Outliers are not depicted. The two-sided p-values from the Wilcoxon signed-rank tests are listed. (e) The sensitivity of the estimation of β with respect to the prior of β is studied in the scenario of (c). The estimated β values of taxa (proportions ≥ 1e-4) and time points with induced sampling zeros are compared by calculating the difference between the estimates obtained under the original prior (β ~ Beta(0.8, 0.4)) and perturbed prior (β ~ Beta(θβ,1,θβ,2)). The cases of 14 time points with either 10, 20, 40, or 120 sampling zeros are considered. Each box plot is calculated from 100 simulations. Outliers are not depicted.
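The estimation error used in panels (a) and (d) can be written down directly from its definition in the caption. The sketch below is illustrative; the function name and the example vectors are hypothetical.

```python
import math

def estimation_error(estimate, truth):
    """Euclidean distance between the first M-1 components of two compositions."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate[:-1], truth[:-1])))

error = estimation_error([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
```

The last component is omitted because it is determined by the other M − 1 components through the sum-to-one constraint.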

Because our estimates should not be critically sensitive to the hyperparameters (θη, θρ, θβ), we carried out a sensitivity analysis with respect to the prior distributions of η² and ρ² defined in Section 2.5. We considered random variables Embedded Image and Embedded Image (SFigure 4a) whose purpose is to perturb the prior distributions Embedded Image and Embedded Image. We then repeated the analysis presented in Figure 2a and compared the composition estimates between the original and perturbed priors (Figure 2b). The means and the corresponding standard deviations of the estimate differences were 0.05±0.05, 0.03±0.03, 0.03±0.08, and 0.02±0.01 with six, nine, 14, and 27 time points, respectively (Figure 2b). As expected, the variations in the final estimates get smaller as the amount of data on which the estimation is based increases. Collectively, the small differences obtained demonstrate that the estimates are not critically sensitive to the prior distributions of η² and ρ².

3.2 Modeling sampling zeros improves estimation accuracy

To validate the described zero-inflation component and to see whether the estimated β values reflect sampling zeros, we consider the same example as above but with imposed sampling zeros. We generated data sets with different numbers (10, 20, 40, or 120) of imposed zeros randomly distributed to the taxa and time points. Importantly, there are likely additional zeros for lowly abundant taxa due to the low sampling depth. This unbiased procedure also introduces sampling zeros to lowly abundant taxa. Clearly, these zeros are harder to detect with the sampling depth used (or with any relatively low sampling depth). We analyzed these zero-inflated synthetic data sets using our temporal approach and studied the distribution of β values of taxa (proportions ≥ 1e-4) at the time points with imposed zeros (Figure 2c). Our model is able to identify 10 (mean±SD=0.04±0.07), 20 (0.05±0.09), and 40 (0.07±0.11) sampling zeros accurately among taxa that are not close to the detection limit, whereas the identification of 120 sampling zeros is less reliable (0.21±0.19) (Figure 2c). As expected, detecting sampling zeros among lowly abundant taxa is challenging (SFigure 4b).

To check whether the detection and correction of sampling zeros improves composition estimation, we next considered the same scenario as shown in Figure 2c with the focus on the composition estimates instead of the β values. We compared the composition estimates of the temporal and DM models at the time points with sampling zeros to the noise-free ground truths (Figure 2d, STable 2). The temporal approach produces smaller estimation error than the DM model in all the considered cases. For instance, the estimation error is almost two times smaller with the temporal approach (mean±SD=0.12±0.08) compared to the DM model (0.21±0.16) in the case of 20 sampling zeros (Figure 2d, STable 2). The weaker performance of the DM model is expected since it does not explicitly model sampling zeros. Additionally, we repeated this analysis with greater numbers of taxa (71, 102, and 160) and sampling zeros (120, 240, and 480) and 27 time points to validate our model’s performance in a larger setting (SFigure 4d).

To confirm that the estimation of sampling zeros is not critically sensitive to the prior distribution of β, we considered a perturbed prior, β ~ Beta(θβ,1, θβ,2) where θβ,1 ~ Beta(16, 4) and θβ,2 ~ Beta(8, 12) (SFigure 4c). Then, we compared the β estimates obtained with the original and perturbed prior in the case of Figure 2c (Figure 2e). The β estimates were stable with respect to the prior distribution; the means and the corresponding standard deviations of the differences were -0.006±0.044, 0.006±0.056, -0.005±0.066, and -0.011±0.134 with 10, 20, 40, and 120 sampling zeros, respectively.

3.3 Differential response of bacterial orders to environmental perturbations

To demonstrate our approach on real data, we reanalyzed the longitudinal gut microbial 16S rRNA sequencing data sets of four individuals, referred to as M3 and F4 (Caporaso et al., 2011) and Subject A and B (Caporaso et al., 2011; David et al., 2014) (see Supplementary Material). The percentages of zeros varied between 66% and 78% in these data sets (see Supplementary Material). Due to the sparsity of the data we grouped all Operational Taxonomic Units (OTUs) according to phylogenetic order and analyzed the resulting compositions. We visualize the dynamics of the orders in SFigures 5–8 by plotting the posterior mean composition estimates of bacterial orders with corresponding credible intervals at time points with and without measurements. For comparison, we included the maximum likelihood estimates (MLEs) under the multinomial model with and without the locally weighted scatterplot smoothing (LOWESS) (Cleveland, 1981).
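The preprocessing and the baseline estimator mentioned above can be sketched in a few lines: OTU counts are summed within each order, and the multinomial maximum likelihood estimate of a composition is simply each count divided by the total. The OTU identifiers and counts below are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

def aggregate_by_order(otu_counts, otu_to_order):
    """Sum read counts of all OTUs that belong to the same taxonomic order."""
    totals = defaultdict(int)
    for otu, count in otu_counts.items():
        totals[otu_to_order[otu]] += count
    return dict(totals)

def multinomial_mle(counts):
    """Maximum likelihood proportions under the multinomial model: x_i / N."""
    n_total = sum(counts.values())
    return {name: c / n_total for name, c in counts.items()}

# Hypothetical sample at one time point.
otu_counts = {"otu_1": 30, "otu_2": 10, "otu_3": 60}
otu_to_order = {"otu_1": "Enterobacteriales", "otu_2": "Enterobacteriales", "otu_3": "Bacteroidales"}
order_counts = aggregate_by_order(otu_counts, otu_to_order)
mle = multinomial_mle(order_counts)
```

Note that the MLE assigns a proportion of exactly zero to any order with no reads at a time point, which is one reason it cannot separate sampling zeros from structural zeros.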

We first focus on the Subject B time series. From days 151 to 159 the subject had a Salmonella infection; as expected, relative abundance of Enterobacteriales increases upon the infection as reported in (David et al., 2014) (SFigure 6a). Similarly, relative abundance of Enterobacteriales in Subject A’s gut microbiota is greater during the travel abroad (SFigure 5a). The relative abundance of Bifidobacteriales decreases during the time Subject A spent abroad (from 7e-2 to 2e-2) (SFigure 5a). The disappearance of the RF39 order from the gut microbiota of Subject B coincides with the Salmonella infection (average relative abundances pre-infection and post-infection are 5e-3 and 8e-7, respectively) (SFigure 6a). The decrease in the relative abundance of Enterobacteriales in F4’s gut microbiota around 50 days coincides with the increase of the relative abundances of Burkholderiales (SFigure 8a). Interestingly, our results suggest that F4’s gut microbiota undergo a global transition between states around 50 days (SFigure 9). Identification of the importance and/or the cause of this would require additional metadata. Finally, TGP-CODA quantifies the uncertainty in estimates caused by lower sequencing depth and missing samples (e.g., see lowly abundant orders Gallionellales in SFigure 5, Acidimicrobiales in SFigure 6, and Gammaproteobacteria in SFigure 8).

To confirm that the results are not too sensitive to the selected covariance function, we reanalyzed the Subject A data using the Matérn covariance function (ν = 3/2). The similar results obtained suggest that our method is stable with respect to the chosen covariance function (SFigures 9,10); the slightly less smooth processes are expected as the Matérn covariance function (with ν = 3/2) leads to processes that are once mean square (MS) differentiable, whereas the squared exponential covariance function leads to processes that are infinitely MS-differentiable. Additionally, to verify that our method does not produce analysis artifacts due to the temporal modeling, we shuffled the time points in the Subject A data set and analyzed the shuffled data (SFigure 11). As expected, we did lose the signals observed with the original data (SFigure 9). Importantly, the LOWESS estimator does seem to overfit the shuffled data (SFigure 11).
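For reference, the Matérn covariance with smoothness 3/2 has the standard closed form k(r) = η²(1 + √3 r/ρ) exp(−√3 r/ρ), where r = |t − t'|. The sketch below evaluates it; treating η² as the signal variance and ρ as the length scale is an assumption about the parametrization, and the function name is hypothetical.

```python
import math

def matern32(t1, t2, eta_sq, rho):
    """Matern covariance with smoothness 3/2 between two time points."""
    scaled = math.sqrt(3.0) * abs(t1 - t2) / rho
    return eta_sq * (1.0 + scaled) * math.exp(-scaled)

values = [matern32(0.0, d, eta_sq=1.0, rho=0.2) for d in (0.0, 0.1, 0.2, 0.4)]
```

The exponential tail in the distance, rather than in the squared distance, is what makes sample paths under this kernel rougher than under the squared exponential kernel.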

Collectively, our temporal approach is able to recover patterns from highly noisy 16S rRNA data that are not apparent from the MLEs, even when these perturbations effect extreme restructuring of the dynamics and composition of the niche.

3.4 Effect of sampling frequency on estimating microbiome dynamics

To study how the sampling frequency affects the results, we performed downsampling experiments. Specifically, we reanalyzed the Subject A data using only measurements from either every second or every third time point (SFigures 12,13). Overall, the results obtained with the full and downsampled data sets are highly similar, suggesting that daily sampling is not necessary to capture human gut microbiota dynamics (SFigures 4,9,12,13). In Figure 3, we illustrate four examples of how different sampling frequencies can affect results. As expected, when the sampling frequency drops, credible intervals become wider (see Bacteroidales in Figure 3). Importantly, the LOWESS estimates are sensitive to the sampling frequency, which suggests that the LOWESS estimator tends to overfit data (see Enterobacteriales, Sphingomonadales, and Myxococcales in Figure 3). The observed overfitting, especially among low-abundance orders, is not surprising since LOWESS and ML estimation do not take into account the statistical nature of count data. Additionally, in contrast to our method, it is not straightforward to interpolate data with LOWESS.
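The downsampling scheme and the per-time-point maximum likelihood estimates can be sketched as follows. This is a minimal illustration on made-up counts for three hypothetical taxa; the helper names `multinomial_mle` and `downsample` are assumptions introduced here and do not come from the authors' code.

```python
def multinomial_mle(counts):
    """ML estimate of proportions under a multinomial model: counts / depth."""
    total = sum(counts)
    if total == 0:
        return None  # no reads at this time point, proportions are undefined
    return [c / total for c in counts]

def downsample(days, samples, step):
    """Keep every step-th time point (step=2: every second, step=3: every third)."""
    return days[::step], samples[::step]

# Toy read counts (rows: time points, columns: three hypothetical taxa).
days = [0, 1, 2, 3, 4, 5, 6]
counts = [
    [90, 10, 0],
    [85, 14, 1],
    [88, 12, 0],
    [70, 28, 2],
    [75, 25, 0],
    [91, 9, 0],
    [89, 10, 1],
]

for step in (1, 2, 3):
    kept_days, kept_counts = downsample(days, counts, step)
    estimates = [multinomial_mle(c) for c in kept_counts]
    # Print the ML estimate of the rarest taxon at each retained time point.
    print(step, kept_days, [round(e[2], 3) for e in estimates])
```

Note that the ML estimate of a taxon with zero reads is exactly zero, regardless of sequencing depth, so a sampling zero is indistinguishable from true absence; this is one way in which ML estimation ignores the statistical nature of count data.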

Figure 3: Effect of sampling frequency on the estimation of bacterial order dynamics

(a) Dynamics of the proportions of Enterobacteriales (first row), Bacteroidales (second row), Sphingomonadales (third row), and Myxococcales (fourth row) in Subject A’s gut microbiota over time. The black circles are the posterior mean estimates, ΘG, from the temporal analysis. The filled regions show the 5% and 95% credible intervals. The semi-transparent circles depict the maximum likelihood estimates under the multinomial model. The orange curve is the LOWESS estimate (α = 0.05, which corresponds approximately to 20 days) calculated from the maximum likelihood estimates. The time periods during which the subject was abroad and suffered from diarrhea are indicated by the three shaded rectangles. (b) As in (a), but using only every second time point. (c) As in (a), but using only every third time point.

3.5 Revisiting dynamics of human gut microbiota

We next analyze the dynamical properties of the inferred time series and their ecological implications. Our Bayesian framework, together with the use of separate analysis windows, enables us to study the posterior distributions of length scales ρi inferred from the different time series. These distributions can serve as global summary statistics of the whole gut microbiota dynamics upon environmental perturbations. We illustrate the results for the Subject A time series over all the bacterial orders in Figure 4a. We first compare the profiles of prior and posterior distributions. We observe that the experimental data supports longer length-scales (i.e., greater temporal correlation) (Figure 4a; see SFigure 1b for interpretation) suggesting that the smoothness of the obtained profiles is not merely an analysis artifact caused by the length-scale prior (SFigure 1a).

Figure 4: Kinetics of Subject A’s gut microbiota

(a) Black and red shaded regions are prior and posterior distributions of the length-scale parameter, respectively. The posterior distributions obtained in different analysis windows are illustrated separately (the days corresponding to each of the windows are listed in the titles). Posterior densities are estimated using Gaussian kernel density estimation (with Scott’s rule for estimating the bandwidth) on the pooled length-scale posterior samples over all the bacterial orders. (b) The posterior mean of the length-scale parameter and the corresponding standard deviations for Bifidobacteriales in different analysis windows (the window numbers correspond to the ones listed in (a)). (c) Dynamics of Bifidobacteriales in Subject A’s gut microbiota over time. The black circles are the posterior mean estimates, ΘG, from the temporal analysis. The filled regions show the 5% and 95% credible intervals. The semi-transparent circles depict the maximum likelihood estimates under the multinomial model. The time periods during which the subject was abroad and suffered from diarrhea are indicated by the three shaded rectangles.

Across all windows, the posterior distributions have an overall similar right-skewed shape and cover a wide range of length scales. This suggests that, on the population level, each bacterial order has different degrees of internal temporal correlations that are persistent across the entire time series (Figure 4). We can also identify several bacterial orders that change their kinetics upon perturbations, as reflected in a potential bi-modality of the distribution between 44 and 149 days (windows 3 and 4 in Figure 4a). To highlight the effect of environmental perturbations, we visualize the length-scale distributions of Bifidobacteriales (Figure 4b,c). The dip in average length scale between windows 2 and 6 suggests that Bifidobacteriales’ kinetics are accelerated upon traveling abroad and being exposed to a novel diet.
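The kernel density estimates of the pooled length-scale samples shown in Figure 4a can be sketched as follows. This is a minimal one-dimensional illustration that assumes the common form of Scott's rule (sample standard deviation times n^(-1/5)); the sample values are made up and the function names are hypothetical, so it should not be read as the exact procedure used to produce the figure.

```python
import math
import statistics

def scott_bandwidth(samples):
    """Scott's rule for one-dimensional data: stdev * n**(-1/5)."""
    n = len(samples)
    return statistics.stdev(samples) * n ** (-1.0 / 5.0)

def gaussian_kde(samples, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    h = scott_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / h) ** 2) for s in samples)
        for g in grid
    ]

# Made-up right-skewed "length-scale" samples (in days), for illustration only.
samples = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 13.0]
grid = [i * 0.5 for i in range(0, 41)]  # 0 to 20 days
density = gaussian_kde(samples, grid)
peak = max(range(len(grid)), key=lambda i: density[i])
print("bandwidth:", round(scott_bandwidth(samples), 3), "mode near:", grid[peak])
```

A bi-modal posterior, such as the one suggested for windows 3 and 4, would appear as two local maxima in the returned density values.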

We next analyze estimates of autocorrelation, persistence, and self-affinity (self-similarity) for the most abundant bacterial orders (mean relative abundance >1e-3) across all four time series under TGP-CODA and ML modeling. We first calculate the sample autocorrelation function (ACF) for lags up to k = 60 (SFigure 14a). The TGP-CODA-derived time series show consistently longer autocorrelations (close to 1 in most cases) than the ML-based time series. For most bacterial orders, positive autocorrelation exists for up to a month under TGP-CODA. Coriobacteriales shows particularly strong long-term positive autocorrelation for both Subject A and B. To estimate the degree of self-affinity and the temporal persistence of the bacterial orders we use Hurst’s rescaled range analysis (Hurst, 1951; Di Matteo et al., 2003), resulting in scaling estimates of the Hurst exponent H ∈ [0, 1] (SFigure 14b). For ML-based time series we consistently estimate low H values across all time series (mean H ∈ [0.15, 0.25]), indicative of memory-less underlying processes, whereas TGP-CODA modeling results in considerably larger Hurst exponent estimates (mean H ∈ [0.8, 0.85]), hinting at underlying persistent, self-affine, long-term memory processes. Spectral analysis of the TGP-CODA-modeled time series reveals a scaling of the power spectrum S(f) ~ 1/f^β with β ∈ [1.7, 4.2] for the majority of orders (SFigure 15). These results indicate that most time series modeled with TGP-CODA show non-stationary fractional Brownian motion behavior with long-term memory, persistence, and self-affinity.
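The two summary statistics used above can be sketched in a few lines. The code below is a minimal illustration of the sample ACF and of a classical rescaled range (R/S) estimate of the Hurst exponent on synthetic series; the function names, the doubling window sizes, and the synthetic data are assumptions made for this example, and the estimator is not necessarily identical to the one used for SFigure 14.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def rescaled_range(x):
    """Range of cumulative mean-adjusted sums divided by the standard deviation."""
    mean = sum(x) / len(x)
    cum, lo, hi, ss = 0.0, 0.0, 0.0, 0.0
    for v in x:
        cum += v - mean
        lo, hi = min(lo, cum), max(hi, cum)
        ss += (v - mean) ** 2
    s = math.sqrt(ss / len(x))
    return (hi - lo) / s if s > 0 else None

def hurst_rs(x, min_window=8):
    """Slope of log(mean R/S) against log(window size) over doubling windows."""
    points, n = [], min_window
    while n <= len(x) // 2:
        rs = [rescaled_range(x[i:i + n]) for i in range(0, len(x) - n + 1, n)]
        rs = [r for r in rs if r is not None]
        if rs:
            points.append((math.log(n), math.log(sum(rs) / len(rs))))
        n *= 2
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    return (sum((px - mx) * (py - my) for px, py in points)
            / sum((px - mx) ** 2 for px, _ in points))

# Synthetic examples: white noise (memory-less) versus its running sum (persistent).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk, level = [], 0.0
for e in noise:
    level += e
    walk.append(level)

print("ACF lag 1:", round(sample_acf(noise, 1)[1], 3), round(sample_acf(walk, 1)[1], 3))
print("Hurst (R/S):", round(hurst_rs(noise), 2), round(hurst_rs(walk), 2))
```

The running sum is expected to give a markedly larger R/S slope and a lag-1 autocorrelation closer to 1 than the white noise, mirroring the qualitative contrast reported between the TGP-CODA-based and ML-based time series.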

4 Discussion and conclusions

The difficulty of obtaining well-controlled biological replicates renders the estimation of biological and technical variation from individual time points impractical, thus severely limiting interpretability of human microbiome studies. To overcome this limitation, we have derived a probabilistic model, the Temporal Gaussian Process model for Compositional Data Analysis (TGP-CODA), that comprises non-parametric temporal, explicit overdispersion, and zero-inflation noise components leveraging temporal relationships between time points and integrative analysis of all the bacterial taxa (to account for population structure and the compositional nature of typical microbiome data sets). Our results demonstrate that the lack of replicates for longitudinal human gut microbial data can be partially mitigated by our method in the case of proper experimental design: dense time series. Our temporal modeling framework can seamlessly incorporate different experimental designs, such as non-equidistant sampling over time, missing time points, and variable sequencing depth. Our framework also quantifies the uncertainty of the final estimates, which is an important property in integrated microbiome studies, where downstream analysis methods might propagate this error.

Our results on real and synthetic data demonstrate TGP-CODA’s validity and superior performance for analyzing longitudinal microbiome data. Temporal autocorrelation and scaling analysis also revealed that the choice between ML and TGP-CODA modeling has a fundamental impact on time series characteristics and their ecological interpretation. ML modeling suggests that the observed time series are stationary and possess short-term memory, driven by white noise. TGP-CODA modeling suggests that relative abundances of microbiota are self-affine, persistent, and possess long-term memory, driven by Brownian noise. Using TGP-CODA, the Hurst exponents of the majority of microbial orders are in remarkable agreement with those of long species abundance time series across the tree of life, including fresh water diatoms (H = 0.85) and vertebrates (H = 0.77) (Arino and Pimm, 1995). Determining the true underlying dynamics as well as the appropriate environmental noise characteristics will be a key objective for future research because these features will have a major impact on our understanding of species persistence in microbial ecosystems and their potential extinction rates (Sugihara and May, 1990; Cuddington and Yodzis, 1999).

This work also suggests several research questions for future experimental and computational studies. Key objectives are to determine (1) which approximations can be made to the probabilistic model without compromising its validity and (2) how improved temporal analysis can be leveraged to estimate directed, time-varying microbial association networks. A key area of future development will also be the application of TGP-CODA-type methods to mixed experimental designs that include both cross-sectional (perturbation, steady-state) data and time series data. One could envision using time series data to estimate taxon-specific zero-inflation parameters that serve as more accurate priors for estimates in cross-sectional data. Another important extension of the model would be the inclusion of spatial information in a unifying GP modeling framework, which would greatly advance our understanding of microbial ecosystems across space and time.

As the prevalence and public availability of dense time series (including hybrid cross-sectional and time series data) in microbiome research will only increase in the near future, explicit treatments of microbiome dynamics with models like the one presented herein will likely be instrumental for a deeper understanding of microbial ecosystems.

Funding

The authors declare no competing financial or non-financial interests. This work was supported by the Simons Foundation, Center for Computational Biology, the US National Science Foundation [IOS-1126971, CBET-1067596, CHE-1151554], and the National Institutes of Health [GM 32877-21/22, PN2-EY016586, IU54CA143907-01, EY016586-06].

Acknowledgements

We acknowledge the computational resources provided by the computing group of Simons Center for Data Analysis.

References

  1. Aach, J. and Church, G. M. (2001). Aligning gene expression time series with time warping algorithms. Bioinformatics, 17(6), 495–508.
  2. Ahdesmäki, M., Lähdesmäki, H., Gracey, A., Shmulevich, I., and Yli-Harja, O. (2007). Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data. BMC Bioinformatics, 8, 233.
  3. Äijö, T., Butty, V., Chen, Z., Salo, V., Tripathi, S., Burge, C. B., Lahesmaa, R., and Lähdesmäki, H. (2014). Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation. Bioinformatics, 30(12), i113–i120.
  4. Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), pages 139–177.
  5. Amann, R. I., Binder, B. J., Olson, R. J., Chisholm, S. W., Devereux, R., and Stahl, D. A. (1990). Combination of 16S rRNA-targeted oligonucleotide probes with flow cytometry for analyzing mixed microbial populations. Appl Environ Microbiol, 56(6), 1919–1925.
  6. Arino, A. and Pimm, S. L. (1995). On the nature of population extremes. Evolutionary Ecology, 9(4), 429–443.
  7. Bar-Joseph, Z., Farkash, S., Gifford, D. K., Simon, I., and Rosenfeld, R. (2004). Deconvolving cell cycle expression data with complementary information. Bioinformatics, 20 Suppl 1, i23–i30.
  8. Bar-Joseph, Z., Gitter, A., and Simon, I. (2012). Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet, 13(8), 552–564.
  9. Bishop, C. M. (2006). Pattern recognition and machine learning. Springer New York.
  10. Bonneau, R., Reiss, D. J., Shannon, P., Facciotti, M., Hood, L., Baliga, N. S., and Thorsson, V. (2006). The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol, 7(5), R36.
  11. Brennecke, P., Anders, S., Kim, J. K., Kolodziejczyk, A. A., Zhang, X., Proserpio, V., Baying, B., Benes, V., Teichmann, S. A., Marioni, J. C., and Heisler, M. G. (2013). Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods, 10(11), 1093–1095.
  12. Bucci, V., Tzen, B., Li, N., Simmons, M., Tanoue, T., Bogart, E., Deng, L., Yeliseyev, V., Delaney, M. L., Liu, Q., Olle, B., Stein, R. R., Honda, K., Bry, L., and Gerber, G. K. (2016). MDSINE: Microbial Dynamical Systems INference Engine for microbiome time-series analyses. Genome Biology, 17(1), 121.
  13. Buffie, C. G., Bucci, V., Stein, R. R., McKenney, P. T., Ling, L., Gobourne, A., No, D., Liu, H., Kinnebrew, M., Viale, A., Littmann, E., van den Brink, M. R. M., Jenq, R. R., Taur, Y., Sander, C., Cross, J. R., Toussaint, N. C., Xavier, J. B., and Pamer, E. G. (2015). Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature, 517(7533), 205–208.
  14. Caporaso, J. G., Lauber, C. L., Costello, E. K., Berg-Lyons, D., Gonzalez, A., Stombaugh, J., Knights, D., Gajer, P., Ravel, J., Fierer, N., Gordon, J. I., and Knight, R. (2011). Moving pictures of the human microbiome. Genome Biol, 12(5), R50.
  15. Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., and Riddell, A. (in press). Stan: A probabilistic programming language. Journal of Statistical Software.
  16. Chen, J. and Li, H. (2013). Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis. The Annals of Applied Statistics, 7(1).
  17. Cleveland, W. S. (1981). LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician, 35(1), 54.
  18. Cuddington, K. M. and Yodzis, P. (1999). Black noise and population persistence. Proceedings of the Royal Society B: Biological Sciences, 266(1422), 969.
  19. David, L. A., Materna, A. C., Friedman, J., Campos-Baptista, M. I., Blackburn, M. C., Perrotta, A., Erdman, S. E., and Alm, E. J. (2014). Host lifestyle affects human microbiota on daily timescales. Genome Biol, 15(7), R89.
  20. Di Matteo, T., Aste, T., and Dacorogna, M. M. (2003). Scaling behaviors in differently developed markets. Physica A: Statistical Mechanics and its Applications, 324(1-2), 183–188.
  21. Faith, J. J., Guruge, J. L., Charbonneau, M., Subramanian, S., Seedorf, H., Goodman, A. L., Clemente, J. C., Knight, R., Heath, A. C., Leibel, R. L., Rosenbaum, M., and Gordon, J. I. (2013). The long-term stability of the human gut microbiota. Science, 341(6141), 1237439.
  22. Faust, K., Lahti, L., Gonze, D., de Vos, W. M., and Raes, J. (2015). Metagenomics meets time series analysis: unraveling microbial community dynamics. Curr Opin Microbiol, 25, 56–66.
  23. Fisher, C. K. and Mehta, P. (2014). Identifying keystone species in the human gut microbiome from metagenomic timeseries using sparse linear regression. PLoS ONE, 9(7), 1–10.
  24. Flores, G. E., Caporaso, J. G., Henley, J. B., Rideout, J. R., Domogala, D., Chase, J., Leff, J. W., Vázquez-Baeza, Y., Gonzalez, A., Knight, R., Dunn, R. R., and Fierer, N. (2014). Temporal variability is a personalized feature of the human microbiome. Genome Biol, 15(12), 531.
  25. Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, pages 457–472.
  26. Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2014). Bayesian data analysis, volume 2. Taylor & Francis, Boca Raton.
  27. Gerber, G. K. (2014). The dynamic microbiome. FEBS Letters, 588(22), 4131–4139.
  28. Gerber, G. K., Onderdonk, A. B., and Bry, L. (2012). Inferring dynamic signatures of microbes in complex host ecosystems. PLoS Comput Biol, 8(8), e1002624.
  29. Hell, K., Edwards, A., Zarsky, J., Podmirseg, S. M., Girdwood, S., Pachebat, J. A., Insam, H., and Sattler, B. (2013). The dynamic bacterial communities of a melting High Arctic glacier snowpack. ISME J, 7(9), 1814–1826.
  30. Holmes, I., Harris, K., and Quince, C. (2012). Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS One, 7(2), e30126.
  31. Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civil Eng., 116, 770–808.
  32. Ivanov, I. I., Atarashi, K., Manel, N., Brodie, E. L., Shima, T., Karaoz, U., Wei, D., Goldfarb, K. C., Santee, C. A., Lynch, S. V., Tanoue, T., Imaoka, A., Itoh, K., Takeda, K., Umesaki, Y., Honda, K., and Littman, D. R. (2009). Induction of intestinal Th17 cells by segmented filamentous bacteria. Cell, 139(3), 485–498.
  33. Jayaraman, S. (2009). Digital image processing. Tata McGraw Hill Education Private Limited, New Delhi.
  34. Jernberg, C., Löfmark, S., Edlund, C., and Jansson, J. K. (2010). Long-term impacts of antibiotic exposure on the human intestinal microbiota. Microbiology, 156(Pt 11), 3216–3223.
  35. Kuczynski, J., Lauber, C. L., Walters, W. A., Parfrey, L. W., Clemente, J. C., Gevers, D., and Knight, R. (2012). Experimental and analytical tools for studying the human microbiome. Nat Rev Genet, 13(1), 47–58.
  36. Kurtz, Z. D., Müller, C. L., Miraldi, E. R., Littman, D. R., Blaser, M. J., and Bonneau, R. A. (2015). Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol, 11(5), e1004226.
  37. Leek, J. T., Monsen, E., Dabney, A. R., and Storey, J. D. (2006). EDGE: extraction and analysis of differential gene expression. Bioinformatics, 22(4), 507–508.
  38. Lozupone, C. A., Stombaugh, J. I., Gordon, J. I., Jansson, J. K., and Knight, R. (2012). Diversity, stability and resilience of the human gut microbiota. Nature, 489(7415), 220–230.
  39. Meron, D., Rodolfo-Metalpa, R., Cunning, R., Baker, A. C., Fine, M., and Banin, E. (2012). Changes in coral microbial communities in response to a natural pH gradient. ISME J, 6(9), 1775–1785.
  40. Mohri, M. and Roark, B. (2005). Structural zeros versus sampling zeros. Technical Report CSE-05-003, Computer Science & Electrical Engineering, Oregon Health & Science University.
  41. Rasmussen, C. E. and Williams, C. K. I. (2005). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning series). The MIT Press, Cambridge.
  42. Romero, R., Hassan, S. S., Gajer, P., Tarca, A. L., Fadrosh, D. W., Nikita, L., Galuppi, M., Lamont, R., Chaemsaithong, P., Miranda, J., Chaiworapongsa, T., and Ravel, J. (2014). The composition and stability of the vaginal microbiota of normal pregnant women is different from that of non-pregnant women. Microbiome, 2(1), 4.
  43. Severini, T. A. (2005). Elements of distribution theory. Cambridge University Press, New York.
  44. Shafiei, M., Dunn, K. A., Boon, E., MacDonald, S. M., Walsh, D. A., Gu, H., and Bielawski, J. P. (2015). BioMiCo: a supervised Bayesian model for inference of microbial community structure. Microbiome, 3, 8.
  45. Stein, R. R., Bucci, V., Toussaint, N. C., Buffie, C. G., Rätsch, G., Pamer, E. G., Sander, C., and Xavier, J. B. (2013). Ecological modeling from time-series inference: insight into dynamics and stability of intestinal microbiota. PLoS Comput Biol, 9(12), e1003388.
  46. Sugihara, G. and May, R. M. (1990). Applications of fractals in ecology. Trends in Ecology and Evolution, 5(3).
  47. Webb-Robertson, B.-J. M., Wiberg, H. K., Matzke, M. M., Brown, J. N., Wang, J., McDermott, J. E., Smith, R. D., Rodland, K. D., Metz, T. O., Pounds, J. G., and Waters, K. M. (2015). Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J Proteome Res, 14(5), 1993–2001.
  48. Xia, F., Chen, J., Fung, W. K., and Li, H. (2013). A logistic normal multinomial regression model for microbiome compositional data analysis. Biometrics, 69(4), 1053–1063.
  49. Xu, L., Paterson, A. D., Turpin, W., and Xu, W. (2015). Assessment and selection of competing models for zero-inflated microbiome data. PLoS One, 10(7), e0129606.
  50. Yatsunenko, T., Rey, F. E., Manary, M. J., Trehan, I., Dominguez-Bello, M. G., Contreras, M., Magris, M., Hidalgo, G., Baldassano, R. N., Anokhin, A. P., Heath, A. C., Warner, B., Reeder, J., Kuczynski, J., Caporaso, J. G., Lozupone, C. A., Lauber, C., Clemente, J. C., Knights, D., Knight, R., and Gordon, J. I. (2012). Human gut microbiome viewed across age and geography. Nature, 486(7402), 222–227.
Posted September 22, 2016.