## Abstract

**Summary** Many fundamental questions in evolutionary biology entail estimating rates of lineage diversification (speciation – extinction). We develop a flexible Bayesian framework for specifying an effectively infinite array of diversification models (where rates are constant, vary continuously, or change episodically through time) and implement numerical methods to estimate parameters of these models from molecular phylogenies, even when species sampling is incomplete. Additionally, we provide robust methods for comparing the relative and absolute fit of competing branching-process models to a given tree, thereby enabling rigorous tests of biological hypotheses regarding patterns and processes of lineage diversification.

**Availability and implementation** The source code for TESS is freely available at http://cran.r-project.org/web/packages/TESS/.

**Contact** Sebastian.Hoehna@gmail.com

## 1 Introduction

Stochastic branching-process models (*e.g.,* birth-death models) describe the process of diversification that gave rise to a given study tree, and include parameters such as the rates of speciation and extinction. Parameters of these models are commonly estimated from molecular phylogenies using maximum-likelihood methods (*e.g.*, Paradis *et al.*, 2004; Rabosky, 2006; Stadler, 2013). There are several potential benefits of pursuing this inference problem in a Bayesian statistical framework, such as: (1) providing a natural means for accommodating uncertainty in our estimates (by inferring parameters as posterior probability densities rather than point values); (2) incorporating prior information regarding various aspects of the branching-process models (such as the expected number or severity of mass-extinction events); and (3) leveraging robust Bayesian approaches for model comparison and model averaging.

These considerations influenced our development of TESS, an R package for the Bayesian inference of lineage diversification rates that allows researchers to address three fundamental questions: (1) *What are the rates of the process that gave rise to my study tree?* (2) *Have diversification rates changed through time in my study tree?* (3) *Is there evidence that my study tree experienced mass extinction?*

## 2 Methods and algorithms

### Branching-process models

Inference of lineage-diversification rates is based on the *reconstructed evolutionary process* described by Nee *et al.* (1994): a birth-death process in which only sampled, extant lineages are observed. Our implementation exploits recent theoretical work (Lambert, 2010; Höhna, 2013, 2014, 2015) that allows the rate of diversification to be specified as an arbitrary function of time. By virtue of adopting this generic approach, it is possible to specify an effectively infinite number of branching-process models in TESS. These possibilities correspond to four main types of diversification models: (1) constant-rate birth-death models; (2) continuously variable-rate birth-death models; (3) episodically variable-rate birth-death models; and (4) explicit mass-extinction birth-death models.
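
To make the four model types concrete, the following minimal Python sketch represents speciation and extinction rates as functions of time. It is an illustration only and does not use the TESS (R) interface: every function name is hypothetical, time runs forward from the origin of the process, and the quantity computed is the expected number of lineages of the *unconditioned* birth-death process (the exponential of the integrated net-diversification rate, scaled by the survival probability of each mass-extinction event), not the likelihood of the reconstructed process.

```python
import math

def constant_rate(value):
    """Rate that does not change through time."""
    return lambda t: value

def exponential_rate(initial, decay):
    """Continuously varying rate that decays exponentially from `initial`."""
    return lambda t: initial * math.exp(-decay * t)

def episodic_rate(change_times, values):
    """Piecewise-constant rate: values[i] applies before change_times[i],
    and the final entry of `values` applies after the last change time."""
    def rate(t):
        for change_time, value in zip(change_times, values):
            if t < change_time:
                return value
        return values[-1]
    return rate

def expected_lineages(speciation, extinction, duration,
                      mass_extinctions=(), steps=10000):
    """Expected number of lineages after `duration`, starting from one lineage.

    The net-diversification rate is integrated with the trapezoid rule.
    Each mass extinction is a (time, survival_probability) pair; because
    lineages survive independently, it scales the expectation by that
    probability.
    """
    dt = duration / steps
    net = [speciation(i * dt) - extinction(i * dt) for i in range(steps + 1)]
    integral = dt * (sum(net) - 0.5 * (net[0] + net[-1]))
    survival = 1.0
    for time, probability in mass_extinctions:
        if 0.0 <= time <= duration:
            survival *= probability
    return math.exp(integral) * survival
```

For example, `expected_lineages(constant_rate(0.3), constant_rate(0.1), 10.0)` integrates a net rate of 0.2 over 10 time units, and adding `mass_extinctions=[(5.0, 0.1)]` scales that expectation by the 10% survival probability.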

### Phylogenetic data

Parameters of the branching-process models are inferred from a given study tree. Specifically, TESS takes as input rooted *ultrametric* trees, where all of the tips are sampled at the same time horizon (the present). Other types of trees—*e.g.,* where tips are sampled sequentially through time (Heath *et al.*, 2014)—are currently not supported. It is now well established that estimates of diversification rates are sensitive to incomplete species sampling (*i.e.,* where the study tree includes only a fraction of the described species; Cusimano and Renner, 2010; Höhna *et al.*, 2011). This is a particular concern, as most empirical phylogenies include only a fraction of the member species. Accordingly, TESS implements various approaches for accommodating incompletely sampled trees, including uniform sampling and diversified sampling schemes (Höhna *et al.*, 2011; Höhna, 2014).

### Parameter estimation

In TESS, parameters of the branching-process models are inferred in a Bayesian statistical framework. Specifically, we estimate the joint posterior probability density of the model parameters from the study tree using numerical methods—Markov chain Monte Carlo (MCMC) algorithms (Figure 1). The numerical methods implemented in TESS include adaptive-MCMC algorithms (Haario *et al.*, 1999)—where the scale of the proposal mechanisms is automatically tuned to ensure optimal efficiency (mixing) of the MCMC simulation—and also feature real-time diagnostics to assess convergence of the MCMC simulation to the stationary distribution (the joint posterior probability density of the model parameters).
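
The idea of automatically tuning the proposal scale can be sketched for the simplest possible case. The Python example below is a simplified stand-in, not the TESS implementation and not the full algorithm of Haario *et al.* (1999): it assumes a pure-birth model whose likelihood is proportional to `rate^(n_tips - 2) * exp(-rate * tree_length)`, an exponential prior, and a sliding-window proposal whose width is adjusted during burn-in toward a target acceptance rate (0.44 is an assumed target). All names and default values are hypothetical.

```python
import math
import random

def log_posterior(rate, n_tips, tree_length, prior_rate):
    """Unnormalized log posterior of a pure-birth speciation rate.

    Likelihood (up to a constant): rate^(n_tips - 2) * exp(-rate * tree_length).
    Prior: Exponential(prior_rate).
    """
    if rate <= 0.0:
        return -math.inf
    return (n_tips - 2) * math.log(rate) - rate * (tree_length + prior_rate)

def adaptive_mcmc(n_tips, tree_length, prior_rate=1.0,
                  burnin=2000, iterations=20000, target=0.44, seed=1):
    """Metropolis sampler whose proposal window is tuned during burn-in only."""
    rng = random.Random(seed)
    rate, window = 1.0, 1.0
    current = log_posterior(rate, n_tips, tree_length, prior_rate)
    accepted, samples = 0, []
    for step in range(1, burnin + iterations + 1):
        # Symmetric sliding-window proposal.
        proposal = rate + rng.uniform(-window, window)
        candidate = log_posterior(proposal, n_tips, tree_length, prior_rate)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            rate, current = proposal, candidate
            accepted += 1
        if step <= burnin:
            # Every 100 burn-in steps, widen or narrow the window.
            if step % 100 == 0:
                window *= 1.1 if accepted / 100 > target else 1.0 / 1.1
                accepted = 0
        else:
            # The window is fixed after burn-in; only these draws are kept.
            samples.append(rate)
    return samples, window
```

Because this toy model is conjugate, the posterior is a gamma distribution with shape `n_tips - 1` and rate `tree_length + prior_rate`, so the mean of the sampled values can be checked against `(n_tips - 1) / (tree_length + prior_rate)`.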

### Model comparison

Each branching-process model specifies a possible scenario for the diversification process that gave rise to a given study tree. For most studies, several (possibly many) competing branching-process models of varying complexity will be plausible *a priori*. We therefore need a way to objectively identify the best candidate diversification model. Bayesian model selection is based on *Bayes factors* (*e.g.,* Kass and Raftery, 1995; Suchard *et al.*, 2001; Holder and Lewis, 2003). This procedure requires that we first estimate the marginal likelihood of each candidate model, and then compare the ratio of the marginal likelihoods for each pair of candidate models. We have implemented both *stepping-stone sampling* (Xie *et al.*, 2011; Fan *et al.*, 2011) and *path-sampling* (Lartillot and Philippe, 2006; Baele *et al.*, 2012) algorithms for estimating the marginal likelihoods of branching-process models in TESS, which provides a robust and flexible framework for Bayesian tests of diversification-rate hypotheses.
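
The logic of stepping-stone sampling can be illustrated with a toy model for which the answer is known. The Python sketch below is not the TESS implementation: it assumes a pure-birth likelihood proportional to `rate^(n_tips - 2) * exp(-rate * tree_length)` with an exponential prior, so that every power posterior is a gamma distribution that can be sampled directly (in practice these draws would come from MCMC). The placement of the powers along quantiles of a Beta(0.3, 1) distribution and all function names are assumptions made for this illustration.

```python
import math
import random

def log_likelihood(rate, n_tips, tree_length):
    """Pure-birth log likelihood, up to an additive constant."""
    return (n_tips - 2) * math.log(rate) - rate * tree_length

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))

def stepping_stone(n_tips, tree_length, prior_rate=1.0,
                   stones=50, draws=1000, seed=1):
    """Stepping-stone estimate of the log marginal likelihood."""
    rng = random.Random(seed)
    powers = [(k / stones) ** (1.0 / 0.3) for k in range(stones + 1)]
    log_marginal = 0.0
    for lower, upper in zip(powers[:-1], powers[1:]):
        # Power posterior at `lower`: prior * likelihood^lower, here a gamma
        # distribution (conjugacy is specific to this toy model).
        shape = lower * (n_tips - 2) + 1.0
        scale = 1.0 / (lower * tree_length + prior_rate)
        log_weights = [
            (upper - lower)
            * log_likelihood(rng.gammavariate(shape, scale), n_tips, tree_length)
            for _ in range(draws)
        ]
        log_marginal += log_mean_exp(log_weights)
    return log_marginal

def exact_log_marginal(n_tips, tree_length, prior_rate=1.0):
    """Closed-form log marginal likelihood of the same toy model."""
    return (math.log(prior_rate) + math.lgamma(n_tips - 1)
            - (n_tips - 1) * math.log(tree_length + prior_rate))

def log_bayes_factor(log_marginal_0, log_marginal_1):
    """Log Bayes factor in favor of model 0 over model 1."""
    return log_marginal_0 - log_marginal_1
```

Comparing `stepping_stone(50, 100.0)` with `exact_log_marginal(50, 100.0)` gives a quick check of the estimator; the log Bayes factor for any pair of candidate models is then the difference between their estimated log marginal likelihoods.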

### Model adequacy

Bayes factors allow us to assess the *relative* fit of two or more competing branching-process models to a given study tree. However, even the very best of the competing models may nevertheless be woefully inadequate in an *absolute* sense. Accordingly, TESS implements methods to assess the absolute fit of a candidate diversification model to a given study tree using *posterior-predictive simulation* (Gelman *et al.*, 1996; Bollback, 2002; Moore and Donoghue, 2009; Brown, 2014). The basic premise of this approach is as follows: if the diversification model under consideration provides an adequate description of the process that gave rise to our study tree, then we should be able to use that model to generate new phylogenies that are in some sense ‘similar’ to our study tree. TESS permits use of any summary statistic—*e.g.,* the *γ*-statistic (Pybus and Harvey, 2000) or the nLTT statistic (Janzen *et al.*, 2015)—to measure the similarity between predicted and observed data.
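
A posterior-predictive check based on the *γ*-statistic can be sketched as follows. This Python example is illustrative rather than a description of the TESS code: it assumes the study tree is summarized by its internode intervals (the time spent with 2, 3, ..., *n* lineages), that the candidate model is a pure-birth process, and that a one-tailed posterior-predictive *p*-value is of interest. All function names are hypothetical.

```python
import math
import random

def gamma_statistic(intervals):
    """Gamma statistic of Pybus and Harvey (2000).

    intervals[i] is the time during which the tree has i + 2 lineages,
    so a tree with n tips supplies n - 1 intervals (n must be at least 3).
    """
    n_tips = len(intervals) + 1
    weighted = [(k + 2) * g for k, g in enumerate(intervals)]
    total = sum(weighted)
    running, partial_sums = 0.0, []
    for value in weighted[:-1]:  # cumulative sums up to n - 1 lineages
        running += value
        partial_sums.append(running)
    mean_partial = sum(partial_sums) / (n_tips - 2)
    return (mean_partial - total / 2.0) / (
        total * math.sqrt(1.0 / (12.0 * (n_tips - 2))))

def simulate_pure_birth_intervals(rate, n_tips, rng):
    """With k lineages, the waiting time to the next event is Exp(k * rate)."""
    return [rng.expovariate(k * rate) for k in range(2, n_tips + 1)]

def posterior_predictive_p_value(observed_intervals, posterior_rates, seed=1):
    """Fraction of simulated trees whose gamma is at most the observed gamma.

    One tree is simulated for each posterior draw of the speciation rate.
    """
    rng = random.Random(seed)
    n_tips = len(observed_intervals) + 1
    observed = gamma_statistic(observed_intervals)
    simulated = [
        gamma_statistic(simulate_pure_birth_intervals(rate, n_tips, rng))
        for rate in posterior_rates
    ]
    return sum(1 for value in simulated if value <= observed) / len(simulated)
```

A *p*-value close to 0 or 1 would suggest that trees simulated under the candidate model rarely resemble the study tree with respect to this statistic; any other summary statistic could be substituted for `gamma_statistic`.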

### Model averaging

The vast space of possible branching-process models precludes their exhaustive pairwise comparison using Bayes factors. This issue may be addressed by means of model-averaging approaches that treat the model as a random variable (Huelsenbeck *et al.*, 2004, 2006). TESS implements such an approach: the CoMET (CPP on Mass-Extinction Times) model (May *et al.*, 2015). The CoMET model comprises three compound Poisson process (CPP) models that describe three corresponding types of events: (1) instantaneous tree-wide shifts in speciation rate; (2) instantaneous tree-wide shifts in extinction rate; and (3) instantaneous tree-wide mass-extinction events. The dimensions of the CoMET model are therefore dynamic; there is effectively an infinite number of nested models that include zero or more events. We use reversible-jump MCMC to average over all possible models, visiting each model in proportion to its posterior probability (Green, 1995; Huelsenbeck *et al.*, 2000). The resulting joint posterior probability distribution can then be queried to assess whether the study tree has been impacted by mass extinction and, if so, to identify the number and timing of those events using Bayes factors (Figure 2).
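
The structure of a compound Poisson process prior over events can be sketched by simulating from it. The Python example below is a schematic illustration, not the CoMET implementation: it assumes that each event type occurs as a homogeneous Poisson process over the age of the tree, that new rate values are drawn from a lognormal distribution, and that survival probabilities are drawn from a beta distribution. All function names and hyperparameter values are hypothetical.

```python
import random

def poisson_process_times(expected_events, tree_age, rng):
    """Event times of a homogeneous Poisson process on (0, tree_age).

    The number of events is Poisson with mean `expected_events`, and the
    times are uniformly distributed given that number.
    """
    rate = expected_events / tree_age
    times = []
    t = rng.expovariate(rate)
    while t < tree_age:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def draw_from_cpp_prior(tree_age, rng, expected_shifts=2.0,
                        expected_mass_extinctions=1.0,
                        log_rate_mean=-1.0, log_rate_sd=0.5,
                        survival_shape=(2.0, 18.0)):
    """One prior draw of the three event types (illustrative hyperparameters)."""
    def rate_shifts():
        # Each shift is a (time, new_rate) pair.
        return [(t, rng.lognormvariate(log_rate_mean, log_rate_sd))
                for t in poisson_process_times(expected_shifts, tree_age, rng)]
    return {
        "speciation_shifts": rate_shifts(),
        "extinction_shifts": rate_shifts(),
        # Each mass extinction is a (time, survival_probability) pair.
        "mass_extinctions": [
            (t, rng.betavariate(*survival_shape))
            for t in poisson_process_times(expected_mass_extinctions,
                                           tree_age, rng)
        ],
    }
```

Each draw has a different number of events, which is why the dimension of the model varies and a reversible-jump sampler is needed to move between configurations. Comparing the prior and posterior frequencies of events in a given time interval then yields the corresponding Bayes factor.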

## 3 Conclusions

TESS allows users to specify an effectively infinite number of diversification models, where each model describes an alternative scenario for the diversification of the study tree. Additionally, TESS provides robust methods for assessing the relative and absolute fit of competing models to a given study tree, providing users with an extremely flexible yet intuitive framework for testing hypotheses regarding the patterns and processes of lineage diversification. We are optimistic that the implementation of a robust and powerful Bayesian statistical framework for exploring rates of lineage diversification will provide biologists with an important tool for advancing our understanding of the processes that have shaped the Tree of Life.

## Acknowledgements

*Funding*: This research was made possible by NSF grants DEB-0842181, DEB-0919529, DBI-1356737, and DEB-1457835 awarded to BRM, and by a Miller Institute for Basic Research in Science scholarship awarded to SH.

*Conflict of interest*: None declared.