## Abstract

Divergence-time estimation based on molecular phylogenies and the fossil record has provided insights into fundamental questions of evolutionary biology. In Bayesian node dating, phylogenies are commonly time calibrated through the specification of calibration densities on nodes representing clades with known fossil occurrences. Unfortunately, the optimal shape of these calibration densities is usually unknown and they are therefore often chosen arbitrarily, which directly impacts the reliability of the resulting age estimates. As possible solutions to this problem, two non-exclusive alternative approaches have recently been developed, the “fossilized birth-death” model and “total-evidence dating”. While these approaches have been shown to perform well under certain conditions, they require including all (or a random subset) of the fossils of each clade in the analysis, rather than just relying on the oldest fossils of clades. In addition, both approaches assume that fossil records of different clades in the phylogeny are all the product of the same underlying fossil sampling rate, even though this rate has been shown to differ strongly between higher-level taxa. We here develop a flexible new approach to Bayesian node dating that combines advantages of traditional node dating and the fossilized birth-death model. In our new approach, calibration densities are defined on the basis of first fossil occurrences and sampling rate estimates that can be specified separately for all clades. We verify our approach with a large number of simulated datasets, and compare its performance to that of the fossilized birth-death model. We find that our approach produces reliable age estimates that are robust to model violation, on par with the fossilized birth-death model. By applying our approach to a large dataset including sequence data from over 1000 species of teleost fishes as well as 147 carefully selected fossil constraints, we recover a timeline of teleost diversification that is incompatible with previously assumed vicariant divergences of freshwater fishes. Our results instead provide strong evidence for trans-oceanic dispersal of cichlids and other groups of teleost fishes. (Keywords: node dating; calibration density; relaxed molecular clock; fossil record; Cichlidae; marine dispersal)

In phylogenetic analyses, molecular sequence data are commonly used to infer not only the relationships between species, but also the divergence times between them. The estimation of divergence times in a phyogenetic context is usually based on an assumed correlation between the age of species separation and the number of observed genetic differences, i.e. a “molecular clock” (Zuckerkandl and Pauling 1962). Evidence for the existence of molecular clocks initially derived from relative rate tests (Sarich and Wilson 1967) and has since been corroborated by a large body of literature (e.g. Wilson et al. 1977; Bromham and Penny 2003). However, it has been shown that the rate of the molecular clock often differs between lineages (Drummond et al. 2006) and that it can depend on factors including body size, metabolic rate, and generation time (Martin and Palumbi 1993; Nabholz et al. 2008).

To allow the estimation of absolute divergence dates from sequence data, a calibration of the rate of the molecular clock is required. This calibration can be obtained from serially sampled DNA sequences, if the range of sampling times is wide enough to allow accumulation of substantial genetic differences between the first and last sampling event (Drummond et al. 2003), which is often the case for rapidly evolving viruses (Faria et al. 2014; Smith et al. 2009; Gire et al. 2014). However, for macroevolutionary studies aiming to estimate divergence times of eukaryotic organisms on the order of tens or hundreds of million years, genetic differences accumulated between sampling times are negligible and other sources of calibration information are required. Commonly, the age of the oldest known fossil of a given clade is then used to calibrate the age of this clade, an approach often referred to as “node dating” (Ronquist et al. 2012; Grimm et al. 2015). However, due to the incompleteness of the fossil record, clade origin will almost always predate the preservation of its oldest known fossil. As a result, fossils can provide absolute minimum clade ages, but are usually less informative regarding the maximum ages of clades (Benton and Donoghue 2007). In a Bayesian framework for phylogenetic time-calibration, the uncertainty regarding clade ages can be accomodated by the specification of “calibration densities” (also referred to as “node age priors”) with a hard lower bound set to the age of the earliest fossil record and a soft upper bound as provided by exponential, lognormal, or gamma distributions. Unfortunately, the optimal parameterization of these distributions is usually unknown but has been shown to have a strong influence on the resulting age estimates (Ho and Phillips 2009). In addition, the effect of inaccurate calibration densities can only partially be corrected with larger molecular datasets (Yang and Rannala 2006).

Other shortcomings of node dating have been identified. As described by Heled and Drummond (2012), calibration densities interact with each other and with the tree prior to produce marginal prior distributions of node ages that may differ substantially and in unpredictable ways from the specified calibration density. The application of recently-introduced calibrated tree priors can compensate for this effect, but becomes computationally expensive when more than a handful of calibrations are used in the analysis (Heled and Drummond 2015). Node dating has also been criticized for ignoring most of the information from the fossil record, as only the oldest known fossils of each clade are used to define calibration densities (Ronquist et al. 2012, but see Marshall 2008; Claramunt and Cracraft 2015). Furthermore, node dating relies on the correct taxonomic assignment of fossils to clades, and may produce misleading age estimates when fossils are misplaced on the phylogeny (Marshall 2008; Ho and Phillips 2009; Forest 2009).

As alternatives to node dating, two approaches have recently been developed. In “total-evidence” dating, fossils are not explicitely assigned to any clades, but are instead included as terminal taxa. The position of these tips is determined as part of the phylogenetic analysis, based on morphological character data that are required for all included fossils and at least some of the extant taxa (Pyron 2011; Ronquist et al. 2012). Branch lengths, and thus divergence times between extant and extinct species are inferred under the assumption of a “morphological clock”, usually based on the Mk model of Lewis (2001) (Pyron 2011; Beck and Lee 2014; Arcila et al. 2015). The total-evidence approach is conceptually appealing as it is able to account for uncertainty in the phylogenetic position of fossils, and allows a more complete representation of the fossil record than node dating. However, this approach has been found to result in particularly ancient age estimates and long “ghost lineages” when applied to empirical data sets, leading some authors to question the suitability of morphological clocks for phylogenetic time-calibration (Beck and Lee 2014; Arcila et al. 2015; O’Reilly et al. 2015). The developments of more realistic sampling schemes (Höhna et al. 2011) and advanced models of morphological character evolution (Wright et al. 2015) are likely to improve age estimates obtained with total-evidence dating, but have so far been applied only rarely (Klopfstein et al. 2015; Zhang et al. 2015). Importantly, due to the requirement of a morphological character matrix, total-evidence dating is limited to groups that share sufficient numbers of homologous characters (Grimm et al. 2015) so that its application is practically not feasible for higher-level phylogenies combining very disparate taxa from different taxonomic orders or classes.

The “fossilized birth-death (FBD) process” (Heath et al. 2014) provides an elegant framework in which fossils are used as terminal taxa and thus more than the oldest fossil can be used for each clade. In this model, fossils as well as extant taxa are considered as the outcome of a common process based on the four parameters speciation rate λ, extinction rate *μ*, proportion of sampled extant taxa *ρ*, and the fossil “sampling rate” *ψ* It is assumed that the fossils that are ultimately sampled and included in the study have been preserved along branches of the complete species tree following a constant-rate Poisson process. Unlike in total-evidence dating, a morphological character matrix is not required to place fossil taxa in the phylogeny (but can be used for this purpose; Gavryushkina et al. 2015; Zhang et al. 2015). Instead, fossils are assigned to clades through the specification of topological constraints. The FBD process was first implemented in the Bayesian divergence time estimation program FDPPDIV (Heath et al. 2014), which requires the specification of a fixed tree topology, point estimates of fossil ages, and constant rates of diversification and sampling throughout the tree. All these limitations have subsequently been overcome in the implementations of the FBD process as a tree prior in the “Sampled Ancestors” package for BEAST (Gavryushkina et al. 2014; Bouckaert et al. 2014) and in the software MrBayes (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003; Zhang et al. 2015). These implementations allow to specify priors on fossil ages to account for the often large uncertainties associated with them as well as the specification of time intervals within which rates are assumed constant, but between which they are free to vary. However, a limitation that remains also in newer FBD implementations is the assumption that all clades existing in a given time interval are subject to the same rates of diversification and fossil sampling. Especially in higher-level phylogenies, this assumption is unlikely to be met, as substantial clade-specific differences in these rates have been identified in many groups (Foote and Sepkoski 1999; Alfaro et al. 2009; Jetz et al. 2012; also see Supplementary Table S1), suggesting that time estimates obtained on the basis of this assumption may be misleading. The FBD model further assumes that the fossils included in the analysis represent either the complete set or a random sample of the known fossil record of a clade. However, the use of a complete or randomly sampled representation of the fossil record may be impractical with higher-level phylogenies due to the enormous number of fossil occurrences known for many higher taxa, e.g. for mammals (> 90 000), birds (> 5 000), and insects (> 40 000; www.paleobiodb.org). Presumably as a consequence of these difficulties, node dating has remained popular despite its drawbacks, and was applied in all recent phylogenomic time-tree analyses of groups above the order level (mammals: dos Reis et al. 2014; birds: Jarvis et al. 2014; Prum et al. 2015; insects: Misof et al. 2014).

Here, we develop a new approach for Bayesian phylogenetic divergence-time estimation that is based on node dating, but infers the optimal shape of calibration densities from a combination of the first fossil occurrence age of a given clade and independently assessed estimates of the sampling rate and the diversification rates. This approach therefore overcomes a major problem of node dating, the fact that calibration densities are often chosen arbitrarily despite their strong influence on the resulting age estimates (Heath et al. 2014). Our approach is suitable for divergence-time estimation in higher-level phylogenies combining groups with different sampling or diversification characteristics, as parameters can be specified independently for each clade. We have implemented our method in a new package for BEAST called “CladeAge”, and we will refer to calibration densities obtained with it as “CladeAge calibration densities” throughout the manuscript. Using a wide range of simulations, we assess the optimal scheme by which to select clades for calibration, and we show that the application of CladeAge calibration densities can result in age estimates comparable or better than those produced with the FBD model if the input rate estimates are correctly specified and only the oldest fossil of each clade is used for calibration.

We use our new approach together with a large and partially newly-generated molecular data set of 1187 teleost fishes to address the long-standing question whether freshwater cichlid fishes from India, Madagascar, Africa and the Neotropics diverged before or after continental separation (Chakrabarty 2004; Sparks and Smith 2005; Genner et al. 2007; Azuma et al. 2008; Friedman et al. 2013; McMahan et al. 2013). By rigorous examination of the paleontological literature, we identify as many as 147 fossil calibrations that can be reliably assigned to monophyletic clades in our phylogeny. To our knowledge this is the largest number of fossil calibrations thus far applied in a single time-tree analysis, and it suggests that studies using phylogenetic divergence-time estimation commonly underutilize the fossil record. Our results strongly support divergence of freshwater fishes long after continental separation, implying multiple marine dispersal events not only in cichlid fishes but also in other freshwater groups included as outgroups in our phylogeny.

## CladeAge calibration densities

### Calculating CladeAge calibration densities

Here, our goal is to design calibration densities that reproduce the probability density for a clade originating at time *t _{o},* given the age of its oldest fossil

*t*. To estimate this probability density, we assume that the probability density of a clade being

_{f}*t*time units older than it oldest fossil is identical to the probability density

*f*of the oldest fossil being

_{s}(t)*t*time units younger than the clade origin. This is equivalent to assuming a uniform prior probability distribution for the age of the clade, which is justified for calibration densities, as these probability densities will be multiplied with a (non-uniform) tree prior at a later stage, during the divergence time analysis. Thus, any non-uniform prior assumptions about the clade origin can be incorporated via the tree prior. We further assume that speciation, extinction, and fossil sampling are all homogeneous Poisson processes with rates

*⋋, μ*, and

*ψ*, respectively.

For a single lineage that does not speciate or go extinct, the probability to remain unsampled until time *t _{1}* is

*p*, while the probability of being sampled at least once during the same period is

_{u}(t_{1}) = e^{−ψt1}*p*—

_{s}(t_{1}) = 1*e*. Thus the probability of not being sampled before time

^{−ψt1}*t*

_{1}, but then being sampled before time

*t*is

_{2}> t_{1}The probability density for the clade being sampled for the first time exactly at time *t _{1}* is then

If we now allow for the possibility that the lineage has diversified into *N* species extant at time *t _{1}*, then the probability of the clade not being sampled before time

*t*is

_{1}*p*, where

_{u}(t_{1}) =^{e-ψS(t1)}*S(t*is the sum of lineage durations between clade origin and time

_{1})*t*(Foote et al. 1999). The probability of no lineage being sampled before time t

_{1}_{1}, but at least one lineage being sampled before time

*t*2 is then

In this case, the probability density for the clade being sampled for the first time exactly at *t _{1}* is
where the first derivative of

*S(t)*at time

*t*is

_{1}By ignoring the possibility of speciation or extinction between *t _{1}* and

*t*(which is justified at the limit

_{2}*t*), we get

_{2}→ t_{1}*S(t*and thus

_{2}) = S(t_{1}) + N * (t_{2}—1_{1})*S’(t*, which gives us

_{1}) = NIf we now take into account the stochastic nature of *S* as a variable resulting from a birth-death process with parameters *⋋* and *μ*, we have to rewrite Equation 6 as
where we condition on the survival of at least one species at time *t _{1}*, which is necessary to allow sampling at this time.

Unfortunately, we can not solve *f _{s}(t_{1})* analytically. To approximate

*f*CladeAge generates 10 000 birth-death trees based on estimates of the speciation rate

_{s}(t_{1}),*⋋*and the extinction rate

*μ*, infers

*S(t*in each of these trees as the sum of all branch lengths between clade origin and

_{1})*t*, and calculates

_{1}*ψNe*if the birth-death process resulted in

^{−ψS(t1)}*N ≥ 1*. According to the law of large numbers, the mean of a large sample converges to its expected value, therefore the probability density

*f*can be approximated by the mean of all values calculated for

_{s}(t_{1})*ψNe*. This process is repeated for 100 time points evenly spaced between 0 and a maximum time

^{−ψS(t1)}*t*which is predetermined so that

_{max},*p*≈

_{u}(t_{max})*0.001 * p*The probability density

_{u}(0).*f*for times

_{s}(t)*t*in between two of the 100 time points is estimated through interpolation from the probability densities of the two neighbouring time points, using linear regression. For all times larger than

*t*probability densities are approximated by a scaled exponential distribution that is calculated on the basis of the two largest time points and their respective probability densities

_{max},*f*Finally, all estimates of probability densities are scaled so that the total probability mass becomes 1.

_{s}(t).The calculation of calibration densities, as described above, requires estimates of the fossil sampling rate, as well as of the speciation and extinction rates, which can be obtained externally, from the fossil record alone (Silvestro et al. 2014; Starrfelt and Liow 2015), or from a combination of fossil and phylogenetic information (Alfaro et al. 2009; Stadler 2011; Rabosky 2014). As diversification is commonly parameterized as “net diversification” (*⋋ — μ)* and “turnover” (*μ/⋋*), and researchers often have greater confidence in estimates of net diversification and turnover than in those for speciation and extinction rates (Beaulieu and Donoghue 2013), our method accepts input in these units, and calculates *λ* and *μ* from it. Uncertainty in the three parameters net diversification, turnover, and sampling rate can be expressed by specifying minimum and maximum values and is accounted for by randomly drawing from the specified ranges, for each of the 10 000 birth-death trees generated to estimate estimated probability density *f _{s}*. Examples demonstrating the shape of CladeAge calibration densities, based on exactly known (A) or uncertain ages of the first fossil record (B), are shown in Figure 1.

### Calibration schemes for the use of CladeAge calibration densities in phylogenetic divergence-time estimation

Under the assumption of constant rates of diversification and sampling as well as a uniform prior probability for node ages, CladeAge calibration densities approximate the probability density for the age of a clade, given the age of the oldest fossil record of this clade. These probability distributions are therefore suitable as constraints on clade ages in Bayesian divergence-time estimation. However, in practice, it may not always be clear which clades should be used for time calibration: All clades with fossil records? Only those clades with stem-group fossils? Or only those with a crown-group fossil record? If a fossil represents the earliest record of not only one clade, but of multiple nested clades, CladeAge calibration densities could be used as time constraints for all these clades (we refer to this as “scheme A”), only for the most inclusive of these clades (“scheme B”), or only for the least inclusive clade (“scheme C”). As scheme B would allow one or more of the clades to appear younger than the fossil itself, it seems reasonable to specify, in addition to the CladeAge calibration density for the most inclusive clade, the fossil age as a strict minimum age for the least inclusive clade when using this scheme. Furthermore, if two sister clades both possess a fossil record, they could both be used to constrain the age of the two clades. However, as the ages of the two clades are necessarily linked by their simultaneous divergence, two time constraints would be placed on one and the same node. Instead, it may seem more intuitive to use only the older of the two fossils for time-calibration and disregard the younger fossil (“scheme D”). However, in contrast to traditional node dating, where maximally one calibration density is placed on each node, the model used to calculate CladeAge calibration densities considers each clade individually, and could thus be biased if the selection of clades for calibration is based on information about their sister clade. Figure 2a illustrates the four different calibration schemes.

As CladeAge calibration densities approximate the probability densities of clade ages conditional on the age of the first fossil record of this clade, they are also expected to approximate frequency distributions of observed waiting times between the origin of a clade and the appearance of the first fossil record of this clade in a sufficiently large sample of simulated phylogenetic trees. Since these waiting times can be sampled according to the above four schemes, we can determine the optimal calibration scheme by comparison of waiting time frequency distributions with CladeAge calibration densities. We simulated three times 10 000 pure-birth phylogenies with a speciation rate *⋋ = 0.04* and a root age *t _{root}* randomly drawn from a uniform distribution between 20 and 200 time units, conditioned on the survival of exactly 100 extant species. Assuming a Poisson process of fossil sampling, we added simulated fossil records to the branches of each of these trees, with three different sampling rates

*ψ = 0.1, 0.03, 0.01.*Applying the above four calibration schemes (A-D) independently, we recorded waiting times between a clade’s origin and the age of its oldest fossil in each simulated phylogeny.

Waiting time frequency distributions recorded from relatively young clades can be biased by the fact that only those waiting times shorter than the clade age can be recorded (otherwise the clade did not preserve at all). To assess the degree of this effect, we repeated this analysis, counting only waiting times for clades with a time of clade origin *t _{o}* above one out of four thresholds:

*t*(all clades included),

_{o}≥ 0*t*and

_{o}≥ 0.5) × t_{root}, t_{o}≥ 0.9 × t_{root},*t*(including only the two clades descending from the root, per simulated phylogeny). With the strictest clade age threshold of

_{o}= 1 × t_{root}*t*the same two waiting times per phylogeny are recorded with schemes A and B if both clades descending from the root have produced fossils. This is because the root node represents the oldest node that can be constrained with fossils in these clades, and thus waiting times between the root and these fossils are recorded with both schemes A and B. If further divergence events occurred between the root and the fossil, the root does not represent the youngest node that can be constrained with the fossil, and thus, the waiting time between the root and the fossil are not recorded with schemes C and D (see Fig. 2a). Differences between schemes A and B become apparent with less strict clade age thresholds, when also clades are included that do not represent the oldest possible clade to be constrained with a given fossil.

_{o}= 1 × t_{root},Figure 2b shows comparisons between waiting time frequency distributions and CladeAge calibration densities for a clade age threshold of *t _{o} ≥ 0. 9 × t_{root},* which is sufficiently young to show differences between all schemes, but still old enough to be affected only minimally by the bias described above. Comparisons for all other tested clade age thresholds are shown in Supplementary Figure S1. Taken together, these results show that waiting time frequency distributions deviate from the respective CladeAge distribution in most comparisons, and the degree of disagreement depends on sampling rate 0, on the clade age threshold, and on the applied scheme (A-D). However, for all but the youngest clade age thresholds, scheme A produces a frequency distribution that is virtually identical in shape to the distribution of CladeAge calibration densities. This suggests that when CladeAge calibration densities are used for time calibration, they should strictly be applied to constrain all clades for which a given fossil represents the first occurrence, even if the same fossil is used to constrain multiple nodes, and even if more than one constraint is placed on one and the same node.

## Testing CladeAge Calibration Densities with Simulated Phylogenies

To more extensively compare the performance of the four different calibration schemes A to D, we simulated phylogenetic data sets including fossil records and sequence alignments, and used CladeAge calibration densities to estimate clade ages in BEAST v.2.1.3. For comparison, we also used the same generated data sets to estimate clade ages with the FBD process implemented in the Sampled Ancestors (Gavryushkina et al. 2014) package for BEAST.

### Generation of data sets

Phylogenetic data sets of trees and fossil records were generated as decribed above with sampling rates *ψ = 0.1, 0.03, 0.01,* a root age between 20 and 200 time units, and a net diversification *⋋ — μ* of 0.04, however, species turnover was now modeled with rate *μ/⋋ = 0.5* (thus using *⋋ = 0.08* and *μ = 0.04).* If the time units used in these simulations are considered to be million years, the sampling and diversification rates used here are comparable to those found in empirical data sets (Jetz et al. 2012; Stadler and Bokma 2013; Rabosky et al. 2013; Supplementary Table S1). In separate sets of simulations, branch-specific substitution rates were modeled either with an uncorrelated molecular clock (Drummond et al. 2006), or with an autocorrelated molecular clock that accounts for the heritability of factors influencing rate variation (such as body mass, longevity, and generation time; Nabholz et al. 2008; Amster and Sella 2016) and may therefore model rate evolution more realistically than the uncorrelated molecular clock (Lepage et al. 2007). For both types of branch rate variation, we used a mean rate of *4 × 10 ^{−3}* substitutions per site per time unit and a variance parameter of

*1.6 ×*10

^{−5}. Branch-rate autocorrelation was simulated with the Cox-Ingersoll-Ross (CIR) process of Lepage et al. (2006), using a decorrelation time of 100 time units. The branch lengths and substitution rates were used to simulate sequence evolution of 3000 nucleotides according to the unrestricted empirical codon model of Kosiol et al. (2007). For each of two clock models and each of the three sampling rates, we generated 50 replicate data sets. An example of a data set simulated with these settings is illustrated in Supplementary Figure S2.

### Phylogenetic divergence-time estimation

For each of the replicate data sets, the simulated phylogenetic trees were reconstructed, and for each clade in each reconstructed phylogeny, the oldest fossil occurrence was identified. CladeAge calibration densities were calculated for these fossils based on the parameters used in simulations (λ = 0.08, *μ* = 0.04, and *ψ* = 0.1, 0.03, 0.01), and used to constrain node ages according to calibration schemes A to D in divergence-time estimation with BEAST. All sequence alignments were divided into three partitions according to codon position, and for each partition, we used the reversible-jump based substitution model of Bouckaert et al. (2013) with four gamma-distributed rate categories. For all simulated data sets, we used the lognormal relaxed molecular clock (Drummond et al. 2006) for divergence-time estimation. To account for extinction in the diversification process, we used the birth-death tree prior of Gernhard (2008) with uninformative prior distributions for the birth rate and the relative death rate. We used the reconstructed simulated tree as a starting tree in all analyses, and fixed the tree topology by disallowing all topological changes. For each analysis, 50 million Monte Carlo Markov Chain (MCMC) steps were carried out, which was always sufficient for convergence.

For the analysis of the same data sets with the FBD model implemented in the Sampled Ancestors package for BEAST, we used settings as described above, except that between 100 and 400 million MCMC steps were required for convergence. For comparability with age estimates based on CladeAge calibration densities, we used only the oldest fossils for each clade, and we fixed the values of diversification rates to those used to generate the data set. In separate sets of analyses, however, we fixed the sampling proportion according to the sampling rate used in simulations, or allowed the sampling proportion to be estimated. We also fixed the tree topology of all extant species, while at the same time allowing fossil taxa to attach anywhere whithin the clade (including its stem) to which they were assigned. This was done by using instances of “CladeConstraint”, a new type of topological constraint for BEAST introduced as part of the Sampled Ancestors package (Gavryushkina et al. 2014), with which ingroups and outgroups can be defined for a given clade, and taxa not listed in either of these groups are free to appear in either of them. For each clade, we specified CladeConstraints that place all extant taxa and fossils of this clade within the ingroup and all other extant taxa in the outgroup, thus allowing fossils from parent clades to appear outside or within this clade. As the starting tree, we used the reconstructed simulated tree but reattached each clade’s oldest fossil (provided that it had any) to its stem with an additional branch. All BEAST input files used for the analysis of simulated datasets are provided as Supplementary Data S1.

### Results with simulated phylogenies

Our simulations produced phylogenetic trees with root heights between 52.2 and 163.8 time units, with a median height of 84.8 time units. Mean branch rates per tree were between 2.2 × 10^{−3,} and 6.6 × 10^{−3,} (median 3.9 × 10^{−3}) substitutions per time unit with branch rate variances between 7.8 × 10^{−7} and 4.4 × 10^{−5} (median 7.1 × 10^{−6}), resulting in 5280 to 15090 (median 8289) nucleotide substitutions. The sequence alignments contained between 2217 and 2836 (median 2568) variable sites and between 1724 and 2646 (median 2231) parsimony-informative sites, out of a total of 3000 sites per alignment. Simulated fossil records consisted of 165 to 380 (median 240.5) fossils when generated with a sampling rate of *ψ* = 0. 1, 40 to 123 (median 74.5) fossils with *ψ* = 0.03, and 10 and 49 (median 24) fossils when a sampling rate of *ψ* = 0.01 was applied (see Supplementary Figure S2 for an illustration). Discarding fossils that did not represent the oldest fossil of any clade left between 75 and 113 (median 94) fossils when the sampling rate was *ψ* = 0.1, 30 to 71 (median 47) fossils with 0 = 0.03, and 9 to 33 (median 19.5) fossils with *ψ* = 0.01. Figure 3a shows the mean number of fossil constraints in 50 simulated data sets, per bin of 20 time units. The number of fossils available as time constraints decreases with bin age, a direct result of the fact that younger time bins contain an overall larger sum of lineage durations.

Comparisons of estimated and true node ages are shown in Figure 3b-c, for all analyses of data sets generated with the intermediate sampling rate of *ψ* = 0.03 and the uncorrelated clock model (see Supplementary Figure S3 for results obtained with *ψ* = 0.1 or *ψ* = 0.01, or with the autocorrelated clock model). The difference between results based on MCMC sampling from the prior only (Fig. 3b) and results based on the posterior (Fig. 3c) is most pronounced for young clades where 95% highest posterior density (HPD) intervals (indicated with gray bars in Fig. 3b-c) are much wider when the MCMC sampled from the prior only. This suggests that in combination with a relaxed clock model, sequence data is most informative to determine the age of young nodes, but that the age estimates of older nodes are primarily determined by the specified prior probabilities.

Following Heath et al. (2014) and Gavryushkina et al. (2014), we describe the age estimates for simulated phylogenies with two summary statistics, the mean width of 95% HPD intervals and the percentage of 95% HPD intervals that include the true node age. Shorter 95% HPD intervals indicate greater precision, and the percentage of 95% HPD intervals that include the true node age serves to assess the accuracy of age estimates. If the model used to generate the data is identical to that assumed for divergence-time estimation, and if MCMC sampling has completely converged, 95% of the 95% HPD intervals are expected to include the true node age. For CladeAge analyses of data sets generated with uncorrelated branch rates, a nearly identical model was used for simulation and inference, and the resulting percentage of 95% HPD intervals containing the true node age can therefore serve as an indicator of the optimal calibration scheme to be used with CladeAge. In contrast, the model used in analyses with the FBD differs to a greater extent from the model used to generate data sets, as the FBD model assumes that all, or a randomly sampled set of fossils of a clade are used for calibration, whereas our data sets were reduced to contain only the oldest fossils of each clade. Thus, for FBD analyses of data sets generated with uncorrelated branch rates, the two summary statistics allow to assess the robustness of the FBD model to a violation of the assumed fossil record representation. In addition, results for data sets generated with autocorrelated branch rates indicate the degree of robustness of both CladeAge and FBD analyses to violations of the clock model assumed for divergence-time estimation.

For all analyses with CladeAge as well as FBD analyses in which the sampling rate was allowed to be estimated, the two summary statistics are listed in Table 1. For data sets generated with the uncorrelated clock model, these results are also illustrated in bins of 20 time units in Figure 3d-e (detailed results for all analyses are given in Supplementary Tables S10-S12).

Among the four calibration schemes A to D, scheme A produced the shortest 95% HPD intervals with data sets based on *ψ* = 0. 1 or *ψ* = 0.03, regardless of whether the MCMC was set to sample from the prior only, or from the posterior, and both with data sets generated with uncorrelated or autocorrelated branch rates. At the same time, the percentage of true node ages included in 95% HPD intervals obtained with scheme A is closer to the expected value of 95% than that of any other calibration scheme. In contrast, scheme B performed slightly better than scheme A for data sets with the lowest sampling rate *ψ* = 0.01, as indicated by shorter 95% HPD intervals and a greater percentage of true node ages included within them. However, when scheme B was used for the analysis of data sets generated with uncorrelated branch rates, the accuracy of age estimates decreased with node age, and for nodes with a true age between 80 and 100 time units, only 76.2% of the 95% HPD intervals contained their true age (Fig. 3e, Supplementary Table S11).

In all cases, MCMC sampling from the posterior decreased the mean width of 95% HPD intervals, compared to analyses using the prior probability alone. The percentage of 95% HPD intervals containining the true node age remained comparable between analyses based on the prior probability alone (92.6-95.2% with scheme A) and analyses using the posterior (90.5-94.9% with scheme A) for data sets generated with uncorrelated branch rates. However, for data sets generated with autocorrelated branch rates, the percentage of 95% HPD intervals containining the true node age decreased substantially when the posterior was used for MCMC sampling (66.2-87.9% with scheme A; Table 1).

Overall, FBD analyses with a fixed sampling rate produced very similar summary statistics to CladeAge analyses with scheme A (Table 1). As for CladeAge analyses, the percentage of 95% HPD intervals containing the true node age was lower with autocorrelated branch rates, and remained around 95% with uncorrelated branch rates or when MCMC sampling from the prior only. In seven out of nine comparisons, however, the 95% HPD intervals were slightly wider when estimated with the FBD model than with CladeAge scheme A. The FBD model, used with a fixed sampling rate, also appeared somewhat less robust to the violation of the assumed clock model (i.e. with branch-rate autocorrelation), except when the lowest sampling rate *ψ* = 0.01 was used for dataset generation.

In contrast, when the sampling rate was not fixed in FBD analyses, 95% HPD intervals remained similarly wide regardless of the true sampling rate used in dataset generation, and relatively small percentages of 95% HPD intervals contained the true node age in analyses using the posterior (Fig. 3c). A particularly low percentage of 95% HPD intervals (63.5-63.8%; Table 1) contained the true node ages in analyses of datasets generated with autocorrelated branch rates and high or intermediate sampling rates (*ψ* = 0.1 or *ψ* = 0.03). Low accuracy with the FBD model in which sampling rates were not fixed was mostly due to overestimation of intermediate node ages (Fig. 3e, Supplementary Figure S3b-c). The overestimation of node ages in these analyses coincides with a substantial understimation of the sampling rate itself (Supplementary Figure S4). In analyses using the prior alone, the sampling rate was on average estimated as only 30.4, 51.4, and 73.9% of the true sampling rate, when the true sampling rate was *ψ* = 0.1, 0.03, or 0. 01, respectively. Also when sampling from the posterior in analyses of datasets generated with uncorrelated or autocorrelated branch rates, these percentages remained nearly identical (Supplementary Figure S3).

## Applying CladeAge Calibration Densities to Resolve Divergence Times of Cichlid Fishes

### Phylogeography of Cichlidae

Fishes of the percomorph family Cichlidae are known for their extraordinary species richness, which includes the replicated adaptive radiations of the East African Lakes Tanganyika, Malawi, and Victoria (Salzburger et al. 2014). Three reciprocally monophyletic subfamilies occur in Africa and the Middle East (Pseudocrenilabrinae; ∽2100 spp.; see Supplementary Text S2), in South and Central America (Cichlinae; ∽500 spp.), and on Madagascar (Ptychochrominae; 16 spp.). In addition, the most ancestral subfamily Etroplinae consists of two genera, of which *Etroplus* (3 spp.) occurs in Southern India and Sri Lanka while *Paretroplus* (13 spp.) is endemic to Madagascar (Sparks and Smith 2004). As the distribution of cichlids is mostly limited to landmasses of the former supercontinent Gondwana, their biogeography is traditionally considered a product of Gondwanan vicariance (Chakrabarty 2004; Sparks and Smith 2005; Smith et al. 2008; Azuma et al. 2008). In this scenario, the divergence of African Pseudocrenilabrinae and South American Cichlinae must have occurred before or during the break-up of the two continents about 100 Ma (Heine et al. 2013), and Indian and Malagassy members of Etroplinae must have separated before 85 Ma (Ali and Aitchison 2008). Regardless of whether cichlids colonized Africa or South America first, this colonization should have occurred before 120 Ma, as Madagascar and India were separated by that time from both Africa and Antarctica, through which a connection to South America could have existed previously (Ali and Aitchison 2008; Ali and Krause 2011).

However, a Gondwanan history is not supported by the fossil record of Cichlidae. The earliest record of Cichlidae is provided by †*Mahengechromis* spp. from the Mahenge palaeolake, Tanzania (46-45 Ma) (Murray 2000a), followed by the first occurences of neotropical cichlids, recovered from the Argentinian Lumbrara Formation (Malabarba et al. 2006; Alano Perez et al. 2010; Malabarba et al. 2010). The fossiliferous “Faja Verde” layer of the Lumbrera Formation has frequently been cited as Ypresian-Lutetian in age (48.6 Ma) (e.g. Alano Perez et al. 2010); however, it is unclear how the date was obtained (Friedman et al. 2013). The Lumbrera Formation has been assigned to the Casamayoran South American Land Mammal Age (SALMA) (del Papa et al. 2010), which corresponds to 45.4-38.0 Ma (as it is defined by polarities C20-C18) (Vucetich et al. 2007). This age can be further constrained by radiometric dating of a tuff layer overlying the “Faja Verde” deposits, providing a minimum age of 39.9 Ma (del Papa et al. 2010; this evidence seems to have been misinterpreted in a previous analysis of cichlid divergence dates where a minimum age of 33.9 Ma was assumed; Friedman et al. 2013).

Due to the lack of cichlid remains older than 46 Ma, long ghost lineages would need to be postulated in order to reconcile the biogeography of cichlid fishes with Gondwanan vicariance. On the other hand, trans-oceanic dispersal over hundreds or thousands of kilometers, followed by successful colonization of a new continent, appears extremely improbable, given that cichlids are found almost exclusively in freshwater. Whereas several cichlid species occur in brackish-water estuaries and some species are known to tolerate marine saltwater conditions (Myers 1949; Stickney 1986; Uchida et al. 2000), none have ever been observed in the open ocean, more than a few miles from the coast (Conkel 1993; Greenfield and Thomserson 1997). Thus, a long-standing debate has centered on the relative probabilities of the two alternative scenarios, Gondwanan vicariance despite long ghost lineages, or trans-oceanic dispersal despite a common freshwater lifestyle (Vences et al. 2001; Murray 2001a; Chakrabarty 2004; Sparks and Smith 2005; Genner et al. 2007; Smith et al. 2008). However, arguments for both sides have mostly been verbal, and the probabilities of the long ghost lineages required for the Gondwanan vicariance scenario could not properly be quantified, as an objective basis has been lacking for the specification of calibration densities in previous divergence-time analyses (e.g. Azuma et al. 2008; but see Friedman et al. 2013). In contrast, CladeAge calibration densities are based on sampling rate estimates and thus directly account for probabilities of individual ghost lineage durations. In combination with a large-scale molecular phylogeny including multiple cichlid and outgroup fossil constraints, the CladeAge method is therefore ideally suited to assess the most plausible phylogeographic scenario for cichlid fishes.

### A multi-marker phylogeny of teleost fishes

In order to time-calibrate cichlid divergences, we applied CladeAge calibration densities to a large-scale phylogeny of cichlid and outgroup taxa, including nearly 150 fossil constraints. As a first step, we compiled a molecular data set for 40 mitochondrial and nuclear markers, sequenced from 1187 species of the teleost Supercohort Clupeocephala (see Betancur-R et al. 2013). Of the species included in the data set, 578 were members of the family Cichlidae, 516 were members of other families of Cohort Euteleosteomorpha, and 93 species were members of Cohort Otomorpha, the sister lineage of Euteleosteomorpha, and were collectively used as an outgroup in our phylogenetic analysis. Out of a total of 11 050 sequences, 9970 were retrieved from the NCBI nucleotide database (www.ncbi.nlm.nih.gov/nuccore), 85 were obtained from annotated genomes of the Ensembl database (Cunningham et al. 2015), 5 mt-co1 sequences were downloaded from the Barcode of Life Data System (BOLD; Ratnasingham and Hebert 2007), and 328 sequences were identified from other non-annotated genomic resources (Supplementary Tables S2-S7). In addition, 662 sequences of 19 markers were produced specifically for this study, including 26 mitochondrial genomes (see Supplementary Text S1 for sequencing protocols and Supplementary Tables S2 and S5 for accession numbers).

For each marker, sequences were aligned with MAFFT v.7.122b (Katoh and Standley 2013), visually inspected, and poorly aligned regions were removed. Alignments were subsequently divided into primary data blocks according to codon position. In combination, the alignments included 35 817 sites with an overall proportion of undetermined characters of 82.84%. Assuming a general time-reversible model of sequence evolution with gamma-distributed rate variation among sites (GTR+Γ), the fit of partitioning schemes was assessed according to the Bayesian Information Criterion (BIC). The best-fitting partitioning scheme determined with the greedy algorithm implemented in PartitionFinder v.1.0.1 (Lanfear et al. 2012) combined primary data blocks into 30 different partitions (Supplementary Table S8).

Maximum likelihood (ML) phylogenetic tree search was conducted with RAxML v.7.3.1 (Stamatakis 2006; Pfeiffer and Stamatakis 2010), applying unlinked “GTRCAT” models of sequence evolution for each of the 30 partitions. Topological node support was evaluated with RAxML’s rapid bootstrap analysis (option “-f a”) and the “autoMRE” automatic stopping criterion (Stamatakis et al. 2008). Based on the ML phylogeny, we identified 455 clades that were potentially suitable for time calibration, as they were supported by high bootstrap values in our study (≥ 93% with only 6 exceptions) and corroborated by previously published molecular phylogenetic analyses and morphological synapomorphies (Supplementary Figure S5 and Supplementary Text S2). Of the 455 clades, 362 were mutually exclusive and in their sum represented nearly the entire species richness of Clupeocephala (> 99.5%; Supplementary Table S9). This is important in analyses with CladeAge calibration densities, as it ensures that the sister groups of clades used for time calibration are present in the phylogeny, even if their identity is not known prior to the phylogenetic analysis. If sister groups were instead missing from the taxon set, the stem lineages of calibrated clades would appear older than they are, potentially leading to biased age estimates.

### CladeAge model parameter estimation

CladeAge calibration densities are calculated based on estimates of rates of sampling, net diversification, and turnover. In order to use CladeAge calibration densities for the time-calibration of teleost divergences, we obtained estimates for these three parameters from previous studies. Net diversification and turnover rates of teleost fishes were estimated by Santini et al. (2009) as 0.041-0.081 per lineage per million years (L^{−1}myr^{−1}) and 0.0011-0.37 L^{−1}myr^{−1}, respectively. These estimates are comparable to those of a more recent analysis by Rabosky et al. (2013), who estimated a mean net diversification rate of 0.098 L^{−1}myr^{−1} and a mean turnover rate of 0.284 L^{−1}myr^{−1} using a Bayesian model of diversification with rate shifts. We here apply the slightly lower diversification rate estimates of Santini et al. (2009) to calculate CladeAge calibration densities and note that their distributions will tend to be wider, and thus older, than distributions calculated with the rate estimates of Rabosky et al. (2013).

Sampling probabilities have been estimated from the fossil record for a variety of groups and with a wide range of methods. For bony fishes (Osteichthyes) including Clupeocephala, an estimate of the sampling probability was calculated by Foote and Sepkoski (1999) from the frequency ratio *f*_{2}^{2}/(*f*_{1}*f*_{3}), where *f*_{1}, *f*_{2}, and *f*_{3} are the frequencies of genera with stratigraphic ranges of one, two, and three geologic time intervals, respectively (Foote and Raup 1996). The resulting estimate of 0.15-0.30 (Foote and Miller 2007) thus represents the probability that one or more members of a given genus are sampled from a geological time interval, and Foote and Sepkoski (1999) used five-million-year time intervals in their analysis. As CladeAge calibration densities are calculated from instantaneous species-level sampling rates, we translated the genus-level sampling probability estimate of Foote and Sepkoski (1999) as follows. We downloaded the list of all valid scientific names of bony fishes from the Catalogue of Life database (Roskov et al. 2015) and determined the frequency distribution of extant bony fish genus sizes from these names. We then used this distribution in combination with species-level sampling rates to simulate bony fish preservation over five million years and recorded the proportion of genera that were sampled during this interval. The species-level sampling rate was optimized until the resulting proportion of sampled genera was sufficiently close to the genus-level estimate of (Foote and Miller 2007). This optimization was performed separately for the lower and upper bound of estimate of (Foote and Miller 2007). We find that species-level instantaneous sampling rates of 0.0066-0.01806 L^{−1}myr^{−1} provide the best fit to five-million-year genus-level preservation probabilities of bony fishes, under the assumption of constant rates and a constant genus-size frequency distribution. For comparison, and in order to provide species-level estimates for future users of CladeAge, we compiled a comprehensive list of published sampling rates in Supplementary Table S1, using the above translation where necessary.

### Divergence-time estimation of teleost fishes

We analyzed the published fossil record for each of the 455 strongly supported teleost clades, and identified their first occurrences, the rock formation in which the earliest record was found, as well as the minimum and maximum age of this formation. Detailed information of the fossil record of each clade is given Supplementary Text S1. According to calibration scheme A, we used first occurrences to define CladeAge calibration densities distributions even if earlier records were known in sister clades, and we reused calibration densities for more inclusive clades if these (i) had no earlier fossil record on their own, but were (ii) either morphologically recognizable or characterized by a discrete geographical distribution so that fossil finds could in principle have been assigned to them directly rather than to parental clades only. For example, the Miocene *Nandopsis* †woodringi represents the earliest record of the genus *Nandopsis,* to which it can be assigned based on the presence of lingual cusps on the oral teeth and four anal-fin spines, a character combination which within cichlids is unique to members of this genus (Chakrabarty 2007). However, *Nandopsis* †woodringi also represents the first occurrence of the clade “SCAC+NCAC”, combining the groups “SCAC” (Southern Central American Clade) and “NCAC” (Northern Central American Clade) of López-Fernández et al. (2010) with a total of 19 genera of Neotropical cichlids. This clade is strongly supported by molecular phylogenies (López-Fernández et al. 2010, this study), but is not characterized by known synapomorphies or a geographical distribution that separates it from its potential sister groups. Thus, if stem-fossils were found of clade “SCAC+NCAC”, these would likely be misassigned to the next more inclusive clade that is morphologically recognizable, in this case the tribe Heroini. A lack of recognizable features for a clade thus effectively reduces the sampling rate of its stem lineage to 0. In order to account for this reduction, CladeAge calibration densities were defined exclusively for morphologically (or in some cases geographically) recognizable clades. We identified fossil constraints for a total of 147 clades, including 18 clades within cichlids (see Supplementary Text S1 and Supplementary Figure S5).

In order to reduce model complexity and increase compuational efficiency of Bayesian phylogenetic inference, eight markers with the greatest proportions of missing sequences were removed from the data set (Supplementary Table S4). In addition, a total of 80 codon positions with signatures of episodic selection were identified with the mixed effects model of evolution implemented in HyPhy (Murrell et al. 2012; Kosakovsky Pond et al. 2005) and removed from the alignment. We further collapsed each of the 362 mutually exclusive clades to individual tips, and for each marker we chose sequences of clade members at random to represent the terminal clade. To account for sequence variation within a clade, we repeated random sequence sampling five times, producing five replicate datasets that each included a total of 27 950 sites with 59.3% missing data. Each of the five replicate data sets was used for phylogenetic inference and time calibration with BEAST, on the basis of 147 CladeAge calibration densities. As for ML analyses, the data set was partitioned according to marker and codon position. Tree topology and branch lengths were linked among partitions, but parameters of the clock and sequence substitution models remained unlinked. We assumed an uncorrelated relaxed molecular clock (Drummond et al. 2006) and applied the reversible-jump based substitution model of Bouckaert et al. (2013). For each partition, a gamma distribution of among-site rate heterogeneity with four rate categories was assumed. We used the flexible birth-death skyline model (Stadler et al. 2012) with independent diversification rate parameters for the pre-Cretaceous Mesozoic (> 145.5 Ma), the Early (145.5-99.6 Ma) and Late Cretaceous (99.6-66.0 Ma), as well as the Cenozoic (< 66.0 Ma), and specified a sampling fraction *p* of 0.0135 according to the ratio of tips included in the analysis to the total extant diversity of Clupeocephala. We left the tree topologically unconstrained except for nodes used for time calibration. Justifications for the assumed monophyly of each clade used for time-calibration are given in Supplementary Text S2. For each data set replicate, 600 million MCMC states were sampled, and we repeated the analysis without data, sampling from the prior to ensure that conclusions were not pre-determined by the prior.

### Resulting timeline of cichlid and teleost divergences

Comparison of MCMC traces between run replicates suggested that all replicates had converged at the same posterior distribution. After discarding 60 million MCMC generation of each replicate run as burn-in, we produced a joint posterior tree sample with 1000 trees per replicate, and generated a Maximum Clade Credibility (MCC) tree from the combined distribution of 5000 posterior trees. The inferred timeline of cichlid and outgroup teleost divergences is summarized in Figure 4a and shown in more detail in Supplementary Figure S6.

The MCC tree topology was well supported, and corroborates the higher-level groupings found in recent large-scale Bayesian phylogenies of teleost fishes (Near et al. 2013; Betancur-R et al. 2013), as well as previously identified relationships within cichlid fishes (e.g. Schwarzer et al. 2009; López-Fernández et al. 2010; Meyer et al. 2015). With a single exception (Centropomidae), all unconstrained clades from our list of 455 clades were recovered as monophyletic. According to the MCC timeline, crown Clupeocephala originated around 207.8 Ma (95% HPD: 234.5-186.2 Ma), crown Acanthomorphata appeared 144.2 Ma (95% HPD: 158.4-130.6 Ma), and South American Cichlinae and African Pseudocrenilabrinae diverged about 81.6 Ma (95% HPD: 89.4-74.0 Ma). In comparison, the age of crown Clupeocephala appears markedly older in the studies of Betancur-R et al. (2013) and Near et al. (2013), who estimated their origin at about 251.1 (95% HPD: 276.1-226.1 Ma) and 273.7 Ma (95% HPD: 307.5-242.0 Ma), respectively. The divergence date of Acanthomorphata is more comparable between the three studies, and was estimated at 164.9 Ma (95% HPD 186.0-144.4 Ma) in Betancur-R et al. (2013), and around 142.5 Ma (95% HPD: 154.0-132.0 Ma) in Near et al. (2013), less than two million years younger than the estimate resulting from our time-calibrated phylogeny. For relatively younger divergences, however, our estimates appear older than those of Betancur-R et al. (2013) and Near et al. (2013): The divergence of South American Cichlinae and African Pseudocrenilabrinae was estimated at 62.0 Ma (95% HPD: 70.4-53.9 Ma) in Betancur-R et al. (2013) and as young as 26.0 Ma (95% HPD: 29.6-22.0 Ma) in Near et al. (2013). Notably even the older limit of the 95% HPD of the latter estimate is predated by at least 5 well-characterized fossil species within crown Pseudocrenilabrinae (Murray 2001b) and crown Cichlinae (Malabarba et al. 2010, 2006; Alano Perez et al. 2010; Malabarba and Malabarba 2008) and thus is in strong disagreement with the cichlid fossil record. Thus, the consistent application of CladeAge divergence date probability distributions to all clades with known fossil records appears to remove conflicts of comparatively younger node ages with the fossil record, while at the same time reducing the ghost lineages for relatively older clades. A more detailed comparison of clade age estimates between the three studies is shown in Supplementary Figure S7.

While our age estimates for cichlid divergences are generally older than those obtained in Betancur-R et al. (2013) and Near et al. (2013), they are still markedly too young to support strictly Gondwanan vicariance between Indian, Malagassy, Neotropical, and African groups of cichlid fishes, as well as within other groups of freshwater fishes included in our phylogeny (Fig. 4). Notably, the divergence of Neotropical Cichlinae and African Pseudocrenilabrinae, estimated at 81.6 Ma, appears to have occurred about 20 myr after the final separation of the American and African landmasses at 104-100 Ma Heine et al. (2013).

Comparison of these results with those obtained by MCMC sampling from the prior distribution shows that the divergence estimate for African and South American cichlids is driven by the molecular sequence data. The prior distribution is markedly older than the divergence date posterior for this split, with 65.6% of the prior samples being younger than 100 Ma, whereas the same is true for 99.93% of the posterior distribution (Fig. 4d). As a consequence, the Bayes factor in favour of a divergence younger than 100 Ma is 752, which can be considered overwhelming evidence (Kass and Raftery 1995) supporting the trans-Atlantic dispersal scenario, as opposed to Gondwanan vicariance.

## Discussion

### Divergence-time estimation with CladeAge calibration densities

Our analyses of datasets simulated under a wide range of conditions show that CladeAge calibration densities allow bias-free estimation of divergence times. The comparison of four different calibration schemes confirms that calibration scheme A (Fig. 2a) performs better than other schemes and should thus be applied whenever CladeAge calibration densities are used for time calibration. This implies that for each clade, the oldest fossil record of this clade should be used as a time constraint, regardless of whether the fossil is also the earliest record of other (nested or parental) clades, and even if the fossil is younger than the oldest fossil record of the clade’s sister group. With calibration scheme A, the CladeAge model produces very similar results to the FBD model with fixed sampling rates, but appears slightly more robust to model violation in the form of branch-rate autocorrelation, at least with larger numbers of fossil calibrations. When the sampling rate is not fixed in FBD analyses and only the oldest fossils of each clade are used for time calibration, the sampling rate is often substantially underestimated, leading to wide confidence intervals and overestimation of node ages. This suggests that when the sampling-rate parameter is not fixed in analyses with the FBD model, a rather complete representation of the fossil record should be included in the analysis instead of only the oldest fossil of each clade.

However, for the practical time calibration of higher-level phylogenies, the compilation of the entire fossil record of clades used for calibration may be far less feasible than the identification of their oldest reported fossils. A large amount of paleontological literature has been dedicated to determine oldest taxon appearances across the tree of life, and demonstrates the difficulties associated with the identification of these records as well as of their ages (Benton 1993; Benton and Donoghue 2007; Hedges and Kumar 2009; Ksepka et al. 2011; Benton et al. 2015). While the Paleobiological Database (www.paleobiodb.org) provides information not only about the oldest fossils of clades, but about a much larger number of fossils for many clades, the taxonomic assignment and age ranges given for these fossils are usually far less well curated than those of first taxon appearances that are dealt with in dedicated literature. In addition, neither the paleontological literature nor databases that use information from this literature are likely to provide an unbiased representation of the age distribution of fossils within a clade. Instead, new discoveries of fossils that extend the known age range of clades are almost guaranteed to be reported in the literature (and as a consequence also in databases), whereas younger findings may often not be considered worthy of publication. Thus, available information about the oldest records of clades is likely to be better curated and less biased than the collective data for all its fossils. Furthermore, since fossils are added as tips in analyses with the FBD model, the computational demand increases with the number of fosssils, and may be prohibitive for higher-level phylogenies of clades with rich fossil records.

On the other hand, the specification of CladeAge calibration densities is computationally not more demanding than any other calibration density used in node dating, and is thus suitable for large-scale phylogenetic analyses. In contrast to the current software implementations of the FBD model (Gavryushkina et al. 2014; Zhang et al. 2015), the CladeAge method can also account for different rates of diversification and sampling in different co-existing clades, as calibration densities are independently specified for each calibrated clade. This is likely to improve age estimates in higher-level phylogenies such as the vertebrate tree of life, where substantial differences in these rates have previously been demonstrated (Foote and Sepkoski 1999; Alfaro et al. 2009). Thus, a strategy for the Bayesian divergence time estimation of large trees like the vertebrate tree of life could include the following steps. First, representative groups with suitable fossil records could be chosen from several of the higher taxa (i.e. mammals, birds, teleost fishes) included in the phylogeny, and could be used to estimate sampling and diversification rate parameters for these taxa. This could be done either using information from the fossil record alone (Silvestro et al. 2014; Starrfelt and Liow 2015), or in combination with molecular data, e.g. by means of separate FBD analyses for each of the representative groups. Then, the resulting rate estimates could be used to calibrate the ages of clades within the higher taxa, under the assumption that the true rates of these clades are comparable to those estimated from representative groups. Finally, divergence time estimation of the complete phylogeny could be carried out with node dating, on the basis of CladeAge calibration densities.

### Trans-Atlantic dispersal of cichlid fishes

Using CladeAge calibration denstities for 147 clades of teleost fishes, we found strong evidence for trans-oceanic dispersal, not only in cichlid fishes, but also in several other groups of freshwater fishes, including Cyprinodontoidei, Aplocheiloidei, and Siluroidei (Fig. 4). The calibration densities used in our analysis were based on estimates of sampling and diversification rates by Foote and Sepkoski (1999) and Santini et al. (2009), and our results could thus be biased if these estimates are inaccurate. We note, however, that the rate estimates used by us are low compared to those of other clades (see Supplementary Table S1) or those obtained by other authors (Rabosky et al. 2013). Thus, the used estimates for sampling and diversification rates are more likely underestimates than overestimates, which would lead to calibration densities that are wider than they should be, and therefore to overestimated ages of clades in our phylogeny.

For several further reasons, we would expect our age estimates to be rather over-than underestimated. First, our simulations have shown that age estimates obtained with CladeAge calibration densities (or the FBD model) can appear too old when the sampling rate is low and the assumed clock model is violated, e.g. by branch-rate autocorrelation (Supplementary Figure S3b). In practice, autocorrelation of branch-specific substitution rates can rarely be excluded, and may be present also in teleost fishes, as many factors influencing rate variation are heritable in vertebrates (Nabholz et al. 2008; Amster and Sella 2016). Second, while our molecular dataset is composed of both nuclear and mitochondrial sequences, nuclear sequences were available to a greater degree for taxa outside of Cichlidae, and may be underrepresented for clades within this family. As the substitution rate of mitochondrial markers is usually higher than that of nuclear markers (Brown et al. 1979), overall genetic divergences between cichlids might appear higher than those of other groups that have a similar age but a lower proportion of missing data in nuclear markers. Third, by using concatenation of all sequence markers rather than the multispecies coalescent-model (which would have been computationally infeasible), we essentially ignored potential variation between gene trees due to incomplete lineage sorting, which has been shown to lead to inflated age estimates in several studies (McCormack et al. 2011; Colombo et al. 2015). Fourth, in contrast to the authors of a previous study on cichlid divergence times (Friedman et al. 2013), we assumed nested positions of the oldest Neotropical and African cichild fossils within the subfamilies Cichlinae and Pseudocrenilabrinae, respectively, rather than positions along their stem lineages. Specifically, we assumed a position within genus *Gymnogeophagus* for *Gymnogeophagus* † *eocenicus,* a position of † *Tremembichthys garciae* within Cichlasomatini, a position of † *Plesioheros chauliodus* within Heroini, a position of †*Proterocara argentina* within a clade formed by the extant genera *Teleocichla* and *Crenicichla,* and a position of † *Mahengechromis* spp. within the African tribe Hemichromini, which are all supported by morphological analyses (Murray 2000b, 2001b; Malabarba and Malabarba 2008; Smith et al. 2008; Alano Perez et al. 2010; Malabarba et al. 2010). If these nested positions should be unjustified (as suggested by Friedman et al. 2013), even younger ages of cichlid divergences would be expected. Taken together, our analyses strongly support trans-Atlantic dispersal of cichlids, between 89.4 and 74.0 Ma, or earlier.

## Conclusion

In this study, we have presented a new approach to Bayesian node dating that is directly based on probabilities of fossil sampling, and thus overcomes previous shortcomings of node dating. We have demonstrated that our approach allows accurate and precise divergence-time estimation and represents a viable alternative to the FBD model when estimates for the rates of fossil sampling and diversification are available *a priori.* Our approach is particularly suitable for the time calibration of large-scale phylogenies, and we have outlined strategies how to use our method in order to account for variable rates of sampling and diversification in different clades. By applying our approach to a detailed phylogeny of teleost fishes, we have shown that freshwater fishes in several clades have diverged long after the separation of the continents on which they live, which implies that fishes from these clades have successfully traversed oceanic environments despite their adaptations to a freshwater lifestyle. These examples include the trans-Atlantic dispersal of cichlid fishes, which led to their colonization of South and Central American rivers and lakes, and to the radiation of Neotropical cichlid fishes into over 600 extant species. We have implemented our approach in the CladeAge package for BEAST (Gavryushkina et al. 2014; Bouckaert et al. 2014), which is freely available at www.beast2.org.

## Funding

M.M. was funded by a postdoctoral fellowship from the Swiss National Science Foundation (PBBSP3-138680). W.S. was supported by the Swiss National Science Foundation (Sinergia Grant CRSII3_136293) and the European Research Council (CoG “CICHLID∽X”). R.B. was supported by a Rutherford Discovery Fellowship from the Royal Society of New Zealand awarded to Alexei Drummond.

## Acknowledgements

We would like to thank Alison Murray and Ken Monsch for advice on fossil calibrations, and Jaroslav Vohánka, Jérémy Winninger and Brigitte Aeschbach for help during labwork and 454 pyrosequencing. Alexei Drummond, Sebastian Höhna, Hallie Sims, Tanja Stadler, and Christian Heine provided valuable comments on the project and the manuscript. Discussions with Mike Foote and David Bapst stimulated the translation of genus-level to species-level sampling rates. We wish to acknowledge the contribution of the NeSI (www.nesi.org.nz) and Abel (www.hpc.uio.no) high-performance computing facilities to the results of this research.