Abstract
Time-dependent birth-death sampling models have been used in numerous studies to infer past evolutionary dynamics in different areas, e.g. speciation and extinction rates in macroevolutionary studies, or the effective reproductive number in epidemiological studies. These models are branching processes in which lineages can bifurcate, die, or be sampled with time-dependent birth, death, and sampling rates, generating phylogenetic trees. It has been shown that in some subclasses of such models, different sets of rates can result in the same distribution of reconstructed phylogenetic trees, so the rates are unidentifiable from the trees regardless of their size. Here we show that widely used time-dependent fossilised birth-death (FBD) models are identifiable. This subclass of models makes more realistic assumptions about the fossilisation process and certain infectious disease transmission processes than the unidentifiable birth-death sampling models. Namely, FBD models assume that sampled lineages stay in the process rather than being immediately removed upon sampling. Identifiability of the time-dependent FBD model justifies using statistical methods that implement this model to infer the underlying temporal diversification or epidemiological dynamics from phylogenetic trees or directly from molecular or other comparative data. We further show that the time-dependent FBD model with an extra parameter, the removal-after-sampling probability, is unidentifiable. This implies that in scenarios where we do not know how sampling affects lineages, we are unable to infer this extra parameter together with the birth, death, and sampling rates solely from trees.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* sasha.gavryushkina{at}canterbury.ac.nz
Major updates to the manuscript are listed below. Minor revisions such as typo corrections are not explicitly stated. Added a paragraph to the discussion explaining, in less technical terms, why the FBD model is identifiable. Updated references to the work of Kubo & Iwasa (1995) and Louca et al. (2021). Re-ordered paragraphs in the introduction, methods and results sections. Included constraint strategies to address identifiability. Explained the normalised lineage-through-time (nLTT) curve. Split a figure of nLTT curves into two separate figures and added an additional figure exploring the effect of different removal probabilities on an nLTT curve. Re-worded the discussion of what identifiability provides for inferring temporal rates. Updated the funding section. Added a statement on the challenges of including fossils. Added references to Coddington & Levinson, and Hale, for the use of Gronwall's inequality for ODEs. Updated the supplementary file, including a correction to equation 13.