PT - JOURNAL ARTICLE
AU - Fabreti, Luiza Guimarães
AU - Höhna, Sebastian
TI - Bayesian inference of phylogeny is robust to substitution model over-parameterization
AID - 10.1101/2022.02.17.480861
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.02.17.480861
4099 - http://biorxiv.org/content/early/2022/02/19/2022.02.17.480861.short
4100 - http://biorxiv.org/content/early/2022/02/19/2022.02.17.480861.full
AB - Model selection aims to choose the most adequate model for the statistical analysis at hand. The model must be complex enough to capture the complexity of the data but simple enough not to overfit. In phylogenetics, the most common model selection scenario concerns selecting an appropriate substitution and partition model for sequence evolution to infer a phylogenetic tree. Here we explored the impact of substitution model over-parameterization in a Bayesian statistical framework. We performed simulations under the simplest substitution model, the Jukes-Cantor model, and compared posterior estimates of phylogenetic tree topologies and tree length under the true model to those under the most complex model, the GTR+Γ+I substitution model, including over-splitting the data into additional subsets (i.e., applying partitioned models). We explored four choices of prior distributions: the default substitution model priors of MrBayes, BEAST2, and RevBayes, and a newly devised prior choice (Tame). Our results show that Bayesian inference of phylogeny is robust to substitution model over-parameterization, but only under our new prior settings. All three default priors introduced biases in the estimated tree length. We conclude that substitution and partition model selection are superfluous steps in Bayesian phylogenetic inference pipelines if well-behaved prior distributions are applied. Competing Interest Statement: The authors have declared no competing interest.