Abstract
Genes evolve under processes such as gene duplication and loss (GDL), so that gene family trees are multi-copy, as well as incomplete lineage sorting (ILS); both processes produce gene trees that differ from the species tree. The estimation of species trees from sets of gene family trees is challenging, and the estimation of rooted species trees presents additional analytical challenges. Two of the methods developed for this problem are STRIDE (Emms and Kelly, MBE 2017), which roots species trees by considering GDL events, and Quintet Rooting (Tabatabaee et al., ISMB 2022 and Bioinformatics 2022), which roots species trees by considering ILS. We present DISCO+QR, a new method for rooting species trees in the presence of both GDL and ILS. DISCO+QR, operates by taking the input gene family trees and decomposing them into single-copy trees using DISCO (Willson et al., Systematic Biology 2022) and then roots the given species tree using the information in the single-copy gene trees using Quintet Rooting (QR). We show that the relative accuracy of STRIDE and DISCO+QR depend on properties of the dataset (number of species, genes, rate of gene duplication, degree of ILS, and gene tree estimation error), and that each provides advantages over the other under some conditions. Availability: DISCO and QR are available in GitHub. The supplementary materials are available at http://tandy.cs.illinois.edu/discoqr-suppl.pdf.
Competing Interest Statement
The authors have declared no competing interest.