Assessing Confidence in Root Placement on Phylogenies: An Empirical Study Using Non-Reversible Models

Using time-reversible Markov models is a very common practice in phylogenetic analysis, because although we expect many of their assumptions to be violated by empirical data, they provide high computational efficiency. However, these models lack the ability to infer the root placement of the estimated phylogeny. In order to compensate for the inability of these models to root the tree, many researchers use external information such as using outgroup taxa or additional assumptions such as molecular-clocks. In this study, we investigate the utility of non-reversible models to root empirical phylogenies and introduce a new bootstrap measure, the rootstrap, which provides information on the statistical support for any given root position. Availability and implementation A python script for calculating rootstrap support values is available at https://github.com/suhanaser/Rootstrap.


MAIN TEXT 23
The most widely used method for rooting trees in phylogenetics is the outgroup method. 24 Although the use of an outgroup to root an unrooted phylogeny usually outperforms other 25 rooting methods (Huelsenbeck, et al. 2002), the main challenge with this method is to find an 26 appropriate outgroup (Watrous and Wheeler 1981;Maddison, et al. 1984;Smith 1994;27 Swofford, et al. 1996;Milinkovitch and Lyons-Weiler 1998). 28 Outgroups that are too distantly-related to the ingroup may have substantially different 29 molecular evolution than the ingroup, which can compromise accuracy. And outgroups that are 30 too closely related to the ingroup may not be valid outgroups at all. 31 It is possible to infer the root of a tree without an outgroup using molecular clocks 32 (Huelsenbeck, et al. 2002;Drummond, et al. 2006). A strict molecular-clock assumes that the 33 substitution rate is constant along all lineages, a problematic assumption especially when the 34 ingroup taxa are distantly related such that their rates of molecular evolution may vary. 35 Relaxed molecular-clocks are more robust to deviations from the clock-like behaviour 36 (Drummond, et al. 2006), although previous studies have shown that they can perform poorly 37 in estimating the root of a phylogeny when those deviations are considerable (Tria, et al. 2017). 38 Other rooting methods rely on the distribution of branch lengths, including Midpoint 39 Rooting (MPR) (Farris 1972), Minimal Ancestor Deviation (MAD) (Tria, et al. 2017), and 40 Minimum Variance Rooting (MVR) (Mai, et al. 2017). Such methods also assume a clock-like 41 behaviour; however, they are less dependent on this assumption as the unrooted tree is 42 estimated without it. Similar to inferring a root directly from molecular-clock methods, the 43 accuracy of those rooting methods decreases with higher deviations from the molecular-clock 44 assumption (Mai, et al. 2017). 45 Other less common rooting methods that can be used in the absence of outgroup are: 46 rooting by gene duplication (Dayhoff and Schwartz 1980;Gogarten, et al. 1989;Iwabe, et al. 47 1989), indel-based rooting (Rivera and Lake 1992; Baldauf and Palmer 1993;Lake, et al. 48 2007), rooting species tree from the distribution of unrooted gene trees (Allman, et al. 2011;49 Yu, et al. 2011), and probabilistic co-estimation of gene trees and species tree (Boussau, et al. 50 2013). 51 All the methods mentioned above, apart from the molecular-clock, infer the root 52 position independently of the ML tree inference. The only existing approach to include root 53 placement in the ML inference is the application of non-reversible models. Using non-54 reversible substitution models relaxes the fundamental assumption of time-reversibility that 55 exists in the most widely used models in phylogenetic inference (Jukes and Cantor 1969;56 Kimura 1980;Hasegawa, et al. 1985;Tavaré 1986;Dayhoff 1987;Jones, et al. 1992;Tamura 57 and Nei 1993; Whelan and Goldman 2001; Le and Gascuel 2008). This in itself is a potentially 58 useful improvement in the fit between models of sequence evolution and empirical data. In 59 addition, since non-reversible models naturally incorporate a notion of time, the position of the 60 root on the tree is a parameter that is estimated as part of the ML tree inference. Since the 61 incorporation of non-reversible models in efficient ML tree inference software is relatively new 62 (Minh, et al. 2020), we still understand relatively little about the ability of non-reversible 63 models to infer the root of a phylogenetic tree, although a recent simulation study has shown 64 some encouraging results (Bettisworth and Stamatakis 2020). 65 Regardless of the rooting method and the underlying assumptions, it is crucial that we 66 are able to estimate the statistical confidence we have in any particular placement of the root 67 on a phylogeny. A number of previous studies have sensibly use ratio likelihood tests such as 68 the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999) and the Approximately 69 Unbiased (AU) test (Shimodaira 2002) to compare a small set of potential root placements, 70 the non-reversible model's parameters by maximizing the likelihood function seems more 96 computationally intensive than calculating posterior probabilities (Huelsenbeck, et al. 2002), 97 the IQ-TREE algorithm sufficiently fast to allow us to estimate root placements, with rootstrap 98 support for very large datasets. 99 MATERIAL AND METHODS 100 The "Rootstrap" Support, and measurements of error in root placement 101 To compute rootstrap supports, we conduct a bootstrap analysis, i.e., resampling alignment 102 sites with replacement, to obtain a number of bootstrap trees. We define the rootstrap support 103 for each branch in the ML tree, as the proportion of bootstrap trees that have the root on that 104 branch. Since the root can be on any branch in a rooted tree, the rootstrap support values are 105 computed for all the branches including external branches. The sum of the rootstrap support 106 values along the tree are always smaller than or equal to one. A sum that is smaller than one 107 can occur when one or more bootstrap replicates are rooted on a branch that does not occur in 108 the ML tree (Fig. 1). 109 The ML tree with the rootstrap support values for each branch. Note that the sum of the 112 rootstrap support values is less than 100% due to 100 bootstrap replicates trees (green) that 113 have their root at a branch that does not exist in the ML tree. 114 If the true position of the root is known (e.g. in simulation studies) or assumed (e.g. in 116 the empirical cases we present below), we can calculate additional measurements of the error 117 of the root placement. We introduce two such measurements here: root branchlength error 118 distance (rBED) and root split error distance (rSED). Since the non-reversible model infers 119 the exact position of the root on a branch, we define the root branchlength error distance 120 (rBED) as the range between the minimum and maximum distance between the inferred root 121 position and the "true root" branch. If the true root is on the same branch as the ML tree root, 122 then rBED will be between 0 and the distance between the ML tree root and the farthest point 123 on that branch (Fig. 2). Since rBED is based on branch lengths only, it ignores the absolute 124 number of splits between the ML tree root and the true root; and therefore, the rBED for the 125 true root being on the same ML root branch can be bigger than the rBED for the true root 126 being on a different branch (e.g. Fig. 2). In order to account for the number of splits (nodes) 127 between the ML tree root and the true root, we define root split error distance (rSED) as the 128 number of splits between the ML root branch and the branch that is believed to contain the 129 true root (Fig. 2). (2) Genome-scale MSAs to ensure that the MSAs have as much information as possible with 151 which to estimate the non-reversible models' free parameters and the root position. Since 152 we do not know the number of sites required to correctly infer the rooted ML tree, we 153 define 100,000 sites as the minimum number of required sites. This also allows us to 154 subsample the dataset to explore the ability of smaller datasets to infer root positions. 155 (3) Highly-curated alignments: since the quality of the inferred phylogeny is highly 156 dependent on the quality of the MSA (Philippe, et al. 2011), we focussed on datasets that 157 were highly-curated for misalignment, contamination, and paralogy. 158 (4) Existence of several clades for which there is a very strong consensus regarding their root 159 placement. Since we are interested in evaluating the performance of non-reversible 160 models to infer root placements in an empirical rather than a simulation context, we need 161 to identify monophyletic sub-clades for which we can be almost certain about their root 162 position. This enables us to divide the dataset into non-overlapping sub-clades for which 163 we are willing to assume we know the root positions. Furthermore, we define the 164 minimum number of taxa in each sub-dataset as five. 165 We initially identified a number of genome-scale datasets that contained large numbers of 166 nucleotide and amino acid MSAs. In many cases, it was difficult to determine whether these 167 alignments had been rigorously curated, and even more challenging to find datasets for which 168 the root position of a number of subclades could be assumed with confidence. The only dataset 169 that met all of our criteria was a dataset of placental mammals with 78 ingroup taxa and 170 3,050,199 amino acids (Wu, et al. 2019). This dataset was originally published as an MSA 171 (Liu, et al. 2017) based on very high-quality sequences from Ensembl, NCBI, and GenBank 172 databases. After receiving detailed critiques for potential alignment errors (Gatesy and Springer 173 2017), the dataset was further processed to remove potential sources of bias and error, and an 174 updated version of the dataset was recently published (Wu, et al. 2018). The fact that this 175 alignment comes from one of the most well-studied clades on the planet, has been 176 independently curated and critiqued by multiple groups of researchers and includes truly 177 genome-scale data, makes it ideally suited for our study. 178 179

Selecting Clades with a Well-Defined Root 180
Since our main objective in this study is to evaluate the effectiveness of non-reversible 181 models and the rootstrap value in estimating and measuring the support for a given root 182 placement on empirical datasets, we must identify a collection of sub-clades of the larger 183 mammal dataset for which it is reasonable to assume a root position. We acknowledge, of 184 course, that outside a simulation framework it is not possible to be certain of the position of the 185 root position of a clade. Nevertheless, it is possible to identify clades for which the position of 186 the root is well supported and non-controversial, thus minimising the chances that the 187 assumption of particular root position is incorrect. To achieve this, we analysed the root 188 position of each order and superorder in the dataset, and defined "well-defined clades" that 189 fulfilled all of the following criteria: 190 (1) It contains at least five taxa. This ensures that the probability of obtaining a random ML 191 rooted tree to be at most 0.95%. For clades with four taxa, there are 15 different rooted 192 topologies, and therefore a 6.7% probability to get any of these topologies by chance. On 193 the other hand, for clades with at least five taxa, there are at least 105 different rooted 194 topologies and maximum probability of 0.95% to randomly get one of them as the ML 195 tree. 196 (2) The bootstrap support for the deepest two levels of branches leading to that clade in the 197 phylogenetic tree calculated from the whole dataset is 100%: since the bootstrap value 198 indicates the support the data have for a certain branch, we require 100% support for the 199 deepest two levels of branches leading to a certain clade in the whole tree (Appendix Fig.  200 A.1). This requirement ensures that there is strong support in the dataset for the root 201 position of the clade when the entire dataset is rooted with an outgroup. 202 (3) The gene concordance factor (gCF) and the site concordance factor (sCF) for the deepest 203 two levels of branches leading to the clade are significantly greater than 33%. The site 204 Concordance Factor (sCF) is calculated by comparing the support of each site in the 205 alignment for the different arrangements of quartet around a certain branch. In other 206 words, an sCF of 33% means equal support for any of the possible arrangements. 207 Therefore, we require that the sCF of the deepest two levels of branches leading to that 208 clade to be significantly greater than 33%. The gene Concordance Factor (gCF) of a 209 branch is calculated as the proportion of gene trees contain that branch. Although there is 210 no threshold regarding the required proportion of genes concordant with a certain branch, 211 for convenience, we define branches with gCF significantly greater than 33% as 212 branches that are concordant with enough genes in the alignment (Minh, et al. 2020 For the whole nucleotide and amino-acid datasets with ingroup and outgroup taxa, we 222 inferred the unrooted phylogeny using IQ-TREE (Nguyen, et al. 2015) with the best-fit fully 223 partitioned model (Chernomor, et al. 2016) and edge-linked substitution rates (Duchene,et al. 224 2020). We then determined the best-fit reversible model for each partition using ModelFinder 225 (Kalyaanamoorthy, et al. 2017). See the algorithm for finding well-defined clades in 226 Appendix Algorithm A.1. 227

Estimating Rooted phylogenies 228
For each well-defined clade, we first removed all other taxa from the tree and then 229 sought to infer the root of the well-defined clade using non-reversible models without 230 outgroups. Using the best partitioning scheme from the reversible analysis, we inferred the rooted tree for each well-defined clade with the non-reversible models for amino acid (NR-232 AA) and nucleotide (NR-DNA) sequences (Minh, et al. 2020). This approach fits a 12-233 parameter non-reversible model for DNA sequences, and a 380-parameter non-reversible 234 model for amino acids. Details of the command lines used are provided in the supplementary 235 material section "Algorithm A.2". Each analysis returns a rooted tree. We performed 1000 236 non-parametric bootstraps of every analysis to measure the rootstrap support. 237 To assess the performance of the rootstrap and the ability of non-reversible models to 238 estimate the root of the trees on smaller datasets, we also repeated every analysis on 239 subsamples of the complete dataset. For each well-defined clade, we performed analysis on 240 the complete dataset (100%) as well as datasets with 10%, 1% and 0.1% of randomly-241 selected loci from the original alignment. 242 The confidence set of root branches using the Approximately Unbiased test 243 In addition to the rootstrap support, we calculate the confidence set of all the branches 244 that may contain the root of the ML tree using the Approximately Unbiased (AU) test 245 (Shimodaira 2002). To do this, we re-root the ML tree with all possible placements of the root 246 (one placement for each branch) and calculate the likelihood of each tree. Using the AU test, 247 we then ask which root placements can be rejected in favour of the ML root, using an alpha 248 value of 5%. We define the root branches confidence set as the set of root branches that are not 249 rejected in favour of the ML root placement. 250

Reducing systematic bias by removing third codon positions and loci that fail the MaxSym 251
test 252 As it is common in many phylogenetic analyses to remove third codon positions from 253 the alignment (Swofford, et al. 1996), we wanted to assess the effect of removing third codon 254 positions on the root inference and the rootstrap values in nucleotide datasets. For that 255 purpose, we remove all the third codon positions from the nucleotide alignments and re-ran 256 the analysis using the NR-DNA model. 257 Moreover, although the NR-AA and NR-DNA models relax the reversibility assumption, 258 they still assume stationarity and homogeneity. To reduce the systematic bias produced by 259 violating these assumptions, we used the MaxSym test (Naser-Khdour, et al. 2019) to remove 260 loci that violate those assumptions in the nucleotide and amino acid datasets, and then re-ran 261 all analyses as above. 262 Applying the methods to two clades whose root position is uncertain 263 In addition to the well-defined clades, we used the methods we propose here to infer 264 the root of two clades of mammals whose root position is controversial; Chiroptera and the 265

Cetartiodactyla. 266
There is a controversy around the root of the Chiroptera (bats) in literature.

Inference of the mammal tree and selection of well-defined clades 289
The trees inferred from the whole datasets with the nucleotide-reversible model and 290 the amino-acid-reversible model (Appendix Fig. A.2, Appendix Fig. A.3, Appendix Table  291 A.2) are consistent with the published tree (Liu, et al. 2017). Five clades met all the criteria of 292 well-defined clades, namely, Afrotheria, Bovidae, Carnivora, Myomorpha, and Primates in 293 both amino acid and nucleotide datasets (see Appendix Table A.1 and Appendix Table A.2). 294

High accuracy of the AA non-reversible model in inferring the root 295
Using NR-AA, we inferred the correct root with very high rootstrap support for all 296 five well-defined clades (Appendix Table A.3). Moreover, for all the five clades, the true root 297 was the only root placement in the confidence set of the AU test. 298 Our results show that using only 10% of the sites in the amino acid alignments gave 299 very high rootstrap support values (> 98%) for most of these clades (Fig. 3). Moreover, in 300 some datasets, 1% of the sites were enough to give a very high rootstrap support value. Yet, 301 using only 0.1% decreased the rootstrap support value noticeably in all datasets (Appendix 302 Table A.3). These values are shown for each dataset in Figure 3, where the X-axis is plotted 303 in terms of parsimony-informative sites to allow for a more direct comparison between 304 datasets. Although the rootstrap support for the true root improves as the number of 305 parsimony-informative sites increase, in some datasets (e.g. Afrotheria nucleotide dataset) 306 this is not the case (Fig. 3). Poor performance of the DNA non-reversible model in inferring the root 313 We correctly inferred the root for four out of the five nucleotide datasets with the NR-314 DNA model. However, the rootstrap support was generally lower than in the amino-acid 315 datasets (Fig. 3, Appendix Tables A.3 and A.4). In addition, our results show that removing 316 the third codon positions does not improve the rootstrap support value. In contrast, in some 317 datasets removing third codon positions decreased the rootstrap support value and increased 318 the rSED (Table 1). 319 Using the whole amino acid dataset, our results show 65.5% rootstrap support for the 333 Yinpterochiroptera-Yangochiroptera hypothesis and 23.2% for the Microchiroptera -334 Megachiroptera hypothesis. The remaining11.3% of the rootstrap support goes to supporting 335 the branch leading to Rhinolophoidea as root branch of the bats (Fig. 4). Removing amino 336 acid loci that fail the MaxSym test (110 loci) gives similar results, with 65.9% rootstrap 337 support for the Yinptero-Yango hypothesis and 25.6% rootstrap support for the Micro-Mega 338 hypothesis. In both cases, the AU test could not reject any of the three root positions that 339 received non-zero rootstrap support (Appendix Table A.5). 340 Yinptero-Yango hypothesis using the AU test (Appendix Fig. A.4). Yet, removing nucleotide 347 loci that fail the MaxSym test (~25% of the loci) decreases the support for the Yinptero-348 Yango hypothesis to 90.1%, although we can still confidently reject the Micro-Mega 349 hypothesis using the AU test (Appendix Table A.5). 350 Interestingly, when we randomly subsample 10%, 1%, and 0.1% of the loci in the 351 nucleotide dataset, we consistently get the Yinptero-Yango hypothesis as the ML tree and the 352 solely rooted topology in the AU confidence set (Appendix Table A.5). Moreover, the 353 rootstrap support value for the Yinptero-Yango hypothesis increases and the rootstrap support 354 value for the Micro-Mega hypothesis decreases as more parsimony-informative sites are 355 added to the alignment, for both nucleotide and amino acid datasets (Fig. 5, Appendix Table  356 A.5). 357

FIGURE 5.
Rootstrap support value as a function of the number of parsimony-informative 359 characters in the Chiroptera nucleotide and amino acid datasets. 360 The ΔGLS and ΔSLS values (Shen, et al. 2017) reveal that approximately half of the 361 nucleotide and amino acid loci prefer the Yinptero-Yango hypothesis while the other half 362 prefers Micro-Mega hypothesis. Furthermore, slightly less than half of the nucleotide sites 363 prefer the Yinptero-Yango hypothesis. However, more than two-thirds of the amino acid sites 364 prefer the Yinptero-Yango hypothesis (Appendix Fig. A.5). The distributions of ΔGLS and 365 ΔSLS (Appendix Fig. A.6) show that a small proportion of the amino acid loci (~1%) have 366 very strong support for the Micro-Mega hypothesis, and removing those loci from the 367 alignment increased the rootstrap support for the Yinptero-Yango hypothesis to 76.6%. 368 Nonetheless, both root placements are still in the confidence set of the AU test (Appendix 369 Table A.5) with the amino acid dataset. On the other hand, removing nucleotide loci with the 370 highest absolute ΔGLS value still gives the Yinptero-Yango hypothesis as the ML tree and 371 the sole topology in the AU confidence set. We conclude that while the nucleotide data show 372 a clear preference to the Yinptero-Yango hypothesis, the amino acid data do not allow us to 373 distinguish between the two leading hypotheses for the placement of the root of the 374 Chiroptera based on rooting with non-reversible models. 375 The ambiguous root of Cetartiodactyla 376 The ML tree inferred with the whole amino acid dataset places the clade containing 377 Tylopoda (represented by its only extant family; Camelidae) and Suina as the most basal 378 cetartiodactylan clade with 71.8% rootstrap support (Fig. 6) The results from the subsample datasets are mixed (Fig. 7). Analyses on smaller datasets 397 show no clear pattern in the placement of the root (Appendix Table A In this paper, we introduced a new measure of support for the placement of the root in 423 a phylogenetic tree, the rootstrap support value, and applied it to empirical amino acid and 424 nucleotide datasets inferred using non-reversible models implemented in IQ-TREE (Minh, et 425 al. 2020). The rootstrap is a useful measure because it can be used to assess the statistical 426 support for the placement of the root in any rooted tree, regardless of the rooting method. In a 427 Maximum Likelihood setting, interpretation of the rootstrap support is similar to the 428 interpretation of the classic nonparametric bootstrap. In a Bayesian setting, the same 429 procedure could be used to calculate the posterior probability of the root placement given a 430 posterior distribution of trees. It is noteworthy that the rootstrap support value is not a 431 measure of the accuracy of the root placement and therefore should not be interpreted as 432 such. However, it provides information about the robustness of the root inference with regard 433 to resampling the data. This interpretation is consistent with the interpretation of the 434 nonparametric bootstrap (Holmes 2003) but with regard to the root placement instead of the 435 whole tree topology. 436 In addition to the rootstrap support value, we introduced another two metrics; the root 437 branch-length error distance (rBED), and the root split error distance rSED. Similar to the 438 rootstrap metric, these additional metrics can be used in with any approach that generates 439 rooted phylogenetic trees. We note that both metrics require the true position of the root to be 440 known (or assumed) and that the rBED requires the rooting method to be able to accurately 441 place the root in a specific position of the root branch. 442 In this study, we used these and other methods to assess the utility of non-reversible 443 models to root phylogenetic trees in a Maximum Likelihood framework. We focussed on 444 applying these methods to a large and very well curated phylogenomic dataset of mammals, 445 as the mammal phylogeny provides perhaps the best opportunity to find clades for which the 446 root position is known with some confidence. As expected, our results show an exponential 447 increase in the rootstrap support for the true root as we add more information to the MSA. 448 Our results suggest that non-reversible amino-acid models are more useful for inferring root 449 positions than non-reversible DNA models, which is consistent with results from previous 450 simulations using the NR-DNA model (Bettisworth and Stamatakis 2020). One explanation 451 for this difference between the NR-DNA and the NR-AA models is the bigger character-state 452 space of the NR-AA models. These models have 400 parameters (380 rate parameters and 20 453 amino acid frequencies) whereas NR-DNA models have only 16 parameters (12 rate 454 parameters and 4 nucleotide frequencies). This could allow the NR-AA model to capture the 455 evolutionary process better than the NR-DNA model, potentially providing more information 456 on the root position of the phylogeny. This hypothesis requires some further exploration 457 though, and we note that the actual character-space of amino acids is much smaller than 458 Our results demonstrate that both non-reversible models can be surprisingly useful for 482 inferring the root placement of phylogenies in the absence of additional information (such as 483 outgroups) or assumptions (such as molecular clocks). Indeed, we show that root placements 484 appear to be accurate even with fairly datasets as small as 50 well-curated loci between fairly 485 closely-related taxa such as orders of mammals. We hope that the combination of non-486 reversible models and rootstrap support will add another tool to the phylogeneticist's arsenal 487 when it comes to inferring rooted phylogenies. 488