Abstract
The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We investigate several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all methods. We show that methods based on the molecular clock tend to place the root in the B clade, while methods based on outgroup rooting tend to place the root in the A clade. The results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. We also show that none of the methods provide strong statistical support for the placement of the root in any particular edge of the tree. Our results suggest that inferences on the origin and early spread of SARS-CoV-2 based on rooted trees should be interpreted with caution.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
- We have more than doubled our sample size (from n=64 to n=132) to investigate whether other rootings are possible. We have included more samples from Wuhan and/or mainland China and more samples from later in the pandemic.
- We have performed a simulation study of the accuracy of the bootstrap in the current context.
- We have added a section that investigates temporal signal in the data using root-to-tip divergence for the two rootings (a rooting in clade A and a rooting in clade B). We have also added a formal test for temporal signal using BETS.
- We have made the two software programs used in this manuscript available for download.
- We have rewritten the text of the manuscript to provide more context on the significance of the problem we are addressing, including an additional main figure that illustrates the inconsistency between outgroup rooting and molecular clock rooting.