RT Journal Article
SR Electronic
T1 SCAMPP+FastTree: Improving Scalability for Likelihood-based Phylogenetic Placement
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.05.23.493012
DO 10.1101/2022.05.23.493012
A1 Gillian Chu
A1 Tandy Warnow
YR 2022
UL http://biorxiv.org/content/early/2022/08/08/2022.05.23.493012.abstract
AB Phylogenetic placement is the problem of placing “query” sequences into an existing tree (called a “backbone tree”), and is useful both in microbiome analysis and for updating large evolutionary trees. The most accurate phylogenetic placement method to date is the maximum likelihood-based method pplacer, which uses RAxML to estimate numeric parameters on the backbone tree and then finds the edge on which to add the query sequence so as to maximize the probability that the resultant model tree generates the query sequence. Unfortunately, pplacer fails to return valid outputs on many moderately large datasets, and so is limited to backbone trees with roughly 10,000 leaves at most. We present a technique to enable pplacer to scale to large backbone trees. We draw on two prior approaches: the divide-and-conquer strategy in SCAMPP (Wedell et al., TCBB 2022) and the use of FastTree2 (Price et al., PLOS One 2010) instead of RAxML to estimate numeric parameters. We find that pplacer-SCAMPP-FastTree matches or improves on the accuracy of other placement methods, can scale to large backbone trees with 200,000 sequences, is fast, and has only moderate memory usage. In addition, pplacer-SCAMPP-FastTree enables the user to vary the placement tree size, thus enabling an exploration of the runtime-accuracy trade-off. Our software for pplacer-SCAMPP-FastTree is available at https://github.com/gillichu/PLUSplacer-taxtastic. Competing Interest Statement: The authors have declared no competing interest.