PT - JOURNAL ARTICLE
AU - Wedell, Eleanor
AU - Shen, Chengze
AU - Warnow, Tandy
TI - BATCH-SCAMPP: Scaling phylogenetic placement methods to place many sequences
AID - 10.1101/2022.10.26.513936
DP - 2023 Jan 01
TA - bioRxiv
PG - 2022.10.26.513936
4099 - http://biorxiv.org/content/early/2023/06/18/2022.10.26.513936.short
4100 - http://biorxiv.org/content/early/2023/06/18/2022.10.26.513936.full
AB - Phylogenetic placement, the problem of placing sequences into phylogenetic trees, has been limited either by the number of sequences placed in a single run or by the size of the placement tree. EPA-ng, the most accurate placement method that scales well with the number of query sequences, has a runtime that grows sub-linearly with the number of query sequences. However, larger phylogenetic trees increase EPA-ng’s memory usage, limiting the method to placement trees of up to 10,000 sequences. Our recently designed SCAMPP framework has been shown to scale EPA-ng to larger placement trees of up to 200,000 sequences by building a subtree for the placement of each query sequence. However, SCAMPP does not take advantage of EPA-ng’s parallel efficiency, because it places only a single query sequence in each run of EPA-ng. Here we present BATCH-SCAMPP, a new technique that overcomes this barrier and enables EPA-ng and other phylogenetic placement methods to scale to ultra-large backbone trees and many query sequences. BATCH-SCAMPP is freely available at https://github.com/ewedell/BSCAMPP_code.
Competing Interest Statement: The authors have declared no competing interest.