bioRxiv
New Results

BATCH-SCAMPP: Scaling phylogenetic placement methods to place many sequences

Eleanor Wedell, Chengze Shen, Tandy Warnow
doi: https://doi.org/10.1101/2022.10.26.513936
Eleanor Wedell
1 University of Illinois Urbana-Champaign
Chengze Shen
2 University of Illinois, Urbana-Champaign
Tandy Warnow
3 University of Illinois at Urbana-Champaign
  • For correspondence: warnow@illinois.edu

Abstract

Phylogenetic placement, the problem of placing sequences into phylogenetic trees, has been limited either by the number of sequences placed in a single run or by the size of the placement tree. The most accurate scalable phylogenetic placement method with respect to the number of query sequences placed, EPA-ng, has a runtime that scales sub-linearly with the number of query sequences. However, larger phylogenetic trees increase EPA-ng’s memory usage, limiting the method to placement trees of up to 10,000 sequences. Our recently designed SCAMPP framework has been shown to scale EPA-ng to larger placement trees of up to 200,000 sequences by building a subtree for the placement of each query sequence. However, SCAMPP does not take advantage of EPA-ng’s parallel efficiency, since it places only a single query in each run of EPA-ng. Here we present BATCH-SCAMPP, a new technique that overcomes this barrier and enables EPA-ng and other phylogenetic placement methods to scale to ultra-large backbone trees and many query sequences. BATCH-SCAMPP is freely available at https://github.com/ewedell/BSCAMPP_code.
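
The difference between one placement run per query and one run per batch of queries can be illustrated with a small sketch. This is not the BATCH-SCAMPP algorithm, which the abstract does not specify. It assumes aligned sequences of equal length and uses Hamming distance to the nearest backbone sequence as a hypothetical grouping rule; the names `hamming`, `nearest_backbone`, and `batch_queries` are illustrative only, and the actual placement call (for example, a run of EPA-ng on a subtree) is replaced by a count of how many runs would be needed.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatched columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_backbone(query: str, backbone: dict[str, str]) -> str:
    """Name of the closest backbone sequence (ties go to the first name alphabetically)."""
    return min(sorted(backbone), key=lambda name: hamming(query, backbone[name]))


def batch_queries(queries: dict[str, str], backbone: dict[str, str]) -> dict[str, list[str]]:
    """Group query names by their nearest backbone sequence (hypothetical batching rule)."""
    batches: dict[str, list[str]] = defaultdict(list)
    for qname in sorted(queries):
        batches[nearest_backbone(queries[qname], backbone)].append(qname)
    return dict(batches)


# Toy data, invented for illustration only.
backbone = {"A": "ACGTACGT", "B": "ACGTTTTT", "C": "GGGGACGT"}
queries = {"q1": "ACGTACGA", "q2": "ACGTTTTA", "q3": "ACGTACCT", "q4": "GGGGACGA"}

batches = batch_queries(queries, backbone)
per_query_runs = len(queries)  # one placement run per query
batched_runs = len(batches)    # one placement run per group of queries

print(batches)
print(f"per-query runs: {per_query_runs}, batched runs: {batched_runs}")
```

In this toy case, queries `q1` and `q3` share a nearest backbone sequence and would be handled in a single run, so fewer placement runs are needed than queries. Any real grouping rule would also need to bound the size of each subtree to keep memory usage manageable.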

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • We have added experiments comparing two alignment-free methods for phylogenetic placement. We have also improved the writing for the sake of clarity.

  • https://github.com/ewedell/BSCAMPP_code

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted June 18, 2023.
Subject Area

  • Bioinformatics