bioRxiv
New Results

Benchmarking topological accuracy of bacterial phylogenomic workflows using in silico evolution

Boas C.L. van der Putten, Niek A.H. Huijsmans, Daniel R. Mende, Constance Schultsz
doi: https://doi.org/10.1101/2021.08.03.454900
Boas C.L. van der Putten
1Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
2Department of Global Health, Amsterdam Institute for Global Health and Development, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
  • For correspondence: boas.vanderputten@amsterdamumc.nl
Niek A.H. Huijsmans
1Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
Daniel R. Mende
1Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
Constance Schultsz
1Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
2Department of Global Health, Amsterdam Institute for Global Health and Development, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands

Abstract

Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking.

To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to the genetic diversity usually observed within bacterial species (≥95% average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked seventeen distinct phylogenetic workflows using eight different simulated datasets.
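The ≥95% threshold above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes the genomes are already aligned to equal length and uses plain pairwise identity over ungapped columns as a simplified stand-in for average nucleotide identity, which is normally estimated from fragment-level alignments. The function names `pairwise_identity` and `within_species_range` are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical bases over columns where neither sequence has a gap or N."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gapped or ambiguous columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


def within_species_range(aligned_seqs, threshold: float = 0.95) -> bool:
    """True if every pair of sequences meets the identity threshold (assumed: 0.95)."""
    return all(
        pairwise_identity(a, b) >= threshold
        for a, b in combinations(aligned_seqs, 2)
    )
```

A check of this kind could be run on each simulated dataset to confirm that it stays within species-level diversity before any workflow is benchmarked on it.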

We found that recently developed k-mer alignment methods such as kSNP and SKA achieve accuracy similar to that of reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or SKESA outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates.

We conclude that, for phylogenomic reconstruction at species level, k-mer alignment methods are relevant alternatives to reference mapping, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.

Impact statement Phylogenetic analyses are crucial for understanding the evolution and spread of microbes. Among their many applications is the reconstruction of transmission events, which can provide information on the progression of pathogen outbreaks, for example in investigations of foodborne outbreaks such as the 2011 outbreak of Escherichia coli O104:H4 across Europe. As different microbes evolve differently, it is important to know which phylogenetic workflows are most accurate when working with diverse bacterial data. However, benchmarks usually consider only a limited dataset. We therefore employed a range of simulated evolutionary scenarios and benchmarked seventeen phylogenetic workflows on these simulated datasets. An advantage of our simulation approach is that we know a priori what the outcome of the analyses should be, allowing us to benchmark accuracy. We found significant differences between phylogenetic workflows and were able to dissect which factors contribute to phylogenetic analysis accuracy. Taken together, this new information will hopefully enable more accurate phylogenetic analysis of bacterial outbreaks.
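Because the true tree is known in a simulation, topological accuracy can be scored by comparing each inferred tree against it. The abstract does not name the metric used, so the sketch below assumes one common choice, the Robinson-Foulds distance, which counts bipartitions present in one unrooted tree but not the other. All function names are hypothetical, and the small Newick reader handles only well-formed trees with unique leaf names, ignoring branch lengths and internal labels.

```python
def parse_newick(newick: str):
    """Parse a Newick string into nested lists of leaf names."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].split(":")[0]  # drop any branch length

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1  # consume ")"
            read_label()  # discard internal label / branch length
            return children
        return read_label()

    return read_node()


def splits(tree):
    """Return (non-trivial bipartitions, leaf set); each bipartition is stored
    as the side that does not contain a fixed reference leaf."""
    all_leaves = set()
    clades = []

    def collect(node):
        if isinstance(node, str):
            all_leaves.add(node)
            return frozenset([node])
        leaves = frozenset().union(*(collect(child) for child in node))
        clades.append(leaves)
        return leaves

    collect(tree)
    reference = min(all_leaves)
    result = set()
    for clade in clades:
        side = clade if reference not in clade else frozenset(all_leaves) - clade
        if 1 < len(side) < len(all_leaves) - 1:  # both sides need >= 2 leaves
            result.add(side)
    return result, frozenset(all_leaves)


def robinson_foulds(true_newick: str, inferred_newick: str):
    """Return (RF distance, RF distance normalised by 2 * (n - 3))."""
    true_splits, true_leaves = splits(parse_newick(true_newick))
    inferred_splits, inferred_leaves = splits(parse_newick(inferred_newick))
    if true_leaves != inferred_leaves:
        raise ValueError("trees must share the same leaf set")
    distance = len(true_splits ^ inferred_splits)
    max_distance = 2 * (len(true_leaves) - 3)
    return distance, (distance / max_distance if max_distance else 0.0)
```

A normalised distance of 0 means the inferred topology matches the simulated one exactly, and 1 means the two fully resolved trees share no non-trivial bipartition. Comparing this value across workflows and replicates is one way the variability between replicates could be made visible.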

Data summary A Zenodo repository is available at https://doi.org/10.5281/zenodo.5036179 containing all simulated genomes, all alignments produced by phylogenetic workflows and .csv files summarising the topological accuracies of phylogenies produced based on these alignments. Code is available at https://github.com/niekh-13/phylogenetic_workflows.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://doi.org/10.5281/zenodo.5036179

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted August 04, 2021.

Subject Area

  • Genomics