Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

Large-scale comparative genomics of Salmonella enterica to refine the organization of the global Salmonella population structure

View ORCID ProfileChao Chun Liu, View ORCID ProfileWilliam W.L. Hsiao
doi: https://doi.org/10.1101/2021.09.30.462489
Chao Chun Liu
1Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Chao Chun Liu
William W.L. Hsiao
1Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
2Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for William W.L. Hsiao
  • For correspondence: wwhsiao@sfu.ca
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Preview PDF
Loading

1. Abstract

Since the introduction of the White-Kauffmann-Le Minor (WKL) scheme for Salmonella serotyping, the nomenclature remains the most widely used for reporting the disease prevalence of Salmonella enterica across the globe. With the advent of whole genome sequencing (WGS), traditional serotyping has been increasingly replaced by in-silico methods that couple the detection of genetic variations in antigenic determinants with sequence-based typing. However, despite the integration of genomic-based typing by in-silico serotyping tools such as SeqSero2 and SISTR, in-silico serotyping in certain contexts remains ambiguous and insufficiently informative due to polyphyletic serovars. Furthermore, in spite of the widespread acknowledgement of polyphyly from genomic studies, the serotyping nomenclature remains unaltered. To prompt refinements to the Salmonella typing nomenclature for disease reporting, we herein performed a systematic characterization of putative polyphyletic serovars and the global Salmonella population structure by comparing 180,098 Salmonella genomes (representing 723 predicted serovars) from GenomeTrakr and PubMLST databases. We identified a range of core genome MLST typing thresholds that result in stable population structure, potentially suitable as the foundation of a genomic-based typing nomenclature for longitudinal surveillance. From the genomic comparisons of hundreds of predicted serovars, we demonstrated that in-silico serotyping classifications do not consistently reflect the population divergence observed at the genomic level. The organization of Salmonella subpopulations based on antigenic determinants can be confounded by homologous recombination and niche adaptation, resulting in shared classification of highly divergent genomes and misleading distinction between highly similar genomes. In consideration of the pivotal role of Salmonella serotyping, a compendium of putative polyphyletic serovars was compiled and made publicly available to provide additional context for future interpretations of in-silico serotyping results in disease surveillance settings. To refine the typing nomenclatures used in Salmonella surveillance reports, we foresee an improved typing scheme to be a hybrid that integrates both genomic and antigenic information such that the resolution from WGS is leveraged to improve the precision of subpopulation classifications while preserving the common names defined by the WKL scheme. Lastly, we stress the importance of controlled vocabulary integration for typing information in open data settings in order for the global Salmonella population dynamics to be fully trackable.

Impact Statement Salmonella enterica (S. enterica) is a major foodborne pathogen responsible for an annual incidence rate of more than 90 million cases of foodborne illnesses worldwide. To surveil the high order Salmonella lineages, compare disease prevalence across jurisdictions worldwide, and inform risk assessments, in-silico serotyping has been established as the gold standard for typing the bacteria. However, despite previous Salmonella genomic studies reporting discordance between phylogenomic clades and serovars, refinements have yet been made to the serotyping scheme. Here, we analyzed over 180,000 Salmonella genomes representing 723 predicted serovars to subdivide the population into evolutionarily stable clusters in order to propose a stable organization of the Salmonella population structure that can form the basis of a genomic-based typing scheme for the pathogen. We described numerous instances in which genomes between serotypes are more similar than genomes within a serotype to reflect the inconsistencies of subpopulation classifications based on antigenic determinants. Moreover, we found inconsistencies between predicted serovars and reported serovars which highlighted potential errors in existing in-silico serotyping tools and the need to implement controlled vocabularies for reporting Salmonella subtypes in public databases. The findings of our study aim to motivate the future development of a standardized genomic-based typing nomenclature that more accurately captures the natural populations of S. enterica.

Data Summary The assembly accession numbers of the genomes analyzed in this study (n = 204,952) and the associated metadata (e.g. sampling location, collection date, FTP address for retrieval) are documented in Table S1. The GenomeTrakr genomes were retrieved from the National Center for Biological Information GenBank database. The PubMLST genomes were retrieved using the BIGSdb API.

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Back to top
PreviousNext
Posted September 30, 2021.
Download PDF

Supplementary Material

Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Large-scale comparative genomics of Salmonella enterica to refine the organization of the global Salmonella population structure
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Large-scale comparative genomics of Salmonella enterica to refine the organization of the global Salmonella population structure
Chao Chun Liu, William W.L. Hsiao
bioRxiv 2021.09.30.462489; doi: https://doi.org/10.1101/2021.09.30.462489
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
Large-scale comparative genomics of Salmonella enterica to refine the organization of the global Salmonella population structure
Chao Chun Liu, William W.L. Hsiao
bioRxiv 2021.09.30.462489; doi: https://doi.org/10.1101/2021.09.30.462489

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Genomics
Subject Areas
All Articles
  • Animal Behavior and Cognition (3602)
  • Biochemistry (7567)
  • Bioengineering (5522)
  • Bioinformatics (20782)
  • Biophysics (10325)
  • Cancer Biology (7978)
  • Cell Biology (11635)
  • Clinical Trials (138)
  • Developmental Biology (6602)
  • Ecology (10200)
  • Epidemiology (2065)
  • Evolutionary Biology (13611)
  • Genetics (9539)
  • Genomics (12844)
  • Immunology (7919)
  • Microbiology (19538)
  • Molecular Biology (7657)
  • Neuroscience (42081)
  • Paleontology (308)
  • Pathology (1257)
  • Pharmacology and Toxicology (2201)
  • Physiology (3267)
  • Plant Biology (7038)
  • Scientific Communication and Education (1294)
  • Synthetic Biology (1951)
  • Systems Biology (5426)
  • Zoology (1116)