New Results

Cluster-specific gene markers enhance Shigella and Enteroinvasive Escherichia coli in silico serotyping

Xiaomei Zhang, Michael Payne, Thanh Nguyen, Sandeep Kaur, Ruiting Lan
doi: https://doi.org/10.1101/2021.01.30.428723
1School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia (all authors)
For correspondence: r.lan@unsw.edu.au (Ruiting Lan)

Abstract

Shigella and enteroinvasive Escherichia coli (EIEC) cause human bacillary dysentery through similar invasion mechanisms and share similar physiological, biochemical and genetic characteristics. Differentiating Shigella from EIEC is important for clinical diagnosis and epidemiological investigations, but existing genetic signatures may not discriminate between them. Phylogenetically, however, Shigella and EIEC strains fall into multiple clusters and are different forms of E. coli. In this study, we identified 10 Shigella clusters, 7 EIEC clusters and 53 sporadic types of EIEC by examining over 17,000 publicly available Shigella/EIEC genomes. We compared Shigella and EIEC accessory genomes to identify cluster-specific gene markers or marker sets for the 17 clusters and 53 sporadic types. The gene markers showed 99.63% accuracy and more than 97.02% specificity. In addition, we developed a freely available in silico serotyping pipeline, Shigella EIEC Cluster Enhanced Serotype Finder (ShigEiFinder), which combines the cluster-specific gene markers with established Shigella/EIEC serotype-specific O antigen genes and modification genes for typing. ShigEiFinder can process either paired-end Illumina sequencing reads or assembled genomes, and it almost perfectly differentiated Shigella from EIEC, with cluster assignment accuracy of 99.70% for assembled genomes and 99.81% for mapped reads. ShigEiFinder was able to serotype over 59 Shigella serotypes and 22 EIEC serotypes, with serotyping specificity of 99.40% for assembled genomes and 99.38% for mapped reads. The cluster markers and our new serotyping tool, ShigEiFinder (https://github.com/LanLab/ShigEiFinder), will be useful for epidemiological and diagnostic investigations.

Impact statement

The differentiation of Shigella strains from enteroinvasive E. coli (EIEC) is important for clinical diagnosis and public health epidemiological investigations. The similarities between Shigella and EIEC strains make this differentiation very difficult, because both share common ancestry within E. coli. However, Shigella and EIEC are phylogenetically separated into multiple clusters, which makes high-resolution separation using cluster-specific genomic markers possible. In this study, we identified 17 Shigella or EIEC clusters, including five that were newly identified, through examination of over 17,000 publicly available Shigella and EIEC genomes. We further identified an individual cluster-specific gene marker or a set of such markers for each cluster using comparative genomic analysis. These markers can be used to classify isolates into clusters, and we used them to develop an in silico pipeline, ShigEiFinder (https://github.com/LanLab/ShigEiFinder), for accurate differentiation, cluster typing and serotyping of Shigella and EIEC from Illumina sequencing reads or assembled genomes. This study will have broad applications, from understanding the evolution of Shigella/EIEC to diagnosis and epidemiology.
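As a rough illustration of how a set of cluster-specific gene markers could classify an isolate, the Python sketch below assigns a cluster only when every gene in that cluster's marker set has been detected. It is a minimal sketch under stated assumptions, not the ShigEiFinder implementation: the cluster names, the gene names and the rule that all markers in a set must be present are hypothetical placeholders, and detecting the genes in reads or assemblies is assumed to have happened upstream.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical marker sets (cluster name -> genes that must all be present).
# These are placeholders for illustration, not the markers reported in the study.
MARKER_SETS: Dict[str, FrozenSet[str]] = {
    "cluster_A": frozenset({"gene_a1"}),
    "cluster_B": frozenset({"gene_b1", "gene_b2"}),
}


def matching_clusters(
    detected_genes: Set[str],
    marker_sets: Dict[str, FrozenSet[str]] = MARKER_SETS,
) -> List[str]:
    """Return, in sorted order, every cluster whose full marker set was detected."""
    return sorted(
        name for name, markers in marker_sets.items() if markers <= detected_genes
    )


def call_cluster(detected_genes: Set[str]) -> str:
    """Turn the list of matching clusters into a single call for the isolate."""
    hits = matching_clusters(detected_genes)
    if not hits:
        return "unassigned"  # no marker set fully present
    if len(hits) == 1:
        return hits[0]  # exactly one cluster matched
    return "ambiguous:" + ",".join(hits)  # more than one marker set matched


if __name__ == "__main__":
    print(call_cluster({"gene_a1", "some_other_gene"}))
    print(call_cluster({"gene_b1"}))
```

Reporting ambiguous and unassigned isolates separately, rather than forcing a best guess, keeps false positives visible when such a scheme is evaluated.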

Data summary

Sequencing data have been deposited at the National Center for Biotechnology Information under BioProject number PRJNA692536.

Repositories

Raw sequence data are available from NCBI under the BioProject number PRJNA692536.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • 1. Title “Cluster-specific gene marker enhance Shigella and Enteroinvasive Escherichia coli in silico serotyping” updated to “Cluster-specific gene markers enhance Shigella and Enteroinvasive Escherichia coli in silico serotyping”.
  • 2. References have been fixed.

  • Abbreviations
    SS: Shigella sonnei
    SF: Shigella flexneri
    SB: Shigella boydii
    SD: Shigella dysenteriae
    EIEC: enteroinvasive Escherichia coli
    NCBI SRA: National Center for Biotechnology Information Sequence Read Archive
    ST: sequence type
    rST: ribosomal ST
    MLST: multilocus sequence typing
    rMLST: ribosomal MLST
    ECOR: Escherichia coli reference collection
    WGS: whole-genome sequencing
    TP: true positive
    FN: false negative
    FP: false positive
    HK: housekeeping
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
    Posted February 01, 2021.
    Subject Area

    • Microbiology