bioRxiv
New Results

What can we learn from over 100,000 Escherichia coli genomes?

Kaleb Abram, Zulema Udaondo, Carissa Bleker, Visanu Wanchai, Trudy M. Wassenaar, Michael S. Robeson II, Dave W. Ussery
doi: https://doi.org/10.1101/708131
Kaleb Abram
1Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
Zulema Udaondo
1Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
Carissa Bleker
2The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, USA
3Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
Visanu Wanchai
1Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
Trudy M. Wassenaar
4Molecular Microbiology and Genomics Consultants, Zotzenheim, Germany
Michael S. Robeson II
1Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
Dave W. Ussery
1Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
  • For correspondence: DWUssery@uams.edu

ABSTRACT

The explosion of microbial genome sequences in public databases allows for large-scale population genomic studies of bacterial species, such as Escherichia coli. In this study, we examine and classify more than one hundred thousand E. coli and Shigella genomes. After removing outliers, a semi-automated Mash-based analysis of 10,667 assembled genomes reveals 14 distinct phylogroups. A representative genome, or medoid, identified for each phylogroup serves as a proxy to classify more than 95,000 unassembled genomes. This analysis shows that most sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). The authenticity of the 14 phylogroups described is supported by pangenomic and phylogenetic analyses, which show differences in gene preservation between phylogroups. A phylogenetic tree constructed with 2,613 single-copy core genes, along with a matrix of phylogenetic profiles, is used to confirm that the 14 phylogroups change at different rates of gene gain/loss/duplication. The methodology used in this work is able to identify previously uncharacterized phylogroups in the E. coli species. Some of these new phylogroups harbor clonal strains that have undergone a process of genomic adaptation to the acquisition of new genomic elements related to virulence or antibiotic resistance. This is, to our knowledge, the largest E. coli genome dataset analyzed to date, and it provides valuable insights into the population structure of the species.
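
The medoid-as-proxy idea in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes exact Jaccard similarity on toy k-mer sets (Mash itself estimates Jaccard from MinHash sketches), a k-mer size of 21 (the Mash default, not necessarily the value used in the study), and hypothetical genome names, group labels, and helper functions (`find_medoid`, `classify`). The distance formula is the published Mash distance, D = -(1/k) ln(2j / (1 + j)).

```python
import math

def jaccard(a, b):
    """Exact Jaccard similarity of two k-mer sets (Mash estimates this from sketches)."""
    return len(a & b) / len(a | b)

def mash_distance(j, k=21):
    """Mash distance from a Jaccard value j; k=21 is an assumed k-mer size."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: maximum distance
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)

def distance(a, b, k=21):
    return mash_distance(jaccard(a, b), k)

def find_medoid(names, sketches):
    """Member with the smallest summed distance to all members of its group."""
    return min(
        names,
        key=lambda n: sum(distance(sketches[n], sketches[m]) for m in names),
    )

def classify(query, medoids, sketches):
    """Assign a query k-mer set to the group whose medoid is closest."""
    return min(medoids, key=lambda grp: distance(query, sketches[medoids[grp]]))

# Hypothetical toy data: integers stand in for hashed k-mers.
sketches = {
    "g1": {1, 2, 3, 4, 5, 6},
    "g2": {1, 2, 3, 4, 5, 7},
    "g3": {1, 2, 3, 4, 7, 8},
    "h1": {20, 21, 22, 23, 24, 25},
    "h2": {20, 21, 22, 23, 24, 26},
    "h3": {20, 21, 22, 23, 26, 27},
}
groups = {"X": ["g1", "g2", "g3"], "Y": ["h1", "h2", "h3"]}

# One representative genome per group, then classify new genomes against them.
medoids = {grp: find_medoid(names, sketches) for grp, names in groups.items()}
print(medoids)
print(classify({1, 2, 3, 4, 5, 9}, medoids, sketches))
print(classify({20, 21, 22, 23, 26, 30}, medoids, sketches))
```

Comparing each new genome against one medoid per group, rather than against every assembled genome, keeps the number of distance computations proportional to the number of groups, which is what makes a proxy-based classification of a very large collection tractable.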

Footnotes

  • General restructuring of the results; Figures 2 and 3 revised; author addition

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 15, 2020.
Subject Area

  • Genomics