bioRxiv
New Results

Discordance between different bioinformatic methods for identifying resistance genes from short-read genomic data, with a focus on Escherichia coli

Timothy J Davies, Jeremy Swan, Anna E Sheppard, Hayleah Pickford, Samuel Lipworth, Manal AbuOun, Matthew Ellington, Philip W Fowler, Susan Hopkins, Katie L Hopkins, Derrick W Crook, Tim EA Peto, Muna F Anjum, A Sarah Walker, Nicole Stoesser
doi: https://doi.org/10.1101/2021.11.03.467004
Affiliations
Timothy J Davies (a, b); Jeremy Swan (a, b); Anna E Sheppard (a, b); Hayleah Pickford (a, b); Samuel Lipworth (a, b); Manal AbuOun (c); Matthew Ellington (b, e); Philip W Fowler (a); Susan Hopkins (b, e); Katie L Hopkins (b, f); Derrick W Crook (a, b, d); Tim EA Peto (a, b, d); Muna F Anjum (c); A Sarah Walker (a, b); Nicole Stoesser (a, b, d)
(a) Nuffield Department of Medicine, Oxford University, Oxford, United Kingdom
(b) National Institute for Health Research (NIHR) Health Protection Research Unit on Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford, UK
(c) Bacteriology, Animal and Plant Health Agency, Surrey, UK
(d) Oxford University Hospitals NHS Foundation Trust, Oxford, UK
(e) Antimicrobial Resistance and Healthcare Associated Infections (AMRHAI) Division, UK Health Security Agency, London, UK
(f) HCAI, Fungal, AMR, AMU and Sepsis Division, UK Health Security Agency, London, UK
For correspondence: timothy.davies@ndm.ox.ac.uk, nicole.stoesser@ndm.ox.ac.uk

Abstract

Several bioinformatic genotyping algorithms are now commonly used to characterise antimicrobial resistance (AMR) gene profiles in whole genome sequencing (WGS) data, with a view to understanding AMR epidemiology and developing WGS-based resistance prediction workflows for clinical settings. Accurately evaluating AMR in Enterobacterales, particularly Escherichia coli, is of major importance because E. coli is a common pathogen. However, robust comparisons of different genotyping approaches on relevant simulated datasets and large real-life WGS datasets are lacking. Here, we used both simulated datasets and a large set of real E. coli WGS data (n=1,818 isolates) to investigate genotyping methods systematically and in greater detail.

Simulated constructs and real sequences were processed using four bioinformatic programs (ABRicate, ARIBA, KmerResistance and SRST2, each run with the ResFinder database), and their outputs were compared. In simulation tests where 3,092 AMR gene variants were inserted into random sequence constructs, KmerResistance was correct for all 3,092 simulations, ABRicate for 3,082 (99.7%), ARIBA for 2,927 (94.7%) and SRST2 for 2,120 (68.6%). In simulation tests where two closely related gene variants were inserted into random sequence constructs, ABRicate identified the correct alleles in 11,382/46,279 (25%) simulations, ARIBA in 2,494/46,279 (5%), SRST2 in 2,539/46,279 (5%) and KmerResistance in 38,826/46,279 (84%). In real data, 1,392/1,818 (76%) isolates had discrepant allele calls across methods for at least one gene.
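The real-data comparison counts an isolate as discrepant when the methods disagree on the allele calls for at least one gene. A minimal Python sketch of that bookkeeping, assuming each tool's output has already been parsed into a set of reported allele names per isolate; the data layout, the function name `discordant_isolates` and the illustrative allele names are hypothetical and are not taken from the study's own pipeline:

```python
from typing import Dict, Set

# Hypothetical parsed input: isolate ID -> tool name -> set of reported allele names.
# A tool that reports nothing for an isolate should appear with an empty set.
Calls = Dict[str, Dict[str, Set[str]]]


def discordant_isolates(calls: Calls) -> Set[str]:
    """Return the isolates for which the tools do not all report the same alleles."""
    discordant = set()
    for isolate, by_tool in calls.items():
        # Collapse each tool's calls to a hashable frozenset; more than one
        # distinct set means at least one allele was called differently.
        distinct = {frozenset(alleles) for alleles in by_tool.values()}
        if len(distinct) > 1:
            discordant.add(isolate)
    return discordant


# Illustrative (hypothetical) calls for two isolates from two tools.
example = {
    "iso1": {"ABRicate": {"blaCTX-M-15_1"}, "KmerResistance": {"blaCTX-M-15_1"}},
    "iso2": {"ABRicate": {"blaCTX-M-15_1"}, "KmerResistance": {"blaCTX-M-55_1"}},
}
print(sorted(discordant_isolates(example)))  # ['iso2']
```

Comparing whole allele sets is deliberately strict: on its own it cannot distinguish a genuine presence/absence disagreement from two tools giving the same sequence different names.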

Our evaluations revealed poor performance in scenarios expected to be challenging (e.g. identifying AMR genes at <10x coverage, or discriminating between closely related AMR gene sequences), but also identified systematic sequence classification (i.e. naming) errors even in straightforward circumstances; these contributed to 1,081/3,092 (35%) errors in our simplest simulations and to at least 2,530/4,321 (59%) discrepancies in real data. Further, many of the remaining discrepancies were likely “artefactual”, with differences in reporting cut-offs accounting for at least 1,430/4,321 (33%) discrepancies. Comparing outputs generated by running multiple algorithms on the same dataset can help identify and resolve these artefacts, but ideally new, more robust genotyping algorithms are needed.
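To illustrate how reporting cut-offs alone can create an “artefactual” discrepancy, the Python sketch below applies two sets of minimum identity and coverage thresholds to the same hit. The tool names `tool_A` and `tool_B` and their threshold values are hypothetical placeholders chosen for illustration; they are not the actual defaults of ABRicate, ARIBA, KmerResistance or SRST2:

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Hit:
    allele: str
    identity: float  # % nucleotide identity to the reference allele
    coverage: float  # % of the reference allele length covered by the match


# Hypothetical (min identity %, min coverage %) cut-offs for two unnamed tools.
# Illustrative values only, NOT the defaults of any real AMR genotyping tool.
CUTOFFS: Dict[str, Tuple[float, float]] = {
    "tool_A": (80.0, 80.0),
    "tool_B": (90.0, 90.0),
}


def reported_by(hit: Hit, cutoffs: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Return the tools whose identity and coverage cut-offs this hit passes."""
    return {
        tool
        for tool, (min_id, min_cov) in cutoffs.items()
        if hit.identity >= min_id and hit.coverage >= min_cov
    }


# The same sequence evidence counts as "present" for one tool and "absent" for the other.
partial = Hit("blaTEM-1B_1", identity=99.5, coverage=85.0)
print(reported_by(partial, CUTOFFS))  # {'tool_A'}
```

Re-running such a comparison with harmonised thresholds is one way to check whether a presence/absence discrepancy reflects the data or only the reporting rules.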

Impact statement

Whole-genome sequencing is widely used to study the epidemiology of antimicrobial resistance (AMR) genes in bacteria; however, there is some concern that its outputs are highly dependent on the bioinformatic methods used. This work evaluates these concerns in detail by comparing four commonly used AMR gene typing methods on large simulated and real datasets. The results highlight performance issues for most methods in at least one of several simulated and real-life scenarios. However, most discrepancies between methods were due to differential labelling of the same sequences, arising from assumptions about the structure of the reference resistance gene database (i.e. that resistance genes can be easily classified into well-defined groups). This study represents a major advance in quantifying and evaluating the nature of discrepancies between the outputs of different AMR typing algorithms, with relevance for historic and future work using these algorithms. Some discrepancies can be resolved by choosing methods that make fewer assumptions about the reference AMR gene database and by manually resolving outputs generated using multiple programs. However, ideally new and better methods are needed.
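When resolving outputs from multiple programs by hand, it can help to first separate disagreements that are only about labels from those about presence or gene identity. A minimal Python sketch, assuming a crude and hypothetical naming rule (strip a trailing `_<n>` database suffix, then a trailing `-<n>` allele number) to map allele names to a gene family; this heuristic is not part of ResFinder or of the study's method and fails for names such as `blaTEM-1B`, so it only illustrates the idea:

```python
import re
from typing import Optional


def gene_family(allele: str) -> str:
    """Hypothetical heuristic: 'blaCTX-M-15_1' -> 'blaCTX-M'."""
    name = re.sub(r"_\d+$", "", allele)  # drop a trailing '_<n>' suffix
    return re.sub(r"-\d+$", "", name)    # drop a trailing '-<n>' allele number


def classify(call_a: Optional[str], call_b: Optional[str]) -> str:
    """Classify two tools' calls for the same locus (None means not reported)."""
    if call_a == call_b:
        return "concordant"
    if call_a is None or call_b is None:
        return "presence/absence"
    if gene_family(call_a) == gene_family(call_b):
        return "allele label"  # same family, different allele name
    return "different gene"


print(classify("blaCTX-M-15_1", "blaCTX-M-55_1"))  # allele label
```

Only the "allele label" and "presence/absence" categories would then need checking against the underlying sequence and the reporting cut-offs.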

Repositories

Sequencing data are available under NCBI BioProject accession number PRJNA540750.

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted November 03, 2021.
bioRxiv 2021.11.03.467004; doi: https://doi.org/10.1101/2021.11.03.467004

Subject Area

  • Microbiology