bioRxiv
New Results

Detection of plasmid contigs in draft genome assemblies using customized Kraken databases

Ryota Gomi, Kelly L. Wyres, Kathryn E. Holt
doi: https://doi.org/10.1101/2020.11.29.402966
Ryota Gomi
¹ Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria 3004, Australia
² Department of Environmental Engineering, Graduate School of Engineering, Kyoto University, Kyoto, Japan
For correspondence: gomi.ryota.34v@kyoto-u.jp
Kelly L. Wyres
¹ Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria 3004, Australia
Kathryn E. Holt
¹ Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria 3004, Australia
³ London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK

ABSTRACT

Plasmids play an important role in bacterial evolution and mediate horizontal transfer of genes, including virulence and antimicrobial resistance genes. Although short-read sequencing technologies have enabled large-scale bacterial genomics, the resulting draft genome assemblies are often fragmented into hundreds of discrete contigs. Several tools and approaches have been developed to identify plasmid sequences in such assemblies, but they require a trade-off between sensitivity and specificity. Here we propose using the Kraken classifier, together with a custom Kraken database comprising known chromosomal and plasmid sequences of the Klebsiella pneumoniae species complex (KpSC), to identify plasmid-derived contigs in draft assemblies. We assessed performance using Illumina-based draft genome assemblies for 82 KpSC isolates, for which complete genomes were available to supply ground truth. When benchmarked against five other classifiers (Centrifuge, RFPlasmid, mlplasmids, PlaScope, and Platon), Kraken showed balanced performance in terms of overall sensitivity and specificity (90.8% and 99.4%, respectively, for contig count; 96.5% and >99.9%, respectively, for cumulative contig length), as well as the highest accuracy (96.8% vs 91.8%-96.6% for contig count; 99.8% vs 99.0%-99.7% for cumulative contig length) and F1-score (94.5% vs 84.5%-94.1% for contig count; 98.0% vs 88.9%-96.7% for cumulative contig length). Kraken also achieved consistent performance across our genome collection. Furthermore, we demonstrate that expanding the Kraken database with additional known chromosomal and plasmid sequences can further improve classification performance. Although we have focused here on the KpSC, this methodology could easily be applied to other species with a sufficient number of completed genomes.
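The abstract reports sensitivity, specificity, accuracy and F1-score in two ways: per contig (contig count) and weighted by contig length (cumulative contig length). The following is a minimal sketch of how both views can be computed from per-contig truth labels and predictions. It assumes plasmid is the positive class and that every contig has a binary truth label and a binary call; how unclassified contigs were scored is not stated in this section. The `Contig` type and the function names are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass
class Contig:
    length: int           # contig length in bp
    is_plasmid: bool      # ground truth (e.g. from the matching complete genome)
    called_plasmid: bool  # classifier prediction


def confusion_counts(contigs, by_length=False):
    """Return (TP, FP, TN, FN), counting contigs or summing their lengths."""
    tp = fp = tn = fn = 0
    for c in contigs:
        w = c.length if by_length else 1  # weight: 1 per contig, or its length in bp
        if c.is_plasmid and c.called_plasmid:
            tp += w
        elif c.is_plasmid:
            fn += w  # plasmid contig missed
        elif c.called_plasmid:
            fp += w  # chromosomal contig wrongly called plasmid
        else:
            tn += w
    return tp, fp, tn, fn


def _ratio(num, den):
    # Undefined metrics (empty denominator) are reported as NaN.
    return num / den if den else math.nan


def performance(contigs, by_length=False):
    """Sensitivity, specificity, accuracy and F1 with plasmid as the positive class."""
    tp, fp, tn, fn = confusion_counts(contigs, by_length)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    demo = [
        Contig(100_000, is_plasmid=False, called_plasmid=False),
        Contig(5_000, is_plasmid=True, called_plasmid=True),
        Contig(2_000, is_plasmid=True, called_plasmid=False),
        Contig(1_000, is_plasmid=False, called_plasmid=True),
    ]
    print(performance(demo))                  # contig-count view
    print(performance(demo, by_length=True))  # cumulative-length view
```

With this toy input every metric is 0.5 when contigs are counted, whereas length weighting raises specificity and accuracy because the single large chromosomal contig dominates. This is why count-based and length-based figures for the same classifier can differ markedly.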

IMPACT STATEMENT The assembly of bacterial genomes using short-read data often results in hundreds of discrete contigs due to repeat sequences in those genomes. Separating plasmid contigs from chromosomal contigs in such assemblies is required, e.g., to assess the mobility of antimicrobial resistance genes. Although several tools have been developed for this purpose, they often suffer from low sensitivity or specificity. Here, we propose that the Kraken classifier, coupled with a custom Kraken database comprising plasmid-free chromosomal sequences and complete plasmid sequences, can be used to detect plasmid contigs in draft genome assemblies. We show that Kraken achieved balanced and higher overall performance than other methods (Centrifuge, RFPlasmid, mlplasmids, PlaScope, and Platon). We therefore suggest that the Kraken classifier may be the best option for predicting the origin of contigs in species with a suitable number of completed chromosomal and plasmid sequences.
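A custom database of this kind needs every reference sequence to carry a taxonomy label that separates chromosomal from plasmid sequences. The Kraken 2 manual documents a FASTA header convention for sequences without an NCBI accession mapping, `kraken:taxid|<taxid>`, after which the tagged files can be added with `kraken2-build --add-to-library`. The sketch below tags headers in that way. It is an illustration under assumptions only: the two taxonomy IDs are hypothetical placeholders that would need matching entries in the taxonomy files used to build the database, and this section does not state which Kraken version or labelling scheme the authors used.

```python
# Hypothetical taxonomy IDs (NOT from the paper). They would need matching
# entries in the nodes.dmp/names.dmp taxonomy used to build the database.
CHROMOSOME_TAXID = 9000001
PLASMID_TAXID = 9000002


def tag_fasta_lines(lines, taxid):
    """Append a Kraken 2 'kraken:taxid|<id>' tag to every FASTA header."""
    tagged = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split '>ID description' into the ID and the free-text description.
            seq_id, _, desc = line[1:].partition(" ")
            header = f">{seq_id}|kraken:taxid|{taxid}"
            tagged.append(f"{header} {desc}" if desc else header)
        else:
            tagged.append(line)  # sequence lines pass through unchanged
    return tagged


def tag_fasta_file(in_path, out_path, taxid):
    """File-based wrapper around tag_fasta_lines; the paths are up to the user."""
    with open(in_path) as src:
        tagged = tag_fasta_lines(src, taxid)
    with open(out_path, "w") as dst:
        dst.write("\n".join(tagged) + "\n")


if __name__ == "__main__":
    example = [">chr_example Klebsiella pneumoniae chromosome\n", "ACGTACGT\n"]
    print("\n".join(tag_fasta_lines(example, CHROMOSOME_TAXID)))
```

In this scheme the chromosomal and plasmid FASTA files would each be tagged with their own taxid before being added to the library, so that Kraken can report which of the two labels a contig's k-mers support.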

DATA SUMMARY Table S1: Complete chromosomes used for creating the base Kraken database. Plasmid-free chromosomal sequences and complete plasmid sequences used for creating the base Kraken database are also available via Figshare at https://doi.org/10.6084/m9.figshare.13289564.

Table S2: Sequence data used for benchmarking. Draft assemblies of these 82 KpSC strains are available via Figshare at https://doi.org/10.6084/m9.figshare.13553432. The corresponding sequence read files and complete genomes were deposited in the NCBI SRA and GenBank under BioProjects PRJEB6891, PRJNA351909, PRJNA486877, and PRJNA646837 (individual BioSample IDs listed in Table S2).

Kraken output files are available via Figshare at https://doi.org/10.6084/m9.figshare.13553789.
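Kraken's per-sequence output is tab-separated, with a classified/unclassified flag (`C`/`U`), the sequence ID, the assigned taxonomy ID, the sequence length and a k-mer mapping summary. The following is a minimal sketch of turning such lines into per-contig plasmid/chromosome calls. It assumes the default output (without `--use-names`, which replaces the bare taxid with a name) and uses hypothetical taxids for the two custom labels; a contig assigned to any other node, for instance an ancestor shared by the chromosome and plasmid labels, is reported as ambiguous rather than forced into either class.

```python
# Hypothetical taxids for the two custom labels (not from the paper).
LABELS = {"9000001": "chromosome", "9000002": "plasmid"}


def call_contigs(lines, labels=LABELS):
    """Map each sequence ID in Kraken output to a plasmid/chromosome call."""
    calls = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, seq_id, taxid = fields[0], fields[1], fields[2]
        if status == "U":
            calls[seq_id] = "unclassified"
        else:
            # Any taxid other than the two labels (e.g. a shared ancestor
            # reached by lowest-common-ancestor assignment) is ambiguous.
            calls[seq_id] = labels.get(taxid, "ambiguous")
    return calls


if __name__ == "__main__":
    example = [
        "C\tcontig_1\t9000001\t152340\t9000001:152306",
        "C\tcontig_2\t9000002\t4210\t9000002:4176",
        "U\tcontig_3\t0\t812\t0:778",
        "C\tcontig_4\t1\t2050\t1:900 9000002:1116",
    ]
    print(call_contigs(example))
```

Unclassified and ambiguous contigs are kept as separate categories so that how they are counted in any downstream sensitivity or specificity calculation is an explicit decision.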

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 08, 2021.
Subject Area

  • Microbiology