bioRxiv
New Results

Hostile: accurate host decontamination of microbial sequences

Bede Constantinides, Martin Hunt, Derrick W Crook
doi: https://doi.org/10.1101/2023.07.04.547735
Bede Constantinides
1 Nuffield Department of Medicine, University of Oxford, Oxford, UK
2 The National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
  • For correspondence: bedeabc@gmail.com
Martin Hunt
1 Nuffield Department of Medicine, University of Oxford, Oxford, UK
3 EMBL-EBI, Cambridge, UK
Derrick W Crook
1 Nuffield Department of Medicine, University of Oxford, Oxford, UK
2 The National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
4 The National Institute for Health Research Oxford Biomedical Research Centre, University of Oxford, Oxford, UK

Abstract

Motivation: Microbial sequences generated from clinical samples are often contaminated with human host sequences that must be removed for ethical and legal reasons. Care must be taken to excise host sequences without inadvertently removing target microbial sequences to the detriment of downstream analyses such as variant calling and de novo assembly.

Results: To facilitate accurate host decontamination of both short and long sequencing reads, we developed Hostile, a tool capable of accurate host read removal using a laptop. We demonstrate that our approach removes at least 99.6% of real human reads and retains at least 99.989% of simulated bacterial reads. Using Hostile with a masked reference genome further increases bacterial read retention (>=99.997%) with negligible (<=0.001%) reduction in human read removal performance. Compared with an existing tool, Hostile removes 21-23% more human short reads and 22-43x fewer bacterial reads with comparable execution time.
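The two headline figures above are simple proportions: the share of human reads removed and the share of bacterial reads retained. The sketch below shows one way such percentages could be computed from read counts. It is illustrative only: the function names and the read counts are hypothetical, chosen so that the output matches the lower bounds quoted above, and none of it is part of Hostile or taken from the authors' benchmarks.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def decontamination_metrics(
    human_total: int,
    human_removed: int,
    bacterial_total: int,
    bacterial_retained: int,
) -> dict:
    """Summarise a decontamination run as two percentages.

    human_removal_pct: share of known human reads that were removed.
    bacterial_retention_pct: share of known bacterial reads that were kept.
    """
    return {
        "human_removal_pct": percent(human_removed, human_total),
        "bacterial_retention_pct": percent(bacterial_retained, bacterial_total),
    }


# Hypothetical counts, for illustration only (not the paper's data).
metrics = decontamination_metrics(
    human_total=1_000_000,
    human_removed=996_000,
    bacterial_total=1_000_000,
    bacterial_retained=999_890,
)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}%")
```

Reporting both figures together matters because either one can be pushed towards 100% on its own by removing, or retaining, every read.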

Availability and implementation: Hostile is implemented as an MIT licensed Python package available from https://github.com/bede/hostile together with supplementary material.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Added evaluation of diverse human genomes representing all 26 populations in the expanded 1000 Genomes Project cohort; incorporated feedback from Martin Hunt; added coauthor Martin Hunt; revised benchmarks to address a defect in HRRT's handling of paired reads; added description of built-in masking utility. Manuscript and supplementary material updated.

  • https://github.com/bede/hostile

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted July 21, 2023.
Subject Area

  • Bioinformatics