bioRxiv
New Results

Strain Tracking with Uncertainty Quantification

Younhun Kim, Colin J. Worby, Sawal Acharya, Lucas R. van Dijk, Daniel Alfonsetti, Zackary Gromko, Philippe Azimzadeh, Karen Dodson, Georg Gerber, Scott Hultgren, Ashlee M. Earl, Bonnie Berger, Travis E. Gibson
doi: https://doi.org/10.1101/2023.01.25.525531
Younhun Kim
1Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
2Department of Pathology, Brigham and Women’s Hospital, Boston MA, USA
3Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
Colin J. Worby
3Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
Sawal Acharya
2Department of Pathology, Brigham and Women’s Hospital, Boston MA, USA
Lucas R. van Dijk
3Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
4Delft Bioinformatics Lab, Delft University of Technology, Delft, 2628 XE, The Netherlands
Daniel Alfonsetti
5Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
Zackary Gromko
5Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
Philippe Azimzadeh
6Department of Molecular Microbiology and Center for Women’s Infectious Disease Research, Washington University School of Medicine, St. Louis, MO, USA
Karen Dodson
6Department of Molecular Microbiology and Center for Women’s Infectious Disease Research, Washington University School of Medicine, St. Louis, MO, USA
Georg Gerber
2Department of Pathology, Brigham and Women’s Hospital, Boston MA, USA
7Harvard Medical School, Boston, MA USA
8Harvard-MIT Health Sciences and Technology, Cambridge, MA, USA
Scott Hultgren
6Department of Molecular Microbiology and Center for Women’s Infectious Disease Research, Washington University School of Medicine, St. Louis, MO, USA
Ashlee M. Earl
3Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
Bonnie Berger
1Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
3Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
5Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
8Harvard-MIT Health Sciences and Technology, Cambridge, MA, USA
Travis E. Gibson
2Department of Pathology, Brigham and Women’s Hospital, Boston MA, USA
3Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, USA
5Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
7Harvard Medical School, Boston, MA USA
  • For correspondence: tegibson@bwh.harvard.edu

Abstract

The ability to detect and quantify microbiota over time has many clinical, basic science, and public health applications. Sequencing is one of the primary means of tracking microbiota. When the microorganism of interest is well characterized or known a priori, targeted sequencing is often used. In many applications, however, untargeted bulk (shotgun) sequencing is more appropriate; for instance, when tracking infection transmission events and nucleotide variants across multiple genomic loci, or when studying the role of multiple genes in a particular phenotype. Given these applications, and the observation that pathogens (e.g. Clostridioides difficile, Escherichia coli, Salmonella enterica) and other taxa of interest can reside at low relative abundance in the gastrointestinal tract, there is a critical need for algorithms that accurately track low-abundance taxa with strain-level resolution. Here we present a sequence quality- and time-aware model, ChronoStrain, that introduces uncertainty quantification to gauge low-abundance species and significantly outperforms the current state of the art on both real and synthetic data. ChronoStrain leverages sequences' quality scores and the samples' temporal information to produce a probability distribution over abundance trajectories for each strain tracked in the model. We demonstrate ChronoStrain's improved performance in capturing post-antibiotic E. coli strain blooms among women with recurrent urinary tract infections (UTIs) from the UTI Microbiome (UMB) Project. Other strain-tracking models applied to the same data either show inconsistent temporal colonization or can only track consistently using very coarse groupings. In contrast, our probabilistic outputs can reveal the relationship between low-confidence strains present in the sample that cannot be reliably assigned a single reference label (either due to poor coverage or novelty) while simultaneously calling high-confidence strains that can be unambiguously assigned a label. We also include and analyze newly sequenced cultured samples from the UMB Project.
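
The abstract does not specify how ChronoStrain uses quality scores; that is left to the full text. As a minimal sketch of what "quality-aware" can mean in general, the snippet below assumes the standard Phred convention (error probability 10^(-Q/10)) and a simple independent-substitution error model. The function names and the error model are hypothetical illustrations, not ChronoStrain's API or its actual likelihood.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: int) -> float:
    """Standard Phred definition: P(base call is wrong) = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def read_log_likelihood(read: str, quals: Sequence[int], ref: str) -> float:
    """Log-probability of `read` given that it was sequenced from `ref`.

    Assumed (illustrative) error model: each position is independent,
    a correct call has probability 1 - e, and each of the three possible
    wrong bases has probability e / 3.
    """
    if not (len(read) == len(quals) == len(ref)):
        raise ValueError("read, quals, and ref must have equal length")
    total = 0.0
    for base, q, ref_base in zip(read, quals, ref):
        # Clamp at 0.75, where a call carries no information (all four
        # bases equally likely); this also avoids log(0) when Q = 0.
        e = min(phred_to_error_prob(q), 0.75)
        total += math.log(1.0 - e) if base == ref_base else math.log(e / 3.0)
    return total


# A mismatch at a low-quality position is penalized far less than the
# same mismatch at a high-quality position.
ll_low_quality = read_log_likelihood("ACGT", [30, 30, 30, 2], "ACGA")
ll_high_quality = read_log_likelihood("ACGT", [30, 30, 30, 40], "ACGA")
```

Under this assumed model, a read that disagrees with a candidate reference only at low-quality bases remains plausible evidence for that reference, which is one way quality scores can help when coverage of a low-abundance strain is sparse.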

Competing Interest Statement

Georg K. Gerber is a shareholder in ParetoBio, Inc. His interests were reviewed and are managed by Brigham and Women's Hospital and Mass General Brigham in accordance with their conflict of interest policies.

Footnotes

  • Correspondence: bab@mit.edu or tegibson@bwh.harvard.edu

  • https://www.ncbi.nlm.nih.gov/bioproject/PRJNA400628/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted January 26, 2023.

Subject Area

  • Bioinformatics