PT - JOURNAL ARTICLE
AU - Younhun Kim
AU - Colin J. Worby
AU - Sawal Acharya
AU - Lucas R. van Dijk
AU - Daniel Alfonsetti
AU - Zackary Gromko
AU - Philippe Azimzadeh
AU - Karen Dodson
AU - Georg Gerber
AU - Scott Hultgren
AU - Ashlee M. Earl
AU - Bonnie Berger
AU - Travis E. Gibson
TI - Strain Tracking with Uncertainty Quantification
AID - 10.1101/2023.01.25.525531
DP - 2023 Jan 01
TA - bioRxiv
PG - 2023.01.25.525531
4099 - http://biorxiv.org/content/early/2023/01/26/2023.01.25.525531.short
4100 - http://biorxiv.org/content/early/2023/01/26/2023.01.25.525531.full
AB - The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted sequencing is often used. In many applications, however, untargeted bulk (shotgun) sequencing is more appropriate; for instance, tracking infection transmission events and nucleotide variants across multiple genomic loci, or studying the role of multiple genes in a particular phenotype. Given these applications, and the observation that pathogens (e.g., Clostridioides difficile, Escherichia coli, Salmonella enterica) and other taxa of interest can reside at low relative abundance in the gastrointestinal tract, there is a critical need for algorithms that accurately track low-abundance taxa with strain-level resolution. Here we present a sequence quality- and time-aware model, ChronoStrain, that introduces uncertainty quantification to gauge low-abundance species and significantly outperforms the current state-of-the-art on both real and synthetic data. ChronoStrain leverages sequences’ quality scores and the samples’ temporal information to produce a probability distribution over abundance trajectories for each strain tracked in the model.
We demonstrate ChronoStrain’s improved performance in capturing post-antibiotic E. coli strain blooms among women with recurrent urinary tract infections (UTIs) from the UTI Microbiome (UMB) Project. Other strain tracking models applied to the same data either show inconsistent temporal colonization or can track strains consistently only at very coarse groupings. In contrast, our probabilistic outputs can reveal the relationships among low-confidence strains in a sample that cannot be reliably assigned a single reference label (due to either poor coverage or novelty), while simultaneously calling high-confidence strains that can be unambiguously assigned a label. We also include and analyze newly sequenced cultured samples from the UMB Project.
Competing Interest Statement: Georg K. Gerber is a shareholder in ParetoBio, Inc. His interests were reviewed and are managed by Brigham and Women's Hospital and Mass General Brigham in accordance with their conflict of interest policies.