RT Journal Article
SR Electronic
T1 Scalable microbial strain inference in metagenomic data using StrainFacts
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.02.01.478746
DO 10.1101/2022.02.01.478746
A1 Smith, Byron J.
A1 Li, Xiangpeng
A1 Shi, Zhou Jason
A1 Abate, Adam
A1 Pollard, Katherine S.
YR 2022
UL http://biorxiv.org/content/early/2022/04/09/2022.02.01.478746.abstract
AB While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into strain genotypes and relative abundances is a promising approach, but existing methods are limited by computational scalability. Here we introduce StrainFacts, a method for strain deconvolution that enables inference across tens of thousands of metagenomes. We harness a “fuzzy” genotype approximation that makes the underlying graphical model fully differentiable, unlike existing methods. This allows parameter estimates to be optimized with gradient-based methods, speeding up model fitting by two orders of magnitude. A GPU implementation provides additional scalability. Extensive simulations show that StrainFacts can perform strain inference on thousands of metagenomes and has comparable accuracy to more computationally intensive tools. We further validate our strain inferences using single-cell genomic sequencing from a human stool sample. Applying StrainFacts to a collection of more than 10,000 publicly available human stool metagenomes, we quantify patterns of strain diversity, biogeography, and linkage-disequilibrium that agree with and expand on what is known based on existing reference genomes.
StrainFacts paves the way for large-scale biogeography and population genetic studies of microbiomes using metagenomic data. Competing Interest Statement: KSP is on the scientific advisory board of Phylagen.