TY  - JOUR
T1  - The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data
JF  - bioRxiv
DO  - 10.1101/030692
SP  - 030692
AU  - Sandeep J. Joseph
AU  - Ben Li
AU  - Robert A. Petit III
AU  - Zhaohui S. Qin
AU  - Lyndsey A. Darrow
AU  - Timothy D. Read
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/11/05/030692.abstract
N2  - Metagenome shotgun sequence projects offer the potential for large-scale biogeographic analysis of microbial species. In this project we developed a method for detecting 33 common subtypes of the pathogenic bacterium Staphylococcus aureus. We used a binomial mixture model implemented in the binstrain software and the coverage counts at > 100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of each subtype in metagenome samples. Using this pipeline we were able to obtain > 87% sensitivity and > 94% specificity when testing on low genome coverage (0.025X) samples of diverse S. aureus strains. We found that 321 and 149 metagenome samples from the Human Microbiome Project (HMP) and metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage > 0.025X. In both projects, CC8 and CC30 were the most common S. aureus subtypes encountered. We found evidence that the subtype compositions at different body sites of the same individual were more similar than expected from random sampling, and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are risk factors for S. aureus carriage, but there was limited power to find factors linked to carriage of even the most common subtype.
In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering, but other unknown factors influence the taxonomic distribution of the species around the city. We argue that pathogen detection in metagenome samples requires the use of subtypes based on whole-species population genomic analysis rather than ad hoc collections of reference strains.
ER  -