ABSTRACT
The oral cavity of each person is home to hundreds of bacterial species. While taxa associated with oral diseases have been well studied using culture-based as well as amplicon sequencing methods, metagenomic and genomic information remains scarce compared to the fecal microbiome. Here we provide shotgun metagenomic data for 3,346 oral samples and, together with 808 published samples, assemble 56,213 metagenome-assembled genomes (MAGs). Of the 3,589 species-level genome bins, 64% contained no publicly available genomes, and others contained only a handful. The resulting genome collection is representative of samples from around the world and across physiological conditions, contains many genomes from the Candidate Phyla Radiation (CPR), which lacks monocultures, and enabled the discovery of new taxa such as a family within the Acholeplasmataceae order. New biomarkers were identified for rheumatoid arthritis and colorectal cancer; oral samples would be more convenient to collect than fecal samples. The large number of metagenomic samples also allowed assembly of many strains from important oral taxa such as Porphyromonas and Neisseria. Predicted functions are enriched in drug metabolism and small-molecule synthesis. Thus, these data lay down a genomic framework for future inquiries into the human oral microbiome.