PT - JOURNAL ARTICLE
AU - Alexandre Almeida
AU - Stephen Nayfach
AU - Miguel Boland
AU - Francesco Strozzi
AU - Martin Beracochea
AU - Zhou Jason Shi
AU - Katherine S. Pollard
AU - Donovan H. Parks
AU - Philip Hugenholtz
AU - Nicola Segata
AU - Nikos C. Kyrpides
AU - Robert D. Finn
TI - A unified sequence catalogue of over 280,000 genomes obtained from the human gut microbiome
AID - 10.1101/762682
DP - 2019 Jan 01
TA - bioRxiv
PG - 762682
4099 - http://biorxiv.org/content/early/2019/09/19/762682.short
4100 - http://biorxiv.org/content/early/2019/09/19/762682.full
AB - Comprehensive reference data is essential for accurate taxonomic and functional characterization of the human gut microbiome. Here we present the Unified Human Gastrointestinal Genome (UHGG) collection, a resource combining 286,997 genomes representing 4,644 prokaryotic species from the human gut. These genomes contain over 625 million protein sequences used to generate the Unified Human Gastrointestinal Protein (UHGP) catalogue, a collection that more than doubles the number of gut protein clusters over the Integrated Gene Catalogue. We find that a large portion of the human gut microbiome remains to be fully explored, with over 70% of the UHGG species lacking cultured representatives, and 40% of the UHGP missing meaningful functional annotations. Intra-species genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which were specific to individual human populations. These freely available genomic resources should greatly facilitate investigations into the human gut microbiome.