New Results

A unified sequence catalogue of over 280,000 genomes obtained from the human gut microbiome

Alexandre Almeida, Stephen Nayfach, Miguel Boland, Francesco Strozzi, Martin Beracochea, Zhou Jason Shi, Katherine S. Pollard, Donovan H. Parks, Philip Hugenholtz, Nicola Segata, Nikos C. Kyrpides, Robert D. Finn
doi: https://doi.org/10.1101/762682
Alexandre Almeida
1European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
2Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  • For correspondence: aalmeida@ebi.ac.uk rdf@ebi.ac.uk
Stephen Nayfach
3U. S. Department of Energy Joint Genome Institute, Walnut Creek, California, USA
4Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
Miguel Boland
1European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
Francesco Strozzi
5Enterome Bioscience, Paris, France
Martin Beracochea
1European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
Zhou Jason Shi
6Gladstone Institutes, San Francisco, CA, USA
7Chan-Zuckerberg Biohub, San Francisco, CA, USA
Katherine S. Pollard
6Gladstone Institutes, San Francisco, CA, USA
7Chan-Zuckerberg Biohub, San Francisco, CA, USA
8Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
9Institute for Computational Health Sciences, University of California San Francisco, San Francisco, CA, USA
10Quantitative Biology Institute, University of California San Francisco, San Francisco, CA, USA
11Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
Donovan H. Parks
12Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Queensland, Australia
Philip Hugenholtz
12Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Queensland, Australia
Nicola Segata
13CIBIO Department, University of Trento, Trento, Italy
Nikos C. Kyrpides
3U. S. Department of Energy Joint Genome Institute, Walnut Creek, California, USA
4Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
Robert D. Finn
1European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK

Abstract

Comprehensive reference data is essential for accurate taxonomic and functional characterization of the human gut microbiome. Here we present the Unified Human Gastrointestinal Genome (UHGG) collection, a resource combining 286,997 genomes representing 4,644 prokaryotic species from the human gut. These genomes contain over 625 million protein sequences used to generate the Unified Human Gastrointestinal Protein (UHGP) catalogue, a collection that more than doubles the number of gut protein clusters over the Integrated Gene Catalogue. We find that a large portion of the human gut microbiome remains to be fully explored, with over 70% of the UHGG species lacking cultured representatives, and 40% of the UHGP missing meaningful functional annotations. Intra-species genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which were specific to individual human populations. These freely available genomic resources should greatly facilitate investigations into the human gut microbiome.

Footnotes

  • http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes

  • https://www.ebi.ac.uk/metagenomics/genomes

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted September 19, 2019.

Subject Area

  • Microbiology