Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

Automated and accurate estimation of gene family abundance from shotgun metagenomes

Stephen Nayfach, Patrick H. Bradley, Stacia K. Wyman, Timothy J. Laurent, Alex Williams, Jonathan A. Eisen, Katherine S. Pollard, Thomas J. Sharpton
doi: https://doi.org/10.1101/022335
Stephen Nayfach
1Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Patrick H. Bradley
1Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Stacia K. Wyman
1Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Timothy J. Laurent
1Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Alex Williams
1Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jonathan A. Eisen
2Department of Evolution and Ecology, Department of Medical Microbiology and Immunology, UC Davis Genome Center, University of California, Davis
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Katherine S. Pollard
1Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
3Department of Epidemiology & Biostatistics and Institute for Human Genetics, University of California, San Francisco, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: kpollard@gladstone.ucsf.edu thomas.sharpton@oregonstate.edu
Thomas J. Sharpton
1Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
4Department of Microbiology, Department of Statistics, Oregon State University, Corvallis, OR, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: kpollard@gladstone.ucsf.edu thomas.sharpton@oregonstate.edu
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Preview PDF
Loading

Abstract

Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn’s disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.

Author Summary Microbial communities perform a wide variety of functions, from marine photosynthesis to aiding digestion in the human gut. Shotgun “metagenomic” sequencing can be used to sample millions of short DNA sequences from such communities directly, without needing to first culture its constituents in the laboratory. Using these data, researchers can survey which functions are encoded by mapping these short sequences to known protein families and pathways. Several tools for this annotation already exist. But, annotation is a multi-step process that includes identification of genes in a metagenome and determination of the type of protein each gene encodes. We currently know little about how different choices of parameters during annotation influences the final results. In this work, we systematically test how several key decisions affect the accuracy and speed of annotation, and based on these results, develop new software for annotation, which we named ShotMAP. We then use ShotMAP to functionally characterize marine communities and gut communities in a clinical cohort of inflammatory bowel disease. We find several functions are differentially represented in the gut microbiome of Crohn’s disease patients, which could be candidates for biomarkers and could also offer insight into the pathophysiology of Crohn’s. ShotMAP is freely available (https://github.com/sharpton/shotmap).

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Back to top
PreviousNext
Posted July 10, 2015.
Download PDF

Supplementary Material

Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Automated and accurate estimation of gene family abundance from shotgun metagenomes
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Automated and accurate estimation of gene family abundance from shotgun metagenomes
Stephen Nayfach, Patrick H. Bradley, Stacia K. Wyman, Timothy J. Laurent, Alex Williams, Jonathan A. Eisen, Katherine S. Pollard, Thomas J. Sharpton
bioRxiv 022335; doi: https://doi.org/10.1101/022335
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
Automated and accurate estimation of gene family abundance from shotgun metagenomes
Stephen Nayfach, Patrick H. Bradley, Stacia K. Wyman, Timothy J. Laurent, Alex Williams, Jonathan A. Eisen, Katherine S. Pollard, Thomas J. Sharpton
bioRxiv 022335; doi: https://doi.org/10.1101/022335

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One
Subject Areas
All Articles
  • Animal Behavior and Cognition (4113)
  • Biochemistry (8815)
  • Bioengineering (6518)
  • Bioinformatics (23462)
  • Biophysics (11789)
  • Cancer Biology (9208)
  • Cell Biology (13322)
  • Clinical Trials (138)
  • Developmental Biology (7436)
  • Ecology (11409)
  • Epidemiology (2066)
  • Evolutionary Biology (15150)
  • Genetics (10436)
  • Genomics (14043)
  • Immunology (9171)
  • Microbiology (22154)
  • Molecular Biology (8812)
  • Neuroscience (47568)
  • Paleontology (350)
  • Pathology (1428)
  • Pharmacology and Toxicology (2491)
  • Physiology (3730)
  • Plant Biology (8080)
  • Scientific Communication and Education (1437)
  • Synthetic Biology (2221)
  • Systems Biology (6037)
  • Zoology (1253)