bioRxiv
New Results

Pre- and post-sequencing recommendations for functional annotation of human fecal metagenomes

Michelle L. Treiber, Diana H. Taft, Ian Korf, David A. Mills, Danielle G. Lemay
doi: https://doi.org/10.1101/760207
Michelle L. Treiber
USDA ARS Western Human Nutrition Research Center, Davis, CA 95616; Department of Food Science and Technology, Robert Mondavi Institute for Wine and Food Science, University of California, Davis, One Shields Ave., Davis, CA 95616
Diana H. Taft
Department of Food Science and Technology, Robert Mondavi Institute for Wine and Food Science, University of California, Davis, One Shields Ave., Davis, CA 95616
Ian Korf
Genome Center, University of California, Davis, California 95616
David A. Mills
Department of Food Science and Technology, Robert Mondavi Institute for Wine and Food Science, University of California, Davis, One Shields Ave., Davis, CA 95616
Danielle G. Lemay
USDA ARS Western Human Nutrition Research Center, Davis, CA 95616; Genome Center, University of California, Davis, California 95616; Department of Nutrition, University of California, Davis, California 95616
For correspondence: Danielle.Lemay@usda.gov

Abstract

Background Shotgun metagenomes are often assembled prior to gene annotation, which biases estimates of a community's functional capacity towards its most abundant members. For an unbiased assessment of community function, short reads need to be mapped directly to a gene or protein database. The ability to detect genes in short-read sequences depends on pre- and post-sequencing decisions. The objective of the current study was to determine how library size selection, read length and format, protein database, e-value threshold, and sequencing depth affect gene-centric analysis of human fecal microbiomes when using DIAMOND, an alignment tool that is up to 20,000 times faster than BLASTX.
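Mapping short reads directly to a protein database, as described above, typically ends with a post-processing step that turns DIAMOND alignments into per-gene read counts. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes DIAMOND was run in blastx mode with the default 12-column tabular output (`--outfmt 6`), keeps only the highest-bitscore hit per read among hits passing a chosen e-value threshold, and tallies reads per target protein. The function names `best_hits` and `count_targets` are hypothetical. The same threshold can instead be applied at alignment time with DIAMOND's `--evalue` option (default 0.001).

```python
# Minimal sketch (hypothetical helpers): per-target read counts from DIAMOND
# blastx output. Assumes the default 12 columns of `--outfmt 6`:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def best_hits(lines: Iterable[str], max_evalue: float) -> dict[str, tuple[str, float]]:
    """Keep the highest-bitscore hit per read among hits with evalue <= max_evalue."""
    best: dict[str, tuple[str, float]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        read_id, target_id = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit fails the e-value threshold
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (target_id, bitscore)
    return best


def count_targets(lines: Iterable[str], max_evalue: float) -> Counter[str]:
    """Count reads whose best passing hit lands on each target protein."""
    return Counter(target for target, _ in best_hits(lines, max_evalue).values())


if __name__ == "__main__":
    example = [
        "read1\tgeneA\t98.0\t60\t1\t0\t1\t180\t1\t60\t1e-30\t120.0",
        "read1\tgeneB\t70.0\t60\t18\t0\t1\t180\t1\t60\t1e-10\t60.0",
        "read2\tgeneB\t90.0\t50\t5\t0\t1\t150\t1\t50\t1e-3\t40.0",
    ]
    # read1 keeps its higher-bitscore hit (geneA); read2 fails 1e-5.
    print(count_targets(example, max_evalue=1e-5))  # → Counter({'geneA': 1})
```

Tightening `max_evalue` removes weak hits before counting, which is the post-sequencing lever the study evaluates; loosening it to 1e-2 in this toy input would also count `read2` towards `geneB`.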

Results Using metagenomes simulated from a database of experimentally verified protein sequences, we found that read length, e-value threshold, and the choice of protein database dramatically affect detection of a known target, with the best performance achieved using longer reads, stricter e-value thresholds, and a custom database. Using publicly available metagenomes, we evaluated library size selection, paired-end read strategy, and sequencing depth. Longer reads were achievable by merging paired ends when the sequencing library was size-selected so that the two ends overlap. When paired ends could not be merged, a congruent strategy in which both ends are mapped independently was acceptable. A sequencing depth of 5 million merged reads minimized the error of abundance estimates for specific target genes, including an antimicrobial resistance gene.
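Merging is possible only when the sequenced fragment is shorter than the two reads combined, so that the ends overlap. With PE150 reads, a fragment of length F yields an overlap of 2 × 150 - F bases (for F of at least 150 bp); fragments of 180-250 bp therefore overlap by 120 down to 50 bases. The sketch below is a simplified, hypothetical merger illustrating the idea behind dedicated tools such as FLASH or PEAR. It is not the merging procedure used in the study: it ignores base qualities, assumes uppercase sequences, and assumes each fragment is at least as long as one read. `merge_pair`, `min_overlap`, and `max_mismatch_frac` are illustrative names and defaults.

```python
# Simplified overlap-based merging of a read pair (illustrative only).
# Assumes R2 comes from the opposite strand, so its reverse complement is
# aligned against the 3' end of R1. Base qualities are ignored.
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(
    r1: str, r2: str, min_overlap: int = 20, max_mismatch_frac: float = 0.1
) -> str | None:
    """Return the merged sequence, or None if no acceptable overlap exists."""
    r2_rc = reverse_complement(r2)
    max_overlap = min(len(r1), len(r2_rc))
    # Try the longest overlap first and accept the first one that passes
    # the mismatch-fraction filter.
    for overlap in range(max_overlap, min_overlap - 1, -1):
        tail = r1[len(r1) - overlap:]
        head = r2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            # R1 supplies the overlapping bases; R2 extends past R1's end.
            return r1 + r2_rc[overlap:]
    return None


if __name__ == "__main__":
    import random

    random.seed(1)
    fragment = "".join(random.choice("ACGT") for _ in range(220))  # 220 bp insert
    r1 = fragment[:150]
    r2 = reverse_complement(fragment)[:150]  # R2 reads the opposite strand
    merged = merge_pair(r1, r2)
    # A correct merge reconstructs the 220 bp fragment (80 bp overlap).
    print(len(merged) if merged else "not merged")
```

Pairs from fragments longer than about 280 bp (an overlap under the assumed 20-base minimum) return `None`; those are the pairs that would fall back to mapping each end independently.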

Conclusions Shotgun metagenomes of DNA extracted from human fecal samples and sequenced on the Illumina platform should be size-selected to enable merging of paired-end reads, and should be sequenced in the PE150 format to a minimum depth of 5 million mergeable reads to enable detection of specific target genes. Because the merged reads are expected to be 180-250 bp long, the e-value threshold for DIAMOND should be stricter than the default. Accurate and interpretable results for specific hypotheses are best obtained using small databases customized for the research question.
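The interaction between read length, database size, and e-value threshold can be illustrated with the Karlin-Altschul statistics used by BLAST. This is offered as a sketch under an assumption: DIAMOND reports e-values on a comparable scale, but its exact calculation (for example, effective-length corrections) may differ. For an alignment with bit score $S'$ between a query of length $m$ and a database of total length $n$ residues, with threshold $E_{\max}$:

```latex
E \approx m\,n\,2^{-S'}
\qquad\Longrightarrow\qquad
E \le E_{\max} \iff S' \ge \log_2\!\left(\frac{m\,n}{E_{\max}}\right)
```

Two consequences follow. First, $E$ scales linearly with $n$, so the same alignment scored against a database 100 times smaller receives an e-value 100 times smaller, one reason small custom databases favor detection. Second, a longer merged read can align over more residues and accumulate a higher $S'$; because $S'$ enters exponentially while $m$ enters only linearly, true hits from longer reads can still clear a stricter threshold than the default.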

Footnotes

  • https://github.com/mltreiber/functional_metagenomics

  • List of abbreviations

    DIAMOND
    a BLAST-like aligner that maps translated nucleotide (or protein) queries against a protein database, with greatly increased annotation speed over traditional BLAST
    MG-RAST
    the MetaGenomics Rapid Annotation using Subsystems Technology server, a public analysis pipeline for handling metagenome and metatranscriptome datasets
    PE
    paired-end read
    SAMSA2
    Simple Analysis of Metatranscriptomes through Sequence Annotation version 2, a standalone metatranscriptomics analysis pipeline
    SEED
    a protein database, created by the Fellowship for Interpretation of Genomes (FIG), that groups sequences into hierarchical categories
    SR
    single read
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
    Posted September 08, 2019.
    Michelle L. Treiber, Diana H. Taft, Ian Korf, David A. Mills, Danielle G. Lemay
    bioRxiv 760207; doi: https://doi.org/10.1101/760207
    Subject Area: Bioinformatics