High throughput sequencing methods and analysis for microbiome research

https://doi.org/10.1016/j.mimet.2013.08.011

Highlights

  • We discuss bioinformatic tools applied to microbiomes.

  • Amplicon, metagenome shotgun and metatranscriptome sequences

  • Stimulate overcoming problems with sensitivity, specificity and interpretation

  • Elucidate how bacteria interact with each other and their environment

  • A new paradigm of microbiology and the organisms that make up most life on Earth

Abstract

High-throughput sequencing technology is rapidly improving in quality, speed and cost. It is therefore becoming more widely used to study whole communities of prokaryotes in many niches. This review discusses these techniques, including nucleic acid extraction from different environments, sample preparation and high-throughput sequencing platforms. We also discuss commonly used and recently developed bioinformatic tools applied to microbiomes, including the analysis of amplicon sequences, metagenome shotgun sequences and metatranscriptome sequences. This field is relatively new and rapidly evolving, so we hope that this review will provide a baseline for understanding these methods of microbiome analysis. Additionally, we seek to stimulate others to solve the many problems that still exist with the sensitivity, specificity and interpretation of high-throughput microbiome sequence analysis.

Introduction

Our world is dominated by prokaryotes. The total number of microbial cells on Earth is estimated to be 10^30 (Turnbaugh and Gordon, 2008) and in the human body alone, there are up to 100 trillion organisms, which approximately equates to ten times the number of our own human cells (Savage, 1977). There are millions of prokaryotic species, though most have not yet been cultivated (Amann et al., 1995). It is likely that the genes of these species encode numerous enzymes and metabolic capabilities that have not previously been described. In the human body, bacteria play important roles in modulating digestive, endocrine and immune functions. With the advent of more recent culture-independent, sequencing-based methods, the composition and diversity of the human microbiome are being uncovered.

The earliest direct cloning of environmental microbial DNA was proposed by Lane et al. (1985), while the term ‘metagenome’ was proposed by Handelsman et al. (1998) to describe “the genomes of the total microbiota found in nature”; that is, the whole collection of genomic information of all microorganisms in a given environment. With the advancement of technologies such as sequence- and function-based gene screening, high-throughput sequencing and metatranscriptomics, considerable insight has been gained into microbiomes, including those associated with human health and disease (Hess et al., 2011; Qin et al., 2010). In this review, we aim to describe methods for metagenomic and metatranscriptomic studies in the context of the microbiome, and discuss progress and future steps in the field.

Considerations for study design

Three major types of experimental approaches will be discussed in this review. In amplicon sequencing, a particular gene, gene fragment or sequence is amplified and its sequence determined. This usually targets highly conserved genes, such as segments of the 16S rRNA gene, in order to determine which organisms are in a sample and how the organisms present differ between environments. In metagenome sequencing, the entire DNA in a sample is sequenced to determine which genes are present in the sample

DNA extraction techniques

Microbial genomic DNA extraction and purification serves as the first step of library preparation. Since most sequencing protocols require between nanograms and micrograms of DNA, efficient DNA isolation and purification is critical for downstream sequencing.

Cell lysis and subsequent DNA extraction from certain microbes, especially those living in extreme environments, can be difficult, as these organisms have rigid cell wall structures and also release stable nucleases upon cell lysis.

High throughput sequencing methodologies

The first-generation DNA sequencing technology was developed by Sanger et al. (1977) based on the selective incorporation of chain-terminating dideoxynucleotides, and the first automatic sequencing machine (AB370) was produced by Applied Biosystems in 1987 (Liu et al., 2012). Sanger sequencing was used to complete the first bacterial genome sequence in 1995 (Fleischmann et al., 1995), and constituted the main part of the Human Genome Project in 2001 (Collins et al., 2003), which in turn

Common applications of high-throughput sequencing and bioinformatic tools

High-throughput sequencing techniques produce massive amounts of data, so drawing any useful conclusions requires computational analysis of the information. In this section, we discuss the most commonly used methods and tools for targeted amplicon sequencing analysis, shotgun metagenome analysis, and metatranscriptome analysis, and provide some examples of emerging technologies. Fig. 1 summarizes the steps of bioinformatic analyses for each of these experiments.
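As one illustration of the kind of computation involved in amplicon analysis, the sketch below turns per-read taxon labels into a relative-abundance profile and a Shannon diversity value. The function names, genus labels and read counts are hypothetical and chosen only for illustration, and the use of Shannon diversity is an assumption made here rather than a step taken from the workflow in Fig. 1.

```python
import math
from collections import Counter


def relative_abundance(assignments):
    """Convert a list of per-read taxon labels into {taxon: proportion}."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(proportions):
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    return -sum(p * math.log(p) for p in proportions.values() if p > 0)


# Hypothetical genus-level assignments for the reads of one sample
# (illustrative data only, not taken from the review).
reads = ["Lactobacillus"] * 6 + ["Gardnerella"] * 3 + ["Prevotella"] * 1

profile = relative_abundance(reads)
diversity = shannon_index(profile)

for taxon, share in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}: {share:.2f}")
print(f"Shannon index: {diversity:.3f}")
```

In practice the taxon labels would come from an upstream classification step applied to quality-filtered reads; the sketch starts after that step so that it stays self-contained.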

Perspectives and conclusions

High-throughput sequencing technologies have improved in output and quality, and have become an indispensable tool for an increasingly wide variety of experiments, including in phylogenetic, diagnostic, and ecological contexts. Through these tools, we can gain insight into the composition, activities and dynamics of a wide variety of microbiomes, helping to elucidate how bacteria interact with each other and their environment.

Updates to laboratory and computational tools are ongoing. In the

Acknowledgement

This work was partly funded through the CIHR Vogue team grant.

References (196)

  • R.I. Amann et al. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. (1995)
  • M.J. Anderson. A new method for non-parametric multivariate analysis of variance. Aust. Ecol. (2001)
  • M. Arumugam et al. SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics (2010)
  • A. Bateman et al. The Pfam protein families database. Nucleic Acids Res. (2004)
  • S. Bennett. Solexa Ltd. Pharmacogenomics (2004)
  • Berka, J., Chen, Y.-J., Leamon, J.H., Lefkowitz, S., Lohman, K.L., Makhijani, V.B., Rothberg, J.M., Sarkis, G.J.,...
  • J. Bowers et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nat. Methods (2009)
  • A. Brady et al. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat. Methods (2009)
  • D. Branton et al. The potential and challenges of nanopore sequencing. Nat. Biotechnol. (2008)
  • Y. Cai et al. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Res. (2011)
  • J.G. Caporaso et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods (2010)
  • N. Caruccio. Preparation of next-generation sequencing libraries using Nextera technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition. Methods Mol. Biol. (2011)
  • A. Chao. Nonparametric estimation of the number of classes in a population. Scand. J. Stat. (1984)
  • A. Chao et al. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. (1992)
  • A. Chao et al. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics (2006)
  • L. Chen. Statistical and computational methods for high-throughput sequencing data analysis of alternative splicing. Stat. Biosci. (2012)
  • L. Cheng et al. Bayesian estimation of bacterial community composition from 454 sequencing data. Nucleic Acids Res. (2012)
  • J. Chun et al. The analysis of oral microbial communities of wild-type and toll-like receptor 2-deficient mice using a 454 GS FLX Titanium pyrosequencer. BMC Microbiol. (2010)
  • H. Cieslinski et al. Identification and molecular modeling of a novel lipase from an Antarctic soil metagenomic library. Pol. J. Microbiol. (2009)
  • M.J. Claesson et al. Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Res. (2010)
  • T.A. Clark et al. Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation. BMC Biol. (2013)
  • K.R. Clarke. Non-parametric multivariate analyses of changes in community structure. Aust. J. Ecol. (1993)
  • J.R. Cole et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. (2009)
  • F.S. Collins et al. The Human Genome Project: lessons from large-scale biology. Science (2003)
  • H.C. den Bakker et al. Comparative genomics of the bacterial genus Listeria: genome evolution is characterized by limited gene acquisition and limited gene loss. BMC Genomics (2010)
  • T.Z. DeSantis et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. (2006)
  • N.N. Diaz et al. TACOA - taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinforma. (2009)
  • Q. Dong et al. The microbial communities in male first catch urine are highly similar to those in paired urethral swab specimens. PLoS One (2011)
  • C.J. Duan et al. Isolation and partial characterization of novel genes encoding acidic cellulases from metagenomes of buffalo rumens. J. Appl. Microbiol. (2009)
  • R.C. Edgar. Search and clustering orders of magnitude faster than BLAST. Bioinformatics (2010)
  • R.C. Edgar et al. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics (2011)
  • J.W. Efcavitch et al. Single-molecule DNA analysis. Annu. Rev. Anal. Chem. (Palo Alto Calif.) (2010)
  • J.J. Egozcue et al. Isometric logratio transformations for compositional data analysis. Math. Geol. (2003)
  • J. Eid et al. Real-time DNA sequencing from single polymerase molecules. Science (2009)
  • M. Eisenstein. Oxford Nanopore announcement sets sequencing sector abuzz. Nat. Biotechnol. (2012)
  • A. El Allali et al. MGC: a metagenomic gene caller. BMC Bioinforma. (2013)
  • S. Engen et al. Estimating similarity of communities: a parametric approach to spatio-temporal analysis of species diversity. Ecography (2011)
  • K. Faust et al. Microbial co-occurrence relationships in the human microbiome. PLoS Comput. Biol. (2012)
  • A.D. Fernandes et al. ANOVA-like differential expression (ALDEx) analysis for mixed-population RNA-Seq. PLoS One (2013)
  • M. Ferrer et al. Metagenomics for mining new genetic resources of microbial communities. J. Mol. Microbiol. Biotechnol. (2009)
1 These authors contributed equally to this work.
