High throughput sequencing methods and analysis for microbiome research
Introduction
Our world is dominated by prokaryotes. The total number of microbial cells on Earth is estimated to be 10^30 (Turnbaugh and Gordon, 2008), and the human body alone harbors up to 100 trillion organisms, roughly ten times the number of our own human cells (Savage, 1977). There are millions of prokaryotic species, though most have not yet been cultivated (Amann et al., 1995). The genes of these species likely encode numerous enzymes and metabolic capabilities that have not yet been described. In the human body, bacteria play important roles in modulating digestive, endocrine and immune functions. With the advent of culture-independent, sequencing-based methods, the composition and diversity of the human microbiome are being uncovered.
The earliest direct cloning of environmental microbial DNA was proposed by Lane et al. (1985), while the term ‘metagenome’ was proposed by Handelsman et al. (1998) to describe “the genomes of the total microbiota found in nature”; that is, the whole collection of genomic information of all microorganisms in a given environment. With the advancement of technologies such as sequence- and function-based gene screening, high-throughput sequencing and metatranscriptomics, considerable insight has been gained into microbiomes, including those associated with human health and disease (Hess et al., 2011, Qin et al., 2010). In this review, we aim to describe methods for metagenomic and metatranscriptomic studies in the context of the microbiome, and discuss progress and future steps in the field.
Considerations for study design
Three major types of experimental approaches will be discussed in this review. In amplicon sequencing, a particular gene, gene fragment or sequence is amplified and its sequence determined. This approach usually targets highly conserved genes, such as segments of the 16S rRNA gene, in order to determine which organisms are in a sample and how the organisms present differ with the environment. In metagenome sequencing, the entire DNA in a sample is sequenced to determine which genes are present in the sample
DNA extraction techniques
Microbial genomic DNA extraction and purification serve as the first step of library preparation. Since most sequencing protocols require between nanograms and micrograms of DNA, efficient DNA isolation and purification are critical for downstream sequencing.
Cell lysis and subsequent DNA extraction from certain microbes, especially those living in extreme environments, can be difficult, as these organisms have rigid cell wall structures and also release stable nucleases upon cell lysis.
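As a rough illustration of the input requirement mentioned above, the following minimal sketch computes the total DNA mass of an extract (concentration times elution volume) and compares it with a protocol's minimum input. The sample names, concentrations, volumes and the 100 ng threshold are hypothetical values chosen for illustration, not figures from this review or from any specific sequencing protocol.

```python
def dna_yield_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in nanograms: concentration (ng/uL) x volume (uL)."""
    if concentration_ng_per_ul < 0 or volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_ng_per_ul * volume_ul


def meets_input(yield_ng: float, required_ng: float) -> bool:
    """True if the extract contains at least the required DNA mass."""
    return yield_ng >= required_ng


# Hypothetical extracts: (concentration in ng/uL, elution volume in uL).
extracts = {
    "sample_A": (12.5, 50.0),
    "sample_B": (0.4, 30.0),
}

# Assumed minimum library input; real protocols state their own value.
REQUIRED_NG = 100.0

for name, (conc, vol) in extracts.items():
    total = dna_yield_ng(conc, vol)
    status = "sufficient" if meets_input(total, REQUIRED_NG) else "insufficient"
    print(f"{name}: {total:.1f} ng ({status} for a {REQUIRED_NG:.0f} ng input)")
```

A low-yield extract such as the second hypothetical sample would fall short of the assumed input, which is the situation in which extraction efficiency becomes the limiting step.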
High throughput sequencing methodologies
The first-generation DNA sequencing technology was developed by Sanger et al. (1977), based on the selective incorporation of chain-terminating dideoxynucleotides, and the first automatic sequencing machine (AB370) was produced by Applied Biosystems in 1987 (Liu et al., 2012). Sanger sequencing was used to complete the first bacterial genome sequence in 1995 (Fleischmann et al., 1995), and constituted the main part of the Human Genome Project in 2001 (Collins et al., 2003), which in turn
Common applications of high-throughput sequencing and bioinformatic tools
High-throughput sequencing techniques produce massive amounts of data, so computational analysis is necessary to draw useful conclusions. In this section, we discuss the most commonly used methods and tools for targeted amplicon sequencing analysis, shotgun metagenome analysis, and metatranscriptome analysis, and provide some examples of emerging technologies. Fig. 1 summarizes the steps of bioinformatic analyses for each of these experiments.
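To make the kind of computation involved concrete, here is a minimal sketch of one simple step in targeted amplicon analysis: converting per-sample taxon counts into relative abundances so that samples sequenced to different depths can be compared. It assumes reads have already been assigned to taxa by some upstream tool; the sample names, taxa and read assignments are hypothetical, and this is not the specific workflow of any tool discussed in this review.

```python
from collections import Counter


def relative_abundance(counts):
    """Convert raw taxon counts for one sample into fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical per-read taxon assignments (one label per read).
reads_by_sample = {
    "gut": ["Bacteroides", "Bacteroides", "Prevotella", "Lactobacillus"],
    "oral": ["Streptococcus", "Streptococcus", "Streptococcus", "Prevotella"],
}

# Tally reads per taxon, then normalize within each sample.
profiles = {
    sample: relative_abundance(Counter(labels))
    for sample, labels in reads_by_sample.items()
}

for sample, profile in profiles.items():
    for taxon, fraction in sorted(profile.items()):
        print(f"{sample}\t{taxon}\t{fraction:.2f}")
```

Because each profile sums to one, the values are proportions rather than absolute abundances, which is worth keeping in mind when choosing downstream statistics.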
Perspectives and conclusions
High-throughput sequencing technologies have improved in output and quality, and have become an indispensable tool for an increasingly wide variety of experiments, including in phylogenetic, diagnostic, and ecological contexts. Through these tools, we can gain insight into the composition, activities and dynamics of a wide variety of microbiomes, helping to elucidate how bacteria interact with each other and their environment.
Updates to laboratory and computational tools are ongoing. In the
Acknowledgement
This work was partly funded through the CIHR Vogue team grant.
References (196)
- et al.
Sequencing our way towards understanding global eukaryotic biodiversity
Trends Ecol. Evol.
(2012)
- et al.
Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products
Chem. Biol.
(1998)
- et al.
Evaluation of different partial 16S rRNA gene sequence regions for phylogenetic analysis of microbiomes
J. Microbiol. Methods
(2011)
- et al.
Real-time DNA sequencing from single polymerase molecules
Methods Enzymol.
(2010)
- et al.
Metagenomic study of the oral microbiota by Illumina high-throughput sequencing
J. Microbiol. Methods
(2009)
- et al.
Metabolic reconstruction for metagenomic data and its application to the human microbiome
PLoS Comput. Biol.
(2012)
- et al.
Environmental whole-genome amplification to access microbial populations in contaminated sediments
Appl. Environ. Microbiol.
(2006)
- et al.
Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons
J. Bacteriol.
(2004)
The statistical analysis of compositional data
J. R. Stat. Soc. B Methodol.
(1982)
- et al.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Res.
(1997)
Phylogenetic identification and in situ detection of individual microbial cells without cultivation
Microbiol. Rev.
A new method for non-parametric multivariate analysis of variance
Aust. Ecol.
SmashCommunity: a metagenomic annotation and analysis tool
Bioinformatics
The Pfam protein families database
Nucleic Acids Res.
Solexa Ltd
Pharmacogenomics
Virtual terminator nucleotides for next-generation DNA sequencing
Nat. Methods
Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models
Nat. Methods
The potential and challenges of nanopore sequencing
Nat. Biotechnol.
ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time
Nucleic Acids Res.
QIIME allows analysis of high-throughput community sequencing data
Nat. Methods
Preparation of next-generation sequencing libraries using Nextera technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition
Methods Mol. Biol.
Nonparametric estimation of the number of classes in a population
Scand. J. Stat.
Estimating the number of classes via sample coverage
J. Am. Stat. Assoc.
Abundance-based similarity indices and their estimation when there are unseen species in samples
Biometrics
Statistical and computational methods for high-throughput sequencing data analysis of alternative splicing
Stat. Biosci.
Bayesian estimation of bacterial community composition from 454 sequencing data
Nucleic Acids Res.
The analysis of oral microbial communities of wild-type and toll-like receptor 2-deficient mice using a 454 GS FLX Titanium pyrosequencer
BMC Microbiol.
Identification and molecular modeling of a novel lipase from an Antarctic soil metagenomic library
Pol. J. Microbiol.
Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions
Nucleic Acids Res.
Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation
BMC Biol.
Non‐parametric multivariate analyses of changes in community structure
Aust. J. Ecol.
The Ribosomal Database Project: improved alignments and new tools for rRNA analysis
Nucleic Acids Res.
The Human Genome Project: lessons from large-scale biology
Science
Comparative genomics of the bacterial genus Listeria: genome evolution is characterized by limited gene acquisition and limited gene loss
BMC Genomics
Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB
Appl. Environ. Microbiol.
TACOA—taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach
BMC Bioinforma.
The microbial communities in male first catch urine are highly similar to those in paired urethral swab specimens
PLoS One
Isolation and partial characterization of novel genes encoding acidic cellulases from metagenomes of buffalo rumens
J. Appl. Microbiol.
Search and clustering orders of magnitude faster than BLAST
Bioinformatics
UCHIME improves sensitivity and speed of chimera detection
Bioinformatics
Single-molecule DNA analysis
Annu. Rev. Anal. Chem. (Palo Alto Calif.)
Isometric logratio transformations for compositional data analysis
Math. Geol.
Real-time DNA sequencing from single polymerase molecules
Science
Oxford Nanopore announcement sets sequencing sector abuzz
Nat. Biotechnol.
MGC: a metagenomic gene caller
BMC Bioinforma.
Estimating similarity of communities: a parametric approach to spatio‐temporal analysis of species diversity
Ecography
Microbial co-occurrence relationships in the human microbiome
PLoS Comput. Biol.
ANOVA-like differential expression (ALDEx) analysis for mixed-population RNA-Seq
PLoS One
Metagenomics for mining new genetic resources of microbial communities
J. Mol. Microbiol. Biotechnol.
1 These authors contributed equally to this work.