Groundtruthing next-gen sequencing for microbial ecology: biases and errors in community structure estimates from PCR amplicon pyrosequencing

PLoS One. 2012;7(9):e44224. doi: 10.1371/journal.pone.0044224. Epub 2012 Sep 6.

Abstract

Analysis of microbial communities by high-throughput pyrosequencing of SSU rRNA gene PCR amplicons has transformed microbial ecology research and led to the observation that many communities contain a diverse assortment of rare taxa, a phenomenon termed the Rare Biosphere. Multiple studies have investigated the effect of pyrosequencing read quality on operational taxonomic unit (OTU) richness for contrived communities, yet there is limited information on the fidelity of community structure estimates obtained through this approach. Given that PCR biases are widely recognized, and further unknown biases may arise from the sequencing process itself, a priori assumptions about the neutrality of the data generation process are at best unvalidated. Furthermore, post-sequencing quality control algorithms have not been explicitly evaluated for the accuracy of recovered representative sequences and its impact on downstream analyses, reducing useful discussion on pyrosequencing reads to their diversity and abundances. Here we report on community structures and sequences recovered for in vitro-simulated communities consisting of twenty 16S rRNA gene clones tiered at known proportions. PCR amplicon libraries of the V3-V4 and V6 hypervariable regions from the in vitro-simulated communities were sequenced using the Roche 454 GS FLX Titanium platform. Commonly used quality control protocols resulted in the formation of OTUs with >1% abundance composed entirely of erroneous sequences, while over-aggressive clustering approaches obfuscated real, expected OTUs. The pyrosequencing process itself did not appear to impose significant biases on overall community structure estimates, although the detection limit for rare taxa may be affected by PCR amplicon size and quality control approach employed. Meanwhile, PCR biases associated with the initial amplicon generation may impose greater distortions in the observed community structure.
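
The comparison the abstract describes, observed OTU abundances set against a mock community of known proportions, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `compare_community`, the clone and OTU labels, and the read counts are all hypothetical and chosen only for illustration. The 1% cutoff mirrors the ">1% abundance" figure quoted above for OTUs made up entirely of erroneous sequences.

```python
import math


def compare_community(expected, observed_counts, spurious_threshold=0.01):
    """Compare observed OTU read counts with known mock-community proportions.

    expected: dict mapping OTU label -> known proportion (each > 0).
    observed_counts: dict mapping OTU label -> number of reads assigned.
    spurious_threshold: relative abundance above which an unexpected OTU
        is reported (assumed 1% here, echoing the abstract).
    """
    total = sum(observed_counts.values())
    if total <= 0:
        raise ValueError("observed_counts must contain at least one read")

    # Convert raw read counts into relative abundances.
    observed = {otu: n / total for otu, n in observed_counts.items()}

    # log2(observed / expected) for every expected OTU that was detected;
    # 0 means no distortion, positive means over-represented.
    log2_ratio = {
        otu: math.log2(observed[otu] / p)
        for otu, p in expected.items()
        if observed.get(otu, 0) > 0
    }

    # Expected members that received no reads at all.
    missed = sorted(otu for otu in expected if observed.get(otu, 0) == 0)

    # OTUs absent from the mock community yet above the abundance cutoff.
    spurious = sorted(
        otu
        for otu, frac in observed.items()
        if otu not in expected and frac > spurious_threshold
    )

    return {"log2_ratio": log2_ratio, "missed": missed, "spurious": spurious}


if __name__ == "__main__":
    # Hypothetical three-member mock community and made-up read counts.
    expected = {"clone_A": 0.5, "clone_B": 0.4, "clone_C": 0.1}
    counts = {"clone_A": 500, "clone_B": 450, "otu_x": 50}
    print(compare_community(expected, counts))
```

In this toy input, `otu_x` stands in for an OTU built from erroneous reads and `clone_C` for a rare member lost below the detection limit. A real evaluation would use the twenty-clone design and the read counts from each quality control protocol.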

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Bacteria / genetics*
  • Base Sequence
  • Bias*
  • Biota*
  • DNA Primers / metabolism
  • Polymerase Chain Reaction / methods*
  • RNA, Ribosomal, 16S / genetics
  • Sequence Analysis, DNA / methods*
  • Statistics, Nonparametric
  • Temperature*

Substances

  • DNA Primers
  • RNA, Ribosomal, 16S

Grants and funding

This research was supported by grants from the Office of Science (BER), U.S. Department of Energy (Cooperative Agreement No. De-FC02-02ER63453) and the National Science Foundation to KEW, SJW, and SCC (MCB-0731916) and SCC (ANT-0739648 and ANT-0229836). The New Zealand Marsden Fund provided financial support for CWH, IRM, SCC (UOW0802), and CKL (UOW1003). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.