TY - JOUR T1 - The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data JF - bioRxiv DO - 10.1101/565598 SP - 565598 AU - Marc A Sze AU - Patrick D Schloss Y1 - 2019/01/01 UR - http://biorxiv.org/content/early/2019/03/04/565598.abstract N2 - PCR amplification of 16S rRNA genes is a critical, yet under appreciated step in the generation of sequence data to describe the taxonomic composition of microbial communities. Numerous factors in the design of PCR can impact the sequencing error rate, the abundance of chimeric sequences, and the degree to which the fragments in the product represent their abundance in the original sample (i.e. bias). We compared the performance of high fidelity polymerases and varying number of rounds of amplification when amplifying a mock community and human stool samples. Although it was impossible to derive specific recommendations, we did observe general trends. Namely, using a polymerase with the highest possible fidelity and minimizing the number of rounds of PCR reduced the sequencing error rate, fraction of chimeric sequences, and bias. Evidence of bias at the sequence level was subtle and could not be ascribed to the fragments’ fraction of bases that were guanines or cytosines. When analyzing mock community data, the amount that the community deviated from the expected composition increased with rounds of PCR. This bias was inconsistent for human stool samples. Overall the results underscore the difficulty of comparing sequence data that are generated by different PCR protocols. However, the results indicate that the variation in human stool samples is generally larger than that introduced by the choice of polymerase or number of rounds of PCR.Importance A steep decline in sequencing costs drove an explosion in studies characterizing microbial communities from diverse environments. Although a significant amount of effort has gone into understanding the error profiles of DNA sequencers, little has been done to understand the downstream effects of the PCR amplification protocol. We quantified the effects of the choice of polymerase and number of PCR cycles on the quality of downstream data. We found that these choices can have a profound impact on the way that a microbial community is represented in the sequence data. The effects are relatively small compared to the variation in human stool samples, however, care should be taken to use polymerases with the highest possible fidelity and to minimize the number of rounds of PCR. These results also underscore that it is not possible to directly compare sequence data generated under different PCR conditions. ER -