RT Journal Article
SR Electronic
T1 Accurate Determination of Bacterial Abundances in Human Metagenomes Using Full-length 16S Sequencing Reads
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 228619
DO 10.1101/228619
A1 Fanny Perraudeau
A1 Sandrine Dudoit
A1 James H. Bullard
YR 2017
UL http://biorxiv.org/content/early/2017/12/04/228619.abstract
AB DNA sequencing of PCR-amplified marker genes, especially, but not limited to, the 16S rRNA gene, is perhaps the most common approach for profiling microbial communities. Because of the technological constraints of commonly available DNA sequencing platforms, these approaches usually take the form of short reads sequenced from a narrow, targeted variable region, with a corresponding loss of taxonomic resolution relative to the full-length marker gene. We use Pacific Biosciences single-molecule, real-time circular consensus sequencing to sequence amplicons spanning the entire length of the 16S rRNA gene. However, this technology suffers from a high sequencing error rate, which must be addressed to take full advantage of the longer sequences. Here, we present a method that models the sequencing error process with a generalized pair hidden Markov chain model and uses it to estimate bacterial abundances in microbial samples. We demonstrate, with simulated and real data, that our model and its associated estimation procedure give accurate estimates at the species (or subspecies) level and are more flexible than existing methods such as SImple Non-Bayesian TAXonomy (SINTAX).