RT Journal Article SR Electronic T1 Ultra-accurate Microbial Amplicon Sequencing Directly from Complex Samples with Synthetic Long Reads JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.07.07.192286 DO 10.1101/2020.07.07.192286 A1 Benjamin J Callahan A1 Dmitry Grinevich A1 Siddhartha Thakur A1 Michael A Balamotis A1 Tuval Ben Yehezkel YR 2020 UL http://biorxiv.org/content/early/2020/07/07/2020.07.07.192286.abstract AB Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing technology. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Here, we describe and analytically validate LoopSeq, a commercially-available synthetic long-read (SLR) sequencing technology that generates highly-accurate long reads from standard short reads. LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq applied to full-length 16S rRNA genes from known strains in a microbial community perfectly recovered the full diversity of full-length exact sequence variants in a known microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kilobases in length. Analysis of rinsate from retail meat samples demonstrated that LoopSeq full-length 16S rRNA synthetic long-reads could accurately classify organisms down to the species level, and could differentiate between different strains within species identified by the CDC as potential foodborne pathogens. The order-of-magnitude improvement in both length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex and low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.Competing Interest StatementMichael Balamotis and Tuval Ben Yehezkel are employees of Loop Genomics, the vendor for the synthetic long-read sequencing technology analyzed in this manuscript.