Implications of error-prone long-read whole-genome shotgun sequencing on characterizing reference microbiomes

Yu Hu; Li Fang; Christopher Nicholson; Kai Wang

doi:10.1101/2020.03.05.978866

Summary

Long-read sequencing techniques, such as Oxford Nanopore sequencing, can generate reads that are tens of kilobases in length and are therefore particularly relevant for microbiome studies. However, because their per-base error rates are higher than those of typical short-read sequencing, the application of long-read sequencing to microbiomes remains largely unexplored. Here we deeply sequenced two human microbiota mock community samples (HM-276D and HM-277D) from the Human Microbiome Project. We showed that assembly programs consistently achieved high accuracy (~99%) and completeness (~99%) for bacterial strains with adequate coverage. We also found that long-read sequencing provides accurate estimates of species-level abundance (R=0.94 for 20 bacteria with abundance ranging from 0.005% to 64%). Our results demonstrate the feasibility of characterizing complete microbial genomes and populations from error-prone Nanopore sequencing data, but also highlight bioinformatics improvements that future metagenomics tools will need.
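
As an illustration of how agreement between expected and estimated species-level abundance can be summarized by a single correlation coefficient, the following is a minimal sketch. It assumes a Pearson correlation computed on log10-transformed percentages; the summary does not state which correlation or scale underlies the reported R, so both are assumptions. The function names and the species labels and values in the usage lines are hypothetical placeholders, not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log_abundance_correlation(expected, observed):
    """Correlate expected and observed abundances (in percent) on a log10 scale.

    Both arguments map a species label to its abundance. Only species present
    in both mappings are compared. The log scale is an assumption made here
    because the abundances span several orders of magnitude.
    """
    shared = sorted(set(expected) & set(observed))
    for species in shared:
        if expected[species] <= 0 or observed[species] <= 0:
            raise ValueError(f"non-positive abundance for {species}")
    xs = [math.log10(expected[s]) for s in shared]
    ys = [math.log10(observed[s]) for s in shared]
    return pearson_r(xs, ys)


# Hypothetical toy values for illustration only (not results from the paper).
expected = {"species_a": 64.0, "species_b": 6.4, "species_c": 0.64, "species_d": 0.005}
observed = {"species_a": 58.0, "species_b": 9.1, "species_c": 0.40, "species_d": 0.012}
r = log_abundance_correlation(expected, observed)
print(f"R = {r:.3f}")
```

Working on the log scale keeps the most abundant species from dominating the coefficient; a rank-based correlation would be a reasonable alternative under the same assumptions.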