RT Journal Article
SR Electronic
T1 Benchmarking metagenomic classification tools for long-read sequencing data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.11.25.397729
DO 10.1101/2020.11.25.397729
A1 Josip Marić
A1 Krešimir Križanović
A1 Sylvain Riondet
A1 Niranjan Nagarajan
A1 Mile Šikić
YR 2021
UL http://biorxiv.org/content/early/2021/08/18/2020.11.25.397729.abstract
AB We performed a comprehensive assessment of metagenomic classification tools on long sequenced reads. In addition to well-defined mock communities, we prepared various synthetic datasets to simulate real-life scenarios. The results show that off-the-shelf mappers such as Minimap2 or Ram are at least comparable with mapping-based classification tools in most accuracy measures while not being much slower than k-mer-based tools and requiring equal or less RAM. The majority of tested tools are prone to reporting organisms not present in the datasets and underperform when a large amount of the host's genetic material is present. Furthermore, longer read lengths make classification easier, but because read length distributions differ among species, using only the longest reads reduces accuracy. Finally, evaluation on a mock community shows the importance of careful isolation of genetic material and sequencing preparation. Availability and implementation: Python scripts used to generate all figures and tables in this study, and all supplementary texts and figures, are available via the GitHub repository https://github.com/lbcb-sci/MetagenomicsBenchmark. Datasets, supporting files, analysis results and reports are available via the Zenodo repository https://doi.org/10.5281/zenodo.5203182. Competing Interest Statement: The authors have declared no competing interest.