Abstract
With the emergence of portable DNA sequencers, such as the Oxford Nanopore Technologies MinION, metagenomic DNA sequencing can be performed in real time and directly in the field. However, because metagenomic DNA analysis is computationally and memory intensive, and current methods are designed for batch processing, existing metagenomic tools are not well suited for mobile devices.
In this paper, we propose a new memory-efficient method to identify Operational Taxonomic Units (OTUs) in metagenomic DNA streams. Our method is based on finding connected components in overlap graphs constructed over a real-time stream of long DNA reads, as produced by the MinION platform. We propose an efficient algorithm to maintain connected components as the overlap graph is streamed, and show how redundant information can be removed from the stream using transitive closures. Through experiments on simulated and real-world metagenomic data, we demonstrate that the resulting solution recovers OTUs with high precision while remaining suitable for mobile computing devices.
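To illustrate the general idea of maintaining connected components over a streamed overlap graph, the following is a minimal sketch using a standard union-find (disjoint-set) structure. It is not the authors' algorithm: the class name `StreamingComponents`, the method `add_overlap`, and the read identifiers are hypothetical, and each overlap is assumed to arrive as a pair of read identifiers. An overlap whose endpoints already share a component adds no connectivity information, which is one simple sense in which a streamed edge can be treated as redundant.

```python
class StreamingComponents:
    """Union-find over read identifiers seen in a stream of overlaps.

    Illustrative sketch only; names and interface are assumptions,
    not the method described in the paper.
    """

    def __init__(self):
        self.parent = {}  # read id -> parent read id
        self.size = {}    # root read id -> component size

    def find(self, x):
        # Register previously unseen reads as singleton components.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # Locate the root of x's component.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def add_overlap(self, a, b):
        """Process one streamed overlap (edge) between reads a and b.

        Returns True if the edge merged two components, False if it was
        redundant for connectivity (both reads already connected).
        """
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

    def components(self):
        """Return the current components as a list of sets of read ids."""
        groups = {}
        for x in list(self.parent):
            groups.setdefault(self.find(x), set()).add(x)
        return list(groups.values())


# Hypothetical stream of overlaps between reads.
stream = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"), ("r4", "r5")]
cc = StreamingComponents()
kept = [edge for edge in stream if cc.add_overlap(*edge)]
```

In this sketch only the edges in `kept` change the component structure, so memory use is proportional to the number of reads seen rather than the number of overlaps. Each resulting component would then be a candidate taxonomic unit under the assumption that overlapping reads originate from the same organism.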
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
vickyzhe{at}buffalo.edu, erdem{at}buffalo.edu, jzola{at}buffalo.edu
ACM Reference Format: Vicky Zheng, Ahmet Erdem Sariyuce, and Jaroslaw Zola. 2020. Identifying Taxonomic Units in Metagenomic DNA Streams. In International Workshop on Data Mining in Bioinformatics (BIOKDD’20), August 22-27, 2020, San Diego, CA, USA. ACM, New York, NY, USA, 10 pages.