TY  - JOUR
T1  - Deconvolute individual genomes from metagenome sequences through read clustering
JF  - bioRxiv
DO  - 10.1101/620666
SP  - 620666
AU  - Kexue Li
AU  - Lili Wang
AU  - Lizhen Shi
AU  - Li Deng
AU  - Zhong Wang
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/04/29/620666.abstract
N2  - Motivation: Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false-negative (under-clustering) or false-positive (over-clustering) problems. Results: Building on SpaRC, a previously developed scalable read clustering method on Apache Spark that has very low false positives, we extended its capability by adding a new method to further cluster small clusters. This method exploits statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using a synthetic dataset from mouse gut microbiomes, we show that this method has the potential to cluster almost all of the reads from genomes with sufficient sequencing coverage. We also explored several clustering parameters that differentially affect genomes with various sequencing coverage. Availability: https://bitbucket.org/berkeleylab/jgi-sparc/. Contact: zhongwang{at}lbl.gov
ER  - 