TY  - JOUR
T1  - An image processing method for metagenomic binning: multi-resolution genomic binary patterns
JF  - bioRxiv
DO  - 10.1101/096719
SP  - 096719
AU  - Samaneh Kouchaki
AU  - Avraam Tapinos
AU  - David L Robertson
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/04/10/096719.abstract
N2  - Bioinformatics methods typically use textual representations of genetic information, represented computationally as strings or sub-strings of the characters A, T, G and C. Image processing methods offer a rich source of alternative descriptors as they are designed to work in the presence of noisy data without the need for exact matching. We introduce a method from image processing, multi-resolution local binary patterns (MLBP), to extract local ‘texture’ changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. Sequence reads or contigs can then be represented as vectors and their ‘texture’ compared efficiently using state-of-the-art machine learning algorithms, which perform dimensionality reduction to capture eigengenome information and then clustering (here using RSVD and BH-tSNE). We demonstrate that this approach outperforms existing methods based on k-mer frequency. The image processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at https://github.com/skouchaki/MrGBP.
ER  - 