Motivation: Metagenomics is a powerful approach to study genetic content of environmental samples, which has been strongly promoted by next-generation sequencing technologies. To cope with massive data involved in modern metagenomic projects, recent tools rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes.
Results: Within this general framework, we show that spaced seeds provide a significant improvement of classification accuracy, as opposed to traditional contiguous k-mers. We support this thesis through a series of different computational experiments, including simulations of large-scale metagenomic projects.Availability and implementation, Supplementary information: Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.
Contact: gregory.kucherov@univ-mlv.fr.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.