Abstract
Coherent genomic groups are frequently used as a proxy for bacterial species delineation through computation of overall genome relatedness indices (OGRI). Average nucleotide identity (ANI) is a widely employed method for estimating relatedness between genomic sequences. However, pairwise comparisons of genome sequences based on ANI are relatively computationally intensive and therefore preclude analyses of large datasets composed of thousands of genome sequences.
In this work, we evaluated an alternative OGRI based on k-mer counts to study prokaryotic species delimitation. A dataset containing more than 3,500 Pseudomonas genome sequences was successfully classified in a few hours with the same precision as ANI. A new visualization method based on zoomable circle packing was employed to assess relationships among the 350 cliques generated. Amending databases with these Pseudomonas cliques greatly improved the classification of metagenomic read sets with k-mer-based classifiers.
The developed workflow was integrated into the user-friendly KI-S tool, which is available at https://iris.angers.inra.fr/galaxypub-cfbp.
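
To illustrate the general idea of a k-mer-based relatedness index, the following minimal Python sketch computes the fraction of k-mers shared between two sequences. It is not the KI-S implementation: the abstract does not state the exact formula, so normalizing by the smaller k-mer set is an assumption, the function names `kmer_set` and `shared_kmer_fraction` are hypothetical, and reverse-complement strands are ignored for brevity.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers (length-k substrings) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(seq_a, seq_b, k):
    """Fraction of k-mers shared by two sequences.

    Assumption: the shared count is divided by the size of the smaller
    k-mer set; the actual index used by the authors may differ.
    """
    kmers_a = kmer_set(seq_a, k)
    kmers_b = kmer_set(seq_b, k)
    if not kmers_a or not kmers_b:
        # A sequence shorter than k has no k-mers to compare.
        return 0.0
    shared = len(kmers_a & kmers_b)
    return shared / min(len(kmers_a), len(kmers_b))


if __name__ == "__main__":
    # Toy sequences only; real genomes would use a much larger k.
    print(shared_kmer_fraction("ACGTAC", "ACGTTT", 3))
```

In a species-delimitation setting, such pairwise values could be thresholded to build a graph whose cliques correspond to candidate genomic groups; the threshold and the value of k would have to be calibrated, for example against ANI.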