Abstract
Background Fast, accurate and high-throughput detection of bacteria is in great demand. This work investigates whether both known and unknown bacterial strains can be identified from unassembled next-generation sequencing reads using custom-made guide trees.
Results We developed StrainSeeker, a program that constructs a list of specific k-mers for each node of any given Newick-format tree and enables identification of bacterial genomes within minutes. In tests, StrainSeeker identified Escherichia coli strains from mixed samples in less than 5 minutes. StrainSeeker can also identify bacterial strains from highly diverse metagenomic samples. StrainSeeker is available at http://bioinfo.ut.ee/strainseeker.
Conclusions Because novel bacterial strains are constantly emerging and must be detected quickly and accurately, our approach can be useful for both clinical diagnostics and research laboratories.
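The idea of node-specific k-mer lists mentioned in the Results can be illustrated with a small sketch. It is not the StrainSeeker implementation: it assumes, for illustration only, that a k-mer counts as specific to a node when it occurs in every genome below that node and in no genome outside it. The nested tuple stands in for an already parsed Newick tree such as `((A,B),C);`, and the function names, toy sequences and k = 3 are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees
# (a stand-in for a parsed Newick tree; no Newick parsing is done here).
Tree = Union[str, Tuple["Tree", ...]]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the names of all leaves below (or equal to) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def node_specific_kmers(
    tree: Tree, genomes: Dict[str, str], k: int
) -> Dict[FrozenSet[str], Set[str]]:
    """Map each node (identified by its leaf set) to its specific k-mers.

    Assumed criterion: present in every genome under the node,
    absent from every genome outside it.
    """
    leaf_kmers = {name: kmers(seq, k) for name, seq in genomes.items()}
    all_leaves = frozenset(genomes)
    result: Dict[FrozenSet[str], Set[str]] = {}

    def visit(node: Tree) -> None:
        inside = leaves(node)
        shared = set.intersection(*(leaf_kmers[n] for n in inside))
        outside = set().union(*(leaf_kmers[n] for n in all_leaves - inside))
        result[inside] = shared - outside
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return result


if __name__ == "__main__":
    toy_genomes = {"A": "ACGTAC", "B": "ACGTTT", "C": "GGGTTT"}
    toy_tree = (("A", "B"), "C")
    for node, specific in node_specific_kmers(toy_tree, toy_genomes, 3).items():
        print(sorted(node), sorted(specific))
```

In this toy example the internal node above A and B receives the k-mers the two genomes share and C lacks, whereas leaf B receives none because all of its k-mers also occur in A or C. A read set containing k-mers from an internal node's list but not from any leaf below it would, under this simplified criterion, point to a strain related to that clade but absent from the tree.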
List of abbreviations
- bp: base pair
- NCBI: National Center for Biotechnology Information
- MLST: multi-locus sequence typing
- SRA: Sequence Read Archive
- WGS: whole-genome sequencing
- UPGMA: unweighted pair group method with arithmetic mean