Abstract
Summary The large and growing number of microbial genomes available in public databases makes it increasingly difficult to select optimal reference genomes for many in silico analyses, e.g. single nucleotide polymorphism detection, scaffolding and comparative genomics. Here, we present ReferenceSeeker, a novel command line tool that combines a fast k-mer profile-based database lookup of candidate reference genomes with the subsequent calculation of highly specific average nucleotide identity (ANI) values to rapidly determine appropriate reference genomes. Pre-built databases for bacteria, archaea, fungi, protozoa and viruses, based on the RefSeq database, are provided for download.
Availability and Implementation ReferenceSeeker is open-source software implemented in Python. Source code and binaries are freely available for download at https://github.com/oschwengers/referenceseeker under the GNU GPLv3 license.
Contact referenceseeker@computational.bio
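
As an illustration of the two-stage strategy outlined in the Summary (a fast k-mer based lookup of candidates, followed by a more specific identity calculation on the survivors), the following is a minimal toy sketch in Python. It is not ReferenceSeeker's implementation: the abstract does not specify the underlying algorithms, so exact k-mer sets with Jaccard similarity stand in for the k-mer profile lookup, and ungapped fragment matching stands in for a real ANI computation. All function names, parameters and thresholds (`find_references`, `min_similarity`, `fragment_len`, `min_identity`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_ungapped_identity(fragment: str, reference: str) -> float:
    """Best fraction of matching positions of `fragment` at any offset in `reference`."""
    n = len(fragment)
    best = 0.0
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        matches = sum(1 for x, y in zip(fragment, window) if x == y)
        best = max(best, matches / n)
    return best


def toy_ani(query: str, reference: str, fragment_len: int, min_identity: float) -> float:
    """Toy stand-in for ANI: mean identity of query fragments that match well enough."""
    fragments = [query[i:i + fragment_len]
                 for i in range(0, len(query) - fragment_len + 1, fragment_len)]
    identities = [best_ungapped_identity(f, reference) for f in fragments]
    kept = [x for x in identities if x >= min_identity]
    return sum(kept) / len(kept) if kept else 0.0


def find_references(query: str, database: Dict[str, str], k: int,
                    min_similarity: float, fragment_len: int,
                    min_identity: float) -> List[Tuple[str, float]]:
    """Stage 1: cheap k-mer prefilter. Stage 2: costlier identity on the candidates."""
    query_kmers = kmer_set(query, k)
    candidates = [name for name, seq in database.items()
                  if jaccard(query_kmers, kmer_set(seq, k)) >= min_similarity]
    scored = [(name, toy_ani(query, database[name], fragment_len, min_identity))
              for name in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = "ACGTACGGTCAAGCTTGACCATGGATCCGA"
    database = {"same": query, "poly_t": "T" * 30}
    for name, score in find_references(query, database, k=4, min_similarity=0.1,
                                       fragment_len=10, min_identity=0.3):
        print(name, score)
```

The point of the design is that the inexpensive set comparison discards clearly unrelated genomes, so the expensive per-fragment comparison runs only on a short candidate list. A real tool would replace both stand-ins with sketch-based k-mer distances and alignment-based ANI.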