TY - JOUR
T1 - Identification of Klebsiella capsule synthesis loci from whole genome data
JF - bioRxiv
DO - 10.1101/071415
SP - 071415
AU - Kelly L. Wyres
AU - Ryan R. Wick
AU - Claire Gorrie
AU - Adam Jenney
AU - Rainer Follador
AU - Nicholas R. Thomson
AU - Kathryn E. Holt
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/12/06/071415.abstract
N2 - Background: Klebsiella pneumoniae and close relatives are a growing cause of healthcare-associated infections for which increasing rates of multi-drug resistance are a major concern. The Klebsiella polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology since serological typing is not widely accessible, and many isolates are serologically non-typeable. Molecular methods for capsular typing are needed, but existing methods lack sensitivity and specificity and fail to take advantage of the information available in whole-genome sequence data, which is increasingly being generated for surveillance and investigation of Klebsiella. Methods: We investigated the diversity of capsule synthesis loci (K loci) among a large, diverse collection of 2503 genome sequences of K. pneumoniae and closely related species. We incorporated analyses of both full-length K locus DNA sequences and clustered protein coding sequences to identify, annotate and compare K locus structures, and we propose a novel method for identifying K loci based on full locus information extracted from whole genome sequences. Results: A total of 134 distinct K loci were identified, including 31 novel types. Comparative analysis of K locus gene content detected 508 unique protein coding gene clusters that appear to reassort via homologous recombination, generating novel K locus types. Extensive nucleotide diversity was detected among the wzi and wzc genes, both within and between K loci, indicating that current typing schemes based on these genes are inadequate. As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K loci from large sets of Klebsiella genomes based on full locus information. Conclusions: This work highlights the extensive diversity of Klebsiella K loci and the proteins that they encode. We propose a standardised K locus nomenclature for Klebsiella, present a curated reference database of all known K loci, and introduce a tool for identifying K loci from genome data (https://github.com/katholt/Kaptive). These developments constitute important new resources for the Klebsiella community for use in genomic surveillance and epidemiology.
ER -