%0 Journal Article
%A Kelly L. Wyres
%A Ryan R. Wick
%A Claire Gorrie
%A Adam Jenney
%A Rainer Follador
%A Nicholas R. Thomson
%A Kathryn E. Holt
%T Identification of Klebsiella capsule synthesis loci from whole genome data
%D 2016
%R 10.1101/071415
%J bioRxiv
%P 071415
%X Background: Klebsiella pneumoniae and close relatives are a growing cause of healthcare-associated infections for which increasing rates of multi-drug resistance are a major concern. The Klebsiella polysaccharide capsule is a major virulence determinant and epidemiological marker. However, little is known about capsule epidemiology since serological typing is not widely accessible, and many isolates are serologically non-typeable. Molecular methods for capsular typing are needed, but existing methods lack sensitivity and specificity and fail to take advantage of the information available in whole-genome sequence data, which is increasingly being generated for surveillance and investigation of Klebsiella. Methods: We investigated the diversity of capsule synthesis loci (K loci) among a large, diverse collection of 2503 genome sequences of K. pneumoniae and closely related species. We incorporated analyses of both full-length K locus DNA sequences and clustered protein coding sequences to identify, annotate and compare K locus structures, and we propose a novel method for identifying K loci based on full locus information extracted from whole genome sequences. Results: A total of 134 distinct K loci were identified, including 31 novel types. Comparative analysis of K locus gene content detected 508 unique protein coding gene clusters that appear to reassort via homologous recombination, generating novel K locus types. Extensive nucleotide diversity was detected among the wzi and wzc genes, both within and between K loci, indicating that current typing schemes based on these genes are inadequate. As a solution, we introduce Kaptive, a novel software tool that automates the process of identifying K loci from large sets of Klebsiella genomes based on full locus information. Conclusions: This work highlights the extensive diversity of Klebsiella K loci and the proteins that they encode. We propose a standardised K locus nomenclature for Klebsiella, present a curated reference database of all known K loci, and introduce a tool for identifying K loci from genome data (https://github.com/katholt/Kaptive). These developments constitute important new resources for the Klebsiella community for use in genomic surveillance and epidemiology.
%U https://www.biorxiv.org/content/biorxiv/early/2016/12/06/071415.full.pdf