Abstract
The lipopolysaccharide (O) and flagellar (H) surface antigens of Escherichia coli are targets for serotyping that have traditionally been used to identify pathogenic lineages of E. coli. As serotyping has several limitations, public health reference laboratories are increasingly moving towards whole genome sequencing (WGS) for the rapid characterisation of bacterial isolates. Here we present a method to rapidly and accurately serotype E. coli isolates from raw, short read sequence data, leveraging the known genetic basis for the biosynthesis of O- and H-antigens. Our approach bypasses the need for de novo genome assembly by directly screening WGS reads against a curated database of alleles linked to known E. coli O-groups and H-types (the EcOH database) using the software package SRST2. We validated our approach by comparing in silico results with those obtained via serological phenotyping of 197 enteropathogenic (EPEC) isolates. We also demonstrated the utility of our method to characterise enterotoxigenic E. coli (ETEC) and the uropathogenic E. coli (UPEC) epidemic clone ST131, and for in silico serotyping of foodborne outbreak-related isolates in the public GenomeTrakr database.