RT Journal Article
SR Electronic
T1 Rapid Identification of Stable Clusters in Bacterial Populations Using the Adjusted Wallace Coefficient
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 299347
DO 10.1101/299347
A1 Dillon OR Barker
A1 João A Carriço
A1 Peter Kruczkiewicz
A1 Federica Palma
A1 Mirko Rossi
A1 Eduardo N Taboada
YR 2018
UL http://biorxiv.org/content/early/2018/04/16/299347.abstract
AB Whole-genome sequencing (WGS) of microbial pathogens has become an essential part of modern epidemiological investigations. Although WGS data can be analyzed using a number of different approaches, such as traditional phylogenetic methods, a critical requirement for global systems for pathogen surveillance is the development of approaches for transforming sequence data into WGS-based subtypes, which creates a nomenclature that describes their higher-order relationships to one another. To this end, subtype similarity thresholds are needed to define clusters of subtypes representing lineages of interest. WGS-based subtyping presents a challenge since both the addition of novel genome sequences and small adjustments in similarity thresholds can have a dramatic impact on cluster composition and stability. We present the Neighbourhood Adjusted Wallace Coefficient (nAWC), a method for evaluating cluster stability based on computing cluster concordance between neighbouring similarity thresholds. The nAWC can be used to identify areas in which distance thresholds produce robust clusters. Using datasets from Salmonella enterica and Campylobacter jejuni, representing strongly and weakly clonal bacterial species respectively, we show that clusters generated using such thresholds are both stable and reflect basic units in their overall population structure.
Our results suggest that the nAWC could be useful for defining robust clusters compatible with nomenclatures for global WGS-based surveillance networks, which require stable clusters that both harness the discriminatory power of WGS data and allow long-term tracking of strains of interest.
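
The abstract describes the nAWC only as cluster concordance computed between neighbouring similarity thresholds. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the standard definition of the Adjusted Wallace coefficient, AW(A→B) = (W − W_exp) / (1 − W_exp), where W is the fraction of pairs co-clustered in A that are also co-clustered in B and W_exp is the fraction of all pairs co-clustered in B. The function names, the input layout (one list of cluster labels per threshold, in threshold order), and the direction of comparison (each threshold against the next one) are all assumptions made for illustration.

```python
from collections import Counter


def _ordered_pairs(counts):
    """Number of ordered same-group pairs for a collection of group sizes."""
    return sum(c * (c - 1) for c in counts)


def adjusted_wallace(a, b):
    """Adjusted Wallace coefficient AW(A -> B) for two label sequences.

    Returns NaN when the coefficient is undefined: partition A has no
    co-clustered pairs, or partition B places every item in one cluster.
    """
    if len(a) != len(b):
        raise ValueError("partitions must label the same items")
    n = len(a)
    pairs_a = _ordered_pairs(Counter(a).values())
    if n < 2 or pairs_a == 0:
        return float("nan")
    # Pairs that share a cluster in both A and B (contingency-table cells).
    pairs_ab = _ordered_pairs(Counter(zip(a, b)).values())
    wallace = pairs_ab / pairs_a
    # Expected Wallace under independence: chance that a random pair
    # shares a cluster in B.
    expected = _ordered_pairs(Counter(b).values()) / (n * (n - 1))
    if expected == 1.0:
        return float("nan")
    return (wallace - expected) / (1.0 - expected)


def neighbourhood_awc(partitions):
    """AW between each partition and its successor (assumed direction)."""
    return [
        adjusted_wallace(partitions[i], partitions[i + 1])
        for i in range(len(partitions) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical cluster labels for four genomes at three thresholds.
    by_threshold = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 2],
    ]
    print(neighbourhood_awc(by_threshold))
```

Under these assumptions, runs of consecutive thresholds with values close to 1 would correspond to regions where cluster membership barely changes, which is the kind of stable region the abstract refers to.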