RT Journal Article
SR Electronic
T1 Heterogeneity among estimates of the core genome and pan-genome in different pneumococcal populations
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 133991
DO 10.1101/133991
A1 Andries J van Tonder
A1 James E Bray
A1 Keith A Jolley
A1 Sigríður J Quirk
A1 Gunnsteinn Haraldsson
A1 Martin CJ Maiden
A1 Stephen D Bentley
A1 Ásgeir Haraldsson
A1 Helga Erlendsdóttir
A1 Karl G Kristinsson
A1 Angela B Brueggemann
YR 2017
UL http://biorxiv.org/content/early/2017/05/06/133991.abstract
AB Background: Understanding the structure of a bacterial population is essential for understanding bacterial evolution, which genetic lineages cause disease, and the consequences of perturbations to the bacterial population. Estimating the core genome (the genes common to all or nearly all strains of a species) is an essential component of such analyses. The size and composition of the core genome vary by dataset, but our hypothesis was that variation between different collections of the same bacterial species should be minimal. To test this, the genome sequences of 3,121 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (USA) and Maela (Thailand) were analysed. Results: The analyses revealed a ‘supercore’ genome (genes shared by all 3,121 pneumococci) of only 303 genes, although 461 additional core genes were shared by pneumococci from Reykjavik, Southampton and Boston. Overall, the size and composition of the core genomes and pan-genomes among pneumococci recovered in Reykjavik, Southampton and Boston were very similar, but pneumococci from Maela were distinctly different. Inspection of the pan-genome of Maela pneumococci revealed several sequence regions of >25 kb that were homologous to genomic regions found in other bacterial species. Conclusions: Some subsets of the global pneumococcal population are highly heterogeneous, and thus our hypothesis was rejected. This is an important consideration before generalising findings from a single dataset to the wider pneumococcal population.
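The core genome, pan-genome and ‘supercore’ definitions in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical gene presence/absence table that maps each isolate ID to the set of genes detected in that genome, and it uses toy isolate and gene names. It is not the authors' pipeline. The `threshold` parameter, which expresses "all or nearly all strains" as a fraction of isolates, is an illustrative assumption and not the cutoff used in the study. The supercore is computed here as the strict core of all populations pooled together, which matches the abstract's definition of genes shared by every isolate.

```python
from typing import Dict, Set

# Hypothetical layout: isolate ID -> set of gene names detected in that genome.
PresenceTable = Dict[str, Set[str]]


def core_genome(presence: PresenceTable, threshold: float = 1.0) -> Set[str]:
    """Genes found in at least `threshold` (a fraction) of the isolates.

    threshold=1.0 gives a strict core (present in every isolate); a lower
    value (an assumed, illustrative choice) gives a "nearly all strains" core.
    """
    if not presence:
        return set()
    n = len(presence)
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, c in counts.items() if c / n >= threshold}


def pan_genome(presence: PresenceTable) -> Set[str]:
    """Union of all genes seen in any isolate."""
    return set().union(*presence.values())


def supercore(populations: Dict[str, PresenceTable]) -> Set[str]:
    """Genes present in every isolate of every population (pooled strict core)."""
    pooled: PresenceTable = {}
    for table in populations.values():
        pooled.update(table)  # assumes isolate IDs are unique across populations
    return core_genome(pooled, threshold=1.0)


# Toy data (hypothetical isolates and genes, not from the study).
populations = {
    "Reykjavik": {"isoA": {"g1", "g2", "g3"}, "isoB": {"g1", "g2", "g4"}},
    "Maela": {"isoC": {"g1", "g5"}, "isoD": {"g1", "g2", "g5"}},
}
print(sorted(core_genome(populations["Reykjavik"])))  # ['g1', 'g2']
print(sorted(core_genome(populations["Maela"])))      # ['g1', 'g5']
print(sorted(supercore(populations)))                 # ['g1']
```

In the toy data each population has its own two-gene core, but only `g1` survives pooling. This is a small-scale analogue of a supercore that is smaller than the per-population cores. Using sets keeps the sketch short. A real analysis would more likely use a matrix of thousands of loci by thousands of isolates.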