PT - JOURNAL ARTICLE
AU - Yu Kang
AU - Lina Yuan
AU - Zilong He
AU - Fei Chen
AU - Zhancheng Gao
AU - Shulin Liu
AU - Xinmiao Jia
AU - Qin Ma
AU - Xinhao Jin
AU - Rongrong Fu
AU - Yang Yu
AU - Chunxiong Luo
AU - Jiayan Wu
AU - Jingfa Xiao
AU - Songnian Hu
AU - Jun Yu
TI - Worldwide Population Structure of Escherichia coli Reveals Two Major Subspecies
AID - 10.1101/122713
DP - 2017 Jan 01
TA - bioRxiv
PG - 122713
4099 - http://biorxiv.org/content/early/2017/03/31/122713.short
4100 - http://biorxiv.org/content/early/2017/03/31/122713.full
AB - Escherichia coli is a Gram-negative bacterial species with both great biological diversity and important clinical relevance. To study its population structure at both worldwide and genome-wide scales, we phylogenetically scrutinize 104 high-quality complete genomes from diverse human and animal hosts, 45 of which are new additions to the collection. Most of them cluster into two major clades: Vig (vigorous) and Slu (sluggish). The two clades differ not only in physiological features but also in genome content and sequence variation. Limited recombination and horizontal gene transfer separate the two clades, in contrast to extensive intra-clade gene flow that functionally homogenizes even commensal and pathogenic strains. Because they are genetically isolated, the two clades should be recognized as two subspecies, each of which independently represents a continuum of phenotypes ranging from commensal to pathogenic. Additionally, frequent intra-clade recombination events, often involving large fragments of over 5 kb, suggest a possible highly efficient, inheritance-dependent gene transfer mechanism. The molecular mechanisms underlying such a recombination barrier between the subspecies deserve further exploration across a broader range of microbial taxa. IMPORTANCE The concept of bacterial species has been debated for decades.
The question has become more important today as human microbiomes and their relevance to health are studied extensively. Human microbiomes, in which thousands of bacterial species cohabit, need to be deciphered in fine detail, down to the species and subspecies level. In this study, we scrutinize the population genomics of E. coli and define two subspecies that differ from each other in physiology, ecology, and clinical features. In contrast to extensive genetic recombination within each subspecies, limited genetic flux between them leads to their phenotypic distinctions and separate evolutionary paths. We provide a key example illustrating that the divergence of a species into two subspecies depends on recombination efficiency: when recombination efficiency becomes a barrier, the species appears split into two. The E. coli scenario and its molecular mechanisms deserve further exploration across a broader range of microbial taxa.
Abbreviations: dN/dS, ratio of non-synonymous to synonymous polymorphism; ECOR, Escherichia coli collection of reference; ESBLs, extended-spectrum beta-lactamases; ExPEC, extraintestinal pathogenic E. coli; HGT, horizontal gene transfer; IPEC, intestinal pathogenic E. coli; LAVs, lineage-associated variations; MLST, multi-locus sequence typing; NGS, next-generation sequencing; r/m, ratio of polymorphisms caused by recombination to mutation; Slu, Sluggish clade; Stx, Shiga toxins; Vig, Vigorous clade.