Abstract
Increasing contact between humans and non-human primates provides an opportunity for the transfer of potential pathogens or antimicrobial resistance between host species. We have investigated genomic diversity, and antimicrobial resistance in Escherichia coli isolates from four species of non-human primate in the Gambia: Papio papio (n=22), Chlorocebus sabaeus (n=14), Piliocolobus badius (n=6) and Erythrocebus patas (n=1). We performed Illumina whole-genome sequencing on 101 isolates from 43 stools, followed by nanopore long-read sequencing on eleven isolates. We identified 43 sequence types (STs) by the Achtman scheme (ten of which are novel), spanning five of the eight known phylogroups of E. coli. The majority of simian isolates belong to phylogroup B2—characterised by strains that cause human extraintestinal infections—and encode factors associated with extraintestinal disease. A subset of the B2 strains (ST73, ST681 and ST127) carry the pks genomic island, which encodes colibactin, a genotoxin associated with colorectal cancer. We found little antimicrobial resistance and only one example of multi-drug resistance among the simian isolates. Hierarchical clustering showed that simian isolates from ST442 and ST349 are closely related to isolates recovered from human clinical cases (differences in 50 and seven alleles respectively), suggesting recent exchange between the two host species. Conversely, simian isolates from ST73, ST681 and ST127 were distinct from human isolates, while five simian isolates belong to unique core-genome ST complexes—indicating novel diversity specific to the primate niche. Our results are of public health importance, considering the increasing contact between humans and wild non-human primates.
Impact statement Little is known about the population structure, virulence potential and the burden of antimicrobial resistance among Escherichia coli from wild non-human primates, despite increased exposure to humans through the fragmentation of natural habitats. Previous studies, primarily involving captive animals, have highlighted the potential for bacterial exchange between non-human primates and humans living nearby, including strains associated with intestinal pathology. Using multiple-colony sampling and whole-genome sequencing, we investigated the strain distribution and population structure of E. coli from wild non-human primates from the Gambia. Our results indicate that these monkeys harbour strains that can cause extraintestinal infections in humans. We document the transmission of virulent E. coli strains between monkeys of the same species sharing a common habitat and evidence of recent interaction between strains from humans and wild non-human primates. Also, we present complete genome assemblies for five novel sequence types of E. coli.
Author notes All supporting data, code and protocols have been provided within the article or through supplementary data files. Nine supplementary figures and six supplementary files are available with the online version of this article.
Abbreviations ExPEC, Extraintestinal pathogenic Escherichia coli; ST, Sequence type; AMR, Antimicrobial resistance; MLST, Multi-locus sequence typing; VFDB, Virulence factors database; SNP, single nucleotide polymorphism; SPRI, Solid phase reversible immobilisation.
Data summary The raw sequences and polished assemblies from this study are available in the National Center for Biotechnology Information (NCBI) Short Read Archive, under the BioProject accession number PRJNA604701. The full list and characteristics of these strains and other reference strains used in the analyses are presented in Table 1 and Supplementary Files 1-4 (available with the online version of this article).
Introduction
Escherichia coli is a highly versatile species, capable of adapting to a wide range of ecological niches and colonising a diverse range of hosts (1, 2). In humans, E. coli colonises the gastrointestinal tract as a commensal, as well as causing intestinal and extraintestinal infection (2). E. coli is also capable of colonising the gut in non-human primates (3–5), where data from captive animals suggest that gut isolates are dominated by phylogroups B1 and A, which, in humans, encompass commensals as well as strains associated with intestinal pathology (6–9). E. coli strains encoding colibactin, or cytotoxic necrotising factor 1 have been isolated from healthy laboratory rhesus macaques (4, 10), while enteropathogenic E. coli strains can—in the laboratory—cause colitis in marmosets (11), rhesus macaques infected with simian immunodeficiency virus (12) and cotton-top tamarins (13).
There are two potential explanations for the co-occurrence of E. coli in humans and non-human primates. Some bacterial lineages may have been passed on through vertical transmission within the same host species for long periods, perhaps even arising from ancestral bacteria that colonised the guts of the most recent common ancestors of humans and non-human primate species (14–16). In such a scenario, isolates from non-human primates would be expected to be novel and distinct from the diversity seen in humans. However, there is also clearly potential for horizontal transfer of strains from one host species to another (17).
The exchange of bacteria between humans and human-habituated animals, particularly non-human primates, is of interest in light of the fragmentation of natural habitats globally (18–28). We have seen that wild non-human primates in the Gambia are frequently exposed to humans through tourism, deforestation and urbanisation. In Uganda, PCR-based studies have suggested transmission of E. coli between humans, non-human primates and livestock (26–28). Thus, wild non-human primates may constitute a reservoir for the zoonotic spread of E. coli strains associated with virulence and antimicrobial resistance to humans. Alternatively, humans might provide a reservoir of strains with the potential for anthroponotic spread to animals—or transmission might occur in both directions (29).
We do not know how many different lineages can co-exist within the same non-human primate host. Such information may help us contextualise the potential risks associated with transmission of bacterial strains between humans and non-human primates. In humans, up to eleven serotypes could be sampled from picking eleven colonies from individual stool samples (30).
To address these issues, we have exploited whole-genome sequencing to explore the colonisation patterns, population structure and phylogenomic diversity of E. coli in wild non-human primates from rural and urban Gambia.
Methods
Study population and sample collection
In June 2017, wild non-human primates were sampled from six sampling sites in the Gambia: Abuko Nature Reserve (riparian forest), Bijilo Forest Park (coastal fenced woodland), Kartong village (mangrove swamp), Kiang West National park (dry-broad-leaf forest), Makasutu Cultural Forest (ecotourism woodland) and River Gambia National park (riparian forest) (Figure 1). We sampled all four of the diurnal non-human primate species indigenous to the Gambia. Monkeys in Abuko and Bijilo are frequently hand-fed by visiting tourists, despite prohibiting guidelines (31).
Troops of monkeys were observed and followed. We collected a single freshly passed formed stool specimen from 43 visibly healthy individuals (38 adults, 5 juveniles; 24 females, 11 males, 8 of undetermined sex), drawn from four species: Erythrocebus patas (patas monkey), Papio papio (Guinea baboon), Chlorocebus sabaeus (green monkey) and Piliocolobus badius (Western colobus monkey). Stool samples were immediately placed into sterile falcon tubes, taking care to collect portions of stool material that had not touched the ground, then placed on dry ice and stored at 80°C within 6 h. The sample processing flow is summarised in Figure 2.
Microbiological processing
For the growth and isolation of E. coli, 0.1–0.2 g aliquots were taken from each stool sample into 1.5 ml microcentrifuge tubes under aseptic conditions. To each tube, 1 ml of physiological saline (0.85%) was added, and the saline-stool samples were vortexed for 2 min at 4200 rpm. The homogenised samples were taken through four ten-fold serial dilutions and a 100 µl aliquot from each dilution was spread on a plate of tryptone-bile-X-glucoronide agar using the cross-hatching method. Plates were incubated at 37°C for 18–24 h in air. Colony counts were performed for each serial dilution, counting translucent colonies with blue-green pigmentation and entire margins as E. coli. Up to five colonies from each sample were sub-cultured on MacConkey agar at 37°C for 18–24 h and then stored in 20% glycerol broth at −80°C.
Genomic DNA extraction
A single colony from each subculture was picked into 1 ml Luria-Bertani broth and incubated overnight at 37°C. Broth cultures were spun at 3500rpm for 2 min and lysed using lysozyme, proteinase K, 10% SDS and RNase A in Tris EDTA buffer (pH 8.0). Suspensions were placed on a thermomixer with vigorous shaking at 1600 rpm, first at 37°C for 25 min and subsequently at 65°C for 15 min. DNA was extracted using solid-phase reversible immobilisation magnetic beads (Becter Coulter Inc., Brea, CA, U.S.A.), precipitated with ethanol, eluted in Tris-Cl and evaluated for protein and RNA contamination using A260/A280 and A260/A230 ratios on the NanoDrop 2000 Spectrophotometer (Fisher Scientific, Loughborough, UK). DNA concentrations were measured using the Qubit HS DNA assay (Invitrogen, MA, USA). DNA was stored at −20°C.
Illumina sequencing
Whole-genome sequencing was carried out on the Illumina NextSeq 500 platform (Illumina, San Diego, CA). We used a modified Nextera XT DNA protocol for the library preparation as follows. The genomic DNA was normalised to 0.5 ng µl-1 with 10 mM Tris-HCl. Next, 0.9 µl of Tagment DNA buffer (Illumina Catalogue No. 15027866) was mixed with 0.09 µl of Tagment DNA enzyme (Illumina Catalogue No. 15027865) and 2.01 µl of PCR-grade water in a master-mix. Next, 3 µl of the master-mix was added to a chilled 96-well plate. To this, 2 µl of normalised DNA (1 ng total) was added, pipette-mixed and the reaction heated to 55°C for 10 min on a PCR block. To each well, we added 11 µl of KAPA2G Robust PCR master-mix (Sigma Catalogue No. KK5005), comprising 4 µl KAPA2G buffer, 0.4 µl dNTPs, 0.08 µl polymerase and 6.52 µl PCR-grade water, contained in the kit per sample. Next, 2 µl each of P7 and P5 Nextera XT Index Kit v2 index primers (Illumina Catalogue numbers FC-131-2001 to 2004) were added to each well. Finally, the 5 µl of Tagmentation mix was added and mixed. The PCR was run as follows: 72°C for 3 min, 95°C for 1 min, 14 cycles of 95°C for 10 sec, 55°C for 20 sec and 72°C for 3 min. Following the PCR, the libraries were quantified using the Quant-iT dsDNA Assay Kit, high sensitivity kit (Catalogue No. 10164582) and run on a FLUOstar Optima plate reader. After quantification, libraries were pooled in equal quantities. The final pool was double-SPRI size-selected between 0.5 and 0.7x bead volumes using KAPA Pure Beads (Roche Catalogue No. 07983298001). We then quantified the final pool on a Qubit 3.0 instrument (Invitrogen, MA, USA) and ran it on a high sensitivity D1000 ScreenTape (Agilent Catalogue No. 5067-5579) using the Agilent TapeStation 4200 to calculate the final library pool molarity. The pooled library was run at a final concentration of 1.8 pM on an Illumina NextSeq500 instrument using a mid-output flow cell (NSQ® 500 Mid Output KT v2 300 cycles; Illumina Catalogue No. FC-404-2003) following the Illumina recommended denaturation and loading parameters, which included a 1% PhiX spike (PhiX Control v3; Illumina Catalogue FC-110-3001). The data was uploaded to BaseSpace (http://www.basespace.illumina.com) and then converted to FASTQ files.
Oxford nanopore sequencing
We used the rapid barcoding kit (Oxford Nanopore Catalogue No. SQK-RBK004) to prepare libraries according to the manufacturer’s instructions. We used 400 ng DNA for library preparation and loaded 75 µl of the prepared library on an R9.4 MinION flow cell. The size of the DNA fragments was assessed using the Agilent 2200 TapeStation (Agilent Catalogue No. 5067-5579) before sequencing. The concentration of the final library pool was measured using the Qubit high-sensitivity DNA assay (Invitrogen, MA, USA).
Genome assembly and phylogenetic analysis
Sequences were analysed on the Cloud Infrastructure for Microbial Bioinformatics (32). Paired-end short-read sequences were concatenated, then quality-checked using FastQC v0.11.7 (33). Reads were assembled using Shovill (https://github.com/tseemann/shovill) and assemblies assessed using QUAST v 5.0.0, de6973bb (34). Draft bacterial genomes were annotated using Prokka v 1.13 (35). Multi-locus sequence types were called from assemblies according to the Achtman scheme using the mlst software (https://github.com/tseemann/mlst) to scan alleles in PubMLST (https://pubmlst.org/) (36). To identify and assign new STs, we used the ST search algorithm in EnteroBase, allowing for one allele mismatch (37). Snippy v4.3.2 (https://github.com/tseemann/snippy) was used for variant calling and core genome alignment, including references genome sequences representing the major phylogroups of E. coli and Escherichia fergusonii as an outgroup (Supplementary File 1B). We used Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) to detect and remove recombinant regions of the core genome alignment (38). RAxML v 8.2.4 (39) was used for maximum-likelihood phylogenetic inference from this masked alignment based on a general time-reversible nucleotide substitution model with 1,000 bootstrap replicates. The phylogenetic tree was visualised using Mega v. 7.2 (40) and annotated using Adobe Illustrator v 23.0.3 (Adobe Inc., San Jose, California). Pair-wise single nucleotide polymorphism (SNP) distances between genomes were computed from the core-gene alignment using snp-dists v0.6 (https://github.com/tseemann/snp-dists).
Population structure and analysis of gene content
Merged short reads were uploaded to EnteroBase (41) where we used the Hierarchical Clustering (HierCC) algorithm to assign our genomes from non-human primates to HC1100 clusters, which in E. coli correspond roughly to the clonal complexes seen in seven-allele MLST. Core genome MLST (cgMLST) profiles based on the typing of 2, 512 core loci for E. coli facilitates single-linkage hierarchical clustering according to fixed core genome MLST (cgMLST) allelic distances, based on cgMLST allelic differences. Thus, cgST HierCC provides a robust approach to analyse population structures at multiple levels of resolution. The identification of closely-related genomes using HierCC has been shown to be 89% consistent between cgMLST and single-nucleotide polymorphisms (42). Neighbour-joining trees were reconstructed with Ninja—a hierarchical clustering algorithm for inferring phylogenies that is capable of scaling to inputs larger than 100,000 sequences (43).
ARIBA v2.12.1 (44) was used to search short reads against the Virulence Factors Database (45) (VFDB-core) (virulence-associated genes), ResFinder (AMR) (46) and PlasmidFinder (plasmid-associated genes) (47) databases (both ResFinder and PlasmidFinder databases downloaded 29 October 2018). Percentage identity of ≥ 90% and coverage of ≥ 70% of the respective gene length were taken as a positive result. Analyses were performed on assemblies using ABRicate v 0.8.7 (https://github.com/tseemann/abricate). A heat map of detected virulence- and AMR-associated genes was plotted on the phylogenetic tree using ggtree and phangorn in R studio v 3.5.1. We searched EnteroBase for all E. coli strains isolated from humans in the Gambia (n=128), downloaded the genomes and screened them for resistance genes using ABRicate v 0.9.8. Assembled genomes for isolates that clustered with our colibactin-encoding ST73, ST127 and ST681 isolates were downloaded and screened for the colibactin operon using ABRicate’s VFDB database (accessed 28 July 2019). Assemblies reported to contain colibactin genes were aligned against the colibactin-encoding Escherichia coli IHE3034 reference genome (NCBI Accession: GCA_000025745.1) using minimap2 2.13-r850. BAM files were visualised in Artemis Release 17.0.1 (48) to confirm the presence of the pks genomic island which encodes the colibactin operon.
Hybrid assembly and analysis of plasmids and phages
Base-called FASTQ files were concatenated into a single file and demultiplexed into individual FASTQ files based on barcodes, using the qcat python command-line tool v 1.1.0 (https://github.com/nanoporetech/qcat). Hybrid assemblies of the Illumina and nanopore reads were created with Unicycler (49). The quality and completion of the hybrid assemblies were assessed with QUAST v 5.0.0, de6973bb and CheckM (34, 50). Hybrid assemblies were interrogated using ABRicate PlasmidFinder and annotated using Prokka (35). Plasmid sequences were visualised in Artemis using coordinates from ABRicate. Prophage identification was carried out using the phage search tool, PHASTER (51).
Antimicrobial susceptibility
We determined the minimum inhibitory concentrations of amikacin, trimethoprim, sulfamethoxazole, ciprofloxacin, cefotaxime and tetracycline for the isolates from non-human primates using agar dilution (52). Two-fold serial dilutions of each antibiotic were performed in molten Mueller-Hinton agar (Oxoid, Basingstoke, UK), from 32mg/L to 0.03 mg l-1 (512 mg l-1 to 0.03 mg l-1 for sulfamethoxazole), using E. coli NCTC 10418 as control. MICs were performed in duplicate and interpreted using breakpoint tables from the European Committee on Antimicrobial Susceptibility Testing v. 9.0, 2019 (http://www.eucast.org).
Results
Twenty-four of 43 samples (56%) showed growth indicative of E. coli, yielding a total of 106 colonies. The isolates were designated by the primate species and the site from which they were sampled as follows: Chlorocebus sabaeus, ‘Chlos’; Papio papio, ‘Pap’; Piliocolobus badius, ‘Prob’; Abuko Nature Reserve, ‘AN’; Bijilo Forest Park, ‘BP’; Kartong village, ‘K’; Kiang West National Park, ‘KW’; Makasutu Cultural Forest, ‘M’; and River Gambia National Park, ‘RG’. After genome sequencing, five isolates (PapRG-04, (n=1); PapRG-03 (n=1); ChlosRG-12 (n=1); ChlosAN-13 (n=1); ProbAN-19 (n=1)) were excluded due to low depth of coverage (<20x), leaving 101 genomes for subsequent analysis (Table 1).
We recovered 43 seven-allele sequence types (ten of them novel), spanning five of the eight known phylogroups of E. coli and comprising 38 core-genome MLST complexes (Figure 3). The majority of strains belonged to phylogroup B2 (42/101, 42%), which encompasses strains that cause extraintestinal infections in humans (ExPEC strains) (6–8). Strains from phylogroup B2 carried colonisation and fitness factors associated with extraintestinal disease in humans (Figure 3). A subset of the B2 strains (13/42, 31%), belonging to STs 73, 681 and 127, carried the pks genomic island, which encodes the DNA alkylating genotoxin, colibactin. Colibactin-encoding E. coli frequently cause colorectal cancer, urosepsis, bacteraemia and prostatitis, and are highly associated with other virulence factors such as siderophores and toxins (53–56).
Thirteen individuals were colonised by two or more STs and nine by two or more phylogroups (Supplementary File 1A). Five colony picks from a single Guinea baboon (PapRG-06) yielded five distinct STs, two of which are novel. Two green monkeys sampled from Bijilo (ChlosBP-24 and ChlosBP-25) shared an identical ST73 genotype, while two Guinea baboons from Abuko shared an ST226 strain—documenting transmission between monkeys of the same species. Among the monkey isolates, we found several STs associated with extraintestinal infections and/or AMR in humans: ST73, ST681, ST127, ST226, ST336, ST349 (57–62).
In seventeen monkeys, we observed a cloud of closely related genotypes (sepearated by 0-5 SNPs, Table 2A) from each strain, suggesting evolution within the host after acqusition of the strain. However, in two individuals, pair-wise SNP distances between genotypes from the same ST were susbtantial enough (25 SNPs and 77 SNPs) to suggest multiple acquisitions of each strain (Table 2B).
We identified the closest neighbours to all the recovered strains from our study (Table 3). Our results suggest, in some cases, recent interactions between humans or livestock and non-human primates. However, we also found a diversity of strains specific to the non-human primate niche. Hierarchical clustering analysis revealed that simian isolates from ST442 and ST349 (Achtman)— sequence types that are associated with virulence and AMR in humans (49, 55)—were closely related to human clinical isolates, with differences of 50 alleles and seven alleles in the core-genome MLST scheme respectively (Supplementary Figures 1-2). Similarly, we found evidence of recent interaction between simian ST939 isolates and strains from livestock (Supplementary Figure 3). Conversely, simian ST73, ST127 and ST681 isolates were genetically distinct from human isolates from these sequence types (Supplementary Figures 4-6). The multi-drug resistant isolate PapAN-14-1 from ST349 was, however, closely related to an environmental isolate recovered from water (Supplementary Figure 7).
Five isolates were >1000 alleles away in the core-genome MLST scheme from anything in EnteroBase (Supplementary Figures 8 & 9). Four of these were assigned to novel sequence types in the seven-allele scheme (Achtman) (ST8550, ST8525, ST8532, ST8826), while one belonged to ST1873, which has only two other representatives in EnteroBase: one from a species of wild bird from Australia (Sericornis frontalis); the other from water. Besides, ST8550, ST8525, ST8532, ST8826 belonged to novel HierCC 1100 groups (cgST complexes), indicating that they were unrelated to any other publicly available E. coli genomes.
We observed few antimicrobial resistance genes in our study population, compared to what prevails in isolates from humans in the Gambia (Figure 4). Phenotypic resistance to single agents was confirmed in ten isolates: to trimethoprim in a single isolate, to sulfamethoxazole in four unrelated isolates and to tetracycline in four closely related isolates from a single animal. A single ST2076 (Achtman) isolate (PapAN-14-1) belonging to the ST349 lineage was resistant to trimethoprim, sulfamethoxazole and tetracycline. The associated resistance genes were harboured on an IncFIB plasmid.
Eighty percent (81/101) of the study isolates harboured one or more plasmids. We detected the following plasmid replicon types: IncF (various subtypes), IncB/K/O/Z, I1, IncX4, IncY, Col plasmids (various subtypes) and plasmids related to p0111 (rep B) (Supplementary File 2A). Long-read sequencing of six representative samples showed that the IncFIB plasmids encoded acquired antibiotic resistance, fimbrial adhesins and colicins (Supplementary File 2B). Also, the IncFIC/FII, ColRNAI, Col156 and IncB/O/K/Z plasmids encoded fimbrial proteins and colicins. Besides, the IncX and Inc-I-Aplha encoded bundle forming pili bfpB and the heat-stable enterotoxin protein StbB respectively.
We generated complete genome sequences of five novel sequence types of E. coli (ST8525, ST8527, ST8532, ST8826, ST8827) within the seven-allele scheme (Achtman) (Supplementary File 3A) (63). Although none of these new genomes encoded AMR genes, ono of them (PapRG-04-4) contained an IncFIB plasmid encoding fimbrial proteins, and a cryptic ColRNA plasmid. PHASTER identified thirteen intact prophages and four incomplete phage remnants (Supplementary File 3B). Two pairs of genomes from Guinea baboons from different parks shared common prophages: one pair carrying PHAGE_Entero_933W, the other PHAGE-Entero_lambda.
Discussion
We have described the population structure of E. coli in diurnal non-human primates living in rural and urban habitats from the Gambia. Although our sample size was relatively small, we have recovered isolates that span the diversity previously described in humans and have also identified ten new sequence types (five of them now with complete genome sequences). This finding is significant, considering the vast number of E. coli genomes that have been sequenced to date (9, 597 with MLST via sanger sequencing, and 127, 482 via WGS) (64).
Increasing contact between animal species facilitates the potential exchange of pathogens. Accumulating data shows that ExPEC strains are frequently isolated from diseased companion animals and livestock—highlighting the potential for zoonotic as well as anthroponotic transmission (65–70). In a previous study, green monkeys from Bijilo Park were found to carry lineages of Staphylococcus aureus thought to be acquired from humans (31). Our analyses suggest similar exchange of E. coli strains between humans and wild non-human primates. However, non-human primates also harbour E. coli genotypes that are clinically important in humans, such as ST73, ST127 and ST681, yet are distinct from those circulating in humans—probably reflecting lineages that have existed in this niche for long periods.
We found that several monkeys were colonised with multiple STs, often encompassing two or more phylotypes. Although colonisation with multiple serotypes of E. coli is common in humans (30, 71) we were surprised to identify as many as five STs in a single baboon. Sampling multiple colonies from single individuals also revealed within-host diversity arising from microevolution. However, we also found evidence of acquisition in the same animal of multiple lineages of the same sequence type, although it is unclear whether this reflects a single transmission event involving more than one strain or serial transfers.
Antimicrobial resistance in wildlife is known to spread on plasmids through horizontal gene transfer (72). Given the challenge of resolving large plasmids using short-read sequences (73), we exploited long-read sequencing to document the contribution of plasmids to the genomic diversity that we observed in our study population. Consistent with previous reports (74), we found IncF plasmids which encoded antimicrobial resistance genes. Virulence-encoding plasmids, particularly colicin-encoding and the F incompatibility group ones, have long been associated with several pathotypes of E. coli (75). Consistent with this, we found plasmids that contributed to the dissemination of virulence factors such as the heat-stable enterotoxin protein StbB, colicins and fimbrial proteins.
This study could have been enhanced by sampling human populations living near those of our non-human primates; however, we compensated for this limitation by leveraging the wealth of genomes in publicly available databases. Besides, we did not sample nocturnal monkeys due to logistic challenges; however, these have more limited contact with humans than the diurnal species. Despite these limitations, however, this study provides insight into the diversity and colonisation patterns of E. coli among non-human primates in the Gambia, highlighting the impact of human continued encroachment on natural habitats and revealing important phylogenomic relationships between strains from humans and non-human primates.
Data bibliography
Foster-Nyarko, E. et al, NCBI BioProject PRJNA604701 (2020).
Forde, B. M., Ben Zakour, N. L., Stanton-Cook, M., Phan, M. D., Totsika, M. et al., 17 representative E. coli reference isolates (2014). NCBI accession numbers are provided in Table 1B.
Nougayrede J.P, Homburg S, Taieb F., Boury M., Brzuszkiewicz E., et al., Escherichia coli induces DNA double-strand breaks in eukaryotic cells (2006). NCBI accession: GCA_000025745.1.
Funding information
MP, EFN, NT, AR, GT, JO and GK were supported by the BBSRC Institute Strategic Programme Microbes in the Food Chain BB/R012504/1 and its constituent projects 44414000A and 4408000A. NFA and DB were supported by the Quadram Institute Bioscience BBSRC funded Core Capability Grant (project number BB/ CCG1860/1). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author contributions
Conceptualization, MA, MP; data curation, MP, NFA; formal analysis, EFN, analytical support, GT; funding, MP and MA; sample collection, JDC; laboratory experiments, EFN, DB; supervision, AR, NFA, GK, JO, MP, MA; manuscript preparation – original draft, EFN; review and editing, NT, AR, JO, NFA, MP; review of final manuscript, all authors.
Conflicts of interest
The authors have no conflicts of interest to declare.
Ethical statement
No human nor animal experimentation is reported.
Figure legends
Supplementary Figure 1. A Ninja neighbour-joining tree showing the phylogenetic relationship between Achtman ST442 strains from this study and all other publicly available genomes that fell within the same HC1100 cluster (cgST complex). The locations of the isolates are displayed, with the genome count displayed in parenthesis. Branch lengths display the allelic distances separating genomes. Gambian strains are highlighted in red. The sub-tree (B) shows the closest relatives to the study strains, with the allelic distance separating them displayed with the arrow. Dotted lines represent long branches which have been shortened.
Supplementary Figure 2. A Ninja neighbour-joining tree showing the phylogenetic relationship between the ST349 (Achtman) strain from this study and all other publicly available genomes within the same HC1100 cluster (cgST complex). The legend shows the locations of the isolates, with genome counts displayed in parenthesis. Gambian strains are highlighted in red. The study ST349 strain is separated from a clinical ST349 strain by only seven alleles (<7 SNPs), as depicted in the subtree (B). Long branches are shortened (indicated by dashes).
Supplementary Figure 3. A phylogenetic neighbour-joining tree reconstructed with the study ST939 (Achtman) strain and all publicly available genomes that fell within the same HC1100 cluster (cgST complex). The legend shows the locations of the isolates, with red highlights around the nodes indicating the Gambian strains. The allelic distance between the study strain and its nearest relative, a bovine ST939 strain, has been given, depicted by the arrow. Dotted lines indicate shortened long branches.
Supplementary Figure 4. A Ninja neighbour-joining tree reconstructed with Achtman ST73 colibactin+ strains from this study and all other publicly available ST73 (Achtman) strains that fell within the same HC1100 cluster (cgST complex) in EnteroBase (Reference 41). The sources of the isolates are displayed, with Gambian strains highlighted in red. The Gambian non-human primate strains are on separate long branches, although nested within clades populated by human strains from other countries, suggestive of probably an ancient transmission between the two hosts. The branch lengths for the Gambian strains are displayed. Dotted lines represent long branches which have been shortened.
Supplementary Figure 5. A Ninja neighbour-joining tree showing the phylogenetic relationship between ST127 strains from this study and other publicly available strains that occur within the same HC1100 cluster (cgST complex). The sources of the isolates are displayed in the legends, with Gambian strains highlighted in red. Branch lengths display the allelic distances separating genomes. The sub-tree (B) shows the closest relatives to the study strains, with the allelic distances separating them displayed with the arrow. Dotted lines represent long branches which have been shortened. Dotted lines represent long branches which have been shortened.
Supplementary Figure 6. A Ninja neighbour-joining tree showing the phylogenetic relationship between ST681 strains from this study and other publicly available strains that fell within the same HC1100 cluster (cgST complex). The study strains fell into two separate HC100 clusters, which are depicted in the two subtrees (B and C). The closest neighbours to both HC100 clusters are displayed, with the branch labels indicating the allelic distances between strains. The locations of the isolates are displayed for each tree, with Gambian strains highlighted in red. Dotted lines represent long branches which have been shortened.
Supplementary Figure 7. A phylogenetic tree showing the phylogenetic relationship between ST2076 strain (an MDR strain) and all other publicly available genomes that fell within the same HC1100 cluster (cgST complex). The legend shows the locations of the isolates, Gambian strains are highlighted in red. The subtree (B) shows the allelic distance between the study strain and its nearest relative, an ST2076 isolate recovered from water. Dotted lines indicate shortened long branches.
Supplementary Figure 8. A Ninja phylogenetic tree showing the closest neighbours of simian ST1873 strain—an environmental (soil) isolate belonging to ST83, separated from the study strain by 1659 alleles. The legends of both the main tree and the subtree show the locations of the isolates Gambian strains are highlighted in red. In the subtree (B), the closest neighbour to the simian ST1873 strain is also highlighted in red. Dotted lines are used to indicate shortened long branches.
Supplementary Figure 9. Ninja phylogenetic trees showing the closest neighbours to simian isolates belonging to novel sequence types (Achtman) ST8550 (A), ST8532 (B) and ST8525 (C), ST8826 (D). The allelic distances between these study isolates and their closest neighbours are >1100 alleles, and the closest neighbours belong to seven-allele STs which share less than five out of the seven MLST loci. Each genome (ST8550, ST8532, ST8525) belongs to a unique cgST complex (novel groups at HierCC 1100), indicative of novel diversity within the non-human primate niche.
Supplementary files
Supplementary File 1. A. Characteristics of the study population, displaying the primate species, their age and gender, and the E. coli sequence types (Achtman MLST STs) and phylotypes recovered from individual samples. Novel STs are designated by an asterisk (*). B. Reference strains that were included in this study.
Supplementary File 2. A. Predicted plasmids from short-read sequences, using ARIBA PlasmidFinder (Reference 44). B. A table indicating the virulence and (or) resistance genes located on representative plasmids that were sequenced by Oxford nanopore technology. The size of each plasmid and the functions of the respective genes encoded thereon are also indicated.
Supplementary File 3. A. A summary of the sequencing statistics of the novel sequence types derived from this study. B. Prophage types detected from long-read sequences using PHASTER (reference 51).
Supplementary File 4. A summary of the sequencing statistics of the study isolates.
Supplementary File 5. List of virulence factors detected using ARIBA VFDB (Reference 44).
Supplementary File 6. Pair-wise single nucleotide polymorphism distances calculated from the core genome alignment using snp-dists v0.6 (https://github.com/tseemann/snp-dists).
Acknowledgements
We want to thank Dr Andrew Page and Dr Thanh Le-Viet for their thoughtful advice on the long-read analysis. We also thank Dr Mark Webber for proofreading the manuscript and giving constructive feedback.
Footnotes
Error in Author name (Nabil-Fareed John Alikhan) edited (to Nabil-Fareed Alikhan).