Botulinum-neurotoxin-like sequences identified from an Enterococcus sp. genome assembly

Botulinum neurotoxins (BoNTs) are produced by diverse members of the Clostridia and result in a flaccid paralysis known as botulism. Exploring the diversity of BoNTs is important for the development of therapeutics and antitoxins. Here we describe a novel, bont-like gene cluster identified in a draft genome assembly for Enterococcus sp. 3G1_DIV0629 by querying publicly available genomic databases. The bont-like gene is found in a gene cluster similar to known bont gene clusters. Protease and binding motifs conserved in known BoNT proteins are present in the newly identified BoNT-like protein; however, it is currently unknown if the BoNT-like protein described here is capable of targeting neuronal cells resulting in botulism.

SRR5648109) for Enterococcus sp. 3G1_DIV0629 were downloaded from NCBI. To generate our own assembly, sequence data were assembled with the SPAdes assembler (22) and assemblies were annotated with Prokka (23). To test for potential laboratory constructs, contigs containing bont-like sequences were screened for vector contamination with VecScreen (https://www.ncbi.nlm.nih.gov/tools/vecscreen/). The contigs (nucleotide sequences) were compared to the NCBI nt database with blastn to gain insight into the origin of the contigs of interest (taxonomic origin and "genomic origin" -chromosome, plasmid or bacteriophage). Contigs were also compared to known plasmid sequences with progressiveMauve (24). Translated protein sequences of coding regions identified on contigs containing bont-like sequences were compared to the UniProt database (25) and the nr database with blastp to gain insight into the origin and function of proteins encoded on the contigs of interest.
Sequence data representing known BoNT serotypes and subtypes were downloaded from NCBI (Table 1). Genomic sequences containing botulinum neurotoxin genes were annotated with Prokka, and nucleotide and protein sequences of interest were extracted from Prokka output files. Gene cluster diagrams were generated with the R package genoPlotR (27). Aligned nucleotide sequences were compared with SimPlot (28) using the hamming distance option and filtering positions containing gaps in >75% of sequences to determine if the genes were the result of recombination. The diversity of the botulinum-neurotoxin-like sequences was also evaluated on the protein level as protein sequences are often more conserved than nucleotide sequences, which can be useful for comparing distantly related sequences. Botulinum neurotoxin protein sequences and BoNT-like protein sequences were aligned with MUSCLE, and a maximum likelihood phylogeny was inferred with IQ-TREE v1.5.5 (26) using the predicted best-fit model -VT+F+R4. The phylogeny was viewed with FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/). Protein sequences representing known BoNTs and newly identified BoNT-like sequences were compared with blastp. Protein sequences of novel and established BoNTs were aligned with MUSCLE (29) and the alignment was viewed in MEGA (30) to evaluate motifs of interest.
To determine the phylogenetic relationship of strain 3G1_DIV0629 to other Enterococcus spp., the publicly available assembly and the in-house assembly (SPAdes) were compared to RefSeq genomes using Mash v2.0 (31) and the presketched RefSeq archive (https://gembox.cbcb.umd.edu/mash/refseq.genomes%2Bplasmid.k21s1000.msh). The contig containing bont-like toxin cluster sequences from both assemblies were also compared to the pre-sketched RefSeq and plasmid databases. Whole genome SNP analysis was used to further investigate the relationship of Enterococcus sp. 3G1_DIV0629 with other Enterococcus faecium isolates. E. faecium genome assemblies were downloaded from NCBI and SNPs were called from NUCmer (32,33) alignments to a reference genome (GCA_000174395.2) within NASP (34). SNPs identified in duplicated regions of the reference genome (with NUCmer self-alignments) were filtered from downstream analyses. A maximum likelihood phylogeny was generated from the concatenated SNP data (114904 positions) with IQ-TREE v1.5.5 (26) using the predicted best-fit model -TVM+F+ASC+R3.
Supplemental data files including the in-house genome assembly and BoNT protein sequences evaluated in this study are available on githubhttps://github.com/chawillia/bont-like-sequences_2017.git.

Results:
Blastp searches of BoNT sequences against publicly available databases revealed a BoNT-like protein from the recently whole genome sequenced Enterococcus sp. 3G1_DIV0629. The bont-like gene sequence was associated with a gene cluster that included an ntnh-like gene, orfX-like genes and a p47-like gene, which is similar to known botulinum neurotoxin gene clusters (10) (Figure 1). While the order and orientation of the genes in the Enterococcus sp. 3G1_DIV0629 cluster are similar to known orfX+ botulinum neurotoxin gene clusters, only two of the three orfX genes (orf-X2 and orf-X3) are present in the bont-like gene cluster of Enterococcus sp. 3G1_DIV0629, and these genes are in the opposite orientation of orfX genes in known botulinum neurotoxin gene clusters. This orientation of the orfX genes is similar to a recently identified bont-like gene cluster on the chromosome of C. botulinum strain 111 (8) (Figure 1). A SimPlot comparing aligned nucleotide sequences ( Figure 2) indicates a consistent level of identity along the length of the bont-like gene sequence when compared to known botulinum neurotoxin genes, which suggests that the gene is not the result of a novel recombination of multiple serotypes. Zhang and colleagues (8) made similar observations when evaluating the bont-like sequence in C. botulinum strain 111.
Sequence data representing each of the botulinum neurotoxin serotypes and subtypes were downloaded from NCBI (Table 1) and predicted protein sequences were compared. A phylogeny of botulinum neurotoxin protein sequences (Figure 3), including the BoNT-like sequence described here, indicates that the BoNT-like protein predicted from the assembly for Enterococcus sp. 3G1_DIV0629 is not closely related to known BoNTs that cause most botulism cases in humans (serotypes A and B). The BoNT-like sequence is most closely related to a BoNT-like protein sequence recently identified on the C. botulinum Group I Strain 111 chromosome (8). The predicted protein sequences of the bont-like toxin cluster genes in Enterococcus sp. 3G1_DIV0629 were also queried against known botulinum toxins and associated proteins with blastp ( Table 2). The BoNT-like protein sequence of Enterococcus sp. 3G1_DIV0629 generally shares approximately 29% (median=29.12%, median coverage of 97% - Table 2) identity with known BoNT proteins. However, the BoNT-like sequence from Enterococcus sp. 3G1_DIV0629 shares 38.54% identity with the BoNT-like sequence from C. botulinum strain 111. As observed with the BoNT-like protein sequences in W. oryzae SG25 and C. botulinum strain 111 (7,8), the HExxH metallopeptidase motif (HELCH in Enterococcus sp. 3G1_DIV0629) is conserved in the predicted BoNT-like protein sequence of Enterococcus sp. 3G1_DIV0629 ( Figure 4A). Additionally, the SxWY binding motif is conserved in the BoNT-like protein sequence of Enterococcus sp. 3G1_DIV0629 ( Figure 4B) which suggests the protein could potentially interact with neuronal cells. The NTNH-like, P47-like and ORFX-like protein sequences also show relatively low sequence identity to protein sequences associated with known BoNTs ( Table 2).
The genome assembly for Enterococcus sp. 3G1_DIV0629 was investigated to characterize the contig containing the bont-like gene cluster and to determine the relationship of the strain to other Enterococcus spp. Contigs containing bont-like sequences, NGLI01000004.1 and NODE_32 from an in-house assembly (SPAdes), show no signs of vector contamination when screened with VecScreen against the UniVec database. Several lines of evidence suggest the bont-like gene cluster has been inserted into the Enterococcus genomic background and is putatively located on a plasmid. The GC content of the contigs containing the bont-like sequences is slightly lower than the GC content of the entire genome assembly ( Table 3). Comparisons of bacterial chromosomes with plasmids and insertion sequences have shown that the GC content of the chromosome is often higher than the GC content of the associated plasmid or insertion sequence (35,36). Comparisons of the contigs containing the bontlike sequences to reference genomes and plasmids with Mash distances suggest that the contigs are most closely related to Enterococcus spp. plasmid sequences. Additionally, blastn comparisons of the entire contigs as well as the regions adjacent to the bont-like gene cluster share identity with sequences associated with Enterococcus spp. plasmids. Contigs from an in-house genome assembly for Enterococcus sp. 3G1_DIV0629 were compared to E. faecium plasmids with progressiveMauve ( Figure  5). The regions surrounding the bont-like gene cluster share homology with known Enterococcus plasmid sequences. However, the genomic location of the bont-like gene cluster is only putative due to the draft nature of the genome assembly. Coding region sequences in the regions adjacent to the bont-like gene cluster are annotated as transposases, which could explain how the bont-like genes were inserted into the Enterococcus genomic background.
The draft genome assembly of Enterococcus sp. 3G1_DIV0629 was compared to publicly available genomic data to determine how this strain is related to other Enterococcus spp. Comparisons of the genome assembly against RefSeq genomes with Mash indicate that top hits for the Enterococcus sp. 3G1_DIV0629 assembly are members of Enterococcus faecium. Thus, Enterococcus sp. 3G1_DIV0629 was compared to E. faecium genomes with a core genome SNP analysis ( Figure 6). The most closely related genomes to Enterococcus sp. 3G1_DIV0629 are E. faecium T110 (GCA_000737555.1) and E. faecium L-X (GCA_000787065.1), both of which are identified as probiotic strains.

Discussion:
Exploring the diversity of BoNTs is important for the development of therapeutics and antitoxins and also to understand the function, evolution, and pathogenesis of botulinum neurotoxins. Here we describe a novel, bont-like gene cluster that was identified in a draft genome assembly for Enterococcus sp. 3G1_DIV0629 by querying publicly available genomic databases. A similar data mining approach was recently used to identify a novel botulinum-neurotoxin-like gene cluster in a Group I C. botulinum genome, strain 111 (8). Metadata associated with the Enterococcus sp. 3G1_DIV0629 genome assembly indicate the bacterium was isolated from a bovine fecal sample in South Carolina, USA. The isolate is closely related to members of E. faecium. E. faecium strains are Gram-positive, non-endospore forming facultative anaerobes that are members of the Firmicutes and commonly inhabit the digestive tracts of mammals (37,38). E. faecium has been associated with antimicrobial resistance and a variety of human infections (39)(40)(41). Interestingly, antagonistic interactions between E. faecium and C. botulinum have been reported. E. faecium has been described to inhibit the growth of C. botulinum strains and inhibit BoNT production (42)(43)(44). The impact of ruminal microbial communities, particularly Enterococcus spp., on bovine botulism has been an area of on-going research (45)(46)(47)(48).
This report is the first instance of identifying botulinum-neurotoxin-like sequences in a member of the Enterococcus. The bont-like gene cluster identified in Enterococcus sp. 3G1_DIV0629 is putatively located on a plasmid sequence, and the regions surrounding the bont-like gene cluster are putatively of Enteroccocus origin. Coding region sequences in the regions adjacent to the bont-like gene cluster are annotated as transposases. Horizontal gene transfer of botulinum neurotoxin genes via association with transposases has been described (17,18). One explanation for the presence of the bont-like gene cluster in Enterococcus sp. 3G1_DIV0629 is that the gene cluster was inserted from an unknown source into the Enterococcus genomic background (putatively on a plasmid) long ago and has since diverged from known botulinum neurotoxin genes. Alternatively, this gene cluster could be the result of a more recent insertion event from a divergent source (divergent from known BoNTs). The toxin-like gene cluster sequence could provide some unknown ecological advantage for survival in the environment or in a host. Similar bont-like sequences may be present in additional organisms that have yet to be sampled as they are difficult to culture or are not pertinent to studies regarding human health, which have driven a great deal of genome sequencing efforts.
The botulinum-neurotoxin-like gene cluster in Enterococcus sp. 3G1_DIV0629 was identified using bioinformatic search tools and publicly available genomic sequence data. The gene cluster is similar to known botulinum neurotoxin gene clusters that contain orfX+ genes. The HExxH protease motif and the SxWY binding motif that are conserved in known BoNTs are present in the predicted BoNT protein sequence. Importantly, it is currently unknown if the BoNT-like protein described here is capable of targeting neuronal cells resulting in botulism or if the BoNT-like protein and associated proteins are even expressed by the bacterium; there is likely a fitness cost with maintaining the bont-like gene cluster, which suggests that this region has an ecological function that has yet to be identified. No known cases of botulism have been attributed to Enterococcus spp. Thus, it seems unlikely that these microbes are expressing proteins that cause botulism in humans; additional experiments would be required to test this hypothesis. Bioinformatic studies such as this coupled with laboratory experiments can inform our understanding of the diversity and evolution of Clostridial toxins.