Diversity of Secretion System Apparatus in Tomato Wilt Causing Ralstonia solanacearum Strains: a Comparative Analysis Using in-silico Approach

Ralstonia solanacearum (Rs) species is the leading cause of bacterial wilt disease in a wide range of host plants worldwide. In the present study, secretion system analysis of five tomato pathogenic Rs strains was carried out in-silico. This paper describes a new protocol to identify the secretion system components, i.e. SSCs (T1SS-T6SS, Flg, T4P, and Tad-Tat). A total of 865 SSCs were identified using the new protocol. Contributions of SSCs into core-secretion system apparatus (i.e. SSA) were also studied. Synteny was discovered among the secretion system apparatus (SSA) where relative frequency of SSCs to core-SSA is high (>20%), which includes T1SS, T2SS, T5SS, T4P, and Tad-Tat, but excludes T3SS, T4SS, and Flg. To the best of our knowledge, this is the first report indicating that during the evolution of Rs, most of the secretion system apparatus (T1SS, T2SS, T5SS, T4P, and Tad-Tat) were highly conserved and came from a single ancestor, while T3SS and T6SS may have arrived later, probably from horizontal gene transfer.


Introduction
Ralstonia solanacearum (hereafter referred to as 'Rs') is an aerobic, non-spore-forming, Gram-negative, plant pathogenic bacterial species belonging to the β-proteobacteria (1). Rs is considered a major plant pathogen and has been divided into four distinct monophyletic phylotypes (2). Rs exhibits some distinctive features of plant pathogens, including abundance in rhizospheric soil, a broad host range (>200 plant species), tissue-specific tropism, invasion of the root system, and multiplication and colonization (>10^9 c.f.u. per g fresh weight) in xylem vessels (3). Therefore, Rs has been declared a 'priority plant pathogen' in many countries. Rs is also classified as a 'quarantine organism', a 'bioterrorism' agent, and a 'double usage agent' by different regulatory authorities in the USA and Europe (4).
Many bacterial secretion systems have been identified in Rs (5). As per the current understanding, bacterial secretion is carried out by a number of specialized systems, such as the type I-VI secretion systems (T1SS-T6SS), along with flagella (Flg), type IV pili (T4P), and tight adherence (Tad) secretion systems (6). It is well established that bacterial secretion systems contribute to host specificity. Over decades of intensive research, Rs has become a model system for investigating mechanisms of plant pathogenesis (7). However, studies of secretion system diversity within Rs are rare.
As a model organism, Rs gives us a unique opportunity to study the bacterial secretion system apparatus of five tomato pathogenic strains (Rs GMI1000, Rs CFBP2957, Rs CMR15, Rs FQY_4, and Rs PSI07), representing four phylotypes. Here, we have tested the hypothesis that there might be a high degree of variation in the secretion system components (SSCs) of Rs due to their ability to infect different types of hosts. With the advent of improved data analysis tools, many bioinformatic platforms have been introduced to identify bacterial secretion systems across entire genomes. Previously, the secretion systems of 2643 bacterial genomes were investigated with a standalone programme, i.e. TXSScan (6). In the present in-silico investigation, however, we used a KO (KEGG Orthology) based protocol to identify bacterial secretion system-related candidate genes and protein orthologs. Host specificity may have an impact on the bacterial secretion system. To exclude this factor, we explored the diversity of secretion systems only in strains that infect tomato. This study is important for determining the infection pattern of a deadly pathogen within a common host, and it illustrates how current data analysis tools can enhance scientific knowledge. These secretion-related genes might be future targets for controlling the infection in the agricultural field.

Data Mining
This in-silico study was undertaken specifically to compare the bacterial secretion systems of five tomato-infecting strains of R. solanacearum (Table 1). Orthologs of secretion system components were derived from the KEGG-KO database (KO, or KEGG ORTHOLOGY, database of the Kyoto Encyclopedia of Genes and Genomes) and verified within the selected genomes using the KEGG-Genome database. The reference secretion system components are listed in Table S1 (considered as the reference set). These systems were selected in order to maximize sequence diversity. To identify bacterial secretion system-related genes, this reference dataset was examined manually using the KO (KEGG ORTHOLOGY) database (Table S2). All the KOs (orthologs) were validated by their presence or absence in all the selected R. solanacearum genomes in the KEGG-Genome database (Table S3). A flowchart of the experimental design is presented in Fig. 1.
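The filtering logic of this protocol (reference component, then KO number, then presence in the selected genomes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the strain labels, the copy numbers, the placeholder identifier K99999, and the function name `validate_orthologs` are all hypothetical, and the per-genome KO tables are assumed to have been exported by hand from the KEGG-Genome database.

```python
# Toy inputs (hypothetical): component -> KO number, or None when no KO is assigned.
reference_to_ko = {
    "GspD": "K02453",
    "HlyD": "K11003",
    "AbsentExample": "K99999",   # placeholder KO, found in none of the toy genomes
    "UnassignedExample": None,   # component without a KO number
}

# Toy inputs (hypothetical): strain -> {KO number: copy number in that genome}.
genome_kos = {
    "strain_A": {"K02453": 4, "K11003": 1},
    "strain_B": {"K02453": 1},
}


def validate_orthologs(reference_to_ko, genome_kos):
    """Return {KO: {strain: copy number}} for KOs found in at least one genome."""
    validated = {}
    for ko in reference_to_ko.values():
        if ko is None:
            continue  # no KO number assigned, so it cannot be validated
        counts = {strain: kos.get(ko, 0) for strain, kos in genome_kos.items()}
        if any(counts.values()):
            validated[ko] = counts
    return validated


matrix = validate_orthologs(reference_to_ko, genome_kos)
```

The resulting dictionary is a copy-number matrix with one row per validated KO and one column per strain, which is the kind of table the later statistical steps operate on.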

Fig 1. Protocol to identify secretion system components of five tomato pathogenic R. solanacearum strains.

Synteny Analysis
Synteny is the conservation of gene order. It is an important tool to predict the functional relationships between genes and to assess the orthology of genomic regions (14). In this study, SyntTax (a web service designed to take full advantage of the large number of archaeal and bacterial genomes by linking them through taxonomic relationships) was used to determine the synteny among the SSC subcomponents (14). In SyntTax, the synteny methodology is based on the Absynte algorithm (15).
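To make the idea of conserved gene order concrete, the following minimal Python sketch finds the longest run of genes that occurs contiguously, in the same or in reversed orientation, in two genomes. It is not the Absynte algorithm and does not reproduce SyntTax output; it is a simplified stand-in that assumes each genome has already been reduced to an ordered list of ortholog labels. The labels `g1` to `g5` and the function names are hypothetical.

```python
def longest_shared_run(order_a, order_b):
    """Longest contiguous run of labels occurring in the same order in both lists."""
    best = []
    prev = [0] * (len(order_b) + 1)
    # Dynamic programme for the longest common substring, applied to gene lists.
    for i, gene_a in enumerate(order_a, start=1):
        curr = [0] * (len(order_b) + 1)
        for j, gene_b in enumerate(order_b, start=1):
            if gene_a == gene_b:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = order_a[i - curr[j]:i]
        prev = curr
    return best


def syntenic_block(order_a, order_b):
    """Best shared run, also allowing the block to be inverted in the second genome."""
    forward = longest_shared_run(order_a, order_b)
    inverted = longest_shared_run(order_a, order_b[::-1])
    return forward if len(forward) >= len(inverted) else inverted


genome_1 = ["g1", "g2", "g3", "g4", "g5"]
genome_2 = ["x1", "g4", "g3", "g2", "x2"]  # same block, opposite orientation
block = syntenic_block(genome_1, genome_2)
```

A long shared block across all strains would be read as conserved gene order, while a block of length one means that only isolated orthologs are shared.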

Phylogenetic Analysis
The phylogenetic analysis of the five R. solanacearum strains was performed based on the 16S rRNA gene sequence. Full-length 16S rRNA gene sequences were retrieved from the whole genome sequences using the RNAmmer 1.2 server (16). The phylogenetic tree was constructed by the neighbour-joining algorithm using MEGA 6.0 software (17), taking Xanthomonas oryzae pv. oryzae KACC 10331 as the out-group member.
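Neighbour-joining works on a matrix of pairwise distances, and the description of the 16S rRNA tree in this paper names the p-distance as the measure used. As a rough illustration, the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The sketch below assumes pre-aligned sequences of equal length and skips any site with a gap in either sequence (pairwise deletion); that gap treatment is an assumption, not a setting reported by the authors, and the toy sequences are invented.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, ignoring gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # pairwise deletion of gapped sites (assumed treatment)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA sequences).
d = p_distance("ACGTACGT", "ACGTACGA")
```

Here one of eight compared sites differs, so `d` is 0.125.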

Statistical Analyses
All SSC-related genes were scored based on their copy numbers obtained from the KO database. Based on this matrix, bacterial secretion system components present in all the R. solanacearum genomes were identified as the "core secretion system apparatus" or SSA. Contributions of all the systems and sub-systems to the "super SSC" of R. solanacearum were determined and presented in terms of a Venn diagram and occurrence frequency (%). Venn diagrams were drawn with the help of an online tool provided by Bioinformatics & Evolutionary Genomics (available at: http://bioinformatics.psb.ugent.be/webtools/Venn/). The contribution of each individual secretion system to the "core SSC" was measured through relative frequency. The genetic relationships among the R. solanacearum strains were estimated from the Euclidean distance matrix using Past 3.17 software (18). The matrix data were further evaluated by Principal Component Analysis (PCA) as minimum distance, using Past 4.08 software. Genetic distance was estimated based on pair-wise comparisons and was represented in the form of a dendrogram (Figure 6). Bootstrap analysis of the dendrogram was carried out with 10,000 replications.
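The Euclidean distance step can be illustrated without Past. In the sketch below each strain is represented by a vector of copy numbers in a fixed ortholog order, and the pairwise distance is the usual square root of summed squared differences. The strain labels and copy numbers are hypothetical, and the function name `euclidean_distances` is introduced only for this example.

```python
import math
from itertools import combinations

# Toy copy-number profiles (hypothetical): one value per ortholog, same order for all strains.
profiles = {
    "strain_A": [1, 0, 4],
    "strain_B": [1, 1, 1],
    "strain_C": [1, 0, 4],
}


def euclidean_distances(profiles):
    """Return {(strain_x, strain_y): Euclidean distance} for every pair of strains."""
    distances = {}
    for x, y in combinations(sorted(profiles), 2):
        distances[(x, y)] = math.dist(profiles[x], profiles[y])
    return distances


dist = euclidean_distances(profiles)
```

Strains with identical profiles are at distance zero, and a clustering method applied to this matrix would join them first when building a dendrogram.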

Identification & distribution of Secretion System Components (SSCs)
Identification of SSCs was accomplished in three steps: identification of references from published literature and databases (total entries 294, Table S1), then searching for the orthologs in the KEGG-KO database (total entries 226, Table S2), and then validation of the datasets by KO numbers in the KEGG Genome database (total ortholog entries validated 145, Table S3). By this method, a super R. solanacearum SSC (cumulative no. 865) containing nine secretion systems and 145 orthologs (Table S3) was identified (Fig. 1). Furthermore, the unique distribution of SSCs across the five strains was demonstrated in a matrix plot with protein orthologs (Fig. 2). Among the 145 orthologs, 60 were identified in all the strains and defined as the "core SSC" (Fig. 3a).
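Both summary numbers in this paragraph can be derived from a copy-number matrix. The Python sketch below is a hedged illustration: it treats the core set as the orthologs with at least one copy in every strain, and it assumes that the cumulative count is the sum of copy numbers over all orthologs and strains. That reading is an interpretation, although it is consistent with 865 exceeding 145 x 5 = 725, the maximum a pure presence count could reach. The KO placeholders, strain labels, and values are hypothetical.

```python
# Toy copy-number matrix (hypothetical): KO -> {strain: copy number}.
copy_numbers = {
    "KO_1": {"strain_A": 4, "strain_B": 1, "strain_C": 1},
    "KO_2": {"strain_A": 1, "strain_B": 0, "strain_C": 1},
    "KO_3": {"strain_A": 1, "strain_B": 1, "strain_C": 2},
}


def core_orthologs(copy_numbers):
    """Orthologs with at least one copy in every strain."""
    return sorted(
        ko for ko, counts in copy_numbers.items()
        if all(n > 0 for n in counts.values())
    )


def cumulative_count(copy_numbers):
    """Total number of copies summed over all orthologs and strains (assumed definition)."""
    return sum(sum(counts.values()) for counts in copy_numbers.values())


core = core_orthologs(copy_numbers)
total = cumulative_count(copy_numbers)
```

In this toy matrix two of three orthologs are core and the cumulative count is 12.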

Flg
In the present study, 18 orthologs (Table S3) of Flg SSA components were identified; these were commonly present in all the experimental genomes except Rs CFBP2957 (Fig. 3h). None of these orthologs contributed to the core-SSA (Fig. 4a). In addition, no synteny was found for the Flg components.

T4P and Tad-Tat
In the case of the T4P SSA, 12 orthologs were identified (Table S3). Among all the SSA, the contribution of T4P to the 'core-SSA' was the highest (relative frequency of 100%), whereas it was 83.33% for the other pili system, i.e. Tad-Tat (Fig. 3a). Synteny of the T4P and Tad components (Fig. 5a,b) indicates that T4P and Tad originated from a single ancestor. These diagrams were generated with the SyntTax web server.
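One reading of the relative frequency reported here is the share of a system's orthologs that fall inside the core set, expressed as a percentage. A value of 100% for T4P would then mean that all 12 of its orthologs are core. The sketch below works under that assumption only; the ortholog labels are hypothetical, and the 5-of-6 split used to reproduce 83.33% is an invented example, not the actual Tad-Tat count.

```python
def relative_frequency(system_orthologs, core_orthologs):
    """Percentage of a system's orthologs that belong to the core set (assumed definition)."""
    system = set(system_orthologs)
    if not system:
        raise ValueError("system has no orthologs")
    return 100.0 * len(system & set(core_orthologs)) / len(system)


# Hypothetical labels: a 12-ortholog system entirely in the core,
# and a 6-ortholog system with 5 members in the core.
system_a = [f"a{i}" for i in range(12)]
system_b = [f"b{i}" for i in range(6)]
core = system_a + system_b[:5]

freq_a = relative_frequency(system_a, core)
freq_b = round(relative_frequency(system_b, core), 2)
```

Under this definition `freq_a` is 100.0 and `freq_b` is 83.33.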

Secretion System vs. Phylogeny
Multivariate analysis was conducted based on the similarity matrix drawn from the presence and absence of the 145 SSC orthologs. In the PCA, three main clusters of Rs strains were identified in the bi-plot of PCA1 versus PCA2 (Fig. 6a). However, the resulting cluster dendrogram (Fig. 6b) showed close similarity with the 16S rRNA based phylogeny (Fig. 6c). Both the cluster dendrogram and the phylogenetic tree were supported by high bootstrap values (>50) and a common root from the out-group X. oryzae pv. oryzae KACC 10331. Both the cluster dendrogram and the phylogenetic tree were compared with the geographical origin of the strains.

Fig. 6. (c) Reference phylogenetic tree based on 16S rRNA sequence showing genetic relationships among the five strains of R. solanacearum. Genetic distance was calculated based on p-distance, and the phylogenetic tree was constructed by the neighbor-joining algorithm using MEGA6 software.

Discussion
In the present investigation, a total of 865 SSCs representing 145 orthologs, distributed among different secretion systems, viz. T1SS-T6SS, Flg, T4P and Tad-Tat, were identified through the protocol described here (Fig 1). The protocol is based on identification and validation of orthologs from the KO database and the Genome database, respectively. Annotation of secretion system apparatus components was done earlier by Abby et al. (6). In the present study, manual curation was important for proper annotation of secretion system apparatus components for the following reasons. Firstly, there may be homologous secretion system apparatus components present within a single strain. For example, the Vir and Trb proteins of T4SS are homologous: Vir proteins were identified in Rs CMR15, while Trb proteins were identified in Rs GMI1000 and Rs FQY_4. Secondly, there may be different orthologs of a single SSC. For example, four orthologs of HlyD were identified (namely K01993, K02005, K11003, and K12542). Thirdly, even a single ortholog may have more than one copy within a single strain. For example, GspD (K02453) has four copies in Rs GMI1000. Finally, these copies may be distributed between bacterial chromosomes and plasmids. They may have arisen through gene duplication followed by transfer to the plasmid, or through horizontal gene transfer. Therefore, careful annotation of the SSC orthologs is critical. Nevertheless, investigation of the SSCs using the KO database has its own limitations. The main limitation is the limited number of genomes available in the KEGG database. In addition, there are instances where a KO number has not been assigned.
In the present study, all the selected strains were from a single bacterial species (R. solanacearum) infecting a common host (tomato). Still, a marked difference was found in their secretion systems (Fig. 2). Out of 145 SSC orthologs, only 60 were identified in all the strains and defined as the "core-SSC" for tomato pathogenic strains of Rs (Fig. 3a). The secretion system components have been studied extensively and have been reported in different animal pathogenic bacteria such as Pseudomonas aeruginosa (19,20), Vibrio cholerae (21,22), Salmonella sp. (23), Escherichia coli (24), Bacillus subtilis (25), etc. However, in-depth reports on the components of secretion systems in plant pathogens are scanty. A few earlier reports have addressed specific secretion systems in R. solanacearum, such as type III (26) and type VI (27). However, a detailed survey of all secretion systems in association with the host has not yet been reported. In the present study, we explored the distribution of secretion system components and compared them among five Rs strains infecting a common host. The results indicate that SSCs are an intrinsic property of the individual strain, and thus that the secretion system of a particular strain is independent of its host. There are many reports available on the comparative analysis of bacterial strains, but only a few on the synteny of bacterial secretion systems. The variation and synteny in the T6SS operon within plant- and animal-associated proteobacteria was previously described by Wu et al. (28). In the present investigation, synteny was observed among T1SS, T2SS, T5SS (Fig. 4b-d), T4P, and Tad-Tat (Fig. 5a,b). Synteny was not observed in T3SS, T6SS, and Flg. Interestingly, T3SS and T6SS are among the lowest contributors to the "core bacterial secretion system proteins" (Fig. 4a). This indicates that during the evolution of Rs, most of the secretion systems (T1SS, T2SS, T5SS, T4P, and Tad) were highly conserved and came from a single ancestor. T3SS is responsible for bacterial invasion into the host, followed by intracellular replication of the bacteria and triggering of apoptosis of the host cell. T6SS is considered an important virulence factor of bacteria, which also aids in the formation of biofilms. The absence of synteny in these two systems among strains of the same bacterial species infecting the same host (tomato) suggests horizontal transfer of genes in the T3SS and T6SS of Rs (29). To the best of our knowledge, this is the first report on synteny of bacterial secretion systems of tomato wilt causing strains of R. solanacearum.

Conclusions
The cluster dendrogram based on SSCs closely resembles the 16S rRNA based phylogeny, which suggests that the bacterial secretion systems are an intrinsic property of the strain. To minimize secretion system diversity, we selected Rs strains with a common host; still, only 60 out of 145 SSC orthologs were found in the core-SSA. This is the first time that synteny has been found to be absent in SSA that contribute less (<20%) to the core-SSC among strains of Rs infecting a common host (tomato). To our knowledge, this is the first study to indicate that during the evolution of Rs, most of the bacterial SSCs (T1SS, T2SS, T5SS, T4P, and Tad-Tat) were highly conserved and came from a single ancestor.
T3SS and T6SS were probably acquired by the strains through horizontal gene transfer. Our present findings give a new insight into the global threat of Rs. T3SS and T6SS components may be targeted in the future to control R. solanacearum infestations in the agricultural field.

Fig 2. Matrix plot showing similarity and dissimilarity of bacterial secretion systems of five

Fig. 4. (a) Bar diagram representing contribution of the SSCs into the "core-SSC" of five tomato

Fig. 5. (a) Synteny diagram showing conservation of T4P for the five strains infecting same host.
Fully annotated whole genome sequences of R. solanacearum strains infecting tomatoes were searched from publicly available databases.

Table 1. List of the five tomato pathogenic Ralstonia solanacearum genomes used in this study.