Pan-species transcriptomic analysis reveals a constitutive adaptation against oxidative stress for the highly virulent Leptospira species

Different environments exert selective pressures on bacterial populations, favoring individuals with particular genetic traits that are well-suited for survival in those conditions. Evolutionary mechanisms such as natural selection have, therefore, shaped bacterial populations over time selecting, in a stepwise manner, the fittest bacteria that gave rise to the modern lineages that exist today. Advances in genomic sequencing, computational analysis, and experimental techniques continue to enhance our understanding of bacterial evolution and its implications. Nevertheless, these are often limited to genomic comparisons of closely related species. In the present study, we introduce Annotator-RNAtor, a graphical user interface (GUI) method for performing pan-species transcriptomic analysis and studying intragenus evolution. The pipeline uses third-party software to infer homologous genes in various species and highlight differences in the expression of the core-genes. To illustrate the methodology and demonstrate its usefulness, we focus on the emergence of the highly virulent Leptospira subclade known as P1+, which includes the causative agents of leptospirosis. Here, we expand on the genomic study through the comparison of transcriptomes between species from P1+ and their related P1-counterparts (low-virulent pathogens). In doing so, we shed light on differentially expressed pathways and focused on describing a specific example of adaptation based on a differential expression of PerRA-controlled genes. We showed that P1+ species exhibit higher expression of the katE gene, a well-known virulence determinant in pathogenic Leptospira species correlated with greater tolerance to peroxide. Switching PerRA alleles between P1+ and P1-species demonstrated that the lower repression of katE and greater tolerance to peroxide in P1+ species was solely controlled by PerRA and partly caused by a PerRA amino-acid permutation. Overall, these results demonstrate the strategic fit of the methodology and its ability to decipher adaptive transcriptomic changes, not observable by comparative genome analysis, that may have been crucial for the emergence of these pathogens. Author summary Natural selection is one of the central mechanisms of the bacterial evolution. Speciation events and adaptation occurs such as mutations, deletions and horizontal gene transfers to enhance our understanding of evolution. Nevertheless, these are often limited to genomic comparisons between species. Here, we are developed a graphical user interface method, named Annotator-RNAtor, to perform pan-species transcriptomic analysis and studying intragenus evolution. To illustrate the methodology, we focus on the emergence of the virulent Leptospira species, causative agents of leptospirosis. We shed light on a differential regulation of several PerRA-controlled genes in P1+ Leptospira subclade (highly virulent pathogens) compared to P1-Leptospira subclade (low virulent pathogens). P1+ species exhibit higher expression of the catalase-encoding gene katE, than P1-species, correlating with a greater ability to withstand peroxide. Additionally, we demonstrate that the difference in katE expression is mediated only by PerRA and the residue 89 of the PerRA protein participates on this regulation. These findings highlight the importance to decipher adaptative transcriptomic changes to fully understand the emergence of pathogenic species.

293 respectively, as enumerated using a Petroff-Hausser counting chamber. The animals were monitored 294 daily and euthanized by carbon dioxide inhalation upon reaching the predefined endpoint criteria (sign 295 of distress, morbidity). To assess leptospiral load, blood, kidney, and liver were sampled and DNA was 296 extracted with the Tissue or Blood DNA purification kit (Maxwell, Promega). The burden in blood and 297 tissues was determined by qPCR with the Sso Fast EvaGreen Supermix assay (Bio-Rad) using the flaB2 298 gene, and the concentration of host DNA was quantified using the gapdh gene. Leptospira load was 299 expressed as genomic equivalent (GEq) per µg of host DNA.  There are two primary modules to analyse data in this pipeline (Fig 2). In the initial steps, 311 Annotator module assigns homologous sequences in the different species studied and reannotates the 312 genomes. Annotator uses two different methods namely NetworkX [26] and GET_HOMOLOGUES [27] 313 to assign homology. The user can define similarity cut-off for NetworkX and identity cut-off for 314 GET_HOMOLOGUES. This will depend on the phylogenetic proximity of the strains studied (herein 315 we use 60% and 45% respectively). In the end, the pipeline reannotates the homologous genes with 316 common user-defined unique tags that are found in the different species (convert GFF to GTF). A gene 317 association table is generated at the end of this module which shows presence-absence of genes in the 318 genomes (S4 Table). Singletons are not reannotated using the pipeline and can be seen as locus tags in 319 the table. Once the gene association table is generated, the RNAtor module is used to map transcriptome 320 data onto the genomes to generate a counts matrix file. This matrix is generated using the .GTF files 321 (containing reannotated common unique names), BWA for mapping and featureCounts scripts to count 322 the number of reads by genes. The final counts comparison file (S4 Table) shows the expression of the 323 homologous genes in all the species studied. Species-specific gene counts can also be seen.  The reannotated genomes by the two methods generated two new .GTF files. These files were 361 used to map the reads to their corresponding genomes using BWA and the number of reads per gene was 362 determined using featureCounts, which produced a .csv table containing the reannotated genes and their 363 respective counts (S4 Table). For inferring differential expression between the P1+ and P1-subclades, 364 standard DESeq2 analysis was employed, excluding genes that were absent or not expressed in more 365 than one species. A total of 2323 and 2370 instances were analyzed using data from 366 GET_HOMOLOGUES and NetworkX, respectively (S4 Table). The volcano plot (Fold change in 367 function of the padj values) in Figure 3 presents the results obtained with NetworkX ( Fig 3A) and 368 GET_HOMOLOGUES ( Fig 3B). Notably, both methods yielded highly similar results, as the majority 369 of over-expressed and under-expressed genes in the P1+ subclade were identified by both methods (see

379
Concerning the differentially expressed genes identified using this methodology, we used a 380 STRING association analysis to visually shed light on pathways that could be differentially expressed 381 between two groups ( Fig 3E). From this, a clear overexpression of genes encoding the flagellation 382 machinery was observed in P1+ and an underexpression of pathways from the general metabolisms.
383 Indeed, the functional COG [36] analysis revealed a significantly higher representation (20-fold) of the 384 cell motility category in the overexpressed genes compared to the underexpressed ones (S5 Table). There To validate the method, our focus was solely on genes that exhibited the most differential 391 expression between the P1+ and P1-groups. Interestingly, we observed two genes, tonB and katE, which 392 are part of two distant loci, showing altered expression between the two groups (see ank/perR and 393 exbD1/exbB genes respectively in Fig 3C). Intriguingly, both loci are part of the PerRA regulon, with 394 tonB and katE being respectively the most underexpressed and overexpressed genes in a L. interrogans 395 perRA deletion mutant [21,22]. Importantly, most of the genes detected differentially expressed in this 396 mutant were also differentially expressed between P1+ and P1-(S6 Table). Most of the genes from this 397 regulon are known to be modulated in the presence of peroxide to protect the cell against oxidative stress.
398 Therefore, our results strongly suggest a disparity in the expression of the PerRA regulon between species 399 from the P1+ and P1-subclades.  H89N (perRA P1+-H89N ). We first verified that all complemented strains 468 restored PerRA production to a similar level as L. interrogans WT (Fig 7A). Complementing the L.
469 interrogans perRA mutant with perRA P1+, led to a lower katE repression than with perRA P1+-H89N or 470 perRA P1- (Fig 7B). Interestingly, not only katE was dramatically more repressed by perRA P1-but  H89N (Fig 8B). Furthermore, the reduced repression of katE observed in the mutant L.
488 adleri perRA complemented with perRA P1+ correlated with higher catalase activity in the presence of 489 H 2 O 2 and an increase in tolerance to peroxide (Fig 8C-D). Since P1-species, including L. adleri, are 490 rapidly cleared from hamsters, we could solely assess Leptospira burden in blood, kidney and liver 1-491 day post-infection but no significant difference was observed between the different L. adleri strains (Fig   492 8E). This indicates that the difference of catalase activity between P1+ and P1-species is not the only 493 determinant of their difference in this model of acute infection.

494
Altogether, these findings demonstrate that difference in katE repression and ability to tolerate 505 Analysis of gene expression in multiple species is highly dependent on the evolutionary relationship 506 between the orthologous and co-orthologous genes present, as well as a robust phylogeny. Cross-species 507 analysis of distantly related species is complex due to the identification of homologous genes, and it may 508 result in false predictions, grouping non-related genes into the same clusters [53]. In this study, we 509 developed a stand-alone GUI pipeline to analyze intra-genus species using homology. The pipeline is 510 particularly useful for understanding subtle variations in the transcriptomic repertoire of phylogenetically 511 related species within the same genus, which may exhibit varied phenotype such as morphological 512 features or pathogenicity. Although our pipeline is a strong tool for analysis, it has certain limitations 513 that could be refined in the future. Orthology may be incorrectly assigned in poorly annotated or 514 divergent genomes, which could lead to bias in the analysis. To address this, we use standardized 515 annotation and employ two methods for homologues assignment. In addition, even though they concern 516 only few genes from the core genome, duplications of genes in specific species also represent a 517 significant challenge for analysis. Aside from this, we demonstrated that this unique tool is able to 518 pinpoint some major changes in core genes expression between two groups of species. Using similar 519 methodology, we have recently detected differential gene expression patterns between Multicellular  563 interrogans perRA mutant. Therefore, although important, the H89N alone could not explain the higher 564 gene repression by PerRA in P1-species. Additional divergence in PerRA protein sequence between P1+ 565 and P1-species can be identified, including at positions 88 and 105, (Fig 6 and S1 Fig). PerRAs from 566 P1-species possess charged amino acids at the position 88 and 105 while those of P1+ species possess 567 polar or hydrophobic amino acids. We can infer that such differences might also participate in the 568 difference of PerRA activity between P1+ and P1-species. Thorough biochemical characterization of the 569 PerRAs from the two P1 subgroups will be necessary to understand the exact role of these different 570 residues and to verify the hypotheses mentioned above.

571
Catalase has been shown to be required for Leptospira virulence [24]. KatE gene is present in all 572 species from the P1 subclade but this study suggests that a reduced repression of katE resulting in a 573 higher catalase activity may have emerged as an evolutionary advantage for highly virulent species.
574 Together with a higher catalase activity, P1 + species have a reduced expression of a cluster encoding 575 the TonB-dependent energy transduction system. TonB-dependent transporters are often involved in the 576 uptake of ferric siderophores, among other molecules. Accumulation of iron upon peroxide stress 577 worsens oxidative damage because it favors the production of hydroxyl radicals through the Fenton 578 reaction. Lowering iron uptake together with increasing peroxide breakdown by catalase would allow 579 P1+ species to be better equipped to resist the oxidative stress encountered when infecting a host. Using