A survey of cis regulatory non-coding RNA involved in bacterial virulence

The study of bacterial pathogenesis is important for finding new drug targets to treat bacterial infections. Pathogenic bacteria, including opportunists, express numerous so-called virulence genes to escape the host's natural defenses and immune system. Regulation of virulence genes is often required for bacteria to infect their host. Such regulation can be achieved by cis-regulatory RNAs, such as metabolite-binding riboswitches or thermoregulators. Despite the hundreds of RNA families annotated as cis-regulatory, there are relatively few described examples of non-coding RNAs (ncRNAs) in bacterial 5′-untranslated regions (5′-UTRs) that regulate downstream virulence genes. To reassess the potential roles of such regulatory elements in bacterial pathogenesis, we collected genes important for virulence from different databases and evaluated the presence of ncRNAs in their UTRs, both to highlight the potential role of this type of gene regulation in virulence and to gain insight into some of the physical and chemical triggers of virulence.

Tight regulation of virulence factors (VFs) improves the ability of pathogens to infect their host (Caldelari et al., 2013). Rapid regulation is key to successful pathogenicity, as expected in an environment as changeable as a host, especially when the host reacts to the invasion (Fris and Murphy, 2016). Regulation by RNA has been shown to be more effective than regulation by proteins in some contexts (Gripenland et al., 2010). Non-coding RNAs (ncRNAs) are a heterogeneous group of RNAs that do not code for proteins but instead directly enact a function, often related to gene control. Regulation carried out by ncRNAs can affect one or several genes during transcription or translation (Eddy, 2001). The ncRNAs can be divided into two major groups: cis-regulatory and non-cis-regulatory (trans) RNAs.

Cis intergenic 5′-UTRs containing ncRNAs (484,136 sequences) were extracted with their corresponding coding sequences (CDSs) using RiboGap, i.e. all prokaryotic ncRNAs found by cmsearch with Rfam's covariance models, as well as a few more RNAs (Supplementary Data). BLASTp was then used to determine homology between genes downstream of ncRNAs and the list of VFs (9019 genes). Perl scripts (Supplementary Data) were used to analyze the BLASTp results. To avoid retaining genes that share common domains but are not orthologous, BLASTp was run with a minimum coverage of 98% per high-scoring pair (HSP). The BLASTp results were then filtered to keep hits with at least 60% identity. Only cis-regulatory RNAs on the same strand as the downstream gene were taken into account, except for tRNAs (see below).
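The two homology thresholds described above (at least 98% coverage per HSP and at least 60% identity) can be illustrated with a minimal Python sketch. This is not the Perl pipeline used in the study. It assumes a tab-separated BLASTp table whose columns follow the order given in `COLUMNS`; the function names and the three example hits are hypothetical and serve only as an illustration.

```python
import io

# Assumed column order of the tab-separated BLASTp output (an assumption;
# adjust it to the -outfmt string actually used).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "qcovs", "qcovhsp",
    "evalue", "bitscore", "salltitles",
]


def parse_hits(handle):
    """Yield one dict per non-empty, non-comment line of the table."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        yield dict(zip(COLUMNS, line.split("\t")))


def keep_hit(hit, min_identity=60.0, min_hsp_coverage=98.0):
    """True when a hit passes both thresholds stated in the text."""
    return (float(hit["pident"]) >= min_identity
            and float(hit["qcovhsp"]) >= min_hsp_coverage)


# Hypothetical hits: geneB fails on identity, geneC fails on HSP coverage.
example = io.StringIO(
    "geneA\tVF001\t75.2\t300\t74\t0\t1\t300\t1\t300\t100\t100\t1e-80\t250\thypothetical toxin\n"
    "geneB\tVF002\t45.0\t300\t165\t0\t1\t300\t1\t300\t100\t100\t1e-20\t90\thypothetical adhesin\n"
    "geneC\tVF003\t80.0\t150\t30\t0\t1\t150\t1\t150\t50\t50\t1e-30\t120\thypothetical protease\n"
)
kept = [hit["qseqid"] for hit in parse_hits(example) if keep_hit(hit)]
print(kept)
```

Applying the coverage filter before the identity filter, or both at once as here, gives the same set of hits; the strand check mentioned in the text would be a further filter on the ncRNA annotation rather than on the BLASTp table.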

tRNA searches
tRNAs were searched separately. The distance between each tRNA and the start codon of the VF was also taken into consideration. The search was carried out for the same virulence genes as described above. Because all genomes harbor many tRNAs, numerous genes are expected to have tRNAs upstream of their coding sequences just by chance, so samples of genes (three replicates of 100 randomly chosen genes) were also used to put the results in context. To evaluate the presence of pseudo-tRNAs and to obtain information on tRNA identity, tRNAscan-SE (Chan and Lowe, 2019) was used instead of the RiboGap annotations, but RiboGap was used to fetch all the UTRs.
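The random background described above (three replicates of 100 randomly chosen genes) can be sketched in Python as follows. This is only an illustration under stated assumptions: the gene identifiers and the set of genes with an upstream tRNA are toy data, the function names are hypothetical, and Python's `random` module stands in for whatever tool is actually used to draw the samples.

```python
import random


def sample_replicates(gene_ids, n_genes=100, n_replicates=3, seed=0):
    """Draw n_replicates independent samples of n_genes without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return [rng.sample(gene_ids, n_genes) for _ in range(n_replicates)]


def fraction_with_trna(sample, genes_with_trna):
    """Fraction of sampled genes that have a tRNA predicted upstream."""
    return sum(1 for gene in sample if gene in genes_with_trna) / len(sample)


# Toy data (hypothetical): 4000 genes, 80 of which have an upstream tRNA.
all_genes = [f"gene_{i:04d}" for i in range(4000)]
genes_with_trna = set(all_genes[:80])

replicates = sample_replicates(all_genes)
background = [fraction_with_trna(rep, genes_with_trna) for rep in replicates]
print(background)
```

The fractions obtained for the random replicates give a rough expectation of how often a tRNA lies upstream of a gene by chance, against which the fraction observed for virulence genes can be compared.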

Northern blot of co-transcribed tRNAs
Three tRNAs were identified upstream of the "Elongation Factor Tu" (Ef-Tu) gene in Neisseria. To determine whether this gene is transcribed alone or co-transcribed with the upstream tRNAs, we selected Neisseria gonorrhoeae, Neisseria meningitidis, Neisseria sicca and Neisseria elongata. Oligonucleotides complementary to each of the three tRNAs upstream of the Ef-Tu CDS and to the CDS itself were ordered from IDT to probe the membrane (Table S1). Similarly, an oligonucleotide complementary to the tmRNA of the four Neisseria species was ordered from IDT and used as a control.

Northern blots were performed as previously described (Perreault et al., 2011). In brief, total RNA of Neisseria gonorrhoeae, Neisseria meningitidis MC58_NMB0124, Neisseria sicca and Neisseria elongata was migrated on a 6% polyacrylamide gel and then transferred onto a nitrocellulose membrane (Amersham Hybond™ N+ from GE Healthcare). The oligonucleotides were labeled at their 5′ end using 5 pmol of oligonucleotide, 2 µL of [γ-32P]ATP, 1 µL of 10 U/µL T4 polynucleotide kinase and PNK buffer (NEB) in 20 µL, then incubated at 37 °C for 1 h. The labeled products were then purified on a denaturing 6% polyacrylamide gel. The labeled oligonucleotides were incubated with the membrane for 24 hours at 42 °C in a rotating oven in hybridization buffer containing 5X SSC prepared from 20X SSC (175.3 g NaCl, 88.2 g sodium citrate in 1 L, pH 7.0). The next day, the membranes were washed twice with 2X SSC, 1% SDS and 0.2X SSC, 0.1% SDS. Membranes were then exposed overnight on a phosphorimaging plate. The plate was scanned with a Typhoon FLA9500.
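As a quick check of the buffer arithmetic, the short Python sketch below computes the molarities implied by the 20X stock recipe and the stock volume needed for each working dilution. The molar masses are assumptions (58.44 g/mol for NaCl and 294.10 g/mol for trisodium citrate dihydrate; the text does not state which citrate salt was used), and the 1 L final volume is purely illustrative.

```python
NACL_G_PER_MOL = 58.44      # assumed molar mass of NaCl
CITRATE_G_PER_MOL = 294.10  # assumed: trisodium citrate dihydrate


def molarity(grams_per_litre, grams_per_mol):
    """Molar concentration of a solute dissolved in 1 L."""
    return grams_per_litre / grams_per_mol


def stock_volume_ml(stock_x, final_x, final_volume_ml):
    """Stock volume satisfying stock_x * V_stock = final_x * V_final."""
    if final_x > stock_x:
        raise ValueError("final concentration cannot exceed the stock")
    return final_x * final_volume_ml / stock_x


# Molarities implied by the 20X recipe quoted in the text.
nacl_molar = molarity(175.3, NACL_G_PER_MOL)
citrate_molar = molarity(88.2, CITRATE_G_PER_MOL)
print(f"20X stock: {nacl_molar:.2f} M NaCl, {citrate_molar:.2f} M citrate")

# Volume of 20X stock per litre of 5X, 2X and 0.2X working solution.
for final_x in (5, 2, 0.2):
    volume = stock_volume_ml(20, final_x, 1000)
    print(f"{final_x}X: {volume:.0f} mL of 20X stock per litre")
```

Under these assumptions the recipe corresponds to roughly 3 M NaCl and 0.3 M citrate; SDS is added separately and is not covered by this calculation.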

3.1 cis-regulatory RNA distribution upstream of virulence factors

We decided not to limit our search to the genes listed as VFs in the PATRIC, Victors and VFDB databases. These databases focus on experimentally validated VF genes and on orthologs that meet stringent criteria (including similar genomic context), but they can omit some orthologs in other pathogens. We even extended our survey of cis-regulatory ncRNAs to non-pathogens because regulation of a gene in [...] mechanisms. Therefore, extending searches to VF orthologs may provide hints on the regulation of these genes in pathogens.

We found 95,943 genes associated with virulence (and orthologs) downstream of ncRNAs (Table 1 and Table S2). From these RNAs, we selected cis-regulatory RNAs (as annotated under "type" in Rfam) based on the criteria described in Materials and Methods to produce compiled lists of RNA families already known to be cis-regulators (Table 1 and Tables S3, S4 and S5). This list includes 16 riboswitches for metabolites, 14 thermoregulatory RNAs and 4 cation-associated regulators (Table 1), as well as many additional ncRNAs such as the T-boxes, leucine-operon leader or PyrD leader (1,473 hits in the latter case) (Table S5). The purine riboswitch was the most common riboswitch among cis-regulatory RNAs (777 instances), whereas FMN and NiCo were each found only once, and many riboswitches, such as THF, guanidine (I, II and III) and fluoride riboswitches, were not observed with any genes associated with virulence. Among the cation-associated ncRNAs, the most common RNA family, Listeria sRNA rli51, is associated with a zinc metalloproteinase, followed by the Mg2+ riboswitch and ykoK. Thermoregulators are clearly important ncRNAs for pathogenesis, as 14 families of such ncRNAs are found upstream of ~9,000 instances of VFs (or VF homologs). The most abundant thermoregulator identified is the HtrA 5′ UTR, followed by cspA, the TrxA 5′ UTR and the shuA/chuA 5′ UTR. While 14 families may appear few compared with the 35 cis-thermoregulator RNA families, most of the other families have relatively few representatives or are found only in taxa never associated with infections, such as cyanobacteria.

tRNA upstream of virulence factors
We observed tRNAs upstream of hundreds of genes (Table S6). Interestingly, several VFs have pseudo-tRNAs in their UTR, such as clpP, which encodes a protease, and numerous genes have tRNA sequences in the opposite orientation in their 5′-UTR, like the rnr-encoded ribonuclease in many Betaproteobacteria species (Table S7).

In numerous cases, tRNAs are found close enough to the downstream gene to suggest co-transcription (Tables S6 and S8). We tried to determine whether Ef-Tu was co-transcribed with the three tRNAs observed upstream of its CDS in several Neisseria strains. The tRNA closest to Ef-Tu is only 46 bases from its start codon, leaving little room for a promoter. The Northern blot results reveal that the three tRNAs (Tyr, Gly, Thr) are co-transcribed (Supplementary Figures S1 and S2), but co-transcription with Ef-Tu was not apparent. Nevertheless, other instances of co-transcription are very likely in the 56 cases where tRNAs are fewer than 40 bases away from the downstream VF (or VF homolog). In some cases, the tRNAs even overlap the annotated 5′ portion of the coding sequence, such as in several strains of Helicobacter pylori for a gene encoding an outer membrane protein. Given the precedent of tRNA(Sec) overlapping the coding sequence of the selB gene close to its 3′ end (Mukai, 2021), tRNAs that overlap the 5′ end to enact gene control are easy to envision.
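The distance criterion discussed above can be sketched in Python as follows. This is a minimal illustration, not the method used in the study: the `Feature` class, the function names, the category labels and all coordinates are hypothetical, and the only value taken from the text is the 40-base cutoff. Coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    start: int   # 1-based, inclusive, lowest coordinate
    end: int     # 1-based, inclusive, highest coordinate
    strand: str  # "+" or "-"


def gap_to_start_codon(trna, cds):
    """Bases between the tRNA and the first base of the start codon.

    Returns a negative value when the tRNA overlaps the CDS, and None
    when the tRNA is on the other strand or is not upstream of the CDS.
    """
    if trna.strand != cds.strand:
        return None
    if cds.strand == "+":
        if trna.start >= cds.start:
            return None
        return cds.start - trna.end - 1
    # On the minus strand the start codon sits at the highest coordinate.
    if trna.end <= cds.end:
        return None
    return trna.start - cds.end - 1


def classify(gap, cutoff=40):
    """Label a gap as 'overlap', 'close' (below the cutoff) or 'far'."""
    if gap is None:
        return "not upstream on same strand"
    if gap < 0:
        return "overlap"
    return "close" if gap < cutoff else "far"


# Hypothetical coordinates giving a 46-base gap on the plus strand.
trna = Feature(100, 175, "+")
cds = Feature(222, 1400, "+")
print(gap_to_start_codon(trna, cds), classify(gap_to_start_codon(trna, cds)))
```

A single cutoff is a simplification: as the Ef-Tu case shows, a tRNA 46 bases from the start codon may still share a transcript with the downstream gene, so the labels are best read as a first-pass ranking rather than a prediction.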

Transcription Terminators
We evaluated existing "Rho-independent transcription terminators" (RiTTs, Table S9) and "Rho-dependent transcription terminators" (RTTs, [...]

[...] ZTP-sensing has previously been associated with Zn homeostasis (Nies, 2019). Moreover, there is one hit for an Mg2+ ATPase C transporter found in association with the CspA thermoregulator (Table S2 and [...]

[...] regulate several VFs (Tamayo, 2019), but only cyclic-di-GMP-I was found in our searches. This is likely due to the relative stringency (60% identity over 98% of the sequence length) of our homology searches. Indeed, reducing our threshold to 40% identity revealed many instances of cyclic-di-GMP-II riboswitches upstream of genes encoding components of a type II secretion system, which has homologs annotated as VFs. Moreover, many other genes not recognized as VFs in PATRIC (and thus absent from [...]

[...] experimentally as a TRAP target. We thus avoided this type of motif in our compilation to avoid spurious annotations as much as possible.

One of the ncRNAs that was searched independently was tRNA. Many VFs on the list have tRNAs very close to their coding sequence (less than 30 nt). While we could not show by Northern blot that Ef-Tu is indeed co-transcribed with these tRNAs, co-transcription is still likely given the short distance of only 46 bases separating them from the AUG. The rate of processing of the tRNAs might be too fast to permit detection of a transcript including the tRNAs together with Ef-Tu. In fact, co-transcription was previously observed in E. coli (Miyajima et al., 1981) and the proximity of tRNAs to Ef-Tu was already noticed in several species (Cousineau et al., 1992), which we find is generalized to numerous bacteria (Table S6), including Proteobacteria, Bacteroidetes and Firmicutes. Presumed co-transcription of Ef-Tu with these tRNAs could suggest potential regulation by tRNA or merely co-regulation due to the use of the same promoter.
This is further supported by the absence of predicted promoters between the tRNA closest to Ef-Tu and the start codon, as well as by the presence of a few promoters upstream of the three tRNA sequences, promoters which would thus also be responsible for [...]

[...] implying that these tRNAs may have a critical role in horizontal gene transfer and the evolution of virulence. Yet the role of tRNAs upstream of VFs, if any, remains to be elucidated in most cases.

Bacteria respond to signals coming from the host and its immune system. Such a signal can be simple and yet represent an acute change in the bacterial environment, like the change in temperature when entering a host, to which bacteria need to respond very quickly. Regulation by ncRNAs is very fast and less energetically demanding than regulation by proteins. Discovering more ncRNAs involved in VF regulation helps to better understand the means by which bacteria escape the host immune system, and provides potential targets to overcome bacterial pathogenicity as a promising avenue for treatment.

Evaluating co-transcription of tRNAs with Ef-Tu
Sequences of Neisseria species evaluated
Table S1: The primer sequences used for Northern blot
Figure S1
Additional methodology details
Table S7: Virulence genes with pseudo-tRNA predicted in their 5'UTR
Table S8: Overall tRNAscan results
Figure S3. Distribution and orientation of tRNA (and pseudo tRNA) sequences upstream of genes.
Transcription terminators

    ##########################################
    $accession_locus_tag_product = $acc_num . "|_|" . $acc_desc . "|_|" . $locus_tag . "|_|" . $product . "|_|"
        . $strt_cds . "|_|" . $end_cds . "|_|" . $gene_strd . "|_|"
        . $strt_igr . "|_|" . $end_igr . "|_|" . $igr_srtd . "|_|"
        . $rfam_id . "|_|" . $rfam_name . "|_|" . $rfam_desc . "|_|" . $rfam_type . "|_|"
        . $strt_rna . "|_|" . $end_rna .
    ## for blastp ####################################################
    ## this takes 98 percent coverage and not the first hit
    system("blastp -query $filename -db $database -evalue 1 -out $out -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send qcovs qcovhsp evalue bitscore salltitles' -qcov_hsp_perc 98 -max_target_seqs 500 -threshold 11");
    print $filename, " with ", $filename, " has finished with blast: \n";
    } catch {
        print $_, "\n";
    };
    } else {
        print "sequences_query.$filename has a problem or is empty\n";
    }
    print "Normal end of the script \n";
    exit;

In summary, while very weak bands corresponding to incompletely processed transcripts were observed with the tRNA-Thr and tRNA-Tyr probes, no convincing band could be detected at the size expected for the Ef-Tu mRNA with one or more tRNAs still part of the same transcript. We also tried amplifying such transcripts by RT-PCR, without success. These results indicate that if Ef-Tu is co-transcribed with tRNAs (as suggested by this genetic arrangement), the transcript is quickly processed into its individual components (as is also suggested by the overwhelming intensity of the bands corresponding to processed tRNAs compared with the bands putatively comprising two or three tRNAs). Nevertheless, this does not preclude the possibility that Ef-Tu uses one of the tRNA promoters for its transcription.
(Tables S2, S3, S4, S5 and S6 are in separate Excel files.)

tRNAs upstream of VFs (and homologs)
(Table S6 is in a separate Excel file.)

Additional methodology details

Profile of tRNAs upstream of virulence genes: We evaluated the distribution of tRNAs upstream of the virulence genes, which was compared with that of 100 genes randomly taken from the genome of Escherichia coli strain K12 MG1655. The 100 genes were randomly selected with Excel's RAND() function in three replicates. For each replicate, the 5′ untranslated regions (5′UTRs) were extracted from the RiboGap database (http://ribogap.iaf.inrs.ca) using MySQL queries (see supplementary material, previous section).
Method:
- 5′UTR sequences: extracted from the RiboGap database (see the MySQL scripts in the additional material).
- Analysis with tRNAscan-SE (Chan and Lowe, 2019) (http://lowelab.ucsc.edu/tRNAscan-SE/): tRNAscan-SE is a tool for detecting tRNA genes and predicting their function from genomic sequences. We used a locally installed version built from source to make our predictions. The configuration parameters were left at their defaults (see additional material).

Table S7: Virulence genes with pseudo-tRNA predicted in their 5'UTR