Analysis of non-coding RNAs in Methylorubrum extorquens reveals a novel small RNA specific to Methylobacteriaceae

Methylorubrum extorquens metabolizes methanol, a cheap raw material that can be derived from waste. It is a facultative methylotroph, making it a model organism for studying the metabolism of one-carbon compounds. Despite considerable interest in exploiting this bacterium as a biotechnological tool in a methanol-based bioeconomy, little is known about its non-coding small RNAs (sRNAs). Small RNAs play well-documented, essential roles in post-transcriptional regulation in Escherichia coli and have important functions in many other bacteria, including Alphaproteobacteria such as Agrobacterium tumefaciens. M. extorquens is expected to contain many sRNAs, especially since it encodes Hfq, a chaperone protein that is important for the interaction between sRNAs and their targets and critical for the stabilization of the sRNAs themselves. Few sRNAs are annotated in the genome of this alphaproteobacterium, and none had been validated experimentally. In this study, the previously annotated sRNAs ffh, CC2171 and BjrC1505 were confirmed by Northern blot, validating the expression of sRNAs in M. extorquens. Moreover, analysis of RNA-sequencing data yielded a considerable list of potential sRNAs. Candidates selected after bioinformatic analysis were tested by Northern blot, revealing a novel sRNA specific to Methylobacteriaceae, sRNA Met2624. Its expression patterns and genomic context were analyzed. This research is the first experimental validation of sRNAs in M. extorquens and paves the way for further sRNA discoveries.

determine interesting noncoding regions from RNA-seq data such as sRNA-Detect [ 23 ].
Small RNAs are identified based on the assumption that, over a given length, reads show little variation in coverage while exceeding a minimal depth of coverage. We experimentally validated several sRNAs previously annotated in M. extorquens. Moreover, analysis of the RNA-seq data revealed numerous putative sRNAs in M. extorquens. We also confirmed one of them and evaluated it experimentally, thereby describing a novel sRNA specific to Methylobacteriaceae.
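The coverage assumption described above can be illustrated with a minimal sketch. This is not the sRNA-Detect implementation: the function name, the greedy scan and all thresholds (`min_depth`, `max_ratio`, `min_len`, `max_len`) are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def detect_candidates(
    depth: Sequence[int],
    min_depth: int = 10,      # hypothetical minimal depth of coverage (must be >= 1)
    max_ratio: float = 2.0,   # hypothetical bound on max/min depth within a region
    min_len: int = 20,        # hypothetical length bounds, in nucleotides
    max_len: int = 250,
) -> List[Tuple[int, int]]:
    """Greedy scan for regions with sufficient and roughly uniform coverage.

    Returns half-open (start, end) intervals, 0-based, over `depth`.
    """
    candidates = []
    start = None
    lo = hi = 0
    # A trailing zero acts as a sentinel that closes a region reaching the end.
    for pos, d in enumerate(list(depth) + [0]):
        if start is not None:
            new_lo, new_hi = min(lo, d), max(hi, d)
            if d >= min_depth and new_hi <= max_ratio * new_lo:
                lo, hi = new_lo, new_hi  # region stays uniform; extend it
                continue
            # Region ends just before `pos`; keep it if its length is in range.
            if min_len <= pos - start <= max_len:
                candidates.append((start, pos))
            start = None
        if d >= min_depth:
            start, lo, hi = pos, d, d  # open a new region at this position
    return candidates
```

A sharp change in depth closes one region and opens another, which is one way a single long transcript could be reported as several shorter candidates.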

Annotated sRNAs and intergenic regions size distribution
To get a sense of the likelihood of discovering sRNAs in M. extorquens, we first looked at the number of annotated sRNAs with an E-value lower than 0.0005 within the genomes of Alphaproteobacteria (Figure 1). All sRNAs mentioned in this study had an E-value lower than 0.0005. This information was extracted from the RiboGap database (version 2) [24], since it facilitates the examination of intergenic regions (IGRs) from prokaryotes.
Information about sRNAs in RiboGap comes from the Rfam database [25] and is limited to the available covariance models used for homology searches and annotations. A total of 87 distinct sRNAs are annotated in Alphaproteobacteria, spread across all orders of this bacterial class (Figure 1, A). The order with the highest number of distinct annotated sRNAs (66 sRNAs) is Hyphomicrobiales, which includes M. extorquens (Figure 1, A). Numerous sRNAs are predicted in different genera of the order Hyphomicrobiales (Figure 1, B). Eight distinct potential sRNAs [26][27][28][29][30][31][32][33] are annotated in the genomes of all available strains of the genus Methylorubrum, but they have never been confirmed under laboratory conditions (Table 1). This is still very few compared to the 103 distinct sRNAs annotated in the genome of Escherichia coli alone (Figure 1, B), a model organism of the Gammaproteobacteria. All Hyphomicrobiales bacteria represented in Figure 1, B also encode the chaperone protein Hfq.
leading to more sRNA predictions in RiboGap than in Rfam (Figure 1, C-D).
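The counting step described above (distinct sRNAs per order, restricted to annotations with an E-value below 0.0005) can be sketched as follows. The record layout is an assumption and does not reflect the actual RiboGap schema; any records passed in are supplied by the caller.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 0.0005  # threshold stated in the text


def distinct_srnas_per_order(
    records: Iterable[Tuple[str, str, float]],
    cutoff: float = E_VALUE_CUTOFF,
) -> Dict[str, int]:
    """Count distinct sRNA families per taxonomic order.

    Each record is (order, family_name, e_value); this tuple layout is a
    hypothetical export format, not the RiboGap schema.
    """
    families = defaultdict(set)
    for order, family, e_value in records:
        if e_value < cutoff:  # strict "lower than", as in the text
            families[order].add(family)
    return {order: len(names) for order, names in families.items()}
```

Counting through a set means that the same family annotated in several genomes of one order contributes only once, matching the notion of "distinct" sRNAs.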
The size distribution of IGRs in the range of 50 to 1000 nucleotides follows a similar pattern in both proteobacteria. In E. coli, approximately 65% of the annotated sRNAs are concentrated in intergenic gaps of 50 to 400 nucleotides.
Remarkably, only two trans-regulatory elements are annotated in this potentially sRNA-rich size range in M. extorquens, reinforcing the idea that more remain to be discovered.
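As a rough illustration of how an IGR size distribution such as the one discussed above could be derived from gene coordinates, consider the sketch below. It assumes 1-based inclusive coordinates on a single replicon, ignores strand, and does not handle the gap that wraps around a circular chromosome; these are simplifications, not the RiboGap procedure.

```python
from typing import Iterable, List, Tuple


def intergenic_lengths(genes: Iterable[Tuple[int, int]]) -> List[int]:
    """Lengths of the gaps between consecutive (possibly overlapping) genes.

    `genes` holds (start, end) pairs, 1-based and inclusive.
    """
    lengths = []
    prev_end = None
    for start, end in sorted(genes):
        if prev_end is not None and start > prev_end + 1:
            lengths.append(start - prev_end - 1)
        # Track the furthest end seen so nested or overlapping genes add no gap.
        prev_end = end if prev_end is None else max(prev_end, end)
    return lengths


def fraction_in_range(lengths: List[int], lo: int = 50, hi: int = 400) -> float:
    """Fraction of IGRs whose length lies in [lo, hi] nucleotides."""
    if not lengths:
        return 0.0
    return sum(lo <= n <= hi for n in lengths) / len(lengths)
```

For example, `fraction_in_range(intergenic_lengths(genes))` gives the share of IGRs in the 50 to 400 nucleotide window for a list of gene coordinates.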

Expression of annotated sRNAs
We first wanted to confirm the expression of some of the annotated sRNAs in M. extorquens by Northern blot analysis of bacteria grown with 1% methanol. Three of the annotated sRNAs in M. extorquens AM1 were selected for experimental validation (ffh, CC2171, BjrC1505) (Table 1). As positive controls for the Northern blots, probes for 5S RNA, transfer-messenger RNA (tmRNA) and the leucine tRNA were created as well, all of which are expected to be highly transcribed (Table 1). These also served as a size guideline for the predicted sRNAs.
Hybridization was observed for all RNAs used as positive controls (5S RNA, tRNA-leu and tmRNA), confirming the proper transfer of the extracted RNA onto the nitrocellulose membrane (Supplementary material, Figure S1). Bands were also detected for all three sRNAs that were annotated in the genome of M. extorquens (ffh, CC2171, BjrC1505), validating for the first time the presence of sRNAs in this biotechnologically relevant bacterium (Figure 2, A). All hybridization experiments were done in triplicate (data not shown).

Prediction of sRNA Candidates with sRNA-Detect
Beyond these few, now confirmed, sRNAs, we were interested in discovering potential novel sRNAs. For this, we analyzed the transcriptome with sRNA-Detect [23]. The data came from a transcriptomic study conducted as part of another research project (unpublished data). Briefly, M. extorquens strain ATCC55366 was genetically engineered to allow the accumulation of the tricarboxylic acid (TCA) cycle metabolite succinic acid using a sdhA gap20::145 phaC::KmR triple mutant [36]. To investigate the impact of these mutations at the transcriptional level, RNA-seq data were acquired for the WT strain, for the mutant at pH 6.5 and for the mutant without pH control. We used the RNA-seq data from these three conditions (each in triplicate) for the sRNA-Detect analysis, but without further focus on mutant versus WT strains, unless otherwise mentioned in the text.
Inspection of the transcriptome using sRNA-Detect resulted in a list of 10,267 detected candidates from all three conditions. Some of these were detected in more than one growth condition, with approximately 3,500 potential sRNAs for each condition. The list includes multiple hits for a single ncRNA (e.g., 15 candidates are found within the 23S rRNA sequence). Sequences with a length of 50 to 250 nucleotides, as predicted by sRNA-Detect, within the main chromosome (NC_012808.1) for the WT strain were kept for further analysis (2,079 candidates).
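The selection criteria stated above (main chromosome NC_012808.1, predicted length of 50 to 250 nucleotides) can be expressed as a small filter. The `Candidate` record and its field names are hypothetical and do not correspond to the actual sRNA-Detect output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one predicted small transcript."""
    replicon: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    score: float

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def keep_for_analysis(
    candidates: Iterable[Candidate],
    replicon: str = "NC_012808.1",  # main chromosome accession given in the text
    min_len: int = 50,
    max_len: int = 250,
) -> List[Candidate]:
    """Keep candidates on the chosen replicon whose length is within bounds."""
    return [
        c for c in candidates
        if c.replicon == replicon and min_len <= c.length <= max_len
    ]
```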

Annotated RNAs among sRNA-Detect Candidates
Candidates were first inspected for the presence of already annotated RNAs. The list of all annotated RNAs within the genome of M. extorquens AM1 was obtained with RiboGap [24]. Amongst our list of presumptive regulatory elements, ribosomal RNAs (rRNAs), sRNAs, transfer RNAs (tRNAs) and cis-regulatory elements were found (Table 2). Most importantly, we recovered six of the seven distinct sRNAs that are annotated in the genome of M. extorquens AM1 (all except 5_ureB_sRNA), supporting sRNA-Detect as a reliable tool for detecting sRNAs. Table S1). The value provided by sRNA-Detect represents the average read depth coverage, which is the sum of the reads mapped to each nucleotide of a small transcript divided by the length of that transcript. It can therefore be interpreted as the level of expression of that RNA region. A cut-off of 1000 was deemed acceptable since the mean score of all annotated RNAs was higher than this value (Figure 3).
Red dots with a black outline are sRNAs that were previously annotated, whereas red dots without an outline are novel sRNAs specific to this study. Information is divided into positive strand (left panel) and negative strand (right panel).
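The average read depth score defined above amounts to a simple mean over per-nucleotide depths. The sketch below restates that arithmetic; the function names are hypothetical, and treating a score exactly equal to the cut-off as passing is an assumption, since the text does not specify it.

```python
from typing import Sequence


def average_read_depth(per_base_depth: Sequence[int]) -> float:
    """Sum of reads mapped to each nucleotide divided by transcript length."""
    if not per_base_depth:
        raise ValueError("transcript must cover at least one nucleotide")
    return sum(per_base_depth) / len(per_base_depth)


def passes_cutoff(per_base_depth: Sequence[int], cutoff: float = 1000.0) -> bool:
    """True if the score reaches the cut-off (inclusive; an assumption)."""
    return average_read_depth(per_base_depth) >= cutoff
```

For instance, a 4-nucleotide region with depths 800, 1200, 1000 and 1000 has a score of 4000 / 4 = 1000.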
Transcripts for sRNAs 1153 and 2624 were detected by Northern blot analysis (Figure 4).
Hybridization of the probe for candidate 2624 was observed in triplicate with methanol or succinic acid as the carbon source (data not shown). The band intensity for candidate 1153 was very weak, but the band could also be detected on two other membranes, albeit with an even weaker signal. When their migration profiles are compared with RNAs of known size on the same membrane, both sRNA2624 and sRNA1153 migrate between 5S RNA (115 nt) and tmRNA (378 nt), with sRNA2624 much closer to tmRNA. Further analysis focuses on sRNA2624, but a potential secondary structure was still determined for Met1153 based on the IGR containing its sequence.
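Size estimation from migration, as described above, is commonly done by assuming that log10(size) varies linearly with migration distance between markers of known size. The sketch below applies that general approximation; it is not the procedure reported by the authors, and any migration distances supplied to it are the caller's own measurements.

```python
import math
from typing import Sequence, Tuple


def estimate_size(distance: float, markers: Sequence[Tuple[float, float]]) -> float:
    """Estimate transcript size (nt) from its migration distance.

    `markers` holds (migration_distance, size_nt) pairs. A least-squares line
    is fitted to log10(size) versus distance; with two markers it passes
    exactly through both.
    """
    if len(markers) < 2:
        raise ValueError("at least two size markers are required")
    xs = [d for d, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("markers must have distinct migration distances")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (slope * distance + intercept)
```

With tmRNA (378 nt) and 5S RNA (115 nt) as markers, a band migrating slightly further than tmRNA would be assigned a size somewhat below 378 nt.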

Conserved genomic context of the candidate sRNA2624 in Methylobacteriaceae
The candidate sRNA2624 seems to be constitutively expressed throughout the growth of M. extorquens (Supplementary material, Figure S2). As controls, probes for the 5S RNA, leucine tRNA and tmRNA were also hybridized on the same membrane. By comparing their migration on the gel with that of the candidate sRNA2624, we estimate its size at approximately 300 nucleotides (Supplementary material, Figure S1). According to the sRNA-Detect prediction, sRNA2624 is only 105 nucleotides long and is expressed from the negative strand. However, this bioinformatic method is known to miss some nucleotides at the 5´ and 3´ ends, so the transcript is probably longer, as suggested by the hybridization results (Figure 4)
to corroborate the idea that it is a functional RNA. The same multiple alignment file submitted to RNAcode was provided to the RNAz program, yielding an "RNA-class probability" of 0.94, which indicates that it is most likely a functional RNA (Supplementary material, Figure S4). To make this prediction, RNAz considers the structure conservation index (SCI) and the thermodynamic stability (negative z-score). Because Met2624 was only found in the family Methylobacteriaceae, sequence conservation is relatively high, which limits covariation. Despite this conservation, some covariations and compatible mutations could be observed within the predicted conserved structure for Met2624 (Figure 5). A potential secondary structure was also determined in the same manner for Met1153, based on sequences containing the sRNA candidate (Supplementary material, Figure S5). Taken individually, none of the indicated covarying base pairs is considered statistically significant according to R-scape [44]. With an alternative Stockholm alignment, the base pair R-Y from stem III covaried significantly according to R-scape (Supplementary material, Figure S6).
None of the covarying base pairs were statistically significant when evaluated with the R-scape tool for these predictions [44]. However, a base pair from stem III covaried significantly according to R-scape for the structure of Met2624 when an alternative Stockholm alignment was used (Supplementary Material, Figure S5). Moreover, stems II and III are still predicted to form in this alternate conformation, even though the start and end positions of the aligned sequences varied, further supporting our predicted secondary structures.
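The distinction drawn above between covariation and compatible mutations can be made concrete with a small classifier for one pair of alignment columns. This is a descriptive labelling only, using the usual convention that a covariation changes both positions while preserving pairing and a compatible mutation changes one; it is not the statistical test performed by R-scape, and the function and label names are hypothetical.

```python
from typing import Sequence

# Watson-Crick and G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(alignment: Sequence[str], i: int, j: int) -> str:
    """Label the proposed base pair between 0-based columns i and j.

    Returns "no data", "inconsistent", "conserved", "covarying" or "compatible".
    Sequences with a gap at either column are ignored; T is read as U.
    """
    pairs = set()
    for seq in alignment:
        a = seq[i].upper().replace("T", "U")
        b = seq[j].upper().replace("T", "U")
        if a in "-." or b in "-.":
            continue
        pairs.add((a, b))
    if not pairs:
        return "no data"
    if any(p not in CANONICAL for p in pairs):
        return "inconsistent"  # at least one sequence cannot form the pair
    if len(pairs) == 1:
        return "conserved"
    # Covariation: two observed pairs that differ at both positions.
    if any(p[0] != q[0] and p[1] != q[1] for p in pairs for q in pairs):
        return "covarying"
    return "compatible"  # only one side changes, e.g. G-C versus G-U
```

High sequence conservation within a single family tends to yield mostly "conserved" columns, which is consistent with the limited covariation noted in the text.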

Prediction of Promoter and Terminator for Met2624
The IGR from AM1 containing Met2624 was analyzed for the presence of a promoter and a terminator (Figure 6). (Figure 6, B). No Rho-independent terminator was predicted by the tool RNIE in the IGR where Met2624 is found [47]. Met2624 regulation is therefore likely independent of the chaperone protein Hfq, since the poly-U tail of a Rho-independent terminator is typically an important Hfq binding site [48]. Hfq can otherwise bind AU-rich regions, which are also absent from Met2624. However, two Rho-dependent terminators (RDTs) were identified within the sequence using the tool RhoTermPredict [49] (Figure 6, A-B). The program assigns scores between 6 and 15 to potential terminators, with higher values indicating a greater probability. RDT I and II had scores of 6 and 11, respectively.
To narrow down the location of a potential promoter and a terminator for Met2624, various probes towards the 5´ and 3´ were designed to delimit the sRNA experimentally

RNA-sequencing data
A DASGIP® parallel bioreactor system (Eppendorf) equipped with 1.5 L reactor vessels was used to grow the M. extorquens wild-type strain ATCC55366 at pH 6.5 and its isogenic sdhA gap20::145 phaC::KmR triple mutant at pH 6.5 and without pH control [36], each in biological triplicate, for a total of nine fermentation runs. Precultures were prepared as follows: two 3 L baffled Erlenmeyer flasks containing 400 mL of CHOI  Figure S7). At every time point, 1 mL of culture was centrifuged at 5,000 rpm at 4 °C for 10 minutes and the supernatant was discarded. The bacterial pellet was stored at -80 °C before RNA extraction.

RNA Extraction
The Whatman® filter paper and a nitrocellulose membrane were cut to the length and width of the polyacrylamide gel. Another Whatman® filter paper was cut to the same width, but long enough to touch the buffer when placed on a support. They were all pre-soaked in 10X SSC buffer for 30 minutes prior to assembly. The Whatman® filter papers, the nitrocellulose membrane and the polyacrylamide gel were then stacked one on top of the other.
The RNA was left to transfer from the polyacrylamide gel to the nitrocellulose membrane overnight. The next morning, the membrane was dried for a few minutes. To fix the RNA onto the membrane, shortwave UV light was used (UV Stratalinker 2400, Stratagene). The membrane was stained with a methylene blue solution (0.02% methylene blue and 0.3 M sodium acetate, pH 5.5) for 10 minutes with agitation to verify proper transfer of the RNA.
The membrane was rinsed with distilled water for at least one hour. As the excess coloration was washed from the membrane, the bands corresponding to the highly abundant transferred RNA were revealed (data not shown).

Hybridization of probes corresponding to candidate sRNAs
Radiolabelled DNA probes were purified on a 6% denaturing gel (8 M urea PAGE, polyacrylamide gel electrophoresis). 2X loading dye and 1X TBE were used as described before. The gel was exposed to a phosphor imaging screen for 5 minutes before being scanned with a Typhoon™ FLA9500 (GE Healthcare Life Sciences). The bands corresponding to the probes were cut out of the gel and stored at -20 °C for future work. The intensity of radioactive bands was quantified with ImageJ.
The nitrocellulose membrane with the transferred RNA was pre-incubated with 15 mL
To ensure that the membranes were clean, they were exposed to phosphor imaging screens as before. If radioactivity was still present, the last washing step was repeated. The membrane was stored in Saran wrap between uses.

Data availability statement
Data are available on request from the authors.