Polymorphic ribonucleoprotein folding as a basis for translational regulation

The widely occurring bacterial RNA chaperone Hfq is a key factor in the post-transcriptional control of hundreds of genes in Pseudomonas aeruginosa. How this broadly acting protein can meet the regulatory requirements of so many different genes remains puzzling. Here, we describe the structures of higher-order assemblies formed on control regions of different P. aeruginosa target mRNAs by Hfq and its partner protein Crc. Our results show that these assemblies have mRNA-specific quaternary architectures resulting from the combination of multivalent protein-protein interfaces and recognition of patterns in the RNA sequence. The structural polymorphism of the ribonucleoprotein assemblies enables selective translational repression of many different target mRNAs. This system suggests how highly complex regulatory pathways can evolve and be rewired with a simple economy of proteinogenic components.

Graphical Abstract

The RNA chaperone Hfq, in conjunction with the co-repressor Crc, forms higher-order assemblies on nascent mRNAs. These complexes affect the translation of hundreds of transcripts in the pathogen Pseudomonas aeruginosa. Assemblies with different quaternary structures result from the interactions of the proteins with sequence motifs and structural elements in different mRNA targets, as well as from a repertoire of protein-to-protein interfaces. In this way, the combination of RNA sequence and two proteins can generate the diversity required to regulate many genes. It is proposed that the multi-step assembly process is highly cooperative and most likely competes kinetically with translation initiation to silence the targeted transcripts.


Two major post-transcriptional systems control gene expression in the pathogenic bacterium Pseudomonas aeruginosa. One system depends on the CsrA-like Rsm proteins, which act as translational repressors and engage GGA motifs in the Shine-Dalgarno sequence of target mRNAs that are often exposed in loops of stem-loop structures ( Hfq-mediated translational repression forms the basis for a hierarchical control of carbon and nitrogen utilization by Pseudomonas spp., a mechanism referred to as carbon catabolite repression (CCR) (Rojo, 2010; Sonnleitner & Bläsi, 2014). CCR ensures that alternative nutrients are not utilized until the preferred carbon source, succinate, is depleted. The regulation is exerted through translational repression of genes affecting the uptake and metabolism of non-preferred nutrients (Sonnleitner & Bläsi, 2014). The best studied CCR-regulated gene is amiE, which encodes the enzyme aliphatic amidase that generates organic acids from short-chain aliphatic amides, thereby enabling Pseudomonas to utilize acetamide as a source of both carbon and nitrogen. When preferred carbon sources such as succinate are abundant, translation of amiE mRNA is suppressed through sequestration of the ribosome binding site by Hfq and the catabolite control protein Crc (Figure 1), which is followed by mRNA degradation (Sonnleitner & Bläsi, 2014). When the preferred carbon source is exhausted, CCR is alleviated by the regulatory sRNA CrcZ (Figure 1), which sequesters Hfq away from substrate mRNAs (Sonnleitner & Bläsi, 2014). The CrcZ levels are controlled by the alternative sigma factor RpoN (Sonnleitner et al., 2009; Abdou et al., 2011; Valentini et al., 2014) and the two-component system CbrA/B, which may be activated in response to the cellular energy status (Valentini et al., 2014).

Understanding the molecular basis of CCR has been advanced by structural insights into RNA recognition by Hfq. These have identified three different RNA binding surfaces on the Hfq hexamer: the proximal face, the distal face, and the circumferential rim (Figure 1B). The proximal face binds uridine tracts, which are enriched at the 3´ end of sRNAs, the distal face has sequence preference for ARN triplet motifs (where A is an adenosine, R a purine and N any base), and the rim has arginine-rich patches that can interact with UA-rich motifs of RNAs (Santiago-Frangos & Woodson, 2018). Cryo-EM structures of Hfq-Crc complexes on a short octadecameric segment derived from the 5´ untranslated region (5´-UTR) of amiE mRNA (Supplementary Figure 1A; amiE6ARN) Figure 1A and Supplementary Figure 4A, the amiE6ARN sequence present in the 5´-UTR of the amiE gene is followed by one cluster of 3 complete ARN motifs in the immediate coding region, the amiExARN motif, and two ARN triplet motifs further downstream. These downstream ARN motifs have been implicated in Hfq binding in vivo by proximity crosslinking of Hfq and DNA in nascent transcripts followed by Hfq-specific chromatin immunoprecipitation coupled with DNA sequencing (ChIP-seq) (Kambara et al., 2018). In contrast to amiE RNA, clusters of ARN motifs are predominantly found in the 5´-UTR of rbsB and estA mRNA, rather than in the immediate 5´ coding region, which is in accord with the reported ChIP-seq data (Kambara et al., 2018). To verify that higher-order assemblies form on different mRNA targets in vitro, electrophoretic mobility shift assays (EMSAs) were performed with the mRNA fragments amiE105 (nts -45 to +60), rbsB110 (nts -75 to +33) and estA118 (nts -85 to +33) (Supplementary Figure 1B).
For all three transcripts, a number of higher-order species were observed to form gradually with increasing Hfq concentrations (Supplementary Figure 1B

Sequential formation of higher-order Hfq-Crc assemblies from a core complex

To characterize the details of the oligomeric state and quaternary structure, the Hfq-Crc complex formed on amiE105 was analyzed by cryo-EM. For the grid preparation, a Hfq titration series was performed like that used for the EMSAs shown in Supplementary Figure 1B. Figure 2A shows a gallery of the key species observed in the titration series. Stable intermediates can be resolved on the grid, and these have been ordered in the gallery in a proposed assembly pathway. Although the pathway for the formation of the higher-order complexes may be heterogeneous, based on the observed species we propose a pathway in which, in the first step, a Hfq-Crc-Crc core is formed, whereby the amiE6ARN region is bound by the Hfq distal side with two Crc molecules recognizing and engaging the Hfq-RNA complex ( pockets on the Hfq distal side, while the N-bases are exposed, thereby presenting the RNA for interaction with Crc partner molecules. Interestingly, all three complexes constitute a modular assembly that consists of one (estA118 and rbsB110) or two (amiE105) copies of the same Hfq-Crc-Crc core unit ( Figure 3A, In the amiE105 and the rbsB110 repressive assemblies Crc forms homodimers, and three different self-dimerization interfaces were observed (labeled as 1, 2 and 3 in Figure 3E, F, and G), constituting a second recurring feature in the repressive assemblies. Since Crc is monomeric in solution at high µM concentrations, formation of these dimerisation interfaces must require that the Crc molecules are organized in repressive assemblies. Substitution of Glu193 with Arg is predicted to disrupt the Crc interfaces 2 and 3 in the amiE105 and rbsB110 assemblies due to electrostatic repulsion and to impair translation repression (Figure 3G). Indeed, the E193R substitution in Crc reduced the translation repression of the amiE+60::lacZ and rbsB+13::lacZ reporter genes to ΔCrc levels (Figure 3H).
Compensation of the repulsive Crc interface by the R230E substitution in Crc in turn restored translation repression for the same amiE and rbsB reporter genes (Figure 3H). Although the Crc-Crc interfaces 2 and 3 are absent in the estA118 complex, Crc residue E193 provides an alternative interaction with Hfq2 N28 (chain V) and Hfq2 R19 (chain V) that is anticipated to be weakened by the E193R substitution, which might account for the weakened repression observed for the estA+18::lacZ reporter gene as well (Figure 3H). The glutamate introduced by the R230E substitution in Crc can form interactions with Hfq2 N28 and R19 (chain V), and can restore the Hfq/Crc interface to compensate for the E193R substitution in the estA118 complex. The hydrogen bonding interactions of R90 and Y91 in interface 3 of the rbsB110 complex were tested with the R90E/Y91F double mutant, and found to have roughly a 2-fold effect on translational repression (Figure 3H). Interface 3 does not occur in either the amiE105 or estA118 complexes, where instead R90 and Y91 of Crc interact with the C-terminal tail of a Hfq protomer. The R90E/Y91F double mutation de-repressed translation of the amiE and estA reporter genes roughly 7-fold and 2-fold, respectively. In summary, these results show that the Crc interaction surfaces can be directed either to form self-complementary contacts that support Crc-Crc interactions or to contact Hfq, both of which stabilize the polymorphic higher-order repressive assemblies.

A third salient feature of the complexes is how the RNA is shared between adjacent Hfq molecules, where some Hfq distal sides present ARN motifs to the distal face or the rim of a neighbouring Hfq hexamer, rather than to Crc partner molecules.
In the Hfq-amiE105-Crc assembly, the second amiExARN motif in the amiE coding region is partially shared between the distal faces of Hfq1 and Hfq2, with nucleobases occupying alternating R-site pockets on both distal sides (Figure 4A and B, Supplementary Figure 4A). This sharing of amiExARN in the amiE coding region between Hfq molecules drives higher-order assembly formation and efficiently masks the ribosome binding site, rationalizing the observation that the downstream ARN cluster enhances translational repression of amiE in vivo (Supplementary Figure 1C and D). In the Hfq-estA118-Crc complex, Hfq3 presents the first longer 5´ ARN motif (Supplementary Figure 4B, 12-mer, four ARN triplet repeats) to the Hfq2 rim side (Figure 4F). The Hfq2 rim-side residues Arg19 and Arg66 form hydrogen bonds with the exposed N-site bases C-28 and U-25 of estA118 (counting backward from the AUG start codon, with A annotated as nucleotide 1). Indeed, disruption of the ARN pattern of this Hfq3 binding site in the estA sequence resulted in a decrease in translation repression of the corresponding reporter gene by an order of magnitude in vivo (Figure 4J, Hfq3mut). Such disruption would also abrogate binding of estA by the Hfq3 distal side. A similar yet more extensive Hfq-to-Hfq presentation of the RNA target is found in the Hfq-rbsB110-Crc assembly, where a short RNRN-motif in the rbsB110 coding region is presented by the Hfq3 distal side to the Hfq2 rim/proximal side (Figure 4I). In particular, Hfq2 residues Arg16, Lys17 and Arg19 form hydrogen bonds with the phosphate backbone and the C6 nucleobase of rbsB110 (counting from the AUG start codon, with A annotated as nucleotide 1). From these observations, it is apparent that completion of higher-order assembly enhances translational repression, and that RNA-mediated Hfq-Hfq oligomerization drives this process.
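
The ARN pattern and the AUG-relative numbering used above can be illustrated with a short script. This is a minimal sketch and not part of the analysis reported here: the function names, the toy sequence and the two-triplet threshold are hypothetical, the scan reports only the leftmost non-overlapping runs of consecutive ARN triplets, and the numbering assumes that the A of AUG is +1, the preceding base is -1, and there is no position 0.

```python
import re


def find_arn_runs(rna, min_triplets=2):
    """Return (start, end, n_triplets) for runs of consecutive ARN triplets.

    A = adenosine, R = purine (A or G), N = any base.
    start and end are 0-based string indices, end exclusive.
    Only the leftmost non-overlapping runs are reported (an assumption
    made for simplicity; other reading frames are not explored).
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    pattern = re.compile(r"(?:A[AG][ACGU]){%d,}" % min_triplets)
    return [(m.start(), m.end(), (m.end() - m.start()) // 3)
            for m in pattern.finditer(rna)]


def to_aug_position(index, aug_index):
    """Convert a 0-based index to AUG-relative numbering.

    Assumed convention: A of AUG is +1, the base before it is -1,
    and position 0 does not exist.
    """
    offset = index - aug_index
    return offset + 1 if offset >= 0 else offset


if __name__ == "__main__":
    toy = "CCUAAGAACAGACCUAUGGCC"  # hypothetical sequence, not a real 5' UTR
    aug = toy.index("AUG")
    for start, end, n in find_arn_runs(toy):
        first = to_aug_position(start, aug)
        last = to_aug_position(end - 1, aug)
        print(f"{n} ARN triplets from {first} to {last}: {toy[start:end]}")
```

With such a helper, the positions of ARN clusters in a UTR or coding region can be compared directly with residue labels such as C-28 or U-25, provided the same numbering convention is used.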

Finally, in all three complexes RNA duplex elements interact with the Hfq proximal sides in a sequence-independent manner (Figure 4A, C, E, G and H, Supplementary Figure 4). Although the local resolutions were not sufficient to resolve side chains, it is apparent that these interactions are between basic and polar residues in the proximal Hfq α-helix and the phosphate backbone of the RNA duplex structures (Figure 4C and E). In particular, the Hfq residues Lys3, Asn13, Arg16 and Lys17 are likely candidates for such interactions ( Figure 4C  assemblies, however, are all likely to fold following the recurring architectural principles described above.

The complexes are characterised by a common core sub-assembly, composed of one Hfq hexamer which presents an ARN-rich motif to two Crc molecules (Figure 3C). Based on the cryo-EM structures, we have elaborated rules that encode the architectural principles of translation repression assemblies. These are based on four recurring features of the complexes: i) the distal face of the Hfq hexamer engages ARN-rich repeats, and is the sequence specificity determining factor. Crc can then interact with these elements via a distinct basic patch on its surface; ii) the proximal side of Hfq binds secondary structure elements in the RNA targets; iii) higher-order folding is driven by sharing of RNA segments between Hfq protomers and is enabled by the hexameric ring organisation of Hfq, in which protomers rich

Discussion
in RNA binding patches provide repetitive RNA-interaction sites. Notably, the Hfq-Hfq and Hfq  Figure 1D).

The biological impact of these assemblies might depend on windows of opportunity arising during the synthesis of the transcript. Sequential assembly of the Hfq-Crc complexes is envisaged to occur on nascent transcripts as they emerge from the RNA polymerase, potentially coupled with RNA folding in analogy with other systems (Kambara et  structure presents a kinetic energy barrier that determines whether target recognition occurs before stable pairing to sRNA (Malecka & Woodson, 2021). Extending this principle, we envisage that critical kinetic steps also occur in the assembly of Hfq-Crc complexes on nascent transcripts. Indeed, removing the prominent hairpin structure that is recognised by the central Hfq in the estA118 repressive complex (Figure 4A and E, Hfq2) promotes translational repression in vivo, suggesting a somewhat antagonistic role for such hairpin structures in the assembly pathway. Another indication that assembly of repressive complexes might be coupled to the transcription machinery is the observation that the 3´ ends of transcripts that are repressed during CCR are diminished up to 10-fold in RNA sequencing analyses (Sonnleitner et al., 2018). One mechanism that could explain this observation is the recruitment of transcription termination factors during formation of the Hfq/Crc complexes, coupling translation repression of a transcript to termination of its transcription. This hypothesis awaits validation.

The ChIP-seq results described by Kambara et al. (2018) and our on-grid Hfq-Crc-amiE assembly pathway (Figure 2A) point towards a stepwise assembly scenario, where protein binding occurs as soon as the ARN motifs are synthesized during transcription.
However, given the high affinity of Hfq for RNA and the architectural principles described above, it is plausible that dynamic sampling of the RNA sequence and secondary structure elements by Hfq and/or Crc, and synergistic recruitment of multiple copies of these, forms the basis for RNP assembly. As such, co-transcriptional Hfq-Crc assembly on an RNA target might bear analogy to the synergistic co-transcriptional assembly of ribosomes on rRNA (Rodgers & Woodson, 2019). If so, Hfq-Crc assembly is not strictly sequential, but rather depends on the contextual state of the RNA binding sites and the presence of other copies of Hfq/Crc proteins. The difference here is that Hfq/Crc assembly is tuned temporally and the resulting translation repression complexes are transient in nature, subject to kinetic competition, as distinct from folding equilibrium complexes such as ribosomes and spliceosome components (Herzel et al., 2017; Rodgers & Woodson, 2019).

The response to environmental changes and stress, and the re-routing of metabolic pathways, demand systems of hierarchical control that form highly inter-connected networks. Such an intricate system is demanding for the cell and would require many specificity factors, i.e. protein components, to function properly. Here, we observe that specificity can be achieved with only two multifaceted protein factors and patterns in the