Bidirectional cooperation between Ubtf1 and SL1 determines RNA Polymerase I promoter recognition in cell and is negatively affected in the UBTF-E210K neuroregression syndrome

Transcription of the ∼200 mouse and human ribosomal RNA genes (rDNA) by RNA Polymerase I (RPI/PolR1) accounts for 80% of total cellular RNA, around 35% of all nuclear RNA synthesis, and determines the cytoplasmic ribosome complement. It is therefore a major factor controlling cell growth, and its malfunction has been implicated in hypertrophic and developmental disorders. Activation of each rDNA repeat requires nucleosome replacement by the architectural multi-HMGbox factor UBTF to create a 15 kbp nucleosome free region (NFR). Formation of this NFR is also essential for recruitment of the TBP-TAFI factor SL1 and for preinitiation complex (PIC) formation at the gene and enhancer-associated promoters of the rDNA. However, these promoters show little sequence commonality and neither UBTF nor SL1 displays significant DNA sequence binding specificity, making what drives PIC formation a mystery. Here we show that cooperation between SL1 and the longer UBTF1 splice variant generates the specificity required for rDNA promoter recognition in cell. We find that conditional deletion of the Taf1b subunit of SL1 causes a striking depletion of UBTF at both rDNA promoters but not elsewhere across the rDNA. We also find that while both UBTF1 and UBTF2 variants bind throughout the rDNA NFR, only UBTF1 is present with SL1 at the promoters. The data strongly suggest an induced-fit model of RPI promoter recognition in which UBTF1 plays an architectural role. Interestingly, a recurrent UBTF-E210K mutation, the cause of a pediatric neurodegeneration syndrome, provides indirect support for this model. E210K knock-in cells show enhanced levels of the UBTF1 splice variant and a concomitant increase in active rDNA copies. In contrast, they also display reduced rDNA transcription and promoter recruitment of SL1. We suggest the underlying cause of the UBTF-E210K syndrome is therefore a reduction in cooperative UBTF1-SL1 promoter recruitment that may be partially compensated by enhanced rDNA activation.


INTRODUCTION
The ribosomal RNA (rRNA) genes encode the catalytic and structural RNAs of the ribosome as a [...]

The observation that Ubtf recruitment depended on SL1 specifically at the rDNA promoters but not elsewhere across the rDNA repeat suggested that the Ubtf variants might be important in this specificity. Mammals express two splice variants of Ubtf. Both Ubtf1 and Ubtf2 encompass six tandem HMGbox DNA binding domain homologies but differ in HMGbox2, a central segment of which is deleted in Ubtf2 (Figure 4A). While MEFs express both forms of Ubtf, ESCs naturally express exclusively Ubtf1 (Figure S5C). Promoter recruitment of Ubtf1 in these cells was found to be strongly suppressed on depletion of functional SL1 (Figure S5A [...] Ubtf were present at both promoters in the Ubtf1 profile but were absent in the Ubtf2 profile. This differential promoter binding was most evident in Ubtf1-Ubtf2 difference maps (Figure 4B [...]

The combined data showed that formation of the RPI preinitiation complex in cell involves a cooperation between Ubtf1 and SL1 and, most surprisingly, that this same cooperation occurred at both the Spacer and 47S promoters despite their unrelated base sequences. Since the only difference between Ubtf1 and Ubtf2 lies in the structure of HMGbox2, this domain must play a key role in Ubtf-SL1 cooperation and RPI promoter recognition.

An HMGbox2 mutation linked to neuroregression potentially affects Ubtf interactions.
An E>K mutation at residue 210 in HMGbox2 of Ubtf was recently shown to be the cause of a recurrent human pediatric neuroregression syndrome (6, 11-13). The key role of HMGbox2 revealed by our study suggested that this mutation might affect the formation of the RPI preinitiation complex in vivo and possibly explain the origin of this syndrome. Unfortunately, the structure of HMGbox2 has not yet been determined experimentally. However, despite a high degree of primary sequence variability, HMGboxes display very similar tertiary structures and DNA contacts, making them accessible to molecular modelling (summarized in Figure S9A). Modelling of Ubtf-HMGbox2 revealed a typical HMGB saddle structure with basic residues K198, K200 and K211 lining the DNA binding underside (Figure S9B). Significantly, the sidechain of residue K211, a highly conserved minor groove contact in other HMGboxes, was predicted to be correctly oriented towards the DNA. In contrast, the sidechain of the immediately adjacent E210 residue was predicted to point away from the DNA and lie on the seat of the HMGbox saddle. Furthermore, this predicted sidechain position was unaffected by the E210K mutation (Figure S9C). We concluded that the E210K mutation was extremely unlikely to affect HMGbox2 interactions with the DNA. However, the mutation would create a significant change in the electrostatic surface potential of the seat of HMGbox2 (Figure S9D), suggesting that it could well affect interactions with other factors such as SL1.

The UBTF HMGbox2 E210K mutation suppresses 47S rRNA synthesis in a MEF model.
Given that the sequences of human and mouse UBTF are 99% identical, we took advantage of a recently generated Ubtf E210K mouse knock-in model. Mice homozygous for the E210K mutation are viable but exhibit behavioral abnormalities that worsen with increasing postnatal age (details will be described elsewhere). Ubtf E210K/E210K MEFs were isolated from these mice and found to proliferate somewhat more slowly than MEFs from isogenic wild type littermates, with doubling times of 35 h and 31 h, respectively (Figure 5A). Metabolic RNA labelling also revealed a >40% lower rate of de novo 47S pre-rRNA synthesis in the mutant as compared to the wild type MEFs (Figure 5B); however, no overt rRNA processing defects were detected (Figure S10A). The mutant MEFs also contained 30% less total cellular RNA (~80% of which is rRNA) than wild type MEFs (Figure 5C). Thus, the E210K mutation in Ubtf significantly reduced the capacity of MEFs to synthesize rRNA and to assemble ribosomes, explaining their reduced proliferation rate.

The E210K mutation also enhances Ubtf1 levels and the fraction of active rDNA repeats
Unexpectedly, the Ubtf E210K/E210K MEFs displayed a significant increase in the fraction of activated rDNA copies determined by PAC (Figure 5D and E), and this corresponded to an equally significant increase in the expression of the Ubtf1 variant at both the protein and mRNA levels (Figure 5F and G). A similar bias towards Ubtf1 expression was also observed in brain tissue of mutant mice (Figure 5H and I). This suggested the interesting possibility that the enhanced levels of Ubtf1 in the mutant MEFs revealed an inherent feedback mechanism regulating splicing. In this way the cell might control the fraction of active rDNA copies and hence potentially also rRNA synthesis. However, it will first be necessary to determine whether or not the E210K mutation directly affects usage of the adjacent splice junctions (see Figure S10B). In either scenario, the increase in active rDNA copies would normally be expected to enhance rRNA synthesis and cell growth in the mutant MEFs. Since this was clearly not the case, the E210K mutant MEFs displaying reduced rRNA synthesis, accumulation and proliferation (Figure 5A to C), we sought other origins for these effects.

The E210K mutation reduces RPI loading and SL1 and Ubtf recruitment to the rDNA promoters
ChIP-qPCR analyses revealed that RPI loading across the rDNA was reduced by >40% in the Ubtf E210K/E210K mutant MEFs, explaining the observed reduction in pre-rRNA synthesis in these cells (compare RPI loadings in Figure 6A with de novo rRNA synthesis levels in Figure 5B). Recruitment of Taf1B (SL1) and Ubtf to both Spacer and 47S rDNA promoters was somewhat reduced in the mutant MEFs, though less than RPI loadings (Figure 6B). Thus, the E210K Ubtf mutation most probably reduced pre-initiation complex formation, consistent with it affecting Ubtf-SL1 cooperation. The higher resolution of DChIP-Seq further showed that occupancy of Ubtf at both 47S and Spacer promoters was selectively reduced by the E210K mutation (Figure 6C), again consistent with reduced Ubtf-SL1 cooperativity. The reduction of Ubtf at the rDNA promoters was particularly apparent in difference maps between wild type and E210K mutant MEFs (Figure 6D and S11). The reduction in Ubtf was especially strong at the Spacer promoter and corresponded with a similar reduction in Taf1B occupancy and in RPI recruitment (Figure 6D and E). The data strongly suggested that the E210K mutation causes a small but significant reduction in the ability of Ubtf to cooperate with SL1 in the formation of the RPI preinitiation complex, and together point to a reduction in the efficiency of RPI transcription initiation as the [...]

[...] basal factors Ubtf and Rrn3, Taf1b is an essential factor in mouse. Conditional deletion of taf1b in MEF and mES cell culture was also found to arrest rDNA transcription and to cause severe disruption of nucleolar structure characteristic of nucleolar stress (24, 28). Depletion of Taf1b also prevented promoter recruitment of the Taf1c and TBP subunits of SL1 and hence PIC formation at both the 47S pre-rRNA and the enhancer-associated Spacer rDNA promoters. Quite unexpectedly, this also led to a loss of Ubtf at both these promoters, though not elsewhere across the rDNA NFR. ChIP-qPCR and high resolution DChIP-Seq showed that the loss of Ubtf from the promoters was proportional to the loss of SL1, strongly arguing that binding of these two basal factors was cooperative. Conversely, we had previously shown that in cell loss of Ubtf eliminated SL1 from the rDNA promoters (24, 26), consistent with the cooperative recruitment of these factors. Data from early cell-free studies had suggested two possible scenarios for RPI preinitiation complex formation, either SL1 recruitment depended on pre-[...] resolve this contradiction by showing that in cells Ubtf and SL1 binding at the rDNA promoters is strongly interdependent, neither factor being recruited in the absence of the other. The lack of Ubtf binding at the promoters in the absence of SL1 was particularly surprising, especially since Ubtf remained bound throughout the rest of the rDNA NFR and even at immediately promoter-adjacent sites. Thus, in the absence of SL1, the RPI promoters, rather than being preferred sites of Ubtf binding as usually assumed, are on the contrary sites of low Ubtf affinity lying within the NFR continuum of higher affinity sites.

Our data further revealed the key importance of the Ubtf1 variant in the recruitment of SL1 to the rDNA promoters. Mouse and human cells express varying levels of the Ubtf1 and Ubtf2 splice variants that differ by a 37 a.a. deletion in HMGbox2 of Ubtf2 (Figure 4). By mapping these variants across the rDNA we found that Ubtf1 was recruited to the rDNA promoters at least four times more often than Ubtf2, though the data were also consistent with the exclusive recruitment of Ubtf1 at the promoters. In contrast, Ubtf1 and Ubtf2 bound indistinguishably elsewhere across the rDNA. Since only Ubtf1 is present in mESCs, deletion of taf1b in these cells also clearly demonstrated that promoter recruitment of Ubtf1 depended on SL1 (Figures S4 and S5). Thus, formation of the RPI preinitiation complex is driven predominantly if not exclusively by a cooperation between SL1 and Ubtf1. This provides the first mechanistic explanation for why Ubtf1 is absolutely required for rDNA activity in vivo (31).
Recruitment of Ubtf1 and SL1 was found to be cooperative not only at the major 47S rDNA promoter but also at the enhancer-associated Spacer promoter. Since these promoters display little DNA sequence homology, this raised the question of what in fact defines an RPI promoter and how it is recognized. Our data clearly show that promoter recognition involves the cooperative recruitment of Ubtf1 and SL1. Previous data showed that Ubtf interacts with SL1 solely via its highly acidic C-terminal tail, an ~80 a.a. domain containing 65% Asp/Glu residues (32). However, this domain is not essential for cell-free transcription (33) and is in any case present in both Ubtf variants. So, while it might play some role in bringing SL1 to the promoters, it cannot explain their selective binding of Ubtf1. Co-immunoprecipitation also failed to detect any specific interaction between SL1 and one or other of the Ubtf variants (data not shown). Thus, it seems unlikely that the rDNA promoters are recognized by a pre-formed SL1-Ubtf1 pre-initiation complex. Rather, we suggest that promoter recognition involves the transient imposition of a specific DNA conformation by Ubtf1 that is in turn locked into place by SL1 (Figure 7). There is significant precedent for such a mechanism, since the HMGboxes of Ubtf were shown to induce in-phase bending and looping of a DNA substrate. Indeed, it was suggested that such a looping could position UCE and Core promoter elements (Figure 1A [...]

Indeed, our study of the UBTF-E210K recurrent pediatric neuroregression syndrome suggested that this was quite probably the case. Molecular modelling showed that while this E210K mutation in HMGbox2 of UBTF was very unlikely to affect interactions with DNA, it might well affect interactions with other proteins such as SL1. We found that introduction of the homozygous Ubtf-E210K mutation in MEFs significantly reduced rDNA transcription rates, reduced total cellular RNA accumulation and slowed cell proliferation. In apparent contradiction to these effects, the E210K mutation enhanced expression of Ubtf1 both in mutant MEFs and mouse tissues, and this led to an increase in the fraction of active rDNA copies, possibly as an attempt to compensate for reduced rDNA transcription. [...] recruitment of SL1 and Ubtf to the rDNA promoters. Thus, it appeared that the primary effect of the E210K-Ubtf mutation was to limit PIC formation on the rDNA. This further emphasized the central importance of a functional cooperation between Ubtf and SL1 in determining rDNA activity. It further suggested that the UBTF-E210K neurodegeneration syndrome was caused by a subtle defect in PIC formation on the rDNA.

In summary, our study identifies the parameters that determine RNA polymerase I promoter recognition and preinitiation complex formation in vivo. We reveal the central importance of a cooperative interaction between the RPI-specific TBP complex SL1 and the Ubtf1 splice variant in promoter recognition and propose an induced-fit model for pre-initiation complex formation that explains the functional differences between the Ubtf splice variants in terms of their abilities to induce specific conformational changes in the rDNA promoter sequences. Our data further suggest that the UBTF-E210K recurrent neurodegeneration syndrome is caused by a subtle reduction in UBTF-SL1 cooperativity that leads to reduced rDNA transcription.

Primary antibodies for Immunofluorescence, ChIP and Western blotting.
Rabbit polyclonal antibodies against mouse Ubtf, RPI large subunit (RPA194/Polr1A), and Taf1b were generated in the laboratory and have been previously described (24); anti-Taf1c was a gift from I. Grummt. All other antibodies were obtained commercially: anti-Fibrillarin (#LS-C155047, LSBio), [...]

Heterozygous taf1b ∆/wt mice were inter-crossed and embryos isolated, imaged and genotyped from pregnant females at E3.5, 6.5, 7.5, 8.5 and E9.5 as described in (24, 26). DNA from E3.5 embryos was amplified using the REPLI-g Mini kit (QIAGEN). Individual embryos were genotyped by PCR using the same primers as for mouse lines (Figure S1A [...]

[...] procedures using an HRP conjugated secondary antibody and Immobilon chemiluminescence substrate (Millipore-Sigma). Membranes were imaged on an Amersham Imager 600 (Cytiva) and Ubtf1/2 ratios were determined from lane scans using ImageJ (40) and Gaussian curve fit using MagicPlot Pro (Magicplot Systems). Relative Ubtf1/2 mRNA levels were determined by PCR on total cDNA using primers bracketing the spliced sequences (5'TGCCAAGAAGTCGGACATCC and 5'TCCGCACAGTACAGGGAGTA). Products were fractionated by electrophoresis on a 1.5 or 2% agarose EtBr-stained gel, photographed using the G:BOX acquisition system (Syngene) and Ubtf1/2 mRNA ratios determined using ImageJ and Gaussian curve fitting as for proteins.

Determination of rRNA synthesis rate
The rate of rRNA synthesis was determined by metabolic labelling immediately before cell harvesting. 10 µCi [3H]-uridine (PerkinElmer) was added per 1 ml of medium and cell cultures incubated for a further 30 min to 3 h as indicated. RNA was recovered with 1 ml Trizol (Invitrogen) according to the manufacturer's protocol and resuspended in formamide (Invitrogen). One microgram of RNA was loaded onto a 1% formaldehyde/MOPS Buffer gel (41, 42) or a 1% formaldehyde/TT Buffer gel (43). The EtBr-stained gels were photographed using the G:BOX acquisition system (Syngene), irradiated in a UV cross-linker (Hoefer) for 5 min at maximum energy, and transferred to a Biodyne B membrane (Pall). The membrane was UV cross-linked at 70 J/cm2, washed in water, air dried and exposed to a Phosphor BAS-IP TR 2025 E Tritium Screen (Cytiva). The screen was then analyzed using a Typhoon imager (Cytiva) and quantified using the ImageQuant TL image analysis software.

Psoralen crosslinking accessibility and Southern blotting.
The psoralen crosslinking accessibility assay and Southern blotting were performed on cells grown in [...] gene EcoRI fragment (pMr100) (44). The ratio of "active" to "inactive" genes was estimated by analyzing the intensity profile of low and high mobility bands revealed by phospho-imaging on an Amersham Typhoon (Cytiva) using a Gaussian peak fit generated with MagicPlotPro (MagicPlot Systems LLC).

[...] Cells from two ubf wt/wt MEF clones (#3, #4) and three ubf E210K/E210K MEF clones (#1, #2, #3) were continuously cultured for more than a week prior to assay. Cells were plated at ~500 per well in 96-well plates and cultured for six days. At each timepoint, duplicate wells were treated with Hoechst 33342 (Invitrogen, Thermo Fisher Scientific) for 45 min. Images were acquired using Cytation5 (Cell Imaging Multi-Mode Reader by BioTek) and cell counts for each clone were determined using the Gen5 software.
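
As an illustration of how a doubling time can be derived from cell counts of this kind, the following minimal Python sketch fits a straight line to log2(count) versus time and takes the reciprocal of the slope. It assumes simple exponential growth over the whole time course. The function name, the 24 h sampling interval and the synthetic counts are hypothetical and are not taken from the study, and this is not necessarily the procedure used to obtain the reported values.

```python
import math


def doubling_time_hours(times_h, counts):
    """Estimate doubling time (h) from a least-squares fit of log2(count) vs time.

    Assumes exponential growth, N(t) = N0 * 2**(t / Td), so that
    log2 N(t) is linear in t with slope 1 / Td.
    """
    if len(times_h) != len(counts) or len(times_h) < 2:
        raise ValueError("need at least two matched (time, count) points")
    logs = [math.log2(c) for c in counts]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # doublings per hour
    if slope <= 0:
        raise ValueError("counts do not increase over time")
    return 1.0 / slope


# Hypothetical time course: 500 cells plated, counted daily for six days.
# The counts are synthetic, generated with an assumed 31 h doubling time.
times_h = [0, 24, 48, 72, 96, 120, 144]
synthetic_counts = [500 * 2 ** (t / 31.0) for t in times_h]

print(round(doubling_time_hours(times_h, synthetic_counts), 1))
```

With real data the counts from duplicate wells would typically be averaged per timepoint before fitting, and early or late timepoints that depart from exponential growth would be excluded.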

Total RNA Extraction and Quantification
Cells were trypsinized, counted and total RNA was recovered from 3 x 10^6 cells using 1 ml of Trizol (Invitrogen, Thermo Fisher Scientific) according to the manufacturer's protocol. RNA yields were determined using Qubit RNA BR (Invitrogen, Thermo Fisher Scientific).

SUPPORTING RESULTS
The Taf1B gene is essential for mouse development beyond early blastula.
Mouse lines carrying a targeted "Knockout First" insertion in the gene for Taf1B (Taf68) were established and these were crossed to remove the β-Gal and Neo cassette insertion, generating lines carrying lox sites flanking exons 4 and 5 of taf1b (Figure S1A and B). Subsequent recombination of these lox sites inactivated the taf1b gene (Figure S1C); see Supplementary Materials and Methods for more detail. Mice heterozygous for the taf1b ∆ allele were found to be both viable and fertile and the null allele was propagated at near Mendelian frequency (Table S1). However, no taf1b ∆/∆ homozygous offspring (pups) were identified and genotyping of embryos detected no taf1b ∆/∆ homozygotes at stages 6.5 and later. In contrast, four taf1b ∆/∆ embryos were detected at 3.5 dpc, though only one of these displayed a recognizable blastula morphology (Figure S1D and E). It was concluded that taf1b was essential for mouse development beyond blastula but that maternal Taf1B mRNA or protein, or simply ribosome availability, may have been sufficient to support development beyond the morula stage. This is fully consistent with the previous data for inactivation of the TBP gene tbp/gtf2d (47), and suggests the interesting possibility that the effects of TBP loss on early development could in large part be due to inactivation of RPI transcription. In support of this possibility, the SL1 complex is known to be generally less abundant than the RPII/PolII TFIID complex (48) and so could be limiting for embryo growth. Further, inactivation of the genes for the RPI factors Ubtf (ubtf) and Rrn3/TIF1A (rrn3) arrests mouse development during early cleavage stages (24, 26). A similar argument could be made for inactivation of RPIII/PolIII transcription since loss of the Brf1 subunit of the TFIIIB complex also causes developmental arrest during early cleavage stages (49). We conclude that the maternal protein translation machinery is limiting in the cleavage embryo and must be replenished by zygotic expression to allow further development.

Stage (dpc)   Total   wt/wt       ∆/wt        ∆/∆
3.5           20      3 (15%)     13 (65%)    4 (20%)
6.5           23      12 (52%)    11 (48%)    0 (0%)
7.5           15      6 (40%)     9 (60%)     0 (0%)
8.5           35      8 (23%)     27 (77%)    0 (0%)
9.5           16      4 (25%)     12 (75%)    0 (0%)
Pups          112     35 (31%)    77 (69%)    0 (0%)

Table S1. Numbers and genotypes of embryos and pups derived from matings of Taf1b+/∆ mice.

[...] before and after taf1b inactivation are shown (dark blue line), together with the best Gaussian peak fits to these profiles (dashed red line). In the case of Taf1b the profile closely followed a single Gaussian peak from which both the position and relative occupancy were determined. Since Ubtf was present not only at the promoter but also over the adjacent regions, curve fits were made using three Gaussian peaks, and the central one was used to estimate relative occupancy.
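
The three-Gaussian quantification described in this legend can be sketched in Python as follows. The example assumes that the curve fit has already been performed and that each fitted peak is summarized by an amplitude, a centre and a standard deviation. Relative occupancy is taken here as the area of the central peak after inactivation divided by its area before, which is one plausible reading of the legend rather than the documented procedure. All names and parameter values are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class GaussianPeak(NamedTuple):
    amplitude: float  # peak height (arbitrary occupancy units)
    center: float     # position relative to the promoter (bp)
    sigma: float      # standard deviation (bp)


def peak_area(peak: GaussianPeak) -> float:
    """Analytic area under a Gaussian: amplitude * sigma * sqrt(2 * pi)."""
    return peak.amplitude * abs(peak.sigma) * math.sqrt(2.0 * math.pi)


def profile(x: float, peaks: Sequence[GaussianPeak]) -> float:
    """Value of the fitted profile at x, i.e. the sum of all Gaussian peaks."""
    return sum(
        p.amplitude * math.exp(-0.5 * ((x - p.center) / p.sigma) ** 2)
        for p in peaks
    )


def central_peak(peaks: Sequence[GaussianPeak]) -> GaussianPeak:
    """Return the middle peak of a three-Gaussian fit, ordered by position."""
    if len(peaks) != 3:
        raise ValueError("expected exactly three fitted peaks")
    return sorted(peaks, key=lambda p: p.center)[1]


def relative_occupancy(before: Sequence[GaussianPeak],
                       after: Sequence[GaussianPeak]) -> float:
    """Area of the central (promoter) peak after treatment relative to before."""
    return peak_area(central_peak(after)) / peak_area(central_peak(before))


# Hypothetical fit results: flanking peaks unchanged, promoter peak reduced.
before = [GaussianPeak(1.0, -150.0, 40.0),
          GaussianPeak(2.0, 0.0, 30.0),
          GaussianPeak(1.2, 150.0, 40.0)]
after = [GaussianPeak(1.0, -150.0, 40.0),
         GaussianPeak(0.5, 0.0, 30.0),
         GaussianPeak(1.2, 150.0, 40.0)]

print(relative_occupancy(before, after))
```

Using the area of only the central component separates the promoter-specific signal from the flanking Ubtf occupancy, which a simple peak-height comparison of the summed profile would not do.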