Cohesin-independent STAG proteins interact with RNA and localise to R-loops to promote complex loading

Most studies of cohesin function consider the Stromalin Antigen (STAG/SA) proteins as core complex members given their ubiquitous interaction with the cohesin ring. Here, we provide functional data to support the notion that the SA subunit is not a mere passenger in this structure, but instead plays a key role in localizing cohesin to sites of diverse biological processes and promotes loading of the complex at these sites. We show that in cells acutely depleted for RAD21, SA proteins remain bound to chromatin, cluster in 3D and interact with CTCF, as well as with a wide range of RNA binding proteins involved in multiple RNA processing mechanisms. Accordingly, SA proteins interact with RNA and are localised to R-loops, where they contribute to R-loop regulation. Our results place SA1 within R-loop domains upstream of the cohesin complex and reveal a role for SA1 in cohesin loading that is independent of NIPBL, the canonical cohesin loader. We propose that SA1 takes advantage of structural R-loop platforms to link cohesin loading and chromatin structure with diverse functions. Since SA proteins are pan-cancer targets, and R-loops play an increasingly recognised role in cancer biology, our results have important implications for the mechanistic understanding of SA proteins in cancer and disease.


INTRODUCTION
Cohesin complexes are master regulators of chromosome structure in interphase and mitosis. Accordingly, mutations of cohesin subunits lead to changes in cellular identity, both during development and in cancer [1-3]. A prevailing model is that cohesin contributes to cell identity changes in large part by dynamically regulating 3D genome organization and mediating communication between distal regulatory elements [4-10]. Under this model, molecular insight into how and when cohesin subunits become associated with chromatin and contribute to this function in vivo in human cells is still lacking.
Most studies of cohesin function consider the Stromalin Antigen (STAG/SA) proteins as core complex members given their ubiquitous interaction with the tripartite cohesin ring (composed of SMC1, SMC3 and SCC1/RAD21). The SA subunit is rarely considered for its roles independent of the cohesin ring, even though it is the subunit most commonly mutated across a wide spectrum of cancers [1,11,12].
SA proteins contribute to cohesin's association with DNA [13,14]. The yeast SA orthologue is critical for efficient association of cohesin with DNA and for its ATPase activation [13,14]. Separating interactions into SA-loader and cohesin ring-loader sub-complexes still impairs cohesin loading, indicating that SA functions as more than just a bridge protein [14]. Crystallisation studies reveal a striking similarity between SA and NIPBL (the canonical cohesin loader [15]), in that both are highly bent, HEAT-repeat-containing proteins [16,17]. Of note, NIPBL and SA interact and together wrap around both the cohesin ring and DNA to position and entrap DNA [18-20], implying a potential role for SA in the initial recruitment of cohesin to DNA alongside NIPBL. Further, SA proteins bridge the interaction between cohesin and CTCF [18,21,22], and also bridge interactions with specific nucleic acid structures in vitro [23,24].
Mammalian cells express multiple SA paralogs. SA1 binds to AT-rich telomeric sequences [23,24] and SA2 displays sequence-independent affinity for particular DNA structures commonly found at sites of repair, recombination, and replication [25]. We set out to investigate the nature of SA proteins and cohesin loading to DNA. We discovered independent functions of the SA proteins, providing critical insight into the role they play in their own right in directing cohesin's localization and loading to chromatin. In cells acutely depleted of RAD21, SA proteins remain associated with chromatin and CTCF, and are enriched at chromatin sites that cluster in 3D. Moreover, we identify numerous, diverse cohesin-independent SA1 interactors involved in RNA processing, ribosome biogenesis, and translation. Consistent with this, SA1 and SA2 interact with RNA and with non-canonical nucleic acid structures in the form of R-loops, where SA1 suppresses R-loop formation. Importantly, SA proteins are required for loading of cohesin to chromatin in cells deficient for NIPBL. Our results highlight a central role for SA proteins in cohesin biology. Moreover, the cohesin-independent interaction of SA proteins with RNA processing factors opens up a new understanding of how SA dysregulation can impact disease development, beyond the control of chromatin topology for gene expression regulation.
transcription, RNA processing, ribosome biogenesis, and translation (Fig. 2c, d). Within this group there are chromatin remodeling proteins (INO80 and SMARCAL1) and several transcriptional and epigenetic regulators such as JARID2 and TAF15. Similar to our ChIP and coIP results, this suggests that SA1 maintains interaction with proteins that localize with it in the presence of cohesin, albeit at different abundances.
RNA processing was the most enriched category in the SA1 ΔCoh PPI network (FDR=3.62×10^-39) and included proteins involved in RNA modification (YTHDC1, ADAR1, FTSJ3), mRNA stabilization and export (SYNCRIP, FMR1), and RNA splicing regulation (SRSF1, SON). We also found a significant enrichment for DNA and RNA helicases (FDR=3.54×10^-08) as well as RNA binding proteins (FDR=9.11×10^-11), among which were many HNRNP family members, including HNRNPU (also known as SAF-A). We also found a highly significant enrichment of proteins associated with ribosome biogenesis (FDR=2.20×10^-30), including both large and small subunit components, rRNA processing factors and components of the snoRNA pathway (FDR=4.39×10^-05). Finally, translation was significantly enriched as a biological process (p=1.64×10^-06), with several cytoplasmic translation regulators identified as SA1 ΔCoh interactors (DHX29, GCN1L1) (Fig. 2c, d, S2d, e). Among these is ESYT2, which is primarily found in the cytoplasm and contains an F/YXF motif (Fig. 2c, d). We validated 8 of the highest-ranking proteins within the enriched functional categories described above in EtOH- and IAA-treated RAD21 mAC cells (Fig. 2d). Importantly, the enrichment of these proteins with SA1 in the IAA condition suggests that SA may have a role in these processes independently of the core cohesin complex.
Comparison of the SA1 ΔCoh interactome with the SA1 interactome revealed that proteins involved in RNA processing (FDR=0.0298), ribosome biogenesis (FDR=0.0197), ribonucleoprotein complex biogenesis (FDR=0.0298) and rRNA processing (FDR=0.0409) were enriched with SA1 following IAA treatment compared to SA1 in the presence of RAD21 (Fig. 2c, dotted lines).
Overall, our results show that SA1 ΔCoh PPIs not only include transcriptional and epigenetic regulators, but are in fact predominantly enriched for proteins with roles in RNA processing and modification, ribosome biogenesis and translation. Thus, SA1 is involved in several biological processes and may facilitate aspects of cohesin regulation at a variety of functionally distinct locations.

SA proteins bind RNA independently of cohesin.
Since RNA binding and RNA processing were among the most enriched categories in the SA1 ΔCoh PPI network, we hypothesized that SA proteins may also bind RNA. We performed crosslinking and immunoprecipitation (CLIP) of SA proteins in untreated RAD21 mAC cells and found that both SA1 and SA2 directly bound RNA (Fig. 3a, b). This was evidenced by the detection of RNPs of the expected molecular weights with a smear of trimmed RNA that was stronger in the +UV and +PNK conditions, increased as the RNase I concentration was reduced, and was lost after siRNA-mediated SA knockdown (KD) (Fig. S3a-c). We repeated the experiment in EtOH- and IAA-treated RAD21 mAC cells to determine whether the SA subunits can bind RNA in the absence of cohesin. As before, RAD21 depletion reduced SA1 and SA2 levels (Fig. S3d), and the amount of crosslinked RNA remained proportional to the amount of residual SA1 and SA2 protein (Fig. 3c, S3e), demonstrating that cohesin is not required for the interaction of these proteins with RNA in cells. Thus, cohesin-independent SA proteins interact with a wide array of RNA binding proteins (RBPs) as well as with RNA itself.

SA proteins localise to endogenous R-loops in the absence of cohesin.
Proteins involved in RNA processing, such as splicing, modification and export, act as regulators of R-loops [42]. Furthermore, R-loops accumulate at sites of multiple biological processes, including transcription, DNA replication and DNA repair [42]. As many of these processes were enriched in the SA1 interactome, we reasoned that the diversity of biological processes represented in the SA1 ΔCoh PPI network may reflect a role for SA proteins in R-loop biology.
We performed a number of experiments to investigate the localization of SA proteins at endogenous R-loops. First, we found a correlation between global SA and R-loop levels.
We depleted endogenous R-loops by overexpressing ppyCAG-RNaseH-V5 in HCT116 cells. IF using the R-loop-specific antibody S9.6 revealed that nuclear S9.6 levels were significantly reduced in V5-expressing cells (38% of controls, p=0.04) and that mean SA1 signal was significantly reduced by 29% in the same cells (Fig. 3d). Furthermore, in RAD21 mAC cells treated with scramble control siRNAs or Smartpool (SP) siRNAs against AQR (a known suppressor of R-loops [43]), SA1 or SA2, S9.6 IF signal was significantly increased in siAQR and siSA1 but not siSA2 cells compared to the siScr control (mean S9.6 signal increased by 28%, p=0.0004; increased by 32%, p=3.90×10^-8; and reduced by 10%, p=0.17, respectively) (Fig. S3f). Although S9.6 IF signal was reduced in IAA-treated RAD21 mAC cells, the change was not significant by this method (Fig. S3g).
We also performed STORM imaging on EtOH- and IAA-treated RAD21 mAC cells to assess the nuclear distribution of SA1 relative to R-loops with and without RAD21. We measured the ratio of the SA1 signal inside and outside of the S9.6 signal mask. A ratio of 1 indicates a random distribution of SA1 with respect to S9.6 domains, while a ratio above 1 reflects enrichment within S9.6 domains. In EtOH conditions, we did not detect enrichment of SA1 localizations; in fact, SA1 was modestly depleted (mean ratio 0.93). However, upon IAA treatment, we observed a significant enrichment of SA1 localizations within S9.6 domains (mean ratio 1.24, p<0.0001) (Fig. 3e), strongly suggesting that SA1 proteins are localized within R-loop domains independently of cohesin.
In addition, we returned to our IP-MS experiment to analyse the enrichment of R-loop-associated proteins in our SA1 ΔCoh interactome.
We overlapped the proteins identified in two independent IP-MS experiments for R-loop interactors [44,45] to create a high-confidence 'R-loop interactome', and then used a hypergeometric distribution to determine the significance of this category in the SA1 ΔCoh interactome (Methods). Both the custom R-loop interactome and the proteins from the individual studies were highly over-enriched in the SA1 ΔCoh interactome (Fig. 3f). To test this association directly, we optimised a coIP method using the S9.6 antibody in RAD21 mAC cells (Fig. 3g, S3h). In agreement with published results, S9.6 precipitated the known R-loop regulators AQR and DHX9 (both helicases) and RNase H2 [43,44], as well as MCM3 and RNA Pol II (POLR2) [46]. Both SA1 and SA2 co-precipitated with S9.6, and treatment with RNase H (RNH) confirmed the specificity of the S9.6-SA interactions, since the reduction in R-loop signal was proportional to the observed reduction in coIP of SA1 by S9.6 (Fig. 3g, S3h, i).
Finally, we used a high-resolution, genome-wide method to detect R-loops in HCT116 cells. RAD21 mAC cells were treated with RNH (to confirm the specificity of our method) or with EtOH or IAA (to assess the impact of cohesin loss on R-loops), and subjected to DNA-RNA immunoprecipitation coupled with sequencing (DRIP-seq) using the S9.6 antibody. We combined these datasets with our ChIP-seq for SA proteins, RAD21 and CTCF in EtOH or IAA conditions to confirm the associations described above. We detected 50,338 RNH-sensitive R-loop sites that were also sensitive to acute degradation of RAD21, albeit not to the same extent as with RNH treatment (average S9.6 signal was reduced by 31.4% with RNH and by 16.8% with IAA) (Fig. 3h, i). Among the RNH-sensitive R-loop sites, we detected two classes of SA-R-loop association. A small proportion of R-loop sites directly overlapped with SA1/2, RAD21 and CTCF in control EtOH conditions.
These sites were enriched at genes, and both SA1 and SA2 read densities at them were sensitive to RAD21 loss (Fig. 3h, i, S3j, k). In contrast, a larger proportion of R-loops had adjacent SA signals (bound within 2 kb of the R-loop peak). Interestingly, these SA sites were enriched in repressed chromatin and were not sensitive to RAD21 loss; in fact, their read density increased compared to EtOH conditions (Fig. 3h, i, S3j, k), reminiscent of the enrichment observed by STORM imaging (Fig. 3e).
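The over-representation test mentioned above (R-loop interactome versus SA1 ΔCoh interactome) can be sketched as an upper-tail hypergeometric probability. This is a minimal illustration under stated assumptions, not the analysis code used here: the background size (all proteins considered detectable), the set sizes and the helper names are hypothetical, and the sketch uses exact integer binomial coefficients from the Python standard library.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_draws: int, n_category: int, n_background: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population=n_background,
    successes=n_category, draws=n_draws).

    k            -- observed overlap between the interactome and the category
    n_draws      -- size of the interactome (e.g. the SA1 ΔCoh interactors)
    n_category   -- size of the category (e.g. an R-loop interactome) in the background
    n_background -- total number of proteins in the background universe
    """
    if not 0 <= n_category <= n_background or not 0 <= n_draws <= n_background:
        raise ValueError("set sizes must lie within the background")
    if k < 0:
        raise ValueError("overlap cannot be negative")
    upper = min(n_draws, n_category)
    if k > upper:
        return 0.0  # an overlap this large is impossible
    tail = sum(
        comb(n_category, i) * comb(n_background - n_category, n_draws - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic until this final division
    return tail / comb(n_background, n_draws)


def fold_enrichment(k: int, n_draws: int, n_category: int, n_background: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    expected = n_draws * n_category / n_background
    return k / expected
```

For example, `hypergeom_upper_tail(40, 300, 500, 6000)` would score an overlap of 40 between a 300-protein interactome and a 500-protein category in a 6,000-protein background (all hypothetical numbers). The same upper tail is available in SciPy as `scipy.stats.hypergeom.sf(k - 1, M, n, N)`, with `M` the background size, `n` the category size and `N` the number of draws.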

NIPBL-independent cohesin loading mediated by SA proteins.
Our results thus far revealed that SA ΔCoh localises to clustered regions, engages with RNA and various RBPs, and localises to R-loops. Several lines of evidence suggest that, alongside the canonical NIPBL/MAU2 loading complex, SA proteins contribute to cohesin's association with chromatin [13,14] and that their functions may go beyond simply acting as a bridging protein [14]. Thus, we hypothesized that SA proteins support genome organization in their own right and thereby facilitate cohesin's association with chromatin.
The RAD21 mAC system has the advantage that when IAA is washed off, the RAD21 protein is no longer degraded and can be 're-loaded' onto chromatin. We assessed this by measuring mClover signal intensity using IF and observed that it was robustly lost in IAA conditions and partially restored towards EtOH levels within 4 h of IAA withdrawal (Fig. 4a, b, S4a). We note that the spatial distribution of RAD21 was itself variable, ranging from highly compartmentalised to randomly distributed (Fig. 4a, c). This provided a unique opportunity to assess how SA influences cohesin reloading in vivo and the potential role of RNA and R-loops in this process.
We assessed reloading using both single-cell and bulk methods, coupled with siRNA-mediated KD, to determine how specific proteins affect cohesin reloading in vivo. We first measured the impact of the canonical cohesin loader, NIPBL. RAD21 mAC cells were treated with scramble or NIPBL siRNAs and subsequently grown in EtOH or IAA. The '0h' and '4h' post EtOH/IAA wash-off samples represent the extent of cohesin degradation and reloading, respectively (Fig. S4b). Chromatin fractionation in high-salt conditions followed by immunoblot analysis confirmed the loss of the loader complex, NIPBL and MAU2 (which is known to become destabilised upon NIPBL loss [47]).
As expected, in NIPBL KD conditions mean RAD21 reloading efficiency was reduced, although, surprisingly, this reduction was incomplete (a 41% reduction relative to the siRNA controls; mean reloading siNIPBL, 2.1 vs siCon, 3.6) and did not represent a statistically significant difference (p=0.33) (Fig. 4d, e, S4c). This result was reproduced using IF, where mean mClover signal in siNIPBL-treated cells was reduced by 45.1% relative to the siRNA control (MFI siCon, 6563 vs siNIPBL, 3602) (Fig. 4g), indicating that cells can still load cohesin in the absence of NIPBL.
We reasoned that SA proteins may contribute to the observed NIPBL-independent reloading. Thus, we repeated the experiments to include siRNA to SA1 and SA2 together (siSA), and an siNIPBL+siSA condition. In both population and single-cell analyses of reloading, SA KD had a more dramatic effect on cohesin reloading efficiency than NIPBL KD, reducing RAD21 on chromatin to 51% of scramble controls (mean siSA, 1.9 vs siCon, 5.1, p=0.002 for Fig. 4f, S4d; MFI siSA, 2303, p<0.0001 for Fig. 4g). In the absence of both SA and NIPBL, cohesin reloading was reduced further (mean siNIPBL+siSA, 1.4 vs siCon, 5.1, p=0.001 for Fig. 4f, S4d; MFI siNIPBL+siSA, 1925, p<0.001 for Fig. 4g), indicating that SA performs an important step, complementary to NIPBL, during normal reloading. Given the differences between SA1 and SA2 reported herein, we also performed the reloading experiment to separate the effects of SA1 and SA2. As expected from our co-IP results (Fig. 1c), RAD21 levels in RAD21 mAC cells were more affected by siSA2 than by siSA1 (Fig. S4e). Cohesin reloading was more efficient in siSA2 cells (where SA1 is present) than in siSA1 cells (where only SA2 is present), and reloading in siSA1 cells was similar to that in siSA cells (Fig. S4e). Together, these observations suggest that the bulk of the reloading in IAA conditions is supported by SA1.
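As an arithmetic check, the NIPBL-knockdown percentages can be recomputed from the means and MFIs quoted above. In the minimal sketch below (the helper names are ours, not from the analysis), each knockdown value is expressed both as a percentage of control and as a percentage reduction; because the bulk means are given to one decimal place in the text, the recomputed bulk figures are approximate.

```python
def percent_of_control(treated: float, control: float) -> float:
    """Treated value expressed as a percentage of the control value."""
    return 100.0 * treated / control


def percent_reduction(treated: float, control: float) -> float:
    """Relative reduction of the treated value compared with control, in percent."""
    return 100.0 * (1.0 - treated / control)


# Values quoted in the text for NIPBL knockdown, as (siNIPBL, siCon)
quoted = {
    "bulk reloading (chromatin fractionation)": (2.1, 3.6),  # means, rounded in the text
    "mClover MFI (IF)": (3602, 6563),
}

for label, (si_nipbl, si_con) in quoted.items():
    print(
        f"{label}: {percent_of_control(si_nipbl, si_con):.1f}% of control, "
        f"{percent_reduction(si_nipbl, si_con):.1f}% reduction"
    )
# bulk reloading (chromatin fractionation): 58.3% of control, 41.7% reduction
# mClover MFI (IF): 54.9% of control, 45.1% reduction
```

Keeping the two quantities separate matters when comparing knockdowns: a value at X% of control corresponds to a (100 - X)% reduction, so the two readings of the same number point in opposite directions.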

SA proteins stabilize nascent RNA in the absence of cohesin.
Given the association of SA proteins with RNA and RBPs and the dependence of cohesin reloading on SA1, we tested the requirement for RNA in cohesin reloading. Cells were treated as above, with a pulse of 5-ethynyl uridine (EU) prior to collection. EU is actively incorporated into nascent RNA and can be measured by IF alongside the change in RAD21-mClover. While a significant reduction in nascent RNA signal was detected upon treatment with triptolide (TRP), mClover signal was not significantly changed compared to IAA washoff conditions, indicating that RNA is not a key determinant of cohesin reloading per se (Fig. 4h, left panel). However, we did observe an increase in nascent RNA upon acute RAD21 degradation, which returned to EtOH levels when cohesin was reloaded onto chromatin (Fig. 4h).
Our results thus far showed that SA proteins remain chromatin-associated in the absence of cohesin (Fig. 1), where they bind RBPs (Fig. 2) and RNA and localize to R-loops (Fig. 3). We also report that SA1 proteins contribute to cohesin's re-association with chromatin and that this involves nascent RNA (Fig. 4). Thus, we reasoned that SA may facilitate cohesin reloading at R-loops. Because it was technically challenging to measure reloading upon over-expression of ppyCAG-RNaseH, we instead used STORM imaging to assess the nuclear distribution of reloaded cohesin in the context of R-loop clusters, comparing EtOH- and IAA-treated RAD21 mAC cells with IAA-washoff cells (Fig. 4j). As before, we measured the ratio of signal (this time RAD21-mClover) inside and outside of the S9.6 mask. Interestingly, in EtOH conditions, RAD21 localizations were depleted from the S9.6 domain (mean ratio 0.95) (Fig. 4j), similar to what we observed for SA1 (Fig. 3e).
Because STORM is highly sensitive, trace localizations of mClover are always detected, even in IAA conditions when the bulk of the signal is lost. The few localizations we observed were indeed modestly enriched within the S9.6 mask, although not significantly different from EtOH (mean ratio 1.08, p=0.10). These localizations may represent either extremely stable or freshly loaded cohesin. Upon IAA washoff, new RAD21-mClover molecules were readily detected, became significantly enriched within S9.6 domains compared to EtOH-treated cells (mean ratio 1.19, p=0.029) and were sensitive to treatment with RNase H (mean ratio 0.98) (Fig. 4j). Overall, our results point to a role for SA1 proteins in mediating reloading of cohesin at R-loops.
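The inside/outside ratio used for both SA1 and RAD21-mClover localizations can be illustrated with a minimal sketch. It assumes (the text does not state this explicitly) that the ratio compares localization densities, i.e. localizations per unit area inside the S9.6 mask divided by localizations per unit area outside it, which is what makes a spatially random distribution give a value close to 1. The binary pixel mask, the pixel size and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def enrichment_ratio(
    localizations: Sequence[Tuple[float, float]],
    mask: List[List[bool]],
    pixel_size: float,
) -> float:
    """Density of localizations inside a binary mask divided by density outside it.

    localizations -- (x, y) coordinates in the same units as pixel_size (e.g. nm)
    mask          -- mask[row][col] is True where the S9.6 signal is above threshold
    pixel_size    -- edge length of one mask pixel
    """
    n_rows, n_cols = len(mask), len(mask[0])
    area_in = sum(cell for row in mask for cell in row)  # True counts as 1 pixel
    area_out = n_rows * n_cols - area_in
    if area_in == 0 or area_out == 0:
        raise ValueError("mask must contain both inside and outside pixels")

    count_in = count_out = 0
    for x, y in localizations:
        col, row = int(x // pixel_size), int(y // pixel_size)
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            continue  # ignore localizations outside the imaged field
        if mask[row][col]:
            count_in += 1
        else:
            count_out += 1

    if count_out == 0:
        raise ValueError("no localizations outside the mask")
    # Areas are counted in pixels; the pixel area cancels in the ratio of densities.
    return (count_in / area_in) / (count_out / area_out)
```

Because numerator and denominator are densities, the ratio does not depend on how much of the field the mask covers; in practice the "outside" area would also be restricted to the nucleus rather than the whole field of view.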

A basic exon in the C-terminus of SA2 tunes interactions with RBPs.
While both SA1 and SA2 played a role in cohesin reloading, SA1 was the dominant paralog (Fig. S4e). In addition, SA2 was not able to compensate for SA1 in R-loop stability (Fig. S3f), despite its interaction with RNA (Fig. 3a, b) and R-loops (Fig. S3h). A previous study described the association of RBPs with SA2 by IP-MS in HCT116 cells [40]. Indeed, several of these RBPs overlap with the proteins described here as SA1 interactors (Fig. 2b, c) and are enriched in SA1 IPs in IAA conditions (Fig. 2a, d). However, we did not observe robust enrichment of RBPs compared to input in SA2 IPs, in either EtOH or IAA conditions. This was reminiscent of the differential interactions of SA1 and SA2 with F/YXF-containing proteins (Fig. 2a). These results raised the question of whether additional features in SA2 may be required to stabilize these interactions and functions.
SA1 and SA2 are expressed as multiple transcript variants in RAD21 mAC cells. We re-analysed publicly available RNA-seq datasets and quantified alternative splicing profiles using VAST-tools [48]. One prominent variant, conserved between human and mouse (Fig. S5a, b), arises from the alternative splicing of a single C-terminal exon: exon 31 in SA1 (SA1 e31∆) and exon 32 in SA2 (SA2 e32∆) (Fig. 5a). The significance of this variant is unknown. We found that in human HCT116 cells, the majority of SA1 mRNAs include e31 (average 'percent spliced in' (PSI) 97.7%), while the majority of SA2 mRNAs exclude e32 (average PSI 20.4%) (Fig. 5b, S5a, b). We confirmed this at the protein level by designing custom esiRNAs to specifically target SA1 e31 or SA2 e32 (Methods). Smartpool (SP) KD reduced the levels of SA1 and SA2 to similar extents compared to scrambled controls (87% and 94%, respectively) (Fig. 5c). Specific targeting of SA1 e31 reduced SA1 levels by 85% compared to the esiRNA control (comparable to SP KD).
In contrast, SA2 e32 targeting had a minimal effect on SA2 protein levels compared to its esiRNA control (reduction of 2%) (Fig. 5c), in line with the PSI data (Fig. 5b) and indicating that the dominant SA2 isoform does not contain e32.
These results imply that cells 'tune' the availability of e31/32 in SA proteins, prompting us to investigate the nature of these exons. Interestingly, the amino acid (aa) sequence of the spliced SA exons encodes a highly basic domain within an otherwise acidic C-terminus (Fig. 5a). Overall, the SA paralogs are highly homologous; however, their N- and C-termini diverge in aa sequence. Despite this, e31 and e32 have retained their basic properties (pI=10.4 and 9.9, respectively) (Fig. 5a, zoom-in). Basic patches can act as regulatory domains and bind nucleic acids, prompting us to ask whether these alternatively spliced basic exons contribute to the association of SA proteins with RNA (Fig. 3a). We cloned cDNAs from HCT116 cells representing the exon 32-containing SA2 (SA2 e32+) and the canonical exon 32-lacking SA2 (SA2 e32∆), tagged them with YFP, expressed them in HCT116 cells and purified the tagged isoforms to compare their ability to interact with RNA using CLIP (Fig. 5d). While the presence of e32 did not change the ability of SA2 to interact with RNA (Fig. 5d, blue arrows), cells expressing the alternative exon routinely enriched RBPs with molecular weights of ~110-140 kDa (Fig. 5d, black arrow), suggesting that the e32 domain may stabilize the association of SA2 with RBPs.
To identify the proteins stabilized by the presence of e32, we coupled YFP-SA2 isoform CLIP with mass spectrometry. Three biological replicate IPs were prepared from RAD21 mAC cells transfected with either YFP-SA2 e32+ or YFP-SA2 e32∆.
YFP IP efficiency was similar for SA2 e32+ and SA2 e32∆, and both isoforms interacted with core cohesin subunits (Fig. S5c). We identified a total of 238 proteins, the majority of which overlapped between the two SA2 isoform IPs and with a previously published SA2 IP [40] (Fig. S5c, d). We used a pairwise analysis of SA2 e32+ vs SA2 e32∆ samples to generate a fold-change value for each putative interactor (Fig. 5e). GO analysis of proteins that changed by at least 1.5-fold and were absent from the Mock IP revealed a mild enrichment for the post-translational modification category in the SA2 e32∆ IP (FDR=0.0234, p=1.35×10^-06) and, conversely, an enrichment of the RNA binding category in the SA2 e32+ IP (FDR=3.43×10^-05, p=6.56×10^-09). Interestingly, the enriched proteins included YTHDC1 and YTHDF3 (previously identified in the SA1 ΔCoh interactome, Fig. 2b), DIS3 and POLR2B, which are all known to play key roles in RNA-protein complexes and stability, have molecular weights of ~110-140 kDa, and thus likely represent the specifically enriched band in the CLIP experiments (Fig. 5d, black arrow). Finally, the observation that the basic exon 32 domain in SA2 supports the stability of RNA-RBP interactions led us to investigate whether exon 32 also stabilizes SA2 at R-loops. We repeated the S9.6 IP in RAD21 mAC cells expressing either YFP-SA2 e32+ or YFP-SA2 e32∆. As before, AQR and MCM3 were enriched by S9.6 IP (Fig. 5f), and SA2 e32+ was more enriched in the S9.6 IP than SA2 e32∆ (1.8x and 1.24x, respectively, relative to endogenous SA2) (Fig. 5f, g). Taken together, our results support a role for the alternatively spliced C-terminal basic domain of SA in stabilizing interactions with RBPs and R-loops.
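Two quantities used in this section, PSI and the isoform fold-change filter, can be made concrete with a short sketch. Both functions are illustrative assumptions rather than the pipelines used here: `psi` is a simplified junction-count estimate (the mean of the two inclusion junctions against the skipping junction), not VAST-tools' own implementation, and `split_by_fold_change` assumes one abundance value per protein per IP (for example spectral counts), a hypothetical pseudocount of 1 to avoid division by zero, and the 1.5-fold and Mock-exclusion criteria described above.

```python
from typing import Dict, List, Set, Tuple


def psi(inc_upstream: int, inc_downstream: int, exclusion: int) -> float:
    """Percent spliced in (0-100) from junction read counts.

    Inclusion support is the mean of the two inclusion junctions
    (upstream exon -> alternative exon, alternative exon -> downstream exon);
    exclusion support is the skipping junction (upstream -> downstream exon).
    """
    inclusion = (inc_upstream + inc_downstream) / 2
    total = inclusion + exclusion
    if total == 0:
        raise ValueError("no junction reads support either isoform")
    return 100.0 * inclusion / total


def split_by_fold_change(
    e32_plus: Dict[str, float],
    e32_delta: Dict[str, float],
    mock: Set[str],
    min_fold: float = 1.5,
    pseudocount: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """Return (enriched in e32+ IP, enriched in e32∆ IP), excluding Mock-IP hits."""
    up_plus: List[str] = []
    up_delta: List[str] = []
    for protein in sorted(set(e32_plus) | set(e32_delta)):
        if protein in mock:
            continue  # detected in the Mock IP: treat as background
        plus = e32_plus.get(protein, 0.0) + pseudocount
        delta = e32_delta.get(protein, 0.0) + pseudocount
        fold = plus / delta
        if fold >= min_fold:
            up_plus.append(protein)
        elif fold <= 1 / min_fold:
            up_delta.append(protein)
    return up_plus, up_delta
```

Under this convention, an exon supported mostly by skipping reads, like SA2 e32 (average PSI 20.4%), yields a low PSI, whereas a nearly constitutive exon such as SA1 e31 (97.7%) approaches 100.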

DISCUSSION
Whether SA proteins function in their own right outside of the cohesin complex is rarely considered. Consequently, our understanding of how these proteins contribute to cohesin function and disease is incomplete. In this study, we shed light on this question by uncovering a diverse repertoire of SA1 interactors in cells acutely depleted of the cohesin ring. These range from proteins associated with translation and ribosome biogenesis to RNA processing factors and regulators of the epitranscriptome. These observations suggest that SA1 has a previously unappreciated role in the post-transcriptional regulation of gene expression, which offers much-needed new insight into its roles in disease and cancer.
previously been shown to support cohesin stabilisation at CTCF at the IGF2/H19 locus [53]. These results are in line with our findings that a basic domain in the unstructured C-terminal portion of SA supports RNA-associated protein interactions.
This study also reveals SA1 as a novel regulator of R-loop homeostasis. It is noteworthy that other suppressors of R-loop formation include RNA processing factors, chromatin remodellers and DNA repair proteins [28], which all function in the context of nuclear bodies [54]. We find that SA1 proteins are enriched at very distal chromatin interactions in cohesin-depleted Hi-C data, interact with numerous RBPs known to condense in 3D [55,56] and are enriched in S9.6 domains, where we find that cohesin becomes associated with chromatin. Harnessing such condensates would provide an efficient loading platform for cohesin at sites of similar biological function. Yeast cohesin has been shown to mediate the formation of phase-separated condensate structures [57].
Our results support this view and further suggest that it is SA (possibly predominantly SA1 in HCT116 cells), with its propensity for intrinsically disordered domains [51], that contributes to this formation, thereby linking cohesin loading to biological functions. We note that if SA paralogs or isoforms direct different localization
We are grateful to Jernej Ule for his support with DRIP-sequencing and to Julian Zagalak and the Crick sequencing facility for reagents, advice and assistance. We thank Stanimir Dulev for his contributions at the early stages of the project and Jiten Manji for his support with microscopy. We also thank Konstantina Skourti-Stathaki for advice about the S9.6 antibody, IFs and R-loops. We are grateful to the members of the Hadjur lab for critical discussions and reading of the manuscript.
with ethanol (EtOH) as a control or auxin (IAA) for 4 hrs. Nuclei were counterstained with DAPI.
b) Imaris quantification of the relative mean fluorescence intensity (MFI) of mClover, CTCF, SA1 and SA2 in EtOH- and IAA-treated RAD21 mAC cells. Whiskers and boxes indicate all and 50% of values, respectively. The central line represents the median. Asterisks indicate a statistically significant difference as assessed using a two-tailed t-test. **** p<0.0001. n>50 cells/condition from 3 biological replicates.

METHODS
Cell culture and IAA-mediated degradation of RAD21.

HCT116 cells with engineered RAD21-miniAID-mClover (RAD21mAC), OsTIR1 only, or both (RAD21mAC-OsTIR) were obtained from Masato T. Kanemaki. Throughout this study we used RAD21mAC-OsTIR cells, which for simplicity we refer to in the text as RAD21 mAC. The cells were maintained in McCoy's 5A medium with GlutaMAX (Thermo Fisher Scientific) supplemented with 10% heat-inactivated FBS (Gibco), 700 µg/ml Geneticin, 100 µg/ml Hygromycin B Gold and 100 µg/ml Puromycin as described. We clonally selected the RAD21mAC-OsTIR cells by sorting green-fluorescence-positive single cells on a FACS Aria Fusion cell sorter (BD Biosciences). Single cells were individually seeded into the wells of a 96-well plate, expanded for 10 days into 6 cm culture dishes and selected with Geneticin, Hygromycin B Gold and Puromycin, as indicated above, in McCoy's medium for another 10 days. Each clone was assessed for efficiency of RAD21 degradation by FACS analysis and western blotting (WB) using mClover, mAID and OsTIR antibodies. Two clones (H2 and H11) were taken forward and used throughout this study.
Horizon Discovery). A final concentration of 10 nM of siSA1, siSA2 or siNIPBL, or 5 nM of siAQR, was reverse transfected into the cells using Lipofectamine RNAiMAX reagent (Invitrogen), as per the manufacturer's instructions. Cells were plated at a density of 1-1.25 × 10^6 cells per 10 cm dish and harvested 72 hrs post-transfection, at a confluency of ~70%. The Lipofectamine-containing medium was replaced with fresh medium 12-16 hrs post-transfection to avoid toxicity. For Figure 5f/g, incubation time was reduced to 40 hrs. To account for the reduced growth time, cells were plated at a density of 2-3 × 10^6 cells per 10 cm dish. Here, siCon- and siNIPBL-transfected cells were plated at a lower cell number than siAQR-transfected cells to ensure equalised confluence
single-end reads. Each biological set was sequenced on a separate run. Quality control of reads was performed using FastQC. Reads were aligned to the hg19 reference genome using Bowtie, allowing 3 mismatches. PCR duplicates were detected and removed using SAMtools. BAM files were imported into MISHA (v3.5.6) and peaks were identified using a 0.995 percentile threshold. Peaks that overlapped in both replicates were retained. Only replicate 1 of the SA1 library was used. Correlation plots of peaks across the genome from different ChIP libraries were compared using log-transformed percentiles plotted as a smoothed scatter plot. Comparison of peaks at regions of interest was carried out using deepTools (version 3.1.0-2). For input into deepTools, peak data were converted to bigWig format, with a bin size of 500, using the UCSC bedGraphToBigWig utility. The signal matrix was calculated for a window 2,000 bp upstream and downstream of the region of interest; missing data were treated as zero, and all other parameters were left at default. Heatmaps were generated within deepTools with default parameters. Read density profile plots were plotted in ggplot using deepTools plotProfile --perGroup data and smoothed using geom_smooth with default 'gam' settings.

DRIP-sequencing.

DRIP lysates were prepared from chromatin. Chromatin was fractionated as described for ChIP samples above, with the following changes. Samples were not fixed and were collected from the plate by scraping in ice-cold PBS. Sonication was performed to

were identified and subset to retain only the high-scoring neighbours. This created a list of high-scoring neighbours for each high-scoring contact, in which the first neighbour is the contact itself, at a distance of 0. The neighbour information could then be converted into edge information, allowing high-scoring fend contacts to be grouped into cluster hotspots using the R package 'igraph'. Hotspots containing fewer than the minimum number of high-scoring fends (100) were removed. The resulting hotspots were represented as 2D intervals containing high-scoring contacts. In total, 5539 hotspots were identified in the EtOH Hi-C data and 759 in the IAA Hi-C data.

Creating aggregate plots - To calculate and visualise the contact enrichment at hotspots in the EtOH and IAA Hi-C data, we used the R package 'shaman'. First, we used the function 'shaman_generate_feature_grid' to calculate the enrichment profile at EtOH and IAA hotspots. Using the weighted centre of each hotspot, represented as a 2D interval, we used this function to build grids for the EtOH and IAA hotspots in the Hi-C data in 3 distance bands: 100 kb-1 Mb, 1-5 Mb and 5-10 Mb. A range of 250 kb was visualised around the weighted centre. The grid was built by taking all combinations of interval1 and interval2 of the EtOH and IAA hotspot centres, with each combination termed a 'window'. Hotspots were not filtered for size or shape. A score threshold of 60 was used to focus on enriched pairs: windows that did not contain at least one point with a score of 60 were discarded.
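The hotspot-calling steps described above (neighbour lists, conversion to edges, grouping into clusters, removal of small clusters) amount to finding connected components in a graph of high-scoring contacts. The sketch below illustrates that idea in plain Python, with a union-find standing in for the 'igraph' components step; it is not the authors' code. The (coord1, coord2) contact representation, the Chebyshev distance rule used to define neighbours and the `max_dist` parameter are hypothetical, and the minimum size is applied to the number of contacts per cluster as an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each high-scoring fend contact is reduced to a
# (coord1, coord2) pair of positions on the Hi-C map.
Contact = Tuple[int, int]
Interval2D = Tuple[Tuple[int, int], Tuple[int, int]]


def neighbour_lists(contacts: List[Contact], max_dist: int) -> Dict[int, List[int]]:
    """For each contact, list the contacts within max_dist (Chebyshev distance,
    an assumed rule), sorted by distance so distance-0 entries such as the
    contact itself come first. Brute force O(n^2), for clarity only."""
    result: Dict[int, List[int]] = {}
    for i, (x1, y1) in enumerate(contacts):
        dists = [(max(abs(x1 - x2), abs(y1 - y2)), j) for j, (x2, y2) in enumerate(contacts)]
        result[i] = [j for d, j in sorted(dists) if d <= max_dist]
    return result


def cluster_hotspots(contacts: List[Contact], neighbours: Dict[int, List[int]],
                     min_size: int = 100) -> List[Interval2D]:
    """Group contacts into connected components and return each component with
    at least min_size members as a 2D interval ((start1, end1), (start2, end2))."""
    parent = list(range(len(contacts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Every (contact, neighbour) pair is treated as an undirected edge.
    for i, nbrs in neighbours.items():
        for j in nbrs:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Each connected component is a candidate hotspot.
    groups: Dict[int, List[Contact]] = {}
    for i, contact in enumerate(contacts):
        groups.setdefault(find(i), []).append(contact)

    hotspots: List[Interval2D] = []
    for members in groups.values():
        if len(members) < min_size:  # drop clusters below the size cut-off
            continue
        xs = [x for x, _ in members]
        ys = [y for _, y in members]
        hotspots.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return hotspots
```

For example, with contacts `[(0, 0), (5, 5), (3, 0), (1000, 1000), (1004, 1002)]`, `max_dist=10` and `min_size=3`, only the first three contacts form a large enough cluster, giving the single hotspot `((0, 5), (0, 5))`. A real Hi-C dataset would need a spatial index rather than the quadratic neighbour search used here.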
Each window was then split into 1,000 nt bins and the windows were summed together to generate a grid containing the observed and expected contacts. We visualised the grid using 'shaman_plot_feature_grid' in 'enrichment' mode with a plot_resolution value of 6000, owing to the large range being visualised.

STORM - Immunolabelling and imaging.

Two clones of RAD21mAC-OsTIR cells were seeded overnight at a density of 30,000 cells per well (in 400 µl) onto poly-L-lysine-coated 8-well chamber slides (Lab-Tek™ 155411). Each clone was treated with EtOH, IAA or IAA washoff, then fixed with 4% PFA (Alfa Aesar) for 10 min at room temperature and rinsed three times with PBS for 5 min each. After fixation, the cells were shipped to the Cosma Lab for STORM processing and imaging. Cells were permeabilized with 0.3% Triton X-100 in PBS and blocked in blocking buffer (10% BSA, 0.01% Triton X-100 in PBS) for one hour at room temperature. Cells were incubated with primary antibodies (see Table 2

labeled secondary antibodies were added at a 1:50 dilution in blocking buffer and incubated for 45 min at room temperature, or single-fluorophore-labeled commercial antibodies were added at a 1:250 dilution in blocking buffer and incubated for 45

in i) was manually rearranged in Cytoscape for visual clarity, enriched categories were visualized using the STRING pie chart function, and half of the proteins within each category were subset from the network based on the p-value change between UTR and IAA samples.

Over-enrichment of the S9.6 interactome was calculated separately using the hypergeometric distribution, for comparison with refs. 44,45. Significance was calculated using the dhyper function in R, and multiple testing was corrected for using the p.adjust