ABSTRACT
Most studies of cohesin function consider the Stromalin Antigen (STAG/SA) proteins as core complex members given their ubiquitous interaction with the cohesin ring. Here, we provide functional data to support the notion that the SA subunit is not a mere passenger in this structure, but instead plays a key role in cohesin’s localization to diverse biological processes and promotes loading of the complex at these sites. We show that in cells acutely depleted for RAD21, SA proteins remain bound to chromatin and interact with CTCF, as well as a wide range of RNA binding proteins involved in multiple RNA processing mechanisms. Accordingly, SA proteins interact with RNA and are localised to endogenous R-loops where they act to suppress R-loop formation. Our results place SA proteins on chromatin upstream of the cohesin complex and reveal a role for SA in cohesin loading at R-loops which is independent of NIPBL, the canonical cohesin loader. We propose that SA takes advantage of this structural R-loop platform to link cohesin loading and chromatin structure with diverse genome functions. Since SA proteins are pan-cancer targets, and R-loops play an increasingly prevalent role in cancer biology, our results have important implications for the mechanistic understanding of SA proteins in cancer and disease.
INTRODUCTION
Cohesin complexes are master regulators of chromosome structure in interphase and mitosis. Accordingly, mutations of cohesin subunits lead to changes in cellular identity, both during development and aberrantly in cancer 1–3. A prevailing model is that cohesin contributes to cell identity changes in large part by dynamically regulating genome organization and mediating communication between distal regulatory elements 4–10. Our understanding of how cohesin’s component parts contribute to its functions and where cohesin becomes associated with chromatin in order to perform its critical roles in spatial genome organization is incomplete.
Most studies of cohesin function consider the Stromalin Antigen (STAG/SA) proteins as core complex members given their ubiquitous interaction with the tripartite cohesin ring (composed of SMC1, SMC3 and SCC1/RAD21). Rarely is the SA subunit considered for its roles outside of cohesin, even though it is the subunit most commonly mutated across a wide spectrum of cancers 1,11,12. SA proteins play a role in cohesin’s association with DNA 13,14. The yeast SA orthologue is critical for efficient association of cohesin with DNA and its ATPase activation 13,14. Recent crystallization studies of cohesin in complex with its canonical loader NIPBL 15 suggest that NIPBL and SA together wrap around both the cohesin ring and DNA to position and entrap DNA 16–18, implying a role for SA in the initial recruitment of cohesin to DNA alongside NIPBL. Further, SA proteins bridge the interaction between cohesin and CTCF 16,19,20, and also bridge interactions with specific nucleic acid structures in vitro. SA1 binds to AT-rich telomeric sequences 21,22 and SA2 displays sequence-independent affinity for particular DNA structures commonly found at sites of repair, recombination, and replication 23. Consistent with this, results in yeast implicate non-canonical DNA structures in cohesin loading in S-phase. In vitro experiments show that cohesin captures the second strand of DNA via a single-strand intermediate 24, and chromatid cohesion is impaired by de-stabilisation of single-strand DNA intermediates during replication 25. Together, these findings implicate SA proteins in a regulatory role in guiding or stabilising cohesin localisations.
During transcription, the elongating nascent RNA can hybridise to the template strand of the upstream DNA and form an R-loop, which is an intermediate RNA:DNA hybrid conformation with a displaced single strand of DNA (Richardson, 1975; Roy and Lieber, 2009; El Hage et al., 2010; Roy et al., 2010). A multitude of processes have been linked to R-loop stability and metabolism. For example, proper co-transcriptional RNA processing, splicing, and messenger ribonucleoprotein (mRNP) assembly counteract R-loop formation (Li and Manley, 2005; Teloni et al., 2019). R-loop structures have also been shown to regulate transcription of both mRNA and rRNAs by recruitment of transcription factors, displacement of nucleosomes, and preservation of open chromatin (Dunn and Griffith, 1980; Powell et al., 2013; Boque-Sastre et al., 2015). Hence, like at the replication fork, sites of active transcription accumulate non-canonical nucleic acid structures.
We set out to investigate the nature of the association between SA proteins and CTCF. We discovered that far from being ‘passengers’ in the cohesin complex, SA proteins perform critical roles in their own right, directing cohesin’s localization and loading to chromatin. In cells acutely depleted of RAD21, SA proteins remain associated with chromatin and CTCF where they are enriched at 3D clustered sites of active chromatin. Moreover, we identify cohesin-independent binding of SA1 to numerous proteins involved in RNA processing, ribosome biogenesis, and translation. Consistent with this, SA1 and SA2 interact with RNA and non-canonical nucleic acid structures in the form of R-loops where SA1 acts to suppress R-loop formation. Importantly, SA proteins are required for loading of cohesin to chromatin in cells deficient for NIPBL, and loading is enhanced by modulating the levels of R-loop structures. Our results highlight a central role for SA proteins in cohesin biology. Through their diverse interactions with proteins, RNA and DNA, SA proteins act as the ‘seed’ for cohesin loading to chromatin. Finally, the interaction of cohesin-independent SA proteins with nucleolar and RNA processing factors opens up a new understanding of how cohesin mis-regulation can impact disease development, one that moves us beyond its control of gene expression.
RESULTS
SA interacts with CTCF on chromatin in the absence of the cohesin trimer
To determine how CTCF and cohesin assemble on chromatin, we used human HCT116 cells engineered to carry a miniAID tag (mAID) fused to monomeric Clover (mClover) at the endogenous Rad21 locus and OsTIR1 under the control of CMV (herein RAD21mAC) 26. RAD21mAC cells were cultured in control conditions (ethanol) or in the presence of auxin (IAA) to induce rapid RAD21 degradation. We used immunofluorescence (IF) to monitor the levels of mClover, SA1, SA2 and CTCF (Fig.1a, b, S1a). While acute IAA treatment robustly reduced mClover levels by over 83% compared to control cells, SA2 levels were reduced by 63% (p=4.7E-76), and SA1 was only reduced by 29% (p=7.9E-12) (Fig.1b). We also observed a variable effect on CTCF in the absence of RAD21 (reduced mean value between 7-24%). The retention of SA proteins on chromatin despite the degradation of RAD21 was surprising given the fact that they are considered to be part of a stable biochemical complex.
We sought to validate these observations using an orthogonal technique and to establish whether the residual SA proteins retained the capacity to directly interact with CTCF. We prepared chromatin extracts from RAD21mAC cells treated with ethanol or IAA and established a chromatin co-immunoprecipitation (coIP) protocol to probe the interactions between SA proteins, cohesin core subunits and CTCF. Both SA1 and SA2 interacted with RAD21 and CTCF in control cells as expected 27,28, with notable differences in their preferred interactions (Fig.1c). SA2 more strongly enriched RAD21 while the SA1-CTCF interaction was significantly stronger than SA2-CTCF (Fig.1c). Upon RAD21 degradation, we again observed a stronger effect on chromatin-bound SA2 levels compared to SA1, implying that stable binding of SA2 to chromatin is more sensitive to cohesin loss than that of SA1. Not only did the residual SA proteins retain their ability to interact with CTCF in the absence of RAD21, but the interactions between SA1 and CTCF were further enhanced (Fig.1c). Reciprocal coIPs with CTCF confirmed the CTCF-SA interactions in RAD21-depleted cells (Fig.1d). These results were validated in a second cell line and upon siRNA-mediated knockdown of SMC3 (Fig S1b).
We performed two-color Stochastic Optical Reconstruction Microscopy (STORM) to further assess the nuclear distribution and colocalization of SA1, SA2 and CTCF with nanometric resolution in RAD21-degraded cells. Upon IAA treatment, we observed a decreased density of detected SA1, SA2 and CTCF in two analyzed clones (Fig.1e, f, S1c, d), suggesting that RAD21 degradation affects the stability of CTCF in addition to SA proteins. As we observed by conventional confocal microscopy, SA2 localizations were more affected than SA1 (mean density reduction in SA1, 32% vs SA2, 42%). Accordingly, SA1, SA2 and CTCF clusters were more sparsely distributed across the nucleus upon RAD21 degradation as quantified by nearest neighbor distance (NND) analysis of protein clusters (Fig.1g). This analysis also revealed a higher density of SA1 and CTCF clusters compared to SA2, with shorter distances between clusters, even in ethanol conditions (Fig.1g). To further confirm that SA and CTCF were still co-localised in IAA conditions, we analyzed the relative distribution of SA clusters to CTCF clusters by analyzing the NND distribution between SA1 and CTCF, and SA2 and CTCF. NND showed that the association between SA1 and SA2 with CTCF is maintained upon RAD21 degradation as compared to both the control cells and to a simulation of randomly-distributed protein clusters at the same density (Fig. 1h). Interestingly, while the probability of SA1 at CTCF is only modestly affected in IAA conditions, supporting their continued co-localization, SA2 at CTCF is more affected in IAA treated cells, in line with results indicating that SA2 levels are more affected than SA1 when cohesin is depleted. Together, our results confirm the maintained interaction and localization patterns of SA proteins with CTCF and reveal a difference in SA paralogue stability in the absence of the core cohesin trimer.
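The nearest neighbor distance (NND) comparison described above can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in this study: the function names, the toy cluster coordinates, the field size and the brute-force search are all assumptions chosen for illustration. The sketch measures, for each SA cluster, the distance to the closest CTCF cluster, and repeats the measurement for randomly placed clusters at the same density.

```python
import math
import random


def nearest_neighbor_distances(points_a, points_b):
    """For each point in points_a, return the distance to its closest point in points_b."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def random_clusters(n, width, height, seed=0):
    """Place n clusters uniformly at random in a width x height field (hypothetical null model)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Toy cluster centroids in arbitrary units (hypothetical, not measured data).
ctcf_clusters = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0)]
sa_clusters = [(11.0, 10.0), (50.0, 52.0), (88.0, 20.0)]

observed = nearest_neighbor_distances(sa_clusters, ctcf_clusters)
simulated = nearest_neighbor_distances(
    random_clusters(len(sa_clusters), 100.0, 100.0), ctcf_clusters
)

print("observed median NND:", median(observed))
print("simulated median NND:", median(simulated))
```

An observed NND distribution shifted towards shorter distances than the simulated one would be consistent with co-localization. For real localization data a spatial index would likely replace the brute-force search.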
Cohesin-independent SA proteins are localised at clustered regions in 3D
Previous analyses of the contribution of SA proteins to genome organization 7,9 were performed in cells containing cohesin rings, possibly obscuring a functional role for SA proteins themselves in genome organization. To determine if cohesin-independent SA proteins may function at unique locations in the genome, we investigated whether the residual SA-CTCF complexes (herein, SA-CTCFΔCoh) in IAA-treated RAD21mAC cells (Fig.1c, d) occupied the same chromatin locations as in control cells. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we determined the binding profiles of CTCF, SA1, SA2, RAD21 and SMC3 in RAD21mAC cells treated with ethanol or IAA. Pairwise comparisons of CTCF ChIP-seq with RAD21 or SA in control RAD21mAC cells revealed the expected overlap in binding sites (Fig.1i, S1e). In contrast, both global and CTCF-overlapping RAD21 and SMC3 ChIP-seq signals were dramatically lost in IAA-treated cells (Fig.1i, S1e). In agreement with our microscopy and biochemistry results, we detected residual SA1 and SA2 binding sites in IAA-treated cells which retained a substantial overlap with CTCF (Fig.1i, S1e). We confirmed that the sites co-occupied by CTCF and SA proteins in RAD21-depleted cells were previously bound in control conditions, indicating that CTCF and SA maintain occupancy at their canonical binding sites in the absence of RAD21. This suggests that SA interaction with CTCF in the absence of the cohesin ring is a step in normal cohesin activity.
While depletion of cohesin results in a dramatic loss of Topologically Associated Domain (TAD) structure 8, the frequency of long-range inter-TAD, intra-compartment contacts (LRC) is increased 5,8, and these contacts are enriched for CTCF 5 or active enhancers 8. To determine whether residual, chromatin-bound SA could be associated with LRCs in the absence of RAD21, we re-analysed Hi-C data from control and IAA-treated RAD21mAC cells 8. We quantified all contacts within two different scales of genome organization: local TAD topology (100 kb-1 Mb) and clustered LRCs (1-5 Mb) (Fig.1j). As previously shown, local TAD contacts are lost and clustered LRCs are enriched in IAA conditions (Fig.1j). When we probed the Hi-C datasets for contacts containing the residual SA-CTCFΔCoh binding sites, we observed a further enrichment in IAA conditions (Fig.1j, bottom row), indicating that SA-CTCFΔCoh are enriched at the clustered LRCs formed when cells are depleted of cohesin and thus implicating them in 3D structural configurations. Finally, using ChromHMM, we discovered that SA-CTCFΔCoh sites are characterised by active chromatin and enhancers (Fig.S1f). Our results suggest that cohesin-independent SA, either with CTCF or alone, may itself contribute to large-scale arrangement of active chromatin and regulatory features in 3D space.
SA interacts with diverse ‘CES-binding proteins’ in RAD21-depleted cells
SA proteins contain a highly conserved domain termed the ‘stromalin conservative domain’ 14,29, or the ‘conserved essential surface’ (CES). Structural analysis of CTCF-SA2-SCC1(RAD21) has recently shown that FGF (F/YxF) motifs in the N-terminus of CTCF bind to the CES on the SA2-SCC1 sub-complex, forming a tripartite interaction patch 16. Furthermore, the authors identified an FGF-like motif in additional cohesin regulators and showed that a consensus motif could be used to predict interaction with additional chromatin proteins. Thus, we investigated whether SA could associate with other FGF-motif containing proteins in native and IAA conditions in cells. We performed chromatin IP with SA1 and SA2 in ethanol and IAA and probed for interaction with CTCF, and three additional FGF-motif containing proteins, CHD6, MCM3 and HNRNPUL2 (Fig.2a, S2a). As with CTCF, all of the proteins directly interacted with SA1 in RAD21-control cells and furthermore their interaction with SA1 was enriched upon RAD21-degradation. Interestingly, despite SA2 also containing the conserved ‘CES’ domain, the FGF-motif proteins did not interact with SA2 as strongly (Fig.2a), pointing to an additional element which functions to stabilise SA with FGF-containing proteins in vivo. These results revealed that SA can interact with proteins beyond just CTCF in the absence of cohesin, indicating a need to re-evaluate the role of SA in cohesin activity and consider possible novel functions for SA proteins.
SA1 interacts with a diverse group of proteins in the absence of cohesin
To delineate novel protein binding partners and putative biological functions of SA1, we optimised our chromatin-bound, endogenous SA1 co-IP protocol to be compatible with mass spectrometry (IP-MS) and used this to comprehensively characterize the SA1 protein–protein interaction (PPI) network in control or RAD21-degraded RAD21mAC cells. Three biological replicate sets were prepared from RAD21mAC cells that were either untreated (UT) or treated with IAA (IAA) and processed for IP with both SA1 and IgG antibodies. In parallel, RAD21mAC cells were also treated with scrambled siRNAs or with siRNA to SA1 to confirm the specificity of putative interactors (Fig. S2b). Immunoprecipitated proteins were in-gel trypsin-digested, gel extracted, and identified by liquid chromatography tandem mass spectrometry (LC-MS/MS). SA1 peptides were robustly detected in all UT and siCON samples and never detected in IgG controls, validating the specificity of the antibody.
We identified 1282 unique proteins that co-purified with SA1 with a False Discovery Rate <1%. After filtering steps (methods), we used a pairwise analysis of IAA vs UT samples to generate a fold-change value for each putative interactor. These candidates were found in at least 2 of the 3 SA1 IP replicates, were changed by at least 1.5-fold compared to UT controls, and sensitive to siSA1, yielding 134 high-confidence cohesin-independent SA1 (SA1ΔCoh) interactors (Fig. 2b, Table 1). As expected, core cohesin subunits SMC1A and SMC3 were strongly depleted while no peptides were detected for RAD21 (Fig. 2b). SA1 itself was significantly depleted compared to control cells, as were other cohesin regulators, known to directly interact with SA1, such as PDS5B 30. In line with the enrichment we observed for the CES-binding proteins in IAA-conditions (Fig. 1c, 2a), the vast majority of the SA1ΔCoh interactors were enriched for binding with SA1 in IAA conditions (117 of 134) (Fig. 2b).
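The three selection criteria stated above (detection in at least 2 of 3 replicates, a change of at least 1.5-fold relative to UT, and sensitivity to siSA1) can be expressed as a short Python sketch. The record layout, field names and placeholder protein names are hypothetical, and treating the fold change as an IAA/UT ratio that counts in either direction is an assumption; the additional filtering steps described in the methods are not reproduced here.

```python
def high_confidence_interactors(candidates, min_replicates=2, min_fold=1.5):
    """Return names of candidates that pass all three selection criteria.

    Each candidate is a dict with hypothetical keys:
      name                 - protein identifier
      replicates_detected  - number of SA1 IP replicates containing the protein
      fold_change          - assumed IAA/UT abundance ratio
      sensitive_to_siSA1   - True if the signal is lost upon SA1 knockdown
    """
    selected = []
    for candidate in candidates:
        fold = candidate["fold_change"]
        # Assumption: an enrichment or a depletion of >= min_fold both count as "changed".
        changed = fold >= min_fold or fold <= 1.0 / min_fold
        if (
            candidate["replicates_detected"] >= min_replicates
            and changed
            and candidate["sensitive_to_siSA1"]
        ):
            selected.append(candidate["name"])
    return selected


# Placeholder records for illustration only (not data from this study).
toy_candidates = [
    {"name": "P1", "replicates_detected": 3, "fold_change": 2.0, "sensitive_to_siSA1": True},
    {"name": "P2", "replicates_detected": 1, "fold_change": 4.0, "sensitive_to_siSA1": True},
    {"name": "P3", "replicates_detected": 3, "fold_change": 1.2, "sensitive_to_siSA1": True},
    {"name": "P4", "replicates_detected": 2, "fold_change": 0.5, "sensitive_to_siSA1": True},
    {"name": "P5", "replicates_detected": 3, "fold_change": 3.0, "sensitive_to_siSA1": False},
]

selected_names = high_confidence_interactors(toy_candidates)
print(selected_names)
```

In this toy input, P2 fails the replicate criterion, P3 the fold-change criterion and P5 the siSA1 criterion.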
We used STRING analysis to compute the associations between our SA1ΔCoh interactome and to identify enriched biological processes and molecular functions. This revealed that the SA1ΔCoh PPI network included gene expression, chromatin, cytoplasmic and RNA binding proteins representing a variety of functionally diverse cellular processes. Among these are processes previously associated with cohesin biology and identified in published cohesin mass-spec experiments 31, thus validating our approach, such as MCM3 and SWI/SNF components INO80 and SMARCAL1 which are involved in DNA replication and chromatin remodelling, respectively. Similarly, several transcriptional and epigenetic regulators were identified, such as PRC2 component JARID2 and TAF15 and SPTY2D1.
In addition, we identified proteins associated with SA1 in IAA conditions that were involved in functions that have not been previously associated with SA biology (Fig. 2c, d). The most enriched category was RNA processing (p=3.62×10−39), and included proteins involved in RNA modification (YTHDC1, ADAR1, FTSJ3); mRNA stabilization and export (SYNCRIP, FMR1); and several RNA splicing regulators (SRSF1, SON). Accordingly, we found a significant enrichment for DNA and RNA helicases (p=3.54×10−08) (MCM3, DHX9 and others) and RNA binding proteins (p=9.11×10−11), among which were several hnRNP family members (hnRNPU or SAF-A). We also found a highly significant enrichment of proteins associated with ribosome biogenesis (p=2.20×10−30) including both large and small subunit components (RPL5, 17, 29, RPS9); rRNA processing factors (BOP1, NOP56); and components of the snoRNA pathway. Translation was significantly enriched as a biological process (p=1.64×10−06), with several cytoplasmic translation regulators also identified as SA1ΔCoh interactors (Fig. 2c, d). Among these are ESYT2 and EIF3B which we identify as FGF-containing proteins that are primarily found in the cytoplasm (Fig. 2d). We validated 8 of the highest-ranking proteins within the enriched functional categories described above by immunoblotting in ethanol and IAA-treated RAD21mAC cells (Fig. 2d). Overall, our results show that SA1ΔCoh PPIs include not only transcriptional and epigenetic regulators, but are also predominantly enriched for proteins with roles in nuclear RNA processing and modification, ribogenesis and translation pathways. Accordingly, this suggests that SA may facilitate an aspect of cohesin regulation at a variety of functionally distinct cellular locations through its association with these diverse proteins.
SA proteins directly bind RNA
Since RNA binding and RNA processing were two of the most enriched categories in the SA1ΔCoh PPI network, we asked whether SA proteins could also bind RNA. We performed CLIP (crosslinking and immunoprecipitation) to determine whether SA proteins directly bind RNA in untreated RAD21mAC cells. We found that both SA1 and SA2 directly bound RNA (Fig.3a). This was evidenced by detection of RNPs of the expected molecular weights, with a smear of trimmed RNA, which was stronger in the +UV and +PNK conditions, which increased as the RNaseI concentration was reduced and which was lost after siRNA-mediated KD (Fig.3a, S3a). We repeated the experiment in IAA-treated RAD21mAC cells to determine if the SA subunits can directly bind RNA in the absence of cohesin. Although RAD21 depletion reduced SA1 and SA2 stability, the amount of RNA crosslinked to the proteins remained proportional to the amount of SA1 and SA2 protein, demonstrating that cohesin is not required for the interaction of these proteins with RNA in cells (Fig.3b, S3b). Thus, cohesin-independent SA proteins interact with a wide array of RNA binding proteins (RBP) as well as with RNA itself.
A variable exon in the C-terminus of SA tunes association with RNA binding proteins
SA1 and SA2 express transcript variants in RAD21mAC cells. One such prominent variant arises from the alternative splicing of a single C-terminal exon, exon 31 in SA1 (SA1e31Δ) and exon 32 in SA2 (SA2e32Δ) (Fig.3c). This has been observed in many cell types, however the significance of this variant is unknown. We re-analysed publicly available RNA-seq datasets for gene expression and alternative splicing. Interestingly, quantification of the splicing profiles using VAST-tools analysis 32 revealed that the frequency of the e31 or e32 splicing events were dramatically different (Fig.3d). The majority of SA1 mRNAs include e31 (average PSI 97.7%), while the majority of SA2 mRNAs exclude e32 (average PSI 20.4%). We confirmed this at the protein level by designing custom esiRNAs to specifically target SA1 e31 or SA2 e32 (Methods). Smartpool (SP) KD reduced the levels of SA1 and SA2 to similar extents compared to scrambled controls (87% and 94% respectively compared to siRNA control) (Fig. 3e, f). Specific targeting of SA1 e31 led to a reduction of 85% of SA1 compared to esiRNA control (Fig.3e), while SA2 e32 targeting had a minimal effect on SA2 protein levels compared to its esiRNA control (reduction of 2%) (Fig.3f), in line with the PSI data.
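The percent spliced in (PSI) values quoted above express exon inclusion as a percentage of all informative transcripts. A simplified Python sketch of the underlying arithmetic follows. VAST-tools applies its own read selection and normalisation, so this is only an illustration of the general definition; the function name and the read counts are hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Simplified PSI: inclusion reads as a percentage of inclusion plus exclusion reads."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("PSI is undefined when no informative reads are present")
    return 100.0 * inclusion_reads / total


# Hypothetical read counts for a mostly included and a mostly skipped exon.
mostly_included = percent_spliced_in(inclusion_reads=90, exclusion_reads=10)
mostly_skipped = percent_spliced_in(inclusion_reads=20, exclusion_reads=80)

print(mostly_included, mostly_skipped)
```

Under this definition, a PSI close to 100 indicates that nearly all transcripts retain the exon, while a low PSI indicates that most transcripts skip it.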
These results imply that cells ‘tune’ the availability of e31/32 domains in SA proteins, prompting us to investigate the nature of these exons to shed light on their potential function. Inspection of the amino acid (aa) sequence of the spliced exons revealed that they encode a highly basic domain within an otherwise acidic C-terminus (Fig.3c). Overall, the SA paralogs are highly homologous, however the N and C termini diverge in their aa sequence. Despite this divergence, e31 and e32 have retained their basic properties with a pI similar to histones (pI=10.4, 9.9 for e31 and e32, respectively) (Fig.3c). Basic domains act as important regulatory cassettes and can bind nucleic acids. Thus, we investigated whether the alternatively spliced basic exon of SA proteins contributes to the differential association of SA with RNA (Fig.3a,b). We cloned cDNAs from HCT116 cells representing full-length SA2 (SA2FL) and the variant lacking e32 (SA2e32Δ), tagged them with YFP and expressed them in HCT cells (Fig.S3c). We used the GFP-TRAP system to specifically purify the YFP-tagged isoforms from cells and compared their ability to interact with RNA. CLIP experiments revealed that the presence of the alternative e32 does not affect the ability of SA2 to interact with RNA (Fig.3g, S3d). However, they did reveal bands which were enriched in the YFP-SA2FL CLIP and not observed in the YFP-SA2e32Δ samples, indicating a role for the alternative exon in enhanced association of SA2 with RNA binding proteins.
SA proteins bind to endogenous R-loops
Regulators of RNA processing, such as splicing, modification and export factors, act as regulators of R-loops 33. In addition, R-loops are found at sites of multiple biological processes including transcription (of both mRNAs and rRNA), DNA replication and DNA repair 33. Given the fact that many of these processes were enriched in the SA1 interactome and our observations that SA proteins can interact with RNA, we reasoned that the diversity of biological processes represented in the SA1ΔCoh PPI network may be reflective of a role for SA proteins in R-loop biology.
To address this, we returned to our IP-MS experiment to analyse enrichment of R-loop-associated proteins in our SA1ΔCoh interactome. We overlapped the proteins identified in two independent IP-MS experiments using the R-loop specific antibody, S9.6 34,35 to create a custom high-confidence ‘R-loop interactome’ and then used a hypergeometric distribution to determine the significance of this category in the SA1ΔCoh interactome (methods). Both the custom R-loop interactome as well as the S9.6 interactomes from Cristini et al., and Wang et al., were highly over-enriched in the SA1ΔCoh interactome (FDR=1.10×10−15, 1.38×10−47, respectively) (Fig.4a). As an independent validation of these observations, we optimised an S9.6 coIP method in RAD21mAC cells (Fig.4b, methods). In agreement with published results, we found that S9.6 precipitated with the known R-loop helicases AQR, DHX9, RNase H2 34,36 as well as MCM3 and RNA Pol II 37. Both SA1 and SA2 precipitated with S9.6 (Fig.4b, S3x), indicating a function at R-loops and supporting the observed enrichment of R-loop proteins in the SA1 interactome.
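The hypergeometric over-representation test mentioned above can be sketched in a few lines of Python. The exact background set and multiple-testing correction are described in the methods and are not reproduced here; the function name, its parameters and the toy numbers are assumptions for illustration. The test asks how likely it is to see at least the observed overlap between two protein lists when one list is drawn at random from a shared background.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail hypergeometric probability P(X >= observed).

    population - size of the background protein set
    successes  - number of background proteins in the reference set (e.g. an R-loop interactome)
    draws      - size of the query set (e.g. an interactor list)
    observed   - number of query proteins that are also in the reference set
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy example (hypothetical numbers): a background of 20 proteins, 5 of them in the
# reference set, and a query list of 5 proteins that all fall in the reference set.
p_value = hypergeom_enrichment_p(population=20, successes=5, draws=5, observed=5)
print(p_value)
```

A small p-value indicates that the overlap is larger than expected by chance. When several categories are tested, the resulting p-values would typically be corrected for multiple comparisons to give FDR values.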
To understand the causal relationship between R-loops and SA proteins, and to determine the specificity of S9.6-SA interactions, we used RNase H1 to selectively degrade the RNA component of RNA:DNA hybrids 38. We were able to achieve a ~30% reduction in R-loops upon treatment of chromatin lysates with RNase H (Fig.4b, S4a). This reduction was proportional to the observed reduction in coIP of SA1 by S9.6 (Fig.4b, S4b). In parallel, we assessed the effect of R-loop degradation on chromatin-bound SA levels in single cells using confocal microscopy. Treatment of RAD21mAC cells with RNase H1 reduces S9.6 staining by >50% of untreated controls (Fig.4c, d). In agreement with the S9.6 coIP results, mean levels of SA1 and SA2 were significantly reduced by 35% and 18.5%, respectively compared to control cells in response to RNase H treatment (Fig. 4c, d). Finally, we also depleted R-loops in vivo by overexpressing ppyCAG-v5-RNaseH1 in cells. IF revealed that nuclear S9.6 levels were significantly reduced in cells which expressed v5 (to 38% of controls) and that mean levels of chromatin-associated SA1 were similarly reduced by 29.4% (p=4.05E-8) (Fig.S4c), further confirming the causal relationship between R-loops and SA proteins.
SA1 proteins act as suppressors of R-loops
Proteins that act to suppress R-loops in vivo, such as AQR 36, have an inverse correlation with S9.6 levels. From our IF results (Fig. 4c, d), we noticed that SA1 had a similar negative relationship with S9.6 (Fig. S4d), prompting us to investigate whether SA proteins could act as suppressors of R-loop formation. To this end, we treated RAD21mAC cells with scramble control siRNAs or siRNA to SA1, SA2 or AQR and used IF to assess the impact on nuclear S9.6 levels in KD cells (Fig.4e, f). As previously reported, AQR KD resulted in a 30.1% increase of mean nuclear S9.6 levels (p=0.0004). Compared to control siRNA-treated cells, mean SA1 levels were reduced by 56.2% (p=4.1E-40), while mean nuclear S9.6 staining was significantly increased in the same cells by 55.3% (p=3.90E-08) (Fig.3f, g). We note that perturbing SA1 levels increased nuclear S9.6 staining to a similar extent as that observed upon KD of AQR, a bona fide R-loop regulator. When we treated cells with the custom siRNA to SA1 e31 (Fig 3e), we also observed an increase in S9.6 signals (Fig S4e), suggesting that this basic exon plays a role in R-loop stability. Surprisingly, despite efficient KD of SA2 (68% reduction), there was no significant change in nuclear S9.6 staining (mean S9.6 reduced by 10% compared to control, p=0.17), indicating that although SA2 is localised to R-loops (Fig S4b), it does not seem to contribute to their regulation. Taken together, our results confirm the presence of SA proteins at endogenous R-loops in vivo and reveal a role for SA1 in R-loop suppression.
SA contributes to cohesin loading independently of NIPBL
Our results thus far support a hypothesis whereby SAΔCoh engages with RNA and various RNA binding proteins at clustered regulatory regions (possibly R-loops) to structurally support them and/or facilitate cohesin’s association with chromatin. Indeed, several lines of evidence suggest that alongside the canonical NIPBL/Mau2 loading complex, SA proteins contribute to cohesin’s association with chromatin. In yeast, interaction of the SA orthologue with the loader complex is required for efficient association of the cohesin ring with DNA and subsequent ATPase activation 13,14. Separating interactions into SA-loader and cohesin ring-loader subcomplexes still impairs cohesin loading, indicating that SA functions as more than just a bridge protein 14. Crystallisation studies reveal a striking similarity of NIPBL and SA, in that both are highly bent, HEAT-repeat proteins 39,40. Indeed, NIPBL and SA1 interact together in an antiparallel arrangement and wrap around DNA and the cohesin ring via similar interactions in their respective ‘U’ surfaces, implying that SA1 has a role in the initial recruitment of cohesin to DNA alongside NIPBL (Shi 2020, Higashi 2020).
The RAD21mAC system has the unique advantage that when IAA is washed off cells, RAD21 proteins are no longer degraded and can be ‘re-loaded’ back onto chromatin (Fig.5a). We coupled this to an siRNA-mediated KD of NIPBL to investigate whether cohesin re-loading onto chromatin is influenced by SA proteins in human cells in native conditions. RAD21mAC cells were treated with scramble or NIPBL siRNAs and subsequently grown in ethanol or IAA. The ‘0h post IAA wash-off’ sample represents the extent of cohesin degradation in the IAA-treated cells. In parallel, IAA was washed out and the cells were left for 4h to recover. This sample, ‘4h post IAA wash-off’, represents the extent of cohesin re-loading in the respective genetic background (Fig.5a, b). We confirmed loss of the loader complex by immunoblot for both NIPBL and MAU2 as it is known that MAU2 is de-stabilised upon NIPBL loss 41. We note that re-loading was not fully restored to the levels observed in ethanol-treated cells and varied between experiments (Fig.5c), which may reflect differences in the initial amounts of RAD21 or NIPBL (see methods). Despite this variation, we observed a consistent effect on RAD21 re-loading across 8 independent experiments. As expected, in NIPBL KD conditions, mean RAD21 re-loading efficiency was reduced to 40.9% of the siRNA controls (mean re-loading siNIPBL, 2.1 vs siCon, 3.6), however this did not represent a statistically significant difference (p=0.33) and accordingly, a large fraction of chromatin-associated RAD21 could still be detected in NIPBL KD cells (Fig. 5c), indicating that cells have a NIPBL-independent cohesin re-loading mechanism.
We performed the same experiment and this time, in addition to treating cells with siRNAs to NIPBL, we also included siRNA to SA1 and SA2 together (siSA), and a siNIPBL+ siSA condition to ask if SA proteins contribute to the observed NIPBL-independent reloading. Across 5 independent experiments, SA KD had a more dramatic effect on cohesin re-loading efficiency than NIPBL KD, reducing RAD21 re-loading on chromatin to 51% of scramble controls (mean re-loading siSA, 1.9 vs siCon, 5.1, p=0.061), (Fig. 5d, e). However, only when SA and NIPBL were both reduced in cells, was there a statistically significant change to cohesin re-loading, reducing RAD21 on chromatin to 64.9% of scramble control cells (mean re-loading siNIPBL+siSA, 1.42 vs siCon, 5.1, p=0.001), indicating that SA performs an important and complementary step to NIPBL and MAU2 during normal reloading (Fig.5d, e).
Finally, given that SA localises to R-loops and these can be found at many places across the genome, we reasoned that SA could use this structural platform to link the loading of cohesin to diverse biological processes. Therefore, we repeated the cohesin re-loading experiments in the presence of siRNAs to AQR, which we had previously shown acts as a suppressor of R-loops (Fig. 4e, f). AQR KD alone had little effect on cohesin re-loading efficiency (Fig. 5f, g), however when R-loops were increased in the context of reduced NIPBL, we observed an increase in the efficiency of cohesin re-loading compared to control cells (Fig. 5f, g). This increase in re-loading efficiency corresponded with a 2.08-fold increase in SA1 levels and a 1.46-fold increase in S9.6 levels, relative to siCon (Fig. S5a), while MAU2 and AQR showed a corresponding fold-change of 0.48 and 0.69, respectively, indicating the specificity of the SA1 and R-loop increase. Our results support a role for R-loops in SA-mediated cohesin loading.
DISCUSSION
Whether SA proteins function in their own right outside of the cohesin complex is rarely considered. Consequently, our understanding of how these proteins contribute to cohesin function and disease is incomplete. In this study, we shed light on this question by uncovering a diverse repertoire of SA interactors in cells acutely depleted for the cohesin trimer. These range from proteins associated with translation and ribogenesis to RNA processing factors and regulators of the epitranscriptome. These observations suggest that SA proteins have a previously unappreciated role in post-transcriptional regulation of gene expression which offers much-needed new insight into its roles in disease and cancer.
Acute depletion of the cohesin ring has allowed us to capture a moment in the normal life cycle of cohesin – DNA associations and unveiled a previously unappreciated step for SA proteins. We show that cohesin-independent SA proteins bind to DNA and RNA, in the context of non-canonical RNA:DNA hybrid structures as we have shown here, or sequentially, and use this platform for the loading of cohesin to chromatin. Our results are supportive of biophysical observations of SA proteins and R-loops 42 and in vitro assessment of cohesin loading at DNA intermediates 24. Structural studies suggest that NIPBL and SA1 together bend DNA and cohesin to guide DNA entering into the cohesin ring 17,18,43. Our work shows that in cells lacking either the canonical NIPBL/MAU2 loader complex or the SA proteins, cohesin can still associate with chromatin, suggesting that loading can occur with either component alone, albeit most effectively together.
Since SA paralogues have distinct terminal ends and nucleic acid targeting mechanisms 22,23, their initial recruitment to chromatin may be specified by unique DNA, RNA or protein-interactions, or indeed all three. Such diversification of loading platforms would be important in large mammalian genomes to ensure sufficient cohesin was chromatin associated or to direct stabilization of particular biological processes for a given cell fate 44. Indeed, SA1 and SA2 show clear differences in interaction with FGF-motif containing proteins, despite the fact that both paralogs contain a CES domain 45, underscoring the importance of in vivo studies and arguing that additional factors play an important role in complex stabilization. In this context, RNA-associated protein interaction has previously been shown to support cohesin stabilisation at CTCF at the IGF2/H19 locus 46. These results are in line with our findings that a basic domain in the unstructured C-terminal portion of SA supports RNA-associated protein interactions and R-loop stability.
This study identifies SA proteins as novel regulators of RNA:DNA hybrid homeostasis. It is noteworthy that other suppressors of R-loop formation include mRNA processing factors, chromatin remodellers and DNA repair proteins 47 which all function in the context of nuclear bodies 48. We find that SA proteins are enriched at very distal chromatin interactions in cohesin-depleted Hi-C data and they interact with numerous RNA binding proteins known to condense in 3D 49,50. Harnessing such condensates would provide an efficient loading platform for cohesin at sites of similar biological function. If SA paralogs direct different localization of cohesin loading or stability of its association, this could have important implications in our understanding of disease and cancer.
AUTHOR CONTRIBUTIONS
H.P. and S.H. conceived the project. H.P. designed and performed all the coIP, mass spectrometry and cohesin re-loading experiments, analysed the ChIP and Hi-C data and performed the statistical analysis for mass spectrometry with the support of A.B. and S.S. Y.L. performed and analysed all imaging experiments (apart from STORM), derived clonal lines of RAD21-mAC cells, cloned YFP-tagged SA2 cDNAs and performed CLIP together with M.T.C. W.V. performed Hi-C, ChIP-seq and splicing analyses. M.V.N., L.M. and M.P.C. performed and analysed STORM imaging. D.P. discovered splicing features of the SA isoforms. H.P. and Y.L. prepared cellular materials for CLIP, which was carried out by M.B., M.T.C., and R.J. A.B. and S.S. performed mass spectrometric and proteomic analysis. H.P., Y.L. and S.H. formatted all figures and wrote the manuscript with input from all authors.
Declaration of Interests
The authors declare no competing interests.
METHODS
Cell culture and IAA-mediated degradation of Rad21
HCT116 cells with engineered RAD21-miniAID-mClover (RAD21mAC), or OsTIR1-only, or both (RAD21mAC-OsTIR) were obtained from Masato T. Kanemaki. The cells were maintained in McCoy’s 5A medium with Glutamax (Thermo Fisher Scientific) supplemented with 10% Heat-inactivated FBS (Gibco), 700μg/ml Geneticin, 100μg/ml Hygromycin B Gold and 100μg/ml Puromycin as described in (Natsume). We clonally selected the RAD21mAC-OsTIR cells by sorting green fluorescence positive single cells on a FACS Aria Fusion cell sorter (BD Bioscience). Single cells were individually seeded into one well of a 96-well plate, expanded for 10 days into 6cm culture dishes and selected with Geneticin, Hygromycin B Gold and Puromycin as indicated above in McCoy’s medium for another 10 days. Each clone was assessed for efficiency of Rad21 degradation using FACS analysis and western blotting (WB) using mClover, mAID and OsTIR antibodies. Two clones (H2 and H11) were taken forward and used throughout this study. To deplete RAD21, RAD21mAC-OsTIR cells were grown in adherent conditions for 3 days and treated with 500 μM Indole-3-acetic acid (IAA, Auxin, diluted in ethanol) for 4 hours. For IAA withdrawal, IAA treated cells were washed with PBS and replaced with fresh supplemented McCoy’s medium for another 4 hours. Cells were washed twice with ice-cold PBS before being harvested for later experimental procedures.
siRNA-mediated knockdowns
For siRNA transfections, RAD21mAC-OsTIR cells were reverse transfected with scramble siRNA (siCon) or siRNAs targeting SA1, SA2, NIPBL, or AQR (Dharmacon, Horizon Discovery). A final concentration of 10 nM of siSA1, siSA2, or siNIPBL or 5 nM of siAQR was reverse transfected into the cells using Lipofectamine RNAiMAX reagent (Invitrogen), as per the manufacturer’s instructions. Cells were plated at a density of 1–1.25 × 10^6 cells per 10 cm dish and harvested 72hrs post-transfection, at a confluency of ~70%. The Lipofectamine-containing media was replaced with fresh media 12-16 hrs post-transfection to avoid toxicity. For Figure 5f/g, incubation time was reduced to 48 hrs. To account for the reduced growth time, cells were plated at a density of 2–3 × 10^6 cells per 10 cm dish. Here siCon- and siNIPBL-transfected cells were plated at a lower cell number than siAQR-transfected cells to ensure equalised confluence (~70%) at the time of collection. When IAA-treatment was combined with siRNA mediated KD, the IAA was added at the end of the normal KD condition so that total KD time was not changed compared to UT cells. For esiRNA treatment, RAD21mAC-OsTIR cells were reverse-transfected with 20 μM FLUC control esiRNA or esiRNA custom designed to SA1 exon31 or SA2 exon32 (MISSION® siRNA, Sigma Aldrich) using RNAiMAX (Invitrogen). Cells were incubated in transfection mixture for 7-8 hours before being replaced with fresh supplemented McCoy’s medium and left for another 40h until harvest. Efficiency of KD was assessed by WB. siRNA information can be found in Table 1.
Immunofluorescence
Cells were adhered onto poly L-lysine coated glass coverslips in 6 well culture dishes and were washed twice with ice-cold PBS before IF procedures. For RAD21-depletion analysis, cells were fixed for 10 mins at room temperature with 3.7% paraformaldehyde (Alfa Aesar) in PBS, washed 3 times with PBS and then permeabilized at room temperature for 10 mins with 0.25% Triton X-100 in PBS (Sigma Aldrich). For R-loop imaging, cells were fixed and permeabilised with ice-cold ultra-pure methanol (Sigma Aldrich) for 10 mins at −20°C. After 3 washes with PBS, cells were blocked for 45 mins at room temperature with 10% FCS-PBS. For RNASEH1 enzyme treatment, cells were incubated with blocking solution supplemented with 1x RNASEH1 reaction buffer alone (50 mM Tris-HCl, 75 mM KCl, 3 mM MgCl2, 10 mM DTT) or 5 units of RNASEH1 enzyme (M0297, New England Biolabs) for 30 mins at 37 °C, PBS-washed twice, before blocking. Cells were washed twice with PBS before incubation with primary antibodies diluted in 5% FCS-PBS at 4 °C overnight. Anti-SA1, anti-SA2 and anti-AQR were used at 1:3000 dilutions; anti-CTCF was used at 1:2500 dilution; anti-s9.6 was used at 1:1000 dilution; anti-V5 was used at 1:1000. After 4 washes with PBS, cells were incubated with secondary antibodies (donkey anti-Goat AF555 or AF647 for SA1/2 used at 1:3000; donkey anti-Rabbit AF647 for CTCF used at 1:2500; donkey anti-Mouse AF555 for s9.6 used at 1:2000; donkey anti-Rabbit AF647 for AQR used at 1:3000; donkey anti-Rabbit AF488 for V5 used at 1:2000) in 5% FCS-PBS for 1 hour at room temperature, and washed 4 times with PBS before being mounted onto glass slides with ProLong™ Diamond Antifade Mountant with DAPI (Thermo Fisher Scientific) and left to stabilise overnight in the dark before imaging. See Table 2 for details of where antibodies were purchased.
Imaging was performed on Zeiss LSM confocal microscopes using 63x/1.40 NA Oil Plan-Apochromat objective lens (Carl Zeiss, Inc.). Images were captured as z-stacks and under consistent digital gain, laser intensity and resolution for each experiment. Numerical analysis was carried out using Imaris software (Oxford Instruments, version 9.5.1) and representative images are shown as maximum z-projected views generated using Fiji (ImageJ). In brief, z-stack images were imported into Imaris, cells were identified using DAPI and only those located 1 μm away from image boundary and sized between 120-800 μm³ were selected. A seed-split function of 7.5 μm was used to separate closely situated cells. Fluorescence intensities of individual DAPI-selections in each channel were determined by Imaris and exported into Excel for further analysis. Distribution plots were generated from >50 cells of each replicate with 3 biological replicates per experiment. Student’s t-test was performed between control and experimental conditions and statistical significance was determined by detecting the difference between means (unequal variance, two-tailed). Significance is denoted as p>0.05 = not significant (ns), p≤0.05 = *, p≤0.01 = **, p≤0.001 = *** and p≤0.0001 = ****.
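The test and significance notation described above can be sketched as follows. This is a minimal illustration and not the authors' analysis, which was done in Excel: the function names are hypothetical, and only the unequal-variance t statistic with its Welch-Satterthwaite degrees of freedom is computed (converting these to a two-tailed p-value requires a t-distribution, which is left to a statistics package).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for an unequal-variance t-test."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def significance_label(p):
    """Map a p-value onto the star notation used in the figures."""
    if p <= 0.0001:
        return "****"
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"
```

For example, `significance_label(0.33)` gives `"ns"` and `significance_label(0.001)` gives `"***"`.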
Chromatin Fractionation and coImmunoprecipitation
Cells were washed twice with ice-cold PBS (Sigma Aldrich) and lysed in Buffer A (10 mM HEPES, 10mM KCl, 1.5 mM MgCl2, 0.34 M Sucrose, 10% Glycerol, 1mM DTT, 1mM PMSF/Pefabloc, protease inhibitor), supplemented with 0.1% Triton X-100, for 10 min on ice. Lysed cells were collected by scraping. Nuclei and cytoplasmic material were separated by centrifugation for 4 min at 1300 g at 4°C. The supernatant was collected as the cytoplasmic fraction and cleared of any insoluble material with further centrifugation for 15 min at 20,000 g at 4°C. The nuclear pellet was washed once with buffer A before lysis in buffer B (3mM EDTA, 0.2mM EGTA, 1mM DTT, 1mM PMSF/Pefabloc, protease inhibitor) with rotation for 30 min at 4°C. Insoluble nuclear material was spun down for 4 min at 1700 g at 4°C and the supernatant taken as nuclear soluble fraction. The insoluble material was washed once with buffer B and then resuspended in high-salt chromatin solubilization buffer (50mM Tris-HCl pH 7.5, 1.5 mM MgCl2, 500mM KCl, 1mM EDTA, 20% Glycerol, 0.1% NP-40, 1mM PMSF/Pefabloc, protease inhibitor). The lysate was vortexed for 2 min to aid solubilization. Nucleic acids were digested with 85U benzonase (Sigma-Aldrich) per 100 × 10^6 cells, with incubation for 10 min at 37°C and 20 min at 4°C. Chromatin was further solubilized with ultra-sonication for 3 × 10 sec at an amplitude of 30. The lysate was diluted to 200 mM KCl and insoluble material was removed by centrifugation at 15,000 RPM for 30 min at 4°C.
For coIP, antibodies were bound to Dynabead Protein A/G beads (ThermoFisher Scientific) for 10 min at room temperature and ~ 5 hr at 4°C. For mock IgG IPs, beads were incubated with serum from the same host type as the antibody of interest. 1mg of chromatin extract was incubated with the antibody-bead conjugate per IP for approximately 16 hr at 4°C. IPs were washed five times with IP buffer (200mM chromatin solubilization buffer) and eluted by boiling in either 2x Laemmli sample buffer (BioRad) or 4x NuPAGE LDS sample buffer (ThermoFisher Scientific). Proteins ≤ 250 kDa were separated by SDS-PAGE electrophoresis using 4–20% Mini-PROTEAN® TGX™ Precast Protein Gels (BioRad) and transferred to Immobilon-P PVDF Membrane (Merck Millipore) for detection. Proteins ≥ 250 kDa were separated by SDS-PAGE electrophoresis using Invitrogen NuPAGE 3-8% Tris-Acetate precast protein gels. Transfer was extended to overnight with low voltage (20V) to aid in transfer of the high-molecular weight proteins. Membranes were incubated in primary antibody solution overnight at 4°C and images were detected using chemiluminescent fluorescence. Densitometry was carried out using ImageStudio Lite software with statistical significance calculated by unpaired t test, unless otherwise specified. Fold enrichment quantifications were performed by first normalising the raw densitometry value to its corresponding Histone H3 quantification and then comparing between the samples indicated. See Table 2 for details of antibodies.
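The fold-enrichment calculation described above (normalise each band to its Histone H3 signal, then compare samples) can be written out as a short sketch. The function names and the example values are hypothetical and are not taken from the study's densitometry data.

```python
def normalise_to_h3(target_signal, h3_signal):
    """Divide a raw densitometry value by its matching Histone H3 value."""
    if h3_signal <= 0:
        raise ValueError("Histone H3 signal must be positive")
    return target_signal / h3_signal


def fold_enrichment(sample, reference):
    """Compare two lanes, each given as a (target_signal, h3_signal) pair."""
    return normalise_to_h3(*sample) / normalise_to_h3(*reference)


# Hypothetical lanes: the sample band is twice as intense at equal H3 loading.
example = fold_enrichment(sample=(200.0, 100.0), reference=(100.0, 100.0))
```

Normalising before taking the ratio means that unequal loading between lanes cancels out, provided the H3 band itself is within the linear range of detection.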
S9.6 IP and Dot Blot
Cells were fractionated and processed for S9.6 IP as described above, with the following modifications. To avoid digestion of RNA:DNA hybrids, samples were not treated with benzonase during chromatin solubilization and sonication was carried out for 10 min (Diagenode Bioruptor) as in 34. Where indicated, chromatin samples were treated with Ribonuclease H enzyme (NEB) overnight at 37°C to digest RNA:DNA hybrids in the extract. To avoid detection of single-stranded RNA by the S9.6 antibody, all S9.6 IP samples were pre-treated with Purelink RNase A (Thermo Fisher Scientific) at 0.25 μg per 1 mg chromatin extract for 1 hr 30 min at 4°C. The reaction was stopped with addition of 143U Invitrogen SUPERase•In RNase Inhibitor (Thermo Fisher Scientific). RNA:DNA hybrid levels were assessed in chromatin samples by dot blot. Specifically, the chromatin lysate was directly wicked onto Amersham Protran nitrocellulose membrane (Merck) by pipetting small volumes above the membrane. Membranes were blocked in 5% (w/v) non-fat dry milk in PBS-0.1% Tween and incubated with S9.6 antibody overnight as for standard western blot. As above, detection was carried out using chemiluminescent fluorescence. RNase A-mediated digestion of RNA:DNA hybrids was performed using a non-ssRNA-specific enzyme (Thermo Scientific) at 1.5 μg per 25 μg chromatin extract at 37°C.
ChIP-sequencing, library preparation and analysis
ChIP lysates were prepared from RAD21mAC cells treated with ethanol or IAA for 4hrs in two biological replicates. Formaldehyde (1%) was added to the culture medium for 10min at room temperature. Fixation was blocked with 0.125M glycine and cells were washed in cold PBS. Nuclear extracts were prepared by douncing (20 strokes, medium pestle) in swelling buffer (25 mM HEPES pH8, 1.5 mM MgCl2, 10mM KCl, 0.1% NP40, 1 mM DTT and protease inhibitors) and centrifuged for 5min at 2000rpm at 4°C. Nuclear pellets were resuspended in Sucrose buffer I (15mM Hepes pH 8, 340 mM Sucrose, 60mM KCl, 2mM EDTA, 0.5 mM EGTA, 0.5% BSA, 0.5 mM DTT and protease inhibitors) and dounced again with 20 strokes. The lysate was carefully laid on top of an equal volume of Sucrose buffer II (15mM Hepes pH 8, 30% Sucrose, 60mM KCl, 2mM EDTA, 0.5 mM EGTA, 0.5 mM DTT and protease inhibitors) and centrifuged for 15min at 4000rpm at 4°C. Nuclei were washed twice to remove cytoplasmic proteins, centrifuged and the pellet was resuspended in Sonication/RIPA buffer (50mM Tris, pH 8.0, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate, 0.1% SDS and protease inhibitors) at a concentration of 5 ×10^6 nuclei in 130μl buffer. This was transferred to a sonication tube (AFA Fiber Pre-Slit Snap-Cap 6×16mm) and sonicated in a Covaris S2 (settings; 4 cycles of 60 seconds, 10% duty cycle, intensity: 5, 200 cycles per burst). Soluble chromatin was in the range of 200 - 400 bp. Triton X100 was added (final concentration 1%) to the sonicated chromatin and moved to a low-retention tube (Eppendorf) before centrifugation at 14,000 rpm for 15min at 4°C and pellets were discarded. 1/100th of the chromatin lysate was retained as the Input sample.
For Immunoprecipitation, 200ug chromatin aliquots/IP were precleared with a slurry of Protein A/G (50:50) (Dynabeads) and incubated for 4hr at 4°C. Meanwhile, washed Protein A/G beads (40ul per IP) were mixed with primary antibodies and incubated for 4hrs at 4°C. The following amounts of antibodies were used: anti-CTCF, 5ug/ChIP; anti-SA1, 15ug/ChIP; anti-SA2, 10ug of the mixed antibody pack/ChIP; anti-Smc3, 5ug/ChIP and anti-IgG, 10ug/ChIP. See Table 2 for information about the antibodies. Washed, pre-bound Protein A/G beads+antibody were mixed with pre-cleared chromatin lysates and incubated overnight with rotation at 4°C. The next day, the supernatant was removed and the beads were washed 9 times with increasing salt concentrations. Protein-DNA crosslinks were reversed in ChIP elution buffer (1% SDS, 5 mM EDTA, 10 mM Tris HCl pH 8) + 2.5 ul of Proteinase K and incubated for 1 hour at 55°C and overnight at 65°C. Samples were phenol–chloroform extracted, resuspended in TE buffer and assessed by qPCR as a quality control. Libraries were prepared from 5-10ng of purified DNA, depending on availability of material, using NEBNext Ultra II DNA Library Prep Kit for Illumina and NEBNext Multiplex Oligos for Illumina (Index Primers Set 2) according to manufacturer’s instructions using 6-8 cycles of PCR. ChIP-seq libraries from one biological set (all ChIP libraries for both ethanol and IAA) were multiplexed and sequenced on the Illumina HiSeq2500 platform, 80bp single-end reads. Each biological set was sequenced on a separate run.
Quality control of reads was performed using FASTQC. Reads were aligned to the hg19 reference genome using Bowtie with 3 mismatches. Only replicate 1 of the SA1 library was used. PCR duplicates were detected and removed using SAMTOOLS. Bam files were imported into MISHA (v 3.5.6) and peaks were identified using a 0.995 percentile. Peaks that overlapped in both replicates were retained. Correlation plots of peaks across the genome from different ChIP libraries were compared with log-transformed percentiles plotted as a smoothed scatter plot. Comparison of peaks at regions of interest were carried out using deepTools (Version 3.1.0-2 REF). For input into deepTools, peak data was converted to bigwig format, with a bin size of 500, using the UCSC bedGraphToBigWig package. The signal matrix was calculated for a window 2,000 bp up- and down-stream of the region of interest, missing data was treated as zero, and all other parameters were as default. Heatmaps were generated within deepTools, with parameters as default.
Hi-C data and contact hotspots analysis
Generating hotspots - Previously published Hi-C datasets derived from RAD21mAC cells treated with ethanol or IAA 8 were analyzed as previously described 51. Custom R scripts were written to identify Hi-C hotspots, i.e. regions of Hi-C maps with high contact frequency. To begin, for each chromosome, all contacts were extracted and subsetted for only high scoring (>=60) contacts between a band of 10e3 – 70e6. Using KNN, for each high scoring contact, the 250 nearest neighbour contacts were identified and subset for only the high-scoring neighbours. This created a list of high scoring neighbours for each high scoring contact, where the first neighbour is the contact itself with a distance of 0. This allowed the neighbour information to be converted into edge information, thereby allowing high score fend contacts to be grouped into cluster hotspots using the R package ‘igraph’. Hotspots that contained fewer than the minimum number of high scoring fends (<100) were removed. The output list of hotspots was represented as 2D intervals which contained high scoring contacts. In total, 5539 hotspots were identified in EtOH and 759 in IAA Hi-C data.
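The grouping logic described above can be illustrated with a small Python sketch. It is not the authors' R code: it uses a brute-force neighbour search and a union-find structure in place of a KNN library and ‘igraph’, the function name and the `(x, y, score)` tuple layout are hypothetical, and it assumes that neighbours are searched among all contacts within the distance band before being filtered for score.

```python
import math


def find_hotspots(contacts, score_min=60, band=(10e3, 70e6), k=250, min_size=100):
    """Group high-scoring contacts into hotspots via shared nearest neighbours.

    contacts: list of (x, y, score) tuples for one chromosome.
    Returns a list of hotspots, each a list of high-scoring contacts.
    """
    # Keep contacts whose genomic separation falls inside the band.
    in_band = [c for c in contacts if band[0] <= abs(c[1] - c[0]) <= band[1]]
    high = [i for i, c in enumerate(in_band) if c[2] >= score_min]
    high_set = set(high)

    # Union-find over the high-scoring contacts.
    parent = {i: i for i in high}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in high:
        xi, yi = in_band[i][0], in_band[i][1]
        # Brute-force k nearest neighbours; the first one is the contact itself.
        nearest = sorted(
            range(len(in_band)),
            key=lambda j: math.hypot(in_band[j][0] - xi, in_band[j][1] - yi),
        )[:k]
        # Each high-scoring neighbour becomes an edge in the cluster graph.
        for j in nearest:
            if j in high_set:
                parent[find(i)] = find(j)

    clusters = {}
    for i in high:
        clusters.setdefault(find(i), []).append(in_band[i])
    # Discard clusters with too few high-scoring members.
    return [members for members in clusters.values() if len(members) >= min_size]
```

The brute-force search scales quadratically with the number of contacts, so this form is only suitable for illustrating the idea on small inputs.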
Creating aggregate plots - To calculate and visualise the contact enrichment at hotspots in the EtOH and IAA Hi-C, we used the R package ‘shaman’. Firstly, we used the function ‘shaman_generate_feature_grid’ to calculate the enrichment profile at EtOH and IAA hotspots. Using the weighted centre for each hotspot, represented as a 2D interval, we used the function to build grids for the EtOH and IAA hotspots in the Hi-C data at 3 specific bands, 100k – 1MB, 1MB – 5MB, 5MB – 10MB. A range of 250kb was visualised around the weighted centre. The grid was built by taking all combinations of interval1 and interval2 of the EtOH and IAA hotspot centres, with each combination termed a ‘window’. Hotspots were not filtered for size or shape. A score threshold of 60 was used to focus on enriched pairs; windows that did not contain at least one point with a score of 60 were discarded. Each window was then split into 1000nt bins and the windows were summed together to generate a grid containing the observed and expected contacts. We visualised the grid using ‘shaman_plot_feature_grid’ using ‘enrichment’ mode and a plot_resolution value of 6000, due to the large range being visualised.
STORM – Immunolabelling and imaging
Two clones of RAD21mAC-OsTIR cells were seeded at a density of 30,000 cells per well (in 400 μl) onto poly-L-lysine coated 8-well chamber slides (Lab-Tek™ 155411) overnight. Each clone was treated with either ethanol (EtOH) or Auxin (IAA) for 4hr and then fixed with PFA 4% (Alfa Aesar) for 10 min at room temperature and rinsed with PBS three times for 5 min each. The cells were shipped to the Cosma Lab after fixation for STORM processing and imaging. Cells were permeabilized with 0.3% Triton X-100 in PBS and blocked in blocking buffer (10% BSA – 0.01 % Triton X-100 in PBS) for one hour at room temperature. Cells were incubated with primary antibodies (see Table 2) in blocking buffer at 1:50 dilution. Cells were washed three times for 5 min each with wash buffer (2% BSA – 0.01 % Triton X-100 in PBS) and incubated in secondary antibody. For STORM imaging, home-made (Bates et al., 2007) dye pair labeled secondary antibodies were added at a 1:50 dilution in blocking buffer and were incubated for 45 min at room temperature. Cells were washed three times for 5 min each with wash buffer.
STORM imaging was performed on an N-STORM 4.0 microscope (Nikon) equipped with a CFI HP Apochromat TIRF 100× 1.49 oil objective and an iXon Ultra 897 camera (Andor) and using Highly Inclined and Laminated Optical sheet illumination (HILO). Dual color STORM imaging was performed with a double activator and single reporter strategy by combining AF405_AF647 anti-Goat secondary with Cy3_AF647 anti-Rabbit secondary antibodies. Sequential imaging acquisition was performed (1 frame of 405 nm activation followed by 3 frames of 647 nm reporter and 1 frame of 560 nm activation followed by 3 frames of 647 nm reporter) with 10 ms exposure time for 120000 frames. 647 nm laser was used at constant ~2 kW/cm² power density and 405 nm and 560 nm laser powers were gradually increased over the imaging. Imaging buffer composition for STORM imaging was 100 mM Cysteamine MEA (Sigma-Aldrich, #30070) - 5% Glucose (Sigma-Aldrich, #G8270) – 1% Glox Solution (0.5 mg/ml glucose oxidase, 40 mg/ml catalase (Sigma-Aldrich, #G2133 and #C100)) in PBS.
STORM imaging analysis and quantifications
STORM images were analyzed and rendered in Insight3 (kind gift of Bo Huang, UCSF) as previously described (Bates et al., 2007; Rust et al., 2006). Localizations were identified based on a threshold and fit to a simple Gaussian to determine the x and y positions. Cluster analysis of CTCF, SA1 and SA2 STORM signal was performed as previously described (Ricci et al., 2015) to obtain cluster size and positions and to measure Nearest Neighbour distributions (NND) between clusters of the same protein in individual nuclei. NND between clusters’ centroids of two different proteins (i.e. CTCF-SA1 and CTCF-SA2) was calculated by knnsearch.m Matlab function and the NND histogram of experimental data was obtained by considering all the NNDs of individual nuclei (histogram bin, from 0 to 500 nm, 5 nm steps). Simulated NNDs recapitulating random spatial distribution of cluster centroids were first obtained for each nucleus separately and then merged to calculate the simulated NND histogram (histogram bin, from 0 to 500 nm, 5 nm steps). The difference plot reports the difference between experimental NND and simulated NND. Quantification and analysis of STORM images was performed in Matlab and statistical analysis was performed in Graphpad Prism (v7.0e). The type of statistical test is specified in each case. Statistical significance is represented as indicated above.
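The nearest-neighbour comparison between two proteins described above can be sketched in Python. This is an illustration only: the original analysis used the Matlab function knnsearch.m, the function names here are hypothetical, and normalising each histogram to fractions before subtraction is an assumption that the text does not state.

```python
import math


def nearest_neighbour_distances(centroids_a, centroids_b):
    """For each cluster centroid in A, distance (nm) to the closest centroid in B."""
    return [min(math.dist(a, b) for b in centroids_b) for a in centroids_a]


def nnd_histogram(distances, max_nm=500, step_nm=5):
    """Histogram from 0 to max_nm in step_nm bins, returned as fractions."""
    n_bins = max_nm // step_nm
    counts = [0] * n_bins
    for d in distances:
        idx = int(d // step_nm)
        if 0 <= idx < n_bins:  # distances beyond max_nm are ignored
            counts[idx] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def difference_plot(experimental_hist, simulated_hist):
    """Bin-wise difference between experimental and simulated NND histograms."""
    return [e - s for e, s in zip(experimental_hist, simulated_hist)]
```

Positive values in the difference at short distances would indicate that the two proteins sit closer together than expected from a random spatial distribution of centroids.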
Insight3 Software used for STORM image processing has been generated (Huang et al., 2008) and kindly provided by Dr Bo Huang (UCSF). Graphpad Prism software used for statistical analysis can be found at: https://www.graphpad.com/scientific-software/prism/ MatLab software used for imaging data analysis can be found at: https://www.mathworks.com/products/matlab.html
Mass spectrometry sample preparation and running
SA1 immunoprecipitation samples were analysed by liquid chromatography–tandem mass spectrometry (LC-MS/MS). Three biological replicate experiments were carried out for MS and each included four samples, untreated (UT), treated with IAA for 4hrs, siCon, or siSA1, generated as described above. A fourth technical replicate was also included for the UTR samples. Cells were fractionated to purify chromatin-bound proteins as above and immunoprecipitated with IgG- or SA1-bead conjugates. To maximise IP material for the MS, the antibody amount was increased to 15ug and the chromatin amount was increased to 2mg.
The IP eluates were loaded into a pre-cast SDS-PAGE gel (4–20% Mini-PROTEAN® TGX™ Precast Protein Gel, 10-well, 50 μL) and proteins were run approximately 1 cm to prevent protein separation. Protein bands were excised and diced, and proteins were reduced with 5 mM TCEP in 50 mM triethylammonium bicarbonate (TEAB) at 37°C for 20 min, alkylated with 10 mM 2-chloroacetamide in 50 mM TEAB at ambient temperature for 20 min in the dark. Proteins were then digested with 150ng trypsin, at 37°C for 3 h followed by a second trypsin addition for 4 h, then overnight at room temperature. After digestion, peptides were extracted with acetonitrile and 50 mM TEAB washes. Samples were evaporated to dryness at 30°C and resolubilised in 0.1% formic acid.
nLC-MS/MS was performed on a Q Exactive Orbitrap Plus interfaced to a NANOSPRAY FLEX ion source and coupled to an Easy-nLC 1200 (Thermo Scientific). 25% (first, second and fourth biological replicate) or 50% (third biological replicate) of each sample was loaded as 5 or 10 μL injections. Peptides were separated on a 27cm fused silica emitter, 75 μm diameter, packed in-house with Reprosil-Pur 200 C18-AQ, 2.4 μm resin (Dr. Maisch) using a linear gradient from 5% to 30% acetonitrile/ 0.1% formic acid over 60 min, at a flow rate of 250 nL/min. Peptides were ionised by electrospray ionisation using 1.8 kV applied immediately prior to the analytical column via a microtee built into the nanospray source with the ion transfer tube heated to 320°C and the S-lens set to 60%. Precursor ions were measured in a data-dependent mode in the orbitrap analyser at a resolution of 70,000 and a target value of 3e6 ions. The ten most intense ions from each MS1 scan were isolated, fragmented in the HCD cell, and measured in the orbitrap at a resolution of 17,500.
Mass spectrometry analysis
Raw data were analysed with MaxQuant 52 version 1.5.5.1 and searched against the human UniProtKB database using default settings (http://www.uniprot.org/). Carbamidomethylation of cysteines was set as fixed modification, and oxidation of methionines and acetylation at protein N-termini were set as variable modifications. Enzyme specificity was set to trypsin with maximally 2 missed cleavages allowed. To ensure high confidence identifications, PSMs, peptides, and proteins were filtered at a less than 1% false discovery rate (FDR). Label-free quantification in MaxQuant was used with LFQ minimum ratio count set to 2 with ‘FastLFQ’ (LFQ minimum number of neighbours = 3, and LFQ average number of neighbours = 6) and ‘Skip normalisation’ selected. In Advanced identifications, ‘Second peptides’ was selected and the ‘match between runs’ feature was not selected. Statistical protein quantification analysis was done in MSstats 53 (version 3.14.0) run through RStudio. Contaminants and reverse sequences were removed and data was log2 transformed. To find differentially abundant proteins across conditions, paired significance analysis was performed, consisting of fitting a statistical model and performing model-based comparison of conditions. The group comparison function was employed to test for differential abundance between conditions. Unadjusted p-values were used to rank the testing results and to define regulated proteins between groups.
Proteins with peptides discovered in the IgG samples were disregarded from downstream analyses. Significantly depleted/enriched proteins were considered with an absolute log2foldchange > 0.58 (1.5-fold change) and a p-value < 0.1. SA1 interactome analysis was performed in STRING. The network was generated as a full STRING network with a minimum interaction score of 0.7 required. Over-enrichment of GO biological process and molecular function terms was calculated with the human genome as background. Network analysis of the SA1 interactome in IAA-treated samples was generated from the significantly depleted/enriched proteins, with a minimum interaction score of 0.4 required. Two conditions for functional enrichments were considered; i) enrichment was calculated with the human genome as background to determine the full SA1 interactome in the absence of cohesin, compared to the genome, and ii) enrichment was calculated with the untreated SA1 interactome as background, to determine the statistical effect of cohesin loss on the SA1 interactome itself. The network developed in i) was manually rearranged in Cytoscape for visual clarity, enriched categories were visualized using the STRING pie chart function and half of the proteins within each category were subset from the network based on p-value change between UTR and IAA samples.
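The cut-offs stated above can be expressed as a simple filter. The sketch below is illustrative only: the function name is hypothetical, and it operates on a single log2 fold change and unadjusted p-value rather than on MSstats output.

```python
def classify_protein(log2_fold_change, p_value, lfc_cutoff=0.58, p_cutoff=0.1):
    """Label a protein as enriched, depleted or unchanged using the stated cut-offs."""
    # Both criteria are strict: |log2FC| > 0.58 and p < 0.1.
    if p_value >= p_cutoff or abs(log2_fold_change) <= lfc_cutoff:
        return "unchanged"
    return "enriched" if log2_fold_change > 0 else "depleted"


# A log2 fold change of 0.58 corresponds to roughly a 1.5-fold change.
fold_at_cutoff = 2 ** 0.58
```

For instance, `classify_protein(-0.7, 0.09)` is labelled `"depleted"`, whereas `classify_protein(2.0, 0.1)` is `"unchanged"` because the p-value does not fall below the cut-off.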
Over-enrichment of the s9.6 interactome was calculated separately using the hypergeometric distribution for comparison with 34,35. Significance was calculated using the dhyper function in R and multiple testing was corrected for using the p.adjust Benjamini & Hochberg method. To compare with a minimal background protein list, http://www.humanproteomemap.org was analysed on the Expression Atlas database to determine a list of proteins expressed in one or more of three tissue types corresponding to the cell types used across the different studies.
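The two statistical steps named above have simple closed forms, sketched here in Python as an illustration of what R's `dhyper` and `p.adjust(method = "BH")` compute. The `upper_tail` helper is an assumption about how point probabilities might be summed into an over-enrichment p-value; the text does not specify this step.

```python
from math import comb


def dhyper(x, m, n, k):
    """P(X = x) when drawing k items from m 'hits' and n 'non-hits' (R argument order)."""
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)


def upper_tail(x, m, n, k):
    """P(X >= x): probability of an overlap at least as large as observed (assumed usage)."""
    return sum(dhyper(i, m, n, k) for i in range(x, min(m, k) + 1))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Here `m` would be the number of background proteins in the category of interest, `n` the remaining background proteins, and `k` the size of the interactome being tested.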
SLiMSearch analysis
The SLiMSearch tool http://slim.icr.ac.uk/slimsearch/, with default parameters was used to search the human proteome for additional proteins that contained the FGF-like motif determined in 16 to predict binding to SA proteins. The motif was input as [PFCAVIYL][FY][GDEN]F.{0,1}[DANE].{0,1}[DE]. Along with CTCF, four proteins found to contain the FGF-like motif, CHD6, MCM3, HNRNPUL2 and ESYT2 were validated for interaction with SA.
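The motif pattern above is written in regular-expression syntax, so its matching behaviour can be checked directly. The sketch below uses Python's `re` module rather than SLiMSearch itself; the function name and the test peptide are hypothetical and do not correspond to a real protein sequence.

```python
import re

# FGF-like motif exactly as entered into SLiMSearch.
FGF_LIKE_MOTIF = re.compile(r"[PFCAVIYL][FY][GDEN]F.{0,1}[DANE].{0,1}[DE]")


def find_motif_hits(sequence):
    """Return (start index, matched text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in FGF_LIKE_MOTIF.finditer(sequence)]


# Hypothetical peptide containing one motif instance.
hits = find_motif_hits("MKPYDFDEQ")
```

Because `.{0,1}` allows an optional spacer residue at two positions, matches range from six to eight residues in length. Note that `finditer` reports non-overlapping matches only, which may differ from how SLiMSearch enumerates overlapping instances.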
CLIP and iCLIP
Crosslinking immunoprecipitation (CLIP) was performed as previously described 54. Briefly, mESC or HCT116 cells were irradiated with 0.2 J/cm² of 254 nm UV light in a Stratalinker 2400 (Stratagene). Cells were lysed in 1 ml of lysis buffer with Complete protease inhibitor (Roche). Lysates were passed through a 27 G needle; 1.6 U DNase Turbo (Thermofisher) per 10⁶ cells and 0.8 U (low) or 8 U (high) RNase I (Ambion) per 10⁶ cells were added, and lysates were incubated in a thermomixer at 37°C and 1100 rpm for 3 minutes. Lysates were then cleared by centrifugation and using a Proteus clarification spin column, according to the manufacturer’s instructions. Endogenous SA1 and SA2 were immunoprecipitated with 10 μg SA1 and SA2 antibodies or non-specific IgG control (Sigma) conjugated to protein G dynabeads (Dynal) for 4 hrs at 4°C. Tagged SA2 proteins were immunoprecipitated from HCT116 cells 40 hours after transfection with 30 μl GFP-Trap beads. IPs were washed three times with high salt buffer (containing 1 M NaCl and 1 M urea) and once with PNK buffer, and RNA was labelled with 8 μl radioactive 32P-gamma-ATP (Hartmann Analytic) for 5 mins at 37°C. For RNaseH1 treatment, YFP-SA2 samples were split and treated with either PNK buffer alone or PNK buffer containing 50 U RNaseH1 for 15 mins at 37°C. RNPs were eluted in LDS loading buffer (Invitrogen), resolved on a 4-12% gradient NuPAGE Bis-Tris gel (Invitrogen) and transferred onto a nitrocellulose membrane with a 0.2 μm pore size. After blocking with PBST+milk, membranes were washed and exposed overnight to a phosphorimager screen (Fuji), and RNA-32P was visualized using a Typhoon phosphorimager (GE) and ImageQuant TL (GE). Membranes were then immunoblotted for SA1, SA2, and RAD21 and visualized using an ImageQuantLAS 4000 imager (GE). See Table 2 for details on antibodies.
GFP-TRAP + Cloning of STAG2 s/l and YFP constructs
SA2 cDNAs were cloned directly from HCT116 cells by PCR using KAPA HiFi HotStart PCR kit (Roche) (Fwd: ATGATAGCAGCTCCAGAAAACCAACTG; Rev: TTAAAACATTGACACTCCAAGAACTGATTCATCC). Two major isoforms were detected: SA2Δex32, in which exon 32 has been spliced out, and SA2+ex32, in which exon 32 has been spliced in. Both SA2 cDNAs were cloned into pENTR/D vector (Invitrogen) and then into an N-terminal YFP-tagged Gateway cloning vector (a kind gift from Endre Kiss-Toth, University of Sheffield). Sequences were confirmed by restriction enzyme digestion and Sanger sequencing. Recombinant YFP-SA2Δex32 or YFP-SA2+ex32 was transfected into adherent HCT116 cells for 40 hours before harvesting. Cells were lysed, fractionated and sonicated following the chromatin fractionation protocol, except that chromatin was solubilised in NaCl IP buffer (50mM Tris-HCl pH 7.5, 150mM NaCl, 1mM EDTA, 0.1% NP-40, 20% Glycerol, 1mM DTT). 1mg chromatin lysate was pre-cleared with a 50:50 mixture of protein A/G magnetic beads, and GFP-Trap (Chromotek, gtd-20) was pre-blocked with 1mg/mL ultra-pure BSA (AM2616, Invitrogen) for 2h at 4°C. After blocking, GFP-Trap was washed twice with NaCl IP buffer and added to pre-cleared lysates to immunoprecipitate proteins for 1h at 4°C. Samples were washed in NaCl IP buffer and eluted in 2x Laemmli buffer (Bio-Rad). Proteins were separated by SDS-PAGE on a 4-20% gradient mini-PROTEAN® Precast Gel (Bio-Rad) and transferred onto PVDF membrane for visualization.
VAST-TOOLS
VAST-TOOLS was used to generate Percent Spliced In (PSI) scores, a statistic that represents how often a particular exon is spliced into a transcript, using the ratio between reads that include and reads that exclude the exon. Paired-end RNA-seq datasets were submitted to VAST-TOOLS (v2.1.3) using the Mmu genome (Tapial J et al, Gen Res 2017). Briefly, reads are split into 50nt words with a 25nt sliding window. The 50nt words are aligned to a reference genome using Bowtie to obtain unmapped reads. These unmapped reads are then aligned to a set of predefined exon-exon junction (EEJ) libraries, allowing for the quantification of alternative exon events. The output was further interrogated using a script which searches all hypothetical EEJ combinations between potential donors and acceptors within Stag1. PSI scores could be obtained provided there was at least a single read within the RNA-seq data that supported the event, although we only considered events supported by a minimum of 50 reads. Calculated PSI values for each alternatively spliced exon (shown in Fig 3d), as well as the average PSI reported in the text, are shown below. See Table 3 for names of published datasets used in this analysis.
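The two operations described above, splitting reads into overlapping words and computing a PSI from inclusion and exclusion reads, can be sketched in Python as follows. This is a simplified illustration and not the VAST-TOOLS implementation, whose junction-level calculation is more involved. The function names are hypothetical, and applying the 50-read minimum to the sum of inclusion and exclusion reads is an assumption.

```python
def split_into_words(read, word_len=50, step=25):
    """Split a read into fixed-length words using a sliding window.

    Defaults follow the text (50nt words, 25nt step). Reads shorter than
    one word yield no words.
    """
    last_start = len(read) - word_len
    return [read[i:i + word_len] for i in range(0, last_start + 1, step)]


def percent_spliced_in(inclusion_reads, exclusion_reads, min_reads=50):
    """PSI as the percentage of supporting reads that include the exon.

    Returns None when the event has fewer than `min_reads` supporting
    reads in total (assumed interpretation of the read threshold).
    """
    total = inclusion_reads + exclusion_reads
    if total < min_reads:
        return None
    return 100.0 * inclusion_reads / total


# A 100nt toy read gives words starting at positions 0, 25 and 50.
words = split_into_words("ACGT" * 25)

# 75 inclusion and 25 exclusion reads: three quarters of reads include
# the exon.
psi = percent_spliced_in(75, 25)
```

A PSI near 100 indicates that the exon is almost always included, and a value near 0 indicates that it is almost always skipped.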
ACKNOWLEDGMENTS
This work would not be possible without the support of a Senior Research Fellowship from the Wellcome Trust awarded to S.H. (106985/Z/15/Z) and a CRUK PhD studentship awarded to H.P. The Proteomics work was supported by the CRUK–UCL Centre Award [C416/A25145]. We thank Stanimir Dulev for his contributions at the early stages of the project. We would like to thank Konstantina Skourti-Stathaki for advice about S9.6 IFs and R-loops. We are grateful to the members of the Hadjur lab for critical discussions and reading of the manuscript.