Abstract
Background Ubiquitously expressed CTCF is involved in numerous cellular functions, some of which stem from its role in organizing chromatin into TAD structures and others from its role as a transcription factor. In contrast, its paralog, CTCFL, is normally only present in testis. However, it is also aberrantly expressed in many cancers. While it is known that shared and unique zinc finger sequences in CTCF and CTCFL enable CTCFL to bind competitively to a subset of CTCF binding sites as well as its own unique locations, the impact of CTCFL on chromosome organization and gene expression has not been comprehensively analyzed in the context of CTCF function. Using an inducible complementation system, we analyzed the impact of expressing CTCFL and CTCF-CTCFL chimeric proteins in the presence or absence of endogenous CTCF to clarify the relative and combined contribution of CTCF and CTCFL to chromosome organization and transcription.
Results We demonstrate that the N terminus of CTCF interacts with cohesin, which explains the requirement for convergent CTCF binding sites in loop formation. By analyzing CTCF and CTCFL binding in tandem we identified phenotypically distinct sites with respect to motifs, targeting to promoter, intronic or intergenic regions, and chromatin folding. Finally, we reveal that the N and C terminal domains and the zinc finger domain play unique roles in targeting each paralog to distinct binding sites, to regulate transcription, chromatin looping and insulation.
Conclusion This study clarifies the unique and combined contribution of CTCF and CTCFL to chromosome organization and transcription, with direct implications for understanding how their co-expression deregulates transcription in cancer.
Introduction
CTCF is involved in numerous cellular functions, some of which can be attributed to its role in organizing chromatin into TAD structures. The latter involves a loop-extrusion mechanism whereby cohesin rings create loops by actively extruding DNA until the complex finds two CTCFs bound in convergent orientation, which block its movement [1–5]. While it is known that convergently orientated CTCF binding sites preferentially form loops while divergent sites delineate boundary regions [6], it is not clear why convergently, rather than divergently, orientated CTCF sites can stop the movement of cohesin on chromatin. CTCF can also act as a transcription factor (TF) controlling the expression of many genes by binding to their TSSs [7]. In addition, CTCF can pause transcription. Thus, it is clear that not all CTCF sites are created equal and there are site-specific functional distinctions, but it is not known whether these can be attributed to differences in binding site motifs and/or the action of cofactors that bind CTCF.
CTCFL (CTCF like), otherwise known as BORIS (Brother of the Regulator of Imprinted Sites), is the paralog of CTCF [8]. It emerged by gene duplication of CTCF during evolution in the ancestry of amniotes [9]. In contrast to CTCF, which is a constitutively and ubiquitously expressed essential protein, CTCFL is expressed only transiently in pre-meiotic male germ cells of healthy individuals together with CTCF [10]. It plays a unique role in spermatogenesis by regulating expression of pluripotency and testis specific genes [10–12]. It is also aberrantly activated in cancers of several lineages including lung [13–15], breast [16, 17], uterine [18], esophageal [19], hepatocellular [20], ovarian [21–24], prostate [25], urogenital [26] and neuroblastoma [27]. CTCFL has been shown to promote neoplastic transformations by its interference in cellular processes including invasion and apoptosis, cell proliferation and immortalization [21, 22, 27–29]. Furthermore, CTCFL was identified as one of the most promising cancer testis antigens by the NCI [30] and it is known to be important in activating the expression of numerous other cancer testis antigens.
CTCF is bound to chromatin through a subset of its eleven zinc fingers (ZFs). The core ZFs 3-7 make sequence specific contacts with DNA and it is thought that ZFs 8 and 9 provide stability [31, 32]. Together, the 11 zinc fingers of CTCF contribute to its multivalent nature and ability to bind to about 50,000 sites across the genome [33]. The DNA binding ZF region of CTCF and CTCFL share 74% sequence identity [9], however, the N- and C-terminal domains are quite distinct and likely interact with different binding partners that contribute to their unique functions [34]. CTCFL has the ability to bind to and compete with CTCF at a subset of its binding sites, owing to the similarity in the DNA binding region [10, 35]. Although differences in the two proteins can lead to divergent and antagonistic effector functions [10], little is known about the mechanisms underlying these different outcomes.
In this context, it is not clear how the N/C terminals and zinc finger domains contribute to CTCF’s site-specific roles and which regions of the protein are involved in interacting with cohesin. There is contradictory evidence supporting and disputing a role for the C terminal region in mediating CTCF-cohesin interaction, respectively from the Felsenfeld and Reinberg labs [36, 37]. Furthermore, the issue of which region of CTCF halts cohesin’s movement on chromatin remains an unsolved problem as co-immunoprecipitation or ChIP-seq analysis of mutants lacking these or other domains has not been published. It is also not known which part/s of CTCFL are important for its role in gene regulation and whether the individual domains have distinct functional impacts at different binding sites. Like CTCF, CTCFL can act as a transcription factor (TF), but given that its binding does not overlap with cohesin [10, 35], it is unlikely to be able to phenocopy CTCF’s function in acting as an insulator at boundary sites, but this has not been analyzed. Pertinent to our investigations is the finding that CTCFL can bind competitively to a subset of sites that CTCF binds [10, 35] and because of the likely differences in the insulating capability of the two proteins, eviction of CTCF at these sites could have an impact on chromosome architecture linked to changes in gene regulation, but this has not been examined.
Investigating the impact of CTCFL overexpression in cancer cells is difficult because of the confounding effects of other genetic and epigenetic alterations. To circumvent these issues, we combined use of a CTCF degron system (which acutely and reversibly depletes endogenous CTCF) [7], with knocked-in doxycycline inducible transgenes encoding intact CTCFL and CTCF-CTCFL fusion proteins at the Tigre locus. This dual system allowed us to elucidate the interplay of CTCFL and CTCF by analyzing the functional impact of each protein in cells where they were expressed individually or together. Transgenic fusion proteins in which different domains of CTCF and CTCFL were exchanged, further enabled us to delineate the contributions of the zinc fingers and N/C terminal regions to their binding and site-specific functions.
Here we highlight an interesting aspect of functional importance: not all CTCF and CTCFL binding sites are created equal. CTCF and CTCFL each bind to a set of unique and overlapping sites that have distinct DNA motifs, domain boundary insulation scores and biases for being in promoters rather than intronic or intergenic regions. Consistent with its role as a transcription factor, we find that CTCFL is more likely to bind promoters than CTCF and when CTCF is found at promoter sites, it predominantly binds locations where CTCF and CTCFL overlap and a subset of these promoters are associated with genes that both CTCF and CTCFL regulate. This indicates that CTCF + CTCFL overlapping binding sites may be functionally distinct from the CTCF only bound sites and emphasizes a common role for CTCF and CTCFL acting as TFs at these overlapping locations. Expressing CTCF-CTCFL chimeric proteins with swapped N and C terminal domains reveals that the zinc finger region of both CTCF and CTCFL defines their respective DNA motif specificity, while the N and C terminal domains influence whether the proteins bind promoters or intergenic and intronic regions. In line with these findings we show that, for both CTCF and CTCFL, transcription is differentially affected by the N/C and zinc fingers domains depending on their genomic context.
While CTCF demarcates TAD boundaries, we find that CTCFL is unable to insulate chromosome domains. Using pull down and immunoprecipitation analyses we demonstrate that CTCFL cannot physically interact with cohesin, which explains why cohesin is absent from CTCFL only sites. CTCFL’s inability to interact with cohesin also explains our finding that CTCFL cannot insulate chromosome domains after CTCF depletion. CTCFL nevertheless plays a role in chromosome architecture by altering DNA looping, independently of CTCF. Finally, we establish that the zinc fingers of CTCF contribute to insulation at domain boundaries, although we find that cohesin only binds to the N terminal region of this protein, providing an explanation for why convergently bound CTCFs are required to stop the movement of cohesin. This study clarifies the relative and combined contribution of CTCF and CTCFL to chromosome organization and transcription, with direct implications for understanding how their co-expression deregulates transcription in cancer.
Results
System to investigate the interplay between CTCFL and CTCF in somatic cells
CTCF and CTCFL have similar zinc finger domains but their N and C terminal regions have no homology, as indicated in Fig. 1a. In line with these differences, CTCF and CTCFL have very different expression profiles: CTCF is present in all cell types, while CTCFL is normally only expressed transiently in pre-meiotic male germ cells (Figure S1A). However, CTCFL is aberrantly activated in a wide variety of cancer types [8] and publicly available data from genomic studies demonstrate that, in the context of cancer, CTCFL exhibits a variety of genetic alterations. As of July 2019, 382 of the 10950 (3%) cancer samples profiled in cBioPortal [38] were found to have genetic changes in CTCFL, with amplification occurring most frequently (58%) in these patient samples (Figure S1B-E). Moreover, there is a clear correlation between amplification of CTCFL and its increased expression in several cancer types including ovarian, uterine, cervical, lung squamous, and head and neck cancer (Figure S1E).
Despite the finding that CTCFL is aberrantly expressed in numerous cancers, little is known about its impact on chromatin organization and gene regulation and the mechanism underlying its effector functions and interplay with CTCF. To address these questions, we made use of an auxin-inducible degron (AID) mESC system in which we could study the effects of CTCFL in the presence and absence of CTCF [7]. In this system both endogenous CTCF alleles are tagged with AID as well as eGFP (CTCF-AID-eGFP) (Fig. 1b) and they constitutively express the auxin-activated ubiquitin ligase TIR1 (from Oryza sativa) from the Rosa26 locus. Addition of indole acetic acid (IAA; an analogue of auxin) leads to rapid poly-ubiquitination and proteasomal degradation of the proteins tagged with the AID domain. Degradation after 48 h of auxin treatment in treated and control cells was confirmed by Western blot, fluorescence microscopy and flow cytometry (Fig. 1c-f).
To study the impact of CTCFL expression, we established a rescue system wherein the ESC degron cell line was modified to express either a stable doxycycline-inducible Ctcfl or control wild-type Ctcf transgene (Fig. 1b, c). Individual clones with comparable transgenic expression levels were selected based on Western blot and FACS analysis (Fig. 1d-f). The four conditions used for our analysis are shown in (Fig. 1c and g).
Distinct characteristics of CTCF and CTCFL and their binding sites
While it is known that CTCF and CTCFL bind to both unique and overlapping binding sites [10, 35], it is not known whether these sites have distinct properties and whether binding of the two proteins at different locations leads to distinct effector functions. Furthermore, it is not known whether the presence versus absence of CTCF alters the profile of CTCFL binding and / or its impact on gene expression. Use of the dual CTCF degron system combined with expression of transgenic CTCF or CTCFL provided us with a unique system with which to address these questions. We first performed ChIP-seq (by ChIPmentation) to examine how DNA binding of transgenic CTCF or CTCFL changes in the presence (D) and absence (ID) of endogenous CTCF, using the FLAG tag in our transgenes. We also performed RNA-seq.
RNA-seq and ChIP-seq confirmed that CTCFL expression and binding occur only after doxycycline induction (Fig. 2a). As expected, we found locations where CTCFL bound to unique sites and sites where it overlapped with CTCF binding. Binding of CTCFL was detected at the promoter and an intragenic site in Ctcfl. CTCF was absent at this site but was bound to an intragenic site overlapping CTCFL binding, as well as two other CTCF sites within the Ctcfl gene. ChIP-seq and RNA-seq indicated that binding of CTCFL at promoters of genes including testis specific Stra8 and Prss50 was linked to their activation [10] (Fig. 2a). Binding within exons of some genes such as Gal3st1 (Figure S2D) was correlated with an increase in transcriptional output [10, 39], while binding at promoters of other genes (Rapgef1) (Figure S2D) was not. Thus, CTCFL binding does not always impact gene expression. Other loci (e.g. the Hoxb cluster) exhibited a preference for CTCF rather than CTCFL binding (Figure S2E).
It is known that CTCF and CTCFL target unique and overlapping sites [10, 35] (Fig. 2b, c). To examine this further we performed FLAG ChIP-seq in the presence of endogenous CTCF (CTCFL D condition). We identified 16,809 CTCF-only and 9,132 CTCFL-only binding sites, while both proteins shared 14,256 sites. In total 46% of CTCF bound sites were occupied by CTCFL and 61% of CTCFL bound sites were bound by CTCF (Fig. 2c).
In agreement with previous studies [10, 35, 40–42], we found that CTCF and cohesin have an overlapping binding profile (Fig. 2b). RAD21 was bound only at sites where CTCF normally binds, i.e. CTCF only and CTCF + CTCFL sites, but it failed to bind CTCFL only sites (Fig. 2b). Compared to untreated cells (CTCFL U), induction of CTCFL (CTCFL D) did not lead to a drastic alteration in the RAD21 binding profile. However, when CTCFL was expressed in the absence of CTCF (CTCFL ID), RAD21 binding was globally depleted. These data demonstrate that CTCFL does not have the ability to recruit cohesin. As a result, it is unlikely that CTCFL will have the ability to phenocopy CTCF’s function as an insulator.
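The overlap percentages quoted above follow directly from the three site counts. The short sketch below reproduces that arithmetic; the function name and argument names are illustrative only and are not taken from the study's analysis pipeline.

```python
def overlap_fractions(a_only: int, b_only: int, shared: int) -> tuple[float, float]:
    """Return the fraction of factor A sites and of factor B sites that are shared.

    a_only / b_only are sites bound by one factor alone; shared are sites bound by both.
    """
    total_a = a_only + shared  # all sites bound by factor A
    total_b = b_only + shared  # all sites bound by factor B
    return shared / total_a, shared / total_b


# Site counts reported in the text (FLAG ChIP-seq, CTCFL D condition).
ctcf_frac, ctcfl_frac = overlap_fractions(a_only=16809, b_only=9132, shared=14256)

print(f"CTCF sites also bound by CTCFL: {ctcf_frac:.0%}")
print(f"CTCFL sites also bound by CTCF: {ctcfl_frac:.0%}")
```

With these counts, CTCF occupies 31,065 sites in total and CTCFL 23,388, which gives the 46% and 61% figures.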
Motif analysis reveals that CTCF only sites contained the consensus CTCF motif (JASPAR MA0139.1). CTCFL only sites had less of a requirement for the ‘A’ base in the triplet where ZF7 binds as shown previously [10], as well as an increase for ‘C’ in the triplet that was bound by ZF4 (Fig. 2d). The change in the ZF7 binding region mirrors differences between ZF7 in CTCF and CTCFL (Fig. 2d, 1a). Changes in the dependence of ‘C’ in the triplet where ZF4 binds could either be explained by differences in ZF4 between CTCF and CTCFL and/or by changes in binding of ZF7 affecting upstream binding of ZF4. Overlapping CTCF and CTCFL binding sites had a motif intermediate to that of CTCF and CTCFL only sites.
In order to define the functional significance of the three different binding sites (CTCF only, CTCF + CTCFL overlapping and CTCFL only), we compared their genomic distribution. CTCF only sites had a strong preference for intronic and intergenic regions, while in contrast, CTCFL-only sites were predominantly at promoters (Fig. 2e). Overall, only 22% of CTCF sites were at promoters, versus 38% for CTCFL (Fig. 2e, f). CTCF binding at promoters typically co-occurred with CTCFL (60% of cases), while CTCFL binding at promoters primarily occurred without CTCF (55% of cases).
When CTCFL was expressed in the presence of endogenous CTCF, a total of 986 genes were significantly deregulated (lfc > 1 and fdr < 0.01) (Fig. 2g). Most CTCFL-regulated genes are not controlled by CTCF and vice versa, since there was little overlap between genes deregulated by induction of CTCFL in the absence of endogenous CTCF (CTCFL U versus ID) and those altered by depletion of endogenous CTCF (CTCF U versus I) (Figure S2A-C). Of interest, for genes found in the overlapping subset, 146 out of 219 were regulated by CTCF and/or CTCFL binding at the promoters, 76 of which had overlapping binding sites (Figure S2C). Analysis of CTCF (U versus I) and CTCFL (U versus D) mediated changes in gene expression in the context of promoter binding demonstrated 36.2% and 50.8% of alterations, respectively (Fig. 2f). Because CTCF was bound to many more sites than CTCFL (Fig. 2c) there was an increase in the number of genes deregulated in the CTCFL (U versus ID) cohort compared to the CTCFL (U versus D) subset (Fig. 2g, Figure S2A). It is important to note that without the degron system it would not have been possible to examine the interplay between CTCF and CTCFL.
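As an illustration of the significance filter described above, the sketch below selects deregulated genes from a table of differential expression results. It assumes that "lfc > 1" refers to the absolute log fold change, so that both up- and downregulated genes are retained; the text does not state this explicitly. The `GeneResult` type, the `deregulated` function and the toy gene names are hypothetical.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    lfc: float  # log fold change (induced versus untreated)
    fdr: float  # false discovery rate (adjusted p-value)


def deregulated(results, lfc_cutoff=1.0, fdr_cutoff=0.01):
    """Keep genes passing both cutoffs.

    Assumption: the fold-change cutoff is applied to |lfc|, so that
    upregulated and downregulated genes are both reported.
    """
    return [r for r in results if abs(r.lfc) > lfc_cutoff and r.fdr < fdr_cutoff]


# Toy input with made-up values, for illustration only.
toy = [
    GeneResult("geneA", 2.5, 0.001),   # up, significant
    GeneResult("geneB", -1.8, 0.005),  # down, significant
    GeneResult("geneC", 0.4, 0.0001),  # significant but small effect
    GeneResult("geneD", 3.0, 0.2),     # large effect but not significant
]
hits = deregulated(toy)
print([r.gene for r in hits])
```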
Taken together these data demonstrate that CTCFL has more of a preference for binding promoters than CTCF, which highlights the functional differences of the two factors. Furthermore, when CTCF is located at promoters, we found that binding predominantly occurs at CTCF + CTCFL overlapping sites, suggesting that CTCFL binding sites may be functionally distinct from the CTCF only bound sites.
CTCFL activates cancer testis antigens (CTA) and components of cancer relevant signaling pathways
CTCFL, itself a cancer testis antigen referred to as Cancer/Testis Antigen 27, has an impact on expression of other CTA genes. Indeed, upon induction with Dox, Ctcfl and Dll1 were the most highly upregulated genes (Fig. 2g). DLL1 is a Notch ligand known to play a major role in cancers like breast cancer [43, 44] and squamous neoplasias [45] and it is thought to be a promising therapeutic target [46]. Our ChIP-seq data showed CTCFL binding at the promoter of Dll1 and other CTAs such as TSP50 or Prss50 [47] and those belonging to the MAGE family (MAGE-B4, MAGE-E1, MAGE-F1) (Figure S3A, B). CTCFL binding was also linked to increased expression of the ADAM family of proteins (ADAMTS2, ADAMTS15) (Figure S3C). Use of the degron system allowed us to determine whether there is overlap in the genes that CTCF and CTCFL regulate, e.g. Stra8 (Fig. 2a), or whether control is mutually exclusive, as in the case of the other genes highlighted above (Figure S3A-C).
Previous studies have shown that CTCFL transgenic mice die within a few hours after birth. They exhibit ocular hemorrhaging and unfused eyelids, a phenotype typical of mouse models in which the TGFβ pathway is deregulated. In line with this, RNA-seq analysis of ES cells from the mice revealed upregulation of TGFβ1 [39]. Our ChIP-seq and RNA-seq data demonstrated that CTCFL binds to the promoter of Tgfβ1, leading to its upregulation (Figure S3D). These findings highlight the links between CTCFL and TGFβ1. Additionally, a subset of TGFβ1 target genes (Bhlhe40, Klf10, Gadd45b) were upregulated in our dataset [48–50] (Figure S3E). Furthermore, Stat1, a protein with both tumor suppressor and oncogenic properties [51] was bound and activated by CTCFL (Figure S3D). We also identified upregulation of Cited1, which encodes a Cbp/p300-interacting transactivator 1 that is a cofactor of the p300/CBP-mediated transcription complex [52] (Figure S3F). These data demonstrate that ectopic expression of CTCFL is sufficient to trigger expression of a panel of genes that regulate several signaling pathways important in cancer.
The impact of CTCFL on 3D chromatin organization
We next asked whether CTCFL shares or antagonizes the role of CTCF in chromosome folding, and performed Hi-C (see Figure S4 for quality control (QC) analysis). Consistent with the findings from previous studies, our PCA analysis showed that compartments, which separate active euchromatin (A compartment) from inactive heterochromatin (B compartment), remain largely unchanged when CTCF was degraded (CTCF I) [7]. We also could not detect any changes in compartments after induction of CTCFL, either in the presence (CTCFL D) or absence of endogenous CTCF (CTCFL ID) (Fig. 3a). Transgenic expression of CTCFL (CTCFL ID) did not rescue the highly self-interacting topologically associated domain (TAD) structures that form independently of compartments and that were lost upon CTCF depletion [7] (Fig. 3b). In contrast, control cells that harbor the CTCF transgene (CTCF ID) were able to restore these structures (Fig. 3c). Consistent with the dose dependent effects of CTCF [7] we found that TADs were strengthened by expressing the CTCF transgene in the presence of endogenous CTCF (CTCF D) (Fig. 3c). However, expression of CTCFL in the presence of endogenous CTCF (CTCFL D) did not dramatically alter TAD structure at a global level (Fig. 3b).
Since CTCFL does not bind everywhere that CTCF binds, the global analysis described above does not provide much insight into the impact of CTCFL on chromatin organization. To address this question, we focused on sites where CTCF and CTCFL binding overlaps and performed an aggregate peak analysis (APA) to estimate the strength of the loops at these locations [53]. In this evaluation, the signals from a set of peak pixels are superimposed such that the color intensity corresponds to the strength of the loops. Cells with intact CTCF (CTCF U) had the strongest loops at CTCF+CTCFL sites, which disappeared when CTCF was degraded (CTCF I) and as expected, could be rescued by expression of control transgenic CTCF (CTCF ID). In contrast, CTCFL was unable to rescue CTCF-mediated looping (CTCFL ID). Furthermore, expression of CTCFL in the presence of CTCF, reduced loop strength, indicating that binding of CTCFL at CTCF overlapping sites impairs loop formation (Fig. 3d). This demonstrates that CTCFL does not have the same function as CTCF in chromosome organization, but that ectopic CTCFL expression disrupts CTCF-mediated genome folding.
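The superimposition step behind an aggregate peak analysis can be sketched as follows. This is a minimal illustration on a toy contact matrix, not the implementation of the cited method [53], which may normalize and score the aggregate differently. The function names, the `flank` and `corner` parameters, and the summary ratio (center pixel relative to the lower-left corner of the aggregate) are assumptions made for this example.

```python
def aggregate_peak_analysis(matrix, loops, flank=2):
    """Sum (2*flank+1)-sized square windows of a contact matrix centered on each loop pixel.

    matrix: square list-of-lists of contact counts (bin x bin).
    loops:  iterable of (row_bin, col_bin) loop anchors pairs.
    Loops whose window would run off the matrix are skipped.
    """
    size = 2 * flank + 1
    n = len(matrix)
    agg = [[0.0] * size for _ in range(size)]
    used = 0
    for i, j in loops:
        if i - flank < 0 or j - flank < 0 or i + flank >= n or j + flank >= n:
            continue  # window does not fit inside the matrix
        for di in range(size):
            for dj in range(size):
                agg[di][dj] += matrix[i - flank + di][j - flank + dj]
        used += 1
    return agg, used


def center_to_corner_ratio(agg, corner=1):
    """Ratio of the central pixel to the mean of the lower-left corner block."""
    size = len(agg)
    center = agg[size // 2][size // 2]
    vals = [agg[r][c] for r in range(size - corner, size) for c in range(corner)]
    mean = sum(vals) / len(vals)
    return center / mean if mean else float("inf")


# Toy 12 x 12 matrix: uniform background of 1 with two enriched "loop" pixels.
toy = [[1.0] * 12 for _ in range(12)]
toy[2][7] = 5.0
toy[3][8] = 5.0

agg, used = aggregate_peak_analysis(toy, loops=[(2, 7), (3, 8)], flank=2)
ratio = center_to_corner_ratio(agg)
print(used, ratio)
```

A ratio well above 1 indicates that contacts are focally enriched at the loop pixels relative to their local background, which is the property that the color intensity of an APA plot conveys.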
We next asked if the changes in loop strength that were seen at sites where CTCF and CTCFL binding overlaps had any functional impact on transcriptional output. In untreated cells (U), the region corresponding to the Prkcb gene is involved in two loops, one of which has CTCF-CTCFL overlapping binding sites at both anchors and the other only at one of the anchors. Induction of CTCFL led to the disappearance of the loops as well as concomitant overexpression of Prkcb (Fig. 3e). Upregulation of PRKCβ is of interest because it is a protein implicated in several cancers including lymphoma, glioblastoma, breast, prostate and colorectal cancers [54]. In the same snapshot, downregulation of Zkscan2 (which encodes a zinc finger with KRAB and SCAN domain protein) is linked to loss of a loop that has a CTCF-CTCFL overlapping binding site at an anchor adjacent to the gene (Fig. 3e).
In sum, these analyses demonstrate that CTCFL cannot rescue TAD structure and loop strength that are lost after CTCF depletion. Furthermore, while CTCFL does not have a global impact on TAD structure, it does have an impact on looping at CTCF + CTCFL overlapping sites. Importantly, binding of CTCFL at CTCF + CTCFL overlapping binding sites was linked to differential expression of genes within altered loops. These findings have implications for the role of CTCFL in altering chromatin organization and gene expression in the context of cancer where CTCFL is expressed in the presence of CTCF.
CTCFL does not physically interact with cohesin
There is some controversy about which region of CTCF interacts with cohesin. While one report demonstrates physical interaction between the C terminal region of CTCF (amino acids 575 to 611) and the SA2 subunit of cohesin [37], other studies that deleted these amino acids (577-614) showed that they are dispensable [36, 55, 56]. In agreement with others, we find that RAD21 overlaps with CTCF binding and does not occupy sites bound exclusively by CTCFL (Fig. 2b) [10, 35]. It is thus likely that CTCFL fails to physically interact with cohesin, but this has not been directly demonstrated. To investigate, we performed co-immunoprecipitation experiments with lysates from cells induced to express transgenic CTCFL or control transgenic CTCF, in the presence or absence of endogenous CTCF (CTCF D and ID; CTCFL D and ID). We used an antibody to FLAG to pull down transgenic proteins followed by Western blotting with a RAD21 antibody, to determine whether the two proteins interact with the cohesin complex under the different culture conditions. As shown in Fig. 3f, we find that RAD21 interacts with CTCF but fails to interact with CTCFL. Surprisingly, we were unable to visualize a RAD21 band in cells induced to express CTCFL in the presence of endogenous CTCF (CTCFL D). This suggests that CTCF and CTCFL may not interact with each other in contradiction to findings from a previous study [35].
The role of CTCF and CTCFL zinc fingers and N/C terminal regions in site specific binding
While it is known that the zinc fingers 6 and 7 of CTCF and CTCFL can define site specific selectivity [10, 57], little is known about whether there are other functional contributions made by the zinc fingers or N/C terminal regions of each factor. In order to investigate, we inserted transgenic CTCFL and CTCF with both their N and C terminal domains swapped into the Tigre locus. The fusion proteins (CTCF N terminus - CTCFL zinc fingers - CTCF C-terminus; CTCFL N terminus - CTCF zinc fingers - CTCFL C-terminus) are abbreviated as CLC and LCL where C stands for CTCF and L stands for CTCFL (Fig. 4a). The fusion protein transgenes, along with their FLAG and mRUBY tags, were expressed at the same levels as intact transgenic CTCF and CTCFL, as demonstrated by both flow cytometry (Figure S5A) and western blotting (Fig. 4b). RNA-seq analysis revealed that each transgenic construct expressed the appropriate domains of CTCFL/CTCF as indicated by peaks at the respective exons (Figure S5B).
FLAG ChIP-seq in the presence of endogenous CTCF (D condition) revealed that LCL has a similar binding profile to CTCF, including at CTCF-only sites (Fig. 4c and Figure S5C, D). The CTCF zinc fingers are therefore the main determinant of CTCF binding specificity. In contrast, CLC only bound CTCF+CTCFL sites, and was unable to target CTCFL-only sites. CLC also exhibited overall reduced binding compared to CTCFL (Fig. 4c). This indicates that the N/C domains of CTCFL provide overall stability as well as binding specificity of CTCFL to CTCFL-only sites. We also found that LCL can bind to a subset of CTCFL only sites where CTCF does not bind (Figure S5D), reinforcing the idea that the N/C domains of CTCFL participate in targeting at CTCFL-only sites. Interestingly, clustering analysis of binding at CTCF and CTCFL only sites revealed that CLC is able to bind weakly to some CTCF only sites when CTCF is present but not when it is absent (Figure S5D), indicating that perhaps the N and C terminals of CTCF facilitate interaction between CTCF and CLC and this in turn directs binding of CLC to these sites. In support of this idea, recent studies have shown that an RNA binding region on the C terminal region of CTCF mediates CTCF clustering [56, 58]. Examples of fusion and parent protein binding are shown in the screenshots in Fig. 4d-f. CTCFL and CTCF-CTCFL overlapping binding sites are frequently bound by both CLC and LCL (Fig. 4d, e), but at some locations fusion protein peaks are reduced in size compared to that of CTCFL (e.g. at Ctcfl, Prss50, Gal3st1 loci) (Fig. 4d). Sites bound by CTCF only were preferentially bound by LCL as opposed to CLC (Fig. 4f). Interestingly, RAD21 ChIP-seq reveals that CLC and LCL can enrich cohesin at CTCFL only sites, where it does not normally go (Fig. 4c). This suggests that both the N/C domains of CTCF (present in CLC) and the zinc fingers (present in LCL) participate in how CTCF recruits cohesin.
From the ChIP-seq data we demonstrate that the binding motif for sites where LCL and CLC bind is similar to that of CTCF and CTCFL, respectively (Fig. 4g). These findings indicate that as expected, zinc fingers direct sequence specific binding. In contrast, when we analyzed the genome annotation intervals (UTR, promoters, introns, exons, downstream and distal intergenic regions) of the fusion protein binding sites, we identified a preference for LCL to be at promoters and CLC to be at intergenic and intronic regions. Thus, LCL resembles CTCFL and CLC resembles CTCF in this aspect of their behavior (Fig. 4h). These data reveal that the N and C terminal regions of CTCF and CTCFL contribute functionally to where these factors bind.
Taken together, these data indicate that both the zinc fingers and N and C terminal regions play distinct roles in site directed binding. LCL and CLC resemble CTCF and CTCFL, respectively, in terms of their binding motifs, highlighting the importance of the zinc fingers. The opposite is the case when it comes to the regions they prefer to bind (promoters versus intergenic and intronic regions): CLC and LCL resemble CTCF and CTCFL, respectively, underscoring the functional contributions of the N and C terminal domains.
Gene expression changes of fusion proteins do not phenocopy that of either parent protein
To determine whether the N and C terminal regions of the CTCF and CTCFL proteins influence transcriptional output we performed RNA-seq on cells expressing LCL and CLC in the presence and absence of CTCF. Fewer genes were deregulated upon induction of LCL (265 genes) and CLC (254 genes) (Fig. 5b, c) compared to induction of CTCFL (986 genes) in the presence of CTCF (Fig. 2g). Thus, neither CLC nor LCL can phenocopy the impact of CTCFL, underscoring the functional importance of both the zinc finger and N/C terminal domains of this factor. Induction of CLC and LCL in the absence of CTCF, led to an increase in the number of genes that were up and downregulated in each case (Fig. 5e, f). However, fewer genes were deregulated than in cells where CTCF was depleted alone (Figure S2B) suggesting that both factors were able to perform a partial rescue.
Although the two fusion proteins are largely incapable of phenocopying the impact of the parent proteins on gene expression, there are examples of loci where we see concordant and discordant changes (Figure S6). At Gadd45g, concordant changes in gene expression are mediated by CTCFL and CLC, suggesting that the zinc finger region of CTCFL is important for the regulation of this gene (Figure S6A). Both parent and fusion proteins are bound upstream of Gadd45g, but CTCF removal and induction of LCL have no effect on this gene’s expression status, underscoring the fact that binding does not always equate with proximal changes in gene expression. At the Igf2 locus, induction of LCL and removal of CTCF each lead to its upregulation, indicating that the two factors act discordantly: LCL activates and CTCF represses expression of this gene (Figure S6B). LCL and CTCF bind at an overlapping site upstream of Igf2os, suggesting this may be a direct effect regulated by the zinc fingers of each protein. Interestingly, both CLC and CTCFL appear to activate Igf2, although neither factor binds to the upstream region, suggesting an indirect or long-distance effect that results from binding at a distal site. At Prss50 and Steap1, LCL- and CLC-mediated activation is concordant with the effects of CTCFL. Furthermore, degradation of CTCF downregulates Steap1 expression, indicating that CTCF is also important for its activation. Here both N/C terminal regions and zinc fingers of CTCFL contribute to the regulation of these genes (Figure S6C). The Steap1 locus provides an example where CLC, LCL as well as CTCFL can mediate rescue after CTCF removal. At the Egr1 locus CLC and LCL act independently of CTCF and CTCFL in regulating transcription (Figure S6D). These examples highlight the fact that overlapping changes in fusion protein-mediated deregulated genes are not necessarily concordant with expression changes mediated by either parent protein.
The impact of fusion proteins on chromatin organization
Our finding that CTCFL was unable to rescue the impact of CTCF depletion on TAD structure due to its inability to bind cohesin (Fig. 3) raises the question of whether the zinc fingers or N/C terminal domains contribute to this aspect of CTCF’s function. To determine this, we performed Hi-C (see Figure S7A, A for QC) and asked if either fusion protein (CLC or LCL) could restore chromatin folding (ID condition). At a global level we observed that unlike the control CTCF transgene, transgenic CLC and LCL were unable to restore the alterations in looping mediated by CTCF loss (Fig. 6a). Furthermore, CLC and LCL had little impact on TAD structure when expressed in the presence of endogenous CTCF (D condition) (Figure S7C). Expression of transgenic CTCFL, CLC and LCL in the presence of endogenous CTCF also had no significant effect on TAD size or number. This reveals that the N/C domains of CTCFL are important for the way it interferes with endogenous CTCF-mediated chromosome folding at CTCF+CTCFL sites. As expected, we detected fewer, larger TADs upon CTCF depletion and rescue of these effects was only achieved by expression of transgenic CTCF, while induction of CTCFL or the fusion proteins had no effect on this aspect of chromatin organization (Figure S7D, D).
Given that CTCF and CTCFL bind unique and overlapping binding sites that reflect differences in binding sequence and functional properties (Fig. 2), we next asked whether the unique and overlapping binding sites have different insulation scores at boundary regions in untreated control cells.
Indeed, boundary insulation was highest at CTCF only binding sites and reduced at overlapping sites and even further reduced at CTCFL only sites (Fig. 6b). Furthermore, insulation at CTCF only sites was maximally affected when CTCF was degraded, while CTCF + CTCFL overlapping binding sites were less affected and CTCFL only sites were least altered (Fig. 6c).
Since the interaction between CTCF and cohesin is important for the establishment of TAD structures [7, 59, 60] we used the fusion proteins LCL and CLC to determine whether the zinc fingers and/or the N/C terminal regions were involved. We performed co-immunoprecipitation experiments with lysates from cells harboring chimeric proteins (CLC and LCL) in the presence and absence of CTCF. With this approach we could detect that RAD21 is pulled down with CLC but not LCL in the absence of endogenous CTCF (Fig. 6d). This finding demonstrates that the N/C terminal regions of CTCF are involved in mediating the interaction with cohesin. Interestingly, pulldown of RAD21 was seen with LCL in the presence, but not absence of CTCF, indicating that C and N terminal regions mediate interaction with CTCF (Fig. 6d). In this context it is pertinent to note that CTCF zinc fingers 1 and 10 have RNA binding regions that are important and may cooperate with the C terminal RNA binding region in oligomerization [55, 56].
To determine whether CLC and/or LCL could rescue insulation after depletion of CTCF, we separately analyzed the impact on boundary insulation at CTCF only, CTCF + CTCFL overlapping sites and CTCFL only sites. As shown in Fig. 6e-g, depletion of CTCF (CTCF I) leads to a loss of boundary insulation that can be rescued by expression of transgenic CTCF (CTCF ID) but not CTCFL (CTCFL ID) or CLC (CLC ID). However, LCL was partially able to restore boundary insulation. Similar effects were seen across the three different sites but the scale of the changes was reduced from CTCF only to CTCFL only sites, with the overlapping sites exhibiting an intermediate effect. The difference in insulation score at each site for the various conditions is shown in Figure S7F-H. Partial rescue of boundary insulation by LCL is consistent with the findings from our recent study showing that zinc fingers of CTCF contribute to insulation [55]. However, it is not clear why CLC, which we showed can physically interact with cohesin, has no impact. One explanation for this is that CLC does not bind CTCF only sites, and furthermore has a reduced capacity to bind chromatin overall, especially CTCF + CTCFL overlapping sites and CTCFL only sites compared to CTCF (Fig. 4c). In contrast, LCL, which partially rescues insulation at all binding sites, has the ability to bind at all three sites. Taken together, these data highlight site-specific differences in insulation strength at boundary regions within CTCF only, CTCF + CTCFL overlapping and CTCFL only binding sites. Partial rescue of insulation can be achieved by expressing transgenic LCL, with the scale of changes at each site reflecting the differences in boundary scores at the different locations. We therefore conclude that the N/C domains of CTCF are required but not sufficient for its function as a chromosome organizer.
The N terminus of CTCF interacts with RAD21
Given our finding that CLC can interact with RAD21 (Fig. 3f), we wanted to determine whether the N or the C terminal is responsible for this aspect of CTCF’s function. To investigate, we inserted transgenic CTCFL chimeric proteins into the Tigre locus with either their N or C terminal domains swapped with those of CTCF. The fusion proteins (CTCF N terminus - CTCFL zinc fingers - CTCFL C-terminus; CTCFL N terminus - CTCFL zinc fingers - CTCF C-terminus) were abbreviated as CLL and LLC where C stands for CTCF and L stands for CTCFL (Fig. 7a). The chimeric protein transgenes, along with their FLAG and mRUBY tags, were expressed at the same levels as intact transgenic CTCF and CTCFL, as demonstrated by flow cytometry (Fig. 7b).
FLAG ChIP-seq in the presence of endogenous CTCF (D condition) revealed that CLL has a similar binding profile to CTCFL, although compared to the latter, binding was much reduced. In the case of LLC we detected almost no binding in the Dox condition (Fig. 7c). Binding of both transgenes was increased at CTCF + CTCFL overlapping sites in the absence of CTCF (ID condition) (Fig. 7d), which suggests that the two chimeric proteins are unable to compete effectively with CTCF at these sites.
Moreover, these data indicate that the C terminal region of CTCFL is more important than the N terminal region for CTCFL’s binding. Binding of the single swapped chimeric proteins, CLL and LLC was also reduced compared to the double swapped CLC chimera further suggesting that the C and N terminals of CTCF cooperate with each other and are both important for binding.
Co-immunoprecipitation experiments with lysates from cells harboring chimeric protein (CLL) in the presence and absence of CTCF revealed that RAD21 was pulled down with CLL (Fig. 7e), indicating that it is the N terminal region of CTCF that is involved in mediating the interaction with cohesin. This finding is significant: it explains why convergently orientated binding of CTCF is important for halting the movement of cohesin on chromatin, since the N terminal region of CTCF would be the first point of encounter with cohesin.
Discussion
CTCF plays a key role in organizing chromatin into highly self-interacting topologically associated domain (TAD) structures by promoting the formation of insulating loops and boundaries that are important for gene regulation. It is a ubiquitously expressed factor, in contrast to its paralogue CTCFL, which is normally only transiently present in testis but is frequently aberrantly expressed in numerous cancers due to genetic abnormalities. As a result of shared and unique zinc finger sequences in CTCF and CTCFL, CTCFL can bind competitively to a subset of CTCF binding sites as well as its own unique locations. While this has been known for some time, the impact of CTCFL on chromosome organization and gene expression has not been comprehensively analyzed in the context of CTCF function. Indeed, CTCFL has largely been studied in a cancer setting which has many other confounding genetic aberrations. Here we made use of a complementation system incorporating inducible auxin degradable endogenous CTCF, combined with doxycycline inducible transgenes encoding CTCF, CTCFL or CTCF-CTCFL chimeric proteins. This approach enabled us to analyze the impact of CTCF and CTCFL expression either individually or in concert, and determine the unique functional impact of each factor as well as the interplay between the two. Use of the chimeric CTCF-CTCFL proteins further provided us with a tool to tease out the contribution of the zinc finger and N/C terminal domains to their individual and shared functions.
Our studies demonstrate that CTCF and CTCFL bind to unique and overlapping sites that have distinct properties, highlighting an interesting facet of functional importance: not all CTCF and CTCFL binding sites are created equal. First, CTCF-only binding sites exhibit a preference for intronic and intergenic regions while CTCFL binding is biased towards promoter regions. Specifically, CTCFL is more likely to bind promoter sites than CTCF and when CTCF binds these sites, it predominantly binds locations where CTCF and CTCFL binding overlaps. While there is little overlap in the changes in gene expression mediated by CTCF and CTCFL, of the 219 genes found in the overlapping subset, 146 are regulated by CTCF and/or CTCFL binding to the same promoters and 76 of these are overlapping binding sites. Interestingly, although only 21.52% of CTCF binding events occur at promoter sites, most CTCF-mediated gene expression changes (36.2%) are linked to binding at these locations. This highlights a common role for CTCF and CTCFL in acting as transcription factors at sites where CTCF binding overlaps with that of CTCFL at promoters. At other CTCF sites, namely the predominantly intergenic or intronic only sites, CTCF may bind enhancers or behave more like an insulator, controlling gene expression in a more distal or indirect manner, respectively. Support for the notion that these sites are linked to insulation comes from our finding that CTCF depletion at CTCF only sites results in a more pronounced difference in insulation boundary score than CTCF depletion at CTCF + CTCFL overlapping sites or CTCFL only sites.
What mediates the distinct functional impact at the different binding site subsets? We speculate that it has to do with cofactors binding to the C/N terminal domains. Indeed, use of the chimeric proteins allowed us to demonstrate that it is these domains in CTCF and CTCFL that influence promoter versus intronic and intergenic site bias such that the LCL fusion protein has more of a preference for promoter binding sites compared to CTCF and conversely, CLC has more of a preference for intronic and intergenic regions compared to CTCFL. Given that the N and C terminal regions of CTCF and CTCFL are very different it is highly likely that the cofactors they bind are also different.
As mentioned above, our studies also highlight that the different binding sites, CTCF only, CTCF + CTCFL overlapping and CTCFL only, have different boundary insulation scores. We hypothesized that these differences result from differences in CTCF’s and CTCFL’s relationship with cohesin. In support of this notion, we show that RAD21, a component of the cohesin complex, only binds sites where CTCF is bound and that binding of RAD21 at CTCF + CTCFL overlapping sites is dependent on the presence of CTCF. These distinctions can be explained by our finding that, in contrast to CTCF, CTCFL does not physically interact with cohesin. As a result, CTCFL cannot rescue the TAD structures that are lost through CTCF depletion.
Expression of transgenic CTCF-CTCFL chimeric proteins in the presence and absence of CTCF enabled us to demonstrate that the N terminal region is responsible for CTCF’s interaction with cohesin. This finding is important because it explains why convergently bound CTCF halts cohesin’s movement on chromatin. Surprisingly, we were not able to restore loss of CTCF-mediated boundary insulation by expressing transgenic CLC, but LCL, which cannot interact with cohesin, could partially restore this aspect of CTCF’s function. One explanation for these results can be found through an examination of the binding profiles of the two fusion proteins: while LCL can bind to CTCF as well as CTCFL sites, CLC is unable to bind to CTCF only sites and binding to CTCFL sites was significantly reduced compared to binding of LCL and CTCFL.
It is of note that there are considerable differences between the sequences of ZF1, ZF10 and ZF11 in CTCF versus CTCFL. The zinc fingers of CTCF have two RNA Binding Regions (RBRs). One RBR extends from amino acids 264-275 that stretch from nearly the end of the N terminal domain through ZF1 and the other encompasses amino acids 536-544 in ZF10 [36, 55]. The RBR at ZF1 (KTFQCELCSYTCPR) of CTCF shows a clear difference in sequence from that of CTCFL (GTFHCDVCMFTSSR, differences bolded), while the RBR at ZF10 (QLLDMHFKR) is relatively conserved (QLLNAHFRK). Deletions in both RBRs disrupt DNA binding, gene expression and CTCF-mediated loop formation at a subset of sites, with the mutant ZF10 having less of an impact than mutant ZF1 [55]. In addition, the C terminal 576-611 amino acids that connect the C terminal domain of CTCF with ZF 11 (also an RBR), have been shown to be important for the diffusion, clustering, target search and self-association of CTCF. This RBR region does not physically interact with cohesin, but contributes to the formation of CTCF clusters in an RNA dependent manner and these clusters block extruding cohesin [56, 58]. Additionally, as with the mutant ZF1 and 10 RBRs, deletion of the RBR at the C terminal region results in reduced CTCF binding and loss of a subset of CTCF-mediated loops as well as alteration in gene expression [56]. Together these studies indicate that both the C terminal region and zinc fingers contribute to CTCF binding, CTCF-mediated loop formation and gene expression. These findings are consistent with our results showing that the zinc finger and C/N terminal domains have distinct contributions to binding site preference and regulation of chromosome organization. Also consistent with these two published studies, we found that the zinc finger and C/N terminal domains have an important functional impact on the transcriptional changes mediated by CTCF and CTCFL. 
This is highlighted by our finding that overall both CLC and LCL were ineffectual at mediating changes in gene expression in comparison to intact CTCF and CTCFL. However, close inspection of individual genes revealed sites at which the fusion proteins acted concordantly, discordantly or independent of the parental proteins. Together these findings underscore the combined roles played by zinc fingers and the N/C terminal regions in site-specific regulation of gene expression.
What role does CTCFL play in regulating chromatin organization and gene expression in a cancer setting where it is expressed in the presence of CTCF? At CTCF + CTCFL overlapping binding sites where CTCFL can bind competitively with CTCF we demonstrated that even in the presence of CTCF, CTCFL can have an impact on chromosome organization, reducing the strength of the aggregate peak enrichment of chromatin loops and in some places abrogating loop formation altogether. Importantly, binding of CTCFL at CTCF + CTCFL overlapping binding sites was linked to differential expression of genes within the loops. These findings have implications for the role of CTCFL in altering chromatin organization and gene expression in the context of cancer.
Conclusion
In sum, use of the complementation system incorporating auxin degradable endogenous CTCF combined with doxycycline inducible transgenic CTCF, CTCFL and CTCF-CTCFL in the presence and absence of CTCF enabled us to demonstrate that CTCF’s and CTCFL’s unique and overlapping binding sites have distinct binding sequences, biases for being in promoters rather than intronic or intergenic regions and boundary insulation scores. Furthermore, our studies highlight unique functional aspects of the zinc finger and C/N terminal domains of CTCF and CTCFL in controlling binding site preference as well as site-specific effects on chromosome organization and gene expression. Future studies will clarify the identity of the cofactors that facilitate the site-specific functions of CTCF and CTCFL, and the genetic system we have developed here will be a useful tool for addressing this question.
Methods
Cell lines
Mouse embryonic stem cells E14Tg2a (karyotype 19, XY; 129/Ola isogenic background) and all clones derived from these were cultured under feeder-free conditions in 0.1% gelatin (Sigma ES-006-B) coated dishes (Falcon, 353003) at 37°C and 5% CO2 in a humidified incubator. The cells were grown in DMEM (Thermo Fisher, 11965-118) supplemented with 15% fetal bovine serum (Thermo Fisher, SH30071.03), 100 U/ml penicillin - 100 μg/ml streptomycin (Sigma, P4458), 1 X GlutaMax supplement (Thermo Fisher, 35050-061), 1 mM sodium pyruvate (Thermo Fisher, 11360-070), 1 X MEM non-essential amino-acids (Thermo Fisher, 11140-50), 50 μM β-mercaptoethanol (Sigma, 38171), 104 U/ml leukemia inhibitory factor (Millipore, ESG1107), 3 μM CHIR99021 (Sigma, SML1046) and 1 μM MEK inhibitor PD0325901 (Sigma, PZ0162). The cells were passaged every alternate day by dissociation with TrypLE (Thermo Fisher, 12563011).
DNA constructs
Construction of vector for cloning transgenic, doxycycline-inducible expression of Ctcfl
A cDNA clone for Mus musculus Ctcfl (NCBI Gene ID: 664799) was purchased from Transomic Technologies (TCMS1004). The cDNA was amplified such that it harbors an AflII sequence at the 3’ end of the gene and was fused by fusion PCR with a FLAG tag (that harbors a NotI sequence) at the 5’ end. The resultant fragment was digested with NotI and AflII. The Ctcf gene was removed from pEN366 [7] by digesting with the same enzymes. This backbone was used for insertion of Ctcfl as well as the chimeric constructs.
Construction of Ctcf and Ctcfl with the termini swapped
To construct a hybrid gene with Ctcf N terminus - Ctcfl zinc fingers - Ctcf C-terminus, the region encoding the first 265 amino acids of mouse Ctcf was fused in frame to the region encoding amino acids 259 to 568 of mouse Ctcfl and the 159 (578-736) C-terminal amino acids of Ctcf. The fragments of Ctcf were amplified from pEN366 [7] and Ctcfl from the cDNA clone (TCMS1004, Transomic Technologies). The resulting plasmid was named pCLC (‘C’ for Ctcf and ‘L’ for Ctcfl) and the transgene is referred to as CLC henceforth. Similarly, to construct a hybrid of mouse Ctcfl N terminus - Ctcf zinc fingers - Ctcfl C-terminus protein, the region encoding the first 258 amino acids of Ctcfl was fused in frame to the regions encoding amino acids 266 to 577 of Ctcf and the 68 (569-636) C-terminal amino acids of Ctcfl. The plasmid was named pLCL and the transgene is referred to as LCL. The construction of these mutant genes was achieved by swapping one terminus at a time using a two-step PCR overlap extension method. In brief, the cDNA regions corresponding to each of the termini and zinc fingers were PCR amplified in such a way that each included a short stretch of the 5′ and/or 3′ region of the neighboring fragment to be connected. The desired PCR products were then annealed, amplified by PCR and cloned into the NotI and AflII sites of the pEN366 backbone. All of the constructs were verified by DNA sequence analysis. The transgenes with one terminus each of CTCFL swapped with that of CTCF were constructed and named using the same terminology as LLC (Ctcfl with C-terminal Ctcf) and CLL (Ctcfl with N-terminal Ctcf). With all transgenes, the final vector harbors an N terminal 3 X FLAG tag and a C terminal mRuby as in-frame fusions to the transgenes (Ctcfl, Ctcf, LCL, CLC, LLC and CLL). It also harbors a TetO-3G element and rtTA3G for doxycycline induced expression of the transgene and homology arms surrounding the sgRNA target site of the Tigre locus for locus-specific insertion.
The selection of stable integrants was achieved by virtue of FRT-PGK-puro-FRT cassette. Further details of the vector are described elsewhere [7]. The vector pX330-EN1201 [7] harboring spCas9 nuclease and Tigre-targeting sgRNA was used for targeting of Tigre locus.
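The segment boundaries quoted above for CLC and LCL can be checked with a short calculation. The following is a minimal sketch, assuming 1-based inclusive amino-acid coordinates and parental lengths of 736 (CTCF) and 636 (CTCFL) residues as implied by the quoted ranges; the function and variable names are illustrative and are not part of the cloning workflow.

```python
# Segment boundaries (1-based, inclusive) as quoted in the construct description.
# Each entry is (parent protein, first residue, last residue).
SEGMENTS = {
    "CLC": [("CTCF", 1, 265), ("CTCFL", 259, 568), ("CTCF", 578, 736)],
    "LCL": [("CTCFL", 1, 258), ("CTCF", 266, 577), ("CTCFL", 569, 636)],
}


def segment_length(start, end):
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def chimera_length(name):
    """Total residue count of a chimera, excluding the FLAG and mRuby tags."""
    return sum(segment_length(start, end) for _, start, end in SEGMENTS[name])


def assemble(name, parents):
    """Concatenate parental slices; `parents` maps 'CTCF'/'CTCFL' to sequences."""
    return "".join(parents[parent][start - 1:end]
                   for parent, start, end in SEGMENTS[name])


if __name__ == "__main__":
    # Placeholder sequences (not real protein sequences) to show the layout.
    toy_parents = {"CTCF": "C" * 736, "CTCFL": "L" * 636}
    for name in SEGMENTS:
        print(name, chimera_length(name), len(assemble(name, toy_parents)))
```

Under these assumptions the C-terminal segments come out at 159 and 68 residues, matching the numbers given in the text.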
Gene targeting
Mouse embryonic stem cells E14Tg2a harboring Ctcf-AID-eGFP on both alleles and a knock-in of pEN114 - pCAGGS-Tir1-V5-BpA-Frt-PGK-EM7-PuroR-bpA-Frt-Rosa26 at the Rosa26 locus were used as the parental cell line for making all the transgenes [7]. pEN366 derived vectors harboring the rescue transgenes (Ctcf, Ctcfl as well as chimeric proteins) were used for targeting transgenes to the Tigre locus [7]. For nucleofections, 15 μg each of plasmids harboring the transgenes and 2.5 μg of those with sgRNA targeting the Tigre locus were used. Nucleofections were performed using the Amaxa P3 Primary Cell kit (Lonza, V4XP-3024) and 4D-transfector. 2 million cells were transfected with program CG-104 in each case. The cells were recovered for 48 h with no antibiotic followed by selection in puromycin (1 μg/mL) (Thermo Fisher, A1113803). Single colonies were manually picked and expanded in 96 well plates. Clones were genotyped by PCR and FACS was performed to confirm that the levels of expression of the transgenes were comparable. All the clones that were used for the analyses were homozygous for the integration of the transgenes.
Induction of auxin inducible degradation of CTCF and doxycycline induced expression
For degradation of endogenous CTCF, the auxin-inducible degron was induced by adding 500 μM indole-3-acetic acid (IAA, chemical analog of auxin) (Sigma, I5148) to the media. Expression of transgenes was achieved by the addition of doxycycline (Dox, 1 μg/ml) (Sigma, D9891) to the media. The cells were treated with IAA and/or Dox for 2 days unless mentioned otherwise.
Western Blotting
mESCs were dissociated using TrypLE, washed in PBS, pelleted and used for western blotting. Approximately 2 million cells were used to prepare cell extract. Cell pellets were resuspended in RIPA lysis buffer (Thermo Fisher, 89900) with 1X HALT protease inhibitors (Thermo Fisher, 78430), incubated on ice for 30 min, spun at 4°C at 13,000 rpm for 10 min and the supernatant was collected. For the western blot of CTCF, low salt lysis buffer (0.1 M NaCl, 25 mM HEPES, 1 mM MgCl2, 0.2 mM EDTA and 0.5% NP40) supplemented with 125 U/ml of benzonase (Sigma E1014) was used. Protein concentration was measured using the Pierce BCA assay kit (Thermo Fisher, 23225). 20 μg of protein was mixed with Laemmli buffer (Biorad, 1610737) and β-mercaptoethanol, heated at 95°C for 10 min and run on a Mini-protean TGX 4%-20% polyacrylamide gel (Biorad, 456-1095). The proteins were transferred onto PVDF membranes using the Mini Trans-Blot Electrophoretic Transfer Cell (Bio-Rad, 170-3930) at 80 V, 120 mA for 90 min. PVDF membranes were blocked with 5% BSA in 1 X TBST prior to the addition of antibody. The membranes were probed with appropriate antibodies overnight at 4°C (anti-rabbit histone H3 (abcam, ab1791; 1:2,500 dilution), anti-mouse FLAG antibody (Sigma, F1804; 1:1,000 dilution), anti CTCF (active motif, 61311), anti Rad21 (ab992)). Membranes were washed five times in PBST (1 × PBS and 0.1% Tween 20) for 5 min each and incubated with respective secondary antibodies in 5% BSA at room temperature for 1 h. The blots were rinsed in PBST and developed using enhanced chemiluminescence (ECL) and imaged by Odyssey LiCor Imager (Kindle Biosciences).
Immunoprecipitation
For immunoprecipitation of nuclear lysates, cells were first lysed in 5 X pellet-volume of ice-cold Buffer A (10 mM Tris-HCl (pH 7.5-7.9), 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT, 0.2 mM PMSF, 0.1% NP40) supplemented with complete EDTA-free tablets (Roche) while rotating in the cold-room for 10 minutes.
Nuclei fractions were then isolated by spinning down the lysate at 1000 xg for 5 minutes at 4°C. The remaining nuclear pellet was then resuspended in 5 X pellet-volume of ice-cold Buffer C (10 mM Tris-HCl (pH 7.5-7.9), 25% glycerol, 0.42 M NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 0.5 mM DTT, 0.5 mM PMSF) supplemented with complete EDTA-free tablets and placed on the cold-room rotator for 60 minutes. Soluble nuclear extracts were then cleared by centrifugation at 20,000 xg for 10 minutes at 4°C. The remaining insoluble nuclear pellet was dissolved in 3 X pellet-volume of Urea-Chaps Buffer (8 M Urea, 20 mM HEPES, 1% CHAPs), vortexed vigorously at 10-minute intervals over a 30-minute incubation at room-temperature, and then combined with the soluble nuclear extract to make the complete nuclear lysate. BCA Assay (ThermoFisher) was used to determine the protein level of each sample, and 2 mg of nuclear lysate was incubated overnight with 50 uL of ANTI-FLAG M2 magnetic beads (Sigma Cat# M8823) at 4°C. Beads were washed 3 X in ice-cold IP wash buffer (20 mM Tris-HCl (pH 7.5-7.9), 150 mM NaCl, 1 mM EDTA, 0.05% Triton X-100, 5% Glycerol) and eluted at 95°C for 10 minutes into 1 X SDS-Page Buffer (Bio-Rad) supplemented with 5% BME. Samples were resolved by SDS-Page using 4-20% gradient gels (BioRad) and transferred to PVDF membranes by a wet transfer protocol. Immunoblotting was performed using 5% BSA for both blocking and primary or secondary horseradish peroxidase-conjugated antibody incubation. Primary antibodies used were anti-Flag M2 (Sigma, F1804) (1:1000) or anti-Rad21 (Abcam, ab992) (1:1000), and secondary antibodies used were Mouse IgG HRP Linked Whole (Sigma, GENA931) (1:2000) or Mouse Anti-Rabbit IgG (Light Chain Specific) (CST, #93702S) (1:5000). Blots were developed using enhanced chemiluminescence (ECL) and imaged by Odyssey LiCor Imager (Kindle Biosciences).
Flow cytometric analysis
Cells were dissociated with TrypLE, washed and resuspended in MACS buffer for flow cytometric analysis on LSRII UV (BD Biosciences). Analysis was performed using the FlowJo software.
Microscopy
Images were acquired on EVOS FL Color Imaging System using a 20 X objective.
ChIPmentation
mESCs were dissociated using TrypLE, washed in PBS and fixed in 1% formaldehyde for 10 min at room temperature. Quenching was performed by adding glycine to a final concentration of 0.125 M followed by incubations of 5 min at room temperature and 15 min at 4°C. The cells were washed twice in PBS with 0.125 M glycine, pelleted, snap frozen and stored at -80°C until use. Fixed cells (10 million) were thawed on ice, resuspended in 350 μl ice cold lysis buffer (10 mM Tris-HCl (pH 8.0), 100 mM NaCl, 1 mM EDTA (pH 8.0), 0.5 mM EGTA (pH 8.0), 0.1% sodium deoxycholate, 0.5% N-lauroylsarcosine and protease inhibitors) and lysed for 10 min by rotating at 4°C. Chromatin was sheared using a bioruptor (Diagenode) (25 cycles: 30 sec on, 30 sec off). Triton X-100 was added to a final concentration of 1% and the samples were centrifuged for 5 min at 16000 rcf at 4°C. Supernatant was collected and shearing was continued for another 10 min and the chromatin was quantified. FLAG M2 Magnetic Beads (Sigma, M8823) were used for FLAG ChIPs. In other cases (CTCF, Cohesin, IgG) antibodies were bound to protein A magnetic beads by incubation on a rotator for one hour at room temperature. 10 μl of each antibody was bound to 50 μl of protein-A magnetic beads (Dynabeads) and added to the sonicated chromatin from 10 million cells per immunoprecipitation. The beads were washed and tagmentation was performed as per the original ChIPmentation protocol (Schmidl et al., 2015). In short, the beads were washed successively twice in 500 μl cold low-salt wash buffer (20 mM Tris-HCl (pH 7.5), 150 mM NaCl, 2 mM EDTA (pH 8.0), 0.1% SDS, 1% triton X-100), twice in 500 μl cold LiCl-containing wash buffer (10 mM Tris-HCl (pH 8.0), 250 mM LiCl, 1 mM EDTA (pH 8.0), 1% triton X-100, 0.7% sodium deoxycholate) and twice in 500 μl cold 10 mM Tris-Cl (pH 8.0) to remove detergent, salts and EDTA.
Subsequently, the beads were resuspended in 25 μl of the freshly prepared tagmentation reaction buffer (10 mM Tris-HCl (pH 8.0), 5 mM MgCl2, 10% dimethylformamide) and 1 μl Tagment DNA Enzyme from the Nextera DNA Sample Prep Kit (Illumina) and incubated at 37°C for 1 min in a thermocycler. Following tagmentation, the beads were washed successively twice in 500 μl cold low-salt wash buffer (20 mM Tris-HCl (pH 7.5), 150 mM NaCl, 2 mM EDTA (pH 8.0), 0.1% SDS, 1% triton X-100) and twice in 500 μl cold Tris-EDTA-Tween buffer (0.2% tween, 10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0)). Chromatin was eluted and de-crosslinked by adding 70 μl of freshly prepared elution buffer (0.5% SDS, 300 mM NaCl, 5 mM EDTA (pH 8.0), 10 mM Tris-HCl (pH 8.0) and 10 μg/ml proteinase K) for 2 hours at 55°C and overnight at 65°C. The supernatant was collected and saved. The beads were supplemented with an additional 30 μl of elution buffer, incubated for 1 h at 55°C and the supernatants were combined. DNA was purified using the MinElute Reaction Cleanup Kit (Qiagen 28204) and eluted in 20 μl. Purified DNA (20 μl) was amplified as per the ChIPmentation protocol [61] using indexed and non-indexed primers and NEBNext High-Fidelity 2X PCR Master Mix (NEB M0541) in a thermomixer with the following program: 72°C for 5 m; 98°C for 30 s; 14 cycles of 98°C for 10 s, 63°C for 30 s, 72°C for 30 s and a final elongation at 72°C for 1 m. DNA was purified using Agencourt AMPure XP beads (Beckman, A63881) to remove fragments larger than 700 bp as well as the primer dimers. Library quality and quantity were estimated using Tapestation bioanalyzer (Agilent) as well as Qubit (ThermoFisher) assays. Samples were quantified using the Library Quantification Kit (Kapa Biosystems, KK4824) and sequenced with Illumina Hi-Seq 4000 using 50 cycles single-end mode.
RNA seq
mESCs were dissociated using TrypLE, washed in PBS, pelleted and used for extracting RNA. RNA was extracted from 2.5 million cells using the RNeasy plus kit (Qiagen 74134) in each case. The poly-adenylated transcripts were positively selected from the RNA using the NEBNext Poly(A) mRNA Magnetic Isolation Module (E7490) following the manufacturer’s protocol. Libraries were prepared according to the directional RNA-seq dUTP method adapted from http://wasp.einstein.yu.edu/index.php/Protocol:directionalWholeTranscript_seq that preserves information about transcriptional direction. Library concentrations were estimated using tapestation and Qubit assays, pooled and sequenced on a Next-seq instrument (Illumina Hi-Seq 4000) using 50 cycles paired-end mode.
Hi-C
Hi-C was performed in duplicates using 1 million cells each. mESCs were dissociated using TrypLE, washed in PBS and fixed in 1% formaldehyde for 10 min at room temperature. Quenching was performed by adding glycine to a final of 0.125 M followed by incubations of 5 min at room temperature and 15 min at 4°C. Hi-C samples were processed using the Arima Hi-C kit as per the manufacturer’s protocol and sequenced with Illumina NovaSeq 6000 using 50 cycles paired-end mode.
QUANTIFICATION AND STATISTICAL ANALYSIS
ChIP-seq data processing and quality control
Reads were aligned to GRCm38/mm10 genome with Bowtie2 [62] (parameters: --no-discordant -p 12 --no-mixed -N 1 -X 2000). Ambiguous reads were filtered to use uniquely mapped reads in the downstream analysis. PCR duplicates were removed using Picard-tools (version 1.88). For FLAG and RAD21 ChIP-seq, MACS version 1.4.2 [63] was used to call peaks (parameters: -g 1.87e9 --qvalue 0.05 for FLAG; --broad -q 0.05 for RAD21). Bigwigs were obtained for visualization on individual as well as merged bam files using Deeptools/2.3.3 [64] (parameters: bamCoverage --binSize 1 --normalizeUsing RPKM). Heatmaps and average profiles were performed on merged bigwig files using Deeptools/2.3.3. We also used DiffBind package [65] to cluster the samples and generate heatmaps (Parameters: summits=250).
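For readers who want to reproduce the alignment step, the Bowtie2 parameters listed above can be assembled into a command line programmatically. This is a minimal sketch, assuming single-end input passed with `-U` (the ChIPmentation libraries were sequenced in single-end mode); the index prefix and file names are hypothetical placeholders, and the command is only constructed and printed, not executed. The `--no-mixed`, `--no-discordant` and `-X` options concern paired-end alignment and are retained only because they appear in the stated parameter list.

```python
import shlex


def build_bowtie2_command(index_prefix, fastq, out_sam, threads=12):
    """Return the Bowtie2 argument list using the parameters quoted in the text.

    index_prefix, fastq and out_sam are illustrative placeholders.
    """
    return [
        "bowtie2",
        "--no-discordant",
        "-p", str(threads),
        "--no-mixed",
        "-N", "1",      # allow one mismatch in the seed
        "-X", "2000",   # maximum fragment length (paired-end option)
        "-x", index_prefix,
        "-U", fastq,    # unpaired reads
        "-S", out_sam,  # SAM output
    ]


if __name__ == "__main__":
    cmd = build_bowtie2_command("mm10_index", "sample.fastq.gz", "sample.sam")
    # shlex.join (Python 3.8+) renders the list as a shell-safe string.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later handed to `subprocess.run`.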
RNA-seq data processing and quality control
Raw sequencing files were aligned against the mouse reference genome (GRCm38/mm10) using the STAR [66] aligner (v.2.6), and differentially expressed genes were called using DESeq2 [67] with an adjusted p-value cutoff of 0.01 and a fold change cutoff of 1. Venn diagrams were generated using the ‘eulerr’ R package [68]. We obtained a list of mouse TSS coordinates from the Ensembl database (GRCm38.p6 - release 98) [69] that was used in the downstream analyses.
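The thresholding step can be illustrated with a minimal Python sketch. It assumes the DESeq2 results were exported to a table with the standard `padj` and `log2FoldChange` columns plus a hypothetical `gene` identifier column, and it assumes the fold change cutoff of 1 applies to the absolute log2 fold change, which the text does not state explicitly. It is not the authors' code.

```python
from typing import Dict, Iterable, List


def filter_degs(rows: Iterable[Dict[str, str]],
                padj_cutoff: float = 0.01,
                lfc_cutoff: float = 1.0) -> List[str]:
    """Return gene IDs passing both thresholds.

    Assumptions (not confirmed by the source): the fold change cutoff is
    applied to |log2FoldChange|, and the 'gene' column name is hypothetical.
    """
    kept = []
    for row in rows:
        padj_raw, lfc_raw = row["padj"], row["log2FoldChange"]
        # DESeq2 reports NA for genes excluded by independent filtering.
        if padj_raw in ("", "NA") or lfc_raw in ("", "NA"):
            continue
        if float(padj_raw) < padj_cutoff and abs(float(lfc_raw)) >= lfc_cutoff:
            kept.append(row["gene"])
    return kept
```

Rows can be supplied from an exported results file with `csv.DictReader`, so no third-party library is needed.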
Hi-C Processing and Quality Control
HiC-bench [70] was used to align and filter the Hi-C data and to identify TADs. To generate filtered Hi-C contact matrices, the Hi-C reads were aligned against the mouse reference genome (GRCm38/mm10) with bowtie2 (version 2.3.1). Mapped read pairs were filtered for known artifacts of the Hi-C protocol using the GenomicTools [71] tools-hic filter command integrated in HiC-bench. The reads filtered out include multi-mapped reads (‘multihit’), read-pairs with only one mappable read (‘single sided’), duplicated read-pairs (‘ds.duplicate’), low mapping quality reads (MAPQ < 30), read-pairs resulting from self-ligated fragments, and short-range interactions resulting from read-pairs aligning within 25kb (‘ds.filtered’). For the downstream analyses, all accepted intra-chromosomal read-pairs (‘ds-accepted-intra’) were used. The total number of reads in the 2 biological replicates for each condition ranged from ∼120 million to ∼260 million. More than 97% of reads aligned in every sample. The proportion of accepted reads (‘ds-accepted-intra’ and ‘ds-accepted-inter’) was ∼40%, which in all cases was sufficient to annotate TADs with HiC-bench.
DOWNSTREAM ANALYSIS
Annotation of ChIP peak sets and motif analysis
To obtain a peak set per condition, we first merged the peaks in each replicate (overlap ≥ 1 bp) and then retained only the peaks present in both replicates (overlap ≥ 1 bp). ‘CTCF-only’ sites correspond to peaks present in the FLAG ChIP-seq CTCF (ID) peak set and absent from the CTCFL (D) set. ‘CTCF and CTCFL’ sites correspond to peaks found in both the CTCF (ID) and CTCFL (D) peak sets. ‘CTCFL-only’ sites correspond to peaks present in the CTCFL (D) peak set and absent from the CTCF (ID) set. A peak was considered present in two conditions when the overlap was higher than 66% for both peaks. We used the ChIPSeeker [72] library to annotate the resulting peak sets, with the annotation packages ‘TxDb.Mmusculus.UCSC.mm10.knownGene’ and ‘org.Mm.eg.db’ (Bioconductor). Promoters were defined as ± 3 kilobases from the transcription start site. Venn diagrams were generated using the ‘bedr’ R package [72]. The MEME-ChIP tool from the MEME suite [73, 74] was used to detect motifs in the peak sets.
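The reciprocal 66% overlap rule used to assign peaks to the three site classes can be sketched as follows. This is an illustrative Python example and not the authors' pipeline; the function names, the half-open coordinate convention, and the choice to report shared sites using the CTCF peak coordinates are all assumptions.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open; assumed format


def reciprocal_overlap(a: Peak, b: Peak, min_frac: float = 0.66) -> bool:
    """True if the shared interval covers more than min_frac of BOTH peaks."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return (shared / (a[2] - a[1]) > min_frac
            and shared / (b[2] - b[1]) > min_frac)


def classify_sites(ctcf: List[Peak], ctcfl: List[Peak], min_frac: float = 0.66):
    """Split peaks into (CTCF-only, CTCF and CTCFL, CTCFL-only) lists.

    Shared sites are reported with the CTCF coordinates (an assumption).
    """
    shared = [p for p in ctcf
              if any(reciprocal_overlap(p, q, min_frac) for q in ctcfl)]
    ctcf_only = [p for p in ctcf if p not in shared]
    ctcfl_only = [q for q in ctcfl
                  if not any(reciprocal_overlap(q, p, min_frac) for p in ctcf)]
    return ctcf_only, shared, ctcfl_only
```

The all-against-all comparison is quadratic in the number of peaks, which is adequate for illustration; genome-scale peak sets would normally be handled with an interval tree or a dedicated tool.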
Compartments, TADs and Boundaries
Compartment Analysis
Compartment analysis was carried out using the HOMER [75] pipeline (v4.6). Filtered Hi-C matrices were given as input together with ATAC-seq peaks for compartment prediction (default parameters: 50 kb resolution). HOMER was used to perform a principal component analysis of the normalized interaction matrices, and the first principal component (PC1) was then used to predict regions of active (A compartments) and inactive chromatin (B compartments) and to generate the eigenvalue bedgraph files for each condition. HOMER assumes that gene-rich regions with active chromatin marks have similar PC1 values, while gene deserts show differing PC1 values.
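As a minimal illustration of the final assignment step, the Python sketch below labels bins from their PC1 values. It assumes that PC1 has already been oriented so that positive values mark active chromatin, and it leaves bins with a value of exactly zero unassigned; both choices are assumptions made for this example rather than details taken from the analysis above.

```python
from typing import List, Optional, Tuple

Pc1Bin = Tuple[str, int, int, float]              # (chrom, start, end, PC1 value)
LabelledBin = Tuple[str, int, int, Optional[str]]


def assign_compartments(bins: List[Pc1Bin]) -> List[LabelledBin]:
    """Label each bin 'A' (PC1 > 0), 'B' (PC1 < 0) or None (PC1 == 0).

    Assumes PC1 is oriented so that positive values are active chromatin.
    """
    labelled = []
    for chrom, start, end, pc1 in bins:
        if pc1 > 0:
            label = "A"
        elif pc1 < 0:
            label = "B"
        else:
            label = None
        labelled.append((chrom, start, end, label))
    return labelled
```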
Domain boundary Insulation Scores
The filtered Hi-C contact matrices were corrected using the ICE “correction” algorithm [76] built into HiC-bench. Chromatin domains and boundaries were called using Crane [77] at 40 kb bin resolution with an insulating window of 500 kb. We also called domains using the Hicratio algorithm [70] at 40 kb resolution. Hi-C heatmaps for regions of interest were generated in Juicebox [78].
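The idea behind a sliding-window insulation score can be sketched in a few lines of Python. This is a simplified illustration only: it averages the contacts between the bins upstream and downstream of each position, and it does not reproduce the normalization or boundary-calling steps of the published method or of the HiC-bench implementation. The function name and the `window_bins` parameter are hypothetical.

```python
from typing import List, Optional


def insulation_scores(matrix: List[List[float]],
                      window_bins: int) -> List[Optional[float]]:
    """Mean contact frequency between the window_bins bins upstream and the
    window_bins bins downstream of each bin. Low values suggest insulation.

    Bins too close to the matrix edge for a full window are returned as None.
    """
    n = len(matrix)
    scores: List[Optional[float]] = []
    for i in range(n):
        if i - window_bins < 0 or i + window_bins >= n:
            scores.append(None)
            continue
        upstream = range(i - window_bins, i)
        downstream = range(i + 1, i + window_bins + 1)
        total = sum(matrix[r][c] for r in upstream for c in downstream)
        scores.append(total / (window_bins * window_bins))
    return scores
```

In a matrix containing two dense domains, the score drops at the bins flanking the junction between them, which is the signal a boundary caller looks for.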
To assess and compare changes in boundary strength across all conditions, we calculated the Mean Boundary Score (MBS) for every boundary identified in the CTCFL (U) condition (reference boundaries). The MBS is the arithmetic mean of all the ‘ratio’ insulation scores that fall inside the coordinates of the reference boundary being assessed. As explained in Lazaris et al. (2017), HiC-bench computes one ratio score per bin (40 kb); as a result, there are multiple ratio scores per boundary identified by the Crane algorithm.
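A minimal Python sketch of the MBS calculation is given below. It assumes that a ratio score "falls inside" a boundary when its bin lies entirely within the boundary coordinates, and the tuple layouts and function name are hypothetical; the actual HiC-bench output format may differ.

```python
from statistics import mean
from typing import List, Optional, Tuple

ScoreBin = Tuple[str, int, int, float]  # (chrom, start, end, ratio score); assumed layout
Boundary = Tuple[str, int, int]         # (chrom, start, end)


def mean_boundary_score(boundary: Boundary,
                        bins: List[ScoreBin]) -> Optional[float]:
    """Arithmetic mean of the ratio scores of all bins contained in the boundary.

    Returns None when no bin lies inside the boundary coordinates.
    """
    chrom, b_start, b_end = boundary
    inside = [score for c, start, end, score in bins
              if c == chrom and start >= b_start and end <= b_end]
    return mean(inside) if inside else None
```

Applying the same reference boundary to the per-bin scores of each condition gives one MBS per boundary per condition, which can then be compared directly.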
Loop Analysis
Loops were annotated for all conditions using HiCCUPS [79] at 25 kb resolution with default parameters (KR-normalization). We also assessed chromatin loops using aggregate peak analysis (APA) at 10 kb resolution (-r 10000 -k KR).
Availability of data and materials
All raw and processed sequencing data files have been deposited at NCBI’s Gene Expression Omnibus and will be made publicly available upon publication of the manuscript.
Ethics approval and consent to participate
Not applicable
Competing Interests
The authors declare no competing interests.
Funding
This work was supported by 1R35GM122515 (J.S) and NIH P01CA229086 (J.S). N.M was supported by the National Cancer Center and A.T. by the American Cancer Society (RSG-15-189-01-RMC) and St. Baldrick’s foundation (581357).
Authors’ contributions
Conceptualization & Study Design, N.M, J.S; Formal analysis, J-R.H, S.B; Writing – Original Draft, N.M and J.S; Supervision, A.T and J.S
Acknowledgements
The authors thank Skok lab members for helpful scientific discussions, New York University School of Medicine High Performance Computing Facility (HPCF) for computing technical support, Adriana Heguy and the Genome Technology Center (GTC) core for sequencing efforts, Applied Bioinformatics Laboratories (ABL) for providing bioinformatics support and helping with the analysis and interpretation of the data and the NYU Flow Cytometry and Cell Sorting Center for FACS analysis and sorting. GTC and ABL are shared resources partially supported by the Cancer Center Support Grant P30CA016087 at the Laura and Isaac Perlmutter Cancer Center.