Discovery of directional chromatin-associated regulatory motifs affecting human gene transcription

orientation exist in >60% of neighboring TAD boundaries (Figure S4D), suggesting that the boundary reverse-forward CBS pairs play an important role in the formation of most TADs. For example, in a Chr12 genomic region of H1-hESC cells, a CBS pair in the reverse-forward orientation is located at or very close to each of the six TAD boundaries (boundaries 1-6), except for boundary 5, which has only one closely located CBS, in the forward orientation (Figure S4E). Taken together, these data strongly suggest that directional binding of CTCF to boundary CBS pairs in the reverse-forward orientation causes opposite topological looping, and that these pairs thus appear to function as insulators.
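The orientation statistic described above can be illustrated with a minimal sketch. It assumes each boundary is summarized as a pair of strands (upstream CBS, downstream CBS); the function names and the toy input are hypothetical and are not taken from the original analysis.

```python
from collections import Counter

def pair_orientation(left_strand: str, right_strand: str) -> str:
    """Name the configuration of two neighbouring CBSs, ordered by genomic position."""
    names = {"+": "forward", "-": "reverse"}
    return f"{names[left_strand]}-{names[right_strand]}"

def orientation_fractions(pairs):
    """Return the fraction of boundary CBS pairs in each orientation configuration."""
    counts = Counter(pair_orientation(left, right) for left, right in pairs)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Hypothetical toy input: (strand of upstream CBS, strand of downstream CBS).
toy_pairs = [("-", "+"), ("-", "+"), ("-", "+"), ("+", "-"), ("+", "+")]
fractions = orientation_fractions(toy_pairs)
print(fractions)
```

With real data, the pairs would come from CBS motif strands at annotated domain boundaries; the toy list here only demonstrates the bookkeeping.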

The Human β-globin Locus Provides an Additional Example of CBS Orientation-Dependent Topological Chromatin Looping
Based on the location and orientation of CBSs, as well as their CTCF/cohesin occupancy, we identified four CCDs (domains 1-4) in the well-characterized β-globin cluster (Figure 5A). The β-globin gene cluster is located between CBS3 (5′HS5) and CBS4 (3′HS1) in domain1 (Figure 5A) (Hou et al., 2010; Splinter et al., 2006). We generated a series of CBS4/5 mutant K562 cell lines using CRISPR/Cas9 with one or two sgRNAs (Li et al., 2015) (Figures S2B and S2C). In the CRISPR cell lines D3, D7, and D19 (out of 38 clones screened) in which the internal CBS4 (3′HS1) was deleted (Figure S2B), chromatin-looping interactions between CBS3 (5′HS5) in the forward orientation and the boundary CBS5 in the reverse orientation in domain1 persisted, although its interaction with the CBS4 (3′HS1) region was abolished (Figures S5A and S5B). As expected, the interactions between CBS6/7 and CBS8/9 in domain2 were unchanged (Figure S5C). Strikingly, however, in the CBS4 (3′HS1) and CBS5 double-knockout CRISPR cell lines C2, C4, and C14 (out of 49 clones screened) (Figure S2C), novel chromatin-looping interactions between CBS3 (5′HS5) in the forward orientation of domain1 and CBS8/9 in the reverse orientation of the neighboring domain2 were observed, suggesting that these two domains merge as a single domain in CRISPR cell lines with CBS4/5 double knockout (Figure S5B). Similarly, when CBS8 was used as an anchor, this reverse-oriented CBS in domain2 establishes new long-range chromatin-looping interactions with CBS1-3 in the forward orientation of domain1 in the CBS4/5 double-deletion CRISPR cell lines (Figure S5C). We conclude that cross-domain interactions can be established after deletion of CBSs up to the boundary of topological domains, but not after deletion of the internal CBS in the β-globin locus.
To further test the functional significance of this organization of CBSs, we again performed CRISPR/Cas9-mediated DNA-fragment editing in the HEK293T cells and screened 198 CRISPR
(E) Distribution of genome-wide orientation configurations of CBS pairs located in the boundaries between two neighboring domains in the human K562 genome. Note that the vast majority (90.0%) of boundary CBS pairs between two neighboring domains are in the reverse-forward orientation. See also Figure S4 and Tables S1, S2, S3, S4, S5, and S6.
Expression level of putative transcriptional target genes

Cell 162, 900-910, August 13, 2015. ©2015 Elsevier Inc.

[Figure: orientation of CTCF-binding sites (forward, reverse) and CTCF-mediated DNA looping to a promoter; panels labeled forward-reverse and reverse-forward]
The expression level was higher using the forward-reverse orientation of CTCF binding sites.
This suggests that target genes are predicted more accurately with CTCF binding sites than with the other criteria compared.
Regulatory motifs overlapping the H3K27ac histone modification (an enhancer and promoter mark) showed high expression levels of target genes, suggesting that they are associated with enhancers and promoters.
Cell identity is defined by a unique gene expression program as well as a characteristic epigenetic landscape and three-dimensional (3D) chromatin topology, features that are constantly supervised by a set of transcription factors known as master regulators 1,2. Although the ability of master regulators to maintain and change cell identity is well established, the underlying molecular mechanisms remain poorly understood.

Chromatin-associated regulatory motifs
Somatic cell reprogramming into induced pluripotent stem cells (iPSCs) by OCT4, KLF4, SOX2 and cMYC (OKSM) offers a tractable system to study the transcription-factor-driven mechanisms of cell-fate determination 3,4. The transcriptional and epigenetic changes induced by OKSM expression that result in the erasure of somatic identity and the establishment of pluripotency have been extensively described 5-14. Recent studies utilizing targeted or global chromatin conformation capture techniques revealed local or large-scale reorganization of the 3D genomic architecture between somatic and pluripotent stem cells (PSCs) 15-21 and a strong association with OKSM binding 15-17,21, suggesting a potential architectural role of reprogramming transcription factors. The architectural function of KLF4 is further supported by the observations that KLF4 depletion abrogates long-range chromatin contacts at specific genomic loci, such as the Pou5f1 (Oct4) locus in mouse PSCs 18 and the HOPX gene in human epidermal keratinocytes 22. In addition, depletion of the related factor KLF1 disrupts select long-range interactions in the context of erythropoiesis 23,24. To test whether OKSM, and in particular KLF4, may orchestrate chromatin architectural changes in a genome-wide manner, we captured the dynamic KLF4-centric topological reorganization and associated molecular alterations during the course of reprogramming mouse embryonic fibroblasts (MEFs) into iPSCs (Fig. 1a, top). Integrative analysis of our results generated a reference map of stage-specific chromatin changes around KLF4-bound loci and established strong links with enhancer rewiring and concordant transcriptional changes. Inducible depletion of KLF factors in PSCs and genetic disruption of KLF4 binding sites within specific PSC enhancers further supported the function of KLF4 as both a transcriptional regulator and a chromatin organizer.

KLF4 binding during reprogramming induces chromatin opening and precedes enhancer and gene activation.
We mapped the genome-wide KLF4 binding at different stages of reprogramming using 'reprogrammable' MEFs induced with doxycycline (dox) 25 in the presence of ascorbic acid (Fig. 1a, bottom). Under these conditions the resulting iPSCs are molecularly and functionally indistinguishable from embryonic stem cells (ESCs) 26,27, and we used either cell type (referred to as PSCs) as reference points for established pluripotency. Bulk populations were used for the early stages, whereas we sorted SSEA1+ cells for the mid and late

KLF4 is involved in the organization and regulation of pluripotency-associated three-dimensional enhancer networks
Dafne Campigli Di Giammartino 1,9, Andreas Kloetgen 2,9, Alexander Polyzos 1,9, Yiyuan Liu 1,9, Daleum Kim 1, Dylan Murphy 1, Abderhman Abuhashem 1,3,4, Paola Cavaliere 5, Boaz Aronson 1, Veevek Shah 1, Noah Dephoure 5, Matthias Stadtfeld 1,6, Aristotelis Tsirigos 2,7,8* and Effie Apostolou 1*
Cell fate transitions are accompanied by global transcriptional, epigenetic and topological changes driven by transcription factors, as is exemplified by reprogramming somatic cells to pluripotent stem cells through the expression of OCT4, KLF4, SOX2 and cMYC. How transcription factors orchestrate the complex molecular changes around their target gene loci remains incompletely understood. Here, using KLF4 as a paradigm, we provide a transcription-factor-centric view of chromatin reorganization and its association with three-dimensional enhancer rewiring and transcriptional changes during the reprogramming of mouse embryonic fibroblasts to pluripotent stem cells. Inducible depletion of KLF factors in PSCs caused a genome-wide decrease in enhancer connectivity, whereas disruption of individual KLF4 binding sites within pluripotent-stem-cell-specific enhancers was sufficient to impair enhancer-promoter contacts and reduce the expression of associated genes. Our study provides an integrative view of the complex activities of a lineage-specifying transcription factor and offers novel insights into the nature of the molecular events that follow transcription factor binding.
Many other TFs are also known to have chromatin-associated functions, such as chromatin remodeling, control of chromatin accessibility, pioneer-factor activity, and/or histone modification.
These findings contribute to the study of chromatin-associated motifs involved in transcriptional regulation, chromatin interactions, regulation of chromatin, and histone modifications. Association rule 4 shortened at the forward-reverse orientation of CTCF binding sites showed the most significant difference in the distribution of expression levels of target genes.
Enhancers significantly affect the expression level of genes.

Comparison of expression levels of putative target genes of each TF between promoter and enhancer-promoter association domains
Median expression level of transcriptional target genes predicted using TFBS in promoters
Median expression level of transcriptional target genes predicted using TFBS in enhancer-promoter association shortened at forward-reverse orientation of CTCF binding sites
Median expression level of transcriptional target genes predicted using TFBS in enhancer-promoter association shortened at CTCF binding sites without considering their orientation
Median expression level of transcriptional target genes predicted using TFBS in enhancer-promoter association (Association rule 4)
Enhancer-promoter association (EPA) shortened at CTCF binding sites increased the normalized number of functional enrichments. The forward-reverse orientation of CTCF binding sites is important for forming chromatin interaction loops, and it increased functional enrichments of putative transcriptional target genes in seven cell types.
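Shortening an enhancer-promoter association domain at CTCF binding sites can be sketched as below. The exact procedure is not given in this text, so the rule used here is an assumption: the domain is trimmed to the nearest CTCF site on each side of the transcription start site (TSS), and, when orientation is considered, only a forward site upstream and a reverse site downstream count as cut points. The function name, arguments, and coordinates are hypothetical.

```python
def shorten_epa(domain, tss, ctcf_sites, use_orientation=True):
    """Trim an (start, end) association domain at CTCF sites flanking the TSS.

    ctcf_sites is a list of (position, strand) tuples, strand being "+" or "-".
    Assumed rule: with use_orientation=True, only forward ("+") sites left of
    the TSS and reverse ("-") sites right of the TSS shorten the domain.
    """
    start, end = domain
    for pos, strand in ctcf_sites:
        if pos < tss and (not use_orientation or strand == "+"):
            start = max(start, pos)   # keep the cut point closest to the TSS
        elif pos > tss and (not use_orientation or strand == "-"):
            end = min(end, pos)       # keep the cut point closest to the TSS
    return start, end

# Hypothetical coordinates for illustration only.
sites = [(100, "+"), (300, "-"), (700, "+"), (900, "-")]
oriented = shorten_epa((0, 1000), 500, sites, use_orientation=True)
unoriented = shorten_epa((0, 1000), 500, sites, use_orientation=False)
print(oriented, unoriented)
```

In this toy case the orientation-aware domain is wider than the orientation-blind one, because sites whose orientation does not match a forward-reverse pair are ignored as cut points.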
The table shows the number of biased orientations of DNA motifs for which a significantly higher ratio of enhancer-promoter interactions (EPIs) predicted from enhancer-promoter association (EPA) domain type (i) overlapped with HiChIP chromatin interactions than EPIs predicted from the other EPA domain types, (ii) and (iii), in T cells.
Discovery of biased orientation of human DNA motif s interactions and transcription of genes (bioRxiv, doi: 10.1101/290825)

Known transcription factors (TFs) associated with chromatin interactions

Cohesin mediates transcriptional insulation by CCCTC-binding factor
Kerstin S. Wendt 1*, Keisuke Yoshida 2*, Takehiko Itoh 3*, Masashige Bando 2, Birgit Koch 1, Erika Schirghuber 1, Shuichi Tsutsumi 4, Genta Nagae 4, Ko Ishihara 6, Tsuyoshi Mishiro 6, Kazuhide Yahata 5, Fumio Imamoto 5, Hiroyuki Aburatani 4, Mitsuyoshi Nakao 6, Naoko Imamoto 7, Kazuhiro Maeshima 7, Katsuhiko Shirahige 2 & Jan-Michael Peters 1
Cohesin complexes mediate sister-chromatid cohesion in dividing cells but may also contribute to gene regulation in postmitotic cells. How cohesin regulates gene expression is not known. Here we describe cohesin-binding sites in the human genome and show that most of these are associated with the CCCTC-binding factor (CTCF), a zinc-finger protein required for transcriptional insulation. CTCF is dispensable for cohesin loading onto DNA, but is needed to enrich cohesin at specific binding sites. Cohesin enables CTCF to insulate promoters from distant enhancers and controls transcription at the H19/IGF2 (insulin-like growth factor 2) locus. This role of cohesin seems to be independent of its role in cohesion. We propose that cohesin functions as a transcriptional insulator, and speculate that subtle deficiencies in this function contribute to 'cohesinopathies' such as Cornelia de Lange syndrome.
In proliferating cells, cohesin complexes physically connect replicated DNA molecules ('sister chromatids') from S phase until the subsequent anaphase of mitosis or meiosis. This sister-chromatid cohesion is essential for chromosome segregation and for DNA damage repair. Cohesin is composed of four core subunits, called SMC1, SMC3, SCC1 (also known as MCD1 and RAD21) and SCC3 (also known as SA2 and STAG2) 1-3. These proteins have been proposed to mediate cohesion by embracing sister chromatids as a ring 4.
The essential role of cohesin in cohesion is well established, but evidence obtained in yeast and different animal species implies that cohesin also contributes to gene regulation, chromatin structure and development 5-13. Furthermore, certain human diseases have been linked to hypomorphic mutations in cohesin and in proteins that regulate cohesin. Cornelia de Lange syndrome (CdLS) is characterized by growth and mental retardation, craniofacial anomalies and microcephaly. This disease can be caused by mutations in a protein that is required to load cohesin onto DNA, called SCC2 (also known as NIPBL and delangin), or by mutations in SMC1 or SMC3 (refs 14-17). Roberts/SC phocomelia syndrome (RBS/SC, OMIM 26900) is a related disease that has been linked to mutations in ESCO2, a protein implicated in the establishment of cohesion 18,19. Surprisingly, most of these mutations do not cause obvious defects in cell proliferation, implying that the resulting developmental abnormalities are not caused by defects in cohesion but reflect a distinct, yet unknown, function of cohesion proteins.
Cohesin is expressed in differentiated postmitotic cells
In vertebrates, cohesin binds to chromatin at the end of mitosis, long before cohesion is established in the next cell cycle 3,20,21. This suggests that cohesin may also have a function on unreplicated DNA, independent of its role in cohesion. To explore this possibility, we first tested if cohesin is also expressed in postmitotic cells, which lack cohesion. Immunoblotting experiments identified SCC1 and SMC1 in numerous mouse tissues, including brain (Fig. 1a). SMC3 antibodies immunoprecipitated cohesin complexes from brain extracts (Fig. 1b), and by immunofluorescence microscopy (IFM), SCC1 staining was observed in the nuclei of neurons (Fig. 1c and Supplementary Fig. 1). PDS5B, a protein that is associated with cohesin, has recently also been detected in mouse neurons 13. Cohesin is therefore

*These authors contributed equally to this work.

[Figure 1 legend, beginning missing] were analysed for cohesin and CTCF expression by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and immunoblotting with the indicated antibodies. WB, western blot. b, SMC3 immunoprecipitates (IP) obtained from mouse brain and HeLa interphase extracts were analysed by SDS-PAGE and immunoblotting with the indicated antibodies. c, Frozen thin sections from mouse brain cortex were co-stained for SCC1, the neuronal marker NeuN and DNA (4,6-diamidino-2-phenylindole, DAPI).

Chromatin interactions connect distal regulatory elements to target gene promoters guiding stimulus- and lineage-specific transcription. Few factors securing chromatin interactions have so far been identified. Here, by integrating chromatin interaction maps with the large collection of transcription factor-binding profiles provided by the ENCODE project, we demonstrate that the zinc-finger protein ZNF143 preferentially occupies anchors of chromatin interactions connecting promoters with distal regulatory elements. It binds directly to promoters and associates with lineage-specific chromatin interactions and gene expression. Silencing ZNF143 or modulating its DNA-binding affinity using single-nucleotide polymorphisms (SNPs) as a surrogate of site-directed mutagenesis reveals the sequence dependency of chromatin interactions at gene promoters. We also find that chromatin interactions alone do not regulate gene expression. Together, our results identify ZNF143 as a novel chromatin-looping factor that contributes to the architectural foundation of the genome by providing sequence specificity at promoters connected with distal regulatory elements.

Insulators are regulatory elements that help to organize eukaryotic chromatin via enhancer-blocking and chromatin barrier activity. Although there are several examples of transposable element (TE)-derived insulators, the contribution of TEs to human insulators has not been systematically explored. Mammalian-wide interspersed repeats (MIRs) are a conserved family of TEs that have substantial regulatory capacity and share sequence characteristics with tRNA-related insulators. We sought to evaluate whether MIRs can serve as insulators in the human genome. We applied a bioinformatic screen using genome sequence and functional genomic data from CD4+ T cells to identify a set of 1,178 predicted MIR insulators genome-wide. These predicted MIR insulators were computationally tested to serve as chromatin barriers and regulators of gene expression in CD4+ T cells. The activity of predicted MIR insulators was experimentally validated using in vitro and in vivo enhancer-blocking assays. MIR insulators are enriched around genes of the T-cell receptor pathway and reside at T-cell-specific boundaries of repressive and active chromatin. A total of 58% of the MIR insulators predicted here show evidence of T-cell-specific chromatin barrier and gene regulatory activity. MIR insulators appear to be CCCTC-binding factor (CTCF) independent and show a distinct local chromatin environment with marked peaks for RNA Pol III and a number of histone modifications, suggesting that MIR insulators recruit transcriptional complexes and chromatin modifying enzymes in situ to help establish chromatin and regulatory domains in the human genome. The provisioning of insulators by MIRs across the human genome suggests a specific mechanism by which TE sequences can be used to modulate gene regulatory networks.
transposable elements | insulators | chromatin | gene regulation | genomics

Insulators are regulatory sequence elements that help to organize eukaryotic chromatin into functionally distinct domains (1, 2). Insulators can encode two different functions: enhancer-blocking activity and chromatin barrier activity. Enhancer-blocking insulators prevent the interaction of enhancer and promoter elements located in distinct domains, and chromatin barrier insulators, also known as boundary elements (3, 4), protect active chromatin domains by blocking the spread of repressive chromatin. These two functional roles are not mutually exclusive; compound insulators may encode both enhancer-blocking and chromatin barrier activities (5).
Transposable element sequences are known to provide a variety of regulatory sequences to eukaryotic genomes (6), and there are several examples of transposable element (TE)-derived insulators. The best studied TE insulator comes from the Drosophila gypsy element (7-10). Gypsy is a long terminal repeat retrotransposon that contains an insulator sequence in its 5′ untranslated region. The gypsy insulator interacts with the suppressor of hairy wing [su(Hw)] and modifier of mdg4 [mod(mdg4)] proteins to block regulatory interactions between distal enhancer and proximal promoter sequences. This same insulator can also protect transgenes from position effects, indicating that it encodes chromatin barrier activity as well.
More recently, TE-derived insulator sequences have been discovered in mammalian genomes. The short interspersed nuclear element (SINE) B1 has insulator activity that is mediated by the binding of specific transcription factors along with the insulator-associated protein CCCTC-binding factor (CTCF) (11). A genome-wide analysis of CTCF binding sites in the human and mouse genomes discovered that many CTCF binding sites are derived from TE sequences (12), and a survey of six mammalian species revealed that lineage-specific expansions of retrotransposons have contributed numerous CTCF binding sites to their genomes (13). A number of these TE-derived CTCF binding sites in the mouse and rat genomes are capable of segregating domains enriched or depleted for acetylation of histone 2A lysine 5 (H2AK5ac), suggesting that they may encode insulator function. Interestingly, this same analysis did not detect retrotransposon-driven expansion of CTCF binding sites in the human genome (13).
Whereas subsets of CTCF binding sites are known to be associated with insulators, numerous insulators can function in a CTCF-independent manner. An important example comes from a mouse TE, the SINE B2 element, which serves as a developmentally regulated compound insulator, encoding both enhancer-blocking and chromatin barrier activity, at the growth hormone locus (14). B2 is a tRNA-derived SINE that encodes the B-box promoter element, which is bound by RNA polymerase III (RNA Pol III). The connection to tRNAs/Pol III binding is intriguing, given the fact that

Significance
Insulators are genome sequence elements that help to organize eukaryotic genomes into coherent regulatory domains. Insulators can encode both enhancer-blocking activity, which prevents the interaction between enhancers and promoters located in distinct regulatory domains, and/or chromatin barrier activity that helps to delineate active and repressive chromatin domains. The origins and functional characteristics of insulator sequence elements are important, open questions in molecular biology and genomics. This report provides insight into these questions by demonstrating the origins of a number of human insulator sequences from a family of transposableelement-derived repetitive sequence elements: mammalianwide interspersed repeats (MIRs). Human MIR-derived insulators are characterized by distinct sequence, expression, and chromatin features that provide clues as to their potential mechanisms of action.   to the fourth position of the ZNF143 DNA recognition sequence (motif 1; Fig. 4a) the most prominent motif found within B85% of the top 500 sites. The rs13228237 SNP changes the fourteenth position of a reported extension of a ZNF143 DNA recognition sequence 22,38 (motif 2; Fig. 4b), which is found within B25% of the top 500 sites. Consistent with the observation that the actual ZNF143-binding sites are located at gene promoters, B43% and B76% of gene promoters ( ± 2.5 kb of the transcription start site) bound by ZNF143 were found to contain motif 1 or motif 2 (motif P values o1 Â 10 À 4 ) in GM12878 cells, respectively. Interesting, motif 2 appears to be the most prominent ZNF143 motif found at gene promoters and most closely resembles the ZNF143 motif characterized using in vitro methods 22,39 . The imposed changes to the DNA sequence based on the positionweighted matrix predict preferential binding of ZNF143 to the reference A and the variant C allele of the rs2232015 and rs13228237 SNPs, respectively, compared with the other alleles ( Fig. 4a,b). 
In agreement, 242 reads from the ZNF143 ChIP-seq data, mapping to the rs2232015 SNP, contain the reference A allele and 136 reads contain the variant T allele (P = 5.47 × 10⁻⁸; Fig. 4c). Likewise, of the 25 reads mapping to the rs13228237 SNP, five contain the reference G allele and 20 contain the variant C allele (P = 4.08 × 10⁻³; Fig. 4d). Importantly, the signal intensity of the ZNF143-binding site containing the rs13228237 SNP is high (n = 175), indicating that this SNP falls within the centre of the inferred ZNF143-binding site and between the positive and negative strand peaks of the unprocessed ChIP-seq reads (Fig. 4d). Allele-specific ChIP-quantitative PCR (qPCR) assays against ZNF143 in GM12878 cells validated the predicted allelic imbalance for both SNPs (Fig. 4e,f and Supplementary Fig. 3). Consistent with ZNF143 being directly responsible for chromatin loop formation, the decreased binding of ZNF143 to the chromatin caused by the variant allele at the rs2232015 SNP leads to a corresponding allele-specific reduction of the chromatin interaction frequency measured by 3C assays between the PRMT6 promoter and a distal regulatory element ~85 kb away (Fig. 4e and Supplementary Fig. 3). Interestingly, the rs2232015 SNP modulates a portion of the ZNF143 recognition motif that is shared with THAP11 and was recently shown in vitro to be dispensable for ZNF143 binding (ref. 22). These results, while revealing that ZNF143 is required, may indicate that a complex of factors specifies chromatin interactions. Similarly, the increased binding of ZNF143 to the chromatin caused by the variant C allele of the rs13228237 SNP leads to an increase in the chromatin interaction frequency between the first intron of the ZC3HAV1 gene and two distal regulatory elements located ~200 kb away (Fig. 4f and Supplementary Fig. 3). Interestingly, this ZNF143-binding site is located ~14 kb from the transcription start site of the ZC3HAV1 gene and may represent an unknown isoform of the ZC3HAV1 gene.
Consistently, a transcription start site was predicted from 5′ cap analysis of gene expression data 89 bp from the rs13228237 SNP in GM12878 cells by the ENCODE project (Supplementary Fig. 4). Expression quantitative trait loci (eQTL) analysis of the rs2232015 and rs13228237 SNPs using RNA-seq data from lymphoblastoid cells (n = 373) (ref. 40) genotyped as part of the 1000 Genomes Project (ref. 41) reveals that ZC3HAV1 expression is modulated by the rs13228237 SNP in lymphoblastoid cells (P = 1.73 × 10⁻³; Fig. 4f). However, the rs2232015 SNP is not significantly associated with the expression of the PRMT6 gene (P = 0.063; Fig. 4e). This coincides with a repressed element and poised promoter chromatin state at the distal regulatory element looping to the PRMT6 promoter in GM12878 cells (Supplementary Fig. 5), which contrasts with the active state at regulatory elements looping to the ZC3HAV1 promoter (Supplementary Fig. 5). Interestingly, the rs2232015 SNP is in strong linkage disequilibrium (r² ≥ 0.95) with two reported eQTLs captured by the rs1762509 and rs9435441 SNPs (refs 42,43). The rs1762509 and rs9435441 SNPs lead to allele-specific expression of the PRMT6 gene within liver cells and monocytes, respectively (refs 42,43). Consistently, the interacting distal regulatory element looping to the PRMT6 promoter is in an active state within liver cells (Supplementary Fig. 5). This suggests that chromatin interactions are not sufficient to impact gene expression, as recently reported at the β-globin locus (ref. 44), and that the role of ZNF143 in loop formation is not dependent on gene transcription.
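The allelic-imbalance P values quoted above for the ChIP-seq read counts can be checked with a short calculation. The sketch below assumes a two-sided exact binomial test against an expected allele ratio of 0.5; the text does not name the test that was used, so this choice, and the helper name `two_sided_binomial_p`, are assumptions made for illustration only.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided binomial P value for k of n reads under a 50:50 null.

    Assumption: the null hypothesis is equal read support for both alleles.
    Because that null is symmetric, the two-sided P value is twice the
    probability of the smaller tail (capped at 1).
    """
    tail = min(k, n - k)
    # Sum the smaller tail with exact integer arithmetic before dividing.
    tail_count = sum(comb(n, i) for i in range(tail + 1))
    return min(1.0, 2 * tail_count / 2 ** n)


# Read counts taken from the text: 242 reference vs 136 variant reads at
# rs2232015, and 5 reference vs 20 variant reads at rs13228237.
p_rs2232015 = two_sided_binomial_p(136, 242 + 136)
p_rs13228237 = two_sided_binomial_p(5, 5 + 20)

print(f"rs2232015:  P = {p_rs2232015:.3g}")
print(f"rs13228237: P = {p_rs13228237:.3g}")
```

Under this assumption the rs13228237 counts give P of about 4.08 × 10⁻³, in line with the reported value, and the rs2232015 counts give a P value of the same order of magnitude as the one reported.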

Discussion
Cellular identity is dependent on lineage-specific transcriptional programmes set by master transcription factors acting at regulatory elements that communicate with one another through chromatin interactions (ref. 1). Recently, the ENCODE project observed well-positioned and symmetrical nucleosomes flanking the binding sites of CTCF, RAD21 and SMC3, which contrasted with the variability observed surrounding the binding sites of other transcription factors, with the exception of ZNF143 (ref. 17). In agreement with this observation representing a unique feature of chromatin-looping factors, we demonstrate that ZNF143 is required at promoters to stimulate the formation of chromatin interactions with distal regulatory elements (Fig. 5). This aligns with its reported role favouring POL2 occupancy at gene promoters (ref. 22) and in the assembly of the pre-initiation complex (ref. 23). The fact that ZNF143 is ubiquitously expressed (ref. 21) suggests that ZNF143 may be a regulator of the architectural foundations of cell identity. Although the mechanisms accounting for cell type-specific ZNF143-binding profiles are unknown, chromatin interactions were recently reported to be set early during lineage commitment (ref. 6). In agreement, ZNF143 is required for zebrafish embryo development (ref. 45), for stem cell identity and for

(E) Distribution of genome-wide orientation configurations of CBS pairs located in the boundaries between two neighboring domains in the human K562 genome. Note that the vast majority (90.0%) of boundary CBS pairs between two neighboring domains are in the reverse-forward orientation. See also Figure S4 and Tables S1, S2, S3, S4, S5, and S6.

The Human β-globin Locus Provides an Additional Example of CBS Orientation-Dependent Topological Chromatin Looping
Based on the location and orientation of CBSs, as well as their CTCF/cohesin occupancy, we identified four CCDs (domains 1-4) in the well-characterized β-globin cluster (Figure 5A). The β-globin gene cluster is located between CBS3 (5′HS5) and CBS4 (3′HS1) in domain1 (Figure 5A) (Hou et al., 2010; Splinter et al., 2006). We generated a series of CBS4/5 mutant K562 cell lines using CRISPR/Cas9 with one or two sgRNAs (Li et al., 2015) (Figures S2B and S2C). In the CRISPR cell lines D3, D7, and D19 (out of 38 clones screened) in which the internal CBS4 (3′HS1) was deleted (Figure S2B), chromatin-looping interactions between CBS3 (5′HS5) in the forward orientation and the boundary CBS5 in the reverse orientation in domain1 persisted, although the interaction of CBS3 with the CBS4 (3′HS1) region was abolished (Figures S5A and S5B). As expected, the interactions between CBS6/7 and CBS8/9 in domain2 were unchanged (Figure S5C). Strikingly, however, in the CBS4 (3′HS1) and CBS5 double-knockout CRISPR cell lines C2, C4, and C14 (out of 49 clones screened) (Figure S2C), novel chromatin-looping interactions between CBS3 (5′HS5) in the forward orientation of domain1 and CBS8/9 in the reverse orientation of the neighboring domain2 were observed, suggesting that these two domains merge into a single domain in CRISPR cell lines with the CBS4/5 double knockout (Figure S5B). Similarly, when CBS8 was used as an anchor, this reverse-oriented CBS in domain2 established new long-range chromatin-looping interactions with CBS1-3 in the forward orientation of domain1 in the CBS4/5 double-deletion CRISPR cell lines (Figure S5C).
We conclude that cross-domain interactions can be established after deletion of CBSs up to the boundary of topological domains, but not after deletion of the internal CBS in the β-globin locus.
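The deletion logic described above can be illustrated with a toy model. The sketch below is not the authors' analysis: it assumes that a new domain starts at every reverse-to-forward transition along the chromosome, and that each forward CBS can loop to every reverse CBS in its own domain. The forward orientation of CBS1-3 and the reverse orientation of CBS5 and CBS8/9 follow the text; the orientations given to CBS4 (reverse) and CBS6/7 (forward) are assumptions, only two of the four domains are modeled, and the names `CBS_SITES`, `split_into_domains`, and `predicted_loops` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Site = Tuple[str, str]  # (name, orientation), where "F" = forward and "R" = reverse

# Linear order of CBSs across domain1 and domain2 of the locus.
# Assumed orientations: CBS4 = "R", CBS6 and CBS7 = "F" (not stated in the text).
CBS_SITES: List[Site] = [
    ("CBS1", "F"), ("CBS2", "F"), ("CBS3", "F"),
    ("CBS4", "R"), ("CBS5", "R"),
    ("CBS6", "F"), ("CBS7", "F"),
    ("CBS8", "R"), ("CBS9", "R"),
]


def split_into_domains(sites: List[Site]) -> List[List[Site]]:
    """Cut the site list at every reverse-to-forward transition."""
    domains: List[List[Site]] = []
    current: List[Site] = []
    for name, orientation in sites:
        if current and current[-1][1] == "R" and orientation == "F":
            domains.append(current)
            current = []
        current.append((name, orientation))
    if current:
        domains.append(current)
    return domains


def predicted_loops(sites: List[Site],
                    deleted: FrozenSet[str] = frozenset()) -> Dict[str, List[str]]:
    """Map each remaining forward CBS to the reverse CBSs in its domain."""
    remaining = [site for site in sites if site[0] not in deleted]
    loops: Dict[str, List[str]] = {}
    for domain in split_into_domains(remaining):
        # Within a domain all forward sites precede all reverse sites,
        # so every reverse site lies downstream of every forward site.
        reverse_sites = [name for name, orientation in domain if orientation == "R"]
        for name, orientation in domain:
            if orientation == "F":
                loops[name] = reverse_sites
    return loops


wild_type = predicted_loops(CBS_SITES)
single_deletion = predicted_loops(CBS_SITES, frozenset({"CBS4"}))
double_deletion = predicted_loops(CBS_SITES, frozenset({"CBS4", "CBS5"}))

print("wild type, CBS3 ->", wild_type["CBS3"])
print("CBS4 deleted, CBS3 ->", single_deletion["CBS3"])
print("CBS4/5 deleted, CBS3 ->", double_deletion["CBS3"])
```

In this toy model, deleting the internal CBS4 leaves CBS3 paired with the boundary CBS5, whereas deleting both CBS4 and CBS5 removes the reverse-to-forward boundary, so the two domains merge and CBS3 pairs with CBS8/9. The model ignores interaction frequencies, distance, and CTCF/cohesin occupancy.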
To further test the functional significance of this organization of CBSs, we again performed CRISPR/Cas9-mediated DNA-fragment editing in HEK293T cells and screened 198 CRISPR
