Discovery of biased orientation of human DNA motif sequences affecting enhancer-promoter interactions and transcription of genes

Chromatin interactions play important roles in enhancer-promoter interactions (EPI) and in regulating gene transcription. CTCF and cohesin proteins are located at the anchors of chromatin interactions, where they form loop structures. CTCF has an insulator function that confines the activity of enhancers within the loops. The DNA binding sequences of CTCF show an orientation bias at chromatin interaction anchors: the forward-reverse (FR) orientation is frequently observed. DNA binding sequences of CTCF were found in open chromatin regions at about 40%-80% of chromatin interaction anchors in Hi-C and in situ Hi-C experimental data. It has been reported that long-range chromatin interactions tend to include less CTCF at their anchors. It is still unclear which proteins are associated with chromatin interactions. To find DNA binding motif sequences of transcription factors (TF), such as CTCF, that affect the interaction between enhancers and promoters of transcriptional target genes and their expression, I predicted human transcriptional target genes of TF bound in open chromatin regions in enhancers and promoters in monocytes and other cell types using experimental data in public databases. Transcriptional target genes were predicted based on enhancer-promoter association (EPA). EPA was shortened at the genomic locations of FR-oriented DNA binding motifs of TF, which were presumed to lie at chromatin interaction anchors. The expression level of the target genes predicted based on EPA was compared with that of target genes predicted from promoters only. In total, 351 biased orientations of DNA motifs (192 forward-reverse (FR) and 179 reverse-forward (RF); because the reverse complement sequences of some DNA motifs are also registered in databases, the total is smaller than the sum of FR and RF) significantly affected the expression level of putative transcriptional target genes in monocytes of four people in common.
Moreover, EPI predicted using the FR or RF orientation of some DNA motifs overlapped with chromatin interaction data (Hi-C) more than the other EPA did (62 biased orientations of DNA motifs in total, 41 FR and 24 RF, showed this result).
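To make the FR and RF terminology concrete, the following is a minimal sketch of how adjacent motif sites might be labelled by orientation. It is not the analysis pipeline used in this study: the `MotifSite` record, the function names, and the toy coordinates are all hypothetical, and the sketch assumes that a site on the plus strand counts as forward (F) and a site on the minus strand as reverse (R).

```python
from collections import Counter
from typing import NamedTuple


class MotifSite(NamedTuple):
    """Hypothetical record for one motif occurrence (not from the paper)."""
    chrom: str
    start: int
    strand: str  # "+" is treated as forward (F), "-" as reverse (R)


def pair_orientation(upstream: MotifSite, downstream: MotifSite) -> str:
    """Label a pair of sites as FR, RF, FF, or RR from their strands."""
    codes = {"+": "F", "-": "R"}
    return codes[upstream.strand] + codes[downstream.strand]


def count_adjacent_orientations(sites: list[MotifSite]) -> Counter:
    """Count orientation labels over neighbouring sites on the same chromosome."""
    ordered = sorted(sites, key=lambda s: (s.chrom, s.start))
    counts: Counter = Counter()
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom == right.chrom:  # never pair sites across chromosomes
            counts[pair_orientation(left, right)] += 1
    return counts


# Toy input: three sites on chr1 and one on chr2.
sites = [
    MotifSite("chr1", 100, "+"),
    MotifSite("chr1", 500, "-"),
    MotifSite("chr1", 900, "+"),
    MotifSite("chr2", 50, "-"),
]
counts = count_adjacent_orientations(sites)
print(dict(counts))
```

In this toy input the first two chr1 sites form an FR pair and the second and third form an RF pair; the chr2 site has no neighbour on its chromosome and is not paired.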

HUVEC and MCF-7 respectively, including CTCF and cohesin (RAD21 and SMC3) (Table 2; Supplemental Table S2). The scores of DNA binding motif sequences were the highest in monocytes, and the other cell types showed lower scores. The results of the analysis of DNA motif sequences in CD20+ B cells and macrophages did not include CTCF and cohesin, because these analyses can be utilized in cells where the expression level of putative transcriptional target genes of each transcription factor shows a significant difference between promoters and EPA shortened at the genomic locations of [...] difference between promoters and EPA (Osato 2018).
Instead of DNA binding motif sequences of transcription factors, DNA repeat sequences were also examined. The expression level of transcriptional target genes predicted based on EPA shortened at the genomic locations of DNA repeat sequences was compared with the expression level of transcriptional target genes predicted from promoters. Three DNA repeat sequences in the reverse-forward orientation showed a significant difference in the expression level of putative transcriptional target genes in monocytes of four people in common (Table 3).
The number of open chromatin regions with the same pairs of DNA binding motif sequences was counted, and when the pairs of DNA binding motif sequences were enriched with statistical significance (chi-square test, p-value < 1.0 × 10⁻¹⁰), they were listed ([...]
[...] including CTCF and cohesin showed a higher ratio of EPI overlapped with Hi-C than the other types of EPA in monocytes (Table 5).
Target genes of a transcription factor were assigned when its TFBS was found in DNase-seq narrow peaks in promoter or extended regions for enhancer-promoter association of genes (EPA). Promoter and extended regions were defined as follows: promoter regions were those that were within distances of ±5 kb from transcriptional start sites (TSS).
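The promoter part of this definition can be illustrated with a short sketch. It assumes a single chromosome, treats each TFBS as a single coordinate, and assumes the TFBS list has already been restricted to sites inside DNase-seq narrow peaks; the function name, gene names, and coordinates are hypothetical, and the extended regions used for EPA are not modelled here.

```python
from bisect import bisect_left, bisect_right

PROMOTER_HALF_WIDTH = 5_000  # ±5 kb around the TSS, as stated in the text


def promoter_target_genes(tss_by_gene, tfbs_positions, half_width=PROMOTER_HALF_WIDTH):
    """Return genes whose promoter window contains at least one TFBS.

    Assumption: all coordinates are on one chromosome, and tfbs_positions
    already holds only sites that lie in open chromatin peaks.
    """
    positions = sorted(tfbs_positions)
    targets = []
    for gene, tss in tss_by_gene.items():
        # Indices of the TFBS positions falling in [tss - half_width, tss + half_width]
        lo = bisect_left(positions, tss - half_width)
        hi = bisect_right(positions, tss + half_width)
        if hi > lo:
            targets.append(gene)
    return targets


# Toy data with hypothetical gene names and coordinates.
tss_by_gene = {"GENE_A": 10_000, "GENE_B": 50_000, "GENE_C": 120_000}
tfbs_positions = [6_200, 44_999, 125_000]
targets = promoter_target_genes(tss_by_gene, tfbs_positions)
print(targets)
```

Here GENE_A and GENE_C are assigned as targets, while GENE_B is not, because its nearest site lies one base outside the ±5 kb window. The window is treated as inclusive at both ends, which is a choice of this sketch rather than something the text specifies.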
Promoter and extended regions were defined as per the following association rule, which is the same as that defined in Figure 3A
The Human β-globin Locus Provides an Additional Example of CBS Orientation-Dependent Topological Chromatin Looping

Based on the location and orientation of CBSs, as well as their cohesin occupancy, we identified four CCDs (domains [...] the well-characterized β-globin cluster (Figure 5A). The β-globin gene cluster is located between CBS3 (5′HS5) and [...]

(E) Distribution of genome-wide orientation configurations of CBS pairs located in the boundaries between two neighboring domains in the human K562 genome. Note that the vast majority (90.0%) of boundary CBS pairs between two neighboring domains are in the reverse-forward orientation. See also Figure S4 and Tables S1, S2, S3, S4, S5, and S6.

[...] 2015) (Figures S2B and S2C). In the CRISPR cell lines D3, D7, and D19 (out of 38 clones screened) in which the internal CBS4 (3′HS1) was deleted (Figure S2B), chromatin-looping interactions between CBS3 (5′HS5) in the forward orientation and the boundary CBS5 in the reverse orientation in domain1 persisted, although its interaction with the CBS4 (3′HS1) region was abolished (Figures S5A and S5B). As expected, the interactions between CBS6/7 and CBS8/9 in domain2 were unchanged (Figure S5C). Strikingly, however, in the CBS4 (3′HS1) and CBS5 double-knockout CRISPR cell lines C2, C4, and C14 (out of 49 clones screened) (Figure S2C), novel chromatin-looping interactions between CBS3 (5′HS5) in the forward orientation of domain1 and CBS8/9 in the reverse orientation of the neighboring domain2 were observed, suggesting that these two domains merge as a single domain in CRISPR cell lines with CBS4/5 double knockout (Figure S5B). Similarly, when CBS8 was used as an anchor, this reverse-oriented CBS in domain2 establishes new long-range chromatin-looping interactions with CBS1-3 in the forward orientation of domain1 in the CBS4/5 double-deletion CRISPR cell lines (Figure S5C).
We conclude that cross-domain interactions can be established after deletion of CBSs up to the boundary of topological domains, but not after deletion of the internal CBS in the β-globin locus.
To further test the functional significance of this organization of CBSs, we again performed CRISPR/Cas9-mediated DNA-fragment editing in the HEK293T cells and screened 198 CRISPR [...]

Cell 162, 900-910, August 13, 2015 ©2015 Elsevier Inc.

[...] orientation exist in >60% neighboring TAD boundaries (Figure S4D), suggesting that the boundary reverse-forward CBS pairs play an important role in the formation of most TADs. For example, there is a CBS pair in the reverse-forward orientation in a Chr12 genomic region of H1-hESC cells, located at or very close to each of the six TAD boundaries (boundaries 1-6), except for boundary 5, which has only one closely located CBS in the forward orientation (Figure S4E). These data, taken together, strongly suggest that directional binding of CTCF to boundary CBS pairs in the reverse-forward orientations causes opposite topological looping and thus appears to function as insulators.