Deciphering Hierarchical Chromatin Structures and Preference of Genomic Positions in Single Mouse Embryonic Stem Cells

Recent developments in the three-dimensional (3D) genome have revealed chromatin organization with hierarchical architectures at different genomic scales. Transcriptional regulators and their specific interactions implicate gene expression programs by complex chromatin structures. The exploration of single-cell 3D genome maps reveals that chromatin domains are indeed physical structures presenting in single cells and domain boundaries vary from cell to cell. However, the formation mechanism and functional implication of hierarchical chromatin structures in single cells is still blurry. To this end, we first develop a hierarchical chromatin structure identification algorithm (named as HiCS) from individual single-cell Hi-C maps, with superior performance in both accuracy and efficiency. The enrichment of hundreds of regulatory factors in domain boundaries suggests that in addition to the CTCF-cohesin complex, Polycomb, TrxG, pluripotent protein families and other multiple factors also contribute to shaping chromatin landscapes in single embryonic stem cells. We further decipher the preference of genomic positions-function relationship by grouping and annotating chromatin landscape clusters with different types of regulatory factors, and the relationship corresponds to precisely the largest five types of retrotransposons. Our results suggest that multiple types of regulatory factors interplaying with each other in specific genomic positions may navigate the preference of genomic positions, and then shape chromosome architecture and cell functions of single cells.


Introduction
optimal structural identification parameter. The decision graph of optimal structural identification them fairly, we adjust the scaling parameter of HiCS to obtain comparable domain sizes with IS and 138 deTOKI, respectively (Fig. 1e), with an example, we can see that HiCS can obtain more accurate 139 chromatin structure boundaries at single-cell resolution ( Fig. 1a and d). Actually, the insulation 140 strengths of domains obtained from HiCS is significantly higher than those of IS and deTOKI 141 respectively, suggesting its superiority to competing methods (Fig. 1f). The running time of HiCS 142 is significantly less than both algorithms under the same hardware condition (Fig. 1g). Moreover, 143 the domain boundaries detected by HiCS are more significantly enriched in multiple common 144 factors, including CTCF, H3K4me3, Housekeeping (HK) genes (Fig. 1h), as well as RNA 145 polymerase II (PolII), promoters, highly expressed genes, and average phastcon score 146 ( Supplementary Fig. S1b). Also, the boundaries detected by HiCS are more pronounced 147 disappearance of the enhancers or the super enhancers (SEs), which unfavorably form boundaries 148 in single cells (Supplementary Fig. S1b). Taken together, HiCS shows superior performance in 149 both accuracy and efficiency.

151
The existence of hierarchical chromatin structures. We adjust the optimal structural 152 identification parameters to generate multiple-scale chromatin structures at different genomic scales 153 for 1315 single mESCs at 40kb resolution. We clearly observe four peaks of domain size and 154 insulation strength distribution with different scaling parameters of 0.2, 1, 4, and 8, which we choose 155 for the downstream analysis (Fig. 2a, b and Supplementary Fig. S1a). The domain scales of these 156 four levels are approximately 200Kb~600Kb, 800Kb~1Mb, 2Mb~3Mb, and ~5Mb respectively. We

157
show an example for hierarchical chromatin structures, which well match the local insulation 158 strength of chromatin regions (Fig. 2b). Note that high-level boundaries with high may have lower 159 insulation strength than low-level ones, and the boundaries in the same level may have different 160 local insulation strengths (Fig. 2b).

161
The median number of boundaries identified by HiCS on chromosome 2 (Chr2) of single 162 mESCs is about 161.5, 54, 18, and 30 for different genomic levels (Supplementary Fig. S2a). The locations form boundaries in 5% of cells (Fig. 3a). We also observe that the probability forming 169 boundaries enhances with the levels of genomic position increasing (Supplementary Fig. S2d).  increase of levels ( Fig. 3b and c). The emergence of highly transcriptional activity, especially

231
We find that the core TFs (Oct4, Sox2, and Nanog) controlling the pluripotent state do not 232 prefer to appear in domain boundaries, which may be related to chromatin hubs occupied by super-

296
We also observe some interesting cooperative patterns among different clusters 297 (Supplementary Fig. S9b). For example, PCR1 is strongly associated with HASE&E and HTA,

302
The observation is consistent with the report in a recent study that Yy1 is a structural regulator of 303 enhancer-promoter loops, and contributes to transcription repressor associated with non-canonical 304 PRC1, whereas CTCF preferentially occupies sites distal from these regulatory elements that tend 305 to form larger loops and participate in insulation [21,29]. In addition, the CoreRF cluster is strongly 306 associated with the HASE&E cluster, but not the HTG cluster, while the HASE&E cluster is    (Fig. 4b, c and Fig. 5a). Previous studies have indicated 320 that loop extrusion executed by CTCF and cohesin is a leading factor governing domain formation 321 and facilitating chromatin folding. We find that the other subunits (Sa1 and Sa2) of cohesin which 322 occupy the same genomic locus and present similar enrichment patterns at boundaries of single cells, 323 suggesting that both Sa1 and Sa2 may also participate in the maintenance of domain boundaries 324 (Fig. 3e). ZC3H11A, a zinc finger protein, shows a uniform pattern as Sa1 and Sa2, implying its positions that AP1 proteins occupy may be directly related to the macroscopic architecture of

389
To summarize, we obtain the following key results by the above analysis: (1) The genomic and repressing factors unfavorably form boundaries in single cells with weakened insulation are specially associated with ERVK elements, which may indicate that ERVK acts as specific roles    (Fig. 4b and c). In addition, previous studies have shown that   (Fig. 5b). The observation suggests that highly activation of SEs in the phase may promote 442 gene regulation and transcription for cell growth in size, and ensure biomaterials for DNA synthesis.

443
In addition, the functional group (HTG-Positions) accompanied with highly transcribed genes 444 exhibits a significant loss of boundaries in the MS phase, which may be because the rates of 445 transcription and protein synthesis are low during DNA replication (Fig. 5b). We also observe that 446 functional groups occupied by both CTCF-cohesin and TrxG-PRC complexes prefer to form 447 boundaries in ES phases (Fig. 5b) heterochromatin organization prefers to form boundaries in both MS and LS/G2 phases (Fig. 5b).

452
Both phases may prepare for everything entering the mitosis phase with condensing chromatin states.

544
Next, we modeled each chromosome as an unweighted network (each bin is one node, and each bin 545 pair with non-zero contacts is added as one edge), and implemented a classic graph embedding 546 method node2vec, which applies a biased random walks procedure, to compute the contact 547 probability of edges by computing the cosine similarity of any two node embedding vectors, and 548 obtained the imputed matrix (Fig. 1a). We only kept the top 5% pairs for downstream analysis 549 and the diagonal pairs were removed in our study.

551
Detect domain boundaries for each individual cell. Inspired by a fast density-based clustering 552 method designed for grouping data points [47,48], we take advantage of finding the cluster centers 553 to detect domain boundaries for each chromosome of individual cells (Fig. 1b and c). Specifically,

574
Determine the hierarchical chromatin structures. We ran the above procedures of detecting 575 domain boundaries multiple times by define ( ) > α × ( ) as a screening selection for different 576 genomic levels, where the range of α are set as (0.1, 10) (Fig. 2a). In our study, we selected multiple 577 scaling parameters α as {0.2, 1, 4, 8} to obtain the hierarchical structures of chromatin, according 578 to the distribution of the domain sizes and insulation strengths under different scaling parameters 579 ( Fig. 2a and b).

581
Decipher the categories of regulatory factors and spatially organized chromatin landscapes. We

586
We further applied the hierarchical clustering method to merge the 27 classes into 7 large 587 clusters, based on the Pearson correlation of the normalization ratio of the mean counts for peaks of 588 regulatory factor classes (Supplementary Fig. S9b). These clusters or sub-clusters were manually 589 annotated based on the annotation information (Supplementary Table S2). We then merged   specific genes and maintain self-renewal in mouse embryonic stem cells. Science