Conservation of dichromatin organization along regional centromeres

The focal attachment of the kinetochore to the centromere is essential for genome maintenance, yet the highly repetitive nature of satellite regional centromeres, such as those in humans, limits our understanding of their chromatin organization. We demonstrate that single-molecule chromatin fiber sequencing (Fiber-seq) can uniquely co-resolve kinetochore and surrounding chromatin architectures along point centromeres, revealing largely homogeneous single-molecule kinetochore occupancy along each chromosome. In contrast, extension of Fiber-seq to regional satellite centromeres exposed marked per-molecule heterogeneity in their chromatin organization. Regional CENP-A-marked centromere cores uniquely contain a dichotomous chromatin organization (dichromatin) composed of compacted nucleosome arrays punctuated with highly accessible chromatin patches. CENP-B occupancy phases dichromatin to the underlying alpha-satellite repeat within centromere cores, but is not necessary for dichromatin formation. Centromere core dichromatin is a conserved feature between humans despite the marked divergence of their underlying alpha-satellite organization and is similarly a conserved feature along regional centromeres that lack satellite repeats in gibbon. Overall, the chromatin organization of regional centromeres is defined by marked per-molecule heterogeneity, likely buffering kinetochore attachment against sequence and structural variability within regional centromeres. Highlights Dichotomous accessible and compacted chromatin (dichromatin) marks centromere cores Highly accessible chromatin patches punctuate sites of kinetochore attachment Dichromatin can form irrespective of CENP-B occupancy Conservation within centromeres is mediated at the level of chromatin, not DNA


Introduction
Centromeres can range from 'point' centromeres, which contain a single well positioned CENP-A nucleosome, to 'regional' centromeres, which include numerous CENP-A nucleosomes often embedded within tandemly repeated sequences.For example, Saccharomyces cerevisiae centromeres are comprised of ~125 bp point centromeres built on a single CENP-A nucleosome, whereas human centromeres are comprised of regional centromeres that occupy ~171 bp alphasatellite repeat units organized into higher-order repeats (HORs) that can span several megabases on each chromosome.These alpha-satellite HORs serve as the genetic substrate for kinetochore attachment to the human genome within the centromere core 1,2 , and kinetochore interactions within the centromere core are modulated by several DNA-binding proteins, including the sequence-specific DNA binding protein CENP-B 3,4 and the histone H3 variant CENP-A 5 .
Imaging and genomic studies have demonstrated that kinetochore attachment is limited to only a small portion of the alpha-satellite HOR array, which is marked by CENP-A and CENP-B occupancy along the autosomes and X chromosome 6,7 and hypo-CpG methylation [8][9][10] (i.e., the centromere core or 'centromere dip region [CDR]').However, in vivo chromatin compaction and organization within this region remains largely unresolved, with somewhat contradictory chromatin features appearing to localize to the centromere core.For example, the centromere core contains a unique patterning of histones referred to as 'centro-chromatin', which is comprised of both centromere-specific CENP-A nucleosomes, as well as H3 nucleosomes dimethylated on Lys4 (H3K4me2) 11 that are often associated with euchromatic portions of the genome 12 .Despite the presence of these H3K4me2 histone, the centromere core is thought to contain condensed chromatin arrays in vivo 13 , which are proposed to arise from CENP-A mediated nucleosome condensation 14 as well as other kinetochore proteins bridging and compacting neighboring nucleosomes [15][16][17] .Resolving exactly how chromatin is organized along individual chromatin fibers within the centromere as well as the relationship between kinetochore binding and chromatin organization has been challenging due to the highly repetitive DNA content of alpha-satellite HORs 13 .Specifically, centromeres are systematically unresolved in most reference genomes, and shortread sequencing-based chromatin profiling methods cannot uniquely resolve chromatin architectures across these repetitive DNA arrays (Figure S1A).In addition, short-read sequencing-based chromatin profiling methods are inherently unable to resolve how multiple chromatin features are positioned along a single chromatin fiber, as these methods inherently massively fragment chromatin fibers in the process of studying them.The completion of the first telomere-to-telomere (T2T) human reference genome 18 in combination with emerging methyltransferase-based single-molecule long-read chromatin profiling methods 19- 22 opens the possibility of studying chromatin features across centromeres at single-molecule and single-nucleotide resolution.Specifically, non-specific adenine methyltransferase (m6A-MTase)based chromatin profiling methods (i.e., Fiber-seq) enable nucleotide-precise mappings of chromatin accessibility, protein occupancy, nucleosome positioning, and CpG methylation along multi-kilobase chromatin fibers 19,23 (Figure S1B and S1C).Fiber-seq utilizes a non-specific m6A-MTase 24 to stencil the chromatin architecture of individual multi-kilobase fibers onto their underlying DNA templates via methylated adenines (Figure 1A), which is a non-endogenous DNA modification in humans 25,26 .The genetic and chromatin architecture of each fiber is directly read using highly accurate single-molecule PacBio HiFi long-read DNA sequencing, which is capable of accurately distinguishing m6A-and mCpG-modified bases at single-nucleotide resolution 23,27 .

Single-molecule chromatin architectures of kinetochore-bound point centromeres
To map the chromatin architecture of centromeres at single-molecule and near single-nucleotide resolution, we first applied Fiber-seq to budding yeast Saccharomyces cerevisiae (Fig. 1A), which contain a point centromere along each chromosome 28,29 .Similar to humans, the yeast inner kinetochore directly interacts with DNA via a centromere-associated inner kinetochore (CCAN) complex, which contains the histone H3 variant CENP-A (Cse4) 28 in addition to the yeast-specific centromere binding factor 1 (Cbf1) 30 and the yeast-specific multiprotein complex CBF3 31 .Each yeast centromere is organized into three centromere DNA elements (CDEs) (CDEI, CDEII, and CDEIII), and cryo-EM structures and nuclease digestions of the yeast CCAN complex have shown that the yeast CCAN embeds ~160 bp of DNA within this complex 32,33 .Application of Fiber-seq to asynchronously growing Saccharomyces cerevisiae cells revealed a discrete change in chromatin structure immediately overlapping the CDEI-III elements (Fig. 1B, 1C).Specifically, >90% of chromatin fibers overlapping the yeast centromere contained a well-positioned MTase-protected region overlapping the CDEI-III elements.This MTase-protected region was significantly larger than the standard nucleosome footprint size observed genome-wide, perfectly aligned with the cryo-EM structure of the yeast inner kinetochore (Fig. 1D), and was disrupted upon Cse4 destabilization 34 .Notably, yeast centromeres were frequently flanked by MTase sensitive patches (MSP) of chromatin that mirror the size of traditional gene regulatory elements (Fig. 1B, 1C), indicating that the yeast CCAN is associated with marked alterations in the local chromatin architecture.Together, these findings reveal that Fiber-seq can resolve the chromatin architecture of the CCAN at single-molecule and near single-nucleotide resolution and that individual fibers demonstrate homogeneous CCAN occupancy that frequently abuts patches of accessible chromatin.

Dichotomous chromatin marks the centromere core in humans
We next sought to investigate the chromatin architecture of human centromeres by applying Fiber-seq to the human hydatidiform mole CHM13 cell line 35 , the same line used for constructing the first human telomere-to-telomere (T2T) reference genome 9,18 .We observed that coupling long-read epigenome maps with paired fully sequenced and assembled genome maps from the same individual enabled us to uniquely map the chromatin architecture of all centromeres in CHM13 cells (Figure 2), overcoming the substantial diversity between individuals 36 that is enriched within these highly repetitive genomic regions.Notably, in stark contrast to the yeast point centromeres, CHM13 regional centromeres lacked homogeneously positioned chromatin (Fig. 2A).However, similar to yeast, CHM13 centromere cores contained markedly altered chromatin structures not observed elsewhere in the human genome.Specifically, the nucleosome architecture of CHM13 centromere cores diverged from that observed elsewhere in the human genome, containing mononucleosome footprints markedly smaller than the average nucleosome footprint size outside of the centromere core (Figure S1D), consistent with a large population of these centromere core mononucleosomes containing CENP-A, which is known to wrap only 121 bp of DNA in humans 13,37,38 .In addition, the centromere core contained some of the most compacted nucleosome arrays within the human genome, with nearly 50% of all nucleosome footprints within the centromere core being di-nucleosomal in size (i.e.>210 bp), consistent with prior observations 39 (Fig. 2B and 2C).Notably, similar to yeast point centromeres, regional centromere cores in humans were also markedly enriched for large MTase-sensitive patches of chromatin, forming some of the most accessible chromatin domains within the entire human genome (Figures 2B and 2C).However, in stark contrast to traditional heterochromatin or euchromatin domains, individual chromatin fibers originating from the centromere core contained both tightly compacted nucleosome arrays and highly accessible chromatin features abutting each other (Figures 2A and 2C).Accessible chromatin patches were infrequently observed within alpha-satellite regions outside of the centromere core, indicating that they are a unique feature of kinetochore binding.Although these accessible chromatin patches were heterogeneously placed across individual chromatin fibers within the centromere core, these patches nonetheless clustered along individual chromatin fibers within the centromere core (Figure 2D).In addition, each chromosome molecule containing only 6-28 accessible chromatin patches within its 80-220 kbp centromere core (Figure 2E), a value that mirrors estimates of the number of kinetochore microtubules that form on individual human centromeres 40,41 .Notably, longer chromosomes have significantly more accessible chromatin patches than shorter chromosomes, and after controlling for chromosome length, we observed that chromosomes with the highest rate of missegregation in human RPE-1 cells 42 had among the lowest amount of accessible chromatin patches (Figure 2E).Together, these findings establish that the centromere core in human CHM13 cells contains a dichotomous chromatin organization (i.e.dichromatin) not found elsewhere in the genome, which is characterized by highly accessible chromatin patches that likely directly relate to kinetochore attachment and function within the centromere core.

CENP-B guides centromere core dichromatin to mirror the alpha-satellite DNA repeat
We next sought to investigate the impact that alpha-satellite repeat sequence has on dichromatin formation within CHM13 centromere cores.The 170 bp alpha-satellite repeat is known to preferentially position CENP-A containing nucleosomes both in vitro and in vivo 38,43 , so we first sought to identify whether we could observe this preferential positioning along extended alphasatellite arrays using Fiber-seq data.To identify the most common represented spacing of m6Amodified bases (i.e., the chromatin/nucleosome repeat unit length), we applied a Fourier Transform to m6A-modified bases present along each fiber.We observed that chromatin fibers originating from euchromatic and non-alpha satellite heterochromatic genomic regions have nucleosomes repeating every ~179-190 bp (Figures 3A, 3B and S2), consistent with prior reports 44,45 .In contrast, centromere core alpha-satellite regions contained a nucleosome repeat unit of 170 bp, mirroring the underlying repeat length of alpha-satellite DNA (Figure 3A and 3B).This reveals that the predominant positioning of nucleosome arrays within the centromere core is directly related to the size of the underlying alpha-satellite repeat.However, inactive alphasatellite HORs as well as divergent/monomeric alpha-satellite HORs contained a nucleosome repeat unit of 190-193 bp (Figure 3B), indicating that the alpha-satellite repeat in and of itself is not sufficient for enabling chromatin to mirror the underlying DNA repeat.In addition to CENP-A, alpha-satellite repeats within centromere cores in humans are also bound by the sequence-specific DNA binding protein CENP-B, which occupies a non-palindromic A/Trich 17 bp CENP-B box 3,4 that predominantly punctuates alternating repeats within alpha-satellite HORs 46 (Figures 3C and S3A).CENP-B occupancy is known to induce neighboring nucleosome positioning along these repeats in vitro 47,48 , and CpG methylation is a known modifier of CENP-B affinity and occupancy in vitro 49,50 .Given these features, we next sought to determine whether CENP-B occupancy could be playing a role in the phasing of dichromatin to the alpha-satellite repeat within the centromere core.CENP-B occupancy along the CENP-B box is readily resolved in vitro using DNaseI footprinting 47 , and we similarly found that CENP-B occupancy can be resolved in vivo using Fiber-seq (Figures 4A and 4B).Specifically, CENP-B boxes located within the centromere core demonstrate occupancy with an MTase-footprint that mirrors the in vitro CENP-B DNaseI footprint and the protein-DNA contacts within the crystal structure of the CENP-B-DNA complex 3,47,51 (Figures 4B).Furthermore, consistent with prior in vitro findings 49,50 , we observed that CpG methylation at even one of the 2 CpG dinucleotides within the CENP-B box largely abrogates CENP-B occupancy within the centromere core in vivo (Figures 4B, 4C and  S3B), indicating that CpG methylation plays a dominant role in modulating CENP-B occupancy within the centromere core.Notably, we found that CENP-B boxes that lack CpG methylation within the centromere core preferentially overlap accessible chromatin patches and position nucleosomes immediately adjacent to them (Figures S3C and S3D), thereby organizing dichromatin within the centromere core relative to the underlying alpha-satellite DNA repeat.However, this feature was limited to the centromere core, as CENP-B boxes outside of the centromere core were largely unoccupied irrespective of their CpG methylation status (Figure 4C), with weak CENP-B occupancy only observed at sporadic unmethylated CENP-B boxes located within alpha-satellite HORs flanking the centromere core, a pattern consistent with prior imaging studies 7 .Together, these findings suggest a model whereby CENP-B occupancy at hypo-CpG methylated CENP-B boxes within the centromere core is organizing chromatin architecture to mirror the underlying alpha-satellite repeat.To disentangle the role of CENP-B in dichromatin formation, we evaluated chromatin architectures along the Y chromosome centromere, which is formed from alpha-satellite repeats that lack CENP-B boxes 52 .Application of Fiber-seq to 46,XY GM24385 cells followed by mapping to the complete GM24385 reference genome (i.e., HG002) 10 demonstrated a marked enrichment in both compacted di-nucleosome footprints, as well as large patches of accessible chromatin within the Y-chromosome centromere core (Figures 4D and S3E).However, in contrast to studies using short-read sequencing method 38 , higher order chromatin organization within the centromere core region of chromosome Y did not mirror the underlying alpha-satellite repeat unit, but rather was a continuation of the canonical chromatin repeat unit observed throughout the rest of the HG002 genome (Figure 4E).Together, these findings indicate that the phasing of chromatin architectures along the alpha satellite HOR is intricately intertwined with that of CENP-B occupancy, and that dichromatin formation within these regions can form irrespective of CENP-B.Centromere core dichromatin is conserved across highly divergent regional centromeres We next sought to evaluate the conservation of centromere core dichromatin architecture between multiple human genomes, as the underlying DNA sequence and structure of each centromere can markedly differ between individuals 36 .To accomplish this, we applied Fiber-seq to CHM1 and GM24385 cells, as a reference genome for all centromeres within both of these cells was recently resolved 36 .Notably, centromere core regions (i.e., hypo-CpG methylated regions) within each centromere markedly diverge in both their sequence content and location when compared to CHM13 cells (Figures 5A, 5B and S4).However, despite the sequence divergence of centromere cores within CHM13 cells, or between CHM13 cells and CHM1 and GM24385 cells, these centromere core regions are still marked by dichromatin -exhibiting both compacted di-nucleosome footprints as well as large patches of accessible chromatin intermingled along the same chromatin fiber (Figures 5B, 5C and S4A).In addition, the higher order chromatin organization along the autosomes and X chromosome within CHM1 and GM24385 cells (i.e.centromeres that contain CENP-B boxes) similarly displayed a chromatin repeat unit of 170 bp selectively within centromere cores (Figure S4B and S4C).Together, these findings demonstrate that dichromatin is a conserved feature of chromatin compaction within the centromere core in humans.To understand if dichromatin is a feature of regional centromeres that lack alpha-satellite repeats, we next applied Fiber-seq to resolve chromatin architectures within gibbon centromeres.Gibbons have undergone rapid karyotypic evolution accompanied by the inactivation of many ancestral centromeres and the formation of evolutionary new centromeres 53 that are largely composed of transposable elements that lack CENP-B boxes and alpha-satellite repeats 54 .We first assembled and validated the sequence of five centromeres from the eastern hoolock gibbon (Hoolock leuconedys) by applying Oxford Nanopore (ONT) sequencing, Hi-C, Fiber-seq, and CENP-A CUT&RUN to a 38,XX lymphoblastoid cell line derived from Betty (Figure 6A).These centromeres contained hypo-CpG methylated regions corresponding to sites of CENP-A occupancy (Figure 6B and S5).However, the sequence of these centromere cores was largely derived from transposable elements, not alpha satellite repeats.Notably, despite the lack of alpha-satellite repeats within these centromere cores, these regions were still marked by dichromatin -exhibiting both compacted di-nucleosome footprints as well as clustered patches of accessible chromatin intermingled along the same chromatin fibers (Figure 6B, 6C and S5E).Unlike human autosomes, the higher order chromatin organization within the eastern hoolock gibbon centromere core regions appeared to be a continuation of the canonical chromatin repeat unit observed within flanking regions, albeit less organized owing to the dichromatin features (Figure 6D).Together, these findings demonstrate that dichromatin is a conserved feature of centromere core chromatin compaction irrespective of the underlying DNA sequence.

Discussion
We demonstrate that single-molecule chromatin fiber sequencing (Fiber-seq) enables the delineation of chromatin architectures within point and regional centromeres at single-molecule and single-nucleotide resolution.Application of Fiber-seq to point centromeres in yeast, as well as regional centromeres in four human and non-human primate samples with complete centromere assemblies exposed a unique form of chromatin compaction within regional centromere cores that is not observed elsewhere along the genome.Specifically, regional centromere cores simultaneously contain both the most accessible and compacted chromatin in the genome, with these dichotomous chromatin features both present along individual chromatin fibers.In addition, unlike traditional euchromatic regulatory elements, accessible chromatin patches within the centromere core are heterogeneously positioned across individual chromatin fibers.These dichotomous features of chromatin compaction within the centromere core diverge from the standard model of heterochromatin and euchromatin compaction and is hence termed 'dichromatin'.These accessible chromatin patches that punctuate the centromere core of regional centromeres are also a feature of point centromeres, where they create a highly plastic chromatin topology at the site of kinetochore attachment.Within regional centromeres, clustered accessible chromatin patches disrupt the ability of compacted nucleosome arrays to extend beyond ~1 kb in length, in stark contrast to the chromatin surrounding regional centromeres core, which is marked by tightly compacted nucleosome arrays that often stretch for >20 kb in length 17 .Centromeres comprise the most rapidly evolving DNA sequences in eukaryotic genomes 59 , and the marked heterogeneity of chromatin organization within the centromere core likely enables kinetochore attachment to be particularly immune to sequence and structural variability within centromeres.Consistent with this, we find that the dichromatin architecture of centromere cores is highly conserved between individuals, despite the marked divergence of the underlying alphasatellite organization.Furthermore, dichromatin can assemble within centromere cores that lack alpha-satellite repeats, indicating that functional conservation within centromeres is mediated at the level of chromatin, not DNA.These findings are consistent with the centromere paradox 58 , and raise additional questions as to how the centromere core remains faithfully localized to a specific region along the genome despite its predilection to form along divergent sequences.(E) Estimated total number of accessible chromatin patches along each chromosome's centromere core versus the length of that chromosome.Chromosomes with high rates of missegregation 35 are in red.See also Figure S1.(D) Box-and-whisker plots of the chromatin repeat lengths from centromere core and flanking regions.Below are indicated the peak chromatin repeat unit as well as the medial spectral density at that repeat unit showing that chromatin is more disorganized within the centromere core.See also Figure S5.

STAR METHODS
Single-molecule chromatin fiber sequencing CHM13 cells were grown in AmnioMax C-100 Basal Medium (Invitrogen 12558011), which includes the AmnioMAX™ C-100 Supplement to approximately 90% confluency at 37°C and 5% CO 2 .Growth media was changed daily, and cells were split using 0.25% trypsin (GIbco 25200056).One million CHM13 cells per sample were pelleted at 250 x g for 5 minutes (4 samples were processed in parallel), and washed once with PBS and then pelleted again at 250 x g for 5 minutes.Each cell pellet was resuspended in 60 μl Buffer A (15 mM Tris, pH 8.0; 15 mM NaCl; 60 mM KCl; 1mM EDTA, pH 8.0; 0.5 mM EGTA, pH 8.0; 0.5 mM Spermidine) and 60 μl of cold 2X Lysis buffer (0.1% IGEPAL CA-630 in Buffer A) was added and mixed by gentle flicking then kept on ice for 10 minutes.Samples were then pelleted at 4°C for 5 min at 350 x g and the supernatant was removed.The nuclei pellets were gently resuspended individually with wide bore pipette tips in 57.5 μl Buffer A and moved to a 25°C thermocycler.1 μl of Hia5 MTase (200 U) 19 and 1.5 μl 32 mM S-adenosylmethionine (NEB B9003S) (0.8 mM final concentration) were added, then carefully mixed by pipetting the volume up and down 10 times with wide bore tips.The reactions were incubated for 10 minutes at 25°C then stopped with 3 μl of 20% SDS (1% final concentration) and transferred to new 1.5 mL microfuge tubes.The sample volumes were adjusted to 100 μl by the addition of 37 μl PBS.To that, 500 μl of HMW Lysis Buffer A (Promega Wizard HMW DNA Extraction Kit A2920) and 3 μl RNAse A (ThermoFisher EN0531) were added.The tubes were mixed by inverting 7 times and incubated 15 minutes at 37°C.20 μl of Proteinase K (Promega Wizard HMW DNA Extraction Kit A2920) was added and the samples mixed by inverting 10 times and incubated 15 minutes at 56°C followed by chilling on ice for 1 minute.Protein was precipitated by the addition of 200 μl Protein Precipitation Solution (Promega Wizard HMW DNA Extraction Kit A2920).Using a wide bore tip the samples were mixed by drawing up contents from the bottom of the tube and then expelled on the side of the tube 5 times.Tubes were centrifuged at 16,000g for 5 minutes.The supernatant was poured into a new 1.5 ml tube containing 600 μl isopropanol.This was mixed by gentle inversion 10 times, incubated 1 minute at room temperature then inverted an additional 10 times.DNA was precipitated by centrifugation at 16,000g for 5 minutes.The supernatant was decanted and the pellet washed with the addition of 70% ethanol and centrifuged again at 16,000 g 5 minutes.After the supernatant was decanted, a quick spin was performed to facilitate removal of any residual ethanol.Open tubes were air dried on the bench for 15 minutes.The DNA was resuspended by the addition of 25 μl 10mM Tris pH 8.0.Tubes were stored overnight in 4°C and the next day were mixed gently by using a wide bore pipette tip, drawing up the sample 5 times and gently expelling the contents on the side of the tube.Samples were then stored at -80°C prior to library construction.DNA shearing, library construction, and PacBio Sequel II sequencing were performed as previously described 20 with the exception that 15-20 kb fragments were targeted when shearing using the Megaruptor (Diagenode Diagnostics).In addition, we performed a high-pass size selection of the SMRTbell library using the Sage Science PippinHT platform (Sage Science cat.no.ELF0001) according to the manufacturer's protocol, using a high-pass cutoff of 10-15 kb to target an average library size of 17-20 kb.Fiber-seq on CHM1 cells was performed in a similar manner, and CHM1 cells were similarly grown in AmnioMax C-100 Basal Medium (Invitrogen 12558011), which includes the AmnioMAX™ C-100 Supplement.Fiber-seq on GM24385 and HLE cells were performed in a similar manner with the exception that cells were grown in the following conditions.GM24385 cells were grown in RPMI-1640 supplemented with 2mM L-Glutamine, 15% fetal bovine serum (FBS) and 1% Penicillin/Streptomycin. HLE cells were grown in RPMI-1640 supplemented with 10% FBS, 1% Pen/Strep, 1% l-glutamine, 1% non-essential amino acids, and 1% sodium pyruvate.

Yeast strains, yeast Fiber-seq and gDNA extraction:
WT Saccharomyces cerevisiae cells (SBY3, W303 background) were grown in 10 mL of YPD media until mid-log phase and harvested by centrifugation at 3000 x g for 5 minutes.Cells were washed once with cold dH2O and resuspended in cold KPO4/Sorbitol buffer (1 M Sorbitol; 50 mM Potassium phosphate, pH 7.5; 5 mM EDTA, pH 8.0) supplemented with 0.167% β-Mercaptoethanol.Cells were spheroplasted by addition of 0.15 µg/mL final concentration of Zymolyase T100 (Amsbio 120493-1) and incubated at 23 °C for 15 minutes on a roller drum.Spheroplasts were pelleted at 300 x g for 8 minutes at 4 °C, washed twice with cold 1 M Sorbitol, and resuspended in 58 µL of Buffer A (1 M Sorbitol; 15 mM Tris-HCl, pH 8.0, 15 mM NaCl; 60 mM KCl; 1 mM EDTA, pH 8.0; 0.5 mM EGTA, pH 8.0; 0.5 mM Spermidine; 0.075% IGEPAL CA-630).Spheroplasts were treated with 1 µL of Hia5 MTase (200 U) and 1.5 µL of 32 mM Sadenosylmethionine (NEB B9003S) for 10 minutes at 25 °C.Reaction was stopped by addition of 3 µL of 20% SDS (1% final concentration) and high molecular weight DNA was purified using the Promega Wizard® HMW DNA extraction kit (A2920).Yeast gDNA samples were quantified using a Qubit dsDNA HS Assay according to the manufacturer's protocol.Samples were sheared to 10-15 kb with two passes through a Covaris gTUBE at 3200rpm for 2 minutes, or until most sample was passed, per pass in an Eppendorf 5424R centrifuge.∼50 μL per sample was recovered and barcoded using the SMRTbell® prep kit 3.0 with the SMRTbell adapter index plate 96A, according to the protocol for each kit with adjustment for larger sample volume.Samples were sequenced by UW PacBio Sequencing Services on a Sequel II SMRT Cell 8M with a 30-hour movie.

Mapping and identifying single-molecule chromatin Features
Unaligned subreads were first converted to HiFi reads using the ccs -hifi-kinetics.The subsequent unaligned bam files containing HiFi reads were first aligned to the appropriate reference genome using pbmm2 -preset CCS.Fiber-seq reads from CHM13 cells were aligned to CHM13 v1.1 (identical to CHM13v2 without chrY).Fiber-seq reads from CHM1 cells were aligned to chm1_cens_v21.fa.Fiber-seq reads from GM24385/HG002 were aligned to HG002v1.0.1.Fiber-seq reads from the eastern hoolock gibbon (Hoolock leuconedys) lymphoblastoid cell line (HLE) were aligned to a de novo genome assembly produced from that cell line as described below.Yeast reads were aligned to the sacCer3 genome.Single-molecule m6A events were called along single molecules using fibertools-rs predict-m6A using the default parameters.Single-molecule mCpG events were identified using PacBio's suite of tools.Given the later release of jasmine we used primrose(v1.3)to process CHM13 using parameters: --min-passes 3 --keep-kinetics.For the remaining datasets we used jasmine(v2.0)--min-passes 3. We compared CHM13 data processed with primrose vs jasmine and findings were nearly identical.Aggregate bigWig tracks containing CpG methylation information were generated using pb-CpG-tools (v2.3.1)-modsite-mode reference -model pileup_calling_model.v1.tflite .Nucleosomes and MSPs were identified with fibertools-rs add-nucleosomes using default parameters.By default, this process is run in the background during fibertools-rs predict-m6A.Nucleosome, MSP, and m6A bigWig tracks were generated by counting total occurrences across all molecules at a bp and dividing by the coverage at that bp.Identifying the centromere core/centromere dip region For CHM13, annotations of various satellite DNA were downloaded from the UCSC T2T hub (http://t2t.gi.ucsc.edu/chm13/hub/t2t-chm13-v1.0/cenSatAnnotation.bigBed), and putative kinetochore bindings regions for CHM13 were searched for within 'hor' satellite regions using CpG methylation data that was similarly downloaded from the UCSC T2T hub (http://t2t.gi.ucsc.edu/chm13/hub/t2t-chm13-v1.0/methyFreq.bigWig).CHM1 annotations were generated as previously described (Logsdon et al., In preparation).For CHM13 and CHM1, alpha satellite repeats less than 25 bp apart were merged together and we only kept alpha satellite stretches >= 100kb.We then made windows of 1190 bp across this region with step size of 170 bp (approximate alpha satellite repeat length).We then took the mean CpG methylation frequency across all CpG sites that fall in the window.We identified all windows that were below the 35 th percentile of CpG methylation window frequency, and merged neighboring windows that passed this threshold.Merged windows that were 1) at least 15 kb in length, 2) had a mean CpG methylation window frequency < 20 th percentile, and 3) were not located in the terminal 70kb of the alpha satellite repeat region were identified as CDRs.For HG002, CDRs were identified using the same approach, but the threshold to the terminus of the alpha satellite array was lowered to 15kb.For HLE, we used CENPA CUT&RUN enriched regions to identify the centromere core, as we did not know whether to expect this region to be hypo-CpG methylated.

Identification of dinucleosomes and large MSPs
The cutoff for identifying dinucleosomes and large MSPs was 210 and 150, respectively.For each individual fiber, features (Nucleosomes or MSPs) were compared against their respective thresholds, and a percentage value was calculated for each individual chromatin fiber.

Measuring Distance Between Neighboring Accessible Patches
To calculate the observed distance between large MSPs we calculated the distance between large MSP start positions.To calculate the expected distance between large MSPs in the CDR, we first calculated the number of large MSPs fully contained in the CDR and divided by the total number of sequenced bases per CDR to get a large MSP / sequenced bp value.We then multiply the resulting value by the length of the respective CDR to get the expected number of large MSPs per CDR.We divide the length of the CDR by this value to get the expected number of bp between large MSPs within the CDR.

Determining chromatin repeat lengths
To determine the chromatin repeat length of individual chromatin fibers, the per-base m6A pattern on each fiber was converted into a binary vector (m6a=1, no-m6a=0).These vectors were then independently analyzed using the scipy.signal.periodogramtool.This results in spectral density estimations across a vector of frequencies, which is the inverse of the nucleosome repeat length in base-pairs.Spectral densities derived from multiple fibers originating from similar genomic loci were combined and displayed.

Delineating CENP-B binding elements and occupancy
CENP-B boxes were identified in each annotation group using the 17-bp CENP-B box NTTCGNNNNANNCGGGN motif.Specifically, genomic regions were scanned with this motif using fimo (version 4.11.2) 60 with the --parse-genomic-coord --qv-thresh --thresh 0.05 --maxstored-scores 1000000 flags.We further filtered motifs to those which contained 2 CpG sites.To delineate the occupancy of these CENP-B boxes, we intersected CENP-B boxes per-fiber methylation calls.Specifically, the m6A methylation profile of every region along every fiber overlapping a CENP-B box was converted into a binary profile (m6a=1, no-m6a=0) and stored in an array.Based on genomic context and/or CpG methylation contexts these arrays were collapsed to form m6A frequency tracks with a rolling average of 10bp.To calculate footprint score distributions, we took all fibers fulfilling a filtering requirement (CpG methylation status / genomic region), subsampled to 10% of the fibers 1000 times and collapsed these to generate m6A frequency tracks smoothed by a 10bp window.We used these fibers to calculate a footprint score defined by the m6A frequency in the flanking region divided by the m6A frequency in the core CENP-B box.The flanking region score was defined as the mean of the smoothed m6A content -35 --27 bases upstream and +22 -+27 bases downstream of the CENPB box center.The CENP-B core score was defined by the mean of the smoothed m6A frequency in the -7 -+1 positions relative to the center of the CENP-B box.

Quantifying nucleosome positioning surrounding CENP-B boxes
To visualize the association of CENP-B occupancy and surrounding nucleosome positioning we first classified every instance of a chromatin fiber overlapping a CENP-B box by its CpG methylation state (0,1 or 2 mCpGs).For each overlap, we took the closest (Relative to center of CENPB box) boundary of both the upstream and downstream nucleosome.For each position upstream or downstream of the CENP-B box we then calculated the percentage of upstream or downstream nucleosomes beginning at that position, respectively.

Hoolock leuconedys Oxford Nanopore Technologies (ONT) Sequencing
Cells were collected from a transformed lymphoblastoid cell line (LCL) derived from a female Hoolock leuconedys individual (Betty).To isolate high molecular weight (HMW) DNA, a phenol/chloroform/isoamyl alcohol extraction was performed per standard protocols for the isolation of high molecular weight DNA.Library preparation was performed using the Ligation Sequencing Kit (LSK109).HMW DNA was sequenced on the PromethION platform from Oxford Nanopore Technologies using a PromethION R9.4.1 FLOPRO002 flow cell and basecalled using Guppy (v5.0.16).

Hoolock leuconedys Dovetail™ Omni-C™ Sequencing
Roughly 1.5 million cells were collected from the previously described Hoolock leuconedys lymphoblastoid cell line and processed according to the Dovetail™ Omni-C™ Proximity Ligation Assay protocol for mammalian samples (v1.4) as written.Lysate quantification was performed using the Qubit® dsDNA HS kit for the Qubit® Fluorometer and the D5000 HS kit for the Agilent TapeStation 2200.150 bp paired end sequencing was performed on the Illumina NextSeq 550 V2 platform to a depth of ~274M reads.

Hoolock leuconedys CUT&RUN Sequencing
The CUT&RUN Assay Kit (#86652) from Cell Signaling Technology® was used to assess CENP-A protein-DNA interactions with 250,000 cells per condition following manufacturer's instructions.To assess CENPA-DNA interactions, the CENP-A monoclonal antibody (ADI-KAM-CC006-E) was used at a dilution of 1:50.Tri-methyl-histone H3 (Lys4) (C42D8) rabbit monoclonal antibody (#9751) at a dilution of 1:50 was used as a positive control; rabbit (DA1E) monoclonal antibody IgG XP® isotype control (#66362) at a dilution of 1:10 was used as a negative control.Input chromatin samples were sheared to ~100-700 bases using a Covaris S2 sonicator prior to purification.DNA purification was performed using the Cell Signaling® DNA purification with spin columns kit (#14209).DNA concentration was assessed using the Qubit® dsDNA HS kit for the Qubit® Fluorometer and the High Sensitivity D1000 kit for the Agilent TapeStation 2200.CENP-A and Input libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (#E7645S) and sequenced using the Illumina NovaSeq 150 bp paired end settings to a depth of ~15M reads.

Hoolock leuconedys genome assembly
Flye (v2.9) 61 was used to assemble the raw Oxford Nanopore reads using an estimated genome size of 2.9 Gb, the size of the previously assembled Nomascus leucogenys genome 62 .Medaka (v1.4.3) was used for long read polishing using default settings and the r941_prom_sup_g507 model.Publicly available Illumina WGS sequencing reads for the same individual (SAMN12702557) 63 were used to polish the assembly by mapping the reads with Burrow's Wheeler Aligner (v0.7.17) using the bwa mem algorithm and processed using Samtools (v1.7).Short read error correction was performed using Pilon (v1.22) with default parameters.Haplotype redundancies and assembly artifacts based on read coverage were removed using minimap2 (v2.15) and PURGEhaplotigs (v1.0).Omni-C sequences were used to scaffold the assembly with Juicer (v1.6) 64 and 3D-DNA (v180922) 65 following the default protocols outlined by developers.Juicebox with Assembly Tools (v1.11.08) was used for manual review of the produced scaffolds.Gap filling was performed using TGS-GapCloser (v1.0.1).The gap-filled assembly was polished with Illumina reads using Pilon (v.1.22) with default parameters.The reformat.shmodule of BBMap was used to impose a 3kb limit on the genome.LASTZ (v1.04.15) was used to align the assembly to the CHM13 v1.1 genome.Resulting alignments were validated according to predicted syntenic regions and large-scale chromosome misassemblies and misorientations were manually corrected using Emboss (6.6.0).To reduce any misassemblies associated with manual curation and scaffolding, pre-scaffolded contigs (the "query") were aligned to the curated assembly (the "reference") and scaffolded using RagTag (v2.1.0).The resulting scaffolded assembly (built from "query" contigs) was gap filled using TGS-GapCloser (v1.0.1).The gap-filled assembly was polished with Illumina reads using Pilon (v.1.22)  with default parameters.

Hoolock leuconedys centromere annotation
A two-step approach was used to identify contiguous centromeres for Fiber-seq analysis.First, centromeric regions were identified by enrichment of CENP-A binding.Low quality reads (Phred score < 20, length < 50 nt) and adapters were trimmed from CENP-A and Input CUT&RUN sequences using cutadapt (v3.5) and mapped using Bowtie2 with default parameters (v2.5.0).Peaks were called using MACS (v2.1.2) using the --keep-dup=2 --broad -q 0.01 -g 2.8e9 flags.These peaks were labeled as putative centromere regions.Second, hifiasm was run directly on the Fiber-seq reads using default parameters.Hifiasm contigs were then mapped to the Hoolock leuconedys genome assembly generated using ONT and HiC data above.Putative centromere regions that were fully encompassed within a hifiasm contig that mapped to that region were then considered to be validated as correctly assembled centromeres, as these centromeres were independently assembled using two orthologous technologies.Five centromeres passed these filters and all five were validated by manual review of read coverage and selected for subsequent Fiber-seq analysis.Repeats in the genome were annotated with RepeatMasker (v4.1.2-p1)using the Crossmatch search engine (v1.090518) and a combined gibbon (Hylobates sp.) Dfam (v3.6) and Repbase (20181026) repeat library.CpG methylation was detected in the ONT reads using the Bonito (v0.3.2) -Remora (v1.1.1)basecalling pipeline with default parameters and the dna_r9.4.1_e8_sup@v3.3model.Resulting files were converted to a bedMethyl file using modbam2bed (v0.6.2).
(E) Genomic locus showing CpG methylation signal along the entire chromosome Y in HG002/GM24385 cells.Below are insets of single-molecule chromatin architectures at a promoter within the euchromatic region of chromosome Y, as well as an HSat3-HSat1B boundary within the heterochromatic q arm of chromosome Y.(B) Heatmap of the median spectral density for various genomic regions along the autosomes and X chromosome from CHM1 using Fiber-seq data from the CHM1 cell line.(C) (top) Heatmap of the median spectral density for various genomic regions along the autosomes and X chromosome from HG002 using Fiber-seq data from the GM24385 cell line.(bottom) Average density of di-nucleosome footprints and accessible chromatin patches within various genomic regions along the autosomes and X chromosome from HG002 using Fiber-seq data from the GM24385 cell line.

Figure 1 .
Figure 1.Resolution of kinetochore and surrounding chromatin architectures along point centromeres (A) Schematic for mapping the chromatin occupancy of point centromeres within the yeast S. cerevisiae using Fiber-seq.(B) Locus displaying single-molecule chromatin architectures of the S. cerevisiae chromosome III point centromere, alongside CAGE data and gene annotations.Note the large patches of chromatin accessibility at the CAGE-positive gene promoters, as well as immediately adjacent to the centromere.(C) Heatmap displaying average Fiber-seq m6A signal surrounding each of the S. cerevisiae point centromeres.(D) (top) Zoom-in of the per-nucleotide Fiber-seq signal of the S. cerevisiae chromosome III point centromere in relation to the CDEI, CDEII and CDEIII elements.(bottom) Cryo-EM structure of the yeast complete S. cerevisiae inner kinetochore bound to the chromosome III point centromere (PDB 8OW0) with the DNA colored according to the m6A-MTase sensitivity measured via Fiberseq on S. cerevisiae.

Figure 2 .
Figure 2. Dichotomous accessible chromatin patches mark human CHM13 centromere cores (A) Genomic locus of chromosome 5 centromere showing satellite repeats, bulk CpG methylation, Fiber-seq identified di-nucleosome footprint density, Fiber-seq identified accessible chromatin patch density, and the density of Fiber-seq Inferred Regulatory Elements (FIRE).Below are individual Fiber-seq reads (grey bars) with m6A-modified bases in purple delineating singlemolecule chromatin architectures within euchromatic and heterochromatic regions.(B) Average density of di-nucleosome footprints and accessible chromatin patches within various genomic regions (* p-value <0.01 Mann-Whitney).(C) Hexbin plot showing the single-molecule density of di-nucleosome footprints and accessible chromatin patches within various genomic regions.(D) Swarm and box-and-whisker plots showing the distance between accessible chromatin patches along the same molecule of DNA within the centromere core, as well as the expected distance based on the density of accessible chromatin patches within each chromosome's centromere core (* p-value <0.01 Mann-Whitney).(E)Estimated total number of accessible chromatin patches along each chromosome's centromere core versus the length of that chromosome.Chromosomes with high rates of missegregation35 are in red.See also FigureS1.

Figure 3 .
Figure 3. Centromere core chromatin mirrors the alpha-satellite DNA repeat (A) Box-and-whisker plots of the chromatin repeat lengths from various genomic regions.Specifically, m6A-marked chromatin features from individual chromatin fibers were subjected to Fourier transform and the per-molecule spectral densities were then aggregated across different fibers from the same genomic region.(B) Heatmap of the median spectral density for various chromatin repeat lengths, and the peak chromatin repeat length(s) from fibers contained within the euchromatic genome, or various satellite repeat regions.The satellite DNA repeat unit for each region is also indicated.

Figure 4 .
Figure 4. CENP-B selectively occupies and phases dichromatin within the centromere core (A) Genomic locus showing bulk CpG methylation and per-molecule m6A-marked chromatin architectures and CpG methylations demonstrating single-molecule footprints at centromere core CENP-B boxes.(B) Structure of CENP-B bound to a CENP-box (PDB 1HLV), as well as in vitro DNaseI footprint of a CENP-B-occupied CENP-B box relative to aggregate m6A methylation profile at centromere core CENP-B boxes containing various levels of methylated CpGs (mCpGs).(C) Box-and-whisker plots of footprint scores at CENP-B boxes within various satellite regions as a function of mCpG status at the CENP-B box.Higher scores quantitatively indicate greater CENP-B occupancy.(D)Average density of di-nucleosome footprints and accessible chromatin patches within various genomic regions along the Y chromosome from HG002 using Fiber-seq data from the GM24385 cell line.(E) Heatmap of the median spectral density for centromere core and euchromatic regions along the Y chromosome from HG002 using Fiber-seq data from the GM24385 cell line.See also FigureS3.

Figure 5 .
Figure 5. Dichromatin is a conserved feature of centromere core chromatin in humans (A-B) Genomic locus showing chromosome 5 centromere in both CHM13 (A) and CHM1 (B) cells, including CpG methylation and accessible chromatin patches.(bottom) Single-molecule chromatin architecture along individual alpha-satellite repeats within centromere core and noncore alpha-satellite repeats.(C) Hexbin plot showing the single-molecule density of di-nucleosome footprints and accessible chromatin patches within various CHM1 centromere cores and surrounding regions.See also Figure S4.

Figure 6 .
Figure 6.Centromere core dichromatin does not require alpha-satellite repeats (A) Schematic for Fiber-seq in a lymphoblastoid cell line from the eastern hoolock gibbon (Hoolock leuconedys) Betty.Ultralong read Nanopore sequencing was used for de novo genome assembly, which was validated using the contigs assembled using Fiber-seq sequencing data.CENP-A CUT&RUN was used to identify centromere cores along validated assembly regions.(B) Genomic locus showing chromosome 8 centromere, including different repeat classes, CpG methylation, CENP-A CUT&RUN, and Fiber-seq derived di-nucleosome footprint density, and chromatin accessibility.(bottom) Single-molecule chromatin architecture from pericentromeric gene regulatory elements, centromere core, and heterochromatin.(C) Hexbin plot showing the single-molecule density of di-nucleosome footprints and accessible chromatin patches within centromere core and flanking regions.(D) Box-and-whisker plots of the chromatin repeat lengths from centromere core and flanking regions.Below are indicated the peak chromatin repeat unit as well as the medial spectral density at that repeat unit showing that chromatin is more disorganized within the centromere core.See also Figure S5.

Figure S4 .
Figure S4.Conservation of dichromatin architecture between humans, related to Figure 5 (A) Genomic loci showing CpG methylation signal as well as di-nucleosome density and chromatin accessibility along the chromosome 12 and 15 centromeres in both CHM13 and CHM1 cells.(B)Heatmap of the median spectral density for various genomic regions along the autosomes and X chromosome from CHM1 using Fiber-seq data from the CHM1 cell line.(C) (top) Heatmap of the median spectral density for various genomic regions along the autosomes and X chromosome from HG002 using Fiber-seq data from the GM24385 cell line.(bottom) Average density of di-nucleosome footprints and accessible chromatin patches within various genomic regions along the autosomes and X chromosome from HG002 using Fiber-seq data from the GM24385 cell line.

Figure S5 .
Figure S5.Alpha-satellite DNA is not necessary for dichromatin formation, related to Figure 6 (A-D) Genomic loci showing repeats, CpG methylation, CENP-A CUT&RUN, and Fiber-seq derived di-nucleosome density and chromatin accessibility along four assembled centromeres from a lymphoblastoid cell line from the eastern hoolock gibbon (Hoolock leuconedys) Betty.Below inset showing single-molecule chromatin architectures within the centromere core region, a heterochromatic region, and a gene regulatory element along chromosome 1.(E) Swarm and box-and-whisker plots showing the distance between accessible chromatin patches along the same molecule of DNA within the centromere core, as well as the expected distance based on the density of accessible chromatin patches within each chromosome's centromere core (* p-value <0.01 Mann-Whitney).