Synthetic genomic reconstitution reveals principles of mammalian Hox cluster regulation

Precise Hox gene expression is crucial for embryonic patterning. Intra-Hox transcription factor binding and distal enhancer elements have emerged as the major regulatory modes controlling Hox gene expression. However, quantifying their relative contributions has remained elusive. Here, we introduce ‘synthetic regulatory reconstitution’, a novel conceptual framework for studying gene regulation and apply it to the HoxA cluster. We synthesized and delivered variant rat HoxA clusters (130-170 kilobases each) to an ectopic location in the mouse genome. We find that a HoxA cluster lacking distal enhancers recapitulates correct patterns of chromatin remodeling and transcription in response to patterning signals, while distal enhancers are required for full transcriptional output. Synthetic regulatory reconstitution is a generalizable strategy to decipher the regulatory logic of gene expression in complex genomes. One-Sentence Summary Reconstitution of gene regulation using large DNA constructs unravels the regulatory logic of a developmental gene locus.


Introduction
Developmental programs require precise spatial and temporal control of gene expression. To ensure this, developmentally important genes, such as Hox genes, are subject to multiple regulatory mechanisms that ensure appropriate expression patterns (1-3). Hox genes encode evolutionarily conserved transcription factors with crucial roles in cell fate patterning (4,5). Alteration of Hox gene expression patterns results in gross developmental defects, even transforming one body part into another (homeotic transformations) (6).
Hox genes are found within a unique genomic regulatory environment. In mammals, they are organized in four chromosomal clusters (HoxA, HoxB, HoxC, and HoxD), each harboring a subset of 13 Hox paralogs. Each cluster contains Hox genes tightly arranged within about 100kb in the same transcriptional orientation and lacks other coding genes (7). The spatial and temporal Hox gene expression patterns along the anterior-posterior (AP) axis of the developing embryo mirror the organization of the genes within the cluster, a phenomenon known as colinearity (4,8). Thus, the precisely regulated Hox clusters have become a paradigm for studying the fundamental link between genome organization and gene expression.
Low gene density regions peppered with distal regulatory elements surround Hox clusters. A collection of intricate genetic manipulations has revealed a complex regulatory landscape dispersed within the gene-poor region surrounding the Hox clusters (9)(10)(11)(12)(13). For example, the HoxA cluster relies on several retinoic acid (RA) and 4 Wnt responsive distal regulatory elements (enhancers) located in the gene desert between Hoxa1 and the next gene Skap2 (~300kb away) (14-17) ( Figure 1A).
In undifferentiated cells, where no Hox genes are expressed, the entire HoxA cluster is targeted by the Polycomb Repressive Complex 2 (PRC2) (18) and is densely carpeted with the H3K27me3 histone mark, the PRC2 catalytic product. Patterning signals activate transcription through their downstream transcription factors and partition the Hox cluster into two chromatin domains (19)(20)(21). Hox clusters contain regulatory elements that bind these transcription factors (22). For example, the activated RA receptors (RAR) bind to RA responsive elements (RAREs) embedded within the Hoxa1-Hoxa5 cluster domain (19,22,23). RA signaling leads to the separation of the cluster into two stable domains: active (H3K27ac marked), and inactive (H3K27me3 marked) chromatin, that contain transcribed and repressed genes, respectively (19-21) ( Figure   1A).
The Hox cluster chromatin partition is mirrored by a topological reorganization in 3D, from a single globular domain into two domains harboring either transcribed or repressed genes (20,21,24,25). In addition, the HoxA and HoxD clusters harbor topologically associated domain (TAD) boundaries (12,25,26) (Figure 1A). A strong topological boundary forms at CTCF binding sites within each of the clusters upon differentiation, separating the transcribed and repressed genes and promoting distal enhancer access to genes in the active domain (14,24,25). Deletion of CTCF motifs increases contacts between enhancers and repressed genes, and is associated with an expansion of the active chromatin domain, misexpression of Hox genes, and even homeotic transformations (24,25). 5 In sum, distinct regulatory modes control Hox gene expression: local transcription factor binding, distal regulatory elements, and topological DNA organization. To date, it has been difficult to generate alleles that can separate the function of these modes.
Therefore, a synergistic model describing their relative contribution and interactions has remained elusive. For example, is the endogenous genomic neighborhood and its organization required by CTCF and RAR to establish defined chromatin domains in response to RA? Are distal enhancers required for setting chromatin boundaries and the appropriate transcriptional output of single or multiple genes? Or do Hox clusters intrinsically contain the information required to activate the correct gene set in response to patterning signals ( Figure 1B)?
Measuring the relative contributions of the different regulatory modes would require the generation of a set of 'designer' variant alleles over a sizeable genomic window. In addition, studying the direct effect of these genomic modifications on gene expression requires their isolation from confounding factors, such as the compensatory effect of other regulatory elements in cis. The ability to recapitulate endogenous regulation, including chromatin changes and transcriptional output, when placed at an ectopic genomic location is a very stringent test of the inherent regulatory potential of a DNA sequence. Therefore, attempting to reconstitute HoxA gene regulation at an ectopic locus would directly test the sufficiency of the transposed regulatory elements to drive various stages of Hox cluster regulation. Further, transposing variant Hox clusters containing subsets of regulatory elements enables the study of relative contributions of and interactions between the various regulatory modes. 7 complex structural rearrangements, multiplex editing, and insertion of novel DNA sequences across a large genomic window are tractable. We recently described a pipeline that harnesses the power of the endogenous homologous recombination machinery in yeast to de novo assemble ~100kb regions of mammalian genomes and integrate them into a defined location in mouse embryonic stem cells (mESCs) (47).
Here we apply and extend this technology to the study of Hox cluster regulation. We de novo assembled variants of the rat HoxA cluster (ranging from 130 -170kb long) containing various combinations of the previously identified regulatory modes. We integrated them into a single-copy ectopic locus on the mouse X chromosome, thereby isolating them from the confounding effects of the native genomic neighborhood. We then asked whether variant ectopic clusters were sufficient to reconstitute the transcriptional and epigenetic HoxA cluster response to activating patterning signals ( Figure 1C).
We found that a minimal cluster lacking enhancers and a cluster with all distal enhancer elements placed directly adjacent to Hoxa1 drove the correct patterns of chromatin remodeling and transcription in response to the RA patterning signal, even in a foreign genomic neighborhood. Removal of the RAREs within the minimal HoxA cluster abolished all response to RA at the ectopic location. The introduction of enhancers to the HoxA cluster lacking RAREs was insufficient to fully rescue the loss of gene expression phenotype. This 'synthetic regulatory reconstitution' approach allows us to neatly separate the function of various regulatory modes and define the set of elements sufficient to drive various aspects of Hox gene regulation. We expect this will be a powerful approach to dissect the logic of transcriptional control in complex genomes. 8

Results
We sought to understand the relative contributions of the various regulatory modes involved in the response of HoxA to patterning signals: local transcription factor binding sites, distal regulatory elements, and genomic neighborhood organization. The regulatory potential of a DNA sequence is best understood by testing its ability to drive gene expression predictably at an ectopic genomic location. Therefore, we tested the ability of HoxA cluster variants inserted at an ectopic genomic location to respond to patterning signals. We investigated the following unknowns: 1) role of genomic context in recruitment of repressing, activating, and boundary forming machinery during HoxA regulation; 2) distal enhancers' quantitative contribution to HoxA regulation; and 3) role of intra-cluster regulatory elements.
To this end, we first relocated a synthetic cluster containing all elements previously described to be involved in endogenous HoxA regulation to an ectopic location on the X chromosome. To reveal the role of distal enhancers specifically, we synthesized a minimal construct lacking these sequences (Supplementary Figure 1A and Supplementary Figure 2A).

Synthetic Hox strategy and construction
All constructs described here are derived from rat (Rattus norvegicus) HoxA cluster sequence, which shares ~90% sequence similarity with the mouse sequence at HoxA.
Although the sequence is highly conserved, polymorphisms enable distinction between ectopic and endogenous HoxA in sequencing-based analyses. Notably, this facilitated experiments in two distinct genetic backgrounds: 1) cells containing the endogenous . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. 9 HoxA cluster, which serves as an internal positive control for Hox regulation and for quantitatively comparing expression levels; and 2) cells lacking endogenous mouse HoxA to eliminate possible sequence mapping challenges.
We first constructed a 134kb wild-type rat minimal HoxA cluster (SynHoxA) by harnessing the homologous recombination machinery in yeast (Saccharomyces cerevisiae) (Figure 2 and Supplementary Figure 1). The minimal SynHoxA cluster contains all HoxA coding genes and encompasses the sequence corresponding to the repressed H3K27me3 contiguous domain in undifferentiated mESCs. We produced 28 ~5kb PCR amplicons from BACs bearing the rat HoxA cluster, with ~200 bp overlap between adjacent segments to enable homologous recombination. These amplicons were combined into ~65kb rat HoxA half constructs upon co-transformation into yeast with appropriate linkers that direct assembly into an episomal yeast/E. coli shuttle vector. The half assemblies were recovered to E. coli, propagated, isolated, released from their BAC vectors and combined in yeast to produce the 134kb SynHoxA construct termed an "assemblon" (Supplementary Figure 1B). Edits to the assemblon can be made by switching the wild-type amplicons with synthetic DNA bearing the desired changes or by editing the assemblons directly using highly efficient, marker-free CRISPR Cas9-based engineering in yeast (48). This allows unfettered changes to preexisting assemblons to be made rapidly ( Figure 2).
Several studies have previously identified regulatory elements that respond to RA (RARE elements) and/or Wnt (Ades for HoxA developmental early side) in the genepoor region distal to the endogenous HoxA cluster, between Hoxa1 and Skap2 (14)(15)(16).
Deleting some of these regulatory regions reduces, but never eliminates, HoxA . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 10 expression in response to RA or Wnt signaling (14,16). To evaluate the importance of these enhancers, we generated a 'compound enhancer' that included all of these experimentally verified elements condensed together in ~35 kb of DNA (Supplementary Figure 2). We fused this compound enhancer directly upstream of the 134kb core assemblon, resulting in a 170kb (Enhancers+SynHoxA) construct.
We performed DNA sequencing at each step of the pipeline (source BACs, half assemblons, and full assemblons) to confirm identity of the assemblon ( Figure 2C and Supplementary Figure 3). Sequencing revealed that there were no gross abnormalities in the constructs. However, we find that there is a mutation frequency of about 1 single nucleotide polymorphism per 6kb on average arising from errors in PCR (Supplementary Figure 3). We verified that none of these mutations were likely to impact the characterization of these clusters in our differentiation system described below. The variants lie in intergenic space and are largely associated with relatively low conservation, except for a single exonic variant in Hoxa7, which is not expressed in mESCs or in response to RA. (Supplementary Table 1) We used Inducible Cassette Exchange (ICE) for single-copy, site-specific delivery of the assemblons to the ectopic Hprt locus on the X chromosome of mESCs (49) (Supplementary Figure 5). Hprt is a housekeeping gene long used as a safe harbor site for genetic engineering studies (50). Although Hprt is in the middle of a TAD (26), recent work has shown remarkably little regulatory activity in the regions surrounding Hprt (51).
For these reasons, we chose it as an appropriate neutral site to attempt regulatory reconstitution of the HoxA locus. All constructs described here were delivered to both HoxA +/+ and HoxA -/-mESCs using ICE (Supplementary Figure 4). With ICE, on-target . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 11 integration at Hprt was verified by phenotypic activation of conditional marker cassettes that depend on precise recombination events (G418 R and GFP + ). Resistant mESC clones were validated by PCR with SynHoxA specific primers spanning the entire construct as well as novel junctions formed with the genome following integration We used capture sequencing to validate that SynHoxA mESC clones contained the entire assemblon at Hprt. Nick-translation of our BACs yielded biotinylated probes specific to the synthetic ectopic locus, that were used to enrich for DNA from the assemblon, facilitating high-coverage sequencing at low cost (52). In all cases, we eliminated clones that contained deletions or rearrangements of SynHoxA sequence.
( Figure 2C). mESC clones that passed the sequencing pipeline were further validated for precise on-target single-copy integration and absence of any off-target integration by analyzing reads spanning junctions between the synthetic construct and mouse genome (52). Importantly, we only detected reads spanning the expected junctions at the landing pad and no off-target integrations ( Figure 2B). Therefore, we have established a powerful genome engineering technology at a scale that enables the study of transcriptional regulation at complex genomic loci, such as Hox clusters.

Induction of SynHox expression during motor neuron differentiation
To investigate the response of the ectopic SynHoxA clusters to patterning signals, we chose a widely-used mESC differentiation system that recapitulates key aspects of ventral spinal cord development (53-55). After treatment with the ventralizing Hedgehog (Hh) and rostro-caudal RA patterning signals, mESCs transition through relevant . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 12 progenitor states and differentiate into motor neurons (MNs) and interneurons ( Figure   3A) (53). In this protocol, 90% of cells express Hoxa5, and have therefore collectively acquired an anterior or rostral identity (56, 57). Importantly, this differentiation strategy has provided critical insights into Hox regulation and insulating properties of CTCF that were ultimately confirmed in vivo (19,24,25,58).
We investigated whether cells containing both endogenous HoxA and the SynHoxA clusters differentiate appropriately by performing RNA-seq over the course of the differentiation protocol, and by comparing to previously published control datasets (59, Distal enhancers are not required to specify active genes but boost early transcriptional output Hox genes are repressed in their endogenous genomic context before exposure to patterning signals. In response to the 'anterior' signal RA, Hoxa1-5 are induced at high levels, while the posterior Hoxa7-13 remain repressed (19). We first asked whether the native configuration of distal enhancers and 3D organization are required for Hox clusters to appropriately activate genes in response to RA.
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made To that end, we differentiated cells with an intact endogenous HoxA cluster, additionally bearing the ectopic Enhancers+SynHoxA construct into MNs. As expected, the endogenous HoxA cluster induced the expected set of Hoxa1-5 genes and had some weak Hoxa6 expression at the latest time point as previously described (19,24) ( Figure   4A). Thus, the extra SynHoxA sequence did not affect endogenous Hox gene activation. We sought to quantify the contribution of distal enhancers to HoxA gene activation. We repeated the differentiation experiment in the cell line containing the minimal SynHoxA and an intact endogenous HoxA cluster. Similar to the Enhancers+SynHoxA cluster, SynHoxA also specifically induced SynHoxa1-5 ( Figure 4C, D and Supplementary Figure 6). Therefore, the HoxA cluster itself contains all information that is required to decode the RA patterning signal into an anterior MN positional identity, independent of distal enhancers and the complex native genomic architecture.
We quantified SynHoxA gene induction by comparing each SynHoxA gene to its endogenous mouse HoxA counterpart ( Figure 4E, F). There were some differences in . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 14 the fine details of transcription from the SynHox clusters. Both Enhancers+SynHoxA and the minimal SynHoxA construct induced a lower amplitude of SynHoxa1 transcription than the endogenous cluster. On the other hand, SynHoxa2 induction surpassed the endogenous gene at 96h in both constructs. The induction kinetics of SynHoxa3-5 were slower than Hoxa3-5, but the mRNA levels became comparable after 96h, particularly in the Enhancers+SynHoxA construct. Therefore, whereas a minimal HoxA cluster correctly responds to patterning signals, additional elements are required to fine-tune expression amplitude and timing.
The transcriptional response of the ectopic HoxA clusters can be summarized as follows. First, both constructs correctly upregulate SynHoxa1-5 in response to anterior patterning signal RA during MN differentiation. Second, the addition of distal enhancers in Enhancers+SynHoxA leads to higher transcription levels, especially evident at earlier time points. Third, while the minimal SynHoxA is less effective at establishing high transcription levels early, the differences in mRNA levels lessen with differentiation time.
Fourth, a direct comparison with endogenous genes revealed that while the most anterior SynHoxa1-2 do not fully phenocopy endogenous expression levels, SynHoxa3-5 highly resemble endogenous genes at the latest time point.
These data support a model in which the compact Hox clusters contain all necessary information to respond to extracellular signals by inducing the appropriate gene set.
Distal enhancers are not required for specification of active genes but boost transcription and accelerate temporal dynamics. The endogenous genomic context and enhancer spacing may be required for further fine-tuning of early expression levels.
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Genomic context and distal enhancers are not required for PRC2 or CTCF recruitment
In ESCs, Hox clusters are carpeted with the repressive H3K27me3 histone modification (the PRC2 catalytic product) and recruit CTCF at potential boundary positions established in response to extracellular signals (19,24). Active and repressive chromatin domains are established rapidly upon RA treatment (19). Therefore, we investigated whether the chromatin dynamics and CTCF recruitment of the relocated HoxA clusters mirrors their transcriptional output by performing ChIP-seq.
We used cells lacking the endogenous HoxA cluster to more reliably map reads to the CTCF was recruited to similar sites within the ectopic Enhancers+SynHoxA and SynHoxA clusters as in the endogenous HoxA cluster ( Figure 5A, B). Prior to exposure to patterning signals, the Enhancers+SynHoxA and SynHoxA clusters had high levels of H3K27me3 marks ( Figure 5A, B). Thus, the Hox cluster, with or without distal enhancers, suffices to recruit the appropriate repressive chromatin marks and boundary elements even at an ectopic genomic location. The ability to recruit all components for correct patterning is intrinsic to the HoxA cluster sequence and is thus independent of endogenous genomic context or organization.
A minimal ectopic HoxA cluster establishes precise chromatin boundaries . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 16 The endogenous HoxA cluster responds to RA by forming a precise chromatin boundary between Hoxa5 and Hoxa6, with some additional clearance of the repressive histone modification from Hoxa6 at later time points (19,24). The RA patterning signal leads to eviction of PRC2 from the Hoxa1-5 domain with a concomitant H3K27ac increase, while maintaining PRC2 repression at Hoxa7-13, thus separating the HoxA cluster into two chromatin domains, active and inactive. We asked whether ectopic Enhancers+SynHoxA would partition into two chromatin domains during MN differentiation. Similar to the endogenous HoxA cluster, H3K27me3 at the SynHoxa1-5 domain decreased while SynHoxa6-13 gained H3K27me3 ( Figure 5C). H3K27me3 removal from SynHoxa1-5 was complemented by an increase in H3K27ac coverage, with no detectable H3K27ac deposited at SynHoxa6-13. H3K27me3 was entirely cleared by 48h, which is slightly slower than the endogenous locus, which clears by 24h (19). Therefore, the Enhancers+SynHoxA assemblon at an ectopic genomic location can respond to a patterning signal by forming two chromatin domains containing active and inactive Hox genes ( Figure 5E) at exactly the correct positions. Thus, endogenous genomic context at HoxA is not required to translate an extracellular signal into an accurate epigenetic chromatin state.
We tested whether the minimal SynHoxA cluster would establish a chromatin boundary.
Indeed, SynHoxA recruited H3K27ac at anterior SynHoxA genes upon RA activation ( Figure 5D). Unlike the endogenous HoxA cluster and Enhancers+SynHoxA, H3K27me3 is not entirely removed from the SynHoxa1-5 domain during differentiation ( Figure 5D). Nonetheless, SynHoxA formed the appropriate, albeit weak, chromatin . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 17 boundary at the SynHoxa5-a6 CTCF binding site, evident from the H3K27me3 reduction at SynHoxa1-5 ( Figure 5E, Supplementary Figure 12).
These results suggest that the minimal SynHoxA cluster has the intrinsic ability to induce dynamic chromatin domains, independent of genomic context. This construct was unable to reduce H3K27me3 in the active domain to the same extent as the construct with enhancers. Either the boost in transcription provided by distal enhancers at early time points facilitates clearance of the repressive chromatin or the enhancers serve as platforms to recruit additional chromatin modifiers.

Topological organization of ectopic SynHoxA clusters
The 3D structure of the HoxA cluster changes during MN differentiation, transitioning from a single association domain to two globular domains containing active or repressed chromatin (24,25). Moreover, the active domain preferentially establishes associations with distal regulatory elements.
We investigated the topological organization of the SynHoxA clusters by performing Hi- was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint only SynHoxA gene with any signal at all, consisting of a few reads mapping to the assemblon. The explanation for this slight residual activity might lie in a nearby weakly conserved and poorly characterized RAR binding site located between Hoxa5 and Hoxa6 (19,22).
This lack of RA response provided an ideal background to measure the independent contribution of distal enhancers to Hox gene expression and chromatin state regulation.
To that end, we built and integrated a fourth construct: Enhancers + RARE∆ SynHoxA.
This assemblon contained all the enhancers inserted upstream of the RARE∆ SynHoxA.  Figure 12). This suggests that distal enhancers have a weak ability to activate HoxA gene transcription independent of the internal RAREs' driving force.

Discussion
Landmark discoveries on distal enhancer contribution to Hox gene expression relied primarily on deleting regulatory elements (11,14,16). Here we develop a 'synthetic regulatory reconstitution' approach and apply it to investigate the relative contributions of the regulatory modes (distal enhancers and intra-cluster transcription factor binding) that control HoxA gene expression. We systematically assembled and precisely integrated large DNA constructs (130-170kb) encoding HoxA variants into mESCs. We . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 20 then differentiated these genetically modified pluripotent stem cells into motor neurons to study Hox gene expression and chromatin dynamics. The SynHox cluster with enhancers (Enhancers+SynHoxA) induced the correct Hox gene set and formed a strong chromatin boundary at the correct location between SynHoxa5 and SynHoxa6. The SynHoxA genes were expressed at levels comparable to the endogenous cluster at the latest time point. Some differences in gene expression dynamics were observed, suggesting that either specific regulatory elements or the endogenous genomic architecture and enhancer spacing are required for fine-tuning gene expression (61). Future work with larger transposed constructs that include the intervening sequences will help discriminate between these possibilities.
The minimal ectopic SynHoxA lacking distal enhancers recruited PRC2 and CTCF in embryonic stem cells and induced the correct subset of genes in response to RA.
However, the dynamics of the response were delayed and reduced compared to the endogenous cluster and Enhancers+SynHoxA. This suggests that the distal enhancers' primary effect is not to specify the set of genes that is active, but rather to boost transcription levels and chromatin turnover rate. These results are consistent with previous studies in which enhancers were deleted from the endogenous locus, leading to lower Hox gene transcription in response to RA or Wnt (14,16,62) and during development (61,(63)(64)(65).
We would like to highlight two relevant issues for PRC2 and CTCF recruitment. First, although canonical Polycomb Response Elements (PREs) for PRC recruitment have not been described in mammals, a small number of genomic locations recruit PRC2 that . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 21 spreads to the rest of the PRC2-repressed genome based on linear or 3D proximity (66). These results suggest the SynHoxA clusters either have the intrinsic ability to recruit PRC2 or that they associate with distal recruitment sites, like the endogenous Hox clusters (66). Second, HoxA and HoxD clusters are at TAD boundaries (12,25,26).
Therefore, CTCFs within the cluster engage in long-distance interactions with specific CTCF proteins bound at the other end of each TAD (25). Our results thus imply that precise CTCF positioning within the cluster does not depend on particular CTCF-CTCF interactions in the native locus or endogenous genomic organization.
Finally, we were able to study the relative contribution of distal enhancers and the regulatory elements that have been described within the HoxA cluster itself (19,22,23).
This cluster lacking internal RAREs failed to respond at both the chromatin and transcriptional levels, supporting a model where HoxA internal RAREs are crucial for the RA response. Importantly, a cluster that contained distal enhancers but lacked RAREs did not fully rescue this phenotype, suggesting that internal RAR binding is the primary mode by which the RA signal is translated into an appropriate HoxA response. Distal enhancers serve to boost the dynamics of the transcriptional and chromatin response mediated by RAR binding upon receiving the RA differentiation signal. Altogether, we conclude that Hox clusters are regulatory units, with an intrinsic ability to respond to patterning signals, explaining in part the compact structure of vertebrate Hox clusters ( Figure 7E, F).
This study is also the first proof-of-principle for a 'synthetic regulatory reconstitution' approach to understanding gene expression control. In vitro reconstitution has been a powerful framework to dissect complex biochemical processes because it allows for . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 22 exquisite control over components of the system under study (67,68). With this ability to generate locus-scale variant constructs that would have previously been intractable or have required numerous rounds of arduous gene editing, we expect synthetic regulatory reconstitution to revolutionize the study of transcriptional regulation.
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ;

Acknowledgments
We want to thank all Mazzoni and Boeke lab members as well as the Institute for Systems Genetics community for their insightful comments and support. We thank Mohammed Khalfan, from the Genomics Core Facility at NYU, for making the reform tool for editing reference genomes publicly available; Bhavana Ragipani for preliminary observations and analysis on motor neuron differentiation markers; Natalie Zesati and Sonny Arora for help with preliminary visualization of the Hi-C data; Jane Skok and Danny Reinberg for insight and feedback. This work was supported in part by NHGRI

Competing interests
JDB is a Founder and Director of CDI Labs, Inc., a Founder of and consultant to Neochromosome, Inc, a Founder, SAB member of and consultant to ReOpen Diagnostics, LLC and serves or served on the Scientific Advisory Board of the following: Sangamo, Inc., Modern Meadow, Inc., Sample6, Inc., Tessera Therapeutics, Inc. and the Wyss Institute. The other authors declare no competing interests.

Data and materials availability
All sequencing data generated for this study will be deposited in NCBI GEO (pending).
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.

Yeast and E. coli strains and media
All yeast work was performed in strain BY4741 using standard media and growth conditions. Yeast transformations were performed using the Lithium acetate method as previously described (41, 69). Plasmids were purified from bacterial cultures in Luria Broth (LB) supplemented with the appropriate antibiotic (12.5 µg/mL Chloramphenicol for source BACs, 25 µg/mL Kanamycin for all assemblons and 50 µg/mL Carbenicillin for all other plasmids). Plasmids purified from small scale bacterial cultures (5-10 mL) were used for all steps except for delivery to mESCs in which case DNA was purified from a large scale (500 mL) culture. Both bacteria and yeast were grown at 30°C.
A full list of yeast strains from this study is in Supplementary Table 2. A full list of plasmids from this study is in Supplementary Table 3.

Yeast colony PCR
With the exception of the 134kb SynHoxA half-assemblon build (see relevant section in methods), all yeast colonies were genotyped by hand. Briefly, a single yeast colony was resuspended in 10-40 µl of 20 mM NaOH and placed in thermocycler. Yeast colony suspensions were boiled for 95°C for min and then cooled to 4C for at least 5 min before proceeding to PCR. 1ul of yeast lysate was used as template in a 10 µl GoTaq Green reaction (Promega M7123) with 0.25 µM of primers. PCR program: 95°C -5min; 30X (95°C -20s, 55°C -90s, 72°C -1min); 72°C -5min; 4°Chold. PCRs were separated on a 1% agarose gel containing ethidium bromide. 1kb Plus DNA Ladder (New England Biolabs N0550S) was used as a molecular weight standard.

BAC recovery from yeast to E. coli
Plasmids were isolated from 5-10 mL yeast cultures either by alkaline lysis (39) or using Zymo Yeast Miniprep Kit I (Zymo Research D2001). Plasmids were then recovered into DH10B ElectroMax E. coli by electroporation (Invitrogen 18290015) following the manufacturer's instructions.

Yeast CRISPR/Cas9 editing
Donors bearing the desired mutations were generated by overlap extension PCR using Q5 polymerase (New England Biolabs M0492S) Given the high efficiency of homologous recombination following the generation of a defined double strand break, . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 25 markers are not required to be encoded in the donor. This allows for seamless, markerfree editing. All CRISPR modifications in yeast were made as previously described (70,71). Target strains were first transformed with the Cas9 plasmid (pNA519 pRS413-TEF1p-Cas9) and subsequently transformed with gRNA plasmids and linear donor fragments generated by PCR. Correct clones were identified by colony PCR followed by sequencing. Yeast CRISPR guide sequences are listed in Supplementary Table 3. Donor sequences are listed in the supplementary tables associated with each assemblon.

Field Inversion Gel Electrophoresis (FIGE)
SynHox assemblon BACs were verified by digesting a ~250-500ng purified by alkaline lysis (72) from small scale (5-10 mL) saturated bacterial culture with PvuI-HF (New England Biolabs R3150S). Digestion reactions were carried out at 37°C for 3-24 hours. The entire reaction was separated using the CHEF-DR system (Biorad 1703670) on a 1% low melting temperature agarose gel (Lonza 50100) in 0.5X TBE. The system was programmed using the auto algorithm function for fragments between 2kb and 50kb. 500 ng of lambda monocut ladder (New England Biolabs N3019S) were used as a molecular weight standard. Gels were stained after separation in a 0.5 µg/mL solution of ethidium bromide in water for 20-30 min and destained in water for 20-30 min before imaging.

Design and build of 134kb SynHoxA
The 134kb SynHoxA assemblon was designed to cover the H3K27me3 domain at HoxA in mESCs (mm10 chr6:52151343-52285368). Corresponding rat coordinates (rn6 chr4:82120263-82339548) were identified using the UCSC genome browser Convert tool. The rn6 genome contains an erroneous duplication at HoxA between gaps in the assembly. We deleted this duplicated sequence in silico to arrive at the final 134kb SynHoxA sequence. Primers spanning the entire locus were designed using a custom script based on the Primer3 algorithm (73), yielding a list of 29628 unique 18-24bp primers that contain each of the 4 bases at a of 15% and meet cutoffs for primer dimer and hairpin formation scores. This list was then manually pruned to 28 primer pairs that cover the entire sequence with an average amplicon length of 4.5kb (range: 3.2kb-5.4kb) and an average overlap with the adjacent amplicon of 207bp (range: 363bp -102bp). See Supplementary Table 4 for coordinates of design.
134kb SynHoxA was built by first constructing two half-assemblons (segments 1-14 and segments [15][16][17][18][19][20][21][22][23][24][25][26][27][28]. Two µl of each PCR amplicon were used to generate a pool, which was then transformed into yeast strain BY4741 with appropriate linkers (generated by overlap extension PCR) to direct assembly into a linearized vector (100 ng of I-SceIdigested pLM453) as previously described. (47) Linkers contained unique restriction enzyme sequences (I-SceI and AsiSI) to facilitate isolation of the insert sequence from the vector. The right linker for each of half assemblon also encoded a selectable URA3 gene. In each assembly step, yeast transformants were first screened for the correct phenotypes (e.g. Ura+). Subsequently, colonies were tested for the presence/absence of assembly junctions using spanning primers (sequences are listed in Supplementary  Table 5) across the construct with the aid of a robotic workcell as previously described (47). Plasmids from correct yeast clones (ySP0084 and ySP0085) were recovered into E. coli by electroporation to generate larger quantities of DNA.
To build the full 134kb SynHoxA assemblon, half-assemblon BACs (pSP0180 and pSP0182) were purified by alkaline lysis. Inserts were released using AsiSI digest and ~1 µg of each were transformed into BY4741 with appropriate linker fragments and linearized vector (I-SceI digested pLM453). The right linker for this step encoded a LEU2 marker, which allowed for the simple exclusion of yeast colonies arising from the transformation of undigested half-assemblon BACs (marked with URA3). Leu+ yeast colonies were screened for the presence of a PCR amplicon spanning segments 14-15 junction and for primers spanning the entire construct. Plasmids from correct yeast clones (ySP0088/89) were recovered to E. coli (pSP0193/196) and purified by alkaline lysis for verification by FIGE and sequencing.
In order to functionalize the assemblon for delivery by ICE, the backbone was modified in yeast using CRISPR. Strain bearing the full 134kb assemblon was transformed with Cas9 expressing plasmid pNA519 to yield yeast strain ySP0093. ySP0093 was subsequently transformed with a gRNA encoded in pSP0197 and a donor amplified from pLM707 containing the PGK-ATG-loxM-loxP-GFP cassette. The insertion was verified by Sanger sequencing. Plasmid was recovered from this yeast strain (ySP0096) into bacteria (pSP0211) and used for delivery to mESCs.
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint Supplementary Table 5 lists all sequences used in the build of 134kb SynHoxA, including segment primers, junction primers, linkers, gRNAs and reagents used for amplification of the donor used to insert the ICE cassette.

Design and build of 170kb Enhancers + SynHoxA
Mouse enhancer coordinates were derived from published reports of enhancer function (14)(15)(16) and were expanded by adding 1 kb on each side. Mouse coordinates were then mapped to the rat genome using the UCSC genome browser Convert tool. In order to be conservative, we defined the final rat enhancer coordinates to be the union of those derived from the original and extended mouse enhancer coordinates (See Supplementary Table 6 for details). All enhancer sequences were then appended to yield the compound enhancer sequence. Primers were designed to amplify the requisite sequences as well as to generate linkers between non-contiguous segments (Supplementary Table 7).
PCR amplicons corresponding to the linkers between non-contiguous enhancer segments were generated by overlap extension PCR. Initial amplicons were generated with Q5 polymerase supplemented with 1 M Betaine. These were gel purified using Zymoclean Gel DNA Recovery Kit (Zymo Research D4002) and the two fragments were mixed together at a 1:1 ratio and 1-10 ng of this mix was used as template with outer primers in 20 µl Q5 reactions to generate the final linkers. 10 µl of each segment's PCR product and linker product were pooled separately and then isopropanol-precipitated to a final volume of 10 µl of TE.
Yeast strain bearing the 134kb SynHoxA assemblon with delivery cassette (ySP0096) was transformed with Cas9 plasmid pNA519 to yield ySP0108. ySP0108 was transformed with plasmid pSP0233, which expresses a gRNA that cuts upstream of the 134kb assemblon, and with 10 µl of segment PCR pool and 5 µl of linker PCR pool to repair the gap. Yeast colonies were screened manually with junction primers spanning the overlaps between segments and with primers that span the 134kb SynHoxA . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 28 assemblon (see Supplementary Table 7 for enhancer junction primers and  Supplementary Table 12 for primers that span the 134kb SynHoxA assemblon).
A verified yeast colony (ySP0109) was recovered to bacteria (pSP0242) for verification by FIGE and sequencing.

Design and build of 130kb RARE∆ SynHoxA and 166kb Enhancers + RARE∆ SynHoxA
Rat RARE sequences were based on published reports of mouse RAREs that were converted to Rat coordinates using the UCSC genome browser Convert tool (22). The region containing 3'Hoxa1 RARE (rn6 chr4:82120263-82123922) was deleted. Other RARE mutations were designed such that only the direct repeat sequences were mutated to polyA. Mutagenesis was performed using CRISPR/Cas9. Donors were generated by overlap extension PCR (Supplementary Table 8).
Hoxa7 CDS mutation R131W was corrected in ySP0108 (yeast strain with 134kb SynHoxA -ICE delivery cassette + pNA519 Cas9 plasmid) by CRISPR/Cas9. ySP0108 was transformed with gRNA expression plasmid pSP0255 and a donor generated from a synthetic oligo obtained from IDT (Supplementary Table 8). Yeast colonies were screened by PCR followed by Sanger sequencing to yield wild type Hoxa7 strain ySP0126, which was transformed with Cas9 expression plasmid pNA519 to yield ySP0128, which is the parent to all subsequent assemblons.
3'Hoxa1 RARE was first deleted using gRNA expression plasmid pSP0277 to yield ySP0131. ySP0131 was transformed with pNA519 Cas9 expression plasmid to yield ySP0146. 5'Hoxa3 RARE and 5'Hoxa4 RAREs were modified in ySP0146 with gRNAs expressed from pSP0322 and requisite donors to yield ySP0147. Finally, 3'Hoxa4 RARE was mutated in ySP0147 using the gRNA expression plasmid pSP0315 to yield ySP0161. All mutations were verified in yeast colonies using genotyping primers specific to the mutation (Supplementary Table 8) followed by Sanger sequencing of the region, To add enhancer sequences to the 130kb RARE∆ SynHoxA assemblon, enhancers were amplified from the 170kb Enhancers + SynHoxA BAC (see section on build of 170kb assemblon). A pool of enhancer amplicons was transformed into ySP0161 with gRNA expression plasmid pSP0233 and linkers to repair the cut upstream of the assemblon. Yeast colonies were screened for the presence of all enhancer junctions as well as sequences spanning the 134kb SynHoxA assembly (see Supplementary Table 7 for enhancer junction primers and Supplementary Table 12 for primers that span the . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 29 134kb SynHoxA assemblon) The plasmids from ySP0161 and ySP0200 were recovered to bacteria and subject to verification by FIGE and sequencing to yield pSP0328 and pSP0394, respectively.
Supplementary Table 9 lists all cell lines used in this study.

PCR genotyping of mESCs
mESC clones were initially screened by performing PCR on crude gDNA extracted from cells growing in a 96-well plate. Briefly, cells were washed once with PBS and frozen at -80°C for at least 30 min. The plate was thawed at room temperature and cells were resuspended in 40-50 µl of TE supplemented with 0.3 µg/ul Proteinase K (Thermo . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 30 Scientific EO0492) and transferred to a 96-well PCR plate. Cell suspensions were then incubated in a thermocycler at 37°C for 1 hour and then 99°C for 10 min. 1 µl of this crude lysate was used for PCR in a 10 µl GoTaq Green reaction (Promega M7123) with 0.25 µM of primers. Candidate clones were expanded and re-genotyped using DNA extracted with the QIAamp DNA mini kit (Qiagen 51306) according to the manufacturer's protocol. 25-50 ng of DNA was used as template per reaction. PCRs were separated on a 1% agarose gel containing ethidium bromide. 1kb Plus DNA Ladder (New England Biolabs N0550S) was used as a molecular weight standard.

Generation of HoxA -/cell line
Deletion of the endogenous HoxA alleles was designed to mimic the sequence of 134kb SynHoxA (mm10 chr6:52151380-52285416). gRNAs were designed using CRISPOR and were chosen to have high cutting efficiency and low off-target scores (74). gRNAs were cloned into pSP0172, a modified version of pX459 (Addgene #62988, a gift from Feng Zhang) that has the Puromycin resistance gene replaced by a Blasticidin resistance gene. A 200 bp single stranded oligo donor (ssODN, oSP379) was designed to bridge the deletion and ordered from Integrated DNA Technologies (IDT). gRNA expression plasmids were purified using PureLink HiPure Plasmid Midiprep Kit (Invitrogen K210004) following manufacturer's instructions. 12.5 µg of gRNA expression plasmids pSP0161 and pSP0164 were nucleofected into 1 million A17iCre mESCs along with 5 µl of 100 µM ssODN. Cells were transiently selected with 10µg/mL Blasticidin (Invivogen ant-bl-05b) for 3 days post nucleofection. Single clones were picked and genotyped using primers that span the deletion junctions and with primers internal to mouse HoxA. Sanger sequencing confirmed the precise deletion. Clones with the correct genotype were expanded and subject to WGS. Clones were also verified by metaphase spread karyotyping and quantitative real-time PCR (qRT-PCR) for ESC markers as previously described (47). A passing clone (1-A6) was used for all experiments.
Sequences of guides, donor and genotyping primers can be found in Supplementary  Table 10. Sequences of qRT-PCR primers are in Supplementary Table 11.

Delivery of assemblons to mESCs and verification
Bacterial strain carrying the desired assemblon was struck out and grown for 2 days at 30°C on LB-agar plates containing 50 µg/mL Kanamycin. Single colonies were picked into 5 mL of LB + 25 µg/mL Kanamycin and grown for approximately 8 hours. The starter culture was used to seed 500 mL cultures at a 1:1000 dilution. 500 mL cultures . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. DNA was delivered by nucleofection using the Nucleofector 2b system (Lonza VPH-1001). 1.5 million A17iCre mESCs were plated on gelatin-coated 10 cm dishes 2 days before delivery. Cre expression was induced with 1 µg/mL doxycycline 18-24 hours before delivery. On the day of delivery, cells were washed with PBS and harvested using Accutase (Biolegene 423201). 5-6 million cells were used per delivery. 10 µl purified assemblon DNA were used per delivery. Cells were resuspended in 100 µl nucleofection solution with purified DNA and transferred to a cuvette using wide bore tips. Cells were nucleofected using program A-023 and plated on gelatinized 10 cm dishes. Cells were selected with 400 µg/mL Geneticin (Life Technologies, 10131-027) 48 hours post nucleofection. Resistant clones were picked 10-14 days post nucleofection.
Crude gDNA was extracted from clones and PCR genotyping was performed with primers spanning genome junctions (tetO-GFP and PGK-Neo), primers specific to endogenous HoxA deletion and primers specific to the overwritten Cre in the landing pad. In addition, clones were screened with a subset of heterologous primers that were designed using Primer-Blast to be specific to SynHoxA sequences when compared to endogenous mouse HoxA. Correct clones were expanded, pure genomic DNA was extracted and clones were genotyped with the full complement of primers. Passing clones were then verified to contain the full assemblon by capture sequencing. Sequences of genotyping primers are in Supplementary Table 12. To generate the flow cytometry plot in Supp. Fig 5, parental WT mESCs and captureseq verified WT mESCs bearing the 134kb SynHoxA assemblon at Hprt were treated with 3 µg/mL doxycycline for 24 hours. Flow cytometry was performed on a BD Accuri C6 instrument and results were analyzed using FlowJo. Cells were gated on forward and side-scatter and histograms of GFP expression normalized to mode were plotted.

Library preparation for next generation sequencing
A list of all sequencing libraries and information associated with them can be found in Supplementary Table 13. BAC DNA sequencing . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. Libraries for whole genome sequencing (WGS) of parental mESC lines (WT and HoxA -/-) were sequenced on a Novaseq 6000 in paired end mode.
Targeted sequencing using in-solution hybridization capture (Capture-seq) was performed on SynHoxA-delivered mESCs as previously described (52). Biotinylated baits for capture sequencing were prepared from assemblon BACs using nick translation. 134kb SynHoxA mESCs were captured with bait made from 134kb SynHoxA BAC. All other mESCs were captured with bait made from 170kb Enhancers + SynHoxA BAC. In addition, the parental mESCs and RARE∆ SynHoxA mESCs were captured with a bait that included the ICE landing pad and flanking mouse genome sequence (pLM1103+ICEFlanking). See Supplementary Table 13 for details. All libraries from SynHoxA deliveries were sequenced on an Illumina NextSeq 500 in paired end mode. 33 Glycine and cells were washed with 1 × PBS. Samples were divided into ~25-30 million cell aliquots, pelleted by centrifugation at 275 g, and frozen at -80°C. Cell aliquots were thawed on ice and lysis was performed in 5 mL of 50 mM HEPES-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA pH 8.0, 10% glycerol (vol/vol), 0.5% Igepal (vol/vol), 0.25% Triton X-100 (vol/vol) with 1 × protease inhibitors (Roche, 11697498001) for 10 min at 4°C. Cells were pelleted by centrifugation for 5 min at 1200 g, resuspended in 5 mL of 10 mM Tris-HCl pH 8.0, 200 mM NaCl, 1 mM EDTA pH 8.0, 0.5 mM EGTA pH 8.0 with 1 × protease inhibitors, and incubated for 10 min at 4°C on a rotating platform. Cells were centrifuged for 5 min at 1200 g and resuspended in 2 mL of Sonication Buffer (50 mM Hepes pH 7.5, 140 mM NaCl, 1 mM EDTA pH 8.0, 1 mM EGTA pH 8.0, 1% Triton X-100 (vol/vol), 0.1% sodium deoxycholate (wt/vol), 0.1% SDS (vol/vol) with 1 × protease inhibitors). For sonication, each sample was split in two Bioruptor tubes with added sonication beads. Sonication was performed using the Bioruptor Pico (Diagenode) for 18 cycles of 30 sec on and 30 sec off to sheer DNA into an average size of approximately 200bp. Immunoprecipitation was performed for 16h at 4°C on a rotating platform by incubating with Dynabeads protein-G (Thermo Fisher Scientific) conjugated with antibodies. For histone modifications, half of each original cell aliquot was incubated with Dynabeads protein-G conjugated with 5 µg of rabbit polyclonal to Histone H3K27me3 (Active motif 39155) or rabbit polyclonal to Histone H3 (acetyl K27) (Abcam ab4729) antibodies. For CTCF, the entire cell aliquot was incubated with Dynabeads protein-G conjugated with 5 µl of rabbit polyclonal to CTCF (EMD 07-729) antibody.
After the immunoprecipitation, washes were performed with the following ice-cold buffers: sonication buffer, sonication buffer with 500 mM NaCl, LiCl wash buffer (20 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0, 250 mM LiCl, 0.5% Igepal (vol/vol), 0.5% sodium deoxycholate (wt/vol)), and TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8). Elution was performed in elution buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA pH 8.0, 1% SDS (vol/vol)) by incubating for 45 min at 65°C with occasional flicking of the tube. Samples were incubating for 16h at 65°C to perform reversal of crosslinks. 200 μL of TE and RNase A (Sigma) at a final concentration of 0.2 mg/mL was added to digest RNA by and incubating for 2h at 37°C. Proteins were digested by adding Proteinase K (Invitrogen) at a final concentration of 0.2 mg/mL, supplemented with CaCl2, at 55°C for 30 min. DNA was purified with phenol:chloroform:isoamyl alcohol (25:24:1; vol/vol) (Invitrogen) followed by an ethanol precipitation. DNA pellets were resuspended in 70 μL of water. lllumina DNA sequencing libraries were prepared with approximately one third of the ChIP sample (24 μL) or a 1:100 dilution of the input sample in water. Library preparation was performed by end repair, A-tailing and ligating Illumina-compatible Bioo Scientific multiplexed adapters. Agencourt AmpureXP beads (Beckman Coulter) were used to remove unligated adapters. PCR amplification was performed with Phusion polymerase (New England Biolabs) and TruSeq primers (Sigma). Libraries were gel . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 34 purified (Qiagen) between 250 and 550bp in size. Libraries were quantified before pooling using the KAPA library amplification kit on the Roche Lightcycler 480 or the Bio-Rad CFX96. The libraries were sequenced on Illumina NextSeq 500 using V2.5 chemistry (75 cycles, single-end 75bp) or on Illumina NovaSeq 6000 using the SP Reagent Kit (100 cycles, single-end 100bp) at the Genomics Core Facility at NYU.

RNA-seq
Cells were collected at 0h or 24h, 48h, or 96h after RA/SAG treatment. RNA-seq was performed as previously described (75). RNA was extracted with the TRIzol LS Reagent (Life Technologies) followed by purification with the RNAeasy mini kit (Qiagen). RNA integrity was checked with the Agilent High Sensitivity RNA Screentape (Agilent, 5067-5579). 500 ng of RNA was used to prepare libraries and spiked-in with ERCC Exfold Spike-in mixes (Thermo Fisher 4456739). The TruSeq Stranded mRNA Library Preparation kit (Illumina 20020594) was used to prepare RNA-seq libraries. Library size was checked on the High Sensitivity DNA ScreenTape (Agilent 5067-5584). The KAPA library amplification kit was used to quantify libraries on the Bio-Rad CFX96 or the Roche Lightcycler 480 before pooling libraries. Libraries were sequenced on the Illumina NextSeq 500 using V2.5 chemistry (75 cycles, single-end 75 bp) or on the Illumina NovaSeq 6000 using the SP Reagent Kit (100 cycles, single-end 100bp) at the Genomics Core Facility at NYU. Control (without synthetic Hox constructs) RNA-seq datasets were previously published: 0h in (59); and 48h/96h after RA/SAG in (75).

Hi-C
Cells were collected at 0h and 48h after RA treatment. Cells were divided into ~1x10 6 aliquots and crosslinked in a final concentration of 2% FA (vol/vol) in 1 × PBS for 10 min at room temperature. After quenching with Glycine, cells incubated on ice for 15 min, pelleted by centrifugation at 500 g, and frozen at -80°C. Hi-C was performed using the Arima-HiC workflow (Arima Genomics, San Diego, CA) by NYU Langone's Genome Technology Center (RRID: SCR_017929).

Custom references
Two modified versions of the mm10 genome and corresponding genome annotations, were created using the reform tool (https://reform.bio.nyu.edu/). The genome mm10_synHoxA was created by replacing mm10 chrX:52963048-52997452 (Hprt) with the sequence corresponding to 170kb Enhancers + SynHoxA delivered to the ICE . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 35 landing pad. A second genome mm10_synHoxA_delHoxA was then created by removing the endogenous HoxA sequence that is deleted in the HoxA -/-mESC line (mm10 chr6:52151380-52285416).
Variants in SynHoxA assemblons were called relative to the rn6 rat reference genome. Sequencing data from assemblon BACs were mapped using the same pipeline described for parental mESCs (52). A modified rn6 genome was used as reference. Two sequences were masked: 1) the sequence corresponding to the mistaken duplication at HoxA (rn6 chr4: 82229539-82315425) and 2) an unplaced contig (4_KL567939v1_random) containing HoxA sequence. Variants were then called using a standard pipeline based on bcftools v1.9: bcftools mpileup-redo-BAQ-adjust-MQ 50-gap-frac 0.05-max-depth 10000-maxidepth 200000 -a DP,AD-output-type u | bcftools call-keep-alts -ploidy 1-multiallelic-caller -f GQ-output-type u . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.

Capture-sequencing coverage analysis
All capture-seq data coming from assemblon delivery to WT mESCs was mapped to mm10_synHoxA whereas all data from HoxA -/-mESCs was mapped to mm10_synHoxA_delHoxA for coverage analysis. Reads were demultiplexed with Illumina bcl2fastq v2.20 requiring a perfect match to indexing BC sequences. Illumina sequencing adapters were trimmed with Trimmomatic v0.39. Reads were mapped using bowtie2 v2.2.9. Samtools was used to sort bam files and coverage tracks were generated using deeptools (81) bamCoverage v3.2.1 with bin size set to 1, ignoring duplicates. Coverage was visualized in the Integrated Genomics Viewer (IGV) coordinates chrX:52967270-53136430 (83).

Capture-sequencing integration site analysis
Bamintersect analysis was performed as previously described (52) with a few modifications. Briefly, capture-seq data were mapped to mm10 and independently to references containing the ICE landing pad sequence and the delivered assemblon sequence. Bamintersect identifies junctions by looking for read pairs where each read is mapped to a different reference.
To exclude spurious hits, certain sequences present in multiple contexts are masked. For example, the mouse Pgk1 promoter is found in the rtTA cassette integrated at Rosa26, at its endogenous location on chrX and at the 3' end of the delivered assemblon driving G418 (Neo) resistance. Similarly, the SV40 pA signal is present downstream of both the GFP in the assemblon and G418 R (Neo) gene in the landing pad. Coordinates of masked sequences are: 1) Pgk1 promoter found in the rtTA cassette integrated at Rosa26 (mm10 chr6:113071694-113077114) 2) Pgk1 promoterendogenous location (mm10 chrX:106186732-106187231) and 3) region of the mouse genome immediately downstream of the G418 R SV40 pA (chrX:52962425-52963047). In addition, the endogenous mouse HoxA sequence (mm10 chr6:52121869-52285368) . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 37 is masked to eliminate hits that arise from cross mapping of highly conserved regions between the rat derived assemblon and mouse HoxA.
Reads with the same strand and mapping to within 500 bp of each other were clustered for reporting. Regions below 150 bp or with fewer than 2 reads/10M reads sequenced were excluded.
To generate sliding window plots of ChIP-seq signal, bedtools makewindows was used to generate bins of 3kb sliding 300 bp. Bedtools coverage was then used to compute the mean coverage in each bin from the sorted bam files (84). Coverage was normalized across samples using RPKM that was calculated as: reads-per-bin/(number of mapped reads (in millions) * bin length (kb)). Mean value and standard deviation of replicates was then plotted using Python matplotlib.
Of note, the vector backbone for all integrated constructs was covered in H2K27me3 at all time points. This corroborates previous reports of bacterial sequence silencing in mammalian cells (Supplementary Figure 13) (85).

RNA-seq data analysis
Fastq files were aligned to the genome (mm10_synHoxA or mm10_synHoxA_delHoxA custom genomes) using HISAT2 (86,87), using options: -p 20 -q --rna-strandness F. Mapped reads were assigned to annotated genes using the featureCount function in Rsubread (88), using options -s 2. Read counts were normalized using the 'rlog' or regularized log transformation in DESeq2 (89) and used as inputs for the Principal Component Analysis (PCA). The log2 fold change (FC) and adjusted p-value in gene expression levels between 24h, 48h, and 96h vs. 0h was estimated using DESeq2 and plotted using ggplot (90). Normalized counts from DESeq2 were used to generate transgene/endogenous ratio plots, with the counts from endogenous locus being reduced by half to normalize the copy number between HoxA clusters. RNAseq track visualization in Suppl. Fig. 6 was performed using combine tracks tool in IGV (83) on . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint 38 bigwigs generated with deeptools bamCoverage (81) and subsequent editing in Adobe Illustrator.

Hi-C data analysis
Hi-C data was aligned against the custom mm10_synHoxA_delHoxA genome using BWA mem (version 0.7.17) using parameters -M -t 4 and aligning each mate pair independently (77). Samtools (version 1.11) was used to sort mapped reads by read name, and the pair_reads.py script in mHiC was used to join mate-pairs into a pairedend SAM file (82, 91). Paired-end read counts were then binned at 10kbp resolution to create the genome-wide contact matrix. The TAD calls displayed in Supplementary  Figure 8 were produced using HiCseg (version 1.1) (92). The HiCseg_linkC_R function in HiCseg was provided with the segment of the 10kbp-resolution contact matrix corresponding to coordinates chrX:43000000-63000000 (mm10_synHoxA_delHoxA custom genome), and the following parameters: nb_change_max=100, distrib="G", model="D". The heatmap was then plotted at coordinates chrX:52400000-53600000 and chrX:52800000-53300000, with maximum color intensity set at a contact frequency of 40. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint

Figure 1
Step . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint Sequencing data for assemblon DNA isolated from bacteria (purple) and from capture sequencing after integration in mESCs (blue and green). DNA sequencing data shown here are aligned to custom reference genomes (see Methods).
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.   . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint  . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.   Enhancers   a13  a10  a7  a2  a5  a1  a3  a4  a6  a9  a11   a13  a10  a7  a2  a5  a1  a3  a4  a6  a9  a11 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint   1 a 2 a 3 a 4 a 5 a 6 a 7 a 9 a 1 0 a 1 1 a 1 3 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

SynHoxa gene
The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint  Gaps in rn6 assembly . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.  Conservation -Mouse  . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint SuppFigure 3: Mutations in SynHoxA assemblon that are generated during construction. Sequencing data from various stages of assemblon construction: source rat BACs, half-assemblon BACs and full-assemblon BACs. Below each coverage track, variant positions in comparison to the reference rn6 genome are depicted with reference allele (top) and variant allele (bottom). Data shown here are aligned to the rat reference genome rn6 and smoothed over 500bp.
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.

Parental mESC
HoxA -/-mESC  . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  Capture-sequencing data generated from parental WT and HoxA -/-mESCs using rat HoxA sequence as bait. Data are aligned to custom mouse reference genomes and normalized for sequencing depth.
There is minimal cross-mapping between endogenous HoxA and SynHoxA loci.
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.  . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 9 a 1 0 a 1 1 a 1 3 SynHoxa gene a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 9 a 1 0 a 1 1 a 1 3 SynHoxa gene a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 9 a 1 0 a 1 1 a 1 3 SynHoxa gene a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 9 a 1 0 a 1 1 a 1 3 SynHoxa gene a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 9 a 1 0 a 1 1 a 1 3 Hoxa gene a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 9 a 1 0 a 1 1 a 1 3 Hoxa gene . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint SuppFigure 8: Ectopic SynHoxA clusters self-organize in 3D during differentiation. Heatmaps of Hi-C data during MN differentiation from mESCs lacking the endogenous HoxA cluster that contain either Enhancers + SynHoxA (A) or SynHoxA (B) at Hprt. Black lines indicate topological boundaries called by an unbiased algorithm, HiSeg2.0. A topological boundary formed between SynHoxa5 and SynHoxa6 in Enhancers + SynHoxA, mirroring endogenous organization.
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint SuppFigure 10: Delivery of RARE∆ assemblons to WT and HoxA -/-mESCs. (A) Genotyping PCRs separated on an agarose gel. RARE∆ specific primers are used to verify presence of the RARE mutations. Clones were also screened using SynHoxA specific primers that span the length of the assembly, primers specific to enhancer junctions, primers that span novel junctions formed with the genome (tetO-GFP, PGK-Neo) and primers that confirm overwriting of the Cre gene. In addition, the presence or absence of the endogenous HoxA cluster deletion was confirmed using deletion specific primers. (B) Only the expected junctions spanning the synthetic assemblon and the host genome were observed in next generation sequencing data with no off-target integrations. (C) Positions of the enhancers and protein coding genes are shown in black. Capture sequencing data is shown from WT and HoxA -/-mESC clones arising from delivery of RARE∆ SynHoxA assemblons. Sequencing data shown here are aligned to a custom reference genome (see Methods).
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021.  was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint SuppFigure 11: mESCs with SynHoxA assemblons differentiate well into MNs. . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint SuppFigure 13: Vector backbones recruit chromatin modifications. ChIP-seq data mapping to the vector backbones associated with all SynHoxA clusters is presented at all time points analyzed. As part of the ICE delivery, the two parts of the vector backbone are separated by the SynHoxA cluster. Bacterial derived sequences were covered with the repressive chromatin modification H3K27me3 (red). The yeast LEU2 gene recruited the activating chromatin modification H3K27Ac (blue) and H3K27me3. Some inconsistent CTCF (black) recruitment was observed. ChIP-seq data shown here are aligned to a custom mm10 reference genome (see Methods).
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.07.451065 doi: bioRxiv preprint