ABSTRACT
The Shox2 homeodomain transcriptional regulator is known for its critical functions during mouse embryogenesis, enabling accurate development of limbs, craniofacial structures, neural populations and the cardiac conduction system. At the genomic level, the Shox2 gene is flanked by an extensive gene desert, a continuous non-coding genomic region spanning over 500 kilobases that contains a multitude of evolutionarily conserved elements with predicted cis-regulatory activities. However, neither the transcriptional enhancer potential of the vast majority of these elements nor the biological necessity of the gene desert has yet been explored. Using transgenic reporter assays in mouse embryos to validate an extensive set of stringent epigenomic enhancer predictions, we identify several novel gene desert enhancers with distinct tissue-specific activities in Shox2-expressing tissues. 4C-seq chromatin conformation capture further uncovers a repertoire of gene desert enhancers with overlapping activities in the proximal limb, in a compartment essential for Shox2-mediated stylopod formation. Leveraging CRISPR/Cas9 to delete the gene desert region contained in the Shox2 topologically associating domain (TAD), we demonstrate that this complex cis-regulatory platform is essential for embryonic survival and required for control of region-specific Shox2 expression in multiple developing tissues. While transcription of Shox2 in the embryonic limb is only moderately affected by gene desert loss, Shox2 expression in craniofacial and cardiac domains is nearly abolished. In particular, Shox2 transcripts in the sinus venosus (SV) encompassing the sinoatrial node (SAN) are depleted in embryos lacking the gene desert, likely accounting for the embryonic lethality due to Shox2-dependency of the SAN pacemaker.
Finally, we discover a 1.5kb SV enhancer within the deleted gene desert region, which may act as a genomic module controlling the development of the cardiac conduction system. In summary, our results identify a gene desert indispensable for pleiotropic patterning and highlight the importance of these extensive regulatory landscapes for embryonic development and viability.
INTRODUCTION
The function of gene deserts has posed a considerable puzzle since these large noncoding regions were first shown to be a prominent feature of the human genome almost 20 years ago1. As further vertebrate genomes were sequenced, orthologous gene deserts that shared synteny were found2. Originally defined as gene-free chromosomal regions larger than 500 kilobases (kb), gene deserts frequently contain many interspersed, highly conserved sequences that function as transcriptional enhancers3,4. Not surprisingly, these extensive cis-regulatory landscapes are enriched near genes with important developmental functions, such as transcription factors (TFs), suggesting a critical role for gene deserts in the regulation of key developmental genes2,3. Unexpectedly, the first megabase-scale deletions of gene deserts had no obvious effect on mouse development and only mildly affected the expression of nearby genes, suggesting that these chromosomal regions may be dispensable5. When chromosome-conformation-capture techniques were developed, it became possible to accurately predict the range and identity of specific cis-regulatory interactions within a given locus. For example, Montavon et al. applied these emerging technologies and genomic deletions to show that an 830kb gene desert containing a “regulatory archipelago” of limb enhancers was required for expression of HoxD genes in distal limbs6. Such an arrangement of dispersed enhancers within an extensive gene desert, or sometimes within gene-rich regions, has now emerged as a paradigm for understanding the control of tissue-specific transcription during development7-9. In this context, the identification of topologically associating domains (TADs) as a unit of chromosomal organization has refined our understanding of how dispersed enhancers are integrated into a gene’s regulatory architecture10,11.
Since enhancer-promoter interactions are generally confined within a given TAD, deletions or inversions involving TAD boundaries can lead to a gain or loss of gene expression as regulatory interactions are redistributed within the reconfigured TADs12,13. Therefore, elucidating the regulatory activities in the vast non-coding segments of TADs can have profound implications for our understanding of the basis of human disease. To date, comprehensive studies of gene regulatory regions in mice involving chromosome conformation capture, transgenic reporter assays and genomic deletions have been conducted on a restricted number of loci including Shh, Pitx1, Epha4/Pax3/Ihh, and the HoxD genes, and most commonly focusing on the developing limb11,14-16.
In the current study, we focused on the mouse short stature homeobox 2 gene (Shox2) as an ideal model to study the cis-regulatory complement underlying pleiotropic gene expression and driving the development of multiple embryonic tissues. Shox2 function is essential for the development of several discrete structures, including the proximal limb (the humerus and femur), craniofacial compartments, the facial motor nucleus of the hindbrain, and a subset of neurons of the dorsal root ganglia17-21. Most importantly, Shox2 is required for cardiac pacemaker differentiation in the sinoatrial node (SAN) and therefore its inactivation leads to embryonic lethality due to bradycardia starting around embryonic day 11.5 (E11.5)22,23. We previously showed that the regulation of Shox2 in limbs is controlled by multiple cis-regulatory modules and even the combined deletion of two Shox2 proximal limb enhancers had relatively small effects on Shox2 expression and limb morphology24,25. Here, we performed a more stringent test of the resilience of Shox2 expression in multiple tissues by deleting the gene desert adjacent to Shox2, which encodes a plethora of genomic elements with developmental enhancer signatures. First, using a combination of epigenomic analysis, chromatin conformation capture and transgenic reporter assays, we identify numerous gene desert enhancers with distinct subregional activities in limbs, craniofacial compartments and neural cell populations, directly correlated with dynamic Shox2 expression in mouse embryos. Our deletion analysis then uncovers a critical role of the gene desert in controlling Shox2 expression not only in the proximal limb mesenchyme and craniofacial compartments, but also in the SAN-containing cardiac sinus venosus (SV). Finally, using open chromatin profiling from embryonic hearts we discover a SV enhancer likely involved in the essential Shox2-controlled regulation of the cardiac pacemaker system. 
Taken together, our results emphasize fundamental roles of a large cis-regulatory gene desert in transcriptional control of a key developmental gene.
RESULTS
The mouse Shox2 gene is located on chromosome 3 in a TAD spanning 1 megabase (Mb) of genomic sequence that contains the major fraction of a 675 kilobase (kb) gene desert26 (Fig. 1A). This Shox2-TAD harbors one additional protein-coding gene (Rsrc1), while three other genes (Mlf1, Veph1, Ptx3) are found in neighboring chromatin domains (Fig. 1A, S1A). These Shox2-adjacent genes have not been implicated in developmental patterning and show either near-ubiquitous (Mlf1, Rsrc1) or differential (Veph1, Ptx3) tissue-specific expression profiles (Fig. S1A). Transcription of Shox2 is highly regulated around mid-gestation with prevalent expression domains in the developing limbs, craniofacial structures, the heart, neuronal populations of the mid- and hindbrain, and emerging facial nerves (Fig. 1B, S1A). This temporally dynamic and pleiotropic character suggests considerable complexity in the genomic regulatory landscape controlling Shox2 activities. However, only a limited number of Shox2-associated transcriptional enhancers, with activities restricted to brain and limb sub-regions, have been identified to date (Vista Enhancer Browser)24,25,27.
To characterize the cis-regulatory complexity encoded in the extended Shox2 TAD and specifically in the aforementioned gene desert, we established a map of stringent enhancer predictions using a combination of chromatin state profiles (ChromHMM) and H3K27ac ChIP-seq peak calls across sixty-six embryonic and perinatal tissue-stage combinations from ENCODE28 (https://www.encode.project.org) (see methods). After excluding promoter regions, this analysis of the epigenome identified 30 genomic elements with robust enhancer signatures in at least one of the tissues and timepoints examined (Figs. 1B, S1 and Table S1). Remarkably, 16 of the 30 elements were located within the Shox2 gene desert representing putative gene desert enhancers (GDEs). Indeed, the majority of GDEs showed dynamic spatiotemporal H3K27ac profiles including a combination of limb, craniofacial, cardiac or neuronal signatures (Figs. 1B). Collectively, these results suggest that the Shox2 gene desert encodes a major fraction of the cis-regulatory modules controlling Shox2 in a temporally and spatially-restricted manner in mouse embryos.
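The prediction logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: it intersects enhancer-state segments with H3K27ac peaks per tissue-stage sample, drops peaks that overlap promoter windows, and merges the surviving intervals across samples so that an element is retained if it has support in at least one sample. The function names, the half-open interval convention and the toy coordinates are assumptions introduced here for illustration; the actual criteria are those given in the methods.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def sample_hits(enh_states: List[Interval],
                peaks: List[Interval],
                promoters: List[Interval]) -> List[Interval]:
    """H3K27ac peaks that overlap an enhancer state and no promoter window."""
    hits = []
    for peak in peaks:
        if any(overlaps(peak, p) for p in promoters):
            continue  # promoter-overlapping peaks are excluded
        if any(overlaps(peak, s) for s in enh_states):
            hits.append(peak)
    return hits


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into single elements."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def predict_elements(samples: Dict[str, Dict[str, List[Interval]]],
                     promoters: List[Interval]) -> List[Interval]:
    """Union of per-sample hits: an element needs support in >= 1 sample."""
    all_hits: List[Interval] = []
    for tracks in samples.values():
        all_hits.extend(sample_hits(tracks["enh"], tracks["peaks"], promoters))
    return merge(all_hits)


# Toy input (hypothetical coordinates, two of many tissue-stage samples).
toy_samples = {
    "limb_E11.5": {"enh": [(100, 300)], "peaks": [(150, 250), (900, 1000)]},
    "heart_E11.5": {"enh": [(200, 400), (5000, 5200)],
                    "peaks": [(220, 380), (5050, 5150)]},
}
toy_promoters = [(5000, 5100)]
print(predict_elements(toy_samples, toy_promoters))
```

In the toy input, the two overlapping limb and heart peaks collapse into one candidate element, while the peak at the promoter window and the peak without an enhancer state are discarded.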
While H3K27 acetylation represents the primary epigenomic mark used to predict active transcriptional enhancers genome-wide28,29, these predictions are not always congruent with cell-type or tissue-specific activities in vivo30,31. Therefore, to determine the relevant developmental enhancer activities of predicted GDE elements we conducted LacZ transgenic reporter assays in mouse embryos at embryonic day 11.5 (E11.5) (Fig. 1C and Table S2), a stage characterized by widespread and functionally relevant Shox2 expression25. This analysis led to the identification of a battery of novel in vivo enhancers with distinct tissue-restricted activities, many closely overlapping subregional Shox2 expression domains in craniofacial compartments, cranial nerve or brain regions (Fig. 1B, C). However, while multiple GDEs showed elevated H3K27ac signatures in developing limbs, our transgenic screen identified only one element (GDE6) able to drive reporter activity in forelimbs (Fig. 1C). Notably, two GDEs (GDE9 and GDE15) displayed elevated H3K27ac in both limb and craniofacial tissues, but drove LacZ reporter expression exclusively in Shox2-overlapping craniofacial domains in the medial nasal (MNP) and maxillary-mandibular (MXP, MDP) processes, respectively (Fig. 1C). Our analyses also revealed multiple enhancers (GDE1, 5 and 12) with activities in cranial nerve tissue, including the trigeminal (TGn), facial (FGn) and jugular (JGn) ganglia, as well as the dorsal root ganglia (DRG) (Fig. 1C). Shox2 is expressed in all these neural crest-derived tissues, but a functional requirement has only been observed for the development of the FGn and the mechanosensory neurons of the DRG20,21. Interestingly, while no H3K27ac profiles for cranial nerve populations were available from ENCODE, both GDE5 and GDE12 elements showed elevated H3K27ac in craniofacial tissue at E11.5, potentially mirroring the common neural-crest origin of cranial nerve and a subset of craniofacial cell populations32.
At mid-gestation, Shox2 is also expressed in the diencephalon (DE), midbrain (MB) and hindbrain (HB), and is specifically required for cerebellar development33. In accordance, our gene desert enhancer screen also revealed a set of novel brain enhancers (GDE7, 14 and 16) overlapping Shox2 domains in the DE, MB or HB (Fig. 1C). In contrast, despite the presence of strong cardiac enhancer signatures in a subset of the tested gene desert elements, none of these predicted cis-regulatory modules drove reproducible reporter expression in the heart at E11.5 (Fig. 1B, C). Taken together, our results uncover the potential of the Shox2 gene desert to regulate a significant portion of the pleiotropic Shox2 expression pattern and emphasize the importance of validating tissue-specific epigenomic predictions in vivo using transgenic reporter assays.
Shox2 exerts a crucial role during limb development in controlling the formation of the humerus and femur via direct chondrogenic and osteogenic patterning mechanisms17,34-36. Although multiple elements with limb enhancer potential were identified in the gene desert (Fig. 1B), our transgenic screen of GDE elements only uncovered a single enhancer with forelimb activity at E11.5 (GDE6). Another, previously characterized limb enhancer (LHB-A/hs1262)24,25 located 43kb downstream of the Shox2 transcriptional start site (TSS) was not selected by our epigenomic profiling analysis, as a result of an earlier activation pattern and differential temporal enhancer signatures28 (Fig. 2A, S2A, B). Therefore, to better define the ensemble of limb enhancers interacting with Shox2 and relevant for limb chondrogenesis and/or osteogenesis, we performed circular chromosome conformation capture (4C-seq) from proximal limbs at E12.5 (Fig. 2B, S2C). We conducted two independent 4C-seq experiments using a viewpoint directly adjacent to the Shox2 promoter (Fig. 2A, B and Table S3). The two replicates displayed reproducible interaction profiles revealing discrete regions with high interaction frequencies with the Shox2 promoter (Fig. 2B). Notably, the vast majority of these regions was located within the gene desert and also marked by open chromatin, H3K4me1 and/or H3K27 acetylation (Fig. 2A), indicative of cis-regulatory modules6,28. In accordance, five of these preferentially interacting regions mapped to GDE elements, including limb (GDE 6) and craniofacial enhancers (GDE 9, 15) (Fig. 1C, S1, 2A, 2B). In addition, our 4C-seq results confirmed interactions between Shox2 and the previously identified proximal limb enhancers (PLEs) m741/hs741 (termed here PLE1) and LHB-A/hs1262 (termed PLE2) located upstream (−89kb) and downstream (+43kb) of the Shox2 TSS, respectively (Fig. 2B, C and Table S4)24,25,36. 
Finally, our 4C-seq analysis identified three Shox2-contacting gene desert modules (+237kb, +407kb and +568kb) with limb enhancer signatures (Fig. 2A, B). Indeed, subsequent transgenic analysis in mouse embryos at E12.5 revealed that each of these elements (termed PLE3, 4 and 5) on its own was able to drive transgenic reporter expression in the proximal limb (Fig. 2B, C and Table S4). While both PLE3 and PLE4 displayed activities co-localizing with skeletal progenitors from E11.5 to E13.5, PLE5 activity was restricted to the proximal-anterior limb mesenchyme and apparent at later stages (E12.5 and E13.5), predominantly in the hindlimb (Fig. S3). In a last step, we used 4C-seq to assess the 3D interaction profiles of selected individual enhancers (PLE2 and PLE4) (Fig. S2C). These experiments corroborated the specific interactions observed between both enhancers and the Shox2 promoter (Fig. S2C). Interestingly, while PLE2 shows no interaction with other enhancers, PLE4 establishes contacts with two other proximal limb enhancers (PLE1 and PLE3) (Fig. S2C). This finding suggests that several 3D conformations co-exist in the limb at the Shox2 locus, each one involving a different enhancer subset contacting the Shox2 promoter. In summary, our results unveil a proximal limb enhancer (PLE) repertoire encoded in the Shox2 gene desert and suggest a significant role of the gene desert in controlling limb-specific Shox2 expression.
Next, to determine the functional necessity of this limb enhancer repertoire and the regulatory relevance of the Shox2 gene desert as a whole, we used CRISPR/Cas9 in mouse zygotes to delete the gene desert region (582kb) located within the Shox2-TAD and encompassing PLE2-5 as well as GDE1-15 elements (Figs. 2A, S4A and Tables S5, S6). Heterozygous F1 mice with clean deletion breakpoints (S2GDΔ/+) (Fig. S4A) were born at expected Mendelian ratios and showed no impaired viability and fertility. However, following intercross of F1 heterozygotes, no mice homozygous for the gene desert deletion were born, and S2GDΔ/Δ embryos displayed lethality between E11.5 and E13.5 (Fig. S4C), reminiscent of the lethality observed in Shox2-deficient embryos due to cardiac pacemaker defects22,23. Assessment of Shox2 expression in fore- and hindlimbs of S2GDΔ/Δ embryos at mid-gestation revealed surprising resilience of the spatial Shox2 transcript domain (Fig. 3A), despite the loss of multiple PLEs (Fig. 2C, S3). Instead, Shox2 transcript levels in the limb were reduced by approximately half in absence of the gene desert, indicating significant quantitative contributions of the PLE elements (Fig. 3B and Table S7). To circumvent embryonic lethality and to study the cumulative phenotypic requirement of the gene desert enhancers for limb skeletal morphology, we used a Prx1-Cre conditional approach37 allowing allelic reduction of Shox2 specifically in the limb (Fig. 3C). Remarkably, loss of the gene desert in a sensitized genetic background (defined by reduced Shox2 gene dosage due to Prx1-Cre-mediated Shox2 inactivation on one allele) revealed severe shortening of the stylopod in both limb types, most pronounced in the hindlimb (Fig. 3C). 
Together, these results indicate that telomeric (upstream) limb enhancers (including hs741) act largely autonomously in controlling spatial Shox2 expression, while the centromeric (downstream) gene desert limb enhancers have a role in conferring transcriptional and phenotypic robustness in a predominantly quantitative manner.
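The statement that genotypes from a heterozygous intercross deviate from (or conform to) Mendelian expectations is typically backed by a goodness-of-fit test against a 1:2:1 ratio. The source does not specify which test, if any, was applied, so the following Python sketch shows only one conventional option, a Pearson chi-square test with two degrees of freedom. The litter counts are invented for illustration and are not data from this study.

```python
import math
from typing import Sequence


def chi_square_gof(observed: Sequence[int], ratios: Sequence[float]) -> float:
    """Pearson chi-square statistic for observed counts vs. expected ratios."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    stat = 0.0
    for obs, ratio in zip(observed, ratios):
        expected = total * ratio / ratio_sum
        stat += (obs - expected) ** 2 / expected
    return stat


def p_value_df2(stat: float) -> float:
    """Exact upper-tail p-value for a chi-square statistic with 2 d.f."""
    return math.exp(-stat / 2.0)


# Hypothetical counts at birth (NOT data from this study): +/+, del/+, del/del.
observed_at_birth = [12, 24, 0]
stat = chi_square_gof(observed_at_birth, [1, 2, 1])
print(f"chi2 = {stat:.2f}, p = {p_value_df2(stat):.4f}")
```

With three genotype classes the statistic has two degrees of freedom, for which the upper-tail probability reduces to exp(-x/2), so no statistics library is needed. For small litters an exact multinomial test would be the more cautious choice.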
Shox2 also plays important roles in normal craniofacial development, being required for palatogenesis as well as for formation of the temporomandibular joint (TMJ), which is required for jaw functionality in mammals18,19. These roles depend on embryonic Shox2 expression in distinct craniofacial domains, such as the anterior part of the palatal shelves and the maxillary-mandibular junction, respectively18,19. Notably, at E11.5, S2GDΔ/Δ embryos revealed Shox2 downregulation in precisely the anterior portion of the palatal shelves as well as the proximal maxillary (MXP) and mandibular (MDP) processes (Figs. 4A, 4B). Furthermore, Shox2 expression in the medial nasal process (MNP) was severely downregulated at E10.5 and E11.5 (Figs. 4A, 4B). The reduction of Shox2 expression in the MXP-MDP domain and MNP of S2GDΔ/Δ embryos suggests an essential functional contribution of the two craniofacial enhancers (GDE9 and GDE15) identified in our transgenic screen based on epigenomic predictions and located in the deleted gene desert region (Figs. 1B, 2A). Importantly, GDE9 and GDE15 show activity patterns that closely overlap Shox2 in the maxillary-mandibular (MXP-MDP) and MNP compartments, respectively (Fig. 4C). In addition, transgenic validation of other predicted GDEs identified multiple brain and cranial nerve activities (Figs. 1B, C), but with the exception of the nodose ganglion no obvious alterations in spatial Shox2 expression in these tissues were observed in S2GDΔ/Δ embryos (Fig. 4A). Here, the presence of multiple brain enhancers with overlapping activities in the diencephalon, midbrain and hindbrain, both inside and outside the gene desert (Fig. S5, Vista Enhancer Browser), suggests that removal of brain-specific enhancers might be buffered by redundant enhancer interactions38.
Strikingly, in situ hybridization (ISH) analysis in S2GDΔ/Δ embryos at E10.5 revealed absence of Shox2 transcripts in the sinus venosus (SV) myocardium comprising the SAN region (Fig. 4A). Quantitative expression profiling in S2GDΔ/Δ embryonic hearts at E11.5 furthermore revealed severe downregulation of cardiac Shox2 transcripts (Fig. 4B). Together, these findings indicate that the embryonic lethality observed in S2GDΔ/Δ embryos is a result of depleted Shox2 in the SV myocardium encompassing SAN pacemaker cells22,39, potentially due to the deletion of a cardiac SV enhancer located in the gene desert. However, rather surprisingly, our previous transgenic validation of epigenomic predictions did not reveal regulatory modules driving reproducible reporter activity in cardiac tissues (Fig. 1B, C).
At E11.5, Shox2 protein is specifically localized in the sinus venosus (SV) myocardium which includes the venous valves and the SAN pacemaker cell population40 (Fig 5A, B). In accordance with the absence of Shox2 transcripts in the SV at E10.5 (Fig. 4A), we found that in E11.5 S2GDΔ/Δ embryos Shox2 is largely depleted in cells of the SV comprising the SAN pacemaker myocardium marked by Hcn440, while it is retained to some degree in the mandible (Fig. 5A, B). As Shox2 gene inactivation leads to embryonic lethality due to a SAN pacemaker defect22,23, our results suggest that in S2GDΔ/Δ embryos SAN-specific loss of Shox2 is responsible for the observed embryonic lethality phenotype (Fig. S4C), indicating the presence of one (or multiple) critical SV enhancers in the deleted gene desert region. In search of a cardiac enhancer located in the gene desert we then conducted ATAC-seq41 from embryonic hearts at E11.5 to define genome-wide open chromatin signatures including potential cis-regulatory modules with cardiac and consequently SV enhancer activity at E11.5 (Fig. 5C, D). ATAC-seq peak calling analysis uncovered 10 elements within the deleted gene desert region which were significantly enriched for open chromatin (Fig. 5C and Table S8). Four of these elements co-localized with regions enriched for H3K27ac in the heart at E11.5 and were identified as part of our initial epigenomic analysis (GDE7, GDE10, GDE11 and GDE12) (Fig. 1B). As none of these elements drove reproducible LacZ reporter activity in cardiac regions at E11.5 (Fig. 1C), we also validated the remaining six gene desert elements with significant open chromatin signatures (+224kb, +283kb, +326kb, +389kb, +405kb, +520kb) using transgenic reporter assays at E11.5 (Fig. 5C). Strikingly, the element located 326kb downstream of the Shox2 TSS was the only one to drive reproducible LacZ reporter expression in the heart and indeed its activity co-localized with Shox2 in the SV myocardium (Fig. 5B, C, E, F). 
To refine the genomic sequence driving SV enhancer activity we then also validated a second element (termed +325kb) partially overlapping the +326kb enhancer in a block of conserved sequence marked by low ATAC-seq signal (Fig. 5E). Remarkably, the +325kb region showed identical reporter activity overlapping Shox2 expression in the SV at E11.5, indicating that SV enhancer activity is restricted to the 1.5kb region of overlap (Fig. 5E). Interestingly also, the conserved sequence in the region of overlap harbors a binding motif of the Tbx5 transcription factor (p<0.001, JASPAR CORE vertebrates collection, based on PWMScan43) (Fig. S6), a presumptive upstream regulator of Shox2 in SAN pacemaker cells39. Together, these results identify a gene desert enhancer with specific activity in the SV, whose absence in S2GDΔ/Δ embryos potentially accounts for the embryonic lethal loss of Shox2 expression in cardiac SAN pacemaker cells.
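The localization argument above is an interval intersection: if two partially overlapping fragments each drive the same reporter pattern, the shared activity is inferred to reside in the sequence they have in common. A minimal Python sketch of that step follows. The coordinates are hypothetical and were chosen only so that the overlap equals 1.5 kb; the real element boundaries are those reported in the figures and tables.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), bp relative to the TSS


def shared_region(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two intervals, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


# Hypothetical boundaries for the two tested elements (illustration only).
element_326kb = (325_000, 328_000)
element_325kb = (323_500, 326_500)

core = shared_region(element_326kb, element_325kb)
if core is not None:
    print(core, core[1] - core[0])  # candidate core interval and its length
```

The same helper returns None for fragments that merely abut, which is the case in which no shared core can be inferred from sufficiency alone.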
DISCUSSION
The majority of gene deserts located in the vicinity of developmental regulators are considered evolutionarily ancient and stable, and typically harbor a large number of conserved elements with predicted cis-regulatory signatures2. Assessment of extensive cis-regulatory regions flanking a number of developmental genes, such as the cluster of HoxD genes or Sox9, has demonstrated the biological relevance of gene deserts and non-coding chromatin domains in regulation of developmental gene expression6,13,44. Nevertheless, the precise functional contributions of gene deserts near a majority of critical developmental regulators remain unexplored. Here, we characterize the cis-regulatory output and functions of a gene desert downstream of the Shox2 transcriptional regulator. Our results reveal the cis-regulatory complexity underlying transcriptional orchestration of a key developmental gene with important implications for functional interpretation of enhancer-gene interactions and of the evolution of gene deserts into pleiotropic expression control units.
A reservoir of transcriptional enhancers essential for pleiotropic Shox2 expression
Enhancers with tissue- and stage-specific biological functions typically exhibit restricted temporal activity windows31. To pinpoint the robust cis-regulatory activities embedded in the gene desert and involved in the regulation of Shox2, we chose an unbiased approach based on the presence of the active enhancer mark H3K27ac across a range of embryonic stages28. While it remains challenging to predict precise temporal and spatial enhancer activities from bulk tissues in vivo, the stringent and unbiased nature of our analysis identified 12 novel gene desert enhancers (from 16 predictions) with specific subregional activities in Shox2-expressing tissues, such as limb, craniofacial compartments, cranial nerve and brain cell populations. In addition, our 4C-seq chromatin conformation capture from limb in combination with subsequent transgenic analysis starts to delineate the likely critical cluster of limb enhancers orchestrating Shox2-mediated stylopod formation. This cluster is reminiscent of a multipartite enhancer ensemble, such as the one regulating the Indian Hedgehog (Ihh) gene, or the HoxD cluster genes, in multiple tissues and due to additively acting enhancers with partially overlapping activities45,46. While many developmental enhancers with overlapping counterparts are known to exert specific tasks, they also exhibit partially redundant functions serving as a regulatory buffer to ensure phenotypic robustness24,47,48. We observe similar transcriptional resilience of spatial Shox2 expression following CRISPR-mediated removal of the gene desert, in particular in the limb and brain. The functional significance of the gene desert for limb development is corroborated by quantitative reduction of Shox2 in absence of this regulatory landscape, leading to severely affected stylopod development in a genetically sensitized background. 
The cumulative removal of enhancers via deletion of the gene desert further allowed functional assessment of fundamental cis-regulatory activities in other tissues. Most notably, in the absence of the gene desert, we observed a depletion of Shox2 transcripts in the sinus venosus (or inflow tract), which comprises the SAN pacemaker population; this depletion is the most likely cause of the observed embryonic lethality phenotype22,49. Furthermore, our results demonstrate that craniofacial Shox2 expression and in particular Shox2 transcripts in the mandibular and nasal processes critically depend on the presence of the gene desert. A recent study uncovered that human (and mouse) extreme long-range enhancers located in a large gene desert upstream of Sox9 act across nearly 1.5 Mb to regulate Sox9 expression in craniofacial regions, such as the nasal, maxillary and mandibular processes50. Similarly, our study identifies gene desert enhancers with activities in nasal and maxillary-mandibular regions, the latter likely critical for the formation of the temporomandibular joint18.
Cis-regulatory control of cardiac Shox2 essential for embryonic viability
Alongside other TFs, such as Isl1 or Tbx3, Shox2 in mice is a key regulator of cardiac pacemaker cells of the sinoatrial node (SAN), the primary pacemaker of the heart39. While the genetic hierarchies and transcriptional cell states orchestrating cardiac pacemaker development have been characterized, the genomic cis-regulatory modules underlying this process have remained largely unexplored. Here we demonstrate that the gene desert is essential for embryonic viability at mid-gestation, as it maintains Shox2 transcription in the cardiac sinus venosus (SV) encompassing the SAN pacemaker myocardium. In a very recent, independently published study, van Eif et al. report complementary observations at the same locus49. In that study, the authors performed ATAC-seq on SAN-like pacemaker cells differentiated from human pluripotent embryonic stem cells (hESC) and on Hcn4+ SAN cells of newborn mice to delineate the cis-regulatory modules controlling the expression of TFs promoting cardiac pacemaker cell fate, such as TBX3, ISL1 and SHOX2. While the authors initially focused on human cis-regulatory landscapes near these genes, they used CRISPR/Cas9 deletions to investigate the function of homologous SAN-specific accessible chromatin regions in the Shox2 and Tbx3 loci in mouse embryos49. In particular, within the 582kb gene desert domain deleted here, their study narrows the critical space down to a ∼250kb region. Consistent with our observations in embryonic hearts of S2GDΔ/Δ embryos (Fig. 4A, 5A, B), van Eif et al. confirm the embryonic lethality phenotype in their embryos lacking the 250kb region and show that the lethality is likely a result of a hypoplastic SAN (and venous valves) due to loss of Shox2 protein in the SV49. In addition, through our targeted exploration we now define a 1.5kb element located within this 250kb window and driving transcriptional activity specifically in the Shox2 domain of the SV (Fig. 6C, E), potentially acting as a critical enhancer controlling Shox2 in SAN pacemaker cells. Further enhancer deletion analyses will uncover whether Shox2 transcription in the SAN is controlled by a single cis-regulatory unit or is shielded by multiple enhancers, as may be the case in human embryos49.
A blueprint for disease-relevant enhancer repertoires controlling human SHOX
Together, our findings significantly expand on former analyses that identified a panel of mouse (and human) Shox2 enhancers with activities mostly restricted to limb and hindbrain (Vista Enhancer Browser)25,27. Interestingly, such tissue-specific activities were also found to be conserved in distinct elements of the similar-sized gene desert flanking the human SHOX gene25,51. Disruption of enhancers within the gene desert downstream of SHOX represents the likely mechanistic cause of Léri-Weill dyschondrosteosis (LWD) and idiopathic short stature (ISS) syndromes in a significant fraction of cases52 and SHOX haploinsufficiency is directly associated with the skeletal abnormalities observed in Turner syndrome and LWD53,54. One study has also found a link between neurodevelopmental disorders and microduplications at the SHOX locus, suggesting that such perturbations may alter neural development or function55. In humans, SHOX2 represents the closely related paralog of SHOX and is encoded in all vertebrate genomes. However, while many functional aspects of human SHOX2 remain unknown, a link between heterozygous SHOX2 mutations and SAN dysfunction as well as familial/early onset atrial fibrillation has recently been demonstrated56,57. Rodents have lost their SHOX gene in the course of evolution and therefore entirely rely on the function of Shox2, which features an identical DNA-interacting homeodomain and is replaceable by human SHOX in a mouse knock-in line58. Thus, in light of the overlapping expression patterns and critical functions of mouse Shox2 and human SHOX, as well as the presence of a gene desert downstream of both genes, our results provide a blueprint for the investigation of the regulatory control of pleiotropic SHOX expression, especially in those tissues where both genes are expressed during development: the hindbrain, thalamus, pharyngeal arches and limbs59,60. 
It will be particularly interesting to determine whether “orthologous” cardiac, craniofacial, neural and/or limb enhancers exist, and whether human SHOX enhancers share motif content or other enhancer grammar characteristics61 with mouse Shox2 enhancers. Indeed, human and mouse orthologs of a highly conserved enhancer located 160kb/47kb downstream of human SHOX and mouse Shox2, respectively, were found to drive overlapping activities in the hindbrain25. Such enhancers presumably originate from a single ancestral SHOX locus, preceding the duplication of SHOX and SHOX2 paralogs, and are therefore considered evolutionarily ancient. Within this context, future comparative studies should search for deeply conserved orthologs of SHOX and SHOX2 enhancers in basal chordates such as amphioxus, which express their single Shox gene in the developing hindbrain62. The recent identification of orthologous Islet gene enhancers in sponges and vertebrates63 demonstrates the promise of such an approach.
MATERIALS AND METHODS
Experimental Design
All animal work at Lawrence Berkeley National Laboratory (LBNL) was reviewed and approved by the LBNL Animal Welfare Committee. Knockout and transgenic mice were housed at the Animal Care Facility (the ACF) at LBNL. Mice were monitored daily for food and water intake, and animals were inspected weekly by the Chair of the Animal Welfare and Research Committee and the head of the animal facility in consultation with the veterinary staff. The LBNL ACF is accredited by the American Association for the Accreditation of Laboratory Animal Care International (AAALAC). Transgenic mouse assays and enhancer knock-outs at LBNL were performed in Mus musculus FVB strain mice. Animal work at the University of Calgary involving the production, housing and analysis of transgenic mouse lines shown in Figs. 2 and S3, as well as breeding and skeletal analysis of S2GD mice, was approved by the Life and Environmental Sciences Animal Care Committee (LESACC). All experiments with mice were performed in accordance with Canadian Council on Animal Care guidelines as approved by the University of Calgary LESACC, Protocol # AC13-0053. The following developmental stages were used in this study: embryonic day E10.5, E11.5, E12.5, E13.5 and newborn mice (the latter only for skeletal preparations). Animals of both sexes were used in these analyses. Sample size selection and randomization strategies were conducted as follows:
Transgenic mouse assays
Sample sizes were selected empirically based on our previous experience of performing transgenic mouse assays for >3,000 total putative enhancers (VISTA Enhancer Browser: https://enhancer.lbl.gov/). Mouse embryos were excluded from further analysis if they did not encode the reporter transgene or if the developmental stage was not correct. All transgenic mice were treated with identical experimental conditions. Randomization and experimenter blinding were unnecessary and not performed.
Knockout mice
Sample sizes were selected empirically based on our previous studies24,38. All phenotypic characterization of knockout mice employed a matched littermate selection strategy. Analyzed S2GD knockout embryos and mice described in this paper resulted from crossing heterozygous gene desert deletion (S2GDΔ/+) mice together to allow for the comparison of matched littermates of different genotypes. Embryonic samples used for in situ hybridizations and quantitative gene expression profiling were dissected and processed blind to genotype.
Hi-C data re-analysis
Raw reads from Hi-C on mouse embryonic stem cells (mESCs) from Bonev et al., 2017, available on GEO (GSE96107), were reprocessed using HiCUP v.0.6.1. Valid pairs used to generate the Hi-C map in Fig. 1A are available on GEO (GSE161259) and the code used to generate the representation of the extended Shox2 TAD is available on https://github.com/lldelisle/Hi-C_reanalysis_Bonev_2017. The matrix heatmaps were plotted using pygenometracks64.
In vivo transgenic LacZ reporter analysis
For all elements tested, except PLEs, transgenic mouse LacZ reporter assays were conducted as previously described31,65 and the related primer sequences and genomic coordinates are listed in Tables S2 and S8. Predicted enhancer elements were PCR-amplified from mouse genomic DNA (Clontech) and cloned into an Hsp68-LacZ expression vector31. PLE elements were amplified via PCR from bacterial artificial chromosomes containing the appropriate mouse genomic DNA (Table S4) then cloned into the βlacz plasmid, which contains a minimal human β-globin promoter-LacZ cassette, as described25. Due to their large size, PLE3 (10,351 bp) and PLE5 (9,473 bp) were amplified with the proofreading polymerase in the SequalPrep™ Long PCR Kit (Invitrogen). Permanent transgenic lines (Fig. S3) were produced at the University of Calgary Centre for Mouse Genomics by pronuclear injection of DNA constructs into CD-1 single-cell stage embryos as described66. Male founder animals (or male F1 progeny produced from transgenic females) were crossed to CD-1 females to produce transgenic embryos which were stained with X-gal by standard techniques65.
4C-seq
For each of two biological replicates, proximal forelimbs were dissected in PBS from 10-12 E12.5 CD-1 embryos using the cutting pattern shown in the inset of Fig. 2B. Tissue was prepared for 4C-seq as described67. Cells were dissociated by incubating the pooled tissue in 250 µl PBS supplemented with 10% fetal calf serum (FCS) and 1 mg/ml collagenase (Sigma) for 45 minutes at 37°C with shaking at 750 rpm. The solution was passed through a cell strainer (Falcon) to obtain single cells, which were fixed in 9.8 ml of 2% formaldehyde in PBS/10% FCS for 10 minutes at room temperature. The cells were then lysed and 4C-seq was performed68. Libraries were prepared by overnight digestion with NlaIII (New England Biolabs (NEB)) and ligation for 4.5 hours with 100 units T4 DNA ligase (Promega, #M1794) under diluted conditions (7 ml), followed by de-crosslinking overnight at 65°C after addition of 15 µl of 20 mg/ml proteinase K. After phenol/chloroform extraction and ethanol precipitation, the samples were digested overnight with the secondary enzyme DpnII (NEB), purified again by phenol/chloroform extraction and ethanol precipitation, and ligated for 4.5 hours in a 14 ml volume. The final ligation products were extracted and precipitated as above, followed by purification using Qiagen nucleotide removal columns. For each viewpoint, libraries were prepared with 100 ng of template in each of 16 separate PCR reactions using the Roche Expand Long Template kit with primers incorporating Illumina adapters. Viewpoint and primer details are presented in Table S3. PCR reactions for each viewpoint were pooled, purified with the Qiagen PCR purification kit and sequenced with the Illumina HiSeq to generate single 100bp reads. Demultiplexed reads were mapped and analyzed with the 4C-seq module of the HTSstation pipeline as described69. Results are shown in UCSC browser format as normalized reads per fragment after smoothing with an 11-fragment window and mapped to mm10 (Figs. 2B, S2C).
Raw and processed (bedgraph) sequence files are available under GEO accession number GSE161194.
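The smoothing step mentioned above can be illustrated with a minimal sketch. It assumes that "smoothing with an 11-fragment window" corresponds to a centered running mean over per-fragment normalized read counts; the HTSstation implementation may differ, and the function name and the truncation of windows at the edges are illustrative assumptions rather than the pipeline's documented behavior.

```python
def smooth_fragments(scores, window=11):
    """Centered running mean over `window` consecutive restriction fragments.

    `scores` holds one normalized read count per fragment, ordered by
    genomic position. Windows are truncated at both ends of the list
    (an assumption; edge handling is not described in the text).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)                 # first fragment in the window
        hi = min(len(scores), i + half + 1)   # one past the last fragment
        chunk = scores[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A single high-scoring fragment is spread over its neighbors, which is why smoothed 4C-seq profiles emphasize clusters of contacts rather than isolated fragments.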
Generation of gene desert knock-out mice using CRISPR/Cas9
Mouse strains encoding the 582kb gene desert deletion centromeric to the Shox2 gene body were engineered using in vivo CRISPR/Cas9 editing, as previously described with minor modifications24. Pairs of single guide RNAs (sgRNAs) targeting genomic sequence 5’ and 3’ of the gene desert were designed using CHOPCHOP70 (see Table S5 for sgRNA sequences and coordinates). To generate the deletion, a mix containing Cas9 mRNA (final concentration of 100 ng/µl) and two sgRNAs (25 ng/µl each) in injection buffer (10 mM Tris, pH 7.5; 0.1 mM EDTA) was injected into the cytoplasm of single-cell FVB strain mouse embryos. Founder (F0) mice were genotyped via PCR utilizing High Fidelity Platinum Taq Polymerase (Thermo Fisher) to identify the desired deletion breakpoints generated via NHEJ (see Fig. S4A and Table S6 for genotyping strategy, primer sequences and PCR amplicons). Sanger sequencing was used to identify and confirm deletion breakpoints in F0 and F1 mice (Fig. S4A).
In situ hybridization
For assessment of spatial gene expression changes in mouse embryos, whole mount in situ hybridization using digoxigenin-labeled antisense riboprobes was performed as previously described71. At least three independent embryos were analyzed for each genotype. Embryonic tissues were imaged using a Leica MZ16 microscope coupled to a Leica DFC420 digital camera.
Quantitative real-time PCR (qPCR)
Isolation of RNA from microdissected embryonic tissues at E11.5 was performed using the Ambion RNAqueous Total RNA Isolation Kit (Life Technologies) according to the manufacturer’s protocol. RNA was then subjected to RNase-free DNase (Promega) treatment and reverse transcribed using SuperScript III (Life Technologies) with poly-dT priming according to manufacturer instructions. qPCR was conducted on a LightCycler 480 (Roche) using KAPA SYBR FAST qPCR Master Mix (Kapa Biosystems) according to manufacturer instructions. qPCR primers (Shox2, Rsrc1, Actb) were described previously24. Relative gene expression levels were calculated via the 2^(-ΔΔCT) method, normalized to the Actb housekeeping gene, and the mean of wild-type control samples was set to 1.
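As an illustration of the relative quantification described above, the following sketch converts each sample's CT difference (target minus Actb) to a linear value and rescales so that the arithmetic mean of the wild-type samples equals 1. The function name, the record layout and the CT values in the example are hypothetical. Using the arithmetic mean of linear wild-type values as the calibrator is an assumption, since the text does not state whether the calibrator was computed on the ΔCT or the linear scale.

```python
from statistics import mean

def relative_levels(samples, control_genotype="WT"):
    """Return (genotype, relative expression) pairs, wild-type mean set to 1.

    Each sample is a dict with keys "genotype", "ct_target" (e.g. Shox2)
    and "ct_reference" (e.g. Actb). Lower CT means more transcript.
    """
    # 2^(-dCT): expression of the target relative to the housekeeping gene
    linear = [
        (s["genotype"], 2.0 ** -(s["ct_target"] - s["ct_reference"]))
        for s in samples
    ]
    # Calibrator: arithmetic mean of the control samples (assumption)
    control_mean = mean(v for g, v in linear if g == control_genotype)
    return [(g, v / control_mean) for g, v in linear]

# Hypothetical CT values, for illustration only
example = [
    {"genotype": "WT", "ct_target": 24.0, "ct_reference": 18.0},
    {"genotype": "WT", "ct_target": 24.0, "ct_reference": 18.0},
    {"genotype": "KO", "ct_target": 27.0, "ct_reference": 18.0},
]
result = relative_levels(example)
```

In this made-up example the knockout sample has a ΔCT three cycles higher than wild type, which corresponds to one eighth of the wild-type level.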
Skeletal preparations
Euthanized newborn mice were eviscerated, skinned and fixed in 1 % acetic acid in EtOH for 24 hours. Cartilage was stained overnight with 1 mg/mL Alcian blue 8GX (Sigma) in 20% acetic acid in EtOH. After washing in EtOH for 12 hours and treatment with 1.5 % KOH for three hours, bones were stained in 0.15 mg/mL Alizarin Red S (Sigma) in 0.5 % KOH for four hours, followed by de-staining in 20 % glycerol, 0.5 % KOH.
ENCODE H3K27ac ChIP-seq and mRNA-seq analysis
To establish a heatmap revealing putative enhancers and their temporal activities within the Shox2 TAD interval, a previously generated catalog of strong enhancers identified using ChromHMM72 across mouse development was used28. Briefly, calls across 66 different tissue-stage combinations were merged and H3K27ac signals quantified as log2-transformed RPKM. Estimates of statistical significance for these signals were associated with each region for each tissue-stage combination using the corresponding H3K27ac ChIP-seq peak calls. These were downloaded from the ENCODE Data Coordination Center (DCC) (http://www.encodeproject.org/, see Table S1, sheet 3 for the complete list of sample identifiers). For this purpose, short reads were aligned to the mm10 assembly of the mouse genome using bowtie (ref), with the following parameters: -a -m 1 -n 2 -l 32 -e 3001. Peak calling was performed using MACS (v1.4)73, with the following arguments: --gsize=mm --bw=300 --nomodel --shiftsize=100. Experiment-matched input DNA was used as control. Evidence from two biological replicates was combined using IDR (https://www.encodeproject.org/data-standards/terms/). The q-value provided in the replicated peak calls was used to annotate each putative enhancer region defined above. In case of regions overlapping more than one peak, the lowest q-value was used. RNA-seq raw data was downloaded from the ENCODE DCC (http://www.encodeproject.org/, see Table S1, sheet 3 for the complete list of sample identifiers).
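The two annotation steps described above (log2-transformed RPKM signal and assignment of the lowest overlapping q-value) can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names, the half-open interval convention and the pseudocount added before the log transform are assumptions, and the coordinates used in any call are hypothetical.

```python
import math

def log2_rpkm(read_count, region_length_bp, total_mapped_reads, pseudocount=1.0):
    """log2 of reads per kilobase of region per million mapped reads.

    The pseudocount avoids log2(0) for regions without reads; its value
    is an assumption, as the text does not specify one.
    """
    kilobases = region_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    rpkm = read_count / (kilobases * millions)
    return math.log2(rpkm + pseudocount)

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def lowest_qvalue(region, peaks):
    """Lowest q-value among peaks overlapping `region`, or None if no overlap.

    `region` is (chrom, start, end); each peak is (chrom, start, end, q_value).
    """
    chrom, start, end = region
    qvals = [
        q for (p_chrom, p_start, p_end, q) in peaks
        if p_chrom == chrom and overlaps(start, end, p_start, p_end)
    ]
    return min(qvals) if qvals else None
```

Returning None for regions without any overlapping peak keeps "no evidence" distinct from a weak but called peak, which matters when the q-values are later used to mask the heatmap.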
Immunofluorescence (IF)
IF was performed as previously described24. Briefly, mouse embryos at E11.5 were isolated in cold PBS and fixed in 4% PFA for 2–3h. After incubation in a sucrose gradient and embedding in a 1:1 mixture of 30% sucrose and OCT compound, sagittal 10µm frozen tissue sections were obtained using a cryostat. Selected cryo-sections were then incubated overnight with the following primary antibodies: anti-Shox2 (1:300, Santa Cruz JK-6E, sc-81955), anti-SMA-Cy3 (1:250, Sigma, C6198), anti-Hcn4 (1:500, Thermo Fisher, MA3-903) and anti-Nkx2.5 (1:500, Thermo Fisher, PA5-81452). Goat anti-mouse, goat anti-rabbit and donkey anti-rat secondary antibodies conjugated to Alexa Fluor 488, 568, or 647 (1:1,000, Thermo Fisher Scientific) were used for detection. Hoechst 33258 (Sigma-Aldrich) was utilized to counterstain nuclei. A Zeiss AxioImager fluorescence microscope in combination with a Hamamatsu Orca-03 camera was used to acquire fluorescent images.
ATAC-seq and data processing
ATAC-seq was performed as described74 with minor modifications. Per replicate, pairs of wild-type mouse embryonic hearts at E11.5 were micro-dissected in cold PBS and cell nuclei were dissociated in Lysis buffer using a douncer. Approximately 50,000 nuclei were then pelleted at 500 RCF for 10 min at 4°C and resuspended in 50 µL Transposition reaction mix containing 25 µL Nextera 2x TD buffer and 2.5 µL TDE1 (Nextera Tn5 Transposase; Illumina) (cat. no. FC-121-1030) followed by incubation for 30 minutes at 37°C with shaking. The reaction was purified using the Qiagen MinElute PCR purification kit and amplified using defined PCR primers41. ATAC-seq libraries were purified using the Qiagen MinElute PCR purification kit (ID: 28004), quantified by the Qubit Fluorometer with the dsDNA HS Assay Kit (Life Technologies) and quality assessed using the Agilent Bioanalyzer high sensitivity DNA analysis assay. Libraries were pooled and sequenced using single end 50 bp reads on a HiSeq 4000 (Illumina).
ATAC-seq data analysis from wild-type heart replicate samples at E11.5 followed ENCODE2 specifications (May 2019, https://www.encodeproject.org/atac-seq): CASAVA v1.8.0 (Illumina) was utilized to demultiplex data, and reads with CASAVA ‘Y’ flag (purity filtering) were discarded. Adaptor trimming (cutadapt_v1.1) (https://cutadapt.readthedocs.io/) was used with parameter ‘-e 0.1 -m 5’. For read mapping and peak calling, bowtie2 was used75 (version 2.2.6) with parameters ‘-X2000 --mm --local’. bowtie2 aligned 66% of the reads uniquely, and 35% to more than one location. Reads were aligned to both GRCm38/mm10 and NCBI37/mm9 reference genomes with GENCODE annotations, allowing for multi-mapped reads. Unmapped failed reads, duplicates, and low-quality reads (MAPQ = 255) were removed using SAMtools76 (v1.7) and Picard (https://broadinstitute.github.io/picard) (v1.126). For each sample, 20-25 million reads were retrieved after all quality checks. Peak calling was then performed using MACS2 (v2.1.0)73,77 with p-value<0.01, and a smoothing window of 150bp. Finally, peaks were filtered in two steps and resulted in 100-200k peaks per sample: (a) excluding the 164 blacklisted coordinates from ENCODE78 mm10 (ENCFF547MET), and (b) overlap across replicates and pseudo replicates. To visualize signal obtained for each of the replicates a UCSC track hub was generated for the mm9 and mm10 genomes in the Genome Browser (GSE160127).
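The two-step peak filtering described above can be sketched as a simple interval comparison. The sketch below is illustrative only: it treats any shared base as an overlap, whereas the ENCODE pipeline applies its own overlap criteria, and it compares against a single second peak set rather than the full set of replicates and pseudo-replicates. The function names and all coordinates in any call are hypothetical.

```python
def intervals_overlap(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def filter_peaks(peaks, blacklist, other_replicate):
    """Keep peaks that (a) avoid blacklisted regions and (b) are reproduced.

    peaks, blacklist, other_replicate: lists of (chrom, start, end) tuples.
    Any-base overlap is used for both steps (a simplifying assumption).
    """
    kept = []
    for peak in peaks:
        # Step (a): discard peaks touching a blacklisted region
        if any(intervals_overlap(peak, region) for region in blacklist):
            continue
        # Step (b): require support from the other peak set
        if not any(intervals_overlap(peak, other) for other in other_replicate):
            continue
        kept.append(peak)
    return kept
```

For the peak counts reported here (100-200k per sample) a quadratic scan like this would be slow; sorted intervals or an interval tree would be the usual choice in practice, but the filtering logic is the same.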
Data availability
Limb 4C-seq and heart ATAC-seq datasets are available in the NCBI GEO database with the accession codes GSE161194 and GSE160127, respectively. All relevant transgenic in vivo enhancer data is available at the Vista Enhancer Browser (https://enhancer.lbl.gov) (see Table S8 for Vista Enhancer IDs). Correspondence and requests for materials should be addressed to J.C. (jacobb@ucalgary.ca) or M.O. (marco.osterwalder@dbmr.unibe.ch).
Competing interests
The authors declare no competing financial interests.
Author contributions
S.A.-O., B.J.M., J.C. and M.O. conceived the study. S.A.-O., B.J.M., F.D., R.H., J.A.A., J.C. and M.O. performed experimental work including transgenic analyses and genome editing experiments. S.A.-O., E.R.-C., A.L., G.A. and J.C. conducted 4C-seq experiments and analysis. V.T. performed the in situ hybridization analysis under the supervision of J.L.-R. E.R.-C., G.K. and I.B. performed bioinformatic analyses. T.A.F. and C.S.S. performed skeletal phenotyping. C.S.N., I.P.-F. and S.T. conducted pro-nuclear injections. D.E.D., A.V. and L.A.P. provided project funding and support. J.C. and M.O. provided project funding and wrote the manuscript with input from the remaining authors.
Acknowledgements
This work was supported by Swiss National Science Foundation (SNSF) grant PCEFP3_186993 (to M.O.), a Discovery Grant (RGPIN/355731-2013) from the Natural Sciences and Engineering Research Council of Canada (to J.C.) and National Institutes of Health grants R01HG003988, U54HG006997, R24HL123879 and UM1HL098166 (to A.V. and L.A.P.). J.L.-R. is supported by the MICINN grants BFU2017-82974-P and MDM-2016-0687 (Unidad de Excelencia María de Maeztu institutional grant). G.A. is supported by Swiss National Science Foundation Grant PP00P3_176802. F.D. is supported by a SNSF postdoc.mobility fellowship (P400PB_194334). We thank L. Lopez-Delisle for sharing re-analyzed Hi-C data and D. Duboule for hosting and supporting 4C-seq experiments as well as training in his laboratory. We are grateful to C. Fielding at the Clara Christie Centre for Mouse Genomics for pronuclear injections conducted at the University of Calgary. We thank the members of the L.A.P., A.V., and D.E.D. groups for technical advice and useful comments on the manuscript. Research at the E.O. Lawrence Berkeley National Laboratory was performed under Department of Energy Contract DE-AC02-05CH11231, University of California.