Lifelong single-cell profiling of cranial neural crest diversification

The cranial neural crest generates a huge diversity of derivatives, including the bulk of connective and skeletal tissues of the vertebrate head. How neural crest cells acquire such extraordinary lineage potential remains unresolved. By integrating single-cell transcriptome and chromatin accessibility profiles of cranial neural crest-derived cells across the zebrafish lifetime, we observe region-specific establishment of enhancer accessibility for distinct fates. Neural crest-derived cells rapidly diversify into specialized progenitors, including multipotent skeletal progenitors, stromal cells with a regenerative signature, fibroblasts with a unique metabolic signature linked to skeletal integrity, and gill-specific progenitors generating cell types for respiration. By retrogradely mapping the emergence of lineage-specific chromatin accessibility, we identify a wealth of candidate lineage-priming factors, including a Gata3 regulatory circuit for respiratory cell fates. Rather than multilineage potential being an intrinsic property of cranial neural crest, our findings support progressive and region-specific chromatin remodeling underlying acquisition of diverse neural crest lineage potential. Highlights Single-cell transcriptome and chromatin atlas of cranial neural crest Progressive emergence of region-specific cell fate competency Chromatin accessibility mapping identifies candidate lineage regulators Gata3 function linked to gill-specific respiratory program Graphical Abstract

specialized type of gill cartilage distinct from that in the rest of the head, as well as pillar and 111 tunica media cells and putative gill progenitors. We also recovered smooth muscle, perivascular, 112 and stromal cells (see Table S1 for cluster marker genes and Fig. S9-10 for in situ validation). 113

114
In addition to skeletal and gill populations, we recovered a distinct type of fibroblast enriched for 115 the cell adhesion molecule chl1a and wnt5a. Strikingly, these fibroblasts are also enriched for 116 genes encoding enzymes for all steps of phenylalanine and tyrosine breakdown (Fig. 1f, Fig. S11). 117 In situ hybridization for two of these genes (hpdb and pah) reveals that these fibroblasts are in 118 the dermis between the skin epidermis and runx2b+/sp7+ osteoblast lineage cells (Fig. 1g,h). 119 Humans with mutations in HGD, which encodes an intermediate enzyme in the Phe/Tyr catabolic 120 pathway, develop Alkaptonuria, or black bone disease, due to accumulation and pathological 121 aggregation of homogentisic acid 16 . As the abundant melanocytes in the zebrafish skin use high 122 levels of Tyr to synthesize melanin, one possibility is that these specialized dermal fibroblasts 123 function to protect the skeleton by removing damaging Phe/Tyr metabolites. 124 125 Progressive emergence of CNCC derivatives and region-specific progenitors 126 To understand lineage decisions of CNCC mesenchyme across time, we first used the STITCH 127 algorithm 17 to connect individual stages into developmental trajectories for scRNAseq and 128 snATACseq datasets (Fig. 2a,b). As early as 3 dpf (particularly apparent for snATACseq), we 129 observe divergence of CNCCs into skeletogenic versus gill lineages. A hyal4+ perichondrium 130 population precedes branches for tendon/ligament, periosteum, and osteoblasts (Fig. S9), and an 131 fgf10b+ gill progenitor population appears at 5 dpf and precedes branches for gill cartilage, pillar, 132 and tunica media cells (Fig. S10). We also observe a distinct trajectory to dermal fibroblasts by 3 133 dpf (Fig. S11), as well as to cxcl12a+ stromal cells (Fig. S9) and teeth. We do not observe CNCC 134 contributions to cardiomyocytes (Fig. S8), in contrast to reports for amniotes 18 . By creating an 135 index for ectomesenchyme-enriched gene expression at 1.5 dpf, a stage preceding the onset of 136 differentiation, we found no evidence for retention of ectomesenchyme identity at later stages, as 137 shown by aggregated ectomesenchyme gene expression and the early ectomesenchyme marker 138 nr2f5 19 (Fig. S12). Although formation of CNCC ectomesenchyme involves a reacquisition of the 139 pluripotency network 14 , we also did not observe expression of pluripotency genes pou5f3 (oct4), 140 sox2, nanog, and klf4 at any stage of post-migratory ectomesenchyme, with the exception of 141 lin28aa that displays broad expression at 1.5 dpf and is rapidly extinguished by 2 dpf (Fig. S12). 142 Rather than maintenance of a multipotent ectomesenchyme population, our data point to 143 progressive emergence of specialized hyal4+ perichondrium, cxcl12a+ stromal, and fgf10a/b+ gill 144 populations at 3 dpf and beyond (Fig. S12). For gill clusters, cell distribution from 5 to 14 dpf revealed two primary trajectories ( Fig. 2f-h, Fig.  159 S13). In the first, cxcl12a+/ccl25b+ stromal cells give rise to mesenchyme associated with retinoic 160 acid metabolism (aldh1a2+/rdh10a+), with in situ hybridization revealing these cell types restricted 161 to the base of secondary filaments (Fig. S10). In the second, fgf10a+ cells are connected to 162 fgf10b+ cells, which then diverge into gill cartilage, pillar, tunica media, and perivascular 163 populations. To test whether fgf10b+ cells are progenitors for specialized gill subtypes, we used 164 CRISPR/Cas9 to insert a photoconvertible nuclear EOS protein into the endogenous fgf10b locus. 165 We found fgf10b:nEOS to be robustly expressed in the forming gills, with expression becoming 166 progressively restricted to the tips of gill filaments over time, similar to endogenous fgf10b 167 expression (Fig. S10, S14). We then used UV light to convert fgf10b:nEOS fluorescence from 168 green to red in a small number of filaments at 7 dpf and observed contribution to gill chondrocytes 169 and pillar cells 3 days later, with new fgf10b:nEOS cells (i.e. green only) being generated at the 170 tips of growing filaments (Fig. 2i). Similar results were seen in adult gill filaments (Fig. S14). These 171 data support fgf10b+ cells being progenitors for gill-specific cell types from larval through adult 172 stages. 173

174
To understand how CNCC mesenchyme changes from embryogenesis to adulthood, we next 175 interrogated patterns of gene usage and chromatin accessibility (Fig. 2j, Fig. S15-16, Table S2). 176 Gene ontogeny (GO) analysis of ectomesenchyme at 1.5 and 2 dpf revealed terms linked to cell 177 division and metabolism, consistent with early expansion of this population. We also find 178 enrichment of transcription factors for early ectomesenchyme (dlx2a, twist1a, nr2f6b) and arch 179 patterning (pou3f3b, hand2), as well as transcription factor binding motifs for several types of 180 nuclear receptors, in accordance with known roles of Nr2f members in ectomesenchyme 181 development 19 . The hyal4+ population contains skeletal-associated terms (collagen fibril 182 organization, skeletal system development, regulation of ossification, cartilage development), 183 consistent with being a common progenitor for cartilage, tendon, ligament, and bone in 184 pseudotime analysis. The hyal4+ population is enriched for transcription factors implicated in 185 perichondrium biology (mafa, foxp2, foxp4) 23,24 and cartilage formation (barx1, sox6, emx2) 25-27 , 186 and motifs for Bmp signaling (SMAD) and transcription factors (NFAT, RUNX) known to regulate 187 cartilage and bone 28 . For gill fgf10a/b+ progenitors, we recover terms for general growth (e.g. Highly resolved embryonic spatial expression domains from integrated datasets 206 We next sought to understand the developmental origins of distinct cell types and lineage 207 programs in CNCC ectomesenchyme. To do so, we first examined the ability of integrated 208 transcriptomic and chromatin accessibility datasets to predict the expression patterns of potential 209 ectomesenchyme patterning genes at 1.5 dpf, a stage before overt cell type differentiation. and pitx1 (oral mandibular) 25,33,34 -revealed tight correlation to reported expression, including 218 zebrafish-specific overlap of dlx5a and hand2 in the mandibular arch (Fig. 3d). We also identified 219 a previously unappreciated oral-aboral axis of the mandibular arch in zebrafish, marked by pitx1 220 and nr5a2 respectively, which we validated by in situ hybridization for nr5a2 (Fig. 3e). Re- Chromatin accessibility predicts cell type competency in early arches 232 We next sought to understand how the establishment of cell fate competency is linked to the 233 earlier activity of arch patterning genes. To do so, we first computed unique patterns of chromatin 234 accessibility ("peaks") for each cell cluster at 14 dpf (Fig. 4a, Table S3). Modules of the top 235 enriched peaks for each cell type were then mapped onto UMAP projections of SnapATAC data 236 at 1.5, 2, 3, and 5 dpf (Fig. S19). To understand when cluster-specific peaks become established, 237 as well as cluster relatedness, we developed the bioinformatics pipeline "Constellations". First, 238 we calculated whether projections of cluster-specific peak modules are skewed toward particular 239 regions of UMAP space at each earlier time-point, suggesting establishment of cluster-specific 240 chromatin accessibility (a proxy for cell type competency). We then computed the relatedness of 241 peak module projections in two dimensions for each mapped cluster at each stage (Fig. 4b). 242 Analysis of cell competency trajectories shows that cell types can be grouped into five main 243 classes: skeletogenic cells (including hyal4+ perichondral and postnb+ periosteal cells), stromal 244 cells, dermal fibroblasts, gill cell types, and cartilage. Constellations analysis also reveals a 245 temporal order of cell type competency establishment, with unique chromatin accessibility for 246 cartilage and dermal fibroblast lineages emerging at 1.5 dpf; bone and perichondrium at 2 dpf; 247 and periosteum, tendon and ligament, and gill progenitors and pillar cells at 3 dpf (Fig. 4c). This 248 analysis suggests that chromatin accessibility prefiguring diverse CNCC cell types is 249 progressively established rather than being inherited from earlier multipotent CNCCs. 250 251

Constellations analysis reveals candidate transcription factors for lineage priming 252
To discover potential transcription factors for establishing cell type competency, we analyzed the 253 Constellations dataset for transcription factors whose expression and predicted binding motifs 254 were co-enriched in particular clusters. We identified 287 transcription factor expression/motif 255 pairs showing enrichment (Fig. S20, Table S4). The FOXC1 motif and foxc1b gene body activity 256 were highly enriched in the cartilage trajectory, and LEF1/lef1 in the dermal fibroblast trajectory 257 ( Fig. 5a). Projection of FOX motifs and merged Fox gene activity (foxc1a, foxc1b, foxf1, foxf2a, 258 foxf2b) and LEF1/lef1 onto SnapATAC UMAPs at 1.5 dpf reveals close correlation to mapping of 259 the 14 dpf peak modules for cartilage and dermal fibroblasts at this stage (Fig. 5b,c), as well as 260 the known fate map of cartilage precursors in the arches 35 (Fig. 5d,e). This confirms genetic 261 evidence for roles of Foxc1 and Foxf1/2 in cartilage formation in zebrafish and mouse 36,37 , and 262 more specifically Foxc1 in establishing accessibility of cartilage enhancers in the developing 263 face 28 . It also raises the possibility that Wnt signaling, mediated in part by Lef1, may play a role 264 in early dermal fibroblast specification, consistent with enrichment of wnt5a in this population (Fig.  265

S11). 266
We also find GATA3/gata3 to be highly enriched in gill populations, with SnapATAC UMAP 268 projections of GATA3 motif and gata3 gene body activity at 5 dpf correlating with 14 dpf gill 269 progenitor peaks (Fig. 5f). The enrichment of ETS2/ets2, which plays a role in endothelial previous work had shown that gata3 is expressed in and required for initial gill bud formation in 283 zebrafish, larval lethality had precluded analysis of gill subtype differentiation 40 . We find gata3 284 expression to be maintained in gill populations through adult stages in scRNAseq data, which we 285 validated by in situ hybridization at 14 dpf and 2 years of age (Fig. S21). We then identified a non-286 coding region ~143kb downstream of the gata3 gene, itself containing a predicted GATA3 binding 287 site, that was selectively accessible in posterior arch CNCCs by 3 dpf, gill progenitors and pillar 288 cells by 5 dpf, and gill cartilage cells by 14 dpf (Fig. 6a, Fig. S22). This gata3-P1 element was 289 sufficient to drive highly restricted GFP expression in posterior arch CNCCs starting at 1.5 dpf, 290 which continued in gill progenitors, pillar cells, and chondrocytes through 60 dpf (Fig. 6c-e, Fig.  291

S21). 292
Gill cartilage has a markedly distinct expression and chromatin accessibility profile from hyaline 294 cartilage of the jaw, as shown by selective expression of ucmaa in gill cartilage versus ucmab in 295 hyaline cartilage (Fig. S23). We identified a non-coding region ~5kb upstream of the ucmaa gene 296 that was selectively accessible in gill cartilage starting at 14 dpf and contained a predicted GATA3-297 binding site (Fig. 6b, Fig. S22). This ucmaa-P1 element drives highly restricted GFP expression 298 in gill chondrocytes at 11 and 23 dpf, in contrast to a previously described ucmab enhancer 28 299 driving GFP expression in hyaline but not gill cartilage (Fig. 6f, Fig. S23). Although functional 300 assays are needed to confirm Gata3 dependence, our findings are consistent with GATA factors 301 establishing a positive autoregulatory circuit in posterior arch CNCCs that maintains gata3 302 expression and promotes the later differentiation of gill-specific cell types (Fig. 6g). For enhancer transgenic lines, we synthesized peaks for gata3 (chr4:24918100-24918770) and 344 ucmaa (chr4:7836670-783720) using iDT gBlocks and cloned these into a modified pDest2AB2 345 construct containing E1b minimal promoter, GFP, and an eye-CFP selectable marker 28 using In-346 Fusion cloning (Takara Bio). We injected plasmids and Tol2 transposase RNA (30 ng/uL each) 347 into one-cell stage zebrafish embryos, raised these animals, and screened for founders based on 348 eye CFP expression in the progeny. Two independent germline founders were identified for each 349 that showed similarly specific activity in the gills.  Libraries were sequenced on Illumina NextSeq or HiSeq machine at a depth of at least 75,000 416 reads per nucleus for each library. Both read1 and read2 were extended to 65 cycles. Cellranger 417 ATAC v1.2.0 (10X Genomics) was used for alignment against genome (built with GRCz11.fa, 418 JASPAR2020, and GRCz11.98.gtf), peak calling, and peak-by-cell count matrix generation with 419 default parameters. 420 We included biological replicates at several stages to test the reproducibility of library preparation 421 and increase depth of data. For scRNAseq, we performed two replicates at 5 and 14 dpf, and 422 three replicates at 3 and 150 dpf. For snATACseq, we performed two replicates at 2, 3, and 14 423 dpf. 424 425 SnapATAC for peak refinement and gene activity matrix imputation 426 To refine the peak profile for better representation of diverse cell types across libraries, we 427 performed a second round of peak calling using package Snaptools (v1.2.7) and SnapATAC 428 (v1.0.0) 15 . We first removed low-quality cell and cell doublets by setting cutoffs based on 429 percentage of reads in peaks (> 30 for 60 dpf, > 45 for 210 dpf, and > 50 for the rest) and fragment 430 number within peaks (5,000 -30,000 for 5 dpf, 1,000 -11,000 for 14 dpf, and 1,000 -20,000 for 431 the rest). Potential cell debris or low-quality cells were removed by setting hard fragments-in-peak 432 number cutoffs. Using the SnapATAC package, we then generated "pseudo-multiome" data at 433 each stage. To recover every aligned fragment, we binned the genome into 5 kb sections and 434 The tissue module scores of the snATACseq data were calculated based on the enriched peak 503 sets and their module scores for each cluster identified at 14 dpf by R packages Seurat and 504 Signac. The enriched peak sets were calculated by the FindAllMarkers function using two-sided 505 likelihood ratio test with fragment numbers in peak region as latent variables. We used the peaks 506 with adjusted p values smaller than 0.001 as the enriched peaks for a cluster. As there are 23 507 clusters (tissues) identified at 14 dpf, we ended up with 23 peak sets, which we applied to 508 calculate the tissue module scores to earlier time points (1.5, 2, 3, and 5 dpf) using 509 AddChromatinModule function. To determine whether a tissue score at a time point distributes in 510 a statistically significant, and hence biologically interesting, way, we calculated the skewedness 511 of the distribution of a tissue score by the R package parameter (v0.12.0). We considered a tissue 512 score to be distributed in a meaningful way if it was strongly right skewed by a hard cutoff of 513 skewedness greater than 1. For 1.5 dpf, the cutoff of skewedness was lowered to 0.4 to 514 accommodate overall lower skewedness at that time point, but with additional filter of max module 515 score > 15 to avoid tissue module scores with extremely low values.