scRNA-seq reveals the diversity of the developing cardiac cell lineage and molecular building 1 blocks of the primary pacemaker 2

10 The heart is comprised of a variety of specialized cell types that work in unison to maintain blood flow. 11 Here we utilized scRNA-seq analysis to delineate the diversity of cardiac cell types in the zebrafish. With 12 the growing use of the zebrafish to model human heart biology, a deeper insight into its complex cellular 13 composition is critical for a better understanding of heart function, development, and associated 14 malformations. We present a high resolution atlas of zebrafish heart single cells transcriptomics, 15 consisting of over 50 000 cells representing the building blocks of the zebrafish heart at 48 and 72 hpf. 16 We defined 18 discrete cell populations comprising major cell lineages and sublineages of the developing 17 heart. We pinpointed a population of cells likely to be the primary pacemaker and identified 18 the transcriptome profile defining this critical cell type. Our analyses identified two genes, atp1b3b and 19 colec10 , which were enriched in the sinoatrial pacemaker cells. CRISPR/Cas9-mediated knockout of these 20 two genes significantly reduced heart rate which is accompanied by arrhythmia or morphological defects, 21 suggesting their novel function in cardiac development and conduction. Additionally, we describe other 22 subpopulations of cardiac cell lineages, including the endothelial and neural cells, whose expression 23 profiles we provide as a resource for further investigations into the cellular and molecular mechanisms of 24 this organ. 25


Introduction
Essential components of the heart ensure its life-sustaining activity [1].Specialized cell types constitute the contractile myocardium of the atria and ventricles, while cells of the cardiac conduction system and the autonomous nervous system innervates the heart tissues and coordinate rhythmic contractions of the heart chambers [2].The two pacemakers, sinoatrial (SA) and atrioventricular (AV) nodes, spontaneously generate electrical impulses driving heart contraction [3].Endothelial cells form the inner endocardial lining of the heart lumen [4], a subset of which are further specialized to form the heart valves [5], while a separate population makes up the coronary blood vessels that supply oxygen for myocardial contraction [6].The epicardium provides a protective layer surrounding the heart muscles [7].Besides these main sublineages, other cell types, mainly fibroblasts and mesenchyme, provide the structural matrix of the organ and play multiple roles in physiological processes related to cardiac function and regeneration [8].
The heart also performs other functions such as endocrine and iron homeostasis (reviewed in [9,10]) which are established though less well described.
The core genetic program and stepwise morphogenesis involved in the development of the heart is largely conserved across metazoans [11].Cells making up the heart arise from a pool of common mesodermal progenitors which are specified to the various major lineages.In the zebrafish, these progenitors can be detected by 12 hours post-fertilization (hpf) by the expression of myl7 and nkx2.5.Subpopulations of myocardial progenitors could be further distinguished between atrial (expressing myh6) and ventricular (expressing myh7) myocardium [12].Concurrently, endocardial progenitors denoted by the expression of kdrl and cdh5 could be found anterior to the myocardial progenitors [13].As the myocardial progenitors migrate to the midline to form a heart tube by 19 hpf, endocardial progenitors proceed to migrate towards the median and line the lumen of the heart tube [14].By 22 hpf, a linear, beating heart tube is formed, which is continuously elongated by the addition of cells originating from the second heart field (SHF), a pool of late-differentiating mesodermal progenitors, extending the atrium and ventricle and forming inflow and outflow tracts [15,16].Between 24 and 30 hpf, neural crest cells (NCCs) migrating through the pharyngeal arches contribute to the myocardium of the primitive heart tube, while a second population arriving at 80 hpf contribute to the outflow tract structures [17].The NCCs also contribute to the cardiac peripheral nervous system [18].Finally, the proepicardial organ forms at 48 hpf close to the atrioventricular junction and can be detected by the expression of wt1 and tcf21 markers.This structure gives rise to the epicardial cells which subsequently migrate and envelop the heart [7].
Although key insights into heart development and function have been derived from the zebrafish model organism [11,13], critical differences exist between the zebrafish and mammalian heart.Unlike mammals, teleosts including the zebrafish possess a heart comprising two chambers.An additional structure unique to teleosts is the bulbus arteriosus (BA), a chamber-like structure located at the outflow portion of the heart which is composed of smooth muscle and serves to absorb pressure [19].In terms of electrophysiology, the zebrafish heart is fundamentally similar to that of humans, which enables faithful modeling of diseases of the cardiac conduction system [20,21].Yet, differences in terms of ion channels and the types of currents involved in driving cardiac contractions exist [22].With the growing use of the zebrafish to model human heart biology, a comprehensive knowledge of both conserved and nonconserved features between the hearts of the two organisms becomes necessary in order to more accurately translate results from the zebrafish to human.
Single cell analyses of mammalian heart have revealed surprising new insights in the discovery of new cell types and their contributions to various forms of heart disease [23][24][25][26].The zebrafish offers a glimpse into earlier events of cardiogenesis which could provide valuable insights into the mechanism underlying the development of various cardiac cell types and specialized structures.Several single cell level analyses have been performed in zebrafish which included the heart [27][28][29][30].However, these were either focused on selected cell types or have not reached sufficient depth to comprehensively capture and annotate cardiac cell subtypes, particularly rare cell types such as the cardiac pacemaker cells.These cells are embedded within the myocardium and play a central role in generating and propagating electrical impulses for a rhythmic heart contraction.Although previous bulk-level analyses have shed some light into the molecular mechanism of their function [31][32][33], their scarcity and the lack of defining morphological and molecular features continue to pose a challenge to isolate pure populations of this cell type and study them.
Here, we present a high resolution atlas of the developing zebrafish whole heart single cells transcriptomics, aiming at sufficient depth to allow discovery and annotation of cardiac cell subtypes.We delineated major cell lineages and sublineages of the zebrafish heart and distinguished a set of gene expression profiles associated with each of these populations.We uncovered new cell subpopulations within major clusters of cardiomyocytes, endocardium, neural-crest-derived, and fibroblast cells.
Clustering analyses revealed two novel genes specifically enriched in the SA pacemaker, colec10 and atp1b3b, which encode the collectin subfamily member 10 [34] and a subunit of the Na+/K+ ATPase beta chain proteins [35], respectively.Loss of function analyses further revealed their role in heart development and rhythm maintenance.Our study established the heterogeneity of zebrafish cardiac cell types which could serve as a valuable resource for future in-depth analyses of cell populations with higher specificity.

Heart extraction and single cell dissociation
Whole hearts were extracted from double transgenic individuals [sqet31Et x Tg(myl7:mRFP) and sqet33mi59BEt x Tg(myl7:mRFP)].The term "pseudo-replicates" was used to denote samples of two different genotypic backgrounds (sqet31Et and sqet33mi59BEt) at the same developmental stage.
Embryos were anesthetized with Tricaine (0.16 mg/ml in egg water) and large-scale extraction was performed according to a previously published protocol, with minor adjustments [43].Hearts were manually separated from remaining tissue under a fluorescence stereomicroscope and collected into 0,5 ml of EDM (L-15/10% FBS).Single cell heart suspension was obtained according to Bresciani et al. [44].
Ultimately, cells were resuspended by gentle pipetting, loaded on the cell counting chamber, visually inspected under the microscope (Zeiss Apotome), and quantified by Countess automated cell counter (Invitrogen) (Fig. 1A, Supplementary Fig. 1A).

Library prep and sequencing
After viability and quantity check, dissociated cells derived from embryonic zebrafish hearts at 48 hpf and 72 hpf were loaded on 10x Chromium Controller Chip G (10x Genomics) and processed according to the Chromium Next GEM Single Cell 3' Reagent Kits v.3.1 targeting 10 000 cells.Resulting libraries were verified by High-Sensitivity DNA Kit (Agilent Technologies) on a 2100 Bioanalyzer (Agilent Technologies) and Qubit Fluorometer using Qubit dsDNA High-Sensitivity Assay Kit (Invitrogen).Final libraries were quantified with KAPA Library Quantification Kit Illumina® Platforms (Kapa Biosystems/Roche), followed by paired-end sequencing (read 1 -28 cycles, i7 index -8 cycles, i5 index -0 cycles, read 2 -91 cycles) performed with Nextseq 500 (Illumina).

QC, filtering and clustering
Obtained raw sequencing data (BCL files) were demultiplexed and transformed into fastq files using 10x Cellranger pipeline version 3.1.0[45] and bcl2fastq v2.19.0.316 (Illumina).The resulting matrices were loaded into the Seurat package v4.0.1 [46] for R v4.0.2 [47] applying standard quality control, normalization and analysis steps unless otherwise specified in the description.Briefly, sequencing reads were mapped to the zebrafish reference genome GRCz11 (Ensembl release 100) extended with additional EGFP and mRFP sequences.Due to overlapping annotations, reads mapped to non-polyA transcripts and protein-coding genes may be flagged as multi-mapped, and in consequence not count in the 10x pipeline, respective annotation GTF file was pre-filtered to retain only protein-coding genes (cellranger mkgtf --attribute=gene_biotype:protein_coding).The estimated number of cells identified by the Cellranger pipeline was 52,447 (Supplementary Table 1).In order to dispose of potential empty droplets, low quality cells and possible multiplets, droplets with a very small (< 200) and very high (> 2500) library size were removed.Standard scRNA-seq filtering workflows exclude cells with a high ratio of reads from mitochondrial genome transcripts, which may indicate potential cell membrane rupture and dissociationbased damage.This filter is often set at 5-10% [48], however, the heart as muscle tissue is composed of diverse cell populations including cardiomyocytes which require high energy demand leading to potentially very high mitochondrial gene content per cell.Therefore, to retain cardiomyocytes while removing low-quality cells, we chose a threshold of 30% mitochondrial gene content.In addition to above mentioned thresholds, we also decided to remove cells containing more than 10% of hemoglobin reads.
All together, to keep high quality cells the following thresholds were applied: 200 < number of genes < 2500 & mitochondrial gene content < 30% & hemoglobin gene content < 10%.An increasing number of targeted cells is inherently associated with higher multiplet rate.To maximize the removal of possible doublets from the dataset, in addition to filtering out cells with very high gene content (> 2500) we applied the DoubletFinder [49] approach to exclude extra heterotypic doublets derived from transcriptionally distinct cells (Supplementary Table 1).After applying all filtering steps, a total of 34 676 cells were retained for further analysis (Supplementary Table 1).Following the Seurat workflow, the resulting gene expression matrices were normalized according to "SCT transform" using "glmGamPoi" method and the default number of cells and variable genes [50].The scaled matrices were dimensionally reduced by PCA and Uniform Manifold Approximation and Projection (UMAP) embedding followed by neighbor finding and clustering according to default Seurat workflow.In each, 30 principal components were used.

Integration of single-cell datasets
Having two biological pseudo-replicates for each timepoint (48 hpf and 72 hpf, respectively), data were integrated using the Seurat integration approach applying default parameters [51].3 000 integrated anchors were identified and used as an input to create an integrated dataset as implemented in Seurat.Cells were clustered based on KNN graph based approach and Louvian algorithm for modularity optimization with the resolution parameter ranging from 0.5 to 12. Ultimately resolution was set to 1.5.
Clustering results were visualized by UMAP.Finally, we used the built-in default Seurat FindAllMarkers and FindMarkers functions to identify differentially expressed genes in each cluster.As a background, the entire dataset was used except for differential expression analysis between myocardial subclusters that have been compared internally.Functional Gene Ontology (GO) analysis was performed online (http://geneontology.org/) using PANTHER classification system [52] according to the default parameters (Fischer's exact test and FDR as correction) or ClusterProfiler v. 3.17.3[53].Top 100 differentially upregulated genes were used as an input.However, many gene products involved in pacemaker development in zebrafish are not comprehensively captured by GO [54], therefore, the Ensembl IDs of all upregulated genes (adj.p-value <0.05) from Sinoatrial CMs cluster were converted to human orthologs and searched against Homo sapiens.All subsequent enrichment plots were generated in ggplot2 package using zebrafish ensembl IDs.The single cell transcriptome data can be accessed http://zfcardioscape.iimcb.gov.pl.Interactome map between the cardiac cell clusters was constructed using the DanioTalk tool [55].The tool provides zebrafish-focused annotations based on physical interaction data of the zebrafish proteome and hence provided significantly higher number of ligand-receptor interactions compared to others built on mammalian data.All sequencing data have been deposited in the GEO database under accession number GSE234216.

Spatial correlation analysis
The TOMO-seq dataset [31] was downloaded and used as spatial reference.Genes that had a maximum read count of less than 20 were first removed.We then smooth the data with local regression method (LOESS, span =0.15) and calculate fold changes for all genes against the average for that gene across all sections.The Pearson correlation coefficient between each scRNA-seq clusters and each section is calculated.The spatial correlations () between the scRNA-seq clusters and the sections [31] were calculated by applying the following steps: (1) Ensembl gene IDs were used to match the data from the clusters and the sections, (2) in the cluster data, the log2FC were converted to fold change values, (3) if resulting value for a given gene ID was zero either for clusters or sections, these gene ID was ignored, (4) the Pearson correlation coefficient () between the cluster and each section  was estimated.The average number of compared genes for all sections was 254.As a proxy for assessing quality of comparison, we investigated the number of genes that were compared for estimation of  for each section with respect to this average value.The sections from 2nd to 36th had number of compared genes between 0.8 and 1.2 of the average value, and only sections 37th to 39th had 0.2 of the average value.Section IDs are as in [31].

Zebrafish F0 knockout of candidate genes using CRISPR/Cas9
The sgRNA sequences were designed utilizing chopchop [56] with default settings for CRISPR/Cas9 knock-out method.Based on previously established protocol [57], sgRNAs were designed to target three different loci per gene.The following exons were targeted: 2nd, 4th and 5th as well as 2nd, 3rd and 6th for atp1b3b and colec10, respectively (Figure 4B, Supplementary Table 7).All selected sgRNAs were specific to the targeted gene showing no mismatches and off-targets according to chopchop (Supplementary Table 7).Synthetic sgRNAs were ordered from Synthego at 1.5 nmol scales and dissolved in 15 μl of 1X TE buffer (10 mM Tris-Hcl, 1 mM EDTA, pH = 8) to reach 100 μM stock concentration.
Sequences and all quality metrics are provided in Supplementary Table 7. Cas9 protein was ordered from IDT (Alt-R™ S.p. Cas9 Nuclease V3, catalog number 1081059).Based on [58], a pre-assembled complex made up of Cas9 and gRNA results in greater efficiency than the coinjection of Cas9 and gRNA.
Therefore, one day prior to scheduled injections, individual sgRNAs were mixed in equal volumes and adjusted to 19 μM sgRNA mix concentration.sgRNA mixes were combined with previously diluted Cas9 protein in Cas9 buffer (20 mM Tris-HCl, 600 mM KCl, 20% glycerol [59]) to reach a final concentration of 9,5 μM RNP complex concentration (molar ratio 2:1, 19 μM sgRNA mix : 9,5 μM Cas9) and incubated at 37 °C for 5 min then cooled on ice and stored in 4 °C until microinjection.Approximately 1 nL of RNP complex composed of three sgRNA targeting different gene loci and Cas9 protein was injected into the yolk of one-cell stage embryos [59].Uninjected siblings were used as control.In the case of both atp1b3b and colec10 knockdown, injection of 28.5 µM of Cas9-sgRNA complex according to previous report [57] induced over 70% mortality which suggests batch-sepcific toxicity of the Cas9 enzyme (not shown), therefore we used 9.5 µM in our experiments.To facilitate the observation of the heart, all experiments were performed in the transgenic line Tg(myl7:EGFP) [39] or Tg(myl7:EGFP-Hsa.HRAS) s883 [40] expressing EGFP in the heart.To account for possible off-target effects resulting from microinjections, we targeted EGFP in the same transgenic zebrafish lines using in vitro transcribed sgRNAs [59].
Templates for sgRNA synthesis were generated by annealing and elongation (Phusion Flash High-Fidelity Master Mix, Life Techologies) of a forward primer containing a T7 promoter and guide sequence, and a reverse primer encoding the sgRNA scaffold (Supplementary Table 7).The DNA templates were purified (MinElute PCR Purification, Qiagen) and quality checked by 8% polyacrylamide gel electrophoresis before used as template for in vitro transcription using T7-FlashScribe™ Transcription Kit (CellScript).

Knockout phenotype and heart rate analysis
Following microinjection, dead embryos before 24 hpf were discarded to exclude unfertilized embryos or those damaged by the injection.Phenotypic analysis was performed starting from 48 hpf.First, malformed embryos and/or embryos that developed cardiac edema were separated and reported.Then, wherever possible, a random 10 and 20 embryos from the uninjected control and knockout groups were sampled for heart rate analysis and kept for further observations.Heart rate analysis was performed at 48 hpf on the SZX16 fluorescent steromicroscope (Olympus) by manually counting the number of heartbeats for 30 seconds per embryo using a timer.Subsequent statistical analysis was performed in R. Shapiro-Wilk normality test implied that the distribution of the knockout data are significantly different from a normal distribution (p-value < 0.05), therefore, non-parametric Wilcoxon rank sum test was applied to compare data between wildtype and knockout embryos.The Holm-Bonferroni method was used for adjusting pvalues.All movies were recorded using acA1440-220um USB 3.0 camera (Basler).

Isolation and transcriptome profiling of single cells from the embryonic zebrafish heart
To generate homogenous, viable single cell suspension from the zebrafish heart at 48 hpf and 72 hpf, we optimized a cell dissociation protocol incorporating simultaneous trypsin and collagenase treatment.We obtained cell suspension with viability above 90% (Supplementary Fig. 1A).To provide an internal control of specific rare cardiac cell population, we utilized two transgenic lines sqet33mi59B [32,37] and sqet31Et, in which cells of the sinoatrial (SA) and atrioventricular (AV) pacemaker regions expressed EGFP [33,36].In order to additionally demarcate the major cell types of the heart, we utilized the Tg(myl7:mRFP) transgenic line [38] to confidently mark cardiomyocytes (CMs), which is the most technically challenging cell type to isolate, and enhance cell clustering.To profile the transcriptome of the zebrafish heart, we collected offspring from two independent crosses of Tg(myl7:mRPF) line with either sqet33mi59BEt or sqet31Et transgenic lines.From each of the double transgenic individual pools, we isolated whole hearts at 48 hpf and 72 hpf and dissociated them into single cells which were subsequently encapsulated according to the 10x Genomics workflow (Fig. 1A).
In total, we obtained 52 447 cells from all four biological samples (Fig. 1B, Supplementary Table 1).In each, over 22 000 genes were identified.The median number of genes per cell varied between the developmental stages, which was on average nearly three times higher at 48 hpf as compared to 72 hpf.
Similar trend was observed in terms of unique transcript molecules per cell, whereas mitochondrial gene content was double in cells derived from 72 hpf (Supplementary Fig. 1B).At the same time, the Pearson correlation coefficient between raw developmental pseudo-replicates was equal to 0.98 and 0.83 for 48 hpf and 72 hpf data, respectively (Fig. 1D).The number of cells after quality control for library size, mitochondrial gene content and possible multiplets ranged from 6728 to 11 015 per sample, resulting in a final number of 34 676 cells for downstream analysis (Fig. 1B, Supplementary Table 1).Out of these, 2243 (80% of total RFP+ cells) cells co-expressed RFP and well established gene signatures for myocardial cells which included cmlc1 and myl7.These form a separate cluster, suggesting cardiomyocytes identity (Fig. 1C, Supplementary Fig. 1C).In addition, 179 cells were found to express EGFP, out of which the vast majority comes from 48 hpf dataset (129 cells at 48 hpf vs. 50 cells at 72 hpf).Most of the total EGFP+ cells (116) were within the cardiomyocytes cluster, suggesting that our protocol enabled the isolation of single cardiomyocytes, with sufficient sensitivity to capture rare cell populations making up the two pacemaker regions.To facilitate in-depth exploration of our data, we developed an R Shiny application based tool for interactive data visualization, differential expression as well as gene enrichment analysis which is available at http://zfcardioscape.iimcb.gov.pl.

The cellular landscape of the developing zebrafish heart
We distinguished 18 discrete cell populations comprising the zebrafish embryonic heart from both 48 hpf and 72 hpf (Fig. 1C).The most abundant clusters comprised endothelial cells (31%), cardiomyocytes (11%) and mesenchymal fibroblasts (7%) (Fig. 1E, Supplementary Table 2).Moreover, the number of cells contributing to each cluster varied between the developmental stages.Clusters originating mostly from 48 hpf included resident fibroblasts, mesoderm progenitors and cardiac peripheral nerves.On the other hand, seven clusters showed a higher proportion of cells derived from 72 hpf.These included neuronal cells, epicardial cells and Bulbus arteriosus (Fig. 1E, Supplementary Table 2).Of note, mitochondrial gene content was particularly higher among cells originating from the epicardium and cardiomyoytes clusters in the later developmental stage (72 hpf) which suggested their higher energy metabolism.On the other hand, the number of expressed genes and UMIs were greater at 48 hpf as compared to 72 hpf.Among clusters with the highest number of gene and UMIs were Mesoderm progenitors, AV endocardium and Neural crest cells (Fig. 1F).
Based on the expression of unique marker genes, these clusters could be broadly grouped into the major cell lineages of the developing heart, namely: myocardial, endocardial, epicardial, neural and neural crest, and mesenchymal/fibroblasts.The myocardial lineage formed a distinct cluster of cells represented by the cluster "Myocardium" which could be distinguished by the expression of myl7, cmlc1, myh6, myh7, tnnt2 (Fig. 2A, Supplementary Table 3) [60,61], as well as the RFP transgenic marker (Supplementary Fig. 1C).
The endocardial lineage constituted the largest proportion of the cells in our data.These included the clusters "Endothelial cells" and "Endothelial precursors" which express cdh5, tie1 and fli1 [62,63], as well as the "AV endocardium" and "AV cushion" which contributes to the developing AV valve (Fig. 1E, Supplementary Table 2).In addition to endocardial markers, the latter two express AV canal restricted markers such as anxa5b, alcama, and nrg1 [31,33].The "AV cushion" differs from the "AV endocardium" cluster by the presence of myocardial-associated genes (myl7, ttn.2, and a number of mitochondrial genes) and a higher expression of cell adhesion and extracellular matrix proteins (postna, col1a1a/b/2) (Supplementary Table 3).The epicardial lineage forms a single cluster "Epicardium" characterized by the expression of tbx18, gstm.3,wt1b, and tcf21 [28].
Cells of the "Bulbus arteriosus" cluster were distinguished by the expression of elnb, postnb, and fbln5, reflecting their smooth muscle composition [19].Neural and neural crest cells were represented by clusters "Cardiac peripheral nerves"' and "Neuronal cells"' characterized by the expression of tuba1c and elavl3 [64] and cluster "Neural crest" by the expression of twist1a/b, id2a, id3, and sox9b [65][66][67].Mesenchyme and fibroblasts make up a significant population of cardiac cells and could be distinguished by the expression of adhesion molecules such as cldnh, and epcam [68].Two clusters constituting this cell lineage were "Resident fibroblasts"and "Mesenchymal fibroblasts", the latter of which expressed higher numbers of genes encoding structural proteins and extracellular matrix components such as col1a1a, cdh1, and those encoding the Claudin family [69] (Supplementary Table 3).Interestingly, we also detected a cluster of cells designated as "Mesoderm progenitos" (Fig. 1C, 2A) which was enriched in the expression of isl1 and isl2b, and ldb1a which are known to be expressed in cells originating from the second heart field [16,70].Cells of this cluster also expressed a large number of transcription factors and cell cycle regulators (Supplementary Fig. 1D, Supplementary table 3).Likely as byproduct of our whole heart isolation procedure, we also captured hematopoietic cells indicated by the expression of blf and myb [71,72].From these, distinct population of erythrocytes (expressing hemoglobin genes as well as alas2, hemgn [73]) and leukocytes comprising of neutrophils (mpx [74], ncf1 [75]), macrophages (mpeg [76]), and platelets (mpl, itga2b [77]) could be distinguished based on the expression of their unique markers (Fig. 2A, Supplementary Table 3).
Different cell clusters exhibited variable proliferation activity based on the expression of genes driving the G2M and S-phase of the cell cycle.The most proliferative cluster being "Endothelial precursors", the cells of which expressed a large number of G2M and S-phase related genes (Supplementary Fig. 1D).The clusters "Bulbus arteriosus", "Leukocytes", "Mesoderm progenitors'', "Erythrocytes", and "Resident fibroblasts" were found to be more proliferative at 48 hpf compared to 72 hpf (Supplementary Fig. 1D), which may reflect their differentiation process.Accordingly, clusters expressing known cell-type-specific markers were generally low in proliferation activity (<50%).These include "Myocardium", "AV cushion" and "AV endocardium", "Endothelial cells", and "Neuronal cells" (Supplementary Fig. 1D).
In order to provide spatial context to the cell clusters, we took advantage of the available spatial information from the serial cross-section transcriptome dataset of the zebrafish heart [31].Correlating the expression profiles of our main scRNA-seq clusters with that of each serial section, the analysis revealed that cell types associated with structures known to be spatially localized exhibited the highest correlation with sections representing their corresponding localization (Fig. 2B).Furthermore, spatial correlation between each main cluster and their corresponding fine-grained clusters (see Supplementary method) revealed overall high levels of both gene and spatial cluster correlation, except for the "Unclassified" cluster which was heterogenous in both spatial and gene expression values, and the "Mesenchymal fibroblasts" cluster which was homogeneous in terms of gene expression values but spatially heterogeneous (Supplementary Fig. 2, Supplementary Table 6).This suggested that these clusters might be controlling subparts of the main clusters which have a differentiated role but are otherwise attached to the developmental structure associated with the respective main cluster.Taken together, the spatial correlation analyses of the scRNA-seq clusters agrees with the spatially restricted expression profile, thereby providing additional support for our cluster annotation.
To identify potential cellular crosstalk between cell types in the developing heart, we constructed an interactome map between the cardiac cell clusters based on the expression of ligands and their receptors [55].The analysis revealed several known cellular interactions including those involved in the development of AV cushion [78].Genes encoding Bmp4/5/6 were expressed in the "AV cardiomyocytes" cluster, while those encoding its signaling receptors Bmpr2a, Acvr1ba, and Sdc2 were expressed in the "AV cushion" cluster (Fig. 2C, Supplementary Table 4).The analyses also suggested potential interactions which were not previously established.For instance, the cluster "Cardiac peripheral nerves" interacted with "Endothelial cells" and "Epicardium" clusters through Robo1/2 and its receptor Slit1a/2/3 expressed in the corresponding clusters (Fig. 2C, Supplementary Table 4).The Robo/Slit signaling was implicated in axon guidance, as well as in endocardial progenitors migration to the midline [79].The "Cardiac peripheral nerves" cluster also interacted with "SA cardiomyocytes" subcluster through several known ligand-receptor pairs including Plxna3 -Sema3aa, Fgfr2/3/4 -Ncam1a, and Sdc4 -Mdkb (Fig. 2C, Supplementary Table 4), which have been implicated in neural patterning [80,81].Taken together, the analysis revealed potential interactions between cell or tissue types and their underlying molecular players for further in-depth analyses.

Diverse myocardial cell populations constitute chamber myocardium and the cardiac conduction system
The myocardium is the main tissue type of the heart which is responsible for the organ's main contractile function.Myocardial cells are characterized by their contractility and high metabolism [82].In agreement with this, Gene Ontology terms related to muscle function and structure were enriched among genes in this cluster (Figure 3A, B).Chamber cardiomyocytes could be distinguished by the expression of wellestablished marker genes, namely atrial myh6, and ventricular myh7/myh7l [83,84].Within the cluster "Myocardium", a clear distinction between atrial and ventricular cardiomyocytes was evident by the complementary expression of atrial (myh6) and ventricular (myh7 and myh7l) cardiomyocyte markers (Fig. 3A).Differential expression analysis between both fractions confirmed the presence of a set of unique genes in each heart chambers.The top differentially expressed signatures comprised genes encoding for various structural proteins involved in active force generation, including myh6, tnnc1b and smtln1 (atrial), and myh7l, myh7, and tnni4a (ventricular) (Fig. 3C).
Further re-clustering of the cells within the "Myocardium" cluster (a total of 3315 cells) revealed subpopulations of myocardial cell types (Fig. 3D).At least 43% of myocardial cell populations consisted of those making up the working myocardium of the two cardiac chambers, namely, atrial ("Atrial CMs I" and "Atrial CMs II") and ventricular ("Ventricular CMs").The mutually exclusive expression of myh6 and myh7/myh7l, together with other unique markers, differentiate these two subtypes of chamber myocardium (Fig. 3G, Supplementary Table 5).Two distinct subclusters of atrial cardiomyocytes were different in terms of higher mitochondrial content in "Atrial CMs II" compared to "Atrial CMs I" (Supplementary Fig. 1E).In addition, "Atrial CMs II" was enriched for the expression of myoz2b; Supplementary Table 5), the mammalian ortholog of which was previously reported to mark a distinct population of cardiomyocytes with yet unknown function [85].An additional cluster designated as "Cardiomyocytes" expressed myl7 which identified these cells as cardiomyocytes.However, they are not enriched in any chamber-specific markers.
Besides the working myocardium, the cardiac conduction system constitutes an important structure of myocardial origin.Among cells with high expression of atrial-specific gene (myh6), we identified a group of 127 cells that uniquely express lyve1a [31], vsnl1a [86], and chrm2a [87] that have been reported to be upregulated in the heart SA region.These cells also express other genes implicated in the function of the SA pacemaker, including isl1, shox2, hcn4, bmp4, fgf13a, tbx18 [31,32,88], while maintaining low expression of working myocytes markers, e.g.nppa/b, nkx2.5, gja5a/b (Fig. 3E).Gene Ontology enrichment analysis revealed numerous GO terms related to the sinoatrial node and cardiac pacemaker cell development (Fig. 3F).We therefore define this cluster as "Sinoatrial CMs" that could represent a group of specialized cardiomyocytes of the primary pacemaker site of the cardiac conduction system.

Functional analysis of cell-type-specific markers reveal new potential players in heart rhythm maintenance
Single cell analysis allowed us the opportunity to identify and study molecular components implicated in the development and function of rare cell populations.To this end, we utilized the CRISPR/Cas9 based knockdown [57,59] to target two candidate genes, atp1b3b and colec10, which were significantly enriched in the "Sinoatrial CMs" cluster representing the rare population of primary pacemaker cells (Fig. 4A, Supplementary Table 5).To target atp1b3b, we designed sgRNAs against the 2nd, 4th and 5th exons (Fig. 4B).By 48hpf, loss of function of atp1b3b did not appear to affect heart development (Fig. 4D).Moreover, these embryos also looked largely unaffected in terms of overall morphology (Fig. 4C, F).Targeting of colec10 by sgRNAs directed against the 2nd, 3rd and 6th exons caused a high proportion of embryos with hearts that failed to loop by 48 hpf (46 %, n = 160/347, Fig. 4C).These embryos appeared morphologically normal otherwise, with only a very small number of embryos exhibiting gross morphological abnormality (Fig. 4C, F).
The specific expression of atp1b3b and colec10 in the "Sinoatrial CMs" representing the primary pacemaker cell population led us to question whether they are involved in regulating heart rhythm.We systematically calculated the heart beat rate in CRISPR/Cas9-injected embryos as compared to uninjected siblings and EGFP-knockout embryos (EGFP-KO) which served as controls (Figure 4E, Supplementary Fig. 3A).Injection of Cas9 complex with sgRNAs targeting EGFP successfully abolished 100% (n = 77) EGFP expression in the transgenic embryos.A small number developed cardiac edema (13,5%, n = 30/221) while the majority did not show any observable phenotypic changes (Fig. 4C).Interestingly, we observed a reduction in heart rate in EGFP-injected embryos as compared to uninjected siblings (Supplementary Fig. 3A).Therefore, to account for off-target effects resulting from microinjections, in addition to uninjected siblings, we directly compared heart rates of atp1b3b and colec10 knockout embryos to EGFP-injected ones.We observed that, in both cases, heart beat rate was significantly lower by 48 hpf (Fig. 4E

The cardiac endothelial cell subpopulations possess distinct molecular profiles
Endothelial cells perform critical function in heart development and physiology and are known to contribute to different heart tissues including cardiac valve, coronary vessels, and trabeculae [89].In order to characterize the diversity of cardiac endothelial cells in the developing heart, we re-clustered cells of the "Endothelial cells", "Endothelial precursors", "AV endocardium", and "AV cushion" clusters.Our analyses revealed that the cardiac endothelial cells consisted of molecularly distinct populations (Fig. 5A).
Clusters "AV endocardium" and "AV cushion" were distinguished by the expression of several genes characteristic of the endocardium of the AV region, which included hand2, has2, alcama, and crip2 [33,78,90] (Fig. 5B).Cluster "AV endocardium" genes included anxa1a and anxa5b, both of which are known to be expressed in AV canal region [31], as well as notch1b and klf2a which are early markers of the AV valve [78] (Fig. 5B).Cluster "AV cushion" was enriched in genes coding for extracellular matrix constituents including many collagen family (col1a1b, col1a2, col5a1/2a, and thbs1a/b), as well as postna which are known to be expressed in the AV cushion [33].This cluster also had the highest expression of col1a2 which is implicated in human cardiac valve disease [91] (Fig. 5B, Supplementary Table 8).
Noticeably, several of the endothelial subclusters were significantly enriched in functional terms related to blood vessel morphogenesis and angiogenesis (Fig. 5C, Supplementary Table 12).Cluster "Proliferating cells" was significantly enriched in the expression of genes encoding cell cycle regulators (Fig. 5C, Supplementary Table 8).Cells of this cluster uniquely express birc5a which is known for its role in vasculogenesis and angiogenesis [92].Cells of "Angiogenesis I" cluster expressed the most number of genes implicated in sprouting angiogenesis, including hapln1b [93], sdc2 [94], and rtn4a [95] (Fig. 5B).
Cluster "Angiogenesis II" shared a partially overlapping set of angiogenesis genes with "Angiogenesis I", such as clec14a [96], as well as sele [97], pecam1 [98] and serpine1 [99].Apart from sele and serpine1, cluster "Angiogenesis III" also expressed several unique genes including cnn2 which is implicated in blood vessel endothelial cell migration [100] (Fig. 5B).The rest of the endothelial cell clusters were not enriched in any specific functional category other than that which suggest their endothelial identity and hence are likely to be endocardial lining of the cardiac chambers.Collectively, the diversity of the cardiac endothelial cell population reflects their contribution to various structures of the heart and suggests their functional diversity.

Neural crest and the intracardiac nervous system
Besides the main cardiac cell lineages, we also identified other cell populations whose expression profiles we provide as a resource (Supplementary Table 9).One important but poorly characterized cell populations are those making up the cardiac peripheral nervous system, which ensures the regulation of heart contraction speed according to physiological demands [101].To evaluate the diversity of the neural and neural crest cell populations in the zebrafish heart, we re-analyzed the main clusters "Cardiac peripheral nerves", "Neural crest" and "Neuronal cells".We distinguished subpopulations of cells which could be broadly subdivided into neuronal and non-neuronal cells (Fig. 5D).Clusters "Neurons I-III" and "Neural crest I-III" were enriched in the expression of neurodifferentiation markers including elavl3, elavl4, stmn1b, tuba1a, and tubb5 [64,[102][103][104] (Fig. 5E, F).They were also enriched in the expression of sncb which is involved in dopaminergic neuron differentiation and development of motor functions [105].
Compared to other neural subclusters, clusters "Neurofibroblasts I-IV" were depleted in neural and neural crest markers expression while enriched in the expression of id1 and id3 known for their role as inhibitors of neurogenesis [118] (Fig. 5E, Supplementary Table 9).Clusters "Neurofibroblasts I" and "Neurofibroblasts II" expressed a large number of genes implicated in cell cycle such as jun/ba/bb/d and cdkn1bb.They also uniquely expressed egr1 which is implicated in neural stem cell maintenance [119], suggesting the characteristics of neuron stem-like cells.Hence, their expression profiles suggest their identity as neural fibroblast cells.Cluster "Neurofibroblasts III" was enriched in genes coding for the most number of extracellular matrix components including collagens and keratins, which suggests structural function.Together with "Neurofibroblasts IV" cluster, it was also enriched for postnb encoding the zebrafish ortholog of Periostin which is known to be expressed in endoneurial cardiac fibroblasts contributing to sympathetic nerve fasciculation in the mammalian heart [120].Taken together, the expression resource for these clusters, as well as others, could potentially reveal novel cardiac cell types which are poorly explored and open new pathways of investigations.

Discussion
We present a high resolution atlas of the developing zebrafish heart between 48 hpf and 72 hpf obtained through single cell transcriptomics.The developmental stages chosen constitute a critical phase of heart development, where the development of specialized cardiac structures occur, including the formation of atrioventricular canal, heart looping, as well as the development of the cardiac valves and conduction system [78,121].In addition, this developmental period also covered the time when external cell types are incorporated into specialized structures of the heart, such as the neural crest and the proepicardium [7,17].
Our study delineated major cardiac cell lineages which were supported by known molecular markers.In addition, spatial correlation analysis points to main clusters which possess specific anterior-posterior localization within the heart tube, which provided further support of their identity.Further correlation analysis between the main and subcluster components (fine-grained clusters at higher resolution, see Supplementary Methods) revealed the homogenous composition of several main clusters, while pointing out the heterogenous composition of other clusters, such as the mesenchymal fibroblasts, on which a scarce literature information is available and annotation is therefore less precise.To date, there is only one published spatial transcriptomic data on the embryonic zebrafish heart [31].Although the section-based method limits the cellular resolution, it still provided a reliable positional reference for major cardiac structures.Future spatial analysis at higher resolution is expected to enable the annotation for each cell type with improved accuracy.
Our analyses uncovered subpopulations within the major cardiac cell lineages, including previously uncharacterized subtypes of cells, along with their molecular profiles.Previous single cell analyses on the developing epicardium in zebrafish has uncovered distinct subpopulations, each possessing specific genetic signatures and function [28].Similarly, transcriptome profiling of mammalian cardiomyocytes at the single cell level have revealed unprecedented molecular diversity which seems to suggest further functional specializations and/or distinct energetic profiles within this cell type [25,85,122].A specific population of cardiomyocytes that express Myoz2, a member of the sarcomeric protein family that is implicated in hypertrophic response [123], was recently described in mouse and humans [26,85].We observed that myoz2b, the zebrafish ortholog of this gene was expressed in the zebrafish myocardium.
myoz2b expression were detected in subsets of cells across almost all myocardial clusters.Determining the features and function of this particular cardiomyocyte population would provide key insights into how it contributes to the overall functioning of the heart.
A similar functional heterogeneity was also observed among cardiac endothelial cells.An interesting observation is that several endothelial cell clusters were found to be enriched for genes associated with blood vessel formation and angiogenesis.It was previously reported that the zebrafish coronary vasculature originates from the endocardial population and develops only from 1-2 months post-hatching [124].Moreover, endocardial cells are known to share the expression of vascular markers [125].Therefore it would seem unlikely that the enrichment of these functional categories reflects an actual process of vasculogenesis.Nevertheless, the distinct molecular profiles of the cardiac endothelial cell populations suggest the presence of functional diversity at the embryonic stage of heart development.
Our single cell transcriptome analyses also captured diverse populations of at least two extracardiac cell lineages -the cardiac neural crest and fibroblasts.Annotation of these cell types, particularly the latter, are still challenging as there are currently very few molecular markers that can be used to define them [8].
Previous studies in the zebrafish have focused on cardiac fibroblasts within the context of regeneration and hence capture only the activated form [126,127].We nevertheless provide their expression profiles as a resource which is envisaged to open the pathway for further studies of various aspects of cardiac biology in the zebrafish model organism.
The upcoming challenge will be to ascribe function to each cell type and determine the extent to which these diversity of cell types are conserved in comparison to the human heart in terms of their molecular properties and function.As a an initial step towards this effort, we functionally characterized two genes, atp1b3b and colec10, which were specifically enriched in the "Sinoatrial CMs" cluster representing the primary pacemaker region.Our loss-of-function analyses revealed the potential role of both genes in heart development and maintaining heart rhythm.The atp1b3b gene encodes the ꞵ3 subunit of the Na+/K+ ATPase family responsible for osmoregulation and electrical excitability of nerve and muscle.In the heart, the ɑ1 and ɑ2 isoforms, as well as the ꞵ1 are predominantly expressed [128].However, some studies have also reported the expression of ꞵ3 isoform in the ventricular myocytes [129].Evidence have suggested that a functional distinction exist between the isoforms, although the exact mechanism is still poorly understood.Interestingly, our single cell transcriptome analysis showed that atp1b3b had the most restricted expression in the sinoatrial CMs compared to other genes encoding the ɑ and ꞵ subunits.This may also underlie the specific heart rhythm phenotype observed upon their loss of function.Future analysis will further distinguish the function and mechanism of the different members of the Na+/K+ ATPases which has potential implications in developing therapies for heart failure.The gene collectin sub-family member 10 (colec10) encodes a member of the C-type lectin family [34], mutations of which have been associated with the rare 3MC syndrome 3 in humans [130].The gene product of colec10 was implicated in cellular movement and migration in vitro [130], while scarce information is available on the function of this gene in vivo.Our observation of its specific expression in the sinoatrial CMs, together with the defects observed upon its loss of function, therefore suggest that this gene may be a new player in regulating heart development and/or function.4).
Atrial CMs Ventricular CMs

Figure 2 .
Figure 2. The expression profiles of main cardiac cell clusters correlate with heart structures of known spa�al localiza�on.A Dotplot showing the top three differen�ally expressed genes for all cell clusters except "Unclassified".B Spa�al projec�on of cardiac single cell expression profiles to zebrafish heart sec�ons derived from Burkhard et al.[31] at 48 hpf.C Interac�on map between cell clusters according to ligand-receptor expression.Each line connect individual ligand with its corresponding receptor (source data: Supplementary Table4).
calcium ion into cytosol by endoplasmic reticulum sodium ion export across plasma membrane release of sequestered calcium ion into cytosol by sarcoplasmic reticulum sinoatrial node development bud elongation involved in lung branching mammary gland formation positive regulation of Rac protein signal transduction regulation of relaxation of cardiac muscle sinoatrial node cell di erentiation a tr ia l C M s A tr ia l C M s I A tr ia l C M s II A tr io v e n tr ic u la r C M s V e n tr ic u la r C M s C a r d io m y o c y te s C lu s te r

FFigure 4 .Figure 5 .
Figure 4. Func�onal analysis iden�fied atp1b3b and colec10 as new players in heartbeat maintenance.A Expression of atp1b3b and colec10 within the myocardial subclusters.The highest expression for both genes was observed within Sinoatrial CMs cluster.B Scheme illustra�ng gene loci targeted by CRISPR/Cas9 strategy.The combina�on of four sgRNAs was used to target EGFP whereas three sgRNAs were u�lized for atp1b3b and colec10.C Barplot summarizing observed phenotypes from all replicates for each gene knockout.Numbers within each sec�on indicate the number of embryos associated with a par�cular phenotype.D Fluorescent images showing heart looping defect in colec10 knockouts as compared to uninjected embryos (WT represents embryos on Tg(myl7:EGFP-Hsa.HRAS) s883 background at 48 hpf).E Heartbeat rate comparison between appropriate knockouts, uninjected wild-type control, and EGFP-injected embryos (significance values according to Wilcoxon rank sum test).F Representa�ve image showing morphology of atp1b3b and colec10 knockout embryos.
The reduced heart rate in EGFP knockouts could possibly result from developmental delay triggered by microinjection.Nevertheless, direct comparison between embryos injected with sgRNAs targeting either atp1b3b or colec10 and EGFP still indicated significantly lower heart rate in the former (Fig 4E), thus suggesting the specificity of heart rate phenotype to the knockdown of both genes.The genetic mozaicism of these F0 individuals limited our ability to ascertain the true function of the two candidate genes.Hence, more detailed analysis of stable mutants is necessary and underway.Collectively, these observations suggest that the two newly identified sinoatrial pacemaker genes play a role in heart development and regulation of heart rhythm.
, Supplementary Movie 1-2).Knockout of atp1b3b resulted in a median of 118 bpm knockout embryos developed arrhythmia later on (by 72 hpf or 96 hpf) characterized by either irregular heart rhythm or silent ventricle (at least 4 out of 50 embryos observed; Supplementary Movie 3-4).

Analysis of the "Myocardium" cluster revealed the diversity of myocardial cells and pinpointed sinoatrial cardiomyocytes within the atrial myocardium. A
Two dis�n11.iden��es of chamber myocardium could be delineated by molecular markers: Atrial CMs by expression of myh6 and Ventricular CMs by myh7l expression.B Gene Ontology enrichment analysis of all genes enriched within the "Myocardium" cluster.C Volcano plot depic�ng differen�ally expressed genes between atrial and ventricular CMs frac�ons.D UMAP projec�on of re-clustered myocardial cells reflec�ng the heterogeneity of atrial and ventricular myocardium.E Dotplot showing the expression of well-established gene signatures associated with working myocardium and sinoatrial pacemaker within "Myocardium" clusters.F Gene Ontology enrichment analysis of genes enriched within the "Sinoatrial CMs" subcluster.G Differen�ally expressed genes in each myocardial subclusters.Top 8 genes in terms of significance were labeled.Genes with adjusted p-value < 0.05 are depicted as red dots while not significant genes (adjusted p-value > 0.05) are shown in black.Source data: SupplementaryTable 10,11.