Abstract
Extensive studies of the reference plant Arabidopsis have enabled a deep understanding of tissues throughout development, yet a census of cell types and states throughout development is lacking. Here, we present a single-nucleus transcriptome atlas of seed-to-seed development employing over 800,000 nuclei, encompassing a diverse set of tissues across ten developmental stages, with spatial transcriptomic validation of the dynamic seed and silique. Cross-organ analyses revealed transcriptional conservation of cell types throughout development but also heterogeneity within individual cell types influenced by organ of origin and developmental timing, including groups of transcription factors, suggesting gatekeeping by transcription factor activation. This atlas provides a resource for the study of cell type specification throughout the continuum of development, and a reference for stimulus-response and genetic perturbations at single-cell resolution.
One-Sentence Summary
A single-nucleus atlas of seed-to-seed development in Arabidopsis charts a course through the lifecycle of an organism.
Introduction
Multicellular organisms have evolved various organs to perform specific functions required for the organism to survive and flourish. Plants, unlike most animals, undergo dynamic post-embryonic organogenesis to form new organs over the course of life. Some plant organs are highly specialized and developmental stage-specific, and many organs consist of various cell types with distinct sub-functions. However, some cell types share functionality across diverse organs and developmental stages based on anatomical and physiological features. For instance, epidermal cell types, generally defined as part of the outermost layer of cells of an organ, protect the organ from and interact with external environmental cues, whereas internally localized vascular cells are required for the transport of water and nutrients (1, 2). Yet, we still have a limited understanding of many cell types at the molecular level in the context of organ specificity. High throughput single-cell RNA-sequencing (RNA-seq) has been demonstrated to provide detailed maps of cell types in plants (3). Still, its application is currently limited to selected organs, tissues, and cell types (4–8), with a predominant focus on the Arabidopsis root tip (9–11), posing a bottleneck toward a comprehensive understanding of cell types and states in this model organism. This motivated us to generate an extensive single-nucleus transcriptome atlas of Arabidopsis development that encompasses all major organs present over the plant’s entire life cycle. Global characterization of plant cell types across organs at the molecular level will be paramount for understanding organ development and function. All data and annotations can be accessed through a web portal at http://arabidopsisdevatlas.salk.edu/.
Results
A comprehensive single-nucleus atlas of the Arabidopsis lifecycle
To generate a comprehensive atlas of Arabidopsis development, we collected six distinct organs that encompass a diverse range of tissues present at several developmental stages and transitions throughout the entire life cycle corresponding to developmental roadmaps (12–14), including imbibed and germinating seeds, three timepoints of seedling development, developing and fully emerged rosettes, the stem (apical, branched area and stem base), a range of flower tissue (unopened flower buds to fully mature flowers), and siliques (immature to fully elongated green siliques) (Fig. 1A). To circumvent the large range of plant cell sizes and reduce heterogeneous sampling effects between tissues, a universal nuclei extraction protocol amenable to droplet-based single-nucleus sequencing was developed and further optimized for each tissue (see Supplementary methods). A total of 801,276 nuclei from the ten samples passed accepted droplet-based single-nucleus filtering metrics (15) and were independently clustered and merged into a global dataset (Fig. 1B). A parallel analysis using higher-stringency filtering cutoffs, which retained 432,919 nuclei, revealed a similar global structure (Fig. 1C), demonstrating the overall high quality of the larger dataset, but was associated with an overall increase in cluster number and resolution. Although some clusters and cell types may be lost in the higher-stringency dataset, we proceeded with it for downstream analyses while also maintaining the larger dataset.
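The relationship between the two filtered datasets described above can be illustrated with a minimal sketch. The metric names (`n_genes`, `n_umis`, `pct_organelle`) and every threshold below are hypothetical placeholders chosen for illustration; they are not the cutoffs used in this study, which are given in the Supplementary methods.

```python
from dataclasses import dataclass


@dataclass
class Nucleus:
    barcode: str
    n_genes: int          # number of genes detected
    n_umis: int           # total UMI count
    pct_organelle: float  # percent of counts from organellar genomes


# Hypothetical cutoffs, for illustration only (not the values used in the study).
STANDARD = {"min_genes": 200, "min_umis": 300, "max_pct_organelle": 20.0}
STRINGENT = {"min_genes": 500, "min_umis": 800, "max_pct_organelle": 5.0}


def passes(nucleus, cutoffs):
    """Return True if a nucleus satisfies every cutoff in the given regime."""
    return (
        nucleus.n_genes >= cutoffs["min_genes"]
        and nucleus.n_umis >= cutoffs["min_umis"]
        and nucleus.pct_organelle <= cutoffs["max_pct_organelle"]
    )


def filter_nuclei(nuclei, cutoffs):
    """Keep only the nuclei that pass the given cutoffs."""
    return [n for n in nuclei if passes(n, cutoffs)]


# Toy input: three made-up nuclei.
nuclei = [
    Nucleus("AAAC", 950, 1800, 1.2),
    Nucleus("AAAG", 260, 410, 8.5),
    Nucleus("AACT", 120, 150, 30.0),
]

standard_set = filter_nuclei(nuclei, STANDARD)
stringent_set = filter_nuclei(nuclei, STRINGENT)
```

Because each stringent cutoff is at least as strict as its standard counterpart, the stringent set is by construction a subset of the standard set, which is why the two datasets can be compared for global structure.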
The global clustering of all ten datasets revealed several distinctly separated clusters, with the presence of a large central cluster (Fig. 1C). To determine the defining characteristics of the large cluster, we examined the aggregated and individual expression of the top 50 marker genes of these cells (Fig. 1D and figs. S1A-H). Many top markers are involved in photosynthesis, suggesting that this biological process has a dominating influence on the transcriptome of cells with high levels of photosynthetic activity (fig. S1I). To assess the heterogeneity of this central cluster, we removed the abundantly expressed photosynthesis-related genes and re-clustered the resulting data (fig. S2), which slightly improved cluster resolution. However, a large central cluster remained, suggesting a conserved functionality across several populations of photosynthetically active cells.
Overall, we chose to sample seedling tissue more deeply because of its potentially higher number of discrete cell types, owing to the presence of both above- and below-ground tissues (Fig. 1E). Across all tissues and time points, we observed comparable transcriptional complexity between samples and tissues (Fig. 1F). Independent clustering of each dataset revealed a total of 183 clusters, many of which correspond to individual cell types (Figs. 2A-B, fig. S3, and Table S1). As high-resolution cell type maps are not currently available for many of these tissues and developmental time points, annotation was guided by curated marker genes identified in previous studies (see Methods and Table S2). The organ with the greatest cluster complexity was the silique, where we identified 26 major clusters (Fig. 2B and Table S1), which may arise from the diversity of tissues within the fruit organ (fruit flesh and developing embryonic tissue), and from real-time developmental gradients present both across individual siliques and within each silique (16). Further, within individual siliques, diverse stages of embryonic development from fertilization to zygotic to embryonic development can be present along the longitudinal axis of the fruit based on the timing of fertilization of individual embryo sacs. Using previously validated cell type-specific marker genes of the silique (17), we were able to annotate many of the clusters to individual cell types of this organ (Fig. 2C and Table S1) along with several cell types that comprise the developing embryo and seeds (Figs. 2B to D).
To explore the transcriptional heterogeneity within and across individual cell types, we systematically performed subclustering of the 183 major clusters identified from all organs, resulting in a total of 653 subclusters representing unique cell identities or states across plant tissues and development (fig. S4A). Comparing the aggregated transcriptomes (pseudo-bulk) of individual subclusters revealed groups of subclusters derived from individual organs that share conserved transcriptional identities, as well as subcluster groups associated with unique gene expression patterns (Fig. 2E). This was further supported by analyzing overlapping sets of de novo identified cluster marker genes across all 653 subclusters, where we observed particularly high cluster diversity among subclusters of seeds and siliques (fig. S4B). Taken together, our seed-to-seed transcriptome atlas captured known cell types and a diverse set of previously uncharacterized cell populations at the major and subcluster levels.
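As an illustration of the pseudo-bulk comparison described above, the following minimal sketch sums per-gene counts over all nuclei assigned to the same subcluster and scales each profile to counts per million. The function names, barcodes, subcluster labels, and count values are hypothetical; only the two gene identifiers (FAMA, AT3G24140; SUC2, AT1G22710) are taken from this study, and the normalization shown is an assumption rather than the procedure used here.

```python
from collections import defaultdict


def pseudobulk(counts, labels):
    """Sum per-gene counts over all nuclei that share a subcluster label.

    counts: {barcode: {gene_id: count}}
    labels: {barcode: subcluster_label}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for barcode, gene_counts in counts.items():
        subcluster = labels[barcode]
        for gene, count in gene_counts.items():
            totals[subcluster][gene] += count
    return {sub: dict(genes) for sub, genes in totals.items()}


def cpm(profile):
    """Scale one pseudo-bulk profile to counts per million."""
    total = sum(profile.values())
    return {gene: 1e6 * count / total for gene, count in profile.items()}


# Toy input: made-up counts for four nuclei in two hypothetical subclusters.
counts = {
    "n1": {"AT3G24140": 3, "AT1G22710": 0},
    "n2": {"AT3G24140": 2, "AT1G22710": 1},
    "n3": {"AT3G24140": 0, "AT1G22710": 4},
    "n4": {"AT3G24140": 0, "AT1G22710": 6},
}
labels = {"n1": "sub_a", "n2": "sub_a", "n3": "sub_b", "n4": "sub_b"}

bulk = pseudobulk(counts, labels)
normalized = {sub: cpm(profile) for sub, profile in bulk.items()}
```

Aggregating before comparison trades single-nucleus resolution for profiles that are far less sparse, which makes subcluster-to-subcluster similarity easier to assess.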
Reconstructing development along real time and pseudotime
To explore real-time developmental dynamics of individual cell types, we utilized the seedling datasets that comprised identical tissue and organ compositions at three developmental time points (3-, 6-, and 12-day-old seedlings). Integration of these datasets revealed that all clusters were represented by each timepoint, demonstrating a conservation of cell types across these three stages of seedling development (Fig. 3A and fig. S5). However, despite successful integration, the populations of several clusters were skewed by developmental age, including the annotated root hair clusters. It is well characterized that the timing of root hair (I) specification, (II) emergence, (III) elongation, and (IV) maturation is developmentally gated (18), with root hairs first observable around two days of growth, suggesting that the skewed developmental representation reflects varying degrees of root hair differentiation and development. Our time-resolved seedling atlas thus provided an opportunity to reconstruct this developmental process by trajectory inference and pseudotime analysis supported by real-time ground truth. Re-clustering of the three annotated root hair clusters revealed heterogeneity within this population of cells (Fig. 3B). Fitting a trajectory and pseudotime to these cells revealed a continuous gradient of root hair maturation, capturing the expression of known root hair developmental stage-specific genes (Fig. 3C). Affirming our prior findings, we observed that nuclei of the youngest seedling dataset (3d) were enriched early in the trajectory. In contrast, nuclei of the older seedlings (6d and 12d) progressively populated intervals of late pseudotime and at greater proportions (Fig. 3D). Identification and clustering of genes differentially expressed over pseudotime revealed three expression patterns, with gene modules that largely correlated with three stages of root hair development (Fig. 3E).
Of note, these de novo identified gene groups are not enriched for processes related to root hair development but rather broader GO terms such as anatomical structure process and glycerolipid metabolic process, implicating novel roles of those genes in root hair development (fig. S6).
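The enrichment of 3d nuclei early in the trajectory and of 6d and 12d nuclei in late pseudotime can be quantified by binning pseudotime and tabulating the sample of origin within each bin. The sketch below is one simple way to do this, assuming equal-width bins; the function name, bin count, and pseudotime values are hypothetical and do not reproduce the analysis behind Fig. 3D.

```python
from collections import Counter


def bin_composition(nuclei, n_bins):
    """Fraction of each sample within equal-width pseudotime bins.

    nuclei: list of (pseudotime, sample_label) tuples.
    Returns one {sample_label: fraction} dict per bin (empty dict for empty bins).
    """
    lo = min(p for p, _ in nuclei)
    hi = max(p for p, _ in nuclei)
    if hi == lo:
        raise ValueError("pseudotime values must span a non-zero range")
    width = (hi - lo) / n_bins
    bins = [Counter() for _ in range(n_bins)]
    for p, sample in nuclei:
        # The maximum pseudotime would index one past the end, so clamp it.
        idx = min(int((p - lo) / width), n_bins - 1)
        bins[idx][sample] += 1
    return [
        {s: c / sum(b.values()) for s, c in b.items()} if b else {}
        for b in bins
    ]


# Toy input: made-up pseudotime values for nuclei from the three seedling samples.
toy_nuclei = [
    (0.0, "3d"), (0.1, "3d"), (0.4, "3d"), (0.2, "6d"),
    (0.5, "6d"), (0.6, "12d"), (0.9, "12d"), (1.0, "12d"),
]
composition = bin_composition(toy_nuclei, n_bins=2)
```

In this toy example the early bin is dominated by 3d nuclei and the late bin by 12d nuclei, mirroring the kind of skew that real-time sampling can provide as ground truth for an inferred trajectory.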
To utilize the breadth of our datasets, we sought to determine if these spatiotemporally expressed gene modules have a conserved role in other tissues and/or cell types. Mapping the expression of each gene module to the integrated seedling and global datasets revealed examples of both highly specific and broad roles of these genes across development (Figs. 3F-H). For example, gene modules corresponding to root hair differentiation in pseudotime (intermediate; II) were exclusively expressed in this population of cells within the context of all tissues and developmental time points assayed (Figs. 3F-H), suggesting that these genes function solely within this trajectory of root hair development. Conversely, genes corresponding to both early and late pseudotime of root hair development were more broadly expressed in other tissues and cell types (Figs. 3F-H), suggesting that the transcriptional programs associated with epidermal cell fate and root hair maturation are shared among other biological programs throughout development. Overall, our time-resolved atlas, including broad tissues and developmental stages, revealed examples of transcriptional programs that are shared across cell types of various tissues and programs uniquely utilized by a single cell type at discrete stages of development.
Cross-tissue analyses of conserved cell types
To determine if transcriptional programs are shared between like cell types originating from various organs, we evaluated discrete populations of cells identified at the global clustering level (Figs. 1C and 4A). Expression of known marker genes was restricted to individual clusters as evidenced by cluster-specific expression of the stomatal lineage and phloem marker genes, FAMA [AT3G24140, (19)] and SUC2 [AT1G22710, (20, 21)], respectively. The composition of these clusters was well represented by cells from all tissue types assayed (Figs. 4B-D), demonstrating that the transcriptional identity of these cell types is strongly conserved throughout development. Quantification of other distinct clusters revealed that many clusters were well represented by various tissues-of-origin (figs. S7A and B).
Subclustering of the phloem and guard cell lineage populations revealed additional heterogeneity not observed at the global clustering level, whereby cells originating from similar tissues of origin (three seedling datasets, two rosette datasets) co-segregated (Figs. 4B to C). Conversely, guard cell lineage and phloem cells from the terminally differentiated reproductive and germline tissues clustered distinctly from the vegetative tissue, suggesting organ-specific influences on the transcriptional identities of these cell types. Exploring the expression patterns of de novo identified subcluster markers of guard cell lineage and phloem cells at the global and cell type levels revealed two classes of genes: (I) organ/tissue-specific, expressed throughout various cell types of an individual tissue (identified as subcluster markers in at least half of the subclusters of a sampled organ or tissue) (Fig. 4E and Table S3), and (II) unique, under the combined regulation of cell type-specific and development-specific programs (Fig. 4F and Table S3). Extending this analysis to include ten distinct clusters (figs. S7 and S8) further revealed the presence of genes both with tissue-level expression patterns and under the combined regulation of cell type and developmental timing (2,941 genes) (Figs. 4G-H and figs. S7C and D).
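The criterion for class (I) genes, being a subcluster marker in at least half of the subclusters of a sampled organ or tissue, can be written down directly. The sketch below is a minimal illustration of that rule only; the function name, subcluster labels, and gene names are hypothetical, and class (II) genes are not covered because their criterion is not stated as a simple threshold.

```python
from collections import defaultdict


def organ_wide_markers(subcluster_markers, subcluster_organ, min_fraction=0.5):
    """Genes that are markers in >= min_fraction of an organ's subclusters.

    subcluster_markers: {subcluster_label: iterable of marker gene IDs}
    subcluster_organ:   {subcluster_label: organ_label}
    Returns {organ_label: set of gene IDs}.
    """
    n_subclusters = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for subcluster, genes in subcluster_markers.items():
        organ = subcluster_organ[subcluster]
        n_subclusters[organ] += 1
        for gene in set(genes):  # count each gene once per subcluster
            hits[organ][gene] += 1
    return {
        organ: {
            gene
            for gene, k in hits[organ].items()
            if k / n_subclusters[organ] >= min_fraction
        }
        for organ in n_subclusters
    }


# Toy input: made-up marker lists for five hypothetical subclusters.
toy_markers = {
    "silique_1": ["geneA", "geneB"],
    "silique_2": ["geneA"],
    "silique_3": ["geneD"],
    "seedling_1": ["geneC"],
    "seedling_2": ["geneE", "geneE"],
}
toy_organs = {
    "silique_1": "silique", "silique_2": "silique", "silique_3": "silique",
    "seedling_1": "seedling", "seedling_2": "seedling",
}
organ_wide = organ_wide_markers(toy_markers, toy_organs)
```

Here `geneA` is a marker in two of three silique subclusters and is therefore classed as organ-wide, whereas `geneB` and `geneD` each mark only one of three and are not.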
Of note, we observed the largest quantity of overlapping subcluster markers across cell types of the germinating seed (1.25d) (Table S3). A deeper investigation of conserved markers of germinating seeds revealed that many of these genes with shared expression across cell types encode a diverse range of ribosomal protein subunits (figs. S9 and S10). The unified expression of ribosomal proteins across cell types in the germinating seed dataset coincides with global increases in ribosome biogenesis and translation during seed germination (22). While the expression of these ribosomal protein genes is present across several cell types of germinating seeds, this was not observed for all of the other tissues assayed, where subsets of ribosomal proteins were expressed only in cells of specific tissues or in cell type-specific expression patterns, supporting the hypothesis that the heterogeneous composition of ribosomes is developmentally regulated (23), and may explain the higher levels of homogeneity within this tissue.
Together, these findings suggest the presence of cell populations whereby transcriptional identity is influenced by organ- or tissue-specific transcriptional networks but also unique cell populations with context-specific transcriptional regulation. This may be correlated with cell functionality across tissues, as some cell types and states are conserved regardless of tissue of origin, while others may encounter different environmental stimuli, such as abaxial versus adaxial leaf epidermal guard cells.
Transcription factor specificity across development
We systematically analyzed TF expression across the identified cell populations to understand cell type and organ-specific gene regulatory mechanisms. Many TFs showed organ-specific expression (Figs. 5A and B), but we also observed heterogeneity within organs (figs. S11A-C). For instance, AT3G15510 (ANAC056) was expressed specifically and globally in the silique tissue (Fig. 5C); AT3G62340 (WRKY68) was also specific to the silique within the entire context of development, but was expressed only within a subset of cells (Fig. 5D). Notably, we identified TFs expressed in specific clusters of each organ. TFs such as FAMA, WRKY23, and BIM1 were highly expressed in clusters mostly annotated as guard cells in seven different samples (Fig. 5E; highlighted in red), suggesting that these TFs have conserved cell type-specific roles across various organs and development.
We next asked whether TF clades can explain organ specificity and within-organ heterogeneity of expression patterns. We performed principal component analysis on the expression of 70 TF clades in individual clusters and found cases where TF family members showed either shared or diverse cell type/organ specificity (Figs. 5F-I and fig. S11D). For instance, expression patterns of WRKY TFs are primarily separated into three organ groups (regardless of cell type): (I) germinating seeds (1.25d), (II) imbibed seeds (0d) and seedlings, and (III) rosette, stem, flower, and silique. Cell type/state heterogeneity was observed within each organ group (Fig. 5F). Some TF families have largely expanded and showed high degrees of heterogeneity (Figs. 5J and K and fig. S11E and F). Interestingly, we found that, in some cases, phylogenetic relationships of genes within TF families can explain their organ/cell type specificity, as clusters associated with cell types and/or organs formed groups when TF family members were clustered phylogenetically (Fig. 5J; highlighted with red boxes). We also found cases where closely related TFs showed contrasting expression patterns (Fig. 5J; highlighted with white boxes).
Basic leucine zipper (bZIP) TFs recognize various DNA sequences by forming various homo- and hetero-dimers (24). A previous study showed that Group C bZIP TFs (bZIP9, bZIP10, bZIP25, bZIP63) do not bind DNA as homodimers but form heterodimers with Group S1 members (bZIP1, bZIP2, bZIP11, bZIP44, bZIP53) to bind unique DNA sequences (25). We found that genes in each group showed varying expression patterns (Fig. 5K). Group C TFs were clustered with at least one Group S TF with overlapping expression patterns (Fig. 5K), implying the co-evolution of cell type/organ specificity in these dimerization pairs. Our dataset will be valuable for analyzing the neo/sub-functionalization of TFs over the course of gene family expansion, which has previously been suggested for some genes (26).
Spatial mapping of cell types
Annotating cell types and cell states to clusters identified in snRNA-seq requires a priori knowledge of cell type-specific markers previously validated by imaging- and dissection-based analyses. However, such ground truth cell type markers are unavailable for many organs and cell types within the context of the entire Arabidopsis lifecycle. Therefore, to validate our cluster annotations, we utilized two spatial transcriptomics technologies to validate cell type-specific expression patterns of several de novo identified clusters from two datasets associated with highly dynamic stages of growth: germinating seeds (1.25d seeds) and siliques.
To spatially profile the transcriptome of germinating seeds (1.25d) (Fig. 2A), we utilized a sequencing-based spatial transcriptomics platform with 10 µm spatial resolution to profile cell type and cell layer-specific transcriptomes (27). Plotting the transcript detection of spots in spatial coordinates revealed shapes resembling both seed and embryo structures (figs. S12A and B). Dimensionality reduction and de novo clustering of the filtered spatial transcriptomics data revealed two major groupings of clusters (Fig. 6A). Spatial mapping revealed that these clusters broadly correspond to the cotyledons (clusters 0 and 5), root tip region (cluster 1), epidermis (cluster 2), seed coat (cluster 3), and the provasculature (cluster 4) (Figs. 6B to D). Mapping the expression of cluster markers corresponding to the cotyledon and epidermal clusters (clusters 0 and 2; fig. S10) onto the matched droplet-based single-nuclei dataset similarly revealed cluster-specific expression patterns to the correspondingly annotated cell types (Fig. 6E), demonstrating the ability to accurately annotate the droplet-based clusters.
As fully elongated siliques simultaneously represent dynamic growth patterns associated with both small (egg sac; embryo) and large tissues (∼15 mm silique when fully elongated), we opted for an imaging-based spatial transcriptomics method (MERFISH; see Methods) that can accommodate higher resolution of small cells as well as larger tissue areas to simultaneously visualize the expression of many transcripts on a tissue section at single-molecule resolution. We selected 140 de novo identified silique cluster marker genes and spatially mapped them on 10 µm tissue sections of fully expanded green siliques (Fig. 6F; Table S6). Marker genes of clusters predicted to be embryo and chalazal endosperm (Fig. 2B-C, Table S2) were successfully mapped in expected regions of the tissues (Figs. 6G and H). Among these markers, AT2G44240 has been validated as a chalazal marker by in situ hybridization (17). We also identified cells in the stomatal lineage based on expression of the guard cell marker gene FAMA (19) and validated the de novo identified guard cell marker gene AT3G16340 (PDR1) (Figs. 6I and J). These results validate our cell type predictions and the de novo identified marker genes of these cell types. Mapping the expression patterns of these highly specific genes revealed that many of these cell type-specific transcripts of the silique are uniquely expressed in this tissue throughout all stages of Arabidopsis development assayed in this study, highlighting the ability to investigate cell type and developmental-specific expression patterns in our dataset (Figs. 6G and J). In summary, spatial transcriptomics validated de novo identified cluster markers and facilitated cluster annotation in these complex organs.
Discussion
In this study, utilizing single-nucleus sequencing technologies, we sought to characterize a diverse range of tissues and organs along the entire life cycle of an organism. The post-embryonic development and emergence of distinct tissues and organs is a notable aspect of plant development, which is not present in most other systems, and was incorporated into our experimental design. With a broad sampling strategy spanning several well-studied developmental stages, our single-nucleus atlas of Arabidopsis development provides a resource of wide use to the Arabidopsis and plant community for hypothesis generation and reference for future single-cell genomics studies.
We identified tissue-of-origin transcriptional signatures in like or identical cell types, uncovering additional layers of heterogeneity that holistically contribute to the transcriptional identity of individual cells. In contrast, we also identified genes under the combined regulation of cell type and tissue-of-origin specificity, with many of these genes expressed solely within a single cell type of a single organ. Many of these genes uniquely expressed in our dataset are not functionally annotated and thus serve as potential candidates for follow-up studies. Interestingly, cell types of the terminal reproductive tissues containing both germ- and non-germ-line cells were associated with greater quantities of genes uniquely expressed in these organs throughout all cell types and developmental stages assayed, including many TFs, some of which showed phylogenetic conservation, which could explain their organ specificity. Further phylogenetic analysis of large gene families could reveal evolutionary mechanisms underlying the diversification and specification of gene expression across cell types and organs.
While some high-resolution cell-type atlases exist for selected Arabidopsis organs, tissues, and cell types, the spatiotemporal axis of development, derived from the non-renewing nature of plant cells, introduces additional levels of complexity that can obfuscate the interpretation of dissociation-based single-cell methods. An a priori understanding of cell-type specific developmental programs can aid in the understanding and annotation of single-cell datasets (9), but such datasets have yet to be generated for the entirety of Arabidopsis tissues and development.
Application of two spatial transcriptomics technologies to the germinating seed and silique tissues revealed that high-resolution spatial studies can disentangle plant organ heterogeneity, as evidenced by the validation of both cluster markers and cluster annotations in these highly dynamic tissues. As in situ spatial transcriptomic methodologies require knowledge of up to thousands of transcripts with distinct expression patterns, our datasets also function as a foundation for future spatial transcriptomics experiments.
Overall, our dataset highlights the complexity of cell types throughout the entirety of development of an organism.
Our dataset also serves as a foundation for future studies that seek to further characterize these tissues and developmental stages, or atlases investigating stress or stimulus-driven responses, with access to the processed datasets for rapid exploration in a web-based interface available at our web portal (http://arabidopsisdevatlas.salk.edu/).
Funding
J.R.E. is an Investigator of the Howard Hughes Medical Institute.
Author contributions
Conceptualization: TL, TN, JRE
Methodology: TL, TN, NIE
Investigation: TL, TN, NIE, BJ, JRN
Data curation: TL, TN, JRN
Data analysis: TL, TN, JX
Visualization: TL, TN, NIE
Funding acquisition: JRE
Project administration: TL, TN, NIE
Supervision: JRE
Writing – original draft: TL, TN, NIE, JRE
Writing – review & editing: TL, TN, NIE, JRE
Competing interests
Authors declare that they have no competing interests.
Data and materials availability
All datasets are available for browsing at arabidopsisdevatlas.salk.edu. Raw data and processed datasets are available at GEO (accession number GSE226097).
Supplementary Materials
Materials and Methods
Figs. S1 to S12
Tables S1 to S6
Acknowledgments
We would like to thank Genevieve Zzyzyx for the illustrations. We would also like to thank James Walker for help with annotation of pollen cells and manuscript input, and Hanqing Liu for reading and input to the manuscript. T.N. was supported by a Human Frontier Science Program (HFSP) Long-term Fellowship (LT000661/2020-L). N.I-E. is a research fellow at the George E. Hewitt Foundation for Medical Research and an Awardee of the Weizmann Institute of Science – Israel National Postdoctoral Award Program for Advancing Women in Science.