Abstract
Single-cell RNA sequencing combined with spatial information on landmark genes enables reconstruction of spatially-resolved tissue cell atlases. However, such approaches are challenging for rare cell types, since their mRNA contents are diluted in the spatial transcriptomics bulk measurements used for landmark gene detection. In the small intestine, enterocytes, the most common cell type, exhibit zonated expression programs along the crypt-villus axis, but zonation patterns of rare cell types such as goblet and tuft cells remain uncharacterized. Here, we present ClumpSeq, an approach for sequencing small clumps of attached cells. By inferring the crypt-villus location of each clump from enterocyte landmark genes, we establish spatial atlases for all epithelial cell types in the small intestine. We uncover immune-modulatory programs in villus tip goblet and tuft cells and heterogeneous migration patterns of enteroendocrine cells. ClumpSeq can be applied for reconstructing spatial atlases of rare cell types in other tissues and tumors.
Introduction
Many tissues, such as the liver, the intestine and the kidney, are composed of structured anatomical units1. Spatially varying concentrations of oxygen, nutrients and morphogens along these units dictate distinct gene expression signatures for cells residing at different spatial coordinates, a phenomenon termed ‘zonation’1. Approaches to spatially reconstruct zonation patterns combine scRNAseq with spatial expression profiles of landmark genes characterized by RNA in situ hybridization2–6. An alternative approach, when no prior knowledge of landmark gene candidates exists, entails spatial measurement of the complete transcriptome of small tissue regions, isolated using laser capture microdissection (LCMseq)7. While these approaches successfully reconstruct the zonation patterns of the major cell types in a tissue, rare cell types are more challenging, since their transcript contents are diluted in the spatial measurement techniques.
A recent approach for sequencing pairs of attached cells enabled reconstructing the zonation patterns of liver endothelial cells, by utilizing the landmark genes of their attached hepatocytes8. This approach relies on the prospective isolation of mixed pairs, requiring unique surface markers for the cell types of interest, which do not generally exist. Reconstructing zonation patterns of rare tissue cells therefore remains an open challenge.
In the small intestine, epithelial cells operate in repeating crypt-villus units (Fig. 1a). Crypt-harboring stem cells and progenitors, supported by a Paneth cell niche, continuously divide to generate differentiated cells9. Around 90% of the differentiated epithelial cells are absorptive enterocytes. LCMseq-guided single-cell reconstruction revealed profound zonation of enterocyte gene expression along the villus7. Additional secretory epithelial cell types include mucus-producing goblet cells10,11 (∼8%), hormone-producing enteroendocrine cells12 (∼1%) and chemosensory tuft cells13,14 (∼1%). These rare cell populations are important for the protection of the tissue and for communication with other stromal cell types and with other organs15. Using a reporter mouse model, Clevers and colleagues reconstructed the temporal expression programs of enteroendocrine cells along the crypt-villus axis12; however, zonated expression patterns of goblet and tuft cells are unknown.
(a) The intestinal tissue is suboptimally dissociated to generate clumps. (b) Clumps are enriched with FACS, based on Hoechst DNA staining; the histogram shows ImageStream quantification of the clumps' nuclear DNA content (n=3); the bottom shows an example of a pair (left) and a 4-cell clump (right). (c) The position of clumps is computationally inferred from the enterocyte transcriptome, and spatial landmark genes for specific secretory cells are retrieved. (d) These are used to infer the location of single sequenced secretory cells, enabling zonation reconstruction.
To overcome the limitations in reconstructing spatial expression profiles of rare cells, we present ClumpSeq, an approach for sequencing small clumps of attached tissue cells. Sequencing clumps increases the capture rate of rare cell types without the need for antibody enrichment, and utilizes the spatial information of the major tissue cell type. We use this approach to reconstruct spatial maps of all intestinal secretory epithelial cell types along the crypt-villus axis, revealing zonated immune-modulatory programs and heterogeneous migration patterns.
Results
ClumpSeq enables reconstructing spatial expression patterns of rare cell types
ClumpSeq entails the sub-optimal dissociation of the epithelium into small clumps of 2-10 cells (Fig. 1a-b, Supplementary Fig. 1a-b) and sequencing of these clumps using scRNAseq protocols. Analysis of clumps increases the capture rate of rare epithelial cells, while avoiding the need for dedicated surface markers and for massive numbers of sequenced cells. The enterocyte transcriptome in each clump enables inferring the clump location along the crypt-villus axis (Fig. 1c). Such information facilitates extracting large panels of spatially-varying landmark genes that are specific to the rare secretory cell type of interest, enabling spatial reconstruction of their entire transcriptome by integrating single-cell data (Fig. 1d).
(a) Quantification of the number of cells per clump, based on ImageStream images using Hoechst DNA nuclear staining. n = 100 clumps per mouse (n = 3 mice). (b) Representative ImageStream images of clumps. (c) Clumps with more than two cells have higher UMI counts compared to clumps with 2 cells (pairs). Violin plot of total sum of UMI counts for pairs and larger clumps.
We applied sub-optimal tissue dissociation using EDTA without the addition of commonly used dissociation enzymes (Methods) and stained for DNA content using Hoechst dye. We used fluorescence-activated cell sorting (FACS) to select clumps based on the DNA content (Fig. 1b, Supplementary Fig. 1b). We sorted the clumps into 384-well plates and applied the MARS-seq protocol16 for sequencing their transcriptomes. The resulting clumps exhibited zonation patterns, as evidenced by the distinct expression in different clumps of the crypt (Fig. 2a), bottom villus (Fig. 2b), mid-villus (Fig. 2c) and villus tip enterocyte genes (Fig. 2d). In addition to these zonated enterocyte genes, many clumps exhibited mRNAs of secretory genes, attesting to the successful capture of goblet cells (Fig. 2e), enteroendocrine cells (Fig. 2f), tuft cells (Fig. 2g) and Paneth cells (Fig. 2h). The fraction of cells with secretory transcripts was significantly higher in the larger clumps compared to the 2-cell clumps (Fig. 2i-l, Mann-Whitney p<10−6, Supplementary Fig. 1c). The expression of villus tip markers was strongly anti-correlated with that of crypt markers (Fig. 2m, RSpearman=-0.82, p<10−6), and Paneth cell markers were almost exclusively found in crypt clumps (94.6%, Fig. 2n). Thus, the sequenced clumps consist of cells that were attached in the tissue, rather than dissociated cells that have come into contact after tissue dissociation.
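The anti-correlation between crypt and villus-tip marker sums can be checked with a rank correlation. The following is a minimal sketch in plain Python, assuming that the summed marker expression per clump is available as two equally long lists; the function names and the toy values are hypothetical and are not taken from the study's data or code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical summed marker expression in five clumps (illustration only).
crypt_sum = [0.9, 0.7, 0.4, 0.1, 0.0]
tip_sum = [0.0, 0.1, 0.3, 0.6, 0.8]
rho = spearman(crypt_sum, tip_sum)
```

A strongly negative value of rho is what one would expect if crypt and villus-tip enterocytes rarely share a clump.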
(a-d) tSNE plots of sequenced clumps, colored by log10 of summed expression of zonated enterocyte markers: (a) The crypt genes Mki67, Ccnb1, Ccnd1, Mcm2, Pcna and Olfm4; (b) The bottom villus genes Nlrp6, Lypd8, Il18, Reg1 and Reg3a; (c) The mid-villus genes Slc5a1, Slc2a5, Slc2a2, Slc7a7, Slc7a8 and Slc7a9; (d) The villus tip genes Ada, Nt5e, Slc28a2, Creb3l3, Apoa1 and Apob. (e-h) tSNE plots highlighting clumps containing secretory cells, marked by black dots. Plots colored by log10 of summed cell type marker genes (Supplementary Table 7, Methods) for (e) Goblet, (f) Enteroendocrine, (g) Tuft and (h) Paneth cells. (i-l) Large clumps increase the capture rate of secretory cells. Shown are violin plots of summed secretory-derived transcripts (expressed in over 50% of single secretory cells with a mean over 5-fold higher than in enterocytes) in pairs compared to larger clumps (Supplementary Table 6) for (i) Goblet, (j) Enteroendocrine, (k) Tuft and (l) Paneth specific genes. Only crypt pairs and clumps were used to minimize effects from zonated genes. White circles are medians, vertical gray lines are 25-75 percentiles. Dashed horizontal lines indicate the median value in the respective single secretory cell type (Supplementary Fig. 3). Numbers show the percent of clumps above this threshold, which most probably contain the respective secretory cell type. (m) Crypt and villus-tip enterocyte marker genes are not found in the same clumps, indicating the clumps did not form after tissue dissociation. (n) Violin plot of log10 of 1+summed Paneth markers (Supplementary Table 7) in all large (more than two cells) clumps, stratified by inferred zone. White circles are medians. Dashed horizontal line indicates the median value in Paneth-containing large clumps. (o) Geometric classification of clumps. Representation of clumps in PCA space based on the summed expression of type-specific markers. Enterocyte-only clumps are at the origin, and each ray contains a different secretory cell type (Methods, Supplementary Fig. 2d,e, Supplementary Table 7).
(a) Scatter plot of pairs in 3D PC space of type markers sum (Methods, Supplementary Table 7), colored by classification markers sum. The pairs form four rays emanating from the origin of the axes. Along each of the rays, the marker sum of a single secretory cell type is highest and reaches a peak at the tip of the ray furthest from the origin. (b) tSNE plot of all clumps colored by their assigned zone. (c) Validation of clump zone assignment, by comparing the center of mass of enterocyte-specific genes in the clump-based spatial reconstruction with the single-cell spatial reconstruction performed in Moor et al.7 (Methods). (d) Summed expression of enterocyte type markers (Supplementary Table 7) as a function of R, the distance from the origin in PC space, for clumps and pairs (Methods). (e) Violin plots of total sum of UMIs per clump, for clumps within and outside the lower 25th percentile of R values.
We developed a geometric algorithm for classifying the clumps into their cell-type constituents (Fig. 2o, Supplementary Fig. 2a, Methods). We reconstructed the location of each clump using enterocyte landmark genes (Supplementary Fig. 2b-c, Supplementary Table 1, Supplementary Table 2)7. We next used scRNAseq datasets12,17,18 to identify genes that are expressed at high levels in the secretory cells of interest and at low levels in enterocytes. The transcripts of these genes in clumps likely originate from secretory cells rather than from enterocytes. Among these secretory cell-specific genes, we identified sets of landmark genes that decreased or increased in expression from the crypt bottom clumps towards the villus tip clumps (Methods, Supplementary Table 1). These constitute secretory landmark genes that can be used for single-cell reconstruction.
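The idea of placing a clump along the crypt-villus axis from its enterocyte landmark genes can be illustrated with a short sketch. It assumes a simple ratio of summed tip-landmark to total landmark expression and equal-width zone bins; both choices, the function names and the expression values are assumptions made for illustration, and the actual procedure is the one described in Methods. The gene names are borrowed from the crypt and villus-tip marker lists of Fig. 2a,d.

```python
def zonation_coordinate(expr, bottom_genes, tip_genes):
    """Return a coordinate in [0, 1]: 0 is crypt-like, 1 is villus-tip-like.

    expr maps gene name -> normalized expression in one clump.
    ASSUMPTION: the tip / (tip + bottom) ratio is an illustrative choice.
    """
    bottom = sum(expr.get(g, 0.0) for g in bottom_genes)
    tip = sum(expr.get(g, 0.0) for g in tip_genes)
    total = bottom + tip
    if total == 0:
        return None  # no landmark signal, so the position cannot be inferred
    return tip / total


def assign_zone(coordinate, n_zones=5):
    """Map a coordinate in [0, 1] to a zone index 0..n_zones-1.

    ASSUMPTION: equal-width bins and n_zones=5 are illustrative defaults.
    """
    return min(int(coordinate * n_zones), n_zones - 1)


# Marker names from Fig. 2a,d; expression values are hypothetical.
bottom_genes = ["Mki67", "Olfm4", "Pcna"]
tip_genes = ["Ada", "Nt5e", "Apoa1"]
clump = {"Mki67": 0.001, "Olfm4": 0.002, "Ada": 0.006, "Apoa1": 0.003, "Muc2": 0.01}

x = zonation_coordinate(clump, bottom_genes, tip_genes)
zone = assign_zone(x)
```

Genes outside the two landmark lists, such as Muc2 in the toy clump, do not influence the coordinate, which is why secretory transcripts in a clump do not interfere with its enterocyte-based placement in this sketch.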
Tip goblet cells exhibit enriched immune modulatory programs
We first applied our approach to goblet cells, the most abundant secretory cell type in the intestine10. Our ClumpSeq data included 1,140 high-confidence goblet cell-containing clumps. We extracted 371 landmark genes (Supplementary Table 1) and used them to infer the position of single goblet cells, sequenced with the MARS-seq protocol (Supplementary Fig. 3). We grouped these goblet cells into 5 zones, ranging from crypt goblet cells to villus-tip goblet cells, and computed a zonation table by averaging the cells within each zone (Supplementary Table 3, Methods). We found that around 30% of the highly expressed genes in goblet cells were significantly zonated (1,187 out of 3,967 genes expressed at levels above 5*10−5 had zonation q-value<0.25, Fig. 3a). We used single molecule fluorescence in-situ hybridization (smFISH, Fig. 3b,e) and the ClumpSeq data (Supplementary Fig. 4a) to validate the accuracy of our goblet zonation reconstruction (Spearman correlation between the ClumpSeq and single-cell reconstructed zonation profiles of a validation set, R=0.6, p=8*10−4, Methods).
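Computing a zonation table by averaging the cells within each zone can be sketched as follows. This is a minimal illustration, assuming each cell has already been assigned a zone index and carries normalized expression values; the function name, the data layout and the toy values are hypothetical.

```python
from collections import defaultdict


def zonation_table(cells, n_zones):
    """Average the expression of each gene over the cells in each zone.

    cells: list of (zone_index, {gene: normalized expression}) tuples.
    Returns {gene: [mean in zone 0, ..., mean in zone n_zones-1]}.
    A gene absent from a cell's dictionary counts as zero in that cell.
    """
    counts = [0] * n_zones
    sums = defaultdict(lambda: [0.0] * n_zones)
    for zone, expr in cells:
        counts[zone] += 1
        for gene, value in expr.items():
            sums[gene][zone] += value
    return {
        gene: [s / c if c else float("nan") for s, c in zip(per_zone, counts)]
        for gene, per_zone in sums.items()
    }


# Hypothetical cells: two in zone 0 (crypt) and one in zone 1.
cells = [
    (0, {"Agr2": 4.0}),
    (0, {"Agr2": 2.0, "Ido1": 0.0}),
    (1, {"Ido1": 3.0}),
]
table = zonation_table(cells, n_zones=2)
```

Zones without any assigned cell yield NaN rather than zero, so that an empty zone is not mistaken for absent expression.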
tSNE plots showing the identified clusters from the single cell sequencing: (a) Enterocytes, colored by Apob, (b) Goblet cells, Muc2+, (c) Tuft cells, Dclk1+ and (d) Enteroendocrine cells, Chga+.
(a) Validation of the reconstructed zonation profiles of goblet cells using clumps data. Blue line represents the reconstruction based on clumps mean expression level, black line represents the reconstructed profile based on the single cell analysis. Light patches are the SEM. Profiles were smoothed with a moving average window of 3. (b-d) Zonated expression of (b) ligands, (c) receptors and (d) transcription factors in goblet cells. Only genes with expression above 5*10−5 and q-value<0.25 are shown.
(a) Reconstructed zonation profiles based on single goblet cells. Plots on the left show zonation profiles of representative crypt (Sox9, Rpl4, Agr2, Spink4), mid-villus (Slc1a5, Slc38a2) and villus tip (Muc2, Cxcl16, Egfr, Ido1) genes. (b) Validation of the reconstructed zonation profiles using smFISH. Blue line represents smFISH mean expression level, red line the reconstructed profile based on the single cell analysis. Light areas denote the SEM. (c) Heatmap of zonation profiles of genes related to mucus composition. (d) Gene set enrichment analysis24 for hallmark (H) and KEGG (K) pathways (FDR <0.05), tip enriched sets are in red, bottom enriched sets are in blue. (e) smFISH image of zonated genes Clca1 (green) and Ido1. Ada (red) marks the villus tip, blue the DAPI-stained nuclei. Bottom insets show Ido1 mRNAs (gray dots) increasing from the bottom (1) to middle (2) and tip (3) villus zones. Scale bar - 50μm in the stitched image and 15μm in the insets.
Most genes encoding mucus components exhibited increased expression towards the villus tip (Fig. 3c). An exception was Agr2, a gene hypothesized to be secreted in molar quantities with Muc210, which we found to be inversely zonated towards the crypt. This finding of an anticorrelated zonation profile supports the hypothesis that Agr2 may play additional roles in goblet cells, for example in goblet cell maturation at the crypt19. Our analysis revealed zonation of ligands, receptors20 and transcription factors21 in goblet cells (Supplementary Fig. 4b-d), including tip-enriched expression of the immediate early genes Jun and Atf3. Gene sets related to RNA polymerase, splicing and ribosome were zonated towards the crypt and villus bottom (Fig. 3d, Supplementary Table 4) largely overlapping the functional zonation previously measured for crypt and bottom villus enterocytes7. Goblet genes at the villus tip were enriched in cytoskeleton and tight junction genes, resembling the structural changes previously observed for tip enterocytes7 (Supplementary Table 4). Notably, villus-tip goblet cells exhibited enriched immune modulatory programs, including interferon-alpha and interferon-gamma responses (Fig. 3d, Supplementary Table 4). Among these genes was the immune checkpoint target gene Ido1 (Fig. 3a,e), an interferon-inducible gene that has been shown to have immunosuppressive effects22. Consistent with these tip programs, the most tip-zonated mucus genes were Muc3 and Muc4 (Fig. 3c), transmembrane mucins that act as bacterial receptors23. Our zonation reconstruction thus points to immune-specialization of goblet cells at the tips of the villi.
ClumpSeq reveals zonation of Tuft cell programs
Tuft cells are rare chemosensory epithelial cells with important functions in mediating type-2 immunity, most notably against intestinal worm infections13. Recent work demonstrated diversity of individual tuft cells17, but spatial heterogeneity of tuft cells along the crypt-villus axis has not been explored. Our ClumpSeq data included 146 tuft cell-containing clumps, from which we extracted 352 landmark genes (Supplementary Table 1). We used these landmark genes to reconstruct a dataset of 144 single sequenced tuft cells assembled from previous work17 and from 159 tuft cells sequenced with MARS-seq (Supplementary Table 3, Methods). We found that around 17% of the highly expressed genes in tuft cells were significantly zonated (1,240 out of 7,360 genes expressed at levels above 5*10−5 had q-value<0.25, Fig. 4a). We used ClumpSeq data to validate the accuracy of our tuft zonation reconstruction (Spearman correlation between the ClumpSeq and single-cell reconstructed zonation profiles of a validation set, R=0.62, p=5*10−4, Methods).
(a) Reconstructed zonation profiles based on single tuft cells. Plots on the left show zonation profiles of representative crypt (Sox4, Ccnd1, Nrep, Stmn1), middle (Ctsa, Cdx1) and tip (Sucnr1, Fabp1, Il17rb, Rab18) genes. (b-c) Representative smFISH images of the tuft zonated gene Fabp1 (gray) in a villus bottom (b) and a villus tip (c) tuft cell. Tuft cells were identified using Dclk1 (red). Scale bar 15μm. (d) Quantification of Fabp1 smFISH experiment. P value was calculated by Mann Whitney U test. n = 20 cells were examined over 2 mice. Red lines are medians, black boxes are 25-75 percentiles. (e) Mean max-normalized zonation profiles for tuft1 and tuft2 genes.
Tuft cells at the crypt expressed the transcription factor Sox4, previously shown to be important for tuft cell specification25. Tuft cells at the villus tip expressed the fatty acid binding protein 1 (Fabp1) and the succinate receptor 1 (Sucnr1), suggested to act as a sensor for infectious agents26 (Fig. 4a-d). Tip tuft cells also expressed Il17rb, the receptor for Il25, a tuft-specific cytokine that activates type-2 innate lymphoid cells and Th2 immunity27–29 (Fig. 4a), indicating an autocrine signalling loop.
A recent study identified two subsets of tuft cells termed ‘tuft1’ and ‘tuft2’ with distinct functions17. Tuft1 cells express neuronal-like genes, whereas tuft2 cells elevate immune programs, including the expression of Ptgs1, encoding the prostaglandin H2 synthase 117. We found that tuft1-specific transcripts were zonated towards the bottom of the villus, whereas tuft2-specific transcripts were zonated towards the villus tip (Fig. 4e, Supplementary Fig. 5a-d, Mann-Whitney p<10−6).
(a) Heatmap of zonation profiles of genes specific for Tuft1 and Tuft2 classes. The list of Tuft1 and Tuft2 genes was taken from the “consensus gene list” which appears in Supplementary Table 7 of Haber et al.20. Profiles are normalized to their maximal value across the crypt-villus zones. (b-c) Representative smFISH images of the Tuft2 zonated gene Ptgs1 (white dots) in a villus bottom (left) and a villus tip (right) tuft cell. Tuft cells were identified by Dclk1 expression (red). Scale bar 15 μm. (d) Quantification of Ptgs1 smFISH experiment. P value was calculated by Mann Whitney U test. n = 20 cells, 2 mice. Red lines are medians, black boxes are 25-75 percentiles.
Enteroendocrine lineages have heterogeneous migration patterns
Enteroendocrine cells are rare intestinal epithelial cells (∼1%) that consist of diverse subtypes of hormone-secreting cells that are essential for physiological homeostasis. We next applied ClumpSeq to extract landmark genes for these cells. We extracted 656 enteroendocrine landmark genes from our 181 enteroendocrine-containing clumps (Supplementary Table 1) and used them to infer the crypt-villus coordinates of single-sequenced enteroendocrine cells12 (Supplementary Table 3, Methods). We found that around 35% of the highly expressed genes in enteroendocrine cells were significantly zonated (1,838 out of 5,243 genes expressed to levels above 5*10−5 had q-value<0.25). The zonated expression patterns conformed with previous observations of crypt-biased expression of Gcg, Tac130–32 and Reg433, and villus-biased expression of Sct and Nts30 (Supplementary Fig. 6).
Heatmap of zonation profiles of genes encoding key enteroendocrine genes. Profiles are normalized to their maximal value across the crypt-villus zones.
A recent study analyzed the temporal expression programs of single enteroendocrine cells12. The study used a slowly-decaying fluorescent reporter, driven by Neurog3, a gene that is expressed in a pulse-like manner in the earliest crypt enteroendocrine progenitor, providing a time stamp for each enteroendocrine cell that enables grouping cells according to the time since their ‘birth’. We reasoned that combining these temporal profiles with our spatial measurements could reveal the patterns of cell migration of different enteroendocrine cell types, identified by the expression of their characteristic hormones (Fig. 5a). More specifically, genes that are expressed in crypt cells that have an ‘early’ time stamp, such as Neurog3 and Sox4, are most probably transient genes expressed in cells that are migrating out of the crypt. Genes that are expressed in crypt cells with late time stamps would indicate that the cells expressing them are stalling in the crypt. In contrast, genes that are expressed in villus cells with similarly late time stamps indicate faster migration of the expressing cells.
(a) Spatio-temporal analysis of enteroendocrine cells. Central plot: Scatter plot of the center of mass (COM) in our spatial zonation reconstruction vs. temporal COM based on single cell time stamps from Gehart et al.12 (Methods). Dots colored by -log10(spatial zonation q-value). Peripheral plots show reconstructed temporal (red) and spatial (blue) profiles for early crypt-confined genes (green fonts, bottom row), late crypt-confined genes, expressed in cells that might be migrating more slowly (cyan fonts, top row) and late villus-localized genes, expressed in cells that might be migrating more rapidly (black fonts, right column). (b) Representative smFISH images of the enteroendocrine crypt-zonated gene Sst (red). Insets are blow-ups of the Sst+ cells at the crypt and bottom villus. Ada in gray marks the villi tips. Scale bars: 50 μm in the large image and 10 μm in the insets. (c) Quantification of Sst+ enteroendocrine D-cells in crypt and villus bottom, middle and tip over 3 mice. Fisher exact test for the frequencies of D-cells between the two lower zones and two upper zones p = 3.8*10−5.
While the correlation between our inferred crypt-villus zone and the average time stamp for different genes was significant (R=0.4, p<10−10), different enteroendocrine genes exhibited distinct behaviour in space-time (Fig. 5a). Cells expressing Nts, Sct, Cck and Gip showed late time stamps and peaked in expression at the villus tip (Fig. 5a, black-font genes). In contrast, the L-cell markers Pyy and Gcg, the X-cell marker Ghrl, the EC-cell marker Tac1, Reg433 and the D-cell marker Sst, encoding the hormone somatostatin, showed similar late time stamps, yet were spatially confined to the crypt (Fig. 5a, cyan-font genes, Supplementary Fig. 6). We used smFISH to demonstrate that D cells are indeed enriched in the crypt and lower villus zones (Fig. 5b-c, Fisher exact test p = 3.8*10−5). The discordant space-time profiles, with crypt-retained expression of relatively late-appearing genes, suggest that enteroendocrine cell types such as L-cells and D-cells have slower crypt-villus migration rates compared to the Cck+ I-cells and the Gip+ K-cells. Future studies will explore whether D cells transdifferentiate into other enteroendocrine lineages at the villus tip, as has been shown for the conversion of Tac1+ cells to Sct+ cells30.
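The comparison of spatial and temporal behaviour rests on summarizing each gene's profile by a center of mass (COM). The sketch below assumes the common definition of COM as the expression-weighted mean bin index; the exact definition used for Fig. 5a is given in Methods, and the function name and toy profiles here are hypothetical.

```python
def center_of_mass(profile):
    """Expression-weighted mean position over ordered bins, numbered from 1.

    ASSUMPTION: COM = sum(i * p_i) / sum(p_i), a standard weighted mean.
    """
    total = sum(profile)
    if total == 0:
        return None  # gene not expressed in any bin
    return sum(i * v for i, v in enumerate(profile, start=1)) / total


# Hypothetical profiles of one gene over 5 spatial zones and 5 time bins.
spatial_profile = [5.0, 3.0, 1.0, 0.5, 0.5]   # highest in the crypt
temporal_profile = [0.0, 0.5, 1.0, 3.0, 5.5]  # highest at late time stamps

spatial_com = center_of_mass(spatial_profile)
temporal_com = center_of_mass(temporal_profile)
```

In this toy case the gene has a low spatial COM and a high temporal COM, the combination that the text interprets as expression in cells that remain in the crypt for a long time.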
Discussion
ClumpSeq leverages the spatial information of the abundant cell types in a tissue to extract large sets of landmark genes for rare cell types. The ability to sort large clumps of up to 10 cells increases the capture rate of the rare cells, thus making it unnecessary to use specialized cell-type specific surface markers for enrichment, markers that often do not exist. With larger clumps, the probability of containing more than two cell types increases, e.g. goblet cells and tuft cells in the same clump. While this was negligible in our study, the reconstruction algorithms can be readily adapted to take this into account by extracting landmark genes from a pool of secretory cell-type specific genes that are expressed neither by enterocytes nor by other secretory cells.
While ClumpSeq directly provides the zonation profiles of cell-type specific genes, its real power of discovery emerges when using the two-step approach of first identifying a set of cell-type specific landmark genes to be used for reconstructing a genome-wide spatial single-cell atlas of the rare cell type (Fig. 1). This enables identifying zonated patterns of genes that are expressed in more than one cell type. An example is Fabp1, a gene that is highly abundant in enterocytes and peaks at the mid-villus zone, but is also zonated in tuft cells towards the villus tip. The expression of Fabp1 in clumps is dominated by the zonation of enterocytes, yet the tuft cell zonation emerges when reconstructing single sequenced tuft cells with the ClumpSeq-identified tuft cell landmark genes (Fig. 4b-d). ClumpSeq can be applied to diverse tissues and cell types, for example the analysis of lung goblet and tuft cell diversity in spatially-distinct airways34,35 and the zonation patterns of pancreatic endocrine cells along the radial islet axis36. ClumpSeq could also be adapted to assess the range-dependent effects of developmental organizers37 and tumor signalling centers31,38, thus expanding the toolbox of single cell biology beyond single cells and pairs.
Methods
Mice and tissue
All mouse experiments were approved by the Institutional Animal Care and Use Committee of the Weizmann Institute of Science and performed in accordance with institutional guidelines. Experiments were conducted on 8-12 weeks old C57BL/6 mice, obtained from Envigo. Mice were housed at 4–5 per cage, exposed at all times to a 12 h light/12 h dark cycle and had access to food and water ad libitum. All experiments were performed on the same region of the Jejunum. Mice were sacrificed by cervical dislocation.
Clump dissociation
The Jejunum was harvested, flushed with cold 1X DPBS, laterally cut and incubated for 20 min on ice in a 10 mM EDTA solution. Afterwards, the tissue was cut into 1 cm pieces, moved to a pre-warmed 10 mM EDTA solution for 5 min and shaken vigorously at the end of the incubation time. Dissociated cells were collected and filtered through a 100 μm cell strainer. Cells were spun down at 300g for 5 min at 4 °C. The pellet was resuspended and incubated for nuclear staining for 5 min at RT in a solution of DMEM + 10% FBS + 10 mM HEPES + Hoechst 33342 (15 μg/ml). To prevent the cells from pumping out the Hoechst dye, Reserpine (5 μM) was also added. Cells were resuspended in PBS and Alexa Fluor 488 Zombie Green (BioLegend) was added at a dilution of 1:500, to later enable the detection of viable cells by FACS. Cells were kept in a rotator in the dark at room temperature for 15 min. After spinning down (500g for 5 min at 4 °C), cells were resuspended in FACS buffer (2 mM EDTA, pH 8, and 0.5% BSA in 1× PBS) at a concentration of 10^6 cells in 100 μl.
Single cell isolation
To obtain a single cell suspension, rather than clumps, the tissue was incubated for 10 min on ice in a 10 mM EDTA solution, before being cut into small pieces and moved for another 10 min to a pre-warmed 10 mM EDTA solution. The tissue was shaken vigorously every 2 min. Cells were filtered through a 70 μm cell strainer and spun down at 300g for 5 min at 4 °C. Cells were resuspended in FACS buffer and stained with a combination of APC-anti-Epcam (BioLegend, 118214) and PE/Cy7-anti-CD45 (BioLegend, 103114) or APC-anti-Epcam and PE/Cy7-anti-CD24 (BioLegend, 101821). FcX blocking solution (BioLegend) was added at a dilution of 1:50.
Clumps and Single-cell sorting
Single cells and clumps were sorted with a SORP-FACSAriaII machine (BD Biosciences) using a 100 μm nozzle. For clump sorting, dead cells were excluded using the Zombie Green staining and clumps were sorted based on the Hoechst histogram (Fig. 1b). For single cell sorting, dead cells were excluded on the basis of 500 ng/ml DAPI incorporation. Sorted cells were negative for CD45 and positive for Epcam. To enrich for enteroendocrine cells, cells were gated on CD45- Epcam+ CD24+, since tuft cells express CD4517.
Cells and clumps were sorted into 384-well MARS-seq cell capture plates containing 2 μl of lysis solution and barcoded poly(T) reverse-transcription (RT) primers for single-cell RNA-seq. Barcoded single cell capture plates were prepared with a Bravo automated liquid handling platform (Agilent) as described previously16. Four empty wells were kept in each 384-well plate as a no-cell control for data analysis. Immediately after sorting, each plate was spun down to ensure cell immersion into the lysis solution, snap frozen on dry ice and stored at −80 °C until processed.
MARS-Seq library preparation
Libraries for both single cells and clumps were prepared as described in Keren-Shaul et al16. Briefly, mRNA from cells sorted into MARS-Seq capture plates was barcoded and converted into cDNA by a reverse transcription reaction and pooled using an automated pipeline. The pooled sample was cleaned using 0.9X SPRI beads and then linearly amplified by T7 in vitro transcription. The resulting RNA was fragmented and converted into a sequencing-ready library by tagging the samples with pool barcodes and Illumina i7 barcode sequences during ligation, reverse transcription and PCR. Each pool of cells was tested for library quality, and concentration was assessed as described in Keren-Shaul et al16. Machine raw files were converted to fastq files using the bcl2fastq package. To obtain the UMI counts, reads were aligned to the mouse reference genome (GRCm38.84) using the zUMIs package4539 with the following flags that fit the barcode length and the library strandedness: -c 1-7, -m 8-15, -l 66, -B 1, -s 1, -p 16.
scRNAseq data processing
For each single cell or clump and for each gene we performed background subtraction. The background was calculated for each 384-well plate separately, as the mean gene expression in the four empty wells. After subtraction, negative values were set to zero. We used Seurat v3.2 package in R v3.6.1 to cluster the clumps and single cell RNAseq datasets, retaining only clumps or cells containing at least 200 genes. We used Seurat to regress out cell-cell variation driven by the fraction of mitochondrial genes. For clumps, we excluded clumps with over 30% mitochondrial genes. Clustering was based on PCA dimensionality reduction using the first 18 PCs, and a resolution value of 1.
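The per-plate background subtraction described above can be sketched as follows. This is an illustration under stated assumptions: the dictionary layout of a plate, the well identifiers and the function name are hypothetical, and the counts are toy values.

```python
def subtract_background(plate, empty_wells):
    """Subtract per-gene background, estimated from the empty wells of one plate.

    plate: {well_id: {gene: UMI count}} for a single 384-well plate.
    empty_wells: ids of the no-cell control wells on that plate.
    Returns corrected counts for the non-empty wells; negatives are set to zero.
    """
    genes = {g for counts in plate.values() for g in counts}
    # Background = mean expression of each gene across the empty wells.
    background = {
        g: sum(plate[w].get(g, 0) for w in empty_wells) / len(empty_wells)
        for g in genes
    }
    corrected = {}
    for well, counts in plate.items():
        if well in empty_wells:
            continue
        corrected[well] = {
            g: max(counts.get(g, 0) - background[g], 0.0) for g in genes
        }
    return corrected


# Toy plate with two sorted wells and four empty control wells.
plate = {
    "A1": {"Apob": 10, "Muc2": 1},
    "A2": {"Apob": 0, "Muc2": 7},
    "E1": {"Apob": 2},
    "E2": {"Apob": 0, "Muc2": 2},
    "E3": {},
    "E4": {"Apob": 2},
}
corrected = subtract_background(plate, ["E1", "E2", "E3", "E4"])
```

Because the background is computed separately for each plate, the function is meant to be called once per plate rather than on a pooled dataset.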
For single cells, cells with either total UMI counts lower than 200 or higher than 7,000 or total gene counts lower than 150 or higher than 1500 or mitochondrial content of over 40% were removed. Cell clustering was based on PCA dimensionality reduction using the first 25 PCs and a resolution value of 0.1. We used cell type-specific markers to interpret the single cell clusters: Epcam in the epithelial cells clusters, Ptprc in immune clusters, Muc2 in the goblet cluster, Dclk1 in the tuft cluster, Chga in the enteroendocrine cluster (Supplementary Fig. 3).
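The single-cell quality filter stated above amounts to three range checks. A minimal sketch follows, with a hypothetical function name and with the mitochondrial content expressed as a fraction between 0 and 1 (an assumption about the input format); values exactly at a threshold are kept, since the text removes only cells below or above the bounds.

```python
def passes_single_cell_qc(total_umis, n_genes, mito_fraction):
    """Return True if a single cell is retained by the thresholds in the text.

    Removed: total UMIs < 200 or > 7,000; detected genes < 150 or > 1,500;
    mitochondrial content above 40%.
    """
    return (
        200 <= total_umis <= 7000
        and 150 <= n_genes <= 1500
        and mito_fraction <= 0.40
    )


# Hypothetical cells: (total UMIs, detected genes, mitochondrial fraction).
candidates = [(3500, 900, 0.12), (150, 160, 0.05), (5000, 1600, 0.10), (4000, 800, 0.55)]
kept = [cell for cell in candidates if passes_single_cell_qc(*cell)]
```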
ImageStream analysis
Cells were imaged by an Imaging Flow Cytometer (ImageStreamX Mark II, AMNIS corp. - part of Luminex, TX, USA). Data were acquired using a 40X lens, and lasers used were 405nm (10mW), 488nm (100mW), 642nm (100mW), and 785nm (5mW). Data were analyzed using the manufacturer’s image analysis software IDEAS 6.2 (AMNIS corp.). Images were compensated for spectral overlap using single stained controls. Viable cells were first gated as negative for the dead cell marker Zombie-Green. To eliminate out-of-focus cells, cells were further gated using the Gradient RMS and contrast features (which measure the sharpness of an image by detecting large changes of pixel values in the image). Then, cells were gated for single cells and cell clumps according to their area (in μm2) and aspect ratio (the Minor Axis divided by the Major Axis of the best-fit ellipse). To distinguish between large cells and small clumps with similar size, the circularity feature was used (the degree of the mask’s deviation from a circle, calculated as the average distance of the object boundary from its center divided by the variation of this distance); high circularity was correlated with large cells rather than cell clumps. This was calculated using the Object cell mask (which segments images to closely identify the area corresponding to the cell, by distinguishing it from the background), to better delineate cell morphology. To distinguish between pairs and larger clumps, objects were gated according to the area and aspect ratio (normalized for intensity) of the Hoechst staining. To validate that cell clumps contain more than one EpCAM positive cell, two features were calculated: the area of the EpCAM staining divided by the bright-field area, and the distance between the geometrical centers of the EpCAM staining and the bright-field image, using the Delta Centroid XY feature. Clumps with higher area ratio and lower distance were eventually chosen.
Single molecule FISH and quantification
Jejunum was harvested, flushed with cold 1X DPBS, laterally cut and then fixed in 4% formaldehyde for 3 h, incubated overnight with 30% sucrose in 4% formaldehyde and finally embedded in OCT in the form of Swiss rolls. 7 µm thick sections of fixed jejunum were sectioned onto poly L-lysine coated coverslips and used for smFISH staining. Probe libraries were designed using the Stellaris FISH Probe Designer Software (Biosearch Technologies, Inc., Petaluma, CA). The intestinal sections were hybridized with smFISH probe sets according to a previously published protocol31. Briefly, tissues were treated for 10 min with proteinase K (10 µg/ml Ambion AM2546) and washed twice with 2× SSC (Ambion AM9765). Tissues were incubated in wash buffer (20% Formamide Ambion AM9342, 2× SSC) for 5 min and mounted with the hybridization buffer (10% Dextran sulfate Sigma D8906, 20% Formamide, 1 mg/ml E. coli tRNA Sigma R1753, 2× SSC, 0.02% BSA Ambion AM2616, 2 mM Vanadyl-ribonucleoside complex NEB S1402S) mixed with a 1:3000 dilution of probes. The hybridization mix was incubated with tissues overnight in a 30 °C incubator. SmFISH probe libraries (Supplementary Table 5) were coupled to Cy5, TMR or Alexa594. After the hybridization, tissues were washed with wash buffer containing 50 ng/ml DAPI for 30 min at 30 °C. DAPI (Sigma-Aldrich, D9542) was used as nuclear staining. All images were taken as scans extending from villus tip to crypt bottom using ×100 magnification, hence several fields of view were stitched together to cover the whole crypt-villus unit. Stitching was performed with the fusion mode linear blending and default settings of the pairwise stitching plugin in Fiji40.
Quantification of smFISH was done using ImageM31. Goblet cells were manually segmented based on Muc2 or Clca1 expression. Each transcript quantification was based on at least 5 entire villi from at least 2 mice. Tuft cells were manually segmented using Dclk1 mRNA expression. Results were based on at least 20 cells from the bottoms and tips of villi and from at least 2 mice. Delta cells were identified based on Sst expression. Results were based on at least 20 cells from the bottoms and tips of villi and from at least 2 mice. Fisher's exact test was performed on the number of crypt-villus units with and without Sst+ cells. mRNA concentration (number of mRNA molecules per unit volume, for low abundance genes) or mRNA signal intensity (mean background-subtracted intensity in the segmented unit, for high abundance genes) was computed per cell.
Gene specificity analysis
To find genes specific to the intestinal epithelial cell types, we compiled a table of the mean expression of genes across cell types and the percentage of single cells of each cell type expressing each gene (Supplementary Table 6). To this end, we analyzed published scRNA-seq data sets7,12,17,18, using the cell type annotations provided by the papers’ authors. Gene expression measurements (UMIs per gene) were normalized for each cell by the sum of its UMIs and then averaged across single cells by type. For enterocytes, we averaged cells from each villus zone using the annotation of Moor et al.7. The same single cell data source was also used for generating the crypt stem cell columns of Supplementary Table 6 (ref. 18). For the secretory cell types (goblet, enteroendocrine, tuft and Paneth cells), we used the data from Figure 1 of Haber et al.17.
Zonation reconstruction of clumps
The UMI count table for all 5,297 clumps was exported from Seurat41, and further clump analysis was performed using MATLAB (version 2019a). The 4,788 clumps with over 500 UMIs were retained, and expression values per gene were calculated as UMIs per gene normalized for each clump by the total sum of its UMIs.
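The filtering and normalization described above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code; the function name, the dictionary layout and the toy counts are assumptions made for the example.

```python
def normalize_clumps(umi_counts, min_umis=500):
    """Keep clumps with more than `min_umis` total UMIs and convert
    each retained clump's counts to fractions of its total UMIs.

    umi_counts: dict mapping clump id -> dict mapping gene -> UMI count
    (a hypothetical layout chosen for this sketch).
    """
    normalized = {}
    for clump_id, counts in umi_counts.items():
        total = sum(counts.values())
        if total <= min_umis:
            continue  # clump has too few UMIs and is discarded
        normalized[clump_id] = {gene: n / total for gene, n in counts.items()}
    return normalized


# Toy input with made-up counts, for illustration only.
example = {
    "clump_a": {"Muc2": 300, "Dclk1": 300},  # 600 UMIs: retained
    "clump_b": {"Muc2": 100, "Dclk1": 50},   # 150 UMIs: discarded
}
fractions = normalize_clumps(example)
```

After this step every retained clump sums to 1, so expression values are comparable between clumps of different sequencing depth.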
Enterocyte Landmark gene selection
Enterocyte landmark genes for clump zonation reconstruction were based on the enterocyte zonation table in Moor et al.7. There, gene zonation was reconstructed for the crypt and 6 villus zones (V1-V6) using single enterocytes. Candidate landmark genes were required to satisfy the following requirements: 1) Abundant: having a mean normalized expression across zones of 5×10⁻⁴ or more. 2) Enterocyte specific: having mean expression in any enterocyte/stem-cell population higher than 10-fold the maximal mean expression in all secretory cell types, and expressed in at least 10% of that enterocyte population. 3) Zonated: having at least a 70% difference between maximal and minimal expression along the crypt-tip axis.
In order to select an informative set of landmark genes that includes crypt, mid-villus and villus tip markers, we calculated for all candidate landmark genes the Euclidean distances to “ideal” landmark profiles as follows. Ideal crypt landmark profile (profile 1): expression value 1 in the crypt and 0 for all other zones. Ideal mid-villus landmark profile (profile 2): expression value 1 in the middle of the villus (V3) and 0 for all other zones. Ideal tip landmark profile (profile 3): expression value 1 in the tip and 0 for all other zones. Finally, three lists of enterocyte landmark genes were compiled: the crypt list with the 30 candidate landmarks with the lowest distance from “ideal” profile 1, the mid-villus list with the 30 candidate landmarks with the lowest distance from “ideal” profile 2 and the tip-villus list with the 30 candidate landmarks with the lowest distance from “ideal” profile 3. If genes overlapped between the lists, they were assigned to the list of the “ideal” profile they were closest to. The selected landmark genes are shown in Supplementary Table 1.
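The ranking by distance to the three ideal profiles can be sketched as below. This is an illustrative Python version under stated assumptions: each candidate profile is scaled by its maximum before the distance is computed (the text does not state the scaling), overlaps are resolved among the lists a gene actually appears in, and the gene names and values are invented.

```python
import math

# Zone order: crypt, V1, V2, V3, V4, V5, V6 (7 zones, as in the reference table).
IDEAL = {
    "crypt": [1, 0, 0, 0, 0, 0, 0],
    "mid":   [0, 0, 0, 1, 0, 0, 0],  # V3
    "tip":   [0, 0, 0, 0, 0, 0, 1],  # V6
}


def max_normalize(profile):
    """Scale a profile so that its maximum is 1 (assumed preprocessing)."""
    peak = max(profile)
    return [v / peak for v in profile] if peak > 0 else list(profile)


def distances_to_ideals(profile):
    """Euclidean distance of a scaled profile to each ideal profile."""
    scaled = max_normalize(profile)
    return {name: math.dist(scaled, ideal) for name, ideal in IDEAL.items()}


def select_landmarks(candidates, n_per_list=30):
    """Return one set of genes per ideal profile, with overlaps resolved."""
    dists = {gene: distances_to_ideals(p) for gene, p in candidates.items()}
    lists = {}
    for name in IDEAL:
        ranked = sorted(dists, key=lambda g: dists[g][name])
        lists[name] = set(ranked[:n_per_list])
    # A gene found in several lists stays only in the one it is closest to.
    for gene in dists:
        member_of = [name for name in IDEAL if gene in lists[name]]
        if len(member_of) > 1:
            best = min(member_of, key=lambda name: dists[gene][name])
            for name in member_of:
                if name != best:
                    lists[name].discard(gene)
    return lists


# Hypothetical candidate profiles, for illustration only.
candidates = {
    "g_crypt": [5, 1, 0, 0, 0, 0, 0],
    "g_mid":   [0, 1, 2, 4, 2, 1, 0],
    "g_tip":   [0, 0, 0, 0, 1, 2, 6],
}
landmarks = select_landmarks(candidates, n_per_list=2)
```

With only three toy genes and two genes per list, every list initially overlaps with another, and the overlap rule leaves each gene in the list of its nearest ideal profile.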
Assignment of clumps to zones
Based on Supplementary Table 6, enterocyte specific genes were defined as genes for which the mean expression in any enterocyte/stem population was higher than 3-fold the maximal mean expression in all secretory cell populations, and that were expressed in at least 10% of that enterocyte population. For comparability between clumps containing different numbers of enterocytes, these genes were internally normalized: their expression was divided by the sum total for all enterocyte specific genes in each clump. Note that the selected landmark genes are a subset of this group of enterocyte specific genes. The 2% of clumps with the lowest sums of enterocyte landmark gene expression were discarded, since they could not be reliably assigned to a zone. The remaining 4,690 clumps were assigned a zone using the single cell enterocyte zonation table7 as a spatial reference as follows.
The expression values of the enterocyte landmark genes in the spatial reference were normalized by dividing the expression of each gene by its maximal expression across zones. This resulted in a normalized landmark expression vector for each zone in the spatial reference. The expression of the enterocyte landmark genes in the clumps was likewise normalized by dividing the expression of each gene by its maximal expression across clumps, resulting in a normalized landmark expression vector for each clump. Next, the correlation between the landmark vector of each clump and that of each zone was calculated, and the clump was assigned to the zone with which it correlated most. A clump-based zonation table was computed by averaging the expression values for each gene across all clumps in each zone. P values were calculated with the Kruskal-Wallis test (implemented in the MATLAB function kruskalwallis). q values were calculated using the Benjamini and Hochberg method (implemented in the MATLAB function mafdr), applied to all genes for which maximal expression across zones exceeded 5×10⁻⁶ (Supplementary Table 2).
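A minimal sketch of the correlation-based zone assignment follows. It assumes Pearson correlation (the text says only "correlation"), uses plain nested lists as a hypothetical data layout, and runs on a toy reference with three genes and three zones.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def scale_rows_by_max(matrix):
    """Divide each gene (row) by its maximum across columns."""
    scaled = []
    for row in matrix:
        peak = max(row)
        scaled.append([v / peak if peak > 0 else 0.0 for v in row])
    return scaled


def assign_zones(reference, clumps):
    """reference: genes x zones, clumps: genes x clumps (same gene order).
    Returns, for each clump, the index of the best-correlated zone."""
    ref = scale_rows_by_max(reference)
    clu = scale_rows_by_max(clumps)
    n_zones, n_clumps = len(ref[0]), len(clu[0])
    zone_vectors = [[row[z] for row in ref] for z in range(n_zones)]
    assignments = []
    for c in range(n_clumps):
        clump_vector = [row[c] for row in clu]
        corrs = [pearson(clump_vector, zv) for zv in zone_vectors]
        assignments.append(max(range(n_zones), key=lambda z: corrs[z]))
    return assignments


# Toy data (made up): rows are landmark genes, columns are zones or clumps.
reference = [[10, 2, 0], [1, 8, 1], [0, 3, 9]]
clumps = [[5, 1, 0], [1, 6, 1], [0, 1, 4]]
zones = assign_zones(reference, clumps)
```

Averaging the expression of every gene over the clumps that share a zone index then gives the clump-based zonation table.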
Selection of cell type-specific classification markers
Classification of clumps according to the cell types they contained was performed separately for pairs and larger clumps, due to differences in the relative expression of genes stemming from clump size. For cell type classification, we used Supplementary Table 6 to identify cell type-specific marker genes for secretory cells and enterocytes. For secretory cells, these included genes with mean normalized expression above 10⁻⁴, expressed in over 15% of the single cells and expressed at more than 4-fold higher levels than the maximal mean expression in all other epithelial cell types (we define this fold-change as the specificity ratio). For each secretory type, all genes meeting these criteria were ordered by their specificity ratio in descending order and up to the first 50 genes were selected as type markers. Enterocyte markers (used in Supplementary Fig. 2d) were selected similarly: genes expressed in at least 15% of enterocytes in any zone, with mean expression at least 4-fold greater in enterocytes than in secretory cell types. The 50 genes with the highest fold difference between enterocytes and secretory cells were selected. The list of cell type classification markers appears in Supplementary Table 7.
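The secretory marker criteria can be expressed as a short filter-and-rank routine. The sketch below is illustrative Python; the nested-dictionary layout and the expression values are assumptions, and only the thresholds come from the text.

```python
def select_type_markers(mean_expr, frac_expressing, cell_type,
                        min_mean=1e-4, min_frac=0.15,
                        min_ratio=4.0, max_markers=50):
    """Rank marker genes of `cell_type` by specificity ratio.

    mean_expr, frac_expressing: dict gene -> dict cell type -> value
    (hypothetical layout standing in for Supplementary Table 6).
    """
    scored = []
    for gene, by_type in mean_expr.items():
        own = by_type[cell_type]
        others = max(v for t, v in by_type.items() if t != cell_type)
        if own <= min_mean or frac_expressing[gene][cell_type] <= min_frac:
            continue
        # Specificity ratio: own mean over the highest mean in any other type.
        ratio = own / others if others > 0 else float("inf")
        if ratio > min_ratio:
            scored.append((ratio, gene))
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:max_markers]]


# Made-up values for three genes named in the text.
mean_expr = {
    "Muc2":  {"goblet": 2e-3, "tuft": 1e-5, "enterocyte": 2e-5},
    "Dclk1": {"goblet": 1e-5, "tuft": 8e-4, "enterocyte": 1e-5},
    "Epcam": {"goblet": 1e-3, "tuft": 1e-3, "enterocyte": 1e-3},
}
frac_expressing = {
    "Muc2":  {"goblet": 0.9, "tuft": 0.01, "enterocyte": 0.02},
    "Dclk1": {"goblet": 0.01, "tuft": 0.6, "enterocyte": 0.01},
    "Epcam": {"goblet": 0.9, "tuft": 0.9, "enterocyte": 0.9},
}
goblet_markers = select_type_markers(mean_expr, frac_expressing, "goblet")
tuft_markers = select_type_markers(mean_expr, frac_expressing, "tuft")
```

A broadly expressed gene such as Epcam in this toy table passes the abundance filters but fails the specificity ratio, so it is not selected for either type.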
Geometric classification of clumps
For each secretory cell type (goblet, tuft, enteroendocrine and Paneth), the expression levels of its classification markers were summed in each clump. These sums were converted to Z scores by subtracting the mean and dividing by the standard deviation across clumps. This process projected the clumps into a 4-dimensional space spanned by the sums of secretory cell type markers. We next performed Principal Component Analysis on these shifted and scaled sums (implemented in the MATLAB function pca). This resulted in three principal components (PCs) that define a 3D position for each clump in PC space. For each PC, the median was subtracted in order to shift the origin of the PCs to the origin of axes. In PC space, the clumps were now arranged on four lines or rays emanating from the origin (Fig. 2o, Supplementary Fig. 2a). Clumps at the origin, where the sums of all secretory markers were low, were enterocyte-containing clumps (Supplementary Fig. 2d). Clumps at the edge of each ray were the ones for which the sums of the distinct secretory type’s markers were maximal. Intermediate clumps contained different contributions of the enterocyte transcriptome and the secretory cell transcriptome (Supplementary Fig. 2a,d). Larger clumps were closer to the origin, since the contributions of enterocytes, the major cell type, were higher in these clumps (Supplementary Fig. 2e).
We fitted a line to each of the secretory type rays. For the fit, we sorted clumps according to their distance from the origin and considered only clumps with a distance above the 99th percentile. The fit was performed using a least-squares method implemented in a custom MATLAB script. Each ray was assigned to the secretory type whose markers peaked along it (Supplementary Fig. 2a). For each clump, the Euclidean distance from each of the rays was computed and Z scores for the four distances were calculated.
The farther from the origin of axes a clump was located, the higher its sum of cell type-specific markers, and therefore the lower the chance of misclassifying it. The region close to the origin contained clumps that were low in all cell type marker sums. There, a clump could be close to a particular ray at random. To minimize misclassification, we therefore sorted the clumps on each ray in descending order according to their distance from the origin, and included a fraction that matches the abundance of this cell type in the tissue. To estimate these abundances, we measured the proportions of each secretory cell type out of all cells in both crypts and villi of the jejunum (Supplementary Table 8). Measurements were performed by imaging the tuft cell marker Dclk1, the enteroendocrine cell marker Chga and the goblet cell marker Clca1. For Paneth cells, data were taken from Elmes M.J.42. For the final thresholds, measured proportions for crypts and villi were multiplied by 2 for pairs and by 3 for larger clumps, to represent the higher probability of capturing rare events in clumps. Secretory types were assigned only to clumps far enough from the origin based on the above-mentioned cell type-specific thresholds, for crypt and villus clumps separately based on the respective threshold. Only clumps for which the Z score of the distance from the closest ray was below -1 were considered. All other clumps were classified as enterocyte-only clumps.
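The geometric classification can be illustrated with the following Python sketch. Several details are assumptions made for the example and are not stated in the text: rays are given as direction vectors from the origin rather than fitted, the Z score of the four distances uses the sample standard deviation, and the abundance fraction is applied to the total number of clumps. The ray directions and clump coordinates are invented.

```python
import math
import statistics


def distance_to_ray(point, direction):
    """Euclidean distance from a point to a ray that starts at the origin."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    t = max(0.0, sum(p * u for p, u in zip(point, unit)))  # clip behind origin
    return math.dist(point, [t * u for u in unit])


def classify_clumps(points, rays, fractions, z_cutoff=-1.0):
    """points: clump coordinates in PC space; rays: type -> direction;
    fractions: type -> fraction of clumps allowed for that type."""
    candidates = {name: [] for name in rays}
    for idx, point in enumerate(points):
        dists = {name: distance_to_ray(point, d) for name, d in rays.items()}
        values = list(dists.values())
        mean, sd = statistics.mean(values), statistics.stdev(values)
        closest = min(dists, key=dists.get)
        # Keep the clump only if it is distinctly closer to one ray.
        if sd > 0 and (dists[closest] - mean) / sd < z_cutoff:
            candidates[closest].append((math.hypot(*point), idx))
    labels = ["enterocyte"] * len(points)
    for name, found in candidates.items():
        n_keep = round(fractions[name] * len(points))
        # Farthest-from-origin clumps first, up to the abundance limit.
        for _, idx in sorted(found, reverse=True)[:n_keep]:
            labels[idx] = name
    return labels


# Hypothetical ray directions and clump positions.
rays = {"goblet": (1, 0, 0), "tuft": (0, 1, 0),
        "enteroendocrine": (0, 0, 1), "Paneth": (-1, -1, -1)}
points = [(5, 0.1, 0), (0.2, 0.1, 0), (0, 4, 0.2), (0.1, 0.1, 0.1)]
fractions = {name: 0.25 for name in rays}
labels = classify_clumps(points, rays, fractions)
```

In this toy case the second clump lies near the goblet ray but close to the origin, so the abundance limit leaves it as an enterocyte-only clump, which mirrors the motivation given above.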
Separation into clumps zonation tables per secretory type
Clumps were separated by assigned secretory cell type, and the zone was assigned as previously described, based on enterocyte gene expression. For comparability between clumps containing different numbers of enterocytes, secretory cell type-specific genes were internally normalized: their expression was divided by the sum of all secretory cell type-specific genes in each clump. Secretory cell type-specific genes, a subset of which served as secretory landmark genes (see below), were defined as genes for which the mean expression in the secretory cell type was higher than 3-fold the maximal mean expression in all enterocyte populations and that were expressed in at least 1% of the cells of that secretory cell type. Calculation of the zonation table proceeded as previously described for all clumps (Supplementary Table 2).
Use of clumps for single cell zonation reconstructions
For goblet, enteroendocrine and tuft cells, we used the clump-based zonation tables to find zonated, secretory-specific landmark genes. We then used the expression patterns of these landmark genes to assign single sequenced secretory cells to crypt-villus zones, grouped them and averaged their expression, thus obtaining zonation tables of all genes for each secretory cell type. The method we used for single cell reconstruction is similar for all three secretory types, with slight differences in landmark gene selection criteria and in the spatial resolution of the reconstruction between goblet cells and the other secretory cell types. These differences stemmed from the substantially higher abundance of goblet cells in the tissue, and therefore in clumps, compared to other secretory cells. This enabled performing the reconstruction with finer spatial resolution for goblet cells. For all secretory types, single cell reconstruction consisted of the following steps; secretory type-specific parameters and variations on this general method are detailed in the next sections:
1. Secretory cell type-specific genes were defined as genes for which mean expression in the secretory cell type was higher than 3-fold the maximal mean expression in all enterocyte populations, and that were also expressed in at least 1% of secretory cells.
2. For comparability between clumps containing different numbers of cells, these genes were internally normalized: their expression in each clump was divided by the sum of all secretory cell type-specific genes in that clump.
3. Out of the secretory-specific genes in step 1, two groups of zonated landmark genes were selected: crypt landmark genes, which are zonated towards the crypt, and villus-tip landmark genes, which are zonated towards the villus tip. See the sections below for a detailed description of landmark gene selection criteria. For each clump of this secretory cell type, the sum of the normalized expression of the crypt landmark genes (denoted X) and of the tip landmark genes (denoted Y) was calculated. These sums were then used to calculate a unitless spatial coordinate, η=Y/(X+Y), for each clump. This yielded a distribution of η values for each zone in secretory cell-containing clumps.
4. For each single secretory cell used for reconstruction, the spatial coordinate η was calculated as in step 3.
5. In order to assign each single cell to a zone based on its η, η limits for the zones were calculated using an optimization method: reconstruction was performed using a wide range of possible sets of η limits (see step 6 for details), and the set of η limits that yielded zonation profiles of secretory-specific genes that best fit the clump profiles was selected. Specifically, for each possible set of η limits:
a. We performed single cell zonation reconstruction as described in steps 7-8.
b. For each secretory-specific gene (defined in step 1), the zonation profile in the current reconstruction was compared to the zonation profile in clumps: the Euclidean distance between the two profiles was calculated after both were normalized by their maximal values.
c. The median over genes of this distance was calculated and denoted as MedEuc.
d. The set of η limits yielding the smallest MedEuc was selected as optimal.
6. All η limit sets that were considered in the optimization described in step 5 were calculated as follows:
a. The lowest possible upper bound on η for the crypt was set to the median of the η values of crypt clumps, calculated in step 3. We denote this number as ηmin.
b. The resolution of the η optimization, denoted Dη, was determined.
c. A vector of all considered η limit values, ηVec, was created: a regularly spaced vector starting at ηmin and ending at 1, using Dη as the increment between elements. The elements of ηVec were therefore [ηmin, ηmin+Dη, ηmin+2*Dη, …, ηmin+m*Dη], where m=(1-ηmin)/Dη.
d. All possible combinations of the elements of ηVec taken Nzones-1 at a time (Nzones is the number of desired zones for the reconstruction) were calculated. Each such combination is a candidate set of η limits, with a zero added at the beginning and a one appended at the end. For example, if the reconstruction is to be done for 4 zones, each such candidate set of η limits would be 0, ηVec1, ηVec2, ηVec3, 1, with ηVec1, ηVec2, ηVec3 being one of the combinations of ηVec values such that ηVec1<ηVec2<ηVec3.
e. All sets of η limits that yielded fewer than 10 single cells in some zone were discarded and not considered in the optimization described in step 5.
7. Each single cell was assigned to a zone based on the optimal η limits selected as described in step 5.
8. The expression values of all the single secretory cells in each zone were averaged for each gene, to obtain the zonation table of genes in the secretory cells.
9. P values for zonation per gene were calculated with the Kruskal-Wallis test (implemented in the MATLAB function kruskalwallis). q values were calculated using the Benjamini and Hochberg method (implemented in the MATLAB function mafdr), for all genes for which expression exceeded 5×10⁻⁶.
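The search over η limits described in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: a cell whose η equals a limit is placed in the upper zone, ties between equally good limit sets keep the first one found, and the function names and toy data are invented.

```python
import bisect
import itertools
import math
import statistics


def eta(crypt_sum, tip_sum):
    """Unitless spatial coordinate: eta = Y / (X + Y)."""
    total = crypt_sum + tip_sum
    return tip_sum / total if total > 0 else float("nan")


def candidate_limits(eta_min, d_eta, n_zones):
    """Yield every candidate set of eta limits, as (0, ..., 1)."""
    n_steps = int(round((1 - eta_min) / d_eta))
    eta_vec = [eta_min + k * d_eta for k in range(n_steps + 1)]
    for inner in itertools.combinations(eta_vec, n_zones - 1):
        yield (0.0, *inner, 1.0)


def assign_zone(value, limits):
    """Zone z covers limits[z] <= value < limits[z + 1]; 1.0 goes to the last zone."""
    return min(bisect.bisect_right(limits, value) - 1, len(limits) - 2)


def zonation_profile(values, zones, n_zones):
    """Mean expression per zone for one gene."""
    sums, counts = [0.0] * n_zones, [0] * n_zones
    for v, z in zip(values, zones):
        sums[z] += v
        counts[z] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def max_norm(profile):
    peak = max(profile)
    return [v / peak for v in profile] if peak > 0 else list(profile)


def optimize_limits(cell_etas, cell_expr, clump_profiles,
                    eta_min, d_eta, n_zones, min_cells=10):
    """Return (MedEuc, limits) for the limit set minimizing MedEuc."""
    best = (math.inf, None)
    for limits in candidate_limits(eta_min, d_eta, n_zones):
        zones = [assign_zone(e, limits) for e in cell_etas]
        if min(zones.count(z) for z in range(n_zones)) < min_cells:
            continue  # too few cells in some zone
        dists = [
            math.dist(max_norm(zonation_profile(cell_expr[g], zones, n_zones)),
                      max_norm(clump_profiles[g]))
            for g in clump_profiles
        ]
        med_euc = statistics.median(dists)
        if med_euc < best[0]:
            best = (med_euc, limits)
    return best


# Toy example with 2 zones, 6 cells and one hypothetical gene "g".
cell_etas = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
cell_expr = {"g": [0, 0, 0, 1, 1, 1]}
clump_profiles = {"g": [0, 1]}
best_dist, best_limits = optimize_limits(
    cell_etas, cell_expr, clump_profiles,
    eta_min=0.2, d_eta=0.2, n_zones=2, min_cells=2)
```

The number of candidate sets grows combinatorially with the length of ηVec and the number of zones, which may be one reason to choose a coarser Dη when more zones are requested.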
Zonation reconstruction of single goblet cells
Single cells used for zonation reconstruction were from scRNA-seq experiments on intestinal cells, performed using the MARS-seq protocol (Supplementary Fig. 3), see scRNA-seq section for details. Goblet cells were detected based on Seurat clustering41.
We defined the Center Of Mass (COM) of a gene’s spatial expression profile across the zones 1,2,…,N, with respective expression values per zone of E1,E2,…,EN, as:
$$\mathrm{COM} = \frac{\sum_{i=1}^{N} i \, E_i}{\sum_{i=1}^{N} E_i}$$
Goblet cell-specific landmark genes based on clumps data (steps 3-4 in the previous section) were selected based on the following criteria:
Maximal expression across zones in the clumps zonation table (Supplementary Table 2) ≥ 5×10⁻⁵.
Crypt markers: COM ≤ 3, expressed in at least 2 clumps in the crypt.
Tip markers: COM ≥ 4.7, expressed in at least 2 clumps in the tip-most zone.
This resulted in 309 crypt markers and 62 tip markers, listed in Supplementary Table 1. Single cell reconstruction was performed with 5 zones. The resolution parameter for optimization, Dη (step 6b in the previous section), was set to 0.05.
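The landmark criteria can be illustrated with a short Python sketch. It takes the COM to be the expression-weighted mean zone index, with zones numbered from 1, which is an assumption consistent with the definition above; the profiles and clump counts are invented, and the default thresholds are the goblet cell values.

```python
def center_of_mass(profile):
    """Expression-weighted mean zone index, zones numbered 1..N."""
    total = sum(profile)
    if total == 0:
        return float("nan")
    return sum(i * e for i, e in enumerate(profile, start=1)) / total


def classify_landmark(profile, n_crypt_clumps, n_tip_clumps,
                      min_max_expr=5e-5, crypt_com=3.0, tip_com=4.7,
                      min_clumps=2):
    """Return "crypt", "tip" or None for one gene's clump zonation profile.

    n_crypt_clumps / n_tip_clumps: number of clumps expressing the gene
    in the crypt and in the tip-most zone (hypothetical inputs).
    """
    if max(profile) < min_max_expr:
        return None  # too lowly expressed
    com = center_of_mass(profile)
    if com <= crypt_com and n_crypt_clumps >= min_clumps:
        return "crypt"
    if com >= tip_com and n_tip_clumps >= min_clumps:
        return "tip"
    return None


# Made-up 7-zone profiles (crypt, V1..V6).
crypt_like = [4e-4, 2e-4, 1e-4, 0, 0, 0, 0]
tip_like = [0, 0, 0, 0, 1e-4, 2e-4, 4e-4]
flat = [1e-4] * 7
calls = [
    classify_landmark(crypt_like, n_crypt_clumps=5, n_tip_clumps=0),
    classify_landmark(tip_like, n_crypt_clumps=0, n_tip_clumps=3),
    classify_landmark(flat, n_crypt_clumps=5, n_tip_clumps=5),
]
```

For the enteroendocrine and tuft reconstructions the same routine would be called with the COM thresholds listed for those cell types.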
Zonation reconstruction of single Enteroendocrine cells
Single cells used for zonation reconstruction were taken from Gehart et al.12. The UMI count table for the single cells was downloaded from GEO (GSE113561) and parsed in MATLAB. Expression values per cell were normalized by dividing by the overall sum of UMIs for each cell. Cells marked as excluded in the metadata supplied by the authors were removed. Enteroendocrine-specific landmark genes based on clumps data (steps 3-4 in the previous section) were selected based on the following criteria:
Maximal expression across zones in the clumps zonation table (Supplementary Table 2) ≥ 5×10⁻⁵.
Crypt markers: COM ≤ 1.8, expressed in at least 2 clumps in the crypt.
Tip markers: COM ≥ 4.2, expressed in at least 2 clumps in the tip-most zone.
This resulted in 636 crypt markers and 20 tip markers, listed in Supplementary Table 1. Single cell reconstruction was performed with 4 zones. The resolution parameter for optimization, Dη (step 6b in the previous section), was set to 0.02.
To obtain the temporal profile of enteroendocrine gene expression (Fig. 4f), the time stamps per cell, which were available in GEO (GSE113561), were used to equally partition cells into 7 temporal bins, assigning each cell to a distinct temporal zone. Within each temporal bin, gene expression was averaged over the cells in that bin, creating the temporal expression table. P values for temporal profiles per gene were calculated with the Kruskal-Wallis test (implemented in the MATLAB function kruskalwallis). q values were calculated using the Benjamini and Hochberg method (implemented in the MATLAB function mafdr).
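A possible reading of the temporal binning is sketched below in Python. It assumes that "equally partition" means bins with equal numbers of cells after sorting by time stamp (equal-width time intervals would be an alternative reading); the time stamps and expression values are invented.

```python
def temporal_bins(time_stamps, n_bins=7):
    """Assign each cell to one of `n_bins` equal-count bins by time stamp."""
    order = sorted(range(len(time_stamps)), key=lambda i: time_stamps[i])
    bins = [0] * len(time_stamps)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(time_stamps)
    return bins


def temporal_profile(expression, bins, n_bins=7):
    """Mean expression of one gene within each temporal bin."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for value, b in zip(expression, bins):
        sums[b] += value
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# 14 hypothetical cells, listed from latest to earliest time stamp.
time_stamps = [13 - i for i in range(14)]
expression = [float(t) for t in time_stamps]  # toy gene rising over time
bins = temporal_bins(time_stamps)
profile = temporal_profile(expression, bins)
```

Applying `temporal_profile` gene by gene yields a table with one row per gene and one column per temporal bin.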
Zonation reconstruction of single tuft cells
Our single cell MARS-seq protocol yielded mainly villus tuft cells. We therefore combined our cells with tuft cells from Figure 1 of Haber et al.17. Zonation reconstruction was performed separately for these two single cell datasets, and the results were subsequently merged. Tuft cell-specific landmark genes based on clumps data (steps 3-4 in the previous section) were selected based on the following criteria:
Maximal expression across zones in the clumps zonation table (Supplementary Table 2) ≥ 5×10⁻⁵.
Crypt markers: COM ≤ 1.8, expressed in at least 2 clumps in the crypt.
Tip markers: COM ≥ 4.2, expressed in at least 2 clumps in the tip-most zone.
This resulted in 323 crypt markers and 29 tip markers, listed in Supplementary Table 1. Single cell reconstruction was performed with 4 zones. The resolution parameter for optimization, Dη (step 6b in the previous section), was set to 0.02. The merged zonation table was calculated as the weighted mean of the two zonation tables derived separately from the two datasets, with the weights reflecting the relative contribution of each data set per zone in terms of the number of expressing cells.
The details of the calculation are as follows. For each possible set of η limits:
The single cell reconstruction using only cells from our data was computed.
The single cell reconstruction using only cells from Haber et al.17 was computed.
For each gene, the mutual zones between data sets were identified. Mutual zones were defined as zones in which both data sets had 5 or more cells expressing the gene above the expression threshold, set at 5×10⁻⁶.
If there were no mutual zones and the gene was expressed in only one data set, the expression in that data set was retained for the merged zonation table.
If both data sets had cells expressing the gene, but without sufficient overlap in any single zone, the gene was excluded from the merged zonation table.
If there were several mutual zones, the mutual zone in which the data sets had the most similar number of expressing cells was selected.
For each gene, the zonation expression profile in each data set was normalized by the value in the mutual zone.
These two scaled profiles were averaged with weights per zone. The weights per data set per zone were the number of cells expressing the gene above the expression threshold (5×10⁻⁶).
Averaged zonation profiles for each gene were re-normalized by dividing by the maximal value, and re-scaled by multiplying the normalized profiles by the maximal expression level between the two separate zonation tables.
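The per-gene merge described in these steps can be sketched as follows. This Python illustration makes a few assumptions that the text leaves open: "most similar number of expressing cells" is taken as the smallest absolute difference, a gene counts as expressed in a data set if any zone has at least one expressing cell, and the input values are invented.

```python
def merge_gene(profile_a, profile_b, n_expr_a, n_expr_b, min_cells=5):
    """Merge one gene's zonation profiles from two data sets.

    profile_*: mean expression per zone; n_expr_*: number of cells per zone
    expressing the gene above the expression threshold.
    Returns the merged profile, or None if the gene is excluded.
    """
    zones = range(len(profile_a))
    mutual = [z for z in zones
              if n_expr_a[z] >= min_cells and n_expr_b[z] >= min_cells]
    if not mutual:
        in_a, in_b = sum(n_expr_a) > 0, sum(n_expr_b) > 0
        if in_a and not in_b:
            return list(profile_a)
        if in_b and not in_a:
            return list(profile_b)
        return None  # expressed in both without a shared zone (or in neither)
    # Reference zone: the mutual zone with the most similar cell counts.
    ref = min(mutual, key=lambda z: abs(n_expr_a[z] - n_expr_b[z]))
    scaled_a = [v / profile_a[ref] for v in profile_a]
    scaled_b = [v / profile_b[ref] for v in profile_b]
    merged = []
    for z in zones:
        weight = n_expr_a[z] + n_expr_b[z]
        merged.append((n_expr_a[z] * scaled_a[z] + n_expr_b[z] * scaled_b[z])
                      / weight if weight else 0.0)
    # Re-normalize to a maximum of 1, then restore the larger expression scale.
    peak = max(merged)
    top = max(max(profile_a), max(profile_b))
    return [v / peak * top for v in merged]


# Toy profiles in arbitrary units: same shape, different scale.
merged = merge_gene([1.0, 2.0, 4.0], [2.0, 4.0, 8.0],
                    n_expr_a=[10, 10, 10], n_expr_b=[10, 5, 2])
excluded = merge_gene([1.0, 0.0, 0.0], [0.0, 0.0, 1.0],
                      n_expr_a=[10, 0, 0], n_expr_b=[0, 0, 10])
```

Scaling both profiles by the value in the shared reference zone removes differences in overall expression level between the data sets before averaging, so only the shapes are combined.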
For the merged zonation table, the p values were calculated per gene as the minimal p value between the two separate reconstructions. q values were calculated using the Benjamini and Hochberg method (implemented in the MATLAB function mafdr).
Validation of Single cell reconstructions with clumps
In order to further validate the single cell reconstructions, we computed the correlation of centers of mass (COM) between single cell and clump-based reconstructions. For this purpose, we coarse-grained the 7-zone clumps zonation tables into the same number of zones as the respective single cell reconstruction. These coarse-grained zonation tables were calculated in the same way as described for clump to zone assignment, with one difference: the spatial reference zonation table7 was linearly interpolated for the smaller number of equally spaced zones prior to reconstruction.
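The interpolation of the reference table onto fewer zones might look like the following Python sketch. It assumes that the new zones are equally spaced between the first and last original zones, both end points included; the text does not specify the exact grid, and the profile is invented.

```python
import math


def interpolate_profile(profile, n_out):
    """Linearly interpolate a zonation profile onto `n_out` equally spaced
    positions spanning the original zones (n_out must be at least 2)."""
    n_in = len(profile)
    resampled = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index in the original
        lo = int(math.floor(pos))
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        resampled.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return resampled


# Hypothetical 7-zone reference profile for one gene.
reference_profile = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
four_zones = interpolate_profile(reference_profile, 4)
five_zones = interpolate_profile(reference_profile, 5)
```

Applying this to every landmark gene gives a reference with the same number of zones as the single cell reconstruction, after which clumps can be assigned to zones as before.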
For the validation, we chose genes that were not used as landmarks for single cell reconstruction and were both highly expressed (above 10⁻⁵) and secretory specific (secretory-specific criteria are described in step 1 of the single cell reconstruction algorithm). We further limited ourselves to genes for which the SEM of reconstruction was small enough for both single cell and clump-based reconstructions (below 0.4 for goblet and below 0.5 for tuft). This yielded a similar number of genes for goblet and tuft cells: 30 and 28, respectively.
For enteroendocrine cells, such validation was infeasible, due to the various enteroendocrine cell sub-types and their relatively sparse representation in clumps. Instead, single cell reconstruction of spatial zonation was validated against the temporal gene expression patterns derived from the same single cells in Gehart et al.12, yielding a significant correlation (R=0.4, p<10⁻¹⁰).
Data Availability
Data generated in this study have been deposited in GEO and will be publicly available upon publication. The single cell dataset also includes epithelial cells from GSE134479 (ref. 43). The enteroendocrine single cell dataset was acquired from the NCBI GEO dataset browser, accession code GSE113561 (ref. 12).
Code Availability
All code used in this study will be available upon request.
Author Contribution
R.M., K.B.H. and Z.P. performed the experiments. I.A. and S.I. performed the data analysis. I.A. contributed to project design. S.I., R.M. and I.A. wrote the manuscript. All of the authors discussed the results and commented on the manuscript.
Competing Interest statement
The authors declare no competing interests.
Acknowledgements
We thank all members of our lab for valuable comments. S.I. is supported by the Wolfson Family Charitable Trust, the Edmond de Rothschild Foundations, the Fannie Sherr Fund, the Helen and Martin Kimmel Institute for Stem Cell Research grant, the Israel Science Foundation grant No. 1486/16, the Broad Institute-Israel Science Foundation grant No. 2615/18, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant No. 768956, the Chan Zuckerberg Initiative grant No. CZF2019-002434, the Bert L. and N. Kuggie Vallee Foundation and the Howard Hughes Medical Institute (HHMI) international research scholar award.