Spatial expression programs of the intestinal follicle-associated epithelium

The intestine is lined with isolated lymphoid follicles (ILFs) that facilitate sampling of luminal antigens to elicit immune responses. Technical challenges related to the scarcity and small sizes of ILFs and their follicle-associated epithelium (FAE) impeded the characterization of their spatial gene expression programs. Here, we combined RNA sequencing of laser capture microdissected tissues with single molecule transcript imaging to obtain a spatial gene expression map of the ILF and its associated FAE in the mouse small intestine. We identified zonated expression programs in both follicles and FAEs, with a decrease in enterocyte anti-microbial and absorption programs and a partial induction of expression programs normally observed at the villus tip. We further identified Lepr+ sub-epithelial telocytes at the FAE top, which are distinct from villus-tip Lgr5+ telocytes. Our analysis exposes the epithelial and mesenchymal cell states associated with ILFs.


Introduction
The intestine is lined with mucosa-associated lymphoid follicles that promote homeostatic response against luminal microbiota [1]. These lymphoid follicles include 7-10 large Peyer's patches (1-2 mm in diameter) [2], as well as smaller solitary intestinal lymphoid tissues (SILT) that are up to 200 µm in diameter. SILTs consist of a spectrum of developed tissues, ranging from cryptopatches (CP) to mature isolated lymphoid follicles (ILF), numbering at around 100-200 along mouse small intestine [3,4]. Peyer's patches and mature ILFs are covered by follicle-associated epithelium (FAE), forming a dome-like shape [5]. The FAE is mainly composed of enterocytes, with a minority of interspersed Microfold (M) cells. Secretory goblet cells have been shown to be scarce along the Peyer's patch FAE, resulting in a thinner mucus layer at the luminal surface, and by that facilitating close interactions with microbial antigens [6]. The thin mucous layer enables apical-basolateral transport of antigens from the lumen across the FAE into the lamina propria by M cells, mediating the mucosal immune response [7]. This transport program is different from the normal function of the intestinal epithelium, which consists of selective transport of nutrients with a simultaneous blocking of microbial access to the surface.
Several studies described the transcriptomes of FAE cells and M cells in Peyer's patches [8][9][10] , however technical limitations related to the smaller sizes of ILFs impeded their similar transcriptomic characterization. Here we applied laser capture microdissection RNA sequencing (LCM RNA-seq [11,12]) and single molecule fluorescence in-situ hybridization (smFISH, [13,14]) to generate a spatial expression map of ILFs and their associated FAE in the mouse small intestine. We identify zonated expression programs in ILFs and FAE, with a decrease in anti-microbial and absorption programs at the FAE top and a compensatory increase in secreted anti-microbial peptides at the FAE boundaries.
The FAE top consists of enterocytes that partially induce genes normally observed at the villus tip, and are in contact with Lepr+ sub-epithelial telocyte cells.

Spatial mapping of ILF and FAE gene expression
To investigate the spatial heterogeneity of epithelial, immune and stromal gene expression in the ILF and its associated FAE, we used LCM to isolate five segments.
These included three epithelial segments consisting of FAE top (FAET), FAE bottom (FAEB), and an adjacent segment at the bottom of a neighboring villus (denoted Villus bottom, VB), and two immune segments consisting of the ILF top (ILFT) and ILF bottom (ILFB) (Fig 1A-C). We used mcSCRBseq [15,16], a sensitive UMI-based protocol to sequence the RNA of the dissected fragments. We identified distinct zonated expression and enriched gene sets in each of the segment (Fig 1D, Fig S1). We used these LCMseq expression programs as a basis for extensive smFISH validations, to establish their statistical power.
We first examined the spatial expression programs at the ILF compartment. We identified higher expression of different B-cell markers at the bottom of the ILF such as Cd19 (Figs   S2A,B, S3A,D). In contrast, T cell markers did not show a spatial bias towards the ILF bottom (Fig S2C,D). Rather, using smFISH we found that Cd3e, a classic T cell marker, and Gzma, a marker of cytotoxic CD8+ T cell subset were highly abundant at the ILF top, infiltrating into the FAE top layer (Fig S3B,C,E,F). This spatial pattern of lymphocytes in the ILF resembles the architecture previously observed in Peyer's patches, which exhibit a core of B cells and a mantle of T cells [17].
The FAE segments exhibited elevated expression of M cell markers, including Anxa5  Fig S3I). We next turned to examine the zonated properties of the remaining >90% epithelial cells in the FAE. data showing gene mean expression Z-score of the five segments (S1-S5). Selected genes with high expression are shown on the right side of the clustergram for each cluster, colored according to the cluster color.

Anti-microbial and absorption programs are downregulated in FAE top
The mucosal layer that lines the intestinal epithelium is a critical barrier against bacteria.
This barrier consists of a mucus layer that contain antimicrobial peptides. Previous in-situ imaging studies on FAE of the large Peyer's patches demonstrated a lower abundance of the mucus-secreting goblet cells [18,19], as well as a decline in enterocyte anti-microbial genes [6]. Our spatial analysis enabled global characterization of these and other zonated antimicrobial programs with high spatial resolution in FAE associated to the smaller ILFs.
Our LCMseq showed decreased levels of Muc2 in the FAE (Fig 2A). Using smFISH we validated the lower abundance of Muc2+ goblet cells in the FAE with 3.6%±1.0% compared to 9.3%±1.1% in the adjacent villus (mean of five different smFISH images, Fig 2A, B). We also found a decline in the epithelial expression of Pigr, encoding the polymeric immunoglobulin receptor that facilitates transcytosis of IgA antibodies from the lamina propria to the lumen [6] ( Along with the reduction of antimicrobial programs, our analysis further revealed a steady decrease in key genes associated with nutrient absorption towards the FAE top. These included the sodium-coupled transporter Atp1b1, the fatty acid binding proteins Fabp1, Fabp2, as well as different solute carrier genes, such as the glucose transporter Slc2a2, and the amino-acid transporters Slc7a7, Slc7a9 (Fig 2F-I and Fig S4B).

FAE top enterocytes exhibit a partial villus tip program
Previous reconstruction of spatial gene expression patterns along the intestinal villus axis revealed a substantial up-regulation of dozens of genes in enterocytes at the villus tip [11]. These included cell-adhesion genes such as Cdh1, encoding E-cadherin, apolipoproteins, stress-associated transcription factors such as Jun, Fos and Klf4, as well as the purine-metabolism immune-modulatory genes Ada, Nt5e and Slc28a2 (Fig 3A).
We asked whether FAE top enterocytes exhibit similar elevation of these genes. We found that some villus tip enterocyte genes, such as Cdh1, Cdkn1a (encoding the protein P21), and the apolipoprotein genes Apoa1 and Apoa4 were elevated at the FAE top. In contrast, Fos, Jun, Klf4 and Slc28a2 were not elevated at the FAE top ( Fig 3B-K). The increase in Apoa proteins, which are essential components of chylomicrons used to transport absorbed lipids to the body, is surprising, given that the absorption machinery was largely downregulated in the FAE top zone (Fig 2F, S4B). This increase could be related to additional anti-inflammatory roles of apolipoproteins [21,22]. Our analysis therefore demonstrates that FAE top enterocytes exhibit a partial induction of the expression program normally observed at the villus tip.

FAE top telocytes are marked by Lepr and are distinct from villus tip telocytes
PDGFRα+ sub-epithelial telocytes are slender elongated cells that line the intestinal epithelium and have recently been shown to be important niche cells that shape zonated epithelial expression programs [23,24]. Villus tip telocytes, marked by the gene Lgr5, are regulators of the villus tip enterocyte program [25]. We argued that our LCM RNA-seq of the epithelial segments may also include sub-epithelial telocytes, in addition to epithelial cells, due to their adjacency to the epithelial layer. To identify potential telocyte gene expression, we extracted 1,765 genes that were previously shown to be substantially more highly expressed in telocytes compared to enterocytes (Methods). Among these potential telocyte marker genes, we found elevated expression at the FAE top of Lepr, encoding the leptin receptor (Fig 4A-B). Lepr was previously shown to be elevated in crypt mesenchymal cells in the colon [26]. Using smFISH combined with immunostaining for PDGFRα, we validated that FAE top telocytes express significantly higher levels of Lepr compared to the FAE bottom and adjacent villus bottom (Fig 4C,   F). In contrast, Lgr5 was highly expressed in villus tip telocytes (as well as in the crypt epithelial stem cells), but was undetectable in the sub-epithelial layers of the FAE (Fig  4C-E). Our analysis therefore identifies Lepr+ sub-epithelial telocytes distinct from the Lgr5+ telocytes at the villus tip.

Discussion
ILFs, along with Peyer's patches, are critical sites of intestinal immune surveillance, yet their scarcity and small sizes render them hard to isolate. Our analysis provides a comprehensive spatial atlas of gene expression of ILFs and their associated epithelium.
We found that FAE enterocytes exhibit a gene expression signature that is distinct from both enterocytes at the villus bottom and at the villus tip. FAE top enterocytes express lower levels of antimicrobial genes, as well as lower levels of the nutrient absorption genes. These features seem to be tuned towards maintaining a microenvironment that is optimal for efficient sampling of bacterial antigens by M cells and immune cells, rather than nutrient absorption.
What are the factors that could elicit the distinct gene expression programs at the FAE top? FAE enterocytes are localized at around the same physical distance from the crypt as villus bottom enterocytes. Moreover, previous work has shown that FAE eneterocytes are continuously migrating and shedding from the FAE tip, similarly to villus enterocytes [27]. Yet, the FAE operates under a unique microenvironment compared to the villus bottom. Luminal bacterial concentrations at the FAE tip should be higher due to the reduced mucous layer and reduced secretion of anti-microbial peptides, and therefore more similar to the luminal microenvironment at the villus tip [11]. We found that, unlike villus-tip enterocytes, FAE top enterocytes are in contact with Lepr+ telocytes, which could provide different niche signals than their villus tip Lgr5+ telocytes counterparts.
Indeed, the expression of the purine-metabolism immune-modulatory genes Ada, Nt5e and Slc28a2 at the villus tip seems to be controlled by Lgr5+ telocytes [25], potentially explaining their reduced expression at the FAE top. Our study forms the basis for the future exploration of the regulatory molecules that shape FAE zonation. It will be interesting to expand our study to cryptopatches and colonic ILFs, as well as to the characterization of ILFs in perturbed states such as germ-free mice and in models of inflammatory diseases.

Animal experiments
All animal studies were approved by the Institutional Animal Care and Use Committee

Laser capture microdissection (LCM)
LCM experiments were performed as was previously described [11]

Bioinformatic analysis
Illumina output sequencing raw files were converted to FASTQ files using bcl2fastq package. To obtain the UMI counts, FASTQ reads were aligned to the mouse reference genome (GRCm38.84) using zUMI package [28] and STAR with the following parameters: RD1 16 bp, RD2 66 bp with a barcode (i7) length of 8 bp. For UMI table see   Table S1. Subsequent analysis was done with Matlab R2018b. UMI table was filtered to retain only protein coding genes, based on GRcm38.84 ensembl acquired via BioMart using biomaRt R package version 2.44.1 (Table S1). Mitochondrial genes were further removed and the remaining UMI counts for each sample were normalized to the sum of UMI counts of all genes that individually took up less than 5% of the UMI count sum in any of the samples. Mean expression was calculated over all samples from the same zone (Table S1). Gene expression was calculated as fraction of UMI counts over the sum of UMIs (Table S1) and was presented in violin plots. Clustering was performed using the Clustergram function in Matlab with correlation distance over all genes with maximal mean expression above 10 -4 . Gene set enrichment analysis was done using MSigDB_Hallmark_2020 database of Enrichr [29,30]. The list of genes in each set is specified in Table S2. For extracting telocyte genes, we used previous datasets of celltype specific gene expression in the mouse small intestine and compared the mean expression over villus tip and crypt telocytes [25] to the mean expression over zonated crypt-villus populations [11], goblet cells and enteroendocrine cells [8]. Telocyte genes were defined as those with mean expression above 10 -5 in telocytes and 4-fold higher expression in telocytes compared to epithelial cells. M cell marker genes ( Fig S3G) were obtained from previous analyzed data [8]. To identify T cell and B cell markers we analyzed a previous scRNAseq dataset [31]. We calculated the maximal mean expression among cells annotated as CD4 or CD8 and compared it to the mean expression of B cells.
We considered genes with expression above 5*10 -4 in the corresponding cell type and extracted the 20 genes with the highest expression ratio between the two cell types.
Kruskal Wallis tests were used to assess statistical significance as p value<0.05.

Single molecule FISH (smFISH)
The smFISH experiments were conducted as was previously described [13] with some modifications. Briefly, 8µm thick sections of fixed jejunum were sectioned and placed onto poly L-lysine (Sigma-Aldrich, P8920) coated coverslips, then fixed again in 4% FA in PBS for 15 min R/T followed by 70% Ethanol dehydration for 2h in 4°C. Tissues were treated for 10 min R/T with proteinase K (10 µg/ml Ambion AM2546) and washed twice with 2× SSC (Ambion AM9765), then incubated in wash buffer (20% Formamide Ambion AM9342, 2× SSC) for 10 min R/T. Next, the tissues were mounted with the hybridization buffer (10% Dextran sulfate Sigma D8906, 20% Formamide, 1 mg/ml E.coli tRNA, Sigma R1753, 2× SSC, 0.02% BSA, Ambion AM2616, 2 mM Vanadylribonucleoside complex, NEB S1402S) mixed with 1:3000 dilution of specific probe libraries and were transferred to an overnight incubation at 30°C. Probe libraries were designed using the Stellaris FISH Probe Designer Software (Biosearch Technologies, Inc., Petaluma, CA). For probe library list see Table S3. Imaging was performed on Nikon eclipse Ti2 inverted fluorescence microscopes equipped with 100x oil-immersion objectives and a Photometrics Prime 95B 25MM EMCCD camera. All images were taken using ×100 magnifications, therefore several fields of view were stitched together to cover the whole ILF/FAE area using Fiji software. Quantification of smFISH was done using ImageM [14]. Three to five imaged tissues from at least two mice were segmented and fluorescent dots were counted and divided by the segmented cell volume to obtain the mRNA concentration per cell. At  Table S2).