Spatial transcriptome sequencing of FFPE tissues at cellular level

Formalin-fixed paraffin-embedded (FFPE) tissues are the most abundant archivable specimens in clinical tissue banks, but unfortunately incompatible with single-cell level whole transcriptome sequencing due to RNA degradation in storage and RNA damage in extraction. We developed an in-tissue barcoding approach namely DBiT-seq for spatially revolved whole transcriptome sequencing at cellular level, which required no tissue dissociation or RNA exaction, thus potentially more suited for FFPE samples. Herein, we demonstrated spatial transcriptome sequencing of embryonic and adult mouse FFPE tissue sections at cellular level (25μm pixel size) with high coverage (>1,000 genes per pixel). Spatial transcriptome of an E10.5 mouse embryo identified all major anatomical features in the brain and abdominal region. Integration with singlecell RNA-seq data for cell type identification indicated that most tissue pixels were dominated by single-cell transcriptional phenotype. Spatial mapping of adult mouse aorta, atrium, and ventricle tissues identified the spatial distribution of a variety of cell types. Spatial transcriptome sequencing of FFPE samples at cellular level may provide enormous opportunities in a wide range of biomedical research. It may allow us to exploit the huge resource of clinical tissue specimens to study human disease mechanisms and discover tissue biomarkers or therapeutic targets.

We have developed high-spatial-resolution spatial omics sequencing vis deterministic barcoding in tissue (DBiT-seq) 17 , which was distinct from other NGS-based spatial transcriptome techniques in that it required no de-crosslinking for mRNA release and yielded high-quality transcriptome data from paraformaldehyde(PFA)-fixed tissue sections. Extending it to highcoverage spatial transcriptome sequencing of FFPE tissues at cellular level would be another major leap. Herein, we demonstrated spatially resolved whole transcriptome mapping of mouse embryo (E10.5) FFPE tissue samples with 25μm pixel size and identified all major tissue types in the brain and abdominal region at the cellular level. Integration with scRNA-seq data allowed for identification of 40+ cell types and revealed that most tissue pixels were dominated by single-cell transcriptome. Applying it to adult mouse heart (atrium and ventricle) and aorta tissues demonstrated high-coverage (>1000 genes per pixel) spatial transcriptome and the detection of sparse cell types in the cardiovascular tissues. This work represents a major leap forward to unlock the enormous resource of clinical histology specimens for human disease research.
The main workflow for FFPE samples is shown in Figure 1a. The banked FFPE tissue block was first microtomed into sections of 5-7 µm in thickness and placed onto a poly-L-lysinecoated glass slide. If the FFPE tissue sections were not to be analyzed right away, they should be stored at -80 °C prior to use in order to reduce RNA oxidative degradation by air exposure.
Next, deparaffinization was carried out with standard xylene wash. Afterwards, the tissue section was rehydrated and permeabilized by proteinase K, and then post-fixed again with formalin. The deparaffinized tissue section ready for DBiT-seq exhibited a darkened and higher-contrast tissue morphology ( Figure 1b). Then, the 1 st PDMS chip with 50 parallel channels was attached onto the tissue slide and a set of DNA barcode A1-A50 oligos were prepared in the reverse transcription (RT) mix and flowed through the channels to perform in tissue RT to produce cDNAs in situ with barcode A incorporated at the 3' end. Afterwards, the 1 st PDMS chip was removed and replaced with a 2 nd PDMS chip containing another set of 50 microchannels perpendicular to the first set of microchannels. Ligation was then performed in each of the channel by flowing a set of barcode B1-B50 oligos plus a universal ligation linker, which was complementary to the half-linker sequence in barcode A and B oligos in order to join them together in proximity to form full barcodes A-B. Thus, the ligation would only occur at the intersection of two flows where both barcode A and barcode B were present. Afterwards, the tissue was imaged and digested to collect cDNA to perform the downstream procedure including template switch, PCR amplification, and tagmentation to prepare the NGS library for paired-end sequencing.
The attachment of PDMS chip to the tissue section was secured with a clamp set, and the clamping force could cause the deformation of tissue under the microfluidic channel walls. Therefore, after the application of two PDMS microfluidic chips onto the same tissue section in orthogonal directions, the slight deformation of tissue surface gave rise to a 2D grid of square features (Figure 1c), which allowed for the precise identification of individual DBiT-seq pixels and the corresponding location and morphology. The quality of cDNAs was evaluated by electrophoretic size distribution and compared between an archived FFPE mouse embryo sample and an FPA-fixed fresh frozen sample ( Figure S1a&b). We noticed that the FFPE sample cDNA fragment size peaked between 400 and 500 bps, significantly shorter than that of the PFA-fixed fresh frozen sample which had the main peaks over 1000 bps. The average size was calculated to be ~600 bps for FFPE and ~1,400 bps for the PFA-fixed fresh frozen sample. This difference was due in part to the formalin cross-linking of RNA and proteins causing reduced accessible RNA segment length and the degradation of RNA during storage. Next, we assessed the quality of spatial transcriptome sequencing data based on total number of genes or unique molecular identifiers (UMIs) per pixel (Figure 1d). For FFPE samples, we found the results were variable among different experiments and sample types. For the mouse embryo samples, we obtained on average of 520 UMIs and 355 genes per pixel. For the mouse aorta sample, the average number of UMIs or genes per pixel were 1,830 and 663, respectively. For the adult mouse heart FFPE samples, we detected 3,014 UMIs and 1040 genes for atrium and 2,140 UMIs and 832 genes for ventricle. In comparison, we revisited the dataset of a PFA-fixed fresh frozen mouse embryo sample analyzed by DBiT-seq, which showed an average of 4,688 UMIs and 2,100 genes with the same pixel size (25μm). In order to validate the gene expression profile, we performed correlation analysis of the pseudo-bulk DBiT-seq data between FFPE and PFA-fixed fresh frozen mouse embryo tissue samples. The Pearson correlation coefficient R was ~0.88 ( Figure S1c), which demonstrated a good agreement between the two types of experiments despite the difference in mapped tissue regions. The performance was also compared to spatial transcriptome mapping data from Slide-seq 15 and Slide-seqV2 18 , which were obtained using unfixed fresh frozen mouse brain or embryo tissue samples.
Using an E10.5 mouse embryo FFPE tissue (Figure 2a), we conducted DBiT-seq on two adjacent sections to analyze two anatomic areas -the brain region (FFPE-1) and the abdominal region (FFPE-2), respectively. Using the Seurat package, clustering analysis of spatial pixel transcriptomes combining DBiT-seq data from both samples revealed 10 distinct clusters ( Figure   2b). Mapping the clusters back to the spatial location identified spatially distinct patterns that agreed with the anatomical annotation ( Figure 2c). Cluster 0 mainly represents the muscle structure in embryo. Cluster 3 covers the central nerve system including neural tube, forebrain and related nervous tissues. Cluster 4 is specific for ganglions, which comprises the brain ganglions and the dorsal root ganglions (Figure 2c right). High spatial resolution allows us to observe individual bone segments in the spine (cluster 6). Liver is largely shown as cluster 7.
Heart comprises two layers of pixels with cluster 8 for myocardium and cluster 10 for epicardium.
Cluster 9 is scattered within the neural tube region, probably representing a specific subset of neurons. These results demonstrated that high-spatial-resolution DBiT-seq could resolve fine tissue structures close to the cellular level. We further conducted GO analysis (Figure 2d) for each cluster, and the GO pathways matched well the anatomical annotation. The top 10 differentially expressed genes (DEG) were shown in a heatmap ( Figure S2). We also conducted similar clustering analysis with each tissue sample as a separate dataset and the results revealed similar spatial patterns ( Figure S3). DEGs for each cluster can be analyzed and compared ( Figure S4). For example, Stmn2 and Mapt2, which encode microtubule associated proteins and are important for neuron development, were mainly expressed in forebrain and the neural tube.
Fabp7, a gene encoding the brain fatty acid binding protein, was expressed mainly in the hindbrain.
Myosin associated genes, Myl2, Myh7 and Myl3, were highly enriched in heart. Slc4a1, a gene related to blood coagulation, was detected extensively in liver, where most coagulation factors were produced. Copx, a heme biosynthetic enzyme encoding gene, was also produced in liver.
Afp, which encodes alpha-fetoprotein, one of the earliest proteins synthesized by the embryonic liver, was observed exclusively in an organ-specific manner.
We then applied SpatialDE, an unsupervised spatial differential gene expression analysis tool 19 , to the mouse embryo FFPE DBiT-seq data. It identified 30 spatial patterns for each of the two FFPE samples ( Figure S5&S6). GO analysis of the SpatialDE-identified gene sets revealed the biological meaning of each spatial pattern. For example in FFPE-1, pattern 0 represents neural precursor cell proliferation and pattern 7 corresponds to eye morphogenesis. In FFPE-2, cluster 20 is specific for the heme metabolic process and cluster 26 strongly enriched in the heart tissue is for cardiac muscle contraction.
To identify the dominant cell type in each pixel, we performed integrated analysis of mouse embryo (E10.5) DBiT-seq data and scRNA-seq data from literature corresponding to the same developmental stage of mouse embryos 20 . We first compared the aggregated "pseudo bulk" data between DBiT-seq and scRNA-seq by unsupervised clustering (Figure 2e). In the UMAP plot, DBiT-seq data of FFPE-1 and FFPE-2 were in close proximity with the scRNA-seq data of E10.5 mouse embryo samples, which validated the FFPE DBiT-seq data for capturing the correct embryonic age even with lower coverage or the number of genes detected. We then performed the integrated analysis of these two types of data by combining the transcriptomes of all individual pixels from DBiT-seq with the transcriptomes of single cells for clustering analysis in Seurat after normalization with SCTransform 21 . The DBiT-seq pixels conformed to the clusters of scRNA-seq ( Figure 3a), enabling the transfer of cell type annotations from single-cell transcriptomes to the spatial pixels and also to map different cell types back to spatial distribution (Figure 3d). In FFPE-1, cluster 3 mainly consisted of oligodendrocytes. Epithelial cells (cluster 4) and neural epithelial cells (cluster 13) were observed widely in epithelial glands. Interestingly, excitatory neurons (cluster 7) and inhibitory neurons (cluster 17) were both observed in the neural tube but forming a mixed pattern to fulfil their functionally distinct roles in transporting neurotransmitters. This integrative analysis answered an unresolved question in Figure 2c with regards to the specific subset of neurons observed in the neural tube by unsupervised clustering of DBiT-seq data alone.
In FFPE-2, several organ-specific cell types were detected. For example, the primitive erythroid cells (cluster 14) crucial for early embryonic erythroid development and the transition from embryo to fetus in developing mammals were strongly enriched in liver 22 . Cardiac muscle cell types were observed mainly in the heart region in agreement with the anatomical annotation. In brief, integration with published scRNA-seq data could distinguish cell identity more robustly as compared to the differential gene expression and GO analysis of DBiT-seq data alone.
We next examined a mouse aorta FFPE tissue section ( Figure 4a). The aorta tissue block was cross-sectioned, showing a thin wall of the artery along with the surrounding tissue. The heatmaps of gene and UMI counts ( Figure 4b) showed more than 1,000 genes detected in 50% of the tissue pixels. Unsupervised clustering did not show distinct spatial patterns due to the lack of distinct tissue types and the dominance of specific cell types such as smooth muscle cells in this sample ( Figure S7b). However, when integrated with scRNA-seq reference data from a mouse aorta 23 , we could identify six distinct cell types, including endothelial cells (ECs), arterial fibroblasts (Fibro), macrophages (Macro), monocytes(Mono), neurons and vascular smooth muscle cells (VSMCs). Most cells were ECs, VSMCs and Fibros. We also noticed that there was a layer of enriched smooth muscle cells in the artery wall, which were known to be the major cell type in a large artery 24 . We also performed automatic cell annotation using SingleR to analyze this aorta DBiT-seq data in comparison to the built-in reference database provided in the SingleR package based on scRNA-seq of mouse tissues ( Figure S7c). It is worth pointing out that adipocytes that normally exist in the supporting tissue around the artery were readily identified.
Meanwhile, the adipocyte-specific genes like Adipoq and Aoc3 were observed to express at high levels in the surround tissue region ( Figure S7d).
Lastly, we analyzed adult mouse atrium and ventricle FFPE samples using DBiT-seq ( Figure 4f&g). Although cardiomyocytes only account for 30-40% of the total cell number in a heart, the volume fraction of cardiomyocytes can reach up to 70-80% 25 . Indeed, we observed the expression of muscle-related genes like Myh6 extensively throughput the cardiac tissue ( Figure S8a), which encodes a protein known as the cardiac alpha (α)-myosin heavy chain. The fact that there is a large volume of cardiomyocytes in this tissue posed a challenge for spatial expression pattern analysis due to the dominance of one specific cell type and the lack of distinct anatomic landmarks. Unsupervised clustering of the DBiT-seq pixels from atrium and ventricle using Seurat could not resolve highly distinct clusters ( Figure S8b&c). However, when integrated with scRNA-seq reference data from the mouse hearts 26  In summary, we demonstrated spatially resolved transcriptome sequencing of FFPE tissue sections with 25μm pixel size. The data quality in terms of the number of UMIs and genes detected was lower than that from PFA-fixed frozen sections, but still yielded highly meaningful results with ~1,000 genes per pixel achieved across whole transcriptome, which was comparable to other high-spatial-resolution (10 or 20μm spot size) spatial transcriptome technologies 15,16 that are currently compatible with fresh frozen samples only. Applying our technology to mouse embryo FFPE tissues resulted in the identification of 11 spatial patterns that agreed with anatomical annotations. Integration with published scRNA-seq data further improved cell type identification and revealed that most spatial tissue pixels were dominated by single-cell transcriptome. We

Conflict of interests
R.F. is scientific founder and advisor of IsoPlexis, Singleron Biotechnologies, and AtlasXomics.
The interests of R.F. were reviewed and managed by Yale University Provost's Office in accordance with the University's conflict of interest policies.

Supplementary information
Supplementary Information can be found online at [to be inserted, SI is also provided as part of the manuscript submission].

Fabrication of microfluidic device
Soft lithography was used to produce the PDMS (polydimethylsiloxane) microfluidic device. The According to Zyagen protocol, the mice used in this project were purchased from Charles River Laboratories. Adult mice were sacrificed upon arrival, and the aorta, atrium and ventricle were collected. The embryo (E10.5) were collected the day the pregnant mouse was received. The FFPE tissues were processed following standard protocols, which includes fixation (10% formalin), dehydration (ethanol series: 70%-100%), clearing (100% xylene), paraffin infiltration and embedding. FFPE sample was sectioned with a thickness of 5-7 µm and placed onto a poly-Llysine coated glass slide. After receiving the sectioned FFPE slides, the tissue sections were stored at -80 °C in a sealed bag until use.

Deparaffinization of tissue section
Prior to deparaffinization, adult mouse or mouse embryo FFPE tissue slides were first baked at 60°C for 1 hour to ensure that the tissue sections were properly attached to glass slide.
Deparaffinization was performed by two times washing with Xylene (100%) for 5 minutes each.
To remove the remaining xylene, the section was washed 5 minutes with 100% ethanol. Tissue was then rehydrated by immersing in 90%, 70% and 50% ethanol for 5 minutes each, and finally placed in PBS with 0.1% Tween-20. The tissue was permeabilized for 5 minutes with Proteinase K 7.5µg/ml in PBST and fixed in 4% formaldehyde with 0.2% Glutaraldehyde for 20 minutes.  Table S1 and all other reagents were listed as Table S2 and S3.

In tissue reverse transcription with Barcode A
The deparaffinized tissue section was blocked by 1% BSA solution in PBS plus RNase inhibitor

In tissue ligation with Barcode B
The 2 nd PDMS slab with channels perpendicular to the 1 st PDMS was attached to the dried slide with care. A brightfield image was taken (10x, Thermo Fisher EVOS fl microscope) and the same clamp was used here to press the PDMS against the tissue. We then prepared 115.8 µL ligation mix by adding the following reagents into a 1.5 mL Eppendorf tube.

Tissue digestion
After removing the 2 nd PDMS, the tissue section was dipped in water and dried with air before taking the final brightfield image. Afterwards, we prepared proteinase K lysis solution, which contains 2 mg/mL proteinase K (Thermo Fisher), 10 mM Tris (pH = 8.0), 200 mM NaCl, 50 mM EDTA and 2% SDS. We then covered the tissue region of interest with a square well PDMS gasket and then loaded around ~25 µL of lysis solution into it. The lysis was performed at 55 °C for 2 hours in a wet box. The tissue lysate was collected into a 1.5 mL Eppendorf tube and purified using streptavidin beads (Dynabeads MyOne Streptavidin C1 beads, Thermo Fisher) or stored at -80 °C until use.

cDNA extraction
Before extraction, RNase free water was first added into the lysate to bring the total volume up to 100 µL. 5 µL of PMSF (100 µM, Sigma) was added to the lysate and incubated for 10 minutes at room temperature to inhibit the activity of Proteinase K. Meanwhile, the magnetic beads were cleaned three times with 1X B&W buffer with 0.05% Tween-20 and dispersed into 100 µL of 2X B&W buffer (with 2 μL of SUPERase In Rnase Inhibitor). After adding 100 µL of the cleaned streptavidin beads suspension to the lysate, the mixture was incubated for 60 minutes at room temperature with gentle shaking. Afterwards, the beads were washed twice with 1X B&W buffer and 1X Tris buffer (with 0.1% Tween-20) once.

Template switch
After cleaning, the beads were resuspended into 132 μL of the template switch reaction mix, which consists of: Template switch was performed first at room temperature for 30 minutes and then at 42 °C for 90 minutes. After reaction, the beads were pulled down using the magnetic stand and rinsed once with 500 μL 10 mM Tris plus 0.1% Tween-20, and then cleaned with 500 μL RNase free water.

PCR amplification
There are two separate PCR processes. In the first PCR, the cleaned beads with template switched cDNAs were first resuspended into the PCR mix, which contains 110 µL Kapa HiFi The PCR product was kept in 4°C until next step.

Sequencing library preparation
To remove remaining PCR primers, the PCR product was purified using the Ampure XP beads (Beckman Coulter) at 0.6x ratio following standard protocol. The purified cDNA was then quantified by an Agilent Bioanalyzer High Sensitivity Chip. A Nextera XT Library Prep Kit (Illumina, FC-131-1024) was used to prepare the sequencing library. 1 ng of the cDNA was used as the starting material, and the library preparation is following manufacture protocols. The library was then analyzed by bioanalyzer again and sequenced using a HiSeq 4000 sequencer with pair-end 100x100 mode.
DBiT-seq data pre-processing Read 1 of the raw sequencing data contains the transcriptome information, while Read 2 holds the UMI, Barcode A and Barcode B. Following ST pipeline v1.7.2 27 , Read 1 was trimmed, filtered, STAR mapped (STAR version 2.6.0a) against the mouse genome(GRCh38) and annotated using Gencode release M11. Most of the default parameters were used when running ST pipeline, except that the "--min-length-qual-trimming" was set to 10. The final expression matrix has the location info as rows and gene expression levels as columns. R package "ggolot2" was used to plot the spatial heatmaps for pan-mRNA or individual genes.

UMI and Gene Counts comparison with other techniques
To compare with other NGs based spatial RNA-seq technique (Fresh Frozen tissue sections), we downloaded published data: 10X Visium: 151673_filtered_feature_bc_matrix.h5 (human cortex) Slide-seq: Puck_180413_7(coronal hippocampus) 15 Slide-seqV2: Puck_190926_03 (mouse embryo) 18 DBiT-seq: Mouse embryo Brain E11 25μm resolution 17 The total UMI and Gene counts were calculated for each of the spots(pixels) and then the violin plots for each technique were plotted side-by-side.

Pseudo bulk comparison with reference
The pseudo bulk data for FFPE samples were obtained by summing counts for each gene in each sample and divided by the sum of total UMI counts, and further multiplied by 1 million. Similarly, the pseudo bulk data was calculated using the E9.5-E13.5 embryo scRNA-seq data from Reference paper 20 .

Clustering with Seurat
We used Seurat V3.2 28,29 to analyze the spatial transcriptome data of all the FFPE samples. Data integration and normalization were performed with the SCTransform workflow. The top 3,000 variable features were selected when doing data integration. For PCA analysis and UMAP visualization, the dimensions were set to 10, and the clustering resolution was set to 0.8.

Differentially expressed genes for each cluster was obtained by comparison of cells in individual
clusters against all remaining cells.

SpatialDE analysis
To study spatial patterns of gene expression, SpatialDE, an unsupervised automatic expression analysis tool was conducted for both adult heart and mouse embryo samples. Following standard workflow, SpatialDE identified >15 distinct spatial patterns in the mouse embryo sample. The results agree with Seurat pixel-based clustering results.

Cell type annotation
Cell type annotation was achieved by integration analysis (Seurat V3.2, SCTransform) combining the spatial transcriptome data of FFPE samples and the corresponding published scRNA-seq reference. After clustering, the spatial pixel data conformed well with the scRNA-seq data, and thus the cell types were assigned based on the scRNA-seq cell type annotation for each cluster (if two cell types presented in one cluster, the major cell types were assigned). SingleR is also used for aorta sample annotation with the built-in reference "MouseRNAseqData" 30 .

GO analysis
GO analysis was completed using the "GO Enrichment Analysis" module at http://geneontology.org/ with default settings. The biological process was ranked by the gene ratio and the top 3-5 biological process were plotted using the "dotplot" function in ggplot2.