Abstract
Formalin-fixed paraffin-embedded (FFPE) tissues are the most abundant archivable specimens in clinical tissue banks, but they are unfortunately incompatible with single-cell-level whole transcriptome sequencing due to RNA degradation during storage and RNA damage during extraction. We developed an in-tissue barcoding approach, DBiT-seq, for spatially resolved whole transcriptome sequencing at the cellular level, which requires no tissue dissociation or RNA extraction and is thus potentially better suited for FFPE samples. Herein, we demonstrated spatial transcriptome sequencing of embryonic and adult mouse FFPE tissue sections at the cellular level (25 μm pixel size) with high coverage (>1,000 genes per pixel). The spatial transcriptome of an E10.5 mouse embryo identified all major anatomical features in the brain and abdominal region. Integration with single-cell RNA-seq data for cell type identification indicated that most tissue pixels were dominated by a single-cell transcriptional phenotype. Spatial mapping of adult mouse aorta, atrium, and ventricle tissues identified the spatial distribution of a variety of cell types. Spatial transcriptome sequencing of FFPE samples at the cellular level may provide enormous opportunities in a wide range of biomedical research. It may allow us to exploit the huge resource of clinical tissue specimens to study human disease mechanisms and discover tissue biomarkers or therapeutic targets.
Clinical tissue samples are often stored as formalin-fixed paraffin-embedded (FFPE) blocks at room temperature, representing the most abundant resource of archived human specimens. For clinical histopathology and diagnostic purposes, tissue morphology is best preserved in FFPE as compared to other tissue banking methods, especially after prolonged storage1,2. Consequently, a huge volume of clinical FFPE tissue samples is readily available worldwide in hospitals and research institutions, a valuable resource for retrospective tissue profiling and human disease research3. However, during sample preparation and storage, nucleic acids, including mRNAs, in FFPE tissue often lose integrity and become partially degraded or fragmented4. Whole transcriptome analysis of FFPE samples using, for example, RNA sequencing (RNA-seq)5 requires a harsh chemical process for tissue de-crosslinking, digestion, and RNA extraction, which unfortunately results in significant RNA degradation, damage, and loss. Bulk tissue digestion also discards the spatial and cellular information needed to trace the cellular origin of mRNAs6, 7.
Despite recent breakthroughs in massively parallel single-cell RNA sequencing (scRNA-seq) that have transformed all major fields of biological and biomedical research8–10, FFPE samples are not yet amenable to single-cell transcriptome sequencing using current techniques. Spatial transcriptomics emerged to address a limitation of scRNA-seq by retaining the spatial information of gene expression in the tissue context, which is essential for a true mechanistic understanding of tissue organization, development, and pathogenesis. All early attempts at spatial transcriptomics were based on single-molecule fluorescence in situ hybridization (smFISH) or image-based in situ sequencing11–13. Measuring mRNA expression at the transcriptome level with these methods requires repeated hybridization and imaging cycles on high-end fluorescence microscopes, which is technically demanding, costly, and time-consuming. Moreover, most of these methods do not read RNAs base by base but rely on predesigned probes that detect known sequences only. It is therefore highly desirable to harness the power of next-generation sequencing (NGS) to realize unbiased, genome-wide profiling of spatial gene expression with high throughput and low cost. A barcoded solid-phase RNA capture approach was developed for coarse-resolution (~100 μm) spatial transcriptomics using DNA spot microarrays14, and was recently improved to cellular resolution (~10 μm) using self-assembled DNA barcode beads in Slide-seq and HDST15, 16. However, these NGS-based spatial transcriptomics methods are fundamentally limited by the requirement to de-crosslink FFPE tissues for RNA extraction, making it difficult to realize high-coverage transcriptome sequencing at the cellular level.
We have developed high-spatial-resolution spatial omics sequencing via deterministic barcoding in tissue (DBiT-seq)17, which is distinct from other NGS-based spatial transcriptome techniques in that it requires no de-crosslinking for mRNA release and yields high-quality transcriptome data from paraformaldehyde (PFA)-fixed tissue sections. Extending it to high-coverage spatial transcriptome sequencing of FFPE tissues at the cellular level would be another major leap. Herein, we demonstrated spatially resolved whole transcriptome mapping of mouse embryo (E10.5) FFPE tissue samples with a 25 μm pixel size and identified all major tissue types in the brain and abdominal region at the cellular level. Integration with scRNA-seq data allowed the identification of 40+ cell types and revealed that most tissue pixels were dominated by a single-cell transcriptome. Applying DBiT-seq to adult mouse heart (atrium and ventricle) and aorta tissues demonstrated high-coverage (>1,000 genes per pixel) spatial transcriptomes and the detection of sparse cell types in cardiovascular tissues. This work represents a major step toward unlocking the enormous resource of clinical histology specimens for human disease research.
The main workflow for FFPE samples is shown in Figure 1a. The banked FFPE tissue block was first cut on a microtome into sections 5–7 μm thick, which were placed onto poly-L-lysine-coated glass slides. FFPE tissue sections that are not analyzed right away should be stored at −80 °C prior to use to reduce oxidative RNA degradation from air exposure. Next, deparaffinization was carried out with a standard xylene wash. The tissue section was then rehydrated, permeabilized with proteinase K, and post-fixed with formalin. The deparaffinized tissue section ready for DBiT-seq exhibited a darkened, higher-contrast tissue morphology (Figure 1b). Then, the 1st PDMS chip with 50 parallel channels was attached to the tissue slide, and a set of DNA barcode oligos A1–A50, prepared in the reverse transcription (RT) mix, was flowed through the channels to perform in-tissue RT, producing cDNAs in situ with barcode A incorporated at the 3′ end. Afterwards, the 1st PDMS chip was removed and replaced with a 2nd PDMS chip containing another set of 50 microchannels perpendicular to the first. Ligation was then performed in each channel by flowing a set of barcode oligos B1–B50 plus a universal ligation linker, which is complementary to the half-linker sequences in the barcode A and B oligos and joins them in proximity to form full barcodes A-B. Thus, ligation occurs only at the intersection of two flows where both barcode A and barcode B are present. Afterwards, the tissue was imaged and digested to collect cDNA for the downstream procedures, including template switching, PCR amplification, and tagmentation, to prepare the NGS library for paired-end sequencing.
The PDMS chip was secured to the tissue section with a clamp set, and the clamping force could deform the tissue under the microfluidic channel walls. Therefore, after two PDMS microfluidic chips were applied to the same tissue section in orthogonal directions, the slight deformation of the tissue surface gave rise to a 2D grid of square features (Figure 1c), which allowed precise identification of individual DBiT-seq pixels and their corresponding locations and morphology. cDNA quality was evaluated by electrophoretic size distribution and compared between an archived FFPE mouse embryo sample and a PFA-fixed fresh frozen sample (Figure S1a&b). The FFPE sample cDNA fragment size peaked between 400 and 500 bp, significantly shorter than that of the PFA-fixed fresh frozen sample, whose main peaks were over 1,000 bp. The average size was calculated to be ~600 bp for FFPE and ~1,400 bp for the PFA-fixed fresh frozen sample. This difference was due in part to formalin cross-linking of RNA and proteins, which reduces the accessible RNA segment length, and to RNA degradation during storage. Next, we assessed the quality of the spatial transcriptome sequencing data based on the number of genes or unique molecular identifiers (UMIs) per pixel (Figure 1d). For FFPE samples, the results varied among experiments and sample types. For the mouse embryo samples, we obtained an average of 520 UMIs and 355 genes per pixel. For the mouse aorta sample, the average numbers of UMIs and genes per pixel were 1,830 and 663, respectively. For the adult mouse heart FFPE samples, we detected 3,014 UMIs and 1,040 genes per pixel for the atrium and 2,140 UMIs and 832 genes per pixel for the ventricle. For comparison, we revisited the dataset of a PFA-fixed fresh frozen mouse embryo sample analyzed by DBiT-seq, which showed an average of 4,688 UMIs and 2,100 genes per pixel with the same pixel size (25 μm).
To validate the gene expression profile, we performed correlation analysis of the pseudo-bulk DBiT-seq data between FFPE and PFA-fixed fresh frozen mouse embryo tissue samples. The Pearson correlation coefficient R was ~0.88 (Figure S1c), indicating good agreement between the two types of experiments despite the difference in mapped tissue regions. The performance was also compared to spatial transcriptome mapping data from Slide-seq15 and Slide-seqV218, which were obtained using unfixed fresh frozen mouse brain or embryo tissue samples.
Using an E10.5 mouse embryo FFPE tissue (Figure 2a), we conducted DBiT-seq on two adjacent sections to analyze two anatomic areas: the brain region (FFPE-1) and the abdominal region (FFPE-2). Clustering analysis of the spatial pixel transcriptomes with the Seurat package, combining DBiT-seq data from both samples, revealed 10 distinct clusters (Figure 2b). Mapping the clusters back to their spatial locations identified spatially distinct patterns that agreed with the anatomical annotation (Figure 2c). Cluster 0 mainly represents embryonic muscle structure. Cluster 3 covers the central nervous system, including the neural tube, forebrain, and related nervous tissues. Cluster 4 is specific to ganglia, comprising the brain ganglia and the dorsal root ganglia (Figure 2c, right). The high spatial resolution allows individual bone segments in the spine to be observed (cluster 6). The liver is largely represented by cluster 7. The heart comprises two layers of pixels, with cluster 8 for myocardium and cluster 10 for epicardium. Cluster 9 is scattered within the neural tube region, probably representing a specific subset of neurons. These results demonstrated that high-spatial-resolution DBiT-seq could resolve fine tissue structures close to the cellular level. We further conducted GO analysis (Figure 2d) for each cluster, and the GO pathways matched the anatomical annotation well. The top 10 differentially expressed genes (DEGs) are shown in a heatmap (Figure S2). We also conducted a similar clustering analysis treating each tissue sample as a separate dataset, and the results revealed similar spatial patterns (Figure S3). DEGs for each cluster can be analyzed and compared (Figure S4). For example, Stmn2 and Mapt2, which encode microtubule-associated proteins important for neuron development, were mainly expressed in the forebrain and the neural tube. Fabp7, a gene encoding the brain fatty acid binding protein, was expressed mainly in the hindbrain.
Myosin-associated genes, Myl2, Myh7, and Myl3, were highly enriched in the heart. Slc4a1, a gene related to blood coagulation, was detected extensively in the liver, where most coagulation factors are produced. Cpox, which encodes a heme biosynthetic enzyme, was also expressed in the liver. Afp, which encodes alpha-fetoprotein, one of the earliest proteins synthesized by the embryonic liver, was observed in a strictly organ-specific manner.
We then applied SpatialDE, an unsupervised spatial differential gene expression analysis tool19, to the mouse embryo FFPE DBiT-seq data. It identified 30 spatial patterns for each of the two FFPE samples (Figure S5&S6). GO analysis of the SpatialDE-identified gene sets revealed the biological meaning of each spatial pattern. For example, in FFPE-1, pattern 0 represents neural precursor cell proliferation and pattern 7 corresponds to eye morphogenesis. In FFPE-2, pattern 20 is specific to the heme metabolic process, and pattern 26, strongly enriched in the heart tissue, corresponds to cardiac muscle contraction.
To identify the dominant cell type in each pixel, we performed an integrated analysis of mouse embryo (E10.5) DBiT-seq data and published scRNA-seq data from mouse embryos at the same developmental stage20. We first compared the aggregated "pseudo-bulk" data between DBiT-seq and scRNA-seq by unsupervised clustering (Figure 2e). In the UMAP plot, the DBiT-seq data of FFPE-1 and FFPE-2 were in close proximity to the scRNA-seq data of E10.5 mouse embryo samples, confirming that the FFPE DBiT-seq data captured the correct embryonic age despite lower coverage (fewer genes detected). We then performed an integrated analysis of the two data types by combining the transcriptomes of all individual DBiT-seq pixels with the transcriptomes of single cells for clustering in Seurat after normalization with SCTransform21. The DBiT-seq pixels conformed to the scRNA-seq clusters (Figure 3a), enabling the transfer of cell type annotations from single-cell transcriptomes to the spatial pixels and the mapping of different cell types back to their spatial distribution (Figure 3d). In FFPE-1, cluster 3 mainly consisted of oligodendrocytes. Epithelial cells (cluster 4) and neural epithelial cells (cluster 13) were observed widely in epithelial glands. Interestingly, excitatory neurons (cluster 7) and inhibitory neurons (cluster 17) were both observed in the neural tube, forming an intermixed pattern consistent with their functionally distinct roles in neurotransmission. This integrative analysis resolved an open question from Figure 2c regarding the specific subset of neurons observed in the neural tube by unsupervised clustering of DBiT-seq data alone. In FFPE-2, several organ-specific cell types were detected. For example, primitive erythroid cells (cluster 14), which are crucial for early embryonic erythroid development and the transition from embryo to fetus in developing mammals, were strongly enriched in the liver22.
Cardiac muscle cell types were observed mainly in the heart region, in agreement with the anatomical annotation. In brief, integration with published scRNA-seq data distinguished cell identity more robustly than differential gene expression and GO analysis of DBiT-seq data alone.
We next examined a mouse aorta FFPE tissue section (Figure 4a). The aorta tissue block was cross-sectioned, showing the thin artery wall along with the surrounding tissue. Heatmaps of gene and UMI counts (Figure 4b) showed more than 1,000 genes detected in 50% of the tissue pixels. Unsupervised clustering did not show distinct spatial patterns, owing to the lack of distinct tissue types and the dominance of specific cell types, such as smooth muscle cells, in this sample (Figure S7b). However, when integrated with scRNA-seq reference data from mouse aorta23, we could identify six distinct cell types: endothelial cells (ECs), arterial fibroblasts (Fibro), macrophages (Macro), monocytes (Mono), neurons, and vascular smooth muscle cells (VSMCs). Most cells were ECs, VSMCs, and fibroblasts. We also noticed a layer of enriched smooth muscle cells in the artery wall, which are known to be the major cell type in large arteries24. We also performed automatic cell annotation of the aorta DBiT-seq data using SingleR with the built-in reference database provided in the SingleR package, which is based on scRNA-seq of mouse tissues (Figure S7c). Notably, adipocytes, which normally exist in the supporting tissue around the artery, were readily identified. Consistently, adipocyte-specific genes such as Adipoq and Aoc3 were expressed at high levels in the surrounding tissue region (Figure S7d).
Lastly, we analyzed adult mouse atrium and ventricle FFPE samples using DBiT-seq (Figure 4f&g). Although cardiomyocytes account for only 30–40% of the total cell number in the heart, their volume fraction can reach 70–80%25. Indeed, we observed extensive expression of muscle-related genes throughout the cardiac tissue (Figure S8a), such as Myh6, which encodes the cardiac alpha (α)-myosin heavy chain. The large volume of cardiomyocytes in this tissue posed a challenge for spatial expression pattern analysis, owing to the dominance of one specific cell type and the lack of distinct anatomic landmarks. Unsupervised clustering of the DBiT-seq pixels from atrium and ventricle using Seurat could not resolve highly distinct clusters (Figure S8b&c). However, when integrated with scRNA-seq reference data from mouse hearts26, the DBiT-seq pixels of atrium and ventricle conformed rather well to single-cell transcriptional clusters, revealing a total of 15 clusters (Figure 4h&j). These clusters were then annotated using the cell types defined by scRNA-seq (Figure S9). The results confirmed that cardiomyocytes were still the main cell type in this tissue and were observed across multiple spatial clusters (Figure 5d&f), for example, clusters 1, 4, and 8 in the atrium. A significant number of endothelial cells were observed, presumably corresponding to the coronary microvasculature in the myocardium. Other cell types, such as stromal fibroblasts and macrophages, were observed, presumably in the interstitial space between cardiomyocyte fibers in the mouse heart.
In summary, we demonstrated spatially resolved transcriptome sequencing of FFPE tissue sections with a 25 μm pixel size. The data quality in terms of the number of UMIs and genes detected was lower than that from PFA-fixed frozen sections, but the data still yielded highly meaningful results, with ~1,000 genes per pixel detected across the whole transcriptome, comparable to other high-spatial-resolution (10 or 20 μm spot size) spatial transcriptome technologies15, 16 that are currently compatible with fresh frozen samples only. Applying our technology to mouse embryo FFPE tissues identified 11 spatial patterns that agreed with anatomical annotations. Integration with published scRNA-seq data further improved cell type identification and revealed that most spatial tissue pixels were dominated by a single-cell transcriptome. We further analyzed adult mouse aorta, atrium, and ventricle FFPE tissue samples and revealed a wide range of cell types localized in the interstitial space of the myocardium or the perivascular supporting tissue. As FFPE samples are widely available and represent the most abundant format of archivable clinical tissue samples, we envision that this work will open up new opportunities to revisit the huge resource of clinical tissue banks to study the mechanisms of pathophysiology and to discover new targets for the diagnosis and treatment of human diseases.
Author contributions
Conceptualization: Y.L. and R.F.; Methodology: Y.L., A.E., and Y.D.; Experimental investigation: Y.L., A.E., and Y.D.; Data analysis: Y.L., A.E., Y.D., and R.F.; Writing – Original Draft: Y.L. and R.F.; Writing – Review and Editing: Y.L., A.E., Y.D., and R.F.
Conflict of interests
R.F. is scientific founder and advisor of IsoPlexis, Singleron Biotechnologies, and AtlasXomics. The interests of R.F. were reviewed and managed by Yale University Provost’s Office in accordance with the University’s conflict of interest policies.
Supplementary information
Supplementary Information can be found online at [to be inserted, SI is also provided as part of the manuscript submission].
Methods
Fabrication of microfluidic device
Soft lithography was used to produce the PDMS (polydimethylsiloxane) microfluidic device. The chrome mask was printed at high resolution (2 μm) by Front Range Photomasks (Lake Havasu City, AZ). Upon receipt, the chrome mask was cleaned with acetone to remove any dirt or dust. The SU-8 negative photoresist (SU-8 2025) mold was fabricated on a precleaned silicon wafer substrate according to the manufacturer's (MicroChem) recommendations. The final SU-8 layer of the mold had a thickness of ~25 μm and a channel width of 25 μm. PDMS microfluidic chips were fabricated by replica molding. GE RTV PDMS parts A and B were mixed thoroughly at a 10:1 ratio and poured onto the mold. After degassing for 30 min, the PDMS was cured in a 75 °C oven for 2 hours. The cured PDMS slab was then cut and punched with inlet and outlet holes using a 2 mm diameter puncher. The acrylic clamps (rectangular, 22 mm × 40 mm) used to strengthen the attachment of the PDMS to the glass slide were fabricated using a laser cutter.
Tissue Handling
FFPE samples of adult mouse and mouse embryo were obtained from Zyagen (San Diego, CA). According to the Zyagen protocol, the mice used in this project were purchased from Charles River Laboratories. Adult mice were sacrificed upon arrival, and the aorta, atrium, and ventricle were collected. The E10.5 embryos were collected on the day the pregnant mouse was received. The FFPE tissues were processed following standard protocols, which include fixation (10% formalin), dehydration (ethanol series: 70%–100%), clearing (100% xylene), paraffin infiltration, and embedding. FFPE samples were sectioned at a thickness of 5–7 μm and placed onto poly-L-lysine-coated glass slides. After the sectioned FFPE slides were received, the tissue sections were stored at −80 °C in a sealed bag until use.
Deparaffinization of tissue section
Prior to deparaffinization, adult mouse or mouse embryo FFPE tissue slides were baked at 60 °C for 1 hour to ensure that the tissue sections were properly attached to the glass slide. Deparaffinization was performed by washing twice with xylene (100%) for 5 minutes each. To remove the remaining xylene, the section was washed for 5 minutes with 100% ethanol. The tissue was then rehydrated by immersion in 90%, 70%, and 50% ethanol for 5 minutes each, and finally placed in PBS with 0.1% Tween-20 (PBST). The tissue was permeabilized for 5 minutes with 7.5 μg/mL proteinase K in PBST and fixed in 4% formaldehyde with 0.2% glutaraldehyde for 20 minutes.
DNA oligo design
Two sets of DNA barcodes (A1–A50 and B1–B50) were used in this study. Barcodes A1–A50 had three functional regions: a 16-mer poly-T region, an 8-mer spatial barcode region (marking the Y-axis location), and a 15-mer ligation linker region (see the example Barcode A below). Barcodes A1–A50 served as RT primers and were loaded into each of the 50 channels of the 1st PDMS chip along with the reverse transcription mix. The resulting cDNA products were then ligated to barcodes B1–B50 during the ligation step. Barcode B had four functional parts: a 15-mer ligation linker, an 8-mer spatial barcode region (marking the X-axis location), a 10-mer unique molecular identifier (UMI), and a biotin-functionalized PCR handle used for purification. Before loading into the 2nd PDMS chip, barcode B was first annealed to a complementary ligation linker strand and then mixed with the DNA ligase reaction mix. The ligation product bearing the X and Y location information was then extracted and processed in the downstream steps. In theory, there were 2,500 pixels in a 2.5 mm × 2.5 mm square tissue region.
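The combinatorial addressing described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: it assumes barcode indices are numbered 1 to 50, that barcode A sets the row (Y-axis) and barcode B the column (X-axis), and the helper names `pixel_position` and `pixel_id` are hypothetical.

```python
from __future__ import annotations

# Illustrative sketch (hypothetical helpers, not the published pipeline):
# map a barcode pair A{i}-B{j} to a pixel on the 50 x 50 DBiT-seq grid.
# Assumption: barcode A encodes the row (Y-axis), barcode B the column (X-axis).

N_CHANNELS = 50  # parallel microchannels per PDMS chip


def pixel_position(a_index: int, b_index: int) -> tuple[int, int]:
    """Return the 0-based (row, col) of the pixel tagged by A{a_index}-B{b_index}."""
    for index, name in ((a_index, "A"), (b_index, "B")):
        if not 1 <= index <= N_CHANNELS:
            raise ValueError(f"barcode {name}{index} is outside 1..{N_CHANNELS}")
    return a_index - 1, b_index - 1


def pixel_id(a_index: int, b_index: int) -> int:
    """Return a unique 0-based pixel number in row-major order."""
    row, col = pixel_position(a_index, b_index)
    return row * N_CHANNELS + col


# Each pixel sits at one intersection of an A channel and a B channel,
# so every A-B pair is a distinct pixel: 50 x 50 = 2,500 pixels.
all_pixels = {
    pixel_id(a, b)
    for a in range(1, N_CHANNELS + 1)
    for b in range(1, N_CHANNELS + 1)
}
print(len(all_pixels))  # 2500
```

Because ligation only happens where an A channel crosses a B channel, the pair of indices is sufficient to locate a pixel; no image registration is needed to assign reads to grid positions.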
DNA barcode Examples
Barcode A1:
/5Phos/AGGCCAGAGCATTCGAACGTGATTTTTTTTTTTTTTTTVN
Barcode B1: /5Biosg/CAAGCGTTGGCTTCTCGCATCTNNNNNNNNNNAACAACCAATCCACGTGCTTGAG
DNA oligos and barcodes used in this paper are listed in Table S1, and all other reagents are listed in Tables S2 and S3.
In-tissue reverse transcription with Barcode A
The deparaffinized tissue section was blocked with 1% BSA in PBS plus RNase inhibitor (0.05 U/μL, Enzymatics) for 30 minutes at room temperature. After washing three times with 1X PBS and once with water, the 1st PDMS slab with 50 channels was placed on the glass slide, covering the tissue region of interest. A brightfield image (10x, Thermo Fisher EVOS fl microscope) was recorded and later used to identify pixel locations. Afterwards, an acrylic clamp with screws was clamped over the central tissue region of interest.
The Reverse Transcription solution (225 μL) was first prepared by mixing:
50 μL of RT buffer (5X, Maxima H Minus kit),
32.8 μL of RNase free water,
1.6 μL of RNase Inhibitor (Enzymatics),
3.1 μL of SuperaseIn RNase Inhibitor (Ambion),
12.5 μL of dNTPs (10 mM, Thermo Fisher),
25 μL of Reverse Transcriptase (Thermo Fisher),
100 μL of 0.5X PBS with Inhibitor (0.05U/μL, Enzymatics).
After vortex mixing, the RT solution was aliquoted into 50 tubes of 4.5 μL each. Then, 0.5 μL of a different barcode A (A1–A50, 25 μM) was added to each tube and mixed thoroughly. The 50 tubes of 5 μL RT reaction solution were loaded into the 50 inlets (each can hold >10 μL of solution) on the PDMS chip. To fill the channels and remove air bubbles, the solution was pulled through each of the 50 channels with continuous vacuum for 3 minutes. The chip was then placed in a wet box and incubated at room temperature for 30 minutes and then at 42 °C for another 1.5 hours. After RT, the channels were washed with 1X NEB buffer 3.1 (New England Biolabs) containing 1% RNase inhibitor (Enzymatics) continuously for 10 minutes. Finally, the clamp and PDMS were removed from the tissue slide. The slide was quickly dipped in water and air-dried.
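As a quick consistency check on the RT recipe above, the sketch below sums the master-mix volumes and applies C1V1 = C2V2 to the per-channel reaction. The volumes and stock concentrations are copied from the text; the helper `final_concentration` is a hypothetical name, and the sketch only checks arithmetic, not reaction chemistry.

```python
import math

# RT master mix (uL), copied from the recipe above.
RT_MIX_UL = {
    "RT buffer (5X)": 50.0,
    "RNase-free water": 32.8,
    "RNase inhibitor (Enzymatics)": 1.6,
    "SuperaseIn RNase inhibitor": 3.1,
    "dNTPs (10 mM)": 12.5,
    "Reverse transcriptase": 25.0,
    "0.5X PBS with inhibitor": 100.0,
}
N_CHANNELS = 50
MIX_PER_CHANNEL_UL = 4.5
BARCODE_A_UL = 0.5
BARCODE_A_STOCK_UM = 25.0
REACTION_UL = MIX_PER_CHANNEL_UL + BARCODE_A_UL  # 5 uL per channel


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """C2 = C1 * V1 / V2 for a single dilution step."""
    return stock * added_ul / total_ul


mix_total_ul = sum(RT_MIX_UL.values())
# The 5X buffer is diluted first into the master mix, then by the barcode addition.
buffer_in_mix = final_concentration(5.0, RT_MIX_UL["RT buffer (5X)"], mix_total_ul)
buffer_final_x = final_concentration(buffer_in_mix, MIX_PER_CHANNEL_UL, REACTION_UL)
barcode_final_um = final_concentration(BARCODE_A_STOCK_UM, BARCODE_A_UL, REACTION_UL)

print(f"master mix: {mix_total_ul:.1f} uL; needed: {N_CHANNELS * MIX_PER_CHANNEL_UL:.1f} uL")
print(f"RT buffer in reaction: {buffer_final_x:.2f}X")
print(f"barcode A in reaction: {barcode_final_um:.1f} uM")
```

The arithmetic shows the 225 μL mix equals exactly 50 × 4.5 μL, that the RT buffer ends at 1X in each 5 μL channel reaction, and that each barcode A is present at 2.5 μM.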
In-tissue ligation with Barcode B
The 2nd PDMS slab, with channels perpendicular to those of the 1st PDMS, was carefully attached to the dried slide. A brightfield image was taken (10x, Thermo Fisher EVOS fl microscope), and the same clamp was used to press the PDMS against the tissue. We then prepared 115.8 μL of ligation mix by adding the following reagents to a 1.5 mL Eppendorf tube:
69.5 μL of RNase free water,
27 μL of T4 DNA ligase buffer (10X, New England Biolabs),
11 μL T4 DNA ligase (400 U/μL, New England Biolabs),
2.2 μL RNase inhibitor (40 U/μL, Enzymatics),
0.7 μL SuperaseIn RNase Inhibitor (20 U/μL, Ambion),
5.4 μL of Triton X-100 (5%).
DNA barcode B was first annealed with the ligation linker by mixing 25 μL of barcode B (100 μM), 25 μL of ligation linker (100 μM), and 50 μL of annealing buffer (10 mM Tris, pH 7.5–8.0, 50 mM NaCl, 1 mM EDTA). For each of the 50 channels, 5 μL of ligation reaction solution was prepared by combining 2 μL of ligation mix, 2 μL of NEB buffer 3.1 (1X, New England Biolabs), and 1 μL of the annealed barcode B/ligation linker mix (B1–B50, 25 μM), and was then loaded into the channel with vacuum. The chip was kept in a wet box and incubated at 37 °C for 30 minutes. After washing by flowing 1X PBS with 0.1% Triton X-100 and 0.25% SUPERase In RNase Inhibitor for 10 minutes, the clamp and PDMS were removed, and the dried slide was ready for tissue digestion.
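The same C1V1 = C2V2 arithmetic applies to the ligation step. The sketch below uses only volumes and concentrations stated above to check the ligation mix total, the barcode B concentration after annealing, and the final barcode B and T4 DNA ligase concentrations in each 5 μL channel reaction. It assumes the annealed barcode B/linker duplex concentration equals that of barcode B; the helper name `dilute` is hypothetical.

```python
import math

# Ligation mix (uL), copied from the recipe above.
LIGATION_MIX_UL = {
    "RNase-free water": 69.5,
    "T4 DNA ligase buffer (10X)": 27.0,
    "T4 DNA ligase (400 U/uL)": 11.0,
    "RNase inhibitor (40 U/uL)": 2.2,
    "SuperaseIn (20 U/uL)": 0.7,
    "Triton X-100 (5%)": 5.4,
}
N_CHANNELS = 50
LIGATION_MIX_PER_CHANNEL_UL = 2.0
NEB_3_1_UL = 2.0
ANNEALED_B_UL = 1.0
REACTION_UL = LIGATION_MIX_PER_CHANNEL_UL + NEB_3_1_UL + ANNEALED_B_UL  # 5 uL


def dilute(stock: float, added_ul: float, total_ul: float) -> float:
    """C2 = C1 * V1 / V2 for a single dilution step."""
    return stock * added_ul / total_ul


mix_total_ul = sum(LIGATION_MIX_UL.values())

# Annealing: 25 uL barcode B (100 uM) + 25 uL linker (100 uM) + 50 uL buffer.
annealed_b_um = dilute(100.0, 25.0, 25.0 + 25.0 + 50.0)
barcode_b_final_um = dilute(annealed_b_um, ANNEALED_B_UL, REACTION_UL)

# T4 ligase: diluted into the mix, then 2 uL of mix into the 5 uL reaction.
ligase_in_mix = dilute(400.0, LIGATION_MIX_UL["T4 DNA ligase (400 U/uL)"], mix_total_ul)
ligase_final = dilute(ligase_in_mix, LIGATION_MIX_PER_CHANNEL_UL, REACTION_UL)

print(f"ligation mix: {mix_total_ul:.1f} uL; used: {N_CHANNELS * LIGATION_MIX_PER_CHANNEL_UL:.1f} uL")
print(f"annealed barcode B: {annealed_b_um:.0f} uM; in reaction: {barcode_b_final_um:.0f} uM")
print(f"T4 DNA ligase in reaction: ~{ligase_final:.1f} U/uL")
```

The annealed stock works out to 25 μM, matching the concentration quoted for B1–B50, and 50 channels consume 100 μL of the 115.8 μL ligation mix.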
Tissue digestion
After removal of the 2nd PDMS chip, the tissue section was dipped in water and air-dried before the final brightfield image was taken. Afterwards, we prepared proteinase K lysis solution containing 2 mg/mL proteinase K (Thermo Fisher), 10 mM Tris (pH 8.0), 200 mM NaCl, 50 mM EDTA, and 2% SDS. We then covered the tissue region of interest with a square-well PDMS gasket and loaded ~25 μL of lysis solution into it. Lysis was performed at 55 °C for 2 hours in a wet box. The tissue lysate was collected into a 1.5 mL Eppendorf tube and either purified using streptavidin beads (Dynabeads MyOne Streptavidin C1 beads, Thermo Fisher) or stored at −80 °C until use.
cDNA extraction
Before extraction, RNase-free water was added to the lysate to bring the total volume to 100 μL. Then, 5 μL of PMSF (100 μM, Sigma) was added to the lysate and incubated for 10 minutes at room temperature to inhibit proteinase K activity. Meanwhile, the magnetic beads were washed three times with 1X B&W buffer containing 0.05% Tween-20 and resuspended in 100 μL of 2X B&W buffer (with 2 μL of SUPERase In RNase Inhibitor). After 100 μL of the cleaned streptavidin bead suspension was added to the lysate, the mixture was incubated for 60 minutes at room temperature with gentle shaking. Afterwards, the beads were washed twice with 1X B&W buffer and once with 1X Tris buffer (with 0.1% Tween-20).
Template switch
After cleaning, the beads were resuspended in 132 μL of the template switch reaction mix, which consists of:
44 μL 5X Maxima RT buffer (Thermo Fisher),
44 μL of 20% Ficoll PM-400 solution (Sigma),
22 μL of 10 mM dNTPs each (Thermo Fisher),
5.5 μL of RNase Inhibitor (Enzymatics),
11 μL of Maxima H Minus Reverse Transcriptase (Thermo Fisher),
5.5 μL of a template switch primer (100 μM).
Template switch was performed first at room temperature for 30 minutes and then at 42 °C for 90 minutes. After reaction, the beads were pulled down using the magnetic stand and rinsed once with 500 μL 10 mM Tris plus 0.1% Tween-20, and then cleaned with 500 μL RNase free water.
PCR amplification
There are two separate PCR steps. In the first PCR, the cleaned beads carrying template-switched cDNAs were resuspended in the PCR mix, which contains 110 μL of Kapa HiFi HotStart Master Mix (Kapa Biosystems), 8.8 μL of 10 μM stocks of primers 1 and 2, and 92.4 μL of water. The mix was aliquoted into 4 PCR tubes of ~50 μL each. The PCR was then performed as follows: incubation at 95 °C for 3 minutes, then five cycles at
98°C for 20 seconds,
65°C for 45 seconds,
72°C for 3 minutes.
After the reaction, the beads were removed, and the supernatant was collected and transferred into 4 new PCR tubes. A second PCR was performed by first incubating at 95 °C for 3 minutes, followed by 20 cycles at
98°C for 20 seconds,
65°C for 20 seconds,
72°C for 3 minutes.
The PCR product was kept at 4 °C until the next step.
Sequencing library preparation
To remove the remaining PCR primers, the PCR product was purified using AMPure XP beads (Beckman Coulter) at a 0.6x ratio following the standard protocol. The purified cDNA was then quantified with an Agilent Bioanalyzer High Sensitivity chip. The sequencing library was prepared with a Nextera XT Library Prep Kit (Illumina, FC-131-1024), using 1 ng of cDNA as the starting material and following the manufacturer's protocol. The library was then analyzed on the Bioanalyzer again and sequenced on a HiSeq 4000 sequencer in paired-end 100×100 mode.
DBiT-seq data pre-processing
Read 1 of the raw sequencing data contains the transcriptome information, while Read 2 holds the UMI, Barcode A, and Barcode B. Following ST pipeline (v1.7.2)27, Read 1 was trimmed, filtered, mapped with STAR (version 2.6.0a) against the mouse genome (GRCm38), and annotated using GENCODE release M11. Default parameters were used for most settings when running ST pipeline, except that “--min-length-qual-trimming” was set to 10. The final expression matrix has spatial locations as rows and gene expression levels as columns. The R package “ggplot2” was used to plot the spatial heatmaps for pan-mRNA or individual genes.
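Because Read 2 carries the spatial address, demultiplexing amounts to slicing fixed-length fields. The sketch below assumes a Read 2 layout inferred from the example Barcode A1 and B1 oligos in the DNA oligo design section: starting immediately after barcode B's PCR handle, a 10-nt UMI, the 8-nt barcode B, a 30-nt linker (barcode B's 15-nt half-linker ligated to barcode A's 15-nt half-linker), and then the 8-nt barcode A. These offsets, the linker mismatch tolerance, and the function name `parse_read2` are assumptions for illustration; the published processing may handle Read 2 differently.

```python
from __future__ import annotations

from typing import NamedTuple

# Assumed Read 2 layout (0-based, end-exclusive), inferred from the example
# Barcode A1/B1 oligos; treat these offsets as hypothetical.
UMI = slice(0, 10)
BARCODE_B = slice(10, 18)
LINKER = slice(18, 48)
BARCODE_A = slice(48, 56)
# 3' half-linker of barcode B followed by the 5' half-linker of barcode A.
EXPECTED_LINKER = "ATCCACGTGCTTGAG" + "AGGCCAGAGCATTCG"


class Read2Tags(NamedTuple):
    umi: str
    barcode_a: str  # Y-axis (row) barcode
    barcode_b: str  # X-axis (column) barcode


def parse_read2(seq: str, max_linker_mismatches: int = 3) -> Read2Tags | None:
    """Extract the UMI and spatial barcodes, or return None if the read fails QC."""
    if len(seq) < BARCODE_A.stop:
        return None  # too short to contain barcode A
    linker = seq[LINKER]
    mismatches = sum(a != b for a, b in zip(linker, EXPECTED_LINKER))
    if mismatches > max_linker_mismatches:
        return None  # linker not recognized; likely not a valid A-B product
    return Read2Tags(umi=seq[UMI], barcode_a=seq[BARCODE_A], barcode_b=seq[BARCODE_B])


# Example read built from the Barcode A1/B1 sequences with an arbitrary UMI.
read = "GATTACAGAT" + "AACAACCA" + EXPECTED_LINKER + "AACGTGAT" + "TTTTTTTTTT"
print(parse_read2(read))
```

Checking the fixed linker before trusting the barcode positions is a cheap way to discard reads from unligated or truncated products; a real pipeline would then look up each 8-nt barcode against the A1–A50 and B1–B50 whitelists in Table S1.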
UMI and Gene Counts comparison with other techniques
To compare with other NGS-based spatial RNA-seq techniques (using fresh frozen tissue sections), we downloaded the following published data:
10X Visium: 151673_filtered_feature_bc_matrix.h5 (human cortex)
Slide-seq: Puck_180413_7(coronal hippocampus)15
Slide-seqV2: Puck_190926_03 (mouse embryo)18
DBiT-seq: Mouse embryo Brain E11 25μm resolution17
The total UMI and gene counts were calculated for each spot (pixel), and violin plots for each technique were plotted side by side.
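The per-pixel metrics are simple reductions over the pixel-by-gene count matrix: total UMIs is the row sum, and detected genes is the number of nonzero entries in the row. A minimal pure-Python sketch, assuming rows are pixels and columns are genes as in the expression matrix described above; the function name and the toy matrix are made up for illustration.

```python
from __future__ import annotations

from statistics import mean, median


def per_pixel_qc(matrix: list[list[int]]) -> tuple[list[int], list[int]]:
    """Return (UMIs per pixel, genes detected per pixel) for a pixel x gene matrix."""
    umis = [sum(row) for row in matrix]
    genes = [sum(1 for count in row if count > 0) for row in matrix]
    return umis, genes


# Toy 3-pixel x 4-gene count matrix (made-up values).
counts = [
    [0, 2, 1, 0],
    [5, 0, 0, 0],
    [1, 1, 1, 1],
]
umis, genes = per_pixel_qc(counts)
print(umis, genes)  # [3, 5, 4] [2, 1, 4]
print(mean(umis), median(genes))
```

The per-pixel lists are exactly what a violin plot summarizes, and their means correspond to the per-pixel averages reported for each sample.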
Pseudo bulk comparison with reference
The pseudo-bulk data for FFPE samples were obtained by summing the counts for each gene in each sample, dividing by the total UMI count, and multiplying by 1 million. Pseudo-bulk data were calculated in the same way from the E9.5–E13.5 embryo scRNA-seq data in the reference study20.
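The pseudo-bulk normalization above is a counts-per-million (CPM) transform of the summed counts, and the FFPE versus PFA comparison is a Pearson correlation over shared genes. A minimal sketch with made-up toy data; whether the published correlation used raw or log-transformed CPM is not stated here, so the `log1p` default is an assumption, as are the function names.

```python
from __future__ import annotations

import math


def pseudo_bulk_cpm(matrix: list[list[float]], genes: list[str]) -> dict[str, float]:
    """Sum counts per gene over all pixels (rows), then scale to counts per million."""
    totals = [sum(column) for column in zip(*matrix)]
    grand_total = sum(totals)
    return {gene: total / grand_total * 1e6 for gene, total in zip(genes, totals)}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def compare_pseudo_bulk(cpm_a: dict[str, float], cpm_b: dict[str, float], log: bool = True) -> float:
    """Correlate two pseudo-bulk profiles over the genes they share."""
    shared = sorted(set(cpm_a) & set(cpm_b))
    transform = math.log1p if log else (lambda value: value)
    return pearson([transform(cpm_a[g]) for g in shared], [transform(cpm_b[g]) for g in shared])


# Toy pixel x gene matrices (made-up values, placeholder gene names).
genes = ["gene_1", "gene_2", "gene_3"]
ffpe = pseudo_bulk_cpm([[1, 0, 3], [1, 2, 3]], genes)
pfa = pseudo_bulk_cpm([[2, 1, 5], [0, 2, 6]], genes)
print(ffpe)
print(round(compare_pseudo_bulk(ffpe, pfa), 3))
```

Restricting the correlation to shared genes matters when the two datasets were annotated or filtered differently; the Pearson coefficient is undefined if either profile has zero variance.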
Clustering with Seurat
We used Seurat (V3.2)28, 29 to analyze the spatial transcriptome data of all the FFPE samples. Data integration and normalization were performed with the SCTransform workflow. The top 3,000 variable features were selected for data integration. For PCA and UMAP visualization, the number of dimensions was set to 10, and the clustering resolution was set to 0.8. Differentially expressed genes for each cluster were obtained by comparing the cells in each cluster against all remaining cells.
SpatialDE analysis
To study spatial patterns of gene expression, SpatialDE, an unsupervised automatic expression analysis tool, was applied to both the adult heart and mouse embryo samples. Following the standard workflow, SpatialDE identified >15 distinct spatial patterns in the mouse embryo sample. The results agree with the Seurat pixel-based clustering results.
Cell type annotation
Cell type annotation was achieved by integration analysis (Seurat V3.2, SCTransform) combining the spatial transcriptome data of the FFPE samples with the corresponding published scRNA-seq reference. After clustering, the spatial pixel data conformed well to the scRNA-seq data, so cell types were assigned to each cluster based on the scRNA-seq cell type annotation (if two cell types were present in one cluster, the majority cell type was assigned). SingleR was also used to annotate the aorta sample with the built-in reference “MouseRNAseqData”30.
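The cluster-level label transfer described above can be sketched as a majority vote: within each integrated cluster, take the most frequent scRNA-seq annotation and assign it to the DBiT-seq pixels in that cluster. This is a simplified illustration of the stated rule, not Seurat's own label-transfer method; the function name, the "unassigned" fallback for clusters that contain no single cells, and the toy labels are assumptions.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def transfer_labels(
    cell_clusters: list[int],
    cell_types: list[str],
    pixel_clusters: list[int],
    fallback: str = "unassigned",
) -> list[str]:
    """Assign each pixel the majority scRNA-seq cell type of its integrated cluster."""
    votes: dict[int, Counter] = defaultdict(Counter)
    for cluster, cell_type in zip(cell_clusters, cell_types):
        votes[cluster][cell_type] += 1
    majority = {cluster: counter.most_common(1)[0][0] for cluster, counter in votes.items()}
    return [majority.get(cluster, fallback) for cluster in pixel_clusters]


# Toy example: cluster IDs shared by single cells and pixels after integration.
cells = [0, 0, 0, 1, 1]
types = ["Cardiomyocyte", "Cardiomyocyte", "Fibroblast", "Endothelial", "Endothelial"]
print(transfer_labels(cells, types, [1, 0, 2]))
# ['Endothelial', 'Cardiomyocyte', 'unassigned']
```

When two cell types tie within a cluster, `Counter.most_common` returns the one encountered first, so a real analysis would want an explicit tie-breaking rule.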
GO analysis
GO analysis was performed using the “GO Enrichment Analysis” module at http://geneontology.org/ with default settings. Biological processes were ranked by gene ratio, and the top 3–5 biological processes were plotted using the “dotplot” function in ggplot2.
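Ranking terms by gene ratio and keeping the top few can be sketched as below. The gene ratio is assumed here to be the number of query genes annotated to a term divided by the total number of query genes (the definition used by common enrichment tools); the term names and annotation sets are made-up placeholders, not real GO annotations, and the sketch ignores enrichment p-values.

```python
from __future__ import annotations


def top_terms_by_gene_ratio(
    query_genes: list[str],
    term_genes: dict[str, set[str]],
    top_n: int = 5,
) -> list[tuple[str, float]]:
    """Rank terms by (query genes in term) / (total query genes); ignores p-values."""
    query = set(query_genes)
    ratios = [
        (term, len(query & genes) / len(query))
        for term, genes in term_genes.items()
        if query & genes
    ]
    # Highest ratio first; ties broken alphabetically for a stable order.
    ratios.sort(key=lambda item: (-item[1], item[0]))
    return ratios[:top_n]


# Made-up annotation sets for illustration only.
terms = {
    "term_A": {"g1", "g2", "g3", "g9"},
    "term_B": {"g4", "g8"},
    "term_C": {"g7"},
}
print(top_terms_by_gene_ratio(["g1", "g2", "g3", "g4"], terms, top_n=3))
# [('term_A', 0.75), ('term_B', 0.25)]
```

Terms with no overlap are dropped before ranking, so fewer than `top_n` terms may be returned.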
Data sharing and codes
Data are available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE156862. Code for data analysis is available at https://github.com/rongfan8/DBiT-seq_FFPE.
SUPPLEMENTARY INFORMATION
Supplementary Figures
Supplementary Tables
Acknowledgements
This research was supported by the Packard Fellowship for Science and Engineering (to R.F.), Stand-Up-to-Cancer (SU2C) Convergence 2.0 Award (to R.F.), Yale Stem Cell Center Chen Innovation Award (to R.F.), National Science Foundation CAREER Award CBET-1351443 (R.F.), and National Institutes of Health grants U54 CA209992 (Sub-Project ID: 7297 to R.F.), R01 CA245313 (R.F.), R33 CA196411 (R.F.), R33 CA246711 (R.F.), and UG3 CA257393 (R.F.). Y.L. was supported by the Society for ImmunoTherapy of Cancer (SITC) Fellowship. The molds for making microfluidic chips were fabricated at the Becton Nanofabrication Center at Yale University. We used the service provided by the Genomics Core of Yale Cooperative Center of Excellence in Hematology (U54DK106857). Next-generation sequencing was conducted at the Yale Stem Cell Center Genomics Core Facility, which was supported by the Connecticut Regenerative Medicine Research Fund and the Li Ka Shing Foundation. It was also conducted using the sequencing facility at the Yale Center for Genomic Analysis (YCGA).
Footnotes
updated Figure 1d Slide-seqV2 reference data and related discussion in the main text.