Highly resolved spatial transcriptomics for detection of rare events in cells

Single-cell spatial transcriptomics technologies leveraged the potential to transcriptionally landscape sophisticated reactions in cells. Current methods to delineate such complex interplay lack the flexibility in rapid target adaptation and are particularly restricted in detecting rare transcripts. We developed a multiplex single-cell RNA In-situ hybridization technique, called ‘Molecular Cartography’ (MC) that can be easily tailored to specific applications and, by providing unprecedented sensitivity, specificity and resolution, is particularly suitable in tracing rare events at a subcellular level. Using a SARS-CoV-2 infection model, MC allows the discernment of single events in host-pathogen interactions, dissects primary from secondary responses, and illustrates differences in antiviral signaling pathways affected by SARS-CoV-2, simultaneously in various cell types.

events in host-pathogen interactions, dissects primary from secondary responses, and illustrates differences in antiviral signaling pathways affected by SARS-CoV-2, simultaneously in various cell types.

Introduction
Single-molecule RNA-fluorescence in situ hybridization (smFISH) together with single-cell RNA sequencing (scRNA-seq) techniques are the current pillars to record transcriptional responses to a disease or environmental agent and have proved highly useful in investigating e.g., Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the agent responsible for the recent pandemic [1][2][3] . Several advanced multiplexed smFISH-based applications (MERFISH, Visium, Cartana, ZipSeq, SeqFISH, DSP) are available that provide analysis of spatio-transcriptional activities [4][5][6][7][8][9] . These technologies are crucial assets to better understand the functional heterogeneity of cells as well as cell autonomous and paracrine effects as inherent principles of physiological and disease conditions, and to correlate such indices to genetic and developmental differences 7,8,10 . However, they are often limited in the number of simultaneously tested samples, time-consuming in adapting protocols to new targets or limited in the resolution or sensitivity necessary to spatially map the transcriptional landscape at a subcellular level, particularly when tracing and dissecting rare transcriptional events 7,8,11-13 . In particular, separating timely or spatially close but disparate events, e.g. responses in primary and secondary infection events, remains highly challenging. Yet, understanding the response dynamics between such primary and secondary infected cells especially in comparison to non-infected bystander cells is crucial to capture the priming of bystander cells towards a collective antiviral response. Shedding light on this very process may help us to better comprehend the disparate infectivity rates and discordant inflammatory signaling observed in COVID-19 patients and its underlying pathogenesis on a molecular level 14 .
To bridge this gap, we developed 'Molecular Cartography' (MC), a highly resolved multiplexed smFISH technique that, through enhanced sensitivity and specificity, allows tracing of even scarcely expressed, single RNA molecules of up to 100 genes simultaneously at a subcellular resolution in multiple samples in parallel, or the analysis of up to 2400 genes in a single experiment. Using a set of fluorescently labeled probes that can easily be designed for specific applications, MC detects and codes single transcripts from individual genes in consecutive rounds that are decoded and counted by a specifically devised algorithm. The rapid design and flexibility to change these customizable probe sets, in particular, unlocks the analytical potential to dynamically adapt research hypotheses within a few days.
Apart from illustrating the advances in specificity, sensitivity and reproducibility, we demonstrate its feasibility by investigating heterogeneous gene expression patterns of key viral entry factors within clonal populations in lung, colorectal, and hepatocellular cell lines and compare primary and secondary pathogen host responses in an infection model of SARS-CoV-2 15 . We further investigate differences in overall infectivity and track pathways known to be affected by coronaviridae [16][17][18] .
Finally, by simultaneously detecting and localizing context-enriched targets, we directly trace the response trajectory starting at primary infection sites (INFS) and highlight the complex propagation network of SARS-CoV-2 19 .

Results
MC harbors the advantages of previous multiplexed smFISH based technologies but with increased flexibility (i.e. rapid adaptation to specific research questions), sensitivity and specificity compared to current applications including concurrent analysis of multiple samples. The previously designed and synthesized probes are hybridized to their target species (Fig. 1a) followed by repeated rounds of probe colorization with fluorophores specific for a certain target species. Transcriptspecific probes are designed using an algorithm developed specifically for MC by Resolve BioSciences that allows the design of probe sets of almost all genes of an organism. In-silico probe design is one of the most critical steps for spatial transcriptomics analysis. In general, a proper probe design influences uniformity, sensitivity and specificity which has been shown in previous applications targeting probe design e.g., for microarrays 20 . Probe design for accurate sensitivity and specificity was optimized in several iterations of MC experiments including the design algorithm, which ultimately allowed the reduction of the probe number per transcript.
For example, MC effectively quantifies short transcripts such as the human CCL8 transcripts (length ~850 nucleotides) or mouse interleukin (IL) 18 (length: ~700 nucleotides) in individual cells. This indicates that MC detects and quantifies even short transcripts or transcript variants.
In total, eight consecutive rounds of colorization, imaging, de-colorization and re-colorization build up a combinatorial color code that is specific for every target sequence. All combinatorial coding schemes have in common that the number of combinations increases with the length of a single code N. The length of the code is given by the number of detection rounds and number of fluorophores (x) used for the multiplex smFISH approach (x N ). Although the number of combinations increases with each round, we reduced the number of potential available codes therefore increasing the distance between the codes. This increases the Hamming distance, which in turn, results in several advantages. The large distance between codes allows for correction of the code, if a certain signal is not detected or wrongly assigned. An increase in code distances thus strongly improves specificity and sensitivity by reducing rates of misidentification due to error recognition and correction.
An example of three consecutive rounds of imaging (channel 1 of 2) is given in Fig. 1b. The first three images demonstrate how a color code is built up in three different rounds. Note that some fluorescent spots are detectable in all three rounds of the same channel, while others appear only in round 2 and/or 3, respectively. After de-colorization, individual spots disappear and re-appear depending on the code in the next colorization round, ultimately generating a unique molecular identifier per target (Fig. 1c). Although colorization and de-colorization is stable over more than ten rounds, we used only eight rounds for a faster analysis and to build up the code to obtain the most sensitive results. A software specifically developed for MC analyzes the images of all eight rounds to compute the color code and map the corresponding target transcript in xyz position within the sample. To facilitate high error robustness, the Hamming distance of the codes ensures the highest specificity of analyses (see Online Methods).
Together, this process delineates the spatial distribution of numerous target transcripts at subcellular resolution (Fig. 1d, close-up images in the middle and right panel; Supplementary Video 1,). Only two (SARS-CoV-2 Np and FURIN) of more than 80 targets are highlighted for better visualization. Infection by SARS-CoV-2 is indicated by detecting the number of nucleocapsid protein (Np) transcripts (magenta color).

Specificity, sensitivity and reproducibility.
Misidentification of RNA species is a major drawback in spatial transcriptomics. We therefore tested the specificity i.e. misidentification rate of MC including the entire colorization, imaging and decoding process by applying probe sets designed for human and mouse RNA transcripts to both human Hela and mouse NIH-3T3 cells and evaluated the rate of positive signals in both species. In total, we identified 860,000 human and 289,000 mouse transcripts in human and mouse cells, respectively (Supplementary Fig. 1). Overall, the fraction of spots decoded within the wrong cell type is 0.1 % (mouse codes in HeLa cells, Fig. 1e) and 0.55 % (human codes in mouse cells) and is slightly higher than the rate of false positives (hits in codes not in use) across three independent experiments. This experiment showed that the total MC process from probe design to hybridization, coding, and decoding showed an average specificity of 99.45 % to 99.9 %.
Additionally, transcriptomics approaches are subject to a drop in calling rate due to incomplete detection of target RNAs. Hence, to demonstrate its sensitivity, we compared MC with a single-round smFISH approach. To differentiate true positives and false positives detected by single-round smFISH, we used two different probe sets labeled by two different fluorophores for a certain transcript. Both probe sets bind in an alternating mode on the transcript to exclude a bias by alternative splicing or hairpin structure formation. In comparative experiments, we performed MC on several instruments and by different users. We tested the copy numbers of the transcripts APC and CENPF, and calculated the average counts per cell in each experiment. Both MC and smFISH resulted in similar sensitivities with regard to both genes (Fig. 1f). This indicates that MC preserves the sensitivity of a single smFISH experiment across multiple rounds of signal detection.
To further test for technical bias in MC experiments, we correlated the measured counts per RNA species per cell derived from two independent biological replicates in SARS-CoV-2 infected Huh7 cells and found an excellent correlation as determined by nonparametric Spearman correlation analysis (p<0.0001; Fig. 1g).
Together, the MC multiplex smFISH approach is highly suitable for detection, quantitation and localization of in particular rare transcripts in cells.

Landscaping heterogeneous expression patterns.
Given that clonal populations of cell lines exhibit severe transcriptional heterogeneity that is lost in bulk analysis, we used MC to investigate the expression patterns of multiple RNA species in individual cells 15,21 . We used SARS-CoV-2 infected Huh7 cells to target both highly and lowly expressed genes and investigated their preferential subcellular localization to either cytoplasm or nucleus. We observed high expression rates of genes associated with immunity, and in general antiviral signaling in the context of COVID-19 (for example JAK2, HIF1A and NFκB1, Fig.   2a) [22][23][24] . These highly abundant targets appeared rather homogeneously distributed and were primarily located in the cytoplasm, probably indicating ongoing translation.
In contrast, lowly expressed targets such as PDGFRA, FOSL1 or STAT5B showed strong heterogeneity in both expression level and localization (Fig. 2a). PDGFs and FOSL1 regulate cell proliferation and differentiation 25,26 , while the universal transcription factor STAT5B exhibits numerous biological roles including proliferation and autoimmunity 27 . These observations were confirmed by single cell analysis of n = 16 randomly chosen cells within the same population (Fig. 2b). Furthermore, distribution analysis of various targets revealed the cytoplasm as the primary localization site for the majority of probed targets, with the gene encoding the antiviral response receptor RIG-I (DDX58) and the interferon stimulated gene (ISG) C19orf66 more evenly distributed between the two localizations than for example transcripts of the STAT protein family (Fig 2c).
The presented data is derived from 2D images, which could result in misidentification of cytoplasmic targets as nuclei-located when positioned in the cytoplasmic space above the nucleus. However, since highly abundant RNA species, such as JAK2, were found to be almost exclusively located in the cytoplasm, we can exclude localization errors due to stochastic distribution of targets and conclude that the localization of target RNAs to different cellular compartments by MC is feasible.
Intriguingly, SARS-CoV-2 Np transcripts were primarily located in the cytoplasm yet close to the nucleus in a site-directed open or closed ring-conformation that suggests an association with the rough endoplasmic reticulum (rER) potentially indicating ongoing translation of Np particles in preparation for exocytosis of SARS-CoV-2 virions 28 (Fig. 2d, arrows, close-up image). These findings highlight the opportunities of using highly multiplexed spatial analysis for investigations in infectivity studies to capture processing of viral transcripts at their subcellular location.

Tracing antiviral responses in primary infected and bystander cells.
We SARS-CoV-2 Np particle counts decreased with increasing distance to the primary infected cell in Huh7 cells (Fig. 3a), which resembled the infection pattern in Calu3, Caco2 and PLC5 cells (Fig. 3b). Here, RNAs coding for the key viral entry factors ACE2, FURIN and TMPRSS2 are highlighted 29,30 . To assess a receptordependent susceptibility to SARS-CoV-2 infections including a potential associated virus-induced downregulation, we tested their expression patterns in infected and mock-infected cells (n = 13 individual cells per cell line) in relation to the SARS-CoV-2 Np particle count (Fig. 3c). We found only low expression levels of ACE2 across all tested cell lines despite active viral replication, supporting previous findings 14 .
However, ACE2 was the most prominent in Calu3 cells that also showed the highest amount of SARS-CoV-2 particles (normalized to total transcripts per cell). While at the dispense of innate immunity, this highly complex network regulates the expression of hundreds of interferon stimulated genes (ISGs) that are imperative to curb, and ultimately clear viral infections 18,33,34 . For example, sensing of double stranded RNA by RIG-1 (encoded by DDX58) and endosomal TLRs, activates IRF3, IRF7 and NFκB, which results in the expression of type-I interferons (IFNα). These trigger the phosphorylation and therefore activation of a diverse set of STAT family members that promote the transcription of effector ISGs such as MX1 or OAS1 in an auto-and paracrine fashion. This contributes to priming of neighboring cells towards a collective antiviral response 33 . We assessed SARS-CoV-2-mediated deregulation of these clusters and implicated genes by comparing the previously defined groups (Fig. 3g). Genes in cluster 1 remained widely unaffected by varying infection rates while cluster 2 and 4 were significantly deregulated in highly infected cells, yet diametrically opposite.
We next tested whether differences observed on the level of gene clusters are also represented in the expression of selected individual genes (Fig. 3h).
Surprisingly, DDX58 expression declined in group 1 compared to group 2 which may indicate virus-induced downregulation that was not observed for MX1. Furthermore, MUC1 and PDCD1, genes encoding cell surface proteins crucial in mediating inflammatory processes 35,36 , were significantly downregulated in group 1 suggesting interference of SARS-CoV-2 with diverse immune-regulatory processes beyond antiviral signaling. SCGB1A1, a component of cluster 4 that codes for a pulmonary surfactant protein, was recently suggested as a novel target for COVID-19 treatment due to its anti-inflammatory function 37 . Elevated levels of SCGB1A1 observed in Huh7 cells that were lacking in Calu3 and Caco2 cells may thus support previous findings on the severity of organ-specific manifestations of COVID-19, which reported rather low levels of SARS-CoV-2-mediated liver injury in non-hospitalized patients 38,39 , potentially due to mitigation of hepatic inflammation by SCGB1A1.
Moreover, the strong upregulation of STAT2 expression and the downregulation of the ISG IFIT1 in highly infected cells strongly supports viral interference at the level of nuclear translocation rather than transcription 40 . However, IFITs are also directly induced by IRF3 ensuring effective antiviral activity already during early stages of infection and are thus also expressed independently of type-I IFN signaling 41 .
Previous analysis demonstrated the application of MC in semi-bulk analysis at both the cell and target level. To further exploit the potential of MC, we set out to analyze expression levels of various genes on a single cell level (Fig. 3i). Notably, several genes were differently regulated on a single cell basis when compared to bulk analysis such as DDX58, SCGB1A1 and STAT2, further emphasizing the role of single cell analysis in infection studies.
Remarkably, MC allows the simultaneous analysis of multiple samples at once, which is pertinent for directly comparing antiviral responses across various cell lines. Targeting DDX58, MX1 and C19orf66 concurrently in all four cell lines (representative images in Fig. 3j, left panel) grouped by the infection state, we found highly dispersing expression signatures across cell types and groups indicating differential regulation (Fig. 3j, right panel). Overall, counts in group 2 seemed to be the least deregulated with exception of downregulated DDX58 and MX1 in Calu3.  . 4c). We selected genes for tracing which were located in ROI 1 either in quadrant Q1 (upregulated, red colors) or Q4 (downregulated, blue colors) and were correlated with a maximum deviation of a z-score of 0.5 (dashed line) to the x = y axis (solid line). Remarkably, genes upregulated in ROI 1 tend to normalize their expression in ROI 2 and seem to become downregulated in ROI 3, a trend that was enhanced in ROI 4. This pattern was inversely true for genes downregulated in ROI   46 , considering it to be the gold standard of spatial in-situ methods. Therefore, we compared MC with smFISH in their ability to detect two transcripts (APC and CENPF) against the background of multiple others and found their sensitivity to be indeed indistinguishable. This is supported by NGS data from RNAseq analysis that revealed an FPKM value of 7.6 for the APC gene and 37.1 for CENPF 47 . Since the coefficient of variation of APC detection was similarly low (~0. 12) compared to the more abundant CENPF transcript (~0.11, Fig. 1f MC provides several additional advantages, as it, for example, holds the potential to analyze multiple samples at once allowing direct side-by-side comparison; or appending immunostaining to exactly correlate transcriptomics data to expression patterns of related proteins. A pivotal asset of MC is the flexibility in target probe design that allows to quickly readjust working hypotheses, thus expediting day-to-day research. In practical terms, MC needs a single hybridization step of gene-specific probes with short steps of de-and re-colorization, followed by automated imaging. Although the MC experiments reported in this study were limited to a number of about 80 target genes related to SARS-CoV-2 induced cellular responses, a broad spectrum of future applications is envisaged. Overall, MC is very flexible and highly suitable for multiple qualitative and quantitative applications of spatial transcriptomics approaches; particularly in studies targeting rare transcripts by providing utmost specificity, sensitivity, and reproducibility at high resolution (see Fig. 1e to g). By multiplexed detection of these transcripts at high resolution, MC promotes our ability to study subtle yet Probe design. Multiple probes per targeted transcript were designed using an automated probe-designer software from Resolve BioSciences. Briefly, the probedesign was performed at the gene-level using all full-length protein-coding transcript sequences from the ENSEMBL database tagged as 'basic' 52,53 . To speed up the process, the calculation of computationally expensive off-target searches, highly repetitive regions were filtered via the abundance of k-mers in the background transcriptome using Jellyfish 54 . A probe candidate was generated by extending a seed sequence until a certain target stability was reached. A set of rules was applied to discard sequences containing homopolymers, di-and trinucleotide repeats and strong compositional bias (e.g., lack of one base).
After these fast screens, every kept probe candidate was mapped to the background transcriptome using ThermonucleotideBLAST 55 and probes with stable off-target hits were discarded. Specific probes were then scored based on the number of on-target matches (isoforms), which were weighted by their associated APPRIS level 56 , favoring principal isoforms over others. A bonus was added if the binding-site was inside the protein-coding region. From the pool of accepted probes, the final set was composed by picking the highest scoring probes.
Sample preparation. Sample preparation of cultured cells on the slides was performed as described in the handbook Cell preparation for Molecular Cartography.
Briefly, approximately 1 x 10 4 cells (defined per cell line) were seeded per well on MC slides and grown to 30-50 % confluence. The cells were grown for 1 or 2 days at 37 °C in the appropriate medium. After the medium was aspirated, cells were washed and fixed using methanol at -20 °C for 10 min. Slides were dried and stored at -80 °C.
Hybridization of transcript specific probes. All steps were performed exactly according to the protocol of the Molecular Cartography Hybridization (Cells) Kit.
Briefly, the cells were pretreated with 70 % ethanol followed by solution BSC1, and wash buffer 1. Thereafter, gene-specific probes were hybridized for 15-18 h at 37 °C in pre-warmed hybridization solution. Subsequently, the cells were washed twice using wash buffer 2 for 10 min each before the temporary addition of wash buffer 1 before color development. The storage buffer may be applied to store the slide at 4° C for up to two weeks. All buffers, solutions, and reaction conditions are described in the MC Hybridization (Cells) protocol.
Color development and Decolorization. The color code of the hybridized probes develops over eight cycles to efficiently distinguish the individual transcripts. Therefore, the following protocol was repeated eight times. For round 1, colorsolution 1 was prepared to contain the reagent CR1. After incubating the hybridized cells with color-solution 1 for 45 min at 25° C, a washing step was performed to eliminate excess reagent. Thereafter, color-solution 2 was applied containing the reagents CD1 and CD2 and incubated for 45 min at 25° C. Subsequently, cells were washed three times followed by the addition of imaging buffer and microscopic imaging. After the 1 st round of imaging, a second round is initiated using reagent CR2. Before the second round, a decolorization of 1 st round color is performed. For decolorization, the cells were washed five times using pre-warmed wash buffer 1 for 6 min at 41° C before proceeding with the next colorization round using a freshly prepared color solution 1 containing the CR2 reagent. Washing steps, the incubation with color-solution 2, and imaging is performed as described above. For the following rounds, the reagents CR3 to CR8 are used. All steps, buffer ingredients, colorization, Extraction of features from the bc-images. In a first step, a target value for the allowed number of maxima was calculated based on the area of the slice in µm² multiplied by an empirically optimized factor (0.5x). The resulting target value was used to adapt the threshold for the algorithm iteratively searching local 2D-maxima.
The threshold leading to the closest number of maxima equal to or smaller than the target value was used for further steps and the respective maxima were stored. This procedure was done for every image slice independently. Maxima that did not have a neighboring maximum in an adjacent slice (called z-group) within a radius of one pixel were excluded. For the resulting list of maxima, the absolute brightness (Babs), the local background (Bback), and the average brightness of the pixels surrounding the local maximum (Bperi) were measured and stored. The resulting maxima list was further filtered in an iterative loop by adjusting the allowed thresholds for (Babs-Bback) and (Bperi-Bback) to reach a feature target value based upon the total volume of the 3D-image. Only maxima still in a z-group with a size of at least 2 were passing the filter step. Each z-group was counted as one hit. The members of the zgroups with the highest absolute brightness were used as features and written to a file. These features resemble 3D point clouds.

Iterative closest point algorithm for determination of transformation matrices.
To align the raw data images from different imaging rounds, these images had to be Profiles were then further filtered based upon similarity between best and secondbest match, as well as by their ratio of summed signals that are contributing to the matched profile (constructive signal) to summed signals that were not expected (noise signal) based on the encoding schema. The match with the highest score (constructive signal minus noise signal) was assigned as an ID to the pixel. Hence, genes with strongly correlated expression patterns appear closer, while distantly correlated genes appear further apart. These distances were used for the construction of a pairwise correlation matrix. Groups with at least four genes with substantial covariations were selected for further analysis.
Data analysis and visualization. Analysis of raw data was performed using excel (Microsoft, Washington, US) and GraphPad Prism (version 8.4.3, 59 ) for statistical analysis and graphical representation. Data was tested for normal distribution using a Shapiro-Wilk normality test. Since no data set passed the normality test, we performed statistical analysis using non-parametric Spearman rank test (two-tailed) for correlation analysis and a Kruskal-Wallis test with Dunn`s multiple comparison test for data sets with group or target comparison as indicated in the figure legends.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability.
Data underlying the reported findings is available from the corresponding author on request.

Acknowledgement
The authors are grateful to Penelope Kungl and Jason Gammack for editing the manuscript.

Author contributions
S.G., C.K. and K.Z. wrote the manuscript. S.G. prepared the figures. C.K. and K.Z.  Tables   Table 1: Probes (including catalogue numbers) used for infection experiment.