Abstract
The identification of spot-like structures in large and noisy microscopy images is an important task in many life science techniques, and it is essential to their quantitative performance. For example, imaging-based spatial transcriptomics (iST) methods rely critically on the accurate detection of millions of transcripts in low signal-to-noise ratio (SNR) images. While recent developments in computer vision have revolutionized many bioimage tasks, currently adopted spot detection approaches for iST still rely on classical signal processing methods that are fragile and require manual tuning. In this work we introduce Spotiflow, a deep-learning method that casts spot detection as a combined multiscale heatmap and stereographic flow regression problem, yielding subpixel-accurate localizations. Spotiflow is robust to different noise conditions and generalizes across different chemistries while being up to an order of magnitude more time and memory efficient than commonly used methods. We show the efficacy of Spotiflow by comprehensive quantitative comparisons against other methods on a variety of datasets and demonstrate the impact of its increased accuracy on the biological conclusions drawn from iST and live imaging experiments. Spotiflow is available as an easy-to-use Python library as well as a napari plugin at http://www.github.com/weigertlab/spotiflow.
Introduction
Many methods in the life sciences generate images in which the detection and localization of spot-like objects is a crucial first analysis step for more complex downstream tasks, a problem commonly referred to as spot detection [1–4]. While spot detection has been the computational basis of many methods in genomics over the last decades [5, 6], the advent of imaging-based spatial transcriptomics (iST) has recently brought this problem to a significantly more challenging and computationally demanding domain [7] (Fig. 1a). In iST, RNA molecules are located in situ in large tissue sections during sequential imaging cycles to generate gene expression maps at subcellular resolution [8–10]. Popular iST techniques such as MERFISH [8], seqFISH [10] or HybISS [9] require the detection of millions of spots in gigabyte-sized images with high accuracy, high sensitivity, and computational efficiency. Due to the preservation of the native tissue context, any spot detection method has to address multiple imaging challenges such as autofluorescence background, aspecific probe binding, or inhomogeneous spot density (cf. Supp. Video 1). High sensitivity and accuracy are particularly important as, for most iST methods, transcript identity is combinatorially encoded in the sequence of multiple multi-channel images generated across different imaging rounds [8, 9]. As a result, suboptimal spot detection performance in one channel or imaging round can cause a significant drop in sensitivity and transcript identity misattribution [11].
a) Depiction of a common processing pipeline for imaging-based spatial transcriptomics (iST) data, in which spot detection is a critical step. b) Spotiflow is trained to detect spots from microscopy images via two different synergistic tasks, multiscale heatmap regression and stereographic flow regression. c) The ground truth objects to be regressed are computed from point annotations {pk}. First, a full-resolution Gaussian heatmap Y(0) is obtained by generating isotropic Gaussian distributions of variance σ2 centered at spot locations. This Gaussian heatmap is further processed to obtain lower-resolution versions, yielding multiscale heatmaps Y(l), which are all regressed. Second, a local vector field V = {vij} is built by placing, at every pixel of the image, a vector directed to the closest spot center. We obtain the stereographic flow by computing, position-wise, the inverse stereographic projection f of the local vector field. d) Benchmarking of spot detection methods on different datasets, grouped by their modality (Synthetic, FISH, Live-cell imaging). Shown is the distribution of F1 scores per image in the test set of every dataset (higher is better, cf. Supp. Note 3.2). Each method was trained and tested individually on each dataset except Spotiflow (general), which was trained on all datasets. A sample training image is depicted under each dataset. e) Runtime (top) and memory (bottom) assessment for different methods at different image sizes. Parameters of each method were calibrated so that the number of detections was of the same order of magnitude. * RS-FISH not shown for sizes >32k due to Java size-related limitations. RS-FISH memory was not profiled as the implementation is not in Python. ** deepBlink could not be run for sizes >4k due to GPU memory limitations. f) F1 score on the live-cell dataset Terra after fine-tuning a Spotiflow model pre-trained on synthetic-complex with an incrementally increasing number of out-domain training images from Terra.
Commonly used spot detection pipelines for iST often rely on classical threshold-based methods such as Laplacian-of-Gaussian (LoG) [12, 13] or radial symmetry [14]. While these approaches perform well on simulated or relatively clean data, they often struggle with realistic images that exhibit artifacts, autofluorescence, and varying contrast (cf. Results). While a few deep learning-based methods have been proposed for this task [15–18], they are often hard to use and do not provide subpixel accuracy, with the notable exception of [17]. Consequently, currently used iST spot detection methods often lack robustness to challenging image conditions, are computationally inefficient for large images, and require manual parameter (e.g. threshold) tuning for every channel and imaging round, which limits their applicability in large-scale iST experiments.
Here we introduce Spotiflow, a deep learning-based, threshold-agnostic, and subpixel-accurate spot detection method that outperforms other commonly used methods on a variety of iST and non-iST modalities while being up to an order of magnitude more time and memory efficient. Spotiflow is trained to predict multiscale Gaussian heatmaps and exploits a novel stereographic flow regression task from which subpixel-accurate detections are obtained (Fig. 1b,c). Our method generalizes well to unseen samples and removes the requirement of manual threshold tuning in typical end-to-end iST workflows. Spotiflow is available as an easy-to-use Python library as well as a napari [19] plugin (cf. Supp. Video 2).
Results
To compute spot coordinates from a given microscopy image, Spotiflow uses a convolutional neural network (U-Net [20]) that is trained to predict two distinct but synergistic targets: Gaussian heatmaps and the stereographic flow (cf. Fig. 1b, Supp. Fig. 1). The first target, Gaussian heatmaps [21], consists of real-valued images of different resolutions in which each pixel value can be interpreted as the probability of that position being a spot center (cf. Fig. 1c, Supp. Fig. 1, Supp. Note 3.1). We predict a multiscale hierarchy of heatmaps by processing their respective network decoder feature maps, which jointly contribute to the optimized training loss. We found this approach to be beneficial for training convergence, especially when only a few spots are present (cf. Methods, Supp. Fig. 1, Supp. Note 1). The second target, which we denote stereographic flow, is a problem-adapted representation of the closest-spot vector field, i.e. the field that, at every position, points to the closest spot. The stereographic flow is defined as the inverse stereographic projection of the two-dimensional local offset vector field in ℝ2 onto the unit three-dimensional sphere S2. Crucially, this embedding maps all offsets for points far away from spot locations to a common value (the south pole of the unit sphere), therefore avoiding the problem of indeterminate offset prediction for distant locations (cf. Fig. 1c, Methods, Supp. Fig. 2, Supp. Note 3.1, Supp. Video 3). To produce the final spot coordinates from a given prediction, we use the peaks of the highest-resolution heatmap to obtain preliminary spot locations, which we refine with the inverted stereographic flow (cf. Methods), achieving subpixel precision and substantially lower localization errors (cf. Supp. Table 1).
We systematically assessed the performance of Spotiflow on multiple datasets in comparison with other commonly used methods. Specifically, we compared against the Laplacian-of-Gaussian (LoG/starfish) implementation used in the popular iST framework starfish [12], Big-FISH [13], the radial-symmetry-based method RS-FISH [14], and the deep learning-based method deepBlink [17]. We first generated two synthetic datasets of diffraction-limited spots (cf. Fig. 1d): one using a simple Gaussian PSF model (synthetic-simple), and another using a more realistic image formation model including autofluorescence and optical aberrations (synthetic-complex, Methods). We found that on synthetic-simple all methods achieved close to perfect scores (F1-score of 0.967–0.995), which is expected due to the limited complexity of the simulated images (Fig. 1d, Supp. Table 3). However, on the more realistic synthetic-complex dataset, classical methods showed a substantial performance drop (F1 = 0.758–0.836) while Spotiflow achieved the best detection accuracy (F1 = 0.929), followed by the other deep learning-based method (deepBlink, F1 = 0.915). This demonstrates the advantages of our learned approach for more complex datasets and highlights the limitations of overly simplistic simulations in benchmark scenarios. Similarly, when generating images at different noise conditions and spot densities, we found Spotiflow to consistently outperform other methods, demonstrating its effectiveness in adverse imaging conditions (Supp. Fig. 3).
We next assessed Spotiflow on several publicly available and in-house generated, manually annotated datasets from multiple iST modalities (MERFISH, HybISS, smFISH, cf. Supp. Table 2, Supp. Fig. 4). We observed that Spotiflow again outperforms all other methods, including deepBlink, achieving a consistently high detection rate and localization accuracy on all modalities (cf. Fig. 1d, Supp. Fig. 5, Supp. Note 3.3). The performance difference to threshold-based classical methods is particularly prominent for the HybISS and MERFISH datasets, which contain substantial background signal (F1 = 0.861/0.796 vs. 0.531/0.790 for e.g. LoG/starfish). Interestingly, a Spotiflow model jointly trained on all diverse datasets (general) achieves almost the same performance as models trained on individual datasets, demonstrating the inherent capacity of Spotiflow to capture diverse image characteristics in a single model. We evaluated the utility of Spotiflow for non-iST modalities by annotating two datasets from single frames of live-cell recordings of HeLa cells with labeled telomeres and the telomeric repeat TERRA (cf. Methods). As before, Spotiflow outperforms all other methods in both detection quality and localization accuracy (cf. Fig. 1d), demonstrating the general applicability of our method beyond the iST domain. Regarding the training data requirements of the model, we observed that fine-tuning from synthetic data can substantially reduce the annotation effort for novel out-domain datasets, quickly approaching the accuracy obtained with our benchmark training datasets composed of hundreds of annotated 512 × 512 images. Specifically, when we fine-tuned a Spotiflow model initially pre-trained on synthetic-complex with an incrementally increasing number of out-domain training images from the live-cell dataset Terra, we found that as few as four training images sufficed to achieve good accuracy (F1-score 0.738 vs. 0.174 when training from scratch, Fig. 1f, Supp. Fig. 6). This result underscores the efficiency with which Spotiflow models adapt to different modalities and out-of-distribution samples with minimal annotation, facilitating rapid adoption by end-users.
We next investigated the generalizability of pre-trained models to variability in sample types, which can encompass differing signal-to-noise ratios, unique artifacts, and distinct background features. We trained a Spotiflow model on HybISS-processed mouse embryonic brain sections and applied it to a variety of out-of-distribution HybISS samples originating from different tissues and probesets (mouse embryonic limb, frog tadpole developing limb, mouse gastruloid, radial glia progenitor cell cultures). Even though these images exhibit noticeably different structures with varying backgrounds and contrast compared to the training images, we found that the pre-trained model yielded qualitatively excellent transcript detection results without the need for any threshold tuning (Fig. 2a,b and Supp. Fig. 7).
a, b) Predictions of a pre-trained Spotiflow model on two out-of-distribution samples, a mouse embryonic developing limb (a) and a frog tadpole developing limb (b). The Spotiflow model was trained on the HybISS dataset, consisting only of images of embryonic mouse brain sections. c) Gene expression maps based on Spotiflow of an E12.5 mouse embryo brain processed using HybISS. Five different genes (Clu, Cyp26b1, Irs4, Rax, Wnt8b) involved in neurodevelopment are displayed, overlaid on the DAPI channel. The Spotiflow model used was trained on the HybISS dataset. d) Comparison of gene expression maps based on Spotiflow vs. the default LoG starfish [12] detector of an E12.5 mouse embryo brain processed using HybISS. Depicted are results for three different genes (Sfrp, Foxg1, Hoxb3). The starfish detector is run at three different thresholds (0.2, 0.01 and 0.138, the latter being optimal on the HybISS training dataset) as well as the Spotiflow model trained on HybISS (using the default threshold). The last column contains an ISH reference of similar sections from the Allen Brain Atlas for the three depicted genes. e, f) Runtime (e) and memory (f) assessment of both methods in an end-to-end setting. Depicted are wall-clock time (e) and peak CPU memory usage (f). g) Live-cell acquisition of HeLa cells with labeled telomeres (orange). h) Quantification of telomere track length in three different experiments using deepBlink and Spotiflow to detect spots per frame, which are then tracked using TrackMate [24]. Telomeres are expected to be stable throughout the movie, thus longer tracks are expected. i) Quantification of the number of frames in which a track does not contain any detected spot (gap fraction). Smaller gap fractions indicate more stable detections.
We then assessed the impact of the increased robustness and accuracy of Spotiflow on a full end-to-end iST experiment (cf. Methods) by using a starfish gene decoding pipeline in which we swapped the spot detection component from the default LoG detector to Spotiflow. We processed sections of developing mouse brains at different timepoints, E12.5 (cf. Fig. 2c) and E13.5 (cf. Fig. 2d), using HybISS to spatially resolve 199 genes involved in neurodevelopment (cf. Methods). The resulting gene expression maps obtained with Spotiflow show gene-dependent spatial patterns that are consistent with previous results (cf. Fig. 2c, Fig. 2d, Supp. Fig. 8). While for intensity-based methods (e.g. LoG/starfish) the quality of the obtained gene expression maps is highly sensitive to the threshold used, and thus requires channel-specific threshold choices, Spotiflow is threshold-agnostic and does not require any manual tuning (cf. Fig. 2d, Supp. Fig. 8). In addition, we found that in this end-to-end iST setting, Spotiflow is an order of magnitude more time and memory efficient than the default starfish pipeline, especially for large images (cf. Fig. 2e,f).
We hypothesized that the content-awareness of Spotiflow could be leveraged to solve tasks that are infeasible for classical spot detection methods. To explore this, we examined whether Spotiflow could effectively differentiate between transcript-derived spots and spot-like patterned autofluorescent structures, such as those from lipofuscin [22, 23], which often render data collected from adult brain tissue unusable. Applying Spotiflow to a HybISS-processed adult mouse brain section with a specific bootstrapping scheme (cf. Methods, Supp. Fig. 9), we achieved a 3x decrease in the number of autofluorescent spots detected in one channel (Supp. Fig. 9), substantially reducing the amount of noise in the expression maps obtained from iST. This is particularly notable considering that such a discrimination task is challenging even for experienced human annotators.
Finally, we demonstrate Spotiflow's flexibility to accurately detect spot-like structures in fluorescence microscopy images outside the iST domain. Concretely, we consider single-molecule detection and tracking of both telomeres and noncoding RNA molecules (TERRA) in live-cell time lapses of HeLa cells (cf. Methods, Fig. 2g). These images present different challenges compared to iST images, such as photobleaching, which causes image contrast to decrease over time, non-specific dot-like structures inside the cell nucleus, and unspecific signal that can lead to erroneous, unrealistically short tracks (Fig. 2g). After detecting spots with Spotiflow, we tracked them using TrackMate [24] (cf. Methods). For comparison, we also detected and tracked spots using deepBlink. For both telomeres and TERRA, the robustness of Spotiflow's detections under changing imaging conditions led to longer, more consistent (gap-free) tracks compared to deepBlink (cf. Fig. 2h, Fig. 2i, Supp. Fig. 10, Supp. Video 4), demonstrating the significant impact that a more accurate molecule detection method can have on estimated biological parameters.
Training a Spotiflow model is fast (∼1 h on a single GPU) and our implementation, based on PyTorch [25], is an order of magnitude faster and over three times more memory efficient than commonly used methods, especially for larger images, with e.g. a prediction time of 80 s for an image of size 32k × 32k vs. 1000 s for LoG/starfish (cf. Fig. 1e, Supp. Note 3.4). To facilitate adoption by end-users, we provide extensive documentation, distribute Spotiflow as an easy-to-use napari [19] plugin, and provide several pre-trained models that can be used out-of-the-box for a variety of iST modalities (cf. Supp. Video 2).
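In practice, detecting spots with a pre-trained model takes only a few lines of Python. The snippet below is a minimal sketch following the interface described in the Spotiflow documentation; the model name "general" and the exact return values are assumptions based on the pre-trained models mentioned above and may differ between library versions:

```python
import tifffile
from spotiflow.model import Spotiflow

# Load a pre-trained model (here the assumed "general" model;
# modality-specific models are also distributed, cf. documentation).
model = Spotiflow.from_pretrained("general")

# Read a single-channel 2D image and detect spots.
img = tifffile.imread("example_image.tif")
points, details = model.predict(img)  # points: (N, 2) array of subpixel y, x coordinates
```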
Discussion
In summary, Spotiflow delivers high-quality detections across a variety of iST modalities and surpasses both commonly used and recently proposed methods. Notably, we demonstrated on a diverse set of real benchmark datasets that assessments relying solely on simple synthetic data are insufficient to evaluate performance in real-world applications. Spotiflow generalizes well to out-of-distribution samples, does not need any manual tuning in end-to-end iST experiments, and is an order of magnitude more efficient than other methods at processing whole samples. One limitation of Spotiflow is that it is currently tailored to two-dimensional data. However, both multiscale Gaussian heatmaps and the stereographic flow naturally extend to n dimensions, and thus Spotiflow can be extended to detect spots in n-dimensional volumes, including 3D. Moreover, our live-cell imaging experiments indicate Spotiflow's flexibility across various fluorescence microscopy modalities, and we foresee its broad utility for other imaging-based methods in which localized structures need to be detected. Finally, we anticipate that the presented stereographic flow will impact other areas where prediction of dense vector fields has been successfully applied (e.g. cell segmentation [26, 27]).
Methods
Spotiflow
Architecture overview
Given an input image and the corresponding spot center annotations, a U-Net [20] is trained to predict two different sets of outputs which encode the location of spots in the image: first, multiscale probability heatmaps and, second, the stereographic flow (cf. Fig. 1b, Supp. Fig. 1). During training, the overall loss function optimized is

$$\mathcal{L} = \mathcal{L}_{\mathrm{heat}} + \mathcal{L}_{\mathrm{flow}},$$

where ℒheat is the multiscale heatmap loss (see below) and ℒflow is the stereographic flow loss (see below), a pixel-wise weighted L1 loss. The weight is defined via 𝟙Spot, a pixel-wise indicator function which takes the value 1 if the pixel is very close to a spot location (closer than some cutoff distance ε) and 0 otherwise, and a factor λ ∈ ℝ which is used to increase the loss contribution near spot centers (cf. Supp. Note 3.1). We set λ = 10, ε = 5 px when training all our models.
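For illustration, the combined objective could look as follows in PyTorch. This is a minimal sketch under the definitions above, not the reference implementation; function and argument names are ours, and sigmoid-activated heatmap predictions are assumed:

```python
import torch
import torch.nn.functional as F

def spotiflow_loss(pred_heatmaps, gt_heatmaps, pred_flow, gt_flow, spot_mask, lam=10.0):
    """Combined multiscale heatmap + weighted stereographic flow loss (sketch).

    pred_heatmaps / gt_heatmaps: lists of (B, 1, H_l, W_l) tensors in [0, 1], one per scale.
    pred_flow / gt_flow: (B, 3, H, W) stereographic flow on the unit sphere.
    spot_mask: (B, 1, H, W) binary mask, 1 within distance eps of a spot center.
    """
    # Multiscale heatmap loss: binary cross-entropy at every resolution level.
    l_heat = sum(F.binary_cross_entropy(p, g) for p, g in zip(pred_heatmaps, gt_heatmaps))

    # Flow loss: pixel-wise L1, up-weighted by (1 + lam) near spot centers.
    weight = 1.0 + lam * spot_mask
    l_flow = (weight * (pred_flow - gt_flow).abs()).mean()

    return l_heat + l_flow
```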
Multiscale heatmap regression
Let X ∈ ℝw×h denote the input image and {pk} with pk ∈ ℝ2 denote the ground truth spot center annotations. We first build a full-resolution probability heatmap Y ∈ ℝw×h by generating a Gaussian distribution of variance σ2 centered at every spot, so that the probability map exponentially decays around each annotated center:

$$Y_{ij} = \max_k\, \exp\!\left(-\frac{\|(i,j) - p_k\|_2^2}{2\sigma^2}\right).$$

Note that instead of summing the individual Gaussian distributions, we take the maximum value at each pixel to create sharp boundaries between spots.
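In code, generating the full-resolution target heatmap from the point annotations might look as follows (a simplified numpy sketch of the formula above, not the library implementation):

```python
import numpy as np

def gaussian_heatmap(points, shape, sigma=1.0):
    """Full-resolution target heatmap: max over per-spot Gaussians (sketch)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]].astype(np.float32)
    heatmap = np.zeros(shape, dtype=np.float32)
    for py, px in points:
        # Squared distance of every pixel to this spot center.
        d2 = (yy - py) ** 2 + (xx - px) ** 2
        # Max (not sum) keeps sharp boundaries between nearby spots.
        heatmap = np.maximum(heatmap, np.exp(-d2 / (2 * sigma ** 2)))
    return heatmap
```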
We further generate the heatmaps at L different resolution levels, where level l = 0 denotes the highest resolution and l = L − 1 the lowest. To generate the heatmap Y(l) at resolution level l from Y(l−1), we apply max pooling (with a downsampling factor of 2) to Y(l−1) and then process the result with a Gaussian filter, whose variance and scaling prefactor are chosen such that the variance of the distributions is effectively increased while the dynamic range of the heatmap remains in the interval [0, 1] (cf. Supp. Fig. 1).
The U-Net backbone is then trained to regress all heatmaps at the different scales (multiscale heatmap regression, cf. Fig. 1c, Supp. Fig. 1). We achieve this by adding a loss term at the different stages of the decoder whose sizes correspond directly to the targets to be regressed. More specifically, let D(i), i ∈ [1, L], denote the feature maps at the output of the i-th decoder stage in the U-Net. We process D(i) with a lightweight convolutional module to compute the prediction Ŷ(L−i) (cf. Supp. Fig. 1). A pixel-wise loss term is then computed between the ground truth heatmap Y(l) and the prediction Ŷ(l) at every resolution level l with the binary cross-entropy loss BCE. We then aggregate these terms into the overall objective function for the multiscale heatmap, ℒheat:

$$\mathcal{L}_{\mathrm{heat}} = \sum_{l=0}^{L-1} \mathrm{BCE}\big(Y^{(l)}, \hat{Y}^{(l)}\big).$$
Stereographic flow
For each pixel (i, j) ∈ ℤ2 of the image X, we first define a local vector field V = {vij} = {(vx, vy) ∈ ℝ2} given by the vector from the pixel to the nearest ground truth spot (cf. Fig. 1b, Supp. Fig. 2). To induce stability and improve modelling at points far from spot locations, we make use of a scaled inverse stereographic projection f : ℝ2 → S2 defined as

$$f(v_x, v_y) = \frac{1}{s^2 + \|v\|^2}\,\big(2 s v_x,\; 2 s v_y,\; s^2 - \|v\|^2\big)$$

with ‖v‖2 = vx2 + vy2 and S2 = {u ∈ ℝ3 : ‖u‖ = 1}, where s ∈ ℝ+ is a fixed length scale (we set s = 1). We define the stereographic flow F = {Fij} = {f(vij)} as the result of applying f to each component of the local vector field vij. Effectively, we represent each element of the local vector field as a point on the unit 3D sphere S2 (note that this generalizes to arbitrary dimensions). In particular, f maps the zero vector (0, 0) to the north pole (0, 0, 1) and all vectors with infinite length ("points at infinity") to the south pole (0, 0, −1). The stereographic flow is computed using an extra lightweight convolutional module operating at the highest resolution (cf. Supp. Fig. 1). The corresponding loss function is the pixel-wise weighted L1 loss ℒflow between the ground truth stereographic flow F and the prediction F̂:

$$\mathcal{L}_{\mathrm{flow}} = \frac{1}{wh}\sum_{i,j}\big(1 + \lambda\,\mathbb{1}_{\mathrm{Spot}}(i,j)\big)\,\big\|F_{ij} - \hat{F}_{ij}\big\|_1.$$
Note that, by construction, both Fij and F̂ij lie on the three-dimensional unit sphere S2, yielding a bounded target to be regressed.
Let S′2 = S2 \ {(0, 0, −1)} denote the set of all points on the unit sphere S2 except the south pole. The stereographic flow can be analytically inverted position-wise by applying the stereographic projection f−1 : S′2 → ℝ2,

$$f^{-1}(x, y, z) = \left(\frac{s\,x}{1 + z},\; \frac{s\,y}{1 + z}\right).$$

We note that despite the stereographic projection being undefined at the south pole (0, 0, −1), in practice we only invert the stereographic flow at positions that are close to a spot, which are embedded far from the south pole.
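A minimal numpy sketch of this projection pair (our own illustrative implementation, not the library code):

```python
import numpy as np

def inv_stereographic(v, s=1.0):
    """Map 2D offset vectors (..., 2) onto the unit sphere S^2 (..., 3)."""
    vx, vy = v[..., 0], v[..., 1]
    r2 = vx**2 + vy**2
    denom = s**2 + r2
    # (0, 0) -> north pole (0, 0, 1); |v| -> infinity maps towards the south pole.
    return np.stack([2 * s * vx, 2 * s * vy, s**2 - r2], axis=-1) / denom[..., None]

def stereographic(u, s=1.0):
    """Inverse map: points on S^2 (minus the south pole) back to 2D offsets."""
    return s * u[..., :2] / (1.0 + u[..., 2:3])

# Round trip: recover the original offset vector.
v = np.array([0.3, -0.2])
assert np.allclose(stereographic(inv_stereographic(v)), v)
```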
Inference
To retrieve the spot centers from the two outputs of the network (i.e. multiscale heatmaps and stereographic flow), we first detect all local maxima in the highest-resolution predicted heatmap Ŷ(0). These local maxima are filtered so that only those above a specific threshold (probability threshold t ∈ [0, 1]) are kept. This threshold is optimized on the validation data right after training and thus does not need to be set explicitly during inference. This procedure results in a set of points {(xk, yk)} where xk, yk ∈ ℤ.
These points are then refined using the stereographic flow to achieve subpixel precision by adding the corresponding predicted offset vector at every position. Specifically, let V̂ = {v̂ij} = {f−1(F̂ij)} denote the pixel-wise stereographic projection of the predicted stereographic flow F̂, so that v̂ij is the predicted offset vector at position (i, j). We generate the final set of points {pk}, which correspond to the spot centers, as pk = (xk, yk) + v̂(xk, yk), where {(xk, yk)} are the local maxima extracted from the full-resolution heatmap and pk ∈ ℝ2, thus allowing the prediction of non-integer (subpixel-precise) spot centers.
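Putting both steps together, inference can be sketched as follows (illustrative only; we use scikit-image's peak_local_max as a stand-in for the actual peak extraction, and heatmap, flow and threshold t are assumed to come from the trained network):

```python
import numpy as np
from skimage.feature import peak_local_max

def detect_spots(heatmap, flow, t=0.5, s=1.0):
    """Subpixel spot detection from network outputs (illustrative sketch).

    heatmap: (H, W) highest-resolution predicted probability heatmap.
    flow: (H, W, 3) predicted stereographic flow on the unit sphere.
    """
    # 1. Local maxima of the heatmap above the probability threshold t.
    peaks = peak_local_max(heatmap, threshold_abs=t)  # (N, 2) integer y, x

    # 2. Project the flow back to 2D offsets and refine each peak.
    offsets = s * flow[..., :2] / (1.0 + flow[..., 2:3])  # stereographic projection
    return peaks.astype(np.float32) + offsets[peaks[:, 0], peaks[:, 1]]
```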
Spot detection benchmarking
Datasets (synthetic)
The dataset synthetic-simple was generated by randomly sampling spot locations and placing Gaussian distributions with σ = 1.5 and varying intensity on a blank image, after which Poisson and Gaussian noise were added. The dataset synthetic-complex was generated similarly, but instead of Gaussian spots we simulated realistically aberrated point-spread functions (PSFs) using the approach described in [28] and added fluorescent DAPI background as well as Gaussian, Perlin and Poisson noise at different levels, yielding images with different SNRs across the dataset. Different densities (i.e. numbers of spots) were used to mimic the different sparsity of real data.
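For illustration, a heavily simplified sketch of the synthetic-simple image formation could read as follows (parameter values other than σ = 1.5 are placeholders, not the actual simulation settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_simple(shape=(512, 512), n_spots=100, sigma=1.5, seed=None):
    """Toy sketch of synthetic-simple: Gaussian spots plus Poisson/Gaussian noise."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(0, 1, size=(n_spots, 2)) * np.array(shape)
    img = np.zeros(shape, dtype=np.float32)
    iy, ix = points.astype(int).T
    img[iy, ix] = rng.uniform(0.3, 1.0, n_spots)   # varying spot intensities
    img = gaussian_filter(img, sigma)              # Gaussian PSF
    img = rng.poisson(img * 100).astype(np.float32) / 100.0  # Poisson (shot) noise
    img += rng.normal(0.0, 0.02, shape)            # additive Gaussian noise
    return img, points
```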
Datasets (real)
We gathered the datasets HybISS, Terra and Telomeres by randomly cropping square tiles of width 512 and/or 1024 from different acquisitions (see below). The dataset MERFISH was compiled using raw images from [29], which we hand-annotated with napari [19]. In order to speed up the annotation process, we used initial solutions obtained from LoG and/or other Spotiflow models, which we iteratively refined by adding, removing and/or moving the proposed spot centers. Different contrast settings were used during annotation to account for potential uneven illumination. The annotated smFISH dataset was used as released in [17].
Dataset preprocessing
Images were preprocessed identically for each method by applying percentile-based normalization:

$$\tilde{X} = \frac{X - I_{p_{\min}}}{I_{p_{\max}} - I_{p_{\min}}},$$

where Ip denotes the p-th percentile of the image intensity. We set pmin ∈ {1, 3} and pmax = 99.8 throughout our experiments.
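In numpy, this normalization can be sketched as:

```python
import numpy as np

def percentile_norm(x, pmin=1.0, pmax=99.8):
    """Percentile-based normalization to approximately [0, 1] (sketch)."""
    lo, hi = np.percentile(x, [pmin, pmax])
    return (x.astype(np.float32) - lo) / (hi - lo + 1e-8)  # eps avoids division by zero
```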
Parameter tuning
Parameters specific to LoG (intensity threshold) and Big-FISH (variance of the filters) were optimized on the training split of each dataset. We did not optimize the intensity threshold on Big-FISH as the software has a custom threshold optimization procedure which works on an image-by-image basis. For RS-FISH, we optimized its parameters on the test split (thus overestimating its performance) due to the high computational load required and the large number of parameters that can be tuned (cf. Supp. Note 3.3). Learning methods (deepBlink and Spotiflow) were trained on the training split using their default configuration without performing any hyperparameter tuning (cf. Supp. Note 3.3). All reported scores are on the test split of the datasets.
Spotiflow (general) model
In order to assess the potential capacity of Spotiflow models, we trained the general model on a dataset gathered by merging all real datasets (HybISS, MERFISH, smFISH, Telomeres, Terra) as well as the dataset synthetic-complex.
Metrics
To compute overall detection metrics for each image, we first uniquely match ground truth spots {pi} and predicted spots {p̂i} according to their spatial proximity via Hungarian matching [30]. We then define a spatial cutoff c ∈ ℝ and count a matched pair (p, p̂) as a true positive (TP) if their Euclidean distance satisfies ‖p − p̂‖2 ≤ c, a predicted spot p̂ as a false positive (FP) if there is no matched ground truth spot, and a ground truth spot p as a false negative (FN) if there is no matched predicted spot. We then define the following metrics for each image:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2\,TP}{2\,TP + FP + FN}.$$
We also report the F1AuC [17] (cf. Supp. Note 3.2), which takes into account different spatial cutoffs ck between cL and cH:

$$F1_{\mathrm{AuC}} = \frac{1}{c_H - c_L}\sum_{k=L}^{H-1} F1(c_k)\,\Delta,$$

where Δ is a constant defined as ck+1 − ck for any k ∈ [L, H). Finally, we adapt the Panoptic Quality segmentation metric [31], which incorporates the spatial localization accuracy, to the spot detection task. We refer to it as the Panoptic Localization Quality (PLQ):

$$\mathrm{PLQ} = \mathrm{LA} \cdot F1,$$

where LA is the localization accuracy (cf. Supp. Note 3.2). We report results at c = 3, cL = 1 and cH = 5.
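A compact sketch of the cutoff-dependent F1 computation (our own illustrative code; scipy's linear_sum_assignment implements the Hungarian matching):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def f1_at_cutoff(gt, pred, c=3.0):
    """Detection F1 with unique GT-prediction matching at spatial cutoff c."""
    if len(gt) == 0 or len(pred) == 0:
        return 0.0
    dist = cdist(gt, pred)                    # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(dist)  # optimal unique matching
    tp = int(np.sum(dist[rows, cols] <= c))   # matched pairs within cutoff
    fp = len(pred) - tp
    fn = len(gt) - tp
    return 2 * tp / (2 * tp + fp + fn)
```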
Scalability assessment
In order to assess the scalability of different methods (cf. Fig. 1e, Supp. Note 3.4), we used images of different sizes, all obtained by consecutively expanding a center crop from a full HybISS cycle (see below). Given the dependency of intensity-based methods on the number of spots, their parameters were set so that the number of detections was of the same order of magnitude across all methods. Results were obtained using the Python-based profiling tool Scalene for all methods but RS-FISH, whose results were obtained with the Unix time command. Intensity-based methods were run on an AMD Ryzen Threadripper PRO 5965WX 24-core CPU with 256 GB of memory. Learning-based methods (deepBlink and Spotiflow) were run on an NVIDIA GeForce RTX 4090 GPU (24 GB).
Spatial transcriptomics experiments
Tissue collection and preparation
All animal procedures were in accordance with the Swiss Federal Veterinary Office guidelines and as authorized by the Cantonal Veterinary Authorities and the Cantonal Commission for Animal Experimentation under the following licenses: cantonal animal license number VD3651 and national animal license number 33167 for mouse samples as well as cantonal animal license number VD3652c and national animal license number 33237 for frog tadpole samples.
Mouse embryos and frog tadpole samples
Mouse embryos at E12.5 and E13.5 were collected from wild-type CD1 pregnant mice by dissecting them out of the uterine horn in ice-cold PBS. Nieuwkoop and Faber (NF) stage 58 frog tadpole samples were collected in PBS. Immediately after collection, fresh tissues were cryopreserved in optimal cutting temperature compound (OCT) and stored at -80 °C until sectioning. Tissues were sectioned with a cryostat (Leica CM3050 S) at 10 μm, placed on SuperFrost Plus microscope slides, and stored at -80 °C until HybISS processing.
Mouse gastruloid generation
Gastruloid generation was performed as previously described in [32]. Briefly, mouse embryonic stem cells (mESCs) (EmbryoMax 129/SVEV, gifted by the Denis Duboule lab) were cultured in gelatinized tissue culture dishes with 2i LIF DMEM medium consisting of DMEM + GlutaMAX (Gibco 61965-026) supplemented with 10% mES-certified FBS (Gibco 16141-079), non-essential amino acids (Gibco 11140-035), sodium pyruvate (Gibco 11360-039), beta-mercaptoethanol (Gibco 31350-010), penicillin/streptomycin (Gibco 15140-122), 100 ng ml-1 mouse LIF (EPFL Protein Facility), 3 μM CHIR99021 (Calbiochem 361559) and 1 μM PD0325901 (Selleckchem S1036). Cells were passaged every 2-3 days and maintained in a humidified incubator (5% CO2, 37°C). mESCs were collected after trypsin treatment, washed, and resuspended in prewarmed N2B27 medium (50% DMEM/F12 (Gibco 31331-028), 50% Neurobasal medium (Gibco 21103-049), supplemented with 0.5x N2 (Gibco 17502-048), 0.5x B27 (Gibco 17504-044), non-essential amino acids (Gibco 11140-035), sodium pyruvate (Gibco 11360-039), beta-mercaptoethanol (Gibco 31350-010), 0.5x Glutamax (Gibco 35050-061) and penicillin/streptomycin (Gibco 15140-122)). A total of 300 cells were seeded in 40 μl of N2B27 medium in each well of a 96-well plate with a rounded bottom and low adherence (Thermo Fisher, 174925). Forty-eight hours after aggregation, 150 μl of N2B27 medium supplemented with 3 μM CHIR99021 was added to each well. A total of 150 μl of medium was replaced every 24 h. Gastruloids were collected and flash-frozen 120 h after aggregation.
Radial glia progenitor culture
E11.5 mouse brains were collected from wild-type CD1 pregnant mice in ice-cold EBSS (14155-048, Life Technologies). Meninges were removed using fine-tipped forceps under a dissection stereomicroscope (Nikon SMZ18). Then, brains were fragmented into small pieces, transferred to a 50 ml plastic tube, and digested for 30-45 min at 37 °C in 5 ml of a solution containing 1 mM CaCl2, 1 mM MgCl2, 100 U/ml of DNAse I (LS02058, Worthington, Lakewood, NJ), and 20 U/ml of previously activated papain (Sigma L2020). After that, the cell suspension was briefly decanted, transferred into a 15 ml plastic tube, and centrifuged at 300 rcf for 5 min at 4°C. Then, the cells were resuspended in 3 ml of EBSS, and the suspension was transferred into a 15 ml plastic tube containing 3 ml of papain-inactivating solution consisting of 50% FBS/50% EBSS. Cells were centrifuged at 300 rcf for 5 min at 4°C and then resuspended in culture media consisting of Neurobasal medium (21103049, Life Technologies) supplemented with L-glutamine (Gibco, cat. no. 25030-123), B27 (Gibco, cat. no. 17504-044), gentamicin (15750037, Life Technologies), and 20 ng/ml of epidermal growth factor (EGF, PeproTech catalog no. AF-100-15). Finally, cells were seeded on 8-well chamber slides (80841, IBIDI) and incubated in a humidified 37°C incubator at 5% CO2 for 48 hours until HybISS processing.
In-situ sequencing by HybISS
To process all samples (apart from the adult mouse brain, see below), HybISS [9] was performed as published at protocols.io [33]. For embryonic mice, target genes were selected based on marker genes with expression within the target regions at the target developmental stages. Samples were imaged either on a Leica DMi8 epifluorescence microscope equipped with an LED light source (Lumencor SPECTRA X, nIR, 90-10172), an sCMOS camera (Leica DFC9000 GTC, 11547007) and a 20x objective (HC PC APO, NA 0.8, air), yielding a pixel size of 0.34 μm, or on a Nikon Eclipse 90i epifluorescence microscope equipped with an LED light source (Lumencor SPECTRA X, nIR, 90-10172), a CMOS camera (Nikon DS-Qi2) and a 20x objective (CFI PLAN PC, NA 0.75, air), yielding a pixel size of 0.15 μm. On both microscopes, samples were imaged with 10% overlap between tiles to cover the entire tissue, and between 8 and 12 z planes were acquired with 1 μm spacing between them. A full experiment results in a multicycle, multichannel image stack (5 cycles and DAPI + 4 HybISS signal channels).
To process the adult mouse brain used for the autofluorescence removal experiment (cf. Supp. Fig. 9), HybISS was performed on fresh-frozen 10 μm sections of 6-week-old mouse brains, using a Phi29 enzyme (NxGen F83900-1). Images were acquired using a 20x 0.8 NA objective on a Zeiss AxioImager Z1 widefield microscope with a PCO.edge 4.2bi camera. The microscope was controlled via MicroManager. Exposure times were, in order of acquisition: 3 ms for DAPI (Zeiss filter set 49: G365, FT 395, BP445/50), 450 ms for 750 nm (filters: Alluxa ultra 740.5-35 OD6, 766, 801.5-50 OD6), 300 ms for 650 nm (Chroma BP 640/30, FT ZT640rdc, ET680/40), 300 ms for 550 nm (filters BP546/12, LP T560lpxr, ET590-50), and 200 ms for 488 nm (Chroma filters BP450-490, T510, ET 525/36). 130 tiles of 2048×2048 pixels with a 10% overlap were acquired to cover the entire tissue, with a pixel size of 0.3225 μm. Each tile was a z-stack of 11 planes with a 0.8 μm step size. Rounds of probe hybridization, imaging and stripping were performed with a modified Labsat microfluidics device from Lunaphore, allowing us to place the stainer chamber under the microscope. A quenching buffer (Lunaphore BU08) was used to reduce autofluorescence before bridge probe hybridization, and an imaging buffer (Lunaphore BU09) was used during imaging.
Image processing
Projection, stitching and alignment
To yield 2D images, the acquired stacks were reduced using either maximum intensity projection (MIP) or a custom implementation of extended depth-of-field (EDF, [34]). Tiled acquisitions were stitched together into a mosaic image with Ashlar [35], which uses a variant of phase correlation [36] to simultaneously compute the offsets between the different tiles at subpixel precision [37]. Only the DAPI channel was used to retrieve the stitching coordinates, and different cycles in the same experiment were stitched independently. After obtaining the mosaics for all cycles in an experiment, we registered them with wsireg [38], which uses elastix [39, 40] as a backend. We allowed for rigid-body alignment as well as non-linear warping, which we found did not aggressively deform the sample and was able to align fine details properly. We only used the DAPI channel for inter-cycle registration.
Spot detection (Spotiflow)
All results with Spotiflow were obtained using a Spotiflow network trained on the HybISS dataset (cf. Fig. 1b, Fig. 1c, Supp. Fig. 4, Supp. Note 2) to detect diffraction-limited spots independently across cycles and channels. The probability threshold used was optimized from the validation data of the HybISS dataset.
Spot detection (LoG/starfish)
We ran LoG using starfish’s implementation, which is based on scikit-image [41], and used different intensity thresholds including the ‘optimal’ one (0.138) which was computed from the training data of the HybISS dataset. To ensure a fair comparison, we detected transcripts independently across cycles and channels as done for Spotiflow.
Gene decoding
To extract gene expression maps the detected transcripts were assigned a gene using starfish’s implementation of an intensity-based nearest-neighbor decoder. When decoding spots detected via Spotiflow, we fed the decoder the spot probabilities output by the network instead of their raw intensity. Gene expression heatmaps were obtained by performing Gaussian kernel density estimation (KDE) with variance σ2 = 5 on the gene signal. The heatmaps were clipped to (0, 1) after applying percentile-based normalization with pmin = 0, pmax = 99.9.
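Such a KDE-based expression map can be approximated by smoothing a binned density image of the decoded spot positions, as in the following sketch (binning at pixel resolution is our simplification, not necessarily the pipeline's exact procedure):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def expression_heatmap(points, shape, sigma2=5.0, pmax=99.9):
    """Approximate Gaussian-KDE gene expression map from decoded spots (sketch)."""
    density = np.zeros(shape, dtype=np.float32)
    ij = np.clip(np.round(points).astype(int), 0, np.array(shape) - 1)
    np.add.at(density, (ij[:, 0], ij[:, 1]), 1.0)           # bin spots into pixels
    heat = gaussian_filter(density, sigma=np.sqrt(sigma2))  # Gaussian KDE
    heat /= np.percentile(heat, pmax) + 1e-8                # percentile normalization
    return np.clip(heat, 0.0, 1.0)
```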
Time/memory benchmarking
To compare the time and memory efficiency of starfish and Spotiflow in an end-to-end setting (cf. Fig. 2e, Fig. 2f), we used the Unix time command. We detected spots on an E12.5 mouse embryonic brain using LoG on the maximum intensity projection of the input along the cycle and channel dimensions, as done in previous studies [9, 42]. The spot intensities were then traced back along the non-projected input to retrieve the intensity of spots at the different cycles and channels. For Spotiflow, the detections were done independently on each cycle and channel, as this remains computationally affordable even for larger tile sizes.
Zero-shot autofluorescence removal
We first built a dataset from a single experiment consisting of a regular HybISS acquisition preceded by the imaging of an autofluorescence-only cycle. After registering both images, we detected spots (using the Spotiflow model trained on the HybISS dataset) in one channel (corresponding to 750 nm) of the autofluorescence cycle and in the same channel of the first HybISS cycle. We generated the non-autofluorescent ground truth by subtracting all the autofluorescent detections that matched (at spatial cutoff c = 3) a detection in the HybISS channel. We generated three spatially disjoint splits from this experiment (training, validation and test). We finally fine-tuned the Spotiflow model pre-trained on HybISS on the generated training dataset to predict the non-autofluorescent spots. Quantification is reported on the test split (cf. Supp. Fig. 9).
Live-cell imaging experiments
Tissue collection and preparation
HeLa cells expressing endogenously tagged Halo-TRF1 were labelled with Janelia Fluor 646 Halo ligand (Promega) in order to mark telomeres. To visualize TERRA, ectopically expressed 15q-TERRA species were tagged with PP7 stem-loop structures that were bound by phage coat protein fused to GFP (PCP-GFP). Live cells were imaged using a Nikon Confocal Spinning Disk microscope equipped with two Photometrics Prime 95B cameras and sCMOS Grayscale Chips. Imaging was performed with a 100x objective in an equilibrated incubation chamber at 37°C and 5% CO2. Images were acquired as multi-channel single planes at a rate of 20 frames per second (50 ms exposure, 200 frames per movie).
Movie processing
Movies were first spatially cropped so that each crop contains only one cell. In each crop, spots were detected independently per frame and channel, where one channel contains the telomere marker and the other the TERRA marker, using the deepBlink and Spotiflow models trained on the Telomeres and Terra datasets. For Spotiflow, the probability threshold optimized on each dataset was used. For deepBlink, the probability threshold was set to the default (0.5). Single-particle tracking was performed using TrackMate [24] with a spot radius of 0.15 μm and simple LAP tracking with the following parameters for Telomeres/TERRA: linking maximum distance 0.22/0.60 μm, gap-closing maximum distance 0.44/1.0 μm and gap-closing maximum frame gap 10 frames.
Acknowledgements
We thank members of the Weigert and La Manno labs as well as Lars Borm (KU Leuven) for their feedback and discussions of the project. We would also like to thank the EPFL BioImaging & Optics Core Facility (BIOP) and the EPFL Histology Core Facility for their assistance in imaging and sample preparation. M.W. was supported by the ELISIR program of the EPFL School of Life Sciences and by generous funding from CARIGEST SA.