Hidden network preserved in Slide-tags data allows reference-free spatial reconstruction

We reanalyzed data from the Slide-tags method developed by Russell et al. and discovered a hidden, spatially informative network formed during the transfer of spatial tags to nuclei. The structure of this network conveys sufficient information to infer cell locations entirely without ground truth from spatial indexing, placing Slide-tags among a new generation of optics-free, network-based imaging-by-sequencing approaches, a fundamental departure from classical spatial sequencing technologies based on pre-indexed arrays.


Main
Multiplexed imaging methods (Chen2015; Ke2013; Merrit2020; Eng2019) have enabled precise localization of several molecular targets of interest, whereas single cell methods (Ramsköld2012; Islam2011; Hwang2018; Macosko2015) allow mass profiling up to thousands of targets at the expense of spatial information.Spatial capture and sequencing technologies (Moses2022; Ståhl2016; Rodriques2019) seek to eliminate this trade-off by providing both broad target coverage and spatial information.These technologies are based on prior fabrication of molecular oligonucleotide barcode arrays that facilitate locating of captured biological sequences from an adjacent tissue.The substrate preparation requires spatial indexing (decoding) to assign sequences to locations in a spatial reference map, either by assignment during array printing (Ståhl2016) or alternatively by in situ sequencing of random arrays using fluorescence imaging (Rodriques2019; Vickovic2019; Stickels2021).Russell and colleagues recently developed a powerful variation on the latter with Slide-tags (Russell2024), whereby spatially decoded bead arrays are cleaved to diffuse oligonucleotide barcodes into tissue samples, thus associating spatial barcodes with individual nuclei and ingeniously inverting the classic capture principle.Like other spatial array-based approaches, however, this method still depends on a decoded reference map, which remains a costly bottleneck in spatial transcriptomics technologies.
In analyzing the publicly available data from Russell et al., we discovered that their diffusion of barcodes generates spatially coherent networks composed of cells and beads, preserved within the Slide-tags data.These networks are analogous to those in the emerging field of imaging-by-sequencing, also known as DNA microscopy, molecular pixelation, or sequencing-based microscopy (Weinstein2019; Hoffecker2019; Boulgakov2020; Karlsson2024).Thus, even without a decoded spatial reference map (Fig 1a), we could infer cell nuclei locations from the sequencing data network structure alone.
Slide-tags data consists of associations between oligonucleotide cell identifiers (cell ID's) and bead barcodes.The reference map allows cells to be traced via associated bead barcodes to locations on the array.We observed the tendency for pairs of cells to be interlinked by mutually associated bead barcodes, resulting in a large cell-bead network.Examining the frequency of bead nodes according to their number of connections (degree distribution; Fig 1b ) we saw an approximate power law characteristic of networks lacking intrinsic scale (Barabási1999).However, when we examined the degree distribution of only the cell nodes, we observed a bell curve with a clearly defined peak, mean, and spread (mean 65 and standard deviation 24 connections).Therefore, we converted the network of cells and beads into a network of only cell nodes connected by edges weighted by the number of mutual bead connections.Thus, like regular lattices or meshes, a node in the cell network has a number of connections falling within a well-defined range.When we examined the relationship between the frequency of mutually occurring bead barcodes with the inter-cell distance (based on ground truth from Russell et al.) for a given cell pair, we observed a population of edges in the short 0-200 µm range whose frequency decays monotonically with distance while a large fraction of edges appear random, uncorrelated with distance (Fig 1c).
We thus sought to discriminate spatially informative, proximity-dependent edges from non-spatially correlated edges reflecting random noise, in a way that is decoupled from ground truth knowledge of the distances between edge-bearing cells.To do this, we assessed the network's spatial coherence, or the potential for a graph to convey spatial information (Bonet2024) (see methods; Fig 1d ).By filtering the network according to the range of allowed edge weights, we could observe a transition from low to high tendency to capture variation in shortest path network distance between cell nodes using only 2 dimensions of variation, reflecting a transition to greater agreement with the native 2 dimensions of the space.Examining network filtering against ground truth locations, we also observed that greater filter stringency resulted in a progression from densely connected and mostly random connections to mesh-like structure, albeit sparser, networks composed of local proximity dependent connections.For a lower edge weight cut-off of 7, the largest connected subgraph consisted of 286 cells (Fig 1e), with structure primarily consisting of short-range, local edges.We sought to recover the positions of cells in this largest 286 cell network using only the spatial constraints preserved in the network structure and without the explicit ground truth information from spatial indexing.We reconstructed positions using the Spatio Topological Reconstruction by Network Discovery (STRND) pipeline (Bonet2023;  The high accuracy with which it was possible to reconstruct the spatial positions of a significant portion of a tissue slice from Slide-tags data without the use of spatial indexing derived ground truth information points to a surprising conclusion: that spatial transcriptomics may be conducted without the need for microscopy-based decoding of spatial position.How might this process be improved to achieve larger reconstructable areas?The quality of the network and its potential for spatial reconstruction could be influenced by multiple experimental parameters affecting the probability, spatial dependence, and consistency of cell connections.We estimated cell density and relative numbers of UMIs per cell using kernel density estimation (Fig 2f) and observed that the largest subgraphs correspond to regions with high cell density and high amount of UMIs per cell.
An exponential diffusion model fitted to the average frequency of mutual bead barcodes per cell as a function of distance reveals a characteristic length scale of network formation of λ = 39.8 µm (related to diffusion coefficient by where t is the diffusion time).To achieve a greater  λ = 4 reconstruction coverage, an experimental procedure expanding the effective range of connectivity for a given cell (e.g. by promoting greater diffusion distances) by increasing the diffusion length parameter to λ = 400 µm enables a fully connected space (Fig 2g).When we simulated the formation of this expanded network, we could achieve a network-based reconstruction spanning the whole tissue sample (Fig 2h).This paradigm shift implies a far more scalable and automation-friendly future for spatial tissue profiling with cellular resolution.This approach is not limited to flat networks, and future applications could entail three dimensional tissues and a potential shift in how we approach molecular decoding of tissue specimens.

Data retrieval from public repositories
The data analyzed was obtained from Russell et al., and was retrieved through the Broad Institute Single-Cell Portal (SCP) via the accession numbers SCP2162 (mouse brain), SCP2170 (mouse embryonic brain).Of the mouse experiments selected for this study, mature mouse brain and 14-day embryonic brain, the embryonic data was deemed more suitable for reconstruction due to its higher density of cells, which was required for accurate reconstruction.For each mouse sample, ground truth cell nuclei locations from the end results of Russell et al.'s Slide-tags procedure were downloaded from the SCP and sequencing data was acquired from the NCBI Sequencing Read Archive using the SRA Toolkit.As only the spatial positions were of interest, only sequencing runs containing bead-cell paired reads were used, SRR26236608 and SRR26236606 for embryonal and mature hippocampus respectively.

Compilation and transformation of networks
The sequencing data was initially processed using Russell et al. '

Edge filtering
To refine the unipartite cell-cell network, edges were filtered according to their weights, with lower weight values indicating connections less likely to correlate with spatial proximity.A filtering process was implemented on the unipartite weight matrix to retain edges only if .As a    ≥ θ result of thresholding, the network became sparser with the removal of connections, leading to the formation of multiple connected components, isolating subsets of cell nodes that remained strongly connected.A connected component in an unweighted graph is a subgraph in which any two  ⊆  nodes are connected to each other by paths, and which is connected to no additional nodes in the larger network.

Spatial reconstruction of networks
Connected components obtained from the filtered unipartite cell-cell network were then spatially reconstructed using the Spatio-topological recovery by network discovery (STRND) pipeline (Bonet2023).The algorithm generates spatial representations of nodes based on their topological relationships within the network, resulting in spatial approximations for networks whose topology reflects physical spatial dependencies during formation.Thus, the structural information of the network is encoded through high-dimensional vector representations.UMAP is used to further reduce the dimensionality and map back to the original 2-dimensional space.This is done in such a way that if a pair of vectors , are close in their     high-dimensional representation (because of high random walk visitation probability), they will also be close in the mapped 2-dimensional space.

Enrichment of filtered subgraphs with internal edges
A high threshold of = 7 beads per edge resulted in subgraphs with a low mean degree of 2.2 for θ the 5 largest subgraphs.When reconstructed, resulting positions have a high local accuracy, but global structure is largely lost.To rectify this, each subgraph goes through a process of  ⊆  extracting edges from lower filtering threshold , where but limited to only nodes within θ spatially precise, sufficient amounts of short-range, spatially accurate edges are able to counteract this.As the reconstruction algorithm is stochastic in nature, the relative abundance of these short, structurally beneficial edges overcomes the addition of long, destructive edges resulting in accurate reconstruction.When enriching a subgraph, the optimal is not guaranteed to be equal for all θ 2 subgraphs, however it was empirically observed that when including edges , reconstructions θ 2 = 1 were always substandard and therefore was used as a baseline.This observation could be θ 2 ≥ 2 attributed to edges composed of single beads being not only extremely numerous, but also appearing to be largely randomly formed.

Evaluating the spatial coherence of network data
Spatial coherence, as described by Bonet et al. (Bonet2024), refers to the extent to which any network resembles a spatial network formed by proximity interactions.Using this to motivate selecting filtering thresholds, we use the contribution of the first two values of a Gram matrix obtained from shortest path topological distances between pairs of nodes.This contribution corresponds to how much of the variance in network node positions can be explained by only 2 dimensions.A lower value indicates that the network is not consistent with 2-dimensional spatial constraints, or in other words that it is not spatially coherent.When increasing the filtering threshold the 2-component variance contribution increases as well, but the size of each subgraph will also decrease as edges are removed, thus a threshold should be selected where subgraphs have a high spatial coherence but are still large enough to be spatially informative.

𝐻 𝐻
Eigenvalues of the Gram matrix were computed by solving the equation .The ( − λ) = 0 eigenvalues were then ranked according to their magnitude.To measure spatial coherence, the first Eigenvalues (approximating for the planar experimental configuration) were compared   = 2 as a fraction relative to the total sum of Eigenvalues.
where is the number of spatial dimensions, and are the eigenvalues of the Gram matrix derived  λ from the network's shortest path distances.

Quality metric analysis based on ground truth comparison
As there is a previously determined (ground truth) position for each cell available, this also allows the usage of quality metrics dependent on the true cell positions to quantify the accuracy of reconstructions.The first metric assesses the correlation between all pairwise reconstructed distances and true distances.This comparison was done by computing the Pearson correlation coefficient and the reported R 2 value is the square of this correlation coefficient where are true  distances and are corresponding reconstructed distances.
For quantifying local accuracy, a KNN-based metric was used (Bonet2023).The original and reconstructed positions are used to construct two separate K-D trees.For each point the nearest neighbors are compared between the original tree and the reconstructed tree.The overlap between them is assigned to that point as a score between 0 and 1 corresponding to the proportion of nearest neighbors shared between trees.The third metric to assess the reconstruction accuracy is the shift in position for the reconstruction once mapped to the original positions.This requires aligning the reconstructed positions, , to the  true positions, .This was accomplished using singular value decomposition (Eq 5) and the point  The distortion per point is then the Euclidean distance , between the  = ( − ) + ( − ) true position and corresponding aligned reconstructed point.Using the unit of the original positions µm the median of all distortions can be interpreted as the resolution of the reconstruction.

Calculating cell density and correlating subgraph positions with graph properties
To identify whether subgraph discovery could be related to sample properties, tissue cell density and normalized UMI density per cell was modeled using kernel density estimation which for each reference cell nucleus allows determination of a probability density value (Eq 7), here  ()  -9 , indicating that subgraphs tend to emerge in sample areas with a higher • cell density.Performing the same analysis but using the normalized UMI density yields a p-value of 3. 15 10 -152 .Together these support that subgraphs we can find using the presented methods form • in general not only in areas of denser cells, but also among cells or areas of tissue where each individual cell captures more bead-oligos.

Modeling and parameterizing distance dependent edge function
The network generating process from Slide-tags of promiscuous diffusion was modeled by producing a distance-dependent edge assignment function.The formula and parameters of this function were decided on the basis of the distribution of edges and weights as a function of distance observed in the Slide-tags dataset.Analysis of distance dependence was performed on the unipartite-transformed data consisting of pairs of cell identifiers (network edges) and their associated number of bead barcodes associating each pair or edge weight.The relationship   between and Euclidean distance was then determined using the coordinate data associated

Modeling edge formation from a set of spatial ground truth coordinates
To simulate the formation of spatial networks by barcode diffusion, the parameterized exponential decay model was used to compute the weight of edges formed between cells as a function of their pairwise separation distance.An arbitrary set of points may be used to produce such a network, however the cell positions (ground truth) from Russell et al. were used to explore hypothetical scenarios with different network-generating parameters.To generate edges and edge weights, the distance dependence function value was used deterministically according to the separation distance between each cell and rounded to the nearest integer.If that integer is non-zero, then an edge with the integer value as weight is assigned to the pair of points (cells), and no edge is assigned otherwise.
The resulting edge list was processed according to the STRND pipeline in which random walks sampling of node visitation probabilities followed by distance constraint satisfaction was used to spatially localize nodes in the network.In cases where multiple subgraphs were generated instead of a single contiguous network, the largest subgraph (with the most constituent nodes) was reconstructed preferentially over smaller ones.

Figure 1 .
Figure 1.Slide-tags data contains a hidden network of spatially linked cells.(a) Schematic of the original Slide-tags workflow (top path) and a discovered optics-free, network-based spatial reconstruction workflow (bottom path).(b) Log-log degree distribution plots for the network constructed from Slide-tags data comprising interconnected cell nodes and bead nodes exhibits a finite distribution of the number of connections (node degree) per cell with a defined peak (left), versus scale-free distribution of the number of connections per bead (right).(c) The frequency distribution of counts associated with an edge of a given distance (left) and the distribution of the mean number of beads per edge for binned distances (right).(d) The evolution of Gram matrix spatial coherence as a function of filtering stringency, or the number of beads per edge.(e) Visualization of cell-to-cell networks formed for a range of filtering powers shown on top of the ground truth cell positions.
Fig 2a), which gathers random-walk neighborhood visitation frequencies from each node and reduces these high dimensional node representations down to 2 spatial dimensions.The algorithm is unsupervised and the only input required is a network.Although the network lacks information on original positions' scale, rotation, translation and chirality, we could accurately recover relative distances.Reconstruction of the largest subgraph of 286 nodes yielded point coordinates whose relative positions closely matched those of Russell et al.'s reported cell positions (Fig 2b).

Figure 2 .
Figure 2. Optimization of proximity constraints enables cell localization without spatial indexing.(a) Schematic of reconstruction algorithm whereby random walk visitation probabilities are captured as node representations that are reduced down to spatial 2 dimensions.(b) Comparison of reconstructed spatial positions (top) obtained without ground truth instead via network analysis with ground truth cell positions (bottom) for the largest subgraph.(c) Reconstruction compared to ground truth cell positions with distortion lines to show relative positioning error of each cell.(d) Global accuracy of reconstruction as measured via the pairwise distances between points in reconstructed versus ground truth points showing linear correlation between the two with R 2 = 0.886.(e) Local reconstruction accuracy as measured through similarity in the K-nearest neighbor points of reconstructed versus ground truth.(f) Quantification of cell density (top) and number of unique barcodes per cell normalized by cell density (bottom).(g) Network formation modeled with exponential decay (upper) and predicted ground truth network (lower) for a hypothetical experiment with expanded diffusion range of barcodes (h) Corresponding ground truth cell positions (left) and network-based spatial reconstruction (right) for the expanded diffusion range network.
s code to generate a file of bead-cell pairs with the corresponding number of UMIs per pair, which is equivalent to an unfiltered edgelist.The bipartite network represented by an biadjacency matrix of size ,  ||×|| consists of two distinct sets of nodes: cell nuclei ( ) and barcoded beads ( ).The bipartite network   was transformed into a unipartite network composed solely of cell nuclei nodes.The conversion was implemented by creating edges between cell nodes based on their mutual connections to bead nodes to produce a unipartite weight matrix where each element in the the number of beads connected to both cells and .

First, random walks
are conducted from each node in the subgraph to sample the visitation probabilities of nearby nodes.Let represent a subgraph of cell nuclei nodes, and  that node is visited during random walks initiated from node .These     probabilities capture the local and global structure of the subgraph and are estimated from the visitation frequency gathered from random walks.The estimated probabilities are used as training data in a skip-gram architecture to produce high-dimensional representations for every node .   The training is formulated as an optimization problem that aims to maximize the log-likelihood of observing the neighborhood of each node given its vector representation (Grover2016) The dot product and the softmax function are used to compare random walk visitation probabilities with high dimensional vectors.
at .While this has the effect of also adding long edges not   ≥ θ 1 represent the intersection and union set of K-nearest neighbors of point i in the original and reconstructed datasets, respectively.The score assigned to the whole subgraph is the mean of the individual scores (Eq 4 cell (taken from the results of Russell et al.'s assignment of cell locations) which was treated as a ground truth in the model to trigonometrically measure pairwise distances between cells.The resulting relationship between and distance was viewed as a scatter plot which revealed a   cloud of points with high values in the distance range of 0 to 200 µm, whereas a large portion of   points occurred at low values at intermediate to long distances.Data was discretized by   forming a set of distance binswhere each bin covers a specific range of intercell distances. .This plot contained a highly spatially-correlated variable region in the 0 to 200 µm distance range on top of a flat non-correlated region spanning the rest of the space.To model the distance dependence of edge formation and weight assignment, only the spatially correlated, variable region of this relationship was modeled, and a line fitted to the flat non-spatially correlated background portion of the relationship was subtracted to produce a variable-region-only cleaned dataset.The exponential diffusion function that was fitted to the background-subtracted mean vs distance relationship by minimizing residuals and had the    form where is the distance assigned to the bin and is the characteristic length scale parameter of the diffusion, where is a diffusion coefficient and is the   time of diffusion.
By separating the nuclei into two independent populations based on whether they are part of the 10 largest subgraphs we can perform a one-sided Mann-Whitney U test to indicate whether subgraphs tend to emerge from cells in areas with high cell density or high normalized UMI density.Out of the 4,635 cells with a spatial reference a total of 1,002 are part of largest subgraphs leaving 3,633 cells to the non-subgraph population.Comparing cell densities between the two populations results in a p-value of 1.76 10