A Universal Method for Crossing Molecular and Atlas Modalities using Simplex-Based Image Varifolds and Quadratic Programming

This paper explicates a solution to the problem of building correspondences between molecular-scale transcriptomics and tissue-scale atlases. The central model represents spatial transcriptomics as generalized functions encoding molecular position and high-dimensional transcriptomic-based (gene, cell type) identity. We map onto low-dimensional atlas ontologies by modeling each atlas compartment as a homogeneous random field with unknown transcriptomic feature distribution. The algorithm presented solves simultaneously for the minimizing geodesic diffeomorphism of coordinates and latent atlas transcriptomic feature fractions by alternating LDDMM optimization for coordinate transformations and quadratic programming for the latent transcriptomic variables. We demonstrate the universality of the algorithm in mapping tissue atlases to gene-based and cell-based MERFISH datasets as well as to other tissue scale atlases. The joint estimation of diffeomorphisms and latent feature distributions allows integration of diverse molecular and cellular datasets into a single coordinate system and creates an avenue of comparison amongst atlas ontologies for continued future development.


Introduction
Since the 17th century, scientists have seen living organisms as a hierarchy of biological mechanisms at work across scales. To understand the interplay of these mechanisms, reference atlases that incorporate genetic, cellular, and connectivity measures into a single coordinate space have been constructed and which aim to summarize the mass of data across scales through a set of discrete partitions. An instance of the more general segmentation problem in computer vision, atlas construction relies on the underlying assumption of homogeneity within each region. The optimal partitioning assigns a label to each region based on this homogeneity and the presence of sharp changes at the boundaries between regions.
In biology, this label frequently reflects behavior or function, as seen in two of the most common mouse brain atlases: the Allen Reference Atlas (ARA) [1] and the Franklin and Paxinos Atlas [2]. Together, the common coordinate framework an atlas provides in addition to its ontology have guided research efforts in facilitating the comparison of different types or replicates of data in a single coordinate system and in honing efforts of study to particular regions relevant to each unique investigation.
The widespread use of these atlases, particularly in the fields of digital pathology and neuroimaging, has motivated efforts to develop image registration tools to align individual images to such reference atlases. A large family of methods, all diffeomorphism based [3], have been developed within the field of Computational Anatomy (CA) [4,5] for transforming coordinate systems at the tissue scales. These come particularly from multiple labs in the magnetic resonance imaging (MRI) community [6,7,8,9,10,11,12,13,14]. More recently, Molecular CA [15] has emerged which unifies the dense tissue scales of MRI with high resolution micron scales of digital pathology imagery. These approaches are hierarchical [16], constructing what are termed image varifold representations, which are geometric measures of the brain existing at multiple scales, and therefore allowing for the simultaneous representation of both micron scale particle phenomena, such as the transcriptional or cell type data studied herein, and millimeter scale tissue phenomena, as traditionally studied in CA.
A central challenge that remains within these representations is the necessity of crossing between those reflecting different imaging modalities and therefore different functional range spaces, which exist at different scales. In the setting of classical images on a regular grid, this challenge has been addressed through different approaches including matching based on analytical methods using cross-correlation [17] or localized texture features [18], and methods for transforming one range space to another in crossing modalities and scales based on polynomial transformations [19], scattering transforms [20] and machine learning [21,22,23]. More recently, methods in deep learning have also been applied to align single-cell datasets, modeled as regular grid images, both to atlases at the histological scale [24] as well as reference transcriptional atlases that are beginning to emerge [25].
We should however expect even more diversity in the types and scales of data that can be measured with the rapidly developing technologies in imaging and spatial transcriptomics. These aim to detect up to thousands of genes simultaneously with spatial information, and thus, allow us to view both the micro and even nanometer scales with exquisite detail [26]. Both the diversity and magnitude of this data pushes the limits of our ability to model such datasets as classical continuous images, discretized on regular grids. Indeed, as seen in those repositories generated in the BRAIN Initiative Cell Census Network (BICCN) and archived at the Brain Image Library (BIL), these datasets are already on the order of terabytes and will only continue to increase as technologies shift from mouse to human measurements. Hence, the need remains for a modeling framework equally equipped to represent datasets in both forms of traditional continuous imagery sampled on a regular grid, and those of discrete particles with attached functional description; and for an associated registration mechanism to align objects in this framework across different functional modalities at different scales.
This paper focuses on the use of mesh-based image varifolds, as described in [27], for simultaneously modeling molecular and tissue scale data. A subproblem covered by the mesh based image-varifold theory outlined in [27] is the mapping of molecular scale data to atlas coordinate systems. Image varifolds are geometric measures [15], which allow us to provide a single representation that supports molecular transcriptomics measurements, cell-based measurements, and tissue scale atlases. We explicate, here, the construction of a universal method rooted in this framework for transferring molecular scale data to tissue scale "cartoon" atlases, which are devoid of gene measurements, and rather, only contain a fixed partition into structures in their description.
Our solution couples coordinate system transformation via geodesic generation of minimal energy diffeomorphisms to estimation of a family of probability laws, which give for each atlas label, a distribution over molecular features that is the most reasonable explanation of the target transcriptomic dataset. Specifically, we model each atlas region as homogeneous and stationary with respect to space, giving an optimal alignment between atlas and target that maximizes similarity in distribution over features across each site in a single atlas region while minimizing the energy of the geometric deformation (diffeomorphism). This consequently skews emphasis away from the foreground-background boundaries that almost exclusively govern image alignment and instead highlights the underlying assumptions in the architecture of the cartoon atlas, whose boundaries were initially constructed so as to maximize the homogeneity of the region. We estimate the diffeomorphism and probability laws jointly via an alternating algorithm, as explicated here, that iterates large deformation diffeomorphic metric mapping (LDDMM) with quadratic programming for minimizing the normed distance between the template and target, and as a result, yields both spatial alignment and functional correspondence between template and target.
We demonstrate the efficacy of this methodology in mapping 2D sections of the ARA [1] to corresponding sections of both cell-independent and cellbased spatial transcriptomics datasets, both generated via the MERFISH imaging-based spatial transcriptomics technology, which yields single molecule resolution. We present methods for sparsifying the functional transcriptome descriptions via gene selection based on mutual information with spatially discriminating variables and subsequently illustrate the stability of our estimated diffeomorphisms to choices of subsets of features. Finally, given the plethora of existing reference atlases, each of which might define a different partitioning scheme over the same area of tissue, questions of comparison and relevance of each atlas to emerging molecular and cellular signatures naturally arise [28]. We show through the use of our methodology to map not just atlas to molecular dataset, but one atlas to another, that the correspondence yielded by our method serves as an anchor for re-examining existing ontologies and creating new ones for the future.

Image Varifolds and Transformations for Molecular Scales Based on Varifold Norms
In Computational Anatomy, correspondence between tissue sections is computed using coordinate transformation between the sections by solving an optimization problem characterized by the set of possible transformations to optimize the image similarity function that specifies the alignment of the sections. These transformations are modeled as affine motions and diffeomorphisms φ which act to generate the space of all configurations. For classical images such as for MRI, LDDMM [29] uses the action of diffeomorphisms on images I as classical functions using function composition on the right with the inverse of the diffeomorphism: φ · I(x) = I • φ −1 (x) for x ∈ R d . The image similarity function used is often a norm on functions, and solving the problem of minimization of the norm in the space of diffeomorphisms gives the metric theory of LDDMM for generating geodesic matching between exemplar anatomies [30,31]. Spatial transcriptomics generates measurements that while often represented as regular lattice images, are fundamentally lists of point measurements across the different technologies and thus, often dispersed irregularly over space. In spot-resolution technologies including Visium, DBiTseq, and SlideSeq, these point measurements are the magnitudes of gene expression in the neighborhood of each "spot", which could be placed in a regular grid pattern. In contrast, in imaging-based spatial transcriptomics technologies including STARmap, Barseq, SeqFISH, and MERFISH, as illustrated here, these point measurements are single mRNA molecules or single cells, therefore dispersed in space according to the given tissue architecture and instantaneous cell dynamics measured. In both cases, we can represent these point measurements as "particles".
Natural fluctuation in gene expression over time and space coupled to the dynamics of each spatial transcriptomics technology leads each tissue section, at the molecular (1-100 micron) scale, to have a varying number of such particles with no natural ordering of particles consistently apparent between sections. To build correspondences between these datasets of point measures, we unify the molecular scales with image-like functions as has been developed for building correspondences at tissue scales in MRI [4]. For this we represent the particles as "generalized functions" [15]. Since they carry gene or cell image data we call them image varifolds [27], linking to the rich literature on the geometric measure theory of varifolds. This allows us to represent particle clouds at any scale in both spatial and imaging function dimensions. We note landmark-based methods [32] that assume direct permutation correspondence between particles across images are not applicable, as a MERFISH section may have 100,000 particles requiring an unfathomable number of permutations to specify.
Varifolds are defined as follows. We consider a Euclidean space in d dimensions with d = 2, 3, to which we add function dimensions represented by a set F. In spatial transcriptomics datasets, the functional dimensions represent the gene types of detected mRNA transcripts, treated as independent measures or aggregated into cells or small neighborhoods. At the finest scale, we model a discrete set of point measures (particles) reflecting the individual reads recorded by the given technology, whether they be single transcripts or distributions of transcripts in a given cell or neighborhood. To a single read (x i , f i ) ∈ R d ×F, we associate the elementary "Dirac" measure, δ xi ⊗δ fi , which acts on a set A ∈ R d × F as δ xi ⊗ δ fi (A) = 1 if (x i , f i ) ∈ A and 0 otherwise. The point measures carry weights w i , giving the multiplicity, typically as number of transcripts or number of cells measured by each individual read. The discrete image varifold is defined as the weighted sum of Diracs representative of the collection of particles and functional features (w i , x i , f i ), i = 1, 2 . . . : While for the molecular scales, each data point is a measurement of a single mRNA transcript or local (e.g. cell's) distribution on the feature space of gene type F, in contrast, a data point in a given atlas is interpreted as a single voxel with a label prescribed to it from the overall ontology, L.
It is natural to associate a density in mass per unit volume to the varifold through the classical decomposition of measures as a product. This gives the marginal distribution ρ on physical space, ρ(A) = µ(A × F), A ⊂ R d , and the field of conditional probability measures over the feature space µ x , x ∈ R d on F the feature space: For molecular scales, ρ(·) on R d is typically the spatial distribution of total gene expression, while for atlas images at tissue scales, it is a continuous uniform distribution over the support of the tissue. Cross-modality mappings from molecular to tissue scales thus imbue the atlases with estimates ρ(·) and the field of conditional probabilities µ x (·), x ∈ R d of the molecular feature space (e.g. gene type).

Quadratic Program for Cross Modality Mapping on Meshes
A central goal is to imbue the atlas with molecular or cellular information by estimating a cross-modality mapping between the atlas and a finer scale, single-cell or subcellular dataset, such as those emerging particularly from imaging-based spatial transcriptomics technologies. To compute this mapping, we model each modality as an image varifold, a product of measures over physical and feature space, by instantiating each measurement as a triangulated or simplex mesh following [27]. Each mesh carries a collection of vertices x = (x i ∈ R 2 ) i∈I . From the vertices we construct the simplex triangles γ j (x) and their centroids m j (x) for j ∈ J, with vertex numbers |I| and simplices |J| determined by the resolution selected. We denote the target mesh as τ throughout the paper; see Section 4.1 for detailed construction. We note the triangles and centers are a function of the underlying vertices, but we will often suppress their explicit dependence except when necessary. To complete the image varifold we append to the mesh τ the density α = (α j ) j∈J and the field of probability laws ζ = (ζ j ) j∈J on F: Importantly, in spite of the apparent differences between equations (1), (2) and (3), they all belong to the same category of mathematical objects, and can be addressed together in the framework of image varifolds.
At the molecular scales presented in this paper, the density is number of cells or number of mRNA transcripts per mm 2 and is defined for each simplex, area |γ j |, as The field ζ j , j ∈ J are probabilities over genes or cell types with finite dimensional feature spaces f ∈ F, with |F| ≃ 1000 in the case of genes and |F| < 50 for cell types. Each ζ j (f ) is a probability of gene or cell type, with F ζ j (f ) = 1, indexed by location in the image. We take the ARA [1] as the template to be mapped onto the molecular data. For mapping the atlas to molecular scales, we have to estimate both the diffeomorphisms φ : R d → R d transforming atlas coordinates. as well as the unknown densities and conditional feature distributions, α π , ζ π which we take as latent variables for the atlas. We denote the mesh for the template as τ 0 representing its vertices x 0 = (x i ) i∈I0 and simplices and centers The atlas carries a finite ontology, L, dividing it into disjoint spatial partitions. We model each atlas region as having a distribution (non-normalized) over the molecular features (π ℓ ) ℓ∈L on F viewed as latent variables that are homogeneous across the partition region. The simplex law is determined by the contribution of each ontology region to the vertex for j ∈ J 0 , given by the mixture distribution p j (ℓ), L p j (ℓ) = 1. The atlas has appended the molecular feature space estimated from the target (α π , ζ π ) and is given as: The group action carrying the atlas onto the target becomes Here, |Dφ| mj is the Jacobian determinant of φ at m j . Figure 1 shows a mesh-based image varifold for two coronal sections (Z = 385, Z = 485) of the Allen Atlas, with finest granularity ontology (|L| ≈ 700) and with meshes rendered at 50 µm resolution. The right panel of Figure 1 illustrates the physical densities α j , j ∈ J of mRNA transcripts per mm 2 in coronal sections of MERFISH from the Allen Institute [33]. Coronal sections of mouse brain rendered as mesh from Allen Reference Atlas (left) and MERFISH-spatial transcriptomics (right). Selected sections of atlas chosen by visual inspection to match MERFISH architecture. Meshes are rendered at 50 µm, with tissue sections corresponding to Z-sections 385 (top row) and 485 (bottom row) in 10 µm Allen reference atlas. Colors in the left column indicate a region in the Allen ontology, while colors in the right column indicate the density of mRNA transcripts given by the number of transcripts per simplex area α j = # transcripts/|γ j |.
To map the mRNA measures to atlases we follow [27] and define the space of image varifolds µ ∈ W * to have a norm ∥ · ∥ 2 W * , and transform the atlas coordinates onto the targets to minimize the norm. The space of varifold norms is associated to a reproducing kernel Hilbert space [34,15] (see (7) below) defined by the inner-product of the space as ⟨µ, ν⟩ W * , ∥µ∥ 2 W * = ⟨µ, µ⟩ W * . The mapping variational problem constructs φ : R d → R d and feature laws (π ℓ ) ℓ∈L on F to carry φ 1 · µ π τ0 onto µ τ minimizing the normed difference. Densities that are estimated are constrained to fall in the range 0 ≤ α min ≤ α π j ≤ α max < ∞ to ensure positive values for the density and incorporate prior knowledge of cellular or molecular distributions .  371  372  373  374  375  376  377  378  379  380  381  382  383  384  385  386  387  388  389  390  391  392  393  394  395  396  397  398  399  400  401  402  403  404  405  406  407  408  409  410  411  412  413  414 Springer Nature 2021 L A T E X template A Universal Method for Crossing Molecular and Atlas Modalities 9 Throughout, we take α min to be the 5th percentile of values of (α j ) j∈J in the target.
The variational problem maximizes the overlap of homogeneous regions in the atlas (e.g. each partition in the ontology) with those in the target (e.g. regions where conditional feature distributions are stationary over space) by deforming coordinates using LDDMM to optimize the vector field v t , t ∈ [0, 1]. The quadratic programming calculations solve for π ℓ , ℓ ∈ L for the atlas to gene expression and cell-type problem and are described in the methods section 4.3. Figure 2 illustrates the results of mapping the Allen Atlas coronal sections to Allen MERFISH spatial transcriptomics sections, shown in Figure 1. Allen Atlas sections were chosen based on correspondence through visual inspection. Estimated mRNA densities, α π j =π j (F), as depicted on left and middle columns, were achieved through solution of the quadratic program as defined in (9), and reflect total mRNA densities from a full set of 702 genes as features. Leftmost column shows estimated mRNA densities on transformed geometry of atlas mesh under the action of the diffeomorphism φ, while middle column shows estimated mRNA densities on original atlas geometry. Right column shows the action of the diffeomorphism φ on each atlas section, with vertex positions, φ(x 0 ), and with approximate determinant jacobian, |Dφ| mj indicated by the color at each simplex site j ∈ J 0 .  416  417  418  419  420  421  422  423  424  425  426  427  428  429  430  431  432  433  434  435  436  437  438  439  440  441  442  443  444  445  446  447  448  449  450  451  452  453  454  455  456  457  458  459  460 Springer Nature 2021 L A T E X template 10 A Universal Method for Crossing Molecular and Atlas Modalities Fig. 2 Results of cross-modality atlas mapping to Allen MERFISH spatial transcriptomics [33] for coronal sections of tissue at approximate Allen atlas Z-sections of 385 (top) and 485 (bottom). Left column shows estimated mRNA densities, α π j =π j (F ), j ∈ J 0 , per deformed simplex site under the action of the diffeomorphism of atlas to target space φ 1 · µ π τ 0 ; middle column shows the same pulled back onto original atlas geometry µ π τ 0 ; right column shows the diffeomorphism applied to the mesh τ 0 , with depicted approximation of the determinant of the Jacobian |Dφ 1 |m j , j ∈ J 0 , as described in Section 4.3.

Dimension Reduction of Gene Distributions via Mutual Information
In mapping atlases to distributions of mRNA, we are typically interested not just in overall mRNA density, but the distribution of expression across a particular set of genes. The size of the total gene set measured varies across technologies, ranging from hundreds to tens of thousands of different genes [26]. However, both computational time and memory frequently dictate the analysis of only a subset of these genes at a time, together with their relevance to each particular application. A common selection mechanism is to consider those genes that are most "spatially variable" [35] or "differentially expressed" [36], under the assumption that expression pattern thereby varies per biologically different regions of tissue. This is particularly relevant, here, in the context of mapping spatial transcriptomics to atlases where we aim to estimate distributions over genes for each region in our atlas that we assume is homogenous within the region. Various methods have been described for identifying which genes in a spatial transcriptomics dataset are more spatially varying than others, some examples being Gaussian process registration, Laplacian Score, [35] and Moran's I [37]. In order to score genes which are most spatially varying we  462  463  464  465  466  467  468  469  470  471  472  473  474  475  476  477  478  479  480  481  482  483  484  485  486  487  488  489  490  491  492  493  494  495  496  497  498  499  500  501  502  503  504  505  506 Springer Nature 2021 L A T E X template A Universal Method for Crossing Molecular and Atlas Modalities 11 introduce Mutual Information scoring which assesses the differential expression of genes in space in a cell-independent manner. Specifically, we score each gene with the mutual information between the two random variables X, M which capture, respectively, an orientation in space and a relative density of mRNA expression for that gene (see Section 4.4). In the case of serial sections, as in the MERFISH data from the Allen Institute, each gene is assigned a score per section, with tallies taken across all sections to deduce which genes are most spatially variable across the entire brain. We note this approach is similar in spirit but not identical to that in [36] which uses the Kullback-Leibler divergence to find genes with differential expression across cells distributed in space. Shown in Figure 3 is a single section of Allen Institute MERFISH data depicting the distribution of three example genes with the lowest mutual information scores (top row, Chodl, Brs3, Hpse2 ) and the highest mutual information scores (middle row, Gfap, Trp53i11, Wipf3 ) computed across the whole set of 60 serial sections. In each case, conditional probabilities, ζ j (·) reflect the relative occurrence of each gene in the context of a subset of 20 total genes of either lowest (top row) or highest (middle row) mutual information. In line with expectations, 75% of the genes comprising those with scores in the bottom 25% of the total 700 genes were decoy genes (e.g. 'BLANK') without biological meaning but used as controls for assuring the quality of the dataset .  508  509  510  511  512  513  514  515  516  517  518  519  520  521  522  523  524  525  526  527  528  529  530  531  532  533  534  535  536  537  538  539  540  541  542  543  544  545  546  547  548  549  550  551  552 Springer Nature 2021 L A T E X template 12 A Universal Method for Crossing Molecular and Atlas Modalities Fig. 3 Relative expression per simplex, ζ j (·), of three genes with lowest (top row) and highest (middle row) mutual information score, computed across the entire set of 60 coronal sections in Allen Institute MERFISH sample, and shown on one section at approximately the coronal slice level of Z = 485 in the Allen atlas. Estimated probabilities ζ π j (·) for each of the three genes with highest mutual information (Gfap, Trp53i11, Wipf3 ) shown for each atlas region with the native atlas geometry (bottom row). Figure 3 is the estimated probability, (ζ π j ) j∈J0 , for each of the three genes with highest mutual information score (Gfap, Trp53i11, Wipf3 ), shown for each region on the Allen atlas section. For calculating these estimates we solve the variational algorithm with LDDMM and quadratic programming estimation of the gene feature distributions to map the Allen atlas section, Z = 485 to the Allen MERFISH target image-varifold . For this result, the smaller feature space of 20 total gene types corresponding to those with the highest mutual information scores are used for the mapping algorithm .  555  556  557  558  559  560  561  562  563  564  565  566  567  568  569  570  571  572  573  574  575  576  577  578  579  580  581  582  583  584  585  586  587  588  589  590  591  592  593  594  595  596  597  598 Springer Nature 2021 L A T E X template

Mapping of Cell Distributions to Atlases
The two previous sections presented results solving the mapping problem between atlas and MERFISH based on the mRNA reads directly. Alternatively, these raw mRNA reads can be segmented into discrete cells as a mode of data reduction followed by downstream analyses clustering the cells into discrete cell types. The mesh-based image varifold framework is ideal for taking the measure representation directly on the aggregated cells and solving the variational problem of mapping to atlas coordinates. Figure 4 shows the results of mapping an Allen atlas section at Z = 675 to a section of cell-segmented MERFISH transcriptional data (courtesy of the JEFworks Lab, Johns Hopkins University). The total gene set measured is ≈ 500 genes, with each transcript assigned to a single cell. Transcriptional profiles per cell are clustered into 33 distinct clusters using Leiden graph-based clustering [38] and annotated as cell types based on known marker genes. This gives a cell-based dataset analogous to the transcript-based dataset discussed in Section 2.2 in which densities, α j (·), reflect the spatial density of data points (here, # cells mm 2 ), and conditional probability distributions, ζ j (·), are defined over the feature space of cell types, |F| = 33.
The essential part of the model for estimating the atlas distribution over cell types is the stationarity of the model across each atlas partition. It is therefore natural to examine the entropy of the distribution within each atlas compartment as a measure of the multiplicity of cell types within a compartment. Shown in the right column is the entropy of the estimated probability distribution over cell types for each simplex in both target and atlas, with nonzero probabilities assigned to ∼ 3 − 5 distinct cell types in each simplex of the target versus ∼ 1 − 20 cell types in each simplex of the atlas, varying per region in the original ontology. We emphasize that there are various methods for solving the segmentation to cells and thereby dimension reduction as determined by the specific imaging technology. Some of the methods are rooted in image-based segmentation schemes such as the Watershed algorithm, operating jointly on transcriptional data and immunofluorescence images such as DAPI stains [39], while others utilize learning-based methods [40] for accommodating often a wider diversity of cell shapes and sizes. In either case, the assignment of mRNA reads to specific cells introduces a layer of functional information at the micron scale, which can now be modeled in lieu of or in tandem with the functional information at the nanometer scale (e.g. raw mRNA reads) as the feature space of a target image varifold to which we wish to map sections of an atlas. The image-varifold method is universal in the sense that it is agnostic to which discrete object is forming the information that provides the substrate for building correspondence.
In addition to cell type as the features associated to the cell aggregated transcriptome data, the feature space can remain gene type generated by aggregating the individual mRNA transcripts into an average gene expression feature per cell across the span of tissue. Shown in Figure 5 are the distribution of two genes (Ntrk3, Fzd3 ) out of a subset of 7 chosen to have the highest mutual information score. By normalizing the total mRNA per cell to 1, we estimate for an atlas section, a density, α π in units of # cells mm 2 , and a conditional distribution over gene types, ζ π , reflecting the probability per cell in the given simplex, of mRNA belonging to each gene type. Figure 5 shows these estimated probabilities ζ π for those genes whose probability of expression per cell is correspondingly shown in the MERFISH target section.  5 Gene type probabilities per cell for MERFISH section (top) for two genes Ntrk3 gene type (left) and Fzd3 gene type (right) out of a selected subset of 7 genes with high spatial discriminance according to mutual information score (see Section 4.4). Bottom shows estimated probabilities, ζ π , for corresponding coronal Allen slice Z = 675 .  693  694  695  696  697  698  699  700  701  702  703  704  705  706  707  708  709  710  711  712  713  714  715  716  717  718  719  720  721  722  723  724  725  726  727  728  729  730  731  732  733  734  735  736 Springer Nature 2021 L A T E X template 16 A Universal Method for Crossing Molecular and Atlas Modalities

Stability of Geometric Transformations Across Varying Feature Spaces
The previous section demonstrated the efficacy and flexibility of our algorithm at mapping cartoon atlases to a molecular target in the settings of that target carrying either gene-based or cell-based functional information. The assumed biological correlation between cell type and pattern of gene expression implies that signals of variation across cell types at the scale of microns should also exist across gene types at the scale of nanometers. Consequently, we might expect similar spatial deformation of a tissue scale atlas in mapping onto the same geometric target, but with conditional feature distributions defined over either gene or cell types, with partition boundaries deforming to match regions of homogeneity that would be roughly consistent across genes and cells. Figure 6 shows the diffeomorphisms estimated for mapping Allen atlas sections at Z = 890 and Z = 675 onto two MERFISH target sections carrying three different feature spaces constructed from the same starting spatial transcriptomics data. Comparing left to middle and right columns, we see similar geometric transformations, φ, estimated to bring atlas onto target image varifold carrying cell type (left) versus gene type (middle, right) functional features. Regions of shrinkage (blue) versus expansion (red) occur in consistent areas across the different cases, and the magnitude of that change, as measured by the determinant jacobian, |Dφ|, is also similar in each case. Furthermore, we illustrate the effect of using two different subsets of 7 spatially discriminating gene types as the feature space. The first carries a high score based on Moran's I (middle) and the second with a high mutual information score (right), as described in Sections 2.3 and 4.4. Here again, we observe similarity in the geometric mappings estimated for carrying atlas onto target between these two independent feature spaces. Hence, the manifest stability in the geometric mappings jointly estimated with the feature laws, (π ℓ ) ℓ∈L , over three different feature spaces supports the stability of our alternating algorithm in the face of different numbers and types of features, but also speaks to the stability of the biological organization across tissue, cellular, and molecular scales.

Generalizing the Methodology to Compare Atlas to Atlas
The variational problem we solve via quadratic programming and LDDMM mapping between coordinates in (5) is universal in the sense that varifold representations can not only be used to represent the MERFISH sections of A Universal Method for Crossing Molecular and Atlas Modalities cellular and molecular data, as described in sections 2.3 and 2.4, but as well can be used to represent atlases, themselves. This allows us to map multiple atlas ontologies, one to another. This is important because widespread variations in brain atlas ontologies have been developed to represent the molecular, chemical, genetic, and electrophysiological signals being measured across institutions. With different levels of granularity and different intended applications, multiple atlases per species now exist and are continuing to emerge [1,2,41,42,43,44]. While some atlases have been defined in the same coordinate framework-often achieved through existing methods of image registration or manual alignment [28]many exist in different coordinate frameworks. Together with mismatches in number, type, and positioning of partitions, this poses a challenge not only to the evaluation of each atlas ontology's fit to a molecular target, but also the ready comparison of atlas to atlas and the establishment of a clear metric of similarity between them. Figure 7 shows the results of mapping corresponding sections of both the ARA and Kim Lab Developmental atlas [43] to the cell-segmented MERFISH section of Figure 4. The images of predicted cell types with the highest probability (left column) for each compartment are shown for each ontology in the left column. The areas of the hippocampus (dashed circle) and striatum and amygdala (arrow) are partitioned with different levels of granularity. This leads to different optimal geometric transformations, as characterized by the determinant Jacobian (middle column), and different predicted cell type distributions (right column). Though both atlases are published as geometrically aligned [43], the diffeomorphism solving the variational problem transforms geometrically the homogenous regions between the atlas and target. Hence, regions of the amygdala and striatum undergo significant contraction in the optimal mapping of Kim but not Allen atlas to MERFISH given the partitioning of this region into fewer and thus larger presumed homogenous regions in the Kim atlas. The right column exhibits the entropy of the distributions over cell types estimated for each region in each atlas. Here, the hippocampus is more finely partitioned in the Kim atlas, which yields lower entropy distributions over cell types than in those estimated for the Allen atlas.
The universality of the variational problem allows for direct mapping across atlas ontologies. Here, atlases are taken to have constant density, α min = α max = 1. Thus, the mapping problem from atlas with ontology L 0 onto the target with ontology L 1 optimizes over the feature laws, (π ℓ ) ℓ∈L 0 , with the target atlas ontology L 1 taken as the target feature space, The joint estimation of geometric transformation, φ and conditional feature laws, (π ℓ ) ℓ∈L in our mapping methodology offers two modes of quantitative comparison of these atlas ontologies. First, as in classical image setting of LDDMM, the determinant jacobian, |Dφ|, of the estimated diffeomorphism, can be used as metric of how similar the atlas ontologies are, reflective of how much boundaries of partitions move to maximize overlap between homogenous regions. However, unlike in classical image settings, the estimation of the additional family of feature laws here affords a second metric of similarity with computation of the entropy of the estimated conditional feature distributions. Figure 8 shows the results of mapping one mouse atlas ontology to another with the Z section 680 in the ARA mapped to the corresponding section in the Kim Lab Developmental atlas (top row) and vice versa (bottom row). The leftmost column depicts the geometry of the section under each ontology, with the Allen section hosting ≈ 140 independent regions and the Kim section ≈ 80. In this setting, both atlases are published in the same coordinate framework, giving φ = Id and thus, highlighting, instead, the estimated distributions over the other ontologies. The middle column depicts the estimated conditional probability distributions, ζ π , for each atlas section over the other atlas section's ontology. The label with the highest probability in these distributions is plotted for each simplex in the mesh and which is consistent across each partition of each original atlas, given the homogeneity assumption in our model (i.e. a single π ℓ for each ℓ ∈ L 0 ). The comparatively larger set of labels in the Allen ontology results in labels being omitted from the corresponding estimated set of labels on the Kim ontology section (middle column, bottom row) while multiple regions in the Allen ontology carry the same most probable region in the Kim ontology. The right column of Figure 8 captures this difference in depicting the entropy of the estimated conditional feature distributions, (ζ π j ) j∈J0 , for each simplex of the mesh. The entropy of the distributions estimated for the Kim ontology over the Allen ontology (bottom) is on average, higher, than that of the distributions estimated for the Allen ontology (top), with probability mass distributed across ∼ 5−7 different Allen regions for each Kim region of cortex. Nevertheless, we see close to 1:1 correspondence between Allen and Kim labels in the center section of the slice, where entropy of the estimated distributions is near 0. Fig. 8 Original and predicted ontologies for Allen (top) and Kim (bottom) atlases. Left column illustrates original ontologies. Middle column illustrates Allen atlas geometry with Kim atlas ontology (top) and Kim atlas geometry with Allen atlas ontology (bottom). Right column shows entropy of predicted ontologies, with higher entropy values (light) indicating less 1:1 correspondence between ontologies.
Atlas ontologies can be mapped not just within species but also across them, where both geometric transformations and estimated ontology distributions, together reflect metrics of comparison between the two. Figure 9 shows the mapping of coronal section, Z = 537, in the ARA to a coronal section, Z = 628 in the Waxholm Rat Brain Atlas [44], with both sections chosen to correspond as sections through the anterior commissure. The left column shows both atlas ontologies with |L 0 | ≈ 120 for the the Allen atlas section and |L 1 | ≈ 30 for the Waxholm atlas section. The middle column depicts the initial differences in size and shape (top) between the two tissue sections. After scaling the volume of the mouse brain by 1.5, additional deformation, with magnitude given by the determinant jacobian, |Dφ|, distorts both internal and external tissue boundaries to align homogeneous regions in each atlas, such as cingulate area to cingulate area (white arrow). Estimated distributions over Fig. 9 Results of mapping coronal section Z = 537 of ARA to corresponding coronal section of Waxholm Rat Brain Atlas at Z = 628, both chosen to be through the anterior commissure. Left shows both original atlas sections. Middle column shows initial tissue overlap between mouse and rat section (top) and resulting overlap following action of estimated diffeomorphism on mouse tissue (bottom). Determinant of Jacobian highlights areas of expansion (red) and contraction (blue) in mouse section deforming to match rat section, with white arrow highlighting expansion in cingulate area needed to match region in mouse to corresponding region in rat. Right column shows entropy for each mouse region's predicted distribution of rat labels (top) and predicted rat label with highest probability (bottom).
the Waxholm ontology labels for each region in the Allen atlas are shown in the right column, summarized by the maximum probability label (bottom) and measures of entropy (top), which highlight in gray, Allen regions mapping to ≈ 3 − 4 Waxholm regions versus those in black achieving 1:1 correspondence.

Discussion
We have introduced, here, a universal method for mapping tissue scale, 'cartoon' atlases to molecular and cellular datasets arising in the context of emerging transcriptomics technologies. We root our method in the modeling of each object as a mesh-based image varifold, as previously described [27], and outline an alternating algorithm that simultaneously incorporates the classical deformation tools of LDDMM [29] with quadratic programming to jointly estimate an optimal geometric transformation and conditional feature law that maps atlas onto target. A Universal Method for Crossing Molecular and Atlas Modalities As presented, our method fills a current need, as highlighted previously [35], for universal tools that can integrate the diverse types and large quantities of data emerging from the evolution of both transcriptomics and imaging technologies over the last decade. With each technology generating a slightly different perspective and different set of animal or human samples to compare, a method that can stably handle the format of past, current, and future datasets will be paramount to integrate both new findings with the vast number of datastores currently available across institutes. The image varifold framework used here is general enough to model emerging transcriptional data from both image-based and spot-resolution technologies and classical imaging data (as demonstrated in our atlas-to-atlas mappings). Therefore, it provides a gateway for comparing data historically curated through immunohistochemistry, MRI, and other techniques in addition to the emerging transcriptomics methods.
In parallel to the development and dispersion of diverse molecular datasets, there has been continued development on the side of reference atlases to reflect trends in these new measures and integrate these trends across even more samples of particular species. Our method offers a tool for re-examining and comparing existing atlas ontologies in the context of new data [35], and serves as a means for developing new atlases in the future. As described in Section 2.6, examination of the mappings achieved between different atlases and the same molecular target offers an indirect comparison between atlases in the context of a particular molecular setting. However, this comparison can also be made directly in a context-independent setting by harnessing our method to map atlas to atlas. In the field of evolutionary biology, for instance, our method could aid in the mapping and comparison of atlases across species [45,46] and in the field of developmental neurobiology, the available atlases of the brain at different stages of development [42,41]. With regard to atlas refinement and creation, the invertibility of the estimated diffeomorphism in the setting of mapping atlas to molecular target, enables the carrying of each target into the same coordinate space of the atlas. Here, the molecular and cellular scale raw reads could be averaged across individual samples, thus providing a mode for defining new atlas segmentation schemes of homogenous regions across these samples.
While the results presented here survey a wide variety of potential applications of our method to mapping atlas modalities to diverse targets, there remain uncertainties and potential modes of improvement that are the subject of current and future work. First, we have presented results mapping 2D sections of 3D atlases to corresponding 2D sections of MERFISH data. The Allen MERFISH data showcased here is part of an entire set of serial sections that span the whole brain. Consequently, we are optimizing our method to compute mappings of atlas to molecular target in 3D, where both added dimensions and added magnitudes of data contribute to the theoretical and computational complexity of the problem. Indeed, with ≈ 6 billion individual transcripts measured across the span of the brain, treatment of this data as a regular lattice image would require on the order of 1000 billion voxels at 1 µm resolution, which is coarser than that needed even to resolve two molecules of mRNA. Hence, it becomes even more vital to treat such data in the particle setting, as presented here, where we capture the sparsity and irregularity of the data in modeling it effectively in its lowest dimension, as 6 billion individual particle measures. Second, though we have highlighted both gene-based and cell-based datasets achieved with image-based MERFISH technologies, we are currently investigating the use of our method to map data from spot-resolution technologies such as SlideSeq [47] and additional image-based technologies such as BarSeq [48], which introduce variations in both the number of genes measured and the scope of tissue (whole versus hemi-brain) measured.
Finally, we emphasize that central to the model posed here is the underlying assumption that each compartment has a homogeneous distribution over molecular features that is stationary with respect to space. This assumption holds in many settings, as we might expect, given the inherent construction of atlases often to delineate regions of particular cell types and thus, where we see a set of predominant cell types or gene types consistently across the region in the molecular scale data, as in Figures 4 and 5. However, we also see examples where this homogeneity assumption may not be appropriate. An example of this is seen in Figure 3 where the expression of Trp53i11 appears to be distributed along a decreasing gradient medial to lateral within the striatum. Notably, the results presented here reflect a particular balance between expected deformation and this homogeneity assumption, imposed by the relative weighting of the separate terms in the cost function. Current work at controlling this balance further includes the addition of a term controlling the divergence of the vector field to the energy defined in the variational problem 5, which leads to solutions more robust to deformation within the interior of the tissue. Future work will also include more rigorous evaluation of how well this homogeneity assumption holds and the effect the given balance between the two terms might have in different settings.

Construction of Mesh-based Image Varifold for Different Modalities
As introduced in Section 2.2, we represent each image varifold object as a triangulated mesh. Each mesh is built from a collection of vertices, x = (x i ) i∈I with each x i ∈ R 2 , here. Each simplex in the mesh is defined from the vertices denoted as γ(x) and is paired with a 3-tuple with components that index the vertices of the simplex, (γ(x), c = (c 1 , c 2 , c 3 ) ∈ I 3 ) and determine the center . Each triangle simplex is defined by The total mesh τ is the collection of vertices x, and simplices and centers (γ j (x), c j = (c 1 j , c 2 j , c 3 j ), m j (x)) j∈J with the resolution determining the complexity as total numbers of vertices |I| and the number of simplices |J| in the mesh.
Meshes were constructed using Delauney triangulation [49] on a grid defined over the support of the starting dataset with the size of each square dictated by the input resolution. Varifold measures, α, ζ, were associated to the simplices of the mesh following assignment of each individual data point (e.g. mRNA or cell read) into its single nearest simplex. Meshes were pruned of simplices that both contained fewer than 1 data point and existed outside the largest connected component of simplices containing at least one data point. In this manner, both for atlas images and transcriptomics data sets, resulting simplex meshes spanned the entire tissue foreground.

Molecular Scale Varifold Norm
To specify the image varifold norm for µ ∈ W * , ∥ · ∥ 2 W * , it suffices to provide the inner product between Diracs ⟨δ , the right-hand side the kernel with for any weighted sum µ in Eqn. (1) then Throughout we use the kernel product K((x, f ), (x ′ , f ′ )) = K 1 (x, x ′ )K 2 (f, f ′ ) chosen as a Gaussian over physical space

Alternating LDDMM and Quadratic Program Algorithm for Joint Optimization
For solving the variational problem of (5) we follow [27] using an alternating optimization, fixing the laws (π ℓ ) ℓ∈L and optimizing over the control v(t), t ∈ [0, 1] and integrating it to generate the diffeomorphim φ 1 , then fixing the diffeomorphm and using quadratic programming to estimate the laws. The variational problem of (5) is optimized using LDDMM by flowing the atlas φ t · µ π τ0 to minimize the target norm to the endpoint µ τ . Smoothness is enforced via the reproducing kernel Hilbert space norm on the control ∥ · ∥ V which controls the differentiability of the flow of vector fields, which is sufficient to guarantee an invertible diffeomorphic result [50]. Holding that fixed we alternately optimize (5) with respect to the laws (π ℓ ) ℓ∈L using quadratic programming, such as OSQP [51]. We loop until convergence.
For simplex triangles within the interior of each atlas region, denoted j ∈ J 0 \ ∂J 0 , thenπ j = π ℓ * (j) , j ∈ J 0 \∂J 0 and these approximations are an equality for all interior pairs of vertices.
For all results shown, the template and target are initially aligned through separate estimation of rigid transformations (translation and rotation) and a single isotropic scaling applied to the template to bring the total area of the template to equal that of the target. Rigid transformations are estimated by minimizing the varifold normed difference Eqn. (9) between the rotated and translated template atlas µ π τ0 transformed to the target µ τ . Everything being specified, gradient based optimization is performed until convergence or a specified number of iterations. In LDDMM, we use L-BFGS optimization combined with a line search using the Wolfe condition. In rigid registration, we directly optimize the varifold norm of the difference, also using the L-BFGS method.

Mutual Information Score for Discriminating Spatially Informative Genes
To deduce which genes are spatially variant with respect to their expression patterns, we assign to each gene a score based on mutual information. This score specifically measures the mutual information between a random variable, M g , that reflects the number of counts of gene g in a given neighborhood, and a random variable, X, that partitions this neighborhood vertically or horizontally into two domains. We describe, here, a method for computing this score particularly in settings of large amounts of data, where discretization is favorable for computational efficiency. This method, as illustrated in Figure  10, is applied for each gene independently on each measured section of tissue, where collective scores per gene be be garnered by tallying each gene's score per section across the entire set of sections. The support of the tissue section is first covered by a grid, as shown in the left panel of Figure 10, with squares of size σ × σ. In the results shown in Sections 2.3 and 2.4, we choose σ = 50µm. In each square, we compute the total number of mRNA expressed per each gene in that square, denoted by N g for gene g. Let F g (t) = P (N g ≤ t) be the cumulative distribution function for gene g, estimated from the empirical distribution of N g across all squares in our grid. We define the binning function ϕ g (n) = q k=1 1 n≥t k for t k = inf{t ≥ 0|F g (t) ≥ k/q} and with k ∈ [1, q] denoting the k-th q-quantile. This gives a discrete (normalized) value of mRNA counts for gene g in each square of the grid, as shown in the middle panel of Figure 10 for g = Gfap.