New Results

A Probabilistic Approach For Registration Of Multi-Modal Spatial Transcriptomics Data

Yu Qiang, Shixu He, Rengpeng Ding, Kailong Ma, Yong Hou, Yan Zhou, Karl Rohr
doi: https://doi.org/10.1101/2021.10.05.463196
Yu Qiang (1, 2)
Shixu He (1)
Rengpeng Ding (1)
Kailong Ma (1)
Yong Hou (1)
Yan Zhou (1), for correspondence: happyqiangyu@gmail.com
Karl Rohr (2), for correspondence: happyqiangyu@gmail.com

1 MGI, BGI-Shenzhen, Shenzhen, 518083, China
2 Heidelberg University, BioQuant, IPMB, and DKFZ Heidelberg, Biomedical Computer Vision Group, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany

ABSTRACT

Observing the spatial characteristics of gene expression with image-based spatial transcriptomics technology makes it possible to study gene activity across different cells and intracellular structures. We present a probabilistic approach for the registration and analysis of transcriptome images and immunostaining images. The method is based on particle filters and jointly exploits intensity information and image features. We applied our approach to synthetic data as well as to real transcriptome images and immunostaining microscopy images of the mouse brain. The results show that our approach accurately registers the multi-modal images and is more accurate than a state-of-the-art method.

1. INTRODUCTION

Spatial transcriptomics (ST) technologies based on next-generation sequencing (NGS) systematically generate spatial measurements of gene expression across an entire tissue sample, bridging the gap between spatial information and the whole transcriptome [1]. Advanced ST platforms such as Stereo-seq [2] and Seq-scope [3] achieve nanoscale resolution, which enables the determination of subcellular compartmentalization and the visualization of RNA sequencing data.

However, with current nanoscale-resolution ST technologies, it is difficult to accurately assign spots of the transcriptome images (gene expression matrix images) to specific organelles or cells [1]. The gene expression information can be exploited in different ways, for example, to characterize gene expression patterns or to classify cell types in the tissue [4–6]. A major challenge for automated analysis is the lack of distinct cell boundaries in transcriptome images. Fast and accurate registration of transcriptome images and immunostaining images can facilitate the assignment of expressed genes to specific cells and thus enable the study of subcellular gene expression patterns.

Fig. 1.

Registration result of our approach for an image section: (a) Immunostaining image, (b) transcriptome image (gene expression matrix image), and (c) overlay of registered images.

Although in situ sequencing (to generate transcriptome images) and immunostaining are carried out on the same tissue, several factors can cause spatial shifts between the images, for example, the sample preparation process and the dispersion of RNAs after tissue permeabilization. Manual alignment of the two images is time-consuming and in most cases achieves only partial alignment [3, 6]. Therefore, methods are needed for the efficient and accurate registration of large-scale transcriptome images and immunostaining images containing tens of thousands of cells.

Only a few methods have been introduced so far for the automatic registration of spatial transcriptomics image data. Chen et al. [7] described a multiinformation-based method for the registration of multiplexed in situ sequencing (ISS) datasets from the Human Cell Atlas project. Multiinformation is defined as the Kullback-Leibler (KL) divergence between a joint distribution and the product of its marginal distributions; in [7] it is used in conjunction with FRI (finite rate of innovation) sampling and swarm optimization. However, the method is computationally expensive because multiinformation is costly to compute.
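To make the definition of multiinformation concrete, the following minimal sketch computes it for a discrete joint distribution given as an N-dimensional histogram (one axis per variable). It only illustrates the KL-divergence definition; it is not the FRI-based estimator of [7], and the function name `multiinformation` is our own. For two variables the quantity reduces to ordinary mutual information.

```python
import numpy as np

def multiinformation(joint: np.ndarray) -> float:
    """Multiinformation (in nats) of a discrete joint distribution.

    `joint` holds probabilities or counts, one array axis per variable.
    Returns KL( p(x1, ..., xN) || p(x1) * ... * p(xN) ).
    """
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()  # normalize counts to probabilities

    # Build the product of the marginals by broadcasting one marginal per axis
    product = np.ones_like(p)
    for axis in range(p.ndim):
        other_axes = tuple(a for a in range(p.ndim) if a != axis)
        marginal = p.sum(axis=other_axes, keepdims=True)
        product = product * marginal

    # Terms with p = 0 contribute 0 (convention 0 * log 0 = 0)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / product[mask])))
```

For example, two independent fair coins give 0, while two perfectly correlated coins give log 2. Evaluating this quantity requires estimating a joint histogram whose size grows exponentially with the number of variables, which is consistent with the computational cost noted above.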

In this work, we introduce a novel probabilistic approach for registration and analysis of transcriptome images and immunostaining images. The approach is based on a probabilistic Bayesian framework and uses particle filters to determine the transformation between the multi-modal images. Intensity information and image features are jointly taken into account.

We applied our approach to synthetic data as well as to real transcriptome images and immunostaining microscopy images of the mouse brain. The results show that our approach successfully registers the multi-modal images and is more accurate than a state-of-the-art method.

2. METHODOLOGY

Our probabilistic registration approach consists of three main steps: (i) generation of the gene expression matrix image, (ii) spot detection and joint probabilistic registration, and (iii) Voronoi-based determination of cell regions.

2.1. Generation of expression matrix image

The gene expression matrix images (transcriptome images) are generated by drawing white dots on a black canvas (the rectangular region on which the dots are drawn). The x and y coordinates represent the horizontal and vertical positions of a pixel, respectively. In addition, the unique molecular identifier (UMI) count at each position is used as the intensity value of the corresponding dot.
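A minimal sketch of this rendering step, assuming the expression records are already given as non-negative integer pixel coordinates with one UMI count each. Each dot is drawn as a single pixel here; the dot-size scaling described next is not reproduced, and accumulating repeated positions with `np.add.at` is our own choice. The function name `expression_matrix_image` is hypothetical.

```python
import numpy as np

def expression_matrix_image(x, y, umi, width=None, height=None):
    """Render gene expression spots as single-pixel dots on a black canvas.

    x, y          : non-negative integer pixel coordinates (horizontal, vertical)
    umi           : UMI count at each position, used as the dot intensity
    width, height : canvas size in pixels; defaults to the coordinate range
    """
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    umi = np.asarray(umi, dtype=float)
    w = int(width) if width is not None else int(x.max()) + 1
    h = int(height) if height is not None else int(y.max()) + 1

    canvas = np.zeros((h, w), dtype=float)  # black canvas: rows = y, columns = x
    # np.add.at accumulates counts when several records share the same position
    np.add.at(canvas, (y, x), umi)
    return canvas
```

For instance, records at (0, 1) with 5 UMIs and two records at (2, 0) with 3 and 4 UMIs yield a 2 × 3 image whose pixel at row 0, column 2 has intensity 7.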

To emulate the real distance between dots, we adjust the dot size according to the relation between the dot size and the canvas image size. Let dspot be the diameter of a circular dot, w the width of the canvas image (unit: inch), h the horizontal coordinate range of the gene expression matrix image, and fmag the dot magnification factor. The size of each dot is then calculated by: Embedded Image

2.2. Probabilistic approach for registration

Our aim is to study the relationship between the spatial cell structure in the immunostaining image and the corresponding gene expression distribution in the gene expression matrix image (transcriptome image). The goal of registration is to assign Ng gene expression spots to Nc cells. This assignment can be represented by a non-negative assignment matrix ω with elements $\omega_{n_g n_c}$ that denote the strength of gene expression for a spot ng within a cell nc (for a binary assignment, $\omega_{n_g n_c} \in \{0, 1\}$). Some spots may not be assigned to any cell. In a Bayesian framework, we can formulate this task as estimating the posterior probability density p(ω | Ii, Ig), where Ii is the immunostaining image and Ig is the gene expression matrix image. We denote the positions of all detected spots in the immunostaining image by Yi and the positions of all detected spots in the gene expression matrix image by Yg. To detect and localize multiple bright spots in an image, the spot-enhancing filter [8, 9] is used. Since all cells lie within the same plane in our application, the transformation between the multi-modal images can be represented by a homography H. Since Embedded Image and Embedded Image are conditionally independent of Ii and Ig, by using Bayes’ theorem, we can write Embedded Image
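The spot detection step can be illustrated with a minimal sketch in the spirit of the spot-enhancing filter [8], which is closely related to a Laplacian-of-Gaussian (LoG) operator. The sketch filters the image with a negated LoG from SciPy, keeps local maxima, and discards weak responses. The function name `detect_spots` and its parameters (`sigma`, `threshold`, `min_distance`) are hypothetical, and the relative thresholding rule is our own assumption rather than the rule used in [8, 9].

```python
import numpy as np
from scipy import ndimage

def detect_spots(image, sigma=2.0, threshold=0.1, min_distance=3):
    """Detect bright spots as local maxima of a negated LoG response.

    sigma        : Gaussian scale, roughly matched to the spot radius
    threshold    : minimum response as a fraction of the strongest response
    min_distance : half-size of the neighborhood used for non-maximum suppression
    Returns an (N, 2) array of (x, y) pixel positions.
    """
    img = np.asarray(image, dtype=float)
    # Negated LoG: bright blobs of scale ~sigma give a strong positive response
    response = -ndimage.gaussian_laplace(img, sigma=sigma)

    # A pixel is a candidate if it equals the maximum of its neighborhood
    size = 2 * min_distance + 1
    local_max = ndimage.maximum_filter(response, size=size) == response

    # Keep only maxima whose response exceeds a fraction of the global maximum
    strong = response > threshold * response.max()

    rows, cols = np.nonzero(local_max & strong)
    return np.column_stack([cols, rows])  # (x, y) order
```

Applied to both modalities, such a detector would yield point sets corresponding to Yi (e.g. nucleus centers in the immunostaining image) and Yg (spots in the gene expression matrix image).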

In our work, the transformation matrix is represented by Embedded Image

To determine H, we need to compute the five-dimensional parameter vector x = (sx, sy, θ, tx, ty), with scaling factors sx, sy, rotation angle θ, and translations tx, ty. As the similarity metric between corresponding images, we propose a combination of the mutual information (MI) of the image intensities and the distance between the detected point sets. This metric is maximized to align the multi-modal images.
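Since the exact form of the matrix in Eq. (3) is not reproduced in this text, the following minimal sketch assumes one common parameterization: scale first, then rotate, then translate, i.e. H = T·R·S in homogeneous coordinates. With only these five parameters, H is an affine special case of a homography (its third row is fixed to [0, 0, 1]). The function names are hypothetical.

```python
import numpy as np

def transformation_matrix(sx, sy, theta, tx, ty):
    """3x3 homogeneous matrix H = T @ R @ S (assumed order: scale, rotate, translate)."""
    S = np.diag([sx, sy, 1.0])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])
    return T @ R @ S

def apply_transform(H, points):
    """Map an (N, 2) array of (x, y) points with the homogeneous matrix H."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by w (w = 1 for this H)
```

For example, with x = (2, 2, π/2, 1, 0) the point (1, 0) is scaled to (2, 0), rotated to (0, 2), and translated to (1, 2). A different composition order would change the meaning of sx and sy relative to θ.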

Let MI(Ii, H(x)Ig) be the mutual information of Ii and H(x)Ig, and let D(Yi, H(x)Yg) be the sum of the distances to the closest points (nearest neighbors) between the points of Yi and H(x)Yg. We search for the optimal parameter vector x* by maximizing the following likelihood function Embedded Image where σMI is the standard deviation of MI and σD is the standard deviation of the point set distance. Since MI(·) ≥ 0 and D(·) ≥ 0, the first and second terms in Eq. (4) have opposite monotonicity: better alignment increases the MI of the intensities but decreases the distance between the point sets.
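The two ingredients of the metric can be sketched as follows, assuming the moving image and points have already been transformed by H(x). MI is estimated from a joint intensity histogram, and D sums, for each transformed point, the distance to its nearest neighbor in the other set (brute force; for large point sets a k-d tree such as `scipy.spatial.cKDTree` would be preferable). Because Eq. (4) is not reproduced in this text, `log_likelihood` is a hypothetical stand-in that only shares its monotonicity: it rises with MI and falls with D. All function names and the bin count are our own assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information (nats) of two equally sized images, from a joint histogram."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p = joint / joint.sum()
    # Outer product of the marginal distributions p(a) p(b)
    indep = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    mask = p > 0  # skip empty bins (0 * log 0 = 0)
    return float(np.sum(p[mask] * np.log(p[mask] / indep[mask])))

def point_set_distance(y_i, y_g):
    """Sum over the points of y_g of the distance to the nearest point in y_i."""
    y_i = np.asarray(y_i, dtype=float)
    y_g = np.asarray(y_g, dtype=float)
    d = np.linalg.norm(y_g[:, None, :] - y_i[None, :, :], axis=2)  # (len(y_g), len(y_i))
    return float(d.min(axis=1).sum())

def log_likelihood(mi, dist, sigma_mi, sigma_d):
    """Hypothetical stand-in for Eq. (4): increases with MI, decreases with D."""
    return mi / sigma_mi - dist / sigma_d
```

In a registration loop, one would warp Ig and Yg with the candidate H(x), evaluate `mutual_information(Ii, warped_Ig)` and `point_set_distance(Yi, warped_Yg)`, and combine them with `log_likelihood`.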

A naive exhaustive search of the whole parameter space for the maximum likelihood would yield the best alignment between the two images, but it is time-consuming. Instead, we use particle filters to efficiently determine the optimal x* in Eq. (4), starting from a random initialization of x.
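As an illustration of how a particle filter can serve as an optimizer here, the following minimal sketch implements a generic sampling-importance-resampling loop: particles (candidate parameter vectors) are initialized uniformly within bounds, weighted by their likelihood, resampled, and diffused with Gaussian noise that shrinks at each step. The defaults of 50 particles and 10 steps follow the settings reported in Section 3; the bounds, the noise schedule (`init_spread`, `shrink`), and the function name are hypothetical and not taken from the paper.

```python
import numpy as np

def particle_filter_search(log_lik, lower, upper, n_particles=50, n_steps=10,
                           init_spread=0.1, shrink=0.7, seed=0):
    """Maximize log_lik(x) over a box [lower, upper] with an annealed particle filter.

    Returns the best parameter vector found and its log-likelihood.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # Random initialization: particles uniformly distributed within the bounds
    particles = rng.uniform(lower, upper, size=(n_particles, lower.size))
    spread = init_spread * (upper - lower)  # per-dimension diffusion std

    best_x, best_ll = None, -np.inf
    for _ in range(n_steps):
        ll = np.array([log_lik(p) for p in particles])
        i = int(np.argmax(ll))
        if ll[i] > best_ll:
            best_x, best_ll = particles[i].copy(), float(ll[i])

        # Importance weights proportional to the likelihood (shifted for stability)
        w = np.exp(ll - ll.max())
        w /= w.sum()

        # Resample in proportion to the weights, then diffuse the particles
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx] + rng.normal(0.0, spread, size=particles.shape)
        particles = np.clip(particles, lower, upper)
        spread = spread * shrink  # anneal: explore less as the search converges
    return best_x, best_ll
```

In the registration setting, `log_lik` would evaluate the metric of Eq. (4) for the transformation given by x = (sx, sy, θ, tx, ty). Only n_particles × n_steps likelihood evaluations are needed, compared with an exhaustive grid over five dimensions.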

Furthermore, we can estimate $\omega_{n_g n_c}$ for each gene expression spot ng and each cell nc given the observations Yi and Yg. To assign gene expression spots to cells, we use Voronoi tessellation [10] to identify cell regions (approximate cell boundaries).
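A point lies in the Voronoi region of a site exactly when that site is its nearest site, so a Voronoi-based assignment can be sketched without constructing the tessellation explicitly: each (already registered) gene expression spot is assigned to the nearest detected nucleus center. The optional `max_dist` cutoff is a hypothetical way to leave distant spots unassigned (label −1), matching the remark that some spots may not belong to any cell. The function name and the cutoff are our own assumptions.

```python
import numpy as np

def assign_spots_to_cells(nuclei, spots, max_dist=None):
    """Assign each spot to the Voronoi region of its nearest nucleus center.

    nuclei   : (Nc, 2) nucleus centers (Voronoi sites)
    spots    : (Ng, 2) registered gene expression spot positions
    max_dist : optional cutoff; spots farther than this stay unassigned (-1)
    Returns (labels, omega) with omega a binary (Ng, Nc) assignment matrix.
    """
    nuclei = np.asarray(nuclei, dtype=float)
    spots = np.asarray(spots, dtype=float)
    d = np.linalg.norm(spots[:, None, :] - nuclei[None, :, :], axis=2)  # (Ng, Nc)
    labels = d.argmin(axis=1)  # nearest site = containing Voronoi region
    if max_dist is not None:
        labels = np.where(d.min(axis=1) <= max_dist, labels, -1)

    omega = np.zeros((len(spots), len(nuclei)), dtype=int)
    assigned = labels >= 0
    omega[np.nonzero(assigned)[0], labels[assigned]] = 1
    return labels, omega
```

Explicit Voronoi polygons (e.g. for drawing the cell regions shown in the figures) could be obtained with `scipy.spatial.Voronoi`; for the assignment itself the nearest-site rule is sufficient.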

3. EXPERIMENTAL RESULTS

We compared our probabilistic registration approach with the state-of-the-art registration approach [7] using synthetic as well as real immunostaining images and gene expression matrix images. For the synthetic data, we generated immunostaining images (500 × 500 pixels) using SimuCell [11] and randomly generated transformations that serve as ground truth (GT) transformations. For the real data of the mouse brain (Stereo-seq [2], 964 × 964 pixels, pixel size 0.65 × 0.65 μm²), the GT transformation was obtained by manual alignment.

Fig. 2.

Results of our method for synthetic data: (a) Immunostaining image, (b) gene expression matrix image, (c) detected spots in immunostaining image (red), and (d) detected spots in gene expression matrix image (blue).

We use the target registration error (TRE) to quantify the registration result. The TRE is the average distance between the positions obtained with the computed transformation parameters and the positions obtained with the GT parameters. We used six points to compute the TRE and determined the average position error of the corresponding aligned points in the x and y directions. Let H(x*) be the computed transformation matrix and pq, q ∈ {1, …, 6}, the manually selected points; the TRE is then computed by Embedded Image where dx and dy denote the distances between two corresponding points in the x and y directions, and Embedded Image are two corresponding points in the immunostaining image and the gene expression matrix image.
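Since Eq. (5) is not reproduced in this text, the following minimal sketch assumes that the per-point error is the Euclidean distance sqrt(dx² + dy²) between the point mapped with the computed matrix H(x*) and the same point mapped with the GT matrix, averaged over the selected points. The function name is hypothetical.

```python
import numpy as np

def _apply(H, pts):
    """Map (N, 2) points with a 3x3 homogeneous matrix H."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def target_registration_error(H_est, H_gt, points):
    """Mean Euclidean distance between points mapped by H_est and by H_gt."""
    pts = np.asarray(points, dtype=float)
    diff = _apply(H_est, pts) - _apply(H_gt, pts)  # columns: dx, dy
    return float(np.mean(np.hypot(diff[:, 0], diff[:, 1])))
```

For example, if the GT transformation is the identity and the computed one is a pure translation by (3, 4) pixels, every point is off by 5 pixels and the TRE is 5 regardless of which six points are chosen.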

For our probabilistic registration approach, we used 50 random samples and 10 time steps for the particle filter to determine the transformation matrix in Eq. (3).

Fig. 3.

Results of our method for synthetic data with noise: (a) Overlay of detected spots in both images before registration (red: detected spots in immunostaining image, blue: detected spots in gene expression matrix image), (b) overlay of detected spots in both images after registration, (c) overlay of registered immunostaining image and gene expression matrix image, (d) computed cell regions by Voronoi tessellation (each cell is represented by a different color).

Table 1 and Table 2 provide the TRE of our registration approach and of the method in [7] for four pairs of synthetic images and four pairs of real images. For all image pairs, the TRE of our approach is much lower than that of the method in [7]. Thus, our approach is considerably more accurate than the previous method for the challenging data considered. Furthermore, our approach is about three times faster than the previous method [7].

Table 1.

Target registration error for synthetic images.

Table 2.

Target registration error for real images.

Fig. 4.

Results of our method for real data (image section): (a) Immunostaining image, (b) gene expression matrix image, (c) overlay of registered images, (d) computed cell regions (red crosses: detected center points of nuclei in immunostaining image using the spot-enhancing filter [8], blue points: detected spots in gene expression matrix image, yellow lines: computed cell regions).

4. CONCLUSIONS

We have presented a probabilistic approach for the multi-modal registration of transcriptomics image data. Our approach determines the transformation between immunostaining images and gene expression matrix images and jointly exploits intensity information and image features. We successfully applied our approach to synthetic data and to real spatial transcriptomics data of the mouse brain, where it yielded lower registration errors and shorter computation times than a state-of-the-art method.

Compliance with Ethical Standards

This study was performed retrospectively using animal subject data from a previous study conducted under the approval of the Animal Care and Use Committee of the Institute of Biomedicine and Health.

5. ACKNOWLEDGEMENTS

This work was supported by BGI-Research and the Institute of Neuroscience (ION) of the Chinese Academy of Sciences. In addition, the authors gratefully acknowledge Prof. Mu-ming Poo, Qing Xie, and Ao Chen for providing mouse brain spatial transcriptomics data and for many helpful discussions during development.

6. REFERENCES

[1] A. Rao, D. Barkley, G. S. França, and I. Yanai, “Exploring tissue architecture using spatial transcriptomics,” Nature, vol. 596, no. 7871, pp. 211–220, 2021.
[2] A. Chen, S. Liao, K. Ma, L. Wu, Y. Lai, J. Yang, W. Li, J. Xu, S. Hao, X. Chen, et al., “Large field of view-spatially resolved transcriptomics at nanoscale resolution,” bioRxiv, 2021.
[3] C.-S. Cho, J. Xi, Y. Si, S.-R. Park, J.-E. Hsu, M. Kim, G. Jun, H. M. Kang, and J. H. Lee, “Microscopic examination of spatial transcriptome using seq-scope,” Cell, vol. 184, no. 13, 2021.
[4] N. Yoosuf, J. F. Navarro, F. Salmén, P. L. Ståhl, and C. O. Daub, “Identification and transfer of spatial transcriptomics signatures for cancer diagnosis,” Breast Cancer Research, vol. 22, no. 1, pp. 1–10, 2020.
[5] M. Saiselet, J. Rodrigues-Vitória, A. Tourneur, L. Craciun, A. Spinette, D. Larsimont, G. Andry, J. Lundeberg, C. Maenhaut, and V. Detours, “Transcriptional output, cell-type densities, and normalization in spatial transcriptomics,” Journal of Molecular Cell Biology, vol. 12, no. 11, pp. 906–908, 2020.
[6] W.-T. Chen, A. Lu, K. Craessaerts, B. Pavie, C. S. Frigerio, N. Corthout, X. Qian, J. Laláková, M. Kühnemund, I. Voytyuk, et al., “Spatial transcriptomics and in situ sequencing to study Alzheimer’s disease,” Cell, vol. 182, no. 4, pp. 976–991, 2020.
[7] R. Chen, A. B. Das, and L. R. Varshney, “Registration for image-based transcriptomics: Parametric signal features and multivariate information measures,” in Proc. 2019 53rd Annual Conference on Information Sciences and Systems (CISS). IEEE, 2019, pp. 1–6.
[8] D. Sage, F. R. Neumann, F. Hediger, S. M. Gasser, and M. Unser, “Automatic tracking of individual fluorescence particles: application to the study of chromosome dynamics,” IEEE Transactions on Image Processing, vol. 14, no. 9, pp. 1372–1383, 2005.
[9] Y. Qiang, J. Y. Lee, R. Bartenschlager, and K. Rohr, “Colocalization analysis and particle tracking in multi-channel fluorescence microscopy images,” in Proc. ISBI 2017. IEEE, 2017, pp. 646–649.
[10] M. Bock, A. K. Tyagi, J.-U. Kreft, and W. Alt, “Generalized Voronoi tessellation as a model of two-dimensional cell tissue dynamics,” Bulletin of Mathematical Biology, vol. 72, no. 7, pp. 1696–1731, 2010.
[11] S. Rajaram, B. Pavie, N. E. Hac, S. J. Altschuler, and L. F. Wu, “Simucell: a flexible framework for creating synthetic microscopy images,” Nature Methods, vol. 9, no. 7, pp. 634–635, 2012.
Posted October 21, 2021.
Subject Area: Bioinformatics