A Web-based Software Resource for Interactive Analysis of Multiplex Tissue Imaging Datasets

Highly multiplexed tissue imaging (MTI) technologies are powerful spatial proteomics methods that enable in situ single-cell characterization of tissues. However, analysis and visualization of MTI datasets remain challenging, and we developed the Galaxy-ME software hub to address this challenge. Galaxy-ME is a web-based, interactive software hub that enables end-to-end analysis and visualization of MTI datasets and is accessible to everyone. To demonstrate its utility, we used Galaxy-ME to analyze datasets obtained from multiple MTI assays and to evaluate assay concordance in both normal and cancerous tissues. Galaxy-ME is a publicly available web resource.

also used extensively in several large tissue atlas consortia to create detailed 2D tissue maps for interrogating cellular and spatial relationships. Atlas consortia using MTI include the Human Cell Atlas (HCA) 11 , the Human BioMolecular Atlas Program (HuBMAP) 12 , and the Human Tumor Atlas Network (HTAN) 13 .
With the substantial growth and use of MTI technologies in the biomedical research community, there is an acute need for robust analytical tools and visualizations for MTI datasets. A typical MTI dataset is tens or hundreds of gigabytes in size and contains one image of the tissue sample per assayed marker protein, yielding a stack of 30-100 images and tens or hundreds of thousands of cells analyzed per sample. A complete software analysis workflow for MTI often uses dozens of analysis tools to: (1) perform primary image processing to produce single-cell feature tables with marker intensity levels, morphological information, and spatial coordinates and (2) complete single-cell analysis to classify individual cells and quantify spatial relationships amongst cells. Several computational workflows that execute all tools in sequence have been developed 18,19 for particular MTI assays, including CycIF 14 , CODEX 15 , mIHC 16 , and IMC 17 , and general-purpose image analysis platforms such as CellProfiler 20 , QuPath 21 , and Fiji 22 provide some functionality for processing MTI datasets. Nevertheless, substantial challenges in accessibility, tool integration, and scalability remain that make MTI datasets difficult to analyze. Analysis workflows that scale to many samples are often executed through a command-line interface, which requires significant computational expertise. Conversely, desktop applications that integrate analysis and visualization cannot easily be deployed at scale on disparate infrastructures.
To address these challenges, we have developed the Galaxy-MCMICRO Ecosystem (Galaxy-ME), a user-friendly, highly scalable, web-based software hub for interactive analysis of MTI datasets. Galaxy-ME provides a web-based software workbench for MTI analyses that is accessible to all scientists, a comprehensive tool and visualization suite for analysis of MTI datasets, and infrastructure that ensures all analyses are scalable and reproducible (Figure 1). Galaxy-ME provides software analysis tools for (1) primary image processing to produce single-cell feature tables that include marker intensity levels, morphological information, and spatial coordinates; (2) single-cell analysis to classify individual cells and quantify spatial relationships amongst cells; and (3) interactive visualization of images and analysis results. Galaxy-ME is built on the Galaxy computational workbench (https://galaxyproject.org/) 23,24 , an open-source platform for user-friendly, reproducible, and collaborative biomedical data analyses. Galaxy is among the most popular software analysis platforms in the world and is used by thousands of scientists daily.
Galaxy-ME builds on and substantially advances our prior work developing primary image analysis workflows for multiplexed microscopy images via integration of the MCMICRO tool suite 19 into Galaxy. Galaxy-ME's advances include (1) expanding MCMICRO with many additional tools for analysis, visualization, and dashboarding of MTI datasets to create a suite of 17 tools that enables end-to-end analysis of MTI datasets, including both prior-knowledge-based and data-driven phenotyping and spatial analyses (Supplemental Table 1); (2) leveraging the full capabilities of Galaxy for data analysis, workflow editing and scalable execution, and interactive image viewers and dashboards for visualizing MTI datasets; and (3) providing a fully featured, web-based platform for analysis of MTI datasets. The tools and visualizations in Galaxy-ME represent current best-practice analysis approaches and integrate analysis tools and visualizations from multiple tissue atlas consortia, including HCA, HuBMAP, and HTAN. Using Galaxy-ME, scientists can analyze datasets from several MTI assays, including CycIF, mIHC, and CODEX.
Building on the Galaxy platform enables Galaxy-ME to take advantage of Galaxy's accessibility, scalability, and reproducibility features. Galaxy-ME uses Galaxy's web-based graphical user interface (GUI), making analysis of MTI datasets widely accessible regardless of computational expertise. The Galaxy-ME GUI makes it simple to move between selecting input datasets, running analysis tools and workflows, and visualizing imaging data or single-cell analysis results. Galaxy-ME's analysis tools and visualizations are orchestrated and executed on remote computing resources by the Galaxy server. Using the Galaxy framework and sufficient remote computing resources, Galaxy-ME can scale its analyses to process collections of imaging datasets that are hundreds of terabytes in size. Galaxy and Galaxy-ME are open-source and freely available. There are two public web services for using Galaxy-ME, https://cancer.usegalaxy.org/ and https://spatialomics.usegalaxy.eu/, and Galaxy-ME can also be downloaded and run locally.
To demonstrate its utility, Galaxy-ME was used to perform fully automated analyses of both healthy and diseased tissue datasets from three distinct MTI assays (CycIF 1 , mIHC 2 , and CODEX 3 ). The analysis included primary image processing and single-cell analysis to quantify compositional and spatial features of the tissues and to assess assay concordance. To the best of our knowledge, this is the first comparison of concordance across these MTI assays. A supplementary webpage linking to all Galaxy histories for this work is available at https://bit.ly/GalaxyME-Histories.
Galaxy-ME was used to perform an in-depth exploration of the compositional and spatial landscape of three MTI datasets generated by the HTAN 13 consortium, in which consecutive tissue sections from a healthy human tonsil resection were profiled with CycIF, mIHC, and CODEX (Figure 2a). A Galaxy-ME workflow (Figure 2b) was implemented and applied to segment cells, quantify marker intensity, phenotype cells and quantify tissue cellular composition, compute spatial metrics between cells, and create Vitessce 25 dashboards for interactive analysis (Figure 2c). The mean cell count obtained from nuclear segmentation across the three assays was 69,329 cells, with a standard deviation of 3,152 cells. The physical separation of the non-adjacent slides and assay-specific nuclear segmentation performance likely account for the differences in total cell counts. The proportions of positive cells for the markers shared across the three MTI assays (Pan Cytokeratin, CD45, CD20, and CD8) were concordant, with the exception of CD45, likely due to antibody performance issues and illumination artifacts in the CODEX data (Figure S1). Cell populations and tissue composition were highly concordant across the assays (Figure 2d). Spatial patterns based on the normalized spatial interaction score 26 between cell populations were highly similar across assays, indicating that spatial organization is preserved (Figure 2e).
Figure 2. a, [...] and CD20 (yellow). b, The Galaxy-ME workflow and history used for the CODEX tonsil image analysis. c, A Vitessce dashboard, launched from the CODEX analysis history in Galaxy, includes views of the phenotype-labeled segmentation mask overlaid onto the registered image, compositional barplots, UMAP representations, violin plots of marker expression, and heatmaps (not shown). Individual cells can be highlighted using the cursor, and the data for highlighted cells are spotlighted in each of the plots. d, Stacked barplots of cell phenotype proportions in the tonsil ROI for each assay. e, Barplot grid of pairwise spatial interaction scores across cell phenotypes.
Galaxy-ME was next applied to datasets generated using CycIF and mIHC on adjacent serial sections from an HTAN colorectal cancer (CRC) resection. Analysis with Galaxy-ME was performed on 7 regions of interest (ROIs) that showed a variety of tissue compositions (Figure 3a, S2, S3). Consistent with the tonsil analysis, total cell counts based on nuclear segmentation were concordant across the ROIs for CycIF and mIHC: CycIF yielded a mean of 46,510 cells per ROI (325,573 cells in total), and mIHC yielded a mean of 44,287 cells per ROI (310,006 cells in total) (Figure 3b). Overall cell phenotype counts and tissue composition were highly correlated between CycIF and mIHC (r = 0.89; Figure 3c, S4). To identify cell type spatial patterns common to both assays, recurrent cellular neighborhoods (RCNs) 27 were computed across all 7 ROIs using Latent Dirichlet Allocation (LDA) and K-means clustering (Figure S5). Across the two assays, 20 RCNs were identified, and most ROIs displayed highly similar RCN composition (cosine similarity > 0.88), though ROIs 14 and 15 had more variable composition (cosine similarity 0.67 and 0.76, respectively) due to lower cell density and higher stromal content (Figure 3d, S6).
Analysis and visualization of multiplexed tissue imaging datasets remain challenging because these datasets are very large and require many analysis tools and visualizations to be used together in an iterative and interactive manner. Galaxy-ME is an interactive, web-based software hub, tool suite, and workflow engine that uniquely centralizes a broad collection of image processing and single-cell analytics methods for comprehensive analysis of MTI datasets. Galaxy-ME tools can be connected with more than 8,500 other tools available via the Galaxy Tool Shed 28 to create analyses that extend far beyond imaging, such as applying machine learning tools 29 or integrating omics data for multimodal analysis.
By streamlining analysis and visualization of MTI datasets into a web-based, graphical user interface, Galaxy-ME overcomes several barriers that other analysis approaches often encounter and democratizes access, analysis, and visualization of MTI datasets so that any scientist, regardless of their informatics expertise, can work with MTI datasets.
The analysis of MTI datasets from healthy human tonsil and colorectal cancer tissue specimens with Galaxy-ME demonstrates the feasibility of automated analysis across multiple MTI assays and its ability to produce high-quality single-cell compositional and spatial results. The high concordance found between MTI assays in all facets of the analysis (cell counts, tissue composition, and tissue spatial organization) provides confidence in both the imaging datasets produced by the multiplex tissue imaging assays and Galaxy-ME's analysis tools and workflows. Looking forward, there are opportunities to add tools and visualizations that extend Galaxy-ME beyond single-cell analysis to quantify larger units of tissue organization, such as functional tissue units 30 , or subcellular characteristics of cells.

Primary Image Processing
Primary image processing in Galaxy-ME includes the full MCMICRO 19 tool suite, in addition to supplemental tools for key steps (Supplemental Table 1). Image processing steps include: (1) illumination correction across microscopy tiles (BaSiC 31 ); (2) stitching of tiles into channel mosaics (ASHLAR 32 ); (3) registration of channel mosaics into a multi-channel OME-TIFF pyramid (ASHLAR, PALOM); (4) single-cell segmentation (UnMICST 33 with S3segmenter 34 , Ilastik 35 , Cellpose 36 , or Mesmer 37 ); and (5) quantification of protein marker intensities for every cell (MCQuant). Processing tissue microarrays requires an additional step, performed after illumination correction and stitching, to dearray sample cores into separate images (Unet Coreograph). Single-cell segmentation, the process of creating an image mask that assigns pixels to individual indexed cells, is a challenging but critical image processing step whose performance can vary dramatically between assays and tissues. For this reason, several segmentation tools have been integrated into Galaxy-ME so that the best-performing segmentation method can be applied to each dataset. The final outputs of primary image processing are (1) a multi-channel pyramidal OME-TIFF file that includes all image channels and (2) a cell feature table with mean protein marker intensities and morphological features (area, eccentricity, orientation, and solidity) for each cell identified in the segmentation mask.
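To make the quantification step concrete, the sketch below computes per-cell area and mean marker intensity from a labeled segmentation mask. It is a minimal pure-Python illustration under stated assumptions: a 2D integer mask in which 0 is background and each positive integer is one cell, plus same-shaped channel images held as nested lists. `quantify_cells` is a hypothetical name; this is not MCQuant's implementation, which reads OME-TIFF images and also reports shape features such as eccentricity and solidity.

```python
from collections import defaultdict


def quantify_cells(mask, channels):
    """Per-cell area and mean intensity for each channel (hypothetical helper).

    mask: 2D list of ints; 0 = background, k > 0 = pixels of cell k.
    channels: dict mapping marker name -> 2D list of intensities,
              assumed to have the same shape as mask.
    Returns: dict label -> {"area": n_pixels, marker: mean_intensity, ...}
    """
    area = defaultdict(int)
    sums = defaultdict(lambda: defaultdict(float))
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == 0:
                continue  # background pixels belong to no cell
            area[label] += 1
            for marker, img in channels.items():
                sums[label][marker] += img[r][c]
    table = {}
    for label in sorted(area):
        features = {"area": area[label]}
        for marker in channels:
            # Mean intensity = summed intensity over the cell's pixels / pixel count
            features[marker] = sums[label][marker] / area[label]
        table[label] = features
    return table


# Toy 3x3 example with two cells (labels 1 and 2)
mask = [
    [0, 1, 1],
    [0, 1, 2],
    [2, 2, 2],
]
channels = {
    "DAPI": [[0, 10, 20], [0, 30, 5], [5, 5, 5]],
    "CD8": [[0, 1, 1], [0, 1, 9], [9, 9, 9]],
}
table = quantify_cells(mask, channels)
```

Each row of `table` corresponds to one row of a cell feature table; coordinates (e.g., centroids) and shape descriptors would be added as further columns in a fuller version.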

Single-Cell Analysis and Visualization
Single-cell analysis tools in Galaxy-ME use the cell feature table produced by primary image processing for cell phenotyping, compositional analysis, and spatial analysis (Figure 1). Galaxy-ME supports single-cell phenotyping using two complementary approaches. First, cells can be gated with a biologically driven, semi-automated hierarchical gating approach using the SciMap 27 package from MCMICRO. This approach assigns a distinct phenotype to each cell, such as CD8+ T cell or luminal neoplastic tumor cell. The second approach is data-driven: cells are clustered using community detection algorithms 38,39 (e.g., Louvain, Leiden), and the resulting clusters can then be annotated based on the markers enriched in each cluster. Galaxy-ME uses spatial analysis methods in SciMap and SquidPy 26 to quantify spatial interactions, neighbor enrichment between cell types, spatial neighborhoods, and other metrics of tissue spatial organization. Galaxy-ME uses the common AnnData format (https://anndata.readthedocs.io) for storing single-cell data, allowing for easy integration with existing Galaxy tool suites such as ScanPy 40 and Seurat 41 .
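Hierarchical gating can be pictured as a walk down a tree of marker rules. The following is a hypothetical sketch, not Scimap's `phenotype_cells` API or its workflow-file format: `GATING_TREE` and `assign_phenotype` are invented names, the tree uses the tonsil markers mentioned in this paper, and the first matching sibling wins, so rule order matters.

```python
# Hypothetical gating tree: each node is (phenotype, markers that must all be
# positive, child nodes refining that phenotype).
GATING_TREE = [
    ("Epithelial", ["PanCK"], []),
    ("Immune", ["CD45"], [
        ("B cell", ["CD20"], []),
        ("CD8+ T cell", ["CD8"], []),
    ]),
]


def assign_phenotype(positive, tree=GATING_TREE, parent="Unknown"):
    """Return the deepest phenotype whose markers are all in `positive`.

    positive: set of marker names called positive for one cell.
    Siblings are tested in order; the first match is descended into.
    If no child matches, the cell keeps its parent's phenotype.
    """
    for name, markers, children in tree:
        if all(m in positive for m in markers):
            return assign_phenotype(positive, children, parent=name)
    return parent


calls = {
    "cell_1": assign_phenotype({"CD45", "CD8"}),
    "cell_2": assign_phenotype({"CD45"}),
    "cell_3": assign_phenotype({"PanCK"}),
    "cell_4": assign_phenotype(set()),
}
```

Cells that pass a parent gate but no child gate (here `cell_2`) stay at the coarser label, which mirrors the idea of a hierarchy where deeper calls require more evidence.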
There are three primary visualizations available in Galaxy-ME. Avivator is a lightweight, web-based image viewer built with the Viv 42 library that enables viewing multi-channel OME-TIFF images hosted on a web server. With Avivator, users can select and visualize different image channels and pan and zoom around images. For additional data exploration, Galaxy-ME provides a tool to create Vitessce 25 interactive dashboards that include multiple linked visualizations. These visualizations include multiplex images augmented with labeled segmentation masks and phenotype information, UMAP plots, phenotype marker enrichment, and single-cell heatmaps.

Automated Image Processing and Cell Classification of SARDANA Samples
Registered images from the healthy tonsil and CRC samples were cropped to match annotated regions of interest (ROIs) across all MTI assays. Nuclei were segmented using Mesmer on the hematoxylin or DAPI channel at the appropriate image resolution. MCQuant was used to extract single-cell mean pixel intensities, morphological characteristics, and spatial coordinates. The resulting cell feature tables were converted to AnnData format for downstream single-cell and spatial analysis.
Individual cells were labeled as positive or negative for each protein using a marker positivity threshold that was determined automatically with Scimap (scimap.pp.rescale) and validated by manual gating. The marker positivity calls were used to assign cell phenotypes with Scimap's hierarchical phenotyping tool (scimap.tl.phenotype_cells). For the tonsil sample, the following cell types were called across CycIF, mIHC, and CODEX based on marker positivity: pan-cytokeratin for epithelial cells, CD8 for cytotoxic T cells, and CD20 for B cells. For the CRC sample, CycIF and mIHC had more markers in common, so a deeper immune cell classification scheme was used.
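One way to turn a gate into positivity calls is to rescale each marker so that the gate maps to 0.5 and then call cells above 0.5 positive. The sketch below assumes a simple piecewise-linear mapping (minimum to 0, gate to 0.5, maximum to 1) on raw intensities. It only approximates the idea behind `scimap.pp.rescale`; Scimap's exact transformation (for example, any log transform applied first, or how gates are estimated when none is supplied) is not reproduced here, and `rescale_around_gate` is a hypothetical name.

```python
def rescale_around_gate(values, gate):
    """Piecewise-linear rescaling so that min -> 0, gate -> 0.5, max -> 1.

    Assumption for illustration: values at or below the gate are mapped to
    [0, 0.5] and values above it to (0.5, 1]. A cell is called positive when
    its rescaled value is strictly greater than 0.5.
    """
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if v <= gate:
            span = gate - lo
            # Degenerate case: v == lo == gate, place it exactly on the gate.
            out.append(0.5 * (v - lo) / span if span > 0 else 0.5)
        else:
            # Here hi >= v > gate, so span is strictly positive.
            span = hi - gate
            out.append(0.5 + 0.5 * (v - gate) / span)
    return out


# Toy CD8 intensities for four cells with a gate of 100
scaled = rescale_around_gate([0, 50, 100, 200], gate=100)
positive = [s > 0.5 for s in scaled]
```

Placing the gate at a fixed value of 0.5 makes positivity comparable across markers and assays even when raw intensity ranges differ, which is convenient when the same phenotyping rules are applied to CycIF, mIHC, and CODEX data.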
For each sample set, basic compositional metrics, including total cell counts and relative frequency of each phenotype were calculated and compared across assays.
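The compositional comparison can be sketched as relative phenotype frequencies per assay compared with a correlation coefficient. The labels below are made up for illustration, and using Pearson correlation on phenotype proportions is an assumption about how a cross-assay r value might be computed, not a statement of the exact procedure behind the reported CRC correlation. `composition` and `pearson` are hypothetical helper names.

```python
from collections import Counter
from math import sqrt


def composition(phenotypes):
    """Relative frequency of each phenotype label in one assay."""
    counts = Counter(phenotypes)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical phenotype calls for the same ROI measured by two assays
assay_a = ["B cell"] * 5 + ["Epithelial"] * 3 + ["CD8+ T cell"] * 2
assay_b = ["B cell"] * 6 + ["Epithelial"] * 2 + ["CD8+ T cell"] * 2

comp_a, comp_b = composition(assay_a), composition(assay_b)
# Align phenotypes so missing types count as 0 in either assay
types = sorted(set(comp_a) | set(comp_b))
r = pearson([comp_a.get(t, 0.0) for t in types],
            [comp_b.get(t, 0.0) for t in types])
```

Aligning on the union of phenotype labels keeps a phenotype that appears in only one assay in the comparison instead of silently dropping it.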

Cross-Assay Spatial Analysis of SARDANA Tonsil Sample
The tonsil CycIF, mIHC, and CODEX single-cell data, labeled with phenotypes, were used for subsequent spatial analysis. Spatial neighborhood graphs were constructed for each ROI with Squidpy (squidpy.gr.spatial_neighbors) using default parameters (n_neighs = 6). Spatial patterns were quantified for each assay by calculating, from the neighborhood graphs, a normalized (scaled between 0 and 1) spatial interaction score between each pair of phenotypes (squidpy.gr.interaction_matrix), which reflects how often cells of two phenotypes are connected in the graph.
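The graph-based interaction score can be illustrated without Squidpy. The sketch below builds directed k-nearest-neighbor edges from cell centroids, counts the edges between each pair of phenotypes, and then row-normalizes the counts so each row sums to 1 (one way to obtain values between 0 and 1). The function names are hypothetical, and Squidpy's own graph construction may differ in details such as symmetrizing the kNN graph.

```python
from math import dist  # Python 3.8+


def knn_edges(coords, k=6):
    """Directed k-nearest-neighbor edges (i -> j) from 2D cell centroids."""
    edges = []
    for i, p in enumerate(coords):
        # Sort all other cells by distance; ties are broken by index.
        others = sorted((dist(p, q), j) for j, q in enumerate(coords) if j != i)
        edges.extend((i, j) for _, j in others[:k])
    return edges


def interaction_matrix(labels, edges, normalized=True):
    """Count edges between phenotype pairs; optionally row-normalize to sum to 1."""
    types = sorted(set(labels))
    idx = {t: n for n, t in enumerate(types)}
    m = [[0.0] * len(types) for _ in types]
    for i, j in edges:
        m[idx[labels[i]]][idx[labels[j]]] += 1
    if normalized:
        for row in m:
            s = sum(row)
            if s:
                row[:] = [v / s for v in row]
    return types, m


# Two well-separated groups: three B cells and three T cells
coords = [(0, 0), (1, 0), (0, 1), (100, 100), (101, 100), (100, 101)]
labels = ["B cell"] * 3 + ["T cell"] * 3
edges = knn_edges(coords, k=2)
types, m = interaction_matrix(labels, edges)
```

In this toy layout every cell's nearest neighbors share its phenotype, so the matrix is the identity: each phenotype interacts only with itself, the signature of strongly segregated populations.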

Cross-Assay Spatial Analysis of SARDANA CRC Sample
For the CycIF and mIHC CRC images, spatial neighborhood matrices were generated for each ROI separately which quantified the frequency of neighbor cell phenotypes within a 30 micron radius for every epithelial cell in the tissue. The epithelial cell neighborhood matrices from all ROIs across both assays were combined and Scimap's implementation of Latent Dirichlet Allocation (n_motifs = 10) was run on the combined matrix to generate latent space weights. The resulting weights were clustered using K-means clustering (k = 20) to generate meta-clusters, or Recurrent Cellular Neighborhoods (RCNs) 27 . Hierarchical clustering was used to find biopsies that had similar RCN compositions. RCNs were annotated based on their composition.
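The neighborhood matrix that feeds LDA can be sketched as follows: for each epithelial cell, count the phenotypes of all other cells within a 30 µm radius and convert the counts to fractions. This assumes coordinates already in microns and excludes the center cell from its own neighborhood; `neighborhood_matrix` is a hypothetical helper, not Scimap's implementation, and the LDA and K-means steps are not shown.

```python
from math import dist  # Python 3.8+


def neighborhood_matrix(coords, labels, center_type="Epithelial", radius=30.0):
    """Neighbor phenotype fractions around every cell of `center_type`.

    coords: list of (x, y) positions, assumed to be in microns.
    labels: phenotype label per cell, aligned with coords.
    Returns (types, rows): one row per center cell, one column per phenotype.
    """
    types = sorted(set(labels))
    rows = []
    for i, p in enumerate(coords):
        if labels[i] != center_type:
            continue
        counts = dict.fromkeys(types, 0)
        for j, q in enumerate(coords):
            # Assumption: the center cell itself is not counted as a neighbor.
            if j != i and dist(p, q) <= radius:
                counts[labels[j]] += 1
        total = sum(counts.values())
        rows.append([counts[t] / total if total else 0.0 for t in types])
    return types, rows


coords = [(0, 0), (10, 0), (0, 20), (0, -25), (100, 100)]
labels = ["Epithelial", "B cell", "Epithelial", "B cell", "B cell"]
types, rows = neighborhood_matrix(coords, labels)
```

Rows from all ROIs and both assays can then be stacked into a single matrix, which is what makes neighborhoods discovered downstream comparable across assays.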

Data Availability
The tonsil image datasets analyzed are available at https://www.synapse.org/MCMICRO_images, and the CRC images are available at https://www.synapse.org/#!Synapse:syn47164089. In addition, all datasets will be made available through the HTAN Data Portal.