Abstract
Motivation Networks provide a powerful framework to analyze spatial omics experiments. However, we lack tools that integrate several methods to easily reconstruct networks for further analyses with dedicated libraries. In addition, choosing the appropriate method and parameters can be challenging.
Summary We propose tysserand, a Python library to reconstruct spatial networks from spatially resolved omics experiments. It is intended as a common tool where the bioinformatics community can add new methods to reconstruct networks, choose appropriate parameters, clean resulting networks and pipe data to other libraries.
Availability tysserand software and tutorials with a Jupyter notebook to reproduce the results are available at https://github.com/VeraPancaldiLab/tysserand
Contact vera.pancaldi{at}inserm.fr
Supplementary information Supplementary data are available at Bioarxiv online.
1 Introduction
Recent technologies have made it possible to produce phenotypic data at the resolution of single cells (or higher) in intact sample slices, both at the levels of proteins [1, 2, 3, 4] or mRNA [5, 6, 7, 8]. Taking advantage of spatial information is determinant for revealing the biology of healthy organs and dissecting the complex processes involved in cancer, such a tumor progression and response to treatments[9, 10].
Existing spatial omics analysis libraries such as trendsceek[11], SpatialDE[12] and PySpacell[13] use marked point processes theory. Another fruitful approach is to represent tissues as networks, where nodes are cells and edges are interactions between cells which are established through physical contact. Network theory is already used for spatial analysis in the Python Spatial Analysis Library (PySAL)[14] for geospatial data science, and PySpacell, based on PySAL, provides 3 methods to reconstruct networks: k-nearest neighbors, radial distance neighbors and cell contact neighbors. However, due to its dependence on PySAL, it is not ideally suited to test other network reconstruction methods and PySAL methods do not scale well with big datasets, such as the ones typically produced after nuclei segmentation in Whole Slide Images investigated by anatomopathologists in a medical setting. Moreover, the choice of a reconstruction method and parameters can be hard, and the potential to use the reconstructed network with other dedicated network analysis libraries remains a priority. Here, following the Unix philosophy[15] according to which programs do one thing and do it well and programs work together, we present tysserand, a Python library to reconstruct spatial networks starting from object positions (cells, nuclei, …) or image segmentation results. We aim at encouraging the centralization of efforts in network construction from the bioinformatics community in one place, to promote integration of new reconstruction methods, providing algorithms for parameters selection, network processing to remove reconstruction artifacts, computational performance improvement and the addition of interfaces with external network analysis libraries such as NetworkX[16], iGraph[17] or Scanpy[18] for single-cell data analysis.
2 Materials and methods
Tysserand can consider two types of alternative inputs (Figure 1). First, an M×2 array of the M cells’s x/y coordinates or, second, a “segmentation image” with integers ranging from 0 to K representing K segmented areas in a microscopy image, with 0 values indicating the background. For the coordinates array input, 3 methods are already available to re-construct networks, based on the Scipy[19] library implementation: k-nearest neighbors (knn), radial distance neighbors (rdn) and Delaunay triangulation. We think Delaunay triangulation is best suited to represent tissues and interactions between contacting cells, whereas the rdn method is more appropriate to model interactions by diffusing chemicals, as already noticed by PySpacell authors[13]. For inputs consisting in cell segmentation images, we implemented the area contact neighbors method. It leverages the scikit-image[20] library to detect for each area which neighbors are in direct contact or closer than a given distance. The tysserand library provides simple visualization utilities to choose appropriate parameters for each network construction method and tools to clean the resulting networks from typical artifacts, such as very long edges between nodes on the border of samples after Delaunay triangulation (Figure S2).
Internally, tysserand adopts simple and efficient representations of networks to allow rapid prototyping of new network construction or cleaning methods. A network is represented by 2 arrays: a first M×2 array for nodes coordinates (that are the center of segmented objects if a segmentation image is provided as input), and an L×2 array to represent the L edges between nodes indicated by their index in the first coordinates array. Finally, tysserand can convert networks into formats used by specialized libraries such as NetworkX, iGraph and Scanpy.
3 Results
To qualitatively assess the differences between the Delaunay triangulation, knn and rdn, we used a set of nodes consisting of manually marked nuclei in a single tile from a multiplex immuno-fluorescence (mIF) Whole Slide Image of a lung-cancer biopsy sample (data available on the GitHub repository, description in supplementary materials, Figure S1). We applied each of these methods and upon visual inspection of the resulting networks we observed the following (Figure S2): 1) The Delaunay triangulation with edge trimming produces a network that looks similar to what we can expect from a tissue, i.e., most edges link contacting cells, as visible on the mIF image (Figure S3 b); 2) rdn produces excessively connected areas where the density of nodes is high; 3) knn produces a network with missing edges where we could expect them based on cell contacts, as well as edges passing through neighboring cells, which is not suitable to model interactions dependent on direct physical contact. We thus conclude that Delaunay triangulation is the most suitable method for these biological tissue images.
Finally, we compared the performance of the tysserand Delaunay triangulation and the mathematically equivalent Voronoi tessellation implemented in PySAL on randomly generated sets of node positions. The PySAL implementation, being based on shape objects, does not produce long edges artifacts. Since the tysserand Delaunay method does produce this type of artifacts, we also benchmarked tysserand’s method including automated edge trimming, which can slow down the performance but is necessary to obtain results that are comparable to PySAL (description in supplementary material). Across all sizes of node sets, the tysserand implementation of the Delaunay triangulation, including the automated network artifact removal option, is always at least 42 times faster than PySAL’s (Table 1, Figure S7). It is likely that the lower performance of the PySAL library is due to the use of ‘shape’ objects, which are common and important in geographical sciences but compromise the algorithm scalability. Since biological image applications often do not require the concept of ‘shape’, considerable improvements in scalability can be obtained using tysserand. The speed up provided by tysserand is valuable in the range of several tens of thousands of nodes, which is often the size of reconstructed networks from tissue sample Whole Slide Images.
4 Conclusion
Tysserand can reconstruct spatial networks from different inputs, such as sets of node positions or segmented areas, and the resulting networks can be further processed with dedicated network analysis libraries. It already implements 4 common network reconstruction methods as well as tools to facilitate the choice of parameters for network construction and artifact removal. tysserand Delaunay triangulation is faster than the PySAL equivalent method and scales better with larger datasets, which is important for the analysis of multiple tissue samples. We hope the bioinformatic community will be willing to participate in the implementation of new methods for more accurate and domain specific spatial network reconstruction to advance the field of bioimage processing.
Funding
This work was funded by INSERM; Fondation Toulouse Cancer Santé and Pierre Fabre Research Institute as part of the Chair of Bioinformatics in Oncology of the CRCT.
Acknowledgements
We thank Professor Pierre Brousset and the Imag’IN facility at IUCT Oncopole, Toulouse for providing the multiplex immuno-fluorescence images.