Macrophage sensitivity to microenvironmental cues influences spatial heterogeneity of tumours

Cross-talk between tumour and immune cells, including macrophages, is complex, dynamic and contributes to tumour heterogeneity. In this paper, we introduce a hybrid agent-based model (ABM) to investigate how tumour-macrophage dynamics evolve over time and how they influence spatial patterns of tumour growth. Macrophage phenotype is determined by microenvironmental cues and governs the extent to which macrophages are pro- or anti-tumour, i.e., whether they infiltrate and attack tumour cells, or promote metastasis by guiding tumour cell migration towards blood vessels. We perform extensive ABM simulations to investigate how changes in macrophage sensitivity to microenvironmental cues – specifically, cytokines produced by tumour cells and perivascular fibroblasts – alters their spatial and phenotypic distributions and the tumour’s growth dynamics. We identify outcomes that include: compact tumour growth, tumour elimination, and diffuse patterns of invasion, characterised by clustering of tumour cells and macrophages around blood vessels. We compare the ability of different statistics to characterise the diverse spatial patterns that the ABM generates. These include the weighted pair correlation function (wPCF), a new statistic that quantifies the spatial distributions of cells labelled with a continuous value (e.g., macrophage phenotype). We assess the ability of each statistic to discriminate between the various spatial patterns that the ABM exhibits. By combining statistical analysis with ABM simulations, we show how mechanistic models can be used to generate synthetic data for validation of novel statistics (here, the wPCF) and to assess the extent to which specific statistical descriptions can distinguish different spatial patterns and model behaviours. Such statistics can then be applied to biological imaging data, such as multiplexed medical images, with increased confidence in the interpretation of the results. We show that the wPCF accurately describes differences in macrophage localisation between images, and posit that it would be a valuable tool for analysing multiplexed imaging data. Author summary Macrophage phenotype is regulated by complex microenvironmental cues. It affects their spatial position and behaviour. In solid tumours, the spatial distribution of macrophages can vary significantly, and correlate with patient prognosis. In this paper, we use an agent-based model (ABM) to investigate how changes in sensitivity to tumour-induced microenvironmental cues affect macrophage phenotype and, in turn, how such phenotypic heterogeneity affects tumour composition and morphology. We illustrate the wide range of tumour outcomes that can arise from changing macrophage sensitivity to microenvironmental cues. We apply a variety of statistics to simulation outputs to characterise the range of observed spatial patterns. Different statistics identify different aspects of these patterns, with the most descriptive characterisations obtained from spatial statistics, such as the weighted pair correlation function (wPCF), that account for both phenotype and cell localisation. More generally, this paper illustrates how ABMs can be used to generate synthetic data which mimics data that can be extracted from different imaging modalities. This synthetic data can be used to test new statistics, like the wPCF, and validate their use for future application to multiplex histological imaging.

The immune landscape within solid tumours is complex and varied [1,2], with innate 2 and adaptive immune cells implicated in pro-and anti-tumour responses [3]. For 3 example, high densities of tumour associated macrophages have been associated with 4 poor prognosis in breast, prostate and head and neck cancer and with good prognosis in 5 colorectal and gastric cancer [4,5]. These differences may be due to differences in the 6 numbers of pro-and anti-tumour macrophages in these cancers. Indeed, macrophages 7 are often described as having an anti-tumour, "M 1 " phenotype or a pro-tumour, "M 2 " 8 phenotype. Further the number of macrophages of each phenotype, their morphology 9 and spatial distribution are associated with patient survival [6][7][8][9]. For example, in 10 non-small cell lung cancer high infiltration of M 1 macrophages into tumour islets, but 11 not tumour stroma, has been associated with increased patient survival [10]. 12 Macrophage phenotype may be viewed as a continuous variable, with the balance of 13 pro-and anti-tumour behaviours determined by integrating multiple, which macrophages at distance r from tumour cells have a target phenotype p. We also 126 use the wPCF to investigate how, for a given simulation, the distribution of macrophage 127 phenotypes changes as the distance r varies. 128 Paper outline 129 In this paper, we develop a hybrid ABM which simulates interactions between 130 macrophages and tumour cells. By varying parameters associated with macrophage 131 behaviour, we identify a range of different tumour morphologies including compact 132 growth, fragmented growth and tumour elimination. We use the ABM simulations as a 133 testbed to examine the utility of different statistical descriptions of the data, identifying 134 information that could be extracted from different types of experimental or clinical data 135 (e.g., IHC or IMC). We assess the ability of these statistics to discriminate the different 136 spatial patterns that the ABM can generate, and conclude that care is needed to ensure 137 that the selected statistics can adequately resolve changes of interest in the data.

Materials and methods
139 Agent-based model 140 Overview 141 In this Section, we present a 2D, multiscale, off-lattice ABM which extends an existing 142 model of macrophage [34,35] infiltration into tumour spheroids. We briefly describe the 143 ABM here; for full details of the implementation and default parameter values, we refer 144 the interested reader to S1 Appendix: Model Description. We then explain how 145 macrophage phenotype and behaviour are incorporated into the ABM. The ABM is 146 implemented within the Chaste (Cancer, Heart and Soft Tissue Environment) modelling 147 environment [30][31][32]. 148 The ABM distinguishes four cell types: stromal cells, tumour cells, necrotic cells, 149 and macrophages. Their dynamics are influenced by five diffusible species: oxygen (ω), 150 CSF-1 (c), CXCL12 (ξ), TGF-β (g) and EGF ( ). In the 2D Cartesian geometry, blood 151 vessels are represented by fixed points which do not compete for space with the cell 152 populations and which act as distributed sources of oxygen [33]. A schematic of the 153 ABM is presented in Fig 1. Following [80], critical oxygen thresholds for hypoxia (ω str immediately halt and remain paused until the local oxygen concentration rises above 158 this threshold. However, if the oxygen concentration falls below the necrosis threshold, 159 then the cell becomes necrotic (this switch is irreversible). Necrotic cells occupy space 160 for a finite time period during which they decrease linearly in size, until they reach size 161 0 and are removed from the simulation. Blood vessels also act as entry points for 162 macrophages, which infiltrate the tissue and alter their phenotype (and, hence, 163 behaviour) at rates which depend on local levels of TGF-β (see Fig 1B). 164 We represent each cell by the spatial coordinates of its centre of mass and determine 165 its movement by balancing the forces that act on each it. Using an overdamped form of 166 Newton's second law and neglecting inertial terms, we have that for cell i where ν is the assumed constant drag coefficient and F i denotes the net force acting on 168 cell i at position x i and time t. The forces that act on a cell depend on its type (see 169 Fig 1C and S1 Appendix: Model Description). Cells interact via spring forces if their 170 centres are within a distance R int of each other [81]; intercellular adhesion and volume 171 exclusion are represented by attractive and repulsive forces respectively. We also 172 associate with each cell an area, defined as the area of a circle with a radius equal to the 173 cells average interaction radius with neighbouring cells, or its equilibrium radius where 174 neighbouring cells are beyond this distance. Stromal cells which are so mechanically Cell-cycle progression is determined by a cell's local oxygen concentration: a cell may be 'proliferative' (and progress through its cell cycle), 'hypoxic' (the cell cycle is temporarily paused until oxygen concentrations return to a sufficiently high level) or 'necrotic' (the cell becomes necrotic cell and degrades). Cell cycles also pause if there is insufficient space available for proliferation. B: Macrophage behaviour depends on phenotype p, modulating their rates of tumour cell killing, EGF production, and chemotactic sensitivity to gradients of CSF-1 and CXCL12. C: Forces acting on different cell types. Macrophages are subject to mechanical forces due to interactions with nearby cells, and random forces which simulate their exploration of their environment as highly motile cells. Macrophages also experience chemotactic forces that are directed up spatial gradients of CSF-1 and CXCL12, and whose magnitude depends on p. Tumour cells experience mechanical forces due to interactions with neighbouring cells, and chemotactic forces in the direction of increasing EGF. Stromal cells experience mechanical forces due to interactions with neighbouring cells. Necrotic cells experience these interaction forces, which decrease in magnitude as they decrease in size. All cells experience a drag force. D: Summary of the phases of macrophage-mediated tumour cell migration in our ABM. i) M 1 -like macrophages extravasate from blood vessels in response to CSF-1. ii) M 1 macrophages migrate into the tumour mass in response to CSF-1, where they may kill tumour cells. iii) Exposure to TGF-β causes macrophages to adopt an M 2 -like phenotype. iv) M 2 -like macrophages produce EGF, which acts as a chemoattractant for tumour cells. v) M 2 -like macrophages migrate towards blood vessels, in response to CXCL12 gradients. E: Schematic summarising the sources of CSF-1, TGF-β, EGF and CXCL12 in our model, and their interactions with cells, as described in steps i-v of panel D.
May 23, 2022 6/58 reached and the macrophage has a fully M 2 -like phenotype. Its phenotype remains fixed 184 at p i = 1 for all later times. Thus, we have: where H is the Heaviside function (H(x) = 1 when x > 0 and H(x) = 0 otherwise). 186 We now explain how changes in phenotype p affect macrophage behaviour and 187 function, and how these changes are incorporated into the ABM (see also Fig 1B). 188 Macrophage chemotactic forces Fig 1C shows and respectively, where the non-negative parameters χ m c and χ m ξ indicate macrophage 201 sensitivity to spatial gradients of CSF-1 and CXCL12, and ∇c and ∇ξ are evaluated at 202 x i . The forces F χc i and F χ ξ i contribute to the net force F i in Eq (1) (see Fig 1C and S1 203 Appendix: Model Description).

204
Macrophage cell killing We assume that when the distance between a macrophage 205 and a tumour cell is less than, or equal to, the interaction radius R int , the macrophage 206 will attempt to kill the tumour cell, with M 1 -like macrophages more likely to kill a 207 tumour cell than M 2 -like macrophages. We define a probability of cell kill per hour, P ϕ , 208 which is a monotonic decreasing function of macrophage phenotype. We suppose further 209 that, after a macrophage has killed a tumour cell, it experiences a 'cooldown' period of 210 t cool hours during which it cannot attempt tumour cell killing. Thus, we associate with 211 macrophage i a subcellular timer t ϕ,i that is updated in real time and set to zero on 212 tumour cell killing. We define P ϕ,i as: where P ϕ is the maximum probability of tumour cell killing. If multiple tumour cells are 214 within distance R int of macrophage i, then one is selected at random for cell death.

215
Following cell killing, tumour cells are labelled as 'necrotic' and decay in the same way 216 as other necrotic cells.

217
Macrophage extravasation Macrophages enter the domain via blood vessels with a 218 probability per hour P ex which is an increasing, saturating function of CSF-1: where the non-negative parameter P represents the maximum probability per hour of 220 macrophage extravasation from a vessel, and c 1/2 is the concentration of CSF-1 at 221 which the probability is half-maximal.

222
Macrophage production of EGF The diffusible cytokine EGF, , is produced by 223 M 2 -like macrophages and undergoes natural decay. It is also a potent chemoattractant 224 for tumour cells. For simplicity, we assume that macrophage i produces EGF at a rate 225 which is linearly proportional to its phenotype p i , with constant of proportionality κ . 226 Denoting by D and λ the assumed constant diffusion coefficient and natural decay 227 rate of EGF, we suppose that its evolution can be described by the following reaction 228 diffusion equation: where δ(x) = 1 when x = 0 and δ(x) = 0 elsewhere. In (7), we sum over all 230 macrophages to determine the net rate of production at spatial position x.

231
Constructing synthetic datasets from model simulations 232 We post-process the output from ABM simulations at a fixed timepoint, to generate 233 synthetic datasets which mimic information that can be collected from biological 234 experiments (see Fig 2). We categorise these datasets as follows:  3. Raw cell counts and macrophage phenotype are recorded, with spatial information 240 neglected (synthetic single cell sequencing data: Fig 2C); Before defining the statistics that we apply to the synthetic datasets, we introduce some 245 notation. Consider an object i (which may be a cell or a blood vessel), whose centre is 246 located at x i = (x i , y i ) at time t. We associate with object i a categorical label 247 q i ∈ {B, M, S, T, N } which indicates whether it is a blood vessel, macrophage, stromal 248 cell, tumour cell or necrotic cell. Given a target label Q ∈ {B, M, S, T, N }, we define a 249 binary target function Θ(Q, q i ) to indicate whether the label q i associated with object i 250 matches Q: Cell counts 252 We use the binary target function to determine the number of objects of a particular 253 type at time t. For example, N T (t), the number of tumour cells at time t, is given by Information about some cell types may not be visible (e.g., necrotic cells). C: Synthetic single-cell sequencing data provides a detailed picture of expression levels across the sample (here represented by the 'phenotype' variable in macrophages), but without spatial resolution. D: Synthetic IMC provides detailed phenotype information for each macrophage, and spatial locations of all cells.
where the sum is over all objects in the simulation at time t. Similarly, N M (t), the total 255 number of macrophages at time t, is given by and 0 ≤ P 1 < P 2 ≤ 1. We also calculate the proportions of macrophages with M 1 -like respectively.

268
The cross-PCF identifies spatial correlations between objects with categorical labels 269 that are separated by a distance r. We define a sequence of annuli, of inner radius r k 270 and outer radius r k+1 = r k + dr where r 0 = 0 and dr > 0. We denote by A r k (x) the 271 area of the annulus with inner radius r k that is centred at the point x. If this annulus 272 lies wholly inside the domain then A r k (x) = π((r k + dr) 2 − r 2 k ) = π(2r k + dr)dr; 273 otherwise, only the area contained within the domain is recorded. We also define the 274 indicator function, I k (r), as follows: We calculate the cross-PCF for blood vessels and tumour cells by considering a domain of area A. We suppose that, at time t, it contains N B blood vessels and N T tumour cells. Then, we define the cross-PCF, g BT (r), by: where r ∈ [r k , r k+1 ) and N is the total number of objects in the simulation. r ∈ [r k , r k+1 ) from blood vessels and g BT (r) < 1 indicates anti-correlation, or exclusion, 280 of tumour cells at distance r ∈ [r k , r k+1 ) from blood vessels.

281
Cross-PCFs for other pairs of categorical variables can be defined similarly. In this 282 paper, we focus on g T B , g BM and g T M . We note that the cross-PCF is not necessarily 283 symmetric (i.e., g BT = g T B since, for any pair of points, the annulus surrounding one 284 point may intersect with the domain boundary while the annulus surrounding the other 285 may not).

286
Weighted pair correlation function (wPCF) 287 The weighted PCF (wPCF) quantifies spatial correlations between objects with discrete 288 categorical labels (here, blood vessels or tumour cells) and those with continuous labels 289 (here, macrophages with phenotype p). Its relationship to the cross-PCF is explored in 290 S3 Appendix: Derivation of PCFs and Cross-PCFs from wPCF. 291 We calculate the wPCF by replacing Θ(Q, q i ) in Equation (15) with a weighting 292 function, 0 ≤ w p (P, p i ) ≤ 1, which describes how p i differs from a target phenotype, P . 293 Multiple functional forms could be used for the weighting function; in S4 Appendix:

294
Comparison of different weighting functions, we show how alternative functional forms 295 affect the wPCF. For simplicity, throughout this paper we use a triangular weighting 296 function of the form: and fix ∆P = 0.2. Then, w p ≈ 1 for cells whose phenotype p i is close to the target P 298 and w p = 0 for those with |P − p i | > ∆P . We note further that w p → Θ as ∆P → 0.

299
Replacing Θ(T, q i ) with w p (P, p i ) in Eq (15), we define the wPCF for macrophages 300 of target phenotype P and blood vessels B, at lengthscale r, as follows: where W P (P ) = N i=1 w p (P, p i ) is the total 'weight' associated with the target label P 302 across all macrophages (W P (P ) replaces N T in Eq (15); non-macrophages have weight 303 w p = 0). Intuitively, the wPCF extends the cross-PCF by weighting the contribution of 304 each macrophage based on how closely its phenotype matches the target phenotype. distance r from the nearest cross has label P ≈ r for (A) (or P ≈ r 2 for (B)). The two 315 wPCFs show strong clustering along these lines and exclusion at shorter distances for 316 points with larger labels (above the dashed lines). We explain the weaker clustering 317 observed below the lines as follows. Consider a cross at position (x j , 1). In (A), the 318 largest label associated with a circle at distance r from this cross is p = r (if the circle is 319 directly above the cross). Smaller labels can also be recorded, for circles at distance r 320 which are offset from the cross. In the bottom panels of Fig 3, we plot wP CF (r, P, B) 321 for fixed values of the target label P . These curves can be interpreted as cross-PCFs for 322 points whose labels p i are "close" to P , and show the strongest clustering at the 323 expected values.

324
A natural extension to the wPCF considers objects with two continuous labels, P 1 325 and P 2 , say, in order to identify spatial correlations between objects with labels close to 326 P 1 and objects with labels close to P 2 (e.g., colocalisation of macrophages with p ≈ 0 327 with those with p ≈ 1, or colocalisation of a particular macrophage phenotype with a 328 particular concentration of CSF-1). We discuss such extensions in S5 Appendix: wPCF 329 for comparing two continuous labels.

331
Tumour progression in the presence and absence of macrophages 332  When macrophage extravasation is active (Fig 4B and Fig 4D), the ABM reproduces 344 the qualitative behaviours outlined in Fig 1 [23]. At early times (t ≈ 100), CSF-1 levels 345 are below the threshold for macrophage extravasation and the domain is devoid of 346 macrophages. As the tumour increases in size, more CSF-1 is produced until, eventually, 347 CSF-1 levels at the blood vessels reach the threshold for extravasation of M 1 -like 348 macrophages (t ≈ 200). By t = 300, some macrophages have infiltrated the tumour 349 mass. Macrophages that have been exposed to sufficient levels of TGF-β become M 2 -like 350 and migrate back towards nearby blood vessels, in response to spatial gradients in    Table 3 in S1 Appendix: Model Description). between macrophages and blood vessels (Fig 5F), suggesting the presence of 372 perivascular niches containing macrophages, tumour cells and blood vessels.

373
The corresponding weighted PCFs, wP CF (r, P, B) and wP CF (r, P, T ) (mean of 10 374 iterations), highlight differences in the spatial colocalisation of macrophages with 375 different phenotypes ( Fig 5G and Fig 5H). In particular, in this simulation macrophages 376 with p ≈ 0 do not correlate strongly with either tumour cells or blood vessels. By  Table 3 in S1 Appendix: Model Description. The results presented in Fig 6 illustrate the range of qualitative behaviours that the 400 ABM generates as χ m c and c 1/2 vary. We summarise these behaviours as follows:

401
• A compact tumour mass, with macrophages confined to the surrounding stroma. 402 The dominant macrophage phenotype is M 1 (COMP; yellow box in Fig 6).

403
• Total, or near-total, tumour cell elimination. Some macrophages may be clustered 404 around the last surviving tumour cells and the dominant macrophage phenotype 405 is M 1 (ELIM; blue box in Fig 6).

406
• The tumour is asymmetric, with a diffuse, fragmented structure. Perivascular 407 niches containing M 2 -like macrophages and tumour cells surround blood vessels.

408
The bulk of the tumour is infiltrated with M 1 -like and transitioning macrophages, 409 with central tumour necrosis caused by macrophages killing tumour cells (DIFF; 410 red box in Fig 6). The cross-PCF g T B (r) for the tumour cells and blood vessels in (C). The cross-PCF reveals short range interactions between tumour cells and blood vessels. Comparison with (B) quantifies how the spatial distribution of tumour cells relative to the blood vessels changes in the presence of macrophages. E: The cross-PCF g T M (r) associated with (C). The cross-PCF reveals strong short range interactions between macrophages and tumour cells. F: The cross-PCF g BM (r) associated with (C). Short-range correlations between blood vessels and macrophages are very strong and decay rapidly with distance r. G: The weighted PCF wP CF (r, P, B) associated with (C). There is strong, short-range colocalisation of macrophages with p > 0.6 and blood vessels, while macrophages with 0.2 p 0.6 are excluded from regions of radius approximately 10-15 cell diameters surrounding blood vessels. H: The weighted PCF wP CF (r, P, T ) associated with (C). Macrophages with p > 0.6 are strongly colocalised with tumour cells at distances 0 ≤ r ≤ 10, indicating their presence inside the tumour mass. Short-range colocalisation (r ≈ 3) is also observed for M 2 -like macrophages with p ≥ 0.9.  some macrophages in ELIM simulations have been exposed to high enough levels of 455 TGF-β for a sufficient period to change their phenotype, we deduce that they must have 456 infiltrated the tumour mass. In contrast, since no macrophages have p > 0 for COMP 457 simulations, we deduce that none have been exposed to high levels of TGF-β and, hence, 458 that none have infiltrated the tumour.

459
Immunohistochemistry analysis 460 We now focus on perivascular niches, using the cross-PCF to describe the relationships 461 between tumour cells and blood vessels (g T B (r)) and macrophages and blood vessels ELIM simulations can be distinguished from other cases (blue box) due to differences in N T . However, cell counts alone cannot easily distinguish between COMP (yellow box) and DIFF (red box) simulations, despite crucial differences in simulation behaviours.  colocalisation at one to two cell diameters than at other distances from blood vessels.

474
The COMP simulations (yellow box in Fig 9), for which no tumour cells reach the

508
COMP simulations (red box in Fig 11 and Fig 12) exhibit a different signature, with 509 short-range clustering of M 1 -like macrophages and blood vessels, and exclusion between 510 blood vessels and other macrophage phenotypes. There is also exclusion between    with no TGF-β exposure (p = 0) are restricted to the stroma while those with slightly 516 increased phenotype cluster around the tumour mass, at distance from the blood vessels. 517 Finally, for ELIM simulations (blue box in Fig 11 and Fig 12), the wPCFs are 518 generally flatter, with fewer extreme values, indicating weaker correlations. As for the 519 DIFF simulations, M 2 -like macrophages are strongly associated with blood vessels, 520 suggesting that these macrophages have followed the same dynamic trajectory as those 521 in DIFF simulations, but that the tumour was eliminated before the selected timepoint. 522 Wider exploration of parameter space 523 We have characterised a range of qualitative behaviours, obtained by systematically 524 varying parameters associated with macrophage sensitivity to CSF-1 (namely c 1/2 and 525 χ m c ). In this section, we explore a larger, 6-dimensional parameter space that regulates 526 macrophage sensitivity to environmental cues. In addition to χ m c , we vary the 527 chemotactic sensitivity parameters χ m ξ (macrophage sensitivity to CXCL-12) and χ T 528 (tumour cell sensitivity to EGF). In addition to c 1/2 , we vary P (maximum 529 extravasation probability for saturated CSF-1) and g crit (the critical TGF-β threshold 530 above which macrophage phenotype increases). For each simulation, we sample each 531 parameter uniformly at random from a range covering extreme values of each parameter 532 (n = 432 simulations; see  The low value of χ m ξ means that these macrophages are unable to migrate towards the 553 blood vessels and remain localised within the tumour long enough to adopt an M 2 -like 554 phenotype. In contrast to the wPCFs for the compact tumour growth simulations 555 shown in Fig 12, Table 2 we focus on selected statistics and three parameters, associated with macrophage 596 extravasation (P ), macrophage localisation (χ m ξ ) and macrophage phenotype (g crit ).

597
The PRCCs in Table 2 (and S7 Appendix: Partial correlations in 6-parameter sweep) 598 show that, while changes in each parameter can significantly alter tumour size and 599 morphology and the spatial and phenotypic distributions of the macrophages, these   Table 2. Partial rank correlation coefficients (PRCCs) PRCCs corresponding to selected scalar statistics and parameters, based on simulations in which 6 parameters were randomly varied (n = 432). The full table, corresponding to variation of all 6 parameters and additional scalar statistics, can be found in S7 Appendix: Partial correlations in 6-parameter sweep. consequences of this spatial segregation on tumour cell proliferation and migration. In 643 this way, the model shows how the spatial and phenotypic distribution of macrophages 644 within tumour tissue influences tumour size and structure, while in turn being altered in 645 response to tumour-derived microenvironmental cues. We used the ABM to show how 646 changing macrophage sensitivity to specific environmental cues can generate diverse 647 patterns of tumour growth, including growth as a compact mass, tumour elimination, 648 diffuse or fragmented growth towards the vasculature, and the formation of perivascular 649 niches containing M 2 -like macrophages and tumour cells. 650 We proposed a series of statistics which summarise the spatial and phenotypic data 651 that our ABM generates, and which could be routinely extracted from biological 652 imaging data. We assessed the extent to which each statistic can distinguish between introduced the weighted PCF (wPCF) to describe relationships between cell phenotype 661 and spatial localisation. By applying the wPCF to ABM data, we have shown how it 662 can distinguish between different qualitative behaviours (and associated spatial 663 patterns) that arise as a result of tumour-macrophage interactions, as described by 664 Arwert et al, such as macrophage phenotype plasticity in the tumour bulk or migration 665 of M 2 -like macrophages towards blood vessels.

666
By combining the above statistics, we generate a multidimensional signature, or 667 'fingerprint', describing the output from ABM simulations. In particular, we can identify 668 qualitative and quantitative differences in simulation behaviours, such as compact and 669 diffuse tumour growth. Each statistic identifies different features of the simulated data. 670 Although the wPCF provides the most detailed description, simpler statistics resolve 671 differences in the simulation behaviours that do not rely on spatial and/or phenotypic 672 cues.

673
There are many ways in which we could extend the ABM presented in this paper.

674
Its extension to three spatial dimensions would yield more realistic data, but would be 675 more computationally challenging to generate and analyse. The model could also be 676 initialised from medical imaging data showing cell locations [93]. We could include a 677 more detailed description of the vasculature, that accounts for tumour blood flow, 678 angiogenesis and vascular remodelling [94][95][96][97][98][99]. This would enable us to account for 679 tumour cell intravasation and metastasis, and to investigate how distinct tumour 680 regions are initiated and evolve over time. Since macrophage phenotype represents just 681 one facet of the immune response to cancer, a natural extension would be to include 682 multiple immune cell populations, such as T cells, in our ABM. We could also analyse 683 data from multiple time points to determine how the different statistics evolve over time 684 and to understand when and how differences in the patterns of tumour growth, caused 685 by varying specific parameter values, can be detected from the statistical descriptors. In 686 future work, we plan to use the ABM to study tumour responses to treatments such as 687 radiotherapy and chemotherapy, and to investigate the prognostic power of the 688 statistical descriptors, individually and in combination, to identify not only those 689 tumours that will benefit from such treatments but also those that would benefit from 690 May 23, 2022 30/58 immunotherapy. In the longer term, the insight gained from such studies could inform 691 the development of new biomarkers for identifying those patients who will respond to a 692 particular treatment.

693
Future work will involve applying these statistics to multiplex imaging data, in order 694 to validate their use in biological and clinical settings. Applying these statistics to 695 medical images would enable their high-throughput, automated quantification and 696 comparison in a manner that goes beyond expert visual inspection and is arguably more 697 interpretable than machine learning approaches [90,91]. While some relationships 698 between cells described by the wPCF may be apparent from visual inspection, it is 699 increasingly difficult to identify such relationships by eye as the number of multiplex 700 markers increases; typically only 5 or 6 distinct cell types may be visualised 701 simultaneously (from a potential pool of up to 40 markers).

702
The 'statistical fingerprint' described in this paper could be augmented with other 703 features. For example, topological data analysis can describe spatial features such as 704 immune deserts that exist in noisy data [34], or changes in tumour and vascular 705 architecture in response to radiotherapy [92]. Multiple spatial statistics can be combined 706 to obtain more detailed descriptions of 2D data [22], or new statistics can be derived 707 from networks of cell contact [21] or observations of immune cell locations [100,101].

708
In concluding, we note that the statistical methods described in this paper are not 709 limited to the analysis of synthetic data generated by off-lattice ABMs. For example, 710 the PCF, the cross-PCF, and the wPCF can be applied to data at a range of length 711 scales including coarser-grained data (e.g., meso-scale images from PET scans or MRIs). 712 In this study, macrophage phenotype p is a continuous label for the point patterns, but 713 the wPCF can be applied to other continuous variables, including subcellular labels 714 (e.g., cell cycle position or expression levels of a particular gene) and/or local levels of 715 diffusible species (e.g., oxygen, glucose or a chemotherapeutic drug), as shown by 716 examples in S5 Appendix: wPCF for comparing two continuous labels. In particular, we 717 note that before cell segmentation and classification, multiplex images consist of 718 continuous labels describing pixel intensities for a range of markers. These intensities 719 could be viewed as continuous variables for the wPCF, permitting comparison of pixels 720 with 'high' and 'low' levels of each marker without requiring the definition of threshold 721 intensities.

722
Taken together, the ABM and statistical approach presented in this paper represent 723 a methodology by which multidimensional spatially resolved imaging data can be 724 examined. We used the ABM to generate multidimensional spatial data of the type 725 produced by multiplex imaging modalities, and computed statistical descriptions of this 726 data within a framework where interactions between different cell types are well 727 understood. This enabled us to compare a range of existing and new statistical 728 descriptors, which can be used to better understand multidimensional spatial data from 729 multiplex imaging. Our ABM extends existing ABMs developed to describe infiltration of microbeads [35] 733 and macrophages [34] into tumour spheroids growing in vitro. Here, we consider an in 734 vivo scenario, where tumour cells are embedded within a tissue containing stromal cells 735 and where oxygen is supplied by blood vessels. We use an overlapping-spheres model, 736 representing each cell as a point whose movement and position are determined by 737 balancing the forces that act on the cell. We distinguish four types of agent: 738 macrophages, tumour cells, stromal cells and necrotic cells. We also consider 739 five diffusible species: oxygen (ω(x, t)), colony stimulating factor-1 (CSF-1, 740 c(x, t)), transforming growth factor-β (TGF-β, g(x, t)), epidermal growth 741 factor (EGF, (x, t)), and C-X-C motif chemokine 12 (CXCL12, ξ(x, t)). Their 742 dynamics are defined by reaction-diffusion equations (RDEs). Following [34,35], we use 743 the Chaste (Cancer, Heart and Soft Tissue Environment) modelling environment to 744 solve the model equations [30][31][32]. The distribution of the diffusible species is modelled using RDEs, with each equation 747 solved numerically on a triangular finite element mesh large enough to span the domain. 748 For the simulations in this manuscript, the domain Ω is taken to be a square with 749 height and width equal to 50 cell diameters (1 mm in dimensional units). In this Section 750 we briefly describe the role of each diffusible species, the factors that influence rates of 751 production and depletion, and the RDE that defines its evolution. Fig 1 in the main   752 text summarises how the diffusible species interact with tumour cells and macrophages. 753 Oxygen (ω) Oxygen diffuses through the domain from blood vessels which are 754 represented by static point sources, and is consumed by tumour cells and stromal cells. 755 We assume that the oxygen concentration at each blood vessel is constant, ω 0 .

756
Following previous models [34,35] we assume that since the timescale of diffusion for 757 an oxygen molecule is faster than the timesteps in our model for describing cell 758 movement, the distribution of oxygen can be approximated using a steady state solution. 759 For simplicity, we rescale concentrations with a factor of ω 0 , so that ω = 1 at each blood 760 vessel. The governing equation is then given by for x ∈ Ω, where x i is the location of stromal or tumour cell i, the parameter D ω is the 762 assumed constant diffusion coefficient of oxygen, κ ω is the oxygen consumption rate and 763 δ is the delta function (δ(x) = 1 when x = 0, δ(x) = 0 otherwise).

764
CSF-1 (Colony stimulating factor-1, c) CSF-1 acts as a chemoattractant for 765 macrophages and is produced by tumour cells. It acts with EGF as part of a paracrine 766 loop [69,70] and is a factor which can recruit macrophages from vasculature [23]. We 767 assume that CSF-1 is produced at a constant rate, κ c , by each tumour cell, and that 768 CSF-1 in the tumour microenvironment decays at a constant rate, λ c . The distribution 769 of CSF-1 is therefore described by the equation where D c is the diffusion coefficient of CSF-1. We apply Neumann boundary conditions 771 ( ∂c ∂x = 0) on the boundaries of the domain Ω, and assume that initially c(x, t = 0) = 0 772 for all x.

773
TGF-β (Transforming growth factor-β, g) TGF-β causes macrophages to alter 774 their phenotype, and is produced by tumour cells. We assume that TGF-β decays at a 775 constant rate, λ g , and is produced by tumour cells at a constant rate, κ g . Combining 776 these effects, we arrive at the following RDE for g(x, t): where D g is the diffusion coefficient of TGF-β and the sum i is over all tumour cells. As 778 for CSF-1, we impose Neumann boundary conditions ( ∂g ∂x = 0 on the boundaries of the 779 domain Ω), and assume that g(x, t = 0) = 0 ∀x ∈ Ω. 780 EGF (Epidermal growth factor, ) EGF is a diffusible chemoattractant for 781 tumour cells that is produced by tumour associated macrophages. We assume that EGF 782 is produced by macrophages at a rate that is linearly dependent on their phenotype p, 783 so that macrophages with an extreme M 1 -like phenotype (p = 0) produce no EGF and 784 macrophages with an extreme M 2 -like phenotype (p = 1) produce κ . We assume that 785 EGF naturally decays at a constant rate, λ . Combining these effects, we obtain the 786 following RDE for EGF, (x, t): where D is the constant diffusion coefficient of EGF, κ is the maximum rate of 788 production of EGF, the sum is over all macrophages i, and p i is the phenotype of 789 macrophage i (p i ∈ [0, 1], and is fully defined below). cancer-associated fibroblasts (CAFs) [23]. We suppose that CAFs are localised close to 793 blood vessels, and, hence, make the simplifying assumption that blood vessels act as 794 constant sources of CXCL12. We therefore assume that the concentration of CXCL12, ξ, 795 is maintained at a fixed value ξ = ξ 0 at all blood vessels. We assume that CXCL12 796 decays naturally at a constant rate, λ ξ . The distribution of CXCL12 in the domain is 797 then given by where D ξ is the diffusion coefficient for CXCL12.

799
Agents 800 Our model distinguishes four types of agents: macrophages, tumour cells, stromal 801 cells and necrotic cells. In addition, a number of blood vessels are located at fixed 802 spatial positions throughout each simulation. The location of each agent is represented 803 by its cell centre. 804 We use Newton's second law to derive the equations of motion for macrophages, 805 tumour cells, stromal cells and necrotic cells. Neglecting inertial effects in the 806 overdamped limit, the force balance for cell i can be written as: where ν is the drag coefficient (assuming that the drag on a cell is proportional to its 808 velocity), and F i denotes the net force acting on cell i. The forces that act on each cell 809 type are indicated in Fig 1.

810
All cells are subject to mechanical forces due to cell-cell interactions (incorporating 811 repulsion due to volume exclusion and attraction due to intercellular adhesion). Here we 812 adopt the overlapping spheres approach [82,84], assuming that two cells interact if the 813 distance between their cell centres is less than a fixed radius of interaction R int .

814
Specifically, for cells at locations x i and x j , if |x i − x j | < R int then the interaction force 815 between cells i and j is parallel to the vector (x i − x j ) connecting their centres. The 816 resting spring length between the cell centres, s i,j , is the sum of the equilibrium spring 817 lengths associated with each cell (s i,j = s i + s j ). For most cells i, the resting spring 818 length is equal to the approximate radius of a cell (s i = R Cell ≡ 0.5 in non-dimensional 819 units). For newly divided and necrotic cells, s i changes over time to account for cell 820 growth and shrinkage (see below).

821
The net mechanical force acting on each cell is the sum of all interaction forces due 822 to cells within the interaction radius: where F m i,j is the mechanical force between cells i and j. This force always points in the 824 direction of the vector connecting the cell centres and has magnitude: vessel is converted into a probability of extravasation such that the probability of a 838 macrophage extravasating in a one hour time period is P ex , where: where P is a parameter controlling the maximum possible probability of macrophage 840 extravasation per hour from each vessel, and c 1/2 is the concentration of CSF-1 at which 841 this probability is half-maximal.

842
Phenotype The diffusible species included in our model interact with receptors on 843 macrophages and tumour cells to induce behaviours such as chemotaxis or upregulation 844 of different receptors. While we do not explicitly model surface receptors, the receptor 845 CXCR4 (C-X-C motif chemokine receptor type 4) plays an important role in this 846 system: when exposed to TGF-β, macrophages increase expression of CXCR4 and 847 become sensitive to gradients of CXCL12 [23]. irreversibly. The change in phenotype for a macrophage i at location x i is: where g(x i , t) is the concentration of TGF-β at x i at time t, H(x) is the Heaviside The dependence of EGF production on p is described above in Equation (21) there is at least one tumour cell, one of the tumour cells is selected at random to be 873 killed. The tumour cell is labelled as necrotic, and t ϕ is reset to 0 for that macrophage. 874 Force laws for macrophages In addition to mechanical forces, macrophages are 875 also subject to forces which model random movement through their neighbourhood and 876 chemotactic forces due to spatial gradients of CXCL12 and CSF-1. We model random 877 macrophage movement by applying a force F r i = (F r x , F r y ) at each timestep, given by: where the coefficient D describes the strength of the random force and n = (n x , n y ), 879 where n x and n y are random variables drawn from a standard normal distribution.

880
The chemotactic force applied due to the gradients of chemoattractants for a 881 macrophage i depends on its phenotype, p i , and is: where A i is the area of the cell calculated as A i = πr 2 i and r i is the estimated cell radius 897 calculated via the average separation between cells within the interaction radius of i.

906
Force laws for tumour cells Tumour cells move according to the force balance 907 described in Equation (23). The force term incorporates the intercellular interactions 908 described above, but also accounts for chemotaxis between tumour cells and EGF. This 909 force, denoted F χ i , has the form where χ T is parameter determining the sensitivity of tumour cells to the EGF gradient. 911 The force term used in Equation (23) is therefore where ω str H is a parameter determining the oxygen threshold below which stromal cells 919 become hypoxic and A H i str is a parameter defining the proportion of a stromal cells 920 target area below which size the cell cycle stops.

921
As with tumour cells, we define a second oxygen threshold ω str N , below which stromal 922 cells become necrotic. In our model, we take ω str H < ω tum H and A H i str < A H i tum to 923 account for the ability of tumour cells to proliferate in more adverse environments than 924 stromal cells.

925
Force laws Stromal cells are subject to the same intercellular forces as macrophages 926 and tumour cells, defined in Equations (24)- (25). The force balance for stromal cells 927 used in Equation (23) is therefore simply  RDEs describing the diffusible species are solved numerically on a regular triangular 940 mesh with edge length 1 cell diameter. We initialise the simulation by selecting lattice 941 sites to act as point vessels, which do not occupy space or interact directly with cells in 942 our model. All lattice sites more than R B cell diameters from the centre of the domain 943 are possible blood vessel sites, and N B of these sites are chosen, at random, to be blood 944 vessels. This ensures that the centre of the domain is at least R B cell diameters from 945 the nearest blood vessel and, hence, that there is sufficient space to observe macrophage 946 movement between blood vessels and the tumour, while also ensuring that cells near the 947 domain boundaries are well-oxygenated.

948
The domain is initialised with stromal cells filling the domain in rows that are 0.75 949 cell diameters apart, with alternating rows offset by 0.375 cell diameters (i.e., forming a 950 hexagonal lattice). We place four tumour cells in a cluster at the centre of the domain 951 approximately 0.5 cell diameters apart. Stromal and tumour cells are assigned cell cycle 952 durations τ i from the relevant distributions, and cell cycle progression times T i from a 953 uniform distribution U (0, τ i ). All cells are constrained to remain within the domain by 954 imposing reflective boundary conditions.    In Table 3 we list the model parameters used, their default dimensional and 963 dimensionless values or ranges, and supporting references where these are available.

964
Value of some parameters, indicated with * , have been estimated based on model 965 behaviour.

966
Our model is implemented such that typical scales are given by:

968
• Time: 1 hour is 1 unit of time.

969
• Concentration: the boundary concentration of oxygen is 1. The simulation results presented in Fig 15A show  ω tum N , and the cell cycle duration τ i ) so that a tumour initially located at the centre of 979 the domain grows as a compact mass. At long times, the total number of tumour cells 980 and necrotic cells remains approximately constant (see Figures 15B and C). Further, the 981 tumour attains its steady state before it can spread to the surrounding blood vessels.  In the main text, we show how the wPCF can identify correlations between a 984 continuous label (phenotype) and a categorical one (cell type). We now explain how we 985 can extend the wPCF to study correlations between two continuous labels. Consider 986 two labels u and v associated with each cell, which may be continuous or discrete. For 987 two target marks U and V , the wPCF in its most general form can be written as:

970
where w u and w v are weighting functions, and W U = i w u (U, u i ) and • Are u and v discrete or continuous marks?

992
• What are the ranges of u and v?

993
• At what resolution do we wish to identify correlations?

994
Through appropriate choices of w U and w V , Equation (37) reduces to the ordinary 995 PCF, the cross-PCF, or the discrete-continuous form of the wPCF that was introduced 996 in the main text. Table 4 illustrates weighting functions which achieve these different 997 cases. Table 4. Choices of w which simplify the wPCF. By appropriately choosing w u and w v , Equation (37) reduces to the wPCF presented in Equation (17), to the cross-PCF in Equation (15) In this appendix we consider different choices of the weighting, or kernel, function w p , 1000 and apply them to synthetic data. In Fig 16 we place 50 pink crosses uniformly along 1001 the line y = 1, and 1000 circles according to complete spatial randomness throughout 1002 the (2 × 2) square domain. The circles are labelled with a 'phenotype' p based on their 1003 distance from y = 1, such that for circle i the label p i is given by:

998
By construction, this results in two prominent values of p which are correlated with 1005 pink crosses at distance r: p = r (below y = 1) and p = 1 − r (above y = 1). text in Equation (16), which has the form w p (P, p i ) = max 1 − |P −pi| ∆P , 0 , for different 1010 values of ∆P . In the main text, we use this weighting function with ∆P = 0.2, which 1011 we choose as it leads to a sufficiently compact kernel to identify variations in phenotype 1012 while being broad enough to reduce noise. Weighting functions in Fig 17B have  Changing the shape of the weighting function adjusts the balance between signal and noise in the wPCF: the narrower the support of w, the more clearly the relationship between label and distance can be discerned. Using a weighting function with extremely narrow support relative to the range of labels results in more noise in the wPCF, most evident in the triangular weighting functions with ∆P = 0.05 and ∆P = 0.01. A: wPCFs generated using weighting functions of the form w(P, p) = 1 − m|P − p|, together with w(0.5, p) B: wPCFs generated using weighting functions of the form w(P, p) = 1 − m(P − p) 2 , together with w(0.5, p) wPCF which varies more smoothly with P . In all cases, Fig 17 shows the resulting 1015 wPCF alongside w p (0.5, p i ).
1016 Fig 17 shows that the shape of the weighting function plays a key role in determining the balance between signal and noise in the wPCF. When the support of 1018 the weighting function is broad (e.g., ∆P ≥ 1), the wPCF does not identify correlations 1019 between r and the target phenotype P . On the other hand, when the non-zero part of 1020 the weighting function is very compact (e.g., ∆P = 0.01 for the triangular weighting 1021 function) the resulting wPCF identifies correlation well but contains a lot of noise. This 1022 suggests that as long as the support of w p is chosen appropriately then correlations will 1023 be correctly identified through the wPCF, regardless of the precise shape of the kernel. 1024 We conclude that the choice of weighting function can have a strong effect on the 1025 resulting wPCF, and should be chosen with care. Selecting a weighting function which 1026 is too 'narrow' will result in noisy wPCFs, while an unsuitably 'broad' choice will not 1027 produce wPCFs with a high enough resolution to identify key features. The appropriate 1028 choice is likely to depend on both the distribution of labels in the data and the number 1029 of points available, in the same way that the selection of an appropriate annulus radius 1030 for the PCF must be tailored to the dataset in question. S7 Appendix: Partial correlations in 6-parameter sweep 1065 In Table 5 we list all of the partial rank correlation coefficients used to generate  Table 5. Partial rank correlation coefficients (PRCCs) Full table of PRCCs corresponding to selected scalar statistics and parameters, based on simulations in which 6 parameters were randomly varied (n = 432). Selected PRCCs are presented in the main text in Table 2.