## Abstract

Scratch assays are routinely used to study collective cell behaviour *in vitro*. Typical experimental protocols do not vary the initial density of cells, and typical mathematical modelling approaches describe cell motility and proliferation based on assumptions of linear diffusion and logistic growth. Jin *et al*. (2016) found that the behaviour of cells in scratch assays is density-dependent, and showed that standard modelling approaches cannot simultaneously describe data initiated across a range of initial densities. To address this limitation, we calibrate an individual based model to scratch assay data across a large range of initial densities. Our model allows proliferation, motility and a direction bias to depend on interactions between neighbouring cells. By considering a hierarchy of models where we sequentially remove interactions, we perform model selection analysis to identify the minimum interactions required for the model to simultaneously describe data across all initial densities. The calibrated model is able to match the experimental data across all densities and captures details about the spatial structure of cells. Our primary findings provide strong evidence to suggest that motility is density-dependent in these experiments. On the other hand, we do not see the effect of crowding on proliferation in these experiments. These results are significant as they are precisely the opposite of the assumptions in standard continuum models, such as the Fisher-Kolmogorov equation and generalisations of the Fisher-Kolmogorov equation.

## 1 Introduction

Scratch assays are routinely used to study collective cell behaviour *in vitro* [1–5]. These experiments are conducted by placing a monolayer of cells on a two-dimensional substrate and creating an artificial wound, or scratch, in the centre of the population (figure 1*a–d*) [3]. Typical experimental protocols do not vary the initial density of cells between experiments, which limits the amount of information that can be obtained about any density-dependence of cell migration or proliferation. We consider novel experimental data where we deliberately vary the initial density of cells between experiments. The variation in the initial cell density in our experiments is large: the initial population in the highest density experiment is greater than the population after 36 h in the lowest density experiment.

Logistic growth and linear diffusion are often assumed to be the key mechanisms governing collective cell behaviour in a range of *in vitro* and *in vivo* conditions [3, 6–11]. Mean-field mathematical models that incorporate one or both of these mechanisms are routinely used to model tumour spheroids [12]; cells in living tissues [13, 14]; and simple *in vitro* experiments such as scratch [3], migration [15] and proliferation [16] assays. While calibrating these models to experimental data often leads to a good match [6], these models make the standard assumption that the parameters are independent of both initial condition and cell density. For example, the Fisher-Kolmogorov equation

$$\frac{\partial u}{\partial t} = D\nabla^{2}u + \lambda u\left(1 - \frac{u}{K}\right), \tag{1}$$

where *u* denotes the cell density, is commonly used to model scratch assay experiments [3, 17] and describes density-independent motility, characterised by a constant diffusivity *D*; and density-dependent proliferation, characterised by a constant proliferation rate *λ* and a constant carrying capacity *K*. Jin *et al*. [3] find that calibrating equation (1) to scratch assay data yields vastly different estimates of *D* for each initial condition considered. The assumption of density-independent motility may therefore be inappropriate.

In this work we describe the cell behaviour with a lattice-free individual based model (IBM) [16, 18, 19]. The IBM represents cells as *agents* that take locations in continuous space, and so we can specify the initial agent locations in the model to precisely match the initial cell locations in the experiments. This choice also allows the model to capture local details—such as spatial structure and clustering—which are neglected by standard continuum modelling approaches [3, 9]. The agents in the IBM undergo random proliferation and movement events, the rates of which we assume depend explicitly on interactions between neighbouring agents. We quantify these interactions with kernels that depend on the distance between pairs of cells. Directional bias is also incorporated, so that agents are more likely to move either away from, or towards, regions of high density [18, 20]. A key advantage of the IBM is its flexibility: it is trivial to add and remove mechanisms, which we do to study the interactions required for the model to simultaneously match all experiments. Finally, the IBM is stochastic and so naturally describes the variation between experiments.

The primary goal of this study is to identify interactions which enable the model to simultaneously describe experimental data across a range of initial densities. We take a Bayesian approach to parameter estimation [16, 21–23], and identify interactions using model selection [21]. We always force the model to simultaneously match data from all nine experiments. The mathematical model is always initiated using the initial configuration of cells in each experiment, and we compare simulated and experimental data at 18 h and 36 h, the latter of which corresponds to the duration of the experiment. The calibrated model is able to replicate the experimental data, and we find evidence of density-dependent motility, which is contrary to the usual assumption of linear diffusion. Additionally, experimentation with summary statistics confirms the importance of spatial structure, which is neglected by standard modelling and model calibration approaches.

## 2 Materials and methods

### 2.1 Experimental methods

We consider a series of scratch assay experiments using the PC-3 prostate cancer cell line [24], where we deliberately vary the initial number of cells. For full details of the experimental technique, see Jin *et al*. [3]. In summary, populations of cells are seeded at approximately 8000, 10000 or 12000 cells per well, in a 9000 µm diameter well within a 96-well plate (figure 1*a,b*). Cells are grown overnight to create a spatially uniform monolayer before a scratch is created (figure 1*c*). Images of the central 1440 × 1900 µm of each well are captured over a period of 48 hours after the monolayer is scratched (figure 1*d*).

ImageJ [25] is used to determine the approximate coordinates of individual cells in each image; these data are given in the supporting material. We exclude the first 12 hours of experimental data from our analysis [16] to ensure that sufficient time has passed so that the cells are migrating and proliferating after the scratch has been made. We treat the end of this 12 hour period as the beginning of the experiment, *t* = 0 h. The variability in initial cell number is high: despite an initial seeding density of approximately 8000–12000 cells per well, which corresponds to an expected initial number of cells within the field-of-view of 344–516, we find that the initial number of cells within the field-of-view at *t* = 0 h ranges from 183 to 731 (figure 1*f–i*). This variation is also high between experiments of the same seeding density [26], because our field-of-view is relatively small and so fluctuations about the expected values are relatively large. We summarise the experimental data in figure 2 and table 1.

### 2.2 Mathematical model

We use a lattice-free individual based model (IBM) [16, 18] which we simulate with the Gillespie algorithm [27]. The model includes density-dependent proliferation and movement events, but does not consider death, which is not observed in the experiments. To be consistent with previous experimental observations [20], the model incorporates a bias mechanism so that cells both move, and disperse daughter agents during proliferation, in a direction either towards, or away from, crowded regions.

The field-of-view of the experimental data is rectangular, with dimensions 1440 × 1900 µm (figure 1*d*), and we replicate this by using the same geometry in the model. As the well in the tissue culture plate is much larger than this field-of-view, we apply periodic boundary conditions [16] (indicated in blue in figure 1*c,d*). Cells are modelled as *agents* that have a point location but no physical size. In our previous work, we find that, on average, these PC-3 prostate cancer cells have an area that corresponds to a disc of diameter *φ* = 24 µm [16]. The interaction mechanisms we model are not based on volume exclusion, but rather depend on agent separation in such a way that configurations wherein two agent centres are very close are unlikely. We denote the agent locations **x**_{n} = (*x, y*), *n* ∈ {1, …, *N*(*t*)}, where *N*(*t*) denotes the number of agents in the simulation. We specify the initial agent locations in each simulation to match the experimental images at *t* = 0 h.

#### Directional bias

We quantify crowding by placing a *bias kernel* at the location of each agent to form a *crowding surface, B*(**x**), as shown in figure 3*c,d* for the configuration of cells in figure 3*a,b*. Mathematically, this is given by

$$B(\mathbf{x}) = \sum_{i=1}^{N(t)} w^{(b)}\left(\|\mathbf{x} - \mathbf{x}_{i}\|\right), \tag{2}$$

and describes a measure of local crowding at **x**, where *w*^{(b)}(*r*) is the bias kernel. The contributions of each agent to *B*(**x**) depend on the distance between **x** and the location of the *i*th agent, **x**_{i}, given by *r* = ‖**x** − **x**_{i}‖. In this study, we choose *w*^{(b)}(*r*) to be a Gaussian [28] of spread *σ* with an extremum of *γ*_{b} so that

$$w^{(b)}(r) = \gamma_{b}\exp\left(-\frac{r^{2}}{2\sigma^{2}}\right). \tag{3}$$

For computational efficiency, we truncate the kernel to zero for *r ≥* 3*σ* [28].
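
As an illustration only, the crowding surface can be evaluated at a point with a few lines of NumPy. This sketch assumes the Gaussian form *w*^{(b)}(*r*) = *γ*_{b} exp(−*r*²/(2*σ*²)) truncated at 3*σ*, and, for brevity, ignores the periodic boundary conditions; the function names and the agent coordinates are hypothetical.

```python
import numpy as np

def bias_kernel(r, gamma_b, sigma):
    """Gaussian kernel with extremum gamma_b, set to zero for r >= 3*sigma."""
    r = np.asarray(r, dtype=float)
    w = gamma_b * np.exp(-r**2 / (2.0 * sigma**2))
    return np.where(r < 3.0 * sigma, w, 0.0)

def crowding_surface(x, agents, gamma_b, sigma):
    """Sum the kernel contributions of every agent at the point x."""
    r = np.linalg.norm(agents - np.asarray(x, dtype=float), axis=1)
    return float(bias_kernel(r, gamma_b, sigma).sum())

# Hypothetical configuration: two nearby agents and one distant agent (µm).
agents = np.array([[100.0, 100.0], [110.0, 100.0], [400.0, 300.0]])
B_near = crowding_surface([105.0, 100.0], agents, gamma_b=1.0, sigma=12.0)
B_far = crowding_surface([800.0, 800.0], agents, gamma_b=1.0, sigma=12.0)
```

With *σ* = 12 µm, only agents within 36 µm of the evaluation point contribute, so `B_far` is exactly zero.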

For *γ*_{b} *>* 0, agents prefer to move and disperse daughter agents in the direction of steepest descent on the crowding surface, which corresponds to regions of lower density (setting *γ*_{b} *<* 0 has the opposite effect). This preference depends on the steepness, so that agents close to highly crowded regions are more likely to move and disperse daughter agents in their preferred direction, demonstrated in figure 3*e,f*, where the red agent has a stronger bias strength than the green agent. To do this, we define the bias vector of agent *n* as

$$\mathbf{B}_{n} = -\nabla B(\mathbf{x}_{n}), \tag{4}$$

which gives the magnitude and direction of steepest descent. The movement and proliferation directions are then sampled from the von Mises distribution [29]

$$\theta \sim \mathrm{VonMises}\left(\arg(\mathbf{B}_{n}),\ \|\mathbf{B}_{n}\|\right). \tag{5}$$

The expected and most likely direction is, therefore, arg(**B**_{n}). The direction distribution becomes increasingly concentrated around arg(**B**_{n}) as ‖**B**_{n}‖ becomes large, and approaches a uniform distribution on [0, 2*π*) as ‖**B**_{n}‖ → 0.
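
A minimal sketch of this sampling step follows. It assumes that the bias vector is the negative gradient of a crowding surface built from Gaussian kernels *γ*_{b} exp(−*r*²/(2*σ*²)), and that the von Mises concentration equals ‖**B**_{n}‖; periodic boundaries are ignored, and the function names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def bias_vector(n, agents, gamma_b, sigma):
    """Negative gradient of the (assumed Gaussian) crowding surface at agent n."""
    diff = agents[n] - agents                  # x_n - x_i for every agent i
    r = np.linalg.norm(diff, axis=1)
    weight = np.where(r < 3.0 * sigma, np.exp(-r**2 / (2.0 * sigma**2)), 0.0)
    # The i = n term has diff = 0, so it contributes nothing to the sum.
    return (gamma_b / sigma**2) * (weight[:, None] * diff).sum(axis=0)

def sample_direction(b, rng):
    """Draw an angle from a von Mises distribution with mean arg(b) and
    concentration |b|; a zero bias vector gives a uniform angle."""
    kappa = np.hypot(b[0], b[1])
    mu = np.arctan2(b[1], b[0])
    return rng.vonmises(mu, kappa) % (2.0 * np.pi)

# Hypothetical pair of agents 10 µm apart: agent 0 is biased away from agent 1.
agents = np.array([[0.0, 0.0], [10.0, 0.0]])
b = bias_vector(0, agents, gamma_b=50.0, sigma=12.0)
rng = np.random.default_rng(1)
theta = sample_direction(b, rng)
```

For *γ*_{b} *>* 0 the vector `b` points in the negative *x*-direction here, so sampled angles cluster around *π*.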

We illustrate the directional bias mechanism in figure 3*c–f*. The crowding surface is constructed with a Gaussian kernel placed at the location of each agent (figure 3*c,d*). In figure 3*e* we show the bias distribution and preferred direction for an agent in a low (green) and high (red) density region. For each agent, the arrow shows the preferred direction, and the corresponding von Mises distribution is plotted in radial coordinates centred at the location of each agent. In figure 3*f* these distributions are shown as a function of the angle, *θ* ∈ [0, 2*π*), for clarity.

#### Proliferation and movement

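Proliferation and movement events occur according to a Poisson process [30] with density-dependent rates *P*_{n} *≥* 0 and *M*_{n} *≥* 0, respectively. These rates comprise constant intrinsic rates *p >* 0 and *m >* 0, that are modified by interactions with neighbouring agents. We quantify these interactions using kernels, *w*^{(·)}(*r*), that depend on the separation distance, *r ≥* 0, between an agent and its neighbours, such that

$$P_{n} = \max\left(0,\ p - \sum_{i \neq n} w^{(p)}\left(\|\mathbf{x}_{n} - \mathbf{x}_{i}\|\right)\right), \tag{6}$$

$$M_{n} = \max\left(0,\ m - \sum_{i \neq n} w^{(m)}\left(\|\mathbf{x}_{n} - \mathbf{x}_{i}\|\right)\right). \tag{7}$$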

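Again, we choose the kernels to be Gaussian, with spread *σ*, so that

$$w^{(p)}(r) = \gamma_{p}\exp\left(-\frac{r^{2}}{2\sigma^{2}}\right), \tag{8}$$

$$w^{(m)}(r) = \gamma_{m}\exp\left(-\frac{r^{2}}{2\sigma^{2}}\right). \tag{9}$$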

Here, *γ*_{p} and *γ*_{m} are the extrema of the proliferation and movement kernels, respectively. A value of *γ <* 0 means that crowding increases motility or proliferation; a value of *γ >* 0 means that crowding decreases motility or proliferation; and, a value of *γ* = 0 means that motility or proliferation is independent of local density. Again, for computational efficiency, we truncate the kernels to zero for *r ≥* 3*σ* [28].
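
The sign convention can be checked with a short sketch. It assumes that each rate is the intrinsic rate minus the summed Gaussian kernel contributions of all other agents, floored at zero, which is consistent with the description above but should be treated as an assumption; the agent coordinates and parameter values are hypothetical, and periodic boundaries are ignored.

```python
import numpy as np

def event_rates(agents, p, m, gamma_p, gamma_m, sigma):
    """Per-agent proliferation (P) and movement (M) rates under the assumed form
    rate = max(0, intrinsic - gamma * sum_i exp(-r_i^2 / (2 sigma^2)))."""
    diff = agents[:, None, :] - agents[None, :, :]
    r = np.linalg.norm(diff, axis=2)
    g = np.where(r < 3.0 * sigma, np.exp(-r**2 / (2.0 * sigma**2)), 0.0)
    np.fill_diagonal(g, 0.0)                   # an agent does not crowd itself
    crowd = g.sum(axis=1)
    P = np.maximum(0.0, p - gamma_p * crowd)
    M = np.maximum(0.0, m - gamma_m * crowd)
    return P, M

# Hypothetical configuration: agents 0 and 1 are neighbours, agent 2 is isolated.
agents = np.array([[0.0, 0.0], [12.0, 0.0], [500.0, 500.0]])
P, M = event_rates(agents, p=0.04, m=1.0, gamma_p=0.01, gamma_m=-1.0, sigma=12.0)
```

In this example *γ*_{m} *<* 0, so the two neighbouring agents move faster than the isolated agent, while *γ*_{p} *>* 0 slows their proliferation.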

When an agent at **x**_{n} proliferates, the daughter agent is dispersed to a location of distance *φ* (approximately one cell diameter) from **x**_{n}, with the direction sampled from the bias distribution for that agent (figure 3*e,f*). This is demonstrated in figure 3*g*.

When an agent at **x**_{n} moves, it is moved to a location of distance *φ* (approximately one cell diameter) from **x**_{n}, with the direction sampled from the bias distribution for that agent (figure 3*e,f*). This is demonstrated in figure 3*h*.
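
One event of the Gillespie algorithm can be sketched as follows. The sketch takes the per-agent rates and a pre-sampled direction for each agent as inputs, which is a simplification made here for brevity; the function name and the example values are hypothetical. Positions are wrapped to respect the periodic boundary conditions.

```python
import numpy as np

def gillespie_step(agents, P, M, directions, phi, domain, rng):
    """Simulate one event. Returns the updated agent array and the waiting time."""
    total = P.sum() + M.sum()
    if total <= 0.0:
        return agents, np.inf                  # no event can occur
    dt = rng.exponential(1.0 / total)          # exponential waiting time
    n_agents = len(agents)
    # Events 0..n-1 are proliferations, events n..2n-1 are movements.
    k = rng.choice(2 * n_agents, p=np.concatenate([P, M]) / total)
    n = k % n_agents
    step = phi * np.array([np.cos(directions[n]), np.sin(directions[n])])
    target = (agents[n] + step) % domain       # periodic wrap
    if k < n_agents:
        return np.vstack([agents, target]), dt # daughter placed a distance phi away
    moved = agents.copy()
    moved[n] = target
    return moved, dt

# Hypothetical example: two agents that can only proliferate.
rng = np.random.default_rng(0)
agents = np.array([[100.0, 100.0], [200.0, 200.0]])
domain = np.array([1900.0, 1440.0])
new_agents, dt = gillespie_step(agents, np.array([0.04, 0.04]), np.zeros(2),
                                np.array([0.0, np.pi / 2]), 24.0, domain, rng)
```

In a full simulation the rates and directions would be recomputed after every event, since each event changes the local crowding.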

### 2.3 Summary statistics

To match model simulations to the experimental data, we record the locations of agents at both *t* = 18 h and *t* = 36 h. We denote the experimental data at both time points from experiment *i* ∈ {1, …, 9} as **X**_{obs}^{(i)}, and simulation data from experiment *i* as **X**_{sim}^{(i)}. In this section, we detail how we summarise the high dimensional data **X** into lower dimensional summary statistics. This allows us to define a distance function, *d*(**X**_{obs}, **X**_{sim}), that represents the distance between experimental and simulation data.

We aim to capture three key pieces of information in the experiments: (1) the population size; (2) the spatial structure; and, (3) the density profile. The first two pieces of information are related to the first two spatial moments [28], and the last piece of information relates to the wound closure, total population and the spatial distribution of cells. The first spatial moment, the average density, is the number of agents in the population, *N*(*t*). The second spatial moment describes the spatial distribution of agents, often characterised by a pair correlation function [2, 18, 28]. In summary, the pair correlation function describes the number of pairs of agents separated by a distance *r*, relative to the number expected if the population were uniformly distributed. Since the data are discrete, we define the pair correlation, 𝒫 (*j, t*), *j* ∈ ℕ, which describes the relative number of pairs separated by a distance in the range (*j* − 1)Δ*r < r < j*Δ*r*. This is given by

$$\mathcal{P}(j,t) = \frac{LW}{N(t)\left(N(t)-1\right)\pi\,\Delta r^{2}\,(2j-1)} \sum_{n=1}^{N(t)} \sum_{\substack{i=1 \\ i \neq n}}^{N(t)} \mathbb{1}\left((j-1)\Delta r < \|\mathbf{x}_{n} - \mathbf{x}_{i}\| < j\Delta r\right), \tag{10}$$

where *L* and *W* are the dimensions of the region and 𝟙 is the indicator function. In this study, we choose Δ*r* = 5 µm, and consider the pair correlation up to a distance of 100 µm such that *j ≤* 20. Smaller values of Δ*r* lead to a noisier pair correlation function, and larger values of Δ*r* hide information.

In a scratch assay the central region of the experimental field-of-view is approximately devoid of agents (figure 4*a*). To account for this, we calculate pair correlation functions for sub-regions of width 400 µm in the far-left, and far-right, of the domain (figure 4*a*, indicated in red) denoted 𝒫^{(L)}(*j, t*) and 𝒫^{(R)}(*j, t*), respectively. We apply periodic boundary conditions on these sub-regions, so that the separation of a pair of agents is the smallest possible distance accounting for the periodic boundary conditions. The pair correlation function that summarises the entire experiment is 𝒫 (*j, t*) = (𝒫 ^{(L)}(*j, t*) + 𝒫 ^{(R)}(*j, t*))/2 (figure 4*b*).
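
A binned pair correlation on a periodic rectangular sub-region can be sketched as below. The normalisation used here (observed pair counts divided by the counts expected for a uniformly distributed population) is one standard choice and is an assumption of this sketch; the function name is hypothetical. The minimum-image step is valid because the largest distance considered, 100 µm, is less than half of either side length.

```python
import numpy as np

def pair_correlation(agents, L, W, dr=5.0, n_bins=20):
    """Binned pair correlation on a periodic L x W region."""
    N = len(agents)
    diff = np.abs(agents[:, None, :] - agents[None, :, :])
    diff = np.minimum(diff, np.array([L, W]) - diff)   # minimum-image convention
    r = np.linalg.norm(diff, axis=2)[np.triu_indices(N, k=1)]  # unordered pairs
    counts, _ = np.histogram(r, bins=dr * np.arange(n_bins + 1))
    j = np.arange(1, n_bins + 1)
    annulus = np.pi * dr**2 * (2 * j - 1)              # area of the j-th annulus
    expected = 0.5 * N * (N - 1) * annulus / (L * W)   # pairs expected if uniform
    return counts / expected

# Hypothetical check: uniformly scattered agents give values close to one.
rng = np.random.default_rng(0)
uniform_agents = rng.uniform([0.0, 0.0], [400.0, 1440.0], size=(1000, 2))
pc = pair_correlation(uniform_agents, L=400.0, W=1440.0)
```

Values above one at short distances would indicate clustering, and values below one would indicate that close pairs are rarer than in a uniform population.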

The final piece of information, the density profile, describes the wound closure, total population and spatial structure. We subdivide the field-of-view in figure 4*a* into 80 vertical sub-regions, each of width Δ*x* = 1900/80 = 23.75 µm. We define the density profile 𝒟 (*j, t*) to be the number of agents with an *x*-coordinate between (*j* − 1)Δ*x* and *j*Δ*x*, divided by the area of the sub-region, giving the density. This density profile is shown in figure 4*c*. To avoid capturing excessive noise in our measurement of wound closure, we do not include the entire density profile in the distance metric. Rather, we manually approximate the *x*-coordinate of the centre of the scratch at *t* = 0 h for each experiment, and record the bin index of the centre of the scratch in experiment *i*. We include the 41 sub-regions centred on this bin which, in effect, surround the initially scratched region of each experiment. This region is indicated in figure 4*c* and avoids the fluctuations in density outside this region.
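
The density profile and its central window can be computed with a histogram, as in the following sketch. The function names are hypothetical, and `centre_bin` is a 0-based index chosen here for illustration; the manually identified scratch centre for each experiment is not reproduced.

```python
import numpy as np

def density_profile(agents, L=1900.0, W=1440.0, n_bins=80):
    """Agents per unit area in each of n_bins vertical strips of width L / n_bins."""
    counts, _ = np.histogram(agents[:, 0], bins=n_bins, range=(0.0, L))
    return counts / ((L / n_bins) * W)

def central_window(profile, centre_bin, half_width=20):
    """The 2 * half_width + 1 strips centred on centre_bin (0-based index)."""
    return profile[centre_bin - half_width : centre_bin + half_width + 1]

# Hypothetical example with uniformly scattered agents.
rng = np.random.default_rng(0)
agents = rng.uniform([0.0, 0.0], [1900.0, 1440.0], size=(500, 2))
profile = density_profile(agents)
window = central_window(profile, centre_bin=40)
```

With `half_width=20` the window contains 41 strips, matching the number of sub-regions retained in the distance metric.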

The distance metric, *d*(**X**_{obs}, **X**_{sim}), defined in equation (11), includes information from all three summary statistics, at *t* = 18 h and *t* = 36 h. It is the relative square error of the simulation from the experiment. For 𝒫 and 𝒟, the contributions to *d*(**X**_{obs}, **X**_{sim}) approximate the relative square error in the integral of each summary statistic, given the spatial discretisation we have applied to each.
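
To make the idea of a relative square error concrete, the sketch below sums, over both time points, the squared difference between simulated and observed statistics divided by the squared observed statistic. This is one plausible reading of the description and is an assumption, not the authors' exact expression; the dictionary layout, the keys and the numerical values are hypothetical.

```python
import numpy as np

def relative_square_error(obs, sim):
    """Sum of squared differences, relative to the sum of squared observations."""
    obs = np.atleast_1d(np.asarray(obs, dtype=float))
    sim = np.atleast_1d(np.asarray(sim, dtype=float))
    return float(np.sum((obs - sim) ** 2) / np.sum(obs ** 2))

def distance(stats_obs, stats_sim):
    """stats_* map a time (h) to a dict holding the population 'N', the pair
    correlation 'P' and the windowed density profile 'D' at that time."""
    return sum(relative_square_error(stats_obs[t][key], stats_sim[t][key])
               for t in stats_obs for key in ("N", "P", "D"))

# Hypothetical summary statistics at the two comparison times.
obs = {18: {"N": 400, "P": np.ones(20), "D": np.full(41, 1e-4)},
       36: {"N": 600, "P": np.ones(20), "D": np.full(41, 2e-4)}}
sim = {18: {"N": 400, "P": np.ones(20), "D": np.full(41, 1e-4)},
       36: {"N": 1200, "P": np.ones(20), "D": np.full(41, 2e-4)}}
d_same = distance(obs, obs)
d_diff = distance(obs, sim)
```

Dividing each term by the squared observation makes the three statistics comparable even though their units and magnitudes differ.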

### 2.4 Approximate Bayesian computation and model selection

We consider a hierarchy of models. The full model, which we denote as Model 1, contains the five unknown parameters *θ*_{1} = (*m, p, γ*_{m}, *γ*_{p}, *γ*_{b}). Models 2 to 5 are subsets of the full model, where we progressively restrict various combinations of the interaction strength parameters *γ*_{m}, *γ*_{p} and *γ*_{b} to be zero, effectively removing them from the model. We summarise these five models in table 2, where we denote *θ*_{k} as the unknown parameter combination for Model *k*.

We treat the unknown parameters in each model as a random variable, ***θ***. In the absence of experimental observations, our knowledge of ***θ*** is characterised by specified prior distributions. When included in the model, the priors were chosen to be independent and are as follows: *π*(*m*) = *U*(0, 10)/h; *π*(*p*) = *U*(0.02, 0.05)/h; *π*(*γ*_{m}) = *U*(−2, 2)/h; *π*(*γ*_{p}) = *U*(0, 0.02)/h; and *π*(*γ*_{b}) = *U*(0, 100) µm. Initially, we also treat *σ* as an unknown parameter where *π*(*σ*) = *U*(2, 30) µm. This initial analysis provides strong evidence for the value of *σ*, so we set *σ* = *φ*/2 = 12 µm to decrease the dimensionality of the parameter space. In the supporting material, we also investigate *σ* = *φ* = 24 µm, since this is a natural choice in a lattice-based framework where the migration distance and dispersal distance are also the same as the average agent diameter. We apply approximate Bayesian computation (ABC) [14, 16, 21, 23] to update our knowledge of the parameters using experimental observations, *𝒳*_{obs}, from all nine experiments, to produce posterior distributions, *π*(***θ***|*𝒳*_{obs}). Since this model is known to be computationally expensive [16] and we have a high-dimensional parameter space, we apply an ABC method based on sequential Monte-Carlo (SMC) [21, 23, 32].

In this study, we aim to find parameter combinations that simultaneously match all nine experimental data sets, such that *𝒳*_{obs} = {**X**_{obs}^{(i)} : *i* = 1, …, 9}. For each prior sample in the ABC rejection algorithm we simulate a model realisation using each experimental initial condition, to obtain *𝒳*_{sim} = {**X**_{sim}^{(i)} : *i* = 1, …, 9}. We then compare observed data, *𝒳*_{obs}, to simulated data, *𝒳*_{sim}, using the discrepancy measure

$$\rho(\mathcal{X}_{\text{obs}}, \mathcal{X}_{\text{sim}}) = \sum_{i=1}^{9} d\left(\mathbf{X}_{\text{obs}}^{(i)}, \mathbf{X}_{\text{sim}}^{(i)}\right), \tag{12}$$

where *d*(·, ·) is given in equation (11). In ABC techniques, we accept a proposal as a posterior sample if *ρ*(*𝒳*_{obs}, *𝒳*_{sim}) *< ε* for some threshold *ε*. As *d*(·, ·) *≥* 0, the partial sums in equation (12) are non-decreasing in *i*. We therefore implement early rejection [33] by sequentially producing model realisations for *i* ∈ {1, …, 9}. If, at any time, the partial sum up to a value *i* reaches or exceeds the threshold *ε*, we immediately reject the sample. In practice, this saves considerable computation time by reducing the number of times the model must be simulated using high-density initial conditions.
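
The early rejection step can be sketched as a simple loop. The `simulate` and `distance` callables below are toy stand-ins introduced only for illustration, and the sketch assumes that the discrepancy is a sum of non-negative per-experiment distances.

```python
def discrepancy_with_early_rejection(simulate, distance, observed, theta, eps):
    """Accumulate per-experiment distances, stopping as soon as the partial sum
    reaches eps. Returns (discrepancy or None if rejected, simulations run)."""
    total = 0.0
    for i, x_obs in enumerate(observed):
        total += distance(x_obs, simulate(theta, i))
        if total >= eps:
            return None, i + 1        # rejected: later experiments are skipped
    return total, len(observed)

# Toy stand-ins (hypothetical): the "simulation" shifts the observation by theta.
observed = [1.0, 2.0, 3.0]

def toy_simulate(theta, i):
    return observed[i] + theta

def toy_distance(x_obs, x_sim):
    return (x_obs - x_sim) ** 2 / x_obs ** 2

accepted = discrepancy_with_early_rejection(toy_simulate, toy_distance, observed, 0.0, 0.5)
rejected = discrepancy_with_early_rejection(toy_simulate, toy_distance, observed, 1.0, 0.5)
```

Ordering the experiments so that cheap, low-density initial conditions are simulated first means that poor proposals are usually discarded before the expensive simulations are run.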

The principle behind ABC SMC is to propagate a series of prior samples, called *particles*, through a sequence of distributions *π*(***θ***|*ρ*(*𝒳*_{obs}, *𝒳*_{sim}) *< ε*_{u}), *u* ∈ {1, …, *U*} [21, 23, 32]. The thresholds *ε*_{u} satisfy *ε*_{u} *> ε*_{u+1}, so that the distribution gradually evolves to the target distribution *π*(***θ***|*ρ*(*𝒳*_{obs}, *𝒳*_{sim}) *< ε*_{U}) ≈ *π*(***θ***|*𝒳*_{obs}). To obtain a sequence of thresholds, and an estimate of the smallest discrepancy possible in all models, we first perform a pilot run using ABC rejection [16, 23] with Model 1 (supporting material, section 1.1). From 100,000 prior samples, this provides an estimate of the probabilities Pr(*ρ*(*𝒳*_{obs}, *𝒳*_{sim}) *< ε*_{u}), given ***θ*** is simulated from the prior. We choose the sequence by examining a quantile plot (supporting material, section 3). We choose *ε*_{U} to correspond to an acceptance rate of approximately 1% under ABC rejection. The sequence of discrepancies, and details of the ABC rejection and SMC algorithms are given in the supporting material (sections 1 and 3).
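
One way to turn pilot discrepancies into a decreasing threshold sequence is to read off empirical quantiles, as sketched below. The pilot discrepancies here are synthetic, and all quantile levels other than the final 1% are hypothetical; the actual sequence is chosen by inspecting a quantile plot and is given in the supporting material.

```python
import numpy as np

# Synthetic stand-in for the discrepancies of 100,000 pilot prior samples.
rng = np.random.default_rng(0)
pilot_rho = rng.gamma(shape=4.0, scale=0.5, size=100_000)

# Hypothetical acceptance levels, ending at the 1% target for the final threshold.
levels = np.array([0.5, 0.2, 0.1, 0.05, 0.01])
thresholds = np.quantile(pilot_rho, levels)

# Fraction of pilot samples that plain ABC rejection would accept at the final threshold.
final_acceptance = np.mean(pilot_rho < thresholds[-1])
```

Because the levels decrease, the thresholds decrease as well, so each SMC generation targets a tighter match to the data than the last.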

We follow the ABC SMC algorithm of Toni *et al*. [21] to perform parameter inference and model selection. Under this algorithm, we place a prior distribution on the model index, *π*(*M*_{k}), which we choose to be a discrete uniform distribution so that each model is equiprobable. ABC SMC is then used to estimate the posterior probability of each model, *π*(*M*_{k}|*𝒳*_{obs}). We detail this algorithm in the supporting material (section 1.2). A key feature of this technique is that it implicitly penalises models with a higher number of parameters. We compare models by computing the Bayes factor, *ℬ*_{k} [34], which describes the evidence in favour of Model *k* over the full model, Model 1. As a uniform prior is placed on the model index, the Bayes factor is given by

$$\mathcal{B}_{k} = \frac{\pi(M_{k} \mid \mathcal{X}_{\text{obs}})}{\pi(M_{1} \mid \mathcal{X}_{\text{obs}})}. \tag{13}$$

Here, *π*(*M*_{k}|*𝒳*_{obs}) denotes the marginal posterior density of *M*_{k} (Model *k*). A value *ℬ*_{k} *>* 1 indicates evidence in favour of Model *k* compared to the full model, and vice-versa for *ℬ*_{k} *<* 1. The Bayes factor is therefore simply the ratio of the posterior density for Models *k* and 1, and provides evidence to compare models in a similar way to that used in frequentist hypothesis testing.
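
Given the model index carried by each accepted particle, the posterior model probabilities and Bayes factors follow directly, as in this sketch. The function name and the particle indices are invented for illustration; optional particle weights are supported, and a uniform prior on the model index is assumed so that the Bayes factor reduces to a ratio of posterior probabilities.

```python
import numpy as np

def bayes_factors(model_indices, n_models, weights=None):
    """Posterior model probabilities and Bayes factors relative to Model 1.

    model_indices holds the model label (1..n_models) of each accepted particle;
    weights, if given, are the corresponding particle weights."""
    totals = np.bincount(model_indices, weights=weights, minlength=n_models + 1)[1:]
    posterior = totals / totals.sum()
    return posterior, posterior / posterior[0]   # requires Model 1 to have support

# Invented example: eight equally weighted particles spread over five models.
indices = np.array([1, 1, 1, 1, 2, 2, 3, 1])
posterior, factors = bayes_factors(indices, n_models=5)
```

A model that never appears among the accepted particles, such as Models 4 and 5 in this invented example, receives a Bayes factor of zero.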

## 3 Results and Discussion

Common mean-field models, such as the Fisher-Kolmogorov equation and generalisations of the Fisher-Kolmogorov equation, are not able to simultaneously describe collective cell behaviour in scratch assay experiments across a range of initial densities [3]. This suggests density-dependent behaviour in these experiments that is not captured by linear diffusion. Our model allows interactions between cells to affect proliferation, movement and direction. To identify the importance of each of these interactions, we simultaneously calibrate our model to nine scratch assay experiments which we initiate across a wide range of initial densities.

Our first result is to identify the distance over which these interactions occur. We quantify interactions using Gaussian kernels dependent on the distance between pairs of agents [28], and characterised by a spread parameter *σ* (equations (3), (8) and (9)). The interaction between a pair of agents separated by more than approximately 3*σ* is, therefore, negligible. We expect *σ* to be of the same order of magnitude as *φ* = 24 µm, which is the approximate cell diameter [16]. We perform ABC rejection where *σ* is sampled from the prior *U*(2, 30) µm (supporting material, section 1.1). These results suggest that *σ* ≈ *φ*/2 = 12 µm, and we fix this for the rest of the study to reduce the number of unknown parameters. This result suggests that interactions between cells occur over a relatively short distance, since the model predicts that interactions between cells separated by more than 3*σ* = 36 µm are negligible.

One of the most important aspects of the lattice-free IBM is its ability to describe, in fine detail, the spatial structure of cells in the experiments, which we quantify using the pair correlation function. In contrast, mean-field models consider only average properties of the cell population [17] and lattice-based methods [19, 22] are not able to precisely capture the initial agent configuration from the experiments. Lattice-based methods also, by definition, constrain the separation of agents to take discrete values, and typically agents in these models cannot lie closer than one cell diameter. The pair correlation describes the probability of finding pairs of agents separated by each distance, and hence can provide information about the effect of interactions on the dynamics. To show this, we repeat ABC rejection but exclude the pair correlation function from the distance metric (supporting material, section 2.3). These results show that the posterior distributions change significantly in this case, verifying that the pair-correlation function contains a significant amount of information about these interactions.

To quantitatively determine the importance of each interaction, we consider a hierarchy of models where we successively set interaction strength parameters (*γ*_{m}, *γ*_{p} and *γ*_{b}) to zero to remove the corresponding interaction from the model. We use the model selection algorithm of Toni *et al*. [21], and compare the evidence in favour of each model over the full model (Model 1) using Bayes factors [21, 34]. We show the posterior density for each model in figure 5a, and summarise the Bayes factors and evidence in table 3. Overall, we find that Model 1 has the highest posterior density (figure 5*a*). We find positive evidence in favour of Model 1 over Model 2 (where *γ*_{m} = 0 and so motility is density-independent); and weak evidence in favour of Model 1 over Model 3 (where *γ*_{b} = 0 and so there is no directional bias). Importantly, we find that Models 4 and 5, where *γ*_{m}, *γ*_{b} = 0 and *γ*_{m}, *γ*_{p}, *γ*_{b} = 0, respectively, cannot match the experimental data (*ℬ*_{4} = *ℬ*_{5} = 0). Contrary to assumptions that are commonly made in models such as the Fisher-Kolmogorov equation, these results provide evidence to suggest that motility is density-dependent, as either a density dependent movement rate must be included (Models 1 and 3) or a directional bias (Models 1 and 2).

We now focus on results for the full model (Model 1), which has the highest posterior density. In figure 5*b–f* we show marginal posterior distributions for each parameter in Model 1, and in figure 6 we compare the experimental data from four of the nine experiments to the calibrated model (in the supporting material, we show these results for all nine experiments). Overall, we find an excellent match between the model and experimental data, which has not previously been demonstrated across a range of initial densities for this kind of experimental data. In addition to matching the density profile (figure 6*m–p*) and population (figure 6*u–x*), we find that the calibrated IBM is able to capture information about the spatial structure of cells, specifically, the pair correlation function (figure 6*q–t*). We perform a posterior predictive check for each summary statistic by producing 50% and 95% prediction intervals (PI) that characterise both the parameter uncertainty and stochasticity described by the model. The summary statistics produced from the experimental data almost always lie completely within the 95% PI, further indicating that the calibrated model is consistent with the experimental data across the range of initial densities. While we have not presented these results for Models 2 and 3, which have non-zero posterior density, the nature of ABC means that all accepted samples lie a similar distance from the experimental data.

Results in figure 5*c* suggest that *γ*_{m} *<* 0, so that crowding *increases* motility. This is consistent with mean-field models such as the porous Fisher equation [35] where the diffusivity monotonically increases with local density, but contrasts with other non-linear diffusion models where cell motility decreases with crowding [36]. This observation also explains why model realisations with small values of the motility rate, *m*, are able to match the data (this is seen in figure 5*b*), since a value *γ*_{m} *<* 0 allows motility in crowded regions even if *m* ≪ 1. Interestingly, these results are less clear in the case where the pair correlation function is neglected (supporting material, section 2.3), which again highlights the importance of considering spatial structure when studying these interactions. The increase of motility due to crowding may correspond to mechanical interactions such as volume exclusion in very high density regions. It is trivial to add mechanisms to the IBM, and future work may examine *γ*_{m} in the case where volume exclusion [19], or other kinds of mechanical interactions [37–39], are included as additional mechanisms. Alternatively, the inclusion of non-monotonic interaction kernels [40] may allow movement to increase for agents close together, and decrease in crowded regions.

An interesting result is that the directional bias is included in the models with the highest posterior density (Models 1 and 3), but examining the marginal posterior for *γ*_{b} (figure 5*f*), we see that the strength of this bias may not be identifiable: the posterior distribution is relatively flat without a clear mode. These results might suggest that, past a certain point, increasing the strength of the directional bias has a negligible effect. We verify these observations by widening the prior distribution for *γ*_{b} by a factor of two in the supporting material (section 2.2). To obtain more information about the strength of the directional bias, representing the propensity of cells to move away from proximate neighbours, more detailed data, such as cell tracking data, may be required [39].

Results in figure 5*e* indicate that the proliferation interaction strength, characterised by *γ*_{p}, may also be unidentifiable. Figure 6*u–x* shows that population growth appears to be exponential, and so we do not see crowding effects on proliferation in these experiments. Early time data is often exponential for a variety of growth laws and experiments must be run for a longer period of time to identify the appropriate growth function [22]. We verify this by performing model selection with three additional models (Models 6–8) that respectively correspond to Models 1–3 with *γ*_{p} = 0 (supporting material, section 4). These additional results show that the distributions for Models 6–8 are similar to those for Models 1–3 and confirm that crowding effects on proliferation are simply not seen in these experiments.

## 4 Conclusion

The ability of common mean-field models, such as the Fisher-Kolmogorov equation and generalisations of the Fisher-Kolmogorov equation, to match experimental data across a range of densities is rarely tested as typical experimental protocols do not vary the initial number of cells. These models typically assume density-dependent proliferation, density-independent motility, or both [3,6–11]. By modelling density-dependent interactions which affect motility, proliferation and directional bias, we calibrate a mathematical model that simultaneously describes scratch assay data across a range of densities. Using model selection, we quantitatively assess which interactions are most important. We find, in contrast to common mean-field models such as the Fisher-Kolmogorov equation and generalisations of the Fisher-Kolmogorov equation, evidence to suggest that movement is density-dependent, while there is little evidence of density-dependent proliferation. Additionally, our results confirm the importance of spatial structure, which is neglected by standard modelling approaches.

In this study we examine density-dependent interactions that affect proliferation, motility and directional bias. Applying SMC, which penalises models with high dimensionality of the unknown parameters, our study results in the minimal model required to match the experimental data. Two of the primary advantages of the IBM are its ability to precisely replicate the initial condition from experimental data, and the ease with which new mechanisms can be added. Our approach can, therefore, be applied to quantify experimental evidence for more complex mechanisms including chemotaxis [14, 41], mechanotaxis [42] and generalised growth laws [43], to name a few. However, we do not pursue such extensions here since we find that our simpler modelling framework can provide a good match to all our experimental data without including more complex mechanisms.

## Authors’ contributions

A.P.B. performed the research and wrote the paper. A.P.B. and W.J. processed the experimental data. All authors provided feedback and gave approval for final publication.

## Data accessibility

All experimental data are available as electronic supporting material. Unprocessed experimental data, and code used in this work, are available on Github at github.com/ap-browning/scratchIBM.

## Competing interests

We have no competing interests.

## Acknowledgements

M.J.S. is supported by the Australian Research Council, M.J.P. is partly supported by Te Pūnaha Matatini, a New Zealand Centre of Research Excellence, and W.J. is supported by a QUT Vice Chancellor’s Research Fellowship. We thank David Warne for technical advice. Computational resources and services used in this work were provided by the HPC and Research Support Group, Queensland University of Technology, Brisbane, Australia.