## Abstract

The lack of effective biomarkers for predicting cancer risk in premalignant disease is a major clinical problem. There is a near-limitless list of candidate biomarkers and it remains unclear how best to sample the tissue in space and time. Practical constraints mean that only a few of these candidate biomarker strategies can be evaluated empirically and there is no framework to determine which of the plethora of possibilities is the most promising. Here we have sought to solve this problem by developing a theoretical platform for *in silico* biomarker development. We construct a simple computational model of carcinogenesis in premalignant disease and use the model to evaluate an extensive list of tissue sampling strategies and different molecular measures of these samples. Our model predicts that: (i) taking more biopsies improves prognostication, but with diminishing returns for each additional biopsy; (ii) longitudinally-collected biopsies provide slightly more prognostic information than a single biopsy collected at the latest possible time-point; (iii) measurements of clonal diversity are more prognostic than measurements of the presence or absence of a particular abnormality and are particularly robust to confounding by tissue sampling; and (iv) the spatial pattern of clonal expansions is a particularly prognostic measure. This study demonstrates how the use of a mechanistic framework provided by computational modelling can diminish empirical constraints on biomarker development.

## Introduction

Each year, tens of thousands of patients in the UK are diagnosed with a premalignant disease, a benign condition that predisposes to the future develop of cancer. Examples of common premalignant diseases include Barrett’s Oesophagus [1], Ductal Carcinoma *in situ* (DCIS) of the breast [2], benign prostatic intraepithelial neoplasia (PIN) [3], and carcinoma *in situ* in the bladder [4]. The clinical management of patients with premalignant disease is a major challenge: in order to prevent cancer, patients are typically enrolled into longitudinal screening programmes that aim to detect (and then treat) patients who show early signs of progression to cancer. However, while having a premalignant disease increases the *average* risk of developing cancer compared to the unaffected population, the cancer risk for any individual is highly variable and generally quite low. For example, patients with Barrett’s Oesophagus have an average 40-fold increased lifetime risk of developing adenocarcinoma, but the progression rate per patient per year is less than 0.5% [5] and so many of these patients will not progress to cancer in their lifetime. As a result, it is arguable that surveying an average (low-risk) patient is unnecessary as they are unlikely to ever progress to cancer. In addition, the surveillance process is typically unpleasant for the patient, and is very costly to health-care providers. In view of these facts together, premalignant disease is often described as both *over-diagnosed* and *over-treated* [6], and consequently there is a pressing clinical need to be able to accurately stratify cancer risk in these patients.

Prognostic biomarkers are central to current risk-stratification strategies. Here a biomarker is defined as an analysable property of the diseased tissue that correlates with the risk of progressing to cancer. In general, it remains unclear which of the plethora of potential biological features that could be assayed (morphological, gene expression, mutation, or other features) offers the most potential for prognostic value. Pathological grading and staging remain the most widespread biomarkers in current use; these biomarkers are descriptions of the morphological features of the disease. The current state-of-the-art biomarkers are molecular in nature, and typically quantify the aberrant expression of a panel of carefully-chosen genes. For example, the Oncotype DX assay analyses the activity of 21 genes to determine a score quantifying risk of recurrent breast cancer and response to chemotherapy [7]. Genetically based biomarkers include EGFR mutations in non-small cell lung cancer [8] and *TP53* abnormalities in Barrett’s Oesophagus [9]. The limited predictive value of existing biomarkers has prevented their widespread clinical use [10], and for many diseases such as DCIS [11] and inflammatory bowel disease [12] no prognostic biomarkers have yet been identified.

All biomarkers require the diseased tissue to be sampled. Needle biopsies are the predominant sampling method, although other tissue collection methods such as endoscopic brushings or cell washings are sometimes used. However, typically the prognostic optimality of different sampling schemes, including whether samples should be collected longitudinally, has not been evaluated. Furthermore, given the fact that taking a biopsy is an invasive procedure, an empirical evaluation of different tissue sampling schemes is largely unfeasible.

Cancer development is fundamentally an evolutionary process: the acquisition of random somatic mutations can cause a cell to develop an evolutionary advantage over its neighbours, and so drive the clonal expansion of the mutant. Repeated rounds of mutation and clonal selection can lead to the development of a malignant tumour. When viewed from this evolutionary perspective, a biomarker may be thought of as a predictor of the *evolutionary trajectory* of the disease; a successful biomarker is one that sensitively and specifically detects which premalignant lesions are (rapidly) evolving towards cancer. However, existing biomarker development efforts do not explicitly consider the evolutionary process they seek to assay, instead relying on the identification of a small set of genes that are aberrantly expressed in high-risk cases [10]. The recent appreciation that carcinogenesis is a highly stochastic process [13], in which many different combinations of genetic alterations and gene expression changes contribute to the same malignant phenotypes, has led to doubts about the utility of such “candidate gene” approaches [14]. Alternative biomarker development strategies attempt to assay the underlying evolutionary *process* itself. Quantification of within-tumour diversity, as a proxy measure of the probability that the tumour has evolved a well-adapted “dangerous” clone, is one such measure that has shown efficacy in a variety of cancer types [15, 16, 17]. Whilst most studies have focused on the quantification of within-tumour genetic diversity, it is noteworthy that quantification of phenotypic heterogeneity also shows prognostic value [18, 19].

Mathematical models are tools that have the potential to diminish the inherent constraints of empirical biomarker development. Due to the relative ease with which a mathematical model of cancer evolution can be analysed, potentially exhaustive searches of candidate biomarkers can be performed *in silico.* This is the idea that we develop in this study.

Mathematical modelling has a rich history in cancer research, and is increasingly used as a tool to investigate and test hypothesized mechanisms underlying tumour evolution [20]. A common approach is to consider spatially homogeneous well-mixed populations [21], using multi-type Moran models of constant or exponentially growing size [22] or multi-type branching processes [23]. Other work has highlighted the impact of spatial dynamics on the evolutionary process [24]. More complex models have coupled a discrete representation of the movement and proliferation of individual cells to a continuum description of microenvironment factors such as oxygen concentration and extracellular matrix composition. Such models, in particular the pioneering work of Anderson and colleagues [25, 26], demonstrate the significant selective force imposed by microenvironmental conditions such as hypoxia. A recent discussion of the use of ecological and evolutionary approaches to study cancer is provided by Korolev et al. [27]. The majority of models of tumour evolution have focused on the rates of invasion and accumulation of mutations, and how these depend on factors such as modes of cell division and spatial heterogeneity in cell proliferation and death. Defining statistics that correlate with prognosis in these kinds of models is an unaddressed problem.

Here we use mathematical modelling as a novel platform for *in silico* biomarker development. We develop a simple mathematical model of tumour evolution, and use the model to evaluate the prognostic value of a range of different potential biomarker measures and different tissue sampling schemes.

## Materials and methods

### Computational model of within-tumour evolution and biopsy sampling

To simulate the growth and dynamics of a pre-cancerous lesion, we consider a continuous-time spatial Moran process model of clonal evolution [28] on a two-dimensional square lattice, which may be thought of as a mathematical representation of an epithelial tissue. This description is similar to a model of field cancerization proposed by Foo et al. [29], although our model differs in several respects, which we describe below. We assume that in the transition from pre-malignant to malignant lesions, cells in a spatially well-structured population such as an epithelium are killed and/or extruded by an environmental stressor at a rate that is proportional to the inverse of their fitness, and replaced within the tissue via the division of a neighbouring cell. This assumption is represented in our chosen update rule. We suppose that it is this increased rate of cell turnover that leads to the accumulation of mutations, and eventually cancer. We refer to mutations as *advantageous, deleterious* or *neutral,* if they increase, decrease, or leave cell fitness unchanged.

The state of the system changes over time as a result of ‘death-birth’ events. At each point in time, each lattice site is defined by the presence of a cell with a specified ‘genotype’, given by the numbers of advantageous, neutral and deleterious mutations that it has accumulated. To implement the next death-birth event, we first choose a cell to die, at random, with a probability weighted by the inverse of each cell’s fitness. We define the fitness of a cell with *n _{p}* advantageous,

*n*neutral, and

_{n}*n*deleterious mutations by where the advantageous parameters

_{d}*s*and

_{p}*s*denote the relative fitness increase/decrease due to a advantageous/deleterious mutation. The chosen cell is removed from the lattice and one of the dead cell’s neighbours is chosen uniformly at random to divide into the vacated lattice site. The time at which this death-birth event occurs is given by a waiting time, chosen according to an exponential distribution with mean equal to the sum of all cell inverse fitnesses present on the lattice, as stipulated by the Gillespie algorithm [30].

_{d}Immediately following division, each daughter cell can independently accrue a mutation, with probability *μ*. If a mutation is accrued, it is labelled as advantageous, deleterious or neutral with equal probability 1/3. We note that neutral and deleterious mutations are not typically included in spatial Moran models of tumour evolution, as such mutations are unlikely to persist. However, over shorter timescales, their presence may have an effect on the dynamics of the system and hence the predictive power of any biomarkers considered. We emphasize that a cell that has accumulated mutations behaves the same as a wild-type cell in terms of mode of division and accumulation of mutations; the only difference between cells lies in their relative fitness, and hence the probability that they are chosen for removal as specified by the death-birth process.

We define the *time of clinical detection* of cancer to be the earliest time at which the proportion of cells with at least *N _{m}* advantageous mutations exceeds a specified threshold

*δ*. This reflects the time taken to reach a small, but clinically detectable, proportion of cancer cells that are capable of initiating and driving further tumour growth. In all simulations, we take

*δ*= 0.05. We evaluate the correlation of a measurement of some property of the state of the lesion sampled at some time

*T*with the subsequent waiting time to cancer.

_{b}Measurements of the state of the lesion are performed by (i) taking a ‘biopsy’ from the lesion, and (ii) evaluating a putative ‘biomarker assay’ on the biopsy. Three different biopsy strategies are considered. First, we consider the whole lesion, in order to establish an upper bound on the prognostic power of each biomarker when using maximal information about the state of the system at a given time. Second, we sample a biopsy comprising a circular region of cells of radius *N _{b}* lattice sites, whose centre is chosen uniformly at random such that the entire biopsy lies on the lattice; this represents the clinical procedure of core needle sampling. Third, we sample

*N*cells uniformly at random from the lattice; this represents washing or mechanical scraping of the lesion. In each case, we suppose that the biopsy constitutes a ‘snapshot’, and do not remove the sampled cells from the tissue. This simplifying assumption avoids the need to explicitly model the tissue response to wounding. The various biomarker assays evaluated are detailed below.

_{s}The definitions and values of all model parameters are summarized in Table 1. A MATLAB implementation of our model and simulated biopsy analysis is provided (see Text S1).

### Classical biomarkers

**Proportion of cells with at least two advantageous mutations.** A commonly used class of biomarkers measure the proportion of cells in a biopsy staining positive for a given receptor. Examples include the estrogen receptor (ER), progesterone receptor (PR) and HER2/neu amplification staining commonly performed for malignancies of the breast [31, 32, 33]. Such assays are cost-effective and relatively simple to implement.

Here, we use the cutoff of a cell having acquired at least *N _{p}* advantageous mutations to be representative of a cellular change that is observable in this manner. The measure is calculated simply by the number of cells having at least

*N*advantageous mutations, divided by the total number of cells sampled. We present results based on

_{p}*N*= 2 throughout the text, thus using the shorthand

_{p}*N*> 1 to refer to this biomarker. We discuss the robustness of these results to the chosen value of

_{p}*N*= 2 in the Results section.

_{p}**Mitotic proportion.** Proliferative cells are usually identified in tissue sections or cytology specimens using immunohistochemistry for cell-cycle associated proteins, foremost Ki-67 [34]. These proteins have a natural half-life over which a proliferative cell can be identified. To represent this measure in our computational model, we defined a time window *t _{w}* over which a proliferative marker can be detected by staining. The mitotic proportion at a given time

*t*is then defined as the number of cells that have undergone mitosis at least once in the time interval (

*t*–

*t*,

_{w}*t*], divided by the number of cells in the lattice,

*N*.

### Measures of heterogeneity

**Shannon index.**The Shannon index *H* measures diversity among a population comprising different types [35]. For a population of *K* distinct types, each comprising a proportion *p _{k}* of the population, the Shannon index is defined as

To calculate *H* we define *p _{k}* such that each distinct triplet of advantageous, neutral and deleterious mutations is associated with a distinct clone within the model, and

*p*represents the proportion of cells in this clone.

_{k}**Gini-Simpson index.** Another established measure of diversity is the Simpson index [36]. To ensure that a higher value corresponds to greater diversity, we choose to use a transformation of this index called the Gini-Simpson index, *S*, which is defined as follows [37]. For a population of *K* distinct types, each comprising proportion *p _{k}* of the population, we have

This index may be thought of as the probability that two randomly chosen members from the population are of different types. As the index increases towards the maximum of 1, the evenness of the distribution of the population over the various types becomes increasingly skewed toward one type.

**Moran’s I.** Moran’s

*I*is a measure of global spatial autocorrelation which computes a weighted statistical average of the deviation between data points in a set, weighted by their spatial distance [38]. Moran’s

*I*takes values in [−1, 1]. For a given set of values {

*X*

_{1}...

*X*}, with mean , and spatial weight matrix , Moran’s

_{N}*I*is defined as

We take *X _{i}* to be the sum of the numbers of advantageous, neutral and deleterious mutations accumulated by the cell at lattice site

*i*. Clinically,

*X*may be thought of as a binary variable (0 or 1) indicating whether a given cell bears some detectable abnormality (for instance a particular number of advantageous mutations). The spatial weight matrix (

_{i}*w*) can be specified in several ways; here, we define where

_{ij}*d*is the Euclidean distance between the lattice sites indexed by

_{ij}*i*,

*j*∈ {1,...,

*N*}. With this functional form, neighbouring points that are closer together are weighted more heavily, thus contributing more to the measure.

**Geary’s** *C.* Geary’s *C,* like Moran’s *I*, is a global measure of spatial autocorrelation. Geary’s *C* takes values in [0, 2], with higher values indicating less spatial autocorrelation, and lower values indicating a greater degree of spatial autocorrelation [39]. While Moran’s *I* is a more global measurement and sensitive to extreme observations, Geary’s *C* is more sensitive to differences in local neighbourhoods. For a given set of values {*X*_{1}... *X _{N}*}, with mean , and a given spatial weight matrix , Geary’s

*C*is defined as

Here, our definitions of *X _{i}* and (

*w*) follow those given for Moran’s

_{ij}*I.*

**Index of positive proliferation (IPP).** We next define a novel measure, termed the index of positive proliferation (IPP), that is a spatially weighted average of the location of mitotic cells and the number of advantageous mutations accrued by nearby cells. The biological motivation for this measure is to detect the recent clonal expansions of advantageous mutants, in order to quantify evidence of recent progression towards cancer: we might expect the concentration of proliferation in regions of high numbers of advantageous mutations to correlate with a poor prognosis.

We define the IPP as follows. Consider a population of *N* cells, with the individual cells labelled as *X*_{1},..., *X _{N}*. Suppose that a subset of these cells,

*Y*

_{1},...,

*Y*, are proliferating at a given time. We define

_{Q}*cellular contributions f*

_{1},...,

*f*∈ ℝ

_{M}^{+}as values such that a higher contribution corresponds to a cellular state genetically closer to that of cancer. Clinically, these cellular contributions correspond to cells that are genotypically closer to the end state of cancer, and represent either cutoff points that may be detected by gene sequencing, or immunohistochemical changes. For each cell

*i*with

*m*advantageous mutations, we define the cellular contribution

_{i},*f*as

_{i}Thus, for a given spatial weight matrix (*w _{ij}*) ∈ ℝ

^{N}^{×}

*we define the IPP as*

^{N}We define the weights *w _{ij}* as in equation (5), where

*d*is the Euclidean distance between cell

_{ij}*X*and proliferating cell

_{i}*Y*, such that

_{j}*X*≠

_{j}*Y*. In the case that

_{j}*X*=

_{j}*Y*, we take

_{j}*W*= 0.

_{ij}Since time steps correspond to ‘generations’ in our mathematical model, we store the locations of the most recent cell divisions within a given time window *t _{w}*, and regard these locations as locations that would stain positive for mitotic activity (e.g. via Ki-67 staining), to model the non-instantaneous process of detecting active cell division.

**Index of non-negative proliferation (INP).** To model the case where it may not be feasible to observe an accumulation of advantageous mutations only, in the sense that mutations accumulated may be neutral as well, we define an additional measure termed the index of non-negative proliferation (INP). This measure is defined analogously to the IPP, but with the cellular contributions chosen such that
where ⌊·⌋ denotes the integer part. Here, the sum of the number of advantageous and neutral mutations is considered to be the observable quantity, simulating a situation in which the observable information encapsulates and may skew the perception of the true genotypic state of the system. The cutoff value of *N _{m}*/2 was chosen in an

*ad hoc*manner based on preliminary simulations; we note that refinement of this parameter may be necessary for effective use of the INP in future studies.

### Statistical methods

The statistical association (correlation with the waiting time to detectable cancer) of each sampling strategy and putative biomarker assay was evaluated using Kaplan-Meier curves and univariate Cox Proportional Hazards models as implemented in the *R* statistical computing language. For all presented p-values, the significance cutoff is taken as 0.05.

Data used for the Cox regression model were all generated by the stochastic simulations of the computational model. That is, the event times were defined as the simulation times at which 5% of the cells of the lattice were defined as cancerous, and the predictors of this time were taken to be the biomarker index values computed at an earlier simulation time. The cohort size is therefore the number of such simulations which were carried out, which was 10^{3}. There was no censoring required, as all simulations were run to completion of endpoint as defined previously.

## Results

We consider a spatial model of the evolution of malignancy in a precancerous lesion. In our model, cells occupy a two-dimensional lattice of size *N*. Time is treated as a continuous variable in the model, but is simulated as a succession of discrete time steps, where the length of each time step is a function of the overall fitness of the population and a stochastic factor, as per the Gillespie Algorithm [30]. At each time step, a cell is chosen at random to die and is removed from the lattice with a probability that is inversely proportional to its *cellular fitness,* a positive real number that is initially equal to 1 for non-mutated cells and may be altered by mutation. When a cell dies, one of its neighbours is then chosen uniformly at random to divide, with one of the daughter cells occupying the free lattice site and each daughter cell independently acquiring a new mutation with probability *μ*. We refer to mutations as *advantageous, deleterious* or *neutral,* according to whether they increase, decrease, or leave fitness unchanged, with each type of mutation assumed to be equally likely.

Starting from a lattice occupied entirely by non-mutant cells, we consider the *outcome* of each simulation to be the time taken for the proportion of cells with at least *N _{m}* advantageous mutations to exceed a threshold

*δ*. This waiting time is defined as the time to clinically detectable lesions. We choose a value of

*δ*corresponding to a proportion of cancer cells that is sufficiently large to be clinically detectable, and to initiate subsequent rapid growth.

A representative snapshot of a model simulation is shown in Fig. 1A. To simulate clinical sampling, at a predetermined time *T _{b}* we take a virtual

*biopsy*from the lesion (Fig. 1B), from which we compute various biomarkers and assess their prognostic value in determining the waiting time to cancer (see

*Methods*). The model exhibits successive clonal sweeps of mutations (Fig. 1C).

### Assessment of candidate biomarkers and tissue sampling schemes

**Counting driver mutations.** We considered the correlation between the proportion of cells bearing at least *N _{p}* advantageous mutations (so-called driver mutations) and the waiting time to cancer. The closer the cutoff

*N*is to the number of mutations required for cancer,

_{p}*N*, the more correlated this measure became with the waiting time to cancer (Table S1). These results confirm the intuition that it is easier to predict the occurrence of a cancer at a time point close to when the cancer will occur (e.g. at the ‘end’ of the evolutionary process, when only a few additional driver mutations are required) than early in the cancer’s development, when many additional mutations are required.

_{m}**Small needle biopsies.** We computed the prognostic value of various candidate biomarker ‘assays’ performed on a single biopsy of radius *N _{b}* = 20 taken at time

*T*post simulation initiation. Neither the proportion of cells with at least one advantageous mutation nor the proliferative fraction were significant predictors of prognosis (Table 2). In contrast, measures of clonal diversity (Shannon and Gini-Simpson index) were both highly significant predictors of prognosis (

_{b}*p*< 10

^{−4}in both cases). Of the spatial autocorrelation measures, Moran’s

*I*(

*p*= 0.02) but not Geary’s

*C*(

*p*= 0.29) had prognostic value.

**Random sampling.** Random sampling of cells from the lesion represents a tissue collection method such as an endoscopic brush or a cellular wash. We took a random sample of 10^{3} cells, corresponding to 10% of the total lesion. As for small biopsy sampling, the proportion of mitotic cells within the sample was a poor prognostic marker (*p* = 0.23), but interestingly the proportion of cells with more than one advantageous mutation became a significant predictor (*p* = 0.01). This may be due to the fact that within a sparse sample of the lesion, the number of mutant cells is a proxy for active on-going evolution: either via the large scale clonal expansion of a single clone, or multiple foci of independent clones. Increased clonal diversity remained a highly significant predictor of a short waiting time to cancer (*p* < 10^{−4} for both the Shannon and Gini-Simpson indices; Fig. 2).

**Whole lesion sampling.** In the case of whole-lesion sampling, all information on the current state of the virtual tumour is available in the biomarker assay, and hence we expected to see maximum predictive value of our putative biomarkers. In this case, the proportion of cells with at least one advantageous mutation remained a poor prognosticator (*p* = 0.29), whereas the proportion of proliferative cells became a significant predictor (*p* = 0.02) (see Table 2).

The clonal diversity measures remained highly significant prognosticators (*p* < 10^{−4} in both cases), underlining their robustness as prognostic measures. Higher clonal diversity was associated with faster progression to cancer (Fig. 3). The prognostic value of the spatial autocorrelation measure Moran’s *I* was significantly improved when the whole grid was sampled (*p* < 10^{−4}), but Geary’s *C* remained non-correlated.

Together these data highlight the high prognostic value of diversity measures, and their robustness to the details of tissue sampling method used.

### Novel prognostic measures

We next sought to determine whether novel statistics calculated on the state of the lesion could provide additional prognostic value. We defined two new statistics, the *index of positive proliferation* (IPP) and the *index of non-negative proliferation* (INP), which describe the spatial autocorrelation between proliferating cells with advantageous mutations, or proliferating cells with non-deleterious mutations, respectively (see *Methods*). Since these statistics tie together measures of both the mutation burden and proliferative index, we consider them to be measures of the degree of ‘evolutionary activity’.

In both small biopsy samples and whole-lesion analysis, the IPP was a highly prognostic statistic (Table 2), with larger values of the statistic accurately predicting shorter times to cancer (Fig. 3). The INP was prognostic on whole-lesion analysis (Fig. 3), but not on targeted biopsies (Table 2). The difference in the prognostic value between IPP and INP is suggestive of the particular importance of assaying ‘distance’ travelled along the evolutionary trajectory towards cancer: the IPP is sensitive to this distance as it only measures advantageous mutations, whereas the INP is potentially confounded by non-adaptive mutations. The inherent issues associated with the identification of advantageous mutations consequently potentially limit the utility of these novel measures.

To assess the predictability of each putative biomarker [40] we calculated the area under the receiver operating characteristic (ROC) curves (Fig. 4) as a function of the censoring time [41]. ROC curves are the curves defined by the sensitivity and specificity of each index value as it predicts end time to cancer, where a positive end time is a time past a certain pre-defined simulation time, and the cutoff for the index value that defines whether the index predicts if that end time is early or late, is continuously varied. The area under these curves is 1 in the case of an index value that is perfectly predictive of an end time, and 0.5 for random guessing as to whether the index predicts if the end time is early or late. These curves show the IPP measure has the best predictive value of all measures considered, and that the Shannon and Gini-Simpson diversity indices also have strong predictive value. The lack of predictive value derived from the mitotic proportion, Geary’s *C* and proportion of mutant cells was also confirmed.

### Early versus late biopsy

Effective screening for cancer risk requires predicting cancer risk long before the cancer develops. We next considered how the timing of a biopsy affects its prognostic value by investigating how the correlation coefficient between each biomarker and the subsequent waiting time to cancer varies with the time at which the biopsy is taken.

As expected, we found that biopsies collected later in the lesion’s evolution (e.g. closer to the time of cancer development) generally had more statistical association than biopsies collected earlier, and this was true irrespective of the tissue sampling method used (Fig. 5A-C). Sampling early in the lesion’s evolution (e.g. near to the start of the simulations) had poor correlation irrespective of the putative biomarker assay used, reflecting the fact that very few mutations had accumulated in the lesion at short times. Sampling at intermediate times showed dramatic improvements in the prognostic value of the diversity indices and IPP measure, whereas samples taken at a variety of long times had approximately equal prognostic value or showed slight declines relative to intermediate times. At intermediate and long times, the IPP was the best performing prognostic measure. The mitotic proportion and proportion of cells with at least one advantageous mutation were consistently poor predictors across the entire time course.

The effect of taking a small biopsy, as opposed to sampling the whole lesion, was to both significantly reduce the prognostic value of all putative biomarker measures, and introduce ‘noise’ into their prognostic values (Fig. 5B). Importantly, we observed that in spite of this noise, the correlation coefficients for the clonal heterogeneity and IPP measures were consistently high compared to the other measures, indicating their robustness as prognostic markers. Biopsy sampling significantly reduced the prognostic value of Moran’s *I* compared to whole-lesion sampling, indicating how this measure is particularly confounded by tissue sampling.

On random samples (analogous to endoscopic brushings or washings), the Shannon and Gini-Simpson indices showed good correlations with the waiting time to cancer. These diversity measures were more correlated for random samples than for circular biopsies, despite each sample constituting similar numbers of cells (10% and 12% of the lesion, respectively). This result may reflect the fact that a biopsy can potentially miss a ‘dangerous’ clone, whereas a random sampling method is likely to obtain cells from all sizeable clones within the lesion.

Together, these data indicate that larger samples usually provide more prognostic value than smaller samples, and that very ‘early’ tissue samples are unlikely to contain significant prognostic information. They also highlight again that the prognostic value of diversity measures is particularly robust to the details of tissue sampling.

### Longitudinally collected biopsies

We next examined whether combining information from serial biopsies, taken at two different time points (*t*_{1} and *t*_{2}; both strictly before cancer occurrence), provided more prognostic information than a biopsy from a single time point. To do this, we evaluated the average of the values of each biomarker at the times *t*_{1} and *t*_{2}, and the correlation between this average and the waiting time to cancer. We then compared this correlation with that of the biomarker value at time *t*_{2} alone. These results are shown in Fig. 6, where the *x* and *y* axes indicate the time of the first and second biopsies, and the colours indicate the difference between the correlation coefficient for the *average* of the individual biomarker values at each time points and the correlation coefficient for the biomarker value at the second time point alone.

Including information from an early biopsy in this manner provided slight additional prognostic value over-and-above the information available in the later biopsy (Fig. 6; approximately a 0.1 increase in the correlation was observed). In contrast, When information was combined by taking the difference in biomarker values between the biopsies collected at two different time points, the value from the later biopsy was generally more prognostic, and importantly was more prognostic than a measure which combined information both the early and late biopsies. Interestingly, the prognostic value of the Shannon and Gini-Simpson indices was reduced when considering the difference in biomarker values between two time points (Fig. S4). When we instead compared the maximal value of the biomarker across the two time points to its value at the later time point, we found similar results to the average case, but with smaller increases in correlation at later times; this was particularly the case for the Shannon, Gini-Simpson, IPP and INP indices (Fig. S5).

**Multiple biopsies at the same time point**

A consequence of intra-tumour heterogeneity is that a single biopsy may fail to sample an important clone [42] and so cause an incorrect prognosis assignment. To address this issue, we studied how the prognostic value of each putative biomarker was improved by taking additional biopsies at the same time (*T _{b}* = 50). For simplicity, after each virtual biopsy the sampled tissue was perfectly replaced in order to avoid the complexities associated with modelling local wound healing and tissue recovery. Further, while we did not strictly preclude biopsies from overlapping, the degree of overlap between biopsies is typically minimal because of the relatively small numbers of biopsies and small sizes of biopsy considered.

Assaying from more biopsies generally improved prognostic value, but with diminishing returns for each additional biopsy (Fig. 7). For all but one of the putative biomarkers, the maximum prognostic value was achieved by taking the average biomarker value across all biopsies, whereas measures of the spread of values (the variance or range) were generally poor prognosticators. Interestingly, the maximum prognostic value for the proportion of cells with at least two advantageous mutations was achieved by taking the *minimum* value across all biopsies; this could be because the minimum value is particularly sensitive to biopsies that contain non-progressed cells. Together these data imply that taking more biopsies and averaging the biomarker signal across biopsies provides additional prognostic information.

### Robustness of results to choice of model

To assess the robustness of results to our model assumptions, we investigated the impact of parameter values and update rules on the statistical association of each biomarker. We observed the same qualitative behaviour, such as diversity measures outperforming the proliferative fraction in degree of correlation, irrespective of the choice of parameter values or update rule used (see Figs S1–S3, Tables S1–S17 and Text S1 for details).

To briefly summarize these results: (i) lower mutation rates decreased the correlation of each marker with the waiting time to cancer, because of the increased stochasticity in the model introduced by a lower mutation rate; (ii) smaller biopsies were in general less prognostic; (iii) the number of mutations required for cancer did not qualitatively change the predictions of the model; (iv) the fitness advantage and disadvantage caused by new mutations, and the relative likelihood of each of the various mutation types, did not qualitatively alter the prognostic value of the biomarkers, although diversity measures were most prognostic for the case where there were many strongly advantageous mutations; and (v) the closer the value of the threshold *N _{p}* was to the number of mutations required for cancer, the more correlated the proportion of cells with

*n*≥

_{p}*N*became with the waiting time to cancer.

_{p}We also analyzed the sensitivity of model results to variations in the update rules of the system. We tested the biomarkers in a birth-driven system as opposed to a death-driven system, and found that again, that the diversity and IPP measures remained the biomarkers most significantly associated with the waiting time to cancer. Further, we investigated the effect of decoupling mutations and cell division. While these changes to the model did alter the specific predictive values of the various indices (summarized in Table S14), the general pattern of statistical association was not altered. Moreover, even in this scenario, the IPP performed well.

## Discussion

In this work we have developed a simple computational model of cancer development within premalignant disease and used the model to evaluate the prognostic value of a range of different putative biomarker measurements and tissue sampling schemes. Our results show that simply counting the proportion of cells bearing multiple advantageous mutations (proportion of cells with ‘driver’ mutations) or the proportion of proliferating cells were universally poor predictors of the waiting time to cancer, whereas measures of clonal diversity were highly correlated with the time to cancer and were robust to the choice of tissue sampling scheme. Further, we evaluated a range of different tissue sampling schemes (single biopsy, multiple biopsies in space or time, or random sampling of a lesion). We found that random sampling (such as via an endoscopic brush) provided more consistent prognostic value than a single biopsy, likely because a single (randomly targeted) biopsy is liable to miss localised but ‘important’ clones. Prognostication was improved by taking multiple biopsies, but with diminishing returns for each additional biopsy taken. Together these data provide a rationale for the empirical evaluation of different tissue sampling schemes.

Averaging biomarker scores from two different time points did improve the predictive value of our putative biomarkers; however, the difference in each putative biomarker’s values between time points was less predictive than its value at the later time points. This result was somewhat counter to our initial intuition that taking longitudinal biopsies would accurately track the ‘evolutionary trajectory’ of the lesion and hence dramatically improve prognostication. This result illustrates how our *in silico* approach can challenge intuition and, in so doing, provide novel insights into biomarker development.

We developed a new statistic, termed the index of positive proliferation (IPP), that proved to be a highly prognostic measure. The IPP is a measure of the average distance to a proliferating cell that has acquired advantageous mutations. It thus combines both genetic (or phenotypic) information with spatial (cell position) and dynamic (proliferation) information. This integration of multiple different sources of information may account for the prognostic value of the biomarker in our model. Empirical measurement of the IPP would be feasible if, for example, the number of driver mutations accumulated by a cell could be quantified concomitantly with a proliferative marker. Developments in *in situ* genotyping methods might facilitate such an approach in the near future. Irrespective of the immediate feasibility of such a measure, our development and testing of the IPP statistic within our computational model illustrates how *in silico* approaches provide a powerful means to rapidly explore new potential biomarker assays.

Our computational model of cancer evolution is clearly a highly simplified description of reality. For example, we modelled a simple two-dimensional sheet of epithelial cells and neglected the important influence, and indeed co-evolution, of the supporting stroma. We assumed simple relationships between genotype, phenotype and fitness, and also neglected to model cell-cell interactions. Critically, we also used an abstract fitness function to define cellular phenotypes, and in doing so neglected to describe any molecular details of cell behaviour. Adequately describing these kinds of important biological complexities within a model is a necessary next step for the development of *in silico* biomarker development platform that is of general use. Increasing the realism of the model would improve confidence that the predicted prognostic value of any biomarker was not an artefact of the over-simplified model, although we have shown that our results are somewhat robust to alterations of a number of the key parameters in our model. Incorporating additional biological realism would also facilitate the *in silico* testing of the prognostic value of a full range of specific biological features; for example, the expression of a protein that fulfills a particular biological function, such as modulating cell adhesion.

Our study demonstrates how a computational model offers a platform for the initial development of novel prognostic biomarkers: computational models can be viewed as a high-throughput and cost-effective screening tool with which to identify the most promising biomarkers for subsequent empirical testing. This work provides the rationale for constructing an *in silico* biomarker development platform that would lessen the current restrictions imposed by the sole reliance on empirical testing.

## Acknowledgments

AD is supported by a J.D. Hatcher Award, School of Medicine, Queen’s University, Canada. AGF acknowledges support by the EPSRC through grant EP/I017909/1 (www.2020science.net). TAG is supported by Cancer Research UK. The authors wish to thank the anonymous reviewers for their insightful comments and suggested improvements to the manuscript.

## Footnotes

↵* adhawan{at}qmed.ca (AD); t.graham{at}qmul.ac.uk (TG); a.g.fletcher{at}sheffield.ac.uk (AF)