## Abstract

Heterogeneity in strategies for survival and proliferation among the cells which constitute a tumour is a driving force behind the evolution of resistance to cancer therapy. The rules mapping the tumour’s strategy distribution to the fitness of individual strategies can be represented as an evolutionary game. We develop a game assay to measure effective evolutionary games in co-cultures of alectinib-sensitive and alectinib-resistant non-small cell lung cancer. The games are not only quantitatively different between different environments, but targeted therapy and cancer associated fibroblasts qualitatively switch the type of game being played from Leader to Deadlock. This observation provides the first empirical confirmation of a central theoretical postulate of evolutionary game theory in oncology: we can treat not only the player, but also the game. Although we concentrate on measuring games played by cancer cells, the measurement methodology we develop can be used to advance the study of games in other microscopic systems by providing a quantitative description of non-cell-autonomous effects.

Tumors are heterogeneous, evolving ecosystems [1, 2], comprised of sub-populations of neoplastic cells that follow distinct strategies for survival and propagation [3]. The success of a strategy employed by any single neoplastic sub-population is dependent on the distribution of other strategies, and on various components of the tumour microenvironment, like cancer associated fibroblasts (CAFs) [4]. The EML4-ALK fusion, found in approximately 5% of non-small cell lung cancer (NSCLC) patients, leads to constitutive activation of oncogenic tyrosine kinase activity of ALK, thereby “driving” the disease. Inhibitors of tyrosine kinase activity of ALK (ALK TKI) have proven to be highly clinically efficacious, inducing tumor regression and prolonging patient survival [5,6]. Unfortunately, virtually all of the tumors that respond to ALK TKIs eventually relapse [7] – an outcome typical of inhibitors of other oncogenic tyrosine kinases [8]. Resistance to ALK TKI, like most targeted therapies, remains a major unresolved clinical challenge. Despite significant advances in deciphering the resultant molecular mechanisms of resistance [9], the evolutionary dynamics of ALK TKI resistance remains poorly understood. The inability of TKI therapies to completely eliminate tumor cells has been shown to be at least partially attributable to protection by aspects of the tumor microenvironment [10]. CAFs are one of the main non-malignant components of tumor microenvironment and the interplay between them and tumor cells is a major contributor to microenvironmental resistance, including cytokine mediated protection against ALK inhibitors [11].

To study the eco-evolutionary dynamics of these various factors, we interrogated the competition between treatment naive cells of ALK mutant NSCLC cell line H3122 – a “workhorse” for studies of lung cancer – and a derivative cell line in which we developed resistance to alectinib – a highly effective clinical ALK TKI [12] – by selection in progressively increasing concentrations of the drug [13]. We aimed to come to a quantitative understanding of how these dynamics were affected by clinically relevant concentrations of alectinib (0.5*μ*M; see [14]) in the presence or absence of CAFs, isolated from a lung cancer. To achieve this, we developed a novel assay for quantifying eco-evolutionary dynamics that is of independent interest to the general study of microscopic systems.

## Mono vs mixed cultures & cost of resistance

To establish baseline characteristics, we performed assays in monotypic cultures of parental (alectinib-sensitive) and resistant cell lines with and without alectinib and CAFs. To gather temporally-resolved data for inferring growth rates, we used time lapse microscopy to follow the expansion of therapy resistant and parental cells, differentially labeled with stable expression of selectively neutral GFP and mCherry fluorescent proteins, respectively. From the time series data, we inferred the growth rate with confidence intervals for each one of 6 experimental replicates in four different experimental conditions (total of 24 data points, each with confidence intervals), as seen in Figure 1. As expected, alectinib inhibited growth rates of parental cells (DMSO vs Alectinib: *p* < .005; DMSO + CAF vs Alectinib + CAF: *p* < .005), whereas the growth rate of the resistant cells was not affected. And, as previously reported [11], CAFs partially rescued growth inhibition of parental cells by Alectinib (Alectinib vs Alectinib + CAF: *p* < .005; Alectinib + CAF vs DMSO: *p* < .005), without impacting growth rates of resistant cells.

The classic model of resistance posits that the resistant phenotype receives a benefit in drug (in our case: Alectinib or Alectinib + CAF) but is neutral, or even carries an inherent cost, in the absence of treatment (DMSO or DMSO + CAF). For example, experimentalists frequently regard resistance granting mutations as selectively neutral in the absence of drug, and the modeling community often goes further by considering explicit costs like up-regulating drug efflux pumps, investing in other defensive strategies, or lowering growth rate by switching to sub-optimal growth pathways [15, 3]. If we limited ourselves to these monotypic assays, then our observations would be consistent with this classic model of resistance.

But we did not limit ourselves to monotypic assays. Our experience observing non-cell-autonomous biological interactions [16] and modeling eco-evolutionary interactions [17, 18, 19] in cancer led us to suspect that the heterotypic growth rates would differ from monotypic culture. Cell-autonomous fitness effects are ones where the benefits/costs to growth rate are inherent to the cell: the presence of other cells is an irrelevant feature of the micro-environment and the growth rates from monotypic cultures provide all the necessary information. Non-cell-autonomous effects allow fitness to depend on a cell’s micro-environmental context, including the frequency of other cell types: growth rates need to be measured in competitive fitness assays over a range of seeding frequencies. Other microscopic experimental systems in which frequency dependent fitness effects have been considered include, but are not limited to: *Escherichia coli* [20, 21], yeast [22, 23], bacterial symbionts of hydra [24], breast cancer [16] and pancreatic cancer [25]. Hence, we continued our experiments over a range of initial proportions of resistant and parental cells in mixed cultures for each of the four experimental conditions.

Figure 2 shows the resulting growth rates of each cell type in the co-culture experiments for all experimental (color, shape) and initial conditions (opacity is parental cell proportion). In the heterotypic culture – unlike monotypic – CAFs slightly improved the growth rates of the parental cells, even in DMSO. More strikingly, even in the absence of drug, resistant cells tend to have a higher growth rate than parental cells in the same environment (i.e. proportion of parental cells in the co-culture). This is evident from most DMSO points being above the dotted diagonal line (y = x) corresponding to equal growth rate of the sub-populations (this is quantified in Figure 4b and is further discussed in section ‘Leader and Deadlock games in NSCLC’). The higher fitness of resistant cells in co-culture that we observe in the absence of drug is not consistent with the classic model of resistance. This is a result that cannot be seen from just the monotypic experiments, which are compatible with the classic model. This higher fitness of resistant cells might, however, surprise clinicians less than biologists: in clinical experience, tumours that have acquired resistance are more aggressive than before they were treated, even in the absence of drug.

A reductionist could rationalize our observations by saying that we actually selected for two different qualities in our resistant line: (i) a general growth advantage, and (ii) resistance to Alectinib. This is a reasonable hypothesis, but it faces a few challenges. First, both parental and resistant cells were evolved for the same length of time, with escalating dosages of DMSO for the former and Alectinib for the latter (see Mediavilla-Varela et al. [26] and Supplementary Appendix A). Thus, (i) cannot be due to just subculturing, but is somehow linked to drug. Second, there is no growth rate advantage of resistant cells in monoculture (see Figure 1); the advantage is only revealed when parental and resistant cells are cultured with a common proportion of parental cells. Finally, to even make the distinction between (i) and (ii), one has to implicitly assume that resistance has to be neutral or costly by definition. For an oncologist, however, both (i) and (ii) would constitute clinical resistance if they led to a tumour escaping therapeutic control. By using a definition of clinical resistance that is broad enough to capture both aspects, we observe resistance that is neither neutral nor costly in DMSO co-culture.

## Frequency dependence in fitness functions

Although not common in cancer biology, competitive fitness assays are a gold standard for studying bacteria. But they are typically conducted with a single initial ratio of the two competing cell types. However, in Figure 2, if we view the initial proportion of parental to resistant cells as a variable parameter represented by opacity then we can see a hint of frequency dependence in both parental and resistant growth rates. There we see an increase in fitness of both cell types as the initial proportion of parental cells (represented by the opacity of each point) increases. In other words, the variance in growth rates between points within each of the four environmental conditions is not due to just noise but can be explained by the micro-environmental variable of initial proportion of parental cells.

This is shown more clearly in Figure 3. In all four conditions, we see that the growth rate of the resistant and parental cell lines depends on the initial proportion of parental cells. To capture the principal first-order part of this dependence, we consider a line of best fit between initial proportion of parental cells and the growth rates. See equations 1-8 in Supplementary Appendix C (or the matrix entries in Figure 4b) for these lines of best fit. Interpretable versions of these lines of best fit (see Supplementary Appendix D) can be expressed as a regularized fitness function where *S* ∈ {*P*, *R*} indexes the parental or resistant strategy and *C* ∈ {DMSO, DMSO + CAF, Alectinib, Alectinib + CAF} indexes the experimental condition. For a description of regularization see Supplementary Appendix D.
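As an illustration of what such a line of best fit involves, the following minimal sketch fits a straight line to growth rate against initial parental proportion. It assumes ordinary least squares, which the text does not specify, and the well data, `fit_line`, and all variable names are hypothetical rather than taken from our analysis code.

```python
def fit_line(ps, rates):
    """Ordinary least-squares fit of rate = intercept + slope * p."""
    n = len(ps)
    mean_p = sum(ps) / n
    mean_r = sum(rates) / n
    sxx = sum((p - mean_p) ** 2 for p in ps)
    sxy = sum((p - mean_p) * (r - mean_r) for p, r in zip(ps, rates))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_p
    return intercept, slope


# Hypothetical wells: initial parental proportion and measured growth rate.
# The values are made up and lie exactly on a line for clarity.
seed_p = [0.1, 0.3, 0.5, 0.7, 0.9]
growth = [0.028, 0.031, 0.034, 0.037, 0.040]

intercept, slope = fit_line(seed_p, growth)

# Extrapolated fitness at the two ends of the proportion axis.
w_at_p0 = intercept          # p = 0: surrounded by resistant cells
w_at_p1 = intercept + slope  # p = 1: surrounded by parental cells
```

A positive `slope` here would correspond to a cell type whose growth rate rises with the proportion of parental cells around it.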

In three of the conditions, the resistant cell growth rates increase with increased seeding proportion of parental cells, while the parental growth rates remain relatively constant (in the case of no CAFs) or increase slightly (in the case of alectinib + CAFs). For example, in DMSO, this suggests that parental cells’ fitness is independent of resistant cells: parental fitness in DMSO could be well characterized as cell-autonomous. However, resistant cells in monotypic culture have approximately the same fitness as parental cells (Figure 2a), but they benefit from the parental cells in co-culture, with a fitness that increases linearly in *p* (where *p* is the proportion of parental cells). Their fitness is non-cell-autonomous. The positive coefficient in front of *p* suggests commensalism between resistant and parental cells, i.e. resistant cells benefit from the interaction with the parental cells, without exerting positive or negative impact on them.

The DMSO + CAF case differs from the other three in that we see a constant, although elevated, growth rate in resistant cells, but a linearly decreasing (in *p*) growth rate of parental cells. This could be interpreted as CAFs switching the direction of commensalism between parental and resistant cells. Further, the stable co-existence enabled by the cross in fitness functions offers an alternative to the widely held view of pre-existent resistance as a rare cell-autonomous binary switch. We discuss this alternative in section ‘Treating the game & heterogeneity’.

## Leader and Deadlock games in NSCLC

The tools of evolutionary game theory (EGT) are well suited for making sense of frequency-dependent fitness [28, 29, 30, 17, 31, 18, 25, 19]. In EGT, a game is the rule mapping the population’s strategy distribution to the fitness of individual strategies. Previous work has considered games like snowdrift [23], stag hunt [24], rock-paper-scissors [20], and public goods [22, 25] although none have been designed to measure evolutionary games directly. Instead, prior work followed a two track approach. They compared – usually qualitatively – the results of experiments (track one) to numeric simulations or analytic dynamics of these specific games (track two). We unite these parallel tracks by defining the effective game as an assayable hidden variable of a population and its environment. As such, we are not aiming to test EGT as an explanation of these particular data but rather using these data as an illustration of how we should operationally define games. With this game assay, we are aiming to quantitatively describe our system in the language of EGT.

To measure the game that describes the non-cell-autonomous interactions in NSCLC, we focus on the gain function (see [32, 19] for a theoretical perspective): the increase in growth rate that a hypothetical player would have in ‘switching’ from being parental to resistant with all other variables held constant. In other words, we quantify how the difference between resistant and parental growth rates varies with initial proportion of parental cells. The relatively good fit of a linear dependence of growth rates on parental seeding proportion allows us to describe the interaction as a matrix game – a well-studied class of evolutionary games (see a description in Figure 4a). Note that this linearity is not guaranteed to be a good description for arbitrary experimental systems. For example, the game between the two Betaproteobacteria *Curvibacter* sp. AEP1.3 and *Duganella* sp. C1.2 was described by a quadratic gain function [24]. Future work can extend our assay to non-matrix games.
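One way to write the linear case out explicitly is sketched below. The symbols $w_P$ and $w_R$ are introduced here for illustration, and the labelling of the payoff entries is an assumption read off the Figure 4b axes: *A* and *B* are taken as the parental fitnesses at *p* = 1 and *p* = 0, and *C* and *D* as the corresponding resistant fitnesses.

$$
\begin{aligned}
w_P(p) &= A\,p + B\,(1 - p), \\
w_R(p) &= C\,p + D\,(1 - p), \\
w_R(p) - w_P(p) &= (C - A)\,p - (B - D)\,(1 - p).
\end{aligned}
$$

Under this labelling the gain function equals *C - A* at *p* = 1 and -(*B - D*) at *p* = 0, so the two game-space coordinates are enough to determine any linear gain function.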

In supplementary materials E we discuss a purely experimental interpretation based on Kaznatcheev [27]. **(b) Mapping of the four measured in vitro games into game space.** The x-axis is the relative fitness of a resistant focal in a parental monotypic culture: *C - A*. The y-axis is the relative fitness of a parental focal in a resistant monotypic culture: *B - D*. Games measured in our experimental system are given as specific points with error bars based on goodness of fit of linear fitness functions in Figure 3. The games corresponding to our conditions are given as matrices (with entries multiplied by a factor of 100) by their label (see supplemental section C for details). The game space is composed of four possible dynamical regimes, one for each quadrant. The typical dynamics of each dynamic regime are represented as a qualitative flow diagram between parental (*P*) and resistant (*R*) strategies: an upward red arrow corresponds to an increase in the parental subpopulation, and a downward green arrow corresponds to an increase in the resistant subpopulation. In the case of the two dynamic regimes observed in our NSCLC system, we also include insets of measured dynamics (c,d).

**Experimental time-series of proportion of parental cells for DMSO + CAF (c) and Alectinib + CAF (d).** Each line corresponds to the time dynamics of a separate well. A line is coloured green if proportion of resistant cells increased from start to end; red if proportion of parental cells increased; black if statistically indistinguishable proportions at start and end (where start/end are defined as the first/last 5 time-points (20 hours)). See Supplementary Figure 5 for proportion dynamics of all four games and Supplementary Figure 6 for density dynamics and their correspondence to the exponential growth model from Figure 4a.

Here, as a first step toward quantifying an evolutionary game, we are satisfied with the first-order approximation of interactions provided by matrix games. Two strategy matrix games have a convenient representation in a two dimensional game-space and can produce all possible linear gain functions. More importantly, from a linear gain function, it is possible to infer the corresponding matrix game. Since the game type and resultant dynamics are invariant under constant offsets to the columns, we can infer the game played by the cancer cells (see the model in Figure 4a for details). This is the output of our game assay. We plot the inferred games in a game-space spanned by the theoretical fitness advantage a single resistant invader would have if introduced into a parental monotypic culture versus the fitness advantage of a parental invader in a resistant monotypic culture; as shown in Figure 4b. In this representation, there are four qualitatively different types of games corresponding to the four quadrants, each of which we illustrate with a dynamic flow. We can see that the game corresponding to DMSO + CAF – although quantitatively similar to DMSO – is of a qualitatively different type compared to all three of the other combinations.

We can also convert our inferred fitness functions from Figure 3 into a payoff matrix. We do this by having each row correspond to a strategy’s fitness function, with the column entries given by the values of this line of best fit at *p* = 1 and *p* = 0. If we look at our empirical measurements for DMSO + CAF (upper-right quadrant of Figure 4b) we see the Leader game, and Deadlock in the other three cases (we will use DMSO to illustrate the Deadlock game).

The Deadlock game observed in DMSO is in some ways the opposite of the popular Prisoner’s Dilemma (PD) game (in fact, Robinson and Goforth [33] call it the anti-PD). If we interpret parental as cooperate and resistant as defect then, similar to PD, each player wants to defect regardless of what the other player does (because 4.0 *>* 2.5 and 2.7 *>* 2.4) but hopes that the other player will cooperate (because 4.0 *>* 2.7). However, unlike PD, mutual cooperation does not Pareto dominate mutual defection (because 2.5 *<* 2.7) but is instead strictly dominated by it. Thus, the players are locked into defection. In our system, this corresponds to resistant cells having an advantage over parental in DMSO.

The Leader game observed in DMSO + CAF is one of Rapoport [34]’s four archetypal 2 *×* 2 games and a social dilemma related to the popular game known as Hawk-Dove, Chicken, or Snowdrift (in fact, Robinson and Goforth [33] call it Benevolent Chicken). If we interpret parental as ‘lead’ (for Snowdrift: wait) and resistant as ‘work’ (for Snowdrift: shovel) then similar to Snowdrift, mutual work is better than both leading (because 3.0 *>* 2.6) and thus no work being done (for Snowdrift: both waiting and thus not getting out of the snowdrift) but each player would want to lead while the other works (because 3.5 *>* 3.0). However, unlike Snowdrift, mutual work is not better than the “sucker’s payoff” of working while the other player leads (because 3.1 *>* 3.0). Rapoport [34] sees this as a tension with a player switching from a “natural” point of mutual work to lead and thus benefit both players (3.5 *>* 3.0, 3.1 *>* 3.0), but if the second player also does the same and becomes a leader then all benefit disappears (because 2.6 is the smallest payoff). In our system, this corresponds to cells in the tumour experiencing selective pressure to lose some but not all of its resistance in DMSO + CAF.
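The payoff values quoted in the two paragraphs above can be placed in game space with a short sketch. It assumes the row and column convention described earlier (rows: parental, resistant; columns: *p* = 1, *p* = 0) and uses only the rounded entries, multiplied by 100, that appear in the text. The function names are hypothetical, and only the two quadrants named in the text are labelled.

```python
def game_coordinates(payoff):
    """Return (C - A, B - D) for payoff = [[A, B], [C, D]]."""
    (a, b), (c, d) = payoff
    return c - a, b - d


def quadrant(payoff):
    """Name the game-space quadrant; a coordinate of exactly 0 counts as left/lower."""
    x, y = game_coordinates(payoff)
    horizontal = "right" if x > 0 else "left"
    vertical = "upper" if y > 0 else "lower"
    return f"{vertical}-{horizontal}"


# Only the two quadrants that the text names are labelled here.
NAMED_QUADRANTS = {"upper-right": "Leader", "lower-right": "Deadlock"}

# Rounded payoff entries (x100) as quoted in the text.
dmso = [[2.5, 2.4], [4.0, 2.7]]
dmso_caf = [[2.6, 3.5], [3.1, 3.0]]

dmso_game = NAMED_QUADRANTS.get(quadrant(dmso))
dmso_caf_game = NAMED_QUADRANTS.get(quadrant(dmso_caf))
```

With these rounded entries, DMSO falls in the lower-right quadrant and DMSO + CAF in the upper-right quadrant, matching the Deadlock and Leader labels used in the text. The measured games carry error bars that this sketch ignores.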

Note that the above intuitive stories are meant as heuristics, and the effective games that we measure are summaries of population level properties [27, 35]. This means that the matrix entries should not be interpreted as direct interactions between cells, but as general couplings between subpopulations corresponding to different strategies. The coupling term includes not only direct interactions, but also indirect effects due to spatial structure, diffusible goods, contact inhibition, etc. But this does not mean that an effective game is not interpretable. For example, the Deadlock game captures the phenomenon of the resistant population always being fitter than parental (for example, in DMSO). We noted this effect intuitively in Figure 2 (also see section Mono vs mixed cultures & cost of resistance) from replicates being above the *y* = *x* diagonal. Measuring a Deadlock game for DMSO with confidence intervals that do not extend outside the bottom right quadrant of the game space in figure 4b allows us to show the statistical significance of our prior intuitive understanding. Effective games allow us to quantify frequency-dependent differences in growth rates.

## Treating the game & heterogeneity

To our knowledge, neither the Leader nor the Deadlock game has been considered in the prior theoretical EGT literature in oncology. Given that the Deadlock of drug-resistant over drug-sensitive cells is a challenge for classic models of resistance – as discussed in section Leader and Deadlock games in NSCLC – we would be particularly interested in theoretical models of resistance that produce the Deadlock game. In addition to challenging theorists by adding two new entries to the catalogue of games that cancers play, our results also support existing theoretical work in mathematical oncology that considers treatment (or other environmental differences) as changes between qualitatively different game regimes [31, 17, 18, 19]. In this framework, treatment has the goal not to directly target cells in the tumor, but instead to perturb the game they are playing and allow evolution to drive down or control unwanted cancer subclones through competition. Before this study, this possibility had been largely taken as a theoretical postulate. In our system, we can view an untreated tumour as similar to DMSO + CAF and thus following the Leader game. Treating with alectinib (move to Alectinib + CAF) or eliminating CAFs through a stromal directed therapy (move to DMSO) moves the game into the lower-right quadrant of Figure 4b, and the game becomes a Deadlock game. This switch allows us to show that this theoretical construct of EGT – that treatment can qualitatively change the type of game – has an experimental implementation. Unfortunately, our system is *in vitro*, and even if it generalized to *in vivo*, neither of the games leads to a therapeutically desirable outcome for the patient.

A particularly important difference between Leader and Deadlock dynamics is the existence of an internal fixed point in Leader but not in Deadlock. We can see convergence towards this fixed point in the DMSO + CAF condition of Figure 4c, and no such convergence in the other three cases (Figure 4d for Alectinib + CAF; Supplementary Figure 5). Since the DMSO + CAF condition is our closest to an untreated patient, it might have important consequences for latent resistance. Many classical models of resistance assume a rare preexistent mutant taking over the population after the introduction of drug. In our experimental system, however, if the resistant strategy is preexistent then negative frequency dependent selection will push the population towards a stable polyclonal tumour of resistant and sensitive cells before the introduction of drug. This allows for much higher levels of preexisting heterogeneity in resistance than predicted by the classical picture. If similar games occur *in vivo* then such preexisting heterogeneity could be a possible *evolutionary mechanism* behind the speed and robustness of treatment resistance to targeted therapies in patients.
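The contrast between the two regimes can be illustrated numerically. The sketch below assumes replicator dynamics for the parental proportion, dp/dt = p(1 - p)(w_P - w_R), with linear fitness functions built from the rounded payoff entries quoted in section ‘Leader and Deadlock games in NSCLC’. Time units are arbitrary because those entries are scaled by 100, and the function names and the Euler integration are illustrative choices rather than our analysis code.

```python
def interior_fixed_point(payoff):
    """Interior rest point of the replicator dynamics, or None if there is none.

    payoff = [[A, B], [C, D]] with rows (parental, resistant) and columns
    (p = 1, p = 0). An interior point exists when C - A and B - D share a
    sign; it is stable when both are positive.
    """
    (a, b), (c, d) = payoff
    x, y = c - a, b - d
    if x * y <= 0:
        return None
    return y / (x + y)


def simulate(payoff, p0, dt=0.01, steps=5000):
    """Forward-Euler integration of dp/dt = p (1 - p) (w_P - w_R)."""
    (a, b), (c, d) = payoff
    p = p0
    for _ in range(steps):
        w_p = a * p + b * (1 - p)  # parental fitness at proportion p
        w_r = c * p + d * (1 - p)  # resistant fitness at proportion p
        p += dt * p * (1 - p) * (w_p - w_r)
    return p


leader = [[2.6, 3.5], [3.1, 3.0]]    # DMSO + CAF, rounded entries x100
deadlock = [[2.5, 2.4], [4.0, 2.7]]  # DMSO, rounded entries x100

leader_from_low = simulate(leader, 0.1)
leader_from_high = simulate(leader, 0.9)
deadlock_from_high = simulate(deadlock, 0.9)
```

Under these assumptions, both Leader trajectories approach the interior point given by `interior_fixed_point(leader)`, while the Deadlock trajectory moves toward an all-resistant population. The position of the interior point depends on the rounded entries and should not be read as a measured equilibrium.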

## Conclusion

Drug-sensitive (parental) and resistant cells interact not only with alectinib, but also with each other and microenvironmental factors like CAFs. The relative fitness advantage of resistant over parental cells is a function of the initial proportion of sensitive cells – the gain function characterizing replicator dynamics. In co-culture, resistant cells have an advantage over parental cells even in DMSO. Measuring a linear gain function has enabled us to develop an assay that represents the inter-dependence between parental and resistant cells as a matrix game. Not only are these games quantitatively different among the four environmental conditions – see Figure 4b – but they are also of two qualitatively different types: a Leader game in the case of DMSO + CAF and Deadlock in the other three cases.

This ability of treatment to qualitatively change the type of game being played provides the first empirical demonstration of the principle “don’t treat the player, treat the game”. Our hope is that our game assay will allow further empirical connection and the potential translations of existing oncologic EGT literature to the clinic. Unfortunately, the Leader and Deadlock games that we measured are understudied in mathematical oncology, and we hope that our observation of them will motivate theorists to explore them in more detail. One difference between these game types is already clear: in the case of Leader there is negative frequency-dependent selection toward a coexistence of parental and resistant cells – which we confirm for DMSO + CAF in Figure 4c – while for Deadlock there is selection towards a completely resistant tumour. Since DMSO + CAF is the closest analogue to a pre-treatment patient in our *in vitro* system, this suggests that there might be much higher levels of initial heterogeneity in drug resistance than prior theory would suggest. As such, we urge theorists to reconsider the assumption of the rare preexisting resistant clone. Of course, our results are for a single *in vitro* system. But if similar results hold *in vivo* and/or for other cancers, it could help explain the ubiquity and speed of resistance that undermines our abilities to cure patients or control their disease in the long term. We will not know this unless we set out to quantify the non-cell-autonomous processes in cancer. Building a catalogue of the games cancers play – by adopting our game assay in other cancers, and other experimental contexts – can help resolve this and other questions, and thus serve as a foundation for targeting evolutionary mechanisms of resistance – a new strategy in cancer therapy: treating the game.

## Appendices

In these supplementary appendices, we develop and discuss the tools used to define and measure our game assay. The structure is as follows:

A: Description of materials used, the experimental method, and time-lapse microscopy.

B: Basic quantification of experimental images and how growth rates and associated error are measured within each well. This is the definition of fitness used throughout our text. Explains Figures 1 and 2 from the main text.

C: Defines parental proportions (*p*) and contrasts the evolutionary dynamics of proportions (Supplementary Figure 5) with the ecological dynamics of densities (Supplementary Figure 6). Defines the fitness functions based on lines of best fit between fitness and proportion from Figure 3 and presents the actual lines of best fit as equations 1-8. Explains how the linear fitness functions are converted into the games of Figure 4b. Justifies the use of linear functions in terms of explanatory value and presents the model residuals in Supplementary Figure 7.

D: Presentation of interpretable fitness functions from section ‘Frequency dependence in fitness functions’ of the main text in the context of regularization. As a visual check of the regularization, Supplementary Figure 8 shows what the games would look like if based on the regularized fitness functions instead of the unregularized fitness functions shown in Figure 4b.

E: Experimental interpretation of replicator dynamics as an alternative to the exponential model of Figure 4a.

## A Materials and experimental method

### Cell lines

The H3122 cell line was obtained from Dr. Haura (Moffitt Cancer Center). Cell line identity was validated by the Moffitt Cancer Center Molecular Genetics core facility using short tandem repeats (STR) analysis. Primary lung cancer associated fibroblasts were obtained from S. Antonia lab (Moffitt Cancer Center), following the protocols approved by the USF Institutional Review Board. CAFs were isolated as previously described in Mediavilla-Varela et al. [26] and expanded for 3-10 passages prior to the experiments. The alectinib resistant derivative cell line was obtained through an escalating inhibitor concentration protocol, as described in Dhawan et al. [13]. Alectinib sensitive parental H3122 cells were cultured in DMSO for the same length of time as the alectinib resistant derivative.

Stable GFP and mCherry expressing derivative H3122 cell lines were obtained through lentiviral transduction with pLVX-AcGFP (Clontech) and mCherry (obtained from K. Mitsiades, DFCI) vectors, respectively. We cultured both H3122 cells and CAFs in RPMI media (Gibco brand from Thermo Scientific), supplemented with 10% FBS (purchased from Serum Source, Charlotte, NC). Regular tests for mycoplasma contamination were performed with the MycoScope PCR based kit from GenLantis, San Diego, CA.

### Experimental set-up

The cells were harvested upon reaching 70% confluence and counted using a Countess II automatic cell counter (Invitrogen). CAFs were counted manually to avoid segmentation artifacts. For the determination of competitive growth rates, 2,000 H3122 cells were seeded with or without 500 CAF cells in 50 *μ*L RPMI media per well into 384 well plates (Corning, catalogue #7200655), with different proportions of differentially labelled parental and alectinib resistant variants. 20 hours after seeding, alectinib – purchased from ChemieTek (Indianapolis, IN) – or DMSO vehicle control, diluted in 20 *μ*L RPMI, was added to each well to achieve a final alectinib concentration of 500 nM [14]. Time lapse microscopy measurements were performed every 4 hours in white light, as well as green and red fluorescent channels, using the Incucyte Zoom system from Essen Bioscience.

## B Measuring population sizes and fitnesses

### Fluorescent area as units of population size

We measured fluorescent area from time-lapse images via Python code using the OpenCV package and used this as our units of size for populations. See Kaznatcheev [27] for a discussion of fitness and replicator dynamics under various definitions of population size. We cleaned images by renormalizing them (GFP and mCherry intensities vary over different orders of magnitude), removing vignetting with CLAHE, and finally thresholding to identify fluorescent regions. We eliminated salt-and-pepper noise from the thresholded images with the opening morphological transform. See Figures 2a,b,d,e for examples of the image analysis. The resultant area is then taken as a measure of population size for the purposes of computing fitnesses.

### Growth rate as fitness

We use growth rate as our measure of fitness. In order to minimise the impact of growth inhibition by confluency, we analyzed the competitive dynamics during the first 5 days of culture, when the cell population was expanding exponentially. See section E for a discussion of the impact of measurement length. We learned the growth rate along with a confidence interval from the time-series of population size in each well using the Theil-Sen estimator. See Figures 2c,f for examples of fitting. The learned parental growth rate and resistant growth rate of each well are used as the y coordinates in the monoculture experiments of Figure 1 (along with errors on the growth rate) and as the x and y coordinates of the main part of Figure 2. To avoid overloading Figure 2, the errors on the growth rates are omitted there, but they are shown explicitly as error-bars in Figure 3.
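For readers unfamiliar with the Theil-Sen estimator, the sketch below shows its point estimate applied to an exponential growth curve: the growth rate is the median of all pairwise slopes of log population size against time. The function name and the synthetic fluorescent-area series are hypothetical. The sketch does not compute the confidence interval mentioned above; `scipy.stats.theilslopes` is one existing routine that returns confidence bounds on the slope as well.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_growth_rate(times, sizes):
    """Median of pairwise slopes of log(size) against time."""
    logs = [math.log(s) for s in sizes]
    slopes = [
        (logs[j] - logs[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(times)), 2)
        if times[j] != times[i]
    ]
    return median(slopes)


# Synthetic well: imaged every 4 hours, growing exponentially at 0.03 per hour.
hours = [0, 4, 8, 12, 16, 20]
areas = [1000.0 * math.exp(0.03 * t) for t in hours]
areas[3] *= 5  # one corrupted frame, to show robustness to outliers

rate = theil_sen_growth_rate(hours, areas)
```

Because the estimate is a median over pairs, the single corrupted frame in this synthetic series does not shift the recovered rate, which is the usual motivation for preferring this estimator over least squares on noisy image-derived data.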

### Figure 2 as map of analysis flow

Along with showing all the data, Figure 2 serves as a map to the above analysis pipeline. The inset subfigures and figure can be understood in the following order:

[a,b,d,e] Within each image from the series generated by time-lapse microscopy: identify the fluorescent regions for GFP and mCherry and calculate their areas to serve as units of population size (GFA and RFA).

[c,f] For parental (mCherry) and resistant (GFP) plot the population sizes from each image in the series on a semilog grid as population vs. time. Find the slope of the two lines to serve as parental and resistant fitness.

[main] Use the parental fitness as the x value and the resistant fitness as the y value to plot each well as a data-point according to the above process, and color the point according to its experimental condition (with opacity for initial parental proportion; see Section C). For ease of viewing: draw a convex-hull bounding polygon around the data-points of each experimental condition.

Given the complexity of Figure 2, it is tempting to ask for a simple summary statistic of the data in the main figure. But it is not reasonable to ask for the “average” growth rate in Figure 2 because each point differs not only along the four experimental conditions of the environment, but also along the micro-environmental conditions of the initial parental proportion (represented by the opacity that is explained in Section C). Averaging over this information would be akin to assuming that the growth rates are cell-autonomous. It would be attributing the variance in growth rates to noise instead of to the independent variable of initial parental proportion. As such, the game assay developed in the rest of the paper can be viewed as a method for summarizing Figure 2 when the underlying process is non-cell-autonomous. The games derived through Figure 3 and presented in Figure 4b are the summary of the data in the main part of Figure 2.

## C Measuring fitness functions and games

### Proportions

Since raw population sizes have different units (GFP Fluorescent Area (GFA) vs mCherry Fluorescent Area (RFA)), we converted them to common cell-number-units (CNU) by learning the linear transform that scales GFA and RFA into CNU. We defined proportions based on this common CNU as *p* = *N*_{P}/(*N*_{P} + *N*_{R}) where *N*_{{P,R}} is the CNU size of the parental and resistant populations. The transform of GFA and RFA into CNU is associated with an error that is propagated to measures of *p* as *σ*_{p}. The time dynamics of *p* can be seen in the insets of Figure 4b for DMSO and DMSO+CAF or in Supplementary Figure 5 for all conditions.
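A minimal sketch of this conversion and of how an error on the population sizes could be carried through to the proportion. The function names and example numbers are hypothetical, and the first-order propagation formula assumes the errors on the two population sizes are independent, which the text does not state.

```python
import math


def to_cnu(area, scale):
    """Convert a fluorescent area to cell-number-units with a linear scale.

    `scale` stands in for the learned GFA-to-CNU or RFA-to-CNU factor.
    """
    return scale * area


def parental_proportion(n_p, n_r, sigma_p=0.0, sigma_r=0.0):
    """Return (p, sigma_p_prop) with p = N_P / (N_P + N_R).

    First-order propagation, assuming independent errors:
      dp/dN_P =  N_R / (N_P + N_R)^2
      dp/dN_R = -N_P / (N_P + N_R)^2
    """
    total = n_p + n_r
    p = n_p / total
    sigma = math.sqrt((n_r * sigma_p) ** 2 + (n_p * sigma_r) ** 2) / total ** 2
    return p, sigma


# Usage with made-up areas, scale factors and errors.
n_parental = to_cnu(area=1500.0, scale=0.2)   # 300 CNU
n_resistant = to_cnu(area=400.0, scale=0.25)  # 100 CNU
p, sigma_p = parental_proportion(n_parental, n_resistant, 10.0, 10.0)
```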

### Neglecting ecological dynamics

Throughout this report, we focus on evolutionary dynamics: changes in proportion of strategies. However, one could also consider the ecological dynamics: changes in densities of strategies. It is not only proportions that are changing in our experimental system but also the densities. These ecological dynamics are not the focus of our report, but we present them in Supplementary Figure 6 for completeness. Here, we also compare the prediction of the model based on our measured games and the exponential growth interpretation in Figure 4a to the observed data. There is overall agreement between data and model. But this is based on the traditional two track approach on qualitative agreement. Instead, we prefer to focus on the single track measurement of evolutionary dynamics described in the rest of these supplementary materials. Future work can aim to extend our approach to also include ecological dynamics.

### Lines of best fit as fitness functions

To measure the fitness functions we plotted the fitness of each cell-type in each well vs the seeding proportion (*p*) of parental cells in Figure 3. The x-axis proportion of parental cells (*p*) was computed from the first time-point: see section E for an interpretation of this as a measurement of *dp/dt* or as a series of competitive fitness assays. We estimated the line of best fit and the error on its parameters using least-squares weighted by the inverse of the error on each data point. This provides the error estimates on the line’s parameters that we use later. The lines of best fit (with coefficients rounded to the thousandths for presentation) from weighted least-squares are:

### Summarizing fitness functions as games

For the final column of our presentation of the fitness functions in equations 1-8, we rewrote them in the suggestive form *w*_{P} = *A* *p* + *B*(1 - *p*) and *w*_{R} = *C* *p* + *D*(1 - *p*). This is done to show at a glance where the matrix entries in Figure 4b come from: the *p* = 0 and *p* = 1 intercepts of the fitness functions serve as the entries of the game matrices. Note that in Figure 4b, we multiplied the entries by 100 for easier presentation. The game points are calculated from the matrices as *x* := *C* - *A* and *y* := *B* - *D*, and the error is propagated from the error estimates on the fitness functions’ parameters.
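The path from per-well growth rates to a game point can be sketched as below. This is an illustration under stated assumptions, not the authors' code: each residual is divided by its error before squaring (one reading of "weighted by the inverse of the error"), the matrix entries are labelled so that *A* and *C* are the parental and resistant fitness at *p* = 1 and *B* and *D* are those at *p* = 0, and all function names and numbers are made up.

```python
import numpy as np


def weighted_line_fit(p, w, sigma):
    """Fit w = slope * p + intercept by minimising sum(((w - fit) / sigma)**2).

    Returns (slope, intercept, cov), where cov is the 2x2 covariance matrix
    of (slope, intercept) implied by the per-point errors sigma.
    """
    p, w, sigma = (np.asarray(a, dtype=float) for a in (p, w, sigma))
    X = np.column_stack([p, np.ones_like(p)]) / sigma[:, None]
    y = w / sigma
    cov = np.linalg.inv(X.T @ X)
    slope, intercept = cov @ X.T @ y
    return float(slope), float(intercept), cov


def game_from_lines(line_parental, line_resistant):
    """Read game-matrix entries off the p = 1 and p = 0 intercepts."""
    (m_p, b_p), (m_r, b_r) = line_parental, line_resistant
    A, B = m_p + b_p, b_p  # parental fitness at p = 1 and at p = 0
    C, D = m_r + b_r, b_r  # resistant fitness at p = 1 and at p = 0
    return (A, B, C, D), (C - A, B - D)  # matrix entries, game point (x, y)


# Usage with made-up, noise-free data for a single condition.
p0 = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
err = np.array([0.001, 0.002, 0.001, 0.003, 0.002])
mP, bP, _ = weighted_line_fit(p0, 0.03 - 0.01 * p0, err)
mR, bR, _ = weighted_line_fit(p0, 0.02 + 0.005 * p0, err)
entries, (x, y) = game_from_lines((mP, bP), (mR, bR))
```

The returned covariance matrix is what one would propagate into the error on the game point; that propagation is left out here to keep the sketch short.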

### Lines and matrix games

Although slight deviations from a linear fit – that might not be attributable to noise alone – might be present in the data (see Supplementary Figure 7), we do not think that they justify considering higher-order fitness functions. This is due to the higher explanatory value of linear models and our hope to influence the well-established study of matrix games in microscopic systems. Some good EGT work has recently been done on non-linear games [31, 25, 24], but this is very little compared to the immense literature on matrix games. More importantly, we think that our focus on matrix games is better viewed not from the perspective of model selection but rather as an operational definition of effective games. We are not aiming to provide the best or most predictive account of non-small-cell lung cancer in the petri dish, but rather a method for measuring (matrix) games. If the error of the measured (matrix) games ends up very high – which is not the case from the error bars in Figure 4b – then we know that this first order approximation of interactions is not sufficient and higher orders should be pursued. However, we will not know this unless we first have a robust method for measuring the lower order terms.

## D Regularization and interpretable fitness functions

Regularization is a machine learning technique for reducing over-fitting by biasing towards more succinct models. It is the use of *a priori* knowledge on what constitutes a simpler or more likely model to anchor our inference. A classic example of this is preferring lower-order over higher-order polynomials for describing data unless there is overwhelming evidence otherwise. Of course, what constitutes overwhelming evidence depends on the goals of the scientists. If the only goal is prediction then cross-validation is a good way to test how heavily inference should be regularized. But if the goal is explanation then accordance with existing theory is another important factor to consider.

As such, our choice of focusing on linear fitness functions in section C and Figure 3 can be seen as a form of regularization. In particular, we can see our inference procedure as either restricted to the hypothesis class of linear functions, or as considering the hypothesis class of all polynomials but with prohibitively high costs for non-zero components (*l*_{0} regularization) on orders beyond linear. But we prefer to think of it in terms of operationalization. By introducing a game assay, we are defining the hidden variable of (matrix) games in terms of the measurement procedure that we described in sections B and C.

### Interpretable fitness functions

An uncontroversial case of regularization in our report is the presentation of *w*^{C} in section ‘Frequency dependence in fitness functions’. There, we restrict beyond linear fitness functions to focus on conceptually simple ones. In particular, we favor cell-autonomous functions over frequency dependent ones (i.e. *l*_{0} regularization on the fitness function coefficients) and we favor coefficients that are shared between different *S* and *C*. This results in the following regularized fitness functions:

Note that for both *P* and *R* strategies, we used the proportion of the other strategy (1 - *p*, *p*) as the parameter that captures the non-cell-autonomous contribution. In equations 11,15, we also consider the parameter because of the elegant form it provides.

We can compare these regularized fitness functions to the non-regularized in equations 1-8. As can be seen, all are close to their respective and are actually within the error estimates on . We can see the regularization in action with a push towards a constant base fitness of 0.025 shared by , and .

The absence of frequency dependent perturbation terms for and suggests that these strategies can be explained in terms of cell-autonomous processes. However, the other strategies in the other contexts ask for a non-cell-autonomous explanation.

### Games from interpretable fitness functions

For a visual confirmation that the regularized fitness functions in equations 9-16 are reasonable, we can transform them into regularized games. We do this in the same way as we did for transforming the non-regularized fitness functions in equations 1-8 into the game-points of Figure 4b. The results are in Supplementary Figure 8. The regularized games (points) are within the confidence rectangles of the measured games (boxes), with the exception of DMSO which is just outside its box. This is reasonable given that the boxes correspond to error: i.e. around 2/3rds confidence.

## E Experimental definition of replicator dynamics

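Consider a well that is seeded with an initial number *N*^{I}_{P} of parental and *N*^{I}_{R} of resistant cells; total number *N*^{I} = *N*^{I}_{P} + *N*^{I}_{R}. Let *N*^{F}_{{P,R}} be the number of {parental,resistant} cells after being grown for an amount of time Δ*t*. From this, the experimental growth rate can be defined based on fold change as:

$$w_{\{P,R\}} := \frac{N^F_{\{P,R\}} - N^I_{\{P,R\}}}{N^I_{\{P,R\}}\,\Delta t} \tag{17}$$

This can be rotated into a mapping *N*^{I} ↦ *N*^{F} given by

$$N^F_{\{P,R\}} = N^I_{\{P,R\}}\left(1 + w_{\{P,R\}}\,\Delta t\right)$$

By defining the initial and final proportion of parental cells as *p*^{{I,F}} = *N*^{{I,F}}_{P}/*N*^{{I,F}}, we can find the mapping:

$$p^F = p^I\,\frac{1 + w_P\,\Delta t}{1 + \langle w \rangle\,\Delta t}$$

where ⟨*w*⟩ = *p*^{I} *w*_{P} + (1 - *p*^{I}) *w*_{R}.

We can approximate this discrete process with a continuous one by defining *p*(*t*) = *p*^{I}, *p*(*t* + Δ*t*) = *p*^{F} and looking at the limit as Δ*t* gets very small:

$$\frac{dp}{dt} = \lim_{\Delta t \to 0} \frac{p(t + \Delta t) - p(t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{p\,(w_P - \langle w \rangle)}{1 + \langle w \rangle\,\Delta t} = p\,(w_P - \langle w \rangle)$$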

Thus, we recover replicator dynamics with an explicit experimental interpretation for all of our theoretical terms. Note that we did not make any assumptions about whether things are inviscid or spatial; whether we are talking about individual or inclusive fitness; or whether we have growing populations in log phase or static populations with replacement. All of these microdynamical details are buried in the definition of experimental fitness. This allows us to focus on effective games [27] and avoid potential confusions over aspects like spatial structure [35].
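The limit can also be checked numerically. The sketch below assumes the fold-change definition *w* = (*N*^{F} - *N*^{I})/(*N*^{I} Δ*t*), so that each population grows by a factor 1 + *w* Δ*t* per step; the function names and numbers are made up for the example.

```python
def discrete_step(p, w_P, w_R, dt):
    """One step of the discrete map p_I -> p_F under fold-change growth."""
    mean_w = p * w_P + (1 - p) * w_R
    return p * (1 + w_P * dt) / (1 + mean_w * dt)


def replicator_rhs(p, w_P, w_R):
    """Right-hand side of replicator dynamics, dp/dt = p (w_P - <w>)."""
    mean_w = p * w_P + (1 - p) * w_R
    return p * (w_P - mean_w)


# Usage: the finite difference approaches the replicator equation as dt shrinks.
p, w_P, w_R = 0.4, 0.03, 0.02
for dt in (1.0, 1e-2, 1e-4):
    finite_difference = (discrete_step(p, w_P, w_R, dt) - p) / dt
    gap = abs(finite_difference - replicator_rhs(p, w_P, w_R))
    print(dt, gap)  # the gap shrinks roughly in proportion to dt
```

Note that `replicator_rhs` is algebraically the same as `p * (1 - p) * (w_P - w_R)`, the familiar two-strategy form.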

### Better estimates of *w*

The problem with the definition of *w* in equation 17 is that it depends on just two time points, and is thus poorly suited to quantifying error. In our experimental system, we are able to peek inside the system with time-lapse microscopy. This allows us to get more than just the initial and final population sizes and to replace fold-change by the more specific measurements of inferred growth rates for *w*_{{P,R}} that we describe in section B. An advantage of this approach is that the goodness-of-fit of the exponential growth model provides a good estimate of the error associated with each measurement of *w*. Thus, we are able to quantify error within each well and not just between experimental replicates in different wells with similar initial conditions.

### Accounting for finite Δ*t*

Practically, our experimental system cannot take the limit as Δ*t* goes to 0 because of a precision-accuracy trade-off. For very short measurements, we might get higher accuracy (assuming biological factors like time from seeding to adherence could be ignored) but would have incredibly low precision (due to only one, two or three time points from which to calculate growth rate). As we increase the time of the experiment, the accuracy might decrease but the precision will tend to increase. Given the biological constraints of our system, we judged that 5 days was a good trade-off point. This will most likely be different for other experimental systems.

Given that the *w* are defined over a finite range of time, we need to pick a particular time-point to associate each measurement with. As is common for discrete-time processes, we attribute the value of the growth rate to the initial point. In particular, this means that when we make *w*_{{P,R}} a function of *p* in the main text, the values of growth rate are attributed to the initial proportion of parental cells and not the final one. This customary choice is further reinforced by the fact that we have a less noisy estimate of the initial proportions of cells than of the final ones, and so other definitions would lead to less precise measurements.

Finally, our procedure can be viewed as standard competitive fitness assays but with initial ratio of the two types as a varied experimental parameter. Thus, for consistency with both theoretical and experimental literature, we associated the growth rates with the initial – more controlled – seeding proportion.

## Acknowledgements

JGS would like to acknowledge the NIH Loan Repayment program for their generous support of his research in general as well as Miles for Moffitt for support of this work. We would also like to thank Mohamed Abazeed, Peter Jeavons, and Konstantine Kaznatcheev for helpful discussions.