Optimal Therapy Scheduling Based on a Pair of Collaterally Sensitive Drugs

Despite major strides in the treatment of cancer, the development of drug resistance remains a major hurdle. One strategy which has been proposed to address this is the sequential application of drug therapies where resistance to one drug induces sensitivity to another drug, a concept called collateral sensitivity. The optimal timing of drug switching in these situations, however, remains unknown. To study this, we developed a dynamical model of sequential therapy on heterogeneous tumors comprised of resistant and sensitive cells. A pair of drugs (DrugA, DrugB) are utilized and are periodically switched during therapy. Assuming resistant cells to one drug are collaterally sensitive to the opposing drug, we classified cancer cells into two groups, AR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_\mathrm{R}$$\end{document} and BR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_\mathrm{R}$$\end{document}, each of which is a subpopulation of cells resistant to the indicated drug and concurrently sensitive to the other, and we subsequently explored the resulting population dynamics. Specifically, based on a system of ordinary differential equations for AR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_\mathrm{R}$$\end{document} and BR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_\mathrm{R}$$\end{document}, we determined that the optimal treatment strategy consists of two stages: an initial stage in which a chosen effective drug is utilized until a specific time point, T, and a second stage in which drugs are switched repeatedly, during which each drug is used for a relative duration (i.e., fΔt\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f \Delta t$$\end{document}-long for DrugA and (1-f)Δt\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(1-f) \Delta t$$\end{document}-long for DrugB with 0≤f≤1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$0 \le f \le 1$$\end{document} and Δt≥0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta t \ge 0$$\end{document}). We prove that the optimal duration of the initial stage, in which the first drug is administered, T, is shorter than the period in which it remains effective in decreasing the total population, contrary to current clinical intuition. We further analyzed the relationship between population makeup, A/B=AR/BR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {A/B} = A_\mathrm{R}/B_\mathrm{R}$$\end{document}, and the effect of each drug. We determine a critical ratio, which we term (A/B)∗\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {(A/B)}^{*}$$\end{document}, at which the two drugs are equally effective. As the first stage of the optimal strategy is applied, A/B\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {A/B}$$\end{document} changes monotonically to (A/B)∗\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {(A/B)}^{*}$$\end{document} and then, during the second stage, remains at (A/B)∗\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {(A/B)}^{*}$$\end{document} thereafter. Beyond our analytic results, we explored an individual-based stochastic model and presented the distribution of extinction times for the classes of solutions found. Taken together, our results suggest opportunities to improve therapy scheduling in clinical oncology.

further analyzed the relationship between population makeup, A/B = A R /B R , and the effect of each drug. We determine a critical ratio, which we term (A/B) * , at which the two drugs are equally effective. As the first stage of the optimal strategy is applied, A/B changes monotonically to (A/B) * and then, during the second stage, remains at (A/B) * thereafter. Beyond our analytic results, we explored an individual-based stochastic model and presented the distribution of extinction times for the classes of solutions found. Taken together, our results suggest opportunities to improve therapy scheduling in clinical oncology.
Keywords Cancer · Evolution of resistance · Dynamical systems · Optimal control

Introduction
Drug resistance is observed in many patients after exposure to cancer therapy and is a major hurdle in cancer therapy (Holohan et al. 2013). In most cases, treatment with appropriate chemo-or targeted therapy reliably reduces tumor burden upon initiation. However, in the majority of cases, resistance inevitably arises, and the disease relapses (Gillies et al. 2012). The observation of relapse is typically accomplished during surveillance through imaging, or in some cases a blood-based marker (Monzon et al. 2009;Legler et al. 1998). Disease recurrence is observed, at the earliest, when the disease burden reaches some threshold of detection, at which point the first-line therapy is deemed to have failed and a second-line drug is used to control the disease (see Fig. 1a). We argue herein that a redesign of treatment should start earlier than this time point, not only because the detection threshold is higher than the minimum disease burden, but also because the first drug could become less efficient as the duration of therapy reaches T max . In this research, we focus on the latter reason and figure out how much earlier we should switch drug in advance of T max , assuming that the former reason is less important (t DT − t o ≈ T max ).
While for many years it was assumed that tumors were simply collections of clonal cells, it is now accepted that tumor heterogeneity is the rule (Marusyk et al. 2012). The simplest manifestation of this heterogeneity can be represented by considering the existence of both therapy resistant and sensitive cell types coexisting prior to therapy (Amirouchene-Angelozzi et al. 2017), with the future cellular composition shaped by the choice of drugs (illustrated in Fig. 1b). Beyond simple selection for resistant cells, cells can also become altered toward a resistant state during treatment, either by (i) genetic mutations (Thomas et al. 1996;Jonsson et al. 2017) or by (ii) phenotypic plasticity and resulting epigenetic modifications (Scheel and Weinberg 2011;Kaznatcheev et al. 2017;Pisco et al. 2013).
To combat resistance, many strategies have been attempted, including multi-drug therapies targeting more than one cell type at a time. While multi-drug therapy has enjoyed successes in many cancers, especially pediatric ones, the resulting combinations can often be very toxic. Further, recent work has suggested that the success of multi-drug therapy at the population level is likely overstated in individuals, given intra-patient heterogeneity (Palmer and Sorger 2017). Recently, researchers have sought specific sequential single-drug applications that induce sensitivity, a concept It increases initially and then decreases as of the therapy starting point (t 0 ) and eventually rebounds after the maximum period with positive therapy effect (T max ). Relapse is found, at the earliest, when disease burden reaches detection threshold at t DT . b Change in composition of tumor cell population when a pair of collaterally sensitive drugs are given one after another is called collateral sensitivity (Hutchison 1963;Dhawan et al. 2017;Imamovic and Sommer 2013;Zhao et al. 2016). In some cases, several drugs used sequentially can complete a collateral sensitivity cycle (Imamovic and Sommer 2013;Dhawan et al. 2017), and corresponding periodic drug sequence can be used in the prescription of long-term therapies-though the continued efficacy of this cycle is not guaranteed . In this research, we focus on a drug cycle comprised of just two drugs, each of which can be used as a targeted therapy against cells that have evolved resistance to the previous drug (illustrated in Fig. 1b).
The underlying dynamics of resistance development has previously been studied using cell populations consisting of treatment sensitive and resistant types, using either genotypic or phenotypic classifications (Tomasetti and Levy 2010). Additionally, others have justified their choices of detailed cellular heterogeneities using: (i) stages in evolutionary structures Komarova and Wodarz 2003), (ii) phases of cell cycle (Goldie 1982;Gaffney 2004Gaffney , 2005Boston and Gaffney 2011), or (iii) spatial distribution of irregular therapy effect Rejniak 2005). Among these, researchers (including Tomasetti and Levy 2010;Gaffney 2004Gaffney , 2005Chen et al. 2013;Katouli and Komarova 2011) have studied the effect of a pair of collaterally sensitive drugs as we propose here, using the Goldie-Coldman model or its variations Katouli and Komarova 2011;Goldie and Coldman 1979;Coldman and Goldie 1983). These models utilize a population structure consisting of four compartments, each of which represents a subpopulation that (1) Fig. 2 Schematic of dynamics between sensitive cells population, C S , and resistant cells population, C R , (left panel) and the differential system of {C S , C R } (right panel) with s−proliferation rate of sensitive cells, r −proliferation rate of resistant cells, g−transition rate from C S to C R is either (i) sensitive to the both drugs, (ii) and (iii) resistant to one drug, respectively, or (iv) resistant to both.
In this manuscript, we propose a modeling approach which is the minimal model sufficient to study the effects of two populations of cells and two collaterally sensitive drugs. The model's simplicity facilitates exact mathematical derivations of useful concepts and quantities and illustrates several novel concepts relevant to adaptive therapy. The remainder of the manuscript is structured as follows. In Sect. 2, we outline the model and define terms. In Sect. 3, we present analysis of drug switch timing and duration. In Sect. 4, we relax several assumptions in our analytic model and study extinction times in a stochastic formulation, which agrees well with analysis in the mean field. In Sect. 5, we conclude and present work for future directions.

Modeling setup 2.1 Basic cell population dynamics under a single-drug administration
Based on the sensitivity and resistance to a therapy, the cell population can be split into two groups. We refer to the population sizes of sensitive cells and resistant cells as C S and C R , respectively, and then use the total cell population size, C P := C S + C R , to measure disease burden and drug effect.
We account for three dynamical events in our model: proliferation of sensitive (s) and resistant cells (r ), and transition between these cell types (g). Here, net proliferation rate represents combined birth and death rate, which can be positive if the birth rate is higher than the death rate or negative otherwise. It is reasonable to assume that, in the presence of drug, the sensitive cell population size declines (s < 0) and resistant cell population size increases (r > 0) and that g > 0. Therefore, for the remainder of the work we consider only conditions in which s < 0, r > 0 and g > 0. Figure 2 illustrates the population dynamics and the system of ordinary differential equations that {C S , C R } obey. The solution of the system (1) is where C P (t) is a positive function comprised of a linear combination of exponential growth (e r t ) and exponential decay (e −(g−s) t ) with positive coefficients. Despite the limitations of simple exponential growth models Gerlee (2013), we feel it is a reasonable place to start, since the relapse of tumor size starts when it is much smaller than its carrying capacity which results in almost exponential growth. C P has one and only one minimum point in {−∞, ∞}, after which C P increases monotonically. If C P (0) = s C 0 S +r C 0 R ≥ 0, the drug is inefficient (C P (t) is increasing on t ≥ 0, see an example in Fig. 3a). Otherwise, if C P (0) < 0, the drug is effective in reducing tumor burden at the beginning, although it will eventually regrow (due to drug resistance; see example in Fig. 3b).

Cell population dynamics with a pair of collateral sensitivity drugs
Here we describe the effect of sequential therapy with two drugs switched in turn, by extending the model for a single-drug administration (system (1)). Assuming that the drugs are collaterally sensitive to each other, cell population is classified into just two groups reacting to the two types of drugs in opposite ways. Depending on which drug is administered, cells in the two groups will have different proliferation rates and direction of cell-type transition (see Fig. 4). That is, the population dynamics of the two groups follow a piecewise continuous differential system consisting of a series of the system (1), each of which is assigned to a time slot bounded by drug switching times.
In summary, we assume that: •

Drug switch timing
To begin exploring the possible strategies of drug switching and timing within our model, we first tested an idea based on clinical intuition. As we discussed, the norm in the clinic is to change drugs when failure is observed either radiographically or through a biomarker. We know, however, that the true failure occurs somewhat before this, yet at that time it is below the threshold of detection. To model drug switching at the point of "true failure," the intuitive (yet unobservable) time point when the tumor population begins to rebound, we switch the drugs at the global minimum point of tumor size which we term T max (see Fig. 1a), which was shown to exist uniquely in the previous section if and only if C R (0)/C S (0) < −s/r . The expression for T max , derived from our model, is with (R/S) 0 := C R (0)/C S (0). (See "Appendix A.1" for this derivation.) We see that the quantity T max depends only on (i) the parameters of the drug being administered and (ii) the initial population makeup. In the Drug A-based therapy, it is T max ( p A , (A/B) 0 ), and in the Drug B-based therapy, it is T max In addition to T max , another important time point is T min , explained below. Since the rate of population decrease is almost zero around T max , with no switch (see the black curve in Fig. 6), we seek to find a way to extend the high rate of population decrease by switching drugs before T max . To decide how much earlier to do so, we compared the derivative of C P under constant selective pressure (no switch) at an arbitrary time point, t 1 , and compared it to the right derivative of C P at t 1 with the drug switch assigned to t 1 .
For example, if the first drug is Drug A and the follow-up drug is Drug B (illustrated in Fig. 6), we compare Here, {A R , B R } at t 1 are initial conditions in the expressions, so the derivatives are measured at t = 0. This comparison reveals that the two derivatives are equal if and only if t 1 is a specific point (T min (see the yellow curve in Fig. 6)) The derivative when the drugs are switched is lower (decreasing faster) if and only if t 1 > T min (see the blue and green curves in Fig. 6), and the derivative when the drugs are not switched is lower if and only if t 1 < T min (see the red curve in Fig. 6).
The general form of T min depends on the parameters of the "pre-switch" drug {s 1 , r 1 , g 1 } and for the "post-switch" drug {s 2 , r 2 }, as well as the initial population ratio between resistant cells and sensitive cells to the "pre-switch" drug (R/S) 0 (see "Appendix A.1" for details derivation). Here, the transition parameter in the second drug (g 2 ) and the respective values of the two populations are unnecessary in the evaluation of T min , which is found to be In the Drug A-to-Drug B switch, it is T min ( p A , p B , (A/B) 0 ), and in the Drug B-to- It is important to note that the population curve with a single drug switch after T min (and before T max , assuming that T min < T max ) is not guaranteed to be lower than that of a single drug switch at T max over the entire time range. As an example, as illustrated in Fig. 6, the green curve relevant to the switch at (T min + T max )/2 and the blue curve relevant to the switch at T max intersect at t ≈ 58 and the blue curve is DrugA alone switch to DrugB at t 1 2 T min switch to DrugB at t T min switch to DrugB at t 1 2 T min 1 2 T max switch to DrugB at t T max Fig. 6 Comparison of total population curves with a one-time drug switch from Drug A to Drug B at different time points (i) at < T min (worse than without switch, red curve), (ii) at T min (same as without switch, yellow curve), (iii) between T min and T max (better than without switch, green curve), and (iv) T max (better than without switch, blue curve). Each color represents cell population size during and after a drug switch using each switching strategy. The dashed yellow and black curve represents the overlap between the yellow and black curves. The tangent lines of the population curves at the chosen drug switch time points are illustrated above. Parameters: DrugA alone starting After DrugA to DrugB switch at t Tmax

Instantaneous switch starting at i t 0 and ii t Tmin
Arbitrary schedule with initial DrugA to DrugB switch earlier than Tmin Arbitrary schedule with initial DrugA to DrugB switch between Tmin and Tmax lower after the time of this intersection. However, sequential drug switches starting between T min and T max create the possibility of finding a better drug schedule than the T max -based strategy. Figure 7 shows possible choices of follow-up switches (green and black curves) which achieve better results than a T max -switch (red curves), unlike the drug switches starting before T min , which remain less effective (magenta curve).
The optimal drug switching scheme will be discussed in detail in Sect. 4. The optimal scheduling for the example shown in Fig. 5 starts by using the first drug until T min (blue curve for 0 < t ≤ T min ) followed by a rapid exchange of the two drugs afterward (black curve for t > T min ). Switching before T max , that is, before the drug has had its full effect, goes somewhat against clinical intuition and is therefore an opportunity for unrealized clinical improvement based on a rationally scheduled switch at T min . In order to realize this however, there are conditions about the order of T max and T min which must be satisfied. In particular: In our analysis and simulations, we will deal with the cases mostly satisfying r 1 r 2 < s 1 s 2 , as otherwise the choice of drugs is not powerful to reduce the cell population (explained in detail in the next section and Fig. 8).
This window of opportunity, where the clinical gains could be made, which we will term T gap , is the difference between T min and T max . This relationship allows us to compare T min and T max using different parameters.
We analyze sensitivity of T gap over a reasonable space of non-dimentionalized drug parameters in "Appendix B." As expected, as the proliferation rates under the second drug increase (r 2 ↑ and/or s 2 ↑), the optimal time to switch to the second drug is delayed (T min ↑ and T gap ↓). As r 1 increases, both T min and T max decrease. However, T max decreases more than T min does, so overall T gap decreases. s 1 and T gap do not have a monotonic relationship. As s 1 increases, T gap increases for a while (when s 1 is relatively low) and then decreases afterward (when s 1 is relatively high).

Population makeup and drug effect
In the previous section, the derived time points (T min , T max ) are dependent on the initial population makeup ((R/S) 0 ) from Equations (4)-(5), but not on explicit size of the total population or subpopulations. This makes sense, since absolute population size plays a role by scaling overall behavior of populations (C P (t|{s, r, g}, (2)), and T min and T max are both defined by derivatives at the time points (i.e., C P (T max ) = 0, and from (5)). In this section, we seek to clarify the relationships between population makeup and therapy effects defined using C P (t) and roles of T min and T max in these relationships. We first define functions of the ratio between the two cell subpopulations: We further define functions measuring drug effectiveness as the relative rate of population change depending only on R/S and drug parameters: In the case where we classify cells as A R and B R , we similarly define their population makeup as: Then, A/B at T min , using a Drug A-to-Drug B switch (T A min ), and A/B, using a Drug B-to-Drug A switch (T B min ), are equivalent: and further, as s < 0 and r > 0, values of A/B are all positive. We give a more thorough description of (9) and (10) in "Appendix A.1." The effects of Drug A (specified by p A ) and Drug B (specified by p B ), both defined by (8), are equivalent at T min , that is since the Drug A resistant cell population is relatively smaller than the population of the other cell type; otherwise, Drug B has a more beneficial effect. When t = T A max , and therefore when A/B(t) = −s A /r A , Drug A has no effect on population reduction (i.e., is getting smaller, Drug A becomes effective. Furthermore, the smaller A/B is, the better the effect Drug A has. Similarly, the effect of Drug B is zero when t = T B max and A/B(t) = −r B /s B ) and increases as A/B increases above it (see Fig. 8).
The population makeup changes in the opposite direction as Drug A (or Drug B) therapy continues and A/B therefore continues to increase (or decrease). Therefore, if Drug A (or Drug B) is given too long, it goes through a period of no or almost no effect around A/B = −s A /r A (or around A/B = −r B /s B ), but once the drug is switched after that, there will be a higher therapy effect with Drug B (or with Drug A). These two opposite aspects are balanced by switching the drug when the population makeup reaches (A/B) * , which is applied to the optimal therapy regimen described in the next section. Depending on condition (6), the order of the three population makeups at T min , T A min and T B max changes. In particular, if r A r B < s A s B , there exists an interval These results are illustrated in Fig. 8.

Optimal scheduling and its clinical implementation
In this section, we describe a drug switching schedule design to achieve the best effect possible with a pair of collaterally sensitive drugs. The area under the curve of the total population simulated under an assigned treatment strategy is utilized to measure the aggregate effect of the strategy. The smaller the area, the better the corresponding strategy. The numerically determined optimal strategy consists of two stages: • Stage 1 Treat with first drug until reaching the population makeup where the effects of each drug are balanced ((A/B) * ), that is until the T min of the first drug. • Stage 2 Begin switching drugs with a specific temporal ratio (represented by k or k , see Fig. 9) determining the difference in the treatment duration of each drug, and switching frequently (represented by t ≈ 0). Both conditions are used to keep A/B close to constant near (A/B) * .
We represent the relative durations of Drug A compared to the duration of Drug B in Stage 2 by k and k . The explicit formulation of k can be derived from the solution of differential equations (2). To do so, we (i) evaluate the level of A/B after t time has passed during Drug A therapy, when starting with , and then (ii) by measuring the time period taken to regain through therapy with Drug B, denoted by t , and finally (iii) taking the ratio between the two therapy periods, which is k := t/ t . k depends on the frequency of drug switching and model parameters: This k is consistent with k = k ( t, p A , p B ), which is the ratio similarly evaluated with Drug B as the first therapy and Drug A as the follow-up therapy, in the optimal case of instantaneous switching: For a more detailed derivation of k * , see "Appendix A.1." We further studied how sensitive k * (or f * = k * /(1 + k * )) is over a reasonable range of non-dimentionalized { p A , p B } (see "Appendix B" for details). k * (or f * ) increases, as r A and/or s B decreases as s A and/or r B increases. Figure 10 shows examples of population curves with the optimal strategy (T min switch) and one non-optimal strategy (T max switch) using the same choice of parameters/conditions. Visual comparison of total population curves (Fig. 10a) reveals that the predicted optimal strategy outperforms the intuitive strategy. To quantitatively compare the efficacy of each strategy, we can use area between the two population curves. This area is: With a choice of upper limit large enough to include most treatment schedules, x = 100 (days), we used sensitivity analysis of the integral (13) (see "Appendix B" for the details). The advantage of the optimal treatment strategy is demonstrated by the lower population sizes in all cases. And the evaluations of the areas under the population curves from t = 0 to a range at the upper limit of integration (Fig. 10b) confirm the superior effect of the optimal strategy over time. Figure 10c shows the typical pattern of A/B in the optimal therapy compared to the other, which is monotonically changing toward (A/B) * in the first stage and constant in the second stage. While our theory predicts optimality with "instantaneous drug switching," we realize this is not clinically feasible. Therefore, the instantaneous drug switching in Stage 2 could be approximated by a high-frequency switching strategy with t 0 along with the corresponding k( t) from (11), or k * (12) independent from t. As expected, the smaller t is chosen, the closer the population follows the ideal case with t = 0 (see "Appendix C" for the details), but improvements can still be made over non-strategic switching, if the temporal ratio is followed.
We have proved that the effect of instantaneous drug switching, with an arbitrary ratio in duration between two drugs (k), is consistent with the effect of a mixed drug with a relative dosage ratio, which is also k (Theorem A.8 in "Appendix A.2"). The theorem is used in the derivation of a differential system/solution of the optimal strategy (Theorem A.11 in "Appendix A.3"). According to these results, in Stage 2 of optimal regimen, all types of populations, A R , B R and A R + B R , change with the same constant proliferation rate: While not clinical proof, these theoretical results suggest a method of application of two drugs in sequence, which would approximate multi-drug therapy in efficacy, but which could be free of the increase in side effects from the combination.

Studying extinction time with a stochastic formulation
In the previous sections, we utilized an entirely deterministic model of heterogeneous tumor growth. Cancers, however, are not deterministic, and without stochasticity in our system, we could not model an important part of cancer treatment: extinction. We therefore constructed a simple individual-based model using a Gillespie algorithm (Gillespie 1976) to study this critical aspect of therapy that is not limited by the assumptions we were required to make for purposes of analytic tractability. Our stochastic model depends not only on net proliferation rates (s, r , see Equation (1)) but also on the combination of birth rates (b S , b R ) and death rates (d S , d R ) where s = b S − d S and r = b R − d R . These five parameters (b s , b r , d s , d r , g) govern the probabilities of events occurring. The time at which one of these events occurs is determined by an exponential probability distribution, and we represent the algorithm as pseudo-code thus: Step 3) t ← t + dt and repeat (Step 2) until a set time has passed or extinction has occurred.
We expanded the stochastic process for a single drug to treatment with two drugs being switched in turn, as in our ODE system (see "Appendix D," for the details of the computational code). Figure 11a shows the consistency between the mean field behavior of the stochastic model and the ODE system.
Despite the generally similar patterns of population curves simulated with same {s, r, g}-type parameters and initial conditions, we observe differences in terms of elimination time if birth/death combinations are different. To quantify these differences, we directly studied the elimination times (defined as the distribution of times to the absorbing state of total population = 0) simulated with different combinations of birth/death rates, with a choice of fixed proliferation rates (as well as other fixed transition rates and initial condition). We defined an index to represent different levels of birth and death rate combinations: Increased I stoch result in larger fluctuations, and these fluctuations then increase the probability of reaching the absorbing state which is extinction (tumor cure). The relationship between I stoch and extinction time is shown in Fig. 11b. The relationship is approximated by a linear model with slope, −93.68 (days 2 ), p value of the slope, p < 0.05, and squared residual of regression, r 2 = 0.1726.

Conclusions and discussion
The emergence of resistance to the best current cancer therapies is an almost universal clinical problem, and the solution to this represents one of the greatest unmet needs in oncology. While much effort has been put into novel drug discovery to combat this, there is also a growing interest in determining the optimal sequences, or cycles of drugs that promote collateral sensitivity. To study this second paradigm, we proposed a simple dynamical systems model of tumor evolution in a heterogeneous tumor composed of two cell phenotypes. While in reality, cell phenotype can be defined in many ways, here we completely describe it by considering only sensitivity (or resistance) to a pair of collaterally sensitive drugs, which is encoded in their differential growth rates in specific conditions. While the resulting mathematical model conveys only simple, but essential, features of cell population dynamics, it does yield analytical solutions that more complex models cannot.
Our original motivation was to consider more complicated sequences, or cycles of drug therapy; however, the model presented herein is difficult to apply for an expanded system of more than two drugs. On the other hand, the cell classification used by others (Tomasetti and Levy 2010;Goldie and Coldman 1983;Katouli and Komarova 2011;Goldie andColdman 1979, 2009) considers sensitivity and resistance independently, or even specifically to a given, abstracted, genotype (Komarova and Wodarz 2005;Nichol et al. 2015). Therefore, in the case of 2 drugs, there are 2 2 = 4 groups, (i) sensitive to both drugs, (ii) and (iii) resistant to only one drug, and (iv) resistant to both drugs. This formulation could be expanded and applied to more than two drugs (Tomasetti and Levy 2010;Goldie and Coldman 2009). Also, in other earlier researches, cell populations are divided by more specific criteria for the choices of cancers and drugs [e.g., level of protein expression, enzyme inhibitors, or growth factors (Kaznatcheev et al. 2017;Pisco et al. 2013;Jonsson et al. 2017)]. We will consider both of the general and specific approaches of population classification in future work.
The simplicity of our exponential growth/decay model arises from the assumption of a constant growth rate. Use of exponential growth is likely not overly inappropriate, as we are most interested in the development of resistance-and resistance is typically thought to begin when the tumor burden is much smaller than the carrying capacity. However, the assumption might have oversimplified patterns of cell growth, which is assumed to be non-exponential by others (e.g., logistic growth Gerlee 2013; Berry et al. 1984;Atuegwu et al. 2013), due to the limited space and resources of the human body for tumor growth, as well as increasing levels of resistance (increasing growth rates) in the face of continued selective pressure . We will consider the concept of changing growth rates in terms of time and population density and explore its effect on our analytical results (such as T gap (A/B) * and k * ) in future work.
We provided a strategy for drug switching which yields the best possible effect in this model system, i.e., the fastest decrease in cell population. The strategy is defined explicitly in terms of parameters determined by the drugs that are used; therefore, the applicability of our model relies on the availability of drug parameters. Drug parameters for several drugs are known based on in vitro experiment or clinical studies (Jackson and Byrne 2000;Wilson et al. 1997). However, these parameters are not available for all drugs, and even the usefulness of in vitro results may change from one patient to the next. Because of this, we propose focusing our future work on learning to parameterize models of this type from individual patient response data. Examples of parameterizing patient response from imaging (Swanson et al. 2011) as well as bloodbased markers (Werner et al. 2016) already exist, suggesting this is a reasonable goal in the near future.
In our optimized treatment regimen, we must first apply Drug A (if Drug A is better at the initial time, i.e., A/B(0) < (A/B) * ; see Fig. 8). Surprisingly, the ideal treatment course switches to Drug B, while Drug A is still effective at reducing the total population. Since treatment should ideally switch before the tumor relapses, our study justifies the search for techniques that either identify or predict resistance mechanisms early. Our study also argues against the opposite extreme, wherein resistant cells are targeted at the beginning of treatment. The preponderance of cells sensitive to the standard of care makes this treatment initially ideal and does not preclude eventual success in our model. Further, the rapid tumor size reduction, associated with targeting the larger sensitive population first, could be clinically meaningful.
Our stochastic model allowed us to explore the contributions of cell birth and death separately, as opposed to the ODE which could only consider the net growth rate. These parameters can be altered in cancer since cancer treatments have various cytostatic and cytotoxic effects, and therefore, different treatments can have different effects on death and birth. In our model, increasing the total birth and death rate (as opposed to the net growth rate) caused, on average, extinction earlier in time (Fig. 11b). This can be explained by the fact that extinction is the only absorbing state in our model, and therefore, higher death rates determine when extinction occurs, even when birth rates are also higher. Our stochastic model therefore suggests that highly cytotoxic drugs (even those with correspondingly minimal cytostatic effects) are more effective at eliminating tumors, at least when the tumor population is small.
In summary, we have presented a simple model of a heterogeneous, two-phenotype tumor, with evolution occurring between resistant and sensitive states. We derive exact analytic solutions for tumor response in temporally changing drug conditions and find an optimal regimen which involves drug switching after a specific, critical time point which occurs before resistance would normally be clinically evident. While our model is highly simplified, we have identified several opportunities to improve our understanding and treatment of drug resistance, and also future opportunities for new modeling endeavors.

T max : Equation (4)
T max is a minimum point of C p (t) (from (3)). Therefore, T min : Equation (5) Let us consider the case of drug switch with Drug A being the "pre-switch" drug and Drug B being the "post-switch" drug. If, at a specific time point t 1 , cell population is decreasing faster by continuing Drug A-therapy than by changing drug to Drug B, (2). Then, Similarly, if and only if the population is dropping faster using Drug B than by continuing to use Drug A, and if and only if the population is dropping at an equal rate with either drug.
The general form of T min is where the parameters of "pre-switch" and "post-switch" drugs are {s 1 , r 1 , g 1 } and {s 2 , r 2 , g 2 }, respectively, and initial population makeup, (R/S) 0 , is the resistant cell population divided by the sensitive cell population for the "pre-switch" drug.

It is clear that
, T max and T min from Eqs. (2), (5) and (4). Otherwise, it can be proved more simply using the concept of T min and T max . Since C S (t) + C R (t) = s C S (t) + r C R (t), from the differential system (1), the derivatives of A R (t) + B R (t) are s A B R (t) + r A A R (t) and s B A R (t) + r B B R (t) under Drug A and Drug B, respectively. At T min (whether it is T A min or T B min ), the derivatives of total populations are equivalent either under Drug A or under Drug B. Then, 5. k * : Equation (12) The sizes of the subpopulations after t-long therapy with Drug A started from initial (2), with some constant K scaling population size. Then, the population makeup at the t and its derivative in terms of t are The time taken from t = t to reach back to the time of from Eq. (5). Then, the relative ratio between the periods of Drug A and Drug B, k , illustrated in Fig. 9, and its limit, k * , can be derived using:

A.2: Differential system of instantaneous drug switch
The goal of this section is to derive the simple differential equations of V = {A R , B R } under instantaneous drug switch (Theorem A.8). For the sake of convenience, we want to use matrix operations and equations based on the vectors and matrices defined below.

Definition D
Proposition A.1 Using Drug A therapy: Using Drug B therapy:  Then, we need to prove that F(n) = L for n = 1, 2, 3, . . . If n = 1, Otherwise, if n ≥ 2 and F(m) = L for all 1 ≤ m ≤ n − 1, Proof Using mathematical induction, if n = 1, (by Proposition A.3 and Lemma A.5) The equality is true for n = 1 If n ≥ 2, and the equality works for all integers 1 ≤ m ≤ n − 1, (by the inductive assumption and Proposition A.3 and Lemma A.5) The equality is true for n ≥ 2) Therefore, proved.
For any time point t 0 , let us define V (t) as a vector-valued function of A R (t) and B R (t) describing the cell population dynamics under a periodic therapy starting at t 0 with Drug A assigned at t 0 +m ≤ t < t 0 +(m+ f ) and Drug B at t 0 +(m+ f ) ≤ t < t 0 + (m + 1) for m = 0, 1, 2, 3, .... Then, by Proposition A.1 and the definitions of A and B, . And, V 0 (t) represents instantaneous drug switching.
For any t > 0 and any positive integer n, there exists = (n, t) such that Then by the squeeze theorem, lim t→0 (n, t) = 0 for any positive integer n, and lim n→∞ t n (n, t) = 1 for any t > 0.

A.3: Population dynamics with the optimal regimen
In this section, we want to write the differential equations of V = {A R , B R } under the optimal control strategy described in Section 3.3. Based on "Appendix A.2" and a couple of lemma/theorem, we will reach to a concise form of a differential system described at Theorem A.11.
Theorem A.10 In Stage 2 of the optimal strategy, both A R and B R change with a constant net proliferation rate, Proof Without a loss of generality, let us prove it only when A/B(0) < (A/B) * . If A/B(0) < (A/B) * , Drug A has a better effect initially. So following the optimal therapy scheduling, Drug A is assigned alone at the beginning as long as T A min = T min ( p A , p B , A/B(0)) (Stage 1), and then, Stage 2 starts at T A min with initial condition By Lemma A.9, V (T A min ) is an eigenvector of D * with the corresponding eigenvalue, λ. Then, the solution of (**2) with the initial value (**1) is Proof Straightforward, by Theorem A.10 Contour maps of T gap over ranges of 10 ≤ a ≤ 100 for a ∈ {−s 1 , −s 2 , r 1 , r 2 } = {−s 1 , −s 2 , r 1 , r 2 }/g 1 and r 1 r 2 < s 1 s 2 (Condition (6)). As −s 2 decreases and/or r 2 increases, the optimal switching timing to the second drug is delayed (T min ↑ and T gap ↓). As r 1 increases, T gap decreases. Also, T gap and s 1 have a non-monotonic relationship as shown on the graphs In general, cells mutate slower than they proliferate , so we ran sensitivity analysis on T gap for all a 1 for a ∈ {−s 1 , −s 2 , r 1 , r 2 }. Figure 12 shows T gap over the range of 20 ≤ −s 1 , −s 2 , r 1 , r 2 ≤ 100. So, under the assumption that g 1 min{−s 1 , −s 2 , r 1 , r 2 }, T gap ({s 1 , r 1 }, {s 2 , r 2 }) ≈ ln −s 1 (r 1 − s 2 ) r 1 (r 2 − s 1 ) r 1 − s 1 , which approximates the contour curves in Fig. 12. 2. Sensitivity analysis of f * Regarding the regulated intensities among the two drugs, k * , we assumed that g 1 ≈ g 2 := g, similarly assuming that they are both much smaller than {−s 1 , −s 2 , r 1 , r 2 }. Then, we normalized all the parameters with the unit of g, like {s 1 , r 1 |s 2 , r 2 } := 1 g {s 1 , r 1 |s 2 , r 2 }. Contour maps of f * over ranges of 10 ≤ a ≤ 100 for a ∈ {−s 1 , −s 2 , r 1 , r 2 } = {−s 1 , −s 2 , r 1 , r 2 }/g and r 1 r 2 < s 1 s 2 (Condition 6). k * (or f * ) increases, as r 1 and/or −s 1 decreases and/or as r 2 and/or −s 2 increases k * can be rewritten in terms of the dimensionless parameters.

Appendix C: Clinical implementation of instantaneous switch in the optimal strategy
In clinical practice, the instantaneous drug switch which we suggest in the second stage of the optimal treatment scheduling is not implementable. Therefore, we compared similar schedules to the optimal case. In the "similar" schedules, the first stage, using an initial drug, remained the same as the optimal schedule. However the second part, Fig. 15 Graphs showing regular drug switching in Stage 2 with different { t, k( t, p A , p B )}: t = 1 day (blue), t = 4 days (red), t = 7 days (green), and t = 10 days (magenta). Parameters/conditions: p A = {−0.18, 0.008, 0.00075}/day, p B = {−0.9, 0.016, 0.00125}/day and {A 0 R , B 0 R } = {0.1, 0.9}, a Total population histories, C n P for n ∈ {1, 4, 7, 10} days, b Differences between the optimal population history C * P (i.e., when t = 0) and each case with positive t (i.e., C n P − C * P ). The inserts interesting ranges. c and d are equivalent with (a) and (b) except that k * ( p A , p B )} has been used instead of k( t, p A , p B )} (Color  figure online) where we previously used an instantaneous switch (with t = 0), was modified to use a fast switch ( t 0). Figure 15a, b shows how instantaneous switching ( t = 0) and fast switching (multiple choices of t 0) compare in terms of population size using different drug parameters. As expected, the smaller t is , the closer to the ideal case. And, a choice of a reasonably small t (like 1 day or 3 days) results in an outcome quite close to the optimal scenario.
We repeated this exercise with k * (from Eq. 12) instead of k( t) modulated by t (Fig. 15c, d). Only small differences are observed between Fig. 15a, b and c, d which justifies the general usefulness of k * independent of t.