Envelope Simulation: A New Adaptive Method for Dose Optimization

We present a flexible and general adaptive experimental design for dose finding clinical trials. The method can be applied in stereotypical settings such as determining a maximum tolerated dose, when the dose response relationship is complex, or when the response is an arbitrary quantitative measure. Our design generalizes dose finding methods such as the continual reassessment method (CRM), the modified CRM, and escalation with overdose control (EWOC). Like those methods, our design requires a working mathematical model, which assures efficiency, low bias, and a higher fraction of doses selected near the optimum compared to purely operational designs. Unlike typical model based designs that always employ the same model, our method allows individual dose response models tailored to the circumstance. Simulations are also integral to our design, allowing it to account for both known and unknown effects of dose on outcome. The method is applicable to general dose finding problems such as those encountered with modern targeted anti-cancer agents, immunotherapies, and titrations against biomarker outcome measures. It can also support improvements in the design of the experiment being conducted by providing a platform for concurrent simulations to assess the influence of projected design points. Although we present the design in the context of clinical trials, it is equally applicable to experiments with non-human subjects.

Introduction

Model guided dose finding designs include the continual reassessment method (CRM) and its relatives. Both the modified CRM and escalation with overdose control (EWOC) [2] are in this class of designs because, among other things, they both use dose-response models to interpolate and extrapolate data to estimate the target dose. Model guided designs are more efficient and have superior operating characteristics compared to empirical designs [3,7]. In this paper, we will not review the important nuances of these historic designs. Our focus is on broadly applicable methods.

Empirical designs remain widely used in cancer therapeutics development and are often endorsed implicitly or explicitly by important research sponsors such as FDA and NIH. However, model guided dose escalation has been shown to require fewer subjects, have no bias towards low doses, treat a higher proportion of study participants near optimal doses, and outperform empirical methods in essentially all performance domains that have been tested by statisticians [7]. The simplicity and clinical independence offered by empirical designs appear to outweigh operational superiority in the minds of many.

Outside cancer, we do not routinely employ treatments with a high frequency of side effects. An optimal biologic dose for antibiotics might be a maximum nontoxic dose. For antihypertensives and analgesics, the optimum might be a minimum effective dose. A good example is aspirin: when used as an anti-inflammatory, the dose needed could be as high as the maximum tolerated if the underlying severity of the condition requires it. As an analgesic, the aspirin dose is the maximum nontoxic dose. As a preventive, the dose is the minimum effective dose.

Inferior and inappropriate designs are often applied where they cannot succeed. This can happen when empirical dose escalation designs are uncritically applied to more general questions, especially outside the domain of cytotoxic drugs.
But the absence of general dose finding methods pushes clinical investigators to employ standard designs even when they are not suitable for the underlying question. One limitation of MTD study designs is that the response scale is a probability. However, many essential questions require quantitative responses that are not probabilities. Examples include immunological reactivity, pharmacokinetic outcomes, target binding, and tissue concentrations. Translational needs, such as optimizing dose for quantitative biomarker outcomes, are another example. There have not been experimental design methods that facilitate optimal dose finding in such circumstances.

The method we present satisfies these requirements. Our method contains both deterministic and stochastic components. The deterministic portion is model guided but is not restricted to a single model. Models confer efficiency and accuracy under fairly general conditions, but also allow flexibility because the mathematical construct can be tailored to essential features of the problem. A monotonically increasing model might be appropriate for MTD questions, whereas one with a peak and fall-off would be more appropriate for questions about dose maxima. Any model that mimics biological behavior without being restrictive could be useful in appropriate places.

Our method is agnostic with regard to the actual model used. To augment efficiency and flexibility, the stochastic part of our method incorporates statistical simulations to characterize variation. These intrinsic simulations represent a range of stochastic behavior that investigators anticipate but find difficult to capture quantitatively.

To initiate dose finding, data are sampled randomly from the given envelope regions and the model is fitted; repeating this simulation process a large number of times yields a distribution of fitted features. The distribution captures both elements of the problem specification and allows the investigator to choose a starting dose in light of both deterministic and random behavior. Updating this simulation process with actual data as they are collected creates a learning algorithm conditional on the design assumptions. We refer to this as Envelope Simulation (EnSim) guided dose finding.
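The initiation step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the rectangular envelope locations are hypothetical, and a simple straight-line working model (mentioned later as a legitimate choice for MTD problems) stands in for whatever dose response model the investigator selects. The estimand here is assumed to be the dose at which the fitted line crosses a 30% response.

```python
import random
import statistics

# Hypothetical rectangular envelopes in the dose-response plane:
# (dose_lo, dose_hi, resp_lo, resp_hi). Locations are illustrative only.
ENVELOPES = [(5, 15, 0.0, 0.2),    # low dose, low response
             (80, 100, 0.6, 0.9)]  # high dose, high response

def sample_envelope(env, n=3):
    """Draw n (dose, response) points uniformly from one envelope."""
    dlo, dhi, rlo, rhi = env
    return [(random.uniform(dlo, dhi), random.uniform(rlo, rhi))
            for _ in range(n)]

def fit_line(points):
    """Ordinary least squares for a straight-line working model."""
    n = len(points)
    mx = sum(d for d, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((d - mx) ** 2 for d, _ in points)
    sxy = sum((d - mx) * (y - my) for d, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope       # intercept, slope

def target_dose(intercept, slope, target=0.3):
    """Estimand: dose where the fitted line crosses the target response."""
    return (target - intercept) / slope

random.seed(1)
doses = []
for _ in range(500):                     # repeat the simulation many times
    pts = [p for env in ENVELOPES for p in sample_envelope(env)]
    a, b = fit_line(pts)
    doses.append(target_dose(a, b))

# The distribution of this estimand guides the choice of a starting dose.
print(round(statistics.median(doses), 1))
```

The histogram of `doses` is the "distribution of fitted features" referred to in the text; its center and spread reflect both the deterministic model and the random envelope sampling.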

The basic EnSim algorithm is shown in Table 1. Steps 1-4 are preparatory. Steps 5-9 iteratively update results with new data. During this process, the envelope data diminish in influence in favor of the real data being collected. Step 10 is optional but offers the opportunity to strengthen the design of the trial. Details regarding each step of the algorithm are provided in the following sections.

Table 1. The basic EnSim algorithm.

1. Identify an appropriate dose response model and estimand of interest.
2. Use data or clinical insights to locate envelope regions in the dose response plane to support the model.
3. Choose simulation parameters such as the number of points per envelope, point weights, and number of repetitions.
4. Check that the model and envelopes are consistent with each other and with biological knowledge.
5. Run simulations and model fits, and summarize the distribution of estimands as a histogram.
6. Use the histogram distribution to select an appropriate dose for the next cohort of study subjects.
7. Choose a cohort size (the algorithm does not dictate one).
8. Treat the next cohort and gather outcome data.
9. Include real outcome data in a new simulation cycle.
10. Optional: assess the sensitivity of model estimates to additional data and/or its location (design points) via simulation.
11. Repeat steps 5-10 until the distribution of estimands indicates that real data dominate the envelope samples.
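The iterative portion of the algorithm (steps 5-9 of Table 1) can be sketched as follows. Everything specific here is an assumption for illustration: the envelope locations, the 0.1 envelope weight, the weighted straight-line fit standing in for the chosen model, and a mocked "truth of nature" used only to generate cohort outcomes.

```python
import random

random.seed(2)

# Illustrative envelopes ((dose range), (response range)) and weights; these
# are not the paper's numbers.
ENVELOPES = [((5, 15), (0.0, 0.2)), ((80, 100), (0.6, 0.9))]
ENV_WEIGHT, REAL_WEIGHT = 0.1, 1.0

def true_response(d):
    """Assumed truth of nature, used only to mock cohort outcomes."""
    return min(1.0, max(0.0, 0.004 * d))

def weighted_line_fit(pts):
    """Weighted least squares for y = a + b*d over (dose, resp, weight)."""
    sw = sum(w for *_, w in pts)
    mx = sum(w * d for d, _, w in pts) / sw
    my = sum(w * y for _, y, w in pts) / sw
    sxx = sum(w * (d - mx) ** 2 for d, _, w in pts)
    sxy = sum(w * (d - mx) * (y - my) for d, y, w in pts)
    b = sxy / sxx
    return my - b * mx, b

real = []                                # accumulating real cohort data
for cycle in range(5):                   # steps 5-9, repeated
    # Step 5: simulate envelope points and fit alongside the real data.
    sim = [(random.uniform(*dr), random.uniform(*rr), ENV_WEIGHT)
           for dr, rr in ENVELOPES for _ in range(3)]
    a, b = weighted_line_fit(sim + real)
    # Step 6: estimand = dose where the fit crosses a 30% response.
    next_dose = (0.3 - a) / b
    # Steps 7-9: treat a small cohort at that dose, record noisy outcomes.
    for _ in range(2):
        y = true_response(next_dose) + random.gauss(0, 0.05)
        real.append((next_dose, y, REAL_WEIGHT))

print(round(next_dose, 1))
```

Because each real observation carries ten times the weight of an envelope sample, the recommended dose drifts from the envelope-implied starting point toward the value implied by the accumulating real data.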
A familiar monotonic working model is the logistic,

Y(d) = c / (1 + exp(-β(d - d50))),

where c, β, and d50 are parameters to be estimated from data. The parameter c represents the maximal response. If we restrict c = 1, Y(d) can then be taken as a probability of response. The model can be supported by assuming prior distributions for the parameters (Bayesian method), or by using pseudo-data [4] that serve the same purpose. Dose escalations tend to be more about location (d50) than slope (β), and many alternative models might be employed for MTD determination, including a simple line segment. The model does not have to be "true" across its full range, only approximately correct locally, for such methods to work nicely.
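Because the logistic is invertible in closed form, the dose at any target quantile follows algebraically from the fitted parameters. The sketch below assumes the three-parameter logistic form given above; the parameter values are arbitrary illustrations, not fitted estimates.

```python
import math

def logistic(d, c, beta, d50):
    """Three-parameter logistic dose response: c is the maximal response,
    d50 the location, and beta the slope (assumed functional form)."""
    return c / (1.0 + math.exp(-beta * (d - d50)))

def dose_for_quantile(p, c, beta, d50):
    """Invert the model: the dose giving response p (requires 0 < p < c)."""
    return d50 - math.log(c / p - 1.0) / beta

# With c = 1 the response is a probability; find the dose with a 30% DLT rate
# under hypothetical parameter values.
d_target = dose_for_quantile(0.30, c=1.0, beta=0.08, d50=60.0)
print(round(d_target, 2))
```

This algebraic inversion is one natural way to extract a quantile-type estimand once a replicate's parameters are in hand.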

The logistic and similar models are monotonic and cannot represent true peak responses. Such biologic behavior could be seen with immunologic agents, for example. Features of such a shape could include a rising slope, an inflection point, a peak, a post-peak plateau, and a dose scale factor. Hence we might anticipate that a model for this purpose could incorporate five or more parameters. A flexible model with this behavior can be written with free parameters M, K, α, v, β, and w, where M > K. One parameter might be eliminated by setting α = β without losing much flexibility. The response Y(d) is not required to be a probability in such a model. A five- or six-parameter model is capable of a wide range of behavior and would have to be supported by substantial data, both during its initiation and when drawing biological conclusions. The use of this model will be illustrated below.

If the biological question requires finding the maximum dose response, one might consider a simple model with a well defined maximum, even if it is not derived from underlying biology. One such model is a parabola, which points in the correct direction if parameterized as

Y(d) = c - e^β (d - α)^2,

where the peak will be at α and e^β is guaranteed to be positive. This model has only three parameters. Finding a peak using a model is not purely a location problem because of the shape of the data on each side of the peak, so the stiffness of a parabolic form may be a problem.

Other mathematical forms that can point to a response maximum at some dose α are bilinear models with a breakpoint at the maximum. These include

Y(d) = c - e^β |d - α|,

which is symmetric, and

Y(d) = c - e^β (α - d) for d ≤ α,  Y(d) = c - e^γ (d - α) for d > α,

which is asymmetric around α by virtue of a fourth parameter. This last model can be an efficient locator of maxima despite its lack of biological realism, as illustrated below. Another useful form is a model that strictly saturates at a maximal response c. With c = 1, such a model can also represent a probability of response. Because Y(0) = 0, it does not represent a background or a threshold.
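A concrete reading of the asymmetric bilinear form, consistent with the parameter descriptions used later (peak response c at dose α, slope e^β below the peak, slope e^γ above it), can be sketched as follows. The parameter values are purely illustrative.

```python
import math

def bilinear(d, c, alpha, beta, gamma):
    """Asymmetric bilinear peak model: peak response c at dose alpha, with
    slopes exp(beta) below and exp(gamma) above the peak (both positive by
    construction). Setting gamma = beta recovers the symmetric form."""
    if d <= alpha:
        return c - math.exp(beta) * (alpha - d)
    return c - math.exp(gamma) * (d - alpha)

# The maximum is attained exactly at d = alpha regardless of the slopes.
grid = range(0, 101, 10)
best = max(grid, key=lambda d: bilinear(d, 0.8, 50, -4.0, -3.0))
print(best)  # prints 50
```

Because the breakpoint parameter α is itself the estimand, fitting this model makes locating the maximum a direct estimation problem rather than a post-fit search.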

The allometric form is another simple, flexible, biologically grounded descriptor of general relationships, in which a represents the background response, m is the threshold dose, and c is the maximal response. This is quite a flexible model that comes at the expense of five parameters.

The message here is not that any of these is a universal model, but that the EnSim algorithm can incorporate any deterministic part provided that the model a) is tractable when fitting data, b) is not over-parameterized, and c) can be supported by data from the experiment. In other words, we have to be able to fit our chosen model to relatively sparse data. The purpose of dose finding is not model validation, but to extract biologically useful features from data using tools that facilitate without interfering. The model eventually takes a background role to the data. Key inferences are not likely to depend on nuances in the model.

159
Parameterization is always a concern when model fitting. For the models above, certain parameters must be positive for the equation to make biological sense. For example, the scaling factor for dose should be positive the way the models are written. The asymptotic response at high doses should also be positive for these models, although this does not have to be true universally.

Nor is it troublesome to hypothesize an appropriate mathematical model. Reasonable starting conditions are all that is necessary because real data will quickly dominate in the algorithm.

Envelopes embody both knowledge and ignorance. They do not literally specify dose-response pairs; that would imply strong knowledge. Sketching a region, all points of which are equally likely to contribute data, also characterizes what we do not know. Low weights or utilities can be assigned to simulated data from an envelope so that their influence on model fitting is small compared to real data. This further emphasizes our ignorance about the truth of nature. Although the influence of envelopes will ultimately be small in the presence of real data, they are essential 1) early in the dose finding algorithm to specify the model, and 2) to support the model in regions where actual data are sparse or cannot be obtained.

Several envelopes need to be specified at the initiation of a dose finding trial. Their locations must be consistent with the shape of the chosen model and must constrain it to plausible regions. In some cases, envelopes might overlap. But care is needed because only a few hypothetical observations will be drawn from each envelope. We do not want to create inconsistencies between the simulated data and the ability of the model to represent them.

The initiation of the algorithm relies on sparse samples from each envelope. Fitting a multiparameter model to sparse data is delicate, and model fits might be poor or impossible in some simulation replicates. This problem can be reduced or eliminated by sampling more points from each envelope or from the most influential ones, adding more envelopes, making them smaller, or making them jointly more consistent with the model. As real data accumulate, the algorithm will rely on real observations for subsequent model fits. Even at that point, a mix of actual and simulated data may be needed because the experiment may not gather real data from certain dose ranges. Simulated data from those ranges will not be strongly influential in the region of interest, but may be necessary for a good overall model fit.

Removing the envelopes converts the algorithm essentially into a generalized CRM. We do not illustrate it in this paper, but it would be simple to weight envelope data dynamically as the number of real data points increases. For example, the weight of envelope data could be taken to be 1/(1 + n), where n is the number of real data points. As with a fixed fractional weight, this guarantees that final model fits will be dominated by real data.
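The 1/(1 + n) weight schedule named in the text is trivial to compute; the sketch below simply tabulates it to show how envelope influence fades as real observations accumulate.

```python
def envelope_weight(n_real):
    """Dynamic envelope-data weight 1/(1 + n), where n is the number of
    real observations collected so far (the schedule named in the text)."""
    return 1.0 / (1.0 + n_real)

# As real data accumulate, envelope samples fade from the fit.
weights = [envelope_weight(n) for n in (0, 1, 3, 9, 49)]
print(weights)  # prints [1.0, 0.5, 0.25, 0.1, 0.02]
```

With 49 real observations the envelope weight is already 0.02, so the final fit is dominated by real data exactly as the fixed fractional weight guarantees.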

As the influence of the envelopes vanishes, we will be left with ordinary model fitting issues. These include extracting the estimand or feature of interest, precision in estimating parameters, lack of fit, and bias due to lack of data in certain ranges.

Resource limited dose finding studies yield sparse data unlikely to overcome all these problems easily. Simulation mitigates the difficulties. The limitations of sparse data are universal and should not be taken as shortcomings of any underlying method. Adequate real data will overcome a multitude of problems, and could even indicate deficiencies in the model being employed. It is perfectly reasonable to choose a better model at any time the data so indicate, but those cases will not be illustrated in this paper.

The estimand is the data or model feature estimated following the model fit. For a classic cytotoxic dose escalation design, the estimand might be the dose at the intended quantile, for example the dose that yields a 30% probability of a dose limiting toxicity. For detecting a maximum response, the estimand would be the dose under the peak of the fitted model. For measured biomarkers, the estimand might be the dose that yields a prespecified level of response.

The estimand might be obtained using any of three techniques. All three methods employ the parameter estimates from the current simulation replicate as though they were fixed constants. One method is algebraic calculation from the model parameter estimates.

Even when real data dominate the simulated data, the model can still fit poorly, or the data may be barely able to support the number of parameters in the model or resolve nuances in its shape. One could not claim in such cases that being free from envelope simulated data is a complete solution to the problem. More data will always help unless the model is systematically inappropriate.

Some issues surrounding the design remain. These include the cohort size, the number of sample points to draw randomly from each envelope, whether or not they should be the same for each envelope, how to assign weights for simulated data in model fits, and how many simulations to perform. In developing and testing this methodology, some answers to these questions have emerged. Because each real observation has much higher weight than an envelope sample, relatively small cohorts of 1-3 observations work nicely. If for some reason envelopes are given more weight at initiation, cohort size should probably be increased to aim for a ten-fold or higher real-to-simulated weight ratio. A number of simulations between 100 and 1000 seems to work well, providing reasonable resolution of the sampling distribution without consuming much computer time.
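The algebraic technique, applied across replicates, can be sketched as follows. Here the per-replicate parameter estimates are mocked as random draws around nominal values (in EnSim they would come from refitting each simulation replicate), and the assumed model is the logistic quantile inversion; each replicate's estimates are treated as fixed constants, as the text describes.

```python
import math
import random
import statistics

random.seed(3)

def replicate_params():
    """Mocked per-replicate fitted parameters (beta, d50); illustrative only."""
    return (random.gauss(0.08, 0.01),   # beta (slope)
            random.gauss(60.0, 5.0))    # d50 (location)

def dose_at_quantile(beta, d50, p=0.30):
    """Algebraic estimand under a c = 1 logistic model: the dose that yields
    response probability p, treating the replicate's estimates as constants."""
    return d50 - math.log(1.0 / p - 1.0) / beta

# 1000 replicates is within the 100-1000 range suggested in the text.
estimands = [dose_at_quantile(*replicate_params()) for _ in range(1000)]

# Summarize the sampling distribution; the median is one sensible
# recommendation for the next cohort's dose.
print(round(statistics.median(estimands), 1))
```

The resulting histogram of `estimands` is the distribution the investigator inspects in step 6 of the algorithm.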

With sparse data there usually is insufficient information to indicate mis-specification of the model or envelopes. In dose escalations, mis-specification can be seen if the high-dose high-response envelopes are set too low. Real data might then readily show inconsistency with those envelopes in the "upper right" region of the dose response plane. Practical considerations suggest that the best course of action is simply to specify or revise appropriate envelopes given the information at hand, even if some of that information comes after the start of the experiment.

A computer program written in C++ for Windows is also available for beta testing from the author. This program has a small suite of prespecified models (n=12).

Envelopes and subject data enter the program through a simple GUI. This program is straightforward and fast, but cannot be modified by the user. Because of its speed, it might be particularly useful for investigators needing simulations to optimize experimental designs. Source code is not presently available. We have also implemented the EnSim algorithm in R. Example 2 below will illustrate it. The R code is available with this publication.

The first example uses a peak model whose parameters are the peak response (c), dose at peak response (α), slope for dose response below the peak (e^β), and slope for dose response above the peak (e^γ). The exponential forms for β and γ guarantee that the slopes are positive. Reliable estimation of the critical dose requires data on both sides of the peak. We assume that several design points to the left of the peak were selected and measured.

While an actual trial such as this might carry ethics implications for the order and level of doses, we ignore them here to focus on the methods and software. After fitting the data gathered to the left of the peak, the results are shown in Fig (middle panel) and Table 2. Because of the way this model is parameterized, the estimand is the parameter α. We can see there remains some imprecision associated with the estimate of α due to the relatively small size of the experiment.

The second example applies EnSim to an actual clinical trial from the literature. The trial in question is a dose escalation study of O6-benzylguanine (O6BG) for patients undergoing surgery for malignant glioma [9]. The original data were used as an example by Ivanova and Kim [10], and we use the data from their Table 1 with the pre-planned escalation. We also conducted stepwise fits to the cohort data after initiating the algorithm with two envelopes, one at low dose and a second at high dose. Our EnSim algorithm used 1000 simulation iterations and 3 simulated data points per envelope, each with weight 0.1.

For each real data point, we can compute the fractional change in the estimate, or sensitivity, when the respective point is omitted. It might make sense to gather additional real data at or near those doses that have the most influence on the feature (estimate) of interest. An example of this strategy is shown in Table 3, along with the overall error variance, R2 adjusted for the number of parameters, and the finite-sample corrected Akaike Information Criterion (AICc).
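The leave-one-out sensitivity and the AICc are both mechanical to compute. The sketch below assumes a straight-line working model and hypothetical dose-response data, with the fitted slope as the feature of interest; the AICc here uses the standard Gaussian-likelihood form.

```python
import math

def ols(points):
    """Straight-line least squares fit; returns (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

def sensitivity(points, feature):
    """Fractional change in a fitted feature when each point is omitted."""
    full = feature(ols(points))
    return [(feature(ols(points[:i] + points[i + 1:])) - full) / full
            for i in range(len(points))]

def aicc(rss, n, k):
    """Finite-sample corrected AIC for a Gaussian model with k parameters."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical dose-response data; feature of interest is the fitted slope.
data = [(10, 0.11), (20, 0.19), (40, 0.42), (80, 0.77), (160, 0.95)]
s = sensitivity(data, feature=lambda ab: ab[1])
most_influential = max(range(len(data)), key=lambda i: abs(s[i]))
print(data[most_influential][0])  # prints 160
```

In this toy data set the highest dose point dominates the slope estimate, mirroring the text's observation that influence often concentrates at extreme design points.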

These model fit statistics also focus attention on the highest dose point. When it is excluded, the error variance decreases, the R2 increases, and the AICc improves.

To illustrate the second optimization method, forward looking simulations can be used. We can do this by specifying a new small envelope that represents a hypothetical future design point.

Operating characteristics (OC) describe the performance of an algorithm when the truth of nature is known (assumed). A method that performs well under a variety of assumptions is likely to reflect the truth when it is uncertain. Conversely, a method that performs poorly under known conditions is also likely to perform poorly under unknown conditions. The weights for envelope samples used by EnSim are so small compared to those for real data that the algorithm essentially becomes a CRM late in its cycles, which is known to have good OCs. In the important but narrow class of dose finding questions where a maximum tolerated dose (MTD) is sought, CRM methods are known to perform better than operational designs such as the 3+3.

We have compared the OCs of EnSim and the 3+3 design for MTD determination. The true dose response function is shown in blue, where the 0.33 quantile is 48. A highly reliable design will stop close to that on average. A very square OC curve is desirable, as it indicates little variation in stopping close to the target. In this example, the median EnSim stopping point was 47, with a true probability of 0.31, based on 100 replicates. The EnSim recommended dose was taken to be the median of the dose distribution for every simulation. The 3+3 algorithm showed a 90% chance of stopping at or below 48, based on 1000 replicates. Hence the 3+3 algorithm is strongly biased on the low side. These results are typical of wider behavior.
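An OC study of the 3+3 comparator can be sketched directly. This is not the paper's simulation: the dose grid and the linear toxicity curve (chosen so that the 0.33 quantile falls near dose 48, as in the text) are illustrative assumptions, and the 3+3 rules coded here are the common textbook version (escalate on 0/3 DLTs, expand on 1/3, stop otherwise).

```python
import random

def run_3plus3(doses, tox_prob, rng):
    """Classic 3+3 escalation: escalate on 0/3 DLTs, expand to 6 on 1/3,
    stop on 2+ DLTs. Returns the declared MTD (one dose below the stopping
    dose), or None if even the lowest dose is too toxic."""
    for i, d in enumerate(doses):
        dlt = sum(rng.random() < tox_prob(d) for _ in range(3))
        if dlt == 1:
            dlt += sum(rng.random() < tox_prob(d) for _ in range(3))
            if dlt >= 2:
                return doses[i - 1] if i > 0 else None
        elif dlt >= 2:
            return doses[i - 1] if i > 0 else None
    return doses[-1]

# Assumed truth: toxicity rises linearly, crossing 0.33 near dose 48.
tox = lambda d: min(1.0, d / 145.0)
rng = random.Random(4)
mtds = [run_3plus3([10, 20, 30, 40, 50, 60, 70], tox, rng)
        for _ in range(1000)]
below = sum(1 for m in mtds if m is not None and m <= 48) / len(mtds)
print(round(below, 2))
```

Under these assumptions the large majority of declared MTDs fall at or below the true 0.33-quantile dose, consistent with the low-side bias the text attributes to the 3+3.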

These OC comparisons are unfair to a poor design such as the 3+3 for the following reason. The fact that the 3+3 uses a decision rule based on one DLT out of a cohort of three does not mean that it is titrating to a probability response of 0.33. Moreover, for the general class of dose optimization problems, there is nothing to compare EnSim against.

EnSim is not a Bayesian method strictly speaking. It has more of the spirit of multiple imputation for missing data. However, there is a strong conceptual similarity between assuming a prior probability distribution for unknown model parameters in a true Bayesian framework versus specifying pseudo-data to accomplish the same aim.

Fitting pseudo-data with a model explicitly yields parameter estimates with an implied joint probability distribution. For a statistician this may seem an awkward way to reach a "prior" probability distribution, but for clinical investigators it may fit intuition very much better.

Aside from being a method to find the dose optimum for a specific biological question, EnSim contains the framework to improve the experimental design during its conduct. While this is a secondary goal and might be approached in various ways, the creation of a model, a stable fit to real data, and a framework for simulation open a door to optimizing the design along with the dose. It remains to be seen whether this approach can simplify complex design problems. However, it does appear perfectly feasible to sharpen a dose finding design near its conclusion, for example by adding a small number of design points if they appear likely to improve estimation.

In conclusion, EnSim is a new method but a familiar one in the domain of model guided dose finding. It is simple and flexible, and sure to be operationally superior to empirical designs. It may also be relatively easy for clinical investigators to embrace. For these reasons, we believe this method deserves practical application to assess its strengths and weaknesses.

Figure 6. EnSim applied to the trial data from Table 1 in Ivanova and Kim [10] (resampled data from Friedman et al. [9]), as explained in the text. The histograms are based on 1000 simulations.