Extending the Continual Reassessment Method to accommodate step-up dosing in Phase I trials.

Abstract

The Continual Reassessment Method (CRM) was developed for Phase I trials to identify a maximum-tolerated dose of an agent using a design in which each participant is treated with a single administration of the agent. We propose an extension of the CRM in which participants receive multiple administrations of an agent using a so-called step-up dosing procedure, in which participants receive one or more administrations of lower doses of the agent before they receive their final dose. We use methods developed for the CRM to model the probability of dose-limiting toxicity (DLT) for each administration, which leads to the use of conditional probability models to model the joint probability of DLT across multiple administrations. We compare our approach to two existing methods that use time-to-event modeling methods for modeling the probability of DLT. We demonstrate through simulations that our approach has operating characteristics similar to existing methods, but due to its foundations in the CRM, ours is simpler to implement than existing approaches and is therefore more likely to be adopted in practice.

Keywords: Bayesian inference; adaptive clinical trial; conditional probability; dose escalation; fractionated dosing; up-titration

Year: 2022; PMID: 35662077; PMCID: PMC9546169; DOI: 10.1002/sim.9487

Source DB: PubMed; Journal: Stat Med; ISSN: 0277-6715; Impact factor: 2.497

INTRODUCTION

Phase I trials in oncology are designed to identify safe, and possibly effective, doses of prospective treatments for cancer. A number of Bayesian adaptive designs for Phase I trials have been developed over the past three decades, the first of which was the Continual Reassessment Method (CRM). In these designs, study participants are followed during a fixed period of time for the occurrence of a dose-limiting toxicity (DLT), which is often defined as any treatment-emergent adverse event of toxicity grade 3 or higher according to the Common Terminology Criteria for Adverse Events (CTCAE). In immunotherapy, DLTs often include cytokine release syndrome (CRS), which is a systemic inflammatory disease characterized by a massive release of cytokines. CRS can present with a variety of symptoms, ranging from mild to life threatening and sometimes fatal. Mild symptoms of CRS include fever, fatigue, nausea, vomiting, headache, rash, arthralgia, myalgia, and malaise. Several case reports have documented CRS in cancer patients treated with immune-checkpoint inhibitors. Recently, in the context of CAR T-cells, Stein et al proposed a model-based approach to retrospectively characterize the kinetics of tisagenlecleucel and its relationship with the emergence of CRS. In their work, they share concerns that co-medication with steroids or tocilizumab may not suffice to mitigate CRS. Instead, in a more recent, physiologically based model, Jiang et al suggest that stepwise dosing of blinatumomab in patients with non-Hodgkin's lymphoma may successfully alleviate the release of interleukin-6 and prevent CRS adverse events from occurring. To this end, a post-marketing authorization study, listed on ClinicalTrials.gov with identifier NCT01029366, was designed to assess the benefit of fractionated dosing in adult patients with relapsed/refractory acute lymphocytic leukemia treated with CAR T-cell therapy. 
Patients received either a low dose of tisagenlecleucel as a single infusion (LDS), a high dose of tisagenlecleucel as a single infusion (HDS), or the high dose split into fractions (HDF), with 10% of the dose delivered on day 1, 30% on day 2, and the remaining 60% on day 3. In the cohorts treated with a fractionated dosing approach, the day 2 and/or day 3 doses were held if the patient experienced early signs of CRS, including fever. The frequency of grades 4 and 5 CRS was markedly lower in the HDF cohort (1 out of 20 patients) compared to the LDS (2 out of 9) or the HDS (3 out of 6) cohorts. As a result, fractionated or step-up dosing (SUD) has been proposed in other programs and seems to be a promising treatment option to mitigate CRS. However, challenges arise in the design and analysis of DLT data collected during dose escalation studies involving fractionated dosing. As the goal is now to identify a maximum tolerated dose (MTD) that is preceded by one or more lower doses, which we call a maximum-tolerated schedule (MTS), the first challenge is to deal with the probability of a DLT occurring after any of several consecutive dosing events. Both Colin et al and Fernandez et al proposed a logistic regression model with a Markov component to deal with DLTs observed over multiple cycles. Assuming that the drug was administered in consecutive cycles of similar length, an additive time-homogeneous transition model could be specified for the probability of toxicity of a patient receiving a given cycle, as a function of the scaled dose in that cycle and the cumulative scaled dose already received. The three model parameters respectively capture the probability of a DLT on cycle 1, the effect of cumulative dose from previous cycles, and the dependency in short-term toxicity outcomes between cycles (Markov term). A limitation of this model is that there must be sufficient cycles in the DLT assessment period to allow a plausible estimation of the cumulative-dose and Markov parameters. 
Also, Fernandez et al did not consider the setting in which dose varies across dosing events during the DLT assessment period, making their design unsuitable for fractionated dosing. A second challenge in the design of fractionated dosing studies occurs because later planned administrations are omitted once a participant experiences a DLT after earlier administrations. Indeed, it is routine practice to discontinue or hold treatment in patients who experience DLT. If the intention is to develop a dose-toxicity model akin to those used in the CRM, the missing administrations and resulting DLT outcomes will inflate the uncertainty associated with the model parameters and make the dose recommended for the next cohort less robust. In addition, by using SUD, we assume the dose of the first administration is lower than or equal to the dose of the second administration, which is lower than or equal to the dose of the third administration. Under this treatment plan, the absence of DLT outcomes for the second and/or third doses will lead to greater emphasis on DLT outcomes observed at lower doses, resulting in over-conservative dose escalation and loss of the convergence benefits of the CRM. To circumvent this problem, Braun and colleagues modeled DLT as an absorbing state that precludes any further administrations, using time-to-event concepts rather than logistic regression methods. As originally conceived, this work assumed that the hazard of DLT for a single administration increases linearly to a known future point in time and then decreases linearly to zero at a later point in time. The total hazard for several administrations is simply the sum of the hazards of the individual administrations that have been received, and the probability of DLT is based upon the survivor function derived from the cumulative hazard. 
By modeling the hazard, this approach moves away from a binomial likelihood and instead uses the likelihood of survival models, which accounts for partial follow‐up of participants, much like the time‐to‐event Continual Reassessment Method (TITE‐CRM) was developed to account for partial follow‐up in Phase I trials designed with the CRM. In a similar vein, Gunhan et al recently modeled the probability of DLT through a hazard function for each administration like the methods of Braun and colleagues. However, Gunhan et al chose to relate the DLT hazard to a parametric pharmacokinetic (PK) model quantifying the cumulative drug exposure over time. Gerard and colleagues also created a design that integrates PK and pharmacodynamic (PD) models into the design. Given that in a more general setting, it may be unclear how to order consecutive administrations of doses with respect to their cumulative probabilities of DLT, Wages et al presented an approach to incorporate partial ordering constraints and used existing CRM modeling for each administration. Like many adaptive Phase I trial methods, many view the statistical underpinnings of dose and schedule‐finding methods as too complex, a view which tends to hinder their implementation in practice. Thus, as an alternative to the time‐to‐event approaches cited earlier, we have adopted the longitudinal binary outcome view used by Fernandez et al. However, our model is based on the CRM, which is a widely accepted adaptive design for Phase I trials of single administrations. By using the CRM as a framework, we hope to reach our primary goal with a simpler, more accepted model, that performs as well as more complex methods, but is more likely to be adopted by a wider audience of clinical trialists. The underlying specifics of our design can be found in Section 2. 
Via simulation, we compare the operating characteristics of our design to the designs of Braun et al and Gunhan et al in Section 3, and we present our concluding thoughts in Section 4.

METHODS

Notation

We have a study designed to examine a set of predefined treatment schedules, each of which is a series of administrations of an investigational agent. The study will enroll a total of participants, each of whom will be assigned to one of the treatment schedules. We denote schedule as , in which is the value for the dose of the agent given at administration in schedule . We will describe how to select a value for in a later section. Our primary goal is to identify which treatment schedule has a probability of DLT at the end of follow‐up for the entire schedule closest to a targeted probability . For , we let denote the span of time between administrations and . For each administration, we let denote the amount of follow‐up observed for participant after receiving administration , and we let denote the proportion of completed follow‐up. At the start of administration , we set the DLT indicator , which changes to if participant experiences a DLT before . Once , we also set , that is, we assume complete follow‐up occurs, reflecting a recommendation of treatment discontinuation after a patient experiences a DLT. Furthermore, once , participant receives no further planned administrations, while if when , participant receives their next administration.

Model

For each administration, we model the marginal probability of DLT with a model traditionally used in the CRM, known as the “power” or “empiric” model. First, assuming that patient is assigned to schedule , we let be the value assigned to the dose given to participant at administration , and we assume in which , for . Note that we place no restriction on the value of because the DLT probability for the first administration can take any value in . However, we do place a non‐negativity constraint on each of to enforce an ordering constraint. Specifically, we assume that the cumulative probability of DLT cannot decrease with additional administrations. This restriction could be removed for another setting should this assumption not hold. However, less restriction on the parameters may also impact the ability to sufficiently identify those parameters, requiring continued focus on both appropriate prior distributions and the number of study participants needed to collect sufficient data to estimate those parameters. Although our model can theoretically accommodate any number of administrations, most practical settings will examine a handful of administrations at most. Furthermore, we highlight that each administration corresponds to an additional model parameter . Given the relatively small sample sizes used in dose‐finding trials, the number of administrations will have to be limited without strong constraints or assumptions placed on model parameters. As a result, the remainder of this manuscript will focus upon our motivating example that studied administrations. We will assume has a normal prior distribution with mean and SD , while and each have exponential prior distributions with respective means and . Note that , , and are a priori independent of each other, and we will present a systematic approach for selecting values for the four hyperparameters in Section 2.5. 
Because further planned administrations are not given to participants who experience DLT, the observed data cannot be used to directly estimate and . Instead, the observed data allow us to estimate the conditional probabilities of DLT for the second and third administrations, given no DLT was observed in all prior administrations. Nonetheless, the conditional probabilities are easy to generate from our model. Specifically, as derived in the Appendix, we have:
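The relationship between cumulative and conditional DLT probabilities can be illustrated with a minimal sketch. It assumes only that the per-administration probabilities are cumulative (probability of DLT by the end of follow-up for that administration) and non-decreasing, so that the conditional probability at an administration, given no earlier DLT, is the increment in cumulative probability divided by the probability of having had no DLT so far. The function names are hypothetical, and the sketch does not reproduce the model-based expressions derived in the Appendix.

```python
def conditional_dlt_probs(cumulative):
    """Convert cumulative DLT probabilities (one per administration) into
    conditional probabilities of DLT at each administration, given that no
    DLT occurred at any earlier administration."""
    conditional = []
    prev = 0.0
    for p in cumulative:
        # Cumulative probabilities must be non-decreasing and below 1.
        if not prev <= p < 1.0:
            raise ValueError("need 0 <= p_1 <= p_2 <= ... < 1")
        conditional.append((p - prev) / (1.0 - prev))
        prev = p
    return conditional


def cumulative_from_conditional(conditional):
    """Inverse operation: rebuild cumulative probabilities from conditionals."""
    no_dlt_so_far = 1.0
    cumulative = []
    for q in conditional:
        no_dlt_so_far *= 1.0 - q
        cumulative.append(1.0 - no_dlt_so_far)
    return cumulative


# Illustration with the skeleton values of Schedule 4 in Table 1.
example = conditional_dlt_probs([0.16, 0.25, 0.25])
```

Because the second and third skeleton values of each schedule in Table 1 are equal, the conditional value for the third administration in this illustration is zero.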

Likelihood

Thus, at any point in the trial, we have enrolled a total of participants, who belong to one of three groups: those who have received the first administration, those who have received the first and second administrations, and those who have received all three administrations. We denote these three groups as , and , respectively. We let denote the collective set of doses, DLT outcomes, and lengths of follow‐up for all participants in , with corresponding definitions for and . Thus, each of the three groups has a respective likelihood equal to: which leads to the overall likelihood Note that the weights used in the likelihood are akin to those used in the time‐to‐event CRM (TITE‐CRM), which assume that DLTs occur uniformly during the follow‐up period, and are related to a cure model for the distribution of times to DLT.
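To make the role of the follow-up weights concrete, the following is a minimal sketch of the standard TITE-CRM weighted likelihood for a single administration under the empiric model, in which a participant without DLT contributes one minus the weight times the DLT probability. It is not the three-group likelihood of this section; the function names and the example records are hypothetical.

```python
import math


def power_model(skeleton_value, beta):
    """Empiric ("power") CRM model: p = d ** exp(beta), with d in (0, 1)."""
    return skeleton_value ** math.exp(beta)


def tite_log_likelihood(beta, records):
    """Weighted log-likelihood in the style of the TITE-CRM.

    records: iterable of (skeleton_value, dlt, weight), where dlt is 0 or 1
    and weight is the proportion of completed follow-up. A participant with
    a DLT is treated as having complete follow-up (weight 1).
    """
    loglik = 0.0
    for d, dlt, w in records:
        p = power_model(d, beta)
        if dlt:
            loglik += math.log(p)
        else:
            loglik += math.log(1.0 - w * p)
    return loglik


# Hypothetical data: one DLT, one fully followed non-DLT, one half-followed non-DLT.
records = [(0.10, 1, 1.0), (0.10, 0, 1.0), (0.16, 0, 0.5)]
value_at_zero = tite_log_likelihood(0.0, records)
```

A participant with no follow-up (weight 0) contributes nothing to the likelihood, and one with complete follow-up contributes the usual Bernoulli term.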

Specifying skeleton

Prior to study start, thought must be given to the numeric value assigned to each dose given at each administration. Given the complexity of selecting appropriate values for all of the administrations, we have developed a systematic approach to identifying values that allow for generally good operating characteristics across many settings, as we will show in Section 3. We start by assigning , where is a value relatively close to zero and can be seen as an approximate probability of DLT for the lowest dose examined in the study. In our application, we use ; see Table 1. For the remaining first doses of schedules , we choose values that increase by an amount defined by an odds ratio (“b” is for between‐schedules), that is, for , We then select two additional odds ratios and (“w” is for within‐schedules) such that for , and Values of the odds ratios should be sufficiently large enough so that successive schedules have DLT probabilities that are distinct enough from each other and promote discrimination between them with the traditional sample sizes used in dose‐finding studies. We have found that appropriate values for each of the odds ratios are generally between 1.5 and 2.5, but they need to be determined in conjunction with the hyperparameter values described next. Selection of appropriate values for the odds ratios also can be informed by existing clinical information or previous dose‐escalation studies that might inform how doses vary in their DLT probabilities and the cumulative effects of repeated dosing. Nonetheless, like any regression model, estimation of model parameters is facilitated by variability in the predictor variable, which, in our setting, are the values assigned to each administration. Even if one believes, for example, that should be 1.1, such a study will be hard to implement without an unrealistically large number of participants, and doses with greater variability should be considered.
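The odds-ratio construction described above can be sketched as follows. The starting value of 0.03 matches the first skeleton value in Table 1, but the odds ratios used here (1.75 between schedules, 1.75 and 1.0 within schedules) are hypothetical choices for illustration, and the output is not claimed to reproduce Table 1 exactly. The function and argument names are likewise assumptions.

```python
def apply_odds_ratio(p, odds_ratio):
    """Return the probability whose odds equal odds_ratio times the odds of p."""
    odds = odds_ratio * p / (1.0 - p)
    return odds / (1.0 + odds)


def build_skeleton(delta, or_between, or_within, n_schedules):
    """Build skeleton values for each administration of each schedule.

    delta:      skeleton value for the first administration of schedule 1
    or_between: odds ratio linking first administrations of adjacent schedules
    or_within:  list of odds ratios linking consecutive administrations
                within a schedule (length = number of administrations - 1)
    """
    skeleton = []
    first = delta
    for _ in range(n_schedules):
        row = [first]
        for or_w in or_within:
            row.append(apply_odds_ratio(row[-1], or_w))
        skeleton.append(row)
        first = apply_odds_ratio(first, or_between)
    return skeleton


# Hypothetical odds ratios; six schedules of three administrations each.
skeleton = build_skeleton(0.03, 1.75, [1.75, 1.0], 6)
```

Working on the odds scale keeps every value strictly between 0 and 1, which the power model requires.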

TABLE 1

Actual dose values, assigned skeleton values, and hypothetical true toxicity probabilities for motivating example presented in Section 3

	Actual dose values			Skeleton dose values		
Schedule	Admin 1	Admin 2	Admin 3	Admin 1	Admin 2	Admin 3
1	0.006	0.018	0.018	0.03	0.05	0.05
2	0.030	0.090	0.090	0.06	0.10	0.10
3	0.090	0.270	0.270	0.10	0.16	0.16
4	0.270	0.800	0.800	0.16	0.25	0.25
5	0.800	2.400	2.400	0.25	0.36	0.36
6	2.400	7.200	7.200	0.36	0.50	0.50

Note: Boldface represents the optimal schedule, i.e. the schedule with toxicity probability closest to 0.25 after the third administration.

Specifying hyperparameters

In order to determine appropriate prior means for each of , , and , we take Equations (1) and (2) and define , which is the same value for every participant. With regard to the prior mean for , we identify an appropriate value through the values assigned to the set of first doses among the schedules. Based upon a second‐order Taylor series expansion for , we outline in the Appendix that a suitable value for the prior mean is , which is the average of the transformed dose values. We then select , so that the exponent in has mean For the prior means of and , we first select a value , such that and , so that the probability of DLT within‐schedule is assumed to increase proportionally with each administration. First focusing on the prior mean for , we start with the fact that , which leads to . If we set and given that , we use the resulting equation for the prior mean of , that is, By similar algebra, we find , which leads to the value Using these four hyperparameter values, we generate many draws of , , and from their respective prior distributions, which allows us to compute many realizations of the probability of DLT for each administration of each schedule. Averaging over the realizations gives us an expected a priori probability of DLT for each administration of each schedule. These averages then allow us to see which schedule is now assumed to be the a priori best schedule and whether or not the prior distributions should be modified to produce suitable a priori DLT probabilities. For example, we would not want prior distributions that suggest that the first schedule is the best schedule, as this would likely be too informative and lead to poor operating characteristics if the true best schedule were the last schedule. To help the reader apply these ideas in practice, we will explore these thoughts in greater detail in Section 3 when presenting the prior distributions developed for the simulation study.
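The simulation-based check of a priori DLT probabilities can be sketched for the first administration only, using the standard empiric model with a normal prior on its parameter. The prior mean of 0 and SD of 1 used below are hypothetical placeholders, not the values derived in this section, and the extension to the second and third administrations is not attempted. The skeleton values are the first-administration values from Table 1.

```python
import math
import random
import statistics


def prior_dlt_summary(skeleton_values, prior_mean, prior_sd,
                      n_draws=20000, seed=1):
    """Monte Carlo prior mean and SD of the DLT probability for each
    skeleton value under p = d ** exp(beta), beta ~ Normal(mean, sd)."""
    rng = random.Random(seed)
    draws = [rng.gauss(prior_mean, prior_sd) for _ in range(n_draws)]
    summary = []
    for d in skeleton_values:
        probs = [d ** math.exp(b) for b in draws]
        summary.append((statistics.fmean(probs), statistics.pstdev(probs)))
    return summary


# First-administration skeleton values of Schedules 1-6 (Table 1).
first_doses = [0.03, 0.06, 0.10, 0.16, 0.25, 0.36]
# Hypothetical hyperparameters, for illustration only.
summary = prior_dlt_summary(first_doses, prior_mean=0.0, prior_sd=1.0)
target = 0.25
a_priori_best = min(range(len(summary)),
                    key=lambda k: abs(summary[k][0] - target))
```

Inspecting which schedule is a priori closest to the target, as in the last line, is one way to see whether a candidate prior is unduly informative.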

Dose assignment algorithm

Once all necessary study design parameters, including sample size, skeleton dose values, and prior distributions, have been identified, participant assignments are made adaptively as follows:
1. The first participant is assigned to the schedule with the lowest starting dose.
2. Once a new participant can be enrolled, the accrued data for all previously enrolled participants are used to compute the posterior distributions of the probability of DLT at the end of follow-up for each schedule.
3. Per a predefined stopping rule, if the accrued data suggest that the cumulative DLT probability after the follow-up for the last administration in the first schedule is unacceptably high, the study ends early and no further accrual occurs because all schedules are considered unsafe.
4. If the stopping rule is not met, the new participant is assigned to the schedule with cumulative posterior mean probability at the end of follow-up closest to the desired target probability, subject to any restrictions on assignments.
5. Repeat steps 2-4 as each new participant is accrued, or until the study is stopped.
6. If the study has not stopped accrual, once the last participant has completed their follow-up: (a) use all of the accrued data to compute the cumulative posterior mean probability of DLT at the end of follow-up for each schedule; (b) select the schedule with posterior mean probability closest to the target as the maximum tolerated schedule.
As suggested in step 3 above, Phase I trials often include a stopping rule when excessive toxicity is observed for the first schedule. Although stopping rules can be based upon posterior DLT probabilities, we have chosen instead to use a frequentist-based stopping rule that is independent of the model used for computing DLT probabilities. Specifically, we use the observed number of DLTs seen for participants assigned to the first schedule to compute a one-sided 95% confidence interval for the true DLT probability of the lowest schedule. 
If the lower bound of this confidence interval is higher than the targeted DLT probability , the trial is stopped and future accrual is terminated. For example, the study would be stopped if four out of five participants assigned to schedule 1 experienced a DLT. This stopping rule is implemented in step 3 of the algorithm above and will be assessed in Section 3 with a setting in which all schedules have excessive probability of DLT. In step 4 of the algorithm above, many possible restrictions on assignments can occur. For example, it is common to require that a schedule cannot be assigned to a participant until all lower schedules have been assigned to at least participants, or that participants must be enrolled in cohorts of size , that is, all members of the same cohort must be given the same assignment. A common choice is , which is adopted from the so‐called 3+3 algorithm. We could also require that at least participants must have completed their follow‐up on the same schedule before higher schedules can be assigned. Note that our parametric model, through the use of prior distributions, is able to estimate the DLT probabilities of later administrations without the direct observation of individuals who have received those administrations. Discomfort with decision‐making on incomplete data is mitigated directly through these restrictions. Furthermore, one might also consider the entire posterior distribution, beyond the posterior mean, when making assignment decisions. This approach, commonly referred to as Escalation with Overdose Control (EWOC), examines how much mass of the posterior distribution lies above a threshold for each schedule's cumulative DLT probability. An upper bound is placed upon how much mass is acceptable and only schedules meeting this criterion are considered for assignment. 
Thus, it is possible that a schedule may have a posterior mean DLT probability that is closest to the targeted DLT probability, but its entire posterior distribution may be skewed or have too large a variance to be certain that the schedule is safe enough to assign to the next participant.
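The stopping rule and the closest-to-target assignment can be sketched as below. The text specifies a one-sided 95% confidence interval but does not name the interval method; this sketch assumes an exact (Clopper-Pearson) lower bound, found by bisection on the binomial upper tail. All function names are hypothetical, and the posterior means passed to `closest_to_target` are made-up inputs rather than model output.

```python
from math import comb


def binom_upper_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k)
               for k in range(x, n + 1))


def clopper_pearson_lower(x, n, alpha=0.05, tol=1e-10):
    """Exact one-sided lower confidence bound for a binomial proportion:
    the value of p at which P(X >= x) equals alpha."""
    if x == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The upper tail increases with p, so move toward the crossing point.
        if binom_upper_tail(n, x, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def should_stop(n_dlt, n_treated, target=0.25, alpha=0.05):
    """Stop if the lower bound for schedule 1 exceeds the target."""
    return clopper_pearson_lower(n_dlt, n_treated, alpha) > target


def closest_to_target(posterior_means, target=0.25):
    """Index (0-based) of the schedule whose posterior mean cumulative DLT
    probability is closest to the target."""
    return min(range(len(posterior_means)),
               key=lambda k: abs(posterior_means[k] - target))


stop_4_of_5 = should_stop(4, 5)   # the example given in the text
stop_3_of_5 = should_stop(3, 5)
```

Under this assumed interval method, four DLTs among five participants on schedule 1 triggers stopping, in agreement with the example in the text, while three among five does not.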

SIMULATIONS

Motivating example

Our motivating example is a study of six schedules that follow a SUD plan. Each schedule consists of three administrations of an experimental agent, each spaced apart by seven days, and it is assumed that the dose used in each administration is no less than the dose used in the preceding administration. The actual clinical doses under study are shown in the first six rows of Table 1. Participants are followed for an additional seven days after the third administration, for a total planned follow-up of 21 days for each participant. After each administration, patients are continually followed for the occurrence of a DLT, defined as any grade 3 or higher adverse event (per National Cancer Institute CTCAE v5.0), or occurrence of CRS grade 3 or above, according to the ASTCT consensus. If a participant experiences a DLT, all further planned administrations are withheld for that participant, and they are considered to have complete follow-up.

Simulation details

Our study aims to identify which schedule is associated with a DLT probability by Day 21 closest to the target of 0.25. A maximum of 30 participants will be enrolled. Participants will be enrolled in cohorts of a prespecified size, and a schedule cannot be assigned to a participant unless a prespecified minimum number of participants have already been assigned to all lower schedules and a prespecified minimum number have been fully followed on all lower schedules. Participant enrollment is assumed to follow a Poisson process with a mean inter-arrival time of 21 days, that is, on average, each participant is enrolled after the previous participant has completed their follow-up. Using the approach of Section 2.4, the skeleton values assigned to each dose are shown in the first six rows of Table 1. Based upon these skeleton values and the approach of Section 2.5, we assign a normal prior distribution to the unconstrained model parameter and exponential prior distributions to the two non-negative model parameters. Figure 1A graphically presents the prior mean and SD for the DLT probabilities for each administration of each schedule that result from the hyperparameter and skeleton values just described. In this figure, we see that all three administrations of Schedules 1 and 2 are a priori likely to be safe, while Schedules 4-6 are less likely to be so. Specifically, although each has a mean prior probability of DLT at Day 21 that is below the target of 0.25, all of the latter three schedules have sufficient variability so that much of their prior distributions exceed the target.
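The enrollment assumption can be sketched by drawing exponential gaps between consecutive arrivals, which is how arrival times of a Poisson process are typically simulated. The function name, the seed, and the convention that the first participant also waits one exponential gap are assumptions made for illustration.

```python
import random


def simulate_enrollment_times(n_participants, mean_gap_days=21.0, seed=2022):
    """Arrival times (in days) of a Poisson process whose inter-arrival
    times are exponential with the given mean."""
    rng = random.Random(seed)
    times = []
    t = 0.0
    for _ in range(n_participants):
        # expovariate takes the rate, i.e. 1 / mean.
        t += rng.expovariate(1.0 / mean_gap_days)
        times.append(t)
    return times


arrival_days = simulate_enrollment_times(30)
gaps = [b - a for a, b in zip([0.0] + arrival_days[:-1], arrival_days)]
```

Because individual gaps vary around the 21-day mean, some participants arrive before the previous participant has completed follow-up, which is where partial follow-up weights come into play.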

FIGURE 1

Visual representations of prior DLT probability distributions resulting from three sets of hyperparameter values examined in Section 3. The height of each bar represents the prior mean, and the length of each vertical line above the bar indicates the prior SD. The horizontal dotted line represents the targeted DLT probability of 0.25. S1 = schedule 1, S2 = schedule 2, ..., S6 = schedule 6

To demonstrate how the chosen hyperparameter values impact the resulting prior distributions for DLT probabilities, Figure 1B displays information analogous to Figure 1A, but when one of the prior means is 2.5 times larger. We now see stronger prior belief that all six schedules are safe, which might be implausible and lead to poor operating characteristics in a setting where Schedules 1 or 2 are the only acceptable schedules. Conversely, Figure 1C displays the resulting prior distributions when two of the prior means are each 2.5 times larger. Now we see that there is greater prior uncertainty placed on all six schedules, and we felt this prior would provide insufficient support to an algorithm incorporating data from only 30 participants. It is through these visual examinations that one can develop a suitable set of hyperparameter values before running any simulations. To assess the operating characteristics of our design, we have simulated 1000 hypothetical trials for each of six scenarios, in which the schedule with cumulative DLT rate after the third administration closest to the target is schedule 1 in scenario 1, schedule 2 in scenario 2, and so on. We also include a seventh scenario in which all schedules have Day 21 DLT probabilities above the target. The true DLT rates for each scenario are shown in Table 1. 
These DLT probabilities are not based on any specific model; values were selected so that neighboring schedules had DLT probabilities after the third administration spaced by approximately 10 percentage points, a difference that is biologically plausible, but is also large enough to allow for discriminating between the optimal schedule and its neighbors.

Details for comparator approaches

We will compare the operating characteristics of our design to two other designs, both of which model a hazard function for the time‐to‐DLT and implicitly generate a cumulative probability of DLT by Day 21. Note that we have modified both methods from what was originally published in order to (i) accommodate use of the skeleton values used with our proposed model, and (ii) allow for dose levels that vary within each schedule. For the first comparator, we generalize the work of Braun et al, , assuming each administration of the drug has a hazard function at time equal to which increases linearly to a value at time and then decreases linearly to zero at time . If a participant is assigned to a schedule of doses and dose is administered at time , then the total hazard of DLT at future time is which leads to a cumulative hazard function equal to and cumulative probability of DLT . The likelihood contribution at is where if a DLT has occurred at and is zero otherwise, and In our simulations, we assume a fixed value of , while has a prior Beta distribution with parameters and , and has an exponential distribution with mean 0.09. These values were selected so that the expected prior cumulative DLT probability at Day 21 for each schedule was similar to that used with the other methods. For the second comparator, we generalized the work of Gunhan et al, replacing Equation (4) above with the difference of two exponential decay functions All remaining equations above for the total and cumulative hazards and the cumulative probability of DLT are unchanged. In the simulations, we have assumed fixed values for both and , such that and . With regard to , we assume has a normal distribution with mean 2 and SD 1, again so that the expected prior cumulative DLT probability of each schedule was similar to that of the other methods. 
For both comparators, we use the same stopping rule and limits on escalation as used with our model, so that variations in the operating characteristics presented next are due primarily to the model that is used. All simulations were performed in R, version 4.0.4. Due to a vast savings in computation time, all posterior moments were computed through integral approximation methods, rather than through Markov Chain Monte Carlo methods. All code for our method, as well as the two comparator methods, is available on GitHub at https://github.com/tombraun1216/CRM‐with‐Step‐Up‐Dosing.git.
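The mechanics of the triangular-hazard comparator can be sketched as follows: each administration contributes a hazard that rises linearly to a peak and then falls linearly to zero, the contributions are summed, and the cumulative DLT probability is one minus the exponential of the negative cumulative hazard. How the peak height depends on dose and on the model parameters is not reproduced here; the peak heights, peak time, and end time below are hypothetical inputs, as are the function names.

```python
import math


def triangular_cum_hazard(u, peak, t_peak, t_end):
    """Cumulative hazard at time u since one administration, for a hazard
    that rises linearly to `peak` at t_peak and falls linearly to 0 at t_end."""
    if u <= 0.0:
        return 0.0
    if u <= t_peak:
        return peak * u * u / (2.0 * t_peak)
    if u < t_end:
        tail = t_end - t_peak
        return peak * t_peak / 2.0 + peak * (tail**2 - (t_end - u)**2) / (2.0 * tail)
    return peak * t_end / 2.0  # area of the full triangle


def cumulative_dlt_probability(t, admin_times, peaks, t_peak, t_end):
    """Probability of DLT by time t when hazards of the administrations
    received so far are summed."""
    total = sum(triangular_cum_hazard(t - s, h, t_peak, t_end)
                for s, h in zip(admin_times, peaks))
    return 1.0 - math.exp(-total)


# Three administrations on days 0, 7, and 14; hypothetical peak heights.
admin_times = [0.0, 7.0, 14.0]
peaks = [0.01, 0.02, 0.02]
p_day21 = cumulative_dlt_probability(21.0, admin_times, peaks,
                                     t_peak=2.0, t_end=7.0)
```

With these inputs every triangle is complete by Day 21, so the cumulative hazard is simply the sum of the three triangle areas.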

Operating characteristics

The performance of our design is summarized by three metrics: (i) the proportion of simulations in which each schedule is selected as the MTS at the end of the study; (ii) the average proportion of participants assigned to each schedule during the study, and (iii) the average Day 21 DLT probability of the schedule assigned to each participant. Information regarding metrics (i) and (ii) is shown in Table 2, while information regarding metric (iii) is shown in Figure 2.

TABLE 2

Operating characteristics resulting from simulations described in Section 3.2

		Early term		Schedule 1		Schedule 2		Schedule 3		Schedule 4		Schedule 5		Schedule 6
Scen	Method	Sel	Assn	Sel	Assn	Sel	Assn	Sel	Assn	Sel	Assn	Sel	Assn	Sel	Assn
1	CRM	0.08	n/a	0.68	0.66	0.20	0.17	0.05	0.09	0.00	0.03	0.00	0.01	0.00	0.00
	GWF	0.07	n/a	0.60	0.51	0.28	0.28	0.05	0.13	0.00	0.03	0.00	0.00	0.00	0.00
	TRI	0.06	n/a	0.72	0.39	0.16	0.27	0.06	0.25	0.00	0.06	0.00	0.01	0.00	0.00
2	CRM	0.01	n/a	0.32	0.38	0.40	0.29	0.23	0.21	0.04	0.09	0.00	0.03	0.00	0.01
	GWF	0.00	n/a	0.21	0.23	0.49	0.40	0.26	0.26	0.04	0.10	0.00	0.02	0.00	0.00
	TRI	0.00	n/a	0.42	0.19	0.28	0.27	0.27	0.38	0.03	0.13	0.00	0.02	0.00	0.00
3	CRM	0.00	n/a	0.02	0.12	0.25	0.22	0.41	0.31	0.27	0.23	0.04	0.09	0.00	0.03
	GWF	0.00	n/a	0.00	0.06	0.23	0.24	0.45	0.35	0.28	0.26	0.04	0.08	0.00	0.01
	TRI	0.00	n/a	0.12	0.08	0.15	0.16	0.46	0.39	0.24	0.29	0.03	0.07	0.00	0.01
4	CRM	0.00	n/a	0.00	0.06	0.03	0.09	0.21	0.19	0.42	0.29	0.30	0.24	0.05	0.13
	GWF	0.00	n/a	0.00	0.04	0.02	0.11	0.25	0.22	0.44	0.34	0.25	0.21	0.05	0.07
	TRI	0.00	n/a	0.04	0.04	0.02	0.10	0.23	0.22	0.42	0.38	0.27	0.20	0.02	0.06
5	CRM	0.00	n/a	0.00	0.05	0.01	0.07	0.06	0.11	0.29	0.22	0.41	0.28	0.24	0.26
	GWF	0.00	n/a	0.00	0.04	0.01	0.09	0.07	0.13	0.31	0.27	0.40	0.28	0.22	0.18
	TRI	0.00	n/a	0.02	0.04	0.01	0.09	0.07	0.12	0.32	0.31	0.43	0.28	0.16	0.16
6	CRM	0.00	n/a	0.00	0.05	0.00	0.05	0.02	0.08	0.12	0.14	0.34	0.24	0.52	0.44
	GWF	0.00	n/a	0.00	0.04	0.00	0.08	0.02	0.09	0.14	0.19	0.33	0.26	0.51	0.34
	TRI	0.00	n/a	0.01	0.04	0.00	0.08	0.02	0.09	0.16	0.22	0.39	0.27	0.42	0.31
7	CRM	0.75	n/a	0.24	0.48	0.00	0.05	0.00	0.02	0.00	0.01	0.00	0.00	0.00	0.00
	GWF	0.71	n/a	0.28	0.47	0.02	0.11	0.00	0.03	0.00	0.01	0.00	0.00	0.00	0.00
	TRI	0.57	n/a	0.41	0.38	0.02	0.22	0.01	0.16	0.00	0.02	0.00	0.00	0.00	0.00

Note: CRM, proposed extension of CRM; GWF, PK hazard method of Reference 22; TRI, triangular hazard method of References 19 and 20; Sel, proportion of simulations in which schedule was selected as best at end of study; Assn, average proportion of participants assigned to schedule during study; early term, no schedule selected due to early termination of study. Boldface text corresponds to operating characteristics corresponding to schedule with true DLT closest to target DLT probability of .25.

FIGURE 2

Schedule assignment patterns in Scenarios 2‐5 of simulations described in Section 3. Methods shown: proposed method using extension of CRM; PK hazard method of Reference 22; triangular hazard method of References 19 and 20. Horizontal dashed line represents targeted DLT probability of .25.

To start, we focus on the boldfaced values in Table 2, which indicate that final selection or assignment during the study is done at the schedule with Day 21 DLT probability closest to the target. With regard to the decision made at the end of the study, we see that all three methods have fairly similar performance; averaging across Scenarios 1 to 6, we have correct selection proportions of 0.47, 0.48, and 0.46 for our proposed method, the method of Gunhan et al, and the method of Braun et al, respectively. Furthermore, we extended the work of O'Quigley et al and generated, for each of the first six scenarios, a non‐parametric optimal upper bound for the probability of correct schedule selection at study end. Based upon 10 000 simulations, the realized upper bounds were 0.69, 0.57, 0.52, 0.49, 0.54, and 0.58 for Scenarios 1‐6, respectively, with an average of 0.57, which supports that all three methods have solid, but not overly optimistic, performance with 30 participants.
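
The idea behind a non‐parametric optimal benchmark can be illustrated with a minimal sketch. It implements the standard complete‐information benchmark for a single set of ordered DLT probabilities, not the extension to step‐up schedules used in the paper. Each simulated participant is given a latent tolerance, which reveals whether that participant would have had a DLT on every schedule; the schedule whose empirical DLT rate is closest to the target is then selected. The function name and the true DLT probabilities below are hypothetical and are not taken from Table 1.

```python
import numpy as np

def optimal_benchmark(true_probs, target=0.25, n_participants=30,
                      n_sims=10_000, seed=0):
    """Selection proportions under the complete-information benchmark.

    true_probs: assumed true DLT probabilities, one per schedule (hypothetical).
    Returns the proportion of simulated trials selecting each schedule.
    """
    rng = np.random.default_rng(seed)
    true_probs = np.asarray(true_probs, dtype=float)
    # One latent tolerance per participant in each simulated trial.
    u = rng.random((n_sims, n_participants, 1))
    # Complete information: a participant would have a DLT on schedule j if u <= p_j.
    dlt = u <= true_probs              # shape (n_sims, n_participants, n_schedules)
    est = dlt.mean(axis=1)             # empirical DLT rate per schedule
    # Pick the schedule closest to the target (ties go to the lower schedule).
    chosen = np.abs(est - target).argmin(axis=1)
    return np.bincount(chosen, minlength=true_probs.size) / n_sims

# Hypothetical scenario in which the third schedule has DLT probability 0.25.
selection = optimal_benchmark([0.05, 0.12, 0.25, 0.40, 0.55, 0.70])
print(selection.round(2))
```

The entry of `selection` at the target schedule plays the role of the upper bound on correct selection: no design observing only one outcome per participant can be expected to beat it on average.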
With regard to assignments made during the trial, Table 2 presents fairly comparable performance among the methods, although with greater variation than what was seen for the probability of correct selection at study end. Averaging across all six scenarios, we have average proportions of assignment equal to 0.38, 0.37, and 0.34 for our proposed method, the method of Gunhan et al, and the method of Braun et al, respectively. The greatest discrepancy among the methods appears in Scenario 2, in which our proposed design and the triangular hazard design both assign fewer participants to Schedule 2 than does the PK‐based hazard design. This issue is also demonstrated in the upper left‐hand panel in Figure 2, in which we see that the proposed design tends to assign Schedule 1 more often than Schedule 2, while the triangular hazard design tends to assign Schedule 3 more often than Schedule 2. To examine this phenomenon in more detail, we examined all six scenarios and tabulated, for each of the three designs, the difference in schedule assignments between a current participant who experiences a DLT and the next participant, that is, determining whether the next assignment was at a lower schedule, the same schedule, or a higher schedule. We found that across the six scenarios, the probability of de‐escalating by two or more schedule levels ranged from 0.05 to 0.13 for our proposed design, 0.01 to 0.03 for the PK hazard design, and 0.01 to 0.04 for the triangular hazard design. Conversely, the probability of escalating one schedule level ranged from 0.01 to 0.02 for our proposed design, 0.01 to 0.03 for the PK hazard design, and 0.07 to 0.11 for the triangular hazard design. This differential is also supported by the results in Table 2 for Scenario 7, in which all six schedules are overly toxic, so that dose assignments should be restricted to the lowest schedules, and the study should terminate before accruing all 30 participants.
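
The tabulation of schedule changes following a DLT can be sketched as follows. This is an illustrative helper, not the authors' code: the function names and the example trial are hypothetical, and schedules are assumed to be coded as integers in enrollment order.

```python
from collections import Counter

def transitions_after_dlt(schedules, dlts):
    """Count schedule changes (next minus current) after participants with a DLT.

    schedules: assigned schedule per participant, in enrollment order.
    dlts: 1 if that participant experienced a DLT, else 0.
    """
    counts = Counter()
    # Pair each participant with the next one; the last participant has no successor.
    for current, nxt, had_dlt in zip(schedules, schedules[1:], dlts):
        if had_dlt:
            counts[nxt - current] += 1
    return counts

def summarize(counts):
    """Convert raw counts into the proportions discussed in the text."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        "de-escalate by 2+": sum(c for d, c in counts.items() if d <= -2) / total,
        "de-escalate by 1": counts[-1] / total,
        "same": counts[0] / total,
        "escalate by 1": counts[1] / total,
        "escalate by 2+": sum(c for d, c in counts.items() if d >= 2) / total,
    }

# Hypothetical single trial: two DLTs are followed by another assignment.
example_schedules = [1, 2, 3, 3, 1, 2, 2]
example_dlts = [0, 0, 1, 1, 0, 0, 1]
example_counts = transitions_after_dlt(example_schedules, example_dlts)
print(summarize(example_counts))
```

In a simulation study, the counts from every simulated trial in a scenario would be pooled before calling `summarize`, which gives one set of proportions per design and scenario.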
We see that the proposed method and PK‐based hazard method both terminate accrual more often than the triangular hazard approach, which also assigns Schedule 3 far more often than the others. Furthermore, early termination occurred after an accrual of 13, 14, and 19 participants, on average, for our method, the PK hazard design, and the triangular hazard design, respectively. In Figure 2, we have presented the dose assignment patterns for Scenarios 2 to 5 and have omitted Scenarios 1 and 6 because they present information redundant with the other scenarios. We see across all four scenarios that all three methods have assignments that tend to converge toward the schedule with a DLT probability closest to the target of .25. As expected, all three methods tend to escalate to schedules with DLT probabilities higher than .25, and then respond to observed DLTs and de‐escalate in efforts to observe fewer DLTs. The greatest divergence among the three methods is seen for Scenario 2 and supports our earlier discussion of the results presented in Table 2, namely that our proposed method tends to be least aggressive with schedule assignments and the triangular hazard method tends to be most aggressive.
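
The averages quoted above can be checked directly against Table 2. The sketch below assumes that the schedule closest to the target in Scenario k is Schedule k; the boldface marking is not visible in this rendering of the table, so that mapping is an assumption, although it is the one that reproduces the reported averages. Variable names are illustrative.

```python
from statistics import mean

# (Sel, Assn) for Schedule k in Scenario k, k = 1..6, transcribed from Table 2.
# Assumption: Schedule k is the schedule closest to the target in Scenario k.
table2_target = {
    "CRM": [(0.68, 0.66), (0.40, 0.29), (0.41, 0.31),
            (0.42, 0.29), (0.41, 0.28), (0.52, 0.44)],
    "GWF": [(0.60, 0.51), (0.49, 0.40), (0.45, 0.35),
            (0.44, 0.34), (0.40, 0.28), (0.51, 0.34)],
    "TRI": [(0.72, 0.39), (0.28, 0.27), (0.46, 0.39),
            (0.42, 0.38), (0.43, 0.28), (0.42, 0.31)],
}

# Averages as reported in the text: (correct selection, assignment).
reported = {"CRM": (0.47, 0.38), "GWF": (0.48, 0.37), "TRI": (0.46, 0.34)}

# Non-parametric optimal upper bounds for Scenarios 1-6, as reported in the text.
upper_bounds = [0.69, 0.57, 0.52, 0.49, 0.54, 0.58]

def averages(rows):
    """Average Sel and Assn over the six scenarios."""
    return mean(s for s, _ in rows), mean(a for _, a in rows)

for method, rows in table2_target.items():
    sel, assn = averages(rows)
    print(f"{method}: Sel {sel:.3f} (reported {reported[method][0]}), "
          f"Assn {assn:.3f} (reported {reported[method][1]})")

print(f"Mean upper bound: {mean(upper_bounds):.3f} (reported 0.57)")
```

Because the table entries are themselves rounded to two decimals, the recomputed averages agree with the reported values only up to rounding.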

Sensitivity analyses

Using the settings from Scenarios 1 to 6 presented in Table 1, we also examined how the operating characteristics of our proposed method varied with (i) a different prior distribution for θ2 and θ3, (ii) a different skeleton, which implicitly impacts the prior for , (iii) a larger sample size of 45 participants, and (iv) a shorter average inter‐arrival time for participants of seven days. The results for aspects (i) and (ii) can be found in Table 3, while the results for aspects (iii) and (iv) can be found in Table S1 in the Appendix.

TABLE 3

Simulation results for assessing sensitivity to prior distributions and skeleton

		Sched 1		Sched 2		Sched 3		Sched 4		Sched 5		Sched 6
Sensitivity feature	Scen	Sel	Assn	Sel	Assn	Sel	Assn	Sel	Assn	Sel	Assn	Sel	Assn
Prior for θ2 and θ3	1	0.69	0.69	0.19	0.18	0.05	0.07	0.00	0.02	0.00	0.00	0.00	0.00
	2	0.33	0.42	0.40	0.28	0.22	0.19	0.04	0.08	0.00	0.02	0.00	0.00
	3	0.02	0.15	0.25	0.23	0.42	0.31	0.26	0.21	0.05	0.08	0.00	0.02
	4	0.00	0.08	0.04	0.11	0.23	0.22	0.43	0.29	0.26	0.21	0.04	0.10
	5	0.00	0.07	0.01	0.08	0.06	0.13	0.31	0.24	0.41	0.27	0.20	0.23
	6	0.00	0.05	0.00	0.06	0.03	0.10	0.14	0.16	0.35	0.24	0.48	0.39
Skeleton dose values	1	0.69	0.61	0.21	0.22	0.03	0.09	0.00	0.03	0.00	0.01	0.00	0.00
	2	0.28	0.31	0.47	0.35	0.22	0.23	0.03	0.08	0.00	0.02	0.00	0.00
	3	0.01	0.11	0.23	0.21	0.46	0.34	0.27	0.25	0.03	0.07	0.00	0.01
	4	0.00	0.05	0.02	0.09	0.22	0.20	0.47	0.34	0.27	0.24	0.03	0.08
	5	0.00	0.04	0.01	0.07	0.07	0.13	0.36	0.27	0.44	0.31	0.13	0.17
	6	0.00	0.04	0.00	0.05	0.03	0.10	0.18	0.19	0.42	0.31	0.36	0.32

Note: Sel, proportion of simulations in which schedule was selected as best at end of study; Assn, average proportion of participants assigned to schedule during study; boldface text corresponds to operating characteristics corresponding to schedule with true DLT closest to target DLT probability of .25.

Assuming an increased value (as described in Section 2.5), we assigned a prior exponential distribution with mean and a prior exponential distribution with mean . From the first set of six rows in Table 3, we see no substantial change to the operating characteristics for our design, although with slightly lower performance in Scenario 6. The latter set of six rows in Table 3 corresponds to operating characteristics when we changed the skeleton by using values , and (as described in Section 2.4), which led to skeleton values that were shifted closer to zero. As with the prior distributions for θ2 and θ3, we see little change in operating characteristics for Scenarios 1 to 5, although the performance is decreased in Scenario 6, which emphasizes that the skeleton values do impact the prior mean for , which is the likely cause of this result in Scenario 6.
In Supplementary Table S1, the first six rows present the operating characteristics when the sample size is increased to 45 participants. Not surprisingly, we see that the probability of correctly selecting the best schedule increases by approximately 10 percentage points, with a slightly smaller increase in the average proportion of participants assigned to the best schedule. With regard to the average inter‐arrival time of patients, when participants arrived an average of every 7 days, rather than every 21 days, the last six rows of Supplementary Table S1 demonstrate no material change in the operating characteristics relative to those in Table 2.

DISCUSSION

In recent Phase I dose‐escalation trials assessing the tolerability of new investigational drugs in cancer immunotherapy, high rates of DLT after the first dose have been reported. To address this issue, adjusted dosing schemes have been proposed that consist of planned stepwise dose‐escalation for each participant at the start of treatment. These dosing schemes, known as SUD or dose fractionation, pose new challenges for the design of such a Phase I dose‐escalation trial. Certainly, identifying the MTD in a Phase I dose‐escalation trial is always challenging because the process relies on sparse data. Each patient contributes a single data point, either a 0 if no DLT is observed during the DLT assessment period, or a 1 otherwise. With SUD schemes, the challenge of identifying the MTD is even greater, because a combination of doses has to be identified, each of which induces a probability of toxicity close to the target toxicity level. To this end, we have proposed an approach for assessing the cumulative probability of DLT for a series of administrations given to participants in a SUD design, whereby participants are first given administrations of a lower dose in the hope of reducing the likelihood of DLT when the final desired dose of the agent is administered. Our design is a simple extension of the CRM, as compared to other approaches founded in time‐to‐event methods, and our design has operating characteristics comparable to those methods. Our method resembles a piecewise hazard model and is suitable for designs in which participants are treated with a succession of non‐decreasing doses. One limitation of our method is that each administration is assumed to occur within a specific timeframe, so our current model is unable to accommodate treatment delays and resulting DLT information, unless additional assumptions or modifications are made.
The number of parameters in our model also grows with the number of administrations, although we could include approaches to smooth among the parameters, correlate them through their prior distributions, or even assume that the change in DLT probability is the same for each additional administration. Nonetheless, our method does not make an assumption of additivity of hazards among the administrations, as is done with the approaches we used as comparators. As with all designs using Bayesian methods, any data from a previous study of a related compound, including information on PK and PD patterns, should be considered to assist with the selection of prior distributions and all parameter and dose values. Our proposed CRM extension requires specifying a skeleton, which is a set of plausible values ascribed to each dose and is often related to the a priori probability of DLT for a single administration. Typically, the skeleton is chosen to cover a wide range of possible dose‐toxicity profiles, and this choice can be difficult given the lack of prior knowledge on dose‐toxicity at the study start. To help in this undertaking, we provide a systematic and intuitive approach for choosing the values used in the skeleton. By simplifying the selection of the initial guesses of the probabilities of DLT in practice, we hope to enhance the use of our CRM‐based approach. Our method is also extremely flexible and allows for nearly any modification desired for a specific trial. For example, although our simulations assigned a dose to each participant individually, such assignments can be made to a cohort of participants, such as in groups of three participants, as a safeguard against early escalation to doses later seen to be toxic. Our method also allows for any dose‐toxicity model used in the CRM, such as a standard logistic model with known intercept.
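
To make the role of the skeleton and of the choice of dose‐toxicity model concrete, the following minimal sketch shows two common one‐parameter models from the standard single‐administration CRM: the power model and a logistic model with a fixed intercept. It is not the step‐up model proposed in this paper. The skeleton values, the intercept of 3, and the function names are assumptions chosen for illustration.

```python
import math

def power_model(skeleton, beta):
    """Power model: p_i = s_i ** exp(beta); beta = 0 reproduces the skeleton."""
    return [s ** math.exp(beta) for s in skeleton]

def logistic_model(skeleton, beta, intercept=3.0):
    """One-parameter logistic model with a known (fixed) intercept.

    Dose labels x_i are back-solved from the skeleton so that beta = 0
    reproduces the skeleton: logit(s_i) = intercept + x_i.
    """
    probs = []
    for s in skeleton:
        x = math.log(s / (1.0 - s)) - intercept   # dose label implied by skeleton
        eta = intercept + math.exp(beta) * x      # slope exp(beta) is always positive
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs

# Hypothetical skeleton of prior DLT probabilities for six doses.
skeleton = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60]

for beta in (-0.5, 0.0, 0.5):
    print(beta,
          [round(p, 3) for p in power_model(skeleton, beta)],
          [round(p, 3) for p in logistic_model(skeleton, beta)])
```

Both models return the skeleton itself at `beta = 0` and shift the whole curve up or down as `beta` moves away from zero, which is why the skeleton acts as the prior guess of the DLT probabilities.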
Nonetheless, before implementing our or any proposed design, clinical and safety experts should be consulted to evaluate the risk of planning to increase a participant's dose while on study. Additionally, a thoughtful and careful safety monitoring plan should be made and possibly include restrictions on how many participants can be exposed to a dose level, whether a participant is exposed to that dose initially as part of a new study cohort or is exposed during their planned dose escalation. Furthermore, our design is founded on a strong a priori belief that SUD is necessary. Although our design does allow for the estimation of DLT probabilities for single administrations of varying doses, as well as the ability to compare the DLT probability of a dose given immediately vs gradually via SUD, investigators must weigh scientific interest in this latter comparison against the safety, cost, and time burdens that might arise with repeated administrations.

Table S1: Simulation results for assessing sensitivity to sample size and inter‐arrival times. Sel = proportion of simulations in which schedule was selected as best at end of study; Assn = average proportion of participants assigned to schedule during study; boldface text corresponds to operating characteristics corresponding to schedule with true DLT closest to target DLT probability of 0.25.

28 in total

1. Non-parametric optimal design in dose finding studies.

Authors: John O'Quigley; Xavier Paoletti; Jean Maccario
Journal: Biostatistics Date: 2002-03 Impact factor: 5.899

2. Cytokine Release Syndrome with Pseudoprogression in a Patient with Advanced Non-Small Cell Lung Cancer Treated with Pembrolizumab.

Authors: Yoshihito Kogure; Yukira Ishii; Masahide Oki
Journal: J Thorac Oncol Date: 2019-03 Impact factor: 15.609

3. Continual reassessment method: a practical design for phase 1 clinical trials in cancer.

Authors: J O'Quigley; M Pepe; L Fisher
Journal: Biometrics Date: 1990-03 Impact factor: 2.571

4. Nivolumab in the Treatment of Refractory Pediatric Hodgkin Lymphoma.

Authors: Alexandra E Foran; Helen R Nadel; Anna F Lee; Kerry J Savage; Rebecca J Deyell
Journal: J Pediatr Hematol Oncol Date: 2017-07 Impact factor: 1.289

5. Development of a minimal physiologically-based pharmacokinetic/pharmacodynamic model to characterize target cell depletion and cytokine release for T cell-redirecting bispecific agents in humans.

Authors: Xiling Jiang; Xi Chen; Pharavee Jaiprasart; Thomas J Carpenter; Rebecca Zhou; Weirong Wang
Journal: Eur J Pharm Sci Date: 2020-02-10 Impact factor: 4.384

6. Sequential designs for phase I clinical trials with late-onset toxicities.

Authors: Y K Cheung; R Chappell
Journal: Biometrics Date: 2000-12 Impact factor: 2.571

7. Diffuse edema suggestive of cytokine release syndrome in a metastatic lung carcinoma patient treated with pembrolizumab.

Authors: Elie El Rassy; Tarek Assi; Jamale Rizkallah; Joseph Kattan
Journal: Immunotherapy Date: 2017-03 Impact factor: 4.196

8. Towards using a full spectrum of early clinical trial data: a retrospective analysis to compare potential longitudinal categorical models for molecular targeted therapies in oncology.

Authors: Pierre Colin; Sandrine Micallef; Maud Delattre; Pierre Mancini; Eric Parent
Journal: Stat Med Date: 2015-06-08 Impact factor: 2.373

9. Cytokine Release Syndrome During Sequential Treatment With Immune Checkpoint Inhibitors and Kinase Inhibitors for Metastatic Melanoma.

Authors: Florentia Dimitriou; Alexandra V Matter; Joanna Mangana; Mirjana Urosevic-Maiwald; Sara Micaletto; Ralph P Braun; Lars E French; Reinhard Dummer
Journal: J Immunother Date: 2019-01 Impact factor: 4.456

10. Extending the Continual Reassessment Method to accommodate step-up dosing in Phase I trials.

Authors: Thomas M Braun; Francois Mercier
Journal: Stat Med Date: 2022-06-05 Impact factor: 2.497
