Literature DB >> 34747043

Optimal adaptive allocation using deep reinforcement learning in a dose-response study.

Kentaro Matsuura1,2, Junya Honda3,4, Imad El Hanafi5,6, Takashi Sozu7, Kentaro Sakamaki8.   

Abstract

Estimation of the dose-response curve for efficacy and subsequent selection of an appropriate dose in phase II trials are important processes in drug development. Various methods have been investigated to estimate dose-response curves. Generally, these methods are used with equal allocation of subjects for simplicity; nevertheless, they may not fully optimize performance metrics because of nonoptimal allocation. Optimal allocation methods, which include adaptive allocation methods, have been proposed to overcome the limitations of equal allocation. However, they rely on asymptotics, and thus sometimes cannot efficiently optimize the performance metric at the sample size of an actual clinical trial. The purpose of this study is to construct an adaptive allocation rule that directly optimizes a performance metric, such as power, accuracy of model selection, accuracy of the estimated target dose, or mean absolute error over the estimated dose-response curve. We demonstrate that deep reinforcement learning with an appropriately defined state and reward can be used to construct such an adaptive allocation rule. The simulation study shows that the proposed method can successfully improve the performance metric to be optimized when compared with the equal allocation, D-optimal, and TD-optimal methods. In particular, when the mean absolute error was chosen as the metric to be optimized, the resulting rule was superior on many metrics.
© 2021 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Keywords:  adaptive design; clinical trial; dose-finding; dose-ranging; optimal design; response-adaptive

Year:  2021        PMID: 34747043      PMCID: PMC9298337          DOI: 10.1002/sim.9247

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.497


INTRODUCTION

Estimation of the dose‐response curve for efficacy and selection of the dose for use in confirmatory phase III trials are among the most difficult decisions in the drug development process. While too low a dose can result in a lack of efficacy, too high a dose can cause unnecessary adverse events. Various methods have been examined to accurately estimate the dose‐response curve and ensure correct dose selection. Methods for estimating the dose‐response curve include analysis of variance (ANOVA), the multiple comparison procedure-modeling (MCP‐Mod) method, and Bayesian model averaging (BMA)‐based methods. These methods are typically used with equal allocation of subjects for simplicity. Various optimal allocation methods, which include adaptive allocation methods, have been studied, such as the D‐optimal method, TD‐optimal method, aMCP‐Mod method, and Miller's method, and several studies have evaluated them. One common feature of these studies is the evaluation of operating characteristics in simulation studies using performance metrics such as statistical power, accuracy of the estimated target dose, and mean absolute error over the estimated dose‐response curve. The issue with equal allocation is that these metrics may not be fully optimized because the allocation is not optimal. Several optimal allocation methods have been proposed in previous studies to overcome this issue, but they rely on asymptotics, and thus sometimes cannot efficiently optimize the performance metric at the sample size of an actual clinical trial. For example, the D‐optimal method minimizes the asymptotic variance of the estimates of the dose‐response model parameters, and the TD‐optimal method minimizes the asymptotic variance of the estimated target dose. The purpose of this study is to construct an adaptive allocation rule that can directly optimize the performance metric of interest.
To achieve this, we use deep reinforcement learning based on the mean and standard deviation of the response for each dose and the number of subjects allocated to each dose. A simulation study was conducted to compare the operating characteristics of equal allocation (commonly used in actual clinical trials), the D‐optimal method, the TD‐optimal method, and the proposed method. In Section 2, the performance metrics and the proposed method are described. In Section 3, the simulation settings and results are presented. In Section 4, we summarize and discuss our findings.

OPTIMAL ADAPTIVE ALLOCATION USING REINFORCEMENT LEARNING

Settings

In most actual dose‐response studies, the doses are limited to predetermined discrete values, and thus we assume this in this article. The number of doses is denoted by K, and the doses are indexed by k = 1, ..., K from the lowest dose to the highest dose. The amount of dose k is denoted by d_k, where k = 1 is the placebo group with d_1 = 0. The total number N of subjects to be allocated in a clinical trial is assumed to be predetermined. Each subject is allocated to a dose, and the response Y is measured. We assume that the clinical team has a performance metric to be optimized, as described in Section 2.2, and determines a method to detect dose‐response and to estimate the dose‐response curve at the end of the trial (eg, ANOVA, MCP‐Mod, or BMA). In the proposed method, a clinical trial is conducted according to the following steps. Step 1: at the beginning of the trial, the subjects of the first block are allocated equally to the K doses and their responses are obtained. Step 2: based on the information obtained so far, each subject of the next block is probabilistically allocated to one of the K doses according to the adaptive allocation rule π, and the responses of the block are obtained; this step is repeated block by block until all N subjects have been allocated. Step 3: at the end of the trial, the dose‐response curve and target dose are estimated, and all performance metrics are evaluated. In Step 2, the rule π selects a dose k so that the selected performance metric is optimized. The rule π is determined before the start of the trial. In Section 2.3, we explain how the rule π is obtained using deep reinforcement learning.
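As a concrete illustration, the block-wise procedure above can be sketched in Python. This is our own minimal sketch, not the authors' implementation: the function name, the Gaussian response model, and the placeholder rule are assumptions for illustration, with `rule` standing in for the learned allocation rule π.

```python
import random

def run_trial(mu, rule, K=5, n_first=50, block=10, N=150, sigma=4.5 ** 0.5):
    """Simulate one adaptive trial following the steps of Section 2.1.

    mu   : list of true mean responses, one per dose (a hypothetical curve).
    rule : maps the per-dose responses observed so far to K allocation
           probabilities (stand-in for the adaptive allocation rule pi).
    """
    responses = [[] for _ in range(K)]
    # Step 1: allocate the first block equally across the K doses.
    for k in range(K):
        for _ in range(n_first // K):
            responses[k].append(random.gauss(mu[k], sigma))
    allocated = n_first
    # Step 2: allocate the remaining subjects block by block, each subject
    # drawn probabilistically from the current rule's distribution.
    while allocated < N:
        probs = rule(responses)
        size = min(block, N - allocated)
        for _ in range(size):
            k = random.choices(range(K), weights=probs)[0]
            responses[k].append(random.gauss(mu[k], sigma))
        allocated += size
    # Step 3 (curve estimation and metric evaluation) would follow here.
    return responses
```

With a constant equal-probability rule, this reduces to equal allocation in expectation.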

Performance metrics

When selecting the target dose to be used in a phase III trial, the safety and efficacy of the drug are taken into consideration. For the purpose of simulation studies, we simplify the problem and consider only efficacy for dose selection. In simulation studies, the existence of true dose‐response curves is usually assumed to evaluate the methods. The values of the true and estimated dose‐response curves at dose d are denoted by μ(d) and μ̂(d), respectively. To evaluate the operating characteristics of the methods, the following performance metrics are generally used.

Detecting dose‐response

The methods in the previous studies and the proposed method include a decision rule to determine whether the data provide sufficient evidence of dose‐response activity. The probability of identifying the presence of dose‐response is estimated as the percentage of simulated trials in which the decision rule concluded that dose‐response activity was present. Under a flat dose‐response scenario, this gives the type I error rate, and under a nonflat dose‐response scenario, it gives the power to correctly identify the dose‐response.

Accuracy of model selection

In several dose‐response curve estimation methods, model selection is done from candidate dose‐response models such as linear, Emax, and sigmoid Emax models. For the accuracy of the model selection, we calculate the percentage of simulated trials in which the dose‐response curve selected in model selection is correct, and call this metric “MS”. Selecting the correct model is important for estimating the dose‐response curve and target dose with small errors.
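As a minimal sketch (our own helper, not code from the paper), the MS metric is simply the fraction of simulated trials whose selected model matches the data-generating one:

```python
def ms_accuracy(selected_models, true_models):
    """MS metric: fraction (reported as a percentage in the paper) of
    simulated trials in which model selection picked the true model."""
    hits = sum(s == t for s, t in zip(selected_models, true_models))
    return hits / len(true_models)
```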

Accuracy of a target dose

In this article, the target dose is defined as the smallest dose that produces an effect difference from placebo greater than or equal to the clinically relevant target effect Δ (minimum effective dose, MED). Here, the target dose d_targ is a continuous value obtained by solving μ(d_targ) − μ(0) = Δ. It should be noted that d_targ varies with the true dose‐response curve. We also consider target effect intervals (ie, within ±10% of the target effect) and their corresponding target dose intervals I_targ. The estimated target dose d̂_targ is also a continuous value and is defined using the estimated dose‐response curve by μ̂(d̂_targ) − μ̂(0) = Δ. We define the accuracy of the estimated target dose as the percentage of simulated trials in which d̂_targ falls within the interval I_targ, and call this metric “TD”. In this study, we evaluate TD without rounding to the nearest integer because we consider that a better TD for the continuous dose also leads to a better TD for the discrete dose.
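Because μ is nondecreasing for all candidate models considered here, the continuous target dose can be found by bisection. A sketch, assuming the target effect Δ = 1.3 used in the simulation settings (recoverable from Table 1); the function name and tolerance are ours:

```python
def target_dose(mu, delta=1.3, d_max=8.0, tol=1e-6):
    """Smallest continuous dose d with mu(d) - mu(0) >= delta (the MED),
    found by bisection; returns None if the effect is never reached."""
    if mu(d_max) - mu(0.0) < delta:
        return None
    lo, hi = 0.0, d_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mu(mid) - mu(0.0) >= delta:
            hi = mid
        else:
            lo = mid
    return hi
```

For Scenario 1 (linear) this returns roughly 6.30 and for Scenario 4 (Emax) roughly 2.0, in line with Table 1.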

Error in a dose‐response curve

Accurate estimation of the dose‐response curve is relevant not only for estimating target doses, but also for appropriate labeling after approval. To evaluate the accuracy of the dose‐response curve estimation, we calculate the mean absolute error (MAE) between the estimated and true dose‐response curves. In actual clinical trials, it is important to determine the effect compared with the placebo group. Therefore, we calculate the MAE after shifting the dose‐response curve so that the effect in the placebo group is zero.
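A sketch of this metric (our own helper; we assume the two curves are compared on a grid of doses):

```python
def shifted_mae(mu_true, mu_hat, doses):
    """Mean absolute error between the true and estimated curves after
    shifting each so that its placebo (dose 0) response is zero."""
    t0, h0 = mu_true(0.0), mu_hat(0.0)
    errors = [abs((mu_true(d) - t0) - (mu_hat(d) - h0)) for d in doses]
    return sum(errors) / len(errors)
```

Note that the shift makes the metric invariant to a constant offset in the estimated placebo response, matching the motivation in the text.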

Deep reinforcement learning

In this section, we describe how to use reinforcement learning to obtain an allocation rule that optimizes the selected metric. To conduct reinforcement learning, the distributions for the dose‐response curve and observation noise must be given to simulate trials. In each simulated trial, a dose‐response curve and responses are probabilistically generated from these distributions. The distributions should reflect the prior beliefs of the clinical team. For example, we can use the candidate models of MCP‐Mod with prespecified probabilities when MCP‐Mod is used to estimate the dose‐response curve. Similarly, we can use the prior distributions of BMA when using BMA. In reinforcement learning, a task is formulated as a Markov decision process (MDP), and an important factor is how to specify the state and reward of the MDP. In an MDP, the state s corresponds to a variable that succinctly describes the information available up to that time point. Now, consider the situation in which the responses of the latest block have been obtained in Step 2 of Section 2.1. In the proposed method, we define s as the vector consisting of the differences of the dose-group means from the placebo mean, the standard deviations, and the proportions of the numbers of subjects allocated, s = (Ȳ_2 − Ȳ_1, ..., Ȳ_K − Ȳ_1, S_1, ..., S_K, N_1/N, ..., N_K/N), where Ȳ_k and S_k are the mean and standard deviation of the responses of the subjects allocated to dose k, and N_k is the number of subjects allocated to dose k up to that time point; thus, the N_k sum to N at the end of the clinical trial. We define the action k to be selected from the K doses. Unlike when we apply the obtained allocation rule, during learning the action k means that all subjects within the bth block receive the same dose k. This is to speed up and stabilize reinforcement learning. Next, we define the reward. For each metric selected from those in Section 2.2, we transformed the value at the end of the trial so that it lies approximately within a fixed bounded range, to allow the default value of the learning-rate hyperparameter in the software to be used.
We write r_x for the reward when the performance metric is x, and define the rewards r_power, r_MS, r_TD, and r_MAE from the corresponding metrics. We define Q^π(s, k) as the expected cumulative reward obtained from state s by allocating the next block to dose k and thereafter following the allocation rule π (see Appendix for the formal definition). The aim of reinforcement learning is to learn the optimal allocation rule π* such that Q^{π*}(s, k) is maximized for each s. When the number of possible values of s is finite and small, it is possible to use the backward induction method; however, this method is not feasible in our case. Instead, we express the allocation rule using a deep neural network (DNN) and obtain π* numerically by reinforcement learning. Several methods have been proposed to learn π*. Here, we use the proximal policy optimization (PPO) method, a type of deep reinforcement learning, owing to its ease of implementation and high performance. In the PPO method, the probability π(k | s) of taking action (in our case, dose) k under state s is represented by a DNN. A DNN with an activation function f and two intermediate layers of J units each can be described as h^(1) = f(W^(1) s + b^(1)), h^(2) = f(W^(2) h^(1) + b^(2)), π(· | s) = softmax(W^(3) h^(2) + b^(3)), where W^(1), b^(1), W^(2), b^(2), W^(3), b^(3) are the parameters of the DNN. We estimate these parameters using reinforcement learning. Specifically, we first initialize the parameters of the DNN appropriately to initialize π. Then, we simulate a clinical trial according to the current rule π and obtain the data of the states and rewards. From these data, the parameters of the DNN are updated based on the gradient to increase the reward. We iteratively simulate trials and update the parameters so that π converges to π*. See Appendix for an overview of the PPO method.
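The policy network just described can be sketched in plain Python. This is a toy re-implementation, not the authors' RLlib model: the random initialization, the small layer width, and the state dimension 3K − 1 = 14 for K = 5 (K − 1 differences from placebo, K standard deviations, K allocation proportions) are our illustrative assumptions; in the actual method the parameters are fit by PPO.

```python
import math
import random

def dense(x, W, b, relu):
    """One fully connected layer; optional ReLU activation."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    return [max(0.0, o) for o in out] if relu else out

def policy(s, params):
    """pi(k | s): two hidden ReLU layers followed by a softmax head."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = dense(s, W1, b1, relu=True)
    h2 = dense(h1, W2, b2, relu=True)
    z = dense(h2, W3, b3, relu=False)
    m = max(z)
    e = [math.exp(v - m) for v in z]   # numerically stable softmax
    total = sum(e)
    return [v / total for v in e]

def init_params(dim_s, K, J=8, scale=0.1, seed=1):
    """Random placeholder parameters (W1, b1, W2, b2, W3, b3)."""
    rnd = random.Random(seed)
    mat = lambda r, c: [[rnd.uniform(-scale, scale) for _ in range(c)]
                        for _ in range(r)]
    return (mat(J, dim_s), [0.0] * J, mat(J, J), [0.0] * J,
            mat(K, J), [0.0] * K)
```

The softmax head guarantees a valid allocation distribution over the K doses for any state.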

SIMULATION STUDY

We conducted a simulation study in a slightly modified version of the settings used by Bornkamp et al and Dragalin et al. We compared the performance of equal allocation, the D‐optimal method, the TD‐optimal method, and the proposed method.

Design of Simulation Study

We assumed a phase II dose‐response study using the MCP‐Mod method, which has been used frequently in actual trials in recent years. Note that it is also possible to use reinforcement learning to directly estimate the dose‐response curve without using MCP‐Mod. Nonetheless, we unified the procedure to use MCP‐Mod for a fair comparison with the existing methods and to evaluate purely the efficiency of the allocation rules. In this trial, five doses (0, 2, 4, 6, and 8 mg) were set, and the total sample size was set to 150 subjects. The clinically relevant target effect was Δ = 1.3. In MCP‐Mod, candidate dose‐response models (curves) with the values of their shape parameters must be prepared before the start of the trial. The candidates in this trial were Scenarios 1, 4, and 7 in Table 1 with equal probabilities (ie, 1/3 for each), and the maximum effect in the dose range was assumed to be 1.65. The response was assumed to be the sum of the dose‐response curve and observation noise following a normal distribution with mean 0 and variance 4.5. In MCP‐Mod, multiple testing at a prespecified significance level is performed on the candidates at the end of the trial. The models that pass the testing are fitted to the data, and the shape parameters are estimated. Then, model selection is performed using a predetermined criterion. Here, the significance level was set to 0.025, and model selection was performed using the Akaike information criterion (AIC). Finally, the performance metrics in Section 2.2 were evaluated using the selected model. Although the performance metrics (except power) are not defined under MCP‐Mod when no model passes the testing, we formally performed model selection using all candidate models and calculated the performance metrics for evaluation purposes.
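For Gaussian responses, AIC-based selection among the fitted candidates can be sketched as follows (our own helper, not from the paper; we use the residual-sum-of-squares form of the AIC up to an additive constant, with hypothetical fit results):

```python
import math

def aic_select(fits, n):
    """Pick the candidate model with the smallest AIC.

    fits : dict model name -> (rss, n_params) from least-squares fits.
    n    : number of observations.
    AIC is computed as n * log(rss / n) + 2 * n_params (constant dropped).
    """
    aic = lambda rss, p: n * math.log(rss / n) + 2 * p
    return min(fits, key=lambda name: aic(*fits[name]))
```

With hypothetical fits, a model with a slightly smaller residual sum of squares can win despite its extra parameter.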
TABLE 1

Dose‐response scenarios

| Scenario no. | Model | Max effect | Formula | d_targ | I_targ |
|---|---|---|---|---|---|
| 1 | linear | 1.65 | μ(d) = (1.65/8)d | 6.30 | (5.67, 6.93) |
| 2 | linear | 1.65×0.8 | μ(d) = (1.32/8)d | 7.88 | (7.09, 8.00) |
| 3 | linear | 1.65×1.2 | μ(d) = (1.98/8)d | 5.25 | (4.73, 5.78) |
| 4 | Emax | 1.65 | μ(d) = 1.81d/(0.79 + d) | 2.00 | (1.44, 2.95) |
| 5 | Emax | 1.65×0.8 | μ(d) = 1.45d/(0.79 + d) | 6.83 | (3.30, 8.00) |
| 6 | Emax | 1.65×1.2 | μ(d) = 2.18d/(0.79 + d) | 1.17 | (0.92, 1.52) |
| 7 | sigEmax | 1.65 | μ(d) = 1.70d^5/(4^5 + d^5) | 5.06 | (4.68, 5.58) |
| 8 | sigEmax | 1.65×0.8 | μ(d) = 1.36d^5/(4^5 + d^5) | 7.37 | (5.75, 8.00) |
| 9 | sigEmax | 1.65×1.2 | μ(d) = 2.04d^5/(4^5 + d^5) | 4.47 | (4.24, 4.74) |
| 10 | quadratic | 1.65 | μ(d) = (1.65/3)d − (1.65/36)d^2 | 3.24 | (2.76, 3.81) |
| 11 | quadratic | 1.65×0.8 | μ(d) = (1.32/3)d − (1.32/36)d^2 | 5.26 | (3.98, 8.00) |
| 12 | quadratic | 1.65×1.2 | μ(d) = (1.98/3)d − (1.98/36)d^2 | 2.48 | (2.16, 2.84) |
| 13 | exponential | 1.65 | μ(d) = 0.00055(exp(d) − 1) | 7.76 | (7.66, 7.86) |
| 14 | exponential | 1.65×0.8 | μ(d) = 0.00044(exp(d) − 1) | 7.98 | (7.88, 8.00) |
| 15 | exponential | 1.65×1.2 | μ(d) = 0.00066(exp(d) − 1) | 7.58 | (7.47, 7.67) |
| 16 | flat | 0 | μ(d) = 0 | | |

Note: If the upper limit of I_targ did not exist or was greater than 8 (the maximum dose), the upper limit was set to 8.

For each of the 16 scenarios in Table 1, 10 000 simulated trials were used to estimate the mean of the performance metrics. Scenarios 2, 3, 5, 6, 8, and 9 represent scenarios where the effect was smaller or larger than that of the candidates, and Scenarios 10 to 15 represent scenarios where the true model was not included in the candidates. These scenarios were set up to verify the robustness of the allocation rule obtained by the proposed method. Scenario 16 was used to evaluate the type I error rate. These scenarios are illustrated in Figure 1.
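The tabulated target doses are consistent with a target effect of 1.3 over placebo; for example, the linear and Emax scenarios can be inverted in closed form (a quick consistency check of ours, not part of the original article):

```python
# d_targ solves mu(d_targ) - mu(0) = 1.3 for each scenario (mu(0) = 0 here).
d_linear = 8 * 1.3 / 1.65              # Scenario 1: (1.65/8) * d = 1.3
d_emax = 1.3 * 0.79 / (1.81 - 1.3)     # Scenario 4: 1.81*d/(0.79+d) = 1.3
```

Both agree with the Table 1 entries (6.30 and 2.00) to within the rounding of the tabulated model constants.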
FIGURE 1

Dose‐response scenarios


Allocation Rule

We used the following nine allocation rules: equal, D-opt, aD-opt, TD-opt, aTD-opt, RL-power, RL-MS, RL-TD, and RL-MAE. Rule names beginning with RL indicate the objective used in reinforcement learning, as distinct from the evaluated performance metrics (the original article sets rule names in a sans-serif font for this purpose). The details of the nine allocation rules are described below.

equal: At the beginning of the trial, 150 subjects were equally allocated to the five doses (30 subjects per dose). This rule is easy to understand and is the most frequently used in actual clinical trials.

D-opt: At the beginning of the trial, the allocation ratios w were calculated based on the D-optimal method to minimize −Σ_m (p_m/q_m) log det M_m(w), (1) where m is the index of each candidate model, p_m is the prior probability of model m (here, 1/3 for each m), q_m is the number of parameters of model m, and M_m is the Fisher information matrix under model m. The calculated allocation ratios for the five dose groups were 0.30, 0.20, 0.12, 0.09, and 0.29, respectively. These ratios were rounded to integer numbers of subjects using the method of Pukelsheim and Rieder, giving 45, 30, 18, 14, and 43 subjects for the respective doses.

aD-opt: The subjects were adaptively allocated based on the D-optimal method. More specifically, at the beginning of the trial, 50 subjects were equally allocated to the five doses. Then, after obtaining their responses, we determined the allocation ratios that minimized Equation (1), given the number of already allocated subjects and the number of subjects in the next block (ie, by using the corresponding options of the optDesign function in the DoseFinding package of R). Here, 10 subjects were allocated in the next block. The model probabilities p_m were set to 1/3 before the trial and were updated for each block according to Section 5 of Miller et al. The shape parameters were not updated and were fixed to those of the candidates (ie, Scenarios 1, 4, and 7). The calculated ratios were rounded to integer values using the method of Pukelsheim and Rieder. Then, the responses of the 10 allocated subjects were obtained, and the allocation ratios were calculated again to allocate the next 10 subjects. This was repeated until the total number of subjects reached 150.
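The rounding step can be sketched as follows. This is our reading of the efficient design apportionment of Pukelsheim and Rieder (start from ceilings of (N − l/2)·w_i over the l support doses, then adjust one subject at a time), so treat the details as an assumption rather than the authors' exact code:

```python
import math

def efficient_round(weights, N):
    """Round allocation ratios to integer group sizes summing to N."""
    l = len(weights)
    n = [math.ceil((N - l / 2) * w) for w in weights]
    while sum(n) > N:   # remove from the most over-represented dose
        i = max(range(l), key=lambda j: (n[j] - 1) / weights[j])
        n[i] -= 1
    while sum(n) < N:   # add to the most under-represented dose
        i = min(range(l), key=lambda j: n[j] / weights[j])
        n[i] += 1
    return n
```

For the D-optimal ratios above, `efficient_round([0.30, 0.20, 0.12, 0.09, 0.29], 150)` yields group sizes of 45, 30, 18, 14, and 43.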
TD-opt: At the beginning of the trial, the allocation ratios w were calculated based on the TD-optimal method to minimize Σ_m p_m log V_m(w), (2) where m is the index of each candidate model, p_m is the probability of model m (here, 1/3 for each m), and V_m(w) is proportional to the asymptotic variance of the estimated target dose under model m. The calculated allocation ratios were 0.31, 0.26, 0.12, 0.18, and 0.14, respectively, and the subjects were allocated to the five doses according to these ratios after rounding.

aTD-opt: The subjects were adaptively allocated based on the TD-optimal method. The procedure was the same as that of aD-opt, except that the objective function was Equation (2) instead of Equation (1).

RL-power, RL-MS, RL-TD, and RL-MAE

Because the procedures for constructing these rules are similar, RL-MAE is explained as an example. We simulated clinical trials in reinforcement learning using the settings in Sections 2 and 3.1 and learned the allocation rule. In each simulated trial, the dose‐response curve was determined uniformly at random from the scenarios considered in MCP‐Mod (ie, Scenarios 1, 4, and 7), and the observation noise was generated from a normal distribution with mean 0 and variance 4.5. The first block consisted of 50 subjects, and each subsequent block consisted of 10 subjects. In addition, we used ReLU (f(x) = max(0, x)) as the activation function and a DNN consisting of two intermediate layers with 256 units each. The settings of the DNN were the default values of the software. After each simulated trial, the MAE was evaluated. After every 1000 simulated trials, the allocation rule was updated using the accumulated data of the states and MAEs. With 1 000 000 simulated trials in reinforcement learning, the allocation rule RL-MAE was obtained. See Appendix for details on the hyperparameters of the PPO method. At the beginning of the trial, 50 subjects were allocated equally to the five doses. Thereafter, each time the responses were obtained, each of the next 10 subjects was probabilistically allocated to one of the five doses according to the discrete distribution π(· | s). This was repeated until the total number of subjects reached 150. RL-power, RL-MS, and RL-TD were the same as RL-MAE, except that the metrics to be optimized were power, MS, and TD in Section 2.2, respectively. In general, it is known that using p‐values without accounting for adaptive allocation may inflate the type I error rate, and a simulation‐based method to control the type I error rate has been discussed previously. Here, we first calculated the p‐values for the flat scenario and then adjusted the significance-level threshold based on the distribution of those p‐values. Then, using the adjusted significance level, we simulated the other scenarios and evaluated the performance metrics.
For deep reinforcement learning, we used the RLlib library in Python and for the MCP‐Mod, D‐optimal, and TD‐optimal methods, we used the DoseFinding package in R. The code with hyperparameters is available in Supplementary Material, which can be modified according to the requirement.

Results

In this section, the means of the performance metrics obtained from 10 000 simulations for each allocation rule are presented. The results for the type I error rate are shown in Figure 2. Figure 2A shows the type I error rate of each rule when the significance level was 0.025. Note that this significance level was based on MCP‐Mod, and the type I error rate is not theoretically guaranteed for adaptive allocation rules. Figure 2B shows the type I error rates of the proposed methods at various significance levels. From these results, we adjusted the significance levels of the four RL rules to 0.0235, 0.024, 0.021, and 0.0165 to control the type I error rate. We continued to use a significance level of 0.025 for the other rules, assuming that fluctuations around the 2.5% level were consistent with the Monte Carlo error and that the type I error rates were under control. Using these adjusted significance levels, we evaluated the performance metrics for the other scenarios.
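The adjustment can be done empirically: simulate the flat scenario many times and take the significance threshold at which the observed rejection rate equals the nominal 2.5%. A sketch under our reading of this calibration (the function name and indexing convention are ours):

```python
def adjusted_alpha(null_pvalues, target=0.025):
    """Threshold t such that about a `target` fraction of the p-values
    simulated under the flat (null) scenario falls at or below t."""
    ps = sorted(null_pvalues)
    idx = max(0, int(target * len(ps)) - 1)
    return ps[idx]
```

If adaptive allocation inflates small p-values, the calibrated threshold falls below 0.025, mirroring the adjusted levels reported above.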
FIGURE 2

The results for the type I error rate before adjustment

The results of the performance metrics (ie, power, MS, TD, and MAE) were similar for four of the models (linear, Emax, sigEmax, and quadratic), whereas the results differed for the exponential model. Here, the average results over all the models are shown; for the results of each model, see Figures 1 to 4 in Supplementary Material. The results for power are shown in Figure 3. RL-power clearly improved power. In contrast, two of the other proposed rules worsened the power; the lower average power of one of these rules may be due to its much lower power when the exponential model was true (see Supplementary Material). Notably, RL-power had high power even when the true maximum effect was smaller or larger than that of the candidates, even though it was trained assuming a maximum effect of 1.65.
FIGURE 3

The results for power. The vertical dotted line represents the value of the equal allocation

The results for MS, TD, and MAE (Figures 4‐6) were calculated from the simulations in which the multiple testing was significant. We confirmed that the results were almost the same even if we included the simulations in which the testing was not significant. The results for the MS are shown in Figure 4. RL-MS clearly improved the MS and was also effective in scenarios different from the candidates; in contrast, one of the other rules worsened the MS. The results for the TD are shown in Figure 5. Better results were obtained when the true maximum effect was smaller than that of the candidates, which may be because of the wider range of I_targ. RL-TD and one other proposed rule improved the TD; notably, these rules were better than the TD-opt and aTD-opt rules. In contrast, one of the other rules worsened the TD. The results for the MAE are shown in Figure 6. RL-MAE improved the MAE and was also effective in scenarios different from the candidates; in contrast, one of the other rules worsened the MAE.
FIGURE 4

Probability of selecting the true model

FIGURE 5

Probability that the estimated target dose is within the interval

FIGURE 6

The results for MAE. Smaller MAE implies better accuracy

In summary, these results showed that the proposed method improves not only the performance metric used for optimization but also many other metrics. In particular, RL-MAE was superior in most metrics for correctly estimating the dose‐response relationship for phase III trials. The average number of subjects allocated to each dose is shown in Figure 7. This figure shows that the proposed methods tended to allocate more subjects to 0 mg than equal allocation did. In addition, some of the proposed rules tended to allocate more subjects to 2 mg. Since the allocation that optimizes power for the contrast test (assuming the same variance across the dose groups) places half of the subjects on placebo and the other half on the dose providing the maximum effect, it is natural that RL-power tended to allocate more subjects to 0 and 8 mg. Since 0, 2, and 8 mg are likely to be important in distinguishing the flat and Emax models from the rest, it is also natural that more subjects were allocated to these doses. For the results of each model, see Figures 5 to 7 in Supplementary Material.
FIGURE 7

The results for the average number of subjects allocated

Note that the good performance of RL-MAE was due not only to the nonuniform allocation, but also to the adaptivity of the allocation. In fact, we confirmed that the performance does not improve if we use a fixed design with per-dose sample sizes equal to the averages for RL-MAE in Figure 7. See Supplementary Material for details.

DISCUSSION

We showed that deep reinforcement learning with an appropriately defined state and reward can be used to construct adaptive allocation rules that directly optimize the performance metric of interest. In general, reinforcement learning becomes difficult when the reward (ie, the performance metric evaluated at the end of each trial) is delayed and the observations are noisy. Phase II trials have both difficulties, and it is not obvious whether reinforcement learning can address them successfully. Nonetheless, we have shown that it can work well if we appropriately design the Markov decision process and choose the learning algorithm and hyperparameters. A limitation of this method is that it is difficult to visualize and understand the obtained allocation rule intuitively because the state is multidimensional. An allocation example of one of the proposed rules in a single simulated trial is shown in Figure 8 in Supplementary Material. Interactive software such as a Shiny application may help team members understand the rule. In a clinical trial protocol, it would be necessary to specify the assumptions (state, action, and reward) and the selected performance metric, and it would be helpful to show allocation examples. In the definition of the state, we used the differences from the placebo mean (eg, Ȳ_k − Ȳ_1) to avoid making assumptions about the placebo response. When a specific prior distribution reflecting background knowledge on the placebo response is available, it is also natural to define the state using the dose-group means themselves. We also simulated the proposed methods with slightly modified states, rewards, and model probabilities, which in general yielded similar results. Nonetheless, it may be possible to improve the performance slightly by further tuning these settings. Although the simulation study was conducted assuming Gaussian noise and the MCP‐Mod method, the proposed method can also be applied to other settings (eg, binary responses) and other methods (eg, ANOVA and BMA).
For example, if the variance of the observation noise is unknown, we can place a prior distribution on the variance and generate it in reinforcement learning. Because the settings considered here are quite standard in practice, we expect that the proposed method can cover a wide range of actual clinical trials. The results showed that the proposed methods required an adjustment of the significance level to control the type I error rate. Therefore, developing a statistical test that is theoretically guaranteed under adaptive allocation is an important research topic. The optimization of power (RL-power) did not necessarily lead to improvements in the other performance metrics. On the other hand, RL-MAE showed good results not only for the MAE but also for the other metrics. This intuitively corresponds to the fact that if the dose‐response curve itself is estimated with a small error, then the other purposes are also achievable. For this reason, it seems natural to use RL-MAE when the focus is not on any particular metric. Note that it is theoretically best to allocate all subjects to 0 and 8 mg to maximize the power under the scenarios used in the learning. RL-power indeed allocated many subjects to 0 and 8 mg, but a gap from this ideal allocation remains, which may be due to incomplete learning. Therefore, further tuning of the hyperparameters and the neural network may improve the performance. The results described in Supplementary Material showed that some of the proposed rules performed poorly when the exponential model was true. This indicates that if the candidate models differ considerably from the true model, the allocation rules obtained from the learning may not perform well. It may therefore be important to specify a distribution that generates many possible models in reinforcement learning. In fact, by including the exponential model in the learning, we obtained good performance without sacrificing the performance for the other models (see Supplementary Material).
When the proposed method is used, the numbers of subjects allocated to the doses can be unbalanced. When this imbalance must be limited for safety or ethical reasons, the number of subjects allocated equally at the beginning of the trial can be increased, or a penalty can be incorporated into the reward if the number of subjects at a dose does not reach a threshold. Although we allocated subjects probabilistically according to the discrete distribution when applying the obtained rule, we can also use a rounding method such as the one reported by Pukelsheim and Rieder; the results were generally similar. Although we used one performance metric for optimization, any metric can be used, including a combination of multiple metrics, because the method does not depend on specific properties of the performance metric. Since many factors other than dose‐response are involved in actual phase III trials, it is also important to develop an appropriate performance metric for the success of phase III trials. Furthermore, we believe that it is possible to extend this approach to stopping rules for success or futility by adding a stopping option as one of the actions and defining an appropriate reward for that option. It remains to be verified in which situations this will apply.
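The penalty idea can be sketched directly on the reward (our own illustration; the threshold and penalty weight are arbitrary tuning constants, not values from the paper):

```python
def penalized_reward(base_reward, counts, min_per_dose=10, penalty=0.5):
    """Subtract a penalty for each dose whose final number of allocated
    subjects falls below a prespecified safety threshold."""
    shortfalls = sum(1 for c in counts if c < min_per_dose)
    return base_reward - penalty * shortfalls
```

During learning, such a penalty discourages the policy from starving any dose group, at the cost of some loss in the primary metric.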
References (11 in total)

1.  Combining multiple comparisons and modeling techniques in dose-response studies.

Authors:  F Bretz; J C Pinheiro; M Branson
Journal:  Biometrics       Date:  2005-09       Impact factor: 2.571

2.  Optimal designs for estimating the interesting part of a dose-effect curve.

Authors:  Frank Miller; Olivier Guilbaud; Holger Dette
Journal:  J Biopharm Stat       Date:  2007       Impact factor: 1.051

3.  Innovative approaches for designing and analyzing adaptive dose-ranging trials.

Authors:  Björn Bornkamp; Frank Bretz; Alex Dmitrienko; Greg Enas; Brenda Gaydos; Chyi-Hung Hsu; Franz König; Michael Krams; Qing Liu; Beat Neuenschwander; Tom Parke; José Pinheiro; Amit Roy; Rick Sax; Frank Shen
Journal:  J Biopharm Stat       Date:  2007       Impact factor: 1.051

Review 4.  Dose finding - a challenge in statistics.

Authors:  Frank Bretz; Jason Hsu; José Pinheiro; Yi Liu
Journal:  Biom J       Date:  2008-08       Impact factor: 2.207

5.  Characterization of dose-response for count data using a generalized MCP-Mod approach in an adaptive dose-ranging trial.

Authors:  Francois Mercier; Bjoern Bornkamp; David Ohlssen; Erik Wallstroem
Journal:  Pharm Stat       Date:  2015-06-17       Impact factor: 1.894

6.  BMA-Mod: A Bayesian model averaging strategy for determining dose-response relationships in the presence of model uncertainty.

Authors:  A Lawrence Gould
Journal:  Biom J       Date:  2018-12-19       Impact factor: 2.207

7.  A flexible Bayesian approach for modeling monotonic dose-response relationships in drug development trials.

Authors:  David Ohlssen; Amy Racine
Journal:  J Biopharm Stat       Date:  2015       Impact factor: 1.051

Review 8.  Design optimization for dose-finding trials: a review.

Authors:  Jihane Aouni; Jean Noel Bacro; Gwladys Toulemonde; Pierre Colin; Loic Darchy; Bernard Sebastien
Journal:  J Biopharm Stat       Date:  2020-03-17       Impact factor: 1.051

9.  Human-level control through deep reinforcement learning.

Authors:  Volodymyr Mnih; Koray Kavukcuoglu; David Silver; Andrei A Rusu; Joel Veness; Marc G Bellemare; Alex Graves; Martin Riedmiller; Andreas K Fidjeland; Georg Ostrovski; Stig Petersen; Charles Beattie; Amir Sadik; Ioannis Antonoglou; Helen King; Dharshan Kumaran; Daan Wierstra; Shane Legg; Demis Hassabis
Journal:  Nature       Date:  2015-02-26       Impact factor: 49.962

