John Whitehead, Yasin Desai, Thomas Jaki.
Abstract
When a clinical trial is subject to a series of interim analyses as a result of which the study may be terminated or modified, final frequentist analyses need to take account of the design used. Failure to do so may result in overstated levels of significance, biased effect estimates and confidence intervals with inadequate coverage probabilities. A wide variety of valid methods of frequentist analysis have been devised for sequential designs comparing a single experimental treatment with a single control treatment. It is less clear how to perform the final analysis of a sequential or adaptive design applied in a more complex setting, for example, to determine which treatment or set of treatments amongst several candidates should be recommended. This article has been motivated by consideration of a trial in which four treatments for sepsis are to be compared, with interim analyses allowing the dropping of treatments or termination of the trial to declare a single winner or to conclude that there is little difference between the treatments that remain. The approach taken is based on the method of Rao-Blackwellization, which enhances the accuracy of unbiased estimates available from the first interim analysis by taking their conditional expectations given final sufficient statistics. Analytic approaches to determine such expectations are difficult and specific to the details of the design; instead, "reverse simulations" are conducted to construct replicate realizations of the first interim analysis from the final test statistics. The method also provides approximate confidence intervals for the differences between treatments.
Keywords: Rao-Blackwellization; adaptive designs; estimating treatment effects; multiarm trials; sequential designs
Year: 2020 PMID: 32207166 PMCID: PMC7217198 DOI: 10.1002/sim.8497
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
Figure 1. The elimination and stopping rule for a single pair of treatments
Properties of the four treatment design from million‐fold simulations
| Case | p1 | p2 | p3 | p4 | Average sample size | win1 | elim4 | nod | still |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.500 | 0.400 | 0.400 | 0.400 | 1426 | 0.819 | 0.920 | 0.045 | 0.000 |
| 2 | 0.600 | 0.500 | 0.500 | 0.500 | 1427 | 0.819 | 0.919 | 0.044 | 0.000 |
| 3 | 0.692 | 0.600 | 0.600 | 0.600 | 1537 | 0.816 | 0.916 | 0.043 | 0.004 |
| 4 | 0.771 | 0.692 | 0.692 | 0.692 | 1765 | 0.802 | 0.902 | 0.039 | 0.039 |
| Mixed Case I (Cases 1‐4) | | | | | 1531 | 0.819 | 0.918 | 0.043 | 0.004 |
| 5 | 0.500 | 0.500 | 0.400 | 0.400 | 1389 | 0.025 | 0.975 | 0.901 | 0.000 |
| 6 | 0.600 | 0.600 | 0.500 | 0.500 | 1411 | 0.025 | 0.975 | 0.903 | 0.000 |
| 7 | 0.692 | 0.692 | 0.600 | 0.600 | 1540 | 0.026 | 0.974 | 0.901 | 0.002 |
| 8 | 0.771 | 0.771 | 0.692 | 0.692 | 1803 | 0.026 | 0.966 | 0.885 | 0.024 |
| Mixed Case II (Cases 5‐8) | | | | | 1524 | 0.026 | 0.975 | 0.901 | 0.001 |
| 9 | 0.500 | 0.500 | 0.500 | 0.400 | 1540 | 0.005 | 0.988 | 0.861 | 0.000 |
| 10 | 0.600 | 0.600 | 0.600 | 0.500 | 1583 | 0.005 | 0.988 | 0.861 | 0.000 |
| 11 | 0.692 | 0.692 | 0.692 | 0.600 | 1752 | 0.005 | 0.987 | 0.857 | 0.003 |
| 12 | 0.771 | 0.771 | 0.771 | 0.692 | 2066 | 0.005 | 0.975 | 0.814 | 0.057 |
| Mixed Case III (Cases 9‐12) | | | | | 1722 | 0.005 | 0.987 | 0.857 | 0.003 |
| 13 | 0.500 | 0.500 | 0.500 | 0.500 | 1795 | 0.002 | 0.066 | 0.785 | 0.001 |
| 14 | 0.600 | 0.600 | 0.600 | 0.600 | 1862 | 0.002 | 0.066 | 0.782 | 0.005 |
| 15 | 0.692 | 0.692 | 0.692 | 0.692 | 2071 | 0.002 | 0.064 | 0.748 | 0.053 |
| 16 | 0.771 | 0.771 | 0.771 | 0.771 | 2381 | 0.001 | 0.056 | 0.591 | 0.266 |
| Mixed Case IV (Cases 13‐16) | | | | | 2028 | 0.002 | 0.066 | 0.757 | 0.036 |
Note: win1 = proportion of runs in which T1 wins; elim4 = proportion of runs in which T4 is eliminated; nod = proportion of runs in which: for Cases 1‐8 and Mixed Cases I‐II, T1 and T2 are declared no different from one another; for Cases 9‐12 and Mixed Case III, T1, T2 and T3 are declared no different from one another; for Cases 13‐16 and Mixed Case IV, all treatments are declared no different from one another; still = proportion of runs in which not all treatment comparisons are resolved after 2772 responses.
Details of 12 realizations of the triangular design and of two simple forms of analysis
| Case | int* | n* | S1* | S2* | Z* | V* | b* | Naïve p-value | Naïve estimate of θ | Naïve θL | Naïve θU | Orderings p-value | Orderings estimate of θ | Orderings θL | Orderings θU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 144 | 35 | 59 | −12.0 | 8.160 | 0 | 1.000 | −1.471 | −2.157 | −0.784 | 1.000 | −1.470 | −2.156 | −0.783 |
| 2 | 3 | 216 | 68 | 87 | −9.5 | 10.943 | 0 | 0.998 | −0.868 | −1.461 | −0.276 | 0.997 | −0.857 | −1.454 | −0.256 |
| 3 | 4 | 288 | 102 | 118 | −8.0 | 12.986 | 0 | 0.987 | −0.616 | −1.160 | −0.072 | 0.983 | −0.599 | −1.149 | −0.044 |
| 4 | 10 | 720 | 284 | 285 | −0.5 | 29.833 | 0 | 0.537 | −0.017 | −0.376 | 0.342 | 0.485 | 0.007 | −0.358 | 0.378 |
| 5 | 8 | 576 | 201 | 201 | 0.0 | 30.359 | 0 | 0.500 | 0.000 | −0.356 | 0.356 | 0.464 | 0.017 | −0.344 | 0.382 |
| 6 | 13 | 936 | 275 | 259 | 8.0 | 57.337 | 0 | 0.144 | 0.140 | −0.119 | 0.398 | 0.089 | 0.187 | −0.084 | 0.468 |
| 7 | 9 | 648 | 252 | 222 | 15.0 | 31.819 | 1 | 0.004 | 0.471 | 0.124 | 0.819 | 0.007 | 0.454 | 0.097 | 0.807 |
| 8 | 6 | 432 | 120 | 88 | 16.0 | 26.963 | 1 | 0.001 | 0.593 | 0.216 | 0.971 | 0.003 | 0.563 | 0.168 | 0.949 |
| 9 | 6 | 432 | 161 | 130 | 15.5 | 23.745 | 1 | 0.001 | 0.653 | 0.251 | 1.055 | 0.002 | 0.623 | 0.205 | 1.034 |
| 10 | 5 | 360 | 135 | 108 | 13.5 | 19.744 | 1 | 0.001 | 0.684 | 0.243 | 1.125 | 0.002 | 0.676 | 0.231 | 1.120 |
| 11 | 5 | 360 | 124 | 92 | 16.0 | 21.600 | 1 | 0.000 | 0.741 | 0.319 | 1.162 | 0.001 | 0.704 | 0.260 | 1.137 |
| 12 | 3 | 216 | 82 | 55 | 13.5 | 12.527 | 1 | 0.000 | 1.078 | 0.524 | 1.631 | 0.000 | 1.075 | 0.519 | 1.629 |
Note: Terminal values of the number of interim analyses, total sample size, the numbers of successes on T1 and T2, and of the statistics Z and V are shown as int*, n*, S1*, S2*, Z*, and V*, respectively. Patients are evenly divided between the two treatments so that n1* = n2* = ½n*. b* denotes the boundary crossed, with 0 denoting the lower boundary and 1 the upper boundary. For the naïve analysis, the estimated value of θ is θ̂ = Z*/V* with 95% confidence interval (θL, θU) = (θ̂ ± 1.96/√V*). The orderings analysis is based on the ordering of Fairbanks and Madsen [21] and computed following [19, 20].
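The naïve columns of this table can be reproduced from the terminal statistics alone. The following is a minimal Python sketch, assuming only the relations given in the note (estimate Z*/V*, 95% limits at ± 1.96/√V*); the function name `naive_analysis` is illustrative and does not come from the article.

```python
import math

def naive_analysis(z, v, crit=1.96):
    """Naive estimate of theta and 95% confidence limits from terminal Z and V.

    The sequential nature of the design is ignored, as in the naive analysis.
    `crit` is the normal critical value (1.96 for a two-sided 95% interval).
    """
    estimate = z / v
    half_width = crit / math.sqrt(v)
    return estimate, estimate - half_width, estimate + half_width

# Case 1 of the table: Z* = -12.0, V* = 8.160
est, lower, upper = naive_analysis(-12.0, 8.160)
print(f"{est:.3f} ({lower:.3f}, {upper:.3f})")
```

For Case 1 the result agrees with the tabulated naïve estimate and limits to three decimal places.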
Analyses of the 12 realizations of the triangular design based on Rao‐Blackwellization
| Case | RB1 estimate of θ | RB1 SE | RB1 θL | RB1 θU | RB2 % Complete | RB2 estimate of θ | RB2 SE | RB2 θL | RB2 θU |
|---|---|---|---|---|---|---|---|---|---|
| 1 | −1.463 | 0.360 | −2.169 | −0.757 | 99.3 | −1.473 | 0.383 | −2.225 | −0.722 |
| 2 | −0.823 | 0.325 | −1.461 | −0.185 | 89.3 | −0.834 | 0.334 | −1.488 | −0.180 |
| 3 | −0.560 | 0.298 | −1.145 | 0.025 | 79.9 | −0.567 | 0.295 | −1.145 | 0.010 |
| 4 | 0.046 | 0.204 | −0.354 | 0.447 | 55.7 | 0.046 | 0.158 | −0.263 | 0.356 |
| 5 | 0.051 | 0.201 | −0.342 | 0.445 | 67.0 | 0.052 | 0.183 | −0.307 | 0.411 |
| 6 | 0.224 | 0.166 | −0.101 | 0.549 | 17.0 | 0.227 | 0.158 | −0.081 | 0.536 |
| 7 | 0.420 | 0.197 | 0.033 | 0.806 | 63.7 | 0.424 | 0.185 | 0.062 | 0.787 |
| 8 | 0.519 | 0.214 | 0.100 | 0.939 | 56.0 | 0.529 | 0.213 | 0.110 | 0.947 |
| 9 | 0.580 | 0.226 | 0.136 | 1.024 | 54.9 | 0.584 | 0.229 | 0.135 | 1.033 |
| 10 | 0.653 | 0.239 | 0.184 | 1.122 | 85.7 | 0.658 | 0.245 | 0.179 | 1.138 |
| 11 | 0.655 | 0.238 | 0.188 | 1.122 | 58.5 | 0.671 | 0.243 | 0.195 | 1.147 |
| 12 | 1.059 | 0.291 | 0.490 | 1.629 | 95.8 | 1.069 | 0.312 | 0.457 | 1.680 |
Figure 2. Estimates and 95% confidence limits for θ from the Rao‐Blackwellization approaches, the orderings analysis, and the naïve approach (with the naïve estimate subtracted), plotted against the naïve estimate for Cases 1 to 12
Evaluation of the naïve and the Rao‐Blackwellization methods based on 1000‐fold simulations
| | Naïve | Naïve | Naïve | Method RB1 | Method RB1 | Method RB1 | Method RB2 | Method RB2 | Method RB2 |
|---|---|---|---|---|---|---|---|---|---|
| True value of θ | 0 | 0.246 | 0.405 | 0 | 0.246 | 0.405 | 0 | 0.246 | 0.405 |
| Estimate of θ | −0.069 | 0.244 | 0.459 | −0.001 | 0.248 | 0.410 | −0.006 | 0.246 | 0.408 |
| SD | 0.209 | 0.227 | 0.213 | 0.213 | 0.182 | 0.203 | 0.233 | 0.187 | 0.196 |
| SE | 0.184 | 0.154 | 0.169 | 0.209 | 0.184 | 0.197 | 0.201 | 0.175 | 0.190 |
| θL | −0.430 | −0.058 | 0.128 | −0.408 | −0.113 | 0.025 | −0.399 | −0.096 | 0.034 |
| θU | 0.293 | 0.546 | 0.790 | 0.410 | 0.609 | 0.795 | 0.388 | 0.589 | 0.781 |
| Probability that θ ∈ (θL, θU) | 0.943 | 0.932 | 0.920 | 0.976 | 0.976 | 0.972 | 0.958 | 0.967 | 0.971 |
Raw data from a single simulation of the four treatment design
| Treatment | Interim | Centre | Final sample size | Final number of successes | Sample size at each interim | Number of successes at each interim |
|---|---|---|---|---|---|---|
| 1 | 12 | 1 | 103 | 83 | 11, 18, 30, 41, 50, 57, 65, 76, 86, 92, 98, 103 | 10, 17, 27, 35, 41, 46, 53, 63, 69, 74, 78, 83 |
| | | 2 | 100 | 67 | 10, 16, 25, 33, 41, 49, 60, 71, 82, 88, 96, 100 | 10, 14, 20, 25, 30, 34, 40, 47, 58, 61, 65, 67 |
| | | 3 | 104 | 64 | 7, 17, 25, 35, 44, 55, 63, 68, 72, 83, 90, 104 | 6, 11, 16, 20, 26, 32, 36, 41, 43, 49, 55, 64 |
| | | 4 | 125 | 68 | 8, 21, 28, 35, 45, 55, 64, 73, 84, 97, 112, 125 | 4, 13, 15, 20, 27, 34, 38, 45, 48, 53, 62, 68 |
| 2 | 4 | 1 | 39 | 25 | 12, 24, 31, 39 | 9, 17, 19, 25 |
| | | 2 | 30 | 13 | 6, 13, 25, 30 | 4, 8, 12, 13 |
| | | 3 | 35 | 21 | 7, 16, 22, 35 | 5, 11, 15, 21 |
| | | 4 | 40 | 11 | 11, 19, 30, 40 | 1, 5, 8, 11 |
| 3 | 12 | 1 | 111 | 85 | 9, 19, 29, 39, 48, 57, 67, 74, 85, 91, 102, 111 | 8, 15, 21, 27, 33, 41, 49, 56, 65, 70, 79, 85 |
| | | 2 | 94 | 56 | 7, 15, 24, 32, 40, 49, 57, 64, 72, 79, 88, 94 | 5, 9, 15, 22, 28, 31, 33, 38, 44, 47, 52, 56 |
| | | 3 | 111 | 60 | 9, 17, 25, 32, 42, 50, 58, 68, 76, 90, 101, 111 | 3, 5, 8, 13, 21, 27, 31, 37, 41, 48, 55, 60 |
| | | 4 | 116 | 45 | 11, 21, 30, 41, 50, 60, 70, 82, 91, 100, 105, 116 | 4, 7, 12, 15, 18, 23, 26, 34, 37, 42, 44, 45 |
| 4 | 5 | 1 | 50 | 32 | 9, 15, 23, 36, 50 | 5, 11, 17, 24, 32 |
| | | 2 | 47 | 27 | 9, 20, 32, 42, 47 | 6, 11, 16, 24, 27 |
| | | 3 | 40 | 18 | 11, 19, 28, 32, 40 | 5, 8, 12, 14, 18 |
| | | 4 | 43 | 16 | 7, 18, 25, 34, 43 | 3, 9, 10, 13, 16 |
Comparative data derived from Table 5
| Comparison | Interim | Centre | Z | V | θ̂ = Z/V | Conclusion |
|---|---|---|---|---|---|---|
| T1 vs T2 | 4 | 1 | 4.25 | 3.75 | 1.133 | T1 knocks out T2 at fourth interim |
| | | 2 | 5.10 | 3.76 | 1.356 | |
| | | 3 | −0.50 | 4.25 | −0.118 | |
| | | 4 | 5.53 | 4.53 | 1.221 | |
| | | Total | 14.38 | 16.28 | 0.883 | |
| T1 vs T3 | 12 | 1 | 2.14 | 9.02 | 0.237 | T1 knocks out T3 at 12th interim |
| | | 2 | 3.60 | 11.24 | 0.320 | |
| | | 3 | 4.02 | 13.11 | 0.307 | |
| | | 4 | 9.39 | 14.98 | 0.627 | |
| | | Total | 19.15 | 48.35 | 0.396 | |
| T1 vs T4 | 5 | 1 | 4.50 | 4.93 | 0.913 | T1 knocks out T4 at fifth interim |
| | | 2 | 3.44 | 5.00 | 0.688 | |
| | | 3 | 2.95 | 5.23 | 0.564 | |
| | | 4 | 5.01 | 5.49 | 0.912 | |
| | | Total | 15.91 | 20.64 | 0.771 | |
| T2 vs T3 | 4 | 1 | −1.00 | 4.33 | −0.231 | No conclusion |
| | | 2 | −3.94 | 3.81 | −1.034 | |
| | | 3 | 3.23 | 4.18 | 0.773 | |
| | | 4 | −1.84 | 4.41 | −0.417 | |
| | | Total | −3.54 | 16.73 | −0.212 | |
| T2 vs T4 | 4 | 1 | −0.48 | 4.24 | −0.113 | No conclusion |
| | | 2 | −2.42 | 4.37 | −0.554 | |
| | | 3 | 2.72 | 4.17 | 0.652 | |
| | | 4 | −1.97 | 4.03 | −0.489 | |
| | | Total | −2.15 | 16.81 | −0.128 | |
| T3 vs T4 | 5 | 1 | 1.16 | 5.47 | 0.212 | No conclusion |
| | | 2 | 2.71 | 5.02 | 0.540 | |
| | | 3 | 1.02 | 5.11 | 0.200 | |
| | | 4 | −0.28 | 5.36 | −0.052 | |
| | | Total | 4.62 | 20.97 | 0.220 | |
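The per-centre entries in this table can be traced back to the raw counts in Table 5, and the final row under each comparison is consistent with summing the two statistics over centres before taking their ratio. The Python sketch below assumes the usual efficient score and Fisher information for a log-odds ratio; that formula is an assumption, since it is not stated in the text here, although it reproduces the tabulated values. The function names are illustrative.

```python
def score_statistics(s1, n1, s2, n2):
    """Efficient score Z and information V for the log-odds ratio, arm 1 vs arm 2.

    Assumed formulas (standard for binary responses, not quoted from the source):
    Z = (n2*s1 - n1*s2) / n and V = n1*n2*S*F / n**3,
    where S and F are the total numbers of successes and failures.
    """
    n = n1 + n2
    successes = s1 + s2
    failures = n - successes
    z = (n2 * s1 - n1 * s2) / n
    v = n1 * n2 * successes * failures / n**3
    return z, v

def stratified_estimate(strata):
    """Sum Z and V over centres, then estimate theta as total Z / total V."""
    total_z = total_v = 0.0
    for s1, n1, s2, n2 in strata:
        z, v = score_statistics(s1, n1, s2, n2)
        total_z += z
        total_v += v
    return total_z, total_v, total_z / total_v

# T1 vs T3 at the 12th interim, centres 1 to 4:
# (T1 successes, T1 sample size, T3 successes, T3 sample size)
t1_vs_t3 = [(83, 103, 85, 111), (67, 100, 56, 94),
            (64, 104, 60, 111), (68, 125, 45, 116)]
total_z, total_v, theta_hat = stratified_estimate(t1_vs_t3)
print(round(total_z, 2), round(total_v, 2), round(theta_hat, 3))
```

For T1 vs T3 the rounded totals match the tabulated 19.15 and 48.35, giving an estimate of 0.396.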
Analyses of the data from the single simulated run of the sequential four treatment comparison shown in Tables 5 and 6
| Comparison | Naïve estimate of θ | Naïve SE | Naïve θL | Naïve θU | RB2 proportion complete | RB2 estimate of θ | RB2 SE | RB2 θL | RB2 θU |
|---|---|---|---|---|---|---|---|---|---|
| T1 vs T2 | 0.883 | 0.248 | 0.347 | 1.319 | 0.7381 | 0.869 | 0.286 | 0.309 | 1.429 |
| T1 vs T3 | 0.396 | 0.144 | 0.114 | 0.678 | 0.0199 | 0.405 | 0.220 | −0.027 | 0.837 |
| T1 vs T4 | 0.771 | 0.220 | 0.340 | 1.202 | 0.3050 | 0.667 | 0.256 | 0.165 | 1.169 |
| T2 vs T3 | −0.212 | 0.244 | −0.690 | 0.266 | 0.7381 | −0.167 | 0.255 | −0.667 | 0.333 |
| T2 vs T4 | −0.128 | 0.244 | −0.606 | 0.350 | 0.7381 | −0.069 | 0.249 | −0.557 | 0.418 |
| T3 vs T4 | 0.220 | 0.218 | −0.207 | 0.647 | 0.3050 | 0.165 | 0.225 | −0.277 | 0.606 |
Note: In the naïve analyses, the sequential nature of the trial is ignored. The Rao‐Blackwellization method, RB2, is based on 10 million replicate reverse simulations; the proportion of these that were complete is shown in the sixth column.
Evaluation of the naïve method and the Rao‐Blackwellization method RB2 in the four treatment case
| Method | Naïve | Naïve | Naïve | Naïve | Naïve | Naïve | RB2 | RB2 | RB2 | RB2 | RB2 | RB2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Comparison | T1 vs T2 | T1 vs T3 | T1 vs T4 | T2 vs T3 | T2 vs T4 | T3 vs T4 | T1 vs T2 | T1 vs T3 | T1 vs T4 | T2 vs T3 | T2 vs T4 | T3 vs T4 |
| True value of θ | 0.693 | 0.405 | 0.405 | −0.288 | −0.288 | 0.000 | 0.693 | 0.405 | 0.405 | −0.288 | −0.288 | 0.000 |
| Estimate of θ | 0.770 | 0.472 | 0.468 | −0.293 | −0.297 | −0.004 | 0.695 | 0.421 | 0.414 | −0.281 | −0.288 | −0.007 |
| SD | 0.210 | 0.192 | 0.195 | 0.210 | 0.209 | 0.189 | 0.240 | 0.212 | 0.218 | 0.218 | 0.212 | 0.194 |
| SE | 0.193 | 0.161 | 0.161 | 0.197 | 0.196 | 0.167 | 0.254 | 0.222 | 0.221 | 0.214 | 0.214 | 0.187 |
| θL | 0.393 | 0.155 | 0.153 | −0.679 | −0.682 | −0.331 | 0.197 | −0.013 | −0.020 | −0.701 | −0.707 | −0.373 |
| θU | 1.148 | 0.788 | 0.783 | 0.092 | 0.087 | 0.323 | 1.193 | 0.855 | 0.847 | 0.139 | 0.131 | 0.359 |
| Probability that θ ∈ (θL, θU) | 0.937 | 0.929 | 0.920 | 0.945 | 0.950 | 0.932 | 0.955 | 0.968 | 0.965 | 0.957 | 0.964 | 0.971 |
Note: Both evaluations are based on 1000‐fold simulations and each RB2 analysis employed 1 000 000 reverse simulations. The RB2 results are based on the 897 replicates in which 1000 or more reverse simulations were complete.
Properties of the simpler four treatment design from million‐fold simulations
| Case | p1 | p2 | p3 | p4 | Average sample size | win1 | elim4 | nod | still |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.600 | 0.400 | 0.400 | 0.400 | 377 | 0.826 | 0.923 | 0.043 | 0.000 |
| 2 | 0.600 | 0.600 | 0.400 | 0.400 | 377 | 0.026 | 0.977 | 0.904 | 0.000 |
| 3 | 0.600 | 0.600 | 0.600 | 0.400 | 422 | 0.005 | 0.989 | 0.860 | 0.001 |
| 4 | 0.500 | 0.500 | 0.500 | 0.500 | 491 | 0.002 | 0.066 | 0.772 | 0.018 |
| 5 | 0.600 | 0.600 | 0.600 | 0.600 | 480 | 0.002 | 0.072 | 0.768 | 0.000 |
Note: win1 = proportion of runs in which T1 wins; elim4 = proportion of runs in which T4 is eliminated; nod = proportion of runs in which: for Cases 1 and 2, T1 and T2 are declared no different from one another; for Case 3, T1, T2, and T3 are declared no different from one another; for Cases 4 and 5, all treatments are declared no different from one another; still = proportion of runs in which not all treatment comparisons are resolved after 640 responses.
Evaluation of the naïve method and the Rao‐Blackwellization method RB2 for the simpler four treatment design
| Method | Naïve | Naïve | Naïve | Naïve | Naïve | Naïve | RB2 | RB2 | RB2 | RB2 | RB2 | RB2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Comparison | T1 vs T2 | T1 vs T3 | T1 vs T4 | T2 vs T3 | T2 vs T4 | T3 vs T4 | T1 vs T2 | T1 vs T3 | T1 vs T4 | T2 vs T3 | T2 vs T4 | T3 vs T4 |
| True value of θ | 1.099 | 0.811 | 0.811 | −0.288 | −0.288 | 0.000 | 1.099 | 0.811 | 0.811 | −0.288 | −0.288 | 0.000 |
| Estimate of θ | 1.177 | 0.907 | 0.909 | −0.291 | −0.286 | 0.003 | 1.075 | 0.804 | 0.796 | −0.290 | −0.294 | −0.006 |
| SD | 0.332 | 0.323 | 0.343 | 0.369 | 0.385 | 0.367 | 0.400 | 0.357 | 0.399 | 0.397 | 0.433 | 0.393 |
| SE | 0.335 | 0.308 | 0.309 | 0.365 | 0.366 | 0.333 | 0.424 | 0.397 | 0.396 | 0.406 | 0.406 | 0.374 |
| θL | 0.519 | 0.303 | 0.304 | −1.008 | −1.003 | −0.651 | 0.243 | 0.025 | 0.020 | −1.085 | −1.090 | −0.739 |
| θU | 1.834 | 1.511 | 1.514 | 0.425 | 0.432 | 0.656 | 1.907 | 1.583 | 1.573 | 0.505 | 0.501 | 0.728 |
| Probability that θ ∈ (θL, θU) | 0.961 | 0.949 | 0.932 | 0.957 | 0.947 | 0.943 | 0.971 | 0.972 | 0.952 | 0.971 | 0.954 | 0.959 |
Note: Both evaluations are based on 1000‐fold simulations and each RB2 analysis employed 1 000 000 reverse simulations. The RB2 results are based on the 989 replicates in which 1000 or more reverse simulations were complete.