Janet Wittes, Statistics Collaborative, Inc., Washington, DC, USA. janet@statcollab.com
Abstract
BACKGROUND: This article addresses a problem arising when a trial shows such strong evidence of benefit of the tested intervention that it stops early with an observed effect size for the experimental treatment that is statistically significantly better than the control. Within the classical frequentist framework of group sequential trials, the observed estimated effect size, the associated naïve confidence interval, and the p-value are all biased estimates of the true values. The bias is in the direction of overestimation of the treatment effect, confidence intervals that are narrower than appropriate, and a p-value that is too small.

PURPOSE: To discuss methods for correcting the bias in observed effect sizes, confidence intervals, and p-values for trials stopped early, and to show the extent to which such correction would have modified the conclusions of the Randomized Aldactone Evaluation Study (RALES).

RESULTS: In RALES, the effect of not correcting for bias is negligible.

LIMITATIONS: This article does not show general results; it only explores a few examples that use conservative methods for early stopping. It does not consider sequential methods that allow a relatively high probability of stopping early.

CONCLUSIONS: This article points out that there is no unique solution to the correction of the p-value, but it recommends stagewise ordering, which states that earlier stopping of a trial is ipso facto stronger evidence of effect than later stopping, so long as the stopping is governed by a monitoring boundary that preserves the Type I error rate. Associated with stagewise ordering is a method for calculating the estimated effect size and its confidence interval. In the RALES trial, which stopped at 50% information time, the corrections to the estimated values are small.
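The upward bias described above can be illustrated with a small Monte Carlo sketch. This is not the article's method, only a toy two-stage design: trials with a fixed true standardized effect are stopped at a 50%-information interim look whenever the interim z-statistic crosses an O'Brien-Fleming-style boundary, and the naive effect estimate among the stopped trials is compared with the truth. All numerical values (true effect, sample size, boundary) are illustrative assumptions.

```python
import random
import math

random.seed(0)

TRUE_EFFECT = 0.2    # true standardized effect size (illustrative assumption)
N_PER_STAGE = 100    # observations accrued by the interim look
BOUNDARY_Z = 2.797   # illustrative O'Brien-Fleming-type boundary at 50% information
N_SIM = 20000        # number of simulated trials

stopped_estimates = []
for _ in range(N_SIM):
    # Interim analysis: mean of unit-variance observations centered at TRUE_EFFECT
    xbar = sum(random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_STAGE)) / N_PER_STAGE
    z_interim = xbar * math.sqrt(N_PER_STAGE)
    if z_interim > BOUNDARY_Z:
        # Trial crosses the efficacy boundary and stops early;
        # xbar is the naive (uncorrected) estimate reported at stopping
        stopped_estimates.append(xbar)

mean_stopped = sum(stopped_estimates) / len(stopped_estimates)
print(f"true effect: {TRUE_EFFECT}")
print(f"mean naive estimate among early-stopped trials: {mean_stopped:.3f}")
```

Because only trials with a large interim statistic stop, the naive estimate among stopped trials is systematically larger than the true effect, which is the selection bias the corrections discussed in the article are designed to remove.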