Konrad Neumann, Ulrike Grittner, Sophie K Piper, Andre Rex, Oscar Florez-Vargas, George Karystianis, Alice Schneider, Ian Wellwood, Bob Siegerink, John P A Ioannidis, Jonathan Kimmelman, Ulrich Dirnagl.
Abstract
Despite the potential benefits of sequential designs, studies evaluating treatments or experimental manipulations in preclinical experimental biomedicine almost exclusively use classical block designs. Our aim with this article is to bring the existing methodology of group sequential designs to the attention of researchers in the preclinical field and to clearly illustrate its potential utility. Group sequential designs can offer higher efficiency than traditional methods and are increasingly used in clinical trials. Using simulated data, we demonstrate that group sequential designs have the potential to improve the efficiency of experimental studies, even when sample sizes are very small, as is currently prevalent in preclinical experimental biomedicine. When simulating data with a large effect size of d = 1 and a sample size of n = 18 per group, sequential frequentist analysis consumes, in the long run, only around 80% of the planned number of experimental units. In larger trials (n = 36 per group), additional stopping rules for futility save up to 30% of resources compared to block designs. We argue that these savings should be invested in larger sample sizes and hence higher power, since the currently underpowered experiments in preclinical biomedicine are a major threat to the value and predictiveness of this research domain.
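The "costs" of a sequential design are long-run averages of the number of experimental units actually used. As a worked sketch, assuming stopping can happen only at the planned looks with cumulative sizes $n_1 < \dots < n_K$ (a block design always uses $n_K$), the expected consumption and the relative saving are:

$$
\mathbb{E}[N] = \sum_{k=1}^{K} n_k \,\Pr(\text{stop at stage } k), \qquad \text{saving} = 1 - \frac{\mathbb{E}[N]}{n_K}.
$$

For example, in the two-stage larger study ($n_1 = 36$, $n_2 = 72$) under $d = 0$, the frequentist sequential design stops at stage 1 in $0.8\% + 50.7\% = 51.5\%$ of runs, so $\mathbb{E}[N] = 36(0.515) + 72(0.485) = 53.46 \approx 53$ units, a saving of about 26%.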
Year: 2017 PMID: 28282371 PMCID: PMC5345756 DOI: 10.1371/journal.pbio.2001307
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 1. Study design and sequential analysis approach allowing two interim analyses.
Stage 1: 33% of samples acquired; stage 2: 66% of samples acquired; stage 3: 100% of samples acquired. $H_0$: null hypothesis; $P$: p-value; credible interval: Bayesian interval that contains the estimated parameter with a stated posterior probability; $d$: effect size (Cohen's $d$); $\alpha_i$: significance level for stage $i$, derived from [11]: $\alpha_1 = 0.0006$, $\alpha_2 = 0.0151$, $\alpha_3 = 0.0471$. Additionally, we used a Bayes factor approach (Table 1) and Pocock boundaries for the frequentist approach (S1 Table). All sequential approaches were calibrated by simulation to give a type I error of about 5%.
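A minimal simulation sketch of the three-look frequentist rule from Fig 1, using only the Python standard library. Assumptions not in the source: outcomes are normal with a known SD of 1, so a two-sample z-test stands in for the test used in the paper; `z_pvalue`, `run_trial` and `simulate` are hypothetical helper names; the seed and run count are arbitrary. Because of the z-test substitution, the numbers it produces approximate, rather than reproduce, those in Table 1.

```python
import math
import random
from statistics import NormalDist, fmean

# Nominal two-sided significance levels for the three looks (Fig 1 caption).
ALPHAS = (0.0006, 0.0151, 0.0471)
# Cumulative per-group sample sizes at each look: 6, 12 and 18 per group
# (12, 24 and 36 experimental units in total).
N_PER_GROUP = (6, 12, 18)

_STD_NORMAL = NormalDist()


def z_pvalue(x, y, sigma=1.0):
    """Two-sided two-sample z-test p-value.

    ASSUMPTION: known common SD `sigma`; stands in for the paper's test.
    """
    se = sigma * math.sqrt(1.0 / len(x) + 1.0 / len(y))
    z = (fmean(y) - fmean(x)) / se
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def run_trial(d, rng):
    """Simulate one three-look experiment; return (rejected_h0, units_used)."""
    n_max = N_PER_GROUP[-1]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_max)]
    treated = [rng.gauss(d, 1.0) for _ in range(n_max)]
    for n, alpha in zip(N_PER_GROUP, ALPHAS):
        # Each look analyses only the data acquired up to that stage.
        if z_pvalue(control[:n], treated[:n]) < alpha:
            return True, 2 * n  # stop early: H0 rejected
    return False, 2 * n_max  # all three looks done, H0 not rejected


def simulate(d, runs=10_000, seed=1):
    """Return (rejection rate, mean experimental units) over `runs` trials."""
    rng = random.Random(seed)
    results = [run_trial(d, rng) for _ in range(runs)]
    rejection_rate = sum(rejected for rejected, _ in results) / runs
    mean_units = fmean(units for _, units in results)
    return rejection_rate, mean_units


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        rate, units = simulate(d)
        print(f"d = {d}: rejection rate {rate:.3f}, mean units {units:.1f} of 36")
```

Under these assumptions the rejection rate at d = 0 should come out near 5%, and at d = 1 the mean number of units falls clearly below the 36 planned, because many runs already stop at the second look.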
Table 1. Early stopping for significance or futility using nonsequential and group sequential designs (examples with n = 36 or n = 72).
| | Small study (n = 36) | | | | | Larger study (n = 72) | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Cohen's d | Quantity | Freq. nonseq. | Freq. seq. | Bayes factor | Bayes (CRI) with noninf. prior | Quantity | Freq. nonseq. | Freq. seq. | Bayes factor | Bayes (CRI) with noninf. prior |
| 0 | Stage 1, n = 12 (6 versus 6): % significant | - | 0.1 | 2.3 | 0.4 | Stage 1, n = 36 (18 versus 18): % significant/% futility | - | 0.8/50.7 | 3.5/70.7 | 1.1/50.1 |
| 0 | Costs (mean units) | 36 | 36 | 36 | 36 | Costs (mean units) | 72 | 53 | 45 | 54 |
| 0.5 | Stage 1, n = 12 (6 versus 6): % significant | - | 0.4 | 6.4 | 1.0 | Stage 1, n = 36 (18 versus 18): % significant/% futility | - | 9.7/18.8 | 25.6/32.5 | 11.2/18.5 |
| 0.5 | Costs (mean units) | 36 | 35 | 34 | 34 | Costs (mean units) | 72 | 62 | 51 | 61 |
| 1.0 | Stage 1, n = 12 (6 versus 6): % significant | - | 1.6 | 22.8 | 4.5 | Stage 1, n = 36 (18 versus 18): % significant/% futility | - | 54.4/0.9 | 78.3/2.8 | 57.7/0.8 |
| 1.0 | Costs (mean units) | 36 | 31 | 27 | 28 | Costs (mean units) | 72 | 52 | 43 | 51 |
Simulations were based on a total of 18 or 36 samples per group. Type I error (Cohen's $d = 0$) or power ($d = 0.5$ or $d = 1.0$) for three standardized effect sizes. Numbers give the cumulative percentage of statistically significant trials out of 10,000 simulation runs, as well as "Costs", defined as the long-run mean number of experimental units used, and the median estimated effect size in significant trials ($d_{est}$). Small study: stage 1, n = 12 (6 versus 6); stages 1 and 2, n = 24 (12 versus 12); stages 1 to 3, n = 36 (18 versus 18) experimental units. Stopping rules that allowed early stopping: Freq. nonseq.: $\alpha = 0.05$; Freq. seq.: significance levels for interim analyses $\alpha_1 = 0.0006$, $\alpha_2 = 0.0151$, $\alpha_3 = 0.0471$ according to [11]; Bayes factor: threshold of 3 at each stage; Bayes with noninf. prior: CRI for the effect size, 99.8% CRI at stage 1 and 96.8% CRI at stages 2 and 3.
Larger study: stage 1, n = 36 (18 versus 18); stages 1 and 2, n = 72 (36 versus 36) experimental units. Stopping rules that allowed early stopping for futility or significance: Freq. nonseq.: $\alpha = 0.05$; Freq. seq. [11]: $\alpha_{futility} = 0.5$, $\alpha_1 = 0.0065$, $\alpha_2 = 0.0525$; Bayes factor: 2 for significance and 0.5 for futility; Bayes with noninf. prior (CRI for effect size $d$): 99% CRI at stage 1, stopping for futility if zero is included in the 50% CRI, and 95% CRI at stage 2.
All sequential approaches were calibrated to a type I error of about 5%.
Abbreviations: CRI, credible interval; Freq. nonseq., Frequentist nonsequential; Freq. seq., Frequentist sequential; Noninf., Noninformative.
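The two-stage frequentist rule with a futility stop, as listed for the larger study, can be sketched the same way. This is again a hedged sketch that assumes normal outcomes with a known SD of 1 and a two-sample z-test instead of the paper's test; helper names are hypothetical. A final level of $\alpha_2 = 0.0525$, slightly above 0.05, is presumably possible because the binding futility stop ends some null runs that could otherwise have rejected at stage 2, which lowers the overall type I error.

```python
import math
import random
from statistics import NormalDist, fmean

# Frequentist sequential thresholds for the larger study (Table 1 notes).
ALPHA_FUTILITY = 0.5  # stage 1: stop for futility if p > 0.5
ALPHA_1 = 0.0065      # stage 1: stop for significance if p < 0.0065
ALPHA_2 = 0.0525      # stage 2: final significance level
N1, N2 = 18, 36       # cumulative per-group sizes (36 and 72 units in total)

_STD_NORMAL = NormalDist()


def z_pvalue(x, y, sigma=1.0):
    """Two-sided two-sample z-test p-value; ASSUMES a known common SD `sigma`."""
    se = sigma * math.sqrt(1.0 / len(x) + 1.0 / len(y))
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs((fmean(y) - fmean(x)) / se)))


def run_trial(d, rng):
    """Return (outcome, units_used) for one two-stage experiment."""
    control = [rng.gauss(0.0, 1.0) for _ in range(N2)]
    treated = [rng.gauss(d, 1.0) for _ in range(N2)]
    p1 = z_pvalue(control[:N1], treated[:N1])
    if p1 < ALPHA_1:
        return "significant", 2 * N1  # early stop for significance
    if p1 > ALPHA_FUTILITY:
        return "futile", 2 * N1       # early stop for futility
    p2 = z_pvalue(control, treated)   # stage 2 analyses all 36 + 36 units
    return ("significant" if p2 < ALPHA_2 else "not significant"), 2 * N2


def simulate(d, runs=10_000, seed=2):
    """Return (P(significant), P(stopped for futility), mean units)."""
    rng = random.Random(seed)
    results = [run_trial(d, rng) for _ in range(runs)]
    p_sig = sum(outcome == "significant" for outcome, _ in results) / runs
    p_futile = sum(outcome == "futile" for outcome, _ in results) / runs
    mean_units = fmean(units for _, units in results)
    return p_sig, p_futile, mean_units


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        p_sig, p_futile, units = simulate(d)
        print(f"d = {d}: significant {p_sig:.3f}, futile {p_futile:.3f}, "
              f"mean units {units:.1f} of 72")
```

Under the z-test assumption a null run stops at stage 1 with probability 0.5 + 0.0065 = 0.5065, so the expected cost is 72 - 36 × 0.5065 ≈ 53.8 units. The simulated mean should land near that value, in line with the 53 reported for the frequentist sequential design at d = 0.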
Fig 2. Predictive capabilities of sequential designs compared to traditional nonsequential design for two different scenarios of potential effect size distributions.
Upper left: "optimistic" scenario with more large effect sizes. Upper right: "pessimistic" scenario with mostly effect sizes of 0. Bottom: probability of obtaining a significant test result that reflects a true effect of $d \neq 0$ or $d \geq 0.5$, respectively, for the two scenarios of effect size distributions. First, the probability P(significant) of obtaining any significant study result is given, then the corresponding positive predictive value, and finally the product of both, which is the overall probability of obtaining a significant study result that truly represents an effect of $d \neq 0$ or $d \geq 0.5$ ($P_{\text{detect true effect}}$). Stopping rules that allowed early stopping for futility or success are as given in Table 1.
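The bookkeeping behind the bottom panel can be written out directly: P(significant) averages the per-effect-size rejection probability over the scenario, the positive predictive value (PPV) is the share of significant results that come from a true effect, and their product is the joint probability of a significant result and a true effect. The scenario weights and rejection probabilities below are hypothetical placeholders, not the distributions or results shown in Fig 2, and `predictive_summary` is an illustrative helper name.

```python
from typing import Callable, Dict, Tuple

# HYPOTHETICAL scenario: share of tested hypotheses with each true effect size d.
# Illustrative weights only, not the "optimistic"/"pessimistic" distributions of Fig 2.
SCENARIO: Dict[float, float] = {0.0: 0.5, 0.2: 0.2, 0.5: 0.2, 1.0: 0.1}

# HYPOTHETICAL probability of a significant result given d
# (type I error at d = 0, power otherwise); placeholder values only.
P_SIG_GIVEN_D: Dict[float, float] = {0.0: 0.05, 0.2: 0.10, 0.5: 0.35, 1.0: 0.80}


def predictive_summary(
    scenario: Dict[float, float],
    p_sig_given_d: Dict[float, float],
    is_true_effect: Callable[[float], bool],
) -> Tuple[float, float, float]:
    """Return (P(significant), PPV, P(detect true effect)) for one scenario."""
    # Total probability of any significant result, averaged over the scenario.
    p_sig = sum(w * p_sig_given_d[d] for d, w in scenario.items())
    # Probability of a significant result that comes from a true effect.
    p_sig_and_true = sum(
        w * p_sig_given_d[d] for d, w in scenario.items() if is_true_effect(d)
    )
    ppv = p_sig_and_true / p_sig
    # The product P(significant) * PPV equals P(significant and true effect).
    p_detect = p_sig * ppv
    return p_sig, ppv, p_detect


if __name__ == "__main__":
    for label, rule in (("d != 0", lambda d: d != 0), ("d >= 0.5", lambda d: d >= 0.5)):
        p_sig, ppv, p_detect = predictive_summary(SCENARIO, P_SIG_GIVEN_D, rule)
        print(f"{label}: P(significant) = {p_sig:.3f}, PPV = {ppv:.3f}, "
              f"P_detect = {p_detect:.3f}")
```

With these placeholder numbers, P(significant) is 0.195; the PPV is about 0.87 for $d \neq 0$ and about 0.77 for $d \geq 0.5$, so the stricter definition of a "true effect" lowers both the PPV and $P_{\text{detect true effect}}$.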