Katie Pike, Barnaby C Reeves, Chris A Rogers.
Abstract
BACKGROUND: Opinions and practices vary around the issue of performing multiple statistical tests in randomised controlled trials (RCTs). We carried out a study to collate information about opinions and practices using a methodological rapid review and a survey, specifically of publicly funded pragmatic RCTs that are not seeking marketing authorisation. The aim was to identify the circumstances under which researchers would make a statistical adjustment for multiplicity.
Keywords: Multiple testing; Randomised controlled trials; Rapid review; Survey
Year: 2022 PMID: 35125091 PMCID: PMC8818238 DOI: 10.1186/s12874-022-01525-9
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Review: screening process
Review: study characteristics
| Characteristic | | n/N | % |
|---|---|---|---|
| Journal | Annals of Internal Medicine | 4/138 | 3% |
| | BMJ | 11/138 | 8% |
| | JAMA | 28/138 | 20% |
| | Lancet | 30/138 | 22% |
| | NEJM | 36/138 | 26% |
| | NIHR HTA journal library | 17/138 | 12% |
| | PlosMED | 12/138 | 9% |
| Trial designᵃ | Parallel group: 2 treatment groups | 94/138 | 68% |
| | Parallel group: > 2 treatment groups | 18/138 | 13% |
| | Cluster randomised | 23/138 | 17% |
| | Crossover | 2/138 | 1% |
| | Factorial | 5/138 | 4% |
| | Stepped wedge | 2/138 | 1% |
| | Non-inferiority | 18/138 | 13% |
| | Equivalence | 1/138 | 1% |
| Total number of randomised participants, median (IQR) | | 574 | (312, 2043) |
| Primary outcomeᵇ | More than one outcome stated | 21/138 | 15% |
| | Two outcomes | 17 | |
| | Three outcomes | 4 | |
| | More than one comparison made | 28/138 | 20% |
| | Two comparisons | 16 | |
| | Three comparisons | 4 | |
| | Four comparisons | 2 | |
| | Five comparisons | 1 | |
| | More than five comparisons (maximum 20) | 5 | |
| Secondary outcome | More than one outcome stated | 134/138 | 97% |
| | Median (IQR) outcomes stated | 8 (5, 13) | |
| | More than one comparison made | 132/138 | 96% |
| | Median (IQR) comparisons made | 14.5 (7, 26) | |
| More than two treatment groupsᶜ | | 23/138 | 17% |
| Number of treatment comparisons made | Oneᵈ | 1 | |
| | Two | 13 | |
| | Three | 6 | |
| | Four | 1 | |
| | Five | 1 | |
| | Eight | 1 | |
| Any subgroup analyses performed | | 85/138 | 62% |
| | Median (IQR) subgroup analyses | 4 (2, 7) | |
| Any interim analyses performed | | 38/135 | 28% |
| | One | 22 | |
| | Two | 9 | |
| | Three | 3 | |
| | Four | 3 | |
| | Five | 1 | |
Notes: a Trials could be classified in more than one design category, e.g. a cluster randomised, factorial, non-inferiority trial.
b Discrepancies between the numbers of outcomes stated and comparisons made were either due to multiple time points being analysed or multiple analysis approaches taken, with none stated as primary.
c This includes parallel group trials with > 2 treatment groups and factorial trials.
d In this trial one treatment arm was dropped due to futility at an interim analysis, so the final analysis comprised just two treatment groups and therefore one comparison.
Abbreviations: BMJ British Medical Journal, IQR interquartile range, JAMA Journal of the American Medical Association, NEJM New England Journal of Medicine, NIHR HTA National Institute of Health Research Health Technology Assessment, PlosMED Public Library of Science Medicine
Approach to multiplicity due to multiple outcomes
| All outcomes to be declared effective | 0/8 | 2/8ᵃ | 0/8 | 6/8 |
| One or more outcomes to be declared effectiveᵇ | 4/20ᶜ | 1/20ᵈ | 2/20ᵉ | 13/20 |
| Secondary outcomes | 14/136ᶠ (10%) | 1/136ᵍ (1%) | 2/136ʰ (1%) | 119/136 (88%) |
| Consider a parallel group trial with two primary outcomes. Would you adjust for multiplicity in the following scenarios? | ||||
| The trial hypotheses require both null hypotheses to be rejected? | 9/27 (33%) | 16/27 (59%) | 2/27 (7%) | |
| The trial hypotheses require either null hypothesis to be rejected? | 16/27 (59%) | 8/27 (30%) | 3/27 (11%) | |
| Would you adjust for multiplicity arising from multiple secondary outcomes? | 3/27 (11%) | 20/27 (74%) | 4/27 (15%) | |
| Would the type of outcomes (efficacy, safety, cost-effectiveness) have an impact on your response to the above question? | 9/27 (33%) | 17/27 (63%) | 1/27 (4%) | |
Notes: a Both trials sequentially tested two outcomes
b Includes 13 trials that stated multiple primary outcomes in the methods section, and seven that stated only one outcome but made multiple comparisons
c Two trials performed a Holm correction, and two implemented a graphical multiple testing procedure
d Trial sequentially tested non-inferiority then superiority
e One trial recommended that p-values between 0.025 and 0.05 were considered to have borderline significance; one trial performed post-hoc analysis of the primary outcomes with one-sided 97.5% confidence intervals
f Three trials performed a Bonferroni correction, five a Holm correction, one a Hochberg correction, three used a 1% threshold for significance and two used a graphical method
g Formal hypothesis testing was only performed for secondary outcomes if the primary efficacy outcome was statistically significant
h Formal hypothesis testing was only performed for a small number of key secondary outcomes; other secondary outcomes were presented descriptively
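The Bonferroni, Holm and Hochberg corrections named in the notes above can be compared on a small example. The following is a minimal sketch using the textbook definitions of the adjusted p-values; the function names and the three p-values are illustrative assumptions and are not taken from any of the reviewed trials.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def hochberg(pvals):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (rank + 1) * pvals[i])
        running_min = min(running_min, candidate)  # keep adjusted values monotone
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for three outcomes (illustration only).
pvals = [0.01, 0.04, 0.03]
for name, method in [("Bonferroni", bonferroni), ("Holm", holm), ("Hochberg", hochberg)]:
    print(name, [round(p, 3) for p in method(pvals)])
```

With these illustrative inputs and a 5% threshold, Bonferroni and Holm reject only the hypothesis with the smallest p-value, whereas Hochberg rejects all three. Hochberg's extra power relies on assumptions about the dependence between tests that Holm does not need.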
Approach to multiplicity due to multiple treatment comparisons
| Related treatments | 7/15ᵃ | 1/15ᵇ | 1/15ᶜ | 6/15 |
| Distinct treatments | 2/8ᵈ | 0/8 | 0/8 | 6/8 |
| Would you consider adjusting for multiplicity arising from making multiple treatment comparisons? | 24/27 (89%) | 1/27 (4%) | 2/27 (7%) | |
| Consider a parallel group trial with three treatment arms, where all comparisons are of interest. Would you adjust for multiplicity in the following scenarios? | ||||
| Two of the treatment arms are related, e.g. Group 1 = placebo, Group 2 = low drug dose, Group 3 = high drug dose | 22/27 (81%) | 1/27 (4%) | 4/27 (15%) | |
| The three treatment arms are unrelated, including one placebo arm, e.g. Group 1 = placebo, Group 2 = drug, Group 3 = exercise | 16/27 (59%) | 7/27 (26%) | 4/27 (15%) | |
| The three treatment arms are unrelated, but all are active treatments, e.g. Group 1 = drug, Group 2 = exercise, Group 3 = education | 19/27 (70%) | 6/27 (22%) | 2/27 (7%) | |
| Would you be more likely to adjust for multiplicity if the number of treatment arms was increased? | 12/27 (44%) | 12/27 (44%) | 3/27 (11%) | |
Notes: a Three trials performed a Bonferroni correction, one a Holm correction, two a Hochberg correction and one used a 1% significance level for all treatment comparisons
b Two treatment comparisons were split into primary and secondary hypotheses and analysed in a hierarchical manner
c A post-hoc Bonferroni correction was performed, although this was not the primary analysis for the trial
d One trial performed a Bonferroni correction and one used Dunnett’s procedure
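The hierarchical (fixed-sequence) approach mentioned in the notes above can be sketched as follows. This is a generic illustration of the technique, assuming the hypotheses are ordered in advance and each is tested at the full significance level; the function name and p-values are hypothetical and do not reproduce any trial's analysis.

```python
def fixed_sequence_test(ordered_pvals, alpha=0.05):
    """Test hypotheses in their pre-specified order, each at the full alpha.

    Testing stops at the first non-significant result; that hypothesis and
    all later ones are not rejected, whatever their p-values.
    """
    rejected = []
    stopped = False
    for p in ordered_pvals:
        if not stopped and p <= alpha:
            rejected.append(True)
        else:
            stopped = True
            rejected.append(False)
    return rejected


# Hypothetical p-values for a primary, then two secondary hypotheses.
print(fixed_sequence_test([0.01, 0.20, 0.001]))
```

Here the third hypothesis is not rejected despite its small p-value, because the second hypothesis in the sequence failed. The procedure spends no significance level on adjustment, but its conclusions depend on the ordering being fixed before the data are seen.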
Approach to multiplicity due to subgroup analyses
| Subgroup analyses | 8/85ᵃ (9%) | 0/85 (0%) | 0/85 (0%) | 77/85ᵇ (91%) |
| Would you consider adjusting for multiplicity arising from performing multiple subgroup analyses? | 6/27 (22%) | 17/27 (63%) | 4/27 (15%) | |
| Consider a parallel group trial with multiple subgroup analyses performed. Would you adjust for multiplicity in the following scenarios? | ||||
| Subgroup analyses pre-specified in the study protocol? | 3/27 (11%) | 22/27 (81%) | 2/27 (7%) | |
| Subgroup analyses determined post-hoc? | 4/27 (15%) | 22/27 (81%) | 1/27 (4%) | |
| Subgroup analyses specified for the following reasons: a) to confirm biological plausibility, b) to confirm existing hypotheses, AND c) to show subgroup effects for supporting decision making in target populations. | 3/27 (11%) | 19/27 (70%) | 5/27 (19%) | |
| Would you be more likely to adjust for multiplicity if the number of subgroup analyses was increased? | 5/27 (19%) | 21/27 (78%) | 1/27 (4%) | |
Notes: a One trial performed a Bonferroni correction, two a Holm correction and five studies used a threshold of 1% for significance
b Of these, five studies stated that results from secondary outcomes were exploratory/hypothesis generating
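To show why the number of subgroup analyses matters, the sketch below computes the chance of at least one false-positive result across several unadjusted tests. It assumes the tests are independent and every null hypothesis is true, which is a simplification (subgroup analyses within one trial are usually correlated); the function name is hypothetical, and the test counts 2, 4 and 7 are borrowed from the median and IQR of subgroup analyses reported in the review.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests independent
    tests, each run at level alpha with all null hypotheses true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for m in (2, 4, 7):
    print(
        f"{m} tests: unadjusted familywise error rate "
        f"{familywise_error_rate(m):.3f}, "
        f"Bonferroni per-test threshold {bonferroni_threshold(m):.4f}"
    )
```

Under these assumptions, four unadjusted tests at the 5% level carry roughly an 18.5% chance of at least one spurious finding, and seven tests carry about 30%.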
Approach to multiplicity due to interim analyses
| Interim analyses | 26/41ᵃ (63%) | 0/41 (0%) | 2/41ᵇ (5%) | 13/41ᶜ (32%) |
| Would you adjust for multiplicity if interim analysis(es) were pre-specified in the study protocol? | 8/27 (30%) | 12/27 (44%) | 3/27 (11%) | 4/27 (15%) |
Notes: a Eight trials used the Haybittle-Peto procedure, eight used O’Brien-Fleming, seven partitioned the significance level between final and interim analyses (with no further details given), one used Pocock, one used Lan-DeMets and one did not give details.
b One trial used a group sequential design and one used a conditional rejection probability approach.
c Of these, three trials stated a pre-specified significance level for stopping the trial.
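As an illustration of the simplest of the stopping rules named above, the sketch below applies the Haybittle-Peto idea: a very strict threshold at each interim look and a conventional threshold at the final analysis. The thresholds of 0.001 and 0.05 are the values commonly quoted for this rule and are an assumption here, as are the function name and p-values; the reviewed trials may have used other values.

```python
def haybittle_peto(interim_pvals, final_pval, interim_alpha=0.001, final_alpha=0.05):
    """Apply a Haybittle-Peto style rule to a sequence of analyses.

    Each interim look stops the trial only if its p-value is below the strict
    interim threshold; otherwise the final analysis uses the usual threshold.
    """
    for look, p in enumerate(interim_pvals, start=1):
        if p < interim_alpha:
            return f"stop early at interim look {look}"
    if final_pval < final_alpha:
        return "reject at final analysis"
    return "do not reject"


# Hypothetical p-values from two interim looks and a final analysis.
print(haybittle_peto([0.02, 0.004], final_pval=0.03))
print(haybittle_peto([0.02, 0.0004], final_pval=0.03))
```

Because the interim threshold is so strict, the interim looks use very little of the overall type I error, which is why the final analysis is usually carried out at close to the nominal level.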
Survey: comments
| Category | Example comments |
|---|---|
Abbreviations: CI Chief Investigator, DMC Data Monitoring Committee