Esther de Hoop, Ingeborg van der Tweel, Rieke van der Graaf, Karel G M Moons, Johannes J M van Delden, Johannes B Reitsma, Hendrik Koffijberg.
Abstract
BACKGROUND: Various papers have addressed the pros and cons of the stepped wedge cluster randomized trial design (SWD). However, some issues have not been addressed, or have been addressed only to a limited extent. Our aim was to provide a comprehensive overview of all merits and limitations of the SWD to assist researchers, reviewers and medical ethics committees when deciding on the appropriateness of the SWD for a particular study.
Year: 2015 PMID: 26514920 PMCID: PMC4627408 DOI: 10.1186/s12874-015-0090-2
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1. Illustration of the stepped wedge design, where different (groups of) clusters switch from control to intervention at different time points
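The layout in Fig. 1 can be sketched as a cluster-by-period matrix of 0 (control) and 1 (intervention). The snippet below is a minimal illustration only: the function name `stepped_wedge_schedule`, the equal number of clusters per step, and the single all-control baseline period are assumptions made for the example, not details taken from the paper.

```python
import random


def stepped_wedge_schedule(n_clusters, n_steps, seed=None):
    """Randomly assign clusters to switch steps (hypothetical helper).

    Returns a dict mapping cluster index -> list of 0/1 flags, one per
    period. There are n_steps + 1 periods: period 0 is an all-control
    baseline and in the last period every cluster has switched.
    """
    if n_clusters % n_steps != 0:
        raise ValueError("this sketch assumes equal numbers of clusters per step")
    rng = random.Random(seed)
    order = list(range(n_clusters))
    rng.shuffle(order)  # the randomization: which clusters switch when
    per_step = n_clusters // n_steps
    schedule = {}
    for position, cluster in enumerate(order):
        switch_period = position // per_step + 1  # first period under intervention
        schedule[cluster] = [
            1 if period >= switch_period else 0 for period in range(n_steps + 1)
        ]
    return schedule


if __name__ == "__main__":
    # Print one row per cluster; columns are consecutive periods.
    for cluster, row in sorted(stepped_wedge_schedule(6, 3, seed=2015).items()):
        print(cluster, row)
```

Each row is non-decreasing, which reflects the one-way cross-over from control to intervention.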
Key general characteristics of the stepped wedge design and their implications
| Characteristic | Implication |
|---|---|
| Randomization is usually at the cluster level | Statistical analyses need to take into account that measurements of subjects within a cluster may be correlated |
| | Concealment of allocation will not always be possible. Blinding of outcome assessment is therefore more difficult to achieve |
| Cross-over element: each cluster will switch from control to experimental intervention | The cross-over allows for a within-cluster comparison which may increase statistical power |
| | Sample size calculations as well as analyses become more complex |
| Two subtypes: | |
| - switch involves the same patients (cohort-type) | Cohort-type SWDs allow for within-patient comparison, which may further increase efficiency, but it is necessary to evaluate critically whether carry-over effects may compromise the results of the study |
| - switch involves different patients (cross-sectional type) | |
| Switch from control to experimental intervention is spread over calendar time | A research team can plan and execute the switch in treatment in a dedicated way as not all clusters switch at the same point in time |
| | Interim analyses need to take into account that the numbers of measurements in the control and intervention groups are very imbalanced at early stages and will only be comparable at the end of the study |
| | It offers the possibility to assess changes in cost-effectiveness over time when the uptake of interventions is difficult or slow due to implementation barriers that need to be overcome |
| | A study with an SWD may need a relatively long time to complete |
| All clusters will experience the experimental intervention | This feature may enhance participation of clusters in the study |
| | The switch in each cluster allows investigation and monitoring of implementation problems |
| Fixed design in which all clusters start at the same point in time and all steps have the same time span | Preparations for data collection need to be finished in each hospital, which can easily delay the start of the study |
| | Lower than anticipated inclusion rates increase the risk of an underpowered study, as solutions like adding more clusters or extending the length of the remaining steps seriously affect the design and are not recommended |
Comparing the SWD to the parallel group CRT: aspects of study design and preparation
| Aspect | +/−/~ | Issue | Description |
|---|---|---|---|
| Equipoise | ~ | (2a)* | An SWD may be used in a situation where there is a slight preference for the experimental treatment [ |
| † Social value | ~ | (2b)* | A study with an SWD may benefit fewer individuals after completion since it typically takes longer to complete. However, this disadvantage may be offset by faster implementation following the SWD |
| † Implementation decisions | - | (2c)* | If evidence on the cost-effectiveness of a new intervention is lacking, collecting this evidence may be valuable to support implementation decisions. However, deimplementation following a negative result has worse consequences for SWDs than for parallel group CRTs |
| Disease | - | (2d) | An SWD is not the design of choice for a study in a rapidly spreading disease. A pandemic requires an efficient, short-term design and analysis [ |
| Study design | ~ | (2e) | An SWD might be logistically easier because of the phased implementation of the intervention rather than implementation of the intervention at (often) half of the clusters simultaneously in a parallel group CRT [ |
| † | + | (2f) | The SWD offers the possibility to assess cost-effectiveness over time when the uptake of the intervention is difficult or slow. Even though statistical power to assess time trends may be relatively low, compared to parallel group CRTs the SWD allows a more accurate assessment of the actual long-term costs and effects after implementation barriers have been overcome |
| | ~ | (2g) | An SWD may take longer to complete [ |
| † | - | (2h)* | In an SWD it will be difficult to compare more than 2 treatments whereas in a parallel group CRT more treatment arms can be added rather easily. Implementing more than 2 treatments may also be of questionable use in an SWD |
| Sample size | ~ | (2i)* | An SWD may require fewer clusters than a parallel group CRT [ |
| | ~ | (2j)* | An SWD may require a larger total number of subjects and/or measurements than a parallel group CRT, depending on cluster size, intracluster correlation (ICC) and number of measurement periods [ |
| † | ~ | (2k) | The effect of incorporating interim analyses on the total sample size for an SWD is not clear yet |
| Power | ~ | (2l) | An SWD may have more power than a parallel group CRT due to an increase in the amount of data collected and the possibility of within-cluster comparisons [ |
| | + | (2m) | The ICC has only a minimal effect on power within an SWD (at least in the cross-sectional design) [ |
| Participation | + | (2n) | Clusters may be more willing to participate in an SWD as each cluster will switch to the new (promising) intervention during the study [ |
| Timing of outcome | - | (2o) | The time between steps in an SWD should be long enough to detect a treatment effect [ |
+: positive, −: negative, ~: similar consequences/context dependent, *: discussed in results section, †: newly identified aspect
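Items (2i) to (2m) relate sample size and power to cluster size, ICC and the number of measurement periods. A rough way to explore that relationship for a cross-sectional SWD is the closed-form variance of the treatment effect usually attributed to Hussey and Hughes (2007), which assumes a linear mixed model with a random cluster intercept and fixed period effects. The sketch below rests on those assumptions; the function names, the unit total variance and the example numbers are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def treatment_effect_variance(design, sigma2_cell, tau2):
    """Closed-form Var(theta_hat) for a cross-sectional stepped wedge design.

    design      : list of rows (clusters), each a list of 0/1 per period
    sigma2_cell : variance of a cluster-period mean (residual variance / n)
    tau2        : between-cluster variance of the random intercept
    """
    n_clusters, n_periods = len(design), len(design[0])
    u = sum(sum(row) for row in design)  # intervention cluster-periods
    w = sum(sum(row[j] for row in design) ** 2 for j in range(n_periods))
    v = sum(sum(row) ** 2 for row in design)
    numerator = n_clusters * sigma2_cell * (sigma2_cell + n_periods * tau2)
    denominator = (n_clusters * u - w) * sigma2_cell + (
        u ** 2 + n_clusters * n_periods * u - n_periods * w - n_clusters * v
    ) * tau2
    return numerator / denominator


def power(theta, variance, alpha=0.05):
    """Approximate power of a two-sided Wald test for effect size theta."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(theta) / variance ** 0.5 - z_crit)


def power_for_icc(design, n_per_cell, icc, theta):
    """Power assuming total variance 1, so tau2 = icc and residual = 1 - icc."""
    variance = treatment_effect_variance(design, (1 - icc) / n_per_cell, icc)
    return power(theta, variance)


if __name__ == "__main__":
    # Hypothetical design: 3 clusters, 4 periods, 50 subjects per cluster-period.
    design = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    for icc in (0.01, 0.05, 0.10, 0.20):
        print(icc, round(power_for_icc(design, 50, icc, theta=0.3), 3))
```

Running the loop with different ICC values gives a feel for how sensitive power is to the ICC under these assumptions; it does not replace a formal sample size calculation.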
Comparing the SWD to the parallel group CRT: aspects of study execution
| Aspect | +/−/~ | Issue | Description |
|---|---|---|---|
| Informed consent | ~ | (3a)* | May be difficult to obtain from subjects at the start of the study [ |
| | | | In cross-sectional SWDs the informed consent is in essence similar to that of a parallel group CRT. In cohort SWDs participants will have to understand that the moment of receiving the new intervention is being randomized |
| Study participation | - | (3b)* | An SWD may have increased risk of drop-outs and drop-ins (contamination) [ |
| † Inclusion rate | - | (3c)* | An SWD suffers relatively more from low inclusion rates because adding a cluster or extending the steps during the trial disrupts the symmetry of the design |
| † Study duration | ~ | (3d) | The possible longer study duration of SWDs might require interim analyses to avoid long exposure of clusters of participants to suboptimal care when the new intervention would be clearly inferior/superior to usual care. The statistical analysis aspects of interim analyses in an SWD are, however, still unclear |
| † Number of measurements | - | (3e) | If collecting data on health outcomes or costs is expensive, it may not be feasible to collect health economic evidence at each time point (step) in a cohort (longitudinal) SWD. This is particularly relevant if the number of steps (and hence number of measurements per participant), would be large. Even though a similar parallel group CRT would require more participants it might require fewer measurements in total [ |
| | - | (3f) | Repeated measurements within the SWD may lead to a higher burden on everyone involved in the study. In the cross-sectional setting, this will not be a problem for individual participants, but may still be for research personnel [ |
| Blinding | - | (3g) | Blinding of participants and care providers is often impossible within an SWD; however, this also holds for the parallel group CRT. Hence, blinding of assessors of the outcomes is advised [ |
| Improving intervention | ~ | (3h) | Within the SWD it is possible to improve the intervention during the study, though it is questionable whether it is desirable to do so [ |
+: positive, −: negative, ~: similar consequences/context dependent, *: discussed in results section, †: newly identified aspect
Comparing the SWD to the parallel group CRT: aspects of data analysis and interpretation
| Aspect | +/−/~ | Issue | Description |
|---|---|---|---|
| Effect estimate | - | (4a) | In an SWD, the unidirectional crossover strategy complicates the statistical analysis [ |
| | + | (4b)* | In an SWD the effect measure of interest (e.g. difference in means or relative risk) can be calculated for each cluster, and the (in)consistency in effect estimates across clusters can be examined [ |
| † | + | (4c) | In an SWD, learning and decay effects can be assessed over time: outcomes may improve as experience with the intervention grows, but the intervention may also be well adopted just after implementation and then ‘forgotten’ after some time (e.g. if the intervention consists of new guidelines) |
| † Interim analyses | - | (4d)* | Interim analyses within an SWD are less efficient due to the unequal numbers of measurements under the different treatment arms during the study. For parallel group CRTs these numbers are more comparable during the entire trial period |
| † Number of measurements | + | (4e) | Collecting evidence on outcomes at several time steps may allow assessment of the (changes in) these outcomes during a longer follow-up period in those clusters that crossed over early in the study. This might benefit subsequent statistical and health economic analyses, for example, when extrapolating beyond the trial horizon |
| † Unrelated studies | + | (4f) | Collecting health economic evidence in an SWD might also provide insight into general barriers and facilitators to implementation and into changes in cost-effectiveness when moving from a clinical to a routine care setting. In an SWD more evidence on implementation is collected than in parallel group CRTs, as the process of implementing the new intervention can be observed during the study, for all clusters, as opposed to parallel group CRTs where half of the clusters do not get the intervention during the study, and studying changes in implementation over time is more limited. This additional evidence might be valuable in the design and execution of other studies, for example, studies on other interventions in the same disease area |
+: positive, −: negative, ~: similar consequences/context dependent, *: discussed in results section, †: newly identified aspect
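The imbalance behind item (4d) can be made concrete by counting, at each potential interim look, how many cluster-periods have accrued under control and under intervention. The sketch below assumes equal cluster-period sizes and uses a hypothetical 3-cluster, 4-period design; the function name `cumulative_exposure` is likewise an assumption for illustration.

```python
def cumulative_exposure(design):
    """Return cumulative (control, intervention) cluster-period counts.

    design is a list of rows (clusters), each a list of 0/1 flags per
    period. Entry p of the result describes the data available at an
    interim look taken after period p.
    """
    totals = []
    control = intervention = 0
    for period in range(len(design[0])):
        switched = sum(row[period] for row in design)
        intervention += switched
        control += len(design) - switched
        totals.append((control, intervention))
    return totals


if __name__ == "__main__":
    design = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    print(cumulative_exposure(design))
    # prints [(3, 0), (5, 1), (6, 3), (6, 6)]
```

In this example the two conditions only reach equal numbers of cluster-periods after the final period, whereas a parallel group CRT with equal allocation would stay balanced throughout.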
Key aspects of the stepped wedge design in the HEART study
| Aspect | Issue | Description |
|---|---|---|
| Implementation decisions | (5a)* | Based on the results of previous validation studies it is likely that the HEART score will be cost-effective if applied correctly (including adherence to management recommendations). The SWD has the benefit of demonstrating the value of the HEART score in real practice, and problems in implementation can be observed and documented in each cluster |
| | (5b)* | When a formal decision would be made on nation-wide implementation of the HEART score based on cost-effectiveness estimates from the HEART Impact study, the costs of disinvestment (de-implementation) have to be considered. As the intervention under investigation is the use of a clinical prediction model, disinvestment costs are likely to be very small and not much larger in the SWD than they would have been in a parallel group CRT design |
| Equipoise | (5c) | Earlier validation studies have demonstrated the ability of the HEART score to stratify patients with chest pain according to their risk of having a serious heart condition. However, it is unclear whether actively using the HEART score in practice will indeed be safe and improve health care in terms of health care resources, patient burden and costs |
| Participation | (5d) | The use of risk scores in chest pain patients is recommended in (Dutch) guidelines. The SWD was attractive for hospitals as each hospital would experience using the HEART score during the trial |
| Preparation | (5e) | Inclusion in the HEART study started almost a year later than planned because all hospitals needed to start at the same time. Procedures in 1 hospital were slow, which contributed to the delayed start |
| Informed consent | (5f)* | No informed consent from patients was sought to determine HEART score |
| | | Informed consent was asked from patients to collect additional data |
| | | Timing of consent: during the initial evaluation by the treating physicians at the emergency department |
| Study design | (5g) | A mix of hospitals (with respect to size, city and rural, academic and non-academic) participates in the HEART study leading to differences in population and standard of care between hospitals. The SWD allows for a within-hospital comparison reducing the impact of these differences |
| | (5h) | A mix of hospitals participates in the HEART study leading to noteworthy variation in numbers of included patients per hospital which has not been taken into account in the sample size calculation |
| Blinding | (5i) | The primary outcome is major adverse cardiac events (MACE), which has some subjective elements. There will be an adjudication committee blinded for intervention period for the main endpoints |
| Interim analyses | (5j) | The HEART study has been classified as a low-risk trial. Therefore, no formal interim analyses are planned. A DSMB is monitoring the trial, in particular to give independent advice to participating hospitals about continuing the use of the HEART score at the end of the study only |
| Sample size | (5k) | Inclusion rates have been much lower than expected. The study team considered adding clusters or time points to the study, but decided not to do this because it is unclear how to accommodate such changes properly in the final statistical analysis. Furthermore, there was considerable uncertainty about the assumptions in the initial sample size calculation |
| Method of analysis | (5l) | A generalized linear mixed model (GLMM) analysis is planned to take into account the hierarchical nature of the data |
| | (5m) | No interim economic evaluation has been planned. Negative results in the health economic analysis could, at least in theory, lead to de-implementation of the HEART score. As this process requires time and money, depending on the number of hospitals already switched to HEART, performing a preliminary economic evaluation as part of an interim analysis might have been worthwhile |
| | (5n) | The (in)consistency in effect across clusters (hospitals) will be examined in an exploratory way, for instance whether the effect size varies depending on type or size of hospital |
*: discussed below
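For orientation, a GLMM of the kind mentioned in item (5l) is often written in the following generic form for stepped wedge data. This is a textbook-style sketch under assumed notation, not the model specification of the HEART study: $i$ indexes hospitals, $j$ periods and $k$ patients, $X_{ij}$ is 1 if hospital $i$ uses the intervention in period $j$ and 0 otherwise, and $g$ is a link function (for example the logit for a binary outcome such as MACE).

$$
g\big(\mathrm{E}[Y_{ijk} \mid \alpha_i]\big) = \mu + \beta_j + \theta X_{ij} + \alpha_i, \qquad \alpha_i \sim N(0, \tau^2)
$$

Here $\beta_j$ are fixed period effects that adjust for calendar time, $\theta$ is the intervention effect of interest, and the random hospital intercept $\alpha_i$ accounts for the correlation between patients within the same hospital.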