Jenni Hislop, Temitope E Adewuyi, Luke D Vale, Kirsten Harrild, Cynthia Fraser, Tara Gurung, Douglas G Altman, Andrew H Briggs, Peter Fayers, Craig R Ramsay, John D Norrie, Ian M Harvey, Brian Buckley, Jonathan A Cook.
Abstract
BACKGROUND: Randomised controlled trials (RCTs) are widely accepted as the preferred study design for evaluating healthcare interventions. When the sample size is determined, a (target) difference is typically specified that the RCT is designed to detect. This provides reassurance that the study will be informative, i.e., should such a difference exist, it is likely to be detected with the required statistical precision. The aim of this review was to identify potential methods for specifying the target difference in an RCT sample size calculation.
Year: 2014 PMID: 24824338 PMCID: PMC4019477 DOI: 10.1371/journal.pmed.1001645
Source DB: PubMed Journal: PLoS Med ISSN: 1549-1277 Impact factor: 11.069
Figure 1. PRISMA flow diagram.
*For a breakdown of studies that used more than one method in combination, please see Table 1. Central, Cochrane Central Register of Controlled Trials; CMR, Cochrane Methodology Register; ERIC, Education Resources Information Center; SCI, Science Citation Index.
Use of multiple methods.
| Methods Used in Combination | Number of Studies | ||||||
| Anchor | Distribution | Health Economic | Opinion-Seeking | Pilot Study | Review of Evidence Base | Standardised Effect Size | |
| √ | √ | 70 | |||||
| √ | √ | √ | 63 | ||||
| √ | √ | 46 | |||||
| √ | √ | 13 | |||||
| √ | √ | 8 | |||||
| √ | √ | 3 | |||||
| √ | √ | √ | 2 | ||||
| √ | √ | √ | 2 | ||||
| √ | √ | 2 | |||||
| √ | √ | 1 | |||||
| √ | √ | √ | √ | 1 | |||
| √ | √ | √ | √ | 1 | |||
| √ | √ | √ | 1 | ||||
| √ | √ | 1 | |||||
| √ | √ | 1 | |||||
| √ | √ | 1 | |||||
Main variations in implementation of the methods.
| Anchor | Distribution | Health Economic | Opinion-Seeking | Pilot Study | Review of the Evidence Base | Standardised Effect Size |
RCI, reliable change index; VAS, visual analogue scale; WTP, willingness to pay per unit of effectiveness.
Assessment of the value of the methods.
| Criteria | Anchor | Distribution | Health Economic | Opinion-Seeking | Pilot Study | Review of the Evidence Base | Standardised Effect Size |
| Does the method seem a sensible approach? (face validity) | Yes | No | Yes | Yes | Yes | Yes | Yes |
| Does the method allow the overall benefit/harm profile of a treatment comparison to be addressed? (content validity) | As it is based upon a single outcome, the scope is limited; multiple perspectives can be accommodated | Focuses upon a single outcome and does not address directly either a realistic or an important difference | Potentially the most comprehensive approach, though it can be complex, data-hungry, and time-intensive; a value judgement is needed as to whose costs and benefits are important | Yes, though conditional upon a perspective | Yes | Yes | No |
| Has the method been shown to be consistent with an independent standard? (criterion validity) | Yes | No | No, usage so far has been in hypothetical retrospective examples | No | No | No | No, with an exception for some quality of life outcomes |
| Has the method been shown to be consistent with expected drivers (e.g., is the specified difference greater when there is a larger risk of harm)? (construct validity) | Yes | Findings have been conflicting | No, usage so far has been in hypothetical retrospective examples | No | Yes | Yes | No |
| Has the method been reported clearly enough to be reproducible (i.e., reviewers can easily agree upon reading what the method was and how it was applied)? | Yes | Yes | Yes, although the complexity of some of the approaches may require extensive reporting | Yes | Yes | Yes | Yes |
| Are there any important variations in implementation? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Has the method's repeatability been assessed (consistency of estimate when repeated—if applicable)? | Yes | Yes | No, although in principle for a given model structure and data inputs, the approach is repeatable | No | No | Yes | Not applicable |
| Is uncertainty of the estimated difference addressed by the method (implicitly or explicitly)? | Yes | Yes | Yes, using the more complex approaches | Yes, when adopting a synthesis of opinion | Yes | Yes, where the result from an appropriate statistical analysis is used | No |
| Has the method been shown to be sensitive to different outcomes/populations? | Yes | Yes | No | Yes, to a limited extent | Yes | Yes | No; universal values are routinely applied irrespective of the outcome and population |
| Is the method suited to any trial design? | Yes | Yes | Yes | Yes | Yes, though it is more likely to be used for Phase 3 or definitive trials | Yes, though it is more likely to be used for Phase 3 or definitive trials | Yes |
| Can the method be used for a variety of outcome measures? | Continuous/ordinal outcome only | Continuous/ordinal outcome only | Yes | Yes | Yes | Yes | Yes, though it is widely used only for continuous outcomes |
| Is the method acceptable to patients, clinicians, and trialists? | Yes | Uncertain | Uncertain | Yes | Yes | Yes | Uncertain, though widely used |
| Is it straightforward to use? | Yes | Yes | No, except for simpler, more naive approaches | Yes | Yes, though it requires a study to be carried out | Yes, though it requires a review to be carried out | Yes |
| Has the method been used in an RCT setting? | Yes | Yes | Published examples are retrospective | Yes | Yes | Yes | Yes |
Usage of methods—examples and key implementation points.
| Method | Example | Key Points |
| Anchor | Neuropathy Total Symptom Score-6 was measured at baseline and 1 y in patients with diabetes mellitus and diabetic peripheral neuropathy. The clinical global impression anchor (a seven-point scale ranging from marked improvement to marked worsening, which assesses the change in health status between baseline and 1 y) was collected by a health professional. | • Suitable for continuous (or ordinal) outcomes. • Anchor implementation is critical, e.g., the perspective and anchor adopted. • Particularly suited to quality of life measures. • The magnitude of the difference can be sensitive to the population group (e.g., ceiling/floor and disease severity effects may exist). • Use of the most common anchor approach implies that a within-person (important) difference can be applied, though a between-person approach is also possible. |
| Distribution | The Norwegian Fear Avoidance Beliefs Questionnaire (FABQ) was completed by 28 patients with chronic lower back pain. Using a measurement error approach, the maximum difference that could be attributed to spurious variation for the FABQ-Work and FABQ-Physical Activity scales was calculated as 12 and 9 units, respectively. These values can be considered as a lower bound of an important difference for the corresponding scale and can be used with an appropriate SD value. | • Suitable for continuous (or possibly ordinal) outcomes. • Use of the distribution method (i.e., measurement error approach) is of limited merit because of its weak justification of an “important” difference. • A simple range or levels approach should be a last resort if no more informative methods can be used, and only when the outcome has clear meaning. |
| Health Economic | For women with tubal damage, IVF or tubal surgery could be used to treat infertility. The cost per pregnancy was calculated for both treatments. Based upon existing data, surgical treatment is successful in 12% of cases. Given this estimate, the required proportion of successful treatments for the more expensive IVF treatment was calculated as 27%, and a difference of 15 percentage points (27% versus 12%) was considered (economically) important. | • Allows a comprehensive approach to the value of an RCT; in particular, the costs of the intervention and its comparator and of research can be considered in conjunction with possible benefits and consequences of decision-making. The flexible modelling framework allows any type of outcome to be incorporated. • The perspective adopted is critical: the viewpoint and values that are used to determine the scope of costs and benefits incorporated into the model structure. • Uncertainty around inputs can be substantial, and extensive sensitivity analyses will likely be needed. Some inputs (e.g., time horizon) will be particularly challenging to specify, as will appropriately representing the statistical relationship between multiple parameters. These could also be based on empirical data and/or expert opinion. • This can be a resource-intensive and complex approach to determining the sample size. • Unlikely to be accepted as the sole basis for study design at present despite intuitive appeal. Patients and clinicians may be resistant to the formal inclusion of cost into the design and thereby the primary interpretation of studies. Expressing the difference in a conventional way is likely to be necessary, as it is more intuitive to stakeholders and also furthers the science of interventions. It could provide additional justification for conducting a large and expensive trial (e.g., when there is a small effect and/or events are rare). |
| Opinion-Seeking | Six experts were asked to recommend an important difference for the Doyle Index to be used in a hypothetical trial of two antirheumatic drugs with stated inclusion/exclusion criteria for patients with rheumatoid arthritis. A Delphi consensus-reaching approach with three rounds was implemented by mail. The median (range) estimate for the third round was 5.5 (5.7), and 5.5 could be viewed as an important difference and used with an appropriate SD value. | • Allows for varying degrees of complexity of the scenario (e.g., consideration of related effects or impact on practice) and any outcome type (binary, continuous, or survival). • The perspective is critical: whose opinions are being sought. • A realistic and/or important target difference can be sought. • A target difference that takes into account other outcomes and/or consequences (e.g., a target difference that would lead to a health professional changing practice) or focuses exclusively on a single outcome can be sought. |
| Pilot Study | A pilot trial compared cognitive behavioural therapy with physiotherapy in patients with acute lower back pain. The SD of Roland–Morris scores was calculated as 5.7, which was used in combination with an estimate of an important difference of 4 from a previous study. | • There is a need to assess the relevance of the pilot study to the design of a new RCT. Some down-weighting (whether formally or informally) may be needed according to the relevance of the study and methodology used. For example, a Phase 2 study should be used to directly specify a (realistic) target difference for a Phase 3 study only if the population and outcome measurement are judged to be sufficiently similar. • Helpful for estimating outcome components such as variability of a continuous outcome (or control group rate for a binary outcome), although the estimation of the target difference is typically imprecise because of a small sample size. • This approach can be used in conjunction with another method (e.g., using an opinion-seeking method to determine an important difference) to allow full specification of the target difference. |
| Review of the Evidence Base | A systematic search of an online medical database identified no RCTs that had compared acupuncture to a waiting list control for patients with breast cancer and assessed fatigue. Two further searches identified relevant studies from which an estimate of the within-group effects upon fatigue for acupuncture and waiting list control treatments could be calculated. Best, worst, and average effects were calculated for the two treatments, with various possible between-treatment-group effects calculated. Estimates for the between-treatment-group effects varied from 0.19 to 1.02 (Cohen's d). | • It should be based on a systematic search of available evidence. • It can be used for any outcome type (including continuous, binary, ordinal, and time-to-event outcomes). • A choice must be made whether an important and/or a realistic difference is sought. • A number of issues need to be considered when assessing an observed difference: ○ Is the evidence available directly relevant to the research question at hand (PICOT assessment)? ○ Is the existing evidence of a robust nature? Are there multiple studies available, and were they conducted in a methodologically robust manner? What was the risk of bias? ○ Is the outcome of interest fully reported? Individual patient data are seldom available, and reporting of outcomes is often selective. • Determination of a realistic (target) difference can, and when possible should, be based on a systematic review and associated meta-analysis of RCTs, although imprecision in the estimate needs to be considered. • The use of prior evidence can be formalised through simulation of the impact of a new study on the meta-analysis result, although this implies that a particular analysis will be conducted and the new study will be analysed alongside the current evidence. |
| Standardised Effect Size | Fifty-three nursing home patients received a specialist geriatric medicine consultation. The Goal Attainment Scale was measured post-consultation as part of an observational study. The mean (SD) score was 45.7 (6.9). Using the post-consultation SD and Cohen's criteria, the small, medium, and large effect values were calculated as 1.4, 3.5, and 5.5, respectively. | • The SES for a continuous outcome should be calculated as the difference between groups divided by the appropriate SD. For a parallel group trial, the SD will typically be an estimate of the (common) final group SD, which corresponds to an unadjusted analysis of the final scores; the SD of the within-person change score could be used when an analysis of change scores is planned. The benefit of removing within-person variance, such as through an analysis that adjusts for the baseline value, can also be incorporated when the correlation can be estimated. • An SES from a before-and-after treatment study is unlikely to be representative of that achievable in a treatment study, particularly when two active treatments are compared. • Use of Cohen's criteria of interpretation is difficult to justify, although widespread. Modifications to this effect size scale have been suggested. For example, pragmatic trials are generally accepted to have smaller effects than more efficacy-focused studies. The SES may differ in magnitude between clinical areas and outcomes, and when the standard treatment is very effective. • Changes in the variability (e.g., population spectrum) for a continuous outcome can result in a different standardised effect even though the mean difference remains the same. It is important that an estimate of the variability is also specified and that the sample is similar to the anticipated RCT population. • For a binary outcome, the target difference (whether a relative or an absolute difference) should be considered in conjunction with the control group event proportion. • It is most appropriate as a fallback option, if other more context-relevant methods for specifying the target difference cannot be used. |
IVF, in vitro fertilisation.
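
The arithmetic behind two of the table's examples can be sketched in a few lines of Python. The first function converts Cohen's conventional standardised effect sizes (0.2, 0.5, and 0.8) into raw-score differences for a given SD, as in the Goal Attainment Scale example (SD 6.9). The second turns a target difference and SD, such as the pilot study values (difference 4, SD 5.7), into an approximate sample size per group. This is a minimal sketch and not the method of any cited study: the function names are hypothetical, and the two-sided 5% significance level, the 90% power, the equal allocation, and the normal-approximation formula are all assumptions that do not come from the source.

```python
import math
from statistics import NormalDist

# Cohen's conventional standardised effect sizes (small, medium, large).
COHEN_SES = {"small": 0.2, "medium": 0.5, "large": 0.8}


def ses_to_raw_difference(sd, ses=COHEN_SES):
    """Convert standardised effect sizes to raw-score differences (SES x SD)."""
    return {label: d * sd for label, d in ses.items()}


def n_per_group(target_difference, sd, alpha=0.05, power=0.90):
    """Approximate per-group sample size for a two-arm parallel trial.

    Assumes a continuous outcome, a common SD, equal allocation, a two-sided
    test, and the normal approximation
        n = 2 * (z_{1-alpha/2} + z_{power})^2 / SES^2,
    where SES = target_difference / sd. The alpha and power defaults are
    illustrative assumptions, not values taken from the source.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    ses = target_difference / sd
    return math.ceil(2 * (z_alpha + z_power) ** 2 / ses ** 2)


if __name__ == "__main__":
    # Standardised Effect Size row: post-consultation SD of 6.9.
    for label, diff in ses_to_raw_difference(6.9).items():
        print(f"{label}: {diff:.2f}")

    # Pilot Study row: important difference of 4, SD of 5.7.
    print("n per group:", n_per_group(target_difference=4, sd=5.7))
```

Under these assumptions the raw differences are 1.38, 3.45, and 5.52, in line with the reported 1.4, 3.5, and 5.5, and the pilot study values give roughly 43 participants per group at 90% power (32 at 80% power). An exact calculation based on the t distribution would give slightly larger numbers.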