Tamar Pincus, Clare Miles, Robert Froud, Martin Underwood, Dawn Carnes, Stephanie J C Taylor.
Abstract
BACKGROUND: Current methodological guidelines provide advice about the assessment of sub-group analysis within RCTs, but do not specify explicit criteria for assessment. Our objective was to provide researchers with a set of criteria that will facilitate the grading of evidence for moderators in systematic reviews.
Year: 2011 PMID: 21281501 PMCID: PMC3044921 DOI: 10.1186/1471-2288-11-14
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Methodological criteria and rationale
| Criteria | Rationale |
|---|---|
| Stage 1: | |
| 1) Rationale: | |
| a) Was the analysis | The need for a theoretical basis for choice of measurement to be tested as moderator or mediator. Ideally, the planned analysis is |
| b) Was selection of factors for analysis theory/evidence driven? | |
| 2) Method: | |
| a) Was there an equal distribution of moderators between groups at baseline? | Ideally, a-priori stratification in design (Lipchick et al., 2005, Headache). |
| b) Were moderators measured prior to randomisation? | "...a hypothesized moderator must be measured prior to randomization" (Nicholson et al., 2005, page 516) |
| 3) Power: | |
| Do authors report a power analysis for moderator effect ( | |
| "In the retrospective application of power analysis, as in the prospective one, the researcher must pre-specify the size of a moderator effect that is deemed to be substantively important. That is, the moderator effect size must be determined a-priori. In particular, the observed effect of the moderator variable should never be used to compute statistical power" (extracted directly from Hedges & Pigott, 2004, page 427) | |
| Sufficient power to detect small/moderate effects in moderator analysis has been defined as at least four times that of the main effect (based on the fact that most main effects are in this order of magnitude). | |
| Was sample size adequate for the moderator analysis (at least 4 fold the required sample size for main treatment effect in the lowest sub-group for the moderator factor)? | |
| If not, were there at least 20 people in the smallest sub-group of the moderator? | An inherent problem is that power in RCTs is almost always calculated based on the main effect of treatment. An arbitrary cut-point of at least 10 in the lowest arm of completed treatment has been used by other systematic reviews (Eccleston et al., updated for Cochrane, 2009). We have included this arbitrary criterion to ensure retention of studies that were under-powered in isolation but might still add value to meta-analyses. However, with sub-groups below 20, we considered the study unlikely to be informative. |
| Have authors employed analysis to compensate for insufficient power (e.g. bootstrapping techniques)? | This criterion was included because some researchers attempt such analysis, and its value is debatable. |
| 4) Correction for multiple comparisons: | |
| a) Was the regression significant at P < 0.05, or (if more than three comparisons) corrected or significance adjusted to P < 0.01? | In the absence of a-priori stratification, studies often explore several sub-groups, and the risk of type I error is considerably increased. The adjustment of P values has been used in RCT analysis (Turner, 2007). |
| b) Did the authors explore residual variances of interactions if carrying out multiple two-way interactions? | Multiple two-way interactions |
| 5) Measurement validity & measurement error: Was measurement of baseline and process factors reliable and valid (from published information) in target population? | Measurement error considerably inhibits the power of studies to detect interactions. |
| a) Is there evidence that the measurement error of the instrument is likely to be sufficiently small to detect the differences between sub-groups that are likely to be important? | |
| b) Did the authors comment on measurement validity in reference to construct validity, face validity etc? | "Trafimow (2006) described a concern for the construct validity of measures that is roughly analogous to that raised by measurement unreliability but for which there is currently no means of correction." Gelfand et al., 2009 p169. |
| 6) Analysis: | |
| a) Contains an explicit test of the interaction between moderator and treatment (e.g. regression)? | |
| b) Was there adjustment for other baseline factors? | |
| c) Is there an explicit presentation of the differences in outcome between baseline sub-groups (e.g. standardised mean difference between groups, Cohen's | |
| Stage 2: | |
| 1. Differences between sub-groups should be clinically plausible. | "Selection of characteristics should be motivated by biological and clinical hypotheses, ideally supported by evidence from sources other than the included studies. Subgroup analyses using characteristics that are implausible or clinically irrelevant are not likely to be useful and should be avoided." Section 9.6.5.4 |
| 2. Reporting of sub-group analysis is only justified in cases where the magnitude of the difference is large enough to support different recommendations for different sub-groups. | "If the magnitude of a difference between subgroups will not result in different recommendations for different subgroups, then it may be better to present only the overall analysis results." Section 9.6.6 |
| 3. Within-study comparisons are more reliable than between-study comparisons. | "For patient and intervention characteristics, differences in subgroups that are observed within studies are more reliable than analyses of subsets of studies. If such within-study relationships are replicated across studies then this adds confidence to the findings." Section 9.6.6 |
| 4. At least ten observations should be available for each characteristic explored in sub-group analysis (i.e., ten studies in a meta-analysis). | "It is very unlikely that an investigation of heterogeneity will produce useful findings unless there is a substantial number of studies. It is worth noting the typical advice for undertaking simple regression analyses: that at least ten observations (i.e. ten studies in a meta-analysis) should be available for each characteristic modelled. However, even this will be too few when the covariates are unevenly distributed." Section 9.6.5.1 |
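
The power and multiple-comparison items in Stage 1 are simple decision rules, so they can be sketched as a screening helper. The following is a minimal illustration only: the `ModeratorAnalysis` record, its field names, and the rating labels are hypothetical, and the reading of the "4 fold" rule (the smallest sub-group compared against four times the sample size required for the main effect) is one interpretation of the table's wording, not a definition taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ModeratorAnalysis:
    """Hypothetical record of one reported moderator analysis."""
    smallest_subgroup_n: int       # participants in the smallest moderator sub-group
    main_effect_required_n: int    # sample size required for the main treatment effect
    n_comparisons: int             # number of sub-group comparisons tested
    p_value: float                 # p-value reported for the moderator test
    p_was_corrected: bool = False  # True if the authors already corrected for multiplicity


def sample_size_rating(a: ModeratorAnalysis) -> str:
    """Apply the 4-fold rule first, then the 20-person fallback (labels are illustrative)."""
    if a.smallest_subgroup_n >= 4 * a.main_effect_required_n:
        return "adequate"
    if a.smallest_subgroup_n >= 20:
        return "under-powered but retained"
    return "unlikely to be informative"


def meets_significance(a: ModeratorAnalysis) -> bool:
    """Item 4a: P < 0.05, or P < 0.01 when more than three uncorrected comparisons were made."""
    if a.n_comparisons > 3 and not a.p_was_corrected:
        return a.p_value < 0.01
    return a.p_value < 0.05


# Made-up example: 25 people in the smallest sub-group, five uncorrected comparisons.
example = ModeratorAnalysis(smallest_subgroup_n=25, main_effect_required_n=100,
                            n_comparisons=5, p_value=0.03)
print(sample_size_rating(example), meets_significance(example))
```

Keeping the two checks as separate functions mirrors the table, where power and correction for multiple comparisons are rated as separate items.
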
Median, 30th centile, 70th centile, appropriateness, and disagreement index, by item
| Item | Median | 30th centile | 70th centile | Appropriateness | Disagreement index* |
|---|---|---|---|---|---|
| Stage 1 | |||||
| 1a | 6.0 | 6 | 7 | Appropriate | 0.16 |
| 1b | 6.5 | 6 | 8 | Appropriate | 0.29 |
| 2a | 4.0 | 3 | 5 | Uncertain | 0.85 |
| 2b | 8.0 | 7 | 8 | Appropriate | 0.13 |
| 3a | 5.5 | 5 | 6 | Appropriate | 0.22 |
| 3b | 4.5 | 3 | 6 | Uncertain | 0.97 |
| 4a | 4.5 | 3 | 6 | Uncertain | 0.97 |
| 4b | 5.0 | 4 | 5 | Uncertain | 0.32 |
| 5a | 6.5 | 6 | 8 | Appropriate | 0.29 |
| 5b | 5.0 | 3 | 6 | Uncertain | 0.97 |
| 6a | 8.0 | 8 | 8 | Appropriate | 0.00 |
| 6b | 4.5 | 3 | 6 | Uncertain | 0.97 |
| 6c | 6.5 | 5 | 7 | Appropriate | 0.37 |
| Stage 2 | |||||
| 1 | 6.0 | 4 | 7 | Appropriate | 0.65 |
| 2 | 0.0 | 0 | 5 | Inappropriate | 1.09 |
| 3 | 6.0 | 5 | 6 | Appropriate | 0.29 |
| 4 | 4.0 | 2 | 2 | Uncertain | 0.96 |
*A disagreement index of less than 1.0 indicates no disagreement
Figure 1. The study process.
Figure 2. The distribution of participants' ratings for each of the items (stage 1).
Figure 3. The distribution of participants' ratings for the additional items (stage 2).
Criteria for inclusion in meta-analysis of moderators
| Criteria | Necessary for inclusion in meta-analysis confirming moderator effects | Necessary for inclusion in meta-analysis exploring moderator effects | Criteria for the judgment of | Exceptions |
|---|---|---|---|---|
| Was the analysis | | | Mention of an explicit hypothesis planned in the protocol, stating which sub-groups will be tested for which outcome | Criterion is not fulfilled where the protocol includes a considerably large set of stated hypotheses or vague hypotheses (e.g. 'psychological factors will interact with treatment allocation') |
| Was selection of factors for analysis clinically plausible and either or both: | | | A description of theoretical background, or reference to other published evidence leading to the hypothesis | Is not fulfilled in cases where the meta-analyst considers the theory/evidence to be weak, but this should not form a reason for exclusion. |
| i) Theory based | | | | |
| ii) Evidence based | | | | |
| Were moderators measured prior to randomisation? | | | Specific statement that baseline measurement occurred prior to randomisation | Not applicable for baseline factors that do not change over time, such as gender, or for cluster randomisation. |
| Adequate quality of measurement of baseline factors | | | Published evidence supports good measurement properties of the measures in the target population, according to the meta-analysts' protocol. | Is not fulfilled where there is inadequate variability in the baseline measure. |
| Contains an explicit test of the interaction between moderator and treatment | | | Ideally, report a pooled effect size with 95% confidence intervals. Other acceptable analyses include regression etc. | Not fulfilled when sub-groups are tested separately, or where there is excessive multiple testing. |
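
The last row of the table asks for an explicit interaction test and, ideally, a pooled effect size with 95% confidence intervals. As a hedged sketch of what that could look like, the code below takes the treatment effect in two sub-groups from each study, forms the difference (the interaction estimate) with its standard error, and pools the differences with inverse-variance fixed-effect weights. The fixed-effect model, the assumption that the two sub-group estimates are independent, the function names, and all numbers are assumptions made for illustration; the paper does not prescribe a particular pooling method.

```python
import math
from typing import List, Tuple


def interaction_estimate(effect_a: float, se_a: float,
                         effect_b: float, se_b: float) -> Tuple[float, float]:
    """Difference between two independent sub-group treatment effects and its SE."""
    diff = effect_a - effect_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se


def pool_fixed_effect(estimates: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling; returns (pooled, ci_low, ci_high)."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # 1.96 is the normal-approximation multiplier for a 95% interval.
    return pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se


# Made-up numbers for illustration only: (effect_a, se_a, effect_b, se_b) per study.
studies = [
    (0.50, 0.15, 0.20, 0.15),
    (0.40, 0.20, 0.10, 0.20),
    (0.60, 0.25, 0.35, 0.25),
]
interactions = [interaction_estimate(*s) for s in studies]
pooled, low, high = pool_fixed_effect(interactions)
print(f"pooled interaction = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Testing the difference between sub-groups directly, rather than testing each sub-group separately, is what distinguishes this from the analyses the Exceptions column rules out.
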