Abstract
The reporting of evaluation outcomes can be a point of contention between evaluators and policy-makers when a given reform fails to fulfil its promises. Whereas evaluators are required to report outcomes in full, policy-makers have a vested interest in framing these outcomes in a positive light, especially when they have previously expressed a commitment to the reform. The current evidence base is limited to a survey of policy evaluators, a study on reporting bias in education research and several studies investigating the influence of industry sponsorship on the reporting of clinical trials. The objective of this study was twofold. Firstly, it aimed to assess the risk of outcome reporting bias (ORB or 'spin') in pilot evaluation reports, using seven indicators developed by clinicians. Secondly, it sought to examine how the government's commitment to a given reform may affect the level of ORB found in the corresponding evaluation report. To answer these questions, 13 evaluation reports were content-analysed, all of which found a non-significant effect of the intervention on its stated primary outcome. These reports were systematically selected from a dataset of 233 pilot and experimental evaluations spanning three policy areas and 13 years of government-commissioned research in the UK. The results show that the risk of ORB is real: all studies reviewed here resorted to at least one of the presentational strategies associated with a risk of spin. This study also found a small, negative association between the seniority of the reform's champion and the risk of ORB in the evaluation of that reform. The publication of protocols and the use of reporting guidelines are recommended.
Year: 2016 PMID: 27690131 PMCID: PMC5045216 DOI: 10.1371/journal.pone.0163702
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flow chart of the process of identifying and selecting evaluation reports.
References of the studies included in the corpus.
| Code | Evaluation | Reference |
|---|---|---|
| DWP-1 | Pathways to Work | [ |
| DWP-2 | Job Retention and Rehabilitation Pilot | [ |
| DWP-3 | Jobseeker’s Allowance Skills Conditionality Pilot | [ |
| DWP-4 | Jobseekers Allowance Intervention Pilots | [ |
| DWP-5 | ONE Pathfinder | [ |
| DWP-6 | StepUP Pilot | [ |
| DFE-1 | Early Education Pilot for Two-Year-Old Children | [ |
| DFE-2 | The Care Placements Evaluation (CaPE) | [ |
| DFE-3 | Every Child a Writer | [ |
| DFE-4 | Empowering Young People Pilots (EYPP) | [ |
| HOME-1 | Alcohol Arrest Referral Pilot Schemes (Phase 2) | [ |
| HOME-2 | Three restorative justice pilots: JRC, REMEDI and CONNECT | [ |
| HOME-3 | Judicial mediation in Employment Tribunals | [ |
| DWP-1-TS | Pathways to Work | Unpublished |
| DFE-1-TS | Early Education Pilot for Two-Year-Old Children | Unpublished |
Fig 2. Correspondence diagram: outcomes as planned in technical specifications vs. outcomes as reported in final reports (for studies DWP-1 and DFE-1).
Fig 3. Example of complete reporting (DWP-1, Table 5.1, p.49).
Fig 4. Example of incomplete reporting (HOME-1, Table 7, p.17).
Quotes from the corpus of studies.
| Number | Quote | Reference |
|---|---|---|
| 1 | “A key requirement underpinning sampling is the need to include a discussion on the capability of analysing subgroups, and any implications for overall samples of the need to estimate impacts of separate components. We would welcome suggestions on types of subgroup analyses” | DWP-1-TS, p.17 |
| 2 | “[Tenderers] must also demonstrate a commitment to meet deadlines and yet be sufficiently flexible, should the programme of work require amending” | DWP-1-TS, p.26 |
| 3 | “The P-value suggests that the impact is statistically significant since there is only a nine per cent probability of finding an effect of this size by chance” | DWP-1, p.48 |
| 4 | “By convention, P-values of five per cent or less are regarded as indicating statistical significance. However, this is essentially arbitrary and ignores the continuous nature of P-values. The approach taken in this report is to use the conventional five per cent P-values for the results based on the administrative data but to use ten per cent P-values for the results based on the survey data in view of the smaller sample size available for these estimates” | DWP-1, p.48 |
| 5 | “The small sample size of those in work and with earnings information at the time of the outcome interview reduced the likelihood of detecting an impact on earnings. No statistically significant impact of Pathways on monthly net earnings about a year and a half after the initial incapacity benefits enquiry was found (Table 5.2) (…). In view of the employment effect of Pathways, one would expect a positive impact on earnings” | DWP-1, p.2 |
| 6 | “The finding is clear-cut: there is no evidence that, on average, the pilot improved the non-verbal reasoning of children overall” | DFE-1, p.99 |
| 7 | “The choice of variables from which to create sub-groups is somewhat arbitrary. The final list is based on a selection of possible variables for which: (i) the sub-groups have large enough sample sizes for at least moderately large impacts to be detected; (ii) there is some expectation that impacts may have been different in at least some of the sub-groups”. | DWP-2, p.49 |
| 8 | “The quantitative analysis used administrative data to provide details on the implementation of the pilot and whether it could be used to provide valid estimates of the impact of mandation” | DWP-3, p.1 |
| 9 | “Overall, the results are encouraging in that they suggest Pathways continues to have a positive impact on employment and, furthermore, that this impact may be sustained” | DWP-1, p.4 |
| 10 | “This report has shown no evidence that offering Job Retention and Rehabilitation Pilot interventions to those off work sick improved their chances of returning to work” | DWP-2, p.129 |
| 11 | “Tenderers’ suggestions for evaluating net impact needs to be of the highest quality, and this will be looked at specifically in addition to a more broad requirement of methodological expertise” | DWP-1-TS, p.27 |
| 12 | “The contractor will be expected to work closely with officials of the Department throughout the research, keeping them informed of progress and involving them in key decisions. Officials in policy and analytical branches in DWP and DH must have the opportunity to comment on and approve topic guides and questionnaires, formats for analysis and draft reports” | DWP-1-TS, p.22-23 |
| 13 | “This will be a high-profile evaluation and to get full value from it, timely and high quality reporting is essential. To ensure full value of the evaluation tenderers should consider ways in which emerging findings from studies can most appropriately be fed back to policy officials in order to inform further policy development. For example in advance of the production of draft reports, contractors are likely to be asked to present headline findings to core policy officials and analysts” | DWP-1-TS, p.24 |
| 14 | “It is the expectation that the key outputs from the study will be in the public domain. The Department will aim to publish key outputs within a reasonable period of time following receipt of an agreed final report. The publication of any research articles or other publications based on information collected for this study will be subject to approval from the [DFE]. However, this will not be unreasonably withheld” | DFE-1-TS, p.4 |
Spuriousness of subgroup analyses, based on Sun et al. (2010).
| Criterion | DFE-1 | DFE-2 | DFE-3 | DFE-4 | DWP-1 | DWP-2 | DWP-3 | DWP-4 | DWP-5 | DWP-6 | HOME-1 | HOME-2 | HOME-3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of subgroups | 4 | 5 | 5 | 7 | 4 | 10 | 9 | 4 | 4 | 10 | 4 | 6 | 0 |
| A. Number of subgroup variables not measured at baseline* | 0/4 | 1/5 | 0/5 | 0/7 | 0/4 | 0/10 | 0/9 | 0/4 | 0/4 | 0/10 | 0/4 | 0/6 | 0 |
| B. Number of analyses suggested by comparisons between studies ( | 0/4 | 0/5 | 0/5 | 0/7 | 0/4 | 0/10 | 0/9 | 0/4 | 0/4 | 0/10 | 0/4 | 0/6 | 0 |
| C. Number of subgroup analyses not based on interaction | 0/4 | 0/5 | 0/5 | 7/7 | 0/4 | 0/10 | 9/9 | 4/4 | 4/4 | 10/10 | 0/4 | 0/6 | 0 |
| D. No theoretical justification | 4/4 | 2/5 | 5/5 | 7/7 | 4/4 | 10/10 | 9/9 | 4/4 | 4/4 | 10/10 | 4/4 | 2/6 | 0 |
| E. Number of analyses for which the direction of the SG effect was not specified | 4/4 | 3/5 | 5/5 | 7/7 | 4/4 | 10/10 | 9/9 | 4/4 | 4/4 | 10/10 | 4/4 | 3/6 | 0 |
| Average proportion (standardized)** | 0.4 | 0.2 | 0.4 | 0.6 | 0.4 | 0.4 | 0.6 | 0.6 | 0.6 | 0.6 | 0.4 | 0.2 | 0 |
| Overall risk of spuriousness | Medium | Low | Medium | High | Medium | Medium | High | High | High | High | Medium | Low | Nil |
* Row A gives the proportion of subgroup analyses not based on data collected or known at baseline. For example, all subgroup analyses in study DFE-4 (0/7) were based on data collected or known at baseline.
** This is the sum of all proportions for criteria A to E, divided by the number of criteria. For example, the score for study DFE-1 is (0+0+0+1+1)/5 = 0.4.
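The averaging rule in the second footnote can be sketched in Python. This is a minimal illustration, not the author's procedure: the function names are hypothetical, a study with no subgroup analyses is assumed to score 0, and the cut-offs in `risk_band` are an assumption inferred only from the scores and labels in the table (0 = Nil, 0.2 = Low, 0.4 = Medium, 0.6 = High), since the excerpt does not state them. Exact fractions are used so that rounding to one decimal place, as in the table, is not disturbed by floating-point error.

```python
from fractions import Fraction

# Criteria A to E of the spuriousness table (adapted from Sun et al., 2010).
CRITERIA = ("A", "B", "C", "D", "E")


def spuriousness_score(counts: dict[str, int], n_subgroups: int) -> float:
    """Average over criteria A to E of the proportion count / n_subgroups."""
    if n_subgroups == 0:
        # Assumption: a study with no subgroup analyses scores 0 (risk "Nil").
        return 0.0
    proportions = [Fraction(counts[c], n_subgroups) for c in CRITERIA]
    return float(sum(proportions) / len(CRITERIA))


def risk_band(score: float) -> str:
    """Hypothetical cut-offs, chosen only so that the table's labels are reproduced."""
    if score == 0:
        return "Nil"
    if score < 0.3:
        return "Low"
    if score < 0.5:
        return "Medium"
    return "High"


# Counts copied from the table: (number of subgroups, {criterion: count}).
studies = {
    "DFE-1": (4, {"A": 0, "B": 0, "C": 0, "D": 4, "E": 4}),
    "DFE-2": (5, {"A": 1, "B": 0, "C": 0, "D": 2, "E": 3}),
    "DFE-4": (7, {"A": 0, "B": 0, "C": 7, "D": 7, "E": 7}),
}
for study, (n, counts) in studies.items():
    score = spuriousness_score(counts, n)
    print(study, round(score, 1), risk_band(score))
# DFE-1 0.4 Medium
# DFE-2 0.2 Low
# DFE-4 0.6 High
```

DFE-2 illustrates why rounding matters: its unrounded score is 6/25 = 0.24, which the table reports as 0.2.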
Overall risk of spin per study and per type of spin.
| Type of spin | DFE-1 | DFE-2 | DFE-3 | DFE-4 | DWP-1 | DWP-2 | DWP-3 | DWP-4 | DWP-5 | DWP-6 | HOME-1 | HOME-2 | HOME-3 | TOTAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A. Missing outcome indicators | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| B. Incomplete reporting | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 5 |
| C. Interpretative bias | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 5 |
| D. Within-group comparisons | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| E. Subgroup analyses | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 12 |
| F. Upgraded/downgraded outcomes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 3 |
| G. Conclusion bias | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 7 |
| Total spin score* | 2 | 2 | 2 | 4 | 3 | 1 | 3 | 3 | 2 | 3 | 2 | 4 | 3 | - |
| Seniority of the pilot's announcement** | 4 | 1 | 3 | 2 | 4 | 2 | 3 | 1 | 5 | 3 | 3 | 1 | 1 | - |

Correlation between the total spin score and the seniority of the announcement: r = -0.31
0 = This type of spin was not found in the study
1 = This type of spin was found
* Sum of criteria B to G (missing outcome indicators could not be recorded for all studies)
** 1 = The pilot was not announced; 2 = the pilot was announced by a junior minister; 3 = the pilot was announced by a senior minister; 4 = the pilot was announced by the Chancellor of the Exchequer; 5 = the pilot was announced by the Prime Minister.
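The per-study totals (sum of criteria B to G), the TOTAL column and the reported coefficient can be checked with a short Python sketch. It assumes that r is Pearson's product-moment coefficient, which the excerpt does not state; under that assumption, the table's values reproduce r = -0.31. Variable and function names are illustrative only.

```python
from math import sqrt

# Rows B to G of the spin table, one value per study in column order
# (row A, missing outcome indicators, could not be recorded for all studies).
spin = {
    "B": [0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
    "C": [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1],
    "D": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "E": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "F": [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    "G": [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0],
}
# Seniority codes (1 = not announced, 5 = announced by the Prime Minister).
seniority = [4, 1, 3, 2, 4, 2, 3, 1, 5, 3, 3, 1, 1]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


per_type = [sum(row) for row in spin.values()]         # TOTAL column
per_study = [sum(col) for col in zip(*spin.values())]  # total spin score per study
print(per_type)   # [5, 5, 2, 12, 3, 7]
print(per_study)  # [2, 2, 2, 4, 3, 1, 3, 3, 2, 3, 2, 4, 3]
print(round(pearson_r(per_study, seniority), 2))  # -0.31
```

On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient and could replace the hand-written function.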