
Assessing validity of observational intervention studies - the Benchmarking Controlled Trials.

Abstract

BACKGROUND: Benchmarking Controlled Trial (BCT) is a concept that covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations. AIMS: To create and pilot-test a checklist for appraising the methodological validity of a BCT.
METHODS: The checklist was created by extracting the most essential elements from the comprehensive set of criteria in the previous paper on BCTs. Checklists and scientific papers on observational studies and respective systematic reviews were also used. Ten BCTs published in the Lancet and in the New England Journal of Medicine were used to assess the feasibility of the checklist.
RESULTS: The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting and reporting phases of the studies.
CONCLUSIONS: The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies. However, the piloted checklist should be validated in further studies.

Key messages

Benchmarking Controlled Trial (BCT) is a concept that covers all observational studies aiming to assess the impact of interventions or health care system features on patients and populations.
This paper presents a checklist for appraising the methodological validity of BCTs and pilot-tests the checklist with ten BCTs published in leading medical journals.
The appraised studies seem to have several methodological limitations, some of which could be avoided in the planning, conducting and reporting phases of the studies.
The checklist can be used for planning, conducting, reporting, reviewing, and critical reading of observational intervention studies.

Keywords: Checklist; benchmarking controlled trial; cost-effectiveness; effectiveness; inequality; real-effectiveness medicine; validity

Year: 2016 PMID: 27238631 PMCID: PMC5152539 DOI: 10.1080/07853890.2016.1186830

Source DB: PubMed Journal: Ann Med ISSN: 0785-3890 Impact factor: 4.709

Experimental studies, i.e. randomized controlled trials (RCTs), provide the least biased information on the efficacy of medical interventions (1). However, RCTs mostly assess the effectiveness of specific interventions in ideal settings, and their ability to assess the effectiveness of clinical pathways or of interventions targeting health care system features is limited. Thus there is an obvious need for valid observational data on actual performance in routine settings (2). A recent paper presented the novel concept of the Benchmarking Controlled Trial (BCT) and a comprehensive set of methodological criteria to be considered when appraising evidence from observational intervention studies (3). BCTs can be used to assess the impacts of clinical interventions and of health care system features. The aim of this paper is to create a simple checklist for assessing the methodological validity of a BCT and to pilot-test it with recent BCTs published in the Lancet and in the New England Journal of Medicine.

Methods

The original comprehensive checklist of methodological validity issues for BCTs was based on the author's previous work with RCTs, systematic reviews and observational studies (1,4–6). Checklists and scientific papers on observational studies and respective systematic reviews were also used (7,8). The current checklist was created by extracting the most essential elements from the comprehensive set of criteria in the previous paper on BCTs (open access: http://www.tandfonline.com/doi/full/07853890.2011.586901./07853890.2015.1027255) (3). The ten BCTs analyzed in the original BCT paper were used to assess the feasibility of the new checklist. The author rechecked the appraisal and corrected errors.

Results

Table 1 presents the ten main methodological issues and describes how to assess whether each of them poses a risk of bias. Issues 1 and 2 evaluate whether statistical power calculations were made and whether the selection of patients before they became eligible for the study is described. Issues 3 and 4 concern the documentation of baseline characteristics and the comparability of the index and reference groups. Issue 5 relates to the documentation of processes, and issues 6 and 7 to outcomes and the proportion of drop-outs. Issue 8 covers the documentation of outcome-relevant health care system features, and issue 9 the essential elements for producing high-quality services in ordinary health care, particularly staff competence. Issue 10 evaluates whether the statistical analyses are appropriate.
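Item 1 asks whether a study describes its power calculation and how the study size was arrived at. As an illustration that is not taken from the paper, the Python sketch below applies the common normal-approximation formula for comparing two independent proportions, such as 30-day mortality in an index and a control population. The function names, the assumption of equal group sizes, and the 10% vs 7% mortality figures are hypothetical; `achieved_power` corresponds to the post-analysis power calculation that Table 1 also accepts.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients needed per group to detect p1 vs p2 with a two-sided test
    (normal approximation for two independent proportions, equal group sizes)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    a = math.sqrt(2 * p_bar * (1 - p_bar))        # SD term under the null hypothesis
    b = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # SD term under the alternative
    return math.ceil((z_alpha * a + z_beta * b) ** 2 / (p1 - p2) ** 2)


def achieved_power(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Post-analysis power for n patients per group (same approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    a = math.sqrt(2 * p_bar * (1 - p_bar))
    b = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return z.cdf((abs(p1 - p2) * math.sqrt(n) - z_alpha * a) / b)


if __name__ == "__main__":
    n = n_per_group(0.10, 0.07)  # hypothetical mortality: 10% index vs 7% control
    print(n, round(achieved_power(0.10, 0.07, n), 3))
```

With these assumed inputs the sketch asks for just over 1,350 patients per group; halving the expected difference roughly quadruples that number, which is why a stated rationale for the study size matters when appraising item 1.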

Table 1.

Criteria for the judgment of acceptable validity (scored ‘Yes’*) for the sources of risk of bias in Benchmarking Controlled Trials (3).

1	Statistical power calculated. Score Yes if the power calculations and the rationale for how the study size was arrived at are described; a post-analysis power calculation is also accepted.
2	Selection of patients described. Score Yes if the patients' clinical path before they became eligible for the study is clearly described, or if the patient population covered the whole catchment area.
3	Valid and sufficient documentation of baseline characteristics in both index and control populations. ^a Score Yes if demographic and socio-economic factors, clinically important data relevant to the particular disorder/disease (e.g. severity), general health/risk status, comorbid conditions, and behavioural and environmental factors (when relevant) were sufficiently documented. (N.B. What constitutes ‘sufficient’ should be appraised in relation to the study context, i.e. whether or not the risk of bias is increased.)
4	Baseline comparability acceptable. ^a Score Yes if the groups are sufficiently similar at baseline regarding demographic and socio-economic factors, duration and severity of the main indication, comorbid conditions, and value of the main outcome measure(s). (N.B. What constitutes ‘sufficient’ should be appraised in relation to the study context, i.e. whether or not the risk of bias is increased.) If baseline documentation is insufficient, score ‘Unclear’.
5	Valid and sufficient documentation of the degree of adherence to the main intervention(s), and of other processes, in both index and control populations. ^a Score Yes if the factors relevant to the particular study question are sufficiently reported, such as intensity, duration, number and frequency of health services, and if there were no confounding interventions or they were similar between the index and control groups. (N.B. What constitutes ‘sufficient’ should be appraised in relation to the study context, i.e. whether or not the risk of bias is increased.)
6	Valid and sufficient documentation of outcomes in both index and control populations, including identical timing of outcome assessment. ^a Score Yes if the validity of the outcomes has been documented for both index and control populations and the follow-up time points are similar; when relevant, if outcomes are also assessed among disadvantaged patients. (N.B. What constitutes ‘sufficient’ should be appraised in relation to the study context, i.e. whether or not the risk of bias is increased.)
7	Drop-out rate acceptable. The number of included participants who did not complete the observation period or were not included in the analysis must be described and reasons given. Score Yes if the percentage of withdrawals and drop-outs does not exceed 10% and does not lead to substantial bias. (N.B. The percentage is arbitrary, not supported by literature, and should be appraised in relation to the study context.)
8	System related features sufficiently documented in both the index and control health care providers. Score Yes if the relevant system related factors (financing of the care system, organization of the care system, available resources, reimbursement and incentives, regulations) are sufficiently documented and adjusted for in the statistical analyses. If system related features are not relevant in the study context, score ‘Yes’. (N.B. What constitutes ‘sufficient’ should be appraised in relation to the study context, i.e. whether or not the risk of bias is increased.)
9	Staff competence, use of up-to-date evidence, quality and benchmarking activities sufficiently documented in both the index and control health care providers. Score Yes if differences in staff competence, use of up-to-date evidence, and quality and benchmarking activities (Real Effectiveness Medicine framework (2)) are sufficiently documented between the index and control groups. If these items are not relevant, score ‘Yes’. (N.B. What constitutes ‘sufficient’ should be appraised in relation to the study context, i.e. whether or not the risk of bias is increased.)
10	Statistical analyses appropriate. Score Yes if all appropriate statistical methods have been used to increase the validity of the comparisons (e.g. instrumental variables (when feasible), propensity score matching, baseline adjustment between observed groups, multilevel modelling or survival modelling).
Comments	Any further information on potential biases, including extrinsic biases, e.g. conflicts of interest of the researchers.

*Each item may also be scored ‘Unclear’ or ‘No’.

^a In studies comparing cohorts over time (before-after comparisons), overall changes in patient characteristics, treatment practices, and outcomes in health care over time should also be documented in order to score Yes.
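Table 1 implies a simple scoring rule: an item earns one validity point only when it is scored ‘Yes’, while ‘No’, ‘Unclear’ and ‘NA’ (not applicable) earn none, which gives the 0 to 10 totals used in Table 2. The minimal Python sketch below encodes that rule; the `Score` enum, the abbreviated item labels and the helper names are illustrative assumptions. `dropout_within_threshold` covers only the arbitrary 10% cut-off of item 7, not the accompanying judgement of whether drop-outs lead to substantial bias.

```python
from enum import Enum


class Score(Enum):
    YES = "Yes"
    NO = "No"
    UNCLEAR = "Unclear"
    NA = "NA"  # not applicable, e.g. items 5, 8 and 9 in between-country comparisons


# Abbreviated labels for the ten items of Table 1 (wording shortened for illustration).
ITEMS = (
    "Statistical power calculated",
    "Selection of patients described",
    "Baseline characteristics documented",
    "Baseline comparability acceptable",
    "Adherence and other processes documented",
    "Outcomes documented",
    "Drop-out rate acceptable",
    "System related features documented",
    "Staff competence and quality activities documented",
    "Statistical analyses appropriate",
)


def validity_points(scores: dict[int, Score]) -> int:
    """Total validity points (0 to 10): one point for each item scored Yes."""
    if set(scores) != set(range(1, len(ITEMS) + 1)):
        raise ValueError("every item 1-10 must be scored exactly once")
    return sum(1 for s in scores.values() if s is Score.YES)


def dropout_within_threshold(included: int, analysed: int, threshold: float = 0.10) -> bool:
    """Numeric part of item 7: drop-outs must not exceed the threshold share.
    Whether the drop-outs bias the results remains the appraiser's judgement."""
    if not 0 < analysed <= included:
        raise ValueError("need 0 < analysed <= included")
    return (included - analysed) / included <= threshold
```

For instance, scoring items 1, 2 and 8 ‘No’ and every other item ‘Yes’ yields 7 points, the pattern recorded for Birkmeyer et al. in Table 2.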

The results of the pilot testing show considerable variation between the studies in how well they met the methodological criteria (Table 2). Of the ten available validity points, two studies scored 7, one study 6, two studies 5, three studies 4, and two studies 3. Four studies made comparisons between countries and consequently evaluated the impact of the whole health care system, including all the clinical processes, as determinants of outcomes. Items 5, 8 and 9 are therefore not needed for a valid answer to the study question in these studies. However, the lack of information on items 5, 8 and 9 limits inferences about the reasons for between-country differences.
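The score distribution reported above can be checked mechanically against the cells of Table 2. The Python sketch below is an illustration, not part of the original appraisal: it transcribes each study's ten cells (the dictionary keys are shorthand labels chosen here) and counts ‘Yes’ entries per study and per item.

```python
from collections import Counter

# Cells for items 1-10 of each study, transcribed from Table 2 ("NA" = not applicable).
TABLE_2 = {
    "Coleman 2011": "No Yes Unclear Unclear NA Yes Yes NA NA Yes",
    "Pearse 2012": "Yes Yes Unclear Unclear NA Yes Yes NA NA Yes",
    "Birkmeyer 2013": "No No Yes Yes Yes Yes Yes No Yes Yes",
    "Karthikesalingam 2014": "No Yes Unclear Unclear NA Yes Yes NA NA Yes",
    "Chung 2014": "No Yes Yes Yes Yes Yes Yes NA NA Yes",
    "Finks 2011": "No No Unclear Unclear Unclear Yes Yes No No Yes",
    "Song 2011": "Yes No Unclear Unclear Yes Yes Yes No No Yes",
    "Wallace 2012": "No No Yes Yes Yes Yes Yes No No Yes",
    "Sutton 2012": "No No Unclear Unclear Unclear Yes Yes No No Yes",
    "Aiken 2014": "No No Unclear Unclear Unclear Yes Yes No Yes Yes",
}


def row_totals(table: dict[str, str]) -> dict[str, int]:
    """Validity points per study: the number of items scored Yes."""
    return {study: cells.split().count("Yes") for study, cells in table.items()}


def column_totals(table: dict[str, str]) -> list[int]:
    """Number of studies scoring Yes on each of the ten items."""
    rows = [cells.split() for cells in table.values()]
    return [sum(row[i] == "Yes" for row in rows) for i in range(10)]


if __name__ == "__main__":
    totals = row_totals(TABLE_2)
    print(sorted(Counter(totals.values()).items(), reverse=True))
    print(column_totals(TABLE_2))
```

Run as a script, it prints the distribution `[(7, 2), (6, 1), (5, 2), (4, 3), (3, 2)]`, which matches the text and every per-study total in Table 2. `column_totals`, however, returns 2 rather than 1 for items 1 and 9: the cells mark studies 2 and 7 ‘Yes’ for item 1 and studies 3 and 10 ‘Yes’ for item 9, whereas the totals row of Table 2 lists 1 for each.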

Table 2.

Validity of recent Benchmarking Controlled Trials published in the Lancet and in the New England Journal of Medicine (3). Studies 1–5 assessed the impact of clinical interventions, and studies 6–10 the impact of health care system features.

Study (author, journal, date)	Aim of the study	1. Statistical power calculated	2. Selection of patients described; Yes, if well described or the whole catchment area is covered	3. Valid and sufficient documentation of baseline characteristics in both index and control populations	4. Baseline comparability acceptable after statistical adjustment	5. Valid and sufficient documentation of adherence to intervention, and of other processes in both index and control populations	6. Valid and sufficient documentation of outcomes in both index and control populations	7. Drop-out rate acceptable	8. System related features documented in both index and control health care providers	9. Differences in staff competence, use of up-to-date evidence, quality and benchmarking activities (REM framework) ^a documented in both index and control health care providers	10. Appropriate statistical analyses	Total of validity points (0 to 10) for each study	Comments
1. Coleman et al., Lancet, Jan 8, 2011	To assess between-country differences in survival for selected cancers	No	Yes	Unclear	Unclear	NA ^b	Yes	Yes	NA ^b	NA ^b	Yes	4 ^b	Funding from the UK Department of Health; no other conflicts of interest
2. Pearse et al., Lancet, Sep 22, 2012	To assess mortality rates and patterns of critical care resource use for non-cardiac surgery patients across countries	Yes	Yes	Unclear	Unclear	NA ^b	Yes	Yes	NA ^b	NA ^b	Yes	5 ^b	Authors declare no conflicts of interest
3. Birkmeyer et al., NEJM, Oct 10, 2013	To assess surgical skill as a determinant of complication rates after bariatric surgery	No	No	Yes	Yes	Yes	Yes	Yes	No	Yes	Yes	7	Paper provides no declaration of conflicts of interest; all authors' declaration forms are available online.
4. Karthikesalingam et al., Lancet, Mar 15, 2014	To compare in-hospital mortality of patients with ruptured abdominal aortic aneurysm in two countries	No	Yes	Unclear	Unclear	NA ^b	Yes	Yes	NA ^b	NA ^b	Yes	4 ^b	Authors declare no conflicts of interest
5. Chung et al., Lancet, Apr 12, 2014	To compare 30-day mortality after acute myocardial infarction between two countries	No	Yes	Yes	Yes	Yes	Yes	Yes	NA ^b	NA ^b	Yes	7 ^b	Authors declare no conflicts of interest
6. Finks et al., NEJM, Jun 2, 2011	To assess the impact of high-volume hospitals on mortality after five major surgical procedures	No	No	Unclear	Unclear	Unclear	Yes	Yes	No	No	Yes	3	Paper provides no declaration of conflicts of interest; all authors' declaration forms are available online.
7. Song et al., NEJM, Aug 9, 2011	To assess the effect of a quality system on health care spending and on the quality of ambulatory care	Yes	No	Unclear	Unclear	Yes	Yes	Yes	No	No	Yes	5	Paper provides no declaration of conflicts of interest; all authors' declaration forms are available online.
8. Wallace et al., NEJM, May 31, 2012	To assess the impact of night-time intensivist physician staffing on mortality of intensive care patients	No	No	Yes	Yes	Yes	Yes	Yes	No	No	Yes	6	Paper provides no declaration of conflicts of interest; all authors' declaration forms are available online.
9. Sutton et al., NEJM, Nov 8, 2012	To analyze the impact of a hospital pay-for-performance program on patient mortality in three acute diagnoses	No	No	Unclear	Unclear	Unclear	Yes	Yes	No	No	Yes	3	Paper provides no declaration of conflicts of interest; all authors' declaration forms are available online.
10. Aiken et al., Lancet, May 24, 2014	To assess the impact of nurse workloads and nurses' educational qualifications on in-hospital mortality after common surgical procedures in several countries	No	No	Unclear	Unclear	Unclear	Yes	Yes	No	Yes	Yes	4	Authors declare no conflicts of interest
Total of validity points (0 to 10) for each criterion		1	4	3	3	4	10	10	0	1	10

^a REM = Real Effectiveness Medicine framework, in which competence is considered the sine qua non for effectiveness in health care (2).

^b The study question includes the impacts of the whole health care system, including the clinical processes; therefore items 5, 8 and 9 are not needed for a valid answer to the study question in these studies. However, the lack of information on items 5, 8 and 9 limits inferences about the reasons for between-country differences.

One study presented statistical power calculations. Four studies fulfilled the criterion on selection of patients because the whole catchment area (country) was covered. Three studies provided a valid and sufficient description of baseline characteristics in the index and control groups, and baseline comparability was considered adequate in these studies. Four studies showed valid and sufficient documentation of adherence to the intervention and of other treatment processes. All the studies documented the outcomes sufficiently. Drop-out rates were acceptable and statistical analyses were appropriate in all ten studies. No study described health care system related features. Staff competence was evaluated in only one study, in which the impact of surgical skill was the study question itself.
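Item 4 leaves it to the appraiser to judge whether groups are ‘sufficiently similar’ at baseline. One widely used way to quantify baseline balance, which the checklist itself does not prescribe, is the standardized mean difference (SMD); an absolute SMD above 0.1 is a common, but conventional, flag for imbalance. The Python sketch below assumes individual-level values for continuous variables and group proportions for binary ones; the function names, the 0.1 default and the example data are hypothetical.

```python
import math
from statistics import mean, stdev


def smd_continuous(index: list[float], control: list[float]) -> float:
    """SMD for a continuous baseline variable, using the pooled SD of both groups."""
    pooled_sd = math.sqrt((stdev(index) ** 2 + stdev(control) ** 2) / 2)
    return (mean(index) - mean(control)) / pooled_sd


def smd_binary(p_index: float, p_control: float) -> float:
    """SMD for a binary baseline variable given as group proportions."""
    pooled_var = (p_index * (1 - p_index) + p_control * (1 - p_control)) / 2
    return (p_index - p_control) / math.sqrt(pooled_var)


def imbalanced(smds: dict[str, float], threshold: float = 0.1) -> list[str]:
    """Names of baseline variables whose absolute SMD exceeds the threshold."""
    return [name for name, d in smds.items() if abs(d) > threshold]


if __name__ == "__main__":
    # Hypothetical data: patient ages in an index and a control hospital, share of women.
    smds = {
        "age": smd_continuous([70, 74, 68, 77, 72], [66, 70, 64, 71, 69]),
        "female": smd_binary(0.48, 0.50),
    }
    print(imbalanced(smds))
```

With these made-up numbers the script prints `['age']`: the index patients are about four years older on average, while the share of women differs only slightly. An SMD table of this kind documents comparability for item 4, but it covers only measured characteristics, so it does not remove the need for the judgement the checklist asks for.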

Discussion

This paper presents, for the first time, a checklist for assessing the validity of observational intervention studies, the BCTs. The checklist is intended to support the planning, conducting, reporting and peer reviewing of any observational intervention study, as well as its critical reading. The piloted checklist should be validated in further studies.

Several methodological limitations were observed in all ten studies. Only one study reported statistical power calculations. None of the studies described how patients were selected for the study (four studies included a comprehensive patient population). Only three studies provided a valid and sufficient description of baseline characteristics, which is a prerequisite for determining whether the comparability of the study groups is acceptable. Only four studies provided a sufficient description of the treatment processes. No study described health care system features, which may have an impact on outcomes. Staff competence was described in only one study, whose very aim was to assess the impact of competence. In between-country comparisons, the treatment processes, health care system features and staff competence are part of the causes of the outcome, so their documentation is not needed for the validity of the results. However, data on treatment processes, health care system features and staff competence would be important for generating hypotheses about the reasons for between-country differences.

In conclusion, current observational intervention studies (BCTs) seem to have several methodological limitations; some of these could be avoided in the planning, conducting and reporting phases of the studies, and others should be acknowledged in the discussion. The piloted checklist is suggested for anyone interested in assessing the validity of observational intervention studies. However, the checklist should be validated in further studies.

References

1. When are observational studies as credible as randomised trials?

Authors: Jan P Vandenbroucke
Journal: Lancet Date: 2004-05-22 Impact factor: 79.321

2. 2009 updated method guidelines for systematic reviews in the Cochrane Back Review Group.

Authors: Andrea D Furlan; Victoria Pennick; Claire Bombardier; Maurits van Tulder
Journal: Spine (Phila Pa 1976) Date: 2009-08-15 Impact factor: 3.468

3. The pros and cons of evidence-based medicine.

Authors: Peter Croft; Antti Malmivaara; Maurits van Tulder
Journal: Spine (Phila Pa 1976) Date: 2011-08-01 Impact factor: 3.468

4. The PERFECT project: measuring performance of health care episodes.

Authors: Unto Häkkinen
Journal: Ann Med Date: 2011-06 Impact factor: 4.709

5. Real-effectiveness medicine--pursuing the best effectiveness in the ordinary care of patients.

Authors: Antti Malmivaara
Journal: Ann Med Date: 2012-03-01 Impact factor: 4.709

6. Arthroscopic partial meniscectomy versus sham surgery for a degenerative meniscal tear.

Authors: Raine Sihvonen; Mika Paavola; Antti Malmivaara; Ari Itälä; Antti Joukainen; Heikki Nurmi; Juha Kalske; Teppo L N Järvinen
Journal: N Engl J Med Date: 2013-12-26 Impact factor: 91.245

7. Benchmarking Controlled Trial--a novel concept covering all observational effectiveness studies.

Authors: Antti Malmivaara
Journal: Ann Med Date: 2015-05-12 Impact factor: 4.709

8. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies.

Authors: Erik von Elm; Douglas G Altman; Matthias Egger; Stuart J Pocock; Peter C Gøtzsche; Jan P Vandenbroucke
Journal: PLoS Med Date: 2007-10-16 Impact factor: 11.069
