Can we convert between outcome measures of disability for chronic low back pain?

Tom Morris, Siew Wan Hee, Nigel Stallard, Martin Underwood, Shilpa Patel.

Abstract

STUDY DESIGN: Retrospective database analysis.
OBJECTIVE: A range of patient-reported outcomes were used to measure disability due to low back pain. There is not a single back pain disability measurement commonly used in all randomized controlled trials. We report here our assessment as to whether different disability measures are sufficiently comparable to allow data pooling across trials.
SUMMARY OF BACKGROUND DATA: We used individual patient data from a repository of data from back pain trials of therapist-delivered interventions.
METHODS: We used data from 11 trials (n = 6089 patients) that had at least 2 of the following 7 measurements: Roland-Morris Disability Questionnaire, Chronic Pain Grade disability score, Physical Component Summary of the 12- or 36-Item Short Form Health Survey, Patient Specific Functional Scale, Pain Disability Index, Oswestry Disability Index, and Hannover Functional Ability Questionnaire. Within each trial, the change score between baseline and short-term follow-up was computed for each outcome. For each pair of outcomes, we then calculated the correlation between the change scores and Cohen's κ for the 3-level outcome of a change score less than, equal to, or more than zero. It was considered feasible to pool 2 measures if they were at least moderately correlated (correlation >0.5) and had at least moderately similar responsiveness (κ >0.4).
RESULTS: Although all pairs of measures were found to be positively correlated, most correlations were less than 0.5, with only 1 pair of outcomes in 1 trial having a correlation of more than 0.6. All κ statistics were less than 0.4 so that in no cases were the criteria for acceptability of pooling measures satisfied.
CONCLUSION: The lack of agreement between different outcome measures means that pooling of data on these different disability measurements in a meta-analysis is not recommended.
LEVEL OF EVIDENCE: 2.

Year: 2015 PMID: 25955090 PMCID: PMC4504533 DOI: 10.1097/BRS.0000000000000866

Source DB: PubMed Journal: Spine (Phila Pa 1976) ISSN: 0362-2436 Impact factor: 3.468

Patient-reported outcome measures (PROMs) are commonly used in low back pain (LBP) research comparing therapist-delivered interventions. These outcomes are used to measure participants' perspectives on their symptoms, capabilities, performance functioning, treatment preferences, and general well-being. Investigators tend to choose instruments with which they are familiar or those recommended in consensus statements. Although all these instruments aim to measure similar constructs, there is little information on their compatibility and comparability. To compare results based on different measures, it is important to know whether summary measures such as treatment effect sizes from one instrument have the same interpretation as those from another instrument. The commonest outcome measures used in randomized controlled trials for LBP, and the ones that researchers are most familiar with interpreting, are the Roland-Morris Disability Questionnaire (RMDQ) score and the Oswestry Disability Index (ODI).1 Being able to express outcomes on one of these scales would make results easier to interpret. The ability to crosswalk scores between different measures was identified by the National Institutes of Health Task Force on Research Standards in Chronic Low Back Pain as a research priority.2 If the measures are comparable, then it is possible to compare data from studies using different measures and to pool these data in a meta-analysis. If the measures are not comparable, then such comparisons and any meta-analysis using different measures may not be robust.
We have developed a large pooled data set of individual patient data from 19 trials (n = 9328) of therapist-delivered interventions for LBP that will be a resource for researchers working in the field (report submitted to the National Institute for Health Research).3 All included trials in this pooled data set used at least one of the 6 PROMs designed to measure the aforementioned back pain–related disability or included generic-based health-related quality-of-life instruments such as 12- (SF-12)4 or 36-Item Short Form Health Survey (SF-36).5 However, no common instrument was used by all these trials. In this article, we assess the agreement between the instruments by determining their correlation and responsiveness to detect positive, zero, or negative change at an individual participant level with the intention of calibrating measures against each other to allow data pooling using a single common scale.

MATERIALS AND METHODS

Trials, Instruments, and Change Scores

There are a number of back pain–related disability outcome measures used in the research, each to varying degrees. In our data set, we had data available on 6 PROMs that aim to measure back pain–related disability, namely, the Chronic Pain Grade (CPG) disability score, which is one of the 2 domains in the CPG that aims to grade chronic pain status,6 the Hannover Functional Ability Questionnaire (FFbHR),7 the ODI,8 the Pain Disability Index (PDI),9 the mean score of 3 items from the Patient Specific Functional Scale (PSFS),10 and the RMDQ.11 Eleven of the 19 trials (n = 6089) included 2 or more measures of back pain–related disability or included data that allowed us to calculate the Physical Component Summary (PCS) from the generic-based SF-12/36, recorded at baseline and short-term follow-up (2–3 mo postrandomization).12–22 We used individual patient data from these trials to make comparisons between back pain–specific measures and SF12/36 PCS to facilitate indirect comparisons between back pain–specific measures. The change score for each individual patient was defined as the difference between the score at short-term follow-up and baseline with sign allocated so that a positive change score indicates an improvement in disability in each case. We compared change scores of each instrument within each trial.
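The sign convention for the change score can be illustrated with a short sketch. This is not the authors' code (their analyses were run in R); it is a minimal Python illustration in which the function name `change_score` and the lookup table `HIGHER_IS_BETTER` are hypothetical, and the scoring direction assigned to each instrument is an assumption based on how these questionnaires are conventionally scored, not something stated in this article.

```python
# Assumed scoring directions (not stated in the article): True means a higher
# raw score reflects better function, False means a higher score reflects
# more disability.
HIGHER_IS_BETTER = {
    "RMDQ": False,
    "ODI": False,
    "PDI": False,
    "CPG": False,
    "PCS": True,
    "FFbHR": True,
    "PSFS": True,
}


def change_score(instrument, baseline, follow_up):
    """Follow-up minus baseline, signed so that a positive value is an improvement."""
    raw_change = follow_up - baseline
    # Flip the sign for instruments on which a falling score means improvement.
    return raw_change if HIGHER_IS_BETTER[instrument] else -raw_change


# A patient whose RMDQ falls from 12 to 8 and whose PCS rises from 35 to 40
# has improved on both instruments, so both change scores are positive.
rmdq_change = change_score("RMDQ", 12, 8)
pcs_change = change_score("PCS", 35.0, 40.0)
```

Aligning the signs in this way is what allows the direction of change (improved, unchanged, worsened) to be compared across instruments that are scored in opposite directions.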

Correlation and Responsiveness

In order for conversion between outcome measures to be meaningful, the changes in the measures should be correlated and the measures should have similar responsiveness,23 as defined below. Correlation was assessed by calculation of the Pearson correlation coefficient and illustrated using scatterplots. A priori we considered correlations greater than 0.5 (a large effect size) to indicate a level of correlation that would allow pooling of data collected from different measures.23,24 This criterion was lower than the one used (0.7) in a similar study that examined the justification of combining scores for meta-analyses in chronic obstructive pulmonary disease.23 Responsiveness is the ability to detect a change in condition. If 2 measures are similarly responsive, then an improvement or worsening in a patient's condition over time should be reflected by a change in the patient's score on both measures. If 2 outcome measures do not have similar responsiveness, then combining them in a meta-analysis may introduce heterogeneity that could be falsely attributed to other sources, such as the treatment effect. Similarity of responsiveness of 2 outcome measures was examined by categorizing the change scores as negative change (change score <0), no change (change score = 0), or positive change (change score >0), and calculating Cohen's κ from these categorizations.25 A priori we considered κ more than 0.4 to indicate sufficiently similar responsiveness.26 These broad categories were chosen to demonstrate whether or not the outcome measures had similar responsiveness in the most basic sense (improved, worsened, or no change). All analyses were run in R.27
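The two statistics and the pooling rule described above can be sketched as follows. The authors ran their analyses in R; this is a Python illustration only, written from the textbook definitions. The function names (`pearson`, `direction`, `cohen_kappa`, `poolable`) are hypothetical, and the use of unweighted κ is an assumption, since the article does not say whether a weighted version was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists of change scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def direction(change):
    """Categorize a change score: -1 (negative), 0 (no change), +1 (positive)."""
    return (change > 0) - (change < 0)


def cohen_kappa(x, y):
    """Unweighted Cohen's kappa on the 3-level direction of change (assumed unweighted)."""
    cats_x = [direction(a) for a in x]
    cats_y = [direction(b) for b in y]
    n = len(cats_x)
    # Observed agreement: proportion of patients with the same category on both measures.
    p_obs = sum(a == b for a, b in zip(cats_x, cats_y)) / n
    # Agreement expected by chance from the marginal category proportions.
    p_exp = sum((cats_x.count(c) / n) * (cats_y.count(c) / n) for c in (-1, 0, 1))
    # Undefined (division by zero) if every patient falls in one category on both measures.
    return (p_obs - p_exp) / (1 - p_exp)


def poolable(x, y, r_min=0.5, kappa_min=0.4):
    """Apply the a priori criteria from the text: correlation > 0.5 and kappa > 0.4."""
    return pearson(x, y) > r_min and cohen_kappa(x, y) > kappa_min
```

Here `x` and `y` would be the change scores of the same patients on two instruments within one trial, already signed so that positive means improvement. Both thresholds must be exceeded for a pair to be treated as poolable.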

RESULTS

We included data from 11 trials (n = 6089) in this analysis (Table 1), allowing 21 pairwise comparisons between outcomes within trials. Figure 1A–F shows a selection of scatterplots of standardized change scores of these outcome measures. The other scatterplots are available as supplementary materials (see Supplementary Digital Content Figure 1A–D, available at: http://links.lww.com/BRS/A974, http://links.lww.com/BRS/A975, http://links.lww.com/BRS/A976, and http://links.lww.com/BRS/A977). It is clear from these plots that although instruments seem to be positively correlated, there is a large disagreement between the outcomes.

TABLE 1.

Instruments Used and Number of Patients by Trial

Trial	n	Outcome Measures
UK BEAM20	885	RMDQ	CPG	PCS
BeST16	426	RMDQ	CPG	PCS
Brinkhaus et al12	281	PCS	FFbHR	PDI
Haake et al14	1110	CPG	FFbHR	PCS
Hancock et al15	235	RMDQ	PSFS
HULLEXPROB13	203	RMDQ	PCS
Macedo et al17	158	RMDQ	PCS	PSFS
Pengel et al18	232	RMDQ	PSFS
Von Korff et al21	227	RMDQ	CPG
Witt et al22	2229	PCS	FFbHR
YACBAC19	206	PCS	ODI

RMDQ indicates Roland-Morris Disability Questionnaire; CPG, Chronic Pain Grade disability score; PCS, Physical Component Summary of 12- or 36-Item Short Form Health Survey; FFbHR, Hannover Functional Ability Questionnaire; PDI, Pain Disability Index; PSFS, Patient Specific Functional Scale; ODI, Oswestry Disability Index.

Figure 1.

Scatterplots of standardized change scores of outcome measures: (A) PCS against CPG; (B) PCS against ODI; (C) PDI against PCS; (D) CPG against FFbHR; (E) CPG against RMDQ; and (F) PSFS against RMDQ. PCS indicates Physical Component Summary of the 12- or 36-Item Short Form Health Survey; CPG, Chronic Pain Grade disability score; ODI, Oswestry Disability Index; PDI, Pain Disability Index; FFbHR, Hannover Functional Ability Questionnaire; RMDQ, Roland-Morris Disability Questionnaire; and PSFS, Patient Specific Functional Scale.

Correlations and κ statistics are shown in Table 2. The correlations ranged from 0.21 to 0.70, confirming that these instruments are positively correlated and with the linear associations between them ranging from weak to moderately strong. Where several trials include the same pair of measures, it is interesting to compare the correlations obtained. Three trials had both SF-12/36 PCS and FFbHR data, and the correlations in the 3 trials were very similar, all of about 0.58.12,14,22 Another 3 trials had both SF-12/36 PCS and CPG, and the correlations between these measures in the different trials were reasonably similar, ranging from 0.41 to 0.56,14,16,20 and 4 trials had both SF-12/36 PCS and RMDQ, with range from 0.38 to 0.52, again similar.13,16,17,20 However, correlations between other outcomes were quite widely ranging across trials: between CPG and RMDQ (3 trials; range, 0.21–0.47)16,20,21 and between PSFS and RMDQ (3 trials; range, 0.40–0.70).15,17,18

TABLE 2.

Pearson Correlation and Cohen's κ for Each Pair of Instruments

Outcome Measure 1	Outcome Measure 2	Trial	Pearson Correlation	Cohen κ
CPG	RMDQ	UK BEAM20	0.47	0.27
		BeST16	0.44	0.22
		Von Korff et al21	0.21	0.12
CPG	FFbHR	Haake et al14	0.48	0.25
PCS	RMDQ	UK BEAM20	0.51	0.33
		BeST16	0.38	0.17
		HULLEXPROB13	0.45	0.29
		Macedo et al17	0.52	0.27
PCS	CPG	UK BEAM20	0.56	0.31
		BeST16	0.41	0.27
		Haake et al14	0.49	0.27
PCS	FFbHR	Brinkhaus et al12	0.59	0.30
		Haake et al14	0.58	0.29
		Witt et al22	0.59	0.27
PCS	PSFS	Macedo et al17	0.36	0.17
PCS	ODI	YACBAC19	0.60	0.28
RMDQ	PSFS	Hancock et al15	0.70	0.38
		Macedo et al17	0.40	0.26
		Pengel et al18	0.53	0.18
PDI	FFbHR	Brinkhaus et al12	0.55	0.32
PDI	PCS	Brinkhaus et al12	0.54	0.31

CPG indicates Chronic Pain Grade disability score; RMDQ, Roland-Morris Disability Questionnaire; FFbHR, Hannover Functional Ability Questionnaire; PCS, Physical Component Summary of 12- or 36-Item Short Form Health Survey; PSFS, Patient Specific Functional Scale; ODI, Oswestry Disability Index; PDI, Pain Disability Index.

Cohen's κ statistics, calculated for the 3 by 3 table of the number of patients with positive change, no change, or negative change on each outcome, were less than 0.4 for all 21 comparisons. Some were similar between trials, namely, for PCS and FFbHR (range, 0.27–0.30)12,14,22 and for PCS and CPG (range, 0.27–0.31).14,16,20 However, the level of agreement was never more than fair.
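As a check on how the a priori criteria apply to the reported values, the rows of Table 2 can be screened directly. The sketch below is illustrative Python, not the authors' R code. The values are transcribed from Table 2 with the blank pair cells filled in from the row above, and the variable names are hypothetical.

```python
# Rows transcribed from Table 2:
# (outcome measure 1, outcome measure 2, trial, Pearson correlation, Cohen's kappa)
TABLE_2 = [
    ("CPG", "RMDQ", "UK BEAM", 0.47, 0.27),
    ("CPG", "RMDQ", "BeST", 0.44, 0.22),
    ("CPG", "RMDQ", "Von Korff et al", 0.21, 0.12),
    ("CPG", "FFbHR", "Haake et al", 0.48, 0.25),
    ("PCS", "RMDQ", "UK BEAM", 0.51, 0.33),
    ("PCS", "RMDQ", "BeST", 0.38, 0.17),
    ("PCS", "RMDQ", "HULLEXPROB", 0.45, 0.29),
    ("PCS", "RMDQ", "Macedo et al", 0.52, 0.27),
    ("PCS", "CPG", "UK BEAM", 0.56, 0.31),
    ("PCS", "CPG", "BeST", 0.41, 0.27),
    ("PCS", "CPG", "Haake et al", 0.49, 0.27),
    ("PCS", "FFbHR", "Brinkhaus et al", 0.59, 0.30),
    ("PCS", "FFbHR", "Haake et al", 0.58, 0.29),
    ("PCS", "FFbHR", "Witt et al", 0.59, 0.27),
    ("PCS", "PSFS", "Macedo et al", 0.36, 0.17),
    ("PCS", "ODI", "YACBAC", 0.60, 0.28),
    ("RMDQ", "PSFS", "Hancock et al", 0.70, 0.38),
    ("RMDQ", "PSFS", "Macedo et al", 0.40, 0.26),
    ("RMDQ", "PSFS", "Pengel et al", 0.53, 0.18),
    ("PDI", "FFbHR", "Brinkhaus et al", 0.55, 0.32),
    ("PDI", "PCS", "Brinkhaus et al", 0.54, 0.31),
]

# A priori thresholds from the Methods: correlation > 0.5 and kappa > 0.4.
R_MIN, KAPPA_MIN = 0.5, 0.4

# Comparisons that satisfy both criteria at once (the pooling rule).
meets_both = [row for row in TABLE_2 if row[3] > R_MIN and row[4] > KAPPA_MIN]

# Summary values quoted in the Results text.
r_range = (min(row[3] for row in TABLE_2), max(row[3] for row in TABLE_2))
max_kappa = max(row[4] for row in TABLE_2)
```

With these values `meets_both` is empty and `max_kappa` is 0.38, which matches the statement that no comparison satisfied the criteria for pooling.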

DISCUSSION

A number of patient-reported outcomes are commonly used to measure disability in randomized controlled trials of interventions for LBP, with little consensus as to a preferred measure. High correlation and similar responsiveness are necessary conditions for outcome measures to be comparable enough that one could be used to predict another so that they could be pooled, for example, in a meta-analysis. Our work reported here has used data from 11 randomized controlled clinical trials from a large pooled data set of individual participant data to assess the extent to which these criteria are satisfied for pairs of measures. We found that for each pair of outcome measures, correlation and similarity in responsiveness were low. In all cases, these were below the threshold set to consider it feasible to convert between the outcome measures or combine them in an individual participant data meta-analysis.
A strength of our work has been the use of individual participant data from a large number of trials using different combinations of outcome measures. This has enabled us to conduct 21 within-trial comparisons between pairs of 7 different outcome measures, with some pairwise comparisons repeated on the basis of data from a number of different trials. We are not aware of any similar comparison conducted on this scale.
A weakness of this study has been the small sample size for some trials. Because comparisons were conducted within trial, this means that some estimates may not be precise. A further weakness is that although all but one of the outcome measures are ordinal, we have treated them as continuous in our analysis. Specifically, the Pearson correlation coefficient requires that the variables in question are continuous. Although it is common practice for ordinal variables with a large number of points on their scales to be treated as though they are continuous, some authors consider this to be a mistreatment of such variables,28 but we felt that applying a more complicated method would have been an attempt to account for a richer structure than was actually present.
The lack of agreement between different outcome measures taken on the same patient is probably due to the fact that the questionnaires measure disability in different ways. Indeed, it would be hard to justify the time-consuming process of creating a new questionnaire if the end result were to be very similar to another already-existing questionnaire.
Data from several trials including the same pairs of measures enabled the correlation coefficients and κ statistics between a pair of measures to be obtained from different data sets and compared. Of particular note are the correlations between PCS of SF-12/36 and FFbHR, which were about 0.58 and were very similar across the 3 trials. This may not be surprising because these 3 trials were conducted by the same group, tested the same intervention (acupuncture), and recruited from similar German populations.12,14,22 By contrast, the correlations between CPG and RMDQ ranged from 0.21 to 0.47. There were slight variations in the version of CPG instrument that was used in these trials. The UK BEAM20 and BeST16 trials used the modified version of CPG, which asked patients how much their back trouble had been interfering with their daily activities in the last 1 month, whereas in the Von Korff trial21 the time period was the last 3 months. This may explain the weaker association between CPG and RMDQ in the Von Korff trial because the RMDQ was designed to measure whether patients' back pain had been interfering with their daily activities on the day they were evaluated.
Our comparison has been based on the change from baseline to short-term follow-up (2–3 mo postrandomization). This time point was chosen because data were available in all trials. Nearly all of the improvement from baseline seen in intervention and control arms of randomized controlled trials of LBP is seen by around 3 months.29 Thus, there would be little advantage in additionally considering long-term outcomes. Many of the trials also had mid-term (6 mo) and long-term (1 yr) follow-up. We performed the same analyses on these data, and the results were similar.

CONCLUSION

We used data from 11 randomized clinical trials (n = 6089 patients) in LBP to compare the following 7 measurements: RMDQ, CPG disability score, PCS of the SF-12/36, PSFS, PDI, ODI, and FFbHR. Pairs of measures were found to be positively correlated, but correlations were mostly less than the 0.5 we specified a priori, with only 1 pair of outcomes in 1 trial having a correlation of more than 0.6. Correlations between the SF-12/36 PCS and other PROMs, namely, CPG, FFbHR, ODI, and PDI, were moderately positive (between 0.40 and 0.60). We note that we set a less rigorous cutoff than other investigators. Even so, all κ statistics, including those comparing these pairs of outcomes, were less than 0.4. In no cases were the criteria we had set for acceptability of pooling measures satisfied. These data do not support the notion that crosswalking between scores on different LBP outcome measures is justifiable. Future researchers need to settle on a single outcome measure for trials of back pain treatments. Adoption of the core set suggested by the National Institutes of Health Task Force is an important step that will allow a better understanding of the differences and similarities between results from different studies.2 We conclude that the lack of agreement between different outcome measures means that pooling of data on these different disability measurements in a meta-analysis is not recommended.

Key Points

Changes from baseline to short-term follow-up of 6 back pain–related disability outcomes and 1 generic-based health-related quality-of-life outcome were compared on the basis of data from 11 trials of therapist-delivered interventions for chronic LBP.
Correlations between the measures and Cohen's κ statistics comparing the number of patients for whom the change scores were greater than, equal to, or less than zero were calculated.
Correlations and κ statistics were found to be low.
It is not recommended that data on the different patient-reported outcomes studied should be pooled in a meta-analysis.

22 in total

Review 1. Condition-specific outcome measures for low back pain. Part I: validation.

Authors: U Müller; M S Duetz; C Roeder; C G Greenough
Journal: Eur Spine J Date: 2004-03-17 Impact factor: 3.134

2. [Hannover Functional Questionnaire in ambulatory diagnosis of functional disability caused by backache].

Authors: T Kohlmann; H Raspe
Journal: Rehabilitation (Stuttg) Date: 1996-02 Impact factor: 1.113

3. A randomized trial comparing a group exercise programme for back pain patients with individual physiotherapy in a severely deprived area.

Authors: Jane L Carr; Jennifer A Klaber Moffett; Elaine Howarth; Stewart J Richmond; David J Torgerson; David A Jackson; Caroline J Metcalfe
Journal: Disabil Rehabil Date: 2005-08-19 Impact factor: 3.033

4. The measurement of observer agreement for categorical data.

Authors: J R Landis; G G Koch
Journal: Biometrics Date: 1977-03 Impact factor: 2.571

5. A study of the natural history of back pain. Part I: development of a reliable and sensitive measure of disability in low-back pain.

Authors: M Roland; R Morris
Journal: Spine (Phila Pa 1976) Date: 1983-03 Impact factor: 3.468

6. The Oswestry low back pain disability questionnaire.

Authors: J C Fairbank; J Couper; J B Davies; J P O'Brien
Journal: Physiotherapy Date: 1980-08 Impact factor: 3.358

7. A trial of an activating intervention for chronic back pain in primary care and physical therapy settings.

Authors: Michael Von Korff; Benjamin H K Balderson; Kathleen Saunders; Diana L Miglioretti; Elizabeth H B Lin; Stephen Berry; James E Moore; Judith A Turner
Journal: Pain Date: 2005-02 Impact factor: 6.961

8. United Kingdom back pain exercise and manipulation (UK BEAM) randomised trial: effectiveness of physical treatments for back pain in primary care.

Authors:
Journal: BMJ Date: 2004-11-19

9. Grading the severity of chronic pain.

Authors: Michael Von Korff; Johan Ormel; Francis J Keefe; Samuel F Dworkin
Journal: Pain Date: 1992-08 Impact factor: 6.961

10. The Pain Disability Index: psychometric properties.

Authors: Raymond C Tait; John T Chibnall; Steven Krause
Journal: Pain Date: 1990-02 Impact factor: 6.961
