
Generic Preference-based Measures for Low Back Pain: Which of Them Should Be Used?

Aureliano Paolo Finch¹, Melina Dritsaki, Claudio Jommi.

Abstract

STUDY DESIGN: Systematic review.
OBJECTIVE: This systematic review examines validity and responsiveness of three generic preference-based measures in patients with low back pain (LBP).
SUMMARY OF BACKGROUND DATA: LBP is a very common incapacitating disease with a significant impact on health-related quality of life (HRQoL). Health state utility values can be derived from various preference-based HRQoL instruments, and among them the most widely used ones are EuroQol 5 Dimensions (EQ-5D), Short Form 6 Dimensions (SF-6D), and Health Utilities Index 3 (HUI III). The ability of these instruments to reflect HRQoL has been tested in various contexts, but never for LBP populations.
METHODS: A systematic search on electronic literature databases was undertaken to identify studies of patients with LBP where health state utility values were reported. Records were screened using a set of predefined eligibility criteria. Data on validity (correlations and known group methods) and responsiveness (effect sizes, standardized response means, tests of statistical significance) of instruments were extracted using a customized extraction template, and assessed using predefined criteria.
RESULTS: Study design and outcome measures varied substantially across the 37 included papers. EQ-5D demonstrated good convergent validity and was able to distinguish between known groups. EQ-5D was also able to capture changes in health states resulting from different interventions. Evidence for SF-6D and HUI III was too limited to allow an appropriate evaluation.
CONCLUSION: EQ-5D performs well in the LBP population and its scores seem to be suitable for economic evaluation of LBP interventions. However, the paucity of information on the other instruments makes it impossible to determine its relative validity and responsiveness compared with them.


Year: 2016 PMID: 26583478 PMCID: PMC4772812 DOI: 10.1097/BRS.0000000000001247

Source DB: PubMed Journal: Spine (Phila Pa 1976) ISSN: 0362-2436 Impact factor: 3.468

Low back pain (LBP) is a common health problem. A review of studies published between 1966 and 1998 reported that LBP lifetime prevalence reaches a peak of 84%, whereas point prevalence and 1-year prevalence range from 12% to 33% and from 22% to 65%, respectively.[1] As an incapacitating disease, LBP has an important impact on health-related quality of life (HRQoL), making cost-utility analysis (CUA) the preferred economic evaluation for LBP interventions. In CUA, life years gained are weighted by health state utility values (HSUVs), which are commonly derived from three generic preference-based measures: EuroQol Five Dimensions (EQ-5D), Short Form Six Dimensions (SF-6D), and Health Utility Index Three (HUI III). Preference-based HRQoL instruments typically comprise a descriptive system covering core dimensions of health (e.g., mobility, self-care, usual activities, pain, and anxiety) and an attached value set, which is obtained on the basis of the population's relative preferences for dimensions of health. These generic measures are claimed to be applicable across all disease areas, therefore representing an important clinical outcome as well as a common currency for health technology assessment.[2]

The psychometric performance of these instruments in terms of validity (i.e., whether an instrument achieves the objectives it was developed for) and responsiveness (i.e., ability to detect changes over time and across participants) has already been tested in different decision contexts, more precisely in patients with visual disorders,[3] cardiovascular diseases,[4] cancer,[5] rheumatoid arthritis,[6] musculoskeletal diseases,[7] and multiple sclerosis,[8] but not in LBP populations.[9] This systematic review aims to fill this gap and to establish whether the use of these instruments is appropriate in LBP populations. As is common in similar studies, included articles are not required to have conducted an assessment of validity and responsiveness themselves, but must contain information from which the instruments' performance can be analyzed.
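The weighting of life years by HSUVs in a CUA can be illustrated with a minimal sketch. It assumes that utility changes linearly between measurement points (a common simplification, not something stated in this review), and the function name and all utility values are hypothetical rather than taken from any included study.

```python
def qalys(times_years, utilities):
    """Quality-adjusted life years as the area under the utility curve (trapezoidal rule)."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching time and utility lists with at least two points")
    total = 0.0
    for i in range(1, len(times_years)):
        interval = times_years[i] - times_years[i - 1]
        if interval < 0:
            raise ValueError("time points must be in ascending order")
        # Average utility over the interval, weighted by the interval length in years.
        total += interval * (utilities[i - 1] + utilities[i]) / 2
    return total


# Hypothetical HSUVs at baseline, 3 months, and 1 year (illustrative only).
times = [0.0, 0.25, 1.0]
treated = [0.40, 0.60, 0.70]
control = [0.40, 0.45, 0.50]

# The QALY difference between arms is the denominator of a cost-utility ratio.
incremental_qalys = qalys(times, treated) - qalys(times, control)
print(round(incremental_qalys, 4))
```

Because the result depends directly on the utility values, an instrument that misses real changes in health would understate the QALY gain, which is why validity and responsiveness matter for economic evaluation.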

MATERIALS AND METHODS

The study design is a systematic literature review.

Literature Search

Medline, Embase, and Web of Science were searched using a strategy developed around the four main constructs of the research question: EQ-5D, SF-6D, HUI III, and LBP. Search terms were derived from Brazier et al,[10] Brooks,[11] Dolan,[12] Fenny et al,[13] Hayden et al,[14] and Lin et al.[15] The search strategy included synonyms and spelling variations, was refined using truncation, wildcards, phrase search, and proximity operators, and was adjusted for differences between databases. Related terms such as “validity” or “psychometric characteristics” were not used, given the objective of this systematic review (they would have been useful in a systematic review of studies assessing the validity of preference-based instruments). No publication date limit was set. All studies published in English or for which a translator was available were considered. As an example, the complete search strategy for MEDLINE (Ovid) is provided in Appendix I.

Study Selection

Relevant records were imported into RefWorks and duplicates were removed. Studies were included in the systematic review if they met all the eligibility criteria presented in Table 1.

TABLE 1

Eligibility Criteria

Inclusion criteria

The study population had LBP

The study examined at least one of the three generic preference-based instruments (EQ-5D, SF-6D, HUI III)

The study reported an estimate of the mean score for the preference-based instrument(s) examined and for a comparator (e.g., a disease-specific instrument)

Exclusion criteria

The study focused on a condition other than LBP

The study examined LBP with comorbidities

Pharmacodynamic and pharmacokinetic studies

Conference presentations and poster presentations

Data Extraction

A customized extraction template was used to collect relevant data, including study characteristics (e.g., study design), patient characteristics (e.g., age), type and method of validity assessment (e.g., convergent, correlations), method of responsiveness assessment (e.g., standardized response mean), and validity and responsiveness data.

Quality Assessment

Quality was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist,[16] a rating tool to evaluate the methodological quality of studies on measurement properties of health status instruments. For each of the four psychometric characteristics relevant to the current systematic review (“measurement error,” “hypothesis testing,” “cross cultural validity,” and “responsiveness”), 11 to 18 items were analyzed. Each item was assigned one of four possible scores: “excellent,” “good,” “fair,” or “poor.” The item with the lowest score determined the overall score for the property under investigation.
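The rule that the lowest-scoring item determines the overall score can be sketched as follows. This is only an illustration of that rule as described above; the function name and the example ratings are hypothetical and do not reproduce the COSMIN checklist items.

```python
# Ratings ordered from worst to best, as listed in the text.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def overall_score(item_ratings):
    """Return the lowest rating among the items of one measurement property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    normalized = [rating.strip().lower() for rating in item_ratings]
    for rating in normalized:
        if rating not in RATING_ORDER:
            raise ValueError(f"unknown rating: {rating!r}")
    # The item with the lowest position in RATING_ORDER decides the property score.
    return min(normalized, key=RATING_ORDER.index)


# Hypothetical item ratings for one property in one study.
print(overall_score(["excellent", "good", "fair", "excellent"]))
```

One consequence of this design is that a single weak item caps the score for the whole property, which may explain why the same study can receive very different scores for different characteristics.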

Assessment of Validity and Responsiveness

Construct validity has been defined as the extent to which an instrument measures what it is intended to measure. Construct validity was analyzed when papers reported on convergent validity (correlation between instruments) and on known group differences detected by instruments. Responsiveness has been defined as the extent to which an instrument is sensitive to statistically significant changes in health over time or between treatment arms.[17] Responsiveness was analyzed when papers reported on tests of statistical significance (TSS), effect sizes (ES), and/or standardized response means (SRM). The instruments' validity and responsiveness were assessed against a set of hypotheses derived from the literature[18-22] (Table 2).
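For readers unfamiliar with the two responsiveness statistics, the sketch below uses their conventional definitions: ES as the mean change divided by the standard deviation of baseline scores, and SRM as the mean change divided by the standard deviation of the change scores. The review does not state which variants the included studies used, so these formulas are an assumption, and the scores are hypothetical.

```python
from statistics import mean, stdev


def _changes(baseline, follow_up):
    """Per-patient change scores (follow-up minus baseline)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES: mean change divided by the sample SD of the baseline scores."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline scores have no variability")
    return mean(_changes(baseline, follow_up)) / spread


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the sample SD of the change scores."""
    changes = _changes(baseline, follow_up)
    spread = stdev(changes)
    if spread == 0:
        raise ValueError("change scores have no variability")
    return mean(changes) / spread


# Hypothetical EQ-5D index scores for three patients (illustrative only).
baseline_scores = [0.2, 0.4, 0.6]
follow_up_scores = [0.5, 0.6, 0.7]
print(effect_size(baseline_scores, follow_up_scores))
print(standardized_response_mean(baseline_scores, follow_up_scores))
```

The two statistics share a numerator but differ in the denominator, so they can diverge when patients respond very unevenly to treatment.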

TABLE 2

Validity and Responsiveness Hypotheses

Convergent validity

Hypothesis 1. A positive and moderate-to-very strong correlation (>0.4) between generic instruments and disease-specific instruments (for those disease-specific instruments measuring improvements through a reduction in the scores a negative correlation is expected)

Hypothesis 2. A positive and strong-to-very strong correlation between generic instruments (>0.6)

Hypothesis 3. Stronger correlations between generic preference-based instruments and disease-specific instruments than between generic preference-based instruments and disease construct-specific instruments

Known groups

Hypothesis 1. Generic instruments to distinguish between different grades of disability (lower scores at increasing level of disability)

Hypothesis 2. Generic instruments to distinguish between groups with disability and groups without disability (lower scores in the presence of disability)

Hypothesis 3. Generic instruments to distinguish between men and women (lower scores for women than for men)

Hypothesis 4. Generic instruments to distinguish between acute and recurrent LBP (lower scores for acute cases)

Test of statistical significance

Hypothesis 1. Generic instruments to be able to detect changes because of treatments

Hypothesis 2. Generic instruments to be able to detect differences between interventions

Hypothesis 3. Generic instruments to be able to detect changes coherent with those reported by other generic or disease-specific measures

Standardized response mean and effect sizes

Hypothesis 1. SRM and ES to be moderate to strong (>0.5)

Monotonic correlations were considered very weak between 0 and 0.19; weak between 0.2 and 0.39; moderate between 0.4 and 0.59; strong between 0.6 and 0.79; and very strong between 0.8 and 1.[23] Changes in SRM and ES were considered very weak between 0 and 0.19; weak between 0.2 and 0.49; moderate between 0.5 and 0.79; and strong between 0.8 and 1.[24]
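The interpretation bands above can be written as a small lookup, sketched below. The function names are hypothetical. Two details are assumptions rather than statements from the text: values falling between the published bands (e.g., 0.195) are assigned to the lower band, and SRM or ES values above 1 are treated as strong.

```python
def classify_correlation(r):
    """Label a correlation coefficient by its absolute size, using the bands in the text."""
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.2:
        return "very weak"
    if size < 0.4:
        return "weak"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "strong"
    return "very strong"


def classify_change(value):
    """Label an ES or SRM by its absolute size, using the bands in the text."""
    size = abs(value)
    if size < 0.2:
        return "very weak"
    if size < 0.5:
        return "weak"
    if size < 0.8:
        return "moderate"
    return "strong"  # assumption: values above 1 are also treated as strong


# Coefficients of the kind reported in the Results section.
for coefficient in (0.739, 0.48, -0.815, 0.232):
    print(coefficient, classify_correlation(coefficient))
```

The sign is ignored on purpose, since a negative correlation is expected whenever the comparator instrument scores improvement as a reduction.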

RESULTS

Characteristics of the Included Studies

A total of 739 potentially relevant articles were found. Title and abstract screening excluded 223 and 432 records, respectively. After full-text review, 37 reports referring to 35 studies were included. The process is described in Figure 1.

Figure 1

Flow diagram. It represents the screening of the potentially relevant records retrieved with the database search. Thirty-seven reports referring to 35 studies were included.

The design features of the included studies varied significantly. The majority of them were randomized controlled trials (RCTs),[25-41] followed by cross-sectional studies,[42-47] observational longitudinal studies,[48-58] and cohort studies.[59-61]

Quality of the Included Papers

Quality scores for the three most investigated psychometric characteristics (“measurement error,” “hypothesis testing,” “responsiveness”) varied substantially between studies, with at least one study per characteristic receiving a score of excellent, good, fair, and poor. Substantially different scores were also seen for different characteristics within the same study. For example, Rivero-Arrias et al[40] was rated excellent for “measurement error,” fair for “hypothesis testing,” and poor for “responsiveness.” In addition to Rivero-Arrias et al,[40] only one other study was rated excellent on any aspect of methodological quality investigated (Hellum et al),[36] and this was for “hypothesis testing” and “responsiveness.” The only two studies for which it was relevant to assess cross-cultural validity received a score of fair (Klemenc-Ketis et al)[53] and good (Genevay et al).[51] Table 3 provides an overview of the quality assessment results for the included studies.

TABLE 3

Quality of the Included Papers

Name of the First Author and Year of Publication	Measurement Error	Hypothesis Testing	Cross Cultural Validity	Responsiveness
RCTs
Bastiaenen et al, 2008[25]	Good	Poor	n/r	Poor
Berg et al, 2009[26]	Fair	Poor	n/r	Fair
Berg et al, 2009[27]	Fair	Poor	n/r	Fair
Carr et al, 2005[28]	Fair	Poor	n/r	Fair
Casserley-Feeney et al, 2012[29]	Fair	Poor	n/r	Fair
Chown et al, 2008[30]	Fair	Poor	n/r	Fair
Cox et al, 2010[31]	Fair	Poor	n/r	Fair
Del Pozo-Cruz et al, 2011[32]	Good	Good	n/r	Good
Djais et al, 2005[33]	Fair	Poor	n/r	Fair
Gilbert et al, 2004[34]	Good	Good	n/r	Good
Gilbert et al, 2004[35]	Fair	Good	n/r	Fair
Hellum et al, 2011[36]	Good	Excellent	n/r	Excellent
Hurley et al, 2001[37]	Fair	Fair	n/r	Fair
Kendrick et al, 2001[38]	Fair	Fair	n/r	Fair
Miller et al, 2002[39]	Good	Good	n/r	Fair
Rivero-Arrias et al, 2006[40]	Excellent	Fair	n/r	Poor
Wilkens et al, 2010[41]	Good	Good	n/r	Good
Cross-sectional
Burstrom et al, 2001[42]	Poor	Good	n/r	Poor
Eker et al, 2007[43]	Good	Good	n/r	Poor
Klemenc-Ketis, 2011[44]	Poor	Fair	n/r	Poor
Muraki et al, 2011[45]	Poor	Fair	n/r	Poor
Muraki et al, 2012[46]	Poor	Fair	n/r	Poor
Sogaard et al, 2009[47]	Poor	Good	n/r	Poor
Observational longitudinal
Aghayev et al, 2010[48]	Fair	Poor	n/r	Poor
Cheshire et al, 2011[49]	Fair	Poor	n/r	Poor
Garratt et al, 2001[50]	Fair	Good	n/r	Good
Genevay et al, 2012[51]	Good	Good	Good	Good
Gutke et al, 2011[52]	Good	Good	n/r	Good
Klemenc-Ketis, 2011[53]	Poor	Fair	Fair	Poor
Kovacs et al, 2005[54]	Good	Good	n/r	Good
Kovacs et al, 2004[55]	Good	Good	n/r	Good
Parker et al, 2012[56]	Good	Good	n/r	Good
Schluessmann et al, 2009[57]	Fair	Fair	n/r	Fair
Suarez-Almazor et al, 2000[58]	Fair	Fair	n/r	Fair
Cohort studies
Gutke et al, 2006[59]	Poor	Good	n/r	Poor
Jansson et al, 2009[60]	Fair	Good	n/r	Fair
Van der Roer et al, 2006[61]	Fair	Fair	n/r	Good

n/r, not relevant.

HRQoL Measures Used

The most frequently used descriptive systems are shown in Table 4. As can be seen, EQ-5D was used in all the included studies, whereas SF-6D and HUI III were each used in only one study.[47,58] Other common measures were the Oswestry Disability Index (ODI), Roland Morris Disability Questionnaire (RMDQ), Aberdeen Back Pain Scale (ABPS), and Lumbar Spine Outcome Assessment Instrument (LSOA).

TABLE 4

Main Outcome Measures Reported by the Included Studies

Author, Year	Descriptive System			Rating Scale	Other Instruments Used (Generic non Preference Based, Clinical, Condition specific)
	EQ-5D	SF-6D	HUI III	VAS	SF-12 or SF-36	ODI	RDQ	NASS	ABPS
Aghayev et al, 2010[49]	√							√
Bastiaenen et al, 2008[24]	√			√	√		√
Berg et al, 2009[25]	√			√	√	√
Berg et al, 2009[26]	√			√	√	√
Burstrom et al, 2001[41]	√
Carr et al, 2005[27]	√				√		√
Casserley-Feeney et al, 2012[28]	√				√		√
Cheshire et al, 2011[50]	√
Chown et al, 2008[29]	√
Cox et al, 2010[30]	√						√		√
Del Pozo-Cruz et al, 2011[31]	√					√	√
Djais et al, 2005[32]	√			√			√
Eker et al, 2007[51]	√				√
Garratt et al, 2001[52]	√						√		√
Genevay et al, 2012[53]	√						√
Gilbert et al, 2004[33]	√				√				√
Gilbert et al, 2004[34]	√				√				√
Gutke et al, 2011[54]	√			√		√
Gutke et al, 2006[55]	√			√		√
Hellum et al, 2011[35]	√				√	√
Hurley et al, 2001[36]	√				√	√
Jansson et al, 2009[56]	√
Kendrick et al, 2001[37]	√			√			√
Klemenc-Ketis, 2011[42]	√			√		√
Klemenc-Ketis, 2011[57]	√					√
Kovacs et al, 2005[58]	√			√			√
Kovacs et al, 2004[59]	√			√			√
Miller et al, 2002[38]	√						√
Muraki et al, 2011[43]	√				√
Muraki et al, 2010[44]	√				√
Parker et al, 2012[45]	√			√	√
Rivero-Arrias et al, 2006[39]	√				√	√	√
Schluessmann et al, 2009[46]	√			√				√
Sogaard et al, 2009[47]	√	√			√
Suarez-Almazor et al, 2000[48]	√		√		√	√
Van der Roer et al, 2006[60]	√
Wilkens et al, 2010[40]	√						√

ABPS, Aberdeen Back Pain Scale; EQ-5D, EuroQol 5 Dimensions; HUI III, Health Utility Index Mark 3; NASS, Lumbar Spine Outcome Assessment Instrument; ODI, Oswestry Disability Index; RDQ, Roland Morris Disability Questionnaire; SF-12, Short Form 12 Dimensions; SF-36, Short Form 36 Dimensions; SF-6D, Short Form 6 Dimensions; VAS, Visual Analogue Scale.

Validity

Convergent Validity Method

Correlations between the outcome measures were reported in 12 studies.[26,42-44,47,50-55,58]

Hypothesis 1

Correlation between EQ-5D and disease-specific instruments was assessed in 10 studies.[26,43,44,50-55,58] Five of them analyzed correlations between EQ-5D and ODI,[44,52,53,55,58] and results were generally moderate to strong (in absolute terms). Correlation coefficients were between 0.510 and 0.739 in three studies,[44,52,53] 0.48 in one,[58] and between 0.206 and 0.232 in one.[55] In one study data were too sparse to assess correlations.[43] Unexpectedly, the direction of the correlation differed across studies. Three studies assessed convergent validity between EQ-5D and RMDQ.[50,54,55] Correlations were moderate to very strong (ρ between −0.422 and −0.815) in all of them. EQ-5D was also found to correlate moderately with ABPS (r = −0.44) in one study[50] and with Specific Sexual Function Questions (r = −0.51)[26] and Core Outcome Measure Index (COMI) (r = −0.54)[51] in two others. One study[58] presented results for both EQ-5D and HUI III correlations with ODI and found moderate correlations at 3 and 6 months for both instruments. Correlations between HUI III and ODI were stronger than those between EQ-5D and ODI at 3 months but weaker at 6 months. Overall, given that only one study[55] did not meet our prior expectation of moderate-to-very strong correlations, the findings support the first hypothesis of convergent validity for the EQ-5D, and the limited evidence available supports the first hypothesis for the HUI III.
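A convergent validity check of this kind can be sketched as a rank correlation between a generic and a disease-specific score, compared with the 0.4 threshold of Hypothesis 1. The sketch below computes Spearman's ρ from scratch; the helper names and the patient scores are hypothetical, and the included studies may have used other coefficients.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences with at least two values")
    return pearson(average_ranks(x), average_ranks(y))


def meets_hypothesis_1(rho, threshold=0.4):
    """Hypothesis 1 looks at the size of the correlation; the sign depends on scoring direction."""
    return abs(rho) > threshold


# Hypothetical paired scores: higher EQ-5D is better, higher ODI is worse.
eq5d_index = [0.85, 0.70, 0.62, 0.40, 0.30]
odi_score = [10, 30, 22, 48, 60]
rho = spearman(eq5d_index, odi_score)
print(rho, meets_hypothesis_1(rho))
```

Because ODI scores fall as patients improve, a negative coefficient is the expected direction here, which is why the check uses the absolute value.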

Hypothesis 2

EQ-5D correlation with other HRQoL instruments was assessed in five studies.[42,43,47,54,55] Agreement between EQ-5D and the visual analogue scale (VAS) was examined in three studies.[42,54,55] Burstrom et al[42] reported a strong correlation between the two instruments (r = 0.67). Similarly, in the two papers of Kovacs et al,[54,55] correlations between EQ-5D and VAS were strong at follow-up: the correlation coefficients were 0.70 at 15 days and 0.76 at 60 days in one paper[54] and 0.67 at the 15-day time point in the other.[55] Contrary to what we expected, correlations were only moderate at baseline (r = 0.52[53] and r = 0.42).[54] In one study[47] the correlation between EQ-5D and SF-6D was moderate (r = 0.553). Similarly, one study found a moderate correlation between EQ-5D and SF-36 (r = 0.49).[43] Although some papers present data that support our prior expectation of positive and strong-to-very strong correlations for the second hypothesis, results are not conclusive given that moderate correlations were also frequently reported.

Hypothesis 3

Only one study presented results for the correlation between a generic and a disease construct-specific instrument. In detail, Genevay et al[51] found that EQ-5D was weakly associated with the symptom-specific COMI (r = −0.36), compared with a moderate correlation with the overall COMI (r = −0.54). This study supports the third hypothesis of convergent validity, namely weaker correlations between generic preference-based instruments and disease construct-specific instruments than between generic preference-based instruments and disease-specific instruments.

Known Group Method

Ten studies allowed an assessment of known groups for EQ-5D.[42,43,45,46,53,56,58-61]

Hypothesis 1

Five studies (six reports) permitted an assessment of EQ-5D validity against the first hypothesis.[45,46,56,58,59,61] Two reports referring to the same study[45,46] showed that EQ-5D was able to detect variations between groups with different severity grades of lumbar spondylitis. Differences were statistically significant. One study[59] showed that EQ-5D is able to distinguish between women without lumbar-pelvic pain, women with lumbar pain, women with pregnancy-related pelvic girdle pain, and women with combined pain. Differences were statistically significant between women without lumbar-pelvic pain and all the other groups, and between women with lumbar pain and women with combined pain. Differences between lumbar pain and pregnancy-related pelvic girdle pain were not statistically significant. One study[58] reported that EQ-5D differentiated between patients for whom the treatment was successful and patients who did not respond to it (P = 0.003). Parker et al[56] presented similar results for patients categorized according to three severity grades: stable, worst, and best clinical situation (P ≤ 0.005). EQ-5D presented the highest values for the best clinical situation and the lowest values for the worst situation. Van der Roer et al[61] reported similar results for the same severity groups, although the study did not provide results for statistical significance. Overall, EQ-5D responds well when tested on known groups of different severity, distinguishing between different grades of disability and therefore supporting the first hypothesis for known group methods.

Hypothesis 2

Only one study (two reports) permitted an assessment of the second hypothesis for known groups.[45,46] The two reports of Muraki et al[45,46] registered a higher mean score (P < 0.05) for those patients who declared not to have LBP compared with those with the symptom. This supports the second hypothesis of known group methods, which is the ability of generic preference-based measures to distinguish between patients and the general population.

Hypothesis 3

The third hypothesis of known group methods was tested in four studies.[42-44,60] All of them reported that women had lower EQ-5D utility scores than men with the same clinical condition, and the difference was always statistically significant. Results support the third hypothesis of known groups assessment (distinguishing between men and women).

Hypothesis 4

Only one study[61] permitted an evaluation of the fourth hypothesis of known groups. This study showed EQ-5D to perform well in differentiating patients with acute or recurrent LBP, presenting higher pain and dysfunction for the acute group. This confirms the fourth hypothesis of the study, namely the ability to distinguish between acute and recurrent LBP.

Responsiveness

Twenty-four studies allowed for an assessment of responsiveness.[25,27-30,32-41,49,50,53,54,56-58,60,61] Twenty-one of them reported TSS,[25,27-30,33-41,48,49,54,56,57,60,61] three of them ES,[32,53,58] and one of them SRM.[50]

Test of Statistical Significance Method

Hypothesis 1

Eighteen studies (19 reports) permitted an assessment of the first hypothesis of responsiveness.[25,27-30,33-37,40,41,48,49,54,56,57,60,61] Hellum et al[36] detected statistically significant improvements both in patients treated with disc prosthesis surgery and in patients treated with rehabilitation therapy. Schluessmann et al[57] presented significant changes in patients receiving total disc arthroplasty, with an EQ-5D mean score of 0.32 at baseline improving to 0.72 at 3 months and 0.73 at 1 year. Parker et al[56] registered a statistically significant improvement in EQ-5D after patients had undergone lumbar fusion. Berg et al,[27] Chown et al,[30] Aghayev et al,[48] and Cheshire et al[49] also reported statistically significant improvements. In the studies conducted by Bastiaenen et al,[25] Carr et al,[28] Casserley-Feeney et al,[29] Djais and Kalim,[33] Gilbert et al,[34,35] Hurley et al,[37] Jansson et al,[60] and Wilkens et al,[41] EQ-5D values appeared responsive to improvements because of the treatment of LBP, although these were not statistically significant. According to Kovacs et al,[54] Rivero-Arrias et al,[40] and Van der Roer et al,[61] the EQ-5D is responsive to variations in health status because of treatment. Overall, the first hypothesis for TSS holds, given that preference-based measures are able to detect changes because of treatment.

Hypothesis 2

Twelve studies permitted testing of the second hypothesis of responsiveness.[25,27-31,33-35,41,48,60] In Chown et al,[30] all patients assigned to the exercise, physiotherapy, or osteopathy groups improved, but patients in the osteopathy group reported significantly higher EQ-5D values than patients in the group exercise arm (P < 0.01). Similarly, Berg et al[27] registered a different increase in mean EQ-5D values from baseline to 1 year for patients assigned to the total disc replacement group compared with patients assigned to the fusion group, with total disc replacement being more effective (P < 0.05). Aghayev et al[48] found that EQ-5D was able to distinguish between patients receiving Dynardi total disc arthroplasty and patients receiving total disc replacement, with the differences between the two groups being statistically significant at P < 0.001. Gilbert et al[34,35] found that EQ-5D differentiated between magnetic resonance imaging and delayed magnetic resonance imaging at 8 and 24 months, and that differences were statistically significant at the latter follow-up. Seven other studies presented data that supported the second hypothesis, although results were not statistically significant.[25,28,29,31,33,36,41] Carr et al,[28] for instance, registered an increase in EQ-5D mean values of 0.028 from baseline to 3 months and of 0.045 from baseline to 12 months for the individual physiotherapy group, whereas improvements for the group exercise arm were milder. Similarly, Casserley-Feeney et al[29] reported that EQ-5D differed between public physiotherapy and private physiotherapy patients, Djais and Kalim[33] that the instrument was sensitive to differences between patients undergoing radiography and patients not undergoing radiography, and Wilkens et al[41] that it distinguished patients given glucosamine from patients given placebo. Bastiaenen et al,[25] Hellum et al,[36] and Cox et al[31] reported similar results. In one study,[60] EQ-5D differentiated between patients treated with macrodecompression, microscopic decompression, decompression and fusion, and fusion alone. These results confirm the ability of the EQ-5D to distinguish between the outcomes of different interventions.

Hypothesis 3

Fifteen studies (16 reports) permitted an assessment of the third hypothesis of responsiveness.[25,27-30,33-39,41,54,56,61] Twelve of them reported EQ-5D behavior that was coherent with the scores registered by other measures.[27-30,33-37,41,54,56] For example, Berg et al[27] registered an increase in EQ-5D values for the total disc replacement group at 1 year and a reduction of the mean value at 2 years, and similar trends were reported for ODI and VAS. In Parker et al,[56] EQ-5D and ODI results were also coherent. Similarly, Carr et al,[28] Chown et al,[30] Djais and Kalim,[33] Hurley et al,[37] and Kovacs et al[54] presented improvements that were well detected by both EQ-5D and RMDQ, Van der Roer et al[61] by EQ-5D and the Quebec Back Pain Disability Scale, and Gilbert et al[34,35] and Hellum et al[36] by EQ-5D and ABPS. Although EQ-5D and RMDQ also presented similar results in Casserley-Feeney et al,[29] this study showed that RMDQ is more sensitive than EQ-5D to small differences at low levels of disability. This lack of sensitivity to change in health states seems to be confirmed by other studies. For example, in Miller et al[39] RMDQ detected a small change in patients' status at 3 months that went undetected by EQ-5D, and in Bastiaenen et al[25] a similar problem occurred with EQ-5D and RMDQ at 6 months. In Kendrick et al,[38] median EQ-5D scores remained stable from baseline to 9 months, whereas RMDQ scores detected a small improvement in patients. Wilkens et al[41] also found an extremely small improvement registered by RMDQ at 1-year follow-up that was not registered by the EQ-5D. Overall, the evidence collected supports the third hypothesis of responsiveness, namely the ability to report changes coherent with those reported by other generic or disease-specific measures.

Effect Size and Standardized Response Mean

Three studies allowed ES to be tested[32,53,58] and one study SRM.[50] EQ-5D ES were moderate and statistically significant in two studies.[32,53] The third study[58] reported ES for both EQ-5D and HUI III, and found HUI III to be more discriminative than EQ-5D at 3 months, with effect sizes similar to those of ODI. At 6 months, both EQ-5D and HUI III were highly discriminative. One study presented EQ-5D SRM and found moderate responsiveness of the instrument.[50] ES and SRM were moderate to strong, therefore supporting the hypothesis of responsiveness. EQ-5D validity and responsiveness results are summarized in Table 5.

TABLE 5

EQ-5D Summary of Results

Author, Year	Convergent Validity			Validity—Known Groups				Responsiveness TSS			Responsiveness ES/SRM
	H1	H2	H3	H1	H2	H3	H4	H1	H2	H3	H1
Aghayev et al, 2010[49]				√				√	√
Bastiaenen et al, 2008[24]								±	±	X
Berg et al, 2009[25]	√
Berg et al, 2009[26]								√	√	√
Burstrom et al, 2001[41]		√				√
Carr et al, 2005[27]								±	±	√
Casserley-Feeney et al, 2012[28]								±	±	√
Cheshire et al, 2011[50]								√
Chown et al, 2008[29]								√	√	√
Cox et al, 2010[30]									±
Del Pozo-Cruz et al, 2011[31]											√
Djais et al, 2005[32]								±	±	√
Eker et al, 2007[51]	?	X				√
Garratt et al, 2001[52]	√										√
Genevay et al, 2012[53]	√		√
Gilbert et al, 2004[33]								±	√	√
Gilbert et al, 2004[34]								±	√	√
Gutke et al, 2011[54]	√
Gutke et al, 2006[55]				√
Hellum et al, 2011[35]								√	±	√
Hurley et al, 2001[36]								±		√
Jansson et al, 2009[56]						√		±	–
Kendrick et al, 2001[37]										X
Klemenc-Ketis, 2011[42]	√					√
Klemenc-Ketis, 2011[57]	√										√
Kovacs et al, 2005[58]	√	√ X						–		√
Kovacs et al, 2004[59]	√	√ X
Miller et al, 2002[38]										X
Muraki et al, 2011[43]				√	√
Muraki et al, 2010[44]				√	√
Parker et al, 2012[45]				√				√		√
Rivero-Arrias et al, 2006[39]								–
Schluessmann et al, 2009[46]								√
Sogaard et al, 2009[47]		X
Suarez-Almazor et al, 2000[48]	√			√							√
Van der Roer et al, 2006[60]				√			√	–
Wilkens et al, 2010[40]								±	±	√ X

Keys: √ meeting prior expectations; ± trend meeting prior expectations but not statistically significant; – trend meeting prior expectations but no test of statistical significance performed; X trend not meeting prior expectations; ? mixed/not possible to assess.

When two keys for the same item are used, it is because more than one result was found.

ES indicates effect size; H1, hypothesis 1; H2, hypothesis 2; H3, hypothesis 3; H4, hypothesis 4; SRM, standardized response mean; TSS, test of statistical significance.

DISCUSSION

The 35 studies (37 reports) included in this systematic review show that LBP decreases HRQoL and that EQ-5D is generally able to detect improvements and deteriorations in health states because of health interventions or disease progression. Comparing our results with those of similar researches it emerges that EQ-5D performs well in LBP populations. In a review of Tosh et al[3] EQ-5D correlation with visual acuity, a disease-specific instrument for visual disorders, was often poor or nonsignificant for patients with age-related macular degeneration and cataracts. Similarly, a review of Papaioannou et al[62] found generally modest and mostly weak correlations between EQ-5D and disease-specific instruments such as brief psychiatry rating scale and quality-of-life scale, two-schizophrenia HRQoL measures. In light of this, the commonly moderate-to-strong correlations between EQ-5D and disease-specific instruments found in our study show a good performance of the instrument. Differently from what it was hypothesized, EQ-5D correlation with other generic instruments was strong at follow-ups, but only moderate at baseline. Weaker correlations for baseline data might be because of EQ-5D being more sensitive to the lower end of the utility scale,[63] EQ-5D having more distributed frequencies among spine patients compared with other generic instruments[64] (the effect of which is lower mean values for patients in worst health states), or EQ-5D measuring constructs that are relevant for greater disability levels than other generic instruments. Nevertheless, moderate correlations between general preference-based instruments have already been seen in other studies (e.g., Dyer et al),[4] thus this behavior cannot be considered proper evidence against the instrument validity. EQ-5D known group assessment showed statistically significant differences between different disease severities, patients with/without LBP and respondents/nonrespondents to treatments. 
There was also strong and statistically significant evidence that EQ-5D can distinguish between women's and men's perceptions of health, with HRQoL values being lower for women than for men. These results support our prior hypothesis and are in line with those of systematic reviews of EQ-5D validity in other populations (e.g., Peasgood et al[65]). EQ-5D appears to be a responsive instrument, although less responsive than disease-specific ones. This is not surprising, because disease-specific and generic preference-based instruments are not perfect substitutes: disease-specific instruments contain only items or health dimensions relevant to the condition examined, whereas generic instruments assess all domains of HRQoL.

By contrast, generic preference-based instruments are meant to be perfect substitutes for one another, at least in theory. This systematic review found few data comparing generic instruments. One study found HUI III to be more responsive than EQ-5D at 3 months and equally responsive at 6 months. Another study reported only a moderate correlation between EQ-5D and SF-6D. These results suggest that the three preference-based instruments are not equivalent measures of HRQoL and that they assess different domains. However, the results cannot be considered conclusive, and a study estimating direct correlations between generic instruments would be useful.

This systematic review has some limitations. First, some of the included studies had small sample sizes, which might be one reason for the lack of statistical significance in some reports. Second, the included studies gave little information on missing data caused by nonrespondents and on how these were handled. Finally, some of the included studies did not control for age, sex, social status, and other variables that can influence LBP evaluation. Nevertheless, our systematic review represents an important effort.
It suggests that EQ-5D performs well in LBP populations and that its scores are suitable for economic evaluation of LBP interventions, and it recommends using EQ-5D in combination with disease-specific instruments for clinical evaluation, given its lower sensitivity to change in health state compared with them. Results for SF-6D and HUI III are too scarce to draw any conclusion.

EQ-5D showed good validity and responsiveness in patients with low back pain. EQ-5D can be used for economic evaluation of interventions targeting low back pain. EQ-5D appears unable to detect changes in health status at lower levels of severity. Assessment for SF-6D and HUI III was not possible because of lack of evidence.
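
As an illustration of the responsiveness statistics extracted in this review, the sketch below computes an effect size (mean change divided by the standard deviation of baseline scores) and a standardized response mean (mean change divided by the standard deviation of the change scores) using their conventional definitions. The function names and the utility scores are hypothetical and chosen only for illustration; they are not data from any included study, and individual studies may have used other variants of these statistics.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Effect size (ES): mean change divided by the SD of baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the individual change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical utility scores for five patients (illustrative only).
baseline = [0.40, 0.55, 0.30, 0.62, 0.48]
follow_up = [0.60, 0.70, 0.52, 0.69, 0.66]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Both statistics scale the same mean change by a different measure of variability, which is why an instrument can rank differently on ES and SRM within the same sample.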


Review 1. Evaluating common outcomes for measuring treatment success for chronic low back pain.

Authors: Jens R Chapman; Daniel C Norvell; Jeffrey T Hermsmeyer; Richard J Bransford; John DeVine; Matthew J McGirt; Michael J Lee
Journal: Spine (Phila Pa 1976) Date: 2011-10-01 Impact factor: 3.468

2. The access randomized clinical trial of public versus private physiotherapy for low back pain.

Authors: Sarah N Casserley-Feeney; Leslie Daly; Deirdre A Hurley
Journal: Spine (Phila Pa 1976) Date: 2012-01-15 Impact factor: 3.468

3. Reliability and validity of the cross-culturally adapted French version of the Core Outcome Measures Index (COMI) in patients with low back pain.

Authors: Stéphane Genevay; Christine Cedraschi; Marc Marty; Sylvie Rozenberg; Pierre De Goumoëns; Antonio Faundez; Federico Balagué; François Porchet; Anne F Mannion
Journal: Eur Spine J Date: 2011-09-01 Impact factor: 3.134

4. Effects of whole body vibration therapy on main outcome measures for chronic non-specific low back pain: a single-blind randomized controlled trial.

Authors: Borja del Pozo-Cruz; Miguel A Hernández Mocholí; Jose C Adsuar; Jose A Parraca; Inmaculada Muro; Narcis Gusi
Journal: J Rehabil Med Date: 2011-07 Impact factor: 2.912

Review 5. Cost-effectiveness of guideline-endorsed treatments for low back pain: a systematic review.

Authors: Chung-Wei Christine Lin; Marion Haas; Chris G Maher; Luciana A C Machado; Maurits W van Tulder
Journal: Eur Spine J Date: 2011-01-13 Impact factor: 3.134

Review 6. How valid and responsive are generic health status measures, such as EQ-5D and SF-36, in schizophrenia? A systematic review.

Authors: Diana Papaioannou; John Brazier; Glenys Parry
Journal: Value Health Date: 2011-07-28 Impact factor: 5.725

7. Is it feasible and effective to provide osteopathy and acupuncture for patients with musculoskeletal problems in a GP setting? A service evaluation.

Authors: Anna Cheshire; Marie Polley; David Peters; Damien Ridge
Journal: BMC Fam Pract Date: 2011-06-13 Impact factor: 2.497

8. Surgery with disc prosthesis versus rehabilitation in patients with low back pain and degenerative disc: two year follow-up of randomised study.

Authors: Christian Hellum; Lars Gunnar Johnsen; Kjersti Storheim; Oystein P Nygaard; Jens Ivar Brox; Ivar Rossvoll; Magne Rø; Leiv Sandvik; Oliver Grundnes
Journal: BMJ Date: 2011-05-19

9. Fitness and health-related quality of life dimensions in community-dwelling middle aged and older adults.

Authors: Pedro R Olivares; Narcis Gusi; Josue Prieto; Miguel A Hernandez-Mocholi
Journal: Health Qual Life Outcomes Date: 2011-12-22 Impact factor: 3.186

Review 10. A review of generic preference-based measures of health-related quality of life in visual disorders.

Authors: Jonathan Tosh; John Brazier; Philippa Evans; Louise Longworth
Journal: Value Health Date: 2011-10-01 Impact factor: 5.725
