
Core outcome measurement instruments for clinical trials in nonspecific low back pain.

Alessandro Chiarotto1,2, Maarten Boers1,3, Richard A Deyo4, Rachelle Buchbinder5,6, Terry P Corbin7, Leonardo O P Costa8, Nadine E Foster9, Margreth Grotle10,11, Bart W Koes12, Francisco M Kovacs13, C-W Christine Lin14, Chris G Maher14, Adam M Pearson15, Wilco C Peul16, Mark L Schoene17, Dennis C Turk18, Maurits W van Tulder2, Caroline B Terwee1, Raymond W Ostelo1,2.   

Abstract

To standardize outcome reporting in clinical trials of patients with nonspecific low back pain, an international multidisciplinary panel recommended physical functioning, pain intensity, and health-related quality of life (HRQoL) as core outcome domains. Given the lack of a consensus on measurement instruments for these 3 domains in patients with low back pain, this study aimed to generate such consensus. The measurement properties of 17 patient-reported outcome measures for physical functioning, 3 for pain intensity, and 5 for HRQoL were appraised in 3 systematic reviews following the COSMIN methodology. Researchers, clinicians, and patients (n = 207) were invited in a 2-round Delphi survey to generate consensus (≥67% agreement among participants) on which instruments to endorse. Response rates were 44% and 41%, respectively. In round 1, consensus was achieved on the Oswestry Disability Index version 2.1a for physical functioning (78% agreement) and the Numeric Rating Scale (NRS) for pain intensity (75% agreement). No consensus was achieved on any HRQoL instrument, although the Short Form 12 (SF12) approached the consensus threshold (64% agreement). In round 2, a consensus was reached on an NRS version with a 1-week recall period (96% agreement). Various participants requested 1 free-to-use instrument per domain. Considering all issues together, recommendations on core instruments were formulated: Oswestry Disability Index version 2.1a or 24-item Roland-Morris Disability Questionnaire for physical functioning, NRS for pain intensity, and SF12 or 10-item PROMIS Global Health form for HRQoL. Further studies need to fill the evidence gaps on the measurement properties of these and other instruments.


Year:  2018        PMID: 29194127      PMCID: PMC5828378          DOI: 10.1097/j.pain.0000000000001117

Source DB:  PubMed          Journal:  Pain        ISSN: 0304-3959            Impact factor:   6.961


1. Introduction

Low back pain (LBP) represents the leading cause of years lived with disability globally, ranking first in both developed and developing countries.[46] The mean lifetime prevalence of LBP is estimated to be 39%, with a mean point prevalence of 18%.[58] The costs of LBP constitute a major burden to health care systems and society.[32,76] Most commonly, a specific pathoanatomical cause cannot be identified for LBP, so its most prevalent form is nonspecific LBP (nsLBP).[79] The number of randomized controlled trials assessing the effectiveness of health interventions in nsLBP has substantially increased over the past 2 decades.[12] Heterogeneity in the choice of outcomes and measurement instruments assessed in clinical trials hampers comparisons between studies and systematic reviews summarizing them.[72,73] In several medical fields including nsLBP, this is a major issue.[53,70,77] It can be addressed by agreeing on a standardized set of outcomes that should be measured and reported in all clinical trials on a specific health condition: a core outcome set (COS).[7,19,113] A COS does not preclude the choice of primary or secondary outcomes that are not in the COS, but ensures that important outcomes are consistently assessed.[7,19,113] A COS specific to LBP was introduced 20 years ago by a group of experienced researchers and clinicians.[8,30] Deyo et al.[30] and Bombardier[8] proposed 5 core outcome domains to be measured in LBP clinical research: back-specific function, pain symptoms, generic health status, work disability, and satisfaction with care; for each of these domains, 1 or 2 patient-reported outcome measures (PROMs) were also suggested. 
More recently, we initiated an international Steering Committee to build on this existing proposal, by consulting the up-to-date methodology of the Core Outcome Measures in Effectiveness Trials (COMET) and Outcome Measures in Rheumatology (OMERACT) initiatives[6,7,92,104,111,112] to develop a COS applicable to clinical trials in patients with nsLBP.[22] Developing a COS is a 2-step consensus process that involves, first, determining the core outcome domains (“core domain set”), and second, selecting the best outcome measurement instruments to measure these domains (“core outcome measurement set”).[7,19,113] For nsLBP, a consensus was achieved on 4 core outcome domains: physical functioning, pain intensity, health-related quality of life (HRQoL), and number of deaths.[16] The domain number of deaths was included in line with the OMERACT mandatory requirement to have at least 1 domain in the core area “Death”[7] and because it is good practice for any trial to report on this domain; it can be covered with a simple statement reporting how many deaths occurred in a trial.[16] However, there is no consensus on measurement instruments for the other 3 core outcome domains. The selection of core outcome measurement instruments comprises the following steps: (1) identifying potential core instruments, (2) evaluating their measurement properties and feasibility, and (3) reaching a consensus on those that should be recommended.[6,92] The objective of this study was to formulate recommendations on core outcome measurement instruments for clinical trials in patients with nsLBP.

2. Methods

An international Steering Committee, including 19 members, worked on the development of this COS: 17 researchers and/or clinicians (A.C., M.B., R.A.D., R.B., L.O.P.C., N.E.F., M.G., B.W.K., F.M.K., C.-W.C.L., C.G.M., A.M.P., W.C.P., D.C.T., M.W.v.T., C.B.T., and R.W.O.) and 2 patients' representatives (T.P.C. and M.L.S.). A 4-member project team comprising a subset of the Steering Committee (A.C., M.B., C.B.T., and R.W.O.) oversaw the initiative. The committee expertise included the following: anesthesiology, epidemiology, internal medicine, orthopaedics, physical therapy, neurosurgery, primary care, psychology, rehabilitation, and rheumatology. The intent was to develop a COS applicable to the measurement of efficacy or effectiveness of health interventions assessed in all clinical trials for patients with nsLBP, defined as “LBP not attributable to a recognizable, known specific pathology (eg, infection, tumour, fracture, and axial spondyloarthritis).”[22] Therefore, this COS applies to all interventions, regardless of type, setting, frequency, or mode of administration. Following COMET and OMERACT definitions,[7,113] this COS does not prescribe primary outcomes. Rather, it recommends outcome domains and measurement instruments that should be included in each individual trial, alongside additional trial-specific outcomes. The selection of instruments for physical functioning, pain intensity, and HRQoL was guided by the OMERACT handbook,[6] and the consensus-based guidance of the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative in cooperation with COMET.[92] In the Netherlands, this type of study does not fall within the scope of the Dutch Medical Research in Human Subjects Act (WMO); therefore, it was exempt from ethical approval by a University Ethics Committee.

2.1. Identification of potential core outcome measurement instruments

The Steering Committee selected a preliminary set of outcome measurement instruments for the core domains, choosing among those frequently used in clinical trials[15,44] and those recommended by other initiatives aimed at standardizing measurements for LBP[8,24,30,31] or chronic pain.[34] It was considered that these criteria (ie, already in frequent use and recommended by others) would facilitate implementation of this COS. The project team performed an initial screening to determine whether an instrument had good face validity to measure the domain and was feasible (eg, accessibility, cost, and availability of translations) for inclusion in a COS.[6] A previous systematic review linking LBP-specific PROMs content to the International Classification of Functioning was consulted to support decisions on face validity.[49] Only PROMs were selected because they are feasible and the most frequently used and recommended tools in the LBP literature.[8,15,24,30,31,34,44]

2.2. Appraisal of measurement properties of outcome measurement instruments

The COSMIN initiative[83] previously identified 9 measurement properties relevant for PROMs: internal consistency, test-retest reliability, measurement error, content validity, construct validity, structural validity, criterion validity, cross-cultural validity, and responsiveness.[85] Three systematic reviews (for physical functioning, pain intensity, and HRQoL) summarized and appraised the evidence on these measurement properties in patients with nsLBP (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review: Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review: Unpublished data; and Ref. 18). These reviews were conducted according to the recently updated COSMIN methodology for this type of review (Prinsen et al., 2018. COSMIN guideline for systematic reviews of patient-reported outcome measures: Unpublished data); a more detailed description of their methodology is presented elsewhere (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review: Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review: Unpublished data; and Ref. 18).

2.3. Delphi study

A consensus procedure is recommended to find an agreement on core outcome measurement instruments.[7,92] An online modified Delphi survey was chosen as it is a widely used method to establish a consensus on various health- and research-related issues[47,63,74,85,105]; allows participation of a broad, international, and multistakeholder panel of ‘experts’; enables reconsideration of participants' views based on responses from others; and preserves anonymity among respondents.[51,98] Authors of at least 2 publications comprising psychometric or clinimetric studies, randomized clinical trials, or systematic reviews of clinical trials in patients with nsLBP were selected to participate. This selection was performed among 280 people invited to participate in the Delphi study on core outcome domains for nsLBP (selected with a systematic approach, as explained elsewhere[16,22]), members of the Initiative on Methods, Measurement and Pain Assessment in Clinical Trials (IMMPACT) executive, authors of the 2 most recent IMMPACT publications,[37,103] and 39 members of the OMERACT pain working group. To retrieve the publications, a PubMed search was performed on October 18, 2016, by 1 reviewer (A.C.) combining authors' names with MeSH terms and key words referring to LBP. All eligible authors were invited to participate in the Delphi study; all Steering Committee members were also invited. Two Delphi rounds were run: the first between October 19 and November 9, 2016, the second between December 13, 2016, and January 17, 2017. Before invitation, the content of each round was pilot tested by at least 4 Steering Committee members. Selected participants were invited to participate in both rounds, unless they explicitly indicated that they did not wish to participate. During each round, 2 reminders were sent to people who had not responded.
Participants were asked about sociodemographic (eg, nationality and sex) and professional characteristics (eg, current role and number of clinical trials in nsLBP). Given the high LBP point prevalence,[58] all participants were asked whether they currently had nsLBP, and those answering positively were specifically requested to also consider their patient perspective when responding to the Delphi survey. These professionals were also considered as part of the patient stakeholder group, together with patient representatives. Proposals were presented in the Delphi survey as closed questions in which participants could answer on a 5-point Likert scale ranging from “Strongly disagree/Absolutely no” to “Strongly agree/Absolutely yes” and give reasons for their answers. Because Delphi studies rely on reaching a consensus, no sample size calculation was required. Consensus was defined a priori as at least 67% of the total number of participants (dis)agreeing with a proposal (ie, “Strongly (dis)agree” and “(Dis)Agree” answers were pooled together). This criterion is in line with previous Delphi studies (Terwee et al., 2018. COSMIN standards and criteria for evaluating the content validity of patient-reported outcome measures: a Delphi study: Unpublished data; and Refs. 16, 87, 88, 90). Consistency of results was assessed by separately calculating proportions for each stakeholder group (ie, researchers, clinicians, and patients). The online software SurveyMonkey (SurveyMonkey, Palo Alto, CA) was used.
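The consensus rule described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function name `classify_consensus`, the response labels, and the example votes are assumptions made for illustration; only the 67% threshold and the pooling of "strongly (dis)agree" with "(dis)agree" come from the text.

```python
from collections import Counter

CONSENSUS_THRESHOLD = 0.67  # a priori threshold stated in the text

# Assumed labels for the pooled ends of the 5-point Likert scale
AGREE = {"strongly agree", "agree"}
DISAGREE = {"strongly disagree", "disagree"}


def classify_consensus(responses, threshold=CONSENSUS_THRESHOLD):
    """Classify one proposal from a list of Likert answers (hypothetical helper)."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(r.strip().lower() for r in responses)
    total = len(responses)
    # Pool "strongly (dis)agree" with "(dis)agree", as described in the text
    agree_share = sum(counts[label] for label in AGREE) / total
    disagree_share = sum(counts[label] for label in DISAGREE) / total
    if agree_share >= threshold:
        return "endorse"
    if disagree_share >= threshold:
        return "do not endorse"
    return "no consensus"


# Illustrative votes only (not study data): 78 of 100 answers agree
votes = ["strongly agree"] * 28 + ["agree"] * 50 + ["unsure"] * 12 + ["disagree"] * 10
print(classify_consensus(votes))  # endorse
```

Running the same function on the answers of each stakeholder group separately would mirror the consistency check mentioned in the text.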

2.3.1. Delphi round 1

There is a consensus that the minimum requirement to include a PROM in a COS is that it has high quality evidence for sufficient content validity,[92] but in the systematic reviews this criterion was not met by any instrument (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review: Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review: Unpublished data; and Ref. 18). Despite this, a proposal was made in the first round, before the actual consensus procedure commenced, for recommending core instruments based on the following reasoning: the absence of high quality evidence does not equate to insufficient content validity, not endorsing any instrument may hamper design and conduct of future trials, and there is a need to update the 20-year-old recommendations.[8,30] Subsequently, participants were asked whether they agreed or disagreed with the endorsement of each potential core instrument for inclusion in the COS, taking into account the instrument itself, its measurement properties, and characteristics (synthesized in a table comparing multiple PROMs for the same domain). To facilitate the interpretation of the summary of evidence on measurement properties, colored smiley faces were used for each measurement property of each instrument (eg, a green happy smiley face indicated a high or moderate quality evidence of sufficient results). The order of PROM presentation was randomized across participants. Finally, 2 open questions were asked to participants for additional potential core instruments and for generic feedback on the Delphi and the COS development process. One reviewer (A.C.) read all comments and selected the most consistent and/or substantial ones for discussion together with quantitative results in face-to-face meetings with the other members of the project team.

2.3.2. Delphi round 2

In the second round, participants were presented with the results of Round 1, including their own ratings, those of the total Delphi panel and those of each stakeholder group; a selection of illustrative comments describing participants' reasoning was also displayed. The full feedback report with all comments was emailed to the participants. Patient-reported outcome measures for which there was a consensus for endorsement in the first round were rediscussed only to address some specific aspects (eg, feasibility and characteristics). Patient-reported outcome measures without a consensus were presented again for voting only if they had at least 50% of participants in favor of the endorsement or if any substantial remark favored their endorsement. If no consensus was found on any instrument for a domain, all potential core instruments for that domain were presented again for rating. The round concluded with an open question asking for suggestions for the research agenda.
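The round 2 presentation rules above can be sketched as a small decision function. This is an illustrative reading of the text, not the authors' procedure: `Round1Result` and `round2_action` are hypothetical names, the handling of instruments with a consensus against endorsement is an assumption (the text does not state it), and "no consensus for a domain" is assumed to mean no consensus in either direction. The shares in the example are the round 1 figures reported in the Results; shares the text does not report are left at a placeholder of 0.0.

```python
from dataclasses import dataclass

CONSENSUS = 0.67     # a priori consensus threshold from the text
REVOTE_FLOOR = 0.50  # "at least 50% of participants in favor"


@dataclass
class Round1Result:
    name: str
    in_favor: float = 0.0  # share in favor; 0.0 is a placeholder when not reported
    against: float = 0.0   # share against; 0.0 is a placeholder when not reported
    substantial_remark: bool = False  # a substantial comment favored endorsement


def round2_action(result, domain_results):
    """Decide how one PROM is handled in round 2 (hypothetical helper)."""
    if result.in_favor >= CONSENSUS:
        return "rediscuss specific aspects"
    if result.against >= CONSENSUS:
        return "not presented again"  # assumption: this case is not stated in the text
    domain_has_consensus = any(
        r.in_favor >= CONSENSUS or r.against >= CONSENSUS for r in domain_results
    )
    if not domain_has_consensus:
        return "re-rate with all instruments of the domain"
    if result.in_favor >= REVOTE_FLOOR or result.substantial_remark:
        return "vote again"
    return "not presented again"


physical = [
    Round1Result("ODI 2.1a", in_favor=0.78),
    Round1Result("QBPDS", in_favor=0.62),
    Round1Result("RMDQ-24", in_favor=0.50),
    Round1Result("RMDQ-23", against=0.71),
]
for r in physical:
    print(r.name, "->", round2_action(r, physical))
```

Under these assumptions the sketch reproduces what the Results describe: ODI 2.1a is rediscussed, while QBPDS and RMDQ-24 are put to a second vote.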

2.4. Recommendations on core outcome measurement instruments

The Delphi results were discussed in a face-to-face meeting of the project team. A first proposal on recommendations for core outcome measurement instruments for clinical trials in nsLBP was formulated and sent to all members of the Steering Committee for review. The committee feedback was considered in a second face-to-face meeting of the project team, after which a refined proposal was sent to the Steering Committee for further revision. Once approval was obtained from all committee members, the recommendations were considered ready for reporting.

3. Results

3.1. Potential core outcome measurement instruments

Seventeen PROMs were selected as potential core instruments for physical functioning, 3 for pain intensity, and 5 for HRQoL (Table 1).[1,5,9,10,13,14,22,23,28,29,33,36,38,40,42,54,59,62,64,71,75,80,81,89,94,95,101,102,107,108] There are multiple versions of both the Roland-Morris Disability Questionnaire (RMDQ) and Oswestry Disability Index (ODI), the most widely used physical functioning PROMs in LBP.[15,44] Several versions with sufficient face validity were included (Table 1). The Pain Interference subscale of the Brief Pain Inventory (BPI-PI) and the Pain Interference items of the Multidimensional Pain Inventory (MPI-PI) were included because they had been recommended as generic instruments to measure physical functioning in chronic pain.[34]
Table 1

Patient-reported outcome measures selected as potential core outcome measurement instruments to measure physical functioning, pain intensity, and health-related quality of life in clinical trials in nonspecific low back pain.

The NIH Task Force report for research standards for chronic LBP recommended the 4-item Patient-Reported Outcomes Measurement Information System Physical Function short form (PROMIS-PF-4) to measure physical functioning[31]; in this Delphi the standard 4-, 6-, 8-, 10-, and 20-item PROMIS-PF short forms[2,40,95] were included as potential core instruments. The 36-item Short Form Health Survey (SF36) is the most frequently used PROM to measure HRQoL in LBP[15] and its physical functioning subscale (SF36-PF) was also included as a standalone instrument for physical functioning (Table 1). The Sickness Impact Profile is one of the most frequently used tools to measure HRQoL in LBP,[15] but it was not selected because its length (ie, 136 items) was considered excessively burdensome for inclusion in a COS. The 10-item PROMIS Global Health short form (PROMIS-GH-10) is not broadly used, but it was included for HRQoL as its face validity was judged to be similar to that of the other selected PROMs and because recently it was recommended by another core set initiative[96] (Table 1).

3.2. Measurement properties of the potential core outcome measurement instruments

The systematic review on physical functioning PROMs revealed low or very low quality evidence underpinning the content validity of all the PROMs, with the exception of the 24-item RMDQ (RMDQ-24), which displayed high quality evidence of insufficient comprehensiveness and sufficient comprehensibility.[18] High quality evidence of insufficient unidimensionality was found for ODI 1.0, RMDQ-24, and RMDQ-18; unidimensionality of other PROMs was underpinned by moderate quality evidence, or no studies were found (Appendix 2, available online as supplemental digital content at http://links.lww.com/PAIN/A511).[18] The systematic review on pain intensity PROMs highlighted that content validity of visual analogue scale (VAS), Numeric Rating Scale (NRS), and pain severity subscale of the Brief Pain Inventory (BPI-PS) was underpinned by (very) low quality evidence (Appendix 2, available online as supplemental digital content at http://links.lww.com/PAIN/A511) (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data). High quality evidence was found only for insufficient measurement error of the NRS. Moderate quality evidence was found for sufficient structural validity and internal consistency of BPI-PS, inconsistent construct validity of BPI-PS, and inconsistent responsiveness of NRS. There was lower quality evidence or no studies on the other measurement properties of these 3 instruments (Appendix 2, available online as supplemental digital content at http://links.lww.com/PAIN/A511) (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data). 
In the systematic review on HRQoL PROMs, very low quality evidence was found on the content validity of each PROM (Appendix 2, available online as supplemental digital content at http://links.lww.com/PAIN/A511) (Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data). High quality evidence was found only for insufficient construct validity of EuroQol 5D (EQ-5D) utility and VAS scores. Moderate quality evidence was found for inconsistent construct validity of component summaries of the SF36 and for inconsistent responsiveness of the EQ-5D utility score. All other measurement properties were underpinned by lower quality evidence or not assessed (Appendix 2, available online as supplemental digital content at http://links.lww.com/PAIN/A511) (Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data). A detailed presentation of results of these reviews is available elsewhere (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data; and Ref. 18).

3.3. Delphi study

In total, 207 people were invited to participate in the Delphi study, and response rates in the 2 rounds were 44% and 41%, respectively (Fig. 1). Most participants were from the United States, the Netherlands, United Kingdom, and Australia; the most represented disciplines were epidemiology, physical therapy, human movement sciences, psychology, and orthopedics (Table 2). In round 1, 13 participants had LBP: 11 were male, mean (SD) age was 56 (8) years, 7 were classified as nsLBP by a health care professional, 11 had pain lasting for more than 1 year, 1 had pain spreading down the legs, and none had received an LBP operation or disability compensation. In round 2, 14 participants reported LBP with similar characteristics.
Figure 1.

Flowchart of participants in the Delphi study on core outcome measurement instruments for clinical trials in nonspecific low back pain (LBP).

Table 2

Characteristics of participants in the Delphi study.


3.3.1. Delphi round 1

In the first round, there was a consensus (90%) to provisionally recommend core outcome measurement instruments, despite the absence of adequate evidence to support the PROMs' content validity. Several participants emphasized that core instruments should be recommended because COS development and/or PROM validity are moving fields in which results are always provisional, so this should not prevent recommendations on the best available instruments. There was also a consensus (90%) to reduce the list of potential core instruments for physical functioning because for 8 of them (ODI 1.0, Chiropractic Low Back Pain Disability Questionnaire [CLBPDQ], Modified Low Back Pain Disability Questionnaire [MLBPDQ], RMDQ-18, LBPRS-DI, PROMIS-PF-4, PROMIS-PF-6, and PROMIS-PF-10), there were convincing arguments against endorsement. The main reasons were that some of these PROMs had been cross-culturally adapted into very few languages (ie, ODI 1.0, CLBPDQ, MLBPDQ, and LBPRS-DI)[18] or that they could be extracted from other instruments included in the list of potential core instruments (ie, RMDQ-18, PROMIS-PF-4, PROMIS-PF-6, and PROMIS-PF-10). Regarding the remaining physical functioning PROMs, 78% of the panel agreed to endorse ODI 2.1a as a core outcome measurement instrument, whereas 71% and 70% agreed on not endorsing RMDQ-23 and MPI-PI, respectively (Fig. 2). No consensus was reached on the other 6 PROMs, with Quebec Back Pain Disability Scale (QBPDS) (62% in favor and 24% unsure) and RMDQ-24 (50% in favor and 26% unsure) being the second and third highest in endorsement (Fig. 2). These results were consistent across stakeholder groups.
Figure 2.

Delphi endorsement of 9 physical functioning tools as core outcome measurement instruments for clinical trials in nonspecific low back pain (round 1).

For pain intensity, NRS was endorsed (75%), but the panel was split on BPI-PS (47% in favor and 24% against) and VAS (46% in favor and 36% against) (Fig. 3). For HRQoL, the panel was unsure for all included instruments, with the Short Form Health Survey 12 (SF12) being the closest to endorsement (64% in favor and 21% unsure) (Fig. 4A). Single participants suggested 11 additional potential instruments, whereas 2 participants suggested the PROMIS pain interference instrument. Two participants highlighted that the generic information supplied on the costs of using PROMs may not be correct and that more precise costs for each instrument should have been reported. Four participants expressed the concern that the instruments considered may be “dated” and 2 of these participants suggested that new instruments should be developed. Two other participants criticized our systematic reviews for pain intensity and HRQoL PROMs on the basis that they should have included studies in all pain conditions.
Figure 3.

Delphi endorsement of 3 pain intensity tools as core outcome measurement instruments for clinical trials in nonspecific low back pain (round 1).

Figure 4.

Delphi endorsement of 4 health-related quality of life tools as core outcome measurement instruments for clinical trials in nonspecific low back pain (round 1 and round 2).


3.3.2. Delphi round 2

In the second round, the exact cost for the use of each instrument was presented together with information on characteristics and measurement properties. Given the inconsistency of suggestions for additional potential core instruments, none were added to this round. For physical functioning, because a consensus on endorsing ODI 2.1a was reached in Round 1, participants were asked whether they could see any major argument against its endorsement. Eleven participants responded that they were concerned with its fees (350€/study for funded academic research and 0€/study for nonfunded academic research),[3] arguing against any fee to use instruments for measuring core domains, expressing concerns that it could represent a barrier for funded academic research in low- and middle-income countries, and that fees might be increased once an instrument is recommended as core (Appendix 3, available online as supplemental digital content at http://links.lww.com/PAIN/A511). QBPDS and RMDQ-24 were presented again but no consensus was reached on their endorsement (ie, 54% in favor and 27% against for QBPDS, 52% in favor and 33% against for RMDQ-24). For pain intensity, because a consensus on endorsing NRS was achieved in round 1, participants were asked whether they agreed on endorsing an NRS referring to “average LBP intensity over the last week” in the introductory statement (Appendix 1, available online as supplemental digital content at http://links.lww.com/PAIN/A511), similar to other recommendations for LBP.[24,31] A strong consensus (96%) was achieved on endorsing this NRS version. For HRQoL, results were similar to round 1, with the SF12 being the highest on endorsement (51% in favor and 22% unsure) (Fig. 4B). 
The main reasons against endorsing these instruments were overlap of their content with physical functioning and pain intensity instruments; scarce validity for measuring HRQoL for EQ-5D; unfamiliarity and lack of testing in nsLBP for PROMIS-GH-10; high costs for SF36 and SF12; and excessive length of SF36. Various suggestions for the research agenda were made by the participants, with the most consistent being to investigate the measurement properties not fully assessed so far (9 participants), perform head-to-head comparison studies on measurement properties of recommended and not recommended PROMs (6), take PROMIS instruments more into account (4), develop a better outcome measurement instrument for LBP (3), develop a new instrument for HRQoL (2), develop an instrument for LBP that takes into account other constructs (eg, social participation) (2), use instruments that can be administered with computerized adaptive testing (CAT) (2), consider the recently developed Musculoskeletal Health Questionnaire[56] in future clinimetric studies (2), and assess the minimal important difference of the various instruments to explore whether it differs depending on patient characteristics and interventions (2).

3.4. Recommendations on core outcome measurement instruments

Considering the Delphi process results, the Steering Committee discussed and formulated a set of recommendations on measurement instruments to be used in nsLBP clinical trials (Table 3). These include ODI 2.1a and NRS to measure physical functioning and pain intensity, respectively. Given the concerns of Delphi participants and some committee members on the ODI 2.1a fees, the instrument's distributor was contacted to ask whether it was possible to eliminate or reduce the ODI 2.1a fee for funded academic research. Because this was not possible, the Steering Committee decided to also recommend the RMDQ-24 for physical functioning because it achieved the highest level of consensus among the free-to-use instruments (Fig. 2), but also because its measurement properties resemble those of ODI 2.1a in head-to-head comparison studies.[17] Despite a similar level of endorsement and measurement properties, the QBPDS was not recommended because of the same fee issue as the ODI 2.1a and also to limit the number of instruments for a single core domain.
Table 3

Core outcome measurement instruments for clinical trials in nonspecific low back pain.

The NRS with a 1-week recall period (Appendix 1, available online as supplemental digital content at http://links.lww.com/PAIN/A511) should be used to measure pain intensity in nsLBP trials. Because it is a free tool that obtained ample consensus in the Delphi, the Steering Committee does not recommend another instrument for pain intensity. However, researchers should note the limitations in its use for acute nsLBP trials when participants may have had pain for less than 1 week at baseline.[41,110] In these trials, the addition of an NRS with a 24-hour recall period is suggested. Despite the lack of a consensus for measuring HRQoL, to reduce measurement variability for this domain, we recommend the use of the SF12 as it was closest to a consensus (Fig. 4), but because it is not free of charge, the PROMIS-GH-10 is also recommended (Table 3). Both PROMs provide a physical and a mental summary score (Table 1), which allows pooling of their results in meta-analysis. The SF36 is not recommended because of its length. The EQ-5D is not recommended because of its cost; it results in a utility index, which is not possible to pool with data from other instruments and its content is strongly redundant given the domains physical functioning and pain intensity. However, the Steering Committee suggests inclusion of the EQ-5D (preferably EQ-5D-5L version[55,65]) in nsLBP clinical trials if there is an economic evaluation. In line with the suggestion of the NIH Task Force Report for chronic LBP,[31] no specific recommendations are made regarding time frames of outcome assessment and reporting of adverse events. Time frames should match the specific goals and feasibility of each clinical trial. Potential adverse events should preferably be specified before the start of a clinical trial and measured prospectively.
The Steering Committee suggests the use of previous consensus-based recommendations for reporting of outcome results[43] and for interpreting change scores on core instruments.[35,86]
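
For trialists who keep their outcome plan in a machine-readable form, the recommendations in this section can be written down as a small lookup. The following is a minimal sketch for illustration only; the names CORE_ALTERNATIVES and extra_instruments are hypothetical and are not part of this study or of any published tool. It assumes that a trial picks one instrument per domain from the listed alternatives and adds the extra instruments only under the two conditions described above (acute nsLBP trials and trials with an economic evaluation).

```python
from typing import Dict, List

# Recommended instruments per core domain, transcribed from the text above.
# For each domain, the list holds the alternatives; a trial uses one of them.
CORE_ALTERNATIVES: Dict[str, List[str]] = {
    "physical functioning": ["ODI 2.1a", "RMDQ-24"],
    "pain intensity": ["NRS (1-week recall)"],
    "HRQoL": ["SF12", "PROMIS-GH-10"],
}


def extra_instruments(acute: bool, economic_evaluation: bool) -> List[str]:
    """Return instruments suggested in addition to the core set.

    acute: participants may have had pain for less than 1 week at baseline.
    economic_evaluation: the trial includes an economic evaluation.
    """
    extras: List[str] = []
    if acute:
        # Suggested in addition to, not instead of, the 1-week recall NRS.
        extras.append("NRS (24-hour recall)")
    if economic_evaluation:
        # EQ-5D is suggested only for this purpose, preferably the 5L version.
        extras.append("EQ-5D-5L")
    return extras
```

Under these assumptions, a chronic nsLBP trial without an economic evaluation would return an empty list of extras and rely on the core alternatives alone.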

4. Discussion

This study formulates recommendations on core outcome measurement instruments for use in nsLBP trials (Table 3). They comprise the ODI 2.1a or RMDQ-24 for physical functioning, NRS with a 1-week recall period for pain intensity, and SF12 or PROMIS-GH-10 for HRQoL. In addition, a simple statement reporting whether any death occurred in a clinical trial is recommended.[16] These recommendations update the previous LBP outcome recommendations of Deyo et al.[30] and Bombardier.[8] This COS applies to both acute and chronic nsLBP, and in the latter group, it complements the baseline research standards recommended by the NIH Task Force Report.[31]

4.1. Recommendations for future research

A recommended process that involved identification and review of measurement properties for candidate instruments and a consensus process for final selection was followed.[6,92] This core outcome measurement set is preliminary because high quality evidence is lacking for several measurement properties of various PROMs (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data; and Ref. 18). In particular, there is an urgent need to better assess and compare content validity, structural validity, reliability, and responsiveness of the recommended instruments with other instruments (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data; and Ref. 18). Developing a COS is an iterative process that should be updated if new evidence emerges on outcome domains or measurement instruments. Therefore, these recommendations are likely to evolve in the future. Cross-cultural validity has not been investigated for the recommended instruments or other candidate PROMs (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data; and Ref. 18). This measurement property assesses whether the performance of the items on a translated or culturally adapted PROM is an adequate reflection of the performance of the items of the original version.[85] It can be evaluated using data from several countries to assess differential item functioning,[26,100] and it would give a clear indication on the appropriateness of pooling data on the same PROM from different countries.

4.1.1. Physical functioning

Roland-Morris Disability Questionnaire and ODI were included in earlier recommendations for physical functioning in LBP,[8,30] but this report gives more precise recommendations on which versions to use (Table 3). The International Consortium for Health Outcomes Measurement standard set for LBP also recommended ODI 2.1a to measure physical functioning because it “is the most heavily studied, providing superior interpretability” and “the most feasible to implement as it has been validated in 14 languages (…) and is relatively short.”[24] One systematic review showed that from a measurement point of view, there are no strong reasons to prefer ODI 2.1a over RMDQ-24 in patients with nsLBP.[17] Moreover, the RMDQ-24 is available in some languages in which the ODI 2.1a is not.[18] There is high quality evidence suggesting that RMDQ-24 has limitations in key aspects of validity such as comprehensiveness and unidimensionality,[18] but its (content and structural) validity has never been directly compared with that of ODI 2.1a in the same group of patients with LBP.[17] Direct head-to-head comparisons of instruments should be extended to include other recently suggested instruments to measure physical functioning in LBP (eg, QBPDS or PROMIS-PF short forms).[21,31,99] Comparing the content validity has the highest priority because this is the first measurement property that should be evaluated when selecting PROMs for a COS.[92] The measurement properties of PROMIS-PF instruments have been assessed in the generic population or in a heterogeneous spine or pain population,[10,25,27,40,60,61,88,95,97] but there is little evidence in patients with nsLBP. A recent study compared unidimensionality and item response theory performance of PROMIS-PF short forms with the RMDQ-24 in patients with chronic nsLBP, finding promising results in favor of PROMIS-PF short forms (Chiarotto et al., 2018. The 4-, 6-, 8- and 10-item PROMIS Physical Function short forms have better psychometric performance than the 24-item Roland Morris Disability Questionnaire. Unpublished data). It should be noted that there is a lively debate on the question whether generic instruments should be tested in each specific disease population or not.[78,113] The PROMIS-PF item bank was also developed to administer computerized adaptive testing (CAT) forms (ie, PROMIS-PF-CAT[40]); however, CAT instruments have not been considered for LBP outcome standardization because they are not yet feasible for use in every trial internationally. Nonetheless, researchers should also test CAT forms because CAT simulations were demonstrated to provide increased measurement efficiency and precision.[27,40] Some participants of this Delphi study suggested that new outcome measurement instruments should be developed for LBP, but we are hesitant to suggest this as a high research priority because many PROMs to measure physical functioning are already available[48] and efforts may be better spent on generating evidence on the key measurement properties of these instruments.

4.1.2. Pain intensity

An NRS with a 1-week recall period has been repeatedly suggested as a key instrument for pain intensity in LBP,[21,24,31] and these previous suggestions strengthen our recommendation. Although the evidence base for this tool was of low quality in nsLBP (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data), there is a larger body of evidence in other pain conditions suggesting that its measurement properties are satisfactory.[52,57,67] Nevertheless, pain-rating scales definitely present some shortcomings, such as capturing multiple dimensions of the pain experience, and not only its intensity.[39,93,109] For this reason, we decided to add the key word “intensity” in the recommended NRS, and more studies exploring the patients' perspective on these tools are needed. A few studies have directly compared the measurement performance of single-item NRS with that of multiitem instruments (eg, BPI-PS) and suggested that single-item instruments may be acceptable.[66,68,69]

4.1.3. Health-related quality of life

Reaching a consensus on a single instrument for HRQoL proved to be challenging. This highlights various issues with the domain and its instruments. Compared with physical functioning and pain intensity, HRQoL displayed a lower level of consensus for inclusion in this COS;[16] it has a broad definition, is multidimensional in nature, and has been less frequently assessed in LBP clinical trials.[46] Moreover, only the construct validity of commonly used PROMs has been adequately assessed in patients with nsLBP (Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data). Low back pain is considered a multidimensional biopsychosocial pain disorder,[90,106] and some authors have advocated the use of multidimensional instruments to fully capture the complexity of treatment response.[45,50] Health-related quality of life is a domain that matches the multidimensional nature of LBP, and this may be a sufficient reason to make an effort to better define this domain for patients with nsLBP, taking into account all the aspects that impact and burden their life.[11,45] New back-specific or musculoskeletal-specific PROMs, such as instruments based on the International Classification of Functioning LBP core set[4] or the Musculoskeletal Health Questionnaire,[56] should be considered in future clinimetric studies for a direct comparison with the generic instruments recommended here.

4.2. Strengths and weaknesses

Overall, the main strengths of the current study are the thorough assessment of the measurement properties of candidate instruments (Chiarotto et al., 2018. Measurement properties of Numeric Rating Scale, Visual Analogue Scale and Pain Severity subscale of Brief Pain Inventory in patients with low back pain: a systematic review. Unpublished data; Chiarotto et al., 2018. Evidence on the measurement properties of health-related quality of life instruments is largely missing in patients with low back pain, a systematic review. Unpublished data; and Ref. 18) and transparency in each stage of the study (eg, providing full feedback reports to Delphi participants). The systematic reviews were conducted according to the most recent COSMIN methodology (Prinsen et al., 2018. COSMIN guideline for systematic reviews of patient-reported outcome measures. Unpublished data; Terwee et al., 2018. COSMIN standards and criteria for evaluating the content validity of patient-reported outcome measures: a Delphi study. Unpublished data; and Refs. 82,85) and included a thorough assessment of the content validity of the instruments as well as information about their development phase. The Delphi participants were presented with summary information on the potential core instruments, including measurement properties and availability and, therefore, had the opportunity to make informed decisions, also taking into account the instruments' content. This is the first study to perform a consensus procedure on core outcome measurement instruments for nsLBP and the first one to use a Delphi survey to seek a consensus on instruments for any health condition. Another strength of this project is that the selected outcome domains and measurement instruments represent those for which there is a consensus across relevant stakeholders in the nsLBP field. Therefore, it is reasonable to suggest that these recommendations may also apply to observational studies or routine clinical practice.
A limitation of our study regards the Delphi panel selection. It included a selected sample of researchers, clinicians, and patient representatives that may not generalize to the whole LBP community. We attempted to be comprehensive in inviting participants and we have described the sample appropriately (Table 2), but our sample may not be fully representative. Another potential limitation is that “ordinary” patients were not involved in the consensus procedure. Nevertheless, it should be underlined that it remains unclear how patients can contribute to the selection of core instruments taking aspects like measurement properties into account, and methodological research in this field is lacking. In addition, all existing studies in which patients with nsLBP were asked about their perspective on the potential core instruments were included in the 3 systematic reviews and this became part of the content validity evidence synthesis presented in the Delphi survey. Another limitation may be that potential core instruments were selected among those most frequently used and recommended, potentially overlooking some more recent, less frequently used, and/or investigated tools; however, it should be also noted that PROMIS instruments were included in our consensus procedure to partly address this issue. Delphi open-ended questions were reviewed and categorized by only 1 reviewer with no double checking by a second one; this may also represent a potential limitation of this study.

5. Conclusions

In summary, this study has formulated a preliminary core outcome measurement set specifying instruments to be included in every clinical trial in patients with nsLBP (Table 3). These recommendations will be updated as further evidence on the measurement properties of recommended and alternative instruments becomes available.

Conflict of interest statement

The authors have no conflict of interest to declare. R. Buchbinder, C.-W.C. Lin, and C.G. Maher are supported by Australian National Health and Medical Research Council (NHMRC) Research Fellowships. N.E. Foster is supported by a UK National Institute for Health Research (NIHR) Research Professorship (NIHR-RP-011-015). These funding bodies did not have any role in designing the study, in collecting, analysing and interpreting the data, in writing this manuscript, and in deciding to submit it for publication.
  103 in total

1.  Driving up the quality and relevance of research through the use of agreed core outcomes.

Authors:  Paula Williamson; Douglas Altman; Jane Blazeby; Michael Clarke; Elizabeth Gargon
Journal:  J Health Serv Res Policy       Date:  2012-01

2.  Report of the NIH Task Force on research standards for chronic low back pain.

Authors:  Richard A Deyo; Samuel F Dworkin; Dagmar Amtmann; Gunnar Andersson; David Borenstein; Eugene Carragee; John Carrino; Roger Chou; Karon Cook; Anthony DeLitto; Christine Goertz; Partap Khalsa; John Loeser; Sean Mackey; James Panagis; James Rainville; Tor Tosteson; Dennis Turk; Michael Von Korff; Debra K Weiner
Journal:  J Pain       Date:  2014-04-29       Impact factor: 5.820

Review 3.  Non-specific low back pain.

Authors:  Chris Maher; Martin Underwood; Rachelle Buchbinder
Journal:  Lancet       Date:  2016-10-11       Impact factor: 79.321

4.  COSMIN Risk of Bias checklist for systematic reviews of Patient-Reported Outcome Measures.

Authors:  L B Mokkink; H C W de Vet; C A C Prinsen; D L Patrick; J Alonso; L M Bouter; C B Terwee
Journal:  Qual Life Res       Date:  2017-12-19       Impact factor: 4.147

5.  Evaluation of the PROMIS physical function item bank in orthopaedic patients.

Authors:  Man Hung; Daniel O Clegg; Tom Greene; Charles L Saltzman
Journal:  J Orthop Res       Date:  2011-03-15       Impact factor: 3.494

6.  The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study.

Authors:  Lidwine B Mokkink; Caroline B Terwee; Donald L Patrick; Jordi Alonso; Paul W Stratford; Dirk L Knol; Lex M Bouter; Henrica C W de Vet
Journal:  Qual Life Res       Date:  2010-02-19       Impact factor: 4.147

7.  Mechanical Low Back Pain: Secular Trend and Intervention Topics of Randomized Controlled Trials.

Authors:  Greta Castellini; Silvia Gianola; Giuseppe Banfi; Stefanos Bonovas; Lorenzo Moja
Journal:  Physiother Can       Date:  2016       Impact factor: 1.037

Review 8.  Measures of adult pain: Visual Analog Scale for Pain (VAS Pain), Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ), Short-Form McGill Pain Questionnaire (SF-MPQ), Chronic Pain Grade Scale (CPGS), Short Form-36 Bodily Pain Scale (SF-36 BPS), and Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP).

Authors:  Gillian A Hawker; Samra Mian; Tetyana Kendzerska; Melissa French
Journal:  Arthritis Care Res (Hoboken)       Date:  2011-11       Impact factor: 4.794

Review 9.  A systematic review and meta-synthesis of the impact of low back pain on people's lives.

Authors:  Robert Froud; Sue Patterson; Sandra Eldridge; Clive Seale; Tamar Pincus; Dévan Rajendran; Christian Fossum; Martin Underwood
Journal:  BMC Musculoskelet Disord       Date:  2014-02-21       Impact factor: 2.362

10.  How to select outcome measurement instruments for outcomes included in a "Core Outcome Set" - a practical guideline.

Authors:  Cecilia A C Prinsen; Sunita Vohra; Michael R Rose; Maarten Boers; Peter Tugwell; Mike Clarke; Paula R Williamson; Caroline B Terwee
Journal:  Trials       Date:  2016-09-13       Impact factor: 2.279
