
A Systematic Review of Outcome Measures Use, Analytical Approaches, Reporting Methods, and Publication Volume by Year in Low Back Pain Trials Published between 1980 and 2012: Respice, adspice, et prospice.

Robert Froud¹,², Shilpa Patel², Dévan Rajendran³, Philip Bright³, Tom Bjørkli¹, Rachelle Buchbinder⁴, Sandra Eldridge⁵, Martin Underwood².

Abstract

BACKGROUND: Increasing patient-reported outcome measures in the 1980s and 1990s led to the development of recommendations at the turn of the millennium for standardising outcome measures in non-specific low back pain (LBP) trials. Whether these recommendations impacted use is unclear. Previous work has examined citation counts, but actual use, and change in use over time, have not been explored. Since 2011, there has been some consensus on the optimal methods for reporting back pain trial outcomes. We explored reporting practice, outcome measure use, and publications over time.
METHODS: We performed a systematic review of LBP trials, searching the European Guidelines for the management of LBP, extending the search to 2012. We abstracted data on publications by year, outcome measure use, analytical approach, and approaches taken to reporting trials outcomes. Data were analysed using descriptive statistics and regression analyses.
RESULTS: We included 401 trials. The number of published trials per year has increased by a factor of 4.5 from 5.4 (1980-1999) to 24.4 (2000-2012). The most commonly used outcome measures have been the Visual Analogue Scale for pain intensity, which has slowly increased in use since 1980/81 from 20% to 60% of trials by 2012, and the Roland-Morris Disability Questionnaire, which rose to 55% in 2002/2003, and then fell back to 28% by 2012. Most trialists (85%) report between-group mean differences. Few (8%) report individual improvements, and some (4%) report only within-group analyses. Student's t test, ANOVA, and ANCOVA regression, or mixed models, were the most common approaches to analysis.
CONCLUSIONS: Recommendations for standardising outcomes may have had a limited or inconsistent effect on practice. Since the research community is again considering outcome measures and modifying recommendations, groups offering recommendations should be cognisant that better ways of generating trialist buy-in may be required in order for their recommendations to have impact.


Year: 2016 PMID: 27776141 PMCID: PMC5077121 DOI: 10.1371/journal.pone.0164573

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Patient-reported outcome measures (PROMs) are outcomes that are reported by patients, rather than being objectively assessed or involving third-party (e.g. clinician) judgement. Throughout the 1980s and 1990s, multiple back-specific PROMs were developed and began to dominate as outcome measures used in non-specific low back pain (nsLBP) trials. Between 1998 and 2000 recommendations were made to standardise outcome measure use to facilitate cross-trial comparisons, pooling of data, and encourage scale familiarity. [1-3] More recently, researchers and clinicians have again begun to question whether the right things are being measured and there have been calls to review the outcome measures used in trials. [4-8] It is not clear whether the millennial recommendations for standardisation had an effect on practice. Previous studies have explored the number of times back-specific measures have been cited, but not actual use; also, trends of use over time have not hitherto been explored. [9, 10] Results from trials using PROMs can be reported differently, and this is known to affect clinicians’ interpretations of effectiveness and subsequent decision-making. [11, 12] With this in mind, recommendations for reporting outcomes in back pain trials were made in 2011 and 2014. [13, 14] We aimed to explore actual use of outcome measures in nsLBP trials, between 1980 and 2012, spanning the publication point of the millennial core-set recommendations. Our objectives were to identify the most commonly used outcomes, and the domain coverage of back-specific PROMs, and to consider whether there was any change in the trajectories of outcome measure use over the period of interest. Additionally, we reviewed the number of publications over time, reporting methods and analytical approaches for the most commonly used outcome measures to provide a baseline assessment of current practice so that any future change may be monitored.

Materials and Methods

Two independent reviewers (RF and SP) identified randomised controlled trials (RCTs) of any intervention for nsLBP published in or after 1980, from COST-B13’s European Guidelines for the Management of Low Back Pain (EGLBP), which included a comprehensive systematic search of all interventions for nsLBP, and the systematic reviews reported in the EGLBP. [15] As the COST-B13 search ended in November 2002, we extended the search to January 1, 2007 using the Cochrane Library, EMBASE, Lilacs, PsycINFO, and PubMed databases, and we hand-searched the Health Technology Assessment (HTA) journal. We later updated the search to January 1, 2012, using the Cochrane Library, PubMed, and EMBASE databases. We omitted PsycINFO, Lilacs, and the HTA journal in this extension due to good cross-coverage from the other databases (see Discussion). An example search strategy is included as a supplementary file (S1 Text). We combined material from the EGLBP and the extended searches, removed duplicates, and short-listed by title and abstract. Full texts were obtained if the title and abstract alone contained insufficient information for assessment against the criteria listed in Table 1.

Table 1

Inclusion and exclusion criteria.

Order	Inclusion criterion
1	RCTs of nsLBP
	Exclusion criteria
1	Non-English language reports
2	Studies that were not RCTs or presented insufficient information for us to determine whether randomisation was used to allocate participants
3	Reports that self-identified as pilot/feasibility studies
4	Cross-over designs (because of limited utility in the LBP field)
5	RCTs with mixed samples (e.g. neck or thoracic pain in addition to LBP), samples of participants with radiating leg pain, or referred pain extending past the knee, or samples including LBP specific pathology (e.g. cancer, ankylosing spondylitis, or disc herniation) or pregnancy
6	Non-inferiority trials (because of limited utility in the LBP field)
7	Follow-up studies with no new outcome measures, and multiple publications. In the case of multiple publications, we included the first published article and excluded subsequent publications

RCT = Randomised controlled trial; nsLBP = non-specific low back pain; LBP = low back pain.

Data abstraction and validation

Two reviewers (from RF, SP, TB, PB, and DR) independently abstracted data on outcome measure use, details of primary outcome analysis, and reporting methods. An outcome was identified as ‘primary’ if (1) the outcome was nominated as such; if no outcome was nominated, or multiple outcomes were nominated, we used (2) the outcome measure on which the sample size calculation was based; if this was not reported, we identified (3) the first outcome measure referred to in the abstract; and if this was not identified, we used (4) the first outcome mentioned in the paper. We identified the primary time point of interest, or used the first follow-up time point in cases when this was not clear. This approach has been taken in other methodological reviews. [16-20] For comparison, a sensitivity analysis of primary outcome measure use for the most commonly used outcome measures was performed using only criteria 1 and 2. Disagreements were resolved through discussion and, if necessary, by arbitration with a third reviewer (RF, SE, or MU). Using Microsoft Visual Basic 6.3 (Microsoft, Washington) and Microsoft Office Excel 2003 (Microsoft, Washington), we developed a front-end program to assist the data abstraction process and manage abstracted data, which validated entries and provided alerts in the case of missed fields. For outcome measure identification, we used expert validation of 20% of papers, as has been done in other methodological reviews. [21] Half (i.e. 10%) were selected at random and half (i.e. 10%) were purposively sampled (papers that we anticipated might lead to disagreement) from the papers marked for full-text extraction and then given to an independent reviewer (either SE, RF, SP, TB, PB, or DR), for independent abstraction.
Since early in the process we observed good (> 80%) agreement on outcome measurement identification, but inadequate agreement on analytical and reporting methods, full independent abstraction was subsequently used to identify analytical approach and reporting methods used. Disagreements were settled by arbitration involving one of the statisticians (either RF or SE). Quality of included trials was not evaluated, since we were interested in all non-specific back pain trials, regardless of the trials’ methodological quality.
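The four-step hierarchy for identifying a primary outcome can be sketched in code. This is a minimal illustration only, not the Visual Basic front-end described above: the `TrialReport` record and its field names are hypothetical, and the sketch assumes that the outcomes named in each part of a report have already been abstracted by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TrialReport:
    """Hypothetical record of what one trial report states about its outcomes."""
    nominated_primary: List[str] = field(default_factory=list)
    sample_size_outcome: Optional[str] = None
    abstract_outcomes: List[str] = field(default_factory=list)
    paper_outcomes: List[str] = field(default_factory=list)


def identify_primary(report: TrialReport) -> Tuple[Optional[str], int]:
    """Return (outcome, criterion number) following the four-step hierarchy."""
    # Criterion 1: exactly one outcome is nominated as primary.
    if len(report.nominated_primary) == 1:
        return report.nominated_primary[0], 1
    # Criterion 2: none or several nominated, so use the sample size outcome.
    if report.sample_size_outcome:
        return report.sample_size_outcome, 2
    # Criterion 3: first outcome referred to in the abstract.
    if report.abstract_outcomes:
        return report.abstract_outcomes[0], 3
    # Criterion 4: first outcome mentioned anywhere in the paper.
    if report.paper_outcomes:
        return report.paper_outcomes[0], 4
    return None, 0  # nothing identifiable


# Two primaries nominated, so the sample size calculation decides.
example = TrialReport(nominated_primary=["RMDQ", "VAS-P"], sample_size_outcome="RMDQ")
print(identify_primary(example))
```

Returning the criterion number alongside the outcome makes the sensitivity analysis straightforward: it amounts to keeping only records where the criterion is 1 or 2.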

Analysis

To report the proportion of outcome measure use by year, the total number of trials is needed as the denominator. For this reason, we first explored the number of published nsLBP trials by year. Prevalence of outcome measure use and, in the case of PROMs, domain of measurement were then calculated for primary and secondary outcomes, by year. We were particularly interested in PROM use, and we did not differentiate between different types of objective outcome measure use, or clinical judgements. Methods for reporting the two most commonly used outcome measures and the types of analysis used were summarised using descriptive statistics and graphical methods. For reporting methods, we explored the statistics used to summarise central tendency and variance, graphical forms of representation, and use of tables. For analytical approach, we explored the statistical test used to test the null hypothesis that a between-group difference was zero, or the model that was fitted to the data. We used regression analysis to explore the relationship between publications and time, and outcome measure use and time, fitting polynomial terms if relationships were non-linear. Residuals from regression analysis were examined for fit. If data were too heteroscedastic (i.e. variance of an outcome variable was dependent on the value of a predictor variable) for regression modelling, we fitted locally weighted scatter plot smoothing (Lowess) lines, which have the effect of smoothing across erratic data points, so that trends can be more easily visualised. All analyses were performed in Stata, version 12 (Statacorp, Texas). We did not publish a review protocol ahead of undertaking this work.
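To illustrate what a Lowess line does, the following is a simplified sketch of a locally weighted linear smoother. It is not Stata's implementation: it makes a single pass with tricube weights and no robustness iterations, and the function name `lowess` and the default `frac` value are assumptions chosen for illustration.

```python
from typing import List, Sequence


def lowess(x: Sequence[float], y: Sequence[float], frac: float = 0.8) -> List[float]:
    """Locally weighted linear smoother (single pass, no robustness iterations)."""
    n = len(x)
    k = max(2, min(n, int(round(frac * n))))  # number of points in each local window
    fitted = []
    for x0 in x:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1.0
        # Tricube weights: nearby points count most, points at or beyond h count zero.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        # Weighted least-squares line through the window, evaluated at x0.
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

A larger `frac` pools more neighbouring years into each local fit and so gives a smoother, less sensitive line; halving it gives trend-lines that follow the data more closely.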

Results

We identified 7,066 potential articles from EGLBP and electronic databases and following removal of duplicates, titles and abstracts sifting, and full-text inspection against inclusion criteria, included 401 trials (Fig 1). Characteristics of included and excluded trials are detailed with their references as supplementary material (S1 and S2 Tables).

Fig 1

Flow chart showing search results.

The figure shows the number of initial hits, duplicates, exclusions based on titles and abstracts screening, and assessments at full text level evaluation.


Publications over time

A linear regression model with a quadratic term in year was significant (linear term: β = −166.187, P = 0.002; quadratic term: β = 0.041, P = 0.002), and explained 79% of the variance (Fig 2). Some caution is needed in interpretation since, as can be seen from Fig 2, these data are slightly heteroscedastic. The number of publications increases after the millennium; the average number of publications per year was 5.4 from 1980 to 1999 and 24.4 from 2000 to 2012.
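A regression with a quadratic term in year can be sketched as an ordinary least-squares fit of count = b0 + b1·t + b2·t². The sketch below is illustrative only: the helper names and the centring year are assumptions, the year is centred (t = year − 1996) purely for numerical stability, and no trial counts from this review are used. Fitting on the raw calendar year instead typically yields a large negative linear coefficient paired with a small positive quadratic one.

```python
from typing import List, Sequence, Tuple


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        sol[r] = (m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))) / m[r][r]
    return sol


def fit_quadratic(years: Sequence[float], counts: Sequence[float],
                  centre: float = 1996.0) -> Tuple[float, float, float]:
    """Least-squares fit of counts = b0 + b1*t + b2*t**2, where t = year - centre."""
    t = [yr - centre for yr in years]
    s = [sum(ti ** p for ti in t) for p in range(5)]  # power sums of t
    sy = [sum(ci * ti ** p for ti, ci in zip(t, counts)) for p in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    b0, b1, b2 = solve3(normal, sy)
    return b0, b1, b2


# Synthetic counts generated from a known curve, for illustration only.
years = list(range(1980, 2013))
counts = [3.0 + 0.5 * (yr - 1996) + 0.04 * (yr - 1996) ** 2 for yr in years]
b0, b1, b2 = fit_quadratic(years, counts)
```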

Fig 2

Number of published non-specific low back pain trials by publication year between 1980 and 2012.

The figure shows the increase in the number of published non-specific low back pain trials by year of publication and change in publication rate over time. A Lowess smoother is fitted to these data.


Outcome measure use

Authors explicitly declared a primary outcome measure in 50% (n = 201) of trials. In trials that did not declare one, a primary outcome measure could be identified from a sample size calculation in 20% (n = 40) of trials. We identified the primary as the first outcome measure mentioned in the abstract, or in the paper, in the remaining 40% (n = 161). The most commonly used PROMs were the Visual Analogue Scale for measuring pain intensity (VAS-P) [22], the Roland Morris Disability Questionnaire (RMDQ) [23], the Oswestry Disability Index (ODI) [24], the Numerical Rating Scale for measuring pain intensity (NRS) [25, 26], and patient-rated global assessment of improvement (i.e. a health transition question (TQ) [27]) (Table 2 and Fig 3).

Table 2

The most common back-specific PROMs: Frequency of use.

Instrument	Primary outcome	Secondary outcome	Total use	Primary use (%)	Sensitivity analysis (%)^†
Visual Analogue Scale of back pain intensity [22]	119	86	205	29.7	28.2
Roland-Morris Disability Questionnaire [23]	58	94	152	14.5	20.7
Oswestry Disability Index [24]	36	84	120	9.0	10.4
Pain Intensity Numerical Rating Scale [25, 26]	37	39	76	9.2	8.7
Patient Rated Global Assessment (TQ) [27]*	10	53	63	2.5	2.5

PROM = Patient-reported outcome measure

* Example reference only

† Of proportional use as primary outcome measure
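The 'Total use' and 'Primary use (%)' columns of Table 2 can be reproduced from the primary and secondary counts, taking the 401 included trials as the denominator. The short sketch below does this; the dictionary layout and the function name `summarise` are illustrative choices, while the counts are copied from Table 2.

```python
N_TRIALS = 401  # number of included trials

# (primary, secondary) counts copied from Table 2
TABLE2 = {
    "VAS-P": (119, 86),
    "RMDQ": (58, 94),
    "ODI": (36, 84),
    "NRS": (37, 39),
    "TQ": (10, 53),
}


def summarise(primary: int, secondary: int, n_trials: int = N_TRIALS):
    """Return total use and primary use as a percentage of all included trials."""
    total = primary + secondary
    primary_pct = round(100 * primary / n_trials, 1)
    return total, primary_pct


for name, (primary, secondary) in TABLE2.items():
    total, pct = summarise(primary, secondary)
    print(f"{name}: total use {total}, primary use {pct}%")
```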

Fig 3

The five most common back-specific patient reported outcome measures.

The figure shows the use of the five most common back-specific patient reported outcome measures as primary and secondary outcome measures.

We note that we class the VAS-P and NRS as back-specific, since the wording of these instruments, when used in back pain trials, typically refers specifically to a person’s back pain. In total, there were 258 different PROMs used across the trials studied within the time period. Sensitivity analyses of primary outcome measure use within only trials that either explicitly declared the outcome measure as primary, or used it for a sample size calculation, revealed similar estimates of primary outcome measure use across outcome measures, with the possible exception of the RMDQ (Table 2). Apart from PROMs, objectively assessed outcome measures were also common (n = 130), particularly as secondary outcome measures, as were medication and medical services consumption (n = 66), and subjective clinical examinations (n = 19). Pain and disability were the most commonly measured domains (Fig 4). Some outcomes, for example, adverse events, or adherence, have only ever been measured as secondary outcomes. We note that it is possible for the usage to exceed the number of included trials, which is due to some trials using more than one outcome measure to measure within these domains.

Fig 4

Domains of measurement in non-specific low back pain trials.

The figure shows the domains of measurement in non-specific low back pain trials published between 1980 and 2012.

The VAS-P has had slowly increasing usage as an outcome measure in nsLBP trials, rising from 20% of trials in 1980/81 to 60% by 2012. There is some suggestion of a rise and fall in the use of the RMDQ, which peaks at 55% in 2002/2003. Use of the ODI has steadily increased. These data were too heteroscedastic for regression analysis and we present the data graphically with a Lowess smoother (Fig 5). We note the use of a large smoothing bandwidth to easily visualise trends; while the RMDQ smoother intersects the x-axis at 1982, the year of its publication, it was not recorded as used in any included trials until 1987. As such, these Lowess lines should be interpreted as a general impression of trend only. We have included as an additional file (S1 Fig) the plot with a smoother of half the bandwidth, to depict more sensitive trend-lines.

Fig 5

The five most common back-specific patient-reported outcome measures: Relative use by year.

The figure shows relative frequency of use for the most common back-specific patient-reported outcome measures, by publication year.


Reporting methods and analytical approach

For the two most commonly used outcomes, the VAS-P and the RMDQ, reporting methods include describing mean changes (within groups) or mean differences (between groups), P-values for changes or differences, and standard deviations (Table 3). Confidence intervals were only provided in around 40% of trials. Few authors reported individual improvements. The t test has been the most common analytical approach, followed by regression analyses. The most common covariates used in adjusted models were baseline score, sex, age, study centre, and episode duration, respectively. In 4% of trials utilising the VAS-P or RMDQ, authors present within-group analyses only.

Table 3

Reporting methods and statistical analysis: Prevalence of use.

Details	VAS-P(%)	RMDQ(%)	Totals(%)
Usage	205 (100)	152 (100)	357 (100)
Reporting method
Mean or mean difference	179 (87)	126 (83)	305 (85)
P-values	162 (79)	109 (72)	271 (76)
Standard deviation	134 (65)	98 (64)	232 (65)
Median	22 (11)	14 (9)	36 (10)
Range or IQR	20 (10)	13 (9)	33 (9)
Standard error	18 (9)	18 (12)	36 (10)
Confidence intervals	62 (30)	76 (50)	138 (39)
Number/proportion improved*	20 (10)	8 (5)	28 (8)
Number needed to treat	2 (1)	5 (3)	7 (2)
Odds ratio (improvement)	3 (1)	9 (6)	12 (3)
Relative risk (improvement)	1 (0)	6 (4)	7 (2)
Percentage change score	25 (12)	15 (10)	40 (11)
Standardised mean difference	9 (4)	11 (7)	20 (6)
Graphs and tables
Table	174 (85)	119 (78)	293 (82)
Line chart	60 (29)	39 (26)	99 (28)
Bar chart	29 (14)	9 (6)	38 (11)
Other	8 (4)	9 (6)	17 (5)
Statistical analysis
ANCOVA regression / mixed model	53 (25)	64 (42)	117 (33)
ANOVA/MANOVA regression / mixed model	74 (36)	39 (26)	113 (32)
t test	93 (45)	43 (28)	136 (38)
Non-parametric test^†	68 (33)	37 (24)	105 (29)
Other	18 (9)	9 (6)	27 (8)
Within-group analysis only	11 (5)	3 (2)	14 (4)
Test not described	10 (5)	10 (7)	20 (6)

VAS = Visual analogue scale

RMDQ = Roland-Morris Disability Questionnaire

IQR = Interquartile range

* Where ‘improvement’ is defined by the change in score of a specified magnitude

† e.g. Wilcoxon, Friedman’s, Mann Whitney U, or Kruskal Wallis


Discussion

We discuss the results and consider their implications in three sections. We consider these data and past recommendations for core sets of outcome measures in back pain trials (Respice), the current situation and emerging recommendations for core sets (Adspice), and implications for the future of outcome measure use in back pain trials (Prospice).

Respice

The results show that the most commonly used domains in back-specific outcome measures over the whole time-period of interest are domains and PROMs that were included in the millennial recommendations. The millennial recommendations of Deyo et al in 1998, updated by Bombardier et al in 2000, recommended measurement in the domains of pain, function, well-being, disability, and work disability, and both recommended using the VAS-P and NRS for measuring pain, and the RMDQ or ODI for measuring function (disability). [1, 2] The World Health Organisation made back-specific recommendations to measure pain, disability, and depression, in 1999, recommending the VAS-P and the ODI be used as primary outcome measures in trials. [3] International Classification of Functioning, Disability and Health (ICF) categories were later proposed, recommending 78 (comprehensive) or 35 (brief) domains of measurement for LBP in 2004. [28] The brief set, intended in particular for use in clinical studies, included body functions, structures, activities and participation, and environmental factors. The recommendations were criticised for having incomplete coverage. [29] If the millennial recommendations for measurement instruments had impacted practice then, notwithstanding the time it takes for recommendations to be implemented, one might expect relative increases in proportional use trajectories sometime after the publication of the recommendations at the turn of the millennium. There is possibly a post-millennial increase in trajectory in the VAS-P and ODI; there may be a decrease in use of the RMDQ. As these data represent a population of trials over the time period, inferential statistics are unnecessary, and the millennial recommendations appear to have had a limited or inconsistent impact on actual use.

Adspice

Towards the end of the 1980–2012 period of interest, researchers suggested that outcome measures were inadequate and that their reassessment needed to be prioritised. [4, 7] A second wave of recommendations is now emerging. In 2011, Buchbinder et al identified several domains of measurement not covered by existing core sets, including loss of independence, worry about the future, and negative or discriminatory actions by others. [30] In 2012, Hush et al recommended adding the Patient Generated Index (PGI) and a global back pain recovery scale to the core set, specifically to measure recovery. [31] In 2014, Froud et al showed social components were not adequately represented in measurement. [32] The results in the current study highlight the absence of measurement in social domains, although social factors are covered to some extent within the domain of health-related quality of life. In 2014, a National Institutes of Health (NIH) Task Force recommended using Patient Reported Outcomes Measurement Information System (PROMIS) measures as a minimum dataset in all NIH-funded LBP research, permitting the RMDQ to substitute for the PROMIS physical activity items if more extensive ‘legacy measures’ are required. [14] In 2015, Froud et al extended the Hush recommendation for patient-centric instruments, such as the PGI and TQ, to trial outcome measurement more generally, after demonstrating that people may not attend to thinking about their back pain when completing the RMDQ and that changes in RMDQ score can be independent of changes in back pain. [8, 31] In 2015, Chiarotto et al recommended three domains for inclusion in a core set: physical functioning, pain intensity, and health-related quality of life. [33] Thus, it may currently be difficult from a trialist’s perspective to decide upon appropriate primary and secondary outcomes in the design-phase of nsLBP trials.
We suggest that this requires careful future attention from groups making core set recommendations, and we discuss this in the next section. In the meantime, given the widespread use shown in the results, there may be some value in using either the VAS-P or RMDQ as an outcome measure in nsLBP trials as this facilitates the greatest cross-comparison. Choices of secondary outcomes might include some of the more patient-relevant and coverage-improving instruments, such as the PGI and TQ. The results showed that some domains (for example, adverse events and adherence) are only ever measured as secondary outcomes. In our view this is reasonable (unless the research question relates to adverse events) since the primary emphasis across most trials and core set recommendations is on pain and function.

Prospice

There is a risk that overlaps or contradictions between and within the millennial and current recommendations may continue to give a discordant message to trialists. For example, the millennial Deyo/Bombardier recommendations suggested use of VAS-P or NRS (pain) and RMDQ or ODI (function) whereas the World Health Organisation recommended only VAS-P and ODI for measuring these domains. Currently, researchers are taking slightly different directions with their recommendations for the future (see above). Future divergences in recommendations may benefit from clear justification and argument. One driving issue may be that researchers disagree upon fundamental clinical measurement properties of instruments. For example, although the RMDQ is the second most commonly used outcome in nsLBP trials, its clinimetric properties have attracted both criticism and support. [8, 27, 34] An underlying issue here may be disagreement on and heterogeneity in how key clinimetric assessments, such as test-retest studies and responsiveness studies, are conducted. [35-37] The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) checklist may go some way towards helping with this, and it may be that new instruments, with a focus on performance, coverage, and patient-relevance need to be developed prior to future recommendations being made. [38] The historical data in the current study suggest that the current wave of recommendations may be set to have limited or inconsistent impact on practice. In future, consideration may need to be given to mechanisms for maximising impact of recommendations and effecting change.
We suggest that to maximise impact it may be worth considering (1) the consistency of recommendations, which might be facilitated through (2) inclusivity: a more widespread and comprehensive collaboration between methodologists working on outcome measures, along with clinimetric/psychometric experts, and domain-specific experts where required, in addition to clinical experts. One approach to achieving this may be the Delphi technique, which is supported by the Core Outcome Measures in Effectiveness Trials (COMET) initiative, with consideration given to fully reporting panel disagreement and lack of consensus. [39] Finally, generating sufficient (3) participation or ‘buy-in’ from trialists with acquisition of support, and a clear and implementable transition strategy to manage the large lag-times from trial conception to publication, may help to avoid fragmentation in outcome measure use. It may be that amendments to well-adopted checklists would also help. The Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) checklist strongly encourages trialists to explain the clinical relevance of chosen outcomes in trial protocols (item 12). [40] However, for trial reporting, the Consolidated Standards of Reporting Trials (CONSORT) statement (in particular item 6a) might be modified to encourage authors to discuss any divergences from recommended core sets or justify outcome measurement selection, which, over time, may help to homogenise outcome measurement across similar trials. [41] Additionally, it may be that public and private funders, as stakeholders with a vital interest in maximising the use of clinical trial data, are well-placed to encourage trial teams to carefully consider outcome selection prior to their awarding a grant for a trial.

Reporting methods and analysis

Specific recommendations for including individual improvements when reporting the outcomes of back pain trials were not made until 2011. [13] These were reiterated by the NIH task force in 2014. [14] The 1980 to 2012 data in the current study are not consistent with the recommendations, but the results of the current study will allow future monitoring of the impact of these or future recommendations. It is encouraging that the majority of authors report between-group differences and a reasonable proportion (33%) report adjusted models. Adjusted models improve the precision of estimates by taking account of imbalances that exist between groups notwithstanding randomisation. [42] Few authors have used inappropriate analytical methods, such as analysing only within-group changes, which makes it impossible to differentiate treatment effect from regression to the mean.
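The problem with within-group analyses can be illustrated with a small simulation. It is a sketch under stated assumptions rather than an analysis of any included trial: scores are drawn from arbitrary normal distributions, participants enter only if their baseline score exceeds a threshold, and neither arm receives any effective treatment. All parameter values and the function name `simulate` are hypothetical.

```python
import random
from statistics import mean


def simulate(n_screened: int = 20000, threshold: float = 14.0, seed: int = 1):
    """Simulate a two-arm trial with no treatment effect in either arm."""
    rng = random.Random(seed)
    arms = {"A": [], "B": []}
    for _ in range(n_screened):
        true_score = rng.gauss(10.0, 3.0)               # stable underlying severity
        baseline = true_score + rng.gauss(0.0, 3.0)     # noisy baseline measurement
        if baseline < threshold:
            continue                                    # below the entry threshold
        follow_up = true_score + rng.gauss(0.0, 3.0)    # noisy follow-up, no effect
        arms[rng.choice("AB")].append((baseline, follow_up))  # simple randomisation
    # Within-group mean change from baseline, per arm.
    change = {arm: mean(f - b for b, f in rows) for arm, rows in arms.items()}
    # Between-group difference in mean follow-up score.
    follow_up_mean = {arm: mean(f for _, f in rows) for arm, rows in arms.items()}
    return change, follow_up_mean["A"] - follow_up_mean["B"]


change, between_group_difference = simulate()
print(change, between_group_difference)
```

Under these assumptions both arms show a sizeable mean reduction from baseline purely because participants were selected on a high, noisy baseline score, whereas the between-group difference stays close to zero. A within-group analysis alone would wrongly read the reduction as a treatment effect.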

Comparisons to existing research

Litcher-Kelly et al report, from a systematic review of musculoskeletal clinical trials, that the most frequently used instruments were the VAS (60%), and the NRS (12%), with an RMDQ prevalence of 14%. [43] These data are not directly comparable to ours since the study population and publication years differ. In 2004, Müller et al identified 84 different back-specific PROMs and showed that the Oswestry Disability Questionnaire and the RMDQ were most commonly cited. [9] In 2011, Chapman et al also reviewed which outcome measures had been cited in back pain trials between 2006 and 2011. [10] They found that the most cited functional measure was the ODI and that the most cited pain measure was the NRS. Counting citations may not reflect actual use. In some cases, citations may not be given and for long-running outcome measures, such as the VAS, references may not be uniform. For example, Huskisson is commonly, but not uniformly, credited with developing the VAS in 1974. [22] There is evidence that the VAS was being used at least as far back as 1921. [44] Also, some citations will reference validatory work. In other fields, Araújo et al have shown that recommendations of core sets of measurement in gout have not impacted practice. [45] Page et al showed in 2015 that shoulder pain trials suffer from having no core measurement sets recommended. [46] Relative to these fields, the situation in back pain research may be similar or better, respectively. Nuovo et al examined the prevalence of reporting of absolute risk reduction (ARR) and number needed to treat (NNT) in RCTs published in five mainstream medical journals. [47] Evaluating publications in 1989, 1992, 1995, and 1998, they found that amongst 359 articles, ARR was reported in 18 (5.0%) and NNT in eight (2.2%). This is similar to our study, in which 8% reported the number or proportion of improvements (a coding in our review which subsumed ARR), and 2% of trials reported NNT.

Strengths and limitations

A strength of our research is provision of prevalence data based on actual use, rather than using citations as a proxy. Other authors have reviewed what outcome measures exist for LBP, but have not estimated prevalence of use. [48-50] We acknowledge that it is preferable to conduct a systematic review using a single search strategy. As the EGLBP search was comprehensive, included all interventions, and we had previously used it to identify all nsLBP trials, we reasoned it had good cross-coverage with our later search. Also, we did not search PsycINFO and HTA reports, due to good cross-coverage from the other databases. This fragmented search strategy may be viewed as a limitation. However, in comparison to Castellini et al, who reviewed back pain trials published after 1968 and included 222 trials, we judge our search to have been comprehensive. [51] We adopted a 20% validation approach rather than full independent reviewer extraction for all variables. We used full independent abstraction when abstracting details of reporting methods and analysis as validation revealed inadequate agreement only on these variables. While full independent abstraction on all variables would have improved validity, as our focus was methodological and not on estimating a treatment effect, we considered the approach to be reasonable. Our sensitivity analysis, using only the first two of four criteria for judging a primary outcome measure, showed similar results for estimates of prevalence of use for all but the RMDQ. Few additional (only eight instances) RMDQ primary outcomes were identified using criteria 3 or 4. It may be that, unlike users of other primary outcome measures, those who use the RMDQ as a primary are relatively more likely to explicitly declare it as a primary, or perform a sample size calculation based on detecting a difference in RMDQ score between groups.
We note that the difference between estimating prevalence of use using only the first two criteria, or all four criteria, does not change the relative ranking of most common usage of primary outcome measures and that, using either method, the RMDQ is the second most commonly used primary outcome measure.

Conclusions

The Visual Analogue Scale for pain intensity and the Roland-Morris Disability Questionnaire have been the most commonly used outcome measures in back pain trials. Recommendations for standardising outcomes may have had a limited or inconsistent effect on practice. Analytical and reporting practice is encouraging, although there is still room for improvement. Research groups planning to make further recommendations on core outcome measures for back pain may have more impact if they consider better ways of generating trialist buy-in.

Typical search strategy.

A typical search strategy used in the systematic review. (PDF)

Characteristics of included trials.

A table showing the characteristics of included trials and their references. (PDF)

Characteristics of excluded trials.

A table showing the characteristics of excluded trials and their references. (PDF)

Most commonly used measures over time, standardised by the number of annual publications.

The figure shows the most commonly used measures over time, standardised by the number of annual publications, using a Lowess smoother of half the bandwidth of that shown in Fig 5, in case more sensitive trend-lines are preferred. (TIF)

PRISMA checklist.

A completed PRISMA checklist for the systematic review. (PDF)

42 in total

Review 1. Condition-specific outcome measures for low back pain. Part I: validation.

Authors: U Müller; M S Duetz; C Roeder; C G Greenough
Journal: Eur Spine J Date: 2004-03-17 Impact factor: 3.134

Review 2. Self-report outcome measures for low back pain: searching for international cross-cultural adaptations.

Authors: Leonardo Oliveira Pena Costa; Chris G Maher; Jane Latimer
Journal: Spine (Phila Pa 1976) Date: 2007-04-20 Impact factor: 3.468

Review 3. A systematic review of measures used to assess chronic musculoskeletal pain in clinical and randomized controlled clinical trials.

Authors: Leighann Litcher-Kelly; Sharon A Martino; Joan E Broderick; Arthur A Stone
Journal: J Pain Date: 2007-08-09 Impact factor: 5.820

4. Responsiveness of the numeric pain rating scale in patients with low back pain.

Authors: John D Childs; Sara R Piva; Julie M Fritz
Journal: Spine (Phila Pa 1976) Date: 2005-06-01 Impact factor: 3.468

5. A study of the natural history of back pain. Part I: development of a reliable and sensitive measure of disability in low-back pain.

Authors: M Roland; R Morris
Journal: Spine (Phila Pa 1976) Date: 1983-03 Impact factor: 3.468

6. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study.

Authors: Lidwine B Mokkink; Caroline B Terwee; Donald L Patrick; Jordi Alonso; Paul W Stratford; Dirk L Knol; Lex M Bouter; Henrica C W de Vet
Journal: Qual Life Res Date: 2010-02-19 Impact factor: 4.147

7. Responsiveness of functional status in low back pain: a comparison of different instruments.

Authors: A J H M Beurskens; H C W de Vet; A J A Köke
Journal: Pain Date: 1996-04 Impact factor: 6.961

8. SPIRIT 2013 statement: defining standard protocol items for clinical trials.

Authors: An-Wen Chan; Jennifer M Tetzlaff; Douglas G Altman; Andreas Laupacis; Peter C Gøtzsche; Karmela Krleža-Jerić; Asbjørn Hróbjartsson; Howard Mann; Kay Dickersin; Jesse A Berlin; Caroline J Doré; Wendy R Parulekar; William S M Summerskill; Trish Groves; Kenneth F Schulz; Harold C Sox; Frank W Rockhold; Drummond Rennie; David Moher
Journal: Ann Intern Med Date: 2013-02-05 Impact factor: 25.391

Review 9. A systematic review and meta-synthesis of the impact of low back pain on people's lives.

Authors: Robert Froud; Sue Patterson; Sandra Eldridge; Clive Seale; Tamar Pincus; Dévan Rajendran; Christian Fossum; Martin Underwood
Journal: BMC Musculoskelet Disord Date: 2014-02-21 Impact factor: 2.362

Review 10. A systematic review of cluster randomised trials in residential facilities for older people suggests how to improve quality.

Authors: Karla Diaz-Ordaz; Robert Froud; Bart Sheehan; Sandra Eldridge
Journal: BMC Med Res Methodol Date: 2013-10-22 Impact factor: 4.615
