The NIH Minimal Dataset for Chronic Low Back Pain: Responsiveness and Minimal Clinically Important Change.

Alisa L Dutmer¹, Michiel F Reneman¹, Henrica R Schiphorst Preuper^1,2, André P Wolff^2,3, Bert L Speijer², Remko Soer^2,4.

Abstract

STUDY DESIGN: Prospective cohort study.
OBJECTIVE: To analyze responsiveness and minimal clinically important change (MCIC) of the US National Institutes of Health (NIH) minimal dataset for chronic low back pain (CLBP).
SUMMARY OF BACKGROUND DATA: The NIH minimal dataset is a 40-item questionnaire developed to increase use of standardized definitions and measures for CLBP. Longitudinal validity of the total minimal dataset and the subscale Impact Stratification is unknown.
METHODS: Total outcome scores on the NIH minimal dataset, Dutch Language Version, were calculated ranging from 0 to 100 points with higher scores representing worse functioning. Responsiveness and MCIC were determined with an anchor-based method, calculating the area under the receiver operating characteristics (ROC) curve (AUC) and by determining the optimal cut-off point. Smallest detectable change (SDC) was calculated as a parameter of measurement error.
RESULTS: In total 223 patients with CLBP were included. Mean total score on the NIH minimal dataset was 44 ± 14 points at baseline. The total outcome score was responsive to change with an AUC of 0.84. MCIC was 14 points with a sensitivity of 72% and specificity of 82%, and SDC was 23 points. Mean total score on Impact Stratification (scale 8-50) was 34.4 ± 7.4 points at baseline, with an AUC of 0.91, an MCIC of 7.5 with a sensitivity of 96% and specificity of 78%, and an SDC of 14 points.
CONCLUSION: The longitudinal validity of the NIH minimal dataset is adequate. An improvement of 14 points in total outcome score and 7.5 points in Impact Stratification can be interpreted as clinically important in individual patients. However, MCIC depends on baseline values and the method that is chosen to determine the optimal cut-off point. Furthermore, measurement error is larger than the MCIC. This means that individual change scores should be interpreted with caution.
LEVEL OF EVIDENCE: 2.

Year: 2019 PMID： 31107833 PMCID： PMC6791505 DOI： 10.1097/BRS.0000000000003107

Source DB: PubMed Journal: Spine (Phila Pa 1976) ISSN： 0362-2436 Impact factor: 3.241

In 2014, the US National Institutes of Health (NIH) introduced a minimal dataset for chronic low back pain (CLBP) to increase use of standardized definitions and measures and to facilitate comparison in clinical and epidemiological studies.[1-3] This self-report questionnaire has been translated and adapted to Canadian French,[4] Farsi,[5] and Dutch.[6] The NIH minimal dataset includes items related to medical history and self-report measures of physical function, psychosocial functioning, sleep disturbance, pain intensity, and pain interference.[3] A Dutch validation study of the NIH minimal dataset revealed sufficient to good measurement properties and a good fit in a 7-factor model.[6] Longitudinal validity of the NIH minimal dataset, however, has not been tested in any language version. To compare longitudinal measurements and to allow better comparison across studies, an outcome score needs to be constructed. The NIH task force proposed an outcome score called the Impact Stratification. This score consists of three domains only: pain intensity, pain interference, and physical function. To compare full biopsychosocial characteristics and effects after interventions for patients with CLBP, an outcome score should also include the remaining domains of the questionnaire such as depression and sleep disturbance. Detecting change in health status over time (responsiveness) and being able to interpret change scores are important aspects of patient-reported outcome measures.[7,8] Change scores can be interpreted with the minimal clinically important change (MCIC) and the measurement error, expressed as smallest detectable change (SDC). Both MCIC and SDC are expressed on the actual scale of measurement and are therefore advantageous for clinical interpretation. 
When the MCIC is larger than the SDC, an outcome measure is able to distinguish clinically important change from measurement error.[7,9] The MCIC can also be used in responder analyses, where a proportion of patients is identified that improved by more than the MCIC.[10] Many commonly used patient-reported outcome measures in LBP have previously been studied on responsiveness. However, these outcome measures are predominantly unidimensional, measuring a single construct (e.g., pain or back-specific function).[11,12] In contrast, the NIH minimal dataset is a multidimensional instrument that combines multiple constructs that are relevant in LBP research, such as pain interference, physical and psychosocial functioning, sleep, and depression. Therefore, studying responsiveness and MCIC of the NIH minimal dataset is deemed important. The objectives of this study were to construct an outcome score for the NIH minimal dataset for CLBP, analyze its responsiveness, and interpret change scores by determining MCIC and SDC. Secondary analyses were performed to explore whether clinically important change depends on baseline score.

MATERIALS AND METHODS

Procedures

Data were collected from July 2015 to September 2018 in the Groningen Spine Center, a university-based multispecialty tertiary care center in the north of the Netherlands. Baseline (T0) and 12-months follow-up data (T1) were extracted from a longitudinal cohort. Patients digitally filled out a set of questionnaires, including the NIH minimal dataset Dutch Language Version, the Pain Disability Index (PDI), the EuroQol-5D (EQ5D), a single item on work status, and a Global Perceived Effect (GPE) scale. The Medical Ethical Committee of the University Medical Center Groningen, the Netherlands, provided a waiver for this study with respect to medical ethical permission, because the study was performed within care as usual. All patients signed informed consent. The handling of the data was done in accordance with the guidelines for Good Research Practice.[13]

Patients

All patients aged 18 to 65 years who reported pain in their lower back and/or leg for more than 12 weeks were included. Patients with no Internet access or insufficient Dutch reading skills, and those who did not respond to the follow-up questionnaire, were excluded. The items in the NIH minimal dataset were specifically chosen by the NIH research task force for their importance to a wide range of patients with chronic LBP with or without specific pathoanatomic diagnoses.[1] Therefore, patients with specific LBP and patients with multifactorial (often referred to as nonspecific) LBP were both included. Interventions were chosen based on indication and patient preference. Possible treatment options were multidisciplinary rehabilitation for patients with multifactorial LBP, surgery or conservative therapy for patients with specific complaints such as herniated disks or stenosis, and anesthesiology for patients with clear sensitization patterns in well-described dermatomes.

Measures

NIH Minimal Dataset for CLBP

The NIH minimal dataset includes 40 items related to demographics, medical history, and self-reported symptoms and functioning.[1] Seventeen of these items are derived from the 29-item Patient Reported Outcomes Measurement Information System (PROMIS) short form.[14] An outcome score, Impact Stratification, was created with nine of the items (one item on pain intensity, four items on pain interference, and four items on physical function). For each item a score of 1 is least severe and 5 most severe, with the exception of the single item on pain intensity, which ranges from 0 (no pain) to 10 (worst possible pain). Total scores range from 8 (least impact) to 50 (most impact). Impact Stratification showed moderate and strong correlation with the Roland-Morris Disability Questionnaire (RMDQ) (RS = 0.66) and Oswestry Disability Index (ODI) (RS = 0.81) and demonstrated higher responsiveness compared with the RMDQ at 3-month follow-up.[3] Exploratory factor analyses led to a 7-factor model for the NIH minimal dataset with 29 of the original 40 items remaining (Appendix 1).[6] Two items on ethnicity and race were removed beforehand for their lack of clinical relevance and nine items were removed during the analyses for having insufficient variance or significant factor loadings. The seven factors that were identified are 1: pain intensity and interference (six items), 2: pain history (seven items), 3: medical interventions (five items), 4: depression and catastrophizing (six items), 5: physical function (three items), 6: sleep disturbance (four items), and 7: lifestyle (two items). Factor 3 and one item of factor 2 (“How long has low-back pain been an ongoing problem for you?”) are not used in follow-up measurements because their levels are fixed in a cohort of patients with a chronic condition. Each individual factor showed a fair to good correlation with the PDI and EQ5D. 
Two-week test–retest reliability per factor was moderate to good (ICC = 0.71; range = 0.52–0.82) and showed substantial agreement per item (weighted κ = 0.65).[6]

Work Status

Patients were asked whether they were currently employed. If yes, a question about the status of their employment (working, partial sick leave, sick leave) followed.

Pain Disability Index

The PDI measures self-reported pain interference in seven categories of daily life activities.[15] The questionnaire is constructed on an 11-point numeric rating scale in which 0 means “no disability” and 10 “maximum disability.” Total scores range from 0 to 70 where a higher score means a greater disability due to pain. The Dutch language version of the PDI was used. Two-week test–retest reliability is good.[16]

EuroQol-5D

The EQ5D is a 5-item (representing five dimensions) questionnaire that measures quality of life.[17] Each item has three levels: no problems, some problems, and extreme problems. The dimensions measured are mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The Dutch utility index was used to calculate a total score between −0.33 and 1.00.[18] Higher scores represent a better quality of life. Criterion validity of the Dutch language version is moderate and responsiveness moderate to good in patients with LBP.[19]

Global Perceived Effect Scale

A Global Perceived Effect Scale (GPE) was used as an external criterion.[20] At 12 months follow-up (T1) patients answered the question: “How much did your treated complaints change compared with pretreatment level?” Possible answers ranged from 1 to 7 on a 7-point Likert scale (1, “extremely worsened”; 2, “much worsened”; 3, “little worsened”; 4, “unchanged”; 5, “little improved”; 6, “much improved”; and 7, “completely improved”). Next, patient scores were divided into two categories: improved (much improved and completely improved) and not improved (all others). Studies have reported strong correlations between GPE scores and changes in pain and disability.[21,22] Test–retest reliability of 11-point GPEs is excellent.[23] Overall, the use of a 7- to 11-point scale is recommended when taking into account patient preference, adequate discriminative ability, and test–retest reliability.[24]

Data Analyses

Constructing Outcome Scores

All records containing more than three missing items were excluded from the study. If fewer than three items were missing, scores were imputed based on the mean score of the item. The response “don’t know” for the item on leg pain was defined as a missing value. The raw 29 item scores of the 7-factor model were recoded to scores between 0 and 1, where 0 represents the highest level of functioning and 1 the lowest level of functioning. Factor scores were calculated by taking the mean of the corresponding item scores and multiplying it by 100. This led to factor scores ranging from 0 to 100 points with lower scores representing higher functioning, giving equal weight to each item. A total outcome score was calculated by taking the average of all factor scores (0–100 points). Floor or ceiling effects were considered present when >15% of patients achieved the lowest or highest possible score.[25]
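The scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring syntax: the function names, the factor labels, and the example responses are hypothetical, and each item is assumed to be already oriented so that a higher raw score means worse functioning.

```python
from statistics import mean

def rescale(raw, lo, hi):
    """Map a raw item score on [lo, hi] to [0, 1]; 1 = lowest functioning."""
    return (raw - lo) / (hi - lo)

def factor_score(items):
    """items: list of (raw, lo, hi) tuples. Returns a factor score on 0-100."""
    return 100 * mean([rescale(raw, lo, hi) for raw, lo, hi in items])

def total_score(factors):
    """factors: dict of factor name -> item list. Unweighted mean of factor scores."""
    return mean([factor_score(items) for items in factors.values()])

# Hypothetical responses for two factors (illustration only).
example = {
    "pain": [(5, 0, 10), (4, 1, 5)],   # rescaled to 0.50 and 0.75
    "sleep": [(3, 1, 5), (2, 1, 5)],   # rescaled to 0.50 and 0.25
}
print(factor_score(example["pain"]), factor_score(example["sleep"]), total_score(example))
```

Because the total is the mean of factor scores rather than of all items, a factor with few items carries the same weight in the total as a factor with many items.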

Responsiveness and Minimal Clinically Important Change

Responsiveness and minimal clinically important change (MCIC) were calculated according to the Consensus-based Standards for the selection of health Measurement Instruments criteria (COSMIN).[8,26] To differentiate between improved and unimproved patients, the group with the GPE score “improved” was compared with the group “unimproved”. Area under the receiver operating characteristic (ROC) curve (AUC) was calculated for the total NIH minimal dataset, each individual factor, and Impact Stratification. An AUC higher than 0.70 was considered responsive. The MCIC was measured by determining the optimal cut-off point (OCP) of the ROC curve. This is the cut-point closest to the top-left corner of the ROC curve, where the sum of squares of 1-sensitivity and 1-specificity is minimized (equation 1).[27] To study the effect of baseline score on MCIC (i.e., patients with low functional problems may have lower MCICs), secondary analyses were performed where responsiveness and MCICs were calculated for different baseline-score groups. For the total outcome score, three equally sized subgroups were created based on tertile baseline scores. For the Impact Stratification, three subgroups were created: mild impact 8–27 points; moderate impact 28–34; severe impact ≥35 points.[3]
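As a rough sketch of the anchor-based procedure, the Python below computes an AUC and picks the cut-off that minimizes (1 − sensitivity)² + (1 − specificity)². It is an illustration under stated assumptions, not the authors' SPSS analysis: the function names and the toy data are hypothetical, candidate cut-offs are taken at observed change scores (statistical packages often use midpoints instead), and a patient is classified as improved when the change score is at or above the cut-off. The `squared=False` option switches to the unsquared sum, which is the alternative criterion some studies use.

```python
def split_by_anchor(changes, improved):
    """Separate change scores by the dichotomized anchor (True = improved)."""
    pos = [c for c, y in zip(changes, improved) if y]
    neg = [c for c, y in zip(changes, improved) if not y]
    return pos, neg

def auc(changes, improved):
    """Probability that an improved patient has a larger change score than an
    unimproved patient (ties count one half); equals the area under the ROC curve."""
    pos, neg = split_by_anchor(changes, improved)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def optimal_cutoff(changes, improved, squared=True):
    """Return (cutoff, sensitivity, specificity) closest to the top-left ROC corner."""
    pos, neg = split_by_anchor(changes, improved)
    best = None
    for cut in sorted(set(changes)):
        sens = sum(p >= cut for p in pos) / len(pos)
        spec = sum(n < cut for n in neg) / len(neg)
        power = 2 if squared else 1
        dist = (1 - sens) ** power + (1 - spec) ** power
        if best is None or dist < best[0]:
            best = (dist, cut, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical change scores (T0 minus T1) and anchor labels, for illustration only.
changes = [20, 15, 12, 3, 5, 1, -2, 14]
improved = [True, True, True, False, False, False, False, False]
print(auc(changes, improved), optimal_cutoff(changes, improved))
```

In small samples the two criteria can select different cut-offs, which is one reason the reported MCIC depends on the method chosen.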

Measurement Error

To determine the measurement error, standard error of measurement (SEM) and smallest detectable change (SDC) were calculated. The SEM was based on the variability between T0 and T1 plus the variability caused by random error (equation 2) in patients who reported to be “unchanged” on the GPE.[9,28] The variance components for the SEM formula were retrieved with the VARCOMP command in SPSS (Version 23; IBM Corp., Armonk, NY). The SDC is the smallest change in score that a patient must show to ensure that the observed change is real and not attributed to measurement error. The SDC can be determined in individual patients and at group level with the following equations, where 1.96 relates to a confidence level of 95% and √2 represents a correction for repeated measurements:[7,9]
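In the notation commonly used in the measurement literature, these quantities can be written as below. The symbols are assumptions introduced here for illustration: σ²_time is the variance between T0 and T1, σ²_residual is the random error variance, and n is the number of patients in the group. The forms are a sketch consistent with the description above and may differ in notation from the authors' own equations.

```latex
\begin{align}
\mathrm{SEM} &= \sqrt{\sigma^2_{\text{time}} + \sigma^2_{\text{residual}}} \\
\mathrm{SDC}_{\text{individual}} &= 1.96 \times \sqrt{2} \times \mathrm{SEM} \\
\mathrm{SDC}_{\text{group}} &= \frac{\mathrm{SDC}_{\text{individual}}}{\sqrt{n}}
\end{align}
```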

RESULTS

Baseline and 1-year follow-up data were available for 223 patients. The majority of patients were diagnosed with multifactorial LBP (78%), and 22% were diagnosed with specific spinal pathology (e.g., fractures, radiculopathy, malignancy, rheumatoid arthritis). No patients were excluded due to having more than three missing items on the NIH minimal dataset. Missing values occurred only for the item on the presence of leg pain, because 19 patients answered “don’t know.” Demographic and clinical variables are presented in Table 1. Mean age was 49.7 ± 11.9 years and 58% of patients were female. A majority (56%) had experienced LBP for over 5 years.

TABLE 1

Patient Characteristics

Characteristic	Patients (n = 223)
Age, mean years ± SD	49.7 ± 11.9
Sex, n female (%)	130 (58)
Duration LBP, n (%)
12 weeks to 1 year	34 (15)
1 year to 5 years	65 (29)
> 5 years	124 (56)
Education level, n (%)
No education	2 (1)
Low	81 (36)
Middle	78 (35)
High	45 (20)
Other/unknown	17 (8)
Work status, n (%)
Working	65 (29)
Partial sick leave	34 (15)
Sick leave	39 (18)
No job	85 (38)
Baseline measures of pain, disability, and quality of life
NRS pain (0–10), mean ± SD	6.6 ± 1.6
NIH total outcome Score (0–100), mean ± SD	44.3 ± 13.9
NIH Impact Stratification (8–50), mean ± SD	34.4 ± 7.4
PDI score (0–70), mean ± SD	36.4 ± 14.1
EQ5D score (−0.33–1.00), mean ± SD	0.48 ± 0.29

EQ5D indicates EuroQol-5D; LBP, low back pain; N, number of patients; NIH, National Institutes of Health; NRS, numeric rating scale; PDI, Pain Disability Index; SD, standard deviation.

Responsiveness and Minimal Clinically Important Change

Total Sample

Scores on the NIH minimal dataset at T0 and T1, mean changes, 95% confidence intervals, and responsiveness and MCIC parameters (AUC, OCP, sensitivity, and specificity) are presented in Table 2. Relevant floor effects were found at T0 for factor 7 (lifestyle; 81%) and at T1 for factor 4 (depression and catastrophizing; 24%), factor 5 (physical function; 17%), and factor 7 (lifestyle; 78%). Mean change of the total outcome score and Impact Stratification were respectively 20.5 ± 13.7 and 16.3 ± 7.9 points for improved (n = 50) and 3.6 ± 11.1 and 3.5 ± 6.4 points for unimproved (n = 173) patients. The total outcome score and Impact Stratification showed good responsiveness with an AUC of respectively 0.84 (0.78–0.91) and 0.91 (0.86–0.96). The MCIC was 14.3 points for the total outcome score with a sensitivity and specificity of 0.72 and 0.82 and 7.5 points for the Impact Stratification with a sensitivity and specificity of 0.96 and 0.78. Factor 1 (pain intensity and interference) and factor 5 (physical function) also showed good responsiveness with AUCs ≥ 0.70. The lower bound of the AUC confidence interval was <0.70 for factor 2 (95% CI = 0.67–0.83) and factor 4 (95% CI = 0.62–0.79), whereas factor 6 (sleep disturbance) and factor 7 (lifestyle) showed insufficient responsiveness (AUC < 0.70).

TABLE 2

Responsiveness and Minimal Clinically Important Change of the NIH Minimal Dataset (n = 223)

	Total Outcome Score	Factor 1: Pain Intensity and Interference	Factor 2: Pain History	Factor 4: Depression and Catastrophizing	Factor 5: Physical Function	Factor 6: Sleep Disturbance	Factor 7: Lifestyle	Impact Stratification
Improved patients, n (%)	50 (29)
Scores
Score T0, mean ± SD	44.3 ± 13.9	68.6 ± 17.8	53.2 ± 19.6	35.6 ± 26.2	49.8 ± 26.4	50.5 ± 22.4	8.1 ± 19.5	34.4 ± 7.4
Min–max	7.7–75.1	15–98.3	0–100	0–100	0–100	6.25–100	0–100	14–49
Score T1, mean ± SD	36.9 ± 16.9	50.6 ± 23.7	45.2 ± 23.0	28.0 ± 24.9	40.0 ± 27.5	46.4 ± 22.8	10.8 ± 23.4	28.1 ± 9.7
Min–max	4.1–80.4	0–96.7	0–100	0–100	0–100	0–100	0–100	8–48
Mean change ± SD	7.4 ± 13.6^*	18.0 ± 21.9^*	8.0 ± 19.4^*	7.6 ± 23.8^*	9.8 ± 25.9^*	4.1 ± 19.7^†	−2.8 ± 18.9^*	6.3 ± 8.6^*
95% CI of mean change	5.6–9.2	15.1–20.8	5.4–10.5	4.5–10.7	6.3–13.2	1.5–6.7	−5.3–0.3	5.2–7.5
Change (%)	16.7	26.2	15.0	21.3	19.7	8.1	−34.6	18.3
Responsiveness
AUC	0.84	0.91	0.75	0.70	0.78	0.65	0.49	0.91
95% CI	0.78–0.91	0.86–0.95	0.67–0.83	0.62–0.79	0.70–0.86	0.57–0.73	0.40–0.59	0.86–0.96
MCIC
OCP	14.2	25.4	16.7	16.8	16.7	3.1	−8.3	7.5
Sensitivity (%); specificity (%)	72;82	88;84	66;78	58;72	70;76	72;56	84;13	96;78

Total Outcome Score indicates total outcome score of the NIH minimal dataset (scale 0–100 points); all factors (scale 0–100); Impact Stratification (scale 8–50 points).

Factor 3 not included due to the fact that the corresponding items are only administered at baseline (T0).

*Significant change between T0 and T1 (P < 0.01).

†Significant change between T0 and T1 (P < 0.05).

AUC indicates area under the receiver operating characteristic (ROC) curve; CI, confidence interval; MCIC, minimal clinically important change; n, number of patients; OCP, optimal cut-off point of the ROC curve; SD, standard deviation.

Baseline-score Groups

Responsiveness and MCIC parameters for the different baseline-score groups are presented in Table 3. Total outcome score groups were equal in size, but for Impact Stratification group sizes differed, with the severely impacted group being the largest baseline-score subgroup. Scores between T0 and T1 improved significantly for all subgroups except the two lowest scoring (thus highest functioning) subgroups at baseline, i.e., tertile 1 of the total outcome score and the mild subgroup of Impact Stratification. Conversely, these two groups had the highest proportions of improved patients according to the GPE (Tertile 1, 26%; Mild, 32%). MCICs for the total outcome score were 6.9 points for tertile 1, 19.7 points for tertile 2, and 17.1 points for tertile 3. For the Impact Stratification, MCICs were 7.5 points for mildly impacted patients, 11.5 points for moderately impacted patients, and 12.5 points for severely impacted patients.

TABLE 3

Responsiveness and Minimal Clinically Important Change of the NIH Minimal Dataset Total Outcome Score and Impact Stratification per Baseline-score Group

	Total Outcome Score			Impact Stratification
	Baseline Tertile 1	Baseline Tertile 2	Baseline Tertile 3	Baseline Mild (8–27)	Baseline Moderate (28–34)	Baseline Severe (≥35)
Patients, n	74	74	75	37	73	113
Improved patients, n (%)	19 (26)	18 (24)	13 (17)	12 (32)	19 (26)	19 (17)
Scores
Score T0, mean ± SD	28.8 ± 7.6	44.3 ± 3.7	59.5 ± 6.0	22.7 ± 3.6	31.2 ± 2.0	40.3 ± 3.8
Min–max	7.7–38.0	38.1–50.1	50.1–75.1	14–27	28–34	35–49
Score T1, mean ± SD	26.3 ± 12.6	35.4 ± 15.3	48.8 ± 14.6	20.8 ± 8.9	25.1 ± 7.5	32.3 ± 8.9
Min–max	4.9–59.0	4.1–64.5	9.0–80.4	8–37	10–40	8–48
Mean change ± SD	2.6 ± 11.3	9.0 ± 14.0^*	10.7 ± 14.2^*	1.9 ± 8.3	6.1 ± 7.8^*	8.0 ± 8.7^*
95% CI of mean change	−0.1–5.2	5.7–12.2	7.5–14.0	−0.9–4.7	4.2–7.9	6.3–9.6
Change (%)	9.0	20.3	18.0	8.4	19.6	19.9
Responsiveness
AUC	0.76	0.94	0.91	0.91	0.92	0.97
95% CI	0.64–0.88	0.88–1.00	0.82–0.99	0.77–1.00	0.85–0.99	0.93–1.00
MCIC
OCP	6.9	19.7	17.1	7.5	11.5	12.5
Sensitivity (%); specificity (%)	74;71	94;91	85;86	92;96	84;91	95;88

Total Outcome Score indicates total outcome score of the NIH minimal dataset (scale 0–100 points); Impact Stratification (scale 8–50 points).

*Significant change between T0 and T1 (P < 0.01).

Measurement Error

The SEM was 8.3 points for the total outcome score with an SDCindividual of 22.9 and an SDCgroup of 1.8 points. The SEM for the Impact Stratification was 5.2 points with an SDCindividual of 14.4 and an SDCgroup of 1.1 points.
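The relation between SEM and SDC can be checked with a short calculation. The sketch below assumes the standard forms SDC_individual = 1.96 × √2 × SEM and SDC_group = SDC_individual / √n. The function names are hypothetical, and no value of n is taken from the study, so the group-level function is shown without a study-specific result. Small differences from the reported values are expected because the SEM inputs here are rounded to one decimal.

```python
import math

def sdc_individual(sem, z=1.96):
    """Smallest detectable change for one patient: z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem

def sdc_group(sem, n, z=1.96):
    """Smallest detectable change for a group of n patients."""
    return sdc_individual(sem, z) / math.sqrt(n)

# Rounded SEM values as reported in the text.
print(round(sdc_individual(8.3), 1))  # total outcome score; reported as 22.9
print(round(sdc_individual(5.2), 1))  # Impact Stratification; reported as 14.4
```

Because the group-level value shrinks with √n, even a modest sample brings the SDC well below the MCIC, while the individual-level value stays above it.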

DISCUSSION

This is the first study to calculate a total outcome score for the NIH minimal dataset, using a 7-factor model from a previous Dutch validation study.[6] Results show that the total outcome score for the NIH minimal dataset is responsive. A change score of 14 points on the total outcome score (0–100) and 7.5 points on Impact Stratification (8–50) can be considered clinically important. However, individual change scores up to 23 points for the total outcome score and 14 points for Impact Stratification should be interpreted with caution, because of a greater than 5% risk of measurement error. The total outcome score of the NIH minimal dataset, Impact Stratification, and separate factors related to pain and functioning showed good responsiveness. These findings correspond to other studies in patients with musculoskeletal pain[29] and lumbar spinal surgery.[30] MCICs in our study are similar to proposed MCICs for commonly used pain and disability measures in LBP and also vary between 10 and 20% of a total score.[11] Furthermore, as a rule of thumb a 30% change from baseline is considered a useful threshold for identifying clinically meaningful improvement.[11] Overall, responsiveness for the NIH total outcome score and pain and functioning domains is established and clinically relevant change scores match recommendations from literature. The SDC individual for the total outcome score (22.9 points) and Impact Stratification (14.4) both exceeded the MCICs of the corresponding measures. This is also observed in other studies on back pain.[16,31-33] Individual change scores larger than the MCIC but smaller than the SDC need to be interpreted with caution, because there is a risk of falsely labeling patients as improved while their scores fall within the measurement error. As the SDCgroup was considerably smaller than the SDCindividual, both outcome scores are better at detecting changes at a group level. 
However, we recommend reporting the percentage of improved patients (responders; determined with the MCIC) instead of comparing change scores on a group level. It is also important to take baseline scores into account when interpreting individual change scores.[34] Higher MCICs apply for higher baseline values (more severely impacted), since there is more potential for improvement.[33,35] MCICs for the total outcome score baseline-score groups were less proportionally distributed, with estimates of roughly 7, 20, and 17 points in order from lowest to highest scoring tertile. Change scores for the lowest scoring tertile (< 29 points at baseline) appear more difficult to interpret compared with other baseline-score groups due to partially insufficient responsiveness and lower sensitivity and specificity for the MCIC. A larger MCIC for the second tertile compared with the highest tertile seems counterintuitive, but has been observed before in a study on pain and disability instruments in patients with LBP.[36] The authors hypothesized that the more disabled patients at baseline possibly learned not to have too high expectations of the treatment outcome. Responsiveness was insufficient for the factors depression and catastrophizing, sleep disturbance, and lifestyle. Floor effects were observed for the factor depression and catastrophizing at follow-up (24%) and for lifestyle at baseline (81%) and follow-up (78%), indicating that a decrease in score could possibly be underestimated, or that, for lifestyle in particular, there was little effect of LBP on these domains. It should also be noted that the concept of recovery can be complex and that we do not know what patients take into account as they rate their perceived overall change.[24,37] Patients may perceive a “change in treated complaints” (phrasing of the GPE item) in terms of reduced pain and disability and higher levels of functioning instead of improvement in domains such as sleep and depression. 
However, several studies on the patient perspective on successful treatment for chronic pain do indicate the relevance of outcomes such as enjoyment of life, emotional well-being, fatigue and weakness, and sleep-related problems.[38-40] For that reason, the NIH minimal dataset could still provide useful information on important domains other than pain and functioning.
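The interpretation rules discussed above (MCIC as the responder threshold, SDC as the bound of measurement error) can be illustrated with a small sketch. The function names, category labels, and example change scores are hypothetical, and treating a change exactly equal to the threshold as meeting it is an assumption made here for simplicity.

```python
def classify_change(change, mcic, sdc):
    """Label one patient's improvement (T0 minus T1) against MCIC and SDC."""
    if change < mcic:
        return "below MCIC"
    if change < sdc:
        return "at or above MCIC, within measurement error"
    return "at or above SDC"

def responder_rate(changes, mcic):
    """Proportion of patients whose improvement reaches the MCIC."""
    return sum(c >= mcic for c in changes) / len(changes)

# Thresholds for the total outcome score as reported (MCIC 14, SDC 23);
# the change scores themselves are made up for illustration.
changes = [20, 5, 14, 0, 25]
for c in changes:
    print(c, classify_change(c, mcic=14, sdc=23))
print(responder_rate(changes, mcic=14))
```

The middle category is the one that calls for caution: such a patient counts as a responder, yet the change cannot be distinguished from measurement error at the individual level.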

Methodological Considerations

A variety of methods exist to measure responsiveness and determine the MCIC and measurement error of patient reported outcome measures, and results may vary significantly depending on which method is used. We determined MCICs with an anchor-based method because distribution-based methods do not take into account patients’ perspective and are inappropriate to use when the magnitude of the effect of an intervention is unknown.[8,41] A second consideration was how to determine the optimal cut-off point of the ROC curve in order to best estimate the MCIC. Given that sensitivity and specificity in chronic conditions such as CLBP are often valued equally, the cut-point to the top-left corner of the ROC curve represents the optimal cut-off point for the MCIC. One method of estimation that is often used is by determining where the sum of 1-sensitivity and 1-specificity is at its smallest.[35,42,43] However, the most efficient way to choose a cut-point closest to the top-left corner of the ROC curve is by first squaring the 1-sensitivity and 1-specificity terms.[27] Had we chosen to utilize the first method, we would have found a similar MCIC for the Impact stratification but a larger MCIC for the total outcome score; 17 instead of 14 points. SEM was based on the variability between time points and variability caused by random error in the “unchanged” patients in our cohort.[9,28] SEM can also be calculated with SD√(1-ICC), where the ICC can be obtained from a test–retest study with a similar sample. While it can have a significant impact on the magnitude of the SEM and SDC, studies often differ in which standard deviation they use in their calculations. By using baseline SD[16,44] we would have found a SEM and SDC of approximately 14 and 21 points for the total outcome score and 7 and 10 points for the Impact Stratification. 
A pooled SD, obtained from an ANOVA with the “unchanged” patients,[45] would produce a substantially lower SEM and SDC of approximately 8 and 12 points for the total outcome score and 3 and 7 points for the Impact Stratification. A limitation of this study is the generalizability of the results in terms of CLBP severity and level of care. Also, patients were recruited from a single clinical research facility, which could further limit external validity. Our patient sample scored similarly on pain (NRS: 6.7 ± 1.8) and disability (PDI: 38.0 ± 14.1) compared with Dutch patients with chronic pain referred to pain rehabilitation,[46] but scored much higher compared with Dutch workers with chronic musculoskeletal pain who do not seek specialty care (NRS: 4.6 ± 2.1; PDI: 19.1 ± 11.1).[47] However, the baseline value analyses in the present study do provide us with more insight into responsiveness and MCIC for different levels of CLBP severity. Future research should further explore longitudinal validity of the NIH minimal dataset in patients who receive primary and secondary level LBP care. The GPE consists of one item only and may not be very representative of individual differences in what is perceived as a change in treated complaints. Furthermore, with a follow-up period of 12 months there is a fairly high risk of recall bias[48] or response-shift, where rating results might be influenced by functional status at discharge.[37,49] We do not know to what extent and in what direction this effect might have influenced the results of this study. As far as we know, no better alternative exists for an external criterion that is known to correlate with pain, disability, and quality-of-life measures.
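The alternative SEM formula mentioned in this section, SD × √(1 − ICC), makes it easy to see why the choice of standard deviation matters. The sketch below uses a hypothetical function name and made-up SD and ICC values; it does not reproduce any figure from the study.

```python
import math

def sem_from_icc(sd, icc):
    """SEM estimated from a standard deviation and a test-retest ICC."""
    return sd * math.sqrt(1 - icc)

# Same hypothetical ICC, two hypothetical choices of SD.
icc = 0.75
for label, sd in [("larger SD", 12.0), ("smaller SD", 8.0)]:
    print(label, sem_from_icc(sd, icc))
```

With the ICC held fixed, the SEM scales linearly with the SD that is plugged in, so studies that use different SDs will report different SEM and SDC values for the same instrument.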

CONCLUSION

The NIH minimal dataset is responsive in patients with CLBP seeking tertiary multispecialty care. A change of 14 points on the total outcome score and 7.5 points on Impact Stratification can be considered clinically important. MCIC depends, among other things, on baseline values and the method that is chosen to determine the optimal cut-off point. Furthermore, individual change scores have to be interpreted with caution due to a risk of measurement error.

Key Points

The longitudinal validity of the NIH minimal dataset is adequate. An improvement of 14 points on the total outcome score and 7.5 points on Impact Stratification can be interpreted as clinically important in patients with CLBP. The measurement error is larger than the minimal clinically important change. This means that individual change scores should be interpreted with caution. Generally, for more severe patients, higher change scores should be obtained to be considered clinically important.

48 in total

1. Quebec Back Pain Disability Scale was responsive and showed reasonable interpretability after a multidisciplinary treatment.

Authors: Christophe Demoulin; Raymond Ostelo; J André Knottnerus; Rob J E M Smeets
Journal: J Clin Epidemiol Date: 2010-04-18 Impact factor: 6.437

2. Quality criteria were proposed for measurement properties of health status questionnaires.

Authors: Caroline B Terwee; Sandra D M Bot; Michael R de Boer; Daniëlle A W M van der Windt; Dirk L Knol; Joost Dekker; Lex M Bouter; Henrica C W de Vet
Journal: J Clin Epidemiol Date: 2006-08-24 Impact factor: 6.437

3. The Canadian minimum dataset for chronic low back pain research: a cross-cultural adaptation of the National Institutes of Health Task Force Research Standards.

Authors: Anaïs Lacasse; Jean-Sébastien Roy; Alexandre J Parent; Nioushah Noushi; Chúk Odenigbo; Gabrielle Pagé; Nicolas Beaudet; Manon Choinière; Laura S Stone; Mark A Ware
Journal: CMAJ Open Date: 2017-03-10

4. Index for rating diagnostic tests.

Authors: W J YOUDEN
Journal: Cancer Date: 1950-01 Impact factor: 6.860

5. Report of the NIH Task Force on Research Standards for Chronic Low Back Pain.

Authors: Richard A Deyo; Samuel F Dworkin; Dagmar Amtmann; Gunnar Andersson; David Borenstein; Eugene Carragee; John A Carrino; John Carrino; Roger Chou; Karon Cook; Anthony DeLitto; Christine Goertz; Partap Khalsa; John Loeser; Sean Mackey; James Panagis; James Rainville; Tor Tosteson; Dennis Turk; Michael Von Korff; Debra K Weiner
Journal: Spine J Date: 2014-06-18 Impact factor: 4.166

6. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain.

Authors: Nicole van der Roer; Raymond W J G Ostelo; Geertruida E Bekkering; Maurits W van Tulder; Henrica C W de Vet
Journal: Spine (Phila Pa 1976) Date: 2006-03-01 Impact factor: 3.468

7. Interpreting change scores for pain and functional status in low back pain: towards international consensus regarding minimal important change.

Authors: Raymond W J G Ostelo; Rick A Deyo; P Stratford; Gordon Waddell; Peter Croft; Michael Von Korff; Lex M Bouter; Henrica C de Vet
Journal: Spine (Phila Pa 1976) Date: 2008-01-01 Impact factor: 3.468

8. Identifying important outcome domains for chronic pain clinical trials: an IMMPACT survey of people with pain.

Authors: Dennis C Turk; Robert H Dworkin; Dennis Revicki; Gale Harding; Laurie B Burke; David Cella; Charles S Cleeland; Penney Cowan; John T Farrar; Sharon Hertz; Mitchell B Max; Bob A Rappaport
Journal: Pain Date: 2007-10-15 Impact factor: 6.961

9. Establishing anchor-based minimally important differences (MID) with the EORTC quality-of-life measures: a meta-analysis protocol.

Authors: Zebedee Jammbe Musoro; Jean-Francois Hamel; Divine Ewane Ediebah; Kim Cocks; Madeleine T King; Mogens Groenvold; Mirjam A G Sprangers; Yvonne Brandberg; Galina Velikova; John Maringwa; Hans-Henning Flechtner; Andrew Bottomley; Corneel Coens
Journal: BMJ Open Date: 2018-01-10 Impact factor: 2.692

10. Determination and comparison of the smallest detectable change (SDC) and the minimal important change (MIC) of four-shoulder patient-reported outcome measures (PROMs).

Authors: Derk A van Kampen; W Jaap Willems; Loes W A H van Beers; Rene M Castelein; Vanessa A B Scholtes; Caroline B Terwee
Journal: J Orthop Surg Res Date: 2013-11-14 Impact factor: 2.359

5 in total

1. Electroencephalography Signatures for Conditioned Pain Modulation and Pain Perception in Nonspecific Chronic Low Back Pain-An Exploratory Study.

Authors: Paulo E P Teixeira; Kevin Pacheco-Barrios; Elif Uygur-Kucukseymen; Roberto Mathias Machado; Ana Balbuena-Pareja; Stefano Giannoni-Luza; Maria Alejandra Luna-Cuadros; Alejandra Cardenas-Rojas; Paola Gonzalez-Mego; Piero F Mejia-Pando; Timothy Wagner; Laura Dipietro; Felipe Fregni
Journal: Pain Med Date: 2022-03-02 Impact factor: 3.637

2. Support for the Reliability and Validity of the National Institutes of Health Impact Stratification Score in a Sample of Active-Duty U.S. Military Personnel with Low Back Pain.

Authors: Ron D Hays; Maria Orlando Edelen; Anthony Rodriguez; Patricia Herman
Journal: Pain Med Date: 2021-10-08 Impact factor: 3.750

3. Unpacking the impact of chronic pain as measured by the impact stratification score.

4. Trajectories of Disability and Low Back Pain Impact: 2-year Follow-up of the Groningen Spine Cohort.

Review 5. Between-group minimally important change versus individual treatment responders.