Literature DB >> 32328553

Minimally Important Differences for Interpreting EORTC QLQ-C30 Scores in Patients With Advanced Breast Cancer.

Jammbe Z Musoro¹, Corneel Coens¹, Frederic Fiteni^1,2,3,4, Pogoda Katarzyna^1,5, Fatima Cardoso^1,6, Nicola S Russell^1,7, Madeleine T King^1,8, Kim Cocks^1,9,10, Mirjam Ag Sprangers^1,11, Mogens Groenvold^1,12, Galina Velikova^1,13, Hans-Henning Flechtner^1,14, Andrew Bottomley¹.

Abstract

BACKGROUND: We aimed to estimate the minimally important difference (MID) for interpreting group-level change over time, both within a group and between groups, for the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30 (EORTC QLQ-C30) scores in patients with advanced breast cancer.
METHODS: Data were derived from two published EORTC trials. Clinical anchors (eg, performance status [PS]) were selected using correlation strength and clinical plausibility of their association with a particular QLQ-C30 scale. Three change status groups were formed: deteriorated by one anchor category, improved by one anchor category, and no change. Patients with greater anchor changes were excluded. The mean change method was used to estimate MIDs for within-group change, and linear regression was used to estimate MIDs for between-group differences in change over time. For a given QLQ-C30 scale, MID estimates from multiple anchors were triangulated to a single value via a correlation-based weighted average.
RESULTS: MIDs varied by QLQ-C30 scale, direction (improvement vs deterioration), and anchor. MIDs for within-group change ranged from 5 to 14 points (improvement) and -14 to -4 points (deterioration), and MIDs for between-group change over time ranged from 4 to 11 points and from -18 to -4 points. Correlation-weighted MIDs for most QLQ-C30 scales ranged from 4 to 10 points in absolute values.
CONCLUSIONS: Our findings aid interpretation of changes in EORTC QLQ-C30 scores over time, both within and between groups, and for performing more accurate sample size calculations for clinical trials in advanced breast cancer.

Entities: Chemical

Year: 2019 PMID： 32328553 PMCID： PMC7050000 DOI： 10.1093/jncics/pkz037

Source DB: PubMed Journal: JNCI Cancer Spectr ISSN： 2515-5091

Patient-reported outcomes such as health-related quality of life (HRQOL) are increasingly assessed as important endpoints in cancer clinical trials. As a result, there is growing interest to improve the interpretation of HRQOL data in cancer clinical trials (1). It is recognized that interpreting HRQOL scores merely via statistical significance might be misleading because small differences in mean scores can be statistically significant, even when clinical relevance is absent. The minimally important difference (MID) approach aids interpreting differences and changes in HRQOL scores as clinically meaningful (2–7). MID can be defined as the smallest change in a HRQOL score that is perceived as “important” by a patient or by a third party (eg, a clinician), which may indicate a change in the patient’s management (2). MIDs are commonly estimated using anchor-based and distribution-based methods (7). Anchor-based methods express differences or change in HRQOL scores using other familiar variables that have clinical relevance (3,7–9) or to patient and/or physician-derived ratings of change in the specific domain (4–6). Distribution-based methods use the statistical distribution of HRQOL scores (eg, SD criteria or SEM) and are considered as supportive evidence to anchor-based methods (10). This study focused on interpreting the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30 (EORTC QLQ-C30) in patients with advanced breast cancer. Guidelines for interpreting the QLQ-C30 were initially published by King (3) and Osoba et al. (4). King (3) evaluated published evidence about differences in QLQ-C30 scores between groups for multiple cancer sites and clinical anchors and found that the score range for small, moderate, and large effects differed between the scales of the QLQ-C30. Osoba et al. (4) provided thresholds for interpreting small (5 to 10 points), moderate (10 to 20 points), and large changes (>20 points) in QLQ-C30 scores using a global rating of change in metastatic breast and small-cell lung cancer patients. Based on King (3) and Osoba et al. (4), mean differences no less than 10 points are widely regarded as clinically meaningful for the QLQ-C30 in randomized clinical trials (11). However, recent guidelines revealed that MIDs can differ by QLQ-C30 scale, direction of change (improvement vs deterioration), and settings (5, 6), rendering a widely applicable rule for MIDs highly unlikely. We therefore need to gather further empirical evidence on patterns of MIDs across QLQ-C30 scales and disease sites (12). This study examined MIDs for group-level change in HRQOL scores over time. In contrast to Osoba et al. (4), we used available clinical anchors in the database. Furthermore, the guidelines of King (3) and Cocks et al. (5, 6) were based on meta-analyses of published studies, pooling across cancer sites, whereas this study used individual patient data from archived trials.

Methods

Data Description

Data were derived from two published phase III EORTC trials. Trial 1 assessed the clinical benefit of a dose-intensive anthracycline-based regimen compared with standard treatment in women with locally advanced breast cancer and enrolled 448 patients (13). Trial 2 compared a combination of doxorubicin and paclitaxel vs doxorubicin and cyclophosphamide as first-line chemotherapy in advanced (metastatic) breast cancer and enrolled 275 patients (14). Both trials assessed HRQOL using the QLQ-C30 at baseline, during treatment, and at several follow-up time points after the end of treatment.

The EORTC QLQ-C30

The EORTC QLQ-C30 comprises 30 items, 24 of which are aggregated into 9 multi-item scales: 5 functioning scales (physical, role, cognitive [CF], emotional, and social); 3 symptom scales (fatigue, pain, and nausea and/or vomiting); and 1 global health-status scale. The remaining six single items assess symptoms of dyspnea, appetite loss (AP), sleep disturbance, constipation, diarrhea, and financial impact. Both trials used version 2 of the QLQ-C30, with standard scoring applied to the scales (15). For consistency in signs, all scales were scored such that 0 represents the worst possible score and 100, the best possible score. Financial impact was omitted from the analysis because suitable anchors were not available.

Clinical Anchor

Anchors were selected from variables that were available in the trial datasets (eg, physician examinations and common terminology criteria for adverse events [CTCAE]). Anchors were selected for each HRQOL scale based on correlation strength. Spearman rank, polyserial, or polychoric correlation was used, depending on the distribution of the pair of variables. Anchors with correlations no less than 0.30 were prioritized (10), and where achievable, anchors with much stronger correlations were targeted. The retained anchors were further verified for clinical plausibility by a panel of six breast cancer and/or HRQOL experts to avoid spurious findings. Multiple anchors could be selected for each HRQOL scale (12). For trial 1, the retained anchors comprised 1) World Health Organization PS, scored between 0 (completely active with no limitations) and 4 (bedbound); and 2) four CTCAEs (nausea, vomiting, fatigue, and alopecia), graded between 0 (no toxicity) to 4 (life-threatening). The only anchor retained for trial 2 was the PS.

Definition of Clinical Change Groups

Three clinical change status groups (CCG) were defined: deterioration (worsened by 1 anchor category), stable (no change in anchor category), and improvement (improved by 1 anchor category). In order not to overestimate the MIDs, change scores no less than a 2-point change in anchor categories were excluded from datasets used to estimate MIDs because they were considered to be above the “minimal” expected change.

Statistical Analysis

Anchor-Based Methods

Change scores of HRQOL scale and anchor pairs were computed across all pairwise time points and combined to provide sufficient data for examining clinically important changes. For example, for a subject measured at time points ta, tb, and tc, change scores were computed between ta and tb; ta and tc; and tb and tc. Hence, a subject can contribute multiple change scores, and given their change scores, subjects can contribute to multiple CCGs. Only subjects with HRQOL and anchor data for a given pair of time points contributed to the calculation of change scores. Data from the two trials were pooled to estimate MIDs. The mean change method was used to estimate MIDs for within-group change over time. MIDs for improvement and deterioration were computed as the mean HRQOL change scores for the improvement and deterioration CCGs, respectively. This is relevant for interpreting change within a single group of patients, and it is similar to the mean HRQOL change score over time for a treatment group in a trial. Effect sizes (ESs) were computed within each CCG by dividing the mean of the HRQOL change scores (derived from all the pairwise time point differences) by the SD of the HRQOL change scores over all time points. Only mean changes with an ES no less than 0.2 and less than 0.8 were considered appropriate for inclusion as MIDs. This was based on Cohen’s (16) recommendations that an ES of 0.2 is small, 0.5 is moderate, and no less than 0.8 is large. The rationale was that ESs less than 0.2 reflect changes that are clinically unimportant, and those no less than 0.8 are obviously more than minimally important. The difference in change scores between the improvement (or deterioration) CCG and no change CCG was compared using analysis of variance (ANOVA). A linear regression was used to estimate MIDs for differences between groups in change over time. For a given HRQOL scale and anchor pair, the outcome variable was the HRQOL change score, and the covariate was a binary anchor variable (coded as stable = 0 and improvement = 1 when modeling improvement [deteriorated observations were excluded], and stable = 0 and deterioration = 1 when modeling deterioration [improved observations were excluded]). Because change scores were computed across all pairwise time points, some patients contributed change scores to more than one CCG and more than one change score to a particular CCG. We corrected for the association between multiple change scores contributed by the same patients by specifying a suitable covariance structure using the generalized estimating equations (17). The slope parameters for the “improved” and “deteriorated” covariates correspond to the MID for improvement and deterioration, respectively. This approach is similar to comparing the mean HRQOL change score over time in a treatment group to a control group in a trial, which is why these MIDs are useful for interpreting changes over time between two distinct groups of patients. Furthermore, we compared the two trials by adding a “trial” effect in a linear regression model, separately for improving and deteriorating HRQOL scores. This was based on the data with PS as the anchor. Both within-group and between-group MID estimates for a given HRQOL scale, from multiple anchors, were triangulated to a single value via a correlation-based weighted average.

Distribution-Based Methods

The SEM, 0.2 SD, 0.3 SD, and 0.5 SD were applied to HRQOL scores at two time points common to both trials: 1) start of treatment (t1), time point before or on the first day of treatment, and 2) end of treatment (t2), last day of protocol treatment. Test–retest reliability estimates to compute SEM for the QLQ-C30 were based on Hjermstad et al. (18). All analyses were performed using the SAS software (19).

Results

Table 1 summarizes the demographic and clinical characteristics of patients at baseline. The median follow-up time (in months) for HRQOL was 5.3 (16.9) for trial 1 and 1.6 (2.8) for trial 2. An overview of the flow of patients through this study is presented in Supplementary Figure 1 (available online). Cross-sectional correlations ranged from 0.20 to 0.62 in absolute value, with a majority of the correlation coefficients being above the 0.30 threshold (7) (Table 2). Correlations between the change scores ranged from 0.14 to 0.51. At least one suitable anchor was constructed for 8 of the 14 QLQ-C30 scales that were considered for this study. The distribution of patients and the number of change observations across the categories of suitable anchors are summarized in Supplementary Table 1 (available online).

Table 1.

Baseline demographic and clinical characteristics of the patients by study (all patients had advanced breast cancer)

Characteristic	Study 10921 No. (%) (N = 448)	Study 10961 No. (%) (N = 275)	Total (N = 723)
Performance status
0	394 (87.9)	119 (43.3)	513 (71.0)
1	54 (12.1)	133 (48.4)	187 (25.9)
2	0 (0.0)	22 (8.0)	22 (3.0)
Unknown	0 (0.0)	1 (0.4)	1 (0.1)
Number of positive nodes
N₀–N₁	250 (55.8)	144 (52.4)	394 (54.5)
N₂	176 (39.3)	26 (9.5)	202 (27.9)
N₄₊	0 (0.0)	51 (18.5)	51 (7.1)
N_x	9 (2.0)	41 (14.9)	50 (6.9)
N₃	13 (2.9)	13 (4.7)	26 (3.6)
Country
France	97 (21.7)	41 (14.9)	138 (19.1)
Netherlands	41 (9.2)	42 (15.3)	83 (11.5)
United Kingdom	11 (2.5)	68 (24.7)	79 (10.9)
Poland	78 (17.4)	0 (0.0)	78 (10.8)
Belgium	48 (10.7)	29 (10.5)	77 (10.7)
Canada	68 (15.2)	0 (0.0)	68 (9.4)
Slovenia	22 (4.9)	26 (9.5)	48 (6.6)
Switzerland	28 (6.3)	8 (2.9)	36 (5.0)
Russia	27 (6.0)	0 (0.0)	27 (3.7)
Italy	0 (0.0)	18 (6.5)	18 (2.5)
Israel	0 (0.0)	16 (5.8)	16 (2.2)
South Africa	3 (0.7)	12 (4.4)	15 (2.1)
Portugal	13 (2.9)	0 (0.0)	13 (1.8)
Czech Republic	12 (2.7)	0 (0.0)	12 (1.7)
Spain	0 (0.0)	9 (3.3)	9 (1.2)
Austria	0 (0.0)	6 (2.2)	6 (0.8)
Age, y
Mean (SD)	50.07 (9.68)	52.27 (9.61)	—
Range	26.0–79.0	28.0–70.0	—

Table 2.

Correlations over all time points of the EORTC QLQ-C30 scale scores with suitable anchors, and correlations between change scores of the EORTC QLQ-C30 scales and anchors

		Scores		Change scores
Scale	Anchor	n₁ (n_1R)*	Correlation	n₂ (n_2R)*	Correlation
PF	Performance status	587 (2922)	−0.52	548 (8508)	−0.30
	CTCAE fatigue	355 (2658)	−0.30	343 (11102)	−0.20
	CTCAE vomiting	355 (2656)	−0.30	343 (11077)	−0.25
RF	Performance status	587 (2922)	−0.54	547 (8520)	−0.20
SF	Performance status	594 (2890)	−0.34	545 (8390)	−0.20
	CTCAE fatigue	355 (2630)	−0.21	340 (10984)	−0.15
	CTCAE vomiting	355 (2628)	−0.25	340 (10959)	−0.20
CF	CTCAE fatigue	355 (2638)	−0.20	342 (11032)	−0.14
QL	CTCAE vomiting	355 (2628)	−0.39	341 (10892)	−0.30
	CTCAE nausea	355 (2628)	−0.39	341 (10892)	−0.30
	CTCAE alopecia	355 (2629)	−0.39	341 (10914)	−0.35
	Performance status	585 (2893)	−0.32	547 (8351)	−0.25
FA	Performance status	587 (2915)	−0.40	546 (8476)	−0.23
	CTCAE nausea	355 (2644)	−0.21	341 (11 014)	−0.15
	CTCAE vomiting	355 (2644)	−0.22	341 (11 014)	−0.16
NV	CTCAE nausea	355 (2654)	−0.60	343 (11 050)	−0.51
	CTCAE vomiting	355 (2654)	−0.62	343 (11 050)	−0.48
AP	CTCAE nausea	355 (2621)	−0.58	343 (10 816)	−0.44
	CTCAE vomiting	355 (2621)	−0.59	343 (10 816)	−0.48

n1 (n1R) and n2 (n2R) can vary by anchor and EORTC QLQ-C30 scale. AP = appetite loss; CF = cognitive functioning; EORTC QLQ-C30 = European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30; FA = fatigue; n1 = number of patients with at least 1 matched EORTC QLQ-C30 and an anchor form; n1R = number of repeated anchor and HRQOL matched forms across all subjects; n2 = number of patients with at least 2 matched EORTC QLQ-C30 and an anchor form (at least 2 forms are needed to compute change scores); n2R = number of repeated EORTC QLQ-C30 scale and anchor change scores across all subjects; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning.

Baseline demographic and clinical characteristics of the patients by study (all patients had advanced breast cancer) Correlations over all time points of the EORTC QLQ-C30 scale scores with suitable anchors, and correlations between change scores of the EORTC QLQ-C30 scales and anchors n1 (n1R) and n2 (n2R) can vary by anchor and EORTC QLQ-C30 scale. AP = appetite loss; CF = cognitive functioning; EORTC QLQ-C30 = European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30; FA = fatigue; n1 = number of patients with at least 1 matched EORTC QLQ-C30 and an anchor form; n1R = number of repeated anchor and HRQOL matched forms across all subjects; n2 = number of patients with at least 2 matched EORTC QLQ-C30 and an anchor form (at least 2 forms are needed to compute change scores); n2R = number of repeated EORTC QLQ-C30 scale and anchor change scores across all subjects; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning. Table 3 shows the range of MIDs from the mean change method (useful for interpreting within-group change over time) and the linear regression (useful for interpreting between-group differences in change over time) for each HRQOL scale, across multiple anchors. Detailed results are presented in Supplementary Table 2 (available online).

Table 3.

Range of anchor-based MID estimates from the mean change method and linear regression

	Mean change method*		Linear regression†
Scale	Improvement	Deterioration	Improvement	Deterioration
PF	7 to 10	−11 to −10	7 to 9	−10 to −8
RF	No MID	−6	No MID	−4
SF	7 to 9	−9 to −5	6 to 7	−11 to −5
CF	5	−4	4	−4
QL	10 to 14	−11 to −5	8 to 11	− 13 to −6
FA	8	−9 to −7	8	−8 to −6
NV	No MID	−12	No MID	−14
AP	No MID	−14	No MID	−18

The mean change method is useful for interpreting within-group change over time. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100 the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. All of the ESs for the no change group were <0.2. AP = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning.

The linear regression is useful for interpreting between-group differences in change over time.

Range of anchor-based MID estimates from the mean change method and linear regression The mean change method is useful for interpreting within-group change over time. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100 the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. All of the ESs for the no change group were <0.2. AP = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning. The linear regression is useful for interpreting between-group differences in change over time. The MIDs varied according to the scale, direction of change scores (improvement vs deterioration), and anchor (Figure 1). Estimates were always in the expected direction according to the anchor (ie, positive vs negative change scores within the improvement vs deterioration CCG, respectively). Statistically significant differences (ANOVA P < .05) were observed between the HRQOL change scores for all improvement and deterioration CCGs vs no change CCG.

Figure 1.

Mean change and 95% confidence interval for improvement and deterioration EORTC QLQ-C30 scales, across multiple anchors and averaged across different time periods. Estimates are available only for scales with at least 1 suitable anchor and with effect size ≥0.2 and <0.8 within the “deteriorate” and “improve” groups, respectively. These mean change scores are useful for interpreting within-group change over time. AP = appetite loss; CF = cognitive functioning; CTCAE = common terminology criteria for adverse events; deteriorate = worsened by 1 anchor category; EORTC QLQ-C30 = European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30; FA = fatigue; Improve = improved by 1 category; NV = nausea and/or vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning. MIDs for within-group change (based on the mean-change method) ranged from 5 to 14 points (improvement) and −14 to −4 points (deterioration), and MIDs for between-group change (based on the linear regression) ranged from 4 to 11 points and from −18 to −4 points (Table 3). For the majority of the QLQ-C30 scales, the estimated MIDs ranged from 4 to 10 points in absolute values. Adding a trial effect to the regression models showed no statistically significant differences in change scores between the two trials, hence, supporting the combination of the two trials. The MIDs in Table 3 are summarized to single MID values per scale in Table 4 and ranged from 4 to 10 points in absolute values for most HRQOL scales. Table 4 also compares the anchor-based estimates to the distribution-based estimates at t1. The distribution-based estimates at t2 for each HRQOL scale were similar to t1, mostly within a less-than-1-point range. All anchor-based estimates were no less than 0.2 SD, with most estimates being less than 0.5 SD. The anchor-based estimates tended to be closer to both the 0.3 SD and the 1 SEM.

Table 4.

Summary of anchor-based MIDs for within- and between-group changes compared with distribution-based estimates

	Anchor-based MID for within-group change*		Anchor-based MID for between-groups difference in change*		Distribution-based QL scores at t₁ (n = 415–425)
Scale	Improvement	Deterioration	Improvement	Deterioration	0.2 SD	0.3 SD	0.5 SD	1 SEM
PF	9	−10	8	−9	4.7	7.0	11.7	7.0
RF	No MID	−6	No MID	−4	5.1	7.6	12.7	10.7
SF	8	−7	7	−8	5.3	7.9	13.1	9.5
CF	5	−4	4	−4	4.1	6.2	10.3	8.8
QL	12	−8	10	−10	4.9	7.3	12.2	10.3
FA	8	−8	8	−7	4.9	7.3	12.2	10.0
NV	No MID	−11	No MID	−14	3.4	5.1	8.5	10.3
AP	No MID	−14	No MID	−18	5.2	7.8	13.1	12.0

The within-group MIDs (from the mean change method) and the between-group MIDs (from the linear regression) were summarized via weighted averages based on scale and anchor pair correlation. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100, the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; n = number of patients; NV = nausea/vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning; t1 = time points for the start of treatment.

Summary of anchor-based MIDs for within- and between-group changes compared with distribution-based estimates The within-group MIDs (from the mean change method) and the between-group MIDs (from the linear regression) were summarized via weighted averages based on scale and anchor pair correlation. The symptom scores were reversed to follow the functioning scales interpretation (ie, 0 represents the worst possible score and 100, the best possible score); “no MID” is used where no MID estimate is available either because of the absence of a suitable anchor or ES was either <0.2 or ≥0.8. = appetite loss; CF = cognitive functioning; ES = effect size; FA = fatigue; MID = minimally important difference; n = number of patients; NV = nausea/vomiting; PF = physical functioning; QL = global quality of life; RF = role functioning; SF = social functioning; t1 = time points for the start of treatment.

Discussion

This study examined MIDs for interpreting group-level change of EORTC QLQ-C30 scores over time in patients with advanced breast cancer. Anchors for each HRQOL scale were selected based on both the correlation strength and the clinical plausibility. When available, multiple anchors were used per HRQOL scale to provide some reassurance about the plausibility of the estimated MIDs. Despite the modest correlation between anchors and scales change scores, most MID estimates from multiple anchors were in a narrow range (often<5 points) and were always in the expected direction according to the anchor change category. In agreement with recent findings (5–9), our estimates varied by HRQOL scale and direction of change (improvement vs deterioration). Similar to Maringwa et al. (8, 9). and Musoro et al. (7), no systematic differences were observed in the magnitude of change between deteriorating and improving scores. However, other studies reported that estimates for deterioration tended to be larger than those for improvement (6, 20). We distinguished between MIDs for interpreting the degree of change within a group (obtained from the mean change method) and MIDs for interpreting the degree of differences between groups in within-group change (obtained from linear regression). Interestingly, estimates from both approaches were often in the same range. For many scales, the MIDs were within the range of 5–10 points that was suggested by Osoba et al. (4) and also observed by Cocks et al. (5, 6), Musoro et al. (7), and Maringwa et al. (8, 9). However, similar to Cocks et al. (5, 6), we noticed that the thresholds for some scales were much lower. For example, MIDs of 4 points were observed for the CF scale. Musoro et al. (7) also reported MIDs that were as low as 3 points for the CF scale in patients with malignant melanoma. On the other hand, similar to Musoro et al. (7), we observed a much bigger threshold of 18 points for the AP scale. This reinforces the evidence that there is no single global standard for clinically meaningful change, and scale-specific MIDs should therefore be selected with more caution. Most often, investigators seeking MIDs would desire simple guidelines. However, as shown in this article, results are often varied as a consequence of there being numerous anchors, various distribution-based criteria, and multiple HRQOL scales. Results shown in Figure 1 and Table 3 represent this diversity because the range of MIDs varies by the different anchors. We acknowledge that end users may find such a range of options confusing. So, to provide a single MID value per scale, we further simplified this by calculating a correlation-weighted average across multiple anchors. End users can choose to work with either the ranges provided in Table 3 or the single values provided in Table 4, whichever they feel most comfortable with. Most of the anchor-based estimates were closer to 0.3 SD and 1 SEM compared to the commonly reported 0.5 (21). A limitation of this study is that suitable anchors were not always available, hence, anchor-based MIDs could not be estimated for seven of the EORTC QLQ-C30 scales, which were omitted in this article Furthermore, the available anchors (PS or CTCAE grades) relied exclusively on clinical observations or interpretations. Because the two trials that were used in this study evaluated chronic delivery of cytotoxic chemotherapy, clinical anchors such as CTCAE nausea, CTCAE vomiting, and CTCAE fatigue were reasonable and relevant. The availability of a pretreatment baseline assessment also allows detecting persistent effects such as alopecia. However, such anchors might not be relevant in other settings, treatments, or subtypes of breast cancer. The available anchors were also not necessarily suitable in all situations. For example, although CTCAE fatigue met the requirements of a plausible clinical relationship with the QLQ-C30 fatigue scale, the resulting correlations were too low (<0.1) to be retained. The low correlation can be explained by the discrete nature of the CTCAE scale where only a few high-grade events were scored. Moreover, because of the subjective nature of fatigue, there is likely also misrepresentation by physicians compared to patients’ ratings as already reported by Basch et al. (22).This might also explain the potentially inflated MID estimates for the AP scale. Also, anchors that are based on patients’ perspective of change (eg, subjective significance questionnaires) were not available in our study. Nonetheless, it is reassuring to notice the considerable overlap between our findings and those of Osoba et al. (4), which used patients’ ratings of change as the anchor. Patients’ self-assessed ratings across the different QLQ-C30 scales and across different disease sites are rarely available from retrospective data sources and would need to be planned as future research to complement our findings. Another limitation is that our data originate from two controlled clinical trials, each with specific selection and treatment criteria. Although results were consistent between the trials, extrapolation beyond their specific setting should be made with caution. A number of articles are available that provide general guidelines for selecting MIDs for the QLQ-C30 scales (5, 6, 11). For instance, Cocks et al. (6) published MIDs for interpreting QLQ-C30 change scores over time for all 15 scales using published results from multiple cancer sites. The MID values obtained for the eight scales considered in this study were comparable to those presented by Cocks et al. (6). These increasingly robust guidelines advocate a more nuanced approach to clinical relevance beyond a single threshold. In conclusion, our findings can help clinicians and researchers interpret the clinical relevance of group-level change of QLQ-C30 scores over time in patients with advanced breast cancer. The fact that MIDs can vary by QLQ-C30 scale and anchor suggests that we cannot rely on global standards for defining clinically meaningful change. Finally, our results will also inform more accurate sample size calculations for clinical trials in advanced breast cancer with endpoints that are based on EORTC QLQ-C30 scales.

Funding

This study was funded by the EORTC Quality of Life Group.

Notes

Affiliations of authors: European Organisation for Research and Treatment of Cancer, Brussels, Belgium (JZM, CC, AB); Department of Medical Oncology, University Hospital of Nîmes, France (FF); Institut de Recherche en Cancérologie de Montpellier, France (FF); University of Montpellier, France (FF); Maria Sklodowska-Curie Institute-Oncology Center, Warsaw, Poland (PK); Breast Unit, Champalimaud Clinical Centre, Champalimaud Foundation, Lisbon, Portugal (FC); Netherlands Cancer Institute, Amsterdam, The Netherlands (NSR); University of Sydney, Faculty of Science, School of Psychology, Sydney, NSW, Australia (MTK); Department of Health Sciences, University of York, York, UK (KC); Adelphi Values, Bollington, Cheshire, UK (KC); Department of Medical Psychology, Amsterdam University Medical Centers, Academic Medical Center, University of Amsterdam, Cancer Center Amsterdam, The Netherlands (MAS); Department of Public Health, University of Copenhagen, and Bispebjerg Hospital, Copenhagen, Denmark (MG); Leeds Institute of Cancer and Pathology, University of Leeds, St James’s Hospital, Leeds, UK (GV); Clinic for Child and Adolescent Psychiatry and Psychotherapy, University of Magdeburg, Magdeburg, Germany (H-HF). We thank the EORTC breast disease group members and their clinical investigators and all the patients who participated in the trials that we used for this analysis. The authors declare no conflicts of interest.

19 in total

1. Evidence-based guidelines for interpreting change scores for the European Organisation for the Research and Treatment of Cancer Quality of Life Questionnaire Core 30.

Authors: K Cocks; M T King; G Velikova; G de Castro; M Martyn St-James; P M Fayers; J M Brown
Journal: Eur J Cancer Date: 2012-03-12 Impact factor: 9.162

Review 2. Evidence-based guidelines for determination of sample size and interpretation of the European Organisation for the Research and Treatment of Cancer Quality of Life Questionnaire Core 30.

Authors: Kim Cocks; Madeleine T King; Galina Velikova; Marrissa Martyn St-James; Peter M Fayers; Julia M Brown
Journal: J Clin Oncol Date: 2010-11-22 Impact factor: 44.544

Review 3. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes.

Authors: Dennis Revicki; Ron D Hays; David Cella; Jeff Sloan
Journal: J Clin Epidemiol Date: 2007-08-03 Impact factor: 6.437

Review 4. Health related quality of life outcomes in cancer clinical trials.

Authors: A Bottomley; H Flechtner; F Efficace; V Vanvoorden; C Coens; P Therasse; G Velikova; J Blazeby; E Greimel
Journal: Eur J Cancer Date: 2005-08 Impact factor: 9.162

Review 5. Regression analysis for correlated data.

Authors: K Y Liang; S L Zeger
Journal: Annu Rev Public Health Date: 1993 Impact factor: 21.981

6. Minimal important differences for interpreting health-related quality of life scores from the EORTC QLQ-C30 in lung cancer patients participating in randomized controlled trials.

Authors: John T Maringwa; Chantal Quinten; Madeleine King; Jolie Ringash; David Osoba; Corneel Coens; Francesca Martinelli; Jurgen Vercauteren; Charles S Cleeland; Henning Flechtner; Carolyn Gotay; Eva Greimel; Martin J Taphoorn; Bryce B Reeve; Joseph Schmucker-Von Koch; Joachim Weis; Egbert F Smit; Jan P van Meerbeeck; Andrew Bottomley
Journal: Support Care Cancer Date: 2010-10-01 Impact factor: 3.603

7. Test/retest study of the European Organization for Research and Treatment of Cancer Core Quality-of-Life Questionnaire.

Authors: M J Hjermstad; S D Fossa; K Bjordal; S Kaasa
Journal: J Clin Oncol Date: 1995-05 Impact factor: 44.544

8. Interpreting clinically significant changes in patient-reported outcomes.

Authors: Jolie Ringash; Brian O'Sullivan; Andrea Bezjak; Donald A Redelmeier
Journal: Cancer Date: 2007-07-01 Impact factor: 6.860

9. Establishing anchor-based minimally important differences (MID) with the EORTC quality-of-life measures: a meta-analysis protocol.

Authors: Zebedee Jammbe Musoro; Jean-Francois Hamel; Divine Ewane Ediebah; Kim Cocks; Madeleine T King; Mogens Groenvold; Mirjam A G Sprangers; Yvonne Brandberg; Galina Velikova; John Maringwa; Hans-Henning Flechtner; Andrew Bottomley; Corneel Coens
Journal: BMJ Open Date: 2018-01-10 Impact factor: 2.692

Review 10. Distribution- and anchor-based methods to determine the minimally important difference on patient-reported outcome questionnaires in oncology: a structured review.

Authors: Ahmad Ousmen; Célia Touraine; Nina Deliu; Francesco Cottone; Franck Bonnetain; Fabio Efficace; Anne Brédart; Caroline Mollevi; Amélie Anota
Journal: Health Qual Life Outcomes Date: 2018-12-11 Impact factor: 3.186

12 in total

Review 1. Triangulation of multiple meaningful change thresholds for patient-reported outcome scores.

Authors: Andrew Trigg; Philip Griffiths
Journal: Qual Life Res Date: 2021-07-28 Impact factor: 4.147

2. Electroacupuncture for Cancer-Related Cognitive Impairment: A Clinical Feasibility Study.

Authors: Yee Ran Lyu; Hye-Yoon Lee; Hyo-Ju Park; O-Jin Kwon; Ae-Ran Kim; In Chul Jung; Yang-Chun Park; Jung-Hyo Cho; Jung-Eun Kim; Mikyung Kim; Jun-Hwan Lee; Joo-Hee Kim
Journal: Integr Cancer Ther Date: 2022 Jan-Dec Impact factor: 3.077

3. The Effect of Exercise on Life Quality and Depression Levels of Breast Cancer Patients.

Authors: Mensure Aydin; Elif Kose; Ilhan Odabas; Bergun Meric Bingul; Deniz Demirci; Zeki Aydin
Journal: Asian Pac J Cancer Prev Date: 2021-03-01

4. Especially for neuro-oncologists-minimally important differences for the EORTC QLQ-C30 in glioma patients.

Authors: Tito R Mendoza
Journal: Neuro Oncol Date: 2021-08-02 Impact factor: 13.029

5. Minimal important differences of EORTC QLQ-C30 for metastatic breast cancer patients: Results from a randomized clinical trial.

Authors: Takuya Kawahara; Naruto Taira; Takeru Shiroiwa; Yasuhiro Hagiwara; Takashi Fukuda; Yukari Uemura; Hirofumi Mukai
Journal: Qual Life Res Date: 2022-01-04 Impact factor: 3.440

6. A 4-Month Whole-Systems Ayurvedic Medicine Nutrition and Lifestyle Intervention Is Feasible and Acceptable for Breast Cancer Survivors: Results of a Single-Arm Pilot Clinical Trial.

Authors: Anand Dhruva; Cairn Wu; Christine Miaskowski; Wendy Hartogensis; Hope S Rugo; Shelley R Adler; Ted J Kaptchuk; Rucha Kelkar; Sangeeta Agarawal; Amisha Vadodaria; Ellen Garris; Frederick M Hecht
Journal: Glob Adv Health Med Date: 2020-12-01

7. Impact of the COVID-19 Pandemic on Patient-Reported Outcomes of Breast Cancer Patients and Survivors.

Authors: Claudia A Bargon; Marilot C T Batenburg; Lilianne E van Stam; Dieuwke R Mink van der Molen; Iris E van Dam; Femke van der Leij; Inge O Baas; Miranda F Ernst; Wiesje Maarse; Nieke Vermulst; Ernst J P Schoenmaeckers; Thijs van Dalen; Rhodé M Bijlsma; Danny A Young-Afat; Annemiek Doeksen; Helena M Verkooijen
Journal: JNCI Cancer Spectr Date: 2020-11-05

8. Establishing anchor-based minimally important differences for the EORTC QLQ-C30 in glioma patients.

Authors: Linda Dirven; Jammbe Z Musoro; Corneel Coens; Jaap C Reijneveld; Martin J B Taphoorn; Florien W Boele; Mogens Groenvold; Martin J van den Bent; Roger Stupp; Galina Velikova; Kim Cocks; Mirjam A G Sprangers; Madeleine T King; Hans-Henning Flechtner; Andrew Bottomley
Journal: Neuro Oncol Date: 2021-08-02 Impact factor: 12.300

Review 9. Measuring Quality of Life Using Patient-Reported Outcomes in Real-World Metastatic Breast Cancer Patients: The Need for a Standardized Approach.

Authors: Marloes E Clarijs; Jacob Thurell; Friedrich Kühn; Carin A Uyl-de Groot; Elham Hedayati; Maria M Karsten; Agnes Jager; Linetta B Koppert
Journal: Cancers (Basel) Date: 2021-05-12 Impact factor: 6.639

10. Real-world patient-reported outcomes of women receiving initial endocrine-based therapy for HR+/HER2- advanced breast cancer in five European countries.

Authors: Alison Davie; Gebra Cuyun Carter; Alex Rider; James Pike; Katie Lewis; Abigail Bailey; Gregory L Price; Francois Ringeisen; Xavier Pivot
Journal: BMC Cancer Date: 2020-09-07 Impact factor: 4.430