| Literature DB >> 35838889 |
Ilene L Hollin1, Jonathan Paskett2, Anne L R Schuster2, Norah L Crossnohere2, John F P Bridges3.
Year: 2022 PMID: 35838889 PMCID: PMC9363399 DOI: 10.1007/s40273-022-01167-1
Source DB: PubMed Journal: Pharmacoeconomics ISSN: 1170-7690 Impact factor: 4.558
Changes in the application of best–worst scaling in health over time
| | 2010–2016 | 2017–2021 | p value |
|---|---|---|---|
| Number of studies per year, mean [range] | 4.9 [1–15] | 26.4 [11–37] | < 0.01 |
| Region, n (%) | | | 0.94 |
| North America | 16 (47.1%) | 55 (42.0%) | |
| Europe | 8 (23.5%) | 28 (21.2%) | |
| Asia | 3 (8.8%) | 14 (10.7%) | |
| Africa | 1 (2.9%) | 7 (5.3%) | |
| Oceania | 2 (5.9%) | 6 (4.6%) | |
| South America | 0 (0.0%) | 2 (1.5%) | |
| Multi-continent | 4 (11.8%) | 15 (11.5%) | |
| Not specified | 0 (0.0%) | 4 (3.1%) | |
| Terminology used to describe, n (%) | | | 0.04 |
| BWS | 23 (67.6%) | 89 (67.9%) | |
| BWS object case | 0 (0.0%) | 17 (13.0%) | |
| MaxDiff | 7 (20.6%) | 12 (9.2%) | |
| BWS case 1 | 3 (8.8%) | 11 (8.4%) | |
| Best worst choice/task | 0 (0.0%) | 2 (1.5%) | |
| Object scaling | 1 (2.9%) | 0 (0.0%) | |
| Study objective*, n (%) | | | |
| Substantive/empirical | 33 (97.1%) | 128 (97.7%) | 0.83 |
| Educational/methodological | 3 (8.8%) | 2 (1.5%) | 0.03 |
| Pilot | 1 (2.9%) | 2 (1.5%) | 0.58 |
| Perspective*, n (%) | | | |
| Patient/consumer | 20 (58.8%) | 88 (66.7%) | 0.39 |
| Provider/producer | 13 (38.2%) | 43 (32.8%) | 0.55 |
| Citizen/societal | 2 (5.9%) | 10 (7.6%) | 0.73 |
BWS best–worst scaling
*All that apply
Fig. 1 Flow chart. BWS best–worst scaling
Fig. 2 Perspective of best–worst scaling studies over time
Changes in the development of best–worst scaling instruments in health over time
| | 2010–2016 | 2017–2021 | p value |
|---|---|---|---|
| Methods of instrument development*, n (%) | | | |
| Literature review | 18 (52.9%) | 82 (62.6%) | 0.30 |
| Pilot test | 13 (38.2%) | 37 (28.2%) | 0.26 |
| Pretest | 3 (8.8%) | 32 (24.4%) | 0.05 |
| Formal qualitative research | 15 (44.1%) | 51 (38.6%) | 0.56 |
| Key informant interviews | 7 (20.6%) | 47 (35.6%) | 0.10 |
| Prior preference research | 3 (8.8%) | 15 (11.5%) | 0.66 |
| Existing list of objects | 2 (5.9%) | 8 (6.1%) | 0.96 |
*All that apply
Changes in the design of best–worst scaling applications in health over time
| | 2010–2016 | 2017–2021 | p value |
|---|---|---|---|
| Mode of survey administration*, n (%) | | | |
| Online | 15 (46.9%) | 100 (76.9%) | < 0.01 |
| Self-administered** (paper) | 10 (31.2%) | 18 (13.8%) | 0.02 |
| Administered | 7 (21.9%) | 17 (13.0%) | 0.20 |
| Time horizon, n (%) | | | < 0.01 |
| Present | 26 (76.5%) | 121 (92.4%) | |
| Future | 5 (14.7%) | 9 (6.9%) | |
| Past | 3 (8.8%) | 1 (0.8%) | |
| Measurement scale, n (%) | | | 0.48 |
| Importance/priorities | 27 (79.4%) | 91 (69.5%) | |
| Preference | 4 (11.8%) | 19 (14.5%) | |
| Emotion | 3 (8.8%) | 21 (16.0%) | |
| Experimental design, n (%) | | | 0.18 |
| BIBD | 18 (52.9%) | 64 (48.9%) | |
| Sawtooth | 6 (17.6%) | 31 (23.7%) | |
| Orthogonal | 3 (8.8%) | 6 (4.6%) | |
| Random | 0 (0.0%) | 2 (1.5%) | |
| Other | 3 (8.8%) | 2 (1.5%) | |
| Not specified | 4 (11.8%) | 26 (19.5%) | |
| BWS anchor description, n (%) | | | 0.04 |
| Most/least | 26 (76.5%) | 117 (89.3%) | |
| Best/worst | 7 (20.6%) | 14 (10.7%) | |
| Other | 1 (2.9%) | 0 (0.0%) | |
| Objects total, mean (SD) | 15.3 (9.0) | 16.0 (8.9) | 0.68 |
| Objects per task, mean (SD) | 4.8 (1.6) | 4.6 (1.1) | 0.40 |
| Choice tasks total, mean (SD) | 20.8 (23.9) | 24.3 (34.7) | 0.60 |
| Choice tasks per respondent, mean (SD) | 13.1 (3.3) | 12.5 (6.4) | 0.63 |
BIBD balanced incomplete block design, BWS best–worst scaling, SD standard deviation
*All that apply
**Self-administered a paper survey in person or at home and returned via mail
Fig. 3 Number of objects included in best–worst scaling studies
Fig. 4 Number of objects per task included in best–worst scaling studies
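The most common experimental design in the table above is the balanced incomplete block design (BIBD). As an illustrative sketch (not code from the paper; the helper name and toy blocks are my own), the following Python checks the two balance properties a BIBD guarantees for a BWS survey: every object appears in the same number of choice tasks, and every pair of objects appears together equally often. It uses a classic (v = 7, b = 7, r = 3, k = 3, λ = 1) design.

```python
from collections import Counter
from itertools import combinations

def check_bibd(blocks):
    """Verify the balance properties of a candidate BIBD for a BWS study:
    (1) each object appears in the same number of tasks (replication r);
    (2) every pair of objects co-occurs in the same number of tasks (lambda).
    Returns (is_balanced, r, lam); r and lam are None when unbalanced."""
    occurrences = Counter(obj for block in blocks for obj in block)
    pair_counts = Counter(frozenset(pair)
                          for block in blocks
                          for pair in combinations(block, 2))
    n_objects = len(occurrences)
    # Every unordered pair must appear at least once to be balanced.
    all_pairs_seen = len(pair_counts) == n_objects * (n_objects - 1) // 2
    r_values = set(occurrences.values())
    lam_values = set(pair_counts.values())
    balanced = all_pairs_seen and len(r_values) == 1 and len(lam_values) == 1
    return (balanced,
            r_values.pop() if balanced else None,
            lam_values.pop() if balanced else None)

# A classic (7, 7, 3, 3, 1) design: 7 objects shown in 7 tasks of 3 objects,
# each object appearing 3 times and each pair co-occurring exactly once.
fano_blocks = [(1, 2, 3), (1, 4, 5), (1, 6, 7),
               (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
```

Running `check_bibd(fano_blocks)` returns `(True, 3, 1)`, while an arbitrary set of tasks that over-samples some objects fails the check.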
Changes in the administration and analysis of best–worst scaling applications in health over time
| | 2010–2016 | 2017–2021 | p value |
|---|---|---|---|
| Sample size, mean (SD) | 221.8 (205.6) | 472.4 (896.2) | 0.11 |
| Sample size justification, n (%) | | | 0.68 |
| Rule of thumb | 3 (8.8%) | 16 (12.2%) | |
| Historical/empirical justification | 4 (11.8%) | 22 (16.8%) | |
| Sample size calculation | 1 (2.9%) | 7 (5.3%) | |
| Not specified | 26 (76.5%) | 86 (65.6%) | |
| Analytic program*, n (%) | | | |
| Sawtooth | 12 (35.3%) | 36 (27.5%) | 0.37 |
| Stata | 4 (11.8%) | 29 (22.1%) | 0.18 |
| R | 1 (2.9%) | 20 (15.3%) | 0.06 |
| NLOGIT | 0 (0.0%) | 10 (7.6%) | 0.10 |
| SPSS | 4 (11.8%) | 15 (11.5%) | 0.96 |
| Latent Gold Choice | 0 (0.0%) | 4 (3.1%) | 0.30 |
| SAS | 2 (5.9%) | 12 (9.2%) | 0.54 |
| Excel | 0 (0.0%) | 8 (6.1%) | 0.14 |
| Analytic approach*, n (%) | | | |
| Probability/ratio rescaling | 17 (50.0%) | 73 (55.7%) | 0.55 |
| B–W score | 15 (44.1%) | 45 (34.4%) | 0.29 |
| Coefficients | 9 (26.5%) | 55 (42.0%) | 0.10 |
| Counts | 8 (23.5%) | 27 (20.6%) | 0.71 |
| Square roots | 5 (14.7%) | 8 (6.1%) | 0.10 |
| SUCRA | 0 (0.0%) | 2 (1.5%) | 0.47 |
| Theoretical assumption, n (%) | | | 0.42 |
| MaxDiff | 12 (35.3%) | 33 (25.2%) | |
| Sequential BWS | 3 (8.8%) | 18 (13.7%) | |
| Simultaneous | 1 (2.9%) | 1 (0.8%) | |
| Not specified | 18 (52.9%) | 79 (60.3%) | |
| Directionality (unidirectional), n (%) | 21 (61.8%) | 73 (55.7%) | 0.53 |
| Heterogeneity analysis used, n (%) | 19 (55.9%) | 99 (75.6%) | 0.02 |
| Heterogeneity analysis method*, n (%) | | | |
| Stratification | 16 (47.1%) | 70 (53.4%) | 0.51 |
| Segmentation/latent class | 3 (8.8%) | 18 (13.7%) | 0.44 |
| Mixed logit | 1 (2.9%) | 14 (10.7%) | 0.16 |
| Individual level score | 1 (2.9%) | 3 (2.3%) | 0.82 |
SD standard deviation, SUCRA Surface Under the Cumulative Ranking score
*All that apply
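Among the analytic approaches tabulated above, the count-based best-minus-worst (B–W) score and square-root ratio rescaling are the simplest to compute. A minimal Python sketch, assuming object-case data where each task records one best and one worst selection (function names and toy data are my own illustration, not code from any reviewed study):

```python
from collections import Counter
import math

def best_worst_scores(choices, objects):
    """Count-based B-W score: the number of times an object was chosen
    best minus the number of times it was chosen worst, summed over tasks."""
    best = Counter(b for b, _ in choices)
    worst = Counter(w for _, w in choices)
    return {obj: best[obj] - worst[obj] for obj in objects}

def sqrt_scores(choices, objects):
    """Square-root (ratio) rescaling, sqrt(best/worst): a simple
    approximation to the underlying choice-probability scale. Only defined
    for objects chosen worst at least once."""
    best = Counter(b for b, _ in choices)
    worst = Counter(w for _, w in choices)
    return {obj: math.sqrt(best[obj] / worst[obj])
            for obj in objects if worst[obj] > 0}

# Toy data: each tuple is one respondent task's (best, worst) selection.
tasks = [("cost", "access"), ("cost", "quality"), ("quality", "access")]
```

Here `best_worst_scores(tasks, ["cost", "quality", "access"])` returns `{"cost": 2, "quality": 0, "access": -2}`, ranking cost highest; regression-based approaches (coefficients, mixed logit) instead estimate utilities from the full choice model.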
Changes in the quality and policy relevance of best–worst scaling studies in health over time
| | 2010–2016 | 2017–2021 | p value |
|---|---|---|---|
| PREFS total score, mean (SD) | 3.1 (0.9) | 3.1 (0.9) | 0.98 |
| Purpose | 31 (91.2%) | 130 (99.2%) | < 0.01 |
| Respondents | 16 (47.1%) | 32 (24.4%) | 0.01 |
| Explanation | 21 (61.8%) | 92 (70.2%) | 0.34 |
| Findings | 9 (26.5%) | 41 (31.3%) | 0.59 |
| Significance | 27 (79.4%) | 105 (80.2%) | 0.92 |
| Subjective quality score, mean (SD) | 6.0 (1.7) | 6.7 (1.8) | 0.06 |
| Policy relevance score, mean (SD) | 6.3 (2.0) | 6.9 (1.9) | 0.13 |
SD standard deviation
Determinants of subjective quality in best–worst scaling studies (n = 165)
| Variable | Subjective quality (a) | Subjective quality (b) | Subjective quality (c) | Subjective quality (d) |
|---|---|---|---|---|
| PREFS | 1.305* (0.12) | 1.113* (0.12) | ||
| | 0.682 (1.25) | 0.490 (0.81) | ||
| | 0.721* (0.22) | 0.573* (0.21) | ||
| | 1.679* (0.23) | 1.480* (0.21) | ||
| | 1.074* (0.22) | 0.934* (0.21) | ||
| | 1.859* (0.30) | 1.629* (0.27) | ||
| Policy relevance | 0.174* (0.05) | 0.192* (0.05) | ||
| Heterogeneity analysis | 0.458 (0.23) | 0.352 (0.22) | ||
| No development methods | −1.195* (0.50) | −0.851 (0.49) | ||
| BIBD | 0.741* (0.20) | 0.748* (0.20) | ||
| Sample size (100) | −0.023* (0.01) | −0.021* (0.01) | ||
| Constant | 2.546* (0.40) | 2.703* (1.26) | 1.401* (0.40) | 1.488 (0.86) |
| R² | 0.42 | 0.47 | 0.53 | 0.57 |
BIBD balanced incomplete block design, *p < 0.05
Ordinary least squares performed on the full data set of all studies in both time periods
Robust standard errors in parentheses
Determinants of policy relevance and PREFS in best–worst scaling studies (n = 165)
| Variable | Policy relevance (a) | Policy relevance (b) | PREFS (c) | PREFS (d) |
|---|---|---|---|---|
| PREFS | 0.404* (0.17) | |||
| Subjective quality | 0.362* (0.09) | 0.323* (0.02) | 0.329* (0.03) | |
| Policy relevance | 0.000 (0.03) | −0.005 (0.03) | ||
| Heterogeneity analysis | 0.116 (0.37) | −0.069 (0.34) | 0.142 (0.13) | |
| No development methods | 0.212 (0.63) | 0.645 (0.65) | 0.024 (0.13) | |
| BIBD | −0.133 (0.30) | −0.394 (0.30) | −0.208 (0.11) | |
| Sample size (100) | 0.042* (0.01) | 0.047* (0.01) | −0.001 (0.00) | |
| Constant | 5.313* (0.62) | 4.404* (0.63) | 0.943* (0.22) | 0.942* (0.23) |
| R² | 0.06 | 0.13 | 0.42 | 0.44 |
BIBD balanced incomplete block design, *p < 0.05
Ordinary least squares performed on the full data set of all studies in both time periods
Robust standard errors in parentheses
Comparison of best–worst scaling and discrete-choice experiments
| | DCE | BWS |
|---|---|---|
| Number of studies per year, mean (range) | 60 (32–98) | 26 (11–37) |
| Use of formal qualitative methods, n (%) | 258 (86%) | 51 (39%) |
| Number of attributes (DCE) or objects (BWS), median (range) | 5 (2–21) | 12 (5–60) |
| Number of choice tasks per respondent, median (range) | 12 (1–32) | 12 (1–300) |
| Sample size, median (range) | 401 (35–30,6000) | 220 (15–9289) |
| Country/location of study, n (%) | | |
| USA | 50 (17%) | 67 (41%) |
| EU | 72 (24%) | 25 (19%) |
| UK | 50 (17%) | 5 (3%) |
| Australia | 30 (10%) | 5 (3%) |
| Canada | 25 (8%) | 4 (2%) |
| Other/not specified | 102 (34%) | 25 (32%) |
| Mode of survey administration*, n (%) | | |
| Online | 172 (57%) | 100 (77%) |
| Self-administered (paper) | 69 (23%) | 18 (14%) |
| Administered | 44 (15%) | 17 (13%) |
| Other/not specified | 16 (5%) | 0 (0%) |
| Perspective*, n (%) | | |
| Patient/consumer | 110 (37%) | 88 (67%) |
| Provider/producer | 39 (13%) | 43 (33%) |
| Citizen/societal | 81 (27%) | 10 (8%) |
| Other/not specified | 98 (33%) | 0 (0%) |
| Analytic program*, n (%) | | |
| Stata | 94 (31%) | 29 (22%) |
| Sawtooth | 16 (5%) | 36 (28%) |
| NLOGIT | 65 (22%) | 10 (8%) |
| R | 10 (3%) | 20 (15%) |
| SAS | 17 (6%) | 12 (9%) |
| Other/not specified | 99 (33%) | 0 (0%) |
BWS best–worst scaling, DCE discrete-choice experiment, EU European Union
*All that apply
| Best–worst scaling is a theory-driven method increasingly being used in health. While best–worst scaling can be used to study preferences for single or multiple product profiles defined by attributes and levels, it can also be used to study how a finite set of objects should be prioritized, without the use of levels. In this instance, best–worst scaling may be referred to as case 1, object case, object scaling, MaxDiff, or simply as best–worst scaling. |
| The average number of best–worst scaling studies focusing on prioritization has increased from fewer than five per year before 2017 to 26.4 per year since. The method is now used in all regions of the world and for a wide variety of purposes. The average sample size of best–worst scaling studies has also grown over time, likely owing to the increasing use of online panels to sample respondents, as has the likelihood that a study is relevant to policy makers. |
| While the PREFS measure of study quality has received some criticism in recent years, we find that it is highly associated with a global assessment of subjective quality. That said, we also find that several other factors, including policy relevance and aspects of both study design and analysis, affect quality and could be incorporated into future measures. |