Stavros Petrou (1), Christine Hockley. (1) National Perinatal Epidemiology Unit, University of Oxford (Old Road Campus), Old Road, Headington, Oxford, England. stavros.petrou@npeu.ox.ac.uk
Abstract
BACKGROUND: An important consideration for studies that derive utility scores using multi-attribute utility measures is the psychometric integrity of the measurement instrument. Of particular importance is the requirement to establish the empirical validity of multi-attribute utility measures; that is, whether they generate utility scores that, in practice, reflect people's preferences. We compared the empirical validity of EQ-5D versus SF-6D utility scores based on hypothetical preferences in a large, representative sample of the English population.

METHODS: Adult participants in the 1996 Health Survey for England (n=16 443) formed the basis of the investigation. The subjects were asked to complete the EQ-5D and SF-36 measures. Their responses were converted into utility scores using the York A1 tariff set and the SF-6D utility algorithm, respectively. One-way analysis of variance was used to test the hypothetically constructed preference rule that each set of utility scores differs significantly by self-reported health status (categorised as very good, good, fair, bad or very bad). The degree to which EQ-5D and SF-6D utility scores reflect alternative configurations of self-reported health status; illness, disability or infirmity; and medication use was tested using the relative efficiency statistic and receiver operating characteristic (ROC) curves.

RESULTS: The mean utility score for the EQ-5D was 0.845 (95% CI: 0.842, 0.849), whilst the mean utility score for the SF-6D was 0.799 (95% CI: 0.797, 0.802), representing a mean difference in utility score of 0.046 (95% CI: 0.044, 0.049; p<0.001). Bland-Altman plots displayed considerable lack of agreement between the two measures, particularly at the lower end of the utility scale. Both measures demonstrated statistically significant differences between subjects who described their health status as very good, good, fair, bad or very bad (p<0.001), as well as monotonically decreasing utility scores (test for linear trend: p<0.001). The SF-6D was between 30.9 and 100.4% more efficient than the EQ-5D at detecting differences in self-reported health status, and between 10.4 and 45.6% more efficient at detecting differences in illness, disability or infirmity and medication use. The area under the curve (AUC) scores generated by the ROC curves were significantly higher for the SF-6D at the 0.1% significance level when self-reported health status was dichotomised as very good versus good, fair, bad or very bad. However, the AUC scores did not reveal any significant differences in the discriminatory powers of the measures when alternative configurations of illness, disability or infirmity and medication use were examined.

CONCLUSIONS: This study provides evidence that the SF-6D is an empirically valid and efficient alternative multi-attribute utility measure to the EQ-5D, and is capable of discriminating between external indicators of health status. However, health economists should also consider other psychometric properties, such as practicality and reliability, when selecting either measure for evaluative purposes.
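The abstract names the relative efficiency statistic and the area under the ROC curve without giving formulas. The sketch below illustrates one common way such quantities are computed, and it rests on assumptions that are not stated in the abstract: relative efficiency is taken as the ratio of the two measures' one-way ANOVA F statistics (equal to the ratio of squared t statistics when there are two groups), and AUC is taken as the probability that a randomly chosen respondent in the better-health group scores higher than one in the poorer-health group. The function names and the six-person toy data set are hypothetical and are not the Health Survey for England data.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(groups_a, groups_b):
    """Assumed definition: F statistic of measure A over that of measure B.

    A value above 1 means A separates the groups more efficiently than B;
    (value - 1) * 100 is the percentage gain in efficiency.
    """
    return f_statistic(groups_a) / f_statistic(groups_b)


def auc(better_health, poorer_health):
    """AUC as the share of (better, poorer) pairs ranked correctly.

    Ties count as half a correct ranking; higher utility means better health.
    """
    correct = sum(
        1.0 if hi > lo else 0.5 if hi == lo else 0.0
        for hi in better_health
        for lo in poorer_health
    )
    return correct / (len(better_health) * len(poorer_health))


# Hypothetical utility scores for two invented measures, A and B,
# each split by a dichotomised health indicator (better vs poorer).
a_better, a_poorer = [0.9, 0.8, 1.0], [0.5, 0.6, 0.4]
b_better, b_poorer = [0.8, 0.6, 1.0], [0.6, 0.4, 0.8]

re_a_vs_b = relative_efficiency([a_better, a_poorer], [b_better, b_poorer])
print("Relative efficiency of A versus B:", round(re_a_vs_b, 2))
print("AUC for A:", round(auc(a_better, a_poorer), 3))
print("AUC for B:", round(auc(b_better, b_poorer), 3))
```

Under this assumed definition, a reported gain of 30.9% would correspond to a relative efficiency of about 1.309. The pairwise AUC is quadratic in sample size, so a rank-based computation would be preferable for a survey of this scale.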