Juan M Ramos-Goñi, Jose L Pinto-Prades, Mark Oppe, Juan M Cabasés, Pedro Serrano-Aguilar, Oliver Rivero-Arias.
*HTA Unit, Canary Island Health Service (SESCS)
†Red de Investigación en Servicios de Salud en Enfermedades Crónicas (REDISSEC), Canary Islands, Spain
‡EuroQol Research Foundation, Rotterdam, The Netherlands
§Yunus Centre for Social Business and Health, Institutes for Applied Health Research and Society and Social Justice Research, Glasgow Caledonian University, UK
∥Department of Economics, Public University of Navarra, Pamplona, Spain
¶National Perinatal Epidemiology Unit (NPEU), Nuffield Department of Population Health, University of Oxford, Oxford, UK.
Abstract
BACKGROUND: The EQ-5D instrument is the most widely used preference-based health-related quality of life questionnaire in cost-effectiveness analysis of health care technologies. Recently, a version called EQ-5D-5L with 5 levels on each dimension was developed. This manuscript explores the performance of a hybrid approach for the modeling of EQ-5D-5L valuation data. METHODS: Two elicitation techniques, the composite time trade-off, and discrete choice experiments, were applied to a sample of the Spanish population (n=1000) using a computer-based questionnaire. The sampling process consisted of 2 stages: stratified sampling of geographic area, followed by systematic sampling in each area. A hybrid regression model combining composite time trade-off and discrete choice data was used to estimate the potential value sets using main effects as starting point. The comparison between the models was performed using the criteria of logical consistency, goodness of fit, and parsimony. RESULTS: Twenty-seven participants from the 1000 were removed following the exclusion criteria. The best-fitted model included 2 significant interaction terms but resulted in marginal improvements in model fit compared to the main effects model. We therefore selected the model results with main effects as a potential value set for this methodological study, based on the parsimony criteria. The results showed that the main effects hybrid model was consistent, with a range of utility values between 1 and -0.224. CONCLUSION: This paper shows the feasibility of using a hybrid approach to estimate a value set for EQ-5D-5L valuation data.
The EQ-5D instrument is the most widely used preference-based health-related quality of life questionnaire in cost-effectiveness analysis. Reimbursement agencies such as the UK National Institute for Health and Care Excellence (NICE) recommend the use of the EQ-5D in submissions to the institute, and this partly explains the widespread use of the instrument in applied studies.1 The original EQ-5D (EQ-5D-3L) is a questionnaire with 5 dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and 3 levels in each dimension (no problems, some problems, and extreme problems).2 Extensive research supports the use of the instrument in many disease areas, but recent studies have shown ceiling effects, particularly in general population samples.3,4 In response to this, the EuroQol Group proposed a new version of the instrument: the EQ-5D-5L. This new version increased the number of severity levels from 3 to 5 (no problems, slight, moderate, severe, and unable or extreme), describing 3125 (5⁵) possible health states.3 Each health state is usually represented using a 5-digit number (profile) where 11111 indicates perfect health and 55555 the worst health state or pits state.
Available EQ-5D-3L value sets cannot be used directly with 5-level version responses. As a temporary solution, an interim scoring algorithm needs to be used.5 Therefore, new valuation studies are necessary to obtain preferences from the general public for EQ-5D-5L health states. The EuroQol Group has developed a valuation protocol to elicit preferences after a series of pilot studies conducted by research teams worldwide.6 A group of researchers based in Spain, the UK, and the Netherlands has been one of the first teams to implement this protocol. This manuscript explores the feasibility of a hybrid method to estimate a potential value set for EQ-5D-5L valuation data.
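As a small illustration of the descriptive system, the following Python sketch enumerates every EQ-5D-5L profile and confirms the count of 3125 states. The dimension abbreviations follow those used later in the text; the function name `all_profiles` is a hypothetical helper introduced here, not part of any EuroQol software.

```python
from itertools import product

# Dimension order used in the 5-digit profile:
# mobility, self-care, usual activities, pain/discomfort, anxiety/depression.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")
LEVELS = range(1, 6)  # 1 = no problems ... 5 = unable or extreme


def all_profiles():
    """Return every EQ-5D-5L profile as a 5-digit string."""
    return [
        "".join(str(level) for level in combo)
        for combo in product(LEVELS, repeat=len(DIMENSIONS))
    ]


profiles = all_profiles()
print(len(profiles))              # 3125
print(profiles[0], profiles[-1])  # 11111 55555
```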
METHODS
Protocol
The results obtained from the pilot studies6 informed the standardized protocol for EQ-5D-5L value sets used in this study.7 The interview process described in the protocol has 5 sections. First, a general welcome and an introduction to the research were given. Next, respondents were asked to provide background information, including their own health using the EQ-5D-5L, age, sex, and experience with illness. This was followed by the composite time trade-off (C-TTO) task, which was administered after giving an explanation of the task, and included 10 EQ-5D-5L C-TTO valuations. The next part was a discrete choice (DC) experiment, which consisted of 7 paired comparisons. Finally, there was a general thank you and goodbye. After each block of tasks (C-TTO and DC experiments) and at the end of the interview, participants were given the opportunity to indicate whether they had found it difficult to complete the tasks and the overall survey. The survey was carried out with the EuroQol Valuation Technology (EQ-VT), an online system developed by the EuroQol Group.
Eliciting Preferences Methods
C-TTO
The traditional time trade-off (TTO) has been widely used in the EQ-5D-3L valuation studies conducted so far and is appropriate for valuing health states considered better than dead.8,9 However, using the traditional TTO method for states worse than dead gives negative values that are normally transformed to be bounded at −1, a practice that has been criticized in the literature.10 Other TTO alternatives to evaluate health states were therefore assessed during the EuroQol pilot studies, including lead time and lag time TTO.11,12 In the former, additional trading time is included before the health state, whereas in the latter, trading time is included after the health state to be valued. The pilot studies looked at the potential of using these methods in practice and concluded that the protocol should include a composite TTO method.
This composite approach involved the use of the traditional TTO approach for states better than dead and lead time TTO for states worse than dead in a single task.13 For the lead time TTO, 10 years lead time and 10 years in the state were used. This lead time method produces a minimum value of −1 and no transformation of negative values is needed. The iterative process used in the original UK valuation exercise8 was adapted to be used in the C-TTO task. The C-TTO design included 86 health states selected using Monte Carlo simulation. The health states were distributed over 10 blocks and each block contained 1 very mild state (1 dimension at level 2, the remaining dimensions at level 1), the pits state 55555, and a balanced set of intermediate states. The EQ-VT randomly assigned respondents to one of the blocks and presented the states in random order.
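The mapping from a C-TTO response to a utility value can be sketched in Python as follows. This is a minimal illustration, not the EQ-VT implementation: it assumes the better-than-dead part also uses a 10-year horizon (the text states 10 years only for the lead time part), and the function name `ctto_value` and its arguments are hypothetical.

```python
def ctto_value(years_full_health, worse_than_dead,
               years_in_state=10.0, lead_time=10.0):
    """Utility implied by the point of indifference in a composite TTO task.

    years_full_health : years in full health judged equivalent to the
                        alternative containing the health state
    worse_than_dead   : True if the lead time part of the task was used
    """
    if worse_than_dead:
        # Lead time TTO: x years in full health is judged equivalent to
        # `lead_time` years in full health followed by `years_in_state`
        # years in the state, so value = (x - lead_time) / years_in_state.
        if not 0.0 <= years_full_health <= lead_time:
            raise ValueError("indifference point outside the lead time range")
        return (years_full_health - lead_time) / years_in_state
    # Traditional TTO: x years in full health is judged equivalent to
    # `years_in_state` years in the state, so value = x / years_in_state.
    if not 0.0 <= years_full_health <= years_in_state:
        raise ValueError("indifference point outside the TTO range")
    return years_full_health / years_in_state
```

With the default settings, giving up all lead time (`ctto_value(0, True)`) yields the minimum value of −1 without any transformation, which matches the property described above.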
DC Experiment
The use of DC experiments for health state valuation has received recent attention in the literature.14,15 Modeling ordinal data follows the theoretical foundations of random utility theory.16 Values obtained with DC models have been shown to have patterns similar to those obtained with TTO models.17 The values obtained from DC models are expressed on an arbitrary scale and need to be rescaled to the dead (0) to full health (1) scale.17,18 DC experiments were also piloted, and the results suggested that they could provide useful information in addition to the C-TTO data. Hence, a DC experiment was included as part of the protocol. The DC experiment design included 196 pairs divided into 28 blocks with similar severity representation, identified using Bayesian design.19 The EQ-VT randomly assigned respondents to one of the blocks, presented the pairs in random order, and randomized the location of the states within the pair (ie, left and right).
Sampling and Data Collection
Our power calculations estimated that to obtain a 0.01 SE of the observed mean C-TTO, we needed 9735 C-TTO responses. We therefore recruited 1000 participants who, after completing the valuation tasks, provided 10,000 C-TTO and 7000 DC responses to estimate the models.
A 2-stage sampling strategy was designed to obtain a representative sample of the Spanish population. In the first stage, we stratified geographically by Spanish provinces, whereas in the second stage we systematically sampled individuals from a panel until an accurate age and sex distribution for that province was achieved. We contracted an independent market research company, which identified respondents and arranged interviews at convenient places. Interviews were conducted face-to-face during June and July 2012 by 33 trained interviewers. Respondents did not receive payment for participating in the survey. A different market research company was contracted to call a random subsample of 15% of respondents as quality control of the process.
Statistical Analyses
Descriptive statistics were used to summarize respondents' characteristics and responses to the C-TTO and DC experiments.
Two sources of data were available to estimate the EQ-5D-5L value set: C-TTO and DC data. To maximize the use of the available data, we implemented a hybrid modeling approach that made use of both C-TTO and DC data to estimate the potential value sets. This hybrid method estimated a unique set of coefficients from a likelihood function obtained by multiplying the likelihood function of a normal distribution for the C-TTO data by the likelihood function of a conditional logit distribution for the DC data.20 As the coefficients estimated from a conditional logit are expressed on a latent arbitrary utility scale, we used a rescaling parameter θ, which assumes that the C-TTO model coefficients are proportional to the DC model coefficients. See the Appendix for a full description and analytical derivation of the hybrid method. This method combines the utility values elicited in the C-TTO for the 86 health states with utility values elicited in the DC experiment for 196 pairs of states. The dependent variable in the C-TTO part of the model was defined as 1 minus the observed C-TTO value for a given health state to indicate disutility, and therefore coefficients expressed utility decrements. In the DC part of the model, the dependent variable was a binary outcome (0/1) indicating the respondent's choice for each pair of EQ-5D-5L states. We used cluster estimation to acknowledge that for each participant included in the models, 10 C-TTO and 7 DC responses were available.
We also present models that estimate C-TTO and DC data separately, to illustrate how the hybrid model combined both types of data. We analyzed C-TTO data using a linear regression model assuming normally distributed errors, as in the C-TTO part of the hybrid model. We analyzed DC data using the standard econometric method for ordinal data, conditional logit regression.16 To make model coefficients comparable, we rescaled the DC model coefficients using the same rescaling parameter θ that was estimated in the hybrid model.
We started exploring the hybrid main effects with a 20-parameter model consisting of 4 dummies for each EQ-5D-5L dimension using level 1 as the reference. We constructed dummies to represent the additional utility decrement of moving from one level to the next. For instance, for the mobility dimension we created 4 dummies, MO1 to MO4: the coefficient associated with MO1 indicated the utility decrement of moving from no problems (level 1) to slight problems (level 2), MO2 the additional utility decrement of moving from slight (level 2) to moderate (level 3) problems, and so on. Therefore, the overall decrement of moving from no to moderate problems could be calculated as the sum of the coefficients of MO1 and MO2. The same set of dummy variables was defined for each of the remaining dimensions: self-care (SC), usual activities (UA), pain/discomfort (PD), and anxiety/depression (AD). We also estimated the model using the definition of dummies implemented in most previous EQ-5D-3L valuation exercises,21 and such analyses are available from the authors upon request.
Our starting point for the selection of additional covariates for the models was the US valuation study.9 Several variables were defined, for example, D1 as the number of dimensions at levels 2, 3, 4, or 5 beyond the first; IJ as the number of dimensions at level J beyond the first; K45 as the number of dimensions at level 4 or 5; and others. Squares of all terms were also introduced to assess nonlinear effects on the dependent variable. We included all terms first, and then used a stepwise approach, removing nonsignificant terms and ensuring model consistency.
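The incremental dummy coding and two of the additional covariates described above can be sketched in Python as follows. The function names and the `predicted_utility` helper are hypothetical, and any coefficient values passed to it are placeholders chosen by the reader, not the estimates reported in the tables.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def incremental_dummies(profile):
    """Map a profile such as '31245' to the 20 incremental dummies MO1..AD4."""
    if len(profile) != 5 or any(ch not in "12345" for ch in profile):
        raise ValueError(f"not a valid EQ-5D-5L profile: {profile!r}")
    dummies = {}
    for dim, ch in zip(DIMENSIONS, profile):
        level = int(ch)
        for step in range(1, 5):
            # Dummy `step` marks the move from level `step` to `step + 1`,
            # so a dimension at level 3 switches on steps 1 and 2.
            dummies[f"{dim}{step}"] = 1 if level > step else 0
    return dummies


def additional_covariates(profile):
    """D1 and K45 as defined in the text."""
    levels = [int(ch) for ch in profile]
    n_with_problems = sum(level >= 2 for level in levels)
    return {
        "D1": max(n_with_problems - 1, 0),            # beyond the first
        "K45": sum(level >= 4 for level in levels),   # level 4 or 5
    }


def predicted_utility(profile, decrements, constant=0.0):
    """1 minus the predicted disutility, given hypothetical coefficients."""
    dummies = incremental_dummies(profile)
    disutility = constant + sum(decrements[name] * on for name, on in dummies.items())
    return 1.0 - disutility
```

For example, `incremental_dummies("31111")` switches on MO1 and MO2 only, so the decrement for moderate mobility problems is the sum of those two coefficients, as described in the text.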
Exclusion Criteria and Interviewer Assessment
We excluded observations using the following 2 criteria: (1) respondents with a positive slope on a regression between their values and the severity of the health states, indicating that the participant provided higher utility values for poorer health states on average; and (2) respondents who valued all states equal to death.
We used the Kruskal-Wallis test to assess the differences among mean values by interviewer in the C-TTO responses. We further assessed this by including dummies that identified interviewers in the main effects model and using an F test on the dummy coefficients.
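The 2 exclusion criteria can be illustrated with the following Python sketch. The severity measure is not specified in the text, so the sum of the 5 levels is used here as an assumption; the function names and the `tol` tolerance are likewise hypothetical.

```python
import numpy as np


def severity(profile):
    """Assumed severity score: the sum of the 5 levels (5 = best, 25 = worst)."""
    return sum(int(ch) for ch in profile)


def exclusion_reason(profiles, values, tol=1e-9):
    """Return the criterion a respondent fails, or None if they are retained.

    profiles : the health states valued by one respondent
    values   : the corresponding C-TTO values (0 = equal to death)
    """
    values = np.asarray(values, dtype=float)

    # Criterion 2: every state valued as equal to death.
    if np.all(values == 0.0):
        return "all states valued equal to death"

    # Criterion 1: positive slope of value on severity, ie, poorer states
    # receive higher values on average.
    sev = np.array([severity(p) for p in profiles], dtype=float)
    slope = np.polyfit(sev, values, deg=1)[0]
    if slope > tol:
        return "positive slope of value on severity"

    return None
```

The all-death check runs first because a respondent who gives 0 to every state has a slope of 0 and would otherwise pass the slope criterion.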
Evaluation of Model Performance
We evaluated model performance using (1) logical consistency of parameters; (2) goodness of fit; and (3) parsimony. Estimated coefficients are said to be logically consistent if the values predicted for logically worse health states are lower than those for logically better health states. In our estimated results this translates to all main effects coefficients being positive. Goodness of fit was assessed using the Akaike (AIC) and the Bayesian (BIC) information criteria. Finally, the principle of parsimony stated that if competing models were similar in logical consistency and goodness of fit, the model with fewer parameters was preferred. These 3 criteria were used to compare different hybrid model specifications using different interaction terms. Prediction accuracy measures such as mean square error or mean absolute error are not appropriate in this case, given the lack of an appropriate counterfactual for hybrid model predictions.
We present the results of the regression with the main effects and the best-fitted model with significant terms. Statistical analysis and regression modeling were conducted in Stata MP 11.22 The hybrid model was not available in any standard package and was programmed in Stata specifically for this study.
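To make the structure of the hybrid likelihood described under Statistical Analyses concrete, the following Python sketch writes out a joint log-likelihood of that form. It is not the authors' Stata program: the function name, argument names, and the direction of the rescaling (latent DC disutility assumed equal to θ times the C-TTO disutility) are assumptions, and cluster-robust standard errors are not covered.

```python
import numpy as np


def hybrid_log_likelihood(beta, sigma, theta,
                          X_tto, y_tto, X_dc_a, X_dc_b, chose_b):
    """Joint log-likelihood of C-TTO and DC responses sharing one `beta`.

    beta           : utility decrements, one per design matrix column
    sigma          : standard deviation of the normal C-TTO errors
    theta          : rescaling parameter between the C-TTO and DC scales
    X_tto, y_tto   : design matrix and 1 - observed C-TTO value
    X_dc_a, X_dc_b : design matrices of states A and B in each pair
    chose_b        : boolean array, True where state B was chosen
    """
    beta = np.asarray(beta, dtype=float)

    # C-TTO part: normal density of the observed disutility around X beta.
    resid = np.asarray(y_tto, dtype=float) - np.asarray(X_tto, dtype=float) @ beta
    ll_tto = np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2)
                    - resid**2 / (2.0 * sigma**2))

    # DC part: with 2 alternatives the conditional logit reduces to a
    # logistic function of the rescaled disutility difference,
    # P(choose B) = 1 / (1 + exp(-diff)).
    x_diff = np.asarray(X_dc_a, dtype=float) - np.asarray(X_dc_b, dtype=float)
    diff = theta * (x_diff @ beta)
    signed = np.where(np.asarray(chose_b, dtype=bool), diff, -diff)
    ll_dc = np.sum(-np.logaddexp(0.0, -signed))

    # Multiplying the 2 likelihoods is the same as adding their logs.
    return ll_tto + ll_dc
```

In practice the negative of this function would be passed to a numerical optimizer over `beta`, `sigma`, and `theta`; that step is omitted here to keep the sketch short.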
Comparison With EQ-5D-3L Value Set
We calculated predictions for the 3125 health states using the final selected EQ-5D-5L value set and the interim solution based on EQ-5D-3L values,5 and present the comparison for a selected set of health states covering mild, moderate, and severe states. In addition, we compared the kernel density functions for the index values of the 243 states of the Spanish EQ-5D-3L value set23 and for the 3125 states of the final selected EQ-5D-5L value set.
RESULTS
Descriptive Statistics
Twenty-seven of the 1000 participants were removed following the exclusion criteria: 18 respondents with a positive slope on a regression between their values and the severity of the health states, and 9 respondents who valued all states equal to death. Overall, excluded respondents were older and more often had no studies or only primary school studies than respondents in the estimation sample (Table 1). The estimation sample was similar to the Spanish population in employment status, mean age, and sex distribution, but it had a larger number of respondents in the 25–34 age group and fewer participants over 75 (Table 1). The self-reported health of respondents on the EQ-5D-5L showed that 18.9% reported problems in usual activities and 30.8% reported problems in the anxiety/depression dimension (Table 1). For the remaining dimensions, the proportion of respondents with problems was <10% (Table 1).
TABLE 1
Background Characteristics of Excluded Sample, Estimation Sample, and Comparison Against Spanish General Population
The quality control reported no incidents, but we observed significant differences between interviewers in the valuations obtained, with both the Kruskal-Wallis test (P<0.0001) and the F test (P<0.0001).
We report further descriptive information about the C-TTO and the DC data in the online supplemental digital content (Tables 1 and 2 and SDC Figures 1 and 2, Supplemental Digital Content, http://links.lww.com/MLR/A839).
Modeling Results
The hybrid model with main effects was a consistent model predicting utilities with a range between 1 and −0.224 (Table 2). Both the C-TTO and the DC models produced logical inconsistencies. The results show how the hybrid model corrects the inconsistencies in the C-TTO model by using DC information, and the DC model inconsistencies by using C-TTO information. As described in the Appendix, the log likelihood of the hybrid model was approximately the sum of the log likelihoods of the separate C-TTO and DC models.
TABLE 2
Estimation Results for Hybrid Model Using Main Effects Only
After exploring many interaction terms, the best-fitted model we found used the terms D1² and K45² (Table 3). The constant term of this model was suppressed, as the D1² term captures the effect of the constant. Including these terms reduced the AIC and BIC of the hybrid model by only 0.4%. About three quarters of this reduction came from the C-TTO part of the model.
TABLE 3
Estimation Results Using Best-fitted Model
The main effects hybrid model produced a wider range of utility values at the upper and lower ends of the scale than the hybrid model including the terms D1² and K45² (Table 4). Given that the improvement in goodness of fit between the main effects and the best-fitted model was marginal (0.4%), we selected the estimation results from the hybrid model with main effects as the value set for this methodological study, based on the parsimony criterion.
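The goodness-of-fit comparison behind this choice uses the standard AIC and BIC definitions, which can be sketched in Python as follows. The helper names are hypothetical, and any log-likelihoods, parameter counts, or observation counts supplied to them are the reader's own inputs; the values from Tables 2 and 3 are not reproduced here.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def relative_reduction(baseline, candidate):
    """Share by which `candidate` lowers a criterion relative to `baseline`."""
    return (baseline - candidate) / baseline
```

Under the parsimony principle described in the Methods, a relative reduction as small as the 0.4% reported above would not by itself justify the extra terms.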
TABLE 4
Predicted Utility Values for Selected Health States for Estimated Models and From the Interim EQ-5D-3L Solution
The probability density functions of the Spanish EQ-5D-3L value set and the EQ-5D-5L value set presented here (Fig. 1) show a symmetric distribution for EQ-5D-5L, whereas the EQ-5D-3L has a bimodal distribution. The proportion of states considered worse than death is lower in the EQ-5D-5L value set.
FIGURE 1
Probability density function of EQ-5D-3L and EQ-5D-5L value sets.
DISCUSSION
In this manuscript we have reported the performance of a hybrid approach to estimate a value set for the EQ-5D-5L questionnaire. The choice of the hybrid approach is based on the assumption that subjects have a unique utility function that generates both sets of responses. If utilities were the same in the C-TTO and DC methods, there would be no need to combine them except to obtain more precise estimates. Our hypothesis is that the disparity between them is related to the choice versus matching discrepancy, as it is one of the most replicated effects in the preference elicitation literature.24–27 Some researchers have tried to find arguments in favor of one method or the other.28 We believe that neither matching-based methods (like C-TTO) nor choice-based methods (like DC) are unbiased.29 Matching methods are influenced by scale compatibility and, in the case of C-TTO, by loss aversion.30 Choices are also subject to problems, as it has been shown that responses are more lexicographic in choice than in matching. Evidence on the prominence effect suggests that in choices, subjects tend to choose the alternative that is better with respect to the more important attribute without paying enough attention to how much better the option is.31 Finally, it has also been observed that subjects perceive the distances between outcomes differently when comparisons are conducted in a separate or in a joint model, again without clear evidence that one method is better than another.32 We therefore do not think that the “true” values can be inferred from a single method, and for this reason we suggest that it can make sense to use a hybrid approach. We are not claiming that the biases present in one method compensate for the biases present in the other, so that by adding up the 2 methods we get unbiased results. There is no evidence to suggest this is the case. Even in the absence of such empirical evidence, we think that there are reasons to suspect that, at least, the potential biases present in the C-TTO are not enhanced by the choices of the DC experiment, rather the opposite.
In our results, introducing the D1² and K45² terms provided a better fit to the data, suggesting that the selected value set should have included such effects. However, the improvement in fit is mostly captured by the C-TTO part of the model. Given that the improvement in goodness of fit from using the D1² and K45² variables was marginal, as suggested by the AIC and the BIC, we selected the main effects model using the parsimony criterion.
As far as we are aware, there is no EQ-5D-5L value set available in the literature for direct comparison. Given the lack of such information, we compared our model with the Spanish value set for the 3L version of EQ-5D.23 Our model has higher values in the upper part of the scale compared with the EQ-5D-3L valuation study conducted in Spain. This was expected, as the label for level 2 in the 3L version is “some problems” and the label for level 2 in the 5L version is “slight problems.” However, the utility decrement of level 2 for the AD dimension is higher in our study than in the 3L study. A possible explanation is that the self-reported health results in our sample showed a high rate of people reporting problems in the AD dimension, causing them to put more weight on this dimension in the valuation tasks. On the other side of the scale, the pits state prediction was higher in our study. This was also expected, as the change in the wording of the mobility level from “confined to bed” in EQ-5D-3L to “unable to walk about” in EQ-5D-5L has changed the definition of the worst possible health state. Given that this new level is not as severe as “confined to bed” (which had the largest decrement of all dimensions in the Spanish 3L study), higher valuations are expected for 55555 than for 33333. We observed a lower proportion of negative values in our study in comparison with the Spanish EQ-5D-3L value set. The number of nonextreme health states has increased >10-fold in the EQ-5D-5L compared with the 3L version, reducing the proportion of extreme health states and partly explaining why the kernel density distribution of the 5L value set shows a smaller area below 0 than the 3L value set.
The hybrid model is not free of limitations. The assumption of normally distributed errors in the C-TTO part raises problems related to the robustness of the SE estimates and to the violation of the homoscedasticity condition. In addition, the conditional logit model for DC data does not explicitly consider within-respondent correlations. We tried to limit the impact of these limitations by using cluster estimation of the SEs of the estimated coefficients. However, further exploration of more sophisticated hybrid models for both types of data is needed, for example, random coefficient models for the C-TTO part and mixed (conditional) logit models for the DC part of the model.
We have observed significant differences in valuations between interviewers, which lead us to be cautious about suggesting a final value set to use in practice in Spain. We are now trying to understand the nature of these differences, which could be attributable to several factors, including issues with the EQ-VT software, the use of C-TTO, or noncompliance with the protocol by the interviewer.
We present here a novel methodological approach to obtain an EQ-5D-5L value set. Our results show the feasibility of using a hybrid model to estimate a value set for EQ-5D-5L valuation data.
Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's Website, www.lww-medicalcare.com.