Comparison of the psychometric properties of the EQ-5D-3L and SF-6D in the general population of Chengdu city in China.

Longchao Zhao¹, Xiang Liu¹, Danping Liu¹, Yan He², Zhijun Liu³, Ningxiu Li¹.

Abstract

The EQ-5D-3L and SF-6D are the most commonly used economic evaluation instruments. Data comparing the psychometric properties of the instruments are scarce in the Chinese population. This study compared the psychometric properties of these measures in the Chinese general population in Chengdu. From October to December 2012, 2186 respondents (age ≥18) were selected from urban and rural areas of Chengdu, China, via multistage stratified cluster sampling. Correlations, scatter plots and Bland-Altman plots were used to explore the relationships between the 2 measures. Ceiling and floor effects were used to analyze the score distribution. The known-groups method was used to evaluate discriminant validity. Among 2186 respondents, 2182 completed the questionnaire, and 2178 (18-82 years old, mean 46.09 ± 17.49) met the data quality requirement. The mean scores for the EQ-5D-3LCN, EQ-5D-3LUK, and SF-6DUK were 0.95 (Std: 0.11), 0.93 (Std: 0.15), and 0.79 (Std: 0.12), respectively. The correlations between domains ranged from 0.16 to 0.51. The correlation between the EQ-5D-3LCN and SF-6DUK and between the EQ-5D-3LUK and SF-6DUK was 0.46. The scatter plots and Bland-Altman plots demonstrated poor agreement between the EQ-5D-3L and SF-6D. The floor and ceiling effects were respectively 0.05% and 74.60% for the EQ-5D-3L and 0.05% and 2.53% for the SF-6DUK. The EQ-5D-3LCN, EQ-5D-3LUK and SF-6D have good discriminant validity in different sociodemographic and health condition groups. The SF-6D has a higher level of discriminant validity in moderately healthy groups in the EQ-5D-3L full-health population. Both the EQ-5D-3L and SF-6D are valid economic evaluation instruments in the Chinese general population in Chengdu but do not seem to be interchangeable. The EQ-5D-3L has a higher ceiling effect and a higher level of discriminant validity among different sociodemographic groups, and the SF-6D has a lower ceiling effect and a higher level of discriminant validity in health condition groups. Users may consider the evidence in the choice of these instruments.

Year: 2019 PMID: 30882636 PMCID: PMC6426629 DOI: 10.1097/MD.0000000000014719

Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.889

Introduction

As the pressure to contain the costs of medical care escalates, there is an increasing use of cost-utility analysis (CUA) to perform economic evaluation. CUA allows decision makers to compare the economic value of different health care interventions. In CUA, the quality-adjusted life year (QALY) is the most widely applied health indicator; it combines the length and quality of life (QOL) into a single health utility, whereby 1.0 corresponds to full health and 0.0 corresponds to death. Health utility can be estimated by 2 methods: direct preference elicitation and preference-based health state classification systems. Direct methods, such as standard gamble (SG), time trade-off (TTO), and visual analogue scale (VAS), are time-consuming and resource-intensive in calculating health utility. In contrast, the preference-based health state classification systems are increasingly used in CUA and are more convenient in assessing health utility. These instruments define the health state based on a health status classification system and then assign a utility score to each health state by using a scoring algorithm that incorporates population preferences. Several general health-related quality of life (HRQOL) instruments have been developed to estimate health utility. For example, the Health Utilities Index (HUI), three-level EQ-5D (EQ-5D-3L), five-level EQ-5D (EQ-5D-5L), and the Short-Form Six-Dimension (SF-6D) are widely used health utility index instruments.
Among the health utility index instruments, the EQ-5D and SF-6D are 2 of the most commonly used preference-based measurements in the world. The Chinese pharmaceutical economic research guide suggests that utility measures should use the country-specific value set. The EQ-5D and SF-6D utility value sets are developed by different methods in different countries and regions. The EQ-5D-3L value sets have been produced for many countries or regions using TTO, such as the UK, US, Australia, and Japan. Recently, there has been increased use of the EQ-5D in China after the preference-based EQ-5D-3L value sets for the mainland China population were developed by the TTO method. Previous studies of the EQ-5D-3L in China either used other countries’ value sets or were restricted to using it as an instrument to report HRQOL problems. In contrast, the SF-6D value sets were first developed using SG in the UK, Hong Kong, Japan, Portugal and Brazil. However, a preference-based SF-6D value set for the mainland China population still has not been developed. The EQ-5D-3L was recommended for use in health technology assessment by the China Guidelines for Pharmacy Economic Evaluations in 2015. Derived from the SF-36 and SF-12, the SF-6D is also a widely used instrument in economic evaluations, and previous studies have validated the SF-6D in several population groups. However, the application of the SF-6D in mainland China is limited. To date, several studies have compared the EQ-5D and SF-6D in various general populations and patient groups and suggest that they are interchangeable in different target populations. CUA is one of the most important indicators used by decision makers in health technology assessment. Different instruments may lead to different economic evaluation outcomes, which may influence healthcare decisions. However, little is known about the performance of the EQ-5D and SF-6D in mainland China's general population.
The aim of this study is to compare the performance of the EQ-5D-3L and SF-6D in the general population of Chengdu city in mainland China.

Methods

Study design

The survey was conducted in Chengdu, a city in southwestern China, from October to December 2012. A multistage stratified cluster sampling method was used to select respondents. Respondents were eligible if they were aged 18 years or older. In the study, 5 districts (towns) were selected from urban areas (counties) according to economic level. Within each district or town, 5 communities or villages were selected according to the geographic location and economic level. Within each selected community or village, 60 households were randomly selected. Subsequently, in each household, all residents over 18 years old were chosen for the survey. A total of 2182 respondents were recruited, consented to participate, and completed questionnaires. All the respondents provided informed consent and were interviewed by trained interviewers using the standard questionnaire.

Instruments and measures

The questionnaire contained questions regarding demographics (age and sex), socioeconomic status (marriage, education, employment, annual household income and health insurance), and health status (emotions, chronic disease, recent health status and self-reported health status). The questionnaire also included the Chinese versions of the EQ-5D-3L and SF-36v2. The EQ-5D-3L was developed by the EuroQol Group and consists of 5 health dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), and each dimension has 3 levels (no problems, some problems, and major problems). Thus, it can describe 243 (3^5) health statuses. Using the scoring algorithm, each health status can be assigned a utility score. The SF-6D is a preference-based instrument derived from the SF-36 and SF-12. This study used the Chinese version of the SF-36v2 for data collection. The SF-6D consists of 6 dimensions, and each dimension has 4 to 6 levels: physical functioning (6 levels), role limitations (4 levels), social functioning (5 levels), pain (6 levels), mental health (5 levels), and vitality (5 levels). Thus, the instrument can describe 18,000 possible health statuses, which can also be assigned a utility score by using the population-based preference algorithm.
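The two state counts quoted above follow directly from multiplying the number of levels across dimensions. A minimal sketch in Python, assuming only the level counts listed in the text (the dictionary and function names are illustrative, not part of either instrument):

```python
from math import prod

# Number of response levels per dimension, as listed in the text above.
EQ5D3L_LEVELS = {
    "mobility": 3,
    "self-care": 3,
    "usual activities": 3,
    "pain/discomfort": 3,
    "anxiety/depression": 3,
}
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitations": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "vitality": 5,
}


def count_health_states(levels):
    """Number of distinct health states = product of the level counts."""
    return prod(levels.values())


eq5d_states = count_health_states(EQ5D3L_LEVELS)
sf6d_states = count_health_states(SF6D_LEVELS)
print(eq5d_states, sf6d_states)
```

The much larger SF-6D descriptive system is one plausible reason it can separate respondents who all sit in the single best EQ-5D-3L state.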

Statistical analysis

Currently, there is no SF-6D algorithm based on the preference value set for mainland China's population. Therefore, we used the UK population-based scoring algorithm to calculate the SF-6D utility, and the utility score ranges from 0.29 to 1.00. The EQ-5D-3L scoring algorithm for mainland China was recently developed by TTO. To compare utility scores calculated by the same population preference, we also used the UK scoring algorithm for the EQ-5D-3L. The EQ-5D-3L China TTO preference value ranged from −0.149 to 1.00. The EQ-5D-3L UK preference value ranged from −0.114 to 1.00. A utility score under 0 represents a health status that is considered worse than being dead. The utility of the EQ-5D-3L calculated by the China TTO value set was represented as EQ-5D-3LCN, and the utility calculated by the UK TTO value set was represented as EQ-5D-3LUK. The utility of the SF-6D calculated by the UK value set was represented as SF-6DUK. The respondents’ demographic characteristics and item distributions were described in numbers and percentages of the sample size. Continuous variables, including utility scores and EQ-VAS scores, are presented as the mean and standard deviation (Std). Health status was measured by emotions, chronic conditions, visits to the doctor in the past 2 weeks, and self-reported health. The chronic conditions include diabetes, hyperlipidemia, hypertension, heart disease, stroke, respiratory disease, liver disease, gastrointestinal disease, bone and joint disease, and cancer. The chronic conditions were diagnosed by a doctor in a hospital or community health service center. The Spearman correlation was used to evaluate the correlations between domains and index scores, with strength classified as follows: negligible <0.20; poor 0.20 to 0.30; moderate 0.31 to 0.50; and strong >0.50. We also used a scatter plot and Bland-Altman plot to evaluate the relationship between the EQ-5D-3L and SF-6D utility scores.
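The correlation bands above can be written as a small lookup. The sketch below is only an illustration: the function name is hypothetical, and it assumes coefficients are rounded to two decimals before classification, since the stated bands leave a gap between 0.30 and 0.31.

```python
def classify_correlation(rho):
    """Map a Spearman coefficient to the strength bands used in the text.

    Assumption: the coefficient is rounded to two decimals first, so that
    the bands 0.20-0.30 and 0.31-0.50 cover every possible value.
    """
    r = round(abs(rho), 2)
    if r < 0.20:
        return "negligible"
    if r <= 0.30:
        return "poor"
    if r <= 0.50:
        return "moderate"
    return "strong"


# Values reported in the abstract: the domain correlations range from
# 0.16 to 0.51, and the utility scores correlate at 0.46.
for rho in (0.16, 0.46, 0.51):
    print(rho, classify_correlation(rho))
```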
The level of agreement between the EQ-5D-3L and SF-6D was analyzed by the Bland-Altman plot, which is an informative analytic method that allows for the identification of the relationship between measurement error and the best estimate of the true value. The average of the 2 measures was plotted on the x-axis, and the difference between the 2 measures was plotted on the y-axis to check for systematic error. Good agreement between the 2 measures would be indicated by a mean difference close to 0 and 95% of the differences falling within 2 standard deviations of the mean difference. The ceiling and floor effects were used to compare the sensitivity of the EQ-5D-3L and SF-6D. The ceiling effect was assessed by computing the percentage of respondents reporting no problems (11111 for the EQ-5D-3L and 111111 for the SF-6D). The floor effect was assessed by computing the percentage of respondents reporting the worst levels (33333 for the EQ-5D-3L and 645655 for the SF-6D). Discriminant validity was used to assess the instruments’ ability to distinguish groups with different demographic characteristics and health statuses. In terms of social determinants of health (SDH), we categorized 3 levels of external indicators: demographic characteristics, family indicators, and health conditions. The first level includes age, sex, education, and employment. The second level includes annual household income per member, marriage, and place of residence. The third level includes the quality of life score (QOL score), self-reported health status, the number of chronic diseases, and doctor visits in the last 2 weeks, defined as follows. The QOL score was obtained from a self-reported QOL item that asked the respondents to rate their physical health, mental health, social relationships, and living environment on a scale of 0 (worst) to 100 (best), and we used the median QOL score (QOL score = 80) as a cutoff point to divide the respondents into 2 groups. The self-reported health status item asked the respondents to rate their overall level of health as excellent, very good, good, fair, or poor. We categorized the respondents’ chronic disease status into 3 subgroups: 0 = no chronic disease, 1 = one chronic disease, and 2+ = 2 or more diseases. The respondents were asked whether they had visited a doctor in the past 2 weeks, and their responses were recorded as “yes” or “no”.
The known-groups validity was evaluated by t tests and analysis of variance (ANOVA). The effect size (ES) was used to detect health differences. The ES was calculated as the mean difference in utility divided by the standard deviation of utility, and Cohen's moderate ES of 0.2 to 0.5 was adopted as the minimally important difference (MID) in this study. We calculated the ES between each characteristic subgroup to estimate the discriminant validity of the index score. The relative efficiency (RE) was also used to evaluate whether 1 instrument is more efficient or sensitive than another or more likely to result in a statistically significant difference between groups of respondents known to differ. RE can be calculated by the F statistic ratio or the square of the t ratio between 2 measurements: RE > 1 indicates that the comparator measure has greater discriminating power or responsiveness than the reference measure and vice versa. P-values less than .05 were considered statistically significant. All statistical analyses were two-sided and performed using R software (version 3.4.2; R Foundation for Statistical Computing, Vienna, Austria).
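The ES and RE definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' R code: the text does not say which standard deviation was used, so the sketch assumes the standard deviation of all scores in the two groups combined, and the function names and the sample data are hypothetical.

```python
from statistics import mean, stdev


def effect_size(group_a, group_b):
    """Mean difference in utility divided by a standard deviation.

    Assumption: the SD is taken over all scores of both groups combined;
    the source text does not specify which SD was used.
    """
    overall_sd = stdev(list(group_a) + list(group_b))
    return (mean(group_a) - mean(group_b)) / overall_sd


def relative_efficiency_from_f(f_comparator, f_reference):
    """RE as the ratio of ANOVA F statistics (comparator over reference)."""
    return f_comparator / f_reference


def relative_efficiency_from_t(t_comparator, t_reference):
    """RE as the squared ratio of t statistics (comparator over reference)."""
    return (t_comparator / t_reference) ** 2


# Hypothetical utility scores for two known groups (not study data).
poorer_health = [0.70, 0.80, 0.90]
better_health = [0.80, 0.90, 1.00]
es = effect_size(poorer_health, better_health)

# Hypothetical test statistics for a comparator and a reference instrument.
re_t = relative_efficiency_from_t(3.0, 2.0)
re_f = relative_efficiency_from_f(9.0, 4.0)
print(es, re_t, re_f)
```

With these definitions, a negative ES means the first group scores lower than the second, and an RE above 1 means the comparator instrument separates the groups more strongly than the reference.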

Results

Sample demographic characteristics

Of the 2186 respondents selected by the multistage sampling method from the rural and urban areas of Chengdu, Sichuan Province, China, 2182 completed the questionnaires, and 2178 met the data quality requirement. The respondents’ ages ranged from 18 to 82, and the mean age was 46.09 (Std 17.49). Female respondents comprised 55.28% of the sample. Those who were married accounted for 76.26% of the sample. A total of 11.85% of the sample had graduated from university. A total of 30.9% of the respondents had chronic diseases, and 26.95% had experienced discomfort or consulted a doctor in the 2 weeks before the survey. The mean of the EQ-5D-3LCN score was 0.95 (Std: 0.11) (median: 1.0; interquartile range: 0.13), and that of the EQ-5D-3LUK score was 0.93 (Std: 0.15) (median: 1.0; interquartile range: 0.15). The mean of the SF-6DUK score was 0.79 (Std: 0.12) (median: 0.81; interquartile range: 0.19).

Relationship between the EQ-5D-3L and SF-6D

Correlations between the EQ-5D-3L and SF-6D domains are presented in Table 1. The SF-6D domains have a higher correlation with related domains on the EQ-5D-3L. The correlations between domains were as follows: 0.32 between physical functioning and mobility, 0.21 between physical functioning and self-care, 0.26 between physical functioning and usual activities, 0.36 between physical functioning and pain/discomfort, 0.31 between role limitation and pain/discomfort, 0.25 between social functioning and pain/discomfort, 0.24 between social functioning and anxiety/depression, 0.51 between pain and pain/discomfort, and 0.20 between mental health and anxiety/depression. The vitality domain of the SF-6D has no counterpart domain on the EQ-5D-3L and was moderately correlated with the EQ-5D-3L pain/discomfort domain (r = 0.30).

Table 1

Correlations among dimensions of the EQ-5D-3L and SF-6D (n = 2,178).

Level of agreement between utility scores

The Spearman correlation between the EQ-5D-3LCN and SF-6DUK (Fig. 1) and between the EQ-5D-3LUK and SF-6DUK (Fig. 2) was 0.46. A notable disagreement can be observed on both ends of the plot. The lowest EQ-5D-3L utility scores tended to have higher SF-6D utility scores. The highest scores on the EQ-5D-3L (EQ-5D-3L utility = 1.00) were associated with a very wide score range on the SF-6DUK, from 0.46 to 1.00, which displays the high ceiling effect of the EQ-5D-3L.

Figure 1

Scatter plot between the EQ-5D-3LUK and SF-6DUK.

Figure 2

Scatter plot between the EQ-5D-3LCN and SF-6DUK.

The Bland-Altman plots show patterns similar to those of the scatter plots between the EQ-5D-3L and SF-6D utilities (Figs. 3 and 4). The mean difference between the EQ-5D-3LCN and SF-6DUK is 0.156, with a 95% limit of agreement of −0.067 to 0.378. A total of 82 (3.76%) observations were out of the 95% limit of agreement. The mean difference between the EQ-5D-3LUK and SF-6DUK is 0.137, with 62 (2.85%) observations out of the 95% limit of agreement of −0.139 to 0.414. Bland-Altman plots indicate an acceptable agreement between 2 instruments. Notably, the figures also show a nonrandom mean difference between the EQ-5D-3L and SF-6D, and the EQ-5D-3L utilities were larger than the SF-6D utilities at the upper end and smaller at the lower end. The limit of agreement in the second figure is wider than that in the first, indicating that the difference in variation between the EQ-5D-3LUK and SF-6DUK was greater than that between the EQ-5D-3LCN and SF-6DUK.
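The Bland-Altman quantities reported here can be sketched as follows. This is a minimal illustration, not the authors' R code: it assumes the 95% limits of agreement were computed as the mean difference ± 1.96 standard deviations of the differences, and the function names and toy scores are hypothetical. Under that assumption, the reported limits can be inverted to recover the standard deviation of the differences, which can then be compared with the values of 0.113 and 0.141 given in the Discussion.

```python
from statistics import mean, stdev


def bland_altman(x, y, z=1.96):
    """Bias and limits of agreement for paired scores x and y.

    Assumption: limits are bias +/- z * SD of the paired differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    averages = [(a + b) / 2 for a, b in zip(x, y)]  # x-axis of the plot
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return {"averages": averages, "diffs": diffs, "bias": bias,
            "lower": lower, "upper": upper, "outside": outside}


def implied_sd(lower, upper, z=1.96):
    """SD of the differences implied by a pair of limits of agreement."""
    return (upper - lower) / (2 * z)


# Toy paired utilities (hypothetical, not study data).
toy = bland_altman([1.0, 0.9, 0.8], [0.8, 0.8, 0.6])

# Limits of agreement reported in the text for the two comparisons.
sd_cn = implied_sd(-0.067, 0.378)  # EQ-5D-3LCN vs SF-6DUK
sd_uk = implied_sd(-0.139, 0.414)  # EQ-5D-3LUK vs SF-6DUK
print(round(sd_cn, 3), round(sd_uk, 3))
```

The midpoints of the reported limits also land close to the reported mean differences of 0.156 and 0.137, which is what symmetric limits around the bias would produce.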

Figure 3

Bland-Altman plot between the EQ-5D-3LCN and SF-6DUK.

Figure 4

Bland-Altman plot between the EQ-5D-3LUK and SF-6DUK.

Ceiling and floor effects

Tables 2 and 3 show the score distribution of both the EQ-5D and SF-6D dimensions. A large proportion of respondents reported no problems in either the EQ-5D-3L or SF-6D dimensions except in the vitality dimension of the SF-6D. All domains of the EQ-5D-3L have higher ceiling effects (>80%) than those of the SF-6D. Floor effects are negligible for the dimensions of both measurements except for the role limitation domain of the SF-6D (21.03%).

Table 2

Frequency distribution of EQ-5D-3L scores by dimensions (%) (n = 2,178).

Table 3

Frequency distribution of SF-6D scores by dimensions (%) (n = 2,178).

The EQ-5D-3LCN and EQ-5D-3LUK tend to have very high ceiling effects (n = 1625, 74.60%) and low floor effects (0.05%) (Fig. 5). The EQ-5D-3L utility scores are skewed toward high scores and are more skewed than the SF-6DUK utility scores. The SF-6DUK has low floor (0.05%) and ceiling effects (n = 55, 2.53%), and the distribution of the SF-6DUK utilities is more normal than that of the EQ-5D-3L utilities.
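The ceiling and floor percentages are simple proportions of the analyzed sample. A short sketch using the counts given in the text (n = 2,178; 1,625 and 55 respondents in the best state); the helper name is illustrative, and reading the 0.05% floor effect as a single respondent is an inference from the sample size, not a figure stated in the source:

```python
N_RESPONDENTS = 2178  # analyzed sample size reported in the text


def percentage(count, total=N_RESPONDENTS):
    """Share of respondents in a given health state, in percent."""
    return 100 * count / total


eq5d_ceiling = percentage(1625)  # respondents in EQ-5D-3L state 11111
sf6d_ceiling = percentage(55)    # respondents in SF-6D state 111111
one_respondent = percentage(1)   # assumed count behind the 0.05% floor

print(f"{eq5d_ceiling:.2f} {sf6d_ceiling:.2f} {one_respondent:.2f}")
```

The EQ-5D-3L figure agrees with the reported 74.60% up to rounding in the last digit.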

Figure 5

Health utility histogram of the EQ-5D-3L and SF-6D (n = 2,178).

Descriptive statistics and discriminant validity

Table 4 shows the discriminant ability of the EQ-5D-3LCN, EQ-5D-3LUK and SF-6DUK in different sociodemographic groups. All of the utility scores were significantly different according to age, gender, marriage, education, employment and household incomes, but not according to health insurance. Specifically, respondents who were male, were younger, were more educated, were married, were employed, had higher household incomes, and had health insurance reported higher utility scores. Between adjacent sociodemographic subgroups, the effect sizes of the 3 utilities show a higher level of discriminant validity (ES > 0.20).

Table 4

Discriminant validity of the EQ-5D-3L and SF-6D in different demographic populations.

Table 5 shows the discrimination of the EQ-5D-3LCN, EQ-5D-3LUK and SF-6DUK utility in different health groups. All the utility scores can discriminate among groups defined by the following health indicators: self-reported health status, number of chronic diseases, outpatient visits in the past 2 weeks, emotions and QOL score (P < .01). The utility scores were lower in the poor health groups than in the better health groups (ES < 0), except for the SF-6DUK score in the emotion groups of “better” and “as usual”. An absolute value of ES > 0.20 indicates that the utility scores can discriminate among subgroups of health indicators. The RE shows that both the EQ-5D-3LCN and EQ-5D-3LUK were less discriminating than the SF-6D in self-reported health groups, outpatient groups, and QOL score groups (RE < 1.00) but not in chronic disease groups (RE > 1.00).

Table 5

Discriminant validity of the EQ-5D-3L and SF-6D in different health groups (n = 2,178).

In each sociodemographic group and health group, both the EQ-5D-3LCN and EQ-5D-3LUK scores are greater than the SF-6DUK score, and the EQ-5D-3LCN score is greater than the EQ-5D-3LUK score (Tables 4 and 5). The results also show that the standard deviation of the EQ-5D-3LCN is lower than that of the EQ-5D-3LUK and SF-6DUK. Although the REs of the EQ-5D-3LCN and EQ-5D-3LUK were greater than 1.00 for single indicators (age, gender, education, marriage status, employment, annual household income, chronic conditions, and emotions), the SF-6D shows a higher level of discriminant validity in comprehensive health status (self-reported health status and QOL score groups).

Discriminating among respondents with better health

Fifty-five (2.53%) respondents reported the best health condition on the SF-6D (the SF-6D health status was “111111”). However, 1,625 (74.60%) respondents reported the best health condition on the EQ-5D-3L (the EQ-5D-3L health status was “11111”). The mean EQ-5D-3L for those with the best SF-6D health status (“111111”) was 1.00. Conversely, the SF-6DUK utility shows a normal distribution among the EQ-5D-3L full-health respondents (“11111”), and the mean was 0.82 (Std: 0.10), with scores ranging from 0.46 to 1.00. Table 6 shows the SF-6D discriminant validity in the EQ-5D-3L full-health groups. Among the EQ-5D-3L full-health respondents, the SF-6DUK index scores were significantly different in subgroups by age, employment, annual household income, self-reported health, emotions, outpatients, and QOL (P < .05). Respondents with better self-reported health, better emotional status, no chronic disease and no outpatient status have a higher SF-6DUK utility than people with poor or worse conditions. The effect sizes of the SF-6DUK in these groups also show a higher level of discriminant validity (ES > 0.20), except in the education and household income groups (ES < 0.20).

Table 6

Discriminant validity of the SF-6D in the EQ-5D full-health group (n = 1,625).

Discussion

This study provides evidence regarding the performance of the EQ-5D-3L and SF-6D in the Chinese general population. The results show that the 2 measurements demonstrated good discriminant validity in the general population. Both displayed high ceiling effects, the domains showed moderate correlations between theoretically related pairs, and the level of agreement between the 2 measurement utilities was poor. However, there are some notable differences between the EQ-5D-3L and SF-6D, which is consistent with the results in the general population and in patient groups. First, the scores on both the EQ-5D-3LCN and EQ-5D-3LUK are higher than those on the SF-6DUK in the overall sample, which is consistent with previous studies. The absolute difference is 0.156 (Std: 0.113) (P < .001) between the EQ-5D-3LCN and SF-6DUK and 0.137 (Std: 0.141) (P < .001) between the EQ-5D-3LUK and SF-6DUK. Previous studies have suggested that there are several reasons for the discrepancy. The first reason is the method used to derive the value sets and scoring algorithms. The SF-6DUK scoring algorithm was derived by the SG method, and the EQ-5D-3LCN and EQ-5D-3LUK scoring algorithms were derived by the TTO method. The SG method usually derives higher scores than the TTO method in patients with severe health states and lower scores than the TTO method in patients with mild health states. Second, the population source may be another reason for the absolute difference. The UK population's preference value set was used to calculate the EQ-5D-3LUK and SF-6DUK scores in the Chinese population. In this study, the EQ-5D-3L means exceeded the SF-6D mean across the whole sample, which was inconsistent with some studies. Furthermore, Whitehurst et al used the same method to derive the EQ-5D-3L and SF-6D preference value set in the same population and found that the SF-6D is still lower than the EQ-5D-3L (mean difference 0.253).
Thus, the mean discrepancy may result from characteristics of the EQ-5D-3L and SF-6D and not only from the method and population difference. The most likely explanation may be the high ceiling effect of the EQ-5D-3L. The EQ-5D-3L had a much higher ceiling effect (1625 full-health respondents, 74.6%) than the SF-6D (55 full-health respondents, 2.53%), as shown in Fig. 5. The high ceiling effect will elevate the mean score of the EQ-5D-3L in all samples, which is consistent with a study in the general population. Second, the correlation between the EQ-5D-3L and SF-6D domains (0.20–0.51) and between utilities (0.46) was acceptable, but the scatter plot and the Bland-Altman plot revealed a lack of agreement between the EQ-5D-3L and SF-6D. The lowest EQ-5D-3L utility scores tend to have a high SF-6D score, and the highest EQ-5D-3L utility (1.00) tends to have a wide range of SF-6D utility (0.456–1.00). The high-end discrepancy between the EQ-5D-3L and SF-6D also revealed the high ceiling effect of the EQ-5D-3L. Third, the EQ-5D-3LCN, EQ-5D-3LUK and SF-6DUK performed well in discriminating among different sociodemographic and health groups. All 3 utility scores were lower among the groups with poor health than among those with good health (Table 5). This result is consistent with previous studies. The EQ-5D-3LCN and EQ-5D-3LUK seem more sensitive than the SF-6D in discriminating among sociodemographic subgroups based on age, gender, marriage, education, employment, household income, and health insurance (RE > 1.00, Table 4). This difference may be caused by the larger standard deviation of the SF-6D. The larger standard deviation will lead to smaller F statistic values in ANOVA and smaller REs than in the EQ-5D-3LCN and EQ-5D-3LUK.
Nevertheless, the results still show that the SF-6DUK significantly discriminates among all sociodemographic groups, including those based on age, gender, marriage, education, employment, household income, and health insurance (P < .01, Table 4). Furthermore, the SF-6DUK is more sensitive than the EQ-5D-3L in detecting smaller health differences. Among EQ-5D-3L full-health respondents, about 28.94% self-reported their health as “good” and 20% had chronic conditions, health differences that the EQ-5D-3L failed to discriminate. The SF-6D can discriminate among different health groups at the ceiling of the EQ-5D-3L (P < .01, Table 6), although the results are inconsistent with those of the US population. Finally, the EQ-5D-3LUK and SF-6DUK utility scores calculated by the UK population preference value set and algorithm tended to have higher standard deviations, and the scores on the EQ-5D-3LUK are lower than those on the EQ-5D-3LCN in each sociodemographic group (Tables 4 and 5). These differences may be caused by the population source of the value set not representing the Chinese population. Values for health status may vary across countries because country-specific value sets are developed based on the local population health preferences and are usually affected by cultural differences. Previous studies comparing different countries’ specific preference value sets suggest that there are some differences between health statuses in the value sets of the UK, the US, Spain and Japan. These differences may lead to different outcomes of QALYs, which in turn may influence healthcare decisions when the QALYs are used for economic evaluations. Therefore, a country-specific preference value set should be applied when it is available. Further research is needed to develop the SF-6D Chinese general population preference value sets. Our study must be interpreted in light of several study limitations.
First, the Chinese pharmaceutical economic research guide suggests that the utility measures should use country-specific value sets. In this study, we used the UK population-based value set to calculate SF-6D scores because of the lack of China-specific SF-6D value sets. It is a limitation to compare the SF-6D and EQ-5D-3L using different country-specific values. Second, the country-specific value sets were developed based on different methods: TTO and SG. However, previous studies using the same method and population-based value sets have also displayed differences. Third, this is a cross-sectional study without any interventions. Therefore, it is not possible to compare longitudinal responsiveness and discriminant validity. New research on establishing the mainland China-specific SF-6D value sets may therefore be an important future advancement. Recently, the mainland China-specific EQ-5D-5L value sets have been developed by the TTO method. Future studies are needed to compare the psychometric properties of the EQ-5D-5L and SF-6D, to explore their responsiveness in studies involving interventions that lead to changes in health conditions, and to assess whether the choice of the EQ-5D-5L or SF-6D has a different impact on cost-utility estimates and decision making. In conclusion, the study compared the construct validity, sensitivity and level of agreement between the EQ-5D-3L and SF-6D in the Chinese general population in Chengdu. Both are valid economic evaluation instruments in the Chinese general population. Country-specific value sets should be used when available. It seems that the 2 measurements are not interchangeable. The EQ-5D-3L has a higher ceiling effect and a higher level of discriminant validity among different sociodemographic groups, and the SF-6D has a lower ceiling effect and a higher level of discriminant validity in moderately healthy groups. Users may consider this evidence in the choice of these instruments.

Acknowledgments

We would like to thank all the respondents who participated in the survey and all the investigators who helped conduct the survey in 2012.

Author contributions

LZ, NL and DL conceived the study and designed the study protocol; LZ, YH and ZL performed the data collection; LZ wrote the first draft of manuscript in addition to performing the literature search; and XL and NL provided statistical support and critically revised the manuscript for intellectual content. All authors read and approved the final manuscript. NL is the corresponding author of the paper. Conceptualization: Longchao Zhao, Ningxiu Li. Data curation: Longchao Zhao, Ningxiu Li. Formal analysis: Longchao Zhao. Funding acquisition: Ningxiu Li. Investigation: Longchao Zhao, Danping Liu, Yan He, Zhijun Liu, Ningxiu Li. Methodology: Longchao Zhao, Xiang Liu, Danping Liu, Yan He, Zhijun Liu, Ningxiu Li. Project administration: Longchao Zhao, Danping Liu, Yan He, Ningxiu Li. Supervision: Xiang Liu, Danping Liu, Ningxiu Li. Writing – original draft: Longchao Zhao. Writing – review & editing: Longchao Zhao, Xiang Liu, Ningxiu Li.
