
Cross-cultural validation of the work functioning impairment scale (WFun) among Japanese, English, and Chinese versions using Rasch analysis.

Yoshihisa Fujino¹, Ning Liu², Odgerel Chimed-Ochir¹, Makoto Okawara¹, Tomohiro Ishimaru³, Tatsuhiko Kubo¹.

Abstract

OBJECTIVES: The work functioning impairment scale (WFun) was developed to measure the degree of work functioning impairment in Japanese workers based on the Rasch model. Given that the number of foreign workers employed in Japan and abroad has increased in recent years, a multilingual questionnaire is becoming increasingly necessary to investigate work functioning impairment in these workers. The purpose of this study was to verify the cross-cultural validity of WFun between Japanese, Chinese, and English versions.
METHODS: A cross-sectional study was conducted in two stages. First, the Chinese and English versions of WFun were created. Second, an internet survey was conducted among 1000 Japanese, 400 Chinese, and 300 Americans. Estimates and standard errors of an individual's ability and item difficulty were calculated using the Rasch model. Differential item functioning (DIF) and differential test functioning (DTF) were also examined using Rasch model analyses.
RESULTS: The DIF effect size for one item in the English version exceeded 0.5 logit, indicating the presence of some DIF. In contrast, the DIF effect sizes for all other items were below 0.5 logit, indicating that the influence of DIF was negligible. Furthermore, the Rasch measures corresponding to each raw score showed strong agreement among the three versions of WFun, with an intraclass correlation coefficient of 0.98 (95% confidence interval: 0.97-0.99), indicating the absence of DTF.
CONCLUSIONS: Our findings indicate that the English, Chinese, and Japanese versions of WFun have good comparability.


Keywords: China; Japan; patient outcome assessment; presenteeism; translations; work capacity evaluation


Year: 2019 PMID: 31254306 PMCID: PMC6842009 DOI: 10.1002/1348-9585.12072

Source DB: PubMed Journal: J Occup Health ISSN: 1341-9145 Impact factor: 2.708

INTRODUCTION

Interest in presenteeism is increasing. Presenteeism refers to the practice of continuing to work despite being sick or in poor health.1, 2 Reports have shown that presenteeism is significantly associated with productivity loss.3, 4, 5, 6, 7, 8, 9, 10 Consequently, many researchers have sought to measure the impact of presenteeism on worker productivity, leading to the development of evaluation tools such as the Stanford Presenteeism Scale, the Work Productivity and Activity Impairment Questionnaire, the Work Limitations Questionnaire, and the Health and Work Performance Questionnaire.11, 12, 13, 14 The work functioning impairment scale (WFun) was developed to measure the degree of work functioning impairment in Japanese workers based on the Rasch model.15, 16, 17 That is, WFun expresses the severity of a worker's health problems in terms of the extent to which the worker experiences reduced functioning at work as a consequence of those problems. In contrast, other presenteeism indexes evaluate productivity based on a number of different factors, such as time not spent on the job, standard of work, amount of work, and personal factors.6 WFun has been validated according to the Consensus‐based Standards for the Selection of Health Measurement Instruments (COSMIN).18, 19 COSMIN provides recommendations for examining the various categories of validity and reliability of measurement instruments. One of these, cross‐cultural validity, refers to the degree to which the performance of the items in a translated or culturally adapted patient‐reported outcome instrument adequately reflects the performance of the items in the original instrument. Given that the number of foreign workers employed in Japan and abroad has increased in recent years, there is an increasing need for a multilingual questionnaire to investigate work functioning impairment in these workers.
Furthermore, future international comparisons require validation among foreign workers in general, not just those working in Japanese companies. However, WFun has not yet been cross‐culturally validated. Questionnaire items may not function in the same way across cultural groups; such items are said to show cross‐cultural bias, or differential item functioning (DIF) according to culture.20, 21, 22 DIF arises not only from problems with translation but also from cultural heterogeneity, and its presence biases comparisons between different language versions of the same questionnaire. The purpose of this study was to verify the cross‐cultural validity of WFun across its Japanese, Chinese, and English versions.

METHODS

This cross‐sectional study was performed in two stages. First, Chinese and English versions of WFun were created. Second, Rasch analysis and DIF verification were conducted.

Cross‐cultural adaptation

WFun consists of the following seven items: “I haven't been able to behave socially”, “I haven't been able to maintain the quality of my work”, “I have had trouble thinking clearly”, “I have taken more rests during my work”, “I have felt that my work isn't going well”, “I haven't been able to make rational decisions”, and “I haven't been proactive about my work”. Respondents choose one of the following five response categories for each item: 1, "not at all"; 2, "one or more days a month"; 3, "about one day a week"; 4, "two or more days a week"; and 5, "almost every day." The total WFun score is the sum of the scores for the seven items and ranges from 7 to 35, with higher scores indicating worse work ability. Translation followed previously described methods.19, 23 First, two translators independently translated the Japanese version of WFun into the target language. The authors then reconciled the two translations through discussion, and the reconciled translation was back‐translated into Japanese. The final translated version was then developed at an expert meeting on the basis of the back translation. This process was performed for both the Chinese and English versions. The Japanese, English, and Chinese versions of WFun are available from the corresponding author upon request.
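The scoring rule above amounts to a validated sum over seven items. A minimal Python sketch follows; the function name `wfun_total` and its error handling are illustrative assumptions, not part of the published instrument.

```python
from typing import Sequence

N_ITEMS = 7
MIN_CATEGORY, MAX_CATEGORY = 1, 5  # 1 = "not at all" ... 5 = "almost every day"


def wfun_total(responses: Sequence[int]) -> int:
    """Return the WFun raw total (7-35); higher totals mean worse work ability."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(responses)}")
    for r in responses:
        if not MIN_CATEGORY <= r <= MAX_CATEGORY:
            raise ValueError(f"response {r} is outside {MIN_CATEGORY}-{MAX_CATEGORY}")
    return sum(responses)


print(wfun_total([1, 2, 1, 3, 2, 1, 1]))  # 11
```

Rejecting out-of-range or incomplete response sets keeps every reported total inside the 7 to 35 range described above.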

Cross‐cultural validation

Subjects

The Japanese, Chinese, and English versions of WFun were examined using internet surveys. The Japanese data were collected in our previous study on the original development of WFun15 and were reanalyzed for the present study. As described previously,15 that survey targeted 1000 registered Japanese monitors. Briefly, we commissioned a commercial testing company to conduct an internet survey. An email inviting participation was sent to approximately 20 000 of the company's 2 million registered internet users. Potential participants were screened with statements such as “I am currently employed” and “I have some health issues”. We excluded workers who did not have any health issues because WFun aims to measure the degree of work functioning impairment due to health problems. Registered users who satisfied these criteria were categorized by sex into five age groups (20s, 30s, 40s, 50s, and 60s), with each group containing 100 respondents; responses were accepted in order of arrival until each group reached its target, giving 1000 respondents in total. Respondents were asked about their age, sex, occupation, and employment type, and answered the WFun items. Likewise, the Chinese version was examined using an internet survey targeting 400 Chinese respondents aged 20–59 years living in mainland China, and the English version was examined using an internet survey targeting 300 Americans aged 20–59 years living in the United States. American subjects were recruited from the Dynata™ panel, which has approximately 7 million registered members, and Chinese subjects from the panel of iPanel Online Market Research Ltd., which has approximately 100 000 registered members.
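The sex-by-age quota scheme for the Japanese sample can be sketched as follows. This is a minimal illustration assuming that eligibility screening has already been applied and that each arriving respondent is represented only by a (sex, age group) pair; the function and variable names are hypothetical, and the stopping rule is our reading of accepting the first 1000 eligible responses. The Chinese and American samples (aged 20–59) would use four age groups instead of five.

```python
import itertools
from collections import Counter
from typing import Iterable, List, Tuple

SEXES = ("men", "women")
AGE_GROUPS = ("20s", "30s", "40s", "50s", "60s")
QUOTA_PER_CELL = 100  # 2 sexes x 5 age groups x 100 = 1000 respondents

Respondent = Tuple[str, str]  # (sex, age group) of a screened, eligible respondent


def fill_quotas(stream: Iterable[Respondent], quota: int = QUOTA_PER_CELL) -> List[Respondent]:
    """Accept eligible respondents in arrival order until every sex-by-age cell is full."""
    counts: Counter = Counter()
    accepted: List[Respondent] = []
    target_total = quota * len(SEXES) * len(AGE_GROUPS)
    for sex, age in stream:
        if sex not in SEXES or age not in AGE_GROUPS:
            continue  # outside the sampling frame
        if counts[(sex, age)] < quota:
            counts[(sex, age)] += 1
            accepted.append((sex, age))
            if len(accepted) == target_total:
                break  # every cell has reached its quota
    return accepted


# Simulated arrivals cycling through all ten cells; each cell is capped at 100.
cells = [(s, a) for s in SEXES for a in AGE_GROUPS]
arrivals = itertools.islice(itertools.cycle(cells), 5000)
sample = fill_quotas(arrivals)
print(len(sample))  # 1000
```

Capping each cell, rather than the overall total alone, is what guarantees the balanced sex and age distribution reported in Table 1.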
Because the data obtained from the internet monitors contained no personal information, the requirement for approval by an ethics committee was waived.

Statistical analysis

The Rasch model is a common statistical method used to estimate latent ability based on item responses.24, 25, 26 It is a mathematical framework that provides estimates and standard errors of person ability and item difficulty on a common equal‐interval logit scale. In the Rasch model, the probability that a person succeeds on (or, for a questionnaire, endorses) an item depends only on the difference between that person's ability and the item's difficulty; a person's total score and an item's total score are sufficient statistics for these two parameters. The Rasch analysis was conducted in WINSTEPS version 4.2.0. Data were fitted to the Rasch rating scale model using joint maximum likelihood estimation, with all items sharing the same rating scale structure.

The magnitude of DIF, known as the DIF contrast or effect size, is the difference in logits between Rasch difficulty estimates.21 Recommended thresholds for a meaningful effect size typically range from 0.40 to 0.60 logit,27, 28, 29 although there is no established standard for what constitutes an important effect size. In practice, an effect size <0.50 logit is taken to indicate no DIF, as “measures based on item calibration with random deviations up to 0.50 logit are ‘for all practical purposes free from bias.’”27, 28, 29

We also evaluated the potential presence of differential test functioning (DTF). Because a subject is evaluated on the basis of the entire test, it is important to verify whether any DIF affects the evaluation of the whole test.29, 30, 31 To do this, we estimated the Rasch measure corresponding to each raw score for each of the Japanese, Chinese, and English versions of WFun, and then assessed absolute agreement among the three versions using the intraclass correlation coefficient ICC(2,1).32
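The rating scale model named above can be written down compactly. The Python sketch below assumes the standard Andrich parameterisation, with WFun response categories recoded from 1–5 to 0–4, and computes category probabilities for one person–item pair. The function names are illustrative, and the threshold values in the usage lines are hypothetical, since the source does not report the WFun thresholds.

```python
import math
from typing import List, Sequence


def rsm_category_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Andrich rating scale model: P(X = x) for categories x = 0..len(taus).

    theta: person measure; delta: item difficulty; taus: Rasch-Andrich
    thresholds shared by all items (all in logits).
    """
    # Category x gets the cumulative sum of (theta - delta - tau_k) for k = 1..x;
    # category 0 gets an empty sum, i.e. 0.
    log_numerators = [0.0]
    for tau in taus:
        log_numerators.append(log_numerators[-1] + (theta - delta - tau))
    # Normalise, subtracting the maximum first to avoid overflow in exp().
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def rsm_expected_score(theta: float, delta: float, taus: Sequence[float]) -> float:
    """Expected (0-based) category for one person-item pair."""
    probs = rsm_category_probs(theta, delta, taus)
    return sum(x * p for x, p in enumerate(probs))


# Dichotomous special case (one threshold at 0): P(X = 1) = exp(theta - delta) / (1 + exp(theta - delta)).
print(rsm_category_probs(0.0, 0.0, [0.0]))  # [0.5, 0.5]

# Five WFun categories (1-5 recoded to 0-4) with HYPOTHETICAL symmetric thresholds.
hypothetical_taus = [-1.5, -0.5, 0.5, 1.5]
print(rsm_expected_score(0.40, 0.40, hypothetical_taus))  # ≈ 2.0, the middle category
```

Because only the difference theta − delta enters the formula, a person whose measure equals an item's difficulty is expected to respond in the middle of the scale whenever the thresholds are symmetric.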

RESULTS

Table 1 shows the demographic characteristics of the study subjects. Because of the sampling design, there were no substantial differences in the age or sex of respondents among the three versions of WFun. The distribution of job types was similar between Japanese and American subjects, approximately 50% of whom were mainly desk workers. In contrast, the proportion of desk workers was higher among Chinese subjects (76%).

Table 1

Basic characteristics of study subjects

	Japanese	Chinese	American
Number of subjects	1000	400	300
Men (%)	50ᵃ	50ᵃ	50ᵃ
Age, years (mean (SD))	44 (13)ᵃ	39 (11)ᵃ	43 (13)ᵃ
Job type (%)
Mainly desk work	51	76	55
Mainly work involving interpersonal communication	23	15	22
Mainly physical work	25	9	23

ᵃ Equal numbers of subjects were assigned to each 10‐year age group (20s, 30s, 40s, 50s, and 60s) by sex.

Table 2 shows the estimated Rasch item difficulties for all groups combined (Japanese, Chinese, and American) and for each group separately. For Japanese respondents, item 6 (“I haven't been able to make rational decisions”) had the highest difficulty, at 0.40 logit. Because this item is the hardest to endorse, respondents who endorse it are likely to have the most severe work functioning impairment. In contrast, item 3 (“I have had trouble thinking clearly”) had the lowest difficulty, at −0.38 logit. Similar results were obtained for American respondents. For Chinese respondents, item 1 (“I haven't been able to behave socially”) had the highest difficulty, at 0.55 logit, whereas item 2 (“I haven't been able to maintain the quality of my work”) had the lowest, at −0.53 logit.

Table 2

Rasch measurements by different language versions

Item	Total (n = 1700)		Japanese (n = 1000)		Chinese (n = 400)		American (n = 300)
	Difficulty (logit)	SE	Difficulty (logit)	SE	Difficulty (logit)	SE	Difficulty (logit)	SE
q1	0.00	0.03	−0.29	0.05	0.55	0.08	0.22	0.07
q2	−0.24	0.03	−0.20	0.05	−0.53	0.07	−0.09	0.07
q3	−0.30	0.03	−0.38	0.05	−0.15	0.07	−0.31	0.07
q4	0.17	0.04	0.39	0.05	0.16	0.08	−0.29	0.07
q5	−0.19	0.03	−0.15	0.05	−0.47	0.07	−0.03	0.07
q6	0.37	0.04	0.40	0.05	0.03	0.07	0.65	0.08
q7	0.19	0.04	0.23	0.05	0.41	0.08	−0.13	0.07

Items:

q1: I haven't been able to behave socially.

q2: I haven't been able to maintain the quality of my work.

q3: I have had trouble thinking clearly.

q4: I have taken more rests during my work.

q5: I have felt that my work isn't going well.

q6: I haven't been able to make rational decisions.

q7: I haven't been proactive about my work.

Table 3 shows the DIF effect sizes for the three versions of WFun. Only the effect size for item 4 (“I have taken more rests during my work”) in the English version exceeded 0.50 logit in absolute value (−0.52), indicating the presence of some DIF for this item. The absolute effect sizes for all other items were below 0.50 logit, indicating that DIF was negligible.
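The 0.50‐logit decision rule applied to Table 3 can be made explicit with a short Python check over the reported effect sizes. This is only a sketch of the rule: it takes the published values as input and does not re‐estimate DIF from item responses, which the study did in WINSTEPS. The names `flag_dif` and `TABLE3_EFFECTS` are illustrative.

```python
from typing import Dict, List, Tuple

DIF_CUTOFF = 0.50  # logit; |effect size| above this is treated as DIF
GROUPS = ("Japanese", "Chinese", "American")

# DIF effect sizes (logit) transcribed from Table 3, per item: (Japanese, Chinese, American).
TABLE3_EFFECTS: Dict[str, Tuple[float, float, float]] = {
    "q1": (-0.30, 0.42, 0.25),
    "q2": (0.04, -0.16, 0.13),
    "q3": (-0.08, 0.19, -0.07),
    "q4": (0.23, -0.05, -0.52),
    "q5": (0.04, -0.16, 0.16),
    "q6": (0.05, -0.34, 0.41),
    "q7": (0.06, 0.12, -0.34),
}


def flag_dif(effects: Dict[str, Tuple[float, ...]], cutoff: float = DIF_CUTOFF) -> List[Tuple[str, str, float]]:
    """Return (item, group, effect size) for every |effect size| strictly above the cutoff."""
    flagged = []
    for item, sizes in effects.items():
        for group, size in zip(GROUPS, sizes):
            if abs(size) > cutoff:
                flagged.append((item, group, size))
    return flagged


print(flag_dif(TABLE3_EFFECTS))  # [('q4', 'American', -0.52)]
```

Lowering the cutoff to 0.40 logit, the lower end of the recommended range, would additionally flag q1 in the Chinese version (0.42) and q6 in the English version (0.41), which shows how sensitive the conclusion is to the chosen threshold.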

Table 3

Effect size of differential item functioning (DIF) by different language versions

Item	Japanese (n = 1000)		Chinese (n = 400)		American (n = 300)
	Effect size (logit)	SE	Effect size (logit)	SE	Effect size (logit)	SE
q1	−0.30	0.05	0.42	0.07	0.25	0.08
q2	0.04	0.05	−0.16	0.06	0.13	0.08
q3	−0.08	0.05	0.19	0.06	−0.07	0.08
q4	0.23	0.05	−0.05	0.07	−0.52	0.08
q5	0.04	0.05	−0.16	0.06	0.16	0.08
q6	0.05	0.05	−0.34	0.06	0.41	0.08
q7	0.06	0.05	0.12	0.07	−0.34	0.08

Effect size is the difference in logits between the Rasch difficulty estimate for each language group and that for all subjects combined. An absolute effect size greater than 0.50 logit indicates the presence of DIF.

Items:

q1: I haven't been able to behave socially.

q2: I haven't been able to maintain the quality of my work.

q3: I have had trouble thinking clearly.

q4: I have taken more rests during my work.

q5: I have felt that my work isn't going well.

q6: I haven't been able to make rational decisions.

q7: I haven't been proactive about my work.

Table 4 shows the Rasch measures corresponding to each raw score for each version of WFun, which were used to assess the potential presence of DTF. The ICC was 0.98 (95% confidence interval: 0.97‐0.99), indicating strong agreement among the three versions and, therefore, the absence of DTF.
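ICC(2,1), in the notation of Shrout and Fleiss, is the two‐way random‐effects, absolute‐agreement, single‐measure coefficient. A minimal pure‐Python sketch follows; applied to this study, the rows would be the raw scores in Table 4 and the columns the three language versions. The function name `icc_2_1` is illustrative, and the sketch omits the 95% confidence interval reported above.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings[i][j] is the value for target i (e.g. a raw score) under rater j
    (e.g. a language version).
    """
    n = len(ratings)      # number of targets (rows)
    k = len(ratings[0])   # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Systematic rater differences (ms_cols) count against agreement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Worked example of Shrout and Fleiss (1979): six targets rated by four judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

The absolute‐agreement form suits a DTF check because a language version that shifted every measure by a constant would lower the coefficient, whereas a consistency‐only ICC would ignore such a shift.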

Table 4

Rasch measurements according to raw scores by language

Total score	Japanese (n = 1000)		Chinese (n = 400)		American (n = 300)
	Measure (logit)	SE	Measure (logit)	SE	Measure (logit)	SE
7	−5.00	1.80	−5.83	1.85	−4.40	1.82
8	−3.73	1.00	−4.56	1.05	−3.20	1.00
9	−2.96	0.70	−3.77	0.78	−2.52	0.71
10	−2.47	0.60	−3.26	0.66	−2.11	0.58
11	−2.10	0.50	−2.86	0.60	−1.81	0.51
12	−1.79	0.50	−2.52	0.56	−1.57	0.47
13	−1.53	0.50	−2.22	0.54	−1.37	0.44
14	−1.29	0.40	−1.94	0.52	−1.19	0.41
15	−1.08	0.40	−1.67	0.51	−1.02	0.40
16	−0.87	0.40	−1.41	0.51	−0.87	0.39
17	−0.69	0.40	−1.15	0.50	−0.72	0.38
18	−0.50	0.40	−0.90	0.50	−0.58	0.38
19	−0.33	0.40	−0.65	0.50	−0.43	0.38
20	−0.16	0.40	−0.39	0.51	−0.29	0.38
21	0.01	0.40	−0.13	0.51	−0.15	0.38
22	0.18	0.40	0.13	0.52	0.00	0.39
23	0.35	0.40	0.40	0.53	0.16	0.40
24	0.52	0.40	0.68	0.53	0.32	0.41
25	0.70	0.40	0.97	0.54	0.49	0.42
26	0.89	0.40	1.27	0.55	0.68	0.44
27	1.09	0.40	1.59	0.57	0.89	0.46
28	1.30	0.40	1.91	0.58	1.11	0.49
29	1.53	0.40	2.26	0.59	1.36	0.51
30	1.79	0.50	2.62	0.62	1.64	0.55
31	2.09	0.50	3.02	0.65	1.97	0.60
32	2.45	0.60	3.47	0.70	2.36	0.66
33	2.93	0.70	4.03	0.81	2.88	0.78
34	3.70	1.00	4.87	1.07	3.68	1.05
35	4.96	1.80	6.17	1.86	4.96	1.85

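Table 4 converts each raw total into a Rasch measure. Under the rating scale model, the maximum‐likelihood measure for raw score R (given the item parameters) is the value of theta at which the model‐expected total score equals R, and its standard error is one over the square root of the test information. The Python sketch below solves this equation with damped Newton-Raphson steps. It is an illustration under stated assumptions, not the WINSTEPS algorithm: categories are recoded from 1–5 to 0–4 (so WFun totals 7–35 become 0–28), extreme scores, which have no finite estimate, are pulled inward by an assumed 0.3 score points, and because the source does not report the rating scale thresholds, the usage example uses hypothetical thresholds and will not reproduce Table 4.

```python
import math
from typing import List, Sequence, Tuple


def _category_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    # Andrich rating scale model probabilities for categories 0..len(taus).
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def _expected_and_information(theta: float, deltas: Sequence[float],
                              taus: Sequence[float]) -> Tuple[float, float]:
    # Model-expected total score and test information (sum of item variances) at theta.
    expected = information = 0.0
    for delta in deltas:
        probs = _category_probs(theta, delta, taus)
        mean = sum(x * p for x, p in enumerate(probs))
        expected += mean
        information += sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return expected, information


def measure_for_raw_score(raw: int, deltas: Sequence[float], taus: Sequence[float],
                          extreme_adj: float = 0.3, tol: float = 1e-8) -> Tuple[float, float]:
    """Return (measure, SE) in logits for a 0-based raw total score."""
    max_raw = len(deltas) * len(taus)
    # Extreme scores (0 or max_raw) have no finite estimate, so pull them inward (assumed 0.3).
    target = min(max(raw, extreme_adj), max_raw - extreme_adj)
    theta = 0.0
    for _ in range(200):
        expected, information = _expected_and_information(theta, deltas, taus)
        step = (target - expected) / information  # Newton-Raphson step
        step = max(-1.0, min(1.0, step))          # damp large steps for stability
        theta += step
        if abs(step) < tol:
            break
    _, information = _expected_and_information(theta, deltas, taus)
    return theta, 1.0 / math.sqrt(information)


# Japanese item difficulties from Table 2 with HYPOTHETICAL thresholds.
japanese_deltas = [-0.29, -0.20, -0.38, 0.39, -0.15, 0.40, 0.23]
hypothetical_taus = [-1.5, -0.5, 0.5, 1.5]
for total_score in (7, 14, 21, 28, 35):
    measure, se = measure_for_raw_score(total_score - 7, japanese_deltas, hypothetical_taus)
    print(total_score, round(measure, 2), round(se, 2))
```

As in Table 4, standard errors computed this way are smallest for mid‐range totals and largest at the extremes, where the test carries little information.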

DISCUSSION

This study examined the cross‐cultural validity of the Japanese, Chinese, and English versions of WFun. DIF was identified for only one item in the English version; it was negligible for the remaining items in the English version and for all items in the Chinese version. Furthermore, there was no DTF among the three versions.

Two processes are necessary when adopting patient‐reported outcome measures in multiple languages: cross‐cultural adaptation and cross‐cultural validation.33 Cross‐cultural adaptation is a process that ensures equivalence in meaning, which comprises several components, including conceptual equivalence and item equivalence.33, 34 In the present study, this process consisted of the following steps: initial translation, synthesis/reconciliation of the translations, back translation, expert committee review, and pretesting.23, 35 Cross‐cultural adaptation is followed by cross‐cultural validation, which places particular scrutiny on measurement invariance: respondents from the target populations with comparable disease severity should obtain the same scores on the original and cross‐culturally adapted versions.19 Items that satisfy this condition do not exhibit DIF, and the Rasch model is a well‐known method for detecting DIF.

DIF was negligible for all seven items in the Chinese version and for six of the seven items in the English version of WFun, suggesting that there are few linguistic or cultural biases in the items for Chinese and American respondents. We identified DIF only for the item “I have taken more rests during my work” in the English version, indicating that American respondents were more likely to endorse this item than Japanese and Chinese respondents. Although the reason for this is unclear, American respondents may be more likely to take breaks when in poor health because they have greater job control than workers in Japan and China.
Likewise, the DTF analysis, which examines discrepancies at the level of the whole test, showed strong agreement among the three versions, indicating no DTF. The negligible DTF reflects the absence of DIF in the Chinese version and the presence of DIF in only one item in the English version. These findings indicate that the Japanese, Chinese, and English versions of WFun are comparable.

The number of foreign workers in the Japanese labor market is increasing as a result of Japan's declining birth rate and aging population. Moreover, the number of foreign workers employed overseas is increasing owing to the globalization of economic activities. Against this backdrop, the health management of foreign as well as Japanese workers is becoming increasingly important for many Japanese companies. However, because health care and medical delivery systems, health awareness, and the scope of safety considerations differ between Japan and other countries, health management based on disease diagnoses and medical examination results is ineffective. We propose that management based on the degree of difficulty in performing work, rather than on disease names or examination results, may be useful as a screening tool for health management in the global labor market.

Several limitations of this study warrant mention. First, the subjects were not representative of the populations of the examined countries because sampling was conducted using internet surveys. Because DIF verification depends on the sample, the results of this study may not necessarily reflect those of a representative group. However, sampling in this study was non‐systematic or haphazard and, therefore, does not represent any specific group. Second, the present study was not limited to foreigners working in Japanese companies, who may experience different socioeconomic conditions from those who work in non‐Japanese companies.
If cross‐cultural differences exist, foreigners who work in Japanese companies and those who work in non‐Japanese companies may differ in how they interpret or understand their respective language versions. However, there is no particular reason to assume this. Furthermore, the Rasch model assumes that the item measures it produces do not depend on the sample, a property called “specific objectivity”.26, 36, 37 Indeed, we examined this property during the development of WFun and found that estimates of item difficulty were consistent across subgroups defined by sex, age, income, job type, and company.15 Nonetheless, to further verify comparability with Japanese workers, future studies should survey foreigners who work for Japanese companies. Third, validation of the English version in an American population does not guarantee its validity in other English‐speaking populations.

In conclusion, there was no DIF in the Chinese or English version of WFun, except for one item in the English version, and there was no DTF in either version. This study suggests that results from the English, Chinese, and Japanese versions of WFun have good comparability.

DISCLOSURE

Approval of the research protocol: N/A. Informed consent: N/A. Registry and the registration no. of the study/trial: N/A. Animal studies: N/A. Conflicts of interest: The authors have no conflict of interest.

AUTHOR CONTRIBUTIONS

YF collected and analyzed the data and led the writing. NL, OC, MO, TI, and TK contributed to the writing.

24 in total

1. Guidelines for the process of cross-cultural adaptation of self-report measures. (Review)

Authors: D E Beaton; C Bombardier; F Guillemin; M B Ferraz
Journal: Spine (Phila Pa 1976) Date: 2000-12-15 Impact factor: 3.468

2. Stanford presenteeism scale: health status and employee productivity.

Authors: Cheryl Koopman; Kenneth R Pelletier; James F Murray; Claire E Sharda; Marc L Berger; Robin S Turpin; Paul Hackleman; Pamela Gibson; Danielle M Holmes; Talor Bendel
Journal: J Occup Environ Med Date: 2002-01 Impact factor: 2.162

3. Validity of a Work Productivity and Activity Impairment questionnaire for patients with symptoms of gastro-esophageal reflux disease (WPAI-GERD)--results from a cross-sectional study.

Authors: Peter Wahlqvist; Jonas Carlsson; Nils-Olov Stålhammar; Ingela Wiklund
Journal: Value Health Date: 2002 Mar-Apr Impact factor: 5.725

4. The World Health Organization Health and Work Performance Questionnaire (HPQ).

Authors: Ronald C Kessler; Catherine Barber; Arne Beck; Patricia Berglund; Paul D Cleary; David McKenas; Nico Pronk; Gregory Simon; Paul Stang; T Bedirhan Ustun; Phillip Wang
Journal: J Occup Environ Med Date: 2003-02 Impact factor: 2.162

5. Presenteeism: critical issues. (Review)

Authors: Ambyr Brooks; Susan E Hagen; Sudhakar Sathyanarayanan; Alyssa B Schultz; Dee W Edington
Journal: J Occup Environ Med Date: 2010-11 Impact factor: 2.162

6. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes.

Authors: Lidwine B Mokkink; Caroline B Terwee; Donald L Patrick; Jordi Alonso; Paul W Stratford; Dirk L Knol; Lex M Bouter; Henrica C W de Vet
Journal: J Clin Epidemiol Date: 2010-07 Impact factor: 6.437

7. A review of guidelines for cross-cultural adaptation of questionnaires could not bring out a consensus. (Review)

Authors: Jonathan Epstein; Ruth Miyuki Santo; Francis Guillemin
Journal: J Clin Epidemiol Date: 2014-12-17 Impact factor: 6.437

8. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. (Review)

Authors: F Guillemin; C Bombardier; D Beaton
Journal: J Clin Epidemiol Date: 1993-12 Impact factor: 6.437

9. The Work Limitations Questionnaire.

Authors: D Lerner; B C Amick; W H Rogers; S Malspeis; K Bungay; D Cynn
Journal: Med Care Date: 2001-01 Impact factor: 2.983

10. Validity and responsiveness of the work functioning impairment scale (WFun) in workers with pain due to musculoskeletal disorders.

Authors: Misako Makishima; Yoshihisa Fujino; Tatsuhiko Kubo; Hiroyuki Izumi; Masamichi Uehara; Ichiro Oyama; Shinya Matsuda
Journal: J Occup Health Date: 2017-12-28 Impact factor: 2.708

2 in total

1. Cross-cultural validation of the work functioning impairment scale (WFun) among Japanese, English, and Chinese versions using Rasch analysis.

Authors: Yoshihisa Fujino; Ning Liu; Odgerel Chimed-Ochir; Makoto Okawara; Tomohiro Ishimaru; Tatsuhiko Kubo
Journal: J Occup Health Date: 2019-06-28 Impact factor: 2.708

2. Construct validity and test-retest reliability of the World Mental Health Japan version of the World Health Organization Health and Work Performance Questionnaire Short Version: a preliminary study.

Authors: Norito Kawakami; Akiomi Inoue; Masao Tsuchiya; Kazuhiro Watanabe; Kotaro Imamura; Mako Iida; Daisuke Nishi
Journal: Ind Health Date: 2020-03-14 Impact factor: 2.179
