
Rasch validation of the SF-36 for assessing the health status of Korean older adults.

Sae-Hyung Kim1, Wi-Young So2.   

Abstract

[Purpose] To verify, using Rasch analysis, the applicability of the 36-Item Short Form Health Survey (SF-36) to elderly Koreans, as this instrument would be useful for determining elderly individuals' overall performance and providing them with health information.
[Subjects and Methods] The SF-36 was administered to a sample of 510 individuals aged over 60 living in the Seoul and Gyeonggi areas of South Korea. When testing for goodness-of-fit, we considered items with infit or outfit indexes of over 1.30 or less than 0.70 to be incongruent. SF-36 factors that contained three or more items, namely physical functioning, role limitations (physical and emotional), mental health, vitality, and general health, were analyzed. Each factor was examined through step calibration of the response categories in the probability curve.
[Results] The response categories were found to be appropriate because the adjustment values of each factor increased. We found five items in physical functioning, two items in role limitation-emotion, one item in mental health, and one item in general health to be incongruent; all items in the role limitation-physical and vitality factors were congruent.
[Conclusion] We conclude that the SF-36 could be revised to more accurately measure the health status of elderly Koreans.


Keywords:  Elderly; Health survey

Year:  2015        PMID: 25931689      PMCID: PMC4395673          DOI: 10.1589/jpts.27.601

Source DB:  PubMed          Journal:  J Phys Ther Sci        ISSN: 0915-5287


INTRODUCTION

By 2017, Korea will be an aging society, with over 14.0% of its population being over 65 years old; by 2026, it will be a super-aged society, with the elderly accounting for over 20.0% of the population1). For this reason, gerontology research is becoming increasingly important, especially research centered on the measurement and evaluation of elderly health status2, 3). In order to measure the health status of the elderly accurately, the concept of health should be clearly defined in terms of present societal demands. Health, which a century ago simply meant “survival” or being “disease free”4), nowadays contains aspects related to quality of life in addition to the absence of disease5). Due to this change in the definition of health, the use of a scale developed according to the former definition would clearly be problematic in the present aging society. Unfortunately, numerous existing measures of the health status of adults or the elderly adhere to the former definition. One of these, the 36-Item Short Form Health Survey (SF-36), has come to serve as a representative general health status and treatment efficacy assessment scale since its development6,7,8). The SF-36 has been used in health science, medicine, and physical education, among other fields. Furthermore, it has been translated into several languages, including Korean, and it is widely used as a measure of the health status of Korean elderly adults. Its validity in the adult and elderly population of Korea has also been confirmed. For example, Nam and Lee4), who studied adults, and Han and Lee9), who studied the elderly, demonstrated the construct validity of the SF-36 through exploratory and confirmatory factor analyses. However, factor analysis by itself does not accurately confirm a scale’s validity, especially when it is applied to a group other than the original10). Furthermore, it cannot verify the structure of the sub-components and the difficulty level of the items11).
The above problems can be circumvented via the Rasch model12). This model is based on item response theory, and it evaluates the appropriateness of the number of items in a scale and calculates the fit of the items to the data and the difficulty of these items. The Rasch model is also able to evaluate the suitability of the response method (e.g., five- or four-point scales) for subjects13). Furthermore, it can detect overlapping and incongruent items, which can be confusing to subjects14,15,16). For these reasons, it is feasible and desirable to employ the Rasch methodology to confirm the validity of the SF-36 for elderly Korean adults.

SUBJECTS AND METHODS

Setting and samples: This study surveyed elderly people over the age of 60 by distributing the SF-36 questionnaire at senior welfare centers in the Seoul and Gyeonggi areas of South Korea. An investigator explained and completed each item for those elderly persons who were unable to complete the self-administered questionnaire. Five hundred and ten responses were collected after excluding one with inconsistent answers. The distribution of subjects according to gender and age is shown in Table 1.
Table 1.

Distribution of subjects according to age and gender

| Age | Male | Female | Total |
|---|---|---|---|
| 60–64 | 12 (2.4%) | 16 (3.1%) | 28 (5.5%) |
| 65–69 | 53 (10.4%) | 109 (21.4%) | 162 (31.8%) |
| 70–74 | 86 (16.9%) | 103 (20.2%) | 189 (37.1%) |
| 75–79 | 31 (6.1%) | 43 (8.4%) | 74 (14.5%) |
| Over 80 | 22 (4.3%) | 35 (6.9%) | 57 (11.2%) |
| Total | 204 (40.0%) | 306 (60.0%) | 510 (100%) |

This study was approved by the Public Institutional Review Board of the Korea Ministry of Health and Welfare (P01-201303-SB-07-00) and conforms to the principles of the Declaration of Helsinki. All study participants provided their informed consent before completing the questionnaire. Furthermore, we received permission to use the SF-36 from the company that owns it (OptumInsight Life Sciences, Inc., USA); the approval number is C009024.
Measurements/Instruments: The SF-36 scale was developed by Ware and Sherbourne17), and was adapted by Koh et al.18) to measure the quality of life of elderly Koreans. It consists of 36 items categorized as follows: 10 items of physical functioning, four items of role limitation–physical, two items of bodily pain, five items of general health, four items of vitality, two items of social functioning, three items of role limitation–emotion, five items of mental health, and one item of health changes. The items concerning physical functioning are answered on a three-point scale, while those concerning bodily pain are answered on a five- or six-point scale; the rest of the items are answered on a five-point scale. In general, higher scores indicate better health status; however, three items of general health, one item of social functioning, two items of vitality, two items of mental health, the two items of bodily pain, and the item of health changes are reverse scored. The sequence of the items in the present study was randomized to increase measurement validity. The SF-36 subscales and the number of items are shown in Table 2.
Table 2.

SF-36 dimensions and the number and sequence of items

| Dimension | Number of items | Sequence of items |
|---|---|---|
| Physical functioning | 10 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
| Social functioning | 2 | 20, 32 |
| Role limitation–Physical | 4 | 13, 14, 15, 16 |
| Role limitation–Emotional | 3 | 17, 18, 19 |
| Mental health | 5 | 24, 25, 26, 28, 30 |
| Vitality | 4 | 23, 27, 29, 31 |
| Bodily pain | 2 | 21, 22 |
| General health | 5 | 1, 33, 34, 35, 36 |
| Health changes | 1 | 2 |
| Total | 36 | |

SF-36, 36-item Short Form Survey

We used Rasch analysis to examine the validity of the SF-36. First, we conducted a principal component analysis to determine the unidimensionality of the items, as this is a fundamental condition of the Rasch model. This was done by calculating whether the items of each factor share equal variance19) using SPSS 20.0 (SPSS, Chicago, IL, USA). In principal component analysis, the scale is considered unidimensional if the eigenvalue of the first component is greater than 1.00 or if the largest principal component accounts for over 20% of the total variance20). Goodness-of-fit was then examined using Winsteps 3.65.0 (Rasch Measurement Software, USA). Following the criterion in reference 21), an item was regarded as inappropriate if its goodness-of-fit (infit or outfit) index was more than 1.30 or less than 0.70. Furthermore, the suitability of the rating scales of the items was evaluated through step calibration in the probability curve. Step calibration examines the appropriateness of the rating scales of the items by dividing subjects’ numerical response scores by the number of responses in the scale (three-point, five-point, or six-point scale). The response scale appropriateness is expressed as thresholds, which are calculated from each factor’s item characteristic curve22). The response scale is considered appropriate if the step calibration of the scale applied to each factor (the adjustment value) increases gradually toward the higher scale scores23, 24). Social functioning, bodily pain, and health changes, which consist of fewer than three items each, were excluded from this analysis.
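As an illustration of what the infit and outfit indexes measure, the following sketch computes both mean squares for a single dichotomous item. It is a simplified, hypothetical example: the study itself used Winsteps on polytomous items, and the function names (`rasch_prob`, `item_fit`) and inputs are assumptions introduced here. Outfit is the unweighted mean of the squared standardized residuals, whereas infit weights the squared residuals by the model variance, so both are expected to be close to 1.0 when the data fit the model.

```python
import math


def rasch_prob(theta, delta):
    """Probability of a positive response under the dichotomous Rasch model.

    theta: person measure in logits; delta: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def item_fit(responses, thetas, delta):
    """Return (infit, outfit) mean squares for one item.

    responses: observed 0/1 answers, one per person.
    thetas: person measures in logits, same order as responses.
    """
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, delta)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: plain average of squared standardized residuals (outlier-sensitive).
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: information-weighted version (sensitive to on-target misfit).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


if __name__ == "__main__":
    # Hypothetical data: one able person (theta = 3) unexpectedly fails the item.
    infit, outfit = item_fit([0, 1, 1], [3.0, 0.0, 0.0], delta=0.0)
    print(f"infit={infit:.2f}, outfit={outfit:.2f}")
```

In this toy case the single unexpected response from a person far from the item's difficulty inflates outfit more than infit, which is why the two indexes are usually reported side by side.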

RESULTS

Unidimensionality verification: The results of the principal component analysis used to test unidimensionality are displayed in Table 3. For all six factors, the first component accounted for over 20% of the total variance. Specifically, the percentages of variance explained by the first component in physical functioning, role limitation–physical, role limitation–emotion, mental health, vitality, and general health were 64.13%, 89.51%, 91.31%, 62.43%, 65.34%, and 61.46%, respectively; thus, the criteria of unidimensionality were satisfied.
Table 3.

Results of principal component analysis of SF-36

| Factor | First component eigenvalue | Second component eigenvalue | Variance of first component eigenvalue (%) |
|---|---|---|---|
| Physical functioning | 6.41 | 1.18 | 64.13 |
| Role limitation–Physical | 3.58 | 0.19 | 89.51 |
| Role limitation–Emotion | 2.73 | 0.18 | 91.31 |
| Mental health | 3.12 | 0.72 | 62.43 |
| Vitality | 2.61 | 0.73 | 65.34 |
| General health | 3.07 | 0.85 | 61.46 |

SF-36, 36-item Short Form Survey
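The relationship between the eigenvalues and the variance percentages in Table 3 can be checked with a short sketch. It assumes the principal component analysis was run on the correlation matrix, so that the total variance of a factor equals its number of items; the paper does not state this explicitly. The function names are hypothetical, and the item counts come from Table 2.

```python
# (first-component eigenvalue, number of items, reported variance %) from Tables 2 and 3
TABLE_3 = {
    "Physical functioning": (6.41, 10, 64.13),
    "Role limitation-physical": (3.58, 4, 89.51),
    "Role limitation-emotion": (2.73, 3, 91.31),
    "Mental health": (3.12, 5, 62.43),
    "Vitality": (2.61, 4, 65.34),
    "General health": (3.07, 5, 61.46),
}


def explained_variance_pct(eigenvalue, n_items):
    """Percentage of total variance explained by a component.

    Assumes a correlation-matrix PCA, where total variance equals n_items.
    """
    return 100.0 * eigenvalue / n_items


def meets_criterion(eigenvalue, n_items, min_eigenvalue=1.00, min_pct=20.0):
    """Unidimensionality rule quoted in the Methods: eigenvalue > 1.00 or > 20% variance."""
    return eigenvalue > min_eigenvalue or explained_variance_pct(eigenvalue, n_items) > min_pct


if __name__ == "__main__":
    for factor, (eigenvalue, n_items, reported) in TABLE_3.items():
        computed = explained_variance_pct(eigenvalue, n_items)
        print(f"{factor}: computed {computed:.2f}%, reported {reported:.2f}%, "
              f"criterion met: {meets_criterion(eigenvalue, n_items)}")
```

Small differences between the computed and reported percentages are to be expected, because the eigenvalues in Table 3 are rounded to two decimals.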

It is possible to examine whether the number of item categories is appropriate by using the category probability curve (Fig. 1) and step calibration (Table 4), both of which are integral to Rasch analysis. The x-axis of the category probability curve presents the personal latent characteristic of health status and the logit difference of item difficulty, while the y-axis presents the category probability of a specific response category. Step calibration reports the x-axis value of each point at which adjacent category curves cross. A response category is considered appropriate if the step calibration increases gradually. For physical functioning, which was answered on a three-point scale, the response rate of category 3 was the highest, at 49%. From categories 1 to 3, the observed average value increased gradually from −2.74 to 3.63. Furthermore, the step calibrations, that is, the values of the two points at which the three category probability curves crossed, increased gradually: the value between categories 1 and 2 was −1.42 and that between categories 2 and 3 was 1.42.
Fig. 1.

Response category probability curve of physical functioning

Table 4.

Step calibration of the response category of physical functioning

| Category | Response rate (%) | Observed average value | Step calibration |
|---|---|---|---|
| 1 | 21 | −2.74 | |
| 2 | 30 | 0.32 | −1.42 |
| 3 | 49 | 3.63 | 1.42 |

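To make the link between step calibrations and curve crossings concrete, the sketch below computes category probabilities for the three-point physical functioning scale. It assumes the Andrich rating scale formulation of the Rasch model, which the paper does not name explicitly, and `category_probs` is a hypothetical helper. Under that model, the curves for two adjacent categories cross exactly where the person-minus-item measure equals the step calibration between them.

```python
import math


def category_probs(measure, thresholds):
    """Category probabilities under the (assumed) Andrich rating scale model.

    measure: person measure minus item difficulty, in logits.
    thresholds: step calibrations tau_1..tau_m between adjacent categories.
    Returns a list of m + 1 probabilities for categories 1..m+1.
    """
    # Each category's log-numerator is the running sum of (measure - tau_j);
    # the lowest category has an empty sum, i.e. 0.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (measure - tau))
    top = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    steps = [-1.42, 1.42]  # Table 4, physical functioning
    for measure in (-1.42, 0.0, 1.42):
        probs = category_probs(measure, steps)
        print(measure, [round(p, 3) for p in probs])
```

At a measure of −1.42 the probabilities of categories 1 and 2 are equal, and at 1.42 those of categories 2 and 3 are equal, which matches the reading of the step calibrations as crossing points.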
Figure 2 and Table 5 present the category probability curve and step calibration of the four items of the role limitation–physical factor, which were answered using a five-point scale. As observed from the figure, the crossing points increased gradually, as follows: between categories 1 and 2, the crossing point was −5.11; between 2 and 3, it was −1.02; between 3 and 4, it was 1.38; and between 4 and 5, it was 4.75.
Figure 3 and Table 6 show the category probability curve and step calibration of the three items of the role limitation–emotion factor, which were also answered using a five-point scale. The crossing points increased gradually, as follows: between categories 1 and 2, the crossing point was −6.15; between categories 2 and 3, it was −2.19; between categories 3 and 4, it was 2.16; and between categories 4 and 5, it was 6.17.
Figure 4 and Table 7 show the category probability curve and step calibration of the five items of the mental health factor, which were answered on a five-point scale. As before, the crossing points increased gradually, as follows: between categories 1 and 2, the crossing point was −1.54; between categories 2 and 3, it was −0.65; between categories 3 and 4, it was 0.75; and between categories 4 and 5, it was 1.44.
Figure 5 and Table 8 present the category probability curve and step calibration of the four items of the vitality factor, which were answered on a five-point scale. The crossing points increased gradually, as follows: between categories 1 and 2, the crossing point was −1.52; between categories 2 and 3, it was −0.68; between categories 3 and 4, it was 0.74; and between categories 4 and 5, it was 1.46.
Figure 6 and Table 9 display the category probability curve and step calibration of the five items of the general health factor, which were answered on a five-point scale. The crossing points increased gradually, as follows: between categories 1 and 2, the crossing point was −2.17; between categories 2 and 3, it was −0.60; between categories 3 and 4, it was 0.59; and between categories 4 and 5, it was 2.18.
Fig. 2.

Response category probability curve of role limitation–physical

Table 5.

Step calibration of the response category of role limitation–physical

| Category | Response rate (%) | Observed average value | Step calibration |
|---|---|---|---|
| 1 | 9 | −5.26 | |
| 2 | 22 | −2.74 | −5.11 |
| 3 | 22 | 0.40 | −1.02 |
| 4 | 31 | 3.05 | 1.38 |
| 5 | 16 | 4.94 | 4.75 |

Fig. 3.

Response category probability curve of role limitation–emotion

Table 6.

Step calibration of the response category of role limitation–emotion

| Category | Response rate (%) | Observed average value | Step calibration |
|---|---|---|---|
| 1 | 5 | −5.99 | |
| 2 | 17 | −3.33 | −6.15 |
| 3 | 32 | 0.28 | −2.19 |
| 4 | 34 | 3.97 | 2.16 |
| 5 | 12 | 5.92 | 6.17 |

Fig. 4.

Response category probability curve of mental health

Table 7.

Step calibration of the response category of mental health

| Category | Response rate (%) | Observed average value | Step calibration |
|---|---|---|---|
| 1 | 11 | −1.88 | |
| 2 | 14 | −0.77 | −1.54 |
| 3 | 21 | 0.21 | −0.65 |
| 4 | 21 | 1.25 | 0.75 |
| 5 | 32 | 2.56 | 1.44 |

Fig. 5.

Response category probability curve of vitality

Table 8.

Step calibration of the response category of vitality

| Category | Response rate (%) | Observed average value | Step calibration |
|---|---|---|---|
| 1 | 17 | −1.99 | |
| 2 | 19 | −0.95 | −1.52 |
| 3 | 24 | 0.11 | −0.68 |
| 4 | 19 | 1.01 | 0.74 |
| 5 | 21 | 2.05 | 1.46 |

Fig. 6.

Response category probability curve of general health

Table 9.

Step calibration of the response category of general health

| Category | Response rate (%) | Observed average value | Step calibration |
|---|---|---|---|
| 1 | 16 | −2.27 | |
| 2 | 25 | −1.18 | −2.17 |
| 3 | 26 | −0.11 | −0.60 |
| 4 | 23 | 1.05 | 0.59 |
| 5 | 10 | 1.81 | 2.18 |

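The appropriateness rule applied to Tables 4 through 9 (step calibrations must increase from one category boundary to the next) can be expressed as a simple ordering check. The sketch below is illustrative only; the dictionary holds the step calibrations reported in the tables, and `is_ordered` is a hypothetical helper rather than part of Winsteps.

```python
# Step calibrations reported in Tables 4-9, listed from the lowest to the highest boundary
STEP_CALIBRATIONS = {
    "Physical functioning": [-1.42, 1.42],
    "Role limitation-physical": [-5.11, -1.02, 1.38, 4.75],
    "Role limitation-emotion": [-6.15, -2.19, 2.16, 6.17],
    "Mental health": [-1.54, -0.65, 0.75, 1.44],
    "Vitality": [-1.52, -0.68, 0.74, 1.46],
    "General health": [-2.17, -0.60, 0.59, 2.18],
}


def is_ordered(steps):
    """True if every step calibration is strictly larger than the previous one."""
    return all(lower < upper for lower, upper in zip(steps, steps[1:]))


if __name__ == "__main__":
    for factor, steps in STEP_CALIBRATIONS.items():
        status = "ordered" if is_ordered(steps) else "disordered"
        print(f"{factor}: {status}")
```

A disordered sequence, for example `[0.5, -0.2, 1.0]`, would suggest that some response category is never the most probable one and that categories might need to be merged.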
Test of goodness-of-fit: Next, we examined the item difficulty and the goodness-of-fit of the factors (Table 10). In Rasch analysis, the closer the fit index is to 1.0, the more appropriate the item.
In physical functioning, the infit and outfit indexes of the first item (engaging in vigorous activities such as running, lifting heavy objects, or participating in strenuous sports) were 0.93 and 1.96, respectively (the latter being over the threshold); those of the fourth item (climbing several flights of stairs) were 0.76 and 0.66 (the latter being below the threshold); those of the sixth item (being able to bend, kneel, or stoop) were 1.43 and 1.67 (both of which were over the threshold); those of the ninth item (being able to walk one block) were 0.91 and 0.69 (the latter being below the threshold); and those of the 10th item (being able to bathe or dress) were 1.20 and 0.64 (the latter being below the threshold).
For role limitation–emotion, the infit and outfit indexes of the second item (accomplished less than would have liked due to feeling depressed or anxious) were 0.69 and 0.65, respectively (both of which were below the threshold), and those of the third item (did not do work or other activities as carefully as usual due to feeling depressed or anxious) were 1.36 and 1.32 (both of which were over the threshold).
For mental health, the infit and outfit indexes of the first item (“Are you a very nervous person?”) were 1.44 and 1.58, respectively (both of which were over the threshold). For general health, the infit and outfit indexes of the fourth item (“I expect my health to get worse”) were 1.39 and 1.46, respectively (both of which were over the threshold).
All remaining items, including the four items of role limitation–physical and the four items of vitality, were appropriate, with goodness-of-fit (infit and outfit) indexes between 0.70 and 1.30.
Table 10.

Item difficulty and goodness-of-fit index

| Factor | No. | Item | Calibration logit | SE logit | Infit MnSq (a) | Outfit MnSq |
|---|---|---|---|---|---|---|
| Physical functioning | 1 | Engaging in vigorous activities such as running, lifting heavy objects, or participating in strenuous sports | 3.68 | 0.11 | 0.93 | 1.96 |
| Physical functioning | 2 | Engaging in moderate activities such as moving a table, pushing a vacuum cleaner, bowling, or playing golf | 1.59 | 0.10 | 1.06 | 1.22 |
| Physical functioning | 3 | Able to lift or carry groceries | −0.09 | 0.11 | 0.97 | 1.03 |
| Physical functioning | 4 | Able to climb several flights of stairs | 1.22 | 0.10 | 0.76 | 0.66 |
| Physical functioning | 5 | Able to climb one flight of stairs | −0.96 | 0.11 | 1.02 | 0.98 |
| Physical functioning | 6 | Able to bend, kneel, or stoop | 0.88 | 0.10 | 1.43 | 1.67 |
| Physical functioning | 7 | Able to walk more than a mile | −0.07 | 0.11 | 0.88 | 0.90 |
| Physical functioning | 8 | Able to walk several blocks | −1.12 | 0.12 | 0.81 | 0.71 |
| Physical functioning | 9 | Able to walk one block | −1.93 | 0.13 | 0.91 | 0.69 |
| Physical functioning | 10 | Able to bathe or dress yourself | −3.19 | 0.16 | 1.20 | 0.64 |
| Role limitation–physical | 1 | You have to cut down on the amount of time you spend on work or other activities | −0.60 | 0.10 | 1.13 | 1.14 |
| Role limitation–physical | 2 | You have accomplished less than you would have liked | −0.09 | 0.10 | 1.01 | 0.98 |
| Role limitation–physical | 3 | You are limited in the kind of work or other activities you can perform | 0.53 | 0.10 | 0.92 | 0.90 |
| Role limitation–physical | 4 | You have difficulty performing work or other activities (for example, it takes extra effort) | 0.16 | 0.10 | 0.88 | 0.85 |
| Role limitation–emotion | 1 | You have to cut down on the amount of time spent on work or other activities due to feeling depressed or anxious | −0.27 | 0.13 | 0.88 | 0.86 |
| Role limitation–emotion | 2 | You have accomplished less than you would have liked due to feeling depressed or anxious | 0.04 | 0.13 | 0.69 | 0.65 |
| Role limitation–emotion | 3 | You did not do work or other activities as carefully as usual due to feeling depressed or anxious | 0.23 | 0.13 | 1.36 | 1.32 |
| Mental health | 1 | Are you a very nervous person? | −1.24 | 0.07 | 1.44 | 1.58 |
| Mental health | 2 | Do you feel so down in the dumps that nothing could cheer you up? | −0.73 | 0.06 | 0.78 | 0.70 |
| Mental health | 3 | Do you feel calm and peaceful? | 1.28 | 0.06 | 1.05 | 1.00 |
| Mental health | 4 | Do you feel downhearted and blue? | −0.55 | 0.06 | 0.88 | 0.85 |
| Mental health | 5 | Are you a happy person? | 1.24 | 0.06 | 0.97 | 0.92 |
| Vitality | 1 | Do you feel full of pep? | 0.63 | 0.06 | 0.84 | 0.84 |
| Vitality | 2 | Do you have a lot of energy? | 0.81 | 0.06 | 1.04 | 1.01 |
| Vitality | 3 | Do you feel worn out? | −1.36 | 0.06 | 1.02 | 1.07 |
| Vitality | 4 | Do you feel tired? | −0.08 | 0.06 | 1.10 | 1.14 |
| General health | 1 | In general, would you say your health is | 0.99 | 0.06 | 0.78 | 0.74 |
| General health | 2 | I seem to get sick a little easier than other people do | −0.94 | 0.06 | 1.01 | 0.99 |
| General health | 3 | I am as healthy as anybody I know | −0.15 | 0.06 | 0.95 | 0.90 |
| General health | 4 | I expect my health to get worse | 0.22 | 0.06 | 1.39 | 1.46 |
| General health | 5 | My health is excellent | −0.12 | 0.06 | 0.87 | 0.82 |

Item names are descriptions of the content and do not reflect the exact Korean wording of the question.
(a) MnSq: mean square residuals
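The misfit rule from the Methods (an infit or outfit index above 1.30 or below 0.70) can be applied to the values in Table 10 with a few lines of code. The sketch below is only an illustration of that rule; `FIT`, `out_of_range`, and `misfit_items` are hypothetical names, and the (infit, outfit) pairs are copied from Table 10 in item order.

```python
# (infit MnSq, outfit MnSq) per item, in the item order of Table 10
FIT = {
    "Physical functioning": [
        (0.93, 1.96), (1.06, 1.22), (0.97, 1.03), (0.76, 0.66), (1.02, 0.98),
        (1.43, 1.67), (0.88, 0.90), (0.81, 0.71), (0.91, 0.69), (1.20, 0.64),
    ],
    "Role limitation-physical": [(1.13, 1.14), (1.01, 0.98), (0.92, 0.90), (0.88, 0.85)],
    "Role limitation-emotion": [(0.88, 0.86), (0.69, 0.65), (1.36, 1.32)],
    "Mental health": [(1.44, 1.58), (0.78, 0.70), (1.05, 1.00), (0.88, 0.85), (0.97, 0.92)],
    "Vitality": [(0.84, 0.84), (1.04, 1.01), (1.02, 1.07), (1.10, 1.14)],
    "General health": [(0.78, 0.74), (1.01, 0.99), (0.95, 0.90), (1.39, 1.46), (0.87, 0.82)],
}


def out_of_range(value, low=0.70, high=1.30):
    """True if a mean-square index is below 0.70 or above 1.30."""
    return value < low or value > high


def misfit_items(fit_table):
    """Return (factor, item number) pairs whose infit or outfit is out of range."""
    flagged = []
    for factor, rows in fit_table.items():
        for number, (infit, outfit) in enumerate(rows, start=1):
            if out_of_range(infit) or out_of_range(outfit):
                flagged.append((factor, number))
    return flagged


if __name__ == "__main__":
    for factor, number in misfit_items(FIT):
        print(f"{factor}, item {number}")
```

Applied to Table 10, this flags five physical functioning items, two role limitation–emotion items, one mental health item, and one general health item, in line with the counts reported in the abstract.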

DISCUSSION

This study used Rasch analysis to examine whether the SF-36 is appropriate for elderly Koreans. The six factors of the SF-36 were found to satisfy the criterion of unidimensionality. Furthermore, step calibration showed that the three-point scale used to respond to items of the physical functioning factor and the five-point scales for the role limitation–physical, role limitation–emotion, mental health, vitality, and general health factors were appropriate. This implies that the SF-36 response categories are appropriate for elderly Koreans, and that the elderly who participated in this study understood the reverse scored items of the SF-36. In our study, the factors with fewer than three items (bodily pain, social functioning, and health changes) were excluded from the analysis. This corresponds with the results of Bollen25) and Gilbert et al.26), the latter of whom conducted a psychometric scaling study indicating that a factor should contain at least three items.
However, the results of the test of goodness-of-fit suggested that five of the ten items in the physical functioning factor are inappropriate. The outfit indexes of items 1 (engaging in vigorous activities such as running, lifting heavy objects, participating in strenuous sports) and 6 (being able to bend, kneel, or stoop) were over 1.30, as was the infit index of item 6. This implies that these items were incongruent with other items and confused subjects27), making these items inappropriate for measuring the physical functioning level of elderly Koreans. In addition, the outfit indexes of items 4 (being able to climb several flights of stairs), 9 (being able to walk one block), and 10 (being able to bathe or dress) were less than 0.70. This suggests that these items overlap with each other, and excluding them would simplify and clarify the measurement28).
Furthermore, items 2 (accomplished less than you would have liked due to feeling depressed or anxious) and 3 (did not do work or other activities as carefully as usual due to feeling depressed or anxious) of role limitation–emotion, which consists of three items, were found to be inappropriate. Item 2 overlapped with role limitation–physical, and item 3 had the highest logit difficulty, which likely means that it was confusing to subjects. Both the infit and outfit indexes of one item each of the role limitation–emotion, mental health, and general health factors were over 1.30, making these items inappropriate for the participants. Item 1 in mental health (“Are you a very nervous person?”) and item 4 in general health (“I expect my health to get worse”) were incongruent with other items within the same factors; that is, subjects who scored highly on other items scored low on item 1 in mental health and item 4 in general health. This may be because the word “nervous” can be misinterpreted, and because the word “worse” could have been confusing, as item 4 is the only negatively phrased item in the general health factor.
This study investigated the validity of the SF-36 for measuring the health status of elderly Koreans by applying the Rasch model. We found that nine items on this scale are inappropriate for measuring the health status of elderly Koreans. Therefore, it is necessary to revise this scale so that it measures the health status of elderly Koreans more accurately. Future studies should explore ways of improving the validity of the SF-36 for elderly Koreans.
