Rainer W Alexandrowicz¹, Rebecca Jahn², Johannes Wancata²
¹ Institute for Psychology, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria.
² Department of Psychiatry and Psychotherapy, Medical University of Vienna, Vienna, Austria.
Abstract
OBJECTIVES: The CES-D is a widely used depression screening instrument. While numerous studies have analysed its psychometric properties using exploratory and various kinds of confirmatory factor analysis, only a few studies have used Rasch models, and none has used a multidimensional one. METHODS: The present study applies a multidimensional Rasch model to a sample of 518 respondents representative of the Austrian general population aged 18 to 65. A one-dimensional model, a four-dimensional model reflecting the subscale structure suggested by [1], and a four-dimensional model with the background variables gender and age were applied. RESULTS: While the one-dimensional model showed relatively good fit, the four-dimensional model fitted much better. EAP reliability indices were generally satisfactory, and the latent correlations ranged from 0.31 to 0.88. In the analysis involving background variables, we found a limited effect of the participants' gender. Differential item functioning (DIF) analyses revealed some peculiarities. The two-item subscale Interpersonal Difficulties showed severe weaknesses, and the Positive Affect subscale, with its reversed item wordings, also showed unexpected results. CONCLUSIONS: While a one-dimensional overall score might still contain helpful information, differentiating according to the latent dimensions is strongly preferable. Altogether, the CES-D can be recommended as a screening instrument; however, some modifications seem indicated.
According to the Global Burden of Disease 2010 study [2], major depressive disorder (MDD) is one of the leading causes of disability, and its high prevalence causes a substantial economic burden [3]. Although depression is highly prevalent, it is poorly diagnosed in general health care settings and on non-psychiatric wards [4,5]. Early detection and treatment could reduce impairment in patients, the burden on relatives, and health care costs. Screening instruments facilitate early and correct diagnosis [6] and are essential for epidemiologic studies. Numerous screening tools are available, differing in length, psychometric properties, and target population. Wancata et al. [7] discuss crucial attributes a screening instrument must fulfil to be useful in both epidemiologic studies and primary care settings.

The present study focuses on the psychometric properties of the Center for Epidemiologic Studies Depression Scale (CES-D) [1]. This widely used screening instrument assesses the frequency of depressive symptoms during the past week with 20 questions. It uses a four-point self-rating response format with the categories 0 = rarely or none of the time (less than 1 day), 1 = some or a little of the time (1–2 days), 2 = occasionally or a moderate amount of time (3–4 days), and 3 = most or all of the time (5–7 days), allowing for a maximum score of 60. For the total score across all items, Radloff (1977) [1] suggested a cut-off value of 16 as an indication for further clinical evaluation. Based on principal components analysis, she identified four factors comprising the dimensions Positive Affect (4 items), Negative Affect (7 items), Somatic Symptoms (7 items), and Interpersonal Difficulties (2 items). Nevertheless, based on the "high internal consistency of the scale found in all groups", she argued in favour of an overall score to assess "the degree of depressive symptomatology" (p. 398) and against what she considered "undue emphasis on separate factors" (p. 398).

However, from a psychometric point of view, one-dimensionality (i.e., all items measure one and the same latent construct) is a prerequisite for a meaningful interpretation of a total score. Internal consistency alone cannot provide sufficient evidence for the one-dimensionality assumption, because a set of items can be highly intercorrelated and still reflect several distinct latent dimensions. Rather, we have to apply more complex and, most importantly, empirically testable models to justify such an assumption. Two families of models are available for that purpose. The first is structural equation modelling (SEM; [8]), with its special case confirmatory factor analysis (CFA; [9]). The second is item response theory (IRT), of which the Rasch models (RM; [10-12]) are a prominent subfamily. Although most of these models were already available in 1977, they were not routinely applied at that time, and suitable software was not in widespread use.
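The following Python function is a minimal sketch of the scoring rule described above. It computes a CES-D total score under three assumptions: responses are coded 0–3 in the original item order, the positively worded items 4 (good), 8 (hopeful), 12 (happy), and 16 (enjoy) are reverse-scored, and a total of 16 or more counts as screen-positive. The "≥ 16" reading of the cut-off is our assumption, and the function names are hypothetical.

```python
from typing import Sequence

# Positively worded items (4 good, 8 hopeful, 12 happy, 16 enjoy) are reverse-scored.
REVERSED_ITEMS = {4, 8, 12, 16}
CUTOFF = 16  # Radloff's suggested cut-off; applying it as ">= 16" is an assumption


def cesd_total(responses: Sequence[int]) -> int:
    """Total CES-D score (0-60) from 20 responses coded 0-3, in item order."""
    if len(responses) != 20:
        raise ValueError("the CES-D has exactly 20 items")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item_number}: response {value!r} is not in 0-3")
        # Reverse the positive items so that a higher score always means more symptoms.
        total += 3 - value if item_number in REVERSED_ITEMS else value
    return total


def screens_positive(responses: Sequence[int]) -> bool:
    """Hypothetical helper: True if the total reaches the cut-off."""
    return cesd_total(responses) >= CUTOFF
```

Because of the reversal, a respondent who answers 0 to every item still scores 12. An overall "all 0" response pattern is therefore not the minimum score.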
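A small simulation can illustrate that high internal consistency does not establish one-dimensionality. The sketch below uses hypothetical continuous data, not CES-D responses, and all parameter values are our assumptions. It generates 20 items from two clearly distinct factors that correlate only 0.3, with each item loading 0.7 on its own factor. Cronbach's alpha nevertheless comes out at about 0.90, while the two ten-item sum scores correlate only weakly.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)


def simulate_two_factor(n: int = 5000, loading: float = 0.7,
                        factor_corr: float = 0.3, seed: int = 1) -> np.ndarray:
    """Simulate 20 items: items 0-9 load on factor A only, items 10-19 on factor B only."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, factor_corr], [factor_corr, 1.0]])
    factors = rng.multivariate_normal(np.zeros(2), cov, size=n)
    which_factor = np.repeat([0, 1], 10)        # between-item structure
    unique_sd = np.sqrt(1.0 - loading ** 2)     # gives each item unit variance
    noise = rng.standard_normal((n, 20)) * unique_sd
    return loading * factors[:, which_factor] + noise
```

For this configuration, the average inter-item correlation is about 0.31, which for 20 items implies an alpha of roughly 0.90. A high alpha is thus fully compatible with a clearly two-dimensional structure.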
Psychometric analyses of the CES-D
Numerous studies have analysed the psychometric properties of the CES-D with special focus on its latent dimensionality. The most basic approach is to apply an exploratory factor analysis or principal component analysis (EFA/PCA; [13]) to determine the required number of latent factors from the data. This strategy has been chosen, for example, by [14-19]. The resulting solutions ranged from 2 to 5 latent factors.

By far most psychometric studies applied a more theory-driven approach, using confirmatory factor analysis (CFA) in various ways. A combination of EFA and CFA, i.e., exploring and testing, has been applied by [20-27]. These studies applied both the exploratory and the confirmatory factor analysis to the same data sets. They therefore provide only limited evidence regarding the latent dimensionality of the instrument, because a model derived from a data set cannot be independently tested on that same data set. A "pure" CFA approach has been applied by [28-33]. This approach formulates a measurement model based on substantive considerations (i.e., a supposed subscale structure expressed, basically, by a factor loading matrix) and tests its adequacy against observed data. Several studies employed more complex variants of CFA:

(a) second-order CFA (cf. [9]), which assumes a secondary factor behind the (in most cases four) subscale factors (e.g., [34-43]);

(b) multi-group CFA (MG-CFA; cf. [9]), which allows testing equality constraints across specific subsamples, such as gender groups (e.g., [44-59]);

(c) multiple indicator multiple cause (MIMIC; cf. [9]) or bifactor [60] models, which extend the measurement model by observed covariates or by additional latent factors, respectively [61-64].

This list makes no claim to be complete, but it demonstrates that there is an impressive body of research on the CES-D based on various kinds of factor analysis and SEM approaches.

In contrast, a much smaller number of studies applied IRT models:

- Stansbury et al. [65] applied a Rasch model (RM) to a sample of about 2,500 community-dwelling elderly people. They found the reverse-scored items (4, 8, 12, and 16) not to be in line with a one-dimensional latent construct and therefore eliminated them. Even the reduced set of 16 items, however, still showed deviations from a unidimensional construct.
- Pickard et al. [66] analysed a sample of 101 stroke and 366 primary care patients with the RM. They reported generally good fit except for five items (2, 11, 15, 17, and 19).
- Gay et al. [67] applied a Rasch analysis to a sample of 347 adults with HIV/AIDS and identified five items (2, 4, 8, 11, and 16) as problematic. However, even omitting these items would not improve the overall performance of the scale.
- Kim and Park [68] found in a convenience sample of 183 Korean stroke survivors that items 2, 8, and 11 misfit the RM.
- Covic et al. [69] and Covic et al. [70] investigated samples of rheumatoid arthritis patients with an RM. They promoted a 13-item short version of the CES-D that omits items 2, 4, 8, 11, 12, 16, and 18 and rescores the remaining items to a three-category response format by merging the two middle categories.

Two further studies applying an IRT model to the CES-D [71,72] focused on linking scores of various depression assessments and were therefore not considered in the present article. Table 1 summarizes the problematic items identified in the cited studies.
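Table 1
Items considered problematic in studies applying an IRT model to the CES-D.
Bullets indicate items with a significant infit index. Bullets and counts in brackets indicate partially problematic items with suspicious thresholds only, i.e., thresholds that are significant or lie outside the critical limits.

| Study | good | hopeful | happy | enjoy | blues | depressed | failure | cry | sad | appetite | sleep | unfriendly | dislike | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Subscale | I | I | I | I | II | II | II | II | II | III | III | IV | IV | |
| Item number | 4 | 8 | 12 | 16 | 3 | 6 | 9 | 17 | 18 | 2 | 11 | 15 | 19 | |
| [65] | • | • | • | • | | | | | | | | | | 4 |
| [66] | | | | | | | | • | | • | • | • | • | 5 |
| [67] | • | • | | • | | | | | | • | • | | | 5 |
| [68] | | • | | | | | | | | | • | | | 2 |
| [69,70] | • | • | • | • | | | | | • | • | • | | | 7 |
| Present study, 1 dim | • | • | (•) | (•) | | • | | | • | | • | | | 5/(7) |
| Present study, 4 dim | • | (•) | | (•) | (•) | • | (•) | | | | • | | | 3/6 |
| Total | 5 | 5/(6) | 2/(3) | 3/(5) | (1) | 2 | (1) | 1 | 2 | 3 | 6 | 1 | 1 | |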
Research question
These results of the IRT analyses indicate that a one-dimensional model does not adequately describe the data-generating mechanism. The frequently applied CFA approach already allows for a multidimensional analysis, and the results of these studies do indeed support a multidimensional structure of the CES-D. However, the CFA model was originally developed for interval-scaled data, assuming linear relationships and a multivariate normal distribution. Although extensions covering ordered categorical data and non-normality exist, the IRT family of models is specifically designed for (ordered) categorical data, such as those obtained from questionnaires like the CES-D. Among other things, the IRT approach allows for a detailed analysis of items and item categories, explicitly taking the categorical response format into account (for a direct comparison of the various approaches see [73]). To the authors' knowledge, the CES-D has so far not been analysed with a multidimensional IRT model. Moreover, the present study is the first to also take background variables into account.

Misfit of the models applied so far could very well be due to the fact that the CES-D has been applied in specific populations (people with HIV/AIDS, community-dwelling elderly people, stroke and primary care patients, and stroke survivors). It was, however, originally designed for "general population surveys" [1] (p. 386). Hence, the results obtained so far are of limited value, as it remains unclear whether they also apply to the general population. To shed light on this open question, the present study uses a representative sample from the general population. It is thus, to the authors' knowledge, the first study to analyse the CES-D on the basis of a representative sample using a multidimensional IRT model.
Methods
Sample
The sample consisted of 518 respondents randomly selected from the phone-number database of a large Austrian address broker. According to the seller, this database covers approximately 75% of the Austrian population. The sample covered persons aged 18–65 years. Because no population register was available to us, a simple random sample was not feasible, and we applied a complex sampling scheme:

- Austria has 9 provinces, which have key responsibilities in certain public health issues relevant to our research question. We therefore represented each province in the sample by stratification.
- As the data collection was based on face-to-face interviews, the travel routes to the households had to be taken into account. Within each stratum, we therefore used a cluster sampling scheme based on districts, which are available in the database.
- Given our logistical and financial capacities, we decided to sample a total of 40 districts, which were drawn at random while taking proper shares of urban vs. rural regions into account.
- The required number of respondents per district was determined proportionally to the respective gender shares and district sizes.
- The resulting numbers of male and female respondents per district were drawn at random from the district's addresses in the database.

The sample size was chosen in line with general recommendations, for example those given by [74], which state that 500 establishes a "Size for most purposes" even under "Adverse Circumstances" (p. 328).

First, a notification letter informing about the study aims and procedures was sent to the selected respondents. Then, study workers called each person by phone and asked for permission to visit them to conduct the interview and complete the questionnaire. Those agreeing to the interview were visited at home.
Persons who could not be reached (e.g., due to a change of address or phone number) or who refused to participate were replaced by further addresses from a back-up list sampled in the same way as the primary list.
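The proportional allocation of interviews to district-by-gender cells can be sketched as follows. The paper does not state how fractional quotas were rounded. The largest-remainder rule used here is our assumption, and the cell labels and sizes in the example are hypothetical.

```python
from math import floor


def allocate_proportionally(total: int, sizes: dict[str, int]) -> dict[str, int]:
    """Split `total` interviews across cells in proportion to `sizes`,
    using largest-remainder rounding so that the parts add up exactly."""
    population = sum(sizes.values())
    quotas = {key: total * size / population for key, size in sizes.items()}
    allocation = {key: floor(q) for key, q in quotas.items()}
    shortfall = total - sum(allocation.values())
    # Give the remaining interviews to the cells with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda key: quotas[key] - allocation[key],
                          reverse=True)
    for key in by_remainder[:shortfall]:
        allocation[key] += 1
    return allocation


# Hypothetical example: two districts, split by gender, 30 interviews in total.
example = allocate_proportionally(
    30, {"A/female": 5200, "A/male": 4800, "B/female": 2600, "B/male": 2400})
```

Largest-remainder rounding guarantees that the cell quotas sum to the fixed total. Rounding each cell independently does not. In the example, the quotas 10.4, 9.6, 5.2, and 4.8 become 10, 10, 5, and 5.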
Assessments
Psychiatric case identification was performed using the SCAN 2.0, the Schedules for Clinical Assessment in Neuropsychiatry [75]. The SCAN is a semi-structured clinical interview designed for use by psychiatrists and clinical psychologists. Every symptom in the SCAN is defined in detail [76], and wording is suggested for eliciting each symptom. However, interviewers had to continue inquiring until they had sufficient information to decide whether or not the symptom definitions were fulfilled. The feasibility and reliability of the SCAN have been tested in international field trials [75]. Diagnoses were given according to ICD-10 [77] using a computer algorithm provided for the SCAN. Only current disorders (occurring during the 4 weeks before the interview) were evaluated in the present study. Eleven psychologists were recruited as interviewers and were trained by experienced staff from one of the WHO-designated SCAN training centres. All interviewers performed several pilot SCAN interviews before data collection started.

Study participants could decide whether they wanted to start with the questionnaire or the research interview. Either way, interviewers were not aware of the CES-D results. Study participants were included only if they had signed informed consent. The study was approved by the Ethics Committee of the Medical University of Vienna.
Model
The response format of the CES-D provides four categories, which requires a polytomous version of the Rasch model. One frequently applied model of this kind is the partial credit model (PCM; [78]). However, the PCM is a one-dimensional Rasch model, i.e., it cannot describe more than one subscale at a time. Multidimensional IRT models, which assume that more than one latent dimension generates the responses, are also available (cf. [79]). A versatile multidimensional formulation is the multidimensional random coefficients multinomial logit model (MRCMLM; [80]). It covers multidimensionality and allows for controlling for background variables, on which each latent factor can be regressed. We used a between-item multidimensional formulation, i.e., each item is associated with exactly one latent factor (cf. [79]). Our analysis strategy was first to apply a one-dimensional model and then to contrast it with (a) the four-dimensional model and (b) the four-dimensional model with background variables. Finally, we performed a differential item functioning analysis (DIF; [81]) to identify potentially problematic items.

For assessing model fit, we used the infit measure [82,83], the ideal value of which is one. Values larger than one indicate an increasing amount of responses differing from what the model would predict (underfit). Values below one indicate responses showing less variability than expected (overfit). Critical limits for the infit measure were set at 0.7 and 1.3 (cf. [84]). Further, the MRCMLM provides the EAP reliability index (based on expected a posteriori parameter estimates, cf. [85,86]) for each latent scale. This index can be seen as the Rasch-model equivalent of the classical reliability measure, and its value should be close to one.

For comparing models, we used the following information-based indices: AIC [87], the bias-corrected AIC (AICc; [88,89]), the Bayesian information criterion (BIC; [90]), the adjusted BIC (aBIC; [91]), and the consistent AIC (CAIC; [92]). Information-based indices allow for comparing competing models applied to the same data set, with smaller values indicating better overall model fit. Moreover, we compared nested models with the likelihood ratio test (LRT; [93]).

We used R [94] for all calculations and graphics and the R package Test Analysis Modules (TAM; [95]) for the MRCMLM. A critical alpha of 5% (0.05) was applied for inferential assessment.
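For readers less familiar with the PCM, the following sketch evaluates its category probabilities for a single item:

P(X = k | θ) ∝ exp(Σ_{j≤k}(θ - δ_j)),

where the empty sum for k = 0 is set to zero. The step parameters δ_j in the example are hypothetical. In the between-item multidimensional MRCMLM, we assume the same formula applies with θ replaced by the person's location on the dimension the item belongs to. This is an illustrative reimplementation, not TAM's code.

```python
import numpy as np


def pcm_probabilities(theta: float, deltas) -> np.ndarray:
    """Category probabilities P(X = 0..m | theta) under the partial credit model.

    `deltas` holds the m step parameters of one item; a four-category
    CES-D item has m = 3.
    """
    # Cumulative sums of (theta - delta_j); category 0 has the empty sum 0.
    numerators = np.concatenate(
        ([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    weights = np.exp(numerators - numerators.max())  # shift guards against overflow
    return weights / weights.sum()


# Hypothetical item with steps -1.0, 0.5, 2.0 for a person at theta = 0.5.
probs = pcm_probabilities(0.5, [-1.0, 0.5, 2.0])
```

At θ = δ_k, the adjacent categories k-1 and k are equally probable. In the example, θ = 0.5 equals the second step, so categories 1 and 2 have the same probability.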
Results
Sample description
Our sample consisted of 518 participants aged 21 to 67 years (M = 46.6, SD = 13.3); 264 (51%) of them were female. Regarding education, 238 (46.1%) had a university entrance diploma (termed "Matura" in Austria), and 24 (4.6%) were still in education. Thirty-six respondents (6.9%) reported being unemployed, while 364 (70.3%) were employed.
The one-dimensional model
First, a one-dimensional Rasch model for polytomous data (i.e., a PCM) was applied. This model constitutes the reference model against which the more complex approaches will be tested. The EAP reliability index of the latent scale of this model was 0.795.

Fig 1 shows the person-item map ([83]; for a detailed treatment see [96]) of the one-dimensional model. The horizontal axis denotes the latent dimension representing "overall" depression, in contrast to the specific depression facets in the next model. The histogram in the upper part shows that the majority of the sample exhibits low depression values. In contrast, the majority of the thresholds lie in the higher regions of this latent dimension. This indicates that only respondents with higher depression values are likely to choose the corresponding response categories. Especially for items 2 (appetite), 9 (failure), 10 (fearful), 15 (unfriendly), and 19 (dislike), even the threshold between categories 0 and 1 is located considerably high. These items are therefore "difficult" from a psychometric point of view, requiring a higher latent score to endorse them. By contrast, the thresholds of subscale I (Positive Affect), i.e., items 4 (good), 8 (hopeful), 12 (happy), and 16 (enjoy), are located in the lower regions of the latent dimension. One peculiarity becomes evident: the thresholds of items 3 (blues), 4 (good), and 9 (failure) are considerably close to each other, which indicates that these items do not differentiate much across the latent dimension.
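The person-item maps plot Thurstonian thresholds. For a PCM item, the k-th Thurstonian threshold is the latent location at which the probability of responding in category k or higher reaches 0.5. The sketch below is an independent illustration, not the TAM routine the authors used. It finds these locations by bisection, which works because P(X ≥ k | θ) increases monotonically in θ under the PCM. The step parameters are hypothetical. Taking the mean of an item's thresholds is our assumed reading of the "average threshold" shown as a red line in Fig 1.

```python
import numpy as np


def pcm_probabilities(theta, deltas):
    """PCM category probabilities P(X = 0..m | theta) for one item."""
    nums = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    weights = np.exp(nums - nums.max())
    return weights / weights.sum()


def thurstonian_thresholds(deltas, lo=-10.0, hi=10.0, tol=1e-8):
    """Latent locations where P(X >= k | theta) = 0.5, for k = 1..m."""
    thresholds = []
    for k in range(1, len(deltas) + 1):
        a, b = lo, hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            if pcm_probabilities(mid, deltas)[k:].sum() < 0.5:
                a = mid          # the 0.5 crossing lies above mid
            else:
                b = mid          # the 0.5 crossing lies at or below mid
        thresholds.append(0.5 * (a + b))
    return np.array(thresholds)


# Hypothetical item: its thresholds and their average (the item's "difficulty").
item_thresholds = thurstonian_thresholds([-1.0, 0.5, 2.0])
item_location = item_thresholds.mean()
```

Unlike the PCM step parameters, Thurstonian thresholds are always ordered. Thresholds that lie close together, as for items 3, 4, and 9 above, therefore signal an item whose middle categories are rarely the most likely response.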
Fig 1
Person-item map of the one-dimensional model.
The upper part shows the histogram of the person parameter distribution and the lower part the locations of the Thurstonian thresholds, both on the same metric. The red lines in the lower diagram indicate the average threshold of each item, which serves as a measure of the item's "difficulty". Items are sorted according to the subscales proposed by Radloff (1977).
Fig 2 shows the infit measures and the thresholds of the 20 CES-D items. Most of the values lie in the vicinity of 1, so the global impression is good. However, some items show peculiarities:

- The four items of subscale I show elevated item infit with statistically significantly deviating thresholds. Thresholds 2 and 3 of items 4 (good) and 8 (hopeful) are significant, and three of them also lie above the upper limit of 1.3. Further, the first thresholds of items 12 (happy) and 16 (enjoy) were below the ideal value of 1 and significant.
- In subscale II, item 6 (depressed) was close to the lower limit and significant, and its first threshold was significant as well. The same applies to item 18 (sad).
- In subscale III, item 11 (sleep) had an infit value larger than 1 that was significant.
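The infit statistic discussed here can be sketched as an information-weighted mean square: the sum of squared residuals (x - E[X | θ]) divided by the sum of model variances Var(X | θ). The sketch makes several simplifying assumptions. It treats person parameters as known, whereas real software plugs in estimates, and it omits the standardized (t) versions used for significance testing. The simulation helper exists only for demonstration.

```python
import numpy as np


def pcm_probabilities(theta, deltas):
    """PCM category probabilities P(X = 0..m | theta) for one item."""
    nums = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
    weights = np.exp(nums - nums.max())
    return weights / weights.sum()


def item_infit(responses, thetas, deltas):
    """Infit mean square of one item: sum of squared residuals divided by
    the sum of model variances (information-weighted squared residuals)."""
    categories = np.arange(len(deltas) + 1)
    squared_residuals, variances = 0.0, 0.0
    for x, theta in zip(responses, thetas):
        p = pcm_probabilities(theta, deltas)
        expected = categories @ p                        # E(X | theta)
        variances += ((categories - expected) ** 2) @ p  # Var(X | theta)
        squared_residuals += (x - expected) ** 2
    return squared_residuals / variances


def simulate_item(thetas, deltas, seed=0):
    """Draw model-conforming responses for one item (demonstration only)."""
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(len(deltas) + 1, p=pcm_probabilities(t, deltas))
                     for t in thetas])
```

With model-conforming data, the infit is close to 1. Overly deterministic (Guttman-like) responses push it below 1. Responses unrelated to θ push it above 1, which mirrors the interpretation of the 0.7 and 1.3 limits given in the Methods.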
Fig 2
Infit measures of the one-dimensional model.
Notes: The bold line shows the item infit, with bullets indicating significant values. The dotted lines indicate the infit values of the three thresholds (labelled 1, 2, and 3; slightly displaced horizontally for better readability). The bold horizontal line indicates the ideal value of 1, and the two dashed horizontal lines indicate the limits of acceptability (0.7 to 1.3). Numbers in circles indicate significant thresholds. Note that significance also depends on the standard error of the respective estimate; hence, significant values need not lie outside the acceptability limits, and similar values need not be significant at the same time. The (r) indicates that the item codings had to be reversed prior to evaluation, because these items were positively worded. The items along the horizontal axis are sorted according to the four subscales. Dotted vertical lines separate the subscale blocks, and the original item numbers are given in brackets.
The four-dimensional model
Next, we applied a four-dimensional model according to the item allocation proposed by [1]. The EAP reliability indices for the four latent dimensions were:

- 0.699 for Positive Affect (henceforth termed subscale I);
- 0.730 for Negative Affect (subscale II);
- 0.727 for Somatic Symptoms (III);
- 0.451 for Interpersonal Difficulties (IV).

Table 2 lists the information-based indices, all of which indicate that the four-dimensional model describes the data better than the one-dimensional model.
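The EAP reliability index can be illustrated for a single dimension. We assume one common definition: the variance of the EAP person estimates divided by the sum of that variance and the mean posterior variance. We believe TAM uses a similar formula, but this is not confirmed by the paper. The sketch computes posteriors on a quadrature grid under a normal prior. Item parameters and simulated data are hypothetical. In the four-dimensional model, the index would be computed per dimension.

```python
import numpy as np


def pcm_log_likelihood(grid, responses, item_deltas):
    """Log-likelihood of one person's PCM response vector at every grid point."""
    loglik = np.zeros_like(grid)
    for x, deltas in zip(responses, item_deltas):
        steps = grid[:, None] - np.asarray(deltas, dtype=float)[None, :]
        # PCM numerators for categories 0..m (category 0 = empty sum = 0)
        nums = np.hstack([np.zeros((grid.size, 1)), np.cumsum(steps, axis=1)])
        loglik += nums[:, x] - np.logaddexp.reduce(nums, axis=1)
    return loglik


def eap_reliability(all_responses, item_deltas, prior_sd=1.0, n_nodes=61):
    """EAP reliability: var(EAP) / (var(EAP) + mean posterior variance)."""
    grid = np.linspace(-6.0, 6.0, n_nodes)
    log_prior = -0.5 * (grid / prior_sd) ** 2     # normal prior, up to a constant
    eaps, posterior_vars = [], []
    for responses in all_responses:
        log_post = pcm_log_likelihood(grid, responses, item_deltas) + log_prior
        weights = np.exp(log_post - log_post.max())
        weights /= weights.sum()
        eap = weights @ grid
        eaps.append(eap)
        posterior_vars.append(weights @ (grid - eap) ** 2)
    eaps = np.asarray(eaps)
    return eaps.var() / (eaps.var() + np.mean(posterior_vars))


def simulate_responses(thetas, item_deltas, seed=0):
    """Draw PCM-conforming responses (persons x items), for demonstration only."""
    rng = np.random.default_rng(seed)
    data = []
    for theta in thetas:
        row = []
        for deltas in item_deltas:
            nums = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float))))
            p = np.exp(nums - nums.max())
            row.append(rng.choice(p.size, p=p / p.sum()))
        data.append(row)
    return data
```

On simulated data, a 20-item scale yields a clearly higher EAP reliability than a 2-item scale built from the same kind of items. This is consistent with the low value of the two-item subscale IV.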
Table 2
Information-based fit indices for the one- and the four-dimensional model.

| | 1-Dim | 4-Dim |
|---|---|---|
| Log-likelihood | -6464.67 | -6450.43 |
| No. of parameters | 70 | 78 |
| AIC | 13450 | 13069 |
| AICc | 13467 | 13092 |
| BIC | 13709 | 13366 |
| aBIC | 13515 | 13144 |
| CAIC | 13770 | 13436 |
The direct model comparison via the likelihood ratio test (LRT) also showed that the four-dimensional model fits the data significantly better than the one-dimensional one (χ² = 398.77; df = 9; p < 1e-10). Fig 3 shows the person-item map of the four-dimensional model.
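The indices in Table 2 and the LRT follow standard formulas, which the sketch below states explicitly. Here n is the number of respondents and p the number of parameters. The aBIC uses the (n + 2)/24 sample-size adjustment commonly attributed to Sclove; we assume this is the variant reported. To avoid a SciPy dependency, the chi-square tail probability is computed with closed forms valid for integer degrees of freedom (Abramowitz and Stegun 26.4.4 for odd df).

```python
import math


def information_criteria(loglik: float, n_par: int, n_obs: int) -> dict:
    """Standard information criteria; smaller values indicate better fit."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_par
    return {
        "AIC": aic,
        "AICc": aic + 2 * n_par * (n_par + 1) / (n_obs - n_par - 1),
        "BIC": deviance + n_par * math.log(n_obs),
        "aBIC": deviance + n_par * math.log((n_obs + 2) / 24),  # sample-size adjusted
        "CAIC": deviance + n_par * (math.log(n_obs) + 1),
    }


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2.0))        # equals 2 * upper normal tail
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        total += 2.0 * density * term
    return total


def likelihood_ratio_test(loglik_restricted: float, loglik_general: float, df: int):
    """LRT for nested models; returns (statistic, p-value)."""
    statistic = 2.0 * (loglik_general - loglik_restricted)
    return statistic, chi2_sf(statistic, df)
```

For the comparison reported above, `chi2_sf(398.77, 9)` is many orders of magnitude below 1e-10, consistent with the reported p-value.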
Fig 3
Person-item map of the four-dimensional model.
The upper part of the plot shows the histogram of the person parameter estimates for each of the four subscales. The colors indicate the subscales. For further notes see Fig 1.
The histogram of the person parameter estimates again shows that most respondents exhibit low values of depression, with subscale II (Negative Affect) covering a wider range than the other three subscales. The item category thresholds show a pattern similar to the one-dimensional case, but they cover a much broader range of values. Nevertheless, items 4 (good) and 8 (hopeful) still show thresholds considerably close to each other. These two items therefore still do not discriminate very well across the spectrum of depression, i.e., respondents predominantly chose either category 0 (rarely or none of the time) or category 3 (most or all of the time).

Fig 4 shows the infit indices for the 20 CES-D items. Again, we find a few peculiarities in subscale I, yet to a lesser degree. Thresholds 2 and 3 of item 4 (good) and threshold 3 of item 8 (hopeful) are still significant, but their infit values are below the critical limit of 1.3. Interestingly, items 3 (blues), 9 (failure), and 10 (fearful) now show infit measures above the critical limit of 1.3. Again, items 6 (depressed) and 11 (sleep) have thresholds deviating significantly from the ideal value of 1.
Fig 4
Infit measures of the four-dimensional model.
For notes see Fig 2.
Table 3 shows the correlation matrix of the four latent dimensions; the main diagonal entries denote the variances of the latent dimensions.
Table 3
Correlations of the latent subscales.
Note: The entries in the main diagonal (italicized) are the variances of the subscales.

| | Pos.Aff. | Neg.Aff. | Som.Symp. | Interpers.Diff. |
|---|---|---|---|---|
| Pos.Aff. | *0.911* | | | |
| Neg.Aff. | 0.554 | *1.912* | | |
| Som.Symp. | 0.586 | 0.875 | *1.165* | |
| Interpers.Diff. | 0.310 | 0.595 | 0.477 | *1.905* |
The highest correlation was found between Negative Affect and Somatic Symptoms (0.88), while the weakest correlation occurred between Positive Affect and Interpersonal Difficulties (0.31). The remaining correlation coefficients were moderate (between 0.48 and 0.59).
The four-dimensional model with background variables
Finally, the multidimensional model was extended by the two background variables gender and age. Regarding model fit, the person-item map was almost identical to that of the four-dimensional model without background variables and is therefore not presented here. The same applies to the infit plot; interested readers can request copies of these plots from the authors. The EAP reliability indices for the four latent dimensions were marginally better than for the previous model (I: 0.702; II: 0.740; III: 0.730; IV: 0.455). A direct model comparison using information-based indices or the LRT is not possible, because this model was applied to a different data set (with the two background variables added).

The most interesting results of this model are the regression coefficients of the four latent dimensions on the two background variables (see Table 4).
Table 4
Regression coefficients of the latent background model.

| | Pos. Aff. | Neg. Aff. | Som. Symp. | Int. Diff. |
|---|---|---|---|---|
| Intercept | 0 | 0 | 0 | 0 |
| Gender | -0.11325 | 0.72233 | 0.24627 | 0.43946 |
| Age | 0.00445 | -0.00421 | -0.00986 | -0.00365 |
Regarding the impact of gender on the subscales, we find two notable effects, for the latent dimensions Negative Affect and Interpersonal Difficulties. In contrast, the respondents' age did not reveal any notable influence. These results suggest that gender, but not age, plays a role for the CES-D. This is pursued further in the following DIF analysis, which delivers more detailed insights.
DIF analysis
For the DIF analysis, we used two split criteria: gender and a diagnosis of depression within the last month. Gender was chosen because it proved influential as a background variable. A depression diagnosis was chosen because the CES-D was developed to measure depression in the general population, so it is of particular interest whether some items operate differently in depressed than in non-depressed people. We used the four-dimensional model without background variables for the DIF analyses, because controlling for gender or depression would eliminate exactly the effects we are looking for in this analysis step.

First, we focus on the global DIF effect. Here, we find a weak general DIF effect for gender (global effect parameter -0.103; 95% CI [-0.13, -0.07]), i.e., women were slightly (but significantly) more likely to endorse all items. Because such an overall effect carries little information, we turn to an item-wise analysis. Fig 5 presents the item-wise DIF effects according to gender (solid line).
Fig 5
Differential item functioning due to gender and depression.
Notes: The dots represent the DIF effect, i.e., the item parameter difference between the two groups; error bars indicate the 95% confidence interval, and bullets indicate a significant DIF effect for the respective item. Values below 0 mean that the item is more readily endorsed by men; values above 0 mean that it is more readily endorsed by women. The solid line indicates DIF according to gender, and the dashed line indicates DIF according to depression. In the latter case, item 2 had to be omitted for technical reasons (see text). For better readability, the two curves were displaced horizontally.
Seven items show a significant yet moderate DIF effect:

- The Positive Affect subscale is affected most, with three of its four items (hopeful, happy, enjoy) showing DIF in favour of men, i.e., men are more likely to endorse these items than women.
- Two items of the Negative Affect subscale (failure, cry) show a DIF effect in favour of women.
- Two items of the Somatic Symptoms subscale (appetite, talk) show a DIF effect in favour of men.

For the second DIF analysis, we split the sample into respondents with vs. without a diagnosis of depression according to the SCAN. Respondents with other diagnoses were excluded for this step, resulting in a slight sample reduction (n = 452). Item 2 (appetite) had to be excluded from the analysis for technical reasons, because response category 3 did not occur in the reduced sample. There was a global effect, with depressed respondents more likely to endorse all items (effect parameter -0.656; 95% CI [-0.69, -0.62]). Fig 5 shows the item-wise DIF effects (dashed line). We find significant effects for 10 items:

- For depressed respondents, it was more difficult to endorse items 4 (good), 8 (hopeful), 13 (talk), 15 (unfriendly), and 19 (dislike).
- For depressed respondents, it was easier to endorse items 3 (blues), 6 (depressed), 7 (effort), 9 (failure), and 10 (fearful).

Although these effects were statistically significant, most of them can be considered small from a substantive perspective.
The largest effects were observed for items 15 (unfriendly), 19 (dislike), and 13 (talk), all of which were more difficult to endorse for respondents fulfilling the depression criteria.
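The item-wise DIF effects in Fig 5 are differences between group-specific item parameters, shown with 95% confidence intervals and flagged when significant. As a minimal sketch of how such a contrast can be computed, assuming item parameters and standard errors estimated independently in each group and a simple Wald-type interval (the software used in the study may estimate DIF differently, e.g. as an interaction term within a single model), consider the following. The function name `dif_contrast`, the sign convention (group B minus group A), and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class DifResult:
    item: str
    effect: float      # parameter difference: group B minus group A (hypothetical convention)
    lower: float       # lower bound of the 95% CI
    upper: float       # upper bound of the 95% CI
    significant: bool  # True if the CI excludes 0


def dif_contrast(item, delta_a, se_a, delta_b, se_b, z=1.959964):
    """Wald-type DIF contrast between two independently estimated item parameters."""
    effect = delta_b - delta_a
    se = math.sqrt(se_a ** 2 + se_b ** 2)  # assumes the two estimates are independent
    lower, upper = effect - z * se, effect + z * se
    return DifResult(item, effect, lower, upper, lower > 0 or upper < 0)


# Hypothetical parameter estimates, NOT values from this study.
for res in [dif_contrast("hopeful", 0.40, 0.10, 0.10, 0.10),
            dif_contrast("sleep", 0.20, 0.10, 0.25, 0.10)]:
    flag = "*" if res.significant else ""
    print(f"{res.item}: {res.effect:+.2f} [{res.lower:+.2f}, {res.upper:+.2f}] {flag}")
```

An item is flagged exactly when its interval does not cover zero, which mirrors the bullets in Fig 5; whether a negative value means "easier for group A" or "easier for group B" depends on how the item parameters are parametrized.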
Discussion
The present study analysed the CES-D with a multidimensional IRT model in a sample representative of the general population. A one-dimensional solution was contrasted with a four-dimensional model reflecting the subscales as proposed by [1]. Interestingly, the fit of the one-dimensional model was already quite good. Only item 1 (bothered) showed an infit value outside the usual limits of acceptability, and a few thresholds of the remaining items reached statistical significance. The EAP reliability of this model was 0.8, which can be regarded as fairly satisfactory. Hence, we conclude that an overall score would deliver quite useful information. This finding supports Radloff's [1] recommendation to use the total score of the CES-D, albeit on a much more elaborate methodological foundation. A total score could be advantageous, for example, when using the CES-D as a screening instrument in a multistep diagnostic process, where a single total score with a certain cut-off value would be preferable.

However, the fit of the four-dimensional model was far (and significantly) better than that of the one-dimensional model. This is also in line with the meta-analysis by Shafer (2006), who likewise found “strongest support (…) for the four-factor structure of the CES-D” [97] (p. 136). The reliability coefficients revealed that subscales I (Positive Affect), II (Negative Affect), and III (Somatic Symptoms) achieve values in the vicinity of 0.7, which is satisfactory, while subscale IV (Interpersonal Difficulties) was mediocre at best (0.45). When comparing the reliability indices of the four- and the one-dimensional model, we have to keep in mind that reliability depends, amongst other things, on scale length. In the one-dimensional model, a common scale is built from all 20 items, whereas the subscales of the four-dimensional model are much shorter; their reliability indices are therefore lower for purely technical reasons.
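The dependence of reliability on scale length can be illustrated with the Spearman-Brown prophecy formula, ρ* = kρ / (1 + (k − 1)ρ), where k is the factor by which a scale is lengthened (k > 1) or shortened (k < 1). This is a classical test theory approximation that assumes parallel items, so it is only a rough guide for EAP reliabilities. The sketch below projects the one-dimensional value (0.8 for 20 items) onto the lengths of the two-item Interpersonal Difficulties and the four-item Positive Affect subscales, and inverts the formula to ask how many parallel items a scale with reliability 0.45 at two items would need to reach 0.7. The function names are hypothetical.

```python
import math


def spearman_brown(rho, k):
    """Projected reliability when a scale is lengthened by factor k (k < 1 shortens it)."""
    return k * rho / (1 + (k - 1) * rho)


def length_factor(rho, target):
    """Inverse Spearman-Brown: factor k needed to move reliability rho to target."""
    return target * (1 - rho) / (rho * (1 - target))


full_items, full_rel = 20, 0.8  # one-dimensional model, as reported in the text
for n_items in (2, 4):          # subscale IV and subscale I lengths
    k = n_items / full_items
    print(f"{n_items} items: projected reliability {spearman_brown(full_rel, k):.2f}")

# Subscale IV: 2 items with reliability 0.45; parallel items needed for 0.7?
k_needed = length_factor(0.45, 0.7)
print(f"items needed for 0.7: {math.ceil(2 * k_needed)}")
```

The projections are not predictions of the observed subscale values; they only show the direction and rough size of the length effect, and the inverse calculation suggests that a two-item scale starting at 0.45 would need to be roughly three times as long to reach 0.7 under the parallel-items assumption.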
Taking this into account, we consider the reliability indices of subscales I-III sufficiently high. The poor result of subscale IV implies that two items do not suffice to establish a meaningful subscale. Such short scales are better suited to screening in the first step of a two-step screening procedure, supporting a decision regarding further diagnostic procedures (cf. [98-100]). However, they are hardly suitable for the quantitative assessment of a trait. In the present case, Interpersonal Difficulties, which is a rather complex construct, would be measured with a score consisting of two items and a total value ranging from 0 to 6. Hence, this scale allows only a very limited interpretation, which should be made with great caution (if at all).

Comparing the present results to those of the previously reported IRT-based studies, we find largely consistent results as well as some interesting new ones. Generally, the one-dimensional model rendered seven items suspicious (five with significant infit plus two with significant thresholds only), whereas the four-dimensional model showed significant infit for only three items and suspicious thresholds for another three items. This is in line with the previous results, again showing the four-dimensional model to be superior to the one-dimensional model. We will, therefore, focus on this model in the discussion of item fit. Regarding subscale I (Positive Affect), item 4 (good) proved most problematic, as not only its infit measure but also its thresholds 2 and 3 were significant. Items 8 (hopeful) and 16 (enjoy) had one problematic threshold each. Interestingly, item 12 (happy) worked well here, in contrast to [66] and [70, 71]. For subscale II (Negative Affect), we find diverging results, as the suspicious items 3 (blues), 6 (depressed), and 9 (failure) have not been reported as problematic in previous studies.
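Item fit in the discussion above is judged by the infit statistic. As a sketch of what that statistic measures, the code below computes the information-weighted (infit) mean square for one polytomous item under the partial credit model: the sum of squared residuals between observed and model-expected scores, divided by the sum of the model variances. Person and threshold parameters are hypothetical and treated as known (in practice they are estimates), and the significance tests reported in the study rely on a standardized transformation of this mean square that is omitted here. The function names are illustrative only.

```python
import math


def pcm_probs(theta, thresholds):
    """Category probabilities of the partial credit model for one person and one item."""
    # Category x has log-weight sum_{j<=x} (theta - tau_j); category 0 has an empty sum.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + theta - tau)
    mx = max(logits)
    weights = [math.exp(l - mx) for l in logits]  # subtract max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def infit_mnsq(responses, thetas, thresholds):
    """Information-weighted mean square: sum of squared residuals / sum of variances."""
    sq_resid, info = 0.0, 0.0
    for x, theta in zip(responses, thetas):
        p = pcm_probs(theta, thresholds)
        expected = sum(k * pk for k, pk in enumerate(p))
        variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(p))
        sq_resid += (x - expected) ** 2
        info += variance
    return sq_resid / info


# Hypothetical item with CES-D-style response categories 0-3 (three thresholds).
tau = [-0.5, 0.3, 1.2]
thetas = [-1.5, -0.8, -0.2, 0.0, 0.4, 0.9, 1.5, 2.0]
responses = [0, 0, 1, 1, 1, 2, 2, 3]
print(f"infit MNSQ = {infit_mnsq(responses, thetas, tau):.2f}")
```

Values near 1 indicate that the observed responses scatter around their expectations about as much as the model predicts; values clearly above 1 indicate more noise than expected (underfit), values clearly below 1 indicate overly deterministic response patterns (overfit).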
Considering that these items cover core symptoms of depression, our results might reflect differences between the populations in which the CES-D was used. Our study covered the general population, where these statements may play a different role than in the specific populations of the previous studies. The DIF analysis discussed below will shed further light on this issue. For subscale III (Somatic Symptoms), the situation is fairly clear: item 11 (sleep) was suspicious, which is in line with four of the five reported studies. In contrast to [67,68,70,71], however, item 2 (appetite) was inconspicuous. Interestingly, the infit measures of the two items of subscale IV (Interpersonal Difficulties) were satisfactory in our study. Further details regarding the results and the discussion of our analyses can be found in the online supplemental material S1 File.

As a limitation, we have to take into account that the sample relies on a phone number database, which does not cover the entire population of a country. Therefore, slight peculiarities may still exist. However, we consider this limitation tolerable for two reasons: first, it is unlikely that our results are severely biased, as the database still covers an enormous portion of the entire population. Second, Rasch models are “sample independent” [101], which, in short, means that item parameter estimates do not depend on the person parameter distribution and vice versa [102,103]. We therefore regard our results as dependable.

In conclusion, the one-dimensional modelling approach proved clearly inferior to the multidimensional one. This is in line with previous studies: for example, Gay et al. [67] also used the PCM approach and found violations of the one-dimensionality assumption for all 20 items of the CES-D. Moreover, we found subscale IV (Interpersonal Difficulties) to exhibit severe limitations from a psychometric point of view.
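The “sample independence” invoked above follows from the structure of the Rasch model. In the dichotomous case, P(X = 1 | θ, δ) = exp(θ − δ) / (1 + exp(θ − δ)), so the log-odds of endorsing item i minus the log-odds of endorsing item j equals δ_j − δ_i for every person, whatever their θ. The short sketch below checks this numerically for a few hypothetical person and item parameters. It illustrates the property for the dichotomous model only (the polytomous models used in the study carry the same idea over to adjacent-category log-odds, log(P(x)/P(x − 1)) = θ − τ_x) and is not an estimation routine.

```python
import math


def rasch_prob(theta, delta):
    """Probability of endorsing a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


delta_i, delta_j = -0.4, 0.9          # hypothetical item parameters
for theta in (-2.0, 0.0, 1.5, 3.0):   # hypothetical person parameters
    diff = logit(rasch_prob(theta, delta_i)) - logit(rasch_prob(theta, delta_j))
    print(f"theta = {theta:+.1f}: logit difference = {diff:.3f}")
# Every line prints 1.300, i.e. delta_j - delta_i, independent of theta.
```

Because the person parameter cancels from the comparison of two items, the relative item locations do not depend on which persons happen to be in the sample, which is the property the argument above relies on.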
It should therefore be handled with care. Apart from this and a few limitations deserving further elaboration, the analyses yielded convincing results supporting the subscale structure of the CES-D. Hence, although not entirely dismissing the overall score, we advocate a subscale-based interpretation due to its superior psychometric qualities.