Identification of ambiguities in the 1994 chronic fatigue syndrome research case definition and recommendations for resolution.

Bart Stouten.   

Abstract

BACKGROUND: A recent article by Reeves et al. on the identification and resolution of ambiguities in the 1994 chronic fatigue syndrome (CFS) research case definition recommended the Checklist Individual Strength, the Chalder Fatigue Scale, and the Krupp Fatigue Severity Scale for evaluating fatigue in CFS studies. To be able to discriminate between various levels of severe fatigue, extreme scoring on the individual items of these questionnaires must not occur too often.
METHODS: We derived an expression that allows us to compute a lower bound for the number of items with the maximum item score for a given study from the reported mean scale score, the number of reported subjects, and the properties of the fatigue rating scale. Several CFS studies that used the recommended fatigue rating scales were selected from literature and analyzed to verify whether abundant extreme scoring had occurred.
RESULTS: Extreme scoring occurred on a large number of the items for all three recommended fatigue rating scales across several studies. The percentage of items with the maximum score exceeded 40% in several cases. The amount of extreme scoring for a certain scale varied from one study to another, which suggests heterogeneity in the selected subjects across studies.
CONCLUSION: Because all three instruments easily reach the extreme ends of their scales on a large number of the individual items, they do not accurately represent the severe fatigue that is characteristic for CFS. This should lead to serious questions about the validity and suitability of the Checklist Individual Strength, the Chalder Fatigue Scale, and the Krupp Fatigue Severity Scale for evaluating fatigue in CFS research.

Year:  2005        PMID: 15892882      PMCID: PMC1175848          DOI: 10.1186/1472-6963-5-37

Source DB:  PubMed          Journal:  BMC Health Serv Res        ISSN: 1472-6963            Impact factor:   2.655


Text

Since ambiguities in the 1994 chronic fatigue syndrome (CFS) research case definition [1] do indeed contribute to inconsistencies in the identification of cases, I welcome the publication by Reeves et al. [2] and the authors' efforts to resolve these problems. However, I have to express my deepest concerns about the three instruments that the authors have recommended for measuring fatigue in research studies on CFS. Because all three instruments easily reach the extreme ends of their scales on a large number of the individual items, they do not accurately represent the severe fatigue that is required to satisfy any of the published CFS research case definitions [1,3-5]. This low ceiling effect seriously distorts the fatigue measurements, which will inevitably result in bias and potentially misleading results.

To verify that the three recommended instruments do indeed exhibit low ceiling effects, one can study the mean scale scores that are reported in the literature. The recommended instruments were the Checklist Individual Strength (CIS) [6], the Chalder Fatigue Scale [7], and the Krupp Fatigue Severity Scale [8]. Each of these questionnaires consists of a fixed number of questions or statements. The answer to each question or the degree to which the participant agrees with a statement is scored on a certain scale. A question or statement with its corresponding scale is referred to as an item, and the assigned value corresponding to the participant's answer as the item score. A participant's fatigue rating scale score $Y$ is computed by summing his individual item scores.

We can derive a lower bound $L$ for the number of items with a maximum score for a given study by combining the reported mean fatigue rating scale score with the properties of the scale. Let us denote the reported number of subjects by $n$ and the mean scale score of these subjects by $\bar{Y}$. We consider instruments that consist of $N$ items, with $m$ possible scores for each item.
Each item score is an element of the set $\{S_1, S_2, \ldots, S_{m-1}, S_m\}$, where $S_1 < S_2 < \cdots < S_{m-1} < S_m$. Let $a_k$ denote the number of items, counted over all subjects, that received the score $S_k$, so that $\sum_{k=1}^{m} a_k = nN$. The sum of the item scores of all individuals together is equal to $n\bar{Y}$. Moreover, it is also equal to $\sum_{k=1}^{m} a_k S_k$. Since $S_k \le S_{m-1}$ for all $k < m$, it follows that

$$n\bar{Y} = \sum_{k=1}^{m} a_k S_k \le (nN - a_m)\,S_{m-1} + a_m S_m .$$

Hence, we find that the lower bound $L$ on $a_m$ that we were looking for is given by

$$L = \frac{n\,(\bar{Y} - N S_{m-1})}{S_m - S_{m-1}} .$$

If $L$ should be negative, which happens when $\bar{Y}$ is less than $N S_{m-1}$, then we set $L$ to zero. A lower bound for the percentage of items with the maximum score is $100\% \times L/(nN)$. Note that this percentage is independent of the number of subjects in the study.

Lower bounds $L$ for the number of items with the maximum score corresponding to data reported in literature were computed for each of the recommended fatigue rating scales. Because a recent Dutch article [9] recommended the Shortened Fatigue Questionnaire (SFQ) for assessing fatigue in clinical practice, this scale was also included in the analysis. The SFQ is simply a reduced version of the CIS 'fatigue severity' subscale, so the two are closely related. At least two articles per fatigue rating scale were selected on a rather arbitrary basis. Subjects fulfilled the CDC88-CFS [3], Oxford-CFS [5], CDC94-CFS [1], or CDC94-UCF (unexplained chronic fatigue, i.e. either CFS or idiopathic chronic fatigue) [1] criteria. In particular, the study by Vercoulen et al. [10] was selected because it contains detailed data on the distribution of the scores for each CIS subscale. The study by Alberts et al. [11] was included because it contained normative data for the SFQ. The study by Vermeulen et al. [12] was selected to also include data on the SFQ from a source other than the University Medical Centre Nijmegen. The article by Jason et al. [13] was selected because it was specifically concerned with the reliability and validity of a screening instrument for CFS. A recent Cochrane review [14] has investigated the relative effectiveness of exercise therapy and control treatments for CFS.
All four studies that were included in that review and that have already been published [15-18] were analyzed here (one study by Moss-Morris et al. that was included in the review was submitted but not yet published). The other studies were selected because they were easily available to the author. Baseline data for Friedberg and Krupp [19] and Deale et al. [20] were read from the graphs presented in the articles. Note that the 'matched ambulant group' in Van der Werf et al. [21] is a subset of the 'total ambulant group' in that study. Furthermore, the 'research participants' in Van der Werf et al. [22] are the same subjects as the 'total ambulant group' in [21].

The lower bounds for the number of items with the maximum score are presented in Table 1. From the lower bounds listed in the last column of the table we see that for several studies the percentage of items with the maximum score is larger than 40%. It is emphasized that the lower bounds were derived assuming a worst-case scenario for the distribution of the item scores, i.e. participants have either the highest or the second highest possible score on each item. Since the worst-case distribution is quite unrealistic, in reality the percentages of items with the maximum score are generally (even) higher than the values reported in the table. For example, according to the table it is not possible to conclude that extreme scoring occurred on the 'physical activity' subscale of the CIS in the study by Vercoulen et al. [10]. However, according to additional data listed in that article the 80th percentile of the 'physical activity' subscale is equal to the maximum possible subscale score of 3 × 7 = 21. Thus approximately 20% of the subjects reached the extreme score on all of their items, from which we can infer that extreme scoring occurred on at least 20% of the items.
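The lower bound described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the author's own code: the function and parameter names are assumptions introduced here, and the inputs in the usage lines are the values reported for the homebound group of van der Werf et al. [21] on the CIS 'fatigue severity' subscale (8 items, maximum item score 7, 18 subjects, mean 53.6).

```python
def max_score_lower_bound(n, mean_score, n_items, s_max, s_second):
    """Lower bound on the number of items scored at the maximum, over all subjects.

    n          -- reported number of subjects
    mean_score -- reported mean (sub)scale score
    n_items    -- number of items N in the (sub)scale
    s_max      -- highest possible item score
    s_second   -- second highest possible item score
    """
    # Worst case: every item not at the maximum sits at the second highest score.
    bound = n * (mean_score - n_items * s_second) / (s_max - s_second)
    # A negative value carries no information, so the bound is clipped at zero.
    return max(0.0, bound)


def max_score_lower_bound_pct(mean_score, n_items, s_max, s_second):
    """The same bound as a percentage of all n * N items; n cancels out."""
    per_subject = (mean_score - n_items * s_second) / (s_max - s_second)
    return max(0.0, 100.0 * per_subject / n_items)


# Hypothetical usage with the reported CIS 'fatigue severity' values.
print(round(max_score_lower_bound(18, 53.6, 8, 7, 6)))   # 101
print(round(max_score_lower_bound_pct(53.6, 8, 7, 6)))   # 70
```

Because the percentage does not depend on the number of subjects, the second function takes only the mean score and the scale properties.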
Table 1

Lower bounds for the number of items with the maximum score for several studies. $N$ is the number of items that constitute the (sub)scale, $S_m$ is the maximum possible individual item score, $n$ is the reported number of subjects, $\bar{Y}$ is the reported mean (sub)scale score, and $L$ is the derived lower bound for the number of items with the maximum score. The last column lists a lower bound for the percentage of items with the maximum score based on $L$. The second highest possible item score $S_{m-1}$ is equal to $S_m - 1$ for all considered (sub)scales.

| Scale; case definition; study / group or subscale | $N$ | $S_m$ | $n$ | $\bar{Y}$ | $L$ | Lower bound (%) |
|---|---|---|---|---|---|---|
| Checklist Individual Strength; Oxford-CFS, CDC94-UCF; Vercoulen et al. [10] | | | | | | |
| - fatigue severity subscale | 8 | 7 | 758 | 51.7 | 2805 | 46% |
| - physical activity subscale | 3 | 7 | 758 | 16.9 | 0 | 0% |
| - reduced motivation subscale | 4 | 7 | 758 | 17.0 | 0 | 0% |
| - concentration subscale | 5 | 7 | 758 | 27.5 | 0 | 0% |
| Checklist Individual Strength; CDC94-UCF; van der Werf et al. [21] | | | | | | |
| - homebound group; fatigue severity subscale | 8 | 7 | 18 | 53.6 | 101 | 70% |
| - matched ambulant group; fatigue severity subscale | 8 | 7 | 32 | 52.8 | 154 | 60% |
| - total ambulant group; fatigue severity subscale | 8 | 7 | 270 | 52.1 | 1107 | 51% |
| - homebound group; physical activity subscale | 3 | 7 | 18 | 15.8 | 0 | 0% |
| - matched ambulant group; physical activity subscale | 3 | 7 | 32 | 17.0 | 0 | 0% |
| - total ambulant group; physical activity subscale | 3 | 7 | 270 | 17.6 | 0 | 0% |
| - homebound group; concentration subscale | 5 | 7 | 15 | 22.4 | 0 | 0% |
| Shortened Fatigue Questionnaire; van der Werf et al. [22] | | | | | | |
| - survey respondents (Dutch ME-Association members) | 4 | 7 | 1955 | 23.9 | 0 | 0% |
| - research participants (CDC94-UCF) | 4 | 7 | 270 | 26.1 | 567 | 53% |
| Shortened Fatigue Questionnaire; Oxford-CFS, CDC94-UCF; Alberts et al. [11] | | | | | | |
| - normative data for CFS | 4 | 7 | 445 | 26 to 27 | 890 | 50% |
| Shortened Fatigue Questionnaire; CDC94-CFS; Vermeulen et al. [12] | | | | | | |
| - study group | 4 | 7 | 35 | 24.8 | 28 | 20% |
| Krupp Fatigue Severity Scale; CDC88-CFS; Friedberg et al. [19] | | | | | | |
| - treatment group | 9 | 7 | 22 | 58 | 88 | 44% |
| - no-treatment group | 9 | 7 | 22 | 51 | 0 | 0% |
| Krupp Fatigue Severity Scale; CDC88-CFS; DeLuca et al. [23] | | | | | | |
| - subjects with concurrent axis 1 psychiatric disorder | 9 | 7 | 12 | 58.5 | 54 | 50% |
| - subjects without concurrent psychiatric disorder | 9 | 7 | 21 | 57.2 | 67 | 36% |
| 14-item Chalder Fatigue Scale; Oxford-CFS; Wearden et al. [15] | | | | | | |
| - 'exercise and fluoxetine group' | 14 | 3 | 33 | 35.9 | 261 | 56% |
| - 'exercise and placebo group' | 14 | 3 | 34 | 33.7 | 194 | 41% |
| - 'exercise control and fluoxetine group' | 14 | 3 | 35 | 34.4 | 224 | 46% |
| - 'exercise control and placebo group' | 14 | 3 | 34 | 34.0 | 204 | 43% |
| 14-item Chalder Fatigue Scale; Oxford-CFS; Fulcher et al. [16] | | | | | | |
| - exercise group | 14 | 3 | 33 | 28.9 | 30 | 6% |
| - flexibility group | 14 | 3 | 33 | 30.5 | 83 | 18% |
| 11-item Chalder Fatigue Scale; CDC94-CFS; Jason et al. [13] | | | | | | |
| - physical subscale | 7 | 3 | 15 | 18.40 | 66 | 63% |
| - mental subscale | 4 | 3 | 15 | 9.13 | 17 | 28% |
| 11-item Chalder Fatigue Scale; CDC94-CFS; Wallman et al. [17] | | | | | | |
| - exercise group; physical subscale | 7 | 3 | 32 | 11.6 | 0 | 0% |
| - exercise group; mental subscale | 4 | 3 | 32 | 6.3 | 0 | 0% |
| - relaxation/flexibility group; physical subscale | 7 | 3 | 29 | 11.4 | 0 | 0% |
| - relaxation/flexibility group; mental subscale | 4 | 3 | 29 | 5.6 | 0 | 0% |
| 11-item bimodal Chalder Fatigue Scale; Oxford-CFS and CDC94-CFS; Deale et al. [20] | | | | | | |
| - cognitive behavior therapy group | 11 | 1 | 30 | 10.1 | 303 | 92% |
| - relaxation group | 11 | 1 | 30 | 9.3 | 279 | 85% |
| 11-item bimodal Chalder Fatigue Scale; Oxford-CFS; Powell et al. [18] | | | | | | |
| - control group | 11 | 1 | 34 | 10.6 | 360 | 96% |
| - minimum intervention group | 11 | 1 | 37 | 10.4 | 385 | 95% |
| - telephone intervention group | 11 | 1 | 39 | 9.9 | 386 | 90% |
| - maximum intervention group | 11 | 1 | 38 | 10.2 | 388 | 93% |

It should be clear that extreme scoring on a large number of items occurred for all scales across several studies. Only the 'concentration' and 'reduced motivation' subscales of the CIS did not show evidence of extreme scoring. That the amount of extreme scoring for a certain scale varies from one study to another suggests heterogeneity in the selected subjects across studies. Since the studies that were analyzed were selected on a rather arbitrary basis and not in a systematic way, the data in Table 1 should not be regarded as a true reflection of the CFS literature as a whole. The main point is that it does prove that abundant extreme scoring occurred for all the recommended fatigue rating scales in at least some of the CFS studies published in literature.

One only needs to glance at the three recommended instruments to understand why extreme scoring occurs so often. The CIS and the Krupp Fatigue Severity Scale consist of statements like "I feel tired" and "I am easily fatigued" that are scored on seven-point scales (from "yes, that is true" to "no, that is not true" for the CIS; from "strongly disagree" to "strongly agree" for the Krupp scale). Thus it does not matter whether a subject feels 'extremely tired,' 'severely tired' or 'just tired,' and is 'easily extremely fatigued,' 'easily severely fatigued' or 'easily fatigued;' he will score on the extreme end of the scale for all these cases.
A similar argument applies to the Chalder Fatigue Scale, where the participant has to choose one of four answers like "less than usual," "no more than usual," "more than usual" and "much more than usual" to questions such as "Do you feel weak?" For the continuous version of the Chalder scale answers are rated from 0 to 3; for the bimodal version the scoring system is {0, 0, 1, 1}, so that "more than usual" and "much more than usual" both receive the maximum item score. This explains why the bimodal version performs even worse than the continuous version.

Interestingly, the ceiling effect has been noted before by members of the International CFS Study Group in their individual publications: "The CIS-fatigue score [i.e. the 'fatigue severity' subscale of the CIS] involves an overall rating and in CFS samples easily reaches the extreme end of its scale" [21]; "a ceiling effect in the [Krupp] Fatigue Severity Scale may limit its utility to assess severe fatigue-related disability" [24]. A publication that examined the distribution of the 14 items of the Chalder Fatigue Scale in 136 CFS patients found that "Scores on eight items were normally distributed, but six items ('tiredness,' 'resting more,' 'lacking energy,' 'feeling weak,' 'feeling sleepy or drowsy,' and 'starts things without difficulty but gets weaker as goes on') were highly skewed with the majority of patients reaching the maximum score" [25].

Abundant extreme scoring and the corresponding inability to discriminate between various levels of severe fatigue can lead to misleading results in several ways. For example, van der Werf et al. [21] compared a group of 18 homebound CF(S) patients with a group of 32 matched ambulant CF(S) patients. No significant difference was found when fatigue was measured with the CIS 'fatigue severity' subscale (p = 0.39). But when fatigue was measured with the 'Daily Observed Fatigue' scale, which does not exhibit such a strong ceiling effect, it was concluded that the homebound group was significantly more fatigued than the ambulant group (p < 0.01).
Another problem occurs when studying the relation between the experienced level of fatigue and another factor such as social support. The correlation between the two will certainly be distorted if the fatigue measurement has a low ceiling effect and the other measure does not.

The most dangerous situation, however, arises when a scale with a low ceiling is used as a primary outcome measure to evaluate a CFS treatment. Consider five patients with a baseline CIS-fatigue score of 52 (e.g. the mean baseline score in Prins et al. [26] was 52.1). Suppose one patient improves (e.g. CIS-fatigue = 16 at follow-up) and the other four patients become extremely fatigued due to treatment (CIS-fatigue = 56 at follow-up, i.e. the maximum scale score). Even then, the overall mean has improved from 52 to 48, although 80% of the subjects are substantially more fatigued after treatment. In particular, participants who already have the maximum scale score at baseline can never get worse according to the 'recommended' fatigue rating scales. Systematic errors that may result in artificial treatment effects opposite to the true situation should be avoided at all times.

Unfortunately, the reasons for recommending the CIS, the Krupp and the Chalder scales in the main article text are limited to 'they have been used before,' 'normative data have been collected' and 'receiver-operating characteristics have been published.' In the Author's response to reviews (25 July 2003) that is available on the pre-publication site of the article, the authors remark that these are all 'standardized, validated, internationally accepted instruments' without giving any reference to support this statement. Although the recommended fatigue rating scales might indeed be accepted by numerous scientists of various nationalities, the evidence presented here must lead to serious questions about their validity and suitability for CFS research.
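The five-patient example above can be checked with a short Python sketch. The scores are the hypothetical values from the text, not study data, and the variable names are illustrative assumptions.

```python
CIS_FATIGUE_MAX = 56  # 8 items times the maximum item score of 7

# Hypothetical CIS-fatigue scores for five patients (higher = more fatigued).
baseline = [52, 52, 52, 52, 52]
follow_up = [16, 56, 56, 56, 56]  # one patient improves, four reach the ceiling

mean_baseline = sum(baseline) / len(baseline)
mean_follow_up = sum(follow_up) / len(follow_up)

# Count patients whose score went up, and patients sitting at the scale maximum.
n_worse = sum(after > before for before, after in zip(baseline, follow_up))
n_at_ceiling = sum(score == CIS_FATIGUE_MAX for score in follow_up)

print(mean_baseline, mean_follow_up)  # 52.0 48.0
print(n_worse, n_at_ceiling)          # 4 4
```

The mean drops by 4 points although four of the five patients score higher at follow-up. Each of those four can rise by at most 4 points before reaching the scale maximum, whereas the single improved patient falls by 36 points.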
Notably, the Profile of Fatigue-Related Symptoms (PFRS) that was developed more than a decade ago by Ray et al. [27,28] is a rating scale that does not have the flaw of a low ceiling in CFS samples. It consists of the four subscales 'Emotional Distress,' 'Cognitive Difficulty,' 'Fatigue' and 'Somatic Symptoms.' All subscales have high reliability and showed good convergence with comparison measures. Why was the PFRS not included in the authors' advice? To shed some light on the underlying scientific process that has ultimately led to their recommendations, I would like to ask the authors to make the workshop summaries and the focus group reports available.

Strictly speaking, the CIS, the Krupp Fatigue Severity Scale and the Chalder Fatigue Scale are all able to discriminate between CFS subjects and healthy subjects. Thus all three might indeed be used to improve the precision of CFS case ascertainment for research studies. However, if one really wishes to take CFS research forwards instead of three steps backwards, then it would be wise to abandon these low ceiling fatigue rating scales and start focussing on instruments that accurately represent the severe fatigue that is currently defined to be so characteristic of CFS.

Competing interests

The author declares that he has no competing interests.

Authors' contributions

BS wrote the paper and performed the analysis.

Pre-publication history

The pre-publication history for this paper can be accessed here:

References

1.  Cognitive functioning is impaired in patients with chronic fatigue syndrome devoid of psychiatric disease.

Authors:  J DeLuca; S K Johnson; S P Ellis; B H Natelson
Journal:  J Neurol Neurosurg Psychiatry       Date:  1997-02       Impact factor: 10.154

2.  ['Abbreviated fatigue questionnaire': a practical tool in the classification of fatigue].

Authors:  M Alberts; E M Smets; J H Vercoulen; B Garssen; G Bleijenberg
Journal:  Ned Tijdschr Geneeskd       Date:  1997-08-02

3.  Randomised controlled trial of graded exercise in patients with the chronic fatigue syndrome.

Authors:  K Y Fulcher; P D White
Journal:  BMJ       Date:  1997-06-07

4.  Development of a fatigue scale.

Authors:  T Chalder; G Berelowitz; T Pawlikowska; L Watts; S Wessely; D Wright; E P Wallace
Journal:  J Psychosom Res       Date:  1993       Impact factor: 3.006

5.  The chronic fatigue syndrome: a comprehensive approach to its definition and study. International Chronic Fatigue Syndrome Study Group.

Authors:  K Fukuda; S E Straus; I Hickie; M C Sharpe; J G Dobbins; A Komaroff
Journal:  Ann Intern Med       Date:  1994-12-15       Impact factor: 25.391

6.  Cognitive behavior therapy for chronic fatigue syndrome: a randomized controlled trial.

Authors:  A Deale; T Chalder; I Marks; S Wessely
Journal:  Am J Psychiatry       Date:  1997-03       Impact factor: 18.112

7.  Illness perception and symptom components in chronic fatigue syndrome.

Authors:  C Ray; W R Weir; S Cullen; S Phillips
Journal:  J Psychosom Res       Date:  1992-04       Impact factor: 3.006

8.  Randomised, double-blind, placebo-controlled treatment trial of fluoxetine and graded exercise for chronic fatigue syndrome.

Authors:  A J Wearden; R K Morriss; R Mullis; P L Strickland; D J Pearson; L Appleby; I T Campbell; J A Morris
Journal:  Br J Psychiatry       Date:  1998-06       Impact factor: 9.319

9.  Exploring the validity of the Chalder Fatigue scale in chronic fatigue syndrome.

Authors:  R K Morriss; A J Wearden; R Mullis
Journal:  J Psychosom Res       Date:  1998-11       Impact factor: 3.006

10.  A comparison of cognitive behavioral treatment for chronic fatigue syndrome and primary depression.

Authors:  F Friedberg; L B Krupp
Journal:  Clin Infect Dis       Date:  1994-01       Impact factor: 9.079
