Regina Kunz1, David Y von Allmen2, Renato Marelli3,4, Ulrike Hoffmann-Richter5,6, Joerg Jeger7, Ralph Mager3,8, Etienne Colomb9, Heinz J Schaad10, Monica Bachmann2, Nicole Vogel2, Jason W Busse11,12, Martin Eichhorn13, Oskar Bänziger14, Thomas Zumbrunn2, Wout E L de Boer2, Katrin Fischer15.
Abstract
BACKGROUND: Expert psychiatrists conducting work disability evaluations often disagree on work capacity (WC) when assessing the same patient. More structured and standardised evaluations focusing on function could improve agreement. The RELY studies aimed to establish the inter-rater reproducibility (reliability and agreement) of 'functional evaluations' in patients with mental disorders applying for disability benefits and to compare the effect of limited versus intensive expert training on reproducibility.
Keywords: Disability evaluation; Evidence-based medicine; Observer variation; Reproducibility of results; Return to work; Social security; Work capacity evaluation
Year: 2019 PMID: 31266488 PMCID: PMC6607597 DOI: 10.1186/s12888-019-2171-y
Source DB: PubMed Journal: BMC Psychiatry ISSN: 1471-244X Impact factor: 3.630
Inter-rater variability: Expectation of stakeholders. ‘Maximum acceptable difference’ in work capacity (WC) ratings between two experts performing a psychiatric evaluation in the same patient [6]
| What is the maximum difference in WC ratings that stakeholders would find acceptable when two experts independently assess the same patient? | Lawyers | Psychiatrists | Experts | Judges | Insurers |
|---|---|---|---|---|---|
| … in the current situation of performing evaluations, median difference (interquartile range, IQR) | 15% (10–20%) | 20% (10–25%) | 20% (10–25%) | 15% (10–20%) | 10% (10–20%) |
Legend: WC: work capacity; % WC = absolute percentage points in work capacity
How to interpret this table?
• 75% of treating and expert psychiatrists felt that the ‘maximum acceptable difference’ in WC ratings between two experts should be 25% corresponding to the upper limit of the IQR
• 75% of lawyers, judges and insurers and 50% of treating and expert psychiatrists felt that the ‘maximum acceptable difference’ in WC ratings between two experts should be 20% WC corresponding to the upper limit of the IQR (jurists) or the median (psychiatrists)
• 50% of lawyers, judges and insurers felt that the ‘maximum acceptable difference’ in WC ratings between two experts should be 15% corresponding to the median
• 25% of all stakeholders felt that the ‘maximum acceptable difference’ in WC ratings between two experts should be 10% corresponding to the lower limit of the IQR
Characteristics of psychiatrists and patients. Characteristics of psychiatrists and patients, including the main diagnoses of the patients’ mental disorder(s) with impact on work capacity. In RELY 1 (RELY 2), six (seven) patients had been assigned two main diagnoses
| | RELY 1 | RELY 2 |
|---|---|---|
| Psychiatrists | | |
| Age | | |
| 31–40/ 41–50/ 51–60/ > 60 years/ missing | 5/ 42/ 21/ 32/ 0% (c) | 3/ 40/ 31/ 20/ 6% |
| Gender | | |
| male | 79% | 83% |
| Experience | | |
| Years since board certification as psychiatrist, mean (SD) | 15.6 (9.7) | 15.8 (9.0) |
| Number of years performing disability evaluations, mean (SD) | 13.8 (9.2) | 12.4 (7.5) |
| Number of evaluations in the previous year | | |
| 0–4/ 5–20/ 21–50/ > 50/ missing | 0/ 10/ 32/ 58/ 0% | 6/ 17/ 31/ 40/ 6% |
| Time span from training to rating in days, mean (range) | 404 days (115–578) | 41 days (5–88) |
| Patients | | |
| Age, years: mean (SD) | 47.2 (8.6) | 48.6 (10.1) |
| Gender | | |
| male | 57% | 53% |
| Marital status | | |
| Unmarried/ married/ divorced/ missing | 20/ 40/ 40/ 0% | 20/ 28/ 45/ 8% |
| Nationality | | |
| Swiss/ others/ missing | 63/ 23/ 14% | 70/ 28/ 2% |
| Country of birth | | |
| Switzerland/ others/ missing | 67/ 27/ 6% | 75/ 23/ 2% |
| Severity of disorder (d) | | |
| mean (SD) | 5.3 (2.1) | 4.9 (1.8) |
| Typicality of study patient compared to other patients seen by the expert | | |
| frequent/ semifrequent/ rare | 36/ 44/ 20% | 27/ 56/ 17% |
| Number of diagnoses RELY 1: n=36; RELY 2: n=47 | | |
| Mood disorders (F3) | 26% | 40% |
| Neurotic, stress-related, somatoform disorders (F4) | 19% | 21% |
| thereof somatoform disorders (F45) | 6% | 15% |
| Organic (F0) | 11% | 9% |
| Disorders of adult personality and behaviour (F6) | 11% | 6% |
| Psychoactive substance use (F1) | 3% | 0% |
| Mental retardation (F7) | 0% | 2% |
| Behavioural and emotional disorders with onset in childhood (F9) | 0% | 2% |
| Patients without main diagnosis | 19% | 19% |
a) Twelve out of 19 psychiatrists performed interviews, all performed ratings. b) Eleven out of 35 psychiatrists performed interviews, all performed ratings. c) Percentages are rounded to the nearest whole number. d) Scale from 0 to 10; a higher score indicates a more severe disorder.
Fig. 1 Work capacity ratings in RELY 1. Thirty plots of the four psychiatrists’ ratings of the patients’ overall work capacity in their last job and in alternative work for 30 patients (c01 to c30). The dots on the left in each cell indicate the psychiatrists’ ratings in relation to the patients’ last job and the dots on the right indicate their ratings in relation to the patients’ alternative work. The lines linking the dots represent the change in each psychiatrist’s rating between last job and alternative work. Each psychiatrist has a different colour. Red frames mark the patients with maximally divergent expert ratings, for whom psychiatrists disagreed with each other by 100% about the extent of work capacity. This was the case for two patients in relation to their last job, and for five patients in relation to alternative work. For ‘alternative work’, one rating of patient 26 was excluded from the analysis due to a violation of the rating rules.
Reliability and agreement measures. Absolute and relative contributions of the different sources of variation to work capacity ratings: work capacity ratings, total variance and variance components (psychiatrists, patients, residuals), reliability and agreement parameters for ‘last job’ and ‘alternative work’ in RELY 1 and RELY 2
| Reference for WC | Study | WC mean (95% CI) | Total variance | Variance component: psychiatrists | Variance component: patients | Variance component: residuals | Reliability: ICCabs.agree (95% CI) | Agreement: proportion of WC ratings between two psychiatrists that differed by no more than the ‘maximum acceptable difference’ of 25 percentage points WC | Agreement: ‘standard error of measurement’ (95% CI), reported in natural units | Agreement: ‘maximum acceptable difference’ (95% CI), reported in natural units |
|---|---|---|---|---|---|---|---|---|---|---|
| Last job | RELY 1 | 43.6% (34.1–53.2) | 1092 | 263 (24%) | 414 (38%) | 415 (38%) | 0.38 (0.19–0.55) | 52.2% (94/180) | 26.0% WC (21.5–31.0) | 72.2% WC (59.5–86.0) |
| Last job | RELY 2 | 46.3% (39.9–52.6) | 1064 | 76 (7%) | 495 (47%) | 493 (46%) | 0.47 (0.29–0.61) | 61.7% (148/240) | 23.9% WC (20.8–27.0) | 66.1% WC (57.7–74.9) |
| Alternative work | RELY 1 | 55.0% (47.3–62.8) | 1060 | 88 (8%) | 457 (43%) | 515 (49%) | 0.43 (0.22–0.60) | 61.6% (109/177) | 24.6% WC (20.9–28.4) | 68.1% WC (57.9–78.8) |
| Alternative work | RELY 2 | 62.9% (57.7–68.0) | 669 | 50 (7%) | 292 (44%) | 328 (49%) | 0.44 (0.25–0.59) | 73.6% (170/231) | 19.4% WC (16.9–22.0) | 53.8% WC (46.8–61.0) |
Legend: WC: work capacity, % WC = absolute percentage points in work capacity, ICC = intraclass correlation coefficient (agreement variant); CI: confidence interval
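The reliability and agreement parameters in the table above can be recomputed from the variance components. The following is a minimal sketch, not the authors' analysis code. It assumes that the ICC is the patient variance divided by the total variance, that the ‘standard error of measurement’ is the square root of the psychiatrist plus residual variance, and that the corresponding ‘maximum acceptable difference’ is 1.96 · √2 times the ‘standard error of measurement’. These formulas are not stated in the text; they are standard definitions that reproduce the reported values. The function name `reliability_summary` is hypothetical.

```python
import math


def reliability_summary(var_psychiatrists, var_patients, var_residuals):
    """Return (ICC, SEM, difference limit) from three variance components.

    Assumed definitions (not stated in the source, but consistent with
    the reported values):
      ICC  = patient variance / total variance
      SEM  = sqrt(psychiatrist variance + residual variance)
      limit = 1.96 * sqrt(2) * SEM
    """
    total = var_psychiatrists + var_patients + var_residuals
    icc = var_patients / total
    sem = math.sqrt(var_psychiatrists + var_residuals)
    limit = 1.96 * math.sqrt(2) * sem
    return icc, sem, limit


# Variance components for 'last job' in RELY 1, taken from the table.
icc, sem, limit = reliability_summary(263, 414, 415)
print(f"ICC = {icc:.2f}, SEM = {sem:.1f}% WC, limit = {limit:.1f}% WC")
# prints: ICC = 0.38, SEM = 26.0% WC, limit = 72.2% WC
```

Under these assumptions the sketch returns 0.38, 26.0% WC and 72.2% WC for ‘last job’ in RELY 1, matching the point estimates in the table. The confidence intervals are not reproduced here.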
Fig. 2 Agreement between experts for varying levels of ‘maximum acceptable difference’. This figure demonstrates the impact of varying limits for the ‘maximum acceptable difference’ in WC ratings on the level of agreement. Agreement is defined as the proportion of comparisons (in percent, values in the bars) for which the WC ratings of any two experts differ by no more than a prespecified limit, here the ‘maximum acceptable difference’. We used the expectations from a recent survey among stakeholders to specify the limits for ‘maximum acceptable difference’ (see Table 1 [6]).
Illustrative examples from the stakeholder survey [6]. a Treating and expert psychiatrists defined 25 percentage points* in work capacity ratings between two experts as the ‘maximum acceptable difference’. In RELY 1, 61.6% (109/177) of comparisons would fall within this limit versus 73.6% (170/231) of comparisons in RELY 2. b Lawyers, judges and insurers defined 20 percentage points* in work capacity ratings between two experts as the ‘maximum acceptable difference’. In RELY 1, 59.3% (105/177) of comparisons would fall within this limit versus 65.4% (151/231) of comparisons in RELY 2.
* upper limit of the interquartile range (see Table 1)
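The agreement measure described for Fig. 2 is a proportion of pairwise comparisons. The sketch below illustrates that calculation. It assumes that every pair of experts who rated the same patient counts as one comparison and that a pair agrees when the ratings differ by no more than the limit. The ratings are invented for illustration and are not RELY data; the function name `proportion_within_limit` is hypothetical.

```python
from itertools import combinations


def proportion_within_limit(ratings_by_patient, limit):
    """Count expert pairs whose WC ratings differ by at most `limit`.

    `ratings_by_patient` holds one list of WC ratings (in % WC) per
    patient. Returns (pairs within limit, total pairs).
    """
    within = 0
    total = 0
    for ratings in ratings_by_patient:
        # Every unordered pair of ratings for one patient is a comparison.
        for first, second in combinations(ratings, 2):
            total += 1
            if abs(first - second) <= limit:
                within += 1
    return within, total


# Hypothetical ratings, NOT taken from the RELY studies.
hypothetical_ratings = [
    [50, 60, 70, 100],  # four experts: 6 comparisons
    [0, 20, 30, 40],    # four experts: 6 comparisons
    [80, 80, 100],      # one rating excluded: 3 comparisons
]

within, total = proportion_within_limit(hypothetical_ratings, limit=25)
print(f"{within}/{total} comparisons within 25 percentage points")
# prints: 10/15 comparisons within 25 percentage points
```

With four experts per patient, each patient contributes six comparisons, which would be consistent with the denominators reported for ‘last job’ (180 for 30 patients, 240 for 40 patients). Excluding one rating removes three comparisons, as in the third hypothetical patient.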
Expected versus observed agreement
| a) Expected by stakeholders: ‘maximum acceptable difference’ (a) | a) Expected by stakeholders: corresponding ‘standard error of measurement’ | b) Observed in the RELY studies: reference for WC | b) Observed in the RELY studies: study | b) Observed in the RELY studies: ‘standard error of measurement’ | b) Observed in the RELY studies: corresponding ‘maximum acceptable difference’ |
|---|---|---|---|---|---|
| 25% WC | 9.0% WC | Last job | RELY 1 | 26.0% WC | 72.2% WC |
| 20% WC | 7.2% WC | Last job | RELY 2 | 23.9% WC | 66.1% WC |
| 15% WC | 5.4% WC | Alternative job | RELY 1 | 24.6% WC | 68.1% WC |
| 10% WC | 3.6% WC | Alternative job | RELY 2 | 19.4% WC | 53.9% WC |
Legend: % WC = absolute percentage points in work capacity
a derived from the stakeholder survey (Table 1) [6]
This table compares the expectations of Swiss stakeholders of the agreement in WC ratings between two experts, expressed as ‘maximum acceptable difference’ (derived from the stakeholder survey, Table 1 [6]), with the agreement observed in the RELY studies, i.e., the variation between experts, expressed as ‘standard error of measurement’. Converting ‘maximum acceptable difference’ into ‘standard error of measurement’ and vice versa allows comparison of the level of agreement.
a) Agreement expected by stakeholders: Treating and expert psychiatrists considered a difference of 25% WC between two experts as the ‘maximum acceptable difference’ (for example, expert A: 60% WC; expert B: 35% WC or 85% WC), which corresponds to a variation between experts of 9.0% WC ‘standard error of measurement’.
If the ‘maximum acceptable difference’ between two experts were only 15% WC (for example, expert A: 60% WC; expert B: 45% WC or 75% WC), the corresponding variation between experts would be as low as 5.4% WC ‘standard error of measurement’.
b) Agreement observed in the RELY studies: RELY 2 (last job) found a level of agreement of 23.9% WC ‘standard error of measurement’, which corresponds to a (‘maximum acceptable’) difference in WC of 66.1% (for example, expert A: 30% WC; expert B: 96% WC).
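The conversion used in this table is not spelled out in the text. The worked equations below assume the standard relation between the ‘standard error of measurement’ (SEM) and the 95% limit for the difference between two raters, with the factor 1.96 · √2. This factor is an assumption, but it reproduces the tabulated values for the stakeholder expectations.

```latex
\begin{align*}
\text{maximum acceptable difference} &= 1.96 \cdot \sqrt{2} \cdot \text{SEM} \approx 2.77 \cdot \text{SEM} \\
\text{SEM} &= \frac{\text{maximum acceptable difference}}{1.96 \cdot \sqrt{2}} \\
\text{SEM}_{25\%\,\text{WC}} &\approx \frac{25}{2.77} \approx 9.0\%\ \text{WC} \\
\text{SEM}_{15\%\,\text{WC}} &\approx \frac{15}{2.77} \approx 5.4\%\ \text{WC}
\end{align*}
```

The same factor applied in the other direction turns an observed SEM into its corresponding difference between two experts.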
Fig. 3 Work capacity ratings in RELY 2. Forty plots of the four psychiatrists’ ratings of the patients’ overall work capacity in their last job and in alternative work for 40 patients (c01 to c40). Red frames mark the patients with maximally divergent ratings, for whom psychiatrists disagreed with each other by 100% about the extent of work capacity. This was the case for two patients in relation to their last job, and for no patient in relation to alternative work, which was the primary outcome. For ‘alternative work’, all ratings of patient 19 and one rating of patient 23 were excluded from the analysis due to violations of the rating rules.
Interaction of various sources of variance on reliability
Illustration of the interaction of various sources of variance and their impact on the reliability measure ICC.

$$\mathrm{ICC} = \frac{\sigma^2_{\text{patients}}}{\sigma^2_{\text{psychiatrists}} + \sigma^2_{\text{patients}} + \sigma^2_{\text{residuals}}}$$

Despite the reduction of total variance, the proportionate reduction of variance across all sources of variance results in an ICC of 0.45, identical to example 1. Although the variance was reduced by half, the ability to discriminate between patients in their ability to work did not change.
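The point that a proportionate reduction of all variance components leaves the ICC unchanged can be checked numerically. The sketch below assumes that the ICC is the patient variance divided by the sum of the psychiatrist, patient and residual variances. The variance values are hypothetical, chosen only so that the ICC equals 0.45; they are not the values of the original illustration.

```python
import math


def icc_absolute_agreement(var_psychiatrists, var_patients, var_residuals):
    """ICC as the share of total variance that is due to patients."""
    total = var_psychiatrists + var_patients + var_residuals
    return var_patients / total


# Hypothetical variance components (not from the source), ICC = 0.45.
before = {"var_psychiatrists": 100, "var_patients": 450, "var_residuals": 450}

# Halve every component: total variance drops from 1000 to 500.
after_halving = {name: value / 2 for name, value in before.items()}

icc_before = icc_absolute_agreement(**before)
icc_after = icc_absolute_agreement(**after_halving)

print(icc_before, icc_after)
# prints: 0.45 0.45
assert math.isclose(icc_before, icc_after)
```

Because numerator and denominator shrink by the same factor, the ratio stays the same. The ICC changes only when the share of patient variance in the total changes.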
Sources of variation. Potential factors for the three sources of variation (psychiatrists, patients, residuals) which may contribute to the variance in overall WC ratings (modified from [5, 30])
| Source of variation | Factors that may impact on the variance of overall work capacity |
|---|---|
| Psychiatrists | • Experience in disability evaluation • Knowledge about previous work • Structuring and prioritizing of information • Psychiatrists’ idiosyncrasies (e.g. leniency/strictness) |
| Patients | • Socio-demographic features • Diagnosis, severity of disorder • Compliance, including malingering • Skills in presenting their case • Symptom exaggeration |
| Residuals | • Interaction psychiatrists*patients • Interaction patient*last job; patient*‘alternative work’ |
| External factors | • Changes in legislation with impact on medical evaluations • Interferences of legal demands with medical judgements • Turn-over of staff involved in the studies • Overall attitude in society towards disability |