Antonia Zapf, Stefanie Castell, Lars Morawietz, André Karch.
Abstract
BACKGROUND: Reliability of measurements is a prerequisite of medical research. For nominal data, Fleiss' kappa (hereafter Fleiss' K) and Krippendorff's alpha are the most flexible of the available reliability measures with respect to the number of raters and categories. Our aim was to investigate which measures and which confidence intervals provide the best statistical properties for the assessment of inter-rater reliability in different situations.
Keywords: Bootstrap; Confidence interval; Fleiss’ K; Fleiss’ kappa; Inter-rater heterogeneity; Krippendorff’s alpha
Year: 2016 PMID: 27495131 PMCID: PMC4974794 DOI: 10.1186/s12874-016-0200-9
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Distribution of the true values in the 27 scenarios (independent of the sample size)
Fig. 2 Percentage bias for Krippendorff’s alpha and Fleiss’ K over all 81 scenarios. The dotted line indicates unbiasedness. On the left side the whole range from −100 to +100 % is displayed, on the right side the relevant excerpt is enlarged
Fig. 3 Two-sided empirical type-one error of the three approaches over all 81 scenarios. The dotted line indicates the theoretical coverage probability of 95 %
Fig. 4 Empirical coverage probability for the bootstrap intervals for Krippendorff’s alpha and Fleiss’ K with varying factors sample size (a), number of categories (b), number of raters (c) and strength of agreement (d). In each subplot, summary results over all levels of the other factors are displayed. The dashed line indicates the theoretical coverage probability of 95 %
Empirical coverage probability and bias in % of Krippendorff’s alpha and Fleiss’ K for simulated data with varying percentage of missing values
| Scenario | Missing values | Krippendorff’s alpha: coverage probability (%) | Krippendorff’s alpha: bias (%) | Fleiss’ K: coverage probability (%) | Fleiss’ K: bias (%) |
|---|---|---|---|---|---|
| I | 10 % | 95.4 | −0.82 | 94.4 | −0.78 |
| I | 25 % | 94.3 | −0.54 | 94.3 | −1.40 |
| I | 50 % | 93.9 | −0.67 | 40.8 | −25.93 |
| II | 10 % | 92.9 | 0.04 | 95.2 | −0.16 |
| II | 25 % | 94.7 | 0.03 | 67.7 | 8.27 |
| II | 50 % | 93.6 | 0.01 | 13.3 | −25.72 |
| III | 10 % | 95.1 | 0.01 | 93.8 | −0.26 |
| III | 25 % | 95.2 | −0.02 | 65.5 | −7.76 |
| III | 50 % | 94.8 | −0.13 | 33.3 | −23.72 |
The scenarios are defined as: I. N = 100, n = 5, k = 2, low agreement; II. N = 100, n = 5, k = 5, high agreement; III. N = 100, n = 10, k = 3, medium agreement (with N as number of observations, n as number of raters and k as number of categories)
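As a rough illustration of the two coefficients compared in the table, the following is a minimal sketch based on the standard textbook definitions for nominal data, not the authors' implementation. The function names (`category_counts`, `fleiss_kappa`, `krippendorff_alpha_nominal`) and the toy data are hypothetical. The sketch assumes that Fleiss' K is only defined when every subject has the same number of ratings, whereas Krippendorff's alpha uses every subject with at least two ratings; how the authors handled missing values for Fleiss' K is not stated in this excerpt.

```python
from collections import Counter


def category_counts(ratings):
    """For each subject, count how often each category was assigned.

    `ratings` is a list of rows (one per subject); each row holds one label
    per rater, with None marking a missing rating.
    """
    return [Counter(label for label in row if label is not None) for row in ratings]


def fleiss_kappa(ratings):
    """Fleiss' K for nominal data; every subject needs the same number of ratings."""
    counts = category_counts(ratings)
    n = sum(counts[0].values())  # ratings per subject
    if n < 2 or any(sum(c.values()) != n for c in counts):
        raise ValueError("Fleiss' K needs the same number (>= 2) of ratings per subject")
    N = len(counts)
    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    p_obs = sum(
        (sum(v * v for v in c.values()) - n) / (n * (n - 1)) for c in counts
    ) / N
    # Expected agreement from the pooled category proportions.
    totals = Counter()
    for c in counts:
        totals.update(c)
    p_exp = sum((t / (N * n)) ** 2 for t in totals.values())
    return (p_obs - p_exp) / (1 - p_exp)


def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha with the nominal metric; tolerates missing ratings."""
    # Only subjects with at least two ratings contain pairable values.
    counts = [c for c in category_counts(ratings) if sum(c.values()) >= 2]
    totals = Counter()
    for c in counts:
        totals.update(c)
    T = sum(totals.values())  # number of pairable ratings
    # Observed disagreement: disagreeing ordered pairs within each subject.
    d_obs = 0.0
    for c in counts:
        m = sum(c.values())
        d_obs += (m * m - sum(v * v for v in c.values())) / (m - 1)
    d_obs /= T
    # Expected disagreement: disagreeing ordered pairs among all pooled ratings.
    d_exp = (T * T - sum(t * t for t in totals.values())) / (T * (T - 1))
    return 1 - d_obs / d_exp


# Toy data: 4 subjects, 3 raters, 2 categories (hypothetical values).
complete = [["a", "a", "a"], ["a", "a", "b"], ["b", "b", "b"], ["a", "b", "b"]]
with_missing = [["a", "a", None]] + complete[1:]

k_complete = fleiss_kappa(complete)
a_complete = krippendorff_alpha_nominal(complete)
a_missing = krippendorff_alpha_nominal(with_missing)
```

On the complete toy data the sketch gives K = 1/3 and alpha = 7/18. With one rating removed, alpha is still computable (1/3), while `fleiss_kappa` raises a `ValueError` because the subjects no longer have equal numbers of ratings. Both functions divide by zero if only a single category occurs in the data; a production version would need to handle that case.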
Results of the case study (n = 50) of histopathological assessment of patients with mamma carcinoma rated by four independent and blinded readers. The six ordinal parameters were also assessed as if they had been measured on a nominal scale
| Parameter | Levels | Scale | Missing values (%) | Observed agreement | Fleiss’ K: point estimate | Fleiss’ K: asymptotic CI | Fleiss’ K: bootstrap CI | Krippendorff’s alpha: point estimate | Krippendorff’s alpha: bootstrap CI |
|---|---|---|---|---|---|---|---|---|---|
| Estrogen IRS | 2 | Nominal | 0 | 96 % | 0.88 | 0.76–0.99 | 0.65–1.00 | 0.88 | 0.66–1.00 |
| MIB-1 status | 2 | Nominal | 0 | 72 % | 0.66 | 0.55–0.78 | 0.51–0.80 | 0.66 | 0.51–0.80 |
| HER-2 status | 3 | Nominal | 0 | 86 % | 0.77 | 0.68–0.87 | 0.58–0.90 | 0.77 | 0.60–0.92 |
| Estrogen intensity | 4 | Nominal | 0 | 78 % | 0.62 | 0.54–0.71 | 0.42–0.78 | 0.62 | 0.40–0.79 |
| | | Ordinal | | | - | - | - | 0.74 | 0.51–0.80 |
| Estrogen group | 5 | Nominal | 0 | 86 % | 0.74 | 0.66–0.82 | 0.55–0.88 | 0.74 | 0.55–0.89 |
| | | Ordinal | | | - | - | - | 0.88 | 0.73–0.96 |
| Progesteron intensity | 4 | Nominal | 10 | 77 % | 0.74 | 0.63–0.84 | 0.56–0.89 | 0.69 | 0.53–0.83 |
| | | Ordinal | | | - | - | - | 0.86 | 0.75–0.93 |
| Progesteron group | 5 | Nominal | 0 | 44 % | 0.56 | 0.50–0.63 | 0.43–0.66 | 0.56 | 0.45–0.67 |
| | | Ordinal | | | - | - | - | 0.83 | 0.72–0.90 |
| HER-2 score | 4 | Nominal | 0 | 46 % | 0.52 | 0.45–0.60 | 0.38–0.64 | 0.52 | 0.37–0.65 |
| | | Ordinal | | | - | - | - | 0.70 | 0.53–0.82 |
| MIB-1 proliferation rate | 10 | Nominal | 0 | 10 % | 0.20 | 0.15–0.25 | 0.12–0.28 | 0.20 | 0.12–0.27 |
| | | Ordinal | | | - | - | - | 0.81 | 0.68–0.87 |
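The bootstrap intervals reported in the table can be approximated along the following lines. This is a sketch under stated assumptions, not the authors' procedure: it assumes that subjects (rows) are resampled with replacement and that a simple percentile interval is taken, neither of which is specified in this excerpt. The names `alpha_nominal` and `percentile_bootstrap_ci`, the default of 1000 resamples, and the toy data are hypothetical.

```python
import random
from collections import Counter


def alpha_nominal(ratings):
    """Krippendorff's alpha (nominal metric); None marks a missing rating."""
    units = [Counter(x for x in row if x is not None) for row in ratings]
    units = [u for u in units if sum(u.values()) >= 2]  # pairable subjects only
    totals = Counter()
    for u in units:
        totals.update(u)
    T = sum(totals.values())
    if T < 2:
        return float("nan")
    d_exp = (T * T - sum(t * t for t in totals.values())) / (T * (T - 1))
    if d_exp == 0:
        return float("nan")  # only one category present: alpha is undefined
    d_obs = 0.0
    for u in units:
        m = sum(u.values())
        d_obs += (m * m - sum(v * v for v in u.values())) / (m - 1)
    return 1 - (d_obs / T) / d_exp


def percentile_bootstrap_ci(ratings, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling subjects with replacement."""
    rng = random.Random(seed)
    N = len(ratings)
    estimates = []
    for _ in range(n_boot):
        resample = [ratings[rng.randrange(N)] for _ in range(N)]
        value = statistic(resample)
        if value == value:  # drop NaN results from degenerate resamples
            estimates.append(value)
    if not estimates:
        raise ValueError("statistic was undefined in every resample")
    estimates.sort()
    B = len(estimates)
    tail = (1 - level) / 2
    lower = estimates[int(tail * B)]
    upper = estimates[min(B - 1, int((1 - tail) * B))]
    return lower, upper


# Toy data: 10 subjects, 3 raters, 2 categories (hypothetical values).
toy = [["a", "a", "a"], ["a", "a", "b"], ["b", "b", "b"], ["a", "b", "b"],
       ["a", "a", "a"], ["b", "b", "b"], ["b", "b", None], ["a", "a", "a"],
       ["a", "b", "b"], ["b", "b", "b"]]
ci = percentile_bootstrap_ci(toy, alpha_nominal, n_boot=500, seed=1)
```

Passing the statistic as a function keeps the resampling loop independent of the coefficient, so the same routine could be applied to Fleiss' K on complete data. Resamples in which the coefficient is undefined are silently dropped here, which is a simplification that may bias the interval when such resamples are frequent.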