Abstract
Reliability and agreement are two notions of paramount importance in medical and behavioral sciences. They provide information about the quality of the measurements. When the scale is categorical, reliability and agreement can be quantified through different kappa coefficients. The present paper provides two simple alternatives to more advanced modeling techniques, which are not always adequate when the number of subjects is very limited, for comparing several dependent kappa coefficients obtained on multilevel data. This situation frequently arises in medical sciences, where multilevel data are common. Dependent kappa coefficients can result, for example, from the assessment of the same individuals on various occasions or from the comparison of each member of a group to an expert. The method is based on simple matrix calculations and is available in the R package "multiagree". Moreover, the statistical properties of the proposed method are studied using simulations. Although this paper focuses on kappa coefficients, the method easily extends to other statistical measures.
Keywords: Clustered bootstrap; Delta method; Hierarchical; Intraclass; Rater
Year: 2017 PMID: 28464322 PMCID: PMC5600130 DOI: 10.1002/bimj.201600093
Source DB: PubMed Journal: Biom J ISSN: 0323-3847 Impact factor: 2.207
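The clustered bootstrap named in the keywords can be illustrated with a minimal Python sketch. It is not the "multiagree" implementation: the function names `cohen_kappa` and `clustered_bootstrap_se`, the integer coding of categories as 0, 1, ..., and the default of 1000 resamples are all assumptions made for illustration. The idea shown is that whole clusters (for example, patients) are resampled with replacement, so that observations within a cluster stay together.

```python
import numpy as np

def cohen_kappa(r1, r2, n_categories):
    """Cohen's kappa for two raters; categories are coded 0..n_categories-1."""
    r1 = np.asarray(r1)
    r2 = np.asarray(r2)
    # Cross-classification table of the two raters' scores.
    table = np.zeros((n_categories, n_categories))
    np.add.at(table, (r1, r2), 1)
    p = table / table.sum()
    p_o = np.trace(p)                    # observed agreement
    p_e = p.sum(axis=1) @ p.sum(axis=0)  # agreement expected by chance
    if p_e == 1.0:
        return float("nan")              # kappa is undefined in this case
    return (p_o - p_e) / (1.0 - p_e)

def clustered_bootstrap_se(r1, r2, cluster_ids, n_categories,
                           n_boot=1000, seed=0):
    """Bootstrap standard error of kappa, resampling whole clusters."""
    rng = np.random.default_rng(seed)
    r1, r2, cluster_ids = map(np.asarray, (r1, r2, cluster_ids))
    clusters = np.unique(cluster_ids)
    rows_by_cluster = [np.flatnonzero(cluster_ids == c) for c in clusters]
    estimates = []
    for _ in range(n_boot):
        # Draw clusters with replacement; keep all rows of each drawn cluster.
        drawn = rng.integers(0, len(clusters), size=len(clusters))
        rows = np.concatenate([rows_by_cluster[d] for d in drawn])
        estimates.append(cohen_kappa(r1[rows], r2[rows], n_categories))
    # Resamples where kappa is undefined are ignored.
    return float(np.nanstd(estimates, ddof=1))
```

Resampling clusters rather than single observations is what lets the standard error reflect the multilevel structure; resampling rows independently would treat correlated observations as independent.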
Figure 1. (A) Mean squared error and (B) mean standard error of κ1, (C) mean correlation between κ1 and κ2, and (D) type I error for the comparison of two dependent multilevel kappa coefficients (κ1 and κ2) obtained on a binary scale when the observers' marginal probability distribution is uniform and the cluster size is equal to (left) and (right). The results obtained by the delta method ignoring the hierarchical structure (dashed lines) and by the multilevel delta method (plain line) are reported for (black), (middle gray), and (light gray) clusters. Results are depicted for different interobserver agreement values ().
Figure 2. Theoretical (plain line) and observed (dashed line) sampling distribution of the T² statistic when comparing two kappa coefficients equal to 0.8 obtained on a binary scale with uniform observers' marginal distribution when the intracluster kappa coefficient equals 0.5. In the left panel, there are observations per cluster and in the right panel there are observations per cluster. The number of clusters is 20 (upper panel), 30 (middle panel) and 100 (lower panel).
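The exact definition of the T² statistic is not given in this record. The Python sketch below shows one common form of such a statistic, a quadratic form in contrasts of the kappa estimates, under the assumption that a vector of estimates and an estimate of their covariance matrix are already available (for example, from a delta method or a bootstrap). The function name `t2_statistic` and the choice of successive-difference contrasts are illustrative, and the reference distribution used to obtain a p-value is deliberately left out.

```python
import numpy as np

def t2_statistic(kappas, cov):
    """Quadratic-form statistic for H0: all kappa coefficients are equal.

    kappas : sequence of L kappa estimates.
    cov    : L x L estimated covariance matrix of those estimates.
    """
    kappas = np.asarray(kappas, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = kappas.size
    # Contrast matrix of successive differences (kappa_i - kappa_{i+1}).
    contrast = np.zeros((n - 1, n))
    for i in range(n - 1):
        contrast[i, i] = 1.0
        contrast[i, i + 1] = -1.0
    diff = contrast @ kappas
    cov_diff = contrast @ cov @ contrast.T
    # diff' * inverse(cov_diff) * diff, computed without an explicit inverse.
    return float(diff @ np.linalg.solve(cov_diff, diff))
```

Because the covariance matrix enters through `contrast @ cov @ contrast.T`, positive correlation between the estimates shrinks the variance of their difference, which is why the dependence between kappa coefficients cannot be ignored when comparing them.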
FEES study (third part). Proportion of patients classified in the different FEES severity categories according to the liquid consistency (N=20). Test of the homogeneity of the dysphagia severity within patient
| Parameter | Liquid consistency | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | p‐value |
|---|---|---|---|---|---|---|---|
| VP | Thin | 0.41 | 0.55 | 0.04 | | | <0.0001 |
| | Thick | 0.20 | 0.43 | 0.37 | | | |
| PP | Thin | 0.57 | 0.41 | 0.02 | | | 0.41 |
| | Thick | 0.65 | 0.29 | 0.06 | | | |
| PD | Thin | 0.30 | 0.22 | 0.18 | 0.15 | 0.15 | 0.36 |
| | Thick | 0.18 | 0.44 | 0.18 | 0.12 | 0.09 | |
| PA | Thin | 0.35 | 0.60 | 0.05 | | | <0.0001 |
| | Thick | 0.62 | 0.34 | 0.04 | | | |
a) VP, valleculae pooling; PP, pyriform pooling; PD, piecemeal deglutition; PA, penetration/aspiration.
b) p‐value obtained by ordinal multilevel probit regression.
FEES study. Intraobserver agreement level (linear weighted kappa coefficient and standard errors obtained with the multilevel delta method and the clustered bootstrap method) for the 4 FEES variables. The p‐value refers to the comparison of the three multilevel dependent kappa coefficients
| Parameter | K | N | Liquid consistency | Observer 1 | Observer 2 | Consensus | p‐value |
|---|---|---|---|---|---|---|---|
| Delta method | | | | | | | |
| VP | 14 | 20 | All | 0.79 (0.11) | 0.94 (0.061) | 0.75 (0.12) | 0.25 |
| | 14 | 14 | Thin | 0.69 (0.21) | 1.00 (NA) | 0.66 (0.18) | NA |
| | 6 | 6 | Thick | 0.57 (0.39) | 0.57 (0.39) | 0.79 (0.21) | NA |
| PP | 19 | 29 | All | 0.56 (0.19) | 0.76 (0.11) | 1.00 (NA) | NA |
| PD | 20 | 35 | All | 0.93 (0.037) | 0.78 (0.081) | 0.94 (0.034) | 0.11 |
| PA | 18 | 26 | All | 0.62 (0.14) | 0.79 (0.098) | 0.88 (0.071) | 0.25 |
| | 13 | 13 | Thin | 0.84 (0.15) | 0.48 (0.23) | 0.80 (0.12) | 0.28 |
| | 13 | 13 | Thick | 0.35 (0.28) | 1.00 (NA) | 1.00 (NA) | NA |
| Clustered bootstrap method | | | | | | | |
| VP | 20 | 40 | All | 0.74 (0.12) | 0.94 (0.053) | 0.84 (0.074) | 0.13 |
| | 20 | 20 | Thin | 0.55 (0.23) | 1.00 (NA) | 0.71 (0.15) | NA |
| | 20 | 20 | Thick | 0.72 (0.29) | 0.82 (0.18) | 0.92 (0.080) | 0.62 |
| PP | 20 | 40 | All | 0.54 (0.18) | 0.75 (0.12) | 0.90 (0.075) | 0.13 |
| PD | 20 | 40 | All | 0.94 (0.036) | 0.76 (0.088) | 0.95 (0.030) | 0.10 |
| PA | 20 | 40 | All | 0.64 (0.13) | 0.81 (0.088) | 0.93 (0.049) | 0.14 |
| | 20 | 20 | Thin | 0.84 (0.15) | 0.58 (0.21) | 0.85 (0.092) | 0.41 |
| | 20 | 20 | Thick | 0.40 (0.25) | 1.00 (NA) | 1.00 (NA) | NA |
a) VP, valleculae pooling; PP, pyriform pooling; PD, piecemeal deglutition; PA, penetration/aspiration.
b) K is the number of patients.
c) N is the total number of observations.
d) A separate kappa coefficient was computed for each liquid consistency when dysphagia scores were different for thin and thick liquids.
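The agreement measure reported in this table is the linear weighted kappa coefficient. As an illustration of how a point estimate of that coefficient is computed for one pair of ratings, here is a minimal Python sketch. The function name `linear_weighted_kappa` and the coding of ordinal categories as 0, 1, ..., are assumptions, and the sketch does not compute the multilevel standard errors shown in parentheses.

```python
import numpy as np

def linear_weighted_kappa(r1, r2, n_categories):
    """Linear weighted kappa for two ratings on an ordinal scale.

    Categories are coded 0..n_categories-1 (n_categories >= 2).
    Weights are w_ij = 1 - |i - j| / (n_categories - 1), so full credit
    is given for exact agreement and partial credit for near misses.
    """
    r1 = np.asarray(r1)
    r2 = np.asarray(r2)
    # Cross-classification table of the two sets of scores.
    table = np.zeros((n_categories, n_categories))
    np.add.at(table, (r1, r2), 1)
    p = table / table.sum()
    i, j = np.indices((n_categories, n_categories))
    w = 1.0 - np.abs(i - j) / (n_categories - 1)
    p_o = (w * p).sum()  # weighted observed agreement
    # Weighted agreement expected by chance from the two marginals.
    p_e = (w * np.outer(p.sum(axis=1), p.sum(axis=0))).sum()
    if p_e == 1.0:
        return float("nan")  # coefficient is undefined in this case
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical scores on a 3-category scale, not data from the FEES study.
first_rating = [0, 1, 2, 2]
second_rating = [0, 1, 2, 1]
kappa_w = linear_weighted_kappa(first_rating, second_rating, 3)
```

With two categories the weights reduce to the identity matrix, so the function returns the unweighted Cohen's kappa.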
Figure 3. Differences between the kappa coefficients obtained by the observers individually and in consensus with the clustered bootstrap method (95% confidence ellipse). The square represents the bootstrap estimate and the triangle the origin point (0,0).