Ansgar Espeland, Nils Vetti, Jostein Kråkenes.
Abstract
BACKGROUND: Magnetic resonance imaging (MRI) studies typically employ either a single expert or multiple readers in collaboration to evaluate (read) the image results. However, no study has examined whether evaluations from multiple readers provide more reliable results than a single reader. We examined whether consistency in image interpretation by a single expert might be equal to the consistency of combined readings, defined as independent interpretations by two readers, where cases of disagreement were reconciled by consensus.
Year: 2013 PMID: 23327567 PMCID: PMC3626747 DOI: 10.1186/1471-2342-13-4
Source DB: PubMed Journal: BMC Med Imaging ISSN: 1471-2342 Impact factor: 1.930
Kappa values for agreement between initial and second evaluations
| Reading | | | | | | |
|---|---|---|---|---|---|---|
| Reader A | 0.59 (0.47, 0.70) | 0.62 (0.51, 0.73) | 0.63 (0.48, 0.78) | 0.50 (0.35, 0.64) | 0.51 (0.38, 0.64) | 0.59 (0.43, 0.75) |
| Reader B | 0.51 (0.39, 0.63) | 0.57 (0.44, 0.70) | 0.52 (0.35, 0.68) | 0.48 (0.32, 0.64) | 0.58 (0.45, 0.72) | 0.53 (0.38, 0.68) |
| A and B combined | 0.68 (0.56, 0.79) | 0.71 (0.61, 0.81) | 0.74 (0.61, 0.88) | 0.66 (0.53, 0.79) | 0.65 (0.54, 0.77) | 0.69 (0.54, 0.84) |
Values represent linearly weighted kappa values for scores 0, 1, 2, or 3 on each side of the spine, and unweighted kappa values for scores 2–3 vs. scores 0–1 per subject on any side (right and/or left), with 95% confidence intervals in parentheses, based on magnetic resonance imaging in 102 subjects.
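To illustrate the statistic reported in the table above, the following minimal Python sketch computes Cohen's kappa for paired scores, with linear weights by default. The function name `cohen_kappa`, its parameters, and the toy scores are assumptions made for illustration only; the study's software and per-subject data are not given here, and the sketch does not compute confidence intervals.

```python
def cohen_kappa(first, second, n_categories=4, linear_weights=True):
    """Cohen's kappa for two paired lists of integer scores 0..n_categories-1."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty score lists of equal length")
    k, n = n_categories, len(first)

    # Joint proportions: rows = first evaluation, columns = second evaluation.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        joint[a][b] += 1.0 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Linear weights penalise a disagreement in proportion to its distance;
        # the unweighted version penalises every disagreement equally.
        if linear_weights:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    observed = sum(penalty(i, j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when all scores fall in one category")
    return 1.0 - observed / expected


# Hypothetical scores for four ligament parts (not data from the study).
initial = [0, 1, 2, 3]
repeat = [0, 1, 2, 2]
weighted = cohen_kappa(initial, repeat)
unweighted = cohen_kappa(initial, repeat, linear_weights=False)
```

For the dichotomised comparison (scores 2–3 vs. scores 0–1), the same function could be called with scores recoded to 0 or 1, `n_categories=2`, and `linear_weights=False`.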
Figure 1. Scoring high signal intensities of alar and transverse ligaments on upper neck MRIs. Proton-density-weighted, fast-spin echo, 1.5 Tesla MRI sections were obtained in (A, D) coronal, (B, E) sagittal, and (C, F) axial planes. MRIs were from two healthy women, aged (A-C) 44 years and (D-F) 60 years. Broken lines mark the sagittal plane. (A-C) The transverse ligament is indicated with arrowheads. Its high signal intensity was scored 2 by reader A, 1 by reader B, and 2 by consensus; in the second evaluation, the same signal was scored 2 by both readers independently. The alar ligament is indicated with arrows. (A, B) Its high signal intensity was scored 2 by both readers independently; in the second evaluation, the same signal was scored 2 by reader A, 3 by reader B, and 2 by consensus. (D-F) The transverse ligament (arrowheads) and alar ligament (arrows) were scored 0 by both readers independently in both evaluations.
Prevalence of scores 2–3 on initial and second evaluations
| Reading | Initial (%) | Second (%) | P value§ | Initial (%) | Second (%) | P value§ |
|---|---|---|---|---|---|---|
| Reader A | 29.4 | 43.1 | 0.001 | 27.5 | 40.2 | 0.004 |
| Reader B | 22.5 | 40.2 | <0.001 | 25.5 | 46.1 | <0.001 |
| A and B combined | 31.4 | 39.2 | 0.039 | 28.4 | 36.3 | 0.057 |
The data are based on magnetic resonance imaging in 102 subjects.
* Highest assigned score when different intensities were noted on the right and left sides of the spine.
§ P value for difference in prevalence (%), based on McNemar’s test.
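McNemar's test compares paired proportions using only the discordant pairs, here subjects scored 2–3 in one evaluation but 0–1 in the other. The sketch below uses the exact (binomial) form of the test; the source does not state whether the exact or the chi-square version was applied, and the table reports only marginal prevalences, so the function name `mcnemar_exact` and the discordant counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value.

    b: pairs positive in the first evaluation only
    c: pairs positive in the second evaluation only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, each discordant pair falls either way with p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts (not taken from the study).
p_value = mcnemar_exact(0, 5)
```

Concordant pairs (subjects with the same dichotomised score in both evaluations) do not enter the calculation, which is why the P value cannot be recovered from the prevalences alone.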
Figure 2. Bland-Altman plots of the difference in MRI sum score between second and first evaluations. On 102 upper neck MRIs, two readers (A and B) independently graded the signal intensities of the alar and transverse ligaments on both sides of the spine (i.e., four ligament parts) on a scale of 0, 1, 2, or 3 and then resolved disagreements by consensus (combined reading). They also repeated the grading process (second evaluation). The sum of the scores for all ligament parts (MRI sum score, possible values 0–12) was calculated. In each plot, the difference between sum score 2 (second evaluation) and sum score 1 (first evaluation) is plotted against the mean of the two sum scores. Dotted lines represent the mean difference and the 95% limits of agreement. The plots show generally smaller differences for the combined reading and for the average of both readers’ scores (average reading) than for individual reader scores. The mean difference in sum score (with 95% limits of agreement) was −0.4 (−4.4, 3.7) for reader A, 1.2 (−2.7, 5.1) for reader B, 0.0 (−3.4, 3.4) for the combined reading, and 0.4 (−2.5, 3.3) for the average reading.
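The quantities summarised in this caption can be sketched as follows, assuming the conventional Bland-Altman definition of the 95% limits of agreement as the mean difference plus or minus 1.96 standard deviations of the differences (the caption does not state the multiplier). The function name `bland_altman` and the sum scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (mean difference, lower limit, upper limit) for paired scores.

    Differences are taken as second minus first, as in the figure.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired lists with at least two values")
    diffs = [s - f for f, s in zip(first, second)]
    mean_diff = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd


# Hypothetical MRI sum scores (0-12) for four subjects, not data from the study.
sum_score_1 = [0, 2, 4, 6]
sum_score_2 = [1, 2, 5, 6]
mean_diff, lower, upper = bland_altman(sum_score_1, sum_score_2)
```

Narrower limits of agreement indicate more consistent repeat scoring, which is how the combined and average readings compare favourably with the individual readers in the caption.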