B Mucci1, H Murray, A Downie, K Osborne. 1. Department of Radiology, South Glasgow University Hospitals, Southern General Hospital, Glasgow, Scotland, UK. Brian.Mucci@ggc.scot.nhs.uk
Abstract
OBJECTIVE: Discrepancy meetings are an important aspect of clinical governance. The Royal College of Radiologists has published advice on how to conduct meetings, suggesting that discrepancies are scored using the scale: 0=no error, 1=minor error, 2=moderate error and 3=major error. We have noticed variation in scores attributed to individual cases by radiologists and have sought to quantify the variation in scoring at our meetings.

METHODS: The scores from six discrepancy meetings totalling 161 scored events were collected. The reliability of scoring was measured using Fleiss' kappa, which calculates the degree of agreement in classification.

RESULTS: The number of cases rated at the six meetings ranged from 18 to 31 (mean 27). The number of raters ranged from 11 to 16 (mean 14). Only cases where all the raters scored were included in the analysis. The Fleiss' kappa statistic ranged from 0.12 to 0.20, and the mean kappa was 0.17 for the six meetings.

CONCLUSION: A kappa of 1.0 indicates perfect agreement above chance and 0.0 indicates agreement equal to chance. A rule of thumb is that a kappa ≥0.70 indicates adequate interrater agreement. Our mean result of 0.172 shows poor agreement between scorers. This could indicate a problem with the scoring system, or it may indicate a need for more formal training and agreement on how scores are applied.

ADVANCES IN KNOWLEDGE: Scoring of radiology discrepancies is highly subjective and shows poor interrater agreement.
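As an illustration of the statistic used in the Methods, the following is a minimal sketch of Fleiss' kappa in plain Python. The input layout (one row per case, with counts of raters assigning each of the four error scores 0–3) and the function name are assumptions for this example, not the authors' actual analysis code.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of rating counts.

    ratings: one row per case; each row holds the number of raters who
    assigned each category (e.g. error scores 0-3), and every row must
    sum to the same number of raters n.
    """
    N = len(ratings)        # number of cases
    n = sum(ratings[0])     # raters per case (constant across rows)
    k = len(ratings[0])     # number of categories

    # Observed agreement: per-case proportion of concordant rater pairs.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N

    # Chance agreement from the marginal category proportions.
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)


# Hypothetical data: three cases, three raters, two categories.
# All raters agree on every case, so kappa is exactly 1.0.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # → 1.0
```

With real meeting data, a kappa near 0.17, as reported above, would mean observed agreement barely exceeds what chance alone predicts.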