Nada Gawad1, Amanda Fowler2, Richard Mimeault3, Isabelle Raiche4. 1. Division of General Surgery, Department of Surgery, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; The Ottawa Hospital, Ottawa, Ontario, Canada; Department of Innovation in Medical Education (DIME), University of Ottawa, Ottawa, Ontario, Canada. Electronic address: ngawad@toh.ca. 2. Division of General Surgery, Department of Surgery, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; The Ottawa Hospital, Ottawa, Ontario, Canada; Department of Innovation in Medical Education (DIME), University of Ottawa, Ottawa, Ontario, Canada. Electronic address: afowlermd@gmail.com. 3. Division of General Surgery, Department of Surgery, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; The Ottawa Hospital, Ottawa, Ontario, Canada. 4. Division of General Surgery, Department of Surgery, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; The Ottawa Hospital, Ottawa, Ontario, Canada; Department of Innovation in Medical Education (DIME), University of Ottawa, Ottawa, Ontario, Canada.
Abstract
BACKGROUND: The inter-rater reliability (IRR) of laparoscopic skills assessment is usually determined in the context of motivated raters from a single subspecialty practice group with significant experience using similar tools. The purpose of this study was to determine the IRR among attending surgeons of different experience and practices, the extent of rater training that is necessary to achieve good IRR, and if rater training is retained over periods of nonuse. METHODS: In Part 1, 5 surgeons of different practice backgrounds assessed 3 laparoscopic cholecystectomy videos using the Global Operative Assessment of Laparoscopic Skills instrument. In Part 2, 2 of the surgeons assessed a total of 33 videos over 5 scoring sessions distributed across 6 months. They participated in 2 different training sessions, and retention was tested in the other 3 sessions. IRR was calculated for Parts 1 and 2 with an intraclass correlation (ICC) in a 2-way random-effects model. RESULTS: The ICC for Part 1 was poor (ICC = 0.26). In Part 2, the ICC was highest after each training session (scoring #1 ICC = 0.76, scoring #3 ICC = 0.74). The ICC was not retained 1.5 months after the brief video-based training session (scoring #2 ICC = -0.17). The ICC was retained 2.5 months after the in-depth discussion training session (scoring #4 ICC = 0.70), but not 4.5 months later (scoring #5 ICC = 0.04). CONCLUSIONS: Good IRR is not implicit among surgeons with varying backgrounds and experience. Good IRR can be achieved with different types of rater training, but the impact of rater training is lost in periods of nonuse. This suggests the need for further study of the IRR of technical skills assessment when performed by the wide variety of surgeon raters as is commonly encountered in the environment of postgraduate resident assessment.
BACKGROUND: The inter-rater reliability (IRR) of laparoscopic skills assessment is usually determined in the context of motivated raters from a single subspecialty practice group with significant experience using similar tools. The purpose of this study was to determine the IRR among attending surgeons of different experience and practices, the extent of rater training that is necessary to achieve good IRR, and if rater training is retained over periods of nonuse. METHODS: In Part 1, 5 surgeons of different practice backgrounds assessed 3 laparoscopic cholecystectomy videos using the Global Operative Assessment of Laparoscopic Skills instrument. In Part 2, 2 of the surgeons assessed a total of 33 videos over 5 scoring sessions distributed across 6 months. They participated in 2 different training sessions, and retention was tested in the other 3 sessions. IRR was calculated for Parts 1 and 2 with an intraclass correlation (ICC) in a 2-way random-effects model. RESULTS: The ICC for Part 1 was poor (ICC = 0.26). In Part 2, the ICC was highest after each training session (scoring #1 ICC = 0.76, scoring #3 ICC = 0.74). The ICC was not retained 1.5 months after the brief video-based training session (scoring #2 ICC = -0.17). The ICC was retained 2.5 months after the in-depth discussion training session (scoring #4 ICC = 0.70), but not 4.5 months later (scoring #5 ICC = 0.04). CONCLUSIONS: Good IRR is not implicit among surgeons with varying backgrounds and experience. Good IRR can be achieved with different types of rater training, but the impact of rater training is lost in periods of nonuse. This suggests the need for further study of the IRR of technical skills assessment when performed by the wide variety of surgeon raters as is commonly encountered in the environment of postgraduate resident assessment.