Timothy M Kowalewski1, Bryan Comstock2, Robert Sweet3, Cory Schaffhausen1, Ashleigh Menhadji4, Timothy Averch5, Geoffrey Box6, Timothy Brand7, Michael Ferrandino8, Jihad Kaouk9, Bodo Knudsen6, Jaime Landman10, Benjamin Lee11, Bradley F Schwartz12, Elspeth McDougall13, Thomas S Lendvay14. 1. Department of Mechanical Engineering, University of Minnesota, Minneapolis, Minnesota. 2. Department of Biostatistics, University of Washington, Seattle, Washington. 3. Department of Urology, University of Minnesota, Minneapolis, Minnesota. 4. Boston University School of Medicine, Boston, Massachusetts. 5. University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania. 6. Department of Urology, Ohio State University, Columbus, Ohio. 7. Madigan Army Medical Center, Uniformed Services University of the Health Sciences, Tacoma, Washington. 8. Department of Urology, Duke University, Durham, North Carolina. 9. Cleveland Clinic, Cleveland, Ohio. 10. Department of Urology, UC Irvine, Orange, California. 11. Department of Urology and Oncology, Tulane University, New Orleans, Louisiana. 12. Division of Urology, Southern Illinois University, Springfield, Illinois. 13. Department of Urologic Sciences, University of British Columbia, Vancouver, British Columbia, Canada. 14. Department of Urology, University of Washington and Seattle Children's Hospital, Seattle, Washington. Electronic address: thomas.lendvay@seattlechildrens.org.
Abstract
PURPOSE: The BLUS (Basic Laparoscopic Urologic Skills) consortium sought to address the construct validity of the BLUS tasks, and the wider problem of accurate, scalable and affordable skill evaluation, by investigating the concordance of 2 novel candidate methods, automated motion metrics and crowdsourcing, with faculty panel scores.

MATERIALS AND METHODS: A faculty panel of 5 surgeons and anonymous crowdworkers blindly reviewed a randomized sequence of a representative sample of 24 videos (12 pegboard and 12 suturing) extracted from the BLUS validation study (454 videos), using the GOALS (Global Operative Assessment of Laparoscopic Skills) survey tool with appended pass-fail anchors via the same web-based user interface. Pre-recorded motion metrics (e.g., tool path length and jerk cost) were available for each video. Cronbach's alpha, Pearson's r and ROC AUC statistics were used to evaluate concordance between the continuous scores, and between the pass-fail criteria, of the 3 groups: faculty, crowds and motion metrics.

RESULTS: Crowdworkers provided 1,840 ratings in approximately 48 hours, 60 times faster than the faculty panel. The inter-rater reliability of mean expert and crowd ratings was good (α=0.826). Pass-fail decisions derived from crowd scores achieved an AUC of 96.9% (95% CI 90.3-100; positive predictive value 100%, negative predictive value 89%). Motion metrics and crowd scores provided similar or nearly identical concordance with faculty panel ratings and pass-fail decisions.

CONCLUSIONS: The concordance of crowdsourcing with faculty panel ratings, together with its speed of review, is sufficiently high to merit further investigation alongside automated motion metrics. The overall agreement among faculty, motion metrics and crowdworkers provides evidence supporting the construct validity of 2 of the 4 BLUS tasks.
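The concordance statistics named in the methods (Cronbach's alpha, Pearson's r, ROC AUC) can be illustrated with a minimal sketch. The snippet below uses hypothetical, randomly generated GOALS-style scores and an assumed pass threshold; it is not the study's actual analysis code or data.

```python
# Illustrative sketch of the concordance statistics named in the methods:
# Cronbach's alpha, Pearson's r, and ROC AUC. Uses hypothetical rating data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

def cronbach_alpha(ratings):
    """ratings: (n_videos, n_raters) array of scores."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                       # number of raters (or rater groups)
    item_vars = ratings.var(axis=0, ddof=1)    # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical GOALS-style scores (range roughly 5-25) for 24 videos.
rng = np.random.default_rng(0)
faculty_scores = rng.uniform(5, 25, size=24)                  # mean faculty score per video
crowd_scores = faculty_scores + rng.normal(0, 1.5, size=24)   # crowd means, correlated with faculty

# Inter-rater reliability of the two mean score sets, treated as two "raters".
alpha = cronbach_alpha(np.column_stack([faculty_scores, crowd_scores]))

# Linear concordance between the continuous score sets.
r, p = pearsonr(faculty_scores, crowd_scores)

# Treat the faculty pass-fail decision as ground truth and ask how well the
# crowd score discriminates it (ROC AUC). The threshold of 15 is hypothetical.
faculty_pass = (faculty_scores >= 15).astype(int)
auc = roc_auc_score(faculty_pass, crowd_scores)

print(f"Cronbach's alpha = {alpha:.3f}, Pearson r = {r:.3f}, AUC = {auc:.3f}")
```

The same pattern extends to the motion metrics: each metric (e.g., tool path length) takes the place of the crowd score and is compared against the faculty ratings and pass-fail labels.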