Nikki L. Bibler Zaidi, Clarence D. Kreiter, Peris R. Castaneda, Jocelyn H. Schiller, Jun Yang, Cyril M. Grum, Maya M. Hammoud, Larry D. Gruppen, Sally A. Santen

Author affiliations: N.L.B. Zaidi is associate director, Evaluation and Assessment, Office of Medical Student Education, University of Michigan Medical School, Ann Arbor, Michigan. C.D. Kreiter is professor, Office of Consultation and Research in Medical Education, University of Iowa Carver College of Medicine, Iowa City, Iowa. P.R. Castaneda is a first-year medical student, University of Michigan Medical School, Ann Arbor, Michigan. J.H. Schiller is associate professor of pediatrics and director of pediatric student education, University of Michigan Medical School, Ann Arbor, Michigan. J. Yang is a statistician in evaluation and assessment, Office of Medical Student Education, University of Michigan Medical School, Ann Arbor, Michigan. C.M. Grum is professor and senior associate chair, Undergraduate Medical Education, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan. M.M. Hammoud is professor of obstetrics and gynecology and of medical education, University of Michigan Medical School, Ann Arbor, Michigan. L.D. Gruppen is professor, Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan. S.A. Santen is senior associate dean of assessment, evaluation, and scholarship, Virginia Commonwealth University School of Medicine, Richmond, Virginia. At the time this study was conducted, she was assistant dean, Educational Research and Quality Improvement, Office of Medical Student Education, and associate professor and chair of education, Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, Michigan.
Abstract
PURPOSE: Many factors influence the reliable assessment of medical students' competencies in the clerkships. The purpose of this study was to determine how many clerkship competency assessment scores were necessary to achieve an acceptable threshold of reliability.

METHOD: Clerkship student assessment data were collected during the 2015-2016 academic year as part of the medical school assessment program at the University of Michigan Medical School. Faculty and residents assigned competency assessment scores to third-year core clerkship students. Generalizability (G) and decision (D) studies were conducted using balanced, stratified, and random samples to examine the extent to which overall assessment scores could reliably differentiate between students' competency levels, both within and across clerkships.

RESULTS: In the across-clerkship model, residual error accounted for the largest proportion of variance (75%), whereas the variance attributed to the student and student-clerkship effects was much smaller (7% and 10.1%, respectively). D studies indicated that generalizability estimates for eight assessors within a clerkship varied across clerkships (G coefficients ranged from 0.000 to 0.795). Within clerkships, the number of assessors needed for optimal reliability ranged from 4 to 17.

CONCLUSIONS: Minimal reliability was found in competency assessment scores for half of the clerkships. The variability in reliability estimates across clerkships may be attributable to differences in scoring processes and assessor training. Other medical schools face similar variation in their assessments of clerkship students; the authors therefore hope this study will serve as a model for other institutions that wish to examine the reliability of their clerkship assessment scores.
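The D-study projections summarized above follow standard G-theory logic: once variance components have been estimated, the reliability expected for a hypothetical number of assessors is the universe-score variance divided by itself plus the error variance shrunk by that number. The display below is a minimal sketch of that computation for relative decisions within a fixed clerkship; the notation (σ²_s for student variance, σ²_sc for the student-clerkship interaction, σ²_res for residual error, n_a for the number of assessors per student) is generic G-theory notation assumed for illustration, not the authors' exact model specification.

\[
  E\rho^{2} \;=\; \frac{\sigma^{2}_{s} + \sigma^{2}_{sc}}
                       {\sigma^{2}_{s} + \sigma^{2}_{sc} + \sigma^{2}_{res}/n_{a}}
\]

As an illustrative check, plugging the pooled variance proportions reported above (7% student, 10.1% student-clerkship, 75% residual) into this formula with n_a = 8 gives 17.1 / (17.1 + 75/8) ≈ 0.65. This pooled figure sits within the reported per-clerkship range of 0.000 to 0.795, which reflects how widely the clerkship-specific variance components differed from one another.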