Matthew Kelleher, Benjamin Kinnear, Dana Sall, Daniel Schumacher, Daniel P Schauer, Eric J Warm, Ben Kelcey

M. Kelleher is assistant professor of medicine and pediatrics and associate program director, Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, Ohio. B. Kinnear is assistant professor of medicine and pediatrics and associate program director, Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, Ohio. D. Sall is assistant professor of medicine and associate program director, Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, Ohio. D. Schumacher is associate professor of pediatrics, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio. D.P. Schauer is associate professor of medicine and associate program director, Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, Ohio; ORCID: https://orcid.org/0000-0003-3264-8154. E.J. Warm is professor of medicine and program director, Department of Internal Medicine, University of Cincinnati College of Medicine, Cincinnati, Ohio; ORCID: https://orcid.org/0000-0002-6088-2434. B. Kelcey is associate professor of quantitative research methodologies, Department of Education, University of Cincinnati, Cincinnati, Ohio.
Abstract
PURPOSE: To examine the reliability and attributable facets of variance within an entrustment-derived workplace-based assessment system.

METHOD: Faculty at the University of Cincinnati Medical Center internal medicine residency program (a 3-year program) assessed residents on discrete workplace-based skills called observable practice activities (OPAs), rated on an entrustment scale. Ratings from July 2012 to December 2016 were analyzed using generalizability theory (G-theory) and a decision-study framework. Because standard G-theory applications assume that mean ratings are stable over time, an assumption entrustment ratings violate as residents develop, a series of time-specific G-theory analyses and an overall longitudinal G-theory analysis were conducted to detail the reliability of the ratings and the sources of variance.

RESULTS: During the study period, 395 faculty members gave 166,686 OPA entrustment ratings to 253 different residents. Raters were the largest identified source of variance in both the time-specific and the overall longitudinal G-theory analyses (37% and 23%, respectively). Residents were the second largest identified source of variance in the time-specific G-theory analyses (19%). Reliability was approximately 0.40 for a typical month of assessment (27 different OPAs, 2 raters, and 1-2 rotations) and 0.63 for the full sequence of ratings over 36 months. A decision study showed that doubling the number of raters and assessments each month could improve the reliability over 36 months to 0.76.

CONCLUSIONS: Ratings from the full 36 months of the examined program of assessment showed fair reliability. Increasing the number of raters and assessments per month could improve reliability, highlighting the need for multiple observations by multiple faculty raters.
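The decision study reported above projects how reliability changes under alternative sampling designs. In G-theory, the generalizability coefficient for relative decisions is the object-of-measurement variance divided by itself plus the error variance, with each error component averaged over the number of sampled conditions of its facet. The sketch below is a minimal illustration of that arithmetic, assuming a hypothetical resident x rater x OPA design with made-up variance components; the function name g_coefficient and all numeric values are assumptions, not the study's estimates, and a single-occasion sketch like this will not reproduce the longitudinal 36-month figures.

```python
# Minimal decision-study sketch (assumed design and variance components;
# not the study's actual estimates). For relative decisions in G-theory:
#   E(rho^2) = var_p / (var_p + sum_k var_k / n_k)
# where var_p is the resident (object-of-measurement) variance and each
# error component var_k is divided by the number of sampled conditions n_k.

def g_coefficient(var_p, error_components, facet_sizes):
    """Project the G coefficient for a design with the given facet sizes."""
    error = sum(var / facet_sizes[name] for name, var in error_components.items())
    return var_p / (var_p + error)

# Hypothetical variance components for a resident x rater x OPA design.
var_p = 0.19
errors = {
    "rater_x_resident": 0.50,  # rater-by-resident interaction
    "opa_x_resident": 0.80,    # OPA-by-resident interaction
    "residual": 0.50,          # three-way interaction plus unexplained error
}

# A typical month in the study's design: 2 raters, 27 OPAs.
month = g_coefficient(var_p, errors,
                      {"rater_x_resident": 2, "opa_x_resident": 27, "residual": 2 * 27})

# Doubling raters and assessments, as in the reported decision study.
doubled = g_coefficient(var_p, errors,
                        {"rater_x_resident": 4, "opa_x_resident": 54, "residual": 4 * 54})

print(f"monthly G: {month:.2f}, doubled design: {doubled:.2f}")
# With these assumed components: monthly G ~ 0.40, doubled design ~ 0.57.
```

Pooling many such months of ratings shrinks the error term further, which is why the reliability of the full 36-month sequence (0.63, or a projected 0.76 with the doubled design) exceeds the single-month figure.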