
The Reliability of Rater Variability.

Andrea Gingerich.   

Abstract

Year:  2020        PMID: 32322348      PMCID: PMC7161334          DOI: 10.4300/JGME-D-20-00163.1

Source DB:  PubMed          Journal:  J Grad Med Educ        ISSN: 1949-8357


Simulation is well recognized for its affordances for collecting important assessment information.1–3 In this issue of the Journal of Graduate Medical Education, Andler and colleagues present validity evidence for leveraging the simulation context to provide assessment data for entrustable professional activities (EPAs).4 Unfortunately, they found their validity argument hampered by an unexpected finding: despite good interrater reliability for entrustment-based simulation assessment ratings and fair interrater reliability for similar entrustment-based clinical practice ratings, there were no correlations between them. The authors ponder possible explanations for this troublesome finding and suggest that since there was only “fair agreement at best” for some of the behaviors, rater variability might be an explanation for the lack of correlations. The havoc that rater variability has inflicted on reliability measures has spurred several of us to study its sources.5–7 Aspects not directly related to the rating scale, such as the context in which assessments take place8–10 and variations in rater interpretations and judgments,11–14 have been identified as contributors to rater variability. Thus, I am not surprised to see rater variability when an entrustment scale is used. In fact, as evidence of rater variability continues to accumulate along with increasing recognition of the “plurality of interpretations,”15 we may be reaching a point where rater variability can no longer be framed as an unexpected finding. Yet, this raises a conundrum for the assessment field. Accepting rater variability as the status quo would complicate plans for collecting and interpreting validity evidence.16 How can we demonstrate a relationship to other variables without reliability? In part, the simulation context might offer a solution to this by providing a stable context where raters can be standardized and, themselves, judged. 
Almost 2 decades ago,17 medical educators were directed to techniques that optimize interrater reliability: figure skating judging.18,19 Although it is not free from bias,20 figure skating judging has design features that support rater agreement and interrater reliability. First, judges are trained and monitored so that those who share consensus are invited to continue judging and outlier judges are not. Second, the assessed performance lasts only a few minutes and comprises a specified number of predictable elements that can be performed in a limited number of ways, with each variation assigned a corresponding score. Third, the assessment task is the judge's only task: they directly observe a series of similar performances, assign ratings immediately after each one, and then note how their ratings compare with those of the other judges. These design features are incompatible with almost every aspect of workplace-based assessment; however, the simulation context does offer similar affordances.21 Yet I wonder how design features that aim to minimize all types of unwanted variability would align with the very notion of entrustment-based assessment. Entrustment, entrustability, and level-of-supervision scales promised to better mimic the judgments and decisions supervisors make in the workplace.22,23 The construct of entrustment resonated with the essence of supervision.24,25 It offered to systematically track subjective expert judgments of overall performance to complement the competence judgments based on observed behaviors that were already being collected and analyzed.26 I was excited about using entrustment as the basis for workplace-based assessment because it had the potential to capture indescribable and nuanced aspects of being a physician that resisted measurement.27 I am not an expert in simulation, so I will pose the question to those who are: How well does entrustment align with what raters are doing, thinking, and feeling during simulation?
It is not a straightforward question and leads to other difficult questions. What does it mean to entrust in simulation and how does it compare to entrusting in the workplace? For example, is the construct of entrustment most aligned when the rater is exposed to the competing priorities of patient safety, learner autonomy, clinical care, teaching obligations, service efficiency, and learner welfare? In other words, must the rater be simultaneously engaged with supervising the trainee for the construct of entrustment to be sufficiently aligned? If so, which forms of simulation offer that context for raters? In proposing that entrustment can be used as the basis for assessment in simulation, the latest research of Andler and colleagues offers the opportunity to contemplate the ideal constructs for simulation assessment. If we were without contemporary pressures to provide data to inform EPA decisions, would we choose to use entrustment in this context? The assessment construct of feedback provision (like that used by field notes28) may be better aligned than entrustment if the rater's role in simulation is akin to that of a coach helping a trainee to learn during practice. Or perhaps the predictable and controllable conditions of simulation, similar to that of figure skating judging, could be used to optimize measurement of competence through standardized assessment of performance. Entrustment-based assessment is rapidly becoming an important component of our assessment tool kit, but I cannot imagine a post-psychometric utopia where all assessments are based on entrustment. All of our assessment modalities (including EPAs), assessment constructs (including entrustment), and assessment contexts (including simulation) have strengths to be leveraged and limitations to be accommodated. 
Fortunately, the limitations of one can be strategically addressed by the strengths of another with its own limitations supported by yet another context or construct or modality.29 I am eager to see how the strengths of the simulation assessment context and the construct of entrustment can contribute to an assessment program that is more informative than the sum of its parts.
References (24 in total; 10 shown):

Review 1.  Cognitive, social and environmental sources of bias in clinical performance ratings.

Authors:  Reed G Williams; Debra A Klamen; William C McGaghie
Journal:  Teach Learn Med       Date:  2003       Impact factor: 2.414

2.  A model for programmatic assessment fit for purpose.

Authors:  C P M van der Vleuten; L W T Schuwirth; E W Driessen; J Dijkstra; D Tigelaar; L K J Baartman; J van Tartwijk
Journal:  Med Teach       Date:  2012       Impact factor: 3.650

3.  Selecting and Simplifying: Rater Performance and Behavior When Considering Multiple Competencies.

Authors:  Walter Tavares; Shiphra Ginsburg; Kevin W Eva
Journal:  Teach Learn Med       Date:  2016       Impact factor: 2.414

4.  Competency-based postgraduate training: can we bridge the gap between theory and clinical practice?

Authors:  Olle ten Cate; Fedde Scheele
Journal:  Acad Med       Date:  2007-06       Impact factor: 6.893

5.  Competency-based achievement system: using formative feedback to teach and assess family medicine residents' skills.

Authors:  Shelley Ross; Cheryl N Poth; Michel Donoff; Paul Humphries; Ivan Steiner; Shirley Schipper; Fred Janke; Darren Nichols
Journal:  Can Fam Physician       Date:  2011-09       Impact factor: 3.275

6.  What if the 'trust' in entrustable were a social judgement?

Authors:  Andrea Gingerich
Journal:  Med Educ       Date:  2015-08       Impact factor: 6.251

Review 7.  Factors Influencing Mini-CEX Rater Judgments and Their Practical Implications: A Systematic Literature Review.

Authors:  Victor Lee; Keira Brain; Jenepher Martin
Journal:  Acad Med       Date:  2017-06       Impact factor: 6.893

8.  Nuts and bolts of entrustable professional activities.

Authors:  Olle Ten Cate
Journal:  J Grad Med Educ       Date:  2013-03

Review 9.  Rater cognition: review and integration of research findings.

Authors:  Geneviève Gauthier; Christina St-Onge; Walter Tavares
Journal:  Med Educ       Date:  2016-05       Impact factor: 6.251

10.  Simulation versus real-world performance: a direct comparison of emergency medicine resident resuscitation entrustment scoring.

Authors:  Kristen Weersink; Andrew K Hall; Jessica Rich; Adam Szulewski; J Damon Dagnone
Journal:  Adv Simul (Lond)       Date:  2019-05-01
Cited by (1 in total):

1.  Determining influence, interaction and causality of contrast and sequence effects in objective structured clinical exams.

Authors:  Peter Yeates; Alice Moult; Natalie Cope; Gareth McCray; Richard Fuller; Robert McKinley
Journal:  Med Educ       Date:  2022-01-11       Impact factor: 7.647

