
Four ways to get a grip on making robust decisions from workplace-based assessments.

Abstract

Synthesising the results of workplace-based assessments to inform robust decisions is seen as both important and difficult. Concerns about failing to fail the trainee who is not ready to proceed have drawn disproportionate attention to assessors. This paper proposes a model for a more systems-based view so that the value of the assessor's judgement is incorporated while preserving the value and robustness of collective decision-making. Our experience has shown it can facilitate robust decisions in some of the more difficult areas, such as professionalism.


Year: 2022 PMID: 35875436 PMCID: PMC9297242 DOI: 10.36834/cmej.73361

Source DB: PubMed Journal: Can Med Educ J ISSN: 1923-1202

Introduction

Combining the results of workplace-based assessments (WBAs) to inform high-stakes decisions is a goal for many educational programmes but one that can be difficult to realise. WBAs fill an important gap in any programme-level assessment blueprint but seem more susceptible to failure to fail than point-in-time exams. A brief observation of a trainee in a workplace will not give a robust picture of that trainee’s overall competence; the long case is a well-described example of this, with reliability estimates of around 0.3-0.4.[1] This is not to say long cases have no value, but basing a high-stakes decision on one episode alone is hazardous. Greater generalisability requires synthesis of several observations.[1] The problem of reliance on single observations also makes sense intuitively. Trainees recognise this problem and realise that the result they get on such assessments may be as much due to serendipity and circumstance as it is to their own ability; such a perception is well backed up by data.[1],[2] This serendipity and circumstance likely also contribute to trainees’ high levels of anxiety, because much of the result they obtain may be out of their control. Examiners also recognise this: they see a performance on a single observation but realise that performance may not actually reflect the trainee’s overall competence. If the performance is below expectations, assessors often feel some hesitancy in giving a fail, and rightly so.[3] If assessors are uncertain about a trainee’s performance and are then pushed to make a judgement, they will tend to give the benefit of the doubt in the trainee’s favour.[3] If such reluctance happens on several WBAs, then a series of substandard or borderline performances can all be recorded as passes. As a result, trying to make anything other than a pass decision at the end of a year becomes indefensible.
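As a rough illustration of why a single observation is a weak basis for a decision, the sketch below applies the Spearman-Brown formula from classical test theory to a single-observation reliability of 0.35, an illustrative value picked from within the 0.3-0.4 range quoted above. This is a simplification under stated assumptions: it treats observations as interchangeable, and it is not the generalisability analysis used in the cited studies. The function name is hypothetical.

```python
def spearman_brown(single_reliability: float, n_observations: int) -> float:
    """Projected reliability of the combined result of n comparable observations.

    Assumes every observation has the same reliability and measures the same
    construct (classical test theory); real WBAs only approximate this.
    """
    if not 0.0 < single_reliability < 1.0:
        raise ValueError("single_reliability must lie strictly between 0 and 1")
    if n_observations < 1:
        raise ValueError("n_observations must be at least 1")
    r = single_reliability
    return n_observations * r / (1 + (n_observations - 1) * r)


if __name__ == "__main__":
    # 0.35 is an illustrative value, not a figure reported for any one study.
    for n in (1, 2, 4, 8):
        print(n, round(spearman_brown(0.35, n), 2))
```

Under these assumptions the projected reliability rises from 0.35 for one observation to roughly 0.81 for eight, which is consistent with the argument that several observations need to be synthesised.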
High-stakes decisions are more trustworthy if they are made by a group of people and after incorporating several observations.[3],[4]

This susceptibility to failure to fail illustrates the ‘black ice’ and highlights a paradox: the attributes that worry people the most and that get graduates into trouble (such as professionalism) are the attributes that seem to be less well represented on an assessment blueprint that focuses just on point-in-time examinations.[5] A second paradox is that even if we do assess those important attributes, they are also possibly the things that are most susceptible to ‘failure to fail.’ WBAs are necessary, but they also have problems: (1) they can be seen as a ‘tick box exercise’ that trainees somehow just have to do;[6] (2) people hardly ever fail them;[7] and (3) they are too subjective to reliably inform decisions.[8] Oftentimes the blame for such shortcomings has been levelled at the assessor or supervisor, whereas a shift to focussing on the system is needed.[9]

This paper presents reflections on some strategies used in the medical programme at Otago University in New Zealand to achieve robust high-stakes decisions from in-course workplace-based assessments. An unanticipated benefit was that the system works to highlight many aspects of professionalism, so that issues with professionalism now carry the highest risk of trainees failing a year (odds ratio of 17.2).[10] There are four key components to success.

1. Make expectations clear

Criterion-referenced assessments based on pre-specified standards are now commonplace in health professional education. Nevertheless, having explicit expectations is a precursor to making judgements about whether a trainee has met those standards. These expectations could be in the form of expected learning outcomes or codes of conduct.[11] The programme must then communicate these to the trainees in multiple ways to ensure understanding.

2. Make space to convey supervisor uncertainty

We need a system whereby it becomes easy for examiners to convey their uncertainty while encouraging them to provide rich data to inform learning (and to inform decision-making). At Otago, we tried introducing a ‘borderline’ category to address this but found that unworkable. This was because many (staff and students) still perceived ‘borderline’ as being equivalent to ‘fail’ and it was therefore not used often. Even when used, a series of borderline results was usually insufficient to help make robust end-of-year decisions, largely because any accompanying narrative was not sufficiently informative. As a result, we made a very simple change that had a surprisingly effective result: we introduced the category of ‘conditional pass’ (CP).[10] See Box 1 for a worked example. A CP is usually determined based on a series of assessment episodes or WBAs. It can be used after single assessments, but “unsure” or “insufficient data” can also be used for single observations. Put simply, if an observed performance or series of performances is not a clear pass and not clearly irremediable, then we encourage the assessor or supervisor to give a CP. However, in doing so, they must also provide the conditions needed for that trainee to pass.
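The rule just described can be written down as a small data model. This is a minimal sketch, not the system used at Otago: the class and function names are hypothetical, and mapping a clearly irremediable performance to a fail is an assumption, since the text does not state that outcome explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    CONDITIONAL_PASS = "conditional pass"
    FAIL = "fail"


@dataclass
class AssessmentRecord:
    trainee: str
    outcome: Outcome
    # Free-text conditions the trainee must meet to convert a CP to a pass.
    conditions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A CP without conditions gives the trainee nothing to work towards.
        if self.outcome is Outcome.CONDITIONAL_PASS and not self.conditions:
            raise ValueError("a conditional pass must state the conditions to pass")


def suggest_outcome(clear_pass: bool, clearly_irremediable: bool) -> Outcome:
    """Not a clear pass and not clearly irremediable means conditional pass."""
    if clear_pass:
        return Outcome.PASS
    if clearly_irremediable:
        return Outcome.FAIL  # assumption: the text does not name this outcome
    return Outcome.CONDITIONAL_PASS
```

Requiring at least one condition at the moment the record is created mirrors the requirement that assessors who give a CP also say what the trainee needs to do to pass.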

3. Use future-focused language that aligns with assessor views

Invoking a language that is familiar to clinicians has been shown to be helpful in other areas of assessment.[12] In cases of uncertainty, and when a CP is given, assessors are asked “what would the trainee need to do in order for you to be confident (or less uncertain) that they’re ready to progress?” Alternatively, “what would reassure you that the trainee is now up to standard?” Questions posed this way are future-focused and seem easier to answer than asking assessors to explain what the trainee did wrong, where the problem lies or how the deficit might best be filled. The answers the assessors give to these questions are usually easily converted into the conditions of a conditional pass. There are some similarities here with diagnostic uncertainty: if we are not sure about a patient’s diagnosis, we do not expect doctors to guess. Instead, we suggest they gather more information, and that they be purposeful about the sort of information they seek. Likewise, if we are uncertain about a trainee’s competence, we should not push the assessor to decide prematurely. Instead, we should ask what additional information they would need to make them less uncertain. We know that feedback is more effective for learning when presented as a narrative rather than numbers.[13] In addition, the wording of uncertainty through ‘conditions to pass’ is like the wording of learning outcomes. This future-directed narrative contrasts with past-focused narratives that emphasise deficits. It also places more of the onus on the learner to show competence rather than on a competence committee to show incompetence. Learners must show the conditions have been met, thereby positively affecting learning. Most CPs are converted to a pass in our institution, which attests to learning happening.

4. Facilitate shared decision-making

Any trainee with a CP is reviewed at a progress meeting (held four times per year) where a group of staff check other indicators of that trainee’s performance, determine whether the conditions are sufficiently clear and confirm that the trainee has been informed. This process is like what competency committees do as part of competency-based medical education (CBME). Shared decision-making has four benefits. Firstly, decisions by groups generally mitigate implicit biases of individuals.[14] Secondly, knowing that these conditions are reviewed by others, and shared along with any other concerns about that trainee, means that any high-stakes decision no longer rests only on one assessor’s shoulders. In turn, that creates a forum that makes it easier for assessors to express their concerns and uncertainties. Thirdly, peer review helps determine if the conditions are specific enough and if they are reasonable for the student’s stage of learning, thus adding another component of defensibility. Finally, it contributes to supervisors having a shared mental model of what a satisfactory performance looks like.

Closing remarks

The contention is that ‘conditional pass’ works synergistically with WBAs to act on concerns, particularly those related to professionalism, and partially addresses the failure to fail problem. The automatic creation of a paper trail, being clear about future expectations, being clear about reasons for decisions and creating joint decision-making have all contributed to assessors being able to convey uncertainty, which creates greater confidence and defensibility in identifying students who are not yet ready to progress. The system itself also promotes change. Regular meetings of staff to discuss CPs have been associated with a slow cultural shift in staff viewpoints from using assessments solely to judge students towards using them to help students. Such meetings are also surprisingly well attended, attesting to staff perceiving them as a good use of their time. The process has similarities to performance reviews of underperforming employees: fair process includes making the deficits known, creating clear goals, and then giving an opportunity for the employee to meet those goals. Failure to improve then becomes a defensible basis to back up high-stakes decisions. This list should not be regarded as a full programmatic assessment model or as a full CBME model. This is because the trainees who receive the most attention around setting and meeting personal learning goals are only those who are not clearly progressing as expected. When time and resources are short, one could argue that most time and energy should be devoted to the few trainees who give us the most concern, while also setting up the circumstances in which learning can occur.

Box 1. Worked example

A trainee is offhand with patients on two occasions. Each episode on its own would not cause concern but a pattern may be emerging. Mitigating factors from the trainee’s perspective are noted. The breaches are not judged to be sufficient to fail the trainee but, if unchecked, could be seen as serious. The trainee is therefore given a CP, with the condition that they need to show evidence of reflecting the patient’s concerns during the consultation. As the year progresses, and at subsequent progress meetings, the information related to the student’s CP is noted. If the trainee responds to that feedback, they pass. If they do not, they fail. A fail decision is defensible as there is an automatic paper trail.

Trainees must meet all conditions of a conditional pass before they are permitted to sit any end-of-year examinations. There is one exception: if the conditions that need to be met are likely to be assessed in the examinations, then the trainee is permitted to sit, as passing the exam would provide us with sufficient evidence that the conditions were met. An example here could be having a sufficient knowledge base. However, there are many conditions we are not confident an exam could or would pick up.
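The examination-eligibility rule at the end of the worked example can be sketched as a single check. This is an illustrative sketch only: the `Condition` fields and the `may_sit_exam` function are hypothetical names, and whether a condition is "likely to be assessed in the examinations" is treated here as a judgement recorded by staff rather than something the code can infer.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Condition:
    description: str
    met: bool = False
    # Staff judgement: would passing the exam itself show this condition is met?
    assessed_in_exam: bool = False


def may_sit_exam(conditions: Iterable[Condition]) -> bool:
    """Every condition must be met already, or be one the exam itself will test."""
    return all(c.met or c.assessed_in_exam for c in conditions)


if __name__ == "__main__":
    knowledge = Condition("sufficient knowledge base", assessed_in_exam=True)
    rapport = Condition("reflects the patient's concerns during the consultation")
    print(may_sit_exam([knowledge]))           # exception applies
    print(may_sit_exam([knowledge, rapport]))  # unmet condition an exam would miss
```

A trainee with no outstanding conditions passes the check trivially, because `all` over an empty collection is true.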

References (14 in total)

1. Making sense of work-based assessment: ask the right questions, in the right way, about the right things, of the right people.

Authors: Jim Crossley; Brian Jolly
Journal: Med Educ Date: 2012-01 Impact factor: 6.251

2. Trainee doctors' views on workplace-based assessments: Are they just a tick box exercise?

Authors: Taruna Bindal; David Wall; Helen M Goodyear
Journal: Med Teach Date: 2011 Impact factor: 3.650

3. Failure to fail: the perspectives of clinical supervisors.

Authors: Nancy L Dudek; Meridith B Marks; Glenn Regehr
Journal: Acad Med Date: 2005-10 Impact factor: 6.893

4. The reliability of long and short cases undertaken as practice for a summative examination.

Authors: T J Wilkinson; L J D'Orsogna; B R Nair; S J Judd; C M Frampton
Journal: Intern Med J Date: 2010-08 Impact factor: 2.048

5. Reliability of the long case.

Authors: Tim J Wilkinson; Peter J Campbell; Stephen J Judd
Journal: Med Educ Date: 2008-09 Impact factor: 6.251

6. Workplace-based assessment: a review of user perceptions and strategies to address the identified shortcomings. (Review)

Authors: Jonathan Massie; Jason M Ali
Journal: Adv Health Sci Educ Theory Pract Date: 2015-05-24 Impact factor: 3.853

7. What we measure … and what we should measure in medical education.

Authors: John R Boulet; Steven J Durning
Journal: Med Educ Date: 2018-09-14 Impact factor: 6.251

8. Decision-making bias in assessment: the effect of aggregating objective information and anecdote.

Authors: Mike J Tweed; Mark Thompson-Fawcett; Tim J Wilkinson
Journal: Med Teach Date: 2013-06-28 Impact factor: 3.650

9. Joining the dots: conditional pass and programmatic assessment enhances recognition of problems with professionalism and factors hampering student progress.

Authors: Tim J Wilkinson; Mike J Tweed; Tony G Egan; Anthony N Ali; Jan M McKenzie; MaryLeigh Moore; Joy R Rudland
Journal: BMC Med Educ Date: 2011-06-07 Impact factor: 2.463

10. Student progress decision-making in programmatic assessment: can we extrapolate from clinical decision-making and jury decision-making? (Review)

Authors: Mike Tweed; Tim Wilkinson
Journal: BMC Med Educ Date: 2019-05-30 Impact factor: 2.463