Simulation as a high stakes assessment tool in emergency medicine.

Fenton O'Leary.

Abstract

The Australasian College for Emergency Medicine (ACEM) will introduce high stakes simulation-based summative assessment in the form of Objective Structured Clinical Examinations (OSCEs) into the Fellowship Examination from 2015. Miller's model emphasises that, no matter how realistic the simulation, it is still a simulation and examinees do not necessarily behave as in real life. OSCEs are suitable for assessing the CanMEDS domains of Medical Expert, Communicator, Collaborator and Manager. However, the need to validate the OSCE is emphasised by conflicting evidence on correlation with long-term faculty assessments, between essential actions checklists and global assessment scores and variable interrater reliability within individual OSCE stations and for crisis resource management skills. Although OSCEs can be a valid, reliable and acceptable assessment tool, the onus is on the examining body to ensure construct validity and high interrater reliability.
© 2015 The Author. Emergency Medicine Australasia published by Wiley Publishing Asia Pty Ltd on behalf of Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.

Keywords:  educational measurement; patient simulation

Year:  2015        PMID: 25690440      PMCID: PMC4415593          DOI: 10.1111/1742-6723.12370

Source DB:  PubMed          Journal:  Emerg Med Australas        ISSN: 1742-6723            Impact factor:   2.151


The Australasian College for Emergency Medicine (ACEM) will introduce high stakes simulation-based summative assessment in the form of Objective Structured Clinical Examinations (OSCEs) into the Fellowship examination from 2015, while at the same time introducing Work-Based Assessments (WBAs) that specifically exclude simulation as an assessment tool. The aim of the present paper is to describe the current evidence for the use of simulation in high stakes assessment. OSCEs have been used internationally in the United States Medical Licensing Examination (USMLE) since 2004 and in the College of Emergency Medicine (UK) Fellowship exam since 2005.

Waterson has described the process the NSW Medical Board followed to incorporate simulation methods into its high stakes Performance Assessment Program, and in particular describes the need for reliability, validity and acceptability. Reliability is the extent to which test results will be reproduced by different raters (interrater reliability), by a candidate on different occasions (test–retest reliability) or by subsets of the same test (internal consistency). Reliability is considered best achieved by standardising as many components of the test as possible (e.g. case design and delivery, scoring criteria and raters). Validity is the extent to which the outcomes of a test faithfully reflect the tasks and traits it is intended to assess. A test has high content validity if experts have been adequately consulted about relevant competencies; construct validity if candidates' scores increase with their level of experience; and concurrent validity if candidates' scores on the test correlate with those of other tests that assess the same tasks and traits.
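The construct validity check described above (scores rising with level of experience) is, in practice, a correlation calculation. A minimal sketch, using entirely hypothetical candidate data (not drawn from any cited study), computes the Pearson correlation between postgraduate year and OSCE station score:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy))          # covariance numerator
    var_x = sum(a * a for a in dx)                     # sum of squared deviations
    var_y = sum(b * b for b in dy)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: postgraduate year vs mean OSCE station score (%).
pgy = [1, 2, 3, 4, 5]
score = [55, 60, 68, 71, 79]

print(round(pearson_r(pgy, score), 2))  # 0.99: scores rise with seniority
```

A strong positive coefficient like this would support construct validity; a coefficient near zero would suggest the stations are not discriminating by experience.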
High validity is achieved in part by retaining as many variables as possible so as to preserve the realism of the cases.1 Miller's hierarchical performance assessment model emphasises that, no matter how realistic the simulation, it remains a simulation, and examinees do not necessarily behave as they would in real life. This reinforces the need for summative WBAs during the period of training, scored against a recognised curriculum (Fig. 1).
Figure 1

Miller's prism of clinical competence (aka Miller's pyramid). Based on the work by Miller GE, The Assessment of Clinical Skills/Competence/Performance; Acad. Med. 1990; 65(9): 63–67. Adapted by Drs. R. Mehay and R. Burns, UK (Jan 2009).

Boulet has described some obstacles to overcome before summative, simulation-based assessments can be used in emergency medicine (EM), including: choosing the appropriate simulation tasks, developing appropriate metrics, assessing the reliability of test scores and providing the evidence to support the validity of test score inferences. He particularly emphasises that, as skills are assessed more broadly and combined with new technologies, new metrics and supporting studies are needed to establish their accuracy.2 Table 1 describes the types of criteria used in scoring simulations,3 and Table 2 describes the barriers to using simulation in testing.4
Table 1

Criteria for scoring simulations3

Criteria | Example
Explicit Process | Case-specific checklist used in a standardised patient chest-pain station to record the history findings obtained and physical examination manoeuvres performed by an examinee
Implicit Process | Global judgment of a physician-rater observing an examinee's work with an integrated simulator in a trauma-type scenario
Explicit Outcome | Indicators of overall patient status (alive vs dead; complications; physiological indicators) at the conclusion of a computer-based clinical simulation
Implicit Outcome | Global judgment of a physician-rater inspecting the sutures made by an examinee on a skin pad
Combined Criteria | Task-specific checklist of explicit process and outcome criteria for observation and inspection of an end-to-end anastomosis of pig bowel
Table 2

Identified barriers to using simulation as an assessment tool4

Costs and logistics
Standardisation across multiple simulation sites
Exposure of simulation modalities to trainees before high-stakes testing
Overreliance on psychometric criteria that can lead to measures (e.g. checklists) that may fail to capture the complexities involved in healthcare, such as caring for the patient with multiple comorbidities
Validity, especially in maintenance of licensure and certification where little evidence exists
Transferability to actual clinical practice
Training and recruitment of the raters for high-stakes simulation-based assessment
Evidence base for some simulation based activities not yet robust enough for high-stakes assessment
With the use of the CanMEDS (Canadian Medical Education Directives for Specialists) domains of competence for EM, it has been suggested that OSCE and high-fidelity simulation are suitable for assessing the domains of Medical Expert (history taking, physical examination, technical skills performance and clinical decision making), Communicator (patient and family interaction, writing records), Collaborator (ability to manage conflict, interprofessional interaction), Manager (ability to lead) and Scholar (teaching ability).5

However, the need to validate the OSCE is emphasised by a paper examining the relationship between EM intern OSCE performance and EM faculty evaluation of resident performance. Over 5 years, the OSCE assessment of clinical performance did not correlate with faculty assessment in any of the measured domains of history taking, physical examination, overall performance and interpersonal skills.6 In anaesthetics, the Generic Integrated Objective Structured Assessment Tool (GIOSAT) has been evaluated as a means of integrating Medical Expert and intrinsic (non-medical expert) CanMEDS competencies with non-technical skills. The tool has shown construct validity for the Medical Expert domain, but was not valid for the intrinsic CanMEDS competencies.7

Hall et al. have developed and evaluated a high-fidelity simulation-based assessment tool for EM residents.8 This uses a three-station OSCE; for each station, a corresponding assessment tool was developed with an essential actions (EA) checklist and a global assessment score (GAS). The GAS scores showed construct validity, increasing with seniority, but on the EA scores junior residents outperformed senior residents in some situations.
This may illustrate a problem with EA checklists in high stakes senior assessments: participants who have reached mastery may not follow a stepwise approach, yet will still do better overall and receive superior GAS scores. This point is extremely important when designing assessment tools at Fellowship level.8

Looking at the individual skills that might be examined in an OSCE, one group developed and evaluated an objective structured assessment of technical skills for neonatal lumbar puncture (OSATS-LP). The domains of sterility and CSF collection had moderate statistical reliability (κ = 0.41 and 0.51, respectively), the domains of preparation, analgesia and management of laboratories had substantial reliability (κ = 0.60, 0.62 and 0.62, respectively), and the domains of positioning and needle insertion were less reliable (κ = 0.16 and 0.16, respectively). In high stakes assessment, a kappa of >0.75 (excellent) would be required as an acceptable benchmark, so this tool would not be suitable.9 In contrast, scoring systems for Paediatric Advanced Life Support (PALS) algorithms seem to be more reliable, with high interrater reliability (0.81) when using four scenarios (asystole, dysrhythmia, respiratory arrest, shock) and four raters, and also demonstrating construct validity.10 Adler and colleagues have demonstrated that a dichotomous checklist or the Global Performance Assessment Tool (GPAT), an anchored multidimensional scale, is highly reliable, with high interrater reliability (>0.9), in simulated paediatric emergencies assessing paediatric EM residents.11

Formal tools to assess the crisis resource management skills of trainees already exist, but have shown mixed results on evaluation. The Ottawa Global Rating Scale (GRS) has acceptable interrater reliability as well as construct validity,12 and a study by Adler et al.
has shown that a simulation programme based on four cases (apnoea, asthma, supraventricular tachycardia and sepsis) can reliably measure and discriminate competence.13 An interesting paper on senior intensive care trainees undertaking a simulated specialist examination with high-fidelity simulation, scored with the Anaesthesia Non-Technical Skills (ANTS) scale and the GRS, showed only fair interrater reliability for the ANTS and GRS scores, and only fair agreement for pass/fail decisions on non-technical skills (weighted kappa, 0.32) and technical skills (weighted kappa, 0.36). The low interrater reliability for the ANTS and GRS rating scales, which are well evaluated and well regarded tools, is a real concern for a potential high stakes assessment examination.14 In the anaesthetic literature, Everett and colleagues have validated checklists and a global rating scale as part of the Managing Emergencies in Paediatric Anaesthesia (MEPA) course, noting that at least two raters were required to achieve acceptable reliability and claiming that the global rating scale allows raters to make a judgement regarding a participant's readiness for independent practice.15

Although it does seem possible that simulation in the form of OSCEs can be a valid, reliable and acceptable assessment tool, the onus is on the examining body to ensure construct validity and high interrater reliability (even within individual stations) for both Medical Expert or technical skills and non-technical skills. This will ensure the OSCEs are acceptable to candidates, examiners and other health professionals.
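The kappa benchmarks discussed above can be made concrete with a short calculation. The sketch below uses illustrative data only (not taken from any of the cited studies) and computes Cohen's kappa for two raters completing the same dichotomous checklist, using the standard formula κ = (po − pe)/(1 − pe):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same dichotomous checklist."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal pass/fail proportions.
    pa = sum(rater_a) / n
    pb = sum(rater_b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical essential-actions checklist (1 = performed, 0 = omitted)
# scored by two examiners watching the same candidate.
rater_a = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]
rater_b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]

print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.58
```

Here the two examiners agree on 8 of 10 items, yet κ is only 0.58 ("moderate"), well below the >0.75 benchmark cited above for high stakes assessment; kappa discounts the agreement expected by chance, which is why raw percentage agreement overstates reliability.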

Competing interests

None declared.

1.  A pilot study using high-fidelity simulation to formally evaluate performance in the resuscitation of critically ill patients: The University of Ottawa Critical Care Medicine, High-Fidelity Simulation, and Crisis Resource Management I Study.

Authors:  John Kim; David Neilipovitz; Pierre Cardinal; Michelle Chiu; Jennifer Clinch
Journal:  Crit Care Med       Date:  2006-08       Impact factor: 7.598

2.  Development and evaluation of high-fidelity simulation case scenarios for pediatric resident education.

Authors:  Mark D Adler; Jennifer L Trainor; Viva Jo Siddall; William C McGaghie
Journal:  Ambul Pediatr       Date:  2007 Mar-Apr

3.  Assessing competence in emergency medicine trainees: an overview of effective methodologies.

Authors:  Jonathan Sherbino; Glen Bandiera; Jason R Frank
Journal:  CJEM       Date:  2008-07       Impact factor: 2.410

4.  Development and evaluation of a simulation-based resuscitation scenario assessment tool for emergency medicine residents.

Authors:  Andrew Koch Hall; William Pickett; Jeffrey Damon Dagnone
Journal:  CJEM       Date:  2012-05       Impact factor: 2.410

5.  The Managing Emergencies in Paediatric Anaesthesia global rating scale is a reliable tool for simulation-based assessment in pediatric anesthesia crisis management.

Authors:  Tobias C Everett; Elaine Ng; Daniel Power; Christopher Marsh; Stephen Tolchard; Anna Shadrina; Matthew D Bould
Journal:  Paediatr Anaesth       Date:  2013-06-26       Impact factor: 2.556

6.  Reliability and validity of a scoring instrument for clinical performance during Pediatric Advanced Life Support simulation scenarios.

Authors:  Aaron Donoghue; Akira Nishisaki; Robert Sutton; Roberta Hales; John Boulet
Journal:  Resuscitation       Date:  2010-01-04       Impact factor: 5.262

7.  "GIOSAT": a tool to assess CanMEDS competencies during simulated crises.

Authors:  Victor M Neira; M Dylan Bould; Amy Nakajima; Sylvain Boet; Nicholas Barrowman; Philipp Mossdorf; Devin Sydor; Amy Roeske; Stephen Noseworthy; Viren Naik; Dermot Doherty; Hilary Writer; Stanley J Hamstra
Journal:  Can J Anaesth       Date:  2013-01-19       Impact factor: 5.063

8.  Assessing the validity evidence of an objective structured assessment tool of technical skills for neonatal lumbar punctures.

Authors:  Maya S Iyer; Sally A Santen; Michele Nypaver; Kavita Warrier; Stuart Bradin; Rachel Chapman; Jennifer McAllister; Jennifer Vredeveld; Joseph B House
Journal:  Acad Emerg Med       Date:  2013-03       Impact factor: 3.451

9.  Comparison of checklist and anchored global rating instruments for performance rating of simulated pediatric emergencies.

Authors:  Mark D Adler; John A Vozenilek; Jennifer L Trainor; Walter J Eppich; Ernest E Wang; Jennifer L Beaumont; Pamela R Aitchison; Paul J Pribaz; Timothy Erickson; Marcia Edison; William C McGaghie
Journal:  Simul Healthc       Date:  2011-02       Impact factor: 1.929

10.  Summative assessment in medicine: the promise of simulation for high-stakes evaluation.

Authors:  John R Boulet
Journal:  Acad Emerg Med       Date:  2008-09-05       Impact factor: 3.451
