
Simulation-based assessments in health professional education: a systematic review.

Tayne Ryall¹, Belinda K Judd², Christopher J Gordon³.

Abstract

INTRODUCTION: The use of simulation in health professional education has increased rapidly over the past 2 decades. While simulation has predominantly been used to train health professionals and students for a variety of clinically related situations, there is an increasing trend to use simulation as an assessment tool, especially for the development of technical-based skills required during clinical practice. However, there is a lack of evidence about the effectiveness of using simulation for the assessment of competency. Therefore, the aim of this systematic review was to examine simulation as an assessment tool of technical skills across health professional education.
METHODS: A systematic review of Cumulative Index to Nursing and Allied Health Literature (CINAHL), Education Resources Information Center (ERIC), Medical Literature Analysis and Retrieval System Online (Medline), and Web of Science databases was used to identify research studies published in English between 2000 and 2015 reporting on measures of validity, reliability, or feasibility of simulation as an assessment tool. The McMaster Critical Review Form for quantitative studies was used to determine the methodological quality of all full-text articles reviewed. Simulation techniques using human patient simulators, standardized patients, task trainers, and virtual reality were included.
RESULTS: A total of 1,064 articles were identified using search criteria, and 67 full-text articles were screened for eligibility. Twenty-one articles were included in the final review. The findings indicated that simulation was more robust when used as an assessment in combination with other assessment tools and when more than one simulation scenario was used. Limitations of the research papers included small participant numbers, poor methodological quality, and predominance of studies from medicine, which preclude any definite conclusions.
CONCLUSION: Simulation has now been embedded across a range of health professional education and it appears that simulation-based assessments can be used effectively. However, the effectiveness as a stand-alone assessment tool requires further research.


Keywords: competency; health care; students; technical skills

Year: 2016 PMID: 26955280 PMCID: PMC4768888 DOI: 10.2147/JMDH.S92695

Source DB: PubMed Journal: J Multidiscip Healthc ISSN: 1178-2390

Introduction

Assessment, in the most expansive definition, is used to identify appropriate standards and criteria and ascertain quality through judgment.1 There are a multitude of assessment modes, adopted for various reasons, such as measuring performance or skill acquisition, and these can be used at different stages of the learner’s educational trajectory. There has been much debate, however, about the effectiveness of various forms of assessment, such as multiple-choice question examinations, and this has influenced educators’ desire to develop assessments that are more realistic and performance based.2,3 The types of assessment used in pre- and postregistration health professional education have been widely reviewed.4–9 A compounding challenge with assessment for health professionals and health students is determining competency of practice. This is a complex but necessary component of education and training. In more recent decades, performance-based assessment practices have gained strong momentum4 as educators have sought to examine authentic learner performance with the knowledge that these types of assessments are a driving influence on learning and teaching practices. Out of this need for authentic assessment came the adoption of simulation-based assessment. Simulation, as a technique for both training and assessment, has been used in the aeronautical industry and military fields since the early 1900s, with the first flight simulator being developed in 1929.10 The complexity and sophistication of simulation improved progressively from the 1950s, driven primarily by the integration of computer-based systems. The translation of simulation into health education has resulted in an almost exponential growth in the use of simulation as an educational tool. 
Simulation aims to replicate real patients, anatomical regions, or clinical tasks or to mirror real-life situations in clinical settings.11 The increasing implementation of simulation-based learning and assessment within health education has been driven by training opportunities to practice difficult or infrequent clinical events, limited clinical placement opportunities, increasing competition on clinical educators’ time, new diagnostic techniques and treatment, and greater emphasis being placed on patient safety.11–15 Accordingly, health educators have adopted simulation as a viable educational method to teach and practice a diverse range of clinical and nonclinical skills. Simulation modalities such as standardized patients (SPs), anatomical models, part-task trainers, computerized high-fidelity human patient simulators, and virtual reality are in use within health education.10,11,16 In particular, these techniques have been used in preregistration health professional training, as simulation allows learners to practice prior to clinical placement and patient contact, maximizing learning opportunities and patient safety.6,17,18 Simulation provides a safe environment to practice clinical skills in a staged progression of increasing difficulty, appropriate for the level of the learner. Practicing skills on real patients can be difficult, costly, time consuming, and potentially dangerous and unethical.11,12,14,15 As such, health professional educators have increasingly adopted simulation-based assessment as a viable means of evaluating student and health professional populations. In addition, simulation-based assessments are a means of creating an authentic assessment, replicating aspects of actual clinical practice. While there has been widespread acceptance of simulation as an educational training tool, with evidence supporting its use in health education, the effectiveness of simulation-based assessments in evaluating competence and performance remains unclear. 
With an increasing use of simulation in health education worldwide, it is salient to review the literature related to simulation-based assessments. Therefore, the aim of this systematic literature review was to evaluate the evidence related to the use of simulation as an assessment tool for technical skills within health education.

Methods

This systematic review was undertaken using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.19 The review involved searching health and education databases, followed by application of structured inclusion and exclusion criteria with consensus across reviewers. Two raters (TR, BJ) independently screened all abstracts for eligibility. Agreement on the initial screen was high, with excellent interrater reliability (Cohen’s kappa =0.91). Any disagreements about article eligibility were reconciled via consensus or referred to a third reviewer (CJG).
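The kappa statistic reported above corrects raw agreement between two raters for the agreement expected by chance. As a rough illustration only, the following Python sketch computes Cohen's kappa for two raters' include/exclude screening decisions. The function name cohens_kappa and the ten screening decisions are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical screening decisions for ten abstracts (illustrative only).
screen_a = ["include", "exclude", "exclude", "include", "exclude",
            "exclude", "include", "exclude", "exclude", "exclude"]
screen_b = ["include", "exclude", "exclude", "exclude", "exclude",
            "exclude", "include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(screen_a, screen_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on nine of ten abstracts, yet kappa is noticeably lower than 0.9 because most of that agreement would be expected by chance when both raters exclude most abstracts.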

Search databases and terms

Literature was searched in the following key databases: Cumulative Index to Nursing and Allied Health Literature (CINAHL), Education Resources Information Center (ERIC), Medical Literature Analysis and Retrieval System Online (Medline), and Web of Science. The following search terms were included: allied health, medical education, nursing education, assessment, and simulation. This initial search located 1,190 articles, with another 33 located through reference list searches and gray literature (n=1,223). Following removal of duplicates, 1,064 abstracts were screened for eligibility. We reviewed 67 full-text articles for eligibility, with 21 articles chosen for the final systematic review (Figure 1). An adapted critical appraisal tool20 (Table 1) was used to determine the methodological rigor of the articles.
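The counts in the search paragraph can be cross-checked with simple arithmetic. The short Python sketch below does this; the variable names are illustrative, and the numbers of duplicates removed and abstracts excluded are derived here by subtraction rather than stated in the text.

```python
# Counts reported in the search paragraph.
database_hits = 1190
other_sources = 33          # reference lists and gray literature
abstracts_screened = 1064   # after removal of duplicates
full_text_reviewed = 67
included = 21

# Derived quantities (computed by subtraction, not reported in the text).
total_identified = database_hits + other_sources
duplicates_removed = total_identified - abstracts_screened
excluded_at_abstract = abstracts_screened - full_text_reviewed
excluded_at_full_text = full_text_reviewed - included

print("identified:", total_identified)
print("duplicates removed:", duplicates_removed)
print("excluded at abstract screen:", excluded_at_abstract)
print("excluded at full-text review:", excluded_at_full_text)
```

The total agrees with the reported n=1,223, and the number excluded at full-text review agrees with the 46 exclusions described in the critical appraisal section.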

Figure 1

Flow diagram of search.

Table 1

Adapted critical appraisal tool

	Yes (=1)	No (=0)	Not addressed (=0)	Not applicable (=0)
1. Was the study purpose clearly stated?
2. Was relevant background literature reviewed?
3. Was the sample described in detail?
4. Was sample size justified?
5. Were the outcome measures reliable?
6. Were the outcome measures valid?
7. Was intervention described in detail?
8. Was contamination avoided?
9. Was cointervention avoided?
10. Were results reported in terms of statistical significance?
11. If multiple outcomes, was that taken into account in the statistical analysis?
12. Were the analysis methods appropriate?
13. Was educational importance reported?
14. Were dropouts reported?
15. Were the conclusions appropriate given study methods and results?

Note: Data from Law et al.20

Inclusion/exclusion criteria

Our inclusion criteria required all articles to be in English, and the databases were searched for articles published between 2000 and 2015. Articles needed to be research based and to have examined simulation as an assessment tool for health professionals or health professional students. Articles incorporating simulation-based assessments that explored both technical and nontechnical skills were included. However, studies that focused solely on nontechnical skills, such as communication, interpersonal skills, and teamwork, were excluded. Because the focus of this systematic review was on technical skills, nontechnical skills were considered only in studies that examined them in combination with technical skills. Technical skills were defined as those requiring the participant to complete a physical assessment (or part thereof) or to perform one or more treatment techniques that required a hands-on component. The included articles all focused on simulation as an assessment tool and ideally compared it to other established forms of assessment. Due to the large number of studies and reviews that have previously investigated objective structured clinical examinations (OSCEs),21–40 all papers that investigated OSCEs were excluded as beyond the scope of this systematic review. Research articles focusing on the use of simulation as a training modality only were excluded. This included studies that examined the validity of a simulation training program by incorporating a simulation-based assessment at the end of the training. We excluded these articles as they addressed the effectiveness of simulation as a training tool rather than as an assessment. Studies that researched a specific simulation-based assessment grading tool were also excluded as these focused on tool validation and not on the assessment process. All search outcomes were assessed by two investigators (TR, BJ), and each abstract was read by both investigators for quality control.
Articles were also assessed for eligibility by their outcome measures. Many articles evaluated simulation as an assessment technique, but their primary outcome measure was a survey of participants’ attitudes on the simulation experience. In such studies, nearly all participants found it to be a positive experience, with only minor suggestions for improvement.41–50 Such studies were excluded as they did not focus on our primary aim of objectively determining the reliability, validity, or feasibility of simulation as an assessment technique.

Critical appraisal

The methodological quality of all included full-text articles was assessed using a modified critical appraisal tool.20 The McMaster Critical Review Form for Quantitative Studies has been used repeatedly in systematic reviews of health care51–54 and demonstrates strong interrater reliability. We modified this tool to comprise 15 items, which were scored dichotomously (Yes =1/No =0; “Not addressed” and “Not applicable” were also scored zero). As such, a maximum score of 15 was possible, and all 67 full-text articles were appraised and scored. Both reviewers (TR, BJ) independently scored the articles, and the final inclusion of full-text articles was discussed with the third reviewer (CJG) to find consensus, with the critical appraisal tool score used as a measure of methodological rigor. Forty-six articles were excluded for the reasons shown in Figure 1.
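The scoring rule described above (one point for each "Yes", zero for every other response, over 15 items) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name appraisal_score and the example responses are hypothetical and do not reproduce the appraisal of any article in this review.

```python
ITEM_COUNT = 15
ALLOWED = {"yes", "no", "not addressed", "not applicable"}


def appraisal_score(responses):
    """Score one article: 1 point per 'Yes', 0 for any other allowed response."""
    normalized = [r.strip().lower() for r in responses]
    if len(normalized) != ITEM_COUNT:
        raise ValueError(f"expected {ITEM_COUNT} responses, got {len(normalized)}")
    unknown = set(normalized) - ALLOWED
    if unknown:
        raise ValueError(f"unrecognized responses: {sorted(unknown)}")
    return sum(1 for r in normalized if r == "yes")


# Hypothetical responses for one article (illustrative only).
example = ["Yes"] * 11 + ["No"] * 2 + ["Not addressed", "Not applicable"]
score = appraisal_score(example)
print(f"{score}/{ITEM_COUNT}")
```

Because "Not applicable" is scored zero rather than removed from the denominator, an article can lose points on items that do not apply to its design, which is worth bearing in mind when comparing scores across study types.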

Findings

Of the 21 articles included, the majority were from the field of medicine (n=16), with the remainder from the disciplines of paramedics (n=2), nursing (n=1), osteopathy (n=1), and physical therapy (n=1) (Table 2). Studies were undertaken in Australia, Denmark, New Zealand, Switzerland, and the USA. There were no randomized controlled trials, and the majority of studies used an observational design. As such, blinding of participants and assessors to the simulation intervention was not undertaken in any of the studies. In addition, many studies used convenience samples that were not powered, and none of the studies calculated the number of participants required to achieve statistical significance, with one study commenting that it was not adequately powered to detect differences between its two academic sites.55 Many of the studies (n=13) were pilots or had small numbers (<50) of participants (range: n=18 to n=45). A small number of studies were conducted across different health centers55–57; however, the majority were conducted in a single health setting or university, making it difficult to generalize the findings. The included articles had scores on the critical appraisal tool ranging from 8 to 14 out of 15. Some of the main reasons for low scores were a lack of description of the participants and a lack of either statistical analysis or description of the analysis.

Table 2

Summary of included articles

Author (year)	Sample size/discipline	Primary outcome	Simulation type	Assessment type	Main findings
Asprey et al (2007)67	101 Medical students, 71 physician assistant students	Comparison of third year medical student scores with physician assistant scores on SP assessments - clinical skill checklist	SP	Checklists – completed by the SP	SP-based assessments were not able to distinguish differences between medical and physician assistant performance and are therefore measuring clinical experience as the two groups had comparable clinical experience.
Bick et al (2013)55	26 Anesthetists	Comparison of expert vs novice anesthetist performance on a simulated transesophageal echocardiography examination – time and accuracy checklist	Virtual reality	Checklists – completed via video by blinded (voice masking technology) experts	Simulated transesophageal echocardiography examination is able to discriminate between expert and novice performers.
Boulet et al (2003)59	24 Medical students, 13 medical residents	Reliability and validity of simulation assessment for final year medical students and residents	High-fidelity patient simulator	Checklists – scored via video by four raters (two unaware of the training levels of participants)	Simulation can validly and reliably assess acute care skills, but multiple encounters are required to predict performance.
Burns et al (2013)62	26 Medical doctors (16 interns, five PGY-2, two PGY-3, two chief residents, and three pediatric hematology/oncology fellows)	Performance scores of interns (<1-year experience in pediatrics) vs the rest of the residents and fellows (>1-year experience)	High-fidelity pediatric patient simulator	Checklists and global rating scale – scored via video by two raters	The manikin-based assessment provided reliable and valid measures of participants’ performance.
Edelstein et al (2000)68	147 Medical students	Comparison of three measures of medical student performances - SP examinations, computer-based case simulations, and traditional performance indicators (GPAs)	SP and computer-based simulations	Unclear how they were assessed on the SP examinations – rated by both the SP as well as the author	Differences were found between the SP assessments and computer simulations and traditional assessments. A multipronged assessment approach is supported.
Fehr et al (2011)61	27 Anesthetic residents and eight anesthetic fellows	Reliability and validity of simulation assessments for anesthesia residents and pediatric fellows	High-fidelity patient simulator	Checklists – rated by lead investigator and two experts	The multiple-scenario assessment could reliably assess pediatric residents and determine the skill level of participants. Further validation required, including comparing of clinical performance.
Gimpel et al (2003)71	121 Osteopathy students	Reliability and validity of a simulation assessment, the Comprehensive Osteopathic Medical Licensing Examination – USA performance-based clinical skills examination (COMLEX-USA-PE)	SP	Checklists – rated by SPs on all stations and by experts on the stations where participants may have chosen to perform a manipulation technique as part of their treatment	Acceptable reliability and validity of the assessment. The COMLEX-USA-PE is supported for future use in assessing the readiness of osteopathic medical students for clinical practice.
Grantcharov et al (2005)75	Ten medical students, ten medical residents, and eight medical doctors	Validity of virtual reality simulator for assessment of gastrointestinal endoscopy skills. Comparison of time and accuracy scores in three groups (experienced, residents, and medical students)	Virtual reality	Parameters were calculated and recorded by the computer	Virtual reality simulator was able to distinguish between the three levels of experience and therefore possessed acceptable construct validity.
Hawkins et al (2004)70	54 Medical doctors	Evaluation of computer-based case simulations and SP methods for patient management skills via checklists	Computer-based case simulations and SP	Computer-based scoring and checklists and a patient perception questionnaire for the SP station – rated by the SP	Computer-based case simulations and SP examinations were unable to distinguish between different experience levels; it appears that they may be useful in a multipronged assessment approach.
Iyer et al (2013)78	16 Medical residents	Validity of objective structured assessment of technical skills for neonatal lumbar puncture via competency-based scoring tool	Task trainer	Checklist and global rating scale – rated by six raters via video	Reasonable evidence of validity of this assessment tool. Adoption of this tool in clinical environments is recommended for providing formative and summative feedback to improve resident skills.
Lammers et al (2009)76	212 Paramedics	Identifying performance of paramedics in simulated pediatric emergencies using prevalidated scoring checklists	Low-, medium-, and high-fidelity patient simulators	Checklist – rated by one evaluator at the time and an author via video later, who amended scores if required and made the final decision	Scores from manikin-based simulations objectively identified multiple performance deficiencies in paramedics.
Lipner et al (2010)56	115 Medical cardiology fellows and cardiologists	Evaluation of simulation-based assessment of technical and cognitive skills of physicians during coronary interventions by determining whether assessment differentiates performances of novice, skilled, and expert	Virtual reality	Computer-based scoring system based on the consensus of a committee rating that included rating of the potential risk of taking that action	This assessment approach was able to identify poor-performance physicians who are unlikely to be providing appropriate patient care.
McBride et al (2011)58	29 Medical residents (13 interns, nine second year residents, six third year residents, and one chief resident)	Multiple-scenario assessment (n=20) of residents’ acute pediatric management skills using action checklists and global rating scales	High-fidelity pediatric patient simulator	Checklists and global rating scale – rated by two raters: one at the time, and one via video later	Residents’ scores in the assessments were found to be reliable and valid measures of their ability as long as multiple scenarios were used. Scores were able to discriminate between different skill levels.
Murray et al (2007)60	64 Medical residents and 35 anesthesiologists	Evaluation of anesthesia resident performances in a simulation-based intraoperative environment using an item score checklist	High-fidelity patient simulator	Checklists and time taken to perform the key actions – rated by two blinded raters via video; where there were large discrepancies, a third rater scored the performance	The simulation-based assessment was found to be a reliable and valid measure of performance and was able to distinguish more experienced from early training participants.
Nagoshi et al (2004)69	39 Medical students, 49 medical residents, and eleven medical fellows	Evaluation of a simulation-based standardized assessment of geriatric management skills by examining reliability and validity of the tool and scored using checklists	SP	Checklists – rated by the SP	The assessment was found to be reliable but did not discriminate between levels of training of participants.
Nunnink et al (2010)57	45 Medical doctors (ICU trainees)	Validity of simulation-based assessments of intensive care trainees’ procedural skills by comparing with written exams and oral viva formats	Low- to high-fidelity patient simulators and SPs	Templates for written responses and checklists for all other – one rater (also rated the written questions on the same topic)	There was a lack of correlation between exam formats, suggesting that a multimodal approach to assessment is favorable and that simulation may be more useful in assessment of procedural skills.
Panzarella and Manyon (2008)66	34 Physical therapy students	Evaluation of an Integrated Standardized Patient Examination for physical therapy students	SP	Checklists for the subjective and objective assessment, four-point rubric for the integration question – rated by four raters: two in the room (the SP and an expert rater), one via video (the participant) and a “criterion” rater (one of the authors blinded to the others’ score)	Student scores need comparison to other forms of assessment of performance. Interrater reliability was found to vary between the raters; training for raters needs investigation. Good content validity reported by participants and expert raters. Needs further testing.
Penprase et al (2012)65	70 Registered nurses	Evaluation of simulation-based assessment of applicants to Nurse Anesthesia programs by comparing scores to candidate interview scores	High-fidelity patient simulator	Checklists – rated by one of two trained raters concealed behind a one-way mirror	Simulation-based assessment may provide a useful adjunct to the admission process.
von Wyl et al (2009)77	30 Paramedics	Determine the interrater reliability of assessing technical and nontechnical skills of paramedics during simulated emergency scenarios using previously validated checklists	Medium-fidelity patient simulator	Checklists – rated by two raters (an experienced emergency physician and a psychologist) via video	The assessment of technical and nontechnical skills using simulation-based assessment was shown to be feasible and reliable. There was a significant positive correlation between both skill types.
Waldrop et al (2009)63	12 Medical interns and 44 medical residents	Evaluate the reliability and validity of simulation-based assessment in the management of intraoperative equipment-related errors in anesthesia	High-fidelity patient simulator	Checklists – rated initially by two raters for the first four participants; as scores were 100% in agreement, only one rater was used for the remaining participants	The assessment was found to be an effective, reliable, and valid method to determine individual performance. The assessment was also able to discriminate between the experience levels of participants.
Weller et al (2005)64	21 Medical doctors	Evaluation of the psychometric properties of a simulation-based assessment for anesthetists	High-fidelity patient simulator	Global rating scale - rated by four blinded raters via video	The results show that 12–15 cases were required for acceptable reliability in this assessment modality. At lower level of performance, trainees overrated performance compared to those of high-performance levels, suggesting self-assessment may not be reliable.

Abbreviations: GPA, grade point average; ICU, intensive care unit; PGY, postgraduate year; SP, standardized patient.

Themes

Eight (38%) of the studies used high-fidelity patient simulators to assess medical students, professionals, or applicants for a postgraduate nursing degree, and six (29%) used an SP combined with a clinical examination as the form of assessment. These studies varied in format, eg, in the number of stations, the time allowed, and the number of assessors used. The SP studies were conducted in the disciplines of medicine, physical therapy, and osteopathy and typically involved students rather than health professionals. Of the remaining studies, three (14%) used a virtual reality simulator to assess participants ranging from novices (medical students and professionals) to experienced professionals; two (10%) used manikins with varying levels of fidelity, from low-fidelity manikins to high-fidelity human patient simulators, to assess paramedics and intensive care unit (ICU) medical trainees; one (5%) used a medium-fidelity patient simulator to assess paramedics; and one (5%) included a part-task trainer to assess medical residents. Two studies compared SPs or low- to high-fidelity patient simulator assessments to other form(s) of assessment, such as paper-based examinations, oral examinations, or current university grade point averages. As such, the major themes that emerged are related to the type of simulation modality chosen for assessment (Supplementary material presents the definitions).

High-fidelity simulation

Eight studies58–65 conducted simulation-based assessment using high-fidelity human patient simulators. The simulators used were METI Emergency Care Simulator® (Medical Education Technologies Inc., Sarasota, FL, USA)58; METI HPS® (Sarasota, FL, USA)63,64; METI BabySim® (Sarasota)58; METI PediaSIM HPS® (Sarasota)61,62; SimMan 3.3 (Laerdal Medical, Wappingers Falls, NY, USA)65; SimNewB® (Laerdal Medical)58,61,62; and a life-size simulator developed by MEDSIM-EAGLE® (Med-Sim USA, Inc., Fort Lauderdale, FL, USA).59,60 All of the studies were conducted with medical students or practicing doctors, except for one focusing on postgraduate nursing applicants.65 Generally, the reliability and validity of assessment using high-fidelity human patient simulators were found to be good. All of the medical-related studies used multiple scenarios (eg, trauma, myocardial infarction, and respiratory failure) on high-fidelity human patient simulators to assess the candidates, whereas the postgraduate nursing applicants were assessed on only one anesthetic scenario within a group of three. As all of the assessments targeted the clinical performance of students and doctors on high-risk skills, it was not surprising that high-fidelity patient simulators were a popular assessment modality. Unfortunately, this type of assessment attracted generalizability coefficients below the level acceptable for a high-stakes examination such as a summative performance assessment (G coefficients <0.8). All high-fidelity human patient simulator assessments were found to be suitable for low-stakes examinations (eg, a formative assessment of performance).
The evidence from these studies showed that increasing the number of scenarios, rather than the number of raters, increased assessment reliability.58–64 Some researchers suggested that 10–12 scenarios, with three to four assessors, would be required to reach an acceptable level of reliability of 0.8,64 while others observed that ten scenarios with two raters did not reach this level (0.57).61 When multiple raters (two to four) were used, interrater reliabilities of 0.59–0.97 (the majority being >0.8) were achieved.58–62 While it was unclear how some of the raters reviewed the scenarios,61,63 the majority rated the performance via a video recording of the scenarios.59,60,62,64 One study had a rater present at the time of the scenario as well as one scoring the performance via a video recording,58 whereas another used a one-way mirror to rate participants at the time.65 When assessing pediatric interns, residents, and hematology/oncology fellows on sickle cell disease scenarios, checklists had higher interrater reliability than global rating scales.62 Construct validity was high in studies that used high-fidelity human patient simulators for assessing participants with varying degrees of experience (medical students through to specialists) as they were able to differentiate between the different levels of experience.58–63 The pilot study investigating the correlation between high-fidelity human patient simulator assessment and face-to-face interviews for applicants to a postgraduate anesthetic nursing course found a significant positive relationship (r=0.42) between the two and concluded that high-fidelity human patient simulator assessment was a suitable adjunct to the admissions process.65
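The relationship between the number of scenarios and reliability can be illustrated with the Spearman-Brown prophecy formula from classical test theory. This is a simplified stand-in, not the generalizability analyses reported by the included studies: it treats scenarios as parallel measurements and ignores rater variance. The function names and the single-scenario reliability of 0.3 are hypothetical values chosen only for illustration.

```python
import math


def spearman_brown(single_reliability, k):
    """Predicted reliability of the mean of k parallel scenarios."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


def scenarios_needed(single_reliability, target):
    """Smallest whole number of scenarios predicted to reach the target."""
    r = single_reliability
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    # Solve target = k*r / (1 + (k-1)*r) for k.
    k = target * (1 - r) / (r * (1 - target))
    # Round first so floating-point noise does not push an exact answer up by one.
    return max(1, math.ceil(round(k, 9)))


# Hypothetical single-scenario reliability (not taken from any included study).
r_single = 0.3
needed = scenarios_needed(r_single, 0.8)
print(needed, round(spearman_brown(r_single, needed), 3))
```

Under these assumptions, each added scenario raises predicted reliability with diminishing returns, which is in line with the direction of the finding above that several scenarios are needed before scores approach the 0.8 level used for high-stakes decisions.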

Standardized patients

There were six studies that used SPs within a clinical examination. These simulation-based assessments varied significantly in total duration, the number of stations, the amount of time per station, the number of SPs used, the types of stations used, and skills assessed, but all had the common feature of using SPs. These studies used fewer SP encounters than traditional OSCEs, allowed participants a longer time with each SP, and expected more than one technical skill to be performed at each station, eg, a full physical therapy assessment and treatment.66 Four were from the medical profession,67–70 with the others being from physical therapy66 and osteopathy.71 Two of the studies combined the SP assessments with computer-based assessments to assess medical students68 as well as emergency medicine, general surgery, and internal medicine doctors.70 Unfortunately, only three of these articles listed the presenting problem of the SP.66,69,71 Within SP assessments, participants’ performance was assessed by trained assessors,66,71 clinical experts,66 self-assessment,66 and the SPs themselves.66,67,69–71 In all studies, assessors were trained to score the encounters, but only one study commented on assessor reliability. This study found that for physical therapy students, SP ratings did not significantly correlate with the ratings of other raters.66 In contrast, strong agreement between the experts and the criterion rater was evident.
This suggests that experts and criterion raters are better placed than SPs to rate performances during high-stakes examinations.66 When checklists were used by SPs, scores correlated negatively with experience, possibly because more experienced doctors solve problems and make decisions using fewer items of information; checklists may therefore lead to less valid scores.70 Results varied as to whether SP examinations were able to determine clinical experience.67,69 Nonetheless, they were found to be reliable in assessing osteopathic students’ readiness to treat patients.71 The correlation between SP examinations and computer-based assessments varied from minimal (r=0.24 uncorrected and r=0.40 corrected)68 to low-to-moderate (r=0.34–0.48).70 SP examinations also showed low correlation with curriculum results within physical therapy (<0.3).66 Overall, it was concluded that SP-based assessments should not be used in isolation to assess clinical competence.

Virtual reality

Virtual reality is increasingly being adopted as a simulation tool. In the health professions, virtual reality simulation uses computers and human patient simulators to create a realistic and immersive learning and assessment environment.72–74 Three studies used virtual reality55,56,75 in simulation-based assessments comparing novice (medical students or residents), skilled (residents), and expert medical clinicians. Each study applied a different virtual reality system: the SimSuite system (Medical Simulation Corporation, Denver, CO, USA), which includes an interactive endovascular simulator56; the GI Mentor II computer system (Simbionix Ltd, Cleveland, OH, USA)75; and the Heartworks TEE Simulator (Inventive Medical Ltd, London, UK), which includes a manikin and haptic-simulated probe.55 All three systems were found to have construct validity, as they were able to distinguish levels of technical ability among the groups. They are therefore useful in identifying those who require further training prior to clinical practice on real patients.
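Construct validity in the sense used above (often called known-groups validity) rests on showing that scores separate groups of differing expertise. A minimal sketch of one way to quantify this is the probability of superiority: the proportion of expert–novice pairs in which the expert scores higher, with ties counted as half. This is equivalent to the Mann-Whitney U statistic divided by the number of pairs. The scores below are invented for illustration, and the reviewed studies may have used other statistics.

```python
from itertools import product


def probability_of_superiority(group_a, group_b):
    """Proportion of (a, b) pairs where a > b; ties count as half a win.

    Equals the Mann-Whitney U statistic for group_a divided by
    len(group_a) * len(group_b). 0.5 means no separation, 1.0 means
    every member of group_a outscored every member of group_b.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(group_a) * len(group_b))


# Hypothetical simulator scores (not taken from any reviewed study).
novice_scores = [42, 55, 48, 60]
expert_scores = [71, 66, 80, 60]

print(probability_of_superiority(expert_scores, novice_scores))  # 0.96875
```

A value close to 1.0 would support the claim that the assessment discriminates between skill levels; a value near 0.5 would not.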

Mixed-fidelity patient simulators

Two studies that used low-, medium-, and high-fidelity human patient simulators to assess paramedics76 and ICU medical trainees’ resuscitation skills57 were included in this review. All three levels of patient simulator fidelity were found to have high interrater reliability in these populations.76 Intensive care trainees were assessed on medium- and high-fidelity human patient simulators as well as by written and oral viva examinations.57 The written examination did not correlate with either the medium- or high-fidelity human patient simulation-based assessments, indicating that written and simulation assessments differed in their ability to evaluate knowledge and practical skills. Specific skill deficiencies could be identified when low- to high-fidelity simulators were used, allowing subsequent training to be targeted to individuals’ needs.76

Medium-fidelity simulation

One study77 investigated a medium-fidelity simulator (a human patient simulator designed to allow limited invasive procedures with lower fidelity needs) and a volunteer with moulage. The paramedics were assessed in pairs on two simulated scenarios (acute coronary syndrome and severe traumatic brain injury). Their technical and nontechnical skills were assessed via separate checklists; the two assessors rated the performance from a video recording and were allowed to rewind as necessary. Interrater reliability (between an emergency physician and a psychologist) showed good correlations, especially for technical skills such as assessment of primary airway, breathing, circulation, and defibrillation. A positive and significant relationship was found between technical and nontechnical skills. One rater was found to be sufficient to adequately assess technical skills, but two raters were required to achieve equivalent reliability for nontechnical skills.
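Agreement between two raters scoring the same checklist can be summarized in several ways, and the statistic used in the study above is not given here. As a hedged illustration, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure, for two raters marking checklist items as done (1) or not done (0). The ratings are invented and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels
    to the same items: (observed - expected) / (1 - expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)

    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical checklist ratings for ten items (1 = performed, 0 = not).
physician = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
psychologist = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

kappa = cohens_kappa(physician, psychologist)
print(round(kappa, 2))  # 0.58
```

Here the raters agree on 8 of 10 items, but because 52% agreement would be expected by chance, kappa is only about 0.58. This is why raw percentage agreement alone can overstate interrater reliability.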

Task trainer

One pilot study78 investigated the performance of pediatric residents in lumbar puncture using a neonatal lumbar puncture task trainer. This simulation-based assessment used a video-delayed format, in which six raters reviewed the video and assessed the pediatric residents’ performance against seven criteria: preparation, positioning, analgesia administration, needle insertion technique, cerebrospinal fluid (CSF) return/collection, diagnostic purpose/laboratory management of CSF, and creating and maintaining a sterile field. There was good interrater reliability, and validity evidence was reported for the response process (bias was possible if the raters recognized the residents, because voices were not altered, although faces were not shown) and for the relationship to external variables (eg, previous experience in neonatal or pediatric ICUs).

Discussion

We undertook a systematic review to examine simulation as an assessment tool across health professional education. Although this review demonstrated that simulation-based assessments of technical skills can be reliable and valid, the research was constrained by the finding that simulation-based assessments were commonly used in isolation, rather than in combination with other assessment forms or with more than one simulation scenario. This review also demonstrated that assessments using high-fidelity simulators and SPs have been more widely adopted than other simulation types. High-fidelity simulation was most widely adopted in medicine and was commonly used in the emergency and anesthetic specialties, in which high-risk skill assessments are used more frequently. The evidence suggests that participants can be assessed reliably with high-fidelity human patient simulators combined with multiple-station assessment tasks and well-constructed scenarios. Overall, the results are promising for the future use and development of simulation-based assessment in the health education field. Due to the multiplicity of simulation-based assessments, it was difficult to compare data between studies and to make definitive statements about which form of assessment would be best for particular health disciplines, students, and practicing health professionals. For health students, standardizing assessments created a fairer and more consistent approach, leading to greater equity and reliability. Simulation appears to achieve this in competency-based assessments, and it also appears to be a useful tool for predicting future performance. This area of research needs further exploration, as simulation may have the potential to predict students’ future performance and competency, especially in relation to whether students are ready for clinical environments and exposure to real patients.
Simulation-based assessments may also assist newly graduated health professionals, who could be deemed competent through reliable, valid, and authentic assessments prior to commencing practice in a new area. In addition, simulation-based assessment is a promising approach for determining skill level and capability for safe practice, as it appears to be able to distinguish between different levels of performance among novice and expert groups and to identify poor performers. Methodological rigor was an issue, with scores on the critical appraisal tool ranging from 8 to 14 out of 15. Many of the studies had modest participant numbers, a limitation noted in several studies,58,62,64,70,78 which may limit the generalizability of the results. Sample size was not justified in many instances, and there was little mention of participant dropouts. In contrast, three studies had substantive participant numbers (>120 participants)68,71,76 and provided robust analyses, which increased external validity. A noticeable gap in this literature is that only three of the articles reviewed compared their simulation-based assessment with another assessment form or simulation type. These comparative studies provided a higher degree of critique of the assessment type and permitted observation of differences, which may be of assistance to health educators. We believe that comparative studies should be conducted in future research to provide evidence of assessment superiority and to enhance informed assessment choices. A related point is that studies examining the reliability and validity of simulation-based assessments need stronger research approaches, such as blinding assessors and participants, providing precise details of the intervention, and, where possible, avoiding contamination. While we appreciate that educational research is often challenging, robust study design should be paramount.
Overall, further research is required to determine which form of simulation-based assessment is best suited to specific health professional learner situations. While it is suggested that simulation-based assessments should not be used in isolation to make an overall assessment of an individual’s clinical and theoretical skills, simulation-based assessments are being widely used, and sometimes for this discrete purpose. Development of simulation-based assessment needs to continue, as it will provide clarity and consistency for assessors and participants, in addition to furthering the use of simulation in health education. As simulation is increasingly being used to replace a proportion of health students’ clinical practice time,79,80 it is expected that simulation-based assessment will become an integral component of health professional curricula and, therefore, it needs to be evidence based and valid. Further research of this kind will support stronger conclusions about the use of simulation-based assessment in health professional education.

Limitations

There were several limitations of this systematic review. Studies included were limited to the English language, and there may well be other studies published in non-English-language publications. Due to the varying nature of the studies, we were unable to complete any form of pooled data analysis. We did not include studies that investigated the cost-effectiveness and cost analysis of simulation-based assessments. Studies of this type may identify practical considerations not addressed in this systematic review. As already mentioned, studies investigating OSCEs were also excluded due to the extensive previous research conducted in this area. The inclusion of OSCE-based studies may have helped to strengthen the argument for SP use, as this tends to be the most common form of simulation used within the OSCE literature.

Conclusion

The use of simulation within health education is expanding, particularly in the training of health professionals and students. The evidence from this review suggests that the use of SPs would be a practical approach for many clinical situations, with part-task trainers or patient simulators used in areas in which the actors are unable to “act” or in cases in which invasive procedures are undertaken. Where clinical skills need to be evaluated in high-pressure situations, the evidence suggests that patient simulators in high-fidelity environments may be more suitable than task trainers. High-fidelity simulation assessments could also be used to assess multidisciplinary team performance. Overall, there is a clear need for further methodologically robust research into simulation-based assessments within health professional education.

Supplementary materials

Definitions

High-fidelity patient simulators

These are designed to allow a large range of noninvasive and invasive procedures to be performed and offer realistic sensory and physiological responses, with outputs such as heart rate and oxygen saturation usually displayed on a monitor. They can be run by a computer technician or preprogrammed to react to the participant’s actions.

Objective structured clinical examinations

These involve participants progressing through multiple stations at predetermined time intervals. They may have active or simulation-based stations that assess practical skills or passive stations such as written or video analysis, commonly used to assess theoretical knowledge.

Standardized patients

These are people trained to portray a patient in a consistent manner and present the case history of a real patient using predetermined subjective and objective responses.

Task trainers

These are models that are designed to look like a part of the human anatomy and allow individuals to perform discrete invasive procedures, for example a pelvis for internal pelvic examinations, or an arm to practice cannulation.

69 in total

1. Patient safety and simulation-based medical education.

Authors: A Ziv; Stephen D Small; Paul Root Wolpe
Journal: Med Teach Date: 2000 Impact factor: 3.650

2. Reliability and validity of a simulation-based acute care skills assessment for medical students and residents.

Authors: John R Boulet; David Murray; Joe Kras; Julie Woodhouse; John McAllister; Amitai Ziv
Journal: Anesthesiology Date: 2003-12 Impact factor: 7.892

3. Can simulation replace part of clinical time? Two parallel randomised controlled trials.

Authors: Kathryn Watson; Anthony Wright; Norman Morris; Joan McMeeken; Darren Rivett; Felicity Blackstock; Anne Jones; Terry Haines; Vivienne O'Connor; Geoffrey Watson; Raymond Peterson; Gwendolen Jull
Journal: Med Educ Date: 2012-05-30 Impact factor: 6.251

4. Psychometric characteristics of simulation-based assessment in anaesthesia and accuracy of self-assessed scores.

Authors: J M Weller; B J Robinson; B Jolly; L M Watterson; M Joseph; S Bajenov; A J Haughton; P D Larsen
Journal: Anaesthesia Date: 2005-03 Impact factor: 6.955

5. An evidence-based virtual reality training program for novice laparoscopic surgeons.

Authors: Rajesh Aggarwal; Teodor P Grantcharov; Jens R Eriksen; Dorthe Blirup; Viggo B Kristiansen; Peter Funch-Jensen; Ara Darzi
Journal: Ann Surg Date: 2006-08 Impact factor: 12.969

6. Determining the value of simulation in nurse education: study design and initial results.

Authors: Guillaume Alinier; William B Hunt; Ray Gordon
Journal: Nurse Educ Pract Date: 2004-09 Impact factor: 2.281

Review 7. Is the OSCE a feasible tool to assess competencies in undergraduate medical education?

Authors: Madalena Folque Patrício; Miguel Julião; Filipa Fareleira; António Vaz Carneiro
Journal: Med Teach Date: 2013-03-22 Impact factor: 3.650

Review 8. The patient experience in the emergency department: A systematic synthesis of qualitative research.

Authors: Jane Gordon; Lorraine A Sheppard; Sophie Anaf
Journal: Int Emerg Nurs Date: 2009-08-05 Impact factor: 2.142

9. A prospective comparison between written examination and either simulation-based or oral viva examination of intensive care trainees' procedural skills.

Authors: L Nunnink; B Venkatesh; A Krishnan; K Vidhani; A Udy
Journal: Anaesth Intensive Care Date: 2010-09 Impact factor: 1.669

10. Simulation can contribute a part of cardiorespiratory physiotherapy clinical education: two randomized trials.

Authors: Felicity C Blackstock; Kathryn M Watson; Norman R Morris; Anne Jones; Anthony Wright; Joan M McMeeken; Darren A Rivett; Vivienne O'Connor; Raymond F Peterson; Terry P Haines; Geoffrey Watson; Gwendolen Anne Jull
Journal: Simul Healthc Date: 2013-02 Impact factor: 1.929
