Eleven ways to get a grip on the implementation of remote administration of high-stakes assessments.

Abstract

The COVID-19 pandemic rushed licensure and certification institutions, as well as many university programs, to integrate Information and Communication Technologies (ICTs) into their practices so that exams could be administered remotely, independent of distancing measures. The Black Ice covered in this manuscript is the integration of ICTs to allow remote administration of high-stakes assessments, considered in terms of assessment development, administration, and monitoring, with the aim of promoting the validity of score interpretation.

Year: 2022 PMID: 36091742 PMCID: PMC9441112 DOI: 10.36834/cmej.73734

Source DB: PubMed Journal: Can Med Educ J ISSN: 1923-1202

Introduction

During the spring of 2020, many national licensure and certification exams were postponed because of the COVID-19 pandemic, leaving candidates in limbo with regard to the practice of their chosen profession. Many university programs had to pivot quickly to allow for off-site administration of their assessments. While the pandemic pushed these institutions to integrate Information and Communication Technologies (ICTs) into their practices to allow for remote administration of their exams independent of distancing measures, these changes were made hastily. The Black Ice covered in this manuscript is the integration of ICTs to allow remote administration of high-stakes assessments, building on the lessons learned over the past two years and on previous literature on this topic. Drawing on these lessons, I offer considerations for assessment development, administration, and monitoring, with the aim of promoting the validity of score interpretation,[1] that could inform future integration of technologies into written and performance-based assessment practices. Some of these recommendations may not apply or be feasible for all assessments, but we could consider them as a goal to work towards.

Considerations while planning the assessment and before the administration

1. Use a stable and reliable platform

The stability and reliability of the technology used can be a major threat to the validity of score interpretation.[2] For example, when the technology is not stable enough it can disconnect candidates during the assessment, undoubtedly hindering the quality of the experience. A stable and reliable platform, by contrast, would offer a user experience with as little lag as possible and a connection maintained throughout the examination. Beyond in-the-moment frustrations, issues of technology reliability can undermine the candidates’ and the public’s trust in the examination and what it represents. More importantly, such issues may void the assessment score altogether. Choosing a stable and reliable platform provider is of the utmost importance, and as such may require substantial piloting and testing to identify the best platform. When piloting a platform, one should aim to reproduce, as closely as possible, the conditions in which the exam would be administered (including the number of potential candidates, the number of items, the length of the exam, etc.). While it may be impossible to reproduce all these conditions, mimicking the expected setting as closely as possible will provide a better sense of the stability and reliability of the platform.

2. Use a platform that is compatible with all (or most) operating systems and that requires minimal hardware and internet bandwidth, to limit inequities between examinees

Equity is increasingly at the forefront of assessment considerations,[3],[4] especially in the context of e-assessment. Authors have commented, for example, on the negative consequences that the digital divide may have on low-income students.[2],[5],[6] Considerations for equity are enacted when one aims to “…identify and remove construct-irrelevant barriers to maximal performance for any examinee. Removing these barriers allows for the comparable and valid interpretation of test scores for all examinees.”[1](p63) Strategies to reduce inequity include providing candidates with the required technology, such as computers, webcams or, eventually, Virtual Reality equipment.[7] Other strategies include using accessible technology (minimal hardware requirements and minimal bandwidth use) and ensuring compatibility with multiple operating systems, screen sizes, or even font sizes.

3. Create a lot of content (questions or stations) to decrease the potential effect of content sharing between candidates

Using parallel forms of an exam reduces cheating in the context of multiple days of testing.[2] In addition, the development of bigger item banks minimizes the negative impact on items’ psychometric properties associated with the frequent re-use of items.[5],[8] In other words, using items less often reduces the risk that they become ‘easier’ after being shared between candidates. Collaborations between institutions, when possible, or the use of algorithms to generate items[9] are strategies that can potentially reduce the burden of content creation. However, when aiming for more content and more versions of the same exam, there is a danger that the different versions may not be comparable. Consequently, strategies or processes ensuring comparable difficulty levels need to be put in place.
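One simple way to keep parallel forms comparable in difficulty can be sketched as follows. This is only an illustration under stated assumptions, not a method described in the sources cited here: the item identifiers, the difficulty values (taken to be the proportion of past candidates answering correctly), and the function names are all hypothetical. Items are ranked by difficulty and dealt to the forms in alternating ("serpentine") order so that no form systematically receives the easier items.

```python
from statistics import mean


def build_parallel_forms(item_difficulty, n_forms):
    """Split an item bank into n_forms forms with similar mean difficulty.

    item_difficulty maps a hypothetical item id to a difficulty estimate,
    here assumed to be the proportion of past candidates answering correctly.
    """
    ranked = sorted(item_difficulty, key=item_difficulty.get)
    forms = [[] for _ in range(n_forms)]
    for start in range(0, len(ranked), n_forms):
        block = ranked[start:start + n_forms]
        # Reverse every other block (serpentine order) so that no form
        # always receives the lowest-valued item of each block.
        if (start // n_forms) % 2 == 1:
            block = block[::-1]
        for form, item in zip(forms, block):
            form.append(item)
    return forms


def mean_difficulty(form, item_difficulty):
    """Average difficulty estimate of the items on one form."""
    return mean(item_difficulty[item] for item in form)


# Hypothetical item bank: values are illustrative, not real exam data.
bank = {
    "Q1": 0.30, "Q2": 0.40, "Q3": 0.50, "Q4": 0.55,
    "Q5": 0.60, "Q6": 0.70, "Q7": 0.80, "Q8": 0.90,
}
forms = build_parallel_forms(bank, n_forms=2)
for index, form in enumerate(forms, start=1):
    print(f"Form {index}: {form} mean difficulty {mean_difficulty(form, bank):.3f}")
```

Matching mean difficulty is only a first check; operational programs would typically also compare content coverage and use formal equating procedures before treating scores from different forms as interchangeable.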

4. Leverage the technology to enhance the quality of the assessment

Moving to ICT-based assessment offers several opportunities, such as a purposeful and strategic use of multimedia, including audio and video clips. To be considered Technology Enhanced Assessment, the integration of ICTs into assessment practices should contribute to enhancing the validity of score interpretation (increasing the authenticity of assessment tasks) and the quality of user experience.[10],[11] While integrating ICTs into assessment opens a world of possibilities, these changes should be made with consideration for platform stability and accessibility, as discussed previously. In addition, exam designers should consider how long questions are displayed, how much scrolling is needed, and how notetaking is handled for students who like to take notes to help them organize their thoughts when answering questions.

5. Offer simulation sessions and proper training for candidates, standardized patients, and examiners to familiarize themselves with the platform

The technology can be a source of anxiety for candidates,[12]–[14] standardized patients and examiners. Given the usual distress associated with high-stakes assessment, it is important to consider how to keep the technology from becoming a burden so big that candidates’ true knowledge, skills and attitudes cannot be captured by the assessment. Providing practice opportunities for candidates, examiners and standardized patients to familiarize themselves with the technology can help to reduce anxiety,[15],[16] and favors a smoother administration because everyone involved knows the process and the technology.[17] It seems that simply being exposed to the platform reduces the anxiety of all users (examinees, examiners and standardized patients).[16],[17] This applies to both written and performance-based assessment. When simulation or practice sessions are not feasible, one could consider podcasts or step-by-step demonstrations to prepare candidates, standardized patients, and examiners.

Considerations during the administration of the assessment

6. Adapt the structure and process

Performance-based assessment may be the type of examination that requires the most adaptation when conducted online. In the context of Objective Structured Clinical Examinations (OSCEs), for example, one might consider an alternative to having candidates move from room to room. Lewandowski et al.[17] and Ryan et al.,[18] for example, favored an approach where examiners moved from candidate to candidate. These movements from room to room were facilitated by resources dedicated to managing examinee and examiner locations.

7. Ensure appropriate availability of resources

While there is no single fail-safe way of running a virtual OSCE, having sufficient resources available to manage and monitor moves between rooms, and to assist candidates, examiners, and standardized patients (SPs), is one way of contributing to a less eventful administration of the examination.[18] Similarly, sufficient resources are required when conducting written exams online to ensure appropriate support and invigilation.

8. Use proper authentication and proctoring protocols while balancing for privacy considerations

Authentication refers to the processes put in place to ensure that the correct candidate is sitting the exam.[6] This process can take many forms, from requiring a photo ID to one-on-one questioning and possibly testing the computer being used.[2] These strategies seem easy enough to implement and seem acceptable to most stakeholders. Proctoring refers to the surveillance put in place during the assessment to prevent cheating behavior as much as possible. Live remote proctoring, that is, having an invigilator observe examinees through their webcam, is a strategy used to mitigate the potential for cheating.[7] However, live remote proctoring has been criticized for creating additional test anxiety,[19]-[21] violating personal privacy,[19],[20],[22] and leading to test taker withdrawal from the assessment.[19] Measures and processes of authentication and proctoring need to be balanced against issues of privacy. In line with principles of equity, fairness and responsible conduct of assessment, institutions implementing remote proctoring practices should be mindful not only of privacy laws, but also of how candidates perceive these strategies. In addition, strategies should be put in place to revisit any performance flagged as potential cheating behavior. Having an examiner do live scoring can give the perception of an added invigilator, thus reducing opportunities for cheating behavior.

9. Implement fair accommodation strategies

Additional time to complete an exam is by far the most common accommodation requested and offered.[23],[24] Remote assessment could also make it easier to provide less distracting settings, if the conditions put in place by the testing institutions are respected. While these accommodations are easy to implement in remote assessments, other forms of accommodation may be more challenging. Some candidates may require larger print or read-aloud test directions or questions (which may or may not be possible depending on the platform used).[25] Some candidates may require a sign language interpreter in the context of performance-based assessments,[25] or additional breaks.[26] Further research is required to understand the consequences of these accommodations on performance in the context of remote assessment.

10. Record performances as a safety net

Saving candidates’ answers as they move along in the process of a written exam is a common practice. Similarly, having a recording of candidates’ performances offers many possibilities.[27]–[33] If an examiner is disconnected from the platform, the recording allows the performance to be scored later, thus not penalizing the candidate. In addition, the recording can be used if a candidate contests their score.[17]

Considerations for detecting cheating behaviors

11. Use advanced statistical modeling to determine the probability of individual cheating

Advanced statistical modeling, such as Person-Fit Statistics, offers the opportunity to establish the probability of cheating behavior on Multiple Choice Exams.[34],[35] These statistics have also been tested to detect unusual and unexpected assessor behavior (i.e., leniency and stringency) in a performance-based assessment, such as an OSCE.[36]-[38] Future research could explore whether Person-Fit Statistics have any merit or potential use in detecting cheating behavior by candidates in assessments other than MCQs.
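To make the idea of a person-fit statistic concrete, the sketch below computes one widely used index, the standardized log-likelihood statistic l_z, under a Rasch model. The text above does not say which person-fit statistic or measurement model is intended, so both choices are assumptions made for illustration, as are the item difficulties, the ability value, and the response patterns. In practice the ability would be estimated from the data rather than supplied.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct answer under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_statistic(responses, theta, difficulties):
    """Standardized log-likelihood person-fit statistic (l_z).

    responses: list of 0/1 item scores for one candidate.
    theta: the candidate's ability, treated as known in this sketch.
    difficulties: Rasch item difficulties, same order as responses.
    Large negative values indicate a response pattern that is unlikely
    given the model; they do not by themselves demonstrate cheating.
    """
    log_lik = 0.0
    expected = 0.0
    variance = 0.0
    for u, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical nine-item test, ordered from easiest to hardest.
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
theta = 0.0

# Expected pattern: easy items right, hard items wrong.
consistent = [1, 1, 1, 1, 1, 0, 0, 0, 0]
# Aberrant pattern: easy items wrong, hard items right.
aberrant = [0, 0, 0, 0, 1, 1, 1, 1, 1]

lz_consistent = lz_statistic(consistent, theta, difficulties)
lz_aberrant = lz_statistic(aberrant, theta, difficulties)
print(f"l_z consistent: {lz_consistent:.2f}")
print(f"l_z aberrant:   {lz_aberrant:.2f}")
```

A candidate who misses easy items yet answers the hardest ones correctly receives a strongly negative value, which is the kind of pattern that would warrant a closer look. Pre-knowledge of items is only one possible explanation; guessing, carelessness, or misaligned answer entry can produce similar patterns.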

Conclusion

While there is no question that licensure and certification institutions, as well as university programs, need to integrate ICTs into their high-stakes assessment practices and be prepared to conduct reliable and valid remote assessment, there are bound to be some hits and some misses during the transition to remote e-licensure assessment. The question then becomes, “how many fail-safe and contingency plans should be put in place?” In addition, if these institutions play their cards well in integrating ICTs into their assessments, they could go beyond “being prepared” and also enhance the quality of their assessments and the validity of the score interpretation.

14 in total

1. Scoring objective structured clinical examinations using video monitors or video recordings.

Authors: Deborah A Sturpe; Donna Huynh; Stuart T Haines
Journal: Am J Pharm Educ Date: 2010-04-12 Impact factor: 2.047

2. Examining accommodation effects for equity by overcoming a methodological challenge of sparse data.

Authors: Pei-Ying Lin; Yu-Cheng Lin
Journal: Res Dev Disabil Date: 2016-01-11

3. Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement.

Authors: Cherdsak Iramaneerat; Rachel Yudkowsky; Carol M Myford; Steven M Downing
Journal: Adv Health Sci Educ Theory Pract Date: 2007-02-20 Impact factor: 3.853

4. ADHD symptoms and benefit from extended time testing accommodations.

Authors: Benjamin J Lovett; Ashley M Leja
Journal: J Atten Disord Date: 2013-11-11 Impact factor: 3.256

5. Video assessment of basic surgical trainees' operative skills.

Authors: Peter J Driscoll; Anna M Paisley; Simon Paterson-Brown
Journal: Am J Surg Date: 2008-06-16 Impact factor: 2.565

6. Direct Observation versus Endoscopic Video Recording-Based Rating with the Objective Structured Assessment of Technical Skills for Training of Laparoscopic Cholecystectomy.

Authors: Felix Nickel; Jonathan D Hendrie; Christian Stock; Mohamed Salama; Anas A Preukschas; Jonas D Senft; Karl F Kowalewski; Martin Wagner; Hannes G Kenngott; Georg R Linke; Lars Fischer; Beat P Müller-Stich
Journal: Eur Surg Res Date: 2016-04-09 Impact factor: 1.745

7. A pilot of a Virtual Objective Structured Clinical Examination in dental education. A response to COVID-19.

Authors: James Donn; James Alun Scott; Vivian Binnie; Aileen Bell
Journal: Eur J Dent Educ Date: 2020-12-03 Impact factor: 2.355

8. Standardized and quality-assured video-recorded examination in undergraduate education: informed consent prior to surgery.

Authors: Christoph Kiehl; Anne Simmenroth-Nayda; Yvonne Goerlich; Andrew Entwistle; Sarah Schiekirka; B Michael Ghadimi; Tobias Raupach; Sarah Koenig
Journal: J Surg Res Date: 2014-01-30 Impact factor: 2.192

9. Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study.

Authors: André-Sébastien Aubin; Christina St-Onge; Jean-Sébastien Renaud
Journal: Perspect Med Educ Date: 2018-04

10. COVID-19 as the tipping point for integrating e-assessment in higher education practices.

Authors: Christina St-Onge; Kathleen Ouellet; Sawsen Lakhal; Tim Dubé; Mélanie Marceau
Journal: Br J Educ Technol Date: 2021-10-02