BACKGROUND: OSCEs can be both reliable and valid but are subject to sources of error. Examiners become more hawkish as their experience grows, and recent research suggests that, in clinical contexts, examiners are influenced by the ability of recently observed candidates. In OSCEs, where examiners test many candidates over a short space of time, this may introduce bias that does not reflect a candidate's true ability.
AIMS: To test whether examiners marked more or less stringently as time elapsed in a summative OSCE, and to evaluate the practical impact of this bias.
METHODS: We measured changes in examiner stringency in a 13-station OSCE sat by 278 third-year MBChB students over the course of two days.
RESULTS: Examiners were most lenient at the start of the OSCE in the clinical section (β = -0.14, p = 0.018) but not in the online section, where student answers were machine-marked (β = -0.003, p = 0.965).
CONCLUSIONS: The change in marks was likely caused by increased examiner stringency over time, derived from a combination of growing experience and exposure to an increasing number of successful candidates. The need for better training and for reviewing standards during the OSCE is discussed.