Danny G P Mathysen¹, Wagih Aclimandos, Ella Roelant, Kristien Wouters, Catherine Creuzot-Garcher, Peter J Ringens, Marko Hawlina, Marie-José Tassignon.
¹ Department of Ophthalmology, Antwerp University Hospital, Antwerp, Belgium; Department of Ophthalmology, Faculty of Medicine, University of Antwerp, Antwerp, Belgium; Department of Ophthalmology, King's College Hospital, London, UK; Department of Scientific Research and Statistics, Antwerp University Hospital, Antwerp, Belgium; Institute of Health and Society, Newcastle University, Newcastle upon Tyne, UK; Department of Ophthalmology, Centre Hospitalier Universitaire, Dijon, France; Department of Ophthalmology, Maastricht Universitair Medisch Centrum (MUMC), Maastricht, The Netherlands; University Eye Clinic, Ljubljana, Slovenia.
Abstract
PURPOSE: To investigate, first, whether introducing item-response theory (IRT) analysis, in parallel with the 'traditional' statistical methods available for evaluating the performance of multiple true/false (T/F) items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and second, whether the overall assessment performance of the current written part of the EBOD is sufficiently high (KR-20 ≥ 0.90) for it to be kept as the examination format in future EBOD editions. METHODS: 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical methods used to evaluate the EBOD consists mainly of IRT analysis. These techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has had a positive influence on the statistical performance of the EBOD as a whole and of its individual test items in particular. RESULTS: Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good, though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of the EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011: 0.91; EBOD 2012: 0.91). CONCLUSION: Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that overall reliability remains the only crucial parameter to evaluate when making comparisons. While individual item performance analysis is worthwhile as a secondary analysis, drawing final conclusions from it seems more difficult.
Item performance parameters need to be related to one another, as shown by IRT analysis. Therefore, IRT analysis has proved beneficial for the statistical analysis of the EBOD. The introduction of negative marking has led to a significant increase in reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations.
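To make the 'traditional' statistics named in METHODS concrete, the following is a minimal sketch, not the procedure used for the EBOD. It assumes each candidate's answers have already been scored 0/1 per T/F statement (1 = correct), so it does not model negative marking, and it does not perform IRT analysis. The function names are hypothetical, Rit is taken here as the correlation between an item and the total score including that item, and item discrimination is computed with the common upper/lower 27% group convention, which is an assumption rather than something stated in the abstract. KR-20 follows the standard formula KR-20 = k/(k−1) · (1 − Σ p_j q_j / σ²_total), where k is the number of items, p_j the proportion correct on item j, q_j = 1 − p_j, and σ²_total the variance of candidates' total scores.

```python
from statistics import pvariance


def item_p_values(responses):
    """P-statistic (item difficulty): proportion of candidates answering each item correctly.

    `responses` is a list of candidate rows; each row holds 0/1 item scores (1 = correct)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]


def _pearson(x, y):
    """Plain Pearson correlation; assumes neither x nor y is constant (else division by zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_rit(responses):
    """Rit: correlation between each item score and the candidate's total test score.

    The item itself is included in the total (an assumed convention)."""
    totals = [sum(row) for row in responses]
    return [_pearson(list(col), totals) for col in zip(*responses)]


def upper_lower_discrimination(responses, fraction=0.27):
    """Discrimination index D = p(upper group) - p(lower group).

    Candidates are ranked by total score; the 27% split is an assumed convention."""
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(fraction * len(ranked)))
    upper, lower = ranked[:g], ranked[-g:]
    return [pu - pl for pu, pl in zip(item_p_values(upper), item_p_values(lower))]


def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomously (0/1) scored items."""
    k = len(responses[0])
    sum_pq = sum(p * (1 - p) for p in item_p_values(responses))
    # Population variance of total scores, consistent with p*q being the population item variance.
    var_total = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    # Toy data: 4 hypothetical candidates x 3 items, not EBOD data.
    R = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(item_p_values(R))  # [0.75, 0.5, 0.25]
    print(kr20(R))           # 0.75
    print(item_rit(R))
    print(upper_lower_discrimination(R))
```

On real examination data, the KR-20 value returned by such a function is what would be compared against the 0.90 threshold mentioned in the PURPOSE; the toy matrix above is far too small to be meaningful and only illustrates the arithmetic.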