Evaluating the quality of medical multiple-choice items created with automated processes.

Mark J Gierl, Hollis Lai.

Abstract

OBJECTIVES: Computerised assessment raises formidable challenges because it requires large numbers of test items. Automatic item generation (AIG) can help address this test development problem because it yields large numbers of new items both quickly and efficiently. To date, however, the quality of the items produced using a generative approach has not been evaluated. The purpose of this study was to determine whether automatic processes yield items that meet standards of quality that are appropriate for medical testing. Quality was evaluated firstly by subjecting items created using both AIG and traditional processes to rating by a four-member expert medical panel using indicators of multiple-choice item quality, and secondly by asking the panellists to identify which items were developed using AIG in a blind review.
METHODS: Fifteen items from the domain of therapeutics were created in each of three different experimental test development conditions. The first 15 items were created by content specialists using traditional test development methods (Group 1 Traditional). The second 15 items were created by the same content specialists using AIG methods (Group 1 AIG). The third 15 items were created by a new group of content specialists using traditional methods (Group 2 Traditional). These 45 items were then evaluated for quality by a four-member panel of medical experts and were subsequently categorised as either Traditional or AIG items.
RESULTS: Three outcomes were reported: (i) the items produced using traditional and AIG processes were comparable on seven of eight indicators of multiple-choice item quality; (ii) AIG items can be differentiated from Traditional items by the quality of their distractors; and (iii) the overall predictive accuracy of the four expert medical panellists was 42%.
CONCLUSIONS: Items generated by AIG methods are, for the most part, equivalent to traditionally developed items from the perspective of expert medical reviewers. While the AIG method produced comparatively fewer plausible distractors than the traditional method, medical experts cannot consistently distinguish AIG items from traditionally developed items in a blind review.
© 2013 John Wiley & Sons Ltd.

Year:  2013        PMID: 23746162     DOI: 10.1111/medu.12202

Source DB:  PubMed          Journal:  Med Educ        ISSN: 0308-0110            Impact factor:   6.251


  2 in total

1.  Re-using questions in classroom-based assessment: An exploratory study at the undergraduate medical education level.

Authors:  Sébastien Xavier Joncas; Christina St-Onge; Sylvie Bourque; Paul Farand
Journal:  Perspect Med Educ       Date:  2018-12

2.  Feasibility assurance: a review of automatic item generation in medical assessment. (Review)

Authors:  Filipe Falcão; Patrício Costa; José M Pêgo
Journal:  Adv Health Sci Educ Theory Pract       Date:  2022-03-01       Impact factor: 3.629
