James Ware¹, Torstein Vik. ¹Faculty of Medicine, Health Sciences Centre, Kuwait University, Safat, Kuwait. jamesw@hsc.edu.kw
Abstract
BACKGROUND: One Norwegian medical school introduced A-type MCQs (best one of five) to replace more traditional assessment formats (e.g. essays) in an undergraduate medical curriculum. Quality assurance criteria were introduced to measure the success of the intervention. METHOD: Data collected from the first four year-end examinations included item analysis, the frequency of item writing flaws (IWFs) and the proportion of items testing at a higher cognitive level (K2). All examinations were reviewed before and after delivery, and no items were removed. RESULTS: Overall pass rates were similar to those of previous cohorts examined with traditional assessment formats. Across 389 items, the proportion of items with two or more functioning distracters (each marked by ≥5% of candidates) was ≥47.5%. After removal of items with high p-values (≥85%), this proportion became >75%. With each successive year in the curriculum, the proportion of K2 items used rose steadily to almost 50%. 31/389 (7%) items had IWFs. 65% of items had a discriminatory power ≥0.15. CONCLUSIONS: Five item quality criteria are recommended: (1) adherence to an in-house style, (2) proportion of items testing at K2 level, (3) functioning distracter proportion, (4) overall discrimination ratio and (5) IWF frequency.
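The item statistics named in the abstract (p-value, functioning distracters, discriminatory power) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `item_statistics`, its parameters and the sample data are hypothetical, and the discrimination index is assumed here to be the difference in proportion correct between the top and bottom 27% of candidates ranked by total score, a method the abstract does not specify. Only the 5% distracter threshold comes from the abstract.

```python
from typing import Dict, List


def item_statistics(
    responses: List[str],
    totals: List[float],
    key: str,
    options: str = "ABCDE",
    distracter_cutoff: float = 0.05,  # a distracter "functions" if >=5% mark it
    group_fraction: float = 0.27,     # assumed size of upper/lower groups
) -> Dict[str, float]:
    """Return p-value, functioning distracter count and discrimination for one item."""
    n = len(responses)

    # p-value (item difficulty): proportion of candidates answering correctly.
    p_value = sum(r == key for r in responses) / n

    # Count incorrect options chosen by at least the cutoff proportion.
    functioning = sum(
        1
        for opt in options
        if opt != key
        and sum(r == opt for r in responses) / n >= distracter_cutoff
    )

    # Upper-lower discrimination index (assumed method): rank candidates by
    # total examination score and compare proportion correct in each group.
    order = sorted(range(n), key=lambda i: totals[i])
    k = max(1, round(n * group_fraction))
    lower, upper = order[:k], order[-k:]
    p_lower = sum(responses[i] == key for i in lower) / k
    p_upper = sum(responses[i] == key for i in upper) / k

    return {
        "p_value": p_value,
        "functioning_distracters": functioning,
        "discrimination": p_upper - p_lower,
    }


# Hypothetical data: 20 candidates, correct answer "A", totals from 20 down to 1.
sample_responses = ["A"] * 12 + ["B"] * 4 + ["C"] * 4
sample_totals = list(range(20, 0, -1))
print(item_statistics(sample_responses, sample_totals, key="A"))
```

Under these assumptions, an item would meet the abstract's thresholds if it has two or more functioning distracters and a discrimination value of at least 0.15.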