BACKGROUND: This study investigated the impact of item format and number of options on the psychometric characteristics (p values and biserials) and response times for multiple-choice questions (MCQs) appearing on Step 2 of the United States Medical Licensing Examination. METHOD: In all, 192 MCQs were used in the study. Each item was presented in two formats: in a two-item extended-matching set and as an independent item. For the extended-matching format, there were two versions: a base version that included all options (10 to 26) and an 8-option version. For the independent-item format, there were three versions: a base version that included all options, and 8-option and 5-option versions created by a group of physicians who selected options without information about examinee performance. All items were embedded in unscored sections of the 2005-06 Step 2 test forms. RESULTS: Versions of items with more options were harder and required more testing time; no differences in item discrimination were observed. Mean response times for items presented in the extended-matching format were lower than for those presented as independent items, primarily because of shorter response times for the second item presented in a set. CONCLUSION: Use of the extended-matching format and smaller numbers of options per item (and more items) should result in more efficient use of testing time and greater score precision per unit of testing time.