Seetha U Monrad1, Nikki L Bibler Zaidi2, Karri L Grob3, Joshua B Kurtz4, Andrew W Tai5, Michael Hortsch6, Larry D Gruppen7, Sally A Santen8. 1. Division of Rheumatology, Department of Internal Medicine, University of Michigan Medical School (UMMS), Ann Arbor, MI, USA. 2. RISE innovation unit, University of Michigan Medical School, Ann Arbor, MI, USA. 3. Office of Medical School Education, University of Michigan Medical School, Ann Arbor, MI, USA. 4. University of Michigan Medical School, Ann Arbor, MI, USA. 5. Division of Gastroenterology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA. 6. Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI, USA. 7. Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA. 8. Department of Emergency Medicine, Virginia Commonwealth University School of Medicine, Richmond, VA, USA.
Abstract
BACKGROUND: Using the revised Bloom's taxonomy, some medical educators assume they can write multiple choice questions (MCQs) that specifically assess higher-order (analyze, apply) versus lower-order (recall) learning. The purpose of this study was to determine whether three key stakeholder groups (students, faculty, and education assessment experts) assign MCQs to the same higher- or lower-order level. METHODS: In Phase 1, the stakeholder groups assigned 90 MCQs to Bloom's levels. In Phase 2, faculty wrote 25 MCQs specifically intended as higher- or lower-order. Then, 10 students assigned these questions to Bloom's levels. RESULTS: In Phase 1, interrater reliability was low within the student group (Krippendorff's alpha = 0.37), within the faculty group (alpha = 0.37), and among the three groups (alpha = 0.34) when assigning questions as higher- or lower-order. The assessment team alone had high interrater reliability (alpha = 0.90). In Phase 2, 63% of students agreed with the faculty as to whether the MCQs were higher- or lower-order. Agreement between paired faculty and student ratings was low (Cohen's kappa range 0.098-0.448, mean 0.256). DISCUSSION: For many questions, faculty and students did not agree on whether the questions were lower- or higher-order. While faculty may try to target specific levels of knowledge or clinical reasoning, students may approach the questions differently than intended.
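The paired faculty-student agreement above is reported as Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The following is a minimal sketch of that calculation for one faculty-student pair, not the authors' analysis code. The function name `cohens_kappa` and the ten "H"/"L" (higher-/lower-order) ratings are hypothetical illustrations and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e is the agreement expected by
    chance from each rater's marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined (0/0), so perfect agreement is reported by convention.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten questions (illustrative only).
faculty = ["H", "H", "L", "L", "H", "L", "H", "L", "H", "H"]
student = ["H", "L", "L", "H", "H", "L", "L", "L", "H", "H"]

print(round(cohens_kappa(faculty, student), 3))
```

In this made-up example the two raters agree on 7 of 10 questions, yet half of that agreement would be expected by chance alone, which is why kappa is noticeably lower than the raw agreement rate. The same gap helps explain how a 63% agreement rate can coexist with a low mean kappa.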
Keywords:
Multiple choice questions; assessment; basic science; medical student