Sean Tackett1,2, Mark Raymond3, Rishi Desai2,4, Steven A. Haist3, Amy Morales3, Shiv Gaglani2,5, Stephen G. Clyman3.
1. Department of Medicine, Johns Hopkins Bayview Medical Center, Baltimore, MD, USA.
2. Osmosis, Baltimore, MD, USA.
3. National Board of Medical Examiners, Philadelphia, PA, USA.
4. Stanford University School of Medicine, Palo Alto, CA, USA.
5. Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Abstract
PURPOSE: Adaptive learning requires frequent and valid assessments so that learners can track progress against their goals. This study determined whether multiple-choice questions (MCQs) "crowdsourced" from medical learners could meet the standards of many large-scale testing programs.

METHODS: Users of a medical education app (Osmosis.org, Baltimore, MD) volunteered to submit case-based MCQs. Eleven volunteers were selected to submit MCQs targeted to second-year medical students. Two hundred MCQs underwent duplicate review by a panel of internal medicine faculty, who rated each item for relevance, content accuracy, and quality of the response-option explanations. A sample of 121 items was pretested on clinical subject exams completed by a national sample of U.S. medical students.

RESULTS: Seventy-eight percent of the 200 MCQs met faculty reviewer standards for relevance, accuracy, and quality of explanations. Of the 121 pretested MCQs, 50% met acceptable statistical criteria. The most common reasons for exclusion were that an item was too easy or had a low discrimination index.

CONCLUSIONS: Crowdsourcing can efficiently yield high-quality assessment items that meet rigorous judgmental and statistical criteria. Similar models may be adopted by students and educators to augment item pools that support adaptive learning.
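The statistical screening noted in the RESULTS (items excluded for being too easy or for low discrimination) corresponds to classical item analysis: difficulty is the proportion of examinees answering an item correctly, and discrimination is the corrected item-total point-biserial correlation. The following Python sketch illustrates that computation; the function name and the screening thresholds are illustrative assumptions, since the abstract does not report the exact cutoffs the study applied.

```python
import numpy as np

def item_statistics(responses: np.ndarray):
    """Classical item analysis for a 0/1-scored response matrix.

    responses: (n_examinees, n_items) array of 0/1 item scores.
    Returns per-item difficulty (proportion correct) and
    discrimination (corrected item-total point-biserial correlation).
    """
    n_examinees, n_items = responses.shape
    difficulty = responses.mean(axis=0)        # p-value: fraction answering correctly
    total = responses.sum(axis=1)              # each examinee's total score
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]         # exclude the item from its own total
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

# Hypothetical cutoffs for illustration only; the abstract does not state
# the study's actual criteria for "too easy" or "low discrimination".
def passes_screen(p, r_pb, max_difficulty=0.90, min_discrimination=0.15):
    return p <= max_difficulty and r_pb >= min_discrimination

# Toy example: 5 examinees x 3 items
scores = np.array([[1, 0, 1],
                   [1, 1, 1],
                   [0, 0, 1],
                   [1, 1, 0],
                   [1, 0, 1]])
p, r = item_statistics(scores)
```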