P F McLean, D J Davies, P R Kemp, A D Liddle, M J Morrell, O Halse, N M Martin, A H Sam.
Abstract
Open-book examinations (OBEs) will likely become increasingly important assessment tools. We investigated how access to open-book resources affected questions testing factual recall, which might be easy to look up, versus questions testing higher-order cognitive domains. Few studies have investigated OBEs using modern Internet resources or as summative assessments. We compared performance on an examination conducted as a traditional closed-book exam (CBE) in 2019 (N = 320) and as a remote OBE with free access to Internet resources in 2020 (N = 337), a change made because of COVID-19. This summative, end-of-year assessment focused on basic science for second-year medical students. We categorized questions by Bloom's taxonomy ('Remember' versus 'Understand/Apply'). We predicted higher performance on the OBE, driven by higher performance on 'Remember' questions. We used an item-centric analysis, taking performance per item over all examinees as the outcome variable in a logistic regression with terms 'Open-Book', 'Bloom Category' and their interaction. Performance was higher on OBE questions than CBE questions (OR 2.26, 95% CI: 2.14–2.39), and higher on 'Remember' than 'Understand/Apply' questions (OR 1.13, 95% CI: 1.09–1.19). The difference in performance between 'Remember' and 'Understand/Apply' questions was greater in the OBE than the CBE ('Open-Book' * 'Bloom Category' interaction: OR 1.28, 95% CI: 1.19–1.37). Access to open-book resources had a greater effect on performance on factual recall questions than on higher-order questions, though performance was higher in the OBE overall. OBE design must consider how searching for information affects performance, particularly on questions measuring different domains of knowledge.
Keywords: Assessment; Medical education; Open-book examinations
Year: 2021 PMID: 34687383 PMCID: PMC8536902 DOI: 10.1007/s10459-021-10076-5
Source DB: PubMed Journal: Adv Health Sci Educ Theory Pract ISSN: 1382-4996 Impact factor: 3.853
Table 1 shows the candidate-centric and item-centric descriptive statistics for the exam, administered as a closed-book exam (CBE) in 2019 and as an open-book exam (OBE) in 2020.
| | Closed-book exam (2019) | Open-book exam (2020) |
|---|---|---|
| Candidate-centric descriptive statistics | ||
| N candidates | 320 | 337 |
| Mean score per candidate (standard deviation) (%) | 72.1 (11) | 85.7 (8.2) |
| Median score per candidate (median absolute deviation) (%) | 74.2 (10.4) | 87.5 (4.6) |
| Range of scores per candidate (%) | 22.7–93 | 35.9–97.7 |
| Interquartile range of scores per candidate (%) | 14.9 | 6.2 |
| Number of Failing Students | 19 | 9 |
| Item-centric descriptive statistics | ||
| N items | 128 | 128 |
| Mean score per item (standard deviation) (%) | 72.1 (18.6) | 87.0 (13.7) |
| Median score per item (median absolute deviation) (%) | 75.5 (17.4) | 92.1 (9.5) |
| Interquartile range of scores per item (%) | 24.2 | 17.3 |
| Range of scores per item (%) | 18.8–99.1 | 31.8–99.7 |
Table 2 shows the categorization of the closed-book exam (CBE) and open-book exam (OBE) items by Bloom’s taxonomy, with item-centric average performance per category. A chi-squared test showed no evidence of a difference in the numbers of ‘Remember’ versus ‘Understand/Apply’ questions between the 2019 and 2020 exams (chi-squared = 0.070, df = 1, p = 0.76).
| Bloom category | N Items CBE (2019) | Mean (SD) performance per category–CBE (2019) | Median (MAD) performance per category–CBE (2019) | N Items OBE (2020) | Mean (SD) performance per category–OBE (2020) | Median (MAD) performance per category–OBE (2020) |
|---|---|---|---|---|---|---|
| Remember | 82 | 73.0 (19.6) | 76.7 (17.8) | 79 | 88.7 (14.9) | 94.7 (6.2) |
| Understand | 42 | 70.1 (17.1) | 74.4 (17.1) | 43 | 84.8 (11.6) | 85.5 (14.5) |
| Apply | 4 | 73.8 (15.6) | 72.8 (16.4) | 6 | 81.3 (8.3) | 81.8 (6.6) |
| Understand/Apply Combined | 46 | 70.4 (16.8) | 74.4 (17.1) | 49 | 84.4 (11.2) | 84.3 (14.5) |
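The comparison of item counts per Bloom category between the two exams can be sketched as a chi-squared test on a 2×2 table. This is a minimal illustration only: the source does not say which variant of the test was used, so the Yates continuity correction applied here is an assumption, and the output need not match the reported statistic exactly. The function name `chi2_2x2` is hypothetical.

```python
import math

def chi2_2x2(table, yates=True):
    """Chi-squared test of independence for a 2x2 table (df = 1).

    `yates=True` applies the continuity correction; whether the source
    analysis used it is an assumption, not something the paper states.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: CBE (2019), OBE (2020); columns: 'Remember', 'Understand/Apply' item counts.
counts = [[82, 46], [79, 49]]
chi2, p_value = chi2_2x2(counts)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.2f}")
```

Either variant gives a small statistic and a large p-value for these counts, which is consistent with the conclusion that the two exams had a similar mix of question types.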
Fig. 1 shows the distributions of percentage scores per item, averaged over all examinees. Items are grouped by their classification by Bloom’s taxonomy (‘Remember’ or ‘Understand/Apply’), and by whether they were in the 2019 closed-book exam (CBE) or the 2020 open-book exam (OBE). The thick dashed line shows the median. The thin dashed lines show the interquartile range.
Table 3 shows the results of comparing logistic regression models using the Bayesian information criterion (BIC).
| Model | BIC |
|---|---|
| Null (no effects) | 17,857.87 |
| Open-book main effect only | 14,915.46 |
| Bloom category main effect only | 17,747.99 |
| Main effects & interaction | 14,731.0 |
We compared the null model (no effects) with three alternatives: a model with a main effect of ‘Open-Book’ (whether the exam was open-book or closed-book) only; a model with a main effect of ‘Bloom Category’ (whether the item was categorised as ‘Remember’ or ‘Understand/Apply’) only; and a model with the ‘Open-Book’ and ‘Bloom Category’ main effects and their interaction. The model with the interaction performed best, as indicated by the lowest BIC.
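The model selection step can be illustrated with the BIC values from the table above. BIC is defined as k·ln(n) − 2·ln(L̂), where k is the number of parameters, n the number of observations and L̂ the maximised likelihood; lower is better. The sketch below only ranks the reported values and computes each model's difference from the best one. It does not refit the models, and the variable names are illustrative.

```python
# BIC values as reported in the model comparison table.
bic = {
    "Null (no effects)": 17857.87,
    "Open-book main effect only": 14915.46,
    "Bloom category main effect only": 17747.99,
    "Main effects & interaction": 14731.0,
}

# The preferred model is the one with the lowest BIC.
best = min(bic, key=bic.get)

# Difference from the best model; larger values mean weaker support.
delta = {name: value - bic[best] for name, value in bic.items()}

for name, d in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"{name}: delta BIC = {d:.2f}")
```

Most of the drop from the null model comes from the ‘Open-Book’ term, and the full model with the interaction lowers the BIC further.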
Table 4 shows the results of the winning logistic regression model with predictors of ‘Open-Book’, ‘Bloom Category’ and their interaction.
| Model | Outcome | Predictor | Log Odds (Standard Error) | Z-value | P-value | Odds Ratio (CI) |
|---|---|---|---|---|---|---|
| Logistic regression model with interaction | N correct answers out of total answers per item | Open-Book | 0.82 (0.03) | 29.17 | < 0.0001 | 2.26 (95% CI: 2.14–2.39) |
| | | Bloom Category | 0.13 (0.02) | 5.55 | < 0.0001 | 1.13 (95% CI: 1.09–1.19) |
| | | Open-Book * Bloom Category Interaction | 0.25 (0.04) | 6.69 | < 0.0001 | 1.28 (95% CI: 1.19–1.37) |
| Posthoc logistic regression: CBE data only | N correct answers out of total answers per item | Bloom Category | 0.13 (0.02) | 5.55 | < 0.0001 | 1.13 (99% CI: 1.07–1.2) |
| Posthoc logistic regression: OBE data only | N correct answers out of total answers per item | Bloom Category | 0.37 (0.03) | 12.91 | < 0.0001 | 1.45 (99% CI: 1.35–1.56) |
Table 4 also shows the results of posthoc logistic regressions performed on open-book and closed-book data separately to understand the interaction term
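The odds ratios in Table 4 follow from the log odds by exponentiation, with Wald confidence limits at exp(estimate ± z × standard error). The sketch below applies this to the rounded coefficients printed in the table, so its output can differ from the published values in the second decimal place. The helper name `odds_ratio_ci` is hypothetical, and the Wald interval is an assumption about how the intervals were computed.

```python
import math

def odds_ratio_ci(log_odds, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) using a Wald interval.

    z = 1.96 gives a 95% interval; z = 2.576 gives roughly a 99% interval.
    """
    return (
        math.exp(log_odds),
        math.exp(log_odds - z * se),
        math.exp(log_odds + z * se),
    )

# (log odds, standard error) as printed in Table 4, i.e. already rounded.
coefficients = {
    "Open-Book": (0.82, 0.03),
    "Bloom Category": (0.13, 0.02),
    "Open-Book * Bloom Category": (0.25, 0.04),
}

results = {name: odds_ratio_ci(b, se) for name, (b, se) in coefficients.items()}
for name, (or_, lo, hi) in results.items():
    print(f"{name}: OR = {or_:.2f} (95% CI: {lo:.2f} to {hi:.2f})")
```

On the log-odds scale, the ‘Bloom Category’ effect in the OBE-only posthoc model (0.37) is close to the sum of the main effect and the interaction in the full model (0.13 + 0.25 = 0.38), which is how the interaction term is usually read.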
Fig. 2 shows the probabilities of answering a question correctly, extracted from the logistic regression models and calculated using the inverse-logit function. Examinees were more likely to correctly answer ‘Remember’ questions than ‘Understand/Apply’ questions, and the gap between ‘Remember’ and ‘Understand/Apply’ questions was greater in the open-book exam. This supports the conclusion that access to open-book resources made examinees more likely to answer questions correctly, particularly for questions testing recall of facts.
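The inverse-logit conversion behind these probabilities can be sketched as follows. The model intercept is not reported in the source, so the sketch assumes it equals the logit of the mean closed-book ‘Understand/Apply’ performance (70.4%) and treats closed-book and ‘Understand/Apply’ as the reference levels. Both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# ASSUMPTION: intercept taken from the mean CBE 'Understand/Apply' score (70.4%);
# the source does not report the fitted intercept.
INTERCEPT = logit(0.704)

# Rounded log-odds coefficients from the winning model.
B_OPEN_BOOK = 0.82
B_REMEMBER = 0.13
B_INTERACTION = 0.25

def p_correct(open_book, remember):
    """Predicted probability of a correct answer (arguments are 0 or 1)."""
    eta = (
        INTERCEPT
        + B_OPEN_BOOK * open_book
        + B_REMEMBER * remember
        + B_INTERACTION * open_book * remember
    )
    return inv_logit(eta)

for exam, open_book in (("CBE", 0), ("OBE", 1)):
    for category, remember in (("Understand/Apply", 0), ("Remember", 1)):
        print(f"{exam} / {category}: {p_correct(open_book, remember):.3f}")
```

Under these assumptions the four predicted probabilities are close to the mean per-category performance in the item-centric descriptive statistics, and the ‘Remember’ advantage is larger in the open-book condition than in the closed-book one.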