Magdy H Balaha, Mona T El-Ibiary, Ayman A El-Dorf, Shereef L El-Shewaikh, Hossam M Balaha.
Abstract
Background: Item-writing flaws (IWFs) in multiple-choice questions (MCQs) can affect test validity. The purpose of this study was to explore the IWFs in published resources, estimate their frequency and pattern, rank and compare the studied resources, and suggest the possible impact on teachers and test writers.

Methods: This cross-sectional study was conducted from September 2017 to December 2020. MCQs from published MCQ books in Obstetrics and Gynecology were the target resources. They were stratified into four clusters (study-book related, review books, self-assessment books, and online-shared test banks). The sample size was estimated, and 2,300 out of 11,195 eligible MCQs were randomly selected. The MCQs (items) were judged on a 20-element compiled checklist organized under three sections: (1) structural flaws (seven elements), (2) test-wiseness flaws (five elements), and (3) irrelevant difficulty flaws (eight elements). Rating was done dichotomously: 0 = violating and 1 = not violating. Item flaws were recorded and analyzed using Excel spreadsheets and IBM SPSS.

Results: Twenty-three percent of the items (n = 537) were free from any violations, whereas 30% (n = 690) contained one violation and 47% (n = 1,073) contained more than one violation. The most commonly reported IWF was "Options are Not in Order" (61%). The best questions with the fewest flaws (75th percentiles) were obtained from the self-assessment books, followed by study-related MCQ books. The average scores of good-quality items in the cluster of self-assessment books were significantly higher than in the other book clusters.

Conclusion: There were variable presentations and percentages of item violations. Lower-quality questions were observed in review-related MCQ books and the online-shared test banks. Using questions from these resources calls for caution or avoidance. Relatively higher-quality questions were reported for the self-assessment books, followed by the study-related MCQ books. An adoption strategy may be applied, with mitigation if needed.

Syrian American Medical Society. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: multiple-choice questions violations; multiple-choice questions writing flaws; test bank quality
Year: 2022 PMID: 36092385 PMCID: PMC9458348 DOI: 10.1055/s-0042-1755332
Source DB: PubMed Journal: Avicenna J Med ISSN: 2231-0770
Distribution of the percentage of items with overall item flaws (violations) and with structural, test-wiseness, and irrelevant difficulty flaws in all MCQ resources
| Source or book | Total: good items (%) | Total: one flaw (%) | Total: >1 flaw (%) | Structural: no flaws (%) | Structural: one flaw (%) | Structural: >1 flaw (%) | Test-wiseness: no flaws (%) | Test-wiseness: one flaw (%) | Test-wiseness: >1 flaw (%) | Irrelevant difficulty: no flaws (%) | Irrelevant difficulty: one flaw (%) | Irrelevant difficulty: >1 flaw (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 10 | 14 | 76 | 42 | 8 | 50 | 68 | 28 | 4 | 10 | 64 | 26 |
| 2 | 4 | 16 | 80 | 38 | 6 | 56 | 70 | 28 | 2 | 8 | 68 | 24 |
| 3 | 12 | 28 | 60 | 62 | 28 | 10 | 80 | 18 | 2 | 26 | 54 | 20 |
| 4 | 14 | 30 | 56 | 50 | 46 | 4 | 78 | 20 | 2 | 34 | 26 | 40 |
| 5 | 4 | 18 | 78 | 46 | 34 | 20 | 60 | 32 | 8 | 12 | 34 | 54 |
| 6 | 11 | 29 | 60 | 37 | 34 | 29 | 81 | 18 | 1 | 40 | 36 | 24 |
| 7 | 4 | 44 | 52 | 58 | 42 | 0 | 84 | 14 | 2 | 8 | 76 | 16 |
| 8 | 36 | 19 | 45 | 51 | 17 | 32 | 90 | 9 | 1 | 70 | 26 | 4 |
| 9 | 52 | 23 | 25 | 63 | 35 | 2 | 82 | 16 | 2 | 84 | 15 | 1 |
| 10 | 68 | 19 | 13 | 78 | 21 | 1 | 97 | 3 | 0 | 81 | 18 | 1 |
| 11 | 43 | 27 | 30 | 75 | 11 | 14 | 92 | 6 | 2 | 53 | 38 | 9 |
| 12 | 23 | 21 | 56 | 49 | 31 | 20 | 90 | 8 | 2 | 38 | 38 | 24 |
| 13 | 45 | 17 | 38 | 74 | 21 | 5 | 94 | 6 | 0 | 49 | 31 | 20 |
| 14 | 8 | 18 | 74 | 44 | 36 | 20 | 90 | 10 | 0 | 12 | 44 | 44 |
| 15 | 16 | 21 | 63 | 57 | 22 | 21 | 91 | 9 | 1 | 24 | 36 | 40 |
| 16 | 7 | 15 | 77 | 39 | 27 | 34 | 85 | 14 | 1 | 12 | 51 | 37 |
| 17 | 34 | 42 | 24 | 77 | 18 | 5 | 89 | 9 | 2 | 50 | 48 | 3 |
| 18 | 13 | 59 | 28 | 83 | 16 | 2 | 89 | 11 | 1 | 17 | 77 | 7 |
| 19 | 15 | 35 | 50 | 65 | 22 | 14 | 89 | 10 | 1 | 29 | 44 | 27 |
| 20 | 24 | 39 | 37 | 76 | 14 | 10 | 93 | 6 | 1 | 35 | 44 | 20 |
| Average | 23 | 30 | 47 | 62 | 23 | 15 | 87 | 11 | 1 | 36 | 44 | 20 |
Abbreviation: MCQ, multiple-choice question.
Source or book: the code number denoting the selected book and/or resource (the list of sources is detailed in Supplementary Material S1 [available in the online version]).
Good items: items without any flaw (violation).
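The way a single item's checklist ratings map onto the columns of the table above can be sketched as follows. This is a minimal illustration, assuming the 20 ratings are stored in checklist order (V1 to V20) with 0 = violating and 1 = not violating, as stated in the Methods; the names `SECTIONS`, `flaw_category`, and `classify_item` are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence

# Checklist layout as described in the abstract: seven structural elements,
# five test-wiseness elements, and eight irrelevant-difficulty elements.
# Index positions (0-based) are an assumption about how ratings are stored.
SECTIONS = {
    "structural": range(0, 7),               # V1-V7
    "test_wiseness": range(7, 12),           # V8-V12
    "irrelevant_difficulty": range(12, 20),  # V13-V20
}


def flaw_category(n_flaws: int) -> str:
    """Map a flaw count onto the three categories used in the tables."""
    if n_flaws == 0:
        return "no flaws"
    if n_flaws == 1:
        return "one flaw"
    return ">1 flaw"


def classify_item(ratings: Sequence[int]) -> Dict[str, str]:
    """Classify one item overall and per checklist section.

    ratings: 20 values, 0 = violating, 1 = not violating.
    """
    if len(ratings) != 20:
        raise ValueError("expected 20 checklist ratings")
    result = {"overall": flaw_category(sum(1 for r in ratings if r == 0))}
    for name, indices in SECTIONS.items():
        result[name] = flaw_category(sum(1 for i in indices if ratings[i] == 0))
    return result


# Hypothetical item that violates V3 (not focused) and V15 (options not in order).
example = [1] * 20
example[2] = 0
example[14] = 0
print(classify_item(example))
```

An item counts as a "good item" only when its overall category is "no flaws", so it can have one flaw in each of two sections and still fall under ">1 flaw" overall.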
Distribution of the individual violations and their percentages among the 1,763 MCQs that contain item flaws (1,763/2,300 = 77%; the remaining 537 items are good items, i.e., without flaws)
| V No. | Violation criteria | Violations per 1,763 MCQs | % (NB) |
|---|---|---|---|
| | Structural violations (flaws) | | |
| V1 | The item is not conclusive | 123 | 7 |
| V2 | The item is not typical SBA type | 280 | 16 |
| V3 | The item is not focused | 310 | 18 |
| V4 | The item is not clearly expressed | 251 | 14 |
| V5 | All the options are not uniform | 260 | 15 |
| V6 | All the options are not homogenous | 230 | 13 |
| V7 | All the options are not plausible | 108 | 6 |
| | Test-wiseness violations | | |
| V8 | There is clang | 30 | 2 |
| V9 | There is clueing | 72 | 4 |
| V10 | There is convergence | 14 | 1 |
| V11 | There are absolute or vague terms | 85 | 5 |
| V12 | There is crowded key option | 122 | 10 |
| | Irrelevant difficulty violations | | |
| V13 | The item is overloaded with information | 169 | 10 |
| V14 | The item is not stated positively | 250 | 14 |
| V15 | The options are not arranged in order | 1,067 | 61 |
| V16 | The options have numerical inconsistency | 9 | 1 |
| V17 | The options have overlap | 75 | 4 |
| V18 | The options have ambiguity | 134 | 8 |
| V19 | There is “all of the above” or “none of the above” | 228 | 13 |
| V20 | The options have “complex choices” | 71 | 4 |
Abbreviations: MCQ, multiple-choice question; NB, the major violations in the current study were compared across the different resources and are presented in Supplementary Material S3 (available in the online version); SBA, single-best answer; V, violation.
Note: V No. is the violation number in the checklist (the detailed checklist is given in Supplementary Material S2 [available in the online version]).
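The percentages in this table and in the abstract are simple proportions rounded to the nearest whole number. The sketch below recomputes a few of them from counts quoted in the text; the helper name `pct` is hypothetical, and the choice of rounding to the nearest integer is an assumption that happens to agree with the quoted figures.

```python
TOTAL_ITEMS = 2300    # randomly selected MCQs
FLAWED_ITEMS = 1763   # items with at least one violation


def pct(count: int, base: int) -> int:
    """Percentage of `count` out of `base`, rounded to the nearest integer."""
    return round(100 * count / base)


# Share of sampled items that contain at least one flaw.
print(pct(FLAWED_ITEMS, TOTAL_ITEMS))   # 77

# Overall categories reported in the abstract (537 + 690 + 1,073 = 2,300).
for label, n in [("no flaws", 537), ("one flaw", 690), (">1 flaw", 1073)]:
    print(label, pct(n, TOTAL_ITEMS))

# Selected violations, expressed per 1,763 flawed items as in the table.
for code, n in [("V1", 123), ("V3", 310), ("V15", 1067)]:
    print(code, pct(n, FLAWED_ITEMS))
```

Note that the per-violation percentages use the 1,763 flawed items as the base, not the full sample of 2,300.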
Distribution of the violations by percentile band for the overall violations and for the structural and irrelevant difficulty violations (resources are ranked with those having the fewest violations at the top)
| Percentile | Source | No_Viol (%) | Percentile | Source | No_S_Viol (%) | Percentile | Source | No_D_Viol (%) |
|---|---|---|---|---|---|---|---|---|
| >75th (>35.5) | 10 | 68 | >75th (>74.8) | 18 | 83 | >75th (>49.5) | 9 | 84 |
| | 9 | 52 | | 10 | 78 | | 10 | 81 |
| | 13 | 45 | | 17 | 77 | | 8 | 70 |
| | 11 | 43 | | 20 | 76 | | 11 | 53 |
| | 8 | 36 | | 11 | 75 | | 17 | 50 |
| 50th–75th (15–35.5) | 17 | 34 | 50th–75th (>57.5–74.8) | 13 | 74 | 50th–75th (>31.5–49.8) | 13 | 49 |
| | 20 | 24 | | 19 | 65 | | 6 | 40 |
| | 12 | 23 | | 9 | 63 | | 12 | 38 |
| | 15 | 16 | | 3 | 62 | | 20 | 35 |
| | 19 | 15 | | 7 | 58 | | 4 | 34 |
| 25th–50th (8.5–14.5) | 4 | 14 | 25th–50th (44.5–57.5) | 15 | 57 | 25th–50th (8.5–14.5) | 19 | 29 |
| | 18 | 13 | | 8 | 51 | | 3 | 26 |
| | 3 | 12 | | 4 | 50 | | 15 | 24 |
| | 6 | 11 | | 12 | 49 | | 18 | 17 |
| | 1 | 10 | | 5 | 46 | | 5 | 12 |
| <25th (<8.5) | 14 | 8 | <25th (<44.5) | 14 | 44 | <25th (<12) | 14 | 12 |
| | 16 | 7 | | 1 | 42 | | 16 | 12 |
| | 2 | 4 | | 16 | 39 | | 1 | 10 |
| | 5 | 4 | | 2 | 38 | | 2 | 8 |
| | 7 | 4 | | 6 | 37 | | 7 | 8 |
No_D_Viol, items without irrelevant difficulty flaws; No_S_Viol, items without structural flaws; No_Viol, total good items without flaws.
Note: Test-wiseness violations were not included as they were minimal, as seen in Table 1.
Sources:
• Study-book related multiple-choice questions (MCQs): resources coded 1, 2, 3, 18, 19, and 20.
• Review-book related MCQs: resources coded 4, 5, and 6.
• Self-assessment books: resources coded 8, 9, 10, 11, 12, 13, 15, and 17.
• Online-shared MCQs: resources coded 7, 14, and 16.
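The percentile ranking of the overall (No_Viol) column can be sketched from the per-source percentages of good items. The snippet below is an illustration only: the analysis was run in Excel and IBM SPSS, and the assumption here is that the cut-offs follow the common (n + 1)p interpolation rule, which Python's `statistics.quantiles` implements as `method="exclusive"`. Under that assumption the three cut-offs come out as 8.5, 14.5, and 35.5, matching the bands printed in the table. The function names are hypothetical.

```python
from statistics import quantiles

# Percentage of good items (no flaws) per source code, copied from the
# "Total item flaws" column group of the first table.
GOOD_ITEMS_PCT = {
    1: 10, 2: 4, 3: 12, 4: 14, 5: 4, 6: 11, 7: 4, 8: 36, 9: 52, 10: 68,
    11: 43, 12: 23, 13: 45, 14: 8, 15: 16, 16: 7, 17: 34, 18: 13, 19: 15,
    20: 24,
}


def quartile_cutoffs(values):
    """Return the 25th, 50th, and 75th percentile cut-offs.

    method="exclusive" interpolates at position p * (n + 1); treating this
    as the rule behind the published cut-offs is an assumption.
    """
    return quantiles(values, n=4, method="exclusive")


def rank_sources(pct_by_source):
    """Source codes ordered from most to fewest good items (ties by code)."""
    return sorted(pct_by_source, key=lambda s: (-pct_by_source[s], s))


def band(value, cutoffs):
    """Assign a percentage to one of the four percentile bands."""
    q1, q2, q3 = cutoffs
    if value > q3:
        return ">75th"
    if value > q2:
        return "50th-75th"
    if value >= q1:
        return "25th-50th"
    return "<25th"


cuts = quartile_cutoffs(list(GOOD_ITEMS_PCT.values()))
for source in rank_sources(GOOD_ITEMS_PCT):
    value = GOOD_ITEMS_PCT[source]
    print(source, value, band(value, cuts))
```

The same functions could be applied to the structural and irrelevant difficulty columns; whether the published cut-offs for those columns follow the same rule has not been checked here.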
Statistical analysis of the differential distribution of the percentage of items without violations (flaws) after clustering of the resources into C1, C2, C3, and C4
| Overall item violations | Good items (%) | One flaw (%) | >1 flaw (%) | Structural violations | No flaws (%) | One flaw (%) | >1 flaw (%) |
|---|---|---|---|---|---|---|---|
| C1: study book–related MCQs | 16 | 39 | 45 | C1: study book–related MCQs | 69 | 17 | 14 |
| C2: review book–related MCQs | 10 | 26 | 64 | C2: review book–related MCQs | 42 | 37 | 21 |
| C3: self-assessment books | 38 | 25 | 37 | C3: self-assessment books | 66 | 22 | 12 |
| C4: online-shared MCQs | 7 | 21 | 72 | C4: online-shared MCQs | 44 | 32 | 24 |
| Test-wiseness violations | No flaws (%) | One flaw (%) | >1 flaw (%) | Irrelevant difficulty violations | No flaws (%) | One flaw (%) | >1 flaw (%) |
| C1: study book–related MCQs | 87 | 12 | 1 | C1: study book–related MCQs | 25 | 55 | 20 |
| C2: review book–related MCQs | 75 | 22 | 3 | C2: review book–related MCQs | 32 | 33 | 35 |
| C3: self-assessment books | 90 | 9 | 1 | C3: self-assessment books | 54 | 33 | 13 |
| C4: online-shared MCQs | 86 | 13 | 1 | C4: online-shared MCQs | 11 | 53 | 36 |
Abbreviation: MCQ, multiple-choice question.
Note: χ² (KW), Kruskal–Wallis H-test.
Good items: items without any flaw (violation).
Highly significant (p < 0.01).
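For readers unfamiliar with the Kruskal–Wallis H-test named in the note, the statistic can be sketched in plain Python. This is a generic textbook implementation with the usual tie correction, not the authors' SPSS procedure, and the unit of analysis they used is not stated here. The example call groups the per-source percentages of good items by cluster purely to show the input shape; its output is not a result of the study. The function names are hypothetical.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction for a list of groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Illustrative input only: per-source good-item percentages grouped by
# cluster (C1, C2, C3, C4), using the source codes listed above.
clusters = [
    [10, 4, 12, 13, 15, 24],             # C1: sources 1, 2, 3, 18, 19, 20
    [14, 4, 11],                         # C2: sources 4, 5, 6
    [36, 52, 68, 43, 23, 45, 16, 34],    # C3: sources 8-13, 15, 17
    [4, 8, 7],                           # C4: sources 7, 14, 16
]
print(kruskal_wallis_h(clusters))
```

H is then compared against a chi-square distribution with (number of groups − 1) degrees of freedom to obtain a p-value; with four clusters that is three degrees of freedom.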
Comparison of the number of reviewed questions and the percentage of good and flawed items in previously published studies with the currently reported results
| No. | Reference | Year | Discipline | No. of test banks or books | Reviewed questions | Good items (%) | One flaw (%) | >1 flaw (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | The current study | 2022 | Medical discipline | 20 | 2,300 | 23 | 30 | 47 |
| 2 | Ellsworth et al | 1990 | Educational psychology | 14 | 1,080 | 39 | 44 | 17 |
| 3 | Hansen and Dexter | 1997 | Business auditing | 10 | 400 | 25 | 42 | 33 |
| 4 | Garrison et al | 1997 | Business law | 11 | 440 | 33 | 46 | 21 |
| 5 | Bailey et al | 1998 | Accounting | 16 | 100 | 94 | 6 | |
| 6 | Masters et al | 2001 | Nursing | 17 | 2,913 | 24 | 76 | |
| 7 | Moncada and Harmon | 2004 | Accounting | 5 | 684 | | | |
| 8 | Tarrant et al | 2006 | Nursing | – | 997 | 46 | 34 | 20 |
| 9 | Ibbett and Wheldon | 2016 | Financial accounting | 6 | 263 | 33 | 56 | 10 |
Good items: items without any flaw (violation).
This table is not one of our results; it is a review of different resources that serves as a basis for comparison in the discussion.
Of the 2,770 multiple-choice questions reviewed, 36% (997) were derived from test banks.