Felicitas-Maria Lahner, Stefan Schauber, Andrea Carolin Lörwald, Roger Kropf, Sissel Guttormsen, Martin R Fischer, Sören Huwendiek.
Abstract
INTRODUCTION: In high-stakes assessment, the measurement precision of pass-fail decisions is of great importance. A concept for analyzing the measurement precision at the cut score is conditional reliability, which describes measurement precision for every score achieved in an exam. We compared conditional reliabilities in Classical Test Theory (CTT) and Item Response Theory (IRT) with a special focus on the cut score and potential factors influencing conditional reliability at the cut score.
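To make the idea of score-specific precision concrete, the following is a minimal Python sketch under stated assumptions, not the authors' procedure. For CTT it assumes Lord's binomial error model for the conditional standard error of measurement; for IRT it assumes a Rasch model and one common convention in which conditional reliability equals the ability variance divided by the ability variance plus the squared standard error. The function names, the item difficulties, and the default ability variance of 1.0 are all hypothetical.

```python
import math


def ctt_conditional_sem(raw_score, n_items):
    """Conditional SEM of a raw score under Lord's binomial error model
    (assumption: n_items dichotomous items, n_items > 1)."""
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))


def rasch_information(theta, difficulties):
    """Test information at ability theta under a Rasch model:
    the sum over items of p * (1 - p)."""
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return info


def irt_conditional_reliability(theta, difficulties, ability_variance=1.0):
    """Conditional reliability at theta, using the convention
    variance / (variance + SE^2) with SE^2 = 1 / information
    (assumption: this is one of several definitions in use)."""
    squared_se = 1.0 / rasch_information(theta, difficulties)
    return ability_variance / (ability_variance + squared_se)


if __name__ == "__main__":
    # Hypothetical 40-item exam whose items all have difficulty 0.
    items = [0.0] * 40
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(irt_conditional_reliability(theta, items), 3))
    # CTT conditional SEM at a hypothetical cut score of 24 out of 40 points.
    print(round(ctt_conditional_sem(24, 40), 3))
```

In this sketch precision peaks where the items are most informative, which is why the location of the cut score relative to the item difficulties matters for pass-fail decisions.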
Keywords: Conditional reliability; Measurement precision; Multiple choice exams; Reliability
Year: 2020 PMID: 32468274 PMCID: PMC7459012 DOI: 10.1007/s40037-020-00586-0
Source DB: PubMed Journal: Perspect Med Educ ISSN: 2212-2761
Fit indices for each exam
| Exam | Q3 | Infit Min | Infit Max | Infit Mean | Infit % not fitting items | Outfit Min | Outfit Max | Outfit Mean | Outfit % not fitting items | SRMR | SRMSR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.06 | 0.88 | 1.26 | 1.02 | 6.67 | 0.40 | 1.57 | 1.03 | 10.67 | 0.067 | 0.084 |
| 2 | 0.07 | 0.85 | 1.36 | 1.01 | 2.00 | 0.00 | 2.04 | 0.98 | 13.33 | Inf | Inf |
| 3 | 0.06 | 0.90 | 1.34 | 1.03 | 3.47 | 0.56 | 1.92 | 1.05 | 6.25 | 0.071 | 0.089 |
| 4 | 0.06 | 0.00 | 1.24 | 1.03 | 4.05 | 0.00 | 1.47 | 1.05 | 8.11 | 0.068 | 0.085 |
| 5 | 0.07 | 0.90 | 1.26 | 1.02 | 1.68 | 0.36 | 1.87 | 1.03 | 3.36 | 0.078 | 0.098 |
| 6 | 0.07 | 0.89 | 1.21 | 1.02 | 1.67 | 0.00 | 1.33 | 1.01 | 1.67 | 0.073 | 0.092 |
| 7 | 0.07 | 0.90 | 1.18 | 1.02 | 0.00 | 0.48 | 1.57 | 1.04 | 8.55 | 0.072 | 0.089 |
| 8 | 0.07 | 0.88 | 1.21 | 1.01 | 3.33 | 0.56 | 1.53 | 1.00 | 6.67 | 0.078 | 0.098 |
| 9 | 0.07 | 0.92 | 1.13 | 1.01 | 0.00 | 0.00 | 1.67 | 1.00 | 1.68 | 0.069 | 0.087 |
| 10 | 0.05 | 0.91 | 1.22 | 1.02 | 6.78 | 0.79 | 1.27 | 1.03 | 23.73 | 0.060 | 0.076 |
| 11 | 0.05 | 0.90 | 1.20 | 1.02 | 6.78 | 0.54 | 1.48 | 1.03 | 20.34 | 0.061 | 0.077 |
| 12 | 0.05 | 0.90 | 1.27 | 1.01 | 6.78 | 0.64 | 1.36 | 1.02 | 16.95 | 0.067 | 0.084 |
| 13 | 0.06 | 0.88 | 1.18 | 1.01 | 3.33 | 0.52 | 1.33 | 0.99 | 15.00 | 0.068 | 0.087 |
| 14 | 0.05 | 0.90 | 1.22 | 1.02 | 3.70 | 0.65 | 2.16 | 1.03 | 14.81 | 0.059 | 0.074 |
| 15 | 0.05 | 0.87 | 1.30 | 1.03 | 5.13 | 0.57 | 2.17 | 1.06 | 23.08 | 0.065 | 0.081 |
| 16 | 0.07 | 0.94 | 1.14 | 1.02 | 0.00 | 0.67 | 1.94 | 1.04 | 1.72 | 0.067 | 0.083 |
| 17 | 0.06 | 0.93 | 1.08 | 1.01 | 0.00 | 0.85 | 1.30 | 1.01 | 0.00 | 0.066 | 0.082 |
| 18 | 0.06 | 1.00 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.063 | 0.081 |
| 19 | 0.05 | 0.76 | 1.31 | 1.00 | 11.67 | 0.22 | 1.57 | 0.95 | 40.00 | 0.074 | 0.093 |
| 20 | 0.05 | 0.89 | 1.28 | 1.02 | 10.00 | 0.74 | 1.66 | 1.04 | 23.33 | 0.063 | 0.079 |
| 21 | 0.07 | 0.91 | 1.39 | 1.02 | 6.67 | 0.77 | 1.51 | 1.01 | 26.67 | 0.079 | 0.140 |
| 22 | 0.05 | 0.83 | 1.40 | 1.03 | 16.67 | 0.52 | 1.57 | 1.04 | 33.33 | 0.065 | 0.082 |
| 23 | 0.05 | 0.90 | 1.23 | 1.02 | 5.56 | 0.74 | 1.48 | 1.02 | 21.11 | 0.058 | 0.073 |
| 24 | 0.06 | 0.92 | 1.13 | 1.01 | 2.22 | 0.71 | 1.51 | 1.02 | 2.22 | 0.066 | 0.121 |
| 25 | 0.05 | 0.91 | 1.17 | 1.02 | 2.02 | 0.62 | 1.90 | 1.03 | 10.10 | 0.053 | 0.066 |
| 26 | 0.05 | 0.86 | 1.22 | 1.02 | 9.09 | 0.46 | 1.41 | 1.01 | 19.19 | 0.057 | 0.072 |
| 27 | 0.05 | 0.84 | 1.24 | 1.01 | 0.00 | 0.38 | 2.46 | 1.03 | 22.00 | 0.062 | 0.078 |
| 28 | 0.05 | 0.90 | 1.28 | 1.01 | 6.12 | 0.40 | 1.42 | 1.01 | 13.27 | 0.057 | 0.072 |
| 29 | 0.05 | 0.94 | 1.17 | 1.03 | 0.00 | 0.00 | 1.58 | 1.04 | 5.04 | 0.056 | 0.070 |
| 30 | 0.05 | 0.94 | 1.10 | 1.01 | 0.00 | 0.56 | 1.23 | 1.00 | 0.00 | 0.056 | 0.070 |
| 31 | 0.05 | 0.95 | 1.12 | 1.01 | 1.72 | 0.68 | 2.03 | 1.01 | 5.17 | 0.055 | 0.069 |
| 32 | 0.05 | 0.93 | 1.14 | 1.02 | 1.65 | 0.69 | 1.21 | 1.02 | 4.96 | 0.055 | 0.069 |
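For readers unfamiliar with the infit and outfit columns, the sketch below shows how these mean-square statistics are conventionally computed for a single item under a Rasch model. It is an illustration under assumptions, not the software or settings used for the table: the function names are hypothetical, abilities and the item difficulty are taken as already estimated, and no cutoff for "not fitting" is implied.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: 0/1 answers of each examinee to the item
    thetas:    estimated abilities of the same examinees
    b:         estimated item difficulty
    """
    squared_residuals = 0.0   # sum of (x - p)^2
    variances = 0.0           # sum of p * (1 - p)
    squared_z = 0.0           # sum of standardized squared residuals
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        w = p * (1.0 - p)
        r2 = (x - p) ** 2
        squared_residuals += r2
        variances += w
        squared_z += r2 / w
    infit = squared_residuals / variances   # information-weighted mean square
    outfit = squared_z / len(responses)     # unweighted mean square
    return infit, outfit


if __name__ == "__main__":
    # Hypothetical data: four examinees of average ability, item difficulty 0.
    print(item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0))
```

Values near 1 indicate responses about as variable as the model predicts. Outfit is more sensitive to unexpected responses from examinees far from the item's difficulty, which is consistent with the wider outfit ranges in the table.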
Fig. 1 Mean conditional reliability and standard deviation in classical test theory (CTT) and item response theory (IRT) calculated over 32 exams
Regression analysis of the influence of the theory used, the number of items, the range of examinees’ performance, medical school, the percentage of MTF items, and the interactions between theory and the other variables on conditional reliability at the cut score. Shown are the unstandardized beta (B), its standard error (SE(B)), the standardized beta (β), the t‑test statistic (t), and the probability value (p)
| Variable | B | SE (B) | β | t | p |
|---|---|---|---|---|---|
| Intercept | −0.542 | 0.227 | 0.000 | 0.000 | <0.05 |
| Theory | 0.533 | 0.144 | 0.677 | 13.190 | <0.05 |
| Range of examinees’ performance | 0.011 | 0.002 | 0.753 | 10.503 | <0.05 |
| Number of items | 0.003 | 0.001 | 0.521 | 6.531 | <0.05 |
| Medical school | 0.020 | 0.032 | 0.078 | 0.901 | 0.37 |
| Percentage of MTF items | 0.471 | 0.316 | 0.130 | 1.747 | 0.09 |
| Theory* Range of examinees’ performance | −0.004 | 0.001 | −0.282 | −3.904 | <0.05 |
| Theory* Number of items | −0.001 | 0.001 | −0.150 | −1.868 | 0.07 |
| Theory* Medical school | −0.007 | 0.020 | −0.032 | −0.362 | 0.72 |
| Theory* Percentage of MTF items | −0.197 | 0.200 | −0.074 | −0.988 | 0.33 |
Correlations between influencing variables (*p < 0.05)
| | Year of study | Number of items |
|---|---|---|
| Number of items | 0.48* | |
| Range of examinees’ performance | −0.78* | −0.67* |