Jianhua Xiong1,2, Shuliang Ding2, Fen Luo1,2, Zhaosheng Luo1.
Abstract
Computerized adaptive testing (CAT) is an efficient testing mode that allows each examinee to answer items appropriate to his or her latent trait level. Implementing CAT requires a large item pool, and the pool must be replenished frequently with new items to ensure test validity and security. Online calibration is a technique for calibrating the parameters of new items in CAT: new items are seeded among the operational items during testing, and their parameters are estimated from examinees' responses to them. The most popular estimation methods include the one EM cycle method (OEM) and the multiple EM cycle method (MEM) under dichotomous item response theory models. This paper extends OEM and MEM to the graded response model (GRM), a popular model for polytomous data with ordered categories. Two simulation studies were carried out to explore online calibration under a variety of conditions, including calibration design, initial item parameter calculation method, calibration method, calibration sample size, and number of categories. The results show that the calibration accuracy of new items was acceptable and was affected by interactions among some of these factors; conclusions are drawn accordingly.
Keywords: computerized adaptive testing; graded response model; multiple EM cycle method; one EM cycle method; online calibration; squeezing average method
Year: 2020 PMID: 32038427 PMCID: PMC6989429 DOI: 10.3389/fpsyg.2019.03085
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
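The graded response model named in the abstract can be illustrated with a short sketch of its category probabilities. This is a minimal example under stated assumptions, not the authors' implementation: it uses the standard logistic form of Samejima's GRM without a scaling constant, and the function name `grm_category_probs` and the parameter values are hypothetical.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one graded response model item.

    theta: scalar ability
    a:     discrimination parameter
    b:     increasing threshold parameters, length K-1 for K categories
    Returns an array of K probabilities that sum to 1.
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P*(X >= k) for k = 1..K-1.
    # Assumption: plain logistic form, no D = 1.7 scaling constant.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P*(X >= 0) = 1 and P*(X >= K) = 0, then take adjacent differences.
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

# Illustrative values only: a 4-category item with symmetric thresholds.
probs = grm_category_probs(theta=0.0, a=1.2, b=[-1.0, 0.0, 1.0])
print(probs)
```

With thresholds symmetric around the examinee's ability, the first and last categories are equally likely, as are the two middle ones.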
RMSE under different combinations.
| Random | Poly-Sq-Ini | OEM | 0.2047 | 0.2696 | 0.1567 | 0.2377 |
| | | MEM | 0.2022 | 0.1705 | 0.1522 | 0.2009 |
| | Poly-Ini | OEM | 0.2892 | 0.1789 | 0.1705 | 0.2306 |
| | | MEM | 0.2632 | 0.2142 | 0.1847 | 0.2595 |
| Adaptive | Poly-Sq-Ini | OEM | 0.2266 | 0.2651 | 0.2108 | 0.2501 |
| | | MEM | 0.2259 | 0.2700 | 0.2101 | 0.2433 |
| | Poly-Ini | OEM | 0.2324 | 0.3106 | 0.2005 | 0.3179 |
| | | MEM | 0.2324 | 0.3116 | 0.2070 | 0.3231 |
Bias under different combinations.
| Random | Poly-Sq-Ini | OEM | 0.1258 | −0.1310 | −0.0549 | 0.0261 |
| | | MEM | 0.0483 | −0.0423 | −0.0422 | −0.0398 |
| | Poly-Ini | OEM | 0.2380 | 0.0727 | −0.0367 | −0.1391 |
| | | MEM | 0.2163 | 0.1065 | −0.0292 | −0.1589 |
| Adaptive | Poly-Sq-Ini | OEM | 0.0286 | 0.0777 | 0.0099 | −0.0482 |
| | | MEM | 0.0296 | 0.0783 | 0.0126 | −0.0472 |
| | Poly-Ini | OEM | 0.1744 | 0.2182 | 0.0120 | −0.1887 |
| | | MEM | 0.1751 | 0.2159 | 0.0056 | −0.1875 |
Figure 1. RMSE and bias of the a-parameter and b-parameters under different combinations. C1 denotes the combination of Random, Poly-Sq-Ini and OEM; C2 denotes the combination of Random, Poly-Sq-Ini and MEM; C3 denotes the combination of Random, Poly-Ini and OEM; C4 denotes the combination of Random, Poly-Ini and MEM; C5 denotes the combination of Adaptive, Poly-Sq-Ini and OEM; C6 denotes the combination of Adaptive, Poly-Sq-Ini and MEM; C7 denotes the combination of Adaptive, Poly-Ini and OEM; C8 denotes the combination of Adaptive, Poly-Ini and MEM.
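The two accuracy criteria reported in these tables can be sketched as follows. This assumes the usual definitions, with bias as the mean signed error and RMSE as the root of the mean squared error. The exact averaging used in the paper (over items, replications, or both) is not shown in this excerpt, and the function name and numbers below are made up for illustration.

```python
import numpy as np

def bias_and_rmse(estimates, true_values):
    """Return (bias, RMSE) of parameter estimates against their true values."""
    est = np.asarray(estimates, dtype=float)
    true = np.asarray(true_values, dtype=float)
    err = est - true                      # signed estimation errors
    bias = err.mean()                     # mean signed error
    rmse = np.sqrt((err ** 2).mean())     # root mean squared error
    return bias, rmse

# Hypothetical true and estimated a-parameters for three new items.
true_a = np.array([1.0, 1.5, 0.8])
est_a = np.array([1.1, 1.4, 0.9])
bias, rmse = bias_and_rmse(est_a, true_a)
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```

A bias near zero alongside a larger RMSE indicates that errors cancel in sign without being small in magnitude, which is why the tables report both.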
RMSE of different calibration sample size under different categories.
| 0.2730 | 0.2716 | 0.2683 | 0.2656 | 0.2722 | ||
| 0.2495 | 0.2259 | 0.2216 | 0.2078 | 0.2060 | ||
| 0.2876 | 0.2660 | 0.2602 | 0.2554 | 0.2470 | ||
| Mean( | 0.2706 | 0.2481 | 0.2427 | 0.2338 | 0.2286 | |
| 0.2189 | 0.2141 | 0.2119 | 0.2074 | 0.2033 | ||
| 0.2413 | 0.2237 | 0.1954 | 0.1919 | 0.1865 | ||
| 0.2127 | 0.1827 | 0.1723 | 0.1673 | 0.1568 | ||
| 0.2674 | 0.2395 | 0.2270 | 0.2249 | 0.2156 | ||
| Mean( | 0.2439 | 0.2187 | 0.2014 | 0.1993 | 0.1899 | |
| 0.2166 | 0.2150 | 0.2138 | 0.2149 | 0.2081 | ||
| 0.2989 | 0.2866 | 0.2599 | 0.2458 | 0.2262 | ||
| 0.2232 | 0.1968 | 0.1760 | 0.1634 | 0.1577 | ||
| 0.2357 | 0.2016 | 0.1908 | 0.1610 | 0.1659 | ||
| 0.2996 | 0.2611 | 0.2564 | 0.2337 | 0.2294 | ||
| Mean( | 0.2722 | 0.2432 | 0.2345 | 0.2098 | 0.2007 | |
| 0.2340 | 0.2407 | 0.2353 | 0.2301 | 0.2208 | ||
| 0.2837 | 0.2616 | 0.2604 | 0.2503 | 0.2491 | ||
| 0.1929 | 0.1706 | 0.1662 | 0.1583 | 0.1511 | ||
| 0.1693 | 0.1451 | 0.1419 | 0.1346 | 0.1210 | ||
| 0.1950 | 0.1743 | 0.1633 | 0.1600 | 0.1462 | ||
| 0.2672 | 0.2565 | 0.2368 | 0.2356 | 0.2257 | ||
| Mean( | 0.2284 | 0.2095 | 0.2044 | 0.1976 | 0.1873 | |
Bias of different calibration sample size under different categories.
| 0.1517 | 0.1561 | 0.1488 | 0.1564 | 0.1611 | ||
| −0.0231 | −0.0193 | −0.0289 | −0.0253 | −0.0336 | ||
| −0.0976 | −0.0912 | −0.0979 | −0.0945 | −0.1047 | ||
| Mean( | −0.0603 | −0.0553 | −0.0634 | −0.0599 | −0.0692 | |
| 0.0479 | 0.0415 | 0.0546 | 0.0500 | 0.0398 | ||
| −0.0451 | −0.0479 | −0.0424 | −0.0457 | −0.0589 | ||
| −0.046 | −0.0365 | −0.0395 | −0.0403 | −0.0502 | ||
| −0.0398 | −0.0385 | −0.0502 | −0.0435 | −0.0478 | ||
| Mean( | −0.0365 | −0.0409 | −0.0440 | −0.0432 | −0.0523 | |
| −0.0491 | −0.0445 | −0.0602 | −0.0477 | −0.0449 | ||
| −0.086 | −0.0829 | −0.089 | −0.1059 | −0.0957 | ||
| −0.0491 | −0.0354 | −0.0444 | −0.0544 | −0.0451 | ||
| −0.0298 | −0.0186 | −0.0227 | −0.0238 | −0.0115 | ||
| 0.0032 | 0.0132 | 0.0239 | 0.0204 | 0.0347 | ||
| Mean( | −0.0404 | −0.0309 | −0.0330 | −0.0409 | −0.0294 | |
| −0.1217 | −0.1305 | −0.1289 | −0.1232 | −0.1199 | ||
| −0.1567 | −0.1473 | −0.1699 | −0.1519 | −0.1568 | ||
| −0.0752 | −0.0662 | −0.0789 | −0.0665 | −0.0737 | ||
| −0.0238 | −0.0155 | −0.0239 | −0.0126 | −0.0211 | ||
| 0.0192 | 0.0273 | 0.0294 | 0.0390 | 0.0315 | ||
| 0.0817 | 0.0990 | 0.1018 | 0.1168 | 0.1031 | ||
| Mean( | −0.0309 | −0.0205 | −0.0283 | −0.0150 | −0.0233 | |
Figure 2. RMSE of the a-parameter and b-parameters under different categories.
Figure 3. RMSE of different calibration sample sizes under different categories. 2-C, 2-categories; 3-C, 3-categories; 4-C, 4-categories; 5-C, 5-categories. The same abbreviations apply to Figure 5.
Figure 4. Bias of the a-parameter and b-parameters under different categories.
Figure 5. Bias of different calibration sample sizes under different categories.
Estimation accuracy of ability under different CAT scenarios.
| 2,000 | Variable-length | 0.1904 | 0.0007 |
| | 10 | 0.1924 | −0.0004 |
| | 20 | 0.1340 | −0.0008 |
| | 30 | 0.1105 | −0.0012 |
| 3,000 | Variable-length | 0.1882 | 0.0033 |
| | 10 | 0.2012 | −0.0001 |
| | 20 | 0.1286 | −0.0024 |
| | 30 | 0.1057 | 0.0050 |
For variable-length CAT, the cumulative information was set to 25.
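One way to read this setting is as a stopping rule: the test ends once the accumulated test information reaches 25. Under the standard asymptotic relation SE(θ) ≈ 1/√I, that threshold corresponds to a standard error of about 0.2. The sketch below is illustrative only. The function name and the per-item information values are hypothetical, and the paper's exact stopping logic is not shown in this excerpt.

```python
import math

def administer_until_information(item_infos, target_info=25.0):
    """Accumulate item information until the target is reached.

    item_infos: information contributed by each administered item, in order
    Returns (items used, cumulative information, implied standard error).
    """
    total = 0.0
    used = 0
    for info in item_infos:
        total += info
        used += 1
        if total >= target_info:
            break  # stopping rule met: enough information accumulated
    # Asymptotic standard error of the ability estimate.
    se = 1.0 / math.sqrt(total) if total > 0 else math.inf
    return used, total, se

# Hypothetical case: every item contributes 2.5 units of information.
infos = [2.5] * 20
n_items, total_info, se = administer_until_information(infos)
print(n_items, total_info, se)
```

Under this reading, the variable-length condition targets a fixed measurement precision rather than a fixed number of items.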
Bias of new item parameters under different CAT scenarios.
| 2,000 | Variable-length | 0.0998 | −0.0005 | −0.0330 | −0.0685 |
| | 10 | −0.0719 | −0.0889 | −0.0088 | 0.0615 |
| | 20 | −0.0001 | −0.0464 | −0.0011 | 0.0272 |
| | 30 | 0.0589 | −0.0168 | −0.0211 | −0.0269 |
| 3,000 | Variable-length | 0.0939 | −0.0117 | −0.0239 | −0.0458 |
| | 10 | −0.0650 | −0.1364 | −0.0486 | 0.0287 |
| | 20 | −0.0228 | −0.0187 | −0.0106 | 0.0011 |
| | 30 | 0.0613 | −0.0152 | −0.0232 | −0.0486 |
Results are shown for 3-category items, with the calibration sample size set to 500.
RMSE of new item parameters under different CAT scenarios.
| 2,000 | Variable-length | 0.2483 | 0.2109 | 0.1802 | 0.2294 |
| | 10 | 0.2345 | 0.2224 | 0.1557 | 0.2182 |
| | 20 | 0.2169 | 0.1954 | 0.1545 | 0.2242 |
| | 30 | 0.2232 | 0.2060 | 0.1685 | 0.2357 |
| 3,000 | Variable-length | 0.2337 | 0.1921 | 0.1620 | 0.2203 |
| | 10 | 0.2302 | 0.2571 | 0.1668 | 0.2143 |
| | 20 | 0.2121 | 0.2102 | 0.1640 | 0.2078 |
| | 30 | 0.2069 | 0.2012 | 0.1664 | 0.2235 |