Luise Fischer, Theresa Rohm, Claus H Carstensen, Timo Gnambs.
Abstract
In the context of item response theory (IRT), linking the scales of two measurement points is a prerequisite for examining change in competence over time. In educational large-scale assessments, non-identical test forms sharing a number of anchor items are frequently scaled and linked using two- or three-parameter item response models. However, if item pools are limited and/or sample sizes are small to medium, the more parsimonious Rasch model is a suitable alternative with regard to the precision of parameter estimation. Because the Rasch model implies stricter assumptions about the response process, a violation of these assumptions may manifest as model misfit in the form of item discrimination parameters that empirically deviate from their fixed value of one. The present simulation study investigated the performance of four IRT linking methods (fixed parameter calibration, mean/mean linking, weighted mean/mean linking, and concurrent calibration) applied to Rasch-scaled data with a small item pool. Moreover, the number of anchor items required in the absence or presence of moderate model misfit was investigated for small to medium sample sizes. Effects on the link outcome were operationalized as bias, relative bias, and root mean square error of the estimated sample mean and variance of the latent variable. In this limited context, concurrent calibration had substantial convergence issues, while the other methods yielded satisfactory and similar parameter recovery overall, even in the presence of moderate model misfit. Our findings suggest that in case of model misfit, the share of anchor items should exceed the 20% currently proposed in the literature. Future studies should further investigate the effects of anchor item composition regarding unbalanced model misfit.
Keywords: Rasch model; anchor-items design; item response theory; limited item pools; linking methods; model misfit
Year: 2021 PMID: 34295279 PMCID: PMC8289883 DOI: 10.3389/fpsyg.2021.633896
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
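As a rough illustration of one of the linking methods named in the abstract, the sketch below shows unweighted mean/mean linking under the Rasch model. Because the Rasch model fixes all discriminations at one, the sketch assumes the link reduces to a single additive constant computed from the anchor item difficulties. The function names and the numeric values are hypothetical and are not taken from the study; the weighted variant is omitted because the abstract does not specify the weights.

```python
from statistics import mean


def mean_mean_shift(anchor_b_ref, anchor_b_new):
    """Additive linking constant that moves the new scale onto the reference scale.

    Both arguments hold difficulty estimates of the SAME anchor items,
    in the same order, from two separate Rasch calibrations.
    """
    if not anchor_b_ref or len(anchor_b_ref) != len(anchor_b_new):
        raise ValueError("anchor lists must be non-empty and of equal length")
    return mean(anchor_b_ref) - mean(anchor_b_new)


def to_reference_scale(values, shift):
    """Apply the linking constant to difficulties or ability estimates."""
    return [v + shift for v in values]


# Hypothetical anchor difficulties (illustration only, not study data).
anchors_t1 = [-0.5, 0.0, 0.5]   # reference calibration
anchors_t2 = [-0.8, -0.3, 0.2]  # later calibration, on its own scale

shift = mean_mean_shift(anchors_t1, anchors_t2)
linked_anchors_t2 = to_reference_scale(anchors_t2, shift)
```

After linking, the anchor difficulties from the later calibration have the same mean as in the reference calibration, and the same constant would be added to the later ability estimates before comparing sample means over time.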
True item difficulty and item discrimination parameters of the four test forms (t1–t4).
Descriptive statistics of the true anchor item parameters split by the experimental factor number of anchor items.
| Anchor items | Position (t1/2) | M (t1/2) | SD (t1/2) | Position (t2/3) | M (t2/3) | SD (t2/3) | Position (t3/4) | M (t3/4) | SD (t3/4) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anchor item difficulty parameters | | | | | | | | | |
| 3 | 2,5,8 | 0.369 | 1.051 | 2,5,8 | 0.976 | 0.855 | 2,5,8 | 1.393 | 1.193 |
| 5 | 2,3,4,6,9 | 0.333 | 1.050 | 1,5,6,7,8 | 0.873 | 1.138 | 2,3,4,6,9 | 1.316 | 1.193 |
| 7 | 1,2,4,5,6,7,9 | 0.332 | 1.066 | 1,3,4,5,6,8,9 | 0.976 | 1.169 | 1,3,4,5,6,7,9 | 1.362 | 1.096 |
| 9 | 1–9 | 0.360 | 1.022 | 1–9 | 0.950 | 1.074 | 1–9 | 1.358 | 1.120 |
| Anchor item discrimination parameters | | | | | | | | | |
| 3 | 2,5,8 | 1.015 | 0.256 | 2,5,8 | 0.927 | 0.006 | 2,5,8 | 0.946 | 0.136 |
| 5 | 2,3,4,6,9 | 1.013 | 0.160 | 1,5,6,7,8 | 0.978 | 0.058 | 2,3,4,6,9 | 1.035 | 0.123 |
| 7 | 1,2,4,5,6,7,9 | 0.944 | 0.178 | 1,3,4,5,6,8,9 | 1.023 | 0.077 | 1,3,4,5,6,7,9 | 1.032 | 0.171 |
| 9 | 1–9 | 1.013 | 0.206 | 1–9 | 1.006 | 0.076 | 1–9 | 1.030 | 0.148 |
FIGURE 1. Bias of sample mean over three time points (t2–t4). The figure is split by the three linking methods and the experimental factors number of anchor items, sample size, and Rasch model-data fit. FPC = fixed parameter calibration, m/m = mean/mean linking, w. m/m = weighted mean/mean linking. 95% confidence intervals are depicted.
FIGURE 2. RMSE of sample mean over three time points (t2–t4). The figure is split by the three linking methods and the experimental factors number of anchor items, sample size, and Rasch model-data fit. FPC = fixed parameter calibration, Mean/Mean = mean/mean linking, w. Mean/Mean = weighted mean/mean linking. 95% confidence intervals are depicted.
FIGURE 3. Bias of sample variance over four time points (t1–t4). The figure is split by the three linking methods and the experimental factors number of anchor items, sample size, and Rasch model-data fit. FPC = fixed parameter calibration, m/m = mean/mean linking, w. m/m = weighted mean/mean linking. 95% confidence intervals are depicted.
FIGURE 4. RMSE of sample variance over four time points (t1–t4). The figure is split by the three linking methods and the experimental factors number of anchor items, sample size, and Rasch model-data fit. FPC = fixed parameter calibration, Mean/Mean = mean/mean linking, w. Mean/Mean = weighted mean/mean linking. 95% confidence intervals are depicted.
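The outcome measures shown in the figures (bias and RMSE) and the relative bias mentioned in the abstract can be sketched as follows. This minimal example assumes the conventional simulation-study definitions, computed over replications against a known generating value; the study's exact formulas are not given in this record, and the function names and numbers below are hypothetical.

```python
import math


def bias(estimates, true_value):
    """Mean signed deviation of the estimates from the generating value."""
    return sum(e - true_value for e in estimates) / len(estimates)


def relative_bias(estimates, true_value):
    """Bias expressed as a proportion of the generating value."""
    if true_value == 0:
        raise ValueError("relative bias is undefined for a true value of zero")
    return bias(estimates, true_value) / true_value


def rmse(estimates, true_value):
    """Root mean square error of the estimates around the generating value."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical estimates of a sample variance across three replications.
replications = [0.9, 1.1, 1.2]
true_variance = 1.0

b = bias(replications, true_variance)
rb = relative_bias(replications, true_variance)
r = rmse(replications, true_variance)
```

Bias captures systematic over- or underestimation, while RMSE also reflects the variability of the estimates across replications, which is why a method can show little bias and still have a sizeable RMSE.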