Abstract
BACKGROUND: Computerized adaptive testing (CAT) is being applied to health outcome measures developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected.
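To make the CAT side of this concern concrete, here is a minimal sketch of adaptive item selection. It assumes a 2PL model and the common "maximum Fisher information" selection rule. The excerpt does not describe the study's CAT algorithm, and the item bank below is hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a keyed response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, bank, administered):
    """Index of the not-yet-administered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))

# Hypothetical item bank: (discrimination a, difficulty b) calibrated on P&P data.
bank = [(1.0, -2.0), (1.5, -0.5), (1.2, 0.0), (0.8, 1.0), (2.0, 2.0)]

first = select_next_item(0.0, bank, administered=set())
second = select_next_item(0.0, bank, administered={first})
print(first, second)
```

In this sketch, the P&P-calibrated (a, b) values drive both item selection and scoring. An item that behaves differently under CAT administration would therefore be selected and scored with parameters that no longer describe it.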
Year: 2012 PMID: 22900979 PMCID: PMC3552735 DOI: 10.1186/1471-2288-12-124
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Summary of CAT simulations by underlying measurement model, DIF size, mean CAT measures and percentage of DIF items
| | | | | | | | | |
| DIF % = 10 | 0.97 | 0.97 | 0.97 | 0.97 | 0.26 | 0.26 | 0.23 | 0.24 |
| DIF % = 30 | 0.97 | 0.96 | 0.97 | 0.97 | 0.26 | 0.26 | 0.24 | 0.24 |
| | | | | | | | | |
| DIF % = 10 | 0.97 | 0.97 | 0.97 | 0.97 | 0.26 | 0.26 | 0.24 | 0.24 |
| DIF % = 30 | 0.97 | 0.96 | 0.97 | 0.97 | 0.26 | 0.26 | 0.24 | 0.24 |
| Average | 0.97 | 0.97 | 0.97 | 0.97 | 0.26 | 0.26 | 0.24 | 0.24 |
1PL = one-parameter item response model; 2PL = two-parameter item response model; CAT = computerized adaptive testing; DIF = differential item functioning; IRT = item response theory; θ = person measure.
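The footnote distinguishes the 1PL and 2PL generating models. The sketch below contrasts them and shows one way a mode effect could enter: as a uniform shift in item difficulty when the item is given by CAT. Treating DIF as a difficulty shift in logits, and the direction of that shift, are assumptions. The DIF sizes 0.42 and 0.63 are taken from the DIF size column of the next table.

```python
import math

def p_1pl(theta, b):
    """1PL model: every item has the same discrimination (fixed at 1 here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_2pl(theta, a, b):
    """2PL model: each item has its own discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_mode(theta, a, b, dif_size, cat_mode):
    """Hypothetical uniform mode DIF: the item is dif_size logits harder under CAT."""
    b_effective = b + dif_size if cat_mode else b
    return p_2pl(theta, a, b_effective)

theta, a, b = 0.0, 1.5, 0.0
for dif_size in (0.42, 0.63):
    pp = p_mode(theta, a, b, dif_size, cat_mode=False)
    cat = p_mode(theta, a, b, dif_size, cat_mode=True)
    print(f"DIF size {dif_size}: P&P P={pp:.3f}, CAT P={cat:.3f}")
```

With a = 1 the 2PL reduces to the 1PL. A person at the same θ has a lower probability of a keyed response under CAT once the difficulty shift is applied.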
True positive and false positive rates as a function of generating IRT model, DIF size, number of DIF items, and mean difference between modes
| IRT model | DIF size | DIF% | Mean diff. | TP% | FP% | TP% | FP% |
|---|---|---|---|---|---|---|---|
| 1PL | 0.42 | 10 | 0 | 54.90 | 0.91 | 60.35 | 1.36 |
| 1PL | 0.42 | 10 | 1 | 70.00 | 2.36 | 66.35 | 3.88 |
| 1PL | 0.42 | 30 | 0 | 69.66 | 0.29 | 70.67 | 0.44 |
| 1PL | 0.42 | 30 | 1 | 71.09 | 1.05 | 71.91 | 2.58 |
| 1PL | 0.63 | 10 | 0 | 78.00 | 0.56 | 83.00 | 0.89 |
| 1PL | 0.63 | 10 | 1 | 75.00 | 3.56 | 81.00 | 5.22 |
| 1PL | 0.63 | 30 | 0 | 82.67 | 2.57 | 87.00 | 3.00 |
| 1PL | 0.63 | 30 | 1 | 76.33 | 2.14 | 79.33 | 3.29 |
| 2PL | 0.42 | 10 | 0 | 60.82 | 0.06 | 60.42 | 0.06 |
| 2PL | 0.42 | 10 | 1 | 58.24 | 2.09 | 55.10 | 3.89 |
| 2PL | 0.42 | 30 | 0 | 62.71 | 0.14 | 66.09 | 0.28 |
| 2PL | 0.42 | 30 | 1 | 66.00 | 4.86 | 66.33 | 6.86 |
| 2PL | 0.63 | 10 | 0 | 72.00 | 0.33 | 77.00 | 0.55 |
| 2PL | 0.63 | 10 | 1 | 70.00 | 2.39 | 77.00 | 3.56 |
| 2PL | 0.63 | 30 | 0 | 72.33 | 3.14 | 77.67 | 3.14 |
| 2PL | 0.63 | 30 | 1 | 66.33 | 3.33 | 69.00 | 5.00 |
| Average | | | | 69.13 | 1.86 | 71.76 | 2.75 |
1PL = one-parameter item response model; 2PL = two-parameter item response model; CAT = computerized adaptive testing; DIF = differential item functioning; DIF% = percentage of items simulated with DIF; FP% = percentage of false positive DIF results; IRT = item response theory; TP% = percentage of true positive DIF results; θ = person measure.
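Per the footnote, TP% is the share of DIF items that were flagged and FP% is the share of non-DIF items that were flagged. The sketch below states those definitions as a helper function. It then checks that the Average row equals the unweighted mean of the 16 condition rows in the table above. The helper name `tp_fp_rates` is hypothetical.

```python
from statistics import mean

def tp_fp_rates(flagged, has_dif):
    """TP% = share of DIF items flagged; FP% = share of non-DIF items flagged."""
    tp = [f for f, d in zip(flagged, has_dif) if d]
    fp = [f for f, d in zip(flagged, has_dif) if not d]
    return 100 * sum(tp) / len(tp), 100 * sum(fp) / len(fp)

# The 16 condition rows of the table above (TP%, FP%, TP%, FP%), in table order.
rows = [
    (54.90, 0.91, 60.35, 1.36), (70.00, 2.36, 66.35, 3.88),
    (69.66, 0.29, 70.67, 0.44), (71.09, 1.05, 71.91, 2.58),
    (78.00, 0.56, 83.00, 0.89), (75.00, 3.56, 81.00, 5.22),
    (82.67, 2.57, 87.00, 3.00), (76.33, 2.14, 79.33, 3.29),
    (60.82, 0.06, 60.42, 0.06), (58.24, 2.09, 55.10, 3.89),
    (62.71, 0.14, 66.09, 0.28), (66.00, 4.86, 66.33, 6.86),
    (72.00, 0.33, 77.00, 0.55), (70.00, 2.39, 77.00, 3.56),
    (72.33, 3.14, 77.67, 3.14), (66.33, 3.33, 69.00, 5.00),
]
averages = [round(mean(col), 2) for col in zip(*rows)]
print(averages)
```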
Univariate and multivariate multilevel logistic regression to predict correct detection of mode effects defined by Robust Z and Bayesian 95% credible interval as a function of study variables
| Size of DIF | 1.49** | 0.55 | 3.42** | (2.58,4.54) |
| Percentage of DIF | 1.17 | 0.52 | 1.20 | (0.89,1.61) |
| 2PL IRT Model<sup>b</sup> | 0.76** | 0.53 | 0.47** | (0.35,0.64) |
| Diff. Mean | 0.99 | 0.50 | 0.66** | (0.50,0.87) |
| CAT Item Usage<sup>c</sup> | 21133.86** | 0.94 | 3111.68** | (1417.85,6829.03) |
| Absolute Item Difficulty<sup>d</sup> | 0.03** | 0.85 | 0.10** | (0.07,0.14) |
| Item Discrimination<sup>d</sup> | 3.62** | 0.60 | 3.12** | (2.34,4.17) |
| Size of DIF | 1.73** | 0.56 | 3.52** | (2.73,4.53) |
| Percentage of DIF | 1.17 | 0.52 | 1.16 | (0.89,1.50) |
| 2PL IRT Model<sup>b</sup> | 0.74** | 0.53 | 0.50** | (0.39,0.65) |
| Diff. Mean | 0.91 | 0.49 | 0.60** | (0.47,0.77) |
| CAT Item Usage<sup>c</sup> | 2468.29** | 0.92 | 505.64** | (264.29,967.37) |
| Absolute Item Difficulty<sup>d</sup> | 0.04** | 0.83 | 0.15** | (0.11,0.20) |
| Item Discrimination<sup>d</sup> | 2.86** | 0.58 | 1.99** | (1.54,2.56) |
Correct Detection of Mode Effects = true positive detection of mode DIF among items simulated with mode DIF; AUC = area under the ROC curve; CI = 95% confidence interval; IRT = item response theory model used to generate response data and parameters used in CAT; CAT item usage = number of times a given item was administered by CAT divided by 100; * p < .05; ** p < .01.
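The caption names two procedures for defining a mode effect: Robust Z and a Bayesian 95% credible interval. This is a minimal sketch of each, under stated assumptions. The study's exact definitions, its cut-offs, and how the per-item difficulty difference d was obtained are not given in this excerpt.

- Robust Z is written here in a commonly used form: (d_i − median(d)) / (0.74 × IQR(d)).
- The 1.96 cut-off is hypothetical.
- The posterior draws are simulated normal stand-ins rather than output from a fitted Bayesian model.

```python
import random
from statistics import median, quantiles

def robust_z_flags(d, cutoff=1.96):
    """Flag items whose between-mode difficulty difference is an outlier.

    d holds one difficulty difference per item (e.g. CAT calibration minus
    P&P calibration). Robust Z = (d_i - median(d)) / (0.74 * IQR(d)), where
    0.74 * IQR approximates the SD of normally distributed values.
    """
    q1, _, q3 = quantiles(d, n=4)
    scale = 0.74 * (q3 - q1)
    if scale == 0:
        raise ValueError("IQR is zero; robust Z is undefined")
    med = median(d)
    return [abs(di - med) / scale > cutoff for di in d]

def cri95_flag(draws):
    """Flag an item if the central 95% credible interval of its posterior
    difficulty difference excludes zero."""
    cuts = quantiles(draws, n=40, method="inclusive")  # 2.5%, 5%, ..., 97.5%
    lower, upper = cuts[0], cuts[-1]
    return lower > 0 or upper < 0

# Hypothetical difficulty differences for eight items; item 6 has mode DIF.
d = [0.05, -0.02, 0.10, 0.00, 0.03, -0.04, 0.70, 0.01]
print(robust_z_flags(d))

# Stand-in posterior draws (normal) instead of real MCMC output.
rng = random.Random(2012)
posterior_dif = [rng.gauss(0.70, 0.10) for _ in range(4000)]
posterior_null = [rng.gauss(0.02, 0.10) for _ in range(4000)]
print(cri95_flag(posterior_dif), cri95_flag(posterior_null))
```

Robust Z judges each item relative to the spread of all items. The credible-interval rule judges each item only against zero. This is one reason the two procedures can disagree on the same item.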
Univariate and multivariate multilevel logistic regression to predict incorrect detection of mode effects defined by Robust Z and Bayesian 95% credible interval as a function of study variables
| Size of DIF | 1.93** | 0.55 | 2.01** | (1.36,2.97) |
| Percentage of DIF | 1.44 | 0.55 | 1.48 | (0.99,2.20) |
| 2PL IRT Model | 1.14 | 0.52 | 0.96 | (0.63,1.46) |
| Diff. Mean | 3.31** | 0.59 | 3.95** | (2.56,6.08) |
| CAT Item Usage | 1.91** | 0.54 | 4.17** | (3.11,5.60) |
| Item Difficulty | 0.28** | 0.62 | 0.12** | (0.08,0.19) |
| Item Discrimination | 1.64** | 0.56 | 1.23 | (0.96,1.58) |
| Size of DIF | 1.62* | 0.55 | 1.61** | (1.20,2.15) |
| Percentage of DIF | 1.33 | 0.53 | 1.30 | (0.97,1.75) |
| 2PL IRT Model | 1.14 | 0.52 | 1.08 | (0.80,1.47) |
| Diff. Mean | 1.28E+08** | 0.62 | 4.01** | (2.90,5.55) |
| CAT Item Usage | 0.96 | 0.44 | 2.36** | (1.82,3.06) |
| Item Difficulty | 0.30** | 0.65 | 0.16** | (0.11,0.22) |
| Item Discrimination | 1.19 | 0.51 | 1.02 | (0.82,1.26) |
Incorrect detection of mode effects = false positive identification of DIF due to mode among items not simulated with mode DIF; AUC = area under the ROC curve; CI = 95% confidence interval; IRT = item response theory model used to generate response data and parameters used in CAT; CAT item usage = number of times a given item was administered by CAT divided by 100; * p < .05; ** p < .01.
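Both footnotes report the AUC. One standard way to compute it is the Mann-Whitney formulation. AUC is the probability that a randomly chosen positive case (here, a flagged item) gets a higher predicted score than a randomly chosen negative case, with ties counting one half. A value of 0.5 means chance discrimination. The function name and data below are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count one half).
    This is the Mann-Whitney formulation of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of a DIF flag and observed outcomes.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```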
Figure 1. Mean Predicted True and False Positive Rates by P&P Item Difficulty and Analysis Procedure. Solid line – Robust Z (RZ) true positive (TP%) rate. Dashed line – 95% credible interval (CrI) true positive rate. Dotted line – Robust Z false positive (FP%) rate. Dash-dot line – 95% credible interval false positive rate.
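Figure 1 plots predicted detection rates against P&P item difficulty. The sketch below shows the general mechanics of turning a per-unit odds ratio from a logistic model into predicted probabilities. It is not the study's fitted model. The baseline rate (80%) and the odds ratio (0.10) are hypothetical, and the multilevel models' random effects are ignored.

```python
def predicted_probability(baseline_p, odds_ratio, x):
    """Probability after moving x units on a predictor whose per-unit odds ratio is odds_ratio.

    In a logistic model logit(p) = b0 + b1 * x, the odds ratio is exp(b1), so the odds
    at x equal the baseline odds (at x = 0) multiplied by odds_ratio ** x.
    """
    baseline_odds = baseline_p / (1.0 - baseline_p)
    odds = baseline_odds * odds_ratio ** x
    return odds / (1.0 + odds)

# Hypothetical values: 80% detection at x = 0 and an odds ratio of 0.10 per unit.
for x in (0.0, 0.5, 1.0, 1.5):
    print(x, round(predicted_probability(0.8, 0.10, x), 3))
```

An odds ratio below 1 produces a curve that falls as the predictor increases. An odds ratio above 1 produces a rising curve.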