Chun Wang, David J. Weiss, Shiyang Su.
Abstract
This study explored calibrating a large item bank for use in multidimensional health measurement with computerized adaptive testing, using both item responses and response time (RT) information. The Activity Measure for Post-Acute Care is a patient-reported outcomes measure composed of three correlated scales (Applied Cognition, Daily Activities, and Mobility). All items on each scale are Likert type: a respondent chooses one of four ordered response options. The most appropriate item response theory model for analyzing and scoring these items is the multidimensional graded response model (MGRM). During field testing, an interviewer read each item to a patient and recorded the patient's response on a tablet computer, while the software recorded RTs. Because the item bank contains over 300 items, data were collected in four batches, with a common set of anchor items used to link the scale. van der Linden's (2007) hierarchical modeling framework was adopted. For each batch, several models were compared: with or without interviewer as a covariate, and with or without an interaction between interviewer and items. The model with the interviewer-by-item interaction, with the interaction effect constrained to be proportional, fit the data best. The final hierarchical model, combining a lognormal model for RT with the MGRM for the response data, was therefore fitted to all batches via concurrent calibration. Evaluation of the parameter estimates revealed that (1) adding RT information did not significantly affect the item parameter estimates or their standard errors, and (2) adding RT information reduced the standard errors of patients' multidimensional latent trait estimates, whereas adding interviewer as a covariate yielded no further improvement. Implications of the findings for the design of follow-up adaptive test delivery are discussed.
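To make the two measurement models in the hierarchical framework concrete, here is a minimal sketch of one respondent's joint log-likelihood. It assumes a slope-intercept form of the MGRM, van der Linden's lognormal RT model (log T ~ Normal(β − τ, 1/α²)), and conditional independence of responses and RTs given the trait vector θ and speed τ. The function names and all parameter values are hypothetical, the population-level (second-level) model linking θ and τ is omitted, and this is not the authors' estimation code.

```python
import math

def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def mgrm_category_probs(theta, a, d):
    """Category probabilities for one item under a multidimensional graded
    response model in slope-intercept form (an assumed parameterization):
        P(X >= k | theta) = logistic(a . theta + d[k-1]),  k = 1..K-1.
    The intercepts d must be strictly decreasing so every probability is positive."""
    z = sum(ai * ti for ai, ti in zip(a, theta))
    # Cumulative probabilities, padded with P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0] + [logistic(z + dk) for dk in d] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(d) + 1)]

def lognormal_rt_logpdf(t, tau, alpha, beta):
    """Log density of the lognormal RT model: log T ~ Normal(beta - tau, 1/alpha**2),
    where tau is the person's speed, beta the item's time intensity,
    and alpha the item's time discrimination."""
    z = alpha * (math.log(t) - (beta - tau))
    return math.log(alpha) - math.log(t) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z

def joint_loglik(responses, rts, theta, tau, items):
    """Measurement-level log-likelihood for one respondent, assuming responses
    and RTs are conditionally independent given (theta, tau)."""
    ll = 0.0
    for x, t, item in zip(responses, rts, items):
        ll += math.log(mgrm_category_probs(theta, item["a"], item["d"])[x])
        ll += lognormal_rt_logpdf(t, tau, item["alpha"], item["beta"])
    return ll

# Hypothetical parameters for two four-category items (categories 0-3),
# each loading on a single dimension (simple structure).
items = [
    {"a": [1.2, 0.0, 0.0], "d": [2.0, 0.0, -2.0], "alpha": 2.0, "beta": 2.1},
    {"a": [0.0, 0.9, 0.0], "d": [1.5, 0.2, -1.0], "alpha": 1.5, "beta": 2.3},
]
ll = joint_loglik([2, 1], [8.0, 10.5], theta=[0.3, -0.2, 0.1], tau=0.1, items=items)
```

Because the two parts of the likelihood add on the log scale, adding RTs can sharpen the trait estimates only indirectly, through the correlation between θ and τ at the population level, which is why that level matters in the full framework.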
Keywords: health measurement; hierarchical model; item response theory (IRT); multidimensional graded response model; response time
Year: 2019 PMID: 30761036 PMCID: PMC6361798 DOI: 10.3389/fpsyg.2019.00051
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Path diagrams of four different bivariate models (the total number of items is hypothetically 96, for illustration purposes).
Number of unique items per domain for the four batches.
| Batch | Domain 1 | Domain 2 | Domain 3 | Total unique | Linking items administered |
|---|---|---|---|---|---|
| 1 | 28 | 27 | 30 | 85 | 24 |
| 2 | 24 | 24 | 24 | 72 | 24 |
| 3 | 24 | 24 | 24 | 72 | 24 |
| 4 | 23 | 23 | 25 | 71 | 24 |
| Linking | 8 | 8 | 8 | 24 | 24 |
| Total | 107 | 106 | 111 | 324 | – |
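The item counts above can be checked with a few lines of arithmetic. This sketch assumes, as the last column suggests, that each batch administered its own unique items plus all 24 linking items, and that the linking items are counted only once in the bank totals; the variable and function names are illustrative.

```python
# Unique items per domain (three domain columns) for batches 1-4, plus the
# linking (anchor) items shared by every batch, copied from the table above.
unique_by_batch = {
    1: (28, 27, 30),
    2: (24, 24, 24),
    3: (24, 24, 24),
    4: (23, 23, 25),
}
linking = (8, 8, 8)

def items_per_batch(unique, anchors):
    """Items administered in each batch: its unique items plus all anchors."""
    n_anchor = sum(anchors)
    return {batch: sum(counts) + n_anchor for batch, counts in unique.items()}

def bank_totals(unique, anchors):
    """Per-domain and overall bank size, counting the anchors once."""
    per_domain = [sum(column) for column in zip(*unique.values(), anchors)]
    return per_domain, sum(per_domain)

per_batch = items_per_batch(unique_by_batch, linking)
per_domain, total = bank_totals(unique_by_batch, linking)
```

Under these assumptions, batches 1 to 4 administered 109, 96, 96, and 95 items, and the per-domain bank totals reproduce the 107, 106, 111, and 324 in the last row.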
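Descriptive statistics of the observed data, by batch.
| Statistic | Batch 1 | Batch 2 | Batch 3 | Batch 4 |
|---|---|---|---|---|
| Sample size before cleaning | 630 | 542 | 555 | 543 |
| Sample size after cleaning | 563 | 490 | 500 | 507 |
| Trimmed proportion of RTs | 6.24% | 3.21% | 3.16% | 4.59% |
| Items with 2 categories in the data | 1 | 0 | 0 | 0 |
| Items with 3 categories in the data | 31 | 26 | 22 | 23 |
| Items with 4 categories in the data | 77 | 70 | 94 | 72 |
| RT before trimming: mean | 9.27 | 9.79 | 9.82 | 8.39 |
| RT before trimming: SD | 21.28 | 17.39 | 25.37 | 17.62 |
| RT before trimming: skewness | 41.84 | 32.40 | 35.85 | 29.08 |
| RT after trimming: mean | 8.21 | 8.44 | 8.06 | 7.18 |
| RT after trimming: SD | 4.14 | 4.92 | 4.80 | 4.05 |
| RT after trimming: skewness | 1.48 | 1.66 | 1.65 | 1.53 |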
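The drop in RT skewness from roughly 29 to 42 down to about 1.5 shows how strongly a few very long RTs dominate the raw distribution. The sketch below illustrates the two quantities involved, the proportion of RTs trimmed and the moment coefficient of skewness. The trimming bounds and the RT values are hypothetical: the trimming rule actually used is not given here.

```python
def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (biased form)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def trim_rts(rts, lower, upper):
    """Keep RTs inside [lower, upper] and report the proportion removed.
    The bounds are hypothetical placeholders for an unstated trimming rule."""
    kept = [t for t in rts if lower <= t <= upper]
    return kept, 1 - len(kept) / len(rts)

# Hypothetical RTs with one extremely long response.
raw = [3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 12.0, 400.0]
kept, prop_trimmed = trim_rts(raw, lower=1.0, upper=120.0)
```

Removing the single outlier trims 12.5% of this toy sample and cuts the skewness from about 2.3 to about 0.3, the same qualitative pattern as in the table.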
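Global fit results (AIC, BIC, −2 log-likelihood) for the four bivariate models, by batch.
| Batch | Model | Parameters | AIC | BIC | −2 log-likelihood |
|---|---|---|---|---|---|
| 1 | Model 0 | 736 | 133566 | 136755 | 132094 |
| 1 | Model 1 | 1281 | 133174 | 138725 | 130612 |
| 1 | Model 2 (row partly lost; surviving value 131834) | … | … | … | … |
| 1 | Model 3 | 741 | 133409 | 136620 | 131926 |
| 2 | Model 0 | 652 | 102468 | 105202 | 101164 |
| 2 | Model 1 | 940 | 102049 | 105992 | 100170 |
| 2 | Model 2 (row partly lost; surviving value 100924) | … | … | … | … |
| 2 | Model 3 | 655 | 102339 | 105086 | 101030 |
| 3 | Model 0 | 656 | 111384 | 114149 | 110072 |
| 3 | Model 1 | 1040 | 110613 | 114996 | 108532 |
| 3 | Model 2 (row partly lost; surviving value 109682) | … | … | … | … |
| 3 | Model 3 | 660 | 111323 | 114105 | 110004 |
| 4 | Model 0 | 648 | 108550 | 111290 | 107254 |
| 4 | Model 1 | 1028 | 107733 | 112080 | 105676 |
| 4 | Model 2 (row partly lost; surviving value 106870) | … | … | … | … |
| 4 | Model 3 | 652 | 108364 | 111121 | 107060 |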
Bold values highlighted the best-fitting model based on the information criteria.
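The columns of the bivariate-model table are internally consistent: AIC equals −2 log-likelihood + 2k, and BIC equals −2 log-likelihood + k ln N, with k the number of parameters and N the batch sample size after cleaning (563 for batch 1, 490 for batch 2). The sketch below recomputes both criteria for a few rows; it is a consistency check only, and the small discrepancies come from rounding in the reported values.

```python
import math

def aic(neg2ll: float, k: int) -> float:
    """Akaike information criterion from -2 log-likelihood and k parameters."""
    return neg2ll + 2 * k

def bic(neg2ll: float, k: int, n: int) -> float:
    """Bayesian information criterion; n is the number of respondents."""
    return neg2ll + k * math.log(n)

# (sample size after cleaning, model, k, reported AIC, reported BIC, reported -2LL)
rows = [
    (563, "Model 0", 736, 133566, 136755, 132094),
    (563, "Model 1", 1281, 133174, 138725, 130612),
    (490, "Model 0", 652, 102468, 105202, 101164),
    (490, "Model 1", 940, 102049, 105992, 100170),
]

for n, model, k, rep_aic, rep_bic, neg2ll in rows:
    print(f"N={n} {model}: AIC {aic(neg2ll, k):.1f} (reported {rep_aic}), "
          f"BIC {bic(neg2ll, k, n):.1f} (reported {rep_bic})")
```

The check also shows why the two criteria can disagree. In batch 1, AIC prefers Model 1 to Model 0, while BIC prefers Model 0, because BIC charges ln 563 ≈ 6.33 per parameter instead of 2, and Model 1 has 545 more parameters.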
Global model fit results.
| Model | AIC | BIC |
|---|---|---|
| MGRM | 185303.933 | 190099.963 |
| Model 0 | 327447.703 | 336067.811 |
| Model 2 | … | … |
| MGRM | 74013.471 | 75391.453 |
| Model 0 | 134224.604 | 136824.572 |
| Model 2 | … | … |
Bold values highlighted the best-fitting model based on the information criteria.
Figure 2. Scatterplots of item discrimination parameters (a) across three models: (A) MGRM vs. Model 0, (B) MGRM vs. Model 2, and (C) Model 0 vs. Model 2.
Figure 3. Scatterplots of item boundary parameters (from left to right: the first, second, and third boundary) across three models: (A) MGRM vs. Model 0, (B) MGRM vs. Model 2, and (C) Model 0 vs. Model 2.
Figure 4. Scatterplot of the estimates of … between Model 0 and Model 2.
An illustration of the linear transformation relationship of … between Model 0 and Model 2.
[Table: one linear transformation involving τ for each of levels 1 to 5, with level 5 as the reference; the expressions are not recoverable from the extraction.]
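Mean and SD of the standard errors (SEs) of the latent trait estimates from three models.
| Dimension | Statistic | MGRM | Model 0 | Model 2 |
|---|---|---|---|---|
| 1 | Mean | 0.307 | 0.280 | 0.279 |
| 1 | SD | 0.093 | 0.076 | 0.076 |
| 2 | Mean | 0.252 | 0.242 | 0.241 |
| 2 | SD | 0.082 | 0.071 | 0.070 |
| 3 | Mean | 0.178 | 0.171 | 0.171 |
| 3 | SD | 0.079 | 0.074 | 0.074 |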
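One way to read the SE table is to convert SE changes into implied information gains via SE = 1/√I. This sketch assumes the three columns are MGRM, Model 0, and Model 2 (the order used in the other tables) and applies the relation to the mean SEs, which is only an approximation because a mean of SEs is not the SE at any single trait value.

```python
# Mean SE of the latent trait estimates; assumed column order (MGRM, Model 0, Model 2).
mean_se = {
    "Dimension 1": (0.307, 0.280, 0.279),
    "Dimension 2": (0.252, 0.242, 0.241),
    "Dimension 3": (0.178, 0.171, 0.171),
}

def relative_reduction(se_old: float, se_new: float) -> float:
    """Proportional drop in SE when moving from one model to another."""
    return 1.0 - se_new / se_old

def information_ratio(se_old: float, se_new: float) -> float:
    """Implied ratio of test information, using SE = 1 / sqrt(information)."""
    return (se_old / se_new) ** 2

for dim, (mgrm, m0, m2) in mean_se.items():
    print(f"{dim}: MGRM->Model 0 SE -{relative_reduction(mgrm, m0):.1%}, "
          f"information x{information_ratio(mgrm, m0):.2f}; "
          f"Model 0->Model 2 SE -{relative_reduction(m0, m2):.1%}")
```

Adding RTs (MGRM to Model 0) lowers the mean SE by roughly 9%, 4%, and 4% on the three dimensions, about a 20% information gain on the first, whereas adding the interviewer covariate (Model 0 to Model 2) changes it by less than 0.5%, in line with the conclusion stated in the abstract.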
Final Pearson correlation parameter estimates for the three models from two calibration stages.
| Stage | Model | Trait corr. 1 | Trait corr. 2 | Trait corr. 3 | Trait-speed corr. 1 | Trait-speed corr. 2 | Trait-speed corr. 3 | Additional Model 2 estimates |
|---|---|---|---|---|---|---|---|---|
| 1 | MGRM | 0.624 | 0.468 | 0.846 | – | – | – | – |
| 1 | Model 0 | 0.625 | 0.488 | 0.839 | 0.425 | 0.458 | 0.418 | – |
| 1 | Model 2 | 0.628 | 0.492 | 0.840 | 0.583 | 0.629 | 0.578 | (1.150, 1.053, 0.725, −0.911) |
| 2 | MGRM | 0.702 | 0.545 | 0.869 | – | – | – | – |
| 2 | Model 0 | 0.707 | 0.584 | 0.881 | 0.400 | 0.433 | 0.457 | – |
| 2 | Model 2 | 0.706 | 0.583 | 0.880 | 0.527 | 0.577 | 0.605 | (1.063, −0.286, 1.315, −0.135, 1.046) |