Steven P Reise, Han Du, Emily F Wong, Anne S Hubbard, Mark G Haviland.
Abstract
Item response theory (IRT) model applications extend well beyond cognitive ability testing, and various patient-reported outcomes (PRO) measures are among the more prominent examples. PRO (and like) constructs differ from cognitive ability constructs in many ways, and these differences have model fitting implications. With a few notable exceptions, however, most IRT applications to PRO constructs rely on traditional IRT models, such as the graded response model. We review some notable differences between cognitive and PRO constructs and how these differences can present challenges for traditional IRT model applications. We then apply two models (the traditional graded response model and an alternative log-logistic model) to depression measure data drawn from the Patient-Reported Outcomes Measurement Information System project. We do not claim that one model is "a better fit" or more "valid" than the other; rather, we show that the log-logistic model may be more consistent with the construct of depression as a unipolar phenomenon. Clearly, the graded response and log-logistic models can lead to different conclusions about the psychometrics of an instrument and the scaling of individual differences. We underscore, too, that, in general, explorations of which model may be more appropriate cannot be decided only by fit index comparisons; these decisions may require the integration of psychometrics with theory and research findings on the construct of interest.
Keywords: IRT model assumptions; graded response model; log-logistic model
Year: 2021 PMID: 34463910 PMCID: PMC8437930 DOI: 10.1007/s11336-021-09802-0
Source DB: PubMed Journal: Psychometrika ISSN: 0033-3123 Impact factor: 2.500
PROMIS depression item content
| Item | Content |
|---|---|
| 1 | I felt worthless |
| 2 | I felt that I had nothing to look forward to |
| 3 | I felt helpless |
| 4 | I felt sad |
| 5 | I felt like a failure |
| 6 | I felt depressed |
| 7 | I felt unhappy |
| 8 | I felt hopeless |
Item-scale correlations (corrected for item overlap), means, and response proportions
| Item | R.drop | Mean | Response 0 | Response 1 | Response 2 | Response 3 | Response 4 |
|---|---|---|---|---|---|---|---|
| 1 | .83 | 1.05 | .47 | .21 | .18 | .09 | .05 |
| 2 | .84 | 1.20 | .42 | .20 | .21 | .11 | .06 |
| 3 | .85 | 1.21 | .40 | .21 | .22 | .12 | .05 |
| 4 | .83 | 1.44 | .28 | .25 | .28 | .14 | .06 |
| 5 | .87 | 1.15 | .43 | .20 | .21 | .10 | .06 |
| 6 | .85 | 1.33 | .35 | .22 | .24 | .14 | .06 |
| 7 | .84 | 1.43 | .28 | .25 | .28 | .13 | .06 |
| 8 | .87 | 1.14 | .44 | .19 | .20 | .12 | .05 |
| Mean | .85 | 1.24 | .38 | .22 | .23 | .12 | .06 |
| SD | 0.01 | 0.05 | .07 | .02 | .03 | .02 | .00 |
R.drop is the item-test correlation after dropping the item's score from the composite.
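As a quick consistency check on the table above, an item mean can be recovered from its response proportions as the proportion-weighted sum of the category scores 0 to 4. The sketch below is illustrative only: the function name `mean_from_proportions` is hypothetical, and the proportions are the rounded values printed in the table, so the results match the tabled means only to within rounding.

```python
# Response proportions for categories 0..4, copied from the table above
# (rounded to two decimals, so they need not sum to exactly 1).
proportions = {
    1: [0.47, 0.21, 0.18, 0.09, 0.05],  # "I felt worthless"
    4: [0.28, 0.25, 0.28, 0.14, 0.06],  # "I felt sad"
}


def mean_from_proportions(props):
    """Expected item score: each category score times its proportion, summed."""
    return sum(score * p for score, p in enumerate(props))


for item, props in proportions.items():
    print(f"item {item}: mean from proportions = {mean_from_proportions(props):.2f}")
```

For items 1 and 4 this gives roughly 1.04 and 1.47, against tabled means of 1.05 and 1.44; the small gaps are what one would expect from two-decimal rounding of the proportions.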
Fig. 1 Histograms of raw score distributions including (top) and excluding (bottom) all-zero response patterns.
Graded response model (GRM) item parameter estimates
| Item | Loading | GRM slope | GRM location 1 | GRM location 2 | GRM location 3 | GRM location 4 |
|---|---|---|---|---|---|---|
| 1 | .92 | 3.86 | -0.05 | .54 | 1.14 | 1.75 |
| 2 | .92 | 4.04 | -0.19 | .38 | 1.02 | 1.6 |
| 3 | .93 | 4.22 | -0.24 | .36 | 1.01 | 1.71 |
| 4 | .90 | 3.59 | -0.65 | .12 | .96 | 1.71 |
| 5 | .94 | 4.70 | -0.14 | .40 | 1.03 | 1.61 |
| 6 | .92 | 4.06 | -0.39 | .24 | .93 | 1.65 |
| 7 | .91 | 3.77 | -0.62 | .13 | .96 | 1.70 |
| 8 | .95 | 4.99 | -0.12 | .41 | 1.01 | 1.69 |
| Mean | | 4.15 | -0.30 | .32 | 1.01 | 1.68 |
| SD | | .44 | .21 | .14 | .06 | .05 |
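To make the slope and location estimates concrete, the following sketch turns one item's parameters into category response probabilities. It assumes the usual logistic form of the graded response model, in which the probability of responding in category k or higher is 1 / (1 + exp(-a(θ - b_k))), with no additional scaling constant; the source page does not state the metric, so treat that as an assumption. The function name `grm_category_probs` is hypothetical.

```python
import math


def grm_category_probs(theta, slope, locations):
    """Category probabilities under a logistic graded response model.

    Assumes P(X >= k | theta) = 1 / (1 + exp(-slope * (theta - b_k)))
    with ordered locations b_1 < b_2 < ... (an assumption, see lead-in).
    """
    # Boundary curves P(X >= k) for k = 1..K.
    boundaries = [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in locations]
    # Pad with P(X >= 0) = 1 and P(X >= K + 1) = 0, then take differences.
    padded = [1.0] + boundaries + [0.0]
    return [padded[k] - padded[k + 1] for k in range(len(padded) - 1)]


# Item 1 estimates from the table above.
probs = grm_category_probs(theta=0.0, slope=3.86, locations=[-0.05, 0.54, 1.14, 1.75])
for category, p in enumerate(probs):
    print(f"P(X = {category} | theta = 0) = {p:.3f}")
```

Because the slopes are large, the category probabilities change quickly over a narrow range of the trait; evaluating the function at a few values of `theta` between -1 and 2 shows this directly.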
Log-logistic (LL) item parameter estimates
| Item | LL slope | LL easiness 1 | LL easiness 2 | LL easiness 3 | LL easiness 4 |
|---|---|---|---|---|---|
| 1 | 3.86 | 1.20 | .124 | .012 | .001 |
| 2 | 4.03 | 2.14 | .215 | .016 | .002 |
| 3 | 4.22 | 2.76 | .218 | .014 | .001 |
| 4 | 3.59 | 10.37 | .659 | .032 | .002 |
| 5 | 4.70 | 1.94 | .150 | .008 | .001 |
| 6 | 4.05 | 4.85 | .383 | .023 | .001 |
| 7 | 3.77 | 10.42 | .603 | .027 | .002 |
| 8 | 4.99 | 1.81 | .129 | .007 | .000 |
| Mean | 4.15 | 4.43 | .31 | .017 | .0011 |
| SD | .44 | 3.58 | .20 | .009 | .0006 |
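The tabled log-logistic easiness values appear to be related to the graded response estimates by easiness = exp(-slope × location), with the slope unchanged. The sketch below checks that relation numerically. It assumes the log-logistic boundary curve has the form εθ^η / (1 + εθ^η) on a positive trait θ (with η the slope and ε the easiness) and that the positive trait equals the exponential of the graded response trait; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def ll_easiness(slope, location):
    """Easiness implied by a GRM slope/location pair: exp(-slope * location)."""
    return math.exp(-slope * location)


def grm_boundary(theta, slope, location):
    """Logistic GRM boundary curve P(X >= k | theta), theta on the real line."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - location)))


def ll_boundary(theta_pos, slope, easiness):
    """Assumed log-logistic boundary curve, defined for theta_pos > 0."""
    t = easiness * theta_pos ** slope
    return t / (1.0 + t)


# Item 1: GRM slope and locations, and the tabled LL easiness values.
slope = 3.86
locations = [-0.05, 0.54, 1.14, 1.75]
tabled = [1.20, 0.124, 0.012, 0.001]

for b, eps_tabled in zip(locations, tabled):
    print(f"location {b:+.2f}: implied easiness {ll_easiness(slope, b):.3f}, tabled {eps_tabled}")

# Under the assumed forms, the two boundary curves agree when theta_pos = exp(theta).
theta = 0.8
same = math.isclose(
    grm_boundary(theta, slope, locations[1]),
    ll_boundary(math.exp(theta), slope, ll_easiness(slope, locations[1])),
)
print("boundary curves agree:", same)
```

If these assumptions hold, the two models reproduce the same response probabilities and differ only in how the latent trait is scaled: symmetric around zero in one case, bounded below by zero in the other.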
Observed and model reproduced response proportions for graded response and log-logistic models
| Item | Response 0 | Response 1 | Response 2 | Response 3 | Response 4 | |
|---|---|---|---|---|---|---|
| 1 | .47 | .21 | .18 | .09 | .05 | |
| 2 | .42 | .20 | .21 | .11 | .06 | |
| 3 | .40 | .21 | .22 | .12 | .05 | |
| 4 | .28 | .25 | .28 | .14 | .06 | |
| 5 | .43 | .20 | .21 | .10 | .06 | |
| 6 | .35 | .22 | .24 | .14 | .06 | |
| 7 | .28 | .25 | .28 | .13 | .06 | |
| 8 | .44 | .19 | .20 | .12 | .05 | |
Chi-square significant alpha .
Fig. 2 Item response curves under the graded response model and log-logistic model.
Fig. 3 Average category response curves under the graded response model and log-logistic model.
Fig. 4 EAP trait level estimates under the graded response model and log-logistic model.
Fig. 5 EAP trait level estimates versus raw scores under the graded response model and log-logistic model.
Fig. 6 Test information under the graded response model and log-logistic model.
Fig. 7 Confidence bands for trait level estimates under the graded response model and log-logistic model.