Paul-Christian Bürkner, Niklas Schulte, Heinz Holling.
Abstract
Forced-choice questionnaires have been proposed to avoid common response biases typically associated with rating scale questionnaires. To overcome ipsativity issues of trait scores obtained from classical scoring approaches of forced-choice items, advanced methods from item response theory (IRT) such as the Thurstonian IRT model have been proposed. For convenient model specification, we introduce the thurstonianIRT R package, which uses Mplus, lavaan, and Stan for model estimation. Based on practical considerations, we establish that items within one block need to be equally keyed to achieve similar social desirability, which is essential for creating forced-choice questionnaires that have the potential to resist faking intentions. According to extensive simulations, measuring up to five traits using blocks of only equally keyed items does not yield sufficiently accurate trait scores and inter-trait correlation estimates, neither for frequentist nor for Bayesian estimation methods. As a result, persons' trait scores remain partially ipsative and, thus, do not allow for valid comparisons between persons. However, we demonstrate that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits. More specifically, in our simulations of 30 traits, scores based on only equally keyed blocks were non-ipsative and highly accurate. We conclude that in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.
Keywords: Mplus; R; Stan; Thurstonian IRT model; forced-choice format; ipsative data; lavaan; multidimensional IRT
Year: 2019 PMID: 31488915 PMCID: PMC6713979 DOI: 10.1177/0013164419832063
Source DB: PubMed Journal: Educ Psychol Meas ISSN: 0013-1644 Impact factor: 2.821
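To make the ipsativity problem from the abstract concrete, the following Python sketch simulates classical (rank-sum) scoring of forced-choice blocks. It is an illustration under simplifying assumptions, not the authors' simulation design: each hypothetical block contains one positively keyed item per trait, item utilities are the trait level plus standard normal noise, and all function names are invented for this example. Under classical scoring every person receives the same total score, so the average inter-trait correlation lands at roughly −1/(d − 1) for d traits (given similar score variances), even though the true traits are independent.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def classical_scores(n_persons, n_traits, n_blocks, rng):
    """Rank-sum scores; each (hypothetical) block holds one item per trait."""
    scores = []
    for _ in range(n_persons):
        # Independent true traits (assumption of this sketch).
        theta = [rng.gauss(0, 1) for _ in range(n_traits)]
        total = [0] * n_traits
        for _ in range(n_blocks):
            # Latent utility of each item = trait level + item-specific noise.
            utility = [t + rng.gauss(0, 1) for t in theta]
            ranked = sorted(range(n_traits), key=lambda k: utility[k])
            for points, k in enumerate(ranked):
                total[k] += points  # 0 points for the least preferred item
        scores.append(total)
    return scores


def mean_intertrait_correlation(scores):
    """Average Pearson correlation over all pairs of trait score columns."""
    n_traits = len(scores[0])
    columns = [[row[k] for row in scores] for k in range(n_traits)]
    pairs = list(combinations(range(n_traits), 2))
    return sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)


rng = random.Random(2019)
for d in (3, 5, 30):
    scores = classical_scores(n_persons=500, n_traits=d, n_blocks=20, rng=rng)
    distinct_totals = len({sum(row) for row in scores})  # 1 means fully ipsative
    print(d, distinct_totals,
          round(mean_intertrait_correlation(scores), 3),
          round(-1 / (d - 1), 3))
```

For d = 3, 5, and 30 the value −1/(d − 1) is −0.50, −0.25, and about −0.03, which suggests why the ipsativity constraint weighs less as the number of traits grows. Thurstonian IRT scoring is not classical scoring, so this is only an analogy to the simulation results summarized in the tables.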
Figure 1. Relationship between true and estimated trait scores for one of three independent traits using 27 blocks per trait. Plots on the left- and right-hand sides show results for equally and unequally keyed triplets, respectively. Shaded areas are 95% confidence intervals. Regression curves were estimated by thin-plate splines.
Figure 2. Relationship between estimated trait scores and their standard errors computed with Stan based on three independent traits and 27 triplets. Plots on the left- and right-hand sides show results for equally (+) and unequally (+/−) keyed triplets, respectively. Each point corresponds to a particular person’s trait score and corresponding standard error. Horizontal dashed lines indicate mean standard errors. Colors indicate different traits.
Two Real-World Correlation Matrices of Big Five Personality Scores.
| | N (NEO-PIR) | E (NEO-PIR) | C (NEO-PIR) | A (NEO-PIR) | N (Own Project) | E (Own Project) | C (Own Project) | A (Own Project) |
|---|---|---|---|---|---|---|---|---|
| E | −0.21 | | | | −0.33 | | | |
| C | −0.53 | 0.27 | | | −0.43 | 0.30 | | |
| A | −0.25 | 0.00 | 0.24 | | −0.37 | 0.32 | 0.27 | |
| O | 0.00 | 0.40 | 0.00 | 0.00 | −0.04 | 0.16 | 0.05 | 0.13 |
Note. N = neuroticism; E = extraversion; C = conscientiousness; A = agreeableness; O = openness to experiences.
Selected Simulation Results of T-IRT Models Fitted With Mplus.
| BpT | Loadings | Trait corr. | Conv (3 traits) | Rel (3 traits) | RMSE (3 traits) | Bias (3 traits) | Conv (5 traits) | Rel (5 traits) | RMSE (5 traits) | Bias (5 traits) |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | [0.3, 0.7] | 0 | 0.97 | 0.54 | 0.73 | −0.44 | 0.94 | 0.60 | 0.67 | −0.22 |
| 9 | [0.3, 0.7] | 0.3 | 1.00 | 0.39 | 0.87 | −0.66 | 1.00 | 0.43 | 0.83 | −0.42 |
| 9 | [0.3, 0.7] | NEO | 0.87 | 0.64 | 0.63 | −0.27 | 0.90 | 0.64 | 0.63 | −0.16 |
| 9 | + | 0 | 0.84 | 0.59 | 0.68 | −0.44 | 0.93 | 0.73 | 0.54 | −0.22 |
| 9 | + | 0.3 | 0.90 | 0.44 | 0.82 | −0.70 | 0.97 | 0.56 | 0.71 | −0.44 |
| 9 | + | NEO | 0.70 | 0.69 | 0.58 | −0.26 | 0.83 | 0.77 | 0.49 | −0.15 |
| 9 | +/− | 0 | 0.98 | 0.88 | 0.35 | 0.01 | 0.98 | 0.90 | 0.32 | 0.00 |
| 9 | +/− | 0.3 | 0.97 | 0.88 | 0.35 | 0.00 | 0.99 | 0.90 | 0.32 | 0.00 |
| 9 | +/− | NEO | 0.97 | 0.88 | 0.35 | 0.01 | 0.95 | 0.90 | 0.32 | 0.00 |
| 27 | [0.3, 0.7] | 0 | 1.00 | 0.67 | 0.60 | −0.40 | 1.00 | 0.75 | 0.51 | −0.20 |
| 27 | [0.3, 0.7] | 0.3 | 1.00 | 0.56 | 0.71 | −0.52 | 1.00 | 0.64 | 0.64 | −0.32 |
| 27 | [0.3, 0.7] | NEO | 1.00 | 0.76 | 0.50 | −0.25 | 1.00 | 0.79 | 0.47 | −0.15 |
| 27 | + | 0 | 0.94 | 0.69 | 0.58 | −0.38 | 0.94 | 0.82 | 0.44 | −0.19 |
| 27 | + | 0.3 | 0.98 | 0.60 | 0.67 | −0.50 | 0.97 | 0.73 | 0.54 | −0.28 |
| 27 | + | NEO | 0.96 | 0.77 | 0.49 | −0.25 | 0.93 | 0.85 | 0.40 | −0.14 |
| 27 | +/− | 0 | 0.61 | 0.95 | 0.22 | 0.00 | 0.30 | 0.96 | 0.19 | 0.00 |
| 27 | +/− | 0.3 | 0.56 | 0.95 | 0.23 | −0.02 | 0.28 | 0.96 | 0.20 | 0.00 |
| 27 | +/− | NEO | 0.69 | 0.95 | 0.23 | 0.01 | 0.38 | 0.96 | 0.20 | 0.00 |
Note. Results were computed based on 100 simulation trials per condition. BpT = blocks per trait; Conv = convergence rate; Rel = reliability; RMSE = root mean squared error of z scores; Bias = bias in the correlations of estimated trait scores. Loadings: [0.3, 0.7] = positive factor loadings in that range; + = positive factor loadings in [0.65, 0.95]; +/− = mixed factor loadings. Trait corr.: 0 = independent traits; 0.3 = traits correlated by 0.3; NEO = correlations taken from the NEO-PIR.
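The outcome measures named in the note above can be computed from true and estimated trait scores roughly as follows. This Python sketch assumes that reliability is the squared correlation between true and estimated scores and that the correlation bias is the estimated minus the true inter-trait correlation; the note does not spell out either definition, and all function names and the simulated data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscores(x):
    """Standardize using the population standard deviation (divisor n)."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / sd for v in x]


def reliability(true, est):
    """Assumed definition: squared correlation of true and estimated scores."""
    return pearson(true, est) ** 2


def rmse_z(true, est):
    """Root mean squared error between standardized true and estimated scores."""
    zt, ze = zscores(true), zscores(est)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(zt, ze)) / len(zt))


def correlation_bias(est_a, est_b, true_r):
    """Assumed definition: estimated minus true inter-trait correlation."""
    return pearson(est_a, est_b) - true_r


# Hypothetical data: two independent traits and noisy estimates of each.
rng = random.Random(1)
true_1 = [rng.gauss(0, 1) for _ in range(1000)]
true_2 = [rng.gauss(0, 1) for _ in range(1000)]
est_1 = [t + rng.gauss(0, 0.5) for t in true_1]
est_2 = [t + rng.gauss(0, 0.5) for t in true_2]

print("Rel :", round(reliability(true_1, est_1), 2))
print("RMSE:", round(rmse_z(true_1, est_1), 2))
print("Bias:", round(correlation_bias(est_1, est_2, true_r=0.0), 2))
```

Under these assumptions, the RMSE of z scores equals sqrt(2(1 − sqrt(Rel))) whenever the correlation is positive, which is consistent with the table entries (for example, Rel = 0.54 alongside RMSE = 0.73).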
Standard Deviations of Selected Simulation Results of T-IRT Models Fitted With Mplus.
| BpT | Loadings | Trait corr. | SD of Rel (3 traits) | SD of RMSE (3 traits) | SD of Bias (3 traits) | SD of Rel (5 traits) | SD of RMSE (5 traits) | SD of Bias (5 traits) |
|---|---|---|---|---|---|---|---|---|
| 9 | [0.3, 0.7] | 0 | 0.03 | 0.03 | 0.17 | 0.03 | 0.03 | 0.10 |
| 9 | [0.3, 0.7] | 0.3 | 0.05 | 0.04 | 0.21 | 0.05 | 0.05 | 0.13 |
| 9 | [0.3, 0.7] | NEO | 0.04 | 0.04 | 0.20 | 0.03 | 0.03 | 0.09 |
| 9 | + | 0 | 0.04 | 0.03 | 0.24 | 0.02 | 0.02 | 0.10 |
| 9 | + | 0.3 | 0.03 | 0.03 | 0.24 | 0.03 | 0.02 | 0.11 |
| 9 | + | NEO | 0.04 | 0.04 | 0.22 | 0.01 | 0.02 | 0.09 |
| 9 | +/− | 0 | 0.01 | 0.02 | 0.03 | 0.01 | 0.01 | 0.02 |
| 9 | +/− | 0.3 | 0.01 | 0.02 | 0.03 | 0.01 | 0.01 | 0.02 |
| 9 | +/− | NEO | 0.01 | 0.01 | 0.03 | 0.01 | 0.01 | 0.02 |
| 27 | [0.3, 0.7] | 0 | 0.02 | 0.02 | 0.08 | 0.02 | 0.02 | 0.05 |
| 27 | [0.3, 0.7] | 0.3 | 0.02 | 0.02 | 0.08 | 0.02 | 0.02 | 0.05 |
| 27 | [0.3, 0.7] | NEO | 0.01 | 0.02 | 0.09 | 0.01 | 0.02 | 0.05 |
| 27 | + | 0 | 0.02 | 0.02 | 0.09 | 0.01 | 0.01 | 0.05 |
| 27 | + | 0.3 | 0.02 | 0.02 | 0.09 | 0.01 | 0.01 | 0.04 |
| 27 | + | NEO | 0.02 | 0.02 | 0.12 | 0.01 | 0.01 | 0.04 |
| 27 | +/− | 0 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.01 |
| 27 | +/− | 0.3 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.01 |
| 27 | +/− | NEO | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.01 |
Note. Standard deviations were computed across 100 simulation trials per condition. T-IRT = Thurstonian item response theory; BpT = blocks per trait; SD = standard deviation; Conv = convergence rate; Rel = reliability; RMSE = root mean squared error of z scores; Bias = bias in the correlations of estimated trait scores. Loadings: [0.3, 0.7] = positive factor loadings in that range; + = positive factor loadings in [0.65, 0.95]; +/− = mixed factor loadings. Trait corr.: 0 = independent traits; 0.3 = traits correlated by 0.3; NEO = correlations taken from the NEO-PIR.
Selected Simulation Results of T-IRT Models Fitted With Mplus, lavaan, or Stan.
| Loadings | Trait corr. | Rel (Mplus) | RMSE (Mplus) | Bias (Mplus) | Rel (lavaan) | RMSE (lavaan) | Bias (lavaan) | Rel (Stan) | RMSE (Stan) | Bias (Stan) |
|---|---|---|---|---|---|---|---|---|---|---|
| [0.3, 0.7] | 0 | 0.67 | 0.61 | −0.39 | 0.64 | 0.64 | −0.49 | 0.66 | 0.61 | −0.29 |
| [0.3, 0.7] | 0.3 | 0.55 | 0.72 | −0.62 | 0.50 | 0.77 | −0.74 | 0.58 | 0.69 | −0.38 |
| [0.3, 0.7] | NEO | 0.76 | 0.50 | −0.26 | 0.75 | 0.52 | −0.28 | 0.76 | 0.50 | −0.20 |
| + | 0 | 0.70 | 0.58 | −0.39 | 0.62 | 0.65 | −0.48 | 0.67 | 0.61 | −0.48 |
| + | 0.3 | 0.57 | 0.70 | −0.49 | 0.47 | 0.79 | −0.75 | 0.44 | 0.82 | −0.40 |
| + | NEO | 0.78 | 0.48 | −0.21 | 0.73 | 0.54 | −0.24 | 0.76 | 0.50 | −0.29 |
| +/− | 0 | 0.95 | 0.22 | −0.01 | 0.93 | 0.26 | −0.01 | 0.95 | 0.22 | 0.00 |
| +/− | 0.3 | 0.95 | 0.23 | −0.02 | 0.92 | 0.28 | 0.12 | 0.95 | 0.22 | −0.01 |
| +/− | NEO | 0.95 | 0.23 | 0.01 | 0.92 | 0.28 | −0.02 | 0.95 | 0.22 | 0.01 |
Note. Results are averaged across 3 traits each measured in 27 triplets. T-IRT = Thurstonian item response theory; Rel = reliability; RMSE = root mean squared error of z scores; Bias = bias in the correlations of estimated trait scores. Loadings: [0.3, 0.7] = positive factor loadings in that range; + = positive factor loadings in [0.65, 0.95]; +/− = mixed factor loadings. Trait corr.: 0 = independent traits; 0.3 = traits correlated by 0.3; NEO = correlations taken from the NEO-PIR.
Correlations of Trait Scores Coming From the Same Source.
| Source | Trait | Trait1 (equally keyed) | Trait2 (equally keyed) | Trait3 (equally keyed) | Trait1 (unequally keyed) | Trait2 (unequally keyed) | Trait3 (unequally keyed) |
|---|---|---|---|---|---|---|---|
| Truth | Trait1 | 1.00 | | | 1.00 | | |
| | Trait2 | 0.00 | 1.00 | | 0.00 | 1.00 | |
| | Trait3 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 |
| Mplus | Trait1 | 1.00 | | | 1.00 | | |
| | Trait2 | −0.26 | 1.00 | | 0.01 | 1.00 | |
| | Trait3 | −0.47 | −0.48 | 1.00 | −0.05 | −0.05 | 1.00 |
| lavaan | Trait1 | 1.00 | | | 1.00 | | |
| | Trait2 | −0.37 | 1.00 | | 0.02 | 1.00 | |
| | Trait3 | −0.59 | −0.52 | 1.00 | −0.09 | −0.03 | 1.00 |
| Stan | Trait1 | 1.00 | | | 1.00 | | |
| | Trait2 | −0.42 | 1.00 | | 0.02 | 1.00 | |
| | Trait3 | −0.53 | −0.51 | 1.00 | −0.05 | −0.05 | 1.00 |
Note. Estimated inter-trait correlations are based on three independent traits using 27 blocks per trait.
Correlations of Trait Scores Coming From Different Sources.
| Trait | Source | Truth (equally keyed) | Mplus (equally keyed) | lavaan (equally keyed) | Stan (equally keyed) | Truth (unequally keyed) | Mplus (unequally keyed) | lavaan (unequally keyed) | Stan (unequally keyed) |
|---|---|---|---|---|---|---|---|---|---|
| Trait1 | Truth | 1.00 | | | | 1.00 | | | |
| | Mplus | 0.83 | 1.00 | | | 0.98 | 1.00 | | |
| | lavaan | 0.78 | 0.97 | 1.00 | | 0.97 | 0.99 | 1.00 | |
| | Stan | 0.81 | 0.99 | 0.98 | 1.00 | 0.98 | 1.00 | 0.99 | 1.00 |
| Trait2 | Truth | 1.00 | | | | 1.00 | | | |
| | Mplus | 0.83 | 1.00 | | | 0.98 | 1.00 | | |
| | lavaan | 0.78 | 0.96 | 1.00 | | 0.97 | 0.99 | 1.00 | |
| | Stan | 0.81 | 0.99 | 0.98 | 1.00 | 0.98 | 1.00 | 0.99 | 1.00 |
| Trait3 | Truth | 1.00 | | | | 1.00 | | | |
| | Mplus | 0.85 | 1.00 | | | 0.98 | 1.00 | | |
| | lavaan | 0.81 | 0.96 | 1.00 | | 0.96 | 0.99 | 1.00 | |
| | Stan | 0.83 | 0.98 | 0.98 | 1.00 | 0.98 | 1.00 | 0.99 | 1.00 |
Note. Estimated inter-trait correlations are based on three independent traits using 27 blocks per trait.
Simulation Results of T-IRT Models for 30 Traits Fitted With Stan.
| Loadings | Trait corr. | Rel (factor scores) | RMSE (factor scores) | M (bias of inter-trait correlations) | SD (bias) | Minimum (bias) | Maximum (bias) |
|---|---|---|---|---|---|---|---|
| + | 0 | 0.87 | 0.36 | −0.03 | 0.02 | −0.09 | 0.02 |
| + | 0.3 | 0.70 | 0.57 | −0.30 | 0.03 | −0.39 | −0.22 |
| + | OPQ | 0.91 | 0.30 | 0.00 | 0.02 | −0.08 | 0.07 |
| +/− | 0 | 0.90 | 0.32 | 0.00 | 0.02 | −0.05 | 0.05 |
| +/− | 0.3 | 0.91 | 0.31 | 0.03 | 0.02 | −0.03 | 0.08 |
| +/− | OPQ | 0.92 | 0.29 | 0.00 | 0.02 | −0.05 | 0.07 |
Note. Results are averaged across 30 traits each measured in 10 triplets. T-IRT = Thurstonian item response theory; Rel = reliability; RMSE = root mean squared error; M = mean; SD = standard deviation; (+) = positive factor loadings; (+/−) = mixed factor loadings; (0) = independent traits; (0.3) = traits correlated by 0.3; (OPQ) = traits correlated according to the OPQ CM 4.2.
Figure 3. Relationship between estimated trait scores and their standard errors computed with Stan based on 30 traits correlated according to the OPQ CM 4.2 and each measured in 9 triplets. Plots on the left- and right-hand sides show results for equally (+) and unequally (+/−) keyed triplets, respectively. Each point corresponds to a particular person’s trait score and corresponding standard error. Horizontal dashed lines indicate mean standard errors. Colors indicate different traits.