Rodrigo Schames Kreitchmann, Francisco J Abad, Miguel A Sorrel.
Abstract
The use of multidimensional forced-choice questionnaires has been proposed as a means of improving validity in the assessment of non-cognitive attributes in high-stakes scenarios. However, the reduced precision of trait estimates in this questionnaire format is an important drawback. Accordingly, this article presents an optimization procedure for assembling pairwise forced-choice questionnaires while maximizing posterior marginal reliabilities. This procedure is performed through the adaptation of a known genetic algorithm (GA) for combinatorial problems. In a simulation study, the efficiency of the proposed procedure was compared with a quasi-brute-force (BF) search. For this purpose, five-dimensional item pools were simulated to emulate the real problem of generating a forced-choice personality questionnaire under the five-factor model. Three factors were manipulated: (1) the length of the questionnaire, (2) the relative item pool size with respect to the questionnaire's length, and (3) the true correlations between traits. The recovery of the person parameters for each assembled questionnaire was evaluated through the squared correlation between estimated and true parameters, the root mean square error between the estimated and true parameters, the average difference between the estimated and true inter-trait correlations, and the average standard error for each trait level. The proposed GA offered more accurate trait estimates than the BF search within a reasonable computation time in every simulation condition. Such improvements were especially important when measuring correlated traits and when the relative item pool sizes were higher. A user-friendly online implementation of the algorithm was made available to the users.
Keywords: forced-choice format; genetic algorithms; ipsative data; multidimensional item response theory; reliability; test assembly
Year: 2021 PMID: 34505277 PMCID: PMC9170671 DOI: 10.3758/s13428-021-01677-4
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
Fig. 1 Schematic description of the sampling operator for one decision vector
Fig. 2 Schematic description of the main loop
Trait correlation matrix observed in the NEO PI-R (Costa & McCrae, 1992) with neuroticism reversed to emotional stability
| | ES | EX | OE | AG | CO |
|---|---|---|---|---|---|
| ES | 1 | | | | |
| EX | 0.21 | 1 | | | |
| OE | 0 | 0.4 | 1 | | |
| AG | 0.25 | 0 | 0 | 1 | |
| CO | 0.53 | 0.27 | 0 | 0.24 | 1 |
Note. ES: emotional stability, EX: extraversion, OE: openness to experiences, AG: agreeableness, CO: conscientiousness.
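The lower-triangular table above can be expanded into a full symmetric matrix before it is used as the true trait correlation matrix Φ. The following is a minimal sketch of that step, not the authors' code: the variable names, the sample size of 1000, the random seed, and the choice of a multivariate normal distribution for the simulated traits are all assumptions made for illustration.

```python
import numpy as np

# Trait order follows the table: ES, EX, OE, AG, CO.
labels = ["ES", "EX", "OE", "AG", "CO"]

# Lower triangle (including the diagonal) copied from the table above.
lower = [
    [1.00],
    [0.21, 1.00],
    [0.00, 0.40, 1.00],
    [0.25, 0.00, 0.00, 1.00],
    [0.53, 0.27, 0.00, 0.24, 1.00],
]

# Fill the lower triangle, then mirror it to obtain a symmetric matrix.
phi = np.zeros((5, 5))
for i, row in enumerate(lower):
    phi[i, : len(row)] = row
phi = phi + phi.T - np.diag(np.diag(phi))

# A valid correlation matrix must be positive definite.
min_eig = np.linalg.eigvalsh(phi).min()

# Hypothetical use: draw simulated trait levels with these correlations.
# Sample size, seed, and normality are illustrative assumptions.
rng = np.random.default_rng(2021)
theta = rng.multivariate_normal(np.zeros(5), phi, size=1000)

print(dict(zip(labels, np.round(phi[4], 2))))  # correlations of CO with each trait
print("positive definite:", bool(min_eig > 0))
```

Checking the smallest eigenvalue before sampling is a cheap way to catch a transcription error in the table, since a mistyped entry can make the matrix unusable for simulation.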
Fig. 3 Average posterior marginal reliability over time for the best candidates in the genetic algorithm and a brute-force search. Note. J = number of blocks; N:J ratio = items-to-blocks ratio.
Average trait recovery across 20 replications for questionnaires assembled using the genetic algorithm and a brute-force search
| Φ | J | N:J ratio | Reliability (GA) | Reliability (BFbest) | Reliability (BFavg) | RMSE (GA) | RMSE (BFbest) | RMSE (BFavg) | Bias (GA) | Bias (BFbest) | Bias (BFavg) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Identity | 30 | 2 | 0.72 | 0.68 | 0.65 | 0.54 | 0.57 | 0.59 | −0.10 | −0.12 | −0.14 |
| | | 8 | 0.75 | 0.70 | 0.65 | 0.51 | 0.55 | 0.59 | −0.07 | −0.11 | −0.14 |
| | 60 | 2 | 0.82 | 0.79 | 0.76 | 0.43 | 0.46 | 0.49 | −0.08 | −0.10 | −0.12 |
| | | 8 | 0.84 | 0.80 | 0.76 | 0.40 | 0.44 | 0.49 | −0.05 | −0.08 | −0.12 |
| NEO PI-R | 30 | 2 | 0.69 | 0.65 | 0.59 | 0.56 | 0.60 | 0.64 | −0.13 | −0.16 | −0.20 |
| | | 8 | 0.74 | 0.68 | 0.59 | 0.51 | 0.57 | 0.64 | −0.07 | −0.12 | −0.20 |
| | 60 | 2 | 0.81 | 0.77 | 0.73 | 0.44 | 0.48 | 0.52 | −0.08 | −0.11 | −0.15 |
| | | 8 | 0.84 | 0.79 | 0.73 | 0.40 | 0.46 | 0.52 | −0.04 | −0.09 | −0.15 |
Note. Φ = true trait correlation matrix; J = number of blocks; N:J ratio = items-to-blocks ratio; Reliability = true reliability; RMSE = root mean square error; Bias = trait correlation bias; GA = genetic algorithm; BFbest = best brute-force solution; BFavg = average of brute-force solutions. The standard deviations of the indicators across replications ranged from 0.003 to 0.016.
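Three of the recovery indicators named in the abstract and reported in the table above (squared correlation between estimated and true parameters, root mean square error, and trait correlation bias) can be computed from a matrix of true trait levels and a matrix of estimates. The sketch below is illustrative only and is not the authors' implementation. The function name `recovery_indicators` is hypothetical, and it assumes that the "true" inter-trait correlations are the sample correlations of the true trait levels; the study may instead have compared against the generating matrix Φ. The average standard error is omitted because it depends on the estimation model.

```python
import numpy as np


def recovery_indicators(theta_true, theta_hat):
    """Return per-trait squared correlations, per-trait RMSE, and mean correlation bias.

    Both inputs are (persons x traits) arrays. This is a hypothetical helper
    written for illustration, not code from the article.
    """
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    n_traits = theta_true.shape[1]

    # Squared correlation between true and estimated levels, one value per trait.
    r2 = np.array([
        np.corrcoef(theta_true[:, d], theta_hat[:, d])[0, 1] ** 2
        for d in range(n_traits)
    ])

    # Root mean square error, one value per trait.
    rmse = np.sqrt(np.mean((theta_hat - theta_true) ** 2, axis=0))

    # Average difference between estimated and true inter-trait correlations,
    # taken over the unique off-diagonal pairs (assumption: sample correlations
    # of the true levels stand in for the true correlations).
    upper = np.triu_indices(n_traits, k=1)
    corr_hat = np.corrcoef(theta_hat, rowvar=False)[upper]
    corr_true = np.corrcoef(theta_true, rowvar=False)[upper]
    corr_bias = float(np.mean(corr_hat - corr_true))

    return r2, rmse, corr_bias


# Usage with made-up data: five traits, estimates equal to truth plus noise.
rng = np.random.default_rng(0)
true_levels = rng.standard_normal((500, 5))
estimates = true_levels + 0.5 * rng.standard_normal((500, 5))
r2, rmse, corr_bias = recovery_indicators(true_levels, estimates)
print(np.round(r2, 2), np.round(rmse, 2), round(corr_bias, 3))
```

A negative `corr_bias` means the estimated trait levels are less positively correlated than the true ones, which matches the sign of the bias values in the table above.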
Generalized eta-squared effect sizes for mixed-effects ANOVAs of trait estimate recovery indicators
| Effect | Reliability | RMSE | Bias |
|---|---|---|---|
| Method | | | |
| Method × | 0.12** | 0.01* | 0.01* |
| Method × | | | |
| Method × | | | |
| Method × | 0.01* | 0.00 | 0.01* |
| Method × | 0.02** | 0.00 | 0.03** |
| Method × | 0.02** | 0.02** | 0.02** |
| Method × | 0.00 | 0.00 | 0.00 |
| | 0.01 | 0.00 | 0.01 |
| | 0.09** | 0.04* | 0.11** |
| | 0.02* | 0.02* | 0.03* |
| | 0.01 | 0.00 | 0.01 |
Note. J = number of blocks; N:J ratio = items-to-blocks ratio; Φ = true trait correlation matrix; Reliability = true reliability; RMSE = root mean square error; Bias = trait correlation bias; * p < 0.05; ** p < 0.001.
Fig. 4 Average conditional RMSE, bias, and standard errors of estimates for different assembly methods and true trait correlation matrices (Φ)
Average trait recovery across 20 replications for questionnaires assembled using the genetic algorithm and a brute-force search with one half consisting of hetero-polar blocks
| Φ | J | N:J ratio | Reliability (GA) | Reliability (BFbest) | Reliability (BFavg) | RMSE (GA) | RMSE (BFbest) | RMSE (BFavg) | Bias (GA) | Bias (BFbest) | Bias (BFavg) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Identity | 30 | 2 | 0.77 | 0.75 | 0.71 | 0.49 | 0.50 | 0.54 | 0.00 | 0.00 | 0.00 |
| | | 8 | 0.81 | 0.77 | 0.72 | 0.45 | 0.48 | 0.53 | 0.00 | 0.00 | 0.00 |
| | 60 | 2 | 0.87 | 0.86 | 0.84 | 0.37 | 0.38 | 0.40 | 0.00 | 0.00 | 0.00 |
| | | 8 | 0.89 | 0.87 | 0.84 | 0.34 | 0.37 | 0.40 | 0.00 | 0.00 | 0.00 |
| NEO PI-R | 30 | 2 | 0.78 | 0.76 | 0.73 | 0.48 | 0.49 | 0.52 | 0.01 | 0.01 | 0.02 |
| | | 8 | 0.82 | 0.78 | 0.74 | 0.44 | 0.47 | 0.51 | 0.01 | 0.01 | 0.01 |
| | 60 | 2 | 0.87 | 0.86 | 0.85 | 0.36 | 0.37 | 0.39 | 0.00 | 0.01 | 0.01 |
| | | 8 | 0.89 | 0.87 | 0.85 | 0.33 | 0.36 | 0.39 | 0.00 | 0.01 | 0.01 |
Note. Φ = true trait correlation matrix; J = number of blocks; N:J ratio = items-to-blocks ratio; Reliability = true reliability; RMSE = root mean square error; Bias = trait correlation bias; GA = genetic algorithm; BFbest = best brute-force solution; BFavg = average of brute-force solutions. The standard deviations across the replications ranged from 0.004 to 0.026.