Eva A O Zijlmans1, Jesper Tijmstra1, L Andries van der Ark2, Klaas Sijtsma1.
Abstract
This study investigates the usefulness of item-score reliability as a criterion for item selection in test construction. Methods MS, λ6, and CA were investigated as item-assessment methods in item selection and compared to the corrected item-total correlation, which was used as a benchmark. An ideal ordering to add items to the test (bottom-up procedure) or omit items from the test (top-down procedure) was defined based on the population test-score reliability. The orderings the four item-assessment methods produced in samples were compared to the ideal ordering, and the degree of resemblance was expressed by means of Kendall's τ. To investigate the concordance of the orderings across 1,000 replicated samples, Kendall's W was computed for each item-assessment method. The results showed that for both the bottom-up and the top-down procedures, item-assessment method CA and the corrected item-total correlation most closely resembled the ideal ordering. Generally, all item-assessment methods resembled the ideal ordering better, and concordance of the orderings was greater, for larger sample sizes and for greater variance of the item discrimination parameters.
Keywords: corrected item-total correlation; correction for attenuation; item selection in test construction; item-score reliability; method CA; method MS; method λ6
Year: 2019 PMID: 30687144 PMCID: PMC6336834 DOI: 10.3389/fpsyg.2018.02298
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
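The corrected item-total correlation serves as the benchmark in the abstract. As a rough illustration only, it can be computed as the correlation between an item score and the rest score (the total score minus that item). The function names and the small data set below are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(scores):
    """scores: one list of item scores per respondent.

    Returns, for each item, the correlation between the item score
    and the rest score (total score with that item left out).
    """
    n_items = len(scores[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        result.append(pearson(item, rest))
    return result

# Hypothetical dichotomous scores: 6 respondents, 4 items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
citc = corrected_item_total(scores)
# Items sorted from highest to lowest corrected item-total correlation.
ordering = sorted(range(len(citc)), key=lambda i: citc[i], reverse=True)
```

Sorting the items by this value gives one possible sample-based ordering that could be compared with an ideal ordering.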
Example item-selection procedure following the bottom-up procedure based on the test-score reliability .
| ⋯ | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 1 | 0.462 | 20 | 1 | 0.556 | ⋯ | 20 | 1 | 0.807 | 20 | 1 | 0.809 |
| 19 | 2 | 0.467 | 19 | 2 | 0.560 | ⋯ | 19 | 2 | 0.808 | 19 | ||
| 3 | 0.473 | 18 | 3 | 0.564 | ⋯ | 18 | 18 | |||||
| 4 | 0.479 | 4 | 0.568 | ⋯ | 17 | 17 | ||||||
| 5 | 0.484 | 5 | 0.573 | ⋯ | 16 | 16 | ||||||
| 6 | 0.491 | 6 | 0.577 | ⋯ | 15 | 15 | ||||||
| 7 | 0.497 | 7 | 0.582 | ⋯ | 14 | 14 | ||||||
| 8 | 0.503 | 8 | 0.586 | ⋯ | 13 | 13 | ||||||
| 9 | 0.510 | 9 | 0.591 | ⋯ | 12 | 12 | ||||||
| 10 | 0.516 | 10 | 0.595 | ⋯ | 11 | 11 | ||||||
| 11 | 0.523 | 11 | 0.600 | ⋯ | 10 | 10 | ||||||
| 12 | 0.530 | 12 | 0.605 | ⋯ | 9 | 9 | ||||||
| 13 | 0.536 | 13 | 0.610 | ⋯ | 8 | 8 | ||||||
| 14 | 0.543 | 14 | 0.615 | ⋯ | 7 | 7 | ||||||
| 15 | 0.550 | 15 | 0.620 | ⋯ | 6 | 6 | ||||||
| 16 | 0.557 | 16 | 0.625 | ⋯ | 5 | 5 | ||||||
| 17 | 0.564 | ⋯ | 4 | 4 | ||||||||
| ⋯ | 3 | |||||||||||
The example is based on the condition with small variance of discrimination parameters. The .
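The bottom-up procedure can be sketched as a greedy search: start from a small item set and, at each step, add the item that yields the highest test-score reliability. The sketch below rests on several assumptions that are not confirmed by the source: coefficient alpha computed from a covariance matrix stands in for the population test-score reliability, the search starts from the best item pair, and the one-factor covariance matrix with loadings 0.4 to 1.0 is invented for illustration. All function names are hypothetical.

```python
from itertools import combinations

def coefficient_alpha(cov, items):
    """Coefficient alpha for a subset of items (stand-in reliability measure)."""
    k = len(items)
    sum_var = sum(cov[i][i] for i in items)
    total_var = sum(cov[i][j] for i in items for j in items)
    return k / (k - 1) * (1 - sum_var / total_var)

def bottom_up_order(cov, reliability=coefficient_alpha):
    """Greedy bottom-up selection; returns (added item(s), reliability) per step."""
    n = len(cov)
    # Assumption: start with the item pair that has the highest reliability.
    pairs = combinations(range(n), 2)
    selected = list(max(pairs, key=lambda pair: reliability(cov, pair)))
    steps = [(tuple(selected), reliability(cov, selected))]
    remaining = [i for i in range(n) if i not in selected]
    while remaining:
        # Add the item that maximizes the reliability of the enlarged test.
        best = max(remaining, key=lambda i: reliability(cov, selected + [i]))
        selected.append(best)
        remaining.remove(best)
        steps.append((best, reliability(cov, selected)))
    return steps

# Hypothetical one-factor covariance matrix: loading products off the
# diagonal, loading squared plus a unique variance of 1 on the diagonal.
loadings = [0.4, 0.6, 0.8, 1.0]
cov = [[a * b + (1.0 if i == j else 0.0) for j, b in enumerate(loadings)]
       for i, a in enumerate(loadings)]

steps = bottom_up_order(cov)
for added, rel in steps:
    print(added, round(rel, 3))
```

In this toy example the items enter in order of decreasing loading, so the most discriminating items are selected first.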
Example item-selection procedure following the top-down procedure based on the test-score reliability .
| ⋯ | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ⋯ | 1 | 0.571 | 1 | ||||||||
| 2 | 0.809 | 3 | 0.807 | ⋯ | 2 | 18 | 0.564 | 2 | 19 | 0.47 | ||
| 3 | 0.809 | 4 | 0.807 | ⋯ | 3 | 19 | 0.557 | 3 | 20 | 0.46 | ||
| 4 | 0.808 | 5 | 0.806 | ⋯ | 4 | 20 | 0.550 | 4 | ||||
| 5 | 0.808 | 6 | 0.806 | ⋯ | 5 | 5 | ||||||
| 6 | 0.807 | 7 | 0.805 | ⋯ | 6 | 6 | ||||||
| 7 | 0.806 | 8 | 0.804 | ⋯ | 7 | 7 | ||||||
| 8 | 0.806 | 9 | 0.804 | ⋯ | 8 | 8 | ||||||
| 9 | 0.805 | 10 | 0.803 | ⋯ | 9 | 9 | ||||||
| 10 | 0.804 | 11 | 0.802 | ⋯ | 10 | 10 | ||||||
| 11 | 0.804 | 12 | 0.801 | ⋯ | 11 | 11 | ||||||
| 12 | 0.803 | 13 | 0.800 | ⋯ | 12 | 12 | ||||||
| 13 | 0.802 | 14 | 0.799 | ⋯ | 13 | 13 | ||||||
| 14 | 0.801 | 15 | 0.799 | ⋯ | 14 | 14 | ||||||
| 15 | 0.800 | 16 | 0.798 | ⋯ | 15 | 15 | ||||||
| 16 | 0.800 | 17 | 0.797 | ⋯ | 16 | 16 | ||||||
| 17 | 0.799 | 18 | 0.796 | ⋯ | 17 | |||||||
| 18 | 0.798 | 19 | 0.795 | ⋯ | ||||||||
| 19 | 0.797 | 20 | 0.794 | ⋯ | ||||||||
| 20 | 0.796 | ⋯ | ||||||||||
The example is based on the condition with small variance of discrimination parameters. The .
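The top-down procedure runs in the opposite direction: start with all items and, at each step, omit the item whose removal leaves the highest test-score reliability. As a hedged sketch, coefficient alpha again stands in for the population test-score reliability, the search stops at two items because alpha needs at least two, and the covariance matrix and function names are hypothetical.

```python
def coefficient_alpha(cov, items):
    """Coefficient alpha for a subset of items (stand-in reliability measure)."""
    k = len(items)
    sum_var = sum(cov[i][i] for i in items)
    total_var = sum(cov[i][j] for i in items for j in items)
    return k / (k - 1) * (1 - sum_var / total_var)

def top_down_order(cov, reliability=coefficient_alpha, min_items=2):
    """Greedy top-down selection; returns (removed item, reliability) per step."""
    selected = list(range(len(cov)))
    steps = []
    while len(selected) > min_items:
        # Remove the item whose omission leaves the most reliable test.
        worst = max(
            selected,
            key=lambda i: reliability(cov, [j for j in selected if j != i]),
        )
        selected.remove(worst)
        steps.append((worst, reliability(cov, selected)))
    return steps

# Hypothetical one-factor covariance matrix (same construction as a toy
# example: loading products plus a unique variance of 1 on the diagonal).
loadings = [0.4, 0.6, 0.8, 1.0]
cov = [[a * b + (1.0 if i == j else 0.0) for j, b in enumerate(loadings)]
       for i, a in enumerate(loadings)]

removed = [item for item, _ in top_down_order(cov)]
print(removed)
```

Here the items with the lowest loadings are removed first.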
Item Parameters used to Generate the Item Scores.
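| Item | α (log range -0.5 to 0.5) | α (log range -1 to 1) | α (log range -2 to 2) | β |
|---|---|---|---|---|
| Item 1 | 0.61 | 0.37 | 0.14 | 0 |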
| Item 2 | 0.64 | 0.41 | 0.17 | 0 |
| Item 3 | 0.67 | 0.45 | 0.21 | 0 |
| Item 4 | 0.71 | 0.50 | 0.25 | 0 |
| Item 5 | 0.75 | 0.56 | 0.31 | 0 |
| Item 6 | 0.79 | 0.62 | 0.39 | 0 |
| Item 7 | 0.83 | 0.69 | 0.48 | 0 |
| Item 8 | 0.88 | 0.77 | 0.59 | 0 |
| Item 9 | 0.92 | 0.85 | 0.73 | 0 |
| Item 10 | 0.97 | 0.95 | 0.90 | 0 |
| Item 11 | 1.03 | 1.05 | 1.11 | 0 |
| Item 12 | 1.08 | 1.17 | 1.37 | 0 |
| Item 13 | 1.14 | 1.30 | 1.69 | 0 |
| Item 14 | 1.20 | 1.45 | 2.09 | 0 |
| Item 15 | 1.27 | 1.61 | 2.58 | 0 |
| Item 16 | 1.34 | 1.78 | 3.18 | 0 |
| Item 17 | 1.41 | 1.98 | 3.93 | 0 |
| Item 18 | 1.48 | 2.20 | 4.85 | 0 |
| Item 19 | 1.56 | 2.45 | 5.99 | 0 |
| Item 20 | 1.65 | 2.72 | 7.39 | 0 |
α = discrimination parameter, β = location parameter. The sets of discrimination parameters had the same mean and contained equidistantly spaced values, ranging from -0.5 to 0.5, -1 to 1, and -2 to 2 on the log scale, respectively.
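The note above implies that each set of discrimination parameters is obtained by exponentiating 20 equally spaced values on the log scale. A minimal sketch, assuming natural logarithms (which reproduces the tabled values to two decimals); the function name `discrimination_parameters` is hypothetical:

```python
from math import exp

def discrimination_parameters(half_range, n_items=20):
    """Exponentiate n_items equally spaced values on [-half_range, half_range]."""
    step = 2 * half_range / (n_items - 1)
    return [exp(-half_range + k * step) for k in range(n_items)]

# One set per variance condition, keyed by the half-range on the log scale.
alpha_sets = {hr: discrimination_parameters(hr) for hr in (0.5, 1.0, 2.0)}

for hr, values in alpha_sets.items():
    print(hr, [round(v, 2) for v in values[:3]], "...", round(values[-1], 2))
```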
Mean Kendall's τ for 1,000 replications between the ordering based on the population test-score reliability and the ordering produced by the three item-score reliability methods and the corrected item-total correlation (CITC), for the bottom-up and the top-down procedure in the six different conditions.
| Method MS | 0.44 (0.13) | 0.67 (0.08) | 0.80 (0.06) | 0.69 (0.08) | 0.83 (0.05) | 0.89 (0.04) |
| Method λ6 | 0.55 (0.11) | 0.75 (0.07) | 0.83 (0.05) | 0.80 (0.05) | 0.91 (0.03) | 0.94 (0.03) |
| Method CA | 0.58 (0.10) | 0.78 (0.06) | 0.87 (0.05) | 0.81 (0.05) | 0.92 (0.03) | 0.96 (0.02) |
| CITC | 0.58 (0.10) | 0.78 (0.06) | 0.87 (0.04) | 0.81 (0.05) | 0.92 (0.03) | 0.96 (0.02) |
| Method MS | 0.46 (0.14) | 0.64 (0.10) | 0.75 (0.08) | 0.61 (0.11) | 0.73 (0.09) | 0.80 (0.08) |
| Method λ6 | 0.55 (0.10) | 0.75 (0.07) | 0.83 (0.06) | 0.81 (0.05) | 0.91 (0.03) | 0.94 (0.03) |
| Method CA | 0.59 (0.10) | 0.78 (0.06) | 0.87 (0.05) | 0.81 (0.05) | 0.92 (0.03) | 0.96 (0.02) |
| CITC | 0.59 (0.10) | 0.78 (0.06) | 0.87 (0.04) | 0.81 (0.05) | 0.92 (0.03) | 0.96 (0.02) |
Standard deviations are in parentheses.
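Kendall's τ expresses how closely a sample-based item ordering resembles the ideal ordering. The sketch below computes τ for two orderings of the same items without ties, as the number of concordant minus discordant item pairs divided by the total number of pairs. The function name and the example orderings are hypothetical.

```python
def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items (no ties)."""
    pos_a = {item: rank for rank, item in enumerate(order_a)}
    pos_b = {item: rank for rank, item in enumerate(order_b)}
    items = list(order_a)
    n = len(items)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff_a = pos_a[items[i]] - pos_a[items[j]]
            diff_b = pos_b[items[i]] - pos_b[items[j]]
            # A pair is concordant when both orderings rank it the same way.
            if diff_a * diff_b > 0:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

ideal = [20, 19, 18, 17, 16]    # hypothetical ideal ordering of five items
sample = [20, 18, 19, 17, 16]   # same ordering with items 19 and 18 swapped
tau = kendall_tau(ideal, sample)
print(tau)  # 0.8: one discordant pair out of ten
```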
Kendall's W for 1,000 replications between the ordering based on the population test-score reliability and the ordering produced by the three item-score reliability methods and the corrected item-total correlation (CITC), for the bottom-up and the top-down procedure in the six different conditions.
| Method MS | 0.53 | 0.79 | 0.90 | 0.80 | 0.92 | 0.96 |
| Method λ6 | 0.65 | 0.87 | 0.92 | 0.91 | 0.97 | 0.98 |
| Method CA | 0.69 | 0.89 | 0.95 | 0.91 | 0.97 | 0.99 |
| CITC | 0.69 | 0.89 | 0.95 | 0.91 | 0.97 | 0.99 |
| Method MS | 0.37 | 0.65 | 0.80 | 0.61 | 0.77 | 0.85 |
| Method λ6 | 0.53 | 0.82 | 0.89 | 0.88 | 0.96 | 0.98 |
| Method CA | 0.59 | 0.85 | 0.93 | 0.88 | 0.96 | 0.98 |
| CITC | 0.59 | 0.85 | 0.93 | 0.88 | 0.96 | 0.98 |
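Kendall's W summarizes the concordance of the orderings across replications: 1 means every replication produced the same ordering, 0 means no agreement. A minimal sketch for rankings without ties uses W = 12S / (m²(n³ − n)), where m is the number of rankings, n the number of items, and S the sum of squared deviations of the items' rank sums from their mean. The function name and the three example rankings are hypothetical.

```python
def kendall_w(rankings):
    """Kendall's W for m rankings of the same n items (ranks 1..n, no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    # Sum of the ranks each item received across the m rankings.
    rank_sums = [sum(ranking[i] for ranking in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three hypothetical replications ranking four items.
rankings = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 2, 4, 3],
]
w = kendall_w(rankings)
print(round(w, 3))  # 0.822
```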
Figure 1. Range of -values between the 2.5 and 97.5 percentiles of 1,000 values produced by method CA in the six conditions for the top-down procedure. The black line indicates the -value for the ideal ordering.
Figure 2. Range of -values between the 2.5 and 97.5 percentiles of 1,000 values produced by method MS in the six conditions for the top-down procedure. The black line indicates the -value for the ideal ordering.