Wei Wang, Fritz Drasgow, Liwen Liu.
Abstract
Mixed format tests (e.g., a test consisting of multiple-choice [MC] items and constructed response [CR] items) have become increasingly popular. However, the latent structure of item pools consisting of the two formats is still equivocal. Moreover, the implications of this latent structure are unclear: for example, do constructed response items tap reasoning skills that cannot be assessed with multiple-choice items? This study explored the dimensionality of mixed format tests by applying bi-factor models to 10 tests of various subjects from the College Board's Advanced Placement (AP) Program and compared the accuracy of scores based on the bi-factor analysis with scores derived from a unidimensional analysis. More importantly, this study focused on a practical and important question: the classification accuracy of the overall grade on a mixed format test. Our findings revealed that the degree of multidimensionality resulting from the mixed item format varied from subject to subject, depending on the disattenuated correlation between scores from the MC and CR subtests. Moreover, remarkably small decrements in classification accuracy were found for the unidimensional analysis when the disattenuated correlations exceeded 0.90.
Keywords: bi-factor model; classification accuracy; constructed response items; item response theory; mixed format test
Year: 2016 PMID: 26973568 PMCID: PMC4770050 DOI: 10.3389/fpsyg.2016.00270
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Examples of factor loading matrices for between-item multidimensionality models (left) and bi-factor within-item multidimensionality models (right). The factors in between-item multidimensional models may be correlated, whereas all factors in the bi-factor model are uncorrelated.
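The two loading structures contrasted in Figure 1 can be illustrated with a short sketch. It is not taken from the study: the function names and the group sizes are hypothetical, and the matrices only mark which loadings are free (1) or fixed to zero (0), not their estimated values. The bi-factor layout here assumes one specific factor per item group in addition to the general factor.

```python
def between_item_pattern(items_per_factor):
    """Return a 0/1 loading pattern in which each item loads on exactly one factor.

    items_per_factor: list with the number of items assigned to each factor
    (hypothetical input, e.g. [3, 2] for 3 MC items and 2 CR items).
    """
    n_factors = len(items_per_factor)
    pattern = []
    for factor, n_items in enumerate(items_per_factor):
        for _ in range(n_items):
            row = [0] * n_factors
            row[factor] = 1  # the only free loading for this item
            pattern.append(row)
    return pattern


def bifactor_pattern(items_per_group):
    """Return a 0/1 bi-factor pattern: column 0 is the general factor,
    the remaining columns are one specific factor per item group."""
    return [[1] + row for row in between_item_pattern(items_per_group)]


if __name__ == "__main__":
    for row in between_item_pattern([3, 2]):
        print(row)
    print()
    for row in bifactor_pattern([3, 2]):
        print(row)
```

In the bi-factor pattern every item loads on the general factor and on exactly one specific factor, which is what makes the multidimensionality "within-item"; in the between-item pattern each item loads on a single factor only.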
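Classification accuracy of unidimensional and bi-factor approaches for 10 Advanced Placement tests.

| Test | Year | Disattenuated correlation | CR items | MC items | Sample size | Unidimensional accuracy (%) | Bi-factor accuracy (%) | Difference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |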
| English Literature | 2010 | 0.778 | 3 | 55 | 20,000 | 60.90 | 65.56 | 4.65% |
| English Literature | 2009 | 0.77 | 3 | 55 | 20,000 | 63.04 | 67.17 | 4.13% |
| English Language | 2009 | 0.81 | 3 | 55 | 20,000 | 63.62 | 67.56 | 3.94% |
| English Language | 2010 | 0.807 | 3 | 54 | 20,000 | 63.37 | 67.15 | 3.78% |
| European History | 2009 | 0.92 | 7 | 80 | 20,000 | 68.77 | 70.33 | 1.56% |
| World History | 2009 | 0.89 | 3 | 70 | 20,000 | 69.70 | 70.75 | 1.05% |
| US History | 2010 | 0.908 | 3 | 80 | 6936 | 69.32 | 70.24 | 0.92% |
| European History | 2008 | 0.89 | 7 | 80 | 20,000 | 68.14 | 69.06 | 0.92% |
| World History | 2008 | 0.91 | 3 | 70 | 20,000 | 70.92 | 70.07 | 0.85% |
| Physics B | 2008 | 0.96 | 7 | 70 | 11,941 | 76.73 | 77.06 | 0.33% |
The disattenuated correlations are the estimated true-score correlations between the subtest scores for the MC and CR items; they were provided by the College Board.
The number of CR items included in the tests.
The number of MC items included in the tests. To maximize the comparability of results, we used 3 CR items and 55 MC items for all 10 tests.
The US History 2010 data set had a sample size of 20,000. However, only 6,936 students chose the same three CR items (i.e., items #1, #3, and #4).
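The disattenuated correlations in the table were supplied by the College Board, and the source does not say how they were computed. As a hedged illustration only, the classical (Spearman) correction for attenuation divides the observed correlation by the square root of the product of the two subtest reliabilities. The function name and the numbers below are hypothetical and are not data from the study.

```python
import math


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y).

    r_xy  : observed correlation between the MC and CR subtest scores
    rel_x : reliability of the MC subtest score (0 < rel_x <= 1)
    rel_y : reliability of the CR subtest score (0 < rel_y <= 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Hypothetical values chosen for illustration, not taken from the AP data.
    estimate = disattenuated_correlation(r_xy=0.72, rel_x=0.90, rel_y=0.80)
    print(round(estimate, 3))  # 0.849
```

Because the correction divides by a quantity no larger than 1, the disattenuated value is never smaller in magnitude than the observed correlation, and with noisy reliability estimates it can exceed 1.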
Figure 2. Analysis procedures for this study. The numbers in parentheses in each box represent the procedural steps described in the Method section. The software MultiNorm was provided by Edwards (Version 1.0; Edwards, 2010b). The software BifactorWise was written by the authors and is available for free from the first author upon request. It estimates bi-factor thetas based on the response data and bi-factor item parameters.