Linyu Liao, Don Yao.
Abstract
Differential Item Functioning (DIF) analysis is an indispensable methodology for detecting item and test bias in language testing. This study investigated grade-related DIF in the listening section of the General English Proficiency Test-Kids (GEPT-Kids). Quantitative data were test scores collected from 791 test takers (Grade 5 = 398; Grade 6 = 393) in eight Chinese-speaking cities, and qualitative data were expert judgments collected from two primary school English teachers in Guangdong province. Two R packages, "difR" and "difNLR", were used to run five types of DIF analysis on the test scores: two-parameter item response theory (2PL IRT)-based Lord's chi-square and Raju's area tests, and the Mantel-Haenszel (MH), logistic regression (LR), and nonlinear regression (NLR) DIF methods. Together, these methods identified 16 DIF items. The ShinyItemAnalysis package was used in RStudio to draw item characteristic curves (ICCs) for the 16 items, which displayed four different types of DIF effect. In addition, the two experts identified reasons or sources for the DIF effect of four items. The study may therefore shed some light on the sustainable development of test fairness in language testing. Methodologically, it adopted a mixed-methods sequential explanatory design that can guide further test fairness research using flexible methods suited to the research purpose. Practically, the results indicate that a DIF flag does not necessarily imply bias; it only serves as an alarm that calls test developers' attention to further examine the appropriateness of test items.
Keywords: DIF; GEPT-Kids; grade; listening; mixed-methods approach
Year: 2021 PMID: 34899514 PMCID: PMC8656356 DOI: 10.3389/fpsyg.2021.767244
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
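The abstract lists Mantel-Haenszel (MH) among the five DIF methods. As a rough illustration of what that statistic computes, here is a minimal from-scratch sketch of the standard MH procedure for one item: test takers are stratified on a matching score, a 2 × 2 table (group × right/wrong) is formed in each stratum, and the tables are pooled into a common odds ratio, a continuity-corrected chi-square, and the ETS delta (−2.35 ln α). This is not the difR implementation; the function name and the 0 = reference / 1 = focal coding are assumptions made for illustration.

```python
import math
import numpy as np


def mantel_haenszel_dif(item, group, score):
    """Mantel-Haenszel DIF statistics for one dichotomously scored item.

    item  : 0/1 responses to the studied item
    group : 0 = reference group, 1 = focal group (illustrative coding)
    score : matching variable used to form strata, e.g. total test score
    Returns (alpha_MH, MH chi-square with continuity correction, ETS delta).
    Assumes at least one stratum contains both groups.
    """
    item, group, score = (np.asarray(v) for v in (item, group, score))
    num = den = 0.0              # pieces of the common odds ratio
    obs = expected = var = 0.0   # pieces of the MH chi-square
    for k in np.unique(score):
        in_k = score == k
        ref = item[in_k & (group == 0)]
        foc = item[in_k & (group == 1)]
        if ref.size == 0 or foc.size == 0:
            continue  # a stratum with only one group carries no information
        a, b = np.sum(ref == 1), np.sum(ref == 0)   # reference: right / wrong
        c, d = np.sum(foc == 1), np.sum(foc == 0)   # focal: right / wrong
        n = a + b + c + d
        m1, m0 = a + c, b + d                       # right / wrong totals
        num += a * d / n
        den += b * c / n
        obs += a
        expected += ref.size * m1 / n
        var += ref.size * foc.size * m1 * m0 / (n * n * (n - 1))
    alpha = num / den                    # > 1: item favors the reference group
    chi2 = max(0.0, abs(obs - expected) - 0.5) ** 2 / var
    delta = -2.35 * math.log(alpha)      # ETS delta scale; negative favors reference
    return alpha, chi2, delta
```

In a grade-related analysis, one grade would be designated the reference group and the other the focal group, with strata usually formed from the total listening score.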
DIF studies in language testing (1980–2017).
| Author(s) and Year of Study | Specific Focus/Foci |
|---|---|
| | L1 language |
| | L1 language |
| | Test method |
| | Academic major |
| | L1 language |
| | Gender and minorities |
| | Major field and test content |
| | L1 language |
| | L1 language and gender |
| | L1 language |
| | Question type and listening |
| | Gender |
| | Tape-mediated test |
| | L1 language and ethnicity |
| | Text content |
| | L1 language |
| | L1 language |
| | Different Englishes |
| | L1 language |
| | Academic major and gender |
| | Gender |
| | L1 language |
| | L1 language |
| | L1 language |
| | Age |
| | L1 language |
| | L1 language |
| | Gender |
| | Age |
| | L1 language |
| | Age |
| | Gender and socioeconomic status |
Items without an asterisk are cited from
Descriptive analysis of test performance data.
| Test items | Mean (Grade 5) | Mean (Grade 6) | Corrected item-total correlation (Grade 5) | Corrected item-total correlation (Grade 6) |
|---|---|---|---|---|
| L1 | 0.82 | 0.86 | 0.353 | 0.251 |
| L2 | 0.90 | 0.89 | 0.373 | 0.472 |
| L3 | 0.94 | 0.97 | 0.378 | 0.258 |
| L4 | 0.95 | 0.93 | 0.240 | 0.420 |
| L5 | 0.96 | 0.96 | 0.313 | 0.277 |
| L6 | 0.84 | 0.84 | 0.549 | 0.655 |
| L7 | 0.83 | 0.86 | 0.426 | 0.516 |
| L8 | 0.81 | 0.87 | 0.454 | 0.339 |
| L9 | 0.98 | 0.98 | 0.243 | 0.305 |
| L10 | 0.87 | 0.93 | 0.425 | 0.384 |
| L11 | 0.88 | 0.91 | 0.411 | 0.424 |
| L12 | 0.98 | 0.98 | 0.330 | 0.436 |
| L13 | 0.94 | 0.97 | 0.525 | 0.428 |
| L14 | 0.96 | 0.97 | 0.360 | 0.404 |
| L15 | 0.94 | 0.93 | 0.500 | 0.410 |
| L16 | 0.97 | 0.97 | 0.396 | 0.406 |
| L17 | 0.92 | 0.92 | 0.526 | 0.518 |
| L18 | 0.79 | 0.86 | 0.385 | 0.433 |
| L19 | 0.89 | 0.88 | 0.436 | 0.509 |
| L20 | 0.85 | 0.93 | 0.562 | 0.522 |
| L21 | 0.52 | 0.65 | 0.332 | 0.519 |
| L22 | 0.65 | 0.74 | 0.618 | 0.493 |
| L23 | 0.71 | 0.80 | 0.543 | 0.613 |
| L24 | 0.79 | 0.87 | 0.372 | 0.418 |
| L25 | 0.62 | 0.76 | 0.510 | 0.635 |
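The table reports, separately for each grade, two classical item statistics. Assuming items are scored 0/1 (as the 2PL analyses imply), the item mean is the proportion of test takers answering correctly, and the corrected item-total correlation is the correlation between an item and the total score computed without that item. A minimal sketch of both, assuming a response matrix with one row per test taker; the function name and the toy matrix are hypothetical, not the study's data.

```python
import numpy as np


def item_statistics(responses):
    """Item means and corrected item-total correlations for a 0/1 response matrix.

    responses : array of shape (n_test_takers, n_items)
    Returns (means, corrected_r). The item is removed from the total before
    correlating, so it is not correlated with itself. A column with no
    variance yields nan (np.corrcoef emits a RuntimeWarning in that case).
    """
    X = np.asarray(responses, dtype=float)
    means = X.mean(axis=0)                 # proportion correct for 0/1 items
    total = X.sum(axis=1)
    corrected_r = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        rest = total - X[:, j]             # total score without item j
        corrected_r[j] = np.corrcoef(X[:, j], rest)[0, 1]
    return means, corrected_r


# Hypothetical toy data: 4 test takers, 3 items.
toy = [[1, 1, 1],
       [1, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(item_statistics(toy))
```

To mirror the table, the function would be run once on the Grade 5 response matrix and once on the Grade 6 matrix.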
DIF items flagged by different methods.
| Method(s) | Flagged items | No. of flagged items |
|---|---|---|
| 2PL IRT based Lord’s chi-square test | L15, L17, L19, L22 | 4 |
| 2PL IRT based Raju’s area test | L6, L15, L17, L21, L22 | 5 |
| MH test | L4, L19, L20, L25 | 4 |
| LR test | L4, L6, L15, L19, L20, L21, L22, L25 | 8 |
| NLR test | L2, L3, L4, L6, L7, L9, L13, L15, L16, L17, L19, L20, L23, L25 | 14 |
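The 16 DIF items reported in the abstract are the union of the five flagged lists above. The short sketch below recomputes that union from the table and also counts how many methods flag each item; item numbers are transcribed from the table with the "L" prefix dropped.

```python
from collections import Counter

# Item numbers flagged by each method, transcribed from the table above.
flagged = {
    "Lord's chi-square (2PL)": {15, 17, 19, 22},
    "Raju's area (2PL)": {6, 15, 17, 21, 22},
    "MH": {4, 19, 20, 25},
    "LR": {4, 6, 15, 19, 20, 21, 22, 25},
    "NLR": {2, 3, 4, 6, 7, 9, 13, 15, 16, 17, 19, 20, 23, 25},
}

# Items flagged by at least one method.
union = set().union(*flagged.values())

# Number of methods that flag each item.
votes = Counter(item for items in flagged.values() for item in items)

print(len(union))  # 16
print(sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])))
```

The counts show that agreement across methods is limited: items L15 and L19 are flagged by four of the five methods, no item is flagged by all five, and seven items (L2, L3, L7, L9, L13, L16, L23) are flagged by NLR alone.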
Types of DIF effect.
| DIF type | Direction | Items | No. of flagged items |
|---|---|---|---|
| Little effect | | L6, L7, L16, L17 | 4 |
| Uniform DIF | Favoring Grade 6 | L3, L13, L20, L22, L23, L25 | 6 |
| Uniform DIF | Favoring Grade 5 | L2, L4, L9, L19 | 4 |
| Non-uniform DIF | | L15, L21 | 2 |
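The table separates uniform DIF (one grade is favored by a roughly constant amount across ability levels) from non-uniform DIF (the size or direction of the group difference depends on ability). A common way to test the two is the logistic regression DIF framework with three nested models: matching variable only; plus a group main effect (uniform DIF); plus a group × matching interaction (non-uniform DIF). The sketch below implements that idea with numpy and likelihood-ratio tests on simulated data. It is not difR's code; the function names, the 0/1 group coding, and the use of a simulated ability value as the matching variable (in practice usually the total score) are all assumptions for illustration.

```python
import math
import numpy as np


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns the coefficient vector and the log-likelihood at the optimum.
    No safeguard against perfect separation (acceptable for this illustration).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def lr_dif(item, group, match):
    """Likelihood-ratio tests for uniform and non-uniform DIF on one 0/1 item."""
    y = np.asarray(item, dtype=float)
    g = np.asarray(group, dtype=float)
    s = np.asarray(match, dtype=float)
    ones = np.ones_like(s)
    _, ll0 = fit_logistic(np.column_stack([ones, s]), y)              # matching only
    b1, ll1 = fit_logistic(np.column_stack([ones, s, g]), y)          # + group
    _, ll2 = fit_logistic(np.column_stack([ones, s, g, s * g]), y)    # + interaction
    return {
        "uniform_p": chi2_sf_df1(2.0 * (ll1 - ll0)),      # group main effect
        "nonuniform_p": chi2_sf_df1(2.0 * (ll2 - ll1)),   # group x matching interaction
        "group_coef": float(b1[2]),                       # sign shows the favored group
    }


# Demo on simulated data; every value here is hypothetical.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                # 0/1 group labels
ability = rng.normal(size=n)                 # stands in for the matching score
logit = 1.2 * ability - 0.3 + 0.8 * group    # constant shift = uniform DIF
item = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
print(lr_dif(item, group, ability))
```

If group were coded 1 for Grade 6, a small `uniform_p` together with a positive `group_coef` would correspond to uniform DIF favoring Grade 6, while a small `nonuniform_p` would point to the non-uniform pattern.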
FIGURE 1. ICC of listening item L16.
FIGURE 2. ICC of listening item L3.
FIGURE 3. ICC of listening item L2.
FIGURE 4. ICC of listening item L21.
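The four figures show one ICC per DIF type. Under the 2PL model, an ICC is P(θ) = 1 / (1 + exp(−a(θ − b))), with discrimination a and difficulty b. When the two grades share the same a, their curves never cross and only the difficulty shifts (uniform DIF); when the slopes differ, the curves cross (non-uniform DIF). Raju's approach measures DIF as the area between the reference and focal ICCs. The sketch below computes signed and unsigned areas numerically and finds the crossing point. It is an independent Python illustration, not the ShinyItemAnalysis or difR code, and the parameter values are hypothetical rather than estimates from the study.

```python
import math
import numpy as np


def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct | theta) = 1 / (1 + exp(-a(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def area_between_iccs(a_ref, b_ref, a_foc, b_foc, lo=-20.0, hi=20.0, n=200001):
    """Signed and unsigned areas between reference and focal ICCs (trapezoidal rule)."""
    theta = np.linspace(lo, hi, n)
    diff = icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc)
    h = theta[1] - theta[0]

    def trapezoid(y):
        return float(h * (y.sum() - 0.5 * (y[0] + y[-1])))

    return trapezoid(diff), trapezoid(np.abs(diff))


def crossing_point(a_ref, b_ref, a_foc, b_foc):
    """Ability at which the two ICCs intersect, or None if the slopes are equal."""
    if math.isclose(a_ref, a_foc):
        return None  # equal slopes: curves never cross (uniform DIF) or coincide
    return (a_ref * b_ref - a_foc * b_foc) / (a_ref - a_foc)


# Hypothetical parameters: uniform DIF, then non-uniform DIF.
print(area_between_iccs(1.2, 0.0, 1.2, 0.5), crossing_point(1.2, 0.0, 1.2, 0.5))
print(area_between_iccs(1.5, 0.0, 0.7, 0.0), crossing_point(1.5, 0.0, 0.7, 0.0))
```

In the first call, both areas come out at about 0.5, the difficulty difference. In the second call, the curves cross at θ = 0, so the signed area is close to 0 while the unsigned area is about 1.06. This is why the unsigned area is the more informative index when DIF is non-uniform.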