David Hope, Karen Adamson, I C McManus, Liliana Chis, Andrew Elder.
Abstract
BACKGROUND: Fairness is a critical component of defensible assessment. Candidates should perform according to ability, without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed.
Keywords: Assessment; Bias; Fairness
Year: 2018 PMID: 29615016 PMCID: PMC5883583 DOI: 10.1186/s12909-018-1143-0
Source DB: PubMed Journal: BMC Med Educ ISSN: 1472-6920 Impact factor: 2.463
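The item-level screening the abstract describes is commonly carried out with the Mantel–Haenszel procedure, which the DIF literature pairs with the ETS delta scale for effect sizes. A minimal sketch, assuming dichotomous items, candidates matched on total score, and the negligible/medium/large thresholds used in the table below; the function and variable names are ours, not the paper's:

```python
import math
from collections import defaultdict

def mantel_haenszel_delta(scores, groups, responses):
    """Mantel-Haenszel DIF statistic for one dichotomous item.

    scores    -- matching criterion per candidate (e.g. total exam score)
    groups    -- 'ref' or 'focal' per candidate
    responses -- 1 (correct) or 0 (incorrect) on the item per candidate
    Returns (delta, label): the ETS delta (-2.35 * ln of the common odds
    ratio; negative when the reference group is advantaged) and an
    effect-size label on the negligible/medium/large convention.
    """
    # One 2x2 table (ref correct, ref wrong, focal correct, focal wrong)
    # per matching-score level.
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for s, g, r in zip(scores, groups, responses):
        cell = strata[s]
        offset = 0 if g == 'ref' else 2
        cell[offset + (0 if r else 1)] += 1

    # Mantel-Haenszel common odds ratio across score strata.
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    delta = -2.35 * math.log(num / den)

    if abs(delta) < 1.0:
        label = 'negligible'
    elif abs(delta) < 1.5:
        label = 'medium'
    else:
        label = 'large'
    return delta, label
```

On data with no group difference within any score band, the common odds ratio is 1 and delta is 0; items flagged as medium or large would then be reviewed, revised, or removed, as the abstract describes.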
Fig. 1 A comparison of Differential Item Functioning curves. Note: “Score” is the overall performance on the examination. “Probability” is the likelihood of the examinee answering this question correctly. From left to right, the plots show no DIF, uniform DIF, non-uniform DIF and crossing DIF. Where there are two curves, the gap between the curves shows the DIF effect.
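The four patterns in this figure can be reproduced from a two-parameter logistic item characteristic curve: uniform DIF shifts the difficulty between groups, non-uniform DIF changes the slope, and crossing DIF lets the gap change sign within the observed score range. A minimal sketch under those assumptions (the parameterisation and function names are ours, not the paper's):

```python
import math

def icc(score, difficulty, slope):
    """Two-parameter logistic item characteristic curve: probability of
    answering the item correctly as a function of overall score."""
    return 1.0 / (1.0 + math.exp(-slope * (score - difficulty)))

def classify_dif(ref_params, focal_params, score_range, tol=1e-9):
    """Label the DIF pattern of two ICCs over the observed score range.

    ref_params / focal_params -- (difficulty, slope) for each group.
    """
    gaps = [icc(s, *ref_params) - icc(s, *focal_params) for s in score_range]
    if all(abs(g) < tol for g in gaps):
        return 'no DIF'              # identical curves
    signs = {g > 0 for g in gaps if abs(g) >= tol}
    if len(signs) > 1:
        return 'crossing DIF'        # gap changes sign within the range
    if abs(ref_params[1] - focal_params[1]) >= tol:
        return 'non-uniform DIF'     # slopes differ, gap stays one-signed
    return 'uniform DIF'             # constant-direction gap
```

With equal slopes, a difficulty shift gives uniform DIF (a one-directional gap); unequal slopes give non-uniform DIF, which becomes crossing DIF when the curves intersect inside the score range.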
A summary of items with Differential Item Functioning
| Diet | Total | Ethnicity: Negligible | Ethnicity: Medium | Ethnicity: Large | Sex: Negligible | Sex: Medium | Sex: Large | Both: Negligible | Both: Medium | Both: Large |
|---|---|---|---|---|---|---|---|---|---|---|
| 2011 (1) | 10 | 6 | 4 | |||||||
| 2011 (2) | 5 | 5 | ||||||||
| 2011 (3) | 33 | 20 | 10 | 3 | ||||||
| 2012 (1) | 16 | 9 | 1 | 5 | 1 | |||||
| 2012 (2) | 8 | 3 | 5 | |||||||
| 2012 (3) | 26 | 3 | 22 | 1 | ||||||
| 2013 (1) | 8 | 3 | 4 | 1 | ||||||
| 2013 (2) | 10 | 5 | 5 | |||||||
| 2013 (3) | 15 | 5 | 1 | 8 | 1 | |||||
| 2014 (1) | 22 | 7 | 12 | 3 | ||||||
| 2014 (2) | 6 | 4 | 2 | |||||||
| 2014 (3) | 42 | 10 | 29 | 2 | 1 | |||||
| 2015 (1) | 6 | 5 | 1 | |||||||
| 2015 (2) | 10 | 2 | 7 | 1 | ||||||
| Combined | 217 | 87 | 2 | 0 | 114 | 5 | 1 | 8 | 0 | 0 |
Note: Following the convention on effect sizes for DIF described by Magis et al., effect sizes below the medium threshold are classed as negligible rather than small. Blanks indicate no items of that magnitude. “Total” refers to the total number of significant DIF tests. As some items exhibited DIF for both ethnicity and sex, the total number of items identified as problematic is slightly lower.
Eight items exhibiting medium or large Differential Item Functioning effects
Note: “Topic/Area” summarises the subject matter. “Differential Item Functioning” describes whether there was a sex or ethnicity difference, the type of curve observed, and gives a description of the trend. For “Plot,” probability indicates the probability of answering the item correctly. Score indicates the candidate’s performance overall. Typically, candidates who score well on the exam should be more likely to answer the item correctly, but this is not always the case. The solid “reference” line refers to males, or ethnically white UK graduates, depending on whether a sex or ethnicity difference was found. The dashed line refers to females or ethnically non-white UK graduates. See Fig. 1 for more details about curve types. “Question” is the text of the question as seen by candidates. “Options” lists the five available options. The keyed answer is underlined.
High-resolution versions of each plot are included as Additional file 1.