Haiyan Zhao, Björn Andersson, Boliang Guo, Tao Xin.
Abstract
Writing assessments are an indispensable part of most language competency tests. In our research, we used cross-classified models to study rater effects in the real essay-rating process of a large-scale, high-stakes educational examination administered in China in 2011. Four cross-classified models are suggested for investigating rater effects, addressing (1) the existence of sequential effects, (2) the direction of the sequential effects, and (3) differences among raters in their individual characteristics. We applied these models to the data to account for possible cluster effects caused by the application of multiple rating strategies. The results showed that raters demonstrated sequential effects during the rating process. In contrast to many other studies on rater effects, our study found that raters exhibited assimilation effects. More experienced, lenient, and qualified raters were less susceptible to assimilation effects. In addition, our research demonstrated the feasibility and appropriateness of using cross-classified models to assess rater effects for such data structures. This paper also discusses the implications for educators and practitioners interested in reducing sequential effects in the rating process, and suggests directions for future research.
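The cross-classified structure described in the abstract (each score belongs simultaneously to one rater and one essay, and raters and essays are crossed rather than nested) can be sketched as a crossed random-effects model. The sketch below is illustrative only: it simulates toy data and fits it with statsmodels' `MixedLM`, using the common single-group variance-components device for crossed factors. All sizes, variances, and names are assumptions, not the paper's data or its exact models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate toy crossed data: each score belongs to one rater AND one essay.
# All sizes and variances below are illustrative, not the paper's.
rng = np.random.default_rng(42)
n_obs, n_raters, n_essays = 900, 15, 60
rater = rng.integers(0, n_raters, n_obs)
essay = rng.integers(0, n_essays, n_obs)
u = rng.normal(0.0, 1.0, n_raters)   # rater effects (between-rater SD = 1.0)
v = rng.normal(0.0, 3.0, n_essays)   # essay effects (between-essay SD = 3.0)
y = 8.7 + u[rater] + v[essay] + rng.normal(0.0, 2.2, n_obs)

df = pd.DataFrame({"score": y,
                   "rater": rater.astype(str),
                   "essay": essay.astype(str),
                   "group": 1})  # one dummy group holds all observations

# Crossed random effects enter as variance components within the single
# group; re_formula="0" drops the (unidentified) group-level intercept.
vc = {"rater": "0 + C(rater)", "essay": "0 + C(essay)"}
res = smf.mixedlm("score ~ 1", df, groups="group",
                  re_formula="0", vc_formula=vc).fit()
```

After fitting, `res.vcomp` holds the between-rater and between-essay variance estimates and `res.scale` the residual variance, mirroring the "Random" rows of the tables below.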
Keywords: cross-classified models; large-scale educational assessment; multilevel modeling; rater bias; rater effects; sequential effects
Year: 2017 PMID: 28638360 PMCID: PMC5461360 DOI: 10.3389/fpsyg.2017.00933
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Description of variables included in the cross-classified models.
| Description | Level | N |
|---|---|---|
| Scores of the present essay item | Score | 146,727 |
| Scores of the verbal section | Essay (student) | 67,500 |
| Scores of the other essay item in the same test | Essay (student) | 67,500 |
| The proportion of high scores in the nine previous scores | Score | 146,727 |
| Times of rating similar tasks | Rater | 88 |
| The proportion of essays rated by a third rater | Rater | 88 |
| The mean of all scores given by a rater | Rater | 88 |
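One variable above, the proportion of high scores among the nine scores a rater assigned immediately before the current one, is a rolling feature of each rater's rating sequence. A minimal pandas sketch of how such a predictor could be constructed; the toy data, the cut-off for a "high" score, and the column names are assumptions (the paper does not state its cut-off here):

```python
import pandas as pd

# Toy ratings, listed in rating order, for two hypothetical raters.
df = pd.DataFrame({
    "rater": ["A"] * 12 + ["B"] * 12,
    "score": [9, 12, 11, 8, 13, 12, 11, 10, 12, 13, 9, 8,
              7, 6, 10, 11, 12, 13, 12, 11, 10, 9, 8, 12],
})
HIGH = 11   # assumed cut-off for a "high" score
WINDOW = 9  # nine previous scores, per the variable description

def prev_high_prop(s: pd.Series) -> pd.Series:
    # shift(1) so the window covers only *previous* scores, then take the
    # rolling mean of the high-score indicator over the last nine ratings.
    return (s >= HIGH).astype(float).shift(1).rolling(WINDOW, min_periods=WINDOW).mean()

df["prop_high_prev9"] = df.groupby("rater")["score"].transform(prev_high_prop)
```

The first nine ratings of each rater get no value (fewer than nine previous scores), which is why this predictor lives at the score level rather than the rater level.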
Parameter estimates (SE) for model 1 and model 2.
| Parameter | Model 1 | Model 2 |
|---|---|---|
| Fixed | | |
| Constant (β0) | 8.725 (0.062) | 8.596 (0.051) |
| Scores of the verbal section (β1) | | 0.077 (0.001) |
| Scores of the other essay item (β2) | | 0.698 (0.004) |
| Proportion of high scores in the nine previous scores (β3) | | 1.788 (0.037) |
| Random | | |
| Between-rater variance | 0.309 (0.050) | 0.229 (0.036) |
| Between-essay variance | 9.983 (0.102) | 1.744 (0.028) |
| Residual variance | 8.962 (0.039) | 4.910 (0.021) |
| DIC | 765936.1 | 668997.4 |
| DIC change (compared with the previous adjacent model) | | -96938.4 |
Parameter estimates (SE) for model 3 and model 4.
| Parameter | Model 3 | Model 4 |
|---|---|---|
| Fixed | | |
| Constant (β0) | 8.618 (0.064) | 8.661 (0.031) |
| Scores of the verbal section (β1) | 0.077 (0.001) | 0.077 (0.001) |
| Scores of the other essay item (β2) | 0.698 (0.003) | 0.698 (0.003) |
| Proportion of high scores in the nine previous scores (β3) | 1.740 (0.109) | 1.692 (0.082) |
| Times of rating similar tasks | | 0.017 (0.017) |
| Times of rating similar tasks × proportion of high previous scores | | -0.117 (0.047) |
| Mean of all scores given by the rater | | 0.773 (0.054) |
| Mean of all scores × proportion of high previous scores | | -0.355 (0.145) |
| Proportion of essays rated by a third rater | | -5.358 (0.906) |
| Proportion rated by a third rater × proportion of high previous scores | | 16.235 (2.426) |
| Random | | |
| Between-rater variance | 0.349 (0.055) | 0.069 (0.012) |
| Variance of the random slope of β3 | 0.857 (0.151) | 0.443 (0.092) |
| Covariance of the random intercept and slope | -0.362 (0.077) | -0.095 (0.027) |
| Between-essay variance | 1.739 (0.028) | 1.738 (0.028) |
| Residual variance | 4.886 (0.021) | 4.886 (0.021) |
| DIC | 668338.7 | 668324.3 |
| DIC change (compared with the previous adjacent model) | -658.7 | -14.4 |
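The DIC rows of the two tables can be compared directly: each reported change is a model's DIC minus its predecessor's, and a drop of roughly 10 or more is conventionally read as a meaningful improvement, which is why Model 4 is preferred. A small sketch of that bookkeeping (values copied from the tables; recomputing the first change from the rounded DICs gives -96938.7 rather than the reported -96938.4, presumably because the authors differenced unrounded values):

```python
# DIC values as reported in the two tables above
dic = {"Model 1": 765936.1, "Model 2": 668997.4,
       "Model 3": 668338.7, "Model 4": 668324.3}

names = list(dic)
# change relative to the previous (simpler) adjacent model
changes = {names[i]: round(dic[names[i]] - dic[names[i - 1]], 1)
           for i in range(1, len(names))}

# lower DIC is better; a drop of ~10+ is usually taken as meaningful
preferred = min(dic, key=dic.get)
```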