Muhamad Firdaus Mohd Noh, Mohd Effendi Ewan Mohd Matore.
Abstract
Evaluating candidates' answers in speaking assessments is difficult and has rarely been explored. The task is challenging and can lead to inconsistent rating quality among raters. Severe raters can do more harm than good to the results that candidates receive. Many-faceted Rasch measurement (MFRM) was used to explore differences in teachers' rating severity based on their rating experience, training experience, and teaching experience. The study used a quantitative approach and a survey method: 164 English teachers who teach lower secondary school pupils were selected through a multistage cluster sampling procedure. All facets (teachers, candidates, items, and domains) were calibrated using MFRM. Every teacher scored the responses of six candidates on a speaking test consisting of three question items, and each response was evaluated across three domains: vocabulary, grammar, and communicative competence. The results show that rating quality differed according to teachers' rating experience and teaching experience. However, training experience made no significant difference to teachers' rating quality on the speaking test. The evidence from this study suggests that both teaching experience and rating experience must be considered when appointing raters for the speaking test. The quality of training must be improved to produce raters with sound professional judgment. Before rating operationally, raters need to practice on answer samples that cover varied levels of candidate performance. Further research might explore other rater biases that may affect the psychological well-being of certain groups of students.
Keywords: MFRM; differences; rater; severity; speaking; teachers’ experience; teaching experience; training experience
Year: 2022 PMID: 35936278 PMCID: PMC9353031 DOI: 10.3389/fpsyg.2022.941084
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
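The abstract states that teachers, candidates, items, and domains were calibrated with MFRM on a rating scale. For orientation only, a common way to write a four-facet rating scale form of Linacre's many-facet Rasch model is sketched below. The symbols are our own labels, and the exact parameterization used in this study is not shown in this extract, so treat the specification as an assumption.

```latex
\ln\!\left(\frac{P_{nijmk}}{P_{nijm(k-1)}}\right) = B_n - D_i - C_j - E_m - F_k
```

Here $B_n$ is the ability of candidate $n$, $D_i$ the difficulty of item $i$, $C_j$ the severity of teacher (rater) $j$, $E_m$ the difficulty of domain $m$, and $F_k$ the Rasch-Andrich threshold between categories $k-1$ and $k$. Because $C_j$ is subtracted, a larger rater measure lowers the expected rating, so it indicates a more severe teacher.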
Respondents’ profile.
| Experiences | Number of teachers | Percentage (%) |
| Rating experience | ||
| No experience | 63 | 38.4 |
| 1–3 years of experience | 44 | 26.8 |
| 4–6 years of experience | 57 | 34.8 |
| Rater training experience | ||
| Have attended | 102 | 62.2 |
| Never attended | 62 | 37.8 |
| Teaching experience | ||
| 1–10 years of experience | 50 | 30.5 |
| 11–20 years of experience | 56 | 34.1 |
| More than 20 years of experience | 58 | 35.4 |
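The percentages in the respondents' profile are each group's share of the 164 teachers. The short sketch below recomputes them from the counts in the table. The dictionary layout and the helper name `percentages` are illustrative choices, not part of the study.

```python
# Respondent counts per group, taken from the respondents' profile table (N = 164).
PROFILE = {
    "Rating experience": {"No experience": 63, "1-3 years": 44, "4-6 years": 57},
    "Rater training experience": {"Have attended": 102, "Never attended": 62},
    "Teaching experience": {"1-10 years": 50, "11-20 years": 56, "More than 20 years": 58},
}

def percentages(groups: dict[str, int]) -> dict[str, float]:
    """Return each group's share of the group total, rounded to one decimal place."""
    total = sum(groups.values())
    return {name: round(100 * n / total, 1) for name, n in groups.items()}

for variable, groups in PROFILE.items():
    print(variable, percentages(groups))
```

Each background variable partitions the same 164 teachers, so the percentages within a variable sum to about 100.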
Report on item fit.
| Item | Total score | Measure (logits) | SE | Infit MnSq | Infit Zstd | Outfit MnSq | Outfit Zstd | Point-measure correlation (observed) | Point-measure correlation (expected) |
| Interview | 5,195 | −0.20 | 0.03 | 1.01 | 0.1 | 1.01 | 0.1 | 0.71 | 0.71 |
| Story telling | 5,120 | −0.13 | 0.03 | 1.00 | −0.1 | 0.99 | −0.1 | 0.69 | 0.71 |
| Discussion | 7,126 | 0.33 | 0.03 | 1.00 | −0.1 | 0.99 | −0.3 | 0.73 | 0.71 |
| Mean | 5,813.7 | 0.00 | 0.03 | 1.00 | 0.0 | 1.00 | −0.1 | 0.71 | – |
| SD (population) | 928.5 | 0.23 | 0.00 | 0.00 | 0.2 | 0.01 | 0.2 | 0.01 | – |
| SD (samples) | 1,137.1 | 0.29 | 0.00 | 0.01 | 0.2 | 0.01 | 0.3 | 0.02 | – |
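The item fit report gives two standard deviations. The "population" SD divides by n and the "samples" SD divides by n − 1. The minimal sketch below uses Python's `statistics` module on the item total scores and reproduces both rows. The helper name `summary` is hypothetical.

```python
import statistics

# Item total scores from the item fit report.
ITEM_TOTALS = {"Interview": 5195, "Story telling": 5120, "Discussion": 7126}

def summary(values):
    """Return (mean, population SD with divisor n, sample SD with divisor n - 1)."""
    return statistics.mean(values), statistics.pstdev(values), statistics.stdev(values)

totals = list(ITEM_TOTALS.values())
mean, pop_sd, sample_sd = summary(totals)
print(round(mean, 1), round(pop_sd, 1), round(sample_sd, 1))
```

With only three items, the two SDs differ by a factor of sqrt(3/2). This is why the "samples" row is noticeably larger.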
Report on item separation.
| Statistics | Values |
| Separation ratio | 9.01 |
| Separation index | 12.35 |
| Separation reliability | 0.99 |
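The three separation statistics are linked by standard Rasch relations. From the separation ratio G, the number of statistically distinct strata is H = (4G + 1)/3, and the separation reliability is R = G²/(1 + G²). Reading the "separation index" as this strata statistic is our interpretation, but the reported values agree with it. The function name below is hypothetical.

```python
def separation_stats(g: float) -> tuple[float, float]:
    """Given a separation ratio G, return (strata H, separation reliability R)."""
    strata = (4 * g + 1) / 3
    reliability = g**2 / (1 + g**2)
    return strata, reliability

# Item facet (G = 9.01) and, for comparison, the "no rating experience" rater group (G = 4.35).
for g in (9.01, 4.35):
    h, r = separation_stats(g)
    print(g, round(h, 2), round(r, 2))
```

The same check can be run on any column of the rater facet reports, which makes it a quick way to spot transcription errors in these tables.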
Report on scale functioning.
| Category | Count used | % | Cum. % | Average measure | Expected measure | Outfit MnSq | Rasch-Andrich threshold | Threshold SE | Expectation measure at category | Expectation measure at −0.5 | Most probable from | Rasch-Thurstone threshold | Category peak probability (%) |
| 0 | 86 | 1 | 1 | −2.12 | −2.44 | 1.3 | – | – | −5.42 | – | Low | Low | 100 |
| 1 | 807 | 12 | 13 | −1.63 | −1.58 | 0.9 | −4.27 | 0.11 | −3.22 | −4.5 | −4.27 | −4.37 | 59 |
| 2 | 2,278 | 33 | 46 | −0.52 | −0.53 | 1 | −2.1 | 0.04 | −1.09 | −2.13 | −2.1 | −2.11 | 57 |
| 3 | 2,711 | 40 | 86 | 0.66 | 0.66 | 1 | −0.12 | 0.03 | 1.17 | −0.01 | −0.12 | −0.07 | 64 |
| 4 | 785 | 11 | 98 | 1.87 | 1.87 | 1 | 2.51 | 0.04 | 3.28 | 2.3 | 2.51 | 2.39 | 51 |
| 5 | 161 | 2 | 100 | 2.94 | 2.89 | 1 | 3.98 | 0.09 | 5.2 | 4.36 | 3.98 | 4.14 | 100 |
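The Rasch-Andrich thresholds in the scale functioning table determine which score category is most probable at a given measure. The sketch below assumes the standard Andrich rating scale form, in which P(k) is proportional to exp(Σ_{j≤k}(θ − τ_j)). Here θ stands for the combined measure (candidate ability minus item difficulty, rater severity and domain difficulty). The function names are hypothetical.

```python
import math

# Rasch-Andrich thresholds (logits) for categories 1-5, from the scale functioning table.
THRESHOLDS = [-4.27, -2.10, -0.12, 2.51, 3.98]

def category_probabilities(theta: float, thresholds=THRESHOLDS) -> list[float]:
    """Rating scale model: P(k) ~ exp(sum_{j<=k} (theta - tau_j)), with P(0) ~ exp(0)."""
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    top = max(logits)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def most_probable(theta: float, thresholds=THRESHOLDS) -> int:
    """Index of the category with the highest model probability at theta."""
    probs = category_probabilities(theta, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)

for theta in (-5.0, -3.0, 0.0, 3.0, 5.0):
    print(theta, most_probable(theta))
```

Because these thresholds are ordered, the most probable category at θ is the number of thresholds lying below θ. This matches the "Most probable from" column.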
Report on threshold changes.
| Category pair | Threshold gap (logits, from → to) | Distance check (1.00 < distance < 5.00) |
| S0–1 | 0.00 to −4.27 | 1.00 < 4.27 < 5.00 |
| S1–2 | −4.27 to −2.10 | 1.00 < 2.17 < 5.00 |
| S2–3 | −2.10 to −0.12 | 1.00 < 1.98 < 5.00 |
| S3–4 | −0.12 to 2.51 | 1.00 < 2.63 < 5.00 |
| S4–5 | 2.51 to 3.98 | 1.00 < 1.47 < 5.00 |
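The threshold-change check measures the distance between adjacent Rasch-Andrich thresholds and tests it against the bounds used in this table (greater than 1.00 and less than 5.00 logits). The table starts the S0–1 gap at 0.00, and the sketch below follows that convention. The helper names are hypothetical.

```python
# Rasch-Andrich thresholds for categories 1-5, from the scale functioning table.
THRESHOLDS = [-4.27, -2.10, -0.12, 2.51, 3.98]

def threshold_gaps(thresholds, start=0.0):
    """Absolute distances between consecutive thresholds, starting from `start` (rounded to 0.01)."""
    points = [start] + list(thresholds)
    return [round(abs(b - a), 2) for a, b in zip(points, points[1:])]

def within_bounds(gaps, low=1.00, high=5.00):
    """True if every gap lies strictly between `low` and `high` logits."""
    return all(low < g < high for g in gaps)

gaps = threshold_gaps(THRESHOLDS)
print(gaps, within_bounds(gaps))
```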
FIGURE 1. Scale threshold values.
FIGURE 2. Wright map.
Teachers’ differences in severity based on rating experience.
| Group | Total score | Total count | Observed average | Measure | Model SE |
| No experience | 5,052 | 1,837 | 2.8 | −0.20 | 0.03 |
| 1–3 years of experience | 6,077 | 2,384 | 2.5 | −0.06 | 0.03 |
| 4–6 years of experience | 6,312 | 2,607 | 2.4 | 0.03 | 0.03 |
| Mean | 5,813.7 | 2,276 | 2.6 | −0.08 | 0.03 |
| SD | 670 | 396.2 | 0.2 | 0.12 | 0.00 |
Model, fixed (all same) Chi-square: 26.0, df: 2, significance (probability): 0.00.
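To our understanding, the fixed (all same) chi-square tests whether all group measures are equal. It sums w·(m − m̄)² with weights w = 1/SE² around the weighted mean m̄, and has df = n − 1. The sketch below implements that statistic and an exact upper-tail p-value for integer df using only the `math` module. The function names are hypothetical. Because the table reports rounded measures and SEs, recomputing from them gives about 29.9 for the rating-experience groups rather than the published 26.0.

```python
import math

def fixed_chi_square(measures, ses):
    """Homogeneity statistic sum w_i (m_i - weighted mean)^2, w_i = 1/SE_i^2; returns (chi2, df)."""
    weights = [1 / se**2 for se in ses]
    mean_w = sum(w * m for w, m in zip(weights, measures)) / sum(weights)
    chi2 = sum(w * (m - mean_w) ** 2 for w, m in zip(weights, measures))
    return chi2, len(measures) - 1

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k - 1/2) / Gamma(k + 1/2)
    tail = sum(y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail

chi2, df = fixed_chi_square([-0.20, -0.06, 0.03], [0.03] * 3)
print(round(chi2, 1), df)
for reported, d in ((26.0, 2), (3.1, 1), (7.3, 2)):
    print(reported, d, round(chi2_sf(reported, d), 3))
```

For df = 2 the p-value reduces to exp(−χ²/2). A reported statistic can therefore be checked against its stated significance by hand.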
Rater facet report based on rating experience.
| Group | No experience | 1–3 years of experience | 4–6 years of experience |
| χ² | 1,265.9 | 828.1 | 1,916.6 |
| df | 62 | 43 | 56 |
| Significance | 0.00 | 0.00 | 0.00 |
| Separation ratio | 4.35 | 4.21 | 5.72 |
| Separation index | 6.13 | 6.02 | 7.96 |
| Separation reliability | 0.95 | 0.95 | 0.97 |
Teachers’ differences in severity based on training experience.
| Group | Total score | Total count | Observed average | Measure | Model SE |
| Have attended training | 10,953 | 4,243 | 2.6 | −0.07 | 0.02 |
| Never attended training | 6,488 | 2,585 | 2.5 | −0.01 | 0.03 |
| Mean | 8,720.5 | 3,414 | 2.5 | −0.04 | 0.03 |
| SD | 3,157.2 | 1,172.4 | 0.1 | 0.05 | 0.00 |
Model, fixed (all same) Chi-square: 3.1, df: 1, significance (probability): 0.08.
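With only two groups, the fixed chi-square (df = 1) equals the square of the standardized difference z = (m₁ − m₂)/√(SE₁² + SE₂²). The sketch below shows this identity using the rounded training-group values. The function names are hypothetical. With these rounded inputs, the statistic comes out smaller than the reported 3.1, so the published figure was presumably computed from unrounded estimates.

```python
import math

def two_group_z(m1: float, se1: float, m2: float, se2: float) -> float:
    """Standardized difference between two measures with independent standard errors."""
    return (m1 - m2) / math.sqrt(se1**2 + se2**2)

def fixed_chi_square(measures, ses):
    """Sum of w_i (m_i - weighted mean)^2 with w_i = 1/SE_i^2."""
    weights = [1 / se**2 for se in ses]
    mean_w = sum(w * m for w, m in zip(weights, measures)) / sum(weights)
    return sum(w * (m - mean_w) ** 2 for w, m in zip(weights, measures))

# Attended training vs never attended (rounded measures and model SEs from the table).
z = two_group_z(-0.07, 0.02, -0.01, 0.03)
print(round(z, 2), round(z**2, 2), round(fixed_chi_square([-0.07, -0.01], [0.02, 0.03]), 2))
```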
Rater facet report based on training experience.
| Group | Attended training | No training |
| χ² | 1,878.7 | 1,625.3 |
| df | 101 | 61 |
| Significance | 0.00 | 0.00 |
| Separation ratio | 4.18 | 4.96 |
| Separation index | 5.91 | 6.95 |
| Separation reliability | 0.95 | 0.96 |
Teachers’ differences in severity based on teaching experience.
| Group | Total score | Total count | Observed average | Measure | Model SE |
| 1–10 years of teaching experience | 6,254 | 2,418 | 2.6 | −0.10 | 0.03 |
| 11–20 years of teaching experience | 5,948 | 2,323 | 2.6 | −0.07 | 0.03 |
| More than 20 years of teaching experience | 5,239 | 2,087 | 2.5 | 0.02 | 0.03 |
| Mean | 5,813.7 | 2,276 | 2.6 | −0.05 | 0.03 |
| SD | 520.7 | 170.4 | 0.0 | 0.06 | 0.00 |
Model, fixed (all same) Chi-square: 7.3, df: 2, significance (probability): 0.03.
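In these group tables, the observed average is the total raw score divided by the number of ratings (total count). The short sketch below recomputes it for the teaching-experience groups. The dictionary and function name are illustrative only.

```python
# Total score and total count per teaching-experience group (teaching experience table).
GROUPS = {
    "1-10 years": (6254, 2418),
    "11-20 years": (5948, 2323),
    "More than 20 years": (5239, 2087),
}

def observed_average(total_score: int, total_count: int) -> float:
    """Observed average = total raw score divided by the number of ratings."""
    return total_score / total_count

for name, (score, count) in GROUPS.items():
    print(name, round(observed_average(score, count), 2))
```

The observed average is a raw-score summary. The "Measure" column is the model-based severity, which also adjusts for the candidates, items and domains each group rated.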
Rater facet report based on teaching experience.
| Group | 1–10 years of teaching experience | 11–20 years of teaching experience | More than 20 years of teaching experience |
| χ² | 1,125.2 | 1,462.8 | 1,078.4 |
| df | 49 | 55 | 57 |
| Significance | 0.00 | 0.00 | 0.00 |
| Separation ratio | 4.64 | 5.05 | 4.18 |
| Separation index | 6.52 | 7.06 | 5.91 |
| Separation reliability | 0.96 | 0.96 | 0.95 |