Maja Bučar Pajek, Ivan Cuk, Jernej Pajek, Marjeta Kovač, Bojan Leskošek.
Abstract
In the present study, the reliability and validity of judging at the 2011 European Championship in Berlin were analysed, and the results were compared with those from a gymnastics competition of a different level, the 2009 Universiade in Belgrade. To assess reliability and consistency, the mean absolute judge deviation from the final execution score, Cronbach's alpha coefficient, intra-class correlations (ICC) and Armor's theta coefficient were calculated. To assess validity, mean deviations of judges' scores, Kendall's coefficient of concordance W and ANOVA eta-squared values were used. For Berlin 2011, Cronbach's alpha was in general above 0.95, the minima of item-total correlations were above 0.8, and the ICC of average scores and Armor's theta were above 0.94. Comparison with Universiade 2009 identified vault and floor scores as having inferior reliability indices at both competitions. At both competitions the average deviations of judges from the final E score were close to zero (p=0.84), but Berlin 2011 showed a higher number of apparatuses with significant Kendall's W (5 vs. 2 for Universiade 2009) and higher eta-squared values, indicating higher judge panel bias in the all-around and apparatus finals. In conclusion, the quality of judging was comparable at the examined gymnastics competitions of different levels. Further work must be done to analyse the inferior results on vault and floor.
Keywords: aesthetic sports; bias; evaluation; sport statistics; validity
Year: 2013 PMID: 24146718 PMCID: PMC3796836 DOI: 10.2478/hukin-2013-0038
Source DB: PubMed Journal: J Hum Kinet ISSN: 1640-5544 Impact factor: 2.193
Statistics of E scores with mean and standard deviation (SD), comparison to Universiade 2009 results
| Session | Apparatus | N (Berlin 2011) | N (Belgrade 2009) | Mean (Berlin 2011) | SD (Berlin 2011) | Mean (Belgrade 2009) | SD (Belgrade 2009) | D score (Berlin 2011) | D score (Belgrade 2009) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | VT | 97 | 71 | 8.40 | 0.33 | 8.35 | 0.42 | 5±0.5 | 4.8±0.48 |
| | UB | 75 | 46 | 7.40 | 0.80 | 7.28 | 1.22 | 4.97±1.16 | 4.86±1 |
| | BB | 79 | 47 | 7.25 | 0.97 | 7.28 | 1.02 | 5.2±0.71 | 4.81±0.7 |
| | FX | 73 | 47 | 7.69 | 0.59 | 7.92 | 0.49 | 5±0.6 | 4.83±0.56 |
| 2 | VT | 24 | 24 | 8.55 | 0.20 | 8.38 | 0.49 | 5.3±0.52 | 4.95±0.37 |
| | UB | 23 | 24 | 7.78 | 0.53 | 7.19 | 1.30 | 5.57±0.41 | 4.83±1 |
| | BB | 23 | 24 | 7.86 | 0.80 | 7.28 | 1.05 | 5.6±0.46 | 4.96±0.71 |
| | FX | 23 | 24 | 8.11 | 0.39 | 7.68 | 0.57 | 5.3±0.3 | 4.98±0.44 |
| 3 | VT | 16 | 16 | 8.57 | 0.43 | 8.75 | 0.14 | 5.6±0.54 | 5.19±0.51 |
| | UB | 8 | 8 | 8.18 | 0.56 | 8.40 | 0.63 | 6.13±0.33 | 6.06±0.64 |
| | BB | 8 | 8 | 7.96 | 0.26 | 8.23 | 0.65 | 5.9±0.34 | 5.61±0.5 |
| | FX | 8 | 8 | 8.44 | 0.58 | 8.48 | 0.20 | 5.7±0.24 | 5.43±0.3 |
Results are from Universiade 2009 in Belgrade (
Session 1,2,3: qualifications, all around finals, apparatus finals; VT: vault;
UB: uneven bars; BB: balance beam;
FX: floor; N: number of competitors.
The difference between D scores at the two competitions is significant with p<0.05.
The performance of individual judges
| Session | App | Dev max (Berlin 2011) | Dev max (Belgrade 2009) | Ab dev max (Berlin 2011) | Ab dev max (Belgrade 2009) | R min (Berlin 2011) | R min (Belgrade 2009) | Cα (Berlin 2011) | Cα (Belgrade 2009) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | VT | −0.05 | −0.09 | 0.13 | 0.15 | 0.86 | 0.83 | 0.97 | 0.94 |
| | UB | 0.08 | 0.09 | 0.27 | 0.28 | 0.88 | 0.92 | 0.97 | 0.98 |
| | BB | 0.11 | −0.25 | 0.25 | 0.34 | 0.94 | 0.92 | 0.99 | 0.97 |
| | FX | −0.05 | −0.03 | 0.20 | 0.15 | 0.88 | 0.88 | 0.97 | 0.95 |
| 2 | VT | −0.12 | −0.07 | 0.24 | 0.11 | 0.50 | 0.93 | 0.90 | 0.98 |
| | UB | −0.18 | 0.07 | 0.26 | 0.20 | 0.87 | 0.97 | 0.97 | 0.99 |
| | BB | −0.1 | −0.08 | 0.18 | 0.24 | 0.93 | 0.92 | 0.99 | 0.98 |
| | FX | 0.16 | 0.06 | 0.21 | 0.14 | 0.80 | 0.91 | 0.94 | 0.97 |
| 3 | VT | −0.07 | −0.04 | 0.14 | 0.16 | 0.84 | 0.2 | 0.98 | 0.71 |
| | UB | 0.13 | −0.08 | 0.18 | 0.20 | 0.91 | 0.9 | 0.98 | 0.98 |
| | BB | 0.17 | −0.13 | 0.21 | 0.22 | 0.97 | 0.89 | 0.99 | 0.98 |
| | FX | −0.08 | 0.12 | 0.14 | 0.21 | 0.69 | 0.36 | 0.94 | 0.83 |
Presented results are from Universiade 2009 in Belgrade (
There were no statistically significant differences when values from the Berlin and Belgrade competitions were compared with the Mann-Whitney test.
Session 1,2,3: qualifications, all around finals, apparatus finals;
VT: vault; UB: uneven bars; BB: balance beam; FX: floor;
Dev max: maximal judge average deviation from E score, Ab dev max: maximum of average absolute deviation from E score;
R min: minimum of corrected item-total correlation of individual judges;
Cα: Cronbach’s alpha coefficient.
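The judge-level indices defined in this legend can be illustrated with a minimal Python sketch. It is not the authors' procedure: the rule used here for the final E score (discard the highest and lowest panel score, average the rest), the function names and the example scores are all assumptions made for illustration only.

```python
from statistics import mean, variance


def trimmed_e_score(panel):
    """Assumed rule: discard the highest and lowest score, average the rest."""
    ordered = sorted(panel)
    return mean(ordered[1:-1])


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def judge_statistics(scores):
    """scores[i][j] is the E score that judge j gave to performance i."""
    k = len(scores[0])
    final = [trimmed_e_score(row) for row in scores]
    by_judge = [[row[j] for row in scores] for j in range(k)]

    # Average signed and absolute deviation of each judge from the final E score.
    dev = [mean([s - f for s, f in zip(col, final)]) for col in by_judge]
    abs_dev = [mean([abs(s - f) for s, f in zip(col, final)]) for col in by_judge]

    # Corrected item-total correlation: judge j against the sum of the other judges.
    r = [pearson(col, [sum(row) - row[j] for row in scores])
         for j, col in enumerate(by_judge)]

    # Cronbach's alpha from the judge variances and the variance of panel totals.
    totals = [sum(row) for row in scores]
    alpha = k / (k - 1) * (1 - sum(variance(c) for c in by_judge) / variance(totals))

    return {
        "dev_max": max(dev, key=abs),   # largest deviation in magnitude, sign kept
        "ab_dev_max": max(abs_dev),
        "r_min": min(r),
        "cronbach_alpha": alpha,
    }


# Hypothetical panel of four judges scoring four performances.
example_scores = [
    [8.0, 8.1, 8.0, 8.3],
    [7.5, 7.4, 7.5, 7.8],
    [9.0, 9.0, 9.1, 9.3],
    [6.5, 6.6, 6.5, 6.8],
]
stats = judge_statistics(example_scores)
print(stats)
```

In this made-up panel the fourth judge scores consistently higher than the others, so that judge determines both Dev max and Ab dev max, while R min and Cα stay close to 1 because all judges order the performances almost identically.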
Figure 1
Boxplot for mean deviations (a measure of bias, dark grey) and mean absolute deviation (a measure of reliability, light grey) for both compared competitions.
p=0.84 for the difference in mean deviations between competitions and p=0.25 for the difference in mean absolute deviations between competitions.
Figure 2
The eta-squared values of E-scores for Berlin 2011 and Belgrade 2009 competitions. (1 - qualifications, 2 - all around finals, 3 - apparatus finals;
VT: vault; UB: uneven bars; BB: balance beam; FX: floor)
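The record does not state which ANOVA design produced the eta-squared values in Figure 2. As a rough sketch only, the following Python code assumes a two-way layout without replication (performances by judges) and reports the share of E-score variance attributable to the judge factor, both as classical and as partial eta-squared. The function name and the example scores are hypothetical.

```python
def judge_eta_squared(scores):
    """scores[i][j] is the E score that judge j gave to performance i.

    Returns (eta_squared, partial_eta_squared) for the judge factor,
    assuming a performances-by-judges layout with one score per cell.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    perf_means = [sum(row) / k for row in scores]
    judge_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_judge = n * sum((m - grand) ** 2 for m in judge_means)
    # Residual after removing performance and judge main effects.
    ss_error = sum(
        (scores[i][j] - perf_means[i] - judge_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    return ss_judge / ss_total, ss_judge / (ss_judge + ss_error)


# Hypothetical example: two performances scored by two judges.
eta, partial_eta = judge_eta_squared([[8.0, 8.2], [7.0, 7.4]])
print(round(eta, 3), round(partial_eta, 3))
```

Under this assumed design a larger value means that more of the score variation is explained by which judge gave the score, which is the sense in which the abstract reads higher eta-squared as higher panel bias.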
Figure 3
Correlation matrix for between-judge correlations.
The remarkably inferior correlations below 0.7 are shown in bold.
VT: vault; UB: uneven bars; BB: balance beam; FX: floor.
Overall measures of inter-judge reliability
| Session | Apparatus | ICC single (Berlin 2011) | ICC single (Belgrade 2009) | ICC average | λ | Armor’s theta | Kendall’s W (Berlin 2011) | p(W) (Berlin 2011) | Kendall’s W (Belgrade 2009) | p(W) (Belgrade 2009) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | VT | 0.83 | | 0.97 | 5.21 | 0.97 | | | | |
| | UB | 0.84 | 0.91 | 0.97 | 5.23 | 0.97 | 0.01 | 0.72 | 0.04 | 0.16 |
| | BB | 0.92 | 0.88 | 0.99 | 5.63 | 0.99 | 0.03 | 0.06 | | |
| | FX | 0.84 | 0.83 | 0.97 | 5.23 | 0.97 | | | 0.01 | 0.65 |
| 2 | VT | | 0.91 | 0.89 | 4.29 | 0.92 | 0.06 | 0.19 | 0.08 | 0.11 |
| | UB | 0.84 | 0.97 | 0.97 | 5.35 | 0.98 | | | 0.07 | 0.17 |
| | BB | 0.93 | 0.92 | 0.99 | 5.70 | 0.99 | | | 0.01 | 0.78 |
| | FX | | 0.89 | 0.93 | 4.80 | 0.95 | | | 0.04 | 0.43 |
| 3 | VT | 0.90 | | 0.98 | 5.56 | 0.98 | 0.08 | 0.28 | 0.05 | 0.59 |
| | UB | 0.87 | 0.88 | 0.98 | 5.43 | 0.98 | 0.15 | 0.32 | 0.06 | 0.81 |
| | BB | 0.95 | 0.87 | 0.99 | 5.84 | 0.99 | 0.24 | 0.09 | 0.12 | 0.43 |
| | FX | | | 0.95 | 4.76 | 0.95 | 0.10 | 0.55 | 0.10 | 0.57 |
Presented results are from Universiade 2009 in Belgrade (
For ICC single the correlation coefficients below 0.8 are put in bold.
For Kendall’s coefficient of concordance the significant values (expressing bias in the judge panel) are put in bold.
Session 1,2,3: qualifications, all around finals, apparatus finals;
VT: vault; UB: uneven bars; BB: balance beam; FX: floor;
ICC single (average): intra-class correlation for single (average) scores;
λ
p(W): p value of Kendall’s W.
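Two of the panel-level indices in this table can be sketched in Python under stated assumptions. For Armor's theta the standard formula θ = k/(k−1) · (1 − 1/λ) is used, where λ is taken to be the largest eigenvalue of the between-judge correlation matrix and k the number of judges. The record gives neither of these explicitly: reading the 4.29 to 5.84 values in the table as λ and setting k = 6 are assumptions, although with k = 6 the formula reproduces the tabulated theta values. Kendall's W is computed with average ranks and the usual tie correction. The low W values interpreted as bias suggest that judges were ranked within each performance, but that reading is also an assumption.

```python
def armor_theta(first_eigenvalue, n_judges):
    """Armor's theta from the largest eigenvalue of the judge correlation matrix."""
    return n_judges / (n_judges - 1) * (1 - 1 / first_eigenvalue)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kendall_w(rows):
    """Kendall's W with tie correction.

    rows[i] holds the values that observer i assigns to the n ranked items.
    """
    m, n = len(rows), len(rows[0])
    rank_sums = [0.0] * n
    tie_term = 0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)
            tie_term += t ** 3 - t
    numerator = 12 * sum(r * r for r in rank_sums) - 3 * m ** 2 * n * (n + 1) ** 2
    denominator = m ** 2 * n * (n ** 2 - 1) - m * tie_term
    return numerator / denominator


# Assumed six-judge panel; 5.21 is the value tabulated for session 1, vault.
theta_vt = armor_theta(5.21, 6)
print(round(theta_vt, 2))

# Hypothetical rankings: full agreement, then two exactly opposed rankings.
w_agree = kendall_w([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
w_opposed = kendall_w([[1, 2, 3], [3, 2, 1]])
print(w_agree, w_opposed)
```

If judges are the ranked items and performances are the observers, W stays near zero when no judge is systematically higher or lower than the others, so a small but significant W points to a consistent ordering of judges.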