Corinna C A Clark, Nicola J Sibbald, Nicola J Rooney.
Abstract
Self-assessments of performance are commonly used in the human workplace, although compared to peer or supervisor ratings, they may be subject to positive biases or leniency. The use of subjective rating scales in animal sciences is also common, although little consideration is usually given to possible rater bias. Dog handlers work very closely and form strong relationships with their dogs, and are also best placed to monitor dog performance since they often work in isolation. Previous work found ratings of search dog performance correlated well between experienced dog trainers, instructors, and scientists; but until now, there has been no investigation into ratings made by a dog's own handler. We compared handlers' subjective assessments of their own dog's search performance to scores given by other handlers and, in a second study, to scores made by impartial raters. We found that handlers generally showed leniency; for example, scoring their own dogs more favorably for Control (responsiveness to commands) and Strength of Indication. But the degree of bias varied with the trait being scored and between raters. Such differences may be attributable to greater desirability or importance of favorable scores for certain traits, or a lack of clarity of their precise meaning. Handlers may vary in susceptibility to bias due to differing levels of experience and the extent to which they view their dog's ability as dependent on their own. The exact causes require further investigation. We suggest working dog agencies provide rater-training to overcome leniency, improve reliability and validity, and increase handlers' motivation to provide accurate assessments. This study represents one of a series of steps to formulate robust, validated and evidence-based performance rating systems and has relevance to any situation where raters assess their own performance or that of others (particularly where they may have a vested interest in, or loyalty toward, the ratee).
Keywords: bias; leniency; rating; reliability; validity; working dog
Year: 2020 PMID: 33195498 PMCID: PMC7533607 DOI: 10.3389/fvets.2020.00612
Source DB: PubMed Journal: Front Vet Sci ISSN: 2297-1769
Study 1: Agreement between own handler and other handler scores (all ratings, N = 120).
| Own/other (all ratings) | 0.539 | 0.486 | 0.372 | 0.417 | 0.025 | 0.263 | 0.282 | 0.529 |
| Pair 1 | 0.480 | 0.257 | 0.312 | 0.525 | −0.379 | |||
| Pair 2 | 0.354 | 0.113 | −0.334 | 0.300 | 0.414 | 0.584 | ||
| Pair 3 | 0.556 | 0.365 | 0.479 | −0.061 | −0.007 | 0.191 | ||
| Pair 4 | 0.351 | 0.239 | −0.026 | 0.279 | −0.338 | 0.055 | 0.272 | |
| Pair 5 | 0.577 | 0.212 | 0.067 | −0.425 | −0.112 | 0.342 | 0.518 | |
| Pair 6 | 0.354 | 0.385 | 0.099 | 0.362 | 0.118 | 0.494 | 0.439 |
Correlation coefficients between handlers in each pair (Spearman's rho, 2-tailed, N = 20, unless stated otherwise) for each trait. Moderate agreement (>0.5) shaded and good agreement (>0.6) in bold.
Where N < 20 within pairs or N < 120 for the overall comparison: N = 19; N = 17; N = 16; N = 15; N = 118; N = 103; N = 119.
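The agreement statistic used in this table, Spearman's rho, can be illustrated with a minimal pure-Python sketch. It ranks each handler's scores (tied scores share the mean of their ranks) and then takes the Pearson correlation of the ranks. The function names, the `"below moderate"` label, and the example scores are assumptions made for illustration; they are not the study's data or its analysis code. The thresholds follow the table caption (moderate > 0.5, good > 0.6).

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks.

    Raises ZeroDivisionError if either rater gave every dog the same score.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def agreement_label(rho):
    """Label rho using the caption's thresholds (label wording is assumed)."""
    if rho > 0.6:
        return "good"
    if rho > 0.5:
        return "moderate"
    return "below moderate"


# Hypothetical scores for one trait across six searches (not study data).
own_scores = [4, 5, 3, 4, 5, 2]
other_scores = [3, 5, 3, 4, 4, 2]
rho = spearman_rho(own_scores, other_scores)
label = agreement_label(rho)
```

This sketch returns only the coefficient; the two-tailed significance test mentioned in the caption is not reproduced here.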
Study 1: Median scores given by handler for own dog's performance and scores given by other handler and Wilcoxon Signed Ranks statistic (z) comparing within dog, across all 12 handlers.
| | Control | Motivation | Stamina | Distraction | Confidence | Independence | Indication | Overall ability |
| Wilcoxon z | −2.658 | −3.251 | −3.390 | 1.858 p = 0.063 | −2.726 | −1.147 | −2.853 | −3.236 |
| Median score given to own dog | 3.5 (4) | 3.9 (4) | 4.1 (4) | 2.0 (2) | 4.3 (4) | 3.9 (4) | 3.9 (4) | 7.1 (7) |
| Median score given to other dog | 3.2 (3) | 3.6 (4) | 4.0 (4) | 2.2 (2) | 4.1 (4) | 3.8 (4) | 3.6 (4) | 6.8 (7) |
p < 0.01.
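A Wilcoxon signed-rank z statistic of the kind reported in this table can be sketched as follows, using the normal approximation with a tie correction. This is an illustrative sketch, not the authors' procedure: the function names and example scores are hypothetical, and the sign convention (positive z when own-handler scores tend to be higher) is an assumption that may differ from the convention used by the software behind the published values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank_z(own, other):
    """Return (z, two-sided p) for paired scores via the normal approximation.

    Assumed convention: z > 0 means `own` scores tend to exceed `other`.
    """
    # Zero differences carry no direction and are dropped.
    diffs = [a - b for a, b in zip(own, other) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    # Tie correction: each group of t tied |differences| reduces the variance.
    tie_sizes = {}
    for r in ranks:
        tie_sizes[r] = tie_sizes.get(r, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values()) / 48
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)
    z = (w_plus - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical paired ratings of the same searches (not study data).
own = [4, 5, 4, 5, 4, 5, 4, 5]
other = [3, 3, 3, 3, 3, 3, 3, 3]
z, p = wilcoxon_signed_rank_z(own, other)
```

With small samples an exact test is usually preferable to the normal approximation; the approximation is used here only because it yields a z value comparable in form to those tabulated.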
Study 1: Significant differences within pairs of handlers for each trait as shown by Wilcoxon Signed Rank tests.
| Pair 1 | 1 | own* | other* | other* | |||||
| 2 | own* | own** | own* | ||||||
| Pair 2 | 3 | ||||||||
| 4 | other* | ||||||||
| Pair 3 | 5 | other* | other* | ||||||
| 6 | own* | own** | own** | own* | |||||
| Pair 4 | 7 | own* | own* | own* | |||||
| 8 | own* | own* | other* | own** | own* | own* | |||
| Pair 5 | 9 | other* | other* | ||||||
| 10 | own** | own* | |||||||
| Pair 6 | 11 | own** | own** | other** | |||||
| 12 | own* | own* | own* | own* | |||||
Own denotes the dog's handler scored them significantly higher, other denotes the other handler rated the dog higher (p < 0.05*; p < 0.01**).
Study 2: Levels of agreement between scores given by own handler, other handlers, and experts.
| Control | 0.587 | |||
| Motivation | 0.460 | 0.331 | 0.135 | 0.110 |
| Stamina | 0.443 | 0.191 | 0.187 | 0.129 |
| Distraction | 0.179 | 0.448 | ||
| Confidence | 0.470 | |||
| Independence | 0.184 | 0.370 | ||
| Indication | 0.587 | 0.393 | 0.217 | 0.519 |
| Overall ability | 0.403 | −0.028 | −0.113 | 0.421 |
Kendall's coefficient of concordance (W) for the 3-way comparison and Spearman's rho (r.
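For readers unfamiliar with Kendall's W, the following minimal sketch shows how a coefficient of concordance can be computed for several raters scoring the same set of dogs. It uses the standard formula with a correction for tied scores; the function names and the example ratings are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for `ratings`: one list of scores per rater, same subjects.

    W = 12 S / (m^2 (n^3 - n) - m * sum_j T_j), where S is the sum of squared
    deviations of the subjects' rank totals from their mean, and T_j sums
    t^3 - t over the groups of tied scores given by rater j.
    Raises ZeroDivisionError if every rater scores all subjects identically.
    """
    m = len(ratings)      # number of raters
    n = len(ratings[0])   # number of subjects rated
    rank_rows = [average_ranks(row) for row in ratings]
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)
    tie_sum = 0
    for row in rank_rows:
        sizes = {}
        for r in row:
            sizes[r] = sizes.get(r, 0) + 1
        tie_sum += sum(t ** 3 - t for t in sizes.values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_sum)


# Hypothetical scores from an own handler, another handler, and an expert
# for the same five dogs (not study data).
ratings = [
    [4, 5, 3, 4, 2],
    [3, 5, 3, 4, 2],
    [3, 4, 2, 4, 1],
]
w = kendalls_w(ratings)
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings), which is why it suits a comparison across three rater groups where a pairwise correlation would not.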
Figure 1. Differences between own (nine raters), mean of other handler (n = 8) and expert (n = 3) ratings for the performance traits. Asterisks denote significant differences seen between specific raters, using pair-wise Wilcoxon Signed Ranks tests (p < 0.05*; p < 0.01**).