Paul Walsh¹, Justin Thornton², Julie Asato², Nicholas Walker², Gary McCoy², Joe Baal², Jed Baal², Nanse Mendoza², Faried Banimahd³.
Abstract
Objectives. To measure inter-rater agreement on the overall clinical appearance of febrile children aged less than 24 months and to compare methods for doing so.
Study Design and Setting. We performed an observational study of the inter-rater reliability of the assessment of febrile children in a county hospital emergency department serving a mixed urban and rural population. Two emergency medicine healthcare providers independently evaluated the overall clinical appearance of children less than 24 months of age who had presented for fever. They recorded an initial 'gestalt' assessment of whether or not the child was ill appearing, or whether they were unsure. They then repeated this assessment after examining the child. Each rater was blinded to the other's assessment. Our primary analysis was graphical. We also calculated Cohen's κ, Gwet's agreement coefficient (AC), and other measures of agreement, along with weighted variants of these. We examined the effect of the time between exams and of patient and provider characteristics on inter-rater agreement.
Results. We analyzed 159 of the 173 patients enrolled. Median age was 9.5 months (lower and upper quartiles 4.9-14.6), 99/159 (62%) were boys, and 22/159 (14%) were admitted. Overall, 118/159 (74%) and 119/159 (75%) were classified as well appearing on initial 'gestalt' impression by the two examiners, respectively. Summary statistics varied from 0.223 for weighted κ to 0.635 for Gwet's AC2. Inter-rater agreement was affected by the time interval between the evaluations and by the age of the child, but not by the experience levels of the rater pairs. Classifications of 'not ill appearing' were more reliable than others.
Conclusion. The inter-rater reliability of emergency providers' assessment of overall clinical appearance was adequate when described graphically and by Gwet's AC. Different summary statistics yield different results for the same dataset.
Keywords: Clinical appearance; Cohen’s kappa; Emergency medicine; Fever; Graphical analysis; Gwet’s AC; Inter-rater agreement; Pediatric
Year: 2014 PMID: 25401054 PMCID: PMC4230550 DOI: 10.7717/peerj.651
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Diagnoses by gestalt classification for each rater.
Percentages sum to more than 100 because of rounding.

| Diagnosis | n (%) | First rater: Ill | First rater: Unsure | First rater: Not ill | Second rater: Ill | Second rater: Unsure | Second rater: Not ill |
|---|---|---|---|---|---|---|---|
| | 14 (9) | 3 | 3 | 8 | 3 | 1 | 10 |
| | 4 (3) | 1 | 0 | 3 | 1 | 0 | 3 |
| | 9 (6) | 2 | 0 | 7 | 2 | 1 | 6 |
| | 6 (4) | 1 | 1 | 4 | 1 | 1 | 4 |
| | 8 (5) | 1 | 1 | 6 | 0 | 0 | 8 |
| | 6 (4) | 4 | 0 | 2 | 3 | 0 | 3 |
| | 3 (2) | 1 | 0 | 2 | 1 | 0 | 2 |
| | 36 (23) | 3 | 3 | 30 | 5 | 1 | 30 |
| | 2 (1) | 0 | 0 | 2 | 0 | 0 | 2 |
| | 2 (1) | 0 | 0 | 2 | 0 | 0 | 2 |
| | 46 (29) | 5 | 5 | 36 | 4 | 5 | 37 |
| | 1 (1) | 0 | 0 | 1 | 0 | 0 | 1 |
| | 2 (1) | 0 | 1 | 1 | 0 | 0 | 2 |
| | 5 (3) | 2 | 0 | 3 | 2 | 0 | 3 |
| | 3 (2) | 0 | 0 | 3 | 0 | 0 | 3 |
| | 12 (8) | 4 | 0 | 8 | 5 | 0 | 7 |
| Total | 159 (100) | 27 | 14 | 118 | 27 | 9 | 123 |

Notes.
Urinary tract infection.
Upper respiratory tract infection.
Not otherwise specified.
The levels of training of the first and second raters and numbers of patients seen by each combination.

| First rater | Second rater: PGY1/2 | PGY3/4 | MLP | Attending | Total |
|---|---|---|---|---|---|
| PGY1/2 | 6 (4) | 16 (10) | 2 (1) | 6 (4) | 30 (19) |
| PGY3/4 | 9 (6) | 33 (21) | 3 (2) | 29 (18) | 74 (47) |
| MLP | 5 (3) | 21 (13) | 8 (5) | 6 (4) | 40 (25) |
| Attending | 4 (3) | 11 (7) | 0 (0) | 0 (0) | 15 (9) |
| Total | 24 (15) | 81 (51) | 13 (8) | 41 (26) | 159 (100) |

Notes.
PGY, postgraduate year of training; MLP, mid-level provider; Attending, board-certified emergency physician.
Rounded percentages in parentheses.
Figure 1: Time between evaluations.
Histogram showing the time interval between the first and second raters' gestalt assessment of clinical appearance. Six outliers with interval >240 min are not shown.
Raw inter-rater agreement for immediate 'gestalt' impression of the child's overall clinical appearance.

| First rater gestalt impression | Second rater: Not ill appearing | Second rater: Unsure | Second rater: Ill appearing | Total |
|---|---|---|---|---|
| Not ill appearing | 94 | 11 | 13 | 118 |
| Unsure | 12 | 0 | 2 | 14 |
| Ill appearing | 14 | 5 | 8 | 27 |
| Total | 120 | 16 | 23 | 159 |
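The raw gestalt counts above are enough to recompute several of the summary statistics reported in this paper. The sketch below is an illustration, not the authors' code: it applies the standard definitions of observed agreement, Cohen's κ, and Gwet's AC1 (chance agreement from average marginal proportions) to the published table, with categories ordered not ill / unsure / ill.

```python
# Recompute agreement statistics from the published gestalt table
# (rows = first rater, columns = second rater; order: not ill, unsure, ill).
table = [
    [94, 11, 13],  # first rater: not ill appearing
    [12,  0,  2],  # first rater: unsure
    [14,  5,  8],  # first rater: ill appearing
]
n = sum(map(sum, table))                         # 159 patients
row_tot = [sum(row) for row in table]            # first-rater marginals
col_tot = [sum(col) for col in zip(*table)]      # second-rater marginals

po = sum(table[i][i] for i in range(3)) / n      # observed agreement
pe_kappa = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
kappa = (po - pe_kappa) / (1 - pe_kappa)         # Cohen's kappa

# Gwet's AC1: chance agreement pe = (1/(k-1)) * sum_q pi_q * (1 - pi_q),
# where pi_q is the average of the two raters' marginal proportions.
pi = [(r + c) / (2 * n) for r, c in zip(row_tot, col_tot)]
pe_ac1 = sum(p * (1 - p) for p in pi) / (len(pi) - 1)
ac1 = (po - pe_ac1) / (1 - pe_ac1)

print(round(po, 3), round(kappa, 3), round(ac1, 3))   # 0.642 0.118 0.55
```

These recomputed values match the published κ of 0.119 and AC1 of 0.550 to within rounding, and they illustrate the paper's central point: the same 64% raw agreement yields a κ near 0.12 but an AC1 near 0.55.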
Raw inter-rater agreement for clinical impression of the child's overall appearance after examining the child.

| First rater after examining child | Second rater: Not ill appearing | Second rater: Unsure | Second rater: Ill appearing | Total |
|---|---|---|---|---|
| Not ill appearing | 103 | 6 | 14 | 123 |
| Unsure | 8 | 0 | 1 | 9 |
| Ill appearing | 14 | 2 | 11 | 27 |
| Total | 125 | 8 | 26 | 159 |
Intra-rater agreement between initial gestalt assessment and assessment following examination of overall clinical appearance for the first and second raters.
| First rater gestalt impression | After exam: Not ill appearing | After exam: Unsure | After exam: Ill appearing | Total |
|---|---|---|---|---|
| Not ill appearing | 113 | 3 | 2 | 118 |
| Unsure | 8 | 4 | 2 | 14 |
| Ill appearing | 2 | 2 | 23 | 27 |
| Total | 123 | 9 | 27 | 159 |
Figure 2: Classification selected and provider training.
Frequency of classification selected by provider experience. PGY, postgraduate year; MLP, mid-level provider.
Figure 3: Graphical analysis of agreement between examiners.
Agreement between examiners' initial 'gestalt' impressions, agreement between examiners' impressions after completing their exams, and a simulation showing uniform random agreement.
Inter-rater reliability summary measures.
Inter-rater reliability measured by Cohen's κ, weighted κ using two commonly employed weighting schemes, Scott's π and its weighted variants, polychoric correlation, and Gwet's AC1 and AC2.

| | Cohen's κ | Weighted κ (l) | Weighted κ (q) | Scott's π | Scott's π (l) | Scott's π (q) | Polychoric correlation | Gwet's AC1 | Gwet's AC2 (l) | Gwet's AC2 (q) |
|---|---|---|---|---|---|---|---|---|---|---|
| Inter-rater gestalt | 0.119 | 0.181 | 0.223 | 0.118 | 0.177 | 0.261 | 0.334 | 0.550 | 0.601 | 0.635 |
| Inter-rater after exam | 0.235 | 0.283 | 0.314 | 0.216 | 0.261 | 0.289 | 0.482 | 0.655 | 0.672 | 0.683 |
| Intra-rater first rater | 0.690 | 0.781 | 0.844 | 0.695 | 0.777 | 0.833 | 0.955 | 0.852 | 0.893 | 0.920 |
| Intra-rater second rater | 0.651 | 0.714 | 0.758 | 0.671 | 0.734 | 0.777 | 0.912 | 0.837 | 0.871 | 0.893 |

Notes.
(l) linear weighted, with weights 1 − |i − j|/(k − 1); (q) quadratic weighted, with weights 1 − [(i − j)/(k − 1)]², where i and j index the rows and columns of the ratings by the raters and k is the maximum number of possible ratings. This table is expanded to include other measures of inter-rater agreement in the appendices.
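The two weighting schemes described in the notes can be applied directly to the raw gestalt table reported earlier. The sketch below is an illustration, not the authors' code: it uses the standard weighted-κ definition (both observed and chance agreement weighted) with the linear and quadratic weights given above; small differences from the published 0.181 and 0.223 likely reflect implementation details of the software used in the paper.

```python
# Weighted kappa on the published gestalt table
# (rows = first rater, columns = second rater; order: not ill, unsure, ill).
table = [
    [94, 11, 13],
    [12,  0,  2],
    [14,  5,  8],
]
k = 3                                        # number of rating categories
n = sum(map(sum, table))
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]

def weighted_kappa(counts, weight):
    """Weighted kappa: weight(i, j) gives partial credit for near-misses."""
    po = sum(weight(i, j) * counts[i][j] for i in range(k) for j in range(k)) / n
    pe = sum(weight(i, j) * row_tot[i] * col_tot[j]
             for i in range(k) for j in range(k)) / n ** 2
    return (po - pe) / (1 - pe)

linear = lambda i, j: 1 - abs(i - j) / (k - 1)          # 1 - |i-j|/(k-1)
quadratic = lambda i, j: 1 - ((i - j) / (k - 1)) ** 2   # 1 - [(i-j)/(k-1)]^2

print(round(weighted_kappa(table, linear), 3))     # 0.177
print(round(weighted_kappa(table, quadratic), 3))  # 0.22
```

Quadratic weights give more partial credit to one-level disagreements (e.g., ill vs. unsure) than linear weights do, which is why the quadratic coefficient exceeds the linear one on the same data.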
| Agreement category | Reviewer 1 rating: Reviewer 2 rating |
|---|---|
| Reviewers 1 and 2 agree | Ill appearing: ill appearing; Unsure: unsure; Not ill appearing: not ill appearing |
| Reviewer 1 considers patient more ill appearing (one level) | Ill appearing: unsure; Unsure: not ill appearing |
| Reviewer 1 considers patient more ill appearing (two levels) | Ill appearing: not ill appearing |
| Reviewer 1 considers patient less ill appearing (one level) | Unsure: ill appearing; Not ill appearing: unsure |
| Reviewer 1 considers patient less ill appearing (two levels) | Not ill appearing: ill appearing |