
Interobserver and intraobserver reliability for 2 grading systems for gastric ulcer syndrome in horses.

Jessica C Wise¹, Edwina J A Wilkes¹, Sharanne L Raidal¹, Gang Xie², Danielle E Crosby¹, Josephine N Hale¹, Kristopher J Hughes¹.

Abstract

BACKGROUND: Grading of equine gastric ulcer syndrome (EGUS) is undertaken in clinical and research settings, but the reliability of EGUS grading systems is poorly understood.
HYPOTHESIS/OBJECTIVES: Investigate interobserver and intraobserver reliability of an established ordinal grading system and a novel visual analog scale (VAS), and assess the influence of observer experience.
ANIMALS: Sixty deidentified gastroscopy videos.
METHODS: Six observers (3 specialists and 3 residents) graded videos using the EGUS Council (EGUC) system and VAS. Observers graded the videos 3 times for each system, using a cross-over design with at least 1 week between each phase. The order of videos was randomized for each phase. Interobserver and intraobserver reliability were estimated using Gwet's agreement coefficient with ordinal weights applied (AC2) for the EGUC system and the intraclass correlation coefficient (ICC) for the VAS.
RESULTS: Using the EGUC system, interobserver reliability was substantial for squamous (AC2 = 0.69; 95% confidence interval [CI], 0.57-0.80) and glandular mucosa (AC2 = 0.72; 95% CI, 0.70-0.75), and intraobserver reliability was substantial for squamous (AC2 = 0.80; 95% CI, 0.71-0.90) and glandular mucosa (AC2 = 0.80; 95% CI, 0.74-0.86). Interobserver reliability using the VAS was moderate for squamous (ICC = 0.64; 95% CI, 0.31-0.96) and poor for glandular mucosa (ICC = 0.35; 95% CI, 0.06-0.64), and intraobserver reliability was moderate for squamous (ICC = 0.74; 95% CI, 0.62-0.86) and glandular mucosa (ICC = 0.56; 95% CI, 0.39-0.72).
CONCLUSIONS AND CLINICAL IMPORTANCE: The EGUC system had acceptable intraobserver and interobserver reliability and performed well regardless of observer experience. Familiarity and observer experience improved reliability of the VAS.

Keywords: EGUS; VAS; grading systems; horse; reliability

Year: 2020 PMID: 33284465 PMCID: PMC7848314 DOI: 10.1111/jvim.15987

Source DB: PubMed Journal: J Vet Intern Med ISSN: 0891-6640 Impact factor: 3.175

Abbreviations: 95% CI, 95% confidence interval; AC2, Gwet's coefficient of agreement with ordinal weights applied; EGUC, Equine Gastric Ulcer Council; EGUS, equine gastric ulcer syndrome; PS, practitioner's simplified; VAS, visual analog scale

INTRODUCTION

Equine gastric ulcer syndrome (EGUS) is the most common disorder of the equine stomach. Grading of EGUS lesions may inform treatment selection and support comparison of the efficacy of different treatments and of the impact of husbandry protocols on ulcer healing. For grading systems to be useful, good inter‐ and intraobserver reliability are required to facilitate comparisons of effects of treatments within and between studies and assessment when multiple clinicians are involved in case management. A simple gastric ulcer lesion grading system based on an ordinal scale (0‐4) was described in 1999 by the Equine Gastric Ulcer Council (EGUC). This grading system can be applied to the squamous and glandular mucosa of the equine stomach. Other grading systems for EGUS have been described, including a number/severity (N/S) system, the practitioner's simplified (PS) scoring system, and ordinal systems based on ulcer depth and surface area. The EGUC grading system has higher interobserver reliability compared with the number/severity system, and currently is recommended for assessment of the squamous mucosa. However, there is no consensus on or uniformity in the use of the EGUC grading system, and uncertainty exists for glandular mucosa assessment. There are limitations in the assessment of disease when severity varies along a continuum, as occurs in EGUS, because ordinal grading systems require strict categorization of severity according to predetermined criteria or definitions. Visual analog scales (VAS) are used in complex clinical contexts to facilitate assessment of subjective characteristics that cannot be directly measured and allow users to integrate multiple variables into a single continuous variable.
Previously, VAS have been used for grading of gastrointestinal lesions in humans and may provide advantages over an ordinal scale‐based approach for assessment of gastric ulceration in horses, including collection of continuous data, which allows for different statistical analysis options. Our aims were to investigate (a) interobserver and intraobserver reliability of the EGUC grading system and a novel VAS for assessment of squamous and glandular gastric mucosal lesions and (b) the influence of observer experience on the outcomes for both systems. We hypothesized that the use of a VAS would result in superior estimates of reliability compared to the EGUC grading system and that experienced observers would have higher reliability for grading of gastric lesions than would less‐experienced observers.

MATERIALS AND METHODS

Horses

Sixty prerecorded, deidentified gastroscopy videos, obtained from horses during unrelated research projects, were used. For inclusion, visualization of the greater curvature, margo plicatus, lesser curvature, glandular mucosa, and pyloric antrum was required. Videos were selected by a single technician who did not participate in the study and who attempted to include an even distribution of lesion severity, based on gastric mucosal appearance.

Grading systems

Two grading systems were used: the EGUC system (Table 1) and a novel VAS (Figure 1). The VAS was a 10 cm line anchored at both ends with words descriptive of the maximal and minimal extremes of the dimension being measured. The VAS was used as a 100‐point continuous scale. Separate scores were recorded for the squamous and glandular mucosa for both systems.

TABLE 1

The Equine Gastric Ulcer Council (EGUC) 5‐point ordinal grading system for grading squamous and glandular gastric disease

Grade	Squamous mucosa	Glandular mucosa
0	The epithelium is intact and there is no appearance of hyperkeratosis	The epithelium is intact and there is no appearance of hyperemia
1	The mucosa is intact, but there are areas of hyperkeratosis	The epithelium is intact, but there are areas of hyperemia
2	Small, single or multifocal lesions	Small, single, or multifocal lesions
3	Large single or extensive superficial lesions	Large single or extensive superficial lesions
4	Extensive lesions with areas of apparent deep ulceration	Extensive lesions with area of apparent deep ulceration

FIGURE 1

The visual analog scoring system for grading the appearance of squamous and glandular gastric mucosa

Observers

Six observers were included: 3 specialists in equine medicine and 3 residents in equine disciplines (medicine, surgery and sports medicine). The observers graded the videos 3 times for each system. The grading systems were used alternately in a cross‐over design with at least 1 week between each of the 6 phases of the study. For each phase, the order of videos was randomized to avoid pattern recognition that might contribute to measurement bias and influence study validity.
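The phase structure described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which grading system was used first, whether all observers shared the same video order, or how the randomization was performed, so the function name `build_schedule`, the starting system, and the seed are hypothetical.

```python
import random


def build_schedule(n_videos=60, n_repeats=3, systems=("EGUC", "VAS"), seed=0):
    """Return one entry per phase: phase number, grading system, video order.

    Assumptions (not stated in the source): the systems simply alternate,
    starting with the first entry of `systems`, and every phase draws a
    fresh random permutation of the video identifiers 1..n_videos.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    schedule = []
    for _ in range(n_repeats):
        for system in systems:
            order = list(range(1, n_videos + 1))
            rng.shuffle(order)  # new random order for each phase
            schedule.append(
                {"phase": len(schedule) + 1, "system": system, "order": order}
            )
    return schedule


schedule = build_schedule()
for entry in schedule:
    print(entry["phase"], entry["system"], entry["order"][:5])
```

With the default arguments this yields 6 phases (3 per system), each containing all 60 videos exactly once. The required interval of at least 1 week between phases is a scheduling constraint and is not modeled here.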

Statistical analysis

All statistical analyses were performed using R statistical software (R version 3.6.0 [2019]). For the EGUC system, intra‐ and interobserver reliability were assessed using Gwet's coefficient of agreement with ordinal weighting applied (AC2). Interpretation of AC2 was derived from a previously proposed system: ≤0.20: poor, 0.21 to 0.40: fair, 0.41 to 0.60: moderate, 0.61 to 0.80: substantial, and 0.81 to 1.0: excellent reliability. For the VAS, observer reliability was estimated by calculation of the intraclass correlation coefficient (ICC) based on a mean rating (k = 6), absolute agreement, 2‐way mixed effects model, and 95% confidence interval (CI). The benchmarking of ICC values was adapted from previous studies: <0.50: poor, 0.50 to 0.75: moderate, 0.76 to 0.90: good, and >0.9: excellent reliability. For both the squamous and glandular mucosa, interobserver reliability coefficients were calculated for each of the 3 phases for each grading system, and the mean and 95% CI were calculated. Interobserver reliability was calculated for the 3 experienced observers (observers 1‐3) and the 3 less‐experienced observers (observers 4‐6), and the mean and 95% CI were calculated for observer groups across the 3 phases. The intraobserver reliability coefficients for the 6 observers were calculated from ratings obtained over the 3 phases of the study for each grading system. The mean and 95% CI for the reliability coefficients were calculated for all observers, experienced observers, and less‐experienced observers.
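To make the AC2 statistic concrete, the following is a minimal Python sketch of Gwet's agreement coefficient with ordinal weights for any number of raters, written from the formulas as they are commonly presented for this coefficient. It is not the authors' R code (the source does not name the package or functions used); the function names, the data layout, and the toy ratings are assumptions made for illustration, and no confidence interval is computed.

```python
from math import comb


def ordinal_weights(q):
    """Ordinal weight matrix for q ordered categories.

    w[k][l] = 1 - C(|k - l| + 1, 2) / C(q, 2), so identical grades get
    weight 1 and the two extreme grades get weight 0.
    """
    m_max = comb(q, 2)
    return [
        [1.0 - comb(abs(k - l) + 1, 2) / m_max for l in range(q)]
        for k in range(q)
    ]


def gwet_ac2(ratings, q):
    """Gwet's AC2 for ratings given as one list of grades per subject.

    Grades are integers 0..q-1 (for the EGUC system, q = 5). Every subject
    is assumed to have at least one rating; subjects with fewer than two
    ratings contribute to category prevalence but not to observed agreement.
    """
    w = ordinal_weights(q)
    t_w = sum(sum(row) for row in w)  # sum of all weights
    n = len(ratings)
    prevalence = [0.0] * q  # mean share of raters choosing each category
    agreement_sum = 0.0
    n_multi = 0
    for subject in ratings:
        r_i = len(subject)
        counts = [subject.count(k) for k in range(q)]
        for k in range(q):
            prevalence[k] += counts[k] / r_i / n
        if r_i >= 2:
            n_multi += 1
            agree = 0.0
            for k in range(q):
                # weighted count of raters who chose k or a nearby grade
                r_star = sum(w[k][l] * counts[l] for l in range(q))
                agree += counts[k] * (r_star - 1.0)
            agreement_sum += agree / (r_i * (r_i - 1))
    p_a = agreement_sum / n_multi  # weighted observed agreement
    p_e = t_w / (q * (q - 1)) * sum(p * (1.0 - p) for p in prevalence)
    return (p_a - p_e) / (1.0 - p_e)


# Invented toy data: 4 videos, 2 observers, grades 0-4.
near_miss = [[0, 0], [1, 2], [3, 3], [4, 4]]  # one disagreement by 1 grade
far_miss = [[0, 0], [1, 4], [3, 3], [4, 4]]   # one disagreement by 3 grades
print(round(gwet_ac2(near_miss, 5), 3), round(gwet_ac2(far_miss, 5), 3))
```

Because of the ordinal weights, a disagreement of 1 grade is penalized far less than a disagreement of 3 grades, which is why the first toy data set scores higher than the second.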

RESULTS

EGUS Council grading system

Interobserver reliability

Results of the analyses of interobserver reliability of the EGUC system for squamous and glandular gastric mucosa are provided in Figure 2 and Supplementary Item 1. Substantial interobserver reliability was found for grading of squamous (mean AC2, 0.69; 95% CI, 0.57‐0.80) and glandular mucosa (mean AC2, 0.72; 95% CI, 0.70‐0.75). Minimal difference was found in interobserver reliability of squamous or glandular mucosa over the 3 phases (Figure 2). Overall, experience had limited influence on interobserver reliability for grading of squamous or glandular mucosa (Figure 3, Supplementary Item 2), but experienced observers had higher reliability (AC2, 0.77; 95% CI, 0.70‐0.83) than did less‐experienced observers (AC2, 0.62; 95% CI, 0.52‐0.72) for glandular mucosal grading in Phase 1. Experienced observers demonstrated improvement in interobserver reliability when grading squamous mucosa between Phase 1 (AC2, 0.53; 95% CI, 0.41‐0.64) and Phase 3 (AC2, 0.74; 95% CI, 0.66‐0.81).

FIGURE 2

Results of Gwet's coefficient of agreement with ordinal weighting applied (AC2) for interobserver reliability of observers grading squamous and glandular gastric mucosa using the Equine Gastric Ulcer Council (EGUC) system on 3 occasions. The figure is presented as mean and 95% CI

FIGURE 3

Results of Gwet's coefficient of agreement with ordinal weighting (AC2) comparing the interobserver reliability of experienced observers (specialists in equine medicine) and less‐experienced observers (residents in equine disciplines) grading squamous and glandular gastric mucosa using the Equine Gastric Ulcer Council (EGUC) system on 3 occasions. The figure is presented as mean and 95% CI

Intraobserver reliability

The estimates of intraobserver reliability of the EGUC system for squamous and glandular mucosa are provided in Table 2. Overall, substantial intraobserver reliability was found for assessment of both the squamous (AC2, 0.80; 95% CI, 0.71‐0.90) and glandular mucosa (AC2, 0.80; 95% CI, 0.74‐0.86) using the EGUC grading system. Experienced observers had excellent and substantial intraobserver reliability for EGUC system grading of squamous (mean AC2, 0.83; 95% CI, 0.75‐0.92) and glandular gastric mucosa (mean AC2, 0.79; 95% CI, 0.73‐0.85), respectively. Less‐experienced observers demonstrated substantial and excellent intraobserver reliability when grading squamous (mean AC2, 0.77; 95% CI, 0.45‐1.0) and glandular mucosa (mean AC2, 0.82; 95% CI, 0.62‐1.0), respectively. For individual observers, experience had minimal influence on intraobserver reliability: pairwise comparisons did not identify differences, with the exception of a lower AC2 result for Observer 5 compared to Observers 1 and 3 for squamous mucosal grading (Table 2).

TABLE 2

Results of Gwet's coefficient of agreement with ordinal weighting (AC2) for the intraobserver reliability of scoring of glandular and squamous gastric mucosa with the Equine Gastric Ulcer Council (EGUC) grading system. The mean AC2 has been calculated for the intraobserver reliability of experienced observers (specialists in equine medicine) and less experienced observers (residents in equine disciplines)

	Glandular mucosa			Squamous mucosa		
	AC2	95% CI lower limit	95% CI upper limit	AC2	95% CI lower limit	95% CI upper limit
Experienced observers
Observer 1	0.76	0.66	0.87	0.83	0.74	0.92
Observer 2	0.81	0.76	0.87	0.80	0.73	0.87
Observer 3	0.79	0.70	0.87	0.87	0.83	0.91
Mean (n = 3)	0.79	0.73	0.85	0.83	0.75	0.92
Less experienced observers
Observer 4	0.89	0.85	0.93	0.83	0.78	0.88
Observer 5	0.73	0.64	0.82	0.62	0.50	0.74
Observer 6	0.84	0.75	0.93	0.85	0.80	0.91
Mean (n = 3)	0.82	0.62	1.0	0.77	0.45	1.0
Overall mean (n = 6)	0.80	0.74	0.86	0.80	0.71	0.90
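The group rows in Table 2 can be approximately reproduced from the individual observer coefficients. The sketch below assumes that each group mean is the arithmetic mean of the observers' AC2 values and that the 95% CI is a Student's t interval on those values, capped at 1.0. The source does not state this, so it is an inference from the tabulated numbers; the function name and the hardcoded critical values are illustrative.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for 2 and 5 degrees of freedom,
# hardcoded so that the sketch needs only the standard library.
T_CRIT_95 = {2: 4.303, 5: 2.571}


def mean_with_ci(values, upper_cap=1.0):
    """Mean and t-based 95% CI of a small set of reliability coefficients."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    margin = T_CRIT_95[n - 1] * sem
    # A reliability coefficient cannot exceed 1, so the upper limit is capped.
    return mean, mean - margin, min(mean + margin, upper_cap)


# Individual AC2 values copied from Table 2.
experienced_squamous = [0.83, 0.80, 0.87]
less_experienced_glandular = [0.89, 0.73, 0.84]
all_glandular = [0.76, 0.81, 0.79, 0.89, 0.73, 0.84]

for label, values in [
    ("Experienced, squamous", experienced_squamous),
    ("Less experienced, glandular", less_experienced_glandular),
    ("Overall, glandular", all_glandular),
]:
    mean, lower, upper = mean_with_ci(values)
    print(f"{label}: {mean:.2f} ({lower:.2f}-{upper:.2f})")
```

Under this assumption, the wide intervals for the less‐experienced group follow directly from averaging only 3 coefficients, one of which (Observer 5) differs markedly from the other 2.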

Visual analog scale

The estimates of the interobserver reliability of the VAS for grading squamous and glandular gastric mucosa are provided in Figure 4 and Supplementary Item 3. Overall, the interobserver reliability of the VAS was moderate for squamous mucosal grading (mean ICC, 0.64; 95% CI, 0.31‐0.96) and poor for glandular mucosal grading (mean ICC, 0.35; 95% CI, 0.06‐0.64). Interobserver reliability was higher for grading of the squamous mucosa than for glandular mucosa in Phase 2 (squamous ICC, 0.64; 95% CI, 0.53‐0.73; glandular ICC, 0.26; 95% CI, 0.15‐0.40) and Phase 3 (squamous ICC, 0.77; 95% CI, 0.69‐0.84; glandular ICC, 0.32; 95% CI, 0.20‐0.47), largely because of increasing reliability of squamous mucosal grading over time (Figure 4; Supplementary Item 3).

FIGURE 4

Results of the intraclass correlation coefficient (ICC), 1 way model, for the interobserver reliability of observers grading squamous and glandular gastric mucosa using the visual analog scale (VAS) on 3 occasions. The figure is presented as mean and 95% CI

Overall, experience had an effect on the interobserver reliability of the VAS. For both squamous and glandular mucosal grading, reliability coefficients for experienced observers were higher than those of less‐experienced observers for all phases (Figure 5; Supplementary Item 4), most notably for grading of the glandular mucosa in Phase 2 (Figure 5). Both experienced and less‐experienced observers demonstrated improvement in reliability of grading squamous mucosa using the VAS from Phase 1 to Phase 3, whereas, overall, interobserver reliability for grading of the glandular mucosa was poor and did not improve, regardless of experience (Figure 5).

FIGURE 5

Results of the intraclass correlation coefficient (ICC), 1 way model, comparing the interobserver reliability of experienced observers (specialists in equine medicine) and less‐experienced observers (residents in equine disciplines) grading squamous and glandular gastric mucosa using the visual analog scale (VAS) on 3 occasions. The figure is presented as mean and 95% CI

The estimates of intraobserver reliability of the VAS for grading of squamous and glandular gastric mucosa are provided in Table 3. Overall, intraobserver reliability using the VAS was moderate for grading of both the squamous (mean ICC, 0.74; 95% CI, 0.62‐0.86) and glandular mucosa (mean ICC, 0.56; 95% CI, 0.39‐0.72). By group, experienced observers had good and moderate intraobserver reliability for VAS grading of squamous (mean ICC, 0.83; 95% CI, 0.65‐1.0) and glandular mucosa (mean ICC, 0.65; 95% CI, 0.50‐0.80), respectively, whereas less‐experienced observers had moderate reliability when grading squamous mucosa (mean ICC, 0.67; 95% CI, 0.46‐0.86) and poor reliability for glandular mucosal grading (mean ICC, 0.46; 95% CI, 0.00‐0.92). Individual pair‐wise comparisons indicated some differences with experience. For squamous mucosal grading, observers 4 and 6 had lower reliability than did observers 1‐3, whereas for glandular grading, observer 5 had lower reliability than did observers 1 and 3, and observer 4 had lower reliability than did observer 3.

TABLE 3

Results of the intraclass correlation coefficient (ICC), one way model, for the intraobserver reliability of grading squamous and glandular mucosa using the novel visual analog scale. The mean ICC has been calculated for experienced observers (specialists in equine medicine) and less‐experienced observers (residents in equine disciplines)

	Glandular mucosa			Squamous mucosa		
	ICC	95% CI lower limit	95% CI upper limit	ICC	95% CI lower limit	95% CI upper limit
Experienced observers
Observer 1	0.68	0.53	0.79	0.75	0.64	0.83
Observer 2	0.58	0.41	0.72	0.85	0.78	0.90
Observer 3	0.69	0.56	0.80	0.89	0.83	0.93
Mean (n = 3)	0.65	0.50	0.80	0.83	0.65	1.0
Less experienced observers
Observer 4	0.41	0.25	0.56	0.59	0.45	0.71
Observer 5	0.31	0.15	0.48	0.75	0.65	0.83
Observer 6	0.67	0.54	0.77	0.64	0.52	0.75
Mean (n = 3)	0.46	0.0	0.92	0.67	0.46	0.86
Overall mean (n = 6)	0.56	0.39	0.72	0.74	0.62	0.86
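The benchmark bands given in the Statistical analysis section can be written as two small lookup functions, which makes it easy to check the verbal label attached to any coefficient in Tables 2 and 3. The function names are hypothetical, and the handling of values that fall between the published bands (for example, between 0.75 and 0.76) is an assumption: such values are assigned to the higher band here.

```python
def classify_ac2(ac2):
    """Label an AC2 value using the bands quoted for the EGUC system."""
    if ac2 <= 0.20:
        return "poor"
    if ac2 <= 0.40:
        return "fair"
    if ac2 <= 0.60:
        return "moderate"
    if ac2 <= 0.80:
        return "substantial"
    return "excellent"


def classify_icc(icc):
    """Label an ICC value using the bands quoted for the VAS."""
    if icc < 0.50:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Mean coefficients copied from Tables 2 and 3.
print(classify_ac2(0.80), classify_ac2(0.83))  # overall and experienced squamous AC2
print(classify_icc(0.74), classify_icc(0.56))  # overall squamous and glandular ICC
print(classify_icc(0.83), classify_icc(0.46))  # experienced squamous, less experienced glandular
```

Labels assigned this way depend only on the point estimate; as the Discussion notes, the width of the CI should also be considered before characterizing reliability.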

DISCUSSION

We comprehensively investigated interobserver and intraobserver reliability of the EGUC system and introduced a novel VAS for scoring the endoscopic appearance of the equine stomach. Overall, the EGUC system had substantial interobserver and intraobserver reliability for grading of both squamous and glandular mucosa, and reliability was minimally influenced by experience. The reliability of the VAS was more variable, with poor reliability for grading glandular mucosa, and was influenced by observer experience and familiarity with the system. In our study, the EGUC system demonstrated substantial interobserver reliability and substantial to excellent intraobserver reliability. These results are consistent with the findings of an earlier study in which good interobserver agreement of the EGUC system was reported. Similarly, ordinal grading systems are used for the assessment of lameness, heart murmurs and ataxia in horses, and moderate to substantial interobserver and intraobserver reliability and agreement for these systems have been reported. Given widespread application of ordinal grading systems in veterinary clinical practice and research, determination of intra‐ and interobserver agreement and reliability of each system is important. Whereas agreement reflects the extent to which scores, ratings or diagnoses are identical, reliability is the ratio of variability between scores or ratings of the same patients to the total variability of all scores in the sample and represents the ability of a measurement to differentiate between patients. Both agreement and reliability are important for the development of rating scales and conduct of clinical studies, and provide information on the error inherent in measurement, rating, or diagnosis.
Although agreement is desirable for binary decisions, such as whether to institute treatment or not, our results indicate good ability of observers to distinguish between ulcer severity when using the EGUC system, which remains important in clinical and research settings because it indicates that this system can be used for comparison among studies and assessment of animal responses to treatment and management changes. There was minimal influence of experience on the interobserver or intraobserver reliability of the EGUC system. Within the group of less‐experienced observers, 2 of the 3 observers had no previous experience in using the EGUC grading system. The experienced observers all were specialists in equine internal medicine, with extensive clinical and research experience using the EGUC grading system. Our findings indicate that interobserver and intraobserver reliability of the EGUC system are similar whether it is used by observers unfamiliar with the grading system or by observers experienced in its use. Furthermore, interobserver reliability was not different when grading squamous or glandular mucosa. Intraobserver reliability was slightly better than interobserver reliability, possibly reflecting different interpretation of the EGUC system scale among individuals, but good ability of individual observers to repeatedly apply the grading system scale in the same way. Intraobserver reliability has been reported to be higher than interobserver reliability for other ordinal grading systems, which may reflect consistency in the interpretation or application of the grading system within observers, but differences in interpretation of the grading system among observers. Differences in interpretation of a grading system have been speculated to be affected by clinical experience and opinions of the disorder being assessed. However, the impact of experience on interobserver or intraobserver reliability of the EGUC grading system was minimal in our study.
In our study, reliability of the EGUC system was estimated using Gwet's weighted agreement coefficient (AC2). In previous studies, Gwet's AC statistics have been found to provide good estimates of intra‐ and interobserver reliability for categorical scoring systems in human medicine. Gwet's AC1 is a first‐order agreement coefficient that is an alternative to the kappa coefficient and adjusts the overall probability of agreement for chance agreement. Although the AC1 statistic can be used for any number of raters, this coefficient is used primarily for nominal data. The second‐order agreement coefficient, Gwet's AC2 statistic, is a weighted version of AC1 that adjusts for chance agreement and accounts for misclassification errors and nonabsolute agreement, and is recommended for analyzing ordinal, interval, and ratio data. Other estimates of reliability, including Cohen's kappa and weighted Cohen's kappa statistics, have been used to estimate the interobserver and intraobserver reliability of ordinal grading systems. The advantage of Gwet's AC2 statistic over other estimates of agreement, including Cohen's kappa, is that it is paradox‐resistant and expected to provide a more accurate estimate of observer reliability because other estimates of agreement often are influenced by the number of categories available and the proportion of subjects in each category, creating a paradox whereby a low agreement coefficient is calculated despite good reliability. To our knowledge, ours is the first study to use Gwet's AC2 statistic to assess the reliability of the EGUC grading system. The results indicated that reliability was substantial to excellent, within and between observers, for rating of both squamous and glandular mucosal lesions using the EGUC system (ie, observers graded lesions similarly but not identically).
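The paradox referred to above can be illustrated with a small 2‐rater, 2‐category example. The counts below are invented for illustration and do not come from the study, and the function name is hypothetical. The sketch computes unweighted Cohen's kappa and Gwet's AC1 from the same 2 × 2 table.

```python
def kappa_and_ac1(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa and Gwet's AC1 for two raters and two categories.

    both_pos / both_neg: subjects on which the raters agree;
    a_only / b_only: subjects rated positive by only rater A or only rater B.
    """
    n = both_pos + a_only + b_only + both_neg
    p_o = (both_pos + both_neg) / n  # observed agreement
    a_pos = (both_pos + a_only) / n  # rater A's proportion of positives
    b_pos = (both_pos + b_only) / n  # rater B's proportion of positives

    # Kappa: chance agreement from the product of each rater's marginals.
    pe_kappa = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    kappa = (p_o - pe_kappa) / (1 - pe_kappa)

    # AC1: chance agreement from the average prevalence of each category.
    pi_pos = (a_pos + b_pos) / 2
    pe_ac1 = 2 * pi_pos * (1 - pi_pos)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1


# Invented example: 100 subjects, raters agree on 80, almost all rated positive.
kappa, ac1 = kappa_and_ac1(both_pos=80, a_only=10, b_only=10, both_neg=0)
print(f"kappa = {kappa:.2f}, AC1 = {ac1:.2f}")
```

In this example the raters agree on 80% of subjects, yet kappa is slightly negative because nearly all ratings fall in 1 category, whereas AC1 remains well above 0.7. AC2 applies the same chance correction with ordinal weights added.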
Although observers may grade lesions similarly, differences in the interpretation of the EGUC system remain possible, which is important when applying this system to measure treatment efficacy, as has been done previously. In some studies, a difference of 1 grade was considered a treatment effect or improvement, but our findings indicate that the intraobserver reliability of the EGUC system is not perfect, requiring consideration when assessing responses to treatment. Similarly, consideration of the interobserver reliability of the EGUC system is necessary when several clinicians are involved in the assessment of treatment responses in an individual animal, because our results suggest that observers grade mucosal lesions in a similar but not identical way. The severity of gastric lesions in our study varied along a continuum, presenting challenges for categorization using the defined grading criteria. Although the use of ordinal scales to assess changes is simpler for extreme categories, observer agreement can be more challenging for borderline categories or for assessment of mild and moderate disease, leading to higher variability and misclassification errors. Our study introduces the use of a VAS to grade EGUS, in an attempt to provide an alternative to the ordinal grading system. In our study, the inter‐ and intraobserver reliability of the VAS improved with time and there was some influence of observer experience. Because none of the observers in our study had previous experience in using the VAS, the difference in reliability between the 2 groups is likely more reflective of knowledge and clinical experience than of familiarity with the grading system. The reliability results for the VAS for experienced observers in our study were similar to those reported for clinical assessments in human dentistry (0.69‐0.92) and human medicine (0.77‐0.91).
In our study, interobserver reliability of the VAS for grading of the squamous mucosa improved over time (Figures 4 and 5), which may reflect conditioning of observers to the VAS and increasing familiarity with the system. Previously, consistency among observers using a VAS has been improved by consensus meetings and by the use of guide points (anchors) adjacent to the scale. Further investigation into the value of training clinicians in the use of VAS, and whether reliability of this system is improved with repeated utilization of the system by observers, is warranted. Our results suggest that inter‐ and intraobserver reliability of grading squamous gastric mucosa with the VAS could be improved using these techniques. For the VAS, inter‐ and intraobserver reliability were better for grading squamous mucosa than glandular mucosa. These results likely reflect an observer's ability to grade severity of squamous gastric mucosal lesions and difficulties in interpreting glandular lesions and application of hierarchical grading systems. Although we tried to include a broad spectrum of squamous and glandular disease in the study, most of the included gastroscopy videos featured mild to moderate glandular disease, and very few chronic severe gastric lesions were available. The poor inter‐ and intraobserver reliability found when using the VAS to grade glandular gastric lesions may be compounded by the included observers' inexperience with using the VAS grading system, as well as the complexity of interpreting mild to moderate glandular gastric lesions. The poor reliability of the VAS also may be explained by the larger number of possible scores (continuous scale) when compared with the EGUC system. Similarly, in a previous study, the N/S system, which contains a higher number of categories, had poorer reliability when compared with the EGUC. The poor reliability of both the VAS and N/S systems in comparison with the EGUC may reflect the greater ease of grading consistently when fewer categories are available.
Visual analog scales have been established as valid and reliable in a range of clinical and research applications. In our study, the novel VAS used was designed as previously recommended using a 10 cm line with words descriptive of the maximal and minimal extremes of the dimension being measured. The 10 cm line was used as a 100‐point continuous scale and data were used for estimation of observer reliability. Intermediate points were not used in the VAS to avoid false clustering of scores around an intermediate point or numbers. The use of a VAS results in collection of continuous data, permitting a wider range of statistical analysis options and the potential for higher power and sensitivity of outcome rankings. The location and dispersion of scores might give information on the extent to which the observer takes advantage of the length of the scale, but this was not evaluated in our study. The use of benchmarking reliability coefficients allows for practical application and interpretation of results. However, the margin of error associated with the reliability coefficient also should be included in interpretation of the results. The estimates of reliability calculated for the EGUC and VAS, by the AC2 coefficient and ICC, respectively, cannot be directly compared. As such, 2 different benchmarking systems were used to reflect the 2 different statistical methods used to estimate reliability in our study. Application of only the reliability coefficient to determine the benchmark often leads to an overly optimistic characterization of the extent of reliability. In our study, all measures of reliability, for both grading systems, had wide CIs. The width of a CI reflects the precision of the calculated estimate, and precision is associated with the degree of random error, which is reduced by increasing sample size.
In our study, the small number of observers increased random error and resulted in more imprecise estimates of reliability, reflected by the wide 95% CI. When comparing the reliability coefficients, either between observers in intraobserver assessments, or over phases in interobserver assessments, the 95% CI overlapped, which reflects uncertainty as to whether a true difference existed (Figure 3). Conversely, a lack of overlap in 95% CI increases the likelihood of a true difference in results, such as the improvement in the interobserver reliability of grading squamous mucosa with the VAS shown in Figure 4. A limitation of our study was the use of prerecorded videos, rather than assessment of gastric ulceration at the time of gastroscopy. This approach was necessary to ensure appropriate stratification of gastric ulcer lesions and permit repeated evaluation of unchanged lesions. In a study of human patients, good agreement between video‐recorded and live colonoscopy examinations was found, although live assessment was perceived as easier. In a previous study, interobserver agreement in the evaluation of lameness was higher for examination of live horses, compared to video recordings. To our knowledge, comparison of live and recorded gastroscopic examinations in horses has not been performed. Glandular lesions are considered more difficult to grade than squamous lesions, and it has been suggested that the number, location and type of lesions be recorded. In our study, fewer severe glandular lesions were available for inclusion than was the case for squamous lesions, and this difference may have influenced the scoring of glandular lesions by either system. Another limitation of our study is that all included observers worked in the same referral hospital, which could have influenced the estimates of interobserver reliability.
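The overlap reasoning used above can be expressed as a simple check on pairs of 95% CIs. The intervals below are copied from the Results text and Table 2; the function name is hypothetical, and the check is only a descriptive screen, not a formal test of a difference between coefficients.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) intervals share at least one value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# EGUC interobserver AC2, experienced observers, squamous mucosa.
phase_1 = (0.41, 0.64)
phase_3 = (0.66, 0.81)
print(intervals_overlap(phase_1, phase_3))  # False: consistent with improvement

# EGUC intraobserver AC2, squamous mucosa (Table 2).
observer_1 = (0.74, 0.92)
observer_2 = (0.73, 0.87)
observer_3 = (0.83, 0.91)
observer_5 = (0.50, 0.74)
print(intervals_overlap(observer_1, observer_2))  # True: no clear difference
print(intervals_overlap(observer_5, observer_3))  # False: Observer 5 lower
```

Intervals that only touch at a shared limit, such as those of Observers 1 and 5 for squamous mucosa, count as overlapping under this definition, so borderline cases need to be judged from the underlying analysis rather than from rounded limits.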

CONCLUSION

The EGUC system for grading EGUS lesions has acceptable intraobserver and interobserver reliability and performs well regardless of clinician experience. A VAS may offer advantages in ease of use for rating of squamous mucosa, but observers should be practiced in the use of this system. Glandular lesions in horses may be more difficult to grade than squamous lesions.

CONFLICT OF INTEREST DECLARATION

Kristopher J. Hughes serves as Associate Editor for the Journal of Veterinary Internal Medicine. He was not involved in review of this manuscript.

OFF‐LABEL ANTIMICROBIAL DECLARATION

Authors declare no off‐label use of antimicrobials.

INSTITUTIONAL ANIMAL CARE AND USE COMMITTEE (IACUC) OR OTHER APPROVAL DECLARATION

As this study included videos from previous studies with appropriate ethical animal research approval, this study required no ethical animal or human research approval.

HUMAN ETHICS APPROVAL DECLARATION

Authors declare human ethics approval was not needed for this study.

Supplementary Item 1: Results of Gwet's coefficient of agreement with ordinal weighting (AC2) for the interobserver reliability of observers grading squamous and glandular gastric mucosa using the EGUC system on 3 occasions.

Supplementary Item 2: Results of Gwet's coefficient of agreement with ordinal weighting (AC2) for the interobserver reliability of experienced observers (specialists in equine medicine) and less‐experienced observers (residents in equine disciplines) grading squamous and glandular gastric mucosa using the EGUC grading system on 3 occasions.

Supplementary Item 3: Results of intraclass correlation coefficient (ICC), 1 way model, for the interobserver reliability of observers grading squamous and glandular gastric mucosa using the VAS on 3 occasions.

Supplementary Item 4: Results of the intraclass correlation coefficient (ICC), 1 way model, for the interobserver reliability of experienced (specialists in equine medicine) and less‐experienced observers (residents in equine disciplines) grading squamous and glandular gastric mucosa using the VAS on 3 occasions.

