Grader agreement, and sensitivity and specificity of digital photography in a community optometry-based diabetic eye screening program.

Luckni Sellahewa¹, Craig Simpson², Prema Maharajan², John Duffy², Iskandar Idris³.

Abstract

BACKGROUND: Digital retinal photography with mydriasis is the preferred modality for diabetes eye screening. The purpose of this study was to evaluate agreement in grading levels between primary and secondary graders and to calculate their sensitivity and specificity for identifying sight-threatening disease in an optometry-based retinopathy screening program.
METHODS: This was a retrospective study using data from 8,977 patients registered in the North Nottinghamshire retinal screening program. In all cases, the ophthalmology diagnosis was used as the arbitrator and considered to be the gold standard. Kappa statistics were used to evaluate the level of agreement between graders.
RESULTS: Agreement between primary and secondary graders was 51.4% and 79.7% for detecting no retinopathy (R0) and background retinopathy (R1), respectively. For preproliferative (R2) and proliferative retinopathy (R3) at primary grading, agreement between the primary and secondary grader was 100%. Where there was disagreement between the primary and secondary grader for R1, only 2.6% (n=41) were upgraded by an ophthalmologist. The sensitivity and specificity for detecting R3 were 78.2% and 98.1%, respectively. None of the patients upgraded from any level of retinopathy to R3 required photocoagulation therapy. The observed kappa between the primary and secondary grader was 0.3223 (95% confidence interval 0.2937–0.3509), ie, fair agreement, and between the primary grader and ophthalmology for R3 was 0.5667 (95% confidence interval 0.4557–0.6123), ie, moderate agreement.
CONCLUSION: These data provide information on the safety of a community optometry-based retinal screening program for screening as a primary and as a secondary grader. The level of agreement between the primary and secondary grader at a higher level of retinopathy (R2 and R3) was 100%. Sensitivity and specificity for R3 were 78.2% and 98.1%, respectively. None of the false-negative results required photocoagulation therapy.

Keywords: community; diabetes; optometry; public health; retinopathy; screening

Year: 2014 PMID: 25114496 PMCID: PMC4109638 DOI: 10.2147/OPTH.S61483

Source DB: PubMed Journal: Clin Ophthalmol ISSN: 1177-5467

Introduction

Diabetic retinopathy is a highly specific microvascular complication of diabetes and the leading cause of blindness in people under the age of 60 years in industrialized countries.1–4 Data from the Early Treatment of Diabetic Retinopathy Study showed that early laser treatment would be more than 90% effective in preventing blindness,4 and as such, early detection of sight-threatening disease is crucial in preventing blindness in this group of patients. To this end, previous studies have shown the effectiveness of diabetes eye screening programs to prevent blindness in patients with diabetes.2–9 The United Kingdom National Screening Committee therefore recommended a systematic population screening program10 which was implemented in 2003. As a result, the current National Health Service (NHS) Diabetic Eye Screening Programme is in place.11 Digital retinal photography with mydriasis is the preferred modality for diabetic eye screening based on its reported values for sensitivity and specificity,12–15 and its ability to quality assure screening standards.16,17 This modality of retinopathy screening fulfils the Exeter minimum standard for sensitivity and specificity of 80% and 95%, respectively, for robust and safe diabetic retinopathy screening.18,19 Conventionally, this utilizes technicians to perform the primary grading, with secondary grading performed by more experienced screeners or clinicians, and arbitration grading performed by an ophthalmologist or a diabetologist with expertise in diabetic retinopathy screening. However, in selected screening programs, primary and secondary gradings are performed by trained opticians. 
Whilst data are available on the effectiveness of individual screening modalities,10–13,17–19 there is currently only one study that has looked at the interobserver agreement between primary graders and an expert grader.20 Information on the safety, effectiveness, and agreement between primary and secondary graders for images of patients undergoing routine diabetic eye screening in a community optometry-based retinopathy screening program has not yet been reported.

Materials and methods

The North Nottinghamshire diabetic retinopathy screening service has utilized an optometry-based model since April 2006 and involves 36 optometrists across 21 sites. Screening is undertaken by local optometrists, and two-field digital images of the retina are recorded in the database and graded. All models and makes of the retinal cameras in use, as well as their age, are approved based on criteria set by the NHS Diabetic Eye Screening Programme. Tropicamide 1% is used to dilate the pupils to an acceptable size for screening, which is performed according to a standard national screening protocol. Primary and secondary grading is carried out by optometrists on the digital retinal images, and a web-based referral to an ophthalmologist is required if there is disagreement between primary and secondary graders or if sight-threatening retinopathy is observed. For this study, data were collected retrospectively between January 2011 and December 2011 from a cohort of 8,977 patients registered in an optometry-based retinal screening program database currently in place in North Nottinghamshire. These patients were reviewed by optometrists who carried out digital retinal photography. Images were stored in a web-based database and graded according to the national screening standard.11 Grading levels were as follows: no retinopathy (R0), background retinopathy (R1), preproliferative retinopathy (R2), proliferative retinopathy (R3), and maculopathy (M1). Any retinopathy detected by a primary grader (R1, R2, M1) and 10% of images with no evidence of retinopathy (R0) was sent for secondary grading performed by another optometrist. If there was any disagreement between the primary and secondary grader, the images were sent to arbitration, which was performed by an ophthalmologist. The presence of proliferative retinopathy (R3) would require an urgent referral to ophthalmology. 
However, during 2011, due to an internal quality audit that was being undertaken, all patients with R1 were referred to the ophthalmologist for screening. Retinal images that were not gradable by the primary grader for reasons such as previous surgery or cataracts were referred directly to ophthalmology. Patients under ophthalmology follow-up were kept under ophthalmology review with follow-up appointments until their retinopathy was stable. The screening program also has in place a fail-safe mechanism (monitored by a fail-safe officer) whereby images of patients subsequently found to have R3 or to have undergone photocoagulation therapy are traced back on an ongoing basis to see whether this was missed during screening. No R3 was missed at screening during the period of this audit. Once the patients had stable retinopathy with no immediate intervention required, they were referred back into the local retinal screening recall process. We calculated the agreement between the primary and secondary grader as well as between individual graders and ophthalmologists by means of Kappa statistics.21 We also looked at the proportion of disagreement leading to an upgrading of the retinopathy level. Assessment of sensitivity and specificity values in this study was limited to images graded as R3, since all R3 are referred to an ophthalmologist for arbitration or a final grading. R3 grading from the primary grader was compared against the “gold standard” ophthalmological diagnosis. Sensitivity was calculated as true positives/(true positives + false negatives), and specificity as true negatives/(true negatives + false positives). This work is labeled as service evaluation. The audit work and data derived from this work are part of the program’s ongoing clinical governance exercise to maintain standards of retinopathy screening within the service.
The statistical analysis was performed using SPSS version 14 software (SPSS Inc., Chicago, IL, USA).

Results

Of 8,977 patients (15,583 images), 734 patients were graded as R0 by the primary grader. Of these, 377 were graded as R0 by the secondary grader. This resulted in 51.4% agreement between the primary and secondary grader for patients graded as R0 at primary grading. The other 357 patients had no agreement between the primary and secondary grader. From these, 4.8% (n=17) were downgraded and 3.6% (n=13) were upgraded by ophthalmology (Table 1).

Table 1

Percentage of agreement, disagreement, upgrading, and downgrading of images in the North Nottinghamshire screening program

	R0 (n=734), n (%)	R1 (n=7,784), n (%)	R2 (n=210), n (%)	R3 (n=249), n (%)
Agreement between primary and secondary grader	377 (51.4%)	6,204 (79.7%)	210 (100%)	249 (100%)
Agreement between primary grader and ophthalmology	Not evaluated	1,207 (15.5%)	78 (37.1%)	79 (31.7%)
Agreement between secondary grader and ophthalmology	Not evaluated	835 (10.7%)	78 (37.1%)	Not evaluated
Disagreement leading to downgrading by ophthalmologist	17 (4.8%)	Not evaluated	Not evaluated	113 (45.4%)
Disagreement leading to upgrading by ophthalmologist	13 (3.6%)	41 (2.6%)	Not evaluated	Not evaluated
Disagreement leading to upgrading to R3 by ophthalmologist	Not evaluated	13 (0.8%)	Not evaluated	Not evaluated

Notes: Using Kappa statistics to evaluate agreement between primary grader and ophthalmology for R3, the observed κ is 0.57 (95% confidence interval 0.46–0.61), ie, moderate agreement. Sensitivity and specificity for detecting R3 are 78.2% and 98.1%, respectively.

Abbreviations: R0, no retinopathy; R1, background retinopathy; R2, preproliferative retinopathy; R3, proliferative retinopathy.
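The percentages in Table 1 can be checked against their counts. The following is a minimal sketch, not the authors' analysis: it assumes the agreement rows use the number of primary grades as the denominator, and that the upgrading and downgrading rows use the number of primary/secondary disagreements (357 for R0, as stated in the text; 1,580 for R1, derived here as 7,784 minus 6,204). The helper `pct` and all variable names are illustrative.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place, as reported in Table 1."""
    return round(100 * numerator / denominator, 1)

# Counts transcribed from Table 1 (primary grade totals and primary/secondary agreement).
n_primary = {"R0": 734, "R1": 7784, "R2": 210, "R3": 249}
agree_secondary = {"R0": 377, "R1": 6204, "R2": 210, "R3": 249}

# Agreement between primary and secondary grader, per primary grade.
agreement_pct = {g: pct(agree_secondary[g], n_primary[g]) for g in n_primary}

# ASSUMPTION: disagreement denominators are n minus the agreement count.
disagree_r0 = n_primary["R0"] - agree_secondary["R0"]  # 357, stated in the text
disagree_r1 = n_primary["R1"] - agree_secondary["R1"]  # 1,580, derived here

r0_downgraded = pct(17, disagree_r0)
r0_upgraded = pct(13, disagree_r0)
r1_upgraded = pct(41, disagree_r1)

print(agreement_pct)
print(r0_downgraded, r0_upgraded, r1_upgraded)
```

Under these assumed denominators the computed values agree with the percentages printed in Table 1 for these rows.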

Background retinopathy grading (R1) was given to 7,784 patients by the primary grader and 1,448 of these were graded by ophthalmology. The level of agreement between primary and secondary graders in this group was 79.7% (n=6,204). Among these patients, 15.5% (n=1,207) of agreement was reported between the primary grader and ophthalmology, while the agreement between the secondary grader and ophthalmology was 10.7% (n=835). For the proportion in which there was disagreement between the primary and secondary grader, 2.6% (n=41) were upgraded, of which 1% (n=16) were upgraded to R3 (Table 1). For the proportion in which there was disagreement between the primary and secondary grader, 0.8% (n=13) were downgraded to a different grade by ophthalmology (Table 1). Where patients were graded R2 (n=210) at primary grading, agreement between the primary and secondary grader was 100% (Table 1); 207 of the 210 that were graded as R2 by the primary grader were graded by the secondary grader as well as ophthalmology. This was due to an internal quality assurance audit that was taking place in 2011. Proliferative retinopathy (R3) was detected in 249 patients by the primary grader, but only 31.7% (n=79) of these were subsequently confirmed as R3 by ophthalmology. Of the total population screened (n=8,977), 8,728 were found not to have R3 by the primary grader, while 1,777 patients were confirmed by ophthalmology not to have R3. From these data, the sensitivity and specificity for R3 in our cohort are 78.2% and 98.1%, respectively (Table 1); 3.6% of normal (R0) and 2.6% of background retinopathy (R1) had a disagreement in grading, leading to an upgrading of retinopathy level by ophthalmology. Ten percent of images graded as R0 went through to ophthalmology for arbitration. Of these, there was no agreement between the primary and secondary grader, but there was 56.6% agreement between the primary grader and ophthalmology, and 36.6% agreement between the secondary grader and ophthalmology.
We used Kappa statistics to evaluate the level of agreement between primary and secondary graders and between primary and arbitration graders for R0–R2. There was an observed kappa of 0.3223 (95% confidence interval 0.2937–0.3509) and 0.269 (95% confidence interval 0.216–0.321), respectively (Tables 2 and 3). The level of agreement between the primary grader and ophthalmology for R3 using Kappa statistics gives an observed kappa of 0.5667 (95% confidence interval 0.4557–0.6123).
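The R3 sensitivity and specificity quoted in the Results can be illustrated with a 2×2 table. This is a sketch under stated assumptions rather than the authors' calculation. The true-positive count (79) and the false-positive count (249 − 79 = 170) follow from the text. The false-negative count is not reported, so the value of 22 used here is an assumption, back-calculated so that the reported 78.2% is reproduced; the true-negative count then follows from the 8,728 patients not graded R3 by the primary grader. Function and variable names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives / (true positives + false negatives)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negatives / (true negatives + false positives)."""
    return tn / (tn + fp)

total_screened = 8977   # patients screened (from the text)
primary_r3 = 249        # graded R3 by the primary grader (from the text)
tp = 79                 # confirmed as R3 by ophthalmology (from the text)
fp = primary_r3 - tp    # 170, derived from the two counts above

# ASSUMPTION: the number of missed R3 cases is not stated in the text;
# 22 is back-calculated to match the reported sensitivity of 78.2%.
fn = 22
tn = total_screened - primary_r3 - fn  # 8,706 of the 8,728 primary non-R3 grades

sens_pct = round(100 * sensitivity(tp, fn), 1)
spec_pct = round(100 * specificity(tn, fp), 1)
print(sens_pct, spec_pct)
```

With the assumed false-negative count, both rounded values match the reported figures, which suggests the specificity was computed over the whole screened cohort rather than only over patients reviewed by ophthalmology.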

Table 2

Agreement and disagreement for primary grader (horizontal axis) and secondary grader (vertical axis)

	R0	R1	R2
R0	377	1,107	0
R1	354	6,204	0
R2	3	261	210

Notes: Using Kappa statistics to evaluate overall level of agreement between primary and secondary graders for R0–R2, the observed κ is 0.3223 (95% confidence interval 0.2937–0.3509).

Abbreviations: R0, no retinopathy; R1, background retinopathy; R2, preproliferative retinopathy.

Table 3

Agreement and disagreement for primary grader (horizontal axis) and arbitration grader (vertical axis)

	R0	R1	R2
R0	17	185	6
R1	12	1,207	122
R2	0	36	78

Notes: Using Kappa statistics to evaluate overall level of agreement between primary and arbitration graders for R0–R2, the observed κ is 0.269 (95% confidence interval 0.216–0.321).

Abbreviations: R0, no retinopathy; R1, background retinopathy; R2, preproliferative retinopathy.
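The two kappa values can be recomputed from the 3×3 cross-tabulations reported for R0–R2. The following is a minimal sketch using the standard unweighted Cohen's kappa; whether the authors used exactly this form is an assumption, and the function and variable names are illustrative. The cross-tabulation whose diagonal is 377, 6,204, 210 (the primary/secondary agreement counts in Table 1) gives κ ≈ 0.3223, and the one whose diagonal is 17, 1,207, 78 (which includes the primary/ophthalmology agreement counts in Table 1) gives κ ≈ 0.269.

```python
def cohen_kappa(matrix: list[list[int]]) -> float:
    """Unweighted Cohen's kappa for a square table of counts from two raters."""
    n = sum(sum(row) for row in matrix)
    # Observed agreement: proportion of counts on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Chance agreement: sum over grades of (row total * column total) / n^2.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)

# Cross-tabulation with 8,516 graded patients (diagonal 377, 6,204, 210).
crosstab_n8516 = [
    [377, 1107, 0],
    [354, 6204, 0],
    [3, 261, 210],
]

# Cross-tabulation with 1,663 graded patients (diagonal 17, 1,207, 78).
crosstab_n1663 = [
    [17, 185, 6],
    [12, 1207, 122],
    [0, 36, 78],
]

kappa_n8516 = cohen_kappa(crosstab_n8516)
kappa_n1663 = cohen_kappa(crosstab_n1663)
print(round(kappa_n8516, 4), round(kappa_n1663, 3))
```

Kappa is symmetric in the two raters, so the result does not depend on which grader is placed on the horizontal axis.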

Discussion

For a systematic screening program to be effective, it needs a database that is robust and well maintained. The system currently in place in North Nottinghamshire uses a central call/recall center with ongoing quality assurance taking place at all stages of the process. In addition to their professional qualification registered by the General Optical Council which regulates dispensing opticians and optometrists, all screeners/graders would have undertaken a certificate for diabetic retinopathy screening by City and Guilds, as well as undergoing a test training set mandated by the NHS Diabetic Eye Screening Programme. During the period of the audit, one test training set was performed by the opticians. However, data for the intergrader agreement based on this exercise were not available. Although the national program recommended only 10% of R0 to be secondarily screened, we performed an internal audit for the year 2009–2010, where all R0 underwent secondary grading as a result of a quality assurance exercise recommended by the NHS Retinopathy Screening Programme. No sight-threatening retinopathy (R2 or higher) was identified. The above study provides novel information on the safety and effectiveness of a community-based retinal screening program that uses optometrists at both the primary and secondary grader level compared with other optometry or nonoptometry-based programs that use senior graders, diabetologists, or ophthalmologists as secondary graders. Evidence for the effectiveness of screening is based on evidence of treatment efficacy especially after early detection and on cost-effectiveness. Comparing this screening program with the Exeter standards,18,19 ours achieved a specificity level above the expected 95% but the sensitivity level was marginally short of the recommended 80% threshold. Of note, the sensitivity data here refer to data analysis specific to R3 rather than data from the whole program. 
Moreover, it is conceivable that the slightly higher level of false-positives observed here reflects a slightly overcautious approach by optometrists to grading in patients with a higher likelihood of abnormalities in their eyes. In addition, image arbitration was performed by an ophthalmologist who may decide on the final “grade” based on clinical need for photocoagulation therapy rather than actual reporting of the images. Nevertheless, appropriate sensitivity and specificity for any screening modality have become more important in view of some recent evidence which may advocate for a different frequency of retinopathy screening for different individuals depending on the risk of retinopathy progression, based on baseline and/or previous screening results.24 Despite a high false-negative rate, none of the false negatives required urgent photocoagulation therapy, which reflects a subsequent “clinical” diagnosis by the ophthalmologist rather than a misdiagnosis by the optometrist. This has been confirmed by regular audit of our data based on the governance structure currently in place in our screening program. It was also reassuring to note that the levels of agreement between primary and secondary graders for higher levels of retinopathy (R2 and R3) were both 100%. For lower levels of retinopathy, ie, R0 and R1, agreement between primary and secondary graders was lower at 51.4% and 79.7%, respectively. Of these, 3.6% of normal (R0) and 2.6% of background (R1) retinopathy showed a disagreement in grading, leading to an upgrading of retinopathy level by ophthalmology, but none required photocoagulation therapy. Some limitations of this study need to be highlighted. To calculate sensitivity and specificity, we analyzed data specific to R3 only. This was because only 10% of R0 and some of R1 and R2 were referred to ophthalmology, whereas all R3 were referred to an independent ophthalmologist.
Because of this, we were unable to look at the sensitivity and specificity for the whole cohort, which affects the results reported in our study. We used the ophthalmologist grade as the gold standard, so it would be important to have all retinopathy graded as R2 by the primary grader reviewed by ophthalmology to ensure that none of these would need to be upgraded to R3, which would mean they would need ophthalmology follow-up and potential treatment. The study was carried out by retrospective data collection, which would also be considered a limitation, due to the presence of confounding biases. We were also not able to reliably determine results for maculopathy within our program. Further, we were not able to accurately adjust results for ungradable images, due to poor patient compliance with the screening protocol, poor mydriasis, or other factors. Interpretation of the results is limited to this program and cannot necessarily be generalized to other programs. Lastly, although Kappa statistics is a recognized method for assessment of agreement, the magnitude of kappa reflecting adequate agreement is unclear. Arbitrary guidelines are available to indicate level of agreement, although these are not evidence-based. Generally, however, it is accepted that a kappa >0.80 would suggest very good agreement.25,26 Despite this, due to methodological limitations of other research in this area, and due to a lack of data and evidence of optometrists as primary and secondary graders in detecting R3 in a retinopathy screening program, we believe data from this study would enhance available knowledge concerning the safety and effectiveness of an optometry community-based retinopathy screening program. There is no clear evidence suggesting who has the best sensitivity and specificity for detecting sight-threatening retinopathy, ie, whether it is independent graders, optometrists, diabetologists, general practitioners, or ophthalmologists.
A single study showed that retinal photographs assessed by optometrists could achieve >91% sensitivity in detecting R3 or sight-threatening retinopathy.20 Data on the effectiveness of individual screening modalities are widely available.13,17,19,23 However, our study provides unique data on the safety, effectiveness, and agreement between primary and secondary graders for images of patients undergoing routine diabetes eye screening in a community optometry-based retinopathy screening program.

21 in total

1. Diabetic retinopathy screening.

Authors: D R Owens; R L Gibbins; E Kohner; G M Grimshaw; R Greenwood; S Harding
Journal: Diabet Med Date: 2000-07 Impact factor: 4.359

2. The evaluation of screening policies for diabetic retinopathy using simulation.

Authors: R Davies; P Roderick; C Canning; S Brailsford
Journal: Diabet Med Date: 2002-09 Impact factor: 4.359

3. External quality assurance for image grading in the Scottish Diabetic Retinopathy Screening Programme.

Authors: K A Goatman; S Philip; A D Fleming; R D Harvey; K K Swa; C Styles; M Black; G Sell; N Lee; P F Sharp; J A Olson
Journal: Diabet Med Date: 2012-06 Impact factor: 4.359

Review 4. Screening and prevention of diabetic blindness.

Authors: E Stefánsson; T Bek; M Porta; N Larsen; J K Kristinsson; E Agardh
Journal: Acta Ophthalmol Scand Date: 2000-08

5. Cost effectiveness analysis of screening for sight threatening diabetic eye disease.

Authors: M James; D A Turner; D M Broadbent; J Vora; S P Harding
Journal: BMJ Date: 2000-06-17

6. Agreement and reasons for disagreement between photographic and hospital biomicroscopy grading of diabetic retinopathy.

Authors: A Sallam; P H Scanlon; I M Stratton; V Jones; C N Martin; M Brelen; R L Johnston
Journal: Diabet Med Date: 2011-06 Impact factor: 4.359

7. Interobserver agreement between primary graders and an expert grader in the Bristol and Weston diabetic retinopathy screening programme: a quality assurance audit.

Authors: S Patra; E M W Gomm; M Macipe; C Bailey
Journal: Diabet Med Date: 2009-08 Impact factor: 4.359

8. Grading and disease management in national screening for diabetic retinopathy in England and Wales.

Authors: S Harding; R Greenwood; S Aldington; J Gibson; D Owens; R Taylor; E Kohner; P Scanlon; G Leese
Journal: Diabet Med Date: 2003-12 Impact factor: 4.359

9. A comparative evaluation of digital imaging, retinal photography and optometrist examination in screening for diabetic retinopathy.

Authors: J A Olson; F M Strachan; J H Hipwell; K A Goatman; K C McHardy; J V Forrester; P F Sharp
Journal: Diabet Med Date: 2003-07 Impact factor: 4.359

10. A simple risk stratification for time to development of sight-threatening diabetic retinopathy.

Authors: Irene M Stratton; Stephen J Aldington; David J Taylor; Amanda I Adler; Peter H Scanlon
Journal: Diabetes Care Date: 2012-11-12 Impact factor: 19.112
