OBJECTIVE: To evaluate the performance of a system for automated detection of diabetic retinopathy in digital retinal photographs, built from published algorithms, in a large, representative, screening population.
RESEARCH DESIGN AND METHODS: We conducted a retrospective analysis of 10,000 consecutive patient visits, specifically exams (four retinal photographs, two left and two right) from 5,692 unique patients from the EyeCheck diabetic retinopathy screening project imaged with three types of cameras at 10 centers. Inclusion criteria included no previous diagnosis of diabetic retinopathy, no previous visit to ophthalmologist for dilated eye exam, and both eyes photographed. One of three retinal specialists evaluated each exam as unacceptable quality, no referable retinopathy, or referable retinopathy. We then selected exams with sufficient image quality and determined presence or absence of referable retinopathy. Outcome measures included area under the receiver operating characteristic curve (number needed to miss one case [NNM]) and type of false negative.
RESULTS: Total area under the receiver operating characteristic curve was 0.84, and NNM was 80 at a sensitivity of 0.84 and a specificity of 0.64. At this point, 7,689 of 10,000 exams had sufficient image quality, 4,648 of 7,689 (60%) were true negatives, 59 of 7,689 (0.8%) were false negatives, 319 of 7,689 (4%) were true positives, and 2,581 of 7,689 (33%) were false positives. Twenty-seven percent of false negatives contained large hemorrhages and/or neovascularizations.
CONCLUSIONS: Automated detection of diabetic retinopathy using published algorithms cannot yet be recommended for clinical practice. However, performance is such that evaluation on validated, publicly available datasets should be pursued. If algorithms can be improved, such a system may in the future lead to improved prevention of blindness and vision loss in patients with diabetes.
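The operating point reported in the abstract can be checked directly from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code: the class and method names are hypothetical, and the NNM formula (negative calls per false negative) is an assumption, chosen because it agrees with the reported value of 80 when applied to the published counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts at one operating point of a screening test (hypothetical helper)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    def sensitivity(self) -> float:
        # Fraction of exams with referable retinopathy that the system flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of exams without referable retinopathy that the system clears.
        return self.tn / (self.tn + self.fp)

    def number_needed_to_miss(self) -> float:
        # ASSUMPTION: NNM = negative calls per false negative, (TN + FN) / FN.
        # The abstract does not spell out the formula.
        if self.fn == 0:
            return float("inf")
        return (self.tn + self.fn) / self.fn


# Counts as stated in the RESULTS section of the abstract.
reported = ConfusionCounts(tp=319, fn=59, tn=4648, fp=2581)

if __name__ == "__main__":
    print(f"sensitivity = {reported.sensitivity():.2f}")
    print(f"specificity = {reported.specificity():.2f}")
    print(f"NNM         = {reported.number_needed_to_miss():.0f}")
```

Under the assumed definition, an NNM of about 80 means that roughly one in every 80 exams the system calls negative actually contains referable retinopathy.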
We thank Olson et al. (1) for their close reading of our recent study (2), where we examined sensitivity and specificity of automated diabetic retinopathy detection and demonstrated an area under the receiver operating characteristic curve of 0.84. A limited, 500-patient sample of all 10,000 photographic exams was examined by multiple, masked experts. We felt uncomfortable recommending a system for clinical practice for which patient safety compared to an accepted (gold) standard could not be established, concluding that it should be tested against widely accepted clinical standards, if practical. We have recently presented studies of an improved algorithm on a new, larger dataset of 15,000 exams with an area under the curve of 0.90 (3). Most of these results support the work of Olson and colleagues (4), although the fact that their system only detects small hemorrhages and microaneurysms is a serious limitation in our view, and sensitivity and specificity based upon the photographic interpretation by a single reader is unlikely to become widely accepted.
Failure to detect large, rare, and/or advanced lesions deserves disproportionate attention. If a patient with isolated neovascularization of the disc, <1:5,000 in our series, were to be missed by a system but would not have been missed by a person, that is a failure likely to lead to vision loss or blindness for that patient, potential litigation, and a backlash against implementation of automated detection.
Groups translating automated diabetic retinopathy detection into clinical practice operate in environments that differ on regulatory, legal, budgetary, and reimbursement aspects, but we disagree that “a recommendation against automated grading is only valid if it is shown that there is a higher performing and readily alternative methodology” (1). The currently established practice is human expert reading, and the burden of proof is therefore on the new system to be introduced, which is automated reading. For automated reading to gain widespread acceptance, no shortcuts regarding safety concerns will likely be permitted by regulatory agencies, payers, and patients.
One study's entry criteria may be perceived as another's selection bias. The target population of the EyeCheck project consists of patients who had not been previously identified to have diabetic retinopathy. In most settings, as patients are identified with diabetic retinopathy they are referred for evaluation or treatment, removing them from the screened population. To establish any other inclusion criteria would have constituted selection, affecting the potential application of this data to current clinical practice.
The potential positive effect of camera resolution on algorithmic performance is intriguing, although with less costly cameras presently offering at least 1,024 × 1,024 pixels, this debate may be self-limiting. We believe that comparison of algorithms to standardized datasets (http://roc.healthcare.uiowa.edu) as well as to the gold standard are required and should include: 1) demonstration in a prospective multicenter study of similar or better detection on populations with defined race and ethnicity distributions; 2) acceptable comparison of detection to standard multifield stereo photographs read according to the Early Treatment Diabetic Retinopathy Study standard; and 3) sensitivity/specificity analysis with standard and severity-weighted receiver operating characteristic curves.
In summary, we agree that automated detection of diabetic retinopathy can make the prevention of blindness and vision loss objective, more accessible, and more cost-effective, provided safety issues are not overlooked.
Authors: S Philip; A D Fleming; K A Goatman; S Fonseca; P McNamee; G S Scotland; G J Prescott; P F Sharp; J A Olson Journal: Br J Ophthalmol Date: 2007-05-15 Impact factor: 4.638
Authors: Michael D Abràmoff; Meindert Niemeijer; Maria S A Suttorp-Schulten; Max A Viergever; Stephen R Russell; Bram van Ginneken Journal: Diabetes Care Date: 2007-11-16 Impact factor: 19.112
Authors: D Marin; M E Gegundez-Arias; B Ponte; F Alvarez; J Garrido; C Ortega; M J Vasallo; J M Bravo Journal: Med Biol Eng Comput Date: 2018-01-10 Impact factor: 2.602
Authors: Yaqin Li; Thomas P Karnowski; Kenneth W Tobin; Luca Giancardo; Scott Morris; Sylvia E Sparrow; Seema Garg; Karen Fox; Edward Chaum Journal: Telemed J E Health Date: 2011-08-05 Impact factor: 3.536
Authors: Michael D Abràmoff; Joseph M Reinhardt; Stephen R Russell; James C Folk; Vinit B Mahajan; Meindert Niemeijer; Gwénolé Quellec Journal: Ophthalmology Date: 2010-06 Impact factor: 12.079