
Grader Variability and the Importance of Reference Standards for Evaluating Machine Learning Models for Diabetic Retinopathy.

Jonathan Krause, Varun Gulshan, Ehsan Rahimy, Peter Karth, Kasumi Widner, Greg S Corrado, Lily Peng, Dale R Webster.

Abstract

PURPOSE: Use adjudication to quantify errors in diabetic retinopathy (DR) grading based on individual graders and majority decision, and to train an improved automated algorithm for DR grading.
DESIGN: Retrospective analysis.
PARTICIPANTS: Retinal fundus images from DR screening programs.
METHODS: Images were each graded by the algorithm, U.S. board-certified ophthalmologists, and retinal specialists. The adjudicated consensus of the retinal specialists served as the reference standard.
MAIN OUTCOME MEASURES: For agreement between different graders as well as between the graders and the algorithm, we measured the (quadratic-weighted) kappa score. To compare the performance of different forms of manual grading and the algorithm for various DR severity cutoffs (e.g., mild or worse DR, moderate or worse DR), we measured area under the curve (AUC), sensitivity, and specificity.
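The quadratic-weighted kappa named above can be illustrated with a minimal sketch. It assumes grades are integers on a 5-level ordinal scale (0 to 4); the abstract does not state the scale, so `num_levels` is an assumption, and the function name is hypothetical rather than the authors' code.

```python
from collections import Counter


def quadratic_weighted_kappa(grades_a, grades_b, num_levels=5):
    """Cohen's kappa with quadratic weights for two raters.

    grades_a, grades_b: equal-length sequences of integer grades in
    0..num_levels-1. num_levels=5 is an assumption for illustration.
    """
    if len(grades_a) != len(grades_b) or len(grades_a) == 0:
        raise ValueError("need two non-empty grade lists of equal length")

    n = len(grades_a)
    joint = Counter(zip(grades_a, grades_b))  # observed grade pairs
    hist_a = Counter(grades_a)                # marginal counts, rater A
    hist_b = Counter(grades_b)                # marginal counts, rater B
    max_sq_dist = (num_levels - 1) ** 2

    observed = 0.0  # weighted disagreement actually seen
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / max_sq_dist
            observed += weight * joint[(i, j)] / n
            expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if expected == 0.0:
        # Both raters used one identical grade throughout; kappa is undefined.
        return float("nan")
    return 1.0 - observed / expected
```

Because the weight grows with the squared distance between grades, confusing grade 0 with grade 4 is penalized far more than confusing adjacent grades, which suits an ordinal severity scale.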
RESULTS: Of the 193 discrepancies between adjudication by retinal specialists and majority decision of ophthalmologists, the most common were missed microaneurysms (MAs) (36%), artifacts (20%), and misclassified hemorrhages (16%). Relative to the reference standard, the kappa was 0.82 to 0.91 for individual retinal specialists, 0.80 to 0.84 for individual ophthalmologists, and 0.84 for the algorithm. For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981. The algorithm had a sensitivity of 0.971, specificity of 0.923, and AUC of 0.986. For mild or worse DR, the algorithm had a sensitivity of 0.970, specificity of 0.917, and AUC of 0.986. By using a small number of adjudicated consensus grades as a tuning dataset and higher-resolution images as input, the algorithm improved in AUC from 0.934 to 0.986 for moderate or worse DR.
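Sensitivity and specificity at a severity cutoff such as "moderate or worse DR" can be sketched as follows. This assumes integer ordinal grades where a higher value means more severe disease and where `cutoff` is the lowest grade counted as positive; the function name and the example grades are hypothetical and are not data from the study.

```python
def sensitivity_specificity(reference, predicted, cutoff):
    """Binarize ordinal grades at `cutoff`, then score against the reference.

    reference: grades from the reference standard.
    predicted: grades from a grader or an algorithm.
    cutoff: lowest grade counted as positive (assumed encoding).
    Returns (sensitivity, specificity); a value is NaN when undefined.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")

    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        ref_positive = ref >= cutoff
        pred_positive = pred >= cutoff
        if ref_positive and pred_positive:
            tp += 1
        elif ref_positive:
            fn += 1
        elif pred_positive:
            fp += 1
        else:
            tn += 1

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up grades for illustration only.
sens, spec = sensitivity_specificity(
    reference=[0, 1, 2, 3, 0, 2],
    predicted=[0, 2, 2, 1, 0, 0],
    cutoff=2,
)
```

Sweeping the decision threshold over an algorithm's continuous output instead of fixed grades would trace the ROC curve from which AUC is computed.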
CONCLUSIONS: Adjudication reduces the errors in DR grading. A small set of adjudicated DR grades allows substantial improvements in algorithm performance. The resulting algorithm's performance was on par with that of individual U.S. board-certified ophthalmologists and retinal specialists.
Copyright © 2018 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

Year:  2018        PMID: 29548646     DOI: 10.1016/j.ophtha.2018.01.034

Source DB:  PubMed          Journal:  Ophthalmology        ISSN: 0161-6420            Impact factor:   12.079


