Jimmy S Chen (1), Aaron S Coyner (2), Susan Ostmo (1), Kemal Sonmez (3), Sanyam Bajimaya (4), Eli Pradhan (4), Nita Valikodath (5), Emily D Cole (5), Tala Al-Khaled (5), R V Paul Chan (5), Praveer Singh (6), Jayashree Kalpathy-Cramer (6), Michael F Chiang (7), J Peter Campbell (8)
1. Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon.
2. Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon.
3. Cancer Early Detection Advanced Research Center, Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon.
4. Tilganga Institute of Ophthalmology, Kathmandu, Nepal.
5. Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois.
6. Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts.
7. Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon; Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon.
8. Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon. Electronic address: campbelp@ohsu.edu.
Abstract
PURPOSE: Stage is an important feature to identify in retinal images of infants at risk of retinopathy of prematurity (ROP). The purpose of this study was to implement a convolutional neural network (CNN) for binary detection (present vs. absent) of stage 1, 2, or 3 ROP and to evaluate its generalizability across different populations and camera systems.
DESIGN: Diagnostic validation study of a CNN for stage detection.
PARTICIPANTS: Retinal fundus images obtained from preterm infants during routine ROP screening.
METHODS: Two datasets were used: 5943 fundus images obtained with the RetCam camera (Natus Medical, Pleasanton, CA) from 9 North American institutions and 5049 images obtained with the 3nethra camera (Forus Health Incorporated, Bengaluru, India) from 4 hospitals in Nepal. Images were labeled for the presence of stage by 1 to 3 expert graders. Three CNN models were trained using 5-fold cross-validation on data from North America alone, Nepal alone, and a combined dataset, and were evaluated on 2 held-out test sets consisting of 708 and 247 images from the Nepali and North American datasets, respectively.
MAIN OUTCOME MEASURES: CNN performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity.
RESULTS: Both the North American- and Nepali-trained models performed well on a test set from the same population: AUROC, 0.99; AUPRC, 0.98; and sensitivity, 94% for the North American-trained model; and AUROC, 0.97; AUPRC, 0.91; and sensitivity, 73% for the Nepali-trained model. However, the performance of each model decreased to AUROC of 0.96 and AUPRC of 0.88 (sensitivity, 52%) and AUROC of 0.62 and AUPRC of 0.36 (sensitivity, 44%) when evaluated on a test set from the other population. Compared with the models trained on individual datasets, the model trained on the combined dataset achieved improved performance on each respective test set: sensitivity improved from 94% to 98% on the North American test set and from 73% to 82% on the Nepali test set.
CONCLUSIONS: A CNN can accurately identify the presence of ROP stage in retinal images, but performance depends on the similarity between training and testing populations. We demonstrated that internal and external performance can be improved by increasing the heterogeneity of the training dataset, in this case by combining images from different populations and cameras.
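The METHODS describe training each model with 5-fold cross-validation, including one model trained on the pooled North American and Nepali images. The abstract does not say how images were assigned to folds, so the Python sketch below is illustrative only: it assumes folds are formed at the infant level (every image from one infant stays in the same fold, a common precaution against leakage between training and validation images), and the function name grouped_k_fold and the infant ID strings are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def grouped_k_fold(groups: Sequence[str], k: int = 5,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, val_indices) pairs, keeping every image that
    shares a group ID (here, a hypothetical infant ID) in the same fold."""
    unique_groups = sorted(set(groups))
    if len(unique_groups) < k:
        raise ValueError("need at least k distinct groups to fill k folds")
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # Deal shuffled groups round-robin so each fold gets a similar number of infants.
    fold_of: Dict[str, int] = {g: i % k for i, g in enumerate(unique_groups)}
    splits = []
    for fold in range(k):
        val = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    # Hypothetical per-image infant IDs; pooling two sources is plain concatenation.
    north_american = [f"na_{i // 3}" for i in range(12)]  # 4 infants x 3 images
    nepali = [f"np_{i // 2}" for i in range(10)]          # 5 infants x 2 images
    combined = north_american + nepali
    for fold, (train, val) in enumerate(grouped_k_fold(combined, k=5)):
        print(fold, len(train), len(val))
```

Round-robin assignment of shuffled infants balances the number of infants per fold, not the number of images; if the number of images per infant is highly skewed, a size-aware assignment may be preferable.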
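The four outcome measures named in the abstract can be computed from per-image labels (1 = stage present, 0 = absent) and model scores. The following is a minimal pure-Python sketch, not the authors' evaluation code: the 0.5 operating threshold for sensitivity and specificity is an assumption (the abstract does not state the threshold used), AUROC is computed as the probability that a randomly chosen positive image scores above a randomly chosen negative one (ties count one half), and AUPRC is estimated as average precision, one common step-wise estimate of the area under the precision-recall curve.

```python
from typing import List, Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN), calling an image positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative image")
    return tp / (tp + fn), tn / (tn + fp)


def _split_by_label(labels: Sequence[int],
                    scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative image")
    return pos, neg


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = _split_by_label(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auprc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: sum of (recall gain) x precision over score thresholds."""
    pos, _ = _split_by_label(labels, scores)
    n_pos = len(pos)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all images tied at this score so they are thresholded together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(round(auroc(labels, scores), 3))  # 0.889
    print(round(auprc(labels, scores), 3))  # 0.917
    print(sensitivity_specificity(labels, scores))
```

The pairwise AUROC loop is quadratic in test-set size, which is acceptable for test sets of a few hundred images such as the 708- and 247-image sets described here. Unlike AUROC, AUPRC depends on the prevalence of positive images (a random classifier's expected AUPRC equals that prevalence), so AUPRC values are not directly comparable across test sets with different stage prevalence.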