Jui-En Lo, Eugene Yu-Chuan Kang, Yun-Nung Chen, Yi-Ting Hsieh, Nan-Kai Wang, Ta-Ching Chen, Kuan-Jen Chen, Wei-Chi Wu, Yih-Shiou Hwang, Fu-Sung Lo, Chi-Chun Lai.
Abstract
This study aimed to evaluate a deep transfer learning-based model for identifying diabetic retinopathy (DR) that was trained on a dataset with high variability and a predominance of type 2 diabetes (T2D), and to compare its performance with that in patients with type 1 diabetes (T1D). The publicly available Kaggle dataset was divided into training and testing Kaggle datasets. For the comparison dataset, we collected retinal fundus images of T1D patients at Chang Gung Memorial Hospital in Taiwan from 2013 to 2020 and divided them into training and testing T1D datasets. The model was developed using 4 different convolutional neural networks (Inception-V3, DenseNet-121, VGG16, and Xception). Model performance in predicting DR was evaluated on the testing images from each dataset, and the area under the curve (AUC), sensitivity, and specificity were calculated. The model trained on the Kaggle dataset had an average (range) AUC of 0.74 (0.03) and 0.87 (0.01) in the testing Kaggle and T1D datasets, respectively. The model trained on the T1D dataset had an AUC of 0.88 (0.03), which decreased to 0.57 (0.02) in the testing Kaggle dataset. Heatmaps showed that the model focused on retinal hemorrhages, vessels, and exudates to predict DR. In incorrectly predicted images, artifacts and low image quality affected model performance. The model developed with the high-variability, T2D-predominant dataset could be applied to T1D patients. Dataset homogeneity could affect the performance, trainability, and generalization of the model.
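The abstract reports AUC, sensitivity, and specificity. As a minimal sketch (not the authors' evaluation code), AUC can be computed as the probability that a randomly chosen DR image receives a higher model score than a randomly chosen non-DR image, counting ties as one half (the Mann-Whitney formulation). Sensitivity and specificity then follow from the confusion matrix at a chosen threshold. The function names and the toy labels and scores below are hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a DR image > score of a non-DR image), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP), predicting DR when score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_dr = s >= threshold
        if y == 1 and predicted_dr:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_dr:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = DR, 0 = no DR; scores are model output probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(roc_auc(labels, scores))                       # 8 of 9 pairs ranked correctly: 0.888...
print(sensitivity_specificity(labels, scores, 0.5))  # (0.666..., 0.666...)
```

The pairwise loop is quadratic and only meant to make the definition explicit; for large test sets a rank-based or library implementation would be the practical choice.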
Year: 2021; PMID: 35071603; PMCID: PMC8776492; DOI: 10.1155/2021/2751695
Source DB: PubMed; Journal: J Diabetes Res; Impact factor: 4.011
Figure 1. Fundus image after (a) cropping and (b) normalization.
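The caption does not state how cropping and normalization were performed. One common approach, assumed here purely for illustration, is to crop away the dark border around the circular fundus region and then rescale intensities to [0, 1]. The function name, the single-channel (grayscale) input, and the `border_threshold` value are hypothetical; real fundus photographs are RGB, and the study may have used a different normalization.

```python
from typing import List

Image = List[List[float]]  # hypothetical grayscale image: rows of intensities (0-255)


def crop_and_normalize(image: Image, border_threshold: float = 10.0) -> Image:
    """Crop to the bounding box of pixels brighter than border_threshold,
    then min-max scale the cropped intensities to [0, 1]."""
    rows = [i for i, row in enumerate(image) if any(p > border_threshold for p in row)]
    cols = [
        j for j in range(len(image[0]))
        if any(row[j] > border_threshold for row in image)
    ]
    if not rows or not cols:
        raise ValueError("image contains no pixels above the border threshold")
    cropped = [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]
    lo = min(min(row) for row in cropped)
    hi = max(max(row) for row in cropped)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero on flat images
    return [[(p - lo) / span for p in row] for row in cropped]


# Toy 4 x 4 "fundus" with a dark one-pixel border.
image = [
    [0, 0, 0, 0],
    [0, 50, 100, 0],
    [0, 150, 250, 0],
    [0, 0, 0, 0],
]
print(crop_and_normalize(image))  # [[0.0, 0.25], [0.5, 1.0]]
```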
Figure 2. Schematic of model development and evaluation. Two groups of models were trained on the T1D and Kaggle training sets, respectively, and each group was tested on both the T1D and Kaggle testing sets.
Summary of the prediction performance of different transfer learning models in predicting diabetic retinopathy. Each column header gives the metric followed by (training set → testing set).
| Model | AUC (Kaggle → T1D) | SEN (Kaggle → T1D) | SPE (Kaggle → T1D) | AUC (Kaggle → Kaggle) | SEN (Kaggle → Kaggle) | SPE (Kaggle → Kaggle) | AUC (T1D → T1D) | SEN (T1D → T1D) | SPE (T1D → T1D) | AUC (T1D → Kaggle) | SEN (T1D → Kaggle) | SPE (T1D → Kaggle) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DenseNet-121 | 0.86 | 0.77 | 0.79 | 0.74 | 0.67 | 0.71 | 0.91 | 0.81 | 0.86 | 0.55 | 0.55 | 0.54 |
| Inception-V3 | 0.86 | 0.74 | 0.79 | 0.74 | 0.62 | 0.74 | 0.87 | 0.73 | 0.86 | 0.59 | 0.56 | 0.59 |
| VGG16 | 0.88 | 0.78 | 0.82 | 0.77 | 0.66 | 0.75 | 0.84 | 0.67 | 0.84 | 0.54 | 0.59 | 0.49 |
| Xception | 0.86 | 0.74 | 0.82 | 0.71 | 0.60 | 0.72 | 0.88 | 0.74 | 0.90 | 0.59 | 0.61 | 0.52 |
T1D: type 1 diabetes; AUC: area under the curve; SEN: sensitivity; SPE: specificity.
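The average AUCs quoted in the abstract can be checked against the table. The short sketch below copies the AUC values from the table and computes, for each training/testing combination, the mean AUC across the four networks. The dictionary layout is only one convenient encoding, chosen for illustration.

```python
from statistics import mean

# AUC values copied from the table, keyed by (training set, testing set).
# Order within each list: DenseNet-121, Inception-V3, VGG16, Xception.
AUC = {
    ("Kaggle", "T1D"): [0.86, 0.86, 0.88, 0.86],
    ("Kaggle", "Kaggle"): [0.74, 0.74, 0.77, 0.71],
    ("T1D", "T1D"): [0.91, 0.87, 0.84, 0.88],
    ("T1D", "Kaggle"): [0.55, 0.59, 0.54, 0.59],
}

for (train, test), values in AUC.items():
    print(f"trained on {train}, tested on {test}: mean AUC = {mean(values):.4f}")
# trained on Kaggle, tested on T1D: mean AUC = 0.8650
# trained on Kaggle, tested on Kaggle: mean AUC = 0.7400
# trained on T1D, tested on T1D: mean AUC = 0.8750
# trained on T1D, tested on Kaggle: mean AUC = 0.5675
```

Rounded to two decimals, these means correspond to the 0.87, 0.74, 0.88, and 0.57 reported in the abstract, and the drop from 0.8750 to 0.5675 makes the poor cross-dataset generalization of the T1D-trained models explicit.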
Figure 3. Receiver operating characteristic (ROC) curves of different transfer learning models in predicting diabetic retinopathy. ROC curves of models tested on the type 1 diabetes (T1D) testing set are plotted in blue, and those of models tested on the Kaggle testing set are plotted in orange. The marked point on each ROC curve indicates the selected threshold. (e)–(h) The AUC decreased significantly when models trained on the T1D training set were tested on the Kaggle testing set. (i)–(l) Models trained on the Kaggle training set performed more robustly when tested on the T1D testing set.
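The caption marks a selected threshold on each ROC curve but does not state the selection rule. Youden's J statistic (sensitivity + specificity - 1, equivalently TPR - FPR) is one common rule and is assumed here only for illustration. The sketch builds the ROC points from every distinct score and returns the threshold with the largest J; the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for each distinct score, predicting DR when score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float, float]:
    """Pick the ROC point maximizing J = TPR - FPR; return (threshold, sensitivity, specificity)."""
    threshold, fpr, tpr = max(roc_points(labels, scores), key=lambda p: p[2] - p[1])
    return threshold, tpr, 1.0 - fpr


# Hypothetical toy data: 1 = DR, 0 = no DR.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.8, 0.7, 0.45, 0.6, 0.5, 0.3, 0.1]
print(youden_threshold(labels, scores))  # (0.7, 0.75, 1.0)
```

Other rules, such as fixing a minimum sensitivity for screening, would place the operating point elsewhere on the same curve.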
Figure 4. Original (a) and superimposed Grad-CAM activation maps ((b)–(i)) of a selected color fundus image with diabetic retinopathy (DR). All models made a true-positive prediction. The activation maps showed some similarities even across different transfer learning models trained on different datasets.
Figure 5. Original (a) and superimposed Grad-CAM activation maps ((b)–(i)) of a selected normal color fundus image. All models made a true-negative prediction. The activation maps varied widely for the normal fundus image: some models focused on the optic disc ((e) and (g)), whereas others highlighted the retinal vessels ((b), (c), and (g)) or the macular region ((d), (f), (h), and (i)).
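The heatmaps in Figures 4 and 5 are Grad-CAM maps. The sketch below shows only the final combination step of Grad-CAM: each channel of the last convolutional feature map is weighted by the spatial average of the gradient of the class score with respect to that channel, the weighted channels are summed, and a ReLU keeps only positive evidence. Extracting the feature maps and gradients from a trained network requires a deep learning framework and is not shown; the nested-list inputs, the function name, and the toy values are hypothetical.

```python
from typing import List

Map = List[List[float]]  # one H x W channel


def grad_cam(feature_maps: List[Map], gradients: List[Map]) -> Map:
    """Combine K channels into an H x W Grad-CAM heatmap scaled to [0, 1].

    feature_maps[k][i][j] is the activation A^k_ij of the last conv layer;
    gradients[k][i][j] is d(class score) / dA^k_ij at the same position.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight alpha_k: global-average-pooled gradient of channel k.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum over channels at each position, followed by ReLU.
    cam = [
        [max(0.0, sum(a * fm[i][j] for a, fm in zip(alphas, feature_maps)))
         for j in range(width)]
        for i in range(height)
    ]
    # Scale to [0, 1] so the map can be overlaid on the fundus image.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy example: 2 channels of size 2 x 2.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # alpha_0 = 0.5 (supports the class)
    [[-1.0, -1.0], [-1.0, -1.0]],  # alpha_1 = -1.0 (argues against it)
]
print(grad_cam(feature_maps, gradients))  # [[0.5, 0.0], [0.0, 1.0]]
```

In practice the low-resolution map is upsampled to the input image size before being superimposed, which is why the highlighted regions in the figures appear as smooth blobs over lesions, vessels, or the optic disc.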
Figure 6. Incorrectly predicted images from the Kaggle dataset. (a) False negative in an image with a foggy view and a retinal laser scar. (b) False negative in an image with poor illumination. (c) False negative in an image with reflective spots and shadows. (d) False positive in an image with overexposure and a halo. (e) False positive in an image with underexposure and a halo. (f) False positive in an image with exudates caused by age-related macular degeneration.