Yu-Chieh Ko1,2, Wei-Shiang Chen3, Hung-Hsun Chen4, Tsui-Kang Hsu5, Ying-Chi Chen6, Catherine Jui-Ling Liu1,2, Henry Horng-Shing Lu3.
Abstract
Automated glaucoma detection using deep learning may increase the diagnostic rate of glaucoma and thereby help prevent blindness, but generalizable models remain unavailable despite the use of huge training datasets. This study evaluates the performance of a convolutional neural network (CNN) classifier trained with a limited number of high-quality fundus images in detecting glaucoma, as well as methods to improve its performance across different datasets. A CNN classifier was constructed using EfficientNet B3 and 944 images collected from one medical center (core model) and externally validated on three datasets. The performance of the core model was compared with (1) an integrated model constructed using all training images from the four datasets and (2) dataset-specific models built by fine-tuning the core model with training images from the external datasets. The diagnostic accuracy of the core model was 95.62% but dropped to 52.5–80.0% on the external datasets. Dataset-specific models showed superior diagnostic performance on the external datasets compared with the other models, with diagnostic accuracies of 87.50–92.5%. The findings suggest that dataset-specific tuning of the core CNN classifier effectively improves its applicability across datasets when simply increasing the number of training images fails to achieve generalization.
Keywords: deep learning; diagnosis; fundus photograph; glaucoma
Year: 2022 PMID: 35740336 PMCID: PMC9219722 DOI: 10.3390/biomedicines10061314
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
Figure 1. Study framework. The fundus images were preprocessed with data augmentation before being forwarded to the EfficientNet B3 algorithm to build the deep learning classifier. (A) The Taipei Veterans General Hospital (TVGH) model was built using images from the TVGH dataset and then fine-tuned into dataset-specific models, initialized with the weights of the TVGH model and trained on new images from the corresponding external dataset, to improve diagnostic performance on that dataset. (B) The integrated model was built using the combined images from the TVGH dataset and all the external datasets.
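As a rough illustration of the preprocessing step described above, the sketch below applies two common fundus-image augmentations (random horizontal flip and brightness jitter) in numpy. The paper does not list its exact transforms, so the choice of operations and parameter ranges here is an assumption for illustration only.

```python
import numpy as np

def augment(image, rng):
    """Illustrative fundus-image augmentation: random horizontal flip and
    brightness jitter. (The study's actual transform list is not published;
    these two operations are assumed examples.)"""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1, :]            # random horizontal flip
    out = out * rng.uniform(0.9, 1.1)    # +/-10% brightness jitter (assumed range)
    return np.clip(out, 0.0, 255.0)      # keep valid 8-bit intensity range

# Tiny demo on a synthetic mid-gray "image"
rng = np.random.default_rng(0)
img = np.full((4, 4, 3), 128.0)
aug = augment(img, rng)
```

In a real pipeline this function would be applied on the fly to each training image before it is forwarded to the EfficientNet B3 classifier.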
Characteristics of the datasets used in this study.
| Dataset | Numbers (Glaucoma) | Image Size (Pixels) | Field of View (Center) | Camera | Origin |
|---|---|---|---|---|---|
| TVGH | 944 (479) | 3888 × 2552 | 30° (macula) | Canon CX-1, CR-2, DGi | Taiwan |
| CHGH | 158 (78) | 3216 × 2136 | 30° (macula) | Topcon TRC-NW8F | Taiwan |
| DRISHTI-GS1 | 101 (70) | 2047 × 1760 | 30° (disc) | Zeiss Visucam NM/FA | India |
| RIM-ONE r2 | 455 (200) | Not fixed | Cropped (disc) | Nidek AFC-210 | Spain |
TVGH, Taipei Veterans General Hospital; CHGH, Cheng Hsin General Hospital.
Figure 2. Model and module architecture diagram. (A) The architecture of EfficientNet B3, consisting of one convolutional layer and seven MBConv modules. Each MBConv module is followed by the number 1 or 6, the expansion factor n: the first 1 × 1 convolutional layer expands the channels n times. (B) The architecture of the MBConv module. The k × k following Depthwise (Depwise) Conv is the kernel size of the depthwise convolution in the MBConv module and is listed in (A) as 3 × 3 or 5 × 5. (C) The architecture of the squeeze-and-excitation (SE) block. The SE block increases the weight of essential features and reduces the weight of uninformative features according to the change in loss during training, improving prediction performance. (D) Schematic diagram of depthwise (Depwise) convolutions. Depthwise convolution applies a separate filter to each image channel to reduce the computational load. FC: fully connected layer.
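The two building blocks described in panels (B)–(D) can be sketched directly in numpy. The functions below are minimal, hypothetical implementations for illustration, not the authors' code: `depthwise_conv` convolves each channel with its own k × k filter (needing only k²·C weights instead of the k²·C·C of a standard convolution over C channels), and `se_block` squeezes each channel to a scalar by global average pooling and rescales the channels through a small two-layer gate.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Depthwise convolution: each input channel gets its own k x k filter
    (valid padding, stride 1), as in the MBConv module of Figure 2(D)."""
    h, w, c = x.shape
    k = kernels.shape[0]                      # kernels has shape (k, k, c)
    out = np.zeros((h - k + 1, w - k + 1, c))
    for ch in range(c):                       # one filter per channel
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[i, j, ch] = np.sum(x[i:i + k, j:j + k, ch] * kernels[:, :, ch])
    return out

def se_block(feats, w1, w2):
    """Squeeze-and-excitation (Figure 2(C)): global-average-pool each channel
    ("squeeze"), pass through a ReLU->sigmoid gate ("excitation"), and
    rescale the channels by the resulting weights."""
    z = feats.mean(axis=(0, 1))                                 # squeeze: (C,)
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1, 0.0) @ w2)))   # excitation gate
    return feats * s                                            # channel reweighting
```

For a k × k kernel on C channels, the depthwise stage trains k²·C weights; a standard convolution producing C output maps from C inputs would need k²·C², which is the computational saving the figure caption refers to.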
Network hyper-parameters used in different models.
| Models 1 | Batch Size | Epochs | Initial Learning Rate |
|---|---|---|---|
| TVGH | 20 | 20 | 0.00006 |
| Integrated | 16 | 25 | 0.00004 |
| DRISHTI-GS1-specific | 16 | 20 | 0.00001 |
| RIM-ONE r2-specific | 16 | 20 | 0.00005 |
| CHGH-specific | 20 | 20 | 0.00006 |
1 Each model was named after the dataset(s) used for training. The TVGH model is the core model of this study, built with the EfficientNet B3 architecture and TVGH training images. The integrated model was built with the EfficientNet B3 architecture and the training images from all datasets. Each dataset-specific model used a specific dataset to fine-tune the TVGH model; for example, the DRISHTI-GS1-specific model used training images from the DRISHTI-GS1 dataset to fine-tune the TVGH model. TVGH, Taipei Veterans General Hospital; CHGH, Cheng Hsin General Hospital.
Diagnostic performance of the deep learning models built with different training datasets on test images across different datasets.
| Models 1 | Training Datasets | Test Datasets | Accuracy | Specificity | Sensitivity | AUC (95% CI) |
|---|---|---|---|---|---|---|
| TVGH | TVGH | TVGH | 95.62% | 97.50% | 93.75% | 0.991 (0.982–1.000) |
| TVGH | TVGH | DRISHTI-GS1 | 55.00% | 100% | 10.00% | 0.770 (0.558–0.982) |
| TVGH | TVGH | RIM-ONE r2 | 52.50% | 90.00% | 15.00% | 0.624 (0.501–0.748) |
| TVGH | TVGH | CHGH | 80.00% | 95.00% | 65.00% | 0.910 (0.798–1.000) |
| DRISHTI-GS1-specific | TVGH + DRISHTI-GS1 | TVGH | 88.75% | 92.50% | 85.00% | 0.969 (0.945–0.993) |
| DRISHTI-GS1-specific | TVGH + DRISHTI-GS1 | DRISHTI-GS1 | 95.00% | 90.00% | 100.00% | 0.990 (0.958–1.000) |
| RIM-ONE r2-specific | TVGH + RIM-ONE r2 | TVGH | 94.38% | 98.75% | 90.00% | 0.986 (0.969–1.000) |
| RIM-ONE r2-specific | TVGH + RIM-ONE r2 | RIM-ONE r2 | 87.50% | 92.50% | 82.50% | 0.922 (0.859–0.985) |
| CHGH-specific | TVGH + CHGH | TVGH | 92.50% | 93.75% | 91.25% | 0.988 (0.977–1.000) |
| CHGH-specific | TVGH + CHGH | CHGH | 92.50% | 95.00% | 90.00% | 0.963 (0.901–1.000) |
| Integrated | All | TVGH | 91.88% | 91.25% | 92.50% | 0.981 (0.965–0.998) |
| Integrated | All | DRISHTI-GS1 | 50.00% | 20.00% | 80.00% | 0.840 (0.651–1.000) |
| Integrated | All | RIM-ONE r2 | 82.50% | 87.50% | 77.50% | 0.930 (0.875–0.985) |
| Integrated | All | CHGH | 85.00% | 75.00% | 95.00% | 0.960 (0.906–1.000) |
1 All models adopt the EfficientNet B3 architecture; the model weights differ because of the different training data and hyperparameters. Each model was named after the dataset(s) used for training. The TVGH model is the core model of this study, built with the EfficientNet B3 architecture and TVGH training images. Each dataset-specific model used a specific dataset to fine-tune the TVGH model; for example, the DRISHTI-GS1-specific model used training images from the DRISHTI-GS1 and TVGH datasets to fine-tune the TVGH model. The integrated model was built with the EfficientNet B3 architecture and the training images from all datasets. TVGH, Taipei Veterans General Hospital; CHGH, Cheng Hsin General Hospital; AUC, area under the receiver operating characteristic curve; CI, confidence interval.
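The accuracy, sensitivity, specificity, and AUC columns above can be reproduced from raw predictions with a few lines of plain Python. This is a generic sketch of the standard definitions (with AUC in its Mann-Whitney form), not the authors' evaluation code; the 95% CIs in the table would additionally require a bootstrap or DeLong procedure.

```python
def diagnostic_metrics(labels, preds):
    """Accuracy, sensitivity, and specificity from binary labels and
    predictions (1 = glaucoma), matching the columns in the table."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),   # true-positive rate on glaucoma eyes
        "specificity": tn / (tn + fp),   # true-negative rate on normal eyes
    }

def auc(labels, scores):
    """AUC as the probability that a glaucoma image receives a higher score
    than a normal one (Mann-Whitney formulation of the ROC area)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)   # ties count half
    return wins / (len(pos) * len(neg))
```

For example, a model that misses one of two glaucoma eyes but calls both normal eyes correctly has accuracy 0.75, sensitivity 0.50, and specificity 1.00.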
Figure 3. Receiver operating characteristic (ROC) curves of the TVGH model, dataset-specific models, and integrated model for differentiating between normal and glaucomatous fundus images on different test datasets. (A) The TVGH model is a CNN classifier constructed with the training images of the TVGH dataset. Different colored lines indicate the results obtained when the TVGH model classified the test images of the TVGH, DRISHTI-GS1, RIM-ONE r2, and CHGH datasets. (B) ROC curves of the dataset-specific models predicting test images from the corresponding datasets. The line labeled T + D model D test refers to the predictive result of the DRISHTI-GS1-specific model, trained with mixed training images from the TVGH and DRISHTI-GS1 datasets, on DRISHTI-GS1 test images. T: TVGH; D: DRISHTI-GS1; R: RIM-ONE r2; C: CHGH. (C) ROC curves of the integrated model, constructed with combined training images from all datasets, in detecting glaucoma on the various test datasets.
Figure 4. Gradient-weighted class activation mapping (Grad-CAM) identifying the features extracted for classification by the Taipei Veterans General Hospital (TVGH) model. The hot spots (red) localized at the optic nerve head, with some extending to the peripapillary nerve fiber bundles, in both glaucomatous (A,B) and healthy eyes (C,D).
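Grad-CAM itself reduces to a short computation once the last convolutional layer's feature maps and their gradients are available. The numpy sketch below shows the standard recipe (gradient-derived channel weights, weighted sum, ReLU, normalization); it is an illustrative stand-in, since the paper does not publish its visualization code.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from the (H, W, C) feature maps of the last
    convolutional layer and the gradients of the glaucoma score with
    respect to those maps. Channel weights are the global-average-pooled
    gradients; the map is the ReLU of the weighted sum, scaled to [0, 1]."""
    weights = gradients.mean(axis=(0, 1))                         # (C,) per-channel importance
    cam = np.maximum((feature_maps * weights).sum(axis=2), 0.0)   # weighted sum + ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                                     # normalize for display
    return cam
```

The resulting low-resolution map is then upsampled to the fundus-image size and overlaid as a heat map, which is how the optic-nerve-head hot spots in the figure are produced.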
Clinical characteristics and prediction accuracy of the TVGH model in images with primary open angle glaucoma.
| Characteristic | CNN Prediction: Correct | CNN Prediction: Incorrect | p-Value |
|---|---|---|---|
| Age (years) | 58.00 ± 14.71 | 58.00 ± 19.20 | 0.91 |
| Cup-to-disc ratio | 0.79 ± 0.12 | 0.72 ± 0.11 | 0.16 |
| Visual field | |||
| MD (dB) | −5.62 ± 5.26 | −2.43 ± 2.04 | 0.09 |
| PSD (dB) | 5.71 ± 3.59 | 3.73 ± 2.25 | 0.37 |
| Average RNFL thickness (µm) | 71.96 ± 10.86 | 85.40 ± 13.70 | 0.04 |
Values are presented as mean ± SD. TVGH, Taipei Veterans General Hospital; CNN, convolutional neural network; dB, decibel; MD, mean deviation; PSD, pattern standard deviation; RNFL, retinal nerve fiber layer.