Keunheung Park, Jinmi Kim, Jiwoong Lee.
Abstract
Computer vision has advanced greatly in recent years. Since AlexNet was first introduced, many modified deep-learning architectures have been developed, and they are still evolving. However, few studies have compared these architectures in the field of ophthalmology. This study compared the performance of various state-of-the-art deep-learning architectures for detecting the optic nerve head and the vertical cup-to-disc ratio in fundus images. Three architectures were compared: YOLO V3, ResNet, and DenseNet. We compared various aspects of performance, not confined to detection accuracy but also including processing time, diagnostic performance, the effect of the graphics processing unit (GPU), and image resolution. In general, as the input image resolution increased, classification accuracy, localization error, and diagnostic performance all improved, but the optimal architecture differed depending on the resolution. Processing time was significantly accelerated with GPU assistance: even at the high resolution of 832 × 832, it was approximately 170 ms, whereas without a GPU it was at least 26 times slower. The choice of architecture may depend on the researcher's purpose when balancing speed against accuracy. This study provides a guideline for determining the deep-learning architecture, optimal image resolution, and appropriate hardware.
Year: 2020 PMID: 32193499 PMCID: PMC7081256 DOI: 10.1038/s41598-020-62022-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
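The central quantity of the study, the vertical cup-to-disc ratio (VCDR), can be derived from the detected bounding boxes of the optic disc and optic cup. The sketch below is an illustration only, not the authors' code; the `(x1, y1, x2, y2)` box convention and the function name `vcdr` are assumptions:

```python
def vcdr(disc_box, cup_box):
    """Vertical cup-to-disc ratio from two (x1, y1, x2, y2) bounding boxes.

    The VCDR is the vertical height of the optic cup divided by the
    vertical height of the optic disc.
    """
    disc_height = disc_box[3] - disc_box[1]
    cup_height = cup_box[3] - cup_box[1]
    return cup_height / disc_height

# Example: a 100-px-tall disc containing a 70-px-tall cup gives VCDR 0.7.
print(vcdr((0, 0, 100, 100), (20, 20, 80, 90)))  # → 0.7
```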
Demographic characteristics of the training group.
| | Values |
|---|---|
| Total number of patients (female/male) | 1,068 (531/537) |
| Total number of eyes | 1,959 |
| Age (mean ± SD) | 58.2 ± 16.1 |
| Number of eyes binned by VCDR | |
| VCDR < 0.4 | 111 (5.7%) |
| 0.4 ≤ VCDR < 0.5 | 125 (6.4%) |
| 0.5 ≤ VCDR < 0.6 | 279 (14.2%) |
| 0.6 ≤ VCDR < 0.7 | 467 (23.8%) |
| 0.7 ≤ VCDR < 0.8 | 516 (26.3%) |
| 0.8 ≤ VCDR < 0.9 | 379 (19.3%) |
| 0.9 ≤ VCDR | 82 (4.2%) |
SD: standard deviation, VCDR: vertical cup-to-disc ratio
Figure 1. Custom-developed deep-learning architecture tester software.
Demographic characteristics of the test group.
| | Normal (n = 95) | Glaucoma (n = 109) | P value |
|---|---|---|---|
| Age (year) | 51.8 ± 22.1 | 60.1 ± 16.0 | 0.003a |
| Female/male (number) | 47/48 | 52/57 | 0.801b |
| Spherical equivalence (diopter) | −1.92 ± 3.42 | −1.85 ± 3.20 | 0.878a |
| Intraocular pressure (mmHg) | 15.7 ± 3.3 | 15.3 ± 4.3 | 0.466a |
| Axial length (mm) | 24.89 ± 3.14 | 24.60 ± 1.50 | 0.607a |
| Central corneal thickness (µm) | 551.9 ± 46.5 | 539.0 ± 39.7 | 0.045a |
| - Mean deviation (dB) | −1.48 ± 2.02 | −11.25 ± 9.41 | <0.001c |
| - Pattern standard deviation (dB) | 1.84 ± 0.94 | 5.76 ± 3.85 | <0.001c |
| - Visual Field Index (%) | 98.0 ± 3.2 | 69.3 ± 32.3 | <0.001c |
aStudent’s t-test.
bχ2 test.
cMann–Whitney U test.
Values are presented as mean ± standard deviation.
Comparison of the mean detection time for different hardware.
| Resolution | Hardware | YOLO V3 (ms) | ResNet (ms) | DenseNet (ms) | Pc | Pd | Pe | Pf |
|---|---|---|---|---|---|---|---|---|
| 224 × 224 | CPU onlya | 531 ± 80 | 314 ± 53 | 394 ± 66 | <0.001 | <0.001 | <0.001 | <0.001 |
| | with GPUb | 112 ± 32 | 116 ± 39 | 160 ± 57 | <0.001 | 0.582 | <0.001 | <0.001 |
| 416 × 416 | CPU onlya | 1,977 ± 228 | 942 ± 158 | 1,198 ± 189 | <0.001 | <0.001 | <0.001 | <0.001 |
| | with GPUb | 117 ± 27 | 111 ± 28 | 136 ± 32 | <0.001 | <0.001 | <0.001 | <0.001 |
| 832 × 832 | CPU onlya | 11,365 ± 967 | 4,395 ± 395 | 6,055 ± 591 | <0.001 | <0.001 | <0.001 | <0.001 |
| | with GPUb | 171 ± 35 | 165 ± 39 | 167 ± 30 | <0.001 | <0.001 | <0.001 | <0.001 |
aIntel i5–8400 2.81 GHz + 32 GB RAM.
bNVIDIA Titan Xp 12 GB RAM.
cP value among all three architectures (Friedman test).
dP value between YOLO V3 and ResNet (Friedman test).
eP value between ResNet and DenseNet (Friedman test).
fP value between YOLO V3 and DenseNet (Friedman test).
Values are presented as mean ± standard deviation.
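The "at least 26 times" GPU speedup quoted in the abstract can be checked directly against the 832 × 832 rows of the table above; the dictionary names below are illustrative, not from the paper:

```python
# Mean detection times (ms) at 832 x 832, taken from the table above.
cpu_ms = {"YOLO V3": 11365, "ResNet": 4395, "DenseNet": 6055}  # CPU only
gpu_ms = {"YOLO V3": 171, "ResNet": 165, "DenseNet": 167}      # with GPU

# Per-architecture speedup from GPU assistance.
speedup = {arch: cpu_ms[arch] / gpu_ms[arch] for arch in cpu_ms}

# The smallest speedup (ResNet, ~26.6x) supports "at least 26 times slower
# without GPU"; YOLO V3 benefits the most (~66x).
print(min(speedup.values()))
```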
Localization accuracy.
| Resolution | Statistics | YOLO V3 | ResNet | DenseNet | Pa | Pb | Pc | Pd |
|---|---|---|---|---|---|---|---|---|
| 224 × 224 | Mean IoU, % | 67.7 ± 11.3 | 69.6 ± 10.7 | 80.0 ± 9.4 | <0.001e | 0.005f | <0.001f | <0.001f |
| | mAP50, % | 93.14 | 94.61 | 99.51 | 0.016g | 0.512g | 0.003g | <0.001g |
| 416 × 416 | Mean IoU, % | 79.4 ± 6.0 | 69.0 ± 10.2 | 81.2 ± 7.2 | <0.001e | <0.001f | <0.001f | <0.001f |
| | mAP50, % | 100.00 | 95.10 | 100.00 | <0.001g | 0.001g | 0.001g | NE |
| 832 × 832 | Mean IoU, % | 81.5 ± 7.1 | 77.2 ± 11.2 | 80.7 ± 7.8 | <0.001e | <0.001f | <0.001f | 0.651f |
| | mAP50, % | 99.02 | 95.59 | 100.00 | 0.053g | 0.033g | 0.002g | 0.155g |
aP value among all three architectures.
bP value between YOLO V3 and ResNet.
cP value between ResNet and DenseNet.
dP value between YOLO V3 and DenseNet.
eFriedman test.
fWilcoxon’s signed-rank test.
gGeneralized estimating equations (GEEs).
IoU: intersection over union, mAP50: mean average precision, NE: not estimable.
Values are presented as mean ± standard deviation.
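The intersection-over-union (IoU) metric reported above measures how well a predicted bounding box overlaps the ground-truth box. A minimal self-contained implementation, assuming the common `(x1, y1, x2, y2)` corner convention (not taken from the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping by half share 50 px of a 150 px union.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.333...
```

The mAP50 column counts a detection as correct when this IoU exceeds 0.5, the standard threshold in object-detection evaluation.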
Vertical cup-to-disc ratio (VCDR) classification accuracy.
| Resolution | Statistics | YOLO V3 | ResNet | DenseNet | Pc | Pd | Pe | Pf |
|---|---|---|---|---|---|---|---|---|
| 224 × 224 | MAEa | 0.069 ± 0.071 | 0.062 ± 0.083 | 0.065 ± 0.083 | 0.001g | <0.001g | 0.245g | 0.001g |
| | NAE <0.1b | 142 (68.9%) | 147 (71.4%) | 146 (70.9%) | 0.156h | 0.084h | 0.891h | 0.114h |
| | NAE <0.2b | 183 (88.8%) | 186 (90.3%) | 183 (88.8%) | 0.126h | 0.064h | 0.564h | 0.131h |
| 416 × 416 | MAEa | 0.062 ± 0.062 | 0.061 ± 0.072 | 0.063 ± 0.073 | 0.531g | 0.851g | 0.820g | 0.803g |
| | NAE <0.1b | 150 (72.8%) | 145 (70.4%) | 150 (72.8%) | 0.682h | 0.465h | 0.465h | 1.000h |
| | NAE <0.2b | 193 (93.7%) | 187 (90.8%) | 189 (91.7%) | 0.369h | 0.159h | 0.655h | 0.347h |
| 832 × 832 | MAEa | 0.053 ± 0.059 | 0.062 ± 0.066 | 0.048 ± 0.063 | 0.047g | 0.118g | <0.001g | 0.025g |
| | NAE <0.1b | 159 (77.2%) | 152 (73.8%) | 170 (82.5%) | 0.016h | 0.376h | 0.004h | 0.062h |
| | NAE <0.2b | 199 (96.6%) | 193 (93.7%) | 199 (96.6%) | 0.123h | 0.099h | 0.063h | 0.655h |
aMean absolute error (MAE) between the detected VCDR and ground truth VCDR (mean ± standard deviation).
bThe number (percent) of patients whose absolute error (AE) between detected VCDR and ground truth VCDR is less than 0.1 or 0.2.
cP value among all three architectures.
dP value between YOLO V3 and ResNet.
eP value between ResNet and DenseNet.
fP value between YOLO V3 and DenseNet.
gFriedman test.
hGeneralized estimating equations (GEEs).
Values are presented as mean ± standard deviation.
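The MAE and NAE statistics in the table above are straightforward to compute from paired predicted and ground-truth VCDR values. A small NumPy sketch; the function name and example values are illustrative, not the authors' data:

```python
import numpy as np

def vcdr_errors(predicted, ground_truth, thresholds=(0.1, 0.2)):
    """Mean absolute error and, per threshold, the count of eyes whose
    absolute error falls below it (the NAE statistic above)."""
    ae = np.abs(np.asarray(predicted) - np.asarray(ground_truth))
    mae = float(ae.mean())
    nae_counts = {t: int((ae < t).sum()) for t in thresholds}
    return mae, nae_counts

# Three hypothetical eyes: absolute errors 0.05, 0.05, and 0.15.
mae, counts = vcdr_errors([0.70, 0.55, 0.90], [0.65, 0.60, 0.75])
print(mae, counts)
```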
Figure 2. Vertical cup-to-disc ratio (VCDR) classification accuracy. Overall mean absolute error (MAE) tended to be high for patients (ground truth) with VCDR < 0.4 or VCDR > 0.9, and almost all other MAEs were <0.1. That is, excluding the patients with an extremely low or high VCDR, the MAE for the prediction of the VCDR was almost always <0.1.
Diagnostic performances of detected vertical cup-to-disc ratio (VCDR).
| Resolution | Architecture | AUROC (CI) | Pa | Pb | Pc |
|---|---|---|---|---|---|
| 224 × 224 | YOLO V3 | 0.793 (0.729–0.848) | 0.799 | 0.240 | 0.129 |
| | ResNet | 0.785 (0.721–0.841) | | | |
| | DenseNet | 0.754 (0.687–0.813) | | | |
| 416 × 416 | YOLO V3 | 0.810 (0.749–0.861) | 0.602 | 0.574 | 0.278 |
| | ResNet | 0.799 (0.737–0.852) | | | |
| | DenseNet | 0.787 (0.724–0.841) | | | |
| 832 × 832 | YOLO V3 | 0.832 (0.773–0.881) | 0.768 | 0.257 | 0.435 |
| | ResNet | 0.838 (0.780–0.886) | | | |
| | DenseNet | 0.818 (0.758–0.869) | | | |
aP value between YOLO V3 and ResNet (DeLong’s method).
bP value between ResNet and DenseNet (DeLong’s method).
cP value between YOLO V3 and DenseNet (DeLong’s method).
AUROC: area under the receiver operating characteristic curve, CI: confidence interval.
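The AUROC values above can be understood through their rank-based interpretation: the probability that a randomly chosen glaucomatous eye receives a higher detected VCDR than a randomly chosen normal eye. A minimal sketch via the Mann-Whitney U statistic (illustrative only; the paper's confidence intervals and P values additionally use DeLong's method, which is not reproduced here):

```python
def auroc(glaucoma_scores, normal_scores):
    """AUROC as the fraction of (glaucoma, normal) pairs where the
    glaucomatous eye scores higher; ties count half."""
    wins = sum(
        1.0 if g > n else 0.5 if g == n else 0.0
        for g in glaucoma_scores
        for n in normal_scores
    )
    return wins / (len(glaucoma_scores) * len(normal_scores))

# Hypothetical detected VCDRs for 3 glaucomatous and 3 normal eyes.
print(auroc([0.9, 0.8, 0.7], [0.6, 0.7, 0.5]))
```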
Figure 3. Diagnostic performance of the detected vertical cup-to-disc ratio (VCDR).