Adrian Galdran¹, André Anjos², José Dolz³, Hadi Chakor⁴, Hervé Lombaert³, Ismail Ben Ayed³
Abstract
The segmentation of retinal vasculature from eye fundus images is a fundamental task in retinal image analysis. In recent years, increasingly complex approaches based on sophisticated Convolutional Neural Network architectures have been pushing performance on well-established benchmark datasets. In this paper, we take a step back and analyze whether such complexity is really needed. We first compile and review the performance of 20 different techniques on several popular databases, and we demonstrate that a minimalistic version of a standard U-Net with several orders of magnitude fewer parameters, carefully trained and rigorously evaluated, closely approximates the performance of the current best techniques. We then show that a cascaded extension (W-Net) reaches outstanding performance on several popular datasets, still using orders of magnitude fewer learnable weights than any previously published work. Furthermore, we provide the most comprehensive cross-dataset performance analysis to date, involving up to 10 different databases. Our analysis demonstrates that retinal vessel segmentation is far from solved when test images differ substantially from the training data, and that this task is an ideal scenario for exploring domain adaptation techniques. In this context, we experiment with a simple self-labeling strategy that moderately improves cross-dataset performance, indicating that there is still much room for improvement in this area. Finally, we test our approach on Artery/Vein segmentation and on vessel segmentation from OCTA imaging, where we again achieve results well aligned with the state of the art at a fraction of the model complexity found in recent literature. Code to reproduce the results in this paper is released.
Year: 2022 PMID: 35418576 PMCID: PMC9007957 DOI: 10.1038/s41598-022-09675-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Performance comparison of methods trained/tested on DRIVE, CHASE-DB, and HRF.
| Method | # Params | DRIVE AUC | DRIVE Dice | DRIVE MCC | CHASE-DB AUC | CHASE-DB Dice | CHASE-DB MCC | HRF AUC | HRF Dice | HRF MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| Maninis et al.[ | – | – | 82.20 | – | – | – | – | – | – | – | |
| Zhang et al.[ | – | 96.36 | – | – | 96.06 | – | – | 96.08 | – | 74.10 | |
| Fu et al.[ | – | 94.04 | 78.75 | – | 94.82 | 75.49 | – | – | – | – | |
| Liskowski et al.[ | 48,000,000 | 97.90 | – | – | – | – | – | – | – | ||
| Orlando et al.[ | – | 95.07 | 78.57 | 75.56 | 95.24 | 73.32 | 70.46 | 95.24 | 71.58 | 68.97 | |
| Gu et al.[ | – | – | 78.86 | 75.89 | – | 72.02 | 69.08 | – | 77.49 | 75.41 | |
| Wu et al.[ | – | – | – | 98.25 | – | – | – | – | – | ||
| Yan et al.[ | – | 97.52 | 81.83 | – | 97.81 | – | – | – | 78.14 | – | |
| Wang et al.[ | – | – | 81.44 | 78.95 | – | 78.63 | 76.55 | – | – | – | |
| Wang et al.[ | – | 97.72 | – | 98.12 | 80.37 | – | – | – | – | ||
| Araujo et al.[ | – | 97.90 | – | – | 98.20 | – | – | – | – | – | |
| Fu et al.[ | – | 97.19 | 80.48 | – | – | – | – | – | – | – | |
| Wang et al.[ | – | – | 80.93 | 78.51 | – | 78.09 | 75.91 | – | 77.31 | – | |
| Wu et al.[ | – | 97.79 | – | – | – | – | – | – | – | – | |
| Zhao et al.[ | – | – | 78.82 | – | – | – | – | – | 76.59 | – | |
| Laibacher et al.[ | 549,748 | 97.14 | 80.91 | – | 97.03 | 80.06 | – | – | 78.14 | – | |
| Shin et al.[ | 7,910,000 | 98.01 | 82.63 | – | 98.30 | 80.34 | – | – | |||
| Zhao et al.[ | – | – | 82.29 | – | – | – | – | – | 77.31 | – | |
| Zhuo et al.[ | – | 97.54 | 81.63 | – | – | – | – | – | – | – | |
| Mou et al.[ | 56,030,000 | 97.96 | – | – | 98.12 | – | – | – | – | – | |
| Little U-Net | 34,201 | 97.98 | 82.41 | 79.81 | 98.22 | 80.29 | 78.23 | 98.11 | 80.59 | 78.60 |
| 68,482 | 98.25 | 81.03 | |||||||||
Best results are marked bold. A result is underlined whenever it lies within the confidence interval of the Little W-Net model (specified in Table 3 below).
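The table above reports AUC, Dice and MCC. As a minimal sketch (not the authors' evaluation code), Dice and MCC can be computed from the confusion counts of a binarized prediction against the manual annotation. The optional `fov` argument, which restricts scoring to a field-of-view mask, is an assumption of this sketch; the tables report such values multiplied by 100.

```python
import numpy as np


def confusion_counts(pred, gt, fov=None):
    """Return (tp, fp, fn, tn) for two binary masks, optionally restricted to a FOV mask."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if fov is not None:  # assumption: only pixels inside the field of view are scored
        fov = np.asarray(fov, dtype=bool)
        pred, gt = pred[fov], gt[fov]
    tp = int(np.sum(pred & gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    tn = int(np.sum(~pred & ~gt))
    return tp, fp, fn, tn


def dice(pred, gt, fov=None):
    """Dice = 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, gt, fov)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 1.0


def mcc(pred, gt, fov=None):
    """Matthews correlation coefficient computed from all four confusion counts."""
    tp, fp, fn, tn = confusion_counts(pred, gt, fov)
    num = tp * tn - fp * fn
    den = np.sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return float(num / den) if den > 0 else 0.0


pred = np.array([[1, 1, 0, 0]])
gt = np.array([[1, 0, 1, 0]])
print(dice(pred, gt), mcc(pred, gt))  # 0.5 0.0
```

Dice ignores true negatives, whereas MCC uses all four counts, so the two can rank methods differently on images dominated by background pixels.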
Figure 1: This work provides a comprehensive cross-dataset performance study on vessel segmentation. This figure shows a representative image from each of the 10 databases used in this paper: (a) DRIVE[1], (b) CHASE-DB 1[2], (c) HRF[3], (d) STARE[4], (e) LES-AV[5], (f) IOSTAR[6], (g) DR HAGIS[7], (h) AV-WIDE[8], (i) DRIDB[9], (j) UoA-DR[10]. A detailed description of each database is given in Table 2.
Figure 2: Representation of the W-Net architecture. The left-hand part of the architecture is a standard minimal U-Net with 34K parameters, which performs on par with the state of the art. The full W-Net, defined by Eq. (1), is composed of two consecutive U-Nets; it outperforms all previous approaches with only around 70K parameters, 1–3 orders of magnitude fewer than previously proposed CNNs.
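The cascade of two U-Nets can be sketched as function composition. This is a minimal sketch under an explicit assumption: the second U-Net receives the input image concatenated channel-wise with the first U-Net's output, which is one common way to wire a two-stage cascade and not a transcription of Eq. (1). The `toy_u1` and `toy_u2` callables are hypothetical stand-ins for trained U-Nets, used only to show tensor shapes.

```python
from typing import Callable, Tuple

import numpy as np

# A "network" here is any callable mapping a (C, H, W) array to a (1, H, W) vessel map.
Net = Callable[[np.ndarray], np.ndarray]


def w_net_forward(x: np.ndarray, u1: Net, u2: Net) -> Tuple[np.ndarray, np.ndarray]:
    """Two-stage cascade: u2 refines u1's output given the original image (assumed wiring)."""
    y1 = u1(x)                           # first-stage vessel map, shape (1, H, W)
    z = np.concatenate([x, y1], axis=0)  # image channels + first-stage prediction
    y2 = u2(z)                           # second-stage (final) vessel map
    return y1, y2


def toy_u1(x: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in: average the colour channels.
    return x.mean(axis=0, keepdims=True)


def toy_u2(z: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in: expects C + 1 input channels and rescales the last one.
    assert z.shape[0] == 4, "RGB image + 1 first-stage channel"
    return 0.5 * z[-1:]


image = np.random.default_rng(0).random((3, 8, 8))
y1, y2 = w_net_forward(image, toy_u1, toy_u2)
print(y1.shape, y2.shape)  # (1, 8, 8) (1, 8, 8)
```

Under this assumption the two stages share the same architecture apart from the input layer, so the parameter count roughly doubles; this is consistent with, though not proof of, the 34,201 and 68,482 parameters reported for the Little U-Net and Little W-Net.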
Figure 3: Domain adaptation strategy employed in this work: a model trained on source data is used to generate pseudo-labels on a target dataset. The original source data and the target data with pseudo-labels are then used to fine-tune the model and produce better predictions.
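The self-labeling loop of Fig. 3 reduces to two data-handling steps: predict on the unlabeled target images, then fine-tune on the union of labeled source pairs and pseudo-labeled target pairs. The minimal sketch below covers only the data assembly; the use of hard labels, the 0.5 binarization threshold and the helper names are assumptions for illustration, and the model and training loop are omitted.

```python
from typing import List, Sequence, Tuple

import numpy as np


def make_pseudo_labels(target_probs: Sequence[np.ndarray], threshold: float = 0.5) -> List[np.ndarray]:
    """Turn per-pixel vessel probabilities on target images into hard 0/1 pseudo-labels."""
    # The threshold value is an assumption of this sketch.
    return [(p >= threshold).astype(np.uint8) for p in target_probs]


def build_finetuning_set(
    source_images: Sequence[np.ndarray],
    source_labels: Sequence[np.ndarray],
    target_images: Sequence[np.ndarray],
    target_probs: Sequence[np.ndarray],
    threshold: float = 0.5,
) -> List[Tuple[np.ndarray, np.ndarray]]:
    """Pair source images with manual labels and target images with pseudo-labels."""
    if len(target_images) != len(target_probs):
        raise ValueError("one probability map is needed per target image")
    pseudo = make_pseudo_labels(target_probs, threshold)
    return list(zip(source_images, source_labels)) + list(zip(target_images, pseudo))


src_imgs = [np.zeros((2, 2)), np.ones((2, 2))]
src_lbls = [np.zeros((2, 2), np.uint8), np.ones((2, 2), np.uint8)]
tgt_imgs = [np.full((2, 2), 0.3)]
tgt_probs = [np.array([[0.9, 0.2], [0.6, 0.4]])]  # would come from the source-trained model
pairs = build_finetuning_set(src_imgs, src_lbls, tgt_imgs, tgt_probs)
print(len(pairs), pairs[-1][1].tolist())  # 3 [[1, 0], [1, 0]]
```

Keeping the original source pairs in the fine-tuning set, as the caption describes, anchors the model to reliable manual labels while the pseudo-labeled pairs expose it to the target appearance.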
Description of each of the ten datasets considered in this paper in terms of image and population characteristics.
| Dataset | Year | # ims. | Resolution | FOV (°) | Challenges & Comments |
|---|---|---|---|---|---|
| STARE[4] | 2000 | 20 | 605 | 35 | Poor quality: scanned and digitized photographs. Healthy and pathological images (10/10) |
| DRIVE[1] | 2004 | 40 | 565 | 45 | Consistently good quality and contrast, low resolution. Mostly healthy patients, some with mild DR (33/40) |
| CHASE-DB 1[2] | 2012 | 28 | 999 | 30 | OD-centered images from 10-year-old children. Uneven background illumination and poor contrast |
| HRF[3] | 2013 | 45 | 3504 | 60 | High visual quality, images taken with mydriatic dilation. Healthy, diabetic, and glaucomatous patients (15/15/15) |
| DRiDB[9] | 2013 | 50 | 720 | 45 | Highly varying quality, illumination, and image noise. Mostly diabetic patients of varying grades (36/50) |
| AV-WIDE[8] | 2015 | 30 | 2816, 1500 | 200 | Uneven illumination, varying resolution due to cropping. Healthy and age-related macular degeneration patients |
| IOSTAR[6] | 2016 | 30 | 1024 | 45 | Scanning Laser Ophthalmoscope images. Macula-centered, high contrast and visual quality |
| DR HAGIS[7] | 2017 | 40 | 2816, 4752 | 45 | Multi-center, multi-device, macula-centered images. All diabetic patients with different co-morbidities |
| UoA-DR[10] | 2017 | 200 | 2124 | 45 | Both macula- and OD-centered images. Healthy, NP-DR and P-DR patients (56/114/30) |
| LES-AV[5] | 2018 | 22 | 1144, 1958 | 30, 45 | OD-centered images, highly varying illumination. 11 healthy and 11 glaucomatous patients |
Average performance of the Little W-Net model for each of the datasets in Table 2 over 5 training runs, with a confidence interval containing the true mean with probability , assuming normally distributed performance values.
| DRIVE AUC | DRIVE Dice | DRIVE MCC | CHASE-DB AUC | CHASE-DB Dice | CHASE-DB MCC | HRF AUC | HRF Dice | HRF MCC |
|---|---|---|---|---|---|---|---|---|
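A confidence interval of this kind can be computed from the mean and sample standard deviation of the per-run scores. The sketch below assumes a two-sided Student-t interval at 95% confidence; that level is an assumption, since the probability value is missing from the caption, and the run scores in the usage lines are synthetic numbers, not results from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% Student-t critical value for n - 1 = 4 degrees of freedom (5 runs).
# ASSUMPTION: the 95% level is a guess; the caption's probability value is missing.
T_CRIT_95_DF4 = 2.7764451051977987


def mean_ci(values: Sequence[float], t_crit: float = T_CRIT_95_DF4) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) under a normality assumption on per-run scores.

    The default t_crit is only valid for exactly 5 runs; pass another value otherwise.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs")
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = t_crit * s / math.sqrt(n)
    return m, m - half_width, m + half_width


# Synthetic scores from 5 hypothetical training runs, for illustration only.
runs = [80.0, 81.0, 82.0, 83.0, 84.0]
m, lo, hi = mean_ci(runs)
print(f"{m:.2f} [{lo:.2f}, {hi:.2f}]")
```

With only 5 runs the t critical value (about 2.78) is noticeably larger than the normal 1.96, so a z-based interval would be too narrow.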
Our domain adaptation strategy improves results on a wide range of external test sets.
| Training set | DRIVE | CHASE-DB | HRF | STARE | IOSTAR | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AUC | DICE | MCC | AUC | DICE | MCC | AUC | DICE | MCC | AUC | DICE | MCC | AUC | DICE | MCC | |
| DRIVE | 97.22 | 75.13 | 72.44 | 95.90 | 70.39 | 68.05 | 98.11 | 79.48 | 77.30 | 97.97 | 78.77 | 76.47 | |||
| PSEUDO-L | |||||||||||||||
First row: W-Net trained on DRIVE; second row (Pseudo-Labels): the same model fine-tuned using the strategy illustrated in Fig. 3. Best metric marked in bold. Note that Dice and MCC are computed in all cases from segmentations binarized with the threshold that maximizes the Dice score on the training dataset (DRIVE).
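The binarization rule in the note above can be sketched as a grid search: pick the threshold that maximizes Dice on the training set's probability maps, then reuse that single value on every external test set. The 101-point grid over [0, 1] is an assumption of this sketch, not a detail given in the paper.

```python
import numpy as np


def dice_at_threshold(probs: np.ndarray, gt: np.ndarray, t: float) -> float:
    """Dice of the mask (probs >= t) against a boolean ground truth."""
    pred = probs >= t
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 1.0


def best_dice_threshold(probs, gt, candidates=None):
    """Grid-search the threshold maximizing Dice on (training-set) predictions."""
    probs = np.asarray(probs, dtype=float).ravel()
    gt = np.asarray(gt, dtype=bool).ravel()
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)  # assumed search grid
    scores = [dice_at_threshold(probs, gt, t) for t in candidates]
    best = int(np.argmax(scores))  # first threshold reaching the maximum
    return float(candidates[best]), scores[best]


# Toy training-set probabilities and labels; in practice, all DRIVE training pixels.
train_probs = np.array([0.9, 0.8, 0.3, 0.2, 0.6])
train_gt = np.array([1, 1, 0, 0, 0])
t_star, d = best_dice_threshold(train_probs, train_gt)
# The same threshold is reused, unchanged, on another dataset's predictions.
test_mask = np.array([0.7, 0.5, 0.95]) >= t_star
```

Fixing the threshold on the training set keeps the external evaluation honest: no information from the test datasets leaks into the binarization step.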
Figure 4: The proposed domain adaptation strategy recovers some missing vessels. Segmentations produced by a model trained on DRIVE (which contains macula-centered images) when applied to data from CHASE-DB and LES-AV (which contain OD-centered images). In (a,b): the retinal image (left), the segmentation by the model trained on DRIVE (center), and the segmentation produced by the model trained with pseudo-labels (right).
Performance comparison for the artery/vein segmentation task.
| # Params | DRIVE | HRF | LES-AV | ||||
|---|---|---|---|---|---|---|---|
| DICE | MCC | DICE | MCC | DICE | MCC | ||
| Galdran et al.[ | 96.31 | | 74.79 | 25.07 | – | – | |||
| Hemelings et al.[ | 77.57 | 24.67 | 96.88 | – | – | |||
| 96.69 | 95.55 | 76.19 | 96.46 | 70.30 | ||||
For DRIVE, performance is reported as entire image domain | ring-shaped region around the optic disc[61]. Performance is computed using the predictions and code provided by the authors of[61]. Predictions on LES-AV are generated by models trained on DRIVE.
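Restricting evaluation to a ring-shaped region around the optic disc amounts to masking pixels by their distance to the disc center before computing a metric. In the sketch below, the disc center and the inner and outer radii are hypothetical inputs; the exact region definition used in[61] is not given here.

```python
import numpy as np


def ring_mask(shape, center, r_inner, r_outer):
    """Boolean mask of pixels whose distance to `center` (row, col) lies in [r_inner, r_outer)."""
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    dist = np.hypot(rows - center[0], cols - center[1])
    return (dist >= r_inner) & (dist < r_outer)


def dice_in_region(pred, gt, region):
    """Dice = 2|P & G| / (|P| + |G|), counting only pixels where `region` is True."""
    p = np.asarray(pred, dtype=bool)[region]
    g = np.asarray(gt, dtype=bool)[region]
    denom = p.sum() + g.sum()
    return float(2 * np.sum(p & g) / denom) if denom > 0 else 1.0


# Hypothetical 5x5 example: disc center at (2, 2), ring between radii 1 and 2.
ring = ring_mask((5, 5), center=(2, 2), r_inner=1, r_outer=2)
pred = np.ones((5, 5), dtype=bool)
print(int(ring.sum()), dice_in_region(pred, ring, ring))  # 8 1.0
```

The same prediction scores perfectly inside the ring but poorly over the whole image, which is why results on the two regions are not directly comparable.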
Figure 5: Generalization ability of a W-Net trained for A/V segmentation. Results of our model trained on DRIVE and tested on (a) DRIVE, (b) LES-AV.
Figure 6: OCTA vessel segmentation. (a,b): SVC images, (c,d): DVC images, (e,f): SVC+DVC images, (h,i): ROSE-2 images. The second row shows predicted probabilities and the third row the corresponding manual ground truths. Each pair shows representative best- and worst-case segmentations in the corresponding test set.
Performance comparison for OCTA vessel segmentation on ROSE-1 (SVC and SVC+DVC).
| Method | SVC AUC | SVC Dice | SVC ACC | SVC G-mean | SVC Kappa | SVC FDR | SVC+DVC AUC | SVC+DVC Dice | SVC+DVC ACC | SVC+DVC G-mean | SVC+DVC Kappa | SVC+DVC FDR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IPAC[ | 84.20 | 57.51 | 82.45 | 75.17 | 46.64 | 48.16 | 79.41 | 52.23 | 80.07 | 70.54 | 39.82 | 52.11 | |
| COSFIRE[ | 92.86 | 75.17 | 92.27 | 78.83 | 70.89 | 88.00 | 66.71 | 89.81 | 72.56 | 61.25 | |||
| CE-Net[ | 92.92 | 75.11 | 91.21 | 82.56 | 69.78 | 19.95 | 91.55 | 73.00 | 89.90 | 82.03 | 66.78 | 24.79 | |
| CS-Net[ | 93.92 | 76.08 | 91.52 | 83.04 | 70.93 | 18.83 | 93.11 | 74.88 | 90.73 | 82.63 | 69.19 | 21.37 | |
| COOF[ | 86.89 | 66.06 | 85.30 | 81.61 | 56.84 | 41.21 | 82.17 | 56.85 | 77.62 | 77.42 | 43.06 | 54.65 | |
| OCTA-Net[ | 94.53 | 76.97 | 91.82 | 83.61 | 72.01 | 17.75 | 93.75 | 75.76 | 90.99 | 83.38 | 70.22 | 20.49 | |
| 18.20 | 18.25 | ||||||||||||
Best results are marked in bold.
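The OCTA tables add ACC, G-mean, Kappa and FDR. A minimal sketch using the standard definitions from binary confusion-matrix counts follows: G-mean as the geometric mean of sensitivity and specificity, Cohen's kappa, and the false discovery rate FP / (TP + FP). That the cited OCTA benchmark uses exactly these definitions is an assumption; the tables report the values multiplied by 100, and a lower FDR is better.

```python
import math


def octa_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """ACC, G-mean, Cohen's kappa and FDR from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    # Chance agreement: probability that prediction and label agree by accident.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0  # false discovery rate; lower is better
    return {"acc": acc, "g_mean": g_mean, "kappa": kappa, "fdr": fdr}


m = octa_metrics(tp=40, fp=10, fn=10, tn=40)
print({k: round(v, 3) for k, v in m.items()})  # {'acc': 0.8, 'g_mean': 0.8, 'kappa': 0.6, 'fdr': 0.2}
```

Kappa discounts the agreement expected by chance, so on OCTA images where vessels cover a small fraction of pixels it is far less forgiving than ACC.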
Performance comparison for OCTA vessel segmentation on ROSE-1 (DVC) and ROSE-2.
| Method | DVC AUC | DVC Dice | DVC ACC | DVC G-mean | DVC Kappa | DVC FDR | ROSE-2 AUC | ROSE-2 Dice | ROSE-2 ACC | ROSE-2 G-mean | ROSE-2 Kappa | ROSE-2 FDR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IPAC[ | 75.63 | 9.11 | 75.22 | 76.84 | 6.36 | 95.10 | 73.70 | 55.15 | 85.92 | 82.07 | 47.58 | 55.90 | |
| COSFIRE[ | 85.20 | 24.05 | 91.30 | 85.23 | 21.99 | 85.16 | 77.87 | 61.42 | 92.12 | 77.42 | 56.99 | 38.91 | |
| CE-Net[ | 95.05 | 57.83 | 98.43 | 85.03 | 57.07 | 51.47 | 84.67 | 70.66 | 93.77 | 82.48 | 67.08 | 29.30 | |
| CS-Net[ | 96.71 | 58.84 | 98.82 | 81.55 | 58.25 | 47.10 | 85.42 | 70.10 | 93.85 | 82.35 | 66.58 | 30.25 |
| COOF[ | 81.62 | 10.03 | 66.78 | 78.47 | 6.49 | 94.65 | 74.42 | 61.12 | 89.45 | 81.17 | 54.98 | 46.20 | |
| OCTA-Net[ | 96.73 | 70.74 | 99.09 | 70.28 | 34.92 | 86.03 | 93.86 | 30.19 | |||||
| 83.87 | 69.70 | 80.78 | 66.46 | ||||||||||
Best results are marked in bold.
Performance comparison between a W-Net and a U-Net configured to have a comparable number of weights.
| Model | # Params | DRIVE AUC | DRIVE Dice | CHASE-DB AUC | CHASE-DB Dice | HRF AUC | HRF Dice |
|---|---|---|---|---|---|---|---|
| U-Net | 76,213 | 98.00 | 82.53 | 98.29 | 81.09 | 98.15 | 80.73 |
| W-Net | 68,482 | | | | | | |
| p-value | | p<0.05 | p<0.05 | p<0.05 | p<0.05 | p<0.05 | p<0.05 |
W-Net achieves higher performance despite having slightly fewer parameters. Statistically significant results are marked in bold.
Parameters and memory requirements vs performance for several retinal vessel segmentation models.
| Model | # Params | Size | DRIVE AUC | DRIVE Dice | CHASE-DB AUC | CHASE-DB Dice | HRF AUC | HRF Dice |
|---|---|---|---|---|---|---|---|---|
| DRIU[ | 15M | 57MB | n/a | 82.20 | n/a | n/a | n/a | n/a |
| M2U-Net[ | 0.5 M | 550 kb | 97.14 | 80.91 | 97.03 | 80.06 | n/a | 78.14 |
| Little U-Net | 34 K | 161 kb | 97.98 | 82.41 | 98.22 | 80.68 | 98.11 | 80.59 |
| Little W-Net | 68 K | 325 kb | 98.09 | 82.82 | 98.44 | 81.55 | 98.24 | 81.04 |