Muhammad Arsalan, Muhammad Owais, Tahir Mahmood, Se Woon Cho, Kang Ryoung Park.
Abstract
Automatic segmentation of retinal images is an important task in computer-assisted medical image analysis for the diagnosis of diseases such as hypertension, diabetic and hypertensive retinopathy, and arteriosclerosis. Among these diseases, diabetic retinopathy, which is the leading cause of vision detachment, can be diagnosed early through the detection of retinal vessels. Manual detection of these vessels is a time-consuming process that can be automated with deep learning. Vessel detection is difficult because of intensity variation and noise from non-ideal imaging. Although deep learning approaches for vessel segmentation exist, they require many trainable parameters, which increases network complexity. To address these issues, this paper presents a dual-residual-stream-based vessel segmentation network (Vess-Net), which is not as deep as conventional semantic segmentation networks but provides good segmentation with few trainable parameters and layers. The method takes advantage of artificial intelligence for semantic segmentation to aid the diagnosis of retinopathy. To evaluate the proposed Vess-Net method, experiments were conducted with three publicly available datasets for vessel segmentation: digital retinal images for vessel extraction (DRIVE), the Child Heart Health Study in England (CHASE-DB1), and structured analysis of retina (STARE). Experimental results show that Vess-Net achieved superior performance for all datasets, with sensitivity (Se), specificity (Sp), area under the curve (AUC), and accuracy (Acc) of 80.22%, 98.1%, 98.2%, and 96.55% for DRIVE; 82.06%, 98.41%, 98.0%, and 97.26% for CHASE-DB1; and 85.26%, 97.91%, 98.83%, and 96.97% for STARE.
Keywords: Vess-Net; diabetic retinopathy; retinal vessels; vessel segmentation
Year: 2019 PMID: 31514466 PMCID: PMC6780110 DOI: 10.3390/jcm8091446
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Comparison between Vess-Net and previous methods for vessel segmentation.
| Type | Methods | Strength | Weakness |
|---|---|---|---|
| Handcrafted local feature-based methods | Vessel segmentation using thresholding | Simple method to approximate vessel pixels | False points detected when vessel pixel values are closer to background |
| | Fuzzy-based segmentation | Performs well with uniform pixel values | Intensive pre-processing is required to intensify blood vessels’ response |
| | Active contours | Better approximation for detection of real boundaries | Iterative and time-consuming processes are required |
| | Vessel tubular properties-based method | Good estimation of vessel-like structures | Limited by pixel discontinuities |
| | Line detection-based method | Removing background helps reduce false skin-like pixels | |
| Learned/deep feature-based methods | Random forest classifier-based method | Lighter method to classify pixels | Various transformations needed before classification to form features |
| | Patch-based CNN | Better classification | Training and testing require long processing time |
| | SVM-based method | Lower training time | Use of pre-processing schemes with several images to produce feature vector |
| | Extreme machine-learning | Machine learning with many discriminative features | Morphology and other conventional approaches are needed to produce discriminative features |
| | Mahalanobis distance classifier | Simple procedure for training | Pre-processing overhead is still required to compute relevant features |
| | U-Net-based CNN for semantic segmentation | U-Net structure preserves the boundaries well | Gray scale pre-processing is required |
| | Multi-scale CNN | Better learning due to multi-receptive fields | Tiny vessels not detected in certain cases |
| | CNN with CRFs | CNN with few layers provides faster segmentation | CRFs are computationally complex |
| | SegNet-inspired method | Encoder and decoder architecture provides a uniform structure of network layers | Use of PCA to prepare data for training |
| | CNN with visual codebook | 10-layer CNN for correlation with ground truth representation | No end-to-end system for training and testing |
| | CNN with quantization and pruning | Pruned convolutions increase the efficiency of the network | Fully connected layers increase the number of trainable parameters |
| | Three-stage CNN-based deep-learning method | Fusion of multi-feature image provides powerful representation | Usage of three CNNs requires more computational power and cost |
| | Modified U-Net with dice loss | Dice loss provides good results with unbalanced classes | Use of PCA to prepare data for training |
| | Deformable U-Net-based method | Deformable networks can adequately accommodate geometric variations of data | Patch-based training and testing is time-consuming |
| | PixelBNN | Pixel CNN is famous for predicting pixels with spatial dimensions | Use of CLAHE for pre-processing |
| | Dense U-Net-based method | Dense block is good for alleviating vanishing gradient problem | Patch-based training and testing is time-consuming |
| | Cross-connected CNN (CcNet) | Cross-connections of layers empower features | Complex architecture with pre-processing |
| | Vess-Net | Robust segmentation with fewer layers | Augmented data necessary to fully train network |
Figure 1. Flowchart of the proposed method.
Figure 2. Vess-Net feature re-use schematic.
Figure 3. Proposed Vess-Net.
Key differences between Vess-Net and previous architectures.
| Method | Other Architectures | Vess-Net |
|---|---|---|
| ResNet | Residual skip path between adjacent layers only | Skip connections between adjacent layers and directly between the encoder and decoder |
| | All variants use 1 × 1 convolution bottleneck layer | 1 × 1 convolution is used only in non-identity residual paths in combination with BN |
| | No index information with max-pooling layers | Index information between max-pooling and max-unpooling layers used to maintain feature size and location |
| | One fully connected layer for classification network | Fully convolutional network (FCN) for semantic segmentation, so fully connected layers not used |
| | Average pooling at the end in place of max-pooling in each block | Max-pooling in each encoder block and max-unpooling in each decoder block |
| IrisDenseNet | Total of 26 (3 × 3) convolutional layers | Total of 16 (3 × 3) convolutional layers |
| | Uses dense connectivity inside each dense block via feature concatenation in just encoder | Uses residual connectivity via element-wise addition |
| | Different number of convolutional layers: first two blocks have two convolutional layers and other blocks have three | Same number of convolutional layers (two layers) in each block of encoder and decoder |
| | No dense connectivity inside decoder | Uses connectivity inside and between the encoder and decoder |
| FRED-Net | Only uses residual connectivity between adjacent layers | Uses residual connectivity between adjacent layers in encoder and decoder and residual connectivity outside encoder and decoder |
| | No outer residual skip paths for direct spatial edge information flow | Inner and outer residual connections for information flow |
| | Total of 6 skip connections inside the encoder and decoder | 10 residual skip connections in encoder and decoder |
| | Only non-identity mapping for residual connections | Non-identity mapping for inner residual connections (Stream 1) and identity mapping for outer residual connections (Stream 2) |
| | Only uses post-activation because ReLU activation is used after element-wise addition | Uses the pre-activation and post-activation as ReLU is used before and after the element-wise addition |
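The table notes that index information is shared between max-pooling and max-unpooling layers to maintain feature size and location. The following is a minimal NumPy sketch of that general idea on a single-channel map, not the authors' implementation; the function names `max_pool_with_indices` and `max_unpool` are hypothetical, and a 2 × 2 window with stride 2 and no padding is assumed.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Max-pool an H x W map with a size x size window and stride = size.

    Returns the pooled map and, for each pooled value, the flat index of
    its position in the original map (assumption: no padding, so a
    trailing odd row/column is dropped).
    """
    h, w = x.shape
    oh, ow = h // size, w // size
    # Group pixels by window: windows[i, j, :] holds the size*size values
    windows = (x[:oh * size, :ow * size]
               .reshape(oh, size, ow, size)
               .transpose(0, 2, 1, 3)
               .reshape(oh, ow, size * size))
    arg = windows.argmax(axis=2)
    pooled = windows.max(axis=2)
    # Convert the window-local argmax into row/column positions in x
    rows = np.arange(oh)[:, None] * size + arg // size
    cols = np.arange(ow)[None, :] * size + arg % size
    return pooled, rows * w + cols

def max_unpool(pooled, indices, out_shape):
    """Place each pooled value back at its recorded location; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
pooled, idx = max_pool_with_indices(x)
restored = max_unpool(pooled, idx, x.shape)
```

Because the indices are stored, the unpooled map has the original spatial size even when that size is odd, and each maximum returns to the pixel it came from.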
Vess-Net encoder with inner and outer residual skip connections and activation size of each layer (ECB, ECon, ORSP, IRSP, and Pool indicate encoder convolutional block, encoder convolution layer, outer residual skip path, internal residual skip path, and pooling layer, respectively). “**” refers to layers that include both batch normalization (BN) and ReLU. “*” refers to layers that include only BN, and *Pool/*Unpool shows that the pooling/unpooling layer is activated prior to the ReLU layer. Outer residual skip paths (ORSP-1 to ORSP-4) are initiated from the encoder to provide spatial information to the decoder. Vess-Net uses both pre- and post-activation. The table is based on the digital retinal images for vessel extraction (DRIVE) dataset, which has a size of 447 × 447 × 3.
| Block | Name/Size | Number of Filters | Output Feature Map Size (Width × Height × Number of Channels) | Number of Trainable Parameters (ECon + BN) |
|---|---|---|---|---|
| ECB-1 | ECon-1_1 **/3 × 3 × 3 | 64 | 447 × 447 × 64 | 1792 + 128 |
| ECB-1 | ECon-1_2 **/3 × 3 × 64 | 64 | 447 × 447 × 64 | 36,928 + 128 |
| | Pool-1/2 × 2 | - | 223 × 223 × 64 | - |
| ECB-2 | ECon-2_1 **/3 × 3 × 64 | 128 | 223 × 223 × 128 | 73,856 + 256 |
| ECB-2 | IRSP-1 */1 × 1 × 64 | 128 | 223 × 223 × 128 | 8320 + 256 |
| ECB-2 | ECon-2_2 */3 × 3 × 128 | 128 | 223 × 223 × 128 | 147,584 + 256 |
| ECB-2 | Add-1 (ECon-2_2 * + IRSP-1 *) | - | 223 × 223 × 128 | - |
| | * Pool-2/2 × 2 | - | 111 × 111 × 128 | - |
| ECB-3 | ECon-3_1 **/3 × 3 × 128 | 256 | 111 × 111 × 256 | 295,168 + 512 |
| ECB-3 | IRSP-2 */1 × 1 × 128 | 256 | 111 × 111 × 256 | 33,024 + 512 |
| ECB-3 | ECon-3_2 */3 × 3 × 256 | 256 | 111 × 111 × 256 | 590,080 + 512 |
| ECB-3 | Add-2 (ECon-3_2 * + IRSP-2 *) | - | 111 × 111 × 256 | - |
| | * Pool-3/2 × 2 | - | 55 × 55 × 256 | - |
| ECB-4 | ECon-4_1 **/3 × 3 × 256 | 512 | 55 × 55 × 512 | 1,180,160 + 1024 |
| ECB-4 | IRSP-3 */1 × 1 × 256 | 512 | 55 × 55 × 512 | 131,584 + 1024 |
| ECB-4 | ECon-4_2 */3 × 3 × 512 | 512 | 55 × 55 × 512 | 2,359,808 + 1024 |
| ECB-4 | Add-3 (ECon-4_2 * + IRSP-3 *) | - | 55 × 55 × 512 | - |
| | * Pool-4/2 × 2 | - | 27 × 27 × 512 | - |
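The parameter counts and feature-map sizes in the encoder table can be reproduced with the standard formulas for a biased convolution (k × k × C_in × C_out weights plus C_out biases) and for batch normalization (one scale and one shift per channel). The sketch below assumes those formulas and 2 × 2 pooling with stride 2 and no padding; the helper names are hypothetical.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch

def bn_params(channels):
    """Learnable scale and shift of a batch-normalization layer."""
    return 2 * channels

def pooled_sizes(size, n_pools):
    """Spatial size after each 2 x 2, stride-2 pooling step (floor division)."""
    sizes = [size]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // 2)
    return sizes

# Spot checks against rows of the encoder table
econ_1_1 = (conv_params(3, 3, 64), bn_params(64))        # ECon-1_1
irsp_1 = (conv_params(1, 64, 128), bn_params(128))       # IRSP-1
econ_4_2 = (conv_params(3, 512, 512), bn_params(512))    # ECon-4_2
sizes = pooled_sizes(447, 4)                             # Pool-1 .. Pool-4
```

With a 447 × 447 input, `sizes` evaluates to `[447, 223, 111, 55, 27]`, matching the widths listed for Pool-1 through Pool-4.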
Vess-Net decoder with inner and outer residual skip connections and activation size of each layer (DCB, DCon, ORSP, IRSP, and Unpool indicate decoder convolutional block, decoder convolution layer, outer residual skip path, internal residual skip path, and unpooling layer, respectively). “**” refers to layers that include both BN and ReLU. “*” refers to layers with only BN, and *Pool/*Unpool shows that the pooling/unpooling layer is activated prior to the ReLU layer. Outer residual skip paths (ORSP-1 to ORSP-4) are initiated from the encoder to provide spatial information to the decoder. Vess-Net uses both pre- and post-activation. The table is based on the DRIVE dataset, which has a size of 447 × 447 × 3.
| Block | Name/Size | Number of Filters | Output Feature Map Size (Width × Height × Number of Channels) | Number of Trainable Parameters (DCon + BN) |
|---|---|---|---|---|
| | Unpool-4 | - | 55 × 55 × 512 | - |
| DCB-4 | DCon-4_2 **/3 × 3 × 512 | 512 | 55 × 55 × 512 | 2,359,808 + 1024 |
| DCB-4 | ORSP-4 from encoder ECon-4_1 ** | - | 55 × 55 × 512 | - |
| DCB-4 | IRSP-4 */1 × 1 × 512 | 256 | 55 × 55 × 256 | 131,328 + 512 |
| DCB-4 | DCon-4_1 */3 × 3 × 512 | 256 | 55 × 55 × 256 | 1,179,904 + 512 |
| DCB-4 | Add-5 (DCon-4_1 * + IRSP-4 *) | - | 55 × 55 × 256 | - |
| | * Unpool-3 | - | 111 × 111 × 256 | - |
| DCB-3 | DCon-3_2 **/3 × 3 × 256 | 256 | 111 × 111 × 256 | 590,080 + 512 |
| DCB-3 | ORSP-3 from encoder ECon-3_1 ** | - | 111 × 111 × 256 | - |
| DCB-3 | IRSP-5 */1 × 1 × 256 | 128 | 111 × 111 × 128 | 32,896 + 256 |
| DCB-3 | DCon-3_1 **/3 × 3 × 256 | 128 | 111 × 111 × 128 | - |
| DCB-3 | Add-7 (DCon-3_1 * + IRSP-5 *) | - | 111 × 111 × 128 | - |
| | * Unpool-2 | - | 223 × 223 × 128 | - |
| DCB-2 | DCon-2_2 **/3 × 3 × 128 | 128 | 223 × 223 × 128 | 147,584 + 256 |
| DCB-2 | ORSP-2 from encoder ECon-2_1 ** | - | 223 × 223 × 128 | - |
| DCB-2 | IRSP-6 */1 × 1 × 128 | 64 | 223 × 223 × 64 | 8256 + 128 |
| DCB-2 | DCon-2_1 **/3 × 3 × 128 | 64 | 223 × 223 × 64 | 73,792 + 128 |
| DCB-2 | Add-9 (DCon-2_1 * + IRSP-6 *) | - | 223 × 223 × 64 | - |
| | * Unpool-1 | - | 447 × 447 × 64 | - |
| DCB-1 | DCon-1_2 **/3 × 3 × 64 | 64 | 447 × 447 × 64 | 36,928 + 128 |
| DCB-1 | ORSP-1 from encoder ECon-1_1 ** | - | 447 × 447 × 64 | - |
| DCB-1 | DCon-1_1 **/3 × 3 × 64 | 2 | 447 × 447 × 2 | 1154 + 4 |
Figure 4. Sample fundus images and ground-truths for DRIVE dataset.
Figure 5. Data augmentation method: (a) Stage 1 augmentation by flipping; (b) Stage 2 augmentation by translation, flip, and resize (recursively); and (c) Stage 3 augmentation by translation, flip, and resize (non-recursively).
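As a rough illustration of the flip and translation steps named in the Figure 5 caption, the sketch below applies the same transform to an image and its ground-truth mask so that the two stay aligned. It is not the authors' pipeline: the helper names, the zero fill for uncovered pixels, and the shift amounts are assumptions, and the resize step is omitted.

```python
import numpy as np

def flip_pair(image, mask, axis):
    """Flip image and mask along the same axis (0 = vertical, 1 = horizontal)."""
    return np.flip(image, axis=axis), np.flip(mask, axis=axis)

def translate_pair(image, mask, dy, dx):
    """Shift image and mask by (dy, dx) pixels; uncovered pixels are zero-filled."""
    def shift(a):
        out = np.zeros_like(a)
        h, w = a.shape[:2]
        # Part of the source that remains visible after the shift
        src = a[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        y0, x0 = max(0, dy), max(0, dx)
        out[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
        return out
    return shift(image), shift(mask)

image = np.arange(12).reshape(3, 4)
mask = image > 5
flipped_image, flipped_mask = flip_pair(image, mask, axis=1)
shifted_image, shifted_mask = translate_pair(image, mask, dy=1, dx=0)
```

Applying each transform to the image and the mask together is the essential point; augmenting only the image would leave the labels misaligned.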
Figure 6. Training accuracy and loss curves for Vess-Net.
Figure 7. Examples of vessel segmentation by Vess-Net for DRIVE dataset: (a) original image; (b) ground-truth mask; (c) predicted mask by Vess-Net; (d) segmented image by Vess-Net (tp is presented in blue, fp in green, and fn in black).
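The colour coding described in panel (d) can be sketched from a predicted mask and a ground-truth mask as follows. This is only an illustration: the function name `error_overlay` is hypothetical, and the white background for true negatives is an assumption, since the caption does not state how the remaining pixels are drawn.

```python
import numpy as np

def error_overlay(pred, gt):
    """RGB map with tp in blue, fp in green, and fn in black."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    # Assumption: true-negative (background) pixels are drawn in white
    out = np.full(pred.shape + (3,), 255, dtype=np.uint8)
    out[pred & gt] = (0, 0, 255)    # true positive: blue
    out[pred & ~gt] = (0, 255, 0)   # false positive: green
    out[~pred & gt] = (0, 0, 0)     # false negative: black
    return out

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
overlay = error_overlay(pred, gt)
```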
Accuracies of Vess-Net and existing methods for DRIVE dataset (unit: %).
| Type | Method | Se | Sp | AUC | Acc |
|---|---|---|---|---|---|
| Handcrafted local feature-based methods | Akram et al. | - | - | 96.32 | 94.69 |
| | Fraz et al. | 74.0 | 98.0 | - | 94.8 |
| | Kar et al. | 75.48 | 97.92 | - | 96.16 |
| | Zhao et al. (without Retinex) | 76.0 | 96.8 | 86.4 | 94.6 |
| | Zhao et al. (with Retinex) | 78.2 | 97.9 | 88.6 | 95.7 |
| | Pandey et al. | 81.06 | 97.61 | 96.50 | 96.23 |
| | Neto et al. | 78.06 | 96.29 | - | 87.18 |
| | Sundaram et al. | 69.0 | 94.0 | - | 93.0 |
| | Zhao et al. | 74.2 | 98.2 | 86.2 | 95.4 |
| | Jiang et al. | 83.75 | 96.94 | - | 95.97 |
| | Rodrigues et al. | 71.65 | 98.01 | - | 94.65 |
| | Sazak et al. | 71.8 | 98.1 | - | 95.9 |
| | Chalakkal et al. | 76.53 | 97.35 | - | 95.42 |
| | Akyas et al. | 74.21 | 98.03 | - | 95.92 |
| Learned/deep feature-based methods | Zhang et al. (without post-processing) | 78.95 | 97.01 | - | 94.63 |
| | Zhang et al. (with post-processing) | 78.61 | 97.12 | - | 94.66 |
| | Tan et al. | 75.37 | 96.94 | - | - |
| | Zhu et al. | 71.40 | 98.68 | - | 96.07 |
| | Wang et al. | 76.48 | 98.17 | - | 95.41 |
| | Tuba et al. | 67.49 | 97.73 | - | 95.38 |
| | Girard et al. | 78.4 | 98.1 | 97.2 | 95.7 |
| | Hu et al. | 77.72 | 97.93 | 97.59 | 95.33 |
| | Fu et al. | 76.03 | - | - | 95.23 |
| | Soomro et al. | 74.6 | 91.7 | 83.1 | 94.6 |
| | Guo et al. | 78.90 | 98.03 | 98.02 | 95.60 |
| | Chudzik et al. | 78.81 | 97.41 | 96.46 | - |
| | Yan et al. | 76.31 | 98.20 | 97.50 | 95.38 |
| | Soomro et al. | 73.9 | 95.6 | 84.4 | 94.8 |
| | Jin et al. | 79.63 | 98.00 | 98.02 | 95.66 |
| | Leopold et al. | 69.63 | 95.73 | 82.68 | 91.06 |
| | Wang et al. | 79.86 | 97.36 | 97.40 | 95.11 |
| | Feng et al. | 76.25 | 98.09 | 96.78 | 95.28 |
| | Vess-Net (this work) | 80.22 | 98.1 | 98.2 | 96.55 |
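For reference, Se, Sp, and Acc are conventionally computed from pixel-wise counts of true/false positives and negatives. The sketch below uses those standard definitions on binary masks; it is not taken from the authors' evaluation code, the function name is hypothetical, and AUC is left out because it requires the network's per-pixel probabilities rather than a thresholded mask.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Sensitivity, specificity, and accuracy of a binary mask against ground truth."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.sum(pred & gt))     # vessel pixels correctly detected
    tn = int(np.sum(~pred & ~gt))   # background pixels correctly rejected
    fp = int(np.sum(pred & ~gt))    # background pixels marked as vessel
    fn = int(np.sum(~pred & gt))    # vessel pixels missed
    return {
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "Acc": (tp + tn) / (tp + tn + fp + fn),
    }

gt = np.array([1, 1, 1, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])
metrics = segmentation_metrics(pred, gt)
```

Because vessel pixels are a small fraction of a fundus image, Acc and Sp are dominated by background pixels, which is why Se is reported alongside them.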
Figure 8. Examples of fundus images from (a) CHASE-DB1 and (b) STARE datasets with corresponding ground truths.
Figure 9. Examples of vessel segmentation by Vess-Net for CHASE-DB1: (a) original image; (b) ground-truth mask; (c) predicted mask by Vess-Net; (d) segmented image by Vess-Net (tp is presented in blue, fp in green, and fn in black).
Figure 10. Examples of vessel segmentation by Vess-Net for STARE dataset: (a) original image; (b) ground-truth mask; (c) predicted mask by Vess-Net; (d) segmented image by Vess-Net (tp is presented in blue, fp in green, and fn in black).
Accuracies of Vess-Net and existing methods for CHASE-DB1 dataset (unit: %).
| Type | Method | Se | Sp | AUC | Acc |
|---|---|---|---|---|---|
| Handcrafted local feature-based methods | Fraz et al. | 72.2 | 74.1 | - | 94.6 |
| | Pandey et al. | 81.06 | 95.30 | 96.33 | 94.94 |
| | Sundaram et al. | 71.0 | 96.0 | - | 95.0 |
| Learned/deep feature-based methods | Zhang et al. (without post-processing) | 77.86 | 96.94 | - | 94.97 |
| | Zhang et al. (with post-processing) | 76.44 | 97.16 | - | 95.02 |
| | Wang et al. | 77.30 | 97.92 | - | 96.03 |
| | Fu et al. | 71.30 | - | - | 94.89 |
| | Yan et al. | 76.41 | 98.06 | 97.76 | 96.07 |
| | Jin et al. | 81.55 | 97.52 | 98.04 | 96.10 |
| | Leopold et al. | 86.18 | 89.61 | 87.90 | 89.36 |
| | Vess-Net (this work) | 82.06 | 98.41 | 98.0 | 97.26 |
Accuracies of Vess-Net and existing methods for STARE dataset (unit: %).
| Type | Method | Se | Sp | AUC | Acc |
|---|---|---|---|---|---|
| Handcrafted local feature-based methods | Akram et al. | - | - | 97.06 | 95.02 |
| | Fraz et al. | 75.54 | 97.6 | - | 95.3 |
| | Kar et al. (normal cases) | 75.77 | 97.88 | - | 97.30 |
| | Kar et al. (abnormal cases) | 75.49 | 96.99 | - | 97.41 |
| | Zhao et al. (without Retinex) | 76.6 | 97.72 | 86.9 | 94.9 |
| | Zhao et al. (with Retinex) | 78.9 | 97.8 | 88.5 | 95.6 |
| | Pandey et al. | 83.19 | 96.23 | 95.47 | 94.44 |
| | Neto et al. | 83.44 | 94.43 | - | 88.94 |
| | Zhao et al. | 78.0 | 97.8 | 87.4 | 95.6 |
| | Jiang et al. | 77.67 | 97.05 | - | 95.79 |
| | Sazak et al. | 73.0 | 97.9 | - | 96.2 |
| Learned/deep feature-based methods | Zhang et al. (without post-processing) | 77.24 | 97.04 | - | 95.13 |
| | Zhang et al. (with post-processing) | 78.82 | 97.29 | - | 95.47 |
| | Wang et al. | 75.23 | 98.85 | - | 96.40 |
| | Hu et al. | 75.43 | 98.14 | 97.51 | 96.32 |
| | Fu et al. | 74.12 | - | - | 95.85 |
| | Soomro et al. | 74.8 | 92.2 | 83.5 | 94.8 |
| | Chudzik et al. | 82.69 | 98.04 | 98.37 | - |
| | Hajabdollahi et al. (CNN) | 78.23 | 97.70 | - | 96.17 |
| | Hajabdollahi et al. (Quantized CNN) | 77.92 | 97.40 | - | 95.87 |
| | Hajabdollahi et al. (Pruned-quantized CNN) | 75.99 | 97.57 | - | 95.81 |
| | Yan et al. | 77.35 | 98.57 | 98.33 | 96.38 |
| | Soomro et al. | 74.8 | 96.2 | 85.5 | 94.7 |
| | Jin et al. | 75.95 | 98.78 | 98.32 | 96.41 |
| | Leopold et al. | 64.33 | 94.72 | 79.52 | 90.45 |
| | Wang et al. | 79.14 | 97.22 | 97.04 | 95.38 |
| | Feng et al. | 77.09 | 98.48 | 97.0 | 96.33 |
| | Vess-Net (this work) | 85.26 | 97.91 | 98.83 | 96.97 |
Accuracies of Vess-Net trained on DRIVE and CHASE-DB1, and tested on STARE dataset (unit: %).
| Method | Se | Sp | AUC | Acc |
|---|---|---|---|---|
| Vess-Net (this work) | 81.13 | 96.21 | 97.4 | 95.11 |
Figure 11. Sample image of vessel segmentation for pixel count: (a) original image; (b) predicted mask by Vess-Net.
Figure 12. Dual-stream feature empowerment by Vess-Net: (a) features before point “P” (before Stream 2); (b) features after point “P” (after Stream 2); and (c) features after point “Q” (after both Streams 1 and 2).