Jiawei Zhang, Yanchun Zhang, Hailong Qiu, Wen Xie, Zeyang Yao, Haiyun Yuan, Qianjun Jia, Tianchen Wang, Yiyu Shi, Meiping Huang, Jian Zhuang, Xiaowei Xu.
Abstract
Retinal vessel segmentation plays an important role in the diagnosis of eye-related diseases and biomarker discovery. Existing works perform multi-scale feature aggregation in an inter-layer manner (inter-layer feature aggregation). However, such an approach fuses features at only a lower scale or a higher scale, which may limit segmentation performance, especially on thin vessels. This observation motivates us to fuse multi-scale features within each layer (intra-layer feature aggregation) to mitigate the problem. In this paper, we therefore propose Pyramid-Net for accurate retinal vessel segmentation, which features intra-layer pyramid-scale aggregation blocks (IPABs). At each layer, an IPAB generates two associated branches at a higher scale and a lower scale, respectively, which operate together with the main branch at the current scale in a pyramid-scale manner. Three further enhancements, pyramid input enhancement, deep pyramid supervision, and pyramid skip connections, are proposed to boost performance. We have evaluated Pyramid-Net on three public retinal fundus photography datasets (DRIVE, STARE, and CHASE-DB1). The experimental results show that Pyramid-Net effectively improves segmentation performance, especially on thin vessels, and outperforms the current state-of-the-art methods on all three datasets. In addition, our method is more efficient than existing methods, with a large reduction in computational cost. We have released the source code at https://github.com/JerRuy/Pyramid-Net.
Keywords: deep learning; feature aggregation; neural network; pyramid scale; retinal vessel segmentation
Year: 2021 PMID: 34950679 PMCID: PMC8688400 DOI: 10.3389/fmed.2021.761050
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1. Examples of challenging thin vessels in retinal vessel segmentation. The retinal fundus image (left) contains numerous thin vessels (1–2 pixels wide) and thick vessels (3 pixels wide or more) (10). Regions of representative thin and thick vessels, together with their corresponding ground truth and predictions (11), are shown on the right. Note that the thick vessels are segmented well, while the thin vessels suffer from substantial misses (indicated by red rectangles).
Figure 2. Illustrations of the network structures of (a) the basic U-Net (23) and (b–e) existing multi-scale feature aggregation methods, which fall into two major categories: input-output level and intra-network level. The input-output level category means that the network takes inputs at multiple scales and the rescaled ground truths supervise the intermediate feature maps. In the intra-network level category, the encoder level, the decoder level, and the cross level indicate multi-scale feature aggregation implemented in the encoder, in the decoder, and across both, respectively.
Figure 3. The network structure of the proposed Pyramid-Net. IPABs (green rectangles) aggregate features at pyramid scales [the current scale (green line), the higher scale (dark green line), and the lower scale (bright green line)], capturing coarse-to-fine context information. Meanwhile, pyramid input enhancement (yellow rectangle), deep pyramid supervision (purple rectangle), and pyramid skip connections (red rectangle) are employed to further improve the overall segmentation. Best viewed in color.
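Deep pyramid supervision, one of the enhancements above, applies the segmentation loss not only at the final output but also to the intermediate results in each decoder layer against rescaled ground truth. Below is a minimal NumPy sketch of that idea; the function names, the 2x average-pool downsampling of the ground truth, and the uniform loss weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def downsample2x(mask):
    """Downsample a binary mask by 2x via 2x2 average pooling + 0.5 threshold.
    (Assumed rescaling scheme; the paper does not spell this detail out here.)"""
    h, w = mask.shape
    pooled = mask[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return (pooled >= 0.5).astype(np.float64)

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a probability map and a binary mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def deep_pyramid_supervision_loss(preds, gt, weights=None):
    """Sum of per-scale BCE losses; preds[0] is the full-scale decoder output,
    each subsequent prediction is 2x smaller."""
    weights = weights or [1.0] * len(preds)
    total, scaled_gt = 0.0, gt.astype(np.float64)
    for w, pred in zip(weights, preds):
        total += w * bce(pred, scaled_gt)
        scaled_gt = downsample2x(scaled_gt)  # supervise the next, coarser scale
    return total
```

With perfect predictions at every scale the loss is near zero; any mismatch at any decoder level contributes to the total, which is what pushes intermediate feature maps toward vessel-like outputs.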
Figure 4. The network structures of (A) ResNet blocks and (B) our intra-layer pyramid-scale aggregation blocks (IPABs). IPABs (marked by green rectangles) aggregate coarse-to-fine features at the current scale as well as the higher and lower scales (pyramid scales). Meanwhile, pyramid input enhancement (marked by yellow rectangles) and deep pyramid supervision (marked by purple rectangles) are employed to fuse the original images at the corresponding scales and to supervise the intermediate results in each decoder layer, respectively.
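The pyramid-scale aggregation inside an IPAB can be sketched at a structural level: the feature map is processed at its own resolution, at a coarser resolution, and at a finer resolution, and the three branches are fused. The NumPy sketch below is only a shape-level illustration under loud assumptions: a 3x3 box filter stands in for the learned convolutions, and plain averaging stands in for the learned fusion.

```python
import numpy as np

def avg_pool2x(x):
    """2x2 average pooling (move to a coarser scale)."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbor 2x upsampling (move to a finer scale)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def smooth3x3(x):
    """Placeholder for a learned 3x3 convolution: box filter with edge padding."""
    p = np.pad(x, 1, mode='edge')
    h, w = x.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def ipab(x):
    """Structural sketch of an IPAB: main branch at the current scale plus
    associated branches at a lower and a higher scale, fused back together."""
    main = smooth3x3(x)                            # current scale
    coarse = upsample2x(smooth3x3(avg_pool2x(x)))  # lower-scale branch
    fine = avg_pool2x(smooth3x3(upsample2x(x)))    # higher-scale branch
    return (main + coarse + fine) / 3.0            # a real block learns this fusion
```

The output keeps the input's spatial size, so the block can replace a plain residual block inside each encoder/decoder layer, which is what makes the aggregation "intra-layer".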
Performance comparison of Pyramid-Net and the state-of-the-art methods on the DRIVE dataset.
| Methods | Se (%) | Sp (%) | Acc (%) | AUC (%) |
|---|---|---|---|---|
| FCN ( | 74.89 | 96.21 | 94.13 | 95.67 |
| U-Net ( | 75.31 | 96.45 | 94.45 | 96.01 |
| DeepVessel ( | 76.12 | 97.68 | 95.23 | 97.52 |
| ( | 76.53 | 98.18 | 95.42 | 97.52 |
| ( | 77.92 | 98.13 | 95.56 | 97.84 |
| ( | 78.44 | 98.07 | 95.67 | 98.19 |
| CE-Net ( | 83.09 | 97.47 | 95.45 | 97.79 |
| BTS-DSN ( | 78.91 | 98.04 | 95.61 | 98.06 |
| ( | 79.16 | 98.11 | 95.70 | 98.10 |
| ( | 79.40 | 98.16 | 95.67 | 97.72 |
| Vessel-Net ( | 80.38 | 98.02 | 95.78 | 98.21 |
| MResU-Net ( | 79.69 | 97.99 | - | 97.99 |
| CTF-Net ( | 78.49 | 98.13 | 95.67 | 97.88 |
| Hybrid-Net ( |  | 97.51 | 95.79 | - |
| HA-Net ( | 79.91 | 98.13 | 95.81 | 98.23 |
| Pyramid-Net | 82.38 |  |  |  |

Bold values indicate state-of-the-art performance.
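The Se/Sp/Acc columns in these tables follow the standard confusion-matrix definitions. For reference, a small NumPy helper (hypothetical naming, not part of the released code) that computes them as percentages:

```python
import numpy as np

def vessel_metrics(pred, gt):
    """Sensitivity (Se), specificity (Sp), and accuracy (Acc) for a binary
    vessel mask, returned as percentages as in the tables above."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # vessel pixels correctly detected
    tn = np.sum(~pred & ~gt)  # background pixels correctly rejected
    fp = np.sum(pred & ~gt)   # background mistaken for vessel
    fn = np.sum(~pred & gt)   # vessel pixels missed
    se = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return se, sp, acc
```

AUC, the fourth column, is computed from the soft probability map over all thresholds rather than a single binary mask, so it is omitted from this sketch.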
Performance comparison of Pyramid-Net and the state-of-the-art methods on the STARE dataset.
| Methods | Se (%) | Sp (%) | Acc (%) | AUC (%) |
|---|---|---|---|---|
| ( | 73.20 | 98.40 | 95.60 | 96.70 |
| ( | 77.91 | 97.58 | 95.54 | 97.48 |
| ( | 76.80 | 97.38 | - | - |
| ( | 75.81 | 98.46 | 96.12 | 98.01 |
| ( | 75.95 | 98.78 | 96.41 | 98.32 |
| Three-stage ( | 77.35 | 98.57 | 96.38 | 98.33 |
| MResU-Net ( | 81.01 | 97.95 | - | 98.16 |
| Hybrid-Net ( | 79.46 | 98.21 | 96.26 | - |
| HA-Net ( | 81.86 | 98.44 | 96.73 | 98.32 |
| Pyramid-Net |  |  |  |  |

Bold values indicate state-of-the-art performance.
Figure 5. Visual comparison of Pyramid-Net and the state-of-the-art methods, including DeepVessel (11) and CE-Net (32), on the DRIVE (Rows 1–2), CHASE-DB1 (Rows 3–4), and STARE (Row 5) datasets. White (TP) and black (TN) pixels indicate correct predictions of vessel and background, respectively, while red (FP) and green (FN) pixels indicate incorrect predictions. The dark yellow rectangle marks the area used to compare segmentation details, and the bright yellow rectangle contains the zoomed view of that area. Best viewed in color.
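The color scheme of Figure 5 is easy to reproduce when inspecting one's own predictions. A minimal NumPy sketch (the function name is ours, not from the released code):

```python
import numpy as np

def error_map(pred, gt):
    """RGB error visualization in the scheme of Figure 5:
    white = TP, black = TN, red = FP, green = FN."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    img = np.zeros(pred.shape + (3,), dtype=np.uint8)
    img[pred & gt] = (255, 255, 255)  # true positive: white
    img[pred & ~gt] = (255, 0, 0)     # false positive: red
    img[~pred & gt] = (0, 255, 0)     # false negative: green
    return img                        # true negatives stay black
```

Missed thin vessels then show up as green filaments, which is exactly the failure mode Figures 1 and 5 highlight.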
Performance comparison on thick and thin vessels of Pyramid-Net on the DRIVE dataset.
| Methods | Acc (all vessels, %) | Acc (thick vessels, %) | Acc (thin vessels, %) |
|---|---|---|---|
| ( | 95.42 | 95.78 | 87.78 |
| CE-Net ( | 95.45 | 95.96 | 86.91 |
| Pyramid-Net |  |  |  |

Bold values indicate state-of-the-art performance.
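Figure 1 defines thin vessels as 1–2 pixels wide and thick vessels as 3 pixels or more. One simple way to realize such a width split for per-category evaluation is morphological: vessel pixels erased by a single 3x3 erosion belong to structures at most 2 pixels wide. The NumPy sketch below is a heuristic illustration of that idea, not necessarily the evaluation protocol used in the paper.

```python
import numpy as np

def _shift_stack(mask):
    """Stack the 9 one-pixel shifts of a binary mask (zero-padded borders)."""
    p = np.pad(mask, 1, mode='constant')
    h, w = mask.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def erode(mask):
    return _shift_stack(mask).all(axis=0)   # survives only if all 3x3 neighbors set

def dilate(mask):
    return _shift_stack(mask).any(axis=0)   # set if any 3x3 neighbor set

def split_thin_thick(mask):
    """Split vessel pixels by width: structures surviving one 3x3 erosion
    (width >= 3 px) are 'thick'; everything erased by it is 'thin'."""
    mask = mask.astype(bool)
    thick = dilate(erode(mask)) & mask  # morphological opening, clipped to mask
    thin = mask & ~thick
    return thin, thick
```

Reporting accuracy separately on the two resulting pixel sets gives a thick/thin breakdown in the spirit of the table above.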
Ablation analysis of Pyramid-Net on the DRIVE dataset.
| Methods | Acc (%) | AUC (%) |
|---|---|---|
| Baseline | 95.45 | 97.79 |
| Baseline + IPABs | 96.07 | 98.09 |
| Baseline + IPABs + pyramid input | 96.10 | 98.15 |
| Baseline + IPABs + pyramid supervision | 96.15 | 98.12 |
| Baseline + IPABs + pyramid skip connection | 96.21 | 98.24 |
| Pyramid-Net |  |  |

Bold values indicate state-of-the-art performance.
Cross-training evaluation on the DRIVE dataset and the STARE dataset.
| Methods | Se (%) | Sp (%) | Acc (%) | AUC (%) |
|---|---|---|---|---|
| ( | 70.14 | 98.02 | 94.44 | 95.68 |
| ( | 65.05 | 99.14 | 94.81 | 97.18 |
| HA-Net ( | 71.40 | 98.79 | 95.30 | 97.58 |
| Pyramid-Net |  |  |  |  |
| ||||
| ( | 73.19 | 98.40 | 95.80 | 96.78 |
| ( | 70.00 | 97.59 | 94.74 | 97.18 |
| HA-Net ( | 81.87 |  | 95.30 | 97.58 |
| Pyramid-Net |  | 98.76 |  |  |

Bold values indicate state-of-the-art performance.
Comparison with existing multi-scale aggregation methods on the DRIVE Dataset.
| Methods | Acc (%) | AUC (%) | FLOPs | p-value |
|---|---|---|---|---|
| U-Net ( | 94.45 | 96.01 | 334.95G | <0.01 |
| DPC ( | 95.56 | 97.65 | 351.33G | <0.01 |
| CB-Net ( | 95.61 | 97.52 | 441.62G | <0.01 |
| DDSC ( | 95.42 | 97.48 | 381.07G | <0.01 |
| U-Net ++ ( | 95.27 | 96.82 | 828.69G | <0.01 |
| CE-Net ( | 95.45 | 97.79 | - | <0.05 |
| Pyramid-Net |  |  | 188.15G | - |

Bold values indicate state-of-the-art performance.
Performance comparison of Pyramid-Net and the state-of-the-art methods on the CHASE-DB1 dataset.
| Methods | Se (%) | Sp (%) | Acc (%) | AUC (%) |
|---|---|---|---|---|
| ( | 76.15 | 95.75 | 94.67 | 96.23 |
| ( | 75.07 | 97.93 | 95.81 | 97.16 |
| ( | 81.94 | 97.39 | 96.30 | - |
| ( | 76.33 | 98.09 | 96.10 | 97.81 |
| ( | 77.56 | 98.20 | 96.34 | 98.15 |
| FCN ( | 76.41 | 98.06 | 96.07 | 97.76 |
| ( | 81.55 | 97.52 | 96.10 | 98.04 |
| ( | 78.88 | 98.01 | 96.27 | 98.40 |
| ( | 80.74 | 98.21 | 96.61 | 98.12 |
| ( | 81.32 | 98.14 | 96.61 | 98.60 |
| Three-stage ( | 76.41 | 98.06 | 96.07 | 97.76 |
| CTF-Net ( | 79.48 |  | 96.48 | 98.47 |
| Hybrid-Net ( | 81.76 | 97.76 | 96.32 | - |
| HA-Net ( |  | 98.13 | 96.70 | 98.70 |
| Pyramid-Net | 81.17 | 98.26 |  |  |

Bold values indicate state-of-the-art performance.