Jinkang Wang, Xiaohui He, Faming Shao, Guanlin Lu, Ruizhe Hu, Qunyan Jiang.
Abstract
With the exploration and development of marine resources, deep learning is increasingly used in underwater image processing. However, the quality of raw underwater images is so low that traditional semantic segmentation methods produce poor results: blurred target edges, insufficient segmentation accuracy, and poorly segmented region boundaries. To solve these problems, this paper proposes a semantic segmentation method for underwater images. First, image enhancement based on multi-spatial transformation improves the quality of the original images, a step that is uncommon in other advanced semantic segmentation methods. Then, densely connected hybrid atrous convolution effectively expands the receptive field and slows the loss of feature-map resolution. Next, a cascaded atrous convolutional spatial pyramid pooling module integrates boundary features at different scales to enrich target details. Finally, a context information aggregation decoder fuses features from the shallow and deep layers of the network to extract rich contextual information, greatly reducing information loss. The proposed method was evaluated on RUIE, HabCam UID, and UIEBD. Compared with state-of-the-art semantic segmentation algorithms, it is subjectively superior in segmentation completeness, location accuracy, boundary clarity, and detail. On objective metrics, it achieves the highest MIOU (68.3) and OA (79.4) with low resource consumption. Ablation experiments further verify the effectiveness of each component.
Year: 2022 PMID: 36006956 PMCID: PMC9409518 DOI: 10.1371/journal.pone.0272666
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. The original underwater images with low image quality.
Comparison of semantic image segmentation methods.
| Algorithm | Brief methodology | Highlights | Limitations |
|---|---|---|---|
| FCN | Upsampling, skip layers | The skip-layer method considers both global semantic information and local location information. | A series of pooling operations reduces feature-map resolution and loses the spatial position information of pixels. |
| Deeplab V3+ | Improved atrous convolution, Improved ASPP | Improved atrous convolution and improved ASPP effectively expand the receptive field, capture context information, and improve the spatial accuracy of segmentation results. | The dense network structure makes segmentation slow, and small objects are segmented poorly. |
| Dilation10 | Atrous convolution, Feature fusion | Properly abandoning the pooling layer can effectively slow down the reduction of feature map resolution while increasing the receptive field. | The continuity of local pixel information is interrupted, and the adaptability to unknown deformation is poor. |
| SegNet | Encoder-decoder mechanism | Up-pooling restores the spatial position information of pixels and improves the segmentation resolution. | Too many network training parameters lead to high computational costs. |
| RefineNet | Multi-path optimization, Refining module | The multi-path optimization network can effectively obtain the context information of the image, and improve the utilization of local and global features of the image. | The boundary information of the segmentation target will be partially lost. |
| DFANet | Deep feature aggregation, Lightweight backbones | Aggregates high-level context into the encoded features and strikes an effective balance between segmentation speed and accuracy. | Insufficient ability to capture image location information. |
| DANet | Dual attention | The self-attention mechanism is used to integrate the local features of the image and capture the context dependence. | The relationship between each pixel needs to be considered, so the amount of calculation is large. |
| APCNet | Global-guided local affinity, Adaptive context module | Adaptive construction of multi-scale context representation, combined with local and global representation information to estimate robust weights, greatly improves the semantic segmentation effect. | The network structure is complex, the number of parameters is large, and the calculation cost is high. |
| CANet | Dense Comparison Module, Iterative Optimization Module | Multi-level feature representations from the CNN are used for dense feature comparison, and an attention mechanism fuses information from different support examples. | Insufficient ability to capture image location information. |
Fig 2. The pipeline of the proposed method.
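The pipeline in Fig 2 chains the four components named in the abstract: enhancement, a DCHAC encoder, CASPP, and the CIA decoder. As a structural sketch only, with every submodule a placeholder (the internal designs are the authors', not shown here), the composition might look like:

```python
import torch.nn as nn

class UnderwaterSegNet(nn.Module):
    """Structural sketch of the Fig 2 pipeline; all submodules are placeholders."""
    def __init__(self, enhancer, encoder, caspp, cia_decoder):
        super().__init__()
        self.enhancer = enhancer    # multi-spatial-transform image enhancement
        self.encoder = encoder      # backbone with densely connected hybrid atrous conv
        self.caspp = caspp          # cascaded atrous spatial pyramid pooling
        self.decoder = cia_decoder  # context information aggregation decoder

    def forward(self, x):
        x = self.enhancer(x)                   # raise the raw underwater image quality
        shallow, deep = self.encoder(x)        # keep a shallow feature map for the decoder
        context = self.caspp(deep)             # multi-scale context / boundary features
        return self.decoder(shallow, context)  # fuse shallow and deep information
```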
Fig 3. The flowchart of the proposed underwater image enhancement algorithm.
Fig 4. Comparison of underwater image enhancement effects.
The top row shows the original underwater images; the bottom row shows the enhanced images.
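The paper's enhancement is based on multi-spatial transformation (Fig 3), whose exact steps are not reproduced in this record. Purely as an illustration of the kind of preprocessing involved, and explicitly not the authors' algorithm, a gray-world white balance followed by CLAHE:

```python
import cv2
import numpy as np

def enhance_underwater(bgr: np.ndarray) -> np.ndarray:
    """Illustrative underwater enhancement (stand-in for the paper's method)."""
    # Gray-world white balance: scale each channel toward the global mean.
    img = bgr.astype(np.float32)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    img *= channel_means.mean() / (channel_means + 1e-6)
    img = np.clip(img, 0, 255).astype(np.uint8)

    # Contrast-limited adaptive histogram equalization on the luminance channel.
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```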
Fig 5. Covering effect of three-layer atrous convolution with equal atrous rates.
Fig 6. Covering effect of hybrid atrous convolution with unequal atrous rates.
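Figs 5 and 6 contrast the input footprints of stacked atrous convolutions: equal rates leave periodic gaps (the gridding effect), while hybrid rates cover the receptive field densely. This can be reproduced numerically; the rates (2, 2, 2) and (1, 2, 5) below are illustrative choices, not necessarily the paper's:

```python
import numpy as np

def coverage(rates, k=3):
    """1-D footprint of stacked k-tap dilated convolutions with the given rates.

    Returns a binary mask over the input offsets reachable from one output pixel.
    """
    taps = np.array([0])
    for r in rates:
        # Each existing tap fans out to k taps spaced r apart.
        offsets = (np.arange(k) - k // 2) * r
        taps = np.unique((taps[:, None] + offsets[None, :]).ravel())
    mask = np.zeros(taps.max() - taps.min() + 1, dtype=int)
    mask[taps - taps.min()] = 1
    return mask

print(coverage([2, 2, 2]))  # gaps between ones: the gridding effect of equal rates
print(coverage([1, 2, 5]))  # hybrid rates cover the whole field densely
```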
Fig 7. The structure of the CASPP module.
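Fig 7 defines the exact CASPP topology, which the text alone does not fully specify. A hedged PyTorch sketch, assuming each atrous branch feeds the next (the cascade) and that every branch output is kept and concatenated for fusion; channel counts and rates are our assumptions:

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3, rate=1):
    pad = 0 if k == 1 else rate  # keeps spatial size for 3x3 dilated convs
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=rate, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class CASPP(nn.Module):
    """Hedged sketch of cascaded atrous spatial pyramid pooling (see Fig 7)."""
    def __init__(self, in_ch, out_ch, rates=(2, 4, 8, 16)):
        super().__init__()
        self.proj = conv_bn_relu(in_ch, out_ch, k=1)
        self.branches = nn.ModuleList(
            conv_bn_relu(out_ch, out_ch, k=3, rate=r) for r in rates
        )
        self.fuse = conv_bn_relu(out_ch * (len(rates) + 1), out_ch, k=1)

    def forward(self, x):
        y = self.proj(x)
        outs = [y]
        for branch in self.branches:
            y = branch(y)   # cascade: the output of one rate feeds the next
            outs.append(y)  # keep every scale for multi-scale fusion
        return self.fuse(torch.cat(outs, dim=1))
```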
Fig 8. The structure of the context information aggregation decoder.
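Fig 8 gives the decoder's actual structure. As a hedged sketch of the general pattern the abstract describes, similar in spirit to the Deeplab V3+ decoder: upsample the deep (context) features, channel-reduce the shallow features, fuse, and classify per pixel. Layer widths here are our assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CIADecoder(nn.Module):
    """Hedged sketch of a context-information-aggregation decoder (see Fig 8)."""
    def __init__(self, shallow_ch, deep_ch, num_classes, mid_ch=256):
        super().__init__()
        self.reduce = nn.Sequential(  # compress the shallow, high-resolution feature
            nn.Conv2d(shallow_ch, 48, 1, bias=False),
            nn.BatchNorm2d(48),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Sequential(    # fuse shallow detail with deep context
            nn.Conv2d(deep_ch + 48, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, num_classes, 1),
        )

    def forward(self, shallow, deep):
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([self.reduce(shallow), deep], dim=1))
```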
Summary of underwater datasets.
| Dataset | Quantity | Labeled quantity | Resolution | Scene content | Source |
|---|---|---|---|---|---|
| RUIE | 4000 | 500 | 400×300 | Scallop, kelp, sea urchin | Dalian University of Technology |
| HabCam UID | 10465 | 2300 | 2720×1024 | Fish, scallop | CVPR AAMVEM workshop |
| UIEBD | 950 | 200 | Multiple resolutions | Diver, statue, marine life | City University of Hong Kong |
Fig 9. The color representation of labeled categories.
Fig 10. Qualitative comparisons on color-cast underwater images.
From left to right are the original images, the enhanced images, and the results generated by Deeplab V3+, DFANet, APCNet, the method proposed by Liu et al., our method, and the ground truth.
Fig 11. Qualitative comparisons on clear underwater images.
From left to right are the original images, the enhanced images, and the results generated by Deeplab V3+, DFANet, APCNet, the method proposed by Liu et al., our method, and the ground truth.
Comparison of IOU and F1 values between our method and the other four state-of-the-art methods on the underwater dataset.
The best results are marked in bold.
| Target category | Deeplab V3+ | | DFANet | | APCNet | | Liu et al. | | Our method | |
|---|---|---|---|---|---|---|---|---|---|---|
| | IOU | F1 | IOU | F1 | IOU | F1 | IOU | F1 | IOU | F1 |
| Fish | 63.4 | 75.2 | 60.2 | 77.2 | 66.1 | 76.3 | 64.4 | 76.9 | | |
| Diver | 63.2 | 73.7 | 61.4 | 75.8 | 63.0 | 75.2 | 63.1 | 76.2 | | |
| Coral | 61.1 | 74.1 | | 75.4 | 63.5 | 76.4 | 63.8 | | 65.6 | 76.3 |
| Rock | 59.8 | 80.2 | 59.1 | 80.5 | 61.4 | 80.2 | 61.2 | 80.1 | | |
| Sculpture | 62.4 | 74.5 | 62.1 | 76.1 | 64.2 | | 63.0 | 77.8 | | 77.4 |
| Octopus | 64.6 | 72.3 | 61.8 | 75.1 | 63.6 | 75.2 | 66.3 | 74.9 | | |
| Turtle | | 75.9 | 60.3 | 77.7 | 63.7 | 77.1 | 65.8 | 77.8 | 66.7 | |
| Seaweed | 60.3 | 79.1 | 65.7 | 79.3 | 66.3 | 79.6 | 67.1 | | | 79.4 |
| Manta Ray | 62.8 | 78.7 | 62.9 | 79.4 | 67.5 | 79.8 | | 80.1 | 67.8 | |
| Starfish | 61.1 | 79.4 | 61.7 | 80.4 | 67.0 | 80.5 | 68.4 | 80.4 | | |
| Shell | 60.2 | 75.1 | 60.9 | 75.7 | 64.1 | 76.3 | 62.6 | 76.5 | | |
| Sea urchin | 60.7 | 79.5 | 64.5 | 80.6 | 64.9 | 80.7 | 63.4 | 80.7 | | |
Comparison of MIOU and OA values between our method and the other four state-of-the-art methods on the underwater dataset.
The best results are in bold.
| Method | Deeplab V3+ | DFANet | APCNet | Liu et al. | Our method |
|---|---|---|---|---|---|
| MIOU | 61.3 | 63.2 | 65.5 | 64.9 | **68.3** |
| OA | 77.5 | 78.0 | 78.6 | 78.5 | **79.4** |
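The scores in the two tables above follow the standard confusion-matrix definitions: per-class IOU and F1, MIOU as the mean of per-class IOU, and OA as overall pixel accuracy. A minimal NumPy sketch (the function name and interface are ours, not the paper's):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Per-class IOU and F1 plus MIOU and OA from integer label maps."""
    mask = gt < num_classes  # ignore out-of-range (e.g., void) labels
    cm = np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)  # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return {"IOU": iou, "F1": f1, "MIOU": iou.mean(), "OA": tp.sum() / cm.sum()}
```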
Comparison of execution speed of different algorithms on the CUOID dataset.
| Method | Input size | Parameters (M) | FLOPs (G) | FPS |
|---|---|---|---|---|
| Deeplab V3+ | 513×513 | 213.1 | ~ | 0.4 |
| DFANet | 512×1024 | 7.8 | 1.7 | 160 |
| APCNet | 512×1024 | ~ | ~ | ~ |
| CANet | 640×360 | 19.7 | 6.2 | 72 |
| Our method | 512×512 | 9.6 | 1.8 | 125 |
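Parameter counts and FPS like those in the table above can be measured directly in PyTorch; FLOPs need a separate tracing tool (e.g., fvcore or ptflops) and are omitted here. A rough sketch, with our own function name and defaults:

```python
import time
import torch

def profile(model, input_size=(1, 3, 512, 512), runs=50, device="cpu"):
    """Return (parameter count in millions, rough forward-pass FPS)."""
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    with torch.no_grad():
        for _ in range(5):  # warm-up iterations
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return params_m, runs / elapsed
```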
The contribution of adding DCHAC, CASPP, the CIA decoder, and UIE to the objective evaluation performance.
The best results are marked in bold.
| Baseline | DCHAC | CASPP | Decoder | CIA Decoder | UIE | MIOU |
|---|---|---|---|---|---|---|
| ✓ | ✓ | | ✓ | | | 60.2 |
| ✓ | ✓ | | | ✓ | | 61.9 |
| ✓ | ✓ | ✓ | ✓ | | | 64.3 |
| ✓ | ✓ | ✓ | | ✓ | | 65.7 |
| ✓ | ✓ | ✓ | | ✓ | ✓ | **68.3** |
Fig 12. Failure examples of the proposed method.