Jiejun Yang, Liejun Wang, Yongming Li.
Abstract
Different feature learning strategies have improved the performance of recent deep neural network-based salient object detection. The multi-scale strategy and the residual learning strategy are two such feature learning strategies. However, some problems remain, such as the inability to effectively utilize multi-scale feature information and the lack of fine object boundaries. To overcome these problems, we propose a feature refined network (FRNet), which includes a novel feature learning strategy that combines the multi-scale and residual learning strategies to generate the final saliency prediction. We introduce spatial and channel 'squeeze and excitation' (scSE) blocks at the side outputs of the backbone, which allow the network to concentrate more on salient regions at various scales. Then, we propose the adaptive feature fusion module (AFFM), which efficiently fuses multi-scale feature information to predict superior saliency maps. Finally, to supervise the network in learning more information about object boundaries, we propose a hybrid loss that contains four fundamental losses and combines the properties of diverse losses. Comprehensive experiments on five datasets demonstrate the effectiveness of FRNet, with competitive results compared to other relevant approaches.
Keywords: attention mechanism; deep learning; salient object detection
Year: 2022 PMID: 35746271 PMCID: PMC9228599 DOI: 10.3390/s22124490
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Illustration of FRNet.
Figure 2. (a) Structure of the DCPP [15] module; (b) details of the ARM [15] module.
Figure 3. Structure of scSE.
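
The scSE block named in Figure 3 can be illustrated with a small NumPy sketch. This is not the authors' implementation: it follows the general "concurrent spatial and channel squeeze and excitation" idea, and several details are assumptions made for illustration, namely the bias-free layers, the placeholder weights `w1`, `w2`, `w_sp`, and the element-wise sum used to merge the two branches.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scse(x, w1, w2, w_sp):
    """Hypothetical scSE block on one feature map x of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the channel-excitation weights,
    w_sp: (C,) is the weight of a 1x1 convolution for the spatial branch.
    Biases are omitted (an assumption made to keep the sketch short).
    """
    # Channel branch (cSE): squeeze the spatial dims, then re-weight channels.
    z = x.mean(axis=(1, 2))                        # (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))      # (C,) channel gates
    x_cse = x * s[:, None, None]

    # Spatial branch (sSE): squeeze channels with a 1x1 conv, re-weight pixels.
    q = sigmoid(np.tensordot(w_sp, x, axes=([0], [0])))  # (H, W) spatial gates
    x_sse = x * q[None, :, :]

    # Merging by addition is an assumption; other merge rules are possible.
    return x_cse + x_sse

# Usage with random placeholder weights (C = 8, reduction ratio r = 2).
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 5, 5))
out = scse(feat,
           rng.normal(size=(4, 8)),
           rng.normal(size=(8, 4)),
           rng.normal(size=8))
print(out.shape)
```

The output keeps the input shape, so a block of this kind can be attached to a backbone side output without changing the dimensions seen by later layers.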
The MAE, Max-F, Sm of 15 methods on 5 datasets. The top two results are in red and blue.
| Method | ECSSD | PASCAL-S | DUT-OMRON | DUT-TEST | HKU-IS |
|---|---|---|---|---|---|
| | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm |
| Amulet (17) | 0.0690/0.9150/0.8840 | 0.1000/0.8280/0.8180 | 0.0980/0.7430/0.7810 | 0.0840/0.7780/0.7960 | 0.0510/0.8970/0.8860 |
| NLDF (17) | 0.0630/0.9030/0.8750 | 0.0980/0.8220/0.8030 | 0.0790/0.7530/0.7500 | 0.0650/0.8160/0.8050 | 0.0480/0.9020/0.8780 |
| PiCANet (18) | 0.0460/0.9310/0.9140 | 0.0770/0.8570/0.8500 | 0.0640/0.8200/0.8080 | 0.0500/0.8630/0.8500 | 0.0440/0.9200/0.9050 |
| DSS (19) | 0.0520/0.9160/0.8820 | 0.0960/0.8360/0.7970 | 0.0740/0.7600/0.7650 | 0.0650/0.8130/0.8120 | 0.0500/0.9000/0.8780 |
| MLMS (19) | 0.0450/0.9280/0.9110 | 0.0740/0.8550/0.8440 | 0.0640/0.7740/0.8090 | 0.0480/0.8520/0.8510 | 0.0390/0.9210/0.9070 |
| CPD (19) | 0.0370/0.9390/0.9180 | 0.0710/0.8610/0.8480 | 0.0560/0.7970/0.8250 | 0.0430/0.8650/0.8580 | 0.0340/0.9250/0.9050 |
| PoolNet (19) | 0.0390/ | 0.0750/ | 0.0560/0.8080/0.8360 | 0.0400/0.8800/0.8710 | 0.0330/0.9320/0.9170 |
| BASNet (19) | 0.0370/0.9420/0.9160 | 0.0760/0.8540/0.8380 | 0.0560/0.8050/0.8360 | 0.0470/0.8600/0.8530 | 0.0320/0.9280/0.9090 |
| GCPA (20) | 0.0350/0.9431/ | 0.0712/0.8632/0.8561 | 0.0560/ | 0.0364/0.8781/0.8715 | 0.0322/0.9284/0.9144 |
| ITSD (20) | 0.0346/0.9393/0.9249 | 0.0712/0.8354/ | 0.0608/0.7916/ | 0.0408/0.8669/ | 0.0307/0.9257/0.9169 |
| MINet (20) | | 0.0780/0.8610/0.8480 | 0.0560/0.8100/0.8220 | 0.0370/ | 0.0330/0.9310/0.9140 |
| F3Net (20) | | | 0.0530/0.7660/0.8380 | 0.0350/0.8400/0.8751 | 0.0280/0.9100/ |
| MPI (21) | 0.0318/0.9415/0.9252 | 0.0690/0.8381/ | 0.0560/0.7798/0.8336 | 0.0348/0.8749/ | |
| RCSB (22) | 0.0335/0.9355/0.9218 | 0.0684/0.8311/0.8597 | | | |
| Baseline (R2Net) | 0.0440/0.9350/0.9150 | 0.0750/0.8280/0.8470 | 0.0610/0.7715/0.8240 | 0.0500/0.8582/0.8610 | 0.0390/0.9210/0.9030 |
| Ours | 0.0273/ | | | | |
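
Two of the metrics reported in the table, MAE and Max-F, can be computed as in the following sketch. It follows the common conventions in salient object detection rather than anything stated in this record: the F-measure weight `beta2 = 0.3` and the 255-step threshold sweep are assumptions, and the function names are illustrative. The structure measure (Sm) is more involved and is not covered here.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth, both in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))

def max_f_measure(pred, gt, beta2=0.3, num_thresholds=255):
    """Maximum F-measure over a sweep of binarization thresholds.

    beta2 = 0.3 is the value conventionally used in this field (an assumption here).
    """
    gt_mask = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        binary = pred > t
        tp = np.logical_and(binary, gt_mask).sum()
        precision = tp / max(binary.sum(), 1)
        recall = tp / max(gt_mask.sum(), 1)
        denom = beta2 * precision + recall
        if denom > 0:
            best = max(best, (1 + beta2) * precision * recall / denom)
    return float(best)

# Toy example: ground truth with the top half salient, and a noisy prediction.
gt = np.zeros((4, 4))
gt[:2] = 1.0
pred = np.clip(gt + np.random.default_rng(0).normal(scale=0.2, size=gt.shape), 0, 1)
print(mae(pred, gt), max_f_measure(pred, gt))
```

Lower MAE is better, while higher Max-F is better, which is why the two columns move in opposite directions for stronger methods.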
Figure 4. Saliency maps generated by Amulet, BASNet, PiCANet, CPD, DSS, F3Net, GCPA, ITSD, MINet, NLDF, PoolNet, MLMS, MPI, RCSB, and R2Net, respectively. FRNet produces the best predictions, especially in object boundary details.
Figure 5. PR curves of FRNet and nine other advanced models on five datasets.
Figure 6. F-measure curves of FRNet and nine other advanced methods on five datasets.
Comparison of different combinations of three components. B denotes the baseline. AFFM denotes adaptive feature fusion module. scSE denotes channel-spatial attention. Loss denotes hybrid loss. The best results are in bold.
| Method | ECSSD | PASCAL-S | DUT-OMRON | DUT-TEST | HKU-IS |
|---|---|---|---|---|---|
| | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm |
| B | 0.0440/0.9350/0.9150 | 0.0750/0.8280/0.8470 | 0.0610/0.7715/0.8240 | 0.0500/0.8582/0.8610 | 0.0390/0.9210/0.9030 |
| B+AFFM | 0.0363/0.9245/0.9093 | 0.0706/0.8269/0.8506 | 0.0498/0.7570/0.8161 | 0.0344/0.8526/0.8695 | 0.0296/0.9136/0.9026 |
| B+Loss | 0.0354/0.9295/0.9110 | 0.0690/0.8319/0.8542 | 0.0502/0.7643/0.8192 | 0.0351/0.8539/0.8666 | 0.0294/0.9168/0.9020 |
| B+scSE | 0.0356/0.9285/0.9103 | 0.0701/0.8292/0.8503 | 0.0512/0.7655/0.8210 | 0.0347/0.8583/0.8701 | 0.0300/0.9151/0.9015 |
| B+scSE+Loss | 0.0327/0.9327/0.9164 | 0.0697/0.8325/0.8528 | 0.0506/0.7710/0.8252 | 0.0345/0.8600/0.8738 | 0.0285/0.9200/0.9068 |
| B+Loss+AFFM | 0.0366/0.9286/0.9063 | 0.0691/0.8303/0.8509 | 0.0521/0.7622/0.8128 | 0.0374/0.8455/0.8564 | 0.0318/0.9107/0.8935 |
| B+scSE+AFFM | 0.0328/0.9357/0.9174 | 0.0681/0.8325/0.8553 | 0.0506/0.7663/0.8221 | 0.0358/0.8545/0.8695 | 0.0288/0.9191/0.9064 |
| B+scSE+Loss+AFFM | | | | | |
Figure 7. Comparison of different combinations with the backbone.
Comparison of different combinations of loss functions. The loss functions are summed without extra parameters. The bold numbers indicate the best results.
| Method | DUT-OMRON | DUT-TEST | HKU-IS |
|---|---|---|---|
| | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm |
| Baseline (BCE) | 0.0610/0.7715/0.8240 | 0.0500/0.8582/0.8610 | 0.0390/0.9210/0.9030 |
| Dice+IOU | 0.0569/0.7554/0.8159 | 0.0418/0.8481/0.8518 | 0.0355/0.9046/0.8980 |
| BCE+Dice | 0.0525/0.7561/0.8156 | | |
| BCE+SSIM | | 0.0373/0.8436/0.8554 | 0.0330/0.9126/0.8913 |
| BCE+SSIM+Dice+IOU | 0.0532/0.7524/0.8079 | 0.0366/0.8385/0.8610 | 0.0334/0.9075/0.8991 |
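
The four terms compared in this table can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the SSIM term uses global image statistics instead of the usual sliding window, the smoothing constant `EPS` and the SSIM constants `c1`, `c2` are conventional placeholder values, and all function names are hypothetical. The terms are summed without weights, as the caption describes.

```python
import numpy as np

EPS = 1e-7  # smoothing constant (assumed value)

def bce_loss(pred, gt):
    p = np.clip(pred, EPS, 1 - EPS)  # avoid log(0)
    return float(-np.mean(gt * np.log(p) + (1 - gt) * np.log(1 - p)))

def dice_loss(pred, gt):
    inter = np.sum(pred * gt)
    return float(1 - (2 * inter + EPS) / (np.sum(pred) + np.sum(gt) + EPS))

def iou_loss(pred, gt):
    inter = np.sum(pred * gt)
    union = np.sum(pred) + np.sum(gt) - inter
    return float(1 - (inter + EPS) / (union + EPS))

def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplification: one global window instead of local sliding windows.
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = np.mean((pred - mu_p) * (gt - mu_g))
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    return float(1 - ssim)

def hybrid_loss(pred, gt):
    # Plain, unweighted sum of the four terms.
    return (bce_loss(pred, gt) + ssim_loss(pred, gt)
            + dice_loss(pred, gt) + iou_loss(pred, gt))

# Toy example: a perfect prediction versus a fully inverted one.
gt = np.zeros((4, 4))
gt[:2] = 1.0
print(hybrid_loss(gt, gt), hybrid_loss(1 - gt, gt))
```

BCE scores each pixel independently, while Dice and IoU score region overlap and SSIM scores local structure, which is one way to understand why combining them could give a different boundary quality than BCE alone.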
Ablation analysis of hybrid loss hyper-parameters. α denotes the ratio of (BCE+SSIM), and β denotes the ratio of (Dice+IOU). The best results are in bold.
| Method | DUT-OMRON | DUT-TEST | HKU-IS |
|---|---|---|---|
| | MAE/Max-F/Sm | MAE/Max-F/Sm | MAE/Max-F/Sm |
| Baseline (BCE) | 0.0610/0.7715/0.8240 | 0.0500/0.8582/0.8610 | 0.0390/0.9210/0.9030 |
| | 0.0523/0.7520/0.8082 | 0.0380/0.8432/0.8518 | 0.0337/0.9080/0.8908 |
| | 0.0541/0.7529/0.8084 | 0.0408/0.8322/0.8470 | 0.0346/0.9081/0.8887 |
| | 0.0538/0.7475/0.8015 | 0.0399/0.8344/0.8430 | 0.0360/0.9048/0.8820 |
| | | | |
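
Going by the caption of this table, α scales the (BCE+SSIM) pair and β scales the (Dice+IOU) pair. One plausible way to write the weighted hybrid loss, offered as an assumption because the record does not give the formula, is:

```latex
\mathcal{L}_{\mathrm{hybrid}}
  = \alpha \left( \mathcal{L}_{\mathrm{BCE}} + \mathcal{L}_{\mathrm{SSIM}} \right)
  + \beta \left( \mathcal{L}_{\mathrm{Dice}} + \mathcal{L}_{\mathrm{IoU}} \right)
```

With α = β = 1, this form reduces to the unweighted sum of the four losses.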