Mengfan Xue, Minghao Chen, Dongliang Peng, Yunfei Guo, Huajie Chen.
Abstract
Attention mechanisms have demonstrated great potential in improving the performance of deep convolutional neural networks (CNNs). However, many existing methods focus on developing channel or spatial attention modules with large numbers of parameters, and such complex attention modules inevitably affect the performance of CNNs. In our experiments embedding the Convolutional Block Attention Module (CBAM) in the light-weight model YOLOv5s, CBAM affects speed and increases model complexity while reducing average precision, whereas Squeeze-and-Excitation (SE), as part of CBAM, has a positive impact on the model. To replace the spatial attention module in CBAM and offer a suitable scheme of channel and spatial attention modules, this paper proposes a Spatio-temporal Sharpening Attention Mechanism (SSAM), which sequentially infers intermediate maps along a channel attention module and a Sharpening Spatial Attention (SSA) module. By introducing a sharpening filter into the spatial attention module, we obtain an SSA module with low complexity. We look for a scheme that combines our SSA module with the SE module or the Efficient Channel Attention (ECA) module and gives the best improvement in models such as YOLOv5s and YOLOv3-tiny. To this end, we perform various replacement experiments and find that the best scheme is to embed channel attention modules in the backbone and neck of the model and to integrate SSAM into the YOLO head. We verify the positive effect of our SSAM on two general object detection datasets, VOC2012 and MS COCO2017: one is used to obtain a suitable scheme and the other to demonstrate the versatility of our method in complex scenes. Experimental results on the two datasets show clear improvements in average precision and detection performance, which demonstrates the usefulness of our SSAM in light-weight YOLO models. Furthermore, visualization results show that our SSAM enhances localization ability.
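To make the idea of a sharpening spatial attention more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes a CBAM-style spatial attention (max and average pooling along the channel axis) whose pooled map is sharpened with a fixed 3 × 3 Laplacian kernel before a sigmoid gate. The way the two pooled maps are merged (a plain average), the zero padding, and every function name are assumptions made for illustration.

```python
import math

# 3x3 Laplacian kernel (4-neighbour form); assumed here as the sharpening operator.
LAPLACE_3X3 = [[0, -1, 0],
               [-1, 4, -1],
               [0, -1, 0]]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def channel_pool(x):
    """Collapse a [C][H][W] feature map into max- and average-pooled [H][W] maps."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    max_map = [[max(x[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    avg_map = [[sum(x[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    return max_map, avg_map


def conv2d_same(m, kernel):
    """2D cross-correlation with zero padding, so the output keeps the input size."""
    h, w, r = len(m), len(m[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += m[ii][jj] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out


def sharpening_spatial_attention(x):
    """Return (attention map, re-weighted features) for a [C][H][W] input."""
    max_map, avg_map = channel_pool(x)
    h, w = len(max_map), len(max_map[0])
    # Assumption: merge the two pooled maps by averaging them.
    pooled = [[0.5 * (max_map[i][j] + avg_map[i][j]) for j in range(w)] for i in range(h)]
    edges = conv2d_same(pooled, LAPLACE_3X3)
    # Classic sharpening: add the Laplacian response back onto the pooled map.
    sharpened = [[pooled[i][j] + edges[i][j] for j in range(w)] for i in range(h)]
    attn = [[sigmoid(sharpened[i][j]) for j in range(w)] for i in range(h)]
    out = [[[x[k][i][j] * attn[i][j] for j in range(w)] for i in range(h)]
           for k in range(len(x))]
    return attn, out
```

In this sketch the only stage that could carry learnable weights is the spatial filter, which is in line with the low-complexity claim in the abstract; for an isolated bright location, the gate is pushed towards 1 at the peak and towards 0 at its direct neighbours.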
Keywords: YOLO; attention mechanism; light-weight model; object detection; sharpening filter
Year: 2021 PMID: 34883953 PMCID: PMC8659721 DOI: 10.3390/s21237949
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Comparison of two different fusion methods of attention mechanisms.
| Description | AP50 | AP50:95 |
|---|---|---|
| YOLOv5s | 59.9% | 35.5% |
| YOLOv5s+SE (left) | 59.3% | 34.9% |
| YOLOv5s+SE (right) | 60.5% | 35.9% |
| YOLOv5s+ECA (left) | 59.4% | 35.1% |
| YOLOv5s+ECA (right) | 61.1% | 36.0% |
| YOLOv5s+CBAM (left) | 58.9% | 34.5% |
| YOLOv5s+CBAM (right) | 59.2% | 34.8% |
Figure 1. Our proposed SSA module.
Figure 2. The ECA module [15].
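As a rough illustration of the ECA module referenced in Figure 2, the sketch below follows the usual description of ECA: global average pooling per channel, a 1D convolution shared across neighbouring channels, and a sigmoid gate that rescales each channel. Nothing is trained here, so the convolution weights are passed in by hand. The function names are hypothetical, and the `gamma=2, b=1` defaults come from the adaptive kernel-size rule in the ECA-Net paper, not from this record.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size: nearest odd integer to (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x, weights):
    """Apply ECA-style channel attention to a [C][H][W] feature map.

    `weights` is the odd-length 1D kernel shared by all channels (no bias).
    Returns (per-channel gates, re-weighted features).
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    r = len(weights) // 2
    # Squeeze: one descriptor per channel via global average pooling.
    desc = [sum(sum(row) for row in x[ch]) / (h * w) for ch in range(c)]
    gates = []
    for ch in range(c):
        # Local cross-channel interaction: 1D convolution with zero padding.
        acc = 0.0
        for d in range(-r, r + 1):
            n = ch + d
            if 0 <= n < c:
                acc += weights[d + r] * desc[n]
        gates.append(1.0 / (1.0 + math.exp(-acc)))
    out = [[[v * gates[ch] for v in row] for row in x[ch]] for ch in range(c)]
    return gates, out
```

Because the kernel is shared across channels, the learnable part of such a layer has only as many weights as the kernel size, which is why ECA is usually far lighter than SE.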
Figure 3. Two different fusion methods.
Comparison of different combination methods of our SSAM (Confidence threshold: 0.001; NMS threshold: 0.6).
| Description | Backbone | Neck | Head | AP50 | AP50:95 |
|---|---|---|---|---|---|
| YOLOv5s | No | No | No | 59.9% | 35.5% |
| YOLOv5s+ECA | No | No | ECA | 60.7% | 35.9% |
| YOLOv5s+SSA | No | No | SSA | 60.4% | 35.7% |
| YOLOv5s+SSAM | No | No | ECA+SSA | 60.7% | 35.7% |
| YOLOv5s+SE | SE | SE | SE | 60.5% | 35.9% |
| YOLOv5s+CBAM | CBAM | CBAM | CBAM | 59.2% | 34.8% |
| YOLOv5s+ECA | ECA | ECA | ECA | 61.1% | 36.0% |
| YOLOv5s+[ECA+SAM] | ECA | ECA | ECA+SAM | 59.5% | 35.1% |
| YOLOv5s+SSAM | SSA+ECA | SSA+ECA | SSA | False | False |
| YOLOv5s+SSAM | SSA+ECA | SSA+ECA | SSA+ECA | False | False |
| YOLOv5s+SSAM | ECA | SSA+ECA | SSA+ECA | 61.4% | 36.1% |
| YOLOv5s+SSAM | ECA | ECA | SSA+ECA | 62.2% | 36.8% |
| YOLOv5s+SSAM | ECA | ECA | ECA+SSA | 62.3% | 37.1% |
| YOLOv5s+SSAM | ECA | ECA | ECA+NSA | 59.6% | 35.0% |
| YOLOv5s+SSAM | ECA | ECA | SSA | 60.6% | 35.9% |
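The caption of the table above fixes a confidence threshold of 0.001 and an NMS threshold of 0.6. As a minimal sketch of how two such thresholds are typically applied at evaluation time, the snippet below implements greedy non-maximum suppression in plain Python. It assumes the NMS threshold is an IoU threshold, handles a single class for brevity, and uses illustrative function names; it is not the evaluation code behind the reported numbers.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_thres=0.001, iou_thres=0.6):
    """Greedy NMS over (box, score) pairs; returns kept detections, best first."""
    # Drop low-confidence boxes, then visit the rest from highest to lowest score.
    candidates = sorted((d for d in detections if d[1] > conf_thres),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept
```

A confidence threshold this low keeps almost every candidate box, which is common when computing AP because the metric integrates over the whole precision-recall curve.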
Figure 4. Two different parts of attention.
Comparison of different operators of edge detection in our SSA module.
| Description | Laplace 3 × 3 | Laplace 5 × 5 | Sobel 3 × 3 | AP50 | AP50:95 |
|---|---|---|---|---|---|
| YOLOv5s | | | | 59.9% | 35.5% |
| YOLOv5s+SSAM | √ | | | 61.4% | 36.4% |
| YOLOv5s+SSAM | | √ | | 62.3% | 37.1% |
| YOLOv5s+SSAM | | | √ | 61.8% | 36.5% |
Comparison of different extraction methods of our SSA module.
| Description | Maxpool | Avgpool | Max and Avgpool | AP50 | AP50:95 |
|---|---|---|---|---|---|
| YOLOv5s | | | | 59.9% | 35.5% |
| YOLOv5s+SSAM | √ | | | 61.3% | 36.3% |
| YOLOv5s+SSAM | | √ | | 60.9% | 36.0% |
| YOLOv5s+SSAM | | | √ | 62.3% | 37.1% |
Comparison of different structures of our SSAM in the YOLOv5s model.
| Description | AP50 | AP75 | AP50:95 | FPS | GFLOPs | Parameters | Weights (MB) |
|---|---|---|---|---|---|---|---|
| YOLOv5s | 55.6% | 39.0% | 36.8% | 455 | 17.0 | 7,276,605 | 14.11 |
| YOLOv5s+SE | 56.0% | 40.1% | 36.8% | 416 | 17.1 | 7,371,325 | 14.30 |
| YOLOv5s+SE+SSA | 56.9% | 39.8% | 36.9% | 416 | 17.1 | 7,371,406 | 14.31 |
| YOLOv5s+ECA | 56.7% | 40.2% | 37.0% | 435 | 17.1 | 7,276,629 | 14.12 |
| YOLOv5s+ECA+SSA | 57.6% | 40.9% | 37.7% | 435 | 17.1 | 7,276,710 | 14.13 |
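The low-complexity claim can be checked directly from the YOLOv5s table above. The short script below copies the AP50:95, FPS, and parameter columns and reports the difference between any two rows; the dictionary layout and the helper name `overhead` are illustrative only.

```python
# (AP50:95 in %, FPS, parameter count) copied from the YOLOv5s table above.
ROWS = {
    "YOLOv5s": (36.8, 455, 7_276_605),
    "YOLOv5s+SE": (36.8, 416, 7_371_325),
    "YOLOv5s+SE+SSA": (36.9, 416, 7_371_406),
    "YOLOv5s+ECA": (37.0, 435, 7_276_629),
    "YOLOv5s+ECA+SSA": (37.7, 435, 7_276_710),
}


def overhead(name, reference="YOLOv5s"):
    """Compare one table row against a reference row."""
    ap, fps, params = ROWS[name]
    ref_ap, ref_fps, ref_params = ROWS[reference]
    return {
        "extra_params": params - ref_params,
        "ap_gain": round(ap - ref_ap, 1),  # percentage points of AP50:95
        "fps_drop": ref_fps - fps,
    }


if __name__ == "__main__":
    for name in ROWS:
        print(name, overhead(name))
```

By this arithmetic, adding SSA costs 81 parameters on top of either SE or ECA, and the full ECA+SSA variant adds 105 parameters to the baseline while leaving the FPS of the ECA-only model unchanged.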
Comparison of different structures of our SSAM in the YOLOv3-tiny model.
| Description | AP50 | AP75 | AP50:95 | FPS | GFLOPs | Parameters | Weights (MB) |
|---|---|---|---|---|---|---|---|
| YOLOv3-tiny | 34.9% | 15.8% | 17.6% | 667 | 13.3 | 8,852,366 | 16.94 |
| YOLOv3-tiny+SE | 35.7% | 16.4% | 18.1% | 588 | 13.4 | 8,969,742 | 17.18 |
| YOLOv3-tiny+SE+SSA | 36.0% | 16.8% | 18.3% | 588 | 13.4 | 8,969,796 | 17.19 |
| YOLOv3-tiny+ECA | 35.6% | 16.4% | 18.0% | 625 | 13.3 | 8,885,155 | 17.01 |
| YOLOv3-tiny+ECA+SSA | 35.8% | 16.5% | 18.2% | 625 | 13.3 | 8,885,209 | 17.02 |
Comparison of different object sizes with our SSAM in the YOLOv5s model.
| Description | APsmall | APmedium | APlarge |
|---|---|---|---|
| YOLOv5s | 21.1% | 41.9% | 45.5% |
| YOLOv5s+SE | 20.9% | 42.1% | 46.5% |
| YOLOv5s+SE+SSA | 21.7% | 42.0% | 46.6% |
| YOLOv5s+ECA | 20.5% | 42.4% | 47.1% |
| YOLOv5s+ECA+SSA | 23.1% | 43.2% | 47.8% |
Comparison of different object sizes with our SSAM in the YOLOv3-tiny model.
| Description | APsmall | APmedium | APlarge |
|---|---|---|---|
| YOLOv3-tiny | 9.6% | 22.2% | 22.1% |
| YOLOv3-tiny+SE | 9.8% | 22.6% | 22.8% |
| YOLOv3-tiny+SE+SSA | 10.4% | 23.0% | 22.6% |
| YOLOv3-tiny+ECA | 9.8% | 22.6% | 22.9% |
| YOLOv3-tiny+ECA+SSA | 10.1% | 22.5% | 23.1% |
Figure 5. Detection results of YOLOv5s model with SSAM: (a) YOLOv5s; (b) YOLOv5s+ECA; (c) YOLOv5s+ECA+SSA.
Figure 6. Visualization 1 of YOLOv5s model with SSAM: (a) YOLOv5s; (b) YOLOv5s+ECA; (c) YOLOv5s+ECA+SSA.
Figure 7. Visualization 2 of YOLOv5s model with SSAM: (a) YOLOv5s; (b) YOLOv5s+ECA; (c) YOLOv5s+ECA+SSA.
Figure 8. Visualization 3 of YOLOv5s model with SSAM: (a) YOLOv5s; (b) YOLOv5s+ECA; (c) YOLOv5s+ECA+SSA.