| Literature DB >> 35891034 |
Yunzhi Zhang, Jiancheng Liang, Qinghua Lu, Lufeng Luo, Wenbo Zhu, Quan Wang, Junmeng Lin.
Abstract
When robots automatically sort and assemble multi-category hardware, existing convolutional neural network visual recognition algorithms suffer from high computing power consumption, low recognition efficiency, and high rates of missed and false detections. This paper proposes a novel, efficient convolutional neural network algorithm for recognizing multi-category aliasing hardware. Building on SSD, the algorithm replaces the VGG16 backbone feature extraction network with ResNet-50 and integrates two attention mechanisms, ECA-Net and an Improved Spatial Attention Block (ISAB), to strengthen its ability to learn and extract target features. The weighted features are then passed to the extra feature layers to build an improved SSD algorithm. Finally, to compare the performance of the new algorithm against existing algorithms, three kinds of hardware of different sizes are combined into an aliasing scene that simulates an industrial site, and comparative experiments are conducted. The experimental results show that the novel algorithm achieves an mAP of 98.20% at 78 FPS, outperforming Faster R-CNN, YOLOv4, YOLOXs, EfficientDet-D1, and the original SSD in comprehensive performance. The proposed algorithm can improve the efficiency of robotic sorting and assembly of multi-category hardware.
Keywords: attention mechanisms; complex aliasing scenes; convolutional neural networks; multi-category hardware
Year: 2022 PMID: 35891034 PMCID: PMC9317917 DOI: 10.3390/s22145358
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Structure of the SSD algorithm.
Figure 2. Improved SSD algorithm architecture.
Figure 3. The structure of the backbone (based on ResNet-50) proposed in this paper.
Figure 4. The structure of the attention mechanism proposed in this paper.
Figure 5. Structure of ECA-Net.
Figure 6. Structure of the Improved Spatial Attention Block.
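ECA-Net applies channel attention by pooling each channel to a single descriptor and running a 1D convolution across channels, so that each channel's gate depends on its neighbors. A minimal NumPy sketch of that flow; the fixed averaging weights stand in for the learned 1D conv kernel, which is an assumption for illustration:

```python
import numpy as np

def eca(x, k=3):
    """Efficient Channel Attention sketch. x has shape (C, H, W).

    Global average pool -> 1D conv of kernel size k across channels
    -> sigmoid gate -> rescale each channel by its gate.
    """
    y = x.mean(axis=(1, 2))            # channel descriptors, shape (C,)
    pad = k // 2
    yp = np.pad(y, pad, mode="edge")   # pad so the 1D conv keeps length C
    w = np.ones(k) / k                 # illustrative fixed weights (learned in practice)
    gate = 1.0 / (1.0 + np.exp(-np.convolve(yp, w, mode="valid")))
    return x * gate[:, None, None]

x = np.random.rand(8, 4, 4)
out = eca(x)
print(out.shape)  # (8, 4, 4)
```

Because the conv runs over the (C,)-length descriptor vector rather than a C x C projection, the parameter count stays tiny, which matches the near-zero FLOPs overhead reported for the ECA variant in the performance table.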
Comparison of SAM and ISAB.
| Step | SAM | ISAB |
|---|---|---|
| 1 | Input | Input |
| 2 | Max Pooling + Avg Pooling | Max Pooling + Avg Pooling |
| 3 | Concat | Transpose |
| 4 | Conv + Sigmoid | Conv + Sigmoid |
| 5 | Output | Output |
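Per the table, ISAB differs from SAM only at the fusion step: the channel-pooled maps are combined via a transpose rather than a concatenation before the conv + sigmoid. A minimal NumPy sketch under that reading; the additive transpose fusion (which requires square feature maps) and the identity stand-in for the learned conv are assumptions:

```python
import numpy as np

def isab(x):
    """Improved Spatial Attention Block sketch. x has shape (C, H, W), H == W.

    Channel-wise max and average pooling give two H x W maps; they are
    fused with a transpose (vs. SAM's concat), then a conv + sigmoid
    produces the spatial weight map that rescales every channel.
    """
    max_map = x.max(axis=0)            # (H, W)
    avg_map = x.mean(axis=0)           # (H, W)
    fused = max_map + avg_map.T        # transpose-based fusion (assumed form)
    gate = 1.0 / (1.0 + np.exp(-fused))  # sigmoid; learned conv omitted here
    return x * gate[None, :, :]

x = np.random.rand(8, 4, 4)
out = isab(x)
print(out.shape)  # (8, 4, 4)
```

Keeping a single H x W map instead of SAM's two concatenated maps halves the conv input, consistent with the small FLOPs difference between the +SAM and ISAB rows of the performance table.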
Figure 7. Dataset augmentation.
Figure 8. The details of the hardware included in the dataset.
Model training hyperparameters.
| Hyperparameter | Value |
|---|---|
| Input size | |
| Learning rate | 0.0005 |
| Weight decay | 0.0005 |
| Batch size | 4 |
| Epochs | 200 |
| Momentum | 0.9 |
| Gamma | 0.9 |
| Optimizer | Adam |
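The gamma row alongside the learning rate suggests a multiplicative learning-rate decay schedule; a small sketch of the hyperparameters under that assumption (the record does not state how often gamma is applied, so the per-epoch form below is illustrative):

```python
# Hyperparameters from the table above; the Input size value is not
# recoverable from this record and is omitted.
config = {
    "learning_rate": 0.0005,
    "weight_decay": 0.0005,
    "batch_size": 4,
    "epochs": 200,
    "momentum": 0.9,
    "gamma": 0.9,
    "optimizer": "Adam",
}

def lr_at_epoch(epoch, base_lr=config["learning_rate"], gamma=config["gamma"]):
    """Assumed exponential decay: lr_e = base_lr * gamma ** e."""
    return base_lr * gamma ** epoch

print(round(lr_at_epoch(0), 6))  # 0.0005
```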
Figure 9. Heatmap of the network learning area before and after improvement. (Brighter areas indicate higher weights assigned by the algorithm.)
Average precision (AP) comparison of different models.
| Methods | AP (Hardware A)/% | AP (Hardware B)/% | AP (Hardware C)/% |
|---|---|---|---|
| Faster R-CNN | 98.34 | 98.54 | 90.79 |
| YOLOv4 | 98.31 | 98.27 | 97.87 |
| YOLOXs | 98.00 | 97.18 | 98.07 |
| EfficientDet-D1 | 98.12 | 97.41 | 99.55 |
| Original SSD | 98.24 | 97.95 | 88.66 |
| SSD+ResNet-50 | 98.52 | 97.87 | 94.31 |
| SSD+ResNet-50+ECA | 98.43 | 96.63 | 96.43 |
| SSD+ResNet-50+ECA+SAM | 98.40 | 97.96 | 97.82 |
| Ours | 98.50 | 98.22 | 97.88 |
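As a sanity check, mAP is the arithmetic mean of the per-class APs; applying that to the per-class values in the "Ours" row above reproduces the 98.20% mAP reported in the performance table:

```python
# Per-class AP values for the "Ours" row (hardware classes A, B, C).
aps = [98.50, 98.22, 97.88]
map_score = sum(aps) / len(aps)
print(round(map_score, 2))  # 98.2
```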
Performance comparison of different models. (FLOPs and GPU memory footprint are measured by torchstat-0.0.7 and NVIDIA-SMI-472.12, respectively.)
| Methods | mAP/% | FPS | FLOPs /GFLOPs | GPU Memory Footprint /GB |
|---|---|---|---|---|
| Faster R-CNN | 95.89 | 18 | 184.99 | 6.6 |
| YOLOv4 | 98.15 | 42 | 29.89 | 6.3 |
| YOLOXs | 97.75 | 62 | 13.32 | 2.0 |
| EfficientDet-D1 | 98.36 | 20 | 11.21 | 3.2 |
| Original SSD | 94.95 | 91 | 30.59 | 3.6 |
| SSD+ResNet-50 | 96.90 | 89 | 15.12 | 2.3 |
| SSD+ResNet-50+ECA | 97.83 | 81 | 15.12 + 0 | 2.4 |
| SSD+ResNet-50+ECA+SAM | 98.06 | 80 | 15.12 + 0.00025 | 2.5 |
| Ours | 98.20 | 78 | 15.12 + 0.00051 | 2.5 |
Figure 10. Actual recognition results of the six algorithms on metal parts. (Yellow dotted ellipses mark the models' misidentified or missed detections.)