Shangwang Liu, Tongbo Cai, Xiufang Tang, Yangyang Zhang, Changgeng Wang.
Abstract
Aiming at recognizing small, blurred, and complex traffic signs in natural scenes, a traffic sign detection method based on RetinaNet-NeXt is proposed. First, to ensure dataset quality, the data were cleaned and augmented to reduce noise. Second, a novel backbone network, ResNeXt, was employed to improve the detection accuracy and efficiency of RetinaNet. Finally, transfer learning and group normalization were adopted to accelerate network training. Experimental results show that, compared with the original RetinaNet, the precision, recall, and mAP of our method are improved by 9.08%, 9.09%, and 7.32%, respectively. Our method can be effectively applied to traffic sign detection.
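The abstract credits group normalization with accelerating training. As a minimal NumPy sketch (not the authors' code, and omitting the learnable scale/shift parameters a full layer would carry), group normalization splits the channels into groups and normalizes each (sample, group) slice independently, so its statistics do not depend on the batch size the way batch normalization's do:

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Group normalization over an (N, C, H, W) feature map.

    Channels are split into `num_groups` groups; mean and variance are
    computed per sample and per group, so the statistics are independent
    of the batch size (unlike batch normalization).
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 4, 4))
y = group_norm(x, num_groups=4)
# Each (sample, group) slice is now approximately zero-mean, unit-variance.
print(np.allclose(y.reshape(2, 4, -1).mean(axis=2), 0.0, atol=1e-6))  # → True
```

This batch-size independence is what makes group normalization attractive for detection, where GPU memory limits often force small batches.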
Keywords: ResNeXt; RetinaNet; group normalization; natural scenes; traffic signs
Year: 2022 PMID: 35052138 PMCID: PMC8774394 DOI: 10.3390/e24010112
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Structure of the RetinaNet-NeXt network.
Figure 2. Backbone network (ResNeXt).
Figure 3. Structure of the feature pyramid network.
Figure 4. Structure of the classification and regression subnets.
Figure 5. Types of traffic signs in the public TT100K dataset, which is the main dataset used in this work: (a) instruction; (b) prohibition; (c) warning.
Number of images in the TT100K dataset.
| Type | Size | Total |
|---|---|---|
| Train | 2048 × 2048 | 7196 |
| Test | 2048 × 2048 | 3071 |
Figure 6. Data augmentation results for a traffic sign image: (a) original image; (b) size cropping; (c) color change.
Figure 7. Training loss curve.
Figure 8. Traffic sign recognition results.
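Figure 6 shows the two augmentations applied: size cropping and color change. The paper does not give the exact parameters, so the following is only an illustrative NumPy sketch, with the crop size and brightness range chosen arbitrarily:

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng):
    """Crop a random (crop_h, crop_w) window from an (H, W, C) image."""
    h, w, _ = img.shape
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def color_change(img, brightness, rng):
    """Scale pixel intensities by a random factor in [1 - b, 1 + b]."""
    factor = rng.uniform(1.0 - brightness, 1.0 + brightness)
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

rng = np.random.default_rng(42)
# A synthetic stand-in for a 2048 × 2048 TT100K image.
image = rng.integers(0, 256, size=(2048, 2048, 3), dtype=np.uint8)
cropped = random_crop(image, 1024, 1024, rng)    # size cropping
changed = color_change(cropped, 0.3, rng)        # color change
print(cropped.shape, changed.dtype)              # → (1024, 1024, 3) uint8
```

In practice the cropped regions must still contain the annotated signs, so the corresponding bounding boxes would need to be shifted and filtered alongside the crop.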
Precision and recall for different anchor sizes.
| IoU | Box size | Precision (%) | Recall (%) |
|---|---|---|---|
| 0.5 | (0, 32] | 83.39 | 73.88 |
| 0.5 | (32, 96] | 90.79 | 86.22 |
| 0.5 | (96, 512] | 90.29 | 71.90 |
| 0.5 | (0, 512] | 87.45 | 79.65 |
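The table above breaks results down by box-size bucket. The paper does not state the bucketing criterion; the sketch below assumes the square root of the box area, as in the COCO small/medium/large split, which uses the same 32- and 96-pixel thresholds:

```python
import math

# Size buckets from the table; using sqrt(area) as the size measure is an
# assumption (COCO-style), not stated in the paper.
BUCKETS = [(0, 32), (32, 96), (96, 512)]

def size_bucket(w, h):
    """Return the "(lo, hi]" bucket label for a box of width w and height h."""
    s = math.sqrt(w * h)
    for lo, hi in BUCKETS:
        if lo < s <= hi:
            return f"({lo}, {hi}]"
    return None

print(size_bucket(20, 20))   # → (0, 32]
print(size_bucket(50, 60))   # → (32, 96]
```

Bucketing the ground-truth and predicted boxes this way is what allows precision and recall to be reported separately for small, medium, and large signs.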
Figure 9. Distribution of ground truth and predictions: (a) distribution of box sizes in the ground truth; (b) distribution of box sizes in the predictions.
Comparison of different detection frameworks.
| Framework | Precision (%) | Recall (%) | F1 (%) | mAP (%) |
|---|---|---|---|---|
| Faster RCNN | 74.32 | 55.08 | 63.26 | 74.02 |
| YOLOv5 | 78.80 | 75.00 | 76.85 | 81.70 |
| RetinaNet | 78.37 | 70.56 | 78.37 | 79.39 |
| RetinaNet-NeXt (Ours) | 87.45 | 79.65 | 83.37 | 86.71 |
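The third numeric column of the frameworks table is consistent with the F1 score, the harmonic mean of precision and recall, for the YOLOv5 and RetinaNet-NeXt rows. A quick check with the standard formula (not taken from the paper's code):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, in percent."""
    return 2 * precision * recall / (precision + recall)

# RetinaNet-NeXt and YOLOv5 rows of the frameworks table:
print(round(f1(87.45, 79.65), 2))  # → 83.37
print(round(f1(78.80, 75.00), 2))  # → 76.85
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two metrics, which is why Faster RCNN's low recall (55.08%) drags its F1 well below its precision.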
Figure 10. Comparison of recognition results of different detection frameworks: (a) Faster RCNN; (b) YOLOv5; (c) RetinaNet; (d) RetinaNet-NeXt.
Comparison of different models.
| Backbone | Precision (%) | Recall (%) | mAP (%) |
|---|---|---|---|
| ResNet50 | 78.37 | 70.56 | 79.39 |
| ResNet101 | 89.95 | 88.22 | 92.02 |
| ResNet152 | 90.54 | 89.29 | 92.80 |
| ResNeXt50 | 87.45 | 79.65 | 86.71 |
Comparison of different models under the effect of anchor size.
| Backbone | Metric | (0, 32] | (32, 96] | (96, 512] | (0, 512] |
|---|---|---|---|---|---|
| ResNet50 | Precision (%) | 68.73 | 85.85 | 88.94 | 78.37 |
| | Recall (%) | 60.22 | 79.83 | 75.81 | 70.56 |
| ResNet101 | Precision (%) | 86.32 | 93.37 | 90.49 | 89.95 |
| | Recall (%) | 85.14 | 92.58 | 77.62 | 88.22 |
| ResNet152 | Precision (%) | 86.85 | 94.09 | 90.58 | 90.54 |
| | Recall (%) | 86.53 | 93.50 | 77.53 | 89.29 |
| ResNeXt50 | Precision (%) | 83.39 | 90.79 | 90.29 | 87.45 |
| | Recall (%) | 73.88 | 86.22 | 71.64 | 79.63 |
Figure 11. PR curves of different models under the effect of anchor size: (a) anchor size (0, 32]; (b) anchor size (32, 96]; (c) anchor size (96, 512]; (d) anchor size (0, 512].