Li Huang1,2, Cheng Chen1, Juntong Yun3,4, Ying Sun3,4,5, Jinrong Tian3,4, Zhiqiang Hao3,4,5, Hui Yu6, Hongjie Ma7.
Abstract
The development of object detection technology makes it possible for robots to interact with people and the environment, but changeable application scenarios result in low detection accuracy for small and medium-sized objects in practical applications. This paper proposes an indoor small target detection method based on multi-scale feature fusion. Indoor images were collected under different angle, lighting, and occlusion conditions, and image enhancement techniques were used to build and augment an indoor-scene data set. In the SSD algorithm, each target detection layer is fused with the features of its adjacent layers. The Faster R-CNN, YOLOv5, SSD, and multi-scale feature-fusion SSD target detection models were then trained on the indoor-scene data set using transfer learning. The experimental results show that multi-scale feature fusion improves the detection accuracy of all kinds of objects, especially objects with a relatively small scale. In addition, although the detection speed of the improved SSD algorithm decreases, it remains faster than Faster R-CNN, achieving a better balance between target detection accuracy and speed.
Keywords: SSD; convolutional neural network; indoor scene; multi-scale feature fusion; small target detection
Year: 2022 PMID: 35663726 PMCID: PMC9160233 DOI: 10.3389/fnbot.2022.881021
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 3.493
Figure 1. SSD network structure.
Prior box size of each feature layer.
| Feature layer | Number of prior boxes | Min size (pixels) | Max size (pixels) |
|---|---|---|---|
| Conv4_3 | 4 | 30 | 60 |
| Fc7(Conv7) | 6 | 60 | 111 |
| Conv8_2 | 6 | 111 | 162 |
| Conv9_2 | 6 | 162 | 213 |
| Conv10_2 | 4 | 213 | 264 |
| Conv11_2 | 4 | 264 | 315 |
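The min/max sizes in the table above match the standard SSD300 linear scale rule. The paper does not print its formula, so the following is an illustrative reconstruction (the function name `ssd300_prior_sizes` is hypothetical) showing how these pixel values are commonly derived:

```python
# Caffe-style SSD300 prior-box size rule: scales are spaced linearly between
# a minimum and maximum ratio of the input resolution, with a smaller
# dedicated scale for the first detection layer (Conv4_3).
def ssd300_prior_sizes(img_size=300, min_ratio=20, max_ratio=90, num_layers=6):
    """Return (min_size, max_size) pixel pairs for each detection layer."""
    step = (max_ratio - min_ratio) // (num_layers - 2)        # 17 for SSD300
    ratios = list(range(min_ratio, max_ratio + 1, step))      # 20, 37, 54, 71, 88
    min_sizes = [img_size * r / 100 for r in ratios]          # 60 .. 264
    max_sizes = [img_size * (r + step) / 100 for r in ratios] # 111 .. 315
    # Conv4_3 uses a smaller dedicated scale (10% of the input size).
    return [(img_size * 10 / 100, min_sizes[0])] + list(zip(min_sizes, max_sizes))

for layer, (mn, mx) in zip(
    ["Conv4_3", "Fc7(Conv7)", "Conv8_2", "Conv9_2", "Conv10_2", "Conv11_2"],
    ssd300_prior_sizes(),
):
    print(f"{layer}: min={mn:.0f}, max={mx:.0f}")
```

Running this reproduces the 30/60, 60/111, ..., 264/315 pairs in the table.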
Figure 2. SSD network structure based on multi-scale feature fusion.
Figure 3. Average pooling and maximum pooling.
Figure 4. Deconvolution up-sampling.
Figure 5. Different methods of feature fusion. (A) Element-wise addition, (B) cascade (concatenation).
Figure 6. Multi-scale feature fusion structure.
Figure 7. Color images from different angles, backgrounds, and illumination.
Figure 8. Partial images of the target detection dataset in a complex indoor scene.
Figure 9. LabelImg annotation interface.
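The fusion of a detection layer with its adjacent feature maps (Figures 3-6) can be sketched in minimal NumPy form. This is an illustrative stand-in, not the authors' implementation: the deeper map is up-sampled to the current layer's resolution (a fixed nearest-neighbor stand-in for learned deconvolution), the shallower map is down-sampled by 2x2 max pooling, and the three maps are fused by element-wise addition as in Figure 5A; a real network would also use 1x1 convolutions to match channel counts, and the toy 40/20/10 resolutions are chosen so the sizes divide evenly:

```python
import numpy as np

def max_pool_2x2(x):
    """Down-sample an (H, W, C) feature map by 2x2 max pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbor 2x up-sampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_adjacent(shallow, current, deep):
    """Fuse a layer with its shallower and deeper neighbors by addition."""
    return max_pool_2x2(shallow) + current + upsample_2x(deep)

shallow = np.ones((40, 40, 8))  # higher-resolution neighbor
current = np.ones((20, 20, 8))  # detection layer being enriched
deep = np.ones((10, 10, 8))     # lower-resolution neighbor
fused = fuse_adjacent(shallow, current, deep)  # shape (20, 20, 8)
```

Element-wise addition keeps the channel count fixed, whereas the cascade (concatenation) variant of Figure 5B would stack the channels instead.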
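The image-enhancement augmentation used to amplify the data set (Figures 7-8) can be sketched as follows; the exact operations the authors used are not specified in this record, so brightness scaling and horizontal flipping here are hypothetical examples of typical enhancements:

```python
import numpy as np

def augment(image, brightness=1.2, flip=True):
    """Return an augmented copy of an (H, W, 3) uint8 image."""
    out = image.astype(np.float32) * brightness   # simulate a lighting change
    out = np.clip(out, 0, 255).astype(np.uint8)   # keep valid pixel range
    if flip:
        out = out[:, ::-1, :]                     # horizontal flip
    return out

img = np.full((4, 4, 3), 100, dtype=np.uint8)
aug = augment(img)  # brightened to 120 and mirrored
```

Each such transform yields a new labeled sample from an existing one, enlarging the indoor-scene data set without additional collection.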
Related parameters of experimental environment.
| Item | Configuration |
|---|---|
| Operating system | Windows10 |
| CPU | AMD Ryzen 7 |
| GPU | NVIDIA GeForce RTX 2070 |
| Cuda with Cudnn | 10.0/7.6.5 |
| Python | 3.6 |
| Tensorflow, Keras | 1.13.2/2.1.5 |
| Opencv | 4.5.1 |
Comparison of detection performance of feature fusion networks at different target detection layers.
| Fused layers | AP (class 1) | AP (class 2) | AP (class 3) | AP (class 4) | AP (class 5) | AP (class 6) | mAP (%) |
|---|---|---|---|---|---|---|---|
| No fusion | 0.9838 | 0.9898 | 0.9658 | 0.9121 | 0.9046 | 0.4715 | 87.13 |
| Conv3_3/Conv4_3/Conv5_3 | 0.9855 | 0.9889 | 0.9638 | 0.9823 | 0.9757 | 0.8115 | 95.13 |
| Conv4_3/Conv7/Conv8_2 | 0.9936 | 0.9910 | 0.9902 | 0.9118 | 0.8953 | 0.4819 | 87.73 |
| Conv7/Conv8_2/Conv9_2 | 0.9822 | 0.9895 | 0.9626 | 0.9195 | 0.8926 | 0.4841 | 87.17 |
| Multi-scale feature fusion | 0.9980 | 0.9990 | 0.9970 | 0.9947 | 0.9841 | 0.8513 | 96.90 |
Figure 10. SSD network training based on multi-scale feature fusion. (A) Loss of training set, (B) loss of validation set.
Test results of different network models on data sets.
| Network model | mAP (%) | FPS |
|---|---|---|
| Faster R-CNN | 98.78 | 12 |
| YOLOv5 | 82.83 | 36 |
| SSD | 87.13 | 26 |
| SSD with multi-scale feature fusion | 96.90 | 19 |
Figure 11. Detection of each algorithm corresponding to different classes in the indoor scene. (A) Faster R-CNN. (B) YOLOv5. (C) SSD. (D) SSD network with multi-scale feature fusion.
Figure 12. Comparison of detection effects of various networks. (A) Faster R-CNN. (B) YOLOv5. (C) SSD. (D) SSD network with multi-scale feature fusion.