Abstract
With the rapid development of Unmanned Aerial Vehicles (UAVs), vehicle detection in aerial images plays an important role in many applications. Compared with general object detection, vehicle detection in aerial images remains a challenging research topic because it is complicated by several unique factors, e.g., varying camera angles, small vehicle sizes, and complex backgrounds. In this paper, a Feature Fusion Deep-Projection Convolution Neural Network (FFDP-CNN) is proposed to enhance the ability to detect small vehicles in aerial images. The backbone of the proposed framework utilizes a novel residual block, named the stepwise res-block, to explore high-level semantic features while conserving low-level detail features at the same time. A specially designed feature fusion module further balances the features obtained from different levels of the backbone, and a deep-projection deconvolution module minimizes the information contamination introduced by the down-sampling/up-sampling processes. The proposed framework has been evaluated on the UCAS-AOD, VEDAI, and DOTA datasets. According to the evaluation results, it outperforms other state-of-the-art vehicle detection algorithms for aerial images.
Year: 2021 PMID: 33961655 PMCID: PMC8104367 DOI: 10.1371/journal.pone.0250782
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
A summary of significant related work and contributions.
| Algorithms | Contribution |
|---|---|
| AVDNet 2019 [ | 1. Introducing ConvRes residual blocks at multiple scales to alleviate the problem of vanishing features for smaller objects caused <br> 2. Proposing a recurrent-feature aware visualization (RFAV) technique to analyze network behavior. <br> 3. A new airborne image dataset (ABD). |
| W. Liu et al. 2019 [ | 1. Concatenating feature maps from layers of different depths. <br> 2. Adopting a feature introducing strategy based on oriented response dilated convolution. |
| A2RMNet 2019 [ | 1. A multi-scale feature gate fusion network composed of gate fusion modules, refine blocks, and region proposal networks. <br> 2. An aspect-ratio attention network leveraged to preserve the aspect ratios of objects. |
| Artacho, B. et al. 2019 [ | 1. A new efficient architecture for semantic segmentation, based on a “Waterfall” Atrous Spatial Pooling architecture, that achieves a considerable accuracy increase while decreasing the number of network parameters and memory footprint. |
| W. Li et al. 2019 [ | 1. Exploring why state-of-the-art detectors fail in highly dense drone scenes. <br> 2. An effective loss. <br> 3. Combining bottom-up cues with top-down attention mechanisms. |
| Rabbi, J. et al. 2020 [ | 1. A new edge-enhanced super-resolution GAN (EESRGAN) applied to improve the quality of remote sensing images. <br> 2. Using different detector networks in an end-to-end manner, where the detector loss is back-propagated into the EESRGAN to improve detection performance. |
Fig 1. Overview of the proposed framework.
Fig 2. Comparison between the stepwise res-block and the original res-block: (a) stepwise res-block, (b) res-block.
Parameter comparison between original res-block and stepwise res-block.
| Block name | Original res-block | Stepwise res-block |
|---|---|---|
| Parameter of | 1×1×m | 1×1×m |
| Parameter of | 3×3×m | 3×3×m× |
| Parameter of | - | 3×3×m× |
| Parameter of | - | 3×3×m× |
| Parameter of | 1×1×m | 1×1×m |
| Total number of parameters | 3×3×m× | 3×3×m× |
| Max convolution number (ignoring | 1 | 3 |
| Average parameter per convolution | 3×3×m× | 3×3×m× |
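The entries in the table follow the standard rule for a bias-free convolution, parameters = k_h × k_w × c_in × c_out. Since the channel multipliers are truncated in this extract, the sketch below uses an illustrative width m = 64 and assumes the stepwise variant simply stacks three 3×3 convolutions between the two 1×1 layers (matching the "max convolution number = 3" row); it is not the paper's exact configuration:

```python
def conv_params(kh, kw, c_in, c_out, bias=False):
    """Parameter count of a kh x kw convolution layer (weights, optional biases)."""
    n = kh * kw * c_in * c_out
    return n + (c_out if bias else 0)

m = 64  # illustrative channel width; not specified in this extract

# Original res-block: 1x1 conv, one 3x3 conv, 1x1 conv.
original = (conv_params(1, 1, m, m)
            + conv_params(3, 3, m, m)
            + conv_params(1, 1, m, m))

# Stepwise res-block (assumed): same 1x1 layers, three stacked 3x3 convolutions.
stepwise = (conv_params(1, 1, m, m)
            + 3 * conv_params(3, 3, m, m)
            + conv_params(1, 1, m, m))

print(original, stepwise)
```

In practice the stepwise branches may use fractional channel widths (the truncated `3×3×m×` entries suggest a multiplier), which would bring the two totals closer together.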
The structure of the backbone.
| Repeat | Type | Filters | Size/Stride | Output |
|---|---|---|---|---|
| | Stepwise res-block | 32 | 3×3/1 | 640×640 |
| | Convolution layer | 32 | 3×3/2 | 320×320 |
| 2× | Stepwise res-block | 64 | 3×3/1 | 320×320 |
| | Convolution layer | 64 | 3×3/2 | 160×160 |
| 2× | Stepwise res-block | 64 | 3×3/1 | 160×160 |
| 8× | Stepwise res-block | 128 | 3×3/1 | 160×160 |
| | Convolution layer | 128 | 3×3/2 | 80×80 |
| 8× | Stepwise res-block | 128 | 3×3/1 | 80×80 |
| 8× | Stepwise res-block | 256 | 3×3/1 | 80×80 |
| | Convolution layer | 256 | 3×3/2 | 40×40 |
| 4× | Stepwise res-block | 256 | 3×3/1 | 40×40 |
| | Convolution layer | 256 | 3×3/2 | 20×20 |
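The stride schedule in the table fixes each stage's spatial resolution: five stride-2 convolutions halve a 640×640 input down to 20×20. A minimal sketch (pure Python, sizes taken directly from the table, assuming "same" padding):

```python
def output_size(size, stride):
    """Spatial size after a stride-s convolution with 'same' padding."""
    return (size + stride - 1) // stride

size = 640
sizes = [size]
for stride in (2, 2, 2, 2, 2):  # the five stride-2 convolution layers above
    size = output_size(size, stride)
    sizes.append(size)

print(sizes)  # resolutions of the successive backbone stages
```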
Fig 3. Feature fusion and multi-scale detection network.
Fig 4. Deep-projection deconvolution module: (a) up-projection unit, (b) down-projection unit.
Fig 5. The distribution of ground truth on UCAS-AOD.
UCAS-AOD dataset category information.
| Class name | Version 1.0 | Version 2.0 | Total |
|---|---|---|---|
| Plane | 3591 | 3891 | 7482 |
| Vehicle | 4475 | 2639 | 7114 |
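The per-version counts are consistent with the totals; a quick arithmetic check (numbers copied from the table above):

```python
# UCAS-AOD category counts, transcribed from the table.
ucas_aod = {
    "Plane":   {"v1": 3591, "v2": 3891, "total": 7482},
    "Vehicle": {"v1": 4475, "v2": 2639, "total": 7114},
}

for name, row in ucas_aod.items():
    # Each class total should equal the sum of the two dataset versions.
    assert row["v1"] + row["v2"] == row["total"], name

print("totals consistent")
```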
VEDAI dataset category information.
| Class name | Total | Meta-class name | Total |
|---|---|---|---|
| Pickup | 950 | Small Land Vehicles | 2950 |
| Tractor | 190 | | |
| Vans | 100 | | |
| Car | 1340 | | |
| Truck | 300 | Large Land Vehicles | 690 |
| Camping Car | 390 | | |
| Plane | 47 | - | - |
| Others | 200 | | |
| Boat | 170 | | |
DOTA dataset category information.
| Class name | Total |
|---|---|
| Large-vehicle | 11552 |
| Small-vehicle | 12841 |
| Harbor | 5723 |
| Ship | 23467 |
| Ground track field | 398 |
| Soccer ball field | 407 |
| Baseball diamond | 495 |
| Swimming pool | 986 |
| Roundabout | 417 |
| Tennis court | 2044 |
| Basketball court | 353 |
| Plane | 6629 |
| Helicopter | 196 |
| Bridge | 1227 |
| Storage tank | 7636 |
Comparison between proposed framework and other methods on UCAS-AOD.
| Method | AP(%) |
|---|---|
| YOLO v2 2017 [ | 79.20 |
| SSD 2020 [ | 81.37 |
| R-DFPN 2018 [ | 82.50 |
| Improved Faster RCNN 2017 [ | 83 |
| DRBox 2017 [ | 85 |
| | 86.72 |
| P-RSDet 2020 [ | 87.36 |
| R-FCN 2016 [ | 89.3 |
| Deformable R-FCN 2017 [ | 91.7 |
| S2ARN 2019 [ | 92.2 |
| FADet 2019 [ | 92.72 |
| RetinaNet-H 2019 [ | 93.6 |
| R3Det 2019 [ | 94.14 |
| A2RMNet 2019 [ | 94.65 |
| SCRDet++ 2020 [ | 94.97 |
| ICN 2018 [ | 95.67 |
| UCAS + NWPU + VS-GANs 2019 [ | 96.12 |
| Improved FBPN-Based Detection Network [ | 96.18 |
Fig 6. (a) P-R curve of FPN with Stepwise Res-block; (b) P-R curve of FFDP-CNN with Res2Net; (c) P-R curve of FFDP-CNN.
F1-Measure on UCAS-AOD.
| Method | R | P | AP(%) | F1-Score |
|---|---|---|---|---|
Comparison of parameter numbers and other information.
| Method | Input size | Model size | Parameter Number |
|---|---|---|---|
| YOLO V2 [ | 608×608 | 255 MB | 67 M |
| YOLO V3 [ | 608×608 | 235 MB | 61 M |
| Faster R-CNN [ | 608×608 | 253 MB | 59 M |
| RetinaNet [ | 608×608 | 146 MB | 36 M |
| AVDNet [ | 608×608 | 53 MB | 13 M |
| An Improved FBPN [ | 600×600 | 479 MB | 61 M |
| Ju, M. et al. [ | 512×512 | 2.8 MB | - |
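Model size in the table tracks the parameter count closely. Assuming 32-bit (4-byte) float weights, size ≈ 4 bytes × parameters; the sketch below checks this against two rows (small gaps versus the listed sizes come from file headers and non-weight data):

```python
def fp32_size_mib(n_params):
    """Approximate on-disk size of an FP32 model: 4 bytes per parameter, in MiB."""
    return n_params * 4 / 2**20

# Parameter counts taken from the table above.
print(round(fp32_size_mib(67e6), 1))  # YOLO v2: listed model size 255 MB
print(round(fp32_size_mib(61e6), 1))  # YOLO v3: listed model size 235 MB
```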
Comparison between proposed framework and other methods on VEDAI.
| Method | AP(%) |
|---|---|
| AVDNet 2019 [ | 51.95 |
| VDN 2017 [ | 54.6 |
| DPM 2015 [ | 60.5 |
| | 69.0 |
| Faster-RCNN 2017 [ | 70.9 |
| Improved Faster RCNN 2017 [ | 74.30 |
| Ju et al. 2019 [ | 80.16 |
| YOLOv3_Joint-SRVDNet 2020 [ | 80.4 |
| Faster RER-CNN 2018 [ | 83.5 |
| YOLOv3_HR [ | 85.66 |
| Faster RCNN + Res2Net (resnet101) 2019 [ | 81.96 |
| Faster RCNN + WaterFall (resnet50) 2019 [ | 77.36 |
| Improved FBPN-Based Detection Network [ | 91.27 |
Fig 7. The P-R curve of our framework on VEDAI.
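The AP values reported throughout these tables are areas under the precision-recall curve. A minimal sketch, assuming Pascal-VOC-style all-point interpolation (the paper does not state its exact interpolation scheme, and the sample curve below is synthetic, not from the paper):

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the P-R curve with monotone (all-point) interpolation."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Synthetic example curve.
recall = np.array([0.2, 0.4, 0.6, 0.8])
precision = np.array([1.0, 0.9, 0.7, 0.5])
print(average_precision(recall, precision))
```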
F1-Measure on VEDAI.
| Method | R | P | AP(%) | F1-Score |
|---|---|---|---|---|
Comparison between proposed framework and other methods on DOTA.
| Method | AP(%) (Ship) | AP(%) (Plane) | AP(%) (Small Vehicle) | mAP(%) |
|---|---|---|---|---|
| L-RCNN 2020 [ | - | - | 56.09 | - |
| Yang et al. 2018 [ | - | - | 61.16 | - |
| RoITransformer [ | 83.59 | 88.64 | 68.81 | 80.35 |
| Light-Head R-CNN OBB + W/FPN [ | 75.77 | 88.02 | 70.15 | 77.98 |
| Faster RCNN Adapted 2018 [ | 87.7 | 87.4 | 74.9 | 83.33 |
| DYOLO Module B 2018 [ | 88.2 | | 76.0 | 84.3 |
| SSD Adapted 2018 [ | 85.0 | 88.2 | 76.3 | 83.17 |
| DFRCNN 2018 [ | - | - | 76.5 | - |
| DSSD 2017 [ | 87.5 | 91.1 | 79.0 | 85.87 |
| DYOLO Module A 2018 [ | 87.8 | 86.6 | 79.2 | 84.53 |
| RefineDet 2018 [ | 87.5 | | 80.0 | 87.17 |
| Ju, et al. 2019 [ | - | - | 88.63 | - |
| Improved FBPN-Based Detection Network [ | - | - | 88.76 | - |
| 88.48 | 91.33 |
Fig 8. The P-R curves of our framework on different DOTA categories: (a) Ship, (b) Plane, (c) Small Vehicle.
F1-Measure on DOTA.
| Method | R | P | mAP(%) | F1-Score |
|---|---|---|---|---|
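The F1-Score columns in these tables are the harmonic mean of precision (P) and recall (R). A minimal helper (the P/R values below are illustrative, not results from the paper):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.90, 0.85))  # illustrative P and R
```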