Jiandan Zhong, Tao Lei, Guangle Yao.
Abstract
Vehicle detection in aerial images is an important and challenging task. Traditionally, many target detection models based on a sliding-window approach were developed and achieved acceptable performance, but these models are time-consuming in the detection phase. Recently, with the great success of convolutional neural networks (CNNs) in computer vision, many state-of-the-art detectors have been designed based on deep CNNs. However, these CNN-based detectors are less effective on aerial image data because existing CNN-based models struggle with small-object detection and precise localization. To improve detection accuracy without decreasing speed, we propose a CNN-based detection model that combines two independent convolutional neural networks. The first network generates a set of vehicle-like regions from multi-feature maps of different hierarchies and scales. Because the multi-feature maps combine the advantages of the deep and shallow convolutional layers, the first network performs well at locating small targets in aerial image data. The generated candidate regions are then fed into the second network for feature extraction and decision making. Comprehensive experiments are conducted on the Vehicle Detection in Aerial Imagery (VEDAI) dataset and the Munich vehicle dataset. The proposed cascaded detection model yields high performance in both detection accuracy and detection speed.
Keywords: aerial image; convolutional neural network; deep learning; vehicle detection
Year: 2017 PMID: 29186756 PMCID: PMC5751529 DOI: 10.3390/s17122720
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Framework of the proposed model.
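The two-stage cascade described in the abstract (a first network that proposes vehicle-like regions, a second network that extracts features and decides) can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' implementation: `cascade_detect`, `toy_propose`, `toy_classify`, the `"background"` label, and the score threshold are all hypothetical names and values introduced here, and the two stages are passed in as plain callables standing in for the real networks.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, str, float]        # (box, label, score)


def cascade_detect(
    image,
    propose: Callable[[object], List[Box]],
    classify: Callable[[object, Box], Tuple[str, float]],
    score_threshold: float = 0.5,         # assumed value, not from the paper
) -> List[Detection]:
    """Stage 1 proposes candidate boxes; stage 2 labels and scores each one."""
    detections = []
    for box in propose(image):
        label, score = classify(image, box)
        # Keep only confident, non-background decisions from the second stage.
        if label != "background" and score >= score_threshold:
            detections.append((box, label, score))
    return detections


# Hypothetical stand-ins for the two networks, used only to exercise the flow.
def toy_propose(image) -> List[Box]:
    return [(10, 10, 30, 20), (50, 50, 60, 60)]


def toy_classify(image, box: Box) -> Tuple[str, float]:
    x_min, y_min, x_max, y_max = box
    # Toy rule: treat boxes that are wider than tall as vehicles.
    if (x_max - x_min) > (y_max - y_min):
        return "vehicle", 0.9
    return "background", 0.8


if __name__ == "__main__":
    print(cascade_detect(None, toy_propose, toy_classify))
```

Passing the stages as callables keeps the two networks independent, which mirrors the abstract's description of them as separate models.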
Figure 2. (a) Training images rotated clockwise by four angles; (b) the flip operation applied to the training images.
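The rotation and flip augmentation named in this caption might look like the sketch below. The caption does not state which four angles or which flip directions were used, so the choice of 0°, 90°, 180°, and 270° clockwise plus horizontal and vertical flips is an assumption made here for illustration. Images are plain nested lists to keep the example dependency-free, and the function names are hypothetical. Bounding-box annotations would need the same transforms, which this sketch omits.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def rotate_cw_90(image: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    # Reversing the row order and transposing gives a clockwise quarter turn.
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image: Image) -> Image:
    """Mirror an image top to bottom."""
    return [list(row) for row in image[::-1]]


def augment(image: Image) -> List[Image]:
    """Return four clockwise rotations (assumed 0/90/180/270) and two flips."""
    rotations = [[list(row) for row in image]]
    for _ in range(3):
        rotations.append(rotate_cw_90(rotations[-1]))
    return rotations + [flip_horizontal(image), flip_vertical(image)]


if __name__ == "__main__":
    for variant in augment([[1, 2], [3, 4]]):
        print(variant)
```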
Figure 3. Examples of the original and updated annotations.
Figure 4. (a) The architecture of the VGG-16 model; (b) the architecture of VPN.
Figure 5. The architecture of VDN.
Figure 6. Examples from the VEDAI dataset.
Class statistics of the VEDAI dataset.
| Classes | Tag | Number |
|---|---|---|
| Car | car | 1340 |
| Pick-up | pic | 950 |
| Truck | tru | 300 |
| Plane | pla | 47 |
| Boat | boa | 170 |
| Camping car | cam | 390 |
| Tractor | tra | 190 |
| Vans | van | 100 |
| Other | oth | 200 |
Comparison results of various detection models on VEDAI.
| Detection Model | Image Size | Recall Rate | AP | F1-Score |
|---|---|---|---|---|
| Faster R-CNN (Z&F) | 1024 × 1024 | 63.5% | 30.8% | 0.229 |
| Faster R-CNN (VGG-16) | 1024 × 1024 | | 42.1% | 0.232 |
| Fast R-CNN (VGG-16) | 1024 × 1024 | 72.2% | 39.8% | 0.216 |
| SLIC with Z&F | 1024 × 1024 | 58.3% | 25.4% | 0.066 |
| SLIC with VGG-16 | 1024 × 1024 | 58.8% | 23.2% | 0.064 |
| Our Model | 1024 × 1024 | 72.3% | | |
| Faster R-CNN (Z&F) | 512 × 512 | 60.9% | 32.0% | 0.212 |
| Faster R-CNN (VGG-16) | 512 × 512 | | 40.9% | 0.225 |
| Fast R-CNN (VGG-16) | 512 × 512 | 69.4% | 37.3% | 0.224 |
| Our Model | 512 × 512 | 69.7% | | |
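For readers unfamiliar with the metrics in this table, the recall rate and F1-score are conventionally derived from true-positive, false-positive, and false-negative counts, as in the sketch below. The record does not say at which operating point the authors computed them, so this only shows the standard definitions. The function name and the example counts are made up for illustration and are not taken from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard detection metrics from matched and unmatched box counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative counts only: 6 correct detections, 2 false alarms, 4 misses.
    p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, a detector with high recall but many false alarms can still score a low F1, which is one way a model can lead on recall without leading on F1.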
Figure 7. Precision-recall curves of four models: (a) VEDAI 1024; (b) VEDAI 512.
Figure 8. Recall vs. IoU curves of three CNN-based models: (a) VEDAI 1024; (b) VEDAI 512.
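A recall vs. IoU curve such as the one in Figure 8 is typically built by measuring, for each intersection-over-union (IoU) threshold, the fraction of ground-truth boxes covered by some predicted box. The sketch below shows one simple way to compute a point on such a curve. It assumes axis-aligned `(x_min, y_min, x_max, y_max)` boxes and does not enforce one-to-one matching between predictions and ground truth, so it may differ from the authors' evaluation protocol. The function names and example boxes are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(gt_boxes: List[Box], pred_boxes: List[Box], threshold: float) -> float:
    """Fraction of ground-truth boxes hit by any prediction at IoU >= threshold."""
    if not gt_boxes:
        return 0.0
    hits = sum(
        1 for gt in gt_boxes
        if any(iou(gt, pred) >= threshold for pred in pred_boxes)
    )
    return hits / len(gt_boxes)


if __name__ == "__main__":
    gt = [(0, 0, 2, 2), (10, 10, 12, 12)]
    pred = [(0, 0, 2, 2), (10, 10, 12, 11)]  # second box covers half its target
    for t in (0.5, 0.7):
        print(t, recall_at_iou(gt, pred, t))
```

Sweeping the threshold from 0.5 upward and plotting the result gives the curve; a model that localizes precisely keeps its recall high as the threshold tightens.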
Comparison of detection time (fps: frames per second) and training time (h: hours).
| Detection Model | Image Size | Detection Time | Training Time |
|---|---|---|---|
| Faster R-CNN (Z&F) | 1024 × 1024 | | 28.4 h |
| Faster R-CNN (VGG-16) | 1024 × 1024 | 5.4 fps | 28.5 h |
| Fast R-CNN (VGG-16) | 1024 × 1024 | 0.4 fps | 8.2 h |
| SLIC with Z&F | 1024 × 1024 | 5.6 fps | 7.9 h |
| SLIC with VGG-16 | 1024 × 1024 | 4.9 fps | 8.2 h |
| Our Model | 1024 × 1024 | 4.5 fps | 10.7 h |
| Faster R-CNN (Z&F) | 512 × 512 | | 28.3 h |
| Faster R-CNN (VGG-16) | 512 × 512 | 5.6 fps | 28.6 h |
| Fast R-CNN (VGG-16) | 512 × 512 | 0.4 fps | 8.1 h |
| Our Model | 512 × 512 | 4.6 fps | 10.6 h |
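Detection speed in this table is given in frames per second, which can be easier to compare after converting to per-image latency. The short sketch below does that arithmetic for two of the 1024 × 1024 rows (4.5 fps for the proposed model and 0.4 fps for Fast R-CNN with VGG-16). The helper names are hypothetical and the calculation is plain unit conversion, not anything reported by the authors.

```python
def ms_per_frame(fps: float) -> float:
    """Convert a throughput in frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def speedup(fps_a: float, fps_b: float) -> float:
    """How many times faster model A is than model B, given their fps."""
    return fps_a / fps_b


if __name__ == "__main__":
    ours, fast_rcnn = 4.5, 0.4  # fps values from the 1024 x 1024 rows
    print(f"ours:       {ms_per_frame(ours):.1f} ms/frame")       # about 222.2
    print(f"Fast R-CNN: {ms_per_frame(fast_rcnn):.1f} ms/frame")  # 2500.0
    print(f"speedup:    {speedup(ours, fast_rcnn):.2f}x")         # about 11.25
```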
Figure 9. (a–l) Some detection examples from the VEDAI dataset.
Comparison results of various detection models on the Munich vehicle dataset.
| Detection Model | Recall Rate | AP | F1-Score | Detection Time (fps) |
|---|---|---|---|---|
| Faster R-CNN (Z&F) | 66.8% | 53.9% | 0.657 | |
| Faster R-CNN (VGG-16) | 78.3% | 64.8% | 0.779 | 4.9 |
| Our Model | | | | 3.2 |
Figure 10. Comparison of three detection models: (a) precision-recall curve; (b) recall vs. IoU curve.
Figure 11. (a–l) Some detection examples from the VEDAI dataset.