Phong Ha Nguyen, Muhammad Arsalan, Ja Hyung Koo, Rizwan Ali Naqvi, Noi Quang Truong, Kang Ryoung Park.
Abstract
Autonomous landing of an unmanned aerial vehicle (UAV), or drone, is a challenging problem for the robotics research community. Previous researchers have attempted to solve it by combining multiple sensors such as global positioning system (GPS) receivers, inertial measurement units, and multiple camera systems. Although these approaches successfully estimate a UAV's location during landing, many calibration processes are required to achieve good detection accuracy. In addition, cases where drones operate in heterogeneous areas with no GPS signal should be considered. To overcome these problems, we determined how to safely land a drone in a GPS-denied environment using our remote-marker-based tracking algorithm, which relies on a single visible-light camera sensor. Instead of using hand-crafted features, our algorithm includes a convolutional neural network, named lightDenseYOLO, that extracts trained features from an input image to predict a marker's location from the drone's visible-light camera. Experimental results show that our method significantly outperforms state-of-the-art object trackers, both with and without convolutional neural networks, in terms of both accuracy and processing time.
Keywords: autonomous landing; lightDenseYOLO; real-time marker detection; unmanned aerial vehicle; visible light camera sensor on drone
Year: 2018 PMID: 29795038 PMCID: PMC6022018 DOI: 10.3390/s18061703
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Summary of comparisons between the proposed method and previous studies.
| Category | Type of Feature | Type of Camera/System | Description | Strength | Weakness |
|---|---|---|---|---|---|
| Passive methods | Hand-crafted features | Multisensory fusion system with a pan-tilt unit (PTU), infrared camera, and ultra-wide-band radar [ | Ground-based system that first detects the unmanned aerial vehicle (UAV) in the recovery area, starts tracking in the hover area, and then sends commands for autonomous landing. | A multiple-sensor-fusion method guides the UAV to land in both daytime and nighttime. | Tracking algorithm and 3D pose estimation need to be improved. Multisensory system requires a complicated calibration process. |
| Passive methods | Hand-crafted features | Ground stereo vision system with two PTUs placed on each side of a runway; each PTU includes a visible-light camera [ | Two PTUs are allocated on both sides of a runway to enlarge the baseline. The location of the UAV is detected by the Chan-Vese model approach and updated by an extended Kalman filter algorithm. | Ground stereo vision-based system successfully detects and tracks the UAV and shows robust detection results in real time. | Setting up two PTU ground-based systems requires extensive calibration. |
| Passive methods | Hand-crafted features | Two-infrared-camera array system with an infrared laser lamp [ | Infrared laser lamp is fixed on the nose of the UAV for easy detection. | Infrared camera array system successfully guides the UAV to perform automatic landing in a GPS-denied environment at a distance of 1 km. | Not practical for use in a narrow landing area. Complicated set-up of a two-camera array system on the ground is required. |
| Active methods (without marker) | Hand-crafted features | Single down-facing visible-light camera [ | A local 3D elevation map of the ground environment is generated using the input image from the camera sensor. A safe landing spot is estimated by a probabilistic approach. | Without a marker, this method can help a drone find a landing spot in an emergency. | Experiments were not conducted in various places and at different times, and the maximum height for testing was only 4–5 m. |
| Active methods (without marker) | Hand-crafted features | Infrared camera [ | Fixed infrared camera below the head of the UAV detects the position of four infrared lamps on a runway. Based on prior knowledge of the distance between the infrared lamps, the pose parameters are calculated during the landing process. | Successfully detects infrared lamps on the ground in both daytime and nighttime at a distance of 450 m. | The series of infrared lamps required is difficult to deploy in various places. |
| Active methods (with marker) | Hand-crafted features | Thermal camera [ | Feature points are extracted from a letter-based marker, enabling the drone to approach closer to the target and finish the landing operation. | Detects the marker using thermal images and overcomes various illumination challenges. | Drone must carry a costly thermal camera. |
| Active methods (with marker) | Hand-crafted features | Visible-light camera [ | Marker is detected by line segments or contour detectors. | Marker is detected by using only a single visible-light camera sensor. | Marker is detected only in daytime and within a limited range. |
| Active methods (with marker) | Trained features | Visible-light camera [ | Double-deep Q-networks solve marker detection and command the drone to reach the target simultaneously. | First approach to solve the autonomous landing problem using deep reinforcement learning. | Testing is done in an indoor environment, and there is a gap between indoor and outdoor environments. |
| Active methods (with marker) | Trained features | Visible-light camera (proposed method) | Uses the lightweight lightDenseYOLO convolutional neural network (CNN) marker detector to roughly predict the marker location. Enhanced detection of the marker center and direction is performed by the Profile Checker v2 algorithm. | Requires only a single visible-light camera. Detects the marker center and direction at very long distances at fast speed. | An embedded system that can support deep learning is required to operate marker detection in real time. |
Figure 1. Flowchart of the proposed far-distance marker-based tracking algorithm.
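Figure 1's pipeline can be read as a two-stage loop: the lightDenseYOLO CNN proposes a rough marker bounding box in each frame, and Profile Checker v2 refines the marker center and direction before a command is issued. A minimal sketch of such a loop, with all callables injected as parameters because the paper's actual interfaces are not shown here:

```python
def track_and_land(read_frame, detect_marker, refine, send_command, landed):
    """Two-stage marker tracking loop (hypothetical sketch, not the authors' code).

    detect_marker: CNN detector returning a rough bounding box or None.
    refine: Profile Checker-style step returning (center, direction).
    """
    while not landed():
        frame = read_frame()
        box = detect_marker(frame)              # stage 1: lightDenseYOLO proposal
        if box is None:
            continue                            # no marker found in this frame
        center, direction = refine(frame, box)  # stage 2: center/direction refinement
        send_command(center, direction)         # steer the drone toward the marker
```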
Figure 2. Bottleneck layer design.
Figure 3. Transition layer design.
Figure 4. Dense block example.
Figure 5. Proposed lightDenseNet architecture with two dense blocks and two lateral connections.
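Figures 2–4 describe standard DenseNet-style building blocks. Below is a minimal PyTorch sketch of a bottleneck layer, transition layer, and dense block; the growth rate of 32 and the 6/12 bottlenecks per block are assumptions chosen to match the channel counts in the lightDenseYOLO architecture table further down (64 → 256 and 128 → 512):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """BN-ReLU-1x1 conv followed by BN-ReLU-3x3 conv, DenseNet style."""
    def __init__(self, in_ch, growth_rate=32):
        super().__init__()
        inter = 4 * growth_rate
        self.layers = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, inter, kernel_size=1, bias=False),
            nn.BatchNorm2d(inter), nn.ReLU(inplace=True),
            nn.Conv2d(inter, growth_rate, kernel_size=3, padding=1, bias=False),
        )
    def forward(self, x):
        # Dense connectivity: concatenate the new features onto the input.
        return torch.cat([x, self.layers(x)], dim=1)

class Transition(nn.Module):
    """BN-ReLU-1x1 conv that halves the channels, then 2x2 average pooling."""
    def __init__(self, in_ch):
        super().__init__()
        self.layers = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, in_ch // 2, kernel_size=1, bias=False),
            nn.AvgPool2d(2),
        )
    def forward(self, x):
        return self.layers(x)

def dense_block(in_ch, growth_rate=32, n_layers=6):
    """Stack bottlenecks; output has in_ch + n_layers * growth_rate channels."""
    layers, ch = [], in_ch
    for _ in range(n_layers):
        layers.append(Bottleneck(ch, growth_rate))
        ch += growth_rate
    return nn.Sequential(*layers)

# Shape check against the architecture table: 64 -> 256, then 128 -> 512.
x = torch.randn(1, 64, 80, 80)
x = dense_block(64, 32, 6)(x)        # (1, 256, 80, 80)
x = Transition(256)(x)               # (1, 128, 40, 40)
x = dense_block(128, 32, 12)(x)      # (1, 512, 40, 40)
```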
Figure 6. Overall flowchart of the training of YOLO and YOLO v2 for object detection.
Differences in the characteristics of YOLO and YOLO v2 (unit: px).
| Characteristic | YOLO | YOLO v2 |
|---|---|---|
| Feature extractor | Darknet | Darknet-19 448 × 448 |
| Input size (training from scratch on the ImageNet dataset) | 224 × 224 | 448 × 448 |
| Input size (fine-tuning on the Pascal VOC or MS COCO dataset) | 448 × 448 | 448 × 448 |
| Input size (testing) | 448 × 448 | 448 × 448 |
Figure 7. Example of YOLO v2 marker detection. (a) YOLO v2 divides the input image into an S × S grid, and (b) each grid cell predicts five bounding boxes based on five prior anchor boxes. The predictions are stored in an S × S × (B × (5 + C)) output tensor. This example shows how YOLO v2 predicts a marker from an input image with a grid size S = 8 and C = 1 (number of classes to be detected), so that the size of the final output tensor is 8 × 8 × 30. (c) The yellow shaded cell shows higher potential of detecting the marker compared to the other red shaded cells; (d) the result of the detected marker.
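For reference, the S × S × (B × (5 + C)) tensor in Figure 7 is conventionally decoded as follows in YOLO v2: each cell's B predictions carry (tx, ty, tw, th, objectness), where the center offset is squashed into the cell with a sigmoid and the size scales an anchor prior. A sketch under those standard conventions (the anchor values and threshold here are placeholders, not the paper's):

```python
import numpy as np

def decode_yolo_v2(output, anchors, conf_thresh=0.5):
    """Decode an S x S x (B*(5+C)) YOLO v2 tensor into boxes.

    output: numpy array of shape (S, S, B*(5+C)), here 8 x 8 x 30
    anchors: list of B (width, height) priors in grid-cell units
    """
    S = output.shape[0]
    B = len(anchors)
    C = output.shape[2] // B - 5           # 1 class here, so scores are ignored
    preds = output.reshape(S, S, B, 5 + C)
    boxes = []
    for row in range(S):
        for col in range(S):
            for b, (pw, ph) in enumerate(anchors):
                tx, ty, tw, th, tobj = preds[row, col, b, :5]
                conf = 1.0 / (1.0 + np.exp(-tobj))         # sigmoid objectness
                if conf < conf_thresh:
                    continue
                # Center is a sigmoid offset inside the cell; size scales the prior.
                cx = (col + 1.0 / (1.0 + np.exp(-tx))) / S  # normalized [0, 1]
                cy = (row + 1.0 / (1.0 + np.exp(-ty))) / S
                w = pw * np.exp(tw) / S
                h = ph * np.exp(th) / S
                boxes.append((cx, cy, w, h, conf))
    return boxes
```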
Architecture of lightDenseYOLO for marker detection. Each conv layer is a sequence of batch normalization (BN), rectified linear unit (ReLU), and convolution; s1 and s2 denote strides of 1 and 2 px, respectively.
| Layer | Input Size | Output Size |
|---|---|---|
| Input | 320 × 320 × 3 | 320 × 320 × 3 |
| 7 × 7 conv, s2 | 320 × 320 × 3 | 160 × 160 × 64 |
| 2 × 2 pooling, s2 | 160 × 160 × 64 | 80 × 80 × 64 |
| Dense block 1 | 80 × 80 × 64 | 80 × 80 × 256 |
| Transition layer | 80 × 80 × 256 | 40 × 40 × 128 |
| Dense block 2 | 40 × 40 × 128 | 40 × 40 × 512 |
| Transition layer | 40 × 40 × 512 | 20 × 20 × 256 |
| Reshape | 40 × 40 × 320 | 20 × 20 × 1280 |
| Bottleneck layer | 20 × 20 × 1280 | 20 × 20 × 32 |
| Reshape | 80 × 80 × 128 | 20 × 20 × 2048 |
| Bottleneck layer | 20 × 20 × 2048 | 20 × 20 × 32 |
| Concatenation | 20 × 20 × 32 | 20 × 20 × 320 |
| 1 × 1 conv, s1 | 20 × 20 × 320 | 20 × 20 × 30 |
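The two Reshape rows are space-to-depth (reorg) rearrangements that trade spatial resolution for channels, so earlier, higher-resolution features can be merged with the 20 × 20 output grid: 40 × 40 × 320 → 20 × 20 × 1280 (factor 2) and 80 × 80 × 128 → 20 × 20 × 2048 (factor 4). A quick shape check with PyTorch's `pixel_unshuffle`, assuming the reorg works this way (the table's dimensions are consistent with it):

```python
import torch
import torch.nn.functional as F

# Lateral feature maps in NCHW layout, with sizes taken from the table.
mid = torch.randn(1, 320, 40, 40)    # 40 x 40 x 320 lateral connection
early = torch.randn(1, 128, 80, 80)  # 80 x 80 x 128 lateral connection

# Space-to-depth: each r x r spatial patch is stacked into r*r channels.
print(F.pixel_unshuffle(mid, 2).shape)    # torch.Size([1, 1280, 20, 20])
print(F.pixel_unshuffle(early, 4).shape)  # torch.Size([1, 2048, 20, 20])
```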
Figure 8. Flowchart of the proposed Profile Checker v2 algorithm to find the marker center and direction.
Figure 9. Example images of the procedure of the Profile Checker v2 algorithm. (a) Input image; (b) image after adaptive thresholding; (c) image after morphological transform; and (d) detected marker center and direction.
Figure 10. Detected center and direction of the marker (a) using our Profile Checker v2; (b) profile visualization from the red circle of Figure 10a.
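The stages named in Figures 8–10 (adaptive thresholding, morphological cleanup, then center and direction estimation) map onto common OpenCV operations. A minimal sketch of that stage sequence; the threshold parameters, the largest-blob heuristic, and the PCA-based direction estimate are assumptions, not the paper's exact Profile Checker v2 logic:

```python
import cv2
import numpy as np

def find_marker_center_direction(bgr_roi):
    """Rough center/direction estimate from a marker region of interest."""
    gray = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2GRAY)
    # Adaptive thresholding (Figure 9b); parameters are placeholders.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 5)
    # Morphological opening removes small speckle noise (Figure 9c).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    marker = max(contours, key=cv2.contourArea)  # assume largest blob is marker

    # Center from image moments (Figure 9d).
    m = cv2.moments(marker)
    if m["m00"] == 0:
        return None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]

    # Principal axis of the blob as a direction estimate, in degrees.
    pts = marker.reshape(-1, 2).astype(np.float64)
    mean = np.empty((0))
    mean, eigenvectors, eigenvalues = cv2.PCACompute2(pts, mean)
    angle = float(np.degrees(np.arctan2(eigenvectors[0, 1], eigenvectors[0, 0])))
    return (cx, cy), angle
```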
Description of Snapdragon 835 mobile hardware development kit.
| Components | Specifications |
|---|---|
| Central Processing Unit (CPU) | Qualcomm® Kryo™ 280 (dual-quad core, 64-bit ARM V8 compliant processors, 2.2 GHz and 1.9 GHz clusters) |
| Graphics Processing Unit (GPU) | Qualcomm® Adreno™ 540 |
| Digital Signal Processor (DSP) | Qualcomm® Hexagon™ DSP with Hexagon vector extensions |
| RAM | 4 GB |
| Storage | 128 GB |
| Operating System | Android 7.0 “Nougat” |
Figure 11. Snapdragon 835 mobile hardware development kit.
Figure 12. Examples of images in our DDroneC-DB2 dataset.
Description of DDroneC-DB2.
| Sub-Dataset | Distance | Number of Images | Condition | Description |
|---|---|---|---|---|
| Morning | Far | 3088 | Humidity: 44.7% | Landing speed: 5.5 m/s |
| Morning | Close | 641 | | |
| Morning | Close (from DDroneC-DB1 [ | 425 | Humidity: 41.5% | Landing speed: 4 m/s; auto mode of camera shutter speed (8~1/8000 s) and ISO (100~3200) |
| Afternoon | Far | 2140 | Humidity: 82.1% | Landing speed: 7 m/s |
| Afternoon | Close | 352 | | |
| Afternoon | Close (from DDroneC-DB1 [ | 148 | Humidity: 73.8% | Landing speed: 6 m/s |
| Evening | Far | 3238 | Humidity: 31.5% | Landing speed: 6 m/s |
| Evening | Close | 326 | | |
| Evening | Close (from DDroneC-DB1 [ | 284 | Humidity: 38.4% | Landing speed: 4 m/s |
Summary of training hyperparameters used for different models on the DDroneC-DB2 dataset.
| Parameter | lightDenseYOLO (Ours) | YOLO v2 | Faster R-CNN | MobileNets-SSD |
|---|---|---|---|---|
| Input size (unit: px) | Multi-scale training (from 128 × 128 to 640 × 640) | Multi-scale training (from 128 × 128 to 640 × 640) | 320 × 320 | 320 × 320 |
| Number of epochs | 60 | 60 | 60 | 60 |
| Batch size | 64 | 64 | 64 | 64 |
| Initial learning rate | 0.0001 | 0.0001 | 0.0001 | 0.004 |
| Momentum | 0.9 | 0.9 | 0.9 | 0.9 |
| Decay | 0.0005 | 0.0005 | 0.0005 | 0.9 |
| Backbone architecture | lightDenseNet | Darknet-19 448 × 448 | VGG 16 | MobileNets |
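The momentum and decay rows are consistent with SGD training, and the first row describes multi-scale input resizing for the two YOLO-family models. A sketch of an equivalent setup in PyTorch, treating the SGD choice and the 32-px size step as assumptions since the table does not state them:

```python
import random
import torch

model = torch.nn.Conv2d(3, 30, 1)  # stand-in module for any of the detectors
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001,
                            momentum=0.9, weight_decay=0.0005)

def random_input_size(low=128, high=640, step=32):
    """Multi-scale training: pick a new square input size periodically."""
    return random.randrange(low, high + 1, step)

for epoch in range(60):            # number of epochs from the table
    size = random_input_size()     # from 128 x 128 up to 640 x 640
    # ... resize each batch of 64 images to (size, size) and run a train step ...
```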
Precision (P) and recall (R) at IoU = 0.5 of different CNN marker detectors.
| Morning | Afternoon | Evening | Entire Dataset | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Far | Close | Far | Close | Far | Close | Far | Close | Far + Close | ||||||||||
| P | R | P | R | P | R | P | R | P | R | P | R | P | R | P | R | P | R | |
| lightDenseYOLO | 0.96 | 0.95 | 0.96 | 0.96 | 0.94 | 0.96 | 0.95 | 0.95 | 0.95 | 0.96 | 0.97 | 0.96 | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |
| YOLO v2 | 0.95 | 0.95 | 0.96 | 0.95 | 0.92 | 0.94 | 0.93 | 0.95 | 0.94 | 0.93 | 0.95 | 0.96 | 0.94 | 0.94 | 0.95 | 0.95 | 0.94 | 0.95 |
| Faster R-CNN | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| MobileNets-SSD | 0.98 | 0.98 | 0.99 | 0.98 | 0.97 | 0.96 | 0.97 | 0.98 | 0.98 | 0.97 | 0.97 | 0.99 | 0.98 | 0.97 | 0.98 | 0.98 | 0.98 | 0.98 |
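Precision and recall in these tables follow the usual detection definitions at an intersection-over-union (IoU) threshold of 0.5. A minimal sketch for the one-marker-per-image case (the per-image matching rule is an assumption):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def precision_recall(detections, ground_truths, iou_thresh=0.5):
    """Precision/recall with one marker per image.

    detections, ground_truths: dicts mapping image id -> (x1, y1, x2, y2);
    a missing entry in `detections` means no marker was found in that image.
    """
    tp = fp = fn = 0
    for img_id, gt in ground_truths.items():
        det = detections.get(img_id)
        if det is None:
            fn += 1                        # marker present but not detected
        elif iou(det, gt) >= iou_thresh:
            tp += 1                        # overlap high enough: true positive
        else:
            fp += 1                        # detected, but in the wrong place
            fn += 1
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)
```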
Precision (P) and recall (R) at IoU = 0.5 of different CNN marker detectors with Profile Checker algorithms.
| Morning | Afternoon | Evening | Entire Dataset | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Far | Close | Far | Close | Far | Close | Far | Close | Far + Close | | | | | | | | | | |
| P | R | P | R | P | R | P | R | P | R | P | R | P | R | P | R | P | R | |
| lightDenseYOLO + Profile Checker v1 | 0.97 | 0.96 | 0.96 | 0.95 | 0.96 | 0.97 | 0.96 | 0.98 | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 |
| lightDenseYOLO + Profile Checker v2 | 0.99 | 0.99 | 0.98 | 0.99 | 0.98 | 0.98 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| YOLO v2 + Profile Checker v2 | 0.98 | 0.97 | 0.99 | 0.99 | 0.98 | 0.99 | 0.99 | 0.98 | 0.98 | 0.98 | 0.99 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 | 0.98 |
| Faster R-CNN + Profile Checker v2 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| MobileNets-SSD + Profile Checker v2 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
Figure 13. Comparative graphs of (a) precision and (b) recall of different CNN marker detectors according to the intersection-over-union (IoU) threshold.
Figure 14. Comparative graphs of (a) precision and (b) recall of different CNN marker detectors with Profile Checker algorithms according to the IoU threshold.
Figure 15. Example of a detected marker in a close-distance image.
Figure 16. Comparison of the center location error (CLE) of various methods on long-distance images.
Figure 17. Comparison of the CLE of various methods on close-distance images.
Figure 18. Comparison of the CLE of various methods on the entire DDroneC-DB2 dataset.
Figure 19. CLE comparison between our method and non-CNN marker trackers.
Figure 20. Comparison of the predicted direction error (PDE) of various methods on close-distance images.
Figure 21. Marker detection examples obtained using our method and previous methods in the (a) morning; (b) afternoon; and (c) evening.
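Figures 16–20 report two tracking metrics: center location error (CLE) and predicted direction error (PDE). Taking CLE as the Euclidean pixel distance between predicted and ground-truth marker centers, and PDE as the absolute angular difference between predicted and ground-truth marker directions (definitions inferred from the metric names, not quoted from the paper), a minimal sketch:

```python
import math

def center_location_error(pred_center, gt_center):
    """CLE: Euclidean distance in pixels between predicted and true centers."""
    dx = pred_center[0] - gt_center[0]
    dy = pred_center[1] - gt_center[1]
    return math.hypot(dx, dy)

def predicted_direction_error(pred_deg, gt_deg):
    """PDE: absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = abs(pred_deg - gt_deg) % 360.0
    return min(diff, 360.0 - diff)
```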
Comparison of the average processing speed of the proposed method with those of other CNN marker detectors (unit: fps).
| Method | Desktop Computer | Snapdragon 835 Kit |
|---|---|---|
| lightDenseYOLO | ~50 | ~25 |
| YOLO v2 | ~33 | ~9.2 |
| Faster R-CNN | ~5 | ~2.5 |
| MobileNets-SSD | ~12.5 | ~7.14 |
| lightDenseYOLO + Profile Checker v1 | ~40 | ~20.83 |
| lightDenseYOLO + Profile Checker v2 | ~40 | ~20 |
| YOLO v2 + Profile Checker v2 | ~28.6 | ~7.7 |
| Faster R-CNN + Profile Checker v2 | ~4.87 | ~2 |
| MobileNets-SSD + Profile Checker v2 | ~11.8 | ~6.75 |
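Average throughput figures like those above are typically obtained by timing the per-frame pipeline over many frames and dividing the frame count by the elapsed time. A sketch of such a measurement; the warm-up handling is an assumption rather than the paper's protocol:

```python
import time

def average_fps(process_frame, frames, warmup=5):
    """Average throughput of a per-frame pipeline, skipping warm-up frames."""
    for frame in frames[:warmup]:
        process_frame(frame)                 # let caches and lazy init settle
    start = time.perf_counter()
    for frame in frames[warmup:]:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames[warmup:]) / elapsed
```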