AlphaPilot: autonomous drone racing
Philipp Foehn, Dario Brescianini, Elia Kaufmann, Titus Cieslewski, Mathias Gehrig, Manasi Muglikar, Davide Scaramuzza.
Abstract
This paper presents a novel system for autonomous, vision-based drone racing combining learned data abstraction, nonlinear filtering, and time-optimal trajectory planning. The system was successfully deployed at the first autonomous drone racing world championship: the 2019 AlphaPilot Challenge. In contrast to traditional drone racing systems, which only detect the next gate, our approach makes use of any visible gate and takes advantage of multiple, simultaneous gate detections to compensate for drift in the state estimate and to build a global map of the gates. The global map and drift-compensated state estimate allow the drone to navigate through the race course even when the gates are not immediately visible, and further enable the planning of a near time-optimal path through the race course in real time based on approximate drone dynamics. The proposed system has been demonstrated to successfully guide the drone through tight race courses, reaching speeds of up to 8 m/s, and ranked second at the 2019 AlphaPilot Challenge.
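To make the drift compensation concrete, the following minimal sketch (Python with NumPy; all function names are illustrative, and the SE(2) formulation is a simplification of the paper's actual filter) shows how a single detection of a gate that is already in the global map yields a corrected odometry-to-world transform that absorbs accumulated VIO drift.

```python
import numpy as np

def compose(a, b):
    """Compose two SE(2) poses a . b, each given as (x, y, yaw)."""
    x, y, th = a
    bx, by, bth = b
    return np.array([x + np.cos(th) * bx - np.sin(th) * by,
                     y + np.sin(th) * bx + np.cos(th) * by,
                     (th + bth + np.pi) % (2 * np.pi) - np.pi])

def inverse(a):
    """Invert an SE(2) pose."""
    x, y, th = a
    c, s = np.cos(th), np.sin(th)
    return np.array([-c * x - s * y, s * x - c * y, -th])

def correct_drift(T_wo, T_ob, T_bg_meas, T_wg_map, gain=0.2):
    """One drift-correction step (hypothetical helper, not the paper's filter).

    T_wo      : current odometry-to-world correction (the drift estimate)
    T_ob      : VIO pose of the body in the odometry frame
    T_bg_meas : gate pose measured in the body frame by the detector
    T_wg_map  : known gate pose in the global map
    """
    T_og_pred = compose(T_ob, T_bg_meas)              # gate seen in odometry frame
    T_wo_new = compose(T_wg_map, inverse(T_og_pred))  # correction matching the map
    d = T_wo_new - T_wo                               # blend instead of jumping
    d[2] = (d[2] + np.pi) % (2 * np.pi) - np.pi       # shortest yaw difference
    return T_wo + gain * d
```

Multiple simultaneous gate detections would each contribute such an update, which is what allows the state estimate to stay consistent across the whole course.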
Keywords: Drone racing · Agile flight · Aerial vehicles
Year: 2021 PMID: 35221535 PMCID: PMC8827337 DOI: 10.1007/s10514-021-10011-y
Source DB: PubMed Journal: Auton Robots ISSN: 0929-5593 Impact factor: 3.000
Fig. 1 Our AlphaPilot drone waiting on the start podium to autonomously race through the gates ahead
Fig. 2 Illustration of the race drone with its body-fixed coordinate frame in blue and a camera coordinate frame in red
Sensor specifications
| Sensor | Model | Rate | Details |
|---|---|---|---|
| Camera | Leopard Imaging IMX 264 | | Global shutter, color, resolution: |
| IMU | Bosch BMI088 | | Range: |
| LRF | Garmin LIDAR-Lite v3 | | Range: 1– |
Fig. 3 Overview of the system architecture and its main components. All components within a dotted area run in a single thread
Fig. 4 The gate detection module returns sets of corner points for each gate in the input image (fourth column) using a two-stage process. In the first stage, a neural network transforms the input image (first column) into a set of corner confidence maps (second column) and Part Affinity Fields (PAFs) (Cao et al. 2017) (third column). In the second stage, the PAFs are used to associate sets of corner points that belong to the same gate. For visualization, the corner maps (second column) and the PAFs (third column) are each displayed in a single image. Color encodes the corner class in the corner maps and the direction of the 2D vector fields in the PAFs. The yellow lines at the bottom of the second column show the six edge candidates of the edge class (TL, TR) (the TL corner of the middle gate is below the detection threshold), see Sect. 4.2. Best viewed in color (Color figure online)
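The association step lends itself to a short illustration. Assuming the network outputs one PAF per edge class as an H × W × 2 array of 2D unit vectors (the representation of Cao et al. 2017), a candidate edge between two corner detections can be scored by sampling the field along the connecting segment and measuring its alignment with the segment direction. The function below is an illustrative simplification (no bounds checks), not the authors' implementation.

```python
import numpy as np

def paf_edge_score(paf, p_start, p_end, n_samples=10):
    """Score a candidate edge by integrating the Part Affinity Field
    along the segment p_start -> p_end (both (x, y) pixel coordinates).

    paf : H x W x 2 array holding the 2D vector field for one edge class.
    Returns the mean alignment in [-1, 1]; higher values mean the two
    corners more likely belong to the same gate.
    """
    p_start = np.asarray(p_start, dtype=float)
    p_end = np.asarray(p_end, dtype=float)
    direction = p_end - p_start
    length = np.linalg.norm(direction)
    if length < 1e-6:
        return 0.0
    direction /= length
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        # Sample the field at evenly spaced points on the segment.
        x, y = np.round(p_start + t * (p_end - p_start)).astype(int)
        score += float(np.dot(paf[y, x], direction))
    return score / n_samples
```

Candidate edges scoring above a threshold can then be assigned greedily, so that each corner belongs to at most one gate, yielding the per-gate corner sets of the fourth column.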
Fig. 5 Example time-optimal motion primitive starting from rest at the origin to a random final position with non-zero final velocity. The per-axis velocities and inputs are constrained to their respective limits. The dotted lines denote the per-axis time-optimal maneuvers
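For intuition, consider the rest-to-rest special case of such a primitive on a single axis, modeled as a double integrator with a velocity bound v_max and an input (acceleration) bound a_max (symbols chosen here for illustration). The minimum time then follows a bang-coast-bang profile, computed by the sketch below; the paper's primitive additionally handles non-zero final velocities, which this simplification omits.

```python
import math

def axis_min_time(dist, v_max, a_max):
    """Minimum time for one double-integrator axis to travel `dist`
    from rest to rest with |v| <= v_max and |a| <= a_max."""
    dist = abs(dist)
    t_acc = v_max / a_max             # time to reach the velocity limit
    d_acc = 0.5 * a_max * t_acc ** 2  # distance covered while accelerating
    if dist <= 2.0 * d_acc:
        # Triangular profile: the velocity limit is never reached.
        return 2.0 * math.sqrt(dist / a_max)
    # Trapezoidal profile: accelerate, coast at v_max, decelerate.
    return 2.0 * t_acc + (dist - 2.0 * d_acc) / v_max
```

The per-axis maneuvers (the dotted lines in Fig. 5) are then synchronized by stretching the faster axes to the duration of the slowest one.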
Fig. 6 Top view of the planned (left) and executed (center) path at the championship race, and an executed multi-lap path at a testing facility (right). Left: fastest planned path in color, sub-optimal sampled paths in gray. Center: VIO trajectory and drift-corrected estimate (Color figure online)
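The gray sub-optimal sampled paths suggest a simple illustration of the planning idea: sample several candidate traversal points per gate, evaluate segment times with an approximate dynamics model (e.g., the per-axis primitive above), and keep the fastest sequence. The dynamic-programming sketch below is a simplification that assumes each segment time depends only on its two endpoints; the actual planner evaluates paths under approximate drone dynamics and replans in real time.

```python
def plan_min_time_path(gate_samples, segment_time):
    """Pick one traversal point per gate minimizing total time.

    gate_samples : list of lists; gate_samples[i] holds the candidate
                   traversal points for gate i.
    segment_time : callable (p, q) -> approximate flight time from p to q.
    Returns (total_time, chosen_points).
    """
    cost = [[0.0] * len(gate_samples[0])]  # best time to each sample of gate 0
    back = []                              # back-pointers for path recovery
    for i in range(1, len(gate_samples)):
        prev = gate_samples[i - 1]
        layer_cost, layer_back = [], []
        for q in gate_samples[i]:
            times = [cost[-1][j] + segment_time(prev[j], q)
                     for j in range(len(prev))]
            j_best = min(range(len(times)), key=times.__getitem__)
            layer_back.append(j_best)
            layer_cost.append(times[j_best])
        cost.append(layer_cost)
        back.append(layer_back)
    # Backtrack from the cheapest sample of the last gate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total, idx = cost[-1][j], [j]
    for layer_back in reversed(back):
        j = layer_back[j]
        idx.append(j)
    idx.reverse()
    return total, [samples[k] for samples, k in zip(gate_samples, idx)]
```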
Comparison of different network architectures with respect to intersection over union (IoU), precision (Pre.), and recall (Rec.). The index in the architecture name denotes the number of levels in the U-Net. All networks contain one layer per level with kernel sizes of [3, 3, 5, 7, 7] and [12, 18, 24, 32, 32] filters per level. Architectures labelled with 'L' contain twice the number of filters per level. Timings are measured for single input images of size 352 × 592 on a desktop computer equipped with an NVIDIA RTX 2080 Ti
| Arch. | IoU | Pre. | Rec. | #params | Latency [s] |
|---|---|---|---|---|---|
| UNet-5L | 0.966 | 0.997 | 0.967 | 613k | 0.106 |
| UNet-5 | 0.964 | 0.997 | 0.918 | 160k | 0.006 |
| UNet-4L | 0.948 | 0.997 | 0.920 | 207k | 0.085 |
| UNet-4 | 0.941 | 0.989 | 0.862 | 58k | 0.005 |
| UNet-3L | 0.913 | 0.991 | 0.634 | 82k | 0.072 |
| UNet-3 | 0.905 | 0.988 | 0.520 | 27k | 0.005 |
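As a concrete reading of this comparison, the PyTorch sketch below rebuilds the described family: one convolution per level with the listed kernel sizes and filter counts, filters doubled for the 'L' variants, and a 1 × 1 output head whose channel count (arbitrary here) would carry the corner maps and PAFs. This is a plain re-implementation guess, not the authors' code, so parameter counts will not match the table exactly.

```python
import torch
import torch.nn as nn

class UNet(nn.Module):
    """Sketch of the compared U-Net family (assumed details: padding,
    LeakyReLU activations, max-pool/bilinear resampling, output head)."""

    def __init__(self, levels=5, in_ch=3, out_ch=16, large=False):
        super().__init__()
        ks = [3, 3, 5, 7, 7][:levels]
        fs = [12, 18, 24, 32, 32][:levels]
        if large:  # the 'L' variants double the filters per level
            fs = [2 * f for f in fs]
        self.enc = nn.ModuleList()
        ch = in_ch
        for k, f in zip(ks, fs):  # one conv layer per level
            self.enc.append(nn.Sequential(
                nn.Conv2d(ch, f, k, padding=k // 2), nn.LeakyReLU(0.1)))
            ch = f
        self.dec = nn.ModuleList()
        for i in range(levels - 2, -1, -1):  # decoder mirrors the encoder
            self.dec.append(nn.Sequential(
                nn.Conv2d(fs[i + 1] + fs[i], fs[i], ks[i], padding=ks[i] // 2),
                nn.LeakyReLU(0.1)))
        self.head = nn.Conv2d(fs[0], out_ch, 1)  # corner maps + PAF channels
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:
                skips.append(x)   # keep for the skip connection
                x = self.pool(x)  # downsample between levels
        for block, skip in zip(self.dec, reversed(skips)):
            x = torch.cat([self.up(x), skip], dim=1)
            x = block(x)
        return self.head(x)
```

For example, `UNet(levels=4, large=True)` would correspond to the UNet-4L row; a 352 × 592 input works because both dimensions are divisible by 2^(levels-1).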
Total flight time vs. computation time, averaged over 100 runs. The percentage in parentheses is the computation time relative to the computation time for the full track
| | Flight time | Computation time |
|---|---|---|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 (full track) | | |
| CPC, Foehn and Scaramuzza (…) | | |