Nivesh Gadipudi, Irraivan Elamvazuthi, Cheng-Kai Lu, Sivajothi Paramasivam, Steven Su.
Abstract
Visual odometry is the process of estimating the incremental localization of a camera in 3-dimensional space for autonomous driving. Recent learning-based methods do not require camera calibration and are robust to external noise. In this work, a new method that does not require camera calibration, called the "windowed pose optimization network" (WPO-Net), is proposed to estimate the 6 degrees of freedom pose of a monocular camera. The architecture follows supervised learning-based methods, with a feature encoder and a pose regressor. At each training step it takes multiple consecutive stacks of two grayscale images and enforces composite pose constraints. The KITTI dataset is used to evaluate the performance of the proposed method. The proposed method yielded a rotational error of 3.12 deg/100 m, with a training time of 41.32 ms and an inference time of 7.87 ms. Experiments demonstrate that the proposed method performs competitively with other state-of-the-art related works, which shows the novelty of the proposed technique.
Keywords: deep learning; pose estimation; pose optimization; visual odometry
Year: 2021 PMID: 34884156 PMCID: PMC8662456 DOI: 10.3390/s21238155
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Architecture of the feature encoder. The filter’s size decreases as the depth of the network increases.
| Layer | Kernel Size | Channels | Stride | Dilation |
|---|---|---|---|---|
| Input | - | 2 | - | - |
| Layer-1 | 3 × 9 | 16 | 2 | 2 |
| Layer-2 | 3 × 9 | 16 | 2 | 1 |
| Layer-3 | 3 × 7 | 32 | 2 | 2 |
| Layer-4 | 3 × 7 | 32 | 2 | 1 |
| Layer-5 | 3 × 5 | 64 | 1 | 2 |
| Layer-6 | 3 × 5 | 64 | 1 | 1 |
| Layer-7 | 2 × 2 | 64 | 2 | 1 |
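The kernel sizes, strides, and dilations in the table above determine how much of the input each output unit can see. The following is a minimal sketch of that receptive-field arithmetic. It assumes that "3 × 9" means height × width and that every row behaves like a convolution-style sliding window; neither is stated in the source, and the names `LAYERS` and `receptive_fields` are illustrative.

```python
# Layer table from the text: (name, (kernel_h, kernel_w), stride, dilation).
# Assumption: "3 x 9" is height x width; the source does not say which is which.
LAYERS = [
    ("Layer-1", (3, 9), 2, 2),
    ("Layer-2", (3, 9), 2, 1),
    ("Layer-3", (3, 7), 2, 2),
    ("Layer-4", (3, 7), 2, 1),
    ("Layer-5", (3, 5), 1, 2),
    ("Layer-6", (3, 5), 1, 1),
    ("Layer-7", (2, 2), 2, 1),
]


def receptive_fields(layers):
    """Return (name, rf_height, rf_width, cumulative_stride) after each layer."""
    rf_h, rf_w, jump = 1, 1, 1  # jump = distance in input pixels between adjacent units
    result = []
    for name, (k_h, k_w), stride, dilation in layers:
        # A dilated kernel spans (k - 1) * dilation + 1 units of the previous layer.
        rf_h += (k_h - 1) * dilation * jump
        rf_w += (k_w - 1) * dilation * jump
        jump *= stride
        result.append((name, rf_h, rf_w, jump))
    return result


if __name__ == "__main__":
    for name, rf_h, rf_w, jump in receptive_fields(LAYERS):
        print(f"{name}: receptive field {rf_h} x {rf_w}, cumulative stride {jump}")
```

Under these assumptions the receptive field is much wider than it is tall. The result does not depend on padding, which the table does not specify.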
Figure 1. Overview of WPO-Net. A composition layer is used to derive the composite poses from the predicted poses.
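Deriving composite poses from predicted frame-to-frame poses can be illustrated by chaining homogeneous transforms. The sketch below is not the authors' composition layer. It assumes each predicted pose is a 4 × 4 transform of frame t+1 expressed in frame t, and it restricts rotation to yaw for brevity. The functions `pose_matrix`, `compose`, and `composite_poses` are hypothetical names.

```python
import math


def pose_matrix(yaw, tx, ty, tz):
    """4x4 homogeneous transform: rotation about the Y axis plus a translation.

    Assumption: yaw-only rotation keeps the sketch short; a full 6-DoF pose
    would use a general 3x3 rotation block.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, 0.0, s, tx],
        [0.0, 1.0, 0.0, ty],
        [-s, 0.0, c, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a @ b for 4x4 nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def composite_poses(relative):
    """Chain consecutive relative poses into multi-step poses.

    relative[t] is the pose of frame t+1 in frame t (assumed convention).
    Returns {(i, j): T_i_to_j} for every pair with j > i + 1.
    """
    composites = {}
    for i in range(len(relative)):
        acc = relative[i]
        for j in range(i + 1, len(relative)):
            acc = compose(acc, relative[j])
            composites[(i, j + 1)] = acc
    return composites


if __name__ == "__main__":
    step = pose_matrix(0.1, 0.0, 0.0, 1.0)  # 1 m forward with a slight turn
    for span, pose in composite_poses([step, step, step]).items():
        print(span, [round(row[3], 3) for row in pose[:3]])
```

A constraint on composite poses would then compare each chained transform with the corresponding ground-truth transform across the same span of frames. How the paper weights or formulates that comparison is not given in this section.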
Effects of varying quantities of DA on WPO-Net (the least error results are highlighted in bold text).
| DA | ATE | Trans | Rot |
|---|---|---|---|
| 0 | 91.82 | 12.82 | 5.07 |
| 10 | 57.52 | | 3.27 |
| 20 | 96.81 | 9.31 | 3.91 |
| 30 | | 8.57 | |
| 40 | 94.03 | 9.70 | 3.49 |
| 50 | 79.06 | 9.38 | 3.28 |
Figure 2. Trajectories of sequences 09 (a) and 10 (b) under different data augmentation (DA) quantities. The X- and Y-axes of the plots represent motion along the Z (forward) and X (left/right) axes of the vehicle in the vehicular frame.
Figure 3. Comparison of rotational and translational errors for different DA quantities at subsamples of varying length (100 m, 200 m, 300 m, …, 800 m) of sequences 09 and 10.
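Evaluating errors over subsamples of 100 m to 800 m requires locating, for each start frame, the frame at which the travelled distance first reaches the target length. The sketch below shows one way to do that bookkeeping in the style of the KITTI odometry evaluation. It is an assumption that the paper follows exactly this procedure, and `cumulative_distance` and `subsequence_end` are illustrative names.

```python
import math

# Subsample lengths named in the figure caption, in metres.
LENGTHS_M = [100, 200, 300, 400, 500, 600, 700, 800]


def cumulative_distance(positions):
    """Path length travelled up to each frame, from a list of (x, y, z) positions."""
    dist = [0.0]
    for p, q in zip(positions[:-1], positions[1:]):
        dist.append(dist[-1] + math.dist(p, q))
    return dist


def subsequence_end(dist, start, length_m):
    """Index of the first frame at least length_m metres past frame start.

    Returns None when the trajectory ends before that distance is covered.
    """
    target = dist[start] + length_m
    for i in range(start, len(dist)):
        if dist[i] >= target:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical straight-line trajectory with 1 m between frames.
    positions = [(0.0, 0.0, float(i)) for i in range(1001)]
    dist = cumulative_distance(positions)
    for length in LENGTHS_M:
        print(length, subsequence_end(dist, 0, length))
```

The rotational and translational errors for a given length would then be computed between the start and end frames of each subsample and averaged, typically normalised by the subsample length.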
Comparative results on the KITTI benchmark (data are taken from the corresponding works/citations; the least error results are highlighted in bold text).
| Method | Seq. 09 Trans | Seq. 09 Rot | Seq. 10 Trans | Seq. 10 Rot | Avg Trans | Avg Rot |
|---|---|---|---|---|---|---|
| VISO2M | | | 41.60 | 32.99 | 24.34 | 17.07 |
| ORB-SLAM | - | - | 86.51 | 98.90 | 30.01 | 35.53 |
| Flowdometry | 12.64 | 8.04 | 11.65 | 7.28 | 11.42 | 6.92 |
| DeepVO | - | - | 8.11 | 8.83 | | 6.12 |
| SfM Learner | 17.84 | 6.78 | 37.91 | 17.78 | 27.88 | 12.28 |
| GeoNet | 43.76 | 16.00 | 35.6 | 13.80 | 39.68 | 14.90 |
| Zhan et al. | 11.92 | 3.60 | 12.62 | 3.43 | 12.27 | 3.52 |
| Wang et al. | 9.30 | 3.50 | | 3.90 | 8.26 | 3.70 |
| SC-SfM | 11.20 | 3.35 | 10.10 | 4.96 | 10.65 | 4.16 |
| CM-VO | 9.69 | 3.37 | 10.01 | 4.87 | 9.85 | 4.12 |
| WPO-Net (proposed) | 8.19 | 3.02 | 8.95 | | 8.57 | |
Figure 4. An illustration of the number of images taken as input to the network for .
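The window-size results in this section list one forward pass for a window of 2 images and two forward passes for a window of 3, which is consistent with one pass per consecutive image pair. The sketch below enumerates those pairs for a sliding window. It assumes the window advances one frame at a time, which the source does not state, and `window_pairs` is a hypothetical helper name.

```python
def window_pairs(frames, window_size):
    """For each sliding window, list the consecutive frame pairs fed to the network.

    Each pair corresponds to one two-image stack, so a window of size WS
    produces WS - 1 pairs (forward passes).
    Assumption: the window slides by one frame per step.
    """
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    windows = []
    for start in range(len(frames) - window_size + 1):
        window = frames[start:start + window_size]
        windows.append(list(zip(window[:-1], window[1:])))
    return windows


if __name__ == "__main__":
    frames = ["img0", "img1", "img2", "img3"]  # placeholder frame identifiers
    for ws in (2, 3, 4):
        first = window_pairs(frames, ws)[0]
        print(f"WS={ws}: {len(first)} forward pass(es), pairs={first}")
```

With a window of 3, the two predicted poses in a window can additionally be chained into one composite pose spanning the first and last frame.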
Effects of different window sizes on WPO-Net (the least error results are highlighted in bold text).
| WS | Forward Passes | ATE | Trans | Rot |
|---|---|---|---|---|
| 2 (no WPO) | 1 | 98.30 | 12.95 | 4.79 |
| 3 | 2 | 84.25 | 9.41 | 3.48 |
Figure 5. Trajectories of sequences 09 (a) and 10 (b) under different window sizes. The X- and Y-axes of the plots represent motion along the Z (forward) and X (left/right) axes of the vehicle in the vehicular frame.