Abstract
Multi-sensor fusion is important in autonomous driving, and a basic prerequisite for it is calibration between sensors. Such calibration must be accurate and must be performed online. Traditional calibration methods impose strict requirements. In contrast, recent online calibration methods based on convolutional neural networks (CNNs) have moved beyond the limits of these conventional methods. We propose a novel algorithm for online self-calibration between sensors using voxels and three-dimensional (3D) convolution kernels. The proposed approach has the following features: (1) it is intended for calibration between sensors that measure 3D space; (2) the proposed network is capable of end-to-end learning; (3) the input 3D point cloud is converted to voxel information; (4) it uses five networks that process voxel information, and it improves calibration accuracy through iterative refinement of the outputs of the five networks and through temporal filtering. We evaluate the calibration performance of the proposed method on the KITTI and Oxford datasets, where it achieves a rotation error of less than 0.1° and a translation error of less than 1 cm on both.
Keywords: convolutional neural network; online self-calibration; voxel information
Year: 2022 PMID: 36080905 PMCID: PMC9460808 DOI: 10.3390/s22176447
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
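Feature (3) of the abstract, converting the input 3D point cloud to voxel information, can be illustrated with a minimal sketch. It assumes a uniform cubic grid and stores a per-voxel point count; the function name `voxelize`, the `voxel_size` and `origin` parameters, and the choice of a point count as the voxel feature are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Map 3D points to integer voxel indices and count points per voxel.

    points     : iterable of (x, y, z) tuples in metres
    voxel_size : edge length of a cubic voxel in metres (assumed uniform)
    origin     : grid origin in metres (hypothetical parameter)
    """
    counts = Counter()
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell
        index = (
            math.floor((x - origin[0]) / voxel_size),
            math.floor((y - origin[1]) / voxel_size),
            math.floor((z - origin[2]) / voxel_size),
        )
        counts[index] += 1
    return counts


# Three made-up points, 0.5 m voxels
cloud = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.2, 0.0, -0.1)]
grid = voxelize(cloud, voxel_size=0.5)
```

A sparse dictionary keyed by voxel index is used here only because it keeps the sketch short; a dense 3D array would be the more natural input to 3D convolution kernels.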
Figure 1. Overall structure of the proposed network. In the attention module, the T within a circle represents the transpose of a matrix; @ within a circle represents matrix multiplication; S’ within a circle represents the softmax function; C’ within a circle represents concatenation. In the inference network, Trs and Rot represent the translation and rotation parameters predicted by the network, respectively.
Figure 2. Point cloud constituting one frame in the Oxford dataset. The green dots represent points obtained by the right LiDAR, and the red dots represent points obtained by the left LiDAR.
Figure 3. Results of applying the proposed method to a test frame of the KITTI dataset. (a) Transformation by randomly sampled deviations. (b) Transformation by the given calibrated parameters. (c) Transformation by the parameters inferred by Net1. (d) Transformation by the parameters obtained from iterative refinement by the five networks.
Quantitative results of calibration performed on the KITTI dataset without temporal filtering. See footnotes 1,2 for hyper-parameter settings.
| Refinement Stage | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|
| After Net1 ¹ | 0.182 | 0.110 | 0.386 | 2.393 | 1.205 | 1.781 |
| After Net2 ¹ | 0.112 | 0.068 | 0.176 | 1.513 | 1.356 | 1.663 |
| After Net3 ¹ | 0.071 | 0.046 | 0.134 | 1.119 | 0.709 | 1.027 |
| After Net4 ¹ | 0.039 | 0.024 | 0.088 | 0.750 | 0.428 | 0.735 |
| After Net5 ² | 0.024 | 0.018 | 0.060 | 0.472 | 0.272 | 0.448 |

¹ S = 5, (V, V) = (96, 160), G = 1024, (λ1, λ2) = (1, 2), B = 8.
² S = 2.5, (V, V) = (384, 416), G = 128, (λ1, λ2) = (0.5, 5), B = 4.
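The stage-by-stage drop in error in the table above reflects a cascade in which each network refines the estimate left by the previous one. The toy sketch below shows only that control flow. Everything specific in it is an assumption: the residual miscalibration is modelled as a 4×4 homogeneous matrix, corrections are applied by left-multiplication, and `half_step` is a hypothetical stand-in that removes half of the remaining translation, where the paper uses trained networks (Net1 to Net5).

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    """Homogeneous transform with identity rotation and the given offset (m)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def refine(initial_residual, stages):
    """Apply each stage's predicted correction to the running residual."""
    residual = initial_residual
    history = []
    for predict in stages:
        correction = predict(residual)            # stage output
        residual = matmul4(correction, residual)  # assumed composition order
        history.append(residual)
    return residual, history


def half_step(residual):
    # Hypothetical stand-in for a trained network: undo half of the
    # remaining translation and leave the rotation untouched.
    return translation(-0.5 * residual[0][3],
                       -0.5 * residual[1][3],
                       -0.5 * residual[2][3])


# Made-up initial miscalibration of 10 cm, -4 cm and 2 cm
start = translation(0.10, -0.04, 0.02)
final, history = refine(start, [half_step] * 5)
x_error_cm = [100.0 * step[0][3] for step in history]
```

With this stand-in the X residual halves at every stage, so five stages reduce 10 cm to about 0.31 cm; the numbers only illustrate the loop and are unrelated to the reported results.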
Figure 4. Calibration results and error distribution when temporal filtering was applied. (a) Transformation by randomly sampled deviation from Rg1. (b) Transformation by randomly sampled deviation from Rg1. (c) Calibration results from random deviations shown in (a). (d) Calibration results from random deviations shown in (b). (e) Rotation error for the results shown in (c). (f) Rotation error for the results shown in (d). (g) Translation error for the results shown in (c). (h) Translation error for the results shown in (d).
Comparison of calibration performance between our method and other CNN-based methods.
| Method | Range | Bundle Size of Frames | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|---|---|
| RegNet [ | Rg1 | 4541 | 0.24 | 0.25 | 0.36 | 7 | 7 | 4 |
| CalibNet [ | (±10°, ±0.2 m) | - | 0.18 | 0.9 | 0.15 | 4.2 | 1.6 | 7.22 |
| LCCNet [ | Rg1 | 4541 | 0.020 | 0.012 | 0.019 | 0.262 | 0.271 | 0.357 |
| Ours | Rg1 | 4541 | 0.002 | 0.011 | 0.004 | 0.183 | 0.068 | 0.183 |
Figure 5. Changes in loss calculated during the training of Net1 and Net5 on the KITTI dataset. (a) Rotation loss calculated using Equation (10). (b) Translation loss calculated using Equation (11).
Figure 6. Results of applying the proposed method to a test frame of the Oxford dataset. (a) Transformation by randomly sampled deviations. (b) Transformation by the given calibrated parameters. (c) Transformation by the parameters inferred by Net1. (d) Transformation by the parameters obtained from iterative refinement by the five networks.
Figure 7. Calibration results and error distribution when temporal filtering was applied to the Oxford dataset. (a) Transformation by randomly sampled deviations from Rg1. (b) Transformation by randomly sampled deviations from Rg1. (c) Calibration results from random deviations shown in (a). (d) Calibration results from random deviations shown in (b). (e) Rotation error for the results shown in (c). (f) Rotation error for the results shown in (d). (g) Translation error for the results shown in (c). (h) Translation error for the results shown in (d).
Quantitative results of calibration performed on the Oxford dataset without temporal filtering.
| Refinement Stage | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|
| After Net1 | 0.302 | 0.223 | 0.370 | 3.052 | 4.440 | 3.603 |
| After Net2 | 0.249 | 0.262 | 0.266 | 1.048 | 2.155 | 2.240 |
| After Net3 | 0.136 | 0.068 | 0.099 | 1.469 | 1.191 | 1.348 |
| After Net4 | 0.072 | 0.036 | 0.073 | 0.632 | 0.809 | 0.985 |
| After Net5 | 0.056 | 0.029 | 0.082 | 0.520 | 0.628 | 0.350 |
Quantitative results of calibration on the Oxford dataset with temporal filtering.
| Method | Range | Bundle Size of Frames | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|---|---|
| Ours | Rg1 | 100 | 0.035 | 0.017 | 0.060 | 0.277 | 0.305 | 0.247 |
Figure 8. Changes in calculated losses during Net1 and Net5 training on the Oxford dataset. (a) Rotation loss calculated using Equation (10). (b) Translation loss calculated using Equation (11).
Comparison of calibration performance according to the cropped area on the Oxford dataset.
| Size of Area to be Cropped | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|
| N/A | 0.038 | 0.030 | 0.070 | 1.922 | 0.868 | 0.476 |
| [−5 to 5 m, −2 to 1 m, −5 to 5 m] | 0.033 | 0.027 | 0.062 | 0.538 | 0.668 | 0.564 |
| [−10 to 10 m, −2 to 1 m, −10 to 10 m] | 0.033 | 0.025 | 0.054 | 0.490 | 1.109 | 0.496 |
Comparison of calibration performance according to S on the KITTI dataset.
| Hyper-Parameter Setting | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|
| For the Combination of Net1 and Rg1 | | | | | | |
| | 0.228 | 0.166 | 0.421 | 3.103 | 1.681 | 2.155 |
| | 0.199 | 0.199 | 0.429 | 2.881 | 1.613 | 2.514 |
| | 0.182 | 0.110 | 0.386 | 2.393 | 1.205 | 1.781 |
| | 0.295 | 0.206 | 0.595 | 4.489 | 2.288 | 2.840 |
| For the Combination of Net5 and Rg5 | | | | | | |
| | 0.344 | 0.019 | 0.063 | 0.778 | 0.429 | 0.887 |
| | 0.030 | 0.020 | 0.059 | 0.646 | 0.487 | 0.776 |
| | 0.028 | 0.017 | 0.070 | 0.610 | 0.363 | 0.702 |
| | 0.023 | 0.016 | 0.045 | 0.450 | 0.312 | 0.537 |
Comparison of calibration performance according to S on the Oxford dataset.
| Hyper-Parameter Setting | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|
| For the Combination of Net1 and Rg1 | | | | | | |
| | 0.382 | 0.328 | 0.436 | 2.606 | 8.114 | 2.881 |
| | 0.415 | 0.263 | 0.433 | 3.542 | 7.574 | 4.151 |
| | 0.302 | 0.223 | 0.370 | 3.052 | 4.440 | 3.603 |
| | - | - | - | - | - | - |
| For the Combination of Net5 and Rg5 | | | | | | |
| | 0.046 | 0.031 | 0.085 | 0.626 | 1.431 | 0.610 |
| | 0.031 | 0.228 | 0.535 | 0.448 | 1.357 | 0.457 |
| | 0.033 | 0.027 | 0.062 | 0.538 | 0.668 | 0.564 |
| | 0.036 | 0.025 | 0.057 | 0.552 | 0.699 | 0.539 |
Comparison of calibration performance according to the bundle size of frames for temporal filtering on the KITTI dataset.
| Bundle Size of Frames | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|
| 1 | 0.024 | 0.017 | 0.057 | 0.414 | 0.257 | 0.395 |
| 10 | 0.009 | 0.013 | 0.018 | 0.210 | 0.102 | 0.245 |
| 25 | 0.006 | 0.011 | 0.013 | 0.176 | 0.080 | 0.197 |
| 50 | 0.004 | 0.011 | 0.008 | 0.170 | 0.070 | 0.190 |
| 100 | 0.003 | 0.011 | 0.006 | 0.175 | 0.069 | 0.195 |
Comparison of calibration performance according to the bundle size of frames for temporal filtering on the Oxford dataset.
| Bundle Size of Frames | Roll Error (°) | Pitch Error (°) | Yaw Error (°) | X Error (cm) | Y Error (cm) | Z Error (cm) |
|---|---|---|---|---|---|---|
| 1 | 0.055 | 0.028 | 0.080 | 0.536 | 0.532 | 0.330 |
| 10 | 0.049 | 0.024 | 0.066 | 0.363 | 0.335 | 0.305 |
| 25 | 0.044 | 0.022 | 0.066 | 0.334 | 0.303 | 0.272 |
| 50 | 0.039 | 0.019 | 0.065 | 0.290 | 0.286 | 0.269 |
| 100 | 0.035 | 0.017 | 0.060 | 0.277 | 0.305 | 0.247 |
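The two bundle-size tables above show errors generally falling as more frames are combined by temporal filtering. This record does not say which filter is used, so the sketch below assumes a per-parameter median over non-overlapping bundles of consecutive frames. The function name `temporal_filter`, the tuple layout (roll, pitch, yaw, x, y, z), and the median itself are assumptions made for illustration.

```python
from statistics import median


def temporal_filter(frame_predictions, bundle_size):
    """Aggregate per-frame 6-DoF predictions over non-overlapping bundles.

    frame_predictions : list of (roll, pitch, yaw, x, y, z) tuples, one per frame
    bundle_size       : number of consecutive frames combined into one estimate
    Returns one filtered 6-tuple per complete bundle (assumed median filter).
    """
    filtered = []
    last_start = len(frame_predictions) - bundle_size
    for start in range(0, last_start + 1, bundle_size):
        bundle = frame_predictions[start:start + bundle_size]
        # zip(*bundle) regroups the values by parameter instead of by frame
        filtered.append(tuple(median(axis) for axis in zip(*bundle)))
    return filtered


# Three made-up frames; the third contains outliers in roll and y
predictions = [
    (0.0, 0.1, 0.2, 1.0, 2.0, 3.0),
    (0.2, 0.1, 0.0, 1.2, 2.2, 2.8),
    (5.0, 0.1, 0.1, 1.1, 9.0, 3.1),
]
result = temporal_filter(predictions, bundle_size=3)
```

A median was chosen for the sketch because a single bad frame (the roll of 5.0 and y of 9.0 above) does not pull the bundle estimate away, whereas a mean would; whether the paper makes the same choice cannot be told from this record.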