Abstract
Dense depth perception is critical for many applications. However, LiDAR sensors can only provide sparse depth measurements, so completing the sparse LiDAR data becomes an important task. Because RGB images contain rich textural information, researchers commonly use synchronized RGB images to guide depth completion. However, most existing depth completion methods simply fuse LiDAR information with RGB image information through feature concatenation or element-wise addition. In view of this, this paper proposes a method that adaptively fuses the information from the two sensors by generating different convolutional kernels according to the content and positions of the feature vectors. Specifically, we divided the features into blocks and utilized an attention network to generate a different kernel weight for each block; these kernels were then applied to fuse the multi-modal features. On the KITTI depth completion dataset, our method outperformed the state-of-the-art FCFR-Net by 0.01 on the inverse mean absolute error (iMAE) metric. Furthermore, our method achieved a good balance between runtime and accuracy, which makes it more suitable for real-time applications.
Keywords: adaptive mechanism; convolutional neural networks; depth completion; depth estimation; multi-modal fusion
Year: 2022 PMID: 35746385 PMCID: PMC9227403 DOI: 10.3390/s22124603
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. The adaptive fusion mechanism: (a) the overall flow of our adaptive fusion mechanism (for image feature I and depth map feature S, the adaptive kernel mechanism dynamically generates a kernel to fuse I and S and outputs the new depth feature); (b) the adaptive kernel generation.
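In outline, the mechanism pools each modality into block descriptors, lets a small attention network turn each descriptor into a fusion kernel, and applies that kernel to the pixels of its block. Below is a minimal PyTorch sketch of this idea; the module name `AdaptiveBlockFusion`, the 8 × 8 block grid, and the reduction of each block's kernel to per-channel 1 × 1 weights are illustrative assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveBlockFusion(nn.Module):
    """Block-wise adaptive fusion sketch: an attention branch inspects pooled
    block descriptors of both modalities and generates one fusion kernel
    (here reduced to per-channel 1x1 weights) for every spatial block."""

    def __init__(self, channels: int, grid: int = 8):
        super().__init__()
        self.grid = grid  # the feature map is divided into grid x grid blocks
        # small attention network: block descriptor -> per-block kernel weights
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, kernel_size=1),
        )

    def forward(self, img_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # one descriptor per block: average-pool the concatenated modalities
        desc = F.adaptive_avg_pool2d(
            torch.cat([img_feat, depth_feat], dim=1), (self.grid, self.grid))
        # generate a pair of per-channel kernels for every block and
        # normalise them so the two modality weights sum to one
        w_img, w_dep = torch.chunk(self.attn(desc), 2, dim=1)
        w_img, w_dep = torch.softmax(torch.stack([w_img, w_dep]), dim=0)
        # broadcast each block's kernel over the pixels it covers
        size = img_feat.shape[-2:]
        w_img = F.interpolate(w_img, size=size, mode="nearest")
        w_dep = F.interpolate(w_dep, size=size, mode="nearest")
        return w_img * img_feat + w_dep * depth_feat
```

With features I and S of shape (B, 64, H, W), `fused = AdaptiveBlockFusion(64)(I, S)` returns a fused map of the same shape; the softmax across the two modalities makes every block trade image evidence off against depth evidence rather than scaling both freely.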
Figure 2. The network architectures: (a) the Net1 architecture; (b) the Net2 architecture.
Performance comparison using the KITTI testing dataset. We tested our method using the Net1 architecture. The results were evaluated by the KITTI testing server and the methods are ranked by RMSE. “Additional information” means that the algorithm was trained with additional data or labels. Note that most of the runtimes in the table were taken from the original papers and were measured on different GPUs, so only rough comparisons can be drawn.
| Methods | Additional Information | RMSE (mm) | MAE (mm) | iRMSE (1/km) | iMAE (1/km) | Time (s) |
|---|---|---|---|---|---|---|
| DFuseNet [ ] | × | 1206.66 | 429.93 | 3.62 | 1.79 | 0.08 |
| CSPN [26] | × | 1019.64 | 279.46 | 2.93 | 1.15 | 1 |
| HMS-Net [ ] | × | 937.48 | 258.48 | 2.93 | 1.14 | - |
| Sparse-to-dense [ ] | × | 814.73 | 249.95 | 2.80 | 1.21 | 0.08 |
| CrossGuidance [ ] | × | 807.42 | 253.98 | 2.73 | 1.33 | 0.2 |
| PwP [ ] | ✓ | 777.05 | 235.17 | 2.42 | 1.13 | 0.1 |
| DSPN [ ] | × | 766.74 | 220.36 | 2.47 | 1.03 | 0.34 |
| DeepLiDAR [ ] | ✓ | 758.38 | 226.50 | 2.56 | 1.15 | 0.07 |
| UberATG-FuseNet [ ] | × | 752.88 | 221.19 | 2.34 | 1.14 | 0.09 |
| CSPN++ [ ] | × | 743.69 | 209.28 | 2.07 | 0.90 | 0.2 |
| NLSPN [ ] | × | 741.68 | 199.59 | 1.99 | 0.84 | 0.20 |
| Ours | × | 740.16 | 215.69 | 2.28 | 0.97 | |
| GuidedNet [ ] | × | 736.24 | 218.83 | 2.25 | 0.99 | 0.14 |
| FCFR-Net [ ] | × | 735.81 | 217.15 | 2.20 | 0.98 | 0.13 |
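For reference, the four columns are the standard KITTI depth completion measures, computed over the set $V$ of pixels with ground-truth depth $d_v$ and prediction $\hat{d}_v$ (depths in mm for RMSE/MAE, inverse depths in 1/km for iRMSE/iMAE; lower is better for all four):

```latex
\begin{aligned}
\mathrm{RMSE}  &= \sqrt{\frac{1}{|V|}\sum_{v \in V}\bigl(\hat{d}_v - d_v\bigr)^{2}}, &\qquad
\mathrm{MAE}   &= \frac{1}{|V|}\sum_{v \in V}\bigl|\hat{d}_v - d_v\bigr|, \\
\mathrm{iRMSE} &= \sqrt{\frac{1}{|V|}\sum_{v \in V}\Bigl(\frac{1}{\hat{d}_v} - \frac{1}{d_v}\Bigr)^{2}}, &\qquad
\mathrm{iMAE}  &= \frac{1}{|V|}\sum_{v \in V}\Bigl|\frac{1}{\hat{d}_v} - \frac{1}{d_v}\Bigr|.
\end{aligned}
```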
Figure 3. Qualitative comparisons with CSPN [26] and NConv-CNN [12] using the KITTI testing dataset. The results are from the KITTI depth completion leaderboard, in which the depth images are colorized according to the depth range. Our method achieved better performance and recovered finer details (see the regions framed by the white boxes).
Performance comparison using the NYUv2 dataset. Both the 200-sample and 500-sample settings were evaluated.
| Samples | Methods | RMSE (m) |
|---|---|---|
| 500 | Sparse-to-dense [ ] | 0.204 |
| | NConv-CNN [12] | 0.129 |
| | CSPN [26] | 0.117 |
| | DeepLiDAR [ ] | 0.115 |
| | Ours | 0.104 |
| | GuidedNet [ ] | 0.101 |
| 200 | Sparse-to-dense [ ] | 0.230 |
| | NConv-CNN [12] | 0.173 |
| | GuidedNet [ ] | 0.142 |
The results of the different fusion methods using the KITTI validation dataset.
| Net Architecture | Fusion Methods | RMSE (mm) | MAE (mm) | iRMSE (1/km) | iMAE (1/km) |
|---|---|---|---|---|---|
| Net1 | Add | 811.51 | 237.32 | 3.61 | 1.20 |
| | Concat | 798.30 | 232.63 | 2.30 | 1.06 |
| | Adaptive (ours) | | | | |
| Net2 | Add | 812.59 | 239.48 | 3.67 | 1.24 |
| | Concat | 803.77 | 233.67 | 2.49 | 1.08 |
| | Adaptive (ours) | | | | |
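The “Add” and “Concat” rows denote the two standard fusion operators against which the adaptive mechanism is compared. A minimal sketch of both baselines follows (PyTorch assumed; the 1 × 1 projection after concatenation, used here to restore the original channel width, is an illustrative choice rather than a detail taken from the paper):

```python
import torch
import torch.nn as nn

class AddFusion(nn.Module):
    """The "Add" baseline: element-wise addition of the two feature maps."""
    def forward(self, img_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        return img_feat + depth_feat

class ConcatFusion(nn.Module):
    """The "Concat" baseline: channel concatenation followed by a 1x1
    convolution that maps the doubled channels back to the original width."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, img_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([img_feat, depth_feat], dim=1))
```

Both operators apply the same weights at every pixel, which is precisely what the content- and position-dependent adaptive kernels are meant to relax.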
Figure 4. Robustness tests using the validation dataset of the KITTI depth completion benchmark (Net1 architecture): (a) randomly added noise as a percentage of the input data; (b) randomly discarded data as a percentage of the input data; (c) varying degrees of noise added to 10 percent of the input data (0 means no noise was added).
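A sketch of the perturbations behind panels (a)-(c), assuming a sparse depth map in which zeros mark missing LiDAR returns; the helper `perturb_sparse_depth` and the Gaussian noise model are illustrative assumptions, not the authors' exact protocol:

```python
import torch

def perturb_sparse_depth(depth: torch.Tensor,
                         noise_frac: float = 0.0,
                         drop_frac: float = 0.0,
                         sigma: float = 0.0) -> torch.Tensor:
    """Corrupt a fraction of the valid LiDAR points with Gaussian noise
    and randomly discard another fraction, leaving the rest untouched."""
    valid = depth > 0  # zeros denote missing measurements
    out = depth.clone()
    # (a)/(c): add zero-mean Gaussian noise (std = sigma) to a random
    # `noise_frac` share of the valid points
    noisy = valid & (torch.rand_like(depth) < noise_frac)
    out[noisy] = out[noisy] + sigma * torch.randn_like(out[noisy])
    # (b): randomly discard a `drop_frac` share of the valid points
    dropped = valid & (torch.rand_like(depth) < drop_frac)
    out[dropped] = 0.0
    return out
```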