Yongseop Jeong, Jinsun Park, Donghyeon Cho, Yoonjin Hwang, Seibum B. Choi, In So Kweon.
Abstract
Depth perception is one of the essential requirements for various autonomous driving platforms. However, accurate depth estimation in real-world settings remains a challenging problem due to high computational costs. In this paper, we propose a lightweight depth completion network for depth perception in real-world environments. To effectively transfer a teacher's knowledge useful for depth completion, we introduce local similarity-preserving knowledge distillation (LSPKD), which allows similarities between local neighbors to be transferred during distillation. With LSPKD, a lightweight student network is precisely guided by a heavy teacher network, regardless of the density of the ground-truth data. Experimental results demonstrate that our method effectively reduces computational costs during both training and inference while achieving superior performance over other lightweight networks.
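The LSPKD idea described in the abstract, transferring similarities between local neighbors from teacher to student, can be sketched as follows. This is a minimal NumPy illustration rather than the paper's exact formulation: the neighborhood radius, the cosine-similarity choice, the wrap-around border handling, and the MSE penalty are all assumptions.

```python
import numpy as np

def local_similarity(feat, radius=1):
    """Cosine similarity between each spatial location and its local
    neighbors within `radius` (hypothetical formulation).
    feat: (C, H, W) feature map. Borders wrap around for simplicity."""
    # L2-normalize channel vectors so dot products become cosine similarities.
    norm = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    sims = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            # Shift the map so each pixel lines up with one neighbor.
            shifted = np.roll(np.roll(norm, dy, axis=1), dx, axis=2)
            sims.append((norm * shifted).sum(axis=0))  # (H, W) similarity map
    return np.stack(sims)  # (num_neighbors, H, W)

def lspkd_loss(teacher_feat, student_feat, radius=1):
    """Penalize mismatch between teacher and student local-similarity maps."""
    t = local_similarity(teacher_feat, radius)
    s = local_similarity(student_feat, radius)
    return np.mean((t - s) ** 2)
```

Because the loss compares similarity *structures* rather than raw activations, the student is not forced to copy the teacher's feature values, only the relations among neighboring locations.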
Keywords: depth completion; knowledge distillation; local similarity; model compression; multimodal learning; sensor fusion
Year: 2022 PMID: 36236485 PMCID: PMC9573132 DOI: 10.3390/s22197388
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. The overall pipeline of the proposed algorithm. The ResNet34-based teacher network consists of two separate encoders, for RGB and LiDAR, and a decoder for depth prediction. The output feature dimensions of each layer are shown. Encoder features from RGB and LiDAR are concatenated and fed into the decoder. Skip connections deliver encoder features to the decoder layers by concatenation. The ResNet18-based student network is distilled with knowledge from the teacher network.
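The two-encoder fusion pattern in the caption can be illustrated with toy shapes. Here `encode` is a placeholder stand-in (average pooling plus channel repetition), not a ResNet stage; only the concatenation-based fusion of the two modality branches is the point.

```python
import numpy as np

def encode(x, out_ch):
    """Toy encoder stage: downsample by 2 and change the channel count.
    A placeholder for a ResNet conv block, for shape illustration only."""
    C, H, W = x.shape
    pooled = x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))
    return np.repeat(pooled, out_ch // C + 1, axis=0)[:out_ch]

rgb = np.zeros((3, 64, 64))      # RGB image
lidar = np.zeros((1, 64, 64))    # sparse LiDAR depth map

f_rgb = encode(rgb, 32)          # (32, 32, 32)
f_lidar = encode(lidar, 32)      # (32, 32, 32)

# Fusion: concatenate the modality features along the channel axis.
# The decoder (not shown) then upsamples back to full resolution,
# concatenating encoder features at each scale via skip connections.
fused = np.concatenate([f_rgb, f_lidar], axis=0)   # (64, 32, 32)
```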
Quantitative evaluation results on the KITTI DC validation dataset [32] (T: Teacher, S: Student, D: Distilled).
| Network | # Params (M) / GFLOPs (912 × 220) | Distillation | RMSE (mm) | MAE (mm) | iRMSE (1/km) | iMAE (1/km) |
|---|---|---|---|---|---|---|
| Self S2D [ ] | 26.11 / 637.89 | - | 878.6 | 260.9 | 3.3 | 1.3 |
| ResNet34 (T) | 51.77 / 349.36 | - | 865.2 | 222.1 | 2.4 | 1.0 |
| ResNet18 (S) | 8.56 / 59.22 | - | 921.5 | 233.3 | 2.7 | 1.0 |
| ResNet18 (D) | 8.56 / 59.22 | PROB [30] | 902.6 | 243.3 | 8.5 | 1.1 |
| ResNet18 (D) | 8.56 / 59.22 | ATT [31] | 907.6 | 245.0 | 2.7 | 1.1 |
| ResNet18 (D) | 8.56 / 59.22 | Ours | 893.0 | 234.9 | 2.8 | 1.0 |
| ResNet18 (D) | 8.56 / 59.22 | Ours + PROB | 893.7 | 238.6 | 2.6 | 1.0 |
| ResNet18 (D) | 8.56 / 59.22 | Ours + ATT | 893.3 | 243.5 | 2.6 | 1.0 |
| ResNet18 (D) | 8.56 / 59.22 | Ours + PROB + ATT | 891.8 | 238.6 | 2.7 | 1.0 |
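For reference, the four KITTI DC metrics in the table above can be computed as follows. This assumes the conventional KITTI depth-completion units (depth in mm, inverse-depth metrics in 1/km) and evaluation only on pixels with valid ground truth.

```python
import numpy as np

def kitti_dc_metrics(pred_mm, gt_mm):
    """Standard KITTI depth-completion metrics on valid-GT pixels.
    Depths in millimetres; inverse metrics reported in 1/km."""
    mask = gt_mm > 0                      # only pixels with ground truth
    p, g = pred_mm[mask], gt_mm[mask]
    rmse = np.sqrt(np.mean((p - g) ** 2))            # mm
    mae = np.mean(np.abs(p - g))                     # mm
    # Inverse depth in 1/km: d is in mm, so 1/(d * 1e-6 km) = 1e6 / d.
    ip, ig = 1e6 / p, 1e6 / g
    irmse = np.sqrt(np.mean((ip - ig) ** 2))         # 1/km
    imae = np.mean(np.abs(ip - ig))                  # 1/km
    return rmse, mae, irmse, imae
```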
Figure 2. Depth prediction results on the KITTI DC validation dataset [32]. (a) RGB. (b) Sparse Depth. (c) GT. (d) Teacher. (e) Student. (f) PROB [30]. (g) ATT [31]. (h) Ours.
Quantitative evaluation results on the NYUv2 validation dataset [28] (T: Teacher, S: Student, D: Distilled).
| Network | # Params (M) / GFLOPs (304 × 228) | Distillation | RMSE (mm) | REL | δ < 1.25 | δ < 1.25² | δ < 1.25³ |
|---|---|---|---|---|---|---|---|
| S2D + SPN [ ] | 31.88 / 24.53 | - | 172.0 | 0.0310 | 0.9710 | 0.9940 | 0.9980 |
| DeepLiDAR [ ] | 143.98 / 502.12 | - | 115.0 | 0.0220 | 0.9930 | 0.9990 | 1.0000 |
| ResNet34 (T) | 51.77 / 112.32 | - | 114.4 | 0.0184 | 0.9932 | 0.9989 | 0.9998 |
| ResNet18 (S) | 0.66 / 1.46 | - | 152.1 | 0.0282 | 0.9875 | 0.9978 | 0.9995 |
| ResNet18 (D) | 0.66 / 1.46 | PROB [30] | 154.8 | 0.0328 | 0.9891 | 0.9982 | 0.9996 |
| ResNet18 (D) | 0.66 / 1.46 | ATT [31] | 149.9 | 0.0302 | 0.9891 | 0.9982 | 0.9996 |
| ResNet18 (D) | 0.66 / 1.46 | Ours | 138.8 | 0.0248 | 0.9899 | 0.9984 | 0.9997 |
| ResNet18 (D) | 0.66 / 1.46 | Ours + PROB | 138.9 | 0.0249 | 0.9900 | 0.9984 | 0.9997 |
| ResNet18 (D) | 0.66 / 1.46 | Ours + ATT | 138.7 | 0.0248 | 0.9899 | 0.9984 | 0.9997 |
| ResNet18 (D) | 0.66 / 1.46 | Ours + PROB + ATT | 143.6 | 0.0268 | 0.9899 | 0.9984 | 0.9997 |
Performance comparison with various combinations of layers for the distillation on the KITTI DC validation dataset [32].
(✓ marks the encoder/decoder layers whose features are used for distillation; Enc-k/Dec-k denotes the k-th encoder/decoder layer.)
| Enc-1 | Enc-2 | Enc-3 | Enc-4 | Enc-5 | Dec-1 | Dec-2 | Dec-3 | Dec-4 | Dec-5 | RMSE | MAE | iRMSE | iMAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | - | ✓ | ✓ | ✓ | - | - | - | - | - | 899.3 | 241.8 | 2.7 | 1.1 |
| - | - | - | - | - | ✓ | ✓ | ✓ | - | - | 897.1 | 239.9 | 2.9 | 1.0 |
| - | ✓ | ✓ | ✓ | - | - | - | - | - | - | 894.4 | 235.5 | 2.5 | 1.0 |
| - | - | - | - | - | - | ✓ | ✓ | ✓ | - | 896.4 | 237.9 | 2.6 | 1.0 |
| ✓ | ✓ | ✓ | - | - | - | - | - | - | - | 901.7 | 239.4 | 2.8 | 1.1 |
| - | - | - | - | - | - | - | ✓ | ✓ | ✓ | 899.8 | 242.0 | 2.6 | 1.0 |
| ✓ | ✓ | ✓ | ✓ | - | - | - | - | - | - | 898.1 | 237.2 | 2.6 | 1.0 |
| - | - | - | - | - | - | ✓ | ✓ | ✓ | ✓ | 894.1 | 238.3 | 2.6 | 1.0 |
| - | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | - | 902.8 | 236.7 | 2.7 | 1.0 |
| - | ✓ | ✓ | ✓ | - | - | ✓ | ✓ | ✓ | - | 893.0 | 234.9 | 2.8 | 1.0 |
| ✓ | ✓ | ✓ | - | - | - | - | ✓ | ✓ | ✓ | 898.7 | 236.8 | 2.6 | 1.0 |
| ✓ | ✓ | ✓ | ✓ | - | - | ✓ | ✓ | ✓ | ✓ | 894.9 | 235.7 | 2.6 | 1.0 |
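The ablation above varies which encoder and decoder layers contribute a distillation term. Combining per-layer losses under such a layer mask can be sketched as follows; the function and mask names are illustrative, not from the paper.

```python
def masked_distill_loss(per_layer_losses, layer_mask):
    """Sum distillation losses only over the selected layers.
    per_layer_losses: scalar losses, one per network layer
                      (5 encoder layers followed by 5 decoder layers).
    layer_mask: booleans matching the checkmark pattern in the table."""
    assert len(per_layer_losses) == len(layer_mask)
    return float(sum(l for l, use in zip(per_layer_losses, layer_mask) if use))

# Best-performing row above: encoder layers 2-4 and decoder layers 2-4.
mask = [False, True, True, True, False,   # encoder layers 1-5
        False, True, True, True, False]   # decoder layers 1-5
```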