Hui Zeng1,2, Bin Yang3,4, Xiuqing Wang5, Jiwei Liu6,7, Dongmei Fu8,9.
Abstract
With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted increasing attention in recent years. Deep learning has become popular in image analysis and achieves competitive results. To make full use of the discriminative information in the RGB and depth images, we propose an RGB-D object recognition method based on a multi-modal deep neural network and DS (Dempster Shafer) evidence theory. First, the RGB and depth images are preprocessed and two convolutional neural networks are trained, respectively. Next, we perform multi-modal feature learning with the proposed quadruplet-sample-based objective function to fine-tune the network parameters. Then, two probability classification results are obtained by applying sigmoid SVMs (Support Vector Machines) to the learned RGB and depth features. Finally, the DS evidence theory based decision fusion method integrates the two classification results. Compared with other RGB-D object recognition methods, the proposed method adopts two fusion strategies: multi-modal feature learning and DS decision fusion, so both the discriminative information of each modality and the correlation information between the two modalities are exploited. Extensive experimental results validate the effectiveness of the proposed method.
Keywords: DS evidence theory; RGB-D object recognition; deep neural network; multi-modal learning
Year: 2019 PMID: 30691239 PMCID: PMC6387151 DOI: 10.3390/s19030529
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
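The abstract's multi-modal feature-learning step fine-tunes the two CNNs with a quadruplet-sample objective, but this entry does not reproduce the exact loss. The sketch below is a minimal, hypothetical margin-based formulation in PyTorch: it pulls same-class RGB and depth features together and pushes them away from features of a different class. The function name `quadruplet_loss` and the specific hinge form are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(f_rgb, f_depth, f_rgb_neg, f_depth_neg, margin=1.0):
    """Hypothetical margin-based quadruplet objective over a batch of
    same-class (RGB, depth) feature pairs plus features from another class."""
    # L2-normalise all features so distances are scale-invariant
    f_rgb, f_depth = F.normalize(f_rgb), F.normalize(f_depth)
    f_rgb_neg, f_depth_neg = F.normalize(f_rgb_neg), F.normalize(f_depth_neg)

    d_pos = F.pairwise_distance(f_rgb, f_depth)         # same class, cross-modal
    d_neg_rd = F.pairwise_distance(f_rgb, f_depth_neg)  # RGB vs. other-class depth
    d_neg_dr = F.pairwise_distance(f_depth, f_rgb_neg)  # depth vs. other-class RGB

    # Hinge: the same-class pair must be closer than either cross-class pair by `margin`
    loss = F.relu(d_pos - d_neg_rd + margin) + F.relu(d_pos - d_neg_dr + margin)
    return loss.mean()

# Usage with random stand-in features (batch of 8, 128-D)
feats = [torch.randn(8, 128) for _ in range(4)]
print(quadruplet_loss(*feats))
```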
Figure 1. The flowchart of the proposed RGB-D object recognition method.
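The final stage of the flowchart fuses the two sigmoid-SVM probability outputs with DS evidence theory. As a hedged illustration, the snippet below applies Dempster's rule of combination in the simplest setting, where each classifier's class posteriors are taken directly as masses on the singleton classes (no mass on compound sets or on the whole frame of discernment); the paper's actual basic probability assignment may differ, and `ds_fuse` is a hypothetical helper name.

```python
import numpy as np

def ds_fuse(p_rgb, p_depth, eps=1e-12):
    """Dempster's rule for two mass functions whose focal elements are
    the singleton classes (class posteriors used directly as masses)."""
    p_rgb = np.asarray(p_rgb, dtype=float)
    p_depth = np.asarray(p_depth, dtype=float)
    joint = p_rgb * p_depth                 # mass where both sources agree on a class
    conflict = 1.0 - joint.sum()            # K: mass lost to conflicting class pairs
    fused = joint / max(joint.sum(), eps)   # renormalise by 1 - K
    return fused, conflict

# Hypothetical example: both modalities favour class 0, depth less confidently
fused, K = ds_fuse([0.70, 0.20, 0.10], [0.50, 0.40, 0.10])
print(fused.round(3), round(K, 3))  # [0.795 0.182 0.023] 0.56
```

Note how the fused distribution is sharper than either input when the two sources agree; a high conflict K would instead flag an unreliable fusion.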
Figure 2. The architecture of the proposed multi-modal network.
Figure 3. The results of image scaling. (a) The RGB and depth images from the “cereal_box” class; (b) the RGB and depth images from the “flashlight” class; (c) the RGB and depth images from the “cap” class; (d) the resized images of (a); (e) the resized images of (b); (f) the resized images of (c); (g) the scaled images of (a); (h) the scaled images of (b); (i) the scaled images of (c).
Figure 4. Examples of the RGB images and the depth images. (a–c) Three samples from the class “orange”; (d–f) three samples from the class “tomato”; (g–i) three samples from the class “cereal_box”; (j–l) three samples from the class “toothpaste”.
Figure 5. Objects of different categories from the Washington RGB-D object dataset.
The 10 recognition accuracies (%) of our proposed method.
| Trial | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy (%) | 92.9 | 92.7 | 90.1 | 91.9 | 92.2 | 90.4 | 93.1 | 90.2 | 91.7 | 92.8 | 91.8 | 1.4 |
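As a quick consistency check, the Mean and Var columns follow from the ten runs; the reported 1.4 matches the sample variance (n-1 denominator) rather than the population variance. A one-off verification in Python:

```python
from statistics import mean, variance

runs = [92.9, 92.7, 90.1, 91.9, 92.2, 90.4, 93.1, 90.2, 91.7, 92.8]
print(round(mean(runs), 1))      # -> 91.8
print(round(variance(runs), 1))  # -> 1.4 (sample variance, n-1 denominator)
```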
Comparison of different baselines on the Washington RGB-D object dataset.
| Method | Accuracy (%) |
|---|---|
| RGB CNN | 85.7 ± 2.3 |
| Depth CNN | 81.3 ± 2.2 |
| RGB CNN+SVM | 87.5 ± 2.1 |
| Depth CNN+SVM | 84.8 ± 2.0 |
| RGB-D CNNs+Multi-modal learning | 90.2 ± 1.8 |
| RGB-D CNNs+DS fusion | 88.9 ± 1.9 |
| RGB-D CNNs+Multi-modal learning+DS fusion | 91.8 ± 1.4 |
Figure 6. Examples of misclassified samples of the proposed method.
Comparison with state-of-the-art methods on the Washington RGB-D object dataset.
| Method | RGB (%) | Depth (%) | RGB-D (%) |
|---|---|---|---|
| Linear SVM [ ] | 74.3 ± 3.3 | 53.1 ± 1.7 | 81.9 ± 2.8 |
| kSVM [ ] | 74.5 ± 3.1 | 64.7 ± 2.2 | 83.8 ± 3.5 |
| HKDES [ ] | 76.1 ± 2.2 | 75.7 ± 2.6 | 84.1 ± 2.2 |
| Kernel Descriptor [ ] | 77.7 ± 1.9 | 78.8 ± 2.7 | 86.2 ± 2.1 |
| CNN-RNN [ ] | 80.8 ± 4.2 | 78.9 ± 3.8 | 86.8 ± 3.3 |
| RGB-D HMP [ ] | 82.4 ± 3.1 | 81.2 ± 2.3 | 87.5 ± 2.9 |
| MMSS [ ] | 74.6 ± 2.9 | 75.6 ± 2.7 | 88.5 ± 2.2 |
| Fus-CNN (HHA) [ ] | 84.1 ± 2.7 | 83.0 ± 2.7 | 91.0 ± 1.9 |
| Fus-CNN (Jet) [ ] | 84.1 ± 2.7 | 83.8 ± 2.7 | 91.3 ± 1.4 |
| CFK [ ] | 86.8 ± 2.2 | | 91.2 ± 1.5 |
| MDCNN [ ] | 87.9 ± 2.0 | 85.2 ± 2.1 | |
| VGGnet + 3D CNN + VGG3D [ ] | | 78.4 ± 2.4 | 91.8 ± 0.9 |
| Our proposed method | 87.5 ± 2.1 | 84.8 ± 2.0 | 91.8 ± 1.4 |