Shao-Kang Huang, Chen-Chien Hsu, Wei-Yen Wang, Cheng-Hung Lin.
Abstract
Accurate estimation of 3D object pose is highly desirable in a wide range of applications, such as robotics and augmented reality. Although significant progress has been made in pose estimation, there is still room for improvement. Recent pose estimation systems use an iterative refinement process to revise the predicted pose and obtain a better final output; however, such refinement takes only geometric features into account during the iterations. Motivated by this observation, this paper designs a novel iterative refinement process that exploits both color and geometric features for object pose refinement. Experiments show that the proposed method reaches 94.74% and 93.2% in the ADD(-S) metric with only 2 iterations, outperforming state-of-the-art methods on the LINEMOD and YCB-Video datasets, respectively.
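For reference, the ADD(-S) figures quoted above are the standard average-distance pose-error measures: ADD averages the distance between corresponding model points under the predicted and ground-truth poses, while ADD-S (used for symmetric objects) averages the distance to the closest ground-truth point. A minimal NumPy sketch of those definitions (function and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

def add_distance(R_pred, t_pred, R_gt, t_gt, model_pts):
    """ADD: mean distance between corresponding model points under the
    predicted pose (R_pred, t_pred) and the ground-truth pose (R_gt, t_gt)."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def add_s_distance(R_pred, t_pred, R_gt, t_gt, model_pts):
    """ADD-S: for symmetric objects, mean distance from each predicted point
    to its *closest* ground-truth-posed point."""
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    # brute-force nearest neighbour; fine for a few thousand model points
    pairwise = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    return float(pairwise.min(axis=1).mean())
```

Since ADD-S minimizes over correspondences, it is never larger than ADD for the same pose pair, which is why it is the appropriate measure for objects whose symmetry makes point correspondences ambiguous.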
Keywords: LINEMOD; convolution neural network; deep learning; object pose estimation
Year: 2020 PMID: 32722044 PMCID: PMC7436036 DOI: 10.3390/s20154114
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The architecture of the proposed pose estimation system.
Figure 2. The proposed iterative pose refinement process.
Figure 3. The process of image modification.
Comparison between the proposed method and state-of-the-art approaches on LINEMOD in the ADD(-S) metric, where the best results are marked in bold. Note that * denotes a symmetric object.
| Object | BB8 w/Ref. | SSD-6D w/Ref. | PVNet | Tien | DenseFusion | Proposed Method |
|---|---|---|---|---|---|---|
| Ape | 40.4 | 65 | 43.62 | 85.03 | 92 | **92.95** |
| Bench vise | 91.8 | 80 | **99.90** | 95.54 | 93 | 92.05 |
| Cam | 55.7 | 78 | 86.86 | 91.27 | 94 | – |
| Can | 64.1 | 86 | **95.47** | 95.18 | 93 | 93.31 |
| Cat | 62.6 | 70 | 79.34 | 93.61 | **97** | 96.31 |
| Driller | 74.4 | 73 | **96.43** | 82.56 | 87 | 88.80 |
| Duck | 44.3 | 66 | 52.58 | 88.08 | 92 | – |
| Eggbox * | 57.8 | **100** | 99.15 | 99.90 | **100** | 99.71 |
| Glue * | 41.2 | **100** | 95.66 | 99.61 | **100** | 99.90 |
| Hole puncher | 67.2 | 49 | 81.92 | – | 92 | 91.15 |
| Iron | 84.7 | 78 | **98.88** | 95.91 | 97 | 96.32 |
| Lamp | 76.5 | 73 | **99.33** | 94.43 | 95 | 94.91 |
| Phone | 54.0 | 79 | 92.41 | 93.56 | 93 | – |
| Average | 62.7 | 79 | 86.27 | 92.87 | 94 | **94.74** |
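For context, the per-object percentages above follow the usual LINEMOD evaluation protocol: a predicted pose counts as correct when its ADD(-S) distance falls below 10% of the object's model diameter, and the reported number is the fraction of correct test frames. A minimal sketch of that accuracy computation (names are illustrative, not from the paper's code):

```python
import numpy as np

def add_accuracy(distances, diameter, ratio=0.1):
    """Fraction of test frames whose ADD(-S) distance is below
    ratio * diameter (the 10%-of-diameter LINEMOD convention)."""
    distances = np.asarray(distances, dtype=float)
    return float((distances < ratio * diameter).mean())
```

For example, with a 10 cm model diameter the threshold is 1 cm, so per-frame distances of 5 mm, 2 cm, and 1 mm would give an accuracy of 2/3.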
Comparison between the proposed method and state-of-the-art approaches on the YCB-Video dataset in the AUC metric, where the best results are marked in bold.
| Object | Tien | PoseCNN+ICP | DenseFusion | Proposed Method |
|---|---|---|---|---|
| 002_master_chef_can | 93.9 | 95.8 | – | – |
| 003_cracker_box | 92.9 | 91.8 | 95.5 | – |
| 004_sugar_box | 95.4 | **98.2** | 97.5 | 97.6 |
| 005_tomato_soup_can | 93.3 | 94.5 | – | 94.5 |
| 006_mustard_bottle | 95.4 | **98.4** | 97.2 | 97.4 |
| 007_tuna_fish_can | 94.9 | 97.1 | 96.6 | – |
| 008_pudding_box | 94.0 | 97.9 | 96.5 | – |
| 009_gelatin_box | 97.6 | – | 98.1 | 98.0 |
| 010_potted_meat_can | 90.6 | – | 91.3 | 90.7 |
| 011_banana | 91.7 | – | 96.6 | 96.1 |
| 019_pitcher_base | 93.1 | – | 97.1 | 97.5 |
| 021_bleach_cleanser | 93.4 | – | 95.8 | 95.9 |
| 024_bowl | – | 78.3 | 88.2 | 89.5 |
| 025_mug | 96.1 | 95.1 | – | 96.7 |
| 035_power_drill | 93.3 | – | 96.0 | 96.1 |
| 036_wood_block | 87.6 | 90.5 | 89.7 | – |
| 037_scissors | – | 92.2 | 95.2 | 92.1 |
| 040_large_marker | 95.6 | 97.2 | 97.5 | – |
| 051_large_clamp | – | – | 72.9 | 72.5 |
| 052_extra_large_clamp | – | 65.3 | 69.8 | 70.0 |
| 061_foam_brick | – | 97.1 | 92.5 | 92.0 |
| Average | 91.8 | 93.0 | 93.1 | **93.2** |
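For context, the YCB-Video scores above are area-under-curve (AUC) values in the PoseCNN convention: per-frame ADD-S distances are turned into an accuracy-threshold curve by sweeping the correctness threshold from 0 to 10 cm, and the normalized area under that curve is reported as a percentage. A minimal sketch under those assumptions (names are illustrative, not from the paper's code):

```python
import numpy as np

def adds_auc(distances, max_threshold=0.1, steps=1000):
    """Approximate normalized area under the accuracy-vs-threshold curve,
    with the threshold swept from 0 to max_threshold (meters); returns 0-100."""
    distances = np.asarray(distances, dtype=float)
    thresholds = np.linspace(0.0, max_threshold, steps)
    # accuracy at each threshold: fraction of frames with ADD-S below it
    accuracies = np.array([(distances < t).mean() for t in thresholds])
    # uniform threshold grid, so the mean accuracy approximates the normalized integral
    return float(accuracies.mean() * 100.0)
```

The accuracy-threshold curves in Figure 6 correspond to plotting `accuracies` against `thresholds`; the AUC collapses each such curve to a single number.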
Figure 4. Object pose estimation results, drawn as yellow bounding boxes in the corresponding scene images from the LINEMOD dataset. The target objects in the columns, from left to right, are Cam, Can, Driller, and Lamp, respectively.
Figure 5. The evolution of the estimated pose during the refinement process, for the scene shown in the far-right column of Figure 4.
Figure 6. (a) The accuracy-threshold curves of DenseFusion [11] (green line) and the proposed method (red line) on LINEMOD. (b) A close-up of the region indicated by the black dashed rectangle in (a).