Felix Gorschlüter, Pavel Rojtberg, Thomas Pöllabauer.
Abstract
Six-dimensional object detection of rigid objects is a problem especially relevant for quality control and robotic manipulation in industrial contexts. This work is a survey of the state of the art of 6D object detection with these use cases in mind, specifically focusing on algorithms trained only with 3D models or renderings thereof. Our first contribution is a listing of requirements typically encountered in industrial applications. The second contribution is a collection of quantitative evaluation results for several different 6D object detection methods trained with synthetic data and the comparison and analysis thereof. We identify the top methods for individual requirements that industrial applications have for object detectors, but find that a lack of comparable data prevents large-scale comparison over multiple aspects.
Keywords: RGBD; machine learning; neural networks; object detection; pose estimation; synthetic training
Year: 2022 PMID: 35324608 PMCID: PMC8952329 DOI: 10.3390/jimaging8030053
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. Two examples of applying object detectors in industrial production processes. (a) Quality control. (b) Bin picking (image from [1], published under CC BY 4.0).
Table 1. Properties of 6D object detectors. Methods are sorted alphabetically. For detailed explanations of the properties, see Section 4.1. Alternating grey and white rows are visual aids.
| Method | By | Year | Modality | Features | Scope | Output |
|---|---|---|---|---|---|---|
| AAE | Sundermeyer et al. | 2020 | RGB | Learned | Global | Cont. |
| CAE | Kehl et al. | 2016 | RGBD | Learned | Local | Cont. |
| CDPNv2 | Li et al. | 2019 | RGBD | Learned | Local | Cont. |
| CosyPose | Labbé et al. | 2020 | RGB | Learned | Global | Cont. |
| DPOD | Zakharov et al. | 2019 | RGB | Learned | Local | Cont. |
| DTT-OPT-3D | Rios-Cabrera and Tuytelaars | 2013 | RGBD | Learned | Global | Disc. |
| EPOS | Hodaň et al. | 2020 | RGB | Learned | Local | Cont. |
| FFB6D | He et al. | 2021 | RGBD | Learned | Local | Cont. |
| HybridPose | König and Drost | 2020 | RGBD | Learned | Global | Cont. |
| LCHF | Tejani et al. | 2014 | RGBD | Learned | Local | Cont. |
| LCHF | Tejani et al. | 2018 | RGBD | Learned | Local | Cont. |
| LineMOD | Hinterstoisser et al. | 2013 | RGBD | Hand-crafted | Global | Disc. |
| ObjPoseFromSyn | Rambach et al. | 2018 | RGB | Learned | Global | Cont. |
| Pix2Pose | Park et al. | 2019 | RGB | Learned | Local | Cont. |
| PointVoteNet | Hagelskjær and Buch | 2020 | D | Learned | Both | Cont. |
| PoseCluster | Buch et al. | 2017 | D | Learned | Local | Cont. |
| PoseRBPF | Deng et al. | 2021 | RGBD | Learned | Global | Cont. |
| PPF | Drost et al. | 2010 | D | Hand-crafted | Local | Cont. |
| PPF | Hinterstoisser et al. | 2016 | D | Hand-crafted | Local | Cont. |
| PPF | Vidal et al. | 2018 | D | Hand-crafted | Local | Cont. |
| PVNet | Peng et al. | 2019 | RGB | Learned | Local | Cont. |
| RandomForest | Brachmann et al. | 2014 | RGB | Learned | Local | Cont. |
| SSD6D | Kehl et al. | 2017 | RGB | Learned | Global | Cont. |
| SurfEmb | Haugaard and Buch | 2021 | RGBD | Learned | Global | Cont. |
| SyDPose | Thalhammer et al. | 2019 | D | Learned | Global | Cont. |
| SynPo-Net | Su et al. | 2021 | RGB | Learned | Global | Cont. |
| TemplateBased | Hodaň et al. | 2015 | RGBD | Hand-crafted | Global | Cont. |
| UncertaintyDriven | Brachmann et al. | 2016 | RGB | Learned | Local | Cont. |
| YOLO6D | Tekin et al. | 2018 | RGB | Learned | Local | Cont. |
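Where a machine-readable form of this taxonomy is useful (e.g., for filtering methods by requirement), the properties above map naturally onto a small data structure. The following is a minimal sketch; the enum and field names are our own illustration and not defined by the survey, and the comments reflect our reading of the property definitions in Section 4.1.

```python
from dataclasses import dataclass
from enum import Enum

class Modality(Enum):
    RGB = "RGB"    # color image only
    D = "D"        # depth only
    RGBD = "RGBD"  # color and depth

class Features(Enum):
    HAND_CRAFTED = "Hand-crafted"
    LEARNED = "Learned"

class Scope(Enum):
    LOCAL = "Local"    # pose hypotheses from local patches/correspondences
    GLOBAL = "Global"  # pose estimated from a holistic view of the object
    BOTH = "Both"

class Output(Enum):
    DISCRETE = "Disc."    # pose chosen from a discrete set (e.g., templates)
    CONTINUOUS = "Cont."  # pose estimated in continuous SE(3)

@dataclass
class DetectorEntry:
    """One row of the property table."""
    method: str
    authors: str
    year: int
    modality: Modality
    features: Features
    scope: Scope
    output: Output

# Example row (AAE, Sundermeyer et al., 2020):
aae = DetectorEntry("AAE", "Sundermeyer et al.", 2020,
                    Modality.RGB, Features.LEARNED, Scope.GLOBAL, Output.CONTINUOUS)
```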
Figure 2. Sample images from the three datasets considered in this work. Lighter renderings show the annotated poses. (a) LM, (b) LMO, (c) TLESS. Images used with permission from Hodaň et al. [3].
Dataset–metric combinations we found empirical data for, the tasks (Section 3.1) and challenges (Section 3.2) that they address, and the number of data points we found for each (i.e., the number of relevant methods evaluated). Alternating grey and white rows are visual aids.
| Dataset | Metric | Task | Challenges | Data-Points |
|---|---|---|---|---|
| LM | ADD(S)-Recall | Localization | Background-clutter | 19 |
| LM | ADD(S)-F1 | Detection | Background-clutter | 7 |
| LM | VSD-Recall | Localization | Background-clutter | 11 |
| LMO | VSDBOP-Recall | Localization | Background-clutter, occlusion | 17 |
| TLESS | VSD-Recall | Localization | Texturelessness, symmetry | 11 |
| TLESS | VSDBOP-Recall | Localization | Texturelessness, symmetry | 12 |
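For context on the ADD(S) entries above: a predicted pose is commonly counted as correct when the mean distance between model points transformed by the estimated pose and by the ground-truth pose falls below a fraction of the object diameter (typically 0.1·d), with closest-point distances (ADD-S) used for symmetric objects; the recall is then the fraction of correct poses. A minimal sketch, assuming model points as an (N, 3) NumPy array, 4×4 pose matrices, and SciPy for nearest-neighbour queries:

```python
import numpy as np
from scipy.spatial import cKDTree

def transform(points, pose):
    """Apply a 4x4 rigid transform to an (N, 3) point array."""
    return points @ pose[:3, :3].T + pose[:3, 3]

def add_error(model_pts, pose_est, pose_gt):
    """ADD: mean distance between corresponding transformed model points."""
    return np.linalg.norm(transform(model_pts, pose_est)
                          - transform(model_pts, pose_gt), axis=1).mean()

def adds_error(model_pts, pose_est, pose_gt):
    """ADD-S: mean closest-point distance, used for symmetric objects."""
    est = transform(model_pts, pose_est)
    gt = transform(model_pts, pose_gt)
    dists, _ = cKDTree(gt).query(est, k=1)
    return dists.mean()

def add_s_recall(errors, diameter, threshold=0.1):
    """Fraction of poses whose ADD(S) error is below threshold * diameter."""
    errors = np.asarray(errors)
    return float((errors < threshold * diameter).mean())
```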
Scores of 6D object detectors trained only with 3D models or renderings thereof, for different datasets and metrics, in percent. Methods are sorted alphabetically for easy cross-referencing with Table 1. Method variants are given in brackets (e.g., refinements like ICP). Unlike in Table 1, refinement steps are considered in the modality column. All except ADD(S)-F1 are recall-based scores. Darker green cell backgrounds denote higher scores, column-wise; alternating grey and white rows are visual aids. The top three methods of a column are in bold. † denotes that a non-standard distance threshold was used for ADD(S) instead of the common 0.1·d (10% of the object diameter). No citation is given for the values of FFB6D, as we evaluated them ourselves.
| Method | Modality | LM ADD(S) | LM ADD(S)-F1 | LM VSD | LMO VSDBOP | TLESS VSD | TLESS VSDBOP |
|---|---|---|---|---|---|---|---|
|  | RGBD | 71.58 |  |  |  |  |  |
|  | RGB | 32.63 | 20.53 |  |  |  |  |
|  | RGBD | 58.2 | 24.6 |  |  |  |  |
|  | RGBD | 46.9 | 36.8 |  |  |  |  |
|  | RGBD | 44.5 | 30.3 |  |  |  |  |
|  | RGB | 48 |  |  |  |  |  |
|  | RGB | 10.1 | 4.8 |  |  |  |  |
|  | RGBD |  |  |  |  |  |  |
|  | RGB | 38.9 | 38 |  |  |  |  |
|  | RGBD | 54.08 | 55.5 | 37.7 |  |  |  |
|  | RGBD |  |  |  |  |  |  |
|  | RGBD | 78.6 |  |  |  |  |  |
|  | RGBD | 12.1 |  |  |  |  |  |
|  | RGBD |  |  |  |  |  |  |
|  | RGBD |  |  |  |  |  |  |
|  | RGBD | 96.3 |  |  |  |  |  |
|  | RGBD | 63 |  |  |  |  |  |
|  | RGB | 10.22 |  |  |  |  |  |
|  | RGB | 11.32 | 15.6 |  |  |  |  |
|  | D | 0.3 |  |  |  |  |  |
|  | D | 56.6 |  |  |  |  |  |
|  | D | 33.33 |  |  |  |  |  |
|  | RGBD |  |  |  |  |  |  |
|  | RGBD |  |  |  |  |  |  |
|  | RGBD | 42.5 | 67.5 | 46.9 |  |  |  |
|  | D | 43.7 | 37.5 |  |  |  |  |
|  | RGBD | 79.13 | 39.2 | 37 |  |  |  |
|  | D | 78.9 | 51.7 † | 56.81 |  |  |  |
|  | D | 96.4 |  |  |  |  |  |
|  | D | 47.3 | 66.51 | 46.4 |  |  |  |
|  | D | 66.3 |  |  |  |  |  |
|  | RGBD | 50.2 |  |  |  |  |  |
|  | RGB | 42.8 |  |  |  |  |  |
|  | RGB | 67.6 |  |  |  |  |  |
|  | RGBD | 79 |  |  |  |  |  |
|  | RGB | 2.42 | 4.7 |  |  |  |  |
|  | RGBD |  |  |  |  |  |  |
|  | D | 30.21 | 59.1 † |  |  |  |  |
|  | RGBD | 72.29 |  |  |  |  |  |
|  | RGB | 44.13 |  |  |  |  |  |
|  | RGBD | 94.9 |  |  |  |  |  |
|  | RGBD | 69.83 | 63.18 |  |  |  |  |
|  | RGB | 75.33 | 17.84 |  |  |  |  |
|  | RGB | 21.43 |  |  |  |  |  |
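The VSD columns refer to the visible surface discrepancy used by the BOP benchmark: the object is rendered in the estimated and in the ground-truth pose, and the error is the fraction of pixels in the union of the two visibility masks that are not visible in both renderings with a depth difference below a tolerance τ. The following is a rough sketch assuming precomputed depth renderings and visibility masks as NumPy arrays; the exact BOP protocol additionally fixes or averages over specific τ values and correctness thresholds.

```python
import numpy as np

def vsd_error(depth_est, depth_gt, visib_est, visib_gt, tau):
    """Visible surface discrepancy between two rendered depth maps.

    depth_est, depth_gt : rendered depth under the estimated / ground-truth pose
    visib_est, visib_gt : boolean visibility masks of the object in the image
    tau                 : depth-difference tolerance (same units as the depth maps)
    """
    union = visib_est | visib_gt
    if not union.any():
        return 1.0
    inter = visib_est & visib_gt
    # Pixels visible under both poses with a small depth difference count as matching.
    ok = inter & (np.abs(depth_est - depth_gt) < tau)
    return 1.0 - ok[union].sum() / union.sum()

def vsd_recall(errors, threshold=0.3):
    """Fraction of pose estimates with VSD error below the correctness threshold."""
    return float((np.asarray(errors) < threshold).mean())
```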
Runtimes of methods and their variants. Note that the values listed here are only comparable with great reservations, as they were generated by different people, with different parameters and different hardware, over the course of 11 years. Nevertheless, they can at least give an impression of the order of magnitude at which the algorithms operate. Alternating grey and white row colors are visual aids.
| Method | Variant | Modality | Runtime [s] |
|---|---|---|---|
| SynPo-Net |  | RGB | 0.015 |
| YOLO6D |  | RGB | 0.02 |
| DTT-OPT-3D |  | RGBD | 0.055 |
| PoseRBPF |  | RGBD | 0.071 |
| SSD6D |  | RGB | 0.083 |
| PoseRBPF | SDF | RGBD | 0.156 |
| FFB6D |  | RGBD | 0.196 |
| AAE |  | RGB | 0.2 |
| DPOD |  | RGB | 0.206 |
| HybridPose | ICP | RGBD | 0.337 |
| CosyPose | RGB | RGB | 0.47 |
| CosyPose |  | RGB | 0.493 |
| AAE | ICP | RGBD | 0.8 |
| CDPNv2 |  | RGB | 0.98 |
| PPF | ICP | D | 1.38 |
| LCHF |  | RGBD | 1.4 |
| RandomForest |  | RGBD | 1.4 |
| CDPNv2 | ICP | RGBD | 1.49 |
| CAE | ICP | RGBD | 1.8 |
| EPOS |  | RGB | 1.87 |
| LCHF |  | RGBD | 1.96 |
| TemplateBased | PSO | RGBD | 2.1 |
| PPF |  | D | 2.3 |
| PPF | ICP | D | 3.22 |
| UncertaintyDriven |  | RGBD | 4.4 |
| SurfEmb |  | RGBD | 9.227 |
| TemplateBased |  | RGBD | 12.3 |
| PoseCluster | ICP, PPFH | D | 14.2 |
| PoseCluster | ICP, SI | D | 15.9 |
| PPF | Edge | RGBD | 21.5 |
| PPF | ICP, Edge | D | 21.5 |
| PPF | ICP | RGBD | 87.57 |
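To relate these runtimes to typical industrial timing requirements (e.g., a station's cycle time), a simple conversion to throughput is usually sufficient. A trivial sketch; the cycle-time figures in the example are made up for illustration and are not taken from the survey.

```python
def throughput_hz(runtime_s: float) -> float:
    """Detections per second for a given per-image runtime."""
    return 1.0 / runtime_s

def fits_cycle_time(runtime_s: float, cycle_time_s: float,
                    detections_per_cycle: int = 1) -> bool:
    """True if the detector can deliver the required detections within one cycle."""
    return detections_per_cycle * runtime_s <= cycle_time_s

# Example: a 0.2 s detector yields 5 Hz and fits a (hypothetical) 1 s pick cycle,
# while an 87.57 s pipeline clearly does not.
assert fits_cycle_time(0.2, 1.0)
assert not fits_cycle_time(87.57, 1.0)
```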