Tomoya Kaichi, Tsubasa Maruyama, Mitsunori Tada, Hideo Saito.
Abstract
Human motion capture (MoCap) plays a key role in healthcare and human-robot collaboration. Some researchers have combined orientation measurements from inertial measurement units (IMUs) with positional inference from cameras to reconstruct 3D human motion. These works rely on multiple cameras or depth sensors to localize the human in three dimensions. Multiple-camera setups are not always available in daily life, whereas a single camera built into a smart IP device has recently become common. We therefore present a 3D pose estimation approach that uses IMUs and a single camera. To resolve the depth ambiguity of the single-camera configuration and localize the global position of the subject, we introduce a constraint that optimizes the foot-ground contact points. The timing of ground contact is calculated from the acceleration of the IMUs on the feet, and its 3D position is calculated from a geometric transformation of the foot position detected in the image. Since the results of pose estimation are greatly affected by detection failures, we design the image-based constraints to handle outliers in the positional estimates. We evaluated the performance of our approach on a public 3D human pose dataset. The experiments demonstrated that the proposed constraints improve the accuracy of pose estimation in both single- and multiple-camera settings.
Keywords: human pose estimation; inertial measurement units; sensor fusion; single view
Year: 2020 PMID: 32977436 PMCID: PMC7582626 DOI: 10.3390/s20195453
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Relations among the local coordinate systems.
Figure 2. Visualization of the ground contact constraint.
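The abstract describes the ground contact constraint in two parts: contact timing from the foot IMU acceleration, and the 3D contact position from a geometric transformation of the foot detected in the image. The Python sketch below illustrates one plausible reading of each part; it is not the authors' implementation. It assumes a calibrated pinhole camera (`K`, `R`, `t`), a flat ground plane at z = 0 in the world frame, and a simple stillness heuristic (acceleration magnitude close to gravity) for contact. The function names, thresholds, and the heuristic itself are hypothetical.

```python
import numpy as np


def detect_contact(acc, g=9.81, tol=0.5, min_frames=3):
    """Flag frames where a foot IMU looks stationary (assumed contact heuristic).

    acc: (frames, 3) accelerations in m/s^2, gravity included.
    A frame counts as contact if |acc| stays within `tol` of `g`
    for at least `min_frames` consecutive frames.
    """
    acc = np.asarray(acc, dtype=float)
    still = np.abs(np.linalg.norm(acc, axis=1) - g) < tol
    contact = np.zeros(len(still), dtype=bool)
    start = None
    # A trailing False closes a run that reaches the last frame.
    for i, flag in enumerate(np.append(still, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                contact[start:i] = True
            start = None
    return contact


def backproject_to_ground(pixel, K, R, t, plane_normal=(0.0, 0.0, 1.0), plane_offset=0.0):
    """Intersect the camera ray through `pixel` with the plane n . X = d.

    K: 3x3 intrinsics. R, t: world-to-camera extrinsics (x_cam = R @ X_world + t).
    Returns the 3D intersection point in world coordinates.
    """
    n = np.asarray(plane_normal, dtype=float)
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    ray_cam = np.linalg.solve(K, uv1)   # ray direction in the camera frame
    ray_world = R.T @ ray_cam           # same ray in the world frame
    center = -R.T @ t                   # camera centre in the world frame
    denom = n @ ray_world
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the ground plane")
    s = (plane_offset - n @ center) / denom
    if s <= 0:
        raise ValueError("intersection lies behind the camera")
    return center + s * ray_world
```

In a pipeline of this kind, a detected 2D foot joint would be back-projected only on frames flagged by `detect_contact`, which is what removes the depth ambiguity of a single view for those frames.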
Figure 3. The top graph (a) shows the mean per-joint position error for each frame. The bottom figures (b) and (c) show the view from the single camera used and the joints detected by the 2D joint detector, OpenPose [6]. The human models colored in green, red, and blue represent the IMU-only inference, the proposed approach, and the ground truth from optical MoCap, respectively. The position of the foot touching the ground is estimated correctly.
3D position error (cm) on TotalCapture dataset.
Mean position error (cm)

| S1-W2 | S1-A3 | S1-F3 | S2-W2 | S2-A3 | S2-R3 | S3-W2 | S3-A3 | S3-F1 | S4-W2 | S4-A3 | S4-F3 | S5-W2 | S5-F1 | S5-F3 | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 52.4 | 90.1 | 22.5 | 33.3 | 22.6 | 27.4 | 51.4 | 26.9 | 24.6 | 50.4 | 53.3 | 56.1 | 57.7 | 37.1 | 43.1 | 43.3 |
| 45.0 | 42.7 | 44.2 | 144 | 63.9 | 8.91 | 34.8 | 72.3 | 62.4 | 42.3 | 221 | 39.4 | 124 | 32.9 | 81.0 | 70.6 |
| 54.4 | 41.7 | 29.4 | 142 | 63.3 | 12.2 | 33.0 | 68.8 | 68.5 | 42.8 | 224 | 39.2 | 124 | 28.2 | 78.1 | 70.0 |

Row labels are not preserved. Further position-error rows survive only as fragments whose column positions cannot be recovered; their values, in source order, are: 7.37, 15.3; 14.3; 13.8; 46.7, 17.5, 15.8; 20.2, 15.6, 12.2, 12.2, 10.2; 12.5; 16.3; 14.7, 16.0.

Mean orientation error (degrees)

| S1-W2 | S1-A3 | S1-F3 | S2-W2 | S2-A3 | S2-R3 | S3-W2 | S3-A3 | S3-F1 | S4-W2 | S4-A3 | S4-F3 | S5-W2 | S5-F1 | S5-F3 | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9.32 | 8.25 | 9.43 | 8.59 | 8.27 | 12.5 | 6.50 | 6.55 | 10.6 | 7.10 | 8.14 | 9.51 | 6.59 | 8.37 | 11.6 | 8.75 |
| 9.38 | 8.45 | 9.45 | 8.74 | 8.51 | 12.5 | 6.65 | 6.63 | 10.9 | 7.07 | 8.20 | 9.52 | 6.72 | 8.37 | 11.3 | 8.83 |

The minimum error values are shown in bold.
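The position errors reported for TotalCapture are mean distances between estimated and ground-truth joint positions. A minimal sketch of such a metric follows, assuming arrays shaped (frames, joints, 3) in centimetres. Whether the paper applies any alignment before measuring is not stated in this record, so none is applied here; the function names are hypothetical.

```python
import numpy as np


def per_joint_position_error(pred, gt):
    """Mean Euclidean distance per joint, averaged over frames.

    pred, gt: arrays of shape (frames, joints, 3) in the same units.
    Returns an array of shape (joints,).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=0)


def mean_position_error(pred, gt):
    """Single scalar: distance averaged over all frames and joints."""
    return float(per_joint_position_error(pred, gt).mean())
```

Averaging the left and right entries of `per_joint_position_error` would give per-segment values of the kind plotted for wrist, elbow, and so on.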
Figure 4. (a) Mean per-joint positional error of the human motion capture (MoCap) by the proposed method on all the scenes in the test set. The error values for the wrist, elbow, shoulder, ankle, knee, and hip are averaged over the left and right sides; for example, the wrist error is the average of the left-wrist and right-wrist errors. (b) Mean 3D position and orientation errors on subjects S3-F1 and S4-F3 with 8 to 13 IMUs.
3D orientation error (degrees) on TotalCapture dataset.
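| | S1-F3 | S2-R3 | S3-F1 | S4-F3 | S5-F1 | Mean |
|---|---|---|---|---|---|---|
| Trumble et al. [ | 9.4 | 9.3 | 13.6 | 11.6 | 10.5 | 10.9 |
| Malleson et al. [ | 7.4 | | | | | |

The rest of the Malleson et al. row and the rows that follow survive only as fragments whose column positions cannot be recovered; their values, in source order, are: 6.7, 6.4, 7.0, 6.3; 5.66, 6.70.
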
The minimum error values are shown in bold.
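An orientation error in degrees is commonly computed as the angle of the relative rotation between the estimated and ground-truth orientations. The sketch below uses that common definition as an assumption; this record does not state which definition the paper or the compared methods use, and the function name is hypothetical.

```python
import numpy as np


def orientation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between rotation matrices.

    R_pred, R_gt: arrays of shape (..., 3, 3) holding valid rotations.
    Returns an array of shape (...) with one angle per pair.
    """
    R_pred = np.asarray(R_pred, dtype=float)
    R_gt = np.asarray(R_gt, dtype=float)
    # Relative rotation that takes the prediction onto the ground truth.
    R_rel = np.swapaxes(R_pred, -1, -2) @ R_gt
    cos_angle = (np.trace(R_rel, axis1=-2, axis2=-1) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```

Taking the mean of the returned angles over all bones and frames gives one number per sequence, comparable in form to the entries in the table.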