MeshLifter: Weakly Supervised 3D Human Mesh Reconstruction from a Single 2D Pose Based on Loop Structure
Sunwon Jeong, Ju Yong Chang.
Abstract
In this paper, we address the problem of 3D human mesh reconstruction from a single 2D human pose based on deep learning. We propose MeshLifter, a network that estimates a 3D human mesh from an input 2D human pose. Unlike most existing 3D human mesh reconstruction studies, which train models using paired 2D and 3D data, we propose a weakly supervised learning method based on a loop structure to train MeshLifter. The proposed method alleviates the difficulty of obtaining ground-truth 3D data, so that MeshLifter can be trained successfully from a 2D human pose dataset and an unpaired 3D motion capture dataset. We compare the proposed method with recent state-of-the-art studies through various experiments and show that it achieves effective 3D human mesh reconstruction performance. Notably, our method achieves a reconstruction error of 59.1 mm without using the 3D ground-truth data of Human3.6M, the standard dataset for 3D human mesh reconstruction.
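As a rough illustration of the loop structure described above, the following sketch shows how self-consistency losses can be built from a 2D pose alone: the pose is lifted to 3D, rotated around the z-axis, reprojected to a synthetic 2D view, and lifted again, with the two lifts forced to agree. This is a minimal sketch under assumed conventions (orthographic projection, a `mesh_lifter` callable that maps a 2D pose to 3D joints); the actual MeshLifter regresses SMPL mesh parameters, and its exact loss formulation is not reproduced here.

```python
import math
import torch

def orthographic_project(joints_3d):
    """Orthographic projection: keep only the x and y coordinates."""
    return joints_3d[..., :2]

def rotate_z(joints_3d, theta):
    """Rotate 3D joints of shape (..., 3) around the z-axis by theta radians."""
    c, s = torch.cos(theta), torch.sin(theta)
    zero, one = torch.zeros_like(c), torch.ones_like(c)
    rot = torch.stack([
        torch.stack([c, -s, zero]),
        torch.stack([s, c, zero]),
        torch.stack([zero, zero, one]),
    ])
    return joints_3d @ rot.T

def loop_losses(mesh_lifter, pose_2d):
    """Weakly supervised losses computed from a single 2D pose, no 3D labels.

    mesh_lifter: network mapping a 2D pose to 3D joints (hypothetical
    interface for illustration).
    """
    joints_3d = mesh_lifter(pose_2d)               # first lift
    theta = torch.rand(()) * 2 * math.pi           # random z-axis rotation
    rotated_3d = rotate_z(joints_3d, theta)
    rotated_2d = orthographic_project(rotated_3d)  # synthesized second view
    relifted_3d = mesh_lifter(rotated_2d)          # second lift closes the loop

    # Self loss: the first lift must reproject onto the input 2D pose.
    self_loss = torch.mean((orthographic_project(joints_3d) - pose_2d) ** 2)
    # Loop loss: the two lifts must agree up to the known rotation.
    loop_loss = torch.mean((relifted_3d - rotated_3d) ** 2)
    return self_loss, loop_loss
```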
Keywords: 3D human mesh reconstruction; 3D human pose estimation; deep neural network; weakly supervised learning
Year: 2020 PMID: 32751645 PMCID: PMC7436123 DOI: 10.3390/s20154257
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Overview of the proposed method.
Comparison of our proposed method with previous methods for 3D human body mesh reconstruction. “Optimization” indicates that the method depends on an optimization process, which requires parameter initialization and is generally slow. “Regression” indicates that the method is a deep neural network that requires large-scale training data. “Paired 2D–3D” indicates that paired 2D and 3D data must be used for training. “2D pose input” indicates that a 2D human pose can be used as the input to the method instead of a red, green, and blue (RGB) image.
| Method | Optimization | Regression | Paired 2D–3D | 2D Pose Input |
|---|---|---|---|---|
| SMPLify | √ | | | √ |
| UP-3D | √ | | | |
| HMR | | √ | | |
| CMR | | √ | √ | |
| RGB-D | | √ | √ | |
| SPIN | √ | √ | | |
| Ours | | √ | | √ |
Figure 2. Overview of the MeshLifter.
Figure 3. Overview of the loop structure.
Figure 4. Rotation around the z-axis.
Description of the datasets used in our experiment.
| Dataset | Human3.6M | MPI-INF-3DHP | MoSh | MPII |
|---|---|---|---|---|
| Data acquisition | Marker-based motion capture | Marker-less motion capture | Marker-based motion capture | YouTube search |
| 2D image | √ | √ | | √ |
| 2D human pose | √ | √ | | √ |
| 3D human pose | √ | √ | √ | |
| SMPL parameters | | | √ | |
| Number of subjects | 11 | 8 | 39 | 40K |
| Number of examples | 3.6M | 100K | 410K | 40K |
| Purpose of use | Training and evaluation | Training and evaluation | Adversarial training | Qualitative evaluation |
Figure 5. Curves of all our losses during training.
Performance of 2D human pose estimation. The numbers denote mean Euclidean distances in pixels.
| Dataset | Pixel Error |
|---|---|
| Human3.6M | 3.2 |
| MPII | 6.3 |
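The pixel errors above are mean Euclidean distances between predicted and ground-truth 2D joint locations. A minimal sketch of that metric, assuming NumPy arrays of joint coordinates (the array layout is an assumption for illustration):

```python
import numpy as np

def mean_pixel_error(pred_2d, gt_2d):
    """Mean Euclidean distance in pixels.

    pred_2d, gt_2d: arrays of shape (num_examples, num_joints, 2) holding
    predicted and ground-truth 2D joint locations.
    """
    return float(np.linalg.norm(pred_2d - gt_2d, axis=-1).mean())
```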
Ablation experiments with various combinations of losses. The numbers denote reconstruction errors in mm.
| Loss Variations | Reconstruction Error |
|---|---|
| Self (baseline) | 157.0 |
| Self + Loop | 136.1 |
| Self + Loop + Mesh | 83.5 |
| Self + Loop + Mesh + 2D | 58.8 |
| Self + Loop + Mesh + 2D + Reg | 59.1 |
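The ablation rows correspond to adding loss terms one at a time to the training objective. A minimal sketch of how such terms are typically combined, assuming a simple weighted sum; the weight values below are placeholders, not the paper's settings:

```python
# Placeholder weights; the paper's actual weighting is not reproduced here.
LOSS_WEIGHTS = {"self": 1.0, "loop": 1.0, "mesh": 1.0, "2d": 1.0, "reg": 0.1}

def total_loss(terms):
    """Combine named loss terms (Self, Loop, Mesh, 2D, Reg) into one objective.

    terms: dict mapping a loss name to its scalar tensor value.
    """
    return sum(LOSS_WEIGHTS[name] * value for name, value in terms.items())
```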
Figure 6. Input images (left), and reconstruction results without (middle) and with (right) the regularization term.
Quantitative results of the proposed model and the existing state-of-the-art methods for the Human3.6M dataset. The numbers denote reconstruction errors in mm.
| Method | Reconstruction Error |
|---|---|
| SMPLify | 82.0 |
| Pavlakos et al. | 75.9 |
| HMR-unpaired | 66.5 |
| SPIN-unpaired | 62.0 |
| Ours | 59.1 |
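In this literature, "reconstruction error" conventionally denotes the mean per-joint position error after a rigid similarity (Procrustes) alignment of the prediction to the ground truth. A sketch of that standard metric follows, under the assumption that the tables here use this convention:

```python
import numpy as np

def reconstruction_error(pred, gt):
    """Mean per-joint error in mm after similarity (Procrustes) alignment.

    pred, gt: arrays of shape (num_joints, 3) in millimeters.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g                  # center both point sets
    u, s, vt = np.linalg.svd(p.T @ g)              # orthogonal Procrustes
    if np.linalg.det(u @ vt) < 0:                  # exclude reflections
        vt[-1] *= -1
        s[-1] *= -1
    rot = u @ vt                                   # optimal rotation (row form)
    scale = s.sum() / (p ** 2).sum()               # optimal isotropic scale
    aligned = scale * (p @ rot) + mu_g
    return float(np.linalg.norm(aligned - gt, axis=1).mean())
```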
Quantitative results of the proposed model and the existing state-of-the-art methods for the MPI-INF-3DHP dataset. The numbers denote reconstruction errors in mm.
| Method | Reconstruction Error |
|---|---|
| HMR-unpaired | 113.2 |
| VNect | 98.0 |
| Ours | 96.0 |
| SPIN-unpaired | |
Figure 7. Qualitative results on the Human3.6M dataset.
Figure 8. Qualitative results on the MPI-INF-3DHP dataset.
Figure 9. Qualitative results on the MPII dataset.
Quantitative comparison with existing methods on the Rendered Handpose Dataset (RHD). The numbers denote reconstruction errors in mm.
| Method | Reconstruction Error |
|---|---|
| Zimmermann and Brox | 30.42 |
| Yang and Yao | 19.95 |
| Spurr et al. | 19.73 |
| Yang et al. | |
| Ours | 14.02 |
Figure 10. Qualitative results on the RHD dataset.