Hao Wang, Ming-Hui Sun, Hao Zhang, Li-Yan Dong.
Abstract
Cross-view 3D human pose estimation models have made significant progress: by fusing multiple views, they localize human joints and model the skeleton in 3D more accurately. The multi-view 2D pose estimation stage of such a model, which uses deep learning networks to generate heatmaps for each view, is very important, but its training cost is also very high. In this article, we therefore tested several recent deep learning networks on pose estimation tasks: Mobilenetv2, Mobilenetv3, Efficientnetv2 and Resnet. Based on the performance and drawbacks of these networks, we then built multiple deep learning networks with better performance. We call these networks LHPE-nets; they mainly comprise the Low-Span network and the RDNS network. LHPE-nets use a network structure with evenly distributed channels, inverted residuals, external residual blocks and a framework for processing small-resolution samples, so that training reaches saturation faster. We also designed a static pose sample simplification method for 3D pose data, which stores samples at low cost and makes them convenient for models to read. In the experiments, we compared against several recent models using two public evaluation metrics. The results show the advantages of this work in fast start-up and network size: during training it is about 1-5 epochs faster than Resnet-34. They also show improved accuracy in estimating individual joints: estimation performance improves for approximately 60% of the joints. In overall human pose estimation, it outperforms the other networks by more than 7 mm. The experiments analyze in detail the network size, the fast start-up, and the 2D and 3D pose estimation performance of the proposed model. Compared with other pose estimation models, its performance has also reached a higher level of applicability.
Year: 2022 PMID: 35196346 PMCID: PMC8865690 DOI: 10.1371/journal.pone.0264302
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of the Resnet-18, Resnet-34 and Low-S networks in number of parameters and estimated memory usage.
| network | Total params | Trainable params | Estimated Total |
|---|---|---|---|
| Resnet-18 | 11176512 | 11176512 | 291.39MB |
| Resnet-34 | 21284672 | 21284672 | 504.94MB |
| Low-S | 9037312 | 9037312 | 485.04MB |
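As a rough cross-check of the table above, the sketch below converts the parameter counts into weight storage and compares Low-S with Resnet-34. It assumes 32-bit floating-point weights (4 bytes per parameter), which the table does not state; the helper names are hypothetical.

```python
# Parameter counts copied from the table above.
PARAMS = {
    "Resnet-18": 11176512,
    "Resnet-34": 21284672,
    "Low-S": 9037312,
}

BYTES_PER_PARAM = 4  # assumption: float32 weights


def weight_size_mib(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Storage needed for the weights alone, in MiB (hypothetical helper)."""
    return n_params * bytes_per_param / (1024 ** 2)


def relative_reduction(n_small, n_large):
    """Fraction by which n_small is smaller than n_large."""
    return 1.0 - n_small / n_large


if __name__ == "__main__":
    for name, n in PARAMS.items():
        print(f"{name}: {weight_size_mib(n):.2f} MiB of weights")
    saved = relative_reduction(PARAMS["Low-S"], PARAMS["Resnet-34"])
    print(f"Low-S has {saved:.1%} fewer parameters than Resnet-34")
```

Under this assumption, Low-S has roughly 57.5% fewer parameters than Resnet-34, while the estimated totals in the table differ by only about 4%. The estimated total therefore appears to be dominated by something other than weight storage, most likely intermediate activations, although the table itself does not say so.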
The JDR results of each joint on the MPI-INF-3DHP dataset.
The better results of our method are shown in bold. R18 = Resnet-18, R34 = Resnet-34, M2 = Mobilenetv2.
| Method | R18 | R34 | M2 | RDNS |
|---|---|---|---|---|
| Right hip | 0.947 | 0.946 | 0.82 | |
| Right knee | 0.529 | 0.544 | 0.568 | |
| Right ankle | 0.363 | 0.432 | 0.497 | |
| Left hip | 0.942 | 0.953 | 0.828 | |
| Left knee | 0.521 | 0.537 | 0.601 | 0.52 |
| Left ankle | 0.416 | 0.435 | 0.512 | 0.394 |
| Belly | 0.914 | 0.953 | 0.87 | |
| Neck | 0.922 | 0.921 | 0.809 | 0.916 |
| Nose | 0.912 | 0.916 | 0.827 | 0.912 |
| Head | 0.871 | 0.875 | 0.735 | 0.872 |
| Left shoulder | 0.792 | 0.835 | 0.598 | |
| Left elbow | 0.592 | 0.623 | 0.404 | |
| Left wrist | 0.476 | 0.503 | 0.254 | |
| Right shoulder | 0.738 | 0.772 | 0.571 | |
| Right elbow | 0.612 | 0.6 | 0.397 | |
| Right wrist | 0.547 | 0.494 | 0.293 | |
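For reference, a joint detection rate of this kind is usually computed per joint as the fraction of samples in which the predicted joint lies within a distance threshold of the ground truth. The sketch below assumes 2D joint coordinates and a caller-supplied threshold per sample (for example, half the head size, a common choice in multi-view pose work); the function name, data layout and threshold choice are illustrative assumptions, not details taken from this article.

```python
import math


def joint_detection_rate(pred, gt, thresholds):
    """Per-joint detection rate (illustrative helper).

    pred, gt: lists of samples, each a list of (x, y) joint positions.
    thresholds: one distance threshold per sample (assumption: e.g. half
    the head size of that sample).
    Returns a list with one rate in [0, 1] per joint.
    """
    if not pred or len(pred) != len(gt) or len(pred) != len(thresholds):
        raise ValueError("pred, gt and thresholds must be non-empty and of equal length")
    n_joints = len(gt[0])
    hits = [0] * n_joints
    for sample_pred, sample_gt, thr in zip(pred, gt, thresholds):
        for j, (p, g) in enumerate(zip(sample_pred, sample_gt)):
            # A joint counts as detected when it is within the threshold.
            if math.dist(p, g) <= thr:
                hits[j] += 1
    return [h / len(gt) for h in hits]
```

A value such as 0.947 for the right hip would then mean that the joint was detected in about 94.7% of the evaluated samples.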
The JDR (%) results of each joint on the MPII dataset.
The better results of our method are shown in bold.
| Method | Resnet-18 | Resnet-34 | RDNS |
|---|---|---|---|
| Right hip | 0.428 | 0.438 | 0.437 |
| Right knee | 0.535 | 0.539 | |
| Right ankle | 0.727 | 0.727 | |
| Left hip | 0.725 | 0.725 | |
| Left knee | 0.537 | 0.543 | |
| Left ankle | 0.424 | 0.448 | 0.444 |
| Belly | 0.77 | 0.77 | |
| Neck | 0.924 | 0.928 | 0.926 |
| Nose | 0.928 | 0.931 | |
| Head | 0.884 | 0.893 | |
| Left shoulder | 0.564 | 0.579 | 0.572 |
| Left elbow | 0.676 | 0.693 | 0.687 |
| Left wrist | 0.813 | 0.827 | |
| Right shoulder | 0.823 | 0.837 | 0.837 |
| Right elbow | 0.684 | 0.691 | 0.691 |
| Right wrist | 0.561 | 0.571 | |
Fig 1. The structure of the Resnet-18 and Resnet-34 networks.
The performance of CVF3D model (with Resnet-152) compared with other models in 3D pose estimation.
The metric (MPJPE) used here is introduced in the experiment section of this article. This experiment uses the H36M dataset.
| Methods | PVH-TSP | Wei | Pavlakos | Tome | Zheng | CVF3D |
|---|---|---|---|---|---|---|
| MPJPE | 87.3mm | 57.1mm | 56.9mm | 52.8mm | 49.5mm | 31.17mm |
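MPJPE is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and samples. A minimal sketch follows, assuming joints are given as (x, y, z) tuples in millimetres; the function name and data layout are illustrative, and any alignment step (such as root-joint centering) that an evaluation protocol may apply beforehand is left out.

```python
import math


def mpjpe(pred, gt):
    """Mean per joint position error over all samples and joints.

    pred, gt: lists of samples, each a list of (x, y, z) joint positions
    in the same unit (assumed here to be millimetres).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of samples")
    total = 0.0
    count = 0
    for sample_pred, sample_gt in zip(pred, gt):
        if len(sample_pred) != len(sample_gt):
            raise ValueError("each sample must have the same number of joints")
        for p, g in zip(sample_pred, sample_gt):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count
```

For example, one sample with two joints that are off by 0 mm and 5 mm yields an MPJPE of 2.5 mm; lower values are better.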
Fig 2. The realization and structure of our work, and its operation process.
Fig 3. Low-Sv1 and Low-Sv2 network structure.
Fig 4. Low-S network structure.
The “inverted residual block with external residual” on the right is the specific implementation inside the “inverted residual”.
Fig 5. The projections of the 3D pose data in the four views are converted into four small images with a resolution of 128×128 (or 256×256) without losing their relevance and continuity.
Fig 6. The structure of the RDNS deep network.
Fig 7. Comparative experiment results of Low-S, Low-Sv1, Low-Sv2 and other deep networks.
Fig 8. Comparative experiment results between RDNS network (the network is represented as an improved model) and Resnet-34 network.
The performance of our method in 3D pose estimation.
The values in the table are the mean per joint position error (MPJPE), in mm. The better results of our method are shown in bold. Actions are not distinguished in this table.
| Method | Resnet-18 | Resnet-34 | Mobilenetv2 | RDNS method |
|---|---|---|---|---|
| Origin model | 125.26 | 120.17 | 175.74 | |
Comparison of our method and multiple mainstream 3D pose estimation models in MPI-INF-3DHP.
The better results are shown in bold. The compared methods include multi-view models.
| Method | Mehta | VNect | Multi Person | Zhou | Kanazawa | CVF3D | Wei | Biswas | Ours |
|---|---|---|---|---|---|---|---|---|---|
| MPJPE | 117.6 | 124.7 | 122.2 | 137.1 | 113.2 | 120.17 | 117.2 | 120.17 | |
The 3D estimation performance of our method in different actions.
The values in the table are the mean per joint position error (MPJPE), in mm. The better results of our method are shown in bold. R18 = Resnet-18, R34 = Resnet-34, W/S = Walking/Standing, C/R = Crouch/Reach.
| Method | R18 | R34 | RDNS method |
|---|---|---|---|
| W/S | 95.9 | 97.63 | |
| Exercise | 181.5 | 197.88 | |
| Sitting (1) | 181.26 | 210.65 | |
| C/R | 104.76 | 138.5 | |
| On the Floor | 186.25 | 161.18 | 166.54 |
| Sports | 70.42 | 58.75 | 65.78 |
| Sitting (2) | 103.24 | 86.27 | 86.53 |
| Miscellaneous | 93.1 | 94.55 | |