Theocharis Chatzis, Dimitrios Konstantinidis, Kosmas Dimitropoulos.
Abstract
Ergonomic risk assessment is vital for identifying work-related human postures that can be detrimental to the health of a worker. Traditionally, ergonomic risks are reported by human experts through time-consuming and error-prone procedures; however, automatic algorithmic methods have recently started to emerge. To further facilitate automatic ergonomic risk assessment, this paper proposes a novel variational deep learning architecture to estimate the ergonomic risk of any work-related task by utilizing the Rapid Entire Body Assessment (REBA) framework. The proposed method relies on the processing of RGB images and the extraction of 3D skeletal information, which is then fed to a novel deep network for accurate and robust estimation of REBA scores for both individual body parts and the entire body. Through a variational approach, the proposed method processes the skeletal information to construct a descriptive skeletal latent space that can accurately model human postures. Moreover, the proposed method distills knowledge from ground truth ergonomic risk scores and leverages it to further enhance the discrimination ability of the skeletal latent space, leading to improved accuracy. Experiments on two well-known datasets (i.e., University of Washington Indoor Object Manipulation (UW-IOM) and Technische Universität München (TUM) Kitchen) validate the ability of the proposed method to achieve accurate results, outperforming current state-of-the-art methods.
Keywords: computer vision; deep learning; ergonomic risk assessment; work-related musculoskeletal disorders
Year: 2022 PMID: 36015812 PMCID: PMC9416453 DOI: 10.3390/s22166051
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. An overview of the proposed variational framework. Given an input image, 3D pose information is extracted by any human pose estimation algorithm and fed into a multi-stream encoder and a multi-layer Transformer encoder to model local and global joint interactions and generate the skeletal latent space. A second variational branch is employed to derive the true posterior distribution of REBA scores. Finally, the variational aligning process aims to bring the computed skeletal latent space closer to the one related to the ground truth REBA scores, enhancing the discrimination ability of the skeletal latent space and improving the ergonomic risk assessment results. The operator ⊗ denotes concatenation.
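The variational aligning step can be illustrated with the closed-form Kullback-Leibler divergence between two diagonal Gaussian distributions. This is only a sketch under assumptions: this record does not state the exact alignment loss, and the symbols $q_s$ (skeletal latent distribution), $q_r$ (distribution derived from the ground truth REBA scores), $\mu$, $\sigma$ and the latent dimension $d$ are introduced here for illustration.

```latex
D_{\mathrm{KL}}\left(q_s \,\|\, q_r\right)
  = \frac{1}{2} \sum_{i=1}^{d} \left(
      \log \frac{\sigma_{r,i}^{2}}{\sigma_{s,i}^{2}}
      + \frac{\sigma_{s,i}^{2} + \left(\mu_{s,i} - \mu_{r,i}\right)^{2}}{\sigma_{r,i}^{2}}
      - 1
    \right)
```

Minimizing such a term would pull the skeletal latent distribution toward the one obtained from the ground truth scores, which is the behavior the caption describes.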
The selected joints employed in the proposed methodology.
| Selected Joints | |
|---|---|
| Head | Left Wrist |
| Neck | Right Hip |
| Right Shoulder | Left Hip |
| Left Shoulder | Right Knee |
| Right Elbow | Left Knee |
| Left Elbow | Right Ankle |
| Right Wrist | Left Ankle |
The proposed network employs three streams that contain: trunk joints, arms joints, and legs joints.
| Trunk Stream | Arms Stream | Legs Stream |
|---|---|---|
| Head | Head | Head |
| Neck | Neck | Neck |
| Right Shoulder | Right Shoulder | Right Hip |
| Left Shoulder | Left Shoulder | Left Hip |
| Right Hip | Right Elbow | Right Knee |
| Left Hip | Left Elbow | Left Knee |
| | Right Wrist | Right Ankle |
| | Left Wrist | Left Ankle |
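The grouping of joints into three streams can be sketched as follows. The stream membership below assumes the anatomical grouping (arms: shoulders, elbows, wrists; legs: hips, knees, ankles; head and neck shared by all streams). The names `STREAMS` and `split_into_streams` and the dictionary-based pose format are hypothetical and not taken from the authors' code.

```python
# Assumed stream membership (anatomical grouping; head and neck are shared).
STREAMS = {
    "trunk": ["Head", "Neck", "Right Shoulder", "Left Shoulder",
              "Right Hip", "Left Hip"],
    "arms": ["Head", "Neck", "Right Shoulder", "Left Shoulder",
             "Right Elbow", "Left Elbow", "Right Wrist", "Left Wrist"],
    "legs": ["Head", "Neck", "Right Hip", "Left Hip",
             "Right Knee", "Left Knee", "Right Ankle", "Left Ankle"],
}


def split_into_streams(pose):
    """Flatten the 3D coordinates of each stream's joints into one vector.

    `pose` maps a joint name to an (x, y, z) tuple (hypothetical format).
    Returns a dict mapping stream name to a flat list of coordinates.
    """
    streams = {}
    for name, joints in STREAMS.items():
        vector = []
        for joint in joints:
            vector.extend(pose[joint])  # append x, y, z of this joint
        streams[name] = vector
    return streams


# Toy pose: every joint gets a distinct dummy coordinate.
all_joints = sorted({j for joints in STREAMS.values() for j in joints})
toy_pose = {j: (float(i), float(i), float(i)) for i, j in enumerate(all_joints)}
toy_streams = split_into_streams(toy_pose)
print({name: len(vec) for name, vec in toy_streams.items()})
```

Each stream vector has three values per joint, so its length plays the role of the per-stream input dimensionality.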
Figure 2. An overview of (a) the multi-stream encoder and (b) the multi-layer Transformer encoder. The multi-stream encoder upsamples the feature vector of each input stream of dimensionality K in order to model the local relationships between joints. The Transformer encoder performs self-attention through three encoder blocks and dimensionality reduction using linear projections. The final output is a pair of vectors that composes the skeletal latent distribution. Each encoder block has N layers and H attention heads. P denotes the dimension of the concatenated input skeletal feature vector.
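How a pair of output vectors can define a latent distribution is sketched below. It assumes, as is common in variational models, that the two vectors are a mean and a log-variance and that samples are drawn with the reparameterization trick; this record does not confirm that parameterization, and `sample_latent` is a hypothetical helper.

```python
import math
import random


def sample_latent(mean, log_var, rng=random):
    """Draw z = mean + sigma * eps with eps ~ N(0, 1), element-wise.

    Assumption: `mean` and `log_var` are equal-length lists that
    parameterize a diagonal Gaussian (not confirmed by the source).
    """
    if len(mean) != len(log_var):
        raise ValueError("mean and log_var must have the same length")
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)  # sigma = exp(log_var / 2)
        for m, lv in zip(mean, log_var)
    ]


# Toy usage with a seeded generator for repeatability.
z = sample_latent([0.0, 1.0, -1.0], [0.0, -2.0, -4.0], random.Random(0))
print(len(z))
```

Writing the sample as a deterministic function of the two vectors plus external noise is what lets gradients flow back to the encoder in this family of models.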
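Comparison against state-of-the-art approaches on the UW-IOM and TUM Kitchen datasets. The values for MSDN and the proposed method match the MSE of the total REBA score reported in the per-dataset tables.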
| Method | UW-IOM | TUM Kitchen |
|---|---|---|
| MTL-base | 0.89 ± 0.24 | 1.18 ± 0.68 |
| MTL-emb | 0.61 ± 0.36 | 1.11 ± 0.38 |
| MSDN | 0.31 ± 0.04 | 0.28 ± 0.03 |
| Proposed | 0.297 ± 0.03 | 0.265 ± 0.04 |
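The gap between the proposed method and MSDN in the comparison table can be expressed as a relative error reduction. The short calculation below uses only the mean values from that table; `relative_reduction` is a hypothetical helper, and the reported deviations are ignored, so this says nothing about statistical significance.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers the error of `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Mean errors copied from the comparison table: (MSDN, Proposed).
table_means = {"UW-IOM": (0.31, 0.297), "TUM Kitchen": (0.28, 0.265)}

for dataset, (msdn, proposed) in table_means.items():
    print(f"{dataset}: {relative_reduction(msdn, proposed):.1f}% lower than MSDN")
```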
Performance of the proposed method in terms of partial and total REBA scores in the UW-IOM dataset.
| REBA Scores | Proposed MSE | Proposed MAE | Proposed RMSE | MSDN MSE | MSDN MAE | MSDN RMSE |
|---|---|---|---|---|---|---|
| Neck | 0.024 ± 0.002 | 0.113 ± 0.02 | 0.168 ± 0.04 | 0.03 ± 0.003 | 0.117 ± 0.03 | 0.171 ± 0.03 |
| Trunk | 0.075 ± 0.061 | 0.211 ± 0.03 | 0.278 ± 0.03 | 0.079 ± 0.065 | 0.217 ± 0.02 | 0.281 ± 0.04 |
| Legs | 0.071 ± 0.082 | 0.186 ± 0.03 | 0.261 ± 0.05 | 0.075 ± 0.077 | 0.192 ± 0.04 | 0.265 ± 0.05 |
| Upper arms | 0.095 ± 0.029 | 0.221 ± 0.04 | 0.301 ± 0.04 | 0.098 ± 0.023 | 0.226 ± 0.05 | 0.306 ± 0.03 |
| Lower arms | 0.015 ± 0.002 | 0.076 ± 0.03 | 0.119 ± 0.03 | 0.017 ± 0.001 | 0.079 ± 0.03 | 0.121 ± 0.04 |
| Total | 0.297 ± 0.032 | 0.377 ± 0.04 | 0.531 ± 0.06 | 0.31 ± 0.04 | 0.395 ± 0.05 | 0.557 ± 0.07 |
Performance of the proposed method in terms of partial and total REBA scores in the TUM Kitchen dataset.
| REBA Scores | Proposed MSE | Proposed MAE | Proposed RMSE | MSDN MSE | MSDN MAE | MSDN RMSE |
|---|---|---|---|---|---|---|
| Neck | 0.018 ± 0.002 | 0.090 ± 0.02 | 0.135 ± 0.04 | 0.02 ± 0.003 | 0.093 ± 0.02 | 0.139 ± 0.03 |
| Trunk | 0.053 ± 0.009 | 0.159 ± 0.03 | 0.228 ± 0.03 | 0.055 ± 0.008 | 0.161 ± 0.03 | 0.236 ± 0.04 |
| Legs | 0.085 ± 0.005 | 0.213 ± 0.04 | 0.276 ± 0.05 | 0.088 ± 0.002 | 0.221 ± 0.05 | 0.286 ± 0.04 |
| Upper arms | 0.061 ± 0.018 | 0.186 ± 0.03 | 0.242 ± 0.03 | 0.065 ± 0.016 | 0.194 ± 0.04 | 0.251 ± 0.03 |
| Lower arms | 0.012 ± 0.002 | 0.067 ± 0.02 | 0.109 ± 0.02 | 0.013 ± 0.002 | 0.069 ± 0.02 | 0.114 ± 0.02 |
| Total | 0.265 ± 0.042 | 0.364 ± 0.04 | 0.499 ± 0.05 | 0.28 ± 0.03 | 0.389 ± 0.05 | 0.529 ± 0.06 |
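For reference, the three error measures used in the tables above (MSE, MAE, RMSE) can be computed as in the following minimal sketch. The function name `regression_errors` and the toy score values are hypothetical; how the authors aggregate results into the reported mean and deviation is not described in this record.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MSE, MAE and RMSE between two equal-length score lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    n = len(diffs)
    mse = sum(d * d for d in diffs) / n   # mean squared error
    mae = sum(abs(d) for d in diffs) / n  # mean absolute error
    rmse = math.sqrt(mse)                 # root of the mean squared error
    return {"mse": mse, "mae": mae, "rmse": rmse}


# Toy REBA scores (made up for illustration, not taken from either dataset).
errors = regression_errors([4, 5, 7, 3], [4.5, 5.0, 6.0, 3.5])
print(errors)
```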
Ground truth and predicted REBA risk level distribution on the UW-IOM and TUM Kitchen datasets.
| REBA Risk Level | UW-IOM Ground Truth | UW-IOM Predicted | TUM Kitchen Ground Truth | TUM Kitchen Predicted |
|---|---|---|---|---|
| Negligible | 0% | 0% | 0% | 0% |
| Low | 12.32% | 3.84% | 12.07% | 6.24% |
| Medium | 79.34% | 89.23% | 76.44% | 83.79% |
| High | 8.32% | 6.91% | 11.42% | 9.89% |
| Very High | 0.02% | 0.02% | 0.07% | 0.08% |
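A risk level distribution like the one above can be derived from total REBA scores as sketched below. The thresholds follow the standard REBA action levels (1 negligible, 2 to 3 low, 4 to 7 medium, 8 to 10 high, 11 and above very high). Rounding continuous predictions to the nearest integer before binning is an assumption made here, and the helper names are hypothetical.

```python
import math
from collections import Counter

LEVELS = ["Negligible", "Low", "Medium", "High", "Very High"]


def reba_risk_level(score):
    """Map a total REBA score to its risk level.

    Assumption: continuous predictions are rounded half up and
    clamped to a minimum of 1 before the thresholds are applied.
    """
    s = max(1, math.floor(score + 0.5))
    if s == 1:
        return "Negligible"
    if s <= 3:
        return "Low"
    if s <= 7:
        return "Medium"
    if s <= 10:
        return "High"
    return "Very High"


def risk_distribution(scores):
    """Percentage of scores falling into each risk level."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(reba_risk_level(s) for s in scores)
    return {level: 100.0 * counts[level] / len(scores) for level in LEVELS}


# Toy scores (made up for illustration).
print(risk_distribution([2, 5.2, 4.8, 9]))
```

Binning in this way also shows why small regression errors near a threshold can shift a frame from one level to the neighboring one.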
Figure 3. Visualization of ground truth (red lines) and predicted (blue lines) partial and total REBA scores on a video sequence from the UW-IOM dataset. At the top, five frames (a–e) that correspond to extreme postures with high ergonomic risks are displayed, while REBA scores for individual body parts and the whole body follow.