Elise Klæbo Vonstad, Xiaomeng Su, Beatrix Vereijken, Kerstin Bach, Jan Harald Nilsen
Abstract
Using standard digital cameras in combination with deep learning (DL) for pose estimation is promising for the in-home and independent use of exercise games (exergames). We need to investigate to what extent such DL-based systems can provide satisfactory accuracy on exergame-relevant measures. Our study assesses temporal variation (i.e., variability) in body segment lengths while using a deep learning image processing tool (DeepLabCut, DLC) on two-dimensional (2D) video. This variability is then compared with a gold-standard, marker-based three-dimensional motion capture system (3DMoCap, Qualisys AB) and a 3D RGB-depth camera system (Kinect V2, Microsoft Inc.). Data were collected simultaneously from all three systems while participants (N = 12) played a custom balance training exergame. The DLC pose estimation model is pre-trained on a large-scale dataset (ImageNet) and optimized with context-specific, pose-annotated images. Wilcoxon's signed-rank test was performed to assess the statistical significance of the differences in variability between systems. The results showed that the DLC method performs comparably to the Kinect and, in some segments, even to the 3DMoCap gold standard system with regard to variability. These results are promising for making exergames more accessible and easier to use, thereby increasing their availability for in-home exercise.
Keywords: deep learning; exergaming; human movement; image analysis; kinect; markerless motion capture; motion capture; segment lengths
Year: 2020 PMID: 33291687 PMCID: PMC7730529 DOI: 10.3390/s20236940
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Screenshots from the game. (A) shows the start of the game, and (B) shows the cart leaning sideways with the movements of the player to hit coins along the track.
Figure 2. Experimental setup.
Figure 3. Joint centers as defined by the three motion capture systems (not to scale). (A) = DeepLabCut, (B) = 3DMoCap, (C) = Kinect.
Figure 4. Joint locations, axes directions, and segment length definitions extracted from all three motion capture systems.
Mean segment lengths in mm (1 SD). L = left, R = right. N = number of participants contributing data. 3DMoCap = 3D motion capture system, DLC = DeepLabCut.
| Segment | Side | N | 3DMoCap | N | DLC | N | Kinect |
|---|---|---|---|---|---|---|---|
| Shoulders | | 11 | 328.8 (23.5) | 12 | 308.8 (25.5) | 12 | 330.9 (20.2) |
| Upper arm | L | 11 | 269.5 (19.1) | 12 | 351.0 (20.5) | 12 | 269.8 (15.9) |
| | R | 11 | 279.5 (22.6) | 12 | 357.8 (23.2) | 12 | 267.1 (13.2) |
| Lower arm | L | 11 | 228.5 (20.4) | 12 | 228.7 (16.6) | 12 | 235.8 (13.3) |
| | R | 11 | 225.1 (12.8) | 12 | 231.9 (16.1) | 12 | 235.3 (14.8) |
| Torso | L | 11 | 444.7 (27.8) | 12 | 568.5 (33.7) | 12 | 503.8 (27.4) |
| | R | 11 | 439.9 (25.9) | 12 | 566.1 (38.3) | 12 | 497.9 (27.6) |
| Pelvis | | 11 | 148.6 (5.6) | 12 | 280.6 (28.5) | 12 | 154.8 (9.5) |
| Thigh | L | 11 | 409.0 (33.2) | 12 | 405.9 (21.9) | 12 | 373.8 (26.1) |
| | R | 11 | 410.6 (33.0) | 12 | 411.9 (27.1) | 12 | 372.5 (29.4) |
| Shank | L | 11 | 404.9 (23.4) | 8 | 415.2 (34.0) | 12 | 378.8 (29.0) |
| | R | 11 | 402.8 (22.4) | 8 | 414.4 (33.5) | 12 | 374.3 (27.3) |
Mean standard deviation in mm (coefficient of variation) of segment lengths. L = left, R = right. N = number of participants contributing data. 3DMoCap = 3D motion capture system, DLC = DeepLabCut. Light green = lowest mean SD within system; light red = highest mean SD within system. Bright green = overall lowest mean SD; bright red = overall highest mean SD.
| Segment | Side | N | 3DMoCap | N | DLC | N | Kinect |
|---|---|---|---|---|---|---|---|
| Shoulders | | 11 | 9.1 (0.02) | 12 | 16.6 (0.04) | 12 | 17.3 (0.05) |
| Upper arm | L | 11 | 7.4 (0.03) | 12 | 11.7 (0.04) | 12 | 15.1 (0.05) |
| | R | 11 | 7.3 (0.02) | 12 | 13.0 (0.04) | 12 | 15.2 (0.06) |
| Lower arm | L | 11 | 9.6 (0.04) | 12 | 14.4 (0.08) | 12 | 13.7 (0.06) |
| | R | 11 | 10.2 (0.04) | 12 | 20.4 (0.08) | 12 | 13.3 (0.05) |
| Torso | L | 11 | 15.9 (0.03) | 12 | 22.5 (0.04) | 12 | 12.8 (0.02) |
| | R | 11 | 15.7 (0.09) | 12 | 22.5 (0.03) | 12 | 13.1 (0.02) |
| Pelvis | | 11 | 2.8 (0.01) | 12 | 7.3 (0.04) | 12 | 6.1 (0.03) |
| Thigh | L | 11 | 8.3 (0.02) | 12 | 16.4 (0.03) | 12 | 25.5 (0.06) |
| | R | 11 | 8.7 (0.02) | 12 | 20.5 (0.04) | 12 | 23.1 (0.06) |
| Shank | L | 11 | 8.6 (0.02) | 8 | 14.5 (0.02) | 12 | 21.1 (0.05) |
| | R | 11 | 8.6 (0.02) | 8 | 13.6 (0.02) | 12 | 20.5 (0.05) |
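The variability measure reported above can be illustrated with a minimal sketch. It assumes that a segment length is the per-frame Euclidean distance between two joint centers and that variability is the standard deviation of that length over time, with the coefficient of variation taken as SD divided by the mean. The function names, the sample coordinates, and the choice of the sample SD (n - 1 denominator) are illustrative assumptions, not the authors' processing pipeline.

```python
import math
from statistics import mean, stdev

def segment_lengths(joint_a, joint_b):
    """Per-frame Euclidean distance between two joint-center trajectories."""
    return [math.dist(a, b) for a, b in zip(joint_a, joint_b)]

def variability(lengths):
    """Return (mean, SD, coefficient of variation) of a length time series."""
    m = mean(lengths)
    sd = stdev(lengths)  # sample SD (n - 1 denominator); an assumption
    return m, sd, sd / m

# Hypothetical 2D knee and ankle positions (mm) over four frames
knee = [(0.0, 400.0), (0.0, 410.0), (5.0, 400.0), (0.0, 395.0)]
ankle = [(0.0, 0.0), (0.0, 0.0), (5.0, 0.0), (0.0, 0.0)]

shank = segment_lengths(knee, ankle)
m, sd, cv = variability(shank)
print(f"mean={m:.1f} mm, SD={sd:.1f} mm, CV={cv:.3f}")
```

Because a rigid body segment should keep a constant length, any SD above zero in such a time series reflects measurement or tracking noise rather than anatomy, which is why a lower SD indicates a more stable system.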
Chi-square (χ²), p-value, and mean ranks from the Friedman test of statistical difference between mean segment length standard deviations. Df = degrees of freedom. L = left, R = right. 3DMoCap = 3D motion capture system, DLC = DeepLabCut.
| Segment | Side | χ² (Df) | p-value | Mean rank 3DMoCap | Mean rank DLC | Mean rank Kinect |
|---|---|---|---|---|---|---|
| Upper arm | L | 3.8 (2) | 0.148 | 1.55 | 2.09 | 2.36 |
| | R | 8.7 (2) | 0.023 | 1.27 | 2.36 | 2.36 |
| Lower arm | L | 11.6 (2) | 0.003 | 1.27 | 2.73 | 2.0 |
| | R | 7.81 (2) | 0.020 | 1.45 | 2.64 | 1.91 |
| Shoulders | | 11.1 (2) | 0.004 | 1.18 | 2.45 | 2.36 |
| Torso | L | 5.6 (2) | 0.060 | 1.91 | 2.55 | 1.55 |
| | R | 5.5 (2) | 0.103 | 2.00 | 2.45 | 1.55 |
| Pelvis | | 20.2 (2) | 0.000 | 1.09 | 3.00 | 1.91 |
| Thigh | L | 16.5 (2) | 0.000 | 1.18 | 1.91 | 2.91 |
| | R | 16.9 (2) | 0.000 | 1.0 | 2.36 | 2.64 |
| Shank | L | 4.6 (2) | 0.102 | 1.43 | 2.00 | 2.57 |
| | R | 8.9 (2) | 0.012 | 1.14 | 2.14 | 2.71 |
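To make the quantities in this table concrete, the following sketch computes a Friedman chi-square statistic and the mean ranks by hand for three systems. The data are hypothetical, the helper names are illustrative, and no tie correction is applied, so results may differ slightly from a statistics package. The p-value line uses the closed form of the chi-square tail probability that holds only for two degrees of freedom, and the chi-square approximation itself is rough for small samples.

```python
import math

def rank_row(values):
    """Rank values within one participant (1 = smallest), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def friedman(rows):
    """rows: one list per participant, one value per system.
    Returns (chi_square, df, mean_ranks), without tie correction."""
    n, k = len(rows), len(rows[0])
    ranked = [rank_row(r) for r in rows]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return chi2, k - 1, [s / n for s in rank_sums]

# Hypothetical per-participant SDs (mm); columns: 3DMoCap, DLC, Kinect
sds = [[7.1, 12.0, 15.3], [8.0, 11.2, 14.8], [6.9, 13.5, 12.9], [7.7, 10.9, 16.0]]

chi2, df, mean_ranks = friedman(sds)
p = math.exp(-chi2 / 2)  # chi-square tail probability, valid only for df = 2
print(chi2, df, mean_ranks, round(p, 3))
```

A lower mean rank means that a system tended to have the smallest segment length SD across participants, so the mean rank columns indicate the direction of any difference that the chi-square statistic flags.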
Figure 5. Comparison of variations in temporal segment lengths of the shoulder (A) and right shank (B). Image recording frequency 30 Hz.
Figure 6. Box plots of the variation in standard deviations of upper arms (A), lower arms (B), torso (C), shoulders and pelvis (D), thighs (E), and shanks (F) for the left and right side of the body. 3DMoCap = 3D motion capture system, DLC = DeepLabCut. Dotted lines signify p > 0.017, solid lines p < 0.017 from Wilcoxon's signed-rank test.