Faegheh Sardari, Adeline Paiement, Sion Hannuna, Majid Mirmehdi.
Abstract
We propose a view-invariant method for assessing the quality of human movements that does not rely on skeleton data. Our end-to-end convolutional neural network, VI-Net, consists of two stages: first, a view-invariant trajectory descriptor for each body joint is generated from RGB images; then the collection of trajectories for all joints is processed by an adapted, pre-trained 2D convolutional neural network (CNN) (e.g., VGG-19 or ResNeXt-50) to learn the relationship amongst the different body parts and deliver a score for the movement quality. We release the only publicly-available, multi-view, non-skeleton, non-mocap, rehabilitation movement dataset (QMAR), and provide results for both cross-subject and cross-view scenarios on this dataset. We show that VI-Net achieves an average rank correlation of 0.66 on cross-subject and 0.65 on unseen views when trained on only two views. We also evaluate the proposed method on the single-view rehabilitation dataset KIMORE and obtain 0.66 rank correlation against a baseline of 0.62.
Keywords: health monitoring; movement analysis; view-invariant convolutional neural network (CNN)
Year: 2020 PMID: 32942561 PMCID: PMC7570706 DOI: 10.3390/s20185258
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
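The abstract reports results as rank correlations between predicted and ground-truth quality scores. The following is a minimal sketch of how such a figure could be computed, assuming Spearman's rank correlation (the abstract says only "rank correlation"). The function names and the example scores are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rank_correlation(true_scores, predicted_scores):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(true_scores), average_ranks(predicted_scores))


# Hypothetical ground-truth and predicted quality scores for five sequences.
true_scores = [0, 1, 2, 3, 4]
predicted_scores = [0, 2, 1, 3, 4]
rho = spearman_rank_correlation(true_scores, predicted_scores)
print(f"rank correlation = {rho:.2f}")
```

A value of 1 means the predicted scores order the sequences exactly as the ground truth does, so the metric rewards correct ordering of movement quality rather than exact score values.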
Figure 1. VI-Net has a view-invariant trajectory descriptor module (VTDM) and a movement score module (MSM) where the classifier output corresponds to a quality score.
Figure 2. Typical camera views in the QMAR dataset with each one placed at a different height.
Figure 3. Sample frames from the QMAR dataset, showing all 6 views for (top row) walking with Parkinson's (W-P), (second row) walking with Stroke (W-S), (third row) sit-stand with Parkinson's (SS-P), and (bottom row) sit-stand with Stroke (SS-S).
Details of the movements in the QMAR dataset.
| Action | Type | Quality Score | # Sequences | # Frames/Video (Min–Max) | Total Frames |
|---|---|---|---|---|---|
| Walking | Normal | 0 | 41 | 62–179 | 12,672 |
| W-P | Abnormal | 1–4 | 40 | 93–441 | 33,618 |
| W-S | Abnormal | 1–5 | 68 | 104–500 | 57,498 |
| Sit-stand | Normal | 0 | 42 | 28–132 | 9,250 |
| SS-P | Abnormal | 1–12 | 41 | 96–558 | 41,808 |
| SS-S | Abnormal | 1–5 | 74 | 51–580 | 47,954 |
Details of abnormality score ranges in the QMAR dataset.
| Action / Score | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | #11 | #12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| W-P | 4 | 8 | 16 | 12 | - | - | - | - | - | - | - | - |
| W-S | 10 | 14 | 19 | 15 | 10 | - | - | - | - | - | - | - |
| SS-P | 1 | 1 | 6 | 8 | 4 | 4 | 4 | 3 | 3 | 1 | 2 | 4 |
| SS-S | 3 | 19 | 19 | 13 | 20 | - | - | - | - | - | - | - |
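The two QMAR tables above can be cross-checked against each other: the per-score counts for each abnormal action should sum to that action's number of sequences, and the number of score levels should match its quality score range. The sketch below does this arithmetic. The row labels follow the W-P, W-S, SS-P, SS-S order used in Figure 3 and the results tables; that mapping is an assumption, although the matching totals support it.

```python
# Per-score sequence counts, copied from the abnormality score table.
# Row-to-action mapping is assumed from the W-P, W-S, SS-P, SS-S ordering.
score_counts = {
    "W-P": [4, 8, 16, 12],
    "W-S": [10, 14, 19, 15, 10],
    "SS-P": [1, 1, 6, 8, 4, 4, 4, 3, 3, 1, 2, 4],
    "SS-S": [3, 19, 19, 13, 20],
}

# Abnormal sequence counts from the movement details table, in row order.
abnormal_sequences = {"W-P": 40, "W-S": 68, "SS-P": 41, "SS-S": 74}

# Normal sequence counts (walking, sit-stand) from the same table.
normal_sequences = [41, 42]

for action, counts in score_counts.items():
    total = sum(counts)
    status = "ok" if total == abnormal_sequences[action] else "mismatch"
    print(f"{action}: max score {len(counts)}, {total} sequences ({status})")

total_sequences = sum(abnormal_sequences.values()) + sum(normal_sequences)
print("Total sequences:", total_sequences)
```

The class imbalance is visible in the counts: SS-P spreads 41 sequences over 12 score levels, so several levels have only one to four examples.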
Figure 4. Sample frames of KIMORE for five different exercises.
Figure 5. Walking example: all six views and the corresponding trajectory maps for the feet.
VI-Net's modules. The layer legend denotes n 2D convolution filters with size d and a given channel size, 2D max pooling with size d, and an FC layer with N outputs. T is the number of clip frames, J is the number of joints, and S is the maximum score for a movement type.
| VTDM | MSM (Adapted VGG-19 or ResNeXt-50) |
|---|---|
Figure 6. Scoring process for a full video sequence in the testing phase.
Comparative cross-subject results on QMAR. The bold numbers show the best result for each action type.
| Action | W-P | W-S | SS-P | SS-S | Avg | ||
|---|---|---|---|---|---|---|---|
| Method | |||||||
|
| 0.50 | 0.37 | 0.25 | 0.54 | 0.41 | ||
|
| 0.79 | 0.47 | 0.54 | 0.55 | 0.58 | ||
|
|
|
| 0.81 | 0.49 | 0.57 |
| 0.65 |
|
| 0.82 | 0.52 | 0.55 | 0.73 | 0.65 | ||
|
|
|
|
| 0.48 | 0.72 | 0.65 | |
|
|
| 0.52 |
| 0.69 |
| ||
Cross-view results for all actions with single-view training. The bold numbers show the best result for each view of each action type. Yellow highlights mark the best results for W-P and W-S actions amongst all views; orange highlights mark the best results for SS-P and SS-S actions amongst all views.
| View | VTDM+MSM | VTDM+MSM | View | VTDM+MSM | VTDM+MSM | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| w/o STN | w STN | w/o STN | w STN | w/o STN | w STN | w/o STN | w STN | ||||
| W-P | 1 | 0.51 |
| 0.64 |
| W-S | 1 | 0.51 | 0.43 | 0.60 |
|
| 2 | 0.69 | 0.66 | 0.58 |
| 2 | 0.47 | 0.54 | 0.55 |
| ||
| 3 | 0.62 | 0.66 | 0.63 |
| 3 |
| 0.56 | 0.61 | 0.59 | ||
| 4 | 0.67 | 0.64 |
|
| 4 | 0.60 | 0.59 | 0.60 |
| ||
| 5 | 0.67 | 0.67 | 0.68 |
| 5 | 0.62 | 0.60 | 0.62 |
| ||
| 6 | 0.69 | 0.72 | 0.69 |
| 6 | 0.46 | 0.40 | 0.53 |
| ||
|
| 0.64 | 0.67 | 0.65 |
|
| 0.55 | 0.52 | 0.58 |
| ||
| SS-P | 1 | 0.30 |
| 0.25 | 0.25 | SS-S | 1 | 0.36 |
| 0.44 | 0.45 |
| 2 | 0.27 | 0.31 | 0.31 |
| 2 | 0.47 | 0.40 |
|
| ||
| 3 | 0.16 | 0.23 | 0.36 |
| 3 | 0.37 |
| 0.38 | 0.43 | ||
| 4 | 0.10 | 0.34 | 0.44 |
| 4 | 0.38 | 0.34 | 0.41 |
| ||
| 5 | 0.50 |
| 0.43 | 0.45 | 5 | 0.26 |
|
| 0.48 | ||
| 6 | 0.41 | 0.24 |
| 0.44 | 6 | 0.21 |
| 0.13 | 0.16 | ||
|
| 0.29 | 0.32 | 0.37 |
|
| 0.34 | 0.42 | 0.40 |
| ||
Cross-view results for all actions with two-view training. The bold numbers show the best result for each combination of views of each action type. Green highlights mark the best results for W-P and W-S actions amongst all view combinations; purple highlights mark the best results for SS-P and SS-S actions amongst all view combinations.
| View | VTDM+MSM | VTDM+MSM | View | VTDM+MSM | VTDM+MSM | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| w/o STN | w STN | w/o STN | w STN | w/o STN | w STN | w/o STN | w STN | ||||
| W-P | 2,4 | 0.77 | 0.81 | 0.87 |
| W-S | 2,4 | 0.58 | 0.72 |
| 0.73 |
| 2,5 | 0.72 | 0.75 | 0.90 |
| 2,5 | 0.74 | 0.74 | 0.80 |
| ||
| 2,6 | 0.75 | 0.76 | 0.73 |
| 2,6 | 0.64 | 0.67 |
| 0.68 | ||
| 1,5 | 0.70 | 0.76 |
| 0.75 | 1,5 | 0.70 | 0.68 |
| 0.81 | ||
| 3,5 | 0.73 | 0.79 |
| 0.84 | 3,5 | 0.66 | 0.66 |
| 0.79 | ||
|
| 0.73 | 0.77 |
|
|
| 0.66 | 0.69 |
| 0.76 | ||
| SS-P | 2,4 |
| 0.52 | 0.41 | 0.46 | SS-S | 2,4 | 0.57 |
| 0.54 |
|
| 2,5 |
| 0.53 | 0.49 | 0.46 | 2,5 | 0.62 | 0.56 |
| 0.61 | ||
| 2,6 |
| 0.35 | 0.36 | 0.42 | 2,6 | 0.50 |
| 0.48 | 0.46 | ||
| 1,5 | 0.46 |
| 0.39 | 0.52 | 1,5 |
| 0.53 | 0.48 | 0.58 | ||
| 3,5 |
| 0.40 | 0.43 | 0.47 | 3,5 | 0.62 | 0.60 | 0.63 |
| ||
|
|
| 0.47 | 0.41 | 0.46 |
|
|
| 0.55 | 0.58 | ||
Comparative results on the single-view KIMORE dataset. The bold numbers show the best result for each action type.
| Action | Ex #1 | Ex #2 | Ex #3 | Ex #4 | Ex #5 | Average | ||
|---|---|---|---|---|---|---|---|---|
| Method | ||||||||
|
| 0.66 | 0.64 |
| 0.59 | 0.60 | 0.62 | ||
|
| 0.45 | 0.56 | 0.57 | 0.64 | 0.58 | 0.56 | ||
|
|
|
| 0.63 | 0.50 | 0.55 |
|
| 0.64 |
|
|
|
| 0.57 | 0.59 | 0.70 |
| ||
|
|
| 0.55 | 0.42 | 0.33 | 0.62 | 0.57 | 0.49 | |
|
| 0.55 | 0.62 | 0.36 | 0.58 | 0.67 | 0.55 | ||