Yubin Wu1, Qianqian Lin1, Mingrun Yang1, Jing Liu1, Jing Tian1, Dev Kapil2, Laura Vanderbloemen3,4.
Abstract
Yoga pose grading assesses an input yoga pose against a standard pose and reports the result as a quantitative grade. This paper proposes a computer vision-based yoga pose grading approach that uses contrastive skeleton feature representations. The approach first extracts human body skeleton keypoints from the input yoga pose image, then feeds their coordinates into a pose feature encoder trained on contrastive triplet examples, and finally compares the encoded pose features for similarity. To tackle the inherent challenge of composing contrastive examples for pose feature encoding, the paper also proposes a new strategy that uses two kinds of triplets: a coarse triplet example, comprising an anchor, a positive example from the same category, and a negative example from a different category; and a fine triplet example, comprising an anchor, a positive example, and a negative example from the same category but with different pose qualities. Extensive experiments on two benchmark datasets demonstrate the superior performance of the proposed approach.
Keywords: contrastive learning; deep learning; skeleton extraction; yoga pose classification; yoga pose grading
Year: 2021 PMID: 35052200 PMCID: PMC8775687 DOI: 10.3390/healthcare10010036
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
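The abstract says the pose feature encoder is trained on contrastive triplet examples, but this page does not state the loss function. The sketch below assumes the standard triplet margin loss over Euclidean distances; the margin value and the toy feature vectors are illustrative assumptions, not values from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 1.0 is an assumed default, not taken from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D "encoded features" (hypothetical values).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # distance 1 from the anchor
negative = [3.0, 4.0]  # distance 5 from the anchor

# The negative is already much farther away than the positive: no penalty.
loss_satisfied = triplet_margin_loss(anchor, positive, negative)
# Swapping the roles violates the constraint, so the loss is positive.
loss_violated = triplet_margin_loss(anchor, negative, positive)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training only pushes on triplets that still violate that ordering.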
An overview of related yoga pose classification and yoga pose grading research works in the literature. “−” means “not applicable”.
| Data | Method | Year | Pose Classification | Pose Grading | Number of Poses | Remark |
|---|---|---|---|---|---|---|
| Wearable | [ | 2019 | ✓ | ✓ | 18 | Neural network and IMU data |
| | [ | 2021 | − | − | − | Pose measurement |
| | [ | 2014 | ✓ | − | 12 | Body contour-based matching |
| Kinect-based | [ | 2018 | ✓ | − | 6 | Adaboost |
| | [ | 2018 | ✓ | − | 5 | Pose-based matching |
| Computer | [ | 2019 | ✓ | − | 6 | OpenPose + CNN-LSTM for video |
| | [ | 2019 | ✓ | − | 42 | Motion capture image + CNN |
| | [ | 2019 | ✓ | − | 26 | Image-based CNN |
| | [ | 2020 | ✓ | − | 6 | OpenPose + CNN |
| | [ | 2020 | ✓ | − | 6 | Rule-based classification |
| | [ | 2020 | ✓ | − | 82 | Image-based CNN |
| | [ | 2021 | ✓ | − | 10 | Image-based CNN |
| | [ | 2021 | ✓ | − | 14 | Image-based CNN |
| | [ | 2021 | ✓ | − | 10 | 3D CNN for video |
| | [ | 2011 | − | ✓ | − | Handcrafted SURF feature of the pose image |
| | [ | 2018 | − | ✓ | 12 | Domain knowledge to check skeleton keypoints |
| | [ | 2021 | − | ✓ | 5 | Domain knowledge to check skeleton keypoints |
| | [ | 2021 | − | ✓ | 21 | Domain knowledge to check skeleton keypoints |
| | Ours | − | − | ✓ | 45 | Contrastive skeleton feature representations |
Figure 1. A conceptual overview of the proposed yoga pose grading framework. The model training process consists of three key components: (i) construction of contrastive examples, (ii) skeleton extraction, and (iii) pose feature encoding using contrastive skeleton feature representations. The model inference process consists of (i) skeleton extraction, (ii) pose feature encoding, and (iii) feature similarity comparison. The skeleton extraction and the pose feature encoder are the same in both processes.
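The last inference step, feature similarity comparison, can be sketched as follows. This is only an illustration under stated assumptions: the similarity measure (cosine), the per-grade reference features, and the rule of picking the most similar reference are all hypothetical, since this page does not say how the comparison is turned into a grade. Skeleton extraction and the encoder are out of scope here, so the vectors stand in for already-encoded pose features.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def grade_pose(query_feature, reference_features):
    """Return the label of the reference feature most similar to the query.

    `reference_features` maps a grade label to one encoded reference pose;
    this lookup scheme is an assumption made for illustration.
    """
    return max(
        reference_features,
        key=lambda label: cosine_similarity(query_feature, reference_features[label]),
    )


# Hypothetical encoded features for one pose category.
references = {
    "high": [1.0, 0.0, 0.0],
    "medium": [0.7, 0.7, 0.0],
    "low": [0.0, 1.0, 0.0],
}
query = [0.9, 0.2, 0.0]
predicted = grade_pose(query, references)
```

Because the encoder is shared between training and inference, the same function that produced the reference features would also produce `query` in a real pipeline.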
Figure 2. A comparison between (a) the coarse triplet example and (b) the fine triplet example. The coarse triplet example consists of one anchor from Salabhasana, one positive example from Salabhasana, and one negative example from a different category, such as Chaturanga Dandasana. The fine triplet example consists of three examples from the same category, such as Salabhasana, that have different pose grades: high-quality, medium-quality, and low-quality (from left to right).
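The two triplet types in this caption can be sketched as simple sampling routines. The record layout (`id`, `category`, `grade`) and the function names are hypothetical. The fine triplet maps high, medium, and low quality to anchor, positive, and negative respectively, which is an assumption drawn from the left-to-right order in the caption rather than a confirmed detail of the method.

```python
import random


def make_coarse_triplet(samples, rng):
    """Anchor and positive share a category; the negative comes from another one.

    Assumes every category has at least two samples and that at least two
    categories exist.
    """
    anchor = rng.choice(samples)
    positives = [
        s for s in samples
        if s["category"] == anchor["category"] and s is not anchor
    ]
    negatives = [s for s in samples if s["category"] != anchor["category"]]
    return anchor, rng.choice(positives), rng.choice(negatives)


def make_fine_triplet(samples, category):
    """All three examples share a category but differ in pose grade.

    Assumed role assignment: anchor = high, positive = medium, negative = low.
    If a grade occurs more than once, the last matching sample is used.
    """
    by_grade = {s["grade"]: s for s in samples if s["category"] == category}
    return by_grade["high"], by_grade["medium"], by_grade["low"]


# Toy annotations (hypothetical); real samples would carry skeleton keypoints.
samples = [
    {"id": "s1", "category": "Salabhasana", "grade": "high"},
    {"id": "s2", "category": "Salabhasana", "grade": "medium"},
    {"id": "s3", "category": "Salabhasana", "grade": "low"},
    {"id": "c1", "category": "Chaturanga Dandasana", "grade": "high"},
    {"id": "c2", "category": "Chaturanga Dandasana", "grade": "low"},
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
coarse = make_coarse_triplet(samples, rng)
fine = make_fine_triplet(samples, "Salabhasana")
```

Coarse triplets teach the encoder to separate pose categories, while fine triplets ask it to order examples of one pose by quality, which is the distinction grading depends on.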
Figure 3. The detailed network architecture of the pose feature encoder used in the proposed framework.
Figure 4. An overview of the 45 categories of yoga poses in Dataset A.
Figure 5. Examples of yoga pose grading images in Dataset B. Three images are selected from the category Utthita Trikonasana. They have low, medium, and high grades, respectively (from left to right).
Yoga pose grading performance comparison. The best performance is indicated in bold.
| Method | Dataset A Accuracy | Dataset A Precision | Dataset A Recall | Dataset A F1 | Dataset B Accuracy |
|---|---|---|---|---|---|
| Baseline Approach 1 | 0.7953 | | 0.5942 | 0.7438 | 0.5709 |
| Baseline Approach 2 | | 0.9911 | 0.6715 | 0.8006 | 0.6004 |
| Proposed Approach | 0.8321 | 0.8819 | | | |
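As a reading aid for the precision, recall, and F1 columns, the sketch below computes F1 as the harmonic mean of precision and recall and checks it against the Baseline Approach 2 row (precision 0.9911, recall 0.6715). It assumes the reported F1 is derived from the aggregate precision and recall shown in the table rather than averaged per class; the page does not say which.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Baseline Approach 2 values as listed in the table above.
baseline2_f1 = f1_score(0.9911, 0.6715)
print(round(baseline2_f1, 4))  # 0.8006
```

The result agrees with the 0.8006 listed for that row. High precision paired with much lower recall drags F1 toward the recall value, which explains why a method with near-perfect precision still scores only about 0.80.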
An ablation study of how the proposed contrastive examples contribute to the final pose grading performance of the proposed approach. The best performance is indicated in bold.
| Proposed Approach | Dataset A Accuracy | Dataset A Precision | Dataset A Recall | Dataset A F1 | Dataset B Accuracy |
|---|---|---|---|---|---|
| Coarse contrastive examples only | 0.7760 | 0.6961 | | 0.8138 | 0.5827 |
| Both coarse and fine contrastive examples | | | 0.7669 | | |