Hong-Bo Zhang, Li-Jia Dong, Qing Lei, Li-Jie Yang, Ji-Xiang Du.
Abstract
Most existing action quality assessment (AQA) methods provide only an overall quality score for the input video and lack an evaluation of each substage of the movement process; thus, these methods cannot provide detailed feedback for users. Moreover, the existing datasets do not provide labels for substage quality assessment. To address these problems, in this work, a new label-reconstruction-based pseudo-subscore learning (PSL) method is proposed for AQA in sporting events. In the proposed method, the overall score of an action is not only regarded as a quality label but also used as a feature of the training set. A label-reconstruction-based learning algorithm is built to generate pseudo-subscore labels for the training set. Moreover, based on the pseudo-subscore labels and overall score labels, a multi-substage AQA model is fine-tuned from the PSL model to predict the action quality score of each substage and the overall score for an athlete. Several ablation experiments are performed to verify the effectiveness of each module. The experimental results show that our approach achieves state-of-the-art performance.
Keywords: Action quality assessment; Label reconstruction; Multi-substage AQA model; Pseudo-subscore learning; Substage quality assessment
Year: 2022 PMID: 35991679 PMCID: PMC9374585 DOI: 10.1007/s10489-022-03984-5
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.019
Fig. 1 Example of multi-substage AQA in diving
Fig. 2 Calculation process applied in this work. It involves temporal semantic segmentation, feature representation, a PSL model, and a multi-substage AQA model
Fig. 3 Illustration of the PSL model. It consists of a feature extractor, a subscore generator and an overall score generator
Fig. 4 Illustration of the multi-substage AQA regression model
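The label-reconstruction idea behind the PSL model of Fig. 3 can be illustrated with a toy sketch. The abstract states only that the overall score serves both as a label and as an input feature and that pseudo-subscores are obtained by reconstructing the label. Everything else below is an assumption made for illustration: a linear subscore generator, an overall score generator that simply sums the subscores, a squared reconstruction loss, three substages, scores scaled to [0, 1], and all function and variable names.

```python
# Toy label-reconstruction sketch (assumed design, not the paper's implementation).

def generate_subscores(weights, features, overall):
    """Subscore generator: one linear unit per substage over [features, overall]."""
    x = features + [overall]  # the overall score is appended as an extra feature
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]

def reconstruct_overall(subscores):
    """Overall score generator, assumed here to be a plain sum of the subscores."""
    return sum(subscores)

def train_step(weights, features, overall, lr=0.1):
    """One gradient step on the squared reconstruction error."""
    x = features + [overall]
    error = reconstruct_overall(generate_subscores(weights, features, overall)) - overall
    updated = [[w_i - lr * 2.0 * error * x_i for w_i, x_i in zip(w, x)] for w in weights]
    return updated, error ** 2

features = [0.2, 0.5, 0.1]   # hypothetical video feature vector
overall = 0.8                # hypothetical overall score label, scaled to [0, 1]
weights = [                  # hypothetical initial weights, one row per substage
    [0.1, 0.0, 0.0, 0.1],
    [0.0, 0.1, 0.0, 0.2],
    [0.0, 0.0, 0.1, 0.3],
]

losses = []
for _ in range(50):
    weights, loss = train_step(weights, features, overall)
    losses.append(loss)

pseudo_subscores = generate_subscores(weights, features, overall)
print(pseudo_subscores, reconstruct_overall(pseudo_subscores))
```

After training, the subscores sum to the overall label and can serve as pseudo-labels for the individual substages. In this linear toy the split between substages is determined only by the initial weights; a real model would need substage-specific features to make the split meaningful.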
Ablation experiments of the proposed method
| Method | SRC (original task) | MSE (original task) | MED (original task) | SRC (final ranking prediction task) | MSE (final ranking prediction task) | MED (final ranking prediction task) |
|---|---|---|---|---|---|---|
| OSL | 0.78 | 130.71 | 8.59 | 0.34 | 165.99 | 11.24 |
| OSL+PSL (TS) | 0.78 | 60.59 | 6.23 | 0.55 | 84.81 | 7.85 |
| OSL+PSL (FTPSL) | 0.80 | 86.37 | 7.59 | 0.61 | 70.76 | 7.06 |
| ESL | 0.87 | 85.24 | 5.66 | 0.74 | 52.68 | 6.60 |
| ESL+PSL (TS) | 0.85 | 38.99 | 4.98 | 0.81 | 42.53 | 3.59 |
| ESL+PSL (FTPSL) | 0.87 | 38.68 | 4.80 | 0.75 | 51.68 | 5.15 |
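The three metrics in the table can be computed as in the following minimal sketch. SRC is taken to be the Spearman rank correlation and MSE the mean squared error. The expansion of MED is not given in this record; it is assumed here to be the mean Euclidean distance, which for scalar scores equals the mean absolute error. Function names and sample scores are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman_rank_correlation(pred, truth):
    """SRC: Pearson correlation computed on the ranks of the scores."""
    return pearson(average_ranks(pred), average_ranks(truth))

def mean_squared_error(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)

def mean_euclidean_distance(pred, truth):
    """MED under the assumption stated above (mean absolute error for scalars)."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

predicted = [80.0, 60.0, 90.0, 70.0]     # hypothetical predicted scores
ground_truth = [82.0, 65.0, 85.0, 75.0]  # hypothetical judge scores

print(spearman_rank_correlation(predicted, ground_truth))
print(mean_squared_error(predicted, ground_truth))
print(mean_euclidean_distance(predicted, ground_truth))
```

SRC depends only on the ordering of the scores, whereas MSE and MED measure how far the predicted values are from the labels, which is why a method can match another on SRC while differing widely on MSE.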
Comparison of the results of different fully connected (FC) network structures
| FC Structure | SRC | MSE | MED |
|---|---|---|---|
| 2049-1024-512-64-16-1 | 0.81 | 96.84 | 6.86 |
| 2049-64-16-1 | 0.81 | 84.64 | 6.53 |
| 2049-16-1 | 0.82 | 99.43 | 7.43 |
| 2049-1 | 0.84 | 61.64 | 5.81 |
The structure used in our paper is indicated in bold
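To make the FC structure strings concrete, the sketch below parses each one into layer widths and counts the trainable parameters, assuming plain fully connected layers with bias terms. The input width of 2049 may correspond to a 2048-dimensional video feature plus the overall score used as an extra feature, but this record does not confirm that reading. The helper names are hypothetical.

```python
def layer_sizes(structure):
    """Parse a structure string such as '2049-64-16-1' into layer widths."""
    return [int(width) for width in structure.split("-")]

def parameter_count(structure):
    """Weights plus biases of each fully connected layer (biases are assumed)."""
    sizes = layer_sizes(structure)
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

structures = [
    "2049-1024-512-64-16-1",
    "2049-64-16-1",
    "2049-16-1",
    "2049-1",
]
counts = {structure: parameter_count(structure) for structure in structures}

for structure, count in counts.items():
    print(f"{structure}: {count} parameters")
```

Under these assumptions, the single-layer 2049-1 structure, which has the best SRC, MSE and MED in the table, is also by far the smallest of the four.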
Comparison of the results of the proposed method with the results of state-of-the-art methods
| Method | DL | SRC | MED | MSE |
|---|---|---|---|---|
| S3D [ | × | 0.86 | 6.90 | 97.46 |
| C3D-AVG-STL [ | × | 0.83 | - | - |
| JRG [ | × | 0.76 | - | - |
| AIM [ | × | 0.74 | - | - |
| Metric Learning[ | × | 0.76 | - | 105.62 |
| EAGLE-Eye [ | × | 0.83 | - | - |
| SCN+ATCN [ | × | 0.85 | - | - |
| USDL [ | × | 0.81 | - | - |
| Adaptive [ | × | 0.83 | - | - |
| C3D+SVR [ | ✓ | 0.74 | - | - |
| C3D+CNN [ | ✓ | 0.80 | 7.78 | - |
| ScoringNet [ | ✓ | 0.84 | 5.36 | - |
| FALCONS [ | ✓ | 0.85 | - | - |
| MRSM (ESL) [ | ✓ | 0.88 | - | 73.92 |
| Ours (OSL+TS) | × | 0.78 | 6.23 | 60.59 |
| Ours (OSL+FTPSL) | × | 0.80 | 7.59 | 86.37 |
| Ours (ESL+TS) | ✓ | 0.85 | 4.98 | 38.99 |
| Ours (ESL+FTPSL) | ✓ | 0.87 | 4.80 | 38.68 |
DL: difficulty level
The bold entries represent the best results
Comparison of the proposed method with S3D on the finals ranking prediction task
| Method | SRC | MSE | MED | A-SRC | A-MSE | A-MED |
|---|---|---|---|---|---|---|
| S3D (full video) | 0.59 | 101.55 | 8.45 | 0.63 | 4179.87 | 59.12 |
| S3D (entry substage) | 0.61 | 78.00 | 8.12 | 0.78 | 4381.17 | 62.56 |
| Ours (OSL+TS) | 0.55 | 84.81 | 7.85 | | 1612.82 | 36.60 |
| Ours (OSL+FTPSL) | 0.61 | 70.76 | 7.06 | 0.75 | 1352.71 | 33.06 |
| Ours (ESL+TS) | 0.81 | 42.53 | 3.59 | | | |
| Ours (ESL+FTPSL) | 0.75 | 51.68 | 5.18 | | 748.62 | 23.57 |
The bold entries represent the best results
Fig. 5 Predicted and ground-truth scores of the 12 athletes in the finals. The abscissa presents the athlete identifiers, and the ordinate indicates the total scores
Fig. 6 Visualization of the execution scores for the individual substages of the test videos
Fig. 7 Execution scores for the last two substages of selected samples
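The total scores plotted in Fig. 5 suggest that the finals ranking is derived by summing each athlete's per-dive scores. The sketch below shows that aggregation under this assumption; it also assumes that the "A-" prefixed columns in the preceding table refer to such athlete-level totals, which this record does not state. The athlete identifiers, scores and function names are hypothetical.

```python
from collections import defaultdict

def athlete_totals(dive_scores):
    """Sum per-dive scores for each athlete; input is (athlete_id, score) pairs."""
    totals = defaultdict(float)
    for athlete_id, score in dive_scores:
        totals[athlete_id] += score
    return dict(totals)

def final_ranking(totals):
    """Athlete identifiers ordered from highest to lowest total score."""
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical predicted scores for three athletes with two dives each.
predicted_dives = [
    ("A", 80.0), ("B", 70.0), ("C", 60.0),
    ("A", 75.5), ("B", 90.0), ("C", 65.0),
]

predicted_totals = athlete_totals(predicted_dives)
predicted_ranking = final_ranking(predicted_totals)
print(predicted_totals)
print(predicted_ranking)
```

Because per-dive errors accumulate in the totals, athlete-level squared errors are naturally much larger than per-dive errors, which matches the scale difference between the MSE and A-MSE columns.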