Cristian Kaori Valencia-Marin, Juan Diego Pulgarin-Giraldo, Luisa Fernanda Velasquez-Martinez, Andres Marino Alvarez-Meza, German Castellanos-Dominguez.
Abstract
Motion capture (Mocap) data are widely used as time series to study human movement. Indeed, animation movies, video games, and biomechanical systems for rehabilitation are significant applications related to Mocap data. However, classifying multi-channel time series from Mocap requires coding the intrinsic dependencies (even nonlinear relationships) between human body joints. Furthermore, the same human action may vary because individuals alter their movements, which increases inter/intra-class variability. Here, we introduce an enhanced Hilbert embedding-based approach from a cross-covariance operator, termed EHECCO, to map the input Mocap time series to a tensor space built from both 3D skeletal joints and a principal component analysis-based projection. The obtained results demonstrate how EHECCO represents and discriminates joint probability distributions as kernel-based evaluations of input time series within a tensor reproducing kernel Hilbert space (RKHS). Our approach achieves competitive classification results for style/subject and action recognition tasks on well-known publicly available databases. Moreover, EHECCO favors the interpretation of relevant anthropometric variables correlated with players' expertise and acted movement on a Tennis-Mocap database (also made publicly available with this work). Thereby, our EHECCO-based framework provides a unified representation (through the tensor RKHS) of the Mocap time series to compute linear correlations between a coded metric from joint distributions and player properties, i.e., age, body measurements, and sport movement (action class).
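The abstract's core idea is to compare joint probability distributions of time series through a cross-covariance operator in an RKHS. A standard empirical estimator of the Hilbert-Schmidt norm of that operator is HSIC; the sketch below is illustrative only and is not the paper's exact EHECCO metric (the kernel choice, bandwidth `gamma`, and toy data are assumptions):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise RBF kernel matrix for the rows of X.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def hsic(X, Y, gamma=1.0):
    # Empirical Hilbert-Schmidt norm of the cross-covariance operator
    # between the samples X and Y (standard biased HSIC estimator).
    n = X.shape[0]
    K, L = rbf_kernel(X, gamma), rbf_kernel(Y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # e.g., 3D joint coordinates
Y = X @ rng.normal(size=(3, 2))          # dependent projection (e.g., latent scores)
Z = rng.normal(size=(100, 2))            # independent series
print(hsic(X, Y), hsic(X, Z))            # the dependent pair should score higher
```

Large HSIC values signal statistical dependence between the two series, which is what lets a cross-covariance-based metric discriminate joint distributions.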
Keywords: Hilbert embedding; Mocap data; classification; joint distribution; time series
Year: 2021 PMID: 34209582 PMCID: PMC8271882 DOI: 10.3390/s21134443
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Schematic illustration of our EHECCO-based metric. The input spaces are mapped to their respective RKHSs; then, the tensor space is built using a cross-covariance operator strategy.
Figure 2. EHECCO-based Mocap data classification framework. Hip joint normalization and spectral clustering-based codebook generation are carried out to extract relevant skeletal poses. Then, the 3D joint representation and the PCA-based latent projection are used to support the EHECCO metric from the joint probability. Lastly, an SVM classifier is trained from the EHECCO distance, which also supports 2D data visualization.
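The Figure 2 pipeline (hip normalization → pose descriptors → pairwise distance → SVM on a precomputed kernel) can be sketched as follows. The record shapes, the hip being joint 0, the PCA-singular-value descriptor, and the `exp(-D)` distance-substitution kernel are all illustrative assumptions standing in for the paper's EHECCO distance:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Hypothetical layout: each Mocap record is (frames, joints, 3); joint 0 is the hip.
def hip_normalize(record):
    # Center every frame on the hip joint so poses are translation-invariant.
    return record - record[:, :1, :]

def features(record, n_components=2):
    # Per-record PCA on flattened poses; the leading singular values act as a
    # crude latent-space descriptor (a stand-in for the paper's representation).
    flat = hip_normalize(record).reshape(len(record), -1)
    return PCA(n_components=n_components).fit(flat).singular_values_

rng = np.random.default_rng(1)
records = [rng.normal(size=(50, 20, 3)) for _ in range(20)]
labels = np.array([0] * 10 + [1] * 10)   # two toy action classes

X = np.stack([features(r) for r in records])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # stand-in pairwise distance
K = np.exp(-D)                                        # distance-substitution kernel
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.predict(K).shape)                           # one prediction per record
```

Training on a precomputed kernel is the key design point: once any distance between records is available (here Euclidean, in the paper EHECCO), the same SVM machinery applies unchanged.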
Tennis dataset’s anthropometric measurements. The color represents the measurement group: age (brown), weight (light green), length (red), perimeters (blue), fat fold (pink), and tennis move (black).
| Tennis move |
|---|
| Forehand (FORE) |
| Smash (SMA) |
| Backhand (BAC) |
| Serve (SER) |
| Volley (VOL) |
| Backhand Volley (BAV) |
Figure 3. Illustrative results for codebook generation and latent space-based representation (HDM05 and CMU subset datasets). Top: codebook generation for a Mocap video of the throwing high with the right hand while standing class (HDM05). Middle: codebook generation for a Mocap record of the boxing class (CMU subset). Bottom left: PCA-based latent space for the HDM05 video. Bottom right: PCA-based latent space for the CMU subset video. The first two components are shown for visualization purposes. Black markers represent the original input Mocap frames (time series); color markers represent the chosen frames (codebook).
Figure 4. EHECCO-based classification results for the HDM05 and CMU subset databases. Top left: HDM05’s confusion matrix (style/subject recognition). Top right: HDM05 t-SNE-based 2D projection from the EHECCO distance. Bottom left: CMU subset’s confusion matrix (action recognition). Bottom right: CMU subset t-SNE-based 2D projection from the EHECCO distance.
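The 2D projections in Figure 4 are obtained by feeding a pairwise distance matrix to t-SNE rather than raw features. A minimal sketch, assuming a toy symmetric distance matrix in place of the EHECCO distances (the sample size and perplexity are arbitrary):

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy symmetric distance matrix standing in for pairwise EHECCO distances.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

# t-SNE accepts a precomputed distance matrix directly; with
# metric="precomputed", scikit-learn requires init="random".
emb = TSNE(n_components=2, metric="precomputed", init="random",
           perplexity=10, random_state=0).fit_transform(D)
print(emb.shape)  # (30, 2)
```

Because only distances are needed, the same visualization works for any metric defined between whole Mocap records.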
Comparison of Mocap-based style/subject recognition results (HDM05 dataset). Average accuracy is reported for the cited works and our approach (EHECCO+SVM).
| Method | Accuracy (%) |
|---|---|
| SPDNet [ | 61.45 |
| SE [ | 70.26 |
| SO [ | 71.31 |
| LieNet [ | 75.78 |
| Seq2Im+SVM [ | 70.70 |
| Seq2Im+KNN [ | 66.82 |
| Seq2IM+RF [ | 80.62 |
| Seq2Im+CNN (fine-tuning) [ | 83.33 |
| EHECCO+SVM | 88.80 |
Comparison of Mocap-based action recognition results (CMU subset database). Average accuracy is reported for the cited works and our approach (EHECCO+SVM).
| Method | Accuracy (%) |
|---|---|
| MT+DTW [ | 82.9 |
| SSM+DTW [ | 85.3 |
| EMR [ | 86.7 |
| MW+CNN [ | 90.7 |
| EHECCO+SVM | 90.0 |
Figure 5. Illustrative results for codebook generation (Tennis-Mocap dataset). Top: forehand; middle: volley; bottom: smash.
Figure 6. EHECCO-based classification and anthropometric measurement results for the Tennis-Mocap database. Top left: confusion matrix (action recognition). Top right: t-SNE-based 2D projection from the EHECCO distance. Bottom left: absolute value of Pearson’s correlation coefficient between the mean of the first t-SNE-based projection (from the EHECCO distance) of each player’s videos and their anthropometric measurements. The most relevant correlations are shown.
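The correlation analysis in Figure 6 reduces to a Pearson coefficient between a per-player embedding coordinate and each measurement. A minimal sketch with synthetic data (the player count, the measurement, and the linear relationship are assumptions for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: one mean t-SNE coordinate per player and one measurement
# (e.g., a body length in cm) constructed to correlate with it.
rng = np.random.default_rng(3)
proj = rng.normal(size=15)                      # mean first t-SNE coordinate
height = 170 + 5 * proj + rng.normal(size=15)   # correlated measurement

r, p = pearsonr(proj, height)
print(abs(r), p)
```

Reporting |r| (as in the figure) treats positive and negative linear associations symmetrically, since the sign of a t-SNE axis is arbitrary.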