| Literature DB >> 35669063 |
Anthony Bourached1, Ryan-Rhys Griffiths2, Robert Gray1, Ashwani Jha1, Parashkev Nachev1.
Abstract
The task of predicting human motion is complicated by the natural heterogeneity and compositionality of actions, necessitating robustness to distributional shifts as far as out-of-distribution (OoD). Here, we formulate a new OoD benchmark based on the Human3.6M and Carnegie Mellon University (CMU) motion capture datasets, and introduce a hybrid framework for hardening discriminative architectures to OoD failure by augmenting them with a generative model. When applied to current state-of-the-art discriminative models, we show that the proposed approach improves OoD robustness without sacrificing in-distribution performance, and can theoretically facilitate model interpretability. We suggest human motion predictors ought to be constructed with OoD challenges in mind, and provide an extensible general framework for hardening diverse discriminative architectures to extreme distributional shift. The code is available at: https://github.com/bouracha/OoDMotion.Entities:
Keywords: deep learning; generative models; human motion prediction; variational autoencoders
Year: 2022 PMID: 35669063 PMCID: PMC9159682 DOI: 10.1002/ail2.63
Source DB: PubMed Journal: Appl AI Lett ISSN: 2689-5595
FIGURE A6Network architecture with discriminative and variational autoencoder (VAE) branch
FIGURE A7Graph convolutional layer (GCL) and a residual graph convolutional block (GCB)
FIGURE 1Graph convolutional network (GCN) network architecture with variational autoencoder (VAE) branch. Here, is the number of latent variables per joint
Short‐term prediction of Euclidean distance between predicted and ground truth joint angles on H3.6M
| Walking (ID) | Eating (OoD) | Smoking (OoD) | Average (of 14 for OoD) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 160 | 320 | 400 | 160 | 320 | 400 | 160 | 320 | 400 | 160 | 320 | 400 |
| GCN (OoD) |
| 0.60 | 0.65 | 0.38 | 0.65 | 0.79 | 0.55 | 1.08 | 1.10 | 0.69 | 1.09 | 1.27 |
| SD | 0.008 | 0.008 | 0.01 | 0.01 | 0.03 | 0.04 | 0.01 | 0.02 | 0.02 | 0.02 | 0.04 | 0.04 |
| Ours (OoD) |
|
|
|
|
|
|
|
|
|
|
|
|
| SD | 0.004 | 0.03 | 0.03 | 0.01 | 0.03 | 0.04 | 0.01 | 0.01 | 0.02 |
|
|
|
Note: Each experiment conducted three times. We report the mean and SD. Note that we have lower variance in our results. Full table is given in Table A1. Bold values correspond to the best score for the respective simulation across the different models.
Abbreviations: GCN, graph convolutional network; OoD, out‐of‐distribution.
Long‐term prediction of Eucildean distance between predicted and ground truth joint angles on H3.6M
| Walking | Eating | Smoking | Discussion | Average | ||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 560 | 1000 | 560 | 1000 | 560 | 1000 | 560 | 1000 | 560 | 1000 |
| GCN (OoD) | 0.80 | 0.80 |
| 1.20 | 1.26 | 1.85 | 1.45 |
| 1.10 | 1.43 |
| Ours (OoD) |
|
| 0.90 |
|
|
|
| 1.90 |
|
|
Note: Bold correspond to lowest values.
Abbreviations: GCN, graph convolutional network; OoD, out‐of‐distribution.
Short‐term prediction of Euclidean distance between predicted and ground truth joint angles on H3.6M
| Walking (ID) | Eating (OoD) | Smoking (OoD) | Discussion (OoD) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 80 | 160 | 320 | 400 | 80 | 160 | 320 | 400 | 80 | 160 | 320 | 400 | 80 | 160 | 320 | 400 |
| GCN (OoD) |
|
| 0.60 | 0.65 | 0.22 | 0.38 | 0.65 | 0.79 |
| 0.55 | 1.08 | 1.10 |
|
| 0.98 | 1.08 |
| SD | 0.001 | 0.008 | 0.008 | 0.01 | 0.003 | 0.01 | 0.03 | 0.04 | 0.01 | 0.01 | 0.02 | 0.02 | 0.004 | 0.01 | 0.04 | 0.04 |
| Ours (OoD) | 0.23 |
|
|
|
|
|
|
|
|
|
|
| 0.31 |
|
|
|
| SD | 0.003 | 0.004 | 0.03 | 0.03 | 0.008 | 0.01 | 0.03 | 0.04 | 0.005 | 0.01 | 0.01 | 0.02 | 0.005 | 0.009 | 0.02 | 0.01 |
Note: Each experiment conducted three times. We report the mean and standard deviation. Note that we have lower variance in our results.
Abbreviations: GCN, graph convolutional network; ID, in‐distribution; OoD, out‐of‐distribution.
Euclidean distance between predicted and ground truth joint angles on CMU
| Basketball (ID) | Basketball signal (OoD) | Average (of 7 for OoD) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 |
| GCN |
| 0.67 |
|
|
|
|
|
|
| 2.18 | 0.36 | 0.65 | 1.41 | 1.49 | 2.17 |
| Ours |
|
| 1.12 | 1.29 | 1.76 | 0.28 | 0.57 | 1.15 | 1.43 |
|
|
|
|
|
|
Note: Full table is given in Table A2. Bold values correspond to the best score for the respective simulation across the different models.
Abbreviations: GCN, graph convolutional network; ID, in‐distribution; OoD, out‐of‐distribution.
Mean joint per position error (MPJPE) between predicted and ground truth three‐dimensional Cartesian coordinates of joints on CMU
| Basketball | Basketball signal | Average (of 7 for OoD) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 |
| GCN (OoD) |
|
|
|
| 108.4 | 14.4 | 30.4 | 63.5 | 78.7 | 114.8 |
| 43.8 | 86.3 | 105.8 | 169.2 |
| Ours (OoD) | 16.0 | 30.0 | 54.5 | 65.5 |
|
|
|
|
|
| 21.6 |
|
|
|
|
Note: Full table is given in Table A3.
Abbreviations: GCN, graph convolutional network; OoD, out‐of‐distribution.
Euclidean distance between predicted and ground truth joint angles on CMU
| Basketball (ID) | Basketball signal (OoD) | Directing traffic (OoD) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 |
| GCN |
| 0.67 |
|
|
|
|
|
|
| 2.18 | 0.31 | 0.62 | 1.05 | 1.24 | 2.49 |
| Ours |
|
| 1.12 | 1.29 | 1.76 | 0.28 | 0.57 | 1.15 | 1.43 |
|
|
|
|
|
|
Abbreviations: GCN, graph convolutional network; ID, in‐distribution; OoD, out‐of‐distribution.
Mean joint per position error (MPJPE) between predicted and ground truth 3D Cartesian coordinates of joints on CMU
| Basketball (ID) | Basketball signal (OoD) | Directing traffic (OoD) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 | 80 | 160 | 320 | 400 | 1000 |
| GCN |
|
|
|
| 108.4 | 14.4 | 30.4 | 63.5 | 78.7 | 114.8 | 18.5 | 37.4 |
|
| 210.7 |
| Ours | 16.0 | 30.0 | 54.5 | 65.5 |
|
|
|
|
|
|
|
| 75.7 | 93.8 |
|
Abbreviations: GCN, graph convolutional network; ID, in‐distribution; OoD, out‐of‐distribution.
Long‐term prediction of three‐dimensional joint positions on H3.6M
| Walking (ID) | Eating (OoD) | Smoking (OoD) | Average (of 14 for OoD) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Milliseconds | 560 | 720 | 880 | 1000 | 560 | 720 | 880 | 1000 | 560 | 720 | 880 | 1000 | 560 | 720 | 880 | 1000 |
| att‐GCN (OoD) |
|
|
|
| 87.6 | 103.6 | 113.2 | 120.3 | 81.7 | 93.7 | 102.9 | 108.7 |
| 129.6 | 140.3 | 147.8 |
| Ours (OoD) | 58.7 | 60.6 | 65.5 | 69.1 |
|
|
|
|
|
|
|
| 113.1 |
|
|
|
Note: Here ours is also trained with the attention‐GCN model. Full table is given in Table A4. Bold values correspond to the best score for the respective simulation across the different models.
Abbreviations: GCN, graph convolutional network; ID, in‐distribution; OoD, out‐of‐distribution.
Long‐term prediction of 3D joint positions on H3.6M
| Walking (ID) | Eating (OoD) | Smoking (OoD) | Discussion (OoD) | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| milliseconds | 560 | 720 | 880 | 1000 | 560 | 720 | 880 | 1000 | 560 | 720 | 880 | 1000 | 560 | 720 | 880 | 1000 |
| Attention‐GCN (OoD) |
|
|
|
| 87.6 | 103.6 | 113.2 | 120.3 | 81.7 | 93.7 | 102.9 | 108.7 |
| 130.0 |
|
|
| Ours (OoD) | 58.7 | 60.6 | 65.5 | 69.1 |
|
|
|
|
|
|
|
| 115.4 |
| 134.5 | 139.4 |
Note: Here ours is also trained with the attention‐GCN model.
Abbreviations: GCN, graph convolutional network; ID, in‐distribution; OoD, out‐of‐distribution.