Kanghui Du, Thomas Kaczmarek, Dražen Brščić, Takayuki Kanda.
Abstract
Detecting and recognizing low-moral actions in public spaces is important. However, low-moral actions are rare, so learning to recognize a new low-moral action generally has to rely on a limited number of samples. To study the recognition of actions from a comparatively small dataset, in this work we introduced a new dataset of human actions consisting largely of low-moral behaviors. In addition, we used this dataset to test the performance of a number of classifiers, which used either depth data or extracted skeletons. The results show that both depth-based and skeleton-based classifiers achieved similar classification accuracy on this dataset (Top-1: around 55%; Top-5: around 90%), and in both cases transfer learning improved the performance.
Keywords: 3D CNN; depth maps; human action recognition; low-moral actions; skeleton
Year: 2020 PMID: 32408586 PMCID: PMC7285506 DOI: 10.3390/s20102758
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1An illustration of the experimental setup. The circles represent the positions of the Kinect sensors. The size of the experiment area is about 2.8 × 1.5 m.
Figure 2Collected actions in the Low-Moral Actions (LMA) dataset.
Figure 3Example depth maps before and after preprocessing: (left) original 512 × 424 depth map from Kinect; (right) transformed 64 × 64 depth map containing only the parts belonging to the person.
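The preprocessing in Figure 3 (raw 512 × 424 Kinect depth map → 64 × 64 person-only map) can be sketched as follows. This is a minimal illustration assuming the person region is obtained by simple depth thresholding and a bounding-box crop; the paper's exact segmentation procedure and thresholds (`near`, `far` below) are not given in this record and are placeholders.

```python
import numpy as np

def preprocess_depth(depth, near=500, far=4500, out_size=64):
    """Crop a raw depth map to the person region and resample to out_size x out_size.

    Assumes (hypothetically) that the person is the set of pixels whose depth
    in millimeters lies in (near, far); background pixels are zeroed out.
    """
    mask = (depth > near) & (depth < far)
    if not mask.any():
        return np.zeros((out_size, out_size), dtype=np.float32)
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = np.where(mask, depth, 0)[y0:y1, x0:x1].astype(np.float32)
    # Nearest-neighbour resample to the fixed network input size.
    ri = np.arange(out_size) * crop.shape[0] // out_size
    ci = np.arange(out_size) * crop.shape[1] // out_size
    return crop[ri][:, ci]

# Example: a fake 424x512 Kinect frame with one "person" blob at ~2 m.
frame = np.zeros((424, 512))
frame[100:300, 200:320] = 2000
small = preprocess_depth(frame)
print(small.shape)  # (64, 64)
```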
Figure 4The neural network structure used for the depth map based classification (DM3DCNN).
Figure 5Simulation of human depth maps: (left) two examples of human models inside the simulator performing different actions; (right) the corresponding depth maps from a simulated Kinect sensor (showing just the part of the depth map close to the model).
Figure 6Example skeleton representation obtained from Microsoft Kinect v2 (overlaid on the depth image). The skeleton consists of 25 joints (shown as white circles), here connected with light blue lines for better visualization.
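A 25-joint Kinect v2 skeleton like the one in Figure 6 is typically fed to a skeleton-based classifier (e.g., ST-GCN or DGNN) after a simple normalization such as centering on a root joint and scaling. The sketch below is illustrative of that common preprocessing, not the paper's exact procedure; `root=0` follows the Kinect v2 convention of joint 0 being the spine base.

```python
import numpy as np

def normalize_skeleton(joints, root=0):
    """Center a skeleton (25 joints x 3D coordinates) on a root joint and
    scale it to unit size, giving translation and scale invariance.
    """
    joints = np.asarray(joints, dtype=np.float32)
    assert joints.shape == (25, 3)
    centered = joints - joints[root]              # root joint moves to origin
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale if scale > 0 else centered

# Example: a random pose; after normalization the root joint sits at the origin.
pose = np.random.rand(25, 3)
norm = normalize_skeleton(pose)
print(np.allclose(norm[0], 0))  # True
```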
Recognition accuracy (Top-1) on the Low-Moral Actions (LMA) dataset. (FCNN = fully connected neural network; for transfer learning on skeleton data, the dataset used for pretraining is denoted in square brackets.) For the two highlighted best methods we also denote the Top-5 accuracy in brackets.
| Training from Scratch | Top-1 | Transfer Learning | Top-1 |
|---|---|---|---|
| DM3DCNN | 47.27% | DM3DCNN (feature-based t. l.) | 44.47% |
| 5-layer FCNN | 18.92% | ST-GCN [NTU] | 53.52% |
| SVM | 21.40% | ST-GCN [CMU] | 47.84% |
| ST-GCN | 5.00% | | |
| DGNN | 11.60% | DGNN [CMU] | 51.78% |
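The Top-1 and Top-5 accuracies reported above can be computed from raw classifier scores as follows. This is a generic sketch of top-k accuracy, not code from the paper; the example scores and labels are made up.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: (n_samples, n_classes) array of classifier outputs.
    labels: (n_samples,) array of true class indices.
    """
    topk = np.argsort(scores, axis=1)[:, -k:]            # k best classes per sample
    hits = (topk == np.asarray(labels)[:, None]).any(axis=1)
    return hits.mean()

# Example with 3 samples and 4 classes (no tied scores).
scores = np.array([[0.10, 0.50, 0.20, 0.15],
                   [0.70, 0.10, 0.15, 0.05],
                   [0.25, 0.20, 0.45, 0.10]])
labels = [1, 2, 2]
print(round(top_k_accuracy(scores, labels, k=1), 3))  # 0.667
print(top_k_accuracy(scores, labels, k=2))            # 1.0
```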
Figure 7Confusion matrices for: (left) DM3DCNN with fine tuning; (right) DGNN with pretraining on the NTU RGB+D dataset.