Junwoo Lee1, Bummo Ahn1,2.
Abstract
Human action recognition is an important research area in computer vision, with applications in surveillance, assisted living, and robotic systems that interact with people. Although various approaches have been explored, recent studies have mainly focused on deep-learning networks that use a Kinect camera, which can easily generate skeleton-joint data from depth information, and these have achieved satisfactory performance. However, such models are made deep and complex to reach higher recognition scores and therefore cannot be deployed on a mobile robot platform using a Kinect camera. To overcome these limitations, we propose a method for classifying human actions in real time using a single RGB camera, which can also be applied to a mobile robot platform. We integrated two open-source libraries, OpenPose and 3D-baseline, to extract skeleton joints from RGB images, and classified the actions using a convolutional neural network. Finally, we built a mobile robot platform, including an NVIDIA Jetson Xavier embedded board and a tracking algorithm, to monitor a person continuously. We achieved an accuracy of 70% on the NTU-RGBD training dataset, and the whole pipeline ran at an average of 15 frames per second (FPS) on the embedded board system.
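The abstract (and Figure 3) describes converting extracted 3D skeleton-joint sequences into RGB images before CNN classification. A minimal sketch of one common encoding is shown below; mapping the (x, y, z) coordinates to the three colour channels and min-max normalising per axis are assumptions, since this record does not give the paper's exact conversion.

```python
import numpy as np

def skeleton_sequence_to_image(seq):
    """Encode a skeleton sequence as an RGB image (hypothetical sketch).

    seq: array of shape (T, J, 3) -- T frames, J joints, (x, y, z) per joint.
    Each coordinate axis is min-max normalised to [0, 255] and mapped to one
    colour channel, giving a T x J RGB image: frames become image rows,
    joints become columns.
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo = seq.min(axis=(0, 1), keepdims=True)        # per-axis minimum, shape (1, 1, 3)
    hi = seq.max(axis=(0, 1), keepdims=True)        # per-axis maximum
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)   # guard against flat axes
    return (norm * 255).astype(np.uint8)            # shape (T, J, 3)

# toy example: 30 frames of 17 joints (the 17-joint set used in the tables below)
rng = np.random.default_rng(0)
img = skeleton_sequence_to_image(rng.normal(size=(30, 17, 3)))
```

An encoding like this lets a plain 2D image CNN consume the temporal sequence directly, which keeps the classifier small enough for an embedded board.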
Keywords: RGB camera; embedded board; human action recognition; mobile robot; real-time
Year: 2020 PMID: 32438776 PMCID: PMC7287597 DOI: 10.3390/s20102886
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. System overview.
Figure 2. 3D skeleton joints obtained by the integrated pipeline.
Figure 3. Joint data conversion to RGB image.
Figure 4. Converted image of the action sequence.
Figure 5. The proposed convolutional neural network (CNN) structure.
Figure 6. Mobile robot.
Accuracy by camera and number of joints.

| Method | Accuracy | Precision | Recall |
|---|---|---|---|
| Kinect-25 | 75% | 0.74 | 0.74 |
| Kinect-17 | 74% | 0.73 | 0.73 |
Figure 7. Confusion matrix of Kinect camera-17.
Figure 8. Confusion matrix of RGB camera-17.
Frames per second (FPS) across models and module power modes.

| Model | 15 W Power FPS | MAX Power FPS |
|---|---|---|
| VGG 19 | x | x |
| Inception V4 | 4–5 | 8–9 |
| Resnet-50 | 5–6 | 9–10 |
Figure 9. Tracking results during the movement.
Figure 10. Predicting actions observed by the mobile robot.