Yaling Zhang, Huan Tang, Fateh Zereg, Dekai Xu.
Abstract
Sports videos are spreading rapidly over the internet as material living standards rise and people pursue a richer spiritual life. Automatically identifying and detecting useful information in videos has therefore emerged as a relatively novel research direction. Accordingly, the present work proposes a Human Pose Estimation (HPE) model that automatically classifies sports videos and detects hot spots in them, addressing the deficiencies of traditional algorithms. Firstly, Deep Learning (DL) is introduced. Then, a large number of human motion features are extracted by the Region Proposal Network (RPN). Next, an HPE model is implemented based on a Deep Convolutional Neural Network (DCNN). Finally, the HPE model is applied to motion recognition and video classification in sports videos. The research findings corroborate that an effective and accurate HPE model can be implemented using the DCNN to recognize and classify videos. Meanwhile, Big Data Technology (BDT) is applied to count the number of plays of various sports videos. The results indicate that the DCNN-based HPE model can classify sports videos effectively and accurately, providing a basis for subsequent BDT statistics on various sports videos. Finally, an outlook is proposed for applying new technology in the entertainment industry.
Keywords: big data technology; deep convolutional neural network; hot spot detection; human motion recognition model; sports video
Year: 2022 PMID: 35721275 PMCID: PMC9204289 DOI: 10.3389/fnbot.2022.829445
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 3.493
Figure 1. Basic architecture of Big Data.
Figure 2. Detection flow of the target identification algorithm (TIA).
Figure 3. Bounding box selection by the Region Proposal Network (RPN) method.
Figure 4. Schematic diagram of the cross product.
Figure 5. The modular expression framework.
Figure 6. Structure of a simple neural network.
Figure 7. Deep Convolutional Neural Network (DCNN) structure.
Detailed AlexNet-based Deep Convolutional Neural Network (DCNN) parameters.
| Layer | Branch a | Branch b |
|---|---|---|
| Convolution layer 1 | 11 × 11 convolution kernel a, number = 48, step size = 4 | 11 × 11 convolution kernel b, number = 48, step size = 4 |
| | Activation function (ReLU) | Activation function (ReLU) |
| | Pooling layer (kernel size = 3, stride = 2) | Pooling layer (kernel size = 3, stride = 2) |
| | Standardization | |
| Convolution layer 2 | Convolution kernel size = 5 × 5, number = 128, step size = 1 | Convolution kernel size = 5 × 5, number = 128, step size = 1 |
| | Activation function (ReLU) | Activation function (ReLU) |
| | Pooling layer (kernel size = 3, stride = 2) | Pooling layer (kernel size = 3, stride = 2) |
| | Standardization | |
| Convolution layer 3 | Convolution kernel size = 3 × 3, number = 192, step size = 1 | Convolution kernel size = 3 × 3, number = 192, step size = 1 |
| | Activation function (ReLU) | Activation function (ReLU) |
| Convolution layer 4 | Convolution kernel size = 3 × 3, number = 192, step size = 1 | Convolution kernel size = 3 × 3, number = 192, step size = 1 |
| | Activation function (ReLU) | Activation function (ReLU) |
| Convolution layer 5 | Convolution kernel size = 3 × 3, number = 192, step size = 1 | Convolution kernel size = 3 × 3, number = 192, step size = 1 |
| | Activation function (ReLU) | Activation function (ReLU) |
| | Pooling layer (kernel size = 3, stride = 2) | Pooling layer (kernel size = 3, stride = 2) |
| Fully connected layer 6 | 2,048 neurons | 2,048 neurons |
| | Dropout | Dropout |
| Fully connected layer 7 | 2,048 neurons | 2,048 neurons |
| | Dropout | Dropout |
| Fully connected layer 8 | 1,000 neurons | |
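The kernel sizes and strides in the table above determine how the spatial size of the feature maps shrinks through one of the two parallel columns. The sketch below traces that arithmetic with the standard formula out = floor((in + 2·padding − kernel) / stride) + 1. The table gives neither the input resolution nor the padding, so the 227 × 227 input and the padding values (0, 2, 1, 1, 1) are assumptions borrowed from the common AlexNet configuration, not values confirmed by the source. The layer names `conv1`, `pool1`, and so on are illustrative labels.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, kernels per column; None for pooling)
# Kernel sizes, strides, and kernel counts follow the table.
# Padding values are ASSUMED (typical AlexNet settings), not from the source.
LAYERS = [
    ("conv1", 11, 4, 0, 48),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 128),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 192),
    ("conv4", 3, 1, 1, 192),
    ("conv5", 3, 1, 1, 192),
    ("pool5", 3, 2, 0, None),
]


def trace(input_size=227):
    """Return (name, channels, spatial size) after each layer of one column.

    input_size=227 is an ASSUMED input resolution.
    """
    size, channels = input_size, 3  # RGB input
    shapes = []
    for name, kernel, stride, padding, n_kernels in LAYERS:
        size = out_size(size, kernel, stride, padding)
        if n_kernels is not None:  # pooling keeps the channel count
            channels = n_kernels
        shapes.append((name, channels, size))
    return shapes


if __name__ == "__main__":
    for name, channels, size in trace():
        print(f"{name}: {channels} x {size} x {size}")
    _, channels, size = trace()[-1]
    print("flattened features per column:", channels * size * size)
```

Under these assumptions, each column ends with a 192 × 6 × 6 feature map (6,912 values) feeding fully connected layer 6. A different input size or padding choice would change these numbers.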
Figure 8. Structure of the improved three-dimensional Convolutional Neural Network (3DCNN).
Figure 9. Identification and detection steps.
Figure 10. KTH datasets.
Figure 11. UCF Sports datasets.
Figure 12. Target detection frame of the algorithm.
Figure 13. Football video detection results.
Figure 14. Volleyball video detection results.
Figure 15. Comparative experiment of mixed sports video recognition. (A) Retraining rate of 0.01; (B) retraining rate of 0.001.
Performance comparison of the same Human Pose Estimation (HPE) algorithm.
| Algorithm | | | | | |
|---|---|---|---|---|---|
| FR-CNN | 147 ms | 10 ms | 157 ms | 86.1% | |
| Alphapose | 158 ms | 203 ms | 78.3% | ||
| Pifpaf | 258 ms | 102 ms | 10 ms | 443 ms | 79.5% |
| Local Pifpaf | 213 ms | 302 ms | 294 ms | 79.6% |
Figure 16. Test results of false separation rate.
Figure 17. Recognition results of the confusion matrix. (A) Results of the KTH confusion matrix; (B) results of the UCF Sports confusion matrix.