Ruo-Hong Huan1, Chao-Jie Xie1, Feng Guo1, Kai-Kai Chi1, Ke-Ji Mao1, Ying-Long Li1, Yun Pan2.
Abstract
In this paper, we propose a human action recognition method using HOIRM (histogram of oriented interest region motion) feature fusion and a BOW (bag of words) model based on AP (affinity propagation) clustering. First, a HOIRM feature extraction method based on the ROI of spatiotemporal interest points is proposed; HOIRM can be regarded as a mid-level feature between local and global features. Then, HOIRM is fused with 3D HOG and 3D HOF local features using a cumulative histogram. This fusion further improves the robustness of the local features to variations in camera view angle and distance in complex scenes, which in turn improves recognition accuracy. Finally, a BOW model based on AP clustering is proposed and applied to action classification. It obtains an appropriate visual dictionary capacity and achieves a better clustering effect for the joint description of multiple features. The experimental results demonstrate that, using the fused features with the proposed BOW model, the average recognition rate is 95.75% on the KTH database and 88.25% on the UCF database, both higher than those obtained using only 3D HOG+3D HOF or HOIRM features. Moreover, the average recognition rates achieved by the proposed method on the two databases are higher than those obtained by other methods.
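As a rough illustration of the cumulative-histogram fusion described in the abstract, the three descriptors can be concatenated after per-histogram normalization. This is a minimal sketch, not the paper's implementation: the bin counts, the L1 normalization step, and the function name are assumptions.

```python
import numpy as np

def fuse_descriptors(hog3d, hof3d, hoirm):
    """Fuse 3D HOG, 3D HOF, and HOIRM histograms by concatenating
    their cumulative (L1-normalized) histograms into one vector."""
    parts = []
    for h in (hog3d, hof3d, hoirm):
        h = np.asarray(h, dtype=float)
        total = h.sum()
        if total > 0:
            h = h / total          # L1-normalize each histogram separately
        parts.append(np.cumsum(h))  # cumulative histogram per descriptor
    return np.concatenate(parts)

# toy example: three 8-bin histograms -> one 24-D fused feature vector
fused = fuse_descriptors(np.ones(8), np.ones(8), np.ones(8))
```

Concatenating cumulative rather than raw histograms makes the fused vector less sensitive to small shifts of mass between neighboring bins, which is one plausible reason such a fusion would tolerate view-angle and distance changes better.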
Year: 2019 PMID: 31344042 PMCID: PMC6658076 DOI: 10.1371/journal.pone.0219910
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Determining the region of spatiotemporal interest points.
Fig 2. HOIRM of "waving".
Fig 3. 3D HOG and 3D HOF descriptors.
Fig 4. Visual dictionary constructed using the AP clustering BOW algorithm.
Recognition rate in the KTH database under different dictionary capacities (%).
| Dictionary capacity | 300 | 400 | 500 | 800 | 1000 | 1500 |
|---|---|---|---|---|---|---|
| Boxing | 100.00 | 100.00 | 100.00 | 98.55 | 95.20 | 95.60 |
| Hand clapping | 94.33 | 95.00 | 94.50 | 94.25 | 87.33 | 88.25 |
| Hand waving | 98.75 | 100.00 | 97.33 | 95.50 | 90.00 | 90.00 |
| Jogging | 84.33 | 85.50 | 85.50 | 86.50 | 80.50 | 78.50 |
| Running | 88.25 | 90.00 | 80.50 | 80.33 | 80.50 | 82.33 |
| Walking | 94.50 | 95.50 | 92.00 | 87.23 | 85.50 | 86.50 |
| Average | 93.36 | 94.33 | 91.63 | 90.39 | 86.50 | 86.86 |
Experimental results of BOW based on K-Means clustering and AP clustering in the KTH database.
| Clustering method | Visual dictionary capacity | Average recognition rate (%) | IGP value | Running time (s) |
|---|---|---|---|---|
| K-Means | 300 | 93.36 | 0.3506 | 28644.5 |
| | 400 | 94.33 | 0.3617 | 35748.9 |
| | 500 | 91.63 | 0.3089 | 42132.1 |
| | 800 | 90.39 | 0.2974 | 49628.5 |
| | 1000 | 86.50 | 0.2776 | 53125.1 |
| | 1500 | 86.86 | 0.2535 | 65535.1 |
Fig 5. Change in the average recognition rate in the KTH and UCF databases under different dictionary capacities.
Fig 6. Change in the IGP value in the KTH and UCF databases under different dictionary capacities.
Recognition rate in the UCF data set under different dictionary capacities (%).
| Dictionary capacity | 300 | 400 | 500 | 800 | 1000 | 1500 |
|---|---|---|---|---|---|---|
| Diving | 95.80 | 96.50 | 100.00 | 100.00 | 98.00 | 96.50 |
| Golf swing | 84.80 | 85.50 | 86.80 | 87.60 | 86.80 | 85.50 |
| Kicking | 87.80 | 88.00 | 89.80 | 91.50 | 90.00 | 88.00 |
| Lifting | 70.20 | 71.80 | 74.50 | 75.80 | 72.10 | 71.80 |
| Riding horse | 65.20 | 67.60 | 69.50 | 70.80 | 70.60 | 67.60 |
| Running | 70.00 | 74.20 | 76.10 | 78.80 | 75.20 | 74.20 |
| Skateboarding | 83.20 | 85.00 | 86.80 | 88.50 | 86.40 | 85.00 |
| Swing-Bench | 90.00 | 91.50 | 92.10 | 93.50 | 90.50 | 91.50 |
| Swing-Side | 94.80 | 95.20 | 98.00 | 100.00 | 98.80 | 95.20 |
| Walking | 84.30 | 86.50 | 90.00 | 91.30 | 88.80 | 86.50 |
| Average | 82.61 | 84.18 | 86.36 | 87.78 | 85.72 | 84.18 |
Experimental results of BOW based on K-Means clustering and AP clustering in the UCF data set.
| Clustering method | Visual dictionary capacity | Average recognition rate (%) | IGP value | Running time (s) |
|---|---|---|---|---|
| K-Means | 300 | 82.61 | 0.2314 | 20248.1 |
| | 400 | 84.98 | 0.2836 | 27483.5 |
| | 500 | 86.36 | 0.3325 | 31320.2 |
| | 800 | 87.78 | 0.3928 | 38288.5 |
| | 1000 | 85.72 | 0.3275 | 40749.9 |
| | 1500 | 84.18 | 0.2743 | 52735.8 |
| AP | | | | |
Comparison of recognition rates of three different features (%).
| Features | KTH database | UCF dataset |
|---|---|---|
| 3D HOG+3D HOF | 91.50 | 85.95 |
| HOIRM | 92.43 | 86.52 |
| The proposed fused feature | 95.75 | 88.25 |
Comparison of the recognition rate of the proposed method with that of other methods.
| KTH database | | UCF dataset | |
|---|---|---|---|
| Methods | Recognition rate (%) | Methods | Recognition rate (%) |
| Naidoo [ | 82.00 | Wang [ | 85.60 |
| Jaouedi [ | 91.00 | Klaser [ | 86.70 |
| Zhang [ | 91.67 | Bregonzio [ | 86.90 |
| Laptev [ | 91.80 | Farrajota [ | 87.20 |
| Najar [ | 91.97 | The proposed method | 88.25 |
| Yuan [ | 93.30 | | |
| Tong [ | 93.96 | | |
| Fu [ | 94.33 | | |
| Kovashka [ | 94.53 | | |
| The proposed method | 95.75 | | |