Mona M Moussa, Elsayed Hamayed, Magda B Fayek, Heba A El Nemr.
Abstract
This paper presents a fast and simple method for human action recognition. The proposed technique detects interest points in each frame of the video using SIFT (scale-invariant feature transform). A fine-tuning step limits the number of interest points according to the amount of detail in the video. The popular Bag of Video Words approach is then applied with a new normalization technique, which markedly improves the results. Finally, a multiclass linear Support Vector Machine (SVM) is used for classification. Experiments were conducted on the KTH and Weizmann datasets. The results demonstrate that the approach outperforms most existing methods, achieving an accuracy of 97.89% on KTH and 96.66% on Weizmann.
Keywords: Action recognition; Bag of words; SIFT; SVM
Year: 2013 PMID: 25750750 PMCID: PMC4348441 DOI: 10.1016/j.jare.2013.11.007
Source DB: PubMed Journal: J Adv Res ISSN: 2090-1224 Impact factor: 10.479
Fig. 1. A block diagram of the proposed system.
Fig. 2. The effect of fine-tuning the SIFT threshold on the number of interest points. The first row shows a group of frames and the interest points detected in them without fine-tuning the threshold (many points, most of them in the background). The second row shows a group of frames and the interest points detected in them after fine-tuning the threshold according to the amount of detail in the video (far fewer points, and more indicative ones).
Fig. 3. The effect of changing the codebook size on accuracy.
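In a bag-of-words model, the codebook is a set of cluster centres, and each local descriptor is counted against its nearest codeword, so the codebook size fixes the length of the resulting histogram. The following is a minimal sketch of that assignment step only, not the authors' implementation. The function names, the Euclidean distance, and the toy 2-D data are assumptions made for illustration; real SIFT descriptors are 128-dimensional and the codebook would normally come from clustering training descriptors.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean)."""
    best_index, best_dist = 0, math.inf
    for i, word in enumerate(codebook):
        d = math.dist(descriptor, word)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


def bovw_histogram(descriptors, codebook):
    """Count how many descriptors are assigned to each codeword."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1
    return hist


# Hypothetical toy data: a 3-word codebook and four 2-D "descriptors".
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 1.0), (0.5, 0.2), (1.0, 8.0)]
print(bovw_histogram(descriptors, codebook))  # [2, 1, 1]
```

A larger codebook gives a longer, finer-grained histogram per video, which is the quantity varied in Fig. 3.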
Accuracy using the proposed method for each of the four scenarios.
| | Outdoor | Scale variations | Changes in clothing | Indoor |
|---|---|---|---|---|
| Accuracy (%) | 96 | 96.7 | 100 | 99.3056 |
Comparing the proposed normalization with ℓ1-Normalization, ℓ2-Normalization and Power-Normalization.
| The normalization used | Accuracy | Time (s) |
|---|---|---|
| ℓ1 Normalization | 60.3% | 22.979 |
| ℓ2 Normalization | 67.7% | 20.6230 |
| ℓ1 with power normalization | 93% | 14.96 |
| ℓ2 with power normalization | 95.5% | 13.79 |
| Power normalization | 96.5% | 11.85 |
| Proposed with power normalization | 97.7% | 14.508 |
| Proposed normalization | 97.9% | 14.446 |
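For reference, the three baseline normalizations in the table have standard definitions, sketched below for a single histogram. This is an illustration under assumptions, not the authors' code: the exponent `alpha=0.5` for power normalization is a common choice that this record does not confirm, and the order in which power and ℓ2 normalization are combined is likewise assumed. The proposed normalization is not described in this record, so it is not reproduced here.

```python
import math


def l1_normalize(hist):
    """Divide by the sum of absolute values (entries then sum to 1)."""
    total = sum(abs(v) for v in hist)
    return [v / total for v in hist] if total else list(hist)


def l2_normalize(hist):
    """Divide by the Euclidean norm (vector then has unit length)."""
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else list(hist)


def power_normalize(hist, alpha=0.5):
    """Apply sign(v) * |v|**alpha element-wise; alpha=0.5 is an assumed value."""
    return [math.copysign(abs(v) ** alpha, v) for v in hist]


hist = [9, 16, 0]  # hypothetical word counts
print(l1_normalize(hist))
print(power_normalize(hist))
# One possible combination (assumed order): power first, then l2.
print(l2_normalize(power_normalize(hist)))
```

Power normalization compresses large counts relative to small ones, which reduces the dominance of very frequent visual words.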
Comparison with other methods.
| Method | KTH accuracy (%) | Weizmann accuracy (%) |
|---|---|---|
| The proposed method | 97.89 | 96.66 |
| Bregonzio et al. | 94.33 | 96.6 |
| Liu and Shah | 94.2 | – |
| Lin et al. | 93.43 | 100 |
| Chen and Hauptmann | 95.83 | – |
| Niebles et al. | 83.3 | 90 |
| Tran et al. | 95.67 | – |
| Schuldt et al. | 71.72 | – |
| Fathi and Mori | 90.5 | 100 |
| Kovashka and Grauman | 94.53 | – |
| Cao et al. | 95.02 | – |
| Kaaniche and Bremond | 94.67 | – |
| Dollar et al. | 81.17 | 85.2 |
| Klaser et al. | 91.4 | 84.3 |
| Zhang et al. | 91.33 | 92.89 |
Confusion matrix of Weizmann dataset.
| Rule for fine-tuning the threshold (th) |
|---|
| if np > 25 then th = 14 |
| else if np > 20 then th = 10 |
| else if np > 10 then th = 8 |
| else th = 6 |
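The rule above can be written as a small function. This is a direct transcription of the four branches, offered as a sketch: `th` is presumably the SIFT threshold mentioned in the Fig. 2 caption, while the exact meaning of `np` is not defined in this record, so the parameter is named `np_value` here and treated as an opaque number. The function name `select_threshold` is hypothetical.

```python
def select_threshold(np_value):
    """Map the quantity `np` from the rule to a threshold `th`.

    The meaning of `np` is not stated in this record; the branches
    below only mirror the published comparisons and constants.
    """
    if np_value > 25:
        return 14
    if np_value > 20:
        return 10
    if np_value > 10:
        return 8
    return 6


# Boundary values fall into the lower branch because the tests are strict.
for value in (30, 25, 20, 10):
    print(value, select_threshold(value))
```

Because every comparison is strict, the boundary values 25, 20 and 10 map to thresholds 10, 8 and 6 respectively.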
Confusion matrix of KTH dataset using ℓ1-Normalization.
Confusion matrix of KTH dataset using ℓ2-Normalization.
Confusion matrix of KTH dataset using power-normalization.
Confusion matrix of KTH dataset using the proposed normalization.