Abstract
The continuous development of intelligent video surveillance systems has increased the demand for enhanced vision-based methods for automatically detecting anomalies in the behaviors found in video scenes. Several methods in the literature detect different anomalies by using the motion features associated with different actions. To detect anomalies efficiently while still characterizing the features specific to each behavior, the model complexity that drives computational expense must be reduced. This paper presents a lightweight framework (LightAnomalyNet) comprising a convolutional neural network (CNN) that is trained on input frames obtained by a computationally cost-effective method. The proposed framework effectively represents and differentiates between normal and abnormal events. In particular, this work defines human falls, some kinds of suspicious behavior, and violent acts as abnormal activities, and discriminates them from other (normal) activities in surveillance videos. Experiments on public datasets show that LightAnomalyNet outperforms existing methods in terms of classification accuracy and input frame generation.
Keywords: anomaly detection; behavior analysis; convolutional neural network; fall detection; suspicious behavior detection; violence detection
Year: 2021 PMID: 34960594 PMCID: PMC8704800 DOI: 10.3390/s21248501
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Summary of the abnormal behavior detection methods.
| Reference | Data Used | Feature/Model | Type(s) of Anomaly | Dataset(s) |
|---|---|---|---|---|
| Harari et al. [ | Accelerometer data, gyroscope signals | Acceleration threshold, logistic regression-based classifier | Falling | Self-collected |
| Vishnu et al. [ | RGB | GMM, FMMM, fall motion vector | Falling | UR Fall Detection, Montreal |
| Min and Moon [ | RGB | Embedding module, attended memory module | Falling | AI Hub DS |
| Zerrouki and Houacine [ | RGB | Curvelet transforms, area ratios features, SVM-HMM | Falling | UR Fall Detection |
| Cheoi [ | Optical flow | Optical flow, temporal saliency map | Falling, violence, | UMN, Avenue, Self-collected from CCTV footage |
| Kim et al. [ | RGB | Object detection, YOLOv4 | Falling, intrusion, | Korea Internet & Security DS |
| Nunez et al. [ | RGB, optical flow | 2D-CNN | Falling | UR Fall Detection, Multicam, FDD |
| Yao et al. [ | RGB | GMM, 2D-CNN | Falling | Self-collected |
| Khraief et al. [ | RGB, depth | Multi-stream CNN | Falling | Self-collected, UR Fall Detection, FDD |
| Pan et al. [ | RGB, optical flow | 3D-CNN | Violence | UCF-Crime, UCF-101 |
| Roman and Chavez [ | RGB | CNN | Violence | Hockey Fights, Violent Flows, UCFCrime2Local |
| Rendón-Segador et al. [ | RGB, optical flow | Multi-head self-attention, bidirectional convolutional LSTM | Violence | Hockey Fights, Movies, Violent Flows, Real Life Violence Situations |
| Ullah et al. [ | RGB, optical flow | CNN | Violence | Hockey Fights, Violent Flows, Surveillance Fight |
| Asad et al. [ | RGB | Feature fusion, 2D-CNN, LSTM | Violence | Hockey Fights, Movies, Violent Flows, BEHAVE |
| Ullah et al. [ | RGB | Spatiotemporal features, CNN, bidirectional convolutional LSTM | Violence | UCF-Crime, UCFCrime2Local |
| Ullah et al. [ | RGB | 3D-CNN | Violence | UCF-Crime |
| Song et al. [ | RGB | Key frames sampling, 3D-CNN | Violence | Hockey Fights, Movies, Violent Flows |
| Fang et al. [ | RGB | CNN, YOLOv3 | Suspicious | Self-collected |
| Sha et al. [ | RGB, optical flow | Two-stream 2D-CNN | Suspicious | Self-collected |
| Chriki et al. [ | RGB | HOG, HOG3D, CNN | Suspicious | Mini-Drone Video |
| Mehmood [ | RGB, optical flow | 2-stream 3D-CNN | Falling, loitering, | UFLV |
Figure 1. The overall architecture of the proposed LightAnomalyNet framework.
Figure 2. (a) Process of generating SG3I images from sequential video frames; (b) example of SG3I generation for the URFD dataset.
Figure 3. Sample SG3I images generated for the Avenue (row 1), Mini-Drone Video (row 2), and Hockey Fights (row 3) datasets.
Figure 4. Proposed architecture of the lightweight CNN and the analysis of learnable parameters at each layer of the network. Note that the proposed CNN has 7154 learnable parameters in total.
Statistical information on the datasets used for SG3I preparation.
| Dataset | # Video Samples Used | Frame Rate (fps) | Resolution | Anomalous: # Samples | Anomalous: # Anomaly Sequences | Anomalous: # Frames | Non-Anomalous: # Samples | Non-Anomalous: # Non-Anomaly Sequences | Non-Anomalous: # Frames |
|---|---|---|---|---|---|---|---|---|---|
| UR Fall * | 48 | 30 | | 24 | 24 | 720 | 24 | 250 | 7500 |
| Avenue | 37 | 25 | | 18 | 57 | 3750 | 19 | 238 | 10,350 |
| Mini-Drone Video | 38 | 30 | | 24 | 43 | 6380 | 10 | 24 | 2925 |
| Hockey Fights ** | 70 | 25 | | 35 | 35 | 875 | 35 | 35 | 875 |
* A separate set of 12 videos was used for testing. ** A separate set of 20 videos was used for testing.
Figure 5. Confusion matrix of the proposed framework on: (a) UR Fall dataset; (b) Avenue dataset; (c) Mini-Drone Video dataset; (d) Hockey Fights dataset.
Figure 6. ROC curve and AUC values for: (a) UR Fall dataset; (b) Avenue dataset; (c) Mini-Drone Video dataset; (d) Hockey Fights dataset.
Classification results of the proposed framework on four datasets adopted in the study.
| Metric | UR Fall | Avenue | Mini-Drone Video | Hockey Fights |
|---|---|---|---|---|
| Recall | 0.9892 | 0.9569 | 0.9659 | 0.9981 |
| FP Rate | 0.0121 | 0.0513 | 0.0497 | 0.0034 |
| Precision | 0.9879 | 0.9491 | 0.9511 | 0.9966 |
| Accuracy | 0.9886 | 0.9528 | 0.9581 | 0.9974 |
| F1 | 0.9886 | 0.9530 | 0.9584 | 0.9974 |
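The F1 row can be cross-checked against the precision and recall rows. The following is a minimal sketch, assuming the standard harmonic-mean definition of F1; the names `reported` and `f1_score` are illustrative and do not come from the paper. Because the table values are rounded to four decimals, the recomputed scores can differ from the reported ones in the last digit.

```python
# Precision, recall, and F1 copied from the classification results table.
reported = {
    "UR Fall": {"precision": 0.9879, "recall": 0.9892, "f1": 0.9886},
    "Avenue": {"precision": 0.9491, "recall": 0.9569, "f1": 0.9530},
    "Mini-Drone Video": {"precision": 0.9511, "recall": 0.9659, "f1": 0.9584},
    "Hockey Fights": {"precision": 0.9966, "recall": 0.9981, "f1": 0.9974},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 per dataset from the rounded precision and recall values.
recomputed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, m in reported.items():
    print(f"{name}: reported F1 = {m['f1']:.4f}, recomputed F1 = {recomputed[name]:.4f}")
```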
Statistical analysis of the proposed framework based on the margin of error (MoE) at a 95% confidence level. Accuracy values (%) are computed over 100 iterations.
| Dataset | Minimum Accuracy (%) | Average Accuracy (%) | Maximum Accuracy (%) | Standard Deviation | Standard Error | MoE |
|---|---|---|---|---|---|---|
| UR Fall | 97.01 | 98.06 | 98.88 | 0.5574 | 0.0258 | 0.1098 |
| Avenue | 93.06 | 94.21 | 95.54 | 0.7432 | 0.0342 | 0.1464 |
| Mini-Drone Video | 93.09 | 94.24 | 95.83 | 0.7643 | 0.0435 | 0.1506 |
| Hockey Fights | 98.76 | 99.34 | 99.92 | 0.3247 | 0.0147 | 0.0640 |
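The table does not state how the MoE was computed. As a rough illustration, the textbook formula MoE = z · s / √n, with the standard deviation s from the table, n = 100 iterations, and an assumed normal critical value z = 1.96, gives values within about 0.001 of the reported MoE column. The authors may have used a slightly different critical value, and this sketch does not attempt to reproduce the Standard Error column. All names below are illustrative.

```python
import math

Z_95 = 1.96          # assumption: normal critical value for a 95% confidence level
N_ITERATIONS = 100   # number of iterations stated with the table

# Standard deviation and reported MoE copied from the table.
std_dev = {
    "UR Fall": 0.5574,
    "Avenue": 0.7432,
    "Mini-Drone Video": 0.7643,
    "Hockey Fights": 0.3247,
}
reported_moe = {
    "UR Fall": 0.1098,
    "Avenue": 0.1464,
    "Mini-Drone Video": 0.1506,
    "Hockey Fights": 0.0640,
}


def margin_of_error(sd: float, n: int, z: float = Z_95) -> float:
    """Textbook margin of error: critical value times sd / sqrt(n)."""
    return z * sd / math.sqrt(n)


moe = {name: margin_of_error(sd, N_ITERATIONS) for name, sd in std_dev.items()}

for name in std_dev:
    print(f"{name}: computed MoE = {moe[name]:.4f}, reported MoE = {reported_moe[name]:.4f}")
```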
Figure 7. Comparison of classification results on different splits of: (a) UR Fall dataset; (b) Avenue dataset; (c) Mini-Drone Video dataset; (d) Hockey Fights dataset.
Comparison of processing speed (frames per second) for input frame generation; higher is faster.
| Dataset | Optical Flow | Dynamic Image | SG3I |
|---|---|---|---|
| UR Fall | 16.59 | 175.10 | 719.61 |
| Avenue | 15.93 | 184.65 | 776.12 |
| Mini-Drone Video | 16.09 | 189.14 | 789.36 |
| Hockey Fights | 16.77 | 177.85 | 745.70 |
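To put the frame rates above in relative terms, the short sketch below divides the SG3I throughput by that of each alternative. It uses only the numbers from the table; the dictionary layout and the `speedup` helper are illustrative assumptions, not part of the paper.

```python
# Frames per second for input frame generation, copied from the table.
fps = {
    "UR Fall": {"optical_flow": 16.59, "dynamic_image": 175.10, "sg3i": 719.61},
    "Avenue": {"optical_flow": 15.93, "dynamic_image": 184.65, "sg3i": 776.12},
    "Mini-Drone Video": {"optical_flow": 16.09, "dynamic_image": 189.14, "sg3i": 789.36},
    "Hockey Fights": {"optical_flow": 16.77, "dynamic_image": 177.85, "sg3i": 745.70},
}


def speedup(row: dict, baseline: str) -> float:
    """Ratio of SG3I throughput to the throughput of the given baseline method."""
    return row["sg3i"] / row[baseline]


vs_optical_flow = {name: speedup(row, "optical_flow") for name, row in fps.items()}
vs_dynamic_image = {name: speedup(row, "dynamic_image") for name, row in fps.items()}

for name in fps:
    print(
        f"{name}: {vs_optical_flow[name]:.1f}x vs optical flow, "
        f"{vs_dynamic_image[name]:.1f}x vs dynamic image"
    )
```

On these figures, SG3I generation is roughly 43 to 49 times faster than optical flow and about 4 times faster than dynamic images.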
Comparison of the proposed lightweight model with different networks.
| Network | No. of Learnable Parameters | Size (MB) | Time per Inference Step (ms)—CPU | Time per Inference Step (ms)—GPU | UR Fall Accuracy% | Avenue Accuracy% | Mini-Drone Video Accuracy% | Hockey Fights Accuracy% |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 + SG3I | 25M+ | 106 | 698.40 | 45.50 | 97.92 | 95.78 | 95.18 | 99.78 |
| Inception-V3 + SG3I | 23M+ | 101 | 507.00 | 68.60 | 98.89 | 95.17 | 95.86 | 99.71 |
| DenseNet-250 + SG3I | 15M+ | 93 | 1526.88 | 66.70 | 97.21 | 94.91 | 95.66 | 99.08 |
| LightAnomalyNet | 7154 | 14 | 278.45 | 23.05 | 98.86 | 95.28 | 95.81 | 99.74 |
Comparison of classification accuracy with the state-of-the-art methods in falling category (UR Fall dataset).
| Method | AUC% | Recall% | Precision% | Accuracy% |
|---|---|---|---|---|
| Vishnu et al. [ | - | 97.5 | 96.9 | - |
| Zerrouki and Houacine [ | - | - | - | 97.0 |
| Nunez et al. [ | - | 100.0 | - | 95.0 |
| Khraief et al. [ | - | 100.0 | 95.0 | - |
| LightAnomalyNet | 98.71 | 98.92 | 98.79 | 98.86 |
Comparison of classification accuracy with the state-of-the-art methods in suspicious actions category (Avenue and Mini-Drone Video datasets).
| Method | Avenue AUC% | Avenue Recall% | Avenue Precision% | Avenue Accuracy% | Mini-Drone Video AUC% | Mini-Drone Video Recall% | Mini-Drone Video Precision% | Mini-Drone Video Accuracy% |
|---|---|---|---|---|---|---|---|---|
| Cheoi [ | - | 94.5 | 93.2 | 90.1 | - | - | - | - |
| Chriki et al. [ | - | - | - | - | - | 100.0 | 88.37 | 93.57 |
| LightAnomalyNet | 94.97 | 95.69 | 94.91 | 95.28 | 96.11 | 96.59 | 95.11 | 95.81 |
Comparison of classification accuracy with the state-of-the-art methods in violence category (Hockey Fights dataset).
| Method | AUC% | Recall% | Precision% | Accuracy% |
|---|---|---|---|---|
| Roman and Chavez [ | - | - | - | 96.40 |
| Song et al. [ | - | - | - | 99.62 |
| Ullah et al. [ | - | 98.10 | 98.10 | 98.00 |
| Asad et al. [ | - | - | - | 98.80 |
| Mehmood [ | 99.76 | 99.82 | 99.59 | 99.71 |
| LightAnomalyNet | 99.78 | 99.81 | 99.66 | 99.74 |