Gulshan Saleem1, Usama Ijaz Bajwa1, Rana Hammad Raza2, Fayez Hussain Alqahtani3, Amr Tolba4, Feng Xia5.
Abstract
Smart surveillance is a difficult task that is gaining popularity due to its direct link to human safety. Today, many indoor and outdoor surveillance systems are in use in public places and smart cities. Because these systems are expensive to deploy, they are out of reach for the vast majority of the public and private sectors. Due to the lack of a precise definition of an anomaly, automated surveillance is a challenging task, especially when large amounts of data, such as 24/7 CCTV footage, must be processed. When implementing such systems in real-time environments, the high computational resource requirements of automated surveillance become a major bottleneck. Another challenge is recognizing anomalies accurately, since achieving high accuracy while reducing computational cost is difficult. To address these challenges, this research develops a system that is both efficient and cost-effective. Although 3D convolutional neural networks have proven to be accurate, they are prohibitively expensive for practical use, particularly in real-time surveillance. In this article, we present two contributions: a resource-efficient framework for anomaly recognition problems, and two-class and multi-class anomaly recognition on spatially augmented surveillance videos. This research aims to address the problem of computation overhead while maintaining recognition accuracy. The proposed Temporal based Anomaly Recognizer (TAR) framework combines a partial shift strategy with a 2D convolutional architecture-based model, namely MobileNetV2. Extensive experiments were carried out to evaluate the model's performance on the UCF Crime dataset; with MobileNetV2 as the baseline architecture, it achieved an accuracy of 88%, which is 2.47 percentage points higher than the available state of the art. The proposed framework achieves 52.7% accuracy for multiclass anomaly recognition on the UCF Crime2Local dataset.
The proposed model has been tested in real-time camera stream settings and can handle six streams simultaneously without the need for additional resources. ©2022 Saleem et al.
Keywords: Anomaly recognition; Crime detection; Deep learning; Video analysis; Video surveillance
Year: 2022 PMID: 36262136 PMCID: PMC9575851 DOI: 10.7717/peerj-cs.1117
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. General data flow of the anomaly detection framework.
Figure 2. Video representation in terms of a multidimensional array.
Figure 3. Video input is forwarded to the temporal feature extractor, which performs temporal and spatial modeling with the help of MobileNetV2, whereas fully connected layers perform anomaly recognition via spatiotemporal modeling.
Figure 4. Proposed temporal based anomaly recognizer (TAR) framework with 2D MobileNetV2 baseline architecture.
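The record does not spell out how the partial shift strategy operates, so the following is only a minimal sketch of one common form of partial temporal shifting, assuming a clip stored as a multidimensional array of shape (frames, channels, height, width). The function name `partial_temporal_shift` and the parameter `shift_div` are hypothetical and are not taken from the paper; the authors' implementation may differ.

```python
import numpy as np

def partial_temporal_shift(x: np.ndarray, shift_div: int = 8) -> np.ndarray:
    """Shift a fraction of the channels along the time axis.

    x has shape (T, C, H, W): frames, channels, height, width.
    shift_div is a hypothetical parameter: 1/shift_div of the channels
    take features from the next frame, another 1/shift_div from the
    previous frame, and the rest are left untouched.
    """
    channels = x.shape[1]
    fold = channels // shift_div
    out = np.zeros_like(x)
    # First slice of channels: frame t receives features from frame t+1.
    out[:-1, :fold] = x[1:, :fold]
    # Second slice: frame t receives features from frame t-1.
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]
    # Remaining channels stay in place, keeping per-frame spatial features.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out

# Toy clip: 4 frames, 8 channels, 2x2 pixels; frame i is filled with the value i.
clip = np.stack([np.full((8, 2, 2), i, dtype=np.float32) for i in range(4)])
shifted = partial_temporal_shift(clip, shift_div=4)
```

Because the shift only moves existing values between neighbouring frames, it adds no multiplications, which is why this kind of operation is attractive when pairing temporal modeling with a lightweight 2D backbone.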
Hyperparameter settings of the experiment.
| Hyperparameter | Hyperparameter value |
|---|---|
| Batch size | 16 |
| Epochs | 100 |
| Learning rate | 0.01 (decays by 0.1 at epochs 40 and 80) |
| Dropout | 0.5 |
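The learning-rate row above can be read as a step schedule. The sketch below assumes that "decays by 0.1" means the rate is multiplied by 0.1 at each listed epoch and that epochs are counted from zero; both are assumptions, and the function name `step_decay_lr` is hypothetical.

```python
def step_decay_lr(epoch: int, base_lr: float = 0.01,
                  milestones=(40, 80), gamma: float = 0.1) -> float:
    """Return the learning rate for a 0-indexed epoch under step decay.

    Assumes multiplicative decay: the rate is scaled by gamma once for
    every milestone epoch that has already been reached.
    """
    # Count how many milestones have been reached by this epoch.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# Learning rate for each of the 100 training epochs listed in the table.
schedule = [step_decay_lr(e) for e in range(100)]
```

Under these assumptions the rate stays at 0.01 for epochs 0 to 39, drops to 0.001 for epochs 40 to 79, and to 0.0001 for the remaining epochs.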
Comparison of the results achieved using the ResNet50 and MobileNetV2 architectures.
| Method | Accuracy | | | | | |
|---|---|---|---|---|---|---|
| Attention Residual LSTM | 78.43% | 87% | 78% | 12.8 | 3.3M | 618.3 |
| DEARESt | 76.786% | – | – | 1187.5 | 305M | – |
| ResNet50+multi-layer BD-LSTM | 85.53% | – | – | 143 | 25M | – |
| TAR(Baseline ResNet50) | 93.4% | 97.8% | 89% | 91.2 | 23.5M | 6768 |
| TAR(Baseline MobileNetV2) | 88% | 92.2% | 83% | 8.61 | 2.2M | 564 |
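As a quick consistency check, the 2.47% improvement quoted in the abstract can be reproduced from the accuracy column of this table. The snippet below is only an illustrative calculation; the variable names are hypothetical, and the values are copied from the rows above.

```python
# Accuracies (%) of the prior methods, copied from the comparison table.
prior_accuracy = {
    "Attention Residual LSTM": 78.43,
    "DEARESt": 76.786,
    "ResNet50+multi-layer BD-LSTM": 85.53,
}
# Accuracy (%) of TAR with the MobileNetV2 baseline, from the same table.
tar_mobilenetv2 = 88.0

# Find the strongest prior method and the margin over it, in percentage points.
best_prior_name, best_prior = max(prior_accuracy.items(), key=lambda kv: kv[1])
margin = round(tar_mobilenetv2 - best_prior, 2)
```

The margin is measured against ResNet50+multi-layer BD-LSTM, the most accurate of the three prior methods listed, and is an absolute difference in percentage points rather than a relative gain.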
Figure 5. Confusion matrix of the proposed framework with ResNet50 as the 2D CNN baseline model.
Figure 6. Confusion matrix of the proposed framework with MobileNetV2 as the 2D CNN baseline model.
Figure 7. Accuracy of the proposed framework.
Figure 8. Loss curve of the proposed framework.
Figure 9. Performance of our desktop application based on the proposed framework (TAR).
Figure 10. Desktop application stores anomaly clips.
Figure 11. Multiclass confusion matrix of the temporal based anomaly recognizer (TAR).
Performance of proposed framework (TAR) for multiclass anomaly recognition.
| Method | Accuracy |
|---|---|
| ResNet50 | 45.20% |
| MobileNetV2 | 41.9% |
| Proposed method | 52.70% |