Marius Baba, Vasile Gui, Cosmin Cernazanu, Dan Pescaru.
Abstract
Citizen safety in modern urban environments is an important aspect of quality of life. Implementing a smart city approach to video surveillance depends heavily on the capability to gather and process huge amounts of live urban data. Analyzing data from the high-bandwidth surveillance video streams provided by large-scale distributed sensor networks is particularly challenging. We propose here an efficient method for automatic violent behavior detection designed for video sensor networks. Known solutions to real-time violence detection are not suitable for implementation in a resource-constrained environment due to their high processing power requirements. Our algorithm achieves real-time processing on a Raspberry Pi embedded architecture. To separate temporal and spatial information processing, we employ a computationally efficient cascaded approach: a deep neural network followed by a time-domain classifier. In contrast with current approaches, the deep neural network is fed exclusively with motion vector features extracted directly from the MPEG-encoded video stream. As the results show, we achieve state-of-the-art performance while running on an embedded architecture with low computational resources.
Keywords: action classification; deep learning; sensor networks; smart cities; violence detection
Year: 2019 PMID: 30965646 PMCID: PMC6479846 DOI: 10.3390/s19071676
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The main steps of the algorithm running on a sensor node.
Figure 2. The principle of block motion estimation. Left image: the block in the current frame, marked with a yellow rectangle. Right image: the best-match block in the previous frame, marked in green, with the motion vector shown as a red arrow.
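The block-matching principle illustrated in Figure 2 can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block in the current frame and candidate blocks in the previous frame. The block size, search range, and function names below are illustrative assumptions, not the paper's implementation; in the proposed system these vectors come for free from the MPEG encoder rather than being recomputed.

```python
import numpy as np

def block_match(prev, cur, top, left, bsize=16, search=8):
    """Find the motion vector for one block by exhaustive SAD search.

    prev, cur: 2-D grayscale frames (numpy arrays).
    Returns (dy, dx): displacement of the best-matching block in `prev`
    relative to the block at (top, left) in `cur`.
    """
    block = cur[top:top + bsize, left:left + bsize].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the previous frame.
            if y < 0 or x < 0 or y + bsize > prev.shape[0] or x + bsize > prev.shape[1]:
                continue
            cand = prev[y:y + bsize, x:x + bsize].astype(np.int32)
            sad = np.abs(block - cand).sum()  # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# Synthetic check: shift a random frame by (2, 3) and recover the vector.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(prev, shift=(2, 3), axis=(0, 1))
print(block_match(prev, cur, 16, 16))  # (-2, -3)
```

Real encoders replace the exhaustive search with fast strategies (e.g., diamond or three-step search), which is precisely why reusing the encoder's motion vectors is cheap on a sensor node.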
Figure 3. Optical flow on a violent sequence. The first line contains eight frames from a fight sequence; the second line contains the corresponding color-coded MPEG flow feature maps (rescaled).
Figure 4. The classifier architecture.
Figure 5. Example of augmentation ROIs.
Figure 6. MPEG flow vs. optical flow classifier performance. (a) Precision/recall for different threshold values. (b) F1 score.
Figure 7. Output collected at different stages. (a) CNN prediction. (b) Filter 1 prediction. (c) Filter 2 prediction.
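The filter stages in Figure 7 suggest temporal post-processing of the per-frame CNN scores. The paper's exact filters are not reproduced here; the sketch below uses a generic moving-average smoother with an assumed window and threshold, purely to illustrate the idea of a time-domain classifier suppressing isolated spikes in noisy per-frame predictions.

```python
import numpy as np

def smooth_scores(scores, window=5):
    """Moving average of per-frame violence scores (window size is an assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="same")

def decide(scores, threshold=0.5):
    """Binary per-frame violence decision after temporal smoothing."""
    return smooth_scores(scores) > threshold

# A lone spike (frame 1) is suppressed; the sustained burst (frames 3-5) survives.
noisy = np.array([0.1, 0.9, 0.2, 0.8, 0.9, 0.85, 0.1, 0.05])
print(decide(noisy).astype(int))  # [0 0 1 1 1 1 0 0]
```

Any causal variant (e.g., an exponential moving average) would serve the same purpose on a live stream, where future frames are not yet available.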
Confusion matrix for the best recall (100%).

| Predicted class | Labeled: Violence | Labeled: No violence |
|---|---|---|
| Violence | 15 | 19 |
| No violence | 0 | 52 |
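The metrics implied by the confusion matrix can be verified directly: with TP = 15, FN = 0, FP = 19, TN = 52, recall is indeed 100%, at the cost of precision.

```python
# Counts taken from the confusion matrix above.
tp, fn, fp, tn = 15, 0, 19, 52

recall = tp / (tp + fn)                      # 15 / 15 = 1.0
precision = tp / (tp + fp)                   # 15 / 34 ≈ 0.441
accuracy = (tp + tn) / (tp + fn + fp + tn)   # 67 / 86 ≈ 0.779
f1 = 2 * precision * recall / (precision + recall)  # 30 / 49 ≈ 0.612

print(f"recall={recall:.3f} precision={precision:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
```

Zero false negatives is the operating point a surveillance alarm typically wants: no fight goes unreported, while the false alarms are cheap to screen out downstream.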
Detection results on the BEHAVE dataset.

| | Algorithm | ACC ± SD | AUC |
|---|---|---|---|
| Existing algorithms | HOG+BoW [ ] | 58.69 ± 0.35% | 0.6322 |
| | HOF+BoW [ ] | 59.91 ± 0.28% | 0.5893 |
| | HNF+BoW [ ] | 57.97 ± 0.31% | 0.6089 |
| | ViF [ ] | 82.02 ± 0.19% | 0.8592 |
| | MoSIFT+BoW [ ] | 62.02 ± 0.23% | 0.6578 |
| | RVD [ ] | 85.29 ± 0.16% | 0.8878 |
| | AMDN [ ] | 84.22 ± 0.17% | 0.8562 |
| | MoWLD+BoW [ ] | 83.19 ± 0.18% | 0.8517 |
| | MoWLD+SparseCoding [ ] | 85.75 ± 0.15% | 0.8891 |
| | MoWLD+KDE+SparseCoding [ ] | 87.17 ± 0.13% | 0.8993 |
| Proposed method | | 86.93 ± 0.21% | 0.9543 |