Jingtao Hu, En Zhu, Siqi Wang, Xinwang Liu, Xifeng Guo, Jianping Yin.
Abstract
Video anomaly detection, which relies on sensors such as surveillance cameras, is widely applied in modern society. This paper learns to detect anomalies from videos in a fully unsupervised setting. To avoid the massive computation caused by back-propagation in existing methods, we propose a novel and efficient three-stage unsupervised anomaly detection method. In the first stage, we adopt random projection instead of the autoencoder or its variants used in previous works. We then formulate the optimization goal as a least-squares regression problem, which has a closed-form solution and therefore a lower computational cost. Because the reconstruction losses of normal and abnormal events are discriminative, we can use them to roughly estimate normality, which is further sifted in the second stage with a one-class support vector machine. In the third stage, to eliminate the instability caused by random parameter initialization, an ensemble technique combines the scores of multiple anomaly detectors. To the best of our knowledge, this is the first time that unsupervised ensemble learning has been introduced to video anomaly detection research. As demonstrated by experimental results on several video anomaly detection benchmark datasets, our algorithm robustly surpasses recent unsupervised methods and performs even better than some supervised approaches. In addition, we achieve performance comparable to the state-of-the-art unsupervised method with much less running time, indicating the effectiveness, efficiency, and robustness of our proposed approach.
Keywords: random projection; surveillance camera; unsupervised ensemble learning; video anomaly detection
Year: 2019 PMID: 31554333 PMCID: PMC6806243 DOI: 10.3390/s19194145
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
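The first stage described in the abstract (a fixed random projection followed by a least-squares fit with a closed-form solution) can be sketched as below. This is only an illustrative sketch, not the authors' implementation: the `tanh` activation, the small ridge term, the hidden size, the 80% quantile, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def reconstruction_errors(X, n_hidden=16, ridge=1e-3, seed=0):
    """Per-sample reconstruction error from a random projection plus
    closed-form least squares (all hyperparameters are assumptions)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Random projection weights: drawn once and never trained.
    W = rng.standard_normal((n_features, n_hidden)) / np.sqrt(n_features)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    # Closed-form regularized least squares: beta = (H^T H + ridge*I)^-1 H^T X
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ X)
    residual = H @ beta - X
    return np.sum(residual ** 2, axis=1)

# Synthetic stand-in for cube features: many "normal" samples on a
# low-dimensional structure, plus a minority of scattered "abnormal" ones.
rng = np.random.default_rng(1)
basis = rng.standard_normal((2, 20))
normal = rng.standard_normal((950, 2)) @ basis
abnormal = 3.0 * rng.standard_normal((50, 20))
X = np.vstack([normal, abnormal])

errors = reconstruction_errors(X)
# Rough normality estimate: keep the samples with the lowest errors.
rough_normal_mask = errors <= np.quantile(errors, 0.8)
```

Because the fit is dominated by the majority of normal samples, abnormal samples tend to receive larger reconstruction errors, which is what makes a rough error-based split possible before any refinement step.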
Figure 1. The workflow of our proposed method: (A) Preprocessing: Rescale and partition unlabeled videos into spatio-temporal cubes and extract features from them. (B) Normalcy Estimation: Roughly evaluate normality from all spatio-temporal cubes without any prior information. (C) Model Refinement: Eliminate remaining abnormality and build a model of normalcy. (D) Inference Ensemble: Infer the anomaly scores of all video events through the normalcy model under different settings and ensemble the scores.
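The partitioning in step (A) can be illustrated with a short NumPy sketch. The cube size, the non-overlapping layout, and the use of raw pixels as features are assumptions made for illustration; the feature extractor actually used is not specified in this summary.

```python
import numpy as np

def extract_cubes(video, cube_t=5, cube_h=10, cube_w=10):
    """Split a (T, H, W) grayscale video into non-overlapping
    spatio-temporal cubes and flatten each cube into one row.
    Cube dimensions are hypothetical defaults."""
    T, H, W = video.shape
    nt, nh, nw = T // cube_t, H // cube_h, W // cube_w
    # Drop any remainder that does not fill a whole cube.
    v = video[:nt * cube_t, :nh * cube_h, :nw * cube_w]
    v = v.reshape(nt, cube_t, nh, cube_h, nw, cube_w)
    # Bring the three cube-index axes to the front.
    v = v.transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(nt * nh * nw, cube_t * cube_h * cube_w)

# Tiny synthetic "video": 10 frames of 20 x 30 pixels.
video = np.arange(10 * 20 * 30, dtype=float).reshape(10, 20, 30)
cubes = extract_cubes(video)
```

Each row of `cubes` is one spatio-temporal cube, so the array can be passed directly to any per-sample scoring step.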
Figure 2. An example of the discriminative distribution of reconstruction losses on the UCSD Ped2 dataset.
Figure 3. The schematic diagram of the RR-Net structure.
Figure 4. An example of a one-class SVM building a tight boundary on synthetic data with a peanut-shaped distribution.
Figure 5. An illustration of our proposed anomaly score ensemble under different initializations and the final anomaly score computation.
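One simple way to combine scores from detectors built under different random initializations is to rescale each detector's scores to a common range and average them. The sketch below assumes min-max normalization followed by a plain mean; the combination rule actually used in the paper is not given in this summary, so treat this as a stand-in.

```python
import numpy as np

def ensemble_scores(score_matrix):
    """Average min-max-normalized scores across detectors.
    score_matrix has shape (n_detectors, n_events)."""
    scores = np.asarray(score_matrix, dtype=float)
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    # Guard against a detector whose scores are all identical.
    span = np.where(hi > lo, hi - lo, 1.0)
    normalized = (scores - lo) / span
    return normalized.mean(axis=0)

# Two hypothetical detectors scoring the same three events on different scales.
combined = ensemble_scores([[0.0, 5.0, 10.0],
                            [1.0, 2.0, 3.0]])
```

Normalizing before averaging keeps a detector with a large raw score range from dominating the combined result.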
Details of datasets used in our experiments.
| Dataset | Frames (Total) | Pixels | Rescaled Pixels |
|---|---|---|---|
| UCSD Ped1 | 14,000 | 238 × 158 | 160 × 240, 120 × 180, 100 × 150 |
| UCSD Ped2 | 4560 | 360 × 240 | 180 × 270, 120 × 180, 100 × 150 |
| Avenue | 30,652 | 640 × 360 | 120 × 160, 30 × 40 |
Figure 6. Representative abnormal events in the benchmark datasets (highlighted with red boxes).
Runtime comparison of the proposed method and the state-of-the-art unsupervised video anomaly detection (VAD) method.
| Method | UCSD Ped1 | UCSD Ped2 | Avenue |
|---|---|---|---|
| Proposed | 1 h | 0.13 h | 0.27 h |
| Wang et al. [ | 6.8 h | 1.4 h | 2.3 h |
Frame-level EER and AUC evaluations on UCSD Ped1 dataset (“-” denotes that the result is not reported).
| Type | Method | Frame-level EER | Frame-level AUC |
|---|---|---|---|
| Supervised | Adam et al. [ | | |
| | MPPCA [ | | |
| | SF [ | | |
| | SF+MPPCA [ | | |
| | MDT [ | | |
| | Bertini et al. [ | | - |
| | SRC [ | | |
| | Lu et al. [ | | |
| | HMDT-CRF [ | | - |
| | CAE [ | | |
| | STAE [ | | |
| | WTA-CAE [ | | |
| | Liu et al. [ | - | |
| Unsupervised | Unmasking [ | - | |
| | Wang et al. [ | | |
| | Proposed | | |
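For readers unfamiliar with the two metrics in these tables, the sketch below computes a frame-level AUC (area under the ROC curve) and EER (the operating point where the false positive rate equals the false negative rate) from per-frame labels and anomaly scores. The linear interpolation used to locate the EER and the toy labels and scores are assumptions for illustration; evaluation scripts used in the literature may differ in detail.

```python
import numpy as np

def roc_points(labels, scores):
    """ROC curve points; label 1 marks an abnormal frame."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    labels, scores = labels[order], scores[order]
    tp = np.cumsum(labels)
    fp = np.cumsum(~labels)
    # Keep one point per distinct score so ties are handled together.
    last_of_tie = np.append(np.diff(scores) != 0, True)
    tpr = np.concatenate(([0.0], tp[last_of_tie] / labels.sum()))
    fpr = np.concatenate(([0.0], fp[last_of_tie] / (~labels).sum()))
    return fpr, tpr

def auc_and_eer(labels, scores):
    fpr, tpr = roc_points(labels, scores)
    # Trapezoidal area under the ROC curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    # EER: where FPR crosses FNR = 1 - TPR, located by linear interpolation.
    diff = fpr - (1.0 - tpr)
    i = int(np.argmax(diff >= 0))
    t = -diff[i - 1] / (diff[i] - diff[i - 1])
    eer = float(fpr[i - 1] + t * (fpr[i] - fpr[i - 1]))
    return auc, eer

# Toy example: four frames, two of them abnormal.
auc, eer = auc_and_eer([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A higher AUC and a lower EER both indicate better separation between normal and abnormal frames.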
Frame-level EER and AUC evaluations on UCSD Ped2 dataset (“-” denotes that the result is not reported).
| Type | Method | Frame-level EER | Frame-level AUC |
|---|---|---|---|
| Supervised | Adam et al. [ | | |
| | MPPCA [ | | |
| | SF [ | | |
| | SF+MPPCA [ | | |
| | MDT [ | | |
| | Bertini et al. [ | | - |
| | HMDT-CRF [ | | - |
| | OWC-MTT [ | | |
| | CAE [ | | |
| | STAE [ | | |
| | TSC-sRNN [ | - | |
| | WTA-CAE [ | | |
| | Liu et al. [ | - | |
| Unsupervised | Unmasking [ | - | |
| | Wang et al. [ | | |
| | Proposed | | |
Frame-level EER and AUC evaluations on Avenue dataset (“-” denotes that the result is not reported).
| Type | Method | Frame-level EER | Frame-level AUC |
|---|---|---|---|
| Supervised | Lu et al. [ | - | |
| | CAE [ | | |
| | STAE [ | | |
| | TSC-sRNN [ | - | |
| | WTA-CAE [ | | |
| | Liu et al. [ | - | |
| Unsupervised | Del Giorno et al. [ | - | |
| | Unmasking [ | - | |
| | Wang et al. [ | | |
| | Proposed | | |
Figure 7. The visualization results of our method on UCSD Ped1 dataset (highlighted in red).
Figure 8. The visualization results of our method on UCSD Ped2 dataset (highlighted in red).
Figure 9. The visualization results of our method on Avenue dataset (highlighted in red).
Frame-level EER evaluations (mean ± std) with different settings on three benchmark datasets.
| Datasets \ Settings | | | | |
|---|---|---|---|---|
| UCSD Ped1 | | | | |
| UCSD Ped2 | | | | |
| Avenue | | | | |
Frame-level AUC evaluations (mean ± std) with different settings on three benchmark datasets.
| Datasets \ Settings | | | | |
|---|---|---|---|---|
| UCSD Ped1 | | | | |
| UCSD Ped2 | | | | |
| Avenue | | | | |
Figure 10. The EER and AUC comparison (mean ± std) on three benchmark datasets under different settings.