Abstract
As an essential task in computer vision, video anomaly detection is used in video surveillance, scene understanding, road traffic analysis and other fields. However, the definition of anomaly, scene changes and complex backgrounds present great challenges for video anomaly detection. The insight that motivates this study is that the reconstruction error for normal samples should be lower, since they are closer to the training data, while anomalies cannot be reconstructed well. In this paper, we propose a Convolutional Recurrent AutoEncoder (CR-AE), which combines an attention-based Convolutional Long-Short-Term Memory (ConvLSTM) network and a Convolutional AutoEncoder. The ConvLSTM network and the Convolutional AutoEncoder capture temporal irregularity and spatial irregularity, respectively. An attention mechanism is used to obtain the current output features from the hidden state of each ConvLSTM layer. A convolutional decoder then reconstructs the input video clip, and testing video clips with higher reconstruction error are judged to be anomalies. The proposed method was tested on two popular benchmarks (UCSD ped2 Dataset and Avenue Dataset), and the experimental results demonstrated that CR-AE achieved 95.6% and 73.1% frame-level AUC on the two public datasets, respectively.
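The reconstruction-error idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the squared L2 error, the min-max normalisation into a regularity score, and the names `reconstruction_errors` and `regularity_scores` are assumptions chosen for illustration, and the "reconstructions" here are synthetic arrays rather than outputs of a trained CR-AE.

```python
import numpy as np

def reconstruction_errors(clips, reconstructions):
    """Per-clip squared L2 error between inputs and their reconstructions.

    Both arrays have shape (num_clips, frames, height, width).
    """
    diff = clips.astype(np.float64) - reconstructions.astype(np.float64)
    return np.sum(diff ** 2, axis=(1, 2, 3))

def regularity_scores(errors, eps=1e-12):
    """Min-max normalise errors within one video (assumed convention).

    A score near 1 means the clip was reconstructed well (likely normal);
    a score near 0 means it was reconstructed poorly (likely anomalous).
    """
    e_min, e_max = errors.min(), errors.max()
    return 1.0 - (errors - e_min) / (e_max - e_min + eps)

# Synthetic stand-in data: 4 clips of 5 frames, 8 x 8 pixels each.
rng = np.random.default_rng(0)
clips = rng.random((4, 5, 8, 8))
recon = clips.copy()
recon[2] += 0.5  # pretend the model fails to reconstruct clip 2

errors = reconstruction_errors(clips, recon)
scores = regularity_scores(errors)
flagged = int(np.argmin(scores))  # index of the least regular clip
```

In this toy setup only clip 2 has a non-zero error, so it receives the lowest regularity score. A real pipeline would threshold the scores, or sweep the threshold to obtain an ROC curve.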
Keywords: convolutional autoencoder; convolutional long–short-term memory; deep learning; video anomaly detection
Year: 2022 PMID: 35746427 PMCID: PMC9230876 DOI: 10.3390/s22124647
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Examples of hand-crafted features. (a) Object trajectory [24]. (b) Dense trajectory [26]. (c) Histograms of gradients (HOG) [31]. (d) Spatio-temporal video volumes (STVs) [30].
Figure 2. Examples of deep learning-based methods. (a) GMFC-VAE [32]. (b) GAN [18,33].
Figure 3. Overview of the proposed method.
Figure 4. Overall architecture of the proposed CR-AE model.
Specifications of the CR-AE model.
| Layer | Input | Kernel Size | Stride/Padding | Output | Last/Next Layer |
|---|---|---|---|---|---|
| Input | 5 × 227 × 227 | | | | |
| Conv1 | 5 × 227 × 227 | 3 × 3 | 2/0 | 128 × 55 × 55 | Input/Conv2 + Lstm1 |
| Conv2 | 128 × 55 × 55 | 3 × 3 | 2/0 | 64 × 27 × 27 | Conv1/Conv3 + Lstm2 |
| Conv3 | 64 × 27 × 27 | 3 × 3 | 2/0 | 64 × 13 × 13 | Conv2/Conv4 + Lstm3 |
| Conv4 | 64 × 13 × 13 | 3 × 3 | 2/0 | 32 × 13 × 13 | Conv3/De-conv1 + Lstm4 |
| Lstm1 | 128 × 55 × 55 | N/A | N/A | 128 × 55 × 55 | Conv1/De-conv4 |
| Lstm2 | 64 × 27 × 27 | N/A | N/A | 64 × 27 × 27 | Conv2/De-conv3 |
| Lstm3 | 64 × 13 × 13 | N/A | N/A | 64 × 13 × 13 | Conv3/De-conv2 |
| Lstm4 | 32 × 13 × 13 | N/A | N/A | 32 × 13 × 13 | Conv4/De-conv1 |
| De-conv1 | 32 × 13 × 13 | 3 × 3 | 2/0 | 64 × 13 × 13 | Lstm4 + Conv4/De-conv2 |
| De-conv2 | 64 × 13 × 13 | 3 × 3 | 2/0 | 128 × 27 × 27 | Lstm3 + Conv1/De-conv3 |
| De-conv3 | 128 × 27 × 27 | 3 × 3 | 2/0 | 256 × 55 × 55 | Lstm2 + Conv2/De-conv4 |
| De-conv4 | 128 × 55 × 55 | 3 × 3 | 2/0 | 5 × 227 × 227 | Lstm1 + De-conv3/Output |
| Output | 5 × 227 × 227 | | | | |
Input, input layer; Conv, convolutional layer; Lstm, ConvLSTM layer; De-conv, deconvolutional layer; Output, output layer. The Encoder and Decoder consist of Conv1, Conv2, Conv3, Conv4 and De-conv1, De-conv2, De-conv3, De-conv4, respectively.
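As a quick check on the spatial sizes in the table, the standard output-size formulas for convolution and transposed convolution can be evaluated directly. This is a minimal sketch of generic layer arithmetic, not the authors' code; the helper names are hypothetical, and it assumes the second number in the stride column is the padding.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((W + 2P - K) / S) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a transposed convolution: (W - 1) * S - 2P + K."""
    return (size - 1) * stride - 2 * padding + kernel

# A 3 x 3 kernel with stride 2 and padding 0 takes 55 -> 27 -> 13,
# matching the sizes listed for Conv2 and Conv3.
down_1 = conv_output_size(55, kernel=3, stride=2, padding=0)
down_2 = conv_output_size(down_1, kernel=3, stride=2, padding=0)

# The transposed convolution with the same settings reverses the path: 13 -> 27 -> 55.
up_1 = deconv_output_size(down_2, kernel=3, stride=2, padding=0)
up_2 = deconv_output_size(up_1, kernel=3, stride=2, padding=0)
```

The same two functions can be applied to any row of the table to confirm that a listed output size follows from its input size, kernel and stride.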
Figure 6. ROC curves for the UCSD Ped2 dataset.
Comparison with state-of-the-art methods in terms of AUC (%) on the UCSD Ped2 dataset.
| Method | AUC |
|---|---|
| MPPCA | 69.3% |
| MDT | 82.9% |
| SSS | 94.0% |
| Online GNG | 94.0% |
| Unmasking | 82.2% |
| AMDN | 90.8% |
| MT-FRCN | 92.2% |
| Conv2D-AE | 85.0% |
| Conv3D-AE | 91.2% |
| ConvLSTM-AE | 88.1% |
| StackRNN | 92.2% |
| Baseline | 95.4% |
| The proposed CR-AE | 95.6% |
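The frame-level AUC reported in these comparisons can be computed from per-frame anomaly scores and ground-truth labels. The sketch below is an illustration only: it uses the rank-based definition of AUC (the probability that a randomly chosen anomalous frame scores higher than a randomly chosen normal frame, with ties counted as one half), the function name `frame_level_auc` is hypothetical, and the labels and scores are made-up toy values rather than results from the paper.

```python
import numpy as np

def frame_level_auc(labels, scores):
    """Area under the ROC curve from per-frame labels and anomaly scores.

    labels: 1 for anomalous frames, 0 for normal frames.
    scores: higher means more anomalous.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=np.float64)
    pos = scores[labels]    # scores of anomalous frames
    neg = scores[~labels]   # scores of normal frames
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both normal and anomalous frames")
    # Compare every anomalous frame with every normal frame.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy example: six frames, three of them anomalous.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = frame_level_auc(labels, scores)
```

The pairwise comparison is quadratic in the number of frames, which is fine for a sketch; a sort-based implementation would be preferable for full-length test sets.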
Comparison with state-of-the-art methods in terms of AUC (%) on the ShanghaiTech dataset.
| Method | AUC |
|---|---|
| Conv2D-AE | 60.9% |
| StackRNN | 68.0% |
| Baseline | 72.8% |
| Asymptotic Bound | 70.9% |
| MemAE | 71.2% |
| The proposed CR-AE | 73.1% |
Figure 7. Visualization of the testing results.
Figure 8. Examples of better and worse abnormality detection results. (a) Cars on the sidewalk. (b) Cyclists on the sidewalk. (c) Intense movements. (d) Cars on the sidewalk. (e) Occluded cyclists on the sidewalk. (f) Poorly illuminated cyclists on the sidewalk. (g) Occluded scooters on the sidewalk. (h) Lost package.
Running time comparison on the UCSD Ped2 dataset.
| Method | Computing Environment | CPU | GPU | RAM | Detection Speed (fps) |
|---|---|---|---|---|---|
| MDT | - | 3.0 GHz | - | 2.0 GB | 0.04 |
| StackRNN | Python + TensorFlow | 3.5 GHz | - | 16 GB | 120 |
| AMDN | MATLAB 2015 | 2.1 GHz | NVIDIA Quadro K4000 | 32 GB | 0.11 |
| Unmasking | Python + TensorFlow | - | GTX TITAN Xp | - | 20 |
| Proposed CR-AE | Python 3.7 + TensorFlow 2.5 | 5.1 GHz | NVIDIA GTX 3080 | 32 GB | 249 |