Yao Wang, Zujun Yu, Liqiang Zhu.
Abstract
Foreground detection, which extracts moving objects from videos, is an important and fundamental problem in video analysis. Classic methods often build background models from hand-crafted features. Recent methods based on deep neural networks (DNNs) can learn more effective image features through training, but most of them either ignore temporal features or rely on simple hand-crafted ones. In this paper, we propose a new dual multi-scale 3D fully convolutional neural network for foreground detection. It uses an encoder-decoder structure to establish a mapping from image sequences to pixel-wise classification results. We also propose a two-stage training procedure that trains the encoder and decoder separately to improve the training results. With its multi-scale architecture, the network can learn deep, hierarchical multi-scale features in both the spatial and temporal domains, which are shown to have good invariance to both spatial and temporal scale. We used the CDnet dataset, currently the largest foreground detection dataset, to evaluate our method. The experimental results show that the proposed method achieves state-of-the-art results in most test scenes compared with current DNN-based methods.
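A minimal sketch of the input/output contract described in the abstract, assuming grayscale frames stored as a NumPy array: a window of consecutive frames forms the 3D input clip, and the network's per-pixel output is thresholded into a binary foreground mask. The helper names (`make_clip`, `to_mask`), the causal window, the clip length of 5 and the 0.5 threshold are illustrative assumptions, not values from the paper; the frame mean only stands in for the network.

```python
import numpy as np

def make_clip(frames: np.ndarray, t: int, length: int) -> np.ndarray:
    """Stack the `length` frames ending at index t into a (length, H, W) clip.

    Hypothetical helper: the causal window (past frames only) and the clip
    length are assumptions for illustration.
    """
    if t + 1 < length:
        raise ValueError("not enough past frames for a full clip")
    return frames[t + 1 - length : t + 1]

def to_mask(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a per-pixel foreground probability map into a binary mask."""
    return (prob >= threshold).astype(np.uint8)

# Toy video: 10 grayscale frames of 4x6 pixels.
video = np.random.default_rng(0).random((10, 4, 6))
clip = make_clip(video, t=9, length=5)
print(clip.shape)  # (5, 4, 6)

# Stand-in for the network output: any (H, W) map with values in [0, 1].
prob = clip.mean(axis=0)
mask = to_mask(prob)
print(mask.shape, mask.dtype)  # (4, 6) uint8
```

The real network replaces the frame mean with a learned encoder-decoder; only the shapes on either side of it are illustrated here.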
Keywords: 3D convolutional networks; background modeling; deep learning; deep neural networks; foreground detection; fully convolutional networks
Year: 2018; PMID: 30518131; PMCID: PMC6308466; DOI: 10.3390/s18124269
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Figure 1. The architecture of the proposed network. (The downsampling or upsampling rate is shown in each layer. The dimensions of the tensors are shown beside the corresponding arrows.)
Figure 2. The detailed structure of a 3D convolutional layer.
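To make the idea behind a 3D convolutional layer concrete, here is a minimal single-channel 3D convolution in NumPy: each output value combines a small spatial patch across several consecutive frames, which is how such a layer captures temporal as well as spatial structure. The kernel size, the ReLU activation and the function names are assumptions for illustration; the layer in Figure 2 may contain other components, and real layers sum over many input channels and produce many output channels.

```python
import numpy as np

def conv3d_valid(x: np.ndarray, k: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Single-channel 3D convolution, stride 1, no padding.

    Implemented as cross-correlation (no kernel flip), as deep learning
    frameworks do. x has shape (T, H, W); k has shape (kt, kh, kw).
    """
    kt, kh, kw = k.shape
    T, H, W = x.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # One output value mixes kt frames of a kh x kw patch.
                patch = x[t:t + kt, i:i + kh, j:j + kw]
                out[t, i, j] = np.sum(patch * k) + bias
    return out

def relu(z: np.ndarray) -> np.ndarray:
    # Assumed activation; the layer in Figure 2 may differ.
    return np.maximum(z, 0.0)

# Usage: a 4-frame 3x3 clip and a 2x2x2 all-ones kernel.
x = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
y = relu(conv3d_valid(x, np.ones((2, 2, 2))))
print(y.shape)     # (3, 2, 2)
print(y[0, 0, 0])  # 52.0
```

The output is shorter in time as well as in space, which shows that the kernel slides along the temporal axis just as it does along height and width.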
Figure 3. The detailed structure of a 2D deconvolutional layer.
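A 2D deconvolution (transposed convolution) is the operation that lets a decoder upsample feature maps back toward the input resolution. A minimal single-channel sketch in NumPy, assuming no padding: every input value scatters a scaled copy of the kernel into the output at stride-spaced positions, and overlapping contributions are summed. The stride and kernel used below are illustrative choices, not the paper's settings.

```python
import numpy as np

def deconv2d(x: np.ndarray, k: np.ndarray, stride: int = 2) -> np.ndarray:
    """Single-channel 2D transposed convolution with no padding.

    Each input value x[i, j] adds x[i, j] * k into the output window whose
    top-left corner is (i * stride, j * stride); overlapping windows sum.
    """
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += x[i, j] * k
    return out

# Usage: stride 2 with a 2x2 all-ones kernel doubles the resolution.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
up = deconv2d(x, np.ones((2, 2)), stride=2)
print(up.shape)  # (4, 4)
```

With a 2×2 all-ones kernel and stride 2 this reduces to nearest-neighbour upsampling; a learned kernel generalizes it.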
Figure 4. Example scenes in the CDnet 2014 dataset.
Figure 5. Results on the CDnet 2014 dataset (from top to bottom: “baseline”, “badWeather”, “cameraJitter”, “dynamicBackground”, “intermittentObjectMotion”, “lowFramerate”, “nightVideo”, “PTZ”, “shadow”, “thermal”, and “turbulence”). Gray pixels in the ground truth mark regions that are not of interest.
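The gray regions follow the CDnet 2014 ground-truth convention, in which each pixel holds one of five gray levels: 0 (static), 50 (hard shadow), 85 (outside the region of interest), 170 (unknown motion) and 255 (motion). Pixels labelled 85 or 170 are excluded from scoring. A small sketch of that split, assuming a ground-truth frame loaded as a `uint8` NumPy array; the helper name `split_ground_truth` is hypothetical, and hard shadow is treated as background because shadows are not foreground in this benchmark.

```python
import numpy as np

# CDnet 2014 ground-truth gray levels.
STATIC, HARD_SHADOW, OUTSIDE_ROI, UNKNOWN, MOTION = 0, 50, 85, 170, 255

def split_ground_truth(gt: np.ndarray):
    """Return (labels, valid) for a CDnet ground-truth frame.

    labels is 1 for moving pixels and 0 otherwise (static and hard shadow).
    valid is False for the gray 'not of interest' pixels (outside ROI and
    unknown motion), which are left out when counting errors.
    """
    valid = (gt != OUTSIDE_ROI) & (gt != UNKNOWN)
    labels = (gt == MOTION).astype(np.uint8)
    return labels, valid

# Usage on a tiny 2x3 ground-truth patch.
gt = np.array([[STATIC, HARD_SHADOW, OUTSIDE_ROI],
               [UNKNOWN, MOTION, MOTION]], dtype=np.uint8)
labels, valid = split_ground_truth(gt)
print(labels.tolist())  # [[0, 0, 0], [0, 1, 1]]
print(valid.tolist())   # [[True, True, False], [False, True, True]]
```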
F-measure (FM) comparison of different foreground detection methods over all categories of CDnet2014 (higher is better): baseline (BL), cameraJitter (CJ), badWeather (BW), dynamicBackground (DB), intermittentObjectMotion (IOM), lowFramerate (LF), nightVideo (NV), PTZ, shadow (SH), thermal (TH), and turbulence (TU).
| Methods | BL | CJ | BW | DB | IOM | LF | NV | PTZ | SH | TH | TU | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DMFC3D (Ours) | 0.9950 | 0.9744 | 0.9703 | 0.9780 | 0.8835 | 0.9233 | 0.9696 | 0.9287 | 0.9893 | 0.9924 | 0.9773 | 0.9620 |
| Cascade [ | 0.9786 |  | 0.9451 | 0.9658 | 0.8505 | 0.8804 | 0.8926 |  | 0.9593 | 0.8958 | 0.9215 | 0.9273 |
| DeepBS [ | 0.9580 | 0.8990 | 0.8647 | 0.8761 | 0.6097 | 0.5900 | 0.6359 | 0.3306 | 0.9304 | 0.7583 | 0.8993 | 0.7593 |
| SuBSENSE [ | 0.9503 | 0.8152 | 0.8594 | 0.8177 | 0.6569 | 0.6594 | 0.4918 | 0.3894 | 0.8986 | 0.8171 | 0.8423 | 0.7453 |
| PAWCS [ | 0.9397 | 0.8137 | 0.8059 | 0.8938 | 0.7764 | 0.6433 | 0.4171 | 0.4450 | 0.8934 | 0.8324 | 0.7667 | 0.7479 |
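The FM values above are F-measures, the harmonic mean of precision and recall over foreground pixels. A minimal sketch of the computation, assuming binary prediction and label arrays of the same shape plus a boolean `valid` mask that excludes the gray "not of interest" pixels; the function names are illustrative, and how the benchmark aggregates counts over frames and videos is not shown.

```python
import numpy as np

def confusion(pred, labels, valid):
    """Count TP, FP, FN, TN over the pixels marked valid (same-shape arrays)."""
    p = pred.astype(bool)[valid]
    g = labels.astype(bool)[valid]
    tp = int(np.sum(p & g))
    fp = int(np.sum(p & ~g))
    fn = int(np.sum(~p & g))
    tn = int(np.sum(~p & ~g))
    return tp, fp, fn, tn

def f_measure(tp: int, fp: int, fn: int) -> float:
    """FM = 2PR / (P + R), with P = TP / (TP + FP) and R = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Usage: one hit, one false alarm, one miss, one correct rejection.
pred = np.array([[1, 1, 0, 0]])
labels = np.array([[1, 0, 1, 0]])
valid = np.ones(pred.shape, dtype=bool)
tp, fp, fn, tn = confusion(pred, labels, valid)
print((tp, fp, fn, tn))        # (1, 1, 1, 1)
print(f_measure(tp, fp, fn))   # 0.5
```

Because FM ignores true negatives, it is not inflated by the large static background that dominates most frames.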
Percentage of wrong classifications (PWC, %) comparison of different foreground detection methods over all categories of CDnet2014 (lower is better): baseline (BL), cameraJitter (CJ), badWeather (BW), dynamicBackground (DB), intermittentObjectMotion (IOM), lowFramerate (LF), nightVideo (NV), PTZ, shadow (SH), thermal (TH), and turbulence (TU).
| Methods | BL | CJ | BW | DB | IOM | LF | NV | PTZ | SH | TH | TU | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DMFC3D (Ours) |  | 0.2253 |  | 0.0613 |  |  |  | 0.2031 |  |  |  |  |
| Cascade [ | 0.1405 |  | 0.1910 |  | 1.5416 | 0.1317 | 0.6116 |  | 0.3500 | 1.0478 | 0.0584 | 0.4052 |
| DeepBS [ | 0.2424 | 0.8994 | 0.3784 | 0.2067 | 4.1292 | 1.3564 | 2.5754 | 7.7228 | 0.7403 | 3.5773 | 0.0838 | 1.9920 |
| SuBSENSE [ | 0.3574 | 1.6469 | 0.4527 | 0.4042 | 3.8349 | 0.9968 | 3.7717 | 3.8160 | 1.0120 | 2.0125 | 0.1527 | 1.6780 |
| PAWCS [ | 0.4491 | 1.4220 | 0.5319 | 0.1917 | 2.3536 | 0.7258 | 3.3386 | 1.1162 | 1.0230 | 1.4018 | 0.6378 | 1.1993 |
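PWC, the percentage of wrong classifications, counts both kinds of error (false positives and false negatives) against all scored pixels, so lower is better. A sketch assuming the TP/FP/FN/TN counts have already been computed over the valid pixels; the function name is illustrative.

```python
def pwc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Percentage of wrong classifications: 100 * (FP + FN) / (TP + FP + FN + TN)."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no pixels were scored")
    return 100.0 * (fp + fn) / total

# Usage: a 320x240 frame (76,800 scored pixels) with 150 misclassified pixels.
print(pwc(tp=1200, fp=75, fn=75, tn=75450))  # 0.1953125
```

The example shows the scale of the table: a PWC around 0.2% means roughly one pixel in 500 is misclassified.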
F-measure (FM) comparison of the multi-scale network (DMFC3D) and the single-scale network (FC3D) over CDnet2014.
| Category | DMFC3D | FC3D |
|---|---|---|
| baseline | 0.9950 | 0.9941 |
| cameraJitter | 0.9744 | 0.9651 |
| badWeather | 0.9703 | 0.9699 |
| dynamicBackground | 0.9780 | 0.9775 |
| intermittentObjectMotion | 0.8835 | 0.8779 |
| lowFramerate | 0.9233 | 0.8575 |
| nightVideo | 0.9696 | 0.9595 |
| PTZ | 0.9287 | 0.9240 |
| shadow | 0.9893 | 0.9881 |
| thermal | 0.9924 | 0.9902 |
| turbulence | 0.9773 | 0.9729 |
| Average | 0.9620 | 0.9524 |
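As an arithmetic check of this table, the Average row is the unweighted mean of the 11 per-category F-measures, and the per-category differences show where the multi-scale design helps most. The values are copied from the table; nothing else is assumed.

```python
from statistics import mean

# (DMFC3D, FC3D) F-measure per CDnet2014 category, copied from the table.
fm = {
    "baseline": (0.9950, 0.9941),
    "cameraJitter": (0.9744, 0.9651),
    "badWeather": (0.9703, 0.9699),
    "dynamicBackground": (0.9780, 0.9775),
    "intermittentObjectMotion": (0.8835, 0.8779),
    "lowFramerate": (0.9233, 0.8575),
    "nightVideo": (0.9696, 0.9595),
    "PTZ": (0.9287, 0.9240),
    "shadow": (0.9893, 0.9881),
    "thermal": (0.9924, 0.9902),
    "turbulence": (0.9773, 0.9729),
}

# The Average row is the unweighted mean over the 11 categories.
avg_multi = round(mean(m for m, _ in fm.values()), 4)
avg_single = round(mean(s for _, s in fm.values()), 4)
print(avg_multi, avg_single)  # 0.962 0.9524

# Per-category gain of the multi-scale network over the single-scale one.
gains = {name: round(m - s, 4) for name, (m, s) in fm.items()}
print(max(gains, key=gains.get), gains["lowFramerate"])  # lowFramerate 0.0658
```

The multi-scale network scores higher in every category, and the largest gain is on lowFramerate, where the gap between frames is widest.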