Peng Liu, Junying Feng, Jianli Sang, Yong Kim.
Abstract
Foreground detection is a classic video processing task, widely used in video surveillance and other fields, and a basic step in many computer vision pipelines. Real-world scenes are complex and changeable, which makes it difficult for traditional unsupervised methods to extract foreground targets accurately. This paper proposes a deep learning method for foreground detection based on a multiscale U-Net architecture with a fused attention mechanism. The attention mechanism is introduced into the multiscale U-Net through the skip connections, so that the network pays more attention to foreground objects, suppresses irrelevant background regions, and learns more effectively. We evaluated the method on the CDnet-2014 dataset. With a single RGB image as input, using only spatial information, the proposed model reaches an overall F-measure of 0.9785. When multiple input images are fused so that spatiotemporal information is used, the overall F-measure reaches 0.9830. In the Low Framerate category in particular, the F-measure exceeds that of current state-of-the-art methods. The experimental results demonstrate the effectiveness and superiority of the proposed method.
Year: 2022 PMID: 36172321 PMCID: PMC9512601 DOI: 10.1155/2022/7432615
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: Steps of the background modeling method.
Figure 2: The architecture of the proposed AMU-Net for foreground detection.
Figure 3: The internal structure of the attention mechanism module.
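The internal structure of the attention module is given only in Figure 3, which is not reproduced here. As a rough illustration of how an attention mechanism on a skip connection can suppress background regions, the sketch below implements a generic additive attention gate in plain Python. It is an assumption about the general technique, not the authors' module; the function names, weight shapes, and the ReLU/sigmoid choices are all hypothetical.

```python
import math

def conv1x1(pixel, weights):
    """Apply a 1x1 convolution at one pixel: a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, pixel)) for row in weights]

def attention_gate(skip, gate, w_x, w_g, psi):
    """Scale each skip-connection pixel by a coefficient in (0, 1).

    skip, gate: H x W grids of channel vectors at the same resolution
                (assumed: the gating signal is already upsampled).
    w_x, w_g:   1x1 conv weights projecting each input to F_int channels.
    psi:        1x1 conv weights (a single row) mapping F_int channels to 1.
    """
    out = []
    for skip_row, gate_row in zip(skip, gate):
        out_row = []
        for x, g in zip(skip_row, gate_row):
            # Additive attention: project both inputs, sum them, apply ReLU.
            q = [max(0.0, a + b)
                 for a, b in zip(conv1x1(x, w_x), conv1x1(g, w_g))]
            # Collapse to one scalar and squash it with a sigmoid.
            alpha = 1.0 / (1.0 + math.exp(-conv1x1(q, psi)[0]))
            # Pixels with a small coefficient are attenuated.
            out_row.append([alpha * v for v in x])
        out.append(out_row)
    return out

# Toy 1x2 feature map with 2 channels; identity projections (illustrative).
identity = [[1.0, 0.0], [0.0, 1.0]]
skip = [[[1.0, 2.0], [1.0, 2.0]]]
gate = [[[5.0, 5.0], [-5.0, -5.0]]]
gated = attention_gate(skip, gate, identity, identity, [[1.0, 1.0]])
```

In this toy input the first pixel receives a strong gating signal and passes through almost unchanged, while the second is attenuated by half.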
The proposed AMU-Net model configurations.
| Layer | Kernel | Stride | Channel | Output size |
|---|---|---|---|---|
| Input | — | — | 3 | 640 |
| conv1_1 | 3 | 1 | 64 | 640 |
| conv1_2 | 3 | 1 | 64 | 640 |
| maxpool_1 | 2 | 2 | 64 | 320 |
| conv2_1 | 3 | 1 | 128 | 320 |
| conv2_2 | 3 | 1 | 128 | 320 |
| maxpool_2 | 2 | 2 | 128 | 160 |
| conv3_1 | 3 | 1 | 256 | 160 |
| conv3_2 | 3 | 1 | 256 | 160 |
| conv3_3 | 3 | 1 | 256 | 160 |
| maxpool_3 | 2 | 2 | 256 | 80 |
| conv4_1 | 3 | 1 | 512 | 80 |
| conv4_2 | 3 | 1 | 512 | 80 |
| conv4_3 | 3 | 1 | 512 | 80 |
| maxpool_4 | 2 | 2 | 512 | 40 |
| conv5_1 | 3 | 1 | 512 | 40 |
| conv5_2 | 3 | 1 | 512 | 40 |
| conv5_3 | 3 | 1 | 512 | 40 |
| conv6 | 1 | 1 | 512 | 40 |
| attention4 | 1 | 1 | 512 | 80 |
| attention3 | 1 | 1 | 256 | 160 |
| attention2 | 1 | 1 | 128 | 320 |
| attention1 | 1 | 1 | 64 | 640 |
| tranconv4 | 4 | 2 | 256 | 80 |
| conv4d | 3 | 1 | 512 | 80 |
| tranconv3 | 4 | 2 | 256 | 160 |
| conv3d | 3 | 1 | 256 | 160 |
| tranconv2 | 4 | 2 | 128 | 320 |
| conv2d | 3 | 1 | 128 | 320 |
| tranconv1 | 4 | 2 | 64 | 640 |
| conv1d | 3 | 1 | 64 | 640 |
| conv_out | 1 | 1 | 1 | 640 |
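The output sizes in the configuration table follow from standard convolution arithmetic. The sketch below reproduces the 640 to 40 to 640 progression. The table does not list padding, so the padding of 1 used for the 3×3 convolutions and for the 4×4 transposed convolutions is an assumption, and the helper names are illustrative.

```python
def conv_out(size, kernel, stride, pad):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

def tranconv_out(size, kernel, stride, pad):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * pad + kernel

size = 640
encoder_sizes = [size]
for _ in range(4):
    # 3x3 conv, stride 1, assumed padding 1: size is preserved, so one
    # call stands in for the two or three convs of each encoder stage.
    size = conv_out(size, kernel=3, stride=1, pad=1)
    # 2x2 max pooling, stride 2: size is halved.
    size = conv_out(size, kernel=2, stride=2, pad=0)
    encoder_sizes.append(size)

decoder_sizes = []
for _ in range(4):
    # 4x4 transposed conv, stride 2, assumed padding 1: size is doubled.
    size = tranconv_out(size, kernel=4, stride=2, pad=1)
    decoder_sizes.append(size)

print(encoder_sizes)
print(decoder_sizes)
```

Under these assumptions each decoder stage returns to the resolution of the matching encoder stage, which is what allows the skip connections to be combined without cropping.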
Results of ablation analysis.
| Model | Precision | Recall | F-measure |
|---|---|---|---|
| MU-Net | 0.9807 | 0.9684 | 0.9742 |
| AMU-Net | **0.9850** | **0.9724** | **0.9785** |
The bold values indicate the better result in a given column.
Complete results of AMU-Net on the CDnet-2014 dataset.
| Category | Precision | Recall | Specificity | FNR | FPR | PWC | F-measure |
|---|---|---|---|---|---|---|---|
| PTZ | 0.9907 | 0.9628 | 0.9999 | 0.0372 | 0.0001 | 0.0285 | 0.9759 |
| Bad Weather | 0.9897 | 0.9853 | 0.9998 | 0.0147 | 0.0002 | 0.0425 | 0.9875 |
| Baseline | 0.9970 | 0.9903 | 0.9999 | 0.0097 | 0.0001 | 0.0328 | 0.9936 |
| Camera Jitter | 0.9937 | 0.9889 | 0.9997 | 0.0111 | 0.0003 | 0.0679 | 0.9913 |
| Dynamic Bg | 0.9965 | 0.9864 | 0.9999 | 0.0136 | 0.0001 | 0.0145 | 0.9914 |
| Intermitt | 0.9957 | 0.9805 | 0.9997 | 0.0195 | 0.0003 | 0.1613 | 0.9879 |
| Low Framerate | 0.9150 | 0.8921 | 0.9998 | 0.1079 | 0.0002 | 0.0475 | 0.9030 |
| Night Videos | 0.9860 | 0.9701 | 0.9997 | 0.0299 | 0.0003 | 0.0934 | 0.9779 |
| Shadow | 0.9922 | 0.9921 | 0.9996 | 0.0079 | 0.0004 | 0.0657 | 0.9921 |
| Thermal | 0.9907 | 0.9854 | 0.9995 | 0.0146 | 0.0005 | 0.0842 | 0.9880 |
| Turbulence | 0.9876 | 0.9630 | 0.9999 | 0.0370 | 0.0001 | 0.0256 | 0.9751 |
| Overall | 0.9850 | 0.9724 | 0.9998 | 0.0276 | 0.0002 | 0.0603 | 0.9785 |
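The columns of this table are related by the usual confusion-matrix definitions. The sketch below states them for hypothetical pixel counts (the counts are illustrative and do not come from the paper) and checks that the Overall F-measure is consistent with the unweighted mean of the eleven category F-measures listed above.

```python
def metrics(tp, fp, tn, fn):
    """Per-pixel evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "fnr": fn / (tp + fn),           # equals 1 - recall
        "fpr": fp / (fp + tn),           # equals 1 - specificity
        "pwc": 100.0 * (fp + fn) / (tp + fp + tn + fn),  # percent wrong
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = metrics(tp=90, fp=10, tn=890, fn=10)

# Category F-measures copied from the table, PTZ through Turbulence.
category_f = [0.9759, 0.9875, 0.9936, 0.9913, 0.9914, 0.9879,
              0.9030, 0.9779, 0.9921, 0.9880, 0.9751]
overall_f = sum(category_f) / len(category_f)
print(round(overall_f, 4))
```

Because the Overall row averages the category values, it need not equal the F-measure computed directly from the Overall precision and recall.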
F-measure comparison of different methods on the CDnet-2014 dataset.
The differences between AMU-Net and FgSegNet_v2.
| Methods | Number of models | Network parameters | Training time | GPU |
|---|---|---|---|---|
| FgSegNet_v2 | 53 | 489 M (53 | 29 days | 1080 Ti |
| AMU-Net | 1 | 24.9 M (24,915,969) | 12 hours | 2080 Ti |
Figure 4: Qualitative comparison of the proposed AMU-Net and other models.
Figure 5: The architecture of the proposed AMU-Net_M1 for foreground detection.
Figure 6: The architecture of the proposed AMU-Net_M2 for foreground detection.
F-measure comparison of different methods on the CDnet-2014 dataset.
| Method | PTZ | Bad Weather | Baseline | Camera Jitter | Dynamic Bg | Intermitt | Low Framerate | Night Videos | Shadow | Thermal | Turbulence | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMU-Net_M1 | 0.9684 | 0.9865 | 0.9936 | 0.9884 | 0.9909 |  |  | 0.9733 | 0.9928 | 0.9898 |  | 0.9827 |
| AMU-Net_M2 |  | 0.9829 |  | 0.9873 | 0.9913 | 0.9888 | 0.9596 | 0.9741 |  |  | 0.9754 |  |
| AMU-Net | 0.9759 | 0.9875 | 0.9936 | 0.9913 | 0.9914 | 0.9879 | 0.9030 | 0.9779 | 0.9921 | 0.9880 | 0.9751 | 0.9785 |
| CascadeCNN | 0.9344 | 0.9451 | 0.9786 | 0.9758 | 0.9658 | 0.8505 | 0.8804 | 0.8926 | 0.9593 | 0.8958 | 0.9215 | 0.9272 |
| SuBSENSE | 0.3476 | 0.8619 | 0.9503 | 0.8152 | 0.8177 | 0.6569 | 0.6445 | 0.5599 | 0.8986 | 0.8171 | 0.7792 | 0.7408 |
| FTSG | 0.3241 | 0.8228 | 0.9330 | 0.7513 | 0.8792 | 0.7891 | 0.6259 | 0.5130 | 0.8535 | 0.7768 | 0.7127 | 0.7283 |
| GMM | 0.1046 | 0.7406 | 0.8382 | 0.5670 | 0.6328 | 0.5325 | 0.5065 | 0.3960 | 0.7322 | 0.6548 | 0.4169 | 0.5566 |
Figure 7: Qualitative comparison of different methods.