Gemma Canet Tarrés, Montse Pardàs.
Abstract
Foreground object segmentation is a crucial first step for surveillance systems based on networks of video sensors. This problem in the context of dynamic scenes has been widely explored in the last two decades, but it still has open research questions due to challenges such as strong shadows, background clutter and illumination changes. After years of solid work based on statistical background pixel modeling, most current proposals use convolutional neural networks (CNNs) either to model the background or to make the foreground/background decision. Although these new techniques achieve outstanding results, they usually require specific training for each scene, which is unfeasible if we aim at designing software for embedded video systems and smart cameras. Our approach to the problem does not require specific context or scene training, and thus no manual labeling. We propose a network for a refinement step on top of conventional state-of-the-art background subtraction systems. By using a statistical technique to produce a rough mask, we do not need to train the network for each scene. The proposed method can take advantage of the specificity of the classic techniques, while obtaining the highly accurate segmentation that a deep learning system provides. We also show the advantage of using an adversarial network to improve the generalization ability of the network and produce more consistent results than an equivalent non-adversarial network. The results provided were obtained by training the network on a common database, without fine-tuning for specific scenes. Experiments on the unseen part of the CDNet database yielded an F-score of 0.82, and 0.87 was achieved on the LASIESTA database, which is unrelated to the training one. On this last database, the results outperformed those available in the official table by 8.75%. The results achieved for CDNet are well above those of the methods not based on CNNs and, according to the literature, among the best for context-unsupervised CNN systems.
Keywords: adversarial networks; background subtraction; computer vision; deep learning; video sensors
Year: 2022 PMID: 35590863 PMCID: PMC9102692 DOI: 10.3390/s22093171
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
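As a rough illustration of the test-time pipeline the abstract describes (Figure 1 below), the following sketch feeds a frame, together with a rough mask produced by a statistical background subtractor, to a pretrained refinement network. OpenCV's MOG2 subtractor and the checkpoint name `refinement_generator.pt` are stand-in assumptions, not the authors' actual front end or code.

```python
# Minimal sketch (not the authors' code) of the test-time pipeline: a classic
# statistical background subtractor produces a rough mask, and a pretrained
# refinement network turns it into the final segmentation.
import cv2
import numpy as np
import torch

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
generator = torch.jit.load("refinement_generator.pt").eval()  # hypothetical checkpoint

def refine_frame(frame_bgr: np.ndarray) -> np.ndarray:
    rough = subtractor.apply(frame_bgr)                    # 0 = bg, 127 = shadow, 255 = fg
    rough = (rough == 255).astype(np.float32)              # drop shadows, binarize
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    x = np.concatenate([rgb, rough[..., None]], axis=-1)   # 4-channel input: RGB + rough mask
    x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)  # to (1, 4, H, W)
    with torch.no_grad():
        refined = torch.sigmoid(generator(x))[0, 0].numpy()  # assumes (1, 1, H, W) output
    return (refined > 0.5).astype(np.uint8) * 255          # final binary foreground mask
```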
Analysis of foreground segmentation methods.
| References | Background Modeling | Background Subtraction | Statistical | CNN | GAN | Context Dependent |
|---|---|---|---|---|---|---|
| 2,3,4,7,8 | X | X | | | | |
| 5,6,9,10,11,12,13,14 | X | X | X | | | |
| 16 | X | X | X | | | |
| 17 | X | X | X | X | | |
| 18,19,26,27 | X | X | X | X | | |
| 20,28,29,30 | X | X | X | X | | |
| 31 | X | X | X | | | |
| 32,33 | X | X | | | | |
| Ours | X | X | X | X | X | |
Figure 1. Pipeline of the mask refinement network at test time.
Figure 2. Pipeline of the complete scheme for training the refinement network.
Figure 3. Scheme of the generator.
Figure 4. Scheme of the discriminator.
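The training scheme in Figures 2–4 pairs the generator with a discriminator that distinguishes ground-truth masks from generated ones. Below is a minimal sketch of one such adversarial training step, assuming a pix2pix-style conditional discriminator that sees the frame concatenated with a mask; the architectures, conditioning, and loss weight `adv_weight` are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of one adversarial training step for a mask refinement network:
# the generator maps (frame, rough mask) toward the ground-truth mask, while a
# discriminator judges whether a mask is ground truth or generated.
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, frames, rough, gt, adv_weight=0.01):
    x = torch.cat([frames, rough], dim=1)          # 4-channel generator input
    fake = torch.sigmoid(gen(x))                   # generated (refined) mask

    # Discriminator step: real = ground-truth masks, fake = generated masks.
    opt_d.zero_grad()
    d_real = disc(torch.cat([frames, gt], dim=1))
    d_fake = disc(torch.cat([frames, fake.detach()], dim=1))
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()

    # Generator step: segmentation loss plus an adversarial term that rewards
    # masks the discriminator accepts as real.
    opt_g.zero_grad()
    d_fake = disc(torch.cat([frames, fake], dim=1))
    loss_g = (F.binary_cross_entropy(fake, gt)
              + adv_weight * F.binary_cross_entropy_with_logits(
                    d_fake, torch.ones_like(d_fake)))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```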
Main features of CDNet and LASIESTA datasets.
| | CDNet | LASIESTA |
|---|---|---|
| Sequences | 53 | 20 |
| Categories | 11 | 10 |
| Indoor sequences | 8 | 14 |
| Outdoor sequences | 45 | 10 |
| Frames/sequence | 600–7999 | 225–1400 |
| Resolution | 320 × 240–720 × 576 | 352 × 288 |
| Labeled images | 1 out of 10 | All |
Figure 5. Examples of the masks generated by the network with and without the use of a discriminator. First row: frame from the sequence “Park” (“thermal” category). Second row: frame from “wetSnow” (“badWeather” category). Third row: frame from “turbulence3” (“turbulence” category). First column: input frame. Second column: corresponding ground truth. Third column: results obtained without a discriminator. Last column: results obtained with a discriminator.
Figure 6. Precision vs. recall for the LASIESTA database.
Performance in precision, recall, and F-measure for the LASIESTA dataset. Categories are: simple sequences (SI), camouflage (CA), occlusions (OC), illumination changes (IL), modified background (MB), bootstrap (BS), cloudy conditions (CL), rainy conditions (RA), snowy conditions (SN), and sunny conditions (SU). The last column (AV) shows the average values.
| | SI | CA | OC | IL | MB | BS | CL | RA | SN | SU | AV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Prec | 0.77 | 0.86 | 0.74 | 0.76 | 0.94 | 0.85 | 0.84 | 0.85 | 0.89 | 0.93 | 0.84 |
| Rec | 0.94 | 0.93 | 0.94 | 0.90 | 0.92 | 0.87 | 0.85 | 0.95 | 0.89 | 0.82 | 0.90 |
| F-meas | 0.84 | 0.89 | 0.83 | 0.82 | 0.93 | 0.86 | 0.84 | 0.90 | 0.89 | 0.87 | 0.87 |
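For reference, the metrics in the table above follow the standard per-pixel definitions; a minimal sketch of computing them from binary masks is given below (the official LASIESTA protocol may differ in details such as how unknown-motion pixels are handled).

```python
# Standard per-pixel precision, recall, and F-measure for binary masks.
import numpy as np

def prf(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # foreground pixels correctly detected
    fp = np.logical_and(pred, ~gt).sum()   # background pixels marked foreground
    fn = np.logical_and(~pred, gt).sum()   # foreground pixels missed
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f_measure = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f_measure
```

As a sanity check, an average precision of 0.84 and recall of 0.90 give F ≈ 2 · 0.84 · 0.90 / 1.74 ≈ 0.87, consistent with the 0.87 average F-measure reported in the abstract.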
Reported F-measures for the best methods published on LASIESTA’s official website. The last two rows correspond to the performance of the input to the mask refinement network (input mask in Figure 2) and of the masks generated with the proposed network (output mask in the same figure). The results are given as final averages over all the sequences in the database (last column), and independently for each of the categories in the dataset.
| | SI | CA | OC | IL | MB | BS | CL | RA | SN | SU | AV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Wren [ | 0.82 | 0.76 | 0.89 | 0.49 | 0.74 | 0.47 | 0.86 | 0.85 | 0.60 | 0.75 | |
| Stauffer [ | 0.83 | 0.83 | 0.89 | 0.29 | 0.76 | 0.36 | 0.87 | 0.78 | 0.60 | 0.72 | |
| Zivkovic [ | 0.90 | 0.83 | 0.95 | 0.24 | 0.87 | 0.53 | 0.88 | 0.88 | 0.38 | 0.71 | |
| Madd. [ | | 0.86 | | 0.21 | 0.91 | 0.40 | | 0.90 | 0.81 | | |
| Haines [ | 0.89 | | 0.92 | 0.85 | 0.84 | 0.68 | 0.83 | 0.89 | 0.17 | 0.85 | |
| Cuevas [ | 0.88 | 0.84 | 0.79 | 0.65 | | 0.66 | 0.93 | 0.87 | 0.78 | 0.72 | |
| Mandal [ | 0.86 | 0.49 | 0.93 | 0.85 | 0.79 | 0.87 | 0.87 | 0.87 | 0.49 | 0.83 | |
| Tezcan [ | 0.92 | 0.68 | 0.96 | | 0.81 | 0.77 | 0.93 | | 0.84 | 0.79 | |
| MOG [ | 0.70 | 0.68 | 0.72 | 0.41 | 0.65 | 0.55 | 0.57 | 0.67 | 0.38 | 0.71 | |
| Ours | 0.84 | 0.89 | 0.83 | 0.82 | 0.93 | 0.86 | 0.84 | 0.90 | 0.89 | 0.87 | 0.87 |
Figure 7. Examples for the evaluation of the masks through the mask refinement method on the LASIESTA database. First column: original images. Second column: input rough mask provided by [11]. Third column: results of the semantic segmentation network without adversarial loss, as in [21]. Fourth column: results of this paper. Last column: ground-truth masks.