Tong Liu, Jianghua Cheng, Xiangyu Du, Xiaobing Luo, Liang Zhang, Bang Cheng, Yang Wang.
Abstract
Computer-vision-based smoke detection is a popular research direction in fire detection and is widely used in outdoor settings such as forest fire detection. Smoke detection typically relies on features such as color, shape, texture, and motion to distinguish smoke from non-smoke objects. However, these features are not sufficiently salient or robust, which leads to low detection performance in complex environments. Deep learning has improved smoke detection performance to some degree, but extracting fine smoke details is difficult when the network has few layers. In addition, when smoke motion characteristics are not used effectively, indicators such as the false alarm rate remain high in video smoke detection. To improve the detection of smoke in videos, this paper proposes the change-cumulative image, which is obtained by converting multi-frame video images in the YUV color space into a single image that represents the motion and color-change characteristics of smoke. A fusion deep network is then designed. It increases the depth of the VGG16 network by arranging two convolutional layers after each of its convolutional layers, and it combines the VGG16 and ResNet50 (deep residual network) models to improve feature expression ability while increasing the depth of the whole network. This helps extract additional discriminative characteristics of smoke.
Experimental results show that (1) using the change-cumulative image as the input to the deep network gives better smoke detection performance than the classic RGB input image; (2) the fusion deep network outperforms the single VGG16 and ResNet50 models; and (3) the smoke detection accuracy, false positive rate, and false alarm rate of the proposed method are better than those of currently popular video smoke detection methods.
Keywords: convolutional neural networks; deep learning; object detection; video smoke detection
Year: 2019 PMID: 31756958 PMCID: PMC6928619 DOI: 10.3390/s19235060
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Comparison of the YUV image and the change-cumulative image for the 108th frame of the video “wildfire_smoke_4.avi”.
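
The abstract describes the change-cumulative image only in words: several consecutive frames are converted to the YUV color space and condensed into one image that carries motion and color-change information. The exact formula is not given in this text, so the following is only a minimal sketch under an assumed definition: per-channel absolute differences between consecutive YUV frames are summed and clipped to the 8-bit range. The function names `rgb_to_yuv` and `change_cumulative_image`, the BT.601 conversion matrix, and the accumulation rule are all illustrative assumptions, not the authors' stated method.

```python
import numpy as np

# BT.601 RGB -> YUV coefficients (an assumed choice; the source only says "YUV").
_RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])


def rgb_to_yuv(frame_rgb):
    """Convert an (H, W, 3) RGB frame to floating-point YUV."""
    return frame_rgb.astype(np.float64) @ _RGB_TO_YUV.T


def change_cumulative_image(frames_rgb):
    """ASSUMED definition: accumulate absolute YUV changes between
    consecutive frames, then round and clip to the uint8 range."""
    yuv_frames = [rgb_to_yuv(frame) for frame in frames_rgb]
    accumulated = np.zeros_like(yuv_frames[0])
    for previous, current in zip(yuv_frames, yuv_frames[1:]):
        accumulated += np.abs(current - previous)
    return np.clip(np.rint(accumulated), 0, 255).astype(np.uint8)
```

Under this assumption, static background pixels stay at zero while pixels that keep changing across the frame window (as drifting smoke would) accumulate larger values, which is the property the abstract attributes to the change-cumulative image.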
Figure 2. Comparison of a traditional convolutional layer and a cascading convolutional layer.
Figure 3. VGG16 network structure.
Figure 4. ResNet50 network structure.
Figure 5. Fusion deep network structure.
Details of the fusion deep network structure.
| Input Layer (None, 224, 224, 3) | |||||
|---|---|---|---|---|---|
| VGG16 Feature Extractor | | | ResNet50 Feature Extractor | | |
| Block | Layer (type) | Output Shape | Stage | Layer (type) | Output Shape |
| Block 1 | Conv2D * 2 | (None, 224, 224, 64) | Stage 1 | ZeroPadding | (None, 230, 230, 3) |
| | | | | Conv2D | (None, 112, 112, 64) |
| | MaxPooling | (None, 112, 112, 64) | | BatchNormalization | (None, 112, 112, 64) |
| | | | | MaxPooling | (None, 56, 56, 64) |
| Block 2 | Conv2D * 2 | (None, 112, 112, 128) | Stage 2 | | (None, 56, 56, 256) |
| | MaxPooling | (None, 56, 56, 128) | | | |
| Block 3 | Conv2D * 3 | (None, 56, 56, 256) | Stage 3 | | (None, 28, 28, 512) |
| | MaxPooling | (None, 28, 28, 256) | | | |
| Block 4 | Conv2D * 3 | (None, 28, 28, 512) | Stage 4 | | (None, 14, 14, 1024) |
| | MaxPooling | (None, 14, 14, 512) | | | |
| Block 5 | Conv2D * 3 | (None, 14, 14, 512) | Stage 5 | | (None, 7, 7, 2048) |
| | MaxPooling | (None, 7, 7, 512) | | | |
| Concatenate (None, 7, 7, 2560) | |||||
| Flatten (None, 125440) | |||||
| Fc & dropout 0.3 (None, 1024) | |||||
| Fc & dropout (None, 128) | |||||
| Output Fc & sigmoid (None, 1) | |||||
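
The fusion step in the table can be checked with simple shape arithmetic: the two branches end with feature maps of the same 7 × 7 spatial size, so they can be concatenated along the channel axis and then flattened. The sketch below does only this bookkeeping on shape tuples (batch dimension omitted); the helper name `concat_channels` is illustrative, and no deep learning framework is involved.

```python
from math import prod


def concat_channels(shape_a, shape_b):
    """Shape of two (H, W, C) feature maps concatenated along the channel axis."""
    if shape_a[:2] != shape_b[:2]:
        raise ValueError("spatial sizes must match before concatenation")
    return (shape_a[0], shape_a[1], shape_a[2] + shape_b[2])


vgg_out = (7, 7, 512)      # output of VGG16 Block 5 MaxPooling
resnet_out = (7, 7, 2048)  # output of ResNet50 Stage 5

fused = concat_channels(vgg_out, resnet_out)
flat = prod(fused)  # length of the flattened feature vector

print(fused, flat)  # (7, 7, 2560) 125440
```

Because the first fully connected layer maps this flattened vector to 1024 units, most of the classifier head's weights sit in that single layer.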
Description of the dataset used in this paper.
| Dataset | Description | Names |
|---|---|---|
| training dataset | smoke videos | Dry_leaf_smoke_02.avi [ |
| | non-smoke videos | Waving_leaves_895.avi [ |
| testing dataset | smoke videos | Cotton_rope_smoke_04.avi, Black_smoke_517.avi [ |
| | non-smoke videos | Traffic_1000.avi, Basketball_yard.avi [ |
Figure 6. Training curves of different network models.
Training-relevant hyper-parameters.
| Hyper-Parameters | α | β1 | β2 | ε |
|---|---|---|---|---|
| Value | 0.001 | 0.9 | 0.999 | 10e-8 |
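
The symbols α, β1, β2, and ε follow the usual notation of the Adam optimizer, although the table itself does not name the optimizer. Assuming Adam is intended, a single update step with these values could be sketched as follows. The function name `adam_step` and the scalar parameter are illustrative, and ε is taken literally as written in the table (10e-8, which Python evaluates to 1e-7).

```python
import math

ALPHA = 0.001   # learning rate (α)
BETA1 = 0.9     # decay rate of the first-moment estimate (β1)
BETA2 = 0.999   # decay rate of the second-moment estimate (β2)
EPSILON = 10e-8  # numerical-stability term (ε), literal value from the table


def adam_step(param, grad, m, v, t):
    """One Adam update for a scalar parameter at step t (t starts at 1).

    Assumption: the table's hyper-parameters belong to Adam.
    """
    m = BETA1 * m + (1.0 - BETA1) * grad
    v = BETA2 * v + (1.0 - BETA2) * grad * grad
    m_hat = m / (1.0 - BETA1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - BETA2 ** t)  # bias-corrected second moment
    param = param - ALPHA * m_hat / (math.sqrt(v_hat) + EPSILON)
    return param, m, v
```

On the first step the bias correction makes the update size close to α regardless of the gradient magnitude, so a parameter at 1.0 with a positive gradient moves to roughly 0.999.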
Figure 7. ROC curves with different input images and network models.
Comparisons with different input images and network models.
| Input Image | Network | AR/% | FPR/% | FAR/% |
|---|---|---|---|---|
| RGB image | VGG16 | 88.06 | 14.82 | 6.97 |
| RGB image | ResNet50 | 89.98 | 13.54 | 3.94 |
| RGB image | fusion deep network | 92.86 | 9.64 | 2.82 |
| change-cumulative image | VGG16 | 90.48 | 13.24 | 3.09 |
| change-cumulative image | ResNet50 | 91.33 | 12.74 | 1.64 |
| change-cumulative image | fusion deep network (without pre-trained ImageNet weights) | 91.96 | 12.05 | 1.12 |
| change-cumulative image | fusion deep network (with pre-trained ImageNet weights) | 94.67 | 7.99 | 0.73 |
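
To make the effect of the input image easier to read off, the sketch below computes the change in each metric when the RGB input is replaced by the change-cumulative image for the same network. The values are copied from the table; the names `RESULTS` and `input_gain` are illustrative. The fusion deep network is left out because the table does not say whether its RGB row used pre-trained ImageNet weights, so the matching change-cumulative row is ambiguous.

```python
# (input image, network) -> (AR %, FPR %, FAR %), copied from the table above
RESULTS = {
    ("RGB", "VGG16"): (88.06, 14.82, 6.97),
    ("RGB", "ResNet50"): (89.98, 13.54, 3.94),
    ("change-cumulative", "VGG16"): (90.48, 13.24, 3.09),
    ("change-cumulative", "ResNet50"): (91.33, 12.74, 1.64),
}


def input_gain(network):
    """Metric change (percentage points) from RGB to change-cumulative input."""
    rgb = RESULTS[("RGB", network)]
    cc = RESULTS[("change-cumulative", network)]
    return tuple(round(new - old, 2) for new, old in zip(cc, rgb))


for net in ("VGG16", "ResNet50"):
    ar, fpr, far = input_gain(net)
    print(f"{net}: AR {ar:+.2f}, FPR {fpr:+.2f}, FAR {far:+.2f}")
```

For both single networks the accuracy rate rises and both error rates fall, which is consistent with the claim that the change-cumulative input outperforms the RGB input.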
Figure 8. Some detection results.
Comparisons with different methods.
| Methods | AR/% | FPR/% | FAR/% | dfps |
|---|---|---|---|---|
| HS’I model [ | 62.68 | 48.58 | 17.88 | 137 |
| LBP + SVM [ | 81.17 | 25.60 | 7.14 | 57 |
| CNN [ | 88.36 | 16.58 | 3.05 | 4 |
| DNCNN [ | 89.49 | 14.37 | 3.84 | 2 |
| dynamic characteristics [ | 81.18 | 28.14 | 2.72 | 27 |
| our method | 94.67 | 7.99 | 0.73 | 1 |
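
The table also shows a speed cost for the accuracy gain. The `dfps` column is not defined in this text; assuming it denotes detected frames per second, the per-frame processing time can be derived as in the sketch below. The names `METHODS` and `ms_per_frame` are illustrative, and the values are copied from the table.

```python
# (method, AR %, dfps) copied from the table above
METHODS = [
    ("HS'I model", 62.68, 137),
    ("LBP + SVM", 81.17, 57),
    ("CNN", 88.36, 4),
    ("DNCNN", 89.49, 2),
    ("dynamic characteristics", 81.18, 27),
    ("our method", 94.67, 1),
]


def ms_per_frame(dfps):
    """Per-frame time in milliseconds, ASSUMING dfps means frames per second."""
    return 1000.0 / dfps


for name, accuracy, dfps in METHODS:
    print(f"{name}: AR {accuracy:.2f}%, {ms_per_frame(dfps):.1f} ms per frame")
```

Under that reading, the proposed method needs about one second per frame, compared with 250 ms for the CNN baseline, so the higher accuracy comes with the lowest throughput in the comparison.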