Akmaljon Palvanov, Young Im Cho.
Abstract
Visibility is a complex phenomenon influenced by emissions and air pollutants, as well as by factors such as sunlight, humidity, temperature, and time, all of which decrease the clarity of what is visible through the atmosphere. This paper provides a detailed overview of state-of-the-art contributions to visibility estimation under various foggy weather conditions. We propose VisNet, a new approach based on deep integrated convolutional neural networks for estimating visibility distances from camera imagery. The implemented network uses three streams of deep integrated convolutional neural networks connected in parallel. In addition, we collected the largest dataset for this study, with three million outdoor images and exact visibility values. To evaluate the model's performance fairly and objectively, the model is trained on three image datasets with different visibility ranges, each with a different number of classes. Moreover, VisNet is evaluated under dissimilar fog density scenarios using a diverse set of images. Before being fed to the network, each input image is filtered in the frequency domain to remove low-level features, and a spectral filter is applied to each input to extract low-contrast regions. Compared with previous methods, our approach achieves the highest classification performance on three different datasets. Furthermore, VisNet considerably outperforms not only the classical methods but also state-of-the-art models of visibility estimation.
Keywords: Fast Fourier transform; VisNet; convolutional neural networks; spectral filter; visibility
Year: 2019 PMID: 30889820 PMCID: PMC6471280 DOI: 10.3390/s19061343
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Comparison of previously proposed methods.
| Category | Used Methods | Evaluation Range [m] | Advantages | Disadvantages |
|---|---|---|---|---|
| ANN-based methods | Uses feed-forward neural networks [ | 60–250 | High evaluation accuracy on the FROSI dataset; | Classifies only synthetic images and cannot be used for real-world images; |
| | A combination of CNN and RNN is used for learning atmospheric visibility [ | 300–800 | Can effectively adapt to both small and big data scenarios; | Computationally costly and takes a long time to train; |
| | Pre-trained CNN model (AlexNet [ | 5000–35,000 | Can evaluate a dataset of images with smaller resolutions; | Uses an unbalanced dataset; |
| | Deep neural networks for visibility forecasting [ | 0–5000 | A working application for aviation meteorological services at the airport; | Has a large absolute error (706 m for hourly visibility; 325 m when visibility is ≤1000 m); |
| | Visibility distance is evaluated using a system consisting of three layers: forward-transfer, back-propagation, and risk neural network [ | 0–10,000 | A higher risk is assigned to low visibility and vice versa; | Needs a collection of daily data that requires high-cost measurements to predict future hours; |
| Statistical methods | Estimation of atmospheric visibility distance via ordinary outdoor cameras based on the contrast expectation in the scene [ | 1–5000 | Uses a publicly available dataset; | Model-driven; more effective for high visibility; estimation of low visibility is error-prone; requires extra meteorological sensors to obtain important values; |
| | A model based on the Sobel edge detection algorithm and a normalized edge extraction method to detect visibility from camera imagery [ | 400–16,000 | Uses a small number of images; | Cannot predict the distance if visibility is less than 400 m; uses a high-cost camera (COHU 3960 Series environmental camera); |
| | Uses Gaussian image entropy and piecewise stationary time series analysis algorithms [ | 0–600 | Uses a very large dataset (2,016,000 frames) to verify the model; | Can be used only for road scenes; |
| | Exploration of visibility estimation from camera imagery [ | 250–5000 | Uses low-cost cameras; | Target objects are needed; |
Figure 1. Structure of VisNet, which can adapt to different visibility ranges. The model has two stages: pre-processing and classification.
Figure 2. Map of the stations used for visibility observation (from Ref. [98]).
Figure A1. Samples from the collected FOVI dataset. Each image is saved in the RGB color space with a resolution of 704 × 576. Visibility values are registered in a text file and range from 0 to 20,000 m.
Distributions of images between classes on both datasets.
| Dataset | Range | Total Number of Selected Images | Visibility Distance [m] | Number of Classes | Range Between Classes [m] |
|---|---|---|---|---|---|
| FOVI | Long-range | 140,000 | 0 to 20,000 | 41 | 500 |
| | Short-range | 100,000 | 0 to 1000 | 21 | 50 |
| FROSI | Short-range | 3528 | 0 to >250 | 7 | 50 |
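The class counts in the table follow from the visibility range and the class spacing. The sketch below is one reading that reproduces 41, 21, and 7: one class per multiple of the spacing from 0 up to the maximum, plus one extra open-ended class for the ">250" range of FROSI. The function names and the floor-based binning are illustrative assumptions, not the authors' code.

```python
def num_classes(max_distance_m: int, step_m: int, open_ended: bool = False) -> int:
    """Count classes spaced step_m apart from 0 to max_distance_m (inclusive).

    open_ended adds one class for everything above max_distance_m.
    """
    count = max_distance_m // step_m + 1
    return count + 1 if open_ended else count


def distance_to_class(distance_m: float, max_distance_m: int, step_m: int,
                      open_ended: bool = False) -> int:
    """Map a visibility distance to a class index (assumed floor binning)."""
    if open_ended and distance_m > max_distance_m:
        # Everything beyond the measured range shares the last class.
        return max_distance_m // step_m + 1
    clipped = min(max(distance_m, 0.0), float(max_distance_m))
    return int(clipped // step_m)


if __name__ == "__main__":
    print(num_classes(20_000, 500))                  # FOVI long-range
    print(num_classes(1_000, 50))                    # FOVI short-range
    print(num_classes(250, 50, open_ended=True))     # FROSI short-range
```

Whether the original labels were floored, rounded, or assigned some other way is not stated in the table, so the mapping in `distance_to_class` should be treated as a placeholder.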
Figure 3. Samples from the FOVI dataset. (a) Very good visibility; (b) poor visibility (less than 2 km); (c) very poor visibility (less than 1 km); (d) dense fog (less than 50 m).
Figure 4. Samples from the FROSI dataset. (a) Excellent visibility; (b) less than 250 m; (c) less than 150 m; (d) less than 50 m.
Figure 5. 2D fast Fourier transform-based filtering algorithm. The 2D FFT and 2D IFFT blocks are the fast Fourier transform and the inverse fast Fourier transform, respectively. The 2D FFT converts the image to the frequency domain, and the 2D IFFT converts the filtered image from the Fourier domain back to the spatial domain. The high-pass filter operates on the spectrum in the frequency domain and removes low frequencies.
Figure 6. Logical implementation of the row-major format.
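As a reminder of what row-major storage means, the sketch below flattens a small 2D image row by row and computes the flat index of a pixel. It shows the general layout only; the helper names are hypothetical and nothing here is taken from the authors' implementation.

```python
from typing import List, Sequence


def flatten_row_major(image: Sequence[Sequence[int]]) -> List[int]:
    """Store rows one after another in a single flat list."""
    return [pixel for row in image for pixel in row]


def row_major_index(row: int, col: int, width: int) -> int:
    """Flat index of element (row, col) in a row-major buffer."""
    return row * width + col


if __name__ == "__main__":
    image = [
        [10, 11, 12],
        [20, 21, 22],
    ]
    flat = flatten_row_major(image)
    # The pixel at row 1, column 2 is found at offset 1 * 3 + 2 in the buffer.
    print(flat[row_major_index(1, 2, width=3)])
```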
Figure 7. Shifting the DC (direct current) component to the center of the image. (a) Original spectrum; (b) shifted spectrum.
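The pipeline described in Figures 5 and 7 (2D FFT, shifting the DC component to the center, high-pass filtering, 2D IFFT) can be sketched with NumPy as follows. The ideal circular mask and the `cutoff_radius` parameter are assumptions made for illustration; the captions do not specify the filter shape or cutoff the authors used.

```python
import numpy as np


def fft_high_pass(gray: np.ndarray, cutoff_radius: float) -> np.ndarray:
    """High-pass filter a 2D grayscale image in the frequency domain."""
    # 2D FFT: spatial domain -> frequency domain.
    spectrum = np.fft.fft2(gray)
    # Move the DC component from the corner to the center of the spectrum.
    shifted = np.fft.fftshift(spectrum)

    rows, cols = gray.shape
    center_y, center_x = rows // 2, cols // 2
    y, x = np.ogrid[:rows, :cols]
    distance = np.sqrt((y - center_y) ** 2 + (x - center_x) ** 2)

    # Assumed ideal high-pass mask: drop every frequency within the radius.
    mask = distance > cutoff_radius
    filtered = shifted * mask

    # Undo the shift, then 2D IFFT: frequency domain -> spatial domain.
    restored = np.fft.ifft2(np.fft.ifftshift(filtered))
    return np.real(restored)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    result = fft_high_pass(image, cutoff_radius=4.0)
    print(result.shape)
```

With `cutoff_radius=0`, only the DC component is removed, which is equivalent to subtracting the image mean; larger radii remove progressively more of the smooth, low-frequency content.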
Figure 8. Outputs of the FFT-based filtering algorithm. Cloudy regions of the input image are removed successfully (red rectangle). (a) Results from the FOVI dataset; (b) results from the FROSI dataset.
Figure 9. Layers of the RGB color channels of an input image. (a) Original input image; (b) B-channel (blue); (c) R-channel (red); (d) G-channel (green).
Figure 10. Spectrum of our color map. Fog and low contrast mostly appear in the colors within the red-dashed range.
Figure 11. Outputs of the spectral filter. (a) Results from the FOVI dataset; (b) results from the FROSI dataset.
Figure 12. The architecture of the deep integrated CNNs.
Distributions of images between classes on both datasets.
| Layer Type | STREAM-1: Num. of Filters | STREAM-1: Size of Feature Map | STREAM-2: Num. of Filters | STREAM-2: Size of Feature Map | STREAM-3: Num. of Filters | STREAM-3: Size of Feature Map | Size of Kernel | Stride |
|---|---|---|---|---|---|---|---|---|
| Image input layer (width × height × channel) | | 400 × 300 × 3 | | 400 × 300 × 3 | | 400 × 300 × 3 | | |
| 1st convolutional layer | 64 | 400 × 300 × 64 | 64 | 400 × 300 × 64 | 64 | 400 × 300 × 64 | 1 × 1 × 3 | 1 × 1 |
| 2nd convolutional layer | 64 | 398 × 298 × 64 | 64 | 398 × 298 × 64 | 64 | 398 × 298 × 64 | 3 × 3 × 3 | 1 × 1 |
| Max-pooling layer | 1 | 199 × 149 × 64 | 1 | 199 × 149 × 64 | 1 | 199 × 149 × 64 | 2 × 2 | 2 × 2 |
| Elementwise addition | | | | | | | | |
| Elementwise addition | | | | | | | | |
| 3rd convolutional layer | 128 | 199 × 149 × 128 | 128 | 199 × 149 × 128 | 128 | 199 × 149 × 128 | 1 × 1 × 3 | 1 × 1 |
| 4th convolutional layer | 128 | 197 × 147 × 128 | 128 | 197 × 147 × 128 | 128 | 197 × 147 × 128 | 3 × 3 × 3 | 1 × 1 |
| Max-pooling layer | 1 | 98 × 73 × 128 | 1 | 98 × 73 × 128 | 1 | 98 × 73 × 128 | 2 × 2 | 2 × 2 |
| Elementwise addition | | | | | | | | |
| Elementwise addition | | | | | | | | |
| 5th convolutional layer | 256 | 98 × 73 × 256 | 256 | 98 × 73 × 256 | 256 | 98 × 73 × 256 | 1 × 1 × 3 | 1 × 1 |
| 6th convolutional layer | 256 | 48 × 36 × 256 | 256 | 48 × 36 × 256 | 256 | 48 × 36 × 256 | 3 × 3 × 3 | 2 × 2 |
| 7th convolutional layer | 256 | 48 × 36 × 256 | 256 | 48 × 36 × 256 | 256 | 48 × 36 × 256 | 1 × 1 × 3 | 1 × 1 |
| Max-pooling layer | 1 | 24 × 18 × 256 | 1 | 24 × 18 × 256 | 1 | 24 × 18 × 256 | 2 × 2 | 2 × 2 |
| Elementwise addition | | | | | | | | |
| 1st and 2nd fully connected layers | 1024 | 2048 | | | | | | |
| Dropout layers | 1024 | 2048 | | | | | | |
| 3rd fully connected layer | 4096 | | | | | | | |
| Classification layer (output layer) | 41 or 21 or 7 | | | | | | | |
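The feature-map sizes in the table can be checked with the standard output-size formula for convolution and pooling without padding, floor((n − k) / s) + 1. The sketch below traces one stream from the 400 × 300 input to the final 24 × 18 map using the kernel sizes and strides listed in the table. The layer names and the `trace_stream` helper are illustrative only.

```python
from typing import List, Tuple


def output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride), following the table for a single stream.
STREAM_LAYERS: List[Tuple[str, int, int]] = [
    ("conv1", 1, 1),
    ("conv2", 3, 1),
    ("pool1", 2, 2),
    ("conv3", 1, 1),
    ("conv4", 3, 1),
    ("pool2", 2, 2),
    ("conv5", 1, 1),
    ("conv6", 3, 2),
    ("conv7", 1, 1),
    ("pool3", 2, 2),
]


def trace_stream(width: int, height: int) -> List[Tuple[str, int, int]]:
    """Return (layer, width, height) after each layer, with zero padding."""
    sizes = []
    for name, kernel, stride in STREAM_LAYERS:
        width = output_size(width, kernel, stride)
        height = output_size(height, kernel, stride)
        sizes.append((name, width, height))
    return sizes


if __name__ == "__main__":
    for name, w, h in trace_stream(400, 300):
        print(f"{name}: {w} x {h}")
```

Each unpadded 3 × 3 convolution with stride 1 trims two pixels per dimension, and each 2 × 2 pooling step with stride 2 halves the size (rounding down), which accounts for the 398 × 298, 199 × 149, 197 × 147, and 98 × 73 entries.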
Characteristics of the hardware and software of the machine.
| Item | Content |
|---|---|
| CPU | AMD Ryzen Threadripper 1950X 16-Core Processor |
| GPU | NVIDIA GeForce GTX 1080 Ti |
| RAM | 64 GB |
| Operating system | Windows 10 |
| Programming language | Python 3.6 |
| Deep learning library | TensorFlow 1.11 |
| CUDA | CUDA 9.2 |
Figure 13. Validation accuracy, test accuracy, and validation loss on three datasets. (a) Long-range visibility on FOVI; (b) short-range visibility on FOVI; (c) short-range visibility on FROSI.
Results of the validation and test accuracy on three datasets (unit in %).
| Dataset | Range | Validation Accuracy | Test Accuracy |
|---|---|---|---|
| FOVI | Long-range | 93.04 | 91.30 |
| | Short-range | 91.77 | 89.51 |
| FROSI | Short-range | 96.80 | 94.03 |
List of models and classification results (unit in %).
| Models | FOVI (Long-Range) Val. Acc. | FOVI (Long-Range) Test Acc. | FOVI (Short-Range) Val. Acc. | FOVI (Short-Range) Test Acc. | FROSI (Short-Range) Val. Acc. | FROSI (Short-Range) Test Acc. |
|---|---|---|---|---|---|---|
| Simple neural network [ | 48.0 | 45.2 | 42.9 | 38.1 | 58.8 | 57.0 |
| Relative SVM [ | 69.7 | 68.5 | 59.0 | 57.0 | 73.0 | 72.2 |
| Relative CNN-RNN [ | 82.2 | 81.3 | 79.0 | 78.4 | 85.5 | 84.4 |
| ResNet-50 [ | 72.0 | 70.9 | 71.8 | 68.6 | 82.1 | 79.2 |
| VGG-16 [ | 89.6 | 89.0 | 89.0 | 88.5 | 88.6 | 91.0 |
| Alex-Net [ | 87.3 | 86.4 | 83.4 | 81.0 | 88.3 | 88.7 |
| **VisNet** | **93.04** | **91.30** | **91.77** | **89.51** | **96.80** | **94.03** |
Bold values are the highest performances in the columns.
Performance comparison of multiple models on three datasets. Experimental results were obtained by calculating the mean square error (MSE) on the validation and test sets.
| Models | FOVI (Long-Range) Val. Error | FOVI (Long-Range) Test Error | FOVI (Short-Range) Val. Error | FOVI (Short-Range) Test Error | FROSI (Short-Range) Val. Error | FROSI (Short-Range) Test Error |
|---|---|---|---|---|---|---|
| Simple neural network [ | 19.5 | 15.8 | 18.6 | 17.9 | 12.9 | 11.0 |
| Relative SVM [ | 14.7 | 13.1 | 16.3 | 14.5 | 13.1 | 11.9 |
| Relative CNN-RNN [ | 12.4 | 12.2 | 11.2 | 11.7 | 11.8 | 10.7 |
| ResNet-50 [ | 11.8 | 10.7 | 14.4 | 13.7 | 9.5 | 9.7 |
| VGG-16 [ | 10.1 | 9.9 | 10.4 | 10.0 | 8.0 | 7.8 |
| Alex-Net [ | 11.4 | 10.9 | 11.3 | 11.1 | 9.1 | 8.9 |
| | | | | | | |
Bold values are the lowest error in the columns. Val. error—validation error.
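For reference, the mean square error used in the table above is the average of the squared differences between targets and predictions. The sketch below uses plain Python lists; the table does not state which quantities were compared (class indices, distances, or network outputs), so the inputs in the example are placeholders.

```python
from typing import Sequence


def mean_squared_error(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Average of squared differences between targets and predictions."""
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Placeholder values, not data from the paper.
    print(mean_squared_error([1, 2, 3], [1, 2, 5]))
```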
Performance of multiple models trained using original (ORG), spectral filtered (SPC), and FFT filtered (FFT) images of three datasets. The weighted sum rule (Sum) was used to combine the results of each trained network (unit in %).
| Models | FOVI (Long-Range) ORG | FOVI (Long-Range) SPC | FOVI (Long-Range) FFT | FOVI (Long-Range) Sum. | FOVI (Short-Range) ORG | FOVI (Short-Range) SPC | FOVI (Short-Range) FFT | FOVI (Short-Range) Sum. | FROSI (Short-Range) ORG | FROSI (Short-Range) SPC | FROSI (Short-Range) FFT | FROSI (Short-Range) Sum. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Simple neural network [ | 45.2 | 41.9 | 51.3 | 49.8 | 38.1 | 36.1 | 37.6 | 37.4 | 57.0 | 52.1 | 53.8 | 55.0 |
| Relative SVM [ | 68.5 | 59.0 | 61.1 | 65.6 | 57.0 | 55.6 | 58.3 | 57.0 | 72.2 | 69.6 | 70.5 | 71.2 |
| Relative CNN-RNN [ | 81.3 | 77.0 | 75.6 | 78.0 | 78.4 | 71.6 | 75.5 | 76.0 | 84.4 | 73.5 | 79.3 | 79.8 |
| ResNet-50 [ | 70.9 | 61.9 | 73.4 | 71.2 | 68.6 | 60.0 | 67.5 | 64.8 | 79.2 | 76.7 | 78.0 | 78.1 |
| VGG-16 [ | **89.0** | 79.7 | 81.0 | 86.7 | **88.5** | 79.8 | 81.1 | 85.2 | **91.0** | 86.4 | 88.2 | 89.3 |
| Alex-Net [ | 86.4 | 76.0 | 78.5 | 82.1 | 81.0 | 76.3 | 77.1 | 78.2 | 88.7 | 79.0 | 83.6 | 84.7 |
| VisNet (single STREAM) | 70.1 | 70.5 | 79.1 | 74.7 | 72.7 | 69.4 | 71.9 | 71.3 | 78.8 | 75.6 | 80.3 | 78.8 |
Bold values are the highest performance in the dataset columns. ORG—original input images, SPC—spectral filtered input images, FFT—FFT filtered input images.
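A weighted sum rule of the kind named in the "Sum." columns can be sketched as follows: each network (trained on ORG, SPC, or FFT images) produces a class-probability vector, the vectors are combined with per-network weights, and the class with the largest fused score is chosen. The weights, probabilities, and function names below are hypothetical; the source does not list the weights that were used.

```python
from typing import List, Sequence


def weighted_sum_fusion(prob_sets: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Combine per-network class probabilities with a normalized weighted sum."""
    if len(prob_sets) != len(weights) or not prob_sets:
        raise ValueError("need one weight per probability vector")
    num_classes = len(prob_sets[0])
    if any(len(probs) != num_classes for probs in prob_sets):
        raise ValueError("all probability vectors must have the same length")
    total = float(sum(weights))
    return [
        sum(w * probs[i] for probs, w in zip(prob_sets, weights)) / total
        for i in range(num_classes)
    ]


def predict_class(fused: Sequence[float]) -> int:
    """Index of the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


if __name__ == "__main__":
    # Made-up outputs of three networks for a 3-class problem.
    org = [0.6, 0.3, 0.1]
    spc = [0.2, 0.5, 0.3]
    fft = [0.1, 0.7, 0.2]
    fused = weighted_sum_fusion([org, spc, fft], weights=[1.0, 1.0, 1.0])
    print(predict_class(fused))
```

With equal weights this reduces to averaging the three outputs; giving a larger weight to one input type lets that network dominate the decision.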
Comparisons of VisNet and multiple CNNs fusion results (unit in %).
| Models | FOVI Dataset (Long-Range) | FOVI Dataset (Short-Range) | FROSI Dataset (Short-Range) |
|---|---|---|---|
| VisNet | 91.3 | 89.5 | 94.0 |
| VGG-16/ORG + VisNet | | 90.0 | 94.15 |
| VGG-16/SPC + VisNet | 91.5 | 89.53 | 94.25 |
| VGG-16/FFT + VisNet | 91.3 | 89.8 | |
| VGG-16/ORG + VGG-16/SPC + VGG-16/FFT + VisNet | 91.40 | | 94.2 |
Bold values are the highest performance in the columns.
Configurations of networks and classification results of the five best solutions on three datasets.
| Net. Name | C-1 | C-2 | C-3 | C-4 | C-5 | C-6 | C-7 | C-8 | Class. Acc. [%] |
|---|---|---|---|---|---|---|---|---|---|
| CNN-1 | 32 * | 64 | 128 | 256 | 512 | – | – | – | 78.3 |
| CNN-2 | 32 | 64 | 128 | 128 | 256 | 256 | – | – | 86.0 |
| CNN-3 | 64 | 64 | 128 | 128 | 256 | 256 | 256 | – | |
| CNN-4 | 64 | 64 | 128 | 128 | 256 | 256 | 512 | 512 | 84.0 |
| CNN-5 | 64 | 128 | 256 | 256 | 256 | 512 | 512 | 512 | 83.0 |
Net. name—network name; *—number of filters; FS—filter size; MS—size of feature map; Class. Acc.—Classification accuracy for 3 ranges: (long-range FOVI)–(short-range FOVI)–(short-range FROSI); Padding = 0 for each convolutional layer; Fully connected layers (FC1, FC2, FC3) = (1024, 2048, 4096), respectively; Number of outputs = 41, 21, 7.
Figure 14. Representations of output filters of three streams. (a) Representation of feature maps from the long-range FOVI dataset; (b) representation of feature maps from the short-range FROSI dataset.