Nazanin Fouladgar, Kary Främling.
Abstract
Multivariate time series with missing data are ubiquitous when streaming data are collected by sensors or other recording instruments. For instance, outdoor sensors that gather different meteorological variables may have low material sensitivity in specific situations, leading to incomplete information gathering. This is problematic in time series prediction when missingness is massive and the missing rate differs across variables. A contribution that addresses this problem for the regression task on meteorological datasets by employing Long Short-Term Memory (LSTM), which can control the information flow with its memory unit, is still missing. In this paper, we propose a novel model called forward and backward variable-sensitive LSTM (FBVS-LSTM), consisting of two decay mechanisms and some informative data. The model inputs are mainly the missing indicator, the time intervals of missingness in both the forward and backward directions, and the missing rate of each variable. We employ this information to address the so-called missing not at random (MNAR) mechanism. By learning the features of each parameter separately, the model adapts to massive missingness. We conduct our experiments on three real-world datasets for air pollution forecasting. The results demonstrate that our model performed well along with other LSTM-derived models in terms of prediction accuracy.
Keywords: LSTM; massive missingness; multivariate time series; regression
Year: 2020 PMID: 32429370 PMCID: PMC7285013 DOI: 10.3390/s20102832
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
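The abstract names four model inputs: a missing indicator, forward and backward time intervals of missingness, and a per-variable missing rate. The following is a minimal sketch of how such inputs could be derived for one variable. The function name `missingness_features` is hypothetical, and the interval recurrence follows the convention commonly used in decay-based recurrent models (the interval keeps accumulating while the neighbouring value is missing); the abstract does not confirm that FBVS-LSTM uses exactly this definition.

```python
from typing import List, Optional, Sequence, Tuple


def missingness_features(
    values: Sequence[Optional[float]],
    timestamps: Sequence[float],
) -> Tuple[List[int], List[float], List[float], float]:
    """Return (mask, forward_interval, backward_interval, missing_rate).

    `None` marks a missing observation. The interval definition is an
    assumption, not taken from the paper.
    """
    if len(values) != len(timestamps):
        raise ValueError("values and timestamps must have the same length")
    n = len(values)
    if n == 0:
        raise ValueError("at least one time step is required")

    # Missing indicator: 1 if observed, 0 if missing.
    mask = [0 if v is None else 1 for v in values]

    # Forward interval: time since the last observed value before step t.
    forward = [0.0] * n
    for t in range(1, n):
        gap = timestamps[t] - timestamps[t - 1]
        forward[t] = gap if mask[t - 1] == 1 else gap + forward[t - 1]

    # Backward interval: time until the next observed value after step t.
    backward = [0.0] * n
    for t in range(n - 2, -1, -1):
        gap = timestamps[t + 1] - timestamps[t]
        backward[t] = gap if mask[t + 1] == 1 else gap + backward[t + 1]

    # Missing rate: fraction of missing steps for this variable.
    missing_rate = 1.0 - sum(mask) / n
    return mask, forward, backward, missing_rate


series = [1.0, None, None, 4.0, None]
mask, fwd, bwd, rate = missingness_features(series, [0, 1, 2, 3, 4])
```

For this five-step series the mask is `[1, 0, 0, 1, 0]`: the forward interval grows across the run of missing values and resets after the observed step, while the backward interval does the same in the reverse direction.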
Figure 1. FBVS-LSTM unit.
Missing rate of dataset features.
| Dataset | Features | Missing Rate |
|---|---|---|
| Beijing PM2.5 | PM2.5 | 75% |
| | dew | 52% |
| | temperature | 66% |
| | pressure | 49% |
| | wind direction | 89% |
| | wind speed | 67% |
| | snow | 1% |
| | rain | 5% |
| Italy Air Quality | PT08.S1(CO) | 34% |
| | PT08.S2(NMHC) | 38% |
| | PT08.S3(NOx) | 88% |
| | PT08.S4(NO2) | 32% |
| | PT08.S5(O3) | 45% |
| Beijing Multi-Site Air-Quality | PM2.5 | 34% |
| | PM10 | 28% |
| | SO2 | 81% |
| | NO2 | 22% |
| | CO | 36% |
| | O3 | 38% |
Figure 2. PM2.5 concentration over 200 samples in Beijing PM2.5. (a) Actual data; (b) Generated missing data.
Figure 3. NOx over 200 samples in Italy Air Quality. (a) Actual data; (b) Generated missing data.
Figure 4. SO2 over 200 samples in Beijing Multi-Site Air-Quality. (a) Actual data; (b) Generated missing data.
Figure 5. The general architecture of FBVS-LSTM units.
Parameter settings.
| Dataset | Epoch Number | Learning Rate | Hidden Layers | Features (Input Size) | Output Size |
|---|---|---|---|---|---|
| Beijing PM2.5 | 30 | 0.01 | 24 | 8 | 1 |
| Italy Air Quality | 30 | 0.01 | 24 | 5 | 1 |
| Beijing Multi-Site Air-Quality | 30 | 0.01 | 24 | 6 | 1 |
Performance of all models on each dataset.
| Dataset | Model | Train Error (MSE ± STD) | Test Error (MSE ± STD) |
|---|---|---|---|
| Beijing PM2.5 | LSTM-0 | 0.021 ± 0.020 | 0.016 ± 0.009 |
| | LSTM-mean | 0.021 ± 0.016 | 0.015 ± 0.011 |
| | B-LSTM | 0.016 ± 0.008 | 0.010 ± 0.004 |
| | F-LSTM | 0.180 ± 0.324 | 0.172 ± 0.323 |
| | BVS-LSTM | 0.013 ± 0.006 | 0.010 ± 0.004 |
| | FBVS-LSTM | 0.012 ± 0.004 | 0.011 ± 0.005 |
| Italy Air Quality | LSTM-0 | 0.122 ± 0.123 | 0.130 ± 0.142 |
| | LSTM-mean | 0.063 ± 0.078 | 0.066 ± 0.082 |
| | B-LSTM | 0.027 ± 0.003 | 0.027 ± 0.009 |
| | F-LSTM | 0.030 ± 0.007 | 0.029 ± 0.015 |
| | BVS-LSTM | 0.031 ± 0.011 | 0.031 ± 0.013 |
| | FBVS-LSTM | 0.023 ± 0.002 | 0.024 ± 0.006 |
| Beijing Multi-Site Air-Quality | LSTM-0 | 0.049 ± 0.029 | 0.06 ± 0.032 |
| | LSTM-mean | 0.038 ± 0.015 | 0.03 ± 0.023 |
| | B-LSTM | 0.031 ± 0.011 | 0.03 ± 0.016 |
| | F-LSTM | 0.179 ± 0.3 | 0.148 ± 0.25 |
| | BVS-LSTM | 0.034 ± 0.008 | 0.040 ± 0.024 |
| | FBVS-LSTM | 0.026 ± 0.019 | 0.031 ± 0.002 |
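
The table above reports errors as MSE ± STD. As a rough illustration of how such a summary can be produced, the sketch below computes the mean squared error for one evaluation and then the mean and sample standard deviation over several evaluations. The helper names are hypothetical, and treating the STD as the spread over repeated runs is an assumption; this record does not state how the paper aggregated its errors.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean([(a - b) ** 2 for a, b in zip(y_true, y_pred)])


def summarize_errors(errors: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-run MSE values.

    Assumption: one MSE value per repeated run; at least two runs needed.
    """
    return mean(errors), stdev(errors)


# Illustrative per-run errors (made-up numbers, not taken from the paper).
run_errors = [0.010, 0.012, 0.014]
avg, spread = summarize_errors(run_errors)
print(f"{avg:.3f} ± {spread:.3f}")
```

A large spread relative to the mean, as in the F-LSTM rows, indicates that the error varies strongly between evaluations, so the mean alone is a weak summary for those rows.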
Figure 6. Models performance in Beijing PM2.5 dataset. (a) Training set errors; (b) Test set errors.
Figure 7. Models performance in Italy Air Quality dataset. (a) Training set errors; (b) Test set errors.
Figure 8. Models performance in Beijing Multi-Site Air-Quality dataset. (a) Training set errors; (b) Test set errors.
Figure 9. Prediction and ground truth outputs over 70 test samples. (a) PM2.5 in Beijing PM2.5 dataset; (b) PT08.S1(CO) in Italy Air Quality dataset; (c) O3 in Beijing Multi-Site Air-Quality.
Statistical analysis with t-test.
| Dataset | Model | t-value (vs. FBVS-LSTM) | p-value (vs. FBVS-LSTM) |
|---|---|---|---|
| Beijing PM2.5 | LSTM-0 | −1.62 | 0.11 |
| | LSTM-mean | −0.87 | 0.39 |
| | B-LSTM | −0.5 | 0.61 |
| | F-LSTM | −78.23 | 0.0001 |
| | BVS-LSTM | −0.02 | 0.98 |
| Italy Air Quality | LSTM-0 | −135.68 | 0.0002 |
| | LSTM-mean | −24.17 | 0.0005 |
| | B-LSTM | −1.58 | 0.12 |
| | F-LSTM | −1.87 | 0.06 |
| | BVS-LSTM | −1.51 | 0.13 |
| Beijing Multi-Site Air-Quality | LSTM-0 | −15.07 | 0.0001 |
| | LSTM-mean | −868.89 | 0.0004 |
| | B-LSTM | −0.37 | 0.71 |
| | F-LSTM | −52.92 | 0.0008 |
| | BVS-LSTM | −6.00 | 0.0001 |
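
To make the t-test comparison above more concrete, here is a minimal sketch of a two-sample t statistic computed from two sets of per-run errors. It uses Welch's unequal-variance form, which is an assumption: this record does not say whether the paper used a paired, pooled, or Welch test, nor which model's errors come first. The function name `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples.

    The result is negative when the mean of `a` is lower than the mean of `b`.
    Each sample needs at least two values.
    """
    se_a = variance(a) / len(a)  # squared standard error of sample a
    se_b = variance(b) / len(b)  # squared standard error of sample b
    return (mean(a) - mean(b)) / sqrt(se_a + se_b)


# Illustrative per-run test errors (made-up numbers, not taken from the paper).
proposed = [0.010, 0.011, 0.012]
baseline = [0.015, 0.016, 0.017]
t_stat = welch_t(proposed, baseline)
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with its p-value.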