| Literature DB >> 29250110 |
Jun-He Yang¹, Ching-Hsue Cheng¹, Chia-Pan Chan¹.
Abstract
Reservoirs are important to households and affect the national economy. This paper proposes a time-series forecasting model for a reservoir's water level that first handles missing values and then performs variable selection. This study collected data from the Shimen Reservoir in Taiwan together with daily atmospheric data from 2008 to 2015; the two datasets were concatenated in date order into a single integrated research dataset. The proposed time-series forecasting model has three main steps. First, this study compares directly deleting the rows that contain missing values with five imputation methods. Second, the key variables are identified via factor analysis, and unimportant variables are then removed sequentially by the variable selection method. Finally, the proposed model uses a Random Forest to forecast the reservoir's water level, and its forecasting error is compared with those of the other listed models (RBF Network, Kstar, IBK, and Random Tree). The experimental results indicate that the Random Forest forecasting model, both with the full variable set and after variable selection, has better forecasting performance than the listed models. In addition, the experiments show that the proposed variable selection can improve the forecasting capability of the five forecasting methods used here.
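The abstract names the imputation step only briefly; the later result tables label the five methods series mean, linear, median of nearby points, mean of nearby points, and regression. Those labels resemble the options of SPSS's Replace Missing Values procedure, so the minimal sketch below assumes those semantics: "linear" is interpolation between the nearest observed values, and "regression" is a least-squares linear trend fitted on the time index. The function names and the span of two neighbours per side are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, median
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing daily observation


def series_mean(xs: Series) -> Series:
    """Replace every gap with the mean of all observed values."""
    m = mean(x for x in xs if x is not None)
    return [m if x is None else x for x in xs]


def _nearby(xs: Series, i: int, span: int) -> List[float]:
    """Up to `span` observed values on each side of position i (span is an assumption)."""
    before = [x for x in xs[:i] if x is not None][-span:]
    after = [x for x in xs[i + 1:] if x is not None][:span]
    return before + after


def nearby_mean(xs: Series, span: int = 2) -> Series:
    """Replace each gap with the mean of its observed neighbours.
    Raises StatisticsError if a gap has no observed neighbours."""
    return [mean(_nearby(xs, i, span)) if x is None else x for i, x in enumerate(xs)]


def nearby_median(xs: Series, span: int = 2) -> Series:
    """Replace each gap with the median of its observed neighbours."""
    return [median(_nearby(xs, i, span)) if x is None else x for i, x in enumerate(xs)]


def linear_interpolation(xs: Series) -> Series:
    """Interpolate between the last observed value before and the first after a gap.
    Gaps at either end have no bracketing values and are left missing."""
    known = [i for i, x in enumerate(xs) if x is not None]
    out = list(xs)
    for i, x in enumerate(xs):
        if x is not None:
            continue
        lo = max((k for k in known if k < i), default=None)
        hi = min((k for k in known if k > i), default=None)
        if lo is not None and hi is not None:
            out[i] = xs[lo] + (xs[hi] - xs[lo]) * (i - lo) / (hi - lo)
    return out


def linear_trend(xs: Series) -> Series:
    """Regress observed values on the time index; fill gaps with the fitted line."""
    pts = [(i, x) for i, x in enumerate(xs) if x is not None]
    mi = sum(i for i, _ in pts) / len(pts)
    mx = sum(x for _, x in pts) / len(pts)
    slope = (sum((i - mi) * (x - mx) for i, x in pts)
             / sum((i - mi) ** 2 for i, _ in pts))
    return [mx + slope * (i - mi) if x is None else x for i, x in enumerate(xs)]
```

The sixth strategy in the result tables, deleting the rows with missing data, needs no helper: it keeps only the rows in which every variable is present.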
Year: 2017 PMID: 29250110 PMCID: PMC5700551 DOI: 10.1155/2017/8734214
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: The procedure of the proposed model.
Description of variables in the research dataset.
| Variable | Description |
|---|---|
| Output | Daily discharge released from Shimen Reservoir |
| Input | Daily inflow discharge into Shimen Reservoir |
| Temperature | Daily temperature in Daxi, Taoyuan |
| Rainfall | Accumulated rainfall at Shimen Reservoir on the previous day |
| Pressure | Daily barometric pressure in Daxi, Taoyuan |
| Relative Humidity | Daily relative humidity in Daxi, Taoyuan |
| Wind Speed | Daily wind speed in Daxi, Taoyuan |
| Direction | Daily wind direction in Daxi, Taoyuan |
| Rainfall_Dasi | Daily rainfall in Daxi, Taoyuan |
A sample of the collected data (1 to 15 January 2008).
| Date | Rainfall | Input | Output | Rainfall_Dasi | Temperature | Wind Speed | Direction | Pressure | Relative Humidity | Water level |
|---|---|---|---|---|---|---|---|---|---|---|
| 2008/1/1 | 0.1 | 83.6 | 0 | 10.2 | 4.7 | 65 | 1001.5 | 56 | 244.09 | |
| 2008/1/2 | 0.1 | 96.08 | 286.24 | 0 | 10.4 | 6.3 | 38 | 1000.7 | 59 | 243.93 |
| 2008/1/3 | 0 | 82.72 | 82.17 | 0 | 14.5 | 4.5 | 50 | 997.7 | 67 | 243.81 |
| 2008/1/4 | 0 | 133.32 | 262.22 | 0 | 15.3 | 3.4 | 40 | 996.5 | 82 | 243.78 |
| 2008/1/5 | 0 | 125.6 | 305.94 | 0 | 16 | 2.9 | 46 | 996.3 | 77 | 243.55 |
| 2008/1/6 | 0.3 | 98.74 | 192.33 | 0 | 16.7 | 1.2 | 170 | 995.4 | 83 | 243.32 |
| 2008/1/7 | 0 | 116.6 | 192.76 | 0 | 18.4 | 1.9 | 46 | 994.9 | 77 | 243.34 |
| 2008/1/8 | 0 | 93.12 | 109.73 | 0 | 19.9 | 1.3 | 311 | 992.9 | 78 | 243.33 |
| 2008/1/9 | 0 | 107.57 | 123.98 | 0 | 19.9 | 2.2 | 11 | 992.2 | 77 | 243.23 |
| 2008/1/10 | 0 | 65.15 | 276.74 | 0 | 19.6 | 1.6 | 357 | 991.6 | 80 | 243 |
| 2008/1/11 | 0 | 55.64 | 249.09 | 0 | 21.5 | 1.3 | 185 | 990.2 | 71 | 242.78 |
| 2008/1/12 | 0 | 91.67 | 191.81 | 0 | 19.8 | 4.2 | 37 | 992.1 | 75 | 242.74 |
| 2008/1/13 | 0.9 | 107.34 | 182.22 | 1.5 | 14.2 | 7.1 | 39 | 996.2 | 85 | 242.53 |
| 2008/1/14 | 5.2 | 80.09 | 146.62 | 1 | 12.7 | 7.1 | 36 | 997.7 | 85 | 242.49 |
| 2008/1/15 | 4 | 85.77 | 243.82 | 0 | 13.5 | 7.1 | 35 | 998.4 | 86 | 242.38 |
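The abstract says the reservoir records and the daily atmospheric records were concatenated in date order into one research dataset, as in the sample above. A minimal sketch of that step, assuming both sources are keyed by calendar date and that a day present in only one source should still appear (an outer join), so that absent fields surface as missing values for the imputation step. The function name `merge_by_date` and the dictionary layout are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]


def merge_by_date(reservoir: Dict[date, Record],
                  weather: Dict[date, Record],
                  columns: List[str]) -> List[Tuple[date, Record]]:
    """Outer-join two daily sources on date and return rows in date order.

    Column names are assumed not to overlap between the two sources; any
    column absent from both sources on a given day becomes None (missing)."""
    rows = []
    for day in sorted(set(reservoir) | set(weather)):
        combined = {**reservoir.get(day, {}), **weather.get(day, {})}
        rows.append((day, {c: combined.get(c) for c in columns}))
    return rows
```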
Results of the listed models with the five imputation methods under a percentage split (66% of the dataset for training, 34% for testing) before variable selection.
| Methods | Index | RBF Network | Kstar | Random Forest | IBK | Random Tree |
|---|---|---|---|---|---|---|
| Before variable selection | | | | | | |
| Delete the rows with missing data | CC | 0.085 | 0.546 | 0.728 | 0.288 | 0.534 |
| | MAE | 0.183 | 0.133 | 0.111 | 0.186 | 0.145 |
| | RMSE | 0.227 | 0.200 | 0.157 | 0.271 | 0.226 |
| | RAE | 0.998 | 0.727 | 0.607 | 1.015 | 0.789 |
| | RRSE | 0.997 | 0.879 | 0.690 | 1.193 | 0.995 |
| Series mean | CC | 0.052 | 0.557 | 0.739 | 0.198 | 0.563 |
| | MAE | 0.174 | 0.123 | 0.102 | 0.191 | 0.126 |
| | RMSE | 0.222 | 0.189 | | 0.283 | 0.202 |
| | RAE | 1.001 | 0.705 | 0.587 | 1.098 | 0.722 |
| | RRSE | 0.999 | 0.850 | 0.679 | 1.276 | 0.908 |
| Linear | CC | 0.054 | 0.565 | 0.734 | 0.200 | 0.512 |
| | MAE | 0.175 | 0.121 | 0.101 | 0.189 | 0.138 |
| | RMSE | 0.222 | 0.188 | 0.152 | 0.281 | 0.218 |
| | RAE | 1.000 | 0.690 | 0.575 | 1.082 | 0.787 |
| | RRSE | 0.999 | 0.844 | 0.684 | 1.264 | 0.980 |
| Median of nearby points | CC | 0.054 | 0.571 | 0.737 | 0.227 | 0.559 |
| | MAE | 0.175 | 0.120 | 0.101 | 0.188 | 0.126 |
| | RMSE | 0.222 | 0.186 | 0.152 | 0.277 | 0.202 |
| | RAE | 1.000 | 0.689 | 0.577 | 1.074 | 0.719 |
| | RRSE | 0.999 | 0.838 | 0.681 | 1.244 | 0.907 |
| Mean of nearby points | CC | 0.053 | 0.572 | | 0.232 | 0.512 |
| | MAE | 0.175 | 0.121 | | 0.186 | 0.132 |
| | RMSE | 0.222 | 0.186 | | 0.275 | 0.217 |
| | RAE | 1.000 | 0.690 | | 1.062 | 0.756 |
| | RRSE | 0.999 | 0.837 | | 1.235 | 0.975 |
| Regression | CC | 0.052 | 0.564 | 0.739 | 0.200 | 0.509 |
| | MAE | 0.174 | 0.121 | 0.102 | 0.191 | 0.133 |
| | RMSE | 0.222 | 0.188 | 0.151 | 0.283 | 0.216 |
| | RAE | 1.001 | 0.695 | 0.586 | 1.096 | 0.762 |
| | RRSE | 0.999 | 0.845 | 0.680 | 1.275 | 0.974 |
∗ denotes the best performance among 5 imputation methods.
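The tables report five indices. The minimal sketch below uses the standard definitions, which we assume match the ones used here: CC is the Pearson correlation between actual and predicted values, MAE and RMSE are the mean absolute and root-mean-squared errors, and RAE and RRSE divide the total absolute and squared errors by those of a baseline that always predicts a constant mean. The baseline defaults to the mean of the actual values passed in; some tools take it from the training data instead, so small differences are possible. The function name `forecast_indices` is illustrative.

```python
import math
from typing import Dict, Optional, Sequence


def forecast_indices(actual: Sequence[float], predicted: Sequence[float],
                     baseline: Optional[float] = None) -> Dict[str, float]:
    """CC, MAE, RMSE, RAE and RRSE for one set of forecasts.

    `baseline` is the constant prediction that RAE and RRSE are measured
    against; by assumption it defaults to the mean of `actual`."""
    n = len(actual)
    if n < 2 or n != len(predicted):
        raise ValueError("need two or more paired values")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    ref = mean_a if baseline is None else baseline

    errors = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)

    # Pearson correlation; reported here as 0.0 when either series is constant.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cc = cov / math.sqrt(var_a * var_p) if var_a > 0 and var_p > 0 else 0.0

    return {
        "CC": cc,
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / sum(abs(a - ref) for a in actual),
        "RRSE": math.sqrt(sq_err / sum((a - ref) ** 2 for a in actual)),
    }
```

Under these definitions a model that always predicts the mean scores RAE = RRSE = 1, which is roughly where the RBF Network sits in every table, consistent with its near-zero CC.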
Results of the listed models with the five imputation methods under 10-fold cross-validation before variable selection.
| Methods | Index | RBF Network | Kstar | Random Forest | IBK | Random Tree |
|---|---|---|---|---|---|---|
| Before variable selection | | | | | | |
| Delete the rows with missing data | CC | 0.041 | 0.590 | 0.737 | 0.246 | 0.505 |
| | MAE | 0.184 | 0.126 | 0.109 | 0.195 | 0.143 |
| | RMSE | 0.227 | 0.188 | 0.154 | 0.281 | 0.225 |
| | RAE | 1.000 | 0.682 | 0.592 | 1.059 | 0.775 |
| | RRSE | 0.999 | 0.825 | 0.678 | 1.235 | 0.986 |
| Series mean | CC | 0.038 | 0.612 | 0.755 | 0.237 | 0.574 |
| | MAE | 0.171 | 0.113 | 0.098 | 0.181 | 0.125 |
| | RMSE | 0.217 | 0.175 | | 0.270 | 0.202 |
| | RAE | 1.001 | 0.660 | 0.575 | 1.058 | 0.731 |
| | RRSE | 0.999 | 0.802 | 0.658 | 1.241 | 0.929 |
| Linear | CC | 0.042 | 0.615 | 0.753 | 0.243 | 0.551 |
| | MAE | 0.173 | 0.112 | 0.098 | 0.181 | 0.127 |
| | RMSE | 0.218 | 0.175 | 0.144 | 0.269 | 0.207 |
| | RAE | 1.000 | 0.649 | 0.568 | 1.057 | 0.736 |
| | RRSE | 0.999 | 0.800 | 0.660 | 1.233 | 0.948 |
| Median of nearby points | CC | 0.043 | 0.614 | 0.752 | 0.251 | 0.535 |
| | MAE | 0.173 | 0.113 | 0.098 | 0.180 | 0.131 |
| | RMSE | 0.218 | 0.175 | 0.144 | 0.268 | 0.211 |
| | RAE | 1.000 | 0.653 | 0.568 | 1.041 | 0.757 |
| | RRSE | 0.999 | 0.801 | 0.661 | 1.227 | 0.967 |
| Mean of nearby points | CC | 0.043 | 0.613 | | 0.250 | 0.558 |
| | MAE | 0.173 | 0.113 | | 0.179 | 0.125 |
| | RMSE | 0.218 | 0.175 | 0.144 | 0.268 | 0.205 |
| | RAE | 1.000 | 0.654 | | 1.039 | 0.725 |
| | RRSE | 0.999 | 0.802 | | 1.226 | 0.937 |
| Regression | CC | 0.038 | 0.618 | 0.754 | 0.240 | 0.522 |
| | MAE | 0.171 | 0.112 | 0.098 | 0.181 | 0.133 |
| | RMSE | 0.217 | 0.174 | | 0.270 | 0.214 |
| | RAE | 1.001 | 0.653 | 0.574 | 1.055 | 0.778 |
| | RRSE | 0.999 | 0.798 | 0.659 | 1.239 | 0.983 |
∗ denotes the best performance among 5 imputation methods.
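The result tables evaluate every model in two ways: a single 66%/34% percentage split and 10-fold cross-validation. The minimal sketch below shows both partitioning schemes on row indices. The paper does not say whether rows were shuffled before the percentage split, so shuffling is an explicit, seeded option here; for a time series, the unshuffled (chronological) split avoids training on days that come after the test days. The function names and the seed are illustrative assumptions.

```python
import random
from typing import Iterator, List, Tuple


def percentage_split(n: int, train_frac: float = 0.66, shuffle: bool = False,
                     seed: int = 1) -> Tuple[List[int], List[int]]:
    """Split row indices 0..n-1 into a training part and a testing part."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]


def k_fold(n: int, k: int = 10, seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every row is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k folds whose sizes differ by at most 1
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test
```

Each model is then trained on `train` and scored on `test`; for cross-validation the indices are pooled or averaged over the ten folds.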
Results of variable selection (factor loadings).
| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Input | . | .072 | .164 |
| Output | . | .037 | .071 |
| Rainfall | . | .064 | .503 |
| Temperature | .092 | . | −.149 |
| Pressure | −.254 | −. | −.182 |
| Wind Speed | .096 | −. | .052 |
| Direction | .041 | . | −.067 |
| Rainfall_Dasi | .290 | .068 | . |
| Relative Humidity | .025 | −.196 | . |
Note. Each factor's highest factor loading appears in bold.
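In the table above, each variable has exactly one bold entry (its value did not survive extraction), which suggests each variable is assigned to the factor on which it loads most strongly. A minimal sketch of that assignment step, comparing absolute loadings because some dominant loadings (Pressure and Wind Speed on Factor 2) are negative. How the paper then removes unimportant variables sequentially is not described here, so the sketch stops at grouping. The function name is illustrative, and the loadings in the test are made-up values, not the paper's.

```python
from typing import Dict, List, Sequence


def dominant_factor_groups(loadings: Dict[str, Sequence[float]]) -> Dict[int, List[str]]:
    """Map each factor number (1-based) to the variables that load most strongly on it.

    Strength is the absolute loading, so large negative loadings count too."""
    groups: Dict[int, List[str]] = {}
    for var, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(best + 1, []).append(var)
    return groups
```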
Results of the compared forecasting models under a percentage split (66% of the dataset for training, 34% for testing) after variable selection.
| Methods | Index | RBF Network | Kstar | Random Forest | IBK | Random Tree |
|---|---|---|---|---|---|---|
| After variable selection | | | | | | |
| Delete the rows with missing data | CC | 0.033 | 0.638a | 0.729a | 0.251 | 0.545a |
| | MAE | 0.182a | 0.121a | 0.111a | 0.199 | 0.135a |
| | RMSE | 0.229 | 0.176a | 0.156a | 0.287 | 0.212a |
| | RAE | 0.992a | 0.657a | 0.602a | 1.083 | 0.736a |
| | RRSE | 1.007 | 0.775a | 0.688a | 1.262 | 0.935a |
| Series mean | CC | 0.107a | 0.661a | 0.739a | 0.242a | 0.551 |
| | MAE | 0.172a | 0.107a | 0.101a | 0.179a | 0.129 |
| | RMSE | 0.221a | 0.167a | 0.151a | 0.268a | 0.205 |
| | RAE | 0.988a | 0.615a | 0.579a | 1.027a | 0.740 |
| | RRSE | 0.995a | 0.753a | 0.678a | 1.208a | 0.923 |
| Linear | CC | 0.105a | 0.666a | 0.735a | 0.258a | 0.596a |
| | MAE | 0.173a | 0.106a | 0.100a | 0.175a | 0.120a |
| | RMSE | 0.221a | 0.166a | 0.151a | 0.266a | 0.196a |
| | RAE | 0.987a | 0.606a | 0.572a | 1.002a | 0.683a |
| | RRSE | 0.995a | 0.748a | 0.681a | 1.198a | 0.883a |
| Median of nearby points | CC | 0.106a | 0.666a | 0.740a | 0.264a | 0.553 |
| | MAE | 0.173a | 0.107a | 0.100a | 0.177a | 0.127 |
| | RMSE | 0.221a | 0.166a | 0.151a | 0.266a | 0.207 |
| | RAE | 0.987a | 0.611a | 0.571a | 1.013a | 0.723 |
| | RRSE | 0.995a | 0.747a | 0.677a | 1.195a | 0.932 |
| Mean of nearby points | CC | 0.1059a | 0.667a | | 0.249a | 0.540a |
| | MAE | 0.173a | 0.107a | | 0.179a | 0.129a |
| | RMSE | 0.221a | 0.166a | | 0.268a | 0.214a |
| | RAE | 0.987a | 0.611a | | 1.025a | 0.735a |
| | RRSE | 0.995a | 0.747a | | 1.207a | 0.962a |
| Regression | CC | 0.107a | 0.663a | 0.739a | 0.242a | 0.559a |
| | MAE | 0.172a | 0.106a | 0.101a | 0.179a | 0.126a |
| | RMSE | 0.221a | 0.167a | 0.151a | 0.268a | 0.200a |
| | RAE | 0.987a | 0.610a | 0.581a | 1.027a | 0.723a |
| | RRSE | 0.994a | 0.752a | 0.678a | 1.207a | 0.900a |
a denotes improved performance after variable selection; b denotes the best performance among the five models after variable selection.
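Both footnote markers follow from simple comparisons once the direction of each index is fixed: higher is better for CC, lower is better for MAE, RMSE, RAE and RRSE. The minimal sketch below encodes that rule; the function names are illustrative. Because the tables print rounded values, a few entries marked a show the same value before and after selection (for example, the Random Forest MAE of 0.111 under the percentage split with deleted rows), so the marking cannot always be reproduced from the printed numbers alone.

```python
from typing import Dict

# Direction of improvement for each index reported in the tables.
HIGHER_IS_BETTER = {"CC": True, "MAE": False, "RMSE": False, "RAE": False, "RRSE": False}


def improved(index: str, before: float, after: float) -> bool:
    """True when the value after variable selection beats the value before (marker a)."""
    return after > before if HIGHER_IS_BETTER[index] else after < before


def best_model(index: str, scores: Dict[str, float]) -> str:
    """Name of the model with the best score on one index (marker b)."""
    pick = max if HIGHER_IS_BETTER[index] else min
    return pick(scores, key=scores.__getitem__)
```

For instance, Kstar's CC with deleted rows under the percentage split rises from 0.546 to 0.638 (marked a), while the RBF Network's falls from 0.085 to 0.033 (unmarked).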
Results of the compared forecasting models under 10-fold cross-validation after variable selection.
| Methods | Index | RBF Network | Kstar | Random Forest | IBK | Random Tree |
|---|---|---|---|---|---|---|
| After variable selection | | | | | | |
| Delete the rows with missing data | CC | 0.103a | 0.665a | 0.737a | 0.233 | 0.529a |
| | MAE | 0.181a | 0.115a | 0.108a | 0.193a | 0.143a |
| | RMSE | 0.226a | 0.171a | 0.154a | 0.282a | 0.223a |
| | RAE | 0.984a | 0.627a | 0.589a | 1.047a | 0.774a |
| | RRSE | 0.994a | 0.749a | 0.677a | 1.238 | 0.977a |
| Series mean | CC | 0.081a | 0.688a | 0.751a | 0.295a | 0.547 |
| | MAE | 0.169a | 0.103a | 0.098a | 0.170a | 0.131 |
| | RMSE | 0.217a | 0.158a | 0.144 | 0.260a | 0.209 |
| | RAE | 0.988a | 0.600a | 0.571a | 0.990a | 0.767 |
| | RRSE | 0.996a | 0.727a | 0.661 | 1.193a | 0.960 |
| Linear | CC | 0.081a | 0.692a | 0.750 | 0.286a | 0.551a |
| | MAE | 0.171a | 0.102a | 0.098a | 0.169a | 0.128 |
| | RMSE | 0.218a | 0.158a | 0.145 | 0.261a | 0.207a |
| | RAE | 0.988a | 0.590a | 0.566a | 0.981a | 0.740 |
| | RRSE | 0.996a | 0.723a | 0.662 | 1.196a | 0.948a |
| Median of nearby points | CC | 0.083a | 0.692a | 0.752a | 0.305a | 0.555a |
| | MAE | 0.171a | 0.102a | | 0.169a | 0.126a |
| | RMSE | 0.218a | 0.158a | 0.144a | 0.259a | 0.208a |
| | RAE | 0.987a | 0.593a | | 0.980a | 0.732a |
| | RRSE | 0.996a | 0.722a | 0.660a | 1.186a | 0.951a |
| Mean of nearby points | CC | 0.082a | 0.694a | | 0.276a | 0.537 |
| | MAE | 0.171a | 0.102a | | 0.171a | 0.129 |
| | RMSE | 0.218a | 0.157a | | 0.263a | 0.210 |
| | RAE | 0.988a | 0.593a | 0.564a | 0.993a | 0.747 |
| | RRSE | 0.996a | 0.721a | | 1.204a | 0.960 |
| Regression | CC | 0.081a | 0.690a | | 0.295a | 0.572 |
| | MAE | 0.169a | 0.102a | 0.098a | 0.169a | 0.126 |
| | RMSE | 0.217a | 0.158a | | 0.259a | 0.204 |
| | RAE | 0.988a | 0.595a | 0.571a | 0.989a | 0.735 |
| | RRSE | 0.996a | 0.725a | | 1.193a | 0.938 |
a denotes improved performance after variable selection; b denotes the best performance among the five models after variable selection.