Heri Kuswanto, Achmad Naufal.
Abstract
East Nusa Tenggara Province is one of the regions in Indonesia most vulnerable to drought. Drought prediction is essential as a mitigation measure to minimize drought risk. However, the sparse station network makes it difficult to predict future droughts accurately in areas without meteorological stations, so a dataset with a finer spatial resolution is required. This research investigates the performance of the 3-month Standardized Precipitation Index (SPI) derived from the Tropical Rainfall Measuring Mission (TRMM) and the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) for predicting drought. CART and Random Forest are applied as the classification methods. Using several predictors, the analysis finds that CART has lower predictive skill than Random Forest. The average accuracy of the prediction using Random Forest reaches 100%, with an average Area Under the Curve (AUC) of about 0.8. The analysis also shows that predictions using the MERRA-2 dataset achieve higher accuracy and AUC than those using TRMM. Therefore, applying Random Forest to the MERRA-2 dataset can be an optimal way to predict drought in East Nusa Tenggara. Both methods confirmed that average soil surface temperature (day and night), the Multivariate ENSO Index (MEI), the Arctic Oscillation Index (AOI) and the Normalized Difference Vegetation Index (NDVI) are strong predictors of drought. Applying the Synthetic Minority Over-sampling Technique (SMOTE) improves the performance of both CART and Random Forest. The techniques described:
• translate drought information and predictors of drought into a base classifier that optimizes the AUC;
• allow drought to be predicted for many grid points efficiently and with high accuracy; and
• are computationally efficient and easy to implement.
Keywords: CART; Drought; Random forest; Random forest and CART; Remote-sensing
Year: 2019 PMID: 31193949 PMCID: PMC6545411 DOI: 10.1016/j.mex.2019.05.029
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Research Variables.
| Variable | Variable Name | Spatial Resolution | Scale |
|---|---|---|---|
| Y1 | Drought class from 3-month SPI (TRMM) | 0.25° × 0.25° | Categorical |
| Y2 | Drought class from 3-month SPI (MERRA-2) | 0.5° × 0.625° | Categorical |
| X1 | Average surface temperature (day) | 0.05° × 0.05° | Numeric |
| X2 | Average surface temperature (night) | 0.05° × 0.05° | Numeric |
| X3 | Normalized Difference Vegetation Index (NDVI) | 0.05° × 0.05° | Numeric |
| X4 | Multivariate ENSO Index (MEI) | – | Numeric |
| X5 | Arctic Oscillation Index (AOI) | – | Numeric |

Drought classes for Y1 and Y2 by SPI value: ≥ (−1.00) = Normal; (−1.00) to (−1.49) = Moderate; ≤ (−1.50) = Severe.
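The path from monthly precipitation to the Y1/Y2 drought classes can be sketched as follows, under stated assumptions. The usual SPI fits a gamma distribution to the 3-month totals before transforming them to a standard normal variate; for brevity this sketch swaps in a non-parametric (empirical) standardization based on the Gringorten plotting position, and it standardizes over the whole series rather than per calendar month. The function names, and the integer coding 0 = Normal, 1 = Moderate, 2 = Severe (assumed from the classes 0, 1, 2 in the cross-tabulation), are hypothetical.

```python
from statistics import NormalDist


def three_month_totals(monthly_precip):
    """Rolling 3-month precipitation totals; the first two months have no total."""
    totals = [None] * min(2, len(monthly_precip))
    for i in range(2, len(monthly_precip)):
        totals.append(sum(monthly_precip[i - 2:i + 1]))
    return totals


def empirical_spi(totals):
    """Empirical SPI (a stand-in for the gamma-fit SPI): Gringorten plotting
    position mapped through the inverse standard normal CDF.
    Assumes no tied totals; None entries are passed through."""
    valid = sorted(t for t in totals if t is not None)
    n = len(valid)
    std_normal = NormalDist()
    spi = []
    for t in totals:
        if t is None:
            spi.append(None)
            continue
        rank = valid.index(t) + 1          # rank 1 = driest 3-month total
        p = (rank - 0.44) / (n + 0.12)     # Gringorten plotting position, 0 < p < 1
        spi.append(std_normal.inv_cdf(p))
    return spi


def drought_class(spi_value):
    """Hypothetical coding of the table's classes: 0 = Normal, 1 = Moderate, 2 = Severe."""
    if spi_value >= -1.00:
        return 0
    if spi_value <= -1.50:
        return 2
    return 1  # between -1.00 and -1.50


monthly = [10, 40, 5, 80, 120, 60, 30]  # invented monthly rainfall (mm)
spi = empirical_spi(three_month_totals(monthly))
classes = [None if s is None else drought_class(s) for s in spi]
```

A production version would follow the gamma-fit SPI and compute it separately for each calendar month, as the standard index does.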
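Cross-tabulation of classification results.
| Actual Class | Predicted 0 | Predicted 1 | Predicted 2 | Total |
|---|---|---|---|---|
| 0 | $m_{00}$ | $m_{01}$ | $m_{02}$ | $M_{0\cdot}$ |
| 1 | $m_{10}$ | $m_{11}$ | $m_{12}$ | $M_{1\cdot}$ |
| 2 | $m_{20}$ | $m_{21}$ | $m_{22}$ | $M_{2\cdot}$ |
| Total | $M_{\cdot 0}$ | $M_{\cdot 1}$ | $M_{\cdot 2}$ | $M_{\cdot\cdot}$ |
where:
$m_{ij}$ ($i = j$) = the number of observations of class $i$ correctly predicted as belonging to class $j$.
$m_{ij}$ ($i \neq j$) = the number of observations of class $i$ incorrectly predicted as belonging to class $j$.
$M_{i\cdot}$ = number of observations of class $i$ (row total).
$M_{\cdot j}$ = number of observations predicted as class $j$ (column total).
$M_{\cdot\cdot}$ = total number of observations or predictions.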
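Overall accuracy follows from this cross-tabulation as the diagonal counts divided by the total, $\sum_i m_{ii} / M_{\cdot\cdot}$. A minimal sketch of that standard definition, assuming integer class labels 0, 1, 2; the function names are hypothetical and this is not the authors' code.

```python
def confusion_matrix(actual, predicted, n_classes=3):
    """Cross-tabulate actual classes (rows) against predicted classes (columns)."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m


def row_totals(m):
    """M_i. : number of observations actually in class i."""
    return [sum(row) for row in m]


def col_totals(m):
    """M_.j : number of observations predicted as class j."""
    return [sum(col) for col in zip(*m)]


def accuracy(m):
    """Share of observations on the diagonal: sum of m_ii over M_.."""
    total = sum(row_totals(m))
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total


cm = confusion_matrix(actual=[0, 0, 0, 1, 1, 2], predicted=[0, 0, 1, 1, 0, 2])
# cm == [[2, 1, 0], [1, 1, 0], [0, 0, 1]]; accuracy(cm) == 4 / 6
```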
Fig. 1. Illustration of the 10-fold cross-validation procedure.
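In 10-fold cross-validation the data are split into ten folds; each fold serves once as the test set while the other nine train the model, and the ten results are averaged. A minimal sketch, assuming the folds are stratified by drought class so each fold keeps roughly the class proportions of the whole sample (the source does not say whether the folds were stratified); the function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Assign row indices to k folds, dealing each class round-robin so that
    fold sizes and per-class counts differ by at most one between folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for j, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != j for i in fold]
        yield train, test


labels = [0] * 170 + [1] * 20 + [2] * 10  # invented, imbalanced class labels
folds = stratified_kfold(labels, k=10)
```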
Fig. 2. Drought characteristics in East Nusa Tenggara Province based on TRMM (left) and MERRA-2 (right) for (a) moderate and (b) severe levels.
Fig. 3. Complexity parameter (left) and classification tree (right) at the selected grid point.
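The complexity parameter in Fig. 3 governs cost-complexity pruning in CART: a subtree $T$ is scored as $R_\alpha(T) = R(T) + \alpha |T|$, where $R(T)$ is its misclassification rate and $|T|$ its number of terminal nodes, so larger $\alpha$ favours smaller trees. R's rpart package reports this penalty as a complexity parameter rescaled relative to the root-node error. The sketch below picks the candidate subtree with the lowest penalized cost; the candidate values are invented for illustration, not read from Fig. 3, and the function names are hypothetical.

```python
def cost_complexity(error, n_leaves, alpha):
    """R_alpha(T) = R(T) + alpha * |T|, with R(T) the misclassification rate."""
    return error + alpha * n_leaves


def select_subtree(candidates, alpha):
    """Return the (n_leaves, error) candidate with the smallest cost-complexity;
    ties are broken in favour of the smaller tree."""
    return min(candidates, key=lambda c: (cost_complexity(c[1], c[0], alpha), c[0]))


# Hypothetical nested subtrees: (number of terminal nodes, misclassification rate)
candidates = [(1, 0.14), (3, 0.10), (7, 0.08)]
best = select_subtree(candidates, alpha=0.01)  # -> (3, 0.10)
```

With no penalty the largest tree wins; as the penalty grows, the choice moves towards the single-node tree.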
Accuracy and AUC of CART for drought prediction using TRMM at (8.625°S; 120.125°E).
| Fold | Training Accuracy (%) | Training AUC | Testing Accuracy (%) | Testing AUC |
|---|---|---|---|---|
| Fold01 | 85.71 | 0.5000 | 84.21 | 0.5000 |
| Fold02 | 85.23 | 0.5000 | 88.89 | 0.5000 |
| Fold03 | 85.63 | 0.5000 | 85.00 | 0.5000 |
| Fold04 | 86.78 | 0.7806 | 80.00 | 0.6458 |
| Fold05 | 86.78 | 0.8690 | 80.00 | 0.8824 |
| Fold06 | 85.23 | 0.5000 | 88.89 | 0.5000 |
| Fold07 | 86.78 | 0.7933 | 70.00 | 0.5980 |
| Fold08 | 86.29 | 0.8401 | 73.68 | 0.4363 |
| Fold09 | 85.63 | 0.5000 | 85.00 | 0.5000 |
| Fold10 | 86.78 | 0.8661 | 80.00 | 0.4265 |
| Average | 86.08 | 0.6649 | 81.57 | 0.5489 |
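The AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The source does not state how the three-class AUC was computed, so the sketch below assumes one-vs-rest AUCs averaged over classes; the function names are hypothetical. It also illustrates one plausible reading of the folds above with an AUC of exactly 0.5000 (not stated in the source): if the tree made no split in a fold, every observation gets the same class scores, which yields an AUC of 0.5 while accuracy still equals the share of the majority class.

```python
def binary_auc(scores, is_positive):
    """AUC in Mann-Whitney form: P(positive score > negative score) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, labels):
    """Assumed multi-class scheme: one-vs-rest AUC per class, averaged over
    classes that have both positive and negative cases."""
    aucs = []
    for c in range(len(prob_rows[0])):
        is_pos = [label == c for label in labels]
        if any(is_pos) and not all(is_pos):
            aucs.append(binary_auc([row[c] for row in prob_rows], is_pos))
    return sum(aucs) / len(aucs)


# A "tree" with no split gives every observation the same class probabilities.
constant_probs = [[0.8, 0.15, 0.05]] * 5
print(macro_ovr_auc(constant_probs, [0, 0, 0, 1, 2]))
```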
Fig. 4. AUC plots for TRMM (left) and MERRA-2 (right) analysed using CART.
Fig. 5. Parameter settings for Random Forest.
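The parameter values shown in Fig. 5 are not recoverable from this text. To illustrate what Random Forest adds on top of CART, the sketch below shows its three ingredients: bootstrap resampling of rows (bagging), a random subset of predictors considered at each split, and a plurality vote across trees. The ⌊√p⌋ predictors per split follows the classification default of R's randomForest package; with the five predictors X1 to X5 that gives 2, but whether the authors kept that default is an assumption. Tree growing itself is omitted, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def bootstrap_sample(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag (OOB)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def default_mtry(n_predictors):
    """Number of predictors tried at each split: floor(sqrt(p)), at least 1."""
    return max(1, math.isqrt(n_predictors))


def split_candidates(n_predictors, mtry, rng):
    """Random subset of predictor indices considered at a single split."""
    return sorted(rng.sample(range(n_predictors), mtry))


def majority_vote(tree_predictions):
    """Combine the class labels predicted by individual trees by plurality vote."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(1)
in_bag, oob = bootstrap_sample(50, rng)
features = split_candidates(5, default_mtry(5), rng)
```

Each tree in the forest would be grown on its own bootstrap sample with fresh predictor subsets at every split; the out-of-bag rows give a built-in error estimate.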
Accuracy and AUC of Random Forest for drought prediction using MERRA-2 at (8.625°S; 120.125°E).
| Fold | Training Accuracy (%) | Training AUC | Testing Accuracy (%) | Testing AUC |
|---|---|---|---|---|
| Fold01 | 86.29 | 0.6036 | 84.21 | 0.4896 |
| Fold02 | 86.36 | 0.5830 | 88.89 | 0.6146 |
| Fold03 | 84.48 | 0.6149 | 85.00 | 0.7696 |
| Fold04 | 85.06 | 0.6120 | 80.00 | 0.5469 |
| Fold05 | 85.06 | 0.5908 | 90.00 | 0.8971 |
| Fold06 | 84.66 | 0.5923 | 88.89 | 0.5885 |
| Fold07 | 86.21 | 0.6139 | 85.00 | 0.3186 |
| Fold08 | 85.14 | 0.6412 | 84.21 | 0.3627 |
| Fold09 | 86.21 | 0.5889 | 85.00 | 0.8284 |
| Fold10 | 86.21 | 0.5898 | 85.00 | 0.6176 |
| Average | 85.57 | 0.6030 | 85.62 | 0.6034 |
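The Average row is the arithmetic mean of the ten folds. Recomputing it from the fold values transcribed from the table, together with the spread, shows how much the test AUC varies between folds (from 0.3186 to 0.8971), which the mean alone hides. A small sketch using Python's statistics module; the summary keys are arbitrary names.

```python
from statistics import mean, stdev

# Testing columns transcribed from the Random Forest / MERRA-2 table above.
test_accuracy = [84.21, 88.89, 85.00, 80.00, 90.00, 88.89, 85.00, 84.21, 85.00, 85.00]
test_auc = [0.4896, 0.6146, 0.7696, 0.5469, 0.8971, 0.5885, 0.3186, 0.3627, 0.8284, 0.6176]


def summarize(values):
    """Mean, sample standard deviation and range of per-fold results."""
    return {"mean": mean(values), "sd": stdev(values), "min": min(values), "max": max(values)}


acc_summary = summarize(test_accuracy)
auc_summary = summarize(test_auc)
```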
Fig. 6. AUC plots for TRMM (left) and MERRA-2 (right) analysed using Random Forest.
Fig. 7. Illustration of the SMOTE procedure.
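SMOTE balances the classes by creating synthetic minority-class observations instead of duplicating existing ones: each new point lies at a random position on the line segment between a minority observation and one of its k nearest minority-class neighbours. A minimal sketch of that interpolation step, assuming numeric predictors on comparable scales (Euclidean distance is used, so standardizing X1 to X5 first would be sensible) and k = 5 as a commonly used default; the function name and parameters are hypothetical.

```python
import math
import random


def smote(minority, per_sample, k=5, seed=0):
    """Create per_sample synthetic points for each minority observation by
    interpolating towards a randomly chosen one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority observations")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(x, minority[j]))[:k]
        for _ in range(per_sample):
            nn = minority[rng.choice(neighbours)]
            gap = rng.random()  # uniform position along the segment x -> nn
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Invented minority-class points lying on the line y = x.
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
new_points = smote(minority, per_sample=3, k=2)
```

Because every synthetic point is an interpolation between two existing minority points, the new points stay inside the region already occupied by that class.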
Fig. 8. Illustration of the SMOTE procedure within k-fold cross-validation.
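Fig. 8 presumably reflects the usual safeguard: oversampling is applied inside each cross-validation iteration and to the training folds only, so the held-out fold keeps its original, imbalanced class distribution and no synthetic point derived from test data leaks into training. The sketch below shows that ordering. To keep it short, a random-duplication oversampler stands in for SMOTE (any function mapping (X, y) to (X, y) could be passed instead), and the fold assignment is a plain unstratified shuffle; all names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Stand-in for SMOTE: duplicate random rows of each smaller class until
    every class present matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def cv_with_oversampling(X, y, k=10, oversample=random_oversample, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for each fold; only the
    training part is oversampled, the test fold is left untouched."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    folds = [order[j::k] for j in range(k)]
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in order if i not in test_set]
        X_tr, y_tr = oversample([X[i] for i in train_idx], [y[i] for i in train_idx])
        yield X_tr, y_tr, [X[i] for i in test_idx], [y[i] for i in test_idx]


X = [[float(i)] for i in range(40)]
y = [0] * 32 + [1] * 6 + [2] * 2  # invented, imbalanced labels
results = list(cv_with_oversampling(X, y, k=4))
```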
Fig. 9. Performance of SMOTE-CART (upper panel) and SMOTE-Random Forest (lower panel) with TRMM (left) and MERRA-2 (right).
Fig. 10. Summary statistics of the AUC values of CART and Random Forest with and without SMOTE.
| Subject Area | Environmental Science |
|---|---|
| More specific subject area | Drought prediction |
| Method name | Random Forest and CART |
| Name and references for original method | Random Forest and CART. Breiman, L. (1996) Bagging Predictors. Machine Learning, 24, 123–140. Breiman, L. (2001) Random Forests. Machine Learning, 45, 5–32. Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and Regression Trees. Wadsworth, Monterey, CA. |
| Resource availability | MERRA-2 reanalysis dataset (available online); TRMM satellite data (available online); R (open-source software for data processing) |