Zhichao Li¹, Helen Gurgel², Lei Xu³, Linsheng Yang¹, Jinwei Dong¹.
Abstract
Timely and accurate forecasts of dengue cases are of great importance for guiding disease prevention strategies, but they still face three challenges: (1) limited timeliness, because downloading and processing satellite data is time-consuming; (2) weak spatial representation, because the data depend on administrative-unit statistics or weather-station observations; and (3) stagnant accuracy, because historical case information is not used. Geospatial big data, cloud computing platforms (e.g., Google Earth Engine, GEE), and emerging deep learning algorithms (e.g., long short-term memory, LSTM) provide new opportunities for advancing these efforts. Here, we focused on the dengue epidemics in the urban agglomeration of the Federal District of Brazil (FDB) during 2007–2019. We proposed a new framework that combines geospatial big data analysis on the GEE platform with LSTM modeling to forecast dengue cases on an epidemiological-week basis. We first defined a buffer zone around the impervious area as the main area of dengue transmission, treating the impervious area as the human-dominated area and using the maximum flight range of Aedes aegypti and Aedes albopictus as the buffer distance. These zones served as units for further attribution analyses of dengue epidemics, with pixel values aggregated into the zones. Near-weekly composites of potential driving factors were generated in GEE for the epidemiological weeks of 2007–2019 from relevant geospatial data with daily or sub-daily temporal resolution. A multi-step-ahead LSTM model was used, with time-differenced, natural log-transformed dengue cases as the outcome. Two modeling scenarios (with and without historical dengue cases) were set up to examine the value of historical information for dengue forecasts.
The results indicate that performance was better when historical dengue cases were used, that the 5-week-ahead forecast performed best, and that the peak of a large outbreak in 2019 was accurately forecast. The proposed framework suggests the potential of the GEE platform, the LSTM algorithm, and historical information for dengue risk forecasting, and it can easily be applied to other regions or globally for timely and practical dengue forecasts.
Keywords: Google Earth Engine; LSTM; dengue; geospatial big data; risk forecasting
Year: 2022 PMID: 35205036 PMCID: PMC8869738 DOI: 10.3390/biology11020169
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. The framework of dengue risk forecasting based on the analysis of geospatial big data in GEE and LSTM modeling.
Figure 2. Geolocation of the Federal District of Brazil (a) and the number of dengue cases per epidemiological week during 2007–2019 (b). The impervious land indicates the human-dominated area, and the 1 km buffer zone indicates the main area of dengue transmission.
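The buffer-zone step (impervious pixels as the human-dominated area, the vector flight range as the buffer distance, then pixel values aggregated into the zone) can be illustrated outside GEE with NumPy. This is a minimal sketch under stated assumptions, not the authors' GEE code: the toy rasters, the 250 m pixel size, and the helper names `buffer_mask` and `zonal_mean` are hypothetical; only the 1 km buffer distance comes from Figure 2. A pixel counts as inside the buffer when its centre lies within the buffer distance of any impervious pixel centre.

```python
import numpy as np


def buffer_mask(impervious: np.ndarray, distance_m: float, pixel_m: float) -> np.ndarray:
    """Boolean mask of pixels whose centres lie within distance_m (Euclidean)
    of any impervious pixel centre. Brute force, so only suitable for small grids."""
    rows, cols = np.indices(impervious.shape)
    src_r, src_c = np.nonzero(impervious)
    if src_r.size == 0:
        return np.zeros(impervious.shape, dtype=bool)
    # Squared pixel distance from every pixel to every impervious pixel: shape (H, W, K).
    d2 = (rows[..., None] - src_r) ** 2 + (cols[..., None] - src_c) ** 2
    radius_px = distance_m / pixel_m
    return d2.min(axis=-1) <= radius_px ** 2


def zonal_mean(values: np.ndarray, zone: np.ndarray) -> float:
    """Average a raster (e.g. NDVI) over the zone, ignoring NaN (e.g. cloud-masked) pixels."""
    return float(np.nanmean(values[zone]))


# Hypothetical 9 x 9 grid of 250 m pixels with one built-up pixel in the centre.
impervious = np.zeros((9, 9), dtype=bool)
impervious[4, 4] = True
zone = buffer_mask(impervious, distance_m=1000, pixel_m=250)  # 1 km = 4 pixels
ndvi = np.full((9, 9), 0.5)
print(int(zone.sum()), zonal_mean(ndvi, zone))
```

On a real impervious map the same two steps would run per zone and per week; in GEE they would be done with the platform's own distance and region-reduction operations rather than this brute-force loop.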
Summary of explanatory factors and data sources used in this study.
| Explanatory Factor | Unit | Weekly Aggregation | Data Source | Temporal Resolution | Spatial Resolution |
|---|---|---|---|---|---|
| Log-transformed weekly dengue cases | Number | Sum | SINAN | Weekly (epi week) | City |
| dLSTmean | °C | Average | MOD11A1 | Daily | 1000 m |
| nLSTmean | °C | Average | MOD11A1 | Daily | 1000 m |
| NDVImean | - | Average | MOD09GA | Daily | 500 m |
| EVImean | - | Average | MOD09GA | Daily | 500 m |
| Rsum | mm | Sum | TRMM 3B42 | 3-hourly | 0.25° × 0.25° |
| Tmean | °C | Average | GLDAS-2.1 | Daily | 0.25° × 0.25° |
| RHmean | % | Average | GLDAS-2.1 | Daily | 0.25° × 0.25° |
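The Sum and Average entries in the table indicate how finer-grained records are collapsed into one value per epidemiological week: summed for rainfall and case counts, averaged for the other factors. A minimal standard-library sketch, assuming epidemiological weeks run Sunday to Saturday and keying each week by its opening Sunday instead of by epi-week number; `week_start`, `weekly_composite`, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def week_start(d: date) -> date:
    """Sunday that opens the (assumed Sunday-to-Saturday) week containing d."""
    # date.weekday(): Monday=0 ... Sunday=6, so days since Sunday = (weekday + 1) % 7.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_composite(daily: dict[date, float], how: str) -> dict[date, float]:
    """Collapse daily values into one value per week, using 'sum' or 'mean'."""
    groups: dict[date, list[float]] = defaultdict(list)
    for d, v in daily.items():
        groups[week_start(d)].append(v)
    agg = {"sum": sum, "mean": mean}[how]
    return {wk: agg(vals) for wk, vals in sorted(groups.items())}


# Hypothetical daily rainfall (mm); 5 Jan 2019 is a Saturday, 6 Jan a Sunday.
rain = {
    date(2019, 1, 5): 2.0,
    date(2019, 1, 6): 1.0,
    date(2019, 1, 7): 3.0,
    date(2019, 1, 12): 4.0,
}
print(weekly_composite(rain, "sum"))   # Rsum-style weekly totals
print(weekly_composite(rain, "mean"))  # Tmean-style weekly averages
```

Keying by the opening Sunday sidesteps the year-boundary rules for numbering epi weeks; mapping keys to official week numbers would be a separate lookup.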
Results of the augmented Dickey–Fuller (ADF) and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) stationarity tests for the time series of dengue data and external factors.
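| Time Series | ADF Statistic | KPSS Statistic |
|---|---|---|
| Weekly dengue cases | −6.28 * | 0.399 ** |
| Natural log-transformed weekly dengue cases | −5.919 * | 0.789 ** |
| Time-differenced natural log-transformed weekly dengue cases | −5.67 * | 0.068 * |
| NDVImean | −7.875 * | 0.061 * |
| RHmean | −7.662 * | 0.293 * |
| Rsum | −8.387 * | 0.052 * |
| Tmean | −7.497 * | 1.008 ** |
| Critical value (1% level) | −3.4401 | 0.739 |
| Critical value (5% level) | −2.8658 | 0.463 |
| Critical value (10% level) | −2.569 | 0.347 |
* Stationary; ** non-stationary. The ADF test indicates stationarity when its statistic is below (more negative than) the critical value; the KPSS test indicates non-stationarity when its statistic exceeds the critical value.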
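Because the two tests have opposite null hypotheses (ADF: unit root; KPSS: stationarity), the markers in the table come from opposite comparisons. The sketch below re-derives the markers from statistics and critical values copied from the table; the helper name `marks` is hypothetical, and the 10% level is our choice because it is the level that reproduces the KPSS markers (e.g. 0.399 exceeds only the 10% critical value of 0.347). Computing the statistics themselves would need a library such as statsmodels (`adfuller`, `kpss`), which is not shown.

```python
# Critical values copied from the table, keyed by significance level.
ADF_CRIT = {"1%": -3.4401, "5%": -2.8658, "10%": -2.569}
KPSS_CRIT = {"1%": 0.739, "5%": 0.463, "10%": 0.347}


def marks(adf_stat: float, kpss_stat: float, level: str = "10%") -> tuple[str, str]:
    """Return the table's markers per test: '*' stationary, '**' non-stationary."""
    # ADF: reject the unit-root null (=> stationary) when the statistic is below the critical value.
    adf = "*" if adf_stat < ADF_CRIT[level] else "**"
    # KPSS: reject the stationarity null (=> non-stationary) when the statistic exceeds it.
    kp = "*" if kpss_stat <= KPSS_CRIT[level] else "**"
    return adf, kp


for name, (a, k) in {
    "Weekly dengue cases": (-6.28, 0.399),
    "Time-differenced log cases": (-5.67, 0.068),
    "RHmean": (-7.662, 0.293),
    "Tmean": (-7.497, 1.008),
}.items():
    print(name, marks(a, k))
```

Among the dengue series, only the time-differenced log-transformed cases are marked stationary by both tests, which is consistent with using that series as the model outcome.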
Figure 3. The correlations among the climate and environmental factors (A) and the temporal patterns of the natural log-transformed weekly dengue cases and four selected driving factors (NDVImean, RHmean, Rsum, and Tmean) during 2007–2019 (B–E). One asterisk (*) and two asterisks (**) in (A) indicate that the correlation coefficient is significant at p < 0.05 and p < 0.01, respectively.
The parameters of the LSTM models used in this study. Time step is the number of consecutive weekly inputs used to make each prediction. The loss function measures the difference between predicted and observed values. Number of units is the number of units in the LSTM layer. Epoch is the number of complete passes through all data in the training set. Batch size is the number of samples used for each update of the LSTM parameters. Learning rate is the step size used when updating the LSTM parameters. Optimizer is the algorithm used to update the parameters. Dropout rate is the fraction of units in the LSTM layer that are randomly dropped during model training. The two sets of LSTM parameters were tuned separately by comparing the RMSE and MAE computed on the validation dataset.
| Parameters | LSTM with NDVImean, RHmean, Rsum and Tmean | LSTM with Historical Dengue Data, NDVImean, RHmean, Rsum and Tmean |
|---|---|---|
| Time step | 12 | 12 |
| Loss function | MSE | MSE |
| Number of units | 64 | 64 |
| Epoch | 1150 | 2000 |
| Batch size | 12 | 12 |
| Learning rate | 0.005 | 0.001 |
| Optimizer | Adam | Adam |
| Dropout rate | 0.8 | 0.65 |
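With a time step of 12, each training sample gives the LSTM the previous 12 epidemiological weeks of input features. A minimal NumPy sketch of how such samples and their 1- to 12-week-ahead targets might be assembled; the direct multi-output layout (one target column per horizon), the function name `make_windows`, and the toy arrays are assumptions, since the exact arrangement is not described here.

```python
import numpy as np


def make_windows(features: np.ndarray, target: np.ndarray,
                 time_step: int = 12, horizon: int = 12):
    """Build LSTM samples (hypothetical helper).

    features: (T, F) weekly inputs, e.g. NDVImean, RHmean, Rsum, Tmean and, in the
              scenario with historical dengue data, past case values.
    target:   (T,) weekly outcome (time-differenced natural-log cases).
    Returns X of shape (N, time_step, F) and y of shape (N, horizon), where
    y[i, h - 1] is the target h weeks after the last input week of X[i].
    """
    T = len(target)
    n = T - time_step - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this time_step/horizon")
    X = np.stack([features[i:i + time_step] for i in range(n)])
    y = np.stack([target[i + time_step:i + time_step + horizon] for i in range(n)])
    return X, y


# Toy data: 40 weeks, 4 features.
features = np.arange(160, dtype=float).reshape(40, 4)
target = np.arange(40, dtype=float)
X, y = make_windows(features, target)
print(X.shape, y.shape)
```

Samples are taken in time order without shuffling across the split point, so that validation and test weeks (e.g. 2018–2019) stay strictly after the training weeks.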
Accuracy comparison of multi-step-ahead LSTM modeling with two groups of input features and ARIMA using root-mean-square error (RMSE) and mean absolute error (MAE). The two indices were computed based on the actual and predicted weekly changes in natural log-transformed dengue cases.
| Model | Input Features | Forecast Horizon | RMSE (2018–2019) | MAE (2018–2019) | RMSE (2019 Peak Period) | MAE (2019 Peak Period) |
|---|---|---|---|---|---|---|
| LSTM | NDVImean, RHmean, Rsum and Tmean | 1-week | 0.36 | 0.29 | 0.28 | 0.23 |
| | | 2-week | 0.35 | 0.28 | 0.30 | 0.23 |
| | | 3-week | 0.36 | 0.28 | 0.34 | 0.26 |
| | | 4-week | 0.32 | 0.25 | 0.22 | 0.18 |
| | | 5-week | 0.36 | 0.29 | 0.29 | 0.24 |
| | | 6-week | 0.36 | 0.29 | 0.31 | 0.25 |
| | | 7-week | 0.38 | 0.30 | 0.35 | 0.29 |
| | | 8-week | 0.37 | 0.29 | 0.36 | 0.28 |
| | | 9-week | 0.38 | 0.30 | 0.34 | 0.29 |
| | | 10-week | 0.36 | 0.29 | 0.34 | 0.27 |
| | | 11-week | 0.36 | 0.29 | 0.34 | 0.29 |
| | | 12-week | 0.36 | 0.27 | 0.31 | 0.25 |
| LSTM | Historical dengue data, NDVImean, RHmean, Rsum and Tmean | 1-week | 0.35 | 0.27 | 0.23 | 0.20 |
| | | 2-week | 0.34 | 0.27 | 0.22 | 0.19 |
| | | 3-week | 0.34 | 0.27 | 0.25 | 0.20 |
| | | 4-week | 0.35 | 0.26 | 0.25 | 0.21 |
| | | 5-week | 0.34 | 0.27 | 0.22 | 0.19 |
| | | 6-week | 0.40 | 0.31 | 0.26 | 0.21 |
| | | 7-week | 0.37 | 0.30 | 0.28 | 0.22 |
| | | 8-week | 0.38 | 0.29 | 0.29 | 0.23 |
| | | 9-week | 0.38 | 0.29 | 0.32 | 0.27 |
| | | 10-week | 0.39 | 0.31 | 0.28 | 0.22 |
| | | 11-week | 0.34 | 0.27 | 0.28 | 0.23 |
| | | 12-week | 0.40 | 0.33 | 0.33 | 0.28 |
| ARIMA (3, 1, 2) (baseline) | | | 1.60 | 1.18 | 2.68 | 2.51 |
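Both indices are computed on weekly changes in natural log-transformed cases rather than on raw counts. For n forecast weeks with observed changes y and predicted changes ŷ, RMSE = sqrt(mean((y - ŷ)²)) and MAE = mean(|y - ŷ|). A minimal NumPy sketch; the numbers below are made up for illustration and are not values from the study.

```python
import numpy as np


def rmse(obs, pred) -> float:
    """Root-mean-square error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred) -> float:
    """Mean absolute error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred)))


# Hypothetical observed and predicted weekly changes in ln(cases).
obs = [0.10, -0.20, 0.30, 0.00]
pred = [0.00, -0.10, 0.10, 0.20]
print(rmse(obs, pred), mae(obs, pred))
```

Because the errors are on a log scale, an absolute error e in a weekly change means the predicted week-to-week ratio of cases is off by a factor of exp(e), for example about 1.31 for e = 0.27.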
Figure 4. The 1- to 12-week-ahead predictions with two types of input features for the FDB. The red points represent the observed number of cases per epi week during 2018–2019. The blue interval represents the number of cases per epi week predicted using LSTM with historical dengue data, climate factors, and environmental factors. The grey interval represents the number of cases per epi week predicted using LSTM with climate and environmental factors.
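Figure 4 shows case counts, whereas the model outcome is the weekly change in natural log-transformed cases, so predictions must be mapped back to counts. A minimal sketch of the forward transform and its inverse, assuming an offset of 1 is added before taking logs so that zero-case weeks are defined (the record says only "natural log-transformed", so the offset is our assumption); `to_log_diff` and `to_counts` are hypothetical names. The inverse chains predicted changes from the last observed week; if each horizon were instead anchored to observed values, only the single relevant change would be added.

```python
import numpy as np

OFFSET = 1.0  # assumption: ln(cases + 1) so that zero-case weeks are defined


def to_log_diff(cases) -> np.ndarray:
    """Outcome series: d[t] = ln(cases[t] + OFFSET) - ln(cases[t - 1] + OFFSET)."""
    return np.diff(np.log(np.asarray(cases, float) + OFFSET))


def to_counts(last_observed: float, predicted_diffs) -> np.ndarray:
    """Invert the transform for h = 1..H weeks after the last observed week.

    Each change is relative to the previous week, so the log level h weeks
    ahead is ln(last_observed + OFFSET) plus the cumulative sum of changes.
    """
    log_level = np.log(last_observed + OFFSET) + np.cumsum(np.asarray(predicted_diffs, float))
    return np.exp(log_level) - OFFSET


# Round trip on hypothetical counts: each week doubles (cases + 1).
cases = [9, 19, 39, 79]
diffs = to_log_diff(cases)
print(diffs, to_counts(cases[0], diffs))
```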