G Wang1, W Wei2, J Jiang2, C Ning1, H Chen3, J Huang2, B Liang2, N Zang1, Y Liao1, R Chen1, J Lai1, O Zhou1, J Han1, H Liang1, L Ye2.
Abstract
Guangxi, a province in southwestern China, has the second highest reported number of HIV/AIDS cases in China. This study aimed to develop an accurate and effective model to describe the trend of HIV incidence in Guangxi and to predict it. HIV incidence data for Guangxi from 2005 to 2016 were obtained from the database of the Chinese Center for Disease Control and Prevention. Long short-term memory (LSTM) neural network models, autoregressive integrated moving average (ARIMA) models, generalised regression neural network (GRNN) models and exponential smoothing (ES) were used to fit the incidence data. Data from 2015 and 2016 were used to validate the most suitable models. Model performance was evaluated using four metrics: mean square error (MSE), root mean square error, mean absolute error and mean absolute percentage error. The LSTM model had the lowest MSE when the N value (time step) was 12. The most appropriate ARIMA models for incidence in 2015 and 2016 were ARIMA (1, 1, 2) (0, 1, 2)12 and ARIMA (2, 1, 0) (1, 1, 2)12, respectively. The accuracy of the GRNN and ES models in forecasting HIV incidence in Guangxi was relatively poor. All four performance metrics of the LSTM model were lower than those of the ARIMA, GRNN and ES models. The LSTM model was more effective than the other time-series models and is important for the monitoring and control of local HIV epidemics.
Keywords: ARIMA model; HIV; LSTM model; incidence; prediction
Year: 2019 PMID: 31364559 PMCID: PMC6518582 DOI: 10.1017/S095026881900075X
Source DB: PubMed Journal: Epidemiol Infect ISSN: 0950-2688 Impact factor: 2.451
Fig. 1. Diagram of the LSTM neural network cell. The input gate (i) determines which information needs to be updated in the cell state; the forget gate (f) controls which information is discarded from the cell state; the input gate and a candidate vector created by a tanh layer then determine which new information is stored, turning the old cell state into the new cell state (c). Finally, the cell state is filtered by the output gate (o) to update the hidden state (h), which is the output of the LSTM cell.
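The gate interactions in the Fig. 1 caption can be written out as one cell update. The following is a minimal sketch of the standard LSTM formulation with scalar input and state, not the authors' implementation; the function names and the `params` layout (one hypothetical `(w_x, w_h, b)` triple per gate) are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One update of a scalar LSTM cell.

    params maps each of "f", "i", "g", "o" to a hypothetical
    (w_x, w_h, b) triple: input weight, recurrent weight, bias.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("f"))    # forget gate: how much old state to keep
    i = sigmoid(pre_activation("i"))    # input gate: how much new information to admit
    g = math.tanh(pre_activation("g"))  # candidate vector created by tanh
    o = sigmoid(pre_activation("o"))    # output gate: how much state to expose

    c = f * c_prev + i * g              # new cell state
    h = o * math.tanh(c)                # new hidden state, the output of the cell
    return h, c
```

In practice each quantity is a vector and the weights are matrices learned during training; the scalar form only makes the data flow between the gates explicit.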
Fig. 2. Monthly incidence of HIV in Guangxi, China (from January 2005 to December 2015). The trend section shows that HIV incidence has a seasonal pattern (s = 12). From 2005 to 2011, HIV incidence in Guangxi increased slowly; from 2011 to 2016 it declined slowly, with seasonal variation.
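A seasonal period of 12 on monthly data is usually handled by differencing each value against the same month one year earlier, which is what the seasonal differencing order D = 1 with period 12 denotes in a seasonal ARIMA model. A minimal sketch, assuming the series is a plain list of monthly values (the function name is hypothetical and the example values are synthetic, not the study's data):

```python
def seasonal_difference(series, period=12):
    """Return series[t] - series[t - period] for every t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Synthetic illustration: a fixed 12-month pattern repeated for two years.
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
two_years = pattern + pattern
print(seasonal_difference(two_years))  # every difference is zero
```

The differenced series is shorter than the input by one period, so the first 12 months have no differenced value.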
Fig. 4. Forecasting curves of the optimal LSTM model and the other models, together with the actual HIV incidence series. LSTM, the long short-term memory neural network model; ARIMA, the autoregressive integrated moving average model. The black line shows the actual data, the blue dashed line the values predicted by the LSTM model, the red dashed line the values predicted by the ARIMA model, the yellow dashed line the values predicted by the SES model and the green dashed line the values predicted by the GRNN model. Compared with ARIMA, SES and GRNN, the values predicted by LSTM were closer to the actual values.
Fig. 3. The MSE of LSTM models with different N values for HIV incidence in 2015 and 2016. MSE, mean square error; N, the number of inputs to the LSTM model. The yellow line shows the MSE for each N value in 2015, and the purple line shows the MSE for each N value in 2016. In both 2015 and 2016, the model had its minimum MSE when N was 12.
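The N value in Fig. 3 is the number of past observations fed to the LSTM for each prediction. One common way to prepare such inputs is a sliding window over the series, sketched below. This excerpt does not describe how the authors built their training samples, so the one-step-ahead framing and the function name are assumptions.

```python
def make_windows(series, n_steps=12):
    """Split a series into (inputs, targets) for one-step-ahead forecasting.

    Each input holds n_steps consecutive values; its target is the value
    that immediately follows the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - n_steps):
        inputs.append(series[start:start + n_steps])
        targets.append(series[start + n_steps])
    return inputs, targets
```

With N = 12 on monthly data, each prediction sees exactly one full year of history, which matches the seasonal period noted for the series.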
Performance of LSTM and the other models in 2015 and 2016
| Model | MSE (2015) | RMSE (2015) | MAE (2015) | MAPE (2015) | MSE (2016) | RMSE (2016) | MAE (2016) | MAPE (2016) |
|---|---|---|---|---|---|---|---|---|
| LSTM | 0.0308 | 0.1755 | 0.1231 | 0.1588 | 0.0026 | 0.4189 | 0.0103 | 0.0966 |
| ARIMA | 0.0357 | 0.1888 | 0.1672 | 0.1925 | 0.0030 | 0.4345 | 0.0139 | 0.4926 |
| GRNN | 0.2005 | 0.2612 | 0.0557 | 0.2359 | 0.2100 | 0.1983 | 0.0489 | 0.2212 |
| SES | 0.0640 | 0.2458 | 0.2093 | 0.2577 | 0.1300 | 0.3606 | 0.3201 | 0.3162 |
| ES | 0.0371 | 0.1927 | 0.1663 | 0.1941 | 0.1118 | 0.3344 | 0.2972 | 0.2859 |
MSE, mean square error; RMSE, root mean square error; MAE, mean absolute error; MAPE, mean absolute percentage error.
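For reference, the four metrics can be computed from a set of actual and predicted values as sketched below. This uses the standard definitions, with MAPE returned as a fraction rather than a percentage, which is an assumption about the scale used in the table. The function name and the sample values are illustrative only and are not taken from the study.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE for two equal-length sequences.

    MAPE divides by the actual values, so it is undefined (and raises
    ZeroDivisionError) if any actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Made-up values, for illustration only.
print(forecast_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))
```

Because RMSE is the square root of MSE, the two always rank models identically on the same validation set; MAE and MAPE can rank them differently because they weight large errors less heavily.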