Bi Fan, Jiaxuan Peng, Hainan Guo, Haobin Gu, Kangkang Xu, Tingting Wu.
Abstract
BACKGROUND: Emergency department (ED) overcrowding is a pressing global health care issue, driven mainly by the uncertainty of patient arrivals, especially during a pandemic. Accurate forecasting of patient arrivals allows health resources to be allocated in advance to reduce overcrowding. Current forecasting models rely primarily on traditional data such as historical patient visits, weather, holidays, and the calendar. Data from internet search engines (eg, Google) are less studied, even though they provide pivotal real-time surveillance information; such data can improve forecasting performance and provide early warning, especially during an epidemic. Moreover, possible nonlinearities between patient arrivals and these variables are often ignored.
Keywords: emergency department; internet search index; machine learning; nonlinear model; patient arrival forecasting
Year: 2022 PMID: 35857360 PMCID: PMC9350824 DOI: 10.2196/34504
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. A framework of the intelligent forecasting system with the internet search index. ANN: artificial neural network; ARIMA: autoregressive integrated moving average model; ARIMAX: ARIMA with explanatory variables; ELM: extreme learning machine; GLM: generalized linear model; LSTM: long short-term memory; RF: random forest; SVM: support vector machine.
Figure 2. Weekly patient arrivals to the ED. ED: emergency department.
Figure 3. Boxplot of monthly ED patient volumes. ED: emergency department.
Figure 4. Boxplot of ED patient volumes on holidays. ED: emergency department; HOL: holidays; NON: nonholidays; SCH: school closure.
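The framework in Figure 1 feeds historical arrivals and explanatory variables (calendar, holiday, weather, and the internet search index) into the candidate models. As a minimal sketch of the feature-assembly step (the lag orders, function name, and toy data below are illustrative assumptions, not taken from the paper):

```python
# Build a lagged design matrix for weekly ED arrival forecasting:
# each row pairs recent arrival counts with a lagged search index value.
# The lag orders (1-3 for arrivals, 1 for the search index) are
# illustrative choices, not the paper's settings.
import numpy as np

def make_lagged_features(arrivals, search_index, arrival_lags=(1, 2, 3), search_lag=1):
    """Return (X, y): rows are weeks, columns are lagged predictors."""
    max_lag = max(max(arrival_lags), search_lag)
    rows = []
    for t in range(max_lag, len(arrivals)):
        row = [arrivals[t - lag] for lag in arrival_lags]
        row.append(search_index[t - search_lag])
        rows.append(row)
    return np.asarray(rows, dtype=float), np.asarray(arrivals[max_lag:], dtype=float)

# Toy example: 6 weeks of arrivals and a search index.
arrivals = [100, 110, 105, 120, 115, 130]
search = [10, 12, 11, 14, 13, 15]
X, y = make_lagged_features(arrivals, search)
```

Any of the nonlinear models named in Figure 1 (ANN, SVM, RF, LSTM, ELM) can then be trained on `(X, y)`.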
Initial search queries related to emergency department patients
| Aspects | Index |
| Names | 癌症 (cancer) |
| Causes | 天氣 (weather) |
| Symptoms | 喉嚨痛 (sore throat) |
| Treatments | 克拉汀 (Claritin) |
| Others | 蜂蜜 (honey) |
Maximum correlation coefficients of search queries from Google Trends.
| Number | Index | Aspect | Lag^a | Correlation coefficient | P value^b |
| 1 | ginger | Treatment | 1 | –0.33 | .02 |
| 2 | swine flu^c symptoms | Disease | 1 | 0.50 | <.001 |
| 3 | infect | Symptom | 1 | 0.36 | .01 |
| 4 | 衛生署 (Department of Health) | Others | 1 | 0.43 | .00 |
| 5 | fever | Symptom | 2 | 0.31 | .04 |
| 6 | 豬流感 診所 (swine flu clinic) | Disease | 2 | 0.49 | <.001 |
| 7 | 牙醫 (dentist) | Others | 2 | –0.32 | .04 |
| 8 | 肠病毒 (enterovirus) | Disease | 6 | 0.38 | <.001 |
| 9 | cough | Symptom | 7 | –0.42 | <.001 |
^a The unit of lag is week(s).
^b The P value is modified by false discovery rate (significance level = .05).
^c Swine flu is the nickname of H1N1 influenza in Hong Kong.
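The screening summarized above pairs a lagged Pearson correlation with a false-discovery-rate adjustment of the P values (footnotes a and b). A sketch of that procedure, assuming weekly series and the Benjamini-Hochberg step-up adjustment (the function names are invented for illustration):

```python
# Correlate a Google Trends query with ED arrivals at lags 1..max_lag,
# then adjust the p-values with the Benjamini-Hochberg procedure.
import numpy as np
from scipy.stats import pearsonr

def lagged_correlations(arrivals, index, max_lag=8):
    """(lag, r, p) triples; the index leads arrivals by `lag` weeks."""
    out = []
    for lag in range(1, max_lag + 1):
        r, p = pearsonr(index[:-lag], arrivals[lag:])
        out.append((lag, r, p))
    return out

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from the largest p downward
        i = order[rank - 1]
        running_min = min(running_min, p[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For each query, the lag with the largest absolute correlation whose adjusted P value falls below .05 would be retained, matching the table's "maximum correlation coefficient" selection.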
Figure 5. Trend of ED patient arrivals and the internet search index. ED: emergency department.
Prediction performance of weekly emergency department patient arrivals.
| Model | Training MAPE^a (%) | Training RMSE^b | Testing MAPE (%) | Testing RMSE |
| Historical arrival data only | | | | |
| ARIMA^c | 3.6 | 17.02 | 5.2 | 24.28 |
| ANN^d | 3.5 | 19.19 | 3.6 | 19.20 |
| SVM^e | 2.2 | 18.52 | 4.1 | 18.59 |
| RF^f | 2.5 | 19.36 | 3.0 | 19.67 |
| LSTM^g | 2.9 | 16.93 | 4.2 | 20.79 |
| ELM^h | 2.8 | 16.52 | 3.2 | 16.99 |
| Traditional data (without internet search index) | | | | |
| GLM^i | 3.2 | 16.79 | 5.3 | 24.44 |
| ARIMAX^j | 3.5 | 17.85 | 5.1 | 23.16 |
| ANN | 3.4 | 16.10 | 4.0 | 18.48 |
| SVM | 2.8 | 16.24 | 3.9 | 17.45 |
| RF | 2.9 | 17.05 | 3.7 | 18.36 |
| LSTM | 3.2 | 16.43 | 4.4 | 19.94 |
| ELM | 2.7 | 13.17 | 3.5 | 16.72 |
| Traditional data and internet search index | | | | |
| GLM | 3.2 | 16.24 | 5.0 | 23.18 |
| ARIMAX | 3.4 | 17.84 | 5.1 | 22.00 |
| ANN | 3.0 | 14.51 | 3.3 | 15.45 |
| SVM | 2.6 | 14.84 | 3.1 | 15.09 |
| RF | 2.9 | 15.92 | 3.3 | 16.32 |
| LSTM | 3.0 | 15.15 | 3.4 | 16.69 |
| ELM | 2.6 | 13.10 | 3.0 | 14.55 |
^a MAPE: mean absolute percentage error.
^b RMSE: root mean square error.
^c ARIMA: autoregressive integrated moving average.
^d ANN: artificial neural network.
^e SVM: support vector machine.
^f RF: random forest.
^g LSTM: long short-term memory.
^h ELM: extreme learning machine.
^i GLM: generalized linear model.
^j ARIMAX: ARIMA with explanatory variables.
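The table's metrics, and the best-performing model (the ELM), are both simple to state. Below is a hedged sketch: MAPE and RMSE as defined in footnotes a and b, plus a minimal ELM in which a random tanh hidden layer is fixed and only the output weights are solved in closed form by least squares. The hidden-layer size, random seed, and toy data are assumptions, not the paper's settings.

```python
# MAPE/RMSE metrics and a minimal extreme learning machine (ELM):
# random input weights, tanh hidden layer, closed-form output weights.
import numpy as np

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

class ELM:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # The hidden layer is random and never trained; only beta is fit.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```

Because training reduces to a single least-squares solve, an ELM is fast to refit, which is convenient when forecasts must be updated every week.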
DM^a test results on the testing data set (models trained on the same data set).
| Test model | Reference model^b | | | | | |
| | GLM^c | ARIMAX^d | ANN^e | SVM^f | RF^g | LSTM^h |
| Without internet search index | | | | | | |
| ELM | 2.8297 (<.001) | 3.0624 (<.001) | 2.012 (<.001) | 1.8178 (<.001) | 2.8481 (<.001) | 2.1002 (<.001) |
| GLM | | 0.2935 (.31) | 0.86595 (.11) | 1.0707 (.06) | 0.7643 (.09) | 0.1663 (.54) |
| ARIMAX | | | 0.64435 (.17) | 1.0691 (.06) | 0.3957 (.23) | 0.4876 (.68) |
| ANN | | | | 0.13244 (.38) | 0.2746 (.45) | 1.9512 (.01) |
| SVM | | | | | 0.5823 (.27) | 0.8714 (.12) |
| RF | | | | | | 1.0045 (.08) |
| With internet search index | | | | | | |
| ELM | 2.5062 (<.001) | 3.79 (<.001) | 2.0047 (<.001) | 2.0325 (<.001) | 2.0476 (<.001) | 1.6659 (.02) |
| GLM | | 0.32675 (.30) | 1.1462 (.12) | 1.6064 (.07) | 1.0467 (.09) | 0.3647 (.64) |
| ARIMAX | | | 1.7314 (.06) | 2.2885 (.05) | 1.5946 (.11) | 1.2671 (.07) |
| ANN | | | | 0.14419 (.40) | 0.2104 (.49) | 1.2304 (.08) |
| SVM | | | | | 0.2593 (.36) | 1.4391 (.04) |
| RF | | | | | | 1.2992 (.06) |
^a DM: Diebold-Mariano.
^b Values are presented as the Diebold-Mariano statistic, with the P value (modified by false discovery rate) in parentheses; the significance level is .05.
^c GLM: generalized linear model.
^d ARIMAX: ARIMA with explanatory variables.
^e ANN: artificial neural network.
^f SVM: support vector machine.
^g RF: random forest.
^h LSTM: long short-term memory.
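The statistics above come from the Diebold-Mariano (DM) test for equal predictive accuracy. A simplified sketch (squared-error loss, one-step forecasts, no autocovariance correction in the variance; production analyses typically add a HAC variance estimate):

```python
# Simplified Diebold-Mariano test: compares two forecast-error series
# under squared-error loss. A negative statistic favors model 1
# (its losses are smaller on average).
import math
import numpy as np

def dm_test(errors1, errors2):
    """Return (DM statistic, two-sided normal p) for the loss differential."""
    d = np.asarray(errors1, float) ** 2 - np.asarray(errors2, float) ** 2
    n = len(d)
    stat = d.mean() / math.sqrt(d.var(ddof=1) / n)
    # Two-sided p from the standard normal asymptotic distribution.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(stat) / math.sqrt(2.0))))
    return stat, p
```

In the tables, large positive statistics with small FDR-adjusted P values indicate the test model (rows) significantly outperforms the reference model (columns).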
DM test results on the testing data set (models with internet data vs models without internet data).
| Test model (with internet data) | Reference model (without internet data)^a | | | | | | |
| | GLM^b | ARIMAX^c | ANN^d | SVM^e | RF^f | LSTM^g | ELM^h |
| GLM | 2.4848 (<.001) | 0.4041 (.31) | 0.2797 (.37) | 0.2314 (.40) | 1.8806 (.22) | 2.1002 (.31) | 0.8635 (.16) |
| ARIMAX | 1.5701 (.04) | 2.5818 (<.001) | 0.4968 (.28) | 1.7698 (.12) | 0.7756 (.20) | 1.5748 (.24) | 0.0337 (.51) |
| ANN | 2.2546 (<.001) | 1.7547 (.02) | 4.1945 (<.001) | 2.8276 (<.001) | 3.4393 (<.001) | 1.2432 (.09) | 4.4291 (<.001) |
| SVM | 2.244 (<.001) | 1.7374 (.02) | 6.0597 (<.001) | 2.3394 (<.001) | 1.6767 (.02) | 0.8602 (.07) | 3.2791 (<.001) |
| RF | 1.7886 (.02) | 1.9591 (.02) | 2.7599 (<.001) | 1.8785 (.02) | 2.3097 (<.001) | 0.5800 (.05) | 2.3075 (<.001) |
| LSTM | 0.8556 (.01) | 3.3685 (<.001) | 1.3620 (.07) | 1.6441 (.04) | 1.0560 (.12) | 2.4263 (<.001) | 1.9995 (.02) |
| ELM | 2.2546 (<.001) | 1.7547 (.03) | 4.1946 (<.001) | 2.8276 (<.001) | 3.4394 (<.001) | 2.175 (.01) | 4.4291 (<.001) |
^a Values are presented as the Diebold-Mariano statistic, with the P value (modified by false discovery rate) in parentheses; the significance level is .05.
^b GLM: generalized linear model.
^c ARIMAX: ARIMA with explanatory variables.
^d ANN: artificial neural network.
^e SVM: support vector machine.
^f RF: random forest.
^g LSTM: long short-term memory.
^h ELM: extreme learning machine.
Robustness analysis.
| SD | GLM^a | ANN^b | SVM^c | ARIMAX^d | LSTM^e | RF^f | ELM^g |
| Without internet search index | | | | | | | |
| SD of MAPE^h (%) | 2.5 | 1.0 | 1.0 | 1.7 | 1.7 | 1.2 | 1.0 |
| SD of RMSE^i | 15.638 | 4.385 | 5.158 | 5.843 | 7.409 | 5.371 | 4.099 |
| With internet search index | | | | | | | |
| SD of MAPE (%) | 2.4 | 0.8 | 0.7 | 0.9 | 1.3 | 0.8 | 0.7 |
| SD of RMSE | 15.212 | 3.577 | 4.008 | 5.681 | 5.797 | 3.985 | 3.370 |
^a GLM: generalized linear model.
^b ANN: artificial neural network.
^c SVM: support vector machine.
^d ARIMAX: ARIMA with explanatory variables.
^e LSTM: long short-term memory.
^f RF: random forest.
^g ELM: extreme learning machine.
^h MAPE: mean absolute percentage error.
^i RMSE: root mean square error.
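The robustness figures above are standard deviations of the error metrics across repeated evaluations. A sketch of one way to produce such a summary (the non-overlapping rolling-window scheme here is an assumption for illustration, not the paper's exact protocol):

```python
# SD of per-window MAPE over consecutive, non-overlapping test windows:
# a small SD indicates the model's accuracy is stable over time.
import numpy as np

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))

def mape_sd_over_windows(y_true, y_pred, window=10):
    """Return (SD of window MAPEs, list of window MAPEs)."""
    scores = [mape(y_true[i:i + window], y_pred[i:i + window])
              for i in range(0, len(y_true) - window + 1, window)]
    return float(np.std(scores, ddof=1)), scores
```

The same pattern applies to RMSE; the smaller spreads in the "with internet search index" rows suggest the search data stabilizes, not just improves, the forecasts.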