Jian Chen, Hong Li, Li Luo, Yangyang Zhang, Fengyi Zhang, Fang Chen, Mei Chen.
Abstract
This study aimed to forecast the pattern of the demand for hemorrhagic stroke healthcare services based on air quality and machine learning. Hemorrhagic stroke, air quality, and meteorological data for 2016-2017 were obtained from the Longquanyi District of China, and the study included 1932 cases. Six machine learning methods were used to forecast the demand for hemorrhagic stroke healthcare services considering seasonality and a lag effect, and the average area under the curve was as high as 0.7971. Our results indicate that (1) the performance of forecasting during the warm season is significantly better than that in the cold season, (2) considering air pollution would improve the performance of forecasting the demand for hemorrhagic stroke healthcare services using machine learning, (3) the association between the demand for hemorrhagic stroke healthcare services and air pollutants is linear to some extent, and (4) it is feasible to use short-term concentrations of air pollutants to forecast the demand for hemorrhagic stroke healthcare services. This practical forecast model could provide an advance warning regarding the potentially high numbers of hemorrhagic stroke admissions to medical institutions, thus allowing time to implement an appropriate response to the increase in patient volumes.
Year: 2019 PMID: 31781360 PMCID: PMC6875383 DOI: 10.1155/2019/7463242
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Statistics on the incidence of hemorrhagic stroke.
| Duration | Mean | SD | Min | 25% | 50% | 75% | Max | Sum |
|---|---|---|---|---|---|---|---|---|
| All | 2.9861 | 1.8650 | 0 | 2 | 3 | 4 | 12 | 1932 |
| Warm season | 2.9780 | 1.9848 | 0 | 2 | 2 | 4 | 12 | 947 |
| Cold season | 2.9939 | 1.7443 | 0 | 2 | 3 | 4 | 10 | 985 |
Statistics on air pollution and temperature.
| Pollutants | Mean | SD | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|
| SO2 | 11.0584 | 5.9144 | 3 | 7 | 9 | 13 | 46 |
| NO2 | 45.5800 | 21.6757 | 12 | 28 | 43 | 58.5 | 121 |
| CO | 1.0719 | 0.5262 | 0.4 | 0.8 | 0.9 | 1.2 | 10 |
| O3 | 98.7302 | 55.9866 | 2 | 55 | 92 | 136 | 334 |
| PM2.5 | 57.5307 | 43.9921 | 4 | 27 | 44 | 75 | 287 |
| PM10 | 90.5947 | 63.6597 | 6 | 47 | 71 | 113.75 | 411 |
| Lowest | 21.6461 | 7.8395 | 5 | 14 | 22 | 29 | 36 |
| Highest | 14.3133 | 7.2099 | –4 | 7 | 15 | 20 | 26 |
Mean denotes the average daily concentration. SD denotes the standard deviation of concentration. Min and Max denote the minimum and maximum concentrations; each quartile of the concentration is shown under the respective percentage. Lowest and highest denote the minimum and maximum temperatures, respectively.
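The summary columns described above can be reproduced from a daily series along the lines of the following minimal sketch. The sample readings are invented for illustration, and both the quartile convention (linear interpolation, inclusive of the endpoints) and the use of the sample standard deviation are assumptions; the source does not state which conventions were used.

```python
import statistics

def summarize(values):
    """Return the summary columns of the tables above for one daily series."""
    # Quartiles by linear interpolation over the sorted data (assumed convention).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Mean": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample standard deviation (assumed)
        "Min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "Max": max(values),
    }

# Hypothetical daily SO2 readings, not taken from the study.
so2 = [3, 7, 9, 13, 46, 8, 11]
summary = summarize(so2)
```

A single large reading (46 here) pulls the mean well above the median, which is the same right-skew visible in the pollutant rows of the table.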
Figure 1. The curves of the cumulative proportion of hemorrhagic stroke event counts in the cold and warm seasons.
Comparative analysis of the performances of the models for the cold and warm seasons.
| | AUC | Sensitivity | Specificity |
|---|---|---|---|
| Cold season | 0.5721 | 0.1689 | 0.6934 |
| Warm season | 0.6801 | 0.2778 | 0.8353 |
| P value | <0.0001 | <0.0001 | <0.0001 |
P values were obtained from the t-test between the performances in the cold and warm seasons.
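A comparison of this kind can be sketched as follows. The source states only that a t-test was used, so the Welch (unequal-variance) form, the function name `welch_t`, and the AUC samples below are assumptions for illustration. The sketch computes the t statistic and the degrees of freedom; converting them to a P value additionally requires a t-distribution CDF, which the Python standard library does not provide.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical per-run AUC values, not taken from the study.
warm_auc = [0.68, 0.70, 0.66, 0.69, 0.67]
cold_auc = [0.57, 0.58, 0.56, 0.59, 0.55]
t_stat, dof = welch_t(warm_auc, cold_auc)
```

A large positive `t_stat` means the warm-season scores sit well above the cold-season scores relative to their run-to-run spread.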
The risk factors of warm datasets selected by LASSO.
| MaxLag | Elements |
|---|---|
| 1 | PM10_1, CO_1, Low_1 |
| 2 | PM10_1, CO_1, CO_2, Low_1, Low_2 |
| 3 | CO_1, CO_3, Low_3 |
| 4 | CO_1, CO_4, Low_3 |
| 5 | CO_1, CO_4, Low_1, Low_3 |
| 6 | CO_1, CO_4, Low_3 |
| 7 | CO_1, CO_4, Low_3 |
| 8 | CO_1, CO_4, Low_3, Low_8 |
| 9 | CO_1, CO_4, Low_3, Low_8 |
| 10 | CO_1, CO_4, Low_3, Low_8 |
| 11 | CO_1, CO_4, Low_3, Low_8 |
| 12 | CO_1, CO_4, Low_3, Low_8 |
| 13 | PM10_12, PM10_13, CO_1, CO_4, CO_13, Low_3, Low_8 |
| 14 | PM10_13, CO_4, CO_14, Low_3, Low_8 |
Suffix “_N” denotes the lag of N; for example, CO_1 refers to the concentration of CO one day ago. Low refers to lowest temperature.
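The "_N" naming can be illustrated with a minimal sketch that builds the candidate lagged columns for a given maximum lag. The helper name `add_lags` and the readings are hypothetical; a selection step such as the LASSO reported in the source would then choose among these columns, and that step is not shown here.

```python
def add_lags(daily, max_lag):
    """Build lagged columns named '<variable>_<N>' from daily series.

    daily: dict mapping a variable name to a list of daily values, oldest first.
    Returns a dict of lists with the same length as the input series; days
    that have no reading N days earlier hold None.
    """
    lagged = {}
    for name, series in daily.items():
        for lag in range(1, max_lag + 1):
            # Day t of column '<name>_<lag>' holds the reading from day t - lag.
            lagged[f"{name}_{lag}"] = ([None] * lag + series)[: len(series)]
    return lagged

# Hypothetical readings for four consecutive days, not taken from the study.
daily = {"CO": [0.8, 0.9, 1.2, 1.0], "Low": [14, 15, 13, 16]}
features = add_lags(daily, max_lag=2)
```

With `max_lag=2`, the candidate set is `CO_1`, `CO_2`, `Low_1`, and `Low_2`; the first `N` days of each `_N` column are empty because no earlier reading exists for them.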
Statistics on the performance of warm season models considering air pollution among the machine learning methods.
| Model | M-AUC | M-Sens | M-Spec | SD-AUC | SD-Sens | SD-Spec |
|---|---|---|---|---|---|---|
| LR | 0.7369 | 0.4684 | 0.8708 | 0.0276 | 0.0687 | 0.0137 |
| RF | 0.6811 | 0.2520 | 0.8368 | 0.0434 | 0.1096 | 0.0068 |
| SVMLinear | 0.6743 | 0.1795 | 0.7483 | 0.0409 | 0.0467 | 0.1159 |
| KNN | 0.6681 | 0.2558 | 0.8551 | 0.0371 | 0.1312 | 0.0224 |
| XGBLinear | 0.6601 | 0.2915 | 0.8448 | 0.0329 | 0.0574 | 0.0068 |
| XGBTree | 0.6599 | 0.2195 | 0.8563 | 0.0373 | 0.0838 | 0.0126 |
M-AUC, M-Sens, and M-Spec denote the average area under the curve (AUC), sensitivity, and specificity, respectively; SD-AUC, SD-Sens, and SD-Spec denote the standard deviation of the AUC, sensitivity, and specificity, respectively. LR, logistic regression; RF, random forest; SVMLinear, support-vector machines with linear kernel; KNN, k-nearest neighbor algorithm; XGBTree, extreme gradient boosting decision tree; XGBLinear, extreme gradient boosting linear model.
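The metrics defined above can be computed as in the following minimal sketch, which evaluates each run and then takes the mean and standard deviation across runs. The labels, scores, and the 0.5 decision threshold are assumptions for illustration; the source does not state how scores were thresholded or how many runs were averaged.

```python
import statistics

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(labels, scores, threshold=0.5):
    """Sensitivity and specificity after thresholding the scores."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    return tp / labels.count(1), tn / labels.count(0)

# Hypothetical labels and predicted scores for two evaluation runs.
runs = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
]
aucs = [auc(y, s) for y, s in runs]
# M-AUC and SD-AUC in the table's notation.
m_auc, sd_auc = statistics.mean(aucs), statistics.stdev(aucs)
```

The AUC is independent of the threshold, whereas sensitivity and specificity change with it, which is one reason a model can rank well by AUC while still showing low sensitivity.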
The P values of the t-test between different machine learning methods regarding AUC.
| | LR | XGBTree | XGBLinear | KNN | SVMLinear | RF |
|---|---|---|---|---|---|---|
| LR | 1 | <0.0001 | <0.0001 | 0.0001 | <0.0001 | <0.0001 |
| XGBTree | <0.0001 | 1 | 0.9878 | 0.5949 | 0.1579 | 0.1265 |
| XGBLinear | <0.0001 | 0.9878 | 1 | 0.5846 | 0.2964 | 0.0326 |
| KNN | 0.0001 | 0.5949 | 0.5846 | 1 | 0.6933 | 0.4353 |
| SVMLinear | <0.0001 | 0.1579 | 0.2964 | 0.6933 | 1 | 0.5933 |
| RF | <0.0001 | 0.1265 | 0.0326 | 0.4353 | 0.5933 | 1 |
Significance thresholds: P < 0.001, P < 0.01, and P < 0.05. LR, logistic regression; RF, random forest; SVMLinear, support-vector machines with linear kernel; KNN, k-nearest neighbor algorithm; XGBTree, extreme gradient boosting decision tree; XGBLinear, extreme gradient boosting linear model.
Statistics on the performance of warm season models without considering air pollution among the machine learning methods.
| Model | M-AUC | M-Sens | M-Spec | SD-AUC | SD-Sens | SD-Spec |
|---|---|---|---|---|---|---|
| LR | 0.6062– | 0.4137– | 0.8543– | 0.1452+ | 0.3688+ | 0.0367+ |
| RF | 0.6504– | 0.3750+ | 0.8571+ | 0.1011+ | 0.4361+ | 0.0444+ |
| SVMLinear | 0.6229– | 0.2996+ | 0.8217+ | 0.1004+ | 0.2943+ | 0.0812– |
| KNN | 0.6931+ | 0.3667+ | 0.8510– | 0.1173+ | 0.4830+ | 0.0469+ |
| XGBLinear | 0.6783+ | 0.3600+ | 0.8590+ | 0.1415+ | 0.3719+ | 0.0428+ |
| XGBTree | 0.6453– | 0.3333+ | 0.8422– | 0.0881+ | 0.4157+ | 0.0258+ |
M-AUC, M-Sens, and M-Spec denote the average area under the curve (AUC), sensitivity, and specificity, respectively; SD-AUC, SD-Sens, and SD-Spec denote the standard deviation of the AUC, sensitivity, and specificity, respectively. LR, logistic regression; RF, random forest; SVMLinear, support-vector machines with linear kernel; KNN, k-nearest neighbor algorithm; XGBTree, extreme gradient boosting decision tree; XGBLinear, extreme gradient boosting linear model. “+” indicates that the corresponding value without considering air pollution is higher than that considering air pollution. “–” indicates that the corresponding value considering air pollution is higher than that without considering air pollution.
The P values of the t-test between models with and without considering air pollution regarding different metrics.
| | XGBTree | XGBLinear | LR | KNN | SVMLinear | RF |
|---|---|---|---|---|---|---|
| AUC | 0.2114 | 0.3777 | 0.0316 | 0.969 | 0.3872 | 0.6214 |
| Sensitivity | 0.7939 | 0.0748 | 0.2558 | 0.2202 | 0.267 | 0.5365 |
| Specificity | 0.1211 | 0.1897 | 0.2148 | 0.4136 | 0.0004 | 0.5913 |
Significance thresholds: P < 0.001, P < 0.01, and P < 0.05. LR, logistic regression; RF, random forest; SVMLinear, support-vector machines with linear kernel; KNN, k-nearest neighbor algorithm; XGBTree, extreme gradient boosting decision tree; XGBLinear, extreme gradient boosting linear model.
Statistics on the performance of warm season models regarding lag effects.
| Lag | M-AUC | M-Sens | M-Spec | SD-AUC | SD-Sens | SD-Spec |
|---|---|---|---|---|---|---|
| MaxLag-14 | 0.7314 | 0.3524 | 0.8040 | 0.0568 | 0.1631 | 0.1360 |
| MaxLag-9 | 0.6961 | 0.2863 | 0.8574 | 0.0369 | 0.1329 | 0.0184 |
| MaxLag-6 | 0.6948 | 0.3205 | 0.8584 | 0.0367 | 0.1383 | 0.0215 |
| MaxLag-11 | 0.6892 | 0.2902 | 0.8266 | 0.0309 | 0.1096 | 0.0763 |
| MaxLag-13 | 0.6876 | 0.2671 | 0.8010 | 0.0455 | 0.1619 | 0.1286 |
| MaxLag-10 | 0.6855 | 0.2644 | 0.8534 | 0.0356 | 0.1318 | 0.0170 |
| MaxLag-8 | 0.6803 | 0.2399 | 0.8422 | 0.0440 | 0.1273 | 0.0342 |
| MaxLag-4 | 0.6790 | 0.2953 | 0.8372 | 0.0310 | 0.1407 | 0.0359 |
| MaxLag-3 | 0.6785 | 0.2168 | 0.8231 | 0.0287 | 0.1443 | 0.0548 |
| MaxLag-12 | 0.6754 | 0.2643 | 0.8246 | 0.0528 | 0.1056 | 0.0769 |
| MaxLag-5 | 0.6742 | 0.2680 | 0.8477 | 0.0530 | 0.1080 | 0.0127 |
| MaxLag-7 | 0.6683 | 0.3191 | 0.8289 | 0.0541 | 0.1564 | 0.0767 |
| MaxLag-1 | 0.6409 | 0.3038 | 0.8486 | 0.0350 | 0.0882 | 0.0121 |
| MaxLag-2 | 0.6396 | 0.2008 | 0.8417 | 0.0371 | 0.0746 | 0.0066 |
M-AUC, M-Sens, and M-Spec denote the average area under the curve (AUC), sensitivity, and specificity, respectively; SD-AUC, SD-Sens, and SD-Spec denote the standard deviation of the AUC, sensitivity, and specificity, respectively. MaxLag-N refers to the risk factor sets that considered the air quality variables of the recent N days.
Performance of warm season models with AUC > 0.75.
| Lag | Models | M-AUC | M-Sens | M-Spec | SD-AUC | SD-Sens | SD-Spec |
|---|---|---|---|---|---|---|---|
| MaxLag-14 | LR | 0.7971 | 0.6252 | 0.8929 | 0.1158 | 0.2802 | 0.0429 |
| MaxLag-14 | SVMLinear | 0.7741 | 0.2266 | 0.5293 | 0.0829 | 0.2942 | 0.4593 |
| MaxLag-13 | LR | 0.7588 | 0.5483 | 0.8963 | 0.1289 | 0.2789 | 0.0541 |
| MaxLag-14 | RF | 0.7567 | 0.3500 | 0.8489 | 0.1116 | 0.4191 | 0.0307 |
| MaxLag-9 | LR | 0.7549 | 0.4617 | 0.8707 | 0.0668 | 0.2632 | 0.0347 |
M-AUC, M-Sens, and M-Spec denote the average area under the curve (AUC), sensitivity, and specificity, respectively; SD-AUC, SD-Sens, and SD-Spec denote the standard deviation of the AUC, sensitivity, and specificity, respectively. MaxLag-N refers to the risk factor sets that considered the air quality variables of the recent N days. LR, logistic regression; RF, random forest; SVMLinear, support-vector machines with linear kernel; KNN, k-nearest neighbor algorithm; XGBTree, extreme gradient boosting decision tree; XGBLinear, extreme gradient boosting linear model.