Hang Qiu, Lin Luo, Ziqi Su, Li Zhou, Liya Wang, Yucheng Chen.
Abstract
BACKGROUND: Accumulating evidence has linked environmental exposures, such as ambient air pollution and meteorological factors, to the development and severity of cardiovascular diseases (CVDs), resulting in increased healthcare demand. Effective prediction of demand for healthcare services, particularly demand associated with peak events of CVDs, can help optimize the allocation of medical resources. However, few studies have applied machine learning approaches with strong predictive ability to forecast healthcare demand for CVDs. This study aims to develop and compare several machine learning models for predicting peak demand days of CVD admissions, using hospital admissions data, air quality data and meteorological data from Chengdu, China, from 2015 to 2017.
Keywords: Cardiovascular disease; Environmental exposure; Hospital admission; Machine learning; Prediction
Year: 2020 PMID: 32357880 PMCID: PMC7195717 DOI: 10.1186/s12911-020-1101-8
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Block diagram of the classified prediction process
The features for prediction
| Feature | Description |
|---|---|
| year | year of the date of hospital admission |
| month | month of year |
| day | day of month |
| holiday | public holidays |
| DOW | day of week |
| Tem_lag04 | mean temperature for the moving average of current day and previous four days (lag04) |
| RH_lag06 | relative humidity for the moving average of current day and previous six days (lag06) |
| Rain_lag06 | rainfall for the moving average of current day and previous six days (lag06) |
| PM2.5_lag3 | PM2.5 at the previous three days (lag3) |
| PM10_lag3 | PM10 at the previous three days (lag3) |
| PMC_lag3 | PMC at the previous three days (lag3) |
| SO2_lag0 | SO2 at the current day (lag0) |
| NO2_lag0 | NO2 at the current day (lag0) |
| CO_lag0 | CO at the current day (lag0) |
| O3 | O3 at the previous six days (lag6) |
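The lagged features in this table can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and the sample series are hypothetical, "lag04" is read as the mean over the current day and the previous four days, and "lag3" is read as the single value observed three days earlier, which is one possible reading of the description.

```python
from statistics import mean

def moving_average_lag(values, max_lag):
    """Mean of the current day and the previous `max_lag` days (lag04 -> 5-day window).

    Days without a full window of history get None.
    """
    out = []
    for i in range(len(values)):
        if i < max_lag:
            out.append(None)
        else:
            out.append(mean(values[i - max_lag : i + 1]))
    return out

def single_day_lag(values, lag):
    """Value observed `lag` days before the current day (lag0 is the same day)."""
    return [None if i < lag else values[i - lag] for i in range(len(values))]

# Hypothetical daily series, for illustration only (not data from the study).
temperature = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
pm25 = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]

tem_lag04 = moving_average_lag(temperature, 4)  # analogous to Tem_lag04
pm25_lag3 = single_day_lag(pm25, 3)             # analogous to PM2.5_lag3
```

Leaving the first days as None makes the loss of history explicit; in practice those rows would be dropped or filled before training.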
Summary of parameter values in each model
| Model | Parameter | Value | Description |
|---|---|---|---|
| LR | penalty | L1 | penalty function |
| SVM | kernel | linear | kernel function |
| SVM | C | 5 | penalty parameter of the error term |
| ANN | kernel initializer | uniform | kernel initializer function |
| ANN | activation1 | relu | activation of hidden layer |
| ANN | activation2 | sigmoid | activation of output layer |
| ANN | optimizer | Adam | training optimization algorithm |
| ANN | epochs | 300 | number of times the training data are shown to the network |
| ANN | batch size | 20 | batch size |
| ANN | dropout | 0.0 | dropout rate |
| RF | n estimators | 695 | number of iterations |
| RF | max depth | 4 | maximum depth of variable interactions |
| RF | max features | 7 | number of features for the best split |
| XGBoost | learning rate | 0.1 | learning rate |
| XGBoost | n estimators | 100 | number of iterations |
| XGBoost | eta | 0.01 | control of learning rate |
| XGBoost | max depth | 3 | maximum depth of variable interactions |
| XGBoost | gamma | 0.6 | minimum loss reduction required to make a further partition on a leaf node of the tree |
| XGBoost | colsample by tree | 0.6 | subsample ratio of columns when constructing each tree |
| XGBoost | subsample | 0.7 | subsample ratio |
| XGBoost | min child weight | 2 | sum of the minimum weights that leaf nodes need to observe |
| LightGBM | learning rate | 0.1 | learning rate |
| LightGBM | n estimators | 100 | number of iterations |
| LightGBM | max depth | 8 | maximum depth of variable interactions |
| LightGBM | num leaves | 10 | number of leaves in each tree |
| LightGBM | bagging fraction | 0.7 | percentage of sampling used in each iteration |
| LightGBM | feature fraction | 0.9 | ratio of features to build the tree in each iteration |
| LightGBM | min data in leaf | 5 | minimum number of records in a leaf |
| LightGBM | min split gain | 0.0 | smallest gain of the split |
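As a rough guide to how these settings might be passed to common Python libraries, the values can be collected into keyword dictionaries. The mapping to scikit-learn, XGBoost and LightGBM keyword names is an assumption, as is the `liblinear` solver (the table does not name one; scikit-learn requires a solver that supports the L1 penalty). The table lists both a learning rate of 0.1 and an eta of 0.01 for XGBoost; in the XGBoost library `eta` is an alias of `learning_rate`, so the sketch keeps the eta value separate rather than guessing which one was used. The ANN is omitted because the table gives no layer sizes.

```python
# Values are copied from the table; the keyword names are an assumed mapping
# onto the scikit-learn, XGBoost and LightGBM Python APIs.
LR_PARAMS = {"penalty": "l1", "solver": "liblinear"}  # solver is an assumption
SVM_PARAMS = {"kernel": "linear", "C": 5}
RF_PARAMS = {"n_estimators": 695, "max_depth": 4, "max_features": 7}

XGB_PARAMS = {
    "learning_rate": 0.1,
    "n_estimators": 100,
    "max_depth": 3,
    "gamma": 0.6,
    "subsample": 0.7,
    "colsample_bytree": 0.6,
    "min_child_weight": 2,
}
# Listed separately in the table; `eta` and `learning_rate` name the same
# XGBoost parameter, so this value is kept apart instead of being merged.
XGB_ETA_FROM_TABLE = 0.01

LGBM_PARAMS = {
    "learning_rate": 0.1,
    "n_estimators": 100,
    "max_depth": 8,
    "num_leaves": 10,
    "bagging_fraction": 0.7,
    "feature_fraction": 0.9,
    "min_data_in_leaf": 5,
    "min_split_gain": 0.0,
}

# Possible usage, if the libraries are installed:
#   from sklearn.ensemble import RandomForestClassifier
#   model = RandomForestClassifier(**RF_PARAMS)
```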
Summary statistics of daily CVDs admissions, meteorological conditions and air pollutants concentrations in Chengdu, 2015–2017
| Variable | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|
| Daily CVDs admissions | 208 | 90 | 33 | 206 | 476 |
| Temperature (°C) | 17.0 | 7.2 | −1.1 | 17.8 | 30.2 |
| Relative Humidity (%) | 80.4 | 8.8 | 43.0 | 80.8 | 98.3 |
| Rainfall (mm) | 2.6 | 8.7 | 0.0 | 0.0 | 122.0 |
| PM2.5 (μg/m³) | 60.3 | 42.4 | 6.1 | 48.4 | 324.5 |
| PM10 (μg/m³) | 99.3 | 64.7 | 14.3 | 79.8 | 492.5 |
| PMC (μg/m³) | 39.0 | 25.8 | 4.8 | 31.6 | 238.2 |
| SO2 (μg/m³) | 13.9 | 5.8 | 3.9 | 12.7 | 37.9 |
| NO2 (μg/m³) | 55.0 | 17.3 | 15.7 | 53.0 | 130.4 |
| O3 (μg/m³) | 96.0 | 54.6 | 5.6 | 85.3 | 290.4 |
| CO (mg/m³) | 1.1 | 0.4 | 0.4 | 1.0 | 2.8 |
CVDs: cardiovascular diseases
Fig. 2 Box plot of AUC for machine learning models with 10-fold cross-validation in the training dataset. °: outliers of the box plot; *: the model is significantly different from the XGBoost model. LR: logistic regression; SVM: support vector machine; ANN: artificial neural network; RF: random forest; XGBoost: extreme gradient boosting; LightGBM: light gradient boosting machine
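The 10-fold cross-validation behind this figure can be illustrated with a small index splitter. This is a generic sketch, not the authors' procedure: the record does not say whether the folds were shuffled or stratified, and the function name, seed and sample size are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid

# Hypothetical training-set size, for illustration only.
splits = list(k_fold_indices(n_samples=95, k=10))
```

Each model would be fitted on `train` and scored (for example by AUC) on `valid`, giving ten scores per model, which is the kind of distribution a box plot summarizes.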
Fig. 3 ROC curve of machine learning models in the testing dataset. LR: logistic regression; SVM: support vector machine; ANN: artificial neural network; RF: random forest; XGBoost: extreme gradient boosting; LightGBM: light gradient boosting machine
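For readers unfamiliar with how an ROC curve and its AUC are obtained from predicted probabilities, a minimal sketch follows. The labels and scores are hypothetical, and the trapezoidal rule used here is one standard way to compute the area; the record does not state which implementation the authors used.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points over all thresholds."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Sweep thresholds from the highest score down; tied scores move together.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels (1 = peak demand day) and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
area = auc(roc_points(y_true, scores))
```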
The evaluation indicators of machine learning models in testing dataset
| Model | AUC (95% CI) | Logloss | Accuracy | Sensitivity | Specificity | Precision | F1 score |
|---|---|---|---|---|---|---|---|
| LR | 0.842 (95% CI: 0.783–0.901) | 0.513 | 0.766 | 0.848 | 0.751 | 0.378 | 0.523 |
| SVM | 0.834 (95% CI: 0.774–0.894) | 0.344 | 0.748 | 0.879 | 0.724 | 0.362 | 0.513 |
| ANN | 0.890 (95% CI: 0.836–0.944) | 0.296 | 0.858 | 0.333 | 0.951 | 0.551 | 0.415 |
| RF | 0.926 (95% CI: 0.879–0.974) | 0.358 | 0.862 | | 0.854 | 0.527 | 0.667 |
| XGBoost | 0.930 (95% CI: 0.878–0.982) | 0.277 | 0.876 | 0.818 | 0.886 | 0.563 | 0.667 |
| LightGBM | | | | 0.758 | | | |
Bold font: the optimal values; a: the optimal model. LR: logistic regression; SVM: support vector machine; ANN: artificial neural network; RF: random forest; XGBoost: extreme gradient boosting; LightGBM: light gradient boosting machine
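Threshold-based indicators for a binary peak versus non-peak classifier are usually derived from a confusion matrix, and log loss from the predicted probabilities. The sketch below shows the standard formulas; the function names and the counts are hypothetical and are not taken from the study.

```python
import math

def classification_indicators(tp, fp, tn, fn):
    """Standard indicators computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on peak days
    specificity = tn / (tn + fp)   # recall on non-peak days
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical counts, for illustration only.
result = classification_indicators(tp=30, fp=20, tn=140, fn=10)
```

Because F1 is the harmonic mean of precision and sensitivity, a model with high sensitivity but low precision still receives a modest F1 score.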
Fig. 4 Feature importance ranking based on the LightGBM model