Panagiotis Michailidis, Athanasia Dimitriadou, Theophilos Papadimitriou, Periklis Gogas.
Abstract
Hospital readmissions are regarded as a compounding economic factor for healthcare systems. In fact, the readmission rate is used in many countries as an indicator of the quality of services provided by a health institution. The ability to forecast patients' readmissions allows for timely interventions and better post-discharge strategies, preventing future life-threatening events and reducing medical costs to either the patient or the healthcare system. In this paper, four machine learning models are used to forecast readmissions: support vector machines with a linear kernel, support vector machines with an RBF kernel, balanced random forests, and weighted random forests. The dataset consists of 11,172 actual records of hospitalizations obtained from the General Hospital of Komotini "Sismanogleio" with a total of 24 independent variables. Each record is composed of administrative, medical-clinical, and operational variables. The experimental results indicate that the balanced random forest model outperforms the competition, reaching a sensitivity of 0.70 and an AUC value of 0.78.
Keywords: forecasting; machine learning; readmissions
Year: 2022 PMID: 35742033 PMCID: PMC9222500 DOI: 10.3390/healthcare10060981
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Input Variables of the Dataset.
| No | Independent Variables | Characterization of Each Variable |
|---|---|---|
| | Administrative Variables | |
| 1 | Patient Age | Quantitative variable, Integer |
| 2 | Patient Gender | Qualitative variable, Categorical |
| 3 | Length of Stay | Quantitative variable, Integer |
| 4 | Patient Transfer | Qualitative variable, Binary |
| 5 | ICD-10 Diagnosis on Admission | Qualitative variable, Categorical |
| 6 | ICD-10 Diagnosis at Discharge | Qualitative variable, Categorical |
| 7 | Admission Clinic | Qualitative variable, Categorical |
| 8 | Discharge Clinic | Qualitative variable, Categorical |
| 9 | Clinic Change | Qualitative variable, Binary |
| 10 | Hospitalization Outcome | Qualitative variable, Categorical |
| 11 | Past Hospitalization | Qualitative variable, Binary |
| | Operational Variables | |
| 12 | Clinic’s Occupancy Rate | Quantitative variable, Continuous |
| 13 | Clinic’s Number of Doctors | Quantitative variable, Integer |
| 14 | Clinic’s Number of Nurses | Quantitative variable, Integer |
| | Medical-Clinical Variables | |
| 15 | Blood Sugar (Glucose) | Quantitative variable, Continuous |
| 16 | Indication (Normal Range) Blood Sugar | Qualitative variable, Categorical |
| 17 | Potassium | Quantitative variable, Continuous |
| 18 | Indication (Normal Range) Potassium | Qualitative variable, Categorical |
| 19 | Sodium | Quantitative variable, Continuous |
| 20 | Indication (Normal Range) Sodium | Qualitative variable, Categorical |
| 21 | Blood Urea Nitrogen | Quantitative variable, Continuous |
| 22 | Indication Blood Urea (Normal Range) Nitrogen | Qualitative variable, Categorical |
| 23 | Blood Creatinine | Quantitative variable, Continuous |
| 24 | Indication (Normal Range) Blood Creatinine | Qualitative variable, Categorical |
Figure 1. Hyperplane selection and support vectors. The bold black outlines mark the support vectors, which define the margins (dashed lines); the single solid line is the separating hyperplane.
Figure 2. The non-separable two-class scenario in the input space (left) and the same data after projection into a three-dimensional feature space (right). The two classes are shown in blue and red.
Figure 3. Overview of a 3-fold cross-validation training scheme: each fold is used once as the test sample, while the remaining folds train the model for each combination of parameter values.
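The scheme in Figure 3 can be sketched in a few lines of Python (a minimal illustration of k-fold splitting, not the authors' code): the sample indices are cut into k folds, and each fold serves once as the test set while the remaining folds form the training set.

```python
def k_fold_splits(n_samples, k=3):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        # The i-th contiguous block is held out as the test fold ...
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n_samples
        test = indices[start:end]
        # ... and all remaining samples form the training set.
        train = indices[:start] + indices[end:]
        yield train, test

# With k = 3, each sample appears in exactly one test fold.
splits = list(k_fold_splits(9, k=3))
```

In the grid search described above, this split is repeated for every candidate parameter combination, and the combination with the best average test-fold performance is kept.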
Classification results using a confusion matrix. True positives (TP): number of samples correctly classified as readmissions. True negatives (TN): number of samples correctly classified as non-readmissions. False positives (FP): number of samples incorrectly classified as readmissions. False negatives (FN): number of samples incorrectly classified as non-readmissions.
| | | Predicted | |
|---|---|---|---|
| | | 0 | 1 |
| Actual | 0 | TN | FP |
| | 1 | FN | TP |
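All performance metrics reported below follow directly from these four counts. A minimal pure-Python sketch; the counts used here are illustrative placeholders, since the extracted tables do not include the actual cell values:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)        # sensitivity: share of readmissions caught
    specificity = tn / (tn + fp)   # share of non-readmissions correctly rejected
    precision = tp / (tp + fp)     # share of predicted readmissions that are real
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}

# Illustrative counts only, not the study's results.
m = classification_metrics(tp=70, tn=74, fp=26, fn=30)
```

Note that with an imbalanced class distribution, accuracy alone is misleading: a model predicting "no readmission" for everyone scores high accuracy but zero recall, which is why the paper emphasizes sensitivity and AUC.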
Optimal parameters of the SVM models.
| Model | Parameter C | Parameter γ |
|---|---|---|
| SVM Linear Kernel | 0.06 | n/a |
| SVM RBF Kernel | 194.38 | 0.0001 |
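For reference, the γ parameter above enters through the standard RBF kernel, K(x, z) = exp(−γ‖x − z‖²); a very small γ such as 0.0001 makes the kernel decay slowly with distance, yielding a smooth decision boundary. A minimal sketch of the kernel itself:

```python
import math

def rbf_kernel(x, z, gamma=0.0001):
    """RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Identical points map to similarity 1; distant points decay toward 0.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
far = rbf_kernel([0.0, 0.0], [100.0, 100.0])  # sq_dist = 20000, so exp(-2)
```

The C parameter, by contrast, controls the penalty on margin violations: the small C of the linear model tolerates more misclassified training points, while the large C of the RBF model fits the training data more tightly.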
Confusion matrix of SVM Model with linear kernel.
| Confusion Matrix (SVM, Linear Kernel) | | | |
|---|---|---|---|
| | | Predicted | |
| | | 0 | 1 |
| Actual | 0 | TN | FP |
| | 1 | FN | TP |
Confusion matrix of SVM Model with RBF Kernel.
| Confusion Matrix (SVM, RBF Kernel) | | | |
|---|---|---|---|
| | | Predicted | |
| | | 0 | 1 |
| Actual | 0 | TN | FP |
| | 1 | FN | TP |
Performance metrics of SVM Models.
| Model | Recall | Accuracy | Precision | F1-Score | AUC |
|---|---|---|---|---|---|
| SVM Linear Kernel | 0.59 | 0.74 | 0.31 | 0.40 | 0.77 |
| SVM RBF Kernel | 0.60 | 0.74 | 0.31 | 0.41 | 0.76 |
Both SVM kernels produced similar results across all performance metrics.
Optimal Parameters of random forest models.
| Model | Total Number of Decision Trees |
|---|---|
| Weighted Random Forest | 25 |
| Balanced Random Forest | 730 |
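The two forests address class imbalance differently: a weighted random forest trains each tree on an ordinary bootstrap sample but up-weights the minority (readmission) class in the splitting criterion, while a balanced random forest draws, for each tree, a bootstrap sample with equal counts from both classes. A sketch of that balanced bootstrap step, assuming binary 0/1 labels (an illustration, not the authors' implementation):

```python
import random

def balanced_bootstrap(labels, seed=0):
    """Draw indices for one tree: equal-sized samples, with replacement,
    from the minority and majority classes."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(minority), len(majority))
    sample = [rng.choice(minority) for _ in range(n)]
    sample += [rng.choice(majority) for _ in range(n)]
    rng.shuffle(sample)
    return sample

# 10 readmissions among 100 records: each tree sees 10 samples per class.
labels = [1] * 10 + [0] * 90
idx = balanced_bootstrap(labels)
```

Because each tree sees a 50/50 class mix, the balanced forest is far less biased toward the majority class, which is consistent with its much higher recall in the results below.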
Confusion matrix of weighted random forest model.
| Confusion Matrix (Weighted Random Forest) | | | |
|---|---|---|---|
| | | Predicted | |
| | | 0 | 1 |
| Actual | 0 | TN | FP |
| | 1 | FN | TP |
Confusion matrix of balanced random forest model.
| Confusion Matrix (Balanced Random Forest) | | | |
|---|---|---|---|
| | | Predicted | |
| | | 0 | 1 |
| Actual | 0 | TN | FP |
| | 1 | FN | TP |
Performance metrics of random forest models.
| Model | Recall | Specificity | Accuracy | Precision | F1-Score | AUC |
|---|---|---|---|---|---|---|
| Weighted Random Forest | 0.25 | 0.98 | 0.88 | 0.80 | 0.38 | 0.74 |
| Balanced Random Forest | 0.70 | 0.74 | 0.73 | 0.32 | 0.44 | 0.78 |
Feature importance ranking: the significance of each feature in the random forest classification, in decreasing order.
| Importance | Feature |
|---|---|
| 0.141501 | ICD-10 Diagnosis at Discharge |
| 0.129996 | ICD-10 Diagnosis on Admission |
| 0.059492 | Clinic’s Occupancy Rate |
| 0.056195 | Hospitalization Outcome |
| 0.05464 | Blood Urea Nitrogen |
| 0.054075 | Patient Age |
| 0.052316 | Potassium |
| 0.051854 | Blood Sugar (Glucose) |
| 0.048263 | Length of Stay |
| 0.043971 | Blood Creatinine |
| 0.042001 | Sodium |
| 0.030861 | Discharge Clinic |
| 0.03023 | Clinic’s Number of Doctors |
| 0.029398 | Clinic’s Number of Nurses |
| 0.024448 | Indication (Normal Range) Blood Sugar |
| 0.020731 | Admission Clinic |
| 0.020721 | Patient Gender |
| 0.020681 | Past Hospitalization |
| 0.019997 | Indication (Normal Range) Blood Creatinine |
| 0.019523 | Indication (Normal Range) Potassium |
| 0.016082 | Indication Blood Urea (Normal Range) Nitrogen |
| 0.01222 | Patient Transfer |
| 0.01015 | Indication (Normal Range) Sodium |
| 0.00112 | Clinic Change |
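A ranking like the table above is obtained by sorting the model's normalized importance scores in decreasing order; a minimal sketch using a small hypothetical subset of the scores from the table:

```python
# A subset of the importance scores listed above (for illustration only).
importances = {
    "ICD-10 Diagnosis at Discharge": 0.141501,
    "Patient Age": 0.054075,
    "Clinic Change": 0.00112,
}

# Rank features by importance, highest first, as in the table.
ranking = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
```

The two diagnosis codes dominate the ranking, suggesting that what a patient was admitted and discharged with matters far more for readmission risk than demographic or staffing variables.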
Figure 4. Aggregated results and comparison of the proposed methodologies.
Figure 5. Classification performance measured by AUC.