Mehrdad Karajizadeh, Mahdi Nasiri, Mahnaz Yadollahi, Amir Hussain Zolfaghari, Ali Pakdam.
Abstract
OBJECTIVES: Machine learning has been widely used to predict diseases and to extract knowledge in the healthcare domain. Our objective was to predict in-hospital mortality from hospital-acquired infections in trauma patients using an unbalanced dataset.
Keywords: C5.0; Data Mining; Decision Tree; Healthcare Associated Infections; Injuries; Machine Learning; Mortality
Year: 2020 PMID: 33190462 PMCID: PMC7674815 DOI: 10.4258/hir.2020.26.4.284
Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681
Detailed information about dataset used in this study
| No. | Data variable name | Measurement | Data variable categories or values | Role | Definition of the data variable |
|---|---|---|---|---|---|
| 1 | Sex | Nominal | 0 = Female | Input | The patient’s gender |
| 2 | Age category | Ordinal | 1 = “15–45” | Input | The patient’s age at the time of injury |
| 3 | Mechanism of injury | Nominal | 1 = Car accident | Input | The mechanism (or multiple injury factor) that caused the injury event |
| 4 | Injured body region | Nominal | 1 = Head and neck | Input | ISS body region |
| 5 | Injury Severity Score (ISS) category | Ordinal | 1 = “1–8” | Input | ISS calculated using the Baker formula; a severity score that reflects the patient’s injuries |
| 6 | Ward | Nominal | 1 = ICU | Input | Ward where the nosocomial infection was detected |
| 7 | Type of invasive intervention | Nominal | 1 = Catheter vein | Input | Type of invasive intervention performed |
| 8 | Infected day | Nominal | 1 = Infection in less than 21 days | Input | Infection detection date minus admission date |
| 9 | Hospital-acquired infected | Nominal | 1 = Upper respiratory infection | Input | Type of hospital-acquired infection |
| 10 | Survival status | Nominal | 0 = Non-survivors | Target | Survival status at discharge |
ICU: intensive care unit, UTI: urinary tract infection.
Bivariate analysis of mortality predictors
| Variable | Survivors (n = 464) | Non-survivors (n = 85) | Total (n = 549) | p-value |
|---|---|---|---|---|
| Sex | | | | 0.137 |
| Male | 386 (85.6) | 65 (14.4) | 451 (100) | |
| Female | 78 (79.6) | 20 (20.4) | 98 (100) | |
| Age (yr) | | | | <0.05 |
| 15–45 | 318 (89.8) | 36 (10.2) | 354 (100) | |
| 46–64 | 84 (81.6) | 19 (18.4) | 103 (100) | |
| >65 | 62 (67.4) | 30 (32.6) | 92 (100) | |
| Mechanism of injury | | | | <0.05 |
| Car accident | 188 (86.2) | 30 (13.8) | 218 (100) | |
| Motorcycle accident | 117 (88.6) | 15 (11.4) | 132 (100) | |
| Pedestrian | 61 (82.4) | 13 (17.6) | 74 (100) | |
| Gunshot | 8 (66.7) | 4 (33.3) | 12 (100) | |
| Falling | 65 (74.7) | 22 (25.3) | 87 (100) | |
| Assault | 13 (100) | 0 (0) | 13 (100) | |
| Struck by objects | 13 (100) | 0 (0) | 13 (100) | |
| Injured body region | | | | 0.38 |
| Head and neck | 183 (84.7) | 33 (15.3) | 216 (100) | |
| Face | 17 (81) | 4 (19) | 21 (100) | |
| Thorax | 54 (84.4) | 10 (15.6) | 64 (100) | |
| Abdomen | 16 (94.1) | 1 (5.9) | 17 (100) | |
| Extremities | 107 (88.4) | 14 (11.6) | 121 (100) | |
| Multiple Injuries | 87 (79.1) | 23 (20.9) | 110 (100) | |
| Injury Severity Score (n = 492) | | | | 0.18 |
| 1–8 | 157 (89.2) | 19 (10.8) | 176 (100) | |
| 9–15 | 170 (82.5) | 36 (17.5) | 206 (100) | |
| ≥16 | 94 (85.5) | 16 (14.5) | 110 (100) | |
| Ward | | | | <0.05 |
| ICU | 312 (80.4) | 76 (19.6) | 388 (100) | |
| General or surgical ward | 152 (94.4) | 9 (5.6) | 161 (100) | |
| Type of invasive intervention | | | | |
| Catheter vein (yes) | 86 (89.6) | 10 (10.4) | 96 (100) | 0.13 |
| Urinary catheter (yes) | 113 (90.4) | 12 (9.6) | 125 (100) | <0.05 |
| Medical ventilator (yes) | 102 (75) | 34 (25) | 136 (100) | <0.05 |
| Tracheostomy (yes) | 74 (87.1) | 11 (12.9) | 85 (100) | 0.48 |
| Trachea intubation (yes) | 14 (70) | 6 (30) | 20 (100) | 0.06 |
| Arterial line (yes) | 2 (100) | 0 (0) | 2 (100) | 0.54 |
| Surgery (yes) | 74 (88.1) | 10 (11.9) | 84 (100) | 0.32 |
| Infected day | | | | 0.51 |
| Infected in less than 21 days after admission | 415 (84.9) | 74 (15.1) | 489 (100) | |
| Infected in more than 22 days after admission | 49 (81.7) | 11 (18.3) | 60 (100) | |
| Hospital-acquired infected | | | | |
| Upper respiratory infection (yes) | 252 (83.7) | 49 (16.3) | 301 (100) | 0.57 |
| Urinary tract infection - other UTI (yes) | 90 (85.7) | 15 (14.3) | 105 (100) | 0.70 |
| Surgical site infection - SKIN (yes) | 92 (85.2) | 16 (14.8) | 108 (100) | 0.83 |
| Bloodstream infection (yes) | 82 (80.4) | 20 (19.6) | 102 (100) | 0.20 |
| Pneumonia (yes) | 34 (85) | 6 (15) | 40 (100) | 0.93 |
| Upper respiratory infection - symptomatic UTI (yes) | 14 (87.5) | 2 (12.5) | 16 (100) | 0.73 |
| Central nervous system - meningitis (yes) | 17 (70.8) | 7 (29.2) | 24 (100) | <0.05 |
| Surgical site infection - surgery took place (yes) | 1 (50) | 1 (50) | 2 (100) | 0.17 |
Values are presented as number (%).
ICU: intensive care unit, UTI: urinary tract infection.
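
The ward comparison in the table above can be checked with a short calculation. The following is a minimal sketch that assumes a Pearson chi-square test of independence without continuity correction; the source does not state which test produced the reported p-values, and the function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of ward and survival status
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Counts from the Ward rows of the table: (survivors, non-survivors)
icu = (312, 76)
general = (152, 9)
chi2 = chi_square_2x2(*icu, *general)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05
CRITICAL_VALUE_DF1 = 3.841
significant = chi2 > CRITICAL_VALUE_DF1
print(f"chi-square = {chi2:.2f}, significant at 0.05: {significant}")
```

Under these assumptions the statistic exceeds the critical value, which is consistent with the p-value below 0.05 reported for Ward.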
Performance evaluation of death models
| Model | Description | AUC | Accuracy (%) | Class | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|
| CHAID tree | Classification without the balanced data set | 0.781 | 85.16 | Survivors | 90.27 | 86.66 |
| | | | | Non-survivors | 17.64 | 62.50 |
| C5.0 tree | Classification without the balanced data set | 0.619 | 86.16 | Survivors | 99.13 | 86.46 |
| | | | | Non-survivors | 15.29 | 76.47 |
AUC: area under the curve.
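
To put the accuracy figures above in context, it can help to compute what a trivial majority-class predictor would score on the same class distribution. The sketch below uses only the class counts from the bivariate table (464 survivors, 85 non-survivors). The helper `class_metrics` and the always-predict-survivor baseline are illustrative assumptions, not models from the study.

```python
def class_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the positive class.

    Precision or recall is None when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall


SURVIVORS, NON_SURVIVORS = 464, 85

# Hypothetical baseline: predict "survivor" for every patient.
# Non-survivors are treated as the positive class, so all 85 are false negatives.
accuracy, precision, recall = class_metrics(
    tp=0, fp=0, fn=NON_SURVIVORS, tn=SURVIVORS
)
print(f"accuracy = {accuracy:.4f}, recall (non-survivors) = {recall}")
```

This baseline reaches roughly 84.5% accuracy while identifying no non-survivors, which is why the per-class precision and recall columns are more informative than accuracy alone on this dataset.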
Performance evaluation of death models (random under-sampling)
| Model | Description | AUC | Accuracy (%) | Class | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|
| CHAID tree | Classification using the balanced data set (random under-sampling) | 0.709 | 61.24 | Survivors | 28.76 | 80.76 |
| | | | | Non-survivors | 94.11 | 70.79 |
| C5.0 tree | Classification using the balanced data set (random under-sampling) | 0.797 | 70.69 | Survivors | 61.79 | 76.38 |
| | | | | Non-survivors | 80.00 | 66.66 |
AUC: area under the curve.
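
Random under-sampling, as named in the table above, can be sketched as follows. This is a generic illustration, not the authors' implementation: the function `random_under_sample`, the record layout, and the seed are assumptions, and only the class counts (464 and 85) come from the source.

```python
import random


def random_under_sample(records, label_of, seed=0):
    """Randomly drop records so every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sample without replacement, so no record appears twice
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy records with the study's class counts; the "id" field is a placeholder.
records = [{"id": i, "survived": 1} for i in range(464)]
records += [{"id": 464 + i, "survived": 0} for i in range(85)]

balanced = random_under_sample(records, label_of=lambda r: r["survived"])
counts = {label: sum(r["survived"] == label for r in balanced) for label in (0, 1)}
print(counts)
```

With these counts, balancing by under-sampling keeps 85 records per class and discards 379 survivor records, so the models are trained on far less data.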
Performance evaluation of death models (random over-sampling)
| Model | Description | AUC | Accuracy (%) | Class | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|
| CHAID tree | Classification with the balanced data set (boost) | 0.883 | 79.47 | Survivors | 74.35 | 82.53 |
| | | | | Non-survivors | 69.70 | 76.98 |
| C5.0 tree | Classification with the balanced data set (boost) | 0.974 | 94.74 | Survivors | 92.02 | 97.26 |
| | | | | Non-survivors | 97.88 | 92.58 |
AUC: area under the curve.
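
A generic form of random over-sampling is sketched below. The source does not describe how the balanced ("boost") data set was built, so the function `random_over_sample`, the record layout, and the seed are assumptions; only the class counts come from the source.

```python
import random


def random_over_sample(records, label_of, seed=0):
    """Duplicate randomly chosen records until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Draw with replacement to fill the gap to the largest class
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend(group + extra)
    rng.shuffle(balanced)
    return balanced


# Toy records with the study's class counts; the "id" field is a placeholder.
patients = [{"id": i, "survived": 1} for i in range(464)]
patients += [{"id": 464 + i, "survived": 0} for i in range(85)]

oversampled = random_over_sample(patients, label_of=lambda r: r["survived"])
class_sizes = {
    label: sum(r["survived"] == label for r in oversampled) for label in (0, 1)
}
print(class_sizes)
```

Because the added records are exact copies, over-sampling before splitting into training and test sets can place the same record on both sides of the split. Applying it to the training portion only avoids that.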
Performance evaluation for death models on the clustered dataset
| Model | Cluster number | AUC | Accuracy (%) | Class | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|
| CHAID tree | Cluster 1 with alive data and dead data set | 0.862 | 79.19 | Survivors | 96.40 | 74.19 |
| | | | | Non-survivors | 57.25 | 92.59 |
| | Cluster 2 with alive data and dead data set | 0.961 | 89.34 | Survivors | 100 | 82.64 |
| | | | | Non-survivors | 78.35 | 100 |
| | Cluster 3 with alive data and dead data set | 0.987 | 94.74 | Survivors | 94.66 | 94.66 |
| | | | | Non-survivors | 95.87 | 95.87 |
| | Cluster 4 with alive data and dead data set | 0.993 | 97.60 | Survivors | 97.06 | 94.28 |
| | | | | Non-survivors | 97.89 | 98.88 |
| | Cluster 5 with alive data and dead data set | 0.982 | 95.05 | Survivors | 96.59 | 93.40 |
| | | | | Non-survivors | 93.62 | 96.70 |
| Overall | - | 0.962 | 91.30 | Survivors | 96.98 | 83.35 |
| | | | | Non-survivors | 82.56 | 96.78 |
| C5.0 tree | Cluster 1 with alive data and dead data set | 0.899 | 87.25 | Survivors | 95.80 | 83.77 |
| | | | | Non-survivors | 76.34 | 93.46 |
| | Cluster 2 with alive data and dead data set | 0.944 | 92.89 | Survivors | 96.00 | 90.57 |
| | | | | Non-survivors | 89.69 | 95.60 |
| | Cluster 3 with alive data and dead data set | 0.962 | 94.77 | Survivors | 96.00 | 91.14 |
| | | | | Non-survivors | 93.81 | 96.81 |
| | Cluster 4 with alive data and dead data set | 0.981 | 97.60 | Survivors | 91.18 | 100 |
| | | | | Non-survivors | 100 | 96.80 |
| | Cluster 5 with alive data and dead data set | 0.999 | 97.80 | Survivors | 97.72 | 97.72 |
| | | | | Non-survivors | 97.87 | 97.87 |
| Overall | - | 0.965 | 93.02 | Survivors | 93.88 | 88.29 |
| | | | | Non-survivors | 90.39 | 96.04 |
AUC: area under the curve.
Performance evaluation for death models with SMOTE-C5.0 and ADASYN-C5.0
| Model | AUC | Accuracy (%) | Class | Precision (%) | Recall (%) |
|---|---|---|---|---|---|
| SMOTE-C5.0 | 0.97 | 93.66 | Survivors | 96.35 | 90.95 |
| | | | Non-survivors | 91.15 | 96.43 |
| ADASYN-C5.0 | 0.95 | 90.93 | Survivors | 89.60 | 92.89 |
| | | | Non-survivors | 92.40 | 88.91 |
| SMOTE-SVM | 1.00 | 100 | Survivors | 100 | 100 |
| | | | Non-survivors | 100 | 100 |
| ADASYN-SVM | 0.99 | 98.57 | Survivors | 98.74 | 98.39 |
| | | | Non-survivors | 98.43 | 98.71 |
| SMOTE-ANN | 0.92 | 91.48 | Survivors | 86.54 | 95.74 |
| | | | Non-survivors | 96.27 | 98.41 |
| ADASYN-ANN | 0.97 | 97.46 | Survivors | 96.86 | 98.09 |
| | | | Non-survivors | 98.08 | 96.83 |
SVM: support vector machine, ANN: artificial neural network, AUC: area under the curve.
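
The core idea of SMOTE, which the models above rely on, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that step for purely numeric features. It is not the authors' pipeline: the function `smote`, its parameters, and the toy points are assumptions, and the study's variables are nominal or ordinal, which this simple interpolation does not handle.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base point
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy numeric minority class (placeholder values, not study data)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6, k=2)
for point in new_points:
    print(point)
```

ADASYN follows the same interpolation idea but generates more synthetic points around minority samples that have more majority-class neighbours.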
Evaluation metrics in training, testing, and validation sets
| Model | Evaluation metrics | Training | Testing | Validation |
|---|---|---|---|---|
| Classification without the balanced data set (with CHAID) | AUC | 0.77 | 0.81 | 0.76 |
| | Accuracy (%) | 82.34 | 85.57 | 92.54 |
| Classification without the balanced data set (with C5.0) | AUC | 0.59 | 0.75 | 0.60 |
| | Accuracy (%) | 84.68 | 88.66 | 91.04 |
| Classification with the balanced data set (boost) with CHAID | AUC | 0.89 | 0.87 | 0.88 |
| | Accuracy (%) | 79.11 | 76.72 | 82.42 |
| Classification with the balanced data set (boost) with C5.0 | AUC | 0.97 | 0.97 | 0.97 |
| | Accuracy (%) | 92.65 | 94.71 | 91.21 |
| Classification with the balanced data set (random under-sampling) with CHAID | AUC | 0.64 | 0.53 | 0.74 |
| | Accuracy (%) | 59.50 | 48.28 | 53.57 |
| Classification with the balanced data set (random under-sampling) with C5.0 | AUC | 0.78 | 0.80 | 0.84 |
| | Accuracy (%) | 72.07 | 76.92 | 73.08 |
| Cluster 1 with alive data and dead data set and classification with C5.0 | AUC | 0.91 | 0.82 | 0.91 |
| | Accuracy (%) | 88.29 | 81.82 | 87.76 |
| Cluster 2 with alive data and dead data set and classification with C5.0 | AUC | 0.95 | 0.91 | 0.96 |
| | Accuracy (%) | 93.94 | 90.91 | 90.62 |
| Cluster 3 with alive data and dead data set and classification with C5.0 | AUC | 0.96 | 0.95 | 0.96 |
| | Accuracy (%) | 95.76 | 96.30 | 88.46 |
| Cluster 4 with alive data and dead data set and classification with C5.0 | AUC | 0.98 | 0.98 | 1.00 |
| | Accuracy (%) | 98.86 | 94.74 | 94.44 |
| Cluster 5 with alive data and dead data set and classification with C5.0 | AUC | 0.99 | 0.99 | 1.00 |
| | Accuracy (%) | 97.54 | 98.88 | 100 |
| Cluster 1 with alive data and dead data set and classification with CHAID | AUC | 0.88 | 0.759 | 0.872 |
| | Accuracy (%) | 81.46 | 72.73 | 75.51 |
| Cluster 2 with alive data and dead data set and classification with CHAID | AUC | 0.955 | 0.981 | 0.954 |
| | Accuracy (%) | 89.39 | 93.94 | 84.38 |
| Cluster 3 with alive data and dead data set and classification with CHAID | AUC | 0.982 | 1.00 | 0.99 |
| | Accuracy (%) | 94.07 | 96.30 | 96.15 |
| Cluster 4 with alive data and dead data set and classification with CHAID | AUC | 0.99 | 1.0 | 1.0 |
| | Accuracy (%) | 96.59 | 100 | 100 |
| Cluster 5 with alive data and dead data set and classification with CHAID | AUC | 0.99 | 0.95 | 0.95 |
| | Accuracy (%) | 98.36 | 87.50 | 89.66 |
| SMOTE-C5.0 | AUC | 0.98 | 0.84 | 0.89 |
| | Accuracy (%) | 93.69 | 79.69 | 86.52 |
| ADASYN-C5.0 | AUC | 0.90 | 0.77 | 0.69 |
| | Accuracy (%) | 86.37 | 77.16 | 75.86 |
| SMOTE-SVM | AUC | 1.00 | 0.989 | 0.98 |
| | Accuracy (%) | 100 | 92.71 | 94.38 |
| ADASYN-SVM | AUC | 0.99 | 0.89 | 0.87 |
| | Accuracy (%) | 98.57 | 81.73 | 80.46 |
| SMOTE-ANN | AUC | 0.92 | 0.87 | 0.86 |
| | Accuracy (%) | 91.48 | 82.29 | 79.78 |
| ADASYN-ANN | AUC | 0.97 | 0.76 | 0.61 |
| | Accuracy (%) | 97.46 | 72.59 | 62.07 |
AUC: area under the curve.
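
One way to read the table above is to compare training and validation accuracy for each model. The sketch below does this arithmetic for the SMOTE and ADASYN rows, with values copied from the table. Treating a large gap as a sign of possible overfitting is a common rule of thumb assumed here, not a conclusion stated in the source.

```python
# Accuracy (%) values copied from the table: (training, testing, validation)
accuracy = {
    "SMOTE-C5.0": (93.69, 79.69, 86.52),
    "ADASYN-C5.0": (86.37, 77.16, 75.86),
    "SMOTE-SVM": (100.0, 92.71, 94.38),
    "ADASYN-SVM": (98.57, 81.73, 80.46),
    "SMOTE-ANN": (91.48, 82.29, 79.78),
    "ADASYN-ANN": (97.46, 72.59, 62.07),
}

# Gap between training and validation accuracy, in percentage points
gaps = {
    name: round(train - valid, 2)
    for name, (train, _test, valid) in accuracy.items()
}

# List the models from largest to smallest gap
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:.2f}")
```

On these figures the gap is largest for ADASYN-ANN and smallest for SMOTE-SVM.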