Naghmeh Khajehali, Zohreh Khajehali, Mohammad Jafar Tarokh.
Abstract
Intensive care units (ICUs) are among the most expensive and essential parts of any hospital, caring for extremely ill patients. This study aims to predict ICU mortality and to explore the crucial factors affecting it. In health care systems, fast and precise ICU mortality prediction plays a key role in care quality, reducing costs and improving patients' chances of survival. We used a medical dataset that includes patients' demographic details, underlying diseases, laboratory disorders, and length of stay (LOS). Because accurate estimates are required for optimal results, several data pre-processing steps are applied first. Machine learning models are then employed to predict the risk of mortality at ICU discharge. The AdaBoost model achieved AUC = 0.966, sensitivity (recall) = 87.88%, Kappa = 0.859, and F-measure = 89.23%, the highest values among the models compared. Our model outperforms the other comparison models across various data-processing scenarios. The results demonstrate that high mortality can be caused by underlying diseases such as diabetes mellitus and high blood pressure, a moderate Pulmonary Embolism Wells Score risk, a platelet blood count below 100,000 per microliter (mcL), hypertension (HTN), a high level of bilirubin, smoking, and a GCS level between 6 and 9.
Keywords: AdaBoost; Intensive care unit; Machine learning methods; Mortality prediction
Year: 2021 PMID: 33654479 PMCID: PMC7907311 DOI: 10.1007/s00779-021-01540-5
Source DB: PubMed Journal: Pers Ubiquitous Comput ISSN: 1617-4909
Medical variables used in this work and their definitions
| Variable | Definition |
|---|---|
| Hemiplegia/paraplegia | Paraplegia is an impairment of the legs and lower body resulting from injury to nerves in the lumbar or thoracic vertebrae areas of the body. Hemiplegia is the impairment of one vertical half of the body. |
| Anemia | A condition in which the body lacks enough healthy red blood cells to carry sufficient oxygen to the body’s tissues. The normal hemoglobin range is generally defined as 13.2–16.6 grams (g) of hemoglobin per deciliter (dL) of blood for men and 11.6–15 g/dL for women. |
| Motor sensory disorder | A condition in which the brain has trouble processing sensory information. |
| Bilirubin | It is an orange-yellow substance in the body that is generally formed during the normal breakdown of red blood cells. Normal Bilirubin levels are less than 1.2 (mg/dL), and higher Bilirubin levels are an indicator of different types of liver problems. |
| Serum Creatinine | A waste substance in the blood that arises from muscle activity. Its normal level in the blood of adults is 0.5 to 1.2 milligrams (mg) per deciliter (dL). Creatinine levels of 2.0 mg/dL or more in adults are a sign of severe kidney impairment. |
| Platelet blood count | Platelet blood test measures the number of platelets in the blood. High or low platelet levels are indicators of severe conditions. Its normal range is between 150,000 and 450,000 platelets per microliter (mcL) of blood. |
| Body mass index (BMI) | A value that relates weight in kilograms (kg) to height in meters squared. The normal BMI range is from 18.5 to 25 kg/m2. Obesity, overweight, and underweight correspond to a BMI of more than 30, 25 to 30, and less than 18.5 kg/m2, respectively. |
| Pulmonary Embolism Wells Score | A risk classification score and clinical decision rule for estimating the probability of acute pulmonary embolism (PE) in patients whose history and examination suggest possible acute PE. Scores fall into three categories: greater than 6 (“high risk”), 2 to 6 (“intermediate risk”), and less than 2 (“low risk”) of PE. |
| Deep vein thrombosis (DVT) | A common complication in trauma patients. Its score defines three risk classes: high (3 points or more), intermediate (1 to 2 points), and low (less than 1 point). |
| Hypertension (HTN) | A medical condition in which the blood pressure in the arteries is persistently raised. For most adults, normal blood pressure at rest is 100–130 millimeters of mercury (mmHg) systolic and 60–80 mmHg diastolic. |
| Glasgow Coma Scale (GCS) | A neurological scale describing a patient’s level of consciousness and its changes, used for traumatic brain injury patients and for further assessment. Scores range from 3 (deep unconsciousness) to 15 (normal consciousness). |
| Intubation | It is a procedure that is used when one cannot breathe on their own |
| Diabetes mellitus | A metabolic disease that can result in high blood sugar. Normal blood sugar is 70–99 mg/dL, and high blood sugar is between 80 and 130 mg/dL when fasting. |
| Multiple sclerosis (MS) | A potentially disabling disease of the brain and spinal cord (the central nervous system). |
| Nutrition | The food that a person or other organism uses for maintenance, growth, reproduction, and health. In this work it is divided into three categories: low in calories, NPO (nothing by mouth), and weight loss. |
| Mobility | The average movement of a patient during hospitalization. Its categories are agitation, slightly limited, and completely immobile. |
| Bone fracture | It is a medical condition where the continuity of the bone is broken. |
| Cerebrovascular accident (CVA) | A medical condition in which blood flow in the brain suddenly deteriorates or stops. |
Selected attributes for modeling
| Variable | Type |
|---|---|
| Operation | Binominal |
| Smoking | Binominal |
| Hemiplegia/paraplegia | Binominal |
| Anemia | Binominal |
| Diabetes mellitus | Binominal |
| Motor sensory disorder | Binominal |
| Hypertension (HTN) | Binominal |
| Intubation | Binominal |
| Multiple sclerosis (MS) | Binominal |
| Bone fracture | Binominal |
| Cerebrovascular accident (CVA) | Binominal |
| Gender | Binominal |
| Bilirubin (mg/dL) | Numerical |
| Serum Creatinine (mg/dL) | Numerical |
| Platelet blood count (mcL) | Numerical |
| Body mass index (BMI) (kg/m2) | Numerical |
| Pulmonary Embolism Wells Score admit | Numerical |
| Pulmonary Embolism Wells Score discharge | Numerical |
| Deep vein thrombosis (DVT) score admit | Numerical |
| Deep vein thrombosis (DVT) score discharge | Numerical |
| Glasgow Coma Scale (GCS) admit | Numerical |
| Glasgow Coma Scale (GCS) discharge | Numerical |
| LOS | Numerical |
| Age | Numerical |
| Nutrition | Polynominal |
| Mobility | Polynominal |
| Skin type | Polynominal |
Fig. 1 The proposed model, moving from selecting data to predicting discharge
Selected attributes and related ranges in our work
| Variable | Range |
|---|---|
| Bilirubin (mg/dL) | A: Bilirubin ≤ 1.2; B: 1.2 < Bilirubin ≤ 1.9; C: 1.9 < Bilirubin ≤ 5.9; D: 5.9 < Bilirubin ≤ 11.9; E: 11.9 < Bilirubin |
| Serum Creatinine (mg/dL) | A: Cr ≤ 1.2; B: 1.2 < Cr ≤ 1.9; C: 1.9 < Cr ≤ 3.4; D: 3.4 < Cr ≤ 4.9; E: 4.9 < Cr |
| Platelet blood count (mcL) | A: Plt ≤ 20000; B: 20000 < Plt ≤ 50000; C: 50000 < Plt ≤ 100000; D: 100000 < Plt ≤ 150000; E: 150000 < Plt |
| Body mass index (BMI) (kg/m2) | Underweight: BMI < 18.5; Normal: 18.5 ≤ BMI < 25; Overweight: 25 ≤ BMI < 30; Obesity: BMI ≥ 30 |
| Pulmonary Embolism Wells Score admit/discharge | High probability: score > 6; Moderate probability: 2 ≤ score ≤ 6; Low probability: score < 2 |
| Deep vein thrombosis (DVT) score admit/discharge | Low: 0–2; Normal: 2–3; High: 3–8 |
| Glasgow Coma Scale (GCS) admit | Deep coma/Death: GCS < 3; Severe: 3 ≤ GCS ≤ 5; Low: 5 < GCS ≤ 9; Moderate: 9 < GCS ≤ 12; Mild: 12 < GCS ≤ 14; Fully alert: 14 < GCS |
| Glasgow Coma Scale (GCS) discharge | False: 9 ≤ GCS ≤ 15; True: 2 < GCS < 9 |
| Mobility | Agitation, slightly limited, completely immobile |
| Skin type | Dry, normal, moist |
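
The letter categories in the table above amount to binning each laboratory value into right-closed intervals. The following is a minimal Python sketch of that discretization, not the authors' pre-processing code; the names `BINS`, `LABELS`, and `categorize` are hypothetical, and the 100000 platelet boundary is assumed from the abstract's "less than 100000" threshold.

```python
from bisect import bisect_left

# Inclusive upper bounds of categories A-D; category E is everything above.
# Bounds are taken from the ranges table. The 100000 platelet bound is an
# assumption based on the abstract's platelet threshold.
BINS = {
    "bilirubin": [1.2, 1.9, 5.9, 11.9],               # mg/dL
    "creatinine": [1.2, 1.9, 3.4, 4.9],               # mg/dL
    "platelets": [20_000, 50_000, 100_000, 150_000],  # per mcL
}
LABELS = "ABCDE"


def categorize(variable: str, value: float) -> str:
    """Return the category letter (A-E) for one laboratory value."""
    # bisect_left returns the index of the first bound >= value, so a value
    # equal to a bound stays in the lower category (intervals are right-closed).
    return LABELS[bisect_left(BINS[variable], value)]


print(categorize("bilirubin", 1.2))     # A
print(categorize("platelets", 90_000))  # C
```

Keeping the bounds in one dictionary makes it easy to check them against the table and to add further variables in the same way.
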
Fig. 2 Our proposed model evaluation in this study
Advantages and disadvantages of our model compared with the other models used in this study
| Model | Advantages | Disadvantages |
|---|---|---|
| Bayesian Boosting | • Is moderately robust to outliers • Can be used on a small dataset • Can learn nonlinear relationships | • Vulnerable to overfitting • Vulnerable to uniform noise |
| AdaBoost | • Performs remarkably well in practice • Is an ensemble (boosting) algorithm that achieves high-precision classification by training several weak classifiers and assembling them into one strong classifier • Can learn nonlinear relationships • Is scalable • Is moderately robust to outliers • Is fast, simple to implement, and easy to program • Performs implicit feature selection • Can reduce overfitting (individual trees are inclined to overfit, since they can keep branching until they memorize the training data) | • Must be adjusted for cost-sensitive or imbalanced class problems • Sensitive to noisy data and outliers • Needs a termination condition • Vulnerable to uniform noise |
| Vote (DT+K-NN) | • Can learn nonlinear relationships • It is scalable • Can be used on a small dataset | • Vulnerable to overfitting |
| K-nearest neighbors | • Makes no assumption about the underlying data distribution • Is easy to implement | • Does not construct a model • Cannot explore the relationship between a feature and the class • Computationally intensive at prediction time • Requires huge storage • When one class has far more samples than the other, that class dominates the classification, leading to incorrect results |
| Decision tree | • Can be used on a small dataset • Can learn nonlinear relationships • Simple to analyze | • The model can easily overfit or underfit • Small changes in the training data can produce large changes in the result |
| Neural network | • Able to model complicated samples • Able to detect all possible interactions between predictor variables • Offers multiple training algorithms | • Has a “black box” nature • Has a great computational burden • Prone to overfitting the training dataset |
| Random forest | • Able to estimate noisy or missing data • Able to keep accuracy when a large portion of the data is missing • Able to handle thousands of input variables • Can be suitable for class imbalance issues | • Is not an easily interpretable model • Acts like a black box for statistical modelers; the user has little control over what the model does • A large number of trees can make the algorithm too slow and ineffective for real-time predictions |
| Logistic regression (LR) | • Is easy to understand and explain • Can be updated easily with new data • Does not require high computation power | • Performs poorly when there are nonlinear relationships • Vulnerable to overfitting • Too inflexible to capture intricate patterns • Adding the right interaction terms or polynomials is risky and time-consuming |
| Vote (DT+K-NN+LR) | • Is moderately robust to outliers • Is scalable • Can be used on a small dataset • Can learn nonlinear relationships | • Vulnerable to overfitting • Small changes in the training data can produce large changes in the result • Requires huge storage |
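
The AdaBoost row above describes training several weak classifiers and assembling them into one strong classifier. The following is a minimal pure-Python sketch of that idea, assuming discrete AdaBoost with one-feature threshold stumps as the weak learners. The study's actual implementation and settings are not given in this record, so the function names and the toy one-dimensional data are illustrative only.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the best weighted threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Rule: predict `sign` when the feature is <= thr, else `-sign`.
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost_fit(X, y, rounds=10):
    """Fit discrete AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # avoid division by zero
        if err >= 0.5:                     # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data (not clinical): no single threshold separates the two classes.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(model, row) for row in X] == y)  # True
```

On this toy set the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all of them correctly. The reweighting step is also why the table lists sensitivity to noisy data as a disadvantage: mislabeled points keep gaining weight.
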
Comparison of the performance of various machine learning methods
| Model | AUC | Sensitivity (recall) | Kappa | F-measure |
|---|---|---|---|---|
| Bayesian Boosting | 0.895 | 86.11% | 0.847 | 88.57% |
| AdaBoost | 0.966 | 87.88% | 0.859 | 89.23% |
| Vote(DT+K-NN) | 0.898 | 55.26% | 0.606 | 68.85% |
| K-nearest neighbors | 0.881 | 40.62% | 0.440 | 53.06% |
| Decision tree | 0.836 | 72.41% | 0.764 | 82.35% |
| Neural network | 0.888 | 47.62% | 0.534 | 58.82% |
| Random forest | 0.916 | 73.33% | 0.614 | 68.75% |
| Logistic regression (LR) | 0.923 | 81.25% | 0.802 | 83.87% |
| Vote (DT+K-NN+LR) | 0.875 | 70.59% | 0.815 | 81.36% |
The confusion matrix and related metrics
| Hypothesized class / true class | Y | N |
|---|---|---|
| Y | True positive (TP) | False positive (FP) |
| N | False negative (FN) | True negative (TN) |
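
The table lists only the four cells. For reference, the standard definitions of the metrics reported in this work, written in terms of those cells (with $n = TP + FP + FN + TN$), are as follows. These are the textbook formulas and are assumed, not confirmed, to be the ones the authors used.

$$
\text{Sensitivity (recall)} = \frac{TP}{TP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{TP + TN}{n}, \qquad
p_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{n^2}
$$
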
Confusion matrices of the models
| Model/confusion matrix | | | Model/confusion matrix | | |
|---|---|---|---|---|---|
| Bayesian Boosting | | | Random forest | | |
| Y | 98 | 5 | Y | 112 | 19 |
| N | 3 | 31 | N | 2 | 4 |
| AdaBoost | | | Neural network | | |
| Y | 101 | 5 | Y | 63 | 4 |
| N | 3 | 31 | N | 6 | 11 |
| Vote (DT, K-NN) | | | Regression | | |
| Y | 97 | 17 | Y | 65 | 3 |
| N | 2 | 21 | N | 2 | 13 |
| K-NN | | | Vote (DT, K-NN, RF) | | |
| Y | 101 | 4 | Y | 102 | 10 |
| N | 1 | 13 | N | 1 | 24 |
| Decision tree | | | | | |
| Y | 107 | 8 | | | |
| N | 1 | 21 | | | |
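
The reported sensitivity, Kappa, and F-measure can be recomputed from the cells of a confusion matrix. The following is a minimal Python sketch using the standard formulas; the function name `binary_metrics` is hypothetical. It assumes that rows are hypothesized classes and columns are true classes, and that the second class (N) is the one scored as positive. That orientation is an inference, not something stated in the source: it is the reading under which the Bayesian Boosting matrix reproduces that model's reported values.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, precision, F-measure, and Cohen's kappa from four cell counts."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    p_observed = (tp + tn) / n
    # Chance agreement: product of predicted and true marginals, per class.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Bayesian Boosting matrix from the table (rows Y/N: 98, 5 and 3, 31),
# read with the second class as positive (assumption).
m = binary_metrics(tp=31, fn=5, fp=3, tn=98)
print(round(100 * m["sensitivity"], 2))  # 86.11
print(round(m["kappa"], 3))              # 0.847
print(round(100 * m["f_measure"], 2))    # 88.57
```

These three values round to the 86.11%, 0.847, and 88.57% reported for Bayesian Boosting in the performance table. The same check can be run on the other matrices.
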
