Amirmohammad Khalaji1,2,3, Amir Hossein Behnoush1,2,3, Mana Jameie1,3,4, Ali Sharifi5, Ali Sheikhy1,3,4, Aida Fallahzadeh1,3,4, Saeed Sadeghian1,2, Mina Pashang1,2, Jamshid Bagheri1, Seyed Hossein Ahmadi Tafti1, Kaveh Hosseini1,3.
Abstract
Background: As the era of big data analytics unfolds, machine learning (ML) might be a promising tool for predicting clinical outcomes. This study aimed to evaluate the predictive ability of ML models for estimating mortality after coronary artery bypass grafting (CABG). Materials and methods: Various baseline and follow-up features were obtained from the CABG data registry, established in 2005 at Tehran Heart Center. After selecting key variables using the random forest method, prediction models were developed using Logistic Regression (LR), Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) algorithms. Area Under the Curve (AUC) and other indices were used to assess performance.
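The described workflow (random-forest-based feature selection followed by training several classifiers and scoring them by AUC) could be sketched with scikit-learn. Everything below is illustrative: the data are synthetic and the hyperparameters are placeholders, not the study's actual settings.

```python
# Illustrative sketch of the abstract's workflow; not the authors' code.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the registry data: features plus a rare mortality label.
X, y = make_classification(n_samples=2000, n_features=25, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Step 1: random-forest-based feature selection, as described in the abstract.
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0))
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

# Step 2: fit one of the candidate models (logistic regression shown here)
# on the selected features and score it by AUC on held-out data.
clf = LogisticRegression(max_iter=1000).fit(X_tr_sel, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te_sel)[:, 1])
```

The same selected-feature matrices would be reused to fit the other five model families before comparing AUCs.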
Keywords: coronary artery bypass; feature selection; machine learning; mortality; prediction
Year: 2022 PMID: 36093147 PMCID: PMC9448905 DOI: 10.3389/fcvm.2022.977747
Source DB: PubMed Journal: Front Cardiovasc Med ISSN: 2297-055X
Baseline and hospitalization characteristics of the study cohort.
| Variable | Total cohort |
| Age (years) | 67.34 ± 9.67 |
| Male | 12,387 (73.51) |
| Hypertension | 9,056 (53.75) |
| Diabetes | 6,757 (40.10) |
| Dyslipidemia | 8,970 (53.23) |
| Family history of cardiovascular disease | 6,224 (36.94) |
| Smoking | 2,989 (17.74) |
| Prior MI | 5,681 (33.72) |
| Prior HF | 475 (2.82) |
| COPD | 610 (3.62) |
| Prior CABG | 86 (0.51) |
| Prior PCI | 1,277 (7.58) |
| PVD | 319 (1.89) |
| CVA | 1,155 (6.86) |
| Opium | 2,755 (16.35) |
| Off pump surgery | 1,592 (9.45) |
| BMI (kg/m2) | 27.23 ± 4.17 |
| FBS (mg/dl) | 110.41 ± 39.23 |
| EF (%) | 46.11 ± 9.11 |
| LDL-C (mg/dl) | 96.20 ± 36.35 |
| HDL-C (mg/dl) | 36.86 ± 9.68 |
| Cholesterol (mg/dl) | 155.35 ± 43.55 |
| TG (mg/dl) | 149.68 ± 78.22 |
| Creatinine (mg/dl) | 0.98 ± 0.56 |
| Hb (g/dl) | 13.83 ± 1.70 |
| Total ventilation hours | 13.36 ± 31.02 |
Data are presented as mean ± S.D. or number (%); MI, myocardial infarction; HF, heart failure; COPD, chronic obstructive pulmonary disease; PCI, percutaneous coronary intervention; PVD, peripheral vascular disease; CVA, cerebrovascular accident; FBS, fasting blood sugar; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; TG, triglyceride; BMI, body mass index; Hb, hemoglobin; EF, ejection fraction.
FIGURE 1. Number of survivors and non-survivors at each follow-up endpoint.
FIGURE 2. Comparison of baseline and hospitalization characteristics of survivors and non-survivors with one-year follow-up. MI, myocardial infarction; HF, heart failure; COPD, chronic obstructive pulmonary disease; CABG, coronary artery bypass grafting; PCI, percutaneous coronary intervention; PVD, peripheral vascular disease; CVA, cerebrovascular accident; FBS, fasting blood sugar; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; TG, triglyceride; BMI, body mass index; Hb, hemoglobin; EF, ejection fraction.
FIGURE 3. Feature importance based on the Random Forest model. MI, myocardial infarction; HF, heart failure; COPD, chronic obstructive pulmonary disease; CABG, coronary artery bypass grafting; PCI, percutaneous coronary intervention; PVD, peripheral vascular disease; CVA, cerebrovascular accident; FBS, fasting blood sugar; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; TG, triglyceride; BMI, body mass index; Hb, hemoglobin; EF, ejection fraction.
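Random-forest feature importance of the kind shown in Figure 3 can be obtained from scikit-learn's `feature_importances_` attribute. The sketch below uses synthetic data and an illustrative subset of the study's variable names; the importances it produces are not the study's results.

```python
# Illustrative feature-importance ranking with a random forest.
# Data are synthetic; feature names are placeholders echoing Table 1.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["Age", "EF", "Creatinine", "BMI", "FBS", "Hb"]
X = rng.normal(size=(500, len(features)))
# Synthetic outcome driven mainly by "Age" and, more weakly, "Creatinine".
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Importances sum to 1; sort descending to get a Figure-3-style ranking.
ranking = sorted(zip(features, rf.feature_importances_),
                 key=lambda t: t[1], reverse=True)
```

In this synthetic setup the features that drive the outcome dominate the ranking, which is the property the figure visualizes.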
Evaluation of machine learning (ML) models for each of the five follow-up endpoints.
| Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUC [95% CI] |
| Endpoint 1 | | | | |
| Random forest | 70.80 | 69.85 | 69.88 | 0.78 [0.74–0.82] |
| Naïve Bayes | 59.85 | 85.67 | 84.96 | 0.78 [0.74–0.83] |
| SVM | 63.50 | 74.65 | 74.35 | 0.74 [0.69–0.79] |
| XGBoost | 66.42 | 77.73 | 77.42 | 0.79 [0.75–0.83] |
| KNN | 68.61 | 63.52 | 63.66 | 0.72 [0.67–0.76] |
| Logistic regression | 72.99 | 77.79 | 77.66 | 0.81 [0.77–0.85] |
| Endpoint 2 | | | | |
| Random forest | 66.17 | 73.01 | 72.73 | 0.77 [0.73–0.80] |
| Naïve Bayes | 76.62 | 62.39 | 62.98 | 0.76 [0.72–0.79] |
| SVM | 48.76 | 85.26 | 83.73 | 0.67 [0.63–0.72] |
| XGBoost | 66.17 | 72.62 | 72.35 | 0.75 [0.71–0.79] |
| KNN | 63.18 | 69.50 | 69.24 | 0.73 [0.70–0.77] |
| Logistic regression | 75.12 | 70.37 | 70.57 | 0.79 [0.76–0.82] |
| Endpoint 3 | | | | |
| Random forest | 61.45 | 75.96 | 75.06 | 0.74 [0.71–0.77] |
| Naïve Bayes | 58.55 | 80.03 | 78.70 | 0.74 [0.71–0.78] |
| SVM | 45.09 | 84.53 | 82.08 | 0.68 [0.64–0.71] |
| XGBoost | 68.00 | 67.11 | 67.16 | 0.74 [0.71–0.77] |
| KNN | 70.18 | 64.92 | 65.24 | 0.72 [0.69–0.76] |
| Logistic regression | 67.64 | 72.79 | 72.47 | 0.76 [0.73–0.79] |
| Endpoint 4 | | | | |
| Random forest | 50.14 | 81.75 | 78.91 | 0.72 [0.69–0.75] |
| Naïve Bayes | 53.03 | 81.44 | 78.88 | 0.71 [0.68–0.75] |
| SVM | 42.07 | 83.32 | 79.61 | 0.64 [0.60–0.67] |
| XGBoost | 65.71 | 65.84 | 65.83 | 0.70 [0.67–0.74] |
| KNN | 67.72 | 63.70 | 64.06 | 0.70 [0.66–0.72] |
| Logistic regression | 69.74 | 67.52 | 67.72 | 0.73 [0.70–0.76] |
| Endpoint 5 | | | | |
| Random forest | 64.65 | 73.65 | 72.41 | 0.75 [0.72–0.77] |
| Naïve Bayes | 58.14 | 77.67 | 74.98 | 0.73 [0.71–0.76] |
| SVM | 48.37 | 80.61 | 76.16 | 0.66 [0.63–0.69] |
| XGBoost | 65.35 | 72.98 | 71.93 | 0.74 [0.72–0.77] |
| KNN | 60.70 | 73.20 | 71.48 | 0.73 [0.70–0.75] |
| Logistic regression | 67.21 | 70.08 | 69.68 | 0.75 [0.72–0.77] |
AUC, area under the receiver operating characteristic curve; CI, confidence interval; SVM, support vector machine; XGBoost, extreme gradient boosting; KNN, K-nearest neighbors.
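The sensitivity, specificity, and accuracy columns above are standard confusion-matrix summaries. A minimal computation, with made-up counts that are not from the study:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)            # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Example with illustrative counts (not the study's data):
sens, spec, acc = confusion_metrics(tp=73, fp=45, tn=155, fn=27)
# sens = 0.73, spec = 0.775, acc = 0.76
```

Note that with a rare outcome such as post-CABG mortality, accuracy tracks specificity closely, which is why models in the table can pair high accuracy with modest sensitivity.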
FIGURE 4. Receiver operating characteristic curve for mortality prediction. XGBoost, Extreme Gradient Boosting; SVM, Support Vector Machine; KNN, K-Nearest Neighbors.
FIGURE 5. Summary of study design and machine learning model development and evaluation.