Jinwan Wang, Shuai Wang, Mark Xuefang Zhu, Tao Yang, Qingfeng Yin, Ya Hou.
Abstract
BACKGROUND: Coronary heart disease is a major health hazard, and its incidence has been increasing year by year. Although coronary revascularization, mainly percutaneous coronary intervention, plays an important role in treating coronary heart disease, major adverse cardiovascular events (MACE) after revascularization, such as recurrent or persistent angina pectoris, remain a difficult problem in clinical practice.
Keywords: data imbalance; machine learning; major adverse cardiovascular events; oversampling; risk prediction
Year: 2022; PMID: 35442202; PMCID: PMC9069286; DOI: 10.2196/33395
Source DB: PubMed; Journal: JMIR Med Inform
The confusion matrix for binary classification.
| Labeled | Predicted as negative | Predicted as positive |
| Negative | True negative | False positive |
| Positive | False negative | True positive |
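The four cells of this matrix are the basis for the accuracy, precision, recall, and F1-score values reported for the models. A minimal sketch of those definitions follows; the helper name `classification_metrics` and the counts are hypothetical and chosen only for illustration, not taken from the paper.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute standard metrics from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (assumption, not data from the study)
metrics = classification_metrics(tn=50, fp=10, fn=5, tp=35)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Recall depends only on the "Positive" row of the matrix, which is why a model can reach high accuracy on imbalanced data while missing most positive cases.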
Significant variables in univariate analysis.
| Characteristics | Without MACE^a (n=753) | With MACE (n=251) | Statistic^b | df | P value |
| Age (years), mean (SD) | 63.47 (10.98) | 66.82 (11.10) | –4.180 | 1002 | <.001 |
| | | | 4.360 | 1 | .04 |
| No, n (%) | 429 (57.0) | 124 (49.4) | | | |
| Yes, n (%) | 324 (43.0) | 127 (50.6) | | | |
| | | | 6.213 | 1 | .013 |
| Physical work, n (%) | 353 (46.9) | 95 (37.8) | | | |
| Mental work, n (%) | 400 (53.1) | 156 (62.2) | | | |
| Course of disease (years since diagnosis), mean (SD) | 3.41 (5.11) | 5.43 (5.81) | –4.930 | 387.17 | <.001 |
| | | | 4.387 | 1 | .04 |
| No, n (%) | 693 (92.0) | 220 (87.6) | | | |
| Yes, n (%) | 60 (8.0) | 31 (12.4) | | | |
| | | | 17.920 | 1 | <.001 |
| No obvious seasonality, n (%) | 675 (89.6) | 199 (79.3) | | | |
| Obvious seasonality, n (%) | 78 (10.4) | 52 (20.7) | | | |
| | | | 80.775 | 1 | <.001 |
| No, n (%) | 643 (85.4) | 147 (58.6) | | | |
| Yes, n (%) | 110 (14.6) | 104 (41.4) | | | |
| | | | 6.659 | 1 | .01 |
| No, n (%) | 681 (90.4) | 240 (95.6) | | | |
| Yes, n (%) | 72 (9.6) | 11 (4.4) | | | |
| | | | 24.822 | 1 | <.001 |
| No, n (%) | 671 (89.1) | 192 (76.5) | | | |
| Yes, n (%) | 82 (10.9) | 59 (23.5) | | | |
| | | | 6.249 | 1 | .01 |
| No, n (%) | 712 (94.6) | 226 (90.0) | | | |
| Yes, n (%) | 41 (5.4) | 25 (10.0) | | | |
| | | | 4.489 | 1 | .03 |
| No, n (%) | 570 (75.7) | 173 (68.9) | | | |
| Yes, n (%) | 183 (24.3) | 78 (31.1) | | | |
| | | | 47.408 | 1 | <.001 |
| No, n (%) | 367 (48.7) | 185 (73.7) | | | |
| Yes, n (%) | 386 (51.3) | 66 (26.3) | | | |
| | | | 4.123 | 1 | .04 |
| No, n (%) | 703 (93.4) | 243 (96.8) | | | |
| Yes, n (%) | 50 (6.6) | 8 (3.2) | | | |
| | | | 6.055 | 1 | .01 |
| No, n (%) | 636 (84.5) | 195 (77.7) | | | |
| Yes, n (%) | 117 (15.5) | 56 (22.3) | | | |
| | | | 14.381 | 1 | <.001 |
| No, n (%) | 634 (84.2) | 235 (93.6) | | | |
| Yes, n (%) | 119 (15.8) | 16 (6.4) | | | |
| | | | 12.446 | 1 | <.001 |
| No, n (%) | 735 (97.6) | 233 (92.8) | | | |
| Yes, n (%) | 18 (2.4) | 18 (7.2) | | | |
| LAD^d (mm), mean (SD) | 36.59 (4.91) | 37.70 (5.54) | –2.988 | 1002 | .003 |
| LVEF^e (%), mean (SD) | 52.65 (8.08) | 51.31 (8.91) | 2.113 | 395.97 | .04 |
| | | | 7.200 | 1 | .007 |
| No, n (%) | 745 (98.9) | 242 (96.4) | | | |
| Yes, n (%) | 8 (1.1) | 9 (3.6) | | | |
| HAMD^f, mean (SD) | 7.23 (5.26) | 9.27 (5.87) | –4.877 | 392.12 | <.001 |
| HAMA^g, mean (SD) | 8.23 (6.59) | 11.13 (6.83) | –5.979 | 1002 | <.001 |
^a MACE: major adverse cardiovascular events.
^b t statistics for continuous variable comparisons and χ² statistics for categorical variables.
^c TCM: traditional Chinese medicine.
^d LAD: left atrial diameter.
^e LVEF: left ventricular ejection fraction.
^f HAMD: Hamilton depression scale.
^g HAMA: Hamilton anxiety scale.
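The χ² statistics for the categorical variables can be checked directly from the cell counts. The sketch below assumes the Pearson χ² statistic without continuity correction for a 2×2 table (an assumption about the test variant; the record itself does not say) and applies it to the first categorical variable in the table (No: 429 vs 124; Yes: 324 vs 127). The helper name `pearson_chi2_2x2` is hypothetical.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: No / Yes; columns: without MACE / with MACE
chi2 = pearson_chi2_2x2(429, 124, 324, 127)
print(f"chi-square = {chi2:.3f}, df = 1")
```

Under this assumption the result agrees with the reported value of 4.360 to three decimal places.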
Data distribution before and after oversampling.
| Oversampling | Training set: without MACE^a | Training set: with MACE | Validation set: without MACE | Validation set: with MACE |
| Before | 527 | 176 | 226 | 75 |
| After | 527 | 527 | 226 | 226 |
^a MACE: major adverse cardiovascular events.
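The table shows the minority class (with MACE) being brought up to the size of the majority class. This record does not state which oversampling algorithm was used, so the sketch below substitutes plain random oversampling (duplicating minority cases with replacement) purely to illustrate the balancing step. The function `random_oversample` and the placeholder rows are assumptions.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra samples with replacement; the majority class needs none
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Training-set class sizes before oversampling: 527 without MACE, 176 with MACE
labels = [0] * 527 + [1] * 176
rows = list(range(len(labels)))  # placeholder feature rows (assumption)
new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(0), new_labels.count(1))
```

Synthetic-sample methods would produce the same class counts but generate new minority cases instead of duplicating existing ones.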
Comparisons of machine learning algorithms before and after oversampling.
| Algorithms | Accuracy | Precision | Recall | F1-score | AUC^a |
| Before oversampling | | | | | |
| DT^b | 0.7575 | 0.5217 | 0.3200 | 0.3967 | 0.7296 |
| RF^c | 0.7741 | 0.6667 | 0.1867 | 0.2917 | 0.7888 |
| LR^d | 0.7608 | 0.5405 | 0.2667 | 0.3571 | 0.7534 |
| NB^e | 0.7442 | 0.4857 | 0.4533 | 0.4689 | 0.7224 |
| SVM^f | 0.7641 | 0.7 | 0.0933 | 0.1647 | 0.7431 |
| XGBoost^g | 0.7807 | 0.5918 | 0.3867 | 0.4677 | 0.7873 |
| After oversampling | | | | | |
| DT | 0.7035 | 0.73 | 0.6460 | 0.6854 | 0.7748 |
| RF | 0.7522 | 0.7714 | 0.7168 | 0.7431 | 0.8434 |
| LR | 0.7434 | 0.7254 | 0.7832 | 0.7532 | 0.7841 |
| NB | 0.7058 | 0.6598 | 0.8495 | 0.7421 | 0.7463 |
| SVM | 0.7478 | 0.7593 | 0.7257 | 0.7421 | 0.8075 |
| XGBoost | 0.7788 | 0.8058 | 0.7345 | 0.7685 | 0.8599 |
^a AUC: area under the curve.
^b DT: decision tree.
^c RF: random forest.
^d LR: logistic regression.
^e NB: naïve Bayes.
^f SVM: support vector machine.
^g XGBoost: extreme gradient boosting.
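The reported metrics can be cross-checked against the validation-set class sizes. As a worked example, the sketch below recovers the confusion-matrix counts implied by the XGBoost recall and precision after oversampling (226 cases per class) and recomputes accuracy from them. The helper `counts_from_metrics` is hypothetical, and rounding to whole cases is an assumption.

```python
def counts_from_metrics(n_pos, n_neg, recall, precision):
    """Recover (tn, fp, fn, tp) from recall, precision, and class sizes."""
    tp = round(recall * n_pos)        # recall = tp / n_pos
    fn = n_pos - tp
    fp = round(tp / precision - tp)   # precision = tp / (tp + fp)
    tn = n_neg - fp
    return tn, fp, fn, tp


# XGBoost after oversampling: recall 0.7345, precision 0.8058
tn, fp, fn, tp = counts_from_metrics(226, 226, recall=0.7345, precision=0.8058)
accuracy = (tp + tn) / (226 + 226)
f1 = 2 * tp / (2 * tp + fp + fn)
print(tn, fp, fn, tp, round(accuracy, 4), round(f1, 4))
```

The recomputed accuracy and F1-score round to the tabulated 0.7788 and 0.7685, so the row is internally consistent.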
Figure 1. ROC curves of machine learning algorithms after oversampling. ROC: receiver operating characteristic; DT: decision tree; RF: random forest; LR: logistic regression; NB: naïve Bayes; SVM: support vector machine; XGBoost: extreme gradient boosting; TPR: true positive rate; FPR: false positive rate.
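An ROC curve plots TPR against FPR as the decision threshold is swept over the predicted scores, and the AUC is the area under that curve. The sketch below shows one common way to compute both, using the trapezoidal rule. The function names and the six example scores are assumptions for illustration, not data or code from the study.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs from sweeping the threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores and labels (1 = with MACE), for illustration only
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
roc_auc = auc_trapezoid(roc_points(scores, labels))
print(f"AUC = {roc_auc:.4f}")
```

For this toy input, 8 of the 9 positive-negative pairs are ranked correctly, which is the same quantity the area measures.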
Figure 2. Confusion matrix of the risk prediction models with machine learning algorithms: (A) decision tree (DT), (B) random forest (RF), (C) logistic regression (LR), (D) naïve Bayes (NB), (E) support vector machine (SVM), (F) extreme gradient boosting (XGBoost).
Figure 3. The relative importance of feature variables of the risk prediction models with machine learning algorithms: (A) decision tree (DT), (B) random forest (RF), (C) logistic regression (LR), (D) extreme gradient boosting (XGBoost). TCM: traditional Chinese medicine; LAD: left atrial diameter; LVEF: left ventricular ejection fraction; HAMD: Hamilton depression scale; HAMA: Hamilton anxiety scale.