Ran Liu, Miye Wang, Tao Zheng, Rui Zhang, Nan Li, Zhongxiu Chen, Hongmei Yan, Qingke Shi.
Abstract
BACKGROUND: Myocardial infarction (MI) can lead to malignant arrhythmia, heart failure, and sudden death. Clinical studies have shown that early identification of, and timely intervention for, acute MI can significantly reduce mortality. Traditional MI risk assessment models are subjective, and the data they require are difficult to obtain. Generally, assessment is conducted only among high-risk patient groups.
Keywords: Artificial intelligence; Imbalanced data; Machine learning; Myocardial infarction
Year: 2022 PMID: 35672659 PMCID: PMC9175344 DOI: 10.1186/s12859-022-04761-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Basic features of patients with and without MI
| Feature | With MI (n = 14,446) | Without MI (n = 220,369) |
|---|---|---|
| Mean age (yrs) | 65.9 ± 13.4 | 62.4 ± 16.8 |
| Male, n (%) | 11,406 (79) | 131,220 (60) |
| Troponin-T (ng/L) | 327.9 (23.4–2463.5) | 12.3 (7.4–25.2) |
| Urodilatin (pg/ml) | 1138 (348–3693.5) | 242 (79–1057) |
| Myoglobin (ng/ml) | 52.9 (29.3–183.9) | 32.9 (21–64.7) |
| Total cholesterol (mmol/L) | 3.8 (3.1–4.6) | 3.94 (3.21–4.73) |
| Creatine kinase isoenzymes-MB (ng/ml) | 3.9 (1.9–39.3) | 1.55 (1.01–2.53) |
| Serum creatinine (umol/L) | 85.8 (71–109) | 74 (60–93) |
| Fasting plasma glucose (mmol/L) | 6.73 (5.5–9) | 5.6 (4.9–7.2) |
| Direct bilirubin (umol/L) | 4.3 (3.1–6.1) | 3.9 (2.7–5.8) |
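
The cohort sizes in the table header imply a strong class imbalance: roughly 6% of patients have MI, a negative-to-positive ratio of about 15 to 1. A quick worked check of that arithmetic (variable names are illustrative and not taken from the paper):

```python
# Cohort sizes taken from the header of the table above.
n_mi = 14_446
n_no_mi = 220_369

# Share of MI patients in the whole cohort.
prevalence = n_mi / (n_mi + n_no_mi)

# Number of non-MI patients per MI patient.
imbalance_ratio = n_no_mi / n_mi

print(f"MI prevalence: {prevalence:.1%}")
print(f"Negative-to-positive ratio: {imbalance_ratio:.2f}")
```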
Fig. 1 Machine learning model building flowchart
Fig. 2 Easy ensemble and w-easy ensemble architecture diagram
Fig. 3 Overall results of Models 1, 2, and 3 on the validation sets
Optimal results of Models 1–3 on the validation set
| Model name | Construction method | Optimal algorithm | Number of optimal features | Negative training samples (n) | Positive training samples (n) | Negative validation samples (n) | Positive validation samples (n) | Validation accuracy | Validation F1 score |
|---|---|---|---|---|---|---|---|---|---|
| Model1 | Proportional division | GBDT | 9 | 175,496 | 10,756 | 43,873 | 2690 | 0.96 | 0.78 |
| Model2 | Upsampling | RF | 3 | 175,395 | 172,209 | 43,974 | 42,927 | 0.99 | 0.99 |
| Model3 | Downsampling | GBDT | 24 | 10,784 | 10,729 | 2662 | 2717 | 0.84 | 0.84 |
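
The "Upsampling" and "Downsampling" construction methods can be illustrated with plain random resampling. This is a minimal sketch under that assumption: the record does not say which resampling procedure the authors used, and the function names and toy data below are hypothetical.

```python
import random

def downsample(negatives, positives, seed=0):
    """Randomly keep only as many negatives as there are positives."""
    rng = random.Random(seed)
    return rng.sample(negatives, len(positives)), list(positives)

def upsample(negatives, positives, seed=0):
    """Randomly repeat positives (with replacement) until they match the negatives."""
    rng = random.Random(seed)
    extra = rng.choices(positives, k=len(negatives) - len(positives))
    return list(negatives), list(positives) + extra

# Toy data with a 15:1 imbalance (hypothetical, not the study data).
negatives = [("neg", i) for i in range(150)]
positives = [("pos", i) for i in range(10)]

down_neg, down_pos = downsample(negatives, positives)
up_neg, up_pos = upsample(negatives, positives)
print(len(down_neg), len(down_pos), len(up_neg), len(up_pos))
```

Downsampling discards most of the majority class, while upsampling keeps every record at the cost of repeating minority records; both leave the two classes the same size.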
Submodel building with ensemble data
| Submodel | Optimal algorithm | Number of optimal features | Accuracy | F1 score |
|---|---|---|---|---|
| Model_E0 | GBDT | 18 | 0.87 | 0.87 |
| Model_E1 | GBDT | 19 | 0.85 | 0.85 |
| Model_E2 | GBDT | 17 | 0.84 | 0.84 |
| Model_E3 | GBDT | 20 | 0.85 | 0.85 |
| Model_E4 | GBDT | 20 | 0.84 | 0.84 |
| Model_E5 | GBDT | 14 | 0.85 | 0.85 |
| Model_E6 | GBDT | 17 | 0.85 | 0.85 |
| Model_E7 | GBDT | 17 | 0.85 | 0.85 |
| Model_E8 | GBDT | 14 | 0.85 | 0.85 |
| Model_E9 | GBDT | 10 | 0.85 | 0.85 |
| Model_E10 | GBDT | 20 | 0.85 | 0.85 |
| Model_E11 | GBDT | 21 | 0.87 | 0.87 |
| Model_E12 | GBDT | 16 | 0.89 | 0.89 |
| Model_E13 | GBDT | 16 | 0.89 | 0.89 |
| Model_E14 | GBDT | 16 | 0.88 | 0.88 |
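
The 15 submodels above each train on a balanced subset of the data. One way such subsets could be built is sketched below, assuming the negatives are split into disjoint chunks and every chunk is paired with the full set of positives. The record does not state how the subsets were drawn (the classic EasyEnsemble algorithm samples them independently), so the partitioning scheme, the function name, and the toy data are all assumptions.

```python
import random

def easy_ensemble_subsets(negatives, positives, n_subsets, seed=0):
    """Pair each of n_subsets disjoint negative chunks with all positives."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Deal the shuffled negatives round-robin into n_subsets disjoint chunks.
    chunks = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [(chunk, list(positives)) for chunk in chunks]

# Toy data (hypothetical): 150 negatives, 10 positives, 15 subsets.
toy_negatives = list(range(150))
toy_positives = [f"p{i}" for i in range(10)]
subsets = easy_ensemble_subsets(toy_negatives, toy_positives, n_subsets=15)

for neg_chunk, pos_all in subsets[:3]:
    print(len(neg_chunk), len(pos_all))
```

Each subset is then used to fit one submodel, and the submodels are combined by voting.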
Voting traversal results of Model 4 and Model 5
| Model | Vote difference | Accuracy | F1 score | Negative precision | Positive precision | Specificity | Sensitivity |
|---|---|---|---|---|---|---|---|
| Model4_1 | 1 | 0.86 | 0.66 | 0.99 | 0.27 | 0.86 | 0.82 |
| Model4_3 | 3 | 0.87 | 0.68 | 0.99 | 0.29 | 0.88 | 0.80 |
| Model4_5 | 5 | 0.89 | 0.69 | 0.98 | 0.32 | 0.90 | 0.77 |
| Model4_7 | 7 | 0.90 | 0.71 | 0.98 | 0.35 | 0.91 | 0.75 |
| Model4_9 | 9 | 0.92 | 0.73 | 0.98 | 0.39 | 0.93 | 0.72 |
| Model4_11 | 11 | 0.93 | 0.74 | 0.98 | 0.42 | 0.94 | 0.70 |
| Model4_13 | 13 | 0.94 | 0.76 | 0.95 | 0.47 | 0.95 | 0.66 |
| Model4_15 | 15 | 0.95 | 0.78 | 0.98 | 0.54 | 0.97 | 0.62 |
| Model5_0 | 0 | 0.85 | 0.66 | 0.99 | 0.27 | 0.86 | 0.82 |
| Model5_1 | 1 | 0.87 | 0.68 | 0.99 | 0.29 | 0.88 | 0.80 |
| Model5_2 | 2 | 0.89 | 0.69 | 0.99 | 0.32 | 0.89 | 0.78 |
| Model5_3 | 3 | 0.89 | 0.70 | 0.98 | 0.34 | 0.91 | 0.76 |
| Model5_4 | 4 | 0.90 | 0.71 | 0.98 | 0.35 | 0.91 | 0.75 |
| Model5_5 | 5 | 0.92 | 0.73 | 0.98 | 0.39 | 0.93 | 0.72 |
| Model5_6 | 6 | 0.93 | 0.74 | 0.98 | 0.42 | 0.94 | 0.70 |
| Model5_7 | 7 | 0.94 | 0.76 | 0.98 | 0.47 | 0.95 | 0.67 |
| Model5_8 | 8 | 0.94 | 0.77 | 0.98 | 0.51 | 0.96 | 0.64 |
| Model5_9 | 9 | 0.95 | 0.78 | 0.98 | 0.54 | 0.97 | 0.62 |
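
The "vote difference" column can be read as a decision threshold on the submodel votes. The sketch below assumes a sample is labelled positive when positive votes exceed negative votes by at least that threshold. This interpretation is an assumption, although it fits the Model 4 rows: 15 unweighted voters can only produce odd differences (1, 3, ..., 15). The weighting used by the w-easy ensemble (Model 5) is not given in this record and is not modelled here.

```python
def predict_by_vote_difference(votes, min_difference):
    """Return 1 if (positive votes - negative votes) >= min_difference, else 0.

    votes: list of 0/1 predictions, one per submodel.
    """
    positive = sum(votes)
    negative = len(votes) - positive
    return int(positive - negative >= min_difference)

# Hypothetical sample on which 8 of 15 submodels vote "MI".
narrow_majority = [1] * 8 + [0] * 7
print(predict_by_vote_difference(narrow_majority, 1))
print(predict_by_vote_difference(narrow_majority, 3))

# A threshold of 15 requires a unanimous positive vote.
unanimous = [1] * 15
print(predict_by_vote_difference(unanimous, 15))
```

Raising the threshold makes a positive call harder to reach, which matches the trend in the table: specificity and positive precision rise while sensitivity falls.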
Optimal results of Models 4 and 5 on the validation set
| Model name | Construction method | Optimal vote difference | Number of optimal features | Negative training samples (n) | Positive training samples (n) | Negative validation samples (n) | Positive validation samples (n) | Validation accuracy | Validation F1 score |
|---|---|---|---|---|---|---|---|---|---|
| Model4 | Easy ensemble | 15 | 45 | 175,109 | 10,714 | 43,261 | 2732 | 0.95 | 0.78 |
| Model5 | W-easy ensemble | 9 | 45 | 175,109 | 10,714 | 43,261 | 2732 | 0.95 | 0.78 |
Results of all models on the test set
| Model | Accuracy | F1 score | Negative precision | Positive precision | Specificity | Sensitivity | AUC |
|---|---|---|---|---|---|---|---|
| Model1 | 0.71 | 0.69 | 0.64 | 0.98 | 0.99 | 0.44 | 0.90 |
| Model2 | 0.70 | 0.68 | 0.63 | 0.94 | 0.97 | 0.42 | 0.78 |
| Model3 | 0.83 | 0.83 | 0.81 | 0.87 | 0.88 | 0.80 | 0.91 |
| Model4 | 0.79 | 0.78 | 0.71 | 0.94 | 0.96 | 0.61 | 0.91 |
| Model5 | 0.80 | 0.80 | 0.73 | 0.93 | 0.95 | 0.65 | 0.91 |
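
The F1 column is noticeably higher than a positive-class F1 computed from positive precision and sensitivity alone, and the values appear consistent with a macro-averaged F1 (the mean of the per-class F1 scores). That reading is an inference from the table, not something stated in this record. A minimal check, treating specificity as the recall of the negative class, with a hypothetical helper name:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def macro_f1(pos_precision, sensitivity, neg_precision, specificity):
    """Unweighted mean of the positive-class and negative-class F1 scores."""
    return (f1(pos_precision, sensitivity) + f1(neg_precision, specificity)) / 2

# Inputs copied from the Model1 and Model5 rows of the test-set table above.
model1 = macro_f1(pos_precision=0.98, sensitivity=0.44,
                  neg_precision=0.64, specificity=0.99)
model5 = macro_f1(pos_precision=0.93, sensitivity=0.65,
                  neg_precision=0.73, specificity=0.95)

print(round(model1, 2))  # 0.69, matching the reported F1 for Model1
print(round(model5, 2))  # 0.8, matching the reported 0.80 for Model5
```

Because the inputs are themselves rounded to two decimals, other rows may differ from the reported F1 by about 0.01.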
Fig. 4 ROC curve and key evaluation indicators
Key features used for MI risk prediction
| Feature name | Description | Normal reference (unit) |
|---|---|---|
| TNT | Troponin-T | 0–14 ng/L |
| UD | Urodilatin | 0–227 pg/ml |
| MB | Myoglobin | 20–80 ng/ml |
| ALB | Albumin | 35–50 g/L |
| TC | Total cholesterol | 2.9–6 mmol/L |
| Cl | Plasma chloride | 96–106 mmol/L |
| CK-MB | Creatine kinase isoenzymes-MB | 0–4.94 ng/ml |
| Cr | Serum creatinine | 54–106 umol/L |
| MONO% | Monocyte percent | 3–8% |
| FPG | Fasting plasma glucose | 3.9–6.1 mmol/L |
| DBil | Direct bilirubin | 0–6.8 umol/L |
| AG | Anion gap | 8–16 mmol/L |
| IBil | Indirect bilirubin | 1.7–10.2 umol/L |
| Age | | |
| Sex | | |