Juntae Kim1, Su Yeon Lee1, Byung Hee Cha2, Wonseop Lee2, JiWung Ryu1, Young Hak Chung1, Dongmin Kim1, Seong-Hoon Lim1, Tae Soo Kang1, Byoung-Eun Park1, Myung-Yong Lee1, Sungsoo Cho3.
Abstract
Background: In patients with suspected obstructive coronary artery disease (CAD), evaluation using a pre-test probability model is a key element of diagnosis; however, the accuracy of such models is controversial. This study aimed to develop machine learning (ML) models using clinically relevant biomarkers to predict the presence of stable obstructive CAD and to compare the ML models with established pre-test probability models for CAD.
Keywords: artificial intelligence; coronary artery disease; machine learning; personalized medicine; stable angina pectoris
Year: 2022 PMID: 35928935 PMCID: PMC9343708 DOI: 10.3389/fcvm.2022.933803
Source DB: PubMed Journal: Front Cardiovasc Med ISSN: 2297-055X
FIGURE 1. Flowchart of the study population and process. CAD, coronary artery disease.
Baseline characteristics.
| Features | Total population | Control group | Case group | p-value |
| Age (years) | 63.0 ± 11.8 | 58.6 ± 12.4 | 65.6 ± 10.7 | < 0.001 |
| Male gender | 806 (59.4%) | 240 (48.4%) | 566 (65.7%) | < 0.001 |
| Hypertension | 1003 (73.9%) | 310 (62.5%) | 693 (80.4%) | < 0.001 |
| Diabetes mellitus | 572 (42.1%) | 148 (29.8%) | 424 (49.2%) | < 0.001 |
| Dyslipidemia | 1018 (77.6%) | 367 (74.0%) | 691 (80.2%) | 0.010 |
| Cerebrovascular accident | 112 (8.2%) | 37 (7.5%) | 75 (8.7%) | 0.485 |
| Chronic kidney disease | 63 (4.6%) | 9 (1.8%) | 54 (6.3%) | < 0.001 |
| Smoking | 542 (39.9%) | 158 (31.9%) | 384 (44.5%) | < 0.001 |
| Non-smoking | 816 (60.1%) | 338 (68.1%) | 478 (55.5%) | |
| Current-smoking | 248 (18.3%) | 68 (13.7%) | 180 (20.9%) | |
| Ex-smoker | 294 (21.6%) | 90 (18.1%) | 204 (23.7%) | |
| BMI (kg/m²) | 25.1 ± 3.5 | 25.2 ± 3.7 | 25.0 ± 3.4 | 0.304 |
| Systolic blood pressure (mmHg) | 135.2 ± 19.6 | 132.3 ± 18.7 | 136.9 ± 20.0 | < 0.001 |
| Diastolic blood pressure (mmHg) | 81.4 ± 13.5 | 82.0 ± 13.6 | 81.1 ± 13.4 | 0.257 |
| Hemoglobin (g/dL) | 13.4 (12.3, 14.5) | 13.3 (12.4, 14.3) | 13.5 (12.1, 14.5) | 0.609 |
| Hematocrit (%) | 39.7 (36.5, 42.5) | 39.6 (37.0, 42.2) | 39.7 (36.1, 42.7) | 0.570 |
| Creatinine clearance (mL/min/1.73 m²) | 79.9 (60.2, 101.3) | 88.8 (68.9, 112.7) | 75.8 (54.3, 95.9) | < 0.001 |
| Total cholesterol (mg/dL) | 159.0 (135.0, 189.0) | 159.0 (137.0, 188.5) | 159.0 (134.0, 189.0) | 0.643 |
| LDL cholesterol (mg/dL) | 88.0 (65.2, 116.2) | 90.6 (66.8, 116.0) | 87.0 (64.5, 116.4) | 0.702 |
| HDL cholesterol (mg/dL) | 42.0 (35.0, 51.0) | 46.0 (37.0, 56.0) | 40.0 (34.0, 48.0) | < 0.001 |
| Triglyceride (mg/dL) | 130.0 (88.0, 190.0) | 115.0 (77.8, 177.2) | 136.5 (95.2, 200.0) | < 0.001 |
| Glucose (mg/dL) | 120.5 (103.0, 154.0) | 112.0 (102.0, 135.8) | 126.5 (104.0, 162.8) | < 0.001 |
| HbA1c (%) | 5.9 (5.5, 6.5) | 5.7 (5.5, 6.1) | 6.0 (5.6, 6.8) | < 0.001 |
| Troponin T (ng/mL) | 0.010 (0.005, 0.020) | 0.010 (0.003, 0.010) | 0.010 (0.007, 0.021) | 0.036 |
| LDH (mg/dL) | 201.0 (177.0, 242.2) | 205.0 (179.5, 243.0) | 200.0 (176.0, 242.0) | 0.222 |
| NT-proBNP (pg/mL) | 89.9 (36.5, 526.2) | 60.0 (25.1, 254.6) | 117.7 (47.9, 711.8) | 0.908 |
Values are n (%), mean ± SD (standard deviation), or median (Q1, Q3). BMI, body mass index; CRP, C-reactive protein; HbA1c, hemoglobin A1c; HDL, high-density lipoprotein; LDH, lactate dehydrogenase; LDL, low-density lipoprotein; NT-proBNP, N-terminal pro-brain natriuretic peptide.
FIGURE 2. Receiver operating characteristic curves for the machine learning models and the established pre-test probability models for CAD. (A) Comparison of the eight machine learning models. (B) Comparison of the CatBoost model with the established pre-test probability models for CAD. AUROC, area under the receiver operating characteristic curve; CAD, coronary artery disease; GBM, gradient boosting machine; XG, extreme gradient.
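The AUROC reported for each curve can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. A minimal sketch of that definition follows; the function name `auroc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: 1 = obstructive CAD, 0 = no obstructive CAD.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.2]
toy_auc = auroc(toy_labels, toy_scores)  # 5 of the 6 case/control pairs are ranked correctly
```

The quadratic pairwise loop is chosen for readability; a rank-based implementation would give the same value more efficiently on large samples.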
Comparison of performance between risk prediction models.
| Models | AUROC | MCC | Accuracy | Sensitivity | Specificity | PPV | NPV | F1 |
| CatBoost | | | | 0.783 | 0.674 | | 0.614 | 0.803 |
| XGBoost | | 0.399 | 0.724 | 0.767 | 0.641 | 0.807 | 0.584 | 0.786 |
| LightGBM | 0.789 | 0.403 | 0.724 | 0.761 | 0.652 | 0.811 | 0.583 | 0.785 |
| Random forest | 0.742 | 0.413 | 0.728 | 0.761 | 0.663 | 0.815 | 0.587 | 0.787 |
| Gradient boost | 0.732 | 0.371 | 0.710 | 0.750 | 0.630 | 0.799 | 0.563 | 0.774 |
| Linear SVM | 0.721 | 0.373 | 0.699 | 0.706 | 0.685 | 0.814 | 0.543 | 0.756 |
| MLP | 0.728 | 0.379 | 0.710 | 0.739 | 0.652 | 0.806 | 0.561 | 0.771 |
| CAD consortium clinical | 0.727 | 0.313 | 0.676 | 0.706 | 0.620 | 0.784 | 0.518 | 0.743 |
| CAD consortium basic | 0.715 | 0.223 | 0.559 | 0.444 | | 0.800 | 0.419 | 0.571 |
| Diamond-Forrester score | 0.687 | 0.271 | 0.706 | | 0.261 | 0.712 | | |
| K-nearest neighbor | 0.704 | 0.313 | 0.676 | 0.706 | 0.620 | 0.784 | 0.518 | 0.743 |
AUROC, area under the receiver operating characteristic curve; CAD, coronary artery disease; GBM, gradient boosting machine; MCC, Matthews correlation coefficient; NPV, negative predictive value; PPV, positive predictive value; SVM, support vector machine; XG, extreme gradient. The bold values indicate the best performance of the 11 models.
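Apart from AUROC, the metrics in this table are all functions of the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; assumes every denominator is non-zero."""
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "MCC": mcc,
        "Accuracy": accuracy,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "F1": f1,
    }


# Made-up counts for illustration: 100 cases and 100 controls.
toy_metrics = binary_metrics(tp=80, fp=40, tn=60, fn=20)
```

Because MCC uses all four cells, it is less sensitive to class imbalance than accuracy or F1, which is one reason it is often reported alongside them.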
FIGURE 3. Feature importance ranking. (A) Mean SHAP value of features. (B) Impact on CatBoost model output of SHAP value. HbA1c, hemoglobin A1c; HDL, high-density lipoprotein; LDH, lactate dehydrogenase; SHAP, SHapley Additive exPlanations; Troponin T, high-sensitivity cardiac troponin T.
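To illustrate what a SHAP-based importance ranking measures, the sketch below computes exact Shapley values by brute force for a tiny model and averages their absolute values per feature. It assumes that the ranking in panel A is based on the mean absolute SHAP value, which is the common convention but is not stated in the caption. The function names, the toy linear model, and the data are hypothetical; this is neither the authors' code nor the tree-specific algorithm usually applied to models such as CatBoost.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one sample; absent features take background values."""
    n = len(x)

    def value(subset):
        # Mean model output when features in `subset` come from x, the rest from background rows.
        total = 0.0
        for b in background:
            z = [x[i] if i in subset else b[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, samples, background):
    """Global importance: mean absolute Shapley value of each feature over the samples."""
    totals = [0.0] * len(samples[0])
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, background)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


def toy_model(z):
    # Hypothetical linear score; the third feature has no effect.
    return 2.0 * z[0] - 1.0 * z[1] + 0.0 * z[2]


toy_background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
toy_samples = [[3.0, 1.0, 5.0], [1.0, 3.0, 0.0]]
toy_importance = mean_abs_shap(toy_model, toy_samples, toy_background)
```

The enumeration visits every feature subset, so it is practical only for a handful of features; it is meant to make the definition concrete rather than to scale.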
FIGURE 4. SHAP dependence plots of the CatBoost model. HbA1c, hemoglobin A1c; HDL, high-density lipoprotein; LDH, lactate dehydrogenase; SHAP, SHapley Additive exPlanations; Troponin T, high-sensitivity cardiac troponin T.