Rosario Megna1, Mario Petretta2, Roberta Assante3, Emilia Zampella3, Carmela Nappi3, Valeria Gaudieri3, Teresa Mannarino3, Adriana D'Antonio3, Roberta Green3, Valeria Cantoni3, Parthiban Arumugam4, Wanda Acampa1,3, Alberto Cuocolo3.
Abstract
The traditional approach for predicting coronary artery disease (CAD) is based on demographic data, symptoms such as chest pain and dyspnea, and comorbidities related to cardiovascular disease. Usually, these variables are analyzed by logistic regression to quantify their relationship with the outcome; nevertheless, their predictive value is limited. In the present study, we aimed to investigate the value of different machine learning (ML) techniques for the evaluation of suspected CAD, taking as the gold standard the presence of stress-induced ischemia on 82Rb positron emission tomography/computed tomography (PET/CT) myocardial perfusion imaging (MPI). The ML algorithms were chosen based on their clinical use and on the fact that they are representative of different classes of algorithms, such as deterministic (support vector machine and Naïve Bayes), adaptive (ADA and AdaBoost), and decision tree (Random Forest, rpart, and XGBoost). The study population included 2503 consecutive patients who underwent MPI for suspected CAD. To test ML performance, data were split randomly into two parts: training/test (80%) and validation (20%). For training/test, we applied 5-fold cross-validation, repeated 2 times. With this subset, we tuned the free parameters of each algorithm. For all metrics, the best performance in training/test was observed for AdaBoost. The Naïve Bayes algorithm proved more efficient in the validation approach. The logistic and rpart algorithms showed similar metric values for the training/test and validation approaches. These results are encouraging and indicate that ML algorithms can improve the evaluation of the pretest probability of stress-induced myocardial ischemia.
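The data partitioning described in the abstract can be illustrated with a minimal Python sketch using only the standard library. It assumes a plain, unstratified random shuffle; the abstract does not say whether the split was stratified by outcome, and the function names, seeds, and rounding of the 20% hold-out are illustrative choices rather than the authors' implementation.

```python
import random

def train_validation_split(n_samples, validation_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a validation fraction (assumed unstratified)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_validation = round(n_samples * validation_fraction)
    return indices[n_validation:], indices[:n_validation]

def repeated_kfold(indices, n_folds=5, n_repeats=2, seed=0):
    """Yield (fit, held_out) index lists for k-fold cross-validation repeated n_repeats times."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [shuffled[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            held_out = folds[k]
            fit = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield fit, held_out

# 2503 patients, 80% training/test and 20% validation, 5 folds repeated twice.
train_test, validation = train_validation_split(2503)
splits = list(repeated_kfold(train_test))
print(len(train_test), len(validation), len(splits))
```

Under these assumptions each candidate parameter setting is fitted 10 times (5 folds, 2 repeats) on the training/test subset, and the validation subset is never touched during tuning.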
Year: 2021 PMID: 34873413 PMCID: PMC8643229 DOI: 10.1155/2021/3551756
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Clinical characteristics of cohort according to MPI outcome.
| Characteristic | Normal (n = 2002) | Ischemic (n = 501) | P value |
|---|---|---|---|
| Age, n (%) | | | <0.001 |
| <55 | 777 (39) | 84 (17) | |
| 55-65 | 603 (30) | 146 (29) | |
| >65 | 622 (31) | 271 (54) | |
| Male gender, n (%) | 881 (44) | 334 (67) | <0.001 |
| Body mass index ≥30, n (%) | 1024 (51) | 258 (52) | 0.93 |
| Chest pain, n (%) | | | <0.001 |
| Typical | 678 (34) | 114 (23) | |
| Atypical | 256 (13) | 87 (17) | |
| Noncardiac∗ | 1068 (53) | 300 (60) | |
| Diabetes, n (%) | 479 (24) | 187 (37) | <0.001 |
| Dyspnea, n (%) | 446 (22) | 139 (28) | <0.05 |
| Family history of CAD, n (%) | 945 (47) | 199 (40) | <0.005 |
| Hypertension, n (%) | 1361 (68) | 401 (80) | <0.005 |
| Hyperlipidemia, n (%) | 1210 (60) | 343 (69) | <0.005 |
| Smoking, n (%) | 557 (28) | 144 (29) | 0.72 |
| Diagnostic question, n (%) | | | <0.001 |
| Diagnostic evaluation | 1642 (82) | 370 (74) | |
| Presurgery evaluation | 360 (18) | 131 (26) | |
∗Considering noncardiac patients as the reference. §Considering diagnostic evaluation patients as the reference.
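As a worked check of how a P value for a binary characteristic in the table above could be obtained, the following sketch applies a chi-square test with Yates continuity correction to a 2x2 table of counts. The choice of test is an assumption, since this page does not state which test the authors used; the function name is hypothetical. The counts are taken from the table, with the "absent" counts obtained by subtracting from the column totals of 2002 and 501.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test (1 degree of freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction (assumed here, not confirmed by the source).
        diff = max(0.0, diff - n / 2)
    statistic = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: normal, ischemic. Columns: characteristic present, characteristic absent.
_, bmi_p = chi_square_2x2(1024, 2002 - 1024, 258, 501 - 258)
_, smoking_p = chi_square_2x2(557, 2002 - 557, 144, 501 - 144)
print(round(bmi_p, 2), round(smoking_p, 2))
```

Under this assumption the two values round to 0.93 and 0.72, in line with the entries reported for body mass index and smoking.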
Figure 1. Correlation matrix of the features used. The matrix elements are displayed in hierarchical clustering order. The numbers indicate the Spearman ρ coefficient between two features.
Figure 2. Importance of the features for each ML algorithm. The feature importances for ADA, AdaBoost, and Naïve Bayes are grouped into a single bar plot because the values for the two adaptive algorithms were equal and the Naïve Bayes values differed from them by less than 5%.
Values used for tuning of parameters for each ML technique.
| Algorithm | Parameter | Parameter space | Chosen value |
|---|---|---|---|
| ADA | Number of trees | 10, 25, 50, 100, 200 | 25 |
| | Max tree depth | 5, 10, 20, 50 | 10 |
| | Learning rate | 0.001, 0.005, 0.01, 0.05, 0.1, 0.5 | 0.01 |
| AdaBoost | Number of trees | 10, 25, 50, 100, 200 | 50 |
| | Method | AdaBoost.M1, real AdaBoost | AdaBoost.M1 |
| Logistic | Family | Binomial | Binomial |
| Naïve Bayes | Laplace correction | 0, 0.5, 1.0 | 0 |
| | Distribution type (kernel) | True, false | False |
| | Bandwidth adjustment | 0.01, 0.05, 0.1, 0.5, 1.0 | 0.1 |
| Random Forest | Number of randomly selected predictors | 3, 5, 10, 20 | 10 |
| Rpart | Minimum number of observations in a node | 10, 15, 30 | 15 |
| | Minimum number of observations in any leaf node | 3, 5, 10 | 5 |
| | Max tree depth | 3, 5, 10, 20 | 10 |
| | Complexity parameter of the tree | 0.0001, 0.001, 0.01, 0.1 | 0.001 |
| SVM | Kernel | Linear, radial, sigmoid | Sigmoid |
| | Parameter needed for sigmoid | 0.05, 0.1, 0.25, 0.5 | 0.1 |
| | Cost | 0.5, 1, 2, 5 | 1 |
| XGBoost | Number of trees | 25, 50, 100, 200 | 100 |
| | Max tree depth | 5, 10, 20 | 10 |
| | Learning rate | 0.001, 0.005, 0.01, 0.05, 0.1, 0.5 | 0.01 |
| | Subsamples | 0.5, 0.75, 1 | 1 |
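To give a sense of the size of such a search, the sketch below enumerates every combination of the XGBoost parameter space listed above. It assumes an exhaustive grid search, which the source does not confirm, and the dictionary keys are hypothetical labels rather than argument names of any particular library.

```python
from itertools import product

# Parameter space for XGBoost as listed in the table (key names are illustrative).
xgboost_grid = {
    "n_trees": [25, 50, 100, 200],
    "max_depth": [5, 10, 20],
    "learning_rate": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
    "subsample": [0.5, 0.75, 1],
}

def expand_grid(grid):
    """Return one dict per combination of parameter values (full Cartesian product)."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*grid.values())]

candidates = expand_grid(xgboost_grid)
chosen = {"n_trees": 100, "max_depth": 10, "learning_rate": 0.01, "subsample": 1}

# With 5-fold cross-validation repeated 2 times, each candidate is fitted 10 times.
n_fits = len(candidates) * 5 * 2
print(len(candidates), n_fits, chosen in candidates)
```

If the search were exhaustive, the XGBoost space alone would contain 216 candidate settings, or 2160 model fits under the repeated cross-validation scheme.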
Metrics obtained from the ML techniques, evaluated on training/test and validation approaches.
| Algorithm | Training/test accuracy (%) | Training/test sensitivity (%) | Training/test specificity (%) | Training/test AUROC (%) | Validation accuracy (%) | Validation sensitivity (%) | Validation specificity (%) | Validation AUROC (%) |
|---|---|---|---|---|---|---|---|---|
| ADA | 88 | 48 | 97 | 90 | 76 | 26 | 89 | 68 |
| AdaBoost | 89 | 67 | 95 | 95 | 71 | 23 | 87 | 66 |
| Logistic | 80 | 5 | 98 | 72 | 80 | 7 | 98 | 75 |
| Naïve Bayes | 77 | 23 | 91 | 70 | 80 | 27 | 92 | 73 |
| Random Forest | 89 | 51 | 98 | 93 | 75 | 21 | 89 | 65 |
| Rpart | 82 | 27 | 96 | 75 | 76 | 17 | 91 | 70 |
| SVM | 72 | 13 | 87 | 61 | 77 | 21 | 91 | 65 |
| XGBoost | 83 | 27 | 97 | 83 | 77 | 18 | 92 | 69 |
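The relationship among the metrics in the table above can be made explicit: accuracy is the prevalence-weighted average of sensitivity and specificity. The sketch below assumes that the prevalence of ischemia in each subset equals the overall cohort prevalence (501 of 2503 patients), which is an approximation because the subset composition is not given here; the function name is hypothetical and the inputs are the rounded percentages from the table.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy = prevalence * sensitivity + (1 - prevalence) * specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

# Overall cohort prevalence of ischemia, assumed to hold in the validation subset.
prevalence = 501 / 2503

# Validation sensitivity and specificity taken from the table (rounded values).
logistic_acc = accuracy_from_rates(0.07, 0.98, prevalence)
ada_acc = accuracy_from_rates(0.26, 0.89, prevalence)

# Reference point: a rule that labels every patient as normal.
all_normal_acc = accuracy_from_rates(0.0, 1.0, prevalence)

print(round(100 * logistic_acc), round(100 * ada_acc), round(100 * all_normal_acc))
```

With roughly one ischemic patient in five, accuracy is dominated by specificity. Under these assumptions a rule that labels every patient as normal would already reach about 80% accuracy, so sensitivity and AUROC are the more informative columns when comparing the algorithms.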
Figure 3. Comparison of the ROC curves of the eight ML techniques considered. The ML performances are reported separately for the training/test approach (a) and the validation approach (b). The AUROC values are reported in parentheses.
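For readers less familiar with the AUROC, it can be computed without drawing the curve: it equals the probability that a randomly chosen ischemic patient receives a higher predicted score than a randomly chosen normal patient, with ties counted as one half. The following sketch implements that pairwise definition; the labels and scores are made-up toy values, not data from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical example: 1 = ischemic, 0 = normal; scores are predicted probabilities.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auroc(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise form costs a number of comparisons proportional to the product of the two class sizes, which is acceptable for a cohort of this size; rank-based formulas give the same value more efficiently.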
Figure 4. Decision tree obtained by the rpart algorithm. Each node or leaf reports the prevalent MPI outcome (nor: normal; isch: ischemic), the ratio between the number of patients with the prevalent outcome and the total number of patients, and the corresponding percentage.