Vassiliki I Kigka, Eleni Georga, Vassilis Tsakanikas, Savvas Kyriakidis, Panagiota Tsompou, Panagiotis Siogkas, Lampros K Michalis, Katerina K Naka, Danilo Neglia, Silvia Rocchiccioli, Gualtiero Pelosi, Dimitrios I Fotiadis, Antonis Sakellarios.
Abstract
Predicting obstructive atherosclerotic disease is clinically important for decision making. In this study, a machine learning predictive model based on a gradient boosting classifier is presented, aiming to distinguish patients at high risk of coronary artery disease (CAD) from those at low risk. The methodology comprises five steps: preprocessing of the input data, class imbalance handling with the Easy Ensemble algorithm, recursive feature elimination, training of the gradient boosting classifier, and model evaluation. The model was fine-tuned through a randomized search over its hyper-parameters using an internal 3-fold cross-validation. In total, 187 participants with suspected CAD who had previously undergone computed tomography coronary angiography (CTCA) during the EVINCI and ARTreat clinical studies were prospectively included to undergo follow-up CTCA. The predictive model was trained using imaging data (geometrical and blood-flow based) and non-imaging data. The overall predictive accuracy of the model was 0.81, using both imaging and non-imaging data. The innovative aspect of the proposed study is the combination of imaging-based data with the typical CAD risk factors to provide an integrated CAD risk-predictive model.
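The training steps named in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic data in place of the cohort, hand-rolls an Easy Ensemble-style step (several balanced subsets obtained by undersampling the majority class, with predictions averaged) instead of calling a dedicated library, and uses a hypothetical hyper-parameter search space, feature count, and function names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical search space; the study's actual ranges are not stated here.
PARAM_DISTRIBUTIONS = {
    "gbc__n_estimators": [20, 40],
    "gbc__learning_rate": [0.05, 0.1],
    "gbc__max_depth": [2, 3],
}


def tuned_pipeline(X, y, seed):
    """Scale, select features with RFE, and tune a gradient boosting model."""
    pipe = Pipeline([
        ("scale", StandardScaler()),  # stand-in for the preprocessing step
        ("rfe", RFE(GradientBoostingClassifier(n_estimators=20, random_state=seed),
                    n_features_to_select=8, step=4)),  # assumed feature count
        ("gbc", GradientBoostingClassifier(random_state=seed)),
    ])
    search = RandomizedSearchCV(
        pipe, PARAM_DISTRIBUTIONS, n_iter=3, scoring="balanced_accuracy",
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=seed),  # internal 3-fold CV
        random_state=seed,
    )
    return search.fit(X, y).best_estimator_


def fit_balanced_ensemble(X, y, n_subsets=3, seed=0):
    """Train one tuned pipeline per balanced subset (majority class undersampled)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority_label = classes[np.argmin(counts)]
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    models = []
    for i in range(n_subsets):
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        models.append(tuned_pipeline(X[idx], y[idx], seed + i))
    return models


def predict_high_risk_proba(models, X):
    """Average the probability of label 1 across the ensemble members."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic stand-in for the cohort: 187 samples, roughly one third with label 1.
X, y = make_classification(n_samples=187, n_features=20, n_informative=6,
                           weights=[0.67, 0.33], random_state=0)
models = fit_balanced_ensemble(X, y)
proba = predict_high_risk_proba(models, X)
```

The sketch scores on the training data only to show the call pattern; an honest performance estimate would wrap the whole procedure in an outer cross-validation so that resampling, feature selection, and tuning never see the held-out fold.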
Keywords: coronary artery disease; coronary artery disease risk stratification; machine learning models; noninvasive cardiovascular imaging
Year: 2022 PMID: 35741275 PMCID: PMC9221964 DOI: 10.3390/diagnostics12061466
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Imaging and non-imaging data utilized. * Imaging data from CTCA.
| Type | Category | Features |
|---|---|---|
| Imaging data * | Geometrical vasculature | Degree of Stenosis, Minimal Lumen Area, Minimal Lumen Diameter, Plaque Burden, Calcified Plaque Volume, Noncalcified Plaque Volume, SmartFFR Index, Number of Calcified Plaques, Number of Non-calcified Plaques |
| Non-imaging data | Demographics | Age, Gender |
| | Risk factors | Family History of CAD, Hypertension, Diabetes, Dyslipidemia, Smoking, Obesity, Metabolic Syndrome, Past Smokers |
| | Biohumoral Markers | Creatinine, Uric Acid, Glucose, Total Cholesterol, HDL, LDL, Triglycerides, Insulin, Aspartate Aminotransferase, Alanine Aminotransferase, Alkaline Phosphatase, Gamma-glutamyl Transferase, Hs-C Reactive Protein, Interleukin-6, TSH, fT3, fT4, Leptin, MMP2 Protein Plasma, MMP9 Protein Plasma, hs-cardiac Troponin T, N terminal Fragment of Pro-brain Natriuretic Peptide, Lipidomics, Metabolomics |
Figure 1. Flow chart depicting the distribution of the cohort in CAD-severity groups based on the CTCA imaging at the follow-up step. In total, imaging data from 187 patients (125 in Class 1 and 62 in Class 2) were analyzed. (CABG, coronary artery bypass graft surgery; CAD, coronary artery disease).
Definition of the utilized CAD risk classes. (CAD, coronary artery disease).
| Proposed Classes | Recommended Stenosis Grading Scale of CAD | Quantitative Stenosis |
|---|---|---|
| | 0: Normal | No luminal stenosis |
| | 1: Minimal | Plaque with <25% stenosis |
| | 2: Mild | 25–49% stenosis |
| | | |
| | 3: Moderate | 50–69% stenosis |
| | 4: Severe | 70–99% stenosis |
| | 5: Occluded | 100% stenosis |
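The grading scale in the table can be expressed as a small lookup. The sketch below is illustrative only: the function name `stenosis_grade` is hypothetical, a value of exactly 0% is assumed to mean "no luminal stenosis", and fractional values that fall between the listed bands (for example 49.5%) are assigned to the lower band. It maps to the six-grade scale only and does not assign the proposed risk classes.

```python
def stenosis_grade(percent):
    """Map a quantitative stenosis percentage to (grade, label) per the table."""
    if not 0 <= percent <= 100:
        raise ValueError("stenosis must be between 0 and 100 percent")
    if percent == 0:
        return 0, "Normal"      # assumption: 0% means no luminal stenosis
    if percent < 25:
        return 1, "Minimal"     # plaque with <25% stenosis
    if percent < 50:
        return 2, "Mild"        # 25-49% stenosis
    if percent < 70:
        return 3, "Moderate"    # 50-69% stenosis
    if percent < 100:
        return 4, "Severe"      # 70-99% stenosis
    return 5, "Occluded"        # 100% stenosis


for value in (0, 10, 30, 55, 85, 100):
    print(value, stenosis_grade(value))
```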
Figure 2. Overall pipeline of the proposed methodology. The input is based on clinical data, laboratory tests, and imaging data provided by the three-dimensional reconstruction of the artery and the blood-flow modelling. Different machine learning models were implemented for the prediction of coronary artery disease presence.
Evaluation of the CAD risk-prediction problem over 10 folds using imaging and non-imaging data (AUC, area under curve).
| Folds | Balanced Accuracy | Negative Predictive Value | Positive Predictive Value | ROC AUC | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Fold #0 | 0.73 | 0.78 | 0.67 | 0.60 | 0.67 | 0.78 |
| Fold #1 | 0.75 | 0.86 | 0.63 | 0.82 | 0.84 | 0.67 |
| Fold #2 | 0.89 | 1 | 0.72 | 0.92 | 1 | 0.78 |
| Fold #3 | 0.69 | 0.78 | 0.6 | 0.72 | 0.6 | 0.78 |
| Fold #4 | 0.84 | 1 | 0.63 | 0.83 | 1 | 0.67 |
| Fold #5 | 0.84 | 1 | 0.63 | 0.89 | 1 | 0.67 |
| Fold #6 | 0.95 | 1 | 0.84 | 1 | 1 | 0.89 |
| Fold #7 | 0.78 | 1 | 0.56 | 0.78 | 1 | 0.56 |
| Fold #8 | 0.8 | 0.86 | 0.72 | 0.88 | 0.84 | 0.75 |
| Fold #9 | 0.8 | 0.86 | 0.72 | 0.75 | 0.84 | 0.75 |
| Mean ± std | | | | | | |
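The per-fold scores in the table above can be summarized with a few lines of standard-library Python. This is a sketch, not the authors' computation: the sample standard deviation is an assumption (the source does not say which estimator was used), and the dictionary keys are illustrative names. The balanced-accuracy mean works out to about 0.81, in line with the accuracy quoted in the abstract.

```python
from statistics import mean, stdev

# Per-fold values copied from the table (imaging + non-imaging data).
folds = {
    "balanced_accuracy": [0.73, 0.75, 0.89, 0.69, 0.84, 0.84, 0.95, 0.78, 0.80, 0.80],
    "roc_auc": [0.60, 0.82, 0.92, 0.72, 0.83, 0.89, 1.00, 0.78, 0.88, 0.75],
    "sensitivity": [0.67, 0.84, 1.00, 0.60, 1.00, 1.00, 1.00, 1.00, 0.84, 0.84],
    "specificity": [0.78, 0.67, 0.78, 0.78, 0.67, 0.67, 0.89, 0.56, 0.75, 0.75],
}


def summarize(values):
    """Return (mean, sample standard deviation) of the per-fold scores."""
    return mean(values), stdev(values)


summary = {name: summarize(values) for name, values in folds.items()}
for name, (m, s) in summary.items():
    print(f"{name}: {m:.2f} ± {s:.2f}")
```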
Figure 3. Normalized confusion matrix for the gradient boosting classification algorithm in CAD risk prediction using imaging and non-imaging data. The percentage of true negative cases is 73%, whereas the percentage of true positive cases is 87%.
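To make the relation between the normalized confusion matrix and the reported rates concrete, here is a minimal sketch. It assumes the matrix is normalized per true class (each row sums to 1), so the off-diagonal entries 0.27 and 0.13 are simply the complements of the two percentages in the caption; the function names and the raw counts in the usage lines are hypothetical.

```python
def normalize_rows(cm):
    """Row-normalize a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    return [[count / sum(row) for count in row] for row in cm]


def rates(normalized):
    """Read specificity, sensitivity, and balanced accuracy off a row-normalized matrix."""
    specificity = normalized[0][0]  # true negative rate
    sensitivity = normalized[1][1]  # true positive rate
    return specificity, sensitivity, (specificity + sensitivity) / 2


# Rates from the caption; off-diagonals are derived complements (assumption).
reported = [[0.73, 0.27], [0.13, 0.87]]
specificity, sensitivity, balanced_accuracy = rates(reported)
print(f"balanced accuracy: {balanced_accuracy:.2f}")

# Hypothetical raw counts that would normalize to the same matrix.
example = normalize_rows([[73, 27], [13, 87]])
```

Under this reading, the balanced accuracy implied by the caption is the average of 0.73 and 0.87, that is, 0.80.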
Evaluation of the CAD risk-prediction problem over 10 folds using only non-imaging data (AUC, area under curve).
| Folds | Balanced Accuracy | Negative Predictive Value | Positive Predictive Value | ROC AUC | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Fold #0 | 0.5 | 0.6 | 0.4 | 0.33 | 0.33 | 0.67 |
| Fold #1 | 0.56 | 0.64 | 0.5 | 0.65 | 0.33 | 0.78 |
| Fold #2 | 0.72 | 1 | 0.5 | 0.89 | 1 | 0.44 |
| Fold #3 | 0.69 | 0.78 | 0.6 | 0.69 | 0.6 | 0.78 |
| Fold #4 | 0.79 | 0.88 | 0.67 | 0.87 | 0.8 | 0.78 |
| Fold #5 | 0.78 | 1 | 0.56 | 0.78 | 1 | 0.56 |
| Fold #6 | 0.79 | 0.88 | 0.67 | 0.8 | 0.8 | 0.78 |
| Fold #7 | 0.72 | 1 | 0.5 | 0.76 | 1 | 0.44 |
| Fold #8 | 0.6 | 0.64 | 0.67 | 0.79 | 0.33 | 0.88 |
| Fold #9 | 0.71 | 0.75 | 0.67 | 0.83 | 0.67 | 0.75 |
| Mean ± std | | | | | | |
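The effect of adding imaging data can be read off the two evaluation tables by comparing balanced accuracy fold by fold. The sketch below is a rough illustration: it assumes that fold numbers refer to the same data splits in both experiments, which the source does not state, and the variable and function names are hypothetical.

```python
from statistics import mean

# Balanced accuracy per fold, copied from the two evaluation tables.
combined = [0.73, 0.75, 0.89, 0.69, 0.84, 0.84, 0.95, 0.78, 0.80, 0.80]
non_imaging = [0.50, 0.56, 0.72, 0.69, 0.79, 0.78, 0.79, 0.72, 0.60, 0.71]


def fold_differences(a, b):
    """Return per-fold differences a[i] - b[i] for two equally long score lists."""
    if len(a) != len(b):
        raise ValueError("both experiments must report the same number of folds")
    return [x - y for x, y in zip(a, b)]


diffs = fold_differences(combined, non_imaging)
mean_gain = mean(diffs)                       # average gain from adding imaging data
folds_improved = sum(d > 0 for d in diffs)    # folds where the combined model scored higher

print(f"mean gain in balanced accuracy: {mean_gain:.3f}")
print(f"folds improved: {folds_improved} of {len(diffs)}")
```

With ten folds and no information on the variance of the splits, this comparison is descriptive rather than a significance test.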
Figure 4. Feature importance based on mean SHAP values. The number of existing calcified plaques and the highest coronary degree of stenosis are indicated as the most significant features.
Figure 5. Input features contribution table (blue, features with negative effect; yellow, features with positive effect). The most significant features that contribute positively to the output target are thyroid-stimulating hormone, beta-blocker medication, aspartate aminotransferase, diabetes, and minimum lumen area, whereas the most significant feature with a negative effect on the output is the number of calcified plaques at the baseline analysis of patient imaging.