J D Schwalm1,2, Shuang Di3,4, Tej Sheth1,2, Madhu K Natarajan1,2, Erin O'Brien1, Tara McCready1, Jeremy Petch1,2,3,5.
Abstract
Background: Conventional clinical risk scores and diagnostic algorithms are proving to be suboptimal in the prediction of obstructive coronary artery disease, contributing to the low diagnostic yield of invasive angiography. Machine learning could help better predict which patients would benefit from invasive angiography vs other noninvasive diagnostic modalities. Objective: To reduce patient risk and cost to the healthcare system by improving the diagnostic yield of invasive coronary angiography through optimized outpatient selection.
Keywords: Coronary angiography; Coronary artery disease; Coronary computed tomographic angiography; Machine learning; Prediction model
Year: 2021 PMID: 35265932 PMCID: PMC8890355 DOI: 10.1016/j.cvdhj.2021.12.001
Source DB: PubMed Journal: Cardiovasc Digit Health J ISSN: 2666-6936
Figure 1. Study population profile.
Baseline characteristics of patients in the study cohort
| Characteristics | Overall (N = 29,688) | | Nonsignificant CAD (N = 13,576) | | Significant CAD (N = 16,112) | | P value | Missing |
|---|---|---|---|---|---|---|---|---|
| Demographic characteristics | | | | | | | | |
| Sex, n (%) | | | | | | | <.001 | 1 |
| Female | 11,678 | (39.3) | 6610 | (48.7) | 5068 | (31.5) | | |
| Age (years), mean (SD) | 65.6 | (11.4) | 64.2 | (11.5) | 66.8 | (11.1) | <.001 | 0 |
| Ethnicity, n (%) | | | | | | | .290 | 26,002 |
| White | 3478 | (94.4) | 1875 | (94.7) | 1603 | (94.0) | | |
| South Asian | 68 | (1.8) | 29 | (1.5) | 39 | (2.3) | | |
| Asian | 38 | (1.0) | 17 | (0.9) | 21 | (1.2) | | |
| Aboriginal | 34 | (0.9) | 19 | (1.0) | 15 | (0.9) | | |
| Black | 21 | (0.6) | 14 | (0.7) | 7 | (0.4) | | |
| Socioeconomic status | | | | | | | | |
| Residential instability, mean (SD) | 0.1 | (1.0) | 0.1 | (1.0) | 0.1 | (1.0) | .026 | 530 |
| Material deprivation, mean (SD) | 0.1 | (1.0) | 0.1 | (1.0) | 0.1 | (1.0) | .419 | 530 |
| Ethnic concentration, mean (SD) | -0.4 | (0.6) | -0.4 | (0.6) | -0.4 | (0.6) | .622 | 530 |
| Dependency, mean (SD) | 0.4 | (1.2) | 0.4 | (1.2) | 0.4 | (1.2) | .002 | 530 |
| Patient referral information | | | | | | | | |
| Primary reason, n (%) | | | | | | | <.001 | 0 |
| Coronary disease | 25,856 | (87.1) | 11,739 | (86.5) | 14,117 | (87.6) | | |
| Other | 3832 | (12.9) | 1837 | (13.5) | 1995 | (12.4) | | |
| Primary reason type, n (%) | | | | | | | <.001 | 3052 |
| Elective, stable coronary disease | 13,779 | (51.7) | 5082 | (42.2) | 8697 | (59.6) | | |
| Rule out CAD | 9150 | (34.4) | 5560 | (46.2) | 3590 | (24.6) | | |
| Other | 3707 | (13.9) | 1396 | (11.6) | 2311 | (15.8) | | |
| Translator required, n (%) | | | | | | | .554 | 7780 |
| Yes | 161 | (0.7) | 71 | (0.7) | 90 | (0.8) | | |
| Dye allergy, n (%) | | | | | | | .002 | 1353 |
| Yes | 315 | (1.1) | 170 | (1.3) | 145 | (0.9) | | |
| Anthropometric measures | | | | | | | | |
| Height (cm), mean (SD) | 169.5 | (11.2) | 169.0 | (11.4) | 170.0 | (10.9) | <.001 | 289 |
| Weight (kg), mean (SD) | 86.2 | (21.3) | 86.6 | (21.5) | 85.9 | (21.1) | .005 | 212 |
| Clinical symptoms and risk factors | | | | | | | | |
| Ischemic change, n (%) | | | | | | | <.001 | 1864 |
| Persistent | 2841 | (10.2) | 1012 | (8.0) | 1829 | (12.0) | | |
| Transient with pain | 336 | (1.2) | 105 | (0.8) | 231 | (1.5) | | |
| Transient without pain | 127 | (0.5) | 48 | (0.4) | 79 | (0.5) | | |
| Exercise ECG risk, n (%) | | | | | | | <.001 | 0 |
| High | 6294 | (21.2) | 2178 | (16.0) | 4116 | (25.5) | | |
| Low | 5531 | (18.6) | 2886 | (21.3) | 2645 | (16.4) | | |
| Functional imaging risk, n (%) | | | | | | | <.001 | 384 |
| High | 8358 | (28.5) | 3147 | (23.6) | 5211 | (32.7) | | |
| Low | 5878 | (20.1) | 3137 | (23.5) | 2741 | (17.2) | | |
| LV method, n (%) | | | | | | | <.001 | 1378 |
| Echo | 22,416 | (79.2) | 10,792 | (80.6) | 11,624 | (77.9) | | |
| Other | 3029 | (10.7) | 1185 | (8.9) | 1844 | (12.4) | | |
| Not done | 2865 | (10.1) | 1407 | (10.5) | 1458 | (9.8) | | |
| LV function, n (%) | | | | | | | <.001 | 3054 |
| ≥50% | 19,773 | (74.2) | 9577 | (79.4) | 10,196 | (70.0) | | |
| 35%–49% | 3472 | (13.0) | 1374 | (11.4) | 2098 | (14.4) | | |
| ≤34% | 2013 | (7.6) | 915 | (7.6) | 1098 | (7.5) | | |
| LV function value, mean (SD) | 53.7 | (12.9) | 53.9 | (13.2) | 53.5 | (12.5) | .219 | 23,904 |
| Creatinine, n (%) | | | | | | | .985 | 279 |
| Known | 29,360 | (99.8) | 13,442 | (99.8) | 15,918 | (99.8) | | |
| Creatinine (μmol/L), mean (SD) | 100.2 | (117.2) | 96.9 | (112.8) | 103.1 | (120.7) | <.001 | 458 |
| CCS class, n (%) | | | | | | | <.001 | 0 |
| 0 | 8990 | (30.3) | 4736 | (34.9) | 4254 | (26.4) | | |
| 1 | 4568 | (15.4) | 2274 | (16.8) | 2294 | (14.2) | | |
| 2 | 8805 | (29.7) | 3873 | (28.5) | 4932 | (30.6) | | |
| 3 | 6189 | (20.8) | 2261 | (16.7) | 3928 | (24.4) | | |
| 4 | 1136 | (3.8) | 432 | (3.2) | 704 | (4.4) | | |
| NYHA class, n (%) | | | | | | | <.001 | 16 |
| 1 | 19,008 | (64.1) | 8375 | (61.7) | 10,633 | (66.0) | | |
| 2 | 6433 | (21.7) | 3073 | (22.7) | 3360 | (20.9) | | |
| 3 | 2706 | (9.1) | 1326 | (9.8) | 1380 | (8.6) | | |
| 4 | 371 | (1.3) | 181 | (1.3) | 190 | (1.2) | | |
| Medical history | | | | | | | | |
| History of MI, n (%) | | | | | | | <.001 | 0 |
| Yes | 3282 | (11.1) | 848 | (6.2) | 2434 | (15.1) | | |
| Recent MI, n (%) | | | | | | | .694 | 0 |
| Yes | 144 | (0.5) | 63 | (0.5) | 81 | (0.5) | | |
| History of cerebrovascular disease, n (%) | | | | | | | <.001 | 590 |
| Yes | 2184 | (7.5) | 818 | (6.2) | 1366 | (8.6) | | |
| History of peripheral vascular disease, n (%) | | | | | | | <.001 | 13 |
| Yes | 1618 | (5.5) | 472 | (3.5) | 1146 | (7.1) | | |
| Possible intracardiac thrombus, n (%) | | | | | | | .94 | 557 |
| Yes | 50 | (0.2) | 22 | (0.2) | 28 | (0.2) | | |
| History of infective endocarditis, n (%) | | | | | | | .005 | 51 |
| Yes | 32 | (0.1) | 23 | (0.2) | 9 | (0.1) | | |
| Active endocarditis, n (%) | | | | | | | .076 | 29,656 |
| Yes | 7 | (21.9) | 3 | (13.0) | 4 | (44.4) | | |
| Congenital heart disease, n (%) | | | | | | | .133 | 7787 |
| Yes | 112 | (0.5) | 70 | (0.6) | 42 | (0.4) | | |
| History of congestive heart failure, n (%) | | | | | | | .003 | 33 |
| Yes | 2759 | (9.3) | 1336 | (9.8) | 1423 | (8.8) | | |
| Anticoagulant, n (%) | | | | | | | <.001 | 0 |
| None | 26,092 | (87.9) | 11,645 | (85.8) | 14,447 | (89.7) | | |
| Dialysis, n (%) | | | | | | | .436 | 5 |
| Yes | 790 | (2.7) | 350 | (2.6) | 440 | (2.7) | | |
| Diabetes, n (%) | | | | | | | <.001 | 3 |
| Yes | 8671 | (29.2) | 3506 | (25.8) | 5165 | (32.1) | | |
| Diabetes control, n (%) | | | | | | | <.001 | 21,010 |
| On oral hypoglycemics | 4970 | (57.3) | 1950 | (56.2) | 3020 | (58.0) | | |
| Insulin treatment | 2506 | (28.9) | 1036 | (29.9) | 1470 | (28.2) | | |
| Managed by diet only | 906 | (10.4) | 409 | (11.8) | 497 | (9.5) | | |
| No treatment | 296 | (3.4) | 72 | (2.1) | 224 | (4.3) | | |
| Hypertension, n (%) | | | | | | | <.001 | 2 |
| Yes | 20,165 | (67.9) | 8656 | (63.8) | 11,509 | (71.4) | | |
| Hyperlipidemia, n (%) | | | | | | | <.001 | 6 |
| Yes | 20,431 | (68.8) | 8470 | (62.4) | 11,961 | (74.3) | | |
| COPD, n (%) | | | | | | | <.001 | 14 |
| Yes | 1863 | (6.3) | 942 | (6.9) | 921 | (5.7) | | |
| Tobacco habits | | | | | | | | |
| History of smoking, n (%) | | | | | | | <.001 | 1070 |
| Never | 13,363 | (46.7) | 6385 | (49.4) | 6978 | (44.5) | | |
| Former | 9252 | (32.3) | 4084 | (31.6) | 5168 | (32.9) | | |
| Current | 6003 | (21.0) | 2455 | (19.0) | 3548 | (22.6) | | |
CAD = coronary artery disease; CCS = Canadian Cardiovascular Society; COPD = chronic obstructive pulmonary disease; ECG = electrocardiogram; LV = left ventricular; MI = myocardial infarction; NYHA = New York Heart Association.
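The percentages in the baseline table appear to be computed over patients with a recorded value rather than over the full cohort. This is an inference from the reported counts, not something the table states. A minimal Python sketch of that arithmetic follows; the helper name `pct_of_non_missing` is hypothetical.

```python
TOTAL = 29_688  # overall cohort size from the table header


def pct_of_non_missing(count, missing, total=TOTAL):
    """Percentage of `count` among patients whose value was recorded.

    Assumption: the denominator is total minus the "Missing" column.
    """
    denominator = total - missing
    return round(100 * count / denominator, 1)


# Counts and missing values are copied from the baseline table.
print(pct_of_non_missing(3478, 26_002))  # White ethnicity -> 94.4
print(pct_of_non_missing(7, 29_656))     # Active endocarditis -> 21.9
print(pct_of_non_missing(11_678, 1))     # Female -> 39.3
```

Under this reading, the 21.9% for active endocarditis describes 7 of only 32 patients with a recorded value, not 7 of 29,688.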
Discrimination performance of the reference model and machine learning models
| Model | Primary metric | Secondary metrics | | | | | |
|---|---|---|---|---|---|---|---|
| | AUROC | AUPRC | Accuracy | Sensitivity | Specificity | PPV | NPV |
| LightGBM | 0.81 | 0.84 | 0.73 | 0.75 | 0.71 | 0.75 | 0.70 |
| XGBoost | 0.79 | 0.82 | 0.72 | 0.75 | 0.69 | 0.74 | 0.70 |
| Gradient boosted decision trees | 0.79 | 0.82 | 0.72 | 0.75 | 0.69 | 0.74 | 0.70 |
| Random forests | 0.79 | 0.81 | 0.71 | 0.76 | 0.66 | 0.73 | 0.70 |
| Logistic regression (Lasso) | 0.78 | 0.81 | 0.72 | 0.75 | 0.69 | 0.74 | 0.70 |
| Logistic regression | 0.78 | 0.80 | 0.71 | 0.74 | 0.68 | 0.73 | 0.69 |
| Deep neural network | 0.77 | 0.79 | 0.70 | 0.75 | 0.64 | 0.71 | 0.68 |
| Reference model | 0.62 | 0.65 | 0.58 | 0.63 | 0.53 | 0.61 | 0.55 |
AUPRC = area under the precision-recall curve; AUROC = area under the receiver operating characteristics curve; LightGBM = Light Gradient Boosting Machine; NPV = negative predictive value; PPV = positive predictive value; XGBoost = eXtreme Gradient Boosting.
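The threshold-based secondary metrics can be recomputed from confusion counts. The Python sketch below uses the LightGBM counts from the reclassification table in this record. It assumes that the two row groups of that table are the patients' true outcomes, which the table does not label explicitly. The function name `discrimination_metrics` is illustrative.

```python
def discrimination_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; significant CAD is the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# LightGBM predictions, summed over both reference-model rows
# (assumption: row groups in the reclassification table are true outcomes).
lightgbm = discrimination_metrics(
    tp=742 + 1659,  # significant CAD predicted significant
    fn=442 + 380,   # significant CAD predicted nonsignificant
    tn=1168 + 738,  # nonsignificant CAD predicted nonsignificant
    fp=287 + 522,   # nonsignificant CAD predicted significant
)

for name, value in lightgbm.items():
    print(f"{name}: {value:.3f}")
```

The recomputed values agree with the LightGBM row to within about 0.01. The remaining gap may come from rounding or from how the reported metrics were computed, which this record does not specify.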
Reclassification performance of the LightGBM model against the reference model
| Reference model | LightGBM model | | Net correctly reclassified |
|---|---|---|---|
| | Nonsignificant CAD | Significant CAD | |
| Nonsignificant CAD | |||
| Nonsignificant CAD | 1168 | 287 | 16.61% |
| Significant CAD | 738 | 522 | |
| Significant CAD | |||
| Nonsignificant CAD | 442 | 742 | 11.23% |
| Significant CAD | 380 | 1659 | |
CAD = coronary artery disease; LightGBM = Light Gradient Boosting Machine.
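The net reclassification percentages can be reproduced from the four counts in each outcome group: patients moved to the correct class by LightGBM, minus patients moved to the wrong class, divided by the group size. The Python sketch below assumes the group rows are true outcomes and that inner rows are reference-model predictions. The names are illustrative.

```python
NONSIG, SIG = "nonsignificant", "significant"


def net_correctly_reclassified(counts, truth):
    """counts[(reference_prediction, lightgbm_prediction)] for one outcome group."""
    other = SIG if truth == NONSIG else NONSIG
    improved = counts[(other, truth)]  # reference wrong, LightGBM right
    worsened = counts[(truth, other)]  # reference right, LightGBM wrong
    return (improved - worsened) / sum(counts.values())


nonsig_patients = {
    (NONSIG, NONSIG): 1168, (NONSIG, SIG): 287,
    (SIG, NONSIG): 738, (SIG, SIG): 522,
}
sig_patients = {
    (NONSIG, NONSIG): 442, (NONSIG, SIG): 742,
    (SIG, NONSIG): 380, (SIG, SIG): 1659,
}

print(f"{net_correctly_reclassified(nonsig_patients, NONSIG):.2%}")  # 16.61%
print(f"{net_correctly_reclassified(sig_patients, SIG):.2%}")        # 11.23%
```

For example, in the nonsignificant CAD group this is (738 - 287) / 2715, which matches the reported 16.61%.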
Figure 2. Estimated importance of each predictor according to the selected final model, using permutation importance.
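Permutation importance measures how much a model's score drops when one predictor's values are shuffled, which breaks that predictor's link to the outcome. The Python sketch below illustrates the idea on toy data with a hand-written rule in place of a trained model. It scores with accuracy for simplicity; the metric and settings used for the figure are not given in this record, and all names here are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column's value replaced.
            permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: feature 0 fully determines the label, feature 1 is pure noise.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]


def toy_model(row):
    """Stand-in for a trained classifier; uses only feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, rows, labels)
print(importances)  # large drop for feature 0, zero for feature 1
```

With a trained model, the same loop would call its prediction method and could use any score, for example AUROC, in place of accuracy.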
Figure 3. Example of a true negative case (i.e., both the label and the prediction are nonsignificant CAD), explained by applying the LIME framework to the LightGBM model. The explanation generated for each prediction includes the predicted probability and the contribution of each predictor to the result.